Hi Steve,
Thanks for your reply. I know Farsi is written and read right-to-left.
I am using the RangeQuery class, and its rewrite(IndexReader reader) method
decides whether a term is in range with the compareTo method; that decision
is made using Unicode order.
While searching for the "د-ژ" range the lo
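For reference, a minimal sketch of the kind of range search being discussed, written against the Lucene 2.x-era API current at the time. The field name keywordIndex is taken from the explain() output later in the thread; the index path is a placeholder.

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.RangeQuery;

public class FarsiRangeSketch {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index"); // placeholder path
        IndexSearcher searcher = new IndexSearcher(reader);

        // RangeQuery.rewrite(reader) enumerates the field's terms and keeps
        // those with lowerTerm <= term <= upperTerm; the comparison is plain
        // String.compareTo(), i.e. Unicode (UTF-16 code unit) order, not a
        // locale-aware Farsi collation.
        RangeQuery range = new RangeQuery(
                new Term("keywordIndex", "\u062F"),  // د
                new Term("keywordIndex", "\u0698"),  // ژ
                true);                               // inclusive bounds

        Hits hits = searcher.search(range);
        System.out.println(hits.length() + " hits");

        searcher.close();
        reader.close();
    }
}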
> Even if they're in multiple indexes, the doc IDs being ints
> will still prevent
> it going past 2Gi unless you wrap your own framework around it.
Hm. Does this mean that a MultiReader has the int-limit too?
I thought that this limit applies to a single index only...
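A small sketch of what the question is getting at, assuming the Lucene 2.x-era API: a MultiReader presents one combined doc-id space, and maxDoc() returns an int, so the 2^31 ceiling applies to the combined view as well. The index paths are placeholders.

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;

public class MultiReaderLimitSketch {
    public static void main(String[] args) throws Exception {
        IndexReader r1 = IndexReader.open("/path/to/index1"); // placeholder paths
        IndexReader r2 = IndexReader.open("/path/to/index2");

        // Doc ids from r2 are offset by r1.maxDoc() inside the MultiReader,
        // and the combined total still has to fit in an int.
        IndexReader multi = new MultiReader(new IndexReader[] { r1, r2 });
        System.out.println("combined maxDoc = " + multi.maxDoc());

        multi.close();
    }
}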
Hi there,
We're using Lucene with Hibernate Search and we're very happy so far
with the performance and usability of Lucene. We have, however, a
specific use case that prevents us from using only Lucene: spatial
queries. I already sent a mail on this list a while back about the
problem and we starte
Hi,
The document's encoding is "UTF-8".
I tried the explain() method, and the result for the "د-ژ" range search is:
fieldWeight(keywordIndex:ساب وو�ر in 0), product of:
1.0 = tf(termFreq(keywordIndex:ساب وو�ر)=1)
0.30685282 = idf(docFreq=1)
1.0 = fieldNorm(field=keywordIndex,
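The output above can be produced with a call along these lines (Lucene 2.x-era API; the index path and the doc id 0 are assumptions taken from the snippet):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.RangeQuery;

public class ExplainSketch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index"); // placeholder path

        RangeQuery range = new RangeQuery(
                new Term("keywordIndex", "د"),
                new Term("keywordIndex", "ژ"),
                true);

        // explain() breaks the score of one document into its tf, idf and
        // fieldNorm factors -- the fieldWeight(...) lines quoted above.
        Explanation exp = searcher.explain(range, 0); // doc id 0, as in the output
        System.out.println(exp.toString());

        searcher.close();
    }
}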
The issue here is a general one of trying to perform an efficient join between
an external resource (an RDBMS) and Lucene.
This experiment may be of interest:
http://issues.apache.org/jira/browse/LUCENE-434
KeyMap.java embodies the core service which translates from Lucene doc ids to
DB primary
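Not the LUCENE-434 code itself, but the general idea it optimizes can be sketched as follows: resolve each Lucene hit to a database primary key via a stored field. The field name "pk" and the method shape are assumptions; KeyMap exists precisely because doing this per hit gets expensive for large result sets.

import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class DocIdToPkSketch {
    // Naive per-hit lookup of a stored primary-key field.
    static List<String> collectPrimaryKeys(IndexSearcher searcher, Query q, int n)
            throws Exception {
        TopDocs top = searcher.search(q, null, n);
        List<String> pks = new ArrayList<String>();
        for (int i = 0; i < top.scoreDocs.length; i++) {
            Document doc = searcher.doc(top.scoreDocs[i].doc);
            pks.add(doc.get("pk")); // assumed stored primary-key field
        }
        return pks;
    }
}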
Hi Esra,
Going back to the original problem statement, I see something that looks
illogical to me - please correct me if I'm wrong:
On Apr 30, 2008, at 3:21 AM, esra wrote:
> I am using Lucene's "IndexSearcher" to search the given XML by a
> keyword which contains Farsi information.
> while search
On May 1, 2008, at 4:36 AM, esra wrote:
Hi,
The document's encoding is "UTF-8".
I tried the explain() method, and the result for the "د-ژ" range
search is:
fieldWeight(keywordIndex:ساب وو�ر in 0),
product of:
1.0 = tf(termFreq(keywordIndex:ساب وو�ر)=1)
0.30685282 = idf(do
On Wed, Apr 30, 2008 at 10:52 PM, Rajesh parab <[EMAIL PROTECTED]> wrote:
> Can we somehow keep
> internal document id same after updating (i.e. delete
> and re-insert) index document?
No. ParallelReader is not a general solution, it's an expert-level
solution that leaves the task of keeping t
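A rough sketch of what ParallelReader assumes (Lucene 2.x-era API, placeholder paths): two indexes whose documents were added in exactly the same order, so that doc id N in one lines up with doc id N in the other. Keeping that alignment across updates is the part left to the application.

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.ParallelReader;
import org.apache.lucene.search.IndexSearcher;

public class ParallelReaderSketch {
    public static void main(String[] args) throws Exception {
        ParallelReader pr = new ParallelReader();
        pr.add(IndexReader.open("/path/to/stable-fields-index"));      // placeholder
        pr.add(IndexReader.open("/path/to/frequently-rebuilt-index")); // placeholder

        // Fields from both indexes appear on the same logical document,
        // but only while the doc ids stay aligned.
        IndexSearcher searcher = new IndexSearcher(pr);
        System.out.println("maxDoc = " + pr.maxDoc());

        searcher.close();
        pr.close();
    }
}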
Thanks Yonik.
So, if rebuilding the second index is not an option
due to a large number of documents, then ParallelReader will
not work :-(
And I believe there is no other way than
ParallelReader to search across multiple indexes that
contain related data. Is there any other alternative?
I think, Multi
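If the truncated suggestion is MultiSearcher, a minimal sketch would look like this (Lucene 2.x-era API, placeholder paths). Note that it merges hits from several physical indexes into one ranked list, but it does not join related data across indexes the way ParallelReader does.

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;

public class MultiSearcherSketch {
    public static void main(String[] args) throws Exception {
        Searchable[] shards = new Searchable[] {
            new IndexSearcher("/path/to/index1"), // placeholder paths
            new IndexSearcher("/path/to/index2")
        };
        // Searches both indexes and merges the hits into one result list.
        MultiSearcher searcher = new MultiSearcher(shards);
        System.out.println("ready to search " + shards.length + " shards");
        searcher.close();
    }
}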
Stephane,
Could you describe how you set up the spatial area? Having a BooleanQuery with
200 terms in it definitely slows things down (I'm not sure exactly why yet --
it seems like it shouldn't be "that" slow). If you can describe your
spatial area in fewer terms, you can get much better performance.
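A sketch of the kind of query being described: one SHOULD clause per spatial grid cell, so the clause count tracks how finely the area is tiled. The "cell" field and the cell-id scheme are assumptions; the point is only that fewer, coarser cells mean fewer clauses.

import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class SpatialAreaQuerySketch {
    // Builds one SHOULD clause per grid cell covering the bounding box.
    static BooleanQuery areaQuery(List<String> cellIds) { // e.g. ~200 cell ids
        BooleanQuery area = new BooleanQuery();
        for (String cellId : cellIds) {
            // Each clause adds another term lookup and posting-list walk.
            area.add(new TermQuery(new Term("cell", cellId)),
                     BooleanClause.Occur.SHOULD);
        }
        return area;
    }
}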
I am not sure why this is the case; the docid is internal to the sub-index. As
long as the sub-index size is below 2 billion, there is no need for the docid to
be a long. With multiple indexes, I was thinking of having an aggregator which
merges maybe only a page of search results, as sketched below.
Example:
sub index 1: 1 billion
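A sketch of that aggregator idea: each sub-index returns only its top page of (local doc id, score) pairs, and the federator merges them by score. The class and field names here are purely illustrative, not an existing Lucene API.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class FederatedHit {
    final int shard;      // which sub-index the hit came from
    final int localDocId; // meaningful only inside that sub-index
    final float score;
    FederatedHit(int shard, int localDocId, float score) {
        this.shard = shard;
        this.localDocId = localDocId;
        this.score = score;
    }
}

class PageMerger {
    // Merges one page of hits per shard into a single page, ordered by score.
    static List<FederatedHit> merge(List<List<FederatedHit>> pages, int pageSize) {
        List<FederatedHit> all = new ArrayList<FederatedHit>();
        for (List<FederatedHit> page : pages) {
            all.addAll(page);
        }
        Collections.sort(all, new Comparator<FederatedHit>() {
            public int compare(FederatedHit a, FederatedHit b) {
                return Float.compare(b.score, a.score); // descending score
            }
        });
        return new ArrayList<FederatedHit>(
                all.subList(0, Math.min(pageSize, all.size())));
    }
}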
From: John Wang <[EMAIL PROTECTED]>
[...]
> sub index 1: 1 billion docs
> sub index 2: 1 billion docs
> sub index 3: 1 billion docs
>
> federating search to these subindexes, you represent an index of 3
> billion docs, and all internal doc ids are of type int.
That falls under Daniel's "...unless
That's correct, Rajesh. ParallelReader has its uses, but I guess your case is
not one of them, unless we are all missing some key aspect of PR or a trick to
make it work in your case.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Rajesh
Right. And the typical answer to that is:
- If your terms are roughly equally distributed across all N indices (e.g. random
doc->index/shard assignment), the relevance scores will roughly match.
- If you have business rules for doc->index/shard distribution, then your
relevance scores will not be c
One trick I can think of is somehow keeping the internal
document id of a Lucene document the same after the document is
updated (i.e. deleted and re-inserted). I am not sure
if Lucene has this capability.
Regards,
Rajesh
--- Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:
> That's correct, Rajesh. Paral
Well, for the moment we don't. The Lucene index only contains the full-text
content (indexed, not stored). We use Lucene to perform full-text
and fuzzy searches on the keywords field. Once we have the results, we
match them against the geospatial box provided by the user (we use Oracle
Spatial for that).
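A sketch of that two-step flow, under stated assumptions: the fuzzy keyword search runs in Lucene, each hit carries a stored primary key (the "pk" and "keywords" field names are assumptions), and those keys are intersected with the set of ids returned by the Oracle Spatial box query.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class KeywordThenSpatialSketch {
    // Lucene finds candidates by keyword; Oracle Spatial supplies idsInsideBox.
    static List<String> search(IndexSearcher searcher, String keyword,
                               Set<String> idsInsideBox, int n) throws Exception {
        FuzzyQuery q = new FuzzyQuery(new Term("keywords", keyword)); // assumed field
        TopDocs top = searcher.search(q, null, n);
        List<String> results = new ArrayList<String>();
        for (ScoreDoc sd : top.scoreDocs) {
            String pk = searcher.doc(sd.doc).get("pk"); // assumed stored key
            if (pk != null && idsInsideBox.contains(pk)) {
                results.add(pk); // keyword match that also falls inside the box
            }
        }
        return results;
    }
}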