Forwarding this to java-dev per request. Seems like the best place
to discuss this topic.
Erik
Begin forwarded message:
From: John Wang [EMAIL PROTECTED]
Date: October 17, 2007 5:43:29 PM EDT
To: [EMAIL PROTECTED]
Subject: lucene indexing and merge process
Hi Erik:
We are
Make all documents have a term, say ID:UID, and for each document,
store its UID in the term's payload. You can read off this posting
list to create your array. Will this work for you, John?
Cheers,
Ning
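Ning's suggestion can be modeled in plain Java for illustration. The real implementation would walk the posting list of the shared ID:UID term via Lucene's payload API (not shown here); this sketch only shows the one-pass construction of a docId-indexed UID array, with a hypothetical `Posting` record standing in for one posting plus its payload.

```java
import java.util.*;

public class UidArrayDemo {
    // A stand-in for one posting: the segment doc number plus the UID that
    // would live in the term's payload. (Hypothetical type for illustration;
    // real code would read these through Lucene's payload API.)
    record Posting(int docId, long uid) {}

    // Build a docId -> UID lookup array by scanning the posting list once,
    // e.g. at index-open time.
    static long[] buildUidArray(List<Posting> postings, int maxDoc) {
        long[] uids = new long[maxDoc];
        for (Posting p : postings) {
            uids[p.docId()] = p.uid();
        }
        return uids;
    }

    public static void main(String[] args) {
        List<Posting> postings = List.of(
                new Posting(0, 1001L), new Posting(1, 1002L), new Posting(2, 1003L));
        long[] uids = buildUidArray(postings, 3);
        System.out.println(Arrays.toString(uids));  // prints [1001, 1002, 1003]
    }
}
```

Once built, answering "what is the UID of doc n" is a single array lookup, which is the point of reading the posting list up front rather than per hit.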
On 10/18/07, Erik Hatcher [EMAIL PROTECTED] wrote:
Forwarding this to java-dev per
Erik Hatcher wrote:
2) Load/Warmup the FieldCache (for large corpus, loading up the
indexreader can be slow)
With the new IndexReader#reopen(), the cost of opening a new IndexReader
is much reduced. However, loading a FieldCache is not that much faster,
so that may or may not be enough to
Hoss has worked on a new FieldCache implementation that should address
this if finished and used with the new reopen. I have been meaning to
look at it in greater detail myself, but haven't gotten to it. It sounds
as if he has been a bit too busy to be able to work on it himself. It
would
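The benefit of reopen for cache warmup amounts to caching per segment rather than per top-level reader, so unchanged segments keep their loaded arrays across a reopen. A minimal sketch in plain Java, assuming an identity-keyed cache (the `loadField` method is a hypothetical stand-in for FieldCache's per-segment loading):

```java
import java.util.*;

public class SegmentCacheDemo {
    // Cache keyed by segment-reader identity: when a reopen returns mostly
    // the same segment readers, their cached arrays are reused and only the
    // new segments pay the loading cost.
    private final Map<Object, long[]> cache = new WeakHashMap<>();
    int loads = 0;  // counts how many segments were actually (re)loaded

    long[] getOrLoad(Object segmentReader, int maxDoc) {
        return cache.computeIfAbsent(segmentReader, r -> loadField(maxDoc));
    }

    private long[] loadField(int maxDoc) {
        loads++;
        return new long[maxDoc];  // real code would fill this from the index
    }

    public static void main(String[] args) {
        SegmentCacheDemo cache = new SegmentCacheDemo();
        Object segA = new Object(), segB = new Object();
        cache.getOrLoad(segA, 8);
        cache.getOrLoad(segB, 8);
        cache.getOrLoad(segA, 8);  // a reopened reader reuses segA's cache
        System.out.println(cache.loads);  // prints 2: segA was loaded only once
    }
}
```

The WeakHashMap keying also means a segment's cache entry becomes collectable once no reader references that segment anymore.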
[
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535970
]
Yonik Seeley commented on LUCENE-743:
-
{quote}
A reader which is being used for deletes or setting norms is
This is what I do with general search caches. It works very well. I
think the same approach would work great with the field cache.
I do think though that we might want direct support for this - using
a fixed length field file (per segment).
E.g. so that you would configure keys with n
Hi Robert:
Say I have a hit set of 1000 docids, would I need to go to the disk for
each docid to get the key?
Thanks
-John
On 10/18/07, robert engels [EMAIL PROTECTED] wrote:
This is what I do with general search caches. It works very well. I
think the same approach would work great
Hi Ning:
That is essentially what field cache does. Doing this for each docid in
the result set will be slow if the result set is large. But loading it in
memory when opening the index can also be slow if the index is large and
updated often.
Thanks
-John
On 10/18/07, Ning Li [EMAIL PROTECTED]
robert engels wrote:
seek (segment doc no * keylength), read (byte[keylength])
This would be very efficient when using external document storage.
A seek per document in hits is to be avoided. This is similar to the
way field data is stored, which is, as mentioned in the first message
very
True, but what is the other option except loading all of them in memory?
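The "seek (segment doc no * keylength), read (byte[keylength])" layout above can be sketched with plain java.io. The file name and key width here are made up for the example; the point is that a fixed key length turns the lookup into one computed seek plus one short read.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyFileDemo {
    static final int KEY_LENGTH = 8;  // fixed width per document (assumed)

    // Write one fixed-length key per document, in segment doc-number order.
    static void writeKeys(File f, String[] keys) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            for (String key : keys) {
                byte[] b = Arrays.copyOf(key.getBytes(StandardCharsets.US_ASCII), KEY_LENGTH);
                raf.write(b);  // zero-padded to KEY_LENGTH
            }
        }
    }

    // Random access: seek(docNo * KEY_LENGTH), then read KEY_LENGTH bytes.
    static String readKey(File f, int docNo) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek((long) docNo * KEY_LENGTH);
            byte[] b = new byte[KEY_LENGTH];
            raf.readFully(b);
            return new String(b, StandardCharsets.US_ASCII).trim();  // drop padding
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("keys", ".dat");
        f.deleteOnExit();
        writeKeys(f, new String[] {"doc0", "doc1", "doc2"});
        System.out.println(readKey(f, 2));  // prints doc2
    }
}
```

This also makes Doug's objection concrete: a hit set of 1000 docids means up to 1000 seeks, unless the file is small enough that the OS page cache absorbs them, which is the follow-up point below about relying on the OS disk cache.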
On Oct 18, 2007, at 11:57 AM, Doug Cutting wrote:
robert engels wrote:
seek (segment doc no * keylength), read (byte[keylength])
This would be very efficient when using external document storage.
A seek per document in
As a follow-up, it seemed that in the past much of Lucene relied on
the OS disk cache for performance. The FieldCache seems to go
against this, probably because of the parsing involved.
The 'fixed-length' key file would not need extensive parsing, and
thus seems more suitable for OS level
robert engels wrote:
True, but what is the other option except loading all of them in memory?
Loading them into memory is the FieldCache approach. It is effective in
many cases. If there's not enough memory, then Ning's proposal might
provide a middle ground: efficient sequential access
I see what you mean by 2) now. What Mark said should work for you when
it's done.
Cheers,
Ning
On 10/18/07, John Wang [EMAIL PROTECTED] wrote:
Hi Ning:
That is essentially what field cache does. Doing this for each docid in
the result set will be slow if the result set is large. But
Roy,
Thanks for the review and comments. My comments inline below.
Roy Ward wrote:
(1) You only added timeouts to:
public TopDocs search(Weight weight, Filter filter, final int nDocs)
It's confusing if timeout functionality is not also added to:
public TopFieldDocs search(Weight
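The timeout behavior under review can be sketched independently of the actual patch: the hit-collection loop checks elapsed time against a budget and aborts with an exception once it is spent, so the caller can return partial results. Class and exception names here are invented for illustration, not taken from the patch.

```java
public class TimeoutDemo {
    static class TimeExceededException extends RuntimeException {}

    // A collector that stops accepting hits once the time budget is spent.
    static class TimeLimitedCollector {
        private final long deadlineNanos;
        int collected = 0;

        TimeLimitedCollector(long budgetMillis) {
            this.deadlineNanos = System.nanoTime() + budgetMillis * 1_000_000L;
        }

        void collect(int doc) {
            if (System.nanoTime() > deadlineNanos) {
                throw new TimeExceededException();  // caller returns partial results
            }
            collected++;
        }
    }

    public static void main(String[] args) {
        TimeLimitedCollector c = new TimeLimitedCollector(1000);
        for (int doc = 0; doc < 100; doc++) {
            c.collect(doc);
        }
        System.out.println(c.collected);  // prints 100 (budget not exhausted)
    }
}
```

Roy's point is that every search entry point funnels through such a loop, so a timeout wired into only one overload (the Filter variant but not the sorted TopFieldDocs variant) gives inconsistent behavior.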
[
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536028
]
Michael Busch commented on LUCENE-743:
--
Hmm, one other thing: how should IndexReader.close() work? If we re-open
[
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536033
]
Michael McCandless commented on LUCENE-743:
---
I think reference counting would solve this issue quite
[
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536041
]
Yonik Seeley commented on LUCENE-743:
-
When it is closed, it decrefs the RC and marks itself closed (to make
[
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536044
]
Michael Busch commented on LUCENE-743:
--
{quote}
The implementation seems simple. When a reader is opened, it
[
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536063
]
Michael McCandless commented on LUCENE-743:
---
But if a reader is shared, how do you tell two real closes
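The reference-counting scheme being debated in these comments can be sketched in plain Java. This is an illustrative model, not Lucene's actual implementation: each holder of a shared reader calls decRef exactly once, and only the last decRef performs the real close.

```java
public class RefCounted {
    private int refCount = 1;  // creation counts as the first reference
    private boolean released = false;

    // A new sharer (e.g. a reopened reader reusing this segment) incRefs.
    synchronized void incRef() {
        if (released) throw new IllegalStateException("already released");
        refCount++;
    }

    // Each holder calls decRef exactly once; the last one triggers the
    // real close. A holder that wants an idempotent close() must track
    // its own "closed" flag, which is Michael's question above: the
    // shared object alone cannot tell two real closes from one close
    // called twice by the same holder.
    synchronized void decRef() {
        if (released) throw new IllegalStateException("already released");
        if (--refCount == 0) {
            released = true;
            doClose();
        }
    }

    boolean isReleased() { return released; }

    protected void doClose() { /* free underlying files and buffers here */ }

    public static void main(String[] args) {
        RefCounted reader = new RefCounted();
        reader.incRef();  // a reopened reader shares this segment
        reader.decRef();  // first holder closes
        reader.decRef();  // last holder closes: resources actually released
        System.out.println(reader.isReleased());  // prints true
    }
}
```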
Hi,
I'm experimenting with a few different scoring implementations and I
was wondering what the easiest way would be to incorporate a new
scorer into a searcher implementation.
From reading the docs on Scoring at:
Sean Timm wrote:
(2) Estimating the number of results [snip]
Is there a test case that shows this breakage, or can you point me to
the code in Hits.java that my patch causes problems with? Sorry, I'm
not seeing it.
In the case of no hits at all getting returned, the following code: