Re: Per-document Payloads

2007-10-20 Thread Michael Busch
John Wang wrote:
> Hi Michael:
> Thanks for the info.
>
> I haven't played with payloads. Can you give me an example or point me to how it is used to solve this problem?

Hi John,

I (quickly) put together a class that is able to store UIDs as payloads. I believe the type of your UI

[jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536409 ]

Michael McCandless commented on LUCENE-743:

It's not nearly this complex (we don't need two ref counts). I

[jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-10-20 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536413 ]

Michael Busch commented on LUCENE-743:

{quote} I'm assuming in your example you meant for reader2 and reader3 t

[jira] Issue Comment Edited: (LUCENE-743) IndexReader.reopen()

2007-10-20 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536353 ]

michaelbusch edited comment on LUCENE-743 at 10/20/07 3:11 AM:

Hi Mike, I'm not sure

[jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536418 ]

Michael McCandless commented on LUCENE-743:

{quote} If you do: {code:java} IndexReader reader1 = IndexRead

[jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-10-20 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536419 ]

Michael McCandless commented on LUCENE-743:

{quote} I think we are forced to keep this semantics, for back

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Nicolas Lalevée
On Saturday, 20 October 2007, Michael Busch wrote:
> John Wang wrote:
> > I can try to get some numbers for loading an int[] array vs FieldCache.getInts().
>
> I've had a similar performance problem when I used the FieldCache. The loading performance is apparently so slow, because each

Re: lucene indexing and merge process

2007-10-20 Thread Grant Ingersoll
John,

For case 1, can you describe your document structure? Do you have a lot of other fields besides the UID field? Most importantly, do you have some large fields? Did you give the FieldSelector mechanism a try? In fact, I think you may even be able to create a caching FieldSelector

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Grant Ingersoll
On Oct 19, 2007, at 6:53 PM, Michael Busch wrote:
> John Wang wrote:
> > I can try to get some numbers for loading an int[] array vs FieldCache.getInts().
>
> I've had a similar performance problem when I used the FieldCache. The loading performance is apparently so slow, because each value is

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Yonik Seeley
On 10/20/07, Grant Ingersoll wrote:
> I think one of the questions that will come up from users is when should I use addMetadata and when should I use addField? Why make the distinction to the user? Fields have always represented metadata, all you're doing is optimizing th

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Yonik Seeley
On 10/20/07, Yonik Seeley wrote:
> What about switching from char counts to byte counts for indexed (String) fields that are stored separately?

In fact, what about switching to byte counts for all stored fields? It should be much easier than the full-blown byte-counts for t

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Grant Ingersoll
https://issues.apache.org/jira/browse/LUCENE-510 is related, then, I presume

On Oct 20, 2007, at 11:09 AM, Yonik Seeley wrote:
> On 10/20/07, Yonik Seeley wrote:
> > What about switching from char counts to byte counts for indexed (String) fields that are stored separately?

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Grant Ingersoll
On Oct 20, 2007, at 10:51 AM, Yonik Seeley wrote:
> On 10/20/07, Grant Ingersoll wrote:
> > I think one of the questions that will come up from users is when should I use addMetadata and when should I use addField? Why make the distinction to the user? Fields have always repres

Re: Per-document Payloads

2007-10-20 Thread Michael Busch
Grant Ingersoll wrote:
>
> Some randomly pieced together thoughts (I may not even be fully awake yet :-) so feel free to tell me I'm not understanding this correctly)
>
> My first thought was how is this different from just having a binary field, but if I understand correctly it is to be sto

Re: Per-document Payloads

2007-10-20 Thread Marvin Humphrey
On Oct 20, 2007, at 12:49 PM, Michael Busch wrote:
> In fact, what I'm proposing is a new kind of posting list.

http://www.rectangular.com/pipermail/kinosearch/2007-July/001096.html

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

Re: Per-document Payloads

2007-10-20 Thread Marvin Humphrey
On Oct 19, 2007, at 3:53 PM, Michael Busch wrote:
> The next question would be how to store the per-doc payloads (PDP). If all values have the same length (as the unique docIds), then we should store them as efficiently as possible, like the norms. However, we still want to offer the flexibilit