[jira] Updated: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-08-18 Thread Michael Busch (JIRA)
ex >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 3.1 > > > This new feature has been proposed and discussed here: > http://markmail.org/search/?q=per-document+payloads#query:per-document%20pa

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-09 Thread Michael McCandless (JIRA)
rse the OS hasn't swapped your RAM out to your SSD ;). > Column-stride fields (aka per-document Payloads) > > > Key: LUCENE-1231 > URL: https://issues.apache.org/jira/browse/LUCENE-1231 >

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-08 Thread Marvin Humphrey (JIRA)
that significant for systems where the index fits into RAM, or when the persistent storage device is an SSD. And of course a different caching strategy altogether (popular document caching) is best for dedicated doc servers.

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-08 Thread Earwin Burrfoot (JIRA)
n it. Loading a mix of cached/uncached fields is a massive win; it becomes even more massive if all required fields happen to be cached.

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-08 Thread Michael McCandless (JIRA)
ient to call document() somewhere and get all fields back.

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-07 Thread Michael Busch (JIRA)
ents: Index >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 3.0 > > > This new feature has been proposed and discussed here: > http://markmail.org/search/?q=per-document+payloads#query:per-docu

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-07 Thread Jason Rutherglen (JIRA)
ense.

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-07 Thread Michael McCandless (JIRA)
becomes). I.e., when possible, that method should maybe pull from CSFs for values.

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-04 Thread Michael McCandless (JIRA)
and throw exceptions if you don't consume the number of bytes you should consume. {quote} I generally prefer liberal use of asserts to trip bugs like this, instead of explicit strongly divorced code paths / classes / modes etc., containing real if stateme

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-04-03 Thread Michael Busch (JIRA)
e DataInput/Output patch.

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2009-03-24 Thread Michael Busch (JIRA)
or 3.0 to overhaul the document/field/fieldinfos APIs. I have some ideas which I started hacking during a long flight. I'll try to summarize the ideas/goals I'd have for such a new API and send it to java-dev.

[jira] Updated: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2008-08-18 Thread Michael Busch (JIRA)
shouldn't block the 2.4 release.

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2008-03-26 Thread Michael McCandless (JIRA)
these "combinations". But I haven't wrapped my brain around what all this will entail... it's a biggie! {quote} BTW, setTermPositions(TermPositions) and setTermDocs(TermDocs) might be a reasonable API for updating sparse fields. {quote} I like that!

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2008-03-26 Thread Doug Cutting (JIRA)
e.g., no-freqs, no-positions and (perhaps) updateable. BTW, setTermPositions(TermPositions) and setTermDocs(TermDocs) might be a reasonable API for updating sparse fields.

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2008-03-26 Thread Michael McCandless (JIRA)
tc.

[jira] Commented: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2008-03-24 Thread Doug Cutting (JIRA)
ring of position and freq optional for a field? Then one could have an indexed field with a payload or boost but with no freq (or positions, since freq is required for positions). Would that be equivalent?

Re: [jira] Created: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2008-03-14 Thread eks dev
<[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Friday, 14 March, 2008 7:57:24 AM Subject: [jira] Created: (LUCENE-1231) Column-stride fields (aka per-document Payloads) Column-stride fields (aka per-document Payloads) Key: LUC

[jira] Created: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2008-03-13 Thread Michael Busch (JIRA)
Column-stride fields (aka per-document Payloads) Key: LUCENE-1231 URL: https://issues.apache.org/jira/browse/LUCENE-1231 Project: Lucene - Java Issue Type: New Feature Components

Re: Per-document Payloads

2007-10-30 Thread Ning Li
> That may be a little too seamless. We want the user to have specific > control over which fields are efficiently stored separately since they > will know how that field will be used. Maybe let users decide field families, like the column families in BigTable? --

Re: Per-document Payloads

2007-10-30 Thread Nicolas Lalevée
On Monday, October 29, 2007, Michael McCandless wrote: > "Michael Busch" <[EMAIL PROTECTED]> wrote: > > Michael McCandless wrote: > > > Michael, are you thinking that the storage would/could be non-sparse > > > (like norms), and loaded/cached once in memory, especially for fixed > > > size fields?

Re: Per-document Payloads

2007-10-29 Thread Michael McCandless
"Michael Busch" <[EMAIL PROTECTED]> wrote: > Michael McCandless wrote: > > > > Michael, are you thinking that the storage would/could be non-sparse > > (like norms), and loaded/cached once in memory, especially for fixed > > size fields? EG a big array of ints of length maxDocID? In John's > >

Re: Per-document Payloads

2007-10-29 Thread Michael Busch
Michael McCandless wrote: > > Michael, are you thinking that the storage would/could be non-sparse > (like norms), and loaded/cached once in memory, especially for fixed > size fields? EG a big array of ints of length maxDocID? In John's > original case, every doc has this UID int field; I think

Re: Per-document Payloads

2007-10-29 Thread Michael McCandless
> Michael Busch wrote: > > > Doug Cutting wrote: > > > > If this is really required, perhaps it ought to appear as an > > attribute for stored fields, indicating that the field should be > > stored in a separate "column store". This would permit efficient > > enumeration of values of just that f

Re: Per-document Payloads

2007-10-28 Thread Michael Busch
Doug Cutting wrote: > > If this is really required, perhaps it ought to appear as an attribute > for stored fields, indicating that the field should be stored in a > separate "column store". This would permit efficient enumeration of > values of just that field. > Yes I was thinking about this

Re: Per-document Payloads

2007-10-22 Thread John Wang
Hi Michael: After removing isDeleted(), the index loads in 430 ms. Thanks -john On 10/21/07, Michael Busch <[EMAIL PROTECTED]> wrote: > > John Wang wrote: > > > > > Since all three methods load docids into an int[], the lookup time is > the > > same for all three methods, what's > > differen

Re: Per-document Payloads

2007-10-22 Thread Doug Cutting
next term, seek to the current position in a file, etc. Profiling should show if we've missed obvious optimizations for this case. I was therefore thinking about adding per-document payloads to Lucene. If this is really required, perhaps it ought to appear as an attribute for stored fields,

Re: Per-document Payloads

2007-10-21 Thread Michael Busch
John Wang wrote: > > Since all three methods load docids into an int[], the lookup time is the > same for all three methods, what's > different are the load times: > > 1) 16.5 seconds, 43 MB > 2) 590 milliseconds, 32.5 MB > 3) 186 milliseconds, 26 MB Good analysis! Thanks for sharing th

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-21 Thread Yonik Seeley
On 10/20/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > I would think the typical use case would be you want all the > "small" fields to be returned w/ the document and the large fields to > be lazily loaded. I think it should be seamless to the user. That may be a little too seamless. We want

Re: Per-document Payloads

2007-10-21 Thread John Wang
>> I/O seeks (one for term lookup + one to open the posting list). > >> > >> In my app it took for a big index several minutes to fill the cache > like > >> that. > >> > >> To speed things up I did essentially what Ning suggested. Now I store

Re: Per-document Payloads

2007-10-20 Thread Marvin Humphrey
On Oct 19, 2007, at 3:53 PM, Michael Busch wrote: The next question would be how to store the per-doc payloads (PDP). If all values have the same length (as the unique docIds), then we should store them as efficiently as possible, like the norms. However, we still want to offer the flexibilit
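For values that do not all share one length, one common layout (assumed here for illustration; it is not taken from the thread or from Lucene) concatenates the values in docID order and keeps a parallel array of start offsets, so lookup by docID stays O(1) at the cost of one extra int per document. The class name `VarLengthColumn` is hypothetical:

```java
import java.util.Arrays;

// Hypothetical column-stride layout for variable-length per-document values.
final class VarLengthColumn {
    private final byte[] data;    // all values concatenated in docID order
    private final int[] offsets;  // doc d's value spans [offsets[d], offsets[d + 1])

    VarLengthColumn(byte[][] valuesByDoc) {
        int maxDoc = valuesByDoc.length;
        offsets = new int[maxDoc + 1];  // offsets[0] is 0
        for (int d = 0; d < maxDoc; d++) {
            offsets[d + 1] = offsets[d] + valuesByDoc[d].length;
        }
        data = new byte[offsets[maxDoc]];
        for (int d = 0; d < maxDoc; d++) {
            System.arraycopy(valuesByDoc[d], 0, data, offsets[d], valuesByDoc[d].length);
        }
    }

    // O(1) lookup: two offset reads, then a copy of that doc's bytes.
    byte[] get(int docID) {
        return Arrays.copyOfRange(data, offsets[docID], offsets[docID + 1]);
    }

    int maxDoc() {
        return offsets.length - 1;
    }
}
```

Fixed-length values such as the unique docIds would not need the offsets array at all, since document d's value simply starts at d * width.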

Re: Per-document Payloads

2007-10-20 Thread Marvin Humphrey
On Oct 20, 2007, at 12:49 PM, Michael Busch wrote: In fact, what I'm proposing is a new kind of posting list. http://www.rectangular.com/pipermail/kinosearch/2007-July/001096.html Marvin Humphrey Rectangular Research http://www.rectangular.com/

Re: Per-document Payloads

2007-10-20 Thread Michael Busch
Grant Ingersoll wrote: > > Some randomly pieced together thoughts (I may not even be fully awake > yet :-) so feel free to tell me I'm not understanding this correctly) > > My first thought was how is this different from just having a binary > field, but if I understand correctly it is to be sto

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Grant Ingersoll
On Oct 20, 2007, at 10:51 AM, Yonik Seeley wrote: On 10/20/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: I think one of the questions that will come up from users is when should I use addMetadata and when should I use addField? Why make the distinction to the user? Fields have always repres

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Grant Ingersoll
https://issues.apache.org/jira/browse/LUCENE-510 is related, then, I presume On Oct 20, 2007, at 11:09 AM, Yonik Seeley wrote: On 10/20/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: What about switching from char counts to byte counts for indexed (String) fields that are stored separately?

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Yonik Seeley
On 10/20/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: > What about switching from char > counts to byte counts for indexed (String) fields that are stored > separately? In fact, what about switching to byte counts for all stored fields? It should be much easier than the full-blown byte-counts for t
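A minimal sketch of why byte counts help, assuming a simple record layout invented for this example (it is not Lucene's actual stored-fields format): with a char-count prefix a reader has to decode UTF-8 to find where a field ends, because characters vary in byte length; with a byte-count prefix it can skip an unwanted field with `skipBytes()`. `ByteCountFields` is a hypothetical name:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical record layout: each field is [int byteLength][UTF-8 bytes].
final class ByteCountFields {
    static byte[] write(String... fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String f : fields) {
                byte[] utf8 = f.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);  // prefix counts bytes, not chars
                out.write(utf8);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes only field `wanted`; earlier fields are skipped without decoding.
    static String readField(byte[] record, int wanted) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
            for (int i = 0; i < wanted; i++) {
                int len = in.readInt();
                if (in.skipBytes(len) != len) {
                    throw new IOException("truncated record");
                }
            }
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            return new String(utf8, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```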

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Yonik Seeley
On 10/20/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > I think one of the questions that will come up from users is when > should I use addMetadata and when should I use addField? Why make > the distinction to the user? Fields have always represented > metadata, all you're doing is optimizing th

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Grant Ingersoll
e stored with a fixed size, which means both random access and sequential scan are optimal. Norms are also cached in memory, and filling that cache is much faster compared to the current FieldCache approach. I was therefore thinking about adding per-document payloads to Lucene (we can also c
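The fixed-size, norms-like layout described here can be sketched as follows, assuming 4-byte int values and an in-memory `ByteBuffer` standing in for an index file; `FixedWidthIntColumn` is a hypothetical class, not a Lucene API. Every document gets a slot, so doc d's value sits at byte offset d * 4, and filling an `int[]` cache of length maxDoc is a single sequential pass:

```java
import java.nio.ByteBuffer;

// Hypothetical norms-like column: one fixed 4-byte slot per document.
final class FixedWidthIntColumn {
    private static final int WIDTH = Integer.BYTES;
    private final ByteBuffer buf;

    FixedWidthIntColumn(int maxDoc) {
        // Non-sparse: every docID from 0 to maxDoc - 1 gets a slot, defaulting to 0.
        buf = ByteBuffer.allocate(maxDoc * WIDTH);
    }

    void set(int docID, int value) {
        buf.putInt(docID * WIDTH, value);  // absolute write at d * WIDTH
    }

    int get(int docID) {
        return buf.getInt(docID * WIDTH);  // random access is one multiply
    }

    // Sequential scan that fills an int[] cache of length maxDoc.
    int[] loadAll() {
        int[] cache = new int[buf.capacity() / WIDTH];
        for (int d = 0; d < cache.length; d++) {
            cache[d] = buf.getInt(d * WIDTH);
        }
        return cache;
    }
}
```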

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-20 Thread Nicolas Lalevée
plementation, but it still can be improved. In fact, we already have a > mechanism for doing that: the norms. Norms are stored with a fixed size, > which means both random access and sequential scan are optimal. Norms > are also cached in memory, and filling that cache is much faster

Re: Per-document Payloads

2007-10-20 Thread Michael Busch
posting list of the specific term, then it is just a >> sequential scan to load all values. With this approach the time for >> filling the cache went down from minutes to seconds! >> >> Now this approach is already much better than the current field cache >> implementat
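The posting-list trick can be modeled in memory roughly as below. This is a simplified sketch under stated assumptions: `Posting` and `PayloadUidCache` are hypothetical stand-ins for Lucene's term-positions enumeration, and each payload is assumed to hold a 4-byte big-endian UID. Because every document carries the same single term, filling the cache is one pass over one posting list instead of a term lookup plus a posting-list seek per unique UID:

```java
import java.util.List;

// Hypothetical model of filling a UID cache from one term's posting list.
final class PayloadUidCache {
    // One posting of the shared term: a docID plus that doc's payload bytes.
    static final class Posting {
        final int docID;
        final byte[] payload;

        Posting(int docID, byte[] payload) {
            this.docID = docID;
            this.payload = payload;
        }
    }

    // Single pass over the postings (a real posting list is in docID order).
    static int[] fill(List<Posting> postings, int maxDoc) {
        int[] uids = new int[maxDoc];  // docs without the term keep 0
        for (Posting p : postings) {
            uids[p.docID] = decodeInt(p.payload);
        }
        return uids;
    }

    // Assumed payload encoding: 4-byte big-endian int.
    static byte[] encodeInt(int v) {
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    static int decodeInt(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
                | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }
}
```

Documents without the term keep the default UID of 0 in this sketch; a real implementation would need its own convention for missing values.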

Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-19 Thread John Wang
that: the norms. Norms are stored with a fixed size, > which means both random access and sequential scan are optimal. Norms > are also cached in memory, and filling that cache is much faster > compared to the current FieldCache approach. > > I was therefore thinking about adding per-

Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-19 Thread Michael Busch
and sequential scan are optimal. Norms are also cached in memory, and filling that cache is much faster compared to the current FieldCache approach. I was therefore thinking about adding per-document payloads to Lucene (we can also call it document-metadata). The API could look like this: D