Re: custom segment files

2009-09-18 Thread John Wang
Thank you very much Michael for the information! -John On Fri, Sep 18, 2009 at 6:01 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > > Say you have a type of field with fixed length data per doc, e.g. a > > 8 bytes. > > OK this makes sense -- thanks for the example! This sounds like

Re: custom segment files

2009-09-18 Thread Michael McCandless
> Say you have a type of field with fixed length data per doc, e.g. a > 8 bytes. OK this makes sense -- thanks for the example! This sounds like getting column-stride-fields before that feature is added to Lucene "for real". For flushing, you can plug in your own indexing chain to IndexWriter. Th

Re: custom segment files

2009-09-18 Thread Earwin Burrfoot
I bet custom per-segment files could very well be used for per-segment userdata/debuginfo we introduced earlier. With them it could be stored neatly in a separate file instead of being grafted onto current ones. On Thu, Sep 17, 2009 at 18:35, Michael McCandless wrote: > I'm actively working on LU

Re: custom segment files

2009-09-17 Thread Jason Rutherglen
Yes, I guess you could branch the code? It probably doesn't need to be final Mike? On Thu, Sep 17, 2009 at 7:16 PM, John Wang wrote: > Hi Michael: > > Is there a wiki or some sort of write up on LUCENE-1458? It looks > extremely cool! > > Re: Jason: isn't flush final? > > -John > > On Fri,

Re: custom segment files

2009-09-17 Thread Marvin Humphrey
On Fri, Sep 18, 2009 at 08:14:24AM +0800, John Wang wrote: > Say you have a type of field with fixed length data per doc, e.g. a 8 bytes. > It might be good to store in a segment: > Heh. You've just described this proof of concept class: http://www.rectangular.com/kinosearch/docs/deve

Re: custom segment files

2009-09-17 Thread John Wang
Hi Michael: Is there a wiki or some sort of write up on LUCENE-1458? It looks extremely cool! Re: Jason: isn't flush final? -John On Fri, Sep 18, 2009 at 9:09 AM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > I believe you could override the IW.flush and IW.mergeSuccess > method

Re: custom segment files

2009-09-17 Thread Jason Rutherglen
I believe you could override the IW.flush and IW.mergeSuccess methods. flush unfortunately doesn't expose the new SegmentInfo, however it could be obtained via IW.getReader().getSequentialSubReaders (by comparing the before and after). Adjacent segment files could then be maintained without hackin
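
The "comparing the before and after" step can be sketched as a plain set difference. This is only an illustration: it works on segment names as strings, the class and method names are hypothetical, and it does not call any Lucene API. How the names are obtained from the sub-readers is left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NewSegmentDiff {
    // Returns the names present after the flush but not before it,
    // in the order they appear in 'after'.
    // Assumption: a segment is identified by a unique name string.
    static Set<String> newSegments(List<String> before, List<String> after) {
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);
        return added;
    }
}
```

With before = [_0, _1] and after = [_0, _1, _2], the result contains only _2, which is the segment a custom adjacent file would need to be written for.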

Re: custom segment files

2009-09-17 Thread John Wang
Sure. A simple example: Say you have a type of field with fixed length data per doc, e.g. a 8 bytes. It might be good to store in a segment: so if you have 1000 docs, your seg file is 8k+4 bytes. Merging would be rather trivial as well. Doing this right now involves storing into payload,
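
A minimal sketch of the layout described above, under stated assumptions: the message only gives the size (8k+4 bytes for 1000 docs), so treating the extra 4 bytes as an int doc-count header is a guess, and the class and method names are hypothetical. It uses plain java.nio rather than any Lucene API, and the merge ignores deleted docs.

```java
import java.nio.ByteBuffer;

public class FixedWidthSegmentFile {
    static final int HEADER_BYTES = 4;   // assumed: one int holding the doc count
    static final int BYTES_PER_DOC = 8;  // fixed-length value per doc, as in the example

    // Serializes one 8-byte value per doc, indexed by docId.
    static byte[] write(long[] valuesByDocId) {
        ByteBuffer out = ByteBuffer.allocate(HEADER_BYTES + valuesByDocId.length * BYTES_PER_DOC);
        out.putInt(valuesByDocId.length);
        for (long value : valuesByDocId) {
            out.putLong(value);
        }
        return out.array();
    }

    // Random access: the value for a doc sits at a computable offset.
    static long read(byte[] file, int docId) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int numDocs = buf.getInt(0);
        if (docId < 0 || docId >= numDocs) {
            throw new IndexOutOfBoundsException("docId " + docId);
        }
        return buf.getLong(HEADER_BYTES + docId * BYTES_PER_DOC);
    }

    // Merge by concatenating the payloads under a new header.
    // Assumption: no deleted docs, so docIds of the second segment
    // are simply shifted by the first segment's doc count.
    static byte[] merge(byte[] first, byte[] second) {
        int n1 = ByteBuffer.wrap(first).getInt(0);
        int n2 = ByteBuffer.wrap(second).getInt(0);
        ByteBuffer out = ByteBuffer.allocate(HEADER_BYTES + (n1 + n2) * BYTES_PER_DOC);
        out.putInt(n1 + n2);
        out.put(first, HEADER_BYTES, n1 * BYTES_PER_DOC);
        out.put(second, HEADER_BYTES, n2 * BYTES_PER_DOC);
        return out.array();
    }
}
```

For 1000 docs, write returns 4 + 1000 * 8 = 8004 bytes, matching the size quoted in the message if "8k" is read as 8000.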

Re: custom segment files

2009-09-17 Thread Michael McCandless
I'm actively working on LUCENE-1458, to enable different codecs for reading/writing the terms dict and doc/freq/prox/payload postings. I'm working now towards getting PforDelta working... However, that change doesn't [yet] do anything for norms, stored fields, or term vectors. Can you describe m