Re: Flexible index format / Payloads Cont'd

2006-10-08 Thread Nicolas Lalevée
On Saturday 05 August 2006 09:54, Nicolas Lalevée wrote: On Thursday 3 August 2006 21:49, Marvin Humphrey wrote: On Jul 31, 2006, at 8:25 AM, Nicolas Lalevée wrote: That looks good, but there is one restriction: it has to be per document. Yes, what I laid out was per-document - for

Re: Flexible index format / Payloads Cont'd

2006-08-05 Thread Nicolas Lalevée
On Thursday 3 August 2006 21:49, Marvin Humphrey wrote: On Jul 31, 2006, at 8:25 AM, Nicolas Lalevée wrote: That looks good, but there is one restriction: it has to be per document. Yes, what I laid out was per-document - for each document, the fdx file would keep a file pointer and an

Re: Flexible index format / Payloads Cont'd

2006-07-31 Thread robert engels
Doing this breaks compatibility with non-Java Lucene implementations. Not sure it matters, but I thought I would point it out. I have always thought that Lucene should be compatible at an API level only, and MAYBE create a network access protocol for queries and updates. On Jul 31, 2006, at

Re: Flexible index format / Payloads Cont'd

2006-07-21 Thread Nicolas Lalevée
On Thursday 20 July 2006 22:18, Marvin Humphrey wrote: On Jul 19, 2006, at 10:26 AM, Nicolas Lalevée wrote: Then I looked deeper into the Lucene file format, and I managed to introduce some generic field metadata without breaking the file format compatibility. I just used another bit of

Re: Flexible index format / Payloads Cont'd

2006-07-21 Thread Marvin Humphrey
On Jul 21, 2006, at 1:23 AM, Nicolas Lalevée wrote: In fact, that was my first implementation. The problem with that is you can only store one value. But thinking a little more about it, storing one or more values is not an issue, because with the solution I proposed, no space is saved at

Re: Flexible index format / Payloads Cont'd

2006-07-20 Thread Marvin Humphrey
On Jul 19, 2006, at 10:26 AM, Nicolas Lalevée wrote: Then I looked deeper into the Lucene file format, and I managed to introduce some generic field metadata without breaking the file format compatibility. I just used another bit of the Bits to mark whether or not there is some metadata on the
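The message above describes reusing a spare bit of the per-field Bits byte to flag whether metadata is present. A minimal Java sketch of that idea follows. It assumes the existing stored-field flags are 0x1 (tokenized), 0x2 (binary) and 0x4 (compressed), and introduces a hypothetical FIELD_HAS_METADATA = 0x8. The FieldBits class and its methods are illustrative only and are not Lucene's API.

```java
// Hypothetical sketch: per-field flags packed into one "bits" byte,
// with a previously unused bit marking that field metadata follows.
final class FieldBits {
    // Assumed existing stored-field flags (tokenized, binary, compressed).
    static final byte FIELD_IS_TOKENIZED = 0x1;
    static final byte FIELD_IS_BINARY = 0x2;
    static final byte FIELD_IS_COMPRESSED = 0x4;
    // Hypothetical new flag: set only when metadata is written for the field.
    static final byte FIELD_HAS_METADATA = 0x8;

    static byte encode(boolean tokenized, boolean binary,
                       boolean compressed, boolean hasMetadata) {
        byte bits = 0;
        if (tokenized) bits |= FIELD_IS_TOKENIZED;
        if (binary) bits |= FIELD_IS_BINARY;
        if (compressed) bits |= FIELD_IS_COMPRESSED;
        if (hasMetadata) bits |= FIELD_HAS_METADATA;
        return bits;
    }

    static boolean hasMetadata(byte bits) {
        return (bits & FIELD_HAS_METADATA) != 0;
    }

    public static void main(String[] args) {
        byte withMeta = encode(true, false, false, true);
        System.out.println(withMeta);              // 9
        System.out.println(hasMetadata(withMeta)); // true
        // A byte written by an older writer never has 0x8 set.
        System.out.println(hasMetadata(FIELD_IS_TOKENIZED)); // false
    }
}
```

Older segments never set 0x8, so a reader that checks the bit treats them as carrying no metadata. That is what lets such a change leave existing indexes readable.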

Re: Flexible index format / Payloads Cont'd

2006-07-19 Thread Nicolas Lalevée
On Wednesday 05 July 2006 13:23, Michael Busch wrote: Doug Cutting wrote: Marvin Humphrey wrote: IMO, this should wait. It's going to be freakishly difficult to get this stuff to work and maintain the commitments that Doug has laid out for backwards compatibility. Perhaps we can

Proximity-enhanced boolean scoring (was: Re: Flexible index format / Payloads Cont'd)

2006-07-06 Thread Nadav Har'El
On Wed, Jul 05, 2006, Paul Elschot wrote about Re: Flexible index format / Payloads Cont'd: Ok, then, I thought to myself - the normal queries and scorers only work on the document level and don't use positions - but SpanQueries have positions so I can create some sort

Re: Flexible index format / Payloads Cont'd

2006-07-05 Thread Paul Elschot
On Tuesday 04 July 2006 23:51, Nadav Har'El wrote: ... The problem is that Scorer, and its implementations - BooleanScorer2, DisjunctionSumScorer and ConjunctionScorer - only work on the document level. Scorer has next() and skipTo(), but no way to view positions inside the document. If you
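The limitation described above is that Scorer iterates at the document level only (next(), skipTo(), doc()). The Java sketch below is hypothetical. It adds a positions() accessor on an illustrative PositionalPostings interface, which is not a Lucene type, and uses it in a two-term conjunction. For every document containing both terms, the conjunction reports the smallest distance between them, roughly the kind of signal a proximity-aware boolean scorer could fold into its score. The skipTo loop is modelled on the "advance at least once, then to the first document >= target" behaviour described for Scorer.skipTo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a two-term conjunction that, unlike a document-level
// Scorer, also looks at term positions inside each matching document.
final class ProximityAnd {
    // Smallest |pa - pb| over two sorted position arrays (two-pointer merge).
    static int minDistance(int[] a, int[] b) {
        int best = Integer.MAX_VALUE;
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            best = Math.min(best, Math.abs(a[i] - b[j]));
            if (a[i] < b[j]) i++; else j++;
        }
        return best;
    }

    // Returns {doc, minDistance} for every document containing both terms.
    static List<int[]> match(PositionalPostings x, PositionalPostings y) {
        List<int[]> out = new ArrayList<>();
        if (!x.next() || !y.next()) return out;
        while (true) {
            if (x.doc() == y.doc()) {
                out.add(new int[] {x.doc(), minDistance(x.positions(), y.positions())});
                if (!x.next() || !y.next()) return out;
            } else if (x.doc() < y.doc()) {
                if (!x.skipTo(y.doc())) return out;
            } else if (!y.skipTo(x.doc())) {
                return out;
            }
        }
    }

    public static void main(String[] args) {
        // Term A occurs in docs 1, 3, 5; term B in docs 3, 4, 5.
        PositionalPostings a = new ArrayPostings(new int[] {1, 3, 5},
                new int[][] {{0}, {2, 10}, {7}});
        PositionalPostings b = new ArrayPostings(new int[] {3, 4, 5},
                new int[][] {{4}, {1}, {30}});
        for (int[] m : match(a, b)) {
            System.out.println("doc " + m[0] + " distance " + m[1]);
        }
        // prints "doc 3 distance 2" then "doc 5 distance 23"
    }
}

// Hypothetical interface: Scorer-style iteration plus a positions() accessor.
interface PositionalPostings {
    boolean next();             // advance to the next document
    boolean skipTo(int target); // advance to the first document >= target
    int doc();                  // current document number
    int[] positions();          // sorted term positions within doc()
}

// In-memory postings list, for illustration only.
final class ArrayPostings implements PositionalPostings {
    private final int[] docs;
    private final int[][] positions;
    private int i = -1;

    ArrayPostings(int[] docs, int[][] positions) {
        this.docs = docs;
        this.positions = positions;
    }

    public boolean next() { return ++i < docs.length; }

    public boolean skipTo(int target) {
        // Advance at least once, then until doc() >= target.
        do {
            if (!next()) return false;
        } while (target > doc());
        return true;
    }

    public int doc() { return docs[i]; }
    public int[] positions() { return positions[i]; }
}
```

The sketch collects the distance instead of a score, so it stays neutral about how proximity should be weighted. Turning the distance into a boost is left open here.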

Re: Flexible index format / Payloads Cont'd

2006-07-05 Thread Michael Busch
Doug Cutting wrote: Marvin Humphrey wrote: IMO, this should wait. It's going to be freakishly difficult to get this stuff to work and maintain the commitments that Doug has laid out for backwards compatibility. Perhaps we can implement an all-new index format, in a new package. An

Re: Flexible index format / Payloads Cont'd

2006-07-05 Thread Doug Cutting
Michael Busch wrote: I would like to help work on a new index format. Who else is going to work on it? The folks working on Lucy are probably interested (Marvin and David). Perhaps the first thing should be to specify the file format, then implement it both in Java (for Lucene Java) and C

Re: Flexible index format / Payloads Cont'd

2006-07-04 Thread Doug Cutting
Marvin Humphrey wrote: IMO, this should wait. It's going to be freakishly difficult to get this stuff to work and maintain the commitments that Doug has laid out for backwards compatibility. Perhaps we can implement an all-new index format, in a new package. An implementation of

Re: Flexible index format / Payloads Cont'd

2006-07-04 Thread Marvin Humphrey
On Jul 4, 2006, at 3:35 AM, Doug Cutting wrote: Marvin Humphrey wrote: IMO, this should wait. It's going to be freakishly difficult to get this stuff to work and maintain the commitments that Doug has laid out for backwards compatibility. Perhaps we can implement an all-new index

Re: Flexible index format / Payloads Cont'd

2006-07-04 Thread Nadav Har'El
On Fri, Jun 30, 2006, Marvin Humphrey wrote about Re: Flexible index format / Payloads Cont'd: On Thu, Jun 29, 2006, Marvin Humphrey wrote about Re: Flexible index format / Payloads Cont'd: * Improve IR precision, by writing a Boolean Scorer that takes position into account, a la Brin

Re: Flexible index format / Payloads Cont'd

2006-06-30 Thread Michael Busch
Marvin Humphrey wrote: Personally, I'm less interested in adding new features than I am in solidifying and improving the core. The benefits I care about are: * Decouple Lucene from its file format. o Make back-compatibility easier. o Make refactoring easier. o All the other

Re: Flexible index format / Payloads Cont'd

2006-06-30 Thread Nadav Har'El
On Thu, Jun 29, 2006, Marvin Humphrey wrote about Re: Flexible index format / Payloads Cont'd: * Improve IR precision, by writing a Boolean Scorer that takes position into account, a la Brin/Page '98. Yes, I'd love to see that too (and it doesn't even require any new payloads support

Re: Flexible index format / Payloads Cont'd

2006-06-30 Thread Daniel John Debrunner
Marvin Humphrey wrote: IMO, this should wait. It's going to be freakishly difficult to get this stuff to work and maintain the commitments that Doug has laid out for backwards compatibility. For newcomers to the project is there a link to these commitments? I looked around the Lucene site

Re: Flexible index format / Payloads Cont'd

2006-06-30 Thread Marvin Humphrey
On Jun 30, 2006, at 6:32 AM, Daniel John Debrunner wrote: Marvin Humphrey wrote: IMO, this should wait. It's going to be freakishly difficult to get this stuff to work and maintain the commitments that Doug has laid out for backwards compatibility. For newcomers to the project is

Re: Flexible index format / Payloads Cont'd

2006-06-30 Thread Marvin Humphrey
On Jun 30, 2006, at 6:07 AM, Nadav Har'El wrote: On Thu, Jun 29, 2006, Marvin Humphrey wrote about Re: Flexible index format / Payloads Cont'd: * Improve IR precision, by writing a Boolean Scorer that takes position into account, a la Brin/Page '98. Yes, I'd love to see that too

Re: Flexible index format / Payloads Cont'd

2006-06-30 Thread Daniel John Debrunner
Marvin Humphrey wrote: On Jun 30, 2006, at 6:32 AM, Daniel John Debrunner wrote: Marvin Humphrey wrote: IMO, this should wait. It's going to be freakishly difficult to get this stuff to work and maintain the commitments that Doug has laid out for backwards compatibility. For

Re: Flexible index format / Payloads Cont'd

2006-06-30 Thread Marvin Humphrey
On Jun 30, 2006, at 1:55 AM, Michael Busch wrote: So adding this payload feature to the Lucene core for a release 2.X is not a big risk in my opinion for the following reasons: - API only extended - Lucene 2.X will be able to read an index created with an earlier version, because the

Flexible index format / Payloads Cont'd

2006-06-29 Thread Michael Busch
Hi everyone, I'm working for IBM and recently started looking into Lucene. I am very interested in the topic of flexible indexing / payloads, which was discussed a couple of times in the last two months. I did some investigation in the mailing lists and found several threads about this topic. Those