On Jun 15, 2009, at 2:11 PM, Grant Ingersoll wrote:
More questions:
1. What about Highlighter and MoreLikeThis? They have not been
converted. Also, what are they going to do if the attributes they
need are not available? Caveat emptor?
2. Same for TermVectors. What if the user specif
On 6/15/09 10:10 AM, Grant Ingersoll wrote:
But, as Michael M reminded me, it is complex, so please accept my
apologies.
No worries, Grant! I was not really offended, but rather confused...
Thanks for clarifying.
Michael
Grant Ingersoll wrote:
1. What about Highlighter
I would guess Highlighter has not been updated because it's kind of a
royal * :)
--
- Mark
http://www.lucidimagination.com
-
To unsubscribe, e-mail: java-dev-unsubscr
Mark Miller wrote:
Grant Ingersoll wrote:
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'd be happy to discuss other API proposals that anybody brings up
here, that have the same advantages and are more intuitive. We could
also beef up the documentation and give a better example about
Grant Ingersoll wrote:
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'd be happy to discuss other API proposals that anybody brings up
here, that have the same advantages and are more intuitive. We could
also beef up the documentation and give a better example about how to
convert a st
*Sent:* Monday, June 15, 2009 10:39 PM
*To:* java-dev@lucene.apache.org
*Subject:* Re: New Token API was Re: Payloads and TrieRangeQuery
I have implemented most of that actually (the interface part and Token
implementing all of them).
The problem is a paradigm change with the new API: the assum
yeah about 5 seconds in I saw that and decided to stick with what I know :)
On Mon, Jun 15, 2009 at 5:10 PM, Mark Miller wrote:
> I may do the Highlighter. It's annoying, though - I'll have to break back
> compat because Token is part of the public API (Fragmenter, etc).
>
> Robert Muir wrote:
>>
>>
> Subject: Re: New Token API was Re: Payloads and TrieRangeQuery
>
> On Mon, Jun 15, 2009 at 4:21 PM, Uwe Schindler wrote:
>
> > And, in tests: test/o/a/l/index/store is somehow wrongly placed. The class
> > inside should be in test/o/a/l/store. Shou
I may do the Highlighter. It's annoying, though - I'll have to break back
compat because Token is part of the public API (Fragmenter, etc).
Robert Muir wrote:
Michael, OK, I plan on adding some tests for the analyzers that don't have any.
I didn't try to migrate things such as highlighter, which
Subject: Re: New Token API was Re: Payloads and TrieRangeQuery
I have implemented most of that actually (the interface part and Token
implementing all of them).
The problem is a paradigm change with the new API: the assumption is that
there is always only one single instance of an Attribute
Michael, OK, I plan on adding some tests for the analyzers that don't have any.
I didn't try to migrate things such as highlighter, which are
definitely just as important, only because I'm not familiar with that
territory.
But I think I can figure out what the various language analyzers are
trying
I agree. It's my fault; the task of converting the contribs (LUCENE-1460)
has been assigned to me for a while now - I just haven't found the time to
do it yet.
It's great that you started the work on that! I'll try to review the
patch in the next couple of days and help with fixing the remaining
ones
On Mon, Jun 15, 2009 at 4:21 PM, Uwe Schindler wrote:
> And, in tests: test/o/a/l/index/store is somehow wrongly placed. The class
> inside should be in test/o/a/l/store. Should I move?
Please do!
Mike
I have implemented most of that actually (the interface part and Token
implementing all of them).
The problem is a paradigm change with the new API: the assumption is
that there is always only one single instance of an Attribute. With the
old API, it is recommended to reuse the passed-in token
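The single-instance paradigm described above can be mocked up in a few lines of plain Java (a minimal sketch with illustrative names, not Lucene's actual AttributeSource/Attribute classes): whoever asks for an attribute class gets the one shared instance, so the stream overwrites it in place for each token instead of allocating a new Token.

```java
import java.util.HashMap;
import java.util.Map;

public class SingleInstanceDemo {

    // Illustrative stand-in for an attribute: mutable, reused per token.
    public static class TermAttribute {
        private String term = "";
        public void setTerm(String t) { term = t; }
        public String term() { return term; }
    }

    // Minimal attribute-source sketch: one instance per attribute class.
    public static class AttributeSource {
        private final Map<Class<?>, Object> attributes = new HashMap<>();

        @SuppressWarnings("unchecked")
        public <T> T addAttribute(Class<T> clazz) {
            return (T) attributes.computeIfAbsent(clazz, c -> {
                try {
                    return c.getDeclaredConstructor().newInstance();
                } catch (ReflectiveOperationException e) {
                    throw new IllegalArgumentException(e);
                }
            });
        }
    }

    public static void main(String[] args) {
        AttributeSource source = new AttributeSource();
        // Producer (tokenizer) and consumer (indexer) each "add" the
        // attribute, but both receive the same instance.
        TermAttribute producerView = source.addAttribute(TermAttribute.class);
        TermAttribute consumerView = source.addAttribute(TermAttribute.class);

        producerView.setTerm("lucene");
        System.out.println(consumerView.term());          // lucene
        System.out.println(producerView == consumerView); // true
    }
}
```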
Michael, again I am terrible with such things myself...
Personally, I am impressed that you have the back compat. Even if you
don't change any code at all, I think some reformatting of the javadocs
might make the situation a lot friendlier. I just listed everything
that came to my mind immediately.
I g
> And I don't like the *useNewAPI*() methods either. I spent a lot of time
> thinking about backwards compatibility for this API. It's tricky to do
> without sacrificing performance. In API patches I find myself spending
> more time on backwards-compatibility than on the actual new feature! :(
> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Monday, June 15, 2009 10:18 PM
> To: java-dev@lucene.apache.org
> Subject: RE: New Token API was Re: Payloads and TrieRangeQuery
>
> > there's also
This is excellent feedback, Robert!
I agree this is confusing, especially having a deprecated API and only an
experimental one that replaces the old one. We need to change that.
And I don't like the *useNewAPI*() methods either. I spent a lot of time
thinking about backwards compatibility for th
Some great points - especially the decision between a deprecated API,
and a new experimental one subject to change. Bit of a rock and a hard
place for a new user.
Perhaps we should add a little note with some guidance.
- Mark
Robert Muir wrote:
let me try some slightly more constructive fee
> there's also a stray bold tag gone haywire somewhere, possibly
> .incrementToken()
I fixed this. It was getting on my nerves all day while I was writing
javadocs for NumericTokenStream...
Uwe
let me try some slightly more constructive feedback:
new user looks at TokenStream javadocs:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/analysis/TokenStream.html
immediately they see deprecated, text in red with the words
"experimental", warnings in bold, the
Mark, I'll see if I can get tests produced for some of those analyzers.
as a new user of the new api myself, I think I can safely say the most
confusing thing about it is having the old deprecated API mixed in the
javadocs with it :)
On Mon, Jun 15, 2009 at 2:53 PM, Mark Miller wrote:
> Robert Mu
> If you understood that, you'd be able to look
> at the actual token value if you were interested in what shift was
> used. So it's redundant, has a runtime cost, it's not currently used
> anywhere, and it's not useful to fields other than Trie. Perhaps it
> shouldn't exist (yet)?
You are right
> On Mon, Jun 15, 2009 at 3:00 PM, Uwe Schindler wrote:
> > There is a new Attribute called ShiftAttribute (or
> NumericShiftAttribute),
> > when trie range is moved to core. This attribute contains the shifted-
> away
> > bits from the prefix encoded value during trie indexing.
>
> I was wonderin
On Mon, Jun 15, 2009 at 3:00 PM, Uwe Schindler wrote:
> There is a new Attribute called ShiftAttribute (or NumericShiftAttribute),
> when trie range is moved to core. This attribute contains the shifted-away
> bits from the prefix encoded value during trie indexing.
I was wondering about this
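The shifted-away-bits idea can be illustrated in plain Java (a hypothetical sketch, not Lucene's NumericUtils or the proposed ShiftAttribute): each numeric value is indexed as several terms of decreasing precision, and the "shift" is how many low-order bits were dropped for that term.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of trie-style prefix encoding (hypothetical, not
// Lucene's NumericUtils): one long value becomes several terms of
// decreasing precision; the shift records how many low bits were dropped.
public class TriePrefixDemo {

    static List<String> prefixTerms(long value, int precisionStep) {
        List<String> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            // Tag each term with its shift so different precisions
            // of the same value never collide in the term dictionary.
            terms.add(shift + ":" + (value >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        // precisionStep=16 turns one long into 4 terms; a range query can
        // then match whole subtrees with the coarser (higher-shift) terms.
        System.out.println(prefixTerms(1_000_000L, 16));
    }
}
```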
> Also, what about the case where one might have attributes that are meant
> for downstream TokenFilters, but not necessarily for indexing? Offsets
> and type come to mind. Is it the case now that those attributes are not
> automatically added to the index? If they are ignored now, what if I
Robert Muir wrote:
Mark, I created an issue for this.
Thanks Robert, great idea.
I just think, you know, converting an analyzer to the new API is really
not that bad.
I don't either. I'm really just complaining about the initial
readability. Once you know what's up, it's not too much differ
Mark, I created an issue for this.
I just think, you know, converting an analyzer to the new API is really
not that bad.
reverse engineering what one of them does is not necessarily obvious,
and is completely unrelated but necessary if they are to be migrated.
I'd be willing to assist with some o
Robert Muir wrote:
As Lucene's contrib hasn't been fully converted either (and it's been quite
some time now), someone has probably heard that groan before.
hope this doesn't sound like a complaint,
Complaints are fine in any case. Every now and then, it might cause a
little rant from me o
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'd be happy to discuss other API proposals that anybody brings up
here, that have the same advantages and are more intuitive. We could
also beef up the documentation and give a better example about how
to convert a stream/filter from the
>
> As Lucene's contrib hasn't been fully converted either (and it's been quite
> some time now), someone has probably heard that groan before.
hope this doesn't sound like a complaint, but in my opinion this is
because many do not have any tests.
I converted a few of these and it's just grunt work
Yonik Seeley wrote:
The high-level description of the new API looks good (being able to
add arbitrary properties to tokens), unfortunately, I've never had the
time to try and use it and give any constructive feedback.
As far as difficulty of use, I assume this only applies to
implementing your o
The high-level description of the new API looks good (being able to
add arbitrary properties to tokens), unfortunately, I've never had the
time to try and use it and give any constructive feedback.
As far as difficulty of use, I assume this only applies to
implementing your own TokenFilter? It see
On Jun 14, 2009, at 8:05 PM, Michael Busch wrote:
I'm not sure why this (currently having to implement next() too) is
such an issue for you. You brought it up at the Lucene meetup too.
No user will ever have to implement both (the new API and the old)
in their streams/filters. The only reas
On Jun 15, 2009, at 12:19 PM, Michael McCandless wrote:
I don't think anything was "held back" in this effort. Grant, are you
referring to LUCENE-1458? That's "held back" simply because the only
person working on it (me) got distracted by other things to work on.
I'm sorry, I didn't mean to
I thought the primary goal of switching to AttributeSource (yes, the
name is very generic...) was to allow extensibility to what's created
per-Token, so that an app could add their own attrs without costly
subclassing/casting per Token, independent of other "things"
adding their tokens, etc.
The "old" API is deprecated, and therefore when we release 2.9 there might
be some people who'd think they should move away from it, to better prepare
for 3.0 (while in fact this may not be the case). Also, we should make sure
that when we remove all the deprecations, this will still exist (and
th
Mark Miller wrote:
I don't know how I feel about rolling the new token api back.
I will say that I originally had no issue with it because I am very
excited about Lucene-1458.
At the same time though, I'm thinking Lucene-1458 is a very advanced
issue that will likely be for really expert usa
I don't know how I feel about rolling the new token api back.
I will say that I originally had no issue with it because I am very
excited about Lucene-1458.
At the same time though, I'm thinking Lucene-1458 is a very advanced
issue that will likely be for really expert usage (though I can see
On 6/14/09 5:17 AM, Grant Ingersoll wrote:
Agreed. I've been bringing it up for a while now and made the same
comments when it was first introduced, but felt like the lone voice in
the wilderness on it and gave way [1], [2], [3]. Now that others are
writing/converting, I think it is worth rev
Agreed. I've been bringing it up for a while now and made the same
comments when it was first introduced, but felt like the lone voice in
the wilderness on it and gave way [1], [2], [3]. Now that others are
writing/converting, I think it is worth revisiting.
That being said, I did just wr
> Just to throw something out, the new Token API is not very consumable in my
> experience. The old one was very intuitive and very easy to follow the code.
>
> I've had to refigure out what the heck was going on with the new one more
> than once now. Writing some example code with it is hard to fo
Yonik Seeley wrote:
Even non-API changes have tradeoffs... the indexing improvements (err,
total rewrite) made that code *much* harder to understand and debug.
It's a net win since the indexing performance improvements were so
fantastic.
I agree - very hard to follow, worth the improvements.
On Jun 13, 2009, at 8:58 AM, Michael McCandless wrote:
OK, good points Grant. I now agree that it's not a simple task,
moving core stuff from Solr -> Lucene. So summing this all up:
* Some feel Lucene should only aim to be the core "expert" engine
used by Solr/Nutch/etc., so things
Of course consumability (good APIs) is important, but rational people
can disagree when it comes to the specifics... many things come with
tradeoffs.
Even non-API changes have tradeoffs... the indexing improvements (err,
total rewrite) made that code *much* harder to understand and debug.
It's a
Very true write-up, Grant!
On Sat, Jun 13, 2009 at 2:58 PM, Michael
McCandless wrote:
> OK, good points Grant. I now agree that it's not a simple task,
> moving core stuff from Solr -> Lucene. So summing this all up:
>
> * Some feel Lucene should only aim to be the core "expert" engine
>
OK, good points Grant. I now agree that it's not a simple task,
moving core stuff from Solr -> Lucene. So summing this all up:
* Some feel Lucene should only aim to be the core "expert" engine
used by Solr/Nutch/etc., so things like moving trie to core (with
consumable naming, go
On Jun 12, 2009, at 12:20 PM, Michael McCandless wrote:
On Thu, Jun 11, 2009 at 4:58 PM, Yonik Seeley wrote:
In Solr land we can quickly hack something together, spend some time
thinking about the external HTTP interface, and immediately make it
available to users (those using nightlies at l
On Thu, Jun 11, 2009 at 4:58 PM, Yonik Seeley wrote:
> In Solr land we can quickly hack something together, spend some time
> thinking about the external HTTP interface, and immediately make it
> available to users (those using nightlies at least). It would be a
> huge burden to say to Solr that
In Solr land we can quickly hack something together, spend some time
thinking about the external HTTP interface, and immediately make it
available to users (those using nightlies at least). It would be a
huge burden to say to Solr that anything of interest to the Lucene
community should be pulled
On Thu, Jun 11, 2009 at 9:20 AM, Uwe Schindler wrote:
> In my opinion, solr and lucene should exchange technology much more. Solr
> should concentrate on the "search server" and lucene should provide the
> technology.
+1
> All additional implementations inside solr like faceting and so
> on, cou
On Thu, Jun 11, 2009 at 8:46 AM, Yonik Seeley wrote:
>>> Really goes into Solr land... my pref for Lucene is to remain a core
>>> expert-level full-text search library and keep out things that are
>>> easy to do in an application or at another level.
>>
>> I think this must be the crux of our disa
From: Michael McCandless [mailto:luc...@mikemccandless.com]
> On Wed, Jun 10, 2009 at 6:07 PM, Yonik Seeley
> wrote:
>
> > Really goes into Solr land... my pref for Lucene is to remain a core
> > expert-level full-text search library and keep out things that are
> > easy to do in an application or
On Thu, Jun 11, 2009 at 7:01 AM, Michael
McCandless wrote:
> On Wed, Jun 10, 2009 at 6:07 PM, Yonik Seeley
> wrote:
>
>> Really goes into Solr land... my pref for Lucene is to remain a core
>> expert-level full-text search library and keep out things that are
>> easy to do in an application or at
On Wed, Jun 10, 2009 at 6:07 PM, Yonik Seeley wrote:
> Really goes into Solr land... my pref for Lucene is to remain a core
> expert-level full-text search library and keep out things that are
> easy to do in an application or at another level.
I think this must be the crux of our disagreement.
On Wed, Jun 10, 2009 at 5:45 PM, Michael McCandless
wrote:
> But, I realize this is a stretch... eg we'd have to fix rewrite to be
> per-segment, which certainly seems spooky. A top-level schema would
> definitely be cleaner.
Really goes into Solr land... my pref for Lucene is to remain a core
e
> > Another question not so simple to answer: When embedding these
> TermPositions
> > into the whole process, how would this work with MultiTermQuery?
>
> There's no reason why Trie has to use MultiTermQuery, right?
No, but it is elegant and simplifies much (see current code in trunk).
Uwe
--
> I think we'd need richer communication between MTQ and its subclasses,
> so that eg your enum would return a Query instead of a Term?
>
> Then you'd either return a TermQuery, or, a BooleanQuery that's
> filtering the TermQuery?
>
> But yes, doing after 3.0 seems good!
There is one other thing
> Another question not so simple to answer: When embedding these TermPositions
> into the whole process, how would this work with MultiTermQuery?
There's no reason why Trie has to use MultiTermQuery, right?
-Yonik
http://www.lucidimagination.com
--
On Wed, Jun 10, 2009 at 5:24 PM, Yonik Seeley wrote:
> On Wed, Jun 10, 2009 at 5:03 PM, Michael McCandless
> wrote:
>> On Wed, Jun 10, 2009 at 4:04 PM, Earwin Burrfoot wrote:
>> * Was the field even indexed w/ Trie, or indexed as "simple text"?
>
> Why the special treatment for Trie?
So that at
I think we'd need richer communication between MTQ and its subclasses,
so that eg your enum would return a Query instead of a Term?
Then you'd either return a TermQuery, or, a BooleanQuery that's
filtering the TermQuery?
But yes, doing after 3.0 seems good!
Mike
On Wed, Jun 10, 2009 at 5:26 PM,
> * Was the field even indexed w/ Trie, or indexed as "simple text"?
> It's useful to know this "automatically" at search time, so eg a
> RangeQuery can do the right thing by default. FieldInfos seems
> like the natural place to store this. It's basically Lucene's
> per-segment write
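The per-field metadata suggested above (FieldInfos as the natural place to record how a field was indexed) can be sketched like this in plain Java (hypothetical names, not Lucene's actual FieldInfos API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Lucene's FieldInfos API): record per-field
// metadata at indexing time so search-side code can tell automatically
// whether a field was indexed with trie encoding or as simple text.
public class FieldInfosDemo {

    enum NumericEncoding { SIMPLE_TEXT, TRIE }

    static final Map<String, NumericEncoding> fieldInfos = new HashMap<>();

    static void onFieldIndexed(String field, NumericEncoding enc) {
        // Refuse contradictory metadata for the same field.
        fieldInfos.merge(field, enc, (old, now) -> {
            if (old != now) {
                throw new IllegalStateException(
                        "field '" + field + "' indexed with mixed encodings");
            }
            return old;
        });
    }

    public static void main(String[] args) {
        onFieldIndexed("price", NumericEncoding.TRIE);
        onFieldIndexed("title", NumericEncoding.SIMPLE_TEXT);
        // A RangeQuery could consult this to "do the right thing by default":
        System.out.println(fieldInfos.get("price")); // TRIE
    }
}
```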
> I would like to go forward with moving the classes into the right packages
> and optimize the way queries and analyzers are created (only one class
> for each). The idea from LUCENE-1673 to use static factories to create
> these
> classes for the different data types seems to be more elega
On Wed, Jun 10, 2009 at 5:03 PM, Michael McCandless
wrote:
> On Wed, Jun 10, 2009 at 4:04 PM, Earwin Burrfoot wrote:
> * Was the field even indexed w/ Trie, or indexed as "simple text"?
Why the special treatment for Trie?
> It's useful to know this "automatically" at search time, so eg a
>
On Wed, Jun 10, 2009 at 5:07 PM, Uwe Schindler wrote:
> I would really like to leave this optimization out for 2.9. We can still add
> this after 2.9 as an optimization. The number of bits encoded into the
> TermPosition (this is really a cool idea, thanks Yonik, I was missing
> exactly that, becau
> On Wed, Jun 10, 2009 at 3:43 PM, Michael McCandless
> wrote:
> > On Wed, Jun 10, 2009 at 3:19 PM, Yonik
> Seeley wrote:
> >
> >>> And this information about the trie
> >>> structure and where payloads are should be stored in FieldInfos.
> >>
> >> As is the case today, the info is encoded in the
On Wed, Jun 10, 2009 at 4:04 PM, Earwin Burrfoot wrote:
> And then, when you merge segments indexed with different Trie*
> settings, you need to convert them to some common form.
> Sounds like something too complex and with minimal returns.
Oh yeah... tricky. So... there are various situations t
On Wed, Jun 10, 2009 at 3:43 PM, Michael McCandless
wrote:
> On Wed, Jun 10, 2009 at 3:19 PM, Yonik Seeley
> wrote:
>
>>> And this information about the trie
>>> structure and where payloads are should be stored in FieldInfos.
>>
>> As is the case today, the info is encoded in the class you use (
>>> And this information about the trie
>>> structure and where payloads are should be stored in FieldInfos.
>>
>> As is the case today, the info is encoded in the class you use (and
>> its settings)... no need to add it to the index structure. In any
>> case, it's a completely different issue an
On Wed, Jun 10, 2009 at 3:19 PM, Yonik Seeley wrote:
>> And this information about the trie
>> structure and where payloads are should be stored in FieldInfos.
>
> As is the case today, the info is encoded in the class you use (and
> its settings)... no need to add it to the index structure. In
On Wed, Jun 10, 2009 at 3:07 PM, Uwe Schindler wrote:
>> I wonder how performance would compare. Without payloads, there are
>> many more terms (for the tiny ranges) in the index, and your OR query
>> will have lots of these tiny terms. But then these tiny terms don't
>> hit many docs, and with
On Wed, Jun 10, 2009 at 3:07 PM, Uwe Schindler wrote:
> My problem with all this is how to optimize after which shift value to
> switch between terms and payloads.
Just make it a configurable number of bits at the end that are
"stored" instead of indexed. People will want to select different
tra
> > Was it understandable? (It's complicated, I know)
> >
> >
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
>
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: Wednesday, June 10, 2009 7:59 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Payloads and TrieRangeQuer
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Wednesday, June 10, 2009 7:59 PM
To: java-dev@lucene.apache.org
Subject: Re: Payloads and TrieRangeQuery
I think instead of ORing postings (trie range, rangequery, etc), have a
custom Query + Scorer that examines the payload (somehow)? It could encode
the multiple levels of trie bits in it? (I'm just guessing here).
On Wed, Jun 10, 2009 at 4:04 AM, Michael McCandless <
luc...@mikemccandless.com> wr
Use them how? (Sounds interesting...).
Mike
On Tue, Jun 9, 2009 at 10:32 PM, Jason
Rutherglen wrote:
> At the SF Lucene User's group, Michael Busch mentioned using
> payloads with TrieRangeQueries. Is this something that's being
> worked on? I'm interested in what sort of performance benefits
> the
Grant Ingersoll wrote:
Couldn't agree more. This is good progress.
I like the payloads patch, but I would like to see the lazy prox
stream (Lucene 761) stuff done (or at least details given on it) so
that we can hook this into Similarity so that it can be hooked into
scoring. For 761 and th
Marvin Humphrey wrote:
On Jan 18, 2007, at 8:31 AM, Michael Busch wrote:
I think it makes sense to add new functions incrementally, as long as
we try to only extend the API in a way, so that it is compatible with
the long-term goal, as Doug suggested already. After the payload
patch is commi
On Jan 18, 2007, at 8:59 AM, Grant Ingersoll wrote:
I think one thing that would really bolster the flex. indexing
format changes would be to have someone write another
implementation for it so that we can iron out any interface details
that may be needed. For instance, maybe the Kino mer
I agree (and this has been discussed on this very thread in the past,
see Doug's comments). I would love to have someone take a look at
the flexible indexing patch that was submitted (I have looked a
little at it, but it is going to need more than just me since it is a
big change, although
Couldn't agree more. This is good progress.
I like the payloads patch, but I would like to see the lazy prox
stream (Lucene 761) stuff done (or at least details given on it) so
that we can hook this into Similarity so that it can be hooked into
scoring. For 761 and the payload stuff, we n
On Jan 18, 2007, at 8:31 AM, Michael Busch wrote:
I think it makes sense to add new functions incrementally, as long
as we try to only extend the API in a way, so that it is compatible
with the long-term goal, as Doug suggested already. After the
payload patch is committed we can work on a
Nadav Har'El wrote:
On Thu, Jan 18, 2007, Michael Busch wrote about "Re: Payloads":
As you pointed out it is still possible to have per-doc payloads. You
need an analyzer which adds just one Token with payload to a specific
field for each doc. I understand that this code woul
Grant Ingersoll wrote:
Just to put in two cents: the Flexible Indexing thread has also talked
about the notion of being able to store arbitrary data at: token,
field, doc and Index level.
-Grant
Yes I agree that this should be the long-term goal. The payload feature
is just a first step in
Just to put in two cents: the Flexible Indexing thread has also
talked about the notion of being able to store arbitrary data at:
token, field, doc and Index level.
-Grant
On Jan 18, 2007, at 11:01 AM, Nadav Har'El wrote:
On Thu, Jan 18, 2007, Michael Busch wrote about "Re: Payl
On Thu, Jan 18, 2007, Michael Busch wrote about "Re: Payloads":
> As you pointed out it is still possible to have per-doc payloads. You
> need an analyzer which adds just one Token with payload to a specific
> field for each doc. I understand that this code would be quite ugly
Nadav Har'El wrote:
Hi Michael,
For some uses (e.g., faceted search), one wants to add a payload to each
document, not per position for some text field. In the faceted search example,
we could use payloads to encode the list of facets that each document
belongs to. For this, with the old API, y
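The per-document facet payload described here can be sketched in plain Java (illustrative only; this is not Lucene's Payload API): pack the facet ids into one byte[] that the single per-document token would carry.

```java
import java.io.ByteArrayOutputStream;

// Sketch (plain Java, not Lucene's Payload class): pack small facet ids
// into one byte[] so a single per-document token can carry them all.
public class FacetPayloadDemo {

    static byte[] encodeFacets(int[] facetIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int id : facetIds) {
            // VInt-style: 7 bits per byte, high bit marks continuation.
            while ((id & ~0x7F) != 0) {
                out.write((id & 0x7F) | 0x80);
                id >>>= 7;
            }
            out.write(id);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = encodeFacets(new int[] {3, 200, 70000});
        System.out.println(payload.length); // 1 + 2 + 3 = 6 bytes
    }
}
```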
Doug,
sorry for the late response. I was on vacation after New Year's... oh
btw. Happy New Year to everyone! :-)
Doug Cutting wrote:
Michael Busch wrote:
Yes I could introduce a new class called e.g. PayloadToken that
extends Token (good that it is not final anymore). Not sure if I
understa
On Mon, Jan 08, 2007, Nicolas Lalevée wrote about "Re: Payloads":
> I have looked more closely at how Lucene indexes, and I realized that for the facet
> feature, the kind of payload handling in Michael's patch is not designed for
> that. In this patch, the payloads are in th
On Wednesday, January 3, 2007, at 14:46, Nadav Har'El wrote:
> On Wed, Dec 20, 2006, Michael Busch wrote about "Payloads":
> >..
> > Some weeks ago I started working on an improved design which I would
> > like to propose now. The new design simplifies the API extensions (the
> > Field API remains unch
On Saturday, December 23, 2006, at 00:32, Michael Busch wrote:
> Nicolas Lalevée wrote:
> > I have just looked at it. It looks great :)
>
> Thanks! :-)
>
> > But I still don't understand why a new entry in the fieldinfo is
> > needed.
>
> The entry is not really *needed*, but I use it for
> backwards-
On Wed, Dec 20, 2006, Michael Busch wrote about "Payloads":
>..
> Some weeks ago I started working on an improved design which I would
> like to propose now. The new design simplifies the API extensions (the
> Field API remains unchanged) and uses less disk space in most use cases.
> Now there a
Nicolas Lalevée wrote:
I have just looked at it. It looks great :)
Thanks! :-)
But I still don't understand why a new entry in the fieldinfo is needed.
The entry is not really *needed*, but I use it for
backwards-compatibility and as an optimization for fields that don't
have any
On Dec 22, 2006, at 10:36 AM, Doug Cutting wrote:
The easiest way to do this would be to have separate files in each
segment for each PostingFormat. It would be better if different
posting formats could share files, but that's harder to coordinate.
The approach I'm taking in KinoSearch 0.
On 12/22/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
Ning Li wrote:
> The draft proposal seems to suggest the following (roughly):
> A dictionary entry is <Term, FilePointer>.
Perhaps this ought to be <Term, TermInfo>, where TermInfo contains a
FilePointer and perhaps other information (e.g., frequency data).
Yes. Another exam
Ning Li wrote:
I'm aware of this design. Boolean and phrase queries are an example.
The point is, there are different queries whose processing will
(continue to) require different information of terms, especially when
flexible posting is allowed. The question is, should the number of
files used t
Ning Li wrote:
The draft proposal seems to suggest the following (roughly):
A dictionary entry is <Term, FilePointer>.
Perhaps this ought to be <Term, TermInfo>, where TermInfo contains a
FilePointer and perhaps other information (e.g., frequency data).
A posting entry for a term in a document is .
Classes which implement
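The dictionary-entry shape discussed above can be sketched in plain Java (hypothetical types, not Lucene's actual term dictionary): mapping each term to a TermInfo struct leaves room for more per-term data than a bare file pointer, e.g. document frequency.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Sketch of the proposed <Term, TermInfo> dictionary entry (hypothetical
// plain-Java types): TermInfo bundles a file pointer with other per-term
// information such as frequency data.
public class TermDictionaryDemo {

    record Term(String field, String text) {}
    record TermInfo(long filePointer, int docFreq) {}

    public static void main(String[] args) {
        // Term dictionaries are kept sorted (by field, then term text).
        TreeMap<Term, TermInfo> dictionary = new TreeMap<>(
                Comparator.comparing((Term t) -> t.field())
                          .thenComparing((Term t) -> t.text()));
        dictionary.put(new Term("body", "lucene"), new TermInfo(1024L, 42));

        TermInfo info = dictionary.get(new Term("body", "lucene"));
        System.out.println(info.filePointer() + " " + info.docFreq()); // 1024 42
    }
}
```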
On Dec 22, 2006, at 9:17 AM, Ning Li wrote:
The question is, should the number of
files used to store postings be customizable?
I think it ought to remain an implementation detail for now. Using
multiple files is an optimization of unknown advantage.
Optimizations have to work very hard