Re: Global field semantics

2006-07-10 Thread Chuck Williams
Chris Hostetter wrote on 07/10/2006 12:31 PM: > So i guess we are on the same page that this kind of thing can be done at > the App level -- what benefits do you see moving them into the Lucene > index level? > Other than performance per David's and Marvin's ideas, the functionality benefits

Re: Global field semantics

2006-07-10 Thread Chris Hostetter
: previously mentioned a very simple one: validating fields in the query : parser. More interesting examples are: This strikes me as something that can be done with an abstraction layer above and separate from the physical index (this is in fact what Solr does) without needing to add any hard c

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/11/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 7/10/06, David Balmain <[EMAIL PROTECTED]> wrote: > I don't think declaring all fields up front is necessary for > substantial optimizations. I've found that the key to some really good > optimizations is having constant field numbers. That i

Re: Global field semantics

2006-07-10 Thread Yonik Seeley
On 7/10/06, David Balmain <[EMAIL PROTECTED]> wrote: I don't think declaring all fields up front is necessary for substantial optimizations. I've found that the key to some really good optimizations is having constant field numbers. That is, once a field is added to the index it is assigned a fie
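
The constant field number idea described above can be sketched roughly as follows. This is only an illustration: `FieldNumbers`, `numberFor` and `nameFor` are hypothetical names, not part of Lucene or of David's implementation, and the sketch assumes a number is handed out the first time a field name is seen and never changes afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry (not a Lucene class): each field name receives a
// number the first time it is seen, and keeps that number from then on.
final class FieldNumbers {
    private final Map<String, Integer> numberByName = new HashMap<>();
    private final List<String> nameByNumber = new ArrayList<>();

    // Returns the existing number for a field, or assigns the next free one.
    int numberFor(String fieldName) {
        Integer existing = numberByName.get(fieldName);
        if (existing != null) {
            return existing;
        }
        int assigned = nameByNumber.size();
        numberByName.put(fieldName, assigned);
        nameByNumber.add(fieldName);
        return assigned;
    }

    // Reverse lookup from a previously assigned number to its field name.
    String nameFor(int fieldNumber) {
        return nameByNumber.get(fieldNumber);
    }
}
```

In this sketch the numbers follow the order in which fields were first added, not alphabetical order, which is the trade-off raised elsewhere in the thread.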

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/11/06, Chuck Williams <[EMAIL PROTECTED]> wrote: David Balmain wrote on 07/10/2006 01:04 AM: > The only problem I could find with this solution is that > fields are no longer in alphabetical order in the term dictionary but > I couldn't think of a use-case where this is necessary although I'

Re: Global field semantics

2006-07-10 Thread Chuck Williams
dex. > This is certainly a large issue, as David says he has achieved a 5x performance gain. My interest in global field semantics originally sprang from functionality considerations, not performance considerations. I've got many features that require reasoning about field semantics. I p

Re: Global field semantics

2006-07-10 Thread Chuck Williams
David Balmain wrote on 07/10/2006 01:04 AM: > The only problem I could find with this solution is that > fields are no longer in alphabetical order in the term dictionary but > I couldn't think of a use-case where this is necessary although I'm > sure there probably is one. So presumably fields ar

Re: Global field semantics

2006-07-10 Thread Chris Hostetter
two "Fields" for field name "f", both have a stored value for f, both have some indexed terms for f, both have some tokenized terms and one untokenized term for f ... but do these two docs both conform to the same "Global field semantics"? -Hoss

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/10/06, Doug Cutting <[EMAIL PROTECTED]> wrote: Chuck Williams wrote: > Lucene today allows many field properties to vary at the Field level. > E.g., the same field name might be tokenized in one Field on a Document > while it is untokenized in another Field on the same or different > Documen

Re: Global field semantics

2006-07-10 Thread Doug Cutting
Chuck Williams wrote: Lucene today allows many field properties to vary at the Field level. E.g., the same field name might be tokenized in one Field on a Document while it is untokenized in another Field on the same or different Document. The rationale for this design was to keep the API simp

Re: Global field semantics

2006-07-09 Thread David Balmain
see any >> obvious need to vary the type of term vector (positions, offsets or >> both). > > I think Store could definitely legitimately vary across Fields or > Documents for the same reason your term vectors do. Perhaps you are > indexing pages from the web and you want to cache

Re: Global field semantics

2006-07-09 Thread Chuck Williams
>> obvious need to vary the type of term vector (positions, offsets or >> both). > > I think Store could definitely legitimately vary across Fields or > Documents for the same reason your term vectors do. Perhaps you are > indexing pages from the web and you want to cache only th

Re: Global field semantics

2006-07-09 Thread David Balmain
he smaller pages. There are significant benefits to global semantics, as evidenced by the fact that several of us independently came to desire this. However, deciding what can be global and what cannot is more subtle. I agree. I can't see global field semantics making it into Lucene in

Re: Global field semantics

2006-07-09 Thread Marvin Humphrey
On Jul 9, 2006, at 11:31 AM, Chuck Williams wrote: Marvin Humphrey wrote on 07/08/2006 11:13 PM: On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote: Many things would be cleaner in Lucene if fields had a global semantics, i.e., if properties like text vs. binary, Index, Store, TermVector,

Re: Global field semantics

2006-07-09 Thread Chuck Williams
Marvin Humphrey wrote on 07/08/2006 11:13 PM: > > On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote: > >> Many things would be cleaner in Lucene if fields had a global semantics, >> i.e., if properties like text vs. binary, Index, Store, TermVector, the >> appropriate Analyzer, the assignment of Dir

Re: Global field semantics

2006-07-08 Thread Marvin Humphrey
. were a function of just the field name and the index. This is the direction I would like to go. This approach would naturally admit a class, say IndexFieldSet, that would hold global field semantics for an index. Lucene today allows many field properties to vary at the Field level. E.g., the

Re: Global field semantics

2006-07-08 Thread Chuck Williams
karl wettin wrote on 07/08/2006 12:27 PM: > On Sat, 2006-07-08 at 11:08 -0700, Chuck Williams wrote: > >> Karl, do you have specific reasons or use cases to normalize fields at >> Document rather than at Index? >> > > Nothing more than that the way the API looks it implies features that >

Re: Global field semantics

2006-07-08 Thread karl wettin
On Sat, 2006-07-08 at 11:08 -0700, Chuck Williams wrote: > > Karl, do you have specific reasons or use cases to normalize fields at > Document rather than at Index? Nothing more than that the way the API looks it implies features that do not exist. Boost, store, index and vectors. I've learned

Re: Global field semantics

2006-07-08 Thread Chuck Williams
karl wettin wrote on 07/08/2006 10:27 AM: > On Sat, 2006-07-08 at 09:46 -0700, Chuck Williams wrote: > >> Many things would be cleaner in Lucene if fields had a global semantics, >> > > >> Has this been considered before? Are there good reasons this path has >> not been followed? >>

Re: Global field semantics

2006-07-08 Thread karl wettin
On Sat, 2006-07-08 at 09:46 -0700, Chuck Williams wrote: > Many things would be cleaner in Lucene if fields had a global semantics, > Has this been considered before? Are there good reasons this path has > not been followed? I've been posting some advocacy about the current Field. Basically I wo

Global field semantics

2006-07-08 Thread Chuck Williams
approach would naturally admit a class, say IndexFieldSet, that would hold global field semantics for an index. Lucene today allows many field properties to vary at the Field level. E.g., the same field name might be tokenized in one Field on a Document while it is untokenized in another Field on
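
A class along the lines of the IndexFieldSet mentioned above might look something like the sketch below. Only the name `IndexFieldSet` comes from the message; `FieldSemantics`, `declare`, `isDeclared`, `get` and the three boolean properties are hypothetical and chosen purely for illustration. The sketch assumes that a field's properties are fixed per index and that a conflicting redeclaration should be rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field properties; the real property set under discussion
// (Index, Store, TermVector, Analyzer, ...) is richer than this.
final class FieldSemantics {
    final boolean stored;
    final boolean indexed;
    final boolean tokenized;

    FieldSemantics(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    boolean sameAs(FieldSemantics other) {
        return stored == other.stored
                && indexed == other.indexed
                && tokenized == other.tokenized;
    }
}

// Sketch of an index-wide registry keyed by field name.
final class IndexFieldSet {
    private final Map<String, FieldSemantics> semanticsByName = new HashMap<>();

    // Registers a field's semantics; rejects a conflicting redeclaration.
    void declare(String name, FieldSemantics semantics) {
        FieldSemantics existing = semanticsByName.putIfAbsent(name, semantics);
        if (existing != null && !existing.sameAs(semantics)) {
            throw new IllegalArgumentException(
                    "Conflicting semantics for field: " + name);
        }
    }

    // Lets a caller, such as a query parser, check that a field is known.
    boolean isDeclared(String name) {
        return semanticsByName.containsKey(name);
    }

    // Returns the declared semantics, or null if the field is unknown.
    FieldSemantics get(String name) {
        return semanticsByName.get(name);
    }
}
```

With something like this in place, a query parser could call `isDeclared` to reject a misspelled field name, which is the simple validation use case raised in the replies.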