Chris Hostetter wrote on 07/10/2006 12:31 PM:
> So i guess we are on the same page that this kind of thing can be done at
> the App level -- what benefits do you see moving them into the Lucene
> index level?
>
Other than performance per David's and Marvin's ideas, the functionality
benefits
: previously mentioned a very simple one: validating fields in the query
: parser. More interesting examples are:
This strikes me as something that can be done with an abstraction layer
above and separate from the physical index (this is in fact what Solr
does) without needing to add any hard c
On 7/11/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 7/10/06, David Balmain <[EMAIL PROTECTED]> wrote:
> I don't think declaring all fields up front is necessary for
> substantial optimizations. I've found that the key to some really good
> optimizations is having constant field numbers. That i
On 7/10/06, David Balmain <[EMAIL PROTECTED]> wrote:
I don't think declaring all fields up front is necessary for
substantial optimizations. I've found that the key to some really good
optimizations is having constant field numbers. That is, once a field
is added to the index it is assigned a fie
On 7/11/06, Chuck Williams <[EMAIL PROTECTED]> wrote:
David Balmain wrote on 07/10/2006 01:04 AM:
> The only problem I could find with this solution is that
> fields are no longer in alphabetical order in the term dictionary but
> I couldn't think of a use-case where this is necessary although I'
dex.
>
This is certainly a large issue, as David says he has achieved a 5x
performance gain.
My interest in global field semantics originally sprang from
functionality considerations, not performance considerations. I've got
many features that require reasoning about field semantics. I
p
David Balmain wrote on 07/10/2006 01:04 AM:
> The only problem I could find with this solution is that
> fields are no longer in alphabetical order in the term dictionary but
> I couldn't think of a use-case where this is necessary although I'm
> sure there probably is one.
So presumably fields ar
two "Fields" for field name "f", both have a stored
value for f, both have some indexed terms for f, both have
some tokenized terms and one untokenized term for f ... but do these two
docs both conform to the same "Global field semantics" ?
-Hoss
On 7/10/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
Chuck Williams wrote:
> Lucene today allows many field properties to vary at the Field level.
> E.g., the same field name might be tokenized in one Field on a Document
> while it is untokenized in another Field on the same or different
> Documen
Chuck Williams wrote:
Lucene today allows many field properties to vary at the Field level.
E.g., the same field name might be tokenized in one Field on a Document
while it is untokenized in another Field on the same or different
Document.
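
To make the per-Field variation concrete, here is a minimal sketch in Python. It is not Lucene's API: the Field class and the properties_by_name helper are hypothetical stand-ins that model only a stored flag and a tokenized flag. It shows one field name, "f", carrying three different property combinations across two documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a per-document field instance."""
    name: str
    value: str
    stored: bool
    tokenized: bool


def properties_by_name(docs):
    """Map each field name to the set of (stored, tokenized) pairs seen."""
    seen = {}
    for doc in docs:
        for field in doc:
            seen.setdefault(field.name, set()).add((field.stored, field.tokenized))
    return seen


# Two documents, modeled as plain lists of Field instances.
doc1 = [
    Field("f", "some running text", stored=True, tokenized=True),
    Field("f", "ID-42", stored=True, tokenized=False),
]
doc2 = [Field("f", "more text", stored=False, tokenized=True)]

variants = properties_by_name([doc1, doc2])
# Names whose properties are not a function of the name alone.
conflicting = sorted(name for name, combos in variants.items() if len(combos) > 1)
```

Under global field semantics, every name in conflicting would be rejected or would have to be normalized to a single combination.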
The rationale for this design was to keep the API simp
see any
>> obvious need to vary the type of term vector (positions, offsets or
>> both).
>
> I think Store could definitely legitimately vary across Fields or
> Documents for the same reason your term vectors do. Perhaps you are
> indexing pages from the web and you want to cache
>> obvious need to vary the type of term vector (positions, offsets or
>> both).
>
> I think Store could definitely legitimately vary across Fields or
> Documents for the same reason your term vectors do. Perhaps you are
> indexing pages from the web and you want to cache only th
he smaller
pages.
There are significant benefits to global semantics, as evidenced by the
fact that several of us independently came to desire this. However,
deciding what can be global and what cannot is more subtle.
I agree. I can't see global field semantics making it into Lucene in
On Jul 9, 2006, at 11:31 AM, Chuck Williams wrote:
Marvin Humphrey wrote on 07/08/2006 11:13 PM:
On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote:
Many things would be cleaner in Lucene if fields had a global semantics,
i.e., if properties like text vs. binary, Index, Store, TermVector,
Marvin Humphrey wrote on 07/08/2006 11:13 PM:
>
> On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote:
>
>> Many things would be cleaner in Lucene if fields had a global semantics,
>> i.e., if properties like text vs. binary, Index, Store, TermVector, the
>> appropriate Analyzer, the assignment of Dir
. were a function of just the field name and the
index.
This is the direction I would like to go.
This approach would naturally admit a class, say IndexFieldSet,
that would hold global field semantics for an index.
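
As a rough illustration of what such a class could look like, here is a minimal sketch in Python. Only the name IndexFieldSet comes from the thread. FieldSpec, declare, spec_for and unknown_fields are assumptions made up for this example, and nothing here reflects an actual Lucene class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical per-field properties, fixed for the whole index."""
    stored: bool
    indexed: bool
    tokenized: bool
    term_vector: bool = False


class IndexFieldSet:
    """Holds one FieldSpec per field name for an index (illustrative only)."""

    def __init__(self):
        self._specs = {}

    def declare(self, name, spec):
        """Register a field; re-declaring with different properties is an error."""
        existing = self._specs.get(name)
        if existing is not None and existing != spec:
            raise ValueError(f"field {name!r} already declared as {existing}")
        self._specs[name] = spec

    def spec_for(self, name):
        """Return the spec for a declared field; raises KeyError if unknown."""
        return self._specs[name]

    def __contains__(self, name):
        return name in self._specs

    def unknown_fields(self, names):
        """Return the names that were never declared, in input order."""
        return [n for n in names if n not in self._specs]


fields = IndexFieldSet()
fields.declare("title", FieldSpec(stored=True, indexed=True, tokenized=True))
fields.declare("id", FieldSpec(stored=True, indexed=True, tokenized=False))

# A query parser could use the set to catch misspelled field names.
unknown = fields.unknown_fields(["title", "titel", "id"])
```

Because each name maps to exactly one spec, a check like unknown_fields covers the simple case of validating fields in the query parser.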
Lucene today allows many field properties to vary at the Field level.
E.g., the
karl wettin wrote on 07/08/2006 12:27 PM:
> On Sat, 2006-07-08 at 11:08 -0700, Chuck Williams wrote:
>
>> Karl, do you have specific reasons or use cases to normalize fields at
>> Document rather than at Index?
>>
>
> Nothing more than that the way the API looks it implies features that
>
On Sat, 2006-07-08 at 11:08 -0700, Chuck Williams wrote:
>
> Karl, do you have specific reasons or use cases to normalize fields at
> Document rather than at Index?
Nothing more than that the way the API looks it implies features that
do not exist. Boost, store, index and vectors. I've learned
karl wettin wrote on 07/08/2006 10:27 AM:
> On Sat, 2006-07-08 at 09:46 -0700, Chuck Williams wrote:
>
>> Many things would be cleaner in Lucene if fields had a global semantics,
>>
>
>
>> Has this been considered before? Are there good reasons this path has
>> not been followed?
>>
On Sat, 2006-07-08 at 09:46 -0700, Chuck Williams wrote:
> Many things would be cleaner in Lucene if fields had a global semantics,
> Has this been considered before? Are there good reasons this path has
> not been followed?
I've been posting some advocacy about the current Field. Basically I
wo
approach would naturally admit a class, say IndexFieldSet,
that would hold global field semantics for an index.
Lucene today allows many field properties to vary at the Field level.
E.g., the same field name might be tokenized in one Field on a Document
while it is untokenized in another Field on