Chris Hostetter <[EMAIL PROTECTED]> wrote on 05/31/2007 02:28:58 PM:

> I'm having a little trouble following this discussion, first off as to
> your immediate issue...
> 
> : Thanks, but I think I'm going to have to work out a different solution. I
> : have written my own analyzer that does everything I need: it's not a
> : different analyzer I need but a way to specify that certain fields should
> : be tokenized and others not -- while still leaving all other options open.
> 
> ...maybe there is some terminology confusion here ... if you've already
> got an "Analyzer" (capital A Lucene classname) then you can specify it 
for
> one fieldType, and use that field type for the fields you want analysis
> done.  if you have other fields were you don't want tokenizing/analysis
> done, use a different fieldType (with a StrField).
>
This is precisely what I've done (but see below for more).

> As for your followup question...
> 
> : As far as the generic options parsing resulting in unused properties in a
> : SchemaField object, no, it is not specifically documented anywhere, but
> : the Solr Wiki lists, for both fields and field types: "Common options that
> : fields can have are...". I could not find anywhere a definitive list of
> : what is allowed/used or excluded, so I went to the code and found that the
> 
> That's because there is no definitive list.  Every FieldType can define
> its own list of attributes that can be declared and handled by its own
> init method.
> 
Unfortunately, unless I've missed something obvious, the "tokenized" 
property is not available to classes that extend FieldType: the setArgs() 
method of FieldType strips "tokenized" and other standard properties away 
before calling the init() method. Yes, of course one could override 
setArgs(), but that's not a robust solution.
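
For what it's worth, the override I'm dismissing would look roughly like the 
sketch below. This is only illustrative: the class name is made up, it 
extends TextField rather than the abstract FieldType just to keep it 
compilable, and it assumes the current setArgs()/init() signatures.

import java.util.Map;

import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.TextField;

public class PeekingFieldType extends TextField {
  // Captured here because by the time init() runs, setArgs() has already
  // removed "tokenized" (along with indexed, stored, multiValued, etc.)
  // from the args map.
  private boolean tokenizedRequested = true;

  @Override
  protected void setArgs(IndexSchema schema, Map<String, String> args) {
    String val = args.get("tokenized");
    if (val != null) {
      tokenizedRequested = Boolean.parseBoolean(val);
    }
    // Let FieldType do its normal property parsing; init() will only see
    // whatever non-standard attributes remain.
    super.setArgs(schema, args);
  }
}

As I said, though, relying on an override like this feels fragile: setArgs() 
is really an internal step in schema loading rather than an intended 
extension point.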

The terminology confusion stems (sorry, pun sort of not intended) from the 
frequent overlap of the terms "tokenize" and "analyze". As I mentioned in 
an earlier message on this thread, it is quite possible to create an 
Analyzer that does all sorts of things without tokenizing -- or, more 
precisely, one that creates a single Token from the field value. I would 
posit that tokenization and analysis are two separate things, albeit most 
frequently done together.
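
To make the distinction concrete, here is a minimal sketch (assuming the 
Lucene 2.x analysis API; the class name is made up) of an Analyzer that does 
not tokenize in the usual sense -- it emits the entire field value as a 
single Token -- yet still performs analysis on it:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

public class SingleTokenAnalyzer extends Analyzer {
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    // KeywordTokenizer emits the whole input as exactly one Token; the
    // filter chain can still transform it (lower-casing here), so the
    // field is "analyzed" without ever being split into multiple tokens.
    return new LowerCaseFilter(new KeywordTokenizer(reader));
  }
}

Whether a field run through something like this should be called "tokenized" 
is exactly the ambiguity I'm pointing at.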

-- Robert
