Chris Hostetter <[EMAIL PROTECTED]> wrote on 05/31/2007 02:28:58 PM:
> I'm having a little trouble following this discussion, first off as to
> your immediate issue...
>
> : Thanks, but I think I'm going to have to work out a different solution. I
> : have written my own analyzer that does everything I need: it's not a
> : different analyzer I need but a way to specify that certain fields should
> : be tokenized and others not -- while still leaving all other options open.
>
> ...maybe there is some terminology confusion here ... if you've already
> got an "Analyzer" (capital A Lucene classname) then you can specify it for
> one fieldType, and use that field type for the fields you want analysis
> done. if you have other fields where you don't want tokenizing/analysis
> done, use a different fieldType (with a StrField).
>

This is precisely what I've done (but see below for more).

> As for your followup question...
>
> : As far as the generic options parsing resulting in unused properties in a
> : SchemaField object, no, it is not specifically documented anywhere, but
> : the Solr Wiki lists, for both fields and field types: "Common options that
> : fields can have are...". I could not find anywhere a definitive list of
> : what is allowed/used or excluded, so I went to the code and found that the
>
> That's because there is no definitive list. Every FieldType can define
> its own list of attributes that can be declared and handled by its own
> init method.
>

Unfortunately, unless I've missed something obvious, the "tokenized" property is not available to classes that extend FieldType: the setArgs() method of FieldType strips "tokenized" and the other standard properties away before calling the init() method. Yes, of course one could override setArgs(), but that's not a robust solution.

The terminology confusion stems (sorry, pun sort of not intended) from the frequent overlap of the terms "tokenize" and "analyze".
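To make the distinction between the two terms concrete, here is a minimal sketch in plain Java. It deliberately uses no Lucene or Solr classes; the class and method names (TokenizeVsAnalyze, analyzeAsSingleToken, tokenizeAndAnalyze) are hypothetical and only illustrate the idea, they are not part of either API. The first method normalizes a field value and leaves it as one token; the second also splits it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: not Lucene's Analyzer/Tokenizer API.
public class TokenizeVsAnalyze {

    // Analysis without tokenization: the value is normalized
    // (trimmed, lowercased) but stays a single token.
    static List<String> analyzeAsSingleToken(String fieldValue) {
        String normalized = fieldValue.trim().toLowerCase(Locale.ROOT);
        return Collections.singletonList(normalized);
    }

    // Tokenization plus analysis: the value is split on whitespace
    // and each piece is normalized separately.
    static List<String> tokenizeAndAnalyze(String fieldValue) {
        List<String> tokens = new ArrayList<>();
        for (String piece : fieldValue.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String value = "  New York  ";
        System.out.println(analyzeAsSingleToken(value)); // prints [new york]
        System.out.println(tokenizeAndAnalyze(value));   // prints [new, york]
    }
}
```

Both methods "analyze" the value in the sense of transforming it; only the second one tokenizes it into more than one term.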
As I mentioned in an earlier message on this thread, it is quite possible to create an Analyzer that does all sorts of things without tokenizing, or, more precisely, creates a single Token from the field value. I would posit that tokenization and analysis are two separate things, albeit most frequently done together.

-- Robert