Re: creating SchemaField and FieldType programmatically

2012-06-02 Mike Sokolov
OK, never mind, all is well: I had a mismatch between the
schema-declared field and my programmatic field, because I was overzealous
in using OMIT_TF_POSITIONS.


-Mike
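For anyone chasing the same kind of mismatch, XOR-ing the two property masks shows exactly which flags differ. This is a minimal sketch that assumes the bit values of Solr 3.x/4.x's FieldProperties constants (verify them against your version). FieldFlagDiff is a hypothetical helper, and the 0x213 "declared" mask is a made-up stand-in for a schema.xml declaration that the thread doesn't show:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldFlagDiff {
    // Assumed to match Solr 3.x/4.x FieldProperties bit values (verify for your version).
    // Only the flags used in this thread are listed.
    static final Map<String, Integer> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("INDEXED", 0x001);
        FLAGS.put("TOKENIZED", 0x002);
        FLAGS.put("OMIT_NORMS", 0x010);
        FLAGS.put("OMIT_TF_POSITIONS", 0x020);
        FLAGS.put("MULTIVALUED", 0x200);
    }

    /** Lists each flag set in exactly one of the two masks, and which side sets it. */
    static List<String> diff(int declared, int programmatic) {
        int mismatch = declared ^ programmatic; // bits that differ between the two masks
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : FLAGS.entrySet()) {
            int bit = e.getValue();
            if ((mismatch & bit) != 0) {
                String side = (declared & bit) != 0 ? "schema.xml only" : "code only";
                out.add(e.getKey() + " (" + side + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical schema.xml declaration that keeps term frequencies and positions.
        int declared = 0x213;     // INDEXED | TOKENIZED | OMIT_NORMS | MULTIVALUED
        int programmatic = 0x233; // the mask the original post uses for lux_path
        System.out.println(diff(declared, programmatic)); // [OMIT_TF_POSITIONS (code only)]
    }
}
```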

On 6/2/2012 5:02 PM, Mike Sokolov wrote:
I'm creating some Solr plugins that index and search documents in a 
special way, and I'd like to make them as easy as possible to 
configure.  Ideally I'd like users to be able to just drop a jar in 
place without having to copy any configuration into schema.xml, 
although I suppose they will have to register the plugins in 
solrconfig.xml.


I tried making my UpdateProcessor core-aware (SolrCoreAware) and creating
FieldTypes and SchemaFields in its inform(SolrCore) method.  This was a good
start, but I'm running into some issues getting the types properly
initialized.  One of my types, for example, derives from TextField,
which seems to require an initialization pass to set up its
properties.  What I'm seeing is that my field values
aren't being tokenized, even though I specify TOKENIZED when I create
the SchemaField.  I'm beginning to get the feeling I'm doing something
the API designers didn't quite anticipate.


My question is: is there a way to go about doing something like this 
that isn't swimming upstream?  Should I just give up and require users 
to incorporate my schema in the xml config?


Here is a code snippet for anyone willing to dig in a little:

import java.util.Map;

import org.apache.solr.core.SolrCore;
import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.StrField;
import org.apache.solr.schema.TextField;
// WhitespaceGapAnalyzer is Lux's own analyzer; its import isn't shown here.

/** Called when each core is initialized; we ensure that lux fields are configured. */
public void inform(SolrCore core) {
    IndexSchema schema = core.getSchema();
    Map<String,SchemaField> fields = schema.getFields();
    if (fields.containsKey("lux_path")) {
        return;
    }
    Map<String,FieldType> fieldTypes = schema.getFieldTypes();
    FieldType luxTextWs = fieldTypes.get("lux_text_ws");
    if (luxTextWs == null) {
        luxTextWs = new TextField();
        luxTextWs.setAnalyzer(new WhitespaceGapAnalyzer());
        luxTextWs.setQueryAnalyzer(new WhitespaceGapAnalyzer());
        fieldTypes.put("lux_text_ws", luxTextWs);
    }
    // 0x233 = INDEXED | TOKENIZED | OMIT_NORMS | OMIT_TF_POSITIONS | MULTIVALUED
    fields.put("lux_path", new SchemaField("lux_path", luxTextWs, 0x233, ""));
    // 0x231 = INDEXED | OMIT_NORMS | OMIT_TF_POSITIONS | MULTIVALUED
    fields.put("lux_elt_name", new SchemaField("lux_elt_name", new StrField(), 0x231, ""));
    fields.put("lux_att_name", new SchemaField("lux_att_name", new StrField(), 0x231, ""));

    // must call this after making changes to the field map:
    schema.refreshAnalyzers();
}
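The hex masks 0x233 and 0x231 are easy to misread. Here is a minimal sketch of building and checking them by name instead. It assumes that bit i corresponds to the i-th name below, matching Solr 3.x/4.x's FieldProperties constants (INDEXED = 0x1 through MULTIVALUED = 0x200); check this against your Solr version. FieldFlags, encode and decode are hypothetical names, not Solr API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldFlags {
    // Bit i is NAMES[i]; assumed to match the low bits of Solr 3.x/4.x FieldProperties.
    static final String[] NAMES = {
        "INDEXED", "TOKENIZED", "STORED", "BINARY", "OMIT_NORMS",
        "OMIT_TF_POSITIONS", "STORE_TERMVECTORS", "STORE_TERMPOSITIONS",
        "STORE_TERMOFFSETS", "MULTIVALUED"
    };

    /** Builds a property mask from flag names; rejects names not in the table. */
    static int encode(String... flags) {
        List<String> all = Arrays.asList(NAMES);
        int mask = 0;
        for (String flag : flags) {
            int bit = all.indexOf(flag);
            if (bit < 0) {
                throw new IllegalArgumentException("unknown flag: " + flag);
            }
            mask |= 1 << bit;
        }
        return mask;
    }

    /** Lists the flag names set in a property mask, lowest bit first. */
    static List<String> decode(int mask) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            if ((mask & (1 << i)) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        int luxPath = encode("INDEXED", "TOKENIZED", "OMIT_NORMS", "OMIT_TF_POSITIONS", "MULTIVALUED");
        System.out.printf("0x%x%n", luxPath); // 0x233
        System.out.println(decode(0x231));    // [INDEXED, OMIT_NORMS, OMIT_TF_POSITIONS, MULTIVALUED]
    }
}
```

Decoding the masks back to names makes it easier to spot a flag that doesn't belong, such as an unintended OMIT_TF_POSITIONS.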




Re: creating SchemaField and FieldType programmatically

2012-06-02 Mike Sokolov
Oh yes, a final follow-up for the terminally curious: I also had to add 
this little class in order to get analysis turned on for my programmatic 
field:


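import org.apache.lucene.document.Field;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TextField;
// WhitespaceGapAnalyzer is Lux's own analyzer; its import isn't shown here.

class PathField extends TextField {

    PathField(IndexSchema schema) {
        setAnalyzer(new WhitespaceGapAnalyzer());
        setQueryAnalyzer(new WhitespaceGapAnalyzer());
    }

    // Always index values as ANALYZED, i.e. run them through the analyzer set above.
    @Override
    protected Field.Index getFieldIndex(SchemaField field, String internalVal) {
        return Field.Index.ANALYZED;
    }

}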
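Why was PathField needed when the SchemaField already had TOKENIZED? One plausible reading of Solr 3.x (an assumption; the thread doesn't confirm it) is this: the base FieldType.getFieldIndex returns ANALYZED only when the *type's* own properties include TOKENIZED, and TextField adds that bit in its init() pass. That pass runs when schema.xml is loaded but not for a type created with `new TextField()`. The model below is hypothetical and simplified. It is plain Java, not Solr source, and reuses Solr's class names only for readability:

```java
/** Hypothetical, simplified model of the tokenization problem; not Solr source. */
public class TokenizeModel {
    static final int INDEXED = 0x1;
    static final int TOKENIZED = 0x2;

    enum Index { NO, ANALYZED, NOT_ANALYZED }

    static class FieldType {
        int properties; // the type's own flags, filled in by init()

        void init() { }

        boolean isTokenized() {
            return (properties & TOKENIZED) != 0;
        }

        // Uses the field's INDEXED flag but the *type's* TOKENIZED flag.
        Index getFieldIndex(int fieldProperties) {
            if ((fieldProperties & INDEXED) == 0) {
                return Index.NO;
            }
            return isTokenized() ? Index.ANALYZED : Index.NOT_ANALYZED;
        }
    }

    static class TextField extends FieldType {
        @Override
        void init() {
            properties |= TOKENIZED; // only happens if someone calls init()
        }
    }

    // Like the PathField workaround: ignore the type's flags and always analyze.
    static class PathField extends TextField {
        @Override
        Index getFieldIndex(int fieldProperties) {
            return Index.ANALYZED;
        }
    }

    public static void main(String[] args) {
        int schemaFieldFlags = INDEXED | TOKENIZED;

        TextField viaNew = new TextField();   // created programmatically, init() never run
        TextField viaSchema = new TextField();
        viaSchema.init();                     // stands in for schema.xml loading

        System.out.println(viaNew.getFieldIndex(schemaFieldFlags));          // NOT_ANALYZED
        System.out.println(viaSchema.getFieldIndex(schemaFieldFlags));       // ANALYZED
        System.out.println(new PathField().getFieldIndex(schemaFieldFlags)); // ANALYZED
    }
}
```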

On 6/2/2012 5:48 PM, Mike Sokolov wrote:
[snip]