Re: Can you parse the contents of a field to populate other fields?
On 11/6/07, Kristen Roth [EMAIL PROTECTED] wrote:
> Yonik - thanks so much for your help! Just to clarify: where should the regex go for each field?

Each field should have a different fieldType (referenced by the type XML attribute). Each fieldType can have its own analyzer. You can use a different PatternTokenizer (which specifies a regex) for each analyzer.

-Yonik
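To make the "one regex per field" idea concrete, here is a minimal Python sketch of what each per-field pattern might look like. It only illustrates the regex logic, not Solr itself: the helper names `level_pattern` and `extract_level` are hypothetical, the patterns are my own guess at what would be placed in each fieldType's PatternTokenizer, and the sketch assumes that individual facet names never contain a colon.

```python
import re


def level_pattern(level):
    """Return a regex whose group 1 is the level-th (1-based) '::' segment.

    Hypothetical helper: the pattern skips (level - 1) leading
    'segment::' prefixes and then captures the next segment.
    """
    if level < 1:
        raise ValueError("level must be >= 1")
    return r"^(?:[^:]+::){%d}([^:]+)" % (level - 1)


def extract_level(value, level):
    """Return the requested segment, or None if the path is too short."""
    match = re.search(level_pattern(level), value)
    return match.group(1) if match else None


if __name__ == "__main__":
    category = "facet1::facet2::facet3"
    for n in (1, 2, 3, 4):
        print(n, level_pattern(n), extract_level(category, n))
```

Under these assumptions, the pattern for level 1 is equivalent to `^([^:]+)`, and each deeper level adds one more skipped prefix.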
RE: Can you parse the contents of a field to populate other fields?
So, I think I have things set up correctly in my schema, but it doesn't appear that any logic is being applied to my Category_# fields. They are being populated with the full string copied from the Category field (facet1::facet2::facet3...facetn) instead of just facet1, facet2, etc.

I have several different field types, each with a different regex to match a specific part of the input string. In this example, I'm matching facet1 in the input string facet1::facet2::facet3...facetn:

    <fieldtype name="cat1str" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="^([^:]+)" group="1"/>
      </analyzer>
    </fieldtype>

I have copyfields set up for each Category_# field. Anything obviously wrong?

Thanks!
Kristen
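As a quick sanity check on the cat1str pattern, the following Python sketch imitates group-based tokenization: it finds every match of the pattern and emits the chosen capture group as a token. This is an approximation written with Python's `re` module, not Solr's code, and `group_tokens` is a hypothetical helper. It suggests the regex itself isolates facet1, so it does not by itself explain why the whole string shows up in the Category_# fields.

```python
import re


def group_tokens(pattern, text, group=1):
    """Emit the given capture group of every match as a token.

    Rough stand-in for a pattern tokenizer running in "group" mode;
    empty matches are dropped.
    """
    return [m.group(group) for m in re.finditer(pattern, text) if m.group(group)]


if __name__ == "__main__":
    category = "facet1::facet2::facet3"
    # Anchored pattern from the schema: only the first segment matches.
    print(group_tokens(r"^([^:]+)", category))
    # Without the anchor, every segment would become a token.
    print(group_tokens(r"([^:]+)", category))
```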
Re: Can you parse the contents of a field to populate other fields?
I'm not sure I fully understand your ultimate goal or Yonik's response. However, in the past I've been able to represent hierarchical data as a simple enumeration of delimited paths:

    <field name="taxonomy">root</field>
    <field name="taxonomy">root/region</field>
    <field name="taxonomy">root/region/north america</field>
    <field name="taxonomy">root/region/south america</field>

Then, at response time, you can walk the resulting facet and build a hierarchy with counts that can be put into a tree view. The tree can be of any depth, and documents can live in any combination of nodes on the tree.

In addition, you can represent any name-value pair (attribute/tuple) as a two-level tree. That way, you can put any combination of attributes in the facet and parse them out when rendering the results list. For example, you might be indexing computer hardware. Memory, Bus Speed and Resolution may be valid for some objects but not for others. Just put them in a facet and specify a separator:

    <field name="attribute">memory:1GB</field>
    <field name="attribute">busspeed:133Mhz</field>
    <field name="attribute">voltage:110/220</field>
    <field name="attribute">manufacturer:Shiangtsu</field>

When you do a facet query, you can easily display the categories appropriate to the object, and do facet selections like "show me all green things" and "show me all size 4 things".

Even if that's not your goal, this might help someone else.

George Everitt
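The "walk the facet and build a hierarchy" step could look roughly like the Python sketch below. It assumes the facet results have already been parsed into a plain dict of path to count; the counts, the function names `build_tree` and `split_attribute`, and the node layout are all made up for illustration and are not part of any Solr client API.

```python
def build_tree(facet_counts, sep="/"):
    """Turn {"root/region": 7, ...} into nested nodes with counts.

    Each node is {"count": int, "children": {name: node}}. Intermediate
    nodes that have no facet entry of their own keep a count of 0.
    """
    tree = {}
    for path, count in facet_counts.items():
        children = tree
        node = None
        for part in path.split(sep):
            node = children.setdefault(part, {"count": 0, "children": {}})
            children = node["children"]
        node["count"] = count
    return tree


def split_attribute(value, sep=":"):
    """Split "name:value" on the first separator only."""
    name, _, rest = value.partition(sep)
    return name, rest


if __name__ == "__main__":
    # Hypothetical facet counts, keyed by the delimited path.
    counts = {
        "root": 10,
        "root/region": 7,
        "root/region/north america": 4,
        "root/region/south america": 3,
    }
    print(build_tree(counts))
    print(split_attribute("voltage:110/220"))
```

Splitting attributes on the first separator only keeps values such as 110/220 intact, which matters if a value could itself contain the separator character.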
RE: Can you parse the contents of a field to populate other fields?
Yonik - thanks so much for your help! Just to clarify: where should the regex go for each field?

Thanks!
Kristen

Kristen Roth
Associate Software Engineer
P 617.218.6661 F 617.218.6861 E [EMAIL PROTECTED]
Molecular
343 Arsenal Street
Watertown, MA 02472
www.Molecular.com
Linked by Isobar

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Monday, November 05, 2007 4:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Can you parse the contents of a field to populate other fields?

On 11/5/07, Kristen Roth [EMAIL PROTECTED] wrote:
> I'm wondering if this is possible... I am trying to model a hierarchy of facets, and have a field in my xml (Category) that is structured like this: facet1::facet2::facet3... At index time, I would like to split this field on the :: to populate several other fields I have defined in my schema (Category_1, Category_2, Category_3). Is this possible? If so, what is the best way to do this?

I think a PatternTokenizer might be able to do this... set up a copyField from Category to all the other category fields, and then set up a different regex on each field to pull out the right part.

-Yonik