Re: Can you parse the contents of a field to populate other fields?

2007-11-07 Thread Yonik Seeley
On 11/6/07, Kristen Roth [EMAIL PROTECTED] wrote:
> Yonik - thanks so much for your help!  Just to clarify; where should the
> regex go for each field?

Each field should have a different fieldType (referenced by the field's
type XML attribute).  Each fieldType can have its own analyzer.  You can
use a different PatternTokenizer (which specifies a regex) for each
analyzer.

-Yonik


RE: Can you parse the contents of a field to populate other fields?

2007-11-07 Thread Kristen Roth
So, I think I have things set up correctly in my schema, but it doesn't
appear that any logic is being applied to my Category_# fields - they
are being populated with the full string copied from the Category field
(facet1::facet2::facet3...facetn) instead of just facet1, facet2, etc.

I have several different field types, each with a different regex to
match a specific part of the input string.  In this example, I'm
matching facet1 in input string facet1::facet2::facet3...facetn

<fieldtype name="cat1str" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="^([^:]+)" group="1"/>
  </analyzer>
</fieldtype>

I have a copyField set up for each Category_# field.  Is anything
obviously wrong?

Thanks!
Kristen



Re: Can you parse the contents of a field to populate other fields?

2007-11-07 Thread George Everitt
I'm not sure I fully understand your ultimate goal or Yonik's  
response.  However, in the past I've been able to represent  
hierarchical data as a simple enumeration of delimited paths:


<field name="taxonomy">root</field>
<field name="taxonomy">root/region</field>
<field name="taxonomy">root/region/north america</field>
<field name="taxonomy">root/region/south america</field>

Then, at response time, you can walk the facet results and build a
hierarchy with counts that can be put into a tree view.  The tree can
be of arbitrary depth, and documents can live in any combination of
nodes on the tree.
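
For anyone trying this, here is a minimal sketch of that response-time step in Python. It assumes the facet results have already been read into a plain dict of path to count; the counts, the function names and the "/" separator are all made up for illustration and are not part of Solr.

```python
def build_tree(facet_counts, separator="/"):
    """Turn {"root/region": 9, ...} into a nested dict of nodes with counts."""
    tree = {}
    for path, count in facet_counts.items():
        node = None
        children = tree
        for part in path.split(separator):
            # Create intermediate nodes on demand so input order does not matter.
            node = children.setdefault(part, {"count": 0, "children": {}})
            children = node["children"]
        node["count"] = count  # the count belongs to the last node on the path
    return tree


def render(tree, depth=0):
    """Return one indented line per node, suitable for a simple tree view."""
    lines = []
    for name in sorted(tree):
        node = tree[name]
        lines.append("  " * depth + f"{name} ({node['count']})")
        lines.extend(render(node["children"], depth + 1))
    return lines


# Hypothetical facet counts, keyed by the delimited paths shown above.
facet_counts = {
    "root": 12,
    "root/region": 9,
    "root/region/north america": 5,
    "root/region/south america": 4,
}
tree = build_tree(facet_counts)
print("\n".join(render(tree)))
```

Because every ancestor path is indexed as its own value, each node's count comes straight from the facet result and nothing has to be summed on the client.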


In addition, you can represent any arbitrary name-value pair
(attribute/tuple) as a two-level tree.  That way, you can put any
combination of attributes in the facet and parse them out when you
build the results list.  For example, you might be indexing computer
hardware.  Memory, Bus Speed and Resolution may be valid for some
objects but not for others.  Just put them in a facet and specify a
separator:


<field name="attribute">memory:1GB</field>
<field name="attribute">busspeed:133MHz</field>
<field name="attribute">voltage:110/220</field>
<field name="attribute">manufacturer:Shiangtsu</field>


When you do a facet query, you can easily display the categories
appropriate to the object.  And do facet selections like "show me all
green things" and "show me all size 4 things".
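
A rough Python sketch of the "parse them out" step, assuming the facet values arrive as plain "name:value" strings. The function name and the sample list are hypothetical. It splits on the first separator only, so a value such as 110/220 (or one containing a further colon) stays intact.

```python
from collections import defaultdict


def group_attributes(facet_values, separator=":"):
    """Group "name:value" strings into {name: [value, ...]}."""
    grouped = defaultdict(list)
    for item in facet_values:
        # partition() splits on the first separator only.
        name, _, value = item.partition(separator)
        grouped[name].append(value)
    return dict(grouped)


# Hypothetical values taken from the attribute facet of one result set.
values = [
    "memory:1GB",
    "busspeed:133MHz",
    "voltage:110/220",
    "manufacturer:Shiangtsu",
]
attributes = group_attributes(values)
for name, options in attributes.items():
    print(name, options)
```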



Even if that's not your goal, this might help someone else.


George Everitt


RE: Can you parse the contents of a field to populate other fields?

2007-11-06 Thread Kristen Roth
Yonik - thanks so much for your help!  Just to clarify; where should the
regex go for each field?

Thanks!
Kristen




-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Monday, November 05, 2007 4:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Can you parse the contents of a field to populate other
fields?

On 11/5/07, Kristen Roth [EMAIL PROTECTED] wrote:
> I'm wondering if this is possible... I am trying to model a hierarchy of
> facets, and have a field in my xml (Category) that is structured like this:
> facet1::facet2::facet3...  At index time, I would like to split this
> field on the :: to populate several other fields I have defined in my
> schema (Category_1, Category_2, Category_3).  Is this possible?  If so,
> what is the best way to do this?

I think a PatternTokenizer might be able to do this: set up a
copyField from Category to all the other category fields, and then
set up a different regex on each field to pull out the right part.
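
To make "a different regex on each field" concrete, here is a small Python sketch of one way the per-position patterns could look, with capture group 1 holding the wanted segment (mirroring group=1 on the tokenizer). The helper names are hypothetical, and the patterns assume segments never contain a colon. Python's re module is only used to try the expressions out; they use syntax that Java regexes also accept, but they have not been tested inside Solr here.

```python
import re


def segment_pattern(position):
    """Regex whose group 1 captures the segment at the given 1-based position."""
    # Skip (position - 1) leading "segment::" prefixes, then capture the next one.
    return r"^(?:[^:]+::){%d}([^:]+)" % (position - 1)


def extract(category, position):
    """Return the segment at the position, or None if the string is too short."""
    match = re.search(segment_pattern(position), category)
    return match.group(1) if match else None


category = "facet1::facet2::facet3"
for n in (1, 2, 3):
    print(f"Category_{n}: {segment_pattern(n)} -> {extract(category, n)}")
```

One caveat worth keeping in mind: an analyzer only changes the terms that get indexed. The stored value returned with a document is still the full string that copyField copied over, so the effect of the regex shows up in searching and faceting rather than in the stored field.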

-Yonik