Re: [MASSMAIL]Re: Split content of metatag to multi value field

Jorge Luis Betancourt González Tue, 23 Jun 2015 08:10:51 -0700

----- Original Message -----
From: "Peter Kraume" <[email protected]>
To: [email protected]
Sent: Tuesday, June 23, 2015 10:38:46 AM
Subject: [MASSMAIL]Re: Split content of metatag to multi value field




> Am 23.06.2015 um 15:06 schrieb Jorge Luis Betancourt González 
> <[email protected]>:
> 
> I don't think there is any built-in way of doing this. The problem is 
> essentially that, for your particular use case, you want the value of the meta 
> tag to be an array instead of a single value, which means that you'll need to 
> change either the parse-metatags plugin or the index-metadata plugin to 
> accomplish the desired goal. 
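A change like that would essentially turn one comma-separated meta value into a list before it is indexed. The Python sketch below shows only that splitting step. `split_meta_value` is a hypothetical helper, not part of Nutch's parse-metatags or index-metadata plugins (which are written in Java), so treat it as an illustration of the transformation, not of the plugin code.

```python
from typing import List


def split_meta_value(content: str, sep: str = ",") -> List[str]:
    """Hypothetical helper: turn one meta tag value such as "de,us,be"
    into separate values, trimming whitespace and dropping empty entries."""
    return [part.strip() for part in content.split(sep) if part.strip()]


if __name__ == "__main__":
    print(split_meta_value("de,us,be,in,il,it"))
    # ['de', 'us', 'be', 'in', 'il', 'it']
```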

Maybe it’s easier to change the HTML code of the website to have multiple lines 
of <meta name="ORIGIN" content="xx" /> with single country codes, since the 
metatags plugin can handle this as an array.

Well, if you can change the website :-) 
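To illustrate what "handled as an array" means, here is a small Python sketch using the standard library's `html.parser` that collects the `content` of every repeated `<meta name="ORIGIN">` tag into a list. It is not how the Nutch metatags plugin is implemented; `MetaCollector` and `collect_meta` are hypothetical names, and the case-insensitive name comparison is a choice made for this sketch.

```python
from html.parser import HTMLParser
from typing import List, Optional


class MetaCollector(HTMLParser):
    """Collects the content of every <meta> tag whose name matches wanted_name."""

    def __init__(self, wanted_name: str) -> None:
        super().__init__()
        # Names are compared case-insensitively (a choice made for this sketch).
        self.wanted_name = wanted_name.lower()
        self.values: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; a self-closing
        # <meta ... /> reaches here via the default handle_startendtag.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name: Optional[str] = attr_map.get("name")
        content: Optional[str] = attr_map.get("content")
        if name is not None and name.lower() == self.wanted_name and content is not None:
            self.values.append(content)


def collect_meta(html: str, name: str) -> List[str]:
    """Hypothetical helper: return all content values for <meta name=...>, in page order."""
    parser = MetaCollector(name)
    parser.feed(html)
    parser.close()
    return parser.values


if __name__ == "__main__":
    page = """<head>
<meta name="ORIGIN" content="de" />
<meta name="ORIGIN" content="us" />
<meta name="description" content="Example page" />
</head>"""
    print(collect_meta(page, "ORIGIN"))  # ['de', 'us']
```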


> 
> I'm guessing that you want to store it as a multivalued field in Solr just to 
> get the stored value as an array instead of a comma-separated list, right? 
> For search purposes, the StandardTokenizerFactory will emit a stream of 
> separate tokens, so you will be able to search for any of the country codes 
> specified in the ORIGIN meta tag. 

If I understand you correctly, you think I can query a comma-separated string 
as if it were an array in Solr?

This depends on the tokenizer and analyzers you're using for your field. Can you 
share the relevant schema.xml section? The field declaration and field type of 
the "countries_stringM" field should be enough. 

The general answer is that you can query this string for its parts (terms) if 
you're indexing this field as a solr.TextField. When you use solr.TextField 
in your schema, you'll need to define a tokenizer whose only purpose is to 
transform the original value sent to the field into a stream/list of tokens. In 
your particular case, the StandardTokenizerFactory[1] will generate a stream of 
tokens, which basically means that it will split your original text 
"de,us,be,in,il,it" into the list of tokens [de, us, be, in, il, it]. These are 
the terms that Solr will use to match the document to a given query, so if you 
issue a query like (countries_stringM:us) it will match your document.
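The matching described above can be approximated in a few lines of Python. This is only a rough stand-in for StandardTokenizer plus LowerCaseFilter, assuming plain comma-separated letter codes; `analyze` and `matches` are hypothetical names and do not call Lucene or Solr.

```python
import re
from typing import List

# Rough stand-in for StandardTokenizer + LowerCaseFilter: keep runs of Unicode
# letters and digits and lowercase them. The real StandardTokenizer follows the
# Unicode UAX#29 word-break rules, which differ for some punctuation (e.g. "3.5"
# stays one token), but for a comma-separated list of letter codes the result is the same.
_WORD = re.compile(r"[^\W_]+")


def analyze(text: str) -> List[str]:
    return [token.lower() for token in _WORD.findall(text)]


def matches(field_value: str, term: str) -> bool:
    """True if a single-term query such as countries_stringM:us would hit this value."""
    # The query term goes through the same lowercasing before it is compared.
    return term.lower() in set(analyze(field_value))


if __name__ == "__main__":
    value = "de,us,be,in,il,it"
    print(analyze(value))        # ['de', 'us', 'be', 'in', 'il', 'it']
    print(matches(value, "US"))  # True
    print(matches(value, "fr"))  # False
```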

For instance, a very basic fieldType that you can use in your schema.xml file is:

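<fieldType name="keywords" class="solr.TextField" sortMissingLast="true">
    <analyzer>
        <!-- splits "de,us,be" into the tokens de, us, be -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- lowercases tokens so that "DE" and "de" match -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- drops a token only if an identical token sits at the same position -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>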

and then declare your field to use this fieldType:

<field name="countries_stringM" type="keywords" indexed="true" stored="true"/>
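With the field declared this way, the query from the example can be sent to Solr's standard /select handler. The Python sketch below only builds the request URL; the host, port and core name `collection1` are placeholders, and `build_select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_select_url(core_url: str, field: str, value: str) -> str:
    """Build a Solr /select URL for a single field:value query (JSON response)."""
    params = {"q": f"{field}:{value}", "wt": "json"}
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    # The host, port and core name "collection1" are placeholders for this sketch.
    url = build_select_url("http://localhost:8983/solr/collection1", "countries_stringM", "us")
    print(url)
    # http://localhost:8983/solr/collection1/select?q=countries_stringM%3Aus&wt=json
```

Fetching that URL (for example with `urllib.request.urlopen`) against a running core that uses the schema above should return the document whose ORIGIN value contains "us".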

[1] https://cwiki.apache.org/confluence/display/solr/Tokenizers

Cheers
Peter
