Hi,

The LanguageIndexingFilter adds a lang field to the NutchDocument object which 
in turn can be read in the Solr indexer.
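
As a rough sketch of how the mapping side could then use that field: the snippet below expands a ${lang}-style placeholder in a dest name from the document's field values. It is not the actual SolrWriter or SolrMappingReader code. The class DestFieldResolver, the method resolve() and the plain Map standing in for NutchDocument are all hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: expands ${name} placeholders
// in a mapping "dest" attribute using values taken from the document.
public class DestFieldResolver {
    // Matches placeholders such as ${lang}; group 1 is the field name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Returns the dest name with every placeholder replaced by the value of
     * the referenced document field, or null if a referenced field is absent
     * (the caller then decides whether to skip the field or fall back).
     */
    public static String resolve(String destPattern, Map<String, String> docFields) {
        Matcher m = PLACEHOLDER.matcher(destPattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = docFields.get(m.group(1));
            if (value == null) {
                return null;
            }
            // quoteReplacement keeps '$' or '\' in the value from being
            // interpreted as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: the document fields are available as simple strings.
        Map<String, String> doc = new HashMap<>();
        doc.put("lang", "en");
        doc.put("title", "Hello");
        System.out.println(resolve("title_${lang}", doc)); // title_en
        System.out.println(resolve("title", doc));         // title
    }
}
```

Returning null for a missing field is only one possible choice; a fallback such as a plain "title" field would work as well.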

Your other suggestion seems separate from this issue, but you could open a 
ticket for that one, or you can use the subcollection plugin to set a value, 
although this seems a bit of overkill ;)

Cheers,

On Wednesday 20 October 2010 16:04:25 Robert Douglass wrote:
> This highlights a general need in this mapping scheme, and yes, the
> ${lang} approach is desirable.
> 
> The general need is to be able to pass more information into Solr than
> just a simple 1:1 mapping of existing values to Solr fields. Another
> area where I've run into this need is the case of passing a constant
> field into Solr that identifies this document as coming from Nutch.
> 
> In my Solr schema, there is an "entity" field which tracks the kind of
> document we're dealing with. For those documents coming from Nutch, I'd
> like to be able to do something like:
> 
> <field dest="entity" value="nutch"/>
> or
> <field dest="entity" source="some nutch field" default="nutch"/>
> (the second example would only use the default if the source were null)
> 
> Note that in the case of ${lang}, this doesn't seem to be one of the
> available NutchFields (see write() in indexer.solr.SolrWriter.java); is
> there a configuration of Nutch that makes language available at that
> point in execution?
> 
> On 10/20/2010 03:50 PM, Markus Jelsma wrote:
> > Hi,
> > 
> > I believe this is very useful indeed. I'd go for the ${lang} method
> > because it allows you to keep your own preferred Solr schema namespacing
> > for languages. The first method isn't clear on how the fields are named
> > in the Solr schema.
> > 
> > Other thoughts on this one?
> > 
> > Could you open an issue in Jira?
> > 
> > Cheers,
> > 
> > On Wednesday, October 20, 2010 09:27:42 am Matthias Paul wrote:
> >> Hi,
> >> 
> >> I'm using Nutch with the language-identifier plugin enabled to detect
> >> the language of the html-pages. For indexing I use a Solr server.
> >> So far everything works but there's one problem: I don't know how to map
> >> multilingual fields to their corresponding Solr-field.
> >> The mapping file solrindex-mapping.xml contains the following:
> >> <field dest="lang" source="lang"/>
> >> <field dest="title" source="title"/>
> >> 
> >> But what I would like to have is the following
> >> <field dest="lang" source="lang"/>
> >> <field dest="title" source="title" multilingual="true" language="lang"/>
> >> or maybe
> >> <field dest="lang" source="lang"/>
> >> <field dest="title_${lang}" source="title" />
> >> so that the title-field gets mapped to title_en for English pages and
> >> title_fr for French pages.
> >> 
> >> I found the SolrWriter and SolrMappingReader classes in the
> >> source code, and it should be easy to integrate it there.
> >> What do you think? Could this be useful also to others?
> >> Or are there any other solutions out there?
> >> 
> >> Thanks
> >> Matthias

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
