-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This highlights a general need in this mapping scheme, and yes, the
${lang} approach is desirable.

The general need is to be able to pass more information into Solr than
just a simple 1:1 mapping of existing values to Solr fields. Another
area where I've run into this need is the case of passing a constant
field into Solr that identifies this document as coming from Nutch.

In my Solr schema, there is an "entity" field which tracks the kind of
document we're dealing with. For those documents coming from Nutch, I'd
like to be able to do something like:

<field dest="entity" value="nutch"/>
or
<field dest="entity" source="some nutch field" default="nutch"/>
(the second example would only use the default if the source were null)
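
To make the intended semantics concrete, here is a minimal Java sketch of how
such a rule might be resolved. The FieldRule class and its attribute names are
hypothetical and only mirror the two examples above; this is not the existing
SolrMappingReader API, and a plain Map stands in for the Nutch document.

```java
import java.util.Map;

// Hypothetical model of one <field> entry from the mapping file.
// Assumption: "value" and "default" are the proposed new attributes.
final class FieldRule {
    final String dest;         // Solr field name
    final String source;       // Nutch field name, or null for a constant rule
    final String constant;     // proposed value="..." attribute, or null
    final String defaultValue; // proposed default="..." attribute, or null

    FieldRule(String dest, String source, String constant, String defaultValue) {
        this.dest = dest;
        this.source = source;
        this.constant = constant;
        this.defaultValue = defaultValue;
    }

    /** Returns the value to send to Solr, or null if nothing should be sent. */
    String resolve(Map<String, String> doc) {
        if (constant != null) {
            return constant; // a constant always wins
        }
        String fromDoc = (source == null) ? null : doc.get(source);
        // The default applies only when the source field is missing (null).
        return (fromDoc != null) ? fromDoc : defaultValue;
    }
}
```

With this shape, the first example becomes new FieldRule("entity", null,
"nutch", null), and the second passes a source name plus "nutch" as the default.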

Note that ${lang} does not seem to be one of the available NutchFields
(see write() in indexer.solr.SolrWriter.java); is there a Nutch
configuration that makes the language available at that point in
execution?
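
Assuming the language value can be made available there, the ${lang}
substitution in the dest attribute could be sketched roughly as follows. The
DestTemplate class is hypothetical, a plain Map stands in for the document's
fields, and skipping the field when a placeholder cannot be filled is just one
possible policy.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Nutch.
final class DestTemplate {
    // Matches ${name} and captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Expands ${name} placeholders in a dest template using the document's
     * field values. Returns null if any referenced field is missing, so the
     * caller can decide to skip the field (an assumption, not a requirement).
     */
    static String expand(String template, Map<String, String> doc) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = doc.get(m.group(1));
            if (value == null) {
                return null;
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For a document whose lang field is "en", expanding "title_${lang}" would yield
"title_en"; a dest without placeholders is returned unchanged.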

On 10/20/2010 03:50 PM, Markus Jelsma wrote:
> Hi,
> 
> I believe this is very useful indeed. I'd go for the ${lang} method because it
> allows you to keep your own preferred Solr schema namespacing for languages.
> The first method isn't clear on how the fields are named in the Solr schema.
> 
> Other thoughts on this one?
> 
> Could you open an issue in Jira?
> 
> Cheers,
> 
> On Wednesday, October 20, 2010 09:27:42 am Matthias Paul wrote:
>> Hi,
>>
>> I'm using Nutch with the language-identifier plugin enabled to detect the
>> language of the HTML pages. For indexing I use a Solr server.
>> So far everything works, but there's one problem: I don't know how to map
>> multilingual fields to their corresponding Solr fields.
>> The mapping file solrindex-mapping.xml contains the following:
>> <field dest="lang" source="lang"/>
>> <field dest="title" source="title"/>
>>
>> But what I would like to have is the following:
>> <field dest="lang" source="lang"/>
>> <field dest="title" source="title" multilingual="true" language="lang"/>
>> or maybe
>> <field dest="lang" source="lang"/>
>> <field dest="title_${lang}" source="title" />
>> so that the title field gets mapped to title_en for English pages and
>> title_fr for French pages.
>>
>> I found the SolrWriter and SolrMappingReader classes in the source code,
>> and it should be easy to integrate it there.
>> What do you think? Could this be useful also to others?
>> Or are there any other solutions out there?
>>
>> Thanks
>> Matthias
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJMvvbpAAoJEJGMmWjehO3iXVUH/1f1dCMZ/qErS6AoTWrcRq0x
C8BSOEG4Kj6tns8YEwTyLjaNKfxpr3VROzz0xtxlTSJqotECy7tKFEyJ0GzMPEDI
PUqUU7apcMpGwi8WG3iiTgJ3otWv3RChEXT2iCikKPrdv8bJHNhbowej+d135T5Y
jwXWGvY90URjdHSNGuFHUdQaQ52e4n7ZmOf77MrsIvLp1eiYGflz4BGgLQw1td2C
UGDLCYFQA+n6lod1TsL51XpIBfjLxqFQ4KLiNg7yJQLdtxov7DiQA1NVR2oWh+Oo
fedx24JJltNxDI5y9np8Go5xeXTqonkq3qlQgCy5f45iKg9IQPOeoRdlVriSLJI=
=LSrq
-----END PGP SIGNATURE-----
