Chris, thanks for your reply!

The only additional information I can give is the Nutch subcollection
configuration, the result I get from Solr's index, and the fact that I'm using
a nightly build that's no more than two weeks old. I'm testing Nutch/Solr by
creating an index of a newspaper, so I define categories such as economy,
sport, film, etc. Here's one of my subcollection definitions:

 <subcollection>
  <name>buitenland</name>
  <id>buitenland</id>
  <whitelist>
    http://www.DOMAIN.nl/buitenland/
  </whitelist>
  <blacklist />
 </subcollection>

There are about 10 definitions like this one for now. As you can see, each one
specifies a URL and name and id fields without any prefixed space. Here is the
subcollection field from one document in a result set:

<str name="subcollection"> binnenland</str>

This problem occurs consistently across all result sets and for every value of
the subcollection field. All other fields in my Solr index are fine; only this
field is affected. As far as I can see, there is nothing useful in hadoop.log
or in Solr's log. The plugin.includes property in my Nutch config simply adds
the subcollection plugin to the regex.

Cheers,

-----Original message-----
From: Chris Mattmann <[email protected]>
Sent: Sat 19-06-2010 19:08
To: [email protected]; 
Subject: Re: prefixed space in subcollection field

Hi Markus,

I read the documentation for the subcollection plugin here:

http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/subcollection/README.txt

It didn't mention anything about prefixing your field names with a space.
So, I went and checked:

http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java

It seems like the only thing it does beyond indexing your normal NutchDocument
is add the subcollection name to the indexed set of fields, so I'm wondering
what you're seeing here. Do you have any further information?
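As a hedged illustration only (this is not the plugin's actual source, which should be checked at the URL above), here is a minimal Java sketch of one way a leading space could appear: if the indexing code builds the field value by prepending a separator to every matching subcollection name, a single match such as `binnenland` comes out as ` binnenland`. The class `SubcollectionJoinSketch` and both methods are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class SubcollectionJoinSketch {

    // Hypothetical: prepends a space before every matching name,
    // so the result always starts with a stray space.
    public static String joinWithLeadingSpace(List<String> names) {
        String collections = "";
        for (String name : names) {
            collections += " " + name;
        }
        return collections;
    }

    // Hypothetical alternative: puts the separator only between names,
    // so a single match yields the bare name.
    public static String joinTrimmed(List<String> names) {
        return String.join(" ", names);
    }

    public static void main(String[] args) {
        List<String> matches = new ArrayList<>();
        matches.add("binnenland");
        System.out.println("[" + joinWithLeadingSpace(matches) + "]"); // prints "[ binnenland]"
        System.out.println("[" + joinTrimmed(matches) + "]");          // prints "[binnenland]"
    }
}
```

If the real code does turn out to build the value this way, trimming the value before adding it to the document, or joining only between names, would avoid the prefixed space. Whether that is actually the cause here is an assumption to verify against SubcollectionIndexingFilter.java and the classes it calls.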

Cheers,
Chris


On 6/19/10 9:55 AM, "Markus Jelsma" <[email protected]> wrote:

> I'm sorry, but I need to bump this one. Any suggestions?
>  
> -----Original message-----
> From: Markus Jelsma <[email protected]>
> Sent: Tue 15-06-2010 10:51
> To: [email protected];
> Subject: prefixed space in subcollection field
> 
> Hi list,
> 
> Fields created by the subcollection plugin end up with a prefixed space in my
> Solr index, but the name and id fields in my subcollection.xml don't have that
> space. I checked three times just to be certain I didn't mess up the
> configuration. I am unsure where the space comes from and where to fix it.
> Any ideas on this one?
> 
> Cheers,
> 

