Hi,

> 1. With ref to link below, what is the difference between '*site*' and
> '*url*'?
> Link [0]: http://wiki.apache.org/nutch/IndexStructure

The field "site" (from the indexing filter plugin index-basic)
was removed some time ago (since Nutch 1.5?) because it was
just an alias for "host". The wiki page you mention needed to be
updated accordingly, which I have done just now. Thanks for the hint!

> 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will be
> '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?

Which fields are indexed does not depend on the storage mapping.
Fields are filled by indexing filters, and some "basic" fields
by the indexer itself. Which fields are sent to the indexing
back-ends is defined in the schema and mappings of each back-end.
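
To illustrate the point above, here is a rough sketch of how index fields
are derived at indexing time, independently of any storage mapping. It
does not use the Nutch plugin API: the class IndexFieldSketch and the
methods basicFields and withSiteAlias are hypothetical names made up for
this example, and a plain Map stands in for the index document. Re-adding
"site" as a copy of "host" is shown only to make the alias relationship
concrete.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; NOT the Nutch IndexingFilter API.
class IndexFieldSketch {

    // Mimics what a "basic" indexing filter does: it derives index fields
    // from the page URL. Nothing here reads the storage (HBase) mapping.
    static Map<String, String> basicFields(String pageUrl) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", pageUrl);
        doc.put("host", URI.create(pageUrl).getHost());
        return doc;
    }

    // The removed "site" field carried the same value as "host",
    // so an alias is just a copy of that field.
    static Map<String, String> withSiteAlias(Map<String, String> doc) {
        Map<String, String> copy = new LinkedHashMap<>(doc);
        copy.put("site", doc.get("host"));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> doc =
            basicFields("http://wiki.apache.org/nutch/IndexStructure");
        System.out.println(doc);
        System.out.println(withSiteAlias(doc));
    }
}
```

Whether a field such as "site" actually reaches the index would still be
decided by the schema and mappings of the back-end, as described above.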

Sebastian


On 04/19/2014 03:18 AM, A Laxmi wrote:
> Hi,
> 
> I am using Nutch 2.2.1 with HBase. I have a couple of questions about the
> index fields in Nutch:
> 
> 1. With ref to link below, what is the difference between '*site*' and
> '*url*'?
> Link [0]: http://wiki.apache.org/nutch/IndexStructure
> 
> 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will be
> '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?
> 
> 
> Thanks for any help..
> 
