> From what you mentioned, if I understand it correctly - I am going to take
> "anchor" as an example as I don't see that being captured in Solr index
> data -
>
> (a) "anchor" *is* defined in Solr's schema.xml
> (b) "anchor" *is NOT* defined in gora-hbase-mapping.xml
> (c) "anchor" *is NOT* defined in solrindex-mapping.xml
>
> So, my question would be: why is "anchor" not seen in the Solr index data
> ...

(d) "anchor" is filled by the indexing filter plugin index-anchor.

Is the index-anchor plugin active?
There will also be no anchors if there are no inlinks at all, or
no inlinks from different hosts if the property "db.ignore.internal.links"
is true (the default).
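For illustration, the two checks above (is the plugin active, and are there inlinks from other hosts) can be sketched in Python. This is a simplified model, not Nutch's actual code: PLUGIN_INCLUDES is a hypothetical plugin.includes value, plugin_active(), host_of() and anchors_for() are hypothetical helpers, and the sketch assumes that the plugin.includes regex must match the whole plugin id and that db.ignore.internal.links drops inlinks whose host equals the target page's host.

```python
import re
from urllib.parse import urlparse

# Hypothetical plugin.includes value, for illustration only; check the
# actual value in your nutch-default.xml / nutch-site.xml.
PLUGIN_INCLUDES = r"protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic"


def plugin_active(includes: str, plugin_id: str) -> bool:
    """True if plugin_id matches the plugin.includes regex as a whole
    (assumption: the pattern is applied to the full plugin id)."""
    return re.fullmatch(includes, plugin_id) is not None


def host_of(url: str) -> str:
    """Lower-cased host name of a URL, or "" if it has none."""
    return (urlparse(url).hostname or "").lower()


def anchors_for(page_url, inlinks, ignore_internal_links=True):
    """Collect anchor texts for page_url from (source_url, anchor_text) pairs.

    Simplified model: with ignore_internal_links=True (mirroring the default
    of db.ignore.internal.links), inlinks from the page's own host are
    dropped, so a page linked only from its own host gets no anchors.
    """
    target_host = host_of(page_url)
    anchors = []
    for source_url, anchor_text in inlinks:
        if ignore_internal_links and host_of(source_url) == target_host:
            continue  # internal link: ignored
        if anchor_text:  # skip empty anchor text
            anchors.append(anchor_text)
    return anchors


if __name__ == "__main__":
    print(plugin_active(PLUGIN_INCLUDES, "index-anchor"))  # True
    print(plugin_active(r"index-basic", "index-anchor"))  # False
    inlinks = [
        ("http://example.com/about", "Home"),
        ("http://other.org/links", "Example site"),
    ]
    print(anchors_for("http://example.com/", inlinks))  # ['Example site']
    print(anchors_for("http://example.com/", inlinks[:1]))  # []
```

In other words, an empty "anchor" field in Solr is expected both when index-anchor is not listed in plugin.includes and when every inlink to the page comes from the same host.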

Sebastian




On 04/21/2014 04:38 PM, A Laxmi wrote:
> Hi Sebastian,
> 
> > 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will be
> > '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?
> 
> 
> 
> 
> > Which fields are indexed does not depend on the storage mapping. Fields
> > are filled by indexing filters, some "basic" fields also by the indexer
> > itself. Which fields are sent to the indexing back-ends is defined in
> > schema and mappings of the back-end.
> 
> From what you mentioned, if I understand it correctly - I am going to take
> "anchor" as an example as I don't see that being captured in Solr index
> data -
> 
> (a) "anchor" *is* defined in Solr's schema.xml
> (b) "anchor" *is NOT* defined in gora-hbase-mapping.xml
> (c) "anchor" *is NOT* defined in solrindex-mapping.xml
> 
> So, my question would be: why is "anchor" not seen in the Solr index data
> though it is defined in Solr's schema.xml [(a) above]? When you said
> "mappings of the back-end", were you referring to gora-hbase-mapping.xml
> [(b) above]?
> 
> Thanks for your help.
> 
> 
> 
> On Sat, Apr 19, 2014 at 4:28 AM, Sebastian Nagel <[email protected]> wrote:
> 
>> Hi,
>>
>>> 1. With ref to link below, what is the difference between '*site*' and
>>> '*url*'?
>>> Link [0]: http://wiki.apache.org/nutch/IndexStructure
>>
>> the field "site" (from indexing filter plugin index-basic)
>> was removed some time ago (since Nutch 1.5?) because
>> it's an alias for "host". We need to update the mentioned
>> wiki page accordingly (done right now). Thanks for the hint!
>>
>>> 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will be
>>> '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?
>>
>> Which fields are indexed does not depend on the storage mapping.
>> Fields are filled by indexing filters, some "basic" fields also
>> by the indexer itself. Which fields are sent to the indexing
>> back-ends is defined in schema and mappings of the back-end.
>>
>> Sebastian
>>
>>
>> On 04/19/2014 03:18 AM, A Laxmi wrote:
>>> Hi,
>>>
>>> I am using Nutch 2.2.1 with HBase. I have a couple of questions about the
>>> index fields in Nutch:
>>>
>>> 1. With ref to link below, what is the difference between '*site*' and
>>> '*url*'?
>>> Link [0]: http://wiki.apache.org/nutch/IndexStructure
>>>
>>> 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will be
>>> '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?
>>>
>>>
>>> Thanks for any help..
>>>
>>
>>
> 
