Hi Sebastian,

> 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will be
> '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?

> Which fields are indexed does not depend on the storage mapping. Fields
> are filled by indexing filters, some "basic" fields also by the indexer
> itself. Which fields are sent to the indexing back-ends is defined in
> schema and mappings of the back-end.

If I understand you correctly, let me take "anchor" as an example, since I
don't see it being captured in the Solr index data:

(a) "anchor" *is* defined in the Solr schema.xml
(b) "anchor" is *NOT* defined in gora-hbase-mapping.xml
(c) "anchor" is *NOT* defined in solrindex-mapping.xml

So my question is: why is "anchor" not seen in the Solr index data even
though it is defined in the Solr schema.xml [(a) above]? And when you said
"mappings of the back-end", were you referring to gora-hbase-mapping.xml
[(b) above]?

Thanks for your help.



On Sat, Apr 19, 2014 at 4:28 AM, Sebastian Nagel <[email protected]> wrote:

> Hi,
>
> > 1. With ref to link below, what is the difference between '*site*' and
> > '*url*'?
> > Link [0]: http://wiki.apache.org/nutch/IndexStructure
>
> the field "site" (from indexing filter plugin index-basic)
> has been removed some time ago (since Nutch 1.5?) because
> it's an alias for "host". We need to update the mentioned
> wiki page, accordingly (done right now). Thanks for the hint!
>
> > 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will be
> > '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?
>
> Which fields are indexed does not depend on the storage mapping.
> Fields are filled by indexing filters, some "basic" fields also
> by the indexer itself. Which fields are sent to the indexing
> back-ends is defined in schema and mappings of the back-end.
>
> Sebastian
>
>
> On 04/19/2014 03:18 AM, A Laxmi wrote:
> > Hi,
> >
> > I am using Nutch 2.2.1 with HBase. I have a couple of questions about the
> > index fields in Nutch:
> >
> > 1. With ref to link below, what is the difference between '*site*' and
> > '*url*'?
> > Link [0]: http://wiki.apache.org/nutch/IndexStructure
> >
> > 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will be
> > '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?
> >
> >
> > Thanks for any help..
> >
>
>
