Hi,

> 1. With ref to link below, what is the difference between '*site*' and
> '*url*'?
> Link [0]: http://wiki.apache.org/nutch/IndexStructure
The field "site" (from the indexing filter plugin index-basic) was removed some time ago (since Nutch 1.5?) because it is an alias for "host". It held the same value as "host", i.e. the host name taken from the URL, not the full URL. The wiki page you linked needed to be updated accordingly; I have just done that. Thanks for the hint!

> 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will be
> '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?

Which fields are indexed does not depend on the storage mapping. Fields are filled by indexing filters, and some "basic" fields also by the indexer itself. Which fields are sent to the indexing back-ends is defined in the schema and the field mappings of the back-end.

Sebastian

On 04/19/2014 03:18 AM, A Laxmi wrote:
> Hi,
>
> I am using Nutch 2.2.1 with HBase. I have couple of questions about the
> index fields in Nutch:
>
> 1. With ref to link below, what is the difference between '*site*' and
> '*url*'?
> Link [0]: http://wiki.apache.org/nutch/IndexStructure
>
> 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will be
> '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?
>
>
> Thanks for any help..
>
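The indexing flow described in the reply can be modelled in a few lines. This is a hypothetical Python sketch, not Nutch's Java code or its plugin API: the names `basic_filter`, `legacy_site_filter`, `build_document` and `apply_backend_mapping` are made up for illustration. It shows two points from the reply: the old "site" field was only a copy of "host" (the host part of the URL, not the URL itself), and what reaches the back-end is decided by the back-end's field mapping, not by the storage mapping in gora-hbase-mapping.xml.

```python
from urllib.parse import urlparse

# Hypothetical model of the indexing flow; NOT Nutch's real API.
# The storage mapping (e.g. baseUrl in gora-hbase-mapping.xml) plays no
# role here: fields come from filters, and the back-end mapping selects them.


def basic_filter(doc, url):
    """Mimic a 'basic' indexing filter: derive 'url' and 'host' from the URL."""
    doc["url"] = url
    doc["host"] = urlparse(url).hostname  # host part only, e.g. "wiki.apache.org"
    return doc


def legacy_site_filter(doc, url):
    """Old behaviour (illustrative): 'site' was just an alias of 'host'."""
    doc["site"] = doc["host"]
    return doc


def build_document(url, filters):
    """Run each indexing filter in order; each may add fields to the document."""
    doc = {}
    for indexing_filter in filters:
        doc = indexing_filter(doc, url)
    return doc


def apply_backend_mapping(doc, mapping):
    """Send only fields the back-end mapping knows, renamed source -> dest."""
    return {dest: doc[src] for src, dest in mapping.items() if src in doc}


if __name__ == "__main__":
    page_url = "http://wiki.apache.org/nutch/IndexStructure"
    doc = build_document(page_url, [basic_filter, legacy_site_filter])
    print(doc)  # 'site' and 'host' carry the same value; 'url' is the full URL
    # Hypothetical back-end mapping that no longer lists "site".
    print(apply_backend_mapping(doc, {"url": "url", "host": "host"}))
```

Because the mapping step only copies fields it lists, dropping "site" from the mapping (or from the filters) removes it from what the back-end receives, regardless of how the page is stored in HBase.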

