Hi Sebastian,
> (d) anchor is filled by index-anchor.
> Is the index-anchor plugin active?

Could you please tell me how I can check whether the index-anchor plugin is
active? I do see it listed under plugin.includes as index - basic|anchor|more.
Is there any other way to check if it is active?

> There will be also no anchors, if there are no inlinks at all, or
> no inlinks from different hosts if the property "db.ignore.internal.links"
> is true (default).

I have "db.ignore.internal.links" set to *false*.

Thanks for your help!

On Mon, Apr 21, 2014 at 4:09 PM, Sebastian Nagel <[email protected]> wrote:

> > From what you mentioned, if I understand it correctly - I am going to take
> > "anchor" as an example as I don't see that being captured in Solr index
> > data -
> >
> > (a) "anchor" *is* defined under Solr - schema.xml
> > (b) "anchor" *is NOT* defined in gora-hbase-mapping.xml
> > (c) "anchor" *is NOT* defined in solrindex-mapping.xml
> >
> > So, my question would be - why is "anchor" not seen in Solr index data
> > ...
>
> (d) anchor is filled by index-anchor.
> Is the index-anchor plugin active?
> There will be also no anchors, if there are no inlinks at all, or
> no inlinks from different hosts if the property "db.ignore.internal.links"
> is true (default).
>
> Sebastian
>
> On 04/21/2014 04:38 PM, A Laxmi wrote:
> > Hi Sebastian,
> >
> >> 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will
> >> be '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?
> >
> > *Which fields are indexed does not depend on the storage mapping. Fields
> > are filled by indexing filters, some "basic" fields also by the indexer
> > itself. Which fields are sent to the indexing back-ends is defined in
> > schema and mappings of the back-end.*
> >
> > From what you mentioned, if I understand it correctly - I am going to take
> > "anchor" as an example as I don't see that being captured in Solr index
> > data -
> >
> > (a) "anchor" *is* defined under Solr - schema.xml
> > (b) "anchor" *is NOT* defined in gora-hbase-mapping.xml
> > (c) "anchor" *is NOT* defined in solrindex-mapping.xml
> >
> > So, my question would be - why is "anchor" not seen in Solr index data
> > though it is defined in Solr - schema.xml [(a) from above]? When you said
> > mappings of the back-end - are you referring to gora-hbase-mapping.xml
> > [(b) from above]?
> >
> > Thanks for your help.
> >
> > On Sat, Apr 19, 2014 at 4:28 AM, Sebastian Nagel <[email protected]>
> > wrote:
> >
> >> Hi,
> >>
> >>> 1. With ref to link below, what is the difference between '*site*' and
> >>> '*url*'?
> >>> Link [0]: http://wiki.apache.org/nutch/IndexStructure
> >>
> >> the field "site" (from indexing filter plugin index-basic)
> >> has been removed some time ago (since Nutch 1.5?) because
> >> it's an alias for "host". We need to update the mentioned
> >> wiki page, accordingly (done right now). Thanks for the hint!
> >>
> >>> 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will
> >>> be '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?
> >>
> >> Which fields are indexed does not depend on the storage mapping.
> >> Fields are filled by indexing filters, some "basic" fields also
> >> by the indexer itself. Which fields are sent to the indexing
> >> back-ends is defined in schema and mappings of the back-end.
> >>
> >> Sebastian
> >>
> >> On 04/19/2014 03:18 AM, A Laxmi wrote:
> >>> Hi,
> >>>
> >>> I am using Nutch 2.2.1 with HBase. I have a couple of questions about
> >>> the index fields in Nutch:
> >>>
> >>> 1. With ref to link below, what is the difference between '*site*' and
> >>> '*url*'?
> >>> Link [0]: http://wiki.apache.org/nutch/IndexStructure
> >>>
> >>> 2. From gora-hbase-mapping.xml - with ref to Link[0] above, '*url*' will
> >>> be '*baseUrl*' in hbase (data Nutch stores). What will be '*site*'?
> >>>
> >>> Thanks for any help.
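
P.S. One way to double-check the plugin question from the top of this mail is to test the plugin.includes value against the plugin id directly. The snippet below is only a rough sketch under assumptions: it assumes Nutch treats plugin.includes as a regular expression that must match the whole plugin id, it ignores plugin.excludes, and it uses Python's regex engine rather than Java's (the two agree for simple patterns like this one). SAMPLE_CONF and both helper functions are made up for illustration; the sample value is not my real configuration.

```python
import re
import xml.etree.ElementTree as ET


def read_property(conf_xml, name):
    """Return the value of a named <property> in Hadoop-style config XML, or None."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def plugin_is_included(includes_regex, plugin_id):
    """Assumption: a plugin is active only if the regex matches its whole id."""
    return re.fullmatch(includes_regex, plugin_id) is not None


# Hypothetical stand-in for the relevant part of conf/nutch-site.xml.
SAMPLE_CONF = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|more)</value>
  </property>
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
</configuration>"""

includes = read_property(SAMPLE_CONF, "plugin.includes")
print("index-anchor included:", plugin_is_included(includes, "index-anchor"))
print("db.ignore.internal.links:", read_property(SAMPLE_CONF, "db.ignore.internal.links"))
```

To try it against a real setup, the contents of nutch-site.xml could be read from disk and passed in place of SAMPLE_CONF. A property missing there would fall back to nutch-default.xml, which this sketch does not look at.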

