> hi
> after doing invert link i see the complete link graph...THANKS
>
> I m bit confused, please help me understand..
>
> I do crawl using crawl command. I see around 7000+ urls when i dump
> crawldb. Then i do invertlink and i see the complete link graph.
> After this i do solrindex.
>
> After solr indexing is completed i see only 2421 docs. I was expecting
> 7000+ docs (i.e exact number of unique urls which i got from dumping
> crawldb as text)

Did you consider URLs that responded with a non-200 HTTP response code? They are not sent to the index.

>
> Why i just see 2421 urls/docs in solr?
> Do i need to execute crawl again after invertlink?

No :)

>
> Here are some settings
> --------------------------------------------------------------
> <name>db.update.max.inlinks</name>
> <value>10000</value>
>
> <name>db.ignore.internal.links</name>
> <value>false</value>
>
> <name>db.max.inlinks</name>
> <value>10000</value>
>
> <name>db.max.outlinks.per.page</name>
> <value>-1</value>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/readdblink-not-showing-alllinks-tp3274127p3278779.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
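
To see how many of the 7000+ URLs were actually fetched, you can tally the status of each record in the crawldb text dump. Below is a rough Python sketch, not part of Nutch. It assumes each record in the dump has a line of the form "Status: 2 (db_fetched)"; check that against your own dump, since the exact layout can differ between versions. The function names are made up for illustration.

```python
import re
from collections import Counter

# Assumption: each crawldb dump record has a line like "Status: 2 (db_fetched)".
STATUS_RE = re.compile(r"^Status:\s*\d+\s*\((\w+)\)")


def count_statuses(lines):
    """Count how often each status name appears in the given dump lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def print_status_counts(path):
    """Read a crawldb text dump file and print one count per status."""
    with open(path, encoding="utf-8", errors="replace") as dump:
        for status, total in count_statuses(dump).most_common():
            print(f"{total:8d}  {status}")
```

Call print_status_counts() with the path to a dump file (for example a part file inside the dump directory; the exact name depends on your setup). If the db_fetched count is close to 2421, the remaining URLs were never fetched successfully, which would explain the difference. Running readdb with -stats on the crawldb should give a similar summary without any scripting.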

