> Hi,
> After running invertlinks I see the complete link graph. Thanks!
> 
> I'm a bit confused, please help me understand.
> 
> I crawl using the crawl command. I see 7000+ URLs when I dump the
> crawldb. Then I run invertlinks and see the complete link graph.
> After this I run solrindex.
> 
> After Solr indexing completes I see only 2421 docs. I was expecting
> 7000+ docs (i.e. the exact number of unique URLs I got from dumping
> the crawldb as text).

Did you consider URLs that responded with a non-200 HTTP response code? They
are not sent to the index.
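
To see how many of your 7000+ URLs fall into that category, you could tally the status of every record in the crawldb text dump you already have. The script below is only a rough sketch: it assumes each record in the dump has a line of the form `Status: 2 (db_fetched)`, and the function name `count_statuses` and the dump path in the usage comment are made up for illustration. Adjust the pattern if your dump looks different.

```python
import re
import sys
from collections import Counter

# Assumption: each record in the crawldb text dump contains a line such as
# "Status: 2 (db_fetched)". Change the pattern if your dump differs.
STATUS_LINE = re.compile(r"^Status:\s*\d+\s*\((\w+)\)")


def count_statuses(lines):
    """Tally the status names found in an iterable of dump lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_statuses.py crawldb_dump/part-00000
    with open(sys.argv[1], encoding="utf-8", errors="replace") as dump:
        for status, n in count_statuses(dump).most_common():
            print(f"{n:8d}  {status}")
```

If the count of fetched records is close to 2421, the remaining URLs are most likely ones that were never fetched successfully, which would account for the gap you see in Solr.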

> 
> Why do I see only 2421 URLs/docs in Solr?
> Do I need to run the crawl again after invertlinks?

No :)

> 
> Here are some settings
> --------------------------------------------------------------
>   <name>db.update.max.inlinks</name>
>   <value>10000</value>
> 
>   <name>db.ignore.internal.links</name>
>   <value>false</value>
> 
>   <name>db.max.inlinks</name>
>   <value>10000</value>
> 
>   <name>db.max.outlinks.per.page</name>
>   <value>-1</value>
> 
> 
