Apache Nutch Output structure

sanjay singh Thu, 01 Oct 2015 23:22:44 -0700

Hi,
I am trying to crawl a set of websites using Apache Nutch. I configured
Nutch with the required parameters. After crawling, I got several
segments as output, which I merged into one segment.
However, I still cannot make sense of the file structure in the output
or what each part of it means.
The merged segment contains the following directories:
content
crawl_fetch
crawl_generate
crawl_parse
parse_data
parse_text
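
To make the layout concrete, here is a minimal Python sketch that checks
which of these directories are present under a segment path. The function
name and the example path are my own placeholders, not part of Nutch; the
script only looks at directory names and does not read any Nutch data.

```python
import os

# Directory names taken from the listing of my merged segment above.
EXPECTED_SUBDIRS = [
    "content",
    "crawl_fetch",
    "crawl_generate",
    "crawl_parse",
    "parse_data",
    "parse_text",
]


def list_segment_subdirs(segment_path):
    """Map each expected directory name to True if it exists under segment_path."""
    return {
        name: os.path.isdir(os.path.join(segment_path, name))
        for name in EXPECTED_SUBDIRS
    }


def report(segment_path):
    """Print one 'name: present/missing' line per expected directory."""
    for name, present in list_segment_subdirs(segment_path).items():
        print(f"{name}: {'present' if present else 'missing'}")
```

Calling report("crawl/segments/merged") (a hypothetical path) would print
one line per directory, which is how I confirmed the list above.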


Can someone please explain the significance of these directories, or
point me to documentation that explains them in detail?


-- 
Regards,
Sanjay Singh, PICT Pune
