Get Crawled Data in Java or C# Collections

Bing Li Tue, 14 Dec 2010 20:26:17 -0800

Hi, all,

I am a new Nutch user. Before learning about Nutch, I designed a crawler
myself. However, its quality was not good, so I decided to try Nutch.


After reading some material about Nutch, I noticed that Nutch puts all of
the crawled pages into persistent Lucene indexes. In my project, I hope to
get the crawled data in memory so that I can manipulate it in Java or C#
collections. I don't want to retrieve it from the indexes built by Nutch.

Could you suggest a solution for that? Thanks so much!

Best regards,
Li Bing
