Can you tell me how you designed your crawler? Was it by writing code like
CrawlDb.java<http://www.docjar.com/html/api/org/apache/nutch/crawl/CrawlDb.java.html>?

Writing your own crawler is important work, and I would like to know how you did it.

Thanks

On Wed, Dec 15, 2010 at 9:56 AM, Bing Li [via Lucene] wrote:

> Hi, all,
>
> I am a new Nutch user. Before learning about Nutch, I designed a crawler
> myself. However, its quality was not good, so I decided to try Nutch.
>
> However, after reading some materials about Nutch, I notice that Nutch puts
> all of crawled pages into persistent Lucene indexes. In my project, I hope I
> could get crawled data in memory. So I can manipulate them in Java or C#
> collections. I don't want to retrieve the indexes crawled by Nutch.
>
> Could you give me a solution to that? Thanks so much!
>
> Best regards,
> Li Bing



-- 
Kumar Anurag


