Thanks, Li.

On Thu, Dec 16, 2010 at 3:44 AM, Bing Li [via Lucene] <[email protected]> wrote:
> Hi, Kumar,
>
> Designing a crawler is not an easy job. It depends on your goals. The most
> complicated one is to crawl the entire Web.
>
> http://www.amazon.com/HTTP-Programming-Recipes-Java-Bots/dp/0977320669
>
> This book might give you a hand.
>
> Thanks,
> LB
>
> On Thu, Dec 16, 2010 at 12:28 AM, Anurag <[hidden email]> wrote:
>
> > Can you tell me how you designed your crawler? Was it by writing code like
> > CrawlDb.java
> > <http://www.docjar.com/html/api/org/apache/nutch/crawl/CrawlDb.java.html>?
> >
> > Writing your own crawler is important stuff, so I want to know.
> >
> > Thanks
> >
> > On Wed, Dec 15, 2010 at 9:56 AM, Bing Li [via Lucene] <[hidden email]> wrote:
> >
> > > Hi, all,
> > >
> > > I am a new Nutch user. Before I knew about Nutch, I designed a crawler
> > > myself. However, its quality is not good, so I decided to try Nutch.
> > >
> > > After reading some materials about Nutch, I noticed that Nutch puts all
> > > of the crawled pages into persistent Lucene indexes. In my project, I
> > > hope to get the crawled data in memory so that I can manipulate it in
> > > Java or C# collections. I don't want to retrieve the data from the
> > > indexes built by Nutch.
> > >
> > > Could you give me a solution to that? Thanks so much!
> > >
> > > Best regards,
> > > Li Bing
> > >
> > > ------------------------------
> > > View message @
> > > http://lucene.472066.n3.nabble.com/Get-Crawled-Data-in-Java-or-C-Collections-tp2089972p2089972.html
> >
> > --
> > Kumar Anurag
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Get-Crawled-Data-in-Java-or-C-Collections-tp2089972p2092990.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
>
> ------------------------------
> View message @
> http://lucene.472066.n3.nabble.com/Get-Crawled-Data-in-Java-or-C-Collections-tp2089972p2094945.html

--
Kumar Anurag

--
View this message in context: http://lucene.472066.n3.nabble.com/Get-Crawled-Data-in-Java-or-C-Collections-tp2089972p2097015.html
Sent from the Nutch - User mailing list archive at Nabble.com.

