Thanks, Li.
On Thu, Dec 16, 2010 at 3:44 AM, Bing Li [via Lucene] <[email protected]> wrote:

> Hi, Kumar,
>
> Designing a crawler is not an easy job. It depends on your goals. The most
> complicated case is crawling the entire Web.
>
> http://www.amazon.com/HTTP-Programming-Recipes-Java-Bots/dp/0977320669
>
> This book might give you a hand.
>
> Thanks,
> LB
>
> On Thu, Dec 16, 2010 at 12:28 AM, Anurag <[hidden email]> wrote:
>
> >
> > Can you tell me how you designed your crawler? Was it by writing code like
> > CrawlDb.java
> > <http://www.docjar.com/html/api/org/apache/nutch/crawl/CrawlDb.java.html>?
> >
> > Writing your own crawler is an important topic, and I would like to know.
> >
> > Thanks
> >
> > On Wed, Dec 15, 2010 at 9:56 AM, Bing Li [via Lucene] <[hidden email]>
> > wrote:
> >
> > > Hi, all,
> > >
> > > I am a new Nutch user. Before learning about Nutch, I designed a crawler
> > > myself. However, its quality is not good, so I decided to try Nutch.
> > >
> > > However, after reading some materials about Nutch, I noticed that Nutch
> > > puts all crawled pages into persistent Lucene indexes. In my project, I
> > > would like to get the crawled data in memory so that I can manipulate it
> > > in Java or C# collections. I don't want to retrieve the indexes created
> > > by Nutch.
> > >
> > > Could you give me a solution to that? Thanks so much!
> > >
> > > Best regards,
> > > Li Bing
> > >
> > >
> > > ------------------------------
> > > View message @
> > > http://lucene.472066.n3.nabble.com/Get-Crawled-Data-in-Java-or-C-Collections-tp2089972p2089972.html
> > >
> > >
> >
> >
> >
> > --
> > Kumar Anurag
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Get-Crawled-Data-in-Java-or-C-Collections-tp2089972p2092990.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.
> >
>
>
> ------------------------------
>  View message @
> http://lucene.472066.n3.nabble.com/Get-Crawled-Data-in-Java-or-C-Collections-tp2089972p2094945.html
>
>
>



-- 
Kumar Anurag



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-Crawled-Data-in-Java-or-C-Collections-tp2089972p2097015.html
Sent from the Nutch - User mailing list archive at Nabble.com.
