Re: what crawler do you use for Solr indexing?

2009-03-13 Thread ristretto.rb
Hello, I built my own crawler with Python, as I couldn't find (not complaining, I probably didn't look hard enough) Nutch documentation. I use BeautifulSoup, because the site is mostly based on Python/Django, and we like Python. Writing one was good for us because we spent most of our time
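
A minimal sketch of the parse-and-index step such a home-grown crawler might perform, assuming Python 3. To keep it dependency-free it swaps BeautifulSoup for the standard library's `html.parser`. It only builds a Solr XML `<add>` message and does not send it. The function `to_solr_add_xml` and the field names `id`, `title` and `text` are hypothetical and would need to match your own Solr schema.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from xml.sax.saxutils import escape


class PageParser(HTMLParser):
    """Collects the page title, visible text and outgoing links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.text_parts = []
        self.links = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script/style contents
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


def to_solr_add_xml(url, html):
    """Parse one page; return (Solr <add> XML string, outgoing links).

    Hypothetical helper: field names must match your Solr schema.
    """
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    fields = {
        "id": url,
        "title": parser.title.strip(),
        "text": " ".join(parser.text_parts),
    }
    body = "".join(
        '<field name="%s">%s</field>' % (name, escape(value))
        for name, value in fields.items()
    )
    return "<add><doc>%s</doc></add>" % body, parser.links
```

In a real crawler the XML would be POSTed to the Solr update handler and followed by a `<commit/>`. The returned links would be queued for the next fetch.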

Re: what crawler do you use for Solr indexing?

2009-03-06 Thread Toby Cole
Hi Tony, Strangely I started looking into the Solr/Nutch integration yesterday so I might be able to help :) The documentation for it is very sparse, but the trunk of nutch does have the solr integration committed. If I remember correctly, what I had to do was... I went through one of

Re: what crawler do you use for Solr indexing?

2009-03-06 Thread Ryan McKinley
Also consider droids: http://incubator.apache.org/droids/ On Mar 5, 2009, at 6:32 PM, Tony Wang wrote: Hi, I wonder if there's any open source crawler product that could be integrated with Solr. What crawler do you guys use? or you coded one by yourself? I have been trying to find out

Re: what crawler do you use for Solr indexing?

2009-03-06 Thread Tony Wang
Thank you all so much! I sincerely appreciate the help received. Tony On Fri, Mar 6, 2009 at 5:02 AM, Toby Cole toby.c...@semantico.com wrote: Hi Tony, Strangely I started looking into the Solr/Nutch integration yesterday so I might be able to help :) The documentation for it is very

Re: what crawler do you use for Solr indexing?

2009-03-06 Thread Sean Timm
We too use Heritrix. We tried Nutch first but Nutch was not finding all of the documents that it was supposed to. When Nutch and Heritrix were both set to crawl our own site to a depth of three, Nutch missed some pages that were linked directly from the seed. We ended up with 10%-20% fewer pages
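
The Python sketch below makes "crawl to a depth of three" concrete with a depth-limited breadth-first traversal. It runs over a hypothetical in-memory link graph (`site`) rather than live HTTP fetches. Pages linked directly from the seed sit at depth 1, so any crawler that honours the limit should reach every one of them. This is not Heritrix's or Nutch's actual scheduling logic, and the two tools may count depth differently.

```python
from collections import deque


def crawl_bfs(seeds, get_links, max_depth):
    """Return {url: depth} for every page reachable within max_depth hops.

    Seeds are depth 0; pages linked from a seed are depth 1, and so on.
    get_links(url) stands in for fetching and parsing a page.
    """
    depth = {url: 0 for url in seeds}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if depth[url] == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link not in depth:  # in BFS the first visit is the shortest path
                depth[link] = depth[url] + 1
                queue.append(link)
    return depth


# Hypothetical site: "/deep" is four hops from the seed, so it is not reached.
site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"], "/a/1/x": ["/deep"]}
found = crawl_bfs(["/"], lambda url: site.get(url, []), max_depth=3)
```

Comparing the `{url: depth}` maps that two crawlers produce for the same seed is one way to spot pages that one of them missed.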

Re: what crawler do you use for Solr indexing?

2009-03-06 Thread Sean Timm
See http://crawler.archive.org/faq.html#new_writer For other Heritrix questions, this should probably go to the Heritrix list. -Sean Tony Wang wrote: Sean - I found Heritrix is pretty easy to set up. I am testing it on my server here http://66.197.161.133:8081, and am trying to create crawl

what crawler do you use for Solr indexing?

2009-03-05 Thread Tony Wang
Hi, I wonder if there's any open source crawler product that could be integrated with Solr. What crawler do you guys use? Or did you code one yourself? I have been trying to find a solution for Nutch/Solr integration, but haven't had any luck yet. Could someone shed some light on this for me? Thanks!

Re: what crawler do you use for Solr indexing?

2009-03-05 Thread Baalman, Laura A. (ARC-TI)[QSS GROUP INC]
We are using Heritrix, the Internet Archive’s open source crawler, which is very easy to extend. We have augmented it with a custom parser to crawl some specific data formats and coded our own processors (Heritrix’s terminology for extensions) to link together different data sources as well as

Re: what crawler do you use for Solr indexing?

2009-03-05 Thread Nick Tkach
Yes, Nutch works quite well as a crawler for Solr. - Original Message - From: Tony Wang ivyt...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, March 5, 2009 5:32:57 PM GMT -06:00 US/Canada Central Subject: what crawler do you use for Solr indexing? Hi, I wonder if there's any

Re: what crawler do you use for Solr indexing?

2009-03-05 Thread Tony Wang
To: solr-user@lucene.apache.org Sent: Thursday, March 5, 2009 5:32:57 PM GMT -06:00 US/Canada Central Subject: what crawler do you use for Solr indexing? Hi, I wonder if there's any open source crawler product that could be integrated with Solr. What crawler do you guys use? or you coded one

Re: what crawler do you use for Solr indexing?

2009-03-05 Thread Otis Gospodnetic
- Nutch - Original Message From: Tony Wang ivyt...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, March 5, 2009 6:32:57 PM Subject: what crawler do you use for Solr indexing? Hi, I wonder if there's any open source crawler product that could be integrated with Solr

Re: what crawler do you use for Solr indexing?

2009-03-05 Thread Tony Wang
that's where the integration is coming from. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Tony Wang ivyt...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, March 5, 2009 6:32:57 PM Subject: what crawler do you use for Solr

Re: what crawler do you use for Solr indexing?

2009-03-05 Thread Tony Wang
Message From: Tony Wang ivyt...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, March 5, 2009 6:32:57 PM Subject: what crawler do you use for Solr indexing? Hi, I wonder if there's any open source crawler product that could be integrated with Solr. What crawler do you

Re: what crawler do you use for Solr indexing?

2009-03-05 Thread Chris Hostetter
: with Solr. What crawler do you guys use? or you coded one by yourself? I neither -- i've never indexed crawled data with Solr, i only ever index structured data in one form or another. (the closest i've ever come to using a crawler with Solr is some ant tasks i whipped up one day to

Re: what crawler do you use for Solr indexing?

2009-03-05 Thread Tony Wang
Hi Hoss, But I cannot find documents about the integration of Nutch and Solr anywhere. Could you give me a clue? thanks Tony On Thu, Mar 5, 2009 at 11:14 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : with Solr. What crawler do you guys use? or you coded one by yourself? I