Hello,
I built my own crawler with Python, as I couldn't find the Nutch
documentation (not complaining, I probably didn't look hard enough).
I use BeautifulSoup, because the site is mostly based on Python/Django,
and we like Python.
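For readers who want to see what a home-grown crawler step might look like, here is a minimal sketch, not the poster's actual code. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages. The function names and the Solr field names ("id", "title", "text") are assumptions for illustration and would have to match your own schema. Fetching pages and POSTing the body to Solr's update handler are left out.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the title, visible text, and outgoing links of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self.title = ""
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


def page_to_solr_doc(url, html):
    """Return (document dict, outgoing links) for one fetched page.

    The field names are hypothetical; adjust them to your Solr schema.
    """
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    doc = {
        "id": url,
        "title": parser.title.strip(),
        "text": " ".join(parser.text_parts),
    }
    return doc, parser.links


def solr_update_body(docs):
    """Serialize documents as a JSON array for a Solr update request."""
    return json.dumps(docs)
```

The returned links would feed the crawl queue, and the serialized body is what a crawler would send to Solr in an update request with a JSON content type.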
Writing one was good for us because we spent most of our time
Hi Tony,
Strangely, I started looking into the Solr/Nutch integration yesterday,
so I might be able to help :)
The documentation for it is very sparse, but the trunk of Nutch does
have the Solr integration committed.
If I remember correctly, what I had to do was...
I went through one of
Also consider Droids:
http://incubator.apache.org/droids/
On Mar 5, 2009, at 6:32 PM, Tony Wang wrote:
Hi,
I wonder if there's any open source crawler product that could be
integrated
with Solr. What crawler do you guys use? or you coded one by
yourself? I
have been trying to find out
Thank you all so much! I sincerely appreciate the help received. Tony
On Fri, Mar 6, 2009 at 5:02 AM, Toby Cole toby.c...@semantico.com wrote:
Hi Tony,
Strangely I started looking into the Solr/Nutch integration
yesterday so I might be able to help :)
The documentation for it is very
We too use Heritrix. We tried Nutch first but Nutch was not finding all
of the documents that it was supposed to. When Nutch and Heritrix were
both set to crawl our own site to a depth of three, Nutch missed some
pages that were linked directly from the seed. We ended up with 10%-20%
fewer pages
See http://crawler.archive.org/faq.html#new_writer
Other Heritrix questions should probably go to the Heritrix list.
-Sean
Tony Wang wrote:
Sean -
I found Heritrix is pretty easy to set up. I am testing it on my server here
http://66.197.161.133:8081, and am trying to create crawl
Hi,
I wonder if there's any open source crawler product that could be integrated
with Solr. What crawler do you guys use? Or did you code one yourself? I
have been trying to find solutions for Nutch/Solr integration, but
haven't had any luck yet.
Could someone shed some light?
thanks!
We are using Heritrix, the Internet Archive’s open source crawler, which is
very easy to extend. We have augmented it with a custom parser to crawl some
specific data formats and coded our own processors (Heritrix’s terminology for
extensions) to link together different data sources as well as
Yes, Nutch works quite well as a crawler for Solr.
----- Original Message -----
From: Tony Wang ivyt...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, March 5, 2009 5:32:57 PM GMT -06:00 US/Canada Central
Subject: what crawler do you use for Solr indexing?
Hi,
I wonder if there's any
- Nutch
----- Original Message -----
From: Tony Wang ivyt...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, March 5, 2009 6:32:57 PM
Subject: what crawler do you use for Solr indexing?
Hi,
I wonder if there's any open source crawler product that could be integrated
with Solr
that's where the integration
is coming from.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Tony Wang ivyt...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, March 5, 2009 6:32:57 PM
Subject: what crawler do you use for Solr
: with Solr. What crawler do you guys use? or you coded one by yourself? I
neither -- i've never indexed crawled data with Solr, i only ever index
structured data in one form or another.
(the closest i've ever come to using a crawler with Solr is some ant tasks
i whipped up one day to
Hi Hoss,
But I cannot find any documentation on the integration of Nutch and Solr
anywhere. Could you give me a clue? Thanks
Tony
On Thu, Mar 5, 2009 at 11:14 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: with Solr. What crawler do you guys use? or you coded one by yourself? I