This is all great news. It sounds like Nutch 2.0 is more of what I am
looking for.

Any idea on the timeline for the first Nutch 2.0 release? I would love
to get involved.

Are the 19 unresolved issues for the 2.0 release in JIRA the only
things that need to be tackled?

I can move this conversation over to the DEV mailing list if you would like.

Thanks!

Jeremy


On Thu, Feb 24, 2011 at 5:40 PM, Markus Jelsma
<[email protected]> wrote:
>> Thanks for the reply Mark.
>>
>> So this means Nutch is really only going to be used for crawling now?
>> Are there any plans for a JSON/XML RPC interface to using Nutch like
>> Solr supports?
>
> Yes, Nutch is going to focus on the fetch and parse jobs. Andrzej was working
> on a REST interface to control these jobs. This is part of 2.0.
>
>>
>> I am interested in a tight app integration where I can easily start
>> crawls of new sites, and add/remove things from the index quickly. I
>> guess I can rely directly on Solr for adding/removing from the index
>> as well, or would you recommend this going through nutch?
>
> Items can be removed from the index through either Solr or Nutch. Solr provides
> easy methods to delete individual documents or all documents matching a
> query. Nutch can deduplicate (1.2+ and 2.0) and possibly remove 404 pages (1.3
> and 2.0), but the latter has not been committed yet.
>
>>
>>
>> Thanks,
>> Jeremy
>>
>> On Thu, Feb 24, 2011 at 12:23 PM, Markus Jelsma
>> <[email protected]> wrote:
>> > Hi Jeremy,
>> >
>> > Nutch's own search server is in the process of being deprecated; Nutch 1.2
>> > was the last release to provide the search server. Please consider using
>> > Apache Solr as your search server.
>> >
>> > Cheers,
>> >
>> >> I recently installed Nutch and have spent some time trying to get it
>> >> working with limited success.
>> >>
>> >> ./nutch crawl urls -dir crawl -depth 5 -topN 50
>> >>
>> >> After the crawl completes I am trying to run the web frontend with the
>> >> following command:
>> >>
>> >> ./nutch server 8080 crawl
>> >>
>> >> The server seems to be running (no output on the command line), but
>> >> when I hit localhost:8080 I get an Error 324 (net::ERR_EMPTY_RESPONSE):
>> >> Unknown error. Any ideas on how to get past this?
>> >>
>> >> I've been using this tutorial to get started:
>> >> http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine
>> >>
>> >>
>> >> Thanks,
>> >> Jeremy
>
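
As a rough illustration of the point above that Solr can remove a single document or every document matching a query, here is a minimal sketch that builds the two XML delete messages Solr's update handler accepts. The function names, the SOLR_UPDATE_URL variable, the default URL, and the example query are assumptions for illustration and do not come from this thread; adjust them to your own setup.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Nutch or Solr).
# They only build the XML delete messages; nothing is sent to a server here.
# Assumption: the arguments contain no XML special characters (&, <, >).

# Build a message that deletes the document with the given unique key.
solr_delete_by_id() {
    printf '<delete><id>%s</id></delete>' "$1"
}

# Build a message that deletes every document matching the given query.
solr_delete_by_query() {
    printf '<delete><query>%s</query></delete>' "$1"
}

# Assumed update endpoint of a default single-core Solr install.
SOLR_UPDATE_URL="${SOLR_UPDATE_URL:-http://localhost:8983/solr/update}"

# Example (left commented out so the script has no side effects):
# curl "$SOLR_UPDATE_URL?commit=true" -H 'Content-Type: text/xml' \
#      --data-binary "$(solr_delete_by_query 'host:example.com')"
```

Keeping message construction separate from the curl call makes it easy to inspect what would be sent before anything is actually deleted from the index.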
