Re: Fetcher2 Slow

2009-03-17 Thread Roger Dunk
Now that the soon-to-be-released v1 uses Fetcher2 as the default (or as the only fetcher available?), I would think that the slowness problem facing a number of users might be addressed. In short, my case is like this: Nutch trunk revision 755143, JDK 1.6_12 on Linux, Crawl list

Re: Fetcher2 Slow

2009-03-17 Thread Roger Dunk
Andrzej stated in NUTCH-669 that some people reported performance issues with Fetcher2, i.e. that it doesn't use the available bandwidth. These reports are unconfirmed, and they may have been caused by suboptimal URL / host distribution in a fetchlist - but it would be good to review the

Re: nutch 0.7

2009-03-17 Thread Mayank Kamthan
Hello people, please provide a pointer to the 0.7 release. I need it urgently. Thanks and regards, Mayank. On Mon, Mar 16, 2009 at 2:23 PM, Mayank Kamthan mkamt...@gmail.com wrote: Hi! I need Nutch 0.7. Can someone please provide me a pointer to download it? When I try via the Apache site it

Re: nutch 0.7

2009-03-17 Thread W
Just check out the code from the svn branch and build it yourself; I think it's easy enough. On Tue, Mar 17, 2009 at 5:21 PM, Mayank Kamthan mkamt...@gmail.com wrote: Hello people, please provide a pointer to the 0.7 release. I need it urgently. Thanks and regards, Mayank. On Mon, Mar 16,

Re: Fetcher2 Slow

2009-03-17 Thread Andrzej Bialecki
Roger Dunk wrote: Andrzej stated in NUTCH-669 that some people reported performance issues with Fetcher2, i.e. that it doesn't use the available bandwidth. These reports are unconfirmed, and they may have been caused by suboptimal URL / host distribution in a fetchlist - but it would be good
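To see why host distribution in the fetchlist can matter more than thread count: Fetcher2 keeps per-host queues and waits between requests to the same host (the politeness delay, `fetcher.server.delay`), so a fetchlist dominated by one or two hosts leaves most threads idle. The sketch below is a deliberately simplified model, not Fetcher2's code: it assumes one connection per host, a fixed delay between requests (5 seconds here, an illustrative value), and ignores download time, giving a lower bound set by the busiest host.

```java
import java.util.HashMap;
import java.util.Map;

public class FetchTimeBound {
    // Lower bound (seconds) on fetch time in a simplified model: one connection
    // per host, delaySeconds between requests to the same host, download time ignored.
    // Adding threads cannot beat the queue of the busiest host.
    static double lowerBoundSeconds(Map<String, Integer> urlsPerHost, double delaySeconds) {
        int busiest = 0;
        for (int n : urlsPerHost.values()) {
            busiest = Math.max(busiest, n);
        }
        // n requests to one host need (n - 1) politeness gaps between them.
        return busiest == 0 ? 0.0 : (busiest - 1) * delaySeconds;
    }

    public static void main(String[] args) {
        Map<String, Integer> skewed = new HashMap<>();
        skewed.put("a.example.com", 1000);
        skewed.put("b.example.com", 10);

        Map<String, Integer> balanced = new HashMap<>();
        for (int i = 0; i < 101; i++) {
            balanced.put("host" + i + ".example.com", 10);
        }

        // Both fetchlists hold 1010 URLs; only the host distribution differs.
        System.out.println("skewed:   " + lowerBoundSeconds(skewed, 5.0) + " s");
        System.out.println("balanced: " + lowerBoundSeconds(balanced, 5.0) + " s");
    }
}
```

With the same 1010 URLs, the skewed list cannot finish in under 4995 seconds, while the balanced one has a floor of 45 seconds, regardless of how many fetcher threads are configured.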

Original tags, attribute defs, multiword tokens, how is this done.

2009-03-17 Thread Lukas, Ray
I have some basic questions about Nutch. Can someone point me in the right direction, or if you have time, maybe just blast out an answer. Question One: I can see the terms that come from the web page. Can I set up a way to also add these things to the index? In other words, if ice cream came

Re: Original tags, attribute defs, multiword tokens, how is this done.

2009-03-17 Thread vishal vachhani
pls see the inline comments!! On Tue, Mar 17, 2009 at 7:34 PM, Lukas, Ray ray.lu...@idearc.com wrote: I have some basic questions about Nutch. Can someone point me in the right direction, or if you have time, maybe just blast out an answer. Question One: I can see the terms that come from

Re: Original tags, attribute defs, multiword tokens, how is this done.

2009-03-17 Thread Eric J. Christeson
On Mar 17, 2009, at 9:04 AM, Lukas, Ray wrote: Question Four (I will start hunting for this): Last one, promise. The indexes themselves: is there an explanation written up for each of the fields in the index? http://wiki.apache.org/nutch/IndexStructure is the closest thing I've found

wild card query in nutch

2009-03-17 Thread Raagu
Hello people, I have used nutch-0.9 to crawl my application. While searching, it does not return results for a query that is only part of a word. For example, the word Message is indexed, and the search query is essa; it does not find Message, and hence gives no results. So
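This is what a term-based index does by design: Lucene (which Nutch 0.9 builds on) stores whole tokens such as `message`, and a plain keyword query looks up an exact term, so `essa` matches nothing. Matching a fragment inside a word means visiting every term in the dictionary, which is roughly what a leading-wildcard query like `*essa*` costs in Lucene. The toy inverted index below is a hypothetical illustration of that difference in plain Java, not Nutch or Lucene code.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ToyTermIndex {
    // term -> ids of documents containing it (a toy inverted index)
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    void add(int docId, String text) {
        // Lowercase and split on non-alphanumerics, roughly like a simple analyzer.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Exact term lookup: what a plain keyword query does.
    Set<Integer> termQuery(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Substring match: must scan every term, so cost grows with vocabulary size.
    Set<Integer> substringQuery(String fragment) {
        String f = fragment.toLowerCase(Locale.ROOT);
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : postings.entrySet()) {
            if (e.getKey().contains(f)) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyTermIndex idx = new ToyTermIndex();
        idx.add(1, "Message queue settings");
        idx.add(2, "Error codes");
        System.out.println(idx.termQuery("essa"));      // []
        System.out.println(idx.substringQuery("essa")); // [1]
    }
}
```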

Re: Index Disaster Recovery

2009-03-17 Thread Eric J. Christeson
On Mar 16, 2009, at 7:55 PM, Otis Gospodnetic wrote: Eric, There are a couple of ways you can back up a Lucene index built by Solr: 1) have a look at the Solr replication scripts, specifically snapshooter. This script creates a snapshot of an index. It's typically triggered by Solr
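The snapshot idea works because Lucene writes each segment file once and does not modify it afterwards, so a snapshot can be a directory of hard links rather than a full copy; snapshooter relies on the same hard-link trick. The sketch below does this with plain Java NIO under stated assumptions (a flat index directory, a filesystem that supports hard links, no commit or merge running while the links are made); it is an illustration, not a replacement for the Solr scripts.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IndexSnapshot {
    // Creates snapshotDir and fills it with hard links to the files in indexDir.
    // Assumes a flat index directory and that no writer is committing meanwhile.
    static void snapshot(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                // Skip subdirectories and the writer's lock file, if one is present.
                if (!Files.isRegularFile(file) || file.getFileName().toString().equals("write.lock")) {
                    continue;
                }
                // A hard link shares the file's data: no extra disk space, and later
                // deletions from indexDir leave the snapshot's copy intact.
                Files.createLink(snapshotDir.resolve(file.getFileName()), file);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path index = Paths.get(args.length > 0 ? args[0] : "index");
        Path snap = index.resolveSibling("snapshot." + System.currentTimeMillis());
        snapshot(index, snap);
        System.out.println("snapshot written to " + snap);
    }
}
```

The snapshot directory can then be copied off the machine at leisure, since the live index may delete its own old files without affecting the links.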

Professional Nutch Support and Distribution

2009-03-17 Thread Dennis Kubes
Wanted to gauge community interest in having a certified Nutch distribution with support? Similar to what Lucid Imagination is doing for Solr and Lucene and what Cloudera is providing for Hadoop. Anybody interested? Dennis

Re: Task failed to report status when merging segments

2009-03-17 Thread Justin Yao
I raised heap size to 2GB for each child in mapred.child.java.opts and the segment merging succeeded. Justin Yao wrote: Hi, I encountered an error when I tried to merge segments using the latest nightly build of Nutch. I have 3 Hadoop nodes and all servers have CentOS 5.2 installed. Every time
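One thing to keep in mind when raising `mapred.child.java.opts`: the setting applies to every task JVM a TaskTracker launches, so worst-case heap per node is roughly the child heap times the number of task slots. The helper below is a hypothetical sanity check: it reads the `-Xmx` value out of an options string such as `-Xmx2048m` and multiplies by an assumed slot count (2 map + 2 reduce here; substitute your `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` values).

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChildHeapCheck {
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    // Returns the -Xmx value in megabytes, or -1 if the options string has none.
    static long xmxMegabytes(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        long result = -1;
        while (m.find()) { // keep the last -Xmx, which is the one the JVM applies
            long n = Long.parseLong(m.group(1));
            switch (m.group(2).toLowerCase(Locale.ROOT)) {
                case "g": result = n * 1024; break;
                case "m": result = n; break;
                case "k": result = n / 1024; break;
                default:  result = n / (1024 * 1024); break; // plain bytes
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String opts = "-Xmx2048m";
        int mapSlots = 2, reduceSlots = 2; // assumed slot counts; use your cluster's values
        long perTask = xmxMegabytes(opts);
        System.out.println("worst case per node: " + perTask * (mapSlots + reduceSlots) + " MB of heap");
    }
}
```

With a 2GB child heap and four slots, a node needs about 8GB for task heaps alone, before the TaskTracker, DataNode and OS are counted.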

nutch - solr integration advantages

2009-03-17 Thread Bartosz Gadzimski
Hello, It's hard for me to get the big picture of why to use Solr for indexing and searching. Could someone try to describe this a little? Do I understand correctly that Nutch does the crawling and Solr just the indexing and searching? Any help would be great. Thanks, Bartosz

Re: nutch - solr integration advantages

2009-03-17 Thread Andrew Smith
Hello Bartosz. I can only really describe my own experiences and what I have done with Nutch/Solr is pretty simple. My reasons for using Nutch/Solr were that the query interface to solr is more powerful (Nutch is optimised for speed instead) and that I felt that it would be easier for me to

Re: The Future of Nutch

2009-03-17 Thread Marc Boucher
Dennis, Otis et al, My very small team has kept silent for a long time. We've been playing with Nutch, Hadoop and to a lesser extent Solr for about 2 years now. Before I get into my thoughts on what direction things should take I would like to offer a thought on why Nutch is not as active as

Re: Professional Nutch Support and Distribution

2009-03-17 Thread Marc Boucher
This sounds interesting. I might be interested in this. Marc Boucher http://hyperix.com On Tue, Mar 17, 2009 at 12:31 PM, Dennis Kubes ku...@apache.org wrote: Wanted to gauge community interest in having a certified Nutch distribution with support? Similar to what Lucid Imagination is doing

Re: The Future of Nutch

2009-03-17 Thread Dennis Kubes
Marc, Glad you responded. Always good to hear people's thoughts. Marc Boucher wrote: Dennis, Otis et al, My very small team has kept silent for a long time. We've been playing with Nutch, Hadoop and to a lesser extent Solr for about 2 years now. Before I get into my thoughts on what direction

Re: The Future of Nutch

2009-03-17 Thread Marc Boucher
Dennis, That adds another dimension to the issue which I had not considered. One avenue as you suggest would be to add another committer to the Lucene PMC. If that does not work then maybe going the route of TLP is the best option. Marc Part of this is about releases. Currently releases are

Nutch-based Application for Windows

2009-03-17 Thread John Whelan
Hi All, For fun, I created a Windows-based installer for Nutch and added an administrative GUI to it. If interested, you can grab it from FreewareFiles: http://www.freewarefiles.com/WhelanLabs-Search-Engine-Manager_program_47202.html Regards, John

embed nutch crawl in an application

2009-03-17 Thread n_developer
Generally, the Nutch crawl is done through Cygwin. If I don't want to run Cygwin, and I want to crawl an application from an application of my own, what can I do? Also, I want Nutch to perform wildcard query search (as in, if the search query is book*, then it should return all search results which contain isbn
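A trailing-wildcard query such as `book*` is much cheaper than a substring search: in a sorted term dictionary you can jump to the first term at or after `book` and stop at the first term that no longer starts with it, which is in outline how Lucene's `PrefixQuery` enumerates terms. The sketch below shows only that lookup, with a `TreeSet` standing in for the index's term list; it is an illustration, not Nutch's query handling.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixLookup {
    // Returns every term starting with prefix by walking the sorted terms:
    // start at the first term >= prefix, stop at the first one that doesn't match.
    static SortedSet<String> expand(NavigableSet<String> terms, String prefix) {
        SortedSet<String> matches = new TreeSet<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break;
            }
            matches.add(t);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
                "apple", "book", "bookcase", "booking", "books", "boot", "isbn"));
        System.out.println(expand(terms, "book")); // [book, bookcase, booking, books]
    }
}
```

The expanded terms would then be OR-ed together into an ordinary query, which is why very short prefixes that expand to thousands of terms get slow.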

Re: embed nutch crawl in an application

2009-03-17 Thread MyD
This is an interesting question. If you know how to run the Crawl process from another Java program, please let me know. Thanks in advance. n_developer wrote: Generally, the Nutch crawl is done through Cygwin. If I don't want to run Cygwin, and I want to crawl an application from an application
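Since `bin/nutch crawl` is a shell wrapper that runs the Java class `org.apache.nutch.crawl.Crawl`, one way to start a crawl from your own program is to call that class's `main` with the same arguments as the command line (`<urlDir> [-dir d] [-threads n] [-depth i] [-topN N]`). A minimal sketch, assuming the Nutch jar, its `lib/` jars and the `conf/` directory (with your `nutch-site.xml`) are on the classpath; the class is looked up reflectively only so that this example compiles without Nutch. Two caveats: whether `Crawl.main` returns normally or exits the JVM may depend on the version, so launching a separate `java` process is the safer choice if your application must keep running; and Hadoop of this era could still shell out to Unix tools on Windows, so test whether Cygwin is truly avoidable on your machine.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class EmbeddedCrawl {
    // Builds arguments in the order of the one-step crawl command:
    // <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]
    static String[] crawlArgs(String urlDir, String crawlDir, int depth, int topN) {
        List<String> args = new ArrayList<>();
        args.add(urlDir);
        args.add("-dir");
        args.add(crawlDir);
        args.add("-depth");
        args.add(Integer.toString(depth));
        args.add("-topN");
        args.add(Integer.toString(topN));
        return args.toArray(new String[0]);
    }

    // Reflective lookup so this file compiles without Nutch; at run time the Nutch
    // jars and conf/ directory must be on the classpath or this throws.
    static void runCrawl(String[] args) throws Exception {
        Class<?> crawl = Class.forName("org.apache.nutch.crawl.Crawl");
        Method main = crawl.getMethod("main", String[].class);
        main.invoke(null, (Object) args);
    }

    public static void main(String[] argv) throws Exception {
        // "urls" is a directory of seed-URL text files; "crawl" is the output directory.
        runCrawl(crawlArgs("urls", "crawl", 3, 50));
    }
}
```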