Now that the soon-to-be-released v1 uses Fetcher2 as the default (or as the
only fetcher available?), I would think that the slowness problem facing
a number of users might be addressed?
In short, my case is like this:
Nutch trunk revision 755143
JDK 1.6_12 on Linux
Crawl list
Andrzej stated in NUTCH-669 that some people reported performance issues
with Fetcher2, i.e. that it doesn't use the available bandwidth. These
reports are unconfirmed, and they may have been caused by suboptimal URL /
host distribution in a fetchlist - but it would be good to review the
Hello people,
Please provide a pointer to the 0.7 release. I need it urgently.
Thanks and regards,
Mayank.
On Mon, Mar 16, 2009 at 2:23 PM, Mayank Kamthan mkamt...@gmail.com wrote:
Hi!
I need nutch 0.7. Can someone please provide me a pointer to it to
download.
When I try via the Apache site it
Just check out the code from the svn branch and build it yourself. I
think it's easy enough.
On Tue, Mar 17, 2009 at 5:21 PM, Mayank Kamthan mkamt...@gmail.com wrote:
Hello people,
Please provide a pointer to the 0.7 release. I need it urgently.
Thanks and regards,
Mayank.
On Mon, Mar 16,
Roger Dunk wrote:
Andrzej stated in NUTCH-669 that some people reported performance
issues with Fetcher2, i.e. that it doesn't use the available bandwidth.
These reports are unconfirmed, and they may have been caused by
suboptimal URL / host distribution in a fetchlist - but it would be good
I have some basic questions about Nutch. Can someone point me in the
right direction, or if you have time, maybe just blast out an answer.
Question One:
I can see the terms that come from the web page. Can I set up a way to
also add these things to the index. In other words, if ice cream came
Please see the inline comments.
On Tue, Mar 17, 2009 at 7:34 PM, Lukas, Ray ray.lu...@idearc.com wrote:
I have some basic questions about Nutch. Can someone point me in the
right direction, or if you have time, maybe just blast out an answer.
Question One:
I can see the terms that come from
On Mar 17, 2009, at 9:04 AM, Lukas, Ray wrote:
Question Four (I will start hunting for this):
Last one, promise. The indexes themselves: is there an explanation
written up for each of the fields in the index?
http://wiki.apache.org/nutch/IndexStructure
is the closest thing I've found
Hello people,
I have used nutch-0.9 to crawl my application. When searching, it does not
return results for a query that is only part of an indexed string. For
example, the word "Message" is indexed, but the search query "essa" does
not match it, and hence the search gives no results.
So
On Mar 16, 2009, at 7:55 PM, Otis Gospodnetic wrote:
Eric,
There are a couple of ways you can back up a Lucene index built by
Solr:
1) have a look at the Solr replication scripts, specifically
snapshooter. This script creates a snapshot of an index. It's
typically triggered by Solr
Wanted to gauge community interest in having a certified Nutch
distribution with support? Similar to what Lucid Imagination is doing
for Solr and Lucene and what Cloudera is providing for Hadoop. Anybody
interested?
Dennis
I raised heap size to 2GB for each child in mapred.child.java.opts and
the segment merging succeeded.
Justin Yao wrote:
Hi,
I encountered an error when I try to merge segments using the latest
nightly build of Nutch.
I have 3 hadoop nodes and all servers have CentOS 5.2 installed.
Every time
Hello,
It's hard for me to get the big picture of why to use Solr for indexing and
searching.
Could someone try to describe this a little bit?
I understand that Nutch does the crawling and Solr just the indexing and
searching?
Any help would be great.
Thanks,
Bartosz
Hello Bartosz. I can only really describe my own experiences, and what I have
done with Nutch/Solr is pretty simple.
My reasons for using Nutch/Solr were that the query interface to Solr is
more powerful (Nutch is optimised for speed instead) and that I felt that it
would be easier for me to
Dennis, Otis et al,
My very small team has kept silent for a long time. We've been playing
with Nutch, Hadoop and to a lesser extent Solr for about 2 years now.
Before I get into my thoughts on what direction things should take I
would like to offer a thought on why Nutch is not as active as
This sounds interesting. I might be interested in this.
Marc Boucher
http://hyperix.com
On Tue, Mar 17, 2009 at 12:31 PM, Dennis Kubes ku...@apache.org wrote:
Wanted to gauge community interest in having a certified Nutch distribution
with support? Similar to what Lucid Imagination is doing
Marc,
Glad you responded. Always good to hear people's thoughts.
Marc Boucher wrote:
Dennis, Otis et al,
My very small team has kept silent for a long time. We've been playing
with Nutch, Hadoop and to a lesser extent Solr for about 2 years now.
Before I get into my thoughts on what direction
Dennis,
That adds another dimension to the issue which I had not considered.
One avenue, as you suggest, would be to add another committer to the
Lucene PMC. If that does not work, then maybe going the route of TLP is
the best option.
Marc
Part of this is about releases. Currently releases are
Hi All,
For fun, I created a Windows-based installer for Nutch and added an
administrative GUI to it. If interested, you can grab it from FreewareFiles:
http://www.freewarefiles.com/WhelanLabs-Search-Engine-Manager_program_47202.html
Regards,
John
Generally a Nutch crawl is done through Cygwin. If I don't want to run Cygwin,
and I want to crawl an application from an application of my own, what can I do?
Also, I want Nutch to perform wildcard query search (as in, if the search query
is book*, then it should return all search results which contain isbn
This is an interesting question. If you know how to run the Crawl process from
another Java program, please let me know. Thanks in advance.
n_developer wrote:
Generally a Nutch crawl is done through Cygwin. If I don't want to run Cygwin,
and I want to crawl an application from an application