Hello,
I am using Nutch with a Hadoop cluster of 5 servers. The
reduce job is split into many jobs, as my config specifies, but the map always
uses only one job.
Cluster summary columns: Running Map Tasks | Running Reduce Tasks | Total Submissions | Nodes | Occupied Map Slots | Occupied Reduce Slots | Reserved
Hi Talat,
At the moment it's the parse job that is causing me problems. It's been
running parse in the map job for a few hours now (1 job). I googled a bit but I
can't find a map input size parameter.
Btw I am using Gora and Cassandra. (2.x branch)
Ásgeir Halldórsson
Hey Talat,
So what was the issue with this book?
Renato M.
2014-03-18 10:21 GMT+01:00 Talat Uyarer ta...@uyarer.com:
Hi All,
Someone wrote a book about Nutch. I saw it in a Gora issue.
http://www.packtpub.com/web-crawling-and-data-mining-with-apache-nutch/book
--
Talat UYARER
Website:
Hi, I managed to get Nutch 2.2.1 running in pseudo-distributed mode by
making sure all libs are the same version across the Hadoop/HBase/Nutch
ensemble.
However, now when using the crawl script, the solrdedup job fails with:
java.lang.RuntimeException: java.lang.ClassNotFoundException:
yep, if it has a less ambitious title I would give it 4 1/2 stars (assuming
all the facts and details check out with errors)
with the current title, I would give it 3 stars, it's got nothing really on
data mining
On Wed, Mar 19, 2014 at 11:57 AM, BlackIce blackice...@gmail.com wrote:
I skimmed
I skimmed this book as well,
It saves a lot of time not having to Google all the info yourself.
It also expands on some of the things, so it clarified many things for me.
It is a very good starting point for a noob like me!
I agree on the title, it's a getting started book
On Wed, Mar 19, 2014 at
Sorry for my English. I wanted to say that while I was reading an issue,
someone mentioned this book's name. Did you know someone wrote a book about Nutch?
I think we should support books about Nutch on our wiki. Wdyt?
On 19 Mar 2014 at 21:04, Nicholas Roberts niccolo.robe...@gmail.com
wrote:
yep, if it has a less
yes, it's a useful reference, wiki-up Talat
On Wed, Mar 19, 2014 at 12:33 PM, Talat Uyarer ta...@uyarer.com wrote:
Sorry for my English. I wanted to say that while I was reading an issue,
someone mentioned this book's name. Did you know someone wrote a book about Nutch?
I think we should support books about Nutch
Go for it Talat :)
On Mar 19, 2014 8:39 PM, Nicholas Roberts niccolo.robe...@gmail.com
wrote:
yes, it's a useful reference, wiki-up Talat
On Wed, Mar 19, 2014 at 12:33 PM, Talat Uyarer ta...@uyarer.com wrote:
Sorry for my English. I wanted to say that while I was reading an issue,
someone mentioned this
Hi Folks,
On Wed, Mar 19, 2014 at 7:49 PM, user-digest-h...@nutch.apache.org wrote:
Re: Book of Nutch
So what was the issue with this book?
You can see some rather interesting reviews online
http://s.apache.org/qy
oh dear, oh dear, oh dear
On Wed, Mar 19, 2014 at 12:53 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Folks,
On Wed, Mar 19, 2014 at 7:49 PM, user-digest-h...@nutch.apache.org
wrote:
Re: Book of Nutch
So what was the issue with this book?
You can see some
Hi,
you remove it the same way, no matter whether the crawl is run locally
or in a cluster: you have to remove the command invertlinks
(or LinkDb.invert(...) when called from Java). Consequently,
there will be no linkdb and you cannot use it when indexing.
The concrete steps depend on how the
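As a rough illustration of the point above (no invertlinks step means no linkdb at indexing time), here is a minimal Python sketch that only builds the command lines for one crawl cycle and does not run anything. The helper name crawl_cycle_commands, the paths, and the exact argument order are assumptions for illustration, loosely modeled on Nutch 1.x-style bin/nutch commands; check the usage output of your own Nutch version before relying on them.

```python
from typing import List


def crawl_cycle_commands(crawldb: str, segment: str, linkdb: str,
                         use_linkdb: bool) -> List[List[str]]:
    """Build (but do not run) the commands for one crawl cycle.

    Hypothetical helper: command names and argument order follow
    Nutch 1.x-style usage and are an assumption, not a reference.
    """
    steps = [
        ["bin/nutch", "fetch", segment],
        ["bin/nutch", "parse", segment],
        ["bin/nutch", "updatedb", crawldb, segment],
    ]
    index_cmd = ["bin/nutch", "index", crawldb]
    if use_linkdb:
        # Only when link inversion is kept does a linkdb exist,
        # so only then can the indexer be pointed at it.
        steps.append(["bin/nutch", "invertlinks", linkdb, segment])
        index_cmd += ["-linkdb", linkdb]
    index_cmd.append(segment)
    steps.append(index_cmd)
    return steps


if __name__ == "__main__":
    # Example paths are placeholders.
    for cmd in crawl_cycle_commands("crawl/crawldb",
                                    "crawl/segments/SEGMENT",
                                    "crawl/linkdb",
                                    use_linkdb=False):
        print(" ".join(cmd))
```

With use_linkdb=False the invertlinks step is dropped and the index command carries no -linkdb argument, which mirrors the consequence described above.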
I went too fast: I get errors when my HBase server is turned off.
When my HBase server is on, I have no errors.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Probleme-with-nutch-inject-blocked-tp4125454p4125484.html
Sent from the Nutch - User mailing list archive at
Hi All,
I am not using the Nutch indexer but indexing using my own utility method
after every page is fetched, and I need to bypass any additional steps that
Nutch executes in a crawl. Along those lines, I have identified the following
steps to implement.
1. Disable LinkDB creation by
Hello to all Nutch users.
I am writing a proposal for Google Summer of Code concerning a new GUI
application, so I would like to hear some suggestions about features which this
app should include.
Then we can sort suggestions by priority and make some kind of plan.
PS. This is quite urgent, because GSoC
Just for a reference, see what was proposed in the past. Unfortunately, it
has not been touched in about 4 years.
http://wiki.apache.org/nutch/NutchAdministrationUserInterface
https://github.com/101tec/nutch/wiki
--
Jon Uhal
On Mar 19, 2014 7:36 PM, Fjodor Vershinin fjo...@vershinin.net wrote:
Hi anupamk,
On Tue, Mar 18, 2014 at 2:45 AM, user-digest-h...@nutch.apache.org wrote:
While running the two crawlers concurrently I have run into problems,
and Nutch sometimes throws an IOException saying that the .locked file
exists in crawldb. While one of crawl script tries to
Hi BlackIce,
On Wed, Mar 19, 2014 at 3:07 PM, user-digest-h...@nutch.apache.org wrote:
Hi,
On my first try to run Nutch in pseudo-distributed mode, when trying to run any nutch
command from the /runtime/deploy folder I get the following error:
Which version of Hadoop?
Check the classpath for the offending
Hi a.ciccia04,
On Wed, Mar 19, 2014 at 3:07 PM, user-digest-h...@nutch.apache.org wrote:
I'm working with apache-nutch-2.2.1, hbase-0.90.4, solr-4.7.0
You've not stated how you've configured your stack.
You've not mentioned how many machines you run with.
This may simply be an IO problem