Nutch 2.2.1 Hadoop map tasks

2014-03-19 Thread Ásgeir Halldórsson
Hello, I am using Nutch with a Hadoop cluster of 5 servers. The reduce phase is split into many tasks, as my config specifies, but the map phase always uses only one task.
Running Map Tasks | Running Reduce Tasks | Total Submissions | Nodes | Occupied Map Slots | Occupied Reduce Slots | Reserved

RE: Nutch 2.2.1 Hadoop map tasks

2014-03-19 Thread Ásgeir Halldórsson
Hi Talat, At the moment it's the parse job that is causing me problems. It's been running parse in the map job for a few hours now (1 job). I googled a bit but I can't find a map input size parameter. Btw I am using Gora and Cassandra (2.x branch). Ásgeir Halldórsson
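For context on the question above: Hadoop starts one map task per input split, so a job whose input reports a single split runs a single mapper no matter how many nodes are available. The sketch below only illustrates that relationship in plain Python. The helpers `plan_map_tasks` and `split_key_range` and the example keys are hypothetical; they do not reproduce Gora's or Cassandra's actual partitioning logic, and whether the backend reports a single partition in this setup is an assumption that the thread does not confirm.

```python
from typing import List, Tuple

# Hypothetical model: a "partition" is a (start_key, end_key) range reported
# by the data store. Hadoop schedules one map task per input split.
Partition = Tuple[str, str]


def plan_map_tasks(partitions: List[Partition]) -> int:
    """Return the number of map tasks: one per reported partition/split."""
    return len(partitions)


def split_key_range(keys: List[str], n: int) -> List[Partition]:
    """Illustrative only: cut a sorted key list into at most n contiguous ranges."""
    if not keys or n < 1:
        return []
    size = -(-len(keys) // n)  # ceiling division
    chunks = [keys[i:i + size] for i in range(0, len(keys), size)]
    return [(chunk[0], chunk[-1]) for chunk in chunks]


# Made-up row keys, for illustration only.
keys = sorted(f"com.example:http/page{i:03d}" for i in range(10))
single = [(keys[0], keys[-1])]       # store reports one big partition
several = split_key_range(keys, 5)   # store reports five partitions

print(plan_map_tasks(single), plan_map_tasks(several))
```

If the map phase really is limited by the number of splits, the setting to look for is whatever controls how the storage backend partitions its key range, rather than a map input size.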

Re: Book of Nutch

2014-03-19 Thread Renato Marroquín Mogrovejo
Hey Talat, So what was the issue with this book? Renato M. 2014-03-18 10:21 GMT+01:00 Talat Uyarer ta...@uyarer.com: Hi All, Someone wrote a book about Nutch. I saw it in a Gora issue. http://www.packtpub.com/web-crawling-and-data-mining-with-apache-nutch/book -- Talat UYARER Website:

solrdedup crashing in pseudo distributed mode (Nutch 2.2.1)

2014-03-19 Thread BlackIce
Hi, I managed to get Nutch 2.2.1 running in pseudo-distributed mode by making sure all libs are the same version across the Hadoop/HBase/Nutch ensemble. However, now when using the crawl script, the solrdedup job fails with: java.lang.RuntimeException: java.lang.ClassNotFoundException:

Re: Book of Nutch

2014-03-19 Thread Nicholas Roberts
Yep, if it had a less ambitious title I would give it 4 1/2 stars (assuming all the facts and details check out with errors). With the current title, I would give it 3 stars; it's got nothing really on data mining. On Wed, Mar 19, 2014 at 11:57 AM, BlackIce blackice...@gmail.com wrote: I skimmed

Re: Book of Nutch

2014-03-19 Thread BlackIce
I skimmed this book as well. It saves a lot of time not having to Google all the info yourself. It also expands on some things, so it clarified many things for me. It is a very good starting point for a noob like me! I agree on the title, it's a getting-started book. On Wed, Mar 19, 2014 at

Re: Book of Nutch

2014-03-19 Thread Talat Uyarer
Sorry for my English. I wanted to say that while I was reading an issue, someone mentioned this book's name. Did you know someone wrote a book about Nutch? I think we should support books about Nutch on our wiki. Wdyt? On 19 Mar 2014 21:04, Nicholas Roberts niccolo.robe...@gmail.com wrote: yep, if it has a less

Re: Book of Nutch

2014-03-19 Thread Nicholas Roberts
Yes, it's a useful reference, wiki-up Talat. On Wed, Mar 19, 2014 at 12:33 PM, Talat Uyarer ta...@uyarer.com wrote: Sorry for my English. I wanted to say that while I was reading an issue, someone mentioned this book's name. Did you know someone wrote a book about Nutch? I think we should support books about Nutch

Re: Book of Nutch

2014-03-19 Thread Renato Marroquín Mogrovejo
Go for it Talat :) On Mar 19, 2014 8:39 PM, Nicholas Roberts niccolo.robe...@gmail.com wrote: Yes, it's a useful reference, wiki-up Talat. On Wed, Mar 19, 2014 at 12:33 PM, Talat Uyarer ta...@uyarer.com wrote: Sorry for my English. I wanted to say that while I was reading an issue, someone mentioned this

Re: Book of Nutch

2014-03-19 Thread Lewis John Mcgibbney
Hi Folks, On Wed, Mar 19, 2014 at 7:49 PM, user-digest-h...@nutch.apache.org wrote: Re: Book of Nutch So what was the issue with this book? You can see some rather interesting reviews online http://s.apache.org/qy

Re: Book of Nutch

2014-03-19 Thread Nicholas Roberts
oh dear, oh dear, oh dear On Wed, Mar 19, 2014 at 12:53 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Folks, On Wed, Mar 19, 2014 at 7:49 PM, user-digest-h...@nutch.apache.org wrote: Re: Book of Nutch So what was the issue with this book? You can see some

Re: Disable LinkInversion Phase

2014-03-19 Thread Sebastian Nagel
Hi, you remove it the same way, no matter whether the crawl is run locally or in a cluster: you have to remove the command invertlinks (or LinkDb.invert(...) when called from Java). Consequently, there will be no linkdb and you cannot use it when indexing. The concrete steps depend on how the
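A minimal sketch of the point above: dropping link inversion simply means leaving the `invertlinks` step out of the command sequence, in local mode and on a cluster alike. The step names follow the usual Nutch 1.x command names, but the `build_steps` helper and the step order are illustrative assumptions, not the contents of any actual crawl script; command arguments are omitted on purpose.

```python
from typing import List

# Assumed step order for one crawl round (arguments omitted on purpose).
CRAWL_STEPS = ["inject", "generate", "fetch", "parse",
               "updatedb", "invertlinks", "index"]


def build_steps(invert_links: bool = True) -> List[str]:
    """Return the crawl steps, optionally without the link inversion step."""
    if invert_links:
        return list(CRAWL_STEPS)
    return [step for step in CRAWL_STEPS if step != "invertlinks"]


print(build_steps(invert_links=False))
```

With `invertlinks` removed no linkdb is produced, so the indexing step must not be pointed at one.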

Re: Probleme with nutch inject blocked

2014-03-19 Thread a.ciccia04
I went too fast: I got errors when I turned off my HBase server. When my HBase server is on, I have no errors.

fetcher.store.content property

2014-03-19 Thread S.L
Hi All, I am not using the Nutch indexer but indexing using my own utility method after every page is fetched, and I need to bypass any additional steps that Nutch executes in a crawl. Along those lines, I have identified the following steps to implement. 1. Disable LinkDB creation by
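The property named in the subject is set like any other Nutch/Hadoop configuration override, as a `<property>` element with a name and a value. The sketch below renders such an element with Python's standard library. The value `false` is an assumption about what the poster intends (the truncated message does not say), and the `render_property` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def render_property(name: str, value: str) -> str:
    """Render one Hadoop-style <property> override as an XML string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Assumed value: "false", i.e. do not keep raw fetched content.
snippet = render_property("fetcher.store.content", "false")
print(snippet)
```

An element like this would go inside the `<configuration>` root of the site configuration file (typically `conf/nutch-site.xml`).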

Nutch web GUI (GSoC 2014)

2014-03-19 Thread Fjodor Vershinin
Hello to all Nutch users. I am writing a proposal for Google Summer of Code concerning a new GUI application, so I would like to hear some suggestions about features this app should include. Then we can sort the suggestions by priority and make some kind of plan. PS. This is quite urgent, because GSoC

Re: Nutch web GUI (GSoC 2014)

2014-03-19 Thread Jon Uhal
Just for a reference, see what was proposed in the past. Unfortunately, it has not been touched in about 4 years. http://wiki.apache.org/nutch/NutchAdministrationUserInterface https://github.com/101tec/nutch/wiki -- Jon Uhal On Mar 19, 2014 7:36 PM, Fjodor Vershinin fjo...@vershinin.net wrote:

Re: Interleaved nutch crawls locks crawldb

2014-03-19 Thread Lewis John Mcgibbney
Hi anupamk, On Tue, Mar 18, 2014 at 2:45 AM, user-digest-h...@nutch.apache.org wrote: While running the two crawlers concurrently I have run into problems, and Nutch sometimes throws an IOException saying that the .locked file exists in crawldb. While one of the crawl scripts tries to
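The behaviour described in the quoted message is what a lock-file scheme generally produces: whichever process creates the lock file first proceeds, and any other process that finds the file already present fails until it is removed. The sketch below is a generic illustration in Python, not Nutch's own code; the `acquire_lock` and `release_lock` helpers are hypothetical, and the temporary directory merely stands in for a crawldb.

```python
import os
import tempfile


def acquire_lock(lock_path: str) -> None:
    """Create the lock file atomically; raise IOError if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError as exc:
        raise IOError(f"lock file {lock_path} already exists") from exc
    os.close(fd)


def release_lock(lock_path: str) -> None:
    """Remove the lock file so that another process may take the lock."""
    os.remove(lock_path)


second_failed = False
with tempfile.TemporaryDirectory() as crawldb:
    lock = os.path.join(crawldb, ".locked")
    acquire_lock(lock)              # first crawl takes the lock
    try:
        acquire_lock(lock)          # second, concurrent crawl is rejected
    except IOError as err:
        second_failed = True
        print("second crawl:", err)
    release_lock(lock)              # first crawl finishes its update
    acquire_lock(lock)              # now the lock can be taken again
    release_lock(lock)
```

Under such a scheme, two crawls that share one crawldb have to be serialized around the steps that update it, or each be given its own crawldb.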

Re: Nutch 2.2.1 pseudo dist, errors

2014-03-19 Thread Lewis John Mcgibbney
Hi BlackIce, On Wed, Mar 19, 2014 at 3:07 PM, user-digest-h...@nutch.apache.org wrote: Hi, on my first try to run Nutch in pseudo-distributed mode, when trying to run any nutch command from the /runtime/deploy folder I get the following error: Which version of Hadoop? Check the classpath for the offending

Re: Probleme with nutch inject blocked

2014-03-19 Thread Lewis John Mcgibbney
Hi a.ciccia04, On Wed, Mar 19, 2014 at 3:07 PM, user-digest-h...@nutch.apache.org wrote: I'm working with apache-nutch-2.2.1, hbase-0.90.4, solr-4.7.0. You've not stated how you've configured your stack. You've not mentioned how many machines you run with. This may simply be an IO problem