Re: Nutch Hadoop question

2009-11-13 Thread Eran Zinman
Hi All, I don't want to bother you guys too much... I've tried searching for this topic and doing some testing myself, but so far I've been quite unsuccessful. Basically, I want to use some computers only for map-reduce processing and not for HDFS. Does anyone know how this can be done? Thanks, Eran

Re: Nutch Hadoop question

2009-11-13 Thread TuxRacer69
Hi Eran, MapReduce has to store its data on an HDFS filesystem. But if you want to separate the two groups of servers, you could build two separate HDFS filesystems. To separate the two setups, you will need to make sure there is no cross communication between the two parts. Cheers, Alex
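
A minimal sketch of the two-cluster setup Alex describes, assuming the Hadoop 0.19-era property names (the Hadoop version bundled with Nutch 1.0) and placeholder master hostnames master-a and master-b: each group of nodes gets its own conf/hadoop-site.xml pointing only at its own namenode and jobtracker, so the two clusters never talk to each other.

  <!-- conf/hadoop-site.xml on group A (hostnames are placeholders) -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master-a:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master-a:9001</value>
  </property>

  <!-- conf/hadoop-site.xml on group B -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master-b:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master-b:9001</value>
  </property>

Keeping each cluster's conf/slaves file limited to its own nodes is what prevents the cross communication.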

Re: Nutch Hadoop question

2009-11-13 Thread Andrzej Bialecki
TuxRacer69 wrote: Hi Eran, MapReduce has to store its data on an HDFS filesystem. More specifically, it needs read/write access to a shared filesystem. If you are brave enough you can use NFS, too, or any other type of filesystem that can be mounted locally on each node (e.g. a NetApp). But
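
A minimal sketch of the NFS variant, assuming the share is mounted at the same path on every node (/mnt/shared is a placeholder); with the file:// scheme no HDFS daemons are needed at all, only the jobtracker and tasktrackers:

  <property>
    <name>fs.default.name</name>
    <value>file:///mnt/shared/hadoop</value>
  </property>

On a compute-only node you would then start just the tasktracker (bin/hadoop-daemon.sh start tasktracker) and skip the datanode, which is also the usual answer to Eran's original question when a shared HDFS does exist elsewhere.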

Re: Synonym Filter with Nutch

2009-11-13 Thread Andrzej Bialecki
Dharan Althuru wrote: Hi, We are trying to incorporate a synonym filter during indexing using Nutch. As per my understanding, Nutch doesn't have a synonym indexing plug-in by default. Can we extend IndexingFilter in Nutch to incorporate the synonym filter plug-in available in Lucene using WordNet or
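
The extension point for this is Nutch's IndexingFilter interface. A rough sketch of the idea, assuming the Nutch 1.0 filter signature and a hypothetical lookupSynonyms helper (e.g. backed by Lucene's contrib WordNet SynonymMap); both should be checked against your source tree:

  import java.util.Collections;
  import java.util.List;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.Text;
  import org.apache.nutch.crawl.CrawlDatum;
  import org.apache.nutch.crawl.Inlinks;
  import org.apache.nutch.indexer.IndexingException;
  import org.apache.nutch.indexer.IndexingFilter;
  import org.apache.nutch.indexer.NutchDocument;
  import org.apache.nutch.parse.Parse;

  public class SynonymIndexingFilter implements IndexingFilter {
    private Configuration conf;

    public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
        CrawlDatum datum, Inlinks inlinks) throws IndexingException {
      // Add an expansion term to a dedicated field for each synonym
      // found in the parsed text.
      for (String syn : lookupSynonyms(parse.getText())) {
        doc.add("synonyms", syn);
      }
      return doc;
    }

    // Hypothetical helper: plug in a WordNet/SynonymMap lookup here.
    private List<String> lookupSynonyms(String text) {
      return Collections.emptyList();
    }

    // Present in the Nutch 1.0 interface; a no-op suffices here.
    public void addIndexBackendOptions(Configuration conf) { }

    public void setConf(Configuration conf) { this.conf = conf; }
    public Configuration getConf() { return conf; }
  }

The plugin would still need the usual plugin.xml declaring an extension of org.apache.nutch.indexer.IndexingFilter.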

How to configure Nutch to crawl in parallel

2009-11-13 Thread xiao yang
Hi All, I'm using Nutch-1.0 on a 12-node cluster, and configured conf/hadoop-site.xml as follows:

  ...
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>20</value>
  </property>
  ...

but
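
Worth noting: those two properties only cap how many map and reduce tasks each tasktracker runs concurrently; the number of map tasks a job actually gets is driven by its input splits. A sketch of the other knobs usually involved, with example values (mapred.map.tasks is only a hint to Hadoop, and fetcher.threads.fetch belongs in nutch-site.xml):

  <property>
    <name>mapred.map.tasks</name>
    <value>24</value>
  </property>
  <property>
    <name>fetcher.threads.fetch</name>
    <value>20</value>
  </property>

In Nutch 1.0 the generate step also takes a -numFetchers option that controls how many fetch lists (and hence fetch map tasks) a segment is split into; check bin/nutch generate for the exact usage.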

can't deploy nutch-1.0.war ???

2009-11-13 Thread MilleBii
I'm stuck and not able to deploy nutch-1.0.war. I get the following error in catalina.log: Exception when processing TLD indicated by the resource path /WEB-INF/taglibs-i18n.tld in the context /nutch-1.0. What could it be? The taglib is there, the *.properties files are there. ANY HELP where
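
One thing worth checking before digging into Tomcat itself: whether the TLD and its supporting jar actually made it into the war. A quick sketch (the file names below are inferred from the error message, not verified against the actual war contents):

  jar tf nutch-1.0.war | grep -i taglib
  # expect WEB-INF/taglibs-i18n.tld and a matching jar under WEB-INF/lib/

A common cause of this error is a malformed or truncated .tld file, or classes referenced by the TLD missing from WEB-INF/lib, so Tomcat fails while processing the TLD even though the .tld file itself is present.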

Re: How to configure Nutch to crawl in parallel

2009-11-13 Thread Otis Gospodnetic
I don't recall off the top of my head what that jobtracker.jsp shows, but judging by the name, it shows your job. Each job is composed of multiple map and reduce tasks. Drill into your job and you should see multiple tasks running. Otis -- Sematext is hiring --

Re: Nutch Hadoop question

2009-11-13 Thread Eran Zinman
Thanks for the help, guys. On Fri, Nov 13, 2009 at 5:20 PM, Andrzej Bialecki a...@getopt.org wrote: TuxRacer69 wrote: Hi Eran, MapReduce has to store its data on an HDFS filesystem. More specifically, it needs read/write access to a shared filesystem. If you are brave enough you can use