How to configure SWIM

2012-03-01 Thread Arvind
Hi all, can anybody help me configure SWIM (Statistical Workload Injector for MapReduce) on my Hadoop cluster?

Re: Browse the filesystem weblink broken after upgrade to 1.0.0: HTTP 404 Problem accessing /browseDirectory.jsp

2012-03-01 Thread madhu phatak
On Wed, Feb 29, 2012 at 11:34 PM, W.P. McNeill bill...@gmail.com wrote: I can perform HDFS operations from the command line, like hadoop fs -ls /. Doesn't that mean that the datanode is up? No. That is just a metadata lookup, which comes from the Namenode. Try to cat some file, like hadoop fs
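
A quick way to check whether datanodes are actually serving data, as a sketch (the file path is a placeholder and exact output varies by version):

    # listing only touches Namenode metadata; reading a file forces a Datanode round-trip
    hadoop fs -cat /some/file.txt | head
    # ask the Namenode how many live Datanodes it currently sees
    hadoop dfsadmin -report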

Distributed Indexing on MapReduce

2012-03-01 Thread Frank Scholten
Hi all, I am looking into reusing some existing code for distributed indexing to test a Mahout tool I am working on (https://issues.apache.org/jira/browse/MAHOUT-944). What I want is to index the Apache Public Mail Archives dataset (200G) via MapReduce on Hadoop. I have been going through the

Re: Streaming Hadoop using C

2012-03-01 Thread Charles Earl
How was your experience of starfish? C On Mar 1, 2012, at 12:35 AM, Mark question wrote: Thank you for your time and suggestions, I've already tried starfish, but not jmap. I'll check it out. Thanks again, Mark On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl charles.ce...@gmail.comwrote:

Re: Should splittable Gzip be a core hadoop feature?

2012-03-01 Thread Michel Segel
I do agree that a GitHub project is the way to go, unless you could convince Cloudera, Hortonworks or MapR to pick it up and support it. They have enough committers. Is this potentially worthwhile? Maybe; it depends on how the cluster is integrated into the overall environment. Companies

Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Merto Mertek
From the fairscheduler docs I assume the following should work:

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>pool.name</value>
  </property>
  <property>
    <name>pool.name</name>
    <value>${mapreduce.job.group.name}</value>
  </property>

which means that the default pool will be the group of

RE: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Dave Shine
I've just started playing with the Fair Scheduler. To specify the pool at job submission time, you set the mapred.fairscheduler.pool property on the JobConf to the name of the pool you want the job to use. Dave -Original Message- From: Merto Mertek [mailto:masmer...@gmail.com] Sent:
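
If the job is submitted through ToolRunner/GenericOptionsParser, the same property can be set per submission on the command line; a sketch where my-job.jar, MyJob and the pool name "research" are placeholders:

    hadoop jar my-job.jar MyJob \
      -D mapred.fairscheduler.pool=research \
      input/ output/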

Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Austin Chungath
Thanks, I will be trying the suggestions and will get back to you soon. On Thu, Mar 1, 2012 at 8:09 PM, Dave Shine dave.sh...@channelintelligence.com wrote: I've just started playing with the Fair Scheduler. To specify the pool at job submission time you set the mapred.fairscheduler.pool

Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Austin Chungath
Hi, I tried what you had said. I added the following to mapred-site.xml:

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>pool.name</value>
  </property>
  <property>
    <name>pool.name</name>
    <value>${mapreduce.job.group.name}</value>
  </property>

Funny enough, it created a pool with the name

kill -QUIT

2012-03-01 Thread Mohit Anchlia
When I try kill -QUIT for a job, it doesn't send the stack trace to the log files. Does anyone know why, or if I am doing something wrong? I find the job using ps -ef | grep attempt. I then go to logs/userLogs/jobid/attemptid/
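
A minimal sketch of the usual sequence: QUIT makes the task JVM dump its thread stacks to stdout, so the dump lands in the attempt's stdout log rather than in syslog (attempt id is a placeholder; the userlogs layout varies a bit by version):

    # find the child JVM running the task attempt
    ps -ef | grep attempt_201203011200_0042_m_000000_0
    kill -QUIT <pid>
    # the stack trace should show up here
    less $HADOOP_HOME/logs/userlogs/<attempt_id>/stdout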

High quality hadoop logo?

2012-03-01 Thread Keith Wiley
Is there a high-quality version of the Hadoop logo anywhere? Even the graphic presented on the Apache page itself suffers from dreadful JPEG artifacting. A Google image search didn't inspire much hope on this issue (they all have the same low-quality JPEG appearance). I'm looking for good

Re: High quality hadoop logo?

2012-03-01 Thread Keith Wiley
Sorry, false alarm. I was looking at the popup thumbnails in google image search. If I click all the way through, there are some high quality versions available. Why is the version on the Apache site (and the Wikipedia page) so poor? On Mar 1, 2012, at 14:09 , Keith Wiley wrote: Is there

Re: High quality hadoop logo?

2012-03-01 Thread Owen O'Malley
On Thu, Mar 1, 2012 at 2:14 PM, Keith Wiley kwi...@keithwiley.com wrote: Sorry, false alarm.  I was looking at the popup thumbnails in google image search.  If I click all the way through, there are some high quality versions available.  Why is the version on the Apache site (and the Wikipedia

Re: Streaming Hadoop using C

2012-03-01 Thread Mark question
Starfish worked great for wordcount... I didn't run it on my application because I have only map tasks. Mark On Thu, Mar 1, 2012 at 4:34 AM, Charles Earl charles.ce...@gmail.comwrote: How was your experience of starfish? C On Mar 1, 2012, at 12:35 AM, Mark question wrote: Thank you for

Re: High quality hadoop logo?

2012-03-01 Thread Keith Wiley
Excellent! Thank you. Sent from my phone, please excuse my brevity. Keith Wiley, kwi...@keithwiley.com, http://keithwiley.com Owen O'Malley omal...@apache.org wrote: On Thu, Mar 1, 2012 at 2:14 PM, Keith Wiley kwi...@keithwiley.com wrote: Sorry, false

Adding nodes

2012-03-01 Thread Mohit Anchlia
Is this the right procedure to add nodes? I took some from the hadoop wiki FAQ: http://wiki.apache.org/hadoop/FAQ 1. Update conf/slaves 2. On the slave nodes, start the datanode and tasktracker 3. hadoop balancer Do I also need to run dfsadmin -refreshNodes?
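
The same steps as commands, as a sketch only, assuming the new host already has the cluster configs and is listed in conf/slaves:

    # on the new slave node
    $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
    $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker
    # optionally, from any node, spread existing blocks onto the new machine
    hadoop balancer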

Re: Adding nodes

2012-03-01 Thread Joey Echeverria
You only have to refresh nodes if you're making use of an allow file. Sent from my iPhone On Mar 1, 2012, at 18:29, Mohit Anchlia mohitanch...@gmail.com wrote: Is this the right procedure to add nodes? I took some from hadoop wiki FAQ: http://wiki.apache.org/hadoop/FAQ 1. Update

Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria j...@cloudera.com wrote: You only have to refresh nodes if you're making use of an allow file. Thanks. Does it mean that when the tasktracker/datanode starts up, it communicates with the namenode using the masters file? Sent from my iPhone On Mar 1, 2012,

Re: Adding nodes

2012-03-01 Thread Joey Echeverria
Not quite. Datanodes get the namenode host from fs.default.name in core-site.xml. Task trackers find the job tracker from the mapred.job.tracker setting in mapred-site.xml. Sent from my iPhone On Mar 1, 2012, at 18:49, Mohit Anchlia mohitanch...@gmail.com wrote: On Thu, Mar 1, 2012 at 4:46
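
To double-check which master a slave will contact, you can simply inspect its config files; a sketch assuming a standard conf/ layout:

    grep -A1 fs.default.name    $HADOOP_HOME/conf/core-site.xml
    grep -A1 mapred.job.tracker $HADOOP_HOME/conf/mapred-site.xml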

Re: Adding nodes

2012-03-01 Thread Raj Vishwanathan
The master and slave files, if I remember correctly, are used to start the correct daemons on the correct nodes from the master node. Raj From: Joey Echeverria j...@cloudera.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org Cc:

Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
On Thu, Mar 1, 2012 at 4:57 PM, Joey Echeverria j...@cloudera.com wrote: Not quite. Datanodes get the namenode host from fs.defalt.name in core-site.xml. Task trackers find the job tracker from the mapred.job.tracker setting in mapred-site.xml. I actually meant to ask how does

Re: Adding nodes

2012-03-01 Thread anil gupta
What Joey said is correct for Cloudera's distribution. I am not confident about the same for other distributions, as I haven't tried them. Thanks, Anil On Thu, Mar 1, 2012 at 5:10 PM, Raj Vishwanathan rajv...@yahoo.com wrote: The master and slave files, if I remember correctly are used to start

Re: Adding nodes

2012-03-01 Thread Raj Vishwanathan
What Joey said is correct for both the Apache and Cloudera distros. The DN/TT daemons will connect to the NN/JT using the config files. The master and slave files are used for starting the correct daemons. From: anil gupta anilg...@buffalo.edu To:

Re: Adding nodes

2012-03-01 Thread Arpit Gupta
It is initiated by the slave. If you have defined files to state which slaves can talk to the namenode (using the config dfs.hosts) and which hosts cannot (using the property dfs.hosts.exclude), then you would need to edit these files and issue the refresh command. On Mar 1, 2012, at 5:35 PM, Mohit Anchlia
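
A sketch of that flow; the include file name conf/dfs.include and the hostname are assumptions, whatever dfs.hosts points at on your cluster is what you would edit:

    echo newnode.example.com >> $HADOOP_HOME/conf/dfs.include
    # tell the Namenode to re-read the include/exclude files
    hadoop dfsadmin -refreshNodes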

Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
Thanks all for the answers!! On Thu, Mar 1, 2012 at 5:52 PM, Arpit Gupta ar...@hortonworks.com wrote: It is initiated by the slave. If you have defined files to state which slaves can talk to the namenode (using config dfs.hosts) and which hosts cannot (using property dfs.hosts.exclude)

Re: Adding nodes

2012-03-01 Thread George Datskos
Mohit, New datanodes will connect to the namenode, so that's how the namenode knows. Just make sure the datanodes have the correct fs.default.name in their config and then start them. The namenode can, however, choose to reject the datanode if you are using the {dfs.hosts} and

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Marc Sturlese
Tried 0.4.15 but still getting the error. Really lost with this. My Hadoop release is 0.20.2, from more than a year ago. Could this be related to the problem? -- View this message in context: http://lucene.472066.n3.nabble.com/LZO-exception-decompressing-returned-8-tp3783652p3792484.html Sent

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Harsh J
Marc, Were the lzo libs on your server upgraded to a higher version recently? Also, when you deployed a built copy of 0.4.15, did you ensure you replaced the older native libs for hadoop-lzo as well? On Fri, Mar 2, 2012 at 9:05 AM, Marc Sturlese marc.sturl...@gmail.com wrote: Tried but still

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Marc Sturlese
Yes, the steps I followed were: 1. Install lzo 2.06 on a machine with the same kernel as my nodes. 2. Compile hadoop-lzo 0.4.15 there (in /lib, replacing cdh3u3 with my hadoop 0.20.2 release). 3. Replace hadoop-lzo-0.4.9.jar with the newly compiled hadoop-lzo-0.4.15.jar in the hadoop lib directory of all my nodes
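
Roughly what that deployment looks like as commands; a sketch only, with version numbers taken from the message above, <hadoop-lzo-build> standing in for the build output directory, and the native subdirectory depending on the platform:

    # on every node: swap the jar
    rm $HADOOP_HOME/lib/hadoop-lzo-0.4.9.jar
    cp hadoop-lzo-0.4.15.jar $HADOOP_HOME/lib/
    # and replace the matching native libs built against the same lzo (2.06)
    cp -r <hadoop-lzo-build>/lib/native/* $HADOOP_HOME/lib/native/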

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Marc Sturlese
I used to have 2.05, but now, as I said, I installed 2.06. -- View this message in context: http://lucene.472066.n3.nabble.com/LZO-exception-decompressing-returned-8-tp3783652p3792511.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Joey Echeverria
I know this doesn't fix lzo, but have you considered Snappy for the intermediate output compression? It gets similar compression ratios and compress/decompress speed, but arguably has better Hadoop integration. -Joey On Thu, Mar 1, 2012 at 10:01 PM, Marc Sturlese marc.sturl...@gmail.com wrote:
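
For reference, switching the intermediate (map output) compression to Snappy is just two JobConf properties; a sketch with the old-style (0.20-era) property names, assuming your build ships the Snappy codec and the job uses ToolRunner:

    hadoop jar my-job.jar MyJob \
      -D mapred.compress.map.output=true \
      -D mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \
      input/ output/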

Re: LZO exception decompressing (returned -8)

2012-03-01 Thread Marc Sturlese
Absolutely. In case I don't find the root of the problem soon I'll definitely try it. -- View this message in context: http://lucene.472066.n3.nabble.com/LZO-exception-decompressing-returned-8-tp3783652p3792531.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-01 Thread Subir S
Hello Folks, Are there any pointers to such comparisons between Apache Pig and Hadoop Streaming Map Reduce jobs? Also, there was a claim in our company that Pig performs better than Map Reduce jobs. Is this true? Are there any such benchmarks available? Thanks, Subir

Re: Where Is DataJoinMapperBase?

2012-03-01 Thread madhu phatak
Hi, Please look inside the $HADOOP_HOME/contrib/datajoin folder of the 0.20.2 version. You will find the jar there. On Sat, Feb 11, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote: Hi, all, I am starting to learn advanced Map/Reduce. However, I cannot find the class DataJoinMapperBase in my downloaded
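
A sketch of locating the jar and shipping it with a job (the exact jar name varies by release, and my-job.jar/MyJob are placeholders):

    ls $HADOOP_HOME/contrib/datajoin/
    # add the datajoin jar to the classpath, or ship it with the job via -libjars
    hadoop jar my-job.jar MyJob -libjars $HADOOP_HOME/contrib/datajoin/hadoop-*datajoin*.jar ...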

Re: DFSIO

2012-03-01 Thread madhu phatak
Hi, Only HDFS should be enough. On Fri, Nov 25, 2011 at 1:45 AM, Thanh Do than...@cs.wisc.edu wrote: hi all, in order to run DFSIO in my cluster, do i need to run JobTracker, and TaskTracker, or just running HDFS is enough? Many thanks, Thanh -- Join me at

Re: DFSIO

2012-03-01 Thread Harsh J
Madhu, That is incorrect. TestDFSIO is a MapReduce job and you need HDFS+MR setup to use it. On Fri, Mar 2, 2012 at 11:07 AM, madhu phatak phatak@gmail.com wrote: Hi,  Only HDFS should be enough. On Fri, Nov 25, 2011 at 1:45 AM, Thanh Do than...@cs.wisc.edu wrote: hi all, in order to
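
The usual invocation, which is what makes the running JobTracker/TaskTrackers necessary; the jar name glob and the file counts/sizes below are placeholders for a 0.20/1.0-style install:

    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 100
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -read  -nrFiles 10 -fileSize 100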

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-01 Thread Jie Li
Considering Pig essentially translates scripts into Map Reduce jobs, one can always write Map Reduce jobs at least as good as the ones Pig generates. You can refer to the Pig experience paper to see the overhead Pig introduces, but it's been improving all the time. By the way, if you really care about performance, how you

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-01 Thread Harsh J
On Fri, Mar 2, 2012 at 10:18 AM, Subir S subir.sasiku...@gmail.com wrote: Hello Folks, Are there any pointers to such comparisons between Apache Pig and Hadoop Streaming Map Reduce jobs? I do not see why you seek to compare these two. Pig offers a language that lets you write data-flow
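
For a sense of what is being compared, the two submission paths side by side; a sketch with placeholder script names and a glob for the streaming jar:

    # streaming: you write and ship the map/reduce executables yourself
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
      -input input/ -output out-streaming/ \
      -mapper ./map.py -reducer ./reduce.py -file map.py -file reduce.py
    # Pig: the script is compiled into one or more MapReduce jobs for you
    pig wordcount.pig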

Re: DFSIO

2012-03-01 Thread madhu phatak
Hi Harsha, Sorry, I read DFSIO as DFS Input/Output, which I thought meant reading and writing using the HDFS API :) On Fri, Mar 2, 2012 at 12:32 PM, Harsh J ha...@cloudera.com wrote: Madhu, That is incorrect. TestDFSIO is a MapReduce job and you need HDFS+MR setup to use it. On Fri, Mar 2, 2012 at