how do I keep reduce tmp files in mapred.local.dir

2011-05-10 Thread elton sky
Hello all, I am trying to keep the output and copied files for reduce tasks after a job finishes. I commented out much of the "remove"/"delete"-style code in TaskTracker, Task, etc., but still cannot keep the files. Any ideas?
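
Rather than commenting delete calls out of TaskTracker/Task, the 0.20-era JobConf API exposes job-level switches for retaining task-local files; a minimal sketch, assuming those switches are honored for successful tasks in your build (worth verifying):

    import org.apache.hadoop.mapred.JobConf;

    public class KeepTaskFiles {
        public static void main(String[] args) {
            // Sketch only: ask the framework to retain task-local files under
            // mapred.local.dir instead of editing the cleanup code.
            JobConf conf = new JobConf();

            // Keep the working directories of failed task attempts.
            conf.setKeepFailedTaskFiles(true);

            // Keep files for any task attempt whose ID matches this regex
            // (".*" keeps everything, successful attempts included).
            conf.setKeepTaskFilesPattern(".*");

            // ... configure mapper/reducer and submit with this conf as usual ...
        }
    }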

Re: hadoop/hive data loading

2011-05-10 Thread amit jaiswal
Hi, What is the meaning of 'union' here? Is there a Hadoop job with one (or a few) reducers that combines all the data? Have you tried external (dynamic) partitions for combining the data? -amit - Original Message - From: hadoopman To: common-user@hadoop.apache.org Cc: Sent: Tues

Null pointer exception in Mapper initialization

2011-05-10 Thread Mapred Learn
Hi, I get an error like: java.lang.NullPointerException at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73) at org.apache.hadoop.mapred.MapTask$MapOu
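
The preview is cut off, but an NPE inside SerializationFactory.getSerializer() during MapTask setup is commonly caused by map output key/value classes that were never set, or that have no serializer registered under io.serializations. A hedged sketch of that usual fix (the job name and types here are illustrative, not from the original post):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class SerializationSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "serialization-check");  // illustrative name

            // If these are left unset, or point at a class with no matching
            // serializer in io.serializations, SerializationFactory.getSerializer()
            // can fail with a NullPointerException while the map output buffer
            // is being created.
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(LongWritable.class);
        }
    }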

Re: Where should combiner processes write temporary sequence files?

2011-05-10 Thread W.P. McNeill
Is there any way I can get the system to generate a name for me that is guaranteed to be unique, something analogous to File.createTempFile()? I looked at using File.createTempFile() because the sequence file I'm creating is just a local cache and doesn't need to be on HDFS, but I couldn't figure
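
One route (an illustrative sketch, not from the original thread) is to reuse the task attempt ID, which Hadoop already guarantees to be unique per attempt, as part of the file name; for a purely local cache, File.createTempFile() alone is also sufficient:

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    public class UniqueLocalCache {
        // Build a locally unique cache file, prefixed with the attempt ID so
        // it can be traced back to the task that produced it.
        static File createCacheFile(TaskAttemptContext context) throws IOException {
            String attempt = context.getTaskAttemptID().toString();
            return File.createTempFile("combiner-cache-" + attempt + "-", ".seq");
        }
    }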

Re: Where should combiner processes write temporary sequence files?

2011-05-10 Thread Harsh J
Hello, I'm sure you're aware of it, but just for a refresh/archives: A combiner may run 0...N times, and on both the Map side and the Reduce side. On Tue, May 10, 2011 at 11:50 PM, W.P. McNeill wrote: > This creates readers and writers that work both in a reducer and in a > combiner context.  Ho
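
A generic illustration of the consequence (not code from this thread): because the combiner may be applied zero or more times, on either side of the shuffle, its logic has to be associative, commutative, and free of side effects such as writing its own files, and its input and output types must match.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Safe combiner: summing is associative and commutative, so running it
    // 0, 1, or many times over partial groups never changes the final result.
    public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable total = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            total.set(sum);
            context.write(key, total);  // same key/value types in and out
        }
    }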

Re: Where should combiner processes write temporary sequence files?

2011-05-10 Thread W.P. McNeill
I've partially answered this question for myself, but still have some additional questions. Some of the answers are in an earlier thread I initiated with a similar question. Following Harsh J.'s advice t

hadoop/hive data loading

2011-05-10 Thread hadoopman
When we load data into Hive, we sometimes run into situations where the load fails and the logs show a heap out-of-memory error. If I load just a few days (or months) of data, there is no problem. But if I try to load two years of data (for example), I've seen it fail. Not with every f

Re: configuration and FileSystem

2011-05-10 Thread Chris Stier
Hadoop newbie here, I have a few of the same questions that Gang has. I have the single-node configuration installed, but every time I restart my computer I lose my namenode because its location defaults to /tmp. I'm using 0.21.0. Is there a write-up somewhere that shows what config files

Re: configuration and FileSystem

2011-05-10 Thread Allen Wittenauer
On May 10, 2011, at 9:57 AM, Gang Luo wrote: > I was confused by the configuration and file system in hadoop. When we create a > FileSystem object and read/write something through it, are we writing to or > reading from HDFS? Typically, yes. > Could it be local file system?
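
For the archives, a short sketch of how the resolution works: FileSystem.get(conf) returns whatever file system fs.default.name (fs.defaultFS in later releases) points at, so an hdfs:// URI gives HDFS and file:/// gives the local file system; the local one can also be requested explicitly. The path in this sketch is just an example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WhichFileSystem {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();  // loads core-site.xml etc.

            // Resolves fs.default.name: hdfs://host:port -> HDFS, file:/// -> local.
            FileSystem defaultFs = FileSystem.get(conf);
            System.out.println("default fs: " + defaultFs.getUri());

            // The local file system is always available explicitly.
            FileSystem localFs = FileSystem.getLocal(conf);
            System.out.println("local fs:   " + localFs.getUri());

            // A fully qualified Path can also pick its own file system.
            FileSystem fromPath = new Path("file:///tmp/example").getFileSystem(conf);
            System.out.println("from path:  " + fromPath.getUri());
        }
    }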

how to get user-specified Job name from hadoop for running jobs?

2011-05-10 Thread Mark Zand
While I can get JobStatus with this: JobClient client = new JobClient(new JobConf(conf)); JobStatus[] jobStatuses = client.getAllJobs(); I don't see any way to get user-specified Job name. Please help. Thanks.
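
One possible route (a sketch, assuming your Hadoop version exposes RunningJob.getJobName()): JobStatus itself does not carry the user-assigned name, but the RunningJob handle looked up from its job ID does.

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;
    import org.apache.hadoop.mapred.RunningJob;

    public class ListJobNames {
        public static void main(String[] args) throws Exception {
            JobClient client = new JobClient(new JobConf());

            for (JobStatus status : client.getAllJobs()) {
                // Look the job up by ID to reach the name the user gave it.
                RunningJob job = client.getJob(status.getJobID());
                if (job != null) {
                    System.out.println(status.getJobID() + " -> " + job.getJobName());
                }
            }
        }
    }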

configuration and FileSystem

2011-05-10 Thread Gang Luo
Hi, I was confused by the configuration and file system in Hadoop. When we create a FileSystem object and read/write something through it, are we writing to or reading from HDFS? Could it be the local file system? If yes, what determines which file system it is? The Configuration object we used to crea

Re: problems with start-all.sh

2011-05-10 Thread Keith Thompson
Yes, that does seem easier. Perhaps I will go back and extract to my home directory. Is there a simple way to uninstall the version in my root directory? Note: I also installed Maven and Mahout there. /usr/local/hadoop-0.20.2 /usr/local/apache-maven-2.2.1 /usr/local/trunk/bin/mahout.sh ..

Extreme slow starting (virtual) hadoop cluster & problems receiving processed data from slave node

2011-05-10 Thread Stefan Wienert
Hi there, I installed Hadoop on VMware Workstation 7.1.4 (1 master, 1 slave). Both machines use 2 cores (the host has 6, Win7 x64) and 2048 MB RAM, running Ubuntu 10.04 LTS (64-bit) with Hadoop 0.20.2. I used this tutorial: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cl

Re: problems with start-all.sh

2011-05-10 Thread Keith Thompson
Thanks Matt. That makes sense. I will read up on those topics. On Tue, May 10, 2011 at 12:02 PM, GOEKE, MATTHEW [AG/1000] < matthew.go...@monsanto.com> wrote: > Keith if you have a chance you might want to look at Hadoop: The > Definitive guide or other various faqs around for rolling a cluster

Re: problems with start-all.sh

2011-05-10 Thread Luca Pireddu
On May 10, 2011 17:54:27 Keith Thompson wrote: > Thanks for catching that comma. It was actually my HADOOP_CONF_DIR rather > than HADOOP_HOME that was the culprit. :) > As for sudo ... I am not sure how to run it as a regular user. I set up > ssh for a passwordless login (and am able to ssh local

RE: problems with start-all.sh

2011-05-10 Thread GOEKE, MATTHEW [AG/1000]
Keith, if you have a chance you might want to look at Hadoop: The Definitive Guide or various other FAQs around for rolling a cluster from a tarball. One thing that most recommend is to set up a hadoop user and then to chown all of the files / directories it needs over to it. Right now what you are run

Re: problems with start-all.sh

2011-05-10 Thread Keith Thompson
Thanks for catching that comma. It was actually my HADOOP_CONF_DIR rather than HADOOP_HOME that was the culprit. :) As for sudo ... I am not sure how to run it as a regular user. I set up ssh for a passwordless login (and am able to ssh localhost without password) but I installed hadoop to /usr/l

Re: problems with start-all.sh

2011-05-10 Thread Luca Pireddu
On May 10, 2011 17:39:12 Keith Thompson wrote: > Hi Luca, > > Thank you. That worked ... at least I didn't get the same error. Now I > get: > > k_thomp@linux-8awa:/usr/local/hadoop-0.20.2> sudo bin/start-all.sh > starting namenode, logging to > /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-n

Re: problems with start-all.sh

2011-05-10 Thread Keith Thompson
Hi Luca, Thank you. That worked ... at least I didn't get the same error. Now I get: k_thomp@linux-8awa:/usr/local/hadoop-0.20.2> sudo bin/start-all.sh starting namenode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-linux-8awa.out cat: /usr/local/hadoop-0,20.2/conf/slave

Re: problems with start-all.sh

2011-05-10 Thread Luca Pireddu
Hi Keith, On May 10, 2011 17:20:15 Keith Thompson wrote: > I have installed hadoop-0.20.2 (using quick start guide) and mahout. I am > running OpenSuse Linux 11.1 (but am a newbie to Linux). > My JAVA_HOME is set to usr/java/jdk1.6.0_21. When I run bin/hadoop > start-all.sh I get the following e

Re: problems with start-all.sh

2011-05-10 Thread Dieter Plaetinck
On Tue, 10 May 2011 11:20:15 -0400 Keith Thompson wrote: > I have installed hadoop-0.20.2 (using quick start guide) and mahout. > I am running OpenSuse Linux 11.1 (but am a newbie to Linux). > My JAVA_HOME is set to usr/java/jdk1.6.0_21. When I run bin/hadoop > start-all.sh I get the following e

problems with start-all.sh

2011-05-10 Thread Keith Thompson
I have installed hadoop-0.20.2 (using quick start guide) and mahout. I am running OpenSuse Linux 11.1 (but am a newbie to Linux). My JAVA_HOME is set to usr/java/jdk1.6.0_21. When I run bin/hadoop start-all.sh I get the following error message: Exception in thread "main" java.lang.NoClassDefFoun