Re: hadoop/hive data loading

2011-05-12 Thread Fei Pan
hi hadoopman, you can put the large data into your HDFS using hadoop fs -put src dest, and then you can use alter table xxx add partition(x) location 'dest'. 2011/5/11 amit jaiswal amit_...@yahoo.com Hi, what is the meaning of 'union' over here? Is there any hadoop job with 1 (or few)

What exactly are the output_dir/part-00000 semantics (of a streaming job) ?

2011-05-12 Thread Dieter Plaetinck
Hi, I'm running some experiments using hadoop streaming. I always get an output_dir/part-00000 file at the end, but I wonder: when exactly will this filename show up? When it's completely written, or will it already show up while the mapreduce software is still writing to it? Is the write atomic?

Host-address or Hostname

2011-05-12 Thread Matthew John
Hi all, the String[] that is output by InputSplit.getLocations() gives the list of nodes where the input split resides. But the node detail is represented as either the ip-address or the hostname (e.g. an entry in the list could be either 10.72.147.109 or mattHDFS1 (hostname)). Is it

Question about InputSampler

2011-05-12 Thread Panayotis Antonopoulos
Hello, I am writing a MR job where the distribution of the Keys emitted by the Map phase is not known beforehand and so I can't create the partitions for the TotalOrderPartitioner. I would like to sample those keys to create the partitions and then run the job that will process the whole
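The underlying idea behind InputSampler + TotalOrderPartitioner can be sketched without the Hadoop API at all: sample the keys, sort the sample, and take evenly spaced quantiles as partition boundaries. The class and method names below are hypothetical standalone Java, not Hadoop's actual sampler classes; a minimal sketch of the technique only.

```java
import java.util.*;

// Sketch: derive (numPartitions - 1) split points from a random sample
// of keys, the same idea InputSampler + TotalOrderPartitioner use.
// Hypothetical standalone code, not the Hadoop API itself.
public class SamplePartitions {
    // Pick evenly spaced quantiles from the sorted sample as boundaries.
    static List<String> splitPoints(List<String> sample, int numPartitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            splits.add(sorted.get(i * sorted.size() / numPartitions));
        }
        return splits;
    }

    // Route a key to the partition whose key range contains it,
    // so that partition ranges are totally ordered.
    static int partition(String key, List<String> splits) {
        int p = 0;
        while (p < splits.size() && key.compareTo(splits.get(p)) >= 0) p++;
        return p;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("d", "a", "q", "m", "z", "f", "t", "b");
        List<String> splits = splitPoints(sample, 4); // 3 boundaries for 4 partitions
        System.out.println(splits);
        System.out.println(partition("c", splits));
    }
}
```

In a real job the sample would come from a quick scan of a subset of input splits, the boundaries would be written to a sequence file, and TotalOrderPartitioner would read them back.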

Re: Host-address or Hostname

2011-05-12 Thread Matthew John
Is it possible to get a Host-address to Host-name mapping in the JIP ? Someone please help me with this! Thanks, Matthew On Thu, May 12, 2011 at 5:36 PM, Matthew John tmatthewjohn1...@gmail.comwrote: Hi all, The String[] that is output by the InputSplit.getLocations() gives the list of
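Short of patching JobInProgress, one way to normalize whatever getLocations() returns (IP or hostname) is a lookup through java.net.InetAddress. This is a generic-Java sketch, not the JobTracker's internal mapping; the result depends entirely on the cluster's DNS or /etc/hosts configuration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: normalize a split location (literal IP or hostname) to a
// hostname via the JVM's resolver. Generic Java, not JobInProgress
// internals; behaviour depends on the cluster's DNS/hosts setup.
public class LocationNames {
    static String toHostName(String location) {
        try {
            // getByName accepts either a literal IP or a hostname;
            // getCanonicalHostName attempts a reverse lookup for IPs.
            return InetAddress.getByName(location).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return location; // fall back to the raw string
        }
    }

    public static void main(String[] args) {
        System.out.println(toHostName("127.0.0.1"));
    }
}
```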

Error reading task output for benchmark test TESTDFSIO

2011-05-12 Thread Matthew Tice
Hello, I have a four node hadoop cluster running hadoop v.0.20.2 on CentOS 5.6. Here is my layout: Name01.hadoop.stage (namenode) Name02.hadoop.stage (sec namenode / jobtracker) Data01.hadoop.stage (data node) Data02.hadoop.stage (data node) When trying to run a benchmark test for

Re: What exactly are the output_dir/part-00000 semantics (of a streaming job) ?

2011-05-12 Thread Aman
The creation of files part-NNNNN is atomic. When you run an MR job, these files are created in the directory output_dir/_temporary and moved to output_dir after the file is closed for writing. This move is atomic, hence as long as you don't try to read these files from the temporary directory (which I see
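The write-then-rename pattern Aman describes can be demonstrated with a plain local-filesystem analogue. The sketch below uses java.nio on a local temp directory, not HDFS, purely to illustrate why a part file only becomes visible under output_dir once it is complete.

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch: the write-to-_temporary-then-rename commit pattern on a
// local filesystem. Hadoop's output commit does the equivalent on
// HDFS: part files appear under output_dir only once fully written.
public class CommitDemo {
    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("output_dir");
        Path tmpDir = Files.createDirectories(outputDir.resolve("_temporary"));

        // Write the part file inside the temporary directory first.
        Path tmpPart = tmpDir.resolve("part-00000");
        Files.write(tmpPart, "key\tvalue\n".getBytes());

        // A reader polling output_dir sees nothing yet.
        Path finalPart = outputDir.resolve("part-00000");
        System.out.println(Files.exists(finalPart)); // false

        // Commit: a single rename makes the complete file visible.
        Files.move(tmpPart, finalPart, StandardCopyOption.ATOMIC_MOVE);
        System.out.println(Files.exists(finalPart)); // true
    }
}
```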

Call to namenode failures

2011-05-12 Thread Sidney Simmons
Hi there, I'm experiencing some unusual behaviour on our 0.20.2 hadoop cluster. Randomly (periodically), we're getting Call to namenode failures on tasktrackers causing tasks to fail: 2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner: attempt_201105090819_059_m_0038_0Child Error

Call to namenode fails with java.io.EOFException

2011-05-12 Thread Sidney Simmons
Hi there, I'm experiencing some unusual behaviour on our 0.20.2 hadoop cluster. Randomly (periodically), we're getting Call to namenode failures on tasktrackers causing tasks to fail: 2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner: attempt_201105090819_059_m_0038_0Child Error

mapper java process not exiting

2011-05-12 Thread Adi
For one long-running job we are noticing that the mapper JVMs do not exit even after the mapper is done. Any suggestions on why this could be happening? The java processes get cleaned up if I do a hadoop job -kill job_id. The java processes get cleaned up if I run it in a smaller batch and the

Call to namenode fails (java.io.EOFException)

2011-05-12 Thread Sidney Simmons
Hi there, apologies if this comes through twice but I sent the mail a few hours ago and haven't seen it on the mailing list. I'm experiencing some unusual behaviour on our 0.20.2 hadoop cluster. Randomly (periodically), we're getting Call to namenode failures on tasktrackers causing tasks to

Re: mapper java process not exiting

2011-05-12 Thread Joey Echeverria
Which version of hadoop are you running? Are you running on linux? -Joey On Thu, May 12, 2011 at 1:39 PM, Adi adi.pan...@gmail.com wrote: For one long running job we are noticing that the mapper jvms do not exit even after the mapper is done. Any suggestions on why this could be happening.

Re: mapper java process not exiting

2011-05-12 Thread Adi
Which version of hadoop are you running? Hadoop 0.21.0 with some patches. Are you running on linux? Yes Linux 2.6.18-238.9.1.el5 #1 SMP x86_64 x86_64 x86_64 GNU/Linux java version 1.6.0_21 Java(TM) SE Runtime Environment (build 1.6.0_21-b06) Java HotSpot(TM) 64-Bit Server VM (build

Datanode doesn't start but there is no exception in the log

2011-05-12 Thread Panayotis Antonopoulos
Hello, I am trying to set up Hadoop HDFS in a cluster for the first time. So far I was using pseudo-distributed mode on my PC at home and everything was working perfectly. The NameNode starts but the DataNode doesn't start and the log contains the following: 2011-05-13 04:01:13,663 INFO

Re: Datanode doesn't start but there is no exception in the log

2011-05-12 Thread Bharath Mundlapudi
Is that all the messages in the datanode log? Do you see any SHUTDOWN message also? -Bharath From: Panayotis Antonopoulos antonopoulos...@hotmail.com To: common-user@hadoop.apache.org Sent: Thursday, May 12, 2011 6:07 PM Subject: Datanode doesn't start but

Re: is it possible to concatenate output files under many reducers?

2011-05-12 Thread Jun Young Kim
yes, that is a general solution to control the number of output files. however, if you need to control the number of output files dynamically, how could you do it? if an output file name is 'A', the count of these output files needs to be 5. if an output file name is 'B', the count of these output files
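A real job would typically pair MultipleOutputs with a custom Partitioner to get per-name file counts; the sketch below only shows the routing arithmetic in plain Java. The per-name counts and the file-name scheme here are hypothetical, not a Hadoop API.

```java
import java.util.*;

// Sketch: dynamically choosing how many files each named output gets.
// In a real job this logic would live in a custom Partitioner used
// with MultipleOutputs; counts and naming here are hypothetical.
public class NamedOutputFiles {
    static final Map<String, Integer> FILES_PER_OUTPUT = new HashMap<>();
    static {
        FILES_PER_OUTPUT.put("A", 5); // output 'A' spread over 5 files
        FILES_PER_OUTPUT.put("B", 2); // output 'B' spread over 2 files
    }

    // Pick the target file for a key within its named output.
    static String fileFor(String outputName, String key) {
        int count = FILES_PER_OUTPUT.getOrDefault(outputName, 1);
        int index = (key.hashCode() & Integer.MAX_VALUE) % count;
        return String.format("%s-%05d", outputName, index);
    }

    public static void main(String[] args) {
        System.out.println(fileFor("A", "some-key"));
        System.out.println(fileFor("B", "other-key"));
    }
}
```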

Re: mapper java process not exiting

2011-05-12 Thread Joey Echeverria
Hadoop 0.21.0 with some patches. Hadoop 0.21.0 doesn't get much use, so I'm not sure how much help I can be. 2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process

Re: is it possible to concatenate output files under many reducers?

2011-05-12 Thread Joey Echeverria
You can control the number of reducers by calling job.setNumReduceTasks() before you launch it. -Joey On Thu, May 12, 2011 at 6:33 PM, Jun Young Kim juneng...@gmail.com wrote: yes. that is a general solution to control counts of output files. however, if you need to control counts of outputs
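The reason setNumReduceTasks() controls the file count is that each reducer writes exactly one part-NNNNN file, and the default HashPartitioner routes every key into [0, numReduceTasks). The formula below is a plain-Java sketch of that routing, with no Hadoop dependency.

```java
import java.util.*;

// Sketch: with r reduce tasks you get exactly r output files,
// part-00000 .. part-0000(r-1); each key is routed by the default
// HashPartitioner formula. Plain Java, no Hadoop dependency.
public class PartitionCount {
    // Hadoop's default routing: (hash & Integer.MAX_VALUE) % numReduceTasks
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 5; // job.setNumReduceTasks(5) would yield 5 part files
        Set<Integer> partitions = new TreeSet<>();
        for (String key : Arrays.asList("alpha", "beta", "gamma", "delta", "epsilon",
                                        "zeta", "eta", "theta", "iota", "kappa")) {
            partitions.add(partitionFor(key, reducers));
        }
        // Every key lands in [0, reducers), i.e. at most 5 part files exist.
        System.out.println(partitions);
    }
}
```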

Re: mapper java process not exiting

2011-05-12 Thread Adi
2011-05-12 13:52:04,147 WARN org.apache.hadoop.mapreduce.util.ProcessTree: Error executing shell command org.apache.hadoop.util.Shell$ExitCodeException: kill -12545: No such process Your logs showed that Hadoop tried to kill processes but the kill command claimed they didn't exist. The

Re: mapper java process not exiting

2011-05-12 Thread highpointe
Is there a reason for using OpenJDK and not Sun's JDK? Also... I believe there were noted issues with the .17 JDK. I will look for a link and post it if I can find it. Otherwise, I have seen this behaviour before: Hadoop detaches from the JVM and stops seeing it. I think your problem lies in

Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
Hi, I'm using FileInputFormat, which will split files logically according to their sizes into splits. Can the mapper get a pointer to these splits and know which split it is assigned? I tried looking up the Reporter class to see how it prints the logical splits on the UI for each

I can't see my messages immediately, and sometimes they don't even arrive. Why?!

2011-05-12 Thread Mark question

Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Owen O'Malley
On Thu, May 12, 2011 at 8:59 PM, Mark question markq2...@gmail.com wrote: Hi I'm using FileInputFormat which will split files logically according to their sizes into splits. Can the mapper get a pointer to these splits? and know which split it is assigned ? Look at

Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
Thanks for the reply Owen, I only knew about map.input.file. So there is no way I can see the other possible splits (start+length)? Like some function that returns the strings of map.input.file and map.input.offset of the other mappers? Thanks, Mark On Thu, May 12, 2011 at 9:08 PM, Owen O'Malley

Re: how to get user-specified Job name from hadoop for running jobs?

2011-05-12 Thread Mark question
By user-specified, do you mean when you set your job name via JobConf.setJobName(myTask)? Then using the same object you can retrieve the name as follows: JobConf conf; conf.getJobName(); ~Cheers Mark On Tue, May 10, 2011 at 10:16 AM, Mark Zand mz...@basistech.com wrote: While I can get

Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Owen O'Malley
On Thu, May 12, 2011 at 9:23 PM, Mark question markq2...@gmail.com wrote: So there is no way I can see the other possible splits (start+length)? like some function that returns strings of map.input.file and map.input.offset of the other mappers ? No, there isn't any way to do it using the

Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
Thanks again Owen, hopefully a last question: who is filling in map.input.file and map.input.offset (i.e. which class), so I can extend it to have a function that returns these strings? Thanks, Mark On Thu, May 12, 2011 at 10:07 PM, Owen O'Malley omal...@apache.org wrote:

Re: Call to namenode fails with java.io.EOFException

2011-05-12 Thread Harsh J
One of the reasons I can think of could be a version mismatch. You may want to ensure that the job in question was not carrying a separate version of Hadoop along with it inside, perhaps? On Fri, May 13, 2011 at 12:42 AM, Sidney Simmons ssimm...@nmitconsulting.co.uk wrote: Hi there, I'm

Re: Datanode doesn't start but there is no exception in the log

2011-05-12 Thread highpointe
Have you defined the IP of the DN in the slaves file? Sent from my iPhone On May 12, 2011, at 7:27 PM, Bharath Mundlapudi bharathw...@yahoo.com wrote: Is that all the messages in the datanode log? Do you see any SHUTDOWN message also? -Bharath