Intermittent DataStreamer Exception while appending to file inside HDFS

2013-10-10 Thread Arinto Murdopo
Hi there, I have the following exception while appending to an existing file in my HDFS. The error appears intermittently: when it does not show up, I can append to the file successfully; when it does appear, I cannot. Here is the error:
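
For context, a minimal sketch of the kind of append call involved, using the standard FileSystem API; the namenode URI and file path below are hypothetical, for illustration only:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsAppendSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical namenode URI; adjust to the cluster's fs.defaultFS.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            // append() opens the existing file at its end; this call is what
            // sets up the DataStreamer write pipeline named in the exception.
            try (FSDataOutputStream out = fs.append(new Path("/data/existing-file.log"))) {
                out.write("one more record\n".getBytes("UTF-8"));
            }
        }
    }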

TestHDFSCLI error

2013-10-10 Thread lei liu
I use CDH4.3.1 and run the TestHDFSCLI unit test, but I get the errors below: 2013-10-10 13:05:39,671 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(156)) - --- 2013-10-10 13:05:39,671 INFO cli.CLITestHelper

RE: Intermittent DataStreamer Exception while appending to file inside HDFS

2013-10-10 Thread Uma Maheswara Rao G
Hi Arinto, Please disable this feature on smaller clusters: dfs.client.block.write.replace-datanode-on-failure.policy. The reason for this exception is that you have replication set to 3, but from the logs it looks like you have only 2 nodes in the cluster. When you first created the pipeline we will
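
A sketch of what that could look like in the client's hdfs-site.xml. The policy property name is from the reply above; the companion enable switch and both values shown are assumptions for a 2-3 node cluster, not tested recommendations:

    <!-- Either switch the feature off entirely... -->
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
      <value>false</value>
    </property>
    <!-- ...or relax the policy so a short pipeline is not rejected. -->
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
      <value>NEVER</value>
    </property>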

Re: Problem with streaming exact binary chunks

2013-10-10 Thread Youssef Hatem
Hello, Thanks a lot for the information. It helped me figure out a solution to this problem. I posted a sketch of the solution on StackOverflow (http://stackoverflow.com/a/19295610/337194) for anybody who is interested. Best regards, Youssef Hatem On Oct 9, 2013, at 14:08, Peter Marron

Read Avro schema automatically?

2013-10-10 Thread DSuiter RDX
Hi, We are working on building a MapReduce program that takes Avro input from HDFS, gets the timestamp, and counts the number of events written on any given day. We would like the program not to need the Avro schema declared ahead of time; rather, it would be best if it could read
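
On the schema question: Avro container files embed the writer's schema in the file header, so a reader can recover it without declaring anything up front. A minimal sketch, assuming a hypothetical local file name:

    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class AvroSchemaPeek {
        public static void main(String[] args) throws Exception {
            // A GenericDatumReader needs no pre-declared schema;
            // the file header supplies the writer schema.
            DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
                    new File("events.avro"), new GenericDatumReader<GenericRecord>());
            Schema schema = reader.getSchema();  // schema read from the header
            System.out.println(schema.toString(true));
            reader.close();
        }
    }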

Hadoop-2.0.1 log files deletion

2013-10-10 Thread Reyane Oukpedjo
Hi there, I was running some mapreduce jobs on hadoop-2.1.0-beta. These are multiple unit tests that can take more than a day to finish running. However, I realized the logs for the jobs are somehow being deleted more quickly than the default 24-hour setting of mapreduce.job.userlog.retain.hours

Re: Hadoop-2.0.1 log files deletion

2013-10-10 Thread Krishna Kishore Bonagiri
Hi Reyane, Did you try yarn.nodemanager.log.retain-seconds? Increasing that might help. The default value is 10800 seconds, i.e. 3 hours. Thanks, Kishore On Thu, Oct 10, 2013 at 8:27 PM, Reyane Oukpedjo oukped...@gmail.com wrote: Hi there, I was running some mapreduce jobs on
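
For reference, the corresponding yarn-site.xml entry might look like the sketch below; the property name and the 10800-second default are from the reply above, and the raised value is only an illustration:

    <!-- Keep non-aggregated container logs for 24 hours instead of 3. -->
    <property>
      <name>yarn.nodemanager.log.retain-seconds</name>
      <value>86400</value>
    </property>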

Re: Java version with Hadoop 2.0

2013-10-10 Thread J. Ryan Earl
We recently switched all our production clusters to JDK7, off the end-of-life JDK6. The one big gotcha (not specifically a problem with the Hadoop framework, but you may hit it in your own applications or clients) is the Java 7 bytecode verifier, which can be disabled with

Re: Hadoop-2.0.1 log files deletion

2013-10-10 Thread Reyane Oukpedjo
Thanks, problem solved. Reyane OUKPEDJO On 10 October 2013 11:10, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Reyane, Did you try yarn.nodemanager.log.retain-seconds? Increasing that might help. The default value is 10800 seconds, i.e. 3 hours. Thanks,

Improving MR job disk IO

2013-10-10 Thread Xuri Nagarin
Hi, I have a simple Grep job (from the bundled examples) that I am running on an 11-node cluster. Each node has 2 x 8-core Intel Xeons (showing 32 CPUs with HT on), 64GB RAM, and 8 x 1TB disks. I have mappers set to 20 per node. When I run the Grep job, I notice that CPU gets pegged to 100% on multiple

Re: Improving MR job disk IO

2013-10-10 Thread Pradeep Gollakota
Actually... I believe that is expected behavior. Since your CPU is pegged at 100%, you're not going to be IO bound. Jobs typically tend to be either CPU bound or IO bound: if you're CPU bound, you expect to see low IO throughput; if you're IO bound, you expect to see low CPU usage. On Thu, Oct 10, 2013

Re: Improving MR job disk IO

2013-10-10 Thread Xuri Nagarin
Thanks Pradeep. Does it mean this job is a bad candidate for MR? Interestingly, running the command-line '/bin/grep' under a streaming job gives (1) much better disk throughput and (2) CPU load spread almost evenly across all cores/threads (no CPU gets pegged to 100%). On Thu, Oct 10, 2013
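
For comparison, a streaming invocation along those lines might look like the sketch below; the jar location, HDFS paths, and pattern are hypothetical. One caveat worth noting: /bin/grep exits non-zero when its input split contains no match, which streaming by default treats as a task failure:

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -input /data/logs \
        -output /tmp/grep-out \
        -mapper "/bin/grep some-pattern" \
        -numReduceTasks 0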

Re: Improving MR job disk IO

2013-10-10 Thread Pradeep Gollakota
I don't think it necessarily means that the job is a bad candidate for MR. It's a different type of workload. Hortonworks has a great article on the different types of workloads you might see and how they affect your provisioning choices at

Re: Improving MR job disk IO

2013-10-10 Thread Xuri Nagarin
On Thu, Oct 10, 2013 at 1:27 PM, Pradeep Gollakota pradeep...@gmail.com wrote: I don't think it necessarily means that the job is a bad candidate for MR. It's a different type of workload. Hortonworks has a great article on the different types of workloads you might see and how that affects

Conflicting dependency versions

2013-10-10 Thread Albert Shau
Hi, I have a YARN application that launches a mapreduce job whose mapper uses a newer version of Guava than the one Hadoop is using. Because of this, the mapper fails with a NoSuchMethod exception. Is there a way to indicate that application dependencies should be used over Hadoop

Re: Conflicting dependency versions

2013-10-10 Thread Hitesh Shah
Hi Albert, If you are using the distributed cache to push the newer version of the Guava jars, you can try setting mapreduce.job.user.classpath.first to true. If not, you can try overriding the value of mapreduce.application.classpath to ensure that the dir where the newer Guava jars are present
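
A sketch of the first suggestion in job-submission code, assuming the newer Guava jar has already been uploaded to a hypothetical HDFS path for the distributed cache:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class UserClasspathFirst {
        public static Job configure() throws Exception {
            Configuration conf = new Configuration();
            // Prefer jars shipped with the job over the framework's copies.
            conf.setBoolean("mapreduce.job.user.classpath.first", true);
            Job job = Job.getInstance(conf, "guava-conflict-demo");
            // Hypothetical HDFS location of the newer Guava jar.
            job.addFileToClassPath(new Path("/libs/guava-15.0.jar"));
            return job;
        }
    }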

Re: Intermittent DataStreamer Exception while appending to file inside HDFS

2013-10-10 Thread Arinto Murdopo
Thank you for the comprehensive answer. When I inspect our NameNode UI, I see there are 3 datanodes up. However, as you mentioned, the log only showed 2 datanodes up. Does it mean that one of the datanodes was unreachable when we tried to append to the files? Best regards, Arinto

State of Art in Hadoop Log aggregation

2013-10-10 Thread Sagar Mehta
Hi Guys, We have a fairly decent sized Hadoop cluster of about 200 nodes and I was wondering what the state of the art is if I want to aggregate and visualize Hadoop ecosystem logs, particularly 1. Tasktracker logs 2. Datanode logs 3. HBase RegionServer logs. One way is to use something like a

Re: State of Art in Hadoop Log aggregation

2013-10-10 Thread Raymond Tay
You can try Chukwa, which is part of the incubating projects under Apache. I tried it before and liked it for aggregating logs. On 11 Oct, 2013, at 1:36 PM, Sagar Mehta sagarme...@gmail.com wrote: Hi Guys, We have a fairly decent sized Hadoop cluster of about 200 nodes and was wondering what