HDFS metrics

2013-06-12 Thread Pedro Sá da Costa
I am using Yarn, and 1 - I want to know the average IO throughput of HDFS (i.e., how fast the datanodes are writing to disk) so that I can compare between 2 HDFS instances. The command hdfs dfsadmin -report doesn't give me that. Does HDFS have a command for that? 2 - and there is a similar

Re: HDFS metrics

2013-06-12 Thread Bhasker Allene
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/ On 12/06/2013 09:49, Pedro Sá da Costa wrote: I am using Yarn, and 1 - I want to know the average IO throughput of the HDFS (like know how fast the datanodes
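A typical TestDFSIO run from that post looks like the following (the jar name and path vary by Hadoop version, so treat this as a sketch; -fileSize is in MB, and results are appended to a local TestDFSIO_results.log):

    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000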

Get the history info in Yarn

2013-06-12 Thread Pedro Sá da Costa
I tried the command mapred job -list all to get the history of completed jobs, but the output doesn't show the time a job started and ended, the number of maps and reduces, or the size of the data read and written. Can I get this info with a shell command? I am using Yarn. -- Best regards,

RE: Get the history info in Yarn

2013-06-12 Thread Devaraj K
Hi, You can get all the details for a job using this mapred command: mapred job -status <Job-ID>. For this you need the Job History Server running, and the same job history server address configured on the client side. Thanks Regards Devaraj K From: Pedro Sá da Costa
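For example (the job ID here is hypothetical):

    mapred job -status job_1370000000000_0001

On Hadoop 2 the output includes the number of maps and reduces plus the job counters (e.g., HDFS bytes read/written); start and finish times are visible in the Job History Server web UI.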

Task Tracker going down on hive cluster

2013-06-12 Thread Ravi Shetye
In the last 4-5 days the task tracker on one of my slave machines has gone down a couple of times. It had been working fine for the past 4-5 months. The cluster configuration is a 4-machine cluster on AWS: 1 m2.xlarge master, 3 m2.xlarge slaves. The cluster is dedicated to running hive queries, with the data

Re: Container allocation on the same node

2013-06-12 Thread Krishna Kishore Bonagiri
Hi Harsh, What will happen when I specify the local host as the required host? Doesn't the resource manager give me all the containers on the local host? I don't want to constrain myself to the local host, which might be busy while other nodes in the cluster have enough resources available for
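For what it's worth, later 2.x releases expose a relaxLocality flag on container requests, which lets you express a node preference without hard-pinning to it. A minimal sketch, assuming the AMRMClient API from Hadoop 2.1+ (memory size, priority, and host name are illustrative):

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.util.Records;

    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(1024);
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(0);

    // Prefer the local host, but relaxLocality=true allows the scheduler
    // to fall back to other nodes/racks when that host is busy.
    ContainerRequest req = new ContainerRequest(
        capability, new String[] { "localhost" }, null, priority, true);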

Re: Management API

2013-06-12 Thread MARCOS MEDRADO RUBINELLI
Rita, There aren't any specs as far as I know, but in my experience the interface is stable enough from version to version, with the occasional extra field added here or there. If you query specifically for the beans you want (e.g.
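For example, the NameNode's JMX servlet can be queried for a single bean rather than the full dump (host and port here are the classic defaults; adjust for your cluster):

    http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo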

Re: Now give .gz file as input to the MAP

2013-06-12 Thread Sanjay Subramanian
Rahul-da I found bz2 pretty slow (although splittable) so I switched to Snappy (only sequence files are splittable, but compress-decompress is fast) Thanks Sanjay From: Rahul Bhattacharjee rahul.rec@gmail.com Reply-To:
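A minimal sketch of the sequence-file-plus-Snappy setup (new MapReduce API; assumes the native Snappy libraries are installed on the cluster):

    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    // given a configured Job:
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
    // BLOCK compression keeps the output splittable.
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);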

Re: Now give .gz file as input to the MAP

2013-06-12 Thread Rahul Bhattacharjee
Yeah, I too found it quite slow and memory-hungry! Thanks, Rahul-da On Wed, Jun 12, 2013 at 11:13 PM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote: Rahul-da I found bz2 pretty slow (although splittable) so I switched to snappy (only sequence files are splittable but

RE: Shuffle design: optimization tradeoffs

2013-06-12 Thread John Lilley
In reading this link as well as the Sailfish report, it strikes me that Hadoop skipped a potentially significant optimization. Namely, why are multiple sorted spill files merged into a single output file? Why not have the auxiliary service merge on the fly, thus avoiding landing them on disk?

Aggregating data nested into JSON documents

2013-06-12 Thread Tecno Brain
Hello, I'm new to Hadoop. I have a large quantity of JSON documents with a structure similar to what is shown below.

    {
      g : some-group-identifier,
      sg: some-subgroup-identifier,
      j : some-job-identifier,
      page : 23,
      ... // other fields omitted
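A common starting point is a mapper that parses each document and emits the group/subgroup as the key, leaving the counting or summing to the reducer. A rough sketch (assumes one JSON document per input line and uses the Jackson library, which is an extra dependency; field names are taken from the sample above):

    import java.io.IOException;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class GroupMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      private static final ObjectMapper JSON = new ObjectMapper();
      private static final LongWritable ONE = new LongWritable(1);

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        JsonNode doc = JSON.readTree(value.toString());
        // Aggregate by "group/subgroup".
        context.write(new Text(doc.get("g").asText() + "/" + doc.get("sg").asText()), ONE);
      }
    }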

Install CDH4 using tar ball with MRv1, Not YARN version

2013-06-12 Thread selva
Hi folks, I am trying to install CDH4 from tarballs with MRv1, not the YARN version (MRv2). I downloaded two tarballs (mr1-0.20.2+n and hadoop-2.0.0+n) from this location http://archive.cloudera.com/cdh4/cdh/4/ as per the Cloudera instructions, which say: If you install CDH4 from a tarball, you will

recovery accidently deleted pig script

2013-06-12 Thread feng jiang
Hi everyone, We have a Pig script scheduled to run every 4 hours. Someone accidentally deleted the pig script (rm). Is there any way to recover the script? I am guessing Hadoop copies the program to every node before running, so it may have left a copy on one of the nodes. Best regards, Feng Jiang

Re: recovery accidently deleted pig script

2013-06-12 Thread Michael Segel
Where was the pig script? On HDFS? How often does your cluster clean up the trash? (Deleted stuff doesn't get cleaned up when the file is deleted... ) It's a configurable setting so YMMV On Jun 12, 2013, at 8:58 PM, feng jiang jiangfut...@gmail.com wrote: Hi everyone, We have a pig
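If the script lived on HDFS and fs.trash.interval is set to a non-zero number of minutes, deleted files sit under the owner's trash directory until the interval expires. A quick check (paths are illustrative):

    hadoop fs -ls /user/<username>/.Trash/Current
    # if the script is there, copy it back out:
    hadoop fs -cp /user/<username>/.Trash/Current/path/to/script.pig /user/<username>/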

Re: SSD support in HDFS

2013-06-12 Thread Michael Segel
I could have sworn there was a thread on this already. (Maybe the HBase list?) Andrew P. kinda nailed it when he talked about the fact that you had to write the replication(s). If you wanted improved performance, why not look at the hybrid drives that have a small SSD buffer and a spinning

Compatibility of Hadoop 0.20.x and hadoop 1.0.3

2013-06-12 Thread Lin Yang
Hi, all, I was wondering whether an application written with the Hadoop 0.20.3 API can run on a Hadoop 1.0.3 cluster? If not, is there any way to run this application on Hadoop 1.0.3 without re-writing all the code? -- Lin Yang

Reducer not getting called

2013-06-12 Thread Omkar Joshi
Hi, I have a SequenceFile which contains several jpeg images with (image name, image bytes) as key-value pairs. My objective is to count the number of images grouped by source, something like this:

    Nikon Coolpix    100
    Sony Cybershot   251
    N82              100

The MR code is: package

Re: Reducer not getting called

2013-06-12 Thread Harsh J
You're not using the recommended @Override annotations, and are hitting a classic programming mistake. Your issue is the same as this earlier discussion: http://search-hadoop.com/m/gqA3rAaVQ7 (and the ones before it). On Thu, Jun 13, 2013 at 9:52 AM, Omkar Joshi omkar.jo...@lntinfotech.com wrote:
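The classic mistake is a reduce() whose parameters don't match the new-API Reducer (e.g., Iterator instead of Iterable), so it overloads rather than overrides and the default identity reduce runs instead. With @Override the compiler catches the mismatch. A sketch with illustrative types:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SourceCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override  // fails to compile if the signature below doesn't match Reducer's
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }
        context.write(key, new IntWritable(sum));
      }
    }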

RE: Reducer not getting called

2013-06-12 Thread Omkar Joshi
Ok but that link is broken - can you provide a working one? Regards, Omkar Joshi -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Thursday, June 13, 2013 11:01 AM To: user@hadoop.apache.org Subject: Re: Reducer not getting called You're not using the recommended

Re: Compatibility of Hadoop 0.20.x and hadoop 1.0.3

2013-06-12 Thread Lin Yang
Hi, Vinod, Thanks. 2013/6/13 Vinod Kumar Vavilapalli vino...@hortonworks.com It should mostly work. I just checked our CHANGES.txt file and haven't seen many incompatibilities introduced between those releases. But 0.20.3 is pretty old, so only one way to know for sure - compile and