Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
Thanks for the suggestions, My concerns are that I can't actually copyToLocal from the dfs because the data is huge. Say if my hadoop was 0.20 and I am upgrading to 0.20.205 I can do a namenode upgrade. I don't have to copy data out of dfs. But here I am having Apache hadoop 0.20.205 and I want t

Re: How to fetch the block names from fsimage/edits file?

2012-05-03 Thread JunYong Li
use $HADOOP_HOME/bin/hadoop oiv command 2012/5/3 Manu S > Hi All, > > Can we find out the complete block names from the fsimage we have? > > Scenario: > Accidentally we had lost the hdfs data. We have the previous fsimage before > the data loss. We have restored some data using some data recover
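JunYong's suggestion is the Offline Image Viewer (the subcommand is oiv). A minimal invocation might look like the sketch below; the fsimage and output paths are placeholders, and the processor name follows the OfflineImageViewer documentation of that era:

```shell
# Dump file and block metadata from a saved fsimage, without a running NameNode.
# -i: the recovered fsimage copy (placeholder path)
# -o: where to write the text dump
# -p Indented: processor that prints per-file detail, including block IDs
$HADOOP_HOME/bin/hadoop oiv \
  -i /backup/dfs/name/current/fsimage \
  -o /tmp/fsimage-dump.txt \
  -p Indented
```

Grepping the dump for blk_ entries should then surface the block names Manu is after.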

i can not send mail to common-user

2012-05-03 Thread JunYong Li
-- Regards Junyong

Re: Reduce Hangs at 66%

2012-05-03 Thread Michel Segel
Well... Lots of information but still missing some of the basics... Which release and version? What are your ulimits set to? How much free disk space do you have? What are you attempting to do? Stuff like that. Sent from a remote device. Please excuse any typos... Mike Segel On May 2, 2012,

Re: Problem with using BinSedesTuple as Mapper key

2012-05-03 Thread Pere Ferrera
Hi Gayatri, Looks like you might want to use a low-level enhancement of the default Hadoop API called Pangool (http://pangool.net) which uses tuples and simplifies grouping by, sorting by and joining datasets in Hadoop. On Mon, Apr 23, 2012 at 7:30 AM, Gayatri Rao wrote: > Hello, > > I am using

Re: i can not send mail to common-user

2012-05-03 Thread Nitin Pawar
this one just came in :) 2012/5/3 JunYong Li > -- > Regards > Junyong > -- Nitin Pawar

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Nitin Pawar
you can actually look at the distcp http://hadoop.apache.org/common/docs/r0.20.0/distcp.html but this means that you have two different set of clusters available to do the migration On Thu, May 3, 2012 at 12:51 PM, Austin Chungath wrote: > Thanks for the suggestions, > My concerns are that I c
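Nitin's distcp suggestion might be sketched like this; hostnames and ports are placeholders, and reading over hftp is the usual way to copy between HDFS versions that do not speak the same RPC protocol:

```shell
# Run on the *destination* (CDH3) cluster. Reading over hftp (HTTP) tolerates
# version skew between 0.20.205 and CDH3; the write side uses native hdfs://.
hadoop distcp \
  hftp://old-namenode:50070/user/data \
  hdfs://new-namenode:8020/user/data
```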

Re:Re: i can not send mail to common-user

2012-05-03 Thread lijy83
Thanks, I am lij...@gmail.com, this is another mail of mine. But I always receive the following warning: Your message Subject: Re: How to fetch the block names from fsimage/edits file? was not delivered to: "NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford"@sas.sungardrs.com because: The message

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
There is only one cluster. I am not copying between clusters. Say I have a cluster running apache 0.20.205 with 10 TB storage capacity and has about 8 TB of data. Now how can I migrate the same cluster to use cdh3 and use that same 8 TB of data. I can't copy 8 TB of data using distcp because I ha

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Prashant Kommireddi
Seems like a matter of upgrade. I am not a Cloudera user so would not know much, but you might find some help moving this to Cloudera mailing list. On Thu, May 3, 2012 at 2:51 AM, Austin Chungath wrote: > There is only one cluster. I am not copying between clusters. > > Say I have a cluster runn

RE: How to fetch the block names from fsimage/edits file?

2012-05-03 Thread Amith D K
Use $HADOOP_HOME/bin/hdfs oiv/oev in 0.23 and above versions From: JunYong Li [lij...@gmail.com] Sent: Thursday, May 03, 2012 5:04 PM To: common-user@hadoop.apache.org Subject: Re: How to fetch the block names from fsimage/edits file? use $HADOOP_HOME/bi
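The 0.23+ equivalents Amith mentions (oiv for the image, oev for the edits log) could be invoked roughly as follows; input and output paths are placeholders:

```shell
# Offline Image Viewer: dump a saved fsimage to readable text
hdfs oiv -i /backup/dfs/name/current/fsimage -o /tmp/fsimage.txt

# Offline Edits Viewer: convert an edits file to XML for inspection
hdfs oev -i /backup/dfs/name/current/edits -o /tmp/edits.xml
```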

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
Yes. This was first posted on the cloudera mailing list. There were no responses. But this is not related to cloudera as such. cdh3 is based on apache hadoop 0.20 as the base. My data is in apache hadoop 0.20.205 There is an upgrade namenode option when we are migrating to a higher version say f

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Michel Segel
Well, you've kind of painted yourself into a corner... Not sure why you didn't get a response from the Cloudera lists, but it's a generic question... 8 out of 10 TB. Are you talking effective storage or actual disks? And please tell me you've already ordered more hardware.. Right? And please t

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
Yeah I know :-) and this is not a production cluster ;-) and yes there is more hardware coming :-) On Thu, May 3, 2012 at 4:10 PM, Michel Segel wrote: > Well, you've kind of painted yourself in to a corner... > Not sure why you didn't get a response from the Cloudera lists, but it's a > generic q

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Michel Segel
Ok... When you get your new hardware... Set up one server as your new NN, JT, SN. Set up the others as a DN. (Cloudera CDH3u3) On your existing cluster... Remove your old log files, temp files on HDFS anything you don't need. This should give you some more space. Start copying some of the direct
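Michel's "free up space first" step could look like the following sketch; the path is a placeholder for whatever data you no longer need:

```shell
# On the existing 0.20.205 cluster, before any copying:
hadoop fs -rmr /user/old-logs    # delete data you can afford to lose (placeholder path)
hadoop fs -expunge               # empty the trash so the blocks are actually reclaimed
hadoop fsck /                    # sanity-check filesystem health before migrating
```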

High Availability Framework for HDFS Namenode in 2.0.0

2012-05-03 Thread Shi Yu
It sounds like an exciting feature. Has anyone tried this in practice? How does the hot standby namenode perform, and how reliable is the HDFS recovery? Is it now a good time to migrate to 2.0.0, in your opinion? Best, Shi

Re: High Availability Framework for HDFS Namenode in 2.0.0

2012-05-03 Thread Harsh J
Hey Shi Yu, Some questions of yours are answered at this comment: https://issues.apache.org/jira/browse/HDFS-1623?focusedCommentId=13215309&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13215309 (and below) and was tracked at https://issues.apache.org/jira/browse

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Edward Capriolo
Honestly that is a hassle; going from 205 to cdh3u3 is probably more of a cross-grade than an upgrade or downgrade. I would just stick it out. But yes, like Michael said, two clusters on the same gear and distcp. If you are using RF=3 you could also lower your replication to RF=2: 'hadoop dfs -setrep
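Edward's replication-reduction step, sketched below (the HDFS shell flag is -setrep):

```shell
# Drop the replication factor from 3 to 2 across the whole namespace,
# freeing roughly a third of the used space at the cost of redundancy.
# -R recurses from the given path; add -w to wait for completion (can be slow).
hadoop dfs -setrep -R 2 /
```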

Re: High Availability Framework for HDFS Namenode in 2.0.0

2012-05-03 Thread Shi Yu
Hi Harsh J, It seems that the 20% performance loss is not that bad, and at least some smart people are still working to improve it. I will keep an eye on this interesting trend. Shi

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Suresh Srinivas
This probably is a more relevant question in CDH mailing lists. That said, what Edward is suggesting seems reasonable. Reduce replication factor, decommission some of the nodes and create a new cluster with those nodes and do distcp. Could you share with us the reasons you want to migrate from Apa

Problem with cluster

2012-05-03 Thread Pat Ferrel
I'm trying to use a small cluster to make sure I understand the setup and have my code running before going to a big cluster. I have two machines. I've followed the tutorial here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ I have been using 0.20.203

Re: High Availability Framework for HDFS Namenode in 2.0.0

2012-05-03 Thread Todd Lipcon
Hi Shi, The 20% regression was prior to implementing a few optimizations on the branch. Here's the later comment: https://issues.apache.org/jira/browse/HDFS-1623?focusedCommentId=13218813&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218813 Also, the 20% measurem

Re: High Availability Framework for HDFS Namenode in 2.0.0

2012-05-03 Thread Shi Yu
Hi Todd, Okay, that sounds really good (sorry didn't grab all the information in that long page). Shi

Pig question

2012-05-03 Thread Aleksandr Elbakyan
Hello All, I was wondering if it is possible to filter all groups in Pig which have size N. This sounds like something common, but I cannot find a way to do it. Please help :) Thanks,

Re: Pig question

2012-05-03 Thread Mathias Herberts
B = GROUP A BY x; C = FOREACH B GENERATE group,SIZE(B),B; D = FILTER C BY $1 == N; On Thu, May 3, 2012 at 8:58 PM, Aleksandr Elbakyan wrote: > Hello All, > I was wandering if it is possible to filter all groups in pig which have size > N. This sounds like something common but can not find th

Re: Pig question

2012-05-03 Thread Aleksandr Elbakyan
Thanks for help - Original Message - From: Mathias Herberts To: common-user@hadoop.apache.org; Aleksandr Elbakyan Cc: Sent: Thursday, May 3, 2012 12:04 PM Subject: Re: Pig question B = GROUP A BY x; C = FOREACH B GENERATE group,SIZE(B),B; D = FILTER C BY $1 == N; On Thu, May 3,

Re: Splitting data input to Distcp

2012-05-03 Thread Himanshu Vijay
Pedro, Thanks for the response. Unfortunately I am running it on an in-house cluster and from there I need to upload to S3. -Himanshu On Wed, May 2, 2012 at 2:03 PM, Pedro Figueiredo wrote: > > On 2 May 2012, at 18:29, Himanshu Vijay wrote: > > > Hi, > > > > I have 100 files each of ~3 GB. I need

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Michel Segel
Ok... So riddle me this... I currently have a replication factor of 3. I reset it to two. What do you have to do to get the replication factor of 3 down to 2? Do I just try to rebalance the nodes? The point is that you are looking at a very small cluster. You may want to start the new cluster with

Re: Problem with cluster

2012-05-03 Thread Ravi Prakash
Hi Pat, 20.205 is the stable version before 1.0. 1.0 is not substantially different from 0.20. Any reason you don't wanna use it? I don't think "occasional HDFS corruption" is a known issue. That would be, umm... let's just say, pretty severe. Are you sure you've configured it properly? Your task

Re: How to add debugging to map- red code

2012-05-03 Thread Mapred Learn
Hi Harsh, Does doing (ii) mess with the Hadoop-level setting from (i)? Or does it happen in both options anyway? Thanks, -JJ On Fri, Apr 20, 2012 at 8:28 AM, Harsh J wrote: > Yes this is possible, and there's two ways to do this. > > 1. Use a distro/release that carries the > https://issues.apache.

Re: How to add debugging to map- red code

2012-05-03 Thread Harsh J
Doing (ii) would be an isolated app-level config and wouldn't get affected by the toggling of (i). The feature from (i) is available already in CDH 4.0.0-b2 btw. On Fri, May 4, 2012 at 4:58 AM, Mapred Learn wrote: > Hi Harsh, > > Does doing (ii) mess up with hadoop (i) level ? > > Or does it happ

Re: Reduce Hangs at 66%

2012-05-03 Thread Keith Thompson
I am not sure about ulimits, but I can answer the rest. It's a Cloudera distribution of Hadoop 0.20.2. The HDFS has 9 TB free. In the reduce step, I am taking keys in the form of (gridID, date), each with a value of 1. The reduce step just sums the 1's as the final output value for the key (It's co

Re: How to add debugging to map- red code

2012-05-03 Thread Harsh J
(Shit, sorry about my last response, I got confused and thought this were a cdh-user list. I deeply apologize.) The (i) is in 0.23.1/2.x if you upgrade to that today. On Fri, May 4, 2012 at 5:02 AM, Harsh J wrote: > Doing (ii) would be an isolated app-level config and wouldn't get > affected by

Re: How to add debugging to map- red code

2012-05-03 Thread Mapred Learn
Thanks Harsh. Are they in cdh3 too? On Thu, May 3, 2012 at 4:32 PM, Harsh J wrote: > Doing (ii) would be an isolated app-level config and wouldn't get > affected by the toggling of > (i). The feature from (i) is available already in CDH 4.0.0-b2 btw. > > On Fri, May 4, 2012 at 4:58 AM, Mapre

Re: Reduce Hangs at 66%

2012-05-03 Thread Raj Vishwanathan
Keith, What is the output of ulimit -n? Your value for the number of open files is probably too low. Raj > > From: Keith Thompson >To: common-user@hadoop.apache.org >Sent: Thursday, May 3, 2012 4:33 PM >Subject: Re: Reduce Hangs at 66% > >I am not sure abou
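Raj's check, plus the usual fix, sketched below; 32768 is a commonly recommended value for Hadoop nodes, not an official requirement:

```shell
# Show the per-process open-file limit for the current shell; stock Linux
# defaults of 1024 are widely reported as too low for busy Hadoop daemons.
ulimit -n

# The permanent fix is an /etc/security/limits.conf entry (then re-login), e.g.:
#   hdfs    -    nofile    32768
#   mapred  -    nofile    32768
```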