Thanks for the suggestions,
My concern is that I can't actually copyToLocal from the DFS because the
data is huge.
Say my Hadoop was 0.20 and I am upgrading to 0.20.205: I can do a
namenode upgrade. I don't have to copy the data out of DFS.
But here I am having Apache hadoop 0.20.205 and I want t
use the $HADOOP_HOME/bin/hadoop oiv command
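A minimal sketch of that suggestion, assuming a release that ships the offline
image viewer (on newer releases it is invoked as hdfs oiv) and hypothetical
file paths; the XML processor includes the block IDs recorded for each file:

  # dump the saved fsimage (input/output paths are hypothetical)
  $HADOOP_HOME/bin/hadoop oiv -i /backup/fsimage -o /tmp/fsimage.xml -p XML
  # the block IDs (blk_...) for each file appear in the dump;
  # exact element names vary between releases
  grep -i block_id /tmp/fsimage.xml | head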
2012/5/3 Manu S
> Hi All,
>
> Can we find out the complete block names from the fsimage we have?
>
> Scenario:
> Accidentally we had lost the hdfs data. We have the previous fsimage before
> the data loss. We have restored some data using some data recovery tools.
--
Regards
Junyong
Well...
Lots of information but still missing some of the basics...
Which release and version?
What are your ulimits set to?
How much free disk space do you have?
What are you attempting to do?
Stuff like that.
Sent from a remote device. Please excuse any typos...
Mike Segel
On May 2, 2012,
Hi Gayatri,
Looks like you might want to use a low-level enhancement of the default
Hadoop API called Pangool (http://pangool.net), which uses tuples and
simplifies grouping, sorting, and joining datasets in Hadoop.
On Mon, Apr 23, 2012 at 7:30 AM, Gayatri Rao wrote:
> Hello,
>
> I am using
this one just came in :)
2012/5/3 JunYong Li
> --
> Regards
> Junyong
>
--
Nitin Pawar
you can actually look at distcp:
http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
but this means that you need two different clusters available to do
the migration
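A minimal sketch of such a copy, with hypothetical namenode hostnames, ports
and paths for the two clusters:

  # run this on the destination cluster
  hadoop distcp hdfs://src-nn:8020/user/data hdfs://dst-nn:8020/user/data
  # when the two clusters run different HDFS versions, read the source over
  # the version-independent hftp interface instead (namenode HTTP port):
  hadoop distcp hftp://src-nn:50070/user/data hdfs://dst-nn:8020/user/data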
On Thu, May 3, 2012 at 12:51 PM, Austin Chungath wrote:
> Thanks for the suggestions,
> My concerns are that I c
Thanks, I am lij...@gmail.com; this is another mail address of mine. But I
always receive the following warning:
Your message
Subject: Re: How to fetch the block names from fsimage/edits file?
was not delivered to:
"NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford"@sas.sungardrs.com
because:
The message
There is only one cluster. I am not copying between clusters.
Say I have a cluster running Apache 0.20.205 with 10 TB of storage capacity
that has about 8 TB of data.
Now how can I migrate the same cluster to use cdh3 and use that same 8 TB
of data.
I can't copy 8 TB of data using distcp because I ha
Seems like a matter of an upgrade. I am not a Cloudera user, so I would not
know much, but you might find some help by moving this to the Cloudera mailing list.
On Thu, May 3, 2012 at 2:51 AM, Austin Chungath wrote:
> There is only one cluster. I am not copying between clusters.
>
> Say I have a cluster runn
Use
$HADOOP_HOME/bin/hdfs oiv/oev
in 0.23 and above versions
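A short sketch of both tools in that layout, with hypothetical input paths
(oiv reads an fsimage, oev reads an edits file):

  # offline image viewer: dump a saved fsimage to text
  $HADOOP_HOME/bin/hdfs oiv -i /backup/fsimage -o /tmp/fsimage.txt
  # offline edits viewer: dump an edits file (XML is the default output format)
  $HADOOP_HOME/bin/hdfs oev -i /backup/edits -o /tmp/edits.xml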
From: JunYong Li [lij...@gmail.com]
Sent: Thursday, May 03, 2012 5:04 PM
To: common-user@hadoop.apache.org
Subject: Re: How to fetch the block names from fsimage/edits file?
use $HADOOP_HOME/bi
Yes. This was first posted on the cloudera mailing list. There were no
responses.
But this is not related to cloudera as such.
cdh3 uses Apache Hadoop 0.20 as its base. My data is in Apache
Hadoop 0.20.205.
There is an upgrade namenode option when we are migrating to a higher
version say f
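Roughly, that upgrade path looks like the following sketch; the scripts are
the standard start/stop scripts, and $NEW_HADOOP_HOME is just a placeholder
for wherever the new version is installed:

  # stop HDFS on the old version, then start the new version with -upgrade
  $HADOOP_HOME/bin/stop-dfs.sh
  $NEW_HADOOP_HOME/bin/start-dfs.sh -upgrade
  # once the upgraded cluster checks out, make the upgrade permanent
  $NEW_HADOOP_HOME/bin/hadoop dfsadmin -finalizeUpgrade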
Well, you've kind of painted yourself into a corner...
Not sure why you didn't get a response from the Cloudera lists, but it's a
generic question...
8 out of 10 TB. Are you talking effective storage or actual disks?
And please tell me you've already ordered more hardware.. Right?
And please t
Yeah I know :-)
and this is not a production cluster ;-) and yes there is more hardware
coming :-)
On Thu, May 3, 2012 at 4:10 PM, Michel Segel wrote:
> Well, you've kind of painted yourself into a corner...
> Not sure why you didn't get a response from the Cloudera lists, but it's a
> generic q
Ok... When you get your new hardware...
Set up one server as your new NN, JT, SN.
Set up the others as DNs.
(Cloudera CDH3u3)
On your existing cluster...
Remove your old log files and temp files on HDFS, anything you don't need.
This should give you some more space.
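A sketch of that cleanup step on an 0.20-era cluster, with hypothetical paths:

  # see what is taking up space
  hadoop fs -du /
  # remove logs / temp output you no longer need
  hadoop fs -rmr /tmp/old-job-output
  # if the trash is enabled, expunge it so the space is actually reclaimed
  hadoop fs -expunge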
Start copying some of the direct
It sounds like an exciting feature. Has anyone tried this in practice?
How does the hot standby namenode perform and how reliable is the HDFS
recovery? Is it now a good chance to migrate to 2.0.0, in your opinions?
Best,
Shi
Hey Shi Yu,
Some questions of yours are answered at this comment:
https://issues.apache.org/jira/browse/HDFS-1623?focusedCommentId=13215309&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13215309
(and below), and it was tracked at
https://issues.apache.org/jira/browse
Honestly that is a hassle; going from 205 to cdh3u3 is probably more
of a cross-grade than an upgrade or downgrade. I would just stick it
out. But yes like Michael said two clusters on the same gear and
distcp. If you are using RF=3 you could also lower your replication to
rf=2 'hadoop dfs -setrepl
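The command cut off above is presumably the replication setter; a minimal
sketch, applied recursively from the root:

  # drop the replication factor of existing files from 3 to 2;
  # -w waits until the change has taken effect, -R recurses
  hadoop dfs -setrep -w 2 -R /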
Hi Harsh J,
It seems that the 20% performance loss is not that bad; at least some smart
people are still working to improve it. I will keep an eye on this interesting
trend.
Shi
This is probably a more relevant question for the CDH mailing lists. That said,
what Edward is suggesting seems reasonable. Reduce replication factor,
decommission some of the nodes and create a new cluster with those nodes
and do distcp.
Could you share with us the reasons you want to migrate from Apa
I'm trying to use a small cluster to make sure I understand the setup
and have my code running before going to a big cluster. I have two
machines. I've followed the tutorial here:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
I have been using 0.20.203
Hi Shi,
The 20% regression was prior to implementing a few optimizations on the
branch. Here's the later comment:
https://issues.apache.org/jira/browse/HDFS-1623?focusedCommentId=13218813&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218813
Also, the 20% measurem
Hi Todd,
Okay, that sounds really good (sorry, I didn't grab all the
information on that long page).
Shi
Hello All,
I was wondering if it is possible to filter all groups in Pig which have size
N. This sounds like something common, but I cannot find a way to do it.
Please help :)
Thanks,
B = GROUP A BY x;
C = FOREACH B GENERATE group, SIZE(A), A;
D = FILTER C BY $1 == N;
On Thu, May 3, 2012 at 8:58 PM, Aleksandr Elbakyan wrote:
> Hello All,
> I was wondering if it is possible to filter all groups in Pig which have size
> N. This sounds like something common but can not find th
Thanks for the help
- Original Message -
From: Mathias Herberts
To: common-user@hadoop.apache.org; Aleksandr Elbakyan
Cc:
Sent: Thursday, May 3, 2012 12:04 PM
Subject: Re: Pig question
B = GROUP A BY x;
C = FOREACH B GENERATE group, SIZE(A), A;
D = FILTER C BY $1 == N;
On Thu, May 3,
Pedro,
Thanks for the response. Unfortunately I am running it on an in-house cluster
and from there I need to upload to S3.
-Himanshu
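One way to do that upload from an in-house cluster is distcp over the s3n://
filesystem; the bucket name, paths and credential placeholders below are
hypothetical:

  hadoop distcp \
    -Dfs.s3n.awsAccessKeyId=YOUR_KEY \
    -Dfs.s3n.awsSecretAccessKey=YOUR_SECRET \
    hdfs://namenode:8020/data/files \
    s3n://my-bucket/files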
On Wed, May 2, 2012 at 2:03 PM, Pedro Figueiredo wrote:
>
> On 2 May 2012, at 18:29, Himanshu Vijay wrote:
>
> > Hi,
> >
> > I have 100 files each of ~3 GB. I need
Ok... So riddle me this...
I currently have a replication factor of 3.
I reset it to two.
What do you have to do to get the replication factor of 3 down to 2?
Do I just try to rebalance the nodes?
The point is that you are looking at a very small cluster.
You may want to start the be cluster with
Hi Pat,
20.205 is the stable version before 1.0. 1.0 is not substantially different
from 0.20. Any reason you don't wanna use it?
I don't think "occasional HDFS corruption" is a known issue. That would be,
umm... let's just say pretty severe. Are you sure you've configured it
properly?
Your task
Hi Harsh,
Does doing (ii) interfere with (i) at the Hadoop level?
Or does it happen with both options anyway?
Thanks,
-JJ
On Fri, Apr 20, 2012 at 8:28 AM, Harsh J wrote:
> Yes this is possible, and there's two ways to do this.
>
> 1. Use a distro/release that carries the
> https://issues.apache.
Doing (ii) would be an isolated app-level config and wouldn't get
affected by the toggling of
(i). The feature from (i) is available already in CDH 4.0.0-b2 btw.
On Fri, May 4, 2012 at 4:58 AM, Mapred Learn wrote:
> Hi Harsh,
>
> Does doing (ii) interfere with (i) at the Hadoop level?
>
> Or does it happ
I am not sure about ulimits, but I can answer the rest. It's a Cloudera
distribution of Hadoop 0.20.2. The HDFS has 9 TB free. In the reduce step,
I am taking keys in the form of (gridID, date), each with a value of 1. The
reduce step just sums the 1's as the final output value for the key (It's
co
(Shit, sorry about my last response, I got confused and thought this
was a cdh-user list. I deeply apologize.)
The (i) is in 0.23.1/2.x if you upgrade to that today.
On Fri, May 4, 2012 at 5:02 AM, Harsh J wrote:
> Doing (ii) would be an isolated app-level config and wouldn't get
> affected by
Thanks Harsh.
Are they in cdh3 too ?
On Thu, May 3, 2012 at 4:32 PM, Harsh J wrote:
> Doing (ii) would be an isolated app-level config and wouldn't get
> affected by the toggling of
> (i). The feature from (i) is available already in CDH 4.0.0-b2 btw.
>
> On Fri, May 4, 2012 at 4:58 AM, Mapre
Keith
What is the output of ulimit -n? Your value for the number of open files is
probably too low.
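To check and raise it (the user names and the 32768 value below are just
example settings):

  # run as the user that launches the Hadoop daemons
  ulimit -n
  # raise it persistently (as root) with entries like these in
  # /etc/security/limits.conf, then restart the daemons
  #   hdfs    -  nofile  32768
  #   mapred  -  nofile  32768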
Raj
>
> From: Keith Thompson
>To: common-user@hadoop.apache.org
>Sent: Thursday, May 3, 2012 4:33 PM
>Subject: Re: Reduce Hangs at 66%
>
>I am not sure abou