Hi,
I am trying to generate random data using Hadoop Streaming with Python. It's a
map-only job, and I need to run a number of maps. There is no input to the
map, as it just generates random data.
How do I specify the number of maps to run? (I am confused here because,
if I am not wrong,
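Since a map-only streaming job like this has no real input, the mapper can simply ignore stdin and emit records. A minimal sketch (the record format, count, and function name are made up for illustration):

```python
#!/usr/bin/env python
# Map-only streaming mapper: ignores its (dummy) input and emits
# random key/value records, tab-separated as streaming expects.
import random
import sys

def emit_random_records(n, out=sys.stdout, seed=None):
    rng = random.Random(seed)
    for i in range(n):
        # key = record index, value = a random float
        out.write("%d\t%f\n" % (i, rng.random()))

if __name__ == "__main__":
    emit_random_records(1000)
```

On the number-of-maps question: the map count is driven by input splits, so a common trick is to create a dummy input file with N lines and run with `-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat`, which gives one map task per line; `-D mapred.map.tasks=N` is only a hint to the framework, not a guarantee.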
Hi,
Is there any way to set the replication factor for specific
directories?
The default replication factor for the cluster is 3. I don't want to change
this global default, but I want specific directories to have a different
replication factor.
I can use the following command to set
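For existing files, per-directory replication can be set with `setrep` (the paths below are illustrative):

```shell
# Recursively set replication factor 2 on an existing directory tree
hadoop fs -setrep -R 2 /user/austin/archive

# New files only pick up the client-side dfs.replication setting,
# so pass it per command when writing:
hadoop fs -D dfs.replication=2 -put localfile /user/austin/archive/
```

Note that replication is a per-file attribute, not a persistent per-directory default, so files written later need the client-side setting again.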
, at 10:55 PM, Austin Chungath wrote:
Thanks Adam,
That was very helpful. Your second point solved my problems :-)
The hdfs port number was wrong.
I didn't use the option -ppgu; what does it do?
On Mon, May 7, 2012 at 8:07 PM, Adam Faris afa...@linkedin.com wrote:
Hi Austin
, and to
clean up the hdfs directories when you repurpose the nodes?
Does this make sense?
Sent from a remote device. Please excuse any typos...
Mike Segel
On May 3, 2012, at 5:46 AM, Austin Chungath austi...@gmail.com
wrote:
Yeah I know :-)
and this is not a production cluster
)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
Any idea why this error is coming?
I am copying one file from 0.20.205 (/docs/index.html) to cdh3u3
(/user/hadoop)
Thanks &amp; Regards,
Austin
On Mon, May 7, 2012 at 3:57 PM, Austin Chungath austi...@gmail.com wrote:
Thanks,
So I decided to try and move using distcp.
$ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
12/05/07 14:57:38 INFO
Hi,
I am migrating from Apache hadoop 0.20.205 to CDH3u3.
I don't want to lose the data that is in the HDFS of Apache hadoop
0.20.205.
How do I migrate to CDH3u3 but keep the data that I have on 0.20.205?
What are the best practices/techniques to do this?
Thanks &amp; Regards,
Austin
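One common approach for copying between HDFS versions with incompatible RPC protocols is to run distcp on the destination cluster and read from the old cluster over the read-only, version-independent HFTP interface. A sketch (host names and ports are illustrative):

```shell
# Run from the CDH3u3 (destination) cluster.
# hftp reads via the old namenode's HTTP port (often 50070);
# the destination side uses the normal hdfs:// URI.
hadoop distcp hftp://old-namenode:50070/user/hadoop \
              hdfs://new-namenode:8020/user/hadoop
```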
On Thu, May 3, 2012 at 12:51 PM, Austin Chungath austi...@gmail.com
wrote:
Thanks for the suggestions.
My concern is that I can't actually copyToLocal from the DFS because the
data is huge.
Say my Hadoop was 0.20 and I am upgrading to 0.20.205; I can do a
namenode upgrade. I don't
this to Cloudera mailing list.
On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com
wrote:
There is only one cluster. I am not copying between clusters.
Say I have a cluster running apache 0.20.205 with 10 TB storage capacity
and has about 8 TB of data.
Now how can I migrate
upcoming proposal talk... ;-)
Sent from a remote device. Please excuse any typos...
Mike Segel
On May 3, 2012, at 5:25 AM, Austin Chungath austi...@gmail.com wrote:
Yes. This was first posted on the cloudera mailing list. There were no
responses.
But this is not related to cloudera
I have a 2-node cluster running hadoop 0.20.205. There is only one user,
username: hadoop, of group: hadoop.
What is the easiest way to add one more user, say hadoop1, with DFS
permissions set as true?
I did the following to create a user in the master node.
sudo adduser --ingroup hadoop hadoop1
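The OS account alone usually isn't enough: the new user also needs a home directory in HDFS that they own. A sketch, assuming the `fs` commands are run as the HDFS superuser (here, hadoop):

```shell
# OS account on the master (as in the post)
sudo adduser --ingroup hadoop hadoop1

# HDFS home directory owned by the new user
hadoop fs -mkdir /user/hadoop1
hadoop fs -chown hadoop1:hadoop /user/hadoop1
```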
, Austin Chungath austi...@gmail.com wrote:
I tried the patch from MAPREDUCE-2457, but it didn't work for my hadoop 0.20.205.
Are you sure this patch will work for 0.20.205?
According to the description, the patch works for 0.21 and
0.22, and it says that 0.20 supports group.name without
in https://issues.apache.org/jira/browse/MAPREDUCE-2457
to have group.name support.
On Thu, Mar 1, 2012 at 6:42 PM, Austin Chungath austi...@gmail.com
wrote:
I am running the fair scheduler on hadoop 0.20.205.0
http://hadoop.apache.org/common/docs/r0.20.205.0/fair_scheduler.html
The above
, value)
Let me know if it works..
On 29 February 2012 14:18, Austin Chungath austi...@gmail.com wrote:
How can I set the fair scheduler such that all jobs submitted from a
particular user group go to a pool with the group name?
I have setup fair scheduler and I have two users: A and B (belonging to the
user group hadoop)
When these users submit hadoop jobs, the jobs from A go to a pool named A
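The fair scheduler takes the pool name from a configurable job property, so pointing it at the submitting user's primary group should do this. A sketch for mapred-site.xml:

```xml
<!-- mapred-site.xml: take the pool name from the submitting
     user's primary group instead of the default user.name -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>group.name</value>
</property>
```

Caveat: as noted earlier in the thread, group.name support on 0.20.205 may require the MAPREDUCE-2457 patch.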
#Customizing+How+Lines+are+Split+into+Key%2FValue+Pairs
Read this link, your options are wrong below.
When I am using more than one reducer in Hadoop Streaming, where I am using
my custom separator rather than the tab, it looks like the Hadoop shuffling
process is not happening as it should.
This is the reducer output when I am using '\t' to separate my key/value
pair that is output from the
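With a custom separator, streaming has to be told where the key ends, and the partitioner must hash only on the key fields; otherwise records with the same key can land on different reducers, which looks like a broken shuffle. A hedged sketch using '.' as the separator (jar path, field counts, and script names are illustrative):

```shell
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -D stream.map.output.field.separator=. \
  -D stream.num.map.output.key.fields=2 \
  -D map.output.key.field.separator=. \
  -D mapred.text.key.partitioner.options=-k1,1 \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
  -mapper mapper.py -reducer reducer.py \
  -input myInput -output myOutput \
  -numReduceTasks 2
```

Here the first two dot-separated fields form the key, but only field 1 is used for partitioning, so all records sharing that field reach the same reducer.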