spawn maps without any input data - hadoop streaming

2013-07-16 Thread Austin Chungath
Hi, I am trying to generate random data using Hadoop Streaming with Python. It's a map-only job and I need to run a number of maps. There is no input to the map as it's just going to generate random data. How do I specify the number of maps to run? ( I am confused here because, if I am not wrong,
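
One common workaround, sketched here rather than taken from the thread (the script name generate_random.py and the HDFS paths are placeholders): give the job a small dummy input of N lines and use NLineInputFormat, so each line becomes exactly one map task, with zero reducers for a map-only job.

  $ seq 1 20 > dummy.txt && hadoop fs -put dummy.txt /tmp/dummy.txt
  $ hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
      -D mapred.reduce.tasks=0 \
      -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
      -input /tmp/dummy.txt \
      -output /tmp/random_out \
      -mapper 'python generate_random.py' \
      -file generate_random.py

With 20 dummy lines this launches 20 map tasks, each of which ignores its input line and just emits generated data.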

how to set default replication factor for specific directories

2013-02-03 Thread Austin Chungath
Hi, Is there any way to set the default replication factor for specific directories? The default replication factor for the cluster is 3. I don't want to change this global default, but I want specific directories to have a different replication factor. I can use the following command to set
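
For files that already exist, per-path replication can be changed with -setrep; a hedged caveat: this is not a sticky default, so files written later under the same directory still get the client's dfs.replication unless the writer overrides it (for example with -D dfs.replication=2 on the put).

  $ hadoop fs -setrep -R -w 2 /path/to/specific/dir    # path is a placeholder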

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-09 Thread Austin Chungath
, at 10:55 PM, Austin Chungath wrote: Thanks Adam, that was very helpful. Your second point solved my problems :-) The HDFS port number was wrong. I didn't use the option -ppgu; what does it do? On Mon, May 7, 2012 at 8:07 PM, Adam Faris afa...@linkedin.com wrote: Hi Austin
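
For reference, a hedged reading of the distcp usage text for this release line: -p preserves file status and takes the flags r (replication), b (block size), u (user), g (group) and p (permission), so -ppgu keeps permissions, group and user ownership on the copied files. For example (host names and ports are placeholders):

  $ hadoop distcp -ppgu hftp://source-nn:50070/docs hdfs://dest-nn:8020/docs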

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-07 Thread Austin Chungath
, and to clean up the hdfs directories when you repurpose the nodes? Does this make sense? Sent from a remote device. Please excuse any typos... Mike Segel On May 3, 2012, at 5:46 AM, Austin Chungath austi...@gmail.com wrote: Yeah I know :-) and this is not a production cluster

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-07 Thread Austin Chungath
) at org.apache.hadoop.tools.DistCp.main(DistCp.java:908) Any idea why this error occurs? I am copying one file from 0.20.205 (/docs/index.html) to CDH3u3 (/user/hadoop). Thanks and regards, Austin On Mon, May 7, 2012 at 3:57 PM, Austin Chungath austi...@gmail.com wrote: Thanks, So I decided to try

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-07 Thread Austin Chungath
PM, Austin Chungath austi...@gmail.com wrote: Thanks, So I decided to try and move using distcp. $ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy 12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp] 12/05/07 14:57:38 INFO
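
A sketch of the usual cross-version recipe (host names and ports are placeholders, not values from the thread): run distcp on the destination CDH3u3 cluster and read the source over hftp, which is read-only and version-independent, instead of copying between mismatched hdfs RPC versions.

  $ hadoop distcp hftp://source-nn:50070/tmp hdfs://dest-nn:8020/tmp_copy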

Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
Hi, I am migrating from Apache Hadoop 0.20.205 to CDH3u3. I don't want to lose the data that is in the HDFS of Apache Hadoop 0.20.205. How do I migrate to CDH3u3 but keep the data that I have on 0.20.205? What are the best practices/techniques to do this? Thanks and regards, Austin

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
On Thu, May 3, 2012 at 11:41 AM, Austin Chungath austi...@gmail.com wrote: Hi, I am migrating from Apache hadoop 0.20.205 to CDH3u3. I don't want to lose the data that is in the HDFS of Apache hadoop 0.20.205. How do I migrate to CDH3u3 but keep the data that I have on 0.20.205. What

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
On Thu, May 3, 2012 at 12:51 PM, Austin Chungath austi...@gmail.com wrote: Thanks for the suggestions, My concerns are that I can't actually copyToLocal from the dfs because the data is huge. Say if my hadoop was 0.20 and I am upgrading to 0.20.205 I can do a namenode upgrade. I don't

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
this to Cloudera mailing list. On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com wrote: There is only one cluster. I am not copying between clusters. Say I have a cluster running apache 0.20.205 with 10 TB storage capacity and has about 8 TB of data. Now how can I migrate

Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3

2012-05-03 Thread Austin Chungath
upcoming proposal talk... ;-) Sent from a remote device. Please excuse any typos... Mike Segel On May 3, 2012, at 5:25 AM, Austin Chungath austi...@gmail.com wrote: Yes. This was first posted on the cloudera mailing list. There were no responses. But this is not related to cloudera

how to add more than one user to hadoop with DFS permissions?

2012-03-10 Thread Austin Chungath
I have a 2-node cluster running Hadoop 0.20.205. There is only one user, username: hadoop, in group: hadoop. What is the easiest way to add one more user, say hadoop1, with DFS permissions set to true? I did the following to create the user on the master node: sudo adduser --ingroup hadoop hadoop1
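
A minimal sketch of the usual follow-up steps (run the HDFS commands as the superuser, here the hadoop user; the /user home prefix is the default layout): besides the Unix account, the new user needs an HDFS home directory they own, so that with dfs.permissions set to true they can write their own files.

  $ sudo adduser --ingroup hadoop hadoop1
  $ hadoop fs -mkdir /user/hadoop1
  $ hadoop fs -chown hadoop1:hadoop /user/hadoop1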

Re: how to add more than one user to hadoop with DFS permissions?

2012-03-10 Thread Austin Chungath
:59 PM, Austin Chungath austi...@gmail.com wrote: I have a 2 node cluster running hadoop 0.20.205. There is only one user , username: hadoop of group: hadoop. What is the easiest way to add one more user say hadoop1 with DFS permissions set as true? I did the following to create a user

fairscheduler : group.name | Please edit patch to work for 0.20.205

2012-03-05 Thread Austin Chungath
, Austin Chungath austi...@gmail.com wrote: I tried the patch MAPREDUCE-2457 but it didn't work for my hadoop 0.20.205. Are you sure this patch will work for 0.20.205? According to the description it says that the patch works for 0.21 and 0.22 and it says that 0.20 supports group.name without

Re: fairscheduler : group.name doesn't work, please help

2012-03-02 Thread Austin Chungath
in https://issues.apache.org/jira/browse/MAPREDUCE-2457 to have group.name support. On Thu, Mar 1, 2012 at 6:42 PM, Austin Chungath austi...@gmail.com wrote: I am running fair scheduler on hadoop 0.20.205.0 http://hadoop.apache.org/common/docs/r0.20.205.0/fair_scheduler.html The above

Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Austin Chungath
, value) Let me know if it works.. On 29 February 2012 14:18, Austin Chungath austi...@gmail.com wrote: How can I set the fair scheduler such that all jobs submitted from a particular user group go to a pool with the group name? I have setup fair scheduler and I have two users: A and B

Re: Hadoop fair scheduler doubt: allocate jobs to pool

2012-03-01 Thread Austin Chungath
.. On 29 February 2012 14:18, Austin Chungath austi...@gmail.com wrote: How can I set the fair scheduler such that all jobs submitted from a particular user group go to a pool with the group name? I have setup fair scheduler and I have two users: A and B (belonging to the user group hadoop

Hadoop fair scheduler doubt: allocate jobs to pool

2012-02-29 Thread Austin Chungath
How can I set the fair scheduler such that all jobs submitted from a particular user group go to a pool with the group name? I have set up the fair scheduler and I have two users: A and B (belonging to the user group hadoop). When these users submit Hadoop jobs, the jobs from A go to a pool named A
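
One hedged configuration sketch, based on the fair scheduler documentation for this release line (other messages in this listing question how well group.name works on 0.20.205): point the scheduler's pool-name property at the submitter's group in the JobTracker's mapred-site.xml.

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>group.name</value>
  </property>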

Re: hadoop streaming : need help in using custom key value separator

2012-02-28 Thread Austin Chungath
#Customizing+How+Lines+are+Split+into+Key%2FValue+Pairs Read this link, your options are wrong below. On Tue, Feb 28, 2012 at 1:13 PM, Austin Chungath austi...@gmail.com wrote: When I am using more than one reducer in hadoop streaming where I am using my custom separater rather than

hadoop streaming : need help in using custom key value separator

2012-02-27 Thread Austin Chungath
When I am using more than one reducer in Hadoop Streaming, where I am using my custom separator rather than the tab, it looks like the Hadoop shuffling process is not happening as it should. This is the reducer output when I am using '\t' to separate my key-value pair that is output from the
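
A hedged sketch of the streaming options involved (mapper.py, reducer.py, the paths and the comma separator are placeholders): telling the framework where the key ends is what makes partitioning and sorting behave with more than one reducer; by default everything up to the first tab is taken as the key.

  $ hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
      -D stream.map.output.field.separator=, \
      -D stream.num.map.output.key.fields=1 \
      -D mapred.reduce.tasks=2 \
      -input /data/in -output /data/out \
      -mapper 'python mapper.py' -reducer 'python reducer.py' \
      -file mapper.py -file reducer.py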