Re: hadoop 2.4 using Protobuf - How does downgrade back to 2.3 works ?

2014-10-21 Thread Susheel Kumar Gadalay
Files added under 2.4.0 will not be present in the preserved 2.3.0 metadata; after a rollback you need to add them again. On 10/21/14, Manoj Samel manojsamelt...@gmail.com wrote: Is the pre-upgrade metadata also kept updated with any changes done in 2.4.0? Or is it just the 2.3.0 snapshot preserved? Thanks, On Sat,

Re: How to limit the number of containers requested by a pig script?

2014-10-21 Thread Jakub Stransky
Hello, as far as I understand, you cannot directly control the number of mappers. The number of reducers you can control via the PARALLEL keyword. The number of containers on a node is determined by the following combination of settings: yarn.nodemanager.resource.memory-mb - set on the cluster. And the following properties can be
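The memory settings mentioned above can be sketched as config fragments; the property names are real YARN/MapReduce settings, but the values below are purely illustrative:

```xml
<!-- yarn-site.xml: total memory the NodeManager may hand out (illustrative value) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<!-- mapred-site.xml: per-container memory for map tasks; roughly,
     containers per node = NodeManager memory / container size -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
```

With these illustrative values, a node could run roughly 8192 / 2048 = 4 map containers at a time.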

issue about submit job to local ,not to cluster

2014-10-21 Thread ch huang
hi, maillist: my cluster was moved from one IDC to another IDC. When everything was done, I ran a job and found it ran on the local box, not on the cluster. Why? It worked normally in the old IDC!

Re: issue about submit job to local ,not to cluster

2014-10-21 Thread Azuryy Yu
Please check that your mapred-site.xml is available under conf. On Tue, Oct 21, 2014 at 2:47 PM, ch huang justlo...@gmail.com wrote: hi, maillist: my cluster was moved from one IDC to another IDC. When everything was done, I ran a job and found it ran on the local box, not on the cluster. Why? It worked normally on
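Jobs typically fall back to the local job runner when mapreduce.framework.name resolves to its default of local, for example because mapred-site.xml is missing from the client's conf directory after a move. A minimal mapred-site.xml that forces submission to the YARN cluster would look like this (a sketch, not a complete configuration):

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```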

Re: How to limit the number of containers requested by a pig script?

2014-10-21 Thread Shahab Yunus
Jakub, you are saying that we can't change the number of mappers per job through the script, right? Because otherwise, if invoking through the command line or code, then we can, I think. We do have the property mapreduce.job.maps. Regards, Shahab On Tue, Oct 21, 2014 at 2:42 AM, Jakub Stransky

Re: How to limit the number of containers requested by a pig script?

2014-10-21 Thread Jakub Stransky
What I understand so far is that in Pig you cannot decide how many mappers will run. That is determined by an optimization, given the number of files, block sizes, etc. What you can control is the number of reducers, via the PARALLEL directive. For sure you can SET mapreduce.job.maps, but not
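As a sketch of the two knobs discussed in this thread (the script, file names, and values are illustrative, and SET mapreduce.job.maps is only a hint that the input splits may override):

```pig
-- wordcount.pig (illustrative): reducer parallelism is controllable,
-- mapper count is decided from the input splits.
SET mapreduce.job.maps 10;   -- a hint only, per the discussion above
lines  = LOAD 'input.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
counts = FOREACH (GROUP words BY word PARALLEL 4)  -- request 4 reducers
         GENERATE group, COUNT(words);
STORE counts INTO 'counts_out';
```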

HDFS - Consolidate 2 small volumes into 1 large volume

2014-10-21 Thread Brian C. Huffman
All, Is it possible to consolidate two small data volumes (500GB each) into a larger data volume (3TB)? I'm thinking that as long as the block file names and metadata are unique, then I should be able to shut down the datanode and use something like tar or rsync to copy the contents of each
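A toy sketch of the copy step on hypothetical /tmp paths (on a real cluster the datanode must be stopped first, and the large volume then configured as its data directory):

```shell
# Create two mock "small volumes" with uniquely named block files
# (paths and names are hypothetical, for illustration only).
mkdir -p /tmp/vol1/current /tmp/vol2/current /tmp/bigvol
echo data1 > /tmp/vol1/current/blk_1001
echo data2 > /tmp/vol2/current/blk_1002

# Copy the contents of each small volume into the large one;
# because block file names are unique, the copies do not collide.
cp -a /tmp/vol1/. /tmp/bigvol/
cp -a /tmp/vol2/. /tmp/bigvol/

ls /tmp/bigvol/current
```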

Re: Spark vs Tez

2014-10-21 Thread Tim Randles
Yeah, compared to something as performant as java... /sarcasm On 10/20/2014 10:16 PM, Adaryl Bob Wakefield, MBA wrote: Using an interpreted scripting language with something that is billing itself as being fast doesn’t sound like the best idea... B. *From:* Russell Jurney

Re: Spark vs Tez

2014-10-21 Thread Edward Capriolo
Scala is not an interpreted language. From my non-authoritative view it seems to have 2-3 (thousand) more compile phases than Java, and as a result some of the things you are doing that look like they are interpreted are actually macros that get converted into usually efficient Java code. About

Re: Spark vs Tez

2014-10-21 Thread Brian O'Neill
@edwardcapriolo, funny running into you over here in the hadoop community. =) FWIW, I have the same perspective and had the same experience with Scala and Spark. (I had LISP/Scheme in College too. =) Additionally, with the JDK8 enhancements (lambda expressions, method references, etc.), there

Hadoop 2.0 job simulation using MiniCluster

2014-10-21 Thread Yehia Elshater
Hi All, I am wondering how I can simulate a Hadoop 2.0 job using MiniCluster and MiniDFS? For example, I want to simulate the submission of a Hadoop job with some map/reduce tasks, given some input data splits. Your help is really appreciated. Yehia

Re: Hadoop 2.0 job simulation using MiniCluster

2014-10-21 Thread Hitesh Shah
Maybe check TestMRJobs.java ( hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestMRJobs.java ) ? — Hitesh On Oct 21, 2014, at 10:01 AM, Yehia Elshater y.z.elsha...@gmail.com wrote: Hi All, I am wondering

Re: HDFS - Consolidate 2 small volumes into 1 large volume

2014-10-21 Thread Azuryy Yu
Yes, you can. Stop the cluster, change hdfs-site.xml on your datanode (dfs.datanode.data.dir) to point at the large volume, copy the two small data volumes to the large volume configured above, and start the cluster. Then you are done. On Tue, Oct 21, 2014 at 9:57 PM, Brian C. Huffman
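The configuration change described above would look roughly like this in hdfs-site.xml (the paths are hypothetical; in Hadoop 2 the property is dfs.datanode.data.dir):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- was: /disks/small1,/disks/small2 (hypothetical paths) -->
  <value>/disks/large</value>
</property>
```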

A problem with Hadoop PID files

2014-10-21 Thread Han-Cheol Cho
Hi all, I am using Monit to monitor Hadoop processes and automatically restart them when they fail. From time to time, however, a Hadoop process (e.g., the namenode) runs with one PID while its pid file (in /var/run/hadoop-hdfs/hadoop-hdfs-namenode.pid) holds a different value, saying
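A minimal sketch of the kind of check Monit performs, using a sleep process as a stand-in for the namenode (the pidfile path is illustrative):

```shell
# Start a stand-in daemon and record its PID, as a start script would.
PIDFILE=/tmp/demo-namenode.pid
sleep 60 &
echo "$!" > "$PIDFILE"

# The check: is the process named in the pidfile actually running?
RECORDED=$(cat "$PIDFILE")
if kill -0 "$RECORDED" 2>/dev/null; then
  echo "pidfile matches a running process ($RECORDED)"
else
  echo "stale pidfile: $RECORDED is not running"
fi
kill "$RECORDED"
```

A mismatch like the one described (the process alive under one PID, the pidfile holding another) makes such a check report the daemon as dead, which can trigger spurious restarts.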