Re: Hadoop Test libraries: Where did they go?

2013-11-25 Thread Jay Vyas
Yup, we figured it out eventually. The artifacts now use the test-jar directive, which creates a jar file that you can reference in mvn using the type tag in your dependencies. However, FYI, I haven't been able to successfully google for the quintessential classes in the Hadoop test libs, like

Relationship between heap sizes and mapred.child.java.opt configuration

2013-11-25 Thread Chih-Hsien Wu
I'm learning about Hadoop configuration. What is the connection between the DataNode/TaskTracker heap sizes and mapred.child.java.opts? Does one have to exceed the other?

Re: Relationship between heap sizes and mapred.child.java.opt configuration

2013-11-25 Thread Kai Voigt
mapred.child.java.opts refers to the settings for the JVMs spawned by the TaskTracker. These JVMs actually run the tasks (mappers and reducers). The heap sizes for TaskTrackers and DataNodes are unrelated to those; each of these daemons runs in its own JVM. Kai
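Because the DataNode, the TaskTracker, and every task JVM are separate processes, their heaps add up on a worker node rather than sharing one pool. The stdlib-only Java sketch below does that arithmetic for a single MRv1 node. The slot counts, the 1000 MB daemon heaps, the -Xmx1024m task heap, and the 2 GB OS reserve are hypothetical numbers, and non-heap JVM memory is ignored.

```java
/**
 * Rough heap budget for one MRv1 worker node running a DataNode and a TaskTracker.
 * All sizes are in MB. Each daemon and each task runs in its own JVM, so the heaps
 * add up. Non-heap JVM memory (thread stacks, permgen, native buffers) is ignored.
 */
final class NodeMemoryBudget {

    /** Worst-case heap demand when every map and reduce slot is busy. */
    static long peakHeapMb(long dataNodeHeapMb, long taskTrackerHeapMb,
                           int mapSlots, int reduceSlots, long childHeapMb) {
        long taskHeapsMb = (long) (mapSlots + reduceSlots) * childHeapMb;
        return dataNodeHeapMb + taskTrackerHeapMb + taskHeapsMb;
    }

    /** True if the peak heap demand plus a reserve for the OS fits in physical RAM. */
    static boolean fits(long physicalRamMb, long osReserveMb, long peakHeapMb) {
        return peakHeapMb + osReserveMb <= physicalRamMb;
    }

    public static void main(String[] args) {
        // Hypothetical node: 16 GB RAM, 2 GB reserved for the OS, 1000 MB per daemon,
        // 4 map slots + 2 reduce slots, and -Xmx1024m in mapred.child.java.opts.
        long peak = peakHeapMb(1000, 1000, 4, 2, 1024);
        System.out.println("peak heap (MB): " + peak);                   // 8144
        System.out.println("fits in 16 GB: " + fits(16384, 2048, peak)); // true
    }
}
```

Under these assumptions, any increase to the task heap is multiplied by the number of slots, while shrinking the daemon heaps only frees a fixed amount, since the DataNode and TaskTracker still need enough heap for their own work.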

Re: Relationship between heap sizes and mapred.child.java.opt configuration

2013-11-25 Thread Chih-Hsien Wu
Thanks for the reply. So what is the purpose of the TaskTracker and DataNode heap sizes, then? In other words, if I want to speed up the map/reduce cycle, can I just minimize the daemon heap sizes and maximize the heap in mapred.child.java.opts? Or will minimizing the heap sizes cause out-of-memory

Map/Reduce/Driver jar(s) organization

2013-11-25 Thread John Conwell
I'm curious: what are some best practices for structuring jars for a business framework that uses MapReduce? Note: this assumes you aren't invoking MR manually from the command line, but have Hadoop integrated into a larger business framework that invokes MR jobs programmatically. By business

Errors running Hadoop 2.2.0 on Cygwin

2013-11-25 Thread Srinivas Chamarthi
I get the following error while running 2.2.0 under Cygwin. Can anyone help with this problem?
/cygdrive/c/hadoop-2.2.0/bin $ ./hdfs namenode -format
java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/server/namenode/NameNode
Caused by: java.lang.ClassNotFoundException:

Re: Errors running Hadoop 2.2.0 on Cygwin

2013-11-25 Thread Ted Yu
Can you show us the classpath? Cheers

Re: Errors running Hadoop 2.2.0 on Cygwin

2013-11-25 Thread Srinivas Chamarthi
I added echo $CLASSPATH to libexec/hadoop-config.sh, and here is what it contains:
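One thing worth checking, offered as a guess rather than a diagnosis: under Cygwin the JVM is a Windows JVM, whose path separator is ';' rather than ':', so a colon-separated /cygdrive/... classpath may not be split the way the script intended. The stdlib-only Java sketch below prints the classpath entries exactly as the running JVM splits them and reports whether a given class can be loaded. The class name ClassPathCheck is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Diagnostic sketch: prints each classpath entry as this JVM sees it, then reports
 * whether each class named on the command line can be loaded. Run it with the same
 * classpath the hdfs script builds, plus the directory holding ClassPathCheck.class.
 */
final class ClassPathCheck {

    /** Splits a classpath string on the given separator, dropping empty entries. */
    static List<String> splitClasspath(String classpath, String separator) {
        List<String> entries = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** Wildcard entries such as ".../lib/*" are checked by their directory. */
    static boolean entryExists(String entry) {
        String path = entry.endsWith("*") ? entry.substring(0, entry.length() - 1) : entry;
        return new File(path.isEmpty() ? "." : path).exists();
    }

    /** True if the class resolves; static initializers are not run. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassPathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path");
        for (String entry : splitClasspath(classpath, File.pathSeparator)) {
            System.out.println((entryExists(entry) ? "ok      " : "MISSING ") + entry);
        }
        // e.g. pass org.apache.hadoop.hdfs.server.namenode.NameNode as an argument
        for (String className : args) {
            System.out.println(className + " loadable: " + isLoadable(className));
        }
    }
}
```

If the printed entries look like one long unsplit string, or every entry is MISSING, the separator or path style is the likely problem rather than a missing jar.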

About Hadoop

2013-11-25 Thread RajBasha S
Can MapReduce run on HDFS or on any other file system? Is HDFS mandatory?

Re: About Hadoop

2013-11-25 Thread Nitin Pawar
You don't necessarily need HDFS to run MapReduce, but it's recommended :) -- Nitin Pawar

Re: Decision Tree Implementation in Hadoop MapReduce

2013-11-25 Thread Yexi Jiang
As far as I know, there is currently no ID3 implementation in Mahout, but you can use the decision forest instead: https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example. On 2013/11/25, unmesha sreeveni wrote: Is that ID3 classification? Does it include prediction too?

Re: Time taken for starting AMRMClientAsync

2013-11-25 Thread Alejandro Abdelnur
Hi Krishna, are you starting all AMs from the same JVM? Would you mind sharing the code you are using for your timing tests? Thx On Thu, Nov 21, 2013 at 6:11 AM, Krishna Kishore Bonagiri wrote: Hi Alejandro, I have modified the code in

RE: Heterogeneous Cluster

2013-11-25 Thread Andrew Machtolff
Yes, I set one up as a test. I had a Windows cluster of three machines and added a fourth, Linux node. The DataNode was able to connect and replicate, but MR jobs failed: the JobTracker/TaskTracker wasn't translating the path to the data block. They were telling the Linux node to look in C:\ for the

Only one reducer running on canopy generator

2013-11-25 Thread Chih-Hsien Wu
Hi all, I have been experiencing memory issues while running the Mahout canopy algorithm on a big data set on Hadoop. I noticed that only one reducer was running while the other nodes were idle. I was wondering whether increasing the number of reduce tasks would reduce memory usage and speed up
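Whether extra reducers help depends on how the job keys its map output. Hadoop's default HashPartitioner sends a record to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The stdlib-only Java sketch below reimplements that formula to show two cases: with one reduce task every key lands in partition 0, and if all map output shares a single key, adding reducers still leaves one reducer doing all the work. The keys are hypothetical; this says nothing specific about how the Mahout canopy job keys its output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartitionSketch {

    /** Same formula as Hadoop's default HashPartitioner: non-negative hash modulo reducer count. */
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Counts how many records land in each reducer partition. */
    static Map<Integer, Integer> histogram(List<?> keys, int numReduceTasks) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Object key : keys) {
            counts.merge(partitionFor(key, numReduceTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical map output keys.
        List<String> distinctKeys = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");
        List<String> oneKey = Arrays.asList("k", "k", "k", "k");
        System.out.println(histogram(distinctKeys, 1)); // {0=8}: one reducer gets everything
        System.out.println(histogram(distinctKeys, 4)); // spread over several partitions
        System.out.println(histogram(oneKey, 4));       // still a single partition
    }
}
```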

How can I remote debug application master

2013-11-25 Thread Jeff Zhang
Hi, I built a customized application master but am having some issues. Is it possible to remote-debug the application master? Thanks
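A custom AM is an ordinary JVM launched in a container, so one approach is to add the standard JDWP agent option to its launch command and attach a debugger to the NodeManager host that runs it. The Java sketch below assumes the client builds the AM command as a list of strings, similar to the distributed-shell example client; the port 5005 and the class name com.example.MyAppMaster are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Adds a JDWP debug agent to a JVM launch command, e.g. one built for a custom AM. */
final class AmDebugOpts {

    /** Standard JDWP socket-listener option; suspend=y makes the JVM wait for a debugger. */
    static String jdwpOption(int port, boolean suspend) {
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend="
                + (suspend ? "y" : "n") + ",address=" + port;
    }

    /** Returns a copy of the command with the JDWP option placed right after the java executable. */
    static List<String> withDebug(List<String> command, int port, boolean suspend) {
        if (command.isEmpty()) {
            throw new IllegalArgumentException("command must start with the java executable");
        }
        List<String> debugCommand = new ArrayList<>(command);
        debugCommand.add(1, jdwpOption(port, suspend));
        return debugCommand;
    }

    public static void main(String[] args) {
        // Hypothetical AM command; com.example.MyAppMaster is an illustrative class name.
        List<String> amCommand = Arrays.asList("$JAVA_HOME/bin/java", "-Xmx256m", "com.example.MyAppMaster");
        System.out.println(String.join(" ", withDebug(amCommand, 5005, true)));
    }
}
```

With suspend=y the AM waits for the debugger before running main, which helps with breakpoints in startup code. On JDK 9 and later, address=5005 listens only on localhost, so attaching from another machine needs address=*:5005.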

How can I see the history log of non-mapreduce job in yarn

2013-11-25 Thread Jeff Zhang
I have configured the YARN history server, but it seems it only shows the history logs of MapReduce jobs. I still cannot see the logs of non-MapReduce jobs. How can I see the history logs of a non-MapReduce job?

Re: Time taken for starting AMRMClientAsync

2013-11-25 Thread Krishna Kishore Bonagiri
Hi Alejandro, I don't start all the AMs from the same JVM. How can I do that? Doing so would also save the time taken to start each AM, which would be a welcome improvement too. Please let me know how I can do that. And would this also save the time taken to connect from the AM

Re: Decision Tree Implementation in Hadoop MapReduce

2013-11-25 Thread unmesha sreeveni
OK. Thx Yexi

Re: Decision Tree Implementation in Hadoop MapReduce

2013-11-25 Thread Yexi Jiang
You are welcome :)

Re: Heterogeneous Cluster

2013-11-25 Thread Azuryy Yu
I don't think this is a normal setup, and it's not recommended. You can deploy a cluster across IDCs (data centers) and across different networks, but not across operating systems, at least currently.

Re: Time taken for starting AMRMClientAsync

2013-11-25 Thread Alejandro Abdelnur
Krishna, well, it all depends on your use case. In the case of Llama, Llama is a server that hosts multiple unmanaged AMs, so all the AMs run in the same process. Thanks.

why my terasort job become a local job?

2013-11-25 Thread ch huang
Hi, mailing list: I ran terasort on my Hadoop cluster, and it ran as a local job. I don't know why; can anyone help? The Hadoop version I'm using is CDH4.4.
# sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.4.0.jar teragen 1000 /alex/terasort/10G-input

Is there any design document for YARN

2013-11-25 Thread Jeff Zhang
Hi, I am reading the YARN code, so I'm wondering whether there is any design document for YARN. I found the blog post on Hortonworks very useful, but a more detailed document would be helpful. Thanks

Re: why my terasort job become a local job?

2013-11-25 Thread Jeff Zhang
Did you configure the YARN framework in mapred-site.xml as follows?
<property><name>mapreduce.framework.name</name><value>yarn</value></property>

Working with Capacity Scheduler

2013-11-25 Thread Munna
Hi, I am working with the Capacity Scheduler on YARN and have configured several queues. I can see all the queues in the RM UI. But when I run MR jobs as the configured users (yarn, mapred), the jobs don't run and remain suspended. When I set the scheduler back to the default FIFO, everything works fine.

issue about yarn framework

2013-11-25 Thread ch huang
Hi, mailing list: I have a 5-node Hadoop cluster. Today I found a problem: one of the jobs running in the cluster took up all the containers and all the vcores, so other jobs have to stay pending. My questions are: 1. how do I find the total number of containers in Hadoop, and the number of
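A rough way to estimate the container count is per-node memory divided by the normalized container size. The Java sketch below assumes memory is the only resource the scheduler accounts for (vcores ignored), that each request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, and that all NodeManagers advertise the same yarn.nodemanager.resource.memory-mb. The 8192 MB, 1024 MB, and 1536 MB values are hypothetical, not taken from this cluster.

```java
/**
 * Memory-only estimate of how many containers of one size a YARN cluster can hold.
 * Assumes requests are rounded up to a multiple of yarn.scheduler.minimum-allocation-mb
 * and that every NodeManager advertises the same yarn.nodemanager.resource.memory-mb.
 */
final class ContainerCapacity {

    /** Rounds a request up to the next multiple of the minimum allocation (at least one unit). */
    static long normalize(long requestMb, long minAllocationMb) {
        long units = (requestMb + minAllocationMb - 1) / minAllocationMb;
        return Math.max(1, units) * minAllocationMb;
    }

    /** Containers of the given size that fit on one NodeManager. */
    static long perNode(long nodeMemoryMb, long requestMb, long minAllocationMb) {
        return nodeMemoryMb / normalize(requestMb, minAllocationMb);
    }

    /** Cluster-wide total for identical NodeManagers. */
    static long perCluster(int nodes, long nodeMemoryMb, long requestMb, long minAllocationMb) {
        return nodes * perNode(nodeMemoryMb, requestMb, minAllocationMb);
    }

    public static void main(String[] args) {
        // Hypothetical 5-node cluster: 8192 MB per NodeManager, 1024 MB minimum
        // allocation, tasks asking for 1536 MB (rounded up to 2048 MB).
        System.out.println("per node: " + perNode(8192, 1536, 1024));       // 4
        System.out.println("cluster:  " + perCluster(5, 8192, 1536, 1024)); // 20
    }
}
```

Each running job's own ApplicationMaster also occupies one of these containers, so the number left for tasks is slightly lower.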

Re: why my terasort job become a local job?

2013-11-25 Thread ch huang
Yes, I did:
# grep -C 3 framework /etc/hadoop/conf/mapred-site.xml
<configuration>
<!-- YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
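Since mapred-default.xml sets mapreduce.framework.name to local, a client that never reads this mapred-site.xml falls back to the LocalJobRunner even though the file says yarn. The stdlib-only Java sketch below parses a mapred-site.xml with the JDK's DOM parser and reports the value it finds, defaulting to local. Pointing it at whatever configuration directory the hadoop command actually uses on the submitting host is a suggestion about where to look, not a confirmed diagnosis; the class name FrameworkNameCheck is hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Reads mapreduce.framework.name from a Hadoop *-site.xml file. */
final class FrameworkNameCheck {

    /** Returns the configured value, or "local" (the mapred-default.xml value) if absent. */
    static String frameworkName(InputStream siteXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(siteXml);
            NodeList props = doc.getElementsByTagName("property");
            String value = "local";
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && "mapreduce.framework.name".equals(names.item(0).getTextContent().trim())) {
                    value = values.item(0).getTextContent().trim(); // keeps the last definition found
                }
            }
            return value;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse configuration XML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the thread; pass another mapred-site.xml to check it instead.
        String path = args.length > 0 ? args[0] : "/etc/hadoop/conf/mapred-site.xml";
        try (InputStream in = new FileInputStream(path)) {
            System.out.println("mapreduce.framework.name = " + frameworkName(in));
        }
    }
}
```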

issue about yarn scheduler

2013-11-25 Thread ch huang
Hi, mailing list: I read the Apache documentation on the YARN schedulers, and it says the Capacity Scheduler became the default scheduler. But in CDH4.4 the FIFO scheduler is still the default. Why?

issue about set the memory for each container

2013-11-25 Thread ch huang
Hi, mailing list: I find that each of my containers only uses 200 MB of heap space. How can I resize it?
# ps -ef|grep -i yarnchild
yarn 24333 8210 99 14:09 ? 00:00:05 /usr/java/jdk1.7.0_25/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx200m
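The -Xmx200m in the ps output matches the default of mapred.child.java.opts. In MRv2 the task heap is normally set per task type with mapreduce.map.java.opts and mapreduce.reduce.java.opts (which, as far as I know, fall back to mapred.child.java.opts when unset), and the container size with mapreduce.map.memory.mb and mapreduce.reduce.memory.mb; the heap should stay below the container size. The stdlib-only Java sketch below only extracts the effective -Xmx from a command line like the one above, handling k/m/g suffixes, so you can confirm that a new setting took effect. The class name XmxCheck is hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts the effective -Xmx setting from a JVM command line such as a ps line. */
final class XmxCheck {

    // "-Xmx" followed by digits and an optional k/m/g unit, ending at whitespace or end of line.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)(?=\\s|$)");

    /** Max heap in MB, or -1 if no -Xmx is present. If several appear, the last one is kept. */
    static long maxHeapMb(String commandLine) {
        Matcher m = XMX.matcher(commandLine);
        long heapMb = -1;
        while (m.find()) {
            long n = Long.parseLong(m.group(1));
            switch (m.group(2).toLowerCase(Locale.ROOT)) {
                case "g": heapMb = n * 1024; break;
                case "m": heapMb = n; break;
                case "k": heapMb = n / 1024; break;
                default:  heapMb = n / (1024 * 1024); break; // no unit means bytes
            }
        }
        return heapMb;
    }

    public static void main(String[] args) {
        String psLine = "yarn 24333 8210 99 14:09 ? 00:00:05 /usr/java/jdk1.7.0_25/bin/java"
                + " -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx200m";
        System.out.println("task heap (MB): " + maxHeapMb(psLine)); // 200
    }
}
```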