CPU utilization keeps increasing when using HDFS

2014-09-01 Thread Shiyuan Xiao
Hi, We have written a MapReduce application based on Hadoop 2.4 which keeps reading data from HDFS (pseudo-distributed mode on one node). We found that the CPU system time and user time of the application keep increasing while it is running. If we changed the application to read data from local

Re: CPU utilization keeps increasing when using HDFS

2014-09-01 Thread Stanley Shi
Would you please give the output of the top command? At least to show that the HDFS process did use that much CPU. On Mon, Sep 1, 2014 at 2:19 PM, Shiyuan Xiao shiyuan.x...@ericsson.com wrote: Hi, We have written a MapReduce application based on Hadoop 2.4 which keeps reading data from

Re: Hadoop 2.5.0 unit tests failures

2014-09-01 Thread Zhijie Shen
Hi Rajat, The situation is that some test cases have various race conditions and fail intermittently. In most cases, contributors work on a Linux box, so a test case may implicitly make an assumption that is not valid on other systems. It would be appreciated if you report

Re: Job is reported as complete on history server while on console it shows as only half way thru

2014-09-01 Thread Zhijie Shen
Do you mean multiple application attempts on YARN? One MR job shouldn't result in multiple YARN applications. Would you please share what is shown on the JHS and RM web UIs? On Thu, Aug 28, 2014 at 2:01 PM, S.L simpleliving...@gmail.com wrote: Hi All, I am running an MRv1 job on Hadoop YARN

RE: CPU utilization keeps increasing when using HDFS

2014-09-01 Thread Shiyuan Xiao
Because we are running the application against the local disk now, I can’t give the “top” command’s output when running with HDFS. But we used “top” and “pidstat” to check the CPU utilization of our application, and I can confirm that its CPU utilization was increasing and the

HDFS rollingUpgrade failed due to unexpected storage info

2014-09-01 Thread sam liu
Hi Experts, Following the section 'Upgrading Non-Federated Clusters' of http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html, I tried to upgrade Hadoop 2.2.0 to Hadoop 2.4.1. However, I failed on step 2.2 'Start NN2 as standby with the -rollingUpgrade
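
For context, the sequence that page describes for an HA pair looks roughly like the sketch below; the exact daemon-start syntax depends on how the NameNodes are managed, and since rolling upgrade support itself first appeared in the 2.4 line, whether a 2.2.0 NameNode accepts these steps at all is worth double-checking:

    # Rough sketch of the documented rolling-upgrade flow (run before swapping binaries)
    hdfs dfsadmin -rollingUpgrade prepare    # create a rollback fsimage
    hdfs dfsadmin -rollingUpgrade query      # repeat until the rollback image is reported ready
    # upgrade the software on NN2, then restart it as standby with the
    # "-rollingUpgrade started" startup option, i.e. the step 2.2 that fails here
    # finally, once every node runs the new version:
    hdfs dfsadmin -rollingUpgrade finalize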

Re: CPU utilization keeps increasing when using HDFS

2014-09-01 Thread Gordon Wang
Because you are using a one-node pseudo-distributed cluster. When the HDFS client writes data to HDFS, the client computes the data chunk checksums and the datanode verifies them. That costs CPU shares. You can monitor the CPU usage of each process. I guess the NameNode CPU usage is OK. But the client process and
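
A rough way to see how much of the client-side CPU goes to checksumming (for experimentation only, since it drops data-integrity checks) is to toggle the client checksum options and compare top/pidstat readings with and without them. A minimal sketch, assuming a stock 2.4 client; the NameNode URI and input path are placeholders:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChecksumCpuProbe {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode URI for a pseudo-distributed setup; adjust as needed.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf);

        // Skip client-side verification of block checksums on read ...
        fs.setVerifyChecksum(false);
        // ... and skip computing checksums when writing (testing only).
        fs.setWriteChecksum(false);

        // Drain a file and compare the process's user/system CPU with and
        // without the two calls above to estimate the checksum share.
        byte[] buf = new byte[1 << 20];
        try (FSDataInputStream in = fs.open(new Path(args[0]))) {
          while (in.read(buf) >= 0) {
            // discard the data; only the CPU cost matters here
          }
        }
      }
    }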

Replication factor affecting write performance

2014-09-01 Thread Laurens Bronwasser
Hi, We have a setup with two clusters. One cluster shows a very strong degradation when we increase the replication factor. Another cluster shows hardly any degradation with an increased replication factor. Any idea how to find the bottleneck in the slower cluster?

Re: Replication factor affecting write performance

2014-09-01 Thread Laurens Bronwasser
And now with the right label on the Y-axis. [inline chart image] From: Microsoft Office User laurens.bronwas...@imc.nl Date: Monday, September 1, 2014 at 9:56 AM To: user@hadoop.apache.org

Re: Error when running WordCount Hadoop program in Eclipse

2014-09-01 Thread alex
You can try this: create a log4j.properties file in the src directory or a classpath directory: log4j.rootLogger=INFO, stdout log4j.appender.stdout=org.apache.log4j.ConsoleAppender log4j.appender.stdout.layout=org.apache.log4j.PatternLayout log4j.appender.stdout.layout.ConversionPattern=%d %p
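
For reference, a complete version of that file usually looks like the sketch below; the snippet above is cut off after "%d %p", so the tail of the conversion pattern here is a typical completion rather than the original poster's exact line:

    # log4j.properties placed on the classpath (e.g. the src directory in Eclipse)
    log4j.rootLogger=INFO, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    # date, level, category and message; the "%c - %m%n" tail is an assumption
    log4j.appender.stdout.layout.ConversionPattern=%d %p %c - %m%n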

RE: CPU utilization keeps increasing when using HDFS

2014-09-01 Thread Shiyuan Xiao
Yes, the client process used the most CPU shares. But could you please help explain why the CPU utilization kept increasing? We are sure that the traffic of data provisioned into HDFS was stable. Thanks BR/Shiyuan From: Gordon Wang [mailto:gw...@pivotal.io] Sent: September 1, 2014 15:48 To:

Re: total number of map tasks

2014-09-01 Thread Chris MacKenzie
Thanks for the update ;O) Regards, Chris MacKenzie http://www.chrismackenziephotography.co.uk/ Expert in all aspects of photography telephone: 0131 332 6967 email: stu...@chrismackenziephotography.co.uk corporate: www.chrismackenziephotography.co.uk

toolrunner issue

2014-09-01 Thread rab ra
Hello, I'm having an issue running one simple MapReduce job. The relevant portion of the code is below. It gives a warning that Hadoop command-line parsing was not performed. This occurs even though the class implements the Tool interface. Any clue? public static void main(String[] args) throws Exception {

Re: toolrunner issue

2014-09-01 Thread unmesha sreeveni
public class MyClass extends Configured implements Tool{ public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); int res = ToolRunner.run(conf, new MyClass(), args); System.exit(res); } @Override public int run(String[] args) throws Exception { //
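
Spelled out (the snippet above is truncated), the usual skeleton looks like the sketch below; the job name and the body of run() are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyClass extends Configured implements Tool {

      public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic options (-D, -files, -libjars, ...)
        // and passes the resulting Configuration to this Tool via setConf().
        int res = ToolRunner.run(new Configuration(), new MyClass(), args);
        System.exit(res);
      }

      @Override
      public int run(String[] args) throws Exception {
        // Build the job from getConf() rather than a fresh Configuration;
        // creating a new Configuration here is one common reason the
        // "command line parsing was not performed" warning still appears.
        Job job = Job.getInstance(getConf(), "my job");
        job.setJarByClass(MyClass.class);
        // ... mapper, reducer, input and output settings ...
        return job.waitForCompletion(true) ? 0 : 1;
      }
    }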

Installing Hadoop from .deb installer

2014-09-01 Thread Tomas Delvechio
Hello, I found an official Hadoop 1.2.1 .deb installer and I'm trying to install it on Ubuntu 12.04 (I previously installed the JDK). After the installation, I found these scripts: * hadoop * hadoop-daemon.sh * hadoop-setup-applications.sh * hadoop-setup-hdfs.sh * hadoop-validate-setup.sh *

Re: toolrunner issue

2014-09-01 Thread rab ra
Yes, it's my bad; you are right. Thanks. On 1 Sep 2014 17:11, unmesha sreeveni unmeshab...@gmail.com wrote: public class MyClass extends Configured implements Tool{ public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); int res = ToolRunner.run(conf,

cannot start tasktracker because java.lang.NullPointerException

2014-09-01 Thread Dereje Teklu
I created a single-node cluster using Hadoop 1.2.1 following Running Hadoop on Ubuntu Linux (single-node cluster) http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/. However, while trying to run the single-node cluster I got the error: cannot start tasktracker because

Re: cannot start tasktracker because java.lang.NullPointerException

2014-09-01 Thread Harsh J
It appears you have made changes to the source and recompiled it. Line 247 of the failing class in the actual 1.2.1 release source can be seen at https://github.com/apache/hadoop-common/blob/release-1.2.1/src/mapred/org/apache/hadoop/mapred/TaskTracker.java#L247, which can never end in an NPE. You need to

Re: Tez and MapReduce

2014-09-01 Thread Alexander Pivovarov
E.g. in Hive, to switch engines: set hive.execution.engine=mr; or set hive.execution.engine=tez; Tez is faster, especially on complex queries. On Aug 31, 2014 10:33 PM, Adaryl Bob Wakefield, MBA adaryl.wakefi...@hotmail.com wrote: Can Tez and MapReduce live together and get along in the same
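
The switch is per session (or even per query), so both engines can coexist on the same cluster; a small sketch of a Hive CLI / Beeline session, assuming Tez is installed and using a hypothetical sales table:

    SET hive.execution.engine=tez;   -- subsequent queries run on Tez
    SELECT category, COUNT(*) FROM sales GROUP BY category;
    SET hive.execution.engine=mr;    -- fall back to classic MapReduce
    SELECT category, COUNT(*) FROM sales GROUP BY category;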

unsubscribe

2014-09-01 Thread Stanislaw Vasiljev
unsubscribe

Unsubscribe

2014-09-01 Thread Sankar
Sent from my iPhone

Re: Tez and MapReduce

2014-09-01 Thread jay vyas
Yes, as an example of running a MapReduce job followed by a Tez one, you can see our last post on this: https://blogs.apache.org/bigtop/entry/testing_apache_tez_with_apache . In that Bigtop/Tez testing blog post you can easily confirm on the web UI that Tez is being used. From

Re: Tez and MapReduce

2014-09-01 Thread Bing Jiang
By the way, mapreduce.framework.name can be set to yarn or yarn-tez, and it makes a difference. 2014-09-02 8:24 GMT+08:00 jay vyas jayunit100.apa...@gmail.com: Yes as an example of running a mapreduce job followed by a tez you can see our last post on this
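
For reference, that property lives in mapred-site.xml; a minimal sketch, assuming the Tez client libraries are already deployed on the cluster:

    <!-- mapred-site.xml -->
    <property>
      <name>mapreduce.framework.name</name>
      <!-- "yarn" keeps MR jobs on classic MapReduce; "yarn-tez" routes them through Tez -->
      <value>yarn-tez</value>
    </property>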

Re: Replication factor affecting write performance

2014-09-01 Thread Stanley Shi
What's the network setup and topology? Also, the size of the cluster? On Mon, Sep 1, 2014 at 4:10 PM, Laurens Bronwasser laurens.bronwas...@imc.nl wrote: And now with the right label on the Y-axis. From: Microsoft Office User laurens.bronwas...@imc.nl Date: Monday, September 1, 2014 at

RE: Replication factor affecting write performance

2014-09-01 Thread mike Zarrin
Unsubscribe From: Stanley Shi [mailto:s...@pivotal.io] Sent: Monday, September 01, 2014 7:31 PM To: user@hadoop.apache.org Cc: Julien Lehuen; Tyler McDougall Subject: Re: Replication factor affecting write performance What's the network setup and topology? Also, the size of the

RE: Replication factor affecting write performance

2014-09-01 Thread Vimal Jain
Mike, please send an email to user-unsubscr...@hbase.apache.org. Don't spam the entire mailing list. Unsubscribe From: Stanley Shi [mailto:s...@pivotal.io] Sent: Monday, September 01, 2014 7:31 PM To: user@hadoop.apache.org Cc: Julien Lehuen; Tyler McDougall Subject: Re: Replication factor

Re: Hadoop 2.5.0 unit tests failures

2014-09-01 Thread Ray Chiang
Just as a quick follow up, you can also search the JIRAs to see which tests are already known to be on the flakier side (e.g. race conditions like Zhijie mentions, or some similar hard-to-replicate reason). -Ray On Sun, Aug 31, 2014 at 11:40 PM, Zhijie Shen zs...@hortonworks.com wrote: Hi

Re: Hadoop InputFormat - Processing large number of small files

2014-09-01 Thread rab ra
Hi, I tried to use your CombineFileInputFormat implementation. However, I get the following exception: 'not org.apache.hadoop.mapred.InputFormat'. I am using Hadoop 2.4.1 and it looks like it expects the older interface, as it does not accept
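
That error usually means the old (org.apache.hadoop.mapred) and new (org.apache.hadoop.mapreduce) APIs are being mixed: the job driver expects an old-API InputFormat while the combine implementation extends the new-API class, or vice versa. As a point of comparison, here is a minimal all-new-API sketch that uses the built-in CombineTextInputFormat shipped with recent 2.x releases; the split-size cap, identity mapper and paths are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SmallFilesDriver {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "small-files");
        job.setJarByClass(SmallFilesDriver.class);
        // New-API combine format: packs many small files into a few splits.
        job.setInputFormatClass(CombineTextInputFormat.class);
        // Cap each combined split at ~128 MB (tune to your file and block sizes).
        CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
        job.setMapperClass(Mapper.class);       // identity mapper, just for the sketch
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }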