Exceptions in Data node log

2013-10-24 Thread Vimal Jain
Hi,
I am using Hadoop and HBase in pseudo-distributed mode.
I am using Hadoop version 1.1.2 and HBase version 0.94.7.

I am receiving following error messages in data node log.

hadoop-hadoop-datanode-woody.log:2013-10-24 10:55:37,579 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.20.30:50010,
storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
infoPort=50075, ipcPort=50020):Got exception while serving
blk_4378636005274237256_55385 to /192.168.20.30:
hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 48
millis timeout while waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
remote=/192.168.20.30:60739]
hadoop-hadoop-datanode-woody.log:2013-10-24 10:55:37,603 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.20.30:50010,
storageID=DS-1816106352-192.168.20.30-50010-1369314076237,
infoPort=50075, ipcPort=50020):DataXceiver
hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 48
millis timeout while waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010
remote=/192.168.20.30:60739]


Please help in understanding the cause behind this.
-- 
Thanks and Regards,
Vimal Jain


Re: Questions about hadoop-metrics2.properties

2013-10-24 Thread Luke Lu
1. File and Ganglia are the only bundled sinks, though there are
socket/json (for chukwa) and graphite sinks patches in the works.
2. Hadoop metrics (and metrics2) is mostly designed for system/process
metrics, which means you would need to attach jconsole to your map/reduce task
processes to see task metrics instrumented via metrics2. What you
actually want is probably custom job counters (see the sketch below).
3. You don't need any configuration to use JMX to access metrics2, as JMX
is currently on by default. The configuration in hadoop-metrics2.properties
is mostly for optional sink configuration and metrics filtering.
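A minimal sketch of such a custom job counter, assuming the new
(org.apache.hadoop.mapreduce) API; the class and counter names are illustrative:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Counters incremented here are aggregated by the framework and show up in
// the job client output and web UI, with no metrics2 sink or JMX setup.
public class CountingMapper extends Mapper<LongWritable, Text, Text, Text> {
  enum MyCounters { RECORDS_SEEN, RECORDS_SKIPPED }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.getCounter(MyCounters.RECORDS_SEEN).increment(1);
    // ... real map logic would go here ...
  }
}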

__Luke



On Wed, Oct 23, 2013 at 4:21 PM, Benyi Wang bewang.t...@gmail.com wrote:

 1. Does hadoop metrics2 only support File and Ganglia sink?
 2. Can I expose metrics via JMX, especially for customized metrics? I
 created some metrics in my mapreduce job and could successfully output
 them using a FileSink. But if I use jconsole to access the YARN NodeManager, I
 can only see Hadoop metrics, e.g. Hadoop/NodeManager/NodeManagerMetrics,
 etc., not mine with the prefix maptask. How do I set it up to see
 maptask/reducetask prefix metrics?
 3. Is there an example using jmx? I could not find

 The configuration syntax is:

   [prefix].[source|sink|jmx|].[instance].[option]


 http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html



How to use Hadoop2 HA's logical name URL?

2013-10-24 Thread Liu, Raymond
Hi

I have set up a Hadoop 2.2.0 HA cluster following:
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html#Configuration_details

I can check both the active and standby NameNode with the web interface.

However, it seems that the logical name cannot be used to access HDFS.

I have following settings related to HA :

In core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://public-cluster</value>
</property>


<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>3</value>
</property>

And in hdfs-site.xml:

<property>
  <name>dfs.nameservices</name>
  <value>public-cluster</value>
</property>

<property>
  <name>dfs.ha.namenodes.public-cluster</name>
  <value>nn1,nn2</value>
</property>

<property>
  <name>dfs.namenode.rpc-address.public-cluster.nn1</name>
  <value>10.0.2.31:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.public-cluster.nn2</name>
  <value>10.0.2.32:8020</value>
</property>

<property>
  <name>dfs.namenode.http-address.public-cluster.nn1</name>
  <value>10.0.2.31:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.public-cluster.nn2</name>
  <value>10.0.2.32:50070</value>
</property>

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://10.0.0.144:8485;10.0.0.145:8485;10.0.0.146:8485/public-cluster</value>
</property>

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/mnt/DP_disk1/hadoop2/hdfs/jn</value>
</property>

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

---

And then :

./bin/hdfs dfs -fs hdfs://public-cluster -ls /
-ls: java.net.UnknownHostException: public-cluster
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [path ...]

While if I use the active namenode's URL, it works:

./bin/hdfs dfs -fs hdfs://10.0.2.31:8020 -ls /
Found 1 items
drwxr-xr-x   - root supergroup  0 2013-10-24 14:30 /tmp

However, shouldn't this hdfs://public-cluster kind of thing work? Is there
anything I might have missed to make it work? Thanks!



Best Regards,
Raymond Liu



RE: Using Hbase with NN HA

2013-10-24 Thread Liu, Raymond
I encountered a similar issue with the NN HA URL.
Have you made it work?

Best Regards,
Raymond Liu


-Original Message-
From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com] 
Sent: Friday, October 18, 2013 5:17 PM
To: user@hadoop.apache.org
Subject: Using Hbase with NN HA

Hi team,
Can HBase be used with NameNode HA in the latest hadoop-2.2.0?
If yes, is there something else required to be done other than the following?
1. Set the HBase root dir to the logical name of the NameNode service
2. Keep core-site and hdfs-site in the HBase conf

I did the above two but the logical name is not recognized.

Also, it would be helpful if I could get some help on which versions of HBase,
Hive, Pig and Mahout are compatible with the latest YARN release, hadoop-2.2.0.

I am using hbase-0.94.12

Thanks

Sent from my iPhone
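A minimal sketch of what steps 1 and 2 above amount to on the client side,
assuming an HA nameservice named mycluster (the name and path are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HbaseHaRootDir {
  public static Configuration build() {
    // Step 2: HBaseConfiguration.create() picks up core-site.xml/hdfs-site.xml
    // from the classpath, which is where the dfs.nameservices, dfs.ha.* and
    // failover proxy provider settings have to live for the logical name to resolve.
    Configuration conf = HBaseConfiguration.create();
    // Step 1: the root dir uses the logical nameservice, not a NameNode host:port.
    conf.set("hbase.rootdir", "hdfs://mycluster/hbase");
    return conf;
  }
}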


Re: map container is assigned default memory size rather than user configured which will cause TaskAttempt failure

2013-10-24 Thread Tsuyoshi OZAWA
Hi,

How about checking the value of mapreduce.map.java.opts? Are your JVMs
launched with the heap size you expect?
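For reference, a sketch of the two settings being discussed, applied to the job
configuration at submit time; the -Xmx value is illustrative, the memory.mb
value is the one from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MapMemorySettings {
  public static Job build() throws Exception {
    Configuration conf = new Configuration();
    // Size of the container the AM requests from YARN for each map task.
    conf.setInt("mapreduce.map.memory.mb", 2560);
    // Heap of the JVM launched inside that container; must fit within it.
    conf.set("mapreduce.map.java.opts", "-Xmx2048m");
    return new Job(conf, "terasort");
  }
}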

On Thu, Oct 24, 2013 at 11:31 AM, Manu Zhang owenzhang1...@gmail.com wrote:
 Just confirmed the problem still exists even when the mapred-site.xml on all
 nodes has the same configuration (mapreduce.map.memory.mb = 2560).

 Any more thoughts ?

 Thanks,
 Manu


 On Thu, Oct 24, 2013 at 8:59 AM, Manu Zhang owenzhang1...@gmail.com wrote:

 Thanks Ravi.

 I do have mapred-site.xml under /etc/hadoop/conf/ on those nodes, but it
 seems weird to me that they should read configuration from those mapred-site.xml
 files, since it's the client that applies for the resources. I have another
 mapred-site.xml in the directory where I run my job. I suppose my job should
 read its conf from that mapred-site.xml. Please correct me if I am mistaken.

 Also, not always the same nodes. The number of failures is random, too.

 Anyway, I will have my settings in all the nodes' mapred-site.xml and see
 if the problem goes away.

 Manu


 On Thu, Oct 24, 2013 at 1:40 AM, Ravi Prakash ravi...@ymail.com wrote:

 Manu!

 This should not be the case. All tasks should have the configuration
 values you specified propagated to them. Are you sure your setup is correct?
 Are they always the same nodes which run with 1024Mb? Perhaps you have
 mapred-site.xml on those nodes?

 HTH
 Ravi


 On Tuesday, October 22, 2013 9:09 PM, Manu Zhang
 owenzhang1...@gmail.com wrote:
 Hi,

 I've been running Terasort on Hadoop-2.0.4.

 Every time, there is a small number of map failures (like 4 or 5)
 because of containers running beyond virtual memory limits.

 I've set mapreduce.map.memory.mb to a safe value (like 2560MB), so most
 TaskAttempts go fine, while the values of those failed maps are the default
 1024MB.

 My question is thus: why are a small number of containers' memory values
 set to the default rather than the user-configured value?

 Any thoughts ?

 Thanks,
 Manu Zhang








-- 
- Tsuyoshi


RE: How to use Hadoop2 HA's logical name URL?

2013-10-24 Thread Liu, Raymond
Hmm, my bad. The nameservice ID is not in sync in one of the properties.
After fixing it, it works.
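For reference, a client-side sketch of the point behind the fix: every
per-nameservice property, including the failover proxy provider key, must carry
the same nameservice ID used in fs.defaultFS (values taken from this thread):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaLogicalNameCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://public-cluster");
    conf.set("dfs.nameservices", "public-cluster");
    conf.set("dfs.ha.namenodes.public-cluster", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.public-cluster.nn1", "10.0.2.31:8020");
    conf.set("dfs.namenode.rpc-address.public-cluster.nn2", "10.0.2.32:8020");
    // Note the suffix: ".public-cluster", matching the nameservice ID above.
    conf.set("dfs.client.failover.proxy.provider.public-cluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    FileSystem fs = FileSystem.get(URI.create("hdfs://public-cluster"), conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
  }
}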

Best Regards,
Raymond Liu


-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com] 
Sent: Thursday, October 24, 2013 3:03 PM
To: user@hadoop.apache.org
Subject: How to use Hadoop2 HA's logical name URL?

Hi

I have set up a Hadoop 2.2.0 HA cluster following:
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html#Configuration_details

I can check both the active and standby NameNode with the web interface.

However, it seems that the logical name cannot be used to access HDFS.

I have following settings related to HA :

In core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://public-cluster</value>
</property>


<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>3</value>
</property>

And in hdfs-site.xml:

<property>
  <name>dfs.nameservices</name>
  <value>public-cluster</value>
</property>

<property>
  <name>dfs.ha.namenodes.public-cluster</name>
  <value>nn1,nn2</value>
</property>

<property>
  <name>dfs.namenode.rpc-address.public-cluster.nn1</name>
  <value>10.0.2.31:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.public-cluster.nn2</name>
  <value>10.0.2.32:8020</value>
</property>

<property>
  <name>dfs.namenode.http-address.public-cluster.nn1</name>
  <value>10.0.2.31:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.public-cluster.nn2</name>
  <value>10.0.2.32:50070</value>
</property>

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://10.0.0.144:8485;10.0.0.145:8485;10.0.0.146:8485/public-cluster</value>
</property>

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/mnt/DP_disk1/hadoop2/hdfs/jn</value>
</property>

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

---

And then :

./bin/hdfs dfs -fs hdfs://public-cluster -ls /
-ls: java.net.UnknownHostException: public-cluster
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [path ...]

While if I use the active namenode's URL, it works:

./bin/hdfs dfs -fs hdfs://10.0.2.31:8020 -ls /
Found 1 items
drwxr-xr-x   - root supergroup  0 2013-10-24 14:30 /tmp

However, shouldn't this hdfs://public-cluster kind of thing work? Is there
anything I might have missed to make it work? Thanks!



Best Regards,
Raymond Liu



Re: dynamically resizing Hadoop cluster on AWS?

2013-10-24 Thread Nan Zhu
Thank you very much for replying and sorry for posting on the wrong list  

Best,  

--  
Nan Zhu
School of Computer Science,
McGill University



On Thursday, October 24, 2013 at 1:06 AM, Jun Ping Du wrote:

 Move to @user alias.
  
 - Original Message -
 From: Jun Ping Du j...@vmware.com (mailto:j...@vmware.com)
 To: gene...@hadoop.apache.org (mailto:gene...@hadoop.apache.org)
 Sent: Wednesday, October 23, 2013 10:03:27 PM
 Subject: Re: dynamically resizing Hadoop cluster on AWS?
  
 If only compute nodes (TaskTracker or NodeManager) run in your instances, then 
 decommissioning the nodes and shutting down the related EC2 instances should be fine, although 
 some finished/running tasks might need to be re-run automatically. In future, 
 we will support graceful decommission (tracked by YARN-914 and 
 MAPREDUCE-5381) so that no tasks need to be rerun in this case (but you need to 
 wait a while).
  
 Thanks,
  
 Junping
  
 - Original Message -
 From: Nan Zhu zhunans...@gmail.com (mailto:zhunans...@gmail.com)
 To: gene...@hadoop.apache.org (mailto:gene...@hadoop.apache.org)
 Sent: Wednesday, October 23, 2013 8:15:51 PM
 Subject: Re: dynamically resizing Hadoop cluster on AWS?
  
 Oh, I’m not running HDFS in the instances, I use S3 to save data
  
 --  
 Nan Zhu
 School of Computer Science,
 McGill University
  
  
  
 On Wednesday, October 23, 2013 at 11:11 PM, Nan Zhu wrote:
  
  Hi, all  
   
  I’m running a Hadoop cluster on AWS EC2,  
   
  I would like to dynamically resizing the cluster so as to reduce the cost, 
  is there any solution to achieve this?  
   
  E.g. I would like to cut the cluster size with a half, is it safe to just 
  shutdown the instances (if some tasks are just running on them, can I rely 
  on the speculative execution to re-run them on other nodes?)
   
  I cannot use EMR, since I’m running a customized version of Hadoop  
   
  Best,  
   
  --  
  Nan Zhu
  School of Computer Science,
  McGill University
   
   
  
  
  




Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows

2013-10-24 Thread Angelo Matarazzo
Hi,
I think it is not useful to try to install Hadoop on Windows, because
Hadoop is very integrated with Linux and there is no support for Windows.


2013/10/23 chris bidwell chris.bidw...@oracle.com

 Is there any documentation or instructions on installing Hadoop 2.2.0 on
 Microsoft Windows?
 Thank you.

 -Chris




Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows

2013-10-24 Thread Nitin Pawar
Disclosure: I do not work for Hortonworks, I just use their product. Please
do not bash me up.
Angelo,
that's not entirely correct now.

Hortonworks has done a tremendous amount of work to port Hadoop to Windows
as well.
Here is their press release:
http://hortonworks.com/about-us/news/hadoop-on-windows/

Chris,
I do remember seeing Hadoop 2.x for Windows (
http://hortonworks.com/blog/announcing-beta-release-of-apache-hadoop-2/)
I never tried it myself on Windows, but you can reach out to their
support forum and I am sure someone will be happy to help.




On Thu, Oct 24, 2013 at 6:08 PM, Angelo Matarazzo matarazzoang...@gmail.com
 wrote:

 Hi,
 I think that it is not useful trying to install hadoop on Windows because
 hadoop is very integrated in Linux and there is no support for Windows


 2013/10/23 chris bidwell chris.bidw...@oracle.com

 Is there any documentation or instructions on installing Hadoop 2.2.0 on
 Microsoft Windows?
 Thank you.

 -Chris





-- 
Nitin Pawar


Hadoop 2.2.0 :What are the new features for Windows users

2013-10-24 Thread Joy George
In the release notes I could see that support for running Hadoop on
Microsoft Windows is included.

Can somebody tell me what those features are?

- Are these new features added to support easy Windows installation?
- Is there any inbuilt MSFT .NET support introduced?

Joy George K
www.JoymonOnline.in



Orion Systems Integrators Inc


Confidentiality Notice: This e-mail message, including any attachments, is for 
the sole use of the intended recipient(s) and may contain confidential and 
privileged information. Any unauthorized review, use, disclosure or 
distribution is prohibited. If you are not the intended recipient, please 
contact the sender by reply e-mail, delete and then destroy all copies of the 
original message


Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows

2013-10-24 Thread DSuiter RDX
It was my understanding that Hortonworks depended on Cygwin (UNIX emulation
on Windows) for most of the Bigtop family of tools - Hadoop core,
MapReduce, etc. So you will probably make all your configuration files
in Windows, since XML is agnostic, and you can develop in Windows, since JARs
and other Java constructs are agnostic by design, but when things are
actually happening on your cluster, Cygwin is in the middle. For a
framework like Hadoop, I question the wisdom of choosing a host
environment that uses more memory to begin with, and then adding an emulation
layer that complicates things and takes still more memory (since most
Hadoop constraints are memory-based), simply for the convenience of being
able to use my mouse.

If you have a competent *NIX admin, you may consider the benefits of using
Hadoop in Linux/UNIX, and leveraging the many web-based management tools
(usually will come with a Hadoop distribution) and 3rd-party development
tools from Informatica or Pentaho (let you build MapReduce, Pig, etc jobs
in GUI) rather than going the Windows route - not only are you using
hardware resources better, you also don't need to worry about licensing,
and you won't need to reboot your cluster every Patch Tuesday. :-)

Thanks!

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Thu, Oct 24, 2013 at 8:49 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 Disclosure: I do not work for Hortonworks, I just use their product.
 please do not bash me up
 Angelo,
 thats not entirely correct now.

 Hortonworks has done tremendous amount of work to port hadoop to windows
 os as well.
 Here is there press release:
 http://hortonworks.com/about-us/news/hadoop-on-windows/

 Chris,
 I do remember seeing hadoop 2.x for windows (
 http://hortonworks.com/blog/announcing-beta-release-of-apache-hadoop-2/)
 I never tried it myself it on windows but you can reach out to their
 support forum and I am sure someone will be happy to help .




 On Thu, Oct 24, 2013 at 6:08 PM, Angelo Matarazzo 
 matarazzoang...@gmail.com wrote:

 Hi,
 I think that it is not useful trying to install hadoop on Windows because
 hadoop is very integrated in Linux and there is no support for Windows


 2013/10/23 chris bidwell chris.bidw...@oracle.com

 Is there any documentation or instructions on installing Hadoop 2.2.0 on
 Microsoft Windows?
 Thank you.

 -Chris





 --
 Nitin Pawar



Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows

2013-10-24 Thread Adam Diaz
No Cygwin is involved... We have the installation via MSI clearly documented
on our site; the prerequisites are Python, Visual C++, the JDK and the .NET
framework. This work was done in conjunction with Microsoft, which is not in
the business of supporting Cygwin. Hadoop is Java, and Java runs on Windows.


On Thu, Oct 24, 2013 at 9:20 AM, DSuiter RDX dsui...@rdx.com wrote:

 It was my understanding that HortonWorks depended on CygWin (UNIX
 emulation on Windows) for most of the Bigtop family of tools - Hadoop
 core, MapReduce, etc. - so, you will probably make all your configuration
 files in Windows, since XML is agnostic, and can develop in Windows, since
 JARs and other Java constructs are agnostic by design, but when things are
 actually happening on your cluster, CygWin is in the middle. For a
 framework like Hadoop, I question the wisdom of deciding to use a host
 environment that uses more memory to begin with, then adding an emulation
 layer that complicates things and takes still more memory (since most
 Hadoop constraints are memory-based) simply for the convenience of being
 able to use my mouse.

 If you have a competent *NIX admin, you may consider the benefits of using
 Hadoop in Linux/UNIX, and leveraging the many web-based management tools
 (usually will come with a Hadoop distribution) and 3rd-party development
 tools from Informatica or Pentaho (let you build MapReduce, Pig, etc jobs
 in GUI) rather than going the Windows route - not only are you using
 hardware resources better, you also don't need to worry about licensing,
 and you won't need to reboot your cluster every Patch Tuesday. :-)

 Thanks!

 *Devin Suiter*
 Jr. Data Solutions Software Engineer
 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
 Google Voice: 412-256-8556 | www.rdx.com


 On Thu, Oct 24, 2013 at 8:49 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 Disclosure: I do not work for Hortonworks, I just use their product.
 please do not bash me up
 Angelo,
 thats not entirely correct now.

 Hortonworks has done tremendous amount of work to port hadoop to windows
 os as well.
 Here is there press release:
 http://hortonworks.com/about-us/news/hadoop-on-windows/

 Chris,
 I do remember seeing hadoop 2.x for windows (
 http://hortonworks.com/blog/announcing-beta-release-of-apache-hadoop-2/)
 I never tried it myself it on windows but you can reach out to their
 support forum and I am sure someone will be happy to help .




 On Thu, Oct 24, 2013 at 6:08 PM, Angelo Matarazzo 
 matarazzoang...@gmail.com wrote:

 Hi,
 I think that it is not useful trying to install hadoop on Windows
 because hadoop is very integrated in Linux and there is no support for
 Windows


 2013/10/23 chris bidwell chris.bidw...@oracle.com

 Is there any documentation or instructions on installing Hadoop 2.2.0
 on Microsoft Windows?
 Thank you.

 -Chris





 --
 Nitin Pawar





-- 
Adam Diaz | Solution Engineer - Big Data
--

Phone:919 609 4842
  Email:  ad...@hortonworks.com
  Website:   http://www.hortonworks.com/

  * Follow Us: *
http://facebook.com/hortonworks/?utm_source=WiseStamputm_medium=emailutm_term=utm_content=utm_campaign=signature
http://twitter.com/hortonworks?utm_source=WiseStamputm_medium=emailutm_term=utm_content=utm_campaign=signature
http://www.linkedin.com/company/hortonworks?utm_source=WiseStamputm_medium=emailutm_term=utm_content=utm_campaign=signature


  Latest From Our Blog:  Strong Ecosystem Support for HDP 2.0, Enabling the
Modern Data Architecture
http://hortonworks.com/blog/strong-ecosystem-support-for-hdp-2-0/?utm_source=WiseStamputm_medium=emailutm_term=utm_content=utm_campaign=signature

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows

2013-10-24 Thread Angelo Matarazzo
Sorry for my ignorance... I won't bash you up, Nitin... heh. Thank you very
much for your post, Adam.
I'm going to look at your work.



2013/10/24 DSuiter RDX dsui...@rdx.com

 Very cool, Adam! Thanks for the clarification, and great work by you guys
 porting it over to run natively.

 *Devin Suiter*
 Jr. Data Solutions Software Engineer
 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
 Google Voice: 412-256-8556 | www.rdx.com


 On Thu, Oct 24, 2013 at 9:27 AM, Adam Diaz ad...@hortonworks.com wrote:

 No Cygwin is involved...We have the installation via MSI clearly
 documented on our site. Python, Visual C++, JDK and .Net framework. This
 work was done in conjunction with Microsoft who is not in the business of
 supporting Cygwin. Hadoop is Java and Java runs on windows.


 On Thu, Oct 24, 2013 at 9:20 AM, DSuiter RDX dsui...@rdx.com wrote:

 It was my understanding that HortonWorks depended on CygWin (UNIX
 emulation on Windows) for most of the Bigtop family of tools - Hadoop
 core, MapReduce, etc. - so, you will probably make all your configuration
 files in Windows, since XML is agnostic, and can develop in Windows, since
 JARs and other Java constructs are agnostic by design, but when things are
 actually happening on your cluster, CygWin is in the middle. For a
 framework like Hadoop, I question the wisdom of deciding to use a host
 environment that uses more memory to begin with, then adding an emulation
 layer that complicates things and takes still more memory (since most
 Hadoop constraints are memory-based) simply for the convenience of being
 able to use my mouse.

 If you have a competent *NIX admin, you may consider the benefits of
 using Hadoop in Linux/UNIX, and leveraging the many web-based management
 tools (usually will come with a Hadoop distribution) and 3rd-party
 development tools from Informatica or Pentaho (let you build MapReduce,
 Pig, etc jobs in GUI) rather than going the Windows route - not only are
 you using hardware resources better, you also don't need to worry about
 licensing, and you won't need to reboot your cluster every Patch Tuesday.
 :-)

 Thanks!

 *Devin Suiter*
 Jr. Data Solutions Software Engineer
 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
 Google Voice: 412-256-8556 | www.rdx.com


 On Thu, Oct 24, 2013 at 8:49 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 Disclosure: I do not work for Hortonworks, I just use their product.
 please do not bash me up
 Angelo,
 thats not entirely correct now.

 Hortonworks has done tremendous amount of work to port hadoop to
 windows os as well.
 Here is there press release:
 http://hortonworks.com/about-us/news/hadoop-on-windows/

 Chris,
 I do remember seeing hadoop 2.x for windows (
 http://hortonworks.com/blog/announcing-beta-release-of-apache-hadoop-2/
 )
 I never tried it myself it on windows but you can reach out to their
 support forum and I am sure someone will be happy to help .




 On Thu, Oct 24, 2013 at 6:08 PM, Angelo Matarazzo 
 matarazzoang...@gmail.com wrote:

 Hi,
 I think that it is not useful trying to install hadoop on Windows
 because hadoop is very integrated in Linux and there is no support for
 Windows


 2013/10/23 chris bidwell chris.bidw...@oracle.com

 Is there any documentation or instructions on installing Hadoop 2.2.0
 on Microsoft Windows?
 Thank you.

 -Chris





 --
 Nitin Pawar





 --
Adam Diaz | Solution Engineer - Big Data
 --

 Phone:919 609 4842
   Email:  ad...@hortonworks.com
   Website:   http://www.hortonworks.com/

   * Follow Us: *
 http://facebook.com/hortonworks/?utm_source=WiseStamputm_medium=emailutm_term=utm_content=utm_campaign=signature
 http://twitter.com/hortonworks?utm_source=WiseStamputm_medium=emailutm_term=utm_content=utm_campaign=signature
 http://www.linkedin.com/company/hortonworks?utm_source=WiseStamputm_medium=emailutm_term=utm_content=utm_campaign=signature


   Latest From Our Blog:  Strong Ecosystem Support for HDP 2.0, Enabling
 the Modern Data Architecture 
 http://hortonworks.com/blog/strong-ecosystem-support-for-hdp-2-0/?utm_source=WiseStamputm_medium=emailutm_term=utm_content=utm_campaign=signature






NullPointerException when trying to write mapper output

2013-10-24 Thread Marcelo Elias Del Valle
I am using Hadoop 1.0.3 on Amazon EMR. I have a map/reduce job configured
like this:

private static final String TEMP_PATH_PREFIX =
    System.getProperty("java.io.tmpdir") + "/dmp_processor_tmp";
...
private Job setupProcessorJobS3() throws IOException, DataGrinderException {
    String inputFiles = System.getProperty(DGConfig.INPUT_FILES);
    Job processorJob = new Job(getConf(), PROCESSOR_JOBNAME);
    processorJob.setJarByClass(DgRunner.class);
    processorJob.setMapperClass(EntityMapperS3.class);
    processorJob.setReducerClass(SelectorReducer.class);
    processorJob.setOutputKeyClass(Text.class);
    processorJob.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(processorJob, new Path(TEMP_PATH_PREFIX));
    processorJob.setOutputFormatClass(TextOutputFormat.class);
    processorJob.setInputFormatClass(NLineInputFormat.class);
    FileInputFormat.setInputPaths(processorJob, inputFiles);
    NLineInputFormat.setNumLinesPerSplit(processorJob, 1);
    return processorJob;
}

In my mapper class, I have:

private Text outkey = new Text();
private Text outvalue = new Text();
...
outkey.set(entity.getEntityId().toString());
outvalue.set(input.getId().toString());
printLog("context write");
context.write(outkey, outvalue);

This last line (`context.write(outkey, outvalue);`) causes the exception below.
Of course, both `outkey` and `outvalue` are not null.

2013-10-24 05:48:48,422 INFO
com.s1mbi0se.grinder.core.mapred.EntityMapperCassandra (main): Current
Thread: Thread[main,5,main]Current timestamp: 1382593728422 context write
2013-10-24 05:48:48,422 ERROR
com.s1mbi0se.grinder.core.mapred.EntityMapperCassandra (main): Error on
entitymapper for input: 03a07858-4196-46dd-8a2c-23dd824d6e6e
java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1293)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1210)
at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
at org.apache.hadoop.io.Text.write(Text.java:281)
at
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
at
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1077)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:698)
at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at
com.s1mbi0se.grinder.core.mapred.EntityMapper.map(EntityMapper.java:78)
at
com.s1mbi0se.grinder.core.mapred.EntityMapperS3.map(EntityMapperS3.java:34)
at
com.s1mbi0se.grinder.core.mapred.EntityMapperS3.map(EntityMapperS3.java:14)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-10-24 05:48:48,422 INFO
com.s1mbi0se.grinder.core.mapred.EntityMapperS3 (main): Current Thread:
Thread[main,5,main]Current timestamp: 1382593728422 Entity Mapper end

The first records of each task are processed fine. At some point during the
task, I start to get this exception over and over, and then it doesn't
process a single record anymore for that task.

I tried to set `TEMP_PATH_PREFIX` to `s3://mybucket/dmp_processor_tmp`,
but the same thing happened.

Any idea why this is happening? What could be making Hadoop unable to
write its output?


Re: Hadoop core jar class update

2013-10-24 Thread Ravi Prakash
Viswanathan,

What version of Hadoop are you using? What is the change?





On Wednesday, October 23, 2013 2:20 PM, Viswanathan J 
jayamviswanat...@gmail.com wrote:
 
Hi guys,
If I update (a very small change) the hadoop-core mapred class for one of the
OOME patches and compile the jar, will deploying that jar in production cause
any issue?
Will it cause any issue for the NN or lose any data?
The version of the jar will be the same.
Will that update check any checksum?
Please help asap.
Thanks,

Re: Unable to use third party jar

2013-10-24 Thread jamal sasha
Oops... forgot the code:
http://pastebin.com/7XnyVnkv


On Thu, Oct 24, 2013 at 10:54 AM, jamal sasha jamalsha...@gmail.com wrote:

 Hi,

 I am trying to join two datasets.. One of which is json..
 I am relying on json-simple library to parse that json..
 I am trying to use libjars.. So far .. for simple data processing.. the
 approach has worked.. but now i am getting the following error
 Exception in thread main java.lang.NoClassDefFoundError:
 org/json/simple/parser/ParseException
 at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:264)
 at
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
  at
 org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getMapperTypeMap(MultipleInputs.java:141)
 at
 org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:60)
  at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1024)
 at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1041)
  at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
 at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:959)
  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
 at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
  at
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
 at org.select.Driver.run(Driver.java:130)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.select.Driver.main(Driver.java:139)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.lang.ClassNotFoundException:
 org.json.simple.parser.ParseException
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:356)

 I think I have implemented toolrunner.

 hadoop jar domain_gold.jar org.select.Driver \
 -libjars json-simple-1.1.1.jar  $INPUT1 $INPUT2 $OUTPUT
 .



Unable to use third party jar

2013-10-24 Thread jamal sasha
Hi,

I am trying to join two datasets, one of which is JSON.
I am relying on the json-simple library to parse that JSON.
I am trying to use -libjars. So far, for simple data processing, the
approach has worked, but now I am getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/json/simple/parser/ParseException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at
org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getMapperTypeMap(MultipleInputs.java:141)
at
org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:60)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1024)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1041)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:959)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.select.Driver.run(Driver.java:130)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.select.Driver.main(Driver.java:139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException:
org.json.simple.parser.ParseException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)

I think I have implemented ToolRunner correctly.

hadoop jar domain_gold.jar org.select.Driver \
-libjars json-simple-1.1.1.jar  $INPUT1 $INPUT2 $OUTPUT
.
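For reference, a minimal ToolRunner driver sketch (not the code from this
thread; class and job names are illustrative). One common cause of this
particular NoClassDefFoundError at submit time is building the Job from a fresh
Configuration instead of getConf(), which drops the -libjars information that
GenericOptionsParser recorded; it may or may not be the issue here:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Driver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // Use getConf(): ToolRunner/GenericOptionsParser has already recorded the
    // -libjars entries (and adjusted the classloader) on this Configuration.
    Job job = new Job(getConf(), "json-join");
    job.setJarByClass(Driver.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    // ... MultipleInputs / mapper setup as in the real Driver ...
    FileOutputFormat.setOutputPath(job, new Path(args[args.length - 1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner strips the generic options (-libjars, -D, ...) before run().
    System.exit(ToolRunner.run(new Configuration(), new Driver(), args));
  }
}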


Re: hadoop2.2.0 compile Failed - no such instruction

2013-10-24 Thread Ravi Prakash
Hi Rico!

What was the command line you used to build?





On Wednesday, October 23, 2013 11:44 PM, codepeak gcodep...@gmail.com wrote:
 
Hi all,
       I have a problem when compiling Hadoop 2.2.0: Apache only offers a
32-bit distribution, but I need 64-bit, so I have to compile it myself.
My environment is:
       2.6.32_1-7-0-0 #1 SMP Wed Jul 25 16:20:31 CST 2012 x86_64 x86_64 x86_64 GNU/Linux
       The CPU is:
       Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
       flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts tpr_shadow vnmi flexpriority ept vpid
       12 cores

       When compiling Hadoop 2.2.0 with Maven, something goes wrong as below:

       But my CPU supports the SSE4.1 and SSE4.2 instruction sets, so I don't
know if there is anything else wrong.

       Thanks so much

rico

YARN And NTP

2013-10-24 Thread Jay Vyas
Hi folks.  Is there a way to make YARN more forgiving with last
modification times? The following exception in

org.apache.hadoop.yarn.util.FSDownload:

" changed on src filesystem (expected " + resource.getTimestamp() +
", was " + sStat.getModificationTime());

I realize that the time should be the same, but depending on the underlying
filesystem, the semantics of this last-modified time might vary.

Any thoughts on this?


-- 
Jay Vyas
http://jayunit100.blogspot.com


Re: dynamically resizing the Hadoop cluster?

2013-10-24 Thread Bryan Beaudreault
It seems like you may want to look into Amazon's EMR (elastic mapreduce),
which does much of what you are trying to do.  It's worth taking a look at
since you're already storing your data in S3 and using EC2 for your
cluster(s).


On Thu, Oct 24, 2013 at 5:07 PM, Nan Zhu zhunans...@gmail.com wrote:

 Good explanation,

 Thank you, Ravi

 Best,


 On Thu, Oct 24, 2013 at 4:51 PM, Ravi Prakash ravi...@ymail.com wrote:

 Hi Nan!

 If the task trackers stop heartbeating back to the JobTracker, the
 JobTracker will mark them as dead and reschedule the tasks which were
 running on that TaskTracker. Admittedly there is some delay between when
 the TaskTrackers stop heartbeating back and when the JobTracker marks them
 dead. This is controlled by the mapred.tasktracker.expiry.interval parameter
 (I'm assuming you are using Hadoop 1.x)

 HTH
 Ravi
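A small sketch that only illustrates the setting Ravi names; it is a
JobTracker-side setting normally placed in mapred-site.xml, and the fallback
value below is just this code's own default, in milliseconds:

import org.apache.hadoop.conf.Configuration;

public class ExpiryIntervalPeek {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // How long the JobTracker waits without heartbeats before declaring a
    // TaskTracker lost and rescheduling its tasks (Hadoop 1.x).
    long expiryMs = conf.getLong("mapred.tasktracker.expiry.interval", 600000L);
    System.out.println("TaskTracker expiry interval: " + expiryMs + " ms");
  }
}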






   On Thursday, October 24, 2013 1:21 PM, Nan Zhu zhunans...@gmail.com
 wrote:
  Hi, Ravi,

 Thank you for the reply

 Actually I'm not running HDFS on EC2, instead I use S3 to store data

 I'm curious about that: if some nodes are decommissioned, will the JobTracker
 treat the tasks which originally ran on them as too slow (since there is no
 progress for a long time) and run speculative execution, or will it directly
 treat them as belonging to a running job that ran on a dead TaskTracker?

 Best,

 Nan






 On Thu, Oct 24, 2013 at 2:04 PM, Ravi Prakash ravi...@ymail.com wrote:

 Hi Nan!

 Usually nodes are decommissioned slowly over some period of time so as
 not to disrupt the running jobs. When a node is decommissioned, the
 NameNode must re-replicate all under-replicated blocks. Rather than
 suddenly remove half the nodes, you might want to take a few nodes offline
 at a time. Hadoop should be able to handle rescheduling tasks on nodes no
 longer available (even without speculative execution. Speculative execution
 is for something else).

 HTH
 Ravi


   On Wednesday, October 23, 2013 10:26 PM, Nan Zhu zhunans...@gmail.com
 wrote:
   Hi, all

 I’m running a Hadoop cluster on AWS EC2,

 I would like to dynamically resizing the cluster so as to reduce the
 cost, is there any solution to achieve this?

 E.g. I would like to cut the cluster size with a half, is it safe to just
 shutdown the instances (if some tasks are just running on them, can I rely
 on the speculative execution to re-run them on the other nodes?)

 I cannot use EMR, since I’m running a customized version of Hadoop

 Best,

 --
 Nan Zhu
 School of Computer Science,
 McGill University







 --
 Nan Zhu
 School of Computer Science,
 McGill University
 E-Mail: zhunanmcg...@gmail.com zhunans...@gmail.com





 --
 Nan Zhu
 School of Computer Science,
 McGill University
 E-Mail: zhunanmcg...@gmail.com zhunans...@gmail.com



Mapreduce outputs to a different cluster?

2013-10-24 Thread S. Zhou
The scenario is: I run a mapreduce job on cluster A (all source data is in
cluster A) but I want the output of the job to go to cluster B. Is it possible?
If yes, please let me know how to do it.

Here are some notes of my mapreduce job:
1. the data source is an HBase table
2. It only has a mapper, no reducer.

Thanks
Senqiang


Re: HDP 2.0 Install fails on repo unavailability

2013-10-24 Thread Hitesh Shah
BCC'ing user@hadoop.

This is a question for the ambari mailing list. 

-- Hitesh 

On Oct 24, 2013, at 3:36 PM, Jain, Prem wrote:

 Folks,
  
 Trying to install the newly released Hadoop 2.0 using Ambari. I am able to 
 install Ambari, but when I try to install Hadoop 2.0 on the rest of the cluster, 
 the installation fails, erroring on repo mirror unavailability. Not sure 
 where I messed up.
  
 Here are the error messages
  
 Output log from AMBARI during Installation
  
 notice: 
 /Stage[1]/Hdp::Snappy::Package/Hdp::Snappy::Package::Ln[32]/Hdp::Exec[hdp::snappy::package::ln
  32]/Exec[hdp::snappy::package::ln 32]/returns: executed successfully
 err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-libhdfs]/ensure: 
 change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y 
 install hadoop-libhdfs' returned 1:
  
 Error Downloading Packages:
   hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No 
 more mirrors to try.
   zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: 
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
   hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
  
  
 err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-lzo]/ensure: change 
 from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y 
 install hadoop-lzo' returned 1:
  
 Error Downloading Packages:
   hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No 
 more mirrors to try.
   zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: 
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
  
  
 notice: 
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[hdfs-site]/File[/etc/hadoop/conf/hdfs-site.xml]/content:
  content changed '{md5}117224b1cf67c151f8a3d7ac0a157fa5' to 
 '{md5}ba383a94bdde1a0b2eb5c59b1f5b61e7'
 notice: 
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[capacity-scheduler]/File[/etc/hadoop/conf/capacity-scheduler.xml]/content:
  content changed '{md5}08d7e952b3e2d4fd5a2a880dfcd3a2df' to 
 '{md5}dd3922fc27f72cd78cf2b47f57351b08'
 notice: 
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[core-site]/File[/etc/hadoop/conf/core-site.xml]/content:
  content changed '{md5}76d06ebce1310be7e65ae0c7e8c3068a' to 
 '{md5}1b626aa016a6f916271f67f3aa22cbbb'
 err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop]/ensure: change from 
 absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install 
 hadoop' returned 1:
  
 Error Downloading Packages:
   hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No 
 more mirrors to try.
   zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: 
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
  
  
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 
 64::end]: Dependency Package[hadoop-libhdfs] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 
 64::end]: Dependency Package[hadoop-lzo] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 
 64::end]: Dependency Package[hadoop] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]:
  Dependency Package[hadoop-libhdfs] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]:
  Dependency Package[hadoop-lzo] has failures: true
  
 Repo :
  
 http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.4.1.25
 <Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>ambari/centos6/1.x/updates/1.4.1.25</Key><RequestId>4693487CE703DB53</RequestId><HostId>B87iAHvcpH7im27HOuEKBJ0F+qPFf+7aXuTe+O7OhLb9WscyxTbV/2yUPXO+KPOJ</HostId></Error>
  
  
 Manual install:
  
 [root@dn5 ~]# /usr/bin/yum -d 0 -e 0 -y install hadoop-libhdfs
 Error Downloading Packages:
   hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No 
 more mirrors to try.
   zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: 
 

Re: HDP 2.0 Install fails on repo unavailability

2013-10-24 Thread Adam Diaz
The servers have been very busy since the release. You probably just need
to try again.
On Oct 24, 2013 6:37 PM, Jain, Prem premanshu.j...@netapp.com wrote:

  Folks,


 Trying to install the newly release Hadoop 2.0 using Ambari. I am able to
 install Ambari, but when I try to install Hadoop 2.0 on rest of the
 cluster, the installation fails erroring on repo mirror unavailability.
 Not sure where I messed up. 


 Here are the error messages


 Output log from AMBARI during Installation


 notice:
 /Stage[1]/Hdp::Snappy::Package/Hdp::Snappy::Package::Ln[32]/Hdp::Exec[hdp::snappy::package::ln
 32]/Exec[hdp::snappy::package::ln 32]/returns: executed successfully

 err:
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-libhdfs]/ensure:
 change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0
 -y install hadoop-libhdfs' returned 1: 


 Error Downloading Packages:

   hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure:
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256]
 No more mirrors to try.

   zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure:
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno
 256] No more mirrors to try.

   hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64: failure:
 hadoop/hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno
 256] No more mirrors to try.



 err:
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-lzo]/ensure: change
 from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y
 install hadoop-lzo' returned 1: 


 Error Downloading Packages:

   hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure:
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256]
 No more mirrors to try.

   zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure:
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno
 256] No more mirrors to try.



 notice:
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[hdfs-site]/File[/etc/hadoop/conf/hdfs-site.xml]/content:
 content changed '{md5}117224b1cf67c151f8a3d7ac0a157fa5' to
 '{md5}ba383a94bdde1a0b2eb5c59b1f5b61e7'

 notice:
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[capacity-scheduler]/File[/etc/hadoop/conf/capacity-scheduler.xml]/content:
 content changed '{md5}08d7e952b3e2d4fd5a2a880dfcd3a2df' to
 '{md5}dd3922fc27f72cd78cf2b47f57351b08'

 notice:
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[core-site]/File[/etc/hadoop/conf/core-site.xml]/content:
 content changed '{md5}76d06ebce1310be7e65ae0c7e8c3068a' to
 '{md5}1b626aa016a6f916271f67f3aa22cbbb'

 err:
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop]/ensure: change
 from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y
 install hadoop' returned 1: 


 Error Downloading Packages:

   hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure:
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256]
 No more mirrors to try.

   zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure:
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno
 256] No more mirrors to try.



 notice:
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop
 64::end]: Dependency Package[hadoop-libhdfs] has failures: true

 notice:
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop
 64::end]: Dependency Package[hadoop-lzo] has failures: true

 notice:
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop
 64::end]: Dependency Package[hadoop] has failures: true

 notice:
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]:
 Dependency Package[hadoop-libhdfs] has failures: true

 notice:
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]:
 Dependency Package[hadoop-lzo] has failures: true


 Repo:


 http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.4.1.25

 <Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>ambari/centos6/1.x/updates/1.4.1.25</Key><RequestId>4693487CE703DB53</RequestId><HostId>B87iAHvcpH7im27HOuEKBJ0F+qPFf+7aXuTe+O7OhLb9WscyxTbV/2yUPXO+KPOJ</HostId></Error>
 



 Manual install:


 [root@dn5 ~]# /usr/bin/yum -d 0 -e 0 -y install hadoop-libhdfs

 Error Downloading Packages:

   

Re: map container is assigned default memory size rather than user configured which will cause TaskAttempt failure

2013-10-24 Thread Manu Zhang
My mapreduce.map.java.opts is 1024MB

Thanks,
Manu


On Thu, Oct 24, 2013 at 3:11 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.comwrote:

 Hi,

 How about checking the value of mapreduce.map.java.opts? Are your JVMs
 launched with assumed heap memory?

 On Thu, Oct 24, 2013 at 11:31 AM, Manu Zhang owenzhang1...@gmail.com
 wrote:
  Just confirmed the problem still existed even the mapred-site.xmls on
 all
  nodes have the same configuration (mapreduce.map.memory.mb = 2560).
 
  Any more thoughts ?
 
  Thanks,
  Manu
 
 
  On Thu, Oct 24, 2013 at 8:59 AM, Manu Zhang owenzhang1...@gmail.com
 wrote:
 
  Thanks Ravi.
 
  I do have mapred-site.xml under /etc/hadoop/conf/ on those nodes but it
  sounds weird to me should they read configuration from those
 mapred-site.xml
  since it's the client who applies for the resource. I have another
  mapred-site.xml in the directory where I run my job. I suppose my job
 should
  read conf from that mapred-site.xml. Please correct me if I am mistaken.
 
  Also, not always the same nodes. The number of failures is random, too.
 
  Anyway, I will have my settings in all the nodes' mapred-site.xml and
 see
  if the problem goes away.
 
  Manu
 
 
  On Thu, Oct 24, 2013 at 1:40 AM, Ravi Prakash ravi...@ymail.com
 wrote:
 
  Manu!
 
  This should not be the case. All tasks should have the configuration
  values you specified propagated to them. Are you sure your setup is
 correct?
  Are they always the same nodes which run with 1024Mb? Perhaps you have
  mapred-site.xml on those nodes?
 
  HTH
  Ravi
 
 
  On Tuesday, October 22, 2013 9:09 PM, Manu Zhang
  owenzhang1...@gmail.com wrote:
  Hi,
 
  I've been running Terasort on Hadoop-2.0.4.
 
  Every time there is s a small number of Map failures (like 4 or 5)
  because of container's running beyond virtual memory limit.
 
  I've set mapreduce.map.memory.mb to a safe value (like 2560MB) so most
  TaskAttempt goes fine while the values of those failed maps are the
 default
  1024MB.
 
  My question is thus, why a small number of container's memory values
 are
  set to default rather than that of user-configured ?
 
  Any thoughts ?
 
  Thanks,
  Manu Zhang
 
 
 
 
 



 --
 - Tsuyoshi



Re: HDP 2.0 Install fails on repo unavailability

2013-10-24 Thread Clay McDonald
I think I have the fix for this. I'll check when I get home.

Clay McDonald
Sent from my iPhone

On Oct 24, 2013, at 7:36 PM, Hitesh Shah hit...@apache.org wrote:

 BCC'ing user@hadoop.
 
 This is a question for the ambari mailing list. 
 
 -- Hitesh 
 
 On Oct 24, 2013, at 3:36 PM, Jain, Prem wrote:
 
 Folks,
 
 Trying to install the newly release Hadoop 2.0 using Ambari. I am able to 
 install Ambari, but when I try to install Hadoop 2.0 on rest of the cluster, 
 the installation fails erroring on repo mirror unavailability.  Not sure 
 where I messed up.
 
 Here are the error messages
 
 Output log from AMBARI during Installation
 
 notice: 
 /Stage[1]/Hdp::Snappy::Package/Hdp::Snappy::Package::Ln[32]/Hdp::Exec[hdp::snappy::package::ln
  32]/Exec[hdp::snappy::package::ln 32]/returns: executed successfully
 err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-libhdfs]/ensure: 
 change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 
 -y install hadoop-libhdfs' returned 1:
 
 Error Downloading Packages:
  hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No 
 more mirrors to try.
  zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: 
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
  hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
 
 
 err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-lzo]/ensure: change 
 from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y 
 install hadoop-lzo' returned 1:
 
 Error Downloading Packages:
  hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No 
 more mirrors to try.
  zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: 
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
 
 
 notice: 
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[hdfs-site]/File[/etc/hadoop/conf/hdfs-site.xml]/content:
  content changed '{md5}117224b1cf67c151f8a3d7ac0a157fa5' to 
 '{md5}ba383a94bdde1a0b2eb5c59b1f5b61e7'
 notice: 
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[capacity-scheduler]/File[/etc/hadoop/conf/capacity-scheduler.xml]/content:
  content changed '{md5}08d7e952b3e2d4fd5a2a880dfcd3a2df' to 
 '{md5}dd3922fc27f72cd78cf2b47f57351b08'
 notice: 
 /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[core-site]/File[/etc/hadoop/conf/core-site.xml]/content:
  content changed '{md5}76d06ebce1310be7e65ae0c7e8c3068a' to 
 '{md5}1b626aa016a6f916271f67f3aa22cbbb'
 err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop]/ensure: change from 
 absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install 
 hadoop' returned 1:
 
 Error Downloading Packages:
  hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No 
 more mirrors to try.
  zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: 
 zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 
 256] No more mirrors to try.
 
 
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 
 64::end]: Dependency Package[hadoop-libhdfs] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 
 64::end]: Dependency Package[hadoop-lzo] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 
 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 
 64::end]: Dependency Package[hadoop] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]:
  Dependency Package[hadoop-libhdfs] has failures: true
 notice: 
 /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]:
  Dependency Package[hadoop-lzo] has failures: true
 
 Repo :
 
 http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.4.1.25
  <Error><Code>NoSuchKey</Code><Message>The specified key does not 
  exist.</Message><Key>ambari/centos6/1.x/updates/1.4.1.25</Key><RequestId>4693487CE703DB53</RequestId><HostId>B87iAHvcpH7im27HOuEKBJ0F+qPFf+7aXuTe+O7OhLb9WscyxTbV/2yUPXO+KPOJ</HostId></Error>
 
 
 Manual install:
 
 [root@dn5 ~]# /usr/bin/yum -d 0 -e 0 -y install hadoop-libhdfs
 Error Downloading Packages:
  hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: 
 

RE: Mapreduce outputs to a different cluster?

2013-10-24 Thread java8964 java8964
Just specify the output location using the URI to another cluster. As long as 
the network is accessible, you should be fine.
Yong
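
To illustrate, a minimal sketch of that suggestion (the class name, NameNode host name, and port below are hypothetical; substitute cluster B's actual NameNode RPC address). The only requirement, as noted above, is that the nodes of cluster A can reach cluster B over the network.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CrossClusterOutputExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "output-to-cluster-b");
    // Fully qualify the output path with cluster B's NameNode URI, so the job
    // running against cluster A writes its results into cluster B's HDFS.
    FileOutputFormat.setOutputPath(job,
        new Path("hdfs://clusterB-namenode:8020/tmp/myfolder"));
    // Configure the HBase table input and the mapper as usual, then submit.
  }
}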

Date: Thu, 24 Oct 2013 15:28:27 -0700
From: myx...@yahoo.com
Subject: Mapreduce outputs to a different cluster?
To: user@hadoop.apache.org

The scenario is: I run mapreduce job on cluster A (all source data is in 
cluster A) but I want the output of the job to cluster B. Is it possible? If 
yes, please let me know how to do it.
Here are some notes of my mapreduce job:
1. the data source is an HBase table
2. It only has mapper no reducer.

Thanks
Senqiang
  

Terasort's performance anomaly

2013-10-24 Thread -
Hi,
I've been running Terasort on Hadoop 2.1.0-beta. I have a 6-node cluster, 5 of 
which run a NodeManager and all of which have a DataNode. I don't understand why I 
get poor performance in most cases and why in some cases the performance is 
good (10GB Terasort with 2 reducers).

* When I run 10, 20 and 30 GB with 1 reducer, I get the following results:

Total time: 1100, 3300 and 5700 sec

Avg map time: 29, 50 and 72 sec
Avg reduce time: 870, 2700 and 4800 sec

Killed map tasks: 2, 5 and 5

* When I run 10, 20 and 30 GB with 2 reducers, I get the following results:
Total time: 385, 4575 and 7379 sec
Avg map time: 35, 52 and 70 sec
Avg reduce time: 225, 3879 and 6116 sec
Killed map tasks: 1, 4, 5


* These results don't make sense, since there is no correlation between them. 
Somehow, 10 GB Terasort with 2 reducers works much better than with 1 reducer. In 
other cases, increasing the number of reducers actually decreases the 
performance. When I check the logs of the application master, I see a lot of 
"Container killed on request. Exit code is 143" errors, which are generally 
followed by "[AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1382628871858_0002_m_21_0: Container killed by the 
ApplicationMaster." (e.g. there are 219 of them in the 30GB Terasort with 2 
reducers) - which doesn't give much information.

* I mostly use the default settings; the only changes which may have an impact 
(I also set the dfs replication factor to 1) are the following:
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

* I observed that in all cases "Map output materialized bytes" is slightly more 
than "Map output bytes" (which would be OK since I don't use any compression).

Here is one of the terminal outputs (from 20 GB Terasort with 2 reducers):

13/10/23 22:31:32 INFO mapreduce.Job:  map 0% reduce 0%

13/10/23 22:31:45 INFO mapreduce.Job:  map 1% reduce 0%
13/10/23 22:31:47 INFO mapreduce.Job:  map 3% reduce 0%
13/10/23 22:31:48 INFO mapreduce.Job:  map 6% reduce 0%
13/10/23 22:31:49 INFO mapreduce.Job:  map 7% reduce 0%
13/10/23 22:31:50 INFO mapreduce.Job:  map 8% reduce 0%
13/10/23 22:31:51 INFO mapreduce.Job:  map 10% reduce 0%
13/10/23 22:31:53 INFO mapreduce.Job:  map 11% reduce 0%
13/10/23 22:31:55 INFO mapreduce.Job:  map 12% reduce 0%
13/10/23 22:31:57 INFO mapreduce.Job:  map 14% reduce 0%
13/10/23 22:31:59 INFO mapreduce.Job:  map 15% reduce 0%
13/10/23 22:32:00 INFO mapreduce.Job:  map 16% reduce 0%
13/10/23 22:32:02 INFO mapreduce.Job:  map 17% reduce 0%
13/10/23 22:32:03 INFO mapreduce.Job:  map 18% reduce 0%
13/10/23 22:32:08 INFO mapreduce.Job:  map 20% reduce 0%
13/10/23 22:32:09 INFO mapreduce.Job:  map 21% reduce 0%
13/10/23 22:32:12 INFO mapreduce.Job:  map 22% reduce 0%
13/10/23 22:32:15 INFO mapreduce.Job:  map 23% reduce 0%
13/10/23 22:32:22 INFO mapreduce.Job:  map 25% reduce 0%
13/10/23 22:32:28 INFO mapreduce.Job:  map 30% reduce 1%
13/10/23 22:32:32 INFO mapreduce.Job:  map 36% reduce 1%
13/10/23 22:32:35 INFO mapreduce.Job:  map 38% reduce 1%
13/10/23 22:32:41 INFO mapreduce.Job:  map 39% reduce 1%
13/10/23 22:32:43 INFO mapreduce.Job:  map 40% reduce 1%
13/10/23 22:32:44 INFO mapreduce.Job:  map 41% reduce 2%
13/10/23 22:32:47 INFO mapreduce.Job:  map 42% reduce 2%
13/10/23 22:32:49 INFO mapreduce.Job:  map 43% reduce 2%
13/10/23 22:32:58 INFO mapreduce.Job:  map 44% reduce 2%
13/10/23 22:33:01 INFO mapreduce.Job:  map 47% reduce 2%
13/10/23 22:33:04 INFO mapreduce.Job:  map 48% reduce 2%
13/10/23 22:33:05 INFO mapreduce.Job:  map 49% reduce 2%
13/10/23 22:33:07 INFO mapreduce.Job:  map 49% reduce 3%
13/10/23 22:33:08 INFO mapreduce.Job:  map 50% reduce 3%
13/10/23 22:33:12 INFO mapreduce.Job:  map 51% reduce 3%
13/10/23 22:33:13 INFO mapreduce.Job:  map 52% reduce 3%
13/10/23 22:33:24 INFO mapreduce.Job:  map 55% reduce 4%
13/10/23 22:33:36 INFO mapreduce.Job:  map 60% reduce 5%
13/10/23 22:33:39 INFO mapreduce.Job:  map 61% reduce 5%
13/10/23 22:33:44 INFO mapreduce.Job:  map 62% reduce 5%
13/10/23 22:33:49 INFO mapreduce.Job:  map 64% reduce 5%
13/10/23 22:33:54 INFO mapreduce.Job:  map 67% reduce 6%
13/10/23 22:33:57 INFO mapreduce.Job:  map 69% reduce 6%
13/10/23 22:34:00 INFO mapreduce.Job:  map 70% reduce 6%
13/10/23 22:34:02 INFO mapreduce.Job:  map 71% reduce 6%
13/10/23 22:34:03 INFO mapreduce.Job:  map 72% reduce 6%
13/10/23 22:34:05 INFO mapreduce.Job:  map 73% reduce 6%
13/10/23 22:34:07 INFO mapreduce.Job:  map 74% reduce 6%
13/10/23 22:34:16 INFO mapreduce.Job:  map 76% reduce 6%
13/10/23 22:34:19 INFO mapreduce.Job:  map 77% reduce 7%
13/10/23 22:34:22 INFO mapreduce.Job:  map 78% reduce 7%
13/10/23 22:34:24 INFO mapreduce.Job:  map 79% reduce 7%
13/10/23 22:34:27 INFO mapreduce.Job:  map 80% reduce 7%
13/10/23 22:34:30 INFO 

Re: Mapreduce outputs to a different cluster?

2013-10-24 Thread S. Zhou
Thanks Shahab & Yong. If cluster B (into which I want to dump the output) has the URL 
hdfs://machine.domain:8080 and the data folder /tmp/myfolder, what should I 
specify as the output path for the MR job? 

Thanks




On Thursday, October 24, 2013 5:31 PM, java8964 java8964 java8...@hotmail.com 
wrote:
 
Just specify the output location using the URI to another cluster. As long as 
the network is accessible, you should be fine.

Yong




Date: Thu, 24 Oct 2013 15:28:27 -0700
From: myx...@yahoo.com
Subject: Mapreduce outputs to a different cluster?
To: user@hadoop.apache.org


The scenario is: I run mapreduce job on cluster A (all source data is in 
cluster A) but I want the output of the job to cluster B. Is it possible? If 
yes, please let me know how to do it.

Here are some notes of my mapreduce job:
1. the data source is an HBase table
2. It only has mapper no reducer.

Thanks
Senqiang

map phase does not read intermediate results with SequenceFileInputFormat

2013-10-24 Thread Anseh Danesh
Hi all.

I have a mapreduce program with two jobs. The second job's key and value come
from the first job's output, but I think the second map does not get the result
from the first job. In other words, I think my second job did not read the
output of my first job. What should I do?
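
For comparison with the code below, a minimal sketch (the class name ChainSketch is hypothetical; it reuses the Text/DoubleWritable types and the /tmp/intermediate1 path from the posted code) of how a second job is typically chained onto the first job's SequenceFile output:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ChainSketch extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    Path intermediate = new Path("/tmp/intermediate1");

    // Job 1: writes SequenceFile<Text, DoubleWritable> records to the intermediate path.
    Job job1 = Job.getInstance(getConf(), "first");
    job1.setJarByClass(ChainSketch.class);
    job1.setOutputFormatClass(SequenceFileOutputFormat.class);
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(DoubleWritable.class);
    SequenceFileOutputFormat.setOutputPath(job1, intermediate);
    // mapper/reducer/input setup omitted in this sketch
    if (!job1.waitForCompletion(true)) {
      return 1;                       // stop if job 1 failed
    }

    // Job 2: reads the same path back. SequenceFileInputFormat hands each record to
    // the mapper as map(Text key, DoubleWritable value, Context ctx) -- a single
    // value per call, not an Iterable<DoubleWritable>.
    Job job2 = Job.getInstance(getConf(), "second");
    job2.setJarByClass(ChainSketch.class);
    job2.setInputFormatClass(SequenceFileInputFormat.class);
    SequenceFileInputFormat.addInputPath(job2, intermediate);
    // mapper/reducer/output setup omitted in this sketch
    return job2.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new ChainSketch(), args));
  }
}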

here is the code:

public class dewpoint extends Configured implements Tool
{
  private static final Logger logger = LoggerFactory.getLogger(dewpoint.class);

static final String KEYSPACE = "weather";
static final String COLUMN_FAMILY = "user";
private static final String OUTPUT_PATH1 = "/tmp/intermediate1";
private static final String OUTPUT_PATH2 = "/tmp/intermediate2";
private static final String OUTPUT_PATH3 = "/tmp/intermediate3";
private static final String INPUT_PATH1 = "/tmp/intermediate1";

public static void main(String[] args) throws Exception
{

ToolRunner.run(new Configuration(), new dewpoint(), args);
System.exit(0);
}

///

public static class dpmap1 extends Mapper<Map<String, ByteBuffer>,
Map<FloatWritable, ByteBuffer>, Text, DoubleWritable>
{
DoubleWritable val1 = new DoubleWritable();
Text word = new Text();
String date;
float temp;
public void map(Map<String, ByteBuffer> keys, Map<FloatWritable,
ByteBuffer> columns, Context context) throws IOException,
InterruptedException
{

 for (Entry<String, ByteBuffer> key : keys.entrySet())
 {
 //System.out.println(key.getKey());
 if (!"date".equals(key.getKey()))
 continue;
 date = ByteBufferUtil.string(key.getValue());
 word.set(date);
 }


for (Entry<FloatWritable, ByteBuffer> column : columns.entrySet())
{
if (!"temprature".equals(column.getKey()))
continue;
temp = ByteBufferUtil.toFloat(column.getValue());
val1.set(temp);
//System.out.println(temp);
   }
context.write(word, val1);
}
}

///

public static class dpred1 extends Reducer<Text, DoubleWritable, Text,
DoubleWritable>
{
   public void reduce(Text key, Iterable<DoubleWritable> values,
Context context) throws IOException, InterruptedException
{
double beta = 17.62;
double landa = 243.12;
DoubleWritable result1 = new DoubleWritable();
DoubleWritable result2 = new DoubleWritable();
 for (DoubleWritable val : values){
 //  System.out.println(val.get());
   beta *= val.get();
   landa+=val.get();
   }
 result1.set(beta);
 result2.set(landa);

 context.write(key, result1);
 context.write(key, result2);
 }
}
///

public static class dpmap2 extends Mapper<Text, DoubleWritable, Text,
DoubleWritable> {

Text key2 = new Text();
double temp1, temp2 =0;

public void map(Text key, Iterable<DoubleWritable> values, Context
context) throws IOException, InterruptedException {
String[] sp = values.toString().split("\t");
for (int i = 0; i < sp.length; i += 4)
//key2.set(sp[i]);
System.out.println(sp[i]);
for (int j = 1; j < sp.length; j += 4)
temp1 = Double.valueOf(sp[j]);
for (int k = 3; k < sp.length; k += 4)
temp2 = Double.valueOf(sp[k]);
context.write(key2, new DoubleWritable(temp2/temp1));

}
}

///


public static class dpred2 extends Reducer<Text, DoubleWritable, Text,
DoubleWritable>
{
   public void reduce(Text key, Iterable<DoubleWritable> values,
Context context) throws IOException, InterruptedException
{

   double alpha = 6.112;
double tmp = 0;
DoubleWritable result3 = new DoubleWritable();
 for (DoubleWritable val : values){
 System.out.println(val.get());
 tmp = alpha*(Math.pow(Math.E, val.get()));

 }
 result3.set(tmp);
 context.write(key, result3);


  }
}


///


public int run(String[] args) throws Exception
{

 Job job1 = new Job(getConf(), "DewPoint");
 job1.setJarByClass(dewpoint.class);
 job1.setMapperClass(dpmap1.class);
 job1.setOutputFormatClass(SequenceFileOutputFormat.class);
 job1.setCombinerClass(dpred1.class);
 job1.setReducerClass(dpred1.class);
 job1.setMapOutputKeyClass(Text.class);
 job1.setMapOutputValueClass(DoubleWritable.class);
 job1.setOutputKeyClass(Text.class);
 job1.setOutputValueClass(DoubleWritable.class);
 FileOutputFormat.setOutputPath(job1, new Path(OUTPUT_PATH1));


 job1.setInputFormatClass(CqlPagingInputFormat.class);

 ConfigHelper.setInputRpcPort(job1.getConfiguration(), 9160);
 ConfigHelper.setInputInitialAddress(job1.getConfiguration(), "localhost");