Exceptions in Data node log
Hi, I am using Hadoop and HBase in pseudo-distributed mode (Hadoop 1.1.2, HBase 0.94.7). I am receiving the following error messages in the data node log:

hadoop-hadoop-datanode-woody.log:2013-10-24 10:55:37,579 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):Got exception while serving blk_4378636005274237256_55385 to /192.168.20.30:
hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:60739]
hadoop-hadoop-datanode-woody.log:2013-10-24 10:55:37,603 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:60739]

Please help in understanding the cause behind this.

--
Thanks and Regards,
Vimal Jain
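For reference (not a confirmed diagnosis): in Hadoop 1.x this warning is the DataNode's write-side socket timing out while streaming a block to a client that has stalled or gone away. The timeouts involved are ordinary hdfs-site.xml settings; the values below are only an illustrative sketch of where one would raise them, with the stock default for the write timeout being 480000 ms.

<!-- hdfs-site.xml (illustrative values only) -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>  <!-- DataNode-to-client write timeout, in ms -->
  <value>960000</value>
</property>
<property>
  <name>dfs.socket.timeout</name>                 <!-- read-side socket timeout, in ms -->
  <value>120000</value>
</property>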
Re: Questions about hadoop-metrics2.properties
1. File and Ganglia are the only bundled sinks, though there are socket/json (for Chukwa) and Graphite sink patches in the works.

2. Hadoop metrics (and metrics2) is mostly designed for system/process metrics, which means you'll need to attach jconsole to your map/reduce task processes to see your task metrics instrumented via metrics. What you actually want is probably custom job counters.

3. You don't need any configuration to use JMX to access metrics2, as JMX is currently on by default. The configuration in hadoop-metrics2.properties is mostly for optional sink configuration and metrics filtering.

__Luke

On Wed, Oct 23, 2013 at 4:21 PM, Benyi Wang bewang.t...@gmail.com wrote:

1. Does hadoop metrics2 only support File and Ganglia sinks?
2. Can I expose metrics as JMX, especially for customized metrics? I created some metrics in my mapreduce job and could successfully output them using a FileSink. But if I use jconsole to access the YARN nodemanager, I can only see Hadoop metrics, e.g. Hadoop/NodeManager/NodeManagerMetrics etc., not mine with the prefix maptask. How do I set it up to see maptask/reducetask prefix metrics?
3. Is there an example using jmx? I could not find one. The configuration syntax is: [prefix].[source|sink|jmx|].[instance].[option]
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
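A minimal hadoop-metrics2.properties sketch of that sink syntax, with the file names and period made up for illustration (FileSink is the bundled file sink class):

# attach a file sink instance named "file" to all prefixes, flushed every 10 seconds
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
*.period=10
# per-prefix option: where each daemon/task prefix writes its metrics
nodemanager.sink.file.filename=nodemanager-metrics.out
maptask.sink.file.filename=maptask-metrics.out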
How to use Hadoop2 HA's logical name URL?
Hi,

I have set up a Hadoop 2.2.0 HA cluster following:
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html#Configuration_details

And I can check both the active and standby namenode with the web interface. However, it seems that the logical name cannot be used to access HDFS? I have the following settings related to HA:

In core-site.xml:

<property><name>fs.defaultFS</name><value>hdfs://public-cluster</value></property>
<property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
<property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/root/.ssh/id_rsa</value></property>
<property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>3</value></property>

And in hdfs-site.xml:

<property><name>dfs.nameservices</name><value>public-cluster</value></property>
<property><name>dfs.ha.namenodes.public-cluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.public-cluster.nn1</name><value>10.0.2.31:8020</value></property>
<property><name>dfs.namenode.rpc-address.public-cluster.nn2</name><value>10.0.2.32:8020</value></property>
<property><name>dfs.namenode.http-address.public-cluster.nn1</name><value>10.0.2.31:50070</value></property>
<property><name>dfs.namenode.http-address.public-cluster.nn2</name><value>10.0.2.32:50070</value></property>
<property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://10.0.0.144:8485;10.0.0.145:8485;10.0.0.146:8485/public-cluster</value></property>
<property><name>dfs.journalnode.edits.dir</name><value>/mnt/DP_disk1/hadoop2/hdfs/jn</value></property>
<property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>

---

And then:

./bin/hdfs dfs -fs hdfs://public-cluster -ls /
-ls: java.net.UnknownHostException: public-cluster
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]

While if I use the active namenode's URL, it works:

./bin/hdfs dfs -fs hdfs://10.0.2.31:8020 -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2013-10-24 14:30 /tmp

However, shouldn't this hdfs://public-cluster kind of thing work? Is there anything that I might have missed to make it work?

Thanks!

Best Regards,
Raymond Liu
RE: Using Hbase with NN HA
I encountered a similar issue with the NN HA URL. Have you managed to make it work?

Best Regards,
Raymond Liu

-----Original Message-----
From: Siddharth Tiwari [mailto:siddharth.tiw...@live.com]
Sent: Friday, October 18, 2013 5:17 PM
To: user@hadoop.apache.org
Subject: Using Hbase with NN HA

Hi team,

Can HBase be used with namenode HA in the latest hadoop-2.2.0? If yes, is there something else required beyond the following?

1. Set the hbase root dir to the logical name of the namenode service
2. Keep core-site and hdfs-site in the hbase conf

I did the above two but the logical name is not recognized. Also it would be helpful if I could get some help with which versions of HBase, Hive, Pig and Mahout are compatible with the latest yarn release hadoop-2.2.0. I am using hbase-0.94.12.

Thanks

Sent from my iPhone
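A sketch of step 1 above, assuming an illustrative nameservice ID of mycluster; the logical URI carries no host or port, and HBase needs the HDFS client configuration on its classpath to resolve it:

<!-- hbase-site.xml: "mycluster" is an illustrative nameservice ID -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycluster/hbase</value>
</property>

Then copy (or symlink) the cluster's core-site.xml and hdfs-site.xml into HBase's conf directory so the HBase client can resolve the logical name.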
Re: map container is assigned default memory size rather than user configured which will cause TaskAttempt failure
Hi,

How about checking the value of mapreduce.map.java.opts? Are your JVMs launched with the assumed heap memory?

On Thu, Oct 24, 2013 at 11:31 AM, Manu Zhang owenzhang1...@gmail.com wrote:

Just confirmed the problem still exists even when the mapred-site.xmls on all nodes have the same configuration (mapreduce.map.memory.mb = 2560). Any more thoughts?

Thanks,
Manu

On Thu, Oct 24, 2013 at 8:59 AM, Manu Zhang owenzhang1...@gmail.com wrote:

Thanks Ravi. I do have mapred-site.xml under /etc/hadoop/conf/ on those nodes, but it sounds weird to me that they should read configuration from those mapred-site.xmls, since it's the client who applies for the resources. I have another mapred-site.xml in the directory where I run my job. I suppose my job should read its conf from that mapred-site.xml. Please correct me if I am mistaken.

Also, it is not always the same nodes, and the number of failures is random too. Anyway, I will put my settings in all the nodes' mapred-site.xml and see if the problem goes away.

Manu

On Thu, Oct 24, 2013 at 1:40 AM, Ravi Prakash ravi...@ymail.com wrote:

Manu! This should not be the case. All tasks should have the configuration values you specified propagated to them. Are you sure your setup is correct? Are they always the same nodes which run with 1024Mb? Perhaps you have mapred-site.xml on those nodes?

HTH
Ravi

On Tuesday, October 22, 2013 9:09 PM, Manu Zhang owenzhang1...@gmail.com wrote:

Hi,

I've been running Terasort on Hadoop-2.0.4. Every time there is a small number of map failures (like 4 or 5) because of containers running beyond virtual memory limits. I've set mapreduce.map.memory.mb to a safe value (like 2560MB), so most TaskAttempts go fine, while the values for those failed maps are the default 1024MB. My question is thus: why are a small number of containers' memory values set to the default rather than the user-configured value? Any thoughts?

Thanks,
Manu Zhang

--
- Tsuyoshi
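For context, these are the client-side knobs the thread is discussing (values illustrative only): mapreduce.map.java.opts should be set comfortably below mapreduce.map.memory.mb, and the "beyond virtual memory limits" check is additionally governed by yarn.nodemanager.vmem-pmem-ratio on the NodeManagers.

<!-- mapred-site.xml on the submitting client (illustrative values) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2560</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2048m</value>
</property>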
RE: How to use Hadoop2 HA's logical name URL?
Hmm, my bad. The NameserviceID was not in sync in one of the properties. After fixing it, it works.

Best Regards,
Raymond Liu

-----Original Message-----
From: Liu, Raymond [mailto:raymond@intel.com]
Sent: Thursday, October 24, 2013 3:03 PM
To: user@hadoop.apache.org
Subject: How to use Hadoop2 HA's logical name URL?

Hi,

I have set up a Hadoop 2.2.0 HA cluster following:
http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html#Configuration_details

And I can check both the active and standby namenode with the web interface. However, it seems that the logical name cannot be used to access HDFS? I have the following settings related to HA:

In core-site.xml:

<property><name>fs.defaultFS</name><value>hdfs://public-cluster</value></property>
<property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
<property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/root/.ssh/id_rsa</value></property>
<property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>3</value></property>

And in hdfs-site.xml:

<property><name>dfs.nameservices</name><value>public-cluster</value></property>
<property><name>dfs.ha.namenodes.public-cluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.public-cluster.nn1</name><value>10.0.2.31:8020</value></property>
<property><name>dfs.namenode.rpc-address.public-cluster.nn2</name><value>10.0.2.32:8020</value></property>
<property><name>dfs.namenode.http-address.public-cluster.nn1</name><value>10.0.2.31:50070</value></property>
<property><name>dfs.namenode.http-address.public-cluster.nn2</name><value>10.0.2.32:50070</value></property>
<property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://10.0.0.144:8485;10.0.0.145:8485;10.0.0.146:8485/public-cluster</value></property>
<property><name>dfs.journalnode.edits.dir</name><value>/mnt/DP_disk1/hadoop2/hdfs/jn</value></property>
<property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>

---

And then:

./bin/hdfs dfs -fs hdfs://public-cluster -ls /
-ls: java.net.UnknownHostException: public-cluster
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]

While if I use the active namenode's URL, it works:

./bin/hdfs dfs -fs hdfs://10.0.2.31:8020 -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2013-10-24 14:30 /tmp

However, shouldn't this hdfs://public-cluster kind of thing work? Is there anything that I might have missed to make it work?

Thanks!

Best Regards,
Raymond Liu
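For anyone hitting the same error: judging from the quoted config, the out-of-sync property is presumably the failover proxy provider, whose suffix must equal the dfs.nameservices value. A sketch of the matching form:

<!-- hdfs-site.xml: the suffix must match dfs.nameservices (here public-cluster) -->
<property>
  <name>dfs.client.failover.proxy.provider.public-cluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>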
Re: dynamically resizing Hadoop cluster on AWS?
Thank you very much for replying, and sorry for posting on the wrong list.

Best,
-- Nan Zhu
School of Computer Science, McGill University

On Thursday, October 24, 2013 at 1:06 AM, Jun Ping Du wrote:

Move to @user alias.

----- Original Message -----
From: Jun Ping Du j...@vmware.com
To: gene...@hadoop.apache.org
Sent: Wednesday, October 23, 2013 10:03:27 PM
Subject: Re: dynamically resizing Hadoop cluster on AWS?

If only compute nodes (TaskTracker or NodeManager) run on your instances, then decommissioning the nodes and shutting down the related EC2 instances should be fine, although some finished/running tasks might need to be re-run automatically. In the future, we will support graceful decommission (tracked by YARN-914 and MAPREDUCE-5381) so that no tasks need to be rerun in this case (but you need to wait a while).

Thanks,
Junping

----- Original Message -----
From: Nan Zhu zhunans...@gmail.com
To: gene...@hadoop.apache.org
Sent: Wednesday, October 23, 2013 8:15:51 PM
Subject: Re: dynamically resizing Hadoop cluster on AWS?

Oh, I’m not running HDFS in the instances, I use S3 to save data.

-- Nan Zhu
School of Computer Science, McGill University

On Wednesday, October 23, 2013 at 11:11 PM, Nan Zhu wrote:

Hi, all

I’m running a Hadoop cluster on AWS EC2. I would like to dynamically resize the cluster so as to reduce the cost; is there any solution to achieve this? E.g. I would like to cut the cluster size in half. Is it safe to just shut down the instances (if some tasks are running on them, can I rely on speculative execution to re-run them on other nodes?) I cannot use EMR, since I’m running a customized version of Hadoop.

Best,
-- Nan Zhu
School of Computer Science, McGill University
Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows
Hi, I think that it is not useful to try to install Hadoop on Windows, because Hadoop is deeply integrated with Linux and there is no support for Windows.

2013/10/23 chris bidwell chris.bidw...@oracle.com

Is there any documentation or instructions on installing Hadoop 2.2.0 on Microsoft Windows? Thank you. -Chris
Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows
Disclosure: I do not work for Hortonworks, I just use their product, so please do not bash me.

Angelo, that's not entirely correct now. Hortonworks has done a tremendous amount of work to port Hadoop to the Windows OS as well. Here is their press release: http://hortonworks.com/about-us/news/hadoop-on-windows/

Chris, I do remember seeing hadoop 2.x for windows (http://hortonworks.com/blog/announcing-beta-release-of-apache-hadoop-2/). I never tried it myself on Windows, but you can reach out to their support forum and I am sure someone will be happy to help.

On Thu, Oct 24, 2013 at 6:08 PM, Angelo Matarazzo matarazzoang...@gmail.com wrote:

Hi, I think that it is not useful trying to install hadoop on Windows because hadoop is very integrated in Linux and there is no support for Windows

2013/10/23 chris bidwell chris.bidw...@oracle.com

Is there any documentation or instructions on installing Hadoop 2.2.0 on Microsoft Windows? Thank you. -Chris

--
Nitin Pawar
Hadoop 2.2.0 :What are the new features for Windows users
In the release notes I could see that "Support for running Hadoop on Microsoft Windows" is included. Can somebody tell me what those features are?

- Are these new features added to support easy Windows installation?
- Is there any inbuilt MSFT .Net support introduced?

Joy George K
www.joymononline.in
Orion Systems Integrators Inc
Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows
It was my understanding that HortonWorks depended on CygWin (UNIX emulation on Windows) for most of the Bigtop family of tools - Hadoop core, MapReduce, etc. So you will probably make all your configuration files in Windows, since XML is agnostic, and can develop in Windows, since JARs and other Java constructs are agnostic by design, but when things are actually happening on your cluster, CygWin is in the middle.

For a framework like Hadoop, I question the wisdom of deciding to use a host environment that uses more memory to begin with, then adding an emulation layer that complicates things and takes still more memory (since most Hadoop constraints are memory-based), simply for the convenience of being able to use my mouse.

If you have a competent *NIX admin, you may consider the benefits of using Hadoop in Linux/UNIX, and leveraging the many web-based management tools (which usually come with a Hadoop distribution) and 3rd-party development tools from Informatica or Pentaho (which let you build MapReduce, Pig, etc. jobs in a GUI), rather than going the Windows route - not only are you using hardware resources better, you also don't need to worry about licensing, and you won't need to reboot your cluster every Patch Tuesday. :-)

Thanks!

Devin Suiter
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com

On Thu, Oct 24, 2013 at 8:49 AM, Nitin Pawar nitinpawar...@gmail.com wrote:

Disclosure: I do not work for Hortonworks, I just use their product, so please do not bash me.

Angelo, that's not entirely correct now. Hortonworks has done a tremendous amount of work to port Hadoop to the Windows OS as well. Here is their press release: http://hortonworks.com/about-us/news/hadoop-on-windows/

Chris, I do remember seeing hadoop 2.x for windows (http://hortonworks.com/blog/announcing-beta-release-of-apache-hadoop-2/). I never tried it myself on Windows, but you can reach out to their support forum and I am sure someone will be happy to help.

On Thu, Oct 24, 2013 at 6:08 PM, Angelo Matarazzo matarazzoang...@gmail.com wrote:

Hi, I think that it is not useful trying to install hadoop on Windows because hadoop is very integrated in Linux and there is no support for Windows

2013/10/23 chris bidwell chris.bidw...@oracle.com

Is there any documentation or instructions on installing Hadoop 2.2.0 on Microsoft Windows? Thank you. -Chris

--
Nitin Pawar
Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows
No Cygwin is involved...We have the installation via MSI clearly documented on our site. Python, Visual C++, JDK and .Net framework. This work was done in conjunction with Microsoft who is not in the business of supporting Cygwin. Hadoop is Java and Java runs on windows. On Thu, Oct 24, 2013 at 9:20 AM, DSuiter RDX dsui...@rdx.com wrote: It was my understanding that HortonWorks depended on CygWin (UNIX emulation on Windows) for most of the Bigtop family of tools - Hadoop core, MapReduce, etc. - so, you will probably make all your configuration files in Windows, since XML is agnostic, and can develop in Windows, since JARs and other Java constructs are agnostic by design, but when things are actually happening on your cluster, CygWin is in the middle. For a framework like Hadoop, I question the wisdom of deciding to use a host environment that uses more memory to begin with, then adding an emulation layer that complicates things and takes still more memory (since most Hadoop constraints are memory-based) simply for the convenience of being able to use my mouse. If you have a competent *NIX admin, you may consider the benefits of using Hadoop in Linux/UNIX, and leveraging the many web-based management tools (usually will come with a Hadoop distribution) and 3rd-party development tools from Informatica or Pentaho (let you build MapReduce, Pig, etc jobs in GUI) rather than going the Windows route - not only are you using hardware resources better, you also don't need to worry about licensing, and you won't need to reboot your cluster every Patch Tuesday. :-) Thanks! *Devin Suiter* Jr. Data Solutions Software Engineer 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212 Google Voice: 412-256-8556 | www.rdx.com On Thu, Oct 24, 2013 at 8:49 AM, Nitin Pawar nitinpawar...@gmail.comwrote: Disclosure: I do not work for Hortonworks, I just use their product. please do not bash me up Angelo, thats not entirely correct now. Hortonworks has done tremendous amount of work to port hadoop to windows os as well. Here is there press release: http://hortonworks.com/about-us/news/hadoop-on-windows/ Chris, I do remember seeing hadoop 2.x for windows ( http://hortonworks.com/blog/announcing-beta-release-of-apache-hadoop-2/) I never tried it myself it on windows but you can reach out to their support forum and I am sure someone will be happy to help . On Thu, Oct 24, 2013 at 6:08 PM, Angelo Matarazzo matarazzoang...@gmail.com wrote: Hi, I think that it is not useful trying to install hadoop on Windows because hadoop is very integrated in Linux and there is no support for Windows 2013/10/23 chris bidwell chris.bidw...@oracle.com Is there any documentation or instructions on installing Hadoop 2.2.0 on Microsoft Windows? Thank you. 
-Chris

--
Nitin Pawar

--
Adam Diaz
Solution Engineer - Big Data
Phone: 919 609 4842
Email: ad...@hortonworks.com
Website: http://www.hortonworks.com/
Re: Documentation on installing Hadoop 2.2.0 on Microsoft Windows
Sorry for my ignorance... I don't bash up you Nitin..Eheh. Thank you very much for your post Adam. I 'm going to see your work. 2013/10/24 DSuiter RDX dsui...@rdx.com Very cool Adam! Thanks for the clarification, and great work for you guys porting it over to native running. *Devin Suiter* Jr. Data Solutions Software Engineer 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212 Google Voice: 412-256-8556 | www.rdx.com On Thu, Oct 24, 2013 at 9:27 AM, Adam Diaz ad...@hortonworks.com wrote: No Cygwin is involved...We have the installation via MSI clearly documented on our site. Python, Visual C++, JDK and .Net framework. This work was done in conjunction with Microsoft who is not in the business of supporting Cygwin. Hadoop is Java and Java runs on windows. On Thu, Oct 24, 2013 at 9:20 AM, DSuiter RDX dsui...@rdx.com wrote: It was my understanding that HortonWorks depended on CygWin (UNIX emulation on Windows) for most of the Bigtop family of tools - Hadoop core, MapReduce, etc. - so, you will probably make all your configuration files in Windows, since XML is agnostic, and can develop in Windows, since JARs and other Java constructs are agnostic by design, but when things are actually happening on your cluster, CygWin is in the middle. For a framework like Hadoop, I question the wisdom of deciding to use a host environment that uses more memory to begin with, then adding an emulation layer that complicates things and takes still more memory (since most Hadoop constraints are memory-based) simply for the convenience of being able to use my mouse. If you have a competent *NIX admin, you may consider the benefits of using Hadoop in Linux/UNIX, and leveraging the many web-based management tools (usually will come with a Hadoop distribution) and 3rd-party development tools from Informatica or Pentaho (let you build MapReduce, Pig, etc jobs in GUI) rather than going the Windows route - not only are you using hardware resources better, you also don't need to worry about licensing, and you won't need to reboot your cluster every Patch Tuesday. :-) Thanks! *Devin Suiter* Jr. Data Solutions Software Engineer 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212 Google Voice: 412-256-8556 | www.rdx.com On Thu, Oct 24, 2013 at 8:49 AM, Nitin Pawar nitinpawar...@gmail.comwrote: Disclosure: I do not work for Hortonworks, I just use their product. please do not bash me up Angelo, thats not entirely correct now. Hortonworks has done tremendous amount of work to port hadoop to windows os as well. Here is there press release: http://hortonworks.com/about-us/news/hadoop-on-windows/ Chris, I do remember seeing hadoop 2.x for windows ( http://hortonworks.com/blog/announcing-beta-release-of-apache-hadoop-2/ ) I never tried it myself it on windows but you can reach out to their support forum and I am sure someone will be happy to help . On Thu, Oct 24, 2013 at 6:08 PM, Angelo Matarazzo matarazzoang...@gmail.com wrote: Hi, I think that it is not useful trying to install hadoop on Windows because hadoop is very integrated in Linux and there is no support for Windows 2013/10/23 chris bidwell chris.bidw...@oracle.com Is there any documentation or instructions on installing Hadoop 2.2.0 on Microsoft Windows? Thank you. 
-Chris

--
Nitin Pawar

--
Adam Diaz
Solution Engineer - Big Data
Phone: 919 609 4842
Email: ad...@hortonworks.com
Website: http://www.hortonworks.com/
NullPointerException when trying to write mapper output
I am using hadoop 1.0.3 at Amazon EMR. I have a map/reduce job configured like this:

private static final String TEMP_PATH_PREFIX =
        System.getProperty("java.io.tmpdir") + "/dmp_processor_tmp";
...
private Job setupProcessorJobS3() throws IOException, DataGrinderException {
    String inputFiles = System.getProperty(DGConfig.INPUT_FILES);
    Job processorJob = new Job(getConf(), PROCESSOR_JOBNAME);
    processorJob.setJarByClass(DgRunner.class);
    processorJob.setMapperClass(EntityMapperS3.class);
    processorJob.setReducerClass(SelectorReducer.class);
    processorJob.setOutputKeyClass(Text.class);
    processorJob.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(processorJob, new Path(TEMP_PATH_PREFIX));
    processorJob.setOutputFormatClass(TextOutputFormat.class);
    processorJob.setInputFormatClass(NLineInputFormat.class);
    FileInputFormat.setInputPaths(processorJob, inputFiles);
    NLineInputFormat.setNumLinesPerSplit(processorJob, 1);
    return processorJob;
}

In my mapper class, I have:

private Text outkey = new Text();
private Text outvalue = new Text();
...
outkey.set(entity.getEntityId().toString());
outvalue.set(input.getId().toString());
printLog("context write");
context.write(outkey, outvalue);

This last line (`context.write(outkey, outvalue);`) causes this exception. Of course both `outkey` and `outvalue` are not null.

2013-10-24 05:48:48,422 INFO com.s1mbi0se.grinder.core.mapred.EntityMapperCassandra (main): Current Thread: Thread[main,5,main]Current timestamp: 1382593728422 context write
2013-10-24 05:48:48,422 ERROR com.s1mbi0se.grinder.core.mapred.EntityMapperCassandra (main): Error on entitymapper for input: 03a07858-4196-46dd-8a2c-23dd824d6e6e
java.lang.NullPointerException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1293)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1210)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
    at org.apache.hadoop.io.Text.write(Text.java:281)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1077)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:698)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at com.s1mbi0se.grinder.core.mapred.EntityMapper.map(EntityMapper.java:78)
    at com.s1mbi0se.grinder.core.mapred.EntityMapperS3.map(EntityMapperS3.java:34)
    at com.s1mbi0se.grinder.core.mapred.EntityMapperS3.map(EntityMapperS3.java:14)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-10-24 05:48:48,422 INFO com.s1mbi0se.grinder.core.mapred.EntityMapperS3 (main): Current Thread: Thread[main,5,main]Current timestamp: 1382593728422 Entity Mapper end
The first records in each task are processed ok. At some point in the task processing, I start to get this exception over and over, and then the task doesn't process a single record anymore. I tried to set `TEMP_PATH_PREFIX` to `s3://mybucket/dmp_processor_tmp`, but the same thing happened.

Any idea why this is happening? What could be making Hadoop unable to write its output?
Re: Hadoop core jar class update
Viswanathan, what version of Hadoop are you using? What is the change?

On Wednesday, October 23, 2013 2:20 PM, Viswanathan J jayamviswanat...@gmail.com wrote:

Hi guys,

If I update (a very small change) the hadoop-core mapred class for one of the OOME patches and compile the jar, will deploying that jar in production cause any issue? Will it cause any issue for the NN or lose any data? The version of the jar will be the same. Will that update be checked against any checksum?

Please help asap.

Thanks,
Re: Unable to use third party jar
OOps..forgot the code: http://pastebin.com/7XnyVnkv On Thu, Oct 24, 2013 at 10:54 AM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am trying to join two datasets.. One of which is json.. I am relying on json-simple library to parse that json.. I am trying to use libjars.. So far .. for simple data processing.. the approach has worked.. but now i am getting the following error Exception in thread main java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:264) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820) at org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getMapperTypeMap(MultipleInputs.java:141) at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:60) at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1024) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1041) at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:959) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912) at org.apache.hadoop.mapreduce.Job.submit(Job.java:500) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530) at org.select.Driver.run(Driver.java:130) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.select.Driver.main(Driver.java:139) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) I think I have implemented toolrunner. hadoop jar domain_gold.jar org.select.Driver \ -libjars json-simple-1.1.1.jar $INPUT1 $INPUT2 $OUTPUT .
Unable to use third party jar
Hi,

I am trying to join two datasets, one of which is JSON. I am relying on the json-simple library to parse that JSON, and I am trying to use -libjars. So far, for simple data processing, the approach has worked, but now I am getting the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
    at org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getMapperTypeMap(MultipleInputs.java:141)
    at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:60)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1024)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1041)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:959)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.select.Driver.run(Driver.java:130)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.select.Driver.main(Driver.java:139)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)

I think I have implemented ToolRunner. The command I use is:

hadoop jar domain_gold.jar org.select.Driver \
    -libjars json-simple-1.1.1.jar $INPUT1 $INPUT2 $OUTPUT
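One thing worth checking, as a hedged suggestion rather than a confirmed diagnosis: the stack trace fails inside JobClient.writeNewSplits on the submitting JVM, and -libjars only ships the jar to the map/reduce tasks, so the jar also needs to be visible on the client's own classpath. A sketch of what that can look like (paths are illustrative):

# make the jar visible to the driver/JobClient at submit time
export HADOOP_CLASSPATH=/path/to/json-simple-1.1.1.jar
# -libjars still ships it to the map/reduce tasks
hadoop jar domain_gold.jar org.select.Driver \
    -libjars /path/to/json-simple-1.1.1.jar $INPUT1 $INPUT2 $OUTPUT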
Re: hadoop2.2.0 compile Failed - no such instruction
Hi Rico! What was the command line you used to build?

On Wednesday, October 23, 2013 11:44 PM, codepeak gcodep...@gmail.com wrote:

Hi all,

I have a problem compiling Hadoop 2.2.0. Apache only offers a 32-bit distribution, but I need 64-bit, so I have to compile it myself.

My environment is:
2.6.32_1-7-0-0 #1 SMP Wed Jul 25 16:20:31 CST 2012 x86_64 x86_64 x86_64 GNU/Linux

The CPU is: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz, 12 cores
flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts tpr_shadow vnmi flexpriority ept vpid

When compiling Hadoop 2.2.0 with maven, something goes wrong as below, but my CPU supports the SSE4.1 and SSE4.2 instruction sets, so I don't know if there is anything else wrong.

thanks so much
rico
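For reference, the 64-bit native build of Hadoop 2.2.0 is normally driven from the source root roughly as below, assuming protobuf 2.5.0, cmake, zlib headers and a native gcc toolchain are installed (see BUILDING.txt in the source tree for the full requirements):

# build a distribution tarball including the native libraries
mvn clean package -Pdist,native -DskipTests -Dtar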
YARN And NTP
Hi folks. Is there a way to make YARN more forgiving with last modification times? The following exception is thrown in org.apache.hadoop.yarn.util.FSDownload:

  ... changed on src filesystem (expected " + resource.getTimestamp() + ", was " + sStat.getModificationTime());

I realize that the time should be the same, but depending on the underlying filesystem the semantics of this last-modified time might vary. Any thoughts on this?

--
Jay Vyas
http://jayunit100.blogspot.com
Re: dynamically resizing the Hadoop cluster?
It seems like you may want to look into Amazon's EMR (Elastic MapReduce), which does much of what you are trying to do. It's worth taking a look at since you're already storing your data in S3 and using EC2 for your cluster(s).

On Thu, Oct 24, 2013 at 5:07 PM, Nan Zhu zhunans...@gmail.com wrote:

Good explanation, thank you, Ravi

Best,

On Thu, Oct 24, 2013 at 4:51 PM, Ravi Prakash ravi...@ymail.com wrote:

Hi Nan!

If the TaskTrackers stop heartbeating back to the JobTracker, the JobTracker will mark them as dead and reschedule the tasks which were running on those TaskTrackers. Admittedly there is some delay between when the TaskTrackers stop heartbeating back and when the JobTracker marks them dead. This is controlled by the mapred.tasktracker.expiry.interval parameter (I'm assuming you are using Hadoop 1.x).

HTH
Ravi

On Thursday, October 24, 2013 1:21 PM, Nan Zhu zhunans...@gmail.com wrote:

Hi, Ravi,

Thank you for the reply. Actually I'm not running HDFS on EC2; instead I use S3 to store data. I'm curious: if some nodes are decommissioned, will the JobTracker treat the tasks which originally ran on them as too slow (since there is no progress for a long time) and start speculative execution, OR will it directly treat them as belonging to a running job that ran on a dead TaskTracker?

Best,
Nan

On Thu, Oct 24, 2013 at 2:04 PM, Ravi Prakash ravi...@ymail.com wrote:

Hi Nan!

Usually nodes are decommissioned slowly over some period of time so as not to disrupt the running jobs. When a node is decommissioned, the NameNode must re-replicate all under-replicated blocks. Rather than suddenly removing half the nodes, you might want to take a few nodes offline at a time. Hadoop should be able to handle rescheduling tasks on nodes no longer available (even without speculative execution; speculative execution is for something else).

HTH
Ravi

On Wednesday, October 23, 2013 10:26 PM, Nan Zhu zhunans...@gmail.com wrote:

Hi, all

I’m running a Hadoop cluster on AWS EC2. I would like to dynamically resize the cluster so as to reduce the cost; is there any solution to achieve this? E.g. I would like to cut the cluster size in half. Is it safe to just shut down the instances (if some tasks are running on them, can I rely on speculative execution to re-run them on the other nodes?) I cannot use EMR, since I’m running a customized version of Hadoop.

Best,
-- Nan Zhu
School of Computer Science, McGill University

--
Nan Zhu
School of Computer Science, McGill University
E-Mail: zhunanmcg...@gmail.com zhunans...@gmail.com
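A sketch of the usual graceful shrink on Hadoop 1.x, assuming exclude files are already wired up via the mapred.hosts.exclude (and, if DataNodes were involved, dfs.hosts.exclude) properties; the hostname and file path below are made up for illustration:

# add the instance that is about to be terminated to the TaskTracker exclude file
echo "ec2-xx-xx-xx-xx.compute-1.amazonaws.com" >> /etc/hadoop/conf/mapred.exclude
# tell the JobTracker to re-read the exclude list and stop scheduling tasks there
hadoop mradmin -refreshNodes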
Mapreduce outputs to a different cluster?
The scenario is: I run a mapreduce job on cluster A (all source data is in cluster A), but I want the output of the job to go to cluster B. Is it possible? If yes, please let me know how to do it.

Here are some notes on my mapreduce job:
1. The data source is an HBase table.
2. It only has a mapper, no reducer.

Thanks
Senqiang
Re: HDP 2.0 Install fails on repo unavailability
BCC'ing user@hadoop. This is a question for the ambari mailing list. -- Hitesh On Oct 24, 2013, at 3:36 PM, Jain, Prem wrote: Folks, Trying to install the newly release Hadoop 2.0 using Ambari. I am able to install Ambari, but when I try to install Hadoop 2.0 on rest of the cluster, the installation fails erroring on repo mirror unavailability. Not sure where I messed up. Here are the error messages Output log from AMBARI during Installation notice: /Stage[1]/Hdp::Snappy::Package/Hdp::Snappy::Package::Ln[32]/Hdp::Exec[hdp::snappy::package::ln 32]/Exec[hdp::snappy::package::ln 32]/returns: executed successfully err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-libhdfs]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop-libhdfs' returned 1: Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-lzo]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop-lzo' returned 1: Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. notice: /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[hdfs-site]/File[/etc/hadoop/conf/hdfs-site.xml]/content: content changed '{md5}117224b1cf67c151f8a3d7ac0a157fa5' to '{md5}ba383a94bdde1a0b2eb5c59b1f5b61e7' notice: /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[capacity-scheduler]/File[/etc/hadoop/conf/capacity-scheduler.xml]/content: content changed '{md5}08d7e952b3e2d4fd5a2a880dfcd3a2df' to '{md5}dd3922fc27f72cd78cf2b47f57351b08' notice: /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[core-site]/File[/etc/hadoop/conf/core-site.xml]/content: content changed '{md5}76d06ebce1310be7e65ae0c7e8c3068a' to '{md5}1b626aa016a6f916271f67f3aa22cbbb' err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop' returned 1: Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. 
notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 64::end]: Dependency Package[hadoop-libhdfs] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 64::end]: Dependency Package[hadoop-lzo] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 64::end]: Dependency Package[hadoop] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]: Dependency Package[hadoop-libhdfs] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]: Dependency Package[hadoop-lzo] has failures: true Repo : http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.4.1.25 ErrorCodeNoSuchKey/CodeMessageThe specified key does not exist./MessageKeyambari/centos6/1.x/updates/1.4.1.25/KeyRequestId4693487CE703DB53/RequestIdHostIdB87iAHvcpH7im27HOuEKBJ0F+qPFf+7aXuTe+O7OhLb9WscyxTbV/2yUPXO+KPOJ/HostId/Error Manual install: [root@dn5 ~]# /usr/bin/yum -d 0 -e 0 -y install hadoop-libhdfs Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure:
Re: HDP 2.0 Install fails on repo unavailability
The servers have been very busy since the release. You probably just need to try again. On Oct 24, 2013 6:37 PM, Jain, Prem premanshu.j...@netapp.com wrote: Folks, ** ** Trying to install the newly release Hadoop 2.0 using Ambari. I am able to install Ambari, but when I try to install Hadoop 2.0 on rest of the cluster, the installation fails erroring on repo mirror unavailability. Not sure where I messed up. ** ** Here are the error messages ** ** *Output log from AMBARI during Installation* ** ** notice: /Stage[1]/Hdp::Snappy::Package/Hdp::Snappy::Package::Ln[32]/Hdp::Exec[hdp::snappy::package::ln 32]/Exec[hdp::snappy::package::ln 32]/returns: executed successfully err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-libhdfs]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop-libhdfs' returned 1: ** ** Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. ** ** ** ** err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-lzo]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop-lzo' returned 1: ** ** Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. ** ** ** ** notice: /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[hdfs-site]/File[/etc/hadoop/conf/hdfs-site.xml]/content: content changed '{md5}117224b1cf67c151f8a3d7ac0a157fa5' to '{md5}ba383a94bdde1a0b2eb5c59b1f5b61e7' notice: /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[capacity-scheduler]/File[/etc/hadoop/conf/capacity-scheduler.xml]/content: content changed '{md5}08d7e952b3e2d4fd5a2a880dfcd3a2df' to '{md5}dd3922fc27f72cd78cf2b47f57351b08' notice: /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[core-site]/File[/etc/hadoop/conf/core-site.xml]/content: content changed '{md5}76d06ebce1310be7e65ae0c7e8c3068a' to '{md5}1b626aa016a6f916271f67f3aa22cbbb' err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop' returned 1: ** ** Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. 
** ** ** ** notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 64::end]: Dependency Package[hadoop-libhdfs] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 64::end]: Dependency Package[hadoop-lzo] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 64::end]: Dependency Package[hadoop] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]: Dependency Package[hadoop-libhdfs] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]: Dependency Package[hadoop-lzo] has failures: true ** ** *Repo :* ** ** http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.4.1.25** ** ErrorCodeNoSuchKey/CodeMessageThe specified key does not exist./MessageKeyambari/centos6/1.x/updates/1.4.1.25 /KeyRequestId4693487CE703DB53/RequestIdHostIdB87iAHvcpH7im27HOuEKBJ0F+qPFf+7aXuTe+O7OhLb9WscyxTbV/2yUPXO+KPOJ/HostId/Error ** ** ** ** *Manual install:* ** ** [root@dn5 ~]# /usr/bin/yum -d 0 -e 0 -y install hadoop-libhdfs Error Downloading Packages:
Re: map container is assigned default memory size rather than user configured which will cause TaskAttempt failure
My mapreduce.map.java.opts is 1024MB Thanks, Manu On Thu, Oct 24, 2013 at 3:11 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.comwrote: Hi, How about checking the value of mapreduce.map.java.opts? Are your JVMs launched with assumed heap memory? On Thu, Oct 24, 2013 at 11:31 AM, Manu Zhang owenzhang1...@gmail.com wrote: Just confirmed the problem still existed even the mapred-site.xmls on all nodes have the same configuration (mapreduce.map.memory.mb = 2560). Any more thoughts ? Thanks, Manu On Thu, Oct 24, 2013 at 8:59 AM, Manu Zhang owenzhang1...@gmail.com wrote: Thanks Ravi. I do have mapred-site.xml under /etc/hadoop/conf/ on those nodes but it sounds weird to me should they read configuration from those mapred-site.xml since it's the client who applies for the resource. I have another mapred-site.xml in the directory where I run my job. I suppose my job should read conf from that mapred-site.xml. Please correct me if I am mistaken. Also, not always the same nodes. The number of failures is random, too. Anyway, I will have my settings in all the nodes' mapred-site.xml and see if the problem goes away. Manu On Thu, Oct 24, 2013 at 1:40 AM, Ravi Prakash ravi...@ymail.com wrote: Manu! This should not be the case. All tasks should have the configuration values you specified propagated to them. Are you sure your setup is correct? Are they always the same nodes which run with 1024Mb? Perhaps you have mapred-site.xml on those nodes? HTH Ravi On Tuesday, October 22, 2013 9:09 PM, Manu Zhang owenzhang1...@gmail.com wrote: Hi, I've been running Terasort on Hadoop-2.0.4. Every time there is s a small number of Map failures (like 4 or 5) because of container's running beyond virtual memory limit. I've set mapreduce.map.memory.mb to a safe value (like 2560MB) so most TaskAttempt goes fine while the values of those failed maps are the default 1024MB. My question is thus, why a small number of container's memory values are set to default rather than that of user-configured ? Any thoughts ? Thanks, Manu Zhang -- - Tsuyoshi
Re: HDP 2.0 Install fails on repo unavailability
I think I have the fix for this. I'll check when I get home. Clay McDonald Sent from my iPhone On Oct 24, 2013, at 7:36 PM, Hitesh Shah hit...@apache.org wrote: BCC'ing user@hadoop. This is a question for the ambari mailing list. -- Hitesh On Oct 24, 2013, at 3:36 PM, Jain, Prem wrote: Folks, Trying to install the newly release Hadoop 2.0 using Ambari. I am able to install Ambari, but when I try to install Hadoop 2.0 on rest of the cluster, the installation fails erroring on repo mirror unavailability. Not sure where I messed up. Here are the error messages Output log from AMBARI during Installation notice: /Stage[1]/Hdp::Snappy::Package/Hdp::Snappy::Package::Ln[32]/Hdp::Exec[hdp::snappy::package::ln 32]/Exec[hdp::snappy::package::ln 32]/returns: executed successfully err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-libhdfs]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop-libhdfs' returned 1: Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-hdfs-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop-lzo]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop-lzo' returned 1: Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. notice: /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[hdfs-site]/File[/etc/hadoop/conf/hdfs-site.xml]/content: content changed '{md5}117224b1cf67c151f8a3d7ac0a157fa5' to '{md5}ba383a94bdde1a0b2eb5c59b1f5b61e7' notice: /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[capacity-scheduler]/File[/etc/hadoop/conf/capacity-scheduler.xml]/content: content changed '{md5}08d7e952b3e2d4fd5a2a880dfcd3a2df' to '{md5}dd3922fc27f72cd78cf2b47f57351b08' notice: /Stage[2]/Hdp-hadoop::Initialize/Configgenerator::Configfile[core-site]/File[/etc/hadoop/conf/core-site.xml]/content: content changed '{md5}76d06ebce1310be7e65ae0c7e8c3068a' to '{md5}1b626aa016a6f916271f67f3aa22cbbb' err: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Package[hadoop]/ensure: change from absent to present failed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop' returned 1: Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure: hadoop/hadoop-2.2.0.2.0.6.0-76.el6.x86_64.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. zookeeper-3.4.5.2.0.6.0-76.el6.noarch: failure: zookeeper/zookeeper-3.4.5.2.0.6.0-76.el6.noarch.rpm from HDP-2.0.6: [Errno 256] No more mirrors to try. 
notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 64::end]: Dependency Package[hadoop-libhdfs] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 64::end]: Dependency Package[hadoop-lzo] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Hdp::Package[hadoop 64]/Hdp::Package::Process_pkg[hadoop 64]/Anchor[hdp::package::hadoop 64::end]: Dependency Package[hadoop] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]: Dependency Package[hadoop-libhdfs] has failures: true notice: /Stage[main]/Hdp-hadoop/Hdp-hadoop::Package[hadoop]/Anchor[hdp-hadoop::package::helper::end]: Dependency Package[hadoop-lzo] has failures: true Repo : http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.4.1.25 ErrorCodeNoSuchKey/CodeMessageThe specified key does not exist./MessageKeyambari/centos6/1.x/updates/1.4.1.25/KeyRequestId4693487CE703DB53/RequestIdHostIdB87iAHvcpH7im27HOuEKBJ0F+qPFf+7aXuTe+O7OhLb9WscyxTbV/2yUPXO+KPOJ/HostId/Error Manual install: [root@dn5 ~]# /usr/bin/yum -d 0 -e 0 -y install hadoop-libhdfs Error Downloading Packages: hadoop-2.2.0.2.0.6.0-76.el6.x86_64: failure:
RE: Mapreduce outputs to a different cluster?
Just specify the output location using a URI pointing to the other cluster. As long as the network is accessible, you should be fine.

Yong

Date: Thu, 24 Oct 2013 15:28:27 -0700
From: myx...@yahoo.com
Subject: Mapreduce outputs to a different cluster?
To: user@hadoop.apache.org

The scenario is: I run a mapreduce job on cluster A (all source data is in cluster A), but I want the output of the job to go to cluster B. Is it possible? If yes, please let me know how to do it. Here are some notes on my mapreduce job: 1. the data source is an HBase table 2. It only has a mapper, no reducer.

Thanks
Senqiang
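A sketch of what that looks like in the driver, with the cluster-B NameNode host and output path made up for illustration (job is the already-configured Job object):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// a fully qualified URI sends the job output to cluster B's HDFS
FileOutputFormat.setOutputPath(job,
    new Path("hdfs://namenode-b.example.com:8020/user/senqiang/output"));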
Terasort's performance anomaly
Hi, I've been running Terasort on Hadoop 2.1.0-beta. I have a 6-node cluster; 5 of the nodes run a NodeManager and all of them run a DataNode. I don't understand why the performance is bad in most cases and good in some cases (10 GB Terasort with 2 reducers).
* When I run 10, 20 and 30 GB with 1 reducer, I get the following results: Total time: 1100, 3300 and 5700 sec. Avg map time: 29, 50 and 72 sec. Avg reduce time: 870, 2700 and 4800 sec. Killed map tasks: 2, 5 and 5.
* When I run 10, 20 and 30 GB with 2 reducers, I get the following results: Total time: 385, 4575 and 7379 sec. Avg map time: 35, 52 and 70 sec. Avg reduce time: 225, 3879 and 6116 sec. Killed map tasks: 1, 4 and 5.
* These results don't make sense to me, since there is no consistent pattern between them. Somehow the 10 GB Terasort with 2 reducers works much better than with 1 reducer, while in the other cases increasing the number of reducers actually decreases performance. When I check the logs of the application master, I see a lot of "Container killed on request. Exit code is 143" errors, which are generally followed by "[AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1382628871858_0002_m_21_0: Container killed by the ApplicationMaster." (e.g. there are 219 of them in the 30 GB Terasort with 2 reducers), which doesn't give much information.
* I mostly use the default settings; the only changes that may have an impact (I also set the dfs replication factor to 1) are the following:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
* I observed that in all cases "map output materialized bytes" is slightly more than "map output bytes" (which should be OK since I don't use any compression).
Here is one of the terminal outputs (from 20 GB Terasort with 2 reducers): 13/10/23 22:31:32 INFO mapreduce.Job: map 0% reduce 0% 13/10/23 22:31:45 INFO mapreduce.Job: map 1% reduce 0% 13/10/23 22:31:47 INFO mapreduce.Job: map 3% reduce 0% 13/10/23 22:31:48 INFO mapreduce.Job: map 6% reduce 0% 13/10/23 22:31:49 INFO mapreduce.Job: map 7% reduce 0% 13/10/23 22:31:50 INFO mapreduce.Job: map 8% reduce 0% 13/10/23 22:31:51 INFO mapreduce.Job: map 10% reduce 0% 13/10/23 22:31:53 INFO mapreduce.Job: map 11% reduce 0% 13/10/23 22:31:55 INFO mapreduce.Job: map 12% reduce 0% 13/10/23 22:31:57 INFO mapreduce.Job: map 14% reduce 0% 13/10/23 22:31:59 INFO mapreduce.Job: map 15% reduce 0% 13/10/23 22:32:00 INFO mapreduce.Job: map 16% reduce 0% 13/10/23 22:32:02 INFO mapreduce.Job: map 17% reduce 0% 13/10/23 22:32:03 INFO mapreduce.Job: map 18% reduce 0% 13/10/23 22:32:08 INFO mapreduce.Job: map 20% reduce 0% 13/10/23 22:32:09 INFO mapreduce.Job: map 21% reduce 0% 13/10/23 22:32:12 INFO mapreduce.Job: map 22% reduce 0% 13/10/23 22:32:15 INFO mapreduce.Job: map 23% reduce 0% 13/10/23 22:32:22 INFO mapreduce.Job: map 25% reduce 0% 13/10/23 22:32:28 INFO mapreduce.Job: map 30% reduce 1% 13/10/23 22:32:32 INFO mapreduce.Job: map 36% reduce 1% 13/10/23 22:32:35 INFO mapreduce.Job: map 38% reduce 1% 13/10/23 22:32:41 INFO mapreduce.Job: map 39% reduce 1% 13/10/23 22:32:43 INFO mapreduce.Job: map 40% reduce 1% 13/10/23 22:32:44 INFO mapreduce.Job: map 41% reduce 2% 13/10/23 22:32:47 INFO mapreduce.Job: map 42% reduce 2% 13/10/23 22:32:49 INFO mapreduce.Job: map 43% reduce 2% 13/10/23 22:32:58 INFO mapreduce.Job: map 44% reduce 2% 13/10/23 22:33:01 INFO mapreduce.Job: map 47% reduce 2% 13/10/23 22:33:04 INFO mapreduce.Job: map 48% reduce 2% 13/10/23 22:33:05 INFO mapreduce.Job: map 49% reduce 2% 13/10/23 22:33:07 INFO mapreduce.Job: map 49% reduce 3% 13/10/23 22:33:08 INFO mapreduce.Job: map 50% reduce 3% 13/10/23 22:33:12 INFO mapreduce.Job: map 51% reduce 3% 13/10/23 22:33:13 INFO mapreduce.Job: map 52% reduce 3% 13/10/23 22:33:24 INFO mapreduce.Job: map 55% reduce 4% 13/10/23 22:33:36 INFO mapreduce.Job: map 60% reduce 5% 13/10/23 22:33:39 INFO mapreduce.Job: map 61% reduce 5% 13/10/23 22:33:44 INFO mapreduce.Job: map 62% reduce 5% 13/10/23 22:33:49 INFO mapreduce.Job: map 64% reduce 5% 13/10/23 22:33:54 INFO mapreduce.Job: map 67% reduce 6% 13/10/23 22:33:57 INFO mapreduce.Job: map 69% reduce 6% 13/10/23 22:34:00 INFO mapreduce.Job: map 70% reduce 6% 13/10/23 22:34:02 INFO mapreduce.Job: map 71% reduce 6% 13/10/23 22:34:03 INFO mapreduce.Job: map 72% reduce 6% 13/10/23 22:34:05 INFO mapreduce.Job: map 73% reduce 6% 13/10/23 22:34:07 INFO mapreduce.Job: map 74% reduce 6% 13/10/23 22:34:16 INFO mapreduce.Job: map 76% reduce 6% 13/10/23 22:34:19 INFO mapreduce.Job: map 77% reduce 7% 13/10/23 22:34:22 INFO mapreduce.Job: map 78% reduce 7% 13/10/23 22:34:24 INFO mapreduce.Job: map 79% reduce 7% 13/10/23 22:34:27 INFO mapreduce.Job: map 80% reduce 7% 13/10/23 22:34:30 INFO
Re: Mapreduce outputs to a different cluster?
Thanks Shahab, Yong. If cluster B (into which I want to dump the output) has the URL hdfs://machine.domain:8080 and the data folder /tmp/myfolder, what should I specify as the output path for the MR job? Thanks
On Thursday, October 24, 2013 5:31 PM, java8964 java8964 java8...@hotmail.com wrote: Just specify the output location using the URI to another cluster. As long as the network is accessible, you should be fine. Yong
Date: Thu, 24 Oct 2013 15:28:27 -0700 From: myx...@yahoo.com Subject: Mapreduce outputs to a different cluster? To: user@hadoop.apache.org
The scenario is: I run a mapreduce job on cluster A (all source data is in cluster A), but I want the output of the job to go to cluster B. Is it possible? If yes, please let me know how to do it. Here are some notes on my mapreduce job: 1. the data source is an HBase table. 2. It only has a mapper, no reducer. Thanks Senqiang
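[Editor's note] Combining the two values given in the question, the fully qualified output path would presumably be hdfs://machine.domain:8080/tmp/myfolder (assuming 8080 really is the NameNode RPC port of cluster B), passed to FileOutputFormat.setOutputPath as in the sketch after Yong's first reply above.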
map phase does not read intermediate results with SequenceFileInputFormat
Hi all. I have a mapreduce program with two jobs. The second job's key and value come from the first job's output, but I think the second map does not get the result of the first job; in other words, my second job does not read the output of my first job. What should I do? Here is the code:

public class dewpoint extends Configured implements Tool {

  private static final Logger logger = LoggerFactory.getLogger(dewpoint.class);

  static final String KEYSPACE = "weather";
  static final String COLUMN_FAMILY = "user";

  private static final String OUTPUT_PATH1 = "/tmp/intermediate1";
  private static final String OUTPUT_PATH2 = "/tmp/intermediate2";
  private static final String OUTPUT_PATH3 = "/tmp/intermediate3";
  private static final String INPUT_PATH1 = "/tmp/intermediate1";

  public static void main(String[] args) throws Exception {
    ToolRunner.run(new Configuration(), new dewpoint(), args);
    System.exit(0);
  }

  ///
  public static class dpmap1 extends Mapper<Map<String, ByteBuffer>, Map<FloatWritable, ByteBuffer>, Text, DoubleWritable> {
    DoubleWritable val1 = new DoubleWritable();
    Text word = new Text();
    String date;
    float temp;

    public void map(Map<String, ByteBuffer> keys, Map<FloatWritable, ByteBuffer> columns, Context context)
        throws IOException, InterruptedException {
      for (Entry<String, ByteBuffer> key : keys.entrySet()) {
        //System.out.println(key.getKey());
        if (!"date".equals(key.getKey()))
          continue;
        date = ByteBufferUtil.string(key.getValue());
        word.set(date);
      }
      for (Entry<FloatWritable, ByteBuffer> column : columns.entrySet()) {
        if (!"temprature".equals(column.getKey()))
          continue;
        temp = ByteBufferUtil.toFloat(column.getValue());
        val1.set(temp);
        //System.out.println(temp);
      }
      context.write(word, val1);
    }
  }

  ///
  public static class dpred1 extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
      double beta = 17.62;
      double landa = 243.12;
      DoubleWritable result1 = new DoubleWritable();
      DoubleWritable result2 = new DoubleWritable();
      for (DoubleWritable val : values) {
        // System.out.println(val.get());
        beta *= val.get();
        landa += val.get();
      }
      result1.set(beta);
      result2.set(landa);
      context.write(key, result1);
      context.write(key, result2);
    }
  }

  ///
  public static class dpmap2 extends Mapper<Text, DoubleWritable, Text, DoubleWritable> {
    Text key2 = new Text();
    double temp1, temp2 = 0;

    public void map(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
      String[] sp = values.toString().split("\t");
      for (int i = 0; i < sp.length; i += 4)
        //key2.set(sp[i]);
        System.out.println(sp[i]);
      for (int j = 1; j < sp.length; j += 4)
        temp1 = Double.valueOf(sp[j]);
      for (int k = 3; k < sp.length; k += 4)
        temp2 = Double.valueOf(sp[k]);
      context.write(key2, new DoubleWritable(temp2 / temp1));
    }
  }

  ///
  public static class dpred2 extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
      double alpha = 6.112;
      double tmp = 0;
      DoubleWritable result3 = new DoubleWritable();
      for (DoubleWritable val : values) {
        System.out.println(val.get());
        tmp = alpha * (Math.pow(Math.E, val.get()));
      }
      result3.set(tmp);
      context.write(key, result3);
    }
  }

  ///
  public int run(String[] args) throws Exception {
    Job job1 = new Job(getConf(), "DewPoint");
    job1.setJarByClass(dewpoint.class);
    job1.setMapperClass(dpmap1.class);
    job1.setOutputFormatClass(SequenceFileOutputFormat.class);
    job1.setCombinerClass(dpred1.class);
    job1.setReducerClass(dpred1.class);
    job1.setMapOutputKeyClass(Text.class);
    job1.setMapOutputValueClass(DoubleWritable.class);
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(DoubleWritable.class);
    FileOutputFormat.setOutputPath(job1, new Path(OUTPUT_PATH1));
    job1.setInputFormatClass(CqlPagingInputFormat.class);
    ConfigHelper.setInputRpcPort(job1.getConfiguration(), "9160");
    ConfigHelper.setInputInitialAddress(job1.getConfiguration(), "localhost");
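[Editor's sketch] For comparison, below is a minimal, self-contained sketch of chaining two jobs through a sequence file. It is not the original program: the class names and paths are placeholders, job1 reads plain text rather than Cassandra, and it targets the Hadoop 2.x mapreduce API. It illustrates two things: job1 must complete (waitForCompletion) before job2 is submitted, and when job2 reads job1's (Text, DoubleWritable) output with SequenceFileInputFormat, its mapper's map() receives one key/value pair per call, not an Iterable of values (an Iterable only appears in a reducer).

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainDriver {

  /** Job 1: parse a text line "key<TAB>value" into a (Text, DoubleWritable) pair. */
  public static class ParseMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t");
      context.write(new Text(parts[0]), new DoubleWritable(Double.parseDouble(parts[1])));
    }
  }

  /** Job 2: each map() call gets ONE (Text, DoubleWritable) record from the sequence file. */
  public static class ScaleMapper extends Mapper<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void map(Text key, DoubleWritable value, Context context)
        throws IOException, InterruptedException {
      context.write(key, new DoubleWritable(value.get() * 2.0));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path textIn = new Path(args[0]);                       // plain-text input for job1
    Path intermediate = new Path("/tmp/chain-intermediate"); // placeholder intermediate path
    Path finalOut = new Path(args[1]);

    Job job1 = Job.getInstance(conf, "chain-job1");
    job1.setJarByClass(ChainDriver.class);
    job1.setMapperClass(ParseMapper.class);
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(DoubleWritable.class);
    job1.setOutputFormatClass(SequenceFileOutputFormat.class); // binary (Text, DoubleWritable) records
    FileInputFormat.addInputPath(job1, textIn);
    FileOutputFormat.setOutputPath(job1, intermediate);
    if (!job1.waitForCompletion(true)) {                   // job1 must finish before job2 reads its output
      System.exit(1);
    }

    Job job2 = Job.getInstance(conf, "chain-job2");
    job2.setJarByClass(ChainDriver.class);
    job2.setInputFormatClass(SequenceFileInputFormat.class); // read job1's output directly
    FileInputFormat.addInputPath(job2, intermediate);
    job2.setMapperClass(ScaleMapper.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(DoubleWritable.class);
    FileOutputFormat.setOutputPath(job2, finalOut);
    System.exit(job2.waitForCompletion(true) ? 0 : 1);
  }
}

In the run() method quoted above, the second job would presumably need the analogous wiring: setInputFormatClass(SequenceFileInputFormat.class) plus FileInputFormat.addInputPath pointing at OUTPUT_PATH1, submitted only after job1 has completed, with dpmap2 declared to take a single DoubleWritable value per map() call rather than an Iterable.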