Re: Hadoop 2.2.0-cdh5.0.0-beta-1 - MapReduce Streaming - Failed to run on a larger jobs

2014-04-10 Thread Harsh J
It appears to me that whatever chunk of the input CSV files your map
task 000149 gets, the program is unable to process it, so it throws an
error and exits.

Look into the attempt_1395628276810_0062_m_000149_0 attempt's task log
to see if there is any stdout/stderr printed that may help. The syslog
in the attempt's task log will also carry a "Processing split ..."
message that tells you which file, and what offset+length within that
file, was being processed.
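
For example, a sketch of pulling those logs with the yarn CLI (this
assumes log aggregation is enabled; the IDs are the ones from your job):

$ yarn logs -applicationId application_1395628276810_0062 \
    | grep -A 20 'attempt_1395628276810_0062_m_000149_0'

In the attempt's syslog, the line of interest looks like "INFO
org.apache.hadoop.mapred.MapTask: Processing split: ..." and names the
input file plus the offset+length being processed.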

On Thu, Apr 10, 2014 at 10:55 AM, Phan, Truong Q
troung.p...@team.telstra.com wrote:
 Hi



 My Hadoop 2.2.0-cdh5.0.0-beta-1 cluster fails to run a larger MapReduce
 Streaming job.

 I have no issue running the MapReduce Streaming job when the input is a
 single CSV file of around 400 MB.

 However, the job fails when I run it with 11 input files of around 400 MB
 each.

 The job failed with the following error.

 I would appreciate any hints or suggestions to fix this issue.



 +

 2014-04-10 10:28:10,498 FATAL [IPC Server handler 2 on 52179]
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
 attempt_1395628276810_0062_m_000149_0 - exited : java.lang.RuntimeException:
 PipeMapRed.waitOutputThreads(): subprocess failed with code 1

 at
 org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)

 at
 org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)

 at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)

 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)

 at
 org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)

 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)

 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:415)

 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)

 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)



 2014-04-10 10:28:10,498 INFO [IPC Server handler 2 on 52179]
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from
 attempt_1395628276810_0062_m_000149_0: Error: java.lang.RuntimeException:
 PipeMapRed.waitOutputThreads(): subprocess failed with code 1

 at
 org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)

 at
 org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)

 at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)

 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)

 at
 org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)

 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)

 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:415)

 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)

 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)



 2014-04-10 10:28:10,499 INFO [AsyncDispatcher event handler]
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics
 report from attempt_1395628276810_0062_m_000149_0: Error:
 java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
 failed with code 1

 at
 org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)

 at
 org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)

 at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)

 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)

 at
 org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)

 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)

 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:415)

 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)

 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)



 +

 MAPREDUCE SCRIPT:

 $ cat devices-hdfs-mr-PyIterGen-v3.sh

 #!/bin/sh

 export HADOOP_CMD=/usr/bin/hadoop

 export 

not able to run map reduce job example on aws machine

2014-04-10 Thread Rahul Singh
Hi,
  I am getting the following exception while running the word count example:

14/04/10 15:17:09 INFO mapreduce.Job: Task Id :
attempt_1397123038665_0001_m_00_2, Status : FAILED
Container launch failed for container_1397123038665_0001_01_04 :
java.lang.IllegalArgumentException: Does not contain a valid host:port
authority: poc_hadoop04:46162
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:211)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152)
at
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:210)
at
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:196)
at
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


I have everything configured and HDFS is running; I am able to create
files and directories. Running jps on my machine shows all components
running:

10290 NameNode
10416 DataNode
10738 ResourceManager
11634 Jps
10584 SecondaryNameNode
10844 NodeManager


Any pointers will be appreciated.

Thanks and Regards,
-Rahul Singh


Re: not able to run map reduce job example on aws machine

2014-04-10 Thread Kiran Dangeti
Rahul,

Please check the host and port given in mapred-site.xml.
Thanks
Kiran

On Thu, Apr 10, 2014 at 3:23 PM, Rahul Singh smart.rahul.i...@gmail.comwrote:

  Hi,
   I am getting following exception while running word count example,

 14/04/10 15:17:09 INFO mapreduce.Job: Task Id :
 attempt_1397123038665_0001_m_00_2, Status : FAILED
 Container launch failed for container_1397123038665_0001_01_04 :
 java.lang.IllegalArgumentException: Does not contain a valid host:port
 authority: poc_hadoop04:46162
 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:211)
 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163)
 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:210)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:196)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)


 I have everything configured with hdfs  running where i am able to create
 files and directories. running jps on my machine shows all components
 running.

 10290 NameNode
 10416 DataNode
 10738 ResourceManager
 11634 Jps
 10584 SecondaryNameNode
 10844 NodeManager


 Any pointers will be appreciated.

 Thanks and Regards,
 -Rahul Singh



Re: File requests to Namenode

2014-04-10 Thread Diwakar Hadoop
Thanks !!!

Diwakar

Sent from my iPhone

 On Apr 9, 2014, at 9:22 PM, Harsh J ha...@cloudera.com wrote:
 
 You could look at metrics the NN publishes, or look at/process the
 HDFS audit log.
 
 On Wed, Apr 9, 2014 at 6:36 PM, Diwakar Sharma diwakar.had...@gmail.com 
 wrote:
 How and where to check how many datanode block address requests a namenode
 gets when running a map reduce job.
 
 - Diwakar
 
 
 
 -- 
 Harsh J
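
For reference, two sketches of what Harsh describes (the NameNode host,
HTTP port, and audit log path are assumptions that vary per install):

$ curl 'http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=RpcDetailedActivityForPort8020'

The detailed RPC metrics include a GetBlockLocations counter - the call
map tasks issue to locate blocks. Alternatively, count read events in the
HDFS audit log while the job runs:

$ grep -c 'cmd=open' /var/log/hadoop-hdfs/hdfs-audit.log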


Re: not able to run map reduce job example on aws machine

2014-04-10 Thread Rahul Singh
Here is my mapred-site.xml config:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If local, then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>


Also, the job runs fine locally (in-process) if I remove the dependency on
YARN, i.e. if I comment out:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

in mapred-site.xml.
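
Worth checking here (an educated guess, not something confirmed in this
thread): the failing address is poc_hadoop04:46162, and Hadoop's
NetUtils.createSocketAddr() parses host:port via java.net.URI, which
rejects hostnames containing underscores. A quick sketch of the
difference, using the JDK's jrunscript:

$ jrunscript -e 'print(new java.net.URI("http://poc_hadoop04:46162").getHost())'
null
$ jrunscript -e 'print(new java.net.URI("http://poc-hadoop04:46162").getHost())'
poc-hadoop04

A null host is exactly what triggers "Does not contain a valid host:port
authority", so renaming the node without the underscore may be the fix.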




On Thu, Apr 10, 2014 at 4:43 PM, Kiran Dangeti kirandkumar2...@gmail.comwrote:

 Rahul,

 Please check the port name given in mapred.site.xml
 Thanks
 Kiran

 On Thu, Apr 10, 2014 at 3:23 PM, Rahul Singh 
 smart.rahul.i...@gmail.comwrote:

  Hi,
   I am getting following exception while running word count example,

 14/04/10 15:17:09 INFO mapreduce.Job: Task Id :
 attempt_1397123038665_0001_m_00_2, Status : FAILED
 Container launch failed for container_1397123038665_0001_01_04 :
 java.lang.IllegalArgumentException: Does not contain a valid host:port
 authority: poc_hadoop04:46162
 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:211)
 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163)
 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:210)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:196)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)


 I have everything configured with hdfs  running where i am able to create
 files and directories. running jps on my machine shows all components
 running.

 10290 NameNode
 10416 DataNode
 10738 ResourceManager
 11634 Jps
 10584 SecondaryNameNode
 10844 NodeManager


 Any pointers will be appreciated.

 Thanks and Regards,
 -Rahul Singh





InputFormat and InputSplit - Network location name contains /:

2014-04-10 Thread Patcharee Thongtra

Hi,

I wrote a custom InputFormat and InputSplit to handle NetCDF files, which
I use with a custom Pig load function. When I submitted a job by running a
Pig script, I got the error below. From the error log, the network location
name is
hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02 -
my input file - which contains "/", and Hadoop does not allow that.

Something may be missing in my custom InputFormat and InputSplit.
Any ideas? Any help is appreciated.


Patcharee


2014-04-10 17:09:01,854 INFO [CommitterEvent Processor #0] 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: 
Processing the event EventType: JOB_SETUP


2014-04-10 17:09:01,918 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
job_1387474594811_0071Job Transitioned from SETUP to RUNNING


2014-04-10 17:09:01,982 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.util.RackResolver: Resolved 
hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02 to 
/default-rack


2014-04-10 17:09:01,984 FATAL [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.IllegalArgumentException: Network location name contains /: 
hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02

at org.apache.hadoop.net.NodeBase.set(NodeBase.java:87)
at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:65)
at 
org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:111)
at 
org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:95)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.<init>(TaskAttemptImpl.java:548)
at 
org.apache.hadoop.mapred.MapTaskAttemptImpl.<init>(MapTaskAttemptImpl.java:47)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.MapTaskImpl.createAttempt(MapTaskImpl.java:62)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.addAttempt(TaskImpl.java:594)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.addAndScheduleAttempt(TaskImpl.java:581)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.access$1300(TaskImpl.java:100)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl$InitialScheduleTransition.transition(TaskImpl.java:871)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl$InitialScheduleTransition.transition(TaskImpl.java:866)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:632)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:99)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:1237)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:1231)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)

at java.lang.Thread.run(Thread.java:662)
2014-04-10 17:09:01,986 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.


hadoop 2.0 upgrade to 2.4

2014-04-10 Thread motty cruz
Hi All,

I currently have a Hadoop 2.0 cluster in production, and I want to upgrade
to the latest release.

current version:
[root@doop1 ~]# hadoop version
Hadoop 2.0.0-cdh4.6.0

Cluster has the following services:
hbase
hive
hue
impala
mapreduce
oozie
sqoop
zookeeper

Can someone point me to a how-to for upgrading Hadoop from 2.0 to 2.4.0?

Thanks in advance,


Re: hadoop 2.0 upgrade to 2.4

2014-04-10 Thread Alejandro Abdelnur
Motty,

https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/CDH5-Installation-Guide.html


provides instructions to upgrade from CDH4 to CDH5 (which bundles Hadoop
2.3.0).

If your intention is to use CDH5, that should help you. If you have further
questions about it, the right alias to use is cdh-u...@cloudera.org

If your intention is to use Apache Hadoop 2.4.0, some of the CDH
documentation above may still be relevant.

Thanks.
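
If you take the Apache route, the classic HDFS upgrade flow is roughly the
sketch below (back up the NameNode metadata first; script locations and
exact steps vary by install):

$ # stop the cluster on the old version, install the new binaries, then:
$ hadoop-daemon.sh start namenode -upgrade
$ hadoop-daemons.sh start datanode
$ # once the cluster checks out on the new version:
$ hdfs dfsadmin -finalizeUpgrade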



On Thu, Apr 10, 2014 at 12:20 PM, motty cruz motty.c...@gmail.com wrote:

 Hi All,

 I currently have a hadoop 2.0 cluster in production, I want to upgrade to
 latest release.

 current version:
 [root@doop1 ~]# hadoop version
 Hadoop 2.0.0-cdh4.6.0

 Cluster has the following services:
 hbase
 hive
 hue
 impala
 mapreduce
 oozie
 sqoop
 zookeeper

 can someone point me to a howto upgrade hadoop from 2.0 to hadoop 2.4.0?

 Thanks in advance,




-- 
Alejandro


Re: use setrep change number of file replicas,but not work

2014-04-10 Thread ch huang
I can use fsck to get over-replicated blocks, but how can I track pending
deletes?

On Thu, Apr 10, 2014 at 10:50 AM, Harsh J ha...@cloudera.com wrote:

 The replica deletion is asynchronous. You can track its deletions via
 the NameNode's over-replicated blocks and the pending delete metrics.

 On Thu, Apr 10, 2014 at 7:16 AM, ch huang justlo...@gmail.com wrote:
  hi,maillist:
  i try modify replica number on some dir but it seems not work
  ,anyone know why?
 
  # sudo -u hdfs hadoop fs -setrep -R 2 /user/hive/warehouse/mytest
  Replication 2 set:
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
 
  the file still store 3 replica ,but the echo number changed
  # hadoop fs -ls /user/hive/warehouse/mytest/dsp_request/2014-01-26
  Found 1 items
  -rw-r--r--   2 hdfs hdfs  17660 2014-01-26 18:34
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
 
  # sudo -u hdfs hdfs fsck
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 -files
 -blocks
  -locations
  Connecting to namenode via http://ch11:50070
  FSCK started by hdfs (auth:SIMPLE) from /192.168.11.12 for path
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 at Thu Apr
 10
  09:39:51 CST 2014
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 17660
 bytes, 1
  block(s):  OK
  0.
 
 BP-1043055049-192.168.11.11-1382442676609:blk_-9219869107960013037_1976591
  len=17660 repl=3 [192.168.11.13:50010, 192.168.11.10:50010,
  192.168.11.14:50010]
 
  i remove the file ,and upload new file ,as i understand ,the new file
 should
  be stored in 2 replica,but it still store 3 replica ,why?
  # sudo -u hdfs hadoop fs -rm -r -skipTrash
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/*
  Deleted /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
  # hadoop fs -put ./data_0
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/
  [root@ch12 ~]# hadoop fs -ls
  /user/hive/warehouse/mytest/dsp_request/2014-01-26
  Found 1 items
  -rw-r--r--   3 root hdfs  17660 2014-04-10 09:40
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
  # sudo -u hdfs hdfs fsck
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 -files
 -blocks
  -locations
  Connecting to namenode via http://ch11:50070
  FSCK started by hdfs (auth:SIMPLE) from /192.168.11.12 for path
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 at Thu Apr
 10
  09:41:12 CST 2014
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 17660
 bytes, 1
  block(s):  OK
  0.
 BP-1043055049-192.168.11.11-1382442676609:blk_6517693524032437780_8889786
  len=17660 repl=3 [192.168.11.12:50010, 192.168.11.15:50010,
  192.168.11.13:50010]



 --
 Harsh J
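
For example, a sketch of watching those metrics via the NameNode's JMX
servlet (host and port taken from the fsck output above; adjust as needed):

$ curl 'http://ch11:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
    | grep -E 'ExcessBlocks|PendingDeletionBlocks'

Both counters should rise right after the setrep and drain back to zero
once the extra replicas have been invalidated on the datanodes.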



Re: use setrep change number of file replicas,but not work

2014-04-10 Thread ch huang
I set the replica number from 3 to 2, but when I dump the NN metrics,
PendingDeletionBlocks is zero. Why?
If the check thread sleeps for an interval and then does its check work,
how long is that interval?

On Thu, Apr 10, 2014 at 10:50 AM, Harsh J ha...@cloudera.com wrote:

 The replica deletion is asynchronous. You can track its deletions via
 the NameNode's over-replicated blocks and the pending delete metrics.

 On Thu, Apr 10, 2014 at 7:16 AM, ch huang justlo...@gmail.com wrote:
  hi,maillist:
  i try modify replica number on some dir but it seems not work
  ,anyone know why?
 
  # sudo -u hdfs hadoop fs -setrep -R 2 /user/hive/warehouse/mytest
  Replication 2 set:
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
 
  the file still store 3 replica ,but the echo number changed
  # hadoop fs -ls /user/hive/warehouse/mytest/dsp_request/2014-01-26
  Found 1 items
  -rw-r--r--   2 hdfs hdfs  17660 2014-01-26 18:34
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
 
  # sudo -u hdfs hdfs fsck
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 -files
 -blocks
  -locations
  Connecting to namenode via http://ch11:50070
  FSCK started by hdfs (auth:SIMPLE) from /192.168.11.12 for path
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 at Thu Apr
 10
  09:39:51 CST 2014
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 17660
 bytes, 1
  block(s):  OK
  0.
 
 BP-1043055049-192.168.11.11-1382442676609:blk_-9219869107960013037_1976591
  len=17660 repl=3 [192.168.11.13:50010, 192.168.11.10:50010,
  192.168.11.14:50010]
 
  i remove the file ,and upload new file ,as i understand ,the new file
 should
  be stored in 2 replica,but it still store 3 replica ,why?
  # sudo -u hdfs hadoop fs -rm -r -skipTrash
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/*
  Deleted /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
  # hadoop fs -put ./data_0
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/
  [root@ch12 ~]# hadoop fs -ls
  /user/hive/warehouse/mytest/dsp_request/2014-01-26
  Found 1 items
  -rw-r--r--   3 root hdfs  17660 2014-04-10 09:40
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
  # sudo -u hdfs hdfs fsck
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 -files
 -blocks
  -locations
  Connecting to namenode via http://ch11:50070
  FSCK started by hdfs (auth:SIMPLE) from /192.168.11.12 for path
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 at Thu Apr
 10
  09:41:12 CST 2014
  /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 17660
 bytes, 1
  block(s):  OK
  0.
 BP-1043055049-192.168.11.11-1382442676609:blk_6517693524032437780_8889786
  len=17660 repl=3 [192.168.11.12:50010, 192.168.11.15:50010,
  192.168.11.13:50010]



 --
 Harsh J
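
On the interval question: in Hadoop 2.x the NameNode's replication monitor
wakes up every dfs.namenode.replication.interval seconds (3 by default) to
compute invalidation work, and datanodes act on it via their heartbeats, so
excess replicas normally clear within seconds to minutes. A sketch of
checking the effective value:

$ hdfs getconf -confKey dfs.namenode.replication.interval
3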



which dir in HDFS can be clean ?

2014-04-10 Thread ch huang
hi,maillist:
  My HDFS cluster has been running for about a year, and I find that many
directories are very large. I wonder if some of them can be cleaned, for
example:

/var/log/hadoop-yarn/apps
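
A note on that particular path: it is usually where YARN log aggregation
stores the logs of finished applications. A sketch of sizing it up (and it
is generally safer to set yarn.log-aggregation.retain-seconds, e.g. 604800
for seven days, and let YARN expire old logs than to delete them by hand):

$ hadoop fs -du -h /var/log/hadoop-yarn/apps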


Re: how can i archive old data in HDFS?

2014-04-10 Thread Stanley Shi
AFAIK, no tools now.

Regards,
Stanley Shi



On Fri, Apr 11, 2014 at 9:09 AM, ch huang justlo...@gmail.com wrote:

 hi,maillist:
  How can I archive old data in HDFS? I have a lot of old data that will
 not be used, but it takes a lot of space to store. I want to archive and
 zip the old data; can HDFS do this?
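
A related sketch, for what it's worth: Hadoop Archives (har) can at least
pack many old files into a single archive - compaction rather than
compression - if that fits the use case (paths below are illustrative):

$ hadoop archive -archiveName old.har -p /user/olddata 2013 /user/archives
$ hadoop fs -ls har:///user/archives/old.har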



download hadoop-2.4

2014-04-10 Thread lei liu
Hadoop 2.4 is released; where can I download the Hadoop 2.4 code from?


Thanks,

LiuLei


Re: download hadoop-2.4

2014-04-10 Thread Mingjiang Shi
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/


On Fri, Apr 11, 2014 at 10:23 AM, lei liu liulei...@gmail.com wrote:

 Hadoop-2.4 is release, where can I download the hadoop-2.4 code from?


 Thanks,

 LiuLei




-- 
Cheers
-MJ


Re: download hadoop-2.4

2014-04-10 Thread Zhijie Shen
The official release can be found at:
http://www.apache.org/dyn/closer.cgi/hadoop/common/

But you can also choose to check out the code from the svn/git repository.
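
For completeness, a sketch of both routes (the archive mirror and the
2.4.0 source tag):

$ wget http://archive.apache.org/dist/hadoop/common/hadoop-2.4.0/hadoop-2.4.0-src.tar.gz
$ svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/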


On Thu, Apr 10, 2014 at 8:08 PM, Mingjiang Shi m...@gopivotal.com wrote:

 http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0/


 On Fri, Apr 11, 2014 at 10:23 AM, lei liu liulei...@gmail.com wrote:

 Hadoop-2.4 is release, where can I download the hadoop-2.4 code from?


 Thanks,

 LiuLei




 --
 Cheers
 -MJ




-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



Re: InputFormat and InputSplit - Network location name contains /:

2014-04-10 Thread Harsh J
Do not use the InputSplit's getLocations() API to supply your file
path; it is not intended for such things, if that's what you've done in
your current InputFormat implementation.

If you're looking to store a single file path, use the FileSplit
class; or, if your case is not as simple as that, use it as a base
reference to build your Path-based InputSplit derivative. Its sources are at
https://github.com/apache/hadoop-common/blob/release-2.4.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileSplit.java.
Look at the Writable method overrides in particular to understand how
to serialize custom fields.

On Thu, Apr 10, 2014 at 9:54 PM, Patcharee Thongtra
patcharee.thong...@uni.no wrote:
 Hi,

 I wrote a custom InputFormat and InputSplit to handle netcdf file. I use
 with a custom pig Load function. When I submitted a job by running a pig
 script. I got an error below. From the error log, the network location name
 is hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02 -
 my input file, containing /, and hadoop does not allow.

 It could be something missing in my custom InputFormat and InputSplit. Any
 ideas? Any help is appreciated,

 Patcharee


 2014-04-10 17:09:01,854 INFO [CommitterEvent Processor #0]
 org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing
 the event EventType: JOB_SETUP

 2014-04-10 17:09:01,918 INFO [AsyncDispatcher event handler]
 org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl:
 job_1387474594811_0071Job Transitioned from SETUP to RUNNING

 2014-04-10 17:09:01,982 INFO [AsyncDispatcher event handler]
 org.apache.hadoop.yarn.util.RackResolver: Resolved
 hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02 to
 /default-rack

 2014-04-10 17:09:01,984 FATAL [AsyncDispatcher event handler]
 org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
 java.lang.IllegalArgumentException: Network location name contains /:
 hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02
 at org.apache.hadoop.net.NodeBase.set(NodeBase.java:87)
 at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:65)
 at
 org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:111)
 at
 org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:95)
 at
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.<init>(TaskAttemptImpl.java:548)
 at
 org.apache.hadoop.mapred.MapTaskAttemptImpl.<init>(MapTaskAttemptImpl.java:47)
 at
 org.apache.hadoop.mapreduce.v2.app.job.impl.MapTaskImpl.createAttempt(MapTaskImpl.java:62)
 at
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.addAttempt(TaskImpl.java:594)
 at
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.addAndScheduleAttempt(TaskImpl.java:581)
 at
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.access$1300(TaskImpl.java:100)
 at
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl$InitialScheduleTransition.transition(TaskImpl.java:871)
 at
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl$InitialScheduleTransition.transition(TaskImpl.java:866)
 at
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:632)
 at
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:99)
 at
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:1237)
 at
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:1231)
 at
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
 at
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
 at java.lang.Thread.run(Thread.java:662)
 2014-04-10 17:09:01,986 INFO [AsyncDispatcher event handler]
 org.apache.hadoop.



-- 
Harsh J