Override dfs.socket.timeout parameter for oozie job

Badri Thu, 18 Jul 2013 02:01:10 -0700

We are trying to run a distcp workflow action with the "-update" flag. This action attempts to copy around 5 TB of data around the cluster. The action keeps timing out in subsequent runs (not the first time though!) and the exception shown is:


With failures, global counters are inaccurate; consider running with -i
Copy failed: java.net.ConnectException: Connection timed out
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
    at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:750)
    at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:711)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:553)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:53)
    at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1245)
    at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1120)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:391)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)


Intercepting System.exit(-999)

Failing Oozie Launcher, Main class [org.apache.hadoop.tools.DistCp], exit code [-999]



Of course, we use both "-i" and "-update" flags.

Oozie client build version: 2.3.2-cdh3u2
Hadoop 0.20.2-cdh3u2

After investigating the code around the exception, we decided to increase dfs.socket.timeout from the default "60 * 1000" to "300000" (both values in milliseconds). Local tests confirm that this _could_ fix our timeout problem. However, we do not want this parameter to be changed for the whole cluster, but just for this oozie job. Is there a way to override this parameter only when invoking the job via oozie?
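
To make the question concrete, here is a minimal sketch of the kind of per-job override we are after. It is an assumption, not something we have confirmed on Oozie 2.3.2: since DistCp is launched through ToolRunner (see the stack trace above), a generic "-D" option passed as an argument may be picked up for this job only. The action name, the transitions and the ${...} parameters are hypothetical placeholders.

```xml
<!-- Sketch only: action name, transitions and ${...} parameters are placeholders. -->
<action name="distcp-update">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <main-class>org.apache.hadoop.tools.DistCp</main-class>
        <!-- Generic option, placed before DistCp's own flags and paths.
             Assumes ToolRunner's generic option parsing applies here. -->
        <arg>-Ddfs.socket.timeout=300000</arg>
        <arg>-i</arg>
        <arg>-update</arg>
        <arg>${srcDir}</arg>
        <arg>${destDir}</arg>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

If this works, the timeout would apply only to the client started by this action and the cluster-wide setting would stay untouched.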


Thanks,
Badri
