[jira] [Created] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips

2013-09-05 Thread yeshavora (JIRA)
yeshavora created YARN-1155:
---

 Summary: RM should resolve hostnames/ips in include/exclude files 
to support matching against both hostnames and ips
 Key: YARN-1155
 URL: https://issues.apache.org/jira/browse/YARN-1155
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora






[jira] [Created] (YARN-1154) RM should do a reverse lookup of NM hostname on registration and disallow registration if lookup fails

2013-09-05 Thread yeshavora (JIRA)
yeshavora created YARN-1154:
---

 Summary: RM should do a reverse lookup of NM hostname on 
registration and disallow registration if lookup fails
 Key: YARN-1154
 URL: https://issues.apache.org/jira/browse/YARN-1154
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora






[jira] [Updated] (YARN-1154) RM should do a reverse lookup of NM hostname on registration and disallow registration if lookup fails

2013-09-05 Thread yeshavora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yeshavora updated YARN-1154:


Description: RM should be able to do a reverse lookup when an NM tries to 
register. NM registration should fail if the lookup fails.

 RM should do a reverse lookup of NM hostname on registration and disallow 
 registration if lookup fails
 --

 Key: YARN-1154
 URL: https://issues.apache.org/jira/browse/YARN-1154
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora
Assignee: Xuan Gong

 RM should be able to do a reverse lookup when an NM tries to register. NM 
 registration should fail if the lookup fails.



[jira] [Updated] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips

2013-09-05 Thread yeshavora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yeshavora updated YARN-1155:


Description: RM should be able to support both hostnames and ips 

 RM should resolve hostnames/ips in include/exclude files to support matching 
 against both hostnames and ips
 ---

 Key: YARN-1155
 URL: https://issues.apache.org/jira/browse/YARN-1155
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora
Assignee: Xuan Gong

 RM should be able to support both hostnames and ips 



[jira] [Updated] (YARN-1155) RM should resolve hostnames/ips in include/exclude files to support matching against both hostnames and ips

2013-09-05 Thread yeshavora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yeshavora updated YARN-1155:


Description: RM should be able to resolve both ips and host names from 
include and exclude files.   (was: RM should be able to support both hostnames 
and ips )

 RM should resolve hostnames/ips in include/exclude files to support matching 
 against both hostnames and ips
 ---

 Key: YARN-1155
 URL: https://issues.apache.org/jira/browse/YARN-1155
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora
Assignee: Xuan Gong

 RM should be able to resolve both ips and host names from include and exclude 
 files. 
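
For illustration only, a minimal Java sketch of the kind of normalization being asked for (this is not the YARN-1155 patch; the class and method names are hypothetical): each entry from the include/exclude file is expanded into both its hostname and IP forms, so a node matches regardless of which form it registers with.

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
public class HostListResolver {

  // Expand one include/exclude entry (hostname or IP) into all known forms.
  public static Set<String> expandEntry(String entry) {
    Set<String> forms = new HashSet<String>();
    forms.add(entry);
    try {
      InetAddress addr = InetAddress.getByName(entry);
      forms.add(addr.getHostAddress());        // dotted-quad IP
      forms.add(addr.getCanonicalHostName());  // FQDN, if reverse lookup works
    } catch (UnknownHostException e) {
      // Unresolvable entry: keep only the literal form.
    }
    return forms;
  }

  // A registering node matches if any of its forms is in the expanded list.
  public static boolean matches(Set<String> expandedList, String hostOrIp) {
    return expandedList.contains(hostOrIp);
  }
}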



[jira] [Updated] (YARN-1154) RM should do a reverse lookup of NM hostname on registration and disallow registration if lookup fails

2013-09-05 Thread yeshavora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yeshavora updated YARN-1154:


Description: While a new nodemanager is being added (by adding its nodename to 
the YARN include file), RM should be able to do a reverse lookup when the NM 
tries to register. NM registration should fail if the lookup fails.  (was: While new 
nodemanager is being added adding it in include file, RM should be able to do 
reverse lookup when NM tries to register. NM registration should fail if lookup 
fails.)

 RM should do a reverse lookup of NM hostname on registration and disallow 
 registration if lookup fails
 --

 Key: YARN-1154
 URL: https://issues.apache.org/jira/browse/YARN-1154
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora
Assignee: Xuan Gong

 While a new nodemanager is being added (by adding its nodename to the YARN 
 include file), RM should be able to do a reverse lookup when the NM tries to 
 register. NM registration should fail if the lookup fails.
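
As an illustration only (not the YARN-1154 patch; the class and method names are hypothetical), a reverse-lookup check of this kind could be sketched in Java as:

import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical check, for illustration only.
public class NodeLookupCheck {

  // Returns true only if the NM host resolves forward to an IP and that IP
  // reverse-resolves to a hostname; registration would be rejected otherwise.
  public static boolean resolvesBothWays(String nmHost) {
    try {
      InetAddress addr = InetAddress.getByName(nmHost);  // forward lookup
      String reverse = addr.getCanonicalHostName();      // reverse lookup
      // getCanonicalHostName() falls back to the textual IP when the reverse
      // lookup fails, so treat that case as a failure.
      return !reverse.equals(addr.getHostAddress());
    } catch (UnknownHostException e) {
      return false;  // forward lookup failed
    }
  }
}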



[jira] [Created] (YARN-1129) Job hangs when any node is blacklisted after RM restart

2013-08-30 Thread yeshavora (JIRA)
yeshavora created YARN-1129:
---

 Summary: Job hangs when any node is blacklisted after RM restart
 Key: YARN-1129
 URL: https://issues.apache.org/jira/browse/YARN-1129
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora


When the RM restarted and one NM went bad during the restart (bad disk), the NM 
was blacklisted by the AM, but the RM keeps offering containers on that same 
node even though the AM does not want them there.

Need to change the AM to explicitly blacklist the node in its RM requests.
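
For reference, a rough sketch of what blacklisting a node in the RM requests could look like from the AM side, assuming the AMRMClient#updateBlacklist API of the 2.1 client library (registration and error handling omitted; this is illustrative, not the fix for this issue):

import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.AMRMClient;

// Illustrative only: tell the RM not to offer containers on a bad node.
public class BlacklistSketch {
  public static void blacklistBadNode(String badNodeHost) {
    Configuration conf = new Configuration();
    AMRMClient<AMRMClient.ContainerRequest> amClient = AMRMClient.createAMRMClient();
    amClient.init(conf);
    amClient.start();
    // registerApplicationMaster(...) would normally happen here.
    // Add the bad node to the blacklist; later allocate() calls should not
    // be offered containers on it.
    amClient.updateBlacklist(Collections.singletonList(badNodeHost),
                             Collections.<String>emptyList());
  }
}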




[jira] [Commented] (YARN-1057) Add mechanism to check validity of a Node to be Added/Excluded

2013-08-29 Thread yeshavora (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753992#comment-13753992
 ] 

yeshavora commented on YARN-1057:
-

Hitesh, the use case is: when invalid hosts are added to the include/exclude 
file, YARN should recognize them and not add/remove those nodes from the cluster.

YARN should not print the message below:
INFO  util.HostsFileReader (HostsFileReader.java:readFileToSet(68)) - Adding 
invalidhost.net to the list of included hosts from /tmp/yarn.include 

Ideally it should report something like java.net.UnknownHostException: 
invalidhost.net instead.

I believe an RM shutdown is not needed as long as it can verify the existence 
of a host. 

 Add mechanism to check validity of a Node to be Added/Excluded
 --

 Key: YARN-1057
 URL: https://issues.apache.org/jira/browse/YARN-1057
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.1.0-beta
Reporter: yeshavora
Assignee: Xuan Gong
 Attachments: YARN-1057.1.patch


 YARN does not complain when an invalid hostname like 'invalidhost.com' is 
 passed in the include/exclude node file (specified by 
 'yarn.resourcemanager.nodes.include-path' or 
 'yarn.resourcemanager.nodes.exclude-path').
 A mechanism is needed to check the validity of a hostname before including 
 or excluding it from the cluster; it should throw an error/exception when 
 adding/removing an invalid node.
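
A minimal sketch of the requested validation, assuming plain java.net resolution (a hypothetical helper, not the attached patch): every non-comment entry in the hosts file is resolved, and an unresolvable entry surfaces as an error instead of being silently added.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical validator, for illustration only.
public class HostsFileValidator {

  public static void validate(String hostsFilePath) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader(hostsFilePath));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        String host = line.trim();
        if (host.isEmpty() || host.startsWith("#")) {
          continue;  // skip blank lines and comments
        }
        try {
          InetAddress.getByName(host);  // throws if the host does not exist
        } catch (UnknownHostException e) {
          throw new IOException("Invalid host in " + hostsFilePath + ": " + host, e);
        }
      }
    } finally {
      reader.close();
    }
  }
}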



[jira] [Created] (YARN-1090) Job does not get into Pending State

2013-08-21 Thread yeshavora (JIRA)
yeshavora created YARN-1090:
---

 Summary: Job does not get into Pending State
 Key: YARN-1090
 URL: https://issues.apache.org/jira/browse/YARN-1090
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora


When there are no resources available to run a job, the next job should go into 
the pending state. The RM UI should show the next job as a pending app, and the 
pending-app counter should be incremented.

Currently, the next job stays in the ACCEPTED state with no AM assigned to it, 
yet the pending-app count is not incremented. 
Running 'mapred job -status' on the next job shows job state=PREP. 

$ mapred job -status job_1377122233385_0002
13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at 
host1/ip1

Job: job_1377122233385_0002
Job File: /ABC/.staging/job_1377122233385_0002/job.xml
Job Tracking URL : http://host1:port1/application_1377122233385_0002/
Uber job : false
Number of maps: 0
Number of reduces: 0
map() completion: 0.0
reduce() completion: 0.0
Job state: PREP
retired: false
reason for failure:



[jira] [Created] (YARN-1086) reducer of sort job restarts from scratch in between after RM restart

2013-08-20 Thread yeshavora (JIRA)
yeshavora created YARN-1086:
---

 Summary: reducer of sort job restarts from scratch in between 
after RM restart
 Key: YARN-1086
 URL: https://issues.apache.org/jira/browse/YARN-1086
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora
Priority: Blocker


Steps followed:
1) Run a sort job. As soon as it finishes all the map tasks [100% map], 
restart the resource manager.

2) Analyse the progress of the sort job.
It starts at 100% map 0% reduce,
reaches 100% map 32% reduce,
then drops back to 100% map 0% reduce.
The reducer stays at around 30% reduce for 5-10 minutes and then starts again 
from scratch.

Log from failed reducer attempt:

Error: java.io.IOException: Error while reading compressed data
    at org.apache.hadoop.io.IOUtils.wrappedReadForCompressedData(IOUtils.java:174)
    at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:383)
    at org.apache.hadoop.mapred.IFile$Reader.nextRawValue(IFile.java:444)
    at org.apache.hadoop.mapred.Merger$Segment.nextRawValue(Merger.java:327)
    at org.apache.hadoop.mapred.Merger$Segment.getValue(Merger.java:309)
    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:533)
    at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:619)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:154)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:645)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: Input/output error
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:177)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at java.io.DataInputStream.read(DataInputStream.java:132)
    at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:209)
    at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:152)
    at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:127)
    at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
    at org.apache.hadoop.io.IOUtils.wrappedReadForCompressedData(IOUtils.java:170)
    ... 17 more
Caused by: java.io.IOException: Input/output error
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:220)
    at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.read(RawLocalFileSystem.java:110)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:171)
    ... 26 more




[jira] [Created] (YARN-1087) Succeeded job tries to restart after RM restart

2013-08-20 Thread yeshavora (JIRA)
yeshavora created YARN-1087:
---

 Summary: Succeeded job tries to restart after RM restart
 Key: YARN-1087
 URL: https://issues.apache.org/jira/browse/YARN-1087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora
Priority: Blocker


Run a job and restart the RM just as the job finishes. The RM should not 
restart the job once it has succeeded.

After the RM restart, the AM of the restarted job fails with the error below.

AM log after RM restart:

2013-08-19 17:29:21,144 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 0
2013-08-19 17:29:21,145 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2013-08-19 17:29:21,146 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://host1:port1/user/ABC/.staging/job_1376933101704_0001
2013-08-19 17:29:21,156 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://host1:port1/ABC/.staging/job_1376933101704_0001/job.splitmetainfo
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1469)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1324)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1291)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:922)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:131)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1184)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:995)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1394)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1390)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1323)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://host1:port1/ABC/.staging/job_1376933101704_0001/job.splitmetainfo
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1121)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1113)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:78)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1113)
    at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1464)
    ... 17 more
2013-08-19 17:29:21,158 INFO [Thread-2] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
2013-08-19 17:29:21,159 WARN [Thread-2] org.apache.hadoop.util.ShutdownHookManager: ShutdownHook 'MRAppMasterShutdownHook' failed, java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.setSignalled(MRAppMaster.java:805)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1344)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)




[jira] [Created] (YARN-1083) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval

2013-08-19 Thread yeshavora (JIRA)
yeshavora created YARN-1083:
---

 Summary: ResourceManager should fail when 
yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval
 Key: YARN-1083
 URL: https://issues.apache.org/jira/browse/YARN-1083
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora


If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the 
heartbeat interval, all the node managers end up in 'Lost Nodes'.

Instead, the Resource Manager should validate these properties and fail to 
start if the combination is invalid.
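
For illustration, a minimal sketch of such a startup check (not the actual RM code; the expiry property name is from this report, while the heartbeat-interval key and the default values are assumptions about the deployed version):

import org.apache.hadoop.conf.Configuration;

// Hypothetical sanity check, for illustration only.
public class LivenessConfigCheck {
  public static void verify(Configuration conf) {
    long expiryMs = conf.getLong(
        "yarn.nm.liveness-monitor.expiry-interval-ms", 600000L);
    long heartbeatMs = conf.getLong(
        "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms", 1000L);
    if (expiryMs <= heartbeatMs) {
      // Fail fast instead of letting every NodeManager drift into "Lost Nodes".
      throw new IllegalArgumentException("NM expiry interval (" + expiryMs
          + " ms) must be larger than the NM heartbeat interval ("
          + heartbeatMs + " ms)");
    }
  }
}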



[jira] [Updated] (YARN-1083) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval

2013-08-19 Thread yeshavora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yeshavora updated YARN-1083:


Affects Version/s: 2.1.0-beta

 ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms 
 is set less than heartbeat interval
 

 Key: YARN-1083
 URL: https://issues.apache.org/jira/browse/YARN-1083
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: yeshavora

 If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the 
 heartbeat interval, all the node managers end up in 'Lost Nodes'.
 Instead, the Resource Manager should validate these properties and fail to 
 start if the combination is invalid.



[jira] [Created] (YARN-1084) RM restart does not work for map only job

2013-08-19 Thread yeshavora (JIRA)
yeshavora created YARN-1084:
---

 Summary: RM restart does not work for map only job
 Key: YARN-1084
 URL: https://issues.apache.org/jira/browse/YARN-1084
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yeshavora


A map-only job (randomwriter, randomtextwriter) restarts from scratch [0% map 
0% reduce] after an RM restart.
It should resume from its last state when the RM restarts.



[jira] [Created] (YARN-1057) Add mechanism to check validity of a Node to be Added/Excluded

2013-08-12 Thread yeshavora (JIRA)
yeshavora created YARN-1057:
---

 Summary: Add mechanism to check validity of a Node to be 
Added/Excluded
 Key: YARN-1057
 URL: https://issues.apache.org/jira/browse/YARN-1057
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: yeshavora


YARN does not complain when an invalid hostname like 'invalidhost.com' is 
passed in the include/exclude node file (specified by 
'yarn.resourcemanager.nodes.include-path' or 
'yarn.resourcemanager.nodes.exclude-path').

A mechanism is needed to check the validity of a hostname before including or 
excluding it from the cluster; it should throw an error/exception when 
adding/removing an invalid node.





[jira] [Commented] (YARN-1057) Add mechanism to check validity of a Node to be Added/Excluded

2013-08-12 Thread yeshavora (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737030#comment-13737030
 ] 

yeshavora commented on YARN-1057:
-

By 'invalid hostname/node', I mean the name of a host/node that does not 
exist. Currently, YARN just adds the invalid hostname without checking its 
existence, as below:

INFO  util.HostsFileReader (HostsFileReader.java:readFileToSet(68)) - Adding 
invalidhost.net to the list of included hosts from Include_Yarn_File  

YARN should first confirm the host's existence and only then include/exclude it. 

 Add mechanism to check validity of a Node to be Added/Excluded
 --

 Key: YARN-1057
 URL: https://issues.apache.org/jira/browse/YARN-1057
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.1.0-beta
Reporter: yeshavora

 YARN does not complain when an invalid hostname like 'invalidhost.com' is 
 passed in the include/exclude node file (specified by 
 'yarn.resourcemanager.nodes.include-path' or 
 'yarn.resourcemanager.nodes.exclude-path').
 A mechanism is needed to check the validity of a hostname before including 
 or excluding it from the cluster; it should throw an error/exception when 
 adding/removing an invalid node.



[jira] [Commented] (YARN-775) stream jobs are not cleaning the Yarn local-dirs after container is released

2013-06-07 Thread yeshavora (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678174#comment-13678174
 ] 

yeshavora commented on YARN-775:


Setting yarn.nodemanager.delete.debug-delay-sec solves the issue. Closing this 
JIRA.

 stream jobs are not cleaning the Yarn local-dirs after container is released
 

 Key: YARN-775
 URL: https://issues.apache.org/jira/browse/YARN-775
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: yeshavora
Assignee: Omkar Vinit Joshi
 Fix For: 2.1.0-beta


 Run a stream job:
 hadoop jar hadoop-streaming.jar -files file:///tmp/Tmp.py -input Tmp.py 
 -output /tmp/Tmpout -mapper python Tmp.py -reducer NONE
 Container Dirs are not being cleaned after Stream job is 
 completed/Killed/Failed. 



[jira] [Resolved] (YARN-775) stream jobs are not cleaning the Yarn local-dirs after container is released

2013-06-07 Thread yeshavora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yeshavora resolved YARN-775.


Resolution: Invalid

 stream jobs are not cleaning the Yarn local-dirs after container is released
 

 Key: YARN-775
 URL: https://issues.apache.org/jira/browse/YARN-775
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: yeshavora
Assignee: Omkar Vinit Joshi
 Fix For: 2.1.0-beta


 Run a stream job:
 hadoop jar hadoop-streaming.jar -files file:///tmp/Tmp.py -input Tmp.py 
 -output /tmp/Tmpout -mapper python Tmp.py -reducer NONE
 Container Dirs are not being cleaned after Stream job is 
 completed/Killed/Failed. 



[jira] [Created] (YARN-775) stream jobs are not cleaning the Yarn local-dirs after container is released

2013-06-06 Thread yeshavora (JIRA)
yeshavora created YARN-775:
--

 Summary: stream jobs are not cleaning the Yarn local-dirs after 
container is released
 Key: YARN-775
 URL: https://issues.apache.org/jira/browse/YARN-775
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: yeshavora
 Fix For: 2.1.0-beta


Run a stream job:
hadoop jar hadoop-streaming.jar -files file:///tmp/Tmp.py -input Tmp.py -output 
/tmp/Tmpout -mapper python Tmp.py -reducer NONE

Container Dirs are not being cleaned after Stream job is 
completed/Killed/Failed. 



[jira] [Commented] (YARN-775) stream jobs are not cleaning the Yarn local-dirs after container is released

2013-06-06 Thread yeshavora (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677636#comment-13677636
 ] 

yeshavora commented on YARN-775:


I was not setting yarn.nodemanager.delete.debug-delay-sec to 0. I will retest 
with the above property set to 0, and also submit the NM and RM logs if the 
issue reproduces.

 stream jobs are not cleaning the Yarn local-dirs after container is released
 

 Key: YARN-775
 URL: https://issues.apache.org/jira/browse/YARN-775
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: yeshavora
Assignee: Omkar Vinit Joshi
 Fix For: 2.1.0-beta


 Run a stream job:
 hadoop jar hadoop-streaming.jar -files file:///tmp/Tmp.py -input Tmp.py 
 -output /tmp/Tmpout -mapper python Tmp.py -reducer NONE
 Container Dirs are not being cleaned after Stream job is 
 completed/Killed/Failed. 
