[jira] [Comment Edited] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced

2016-07-20 Thread Nikhil Mulley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386041#comment-15386041
 ] 

Nikhil Mulley edited comment on YARN-3403 at 7/20/16 3:37 PM:
--

Has something changed in 2.6 with respect to this JIRA? I notice that on 2.6.4 
this issue does not occur: config files with broken XML syntax no longer cause 
the NodeManager to go down. Can someone please confirm?


was (Author: mnikhil):
Has anything changed in 2.6? 

> Nodemanager dies after a small typo in mapred-site.xml is induced
> -
>
> Key: YARN-3403
> URL: https://issues.apache.org/jira/browse/YARN-3403
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.2
>Reporter: Nikhil Mulley
>Priority: Critical
>
> Hi,
> We have noticed that a small typo in an XML config file (mapred-site.xml) can 
> cause the NodeManager to go down completely, without anyone stopping or 
> restarting it externally.
> I find it a little odd that editing config files on the filesystem can cause 
> the running slave daemon, the YARN NodeManager, to shut down.
> In this case, a '/' missing from an end tag in a property brought the 
> NodeManager down in a cluster.
> Why would the NodeManager reload the configs while it is running? Aren't they 
> picked up when it starts? Even if they are meant to pick up new configs 
> dynamically, an xmllint/config check should run before the NodeManager is 
> asked to reload/restart.
>  
> ---
> java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
> file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The 
> element type "value" must be terminated by the matching end-tag "</value>".
>at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
> ---
> Please shed light on this.
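The pre-flight check the reporter asks for can be sketched outside Hadoop. A 
minimal sketch (assuming Python's stdlib parser as a stand-in for xmllint; the 
file path is illustrative):

```python
import xml.etree.ElementTree as ET

def check_config(path):
    """Return None if the file is well-formed XML, else the ParseError."""
    try:
        ET.parse(path)
        return None
    except ET.ParseError as err:
        # err.position carries (line, column), much like the
        # lineNumber/columnNumber in the SAXParseException above
        return err

# A property whose </value> end tag is missing, as in the report
with open("/tmp/mapred-site-test.xml", "w") as f:
    f.write("<configuration><property>"
            "<name>k</name><value>v</property></configuration>")

err = check_config("/tmp/mapred-site-test.xml")
print("deploy blocked:" if err else "OK", err)
```

Running such a check before letting a daemon re-read its configs would turn a 
NodeManager crash into a rejected edit.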



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced

2016-07-20 Thread Nikhil Mulley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386041#comment-15386041
 ] 

Nikhil Mulley commented on YARN-3403:
-

Has anything changed in 2.6? 

> Nodemanager dies after a small typo in mapred-site.xml is induced
> -
>
> Key: YARN-3403
> URL: https://issues.apache.org/jira/browse/YARN-3403
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.2
>Reporter: Nikhil Mulley
>Priority: Critical
>
> Hi,
> We have noticed that a small typo in an XML config file (mapred-site.xml) can 
> cause the NodeManager to go down completely, without anyone stopping or 
> restarting it externally.
> I find it a little odd that editing config files on the filesystem can cause 
> the running slave daemon, the YARN NodeManager, to shut down.
> In this case, a '/' missing from an end tag in a property brought the 
> NodeManager down in a cluster.
> Why would the NodeManager reload the configs while it is running? Aren't they 
> picked up when it starts? Even if they are meant to pick up new configs 
> dynamically, an xmllint/config check should run before the NodeManager is 
> asked to reload/restart.
>  
> ---
> java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
> file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The 
> element type "value" must be terminated by the matching end-tag "</value>".
>at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
> ---
> Please shed light on this.






[jira] [Updated] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced

2016-07-20 Thread Nikhil Mulley (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikhil Mulley updated YARN-3403:

Affects Version/s: 2.5.2

> Nodemanager dies after a small typo in mapred-site.xml is induced
> -
>
> Key: YARN-3403
> URL: https://issues.apache.org/jira/browse/YARN-3403
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.2
>Reporter: Nikhil Mulley
>Priority: Critical
>
> Hi,
> We have noticed that a small typo in an XML config file (mapred-site.xml) can 
> cause the NodeManager to go down completely, without anyone stopping or 
> restarting it externally.
> I find it a little odd that editing config files on the filesystem can cause 
> the running slave daemon, the YARN NodeManager, to shut down.
> In this case, a '/' missing from an end tag in a property brought the 
> NodeManager down in a cluster.
> Why would the NodeManager reload the configs while it is running? Aren't they 
> picked up when it starts? Even if they are meant to pick up new configs 
> dynamically, an xmllint/config check should run before the NodeManager is 
> asked to reload/restart.
>  
> ---
> java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
> file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The 
> element type "value" must be terminated by the matching end-tag "</value>".
>at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
> ---
> Please shed light on this.






[jira] [Commented] (YARN-5401) yarn application kill does not let mapreduce jobs show up in jobhistory

2016-07-19 Thread Nikhil Mulley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384516#comment-15384516
 ] 

Nikhil Mulley commented on YARN-5401:
-

So, should applications always use application-specific methods to kill their 
jobs, and fall back to 'yarn application -kill' only when really necessary 
(much as one prefers kill with SIGTERM and resorts to kill -9 only when it 
becomes necessary)?

> yarn application kill does not let mapreduce jobs show up in jobhistory
> ---
>
> Key: YARN-5401
> URL: https://issues.apache.org/jira/browse/YARN-5401
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
> Environment: centos 6.6
> apache hadoop 2.6.4
>Reporter: Nikhil Mulley
>
> Hi,
> We have found in our cluster running Apache Hadoop 2.6.4 that MapReduce jobs 
> killed with the 'hadoop job -kill' command do get the job and its counters 
> recorded in the JobHistory Server, but when 'yarn application -kill' is used 
> on a MapReduce application, the job does not show up in the JobHistory Server 
> interface.
> Is this intentional? If so, for what reasons?
> It would be better to have MapReduce application history reported to 
> JobHistory regardless of whether the kill is performed with the yarn 
> application CLI or the hadoop job CLI.
> thanks,
> Nikhil






[jira] [Created] (YARN-5401) yarn application kill does not let mapreduce jobs show up in jobhistory

2016-07-18 Thread Nikhil Mulley (JIRA)
Nikhil Mulley created YARN-5401:
---

 Summary: yarn application kill does not let mapreduce jobs show up 
in jobhistory
 Key: YARN-5401
 URL: https://issues.apache.org/jira/browse/YARN-5401
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
 Environment: centos 6.6
apache hadoop 2.6.4
Reporter: Nikhil Mulley


Hi,

We have found in our cluster running Apache Hadoop 2.6.4 that MapReduce jobs 
killed with the 'hadoop job -kill' command do get the job and its counters 
recorded in the JobHistory Server, but when 'yarn application -kill' is used 
on a MapReduce application, the job does not show up in the JobHistory Server 
interface.

Is this intentional? If so, for what reasons?
It would be better to have MapReduce application history reported to 
JobHistory regardless of whether the kill is performed with the yarn 
application CLI or the hadoop job CLI.

thanks,
Nikhil






[jira] [Commented] (YARN-2408) Resource Request REST API for YARN

2015-04-23 Thread Nikhil Mulley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508586#comment-14508586
 ] 

Nikhil Mulley commented on YARN-2408:
-

Hi [~rdelvalle]

There are 8 votes and 15 watchers on this issue. I am not sure what level of 
general interest the community requires, but I would be happy to help move 
this forward by deploying the patch on my test cluster, giving it a whirl, and 
seeing where it goes.

I am also interested in a REST API that provides a means to monitor cluster 
resources, in particular for spotting slow/starving jobs and tracking the 
resources requested/consumed per app/job.

Nikhil
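
For the monitoring use case, the existing ResourceManager REST API already 
exposes per-application usage; below is a sketch of a starvation check against 
it. The hostname, the threshold, and reliance on the clusterUsagePercentage 
field are assumptions (field availability varies by Hadoop version):

```python
import json
from urllib.request import urlopen

RM = "http://resourcemanager:8088"  # assumed RM host:port

def starving_apps(apps, min_share_pct=5.0):
    """Flag RUNNING apps holding less than min_share_pct of cluster capacity.
    `apps` is the list under apps/app in the /ws/v1/cluster/apps response."""
    return [a["id"] for a in apps
            if a.get("state") == "RUNNING"
            and a.get("clusterUsagePercentage", 0.0) < min_share_pct]

def fetch_running_apps():
    # Stock RM REST endpoint; the API proposed in this issue would add
    # per-request detail on top of this coarse per-app view.
    with urlopen(RM + "/ws/v1/cluster/apps?states=RUNNING") as resp:
        return json.load(resp)["apps"]["app"]

sample = [
    {"id": "application_1", "state": "RUNNING", "clusterUsagePercentage": 1.2},
    {"id": "application_2", "state": "RUNNING", "clusterUsagePercentage": 40.0},
]
print(starving_apps(sample))  # -> ['application_1']
```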

 Resource Request REST API for YARN
 --

 Key: YARN-2408
 URL: https://issues.apache.org/jira/browse/YARN-2408
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: webapp
Reporter: Renan DelValle
  Labels: features

 I’m proposing a new REST API for YARN which exposes a snapshot of the 
 resource requests held inside the scheduler. My motivation behind this new 
 feature is to let external software monitor the amount of resources being 
 requested and gain deeper insight into cluster usage than is currently 
 available. The API can also be used by external software to detect a starved 
 application and alert the appropriate users and/or sysadmins so that the 
 problem can be remedied.
 Here is the proposed API (a JSON counterpart is also available):
 {code:xml}
 <resourceRequests>
   <MB>7680</MB>
   <VCores>7</VCores>
   <appMaster>
     <applicationId>application_1412191664217_0001</applicationId>
     <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
     <queueName>default</queueName>
     <totalMB>6144</totalMB>
     <totalVCores>6</totalVCores>
     <numResourceRequests>3</numResourceRequests>
     <requests>
       <request>
         <MB>1024</MB>
         <VCores>1</VCores>
         <numContainers>6</numContainers>
         <relaxLocality>true</relaxLocality>
         <priority>20</priority>
         <resourceNames>
           <resourceName>localMachine</resourceName>
           <resourceName>/default-rack</resourceName>
           <resourceName>*</resourceName>
         </resourceNames>
       </request>
     </requests>
   </appMaster>
   <appMaster>
   ...
   </appMaster>
 </resourceRequests>
 {code}
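
Assuming the JSON counterpart mirrors the XML field names above (the issue 
only shows the XML, so the exact JSON shape is a guess), a consumer could 
total outstanding requests per application like this:

```python
def total_pending(snapshot):
    """Sum outstanding MB/vcores per appMaster, weighting each request
    by its container count (shape assumed from the proposed XML)."""
    totals = {}
    for am in snapshot["appMaster"]:
        reqs = am["requests"]["request"]
        totals[am["applicationId"]] = {
            "MB": sum(r["MB"] * r["numContainers"] for r in reqs),
            "VCores": sum(r["VCores"] * r["numContainers"] for r in reqs),
        }
    return totals

snapshot = {
    "MB": 7680, "VCores": 7,
    "appMaster": [{
        "applicationId": "application_1412191664217_0001",
        "requests": {"request": [
            {"MB": 1024, "VCores": 1, "numContainers": 6},
        ]},
    }],
}
print(total_pending(snapshot))
# -> {'application_1412191664217_0001': {'MB': 6144, 'VCores': 6}}
```

The computed totals match the per-appMaster totalMB/totalVCores fields in the 
sample, which is how a monitor could cross-check a snapshot.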





[jira] [Commented] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced

2015-03-27 Thread Nikhil Mulley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383484#comment-14383484
 ] 

Nikhil Mulley commented on YARN-3403:
-

Hi [~Naganarasimha] this is with apache hadoop 2.5.1

 Nodemanager dies after a small typo in mapred-site.xml is induced
 -

 Key: YARN-3403
 URL: https://issues.apache.org/jira/browse/YARN-3403
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Nikhil Mulley
Priority: Critical

 Hi,
 We have noticed that a small typo in an XML config file (mapred-site.xml) can 
 cause the NodeManager to go down completely, without anyone stopping or 
 restarting it externally.
 I find it a little odd that editing config files on the filesystem can cause 
 the running slave daemon, the YARN NodeManager, to shut down.
 In this case, a '/' missing from an end tag in a property brought the 
 NodeManager down in a cluster.
 Why would the NodeManager reload the configs while it is running? Aren't they 
 picked up when it starts? Even if they are meant to pick up new configs 
 dynamically, an xmllint/config check should run before the NodeManager is 
 asked to reload/restart.
  
 ---
 java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
 file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The 
 element type "value" must be terminated by the matching end-tag "</value>".
at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
 ---
 Please shed light on this.





[jira] [Commented] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced

2015-03-26 Thread Nikhil Mulley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383285#comment-14383285
 ] 

Nikhil Mulley commented on YARN-3403:
-

A fuller stack trace is below; this is reproducible.

---
2015-03-26 20:04:43,690 FATAL org.apache.hadoop.conf.Configuration: error 
parsing conf mapred-site.xml
org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; 
lineNumber: 316; columnNumber: 3; The element type "property" must be 
terminated by the matching end-tag "</property>".
at 
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
at 
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2183)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2171)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2242)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2195)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:858)
at 
org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:877)
at 
org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1278)
at 
org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:65)
at 
org.apache.hadoop.io.compress.zlib.ZlibFactory.getZlibCompressorType(ZlibFactory.java:82)
at 
org.apache.hadoop.io.compress.DefaultCodec.getCompressorType(DefaultCodec.java:74)
at 
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
at 
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
at 
org.apache.hadoop.io.file.tfile.Compression$Algorithm.getCompressor(Compression.java:274)
at 
org.apache.hadoop.io.file.tfile.BCFile$Writer$WBlockState.<init>(BCFile.java:129)
at 
org.apache.hadoop.io.file.tfile.BCFile$Writer.prepareDataBlock(BCFile.java:430)
at 
org.apache.hadoop.io.file.tfile.TFile$Writer.initDataBlock(TFile.java:642)
at 
org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:533)
at 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.writeVersion(AggregatedLogFormat.java:276)
at 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.<init>(AggregatedLogFormat.java:272)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:108)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:166)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:140)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:354)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-03-26 20:04:43,691 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 Aggregation did not complete for application application_1426202183036_103251
2015-03-26 20:04:43,691 ERROR 
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
Thread[LogAggregationService #2,5,main] threw an Throwable, but we are shutting 
down, so ignoring this
java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 316; columnNumber: 3; The 
element type "property" must be terminated by the matching end-tag 
"</property>".
--
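
The trace shows the log-aggregation thread re-reading mapred-site.xml mid-edit. 
Whatever the fix inside Hadoop, the operational mitigation is to never edit 
configs in place: validate a new copy, then swap it in atomically. A sketch of 
that workflow (not a Hadoop feature; paths are illustrative):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def replace_config_atomically(path, new_contents):
    """Validate first, then rename into place so a reader never
    observes a half-written file (rename is atomic on POSIX)."""
    ET.fromstring(new_contents)  # raises ParseError; nothing is replaced
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, "w") as f:
        f.write(new_contents)
    os.rename(tmp, path)

cfg = "/tmp/mapred-site-demo.xml"
replace_config_atomically(cfg, "<configuration></configuration>")
try:
    # Missing </property>, like the typo in this report: rejected up front
    replace_config_atomically(cfg, "<configuration><property></configuration>")
except ET.ParseError as e:
    print("rejected bad config:", e)
```

With this pattern a daemon that re-reads its config can only ever see the old 
valid file or the new valid file.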

 Nodemanager dies after a small typo in mapred-site.xml is induced
 -

 Key: YARN-3403
 URL: https://issues.apache.org/jira/browse/YARN-3403
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Nikhil Mulley
Priority: Critical

 Hi,
 We have noticed that a small typo in an XML config file (mapred-site.xml) can 
 cause the NodeManager to go down completely, without anyone stopping or 
 restarting it externally.
 I find it a little odd that editing config files on the filesystem can cause 
 the running slave daemon, the YARN NodeManager, to shut down.
 In this case, a '/' missing from an end tag in a property brought the 
 NodeManager down in a cluster.
 Why would 

[jira] [Updated] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced

2015-03-26 Thread Nikhil Mulley (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikhil Mulley updated YARN-3403:

Priority: Critical  (was: Major)

 Nodemanager dies after a small typo in mapred-site.xml is induced
 -

 Key: YARN-3403
 URL: https://issues.apache.org/jira/browse/YARN-3403
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Nikhil Mulley
Priority: Critical

 Hi,
 We have noticed that a small typo in an XML config file (mapred-site.xml) can 
 cause the NodeManager to go down completely, without anyone stopping or 
 restarting it externally.
 I find it a little odd that editing config files on the filesystem can cause 
 the running slave daemon, the YARN NodeManager, to shut down.
 In this case, a '/' missing from an end tag in a property brought the 
 NodeManager down in a cluster.
 Why would the NodeManager reload the configs while it is running? Aren't they 
 picked up when it starts? Even if they are meant to pick up new configs 
 dynamically, an xmllint/config check should run before the NodeManager is 
 asked to reload/restart.
  
 ---
 java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
 file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The 
 element type "value" must be terminated by the matching end-tag "</value>".
at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
 ---
 Please shed light on this.





[jira] [Created] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced

2015-03-26 Thread Nikhil Mulley (JIRA)
Nikhil Mulley created YARN-3403:
---

 Summary: Nodemanager dies after a small typo in mapred-site.xml is 
induced
 Key: YARN-3403
 URL: https://issues.apache.org/jira/browse/YARN-3403
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Nikhil Mulley


Hi,

We have noticed that a small typo in an XML config file (mapred-site.xml) can 
cause the NodeManager to go down completely, without anyone stopping or 
restarting it externally.

I find it a little odd that editing config files on the filesystem can cause 
the running slave daemon, the YARN NodeManager, to shut down.
In this case, a '/' missing from an end tag in a property brought the 
NodeManager down in a cluster.
Why would the NodeManager reload the configs while it is running? Aren't they 
picked up when it starts? Even if they are meant to pick up new configs 
dynamically, an xmllint/config check should run before the NodeManager is 
asked to reload/restart.
 
---
java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: 
file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The 
element type "value" must be terminated by the matching end-tag "</value>".
   at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
---

Please shed light on this.





[jira] [Created] (YARN-2094) how to enable job counters for mapreduce or applications

2014-05-21 Thread Nikhil Mulley (JIRA)
Nikhil Mulley created YARN-2094:
---

 Summary: how to enable job counters for mapreduce or applications
 Key: YARN-2094
 URL: https://issues.apache.org/jira/browse/YARN-2094
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Nikhil Mulley


Hi,

I was looking at MapReduce jobs in my YARN setup and wondering about job 
counters. I do not see job counters for MapReduce applications: when I browse 
the web page for job counters, none are shown. Is there a specific setting to 
enable application/job counters in YARN? Please let me know.

thanks,
Nikhil





[jira] [Created] (YARN-2044) thrift interface for YARN?

2014-05-11 Thread Nikhil Mulley (JIRA)
Nikhil Mulley created YARN-2044:
---

 Summary: thrift interface for YARN?
 Key: YARN-2044
 URL: https://issues.apache.org/jira/browse/YARN-2044
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Nikhil Mulley


Hi,

I was searching for Thrift interface definitions for YARN but could not find 
any. Is there any plan to have a Thrift interface for YARN? If one already 
exists, could someone please point me to the appropriate place?

thanks,
Nikhil


