[jira] [Comment Edited] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386041#comment-15386041 ]

Nikhil Mulley edited comment on YARN-3403 at 7/20/16 3:37 PM:
--------------------------------------------------------------

Has something changed in 2.6 with respect to this JIRA? I am using 2.6.4 and the issue does not occur: config files with broken XML syntax do not cause the nodemanager to go down. Can someone please confirm this?

was (Author: mnikhil):
Has anything changed in 2.6?

> Nodemanager dies after a small typo in mapred-site.xml is induced
> -----------------------------------------------------------------
>
>                 Key: YARN-3403
>                 URL: https://issues.apache.org/jira/browse/YARN-3403
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.5.2
>            Reporter: Nikhil Mulley
>            Priority: Critical
>
> Hi,
> We have noticed that a small typo in an XML config file (mapred-site.xml)
> can cause the nodemanager to go down completely, without it being stopped or
> restarted externally.
> I find it a little weird that editing the config files on the filesystem could
> cause the running slave daemon, the YARN nodemanager, to shut down.
> In this case, I had missed the '/' of an end tag in a property, and that
> caused the nodemanager to go down in a cluster.
> Why would the nodemanager reload the configs while it is running? Are they not
> picked up when it is started? Even if the configs are picked up dynamically,
> I think an xmllint/config checker should run before the nodemanager is asked
> to reload/restart.
>
> ---
> java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId:
> file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The
> element type "value" must be terminated by the matching end-tag "</value>".
>     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
> ---
> Please shed light on this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386041#comment-15386041 ]

Nikhil Mulley commented on YARN-3403:
-------------------------------------

Has anything changed in 2.6?
[jira] [Updated] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nikhil Mulley updated YARN-3403:
--------------------------------
    Affects Version/s: 2.5.2
[jira] [Commented] (YARN-5401) yarn application kill does not let mapreduce jobs show up in jobhistory
[ https://issues.apache.org/jira/browse/YARN-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384516#comment-15384516 ]

Nikhil Mulley commented on YARN-5401:
-------------------------------------

So, should apps always use app-specific methods to kill their jobs, and use yarn kill only when really necessary (much like always using kill (TERM) unless kill -9 becomes necessary)?

> yarn application kill does not let mapreduce jobs show up in jobhistory
> ------------------------------------------------------------------------
>
>                 Key: YARN-5401
>                 URL: https://issues.apache.org/jira/browse/YARN-5401
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>         Environment: centos 6.6
>                      apache hadoop 2.6.4
>            Reporter: Nikhil Mulley
>
> Hi,
> We have found in our cluster, running Apache Hadoop 2.6.4, that MapReduce
> jobs killed with the 'hadoop job -kill' command do end up with the job and
> its counters on the jobhistory server, but when 'yarn application -kill' is
> used on a MapReduce application, the job does not show up in the jobhistory
> server interface.
> Is this intentional? If so, are there any particular reasons?
> It would be better to have MapReduce application history reported on the
> jobhistory server irrespective of whether the kill is performed using the
> yarn application CLI or the hadoop job CLI.
> thanks,
> Nikhil
[jira] [Created] (YARN-5401) yarn application kill does not let mapreduce jobs show up in jobhistory
Nikhil Mulley created YARN-5401:
-----------------------------------

             Summary: yarn application kill does not let mapreduce jobs show up in jobhistory
                 Key: YARN-5401
                 URL: https://issues.apache.org/jira/browse/YARN-5401
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
         Environment: centos 6.6
                      apache hadoop 2.6.4
            Reporter: Nikhil Mulley

Hi,

We have found in our cluster, running Apache Hadoop 2.6.4, that MapReduce jobs killed with the 'hadoop job -kill' command do end up with the job and its counters on the jobhistory server, but when 'yarn application -kill' is used on a MapReduce application, the job does not show up in the jobhistory server interface.

Is this intentional? If so, are there any particular reasons?

It would be better to have MapReduce application history reported on the jobhistory server irrespective of whether the kill is performed using the yarn application CLI or the hadoop job CLI.

thanks,
Nikhil
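For anyone who wants to try the 'hadoop job -kill' path when only the YARN application ID is at hand, here is a rough sketch of the ID mapping. It relies on the convention that a MapReduce job ID reuses the cluster timestamp and sequence number of its application ID, and it assumes the application really is a MapReduce job. The helper names `job_id_for` and `mapreduce_kill_command` are made up for illustration; the script only builds the command line and does not run it.

```python
import re

# application_<clusterTimestamp>_<sequence>, as printed by YARN.
APP_ID = re.compile(r"^application_(\d+)_(\d+)$")


def job_id_for(app_id):
    """Map a YARN application ID to the matching MapReduce job ID.

    Assumption: the application is a MapReduce job, so the job ID shares
    the timestamp and sequence number with the application ID.
    """
    match = APP_ID.match(app_id)
    if match is None:
        raise ValueError("not a YARN application ID: %r" % app_id)
    # Keep the digits as strings so zero padding (e.g. 0001) is preserved.
    return "job_%s_%s" % (match.group(1), match.group(2))


def mapreduce_kill_command(app_id):
    """Build (but do not execute) the 'hadoop job -kill' command line."""
    return ["hadoop", "job", "-kill", job_id_for(app_id)]


if __name__ == "__main__":
    print(" ".join(mapreduce_kill_command("application_1412191664217_0001")))
```

Whether this path should be preferred over 'yarn application -kill' is exactly the open question in this issue; the sketch only shows how the two IDs relate.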
[jira] [Commented] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508586#comment-14508586 ]

Nikhil Mulley commented on YARN-2408:
-------------------------------------

Hi [~rdelvalle]

There are 8 people voting for this issue and 15 people watching it. I am not sure how much general interest the community requires, but I would be happy to help move this forward by deploying the patch on my test cluster, giving it a whirl, and seeing where it goes. I am also interested in the REST API as a means to monitor cluster resources in general, and in particular slow or starving jobs and the resources requested/consumed per app/job.

Nikhil

Resource Request REST API for YARN
----------------------------------

                Key: YARN-2408
                URL: https://issues.apache.org/jira/browse/YARN-2408
            Project: Hadoop YARN
         Issue Type: New Feature
         Components: webapp
           Reporter: Renan DelValle
             Labels: features

I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested, to gain more insight into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied.
Here is the proposed API (a JSON counterpart is also available):

{code:xml}
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <applicationAttemptId>appattempt_1412191664217_0001_01</applicationAttemptId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <numResourceRequests>3</numResourceRequests>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
        <resourceNames>
          <resourceName>localMachine</resourceName>
          <resourceName>/default-rack</resourceName>
          <resourceName>*</resourceName>
        </resourceNames>
      </request>
    </requests>
  </appMaster>
  <appMaster>
  ...
  </appMaster>
</resourceRequests>
{code}
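As a rough illustration of how external monitoring software might consume such a snapshot, the sketch below parses a response shaped like the proposed XML and lists the outstanding requests per application. It works on a literal string because the endpoint is only a proposal here; the element names are taken from the sample above, while the `summarize` helper and the trimmed-down `SAMPLE` document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# A cut-down copy of the proposed response with a single appMaster entry.
SAMPLE = """
<resourceRequests>
  <MB>7680</MB>
  <VCores>7</VCores>
  <appMaster>
    <applicationId>application_1412191664217_0001</applicationId>
    <queueName>default</queueName>
    <totalMB>6144</totalMB>
    <totalVCores>6</totalVCores>
    <requests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <numContainers>6</numContainers>
      </request>
    </requests>
  </appMaster>
</resourceRequests>
"""


def summarize(xml_text):
    """Return one dict per appMaster with its outstanding resource totals."""
    root = ET.fromstring(xml_text)
    summary = []
    for app in root.findall("appMaster"):
        # Sum the containers still being asked for across all requests.
        pending = sum(
            int(req.findtext("numContainers"))
            for req in app.findall("requests/request")
        )
        summary.append({
            "application_id": app.findtext("applicationId"),
            "queue": app.findtext("queueName"),
            "total_mb": int(app.findtext("totalMB")),
            "total_vcores": int(app.findtext("totalVCores")),
            "pending_containers": pending,
        })
    return summary


if __name__ == "__main__":
    for entry in summarize(SAMPLE):
        print(entry)
```

A monitor could poll such a snapshot and flag applications whose pending container count stays non-zero for a long time, which is the starvation check described in the proposal.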
[jira] [Commented] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383484#comment-14383484 ]

Nikhil Mulley commented on YARN-3403:
-------------------------------------

Hi [~Naganarasimha]

this is with apache hadoop 2.5.1
[jira] [Commented] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383285#comment-14383285 ]

Nikhil Mulley commented on YARN-3403:
-------------------------------------

Here is more of the stack trace; this is reproducible.

---
2015-03-26 20:04:43,690 FATAL org.apache.hadoop.conf.Configuration: error parsing conf mapred-site.xml
org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 316; columnNumber: 3; The element type "property" must be terminated by the matching end-tag "</property>".
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2183)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2171)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2242)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2195)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:858)
    at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:877)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1278)
    at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:65)
    at org.apache.hadoop.io.compress.zlib.ZlibFactory.getZlibCompressorType(ZlibFactory.java:82)
    at org.apache.hadoop.io.compress.DefaultCodec.getCompressorType(DefaultCodec.java:74)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
    at org.apache.hadoop.io.file.tfile.Compression$Algorithm.getCompressor(Compression.java:274)
    at org.apache.hadoop.io.file.tfile.BCFile$Writer$WBlockState.<init>(BCFile.java:129)
    at org.apache.hadoop.io.file.tfile.BCFile$Writer.prepareDataBlock(BCFile.java:430)
    at org.apache.hadoop.io.file.tfile.TFile$Writer.initDataBlock(TFile.java:642)
    at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:533)
    at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.writeVersion(AggregatedLogFormat.java:276)
    at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.<init>(AggregatedLogFormat.java:272)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:108)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:166)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:140)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:354)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2015-03-26 20:04:43,691 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Aggregation did not complete for application application_1426202183036_103251
2015-03-26 20:04:43,691 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LogAggregationService #2,5,main] threw an Throwable, but we are shutting down, so ignoring this
java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 316; columnNumber: 3; The element type "property" must be terminated by the matching end-tag "</property>".
---
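To jump straight to the offending spot in the config file, the location fields in the SAXParseException message can be pulled out of the nodemanager log with a small script. This is only a sketch: the regular expression is written against the message format quoted in this thread, and the helper name `parse_sax_location` is made up for the example.

```python
import re

# Matches the "systemId: ...; lineNumber: N; columnNumber: M; text" part
# of the SAXParseException message as it appears in the log above.
SAX_LOCATION = re.compile(
    r"systemId: (?P<system_id>[^;\s]+); "
    r"lineNumber: (?P<line>\d+); "
    r"columnNumber: (?P<column>\d+); "
    r"(?P<message>.*)"
)


def parse_sax_location(log_line):
    """Return file, line, column and message from a log line, or None."""
    match = SAX_LOCATION.search(log_line)
    if match is None:
        return None
    return {
        "system_id": match.group("system_id"),
        "line": int(match.group("line")),
        "column": int(match.group("column")),
        "message": match.group("message"),
    }


if __name__ == "__main__":
    sample = (
        "java.lang.RuntimeException: org.xml.sax.SAXParseException; "
        "systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 316; "
        "columnNumber: 3; The element type \"property\" must be terminated "
        "by the matching end-tag \"</property>\"."
    )
    print(parse_sax_location(sample))
```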
[jira] [Updated] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nikhil Mulley updated YARN-3403:
--------------------------------
    Priority: Critical  (was: Major)
[jira] [Created] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
Nikhil Mulley created YARN-3403:
-----------------------------------

             Summary: Nodemanager dies after a small typo in mapred-site.xml is induced
                 Key: YARN-3403
                 URL: https://issues.apache.org/jira/browse/YARN-3403
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Nikhil Mulley

Hi,

We have noticed that a small typo in an XML config file (mapred-site.xml) can cause the nodemanager to go down completely, without it being stopped or restarted externally.

I find it a little weird that editing the config files on the filesystem could cause the running slave daemon, the YARN nodemanager, to shut down.

In this case, I had missed the '/' of an end tag in a property, and that caused the nodemanager to go down in a cluster.

Why would the nodemanager reload the configs while it is running? Are they not picked up when it is started? Even if the configs are picked up dynamically, I think an xmllint/config checker should run before the nodemanager is asked to reload/restart.

---
java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The element type "value" must be terminated by the matching end-tag "</value>".
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348)
---

Please shed light on this.
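The kind of pre-flight check suggested above (an xmllint-style pass before an edited file is put in place) could look roughly like the sketch below. It only tests XML well-formedness with Python's standard parser, which stands in for the parser Hadoop itself uses; it knows nothing about Hadoop property names. The helper name `check_well_formed` and the two sample documents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else an error string.

    Accepts str or bytes, so a config file can be read in binary mode
    and passed in unchanged.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return "line %d, column %d: %s" % (line, column, err)
    return None


GOOD = (
    "<configuration><property><name>a</name>"
    "<value>1</value></property></configuration>"
)
# Same document with the '/' of the closing value tag left out.
BAD = (
    "<configuration><property><name>a</name>"
    "<value>1<value></property></configuration>"
)

if __name__ == "__main__":
    print(check_well_formed(GOOD))
    print(check_well_formed(BAD))
```

Running such a check on the edited copy and only then moving it into the live config directory would catch the missing '/' described in this report before any daemon reads the file.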
[jira] [Created] (YARN-2094) how to enable job counters for mapreduce or applications
Nikhil Mulley created YARN-2094:
-----------------------------------

             Summary: how to enable job counters for mapreduce or applications
                 Key: YARN-2094
                 URL: https://issues.apache.org/jira/browse/YARN-2094
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Nikhil Mulley

Hi,

I was looking at MapReduce jobs in my YARN setup and was wondering about the job counters. I do not see the job counters for the MapReduce applications. When I browse through the web page for job counters, there are no job counters.

Is there a specific setting to enable the application/job counters in YARN? Please let me know.

thanks,
Nikhil

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (YARN-2044) thrift interface for YARN?
Nikhil Mulley created YARN-2044:
-----------------------------------

             Summary: thrift interface for YARN?
                 Key: YARN-2044
                 URL: https://issues.apache.org/jira/browse/YARN-2044
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Nikhil Mulley

Hi,

I was searching for Thrift interface definitions for YARN but could not find any. Is there any plan to have a Thrift interface to YARN? If there is already one, could someone please point me to the appropriate place?

thanks,
Nikhil