[jira] [Commented] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2024-03-23 Thread zeekling (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830047#comment-17830047
 ] 

zeekling commented on YARN-2024:


I have the same problem in Hadoop 3.1.1.

 

!image-2024-03-23-17-22-00-057.png!

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>Priority: Major
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.
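
The handling the description asks for amounts to two things: surface the IOException with its full stack trace, and stop appending to the TFile once a write has failed, so the aggregation attempt can be aborted and the local logs cleaned up. A minimal sketch of that pattern follows; ContainerLogWriter, SafeAggregator and uploadLogsForContainers are illustrative stand-ins, not the actual Hadoop classes.

{code:java}
// Hedged sketch only: plain Java with hypothetical types, not the real
// AppLogAggregatorImpl / AggregatedLogFormat.LogWriter code.
import java.io.Closeable;
import java.io.IOException;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

interface ContainerLogWriter extends Closeable {
  void append(String containerId) throws IOException;
}

class SafeAggregator {
  private static final Logger LOG =
      Logger.getLogger(SafeAggregator.class.getName());

  void uploadLogsForContainers(ContainerLogWriter writer, List<String> containers)
      throws IOException {
    try {
      for (String containerId : containers) {
        try {
          writer.append(containerId);
        } catch (IOException e) {
          // Log the cause *with* its stack trace so the failure is diagnosable,
          // instead of a bare "Couldn't upload logs for ..." message.
          LOG.log(Level.SEVERE, "Couldn't upload logs for " + containerId, e);
          // The underlying TFile is now in an undefined state, so appending the
          // remaining containers would only raise IllegalStateException; abort
          // the whole aggregation attempt instead of skipping one container.
          throw e;
        }
      }
    } finally {
      // Always release the writer so per-application cleanup can run and the
      // local yarn-logs do not linger.
      writer.close();
    }
  }
}
{code}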






[jira] [Comment Edited] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2024-03-23 Thread zeekling (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830047#comment-17830047
 ] 

zeekling edited comment on YARN-2024 at 3/23/24 9:23 AM:
-

I have the same problem in Hadoop 3.1.1.

 

2024-02-17 01:09:21,112 | INFO  | SchedulerEventDispatcher:Event Processor | container_e65_1707884856539_27553_01_66 Container Transitioned from NEW to COMPLETED | RMContainerImpl.java:480
2024-02-17 01:09:21,112 | FATAL | RM ApplicationHistory dispatcher | Error in dispatcher thread | AsyncDispatcher.java:233
java.lang.IllegalStateException: Incorrect state to start a new key: END_KEY
    at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:530)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.writeHistoryData(FileSystemApplicationHistoryStore.java:756)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.containerStarted(FileSystemApplicationHistoryStore.java:523)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:198)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:304)
    at org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:299)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:227)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:144)
    at java.lang.Thread.run(Thread.java:748)
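
Both traces show the same failure pattern: TFile$Writer keeps an internal write state, and once an append dies partway through a record the writer is stuck in an intermediate state (IN_VALUE in the original NodeManager trace, END_KEY here), so every later prepareAppendKey call throws IllegalStateException. A toy sketch of that pattern, not the real TFile code:

{code:java}
// Toy illustration only; the real state machine lives in
// org.apache.hadoop.io.file.tfile.TFile$Writer.
import java.io.IOException;

class TinyKvWriter {
  // END_KEY stands in for the other intermediate state seen in the trace above.
  private enum State { READY, IN_VALUE, END_KEY }
  private State state = State.READY;

  private void prepareAppendKey() {
    if (state != State.READY) {
      throw new IllegalStateException("Incorrect state to start a new key: " + state);
    }
  }

  void append(String key, byte[] value) throws IOException {
    prepareAppendKey();
    state = State.IN_VALUE;   // the value stream is now open
    if (value == null) {
      throw new IOException("simulated stream failure while writing the value");
    }
    state = State.READY;      // only reached when the record completes
  }

  public static void main(String[] args) throws IOException {
    TinyKvWriter w = new TinyKvWriter();
    try {
      w.append("container_A", null);        // first append fails mid-value
    } catch (IOException swallowed) {
      // failure reported with a one-line message only, as in the issue
    }
    // Second append fails with:
    // IllegalStateException: Incorrect state to start a new key: IN_VALUE
    w.append("container_B", new byte[0]);
  }
}
{code}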


was (Author: JIRAUSER299659):
I have the same problem in Hadoop 3.1.1.

 

!image-2024-03-23-17-22-00-057.png!

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>Priority: Major
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.






[jira] [Updated] (YARN-11665) Hive jobs support aggregating logs according to real users

2024-03-22 Thread zeekling (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zeekling updated YARN-11665:

Issue Type: Wish  (was: Improvement)

> Hive jobs support aggregating logs according to real users
> --
>
> Key: YARN-11665
> URL: https://issues.apache.org/jira/browse/YARN-11665
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: log-aggregation
>Reporter: zeekling
>Priority: Major
>
> Currently, Hive job logs are stored under /tmp/logs/hive/bucket/appId. Can we 
> aggregate logs by the real user running the Hive job instead, e.g. 
> /tmp/logs/hive/\{real user}/bucket/appId?
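
A tiny sketch of the requested layout, with purely hypothetical names and values (the real change would live in YARN's remote log directory handling, which this sketch does not touch):

{code:java}
// Hypothetical helper showing the proposed path shape:
//   <remote-root>/<service user>/<real user>/<bucket>/<appId>
import java.nio.file.Path;
import java.nio.file.Paths;

final class RealUserLogDir {
  private RealUserLogDir() {}

  static Path aggregatedLogDir(String remoteRoot, String serviceUser,
                               String realUser, String bucket, String appId) {
    return Paths.get(remoteRoot, serviceUser, realUser, bucket, appId);
  }

  public static void main(String[] args) {
    // Illustrative values only.
    System.out.println(aggregatedLogDir(
        "/tmp/logs", "hive", "alice", "bucket0", "application_1234567890123_0001"));
    // prints /tmp/logs/hive/alice/bucket0/application_1234567890123_0001
  }
}
{code}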






[jira] [Updated] (YARN-11665) Hive jobs support aggregating logs according to real users

2024-03-17 Thread zeekling (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zeekling updated YARN-11665:

Summary: Hive jobs support aggregating logs according to real users  (was: 
hive jobs support aggregating logs according to real users)

> Hive jobs support aggregating logs according to real users
> --
>
> Key: YARN-11665
> URL: https://issues.apache.org/jira/browse/YARN-11665
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Reporter: zeekling
>Priority: Major
>
> Currently, Hive job logs are stored under /tmp/logs/hive/bucket/appId. Can we 
> aggregate logs by the real user running the Hive job instead, e.g. 
> /tmp/logs/hive/\{real user}/bucket/appId?






[jira] [Created] (YARN-11665) hive jobs support aggregating logs according to real users

2024-03-17 Thread zeekling (Jira)
zeekling created YARN-11665:
---

 Summary: hive jobs support aggregating logs according to real users
 Key: YARN-11665
 URL: https://issues.apache.org/jira/browse/YARN-11665
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation
Reporter: zeekling


Currently, Hive job logs are stored under /tmp/logs/hive/bucket/appId. Can we 
aggregate logs by the real user running the Hive job instead, e.g. 
/tmp/logs/hive/\{real user}/bucket/appId?


