[jira] [Commented] (YARN-6342) Issues in async API of TimelineClient

2017-03-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937839#comment-15937839
 ] 

Varun Saxena commented on YARN-6342:


bq. I'll leave this jira open to make the wait period configurable and file 
another jira to fix the other issue that NMTimlinePublisher does not stop 
timeline clients upon stop()
Ok.

> Issues in async API of TimelineClient
> -
>
> Key: YARN-6342
> URL: https://issues.apache.org/jira/browse/YARN-6342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Haibo Chen
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow which doens't wait for pending tasks, should 
> it use shutdown instead ?
> {code}
> public void stop() {
>   LOG.info("Stopping TimelineClient.");
>   executor.shutdownNow();
>   try {
> executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>   } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publish one entity 
> (publishWithoutBlockingOnQueue), the thread exists. I think it should try 
> best effort to continue publishing the timeline entities, one failure should 
> not cause all followup entities not published.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6342) Issues in async API of TimelineClient

2017-03-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937458#comment-15937458
 ] 

Haibo Chen commented on YARN-6342:
--

I have filed YARN-6377 to have NMTimelinePublisher stop all timeline clients 
upon stop(). Removing the merge blocker label from this jira.

> Issues in async API of TimelineClient
> -
>
> Key: YARN-6342
> URL: https://issues.apache.org/jira/browse/YARN-6342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Haibo Chen
>  Labels: yarn-5355-merge-blocker
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow which doens't wait for pending tasks, should 
> it use shutdown instead ?
> {code}
> public void stop() {
>   LOG.info("Stopping TimelineClient.");
>   executor.shutdownNow();
>   try {
> executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>   } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publish one entity 
> (publishWithoutBlockingOnQueue), the thread exists. I think it should try 
> best effort to continue publishing the timeline entities, one failure should 
> not cause all followup entities not published.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6342) Issues in async API of TimelineClient

2017-03-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937429#comment-15937429
 ] 

Haibo Chen commented on YARN-6342:
--

I'll leave this jira open to make the wait period configurable and file another 
jira to fix the other issue that NMTimlinePublisher does not stop timeline 
clients upon stop()

> Issues in async API of TimelineClient
> -
>
> Key: YARN-6342
> URL: https://issues.apache.org/jira/browse/YARN-6342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Haibo Chen
>  Labels: yarn-5355-merge-blocker
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow which doens't wait for pending tasks, should 
> it use shutdown instead ?
> {code}
> public void stop() {
>   LOG.info("Stopping TimelineClient.");
>   executor.shutdownNow();
>   try {
> executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>   } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publish one entity 
> (publishWithoutBlockingOnQueue), the thread exists. I think it should try 
> best effort to continue publishing the timeline entities, one failure should 
> not cause all followup entities not published.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6342) Issues in async API of TimelineClient

2017-03-21 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935667#comment-15935667
 ] 

Rohith Sharma K S commented on YARN-6342:
-

Shall we also fix draining of all entities which are exist in queue may be 
instead of hard coding for 2 seconds, shall it be taken as configuration? IMO 2 
seconds is too less where many entities are not published during stop.

> Issues in async API of TimelineClient
> -
>
> Key: YARN-6342
> URL: https://issues.apache.org/jira/browse/YARN-6342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Haibo Chen
>  Labels: yarn-5355-merge-blocker
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow which doens't wait for pending tasks, should 
> it use shutdown instead ?
> {code}
> public void stop() {
>   LOG.info("Stopping TimelineClient.");
>   executor.shutdownNow();
>   try {
> executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>   } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publish one entity 
> (publishWithoutBlockingOnQueue), the thread exists. I think it should try 
> best effort to continue publishing the timeline entities, one failure should 
> not cause all followup entities not published.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6342) Issues in async API of TimelineClient

2017-03-21 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935313#comment-15935313
 ] 

Varun Saxena commented on YARN-6342:


[~haibochen], correct. FutureTask#run won't throw any exception.

How about fixing the other issue which I pointed out?
While stopping i.e. in NMTimelinePublisher#serviceStop we are not explicitly 
stopping app related timeline clients which would mean that some of the 
entities lying around in the queue may not be written...
Thoughts?

> Issues in async API of TimelineClient
> -
>
> Key: YARN-6342
> URL: https://issues.apache.org/jira/browse/YARN-6342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Haibo Chen
>  Labels: yarn-5355-merge-blocker
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow which doens't wait for pending tasks, should 
> it use shutdown instead ?
> {code}
> public void stop() {
>   LOG.info("Stopping TimelineClient.");
>   executor.shutdownNow();
>   try {
> executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>   } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publish one entity 
> (publishWithoutBlockingOnQueue), the thread exists. I think it should try 
> best effort to continue publishing the timeline entities, one failure should 
> not cause all followup entities not published.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6342) Issues in async API of TimelineClient

2017-03-20 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933724#comment-15933724
 ] 

Haibo Chen commented on YARN-6342:
--

publishWithoutBlockingOnQueue() will only throw InterruptedExceptions from 
calling queue.poll(), which is already taken care of by the current code. 
EntitiesHolder.run() never throws an exception based on FutureTask java doc and 
my little testing. Therefore, I think the publishing thread never exits because 
of unexpected exceptions in publishing entities.

> Issues in async API of TimelineClient
> -
>
> Key: YARN-6342
> URL: https://issues.apache.org/jira/browse/YARN-6342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Haibo Chen
>  Labels: yarn-5355-merge-blocker
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow which doens't wait for pending tasks, should 
> it use shutdown instead ?
> {code}
> public void stop() {
>   LOG.info("Stopping TimelineClient.");
>   executor.shutdownNow();
>   try {
> executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>   } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publish one entity 
> (publishWithoutBlockingOnQueue), the thread exists. I think it should try 
> best effort to continue publishing the timeline entities, one failure should 
> not cause all followup entities not published.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6342) Issues in async API of TimelineClient

2017-03-16 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928977#comment-15928977
 ] 

Joep Rottinghuis commented on YARN-6342:


A possible other approach is to code / configure one overall timeout under 
which we need to stay to shut down all clients. We can then give each client a 
fraction thereof and keep track of how much is left. I'm doing something 
similar in the spooling code see YARN-4061 and HBASE-17018 (thought that isn't 
complete yet and I need to pick up that work again).

Wrt. data loss on shut-down, note that the loss will be limited to 
TIMELINE_SERVICE_WRITER_FLUSH_INTERVAL_SECONDS, which defaults to 1 minute. On 
average the loss could be half that, but in the worst case it would be 30 
seconds.
The timeout during shutdown and the timout at which we detect HBase doesn't 
accept writes (and we end up spooling to file) should be carefully tuned to not 
loose any data under normal operating circumstances, even when HBase is down.
Perhaps we should have a config for this, or at least have one be a multiple of 
the other. I'll keep this in mind in the spooling work.

> Issues in async API of TimelineClient
> -
>
> Key: YARN-6342
> URL: https://issues.apache.org/jira/browse/YARN-6342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow which doens't wait for pending tasks, should 
> it use shutdown instead ?
> {code}
> public void stop() {
>   LOG.info("Stopping TimelineClient.");
>   executor.shutdownNow();
>   try {
> executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>   } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publish one entity 
> (publishWithoutBlockingOnQueue), the thread exists. I think it should try 
> best effort to continue publishing the timeline entities, one failure should 
> not cause all followup entities not published.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6342) Issues in async API of TimelineClient

2017-03-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926764#comment-15926764
 ] 

Varun Saxena commented on YARN-6342:


Yes, it was a conscious decision to wait for upto 2 seconds. The assumption 
here was that timeline client will be going down when JVM goes down. The points 
considered were as under:
# _Why not drain the queue?_ - We did not go with it because it is possible 
that the collector we are writing to may be down. This can increase the time 
taken by the daemon to shutdown. Moreover, as dispatcher would still be 
running, we can never ensure that no events are lost on shutdown anyways. It 
would always be on a best-effort basis.
# _Why wait for 2 seconds?_ - This seemed like a suitable interval. NM for 
instance may have multiple timeline clients' as multiple apps will be handled 
by a single NM so time for stop can be a multiple of 2 seconds. We did not make 
this configurable because this maybe yet another config which nobody uses.

If timeline client is the last thing to be stopped, 2 seconds should be enough 
to drain most of the events in the queue, barring those stuck in dispatcher. 
AMs' can probably ensure that timeline client is the last thing to go away. 

> Issues in async API of TimelineClient
> -
>
> Key: YARN-6342
> URL: https://issues.apache.org/jira/browse/YARN-6342
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow which doens't wait for pending tasks, should 
> it use shutdown instead ?
> {code}
> public void stop() {
>   LOG.info("Stopping TimelineClient.");
>   executor.shutdownNow();
>   try {
> executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>   } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publish one entity 
> (publishWithoutBlockingOnQueue), the thread exists. I think it should try 
> best effort to continue publishing the timeline entities, one failure should 
> not cause all followup entities not published.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6342) Issues in async API of TimelineClient

2017-03-15 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926600#comment-15926600
 ] 

Rohith Sharma K S commented on YARN-6342:
-

Thanks Varun for the pointer. I have another doubt in exception handling of 
createRunnable method. Here, trying to publish as much as possible with in 2 
seconds only. Why this restriction of 2 seconds? There could be more entities 
added into queue which will never get published during stop.

> Issues in async API of TimelineClient
> -
>
> Key: YARN-6342
> URL: https://issues.apache.org/jira/browse/YARN-6342
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow which doens't wait for pending tasks, should 
> it use shutdown instead ?
> {code}
> public void stop() {
>   LOG.info("Stopping TimelineClient.");
>   executor.shutdownNow();
>   try {
> executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>   } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publish one entity 
> (publishWithoutBlockingOnQueue), the thread exists. I think it should try 
> best effort to continue publishing the timeline entities, one failure should 
> not cause all followup entities not published.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6342) Issues in async API of TimelineClient

2017-03-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925913#comment-15925913
 ] 

Varun Saxena commented on YARN-6342:


bq. In stop: it calls shutdownNow which doens't wait for pending tasks, should 
it use shutdown instead ?
The executor we use in TimelineV2ClientImpl is a single thread executor with 
only one task i.e. the runnable created by createRunnable(). So there wont be 
any pending tasks. And inside createRunnable()'s run method, in finally block, 
we try to dispatch as many events as we can in 2 seconds.

bq. If any exception happens when publish one entity 
(publishWithoutBlockingOnQueue), the thread exists
This makes sense. I think we should catch the exception so that thread does not 
exit.

> Issues in async API of TimelineClient
> -
>
> Key: YARN-6342
> URL: https://issues.apache.org/jira/browse/YARN-6342
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow which doens't wait for pending tasks, should 
> it use shutdown instead ?
> {code}
> public void stop() {
>   LOG.info("Stopping TimelineClient.");
>   executor.shutdownNow();
>   try {
> executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>   } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publish one entity 
> (publishWithoutBlockingOnQueue), the thread exists. I think it should try 
> best effort to continue publishing the timeline entities, one failure should 
> not cause all followup entities not published.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org