[jira] [Commented] (MAPREDUCE-7158) Inefficient Flush Logic in JobHistory EventWriter

2018-11-12 Thread Zichen Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684546#comment-16684546
 ] 

Zichen Sun commented on MAPREDUCE-7158:
---

The patch makes no functional change. It has been verified on a test cluster 
using 10 TB of benchmark data, and performance is much improved.

> Inefficient Flush Logic in JobHistory EventWriter
> -
>
> Key: MAPREDUCE-7158
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7158
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Zichen Sun
> Priority: Major
> Attachments: MAPREDUCE-7158-001.patch
>
>
> In HDFS, when flush is implemented to send a server request that actually 
> commits the pending writes on the storage service side, benchmark runs show 
> MR jobs taking much longer. Investigation shows that the current 
> implementation for writing events does not look right:
> EventWriter#write()
> The flush in this method is redundant and the statement should be removed. 
> It defeats the purpose of having a separate flush function.
> Encoder.flush calls flush on the underlying output stream.
> After applying the fix, the MR jobs complete normally. Please find the 
> patch attached.
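
To illustrate why a flush inside every write can hurt, here is a minimal sketch. It is not the Hadoop EventWriter code: CountingStream, SketchWriter, and flushesFor are hypothetical names, and the stream only counts flush() calls, each standing in for one commit request to the storage service.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FlushCountSketch {

    /** Hypothetical stream whose flush() models an expensive server round trip. */
    static class CountingStream extends OutputStream {
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int flushCalls = 0;

        @Override
        public void write(int b) {
            sink.write(b);
        }

        @Override
        public void flush() {
            flushCalls++; // one call = one commit request in this model
        }
    }

    /** Hypothetical writer; flushOnWrite=true models the behaviour the report objects to. */
    static class SketchWriter {
        private final OutputStream out;
        private final boolean flushOnWrite;

        SketchWriter(OutputStream out, boolean flushOnWrite) {
            this.out = out;
            this.flushOnWrite = flushOnWrite;
        }

        void write(String event) throws IOException {
            out.write(event.getBytes(StandardCharsets.UTF_8));
            if (flushOnWrite) {
                out.flush(); // the per-event flush under discussion
            }
        }

        void flush() throws IOException {
            out.flush(); // the separate, explicit flush
        }
    }

    /** Writes the given number of events, flushes once at the end, returns total flush calls. */
    static int flushesFor(int events, boolean flushOnWrite) {
        CountingStream stream = new CountingStream();
        SketchWriter writer = new SketchWriter(stream, flushOnWrite);
        try {
            for (int i = 0; i < events; i++) {
                writer.write("event-" + i + "\n");
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stream.flushCalls;
    }

    public static void main(String[] args) {
        System.out.println("flush in write():    " + flushesFor(1000, true));
        System.out.println("explicit flush only: " + flushesFor(1000, false));
    }
}
```

In this model, 1000 events cost 1001 flush calls with the per-write flush and 1 without it. How large the real cost is depends on what the file system's flush actually does.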



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7158) Inefficient Flush Logic in JobHistory EventWriter

2018-11-12 Thread Zichen Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zichen Sun updated MAPREDUCE-7158:
--
Affects Version/s: 3.2.0
   Attachment: MAPREDUCE-7158-001.patch
   Status: Patch Available  (was: Open)







[jira] [Updated] (MAPREDUCE-7158) Inefficient Flush Logic in JobHistory EventWriter

2018-11-12 Thread Zichen Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zichen Sun updated MAPREDUCE-7158:
--
Attachment: (was: MAPREDUCE-7158-001.patch)

> Inefficient Flush Logic in JobHistory EventWriter
> -
>
> Key: MAPREDUCE-7158
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7158
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Zichen Sun
> Priority: Major
>






[jira] [Updated] (MAPREDUCE-7158) Inefficient Flush Logic in JobHistory EventWriter

2018-11-12 Thread Zichen Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zichen Sun updated MAPREDUCE-7158:
--
Attachment: MAPREDUCE-7158-001.patch

> Inefficient Flush Logic in JobHistory EventWriter
> -
>
> Key: MAPREDUCE-7158
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7158
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Zichen Sun
> Priority: Major
> Attachments: MAPREDUCE-7158-001.patch
>
>






[jira] [Created] (MAPREDUCE-7158) Inefficient Flush Logic in JobHistory EventWriter

2018-11-12 Thread Zichen Sun (JIRA)
Zichen Sun created MAPREDUCE-7158:
-

 Summary: Inefficient Flush Logic in JobHistory EventWriter
 Key: MAPREDUCE-7158
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7158
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Zichen Sun


In HDFS, when flush is implemented to send a server request that actually 
commits the pending writes on the storage service side, benchmark runs show 
MR jobs taking much longer. Investigation shows that the current 
implementation for writing events does not look right:
EventWriter#write()
The flush in this method is redundant and the statement should be removed. 
It defeats the purpose of having a separate flush function.
Encoder.flush calls flush on the underlying output stream.
After applying the fix, the MR jobs complete normally. Please find the 
patch attached.
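
The point that flushing a buffering layer also flushes the stream underneath can be seen with plain java.io. This is a sketch under assumptions: java.io.BufferedOutputStream stands in for the encoder, and RecordingStream and observe are hypothetical names. That the encoder used by EventWriter behaves the same way is the claim made in this report, not something the sketch proves.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class FlushCascadeSketch {

    /** Hypothetical underlying stream that records what reaches it. */
    static class RecordingStream extends OutputStream {
        int bytesReceived = 0;
        int flushCalls = 0;

        @Override
        public void write(int b) {
            bytesReceived++;
        }

        @Override
        public void flush() {
            flushCalls++;
        }
    }

    /**
     * Returns {bytes seen before flush, flush calls before flush,
     *          bytes seen after flush,  flush calls after flush}.
     */
    static int[] observe() {
        RecordingStream underlying = new RecordingStream();
        // Stand-in for the encoder: buffers writes until flush() is called.
        BufferedOutputStream buffered = new BufferedOutputStream(underlying, 8192);
        try {
            buffered.write(new byte[] {1, 2, 3});
            int bytesBefore = underlying.bytesReceived; // still held in the buffer
            int flushesBefore = underlying.flushCalls;
            buffered.flush(); // drains the buffer, then flushes the underlying stream
            return new int[] {
                bytesBefore, flushesBefore, underlying.bytesReceived, underlying.flushCalls
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = observe();
        System.out.println("before flush: bytes=" + r[0] + " flushes=" + r[1]);
        System.out.println("after flush:  bytes=" + r[2] + " flushes=" + r[3]);
    }
}
```

If the underlying stream's flush is a server request, every flush on the wrapping layer becomes one as well, which is why a flush in the write path matters.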


