[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378646#comment-16378646 ]

Shashikant Banerjee edited comment on HDFS-12794 at 2/27/18 2:26 PM:
-
Patch v11 adds some tests that write keys and validate the writes by setting the chunk size and the streamBufferSize.

was (Author: shashikant):
Patch adds some tests that write keys and validate the writes by setting the chunk size and the streamBufferSize.

> Ozone: Parallelize ChunkOutputSream Writes to container
> ---
>
> Key: HDFS-12794
> URL: https://issues.apache.org/jira/browse/HDFS-12794
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Affects Versions: HDFS-7240
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-12794-HDFS-7240.001.patch, HDFS-12794-HDFS-7240.002.patch,
> HDFS-12794-HDFS-7240.003.patch, HDFS-12794-HDFS-7240.004.patch,
> HDFS-12794-HDFS-7240.005.patch, HDFS-12794-HDFS-7240.006.patch,
> HDFS-12794-HDFS-7240.007.patch, HDFS-12794-HDFS-7240.008.patch,
> HDFS-12794-HDFS-7240.009.patch, HDFS-12794-HDFS-7240.010.patch,
> HDFS-12794-HDFS-7240.011.patch
>
> The ChunkOutputStream writes are synchronous in nature. Once one chunk of data is written, the next chunk write is blocked until the previous chunk has been written to the container.
> The ChunkOutputStream writes should be made asynchronous, and close() on the OutputStream should ensure flushing of all dirty buffers to the container.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
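The sync-vs-async behavior described in the issue can be sketched with CompletableFuture. This is a simplified illustration of the pattern only, not the actual ChunkOutputStream code; the class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch: each chunk write is submitted without blocking on the
// previous one, and close() waits until every outstanding write has finished.
class AsyncChunkSink {
  private final List<CompletableFuture<Void>> outstanding = new ArrayList<>();
  final AtomicInteger chunksWritten = new AtomicInteger();

  // Hypothetical async write: returns immediately instead of waiting for the
  // previous chunk to reach the container.
  void writeChunkAsync(byte[] chunk) {
    outstanding.add(
        CompletableFuture.runAsync(() -> chunksWritten.incrementAndGet()));
  }

  // close() ensures all dirty buffers were flushed before returning.
  void close() {
    CompletableFuture.allOf(
        outstanding.toArray(new CompletableFuture[0])).join();
  }
}
```

The real patch additionally validates each response and bounds the number of in-flight writes; this sketch only shows the submit-then-flush-on-close shape.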
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316122#comment-16316122 ]

Shashikant Banerjee edited comment on HDFS-12794 at 1/8/18 11:13 AM:
-
Thanks [~xyao], for the review comments.
{code}
Line 139-166: in the new code, the rollback will position the buffer to the very beginning of all async operations even though some of them may succeed, while the previous sequential write allows partial rollback. Any thoughts on the tradeoff here? The parallel stream write is beneficial only for certain use cases (e.g., large files)?
{code}
I agree that the previous code had the flexibility of partial rollback, which the current patch doesn't have.
{code}
public void write(byte b[], int off, int len) throws IOException {
{code}
If the write over a Java output stream fails, the write request from the client fails, and the client can accordingly retry the write op starting from the original offset. In case of an exception during the write on the stream, a rollback to the initial offset should work. Given that the write API returns void, we are not letting the client know at what exact point the failure happened in any case. Partial rollback would only benefit code that retries writing the chunk internally on certain failures, which is not the case here. In such situations, rollback to the initial offset should be the usual action. What do you think?
{code}
Line 337: should we put offset here or remove the offset parameter assuming this always starts with ByteBuffer offset 0.
{code}
ChunkInfo contains the field "offset", which is used in a lot of places. I think we can keep it here and make the change in a different jira, if it is required at all.

Patch v7 addresses the remaining review comments. Please have a look.
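The full-rollback behavior discussed above (reposition the buffer to the offset where write() began, rather than rolling back partially) can be sketched as follows; the class and the failure flag are illustrative only, not the patch's code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Illustrative sketch: on any failure, restore the buffer position to the
// initial offset so the client can retry the whole write op from there.
class RollbackBuffer {
  final ByteBuffer buffer = ByteBuffer.allocate(64);

  void write(byte[] b, boolean failMidway) throws IOException {
    int initialOffset = buffer.position();  // remember where this write began
    try {
      for (int i = 0; i < b.length; i++) {
        if (failMidway && i == b.length / 2) {
          throw new IOException("simulated chunk write failure");
        }
        buffer.put(b[i]);
      }
    } catch (IOException e) {
      buffer.position(initialOffset);  // full rollback, not partial
      throw e;
    }
  }
}
```

Because write() returns void, the caller cannot see how far the data got; restoring the initial offset keeps the retry contract simple, which is the argument made in the comment above.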
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314013#comment-16314013 ]

Xiaoyu Yao edited comment on HDFS-12794 at 1/5/18 10:21 PM:
Thanks [~shashikant] for the patch, and all for the reviews/discussion. The latest patch looks good to me overall; I just have a few minor issues.

*ChunkGroupOutputStream.java*
Suggest renaming blockSize to streamBufferSize here and using a separate configuration key to allow customization. We can still set it to the SCM block size in DistributedStorageHandler by default.

*ChunkOutputStream.java*
Line 74: NIT: chunkIndex can be removed, as we are replacing it with a timestamp.
Line 87: should we rename the parameter to avoid confusion? It seems to be more like a maxBufferSize. (The same applies to the change in ChunkGroupOutputStream.java that plumbs the blockSize parameter.)
Line 139-166: in the new code, the rollback will position the buffer to the very beginning of all async operations even though some of them may succeed, while the previous sequential write allows partial rollback. Any thoughts on the tradeoff here? The parallel stream write is beneficial only for certain use cases (e.g., large files)?
Line 196: NIT: Need to clarify the comments.
Line 337: should we put offset here or remove the offset parameter, assuming this always starts with ByteBuffer offset 0?

*ContainerProtocolCalls.java*
Line 56: NIT: unused imports
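The suggestion above, a dedicated streamBufferSize configuration key that falls back to the SCM block size by default, might look roughly like this sketch. The key name and the use of plain java.util.Properties in place of Hadoop's Configuration are assumptions for illustration:

```java
import java.util.Properties;

class StreamBufferConfig {
  // Hypothetical key name; the real one would live with the other Ozone keys.
  static final String STREAM_BUFFER_SIZE_KEY = "ozone.client.stream.buffer.size";

  // Fall back to the SCM block size when the key is not set explicitly.
  static long getStreamBufferSize(Properties conf, long scmBlockSize) {
    String configured = conf.getProperty(STREAM_BUFFER_SIZE_KEY);
    return configured == null ? scmBlockSize : Long.parseLong(configured);
  }
}
```

This keeps the default behavior identical to tying the buffer to the SCM block size, while letting deployments tune the buffer independently.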
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260437#comment-16260437 ]

Mukul Kumar Singh edited comment on HDFS-12794 at 11/21/17 8:50 AM:
Thanks for the updated patch [~shashikant]. Some minor comments on the latest v3 patch.
1) ChunkGroupOutputStream:297, please align the comments.
2) ChunkGroupOutputStream:423, I feel we should not create a new semaphore there; is there a case where the semaphore will be passed as null?
3) ContainerProtocolCalls.java:188-220, should the writeChunk API use writeChunkAsync and then do a get on the future?
4) Please fix the checkstyle errors.
5) The unit test failure of org.apache.hadoop.ozone.TestOzoneConfigurationFields.testCompareConfigurationClassAgainstXml is related.
Please have a look.
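The semaphore points raised in the review above (one shared semaphore supplied by the caller, never created ad hoc and never null, used to bound in-flight chunk writes) can be sketched like this; the class and method names are illustrative, not the patch's code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the caller supplies one shared Semaphore so the number of
// outstanding async chunk writes is bounded across the whole stream.
class BoundedChunkWriter {
  private final Semaphore permits;
  final AtomicInteger completed = new AtomicInteger();

  BoundedChunkWriter(Semaphore permits) {
    if (permits == null) {
      throw new IllegalArgumentException("semaphore must be supplied by the caller");
    }
    this.permits = permits;
  }

  CompletableFuture<Void> writeChunkAsync(byte[] chunk) {
    permits.acquireUninterruptibly();  // blocks once too many chunks are in flight
    return CompletableFuture.runAsync(() -> completed.incrementAndGet())
        .whenComplete((result, error) -> permits.release());
  }
}
```

Releasing the permit in whenComplete guarantees it is returned whether the write succeeds or fails, which is what makes the "all permits available at close" invariant discussed elsewhere in this thread checkable.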
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ]

Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:58 PM:
--
Thanks [~anu] for the review comments. As per the discussion with [~anu], here are a few conclusions:
1)
{code}
// make sure all the data in the ChunkOutputStreams is written to the
// container
Preconditions.checkArgument(
    semaphore.availablePermits() == getMaxOutstandingChunks());
{code}
While doing close on the groupOutputStream, we call chunkOutputStream.close, where we do future.get() on the response obtained after the write completes on the Xceiver server, which makes sure the response has been received. While closing the group stream, the semaphore permit count should equal the number of available permits, which is equal to the max number of outstanding chunks at any given point of time.
2.
{code}
throw new CompletionException(
    "Unexpected Storage Container Exception: " + e.toString(), e);
{code}
Hardcoding the exception when the writeChunkToContainer call completes on the Xceiver server shows that the exception is caught in the chunkOutputGroupStream.close path, which is expected.
{code}
response = response.thenApply(reply -> {
  try {
    throw new IOException("Exception while validating response");
    // ContainerProtocolCalls.validateContainerResponse(reply);
    // return reply;
  } catch (IOException e) {
    throw new CompletionException(
        "Unexpected Storage Container Exception: " + e.toString(), e);
  }
{code}
{code}
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response
    at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)
    at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)
    at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291)
{code}
This is as expected. The idea was to write a mock test for the validateContainerResponse call, but it is a static method of a final class, and mocking it requires PowerMockRunner, which leads to issues while bringing up the MiniOzoneCluster. Will add a unit test to verify the same later in a different jira.
Patch v3 addresses the remaining review comments. [~anu]/others, please have a look.
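The close-path behavior verified above (wait on the write's future, check that all semaphore permits are back, and let a validation failure surface as a CompletionException) can be pieced together in a small self-contained sketch; the names are illustrative and the validation is a stand-in, not the actual patch code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Semaphore;

// Sketch: async write with response validation, and a close() that waits on
// the response and asserts that all permits are back (no dirty buffers left).
class ChunkStreamSketch {
  private final Semaphore semaphore;
  private final int maxOutstandingChunks;
  private CompletableFuture<String> response;

  ChunkStreamSketch(int maxOutstandingChunks) {
    this.maxOutstandingChunks = maxOutstandingChunks;
    this.semaphore = new Semaphore(maxOutstandingChunks);
  }

  void writeChunkAsync(String chunk, boolean failValidation) {
    semaphore.acquireUninterruptibly();
    response = CompletableFuture.supplyAsync(() -> chunk)
        .thenApply(reply -> {
          // Stand-in for response validation: failures propagate to the
          // caller of close() as a CompletionException, as in the
          // hardcoded-exception experiment quoted above.
          if (failValidation) {
            throw new CompletionException(
                new java.io.IOException("Exception while validating response"));
          }
          return reply;
        })
        .whenComplete((reply, error) -> semaphore.release());
  }

  void close() {
    if (response != null) {
      response.join();  // future.get()-style wait on the outstanding write
    }
    // All permits back means every outstanding chunk write completed.
    if (semaphore.availablePermits() != maxOutstandingChunks) {
      throw new IllegalStateException("outstanding chunk writes remain");
    }
  }
}
```

Note that the permit is released in whenComplete even on a failed validation, so the permit-count check in close() still holds after an exception.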
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:57 PM: -- Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) {code} //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } {code} While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. {code} throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } {code} Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. 
{code} response = response.thenApply(reply -> { try{ throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){ throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e) {code} java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) {code} This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. was (Author: shashikant): Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code {} //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } code {} While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. 
code {} throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } code {} Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. code{} response = response.thenApply(reply -> { try{ throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){ throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e) code{} java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) code{} This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. > Ozone: Parallelize ChunkOutputSream
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:50 PM: -- Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code {} //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } code {} While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. code {} throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } code {} Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. 
code{} response = response.thenApply(reply -> { try{ throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){ throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e) code{} java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) code{} This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. was (Author: shashikant): Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code { //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. 
code { throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. Code { response = response.thenApply(reply -> { try{ throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){ throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e) java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468) at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) } This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. > Ozone: Parallelize ChunkOutputSream Writes to container >
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:45 PM:
--
Thanks [~anu], for the review comments. As per the discussion with [~anu], here are a few conclusions:

1)
{code}
// make sure all the data in the ChunkOutputStreams is written to the
// container
Preconditions.checkArgument(
    semaphore.availablePermits() == getMaxOutstandingChunks());
{code}
While doing close on the group output stream, we call chunkOutputStream.close, where we do future.get() on the response obtained after the write completes from the XceiverServer; this makes sure the response has been received. While closing the group stream, the semaphore permit count should equal the number of available permits, which is the maximum number of outstanding chunks at any given point of time.

2)
{code}
throw new CompletionException(
    "Unexpected Storage Container Exception: " + e.toString(), e);
{code}
Hardcoding the exception when the writeChunkToContainer call completes in the XceiverServer shows that the exception is caught in the ChunkGroupOutputStream.close path, which is expected.
{code}
response = response.thenApply(reply -> {
  try {
    throw new IOException("Exception while validating response");
    // ContainerProtocolCalls.validateContainerResponse(reply);
    // return reply;
  } catch (IOException e) {
    throw new CompletionException(
        "Unexpected Storage Container Exception: " + e.toString(), e);
  }
});
{code}
{code}
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response
	at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)
	at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)
	at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291)
{code}
This is as expected. The idea was to write a mock test around the validateContainerResponse call, which is a static method of a final class; this requires PowerMockRunner, which leads to issues while bringing up the MiniOzoneCluster. Will address the unit test to verify the same later in a different JIRA.

Patch v3 addresses the remaining review comments. [~anu]/others, please have a look.

> Ozone: Parallelize ChunkOutputSream Writes to container >
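The semaphore invariant checked in point 1 above can be sketched in isolation (the class and constant names below are illustrative, not the Ozone classes): each async chunk write acquires a permit, the completion callback releases it, and after close() has joined every response future, all permits must be back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

// Sketch of the pattern: bound the number of in-flight async chunk writes
// with a semaphore; close() waits on every response future, after which the
// availablePermits() check asserts no write is still outstanding.
public class BoundedAsyncWriter {
  private static final int MAX_OUTSTANDING_CHUNKS = 4;
  private final Semaphore semaphore = new Semaphore(MAX_OUTSTANDING_CHUNKS);
  private final List<CompletableFuture<Void>> responses = new ArrayList<>();

  void writeChunkAsync() throws InterruptedException {
    semaphore.acquire(); // blocks once MAX_OUTSTANDING_CHUNKS writes are in flight
    CompletableFuture<Void> response =
        CompletableFuture.runAsync(() -> { /* ship the chunk to the container */ });
    // Release the permit once the write (and its validation) completes.
    responses.add(response.whenComplete((r, t) -> semaphore.release()));
  }

  void close() {
    // Equivalent of future.get() on every response: all writes acknowledged.
    responses.forEach(CompletableFuture::join);
    // Mirrors the Preconditions.checkArgument in the patch.
    if (semaphore.availablePermits() != MAX_OUTSTANDING_CHUNKS) {
      throw new IllegalStateException("outstanding chunk writes at close");
    }
  }

  public static void main(String[] args) throws Exception {
    BoundedAsyncWriter writer = new BoundedAsyncWriter();
    for (int i = 0; i < 10; i++) {
      writer.writeChunkAsync();
    }
    writer.close();
    System.out.println("available permits after close: "
        + writer.semaphore.availablePermits());
  }
}
```

Joining the future returned by whenComplete (rather than the raw response) guarantees the release has happened before close() checks the permit count.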