[jira] [Comment Edited] (HADOOP-14520) WASB: Block compaction for Azure Block Blobs

2017-08-31 Thread Georgi Chalakov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149691#comment-16149691
 ] 

Georgi Chalakov edited comment on HADOOP-14520 at 8/31/17 10:08 PM:


Thank you for adding all these fixes. Stream capabilities look like a useful 
feature.

Re: flush()
FSDataOutputStream doesn't override flush(), so a normal flush() call at the 
application level would not execute BlockBlobAppendStream::flush(). When 
compaction is disabled, hflush()/hsync() are no-ops, and the performance of 
BlockBlobAppendStream for all operations is the same as (or better than) before.
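
As an illustrative sketch only (not the actual WASB code, and with hypothetical names), the "hflush/hsync are no-ops unless compaction is enabled" behavior described above could look like this:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

// Simplified stand-in for an append stream whose hflush()/hsync() do nothing
// when block compaction is disabled, so the old fast path is unchanged.
public class AppendStreamSketch extends OutputStream {
    private final boolean compactionEnabled;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int uploads = 0;  // counts "block uploads" triggered by hflush

    public AppendStreamSketch(boolean compactionEnabled) {
        this.compactionEnabled = compactionEnabled;
    }

    @Override
    public void write(int b) {
        buffer.write(b);  // writes always buffer locally
    }

    /** Uploads buffered bytes as a new block only when compaction is on. */
    public void hflush() {
        if (!compactionEnabled) {
            return;  // no-op: same cost as before the feature existed
        }
        uploads++;       // stand-in for uploading the buffered data as a block
        buffer.reset();
    }

    public int uploads() {
        return uploads;
    }
}
```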

Re: more than one append stream
We take a lease on the blob, which means that at any point in time there can 
be only one open append stream. If we had more than one append stream open at 
the same time, we could not guarantee the order of write operations.

I have added an hsync() call and made isClosed volatile.

Re: close()
I think the first exception is the best indication of what went wrong. After an 
exception, close() is just best effort. I don't know how useful it would be for 
a client to continue after an IO-related exception, but if that is necessary, 
the client can continue. If block compaction is enabled, the client can read 
all the data up to the last successful hflush()/hsync(). When block compaction 
is disabled, we guarantee nothing: we may or may not have the data stored in 
the service.





> WASB: Block compaction for Azure Block Blobs
> 
>
> Key: HADOOP-14520
> URL: https://issues.apache.org/jira/browse/HADOOP-14520
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/azure
>Affects Versions: 3.0.0-alpha3
>Reporter: Georgi Chalakov
>Assignee: Georgi Chalakov
> Attachments: HADOOP-14520-006.patch, HADOOP-14520-008.patch, 
> HADOOP-14520-009.patch, HADOOP-14520-05.patch, HADOOP_14520_07.patch, 
> HADOOP_14520_08.patch, HADOOP_14520_09.patch, HADOOP_14520_10.patch, 
> HADOOP-14520-patch-07-08.diff, HADOOP-14520-patch-07-09.diff
>
>
> Block Compaction for WASB allows uploading new blocks for every hflush/hsync 
> call. When the number of blocks exceeds 32,000, the next hflush/hsync triggers 
> the block compaction process. Block compaction replaces a sequence of blocks 
> with one block. From all the sequences with a total length of less than 4 MB, 
> compaction chooses the longest one. It is a greedy algorithm that preserves 
> all potential candidates for the next round. Block Compaction for WASB 
> increases data durability and allows using block blobs instead of page blobs. 
> By default, block compaction is disabled. Similar to the configuration for 
> page blobs, the client needs to specify the HDFS folders where block compaction 
> over block blobs is enabled. 
> Results for HADOOP_14520_07.patch
> tested endpoint: fs.azure.account.key.hdfs4.blob.core.windows.net
> Tests run: 777, Failures: 0, Errors: 0, Skipped: 155
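
The selection rule quoted above (among all contiguous block sequences whose total size stays under the 4 MB limit, pick the longest) can be sketched with a sliding window. This is an illustration, not the actual WASB code, and interpreting "longest" as "most blocks" is an assumption here:

```java
// Illustrative sketch of the greedy selection rule described in the issue
// summary: find the longest contiguous run of blocks whose sizes sum to
// less than 4 MB. Correct for positive sizes via a classic sliding window.
public class CompactionPicker {
    static final long LIMIT = 4L * 1024 * 1024;  // 4 MB combined-block limit

    /** Returns {start, endExclusive} of the longest run with sum < LIMIT. */
    public static int[] pickLongestRun(long[] blockSizes) {
        int bestStart = 0, bestLen = 0;
        int start = 0;
        long sum = 0;
        for (int end = 0; end < blockSizes.length; end++) {
            sum += blockSizes[end];
            // Shrink the window from the left until it is under the limit again.
            while (sum >= LIMIT && start <= end) {
                sum -= blockSizes[start++];
            }
            if (end - start + 1 > bestLen) {
                bestLen = end - start + 1;
                bestStart = start;
            }
        }
        return new int[] { bestStart, bestStart + bestLen };
    }
}
```

Because every candidate window that is still under the limit remains a legal merge target on the next round, the greedy choice never destroys future candidates, matching the "preserves all potential candidates" remark above.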



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org




[jira] [Comment Edited] (HADOOP-14520) WASB: Block compaction for Azure Block Blobs

2017-08-29 Thread Georgi Chalakov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146357#comment-16146357
 ] 

Georgi Chalakov edited comment on HADOOP-14520 at 8/29/17 11:50 PM:


HADOOP_14520_07.patch
Results : Tests run: 777, Failures: 0, Errors: 0, Skipped: 155

bq. if you are changing precondition check, I'd recommend StringUtils.isEmpty() 
for Preconditions.checkArgument(StringUtils.isNotEmpty(aKey));

Done.

bq. If fields aren't updated after the constructor, best to set to final 
(example, compactionEnabled ?).

Done.

bq. How long is downloadBlockList going to take in that constructor? More 
specifically: if compaction is disabled, can that step be skipped?

downloadBlockList is used for two purposes: (1) to check that the blob exists, 
and (2) to download the block list.

bq. If the stream needs a byte buffer, best to use ElasticByteBufferPool as a 
pool of buffers.

Done.
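
For readers unfamiliar with the buffer-pool suggestion above: the idea (sketched here with a minimal stand-in, not ElasticByteBufferPool's real API) is to reuse ByteBuffers across write requests instead of allocating a fresh one each time:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Minimal illustration of the pooling idea behind a shared buffer pool:
// hand out a previously-returned buffer when one is big enough, otherwise
// allocate; callers return buffers when their upload completes.
public class SimpleBufferPool {
    private final ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();

    /** Reuses a pooled buffer if it is large enough, else allocates a new one. */
    public synchronized ByteBuffer getBuffer(int size) {
        ByteBuffer b = pool.poll();
        if (b != null && b.capacity() >= size) {
            b.clear();  // reset position/limit before reuse
            return b;
        }
        // Too small (or pool empty): drop the undersized buffer and allocate.
        return ByteBuffer.allocate(size);
    }

    /** Returns a buffer to the pool for later reuse. */
    public synchronized void putBuffer(ByteBuffer buffer) {
        pool.offer(buffer);
    }

    public synchronized int pooled() {
        return pool.size();
    }
}
```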

bq. Use StorageErrorCodeStrings as the source of string constants to check for 
in exception error codes.

Done.

bq. Rather than throw IOException(e), I'd prefer more specific (existing ones). 
That's PathIOException and subclasses, AzureException(e), and the 
java.io/java.nio ones.

Done

bq. When wrapping a StorageException with another IOE, always include the 
toString value of the wrapped exception. That way, the log message of the top 
level log retains the underlying problem.

Done.
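
The wrapping convention agreed above can be shown in a few lines (a generic sketch with a hypothetical helper name, not the actual patch code): include the wrapped exception's toString() in the new message so top-level logs keep the underlying problem even when only the message is printed.

```java
import java.io.IOException;

// Sketch: wrap a lower-level failure in an IOException whose message carries
// the wrapped exception's toString(), while still chaining it as the cause.
public class ExceptionWrapping {
    public static IOException wrap(String operation, Exception cause) {
        return new IOException(operation + " failed: " + cause, cause);
    }
}
```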

bq. BlockBlobAppendStream.WriteRequest retry logic will retry even on 
RuntimeExceptions like IllegalArgumentException. Ideally they should be split 
into recoverable vs non-recoverable ops via a RetryPolicy. Is this an issue to 
address here though? Overall, with the new operatins doing retries, this may be 
time to embrace rety policies. Or at least create a JIRA entry on doing so.

add*Command() will rethrow the last exception. That means a following write() 
or close() will rethrow the stored exception. It will not happen right away, 
but it will happen before the stream is closed.
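
The recoverable/non-recoverable split the reviewer suggests could be sketched as follows (an illustration under assumed semantics, not the patch's actual retry code): retry only plausibly transient failures such as IOException, and fail fast on RuntimeExceptions like IllegalArgumentException.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch of a retry loop that distinguishes recoverable from
// non-recoverable failures instead of retrying everything.
public class RetrySketch {
    public static <T> T callWithRetries(Callable<T> op, int maxAttempts)
            throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (RuntimeException e) {
                throw e;   // non-recoverable (e.g. bad argument): fail fast
            } catch (IOException e) {
                last = e;  // recoverable (e.g. transient network error): retry
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        throw last;        // retries exhausted: surface the last failure
    }
}
```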

bq. I know java.io.OutputStream is marked as single-thread only, but I know of 
code (hello HBase!) which means that you must make some of the calls thread 
safe. HADOOP-11708/HADOOP-11710 covers this issue in CryptoOutputStream. At the 
very least, flush() must be synchronous with itself, close() & maybe write()

flush() is synchronized with itself through addFlushCommand(). We do not want 
flush() to be synchronized with write(): while one thread waits for a flush(), 
we would like other threads to continue writing.
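
The locking discipline described above, flush() serialized only against other flush() calls while write() stays unblocked, can be sketched like this (a simplified, hypothetical stand-in, not the actual BlockBlobAppendStream code):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a dedicated flush lock serializes concurrent flush() calls,
// while write() appends lock-free and never waits behind a flush.
public class FlushDiscipline {
    private final Object flushLock = new Object();
    private final ConcurrentLinkedQueue<byte[]> pending =
        new ConcurrentLinkedQueue<>();
    private final AtomicInteger flushes = new AtomicInteger();

    /** Lock-free append: a concurrent flush() does not block writers. */
    public void write(byte[] data) {
        pending.add(data);
    }

    /** Serialized with other flush() calls only, not with write(). */
    public void flush() {
        synchronized (flushLock) {
            while (pending.poll() != null) {
                // stand-in for uploading the drained data
            }
            flushes.incrementAndGet();
        }
    }

    public int pendingCount() {
        return pending.size();
    }

    public int flushCount() {
        return flushes.get();
    }
}
```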

bq. I'm unsure about BlockBlobAppendStream.close() waiting for up to 15 minutes 
for things to complete, but looking at other blobstore clients, I can see that 
they are implicitly waiting without any timeout at all. And it's in the 
existing codebase. But: why was the time limit changed from 10 min to 15? Was 
this based on test failures? If so, where is the guarantee that a 15 minute 
wait is always sufficient?

The change to 15 min was not based on test failures. I have changed the timeout 
back to 10 min and added a constant for it.

bq. Looking at BlockBlobAppendStream thread pooling, I think having a thread 
pool per output stream is expensive, especially as it has a minimum size of 4; 
it will ramp up fast. A pool of min=1 max=4 might be less expensive. But 
really, the stream should be thinking about sharing a pool common to the FS, 
relying on callbacks to notify it of completion rather than just awaiting pool 
completion and a shared writeable field.

I ran some tests with YCSB and a pool of min=1, max=4. It is slower, and the 
difference is measurable. Considering how many output streams you usually have 
per FS, I would like to keep min=4, max=4. The shared pool is a good idea, but 
I am afraid it would require a bigger change, and in the end I am not sure we 
would get significant benefits.
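
The min=4/max=4 configuration defended above corresponds to a plain fixed-size pool; an illustrative sketch (not the actual patch code) is just:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: a fixed pool with corePoolSize == maximumPoolSize == 4, i.e. the
// min=4/max=4 per-stream configuration the YCSB comparison favored.
public class UploadPoolSketch {
    public static ExecutorService newUploadPool() {
        return Executors.newFixedThreadPool(4);  // exactly 4 worker threads
    }
}
```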

bq. I think the access/use of lastException needs to be made stronger than just 
a volatile, as it means that code of the form if (lastException!=null) throw 
lastException isn't thread safe. I know, it's not that harmful provided 
lastException is never set to null, but I'd still like some isolated 
get/getAndSet/maybeThrow operations. Similarly, is lastException the best way 
to propagate failure, as it means that teardown failures are going to get 
reported ahead of earlier ones during the write itself. Overall, I propose 
using Callable
> WASB: Block compaction for Azure Block Blobs
> 
>
> Key: HADOOP-14520
> URL: https://issues.apache.org/jira/browse/HADOOP-14520
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/azure
>Affects Versions: 3.0.0-alpha3
>Reporter: Georgi Chalakov
>Assignee: Georgi Chalakov
> Attachments: HADOOP-14520-006.patch, HADOOP-14520-05.patch, 
>