[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write

2017-09-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180605#comment-16180605
 ] 

Steve Loughran commented on HADOOP-14906:
-

[~Georgi]: thanks for looking at this. Although your patch was the last to go 
near the test that was failing, the fact that it has "gone away" since I moved 
to a different network location makes me think it is network-infra-related, and 
that could be a sign of an underlying problem, maybe even common to all apps 
using the Azure storage SDK: we just got to find it first.

It'd still be nice to know what's going on, or if there are improvements which 
could be made to reporting/recovery. Otherwise, I'll think about closing as 
cannot reproduce for now. Changing the title to make sure the error text is in 
it (for easier searching).

> ITestAzureConcurrentOutOfBandIo failing with checksum errors on write
> -
>
> Key: HADOOP-14906
> URL: https://issues.apache.org/jira/browse/HADOOP-14906
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.9.0, 3.1.0
> Environment: UK BT ADSL connection, 1.8.0_121-b13, azure storage 
> ireland
>Reporter: Steve Loughran
>
> {{ITestAzureConcurrentOutOfBandIo}} is consistently raising an IOE with the 
> text "The MD5 value specified in the request did not match with the MD5 value 
> calculated by the server"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write

2017-09-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180584#comment-16180584
 ] 

Steve Loughran commented on HADOOP-14906:
-

Also: it happened in both parallel & serial test runs, so the problem wasn't 
triggered by the parallel test runner of HADOOP-14553.




[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write

2017-09-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180577#comment-16180577
 ] 

Steve Loughran commented on HADOOP-14906:
-

Doesn't occur at other locations.

The one with the problem had
* BT ADSL
* BT wifi base station which never lets you change DNS servers

One without
* BT Fibre-to-the-Cabinet
* DD-WRT base station bonded to Google DNS

Same laptop.

It's possible that these tests are failing because they are correctly detecting 
corruption of in-flight data.

* I'd only expect that on HTTP connections, not HTTPS, 
* unless it was a (transient) problem at Azure storage and/or the laptop.
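
For reference, a minimal sketch of the kind of check that would produce this error, not the SDK's or the service's actual code: the client sends a Base64-encoded MD5 of the block along with the upload, the receiving side recomputes the digest over the bytes it got, and the request is rejected if the two differ. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class ContentMd5Sketch {

  /** Base64-encoded MD5 of the payload, the form used in a Content-MD5 header. */
  public static String md5Base64(byte[] payload) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      return Base64.getEncoder().encodeToString(md5.digest(payload));
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required of every Java platform implementation
      throw new IllegalStateException(e);
    }
  }

  /** What the receiving side does: recompute the digest and compare. */
  public static boolean matches(String declaredMd5, byte[] received) {
    return declaredMd5.equals(md5Base64(received));
  }

  public static void main(String[] args) {
    byte[] sent = "block data".getBytes(StandardCharsets.UTF_8);
    String declared = md5Base64(sent);

    byte[] corrupted = sent.clone();
    corrupted[0] ^= 0x01; // flip one bit "in flight"

    System.out.println("intact:    " + matches(declared, sent));
    System.out.println("corrupted: " + matches(declared, corrupted));
  }
}
```

A single flipped bit anywhere in the block is enough to make the comparison fail, which is why a failing test here could be a correct detection rather than a bug.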


One thing to consider here is what the retry policy is doing. There is retry 
logic in the upload routine, but did it work? How can we be confident of this?
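
One way to get that confidence, sketched under assumptions: wrap the operation in a retry loop that records how many attempts were made, so a test or a log line can show whether a retry actually happened before the failure surfaced. This is a generic illustration, not the Azure SDK's retry policy; `CountingRetrier` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class CountingRetrier {
  private final int maxAttempts;
  private int attempts;

  CountingRetrier(int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.maxAttempts = maxAttempts;
  }

  /** Runs the operation, retrying on IOException up to maxAttempts times. */
  public <T> T call(Callable<T> operation) throws Exception {
    attempts = 0;
    IOException lastFailure = null;
    while (attempts < maxAttempts) {
      attempts++;
      try {
        return operation.call();
      } catch (IOException e) {
        lastFailure = e; // remember it; retry if attempts remain
      }
    }
    throw lastFailure; // every attempt failed: surface the last failure
  }

  /** Number of attempts made by the most recent call(). */
  public int getAttempts() {
    return attempts;
  }

  public static void main(String[] args) throws Exception {
    CountingRetrier retrier = new CountingRetrier(3);
    int[] calls = {0};
    String result = retrier.call(() -> {
      if (++calls[0] < 3) {
        throw new IOException("simulated MD5 mismatch");
      }
      return "uploaded";
    });
    System.out.println(result + " after " + retrier.getAttempts() + " attempts");
  }
}
```

If a test saw an attempt count of 1 alongside the final failure, that would suggest the retry path was never exercised for this error.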




[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write

2017-09-25 Thread Georgi Chalakov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179511#comment-16179511
 ] 

Georgi Chalakov commented on HADOOP-14906:
--

I am not sure that this is related to the block compaction change. The debug 
message shows no directories in the list for block blobs with compaction. I 
posted the code where we check whether the file is in one of those directories 
and if it is not we skip BlockBlobAppendStream.

2017-09-25 14:47:17,484 DEBUG [JUnit-testReadOOBWrites]: 
azure.AzureNativeFileSystemStore 
(AzureNativeFileSystemStore.java:initialize(550)) - Block blobs with compaction 
directories:  

  if (isBlockBlobWithCompactionKey(key)) {
    BlockBlobAppendStream blockBlobOutputStream = new BlockBlobAppendStream(
        (CloudBlockBlobWrapper) blob,
        keyEncoded,
        this.uploadBlockSizeBytes,
        true,
        getInstrumentedContext());
    outputStream = blockBlobOutputStream;
  } else {
    outputStream = openOutputStream(blob);
  }
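
To illustrate why an empty directory list means the first branch is never taken, here is a simplified stand-in for that kind of membership check. It is not the actual `AzureNativeFileSystemStore` code; `isKeyUnderAnyDirectory` and the paths used are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DirectoryKeyCheckSketch {

  /** True if key is, or lies under, one of the configured directories. */
  public static boolean isKeyUnderAnyDirectory(String key, List<String> directories) {
    for (String dir : directories) {
      String prefix = dir.endsWith("/") ? dir : dir + "/";
      if (key.equals(dir) || key.startsWith(prefix)) {
        return true;
      }
    }
    return false; // an empty list always ends up here
  }

  public static void main(String[] args) {
    List<String> none = Collections.emptyList();
    List<String> configured = Arrays.asList("/hbase/WALs");

    System.out.println(isKeyUnderAnyDirectory("/test/file", none));
    System.out.println(isKeyUnderAnyDirectory("/hbase/WALs/log1", configured));
    System.out.println(isKeyUnderAnyDirectory("/hbase/WALsX/log1", configured));
  }
}
```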




[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write

2017-09-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179043#comment-16179043
 ] 

Steve Loughran commented on HADOOP-14906:
-

Most recent code which has touched this test code (and presumably, the upload 
logic) is HADOOP-14520. [~Georgi]: does this stack trace look familiar?




[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write

2017-09-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179039#comment-16179039
 ] 

Steve Loughran commented on HADOOP-14906:
-

And run with azure log at debug
{code}
2017-09-25 14:47:16,936 INFO  [JUnit-testReadOOBWrites]: impl.MetricsConfig 
(MetricsConfig.java:loadFirst(115)) - loaded properties from 
hadoop-metrics2-azure-file-system.properties
2017-09-25 14:47:16,959 INFO  [JUnit-testReadOOBWrites]: 
impl.MetricsSinkAdapter (MetricsSinkAdapter.java:start(207)) - Sink 
azuretestcollector started
2017-09-25 14:47:17,448 INFO  [JUnit-testReadOOBWrites]: impl.MetricsSystemImpl 
(MetricsSystemImpl.java:startTimer(374)) - Scheduled Metric snapshot period at 
10 second(s).
2017-09-25 14:47:17,449 INFO  [JUnit-testReadOOBWrites]: impl.MetricsSystemImpl 
(MetricsSystemImpl.java:start(191)) - azure-file-system metrics system started
2017-09-25 14:47:17,481 DEBUG [JUnit-testReadOOBWrites]: 
azure.AzureNativeFileSystemStore 
(AzureNativeFileSystemStore.java:configureAzureStorageSession(813)) - 
AzureNativeFileSystemStore init. 
Settings=8,true,90,{3000,3000,3,30},{true,1.0,1.0}
2017-09-25 14:47:17,484 DEBUG [JUnit-testReadOOBWrites]: 
azure.AzureNativeFileSystemStore 
(AzureNativeFileSystemStore.java:initialize(542)) - Page blob directories:  
2017-09-25 14:47:17,484 DEBUG [JUnit-testReadOOBWrites]: 
azure.AzureNativeFileSystemStore 
(AzureNativeFileSystemStore.java:initialize(550)) - Block blobs with compaction 
directories:  
2017-09-25 14:47:17,484 DEBUG [JUnit-testReadOOBWrites]: 
azure.AzureNativeFileSystemStore 
(AzureNativeFileSystemStore.java:initialize(567)) - Atomic rename directories: 
/hbase 
2017-09-25 14:47:17,501 DEBUG [JUnit-testReadOOBWrites]: 
azure.SelfThrottlingIntercept 
(SelfThrottlingIntercept.java:sendingRequest(167)) -  SelfThrottlingIntercept:: 
SendingRequest:   threadId=11, requestType=read , isFirstRequest=true, 
sleepDuration=0
2017-09-25 14:47:17,536 DEBUG [JUnit-testReadOOBWrites]: 
azure.SelfThrottlingIntercept 
(SelfThrottlingIntercept.java:responseReceived(115)) - 
SelfThrottlingIntercept:: ResponseReceived: threadId=11, Status=200, 
Elapsed(ms)=34, ETAG="0x8D5041BEF5D2DAF", contentLength=-1, requestMethod=HEAD
2017-09-25 14:47:17,538 DEBUG [JUnit-testReadOOBWrites]: 
azure.SelfThrottlingIntercept 
(SelfThrottlingIntercept.java:sendingRequest(167)) -  SelfThrottlingIntercept:: 
SendingRequest:   threadId=11, requestType=write, isFirstRequest=true, 
sleepDuration=0
2017-09-25 14:47:17,569 DEBUG [JUnit-testReadOOBWrites]: 
azure.SelfThrottlingIntercept 
(SelfThrottlingIntercept.java:responseReceived(115)) - 
SelfThrottlingIntercept:: ResponseReceived: threadId=11, Status=200, 
Elapsed(ms)=30, ETAG="0x8D5041BEF87544D", contentLength=-1, requestMethod=PUT
2017-09-25 14:47:17,637 DEBUG [pool-1-thread-2]: azure.SelfThrottlingIntercept 
(SelfThrottlingIntercept.java:sendingRequest(167)) -  SelfThrottlingIntercept:: 
SendingRequest:   threadId=18, requestType=write, isFirstRequest=true, 
sleepDuration=0
2017-09-25 14:47:17,637 DEBUG [pool-1-thread-1]: azure.SelfThrottlingIntercept 
(SelfThrottlingIntercept.java:sendingRequest(167)) -  SelfThrottlingIntercept:: 
SendingRequest:   threadId=17, requestType=write, isFirstRequest=true, 
sleepDuration=0
2017-09-25 14:47:22,869 DEBUG [pool-1-thread-2]: azure.SelfThrottlingIntercept 
(SelfThrottlingIntercept.java:responseReceived(115)) - 
SelfThrottlingIntercept:: ResponseReceived: threadId=18, Status=400, 
Elapsed(ms)=5229, ETAG=null, contentLength=405, requestMethod=PUT
2017-09-25 14:47:22,892 INFO  [JUnit-testReadOOBWrites]: 
azure.AbstractWasbTestBase (AbstractWasbTestBase.java:describe(172)) - 

testReadOOBWrites: closing test account and filesystem


java.io.IOException
at 
com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:770)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:443)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:52)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:387)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:384)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.microsoft.azure.storage.StorageException: The MD5 value 
specified in the request did not match with the MD5 value calculated by the 
server.
at 
{code}

[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write

2017-09-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179036#comment-16179036
 ] 

Steve Loughran commented on HADOOP-14906:
-

Also surfaces in {{ITestAzureConcurrentOutOfBandIoWithSecureMode}}
{code}
testReadOOBWrites(org.apache.hadoop.fs.azure.ITestAzureConcurrentOutOfBandIoWithSecureMode)
  Time elapsed: 7.667 sec  <<< ERROR!
java.io.IOException: null
at 
com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:770)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:443)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:52)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:387)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:384)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.microsoft.azure.storage.StorageException: The MD5 value 
specified in the request did not match with the MD5 value calculated by the 
server.
at 
com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
at 
com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:315)
at 
com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:175)
at 
com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:1078)
at 
com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:1050)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:437)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:52)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:387)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:384)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}




[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write

2017-09-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179034#comment-16179034
 ] 

Steve Loughran commented on HADOOP-14906:
-

Stack trace below. The initial exception's message, "null", is a sign that the 
Azure SDK isn't including exception text when it wraps inner exceptions; the 
nested exception is reporting a mismatch between the MD5 sent up in a PUT/POST 
and that calculated at the far end.
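
Until the SDK includes the text itself, a caller could dig the message out of the cause chain when reporting. A minimal sketch using only the standard `Throwable` API; `firstMessage` is a hypothetical helper name, not something in Hadoop or the SDK, and the error text in `main` is abbreviated.

```java
import java.io.IOException;

class CauseMessageSketch {

  /** Returns the first non-empty message found walking from t down its causes. */
  public static String firstMessage(Throwable t) {
    int depth = 0;
    while (t != null && depth < 32) { // bound guards against cyclic chains
      String msg = t.getMessage();
      if (msg != null && !msg.isEmpty()) {
        return msg;
      }
      t = t.getCause();
      depth++;
    }
    return "(no message in cause chain)";
  }

  public static void main(String[] args) {
    // Mimics the shape of the failure here: the outer exception has no text.
    IOException outer = new IOException();
    outer.initCause(new RuntimeException("The MD5 value specified in the request did not match"));

    System.out.println(outer.getMessage()); // the unhelpful outer message
    System.out.println(firstMessage(outer)); // the nested, useful one
  }
}
```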

These tests are being run from a different location/network than usual, in case 
that's relevant; over HTTPS it shouldn't matter. Surfaces in branch-2 and trunk.

{code}
testReadOOBWrites(org.apache.hadoop.fs.azure.ITestAzureConcurrentOutOfBandIo)  
Time elapsed: 8.923 sec  <<< ERROR!
java.io.IOException: null
at 
com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:770)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:443)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:52)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:387)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:384)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.microsoft.azure.storage.StorageException: The MD5 value 
specified in the request did not match with the MD5 value calculated by the 
server.
at 
com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
at 
com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:315)
at 
com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:175)
at 
com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:1078)
at 
com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:1050)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:437)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:52)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:387)
at 
com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:384)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}


