[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write
[ https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180605#comment-16180605 ]

Steve Loughran commented on HADOOP-14906:
-----------------------------------------

[~Georgi]: thanks for looking at this. Although your patch was the last to go near the failing test, the fact that the failure has "gone away" since I moved to a different network location makes me think it is network-infra-related. That could be a sign of an underlying problem, maybe even one common to all apps using the Azure storage SDK: we just got to find it first.

It'd still be nice to know what's going on, or whether there are improvements which can be made to reporting/recovery. Otherwise, I'll think about closing as cannot reproduce for now.

Changing the title to make sure the error text is in it (for easier searching)

> ITestAzureConcurrentOutOfBandIo failing with checksum errors on write
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-14906
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14906
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 2.9.0, 3.1.0
>         Environment: UK BT ADSL connection, 1.8.0_121-b13, azure storage ireland
>            Reporter: Steve Loughran
>
> {{ITestAzureConcurrentOutOfBandIo}} is consistently raising an IOE with the text "The MD5 value specified in the request did not match with the MD5 value calculated by the server"

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write
[ https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180584#comment-16180584 ]

Steve Loughran commented on HADOOP-14906:
-----------------------------------------

Also happened in both parallel and serial test runs, so the problem was not triggered by the parallel test runner of HADOOP-14553.
[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write
[ https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180577#comment-16180577 ]

Steve Loughran commented on HADOOP-14906:
-----------------------------------------

Doesn't occur at other locations.

The one with the problem had
* BT ADSL
* BT wifi base station which never lets you change DNS servers

One without
* BT Fibre-to-the-Cabinet
* DD-WRT base station bonded to Google DNS

Same laptop.

It's possible that these tests are failing because they are correctly detecting corruption of in-flight data.
* I'd only expect that on HTTP connections, not HTTPS,
* unless it was a (transient) problem at Azure storage and/or the laptop.

One thing to consider here is what the retry policy is doing. There is retry logic in the upload routine, but did it work? How can we be confident of this?
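To make the "did the retry work?" question concrete, here is a minimal sketch of the check being discussed: the client declares a Base64 MD5 of the block, the far end recomputes it, and a mismatch triggers another attempt. Everything in it ({{Md5UploadSketch}}, {{serverAccepts}}, {{uploadWithRetry}}, the bit flip standing in for in-flight corruption) is hypothetical and uses only the JDK; it does not call the Azure storage SDK, and whether the SDK's real retry policy treats this 400 response as retryable is not established in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical illustration only; not Hadoop or Azure SDK code. */
final class Md5UploadSketch {
  private Md5UploadSketch() {}

  /** Base64-encoded MD5 of the payload, the form carried in a Content-MD5 header. */
  static String contentMd5(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(data);
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 digest unavailable", e);
    }
  }

  /** Stand-in for the server: recompute the MD5 of what arrived and compare. */
  static boolean serverAccepts(byte[] received, String declaredMd5) {
    return contentMd5(received).equals(declaredMd5);
  }

  /**
   * Stand-in for an upload over a lossy link. The first corruptAttempts sends
   * have one bit flipped in transit. Returns the attempt number that
   * succeeded, or -1 if every attempt was rejected.
   */
  static int uploadWithRetry(byte[] block, int maxAttempts, int corruptAttempts) {
    String declared = contentMd5(block);
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      byte[] onWire = block.clone();
      if (attempt <= corruptAttempts && onWire.length > 0) {
        onWire[0] ^= 0x01; // simulate corruption of in-flight data
      }
      if (serverAccepts(onWire, declared)) {
        return attempt;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] block = "hello".getBytes(StandardCharsets.UTF_8);
    System.out.println("Content-MD5: " + contentMd5(block));
    System.out.println("clean link, attempt used: " + uploadWithRetry(block, 3, 0));
    System.out.println("two bad sends, attempt used: " + uploadWithRetry(block, 3, 2));
    System.out.println("always bad, result: " + uploadWithRetry(block, 3, 3));
  }
}
```

Returning the attempt number rather than a boolean is the point of the sketch: a test or log line that records how many attempts were used is one way to be confident the retry path actually ran.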
[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write
[ https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179511#comment-16179511 ]

Georgi Chalakov commented on HADOOP-14906:
------------------------------------------

I am not sure that this is related to the block compaction change. The debug message shows no directories in the list for block blobs with compaction. I posted the code where we check whether the file is in one of those directories; if it is not, we skip BlockBlobAppendStream.

2017-09-25 14:47:17,484 DEBUG [JUnit-testReadOOBWrites]: azure.AzureNativeFileSystemStore (AzureNativeFileSystemStore.java:initialize(550)) - Block blobs with compaction directories:

    if (isBlockBlobWithCompactionKey(key)) {
      BlockBlobAppendStream blockBlobOutputStream = new BlockBlobAppendStream(
          (CloudBlockBlobWrapper) blob,
          keyEncoded,
          this.uploadBlockSizeBytes,
          true,
          getInstrumentedContext());
      outputStream = blockBlobOutputStream;
    } else {
      outputStream = openOutputStream(blob);
    }
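As a rough illustration of why an empty directory list means the compaction stream is never chosen, the following sketch shows one way a "is this key under one of the configured directories" check could behave. The class and method names ({{CompactionDirSketch}}, {{isKeyInDirectorySet}}) and the prefix-matching rules are assumptions made for this example; the actual {{isBlockBlobWithCompactionKey}} logic in {{AzureNativeFileSystemStore}} may differ.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical illustration only; not the AzureNativeFileSystemStore code. */
final class CompactionDirSketch {
  private CompactionDirSketch() {}

  /**
   * Returns true if the key lies under any directory in the set.
   * Assumes directories are configured as absolute paths such as "/hbase".
   */
  static boolean isKeyInDirectorySet(String key, Set<String> dirs) {
    String path = key.startsWith("/") ? key : "/" + key;
    for (String dir : dirs) {
      String d = dir.trim();
      if (d.isEmpty()) {
        continue; // ignore blank configuration entries
      }
      if (d.equals("/")) {
        return true; // the root directory covers every key
      }
      // Compare against "dir/" so "/compact" does not match "/compactor/x".
      String prefix = d.endsWith("/") ? d : d + "/";
      if (path.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Set<String> none = Collections.emptySet();
    // With no directories configured, as in the debug log, nothing matches.
    System.out.println(isKeyInDirectorySet("/test/file1", none));
    System.out.println(isKeyInDirectorySet("/compact/a", Collections.singleton("/compact")));
  }
}
```

With an empty set the loop body never runs, so the check is false for every key and the {{else}} branch ({{openOutputStream(blob)}}) is taken.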
[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write
[ https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179043#comment-16179043 ]

Steve Loughran commented on HADOOP-14906:
-----------------------------------------

Most recent code which has touched this test code (and presumably, the upload logic) is HADOOP-14520. [~Georgi]: does this stack trace look familiar?
[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write
[ https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179039#comment-16179039 ]

Steve Loughran commented on HADOOP-14906:
-----------------------------------------

And run with azure log at debug

{code}
2017-09-25 14:47:16,936 INFO [JUnit-testReadOOBWrites]: impl.MetricsConfig (MetricsConfig.java:loadFirst(115)) - loaded properties from hadoop-metrics2-azure-file-system.properties
2017-09-25 14:47:16,959 INFO [JUnit-testReadOOBWrites]: impl.MetricsSinkAdapter (MetricsSinkAdapter.java:start(207)) - Sink azuretestcollector started
2017-09-25 14:47:17,448 INFO [JUnit-testReadOOBWrites]: impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(374)) - Scheduled Metric snapshot period at 10 second(s).
2017-09-25 14:47:17,449 INFO [JUnit-testReadOOBWrites]: impl.MetricsSystemImpl (MetricsSystemImpl.java:start(191)) - azure-file-system metrics system started
2017-09-25 14:47:17,481 DEBUG [JUnit-testReadOOBWrites]: azure.AzureNativeFileSystemStore (AzureNativeFileSystemStore.java:configureAzureStorageSession(813)) - AzureNativeFileSystemStore init. Settings=8,true,90,{3000,3000,3,30},{true,1.0,1.0}
2017-09-25 14:47:17,484 DEBUG [JUnit-testReadOOBWrites]: azure.AzureNativeFileSystemStore (AzureNativeFileSystemStore.java:initialize(542)) - Page blob directories:
2017-09-25 14:47:17,484 DEBUG [JUnit-testReadOOBWrites]: azure.AzureNativeFileSystemStore (AzureNativeFileSystemStore.java:initialize(550)) - Block blobs with compaction directories:
2017-09-25 14:47:17,484 DEBUG [JUnit-testReadOOBWrites]: azure.AzureNativeFileSystemStore (AzureNativeFileSystemStore.java:initialize(567)) - Atomic rename directories: /hbase
2017-09-25 14:47:17,501 DEBUG [JUnit-testReadOOBWrites]: azure.SelfThrottlingIntercept (SelfThrottlingIntercept.java:sendingRequest(167)) - SelfThrottlingIntercept:: SendingRequest: threadId=11, requestType=read , isFirstRequest=true, sleepDuration=0
2017-09-25 14:47:17,536 DEBUG [JUnit-testReadOOBWrites]: azure.SelfThrottlingIntercept (SelfThrottlingIntercept.java:responseReceived(115)) - SelfThrottlingIntercept:: ResponseReceived: threadId=11, Status=200, Elapsed(ms)=34, ETAG="0x8D5041BEF5D2DAF", contentLength=-1, requestMethod=HEAD
2017-09-25 14:47:17,538 DEBUG [JUnit-testReadOOBWrites]: azure.SelfThrottlingIntercept (SelfThrottlingIntercept.java:sendingRequest(167)) - SelfThrottlingIntercept:: SendingRequest: threadId=11, requestType=write, isFirstRequest=true, sleepDuration=0
2017-09-25 14:47:17,569 DEBUG [JUnit-testReadOOBWrites]: azure.SelfThrottlingIntercept (SelfThrottlingIntercept.java:responseReceived(115)) - SelfThrottlingIntercept:: ResponseReceived: threadId=11, Status=200, Elapsed(ms)=30, ETAG="0x8D5041BEF87544D", contentLength=-1, requestMethod=PUT
2017-09-25 14:47:17,637 DEBUG [pool-1-thread-2]: azure.SelfThrottlingIntercept (SelfThrottlingIntercept.java:sendingRequest(167)) - SelfThrottlingIntercept:: SendingRequest: threadId=18, requestType=write, isFirstRequest=true, sleepDuration=0
2017-09-25 14:47:17,637 DEBUG [pool-1-thread-1]: azure.SelfThrottlingIntercept (SelfThrottlingIntercept.java:sendingRequest(167)) - SelfThrottlingIntercept:: SendingRequest: threadId=17, requestType=write, isFirstRequest=true, sleepDuration=0
2017-09-25 14:47:22,869 DEBUG [pool-1-thread-2]: azure.SelfThrottlingIntercept (SelfThrottlingIntercept.java:responseReceived(115)) - SelfThrottlingIntercept:: ResponseReceived: threadId=18, Status=400, Elapsed(ms)=5229, ETAG=null, contentLength=405, requestMethod=PUT
2017-09-25 14:47:22,892 INFO [JUnit-testReadOOBWrites]: azure.AbstractWasbTestBase (AbstractWasbTestBase.java:describe(172)) - testReadOOBWrites: closing test account and filesystem
java.io.IOException
    at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:770)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:443)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:52)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:387)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:384)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.microsoft.azure.storage.StorageException: The MD5 value specified in the request did not match with the MD5 value calculated by the server.
    at
[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write
[ https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179036#comment-16179036 ]

Steve Loughran commented on HADOOP-14906:
-----------------------------------------

Also surfaces in {{ITestAzureConcurrentOutOfBandIoWithSecureMode}}

{code}
testReadOOBWrites(org.apache.hadoop.fs.azure.ITestAzureConcurrentOutOfBandIoWithSecureMode) Time elapsed: 7.667 sec <<< ERROR!
java.io.IOException: null
    at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:770)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:443)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:52)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:387)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:384)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.microsoft.azure.storage.StorageException: The MD5 value specified in the request did not match with the MD5 value calculated by the server.
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
    at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:315)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:175)
    at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:1078)
    at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:1050)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:437)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:52)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:387)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:384)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
[jira] [Commented] (HADOOP-14906) ITestAzureConcurrentOutOfBandIo failing with checksum errors on write
[ https://issues.apache.org/jira/browse/HADOOP-14906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179034#comment-16179034 ]

Steve Loughran commented on HADOOP-14906:
-----------------------------------------

Stack. The initial exception, "null", is a sign that the Azure SDK isn't including exception text when it wraps inner exceptions; the nested exception is reporting a mismatch between the MD5 sent up in a PUT/POST and the one calculated at the far end.

These tests are being run in a different location/network from usual, in case that's likely to interfere: over HTTPS it shouldn't. Surfaces in branch-2 and trunk.

{code}
testReadOOBWrites(org.apache.hadoop.fs.azure.ITestAzureConcurrentOutOfBandIo) Time elapsed: 8.923 sec <<< ERROR!
java.io.IOException: null
    at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:770)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:443)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:52)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:387)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:384)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.microsoft.azure.storage.StorageException: The MD5 value specified in the request did not match with the MD5 value calculated by the server.
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
    at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:315)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:175)
    at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:1078)
    at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:1050)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.writeBlock(BlobOutputStreamInternal.java:437)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal.access$000(BlobOutputStreamInternal.java:52)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:387)
    at com.microsoft.azure.storage.blob.BlobOutputStreamInternal$1.call(BlobOutputStreamInternal.java:384)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
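On the reporting side, one possible mitigation for the "java.io.IOException: null" top-level message is to walk the cause chain and surface the first non-empty message. The sketch below is JDK-only; {{RootCauseMessages}} and {{firstMessage}} are hypothetical names, and the way the wrapper exception is built in {{main}} (no-argument constructor plus {{initCause}}) is an assumption chosen because it reproduces a null message, not a statement of what the SDK does internally.

```java
import java.io.IOException;

/** Hypothetical illustration only; not part of Hadoop or the Azure SDK. */
final class RootCauseMessages {
  private RootCauseMessages() {}

  /**
   * Returns the first non-empty message found while walking the cause chain,
   * falling back to the class name of the outermost exception.
   */
  static String firstMessage(Throwable t) {
    int depth = 0;
    // The depth limit guards against a pathological cyclic cause chain.
    for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
      String message = c.getMessage();
      if (message != null && !message.isEmpty()) {
        return message;
      }
    }
    return t.getClass().getName();
  }

  public static void main(String[] args) {
    Exception inner = new Exception(
        "The MD5 value specified in the request did not match with the MD5 value calculated by the server.");
    IOException outer = new IOException(); // message is null, as in the trace
    outer.initCause(inner);
    System.out.println("outer message: " + outer.getMessage());
    System.out.println("surfaced text: " + firstMessage(outer));
  }
}
```

A wrapper built with {{new IOException(message, cause)}} would carry the text directly; the helper is only useful where the wrapping code cannot be changed.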