[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-24 Thread Da Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626277#comment-16626277
 ] 

Da Zhou commented on HADOOP-15407:
--

Hi, since HADOOP-15407 is merged to trunk, should we move all the 
unfinished/ongoing sub tasks to  HADOOP-15763 Über-JIRA: abfs phase II?

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624927#comment-16624927
 ] 

Hudson commented on HADOOP-15407:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15037 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15037/])
HADOOP-15407. HADOOP-15540. Support Windows Azure Storage - Blob file (tmarq: 
rev f044deedbbfee0812316d587139cb828f27172e9)
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferWorker.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractSeek.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/utils/package-info.java
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/FileSystemOperationUnhandledException.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/InvalidUriException.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestTracingServiceImpl.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/diagnostics/package-info.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/diagnostics/BooleanConfigurationBasicValidator.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/diagnostics/package-info.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/Abfs.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/diagnostics/IntegerConfigurationBasicValidator.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/AbfsHttpConstants.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractMkdir.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/services/AbfsHttpService.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/constants/TestConfigurationKeys.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemRandomRead.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemAppend.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemCopy.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAzureBlobFileSystemBasics.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/package-info.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/package.html
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemUriSchemes.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractRootDirectory.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/ServiceResolutionException.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemOpen.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/SecureAzureBlobFileSystem.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBuffer.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractCreate.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemFileStatus.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/MockAbfsServiceInjectorImpl.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/annotations/package-info.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/services/ReadBufferStatus.java
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/TimeoutException.java
* (edit) hadoop-tools/hadoop-azure/src/test/resources/log4j.properties
* (add) 
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/services/ListResultSchema.java
* (add) 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContract.java
* (add) 
h

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-18 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619161#comment-16619161
 ] 

Sean Mackrory commented on HADOOP-15407:


I'm comfortable with that. The changes outside of the module (and even outside 
of the new package in the module) are very few in number and they're quite 
trivial. I've done some testing with MapReduce, Hive and Spark (not on the 
exact current branch state, but recently with most of it) and all issues found 
have been fixed.

{quote}However vote + merge needs another week time as per my 
understanding{quote}

According to the by-laws (https://hadoop.apache.org/bylaws.html), we can merge 
as soon as we have 3 +1's from active committers and there's consensus. In this 
case, the 7 day convention is a courtesy. Once we have 3 +1's, if we'd like to 
wrap it up I think it's reasonable to ask if anybody would like more time (like 
the weekend) to evaluate it and consider consensus reached if nobody says so.

{quote}
Consensus approval of active committers, but with a minimum of one +1. The code 
can be committed after the first +1, unless the code change represents a merge 
from a branch, in which case three +1s are required.
..
Votes relating to code changes are not subject to a strict timetable but should 
be made as timely as possible.
{quote}


> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have use

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-17 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618542#comment-16618542
 ] 

Sunil Govindan commented on HADOOP-15407:
-

Hi [~ste...@apache.org] [~tmarquardt] [~mackrorysd] [~DanielZhou]

Thanks for closing many items in this and starting Vote thread.

You have mentioned in mail thread that this feature can work independently and 
has no impacts to other module. However vote + merge needs another week time as 
per my understanding. Could you please confirm the stability of this feature 
and any potential risk in going with 3.2 release train ?

Because we were planning for 3.2 branch cut this week and we might need to 
delay this for a week atleast to get this feature in. As long as you feel the 
feature is functional (barring improvements for next dot releases), has no 
major impacts and could merge in a week, I am fine in getting this in.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, 

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-17 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618227#comment-16618227
 ] 

Steve Loughran commented on HADOOP-15407:
-

bq. I have not seen the HADOOP-15761  failure, but I'm fine with updating ABFS 
to not use regex, or whatever it takes to make it robust.  Someone who is able 
to reproduce the failure should fix it. Maybe I don't see it because I don't 
have OpenSSL?

I'm not worried about it; I saw it on an IDE run of all the abfs tests. I don't 
see it on standalone test runs, which makes me think its some state preserved 
from a previous test.

I've just created HADOOP-15763 for all those followup issues. I think 
HADOOP-15761 is in that category.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and 

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-17 Thread Thomas Marquardt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618204#comment-16618204
 ] 

Thomas Marquardt commented on HADOOP-15407:
---

I was able to successfully rebase on trunk.  The test results look good for 
hadoop-azure and hadoop-common.  I force pushed the changes to the HADOOP-15407 
branch.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-17 Thread Thomas Marquardt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618111#comment-16618111
 ] 

Thomas Marquardt commented on HADOOP-15407:
---

Ok, it is good to know that the rebase was a success for you.  I am still 
running tests, but so far, so good.  I will force push if all the tests pass 
and infra allows me to do so. 

I have not seen the HADOOP-15761 failure, but I'm fine with updating ABFS to 
not use regex, or whatever it takes to make it robust.  Someone who is able to 
reproduce the failure should fix it. Maybe I don't see it because I don't have 
OpenSSL?

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-17 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618086#comment-16618086
 ] 

Steve Loughran commented on HADOOP-15407:
-

I've just done a rebase & retest locally, only one issue w.r.t abfs testing: 
HADOOP-15761, and that's nothing serious.

I don't know if you'll be able to force push the rebased branch up as infra 
like to lock that down to stop anyone forcing up some rollback of branches.

Try it —If  it doesn't take, file a JIRA on the INFRA  project asking the 
branch to support it.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-17 Thread Thomas Marquardt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618070#comment-16618070
 ] 

Thomas Marquardt commented on HADOOP-15407:
---

I am going to rebase branch HADOOP-15407 on the latest trunk today.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-17 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617237#comment-16617237
 ] 

Sunil Govindan commented on HADOOP-15407:
-

Thanks [~DanielZhou], I will move to 3.3 for now as target version.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-15 Thread Da Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616590#comment-16616590
 ] 

Da Zhou commented on HADOOP-15407:
--

Hi [~sunilg], sorry for this late reply, we were not able to make it before 
15th Sept. We are still working on resolving the rest JIRAs, once it is in good 
shape, we will call out for a discuss /vote.
Thanks.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-13 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614209#comment-16614209
 ] 

Sunil Govindan commented on HADOOP-15407:
-

Hi [~DanielZhou] and [~ste...@apache.org]

Apart from few open issues, 4 issues are in Patch Available state. It seems 
there are closer to commit. However since this work is in branch, we need to 
call out for a discuss/vote. Given 3.2 is very close (code freeze by 15th 
Sept), could you please let me know how much more time require here.

Thank You.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscri

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-09-01 Thread Thomas Marquardt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599829#comment-16599829
 ] 

Thomas Marquardt commented on HADOOP-15407:
---

[~ste...@apache.org], the 403 errors are happening because your storage account 
key was updated.  Please try the new key I sent you.

Also, make sure you have the latest sources and refer to the "Testing the Azure 
ABFS Client" section of testing_azure.md.  The config was updated recently by 
HADOOP-15663.

In a nutshell, add the following to src/test/resources/azure-auth-keys.xml:

{noformat}


http://www.w3.org/2001/XInclude";>
  
fs.azure.abfs.account.name
{ACCOUNT_NAME}.dfs.core.windows.net
  

  
fs.azure.account.key.{ACCOUNT_NAME}.dfs.core.windows.net
{ACCOUNT_ACCESS_KEY}
  

  
fs.azure.wasb.account.name
{ACCOUNT_NAME}.blob.core.windows.net
  
  
  
fs.azure.account.key.{ACCOUNT_NAME}.blob.core.windows.net
{ACCOUNT_ACCESS_KEY}
  

  
fs.contract.test.fs.abfs
abfs://{CONTAINER_NAME}@{ACCOUNT_NAME}.dfs.core.windows.net
A file system URI to be used by the contract 
tests.
  

  
fs.contract.test.fs.wasb
wasb://{CONTAINER_NAME}@{ACCOUNT_NAME}.blob.core.windows.net
A file system URI to be used by the contract 
tests.
  

{noformat}


> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HD

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-08-31 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599045#comment-16599045
 ] 

Steve Loughran commented on HADOOP-15407:
-

I'm still hoping to get things stable while its easier to do faster review & 
big rework cycles. 

I've been updating & trying to play with adding support for this in my 
downstream-of-spark tests 
(https://github.com/hortonworks-spark/cloud-integration) . While not a "real" 
integration suite, it works well to pick up classpath integration issues, and 
where spark's expectation of a store doesn't match what is there (e.g: 
partitioning, seek, rename). 

Currently: getting 403 errors, which implies that somehow my current config has 
broken, or there's been some remote change. Filed HADOOP-15710 for mapping 403 
-> AccessDeniedException; we should also make sure that files you can't read 
also have a similar exception.

I've now added special support for abfs in cloudstore
https://github.com/steveloughran/cloudstore/releases/tag/tag_2018-08-31-release 
, but not worked out what's up. I do think I have the valid settings.

I'm worried that if I can't get it to work, we're not ready to drop it on 
unsuspecting users


> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS 

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-08-31 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598962#comment-16598962
 ] 

Sean Mackrory commented on HADOOP-15407:


[~ste...@apache.org] What are your thoughts on starting a merge vote once the 
work Thomas listed is all resolved? I have one patch in flight, but nothing I'm 
working on impacts stuff outside this module, so I'm happy to proceed with it 
post-merge.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h.

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-08-29 Thread Da Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596884#comment-16596884
 ] 

Da Zhou commented on HADOOP-15407:
--

Yes, [~sunilg], it will be ready. We are now working on  JIRAS mentioned by 
[~tmarquardt] in his last comment, most of them are already under review.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-08-28 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595909#comment-16595909
 ] 

Sunil Govindan commented on HADOOP-15407:
-

[~DanielZhou] [~ste...@apache.org]

Thanks for the work on this. As many of the subtasks are done, is this feature 
change ready to go for a release (3.2 which is planned for next month end)

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-08-28 Thread Thomas Marquardt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595246#comment-16595246
 ] 

Thomas Marquardt commented on HADOOP-15407:
---

Thanks Sean, we would like to merge to trunk soon.  I plan to upload a patch 
for HADOOP-15582 (documentation) this week, and HADOOP-15663 (simplify 
configuration) is in review so should also complete this week. I have not had a 
chance to look at HADOOP-15666 yet, but we should be able to do that this week 
too. We are also looking to improve test paralleization and add the ability to 
select WASB, ABFS, or both when running tests (HADOOP-15664). I expect all this 
to be done this week.

If the community has any additional requests before the merge to trunk, lets 
discuss them here.  It would be great if we could start voting on the merge.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-08-24 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591812#comment-16591812
 ] 

genericqa commented on HADOOP-15407:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} HADOOP-15407 does not apply to HADOOP-15407. Rebase required? 
Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-15407 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12927853/HADOOP-15407-HADOOP-15407-008.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/15090/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmar

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-08-24 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591800#comment-16591800
 ] 

Sean Mackrory commented on HADOOP-15407:


Hey I wanted to ask what everyone else's thoughts are on when we intend to 
merge this. Several of the recent changes and much of what remains would be 
fine as independent, incremental developments as trunk. The service is not GA, 
yet, but it sounds like there's already a strong need for compatibility in the 
clients (HADOOP-15546). 

HADOOP-15544, HADOOP-15582, HADOOP-15663 seem like good things to do pre-merge. 
I know HADOOP-15666 is supposed to work with the Namespace Service enabled 
prior to GA but doesn't now, and that would be good to resolve one way or 
another. Everything else that I'm aware of seems like it doesn't require 
maintaining this in a separate branch.

Thoughts?

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Blocker
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties 

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-07-05 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534230#comment-16534230
 ] 

Sean Mackrory commented on HADOOP-15407:


As I understand it, the initial commit in HADOOP-15540 covers everything 
referenced by [~esmanii]'s issues that are connected to this, correct? If so, I 
can go ahead and close those all as duplicates.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-15 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514119#comment-16514119
 ] 

Steve Loughran commented on HADOOP-15407:
-

OK committed patch 008. under the JIRA. HADOOP-15540; which I've now closed

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-15 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514039#comment-16514039
 ] 

Steve Loughran commented on HADOOP-15407:
-

I've just added abfs support in the storediag feature in my [little cloudstore 
project](https://github.com/steveloughran/cloudstore/releases/tag/tag_2018_06_15_release);
 knows about all the dependencies and scans for them all before trying to do 
any operations on the FS.

Not done: anything related to credentials or probing the endpoint prior to 
instantiating the FS; the kind of thing you need for the next stage of fault 
diagnostics

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#7600

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513094#comment-16513094
 ] 

genericqa commented on HADOOP-15407:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 55 new or modified test 
files. {color} |
|| || || || {color:brown} HADOOP-15407 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  6m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
58s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 33m 
45s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 22m 
45s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 25m 
18s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
5s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
36s{color} | {color:green} HADOOP-15407 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
7s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m 
18s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}153m 55s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}345m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID |
|   | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps |
|   | 
hadoop.yarn.server.timelineservice

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Thomas Marquardt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512784#comment-16512784
 ] 

Thomas Marquardt commented on HADOOP-15407:
---

The plan sounds good to me.

Credit for this work goes to (hope I don't forget anyone): Steve Loughran, 
Shane Mainali, {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, 
Esfandiar Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, 
Saurabh Pant, and James Baker. {color}
{color:#212121} {color}

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsub

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512773#comment-16512773
 ] 

Steve Loughran commented on HADOOP-15407:
-

Now, regarding JIRAs & things. What JIRA to put into the git commit message, 
and what JIRA to close.

Here's my proposal

# This is the Uber-JIRA, stays open until branch is merged in
# HADOOP-15432 is renamed "Core ABFS module"; the initial patch commit message 
will reference that and include this JIRA too, e.g. HADOOP-15407/HADOOP-15542 
...
# Credit in patch to: Esfandiar, Thomas, Da Zhou. Let me know who has 
contributed code to it & they should be named too.
# The other JIRAs we have here evolve from the initial code submission to more 
one of review of modules/classes, ideally scoped well, e.g.

* imports, javadocs & IDE complaints ( I have this)
* fix all outstanding javadoc issues
* configuration: model names, docs, XML values in core-default, etc
* output stream  code review
* input stream (including  ReadBuffer logic)
* General FS Semantics
* Docs



> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch, 
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch, 
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for q

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512486#comment-16512486
 ] 

Steve Loughran commented on HADOOP-15407:
-

Let's not worry about those failures; they are unrelated. It may be they've 
already been fixed in trunk, it's just that this branch hasn't got the fix.

I'm about to update the HADOOP-15407 branch to where trunk is, push that up and 
then apply the patch you have here; resubmit that as a patch 009 & if jenkins 
and my test happy, merge it in as the first initial patch. Trunk has moved up 
to 7.0.0 of the azure SDK, so there's a bit of a merge conflict with this patch 
and that, and a risk of functionality conflict: we need that SDK update in the 
branch.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch, 
> HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-12 Thread Da Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510403#comment-16510403
 ] 

Da Zhou commented on HADOOP-15407:
--

The following *independent* unit tests failed in latest Jenkins build:
 - hadoop.fs.shell.TestCopyFromLocal
 - hadoop.crypto.key.TestKeyShell
 - hadoop.crypto.key.TestKeyProviderFactory

Because all Azure changes goes under project hadoop-azure, so these unit test 
failure cannot be caused by the patch.
 Can someone help? 

Regards,
 Da 
  

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Da Zhou
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch, 
> HADOOP-15407-HADOOP-15407.008.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail:

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-12 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510371#comment-16510371
 ] 

genericqa commented on HADOOP-15407:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 55 new or modified test 
files. {color} |
|| || || || {color:brown} HADOOP-15407 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  6m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
36s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
38s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
27s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 20m 
39s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
34m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
28s{color} | {color:green} HADOOP-15407 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 20m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
7s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 45s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}219m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.fs.shell.TestCopyFromLocal |
|   | hadoop.crypto.key.TestKeyShell |
|   | hadoop.crypto.key.TestKeyProviderFactory |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15407 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12927526/HADOOP-15407-HADOOP-15407.008.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvn

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-12 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510334#comment-16510334
 ] 

genericqa commented on HADOOP-15407:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 55 new or modified test 
files. {color} |
|| || || || {color:brown} HADOOP-15407 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  6m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
30s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m  
8s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
17s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 19m 
54s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
33m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m  
1s{color} | {color:green} HADOOP-15407 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 19m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
7s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m  9s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}219m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.fs.shell.TestCopyFromLocal |
|   | hadoop.crypto.key.TestKeyShell |
|   | hadoop.crypto.key.TestKeyProviderFactory |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15407 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12927516/HADOOP-15407-HADOOP-15407.008.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvn

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-11 Thread Da Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509107#comment-16509107
 ] 

Da Zhou commented on HADOOP-15407:
--

Submitting HADOOP-15407-HADOOP-15407.008.patch, all ABFS tests passed against 
my storage account in west US.

Updates in the patch:
 - Resolved findbugs violations
 - Resolved checkstyle violations
 - Added missing java docs
 - Updated AzureBlobFileSystemException as a subclass of IOE and updated 
exception check
 - Reinstate wasb contract tests to parallel runs, enable the parallel runs for 
ABFS contract tests
 - Updated names of Service injection interface and impl with Azure specific 
names
 - Replaced loggingService with SL4J

{noformat}
mvn -T 1C -Dparallel-tests -DtestsThreadCount=8 clean verify

[INFO] --- maven-antrun-plugin:1.7:run (create-parallel-tests-dirs) @ 
hadoop-azure ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/1
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/2
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/3
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/4
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/5
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/6
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/7
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/8
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/1
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/2
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/3
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/4
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/5
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/6
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/7
[mkdir] Created dir: 
/home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/8
[INFO] Executed tasks
[INFO]
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-azure ---
[INFO]
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.fs.azure.TestShellDecryptionKeyProvider
[INFO] Running org.apache.hadoop.fs.azure.TestBlobMetadata
[INFO] Running org.apache.hadoop.fs.azure.TestWasbFsck
[INFO] Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked
[INFO] Running org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations
[INFO] Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemUploadLogic
[INFO] Running org.apache.hadoop.fs.azure.TestClientThrottlingAnalyzer
[INFO] Running 
org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked
[WARNING] Tests run: 3, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 1.098 
s - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemUploadLogic
[WARNING] Tests run: 2, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 1.934 
s - in org.apache.hadoop.fs.azure.TestShellDecryptionKeyProvider
[INFO] Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency
[INFO] Running 
org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked
[WARNING] Tests run: 2, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 8.097 
s - in org.apache.hadoop.fs.azure.TestWasbFsck
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.808 s 
- in org.apache.hadoop.fs.azure.TestBlobMetadata
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.033 s 
- in org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations
[INFO] Running 
org.apache.hadoop.fs.azure.metrics.TestNativeAzureFileSystemMetricsSystem
[INFO] Running org.apache.hadoop.fs.azure.metrics.TestBandwidthGaugeUpdater
[INFO] Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAuthorization
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.308 s 
- in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.594 s 
- in org.apache.hadoop.fs.

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-09 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507132#comment-16507132
 ] 

Steve Loughran commented on HADOOP-15407:
-

followup: got this working from the hadoop fs command, though we need to 
understand/sort out the packaging there

* the ASF distro doesn't put the hadoop-azure connector & deps into 
hadoop-common, but into hadoop-tools lib, and that doesn't get onto the CP for 
the {{hadoop fs}} command. Though the {{hadoop fs s3a://}} operations do work 
out the box. I need to understand more of what goes on there.
* even with the hadoop-azure, azure-sdk & htrace artifacts copied from hadoop 
tools to hadoop common lib I got a CNFE for htrace

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/htrace/fasterxml/jackson/core/JsonProcessingException
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getDeclaredConstructors(Class.java:2020)
at 
com.google.inject.spi.InjectionPoint.forConstructorOf(InjectionPoint.java:245)
at 
com.google.inject.internal.ConstructorBindingImpl.create(ConstructorBindingImpl.java:99)
at 
com.google.inject.internal.InjectorImpl.createUninitializedBinding(InjectorImpl.java:658)
at 
com.google.inject.internal.InjectorImpl.createJustInTimeBinding(InjectorImpl.java:882)
at 
com.google.inject.internal.InjectorImpl.createJustInTimeBindingRecursive(InjectorImpl.java:805)
at 
com.google.inject.internal.InjectorImpl.getJustInTimeBinding(InjectorImpl.java:282)
at 
com.google.inject.internal.InjectorImpl.getBindingOrThrow(InjectorImpl.java:214)
at 
com.google.inject.internal.InjectorImpl.getInternalFactory(InjectorImpl.java:890)
at com.google.inject.internal.FactoryProxy.notify(FactoryProxy.java:46)
at 
com.google.inject.internal.ProcessedBindingData.runCreationListeners(ProcessedBindingData.java:50)
at 
com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:134)
at 
com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
at com.google.inject.Guice.createInjector(Guice.java:96)
at com.google.inject.Guice.createInjector(Guice.java:73)
at com.google.inject.Guice.createInjector(Guice.java:62)
at 
org.apache.hadoop.fs.azurebfs.services.ServiceProviderImpl.(ServiceProviderImpl.java:43)
at 
org.apache.hadoop.fs.azurebfs.services.ServiceProviderImpl.create(ServiceProviderImpl.java:60)
at 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:102)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
at 
org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
Caused by: java.lang.ClassNotFoundException: 
org.apache.htrace.fasterxml.jackson.core.JsonProcessingException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}

Had to  go 
{code}
cp ./share/hadoop/yarn/timelineservice/lib/htrace-core-3.1.0-incubating.jar 
share/hadoop/common/lib
{code}

I don't get this as the dependencies are set up (its a 'compile' dep), so not 
sure why it isn't ending up in hadoop tools lib. Again, clearly something 
packaging related.

I'm not sure how to test this stuff except manually; we don't have any 
integration tests of the actual packaged code (yet), though its probably 
possible via some new test suite in the hadoop-dist packaging. Or something 
downstream/nearby. I know all the hadoop -stack-vendors will have some tests 
for this, but its testing their packaging, not the b

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-07 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505018#comment-16505018
 ] 

Steve Loughran commented on HADOOP-15407:
-

h1. First attempt at testing

I found it very hard to get up and running. As in: I have one of the contract 
tests going, but nothing else yet. 

The testing docs will need to explain how to get started. The easier the setup 
process, the easier writing those docs become

I'd make writing that doc priority, as without that, getting the tests working 
will be a blocker to reviews.

Key things I had trouble with

* the difference between the wasb & dfs accounts
* what's needed in terms of pre-test store container setup. I think it's 
happening
automatically, but that's probably repeating the same problem we see with wasb: 
container leakage & the
need to periodically purge them all.  If that's the case, a new version 
of {{org.apache.hadoop.fs.azure.integration.CleanupTestContainers}} is needed, 
and
again, the docs.
* Lack of meaningful details on why a test setup failed other than "skipped". 
The attached patch
addresses that by including a message in the Assume clause. (side-note: I 
expect meaningful messages in
*all* Assume.assume clauses, as I try to do in my own contribs). 


I tried to get {{ITestAzureBlobFileSystemMkDir}} up and working and didn't get 
that far, timeouts.


Every test needs a timeout. This is to avoid messages like

{code}
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test
   (default-test) on project hadoop-azure: There was a timeout or other error 
in the fork -> [Help 1]
{code}

When maven kills a test, all the output is lost, and, as all test teardown is 
skipped,
things on remote stores left in a mess.

I've added one to {{DependencyInjectedTest}} where it will be found everywhere

This shows me what's hanging. I'm assuming its still test setup related, so
will look at my config options more. But the fact things are timing out
if the tests are misconfigured is a problem on its own.


{code}
"Thread-0" #13 prio=5 os_prio=31 tid=0x7f97061b3000 nid=0x5803 waiting on 
condition [0x7f8ac000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:255)
at 
com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:769)
at 
com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:756)
at 
org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobContainerWrapperImpl.exists(StorageInterfaceImpl.java:233)
at 
org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:856)
at 
org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1081)
at 
org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:538)
at 
org.apache.hadoop.fs.azurebfs.DependencyInjectedTest.initialize(DependencyInjectedTest.java:132)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}


# I don't think falling back to anon should happen, at least with tests.
# I absolutely don't think that login failures should be something you retry on.

I see there's a call to {{suppressRetryPolicyInClientIfNeeded();}}, so
the tests need to make sure that's running. I think production-side code
needs to look at the auth codepath and make sure that its operations are all
fail fast.

Proposed: add a test for this. Create a config, remove the auth, try to do
anon access to your test containers. Expect it to fail fast.

Other aspects of that test: LambdaTestUtils.intercept() loves closures which 
return things other than void: the string value of the res

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-07 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504683#comment-16504683
 ] 

Steve Loughran commented on HADOOP-15407:
-

[~fabbri], [~esmanii] the hadoop project needs to think about what to do w.r.t 
htrace in future, given its not going to leave incubation. HDFS uses, so I'm 
not worried about it being added as a dependency here, but it's not going to 
get any maintenance unless we think about co-opting it into the client-side 
tracing into own codebase, leaving log collection to other tools. With Todd and 
Colin Patrick McCabe on our committer list, we could make a case for that. 

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-06 Thread Esfandiar Manii (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503951#comment-16503951
 ] 

Esfandiar Manii commented on HADOOP-15407:
--

Thanks [~fabbri]. Yes, htrace was very useful (credits to Steve for letting us 
know to use it :) ). 

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-06 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503693#comment-16503693
 ] 

Aaron Fabbri commented on HADOOP-15407:
---

Thanks.. was still reading through latest patch. Just don't want "read only" 
code that can only be reliably changed by vendors.  Glad to see some use of 
htrace. Did you guys find it useful during development?

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-06 Thread Thomas Marquardt (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503686#comment-16503686
 ] 

Thomas Marquardt commented on HADOOP-15407:
---

We updated the patch recently and removed all the generated code.  We will 
begin working to address the feedback above.  Thanks, and keep the feedback 
coming!

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-06 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503682#comment-16503682
 ] 

Aaron Fabbri commented on HADOOP-15407:
---

Thanks for the update. In terms of us maintaining this code long term, how do 
we generate the generated code?  I'm generally going to be -1 checking in 
generated code and not giving the community a way to generate it.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch, HADOOP-15407-HADOOP-15407.007.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503626#comment-16503626
 ] 

Steve Loughran commented on HADOOP-15407:
-

I'm starting off with some high level "strategic issues", not looking at the 
low-level details. IntelliJ is flagging this up there, but its secondary and 
can be dealt with package-by-package. I haven't yet done the testing.

The general things I review by are covered in my someout out of date [proposed 
styleguide 
update|https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md].
 That doesn't cover Java 8, lambda expressions and streams, but it does show 
what I'm thinking about when I review stuff
 
The key point: we want the code base to be broadly maintainable by anyone, 
tests to be fast and fail meaningfully, where possible as much of the same code 
style as elsewhere (though that evolves; moving things on is always good). So 
I'm looking it from a not just 'does it work' viewpoint, but "If I was left to 
maintain this, would I stand a chance?" 

Key requirements to be fixed before the big patch goes in

* {{AzureBlobFileSystemException}} a subclass of IOE.
* All those checkstyle complaints except in generated code.
* And findbugs, except when it is wrong & needs to be told so
* reinstate wasb contract tests to parallel runs
* use Azure specific names for the service injection interfaces and impls. 

Followup JIRAs to go in

* doc dependency injection in the javadocs
* review exception handling/wrapping
* Logging to use SLF4J directly for formatting
* Testing enhancements as proposed below
* Documentation.
  
Oh, and I'd like the first patch to be one which only does the POM dependency 
checks. This gives our new branch committers an initial patch to work with.
 
h3. Poms and dependencies

The patch to change the POM dependencies should go in on its own (presumably, 
first patch), so it's more isolated. People cherry picking or doing diffs on 
POMs will appreciate this.

Interesting that you had to exclude the guice transitive dependency on guava, 
but no other project did. Assumption: maven's "depth first" resolution logic 
has hidden it from others.

h3. Dependency Injection

We've tended to avoid this in the past; more a metric of the age of the 
codebase & the odd bad experience in the past. It is used in YARN a fair bit 
though.

I can see you've been busy here That's fine, except you do get to document it 
in detail.

I also worry that names like "ServiceProvider" may be a bit too generic; 
sommething like AzureStoreServiceProvider would be clearer.


Proposed: add a JIRA to cover names & some coverage in the package.html file of 
{{org.apache.hadoop.fs.azurebfs}}

h3. Exceptions

My ideal: exceptions which include stacks, URLs, anything else to diagnose 
problems, ideally consistent with the Hadoop stack "IOException" everywhere 
world.

* Pleae make {{AzureBlobFileSystemException}} a subclass of IOE. This matches 
the rest of our codebase and avoids having to translate things to IOEs in the 
FileSystem implementations.

* make sure that stack traces are always retained... 
{{AzureBlobFileSystem.evaluateFileSystemPathOperation())}} is an example of 
where they are being stripped out in some cases (404, 409)

* Also: look @ {{PathIOException}} for an exception class which takes a path; 
if {{AzureBlobFileSystemException}} could be built under that, it'd contain the 
path and operation, and be something other code already handles.

Proposed: move {{AzureBlobFileSystemException}} under IOE before the first 
patch goes in, tune the other details separately


h3. Logging

Key question: why not just use SLf4J API directly? Except for that specific 
case method {{log(LogLevel logLevel, tring message, Object... arguments)}}. I 
don't see anything there that's not in SLF4J

And, Why a new message format? The SLF4J Log.info("{}", object) log message 
constructor is more efficient than the JDK formatter, and handles null object 
values and only calls toString() on its args when actually logging them. * 
Followup 2: and a failure in formatting must be swallowed, not raised. Everyone 
hates a log failure, not least because the debug level logging doesn't normally 
get any test coverage.  


Given how fundamental this log injection is going on, I guess its hard to 
change, but I believe the formatting should be handed off to SL4J rather thsn 
trying to do thingd with a JSON parser which will inevitably (a) be slow and 
(b) be brittle.

Proposed: add a JIRA: SLf4J to do formatting for AbfsLoggingService. This is 
going to be a blocker BTW, keeping cost of logging down is something we care 
about, which is CPU overhead + cost of creating objects, calling toString(), 
etc.  

h2. Testing

We shouldn't need to add a new auth-keys file, just share the existing azure 
one, adding the new relevant fs URI. Simplifies path & allows .

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-03 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499737#comment-16499737
 ] 

genericqa commented on HADOOP-15407:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 52 new or modified test 
files. {color} |
|| || || || {color:brown} HADOOP-15407 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  6m 
31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
24s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
11s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
49s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 19m  
0s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
32m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
10s{color} | {color:green} HADOOP-15407 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 30m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 14s{color} | {color:orange} root: The patch generated 198 new + 5 unchanged 
- 0 fixed = 203 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 19m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
7s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
48s{color} | {color:red} hadoop-tools/hadoop-azure generated 4 new + 0 
unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}146m 16s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}348m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-azure |
|  |  Hard coded reference to an absolute pathname in 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(URI, 
Configuration)  At AzureBlobFileSystem.java:absolute pathname in 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(URI, 
Configuration)  At AzureBlobFileSystem.

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-03 Thread Da Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499624#comment-16499624
 ] 

Da Zhou commented on HADOOP-15407:
--

Submitting HADOOP-15407-HADOOP-15407.007.patch, all tests passed against my 
storage account in west US.

Updates in the patch:
- Resolved white space violation
- Resolved all findbugs violation except the “redundant null check for 
inputstream”
- Updated Hadoop-Common unit test “TestCommonConfigurationFields”  for azurebfs.


{noformat}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running 
org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemInitAndCreate
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.222 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemInitAndCreate
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemE2E
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.652 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemE2E
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFileStatus
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.844 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFileStatus
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRandomRead
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 122.812 
s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRandomRead
[INFO] Running 
org.apache.hadoop.fs.azurebfs.diagnostics.TestConfigurationValidators
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.008 s 
- in org.apache.hadoop.fs.azurebfs.diagnostics.TestConfigurationValidators
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCopy
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.682 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCopy
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFlush
[WARNING] Tests run: 4, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 
180.827 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFlush
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemOpen
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.902 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemOpen
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestFileSystemRegistration
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.168 s 
- in org.apache.hadoop.fs.azurebfs.ITestFileSystemRegistration
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRename
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
23.438 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRename
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractGetFileStatus
[WARNING] Tests run: 36, Failures: 0, Errors: 0, Skipped: 18, Time elapsed: 
27.869 s - in 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractGetFileStatus
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDelete
[WARNING] Tests run: 16, Failures: 0, Errors: 0, Skipped: 8, Time elapsed: 
8.447 s - in 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDelete
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAzureBlobFileSystemContract
[INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.264 
s - in org.apache.hadoop.fs.azurebfs.contract.ITestAzureBlobFileSystemContract
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractCreate
[WARNING] Tests run: 22, Failures: 0, Errors: 0, Skipped: 11, Time elapsed: 
15.511 s - in 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractCreate
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractOpen
[WARNING] Tests run: 12, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 
6.441 s - in 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractOpen
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractRename
[WARNING] Tests run: 16, Failures: 0, Errors: 0, Skipped: 8, Time elapsed: 14.6 
s - in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractRename
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractSecureDistCp
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 1.61 
s - in 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractSecureDistCp
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractSeek
[WARNING] Tests run: 36, Failures: 0, Errors: 0, Skipped: 18, Time elapsed: 
24.796 s - in 
org.apache.hadoop.fs.azurebfs.contract.ITestA

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-06-01 Thread Da Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498779#comment-16498779
 ] 

Da Zhou commented on HADOOP-15407:
--

Hi, [~ste...@apache.org], I'm fixing the patch according to the report. There 
is one findbug test I'm not sure: 
{noformat}
Redundant nullcheck of stream, which is known to be non-null in 
org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(byte[],
 int, int) Redundant null check at AbfsHttpOperation.java:is known to be 
non-null in 
org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(byte[],
 int, int) Redundant null check at AbfsHttpOperation.java:[line 26]
{noformat}

Code is here, I don't understand why this is a redundant check,  is it possible 
to suppress this "bug"?
{code:borderStyle=solid}
 try (InputStream stream = this.connection.getInputStream()) {
   if (stream == null) {
 return;
   }
   boolean endOfStream = false;
...
...
}
{code}

Thanks,
Da
 

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, 
> HADOOP-15407-HADOOP-15407.006.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have al

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-05-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497445#comment-16497445
 ] 

genericqa commented on HADOOP-15407:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 51 new or modified test 
files. {color} |
|| || || || {color:brown} HADOOP-15407 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  6m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
 7s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
27s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
13s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 19m  
2s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
32m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} HADOOP-15407 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
14s{color} | {color:green} HADOOP-15407 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
4m  8s{color} | {color:orange} root: The patch generated 194 new + 0 unchanged 
- 0 fixed = 194 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 19m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 2 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
7s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project . {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
55s{color} | {color:red} hadoop-tools/hadoop-azure generated 17 new + 0 
unchanged - 0 fixed = 17 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 24s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}216m 11s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-azure |
|  |  Hard coded reference to an absolute pathname in 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getHomeDirectory()  At 
AzureBlobFileSystem.java:absolute pathname in 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getHomeDirectory()  At 
AzureBlobFileSystem.java:[line 435] |
|  |  Should 
org

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-05-23 Thread Thomas Marquardt (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488207#comment-16488207
 ] 

Thomas Marquardt commented on HADOOP-15407:
---

Thanks, we are validating the initial patch.  Esfandiar and I have been quiet 
on https://issues.apache.org/jira/browse/HADOOP-15407 and the process of 
committing Azure Blob FS (ABFS), but we are ready to pick up the pace.  The 
reason for our delay was that we discovered a performance issue within the 
client, and were struggling with a fix.  The fix involved refactoring the ABFS 
connector, replacing the HTTP stack, and removing dependencies.  Now the read 
and write performance is better, the code is more self-contained, and we have a 
new patch that we will attach to the JIRA today.  [~DanielZhou] contributed to 
the second iteration.

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JI

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-05-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487780#comment-16487780
 ] 

Steve Loughran commented on HADOOP-15407:
-

Branch for this is created. 



> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, 
> HADOOP-15407-003.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452153#comment-16452153
 ] 

Steve Loughran commented on HADOOP-15407:
-

[~fabbri]: [~chris.douglas] & I will be proposing a branch for this to be 
pulled in, so allow more detailed review & testing before merge into trunk.

But yes, a big patch. If you look close, a lot of it is machine generated, so 
that can be glanced at but not worried about in detail (what can you do, change 
the code generator?).



> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-24 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451492#comment-16451492
 ] 

Aaron Fabbri commented on HADOOP-15407:
---

Wow this is a big patch. (Aside: We really need to move away from mega-patches 
IMO, it is antithetical to quality code reviews.) 
{quote}Third parties and customers have also done various testing of ABFS.
{quote}
Is there any specific reasons you didn't do this work with the Apache 
community?  If there are we should try to address them.

It is much easier for folks like me to digest if this is done as a series of 
smaller commits on a feature branch.  Do you have a clean commit history you 
can push to a public branch on github?
{quote} WASB is not deprecated but is in pure maintenance mode and customers 
should upgrade to ABFS once it hits General Availability later in CY18.
{quote}
Might want to add some caveats around that. ;)

 

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production e

[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Esfandiar Manii (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449145#comment-16449145
 ] 

Esfandiar Manii commented on HADOOP-15407:
--

My bad, the order of diff was incorrect. Updated with the correct one. :)

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449125#comment-16449125
 ] 

Devaraj Das commented on HADOOP-15407:
--

[~esmanii], the patch seems to have been generated incorrectly. I'd expect this 
jira is adding lot of new code, but the patch does otherwise :)

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Major
> Attachments: HADOOP-15407-001.patch
>
>
> *{color:#212121}Description{color}*
>  This JIRA adds a new file system implementation, ABFS, for running Big Data 
> and Analytics workloads against Azure Storage. This is a complete rewrite of 
> the previous WASB driver with a heavy focus on optimizing both performance 
> and cost.
>  {color:#212121} {color}
>  *{color:#212121}High level design{color}*
>  At a high level, the code here extends the FileSystem class to provide an 
> implementation for accessing blobs in Azure Storage. The scheme abfs is used 
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following 
> URI scheme is used to address individual paths:
>  {color:#212121} {color}
>  
> {color:#212121}abfs[s]://@.dfs.core.windows.net/{color}
>  {color:#212121} {color}
>  {color:#212121}ABFS is intended as a replacement to WASB. WASB is not 
> deprecated but is in pure maintenance mode and customers should upgrade to 
> ABFS once it hits General Availability later in CY18.{color}
>  {color:#212121}Benefits of ABFS include:{color}
>  {color:#212121}· Higher scale (capacity, throughput, and IOPS) Big 
> Data and Analytics workloads by allowing higher limits on storage 
> accounts{color}
>  {color:#212121}· Removing any ramp up time with Storage backend 
> partitioning; blocks are now automatically sharded across partitions in the 
> Storage backend{color}
> {color:#212121}          .         This avoids the need for using 
> temporary/intermediate files, increasing the cost (and framework complexity 
> around committing jobs/tasks){color}
>  {color:#212121}· Enabling much higher read and write throughput on 
> single files (tens of Gbps by default){color}
>  {color:#212121}· Still retaining all of the Azure Blob features 
> customers are familiar with and expect, and gaining the benefits of future 
> Blob features as well{color}
>  {color:#212121}ABFS incorporates Hadoop Filesystem metrics to monitor the 
> file system throughput and operations. Ambari metrics are not currently 
> implemented for ABFS, but will be available soon.{color}
>  {color:#212121} {color}
>  *{color:#212121}Credits and history{color}*
>  Credit for this work goes to (hope I don't forget anyone): Shane Mainali, 
> {color:#212121}Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar 
> Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, 
> and James Baker. {color}
>  {color:#212121} {color}
>  *Test*
>  ABFS has gone through many test procedures including Hadoop file system 
> contract tests, unit testing, functional testing, and manual testing. All the 
> Junit tests provided with the driver are capable of running in both 
> sequential/parallel fashion in order to reduce the testing time.
>  {color:#212121}Besides unit tests, we have used ABFS as the default file 
> system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a 
> storage option. (HDFS is also used but not as default file system.) Various 
> different customer and test workloads have been run against clusters with 
> such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, 
> Spark Streaming and Spark SQL, and others have been run to do scenario, 
> performance, and functional testing. Third parties and customers have also 
> done various testing of ABFS.{color}
>  {color:#212121}The current version reflects to the version of the code 
> tested and used in our production environment.{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop

2018-04-23 Thread Esfandiar Manii (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448976#comment-16448976
 ] 

Esfandiar Manii commented on HADOOP-15407:
--

{code:java}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCreate
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.924 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCreate
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCopy
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.623 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCopy
[INFO] Running 
org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemInitAndCreate
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.731 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemInitAndCreate
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemE2EScale
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 246.169 
s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemE2EScale
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAppend
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.202 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAppend
[INFO] Running 
org.apache.hadoop.fs.azurebfs.diagnostics.TestConfigurationValidators
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.805 s 
- in org.apache.hadoop.fs.azurebfs.diagnostics.TestConfigurationValidators
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRename
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
27.916 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRename
[INFO] Running 
org.apache.hadoop.fs.azurebfs.services.TestConfigurationServiceFieldsValidation
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.258 s 
- in 
org.apache.hadoop.fs.azurebfs.services.TestConfigurationServiceFieldsValidation
[INFO] Running org.apache.hadoop.fs.azurebfs.services.ITestAbfsHttpServiceImpl
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.977 s 
- in org.apache.hadoop.fs.azurebfs.services.ITestAbfsHttpServiceImpl
[INFO] Running 
org.apache.hadoop.fs.azurebfs.services.TestParameterizedLoggingServiceImpl
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.283 s 
- in org.apache.hadoop.fs.azurebfs.services.TestParameterizedLoggingServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.TestLoggingServiceImpl
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.253 s 
- in org.apache.hadoop.fs.azurebfs.services.TestLoggingServiceImpl
[INFO] Running 
org.apache.hadoop.fs.azurebfs.services.TestNetworkThroughputAnalysisServiceImpl
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.87 s 
- in 
org.apache.hadoop.fs.azurebfs.services.TestNetworkThroughputAnalysisServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.ITestReadWriteAndSeek
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 244.85 s 
- in org.apache.hadoop.fs.azurebfs.services.ITestReadWriteAndSeek
[INFO] Running 
org.apache.hadoop.fs.azurebfs.services.TestAbfsStatisticsServiceImpl
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.195 s 
- in org.apache.hadoop.fs.azurebfs.services.TestAbfsStatisticsServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.ITestTracingServiceImpl
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.893 s 
- in org.apache.hadoop.fs.azurebfs.services.ITestTracingServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.utils.TestUriUtils
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.037 s 
- in org.apache.hadoop.fs.azurebfs.utils.TestUriUtils
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestWasbAbfsCompatibility
[WARNING] Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
11.948 s - in org.apache.hadoop.fs.azurebfs.ITestWasbAbfsCompatibility
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFileStatus
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.894 s 
- in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFileStatus
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDistCp
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 0.834 
s - in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDistCp
[INFO] Running 
org.apache.hadoop.fs.azurebfs.contract.ITestAzureBlobFileSystemContract
[INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.694 
s -