[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626277#comment-16626277 ]

Da Zhou commented on HADOOP-15407:
----------------------------------

Hi, since HADOOP-15407 has been merged to trunk, should we move all the unfinished/ongoing subtasks to HADOOP-15763 (Über-JIRA: abfs phase II)?

> Support Windows Azure Storage - Blob file system in Hadoop
> ----------------------------------------------------------
>
>                 Key: HADOOP-15407
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15407
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/azure
>    Affects Versions: 3.2.0
>            Reporter: Esfandiar Manii
>            Assignee: Da Zhou
>            Priority: Blocker
>         Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch,
> HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-008.patch,
> HADOOP-15407-HADOOP-15407-008.patch, HADOOP-15407-HADOOP-15407.006.patch,
> HADOOP-15407-HADOOP-15407.007.patch, HADOOP-15407-HADOOP-15407.008.patch
>
>
> *Description*
> This JIRA adds a new file system implementation, ABFS, for running Big Data
> and Analytics workloads against Azure Storage. This is a complete rewrite of
> the previous WASB driver with a heavy focus on optimizing both performance
> and cost.
>
> *High level design*
> At a high level, the code here extends the FileSystem class to provide an
> implementation for accessing blobs in Azure Storage. The scheme abfs is used
> for accessing it over HTTP, and abfss for accessing it over HTTPS. The
> following URI scheme is used to address individual paths:
>
> abfs[s]://@.dfs.core.windows.net/
>
> ABFS is intended as a replacement for WASB. WASB is not deprecated but is in
> pure maintenance mode, and customers should upgrade to ABFS once it hits
> General Availability later in CY18.
> Benefits of ABFS include:
> · Higher scale (capacity, throughput, and IOPS) for Big Data and Analytics
> workloads, by allowing higher limits on storage accounts
> · Removing any ramp up time with Storage backend partitioning; blocks are
> now automatically sharded across partitions in the Storage backend
> . This avoids the need for using temporary/intermediate files, increasing
> the cost (and framework complexity around committing jobs/tasks)
> · Enabling much higher read and write throughput on single files (tens of
> Gbps by default)
> · Still retaining all of the Azure Blob features customers are familiar
> with and expect, and gaining the benefits of future Blob features as well
> ABFS incorporates Hadoop Filesystem metrics to monitor the file system
> throughput and operations. Ambari metrics are not currently implemented for
> ABFS, but will be available soon.
>
> *Credits and history*
> Credit for this work goes to (hope I don't forget anyone): Shane Mainali,
> Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, Amit Singh,
> Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, and James Baker.
>
> *Test*
> ABFS has gone through many test procedures including Hadoop file system
> contract tests, unit testing, functional testing, and manual testing. All the
> JUnit tests provided with the driver can run either sequentially or in
> parallel in order to reduce the testing time.
> Besides unit tests, we have used ABFS as the default file system in Azure
> HDInsight. Azure HDInsight will very soon offer ABFS as a storage option.
> (HDFS is also used but not as default file system.) Various customer and
> test workloads have been run against clusters with such configurations for
> quite some time. Benchmarks such as Tera*, TPC-DS, Spark Streaming and Spark
> SQL, and others have been run to do scenario, performance, and functional
> testing. Third parties and customers have also done various testing of ABFS.
> The current version reflects the version of the code tested and used in our
> production environment.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
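
The URI scheme quoted in the description has lost its placeholders in this archived copy. As a rough sketch only, assuming the part before `@` names a file system, the first host label names a storage account, and the remainder is a path, an abfs[s] URI could be taken apart with plain `java.net.URI`. The class `AbfsUriSketch`, its `Parts` holder, and the sample names `data` and `myaccount` are hypothetical and are not taken from the hadoop-azure module.

```java
import java.net.URI;

/** Illustrative only: these names are hypothetical and not part of hadoop-azure. */
final class AbfsUriSketch {

    /** Host suffix as it appears in the URI shown in the issue description. */
    static final String HOST_SUFFIX = ".dfs.core.windows.net";

    /** Simple holder for the pieces of an abfs[s] URI. */
    static final class Parts {
        final boolean secure;     // true for abfss (HTTPS), false for abfs (HTTP)
        final String fileSystem;  // assumption: the text before '@'
        final String account;     // assumption: the host without HOST_SUFFIX
        final String path;        // path component, e.g. "/logs/2018"

        Parts(boolean secure, String fileSystem, String account, String path) {
            this.secure = secure;
            this.fileSystem = fileSystem;
            this.account = account;
            this.path = path;
        }
    }

    /** Splits an abfs[s] URI; throws IllegalArgumentException if it does not fit the assumed shape. */
    static Parts parse(String text) {
        URI uri = URI.create(text);
        String scheme = uri.getScheme();
        boolean secure;
        if ("abfss".equals(scheme)) {
            secure = true;
        } else if ("abfs".equals(scheme)) {
            secure = false;
        } else {
            throw new IllegalArgumentException("Not an abfs[s] URI: " + text);
        }

        // java.net.URI exposes "name@host" as user-info plus host.
        String fileSystem = uri.getUserInfo();
        String host = uri.getHost();
        if (fileSystem == null || host == null || !host.endsWith(HOST_SUFFIX)) {
            throw new IllegalArgumentException("Unexpected authority in: " + text);
        }
        String account = host.substring(0, host.length() - HOST_SUFFIX.length());
        return new Parts(secure, fileSystem, account, uri.getPath());
    }

    public static void main(String[] args) {
        Parts p = parse("abfss://data@myaccount.dfs.core.windows.net/logs/2018");
        System.out.println(p.secure + " " + p.fileSystem + " " + p.account + " " + p.path);
    }
}
```

The sketch only checks the shape of the URI; it does not contact Azure Storage and says nothing about how the driver itself resolves these parts.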
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624927#comment-16624927 ]

Hudson commented on HADOOP-15407:
---------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15037 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15037/])
HADOOP-15407. HADOOP-15540. Support Windows Azure Storage - Blob file (tmarq: rev f044deedbbfee0812316d587139cb828f27172e9)
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferWorker.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractSeek.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/utils/package-info.java
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/FileSystemOperationUnhandledException.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/InvalidUriException.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestTracingServiceImpl.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/diagnostics/package-info.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/diagnostics/BooleanConfigurationBasicValidator.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/diagnostics/package-info.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/Abfs.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/diagnostics/IntegerConfigurationBasicValidator.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/AbfsHttpConstants.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractMkdir.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/services/AbfsHttpService.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/constants/TestConfigurationKeys.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemRandomRead.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemAppend.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemCopy.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAzureBlobFileSystemBasics.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/package-info.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/package.html
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemUriSchemes.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractRootDirectory.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/ServiceResolutionException.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemOpen.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/SecureAzureBlobFileSystem.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBuffer.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContractCreate.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemFileStatus.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/MockAbfsServiceInjectorImpl.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/annotations/package-info.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/services/ReadBufferStatus.java
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/TimeoutException.java
* (edit) hadoop-tools/hadoop-azure/src/test/resources/log4j.properties
* (add) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/services/ListResultSchema.java
* (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/contract/ITestAbfsFileSystemContract.java
* (add) h
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619161#comment-16619161 ]

Sean Mackrory commented on HADOOP-15407:
----------------------------------------

I'm comfortable with that. The changes outside of the module (and even outside of the new package in the module) are very few in number and they're quite trivial. I've done some testing with MapReduce, Hive and Spark (not on the exact current branch state, but recently with most of it) and all issues found have been fixed.

{quote}However vote + merge needs another week time as per my understanding{quote}

According to the by-laws (https://hadoop.apache.org/bylaws.html), we can merge as soon as we have 3 +1's from active committers and there's consensus. In this case, the 7 day convention is a courtesy. Once we have 3 +1's, if we'd like to wrap it up I think it's reasonable to ask if anybody would like more time (like the weekend) to evaluate it and consider consensus reached if nobody says so.

{quote}
Consensus approval of active committers, but with a minimum of one +1. The code can be committed after the first +1, unless the code change represents a merge from a branch, in which case three +1s are required.
..
Votes relating to code changes are not subject to a strict timetable but should be made as timely as possible.
{quote}
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618542#comment-16618542 ]

Sunil Govindan commented on HADOOP-15407:
-----------------------------------------

Hi [~ste...@apache.org] [~tmarquardt] [~mackrorysd] [~DanielZhou]

Thanks for closing many items in this and starting the vote thread. You mentioned in the mail thread that this feature can work independently and has no impact on other modules. However vote + merge needs another week time as per my understanding. Could you please confirm the stability of this feature and any potential risk in going with the 3.2 release train? We were planning the 3.2 branch cut this week, and we might need to delay it by at least a week to get this feature in.

As long as you feel the feature is functional (barring improvements for the next dot releases), has no major impact, and could merge in a week, I am fine with getting this in.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618227#comment-16618227 ]

Steve Loughran commented on HADOOP-15407:
-----------------------------------------

bq. I have not seen the HADOOP-15761 failure, but I'm fine with updating ABFS to not use regex, or whatever it takes to make it robust. Someone who is able to reproduce the failure should fix it. Maybe I don't see it because I don't have OpenSSL?

I'm not worried about it; I saw it on an IDE run of all the abfs tests. I don't see it on standalone test runs, which makes me think it's some state preserved from a previous test.

I've just created HADOOP-15763 for all those follow-up issues. I think HADOOP-15761 is in that category.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618204#comment-16618204 ]

Thomas Marquardt commented on HADOOP-15407:
-------------------------------------------

I was able to successfully rebase on trunk. The test results look good for hadoop-azure and hadoop-common. I force pushed the changes to the HADOOP-15407 branch.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618111#comment-16618111 ]

Thomas Marquardt commented on HADOOP-15407:
-------------------------------------------

Ok, it is good to know that the rebase was a success for you. I am still running tests, but so far, so good. I will force push if all the tests pass and infra allows me to do so.

I have not seen the HADOOP-15761 failure, but I'm fine with updating ABFS to not use regex, or whatever it takes to make it robust. Someone who is able to reproduce the failure should fix it. Maybe I don't see it because I don't have OpenSSL?
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618086#comment-16618086 ]
Steve Loughran commented on HADOOP-15407:
I've just done a rebase & retest locally; there is only one issue w.r.t. abfs testing, HADOOP-15761, and that's nothing serious. I don't know if you'll be able to force push the rebased branch up, as infra like to lock that down to stop anyone from forcing up a rollback of branches. Try it; if it doesn't take, file a JIRA on the INFRA project asking for the branch to support it.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618070#comment-16618070 ]
Thomas Marquardt commented on HADOOP-15407:
I am going to rebase branch HADOOP-15407 on the latest trunk today.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617237#comment-16617237 ]
Sunil Govindan commented on HADOOP-15407:
Thanks [~DanielZhou], I will move the target version to 3.3 for now.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616590#comment-16616590 ]
Da Zhou commented on HADOOP-15407:
Hi [~sunilg], sorry for the late reply; we were not able to make it before 15th Sept. We are still working on resolving the remaining JIRAs. Once the branch is in good shape, we will call for a discuss/vote. Thanks.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614209#comment-16614209 ]
Sunil Govindan commented on HADOOP-15407:
Hi [~DanielZhou] and [~ste...@apache.org]
Apart from a few open issues, 4 issues are in Patch Available state. It seems they are close to commit. However, since this work is in a branch, we need to call for a discuss/vote. Given that 3.2 is very close (code freeze by 15th Sept), could you please let me know how much more time is required here? Thank you.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599829#comment-16599829 ]
Thomas Marquardt commented on HADOOP-15407:
[~ste...@apache.org], the 403 errors are happening because your storage account key was updated. Please try the new key I sent you. Also, make sure you have the latest sources and refer to the "Testing the Azure ABFS Client" section of testing_azure.md. The config was updated recently by HADOOP-15663. In a nutshell, add the following to src/test/resources/azure-auth-keys.xml:
{noformat}
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <property>
    <name>fs.azure.abfs.account.name</name>
    <value>{ACCOUNT_NAME}.dfs.core.windows.net</value>
  </property>
  <property>
    <name>fs.azure.account.key.{ACCOUNT_NAME}.dfs.core.windows.net</name>
    <value>{ACCOUNT_ACCESS_KEY}</value>
  </property>
  <property>
    <name>fs.azure.wasb.account.name</name>
    <value>{ACCOUNT_NAME}.blob.core.windows.net</value>
  </property>
  <property>
    <name>fs.azure.account.key.{ACCOUNT_NAME}.blob.core.windows.net</name>
    <value>{ACCOUNT_ACCESS_KEY}</value>
  </property>
  <property>
    <name>fs.contract.test.fs.abfs</name>
    <value>abfs://{CONTAINER_NAME}@{ACCOUNT_NAME}.dfs.core.windows.net</value>
    <description>A file system URI to be used by the contract tests.</description>
  </property>
  <property>
    <name>fs.contract.test.fs.wasb</name>
    <value>wasb://{CONTAINER_NAME}@{ACCOUNT_NAME}.blob.core.windows.net</value>
    <description>A file system URI to be used by the contract tests.</description>
  </property>
</configuration>
{noformat}
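The property names in the test configuration quoted in Thomas Marquardt's comment above all derive from the same account and container names. As a minimal sketch of that pattern (the class name AbfsTestSettings and its build method are hypothetical and not part of Hadoop; only the property keys and URI shapes come from the comment), the six settings could be generated like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: it only assembles the key/value
// pairs listed in the comment for a given account and container.
final class AbfsTestSettings {

    static Map<String, String> build(String account, String container, String accessKey) {
        String dfsHost = account + ".dfs.core.windows.net";   // ABFS endpoint
        String blobHost = account + ".blob.core.windows.net"; // WASB endpoint

        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.azure.abfs.account.name", dfsHost);
        props.put("fs.azure.account.key." + dfsHost, accessKey);
        props.put("fs.azure.wasb.account.name", blobHost);
        props.put("fs.azure.account.key." + blobHost, accessKey);
        // File system URIs used by the contract tests.
        props.put("fs.contract.test.fs.abfs", "abfs://" + container + "@" + dfsHost);
        props.put("fs.contract.test.fs.wasb", "wasb://" + container + "@" + blobHost);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; a real run needs a real account and key.
        build("myaccount", "mycontainer", "ACCOUNT_ACCESS_KEY")
            .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Note that the account key property is keyed by the full host name, so the ABFS and WASB endpoints each need their own key entry even when the key value is the same.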
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599045#comment-16599045 ]

Steve Loughran commented on HADOOP-15407:

I'm still hoping to get things stable while it's easier to do faster review & big rework cycles.

I've been updating my downstream-of-spark tests (https://github.com/hortonworks-spark/cloud-integration) and trying to add support for this in them. While not a "real" integration suite, it works well to pick up classpath integration issues, and places where spark's expectation of a store doesn't match what is there (e.g. partitioning, seek, rename).

Currently: getting 403 errors, which implies that somehow my current config has broken, or there's been some remote change. Filed HADOOP-15710 for mapping 403 -> AccessDeniedException; we should also make sure that files you can't read raise a similar exception.

I've now added special support for abfs in cloudstore (https://github.com/steveloughran/cloudstore/releases/tag/tag_2018-08-31-release), but have not worked out what's up. I do think I have valid settings.

I'm worried that if I can't get it to work, we're not ready to drop it on unsuspecting users.
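For readers following the HADOOP-15710 reference, the "403 -> AccessDeniedException" mapping could be sketched roughly as below. This is only an illustration under stated assumptions, not the ABFS driver's code: the class `HttpStatusTranslator` and its `translate` method are hypothetical names, and the only library type used is the JDK's `java.nio.file.AccessDeniedException`.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;

/** Hypothetical helper for illustration; not part of the ABFS driver. */
final class HttpStatusTranslator {
  private HttpStatusTranslator() {}

  /**
   * Turns a failed HTTP status into an IOException. A 403 becomes an
   * AccessDeniedException so callers can tell "not permitted" apart
   * from other I/O failures; every other status stays a plain IOException.
   */
  static IOException translate(int status, String operation, String path) {
    String detail = operation + " on " + path + " failed with HTTP status " + status;
    if (status == 403) {
      // AccessDeniedException(file, other, reason) is the JDK constructor.
      return new AccessDeniedException(path, null, detail);
    }
    return new IOException(detail);
  }

  public static void main(String[] args) {
    IOException e = translate(403, "getFileStatus", "/data/part-0000");
    System.out.println(e.getClass().getName()); // prints java.nio.file.AccessDeniedException
  }
}
```

Because `AccessDeniedException` is itself an `IOException`, existing `catch (IOException e)` handlers keep working, while code that cares about permissions can catch the narrower type.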
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598962#comment-16598962 ]

Sean Mackrory commented on HADOOP-15407:

[~ste...@apache.org] What are your thoughts on starting a merge vote once the work Thomas listed is all resolved? I have one patch in flight, but nothing I'm working on impacts stuff outside this module, so I'm happy to proceed with it post-merge.

> *Test*
> ABFS has gone through many test procedures including Hadoop file system contract tests, unit testing, functional testing, and manual testing. All the JUnit tests provided with the driver are capable of running in both sequential/parallel fashion in order to reduce the testing time.
> Besides unit tests, we have used ABFS as the default file system in Azure HDInsight. Azure HDInsight will very soon offer ABFS as a storage option. (HDFS is also used but not as default file system.) Various customer and test workloads have been run against clusters with such configurations for quite some time. Benchmarks such as Tera*, TPC-DS, Spark Streaming and Spark SQL, and others have been run to do scenario, performance, and functional testing. Third parties and customers have also done various testing of ABFS.
> The current version reflects the version of the code tested and used in our production environment.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596884#comment-16596884 ]

Da Zhou commented on HADOOP-15407:

Yes, [~sunilg], it will be ready. We are now working on the JIRAs mentioned by [~tmarquardt] in his last comment; most of them are already under review.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595909#comment-16595909 ]

Sunil Govindan commented on HADOOP-15407:

[~DanielZhou] [~ste...@apache.org] Thanks for the work on this. As many of the subtasks are done, is this feature ready to go into a release (3.2, which is planned for the end of next month)?
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595246#comment-16595246 ]

Thomas Marquardt commented on HADOOP-15407:

Thanks Sean, we would like to merge to trunk soon. I plan to upload a patch for HADOOP-15582 (documentation) this week, and HADOOP-15663 (simplify configuration) is in review so should also complete this week. I have not had a chance to look at HADOOP-15666 yet, but we should be able to do that this week too. We are also looking to improve test parallelization and add the ability to select WASB, ABFS, or both when running tests (HADOOP-15664). I expect all this to be done this week.

If the community has any additional requests before the merge to trunk, let's discuss them here. It would be great if we could start voting on the merge.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591812#comment-16591812 ]

genericqa commented on HADOOP-15407:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 7s | HADOOP-15407 does not apply to HADOOP-15407. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-15407 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927853/HADOOP-15407-HADOOP-15407-008.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/15090/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591800#comment-16591800 ]

Sean Mackrory commented on HADOOP-15407:

Hey, I wanted to ask what everyone else's thoughts are on when we intend to merge this. Several of the recent changes and much of what remains would be fine as independent, incremental developments on trunk. The service is not GA yet, but it sounds like there's already a strong need for compatibility in the clients (HADOOP-15546).

HADOOP-15544, HADOOP-15582, and HADOOP-15663 seem like good things to do pre-merge. I know HADOOP-15666 is supposed to work with the Namespace Service enabled prior to GA but doesn't now, and that would be good to resolve one way or another. Everything else that I'm aware of seems like it doesn't require maintaining this in a separate branch. Thoughts?
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534230#comment-16534230 ]

Sean Mackrory commented on HADOOP-15407:

As I understand it, the initial commit in HADOOP-15540 covers everything referenced by [~esmanii]'s issues that are connected to this, correct? If so, I can go ahead and close them all as duplicates.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514119#comment-16514119 ]

Steve Loughran commented on HADOOP-15407:

OK, committed patch 008 under the JIRA HADOOP-15540, which I've now closed.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514039#comment-16514039 ]

Steve Loughran commented on HADOOP-15407:

I've just added abfs support to the storediag feature in my [little cloudstore project](https://github.com/steveloughran/cloudstore/releases/tag/tag_2018_06_15_release). It knows about all the dependencies and scans for them all before trying to do any operations on the FS.

Not done: anything related to credentials, or probing the endpoint prior to instantiating the FS. That is the kind of thing you need for the next stage of fault diagnostics.
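The dependency scan mentioned in the storediag comment above can be illustrated with a minimal sketch. It assumes the check amounts to probing the classpath for a list of required class names before any filesystem is instantiated; this is not the cloudstore implementation. The class `DependencyScan`, the method `missingClasses`, and the list of class names in `main` are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: report which required classes cannot be loaded,
// so a missing JAR is diagnosed before any filesystem call is attempted.
public class DependencyScan {

    // Returns the subset of classNames that cannot be found or linked.
    static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = DependencyScan.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: locate the class without running static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative list only; the real set of ABFS dependencies is not given in this thread.
        List<String> required = List.of(
                "org.apache.hadoop.fs.FileSystem",
                "org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem");
        List<String> missing = missingClasses(required);
        if (missing.isEmpty()) {
            System.out.println("All required classes found");
        } else {
            System.out.println("Missing classes: " + missing);
        }
    }
}
```

Passing `false` for the initialize flag keeps the probe side-effect free, which matters when the point is to diagnose a broken classpath rather than to start using it.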
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513094#comment-16513094 ]

genericqa commented on HADOOP-15407:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 55 new or modified test files. |
|| || || || HADOOP-15407 Compile Tests ||
| 0 | mvndep | 6m 25s | Maven dependency ordering for branch |
| +1 | mvninstall | 31m 58s | HADOOP-15407 passed |
| +1 | compile | 33m 45s | HADOOP-15407 passed |
| +1 | checkstyle | 0m 26s | HADOOP-15407 passed |
| +1 | mvnsite | 22m 45s | HADOOP-15407 passed |
| -1 | shadedclient | 25m 18s | branch has errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 2m 5s | HADOOP-15407 passed |
| +1 | javadoc | 5m 36s | HADOOP-15407 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 27m 26s | the patch passed |
| +1 | compile | 26m 49s | the patch passed |
| +1 | javac | 26m 49s | the patch passed |
| +1 | checkstyle | 0m 23s | the patch passed |
| +1 | mvnsite | 18m 56s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 7s | The patch has no ill-formed XML file. |
| -1 | shadedclient | 2m 18s | patch has errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 2m 15s | the patch passed |
| +1 | javadoc | 5m 12s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 153m 55s | root in the patch failed. |
| +1 | asflicense | 0m 44s | The patch does not generate ASF License warnings. |
| | | 345m 25s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID |
| | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
| | hadoop.hdfs.TestDFSClientRetries |
| | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps |
| | hadoop.yarn.server.timelineservice
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512784#comment-16512784 ]

Thomas Marquardt commented on HADOOP-15407:

The plan sounds good to me. Credit for this work goes to (hope I don't forget anyone): Steve Loughran, Shane Mainali, Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, and James Baker.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512773#comment-16512773 ]

Steve Loughran commented on HADOOP-15407:

Now, regarding JIRAs & things: which JIRA to put into the git commit message, and which JIRA to close. Here's my proposal:

# This is the Uber-JIRA; it stays open until the branch is merged in.
# HADOOP-15432 is renamed "Core ABFS module"; the initial patch commit message will reference that and include this JIRA too, e.g. HADOOP-15407/HADOOP-15542 ...
# Credit in patch to: Esfandiar, Thomas, Da Zhou. Let me know who else has contributed code to it & they should be named too.
# The other JIRAs we have here evolve from the initial code submission into reviews of modules/classes, ideally well scoped, e.g.
* imports, javadocs & IDE complaints (I have this)
* fix all outstanding javadoc issues
* configuration: model names, docs, XML values in core-default, etc.
* output stream code review
* input stream (including ReadBuffer logic)
* General FS Semantics
* Docs
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512486#comment-16512486 ]

Steve Loughran commented on HADOOP-15407:

Let's not worry about those failures; they are unrelated. It may be that they've already been fixed in trunk and this branch just hasn't got the fix.

I'm about to update the HADOOP-15407 branch to where trunk is, push that up, and then apply the patch you have here. I'll resubmit that as patch 009 and, if Jenkins and my tests are happy, merge it in as the initial patch.

Trunk has moved up to 7.0.0 of the Azure SDK, so there's a bit of a merge conflict between this patch and that, and a risk of a functionality conflict: we need that SDK update in the branch.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510403#comment-16510403 ]

Da Zhou commented on HADOOP-15407:

The following *independent* unit tests failed in the latest Jenkins build:
- hadoop.fs.shell.TestCopyFromLocal
- hadoop.crypto.key.TestKeyShell
- hadoop.crypto.key.TestKeyProviderFactory

Because all Azure changes go under the hadoop-azure project, these unit test failures cannot be caused by the patch. Can someone help?

Regards,
Da
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510371#comment-16510371 ]

genericqa commented on HADOOP-15407:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 55 new or modified test files. |
|| || || || HADOOP-15407 Compile Tests ||
| 0 | mvndep | 6m 15s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 36s | HADOOP-15407 passed |
| +1 | compile | 29m 38s | HADOOP-15407 passed |
| +1 | checkstyle | 3m 27s | HADOOP-15407 passed |
| +1 | mvnsite | 20m 39s | HADOOP-15407 passed |
| +1 | shadedclient | 34m 17s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 2m 4s | HADOOP-15407 passed |
| +1 | javadoc | 5m 28s | HADOOP-15407 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 27m 58s | the patch passed |
| +1 | compile | 28m 34s | the patch passed |
| +1 | javac | 28m 34s | the patch passed |
| +1 | checkstyle | 3m 18s | the patch passed |
| +1 | mvnsite | 20m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 7s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 13s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 2m 16s | the patch passed |
| +1 | javadoc | 5m 26s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 14m 45s | root in the patch failed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
| | | 219m 54s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.fs.shell.TestCopyFromLocal |
| | hadoop.crypto.key.TestKeyShell |
| | hadoop.crypto.key.TestKeyProviderFactory |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15407 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927526/HADOOP-15407-HADOOP-15407.008.patch |
| Optional Tests | asflicense compile javac javadoc mvn
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510334#comment-16510334 ]

genericqa commented on HADOOP-15407:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 55 new or modified test files. |
|| || || || HADOOP-15407 Compile Tests ||
| 0 | mvndep | 6m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 30s | HADOOP-15407 passed |
| +1 | compile | 29m 8s | HADOOP-15407 passed |
| +1 | checkstyle | 3m 17s | HADOOP-15407 passed |
| +1 | mvnsite | 19m 54s | HADOOP-15407 passed |
| +1 | shadedclient | 33m 30s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 2m 4s | HADOOP-15407 passed |
| +1 | javadoc | 6m 1s | HADOOP-15407 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 19s | Maven dependency ordering for patch |
| +1 | mvninstall | 28m 20s | the patch passed |
| +1 | compile | 28m 35s | the patch passed |
| +1 | javac | 28m 35s | the patch passed |
| +1 | checkstyle | 3m 22s | the patch passed |
| +1 | mvnsite | 19m 36s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 7s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 33s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 2m 23s | the patch passed |
| +1 | javadoc | 5m 43s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 14m 9s | root in the patch failed. |
| +1 | asflicense | 0m 38s | The patch does not generate ASF License warnings. |
| | | 219m 7s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.fs.shell.TestCopyFromLocal |
| | hadoop.crypto.key.TestKeyShell |
| | hadoop.crypto.key.TestKeyProviderFactory |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15407 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927516/HADOOP-15407-HADOOP-15407.008.patch |
| Optional Tests | asflicense compile javac javadoc mvn
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509107#comment-16509107 ]

Da Zhou commented on HADOOP-15407:

Submitting HADOOP-15407-HADOOP-15407.008.patch; all ABFS tests passed against my storage account in West US.

Updates in the patch:
- Resolved findbugs violations
- Resolved checkstyle violations
- Added missing javadocs
- Made AzureBlobFileSystemException a subclass of IOException (IOE) and updated the exception checks
- Reinstated parallel runs for the WASB contract tests and enabled parallel runs for the ABFS contract tests
- Renamed the service injection interface and implementation with Azure-specific names
- Replaced loggingService with SLF4J

{noformat}
mvn -T 1C -Dparallel-tests -DtestsThreadCount=8 clean verify
[INFO] --- maven-antrun-plugin:1.7:run (create-parallel-tests-dirs) @ hadoop-azure ---
[INFO] Executing tasks
main:
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/1
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/2
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/3
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/4
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/5
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/6
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/7
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test-dir/8
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/1
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/2
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/3
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/4
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/5
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/6
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/7
[mkdir] Created dir: /home/zhoda/dev/Projects/apache-hadoop/hadoop/hadoop-tools/hadoop-azure/target/test/8
[INFO] Executed tasks
[INFO]
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-azure ---
[INFO]
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.fs.azure.TestShellDecryptionKeyProvider
[INFO] Running org.apache.hadoop.fs.azure.TestBlobMetadata
[INFO] Running org.apache.hadoop.fs.azure.TestWasbFsck
[INFO] Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked
[INFO] Running org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations
[INFO] Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemUploadLogic
[INFO] Running org.apache.hadoop.fs.azure.TestClientThrottlingAnalyzer
[INFO] Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked
[WARNING] Tests run: 3, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 1.098 s - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemUploadLogic
[WARNING] Tests run: 2, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 1.934 s - in org.apache.hadoop.fs.azure.TestShellDecryptionKeyProvider
[INFO] Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency
[INFO] Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked
[WARNING] Tests run: 2, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 8.097 s - in org.apache.hadoop.fs.azure.TestWasbFsck
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.808 s - in org.apache.hadoop.fs.azure.TestBlobMetadata
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.033 s - in org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations
[INFO] Running org.apache.hadoop.fs.azure.metrics.TestNativeAzureFileSystemMetricsSystem
[INFO] Running org.apache.hadoop.fs.azure.metrics.TestBandwidthGaugeUpdater
[INFO] Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAuthorization
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.308 s - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.594 s - in org.apache.hadoop.fs.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507132#comment-16507132 ]

Steve Loughran commented on HADOOP-15407:

Follow-up: I got this working from the {{hadoop fs}} command, though we need to understand and sort out the packaging there.

* The ASF distro doesn't put the hadoop-azure connector and its dependencies into hadoop-common, but into the hadoop-tools lib, and that doesn't get onto the classpath for the {{hadoop fs}} command. The {{hadoop fs s3a://}} operations do work out of the box, though; I need to understand more of what goes on there.
* Even with the hadoop-azure, azure-sdk and htrace artifacts copied from the hadoop-tools lib to the hadoop-common lib, I got a CNFE (ClassNotFoundException) for htrace:

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/htrace/fasterxml/jackson/core/JsonProcessingException
	at java.lang.Class.getDeclaredConstructors0(Native Method)
	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
	at java.lang.Class.getDeclaredConstructors(Class.java:2020)
	at com.google.inject.spi.InjectionPoint.forConstructorOf(InjectionPoint.java:245)
	at com.google.inject.internal.ConstructorBindingImpl.create(ConstructorBindingImpl.java:99)
	at com.google.inject.internal.InjectorImpl.createUninitializedBinding(InjectorImpl.java:658)
	at com.google.inject.internal.InjectorImpl.createJustInTimeBinding(InjectorImpl.java:882)
	at com.google.inject.internal.InjectorImpl.createJustInTimeBindingRecursive(InjectorImpl.java:805)
	at com.google.inject.internal.InjectorImpl.getJustInTimeBinding(InjectorImpl.java:282)
	at com.google.inject.internal.InjectorImpl.getBindingOrThrow(InjectorImpl.java:214)
	at com.google.inject.internal.InjectorImpl.getInternalFactory(InjectorImpl.java:890)
	at com.google.inject.internal.FactoryProxy.notify(FactoryProxy.java:46)
	at com.google.inject.internal.ProcessedBindingData.runCreationListeners(ProcessedBindingData.java:50)
	at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:134)
	at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
	at com.google.inject.Guice.createInjector(Guice.java:96)
	at com.google.inject.Guice.createInjector(Guice.java:73)
	at com.google.inject.Guice.createInjector(Guice.java:62)
	at org.apache.hadoop.fs.azurebfs.services.ServiceProviderImpl.<init>(ServiceProviderImpl.java:43)
	at org.apache.hadoop.fs.azurebfs.services.ServiceProviderImpl.create(ServiceProviderImpl.java:60)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:102)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
	at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
	at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
	at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
	at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
	at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
	at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.fasterxml.jackson.core.JsonProcessingException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}
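One way to narrow down a failure like the trace above is to ask the JVM directly whether a class is visible and which jar it comes from. The following is only a minimal sketch in plain Java: the class name {{ClasspathProbe}} and the helper {{locate}} are hypothetical names made up for this illustration and are not part of Hadoop or of the patch.

```java
import java.security.CodeSource;

public class ClasspathProbe {

  /**
   * Hypothetical helper: report where a class would be loaded from,
   * without running its static initializers.
   */
  static String locate(String className) {
    try {
      Class<?> c = Class.forName(className, false,
          ClasspathProbe.class.getClassLoader());
      CodeSource src = c.getProtectionDomain().getCodeSource();
      // Classes from the JDK runtime itself usually have no code source.
      return src == null ? "found (JDK runtime)" : "found in " + src.getLocation();
    } catch (ClassNotFoundException | LinkageError e) {
      // Covers both the CNFE and the NoClassDefFoundError cases.
      return "missing: " + e;
    }
  }

  public static void main(String[] args) {
    // Default to the class named in the stack trace; pass others as arguments.
    String[] names = args.length > 0 ? args : new String[] {
        "org.apache.htrace.fasterxml.jackson.core.JsonProcessingException"};
    for (String name : names) {
      System.out.println(name + " -> " + locate(name));
    }
  }
}
```

Running it with the classpath printed by {{hadoop classpath}} should show whether the htrace jar is visible to the shell commands; the result depends entirely on the local install.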
Had to go
{code}
cp ./share/hadoop/yarn/timelineservice/lib/htrace-core-3.1.0-incubating.jar share/hadoop/common/lib
{code}
I don't get this, as the dependencies are set up (it's a 'compile' dep), so I'm not sure why it isn't ending up in the hadoop-tools lib. Again, clearly something packaging related.

I'm not sure how to test this stuff except manually; we don't have any integration tests of the actual packaged code (yet), though it's probably possible via some new test suite in the hadoop-dist packaging. Or something downstream/nearby. I know all the Hadoop stack vendors will have some tests for this, but it's testing their packaging, not the b
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505018#comment-16505018 ]

Steve Loughran commented on HADOOP-15407:

h1. First attempt at testing

I found it very hard to get up and running. As in: I have one of the contract tests going, but nothing else yet. The testing docs will need to explain how to get started. The easier the setup process, the easier writing those docs becomes. I'd make writing that doc a priority, as without it, getting the tests working will be a blocker to reviews.

Key things I had trouble with:
* the difference between the wasb & dfs accounts
* what's needed in terms of pre-test store container setup. I think it's happening automatically, but that's probably repeating the same problem we see with wasb: container leakage & the need to periodically purge them all. If that's the case, a new version of {{org.apache.hadoop.fs.azure.integration.CleanupTestContainers}} is needed, and again, the docs.
* Lack of meaningful details on why a test setup failed other than "skipped". The attached patch addresses that by including a message in the Assume clause. (side-note: I expect meaningful messages in *all* Assume.assume clauses, as I try to do in my own contribs).

I tried to get {{ITestAzureBlobFileSystemMkDir}} up and working and didn't get that far: timeouts. Every test needs a timeout. This is to avoid messages like
{code}
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on project hadoop-azure: There was a timeout or other error in the fork -> [Help 1]
{code}
When maven kills a test, all the output is lost and, as all test teardown is skipped, things on remote stores are left in a mess. I've added one to {{DependencyInjectedTest}}, where it will be picked up everywhere. This shows me what's hanging. I'm assuming it's still test setup related, so will look at my config options more.
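To illustrate the "every test needs a timeout" point, here is a minimal plain-Java sketch of bounding a setup step with a deadline, so that a hang surfaces as a named failure rather than a killed fork. It is not the JUnit timeout added to {{DependencyInjectedTest}} in the patch; the class {{TimeoutSketch}} and the helper {{runWithTimeout}} are hypothetical names used only for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {

  /** Hypothetical helper: run a step, failing with a clear message if it overruns. */
  static <T> T runWithTimeout(Callable<T> step, long millis, String what)
      throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "timeout-sketch");
      t.setDaemon(true); // a stuck step must not keep the JVM alive
      return t;
    });
    Future<T> result = pool.submit(step);
    try {
      return result.get(millis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Name the step that hung instead of losing all context.
      throw new IllegalStateException(
          what + " timed out after " + millis + " ms", e);
    } catch (ExecutionException e) {
      // Rethrow the step's own failure rather than the wrapper.
      Throwable cause = e.getCause();
      if (cause instanceof Exception) {
        throw (Exception) cause;
      }
      throw e;
    } finally {
      result.cancel(true); // interrupts the step if it is still running
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runWithTimeout(() -> "ready", 1000, "fast setup"));
    try {
      // Simulates a misconfigured setup that sleeps in a retry loop.
      runWithTimeout(() -> { Thread.sleep(60_000); return "never"; },
          200, "container setup");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because the caller gets an exception instead of being killed, teardown code still has a chance to run and the message says which step was stuck.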
But the fact that things time out if the tests are misconfigured is a problem on its own.
{code}
"Thread-0" #13 prio=5 os_prio=31 tid=0x7f97061b3000 nid=0x5803 waiting on condition [0x7f8ac000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:255)
	at com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:769)
	at com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:756)
	at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobContainerWrapperImpl.exists(StorageInterfaceImpl.java:233)
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:856)
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1081)
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:538)
	at org.apache.hadoop.fs.azurebfs.DependencyInjectedTest.initialize(DependencyInjectedTest.java:132)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
# I don't think falling back to anon should happen, at least with tests.
# I absolutely don't think that login failures should be something you retry on.

I see there's a call to {{suppressRetryPolicyInClientIfNeeded();}}, so the tests need to make sure that's running. I think production-side code needs to look at the auth codepath and make sure that its operations are all fail fast.

Proposed: add a test for this. Create a config, remove the auth, try to do anon access to your test containers. Expect it to fail fast.

Other aspects of that test: LambdaTestUtils.intercept() loves closures which return things other than void: the string value of the res
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504683#comment-16504683 ]

Steve Loughran commented on HADOOP-15407:

[~fabbri], [~esmanii]: the Hadoop project needs to think about what to do w.r.t. htrace in future, given it's not going to leave incubation. HDFS uses it, so I'm not worried about it being added as a dependency here, but it's not going to get any maintenance unless we think about co-opting the client-side tracing into our own codebase, leaving log collection to other tools. With Todd and Colin Patrick McCabe on our committer list, we could make a case for that.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503951#comment-16503951 ]

Esfandiar Manii commented on HADOOP-15407:

Thanks [~fabbri]. Yes, htrace was very useful (credits to Steve for letting us know to use it :) ).
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503693#comment-16503693 ]

Aaron Fabbri commented on HADOOP-15407:
---

Thanks. I was still reading through the latest patch. I just don't want "read only" code that can only be reliably changed by vendors.

Glad to see some use of htrace. Did you guys find it useful during development?
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503686#comment-16503686 ]

Thomas Marquardt commented on HADOOP-15407:
---

We updated the patch recently and removed all the generated code. We will begin working to address the feedback above. Thanks, and keep the feedback coming!
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503682#comment-16503682 ]

Aaron Fabbri commented on HADOOP-15407:
---

Thanks for the update. In terms of us maintaining this code long term, how do we generate the generated code? I'm generally going to be -1 on checking in generated code without giving the community a way to generate it.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503626#comment-16503626 ]

Steve Loughran commented on HADOOP-15407:
-

I'm starting off with some high level "strategic issues", not looking at the low-level details. IntelliJ is flagging these up, but that's secondary and can be dealt with package-by-package.

I haven't yet done the testing.

The general things I review by are covered in my somewhat out of date [proposed styleguide update|https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md]. That doesn't cover Java 8, lambda expressions and streams, but it does show what I'm thinking about when I review stuff.

The key point: we want the code base to be broadly maintainable by anyone, tests to be fast and fail meaningfully, and, where possible, as much of the same code style as elsewhere (though that evolves; moving things on is always good). So I'm looking at it not just from a 'does it work' viewpoint, but from "If I was left to maintain this, would I stand a chance?"

Key requirements to be fixed before the big patch goes in

* Make {{AzureBlobFileSystemException}} a subclass of IOE.
* All those checkstyle complaints, except in generated code.
* And findbugs, except when it is wrong & needs to be told so.
* Reinstate wasb contract tests to parallel runs.
* Use Azure-specific names for the service injection interfaces and impls.

Followup JIRAs to go in

* Document dependency injection in the javadocs
* Review exception handling/wrapping
* Logging to use SLF4J directly for formatting
* Testing enhancements as proposed below
* Documentation

Oh, and I'd like the first patch to be one which only does the POM dependency checks. This gives our new branch committers an initial patch to work with.

h3. Poms and dependencies

The patch to change the POM dependencies should go in on its own (presumably, first patch), so it's more isolated. People cherry picking or doing diffs on POMs will appreciate this.

Interesting that you had to exclude the guice transitive dependency on guava, but no other project did. Assumption: maven's "depth first" resolution logic has hidden it from others.

h3. Dependency Injection

We've tended to avoid this in the past; more a metric of the age of the codebase & the odd bad experience in the past. It is used in YARN a fair bit though.

I can see you've been busy here. That's fine, except you do get to document it in detail.

I also worry that names like "ServiceProvider" may be a bit too generic; something like AzureStoreServiceProvider would be clearer.

Proposed: add a JIRA to cover names & some coverage in the package.html file of {{org.apache.hadoop.fs.azurebfs}}

h3. Exceptions

My ideal: exceptions which include stacks, URLs, anything else to diagnose problems, ideally consistent with the Hadoop stack's "IOException everywhere" world.

* Please make {{AzureBlobFileSystemException}} a subclass of IOE. This matches the rest of our codebase and avoids having to translate things to IOEs in the FileSystem implementations.
* Make sure that stack traces are always retained. {{AzureBlobFileSystem.evaluateFileSystemPathOperation()}} is an example of where they are being stripped out in some cases (404, 409).
* Also: look at {{PathIOException}} for an exception class which takes a path; if {{AzureBlobFileSystemException}} could be built under that, it'd contain the path and operation, and be something other code already handles.

Proposed: move {{AzureBlobFileSystemException}} under IOE before the first patch goes in, tune the other details separately.

h3. Logging

Key question: why not just use the SLF4J API directly? Except for that specific case method {{log(LogLevel logLevel, String message, Object... arguments)}}, I don't see anything there that's not in SLF4J.

And why a new message format? The SLF4J Log.info("{}", object) log message constructor is more efficient than the JDK formatter, handles null object values, and only calls toString() on its args when actually logging them.

* Followup 2: a failure in formatting must be swallowed, not raised. Everyone hates a log failure, not least because the debug level logging doesn't normally get any test coverage.

Given how fundamental this log injection is, I guess it's hard to change, but I believe the formatting should be handed off to SLF4J rather than trying to do things with a JSON parser, which will inevitably (a) be slow and (b) be brittle.

Proposed: add a JIRA: SLF4J to do formatting for AbfsLoggingService. This is going to be a blocker.

BTW, keeping the cost of logging down is something we care about: CPU overhead + cost of creating objects, calling toString(), etc.

h2. Testing

We shouldn't need to add a new auth-keys file, just share the existing azure one, adding the new relevant fs URI. Simplifies path & allows .
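As an illustration of the Exceptions points in the review above (subclass IOE, keep the stack trace, carry the path), here is a minimal JDK-only sketch. The class names {{SketchStoreException}} and {{ExceptionSketch}}, the {{translate}} helper, and the choice to map 404 to {{FileNotFoundException}} are all hypothetical and are not taken from the patch; the real driver would presumably build on {{AzureBlobFileSystemException}} or {{PathIOException}} instead.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical store exception: an IOException that keeps path, status code and cause. */
class SketchStoreException extends IOException {
  private final String path;
  private final int statusCode;

  SketchStoreException(String operation, String path, int statusCode, Throwable cause) {
    // Passing the cause to super() keeps the original stack trace in the exception chain.
    super(operation + " on " + path + " failed with HTTP " + statusCode, cause);
    this.path = path;
    this.statusCode = statusCode;
  }

  String getPath() {
    return path;
  }

  int getStatusCode() {
    return statusCode;
  }
}

class ExceptionSketch {
  /** Translates a low-level failure into an IOException without dropping the cause. */
  static IOException translate(String operation, String path, int statusCode, Exception cause) {
    if (statusCode == 404) {
      // FileNotFoundException has no (message, cause) constructor, so attach the cause afterwards.
      FileNotFoundException fnfe =
          new FileNotFoundException(operation + ": " + path + " not found");
      fnfe.initCause(cause);
      return fnfe;
    }
    return new SketchStoreException(operation, path, statusCode, cause);
  }

  public static void main(String[] args) {
    Exception lowLevel = new RuntimeException("simulated REST failure");
    IOException notFound = translate("open", "/data/part-0000", 404, lowLevel);
    IOException conflict = translate("rename", "/data/part-0000", 409, lowLevel);
    System.out.println(notFound.getClass().getSimpleName()
        + " cause=" + notFound.getCause().getMessage());
    System.out.println(conflict.getMessage());
  }
}
```

Because both results are plain {{IOException}}s, a FileSystem implementation could rethrow them directly, and callers that print the full chain would still see the original failure.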
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499737#comment-16499737 ]

genericqa commented on HADOOP-15407:

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 52 new or modified test files. |
|| || || || HADOOP-15407 Compile Tests ||
| 0 | mvndep | 6m 31s | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 24s | HADOOP-15407 passed |
| +1 | compile | 28m 11s | HADOOP-15407 passed |
| +1 | checkstyle | 3m 49s | HADOOP-15407 passed |
| +1 | mvnsite | 19m 0s | HADOOP-15407 passed |
| +1 | shadedclient | 32m 55s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 2m 4s | HADOOP-15407 passed |
| +1 | javadoc | 5m 10s | HADOOP-15407 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 27m 23s | the patch passed |
| +1 | compile | 30m 37s | the patch passed |
| +1 | javac | 30m 37s | the patch passed |
| -0 | checkstyle | 3m 14s | root: The patch generated 198 new + 5 unchanged - 0 fixed = 203 total (was 5) |
| +1 | mvnsite | 19m 37s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 7s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 23s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| -1 | findbugs | 0m 48s | hadoop-tools/hadoop-azure generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) |
| +1 | javadoc | 5m 31s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 146m 16s | root in the patch failed. |
| +1 | asflicense | 0m 39s | The patch does not generate ASF License warnings. |
| | | 348m 45s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-azure |
| | Hard coded reference to an absolute pathname in org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(URI, Configuration) At AzureBlobFileSystem.java:absolute pathname in org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(URI, Configuration) At AzureBlobFileSystem.
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499624#comment-16499624 ]

Da Zhou commented on HADOOP-15407:
--

Submitting HADOOP-15407-HADOOP-15407.007.patch; all tests passed against my storage account in West US.

Updates in the patch:
- Resolved the whitespace violation
- Resolved all findbugs violations except the “redundant null check for inputstream”
- Updated the Hadoop-Common unit test “TestCommonConfigurationFields” for azurebfs.

{noformat}
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemInitAndCreate
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.222 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemInitAndCreate
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemE2E
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.652 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemE2E
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFileStatus
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.844 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFileStatus
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRandomRead
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 122.812 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRandomRead
[INFO] Running org.apache.hadoop.fs.azurebfs.diagnostics.TestConfigurationValidators
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.008 s - in org.apache.hadoop.fs.azurebfs.diagnostics.TestConfigurationValidators
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCopy
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.682 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCopy
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFlush
[WARNING] Tests run: 4, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 180.827 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFlush
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemOpen
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.902 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemOpen
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestFileSystemRegistration
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.168 s - in org.apache.hadoop.fs.azurebfs.ITestFileSystemRegistration
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRename
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 23.438 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRename
[INFO] Running org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractGetFileStatus
[WARNING] Tests run: 36, Failures: 0, Errors: 0, Skipped: 18, Time elapsed: 27.869 s - in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractGetFileStatus
[INFO] Running org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDelete
[WARNING] Tests run: 16, Failures: 0, Errors: 0, Skipped: 8, Time elapsed: 8.447 s - in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDelete
[INFO] Running org.apache.hadoop.fs.azurebfs.contract.ITestAzureBlobFileSystemContract
[INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.264 s - in org.apache.hadoop.fs.azurebfs.contract.ITestAzureBlobFileSystemContract
[INFO] Running org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractCreate
[WARNING] Tests run: 22, Failures: 0, Errors: 0, Skipped: 11, Time elapsed: 15.511 s - in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractCreate
[INFO] Running org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractOpen
[WARNING] Tests run: 12, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 6.441 s - in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractOpen
[INFO] Running org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractRename
[WARNING] Tests run: 16, Failures: 0, Errors: 0, Skipped: 8, Time elapsed: 14.6 s - in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractRename
[INFO] Running org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractSecureDistCp
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 1.61 s - in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractSecureDistCp
[INFO] Running org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractSeek
[WARNING] Tests run: 36, Failures: 0, Errors: 0, Skipped: 18, Time elapsed: 24.796 s - in org.apache.hadoop.fs.azurebfs.contract.ITestA
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498779#comment-16498779 ]

Da Zhou commented on HADOOP-15407:
----------------------------------

Hi [~ste...@apache.org], I'm fixing the patch according to the report. There is one FindBugs warning I'm not sure about:
{noformat}
Redundant nullcheck of stream, which is known to be non-null in org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(byte[], int, int)
Redundant null check at AbfsHttpOperation.java:[line 26]
{noformat}
The code is below. I don't understand why this is a redundant check. Is it possible to suppress this "bug"?
{code:borderStyle=solid}
try (InputStream stream = this.connection.getInputStream()) {
  if (stream == null) {
    return;
  }
  boolean endOfStream = false;
  ...
  ...
}
{code}
Thanks,
Da

> Support Windows Azure Storage - Blob file system in Hadoop
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Esfandiar Manii
> Assignee: Esfandiar Manii
> Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, HADOOP-15407-003.patch, HADOOP-15407-004.patch, HADOOP-15407-HADOOP-15407.006.patch
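On the FindBugs question in the comment above: as far as the Javadoc goes, `URLConnection.getInputStream()` either returns a stream or throws `IOException`, so a `null` result is not part of its documented contract, which may be why the tool treats the value as non-null. One option is therefore to drop the check rather than suppress the warning. The Java sketch below shows that shape under stated assumptions: `ResponseReader` and `readFully` are hypothetical names used only for illustration, they are not taken from the ABFS patch, and the loop only approximates what `processResponse(byte[], int, int)` might do with the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper; not part of the ABFS patch. */
public class ResponseReader {

    /**
     * Reads up to {@code length} bytes from {@code source} into {@code buffer},
     * starting at {@code offset}, and closes the stream afterwards.
     * No null check is made on the resource: the caller is assumed to pass a
     * stream obtained from a call that throws instead of returning null.
     *
     * @return the number of bytes actually read
     */
    static int readFully(InputStream source, byte[] buffer, int offset, int length)
            throws IOException {
        int total = 0;
        // try-with-resources closes the stream even if read() throws.
        try (InputStream stream = source) {
            while (total < length) {
                int n = stream.read(buffer, offset + total, length - total);
                if (n < 0) {
                    break; // end of stream reached before the buffer was filled
                }
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[8];
        int read = readFully(new ByteArrayInputStream(new byte[] {1, 2, 3}), buffer, 2, 5);
        System.out.println("bytes read: " + read);
    }
}
```

If the check has to stay, FindBugs also supports an exclude filter that matches a bug pattern (here likely `RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE`) for a single method; whether that is preferable to removing the check is a call for the reviewers.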
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497445#comment-16497445 ]

genericqa commented on HADOOP-15407:
------------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 24s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 51 new or modified test files. |
|| || || || HADOOP-15407 Compile Tests ||
| 0 | mvndep | 6m 23s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 7s | HADOOP-15407 passed |
| +1 | compile | 29m 27s | HADOOP-15407 passed |
| +1 | checkstyle | 3m 13s | HADOOP-15407 passed |
| +1 | mvnsite | 19m 2s | HADOOP-15407 passed |
| +1 | shadedclient | 32m 26s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project. |
| +1 | findbugs | 1m 57s | HADOOP-15407 passed |
| +1 | javadoc | 5m 14s | HADOOP-15407 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 26m 18s | the patch passed |
| +1 | compile | 27m 45s | the patch passed |
| +1 | javac | 27m 45s | the patch passed |
| -0 | checkstyle | 4m 8s | root: The patch generated 194 new + 0 unchanged - 0 fixed = 194 total (was 0) |
| +1 | mvnsite | 19m 44s | the patch passed |
| -1 | whitespace | 0m 0s | The patch 2 line(s) with tabs. |
| +1 | xml | 0m 7s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 15s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project. |
| -1 | findbugs | 0m 55s | hadoop-tools/hadoop-azure generated 17 new + 0 unchanged - 0 fixed = 17 total (was 0) |
| +1 | javadoc | 5m 26s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 15m 24s | root in the patch failed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
| | | 216m 11s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-azure |
| | Hard coded reference to an absolute pathname in org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getHomeDirectory() At AzureBlobFileSystem.java:[line 435] |
| | Should org
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488207#comment-16488207 ]

Thomas Marquardt commented on HADOOP-15407:
-------------------------------------------

Thanks, we are validating the initial patch. Esfandiar and I have been quiet on https://issues.apache.org/jira/browse/HADOOP-15407 and the process of committing Azure Blob FS (ABFS), but we are ready to pick up the pace. The reason for our delay was that we discovered a performance issue within the client, and were struggling with a fix. The fix involved refactoring the ABFS connector, replacing the HTTP stack, and removing dependencies. Now the read and write performance is better, the code is more self-contained, and we have a new patch that we will attach to the JIRA today. [~DanielZhou] contributed to the second iteration.

> Support Windows Azure Storage - Blob file system in Hadoop
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Esfandiar Manii
> Assignee: Esfandiar Manii
> Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, HADOOP-15407-003.patch
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487780#comment-16487780 ]

Steve Loughran commented on HADOOP-15407:
-----------------------------------------

Branch for this is created.

> Support Windows Azure Storage - Blob file system in Hadoop
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Esfandiar Manii
> Assignee: Esfandiar Manii
> Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch, HADOOP-15407-003.patch
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452153#comment-16452153 ]

Steve Loughran commented on HADOOP-15407:
-----------------------------------------

[~fabbri]: [~chris.douglas] & I will be proposing a branch for this to be pulled in, to allow more detailed review & testing before the merge into trunk. But yes, a big patch. If you look closely, a lot of it is machine generated, so that can be glanced at but not worried about in detail (what can you do, change the code generator?).

> Support Windows Azure Storage - Blob file system in Hadoop
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Esfandiar Manii
> Assignee: Esfandiar Manii
> Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451492#comment-16451492 ]

Aaron Fabbri commented on HADOOP-15407:
---------------------------------------

Wow, this is a big patch. (Aside: we really need to move away from mega-patches IMO; they are antithetical to quality code reviews.)
{quote}Third parties and customers have also done various testing of ABFS.{quote}
Are there any specific reasons you didn't do this work with the Apache community? If there are, we should try to address them. It is much easier for folks like me to digest this if it is done as a series of smaller commits on a feature branch. Do you have a clean commit history you can push to a public branch on GitHub?
{quote}WASB is not deprecated but is in pure maintenance mode and customers should upgrade to ABFS once it hits General Availability later in CY18.{quote}
Might want to add some caveats around that. ;)

> Support Windows Azure Storage - Blob file system in Hadoop
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Esfandiar Manii
> Assignee: Esfandiar Manii
> Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449145#comment-16449145 ]

Esfandiar Manii commented on HADOOP-15407:
------------------------------------------

My bad, the order of diff was incorrect. Updated with the correct one. :)

> Support Windows Azure Storage - Blob file system in Hadoop
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Esfandiar Manii
> Assignee: Esfandiar Manii
> Priority: Major
> Attachments: HADOOP-15407-001.patch, HADOOP-15407-002.patch
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449125#comment-16449125 ]

Devaraj Das commented on HADOOP-15407:
--

[~esmanii], the patch seems to have been generated incorrectly. I'd expect this jira is adding a lot of new code, but the patch does otherwise :)

> Support Windows Azure Storage - Blob file system in Hadoop
> --
>
> Key: HADOOP-15407
> URL: https://issues.apache.org/jira/browse/HADOOP-15407
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/azure
> Affects Versions: 3.2.0
> Reporter: Esfandiar Manii
> Assignee: Esfandiar Manii
> Priority: Major
> Attachments: HADOOP-15407-001.patch
>
>
> *Description*
> This JIRA adds a new file system implementation, ABFS, for running Big Data
> and Analytics workloads against Azure Storage. This is a complete rewrite of
> the previous WASB driver with a heavy focus on optimizing both performance
> and cost.
>
> *High level design*
> At a high level, the code here extends the FileSystem class to provide an
> implementation for accessing blobs in Azure Storage. The scheme abfs is used
> for accessing it over HTTP, and abfss for accessing over HTTPS. The following
> URI scheme is used to address individual paths:
>
> abfs[s]://@.dfs.core.windows.net/
>
> ABFS is intended as a replacement for WASB. WASB is not deprecated but is in
> pure maintenance mode, and customers should upgrade to ABFS once it hits
> General Availability later in CY18.
> Benefits of ABFS include:
> · Higher scale (capacity, throughput, and IOPS) Big Data and Analytics
> workloads by allowing higher limits on storage accounts
> · Removing any ramp up time with Storage backend partitioning; blocks are now
> automatically sharded across partitions in the Storage backend
> . This avoids the need for using temporary/intermediate files, increasing the
> cost (and framework complexity around committing jobs/tasks)
> · Enabling much higher read and write throughput on single files (tens of
> Gbps by default)
> · Still retaining all of the Azure Blob features customers are familiar with
> and expect, and gaining the benefits of future Blob features as well
> ABFS incorporates Hadoop Filesystem metrics to monitor the file system
> throughput and operations. Ambari metrics are not currently implemented for
> ABFS, but will be available soon.
>
> *Credits and history*
> Credit for this work goes to (hope I don't forget anyone): Shane Mainali,
> Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, Amit Singh,
> Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, and James Baker.
>
> *Test*
> ABFS has gone through many test procedures including Hadoop file system
> contract tests, unit testing, functional testing, and manual testing. All the
> JUnit tests provided with the driver can run in either sequential or parallel
> fashion in order to reduce the testing time.
> Besides unit tests, we have used ABFS as the default file system in Azure
> HDInsight. Azure HDInsight will very soon offer ABFS as a storage option.
> (HDFS is also used but not as default file system.) Various customer and test
> workloads have been run against clusters with such configurations for quite
> some time. Benchmarks such as Tera*, TPC-DS, Spark Streaming and Spark SQL,
> and others have been run to do scenario, performance, and functional testing.
> Third parties and customers have also done various testing of ABFS.
> The current version reflects the version of the code tested and used in our
> production environment.
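For readers following the "High level design" part of the quoted issue, the abfs/abfss addressing can be sketched with plain java.net.URI. This is only an illustration under stated assumptions: the placeholders in the URI template were lost in the notification text, so the sketch assumes the part before "@" names the file system and the label before ".dfs.core.windows.net" names the storage account. The class AbfsUriSketch, its build method, and the names "myfs" and "myaccount" are hypothetical and are not part of the ABFS driver.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of the ABFS driver.
final class AbfsUriSketch {

    // Builds an ABFS-style URI. The mapping of fileSystem and account onto the
    // user-info and host parts is an assumption, since the template in the
    // issue text has lost its placeholders.
    public static URI build(boolean secure, String fileSystem, String account, String path)
            throws URISyntaxException {
        // Per the issue: abfs is used over HTTP, abfss over HTTPS.
        String scheme = secure ? "abfss" : "abfs";
        String host = account + ".dfs.core.windows.net";
        // -1 means no explicit port; query and fragment are left unset.
        return new URI(scheme, fileSystem, host, -1, path, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = build(true, "myfs", "myaccount", "/data/part-0000");
        System.out.println(uri);             // abfss://myfs@myaccount.dfs.core.windows.net/data/part-0000
        System.out.println(uri.getScheme()); // abfss
    }
}
```

In a real deployment a URI of this shape would be handed to Hadoop's FileSystem API rather than built by hand; the sketch only shows how the scheme choice and the host suffix from the description fit together.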
[jira] [Commented] (HADOOP-15407) Support Windows Azure Storage - Blob file system in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448976#comment-16448976 ]

Esfandiar Manii commented on HADOOP-15407:
--

{code:java}
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCreate
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.924 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCreate
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCopy
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.623 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemCopy
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemInitAndCreate
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.731 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemInitAndCreate
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemE2EScale
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 246.169 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemE2EScale
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAppend
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.202 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAppend
[INFO] Running org.apache.hadoop.fs.azurebfs.diagnostics.TestConfigurationValidators
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.805 s - in org.apache.hadoop.fs.azurebfs.diagnostics.TestConfigurationValidators
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRename
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 27.916 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemRename
[INFO] Running org.apache.hadoop.fs.azurebfs.services.TestConfigurationServiceFieldsValidation
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.258 s - in org.apache.hadoop.fs.azurebfs.services.TestConfigurationServiceFieldsValidation
[INFO] Running org.apache.hadoop.fs.azurebfs.services.ITestAbfsHttpServiceImpl
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.977 s - in org.apache.hadoop.fs.azurebfs.services.ITestAbfsHttpServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.TestParameterizedLoggingServiceImpl
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.283 s - in org.apache.hadoop.fs.azurebfs.services.TestParameterizedLoggingServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.TestLoggingServiceImpl
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.253 s - in org.apache.hadoop.fs.azurebfs.services.TestLoggingServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.TestNetworkThroughputAnalysisServiceImpl
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.87 s - in org.apache.hadoop.fs.azurebfs.services.TestNetworkThroughputAnalysisServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.ITestReadWriteAndSeek
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 244.85 s - in org.apache.hadoop.fs.azurebfs.services.ITestReadWriteAndSeek
[INFO] Running org.apache.hadoop.fs.azurebfs.services.TestAbfsStatisticsServiceImpl
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.195 s - in org.apache.hadoop.fs.azurebfs.services.TestAbfsStatisticsServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.services.ITestTracingServiceImpl
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.893 s - in org.apache.hadoop.fs.azurebfs.services.ITestTracingServiceImpl
[INFO] Running org.apache.hadoop.fs.azurebfs.utils.TestUriUtils
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.037 s - in org.apache.hadoop.fs.azurebfs.utils.TestUriUtils
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestWasbAbfsCompatibility
[WARNING] Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 11.948 s - in org.apache.hadoop.fs.azurebfs.ITestWasbAbfsCompatibility
[INFO] Running org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFileStatus
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.894 s - in org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemFileStatus
[INFO] Running org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDistCp
[WARNING] Tests run: 6, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 0.834 s - in org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDistCp
[INFO] Running org.apache.hadoop.fs.azurebfs.contract.ITestAzureBlobFileSystemContract
[INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.694 s -