Austin,

I believe the default FS is only used when you write to a path that doesn't specify the filesystem. Meaning, if you set the directory of PutHDFS to /data then it will use the default FS, but if you specify wasb://user@wasb2/data then it will go to /data in a different filesystem.
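Roughly, the resolution works like this (an untested sketch, not the processor's actual code; the host and account names are placeholders, and it assumes hadoop-client plus hadoop-azure on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathResolutionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder default FS

        // An unqualified directory is resolved against fs.defaultFS.
        Path plain = new Path("/data");
        FileSystem defaultFs = plain.getFileSystem(conf);
        System.out.println(defaultFs.makeQualified(plain)); // hdfs://namenode:8020/data

        // A fully qualified URI selects its own filesystem and ignores fs.defaultFS.
        // (Actually initializing it needs hadoop-azure and an fs.azure.account.key.* entry.)
        Path qualified = new Path("wasb://container@account.blob.core.windows.net/data");
        FileSystem wasbFs = qualified.getFileSystem(conf);
        System.out.println(wasbFs.makeQualified(qualified));
    }
}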
The problem here is that I don't see a way to specify different keys for each WASB filesystem in the core-site.xml. Admittedly I have never tried to set up something like this with many different filesystems. (There's an untested sketch at the bottom of this mail of what per-account keys might look like.)

-Bryan

On Tue, Mar 28, 2017 at 3:50 PM, Austin Heyne <[email protected]> wrote:
> Hi Andre,
>
> Yes, I'm aware of that configuration property; it's what I have been using
> to set the core-site.xml and hdfs-site.xml. For testing this I didn't modify
> the core-site located in the HADOOP_CONF_DIR but rather copied and modified
> it and then pointed the processor to the copy. The problem with this is that
> we'll end up with a large number of core-site.xml copies that will all have
> to be maintained separately. Ideally we'd be able to specify the defaultFS
> in the processor config or have the processor behave like the hdfs command
> line tools. The command line tools don't require the defaultFS to be set to
> a wasb url in order to use wasb urls.
>
> The key idea here is long term maintainability and using Ambari to maintain
> the configuration. If we need to change any other setting in the
> core-site.xml we'd have to change it in a bunch of different files manually.
>
> Thanks,
> Austin
>
>
> On 03/28/2017 03:34 PM, Andre wrote:
>
> Austin,
>
> Perhaps that wasn't explicit, but the settings don't need to be system wide;
> instead the defaultFS may be changed just for a particular processor, while
> the others may use other configurations.
>
> The *HDFS processor documentation mentions it allows you to set particular
> hadoop configurations:
>
> "A file or comma separated list of files which contains the Hadoop file
> system configuration. Without this, Hadoop will search the classpath for a
> 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default
> configuration"
>
> Have you tried using this field to point to a file as described by Bryan?
>
> Cheers
>
> On 29 Mar 2017 05:21, "Austin Heyne" <[email protected]> wrote:
>
> Thanks Bryan,
>
> Working with the configuration you sent, what I needed to change was to set
> the fs.defaultFS to the wasb url that we're working from. Unfortunately this
> is a less than ideal solution since we'll be pulling files from multiple
> wasb urls and ingesting them into an Accumulo datastore. Changing the
> defaultFS, I'm pretty certain, would mess with our local HDFS/Accumulo
> install. In addition, we're trying to maintain all of this configuration with
> Ambari, which from what I can tell only supports one core-site configuration
> file.
>
> Is the only solution here to maintain multiple core-site.xml files, or is
> there another way we can configure this?
>
> Thanks,
>
> Austin
>
>
>
> On 03/28/2017 01:41 PM, Bryan Bende wrote:
>>
>> Austin,
>>
>> Can you provide the full error message and stacktrace for the
>> IllegalArgumentException from nifi-app.log?
>>
>> When you start the processor it creates a FileSystem instance based on
>> the config files provided to the processor, which in turn causes all
>> of the corresponding classes to load.
>>
>> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
>> then I have successfully done the following...
>>
>> In core-site.xml:
>>
>> <configuration>
>>
>> <property>
>> <name>fs.defaultFS</name>
>> <value>wasb://YOUR_USER@YOUR_HOST/</value>
>> </property>
>>
>> <property>
>> <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>> <value>YOUR_KEY</value>
>> </property>
>>
>> <property>
>> <name>fs.AbstractFileSystem.wasb.impl</name>
>> <value>org.apache.hadoop.fs.azure.Wasb</value>
>> </property>
>>
>> <property>
>> <name>fs.wasb.impl</name>
>> <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>> </property>
>>
>> <property>
>> <name>fs.azure.skip.metrics</name>
>> <value>true</value>
>> </property>
>>
>> </configuration>
>>
>> In Additional Resources property of an HDFS processor, point to a
>> directory with:
>>
>> azure-storage-2.0.0.jar
>> commons-codec-1.6.jar
>> commons-lang3-3.3.2.jar
>> commons-logging-1.1.1.jar
>> guava-11.0.2.jar
>> hadoop-azure-2.7.3.jar
>> httpclient-4.2.5.jar
>> httpcore-4.2.4.jar
>> jackson-core-2.2.3.jar
>> jsr305-1.3.9.jar
>> slf4j-api-1.7.5.jar
>>
>>
>> Thanks,
>>
>> Bryan
>>
>>
>> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> Thanks for all the help you've given me so far. Today I'm trying to pull
>>> files from an Azure blob store. I've done some reading on this and from
>>> previous tickets [1] and guides [2] it seems the recommended approach is to
>>> place the required jars, to use the HDFS Azure protocol, in 'Additional
>>> Classpath Resources' and the hadoop core-site and hdfs-site configs into
>>> the 'Hadoop Configuration Resources'. I have my local HDFS properly
>>> configured to access wasb urls. I'm able to ls, copy to and from, etc.
>>> without problem. Using the same HDFS config files and trying both all the
>>> jars in my hadoop-client/lib directory (hdp) and using the jars recommended
>>> in [1], I'm still seeing the "java.lang.IllegalArgumentException: Wrong FS: "
>>> error in my NiFi logs and am unable to pull files from Azure blob storage.
>>>
>>> Interestingly, it seems the processor is spinning up way too fast; the
>>> errors appear in the log as soon as I start the processor. I'm not sure how
>>> it could be loading all of those jars that quickly.
>>>
>>> Does anyone have any experience with this or recommendations to try?
>>>
>>> Thanks,
>>> Austin
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>>> [2]
>>> https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>>>
>
>
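P.S. Regarding my comment at the top about per-account keys: the hadoop-azure account-key property embeds the storage account name, so in principle a single configuration could carry one key per account. I haven't actually tried this with multiple accounts, so treat the following as an untested sketch (account names and keys are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MultiAccountKeySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.wasb.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");

        // One key property per storage account; the account name is part of the property name.
        // The same pairs could be <property> entries in the core-site.xml handed to the
        // processor's Hadoop Configuration Resources.
        conf.set("fs.azure.account.key.firstaccount.blob.core.windows.net", "KEY_FOR_FIRST_ACCOUNT");
        conf.set("fs.azure.account.key.secondaccount.blob.core.windows.net", "KEY_FOR_SECOND_ACCOUNT");

        // With both keys present, fully qualified paths should be able to reach either account.
        Path first = new Path("wasb://container@firstaccount.blob.core.windows.net/data");
        Path second = new Path("wasb://container@secondaccount.blob.core.windows.net/data");
        FileSystem firstFs = first.getFileSystem(conf);
        FileSystem secondFs = second.getFileSystem(conf);
        System.out.println(firstFs.getUri() + " / " + secondFs.getUri());
    }
}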
