Austin,
I think you are correct that it's <containername>@<accountname>; I
hadn't looked at this config in a long time and was reading too
quickly before :)
That would line up with the other property,
fs.azure.account.key.<accountname>.blob.core.windows.net, where you
specify the key for that account.
I have no idea if this will work, but let's say you had three different
WASB filesystems, presumably each with its own account name and
key; you might be able to define these in core-site.xml:
<property>
  <name>fs.azure.account.key.ACCOUNT1.blob.core.windows.net</name>
  <value>KEY1</value>
</property>
<property>
  <name>fs.azure.account.key.ACCOUNT2.blob.core.windows.net</name>
  <value>KEY2</value>
</property>
<property>
  <name>fs.azure.account.key.ACCOUNT3.blob.core.windows.net</name>
  <value>KEY3</value>
</property>
Then in your HDFS processor in NiFi you would point at this
core-site.xml and use a specific directory like
wasb://[email protected]/<path>, and I'm hoping
it would know to use the key for ACCOUNT3.
Not really sure if that helps your situation.
-Bryan
On Tue, Mar 28, 2017 at 4:14 PM, Austin Heyne <[email protected]> wrote:
> Bryan,
>
> So I initially didn't think much of it (assumed it was a typo, etc.), but
> you've said that the access url for wasb that you've been using is
> wasb://YOUR_USER@YOUR_HOST/. However, this has never worked for us and I'm
> wondering if we have a different configuration somewhere. What we have to
> use is wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>,
> which seems to be in line with the Azure blob storage GUI and is what is
> outlined here [1a][1b]. Is there some other way this connector is being set
> up? It would make much more sense using your access pattern, as then each
> container wouldn't need to have its own core-site.xml.
>
> Thanks,
> Austin
>
> [1a]
> https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Accessing_wasb_URLs
> [1b]
> https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-use-blob-storage
>
> On 03/28/2017 03:55 PM, Bryan Bende wrote:
>>
>> Austin,
>>
>> I believe the default FS is only used when you write to a path that
>> doesn't specify the filesystem. Meaning, if you set the directory of
>> PutHDFS to /data then it will use the default FS, but if you specify
>> wasb://user@wasb2/data then it will go to /data in a different
>> filesystem.
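>>
>> As a rough sketch of what I mean (the hdfs host name here is
>> hypothetical), if core-site.xml contains:
>>
>> <property>
>>   <name>fs.defaultFS</name>
>>   <value>hdfs://namenode:8020</value>
>> </property>
>>
>> then a PutHDFS Directory of /data resolves against
>> hdfs://namenode:8020, while a Directory of wasb://user@wasb2/data
>> ignores the default FS and goes to that WASB filesystem instead.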
>>
>> The problem here is that I don't see a way to specify different keys
>> for each WASB filesystem in the core-site.xml.
>>
>> Admittedly I have never tried to setup something like this with many
>> different filesystems.
>>
>> -Bryan
>>
>>
>> On Tue, Mar 28, 2017 at 3:50 PM, Austin Heyne <[email protected]> wrote:
>>>
>>> Hi Andre,
>>>
>>> Yes, I'm aware of that configuration property; it's what I have been
>>> using to set the core-site.xml and hdfs-site.xml. For testing this I
>>> didn't modify the core-site located in the HADOOP_CONF_DIR, but rather
>>> copied and modified it and then pointed the processor to the copy. The
>>> problem with this is that we'll end up with a large number of
>>> core-site.xml copies that will all have to be maintained separately.
>>> Ideally we'd be able to specify the defaultFS in the processor config,
>>> or have the processor behave like the hdfs command line tools. The
>>> command line tools don't require the defaultFS to be set to a wasb url
>>> in order to use wasb urls.
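>>>
>>> For example (the container and account names below are just
>>> placeholders), this works from the command line even though our
>>> fs.defaultFS points at the local HDFS:
>>>
>>> hdfs dfs -ls wasb://[email protected]/some/path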
>>>
>>> The key idea here is long-term maintainability and using Ambari to
>>> maintain the configuration. If we needed to change any other setting
>>> in the core-site.xml, we'd have to change it in a bunch of different
>>> files manually.
>>>
>>> Thanks,
>>> Austin
>>>
>>>
>>> On 03/28/2017 03:34 PM, Andre wrote:
>>>
>>> Austin,
>>>
>>> Perhaps that wasn't explicit, but the settings don't need to be
>>> system-wide; instead, the defaultFS may be changed just for a
>>> particular processor, while the others use their own configurations.
>>>
>>> The *HDFS processor documentation mentions that it allows you to set
>>> particular Hadoop configurations:
>>>
>>> " A file or comma separated list of files which contains the Hadoop file
>>> system configuration. Without this, Hadoop will search the classpath for
>>> a
>>> 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default
>>> configuration"
>>>
>>> Have you tried using this field to point to a file as described by Bryan?
>>>
>>> Cheers
>>>
>>> On 29 Mar 2017 05:21, "Austin Heyne" <[email protected]> wrote:
>>>
>>> Thanks Bryan,
>>>
>>> Working with the configuration you sent, what I needed to change was
>>> to set fs.defaultFS to the wasb url that we're working from.
>>> Unfortunately this is a less than ideal solution, since we'll be
>>> pulling files from multiple wasb urls and ingesting them into an
>>> Accumulo datastore. I'm pretty certain changing the defaultFS would
>>> mess with our local HDFS/Accumulo install. In addition, we're trying
>>> to maintain all of this configuration with Ambari, which from what I
>>> can tell only supports one core-site configuration file.
>>>
>>> Is the only solution here to maintain multiple core-site.xml files, or
>>> is there another way to configure this?
>>>
>>> Thanks,
>>>
>>> Austin
>>>
>>>
>>>
>>> On 03/28/2017 01:41 PM, Bryan Bende wrote:
>>>>
>>>> Austin,
>>>>
>>>> Can you provide the full error message and stacktrace for the
>>>> IllegalArgumentException from nifi-app.log?
>>>>
>>>> When you start the processor it creates a FileSystem instance based on
>>>> the config files provided to the processor, which in turn causes all
>>>> of the corresponding classes to load.
>>>>
>>>> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
>>>> then I have successfully done the following...
>>>>
>>>> In core-site.xml:
>>>>
>>>> <configuration>
>>>>
>>>>   <property>
>>>>     <name>fs.defaultFS</name>
>>>>     <value>wasb://YOUR_USER@YOUR_HOST/</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>>>>     <value>YOUR_KEY</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.AbstractFileSystem.wasb.impl</name>
>>>>     <value>org.apache.hadoop.fs.azure.Wasb</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.wasb.impl</name>
>>>>     <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.azure.skip.metrics</name>
>>>>     <value>true</value>
>>>>   </property>
>>>>
>>>> </configuration>
>>>>
>>>> In the Additional Classpath Resources property of an HDFS processor,
>>>> point to a directory with:
>>>>
>>>> azure-storage-2.0.0.jar
>>>> commons-codec-1.6.jar
>>>> commons-lang3-3.3.2.jar
>>>> commons-logging-1.1.1.jar
>>>> guava-11.0.2.jar
>>>> hadoop-azure-2.7.3.jar
>>>> httpclient-4.2.5.jar
>>>> httpcore-4.2.4.jar
>>>> jackson-core-2.2.3.jar
>>>> jsr305-1.3.9.jar
>>>> slf4j-api-1.7.5.jar
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Bryan
>>>>
>>>>
>>>> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <[email protected]> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Thanks for all the help you've given me so far. Today I'm trying to
>>>>> pull files from an Azure blob store. I've done some reading on this,
>>>>> and from previous tickets [1] and guides [2] it seems the recommended
>>>>> approach is to place the required jars (to use the HDFS Azure
>>>>> protocol) in 'Additional Classpath Resources' and the hadoop
>>>>> core-site and hdfs-site configs into the 'Hadoop Configuration
>>>>> Resources'. I have my local HDFS properly configured to access wasb
>>>>> urls; I'm able to ls, copy to and from, etc. without problem. Using
>>>>> the same HDFS config files and trying both all the jars in my
>>>>> hadoop-client/lib directory (hdp) and the jars recommended in [1],
>>>>> I'm still seeing the "java.lang.IllegalArgumentException: Wrong FS: "
>>>>> error in my NiFi logs and am unable to pull files from Azure blob
>>>>> storage.
>>>>>
>>>>> Interestingly, it seems the processor is spinning up way too fast;
>>>>> the errors appear in the log as soon as I start the processor. I'm
>>>>> not sure how it could be loading all of those jars that quickly.
>>>>>
>>>>> Does anyone have any experience with this or recommendations to try?
>>>>>
>>>>> Thanks,
>>>>> Austin
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>>>>> [2]
>>>>> https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>>>>>
>>>>>
>>>
>>>
>