Hi Andre,
Yes, I'm aware of that configuration property; it's what I have been
using to set the core-site.xml and hdfs-site.xml. For this test I
didn't modify the core-site.xml located in the HADOOP_CONF_DIR but
rather copied and modified it, and then pointed the processor at the
copy. The problem with this is that we'll end up with a large number
of core-site.xml copies that will all have to be maintained
separately.
Ideally we'd be able to specify the defaultFS in the processor config,
or have the processor behave like the hdfs command-line tools, which
don't require the defaultFS to be set to a wasb url in order to use
wasb urls.
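For example, the following work against our cluster today even though
fs.defaultFS still points at our local HDFS (the account and container
names below are placeholders):

hdfs dfs -ls wasb://mycontainer@myaccount.blob.core.windows.net/data/
hdfs dfs -copyToLocal wasb://mycontainer@myaccount.blob.core.windows.net/data/some-file /tmp/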
The key idea here is long term maintainability and using Ambari to
maintain the configuration. If we need to change any other setting in
the core-site.xml we'd have to change it in a bunch of different files
manually.
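At best, assuming the usual Hadoop behavior that later configuration
resources override earlier ones (I haven't verified this end to end),
the copies could shrink to a tiny override file that only sets
fs.defaultFS and is listed after the Ambari-managed files in Hadoop
Configuration Resources. A sketch (the wasb url is a placeholder):

<?xml version="1.0"?>
<!-- Overrides only fs.defaultFS; everything else still comes from the
     Ambari-managed core-site.xml/hdfs-site.xml listed before it. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>wasb://mycontainer@myaccount.blob.core.windows.net/</value>
  </property>
</configuration>

But that still leaves one extra file per distinct wasb url to keep
track of.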
Thanks,
Austin
On 03/28/2017 03:34 PM, Andre wrote:
Austin,
Perhaps that wasn't explicit, but the settings don't need to be system
wide; the defaultFS may be changed just for a particular processor,
while the other processors keep using the default configuration.
The *HDFS processor documentation mentions that it allows you to set
particular hadoop configurations:
" A file or comma separated list of files which contains the Hadoop
file system configuration. Without this, Hadoop will search the
classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will
revert to a default configuration"
Have you tried using this field to point to a file as described by Bryan?
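For example (paths are placeholders), two processors can point their
Hadoop Configuration Resources at different lists:

Local HDFS processor:
/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

WASB processor:
/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml,/opt/nifi/conf/wasb-core-site.xml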
Cheers
On 29 Mar 2017 05:21, "Austin Heyne" <[email protected]> wrote:
Thanks Bryan,
Working with the configuration you sent, what I needed to change was
to set fs.defaultFS to the wasb url that we're working from.
Unfortunately this is a less-than-ideal solution, since we'll be
pulling files from multiple wasb urls and ingesting them into an
Accumulo datastore. I'm pretty certain changing the defaultFS would
interfere with our local HDFS/Accumulo install. In addition, we're
trying to maintain all of this configuration with Ambari, which from
what I can tell only supports one core-site configuration file.
Is the only solution here to maintain multiple core-site.xml files,
or is there another way to configure this?
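For the multiple-urls part, my understanding (unverified) is that a
single core-site.xml can hold keys for several storage accounts via
the fs.azure.account.key.<account>.blob.core.windows.net pattern, with
the account then selected by the fully-qualified url rather than by
the defaultFS. Account names and keys below are placeholders:

<property>
  <name>fs.azure.account.key.firstaccount.blob.core.windows.net</name>
  <value>FIRST_ACCOUNT_KEY</value>
</property>
<property>
  <name>fs.azure.account.key.secondaccount.blob.core.windows.net</name>
  <value>SECOND_ACCOUNT_KEY</value>
</property>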
Thanks,
Austin
On 03/28/2017 01:41 PM, Bryan Bende wrote:
Austin,
Can you provide the full error message and stacktrace for the
IllegalArgumentException from nifi-app.log?
When you start the processor it creates a FileSystem instance based on
the config files provided to the processor, which in turn causes all
of the corresponding classes to load.

I'm not that familiar with Azure, but if "Azure blob store" is WASB,
then I have successfully done the following...
In core-site.xml:
<configuration>
  <!-- Send unqualified paths to the WASB filesystem -->
  <property>
    <name>fs.defaultFS</name>
    <value>wasb://YOUR_USER@YOUR_HOST/</value>
  </property>
  <!-- Key for the storage account (here the account is "nifi") -->
  <property>
    <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
    <value>YOUR_KEY</value>
  </property>
  <!-- Bind the wasb:// scheme to the hadoop-azure implementations -->
  <property>
    <name>fs.AbstractFileSystem.wasb.impl</name>
    <value>org.apache.hadoop.fs.azure.Wasb</value>
  </property>
  <property>
    <name>fs.wasb.impl</name>
    <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
  </property>
  <!-- Skip registering Azure file system metrics -->
  <property>
    <name>fs.azure.skip.metrics</name>
    <value>true</value>
  </property>
</configuration>
In the Additional Classpath Resources property of an HDFS processor,
point to a directory containing:
azure-storage-2.0.0.jar
commons-codec-1.6.jar
commons-lang3-3.3.2.jar
commons-logging-1.1.1.jar
guava-11.0.2.jar
hadoop-azure-2.7.3.jar
httpclient-4.2.5.jar
httpcore-4.2.4.jar
jackson-core-2.2.3.jar
jsr305-1.3.9.jar
slf4j-api-1.7.5.jar
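So the two processor properties end up looking something like this
(paths are placeholders):

Hadoop Configuration Resources: /path/to/wasb/core-site.xml
Additional Classpath Resources: /path/to/azure-jars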
Thanks,
Bryan
On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <[email protected]> wrote:
Hi all,
Thanks for all the help you've given me so far. Today I'm trying to
pull files from an Azure blob store. I've done some reading on this,
and from previous tickets [1] and guides [2] it seems the recommended
approach is to place the jars required for the HDFS Azure protocol in
'Additional Classpath Resources' and the hadoop core-site and
hdfs-site configs into 'Hadoop Configuration Resources'. I have my
local HDFS properly configured to access wasb urls: I'm able to ls,
copy to and from, etc. without problem.

Using the same HDFS config files, and trying both all the jars in my
hadoop-client/lib directory (hdp) and the jars recommended in [1],
I'm still seeing the "java.lang.IllegalArgumentException: Wrong FS: "
error in my NiFi logs and am unable to pull files from Azure blob
storage. Interestingly, it seems the processor is spinning up way too
fast; the errors appear in the log as soon as I start the processor.
I'm not sure how it could be loading all of those jars that quickly.

Does anyone have any experience with this or recommendations to try?
Thanks,
Austin
[1] https://issues.apache.org/jira/browse/NIFI-1922
[2] https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html