Hi Andre,

Yes, I'm aware of that configuration property; it's what I have been using to set the core-site.xml and hdfs-site.xml. For testing this I didn't modify the core-site.xml located in the HADOOP_CONF_DIR but rather copied and modified it, then pointed the processor to the copy. The problem with this is that we'll end up with a large number of core-site.xml copies that will all have to be maintained separately. Ideally we'd be able to specify the defaultFS in the processor config, or have the processor behave like the HDFS command-line tools, which don't require the defaultFS to be set to a wasb URL in order to use wasb URLs.

The key idea here is long-term maintainability and using Ambari to maintain the configuration. If we need to change any other setting in core-site.xml, we'd have to change it in a bunch of different files manually.
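[Editor's note: one pattern that may reduce the duplicate-copy burden, assuming your Hadoop version's Configuration parser supports XInclude in configuration files (stock Apache Hadoop's generally does), is to keep each per-processor core-site.xml as a thin overlay that includes the Ambari-managed file and overrides only fs.defaultFS. A hedged sketch; the file paths and wasb URL below are illustrative placeholders, not values from this thread:]

```
<?xml version="1.0"?>
<!-- core-site-wasb.xml: thin overlay pointed to by the processor's
     'Hadoop Configuration Resources' property. Paths and the wasb
     URL are placeholders for illustration. -->
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">

  <!-- Pull in the Ambari-managed core-site.xml so shared settings
       stay maintained in one place. -->
  <xi:include href="file:///etc/hadoop/conf/core-site.xml"/>

  <!-- Override only the default filesystem for this processor.
       A later definition of a property wins over an included one,
       unless the included property is marked final. -->
  <property>
    <name>fs.defaultFS</name>
    <value>wasb://CONTAINER@ACCOUNT.blob.core.windows.net/</value>
  </property>

</configuration>
```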

Thanks,
Austin


On 03/28/2017 03:34 PM, Andre wrote:
Austin,

Perhaps that wasn't explicit, but the settings don't need to be system-wide; instead, the defaultFS may be changed just for a particular processor, while the others may use the default configuration.

The *HDFS processor documentation mentions that it allows you to set particular Hadoop configurations:

" A file or comma separated list of files which contains the Hadoop file system configuration. Without this, Hadoop will search the classpath for a 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default configuration"

Have you tried using this field to point to a file as described by Bryan?
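[Editor's note: that field takes a comma-separated list of file paths, so a single processor can point at a dedicated copy of the config. A minimal sketch; the paths below are illustrative, not from this thread:]

```
Hadoop Configuration Resources: /etc/nifi/wasb-conf/core-site.xml,/etc/nifi/wasb-conf/hdfs-site.xml
```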

Cheers

On 29 Mar 2017 05:21, "Austin Heyne" <[email protected]> wrote:

    Thanks Bryan,

    Working with the configuration you sent, what I needed to change
    was to set fs.defaultFS to the wasb URL that we're working
    from. Unfortunately this is a less than ideal solution, since
    we'll be pulling files from multiple wasb URLs and ingesting them
    into an Accumulo datastore. I'm pretty certain changing the
    defaultFS would mess with our local HDFS/Accumulo install. In
    addition, we're trying to maintain all of this configuration with
    Ambari, which from what I can tell only supports one core-site
    configuration file.

    Is the only solution here to maintain multiple core-site.xml files,
    or is there another way to configure this?

    Thanks,

    Austin



    On 03/28/2017 01:41 PM, Bryan Bende wrote:

        Austin,

        Can you provide the full error message and stack trace for the
        IllegalArgumentException from nifi-app.log?

        When you start the processor, it creates a FileSystem instance
        based on the config files provided to the processor, which in
        turn causes all of the corresponding classes to load.

        I'm not that familiar with Azure, but if "Azure blob store" is
        WASB, then I have successfully done the following...

        In core-site.xml:

        <configuration>

             <property>
               <name>fs.defaultFS</name>
               <value>wasb://YOUR_USER@YOUR_HOST/</value>
             </property>

             <property>
               <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
               <value>YOUR_KEY</value>
             </property>

             <property>
               <name>fs.AbstractFileSystem.wasb.impl</name>
               <value>org.apache.hadoop.fs.azure.Wasb</value>
             </property>

             <property>
               <name>fs.wasb.impl</name>
               <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
             </property>

             <property>
               <name>fs.azure.skip.metrics</name>
               <value>true</value>
             </property>

        </configuration>

        In Additional Resources property of an HDFS processor, point to a
        directory with:

        azure-storage-2.0.0.jar
        commons-codec-1.6.jar
        commons-lang3-3.3.2.jar
        commons-logging-1.1.1.jar
        guava-11.0.2.jar
        hadoop-azure-2.7.3.jar
        httpclient-4.2.5.jar
        httpcore-4.2.4.jar
        jackson-core-2.2.3.jar
        jsr305-1.3.9.jar
        slf4j-api-1.7.5.jar


        Thanks,

        Bryan


        On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <[email protected]> wrote:

            Hi all,

            Thanks for all the help you've given me so far. Today I'm
            trying to pull files from an Azure blob store. I've done
            some reading on this, and from previous tickets [1] and
            guides [2] it seems the recommended approach is to place
            the jars required to use the HDFS Azure protocol in
            'Additional Classpath Resources' and the Hadoop core-site
            and hdfs-site configs into the 'Hadoop Configuration
            Resources'. I have my local HDFS properly configured to
            access wasb URLs; I'm able to ls, copy to and from, etc.
            without problem. Using the same HDFS config files, and
            trying both all the jars in my hadoop-client/lib directory
            (HDP) and the jars recommended in [1], I'm still seeing
            the "java.lang.IllegalArgumentException: Wrong FS: " error
            in my NiFi logs and am unable to pull files from Azure
            blob storage.

            Interestingly, it seems the processor is spinning up way
            too fast; the errors appear in the log as soon as I start
            the processor. I'm not sure how it could be loading all of
            those jars that quickly.

            Does anyone have any experience with this or
            recommendations to try?

            Thanks,
            Austin

            [1] https://issues.apache.org/jira/browse/NIFI-1922
            [2] https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html




