Is Hierarchical Namespace [1] Enabled on the Storage Account?

When HNS is not enabled, or when operations using ADLFS [2] fail, the Azure
file system implementation falls back to Azure Blob operations.

I have a draft on my machine of a change that would add a configuration
option to *force* the use of ADLFS and fail instead of falling back to
Azure Blobs when ADLFS operations fail.

Is there a specific reason you want Azure Blobs to never be used?
--
Felipe

[1]
https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
[2] ADLFS = Azure Data Lake File System Gen 2

On Wed, Jul 10, 2024 at 6:34 PM Sutou Kouhei <k...@clear-code.com> wrote:

> Hi,
>
> >       azureOptions.blob_storage_authority = ".dfs.core.windows.net";
> >           // If I don't do this, then the blob.core.windows.net is used;
> >           // I want dfs not blob, so... not certain why that happens either
>
> This is strange. In general, you should not do this.
> AzureFS uses both the Blob Storage API and the Data Lake
> Storage API. If the Data Lake Storage API is available,
> AzureFS uses it automatically, so you should not change
> blob_storage_authority.
>
> If you remove this line, what happens?
>
>
> Thanks,
> --
> kou
>
> In
>  <
> dm3pr05mb1054334eeaeae4a95805de322f3...@dm3pr05mb10543.namprd05.prod.outlook.com
> >
>   "Using the new Azure filesystem object (C++)" on Wed, 10 Jul 2024
> 16:58:52 +0000,
>   "Jerry Adair via user" <user@arrow.apache.org> wrote:
>
> > Hi-
> >
> > I am attempting to use the new Azure filesystem object in C++,
> > Arrow/Parquet version 16.0.0.  I already have code that works for GCS and
> > AWS/S3.  I have been waiting for quite a while to see the new Azure
> > filesystem object released.  Now that it has been released in this version
> > (16.0.0), I have been trying to use it, without success.  I presumed that
> > it would work in the same manner as the GCS and S3/AWS filesystem objects:
> > you create the object, then you can use it in the same manner that you
> > used the other filesystem objects.  Note that I am not using Arrow
> > methods to read/write the data but rather the Parquet methods.  This works
> > for local, GCS and S3/AWS.  However, I cannot open a file on Azure.  It
> > seems that no matter which authentication method I try, it doesn't work,
> > and I get different results depending on which auth approach I take
> > (client secret versus account key, etc.).  Here is a code summary of what
> > I am trying to do:
> >
> >       arrow::fs::AzureOptions   azureOptions;
> >       arrow::Status             configureStatus = arrow::Status::OK();
> >
> >       // exact values obfuscated
> >       azureOptions.account_name = "mytest";
> >       azureOptions.dfs_storage_authority = ".dfs.core.windows.net";
> >       azureOptions.blob_storage_authority = ".dfs.core.windows.net";
> >           // If I don't do this, then the blob.core.windows.net is used;
> >           // I want dfs not blob, so... not certain why that happens either
> >       std::string  client_id  = "3f061894-blah";
> >       std::string  client_secret  = "2c796e9eblah";
> >       std::string  tenant_id  = "b1c14d5c-blah";
> >       //std::string  account_key  = "flMhWgNts+i/blah==";
> >
> >       //configureStatus = azureOptions.ConfigureAccountKeyCredential( account_key );
> >       configureStatus = azureOptions.ConfigureClientSecretCredential( tenant_id, client_id, client_secret );
> >       //configureStatus = azureOptions.ConfigureManagedIdentityCredential( client_id );
> >       if( false == configureStatus.ok() )
> >       {
> >          // Uh-oh, throw
> >       }
> >
> >       std::shared_ptr<arrow::fs::AzureFileSystem>   azureFileSystem;
> >       arrow::Result<std::shared_ptr<arrow::fs::AzureFileSystem>>   azureFileSystemResult =
> >          arrow::fs::AzureFileSystem::Make( azureOptions );
> >       if( true == azureFileSystemResult.ok() )
> >       {
> >          azureFileSystem = azureFileSystemResult.ValueOrDie();
> >       }
> >       else
> >       {
> >          // Uh-oh, throw
> >       }
> >
> >       //const std::string path( "parquet/ParquetFiles/plain.parquet" );
> >       const std::string path( "parquet/ParquetFiles/plain.parquet" );
> >       std::shared_ptr<arrow::io::RandomAccessFile> arrowFile;
> >       std::cout << "1\n";
> >       arrow::Result<std::shared_ptr<arrow::io::RandomAccessFile>>   openResult =
> >          azureFileSystem->OpenInputFile( path );
> >       std::cout << "2\n";
> >
> > And that is where things run off the rails.  At this point, all I want
> > to do is open the input file and create a Parquet file reader like so:
> >
> >          std::unique_ptr<parquet::ParquetFileReader> parquet_reader =
> >              parquet::ParquetFileReader::Open( arrowFile );
> >
> > Then go about my business of reading/writing Parquet data as per
> > normal, just as I do for the other filesystem objects.  But the
> > OpenInputFile() method fails for the Azure case.  If I attempt the
> > account key configuration, then the error I see is:
> >
> > adls_read
> > Parquet file read commencing...
> > 1
> > Parquet read error: map::at
> >
> > Where the "1" is just a marker to show how far I got in the process of
> reading a pre-existing Parquet file from the Azure server.  Ergo, a
> low-brow means of debugging.  The cout is shown above.  I don't get to "2",
> obviously.
> >
> > When attempting the client secret credential auth, I see the following
> failure:
> >
> > adls_read
> > Parquet file read commencing...
> > 1
> > Parquet read error: GetToken(): error response: 401 Unauthorized
> >
> > Then when attempting the Managed Identity auth configuration, I get the
> following:
> >
> > adls_read
> > Parquet file read commencing...
> > 1
> > ^C
> >
> > Where the process just hangs and I have to interrupt out of it.  Note
> that I didn't have this level of difficulty when I implemented our support
> for GCS and S3/AWS.  Those were relatively straightforward.  Azure however
> has been more difficult;  this should just work.  I mean, you create the
> filesystem object, then you are supposed to be able to use it in the same
> manner that you use any other Arrow filesystem object.  However I can't
> open a file and I suspect it is due to some type of handshaking issue with
> Azure.  Azure has all of these moving parts; tenant ID, application/client
> ID, client secret, object ID (which we don't use in this case) and the list
> goes on.  Finally, I saw this in the azurefs.h header at line 102:
> >
> >   // TODO(GH-38598): Add support for more auth methods.
> >   // std::string connection_string;
> >   // std::string sas_token;
> >
> > But it seemed clear to me that this was referring to other auth methods
> than those that have been implemented thus far (ergo client secret, account
> key, etc.).  Am I correct?
> >
> > So my questions are:
> >
> >   1.  Any ideas where I am going wrong here?
> >   2.  Has anyone else used the Azure filesystem object?
> >   3.  Has it worked for you?
> >   4.  If so, what was your approach?
> >
> > Note that I did peruse the azurefs_test.cc for examples.  I did see
> various approaches.  One involved invoking the MakeDataLakeServiceClient()
> method.  It wasn't clear if I needed to do that or not, but then I saw that
> this is done during the private implementation of the AzureFileSystem's
> Make() method, thus:
> >
> >   static Result<std::unique_ptr<AzureFileSystem::Impl>> Make(AzureOptions options,
> >                                                              io::IOContext io_context) {
> >     auto self = std::unique_ptr<AzureFileSystem::Impl>(
> >         new AzureFileSystem::Impl(std::move(options), std::move(io_context)));
> >     ARROW_ASSIGN_OR_RAISE(self->blob_service_client_,
> >                           self->options_.MakeBlobServiceClient());
> >     ARROW_ASSIGN_OR_RAISE(self->datalake_service_client_,
> >                           self->options_.MakeDataLakeServiceClient());
> >     return self;
> >   }
> >
> > So it seemed like I wouldn't need to do it separately.
> >
> > Anyway, I need to get this working ASAP, so I am open to feedback.  I'll
> continue plugging away.
> >
> > Thanks!
> > Jerry
>
