Hi Steve,

I don't think I fully understand your answer. Please pardon my naivety
regarding the subject. From what I understand, the actual read will happen
in the executors, so the executors need access to the data lake. Given
that, how do I programmatically pass Azure credentials to the executors so
that they can read the data they need to process?
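
For example, is something like the sketch below the idea? I'm guessing
that the fs.adl.oauth2.* settings are what the ADL connector reads, and
that Spark ships the job's Hadoop configuration to the executors along
with the tasks; clientId, clientSecret and refreshUrl stand in for values
my library would fetch from its REST API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder().getOrCreate();

    // settings placed in the job's Hadoop configuration travel with the
    // job, so the executors should see them when the read runs there
    Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
    hadoopConf.set("fs.adl.oauth2.access.token.provider.type",
        "ClientCredential");
    hadoopConf.set("fs.adl.oauth2.client.id", clientId);
    hadoopConf.set("fs.adl.oauth2.credential", clientSecret);
    hadoopConf.set("fs.adl.oauth2.refresh.url", refreshUrl);

    Dataset<Row> people =
        spark.read().json("adl://examples/src/main/resources/people.json");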

Another dilemma I have is that a user might access more than one dataset
within a job, let's say to join the two. In that case I might have two
separate access tokens for reading from the data lake, one for each
dataset.

Does that make sense?

Imtiaz

On Sat, Aug 19, 2017 at 7:04 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
> On 19 Aug 2017, at 02:42, Imtiaz Ahmed <emtiazah...@gmail.com> wrote:
>
> Hi All,
>
> I am building a Spark library which developers will use when writing their
> Spark jobs to get access to data on Azure Data Lake, but the authentication
> will depend on the dataset they ask for. I need to call a REST API from
> within the Spark job to get credentials and authenticate to read data from
> ADLS. Is that even possible? I am new to Spark.
> E.g., from inside a Spark job a user will say:
>
> MyCredentials myCredentials = MyLibrary.getCredentialsForPath(userId,
> "/some/path/on/azure/datalake");
>
> then, before calling
> spark.read.json("adl://examples/src/main/resources/people.json"),
> I need to authenticate the user to be able to read that path using the
> credentials fetched above.
>
> Any help is appreciated.
>
> Thanks,
> Imtiaz
>
>
> The ADL filesystem supports addDelegationTokens(), allowing the caller to
> collect the delegation tokens of the current authenticated user and then
> pass them along with the request, which is exactly what Spark should be
> doing in spark-submit.
>
> if you want to do it yourself, look in SparkHadoopUtil (I think; my IDE is
> closed right now) and see how the tokens are picked up and then passed
> around (marshalled over the job request, unmarshalled afterwards and picked
> up, with bits of the UserGroupInformation class doing the low-level work).
>
> Java code snippet to write to the path tokenFile:
>
>     // collect the current user's delegation tokens and write them to a file
>     FileSystem fs = FileSystem.get(conf);
>     Credentials cred = new Credentials();
>     Token<?>[] tokens = fs.addDelegationTokens(renewer, cred);
>     cred.writeTokenStorageFile(tokenFile, conf);
>
> you can then read that file in elsewhere, and then (somehow) get the FS to
> use those tokens.
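>
> the "somehow" would be something like this sketch: read the tokens back
> with Credentials.readTokenStorageFile() and attach them to the current
> user, so filesystem instances created afterwards can find them (tokenFile
> and conf are the same as above):
>
>     // read the saved tokens back and attach them to the current user;
>     // FS clients look up delegation tokens through the UGI credentials
>     Credentials cred = Credentials.readTokenStorageFile(tokenFile, conf);
>     UserGroupInformation.getCurrentUser().addCredentials(cred);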
>
> otherwise, ADL supports OAuth, so you may be able to use any OAuth
> libraries for this. hadoop-azure-datalake pulls in okhttp for that,
>
>     <dependency>
>       <groupId>com.squareup.okhttp</groupId>
>       <artifactId>okhttp</artifactId>
>       <version>2.4.0</version>
>     </dependency>
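>
> so a client-credentials token fetch against the AAD endpoint is only a few
> lines with the okhttp 2.x API; a rough sketch, with tenantId, clientId and
> clientSecret as placeholders for whatever your credential service hands
> back:
>
>     // POST to the AAD token endpoint; the JSON response carries the
>     // access_token field, which is what ADL needs for its requests
>     OkHttpClient client = new OkHttpClient();
>     RequestBody body = new FormEncodingBuilder()
>         .add("grant_type", "client_credentials")
>         .add("client_id", clientId)
>         .add("client_secret", clientSecret)
>         .add("resource", "https://datalake.azure.net/")
>         .build();
>     Request request = new Request.Builder()
>         .url("https://login.microsoftonline.com/" + tenantId + "/oauth2/token")
>         .post(body)
>         .build();
>     String json = client.newCall(request).execute().body().string();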
>
> -Steve
>
>
