Wow -- super neat for multi-HDFS configs!  This has always been a weak
spot for the Hadoop filesystem (including Hadoop S3).

Any idea if this can support reading with one Kerberos keytab and
writing with another?  I'll give it a try!  I suspect the answer is
"maybe".

Ryan

On Wed, Aug 21, 2019 at 4:22 PM Alexey Romanenko
<[email protected]> wrote:
>
> Yes, thanks, I saw this commit. I think we need to add something similar for
> S3FileSystem as well to support multiple S3 configs.
>
> On 20 Aug 2019, at 17:45, Lukasz Cwik <[email protected]> wrote:
>
> HDFS support in Beam was recently[1] improved to support more than one 
> cluster.
>
> [1] https://github.com/apache/beam/commit/f1dc92f8ec2d4d78b9b60440f821df43dc374e21
>
> On Tue, Aug 20, 2019 at 7:56 AM Alexey Romanenko <[email protected]> 
> wrote:
>>
>> Hi all,
>>
>> I’m looking for a working solution for cases where we need (or are even
>> required) to use different file system configurations (HDFS, S3, GC) in the
>> same pipeline, and where the IO is based on Beam FileSystems (FileIO, TextIO, etc.).
>> For example:
>> - reading data from one HDFS cluster and writing results to another one
>> that requires a different configuration;
>> - reading objects from one S3 bucket and writing to another one, where we need
>> to use different credentials and/or regions;
>> - we can even have a heterogeneous case, where we need to read data from HDFS
>> and write results to S3, or vice versa.
>>
>> Usually, in other IOs, we can do this easily with specific methods,
>> like “withConfiguration()”, “withCredentialsProvider()”, etc. for Read and
>> Write, but FileSystems-based IO can only be configured through
>> PipelineOptions, AFAIK. There was a thread about that a while ago [1] where
>> Lukasz Cwik said that it’s feasible by using different schemes but,
>> unfortunately, I haven’t managed to get it working on my side (for either
>> HDFS or S3).
>>
>> So, any additional input or working solutions would be very welcome if
>> someone has any. In the long term, I’d like to document this in detail
>> since, I guess, this case may be in fairly high demand.
>>
>> [1] 
>> https://lists.apache.org/thread.html/bb5f98c4154cc72d097ce5b404ff0b3bcb52b7360b0834af7116883b@%3Cdev.beam.apache.org%3E
>>
>>
>
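
For illustration only, the "different schemes" idea from the quoted thread could be sketched roughly as below. This is a hypothetical, stdlib-only Java sketch and not Beam's FileSystems API: the class name SchemeConfigRouter, its methods, and any cluster or bucket names are made up for the example. It only assumes that each target (scheme plus authority) carries its own set of settings.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not a Beam class): selects a per-target configuration
 * from the scheme and authority of the path being read or written.
 */
final class SchemeConfigRouter {
    // Key: "scheme://authority", e.g. "hdfs://clusterA". Value: settings for that target.
    private final Map<String, Map<String, String>> configs = new HashMap<>();

    /** Associates a configuration with one scheme/authority pair. */
    void register(String scheme, String authority, Map<String, String> config) {
        configs.put(key(scheme, authority), config);
    }

    /** Returns the configuration registered for the target that the path points at. */
    Map<String, String> configFor(String path) {
        URI uri = URI.create(path);
        Map<String, String> config = configs.get(key(uri.getScheme(), uri.getAuthority()));
        if (config == null) {
            // Fail fast instead of silently falling back to some default configuration.
            throw new IllegalArgumentException("No configuration registered for " + path);
        }
        return config;
    }

    private static String key(String scheme, String authority) {
        return scheme + "://" + authority;
    }
}
```

With entries registered for hdfs://clusterA and hdfs://clusterB, configFor("hdfs://clusterA/input/part-0") would return the first entry, so a read and a write in the same pipeline could each pick up their own settings. How (or whether) this maps onto PipelineOptions is exactly the open question in the thread.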
