Re: get method guid prefix for file parts for write

2020-09-25 Thread gpongracz
What Nick said was correct.

I should also state that I am using the Python variant of Spark in this
case, not the Scala one.

I am looking to use the GUID prefix of the part-0 file to prevent a race
condition, by pointing an S3 waiter at the part file until it appears. To
achieve this, however, I need to know the GUID value in advance.

Thank you all again for your help.

Regards,

George






Re: get method guid prefix for file parts for write

2020-09-25 Thread gpongracz
I should add that I tried using a waiter on the _SUCCESS file, but that did
not prove successful: presumably due to its small size compared to the
part-0 file, the _SUCCESS file seems to appear in S3 before the part-0 file
does, even though it was written afterwards.
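
One way to combine the two checks might look like this (a sketch only; the
bucket and prefix names are placeholders): wait for the _SUCCESS marker,
then poll for the part files, since the marker alone is not a reliable
signal.

    import time
    import boto3

    s3 = boto3.client("s3")
    bucket, out_prefix = "my-bucket", "output/path/"  # placeholder names

    # Wait for the _SUCCESS marker first...
    s3.get_waiter("object_exists").wait(Bucket=bucket, Key=out_prefix + "_SUCCESS")

    # ...then poll the listing, because the small marker can become
    # visible before the larger part objects do.
    while True:
        resp = s3.list_objects_v2(Bucket=bucket, Prefix=out_prefix + "part-")
        if resp.get("KeyCount", 0) > 0:
            break
        time.sleep(1)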






get method guid prefix for file parts for write

2020-09-24 Thread gpongracz
I lack the vocabulary for this question, so please bear with my description
of the problem...

I am searching for a way to get the GUID prefix value that will be used
when writing the parts of a file.

e.g.:

part-0-b5265e7b-b974-4083-a66e-e7698258ca50-c000.csv

I would like to get the prefix "0-b5265e7b-b974-4083-a66e-e7698258ca50"

Is there a way that I might be able to access such a value programmatically?
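
One approach I can see is to list the output location immediately after the
write and parse the value out of the committed file names, though that only
works after the fact, not in advance. A rough PySpark sketch, assuming an
existing SparkSession named `spark` and an output location `path` (it goes
through the private _jvm/_jsc gateway, so it is not a stable public API):

    # List the freshly written files through the JVM Hadoop FileSystem.
    jvm = spark._jvm
    conf = spark._jsc.hadoopConfiguration()
    fs = jvm.org.apache.hadoop.fs.FileSystem.get(jvm.java.net.URI(path), conf)
    for status in fs.listStatus(jvm.org.apache.hadoop.fs.Path(path)):
        name = status.getPath().getName()
        if name.startswith("part-"):
            # e.g. part-0-b5265e7b-b974-4083-a66e-e7698258ca50-c000.csv
            guid = name.split("-", 1)[1].rsplit("-c", 1)[0]
            print(guid)  # -> 0-b5265e7b-b974-4083-a66e-e7698258ca50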

Any assistance is appreciated.

George Pongracz







Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-07-11 Thread gpongracz
As someone who mainly operates in AWS, I would very much welcome the option
to use an updated version of Hadoop with PySpark sourced from PyPI.

Acknowledging the issues of backwards compatibility...

The most vexing issue is the inability to use s3a STS, i.e.
org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider.

This prevents the use of AWS temporary credentials, hampering local
development against S3.
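
For illustration, this is the kind of configuration a newer Hadoop would
unlock (a sketch only; the credential values are placeholders, and these
s3a options exist from Hadoop 2.8 onwards):

    from pyspark.sql import SparkSession

    # Sketch: temporary (STS) credentials via s3a, needs Hadoop >= 2.8.
    spark = (
        SparkSession.builder
        .config("spark.hadoop.fs.s3a.aws.credentials.provider",
                "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
        .config("spark.hadoop.fs.s3a.access.key", "<access-key>")        # placeholder
        .config("spark.hadoop.fs.s3a.secret.key", "<secret-key>")        # placeholder
        .config("spark.hadoop.fs.s3a.session.token", "<session-token>")  # placeholder
        .getOrCreate()
    )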

Whilst this would be solved by bumping the Hadoop version to anything >=
2.8.x, the 3.x option would also allow data to be written encrypted with KMS.
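
Similarly, a rough sketch of the SSE-KMS settings the 3.x line would enable
(the key ARN is a placeholder):

    from pyspark.sql import SparkSession

    # Sketch: SSE-KMS for s3a writes (Hadoop 3.x property names).
    spark = (
        SparkSession.builder
        .config("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "SSE-KMS")
        .config("spark.hadoop.fs.s3a.server-side-encryption.key", "<kms-key-arn>")
        .getOrCreate()
    )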

Regards,

George Pongracz


