Re: s3 filesystem for Python good for GSoC?

2019-02-21 Thread Suneel Marthi
Yup, something like this: import boto3; s3r = boto3.resource("s3"); data = s3r.Object("bucket", "key").get()["Body"].read() On Thu, Feb 21, 2019 at 9:50 PM Boyuan Zhang wrote: > I believe the Boto3 lib should be helpful with right credential > configuration when creating a client: > https://boto3.a

Re: s3 filesystem for Python good for GSoC?

2019-02-21 Thread Boyuan Zhang
I believe the Boto3 lib should be helpful with the right credential configuration when creating a client: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration On Thu, Feb 21, 2019 at 6:15 PM Suneel Marthi wrote: > Couldn't u just use Boto python package for doi
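Per the quickstart linked above, boto3 can pick up credentials from an INI-style ~/.aws/credentials file. A minimal sketch of that file's shape (key values are fake placeholders), parsed here with the stdlib configparser purely to illustrate the format:

```python
import configparser

# The shape of the ~/.aws/credentials file that boto3 reads by default.
# The key values below are fake placeholders, not real credentials.
CREDENTIALS_EXAMPLE = """\
[default]
aws_access_key_id = AKIAEXAMPLEKEY
aws_secret_access_key = examplesecretkey
"""

config = configparser.ConfigParser()
config.read_string(CREDENTIALS_EXAMPLE)
print(config["default"]["aws_access_key_id"])  # → AKIAEXAMPLEKEY
```

boto3 also accepts credentials via environment variables or explicit arguments to `boto3.client`/`boto3.Session`; the file above is just the default lookup location.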

Re: s3 filesystem for Python good for GSoC?

2019-02-21 Thread Suneel Marthi
Couldn't you just use the Boto Python package for doing that? I am writing one now to read from S3 via the Python API On Thu, Feb 21, 2019 at 6:19 PM Pablo Estrada wrote: > Hello all, > I was thinking that a filesystem with support for s3 would be great to > have in the Python SDK. If I am not wrong

Re: s3 filesystem for Python good for GSoC?

2019-02-21 Thread Kenneth Knowles
This is a good scope. And given there are multiple choices, an advanced student can expand scope to do both. Kenn On Thu, Feb 21, 2019 at 5:36 PM Austin Bennett wrote: > Hi Pablo, > > Agree on the usefulness. > > Some thoughts embedded: > > > On Thu, Feb 21, 2019 at 3:19 PM Pablo Estrada wrote

Re: s3 filesystem for Python good for GSoC?

2019-02-21 Thread Austin Bennett
Hi Pablo, Agree on the usefulness. Some thoughts embedded: On Thu, Feb 21, 2019 at 3:19 PM Pablo Estrada wrote: > Hello all, > I was thinking that a filesystem with support for s3 would be great to > have in the Python SDK. If I am not wrong, it would simply involve > implementing the filesys

s3 filesystem for Python good for GSoC?

2019-02-21 Thread Pablo Estrada
Hello all, I was thinking that a filesystem with support for s3 would be great to have in the Python SDK. If I am not wrong, it would simply involve implementing the filesystem classes with s3, right? I am not fa
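The idea behind "implementing the filesystem classes" can be illustrated with a toy sketch. Note the class and method names below are hypothetical, not the actual Beam FileSystem API: the concept is that file operations are routed to a registered filesystem implementation by URL scheme, so an s3 backend subclasses the base class and claims the s3:// scheme.

```python
from urllib.parse import urlsplit

class ToyFileSystem:
    """Illustrative base class only; the real Beam API differs."""
    registry = {}

    @classmethod
    def register(cls, fs):
        # Map the filesystem's declared scheme to its implementation.
        cls.registry[fs.scheme()] = fs

    @classmethod
    def for_path(cls, path):
        # Pick the implementation that claims this path's scheme.
        return cls.registry[urlsplit(path).scheme]

class ToyS3FileSystem(ToyFileSystem):
    @classmethod
    def scheme(cls):
        return "s3"

ToyFileSystem.register(ToyS3FileSystem)
print(ToyFileSystem.for_path("s3://bucket/key").__name__)  # → ToyS3FileSystem
```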

Re: KuduIO test flakiness

2019-02-21 Thread Kenneth Knowles
I don't see it in the postcommit history: https://builds.apache.org/job/beam_PreCommit_Java_Cron/977/testReport/org.apache.beam.sdk.io.kudu/KuduIOTest/history/ (URL assembled by trial and error and from Mikhail's prior sharing) Kenn On Thu, Feb 21, 2019 at 12:59 PM Reuven Lax wrote: > I'm find

KuduIO test flakiness

2019-02-21 Thread Reuven Lax
I'm finding the KuduIO tests to be extremely flaky. I've just run Java Presubmit three times in a row, and each time the KuduIO tests failed. How can we improve this situation? Example failure: https://builds.apache.org/job/beam_PreCommit_Java_Phrase/754/testReport/junit/org.apache.beam.sdk.io.ku
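One common stopgap for flaky tests (it masks flakiness rather than fixing the root cause) is a bounded retry around the unreliable operation. A generic sketch, not taken from the Beam codebase:

```python
import functools

def retry(times=3):
    """Re-run a function up to `times` attempts, re-raising the last failure."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(times):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
            raise last_exc
        return wrapper
    return decorator

# Demo: a function that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}

@retry(times=3)
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("flaky failure")
    return "ok"

print(flaky())  # → ok
```

The better long-term answer is usually to remove the nondeterminism itself (timeouts, shared state, port clashes); retries only buy time.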

Re: Running WordCount With DataflowRunner

2019-02-21 Thread Henrique Molina
Hi Shrikant, Pay attention to your parameter --output=gs://test-bucket/c \ Your configuration points to directory /c, not /b. So, check in your GCP Storage whether this directory exists: /c (--output=gs://test-bucket/c) and check: gsutil ls gs://test-bucket/c Cheers Carlos Molina On Thu,
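A mismatch like the one above (writing to /c while checking /b) is easy to spot by splitting the gs:// URL into its bucket and object prefix. A small stdlib sketch, using the bucket name from this thread:

```python
from urllib.parse import urlsplit

def split_gcs_path(path):
    """Split a gs:// URL into (bucket, object_prefix)."""
    parts = urlsplit(path)
    assert parts.scheme == "gs", f"not a GCS path: {path}"
    return parts.netloc, parts.path.lstrip("/")

print(split_gcs_path("gs://test-bucket/c"))  # → ('test-bucket', 'c')
```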

Running WordCount With DataflowRunner

2019-02-21 Thread shrikant bang
Hi Team, I have tried running WordCount with DataflowRunner on GCP. However, I am getting an exception: Caused by: java.lang.IllegalArgumentException: Output path does not exist or is not writeable: gs://test-bucket/b. However, gs://test-bucket/b exists and can be accessible by g

Re: Wait on JdbcIO write completion

2019-02-21 Thread Jonathan Perron
Thank you Eugene for your answer. According to your explanation, I think I will go with your 3rd solution, as this seems the most robust and friendly way to act. Jonathan On 21/02/2019 02:22, Eugene Kirpichov wrote: Hi Jonathan, Wait.on() requires a PCollection - it is not possible to chang