Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Chamikara Jayalath
On Wed, Mar 20, 2019 at 7:37 PM Valentyn Tymofieiev wrote: > Pablo, according to Juta's analysis (1.c in the document) and also > https://issuetracker.google.com/issues/129006689, I think BQ confuses > BYTES and STRING when schema is not specified... This seems to me like a BQ > bug, so for Beam

Re: Python36/37 not installed on Beam2 and Beam12?

2019-03-20 Thread Valentyn Tymofieiev
I asked them yesterday on Slack and commented on the existing issue https://issues.apache.org/jira/browse/INFRA-17335, but didn't receive a response. We can try opening another infra ticket. Mark, perhaps you can quote/+1 my message from yesterday in their Slack channel :) ? On Wed, Mar 20, 2019

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Valentyn Tymofieiev
Pablo, according to Juta's analysis (1.c in the document) and also https://issuetracker.google.com/issues/129006689, I think BQ confuses BYTES and STRING when schema is not specified... This seems to me like a BQ bug, so for Beam this means that we either have to wait until BQ fixes it, or work

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Chamikara Jayalath
On Wed, Mar 20, 2019 at 6:30 PM Pablo Estrada wrote: > That sounds reasonable to me, Valentyn. > > Regarding (3), when the table already exists, it's not necessary to get > the schema. BQ is smart enough to load everything in appropriately. (as > long as bytes fields are base64 encoded) > > The

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Pablo Estrada
That sounds reasonable to me, Valentyn. Regarding (3), when the table already exists, it's not necessary to get the schema. BQ is smart enough to load everything in appropriately. (as long as bytes fields are base64 encoded) The problem is when the table does not exist and the user does not
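When the table does not exist yet, the thread suggests the user has to supply a schema, since without one BigQuery may treat base64 text as STRING. A minimal sketch, assuming the schema is given in BigQuery's JSON field form (a list of `{"name", "type", "mode"}` dicts), that base64-encodes only the columns the schema declares as BYTES. `encode_with_schema` and `SCHEMA` are hypothetical names, not part of Beam's BigQuery IO.

```python
import base64

# Hypothetical explicit schema in BigQuery's JSON field format. Declaring
# "payload" as BYTES avoids relying on autodetection, which may infer STRING.
SCHEMA = [
    {"name": "name", "type": "STRING", "mode": "NULLABLE"},
    {"name": "payload", "type": "BYTES", "mode": "NULLABLE"},
]


def encode_with_schema(row, schema):
    """Base64-encode only the columns the schema declares as BYTES."""
    bytes_fields = {f["name"] for f in schema if f["type"] == "BYTES"}
    out = {}
    for name, value in row.items():
        if name in bytes_fields and value is not None:
            # JSON has no binary type, so BYTES values travel as base64 text.
            out[name] = base64.b64encode(value).decode("ascii")
        else:
            out[name] = value
    return out


print(encode_with_schema({"name": "a", "payload": b"hi"}, SCHEMA))
```

Driving the encoding from the declared column type rather than the Python value type keeps STRING columns untouched even if they happen to look like base64.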

Re: Python36/37 not installed on Beam2 and Beam12?

2019-03-20 Thread Yifan Zou
You could try to ping them in the Slack channel https://the-asf.slack.com/messages/ if it is really urgent. On Wed, Mar 20, 2019 at 5:29 PM Mark Liu wrote: > Hi, > > I saw occasional py36 tox test failure in beam_PreCommit_Python > and beam_Release_NightlySnapshot in cron job >

Re: Python36/37 not installed on Beam2 and Beam12?

2019-03-20 Thread Ahmet Altay
I believe this is https://issues.apache.org/jira/browse/BEAM-6863. Asking questions in the infra channel on Slack has worked well for me before. On Wed, Mar 20, 2019 at 5:29 PM Mark Liu wrote: > Hi, > > I saw occasional py36 tox test failure in beam_PreCommit_Python > and

Python36/37 not installed on Beam2 and Beam12?

2019-03-20 Thread Mark Liu
Hi, I saw occasional py36 tox test failures in beam_PreCommit_Python and beam_Release_NightlySnapshot, in cron jobs as well as in PR-triggered jobs. The error is simple:

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Valentyn Tymofieiev
Thanks, Juta, for the detailed analysis. I reached out to the BigQuery team to improve the documentation around the treatment of BYTES, and reported the issue that schema autodetection does not work for BYTES in the GCP issue tracker

Re: What quick command to catch common issues before pushing a python PR?

2019-03-20 Thread Pablo Estrada
Fancy : ) On Wed, Mar 20, 2019 at 1:25 AM Robert Bradshaw wrote: > I use tox as well. Actually, I use detox and retox (parallel versions > of tox, easily installable with pip) which can speed things up quite a > bit. > > On Wed, Mar 20, 2019 at 1:33 AM Pablo Estrada wrote: > > > > Correction

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Reuven Lax
The Java SDK relies on Jackson to do the encoding. On Wed, Mar 20, 2019 at 11:33 AM Chamikara Jayalath wrote: > > > On Wed, Mar 20, 2019 at 5:46 AM Juta Staes wrote: > >> Hi all, >> >> >> I am working on porting beam to python 3 and discovered the following: >> >> >> Current handling of bytes
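For comparison with the Java SDK: Jackson serializes a Java `byte[]` as a base64 string by default, whereas Python's standard `json` module has no encoding for `bytes` and raises `TypeError`. A minimal sketch of getting Jackson-like behaviour in Python through the `default` hook of `json.dumps`; `bytes_to_base64` is a hypothetical helper name, not a Beam function.

```python
import base64
import json


def bytes_to_base64(obj):
    """`default` hook for json.dumps: encode bytes the way Jackson encodes byte[]."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(obj).decode("ascii")
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


row = {"payload": b"hi"}

try:
    json.dumps(row)  # plain json has no rule for bytes
except TypeError as err:
    print("plain json.dumps fails:", err)

print(json.dumps(row, default=bytes_to_base64))
```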

Re: Hazelcast Jet Runner

2019-03-20 Thread Ankur Goenka
Hi Can, Like GreedyPipelineFuser, we have added many more components which make building a Portable Runner easy. Here is a link [1] to slides which explain at a very high level what is needed to add a new portable runner. Still, adding a portable runner will be more complex than adding a native

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Chamikara Jayalath
On Wed, Mar 20, 2019 at 5:46 AM Juta Staes wrote: > Hi all, > > > I am working on porting beam to python 3 and discovered the following: > > > Current handling of bytes in bigquery IO: > > When writing bytes to BQ, beam uses > https://cloud.google.com/bigquery/docs/reference/rest/v2/. This API

Re: [Announcement] New Website for Beam Summits

2019-03-20 Thread David Morávek
This is great! Thanks for all of the hard work you're putting into this. D. On Wed, Mar 20, 2019 at 1:38 PM Maximilian Michels wrote: > Not a bug, it's a feature ;) > > On 20.03.19 07:23, Kenneth Knowles wrote: > > Very nice. I appreciate the emphasis on coffee [1] [2] [3] though I > > suspect

Re: User state cleanup

2019-03-20 Thread Kenneth Knowles
On Wed, Mar 20, 2019 at 6:23 AM Maximilian Michels wrote: > Hi, > > I just realized that user state acquired via StateInternals in the Flink > Runner is not released automatically even when it falls out of the > Window scope. There are ways to work around this, e.g. setting a cleanup > timer

Re: User state cleanup

2019-03-20 Thread Thomas Weise
Good to know that the basic capability is in place; otherwise stateful processing could only be used with timers that perform cleanup in user land. I don't think the cleanup timer is used in the portable Flink runner, though: DoFnOperator.createWrappingDoFnRunner isn't executed in this case.

Re: Hazelcast Jet Runner

2019-03-20 Thread Maximilian Michels
Documentation on portability is still a bit sparse, although there are many design documents: https://beam.apache.org/contribute/design-documents/#portability The structure of portable Runners is not fundamentally different, but some of the operations are deferred to the SDK which runs code

Re: User state cleanup

2019-03-20 Thread Maximilian Michels
Thanks for the pointer, Reuven. I didn't see that on window expiration this would iterate over all user state and call the `clear` method. -Max On 20.03.19 14:59, Reuven Lax wrote: Is this not already handled by cleanupTimer in StatefulDoFnRunner? On Wed, Mar 20, 2019 at 6:23 AM Maximilian

Re: User state cleanup

2019-03-20 Thread Reuven Lax
Is this not already handled by cleanupTimer in StatefulDoFnRunner? On Wed, Mar 20, 2019 at 6:23 AM Maximilian Michels wrote: > Hi, > > I just realized that user state acquired via StateInternals in the Flink > Runner is not released automatically even when it falls out of the > Window scope.

Re: Hazelcast Jet Runner

2019-03-20 Thread Can Gencer
I had a look at "GreedyPipelineFuser" and indeed this was exactly what I was talking about. Is https://beam.apache.org/roadmap/portability/ still the best source of information about portable runners, or is there a more in-depth guide available anywhere? On Wed, Mar 20, 2019 at 2:29 PM Can Gencer

Re: Hazelcast Jet Runner

2019-03-20 Thread Can Gencer
Hi Max, Thanks. When you say "old-style runner", do you mean that this style of runner will become obsolete and only portable runners will be supported? The documentation for portable runners wasn't quite complete, and the barrier to entry for writing an old-style runner seemed lower for us and

User state cleanup

2019-03-20 Thread Maximilian Michels
Hi, I just realized that user state acquired via StateInternals in the Flink Runner is not released automatically even when it falls out of the Window scope. There are ways to work around this, e.g. setting a cleanup timer that fires when the Window expires. Do we expect Runners to perform
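A toy, pure-Python model of the cleanup discussed in this thread: user state is kept per (key, window), a cleanup timer is registered at the end of each window plus allowed lateness, and once the watermark passes that time the state is cleared, which is the behaviour Reuven points to in `StatefulDoFnRunner`. Everything here (`ToyStatefulRunner`, identifying windows by their end time, the strict "passes" comparison) is a hypothetical simplification, not Beam's or Flink's API.

```python
from collections import defaultdict


class ToyStatefulRunner:
    """Hypothetical model of per-window user state with cleanup timers."""

    def __init__(self, allowed_lateness=0):
        self.allowed_lateness = allowed_lateness
        # (key, window_end) -> buffered values (the "user state").
        self.state = defaultdict(list)
        # (key, window_end) -> time at which the state may be dropped.
        self.cleanup_timers = {}

    def process(self, key, value, window_end):
        slot = (key, window_end)
        self.state[slot].append(value)
        # Register (or re-register) the cleanup timer for this key and window.
        self.cleanup_timers[slot] = window_end + self.allowed_lateness

    def advance_watermark(self, watermark):
        """Fire every cleanup timer the watermark has passed; clear its state."""
        expired = [s for s, t in self.cleanup_timers.items() if watermark > t]
        for slot in expired:
            del self.cleanup_timers[slot]
            self.state.pop(slot, None)  # without this, state would leak
        return expired


runner = ToyStatefulRunner(allowed_lateness=5)
runner.process("a", 1, window_end=10)
runner.process("b", 2, window_end=20)
print(runner.advance_watermark(16))
```

Without the `advance_watermark` cleanup step, `state` would grow with every window ever seen, which is the leak Max describes.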

Writing bytes to BigQuery with beam

2019-03-20 Thread Juta Staes
Hi all, I am working on porting Beam to Python 3 and discovered the following: Current handling of bytes in BigQuery IO: When writing bytes to BQ, Beam uses https://cloud.google.com/bigquery/docs/reference/rest/v2/. This API expects byte values to be base64-encoded*. However when writing
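A minimal sketch of the encoding Juta describes, assuming rows are plain Python dicts that are serialized to JSON for the BigQuery REST API: each `bytes` value is replaced by its base64 text before serialization. The helper name `encode_row` is hypothetical and not part of Beam's BigQuery IO.

```python
import base64
import json


def encode_row(row):
    """Return a copy of `row` with every bytes value base64-encoded.

    Hypothetical helper: JSON has no binary type, so BYTES values are
    sent to the BigQuery REST API as base64 text.
    """
    return {
        name: base64.b64encode(value).decode("ascii")
        if isinstance(value, bytes) else value
        for name, value in row.items()
    }


row = {"id": 1, "payload": b"\x00\xffabc"}
print(json.dumps(encode_row(row)))
```

The round trip is lossless: `base64.b64decode` on the encoded field gives back the original bytes, including non-UTF-8 values such as `\xff` that could not be sent as a plain JSON string.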

Re: [Announcement] New Website for Beam Summits

2019-03-20 Thread Maximilian Michels
Not a bug, it's a feature ;) On 20.03.19 07:23, Kenneth Knowles wrote: Very nice. I appreciate the emphasis on coffee [1] [2] [3] though I suspect there may be a rendering bug. Kenn [1] https://beamsummit.org/schedule/2019-06-19?sessionId=1 [2]

Re: Hazelcast Jet Runner

2019-03-20 Thread Maximilian Michels
Hi Can, Thanks for the update. Interesting question. Flink has a built-in optimization called chaining, which works nicely with Beam. Essentially, operators that share the same partitioning are executed one after another inside a master operator. This saves resources.
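A toy sketch of the chaining idea Max describes: consecutive operators that keep the same partitioning are composed into one operator, so elements pass between them by plain function calls instead of being serialized and shuffled. The `Op` type, its `repartitions` flag, and `chain` are hypothetical; Flink's real chaining and Beam's `GreedyPipelineFuser` operate on full pipeline graphs, not a linear list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Op:
    name: str
    fn: Callable[[object], Iterable[object]]  # element -> zero or more outputs
    repartitions: bool = False  # True if a shuffle is needed before this op


def chain(ops: List[Op]) -> List[Op]:
    """Fuse each run of operators that share partitioning into one operator."""
    fused: List[Op] = []
    for op in ops:
        if fused and not op.repartitions:
            prev = fused.pop()

            def composed(x, f=prev.fn, g=op.fn):
                # Feed every output of the upstream op straight into the next one.
                for y in f(x):
                    yield from g(y)

            fused.append(Op(f"{prev.name}+{op.name}", composed, prev.repartitions))
        else:
            # A repartitioning op starts a new chain (a shuffle boundary).
            fused.append(op)
    return fused


ops = [
    Op("read", lambda x: [x]),
    Op("double", lambda x: [2 * x]),
    Op("group", lambda x: [x], repartitions=True),
    Op("inc", lambda x: [x + 1]),
]
stages = chain(ops)
print([s.name for s in stages])
```

The default arguments `f` and `g` bind each pair of functions at definition time, avoiding Python's late-binding pitfall for closures created inside a loop.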

Re: What quick command to catch common issues before pushing a python PR?

2019-03-20 Thread Robert Bradshaw
I use tox as well. Actually, I use detox and retox (parallel versions of tox, easily installable with pip) which can speed things up quite a bit. On Wed, Mar 20, 2019 at 1:33 AM Pablo Estrada wrote: > > Correction - the command is now: tox -e py35-gcp,py35-lint > > And it ran on my machine in

Re: [Announcement] New Website for Beam Summits

2019-03-20 Thread Kenneth Knowles
Very nice. I appreciate the emphasis on coffee [1] [2] [3] though I suspect there may be a rendering bug. Kenn [1] https://beamsummit.org/schedule/2019-06-19?sessionId=1 [2] https://beamsummit.org/schedule/2019-06-19?sessionId=3 [3] https://beamsummit.org/schedule/2019-06-19?sessionId=4 On Tue,