Re: rename: BeamRecord -> Row

2018-02-02 Thread Romain Manni-Bucau
Hi, Shouldn't the discussion on schema, which has a direct impact on this generic container, be closed before any action on this? On Feb 3, 2018 01:09, "Ankur Chauhan" wrote: > ++ > > On Fri, Feb 2, 2018 at 1:33 PM Rafael Fernandez > wrote: > >> Very

Re: rename: BeamRecord -> Row

2018-02-02 Thread Ankur Chauhan
++ On Fri, Feb 2, 2018 at 1:33 PM Rafael Fernandez wrote: > Very strong +1 > > > On Fri, Feb 2, 2018 at 1:24 PM Reuven Lax wrote: > >> We're looking at renaming the BeamRecord class >> , that was used for columnar

Re: rename: BeamRecord -> Row

2018-02-02 Thread Rafael Fernandez
Very strong +1 On Fri, Feb 2, 2018 at 1:24 PM Reuven Lax wrote: > We're looking at renaming the BeamRecord class > , that was used for columnar > data. There was sufficient discussion on the naming, that I want to make > sure the dev

rename: BeamRecord -> Row

2018-02-02 Thread Reuven Lax
We're looking at renaming the BeamRecord class, which was used for columnar data. There was sufficient discussion on the naming that I want to make sure the dev list is aware of naming plans here. BeamRecord is a columnar, field-based record. Currently

Re: Filesystems.copy and .rename behavior

2018-02-02 Thread Robert Bradshaw
Pipeline stages must be retry-tolerant. E.g. the VM it's running on might get shut down. We should not be failing jobs in this case. It seems the current implementation could only produce bad results if (1) unrelated output files already existed and (2) the temporary files were either not written

Re: Filesystems.copy and .rename behavior

2018-02-02 Thread Reuven Lax
On Fri, Feb 2, 2018 at 11:17 AM, Chamikara Jayalath wrote: > Currently, Python file-based sink is batch only. > Sure, but that won't be true forever. > > Regarding Raghu's question, stage/pipeline failure should not be > considered as a data loss but I prefer overriding

[SQL] Fix for HOP windows

2018-02-02 Thread Anton Kedin
Hi, If you're not using Beam SQL's HOP windowing functions, you're not affected. *The problem* Calcite defines HOP windowing function like this: - HOP(timestamp_field, frequency_interval, window_size) Beam SQL
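
For readers unfamiliar with hopping windows, here is a minimal Python sketch of what a call shaped like HOP(timestamp_field, frequency_interval, window_size) means: a window of length window_size starts every frequency_interval, so one timestamp can belong to several overlapping windows. The function name `hop_windows` and the alignment of window starts to the Unix epoch are assumptions made for illustration; this is not Calcite's or Beam SQL's implementation.

```python
from datetime import datetime, timedelta


def hop_windows(ts, frequency, size):
    """Return the [start, end) hopping windows that contain `ts`.

    Hypothetical helper: windows are `size` long and a new one starts
    every `frequency`, with starts aligned to the Unix epoch (assumption).
    """
    epoch = datetime(1970, 1, 1)
    # Start of the most recent window that begins at or before ts.
    start = ts - ((ts - epoch) % frequency)
    windows = []
    # Walk backwards while the window still covers ts.
    while start + size > ts:
        windows.append((start, start + size))
        start -= frequency
    return sorted(windows)


if __name__ == "__main__":
    event = datetime(2018, 2, 2, 12, 7)
    for w_start, w_end in hop_windows(event, timedelta(minutes=5), timedelta(minutes=15)):
        print(w_start.time(), "->", w_end.time())
```

With a 5-minute frequency and a 15-minute size, each timestamp falls into three windows, which is why the order of the two interval arguments matters.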

Re: Filesystems.copy and .rename behavior

2018-02-02 Thread Raghu Angadi
On Fri, Feb 2, 2018 at 10:21 AM, Reuven Lax wrote: > However this code might run in streaming as well, right? > True. What is the best-practice recommendation for handling it? Probably the author of the sink transform should consider the context and decide if it needs to be retry

Re: Filesystems.copy and .rename behavior

2018-02-02 Thread Chamikara Jayalath
Currently, Python file-based sink is batch only. Regarding Raghu's question, stage/pipeline failure should not be considered as a data loss but I prefer overriding existing output and completing a possibly expensive pipeline over failing the whole pipeline due to one or more existing files. -

Re: Replacing Python DirectRunner apply_* hooks with PTransformOverrides

2018-02-02 Thread Kenneth Knowles
Awesome, nice! On Fri, Feb 2, 2018 at 11:00 AM, Charles Chen wrote: > Thanks Kenn. We already do the Runner API roundtripping (I believe Robert > implemented this). With this change, we would start doing exactly what > you're suggesting, where we apply overrides to a

Re: Replacing Python DirectRunner apply_* hooks with PTransformOverrides

2018-02-02 Thread Ahmet Altay
+1 to this change. Thank you, Charles, for improving the DirectRunner, sharing your progress, and seeking feedback. This change would allow us to migrate to a faster DirectRunner for Python, a long-requested feature and an important part of the first-use experience for new users trying out

Re: Replacing Python DirectRunner apply_* hooks with PTransformOverrides

2018-02-02 Thread Charles Chen
Thanks Kenn. We already do the Runner API roundtripping (I believe Robert implemented this). With this change, we would start doing exactly what you're suggesting, where we apply overrides to a post-deserialization pipeline. On Thu, Feb 1, 2018 at 6:45 PM Kenneth Knowles

Re: Filesystems.copy and .rename behavior

2018-02-02 Thread Reuven Lax
However, this code might run in streaming as well, right? On Fri, Feb 2, 2018 at 9:54 AM, Raghu Angadi wrote: > In a batch pipeline, is it considered a data loss if the stage fails > (assuming it does not set IGNORE_MISSING_FILES and fails hard)? If not, it > might be

Re: Filesystems.copy and .rename behavior

2018-02-02 Thread Raghu Angadi
In a batch pipeline, is it considered data loss if the stage fails (assuming it does not set IGNORE_MISSING_FILES and fails hard)? If not, it might be better to favor correctness and fail in the current implementation. On Thu, Feb 1, 2018 at 4:07 PM, Robert Bradshaw wrote:

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Romain Manni-Bucau
well we can disagree on the code - it is fine ;), but the needed part of it by beam is not huge and in any case it can be forked without requiring 10 classes - if so we'll use another impl than the guava one ;). This is the whole point. Romain Manni-Bucau @rmannibucau

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Reuven Lax
TypeToken is not trivial. I've written code to do what TypeToken does before (figuring out generic ancestor types). It's actually somewhat tricky, and the code I wrote had subtle bugs in it; eventually we removed this code in favor of Guava's implementation :) On Fri, Feb 2, 2018 at 7:47 AM,

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Romain Manni-Bucau
Yep, note I never said to reinvent the wheel; we can copy it from guava, openwebbeans or any other impl. The point was more to avoid depending on something we don't own for what is, after all, not that much code. I also think we can limit it a lot to align it with what is supported by beam (I'm

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Kenneth Knowles
On Fri, Feb 2, 2018 at 7:18 AM, Romain Manni-Bucau wrote: > Don't forget beam doesn't support much behind it (mainly only a few > ParameterizedType due to the usage code path) so it is mainly only about > handling parameterized types and type variables recursively. Not a lot

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Romain Manni-Bucau
Don't forget beam doesn't support much behind it (mainly only a few ParameterizedType due to the usage code path) so it is mainly only about handling parameterized types and type variables recursively. Not a lot of work. Main concern being it is in the API so using a shade as an API is never a good

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Kenneth Knowles
On Fri, Feb 2, 2018 at 6:41 AM, Romain Manni-Bucau wrote: > > > 2018-02-02 15:37 GMT+01:00 Kenneth Knowles : > >> Another couple: >> >> - User-facing TypeDescriptor is a very thin wrapper on Guava's TypeToken >> > > Technically reflect Type is enough >

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2018-02-02 Thread Romain Manni-Bucau
2018-02-02 15:37 GMT+01:00 Kenneth Knowles : > Another couple: > > - User-facing TypeDescriptor is a very thin wrapper on Guava's TypeToken > Technically reflect Type is enough > - ImmutableList and friends and their builders are very widely used and > IMO still add a lot

Build failed in Jenkins: beam_PostRelease_NightlySnapshot #14

2018-02-02 Thread Apache Jenkins Server
See Changes: [ekirpichov] Introduces the Wait transform [ehudm] Split out buffered read and write code from gcsio. [github] Fix undefined names: exc_info --> self.exc_info [github] import

Re: [PROPOSAL] Switch from Guava futures vs Java 8 futures

2018-02-02 Thread Holden Karau
For what it's worth there exists a relatively easy Java8 to Scala future conversion so this shouldn't cause an issue on the Spark runner. On Thu, Feb 1, 2018 at 11:22 PM, Alexey Romanenko wrote: > +1, sounds great! > > Regards, > Alexey > > > On 2 Feb 2018, at 07:14,