Re: Reading CSV from google cloud storage to Data Flow

2018-11-26 Thread Robert Bradshaw
The same holds true in Python: Read the files with TextIO and follow with a Map operation that splits the lines into records. This, of course, only works if you don't have newlines within your records. In that case, you may need to use a DoFn that takes as input each filename and reads the
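
A minimal sketch of the line-splitting step described above, assuming each record fits on a single line. The helper name `parse_csv_line` is hypothetical, and the Beam wiring appears only in comments with a placeholder path; the function itself uses just the standard-library `csv` module so that quoted commas are handled.

```python
import csv


def parse_csv_line(line, delimiter=","):
    """Split one line of CSV text into a list of field strings.

    Hypothetical helper: it assumes a record never spans multiple lines,
    which is the limitation mentioned in the message above.
    """
    # csv.reader accepts any iterable of lines; wrap the single line in a list
    # and take the first (only) parsed row.
    return next(csv.reader([line], delimiter=delimiter))


# In a Beam Python pipeline this would typically follow the text read, e.g.
# (placeholder path, not taken from the thread):
#   lines = p | beam.io.ReadFromText("gs://<bucket>/<file>.csv")
#   records = lines | beam.Map(parse_csv_line)

if __name__ == "__main__":
    print(parse_csv_line('a,"b,c",d'))
```

Using the `csv` module rather than `str.split(",")` keeps quoted fields such as `"b,c"` intact, but it does not help with embedded newlines, since the text read has already split the input by line.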

Re: Edit access to the Apache Beam Confluence Wiki?

2018-11-26 Thread Łukasz Gajowy
It's "lgajowy". Sorry, I incorrectly assumed it's somehow connected to Jira. On Fri, Nov 23, 2018 at 6:37 PM Thomas Weise wrote: > Alexey, you have been added. > > Łukasz, I could not find you. Did you create an account? What's the user > ID? > > On Thu, Nov 22, 2018 at 7:47 AM Alexey Romanenko

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-11-26 Thread Thomas Weise
+1 for introducing the new interface now and deprecating the old one. The major version change then provides the opportunity to remove deprecated code. On Mon, Nov 26, 2018 at 10:09 AM Lukasz Cwik wrote: > Before 3.0 we will still want to introduce this giving time for people to > migrate,

Re: TextIO setting file dynamically issue

2018-11-26 Thread Reuven Lax
Do you need it to change based on the timestamps of the records being processed, or based on actual current time? On Mon, Nov 26, 2018 at 5:30 PM Matthew Schneid wrote: > Hello, > > > > I have an interesting issue that I can’t seem to find a reliable > resolution to. > > > > I have a standard

TextIO setting file dynamically issue

2018-11-26 Thread Matthew Schneid
Hello, I have an interesting issue that I can’t seem to find a reliable resolution to. I have a standard TextIO output that looks like the following: TextIO.write().to("gs://+ new DateTime().toString("HH-mm-ss") + "/Test-") The above works, and writes to GCS, as I expect it to. However, it
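
The preview above is cut off, so the following is only a sketch under an assumption: that the concern is a path expression evaluated once, when the pipeline is constructed, rather than per record. The reply in this thread asks whether the path should follow record timestamps or the current time; the plain-Python illustration below (not Beam API; `prefix_for` and the `out` base are hypothetical) shows the difference between the two.

```python
import time
from datetime import datetime, timezone


def prefix_for(ts_seconds, base="out"):
    """Build an output prefix from a timestamp given in epoch seconds (UTC).

    Hypothetical helper mirroring the "HH-mm-ss" pattern from the message.
    """
    stamp = datetime.fromtimestamp(ts_seconds, tz=timezone.utc).strftime("%H-%M-%S")
    return f"{base}/{stamp}/Test-"


# Evaluated once, at "construction time": every record would share this prefix.
FIXED_PREFIX = prefix_for(time.time())

# Evaluated per record: the prefix follows each record's own timestamp.
record_timestamps = [0, 3661]
per_record_prefixes = [prefix_for(ts) for ts in record_timestamps]

if __name__ == "__main__":
    print(FIXED_PREFIX)
    print(per_record_prefixes)
```

Which of the two behaviours is wanted is exactly what the reply asks; the sketch takes no position on how the thread was resolved.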

Re: Evolving a Coder for an added field

2018-11-26 Thread Robert Bradshaw
Modifying an existing coder is a non-starter until we have a versioning story. Creating an entirely new coder should definitely be possible, and using it either opt-in or, if a good enough case can be made, possibly even opt-out could get this unblocked. On Mon, Nov 26, 2018 at 3:05 PM Jeff

Beam Dependency Check Report (2018-11-26)

2018-11-26 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK: Dependency Name Current Version Latest Version Release Date Of the Current Used Version Release Date Of The Latest Release JIRA Issue future 0.16.0 0.17.1 2016-10-27

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-11-26 Thread Jeff Klukas
Picking up this thread again. Based on the feedback from Kenn, Reuven, and Romain, it sounds like there's no objection to the idea of SimpleFunction and SerializableFunction declaring that they throw Exception. So the discussion at this point is about whether there's an acceptable way to introduce

Re: Evolving a Coder for an added field

2018-11-26 Thread Lukasz Cwik
Reuven was one of the people I reached out to on this matter and he replied on this thread. On Mon, Nov 26, 2018 at 7:07 AM Robert Bradshaw wrote: > Modifying an existing coder is a non-starter until we have a versioning > story. Creating an entirely new coder should definitely be possible, and

Re: Nexmark Phrase Triggering

2018-11-26 Thread Chamikara Jayalath
Thanks Łukasz. Should the solution be documented (in Beam testing guide ?) so that other performance tests can support manual triggering without affecting benchmark results in a similar manner ? - Cham On Thu, Nov 22, 2018 at 4:03 AM Łukasz Gajowy wrote: > Hi all, > > BEAM-6011 is now

Re: Evolving a Coder for an added field

2018-11-26 Thread Jeff Klukas
Lukasz - Were you able to get any more context on the possibility of versioning coders from other folks at Google? It sounds like adding versioning for coders and/or schemas is potentially a large change. At this point, should I just write up some highlights from this thread in a JIRA issue for

Re: Design review for supporting AutoValue Coders and conversions to Row

2018-11-26 Thread Jeff Klukas
Reuven - How is the work on constructor support for ByteBuddy codegen going? Does it still look like that's going to be a feasible way forward for generating schemas/coders for AutoValue classes? On Thu, Nov 15, 2018 at 4:37 PM Reuven Lax wrote: > I would hope so if possible. > > On Fri, Nov

Re: [PROPOSAL] Prepare Beam 2.9.0 release

2018-11-26 Thread Lukasz Cwik
I'm working on BEAM-6102 and after 12 hours on the issue I have not made much real progress. I initially suspected it's a shading issue with the Dataflow worker jar but can't reproduce the issue without running a full Dataflow pipeline. Any help would be appreciated, context of what I have tried is

Re: To create a WordCount-SideInput.java example?

2018-11-26 Thread Lukasz Cwik
Examples are good for showing users how to use certain concepts but we should stick with ValidatesRunner tests for ensuring that runners / SDKs implement concepts correctly. We have several ValidatesRunner side input tests in ParDoTest.java[1], ViewTest.java[2], and sideinputs_test.py[3] that

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-11-26 Thread Thomas Weise
PR for this: https://github.com/apache/beam/pull/7129 On Tue, Oct 16, 2018 at 11:40 AM Robert Bradshaw wrote: > Thanks for bringing this to a conclusion. > > On Mon, Oct 15, 2018 at 6:18 PM Thomas Weise wrote: > > > > Here is my attempt to summarize the discussion, please see the TBDs. > > >

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-11-26 Thread Lukasz Cwik
Before 3.0 we will still want to introduce this giving time for people to migrate, would it make sense to do that now and deprecate the alternatives that it replaces? On Mon, Nov 26, 2018 at 5:59 AM Jeff Klukas wrote: > Picking up this thread again. Based on the feedback from Kenn, Reuven, and

Re: org.apache.beam.runners.flink.PortableTimersExecutionTest is very flakey

2018-11-26 Thread Alex Amato
Thanks Maximilian, let me know if you need any help. Usually I debug this sort of thing by pausing the IntelliJ debugger to see all the different threads which are waiting on various conditions. If you find any insights from that, please post them here and we can try to figure out the source of

Re: [DISCUSS] SplittableDoFn Java SDK User Facing API

2018-11-26 Thread Lukasz Cwik
On Mon, Nov 26, 2018 at 9:09 AM Ismaël Mejía wrote: > > Bundle finalization is unrelated to backlogs but is needed since there > is a class of data stores which need acknowledgement that says I have > successfully received your data and am now responsible for it such as > acking a message from a

Re: [PROPOSAL] Prepare Beam 2.9.0 release

2018-11-26 Thread Chamikara Jayalath
Hi All, Currently there are two blockers for the 2.9.0 release. * Dataflow cannot deserialize DoFns - https://issues.apache.org/jira/browse/BEAM-6102 * [SQL] Nexmark 5, 7 time out - https://issues.apache.org/jira/browse/BEAM-6082 We'll postpone cutting the release candidate till these issues

Re: To create a WordCount-SideInput.java example?

2018-11-26 Thread Ruoyun Huang
Thanks Kenneth. Didn't look into subfolders, let me read a bit more. And will look into the tests Luke pointed out as well. To make sure I understand your comments of "Side inputs _are_ different in streaming as *you* have to ...", are you saying either: 1) a user needs to use/treat SideInput

Re: To create a WordCount-SideInput.java example?

2018-11-26 Thread Kenneth Knowles
On Mon, Nov 26, 2018 at 1:32 PM Ruoyun Huang wrote: > Thanks Kenneth. Didn't look into subfolders, let me read a bit more. And > will look into the tests Luke pointed out as well. > > To make sure I understand your comments of "Side inputs _are_ different in > streaming as *you* have to ...",

Re: org.apache.beam.runners.flink.PortableTimersExecutionTest is very flakey

2018-11-26 Thread Maximilian Michels
Hi Alex, Thanks for your help! I'm quite used to debugging concurrent/distributed problems. But this one is quite tricky, especially with regards to GRPC threads. I try to provide more information in the following. There are two observations: 1) The problem is specifically related to how