Re: [PROPOSAL] Additional design for the Beam Python State and Timers API

2018-11-05 Thread Robert Bradshaw
On Fri, Oct 26, 2018 at 6:47 PM Kenneth Knowles wrote: > > It all sounds very useful but I have basic concerns about item 1. The doc > doesn't really seem to go into the design concerns that I have in mind. > > - map / flatMap are universal functions with definitions that we don't own > and

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-11-05 Thread Maximilian Michels
The result shows that there is a demand for an LTS release. +1 for using an existing release. How about six months for the initial LTS release? I think it shouldn't be too long for the first one to give us a chance to make changes to the model. -Max On 02.11.18 17:26, Ahmet Altay wrote:

Beam Dependency Check Report (2018-11-05)

2018-11-05 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK: Dependency Name Current Version Latest Version Release Date Of the Current Used Version Release Date Of The Latest Release JIRA Issue future 0.16.0 0.17.1 2016-10-27

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-11-05 Thread Robert Bradshaw
Yes, cutting more patch releases is the goal of the LTS release. We have not yet determined what the threshold is for backporting bugfixes (which, in part, depends on how much work that is) nor how often we'd do a release. On Mon, Nov 5, 2018 at 3:42 PM Chamikara Jayalath wrote: > > +1 for using

Re: Evolving a Coder for an added field

2018-11-05 Thread Ismaël Mejía
For some extra context this change touches more than FileIO, in reality this will affect updates in any file-based pipelines because the metadata on each file will have now an extra field for the lastModifiedDate. The PR looks perfect, only issue is the backwards compatibility Coder question.

Re: Evolving a Coder for an added field

2018-11-05 Thread Jean-Baptiste Onofré
That's really a pita. It's an important and impacting change. I would go to 1. For LTS, as already said, I would create an LTS branch and only cherry pick some changes. Using master as LTS release branch won't work IMHO. Regards JB On 05/11/2018 15:47, Ismaël Mejía wrote: > For some extra

Re: Stackoverflow Questions

2018-11-05 Thread Jean-Baptiste Onofré
That's "classic" in the Apache projects. And yes, most of the time, we periodically send or ask the dev to check the questions on other channels like stackoverflow. It makes sense to send a reminder or a list of open questions on the user mailing list (users can help each other too). Regards JB

Re: Evolving a Coder for an added field

2018-11-05 Thread Thomas Weise
+1 I think that coders should be immutable/versioned. The SDK should know about all the available versions and be able to associate the data (stream or at rest) with the corresponding coder version via URN. We can also look how that is solved elsewhere, for example the Kafka schema registry.
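
To make the versioned-coder idea in this thread concrete, here is a minimal sketch in Python. It is not Beam's coder API: the URNs, the `CODERS` registry, and the field name `last_modified_ms` are all hypothetical and chosen only for illustration. It assumes each encoded payload travels with the URN of the coder version that produced it, so data written before the new field was added can still be decoded.

```python
import struct

# Hypothetical URNs for illustration only; these are not real Beam URNs.
METADATA_V1 = "example:coder:file_metadata:v1"
METADATA_V2 = "example:coder:file_metadata:v2"


def encode_v1(meta):
    # v1 layout: 4-byte name length, UTF-8 name, 8-byte size.
    name = meta["name"].encode("utf-8")
    return struct.pack(">I", len(name)) + name + struct.pack(">q", meta["size"])


def decode_v1(data):
    (name_len,) = struct.unpack_from(">I", data, 0)
    name = data[4:4 + name_len].decode("utf-8")
    (size,) = struct.unpack_from(">q", data, 4 + name_len)
    # Old payloads carry no timestamp, so the new field defaults to None.
    return {"name": name, "size": size, "last_modified_ms": None}


def encode_v2(meta):
    # v2 layout: the v1 layout followed by an 8-byte timestamp (assumed millis).
    return encode_v1(meta) + struct.pack(">q", meta["last_modified_ms"])


def decode_v2(data):
    meta = decode_v1(data[:-8])
    (meta["last_modified_ms"],) = struct.unpack_from(">q", data, len(data) - 8)
    return meta


# Each version is immutable once published; new layouts get a new URN.
CODERS = {
    METADATA_V1: (encode_v1, decode_v1),
    METADATA_V2: (encode_v2, decode_v2),
}


def encode(urn, meta):
    # Return the URN alongside the bytes so a reader can pick the right decoder.
    return urn, CODERS[urn][0](meta)


def decode(urn, data):
    return CODERS[urn][1](data)
```

With this layout, a reader that knows both URNs can consume a mix of old and new payloads, and a reader that sees an unknown URN can reject the data instead of misreading it.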

Re: Evolving a Coder for an added field

2018-11-05 Thread Jean-Baptiste Onofré
It makes sense to have a more concrete URN including the version. Good idea Robert. Regards JB On 05/11/2018 16:52, Robert Bradshaw wrote: > I think we'll want to allow upgrades across SDK versions. A runner > should be able to recognize when a coder (or any other aspect of the > pipeline) has

Re: Evolving a Coder for an added field

2018-11-05 Thread Robert Bradshaw
I think we'll want to allow upgrades across SDK versions. A runner should be able to recognize when a coder (or any other aspect of the pipeline) has changed and adapt/reject accordingly. (Until we remove coders from sources/sinks, there's also possibly the expectation that one should be able to

Re: DynamicMessage and ProtoCodec

2018-11-05 Thread Lukasz Cwik
It would make sense to have ProtoCoder support DynamicMessage. On Mon, Nov 5, 2018 at 1:29 AM Alex Van Boxel wrote: > It seems that the current ProtoCodec doesn't support DynamicMessage. I > want to fix this, but I'm wondering if I would build it into the current > ProtoCodec or create a

Stackoverflow Questions

2018-11-05 Thread Anton Kedin
Hi dev@, I was looking at stackoverflow questions tagged with `apache-beam` [1] and wanted to ask your opinion. It feels like it's easier for some users to ask questions on stackoverflow than on user@. Overall frequency between the two channels seems comparable but a lot of stackoverflow

Re: Stackoverflow Questions

2018-11-05 Thread Scott Wegner
I like the idea of working to improve our presence on Q&A sites like StackOverflow. SO is a great resource and much more discoverable / searchable than a mail archive. One idea on how to improve our presence: StackOverflow supports setting up email subscriptions [1] for particular tags. It

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-11-05 Thread Ahmet Altay
+1 to starting with 2.7 branch and supporting it for 6 months. I think we should start the support window of 6 months from the day we agree to do this. That way users will at least get the benefit for 6 months after learning about LTS status. It seems like there is a consensus. Should we hold a

Re: :beam-sdks-java-io-hadoop-input-format:test task issues

2018-11-05 Thread Alexey Romanenko
Alex, could you check and update (if necessary) the package "libjffi-jni" on your system? Another potential workaround could be to try to run it with "-Dcom.datastax.driver.USE_NATIVE_CLOCK=false" > On 31 Oct 2018, at 23:40, Kenneth Knowles wrote: > > If I am reading it right, the segfault is

Re: Stackoverflow Questions

2018-11-05 Thread Tim Robertson
Thanks for raising this Anton > It would be very easy to forward new SO questions to the user@ list, or > a new list if we're worried about the noise. +1 (preference on user@ until there are too many) On Mon, Nov 5, 2018 at 7:18 PM Scott Wegner wrote: > I like the idea of working to

Re: Stackoverflow Questions

2018-11-05 Thread Maximilian Michels
Great idea! I'd prefer a daily/weekly digest if possible. On 05.11.18 19:44, Tim Robertson wrote: > Thanks for raising this Anton > > > It would be very easy to forward new SO questions to the user@ list, or > > a new list if we're worried about the noise. > > +1 (preference on user@ until there

Re: Stackoverflow Questions

2018-11-05 Thread Ankur Goenka
+1 for the daily/weekly digest to user@ On Mon, Nov 5, 2018 at 10:52 AM Maximilian Michels wrote: > Great idea! I'd prefer a daily/weekly digest if possible. > > On 05.11.18 19:44, Tim Robertson wrote: > > Thanks for raising this Anton > > > > It would be very easy to forward new SO

Re: Stackoverflow Questions

2018-11-05 Thread Kenneth Knowles
+user@ I think we'd better ask user@ before we subscribe the list to a regular automated email. Daily might be OK for dev@ but I would guess that user@ might prefer less frequent. It will have a predictable subject so it should be easy to filter if someone is not interested. Kenn On Mon, Nov 5,

Wiki edit access please

2018-11-05 Thread Robert Burke
I'd like to add more information about contributing to the Go SDK. Cheers, Robert B

Re: Wiki edit access please

2018-11-05 Thread Thomas Weise
You should be all set. On Mon, Nov 5, 2018 at 1:16 PM Robert Burke wrote: > I'd like to add more information about contributing to the Go SDK. > > Cheers, > Robert B >

[DISCUSS] More precision supported by DATETIME field in Schema

2018-11-05 Thread Rui Wang
Hi Community, The DATETIME field in Beam Schema/Row is implemented by Joda's Datetime (see Row.java#L611 and Row.java#L169

How to use "PortableRunner" in Python SDK?

2018-11-05 Thread Ruoyun Huang
Hi, Folks, I want to try out the Python PortableRunner by using the following command: *sdk/python: python -m apache_beam.examples.wordcount --output=/tmp/test_output --runner PortableRunner* It complains with the following error message: Caused by: java.lang.Exception: The user defined

Re: Python profiling

2018-11-05 Thread Ankur Goenka
All containers are destroyed by default on termination, so to analyze profiling data for portable runners, either disable container cleanup (using --retainDockerContainers=true) or use a remote distributed file system path. On Mon, Nov 5, 2018 at 1:05 AM Robert Bradshaw wrote: > Any portable

Re: [DISCUSS] More precision supported by DATETIME field in Schema

2018-11-05 Thread Reuven Lax
I would vote that we change the internal representation of Row to something other than Joda. Java 8 times would give us at least microseconds, and if we want nanoseconds we could simply store it as a number. We should still keep accessor methods that return and take Joda objects, as the rest of
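
The "store it as a number" option can be sketched as follows. This is a Python illustration only, not the Beam Row implementation (which is Java); the class name `TimestampField` and its methods are hypothetical. It assumes the value is kept as integer nanoseconds since the Unix epoch, with accessors that truncate to coarser units for callers that only understand milliseconds.

```python
from datetime import datetime, timezone

NANOS_PER_MILLI = 1_000_000
NANOS_PER_SECOND = 1_000_000_000


class TimestampField:
    """Hypothetical holder storing integer nanoseconds since the Unix epoch."""

    def __init__(self, epoch_nanos):
        self.epoch_nanos = epoch_nanos

    def to_millis(self):
        # Millisecond view for legacy callers; sub-millisecond digits are dropped.
        return self.epoch_nanos // NANOS_PER_MILLI

    def to_datetime(self):
        # Python's datetime carries microseconds, so the last three
        # nanosecond digits are dropped in this view as well.
        seconds, nanos = divmod(self.epoch_nanos, NANOS_PER_SECOND)
        base = datetime.fromtimestamp(seconds, tz=timezone.utc)
        return base.replace(microsecond=nanos // 1000)
```

The stored integer stays lossless, and each accessor makes its own truncation explicit, which is the compatibility property the message asks for when it suggests keeping the Joda-based accessors.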

Re: How to use "PortableRunner" in Python SDK?

2018-11-05 Thread Ankur Goenka
Hi, The Portable Runner requires a job server URI to work with. The current default job server docker image is broken because of a docker-inside-docker issue. Please refer to https://beam.apache.org/roadmap/portability/#python-on-flink for how to run a wordcount using the Portable Flink Runner.

Re: [DISCUSS] More precision supported by DATETIME field in Schema

2018-11-05 Thread Rui Wang
Thanks Reuven! I think Reuven gives the third option: Change internal representation of DATETIME field in Row. Still keep public ReadableDateTime getDateTime(String fieldName) API to be compatible with existing code. And I think we could add one more API to getDataTimeNanosecond. This option is

Re: [DISCUSS] More precision supported by DATETIME field in Schema

2018-11-05 Thread Charles Chen
One related issue that came up before is that we (perhaps unnecessarily) restrict the precision of timestamps in the Python SDK to milliseconds because of legacy reasons related to the Java runner's use of Joda time. Perhaps Beam portability should natively use a more granular timestamp unit. On