Re: Jira components for cross-language transforms

2020-05-28 Thread Heejong Lee
If we use one meta component tag for all xlang-related issues, I would prefer just "xlang". Then we could attach the "xlang" tag alongside not only language-specific SDK tags but also other tags such as IO or runner tags, e.g. ['xlang', 'io-java-kafka'], ['xlang', 'runner-dataflow']. On Thu, May 28, 2020 at 7:49 PM Robert

Re: [Discuss] Build Kafka read transform on top of SplittableDoFn

2020-05-28 Thread Reuven Lax
This is per-partition, right? In that case I assume it will match the current Kafka watermark. On Thu, May 28, 2020 at 9:03 PM Boyuan Zhang wrote: > Hi Reuven, > > I'm going to use MonotonicallyIncreasing >

Re: [Discuss] Build Kafka read transform on top of SplittableDoFn

2020-05-28 Thread Boyuan Zhang
Hi Reuven, I'm going to use MonotonicallyIncreasing by default, and in the future we may want to support custom kinds if there is a request. On
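To make the "MonotonicallyIncreasing" idea concrete, here is a minimal stdlib-only Python sketch. It is not Beam's `WatermarkEstimators.MonotonicallyIncreasing`, and the class and helper names are hypothetical. It keeps one estimator per Kafka partition, matching the per-partition reading in the reply above, and reports the largest record timestamp observed so far as that partition's watermark. It assumes integer millisecond timestamps. Beam's own monotonic estimators assume increasing input and may reject an out-of-order timestamp. This sketch instead ignores it.

```python
from typing import Dict, Optional


class MonotonicWatermark:
    """Hypothetical sketch: the watermark is the max timestamp observed so far."""

    def __init__(self) -> None:
        self._watermark: Optional[int] = None

    def observe(self, timestamp_ms: int) -> None:
        # Only ever move forward; an older (out-of-order) timestamp is ignored here.
        if self._watermark is None or timestamp_ms > self._watermark:
            self._watermark = timestamp_ms

    def current(self) -> Optional[int]:
        # None means nothing has been observed yet for this partition.
        return self._watermark


# One estimator per partition, keyed by a hypothetical integer partition id.
estimators: Dict[int, MonotonicWatermark] = {}


def record(partition: int, timestamp_ms: int) -> None:
    estimators.setdefault(partition, MonotonicWatermark()).observe(timestamp_ms)


for partition, ts in [(0, 100), (0, 90), (1, 50), (0, 120)]:
    record(partition, ts)

print(estimators[0].current(), estimators[1].current())  # 120 50
```

Keeping the estimators separate means a slow partition cannot be masked by a fast one. Combining per-partition watermarks into an overall one is left to the runner.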

Re: [Discuss] Build Kafka read transform on top of SplittableDoFn

2020-05-28 Thread Reuven Lax
Which WatermarkEstimator do you think should be used? On Thu, May 28, 2020 at 7:17 PM Boyuan Zhang wrote: > Hi team, > > I'm Boyuan, currently working on building a Kafka read PTransform on top > of SplittableDoFn[1][2][3]. There are two questions about Kafka usage I > want to discuss with you:

Re: Contributor permission for beam jira tickets

2020-05-28 Thread Robert Burke
Welcome! I think we've interacted on Slack, but please feel free to tag me if you have questions or would like PRs reviewed and merged. I'm @lostluck both on the beam-go Slack and on GitHub. On Wed, 27 May 2020 at 14:44, Gris Cuevas wrote: > Welcome! > > On 2020/05/27 09:12:52, Aaron

Re: Jira components for cross-language transforms

2020-05-28 Thread Robert Burke
+1 to a new component, not split. The language concerns can be represented and filtered with the existing SDK tags. I know I'm interested in all sdk-go issues, and would prefer not to have to union tags when searching for Go-related issues. On Thu, 28 May 2020 at 15:48, Ismaël Mejía wrote: > +1 to
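The tagging trade-off discussed in this thread can be illustrated with plain sets. Below is a small Python sketch with hypothetical issue keys. Apart from "xlang", "sdk-go" and the proposed sdks-*-xlang names, the component names are illustrative. The sketch compares two schemes. In Scheme A, a single "xlang" component is combined with the existing SDK components. In Scheme B, per-language xlang components are used instead of the SDK component. Under Scheme B, "all Go issues" becomes a union of two queries. If the per-language tags were added alongside sdk-go, no union would be needed, but every issue would then need double tagging.

```python
from typing import Dict, FrozenSet, List


def find(issues: Dict[str, FrozenSet[str]], required: FrozenSet[str]) -> List[str]:
    """Keys of issues tagged with every component in `required`."""
    return sorted(k for k, comps in issues.items() if required <= comps)


# Scheme A: one "xlang" component combined with the existing SDK components.
single_tag = {
    "A-1": frozenset({"sdk-go"}),
    "A-2": frozenset({"sdk-go", "xlang"}),
    "A-3": frozenset({"sdk-java-core", "xlang"}),
}

# Scheme B: per-language xlang components used instead of the SDK component.
per_language = {
    "B-1": frozenset({"sdk-go"}),
    "B-2": frozenset({"sdks-go-xlang"}),
    "B-3": frozenset({"sdks-java-xlang"}),
}

# Scheme A: each question is a single filter (or an intersection of two tags).
print(find(single_tag, frozenset({"sdk-go"})))           # ['A-1', 'A-2']
print(find(single_tag, frozenset({"xlang"})))            # ['A-2', 'A-3']
print(find(single_tag, frozenset({"sdk-go", "xlang"})))  # ['A-2']

# Scheme B: "all Go issues" needs the union of two separate queries.
go_b = sorted(set(find(per_language, frozenset({"sdk-go"})))
              | set(find(per_language, frozenset({"sdks-go-xlang"}))))
print(go_b)  # ['B-1', 'B-2']
```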

[Discuss] Build Kafka read transform on top of SplittableDoFn

2020-05-28 Thread Boyuan Zhang
Hi team, I'm Boyuan, currently working on building a Kafka read PTransform on top of SplittableDoFn[1][2][3]. There are two questions about Kafka usage I want to discuss with you: 1. Compared to the KafkaIO.Read
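The snippet above is truncated before the two questions, so the design details are not shown here. As background, a SplittableDoFn reader typically models one partition's work as an offset-range restriction. It claims records one offset at a time and can be checkpointed into "done" and "remaining" parts. Below is a minimal stdlib-only Python sketch of that restriction-tracker pattern. `OffsetRange` and `OffsetTracker` are hypothetical names, not Beam's own offset-range tracker classes. The sketch assumes an open-ended range because a Kafka partition has no natural end.

```python
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OffsetRange:
    start: int
    stop: int  # exclusive


class OffsetTracker:
    """Hypothetical sketch of an offset-range restriction tracker."""

    def __init__(self, restriction: OffsetRange) -> None:
        self.restriction = restriction
        self.last_claimed: Optional[int] = None

    def try_claim(self, offset: int) -> bool:
        # Offsets must be claimed in strictly increasing order.
        if self.last_claimed is not None and offset <= self.last_claimed:
            raise ValueError("offsets must be claimed in increasing order")
        if offset >= self.restriction.stop:
            return False  # outside this restriction: the caller must stop
        self.last_claimed = offset
        return True

    def checkpoint(self) -> Tuple[OffsetRange, OffsetRange]:
        """Split into (primary = work done, residual = work left to resume)."""
        resume_from = (self.restriction.start if self.last_claimed is None
                       else self.last_claimed + 1)
        primary = OffsetRange(self.restriction.start, resume_from)
        residual = OffsetRange(resume_from, self.restriction.stop)
        self.restriction = primary  # further claims on this tracker now fail
        return primary, residual


tracker = OffsetTracker(OffsetRange(start=42, stop=sys.maxsize))
fetched: List[int] = [42, 43, 44]  # offsets returned by a hypothetical poll
for offset in fetched:
    if not tracker.try_claim(offset):
        break
primary, residual = tracker.checkpoint()
print(primary, residual.start)  # OffsetRange(start=42, stop=45) 45
```

After checkpointing, the residual range is what a runner would reschedule, so reading resumes at offset 45 without re-reading or skipping records.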

Re: Jira components for cross-language transforms

2020-05-28 Thread Ismaël Mejía
+1 to a new component, not split. Another use case is using libraries not available in your language, e.g. using some Python transform that relies on a Python-only API in the middle of a Java pipeline. On Thu, May 28, 2020 at 11:12 PM Chamikara Jayalath wrote: > I proposed three components since

Re: Jira components for cross-language transforms

2020-05-28 Thread Chamikara Jayalath
I proposed three components since the audience might be different. Also, we can use the same component to track issues related to all cross-language wrappers available in a given SDK. If this is too much, a single component is fine as well. Ashwin, as others pointed out, the cross-language

Re: Jira components for cross-language transforms

2020-05-28 Thread Robert Bradshaw
+1 to a new component. I would not split things by language. On Thu, May 28, 2020 at 1:55 PM Kyle Weaver wrote: > > What are some of the benefits / drawbacks of using cross-language > transforms? Would a native Python transform perform better than a > cross-language transform written in Java

Re: Jira components for cross-language transforms

2020-05-28 Thread Kyle Weaver
> What are some of the benefits / drawbacks of using cross-language transforms? Would a native Python transform perform better than a cross-language transform written in Java that is then used in a Python pipeline? As Rui says, the main advantage is code reuse. See

Re: Jira components for cross-language transforms

2020-05-28 Thread Rui Wang
+1 on dedicated components for cross-language transforms. It might be easier to manage one component (one tag for all SDKs) rather than multiple ones. Re Ashwin, Cham knows more than me. AFAIK, cross-language transforms will maximize code reuse for newly developed SDKs (e.g. IO transforms

Re: Jira components for cross-language transforms

2020-05-28 Thread Ashwin Ramaswami
What are some of the benefits / drawbacks of using cross-language transforms? Would a native Python transform perform better than a cross-language transform written in Java that is then used in a Python pipeline?

Re: Jira components for cross-language transforms

2020-05-28 Thread Kyle Weaver
SGTM. Though I'm not sure it's necessary to split by language. It might be easier to use a single cross-language tag, rather than having to tag lots of issues as both sdks-python-xlang and sdks-java-xlang. On Thu, May 28, 2020 at 4:29 PM Chamikara Jayalath wrote: > Hi All, > > I think it's good

Jira components for cross-language transforms

2020-05-28 Thread Chamikara Jayalath
Hi All, I think it's good if we can have new Jira components to easily track various issues related to cross-language transforms. What do you think about adding the following Jira components? sdks-python-xlang, sdks-java-xlang, sdks-go-xlang. Jira component sdks-foo-xlang is for tracking issues

Re: writing new IO with Maven dependencies

2020-05-28 Thread Luke Cwik
+dev On Thu, May 28, 2020 at 11:55 AM Ken Barr wrote: > I am currently developing an IO that I would like to eventually submit to > Apache Beam project. The IO itself is Apache2.0 licensed. > Does every chained dependency I use need to be opensource? > The transitive dependency tree must
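The answer above is cut off, so the actual rule is not shown here. The question itself, whether the whole chain of dependencies matters, amounts to walking the transitive dependency tree. Below is a stdlib-only Python sketch of that walk with a license check. The dependency graph, license metadata and allow-list are all hypothetical and illustrative only; they do not state ASF policy. In practice the build tool would report the real tree.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: artifact -> direct dependencies.
DEPS: Dict[str, List[str]] = {
    "my-io": ["client-lib", "json-lib"],
    "client-lib": ["net-lib", "gpl-thing"],
    "json-lib": [],
    "net-lib": [],
    "gpl-thing": [],
}

# Hypothetical license metadata for each artifact.
LICENSES: Dict[str, str] = {
    "my-io": "Apache-2.0",
    "client-lib": "Apache-2.0",
    "json-lib": "MIT",
    "net-lib": "Apache-2.0",
    "gpl-thing": "GPL-3.0",
}

ALLOWED: Set[str] = {"Apache-2.0", "MIT", "BSD-3-Clause"}  # illustrative only


def transitive(root: str) -> Set[str]:
    """All artifacts reachable from `root` (excluding it), via iterative DFS."""
    seen: Set[str] = set()
    stack = list(DEPS.get(root, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPS.get(dep, []))
    return seen


def violations(root: str) -> List[str]:
    """Transitive deps whose license is unknown or not on the allow-list."""
    return sorted(d for d in transitive(root) if LICENSES.get(d) not in ALLOWED)


print(violations("my-io"))  # ['gpl-thing']
```

The problematic artifact here is only a dependency of a dependency, which is why checking direct dependencies alone is not enough. An artifact with no known license is also flagged.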

Re: Kotlin Type Inference Issue for Primitives in DoFn

2020-05-28 Thread Reuven Lax
This means that the TypeDescriptors don't match. It could be something weird with the Int type, or it could be Kotlin not propagating the generic type parameters of the DoFn. On Thu, May 28, 2020 at 8:03 AM Rion Williams wrote: > Hi Reuven, > > Here's the complete stack trace: > > Exception in

Re: Semantic versioning

2020-05-28 Thread Luke Cwik
Updating our documentation makes sense. The backwards compat discussion is an interesting read. One of the points that they mention is that they like Spark users to be on the latest Spark. I can say that this is also true for Dataflow where we want users to be on the latest version of Beam. In

Re: SQL Windowing

2020-05-28 Thread Maximilian Michels
Thanks for the quick reply Brian! I've filed a JIRA for option (a): https://jira.apache.org/jira/browse/BEAM-10143 Makes sense to define DATETIME as a logical type. I'll check out your PR. We could work around this for now by doing a cast, e.g.: TUMBLE(CAST(f_timestamp AS DATETIME), INTERVAL

Re: Python Cross-language wrappers for Java IOs

2020-05-28 Thread Piotr Szuberski
On 2020/05/28 16:54:47, Piotr Szuberski wrote: > I added a Jira task for creating cross-language wrappers for Java IOs. It > will soon be in progress. > https://issues.apache.org/jira/browse/BEAM-10134

Re: Python Cross-language wrappers for Java IOs

2020-05-28 Thread Chamikara Jayalath
Great. Thanks for working on this. Can you please add these tasks and JIRAs to the cross-language transforms roadmap under "Connector/transform support". https://beam.apache.org/roadmap/connectors-multi-sdk/ Happy to help if you run into any issues during this task.

Re: [ANNOUNCE] Beam 2.21.0 Released

2020-05-28 Thread Udi Meiri
Woohoo! On Thu, May 28, 2020 at 4:16 AM Kyle Weaver wrote: > The Apache Beam team is pleased to announce the release of version 2.21.0. > > Apache Beam is an open source unified programming model to define and > execute data processing pipelines, including ETL, batch and stream > (continuous)

Re: SQL Windowing

2020-05-28 Thread Brian Hulette
Hey Max, Thanks for kicking the tires on SqlTransform in Python :) We don't have any tests of windowing and Sql in Python yet, so I'm not that surprised you're running into issues here. Portable schemas don't support the DATETIME type, because we decided not to define it as one of the atomic

Python Cross-language wrappers for Java IOs

2020-05-28 Thread Piotr Szuberski
I added a Jira task for creating cross-language wrappers for Java IOs. It will soon be in progress.

Re: Proposal for reading from / writing to archive files

2020-05-28 Thread Robert Bradshaw
On Thu, May 28, 2020 at 9:34 AM Chamikara Jayalath wrote: > Thanks for the contribution. This sounds very interesting. A few comments. > > * | fileio.MatchFiles('hdfs://path/to/*.zip') | fileio.ExtractMatches() | > fileio.MatchAll() > > We usually either do

SQL Windowing

2020-05-28 Thread Maximilian Michels
Hi, I'm using the SqlTransform as an external transform from within a Python pipeline. The SQL docs [1] mention that you can either (a) window the input or (b) window in the SQL query. Option (a): input | "Window" >> beam.WindowInto(window.FixedWindows(30)) | "Aggregate" >>
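Both options aim at the same grouping. `FixedWindows(30)` assigns each element to a 30-second window aligned to the epoch. A SQL `TUMBLE(..., INTERVAL '30' SECOND)` is meant to express roughly the same thing inside the query. Below is a stdlib-only Python sketch of that arithmetic, followed by a per-window sum standing in for the "Aggregate" step. The helper names are hypothetical and this is not Beam's implementation. Timestamps are assumed to be integer seconds.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # [start, end) in seconds


def fixed_window(ts: int, size: int = 30, offset: int = 0) -> Window:
    """Return the [start, end) fixed window containing timestamp `ts`."""
    start = ts - (ts - offset) % size
    return start, start + size


def sum_per_window(events: List[Tuple[int, int]], size: int = 30) -> Dict[Window, int]:
    """Window each (timestamp, value) by event time, then sum per window."""
    totals: Dict[Window, int] = defaultdict(int)
    for ts, value in events:
        totals[fixed_window(ts, size)] += value
    return dict(totals)


print(sum_per_window([(5, 1), (29, 2), (30, 3), (61, 4)]))
# {(0, 30): 3, (30, 60): 3, (60, 90): 4}
```

Note that a timestamp exactly on a boundary (30 here) starts a new window, because windows are half-open.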

Re: Proposal for reading from / writing to archive files

2020-05-28 Thread Chamikara Jayalath
Thanks for the contribution. This sounds very interesting. A few comments. * | fileio.MatchFiles('hdfs://path/to/*.zip') | fileio.ExtractMatches() | fileio.MatchAll() We usually either do 'fileio.MatchFiles('hdfs://path/to/*.zip')' or 'fileio.MatchAll()'. The former to read a specific glob and the latter
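In Beam's Python `fileio`, `MatchFiles` expands one glob, while `MatchAll` matches a PCollection of patterns. `ExtractMatches` comes from the proposal, and its exact semantics are not shown in the snippet. Below is a stdlib-only Python sketch of one plausible reading of the two steps: first expand a glob into archive paths, then list each archive's members as separate entries. The in-memory "filesystem", the function names and the `<archive>!/<member>` path convention are all hypothetical. None of this is the `fileio` API.

```python
import fnmatch
import io
import zipfile
from typing import Dict, List


def make_zip(members: Dict[str, bytes]) -> bytes:
    """Build an in-memory zip archive (stands in for a file on HDFS)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Hypothetical "filesystem": path -> file contents.
FILES: Dict[str, bytes] = {
    "hdfs://path/to/a.zip": make_zip({"x.txt": b"hello", "y.txt": b"world"}),
    "hdfs://path/to/b.zip": make_zip({"z.txt": b"!"}),
    "hdfs://path/to/readme.md": b"not an archive",
}


def match_files(pattern: str) -> List[str]:
    """Step 1: expand a glob into concrete paths (case-sensitive match)."""
    return sorted(p for p in FILES if fnmatch.fnmatchcase(p, pattern))


def extract_members(path: str) -> List[str]:
    """Step 2: list archive members as '<archive>!/<member>' pseudo-paths."""
    with zipfile.ZipFile(io.BytesIO(FILES[path])) as zf:
        return [f"{path}!/{name}" for name in zf.namelist()]


members = [m for p in match_files("hdfs://path/to/*.zip") for m in extract_members(p)]
print(members)
```

Emitting one entry per archive member, rather than one per archive, is what would let a downstream step read or parallelize over individual members.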

Re: Semantic versioning

2020-05-28 Thread Ismaël Mejía
I am surprised that we are claiming on the Beam website to use semantic versioning (semver) [1] in Beam [2]. We have NEVER really followed semantic versioning, and we have broken both internal and external APIs multiple times (at least for Java), as you can find in this analysis of source and binary

Re: Kotlin Type Inference Issue for Primitives in DoFn

2020-05-28 Thread Rion Williams
Hi Reuven, Here's the complete stack trace: Exception in thread "main" java.lang.IllegalArgumentException: Type of @Element must match the DoFn typeCreate.Values/Read(CreateSource).out [PCollection] at org.apache.beam.sdk.transforms.ParDo.getDoFnSchemaInformation(ParDo.java:601)

dealing with late data output timestamps

2020-05-28 Thread David Morávek
Hi, I've come across "unexpected" model behaviour when dealing with late data and custom timestamp combiners. Let's take the following pipeline as an example: final PCollection input = ...; input.apply( "GlobalWindows", Window.into(new GlobalWindows()) .triggering(
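As background for this question, a timestamp combiner decides the output timestamp of a pane from the timestamps of the elements in it. Below is a deliberately naive stdlib-only Python sketch using `min` and `max` as stand-ins for the EARLIEST and LATEST options, with hypothetical integer millisecond timestamps. It shows that a pane built only from late elements gets a timestamp behind the watermark under either combiner. The sketch ignores watermark holds and any adjustment a real runner makes for late panes, which is exactly where the behaviour discussed in this thread may differ.

```python
from typing import Callable, Dict, List, Tuple

# Naive stand-ins for two TimestampCombiner options (illustrative only).
COMBINERS: Dict[str, Callable[[List[int]], int]] = {
    "EARLIEST": min,
    "LATEST": max,
}


def pane_timestamp(element_ts: List[int], combiner: str) -> int:
    """Output timestamp of a pane, combined from its elements' timestamps."""
    return COMBINERS[combiner](element_ts)


def report(element_ts: List[int], watermark: int) -> Dict[str, Tuple[int, bool]]:
    """For each combiner: (pane timestamp, whether it is behind the watermark)."""
    result: Dict[str, Tuple[int, bool]] = {}
    for name in COMBINERS:
        ts = pane_timestamp(element_ts, name)
        result[name] = (ts, ts < watermark)
    return result


# Hypothetical late pane: both elements arrived after the watermark passed 1000.
print(report([400, 700], watermark=1_000))
# {'EARLIEST': (400, True), 'LATEST': (700, True)}

# An on-time pane for comparison.
print(report([1_200, 1_500], watermark=1_000))
# {'EARLIEST': (1200, False), 'LATEST': (1500, False)}
```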

Re: Semantic versioning

2020-05-28 Thread Reuven Lax
Most of those items are either in APIs marked @Experimental (the definition of Experimental in Beam is that we can make breaking changes to the API) or are changes in a specific runner - not the Beam API. Reuven On Thu, May 28, 2020 at 7:19 AM Ashwin Ramaswami wrote: > There's a "Breaking

Re: Semantic versioning

2020-05-28 Thread Ashwin Ramaswami
There's a "Breaking Changes" section on this blogpost: https://beam.apache.org/blog/beam-2.21.0/ (and really, for earlier minor versions too)

Re: Semantic versioning

2020-05-28 Thread Reuven Lax
What did we break? On Thu, May 28, 2020, 6:31 AM Ashwin Ramaswami wrote: > Do we really use semantic versioning? It appears we introduced breaking > changes from 2.20.0 -> 2.21.0. If not, we should update the documentation > under "API Stability" on this page: >

Semantic versioning

2020-05-28 Thread Ashwin Ramaswami
Do we really use semantic versioning? It appears we introduced breaking changes from 2.20.0 -> 2.21.0. If not, we should update the documentation under "API Stability" on this page: https://beam.apache.org/get-started/downloads/ What would be a better way to word the way in which we decide
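SemVer 2.0.0 asks for a MAJOR bump for incompatible API changes, a MINOR bump for backwards-compatible new functionality, and a PATCH bump for backwards-compatible fixes. Below is a small Python sketch, with hypothetical helper names, that compares the bump a release actually made against the bump its changes would require. It assumes plain `MAJOR.MINOR.PATCH` strings and that the new version is greater than the old one. It does not model the argument in this thread that breaking changes confined to `@Experimental` APIs fall outside the declared public API.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def required_bump(breaking: bool, new_features: bool) -> str:
    """The bump SemVer 2.0.0 asks for, given the kind of change."""
    if breaking:
        return "major"
    if new_features:
        return "minor"
    return "patch"


def actual_bump(old: str, new: str) -> str:
    """Which component changed between two versions (assumes new > old)."""
    o, n = parse(old), parse(new)
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"


print(actual_bump("2.20.0", "2.21.0"), required_bump(breaking=True, new_features=True))
# minor major
```

If a release containing a public-API break only bumps MINOR, as in the 2.20.0 to 2.21.0 case asked about here, the two values disagree. That mismatch is the gap between the documented policy and practice that this thread discusses.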

[ANNOUNCE] Beam 2.21.0 Released

2020-05-28 Thread Kyle Weaver
The Apache Beam team is pleased to announce the release of version 2.21.0. Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. See https://beam.apache.org You can download the release

Re: What's the purpose of version=2.20.0-RC2 in gradle.properties?

2020-05-28 Thread Maximilian Michels
> I would expect the release branch to have the next -SNAPSHOT version (not the > case currently): Why would the release branch have the next version? It is created for the sole purpose of releasing the current version. For example, the release branch for 2.21.0 would have the version
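On version labels like `2.20.0-RC2` and `-SNAPSHOT`: under SemVer 2.0.0 precedence rules, any pre-release of a version sorts before the final release with the same MAJOR.MINOR.PATCH. Below is a stdlib-only Python sketch of those rules. Build metadata (`+...`) is not handled. Gradle and Maven apply their own version-ordering rules, and Maven treats SNAPSHOT specially, so this only illustrates the SemVer side.

```python
from functools import cmp_to_key
from typing import List, Tuple, Union

Ident = Union[int, str]


def split(version: str) -> Tuple[Tuple[int, int, int], List[Ident]]:
    """Split 'X.Y.Z[-pre.release]' into a numeric core and pre-release identifiers."""
    core, _, pre = version.partition("-")
    major, minor, patch = (int(p) for p in core.split("."))
    idents: List[Ident] = [int(i) if i.isdigit() else i for i in pre.split(".")] if pre else []
    return (major, minor, patch), idents


def compare_idents(a: Ident, b: Ident) -> int:
    if isinstance(a, int) and isinstance(b, int):
        return (a > b) - (a < b)
    if isinstance(a, int):
        return -1  # numeric identifiers sort before alphanumeric ones
    if isinstance(b, int):
        return 1
    return (a > b) - (a < b)  # ASCII order for alphanumeric identifiers


def compare(v1: str, v2: str) -> int:
    """Return -1, 0 or 1 following SemVer 2.0.0 precedence (no build metadata)."""
    core1, pre1 = split(v1)
    core2, pre2 = split(v2)
    if core1 != core2:
        return (core1 > core2) - (core1 < core2)
    if not pre1 or not pre2:
        # A release (no pre-release part) outranks any pre-release of the same core.
        return (not pre1) - (not pre2)
    for a, b in zip(pre1, pre2):
        c = compare_idents(a, b)
        if c:
            return c
    return (len(pre1) > len(pre2)) - (len(pre1) < len(pre2))


versions = ["2.21.0", "2.20.0", "2.20.0-RC2", "2.20.0-RC1", "2.21.0-SNAPSHOT"]
print(sorted(versions, key=cmp_to_key(compare)))
# ['2.20.0-RC1', '2.20.0-RC2', '2.20.0', '2.21.0-SNAPSHOT', '2.21.0']
```

Because "RC2" is a single alphanumeric identifier, it compares as text, so "RC10" would sort before "RC2". The SemVer spec's own examples use dot-separated forms such as `rc.1` to avoid this.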