Re: big data blog

2020-02-07 Thread Kenneth Knowles
Nice! Yes, I think we should promote Beam articles that are insightful from a longtime contributor. Etienne - can you add twitter announcements/retweets to the social media spreadsheet when you write new articles? Kenn On Fri, Feb 7, 2020 at 5:44 PM Ahmet Altay wrote: > Cool, thank you. Would

Re: Compile error on Java 11 when running :examples:java:test

2020-02-07 Thread Jean-Baptiste Onofre
Hi, AFAIR I had the same issue on my Linux machine. Let me do a new run. Regards JB > On Feb 7, 2020, at 21:35, Kenneth Knowles wrote: > > The expected class file version 53 is for Java 9, I believe. So is the right > javac being invoked? > > I hit some issues like this on mac a while back,

Re: Updating releases on Github release page.

2020-02-07 Thread Kenneth Knowles
Previously, GitHub treated every tag as a release (why? I don't know). I think you can remove/edit them now. In addition to adding new ones, let's remove the ones that are not actually voted-on releases. Kenn On Fri, Feb 7, 2020 at 5:42 PM Ahmet Altay wrote: > I do not believe this is

Re: [PROPOSAL] Beam Schema Options

2020-02-07 Thread Kenneth Knowles
All fair points. I think it is a good proposal. We already know of existing and future uses for it. I don't think my concerns are actually answered by this discussion. Does this allow/encourage creation of a PCollection that you can't make sense of (or can't make *good* sense of) without

Re: big data blog

2020-02-07 Thread Ahmet Altay
Cool, thank you. Would it make sense to promote Beam related posts on our twitter channel? On Fri, Feb 7, 2020 at 2:47 PM Pablo Estrada wrote: > Very nice. Thanks for sharing Etienne! > > On Fri, Feb 7, 2020 at 2:19 PM Reuven Lax wrote: > >> Cool! >> >> On Fri, Feb 7, 2020 at 7:24 AM Etienne

Re: Updating releases on Github release page.

2020-02-07 Thread Ahmet Altay
I do not believe this is intentional. This step might be missing from the release guide. On Fri, Feb 7, 2020 at 5:07 PM Daniel Oliveira wrote: > Hey beam devs, > > I saw a comment on SO that our releases on github ( > https://github.com/apache/beam/releases) are stuck at 2.16.0. It looks > like

Updating releases on Github release page.

2020-02-07 Thread Daniel Oliveira
Hey beam devs, I saw a comment on SO that our releases on github ( https://github.com/apache/beam/releases) are stuck at 2.16.0. It looks like that's still tagged as the "Latest Release", but the newer releases are actually present in tiny words above it: "... Show 7 newer tags". I wanted to fix

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-07 Thread Kenneth Knowles
Regarding StatefulDoFnRunner: this fails during pipeline execution, too late, and as you noted is just a utility that a runner may optionally use. The change needs to be in the runner's run() method prior to execution starting. Here is a specific PR that demonstrates the technique:

Re: big data blog

2020-02-07 Thread Pablo Estrada
Very nice. Thanks for sharing Etienne! On Fri, Feb 7, 2020 at 2:19 PM Reuven Lax wrote: > Cool! > > On Fri, Feb 7, 2020 at 7:24 AM Etienne Chauchot > wrote: > >> Hi all, >> >> FYI, I just started a blog around big data technologies and for now it >> is focused on Beam. >> >>

Re: Dynamic timers now supported!

2020-02-07 Thread Reuven Lax
Thanks for finding this. Hopefully the bug is easy to fix. The tests indeed never ran on any runner except for the DirectRunner, which is something I should've noticed in the code review. Reuven On Mon, Feb 3, 2020 at 12:50 AM Ismaël Mejía wrote: > I had a discussion with Rehman last week and

Re: big data blog

2020-02-07 Thread Reuven Lax
Cool! On Fri, Feb 7, 2020 at 7:24 AM Etienne Chauchot wrote: > Hi all, > > FYI, I just started a blog around big data technologies and for now it > is focused on Beam. > > https://echauchot.blogspot.com/ > > Feel free to comment, suggest or anything. > > Etienne > >

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-07 Thread Jan Lukavský
Hi Robert, thanks for this insight. I think this uncovered an additional question - I'm not saying that I follow every thread in dev@, but I didn't notice anything about "trying to stabilize the protos", which is again where I think these big milestones probably should be defined in

Re: [PROPOSAL] Beam Schema Options

2020-02-07 Thread Reuven Lax
True - however at some level that's up to the user. We should be diligent that we don't implement core functionality this way (so far schema metadata has only been used for the fidelity use case above). However, if some users want to use it more extensively in their pipeline, that's up to them.

Re: Unable to run ParDoTests from CLI

2020-02-07 Thread Reuven Lax
FYI, this is documented here https://cwiki.apache.org/confluence/display/BEAM/Contribution+Testing+Guide#ContributionTestingGuide-HowtorunJavaNeedsRunnertests On Fri, Feb 7, 2020 at 6:29 AM Ismaël Mejía wrote: > Use > > ./gradlew :runners:direct-java:needsRunner --tests

Re: [PROPOSAL] Beam Schema Options

2020-02-07 Thread Kenneth Knowles
It is a good point that it applies to configuring sources and sinks mostly, or external data more generally. What I worry about is that metadata channels like this tend to take over everything, and do it worse than a more structured approach would. As an exaggerated example which is not actually that

Re: [PROPOSAL] Beam Schema Options

2020-02-07 Thread Reuven Lax
I disagree - I've had several cases where user options on fields are very useful internally. A common rationale is to preserve fidelity. For instance, reading protos, projecting out a few fields, writing protos back out. You want to be able to nicely map protos to Beam schemas, but also preserve

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-07 Thread Jan Lukavský
I closely reviewed the runners and it seems to me that: - all batch runners that would fail to support the annotation will already fail (Spark Structured Streaming, Apex) due to missing support for state or timers - streaming runners must explicitly enable this, _as long as they use

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-07 Thread Robert Bradshaw
There are two separable concerns here. (1) The @RequiresTimeSortedInput feature itself. This is a subtle feature needed for certain pipelines, and if anything Jan has gone the extra mile discussing, documenting, and designing this and trying to reach consensus. I feel like there has been a

Re: [DISCUSS] Autoformat python code with Black

2020-02-07 Thread Udi Meiri
Chad: yes. I also noticed that it's not running on the Jenkins lint precommit job. On Fri, Feb 7, 2020 at 12:59 PM David Yan wrote: > Thank you Robert. > > https://github.com/google/yapf/issues/530 has been open for 2 years, but > we will use `yapf: disable` and `yapf: enable` as a workaround

Re: [DISCUSS] Autoformat python code with Black

2020-02-07 Thread David Yan
Thank you Robert. https://github.com/google/yapf/issues/530 has been open for 2 years, but we will use `yapf: disable` and `yapf: enable` as a workaround for now. David On Fri, Feb 7, 2020 at 12:29 PM Robert Bradshaw wrote: > Yeah, that's a lot worse. This looks like >
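For readers unfamiliar with the workaround: yapf leaves untouched any lines between a `# yapf: disable` comment and a matching `# yapf: enable` comment, so a hand-formatted method chain can be fenced off from the formatter. A minimal sketch; the string-method chain below is an assumed stand-in for the Beam method chains discussed in the thread.

```python
# yapf leaves everything between these two directive comments as written,
# so the one-call-per-line layout of the chain below is preserved.
# yapf: disable
words = (
    "  Apache Beam Dev List  "
    .strip()
    .lower()
    .split()
)
# yapf: enable

print(words)  # prints ['apache', 'beam', 'dev', 'list']
```

The directives are ordinary comments, so they have no effect on how the code runs; they only change what the formatter touches.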

Re: Compile error on Java 11 when running :examples:java:test

2020-02-07 Thread Kenneth Knowles
The expected class file version 53 is for Java 9, I believe. So is the right javac being invoked? I hit some issues like this on mac a while back, unrelated to Java 11. Suspected something wonky in Mac's Java setup not working well with the Gradle wrapper. Never resolved them actually. Have been
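Class-file major versions map to Java releases by a fixed offset from Java 5 onward (49 is Java 5, 52 is Java 8, 53 is Java 9, 55 is Java 11), which is why version 53 points at Java 9. Below is a minimal sketch for checking which release produced a given `.class` file, assuming you have its raw bytes. The helper names are hypothetical.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE


def class_file_java_release(header: bytes) -> int:
    """Return the Java release that produced a class file.

    A class file starts with u4 magic, u2 minor_version, u2 major_version,
    all big-endian. From Java 5 on, release = major_version - 44.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class file header")
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a class file (bad magic number)")
    if major < 49:
        raise ValueError(f"pre-Java 5 class file (major version {major})")
    return major - 44


def read_release(path: str) -> int:
    # Only the first 8 bytes of the file are needed.
    with open(path, "rb") as f:
        return class_file_java_release(f.read(8))


if __name__ == "__main__":
    # Synthetic header: magic, minor version 0, major version 53.
    fake = struct.pack(">IHH", CLASS_MAGIC, 0, 53)
    print(class_file_java_release(fake))  # prints 9
```

The JDK's own `javap -verbose` also prints the `major version` of a class, which is a quick way to confirm what the build actually emitted.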

Re: [DISCUSS] Autoformat python code with Black

2020-02-07 Thread Chad Dombrova
I have a PR I'm working on to allow users to easily set up yapf to run on pre-commit. Is that something that interests people? -chad On Fri, Feb 7, 2020 at 12:29 PM Robert Bradshaw wrote: > Yeah, that's a lot worse. This looks like > https://github.com/google/yapf/issues/530 . In the

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-07 Thread Kenneth Knowles
TL;DR I am not suggesting that you must implement this for any runner. I'm afraid I do have to propose this change be rolled back before release 2.21.0 unless we fix this. I think the fix is easily achieved. Clarifications inline. On Fri, Feb 7, 2020 at 11:20 AM Jan Lukavský wrote: > Hi Kenn,

Re: [DISCUSS] Autoformat python code with Black

2020-02-07 Thread Robert Bradshaw
Yeah, that's a lot worse. This looks like https://github.com/google/yapf/issues/530 . In the meantime, https://pypi.org/project/yapf/#potentially-frequently-asked-questions On Fri, Feb 7, 2020 at 12:17 PM David Yan wrote: > > Hi, I just tried out the yapf formatter and I noticed that sometimes

Re: [DISCUSS] Autoformat python code with Black

2020-02-07 Thread David Yan
Hi, I just tried out the yapf formatter and I noticed that sometimes it's making the original code a lot less readable. In the below example, - is the original, + is after running the yapf formatter. Looks like the problem is with the method chaining pattern. How feasible is it to have yapf

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-07 Thread Jan Lukavský
And as a quick summary, a pipeline with @RequiresTimeSortedInput will: a) work well on streaming pipelines run on direct java and non-portable Flink, and will fail on every other streaming runner b) work well on batch non-portable Flink, legacy Spark and batch Dataflow c) from what I can tell,

Re: Compile error on Java 11 when running :examples:java:test

2020-02-07 Thread Jean-Baptiste Onofré
Hi, No, JDK 11 is not yet fully supported. I've started to work on it but it's not yet ready. Regards JB On Fri, Feb 7, 2020 at 20:20, David Cavazos wrote: Hi Beamers, I'm trying to run the tests for the Java examples using Java 11 and there is a compilation error due to an incompatible version. I'm

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-07 Thread Jan Lukavský
Hi Kenn, I think that this approach is not very maintainable and doesn't scale. Main reasons: a) modifying core by definition has some impact on runners, so modifying core would imply the necessity to modify all runners b) having to implement a core feature for all existing runners will make

Re: Unable to run ParDoTests from CLI

2020-02-07 Thread Rehman Murad Ali
Thanks, Ismaël. On Fri, Feb 7, 2020 at 7:29 PM Ismaël Mejía wrote: > Use > > ./gradlew :runners:direct-java:needsRunner --tests "*ParDoTest\$TimerTests" > > For ValidatesRunner for example: > ./gradlew

Re: [BEAM-8550] @RequiresTimeSortedInput ready for merge to master

2020-02-07 Thread Kenneth Knowles
I see. It is good to see that the pipeline will at least fail. However, the expected approach here is that the pipeline is rejected prior to execution. That is a primary reason for our annotation-driven API style; it allows much better "static" analysis by a runner, so we don't have to wait and fail
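The principle here (reject an unsupported pipeline in the runner's entry point before any work starts, rather than failing mid-execution) can be sketched as a validation pass over the whole pipeline graph. This is a hypothetical Python model, not Beam's API: `Transform`, `requires_time_sorted_input`, and `SimpleRunner` are invented names standing in for a DoFn carrying `@RequiresTimeSortedInput` and a runner's `run()` method.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transform:
    # Hypothetical pipeline step; the flag models a DoFn that declares a
    # requirement such as @RequiresTimeSortedInput.
    name: str
    requires_time_sorted_input: bool = False


@dataclass
class SimpleRunner:
    supports_time_sorted_input: bool
    executed: List[str] = field(default_factory=list)

    def validate(self, transforms: List[Transform]) -> None:
        # Static check over the declared requirements: nothing has run yet.
        unsupported = [t.name for t in transforms
                       if t.requires_time_sorted_input
                       and not self.supports_time_sorted_input]
        if unsupported:
            raise NotImplementedError(
                "runner does not support time-sorted input for: "
                + ", ".join(unsupported))

    def run(self, transforms: List[Transform]) -> None:
        self.validate(transforms)  # reject before execution starts
        for t in transforms:
            self.executed.append(t.name)


pipeline = [Transform("Read"), Transform("SortedStateful", True)]
runner = SimpleRunner(supports_time_sorted_input=False)
try:
    runner.run(pipeline)
except NotImplementedError as e:
    print(e)
print(runner.executed)  # prints [] because validation failed first
```

Because the check reads declared requirements instead of observing behavior at runtime, the failure surfaces at submission time, before any step has done work.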

Re: Jenkins outage

2020-02-07 Thread Yifan Zou
Cleaned up beam7; it should be okay now. On Fri, Feb 7, 2020 at 9:42 AM Yifan Zou wrote: > I'll look into the 'no space' issue. > > On Fri, Feb 7, 2020 at 7:14 AM Ismaël Mejía wrote: > >> mmm apache-beam-jenkins-14 also has issues: >> >> *16:08:47* ERROR: Error cloning remote repo

Re: Jenkins outage

2020-02-07 Thread Yifan Zou
I'll look into the 'no space' issue. On Fri, Feb 7, 2020 at 7:14 AM Ismaël Mejía wrote: > mmm apache-beam-jenkins-14 also has issues: > > *16:08:47* ERROR: Error cloning remote repo 'origin'*16:08:47* > hudson.plugins.git.GitException: Could not init >

Re: Executing the runner validation tests for the Twister2 runner

2020-02-07 Thread Pulasthi Supun Wickramasinghe
Hi Kenn, Thanks for the information. I will add the information accordingly and update the community. Best Regards, Pulasthi On Wed, Jan 29, 2020 at 8:28 AM Kenneth Knowles wrote: > In my opinion it is fine to add the documentation after the runner is > added. I do think we should have input from

Re: Tests not triggering

2020-02-07 Thread Andrew Pilloud
I saw similar things yesterday. I reran the stuck/missing tests about an hour ago with 'retest this please' and they worked. Andrew On Fri, Feb 7, 2020 at 9:25 AM Reuven Lax wrote: > Is Jenkins wedged again? I have PRs where the tests have been > pending for over 10 hours. > > Reuven

Tests not triggering

2020-02-07 Thread Reuven Lax
Is Jenkins wedged again? I have PRs where the tests have been pending for over 10 hours. Reuven

Re: Time precision in Python

2020-02-07 Thread Robert Bradshaw
I meant issues with windows firing before their time (i.e. before the watermark passes the end of the window). On Thu, Feb 6, 2020 at 8:42 PM Kenneth Knowles wrote: > > What is an out of order window? > > On Thu, Feb 6, 2020 at 3:09 PM Sam Rohde wrote: >> >> Gotcha, I was just surprised by the
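The condition Robert refers to can be stated precisely. A window covers the half-open interval [start, end), so its last timestamp is end minus one time unit, and an on-time firing is only correct once the watermark is strictly past that timestamp. Below is a minimal sketch in integer microseconds (the Python SDK's timestamp resolution). The function names are hypothetical, and this is not the SDK's trigger implementation.

```python
def window_max_timestamp_us(window_end_us: int) -> int:
    # Windows are half-open [start, end): the last timestamp that belongs
    # to the window is one unit (here one microsecond) before its end.
    return window_end_us - 1


def watermark_passed_window(watermark_us: int, window_end_us: int) -> bool:
    # An on-time firing is only correct once the watermark is strictly
    # past the window's last timestamp; firing earlier is "before its time".
    return watermark_us > window_max_timestamp_us(window_end_us)


end = 60_000_000  # window [0s, 60s) expressed in microseconds
print(watermark_passed_window(59_999_999, end))  # prints False
print(watermark_passed_window(60_000_000, end))  # prints True
```

Keeping both values as integers in the same unit avoids rounding that could move the comparison by one tick in either direction.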

Re: [PROPOSAL] Beam Schema Options

2020-02-07 Thread Brian Hulette
Messed up my own short-link. It's https://s.apache.org/xlang-table-provider On Fri, Feb 7, 2020 at 8:54 AM Brian Hulette wrote: > I'm not sure this belongs directly on schemas. I've had trouble > reconciling that opinion, since the idea does seem very useful, and in fact > I'm interested in

Re: [PROPOSAL] Beam Schema Options

2020-02-07 Thread Brian Hulette
I'm not sure this belongs directly on schemas. I've had trouble reconciling that opinion, since the idea does seem very useful, and in fact I'm interested in using it myself. I think I've figured out my concern - what I really want is options for a (maybe portable) Table. As I indicated in a

big data blog

2020-02-07 Thread Etienne Chauchot
Hi all, FYI, I just started a blog around big data technologies and for now it is focused on Beam. https://echauchot.blogspot.com/ Feel free to comment, suggest or anything. Etienne

Re: Jenkins jobs not running for my PR 10438

2020-02-07 Thread Ismaël Mejía
Done. On Fri, Feb 7, 2020 at 4:05 PM Tomo Suzuki wrote: > Hi Beam committers, > > I'd appreciate it if you could run precommit checks for > https://github.com/apache/beam/pull/10769 > with the following 6 commands: > > Run Java PostCommit > Run Java HadoopFormatIO Performance Test > Run BigQueryIO

Re: Jenkins outage

2020-02-07 Thread Ismaël Mejía
mmm apache-beam-jenkins-14 also has issues: *16:08:47* ERROR: Error cloning remote repo 'origin'*16:08:47* hudson.plugins.git.GitException: Could not init /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_VR_Spark_PR/src

Re: Jenkins jobs not running for my PR 10438

2020-02-07 Thread Tomo Suzuki
Hi Beam committers, I'd appreciate it if you could run precommit checks for https://github.com/apache/beam/pull/10769 with the following 6 commands: Run Java PostCommit Run Java HadoopFormatIO Performance Test Run BigQueryIO Streaming Performance Test Java Run Dataflow ValidatesRunner Run Spark

Re: Unable to run ParDoTests from CLI

2020-02-07 Thread Ismaël Mejía
Use ./gradlew :runners:direct-java:needsRunner --tests "*ParDoTest\$TimerTests" For ValidatesRunner, for example: ./gradlew :runners:direct-java:validatesRunner --tests "*ParDoTest\$TimerFamily*" Credit to Brian, who helped me because I was struggling with the same issue last week. On Fri, Feb
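Why the `\$` escape matters: Java compiles the nested class `TimerTests` to the binary name `ParDoTest$TimerTests`, and inside double quotes a shell would otherwise expand `$TimerTests` as a variable (usually unset, so empty). The sketch below approximates Gradle's `--tests` wildcard matching with Python's `fnmatch`. This is an analogy, not Gradle's actual matcher, and the fully qualified class name is an assumption for illustration.

```python
from fnmatch import fnmatchcase

# Assumed fully qualified binary name of the nested test class.
nested_class = "org.apache.beam.sdk.transforms.ParDoTest$TimerTests"

# What Gradle receives when the shell argument is "*ParDoTest\$TimerTests".
escaped = "*ParDoTest$TimerTests"
# What it would receive if the shell expanded an unset $TimerTests to "".
unescaped = "*ParDoTest"

print(fnmatchcase(nested_class, escaped))    # prints True
print(fnmatchcase(nested_class, unescaped))  # prints False
```

In `fnmatch`, `$` is an ordinary character and only `*`, `?` and `[...]` are wildcards, which is why the literal `$` survives in the escaped pattern.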

Unable to run ParDoTests from CLI

2020-02-07 Thread Rehman Murad Ali
Hello Community, I have been trying to run test cases from the CLI. ParDoTest.java has some inner classes with test functions (for example TimerTest). This is the command I have used to run the test: ./gradlew runners:direct-java:needsRunnerTests --tests

Re: Jenkins outage

2020-02-07 Thread Ismaël Mejía
I am getting "java.lang.IllegalStateException: java.io.IOException: No space left on device" on apache-beam-jenkins-7. Can somebody please free up some space? Thanks. https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark_PR/26/ On Fri, Feb 7, 2020 at 2:19 PM Ismaël Mejía wrote: > Thanks

Re: Jenkins outage

2020-02-07 Thread Ismaël Mejía
Thanks for taking care of this issue with INFRA Michał. Everything back to normal! On Fri, Feb 7, 2020 at 11:34 AM Michał Walenia wrote: > Everything looks fine now, the jobs are triggering correctly again > > On Fri, Feb 7, 2020 at 10:06 AM Michał Walenia > wrote: > >> Hi there, >> it seems

SplittableDoFn with Flink fails at checkpointing larger files (200MB)

2020-02-07 Thread marek-simunek
Hi, I am using FileIO to continuously watch a folder for new files to process. The problem is that when Flink starts reading a 200MB file (around 3M elements) and also starts checkpointing, the checkpoint never finishes until the WHOLE file is processed. Minimal example:

Re: Jenkins outage

2020-02-07 Thread Michał Walenia
Everything looks fine now, the jobs are triggering correctly again On Fri, Feb 7, 2020 at 10:06 AM Michał Walenia wrote: > Hi there, > it seems that our Jenkins is experiencing some issues and the jobs are > getting stuck in the queue despite the executors being idle. > Here's the JIRA issue

Jenkins outage

2020-02-07 Thread Michał Walenia
Hi there, it seems that our Jenkins is experiencing some issues and the jobs are getting stuck in the queue despite the executors being idle. Here's the JIRA issue for this: https://issues.apache.org/jira/browse/INFRA-19830 Let's hope it will be resolved soon. -- Michał Walenia Polidea