Re: [DISCUSS] Next steps for update of Avro dependency in Beam

2022-05-13 Thread Etienne Chauchot
Hi, Thanks Alexey for bringing this topic up. I'd be in favor of 3. Best Etienne On 12/05/2022 at 23:21, Brian Hulette wrote: Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 (still have an issue with CVEs)" Do we actually need to keep avro in core for any

Remove support for Elasticsearch 5 and 6 ?

2022-03-25 Thread Etienne Chauchot
Hi all, Elastic no longer supports Elasticsearch 5 and 6 [1]. We are in the middle of removing long overdue Elasticsearch 2 support, but maybe it is time to remove support for ES5 and ES6 from Beam as well. WDYT ? [1] https://endoflife.date/elasticsearch Best Etienne

Re: [ANNOUNCE] New committer: Moritz Mack

2022-03-11 Thread Etienne Chauchot
Congrats Moritz ! Well deserved ! Etienne On 10/03/2022 at 19:44, Sachin Agarwal wrote: Congratulations Moritz! On Thu, Mar 10, 2022 at 10:44 AM Alexey Romanenko wrote: Hi everyone, Please join me and the rest of the Beam PMC in welcoming a new committer: Moritz Mack

[ANNOUNCE] New committer: Evan Galpin

2022-03-10 Thread Etienne Chauchot
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Evan Galpin Since joining the Beam community Evan has made many contributions to IOs, mainly Elasticsearch, but also to SDK transforms. He also gave support on the ML and tested releases. Considering these

Re: Intro

2021-10-22 Thread Etienne Chauchot
Welcome onboard Moritz ! Best Etienne On 22/10/2021 15:52, Moritz Mack wrote: Hi all, I’m very much looking forward to start contributing to Beam and just want to briefly introduce myself. My name is Moritz (mosche) and I’m working together with Alexey and Etienne. Having worked mostly

Spark Structured Streaming runner migrated to Spark 3

2021-08-05 Thread Etienne Chauchot
Hi all, Just to let you know that Spark Structured Streaming runner was migrated to Spark 3. Enjoy ! Etienne

Re: Spark Structured Streaming Runner Roadmap

2021-08-03 Thread Etienne Chauchot
Hi, Sorry for the late answer: streaming mode in the Spark Structured Streaming runner is blocked by the way the Spark Structured Streaming framework implements watermarks on the Apache Spark project side. See https://echauchot.blogspot.com/2020/11/watermark-architecture-proposal-for.html

Re: [VOTE] Vendored Dependencies Release Byte Buddy 1.11.0

2021-05-20 Thread Etienne Chauchot
+1 (binding) on releasing vendored bytebuddy for testing in https://github.com/apache/beam/pull/14824 Etienne On 19/05/2021 23:43, Kai Jiang wrote: +1 (non-binding) On Wed, May 19, 2021 at 12:23 PM Jan Lukavský wrote: +1 (non-binding) verified correct

Re: [DISCUSS] Drop support for Flink 1.8 and 1.9

2021-03-15 Thread Etienne Chauchot
Hi, +1 on drop Etienne On 12/03/2021 20:39, Ismaël Mejía wrote: Do we now support 1.8 through 1.12? Yes and that's clearly too much given that the Flink community only supports the two latest releases. It also hits us because we run tests for all those versions on precommit. On Fri, Mar 12,

Re: Unit tests vs. Integration Tests

2021-01-15 Thread Etienne Chauchot
Big +1 on using testcontainers rather than embedded real backends. That is what we plan to use for ES refactoring. I'm a strong believer that mocks are useless to replace complex backends. Testing things like IOs against mocks is an almost certain failure because mocks cannot be

Re: Combine with multiple outputs case Sample and the rest

2021-01-15 Thread Etienne Chauchot
Hi all, Regarding leveraging the Pardo part of Combine (Combine <=> GBK + Pardo) to have multiple outputs, please note that most of the time Combine is translated by the runners with a native (destination-tech) Combine and not a GBK + Pardo. Regarding using the Stateful DoFn I agree with
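The equivalence mentioned above (Combine <=> GBK + ParDo) can be sketched in plain Python. This is only an illustrative model, not Beam code: both function names are hypothetical, and the "native" variant merely stands in for whatever combine primitive a given runner uses.

```python
from collections import defaultdict


def combine_as_gbk_then_pardo(pairs, combine_fn):
    """Model of the logical expansion: group values by key, then apply a function per key."""
    grouped = defaultdict(list)
    for key, value in pairs:          # the "GBK" step
        grouped[key].append(value)
    # the "ParDo" step: one call per key over the grouped values
    return {key: combine_fn(values) for key, values in grouped.items()}


def combine_natively(pairs, add_input, zero):
    """Model of a runner-native combine: fold each value into a per-key accumulator."""
    accumulators = {}
    for key, value in pairs:
        accumulators[key] = add_input(accumulators.get(key, zero), value)
    return accumulators


pairs = [("a", 1), ("b", 5), ("a", 2)]
via_gbk = combine_as_gbk_then_pardo(pairs, sum)
native = combine_natively(pairs, lambda acc, v: acc + v, 0)
```

Both paths give the same per-key result, but in the native path there is no separate per-key function step on which extra outputs could be hung, which is the concern raised in the message.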

Re: ElasticsearchIO.Write() dynamic ES indices

2021-01-15 Thread Etienne Chauchot
+1 What is not supported yet is wildcard indexes (index* pattern). They will be supported once the refactoring (use high level ES objects rather than low level REST String objects) is done. Best Etienne On 13/01/2021 16:45, Brian Hulette wrote: It looks like you should be able to accomplish this

Re: [blog about Beam]

2020-11-12 Thread Etienne Chauchot
Hi Pablo, Thanks for reading these. Etienne On 10/11/2020 20:09, Pablo Estrada wrote: Thanks Etienne! I read your post on why we can't have multiple aggregations in Spark Streaming. It was informative. Thanks for writing these! Best -P. On Tue, Nov 10, 2020 at 3:39 AM Etienne Chauchot

[blog about Beam]

2020-11-10 Thread Etienne Chauchot
Hi all, In case anyone is interested, I started a blog [1] this year about big data technologies. There are 8 articles so far and they are mainly related to Beam even if some are related to Spark (but with the knowledge acquired while working on the Beam Spark runner). I just published the

[Beam Spark Structured Streaming runner]

2020-11-05 Thread Etienne Chauchot
Hi all, In case anyone wanted some details about the new Beam runner based on Spark Structured Streaming framework, here are 2 talks I gave at the ApacheCon this year and last year about this subject. https://www.youtube.com/watch?v=oEehQwOEFvg https://www.youtube.com/watch?v=_dCmV1ZW3M4

Re: Shutting down Perfkit Explorer

2020-09-25 Thread Etienne Chauchot
Thanks Kamil ! Etienne On 25/09/2020 16:10, Kamil Wasilewski wrote: They have been migrated: http://metrics.beam.apache.org The website doesn't support HTTPS, so if you can't access it, you may need to add an exception to your browser extension. On Fri, Sep 25, 2020 at 3:49 PM Etienne

Re: Shutting down Perfkit Explorer

2020-09-25 Thread Etienne Chauchot
Hi all, I'm a bit late to this, but how should we access the dashboards now (load tests, nexmark etc...)? Are the dashboards lost or have they migrated to another environment ? Thanks Etienne On 24/09/2020 13:51, Robert Burke wrote: LGTM Good clear message. On Thu, Sep 24,

Re: Chronically flaky tests

2020-08-04 Thread Etienne Chauchot
Hi all, +1 on pinging the assigned person. For the flakes I know of (ESIO and CassandraIO), they are due to the load of the CI server. These IOs are tested using real embedded backends because those backends are complex and we need relevant tests. Counter measures have been taken (retrial

Re: [ANNOUNCE] New PMC Member: Alexey Romanenko

2020-06-19 Thread Etienne Chauchot
Congrats Alexey ! Well deserved ! Etienne On 17/06/2020 16:30, Gleb Kanterov wrote: Congratulations! Thanks for your hard work On Wed, Jun 17, 2020 at 1:11 PM Alexey Romanenko wrote: Thank you Ismaël and everybody! Happy to be a part of Beam

Re: Is org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders supposed to be supported ?

2020-06-17 Thread Etienne Chauchot
ormance and debuggability. On Wed, Dec 11, 2019 at 3:47 AM Etienne Chauchot wrote: Ok, Thanks Kenn. The Flatten javadoc says that by default the coder of the output should be the coder of the first input. But in the test, it sets the output coder to something different. Waiting for a cons

Re: Add options to CassandraIO

2020-05-14 Thread Etienne Chauchot
Hi Nathan, Thanks for raising this, and thanks for the PR proposal. I would recommend (as it was done in other IOs such as ElasticsearchIO) the third solution: you could add a method called withConnectTimeout(Integer) to both the Read and Write builders of the IO (there is no common conf

Re: A new reworked Elasticsearch 7+ IO module

2020-04-09 Thread Etienne Chauchot
ention and indicates urgency)  - reject completely Kenn On Fri, Mar 6, 2020 at 2:26 AM Etienne Chauchot wrote: Hi all, it's been 3 weeks since the survey on ES versions the users use. The survey received very few responses: only 9 responses for no

Re: A new reworked Elasticsearch 7+ IO module

2020-03-31 Thread Etienne Chauchot
Etienne Chauchot. On 06/03/2020 11:26, Etienne Chauchot wrote: Hi all, it's been 3 weeks since the survey on ES versions the users use. The survey received very few responses: only 9 responses for now (multiple versions possible of course). The responses are the following: ES2: 0 clients

Re: A new reworked Elasticsearch 7+ IO module

2020-03-06 Thread Etienne Chauchot
support but for now it is still not very representative. I'm cross-posting to @users to let you know that I'm closing the survey within 1 or 2 weeks. So please respond if you're using ESIO. Best Etienne On 13/02/2020 12:37, Etienne Chauchot wrote: Hi Cham, thanks for your comments ! I just

Re: Beam Emitted Metrics Reference

2020-03-02 Thread Etienne Chauchot
Hi, There is a doc about metrics here: https://beam.apache.org/documentation/programming-guide/#metrics You can also export the metrics to sinks (REST http endpoint and Graphite), see MetricsOptions class for configuration. Still, there is no doc for export on website, I'll add some Best

Re: GroupIntoBatches not Working properly for Direct Runner Java

2020-03-02 Thread Etienne Chauchot
Hi, +1 to what Kenn asked: your pipeline is in streaming mode and GIB preserves windowing, so the elements are buffered until one of these conditions is true: batch size reached or end of window. In your case I think it is the second one. Best Etienne On 28/02/2020 19:15, Kenneth Knowles
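The two flush conditions described above can be modelled in a few lines of plain Python. This is a hedged sketch, not the Beam implementation: the function name, the (timestamp, key, value) element shape and the fixed-window arithmetic are all assumptions made for illustration.

```python
from collections import defaultdict


def group_into_batches(elements, batch_size, window_size):
    """Simulate batching of (timestamp, key, value) elements in fixed windows.

    A batch is emitted when it reaches batch_size, or when its window ends,
    whichever happens first.
    """
    buffers = defaultdict(list)  # (window_start, key) -> buffered values
    batches = []
    for timestamp, key, value in sorted(elements):
        window_start = (timestamp // window_size) * window_size
        buffer = buffers[(window_start, key)]
        buffer.append(value)
        if len(buffer) == batch_size:  # condition 1: batch size reached
            batches.append((window_start, key, list(buffer)))
            buffer.clear()
    # condition 2: end of window, flush whatever is still buffered
    for (window_start, key), buffer in sorted(buffers.items()):
        if buffer:
            batches.append((window_start, key, list(buffer)))
    return batches


events = [(0, "a", 1), (1, "a", 2), (2, "a", 3), (11, "a", 4)]
result = group_into_batches(events, batch_size=2, window_size=10)
```

In this model a key that never accumulates batch_size elements inside one window still produces output, but only once that window closes, which matches the "second condition" case in the reply.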

Re: [ANNOUNCE] New committer: Alex Van Boxel

2020-02-20 Thread Etienne Chauchot
Congrats Alex ! Well deserved ! Etienne On 20/02/2020 12:23, Michał Walenia wrote: Congratulations! On Thu, Feb 20, 2020 at 2:31 AM Chamikara Jayalath wrote: Congrats Alex! On Wed, Feb 19, 2020 at 7:21 AM Ryan Skraba wrote:

Re: big data blog

2020-02-13 Thread Etienne Chauchot
Hi all, I just sent the link to the blog articles on @ApacheBeam twitter as Kenn suggested. Etienne On 10/02/2020 10:01, Etienne Chauchot wrote: Yes sure, Here is the link to the spreadsheet for review of the tweet: https://docs.google.com/spreadsheets/d

Re: A new reworked Elasticsearch 7+ IO module

2020-02-13 Thread Etienne Chauchot
Jayalath wrote: On Thu, Feb 6, 2020 at 8:13 AM Etienne Chauchot wrote: Hi, please see my comments inline On 06/02/2020 16:24, Alexey Romanenko wrote: Please, see my comments inline. On 6 Feb 2020, at 10:50, Etienne Chauchot mail

Re: big data blog

2020-02-10 Thread Etienne Chauchot
gle.com>> wrote: Cool! On Fri, Feb 7, 2020 at 7:24 AM Etienne Chauchot wrote: Hi all, FYI, I just started a blog around big data technologies and for now it

big data blog

2020-02-07 Thread Etienne Chauchot
Hi all, FYI, I just started a blog around big data technologies and for now it is focused on Beam. https://echauchot.blogspot.com/ Feel free to comment, suggest or anything. Etienne

Re: A new reworked Elasticsearch 7+ IO module

2020-02-06 Thread Etienne Chauchot
Hi, please see my comments inline On 06/02/2020 16:24, Alexey Romanenko wrote: Please, see my comments inline. On 6 Feb 2020, at 10:50, Etienne Chauchot wrote: 1. regarding version support: ES v2 is no longer maintained by Elastic sinc

Re: A new reworked Elasticsearch 7+ IO module

2020-02-06 Thread Etienne Chauchot
Jayalath, my answers are inline. Have a good day ! Ludovic On Wed, 5 Feb 2020 at 20:15, Chamikara Jayalath wrote: On Wed, Feb 5, 2020 at 6:35 AM Etienne Chauchot wrote: Still there is somethin

Re: A new reworked Elasticsearch 7+ IO module

2020-02-05 Thread Etienne Chauchot
reat job! As I recall, we already have some support of Elasticsearch7 in current ElasticsearchIO (afaik, at least they are compatible), thanks to Zhong Chen and Etienne Chauchot, who were working on adding this [1][2] and it should be released in Beam 2.19

Re: A new reworked Elasticsearch 7+ IO module

2020-02-05 Thread Etienne Chauchot
ort of Elasticsearch7 in current ElasticsearchIO (afaik, at least they are compatible), thanks to Zhong Chen and Etienne Chauchot, who were working on adding this [1][2] and it should be released in Beam 2.19. Would you think you can leverage this i

Re: A new reworked Elasticsearch 7+ IO module

2020-01-30 Thread Etienne Chauchot
that can be included in ASF projects. Best, Etienne On 25/01/2020 14:23, Ludovic Boutros wrote: Hi all, First, thank you for your great answers. I thank Zhong Chen and Etienne Chauchot for their great job on this too ! Alexey and Chamikara, I understand your point of view. Actually, I have

[Spark Structured Streaming runner] perfs and encoders

2019-12-23 Thread Etienne Chauchot
Hi all, good news ! I did some refactoring of the encoders to improve maintainability and to replace string-generated code with compiled code wherever possible, and the perf results are awesome ! Best Etienne

Re: Is org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders supposed to be supported ?

2019-12-11 Thread Etienne Chauchot
lem. Flatten just became quite an interesting transform, for me :-) Kenn On Tue, Dec 10, 2019 at 12:37 AM Etienne Chauchot wrote: Hi all, I have a question about the testFlattenMultipleCoders test: This test uses 2 collections 1. long and n

Is org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders supposed to be supported ?

2019-12-10 Thread Etienne Chauchot
Hi all, I have a question about the testFlattenMultipleCoders test: This test uses 2 collections 1. long and null data encoded using NullableCoder(BigEndianLongCoder) 2. long data encoded using VarLongCoder It then flattens the 2 collections and sets the coder of the resulting collection
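To see why the coder of the flattened collection matters, here is a small plain-Python model of the two encodings named above. The byte layouts are assumptions made for illustration (a one-byte null/present marker followed by an 8-byte big-endian long, and a base-128 variable-length integer); both function names are hypothetical and this is not the Beam coder code.

```python
import struct


def encode_nullable_big_endian_long(value):
    """Model of a nullable wrapper around a fixed-width big-endian long."""
    if value is None:
        return b"\x00"                          # assumed marker for null
    return b"\x01" + struct.pack(">q", value)   # assumed marker for present + 8 bytes


def encode_var_long(value):
    """Model of a variable-length long: 7 payload bits per byte, low bits first."""
    bits = value & 0xFFFFFFFFFFFFFFFF           # treat as unsigned 64-bit
    out = bytearray()
    while True:
        byte = bits & 0x7F
        bits >>= 7
        if bits:
            out.append(byte | 0x80)             # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


nullable_bytes = encode_nullable_big_endian_long(300)
varlong_bytes = encode_var_long(300)
```

Under these assumptions the same value 300 takes 9 bytes in one format and 2 in the other, and the variable-length format has no way to represent null, so bytes written for one input cannot simply be read back with the other input's coder.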

Re: consurrent PRs

2019-11-28 Thread Etienne Chauchot
Hi all, FYI, I closed the most recent one (with explanation and a sorry message): https://github.com/apache/beam/pull/10025 Etienne On 26/11/2019 17:06, Robert Bradshaw wrote: On Tue, Nov 26, 2019 at 6:15 AM Etienne Chauchot wrote: Hi guys, I wanted your opinion about something: I have 2

consurrent PRs

2019-11-26 Thread Etienne Chauchot
Hi guys, I wanted your opinion about something: I have 2 concurrent PRs that do the same: https://github.com/apache/beam/pull/10010 https://github.com/apache/beam/pull/10025 The first one is a bit better because it addresses a deprecation that the other does not address. Except that they

Re: [spark structured streaming runner] available on master

2019-11-20 Thread Etienne Chauchot
Forgot to say thanks everyone for their contribution to this especially Alexey, Ryan and Ismael. Etienne On 20/11/2019 17:12, Etienne Chauchot wrote: Hi all, I'm glad to announce that the new Spark runner based on Spark structured streaming framework has been merged into master

[spark structured streaming runner] available on master

2019-11-20 Thread Etienne Chauchot
Hi all, I'm glad to announce that the new Spark runner based on Spark structured streaming framework has been merged into master ! It is not based on RDD/DStream API. See https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html It is still experimental, its coverage

Re: [spark structured streaming runner] merge to master?

2019-11-13 Thread Etienne Chauchot
Ok for 1 jar with the 2 runners then. I'll add the banner to the logs and the Experimental in the code and in the javadocs. Thanks for your opinions guys ! Etienne On 08/11/2019 18:50, Kenneth Knowles wrote: On Thu, Nov 7, 2019 at 5:32 PM Etienne Chauchot <mailto:echauc...@apache.

Re: [spark structured streaming runner] merge to master?

2019-11-07 Thread Etienne Chauchot
omparing to the whole size of shaded jar of users job. Even more, I think it will be quite confusing for users to choose which jar to use if we will have 3 different ones for similar purposes. Though, let’s see what others think. On 29 Oct 2019,

Re: [spark structured streaming runner] merge to master?

2019-10-11 Thread Etienne Chauchot
both RDD/Dstream and Structured Streaming. Etienne Kenn On Thu, Oct 10, 2019 at 11:50 AM Robert Bradshaw wrote: On Thu, Oct 10, 2019 at 12:39 AM Etienne Chauchot wrote: Hi guys,

Re: [spark structured streaming runner] merge to master?

2019-10-11 Thread Etienne Chauchot
Hi Robert comments inline: On 10/10/2019 20:49, Robert Bradshaw wrote: On Thu, Oct 10, 2019 at 12:39 AM Etienne Chauchot wrote: Hi guys, You probably know that work has been under way for several months on a new Spark runner based on Spark Structured Streaming framework. This work

Re: [spark structured streaming runner] merge to master?

2019-10-11 Thread Etienne Chauchot
Question is: do we keep the "old" spark runner for a while or not (or just keep on previous version/tag on git) ? Regards JB On 10/10/2019 09:39, Etienne Chauchot wrote: Hi guys, You probably know that work has been under way for several months on a new Spark runner based

[spark structured streaming runner] merge to master?

2019-10-10 Thread Etienne Chauchot
Hi guys, You probably know that work has been under way for several months on a new Spark runner based on Spark Structured Streaming framework. This work is located in a feature branch here: https://github.com/apache/beam/tree/spark-runner_structured-streaming To attract more

Re: Cassandra flaky on Jenkins?

2019-09-19 Thread Etienne Chauchot
Hi all, I just created a PR (1) that tries to fix the flakiness of CassandraIOTest (underlying ticket https://jira.apache.org/jira/browse/BEAM-8025 that was assigned to me). We will see with the test repetitions whether it is still flaky. JB, I don't know if my PR will also fix the ticket

Re: Pointers on Contributing to Structured Streaming Spark Runner

2019-09-18 Thread Etienne Chauchot
rting multiple-aggregations in its streaming mode but design is ongoing. Do you have a link or something else to their design discussion/doc? -Rui On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot wrote: Hi Rahu

Re: Pointers on Contributing to Structured Streaming Spark Runner

2019-09-18 Thread Etienne Chauchot
to their design discussion/doc? -Rui On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot wrote: Hi Rahul, Sure, and great ! Thanks for proposing ! If you want details, here is the presentation I did 30 mins ago at the apachecon. You will find the vide

[Off for 3 weeks]

2019-07-19 Thread Etienne Chauchot
Hi guys, Just to let you know, I'll be off for 3 weeks starting tonight. See you when I get back Etienne

Re: [Current spark runner] Combine globally translation is risky and not very performant

2019-07-01 Thread Etienne Chauchot
tential OOM will be just moved to some worker. Or would you see some other option? Jan [1] https://issues.apache.org/jira/browse/BEAM-7574 On 6/27/19 11:43 AM, Etienne Chauchot wrote: Hi guys, FYI, while I'm working on the combine translation for the new spark runner

[Current spark runner] Combine globally translation is risky and not very performant

2019-06-27 Thread Etienne Chauchot
Hi guys, FYI, while I'm working on the combine translation for the new spark runner poc, I saw something that does not seem right in the current runner: https://issues.apache.org/jira/browse/BEAM-7647 Best, Etienne

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-09 Thread Etienne Chauchot
Congrats ! Etienne On Monday 06 May 2019 at 22:28 -0700, Joana Filipa Bernardo Carrasqueira wrote: Thank you for your work in the community and Congratulations!! :) On Thu, May 2, 2019 at 9:44 PM Ankur Goenka wrote: Congratulations and thank you for making Beam awesome! From:

Re: [ANNOUNCE] New committer announcement: Boyuan Zhang

2019-05-09 Thread Etienne Chauchot
Congrats ! Etienne On Friday 12 April 2019 at 15:53 -0700, Thomas Weise wrote: Congrats! On Thu, Apr 11, 2019 at 6:03 PM Reuven Lax wrote: Congratulations Boyuan! On Thu, Apr 11, 2019 at 4:53 PM Ankur Goenka wrote: Congrats Boyuan! On Thu, Apr 11, 2019 at 4:52 PM

Re: [ANNOUNCE] New committer announcement: Mark Liu

2019-05-09 Thread Etienne Chauchot
Congrats ! On Monday 25 March 2019 at 10:55 -0700, Chamikara Jayalath wrote: Congrats Mark! On Mon, Mar 25, 2019 at 10:50 AM Alexey Romanenko wrote: Congratulations, Mark! On 25 Mar 2019, at 18:36, Mark Liu wrote: Thank you all! It's a great pleasure to work

Structured streaming based spark runner.

2019-04-30 Thread Etienne Chauchot
Hi guys, As part of the ongoing work on spark runner POC based on structured streaming framework, I sketched up a design doc (1) to share context and design principles. Feel free to comment. [1] https://s.apache.org/spark-structured-streaming-runner Etienne

Re: CVE audit gradle plugin

2019-04-26 Thread Etienne Chauchot
Wednesday 24 April 2019 at 15:56 +0200, Etienne Chauchot wrote: Hi all, FYI I just submitted a PR [1] to add the CVE audit plugin to the build as an optional task: gradlew audit --info. [1] https://github.com/apache/beam/pull/8388 Etienne On Tuesday 23 April 2019 at 17

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-26 Thread Etienne Chauchot
Hi, Thanks for all your work and patience Andrew ! PS: as a side note, there were 5 binding votes (I voted +1) Etienne On Thursday 25 April 2019 at 11:16 -0700, Andrew Pilloud wrote: I reran the Nexmark tests, each runner passed. I compared the numbers on the direct runner to the dashboard

Re: CVE audit gradle plugin

2019-04-24 Thread Etienne Chauchot
Hi all, FYI I just submitted a PR [1] to add the CVE audit plugin to the build as an optional task: gradlew audit --info. [1] https://github.com/apache/beam/pull/8388 Etienne On Tuesday 23 April 2019 at 17:25 +0200, Etienne Chauchot wrote: Hi, should I merge my branch https://gith

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-24 Thread Etienne Chauchot
degraded performance around 04/10 it is not part of the release we are voting, so please consider reverting your -1.

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-24 Thread Etienne Chauchot
However the issue you are reporting looks important, from a quick look I am guessing it could be related to BEAM-5775 that was merged on 12/04 however the performance regressions started happening

Re: CVE audit gradle plugin

2019-04-23 Thread Etienne Chauchot
running in or main repo maybe in a weekly basis like we do for the dependency repor

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-23 Thread Etienne Chauchot
Hi guys, I will vote -1 (binding) on this RC (although degradation is before RC4 cut date). I took a look at Nexmark graphs for the 3 major runners: - there seem to be functional regressions on Dataflow: https://apache-beam-testing.appspot.com/explore?dashboard=5647201107705856 . 13 queries

Re: [PROPOSAL] commit granularity in master

2019-04-04 Thread Etienne Chauchot
I would like to propose a small modification to the commit title style on that guide. We use two brackets to enclose the issue id, but that

Re: [PROPOSAL] commit granularity in master

2019-03-22 Thread Etienne Chauchot
. In my opinion, the most important rule is that every commit should be atomic in terms of added/fixed functionality and rolling it back should not break master branch. [1] https://beam.apache.org/contribute/committer-guide/#pull-request-review-objectives

[PROPOSAL] commit granularity in master

2019-03-22 Thread Etienne Chauchot
Hi all, It has already been discussed partially but I would like us to agree on the commit granularity that we want in our history. Some features were squashed to only one commit which seems a bit too coarse to me for a big feature. Conversely I see PRs with very small commits such as

[spark runner dataset POC] workCount works !

2019-03-21 Thread Etienne Chauchot
Hi guys, We are glad to announce that the spark runner POC that was re-written from scratch using the structured-streaming framework and the dataset API can now run WordCount ! It is still embryonic. For now it only runs in batch mode and there is no fancy stuff like state, timer, SDF,

Re: [PROPOSAL] Preparing for Beam 2.12.0 release

2019-03-18 Thread Etienne Chauchot
Sounds great, thanks for volunteering to do the release. Etienne On Wednesday 13 March 2019 at 12:08 -0700, Andrew Pilloud wrote: Hello Beam community! Beam 2.12 release branch cut date is March 27th according to the release calendar [1]. I would like to volunteer myself to do this

Re: JIRA hygiene

2019-03-18 Thread Etienne Chauchot
) time to first understand a problem and then the PR to realise nothing has to be done anymore. Or not knowing what's left out and for what reason. But of course, this is another issue which we definitely need to invest time into - kenn already asked for o

Re: JIRA hygiene

2019-03-12 Thread Etienne Chauchot
a commit on Feb 25, 2019 11:48 PM PST. And that is a commit on the release branch. After cutting the release branch, I only merged cherry picks from m

Re: [VOTE] Release 2.11.0, release candidate #2

2019-03-12 Thread Etienne Chauchot
e PRs. Does this answer your question? Ahmet [1] https://github.com/apache/beam/commit/a103edafba569b2fd185b79adffd91aaacb790f0 On Mon, Mar 11, 2019 at 1:50 AM Etienne Chauchot wrote: @Ahmet sorry I did not have time to check 2.11 release but a f

Re: [ANNOUNCE] New committer announcement: Raghu Angadi

2019-03-11 Thread Etienne Chauchot
Congrats ! Well deserved Etienne On Monday 11 March 2019 at 13:22 +0100, Alexey Romanenko wrote: My congratulations, Raghu! On 8 Mar 2019, at 10:39, Łukasz Gajowy wrote: Congratulations! :) On Fri, 8 Mar 2019 at 10:16, Gleb Kanterov wrote: Congratulations! On

Re: [BEAM-6759] CassandraIOTest failing in presubmit in multiple PRs

2019-03-06 Thread Etienne Chauchot
Hi guys, As I introduced embedded Cassandra backend in the IO tests, I'll fix this issue. It is very common (as discussed) that embedded backends cause flakiness. But it is the price to pay for more relevant tests :) Etienne On Monday 04 March 2019 at 11:28 +0100, Maximilian Michels wrote: Hey

Re: Apache Beam Newsletter - February/March 2019

2019-03-06 Thread Etienne Chauchot
Hi, I would add in what's been done: Work on CassandraIO (Etienne Chauchot, Mathieu Blanchard, Frank Shahar): refactorings, bugfixes, new where clause, security fix Etienne On Monday 04 March 2019 at 18:36 +0100, Suneel Marthi wrote: Is this the final draft? - we had 2 beam talks at Big D

CVE audit gradle plugin

2019-02-28 Thread Etienne Chauchot
Hi guys, I came across this [1] gradle plugin that is a client to the Sonatype OSS Index CVE database. I have set it up here in a branch [2], though the cache is not configured and the number of requests is limited. It can be run with "gradle --info audit" It could be nice to have something like

Re: Signing off

2019-02-15 Thread Etienne Chauchot
Thank you for your contributions Scott ! Your new project seems very fun. Enjoy ! Etienne On Friday 15 February 2019 at 15:01 +0100, Ismaël Mejía wrote: Your work and willingness to make Beam better will be missed. Good luck for the next phase! On Fri, Feb 15, 2019 at 1:39 PM Łukasz

Re: [VOTE] Release 2.10.0, release candidate #3

2019-02-08 Thread Etienne Chauchot
Thanks Robert ! Etienne On Friday 08 February 2019 at 16:42 +0100, Robert Bradshaw wrote: +1 (binding) I have verified that the artifacts and their checksums/signatures look good, and also checked the Python wheels against simple pipelines. On Fri, Feb 8, 2019

Re: [VOTE] Release 2.10.0, release candidate #3

2019-02-08 Thread Etienne Chauchot
Hi, I did the same visual checks of Nexmark that I did on RC2 for both functional regressions (output size) and performance regressions (execution time) on all the runners/modes for RC3 cut date (02/06) and I saw no regression except the one that I already mentioned (end of October perf

Re: Another another new contributor! :)

2019-02-07 Thread Etienne Chauchot
Hi, Help much appreciated ! And welcome ! Etienne On Thursday 07 February 2019 at 15:44 +0800, Reza Ardeshir Rokni wrote: Welcome! On Tue, 5 Feb 2019 at 23:34, Kenneth Knowles wrote: Welcome Kyle! On Tue, Feb 5, 2019 at 4:34 AM Maximilian Michels wrote: Welcome Kyle! Excited to

Re: [VOTE] Release 2.10.0, release candidate #1

2019-02-06 Thread Etienne Chauchot
Hi, I just fixed both (one was not a bug but an error in test code) in this PR [1]. [1] https://github.com/apache/beam/pull/7751 Etienne On Tuesday 05 February 2019 at 17:37 +0100, Etienne Chauchot wrote: Hi guys, I just found 2 bugs while replacing the mock in CassandraIO by a

Re: [DISCUSSION] UTests and embedded backends

2019-02-06 Thread Etienne Chauchot
100, Etienne Chauchot wrote: Guys, I will try using mocks where I see it is needed. As there is a current PR opened on Cassandra, I will take this opportunity to add the embedded cassandra server (https://github.com/jsevellec/cassandra-unit) to the UTests. Ticket

Re: [VOTE] Release 2.10.0, release candidate #2

2019-02-06 Thread Etienne Chauchot
Hi, I checked Nexmark on both output size (functional regression detection) and run time (performance regression). The only thing I see is a performance regression on query7 (side input + fanout) in spark runner but this regression has been there since the previous release cut. Indeed 2.9 was cut on

Re: [VOTE] Release 2.10.0, release candidate #1

2019-02-05 Thread Etienne Chauchot
Hi guys, I just found 2 bugs while replacing the mock in CassandraIO by a proper instance: https://issues.apache.org/jira/browse/BEAM-6592 https://issues.apache.org/jira/browse/BEAM-6591 I don't think they are release blockers because they have been there since CassandraIO first version. One of

Re: BEAM-6324 / #7340: "I've pretty much given up on the PR being merged. I use my own fork for my projects"

2019-01-31 Thread Etienne Chauchot
ere might be misunderstandings authors of such commits should give a clear message saying "do not merge yet" or "not ready for review" in title or comments or even close such PR and reopen until the change is ready.

Re: [ANNOUNCE] New PMC member: Etienne Chauchot

2019-01-29 Thread Etienne Chauchot
wrote: Congrats Etienne! On Fri, Jan 25, 2019 at

Re: [DISCUSSION] UTests and embedded backends

2019-01-28 Thread Etienne Chauchot
e they are often an expensive (in production and maintenance) way to get what amounts to low true coverage. On Mon, Jan 28, 2019 at 11:16 AM Etienne Chauchot wrote: Guys, I will try using mocks where I see it is needed. As there is a current PR opened on Cassa

Re: [DISCUSSION] UTests and embedded backends

2019-01-28 Thread Etienne Chauchot
embed and the backend behavior is predictable, then it makes sense. In other cases, we can fall back to mock. Regards JB On 21/01/2019 10:07, Etienne Chauchot wrote: Hi guys, Lately I have been fixing various Elasticsearch flakiness issues in the UTests by: introducing

Re: BEAM-6324 / #7340: "I've pretty much given up on the PR being merged. I use my own fork for my projects"

2019-01-28 Thread Etienne Chauchot
Sure it's a pity that this PR went unnoticed and I think it is a combination of factors (PR date around Christmas, the fact that the author forgot - AFAIK - to ping a reviewer in either the PR or the ML). I agree with Rui's proposal to enhance visibility of the "how to get a reviewed" process.

Re: [ANNOUNCE] New committer announcement: Gleb Kanterov

2019-01-25 Thread Etienne Chauchot
Congrats Gleb and welcome onboard ! Etienne On Friday 25 January 2019 at 10:39 +0100, Alexey Romanenko wrote: Congrats to Gleb and welcome on board! On 25 Jan 2019, at 09:22, Tim Robertson wrote: Welcome Gleb and congratulations! On Fri, Jan 25, 2019 at 8:06 AM

Re: [spark runner based on dataset POC] your opinion

2019-01-24 Thread Etienne Chauchot
@Gleb, I'll also take a look at ExpressionEncoder, thanks for the pointer to typelevel/frameless. Etienne On Wednesday 23 January 2019 at 17:06 +0100, Etienne Chauchot wrote: Hi all, Thanks for your feedback! I was indeed thinking about Reuven's work around Schema PCollections, he

Re: [spark runner based on dataset POC] your opinion

2019-01-23 Thread Etienne Chauchot
t; > > of serde and forgoes the benefits of Dataset API. > > > Maybe Dataset is not the best idea to integrate Beam with Spark. Just my > > > $0.02. > > > > > > Manu > > > > > > On Thu, Jan 17, 2019 at 10:44 PM Etienne Chaucho

Re: [PROPOSAL] allow the users to anticipate the support of features in the targeted runner.

2019-01-23 Thread Etienne Chauchot
this knowledge and the desire to code it ? Best Etienne On Wednesday 24 October 2018 at 09:45 +0200, Etienne Chauchot wrote: Hi guys, To sum up what we said, I just opened this ticket: https://issues.apache.org/jira/browse/BEAM-5849 Etienne On Thursday 18 October 2018 at 12:44 +0

Re: [DISCUSSION] UTests and embedded backends

2019-01-21 Thread Etienne Chauchot
r via a binary search). On Mon, Jan 21, 2019 at 10:07 AM Etienne Chauchot wrote: Hi guys, Lately I have been fixing various Elasticsearch flakiness issues in the UTests by: introducing timeouts, countdown latches, force refresh, embedded cluster si

[DISCUSSION] UTests and embedded backends

2019-01-21 Thread Etienne Chauchot
Hi guys, Lately I have been fixing various Elasticsearch flakiness issues in the UTests by: introducing timeouts, countdown latches, force refresh, embedded cluster size decrease ... These flakiness issues are due to the embedded Elasticsearch not coping well with the jenkins overload. Still,

Re: [spark runner based on dataset POC] your opinion

2019-01-17 Thread Etienne Chauchot
way to go, but we don't leverage schema PCollections. Best Etienne On Thursday 17 January 2019 at 21:52 +0800, Manu Zhang wrote: Nice Try, Etienne ! Is it possible to pass in the schema through pipeline options ? Manu On Thu, Jan 17, 2019 at 5:25 PM Etienne Chauchot wrote:

Re: [spark runner based on dataset POC] your opinion

2019-01-17 Thread Etienne Chauchot
Cool! I don't quite understand the issue in "bytes serialization to comply to spark dataset schemas to store windowedValues". Can you say a little more? Kenn On Tue, Jan 15, 2019 at 8:54 AM Etienne Chauchot wrote: Hi guys, regarding the

[spark runner based on dataset POC] your opinion

2019-01-15 Thread Etienne Chauchot
Hi guys, regarding the new (made from scratch) spark runner POC based on the dataset API, I was able to make a big step forward: it can now run a first batch pipeline with a source ! See
