Re: Permission to contribute on LZO compression enablement for Beam Java SDK

2019-11-07 Thread Amogh Tiwari
Thanks Luke :) On Fri, Nov 8, 2019 at 1:01 AM Luke Cwik wrote: > Welcome, I have added you as a contributor and assigned the ticket to you. > > On Thu, Nov 7, 2019 at 4:21 AM Amogh Tiwari wrote: > >> Hi, >> >> I would like to contribute on enabling Apache Beam's java SDK to work >> with LZO

Re: RabbitMQ and CheckpointMark feasibility

2019-11-07 Thread Reuven Lax
Just to clarify one thing: CheckpointMark does not need to be Java Serializable. All that's needed is to return a Coder for the CheckpointMark in getCheckpointMarkCoder. On Thu, Nov 7, 2019 at 7:29 PM Eugene Kirpichov wrote: > Hi Daniel, > > This is probably insufficiently well documented. The

Re: [Discuss] Beam mascot

2019-11-07 Thread Reza Rokni
Salmon... they love streams? :-) On Fri, 8 Nov 2019 at 12:00, Kenneth Knowles wrote: > Agree with Aizhamal that it doesn't matter if they are taken if they are > not too close in space to Beam: Apache projects, big data, log processing, > stream processing. Not a legal opinion, but an aesthetic

Re: [discuss] More dimensions for the Capability Matrix

2019-11-07 Thread Thomas Weise
FWIW there are currently at least 2 instances of capability matrix [1] [2]. [1] has been in need of a refresh for a while. [2] is more useful but only covers portable runners and is hard to find. Thomas [1] https://beam.apache.org/documentation/runners/capability-matrix/ [2]

Re: [Discuss] Beam mascot

2019-11-07 Thread Kenneth Knowles
Agree with Aizhamal that it doesn't matter if they are taken if they are not too close in space to Beam: Apache projects, big data, log processing, stream processing. Not a legal opinion, but an aesthetic opinion. So I would keep Lemur as a possibility. Definitely nginx is far away from Beam so it

[discuss] More dimensions for the Capability Matrix

2019-11-07 Thread Pablo Estrada
Hi all, I think this is a relatively common question: - Can I do X with runner Y, and SDK Z? The answers vary significantly between SDK and Runner pairs. This makes it such that the current Capability Matrix falls somewhat short when potential users / solutions architects / etc are trying to

Re: RabbitMQ and CheckpointMark feasibility

2019-11-07 Thread Eugene Kirpichov
Hi Daniel, This is probably insufficiently well documented. The CheckpointMark is used for two purposes: 1) To persistently store some notion of how much of the stream has been consumed, so that if something fails we can tell the underlying streaming system where to start reading when we

Re: Pipeline AttributeError on Python3

2019-11-07 Thread Valentyn Tymofieiev
I think we have heard of this issue from the same source: This looks exactly like a race condition that we've encountered on Python > 3.7.1: There's a bug in some older 3.7.x releases that breaks the > thread-safety of the unpickler, as concurrent unpickle threads can access a > module before it

Re: Cython unit test suites running without Cythonized sources

2019-11-07 Thread Chad Dombrova
Hi, Answers inline below, It's unclear from the nose source[1] whether it's calling build_py >> and build_ext, or just build_ext. It's also unclear whether the result of >> that build is actually used. When python setup.py nosetests runs, it runs >> inside of a virtualenv created by tox, and

Re: Cython unit test suites running without Cythonized sources

2019-11-07 Thread Ahmet Altay
On Thu, Nov 7, 2019 at 1:37 PM Chad Dombrova wrote: > > On Thu, Nov 7, 2019 at 11:31 AM Robert Bradshaw > wrote: > >> Does python setup.py nosetests invoke build_ext (or, more generally, >> build)? > > > It's unclear from the nose source[1] whether it's calling build_py > and build_ext, or just

Re: Feature addition to java CassandraIO connector

2019-11-07 Thread Vincent Marquez
Thanks for the response Pablo. We're currently using our own custom ParDo connector for Cassandra (specialized to Scylla's sharding algorithm) that has a 'readAll' type option and getting great results. Would you be up for taking an outside contribution that refactors the current CassandraIO

Re: Contributor permission for Beam Jira tickets

2019-11-07 Thread Changming Ma
Oh, one more thing: my jira account name is: cmma On Thu, Nov 7, 2019 at 3:04 PM Changming Ma wrote: > Hi, > This is Changming, a SWE with Google. I'm working on a GCP DataFlow item > and it'd be nice if some of my changes could be backported to beam (e.g., > BEAM-8579). > Could someone please

Re: New Contributor

2019-11-07 Thread Kyle Weaver
Can you please share your Jira username? On Thu, Nov 7, 2019 at 3:04 PM Andrew Crites wrote: > This is Andrew Crites. I'm making some changes to the Python Dataflow > runner. Can someone add me as a contributor for Beam's Jira issue tracker? > Apparently I can't be assigned issues right now. >

Contributor permission for Beam Jira tickets

2019-11-07 Thread Changming Ma
Hi, This is Changming, a SWE with Google. I'm working on a GCP DataFlow item and it'd be nice if some of my changes could be backported to beam (e.g., BEAM-8579). Could someone please add me as a contributor for Beam's Jira issue tracker? My github account is cmm08 (email: c...@google.com). Thank

New Contributor

2019-11-07 Thread Andrew Crites
This is Andrew Crites. I'm making some changes to the Python Dataflow runner. Can someone add me as a contributor for Beam's Jira issue tracker? Apparently I can't be assigned issues right now. Thanks!

Re: Confusing multiple output semantics in Python

2019-11-07 Thread Ning Kang
Hi Sam, Thanks for clarifying the accessor to output when building a pipeline. Internally, we have AppliedPTransform, where the output is always a dictionary: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L770 And it seems to me that with key 'None', the output

Re: Deprecate some or all of TestPipelineOptions?

2019-11-07 Thread Ismaël Mejía
Thanks for bringing this to the ML, Brian. +1 for full TestPipelineOptions deprecation. It is even worth removing; the bad part is that this class resides in 'sdks/core/main/java' and not in testing as I imagined, so this could count as a 'breaking' change. On Thu, Nov 7, 2019 at 8:27 PM Luke Cwik

Re: Triggers still finish and drop all data

2019-11-07 Thread Steve Niemitz
Interestingly enough, we just had a use case come up that I think could have been solved by finishing triggers. Basically, we want to emit a notification when a certain threshold is reached (in this case, we saw at least N elements for a given key), and then never notify again within that window.
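
The use case described above (notify once when a key has seen at least N elements, then stay silent for the rest of the window) can be sketched in plain Python. This is only an illustration of the intended behaviour, not Beam code: the function name `notify_once_per_window`, the fixed-window arithmetic, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict


def notify_once_per_window(events, threshold, window_size):
    """Yield (key, window_start) the first time a key reaches `threshold`
    elements inside a fixed window, and never again for that window.

    `events` is an iterable of (key, timestamp) pairs (hypothetical input).
    """
    counts = defaultdict(int)  # (key, window_start) -> elements seen so far
    notified = set()           # (key, window_start) pairs already reported
    for key, timestamp in events:
        window_start = timestamp - (timestamp % window_size)
        slot = (key, window_start)
        if slot in notified:
            # Mirrors a finished trigger: later elements are dropped.
            continue
        counts[slot] += 1
        if counts[slot] >= threshold:
            notified.add(slot)
            del counts[slot]   # the count is no longer needed
            yield slot


events = [("a", 1), ("a", 2), ("b", 3), ("a", 4), ("a", 5),
          ("a", 11), ("a", 12), ("a", 13)]
result = list(notify_once_per_window(events, threshold=3, window_size=10))
print(result)
```

With these sample events, key "a" is reported once for the window starting at 0 and once for the window starting at 10; the fourth "a" element in the first window produces no second notification.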

Re: [Discuss] Beam Summit 2020 Dates & locations

2019-11-07 Thread Alex Van Boxel
Date-wise, I'm wondering why we should switch the Europe and NA ones; this would mean that the Berlin and the new EU summit would be almost 1.5 years apart. _/ _/ Alex Van Boxel On Thu, Nov 7, 2019 at 8:43 PM Ahmet Altay wrote: > I prefer Bay Area for NA summit. My reasoning is that

Confusing multiple output semantics in Python

2019-11-07 Thread Sam Rohde
Hi All, In the Python SDK there are three ways of representing the output of a PTransform with multiple PCollections: - dictionary: PCollection tag --> PCollection - tuple: index --> PCollection - DoOutputsTuple: tag, index, or field name --> PCollection I find this inconsistent way of

Re: Cython unit test suites running without Cythonized sources

2019-11-07 Thread Chad Dombrova
On Thu, Nov 7, 2019 at 11:31 AM Robert Bradshaw wrote: > Does python setup.py nosetests invoke build_ext (or, more generally, > build)? It's unclear from the nose source[1] whether it's calling build_py and build_ext, or just build_ext. It's also unclear whether the result of that build is

Re: ES 7.0 Support Development

2019-11-07 Thread David Morávek
Hi Zhong, just fyi, there is another ongoing effort on adding es 7 support. https://github.com/apache/beam/pull/10025 you guys should get in touch ;) D. Sent from my iPhone > On 7 Nov 2019, at 20:20, Zhong Chen wrote: > > Hi all, > > I have made a PR for adding ES 7.0 support here.

RabbitMQ and CheckpointMark feasibility

2019-11-07 Thread Daniel Robert
(Background: I recently upgraded RabbitMqIO from the 4.x to 5.x library. As part of this I switched to a pull-based API rather than the previously-used push-based one. This has caused some nebulous problems, so I put up a correction PR that I think needs some eyes fairly quickly as I'd consider

Re: [Discuss] Beam Summit 2020 Dates & locations

2019-11-07 Thread Ahmet Altay
I prefer Bay Area for NA summit. My reasoning is that there is a critical mass of contributors and users in that location, probably more than alternative NA locations. I was not involved with planning recently and I do not know if there were people who could attend due to location previously. If

Re: Key encodings for state requests

2019-11-07 Thread Maximilian Michels
While the Go SDK doesn't yet support a State API, Option 3) is what the Go SDK does for all non-standard coders (aka custom coders) anyway. For wire transfer, the Java Runner also adds a LengthPrefixCoder for the coder and its subcomponents. The problem is that this is an implicit assumption
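
The length-prefixing idea mentioned above can be sketched in plain Python: each opaque, custom-encoded payload is preceded by its byte length, so a reader that does not understand the custom encoding can still find element boundaries. This is a sketch of the general technique only, not the implementation of Beam's LengthPrefixCoder; the base-128 varint framing and all function names here are assumptions made for illustration.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint (low 7 bits first)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos=0):
    """Return (value, next_position) for the varint starting at `pos`."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def length_prefix(payload):
    """Prepend the payload's byte length so it can be skipped unparsed."""
    return encode_varint(len(payload)) + payload


def split_length_prefixed(stream):
    """Split concatenated length-prefixed payloads without decoding them."""
    items = []
    pos = 0
    while pos < len(stream):
        size, pos = decode_varint(stream, pos)
        items.append(stream[pos:pos + size])
        pos += size
    return items


payloads = [b"custom-key", b"", b"x" * 200]
stream = b"".join(length_prefix(p) for p in payloads)
recovered = split_length_prefixed(stream)
```

The cost of this framing is one to a few extra bytes per element, which is the overhead the thread refers to.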

Re: Cython unit test suites running without Cythonized sources

2019-11-07 Thread Robert Bradshaw
Does python setup.py nosetests invoke build_ext (or, more generally, build)? It's possible cython is present, but the build step is not invoked which would explain the skip for slow_coders_test. The correct test is being used in

Re: Permission to contribute on LZO compression enablement for Beam Java SDK

2019-11-07 Thread Luke Cwik
Welcome, I have added you as a contributor and assigned the ticket to you. On Thu, Nov 7, 2019 at 4:21 AM Amogh Tiwari wrote: > Hi, > > I would like to contribute on enabling Apache Beam's java SDK to work with > LZO compression. Please add me as a contributor so that I can work on this. > I've

Re: Deprecate some or all of TestPipelineOptions?

2019-11-07 Thread Luke Cwik
There was an issue with the asynchrony of p.run(): some runners blocked until the pipeline was complete with p.run(), which was never meant to be the intent. The test timeout one makes sense to be able to configure it per runner (since Dataflow takes a lot longer than other runners) but we may be able to

ES 7.0 Support Development

2019-11-07 Thread Zhong Chen
Hi all, I have made a PR for adding ES 7.0 support here. However the unit tests are failing because for some reason the test cluster is not publishing http endpoints correctly, which is leading to connection refused exception. I am still trying to

Re: Cython unit test suites running without Cythonized sources

2019-11-07 Thread Ahmet Altay
I believe tox is correctly installing cython and executes "python setup.py nosetests", which triggers the cythonization path inside setup.py. Some indications that cython is installed and used are the following log entries (from a recent precommit cron job [1]) - [ 1/12] Cythonizing

Re: Contributing to Beam javadoc

2019-11-07 Thread Luke Cwik
Welcome and I just merged your PR. On Wed, Nov 6, 2019 at 1:15 PM Ismaël Mejía wrote: > Done, you can now self assign issues too, welcome Jonathan! > > On Wed, Nov 6, 2019 at 10:00 PM Jonathan Alvarez-Gutierrez > wrote: > > > > Hey, > > > > I just filed

Re: Command for Beam worker on Spark cluster

2019-11-07 Thread Matthew K.
Thanks, but I still have a problem making the remote worker on k8s work (important to point out that I had to create a shared volume between nodes so that all have access to the same /tmp, since beam runner creates artifact staging files on the machine it is running on, and expects workers to read from

Re: Getting contributor permission to JIRA

2019-11-07 Thread Luke Cwik
Welcome, I have added you as a contributor and assigned BEAM-8575 to you. On Wed, Nov 6, 2019 at 5:37 PM Wenjia Liu wrote: > Hi, > > This is Wendy from Google. I'm contributing to adding more tests for Beam > Python. Could anyone add me as a contributor for JIRA? I'd like to assign > this issue

Re: (Question) SQL integration tests for MongoDb

2019-11-07 Thread Kirill Kozlov
Thank you for your response! I want to make sure that when tests run on Jenkins they get supplied with pipeline options containing hostName and Port of a running MongoDb service. I'm writing an integration test for a MongoDb SQL adapter (located sdks/java/extensions/sql/meta/provider/mongodb). I

Re: 10,000 Pull Requests

2019-11-07 Thread Maximilian Michels
Yes! Keep the committer pipeline filled ;) Reviewing PRs probably remains one of the toughest problems in active open-source projects. On 07.11.19 18:28, Luke Cwik wrote: We need more committers... that review the code. On Wed, Nov 6, 2019 at 6:21 PM Pablo Estrada

Re: 10,000 Pull Requests

2019-11-07 Thread Luke Cwik
We need more committers... that review the code. On Wed, Nov 6, 2019 at 6:21 PM Pablo Estrada wrote: > iiipe : ) > > On Thu, Nov 7, 2019 at 12:59 AM Kenneth Knowles wrote: > >> Awesome! >> >> Number of days from PR #1 and PR #1000: 211 >> Number of days from PR #9000 and PR #1: 71

Re: [PROPOSAL] Storing, displaying and detecting anomalies in test results

2019-11-07 Thread Kamil Wasilewski
Thanks for spotting this! It should be working fine now. On Thu, Nov 7, 2019 at 5:40 PM Dan Gazineu wrote: > Thank you for the update Kamil! Please fix the sharing options in the new > doc. > > On Thu, Nov 7, 2019 at 7:22 AM Kamil Wasilewski < > kamil.wasilew...@polidea.com> wrote: > >> Hi all,

Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness

2019-11-07 Thread Luke Cwik
I did suggest one other alternative on Jincheng's PR[1] which was to allow windowless values to be sent across the gRPC port. The SDK would then be responsible for ensuring that the execution didn't access any properties that required knowledge of the timestamp, pane or window. This is different

Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness

2019-11-07 Thread Robert Bradshaw
I think there is some misunderstanding about what is meant by option 2. What Kenn (I think) and I are proposing is not a WindowedValueCoder whose window/timestamp/paneinfo coders are parameterized to be constant coders, but a WindowedValueCoder whose window/timestamp/paneinfo values are specified
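
One possible reading of the proposal above, sketched in plain Python: the coder itself holds fixed timestamp, window and pane values, writes only the element value to the wire, and re-attaches the fixed metadata on decode. The snippet above is truncated, so this interpretation is an assumption; `ConstantMetadataValueCoder` and the `WindowedValue` tuple are hypothetical names for illustration and are not Beam classes.

```python
from collections import namedtuple

# Hypothetical stand-in for a windowed element; not the Beam class.
WindowedValue = namedtuple(
    "WindowedValue", ["value", "timestamp", "window", "pane"])


class ConstantMetadataValueCoder:
    """Encodes only the value; the metadata is fixed when the coder is built."""

    def __init__(self, timestamp, window, pane):
        self.timestamp = timestamp
        self.window = window
        self.pane = pane

    def encode(self, windowed_value):
        # Timestamp, window and pane are never written to the wire.
        return windowed_value.value.encode("utf-8")

    def decode(self, data):
        # The constant metadata is re-attached on the receiving side.
        return WindowedValue(
            data.decode("utf-8"), self.timestamp, self.window, self.pane)


coder = ConstantMetadataValueCoder(timestamp=0, window="global", pane="no_firing")
wire = coder.encode(WindowedValue("hello", 0, "global", "no_firing"))
decoded = coder.decode(wire)
```

In this sketch the wire format carries no per-element metadata at all, which is where the saving in encoding and decoding work would come from.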

Re: [PROPOSAL] Storing, displaying and detecting anomalies in test results

2019-11-07 Thread Dan Gazineu
Thank you for the update Kamil! Please fix the sharing options in the new doc. On Thu, Nov 7, 2019 at 7:22 AM Kamil Wasilewski < kamil.wasilew...@polidea.com> wrote: > Hi all, > > For a while we have been working on the implementation of proposal > presented in this thread. Just to remind what

Re: [spark structured streaming runner] merge to master?

2019-11-07 Thread Etienne Chauchot
Hi guys @Kenn, I just wanted to mention that I did answer your question on dependencies here: https://lists.apache.org/thread.html/5a85caac41e796c2aa351d835b3483808ebbbd4512b480940d494439@%3Cdev.beam.apache.org%3E regarding jars: I don't like 3 jars either. I'm not in favor of having

Re: Key encodings for state requests

2019-11-07 Thread Robert Bradshaw
On Thu, Nov 7, 2019 at 6:26 AM Maximilian Michels wrote: > > Thanks for the feedback thus far. Some more comments: > > > Instead, the runner knows ahead of time that it > > will need to instantiate this coder, and should update the bundle > > processor to specify KvCoder, > > VarIntCoder> as the

Re: Key encodings for state requests

2019-11-07 Thread Robert Burke
While the Go SDK doesn't yet support a State API, Option 3) is what the Go SDK does for all non-standard coders (aka custom coders) anyway. While this means that for certain custom encodings of user types there may be the overhead of length prefixing it, it's not likely to be the most significant

Re: [PROPOSAL] Storing, displaying and detecting anomalies in test results

2019-11-07 Thread Kamil Wasilewski
Hi all, For a while we have been working on the implementation of the proposal presented in this thread. As a brief reminder: we wanted to use Prometheus and Grafana to visualize Beam performance test results and, in addition, to detect regressions automatically. A couple of

Re: [Discuss] Beam Summit 2020 Dates & locations

2019-11-07 Thread Elliotte Rusty Harold
The U.S. sadly is not a reliable destination for international conferences these days. Almost every conference I go to, big and small, has at least one speaker, sometimes more, who can't get into the country. Canada seems worth considering. Vancouver, Montreal, and Toronto are all convenient. On

Re: [VOTE] @RequiresTimeSortedInput stateful DoFn annotation

2019-11-07 Thread Jan Lukavský
Hi, is there anything I can do to make this more attractive? :-) Any feedback would be much appreciated. Many thanks, Jan On 5 Nov 2019 at 14:10, Jan Lukavský wrote: Hi, I'd like to open a vote on accepting design document [1] as a base for implementation of @RequiresTimeSortedInput

Permission to contribute on LZO compression enablement for Beam Java SDK

2019-11-07 Thread Amogh Tiwari
Hi, I would like to contribute on enabling Apache Beam's java SDK to work with LZO compression. Please add me as a contributor so that I can work on this. I've also raised a ticket for the same. Thanks and best regards, Amogh Tiwari

Re: (Question) SQL integration tests for MongoDb

2019-11-07 Thread Michał Walenia
Hi, What exactly are you trying to do? If you're looking for a way to provide pipeline options to the MongoDBIOIT, you can pass them via command line like this: ./gradlew integrationTest -p sdks/java/io/mongodb * -DintegrationTestPipelineOptions='[ "--mongoDBHostName=1.2.3.4",

Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness

2019-11-07 Thread jincheng sun
Thanks for your feedback and the valuable comments, Kenn & Robert! I think your comments are more comprehensive and enlighten me a lot. The two proposals which I mentioned above are to reuse the existing coder (FullWindowedValueCoder and ValueOnlyWindowedValueCoder). Now, with your comments, I