Build failed in Jenkins: beam_SeedJob_Standalone #1804

2018-10-23 Thread Apache Jenkins Server
See Changes: [kedin] Fix java-harness build by adding flush() to -- Started by timer [EnvInject] - Loading node environment variables. Building remotely on beam2

Re: KafkaIO - Deadletter output

2018-10-23 Thread Raghu Angadi
A user can read serialized bytes from KafkaIO and deserialize them explicitly in a ParDo, which gives complete control over how to handle record errors. This is what I would do if I needed it in my pipeline. If there were a transform in Beam that does this, it could be convenient for users in many such scenarios.
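
The approach described above (read raw bytes, deserialize explicitly, and route failures aside) can be sketched roughly as follows. This is plain Python rather than Beam or KafkaIO code: the function name `deserialize_with_dead_letter`, the JSON payload format, and the sample records are all assumptions made for illustration. In a real pipeline the same try/except logic would live inside a ParDo with a second output for the failed records.

```python
import json
from typing import Iterable, List, Tuple


def deserialize_with_dead_letter(
    raw_records: Iterable[bytes],
) -> Tuple[List[object], List[Tuple[bytes, str]]]:
    """Split raw byte records into parsed values and dead letters.

    Hypothetical helper: assumes records are UTF-8 encoded JSON.
    """
    parsed: List[object] = []
    dead_letters: List[Tuple[bytes, str]] = []
    for raw in raw_records:
        try:
            value = json.loads(raw.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError) as err:
            # Keep the original bytes plus the error type so the record
            # can be inspected or replayed later.
            dead_letters.append((raw, type(err).__name__))
            continue
        parsed.append(value)
    return parsed, dead_letters


# Illustrative input: two valid records and two that cannot be parsed.
records = [b'{"id": 1}', b"not json", b"\xff\xfe", b'{"id": 2}']
good, bad = deserialize_with_dead_letter(records)
```

Keeping the raw bytes in the dead-letter output, rather than only an error message, is a design choice that makes it possible to reprocess the failed records once the cause is understood.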

Build failed in Jenkins: beam_SeedJob #2851

2018-10-23 Thread Apache Jenkins Server
See -- GitHub pull request #6802 of commit 4ae0d3d8507e1a71618443f0fcd51547417bd049, no merge conflicts. Setting status of 4ae0d3d8507e1a71618443f0fcd51547417bd049 to PENDING with url

Re: KafkaIO - Deadletter output

2018-10-23 Thread Chamikara Jayalath
Given that KafkaIO uses the UnboundedSource framework, this is probably not something that can easily be supported. We might be able to support similar features when we have Kafka on top of Splittable DoFn though. So feel free to create a feature request JIRA for this. Thanks, Cham On Tue, Oct 23,

Build failed in Jenkins: beam_SeedJob #2850

2018-10-23 Thread Apache Jenkins Server
See -- GitHub pull request #6802 of commit 0845ab25a592350c87ec3443d83203c93553d5aa, no merge conflicts. Setting status of 0845ab25a592350c87ec3443d83203c93553d5aa to PENDING with url

Re: KafkaIO - Deadletter output

2018-10-23 Thread Kenneth Knowles
This is a great question. I've added the dev list to be sure it gets noticed by whoever may know best. Kenn On Tue, Oct 23, 2018 at 2:05 AM Kaymak, Tobias wrote: > > Hi, > > Is there a way to get a Deadletter Output from a pipeline that uses a > KafkaIO > connector for its input? As

Re: Follow up ideas, to simplify creating MonitoringInfos.

2018-10-23 Thread Kenneth Knowles
FWIW AutoValue will build most of that class for you, if it is as you say. Kenn On Tue, Oct 23, 2018 at 6:04 PM Alex Amato wrote: > Hi Robert + beam dev list, > > I was thinking about your feedback in PR#6205 > , and agree that this >

Follow up ideas, to simplify creating MonitoringInfos.

2018-10-23 Thread Alex Amato
Hi Robert + beam dev list, I was thinking about your feedback in PR#6205, and agree that this monitoring_infos.py <https://github.com/apache/beam/blob/61a9f7193f1a61869915da3b4f386b34eac63822/sdks/python/apache_beam/metrics/monitoring_infos.py> became a

Build failed in Jenkins: beam_SeedJob #2849

2018-10-23 Thread Apache Jenkins Server
See Changes: [kedin] [SQL] Move builtin aggregations creation to a map of factories [kedin] [SQL] Simplify AggregationRel [kedin] [SQL] Add AggregationCall wrapper [kedin] [SQL] Inline aggregation rel helper

Build failed in Jenkins: beam_SeedJob_Standalone #1803

2018-10-23 Thread Apache Jenkins Server
See Changes: [kedin] [SQL] Move builtin aggregations creation to a map of factories [kedin] [SQL] Simplify AggregationRel [kedin] [SQL] Add AggregationCall wrapper [kedin] [SQL] Inline aggregation rel

[SQL] Investigation of missing/wrong session_end implementation in BeamSQL

2018-10-23 Thread Rui Wang
Hi community, In BeamSQL, SESSION window is supported in GROUP BY. Example query: "SELECT f_int2, COUNT(*) AS `getFieldCount`," + " SESSION_START(f_timestamp, INTERVAL '5' MINUTE) AS `window_start`, " + " SESSION_END(f_timestamp, INTERVAL '5' MINUTE) AS `window_end` "
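
As a rough illustration of what gap-based session boundaries look like for the query above, here is a minimal sketch in plain Python. It is not BeamSQL code and does not claim to show what BeamSQL currently returns. It assumes that each timestamp opens a window of length equal to the gap, that overlapping windows merge, and that the session start and end are the bounds of the merged window. The function name `session_windows` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def session_windows(
    timestamps: Iterable[datetime], gap: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Merge per-element windows [ts, ts + gap) into sessions.

    Assumption: two windows merge only when they overlap, so an element
    arriving exactly at the end of a session starts a new one.
    """
    windows: List[Tuple[datetime, datetime]] = []
    for ts in sorted(timestamps):
        if windows and ts < windows[-1][1]:
            # Element falls inside the current session: extend its end.
            start, end = windows[-1]
            windows[-1] = (start, max(end, ts + gap))
        else:
            # Gap exceeded: open a new session.
            windows.append((ts, ts + gap))
    return windows


base = datetime(2018, 10, 23, 12, 0)
events = [base + timedelta(minutes=m) for m in (0, 3, 7, 20)]
sessions = session_windows(events, timedelta(minutes=5))
```

Under these assumptions the first three events form one session and the last event forms its own, with each session end falling one gap after the last event in that session.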

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-23 Thread Kenneth Knowles
I think it makes sense for each vendored dependency to be self-contained as much as possible. It should keep it fairly simple. Things that cross their API surface cannot be hidden, of course. Jar size is not a concern IMO. Kenn On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik wrote: > How should we

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-10-23 Thread Kenneth Knowles
Yes, user@ cannot reach new users, really. Twitter might, if we have enough adjacent followers to get it in front of the right people. On the other hand, I find testimonials from experience convincing in this case. Kenn On Tue, Oct 23, 2018 at 2:59 PM Ahmet Altay wrote: > > > On Tue, Oct

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-10-23 Thread Ahmet Altay
On Tue, Oct 23, 2018 at 9:16 AM, Thomas Weise wrote: > > > On Mon, Oct 22, 2018 at 2:42 PM Ahmet Altay wrote: > >> We attempted to collect feedback on the mailing lists but did not get >> much input. From my experience (mostly based on dataflow) there is a >> sizeable group of users who are

Re: Java Precommit duration

2018-10-23 Thread Robert Bradshaw
On Tue, Oct 23, 2018 at 11:28 PM Kenneth Knowles wrote: > Hi all, > > Java Precommit duration is about 1h15. That is quite a burden. Especially > if something gets broken. > I'm in favor of (simple!) build breaks going in before precommits finish, on the promise that the offending test(s)

Please ignore the 'Java FnApi PreCommit' and 'Java FnApi PostCommit' failures

2018-10-23 Thread Boyuan Zhang
Hey all, I'm working on adding 2 more Jenkins Jobs to run java PreCommit and PostCommit with fn-api worker and stabilizing the job status. Please ignore failures from these 2 jobs. Once they are ready, there will be another email to follow up. Sorry for the inconvenience! Best, Boyuan Zhang

Java Precommit duration

2018-10-23 Thread Kenneth Knowles
Hi all, Java Precommit duration is about 1h15. That is quite a burden. Especially if something gets broken. We turned off parallel builds, which we really need to re-enable. But beyond that, I see low-hanging fruit that would most appropriately be a separate Jenkins job. Here's a scan of a

Re: [Proposal] Add exception handling option to MapElements

2018-10-23 Thread Jeff Klukas
https://github.com/apache/beam/pull/6586 is still open for review, but I also wanted to gather feedback about a potential refactor as part of that change. We could refactor MapElements, FlatMapElements, and Filter to all inherit from a common abstract base class SingleMessageTransform. The new

Re: Possible memory leak in Direct Runner unbounded

2018-10-23 Thread Andrew Pilloud
Hi Martin, I've seen similar things. The Direct Runner is intended for testing with small datasets, and is expected to retain the entire dataset in memory. It sounds like you have a pipeline that requires storing data for a GroupByKey operation. There is no mechanism to page intermediates to disk

Re: Data Preprocessing in Beam

2018-10-23 Thread Lukasz Cwik
Arnoud Fournier (afourn...@talend.com) started by adding a library to support sketching ( https://github.com/apache/beam/tree/master/sdks/java/extensions/sketching), I feel as though some of these could be added there or possibly within another extension. On Tue, Oct 23, 2018 at 9:54 AM Austin

Data Preprocessing in Beam

2018-10-23 Thread Austin Bennett
Hi Beam Devs, Alejandro, copied, is an enthusiastic developer, who recently coded up: https://github.com/elbaulp/DPASF (associated paper found: https://arxiv.org/abs/1810.06021). He had been looking to contribute that code to FlinkML, at which point I found him and alerted him to Beam. He has

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-23 Thread Lukasz Cwik
How should we handle the transitive dependencies of the things we want to vendor? For example we use gRPC which depends on Guava 20 and we also use Calcite which depends on Guava 19. Should the vendored gRPC/Calcite/... be self-contained so it contains all its dependencies, hence vendored gRPC

Re: Docker missing on Beam15

2018-10-23 Thread Thomas Weise
Thanks! There have been a few successful runs now. On Tue, Oct 23, 2018 at 8:52 AM Yifan Zou wrote: > FYI, the docker was restarted on beam15. > > On Tue, Oct 23, 2018 at 7:08 AM Thomas Weise wrote: > >> For the latter (createProcessWorker): >> https://github.com/apache/beam/pull/6793 >> >> >>

Re: Docker missing on Beam15

2018-10-23 Thread Yifan Zou
FYI, the docker was restarted on beam15. On Tue, Oct 23, 2018 at 7:08 AM Thomas Weise wrote: > For the latter (createProcessWorker): > https://github.com/apache/beam/pull/6793 > > > On Tue, Oct 23, 2018 at 6:47 AM Thomas Weise wrote: > >> Thanks for taking a look Yifan. Yes, it appears this

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-23 Thread Kenneth Knowles
I actually created the subtasks by finding things shaded by at least one module. I think each one should definitely have an on list discussion that clarifies the target artifact, namespace, version, possible complications, etc. My impression is that many many modules shade only Guava. So for

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-23 Thread Thomas Weise
+1 for separate artifacts I would request that we explicitly discuss and agree which dependencies we vendor though. Not everything listed in the JIRA subtasks is currently relocated. Thomas On Tue, Oct 23, 2018 at 8:04 AM David Morávek wrote: > +1 This should improve build times a lot. It

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-23 Thread David Morávek
+1 This should improve build times a lot. It would be great if vendored deps could stay in the main repository. D. On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels wrote: > Looks great, Kenn! > > > Max: what is the story behind having a separate flink-shaded repo? Did > that make it easier

Re: Docker missing on Beam15

2018-10-23 Thread Thomas Weise
For the latter (createProcessWorker): https://github.com/apache/beam/pull/6793 On Tue, Oct 23, 2018 at 6:47 AM Thomas Weise wrote: > Thanks for taking a look Yifan. Yes, it appears this was an intermittent > issue. > > For beam_PostCommit_Python_VR_Flink we are left with: > > * beam15 docker

Re: Docker missing on Beam15

2018-10-23 Thread Thomas Weise
Thanks for taking a look Yifan. Yes, it appears this was an intermittent issue. For beam_PostCommit_Python_VR_Flink we are left with: * beam15 docker errors * segmentation faults * "Execution failed for task ':beam-sdks-python:createProcessWorker'" - which should not even execute since we are

Re: [DISCUSS] Publish vendored dependencies independently

2018-10-23 Thread Maximilian Michels
Looks great, Kenn! Max: what is the story behind having a separate flink-shaded repo? Did that make it easier to manage in some way? Better separation of concerns, but I don't think releasing the shaded artifacts from the main repo is a problem. I'd even prefer not to split up the repo

Re: Python docs build error

2018-10-23 Thread Maximilian Michels
It looks like now the build is broken on Jenkins but runs fine on macOS. There is some inconsistency in how `:pylint27` runs across the two platforms. Broken build: https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/216/ On 22.10.18 19:01, Ruoyun Huang wrote: To Colm's

Re: [PROPOSAL] Move sorting to sdks-java-core

2018-10-23 Thread Robert Bradshaw
I like the idea of asking for a coder for T with properties X. (E.g. the order-preserving one may not be the most efficient, so a poor default, but required in some cases.) Note that if we go the route of secondary-key-extraction, we don't even need a full coder here, just an order-preserving
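
To make the "order-preserving" property concrete, here is a minimal sketch in plain Python of an encoding for signed 64-bit integers whose byte-wise ordering matches numeric ordering. It is not a Beam coder and does not use Beam's coder API. The function names `encode_sortable_int64` and `decode_sortable_int64` are hypothetical, and the fixed-width big-endian layout is just one possible choice.

```python
import struct

_OFFSET = 1 << 63  # shifts the signed range onto the unsigned range


def encode_sortable_int64(value: int) -> bytes:
    """Encode a signed 64-bit int so that bytes sort like the numbers."""
    if not -_OFFSET <= value < _OFFSET:
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Adding the offset maps [-2**63, 2**63) onto [0, 2**64); big-endian
    # unsigned bytes then compare lexicographically in numeric order.
    return struct.pack(">Q", value + _OFFSET)


def decode_sortable_int64(data: bytes) -> int:
    """Inverse of encode_sortable_int64."""
    (shifted,) = struct.unpack(">Q", data)
    return shifted - _OFFSET


values = [5, -1, 0, -(2**63), 2**63 - 1, -300, 300]
sorted_by_bytes = sorted(values, key=encode_sortable_int64)
```

The fixed eight-byte width illustrates the trade-off mentioned above: a variable-length encoding would usually be more compact for small values, but would not generally keep byte order aligned with numeric order.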

Re: Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #216

2018-10-23 Thread Maximilian Michels
I don't get the error locally when running: gradle :beam-sdks-python:lintPy27 Seems like there is a different configuration on Jenkins? On 23.10.18 10:16, Apache Jenkins Server wrote: See

Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #216

2018-10-23 Thread Apache Jenkins Server
See Changes: [david.moravek] [BEAM-5297] Add propdeps-idea plugin. [25622840+adude3141] remove usage of deprecated Task.leftShift(Closure) method [25622840+adude3141] remove usage of