Re: Connection leaks with PostgreSQL instance

2019-01-17 Thread Kenneth Knowles
My mistake - using @Teardown in this way is a good approach. It may not be executed sometimes, but like Reuven says it means the process died. Kenn On Thu, Jan 17, 2019 at 9:31 AM Jean-Baptiste Onofré wrote: > Hi, > > I don't think we have a connection leak in normal behavior. > > The actual SQL

Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-17 Thread Kenneth Knowles
Actually, Reuven, that's no longer the case. It used to be that incoming data was compared to the watermark but it is not today. Instead, Jeff's first phrasing is perfect. One way to see it is to think about the consequences of late data: if there is a grouping/aggregation by

Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-17 Thread Rui Wang
I created this PR: https://github.com/apache/beam/pull/7556 Feel free to review/comment it. -Rui On Thu, Jan 17, 2019 at 2:37 PM Rui Wang wrote: > It might be better to keep something like "watermark usually consistently > moves forward". But "Elements that arrive with a smaller timestamp

Re: Where should I locate shared/common Java Code (Java SDK and Java RH both use)?

2019-01-17 Thread Alex Amato
Cool, thanks! Sounds like a good place to put it. Appreciate the pointer :) On Thu, Jan 17, 2019 at 3:47 PM Kenneth Knowles wrote: > I think runners/core-java fits in this case. It started life as a place > for shared functionality to all JVM-based runners, but much of that also > turns out to

Adding KMS support to generic filesystem interface

2019-01-17 Thread Udi Meiri
Hi, I'd like to add support for creating files using a cloud Key Management System. A KMS allows you to audit, create, rotate, and disable encryption keys. Both AWS and GCP have such a service. I wanted to show the community what I've been working on and see if there are any comments or

Re: Where should I locate shared/common Java Code (Java SDK and Java RH both use)?

2019-01-17 Thread Kenneth Knowles
I think runners/core-java fits in this case. It started life as a place for shared functionality to all JVM-based runners, but much of that also turns out to apply to the Java SDK harness. I see it is already used in the harness. Kenn On Thu, Jan 17, 2019 at 1:55 PM Alex Amato wrote: > any

[Proposal] Requesting PMC approval to start planning for Beam Summits 2019

2019-01-17 Thread joanafilipa
Dear Project Management Committee, The Beam Summits are community events funded by a Sponsoring Committee and organized by an Organizing Committee. I’d like to get the following approvals: To organize and host the Summits under the name of Beam Summit , i.e. Beam Summit North America 2019,

Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-17 Thread Rui Wang
It might be better to keep something like "watermark usually consistently moves forward". But "Elements that arrive with a smaller timestamp than the current watermark are considered late data." has already given the order of late data ts and watermark. -Rui On Thu, Jan 17, 2019 at 1:39 PM Jeff

Re: Where should I locate shared/common Java Code (Java SDK and Java RH both use)?

2019-01-17 Thread Alex Amato
any Java SDK Harness could use* On Thu, Jan 17, 2019 at 1:22 PM Alex Amato wrote: > Just the SDK harness, it doesn't need to be part of the user APIs or > anything. But I want to offer it as a library that any Java SDK could use > to enable state sampling as a low weight estimation of execution

Re: Where should I locate shared/common Java Code (Java SDK and Java RH both use)?

2019-01-17 Thread Alex Amato
Just the SDK harness, it doesn't need to be part of the user APIs or anything. But I want to offer it as a library that any Java SDK could use to enable state sampling as a low weight estimation of execution times, which is useful to implement the execution time metrics. So I don't see a need to

Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-17 Thread Jeff Klukas
Reuven - I don't think I realized it was possible to have late data with the global window, so I'm definitely learning things through this discussion. New suggested wording, then: Elements that arrive with a smaller timestamp than the current watermark are considered late data. That says

Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-17 Thread Reuven Lax
Though it's not tied to window. You could be in the global window, so the watermark never advances past the end of the window, yet still get late data. On Thu, Jan 17, 2019, 11:14 AM Jeff Klukas How about: "Once the watermark progresses past the end of a window, any > further elements that

Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-17 Thread Rui Wang
It is better, and it also fits the following example. -Rui On Thu, Jan 17, 2019 at 11:14 AM Jeff Klukas wrote: > How about: "Once the watermark progresses past the end of a window, any > further elements that arrive with a timestamp in that window are considered > late data." > > On Thu, Jan

Re: [PROPOSAL] Prepare Beam 2.10.0 release

2019-01-17 Thread Maximilian Michels
An issue with the Flink Runner when restarting streaming pipelines: https://jira.apache.org/jira/browse/BEAM-6460 Looks like it will be easy to fix by invalidating the Jackson cache. -Max On 16.01.19 23:00, Kenneth Knowles wrote: Quick update on this. There are three remaining issues:  -

Re: [PROPOSAL] decrease the number of threads for BigQuery streaming insertAll

2019-01-17 Thread Chamikara Jayalath
Thanks Heejong. I agree that writing to a service using 50 unlimited threadpools sounds excessive and can result in flooding that service (BigQuery in this case) in error scenarios where we should back off. Determining a suitable and limited amount of parallelization (50 in this case) sounds good
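To illustrate the idea of a limited amount of parallelization, here is a minimal Python sketch using only the standard library. It is not the BigQuery sink's code: the helper name run_bounded and the peak-tracking wrapper are hypothetical and exist only to show that a fixed-size pool caps how many calls hit a service at once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_bounded(tasks, max_workers):
    """Run callables on a pool of at most max_workers threads.

    Returns (results, peak), where peak is the highest number of tasks
    that were running at the same time. Hypothetical helper for illustration.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def wrapped(task):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return task()
        finally:
            with lock:
                active -= 1

    # The executor never runs more than max_workers tasks concurrently;
    # the remaining tasks wait in its internal queue.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(wrapped, tasks))
    return results, peak
```

Whatever number of tasks is submitted, the observed peak stays at or below max_workers, which is the property that keeps a struggling service from being flooded.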

flink portable runner usage

2019-01-17 Thread Hai Lu
Hi Thomas, This is Hai who works on the portable runner for Samza. I have a few minor questions that I would like to get clarification on from you. We chatted briefly at the last Beam meetup and you mentioned your Flink portable runner (Python) is going into production. So today are you using Beam Python

Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-17 Thread Jeff Klukas
How about: "Once the watermark progresses past the end of a window, any further elements that arrive with a timestamp in that window are considered late data." On Thu, Jan 17, 2019 at 1:43 PM Rui Wang wrote: > Hi Community, > > In Beam programming guide [1], there is a sentence: "Data that
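As a rough illustration of the proposed wording, the Python sketch below treats lateness as a property of the element's window rather than of the raw timestamp. It assumes fixed windows with an exclusive end and integer timestamps; the names FixedWindow, window_for and is_late are hypothetical and are not Beam APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedWindow:
    start: int  # inclusive
    end: int    # exclusive


def window_for(timestamp: int, size: int) -> FixedWindow:
    """Assign a timestamp to the fixed window of the given size that contains it."""
    start = timestamp - (timestamp % size)
    return FixedWindow(start, start + size)


def is_late(timestamp: int, watermark: int, size: int) -> bool:
    """An element is late once the watermark has passed the end of its window."""
    return watermark >= window_for(timestamp, size).end
```

With 60-unit windows, an element stamped 65 belongs to the window [60, 120). It is not late while the watermark is at 100, even though 65 is smaller than 100, and it becomes late once the watermark reaches 120.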

Re: Where should I locate shared/common Java Code (Java SDK and Java RH both use)?

2019-01-17 Thread Kenneth Knowles
Do you want to use it in the core SDK or the SDK harness? Does the state sampling code depend on the core SDK? I'd really like to avoid layers with names like "common". Just delete that layer of the directory and the path hierarchy has exactly the same meaning, but is more concise and clear.

Where should I locate shared/common Java Code (Java SDK and Java RH both use)?

2019-01-17 Thread Alex Amato
I would like to reuse the State Sampling classes from the Dataflow Runner Harness in the Beam Java SDK. I created a Refactoring plan and removed the Dataflow references from the classes in PR/7507

Confusing sentence in Windowing section in Beam programming guide

2019-01-17 Thread Rui Wang
Hi Community, In the Beam programming guide [1], there is a sentence: "Data that arrives with a timestamp after the watermark is considered *late data*" Seems like people get confused by it. For example, see Stack Overflow comment [2]. Basically it makes people think that an event timestamp that is

Re: Connection leaks with PostgreSQL instance

2019-01-17 Thread Jean-Baptiste Onofré
Hi, I don't think we have a connection leak in normal behavior. The actual SQL statement is executed in @FinishBundle, where the connection is closed. The processElement adds records to process. Does it mean that an Exception occurs in the batch addition? Regards JB On 17/01/2019 12:41, Alexey
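The lifecycle described here (records buffered per element, the statement executed and the connection closed at the end of the bundle) can be sketched in Python as follows. This is not JdbcIO's implementation: FakeConnection and BatchingWriter are hypothetical stand-ins, and the try/finally is an assumption about how a failure during the batch could be kept from leaking the connection.

```python
class FakeConnection:
    """Stand-in for a pooled database connection (hypothetical, not a real driver)."""

    open_count = 0  # connections currently open, so a leak is easy to see

    def __init__(self, fail=False):
        self._fail = fail
        self.closed = False
        self.written = []
        FakeConnection.open_count += 1

    def execute_batch(self, records):
        if self._fail:
            raise RuntimeError("batch failed")
        self.written.extend(records)

    def close(self):
        if not self.closed:
            self.closed = True
            FakeConnection.open_count -= 1


class BatchingWriter:
    """Buffers records per element and writes them when the bundle finishes."""

    def __init__(self, connect):
        self._connect = connect  # factory that returns a new connection
        self._batch = []

    def process_element(self, record):
        # Only buffers the record; no database work happens here.
        self._batch.append(record)

    def finish_bundle(self):
        conn = self._connect()
        try:
            conn.execute_batch(self._batch)
        finally:
            # Close even if the batch raised, so no connection stays open.
            conn.close()
            self._batch = []
```

In this sketch a failing batch still leaves FakeConnection.open_count at zero, so an exception during the write would not by itself explain a leak.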

Re: Connection leaks with PostgreSQL instance

2019-01-17 Thread Reuven Lax
But I think it will be called unless the process crashes though. On Thu, Jan 17, 2019, 7:05 AM Ismaël Mejía Couldn't this be related also to the fact that @Teardown is > best-effort in Dataflow? > > On Thu, Jan 17, 2019 at 12:41 PM Alexey Romanenko > wrote: > > > > Kenn, > > > > I’m not sure

Re: Connection leaks with PostgreSQL instance

2019-01-17 Thread Ismaël Mejía
Couldn't this be related also to the fact that @Teardown is best-effort in Dataflow? On Thu, Jan 17, 2019 at 12:41 PM Alexey Romanenko wrote: > > Kenn, > > I’m not sure that we have a connection leak in JdbcIO since new connection is > being obtained from an instance of javax.sql.DataSource

Re: Our jenkins beam1 server is down

2019-01-17 Thread Ismaël Mejía
Thanks Yifan for taking care. On Thu, Jan 17, 2019 at 1:24 AM Yifan Zou wrote: > > Yes, beam14 is offline as well. We're on it. > > On Wed, Jan 16, 2019 at 4:11 PM Ruoyun Huang wrote: >> >> With another try, succeeding on beam10. >> >> Thanks for the fix. >> >> On Wed, Jan 16, 2019 at 3:53 PM

Re: [spark runner based on dataset POC] your opinion

2019-01-17 Thread Etienne Chauchot
Hi Manu, Yes, a JSON schema can make its way to the Spark source with no difficulty, but we still need to store windowedValue, not only the elements that would comply with this schema. The problem is that Spark will try to match the element (windowedValue) to the schema of the source at any element

Re: [spark runner based on dataset POC] your opinion

2019-01-17 Thread Manu Zhang
Nice try, Etienne! Is it possible to pass in the schema through pipeline options? Manu On Thu, Jan 17, 2019 at 5:25 PM Etienne Chauchot wrote: > Hi Kenn, > > Sure, in spark DataSourceV2 providing a schema is mandatory: > - if I set it to null, I obviously get a NPE > - if I set it empty: I

Re: Proposal: Portability SDKHarness Docker Image Release with Beam Version Release.

2019-01-17 Thread Alan Myrvold
+1 This would be great. gcr.io seems like a good option for snapshots due to the permissions from jenkins to upload and ability to keep snapshots around. On Wed, Jan 16, 2019 at 6:51 PM Ruoyun Huang wrote: > +1 This would be a great thing to have. > > On Wed, Jan 16, 2019 at 6:11 PM Ankur

Re: Connection leaks with PostgreSQL instance

2019-01-17 Thread Alexey Romanenko
Kenn, I’m not sure that we have a connection leak in JdbcIO since new connection is being obtained from an instance of javax.sql.DataSource (created in @Setup) and which is org.apache.commons.dbcp2.BasicDataSource by default. BasicDataSource uses connection pool and closes all idle connections

Re: [spark runner based on dataset POC] your opinion

2019-01-17 Thread Etienne Chauchot
Hi Kenn, Sure, in Spark DataSourceV2 providing a schema is mandatory: - if I set it to null, I obviously get an NPE - if I set it empty: I get an array out of bounds exception - if I set it to Datatype.Null, null is stored as actual elements => Consequently I set it to binary. As the beam reader is