Re: Query Beam internal state

2019-11-04 Thread Luke Cwik
This has come up for Dataflow customers as well where people would like to directly serve the content that is stored in state and I'm not aware of any current plans to expose this from Beam at the moment. You could implement this yourself by writing the value to state and also writing the value

Re: Deprecate some or all of TestPipelineOptions?

2019-11-08 Thread Luke Cwik
ven worth to remove it, > bad part is that this class resides in 'sdks/core/main/java' and not > in testing as I imagined so this could count as a 'breaking' change. > > On Thu, Nov 7, 2019 at 8:27 PM Luke Cwik wrote: > > > > There was issue with asynchrony of p.run(), some ru

Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness

2019-11-08 Thread Luke Cwik
On Thu, Nov 7, 2019 at 7:36 PM Kenneth Knowles wrote: > > > On Thu, Nov 7, 2019 at 9:19 AM Luke Cwik wrote: > >> I did suggest one other alternative on Jincheng's PR[1] which was to >> allow windowless values to be sent across the gRPC port. The SDK would then >>

Re: New Contributor

2019-11-08 Thread Luke Cwik
Welcome, I have added you as a contributor. On Fri, Nov 8, 2019 at 10:16 AM Andrew Crites wrote: > It's crites. Thanks! > > On Thu, Nov 7, 2019 at 3:06 PM Kyle Weaver wrote: > >> Can you please share your Jira username? >> >> On Thu, Nov 7, 2019 at 3:04 PM Andrew Crites >> wrote: >> >>> This

Re: Contributor permission for Beam Jira tickets

2019-11-08 Thread Luke Cwik
Welcome, I have added you as a contributor and assigned BEAM-8579 to you. On Thu, Nov 7, 2019 at 3:14 PM Changming Ma wrote: > Oh, one more thing: my jira account name is: cmma > > > > On Thu, Nov 7, 2019 at 3:04 PM Changming Ma wrote: > >> Hi, >> This is Changming, a SWE with Google. I'm

Re: [Discuss] Beam mascot

2019-11-08 Thread Luke Cwik
My top suggestion is a cuttlefish. On Thu, Nov 7, 2019 at 10:28 PM Reza Rokni wrote: > Salmon... they love streams? :-) > > On Fri, 8 Nov 2019 at 12:00, Kenneth Knowles wrote: > >> Agree with Aizhamal that it doesn't matter if they are taken if they are >> not too close in space to Beam:

Re: Detecting resources to stage

2019-11-08 Thread Luke Cwik
I believe the closest suggestion[1] we had that worked for Java 11 and maintained backwards compatibility was to use the URLClassLoader to infer the resources and if we couldn't do that then look at the java.class.path system property to do the inference otherwise fail and force the users to tell

Re: Questions about the current and future design of the job service message stream

2019-11-08 Thread Luke Cwik
+Daniel Mills for usability in job messages / logging integration across Beam runners. On Wed, Nov 6, 2019 at 10:30 AM Chad Dombrova wrote: > Hi all, > I’ve been working lately on improving the state stream and message stream > on the job service (links to issues and PRs below), and I’m

Re: [Discuss] Beam mascot

2019-11-11 Thread Luke Cwik
o:kcwea...@google.com>> wrote: > >> > >> Re fish: The authors of the Streaming Systems went with trout, but > >> the book mentioned a missed opportunity to make their cover a "robot > >> dinosaur with a Scottish accent." Perhaps tha

Re: Any chance I could help do code reviews in beam

2019-11-11 Thread Luke Cwik
What is your GitHub user id so people can tag you as a reviewer? On Mon, Nov 11, 2019 at 11:02 AM Brandon Pollack wrote: > I might have some off time once in a while and could use a change in > pace sometimes, I wouldn't be able to do much but Luke said you guys could use > the help? > > -- >

Re: Questions about the current and future design of the job service message stream

2019-11-11 Thread Luke Cwik
On Sun, Nov 10, 2019 at 5:06 PM Chad Dombrova wrote: > Hi, > >> You can see that each JobMessagesResponse may contain a message *or* a >>> GetJobStateResponse. >>> >>> What’s the intention behind this design? >>> >> I believe this was because a user may want to listen to both job state >> and

Re: contributor permission for Beam Jira tickets

2019-11-11 Thread Luke Cwik
Welcome, I have added you as a contributor. On Mon, Nov 11, 2019 at 8:38 AM Рустам Халмурзаев wrote: > Hi, > > This is Rustam Khalmurzaev (username Rustam_Kh). I'm studying Spark and > Beam and trying to use Python SDK. Can someone add me as a contributor for > Beam's Jira issue tracker? I

Re: New Contributor

2019-11-09 Thread Luke Cwik
Welcome, I have added you as a contributor. On Fri, Nov 8, 2019 at 3:14 PM Yang Zhang wrote: > Hello Beam community, > > This is Yang from LinkedIn. I am closely working with Xinyu on adopting > Beam SQL in LinkedIn. Can someone add me as a contributor for Beam's Jira > issue tracker? I would

Re: Getting contributor permission to JIRA

2019-11-07 Thread Luke Cwik
Welcome, I have added you as a contributor and assigned BEAM-8575 to you. On Wed, Nov 6, 2019 at 5:37 PM Wenjia Liu wrote: > Hi, > > This is Wendy from Google. I'm contributing to adding more tests for Beam > Python. Could anyone add me as a contributor for JIRA? I'd like to assign > this issue

Re: [DISCUSS] Avoid redundant encoding and decoding between runner and harness

2019-11-07 Thread Luke Cwik
I did suggest one other alternative on Jincheng's PR[1] which was to allow windowless values to be sent across the gRPC port. The SDK would then be responsible for ensuring that the execution didn't access any properties that required knowledge of the timestamp, pane or window. This is different

Re: Contributing to Beam javadoc

2019-11-07 Thread Luke Cwik
Welcome and I just merged your PR. On Wed, Nov 6, 2019 at 1:15 PM Ismaël Mejía wrote: > Done, you can now self assign issues too, welcome Jonathan! > > On Wed, Nov 6, 2019 at 10:00 PM Jonathan Alvarez-Gutierrez > wrote: > > > > Hey, > > > > I just filed

Re: 10,000 Pull Requests

2019-11-07 Thread Luke Cwik
We need more committers... that review the code. On Wed, Nov 6, 2019 at 6:21 PM Pablo Estrada wrote: > iiipe : ) > > On Thu, Nov 7, 2019 at 12:59 AM Kenneth Knowles wrote: > >> Awesome! >> >> Number of days from PR #1 and PR #1000: 211 >> Number of days from PR #9000 and PR #1: 71

Re: Key encodings for state requests

2019-11-05 Thread Luke Cwik
+1 to what Robert said. On Tue, Nov 5, 2019 at 2:36 PM Robert Bradshaw wrote: > The Coder used for State/Timers in a StatefulDoFn is pulled out of the > input PCollection. If a Runner needs to partition by this coder, it > should ensure the coder of this PCollection matches with the Coder >

Re: Permission to contribute on LZO compression enablement for Beam Java SDK

2019-11-07 Thread Luke Cwik
Welcome, I have added you as a contributor and assigned the ticket to you. On Thu, Nov 7, 2019 at 4:21 AM Amogh Tiwari wrote: > Hi, > > I would like to contribute on enabling Apache Beam's java SDK to work with > LZO compression. Please add me as a contributor so that I can work on this. > I've

Re: why are so many transformation needed for a simple TextIO.write() operation

2019-10-30 Thread Luke Cwik
A lot of the logic is around handling various error scenarios. You should notice that the majority of that graph is about passing around metadata about which files were written and what errors occurred. That metadata is tiny in comparison and should only be a blip when compared to writing the

Re: RFC: python static typing PR

2019-10-30 Thread Luke Cwik
+1 for type annotations. On Mon, Oct 28, 2019 at 7:41 PM Robert Burke wrote: > As someone who cribs from the Python SDK to make changes in the Go SDK, > this will make things much easier to follow! Thank you. > > On Mon, Oct 28, 2019, 6:52 PM Chad Dombrova wrote: > >> >> Wow, that is an

Re: [discuss] Using a logger hierarchy in Python

2019-11-13 Thread Luke Cwik
That doesn't seem like a very invasive change so if we adopt it we should adopt it everywhere in the same CL so people see the common pattern and use it. I'm for using a named logger and would rather that it is per class instead of per module since many of the modules have lots of classes but +1
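A minimal sketch of the two naming schemes being compared, using only the standard `logging` module. The module path `apache_beam.io.example` and the class `ExampleSink` are hypothetical names picked for illustration and are not taken from the Beam codebase.

```python
import logging

# Hypothetical module path, standing in for what __name__ would be inside a module.
MODULE_NAME = "apache_beam.io.example"

# Per-module logger: one logger shared by everything defined in the module.
module_logger = logging.getLogger(MODULE_NAME)


class ExampleSink:
    """Hypothetical class that owns a per-class logger."""

    def __init__(self):
        # Per-class logger: a child of the module logger in the dotted hierarchy.
        self._log = logging.getLogger(f"{MODULE_NAME}.{type(self).__name__}")

    def write(self, record):
        self._log.debug("writing %r", record)
        return self._log.name


# A level set on an ancestor applies to descendants that set no level themselves.
logging.getLogger("apache_beam").setLevel(logging.WARNING)
sink = ExampleSink()
```

Because logger names are dotted paths, a per-class logger still inherits whatever is configured on the module or package logger, so the finer granularity does not cost any central control.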

Re: Date/Time Ranges & Protobuf

2019-11-13 Thread Luke Cwik
he global window are all >> performance hacks in my view. Timestamps in beam are really a tagged union: >> > >> > timestamp ::= min | max | end_of_global | actual_time(... some >> quantitative timestamp ...) >> > >> > with the ordering >> > >

Re: Type of builtin PTransform/PCollection metrics

2019-11-13 Thread Luke Cwik
Are you referring specifically to? * beam:metric:element_count:v1 * beam:metric:pardo_execution_time:start_bundle_msecs:v1 * beam:metric:pardo_execution_time:process_bundle_msecs:v1 * beam:metric:pardo_execution_time:finish_bundle_msecs:v1 * beam:metric:ptransform_execution_time:total_msecs:v1

Re: Make environment_id a top level attribute of PTransform

2019-11-13 Thread Luke Cwik
The original idea was that only the things that needed to set the attribute would contain it, but once something becomes common enough it makes sense to have it as an optional parameter, so +1. Are there areas where the environment id will still exist outside of a PTransform?

Re: Why is Pipeline not Serializable and can it be changed to be Serializable

2019-11-14 Thread Luke Cwik
You should create placeholders inside of your Twister2/OpenMPI implementation that represent these functions and then instantiate actual instances of them on the workers if you want to write your own pipeline representation and format for OpenMPI/Twister2. Or consider converting the pipeline to

Re: Date/Time Ranges & Protobuf

2019-11-14 Thread Luke Cwik
eap years or seconds? If we were to make our own timestamp > format, would we have to worry about that? Or is the timestamp supplied to > Beam a property of the underlying system giving Beam the timestamp? If it > is, then there may be some interop problems between sources. > >

Date/Time Ranges & Protobuf

2019-11-11 Thread Luke Cwik
While crites@ was investigating using protobuf to represent Apache Beam timestamps within the TestStreamEvents, he found out that the well known type google.protobuf.Timestamp doesn't support certain timestamps we were using in our tests (specifically the max timestamp that Apache Beam supports).
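A rough sketch of the range mismatch, in plain Python. The bounds for `google.protobuf.Timestamp` are the documented 0001-01-01T00:00:00Z through 9999-12-31T23:59:59Z. The Beam maximum used here is an assumption (the largest signed 64-bit count of microseconds since the epoch); the message above does not state the exact value.

```python
from datetime import datetime, timezone

# Documented bounds of google.protobuf.Timestamp, in whole seconds since the epoch.
PROTO_MIN_SECONDS = int(datetime(1, 1, 1, tzinfo=timezone.utc).timestamp())
PROTO_MAX_SECONDS = int(
    datetime(9999, 12, 31, 23, 59, 59, tzinfo=timezone.utc).timestamp()
)

# Assumption for illustration: Beam's max timestamp is 2**63 - 1 microseconds.
ASSUMED_BEAM_MAX_MICROS = 2**63 - 1
assumed_beam_max_seconds = ASSUMED_BEAM_MAX_MICROS // 1_000_000


def fits_in_proto_timestamp(seconds):
    """Return True if a seconds-since-epoch value is inside the protobuf range."""
    return PROTO_MIN_SECONDS <= seconds <= PROTO_MAX_SECONDS
```

Under that assumption the Beam maximum lies far beyond year 9999, so it cannot round-trip through the well-known type without clamping or a separate encoding.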

Re: [Discuss] Beam mascot

2019-11-11 Thread Luke Cwik
elated to anything, but chinchillas are also cute. >> >> On Mon, Nov 11, 2019 at 8:25 AM Luke Cwik wrote: >> >>> 9 and 7 for me (in that order) >>> >>> On Mon, Nov 11, 2019 at 7:18 AM Maximilian Michels >>> wrote: >>> >>>>

Re: How to unsubscribe the Apache projects and jira issues notification

2019-11-15 Thread Luke Cwik
https://apache.org/foundation/mailinglists.html#request-addresses-for-unsubscribing If you want to subscribe to l...@apache.org then you need to send a message to list-subscr...@apache.org To get off a list, send a message to list-unsubscr...@apache.org On Fri, Nov 15, 2019 at 2:40 AM P.

Re: Why is Pipeline not Serializable and can it be changed to be Serializable

2019-11-15 Thread Luke Cwik
sue of not having default constructors. >> >> I also initially considered converting the pipeline into a JSON format >> and sending that over to the workers. Will take a look at the option you >> have mentioned since we do plan to implement a Portable pipeline runner for >

Re: Deprecate some or all of TestPipelineOptions?

2019-11-07 Thread Luke Cwik
There was an issue with the asynchrony of p.run(): some runners blocked in p.run() until the pipeline was complete, which was never the intent. The test timeout option makes sense to configure per runner (since Dataflow takes a lot longer than other runners), but we may be able to

Re: Date/Time Ranges & Protobuf

2019-11-18 Thread Luke Cwik
our own >>> timestamp format, would we have to worry about that? Or is the timestamp >>> supplied to Beam a property of the underlying system giving Beam the >>> timestamp? If it is, then there may be some interop problems between >>> sources. >>>

Re: GCP libraries up-to-date versions in Java

2019-11-22 Thread Luke Cwik
On Wed, Nov 20, 2019 at 4:52 PM Luke Cwik wrote: > >> I took a look at the linkage checker and have opened up this PR[1] to >> allow contributors to aid in performing dependency analysis within Apache >> Beam during upgrades. >> >> The current PR works by co

Re: Request for review of PR [Beam-8564]

2019-12-04 Thread Luke Cwik
> Is there a way to wrap this up as an optional dependency with multiple >> possible providers, if there's no good library satisfying all of the >> conditions (in particular (1))? >> >> On Tue, Dec 3, 2019 at 9:47 AM Luke Cwik wrote: >> > >> > I was h

Re: Request for review of PR [Beam-8564]

2019-12-03 Thread Luke Cwik
this on airlift:aircompressor. > > Thanks and Regards, > Amogh > > > > On Tue, Dec 3, 2019 at 2:59 AM Luke Cwik wrote: > >> I took a look. My biggest concern is finding a good LZO implementation. >> Looking for one that preferably has: >> 1) Apache licens

Re: Python staging file weirdness

2019-12-04 Thread Luke Cwik
ckages: > https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L161 > > > On Wed, Dec 4, 2019 at 6:19 PM Luke Cwik wrote: > >> Is there a way to use a cache on disk that is separate from th

Re: Contributor permission for Beam Jira tickets

2019-12-04 Thread Luke Cwik
Welcome, I have added you as a contributor. On Wed, Dec 4, 2019 at 4:19 PM Esun Kim wrote: > Hi, > > This is Esun Kim from Google. I'm working on GCS connector of beam IO. > Can you add me as a contributor for Beam's Jira issue tracker? My Jira ID > is veblush. > > Regards, > Esun. > >

Re: Python staging file weirdness

2019-12-04 Thread Luke Cwik
Is there a way to use a cache on disk that is separate from the set of packages we use as requirements? On Wed, Dec 4, 2019 at 5:58 PM Udi Meiri wrote: > Thanks! > Another reason to periodically refresh workers. > > On Wed, Nov 27, 2019 at 10:37 PM Valentyn Tymofieiev > wrote: > >> Tests job

Re: [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2019-12-17 Thread Luke Cwik
+1 (binding) On Mon, Dec 16, 2019 at 1:18 PM Chamikara Jayalath wrote: > +1 (non-binding) > > On Mon, Dec 16, 2019 at 1:12 PM Mark Liu wrote: > >> +1 >> >> On Mon, Dec 16, 2019 at 11:31 AM Daniel Oliveira >> wrote: >> >>> +1 (non-binding) >>> >>> On Sat, Dec 14, 2019 at 5:24 PM Kyle Weaver

Re: Root logger configuration

2019-12-17 Thread Luke Cwik
In Beam Java, the expectation has always been that pipeline authors are responsible for setting up logging correctly during pipeline construction time and that the Beam SDK is responsible for setting up logging at pipeline execution time. Is this something we can solve by documenting and telling

Re: Request for review of PR [Beam-8564]

2019-12-17 Thread Luke Cwik
d in beam. This will solve both #2 and #3 as the transitive > dependency will be removed and the size will also be reduced by almost > ~20mbs. > > But if we use this approach, we will have to manually change the util > whenever any changes are made to the airlift library. > > On We

Re: Python staging file weirdness

2019-12-05 Thread Luke Cwik
ip* > -rw-rw-r-- 1 jenkins jenkins 852264 Nov 25 20:55 *setuptools-42.0.1.zip* > -rw-rw-r-- 1 jenkins jenkins 858444 Dec 1 18:12 *setuptools-42.0.2.zip* > -rw-rw-r-- 1 jenkins jenkins 32725 Sep 6 21:38 *six-1.12.0.tar.gz* > -rw-rw-r-- 1 jenkins jenkins 33726 Nov 5 19:18 *six-1.13.

Re: [DISCUSS] How to stopp SdkWorker in SdkHarness

2019-10-28 Thread Luke Cwik
ote on this discussion or I can create JIRAs and submit the > PRs directly? > > Best, > Jincheng > > Luke Cwik wrote on Sat, Oct 26, 2019 at 4:01 AM: > >> Approach 3 is about caching the bundle descriptor forever but tearing >> down a "live" instance of the DoFns

Re: [DISCUSS] How to stopp SdkWorker in SdkHarness

2019-10-21 Thread Luke Cwik
Approach 2 is currently the suggested approach[1] for DoFns to shut down. Note that SDK harnesses can terminate instances at any time and start new instances at any time as well. Why do you want to expose this logic so that Runners can control it? 1:

Re: Are empty bundles allowed by model?

2019-10-21 Thread Luke Cwik
Yes, please update the test. On Mon, Oct 21, 2019 at 11:20 AM Jan Lukavský wrote: > Hi Robert, > > I thought it would be that case. ParDoLifecycleTest, however, does not > currently allow for empty bundles. We have currently worked around this > in Flink by avoiding the creation of these

Re: Proposal: Dynamic timer support (BEAM-6857)

2019-10-29 Thread Luke Cwik
Based upon the current description, from the portability perspective we could: Update the timer spec map comment[1] to be: // (Optional) A mapping of local timer families to timer specifications. map timer_specs = 5; And update the timer coder to have the timer id[2]: // Encodes a timer

Re: aggregating over triggered results

2019-10-29 Thread Luke Cwik
You should first try the obvious answer of using a sliding window of 30 days every 10 minutes before you try the 60 days every 30 days. Beam has some optimizations which will assign a value to multiple windows and only process that value once even if it's in many windows. If that doesn't perform
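To get a feel for how many windows each value lands in with "30 days every 10 minutes", here is a small plain-Python sketch. It does not use the Beam API; it assumes windows are half-open intervals `[start, start + size)` whose starts fall on multiples of the period, and the function name is made up for illustration.

```python
from datetime import timedelta


def sliding_window_starts(timestamp_s, size_s, period_s):
    """Return the start (in seconds) of every sliding window containing timestamp_s.

    Assumes windows are [start, start + size) with starts at multiples of period.
    """
    # Latest window start that is not after the timestamp.
    start = timestamp_s - (timestamp_s % period_s)
    starts = []
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp_s - size_s:
        starts.append(start)
        start -= period_s
    return starts


SIZE = int(timedelta(days=30).total_seconds())
PERIOD = int(timedelta(minutes=10).total_seconds())
EVENT_TIME = 1_572_566_400  # 2019-11-01T00:00:00Z, an arbitrary example timestamp
windows = sliding_window_starts(EVENT_TIME, SIZE, PERIOD)
```

With these sizes every element belongs to 30 * 24 * 6 = 4320 windows, which is why the optimization that processes a value once for many windows matters so much here.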

Re: Java 11 compatibility question

2019-10-18 Thread Luke Cwik
There are some changes with Java where the system class loader is no longer a URL class loader[1]. Also reflection is changing such that non-public fields/methods aren't accessible which we (or our dependencies) may be doing. Not sure how our usage of bytecode generation/proxies will need to

Re: Python SDK timestamp precision

2019-10-18 Thread Luke Cwik
Robert, it seems like you're for Plan A. Assuming we go forward with nanosecond precision and based upon your analysis in 3), wouldn't that mean we would have to make a breaking change to the Java SDK to swap to nanosecond precision? On Fri, Oct 18, 2019 at 11:35 AM Robert Bradshaw wrote: > TL;DR: We

Re: Multiple Outputs from Expand in Python

2019-10-25 Thread Luke Cwik
I believe PCollectionTuple should be unnecessary since Python has first class support for tuples as shown in the example below[1]. Can we use tuples to solve your issue? wordsStartingWithA = \ p | 'Words starting with A' >> beam.Create(['apple', 'ant', 'arrow']) wordsStartingWithB = \ p

Re: [DISCUSS] How to stopp SdkWorker in SdkHarness

2019-10-25 Thread Luke Cwik
conflict with the proposed Approach 1 as the SDK harness could decide what >>>> to do when receiving the teardown request. It could do nothing if the DoFns >>>> have already been torn down and could also tear down the DoFns if needed. >>>> >>>> What do you th

Re: DynamoDBIO related issue

2019-10-25 Thread Luke Cwik
If you create a JIRA account and share your user id with us, we will grant you contributor access which will allow you to create a JIRA issue. Please take a look at our contribution guide; it mentions how to connect with the Beam community, including creating a JIRA account[1]. 1:

Re: [DISCUSS] How to stopp SdkWorker in SdkHarness

2019-10-25 Thread Luke Cwik
wn, but I think the idea to trigger this logic when the SDK >> > Harness evicts process bundle descriptors is more elegant. >> > >> > Thanks, >> > Max >> > >> > On 25.10.19 17:23, Luke Cwik wrote: >> > > I like approach 3 since it doesn't add additi

Re: GCP libraries up-to-date versions in Java

2019-11-20 Thread Luke Cwik
Minor note that Gradle 5 added support for BOMs[1]. I think attempting to perform the upgrade (whether to use BOM or not) will be a concerted effort every time to minimize the amount of breakage to users while maximizing compatibility with the OSS ecosystem. Unfortunately I'm not aware of any

Re: Portable runner bundle scheduling (Streaming/Python/Flink)

2019-11-20 Thread Luke Cwik
Dataflow has run into this issue as well. Dataflow has "work items" that are converted into bundles that are executed on the SDK. Each work item does a greedy assignment to the SDK worker with the fewest work items assigned. As you surmised, we use SDF splitting in batch pipelines to balance work.
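The greedy rule described above ("assign to the SDK worker with the fewest work items") can be sketched in a few lines of plain Python. This only illustrates the rule and is not Dataflow's implementation; the function name, worker ids and item names are invented, and the sketch ignores work items completing over time.

```python
import heapq


def assign_work_items(work_items, worker_ids):
    """Greedily give each work item to the worker with the fewest assigned items."""
    # Min-heap of (assigned_count, worker_id); ties break on worker id.
    heap = [(0, worker) for worker in worker_ids]
    heapq.heapify(heap)
    assignment = {worker: [] for worker in worker_ids}
    for item in work_items:
        count, worker = heapq.heappop(heap)
        assignment[worker].append(item)
        heapq.heappush(heap, (count + 1, worker))
    return assignment


items = [f"item-{i}" for i in range(7)]
assignment = assign_work_items(items, ["sdk-worker-a", "sdk-worker-b", "sdk-worker-c"])
```

Counting items says nothing about how expensive each one is, which is where splitting large pieces of work (as with SDF splitting in batch) helps even out the load.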

Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-20 Thread Luke Cwik
[ ] Beaver [ ] Hedgehog [ ] Lemur [ ] Owl [ ] Salmon [ ] Trout [ ] Robot dinosaur [ ] Firefly [X] Cuttlefish [ ] Dumbo Octopus [X] Angler fish On Wed, Nov 20, 2019 at 10:08 AM Eugene Kirpichov wrote: > [ ] Beaver > [ ] Hedgehog > [X] Lemur > [X] Owl > [ ] Salmon > [ ] Trout > [ ] Robot dinosaur

Re: cython test instability

2019-11-26 Thread Luke Cwik
I also started to see this on PRs that I'm reviewing. BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with setup.py and egg_info but this looks different than all of those, so I filed BEAM-8831. On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova wrote: > Actually, it looks like I'm

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Luke Cwik
they >> are ready to be stable in terms of API? Perhaps, this topic deserves a new >> discussion if there are several opinions on that. >> >> On 26 Nov 2019, at 00:39, Luke Cwik wrote: >> >> Phase I sounds fine. >> >> Apache Beam follows semant

Re: Detecting resources to stage

2019-11-27 Thread Luke Cwik
using URLClassLoader, and after that, the job worked on >>> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and >>> classgraph doesn't have any dependencies. I'm wondering if we can relocate >>> it to java-core jar and use in for non-URLClassLoad

Re: Request for review of PR [Beam-8564]

2019-12-02 Thread Luke Cwik
I took a look. My biggest concern is finding a good LZO implementation. Looking for one that preferably has: 1) Apache license 2) Has zero transitive dependencies 3) Is small 4) Is performant 5) Is native java or supports execution on the three main OSs (Windows, Linux, Mac) In your PR you

Re: Python SDK timestamp precision

2019-10-29 Thread Luke Cwik
I would also suggest using Java's Instant since it will be compatible with many more date/time libraries without forcing onto users the need to go through an artificial millis/nanos conversion layer to Java's Instant. On Tue, Oct 29, 2019 at 5:06 PM Robert Bradshaw wrote: > On Tue, Oct 29, 2019

Re: GCP libraries up-to-date versions in Java

2019-11-20 Thread Luke Cwik
I took a look at the linkage checker and have opened up this PR[1] to allow contributors to aid in performing dependency analysis within Apache Beam during upgrades. The current PR works by compiling and publishing all the Java artifacts to your local maven repo and then runs the linkage checker

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-25 Thread Luke Cwik
Phase I sounds fine. Apache Beam follows semantic versioning and I believe removing the IOs will be a backwards incompatible change unless they were marked experimental which will be a problem for Phase 2. What is the feasibility of making the V1 transforms wrappers around V2? On Mon, Nov 25,

Re: Do we know why gradle scans are not working?

2019-10-10 Thread Luke Cwik
The Gradle forums[1] suggested that we will need to downgrade the build scan plugin to 2.3 to get the build scans working again. Tested it locally and it worked. Filed BEAM-8378 and opened pr/9762 with the downgrade. 1: https://discuss.gradle.org/t/your-build-scan

Re: Contributor Permissions

2019-10-10 Thread Luke Cwik
Welcome, I have added you as a contributor. On Thu, Oct 10, 2019 at 2:02 PM Igor Durovic wrote: > Hi! > > I'm Igor Durovic, an intern at LinkedIn. I'm working on Samza runner and > interactive beam. My JIRA username is idurovic. > > Thanks, > Igor Durovic >

Re: Support for LZO compression.

2019-10-08 Thread Luke Cwik
k it as optional. 1: https://www.apache.org/legal/resolved.html#category-x 2: https://www.apache.org/legal/resolved.html#optional On Tue, Oct 8, 2019 at 3:51 PM Luke Cwik wrote: > Which GPL version? > > The Apache License 2.0 is compatible with GPL 3[1] > > 1: https://www.apache.

Re: Support for LZO compression.

2019-10-08 Thread Luke Cwik
Which GPL version? The Apache License 2.0 is compatible with GPL 3[1] 1: https://www.apache.org/foundation/license-faq.html#GPL On Tue, Oct 8, 2019 at 2:10 PM Sameer Abhyankar wrote: > Hi All, > > We were looking to add an IO that would read LZO compressed binaries from > a supported

Re: [portability] Removing the old portable metrics API...

2019-10-09 Thread Luke Cwik
One way would be to report both so this way we don't need to update the Dataflow Java implementation but other runners using the new API get all the metrics. On Mon, Oct 7, 2019 at 10:00 AM Robert Bradshaw wrote: > Yes, Dataflow still uses the old API, for both counters and for its >

Python thread pool executor for Apache Beam

2019-10-11 Thread Luke Cwik
I'm looking for a thread pool that re-uses threads that are idle before creating new ones and has an API that is compatible with the concurrent.futures ThreadPoolExecutor[1]. To my knowledge, the concurrent.futures ThreadPool creates new threads for tasks up until the thread pool limit before

Re: Unifying Build/contributing instructions

2019-12-19 Thread Luke Cwik
+1 on Kenn's suggestion. On Thu, Dec 12, 2019 at 8:17 PM Kenneth Knowles wrote: > Thanks for taking this on! My preference would be to have CONTRIBUTING.md > link to https://beam.apache.org/contribute/contribution-guide/ and focus > work on the latter. > > Kenn > > On Thu, Dec 12, 2019 at 12:38

Re: Is org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders supposed to be supported ?

2019-12-19 Thread Luke Cwik
I'm pretty sure that Flatten with different coders is well defined. input: List<PCollection<T>> output: PCollection<T> When flatten is executed using T vs encoded(T), transcoding can be optimized because the coder for the output PCollection is assumed to be able to encode all T's. The DirectRunner specifically

Re: BEAM-8989 fix for 2.18.0 release

2019-12-19 Thread Luke Cwik
Either Salman Raza who developed the PR or Reuven Lax who reviewed it would have the most context. I don't know Salman's contact information though. On Thu, Dec 19, 2019 at 10:18 AM Udi Meiri wrote: > The JIRA issue was assigned to me, but I have no background in the issue. > Who would be the

Re: External transform API in Java SDK

2019-12-20 Thread Luke Cwik
What do side inputs look like? On Thu, Dec 19, 2019 at 4:39 PM Heejong Lee wrote: > I wanted to know if anybody has any comment on external transform API for > Java SDK. > > `External.of()` can create external transform for Java SDK. Depending on > input and output types, two additional methods

Re: Need Help | SpannerIO

2019-12-18 Thread Luke Cwik
How do you want to use the previous data in the SpannerIO.read()? Are you trying to perform a join on a key between two PCollections? If so, please use CoGroupByKey[1]. Are you trying to merge two PCollection objects? If so, please use Flatten[2]. 1:
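The difference between the two suggestions can be illustrated without Beam at all. The sketch below uses plain Python lists as stand-ins for PCollections; `co_group_by_key` and `flatten` are hypothetical helpers that mimic the shape of the results, not the Beam transforms themselves, and the sample rows are invented.

```python
from collections import defaultdict


def co_group_by_key(**tagged_inputs):
    """Join keyed inputs: for each key, collect the values seen under each tag."""
    grouped = defaultdict(lambda: {tag: [] for tag in tagged_inputs})
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            grouped[key][tag].append(value)
    return dict(grouped)


def flatten(*collections):
    """Merge inputs of the same element type into one collection."""
    merged = []
    for collection in collections:
        merged.extend(collection)
    return merged


previous = [("user1", "old-row")]
current = [("user1", "new-row"), ("user2", "new-row-2")]

joined = co_group_by_key(previous=previous, current=current)
merged = flatten(previous, current)
```

A join keeps track of which input each value came from per key, while a merge simply yields one larger collection, so the right choice depends on whether the previous data needs to be matched against the new data or just processed alongside it.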

Re: [VOTE] Release 2.17.0, release candidate #2

2019-12-18 Thread Luke Cwik
I verified the release and ran the quickstarts and found that release 2.16 broke Apache Nemo runner which is also an issue for 2.17.0 RC #2. It is caused by a backwards incompatible change in ParDo.MultiOutput where getSideInputs return value was changed from List to Map as part of

Re: [Proposal] Slowly Changing Dimensions and Distributed Map Side Inputs (in Dataflow)

2019-12-18 Thread Luke Cwik
Most of the doc is about how to support distributed side inputs in Dataflow and doesn't really cover how the Beam model (accumulating, discarding, retraction) triggers impact what are the "contents" of a PCollection in time and how this proposal for a limited set of side input shapes can work to

Re: Jenkins jobs not running for my PR 10438

2019-12-20 Thread Luke Cwik
I'm also affected by this. On Fri, Dec 20, 2019 at 10:13 AM Tomo Suzuki wrote: > Hi Beam developers, > > Does anybody know why my PR does not trigger Jenkins jobs today? > https://github.com/apache/beam/pull/10438 > > -- > Regards, > Tomo >

Re: BEAM-8758: Code Review Wanted for PR 10765

2020-02-10 Thread Luke Cwik
I took a look, left relevant comments. On Mon, Feb 10, 2020 at 12:26 PM Tomo Suzuki wrote: > Hi Udi, Luke, and Beam committers, > > Would you review/merge this google-cloud-spanner dependency upgrade? > https://github.com/apache/beam/pull/10765 > > -- > Regards, > Tomo >

Re: Upgrades gcsio to 2.0.0

2020-02-10 Thread Luke Cwik
What prevents the usage of the newer version of Guava? On Mon, Feb 10, 2020 at 2:28 PM Esun Kim wrote: > Hi Beam Developers, > > I'm working on pr/10769 which > upgrades gcsio from 1.9.16 to 2.0.0 which is an intermediate step to get us > to use gcsio

Re: FnAPI proto backwards compatibility

2020-02-13 Thread Luke Cwik
On Wed, Feb 12, 2020 at 2:24 PM Kenneth Knowles wrote: > > > On Wed, Feb 12, 2020 at 12:04 PM Robert Bradshaw > wrote: > >> On Wed, Feb 12, 2020 at 11:08 AM Luke Cwik wrote: >> > >> > We can always detect on the runner/SDK side whether there is an unknow

Re: daily dataflow job failing today

2020-02-12 Thread Luke Cwik
+dev There was recently an update to add autoformatting to the Python SDK[1]. I'm seeing this during testing of a PR as well. 1: https://lists.apache.org/thread.html/448bb5c2d73fbd74eec7aacb5f28fa2f9d791784c2e53a2e3325627a%40%3Cdev.beam.apache.org%3E On Wed, Feb 12, 2020 at 9:57 AM Alan

Re: FnAPI proto backwards compatibility

2020-02-12 Thread Luke Cwik
On Wed, Feb 12, 2020 at 7:57 AM Robert Bradshaw wrote: > On Tue, Feb 11, 2020 at 7:25 PM Kenneth Knowles wrote: > > > > On Tue, Feb 11, 2020 at 8:38 AM Robert Bradshaw > wrote: > >> > >> On Mon, Feb 10, 2020 at 7:35 PM Kenneth Knowles > wrote: > >> > > >> > On the runner requirements side: if

Re: Request to be added to maintainters in Jira.

2020-02-12 Thread Luke Cwik
What is your JIRA id? Also, note that there is an ongoing issue that prevents many people from running tests themselves on their PRs[1] and requires asking on the dev@ mailing list for someone with the appropriate set of permissions to launch the tests for you. 1:

Re: KafkaIO to read from regex topic

2020-02-24 Thread Luke Cwik
I have been working on getting unbounded SDFs working within Beam over portability so if you are interested in writing an SDF KafkaIO implementation, I would be interested. On Mon, Feb 24, 2020 at 7:34 AM Alexey Romanenko wrote: > Hi Maulik, > > For the moment, KafkaIO doesn’t support reading

Re: Request to be added as contributor

2020-02-24 Thread Luke Cwik
Please provide your user id associated with your JIRA account. On Mon, Feb 24, 2020 at 8:38 AM Emiliano Capoccia wrote: > Hi, > > This is Emiliano from JP Morgan Chase. I develop applications in Beam and > run them on a Spark cluster on Kubernetes. Can someone add me as a > contributor for

Re: Custom 2.20 failing on Dataflow: what am I doing wrong?

2020-02-24 Thread Luke Cwik
I look at the Gradle task definition to see what additional flags are being passed whenever I try to rerun/repro an issue. Many of our integration tests require additional flags/experiments which are unique to each runner. On Wed, Feb 19, 2020 at 9:15 AM Alex Van Boxel wrote: >

Re: [VOTE] Vendored Dependencies Release Byte Buddy 1.10.8

2020-02-25 Thread Luke Cwik
-1 The jar contains META-INF/versions/9/module-info.class copied over from bytebuddy containing: module net.bytebuddy { requires static java.instrument; requires static jdk.unsupported; requires static net.bytebuddy.agent; exports net.bytebuddy; exports net.bytebuddy.agent.builder;

Re: [VOTE] Upgrade gradle to 6.2

2020-02-25 Thread Luke Cwik
+1 On Tue, Feb 25, 2020 at 12:49 AM Gleb Kanterov wrote: > +1 (non-binding) > > On Tue, Feb 25, 2020 at 9:38 AM Ismaël Mejía wrote: > >> +1 great to have our build updated, please share if there are new >> interesting features/plugin advantages we can benefit from too. >> >> On Tue, Feb 25,

[RESULT] [VOTE] Vendored Dependencies Release gRPC 1.26.0 v0.2 for BEAM-9252

2020-02-25 Thread Luke Cwik
I'm happy to announce that we have unanimously approved this release. There are 7 approving votes, 4 of which are binding: * Ismaël Mejía * Robert Bradshaw * Ahmet Altay * Luke Cwik There are no disapproving votes. Thanks everyone! On Sat, Feb 22, 2020 at 5:49 AM Ismaël Mejía wrote: >

Github flaky?

2020-02-25 Thread Luke Cwik
I have been getting errors from the Github UI when attempting to post a comment / merge a commit / 500 errors when reloading a page. Anyone else seeing this?

Re: [VOTE] Upgrade gradle to 6.2

2020-02-24 Thread Luke Cwik
+1 Can you try testing removing -Ppublishing and no parallel build requirement when doing the release? (some of the Gradle release related plugins had issues when running in parallel which manifested as build errors when doing the release). On Mon, Feb 24, 2020 at 5:43 PM Kenneth Knowles wrote:

Re: [VOTE] Vendored Dependencies Release Byte Buddy 1.10.8 RC2

2020-02-26 Thread Luke Cwik
+1 (binding) Verified signatures and contents of jar to not contain module-info.class On Wed, Feb 26, 2020 at 10:45 AM Kai Jiang wrote: > +1 (non-binding) > > On Wed, Feb 26, 2020 at 01:23 Ismaël Mejía wrote: > >> Please review the release of the following artifacts that we vendor: >> *

Re: Java SplittableDoFn Watermark API

2020-03-04 Thread Luke Cwik
> > even GRPC, but I don't think having classical-only or > > classical-excluded features is where we want to be long-term. > > > > > On Tue, Mar 3, 2020 at 1:41 AM Robert Bradshaw > wrote: > > > > > > > > I don't have a strong preference for us

Re: Java SplittableDoFn Watermark API

2020-03-04 Thread Luke Cwik
ps://github.com/apache/beam/blob/ded686a58ad4747e91a26d3e59f61019b641e655/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java#L130 4: https://github.com/apache/beam/blob/ded686a58ad4747e91a26d3e59f61019b641e655/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L1174

Re: Permission to self-assign JIRAs

2020-03-02 Thread Luke Cwik
Welcome, you have been added. On Mon, Mar 2, 2020 at 3:57 AM Jozef Vilcek wrote: > Can I please get a permission in JIRA for `jvilcek` user to self assign > JIRAs? >

Re: Java SplittableDoFn Watermark API

2020-03-03 Thread Luke Cwik
core/SplittableParDoViaKeyedWorkItems.java > On Tue, Mar 3, 2020 at 1:41 AM Robert Bradshaw > wrote: > > > > I don't have a strong preference for using a provider/having a set of > > tightly coupled methods in Java, other than that we be consistent (and > > we alre

Re: [VOTE] Vendored Dependencies Release gRPC 1.26.0 v0.3 for BEAM-9288

2020-03-03 Thread Luke Cwik
+1 (binding) Verified signatures. Verified that there are no conscrypt classes in the jar. Verified pom.xml has a runtime dependency on conscrypt. On Tue, Mar 3, 2020 at 10:35 AM Jean-Baptiste Onofré wrote: > +1 (binding) > > Regards > JB > > On Tue, Mar 3, 2020 at 19:31, Luke Cwik wrote

[VOTE] Vendored Dependencies Release gRPC 1.26.0 v0.3 for BEAM-9288

2020-03-03 Thread Luke Cwik
Please review the release of the following artifacts that we vendor: * beam-vendor-grpc-1_26_0 Hi everyone, Please review and vote on the release candidate #1 for the version 0.3, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The

Re: Contributor Permission

2020-03-03 Thread Luke Cwik
Welcome, you have been added. On Tue, Mar 3, 2020 at 9:18 AM Fernando Díaz González wrote: > Hi! I work with Beam SQL at Spotify together with Gleb Kanterov. Can > someone add me as a contributor so I can assign myself a ticket I have > created? > > My username is fdiazgon. > > Thanks! >
