Re: [VOTE] Sign a pledge to discontinue support of Python 2 in 2020.

2019-10-01 Thread Lukasz Cwik
+1 On Tue, Oct 1, 2019 at 10:39 AM Ning Kang wrote: > +1 > > On Tue, Oct 1, 2019 at 10:17 AM Pablo Estrada wrote: > >> +1 >> >> I guess it was http://python3statement.org : ) >> >> On Tue, Oct 1, 2019 at 10:14 AM Mark Liu wrote: >> >>> +1 >>> >>> btw, the link (http://python3stament.org) you

Re: Introduction + Support in Comms for Beam!

2019-09-30 Thread Lukasz Cwik
Welcome to the community. On Mon, Sep 30, 2019 at 3:15 PM María Cruz wrote: > Hi everyone, > my name is María Cruz, I am from Buenos Aires but I live in the Bay Area. > I recently became acquainted with Apache Beam project, and I got a chance > to meet some of the Beam community at Apache Con

Re: Shuffling on shardnum, is it necessary?

2019-09-27 Thread Lukasz Cwik
Using a state variable to store the shard key introduces a GroupByKey within Dataflow to ensure that there is a strict ordering on state. Other runners insert similar materializations to guarantee this as well. Also, a sufficiently powerful execution engine could do state processing for the

Re: Do we know why gradle scans are not working?

2019-09-26 Thread Lukasz Cwik
AM Lukasz Cwik wrote: > I reached out on the Gradle forum > https://discuss.gradle.org/t/your-build-scan-could-not-be-displayed-what-does-this-mean/33302 > > On Wed, Sep 25, 2019 at 8:49 AM Łukasz Gajowy wrote: > >> FWIW, I tried doing it locally and observed the same behavi

Re: Possible Python SDK performance regression

2019-09-25 Thread Lukasz Cwik
My environment has gotten all the dependencies installed/setup/maintained organically over time as the project has evolved. On Wed, Sep 25, 2019 at 9:56 AM Thomas Weise wrote: > The issue was related to how we build our custom packages. > > However, what might help users is documentation about

Re: Do we know why gradle scans are not working?

2019-09-25 Thread Lukasz Cwik
ojects. This is all I know for now. > > Łukasz > > wt., 24 wrz 2019 o 22:43 Lukasz Cwik napisał(a): > >> Not to my knowledge. Maybe something is down. >> >> Have you tried running a gradle build locally with --scan? >> >> On Tue, Sep 24, 2019 at 1:03 PM Valen

Re: Collecting feedback for Beam usage

2019-09-24 Thread Lukasz Cwik
being sent. > > > > One more heavy-weight option is to also allow user configure and persist > what information he is ok with sharing. > > > > --Mikhail > > > > > > On Tue, Sep 24, 2019 at 10:02 AM Lukasz Cwik wrote: > >> > >> Why not

Re: Do we know why gradle scans are not working?

2019-09-24 Thread Lukasz Cwik
Not to my knowledge. Maybe something is down. Have you tried running a gradle build locally with --scan? On Tue, Sep 24, 2019 at 1:03 PM Valentyn Tymofieiev wrote: > For example, https://gradle.com/s/mpfu3wpz2xfwe says: Your build scan > could not be displayed. >

Re: Jenkins queue times steadily increasing for a few months now

2019-09-24 Thread Lukasz Cwik
We can get the per gradle task profile with the --profile flag: https://jakewharton.com/static/files/trace/profile.html This information also appears within the build scans that are sent to Gradle. Integrating with either of these sources of information would allow us to figure out whether its

Re: Collecting feedback for Beam usage

2019-09-24 Thread Lukasz Cwik
Why not add a flag to the SDK that would do the phone home when specified? From a support perspective it would be useful to know: * SDK version * Runner * SDK provided PTransforms that are used * Features like user state/timers/side inputs/splittable dofns/... * Graph complexity (# nodes, #

Re: Next LTS?

2019-09-23 Thread Lukasz Cwik
I agree with what Lukasz Gajowy mentioned. I find that Jenkins is fine when you're developing at HEAD but as soon as you cut a branch, the Jenkins configuration starts to drift as it keeps getting updated to HEAD with the seed job. I was always thinking that the Jenkins configurations would be as

Re: New contributor to BEAM SQL

2019-09-16 Thread Lukasz Cwik
Welcome Kirill, I have granted you the JIRA permissions you requested. On Mon, Sep 16, 2019 at 10:59 AM Kirill Kozlov wrote: > Hello everyone! > > My name is Kirill Kozlov, I recently joined a Dataflow team at Google and > will be working on SQL filter pushdown. > Can I get permission to work

Re: portableWordCountBatch and portableWordCountStreaming failing in Python PreCommit

2019-09-16 Thread Lukasz Cwik
I'm also being impacted by this on my PR[1]. I found BEAM-6316[2] that has a similar error but it was resolved Dec 2018. 1: https://github.com/apache/beam/pull/9583 2: https://issues.apache.org/jira/browse/BEAM-6316 On Mon, Sep 16, 2019 at 12:43 PM Ning Kang wrote: > A new check renders

Re: How do you write portable runner pipeline on separate python code ?

2019-09-13 Thread Lukasz Cwik
nk/Spark cluster or Dataflow). It would be very convenient if >>>>> we >>>>> could automatically stage local files to be read as artifacts that could >>>>> be >>>>> consumed by any worker (possibly via external directory mounting in th

Fwd: Gradle Training Roundup

2019-09-12 Thread Lukasz Cwik
Forwarding some events to improve your Gradle-fu so that we can all improve and maintain our build system. -- Forwarded message - From: Alex Leventer Date: Thu, Sep 12, 2019 at 10:55 AM Subject: Gradle Training Roundup To: Hi there, Join us for these upcoming free, online,

Re: Feature request - file metadata AVP

2019-09-11 Thread Lukasz Cwik
In Java, each of the file system operations take an "options" class[1] such as CreateOptions/MoveOptions. In Python, there is an explicit field as a parameter[2]. Go doesn't seem to have those options available[3]. There was originally a plan to make those options classes be like PipelineOptions

Re: Request to join group

2019-09-11 Thread Lukasz Cwik
Thanks for reaching out, I have added your Google account to the apache-beam-testing project with Viewer permissions. On Tue, Sep 10, 2019 at 6:19 PM Hubert Theodore wrote: > Hi Beam Dev Team! > > I am a Google engineer working on Cloud Console Dataflow. I have been > assigned a bug related to

Re: clickhouse tests failing

2019-09-08 Thread Lukasz Cwik
It is passing at HEAD on Jenkins: https://builds.apache.org/job/beam_PreCommit_Java_Cron/1771/testReport/org.apache.beam.sdk.io.clickhouse/ What are the failures you're seeing at initialization? (The tests do rely on setting up ZooKeeper and other things that could fail.) On Fri, Sep 6, 2019 at 12:36

Re: [DISCUSS] Supporting multiple Flink versions vs. tech debt

2019-09-07 Thread Lukasz Cwik
When we import the Beam code into Google, we also run into issues where sometimes we need to transform parts of the code. During import we use copybara[1] to do these transformations to the source, which are more than just copying file X from some other path since most of the time we want to change

Re: [VOTE] Vendored Dependencies Release

2019-09-05 Thread Lukasz Cwik
lp on this operation to move [1] to >>> "release" repo? >>> >>> [1]: [1] >>> https://dist.apache.org/repos/dist/dev/beam/vendor/calcite/1_20_0 >>> >>> -Rui >>> >>> On Wed, Sep 4, 2019 at 10:16 AM Rui Wang wrote:

Re: Save state on tear down

2019-09-03 Thread Lukasz Cwik
execution graph causing windows to close, timers to fire, state to be emitted and then garbage collected and so forth. > > thanks, > -chad > > > On Fri, Aug 16, 2019 at 2:47 PM Jose Delgado > wrote: > >> I see, thank you Lukasz. >> >> >> >>

Re: [VOTE] Vendored Dependencies Release

2019-09-03 Thread Lukasz Cwik
+1 On Tue, Sep 3, 2019 at 1:22 PM Kenneth Knowles wrote: > +1 > > On Tue, Sep 3, 2019 at 11:00 AM Ahmet Altay wrote: > >> +1 >> >> On Tue, Sep 3, 2019 at 10:52 AM Andrew Pilloud >> wrote: >> >>> +1 >>> >>> Inspected the jar it looked reasonable. >>> >>> Andrew >>> >>> On Tue, Sep 3, 2019 at

Re: How to Implement a Runner for C++ Streaming Processing Engine?

2019-08-30 Thread Lukasz Cwik
Sorry, meant to say that some re-usable parts exist in Python and Go (not C++). On Fri, Aug 30, 2019 at 8:36 AM Lukasz Cwik wrote: > There is an ongoing portability effort which is attempting to enable any > runner execute any Beam SDK written in any language. > > A good starting

Re: [PROPOSAL] Preparing for Beam 2.16.0 release

2019-08-29 Thread Lukasz Cwik
+1 On Thu, Aug 29, 2019 at 7:42 PM Alan Myrvold wrote: > +1 Thanks for keeping to the schedule, Mark. > > On Thu, Aug 29, 2019 at 6:21 PM jincheng sun > wrote: > >> Hi Mark, >> >> +1 and thank you for keeping the cadence! >> >> BTW I have mark the Fix Version for some of issues to 2.17, which

Re: Improve container support

2019-08-28 Thread Lukasz Cwik
Google locks down docs created with @google.com addresses. Hannah, please recreate the doc using a non @google.com address and share it with the community. You'll want to replace the Google short link with an Apache short link (s.apache.org). On Wed, Aug 28, 2019 at 5:40 AM Gleb Kanterov wrote: >

Re: Write-through-cache in State logic

2019-08-27 Thread Lukasz Cwik
euse > // cached data returned by the State API across multiple bundles. repeated > CacheToken cache_tokens = 2; > } > > On 27.08.19 19:22, Lukasz Cwik wrote: > > SideInputState -> SideInput (side_input_state -> side_input) > + more comments around t

Re: Write-through-cache in State logic

2019-08-27 Thread Lukasz Cwik
list of cache tokens that can be used by an SDK to reuse > // cached data returned by the State API across multiple bundles. repeated > CacheToken cache_tokens = 2; > } > > -Max > > On 27.08.19 18:43, Lukasz Cwik wrote: > > The bundles view of side inputs should never change

Re: Write-through-cache in State logic

2019-08-27 Thread Lukasz Cwik
y beneficial. > > For the first version I want to focus on user state because that's where > I see the most benefit for caching. I don't see a problem though for the > Runner to detect new side input and reflect that in the cache tokens > supplied for a new bundle. > > -Max >

Re: Help triaging Jira issues

2019-08-27 Thread Lukasz Cwik
> cases > they should be. Please take care to do the triage if it is already > assigned on creation or if you judge it is complete enough but prefer to > let it > unassigned in case someone else can work on it. That will for sure reduce > the > triage work until this become

Re: Write-through-cache in State logic

2019-08-26 Thread Lukasz Cwik
://github.com/apache/beam/pull/9374 > for the SDK: https://github.com/apache/beam/pull/9418 > > Note that the Runner PR needs to be updated to fully reflected the above > scheme. The SDK implementation is WIP. I want to make sure that we > clarify the design before this gets f

Re: Write-through-cache in State logic

2019-08-26 Thread Lukasz Cwik
19 09:33, Reuven Lax wrote: > >> > Dataflow does something like this, however since work is > >> > load balanced across workers a per-worker id doesn't work very > well. > >> > Dataflow divides the keyspace up into lexicographic ranges, and > >
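The quoted remark about dividing the keyspace into lexicographic ranges can be illustrated with a minimal sketch. This is not Dataflow's implementation; the split points and the function name `range_index` are hypothetical and only show how a key could be mapped to a range by binary search.

```python
import bisect
from typing import List

# Hypothetical split points; range i covers keys in [SPLITS[i-1], SPLITS[i]).
SPLITS: List[str] = ["g", "n", "t"]


def range_index(key: str) -> int:
    """Return the index of the lexicographic range that contains key."""
    # bisect_right counts how many split points are <= key.
    return bisect.bisect_right(SPLITS, key)
```

With three split points there are four ranges, so cache ownership could be tracked per range rather than per worker.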

Re: [Discuss] Propose Calcite Vendor Release

2019-08-22 Thread Lukasz Cwik
+1 for release On Thu, Aug 22, 2019 at 8:20 AM Kenneth Knowles wrote: > +1 to doing this release. There is no risk since nothing will use the 0.1 > version and if it has problems we just make 0.2, etc, etc. > > And big thanks to Rui for volunteering. > > On Wed, Aug 21, 2019 at 11:11 PM Kai

Re: [VOTE] Release 2.15.0, release candidate #2

2019-08-21 Thread Lukasz Cwik
+1 (binding) I validated the signatures against the key dist/release/KEYS and hashes of the source distributions and release artifacts. I also ran some of the quickstarts for Java. On Tue, Aug 20, 2019 at 3:59 PM Pablo Estrada wrote: > +1 > > I've installed from the source in apache/dist. >
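A minimal sketch of the hash half of that validation, assuming the published checksum is a SHA-512 hex digest; the function name `sha512_matches` is hypothetical, and signature checking against the KEYS file is not shown.

```python
import hashlib


def sha512_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-512 digest of data with a published hex digest."""
    actual = hashlib.sha512(data).hexdigest()
    # Tolerate surrounding whitespace and upper-case hex in the published value.
    return actual == expected_hex.strip().lower()
```

In practice the bytes would come from reading the downloaded artifact in binary mode.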

Re: (mini-doc) Beam (Flink) portable job templates

2019-08-20 Thread Lukasz Cwik
On Mon, Aug 19, 2019 at 5:52 PM Ahmet Altay wrote: > > > On Sun, Aug 18, 2019 at 12:34 PM Thomas Weise wrote: > >> There is a PR open for this: https://github.com/apache/beam/pull/9331 >> >> (it wasn't tagged with the JIRA and therefore not linked) >> >> I think it is worthwhile to explore how

Re: Java serialization for coders and compatibility

2019-08-13 Thread Lukasz Cwik
Coders such as AvroCoder are translated to an intermediate JSON form called a CloudObject[1]. Dataflow only uses the serialized Java representation (embedded as bytes in ?base64? within the CloudObject) for coders which extend SerializableCoder[2]. Dataflow only cares that these CloudObject

Re: [VOTE] Support ZetaSQL as another SQL dialect for BeamSQL in Beam repo

2019-08-13 Thread Lukasz Cwik
+1 On Tue, Aug 13, 2019 at 9:09 AM Andrew Pilloud wrote: > +1 > I also hope this can move to Calcite. > > On Tue, Aug 13, 2019 at 2:40 AM Gleb Kanterov wrote: > >> +1 >> >> On Tue, Aug 13, 2019 at 10:47 AM Ismaël Mejía wrote: >> >>> +1 >>> Wishing that this goes to calcite too someday (hoping

Re: Write-through-cache in State logic

2019-08-13 Thread Lukasz Cwik
d). > -Max > > [1] > > https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m > > Is it simply to > On 12.08.19 19:55, Lukasz Cwik wrote: > > > > > > On Mon, Aug 12, 2019 at 10:09 AM Thomas Weise > <mailto:t...@

Re: Write-through-cache in State logic

2019-08-12 Thread Lukasz Cwik
heckpoint performance so we may want to flush state that hasn't been used in a while as well. > Another performance improvement would be caching read requests because >> these first go to the Runner regardless of already cached appends. >> >> -Max >> >>

Re: Docker Run Options in SDK Container

2019-08-09 Thread Lukasz Cwik
On Fri, Aug 2, 2019 at 11:00 AM Chad Dombrova wrote: > Hi all, > I’m a bit confused about the desire to use json for the environment_config. > Note that complex PipelineOptions are already expected to be in JSON format[1, 2]. This has solved many string parsing and ambiguity issues. > It’s
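To illustrate why a JSON-valued option avoids string parsing ambiguity, here is a small sketch. The function `parse_environment_config` and the keys `image` and `run_options` are made up for illustration and are not taken from the Beam SDK.

```python
import json
from typing import Any, Dict


def parse_environment_config(raw: str) -> Dict[str, Any]:
    """Parse a JSON object string, rejecting anything that is not an object."""
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError("environment config must be a JSON object")
    return config


# The keys below are hypothetical and exist only for this example.
example = parse_environment_config(
    '{"image": "my/sdk:latest", "run_options": ["--net=host", "-v /tmp:/tmp"]}'
)
```

Because each run option is its own JSON string, an option containing spaces or commas needs no extra escaping rules.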

Re: Late data handling in Python SDK

2019-08-09 Thread Lukasz Cwik
+dev Related JIRAs I found are BEAM-3759 and BEAM-7825. This has been a priority as the community has been trying to get streaming Python execution working on multiple Beam runners. On Wed, Aug 7, 2019 at 2:31 AM Sam Stephens wrote: > Hi all, > > I’ve been reading into, and

Re: Write-through-cache in State logic

2019-08-09 Thread Lukasz Cwik
can add the id of the last state request as part of the ProcessBundleResponse. > [1] > https://github.com/apache/beam/blob/release-2.14.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L627 > > On Thu, Aug 8, 2019 at 6:57 PM Lukasz Cwik wrote: > > > > The purpose of the new state A

Re: Inconsistent Results with GroupIntoBatches PTransform

2019-08-08 Thread Lukasz Cwik
Have you tried running this on more than one runner (e.g. Dataflow, Flink, Direct)? Are you setting --streaming when executing? On Thu, Aug 8, 2019 at 10:23 AM rahul patwari wrote: > Hi, > > I am getting inconsistent results when using GroupIntoBatches PTransform. > I am using Create.of()

Re: Proposal for SDFs in the Go SDK

2019-08-08 Thread Lukasz Cwik
Thanks for the informative doc. Added a bunch of questions/feedback. On Thu, Aug 8, 2019 at 9:15 AM Robert Burke wrote: > Thanks for the spending the time writing this up! I'm looking forward to > seeing how the prototype implementation plays out. In particular with the > extensive section on

Re: Allowing firewalled/offline builds of Beam

2019-08-08 Thread Lukasz Cwik
Udi beat me by a couple of mins. We build a good portion of the Beam Java codebase internally within Google by bypassing the gradle wrapper (gradlew) and executing the gradle command from a full gradle installation at the root of a copy of the Beam codebase. It does require your internal build

Re: Java 11 compatibility question

2019-08-07 Thread Lukasz Cwik
Since the java8 -> java11 migration is similar to the python2 -> python3 migration, what were the acceptance criteria there? On Wed, Aug 7, 2019 at 1:54 PM Elliotte Rusty Harold wrote: > > > On Wed, Aug 7, 2019 at 9:41 AM Michał Walenia > wrote: > >> >> Are these tests sufficient to say that we’re java 11

Re: [DISCUSS] Turn `WindowedValue` into `T` in the FnDataService and BeamFnDataClient interface definition

2019-08-07 Thread Lukasz Cwik
I wanted to add some more details about the state discussion. BEAM-7000 is about adding support for a gRPC message saying that the SDK is now blocked on one of its requests. This would allow for an easy optimization on the runner side where it gathers requests and is able to batch them knowing

Re: Waht would be the best place for performance tests documentation?

2019-08-07 Thread Lukasz Cwik
I also think confluence makes the most sense. On Wed, Aug 7, 2019 at 11:57 AM Alexey Romanenko wrote: > I agree with Cyrus that Confluence page should a good place for that > since, seems, it will be very dev oriented documentation. > > > On 7 Aug 2019, at 16:31, Cyrus Maden wrote: > > Hi

Re: Write-through-cache in State logic

2019-08-05 Thread Lukasz Cwik
it to convert clear + appends into set calls and do any other optimizations as well. By default, the runner would have a time and space based limit on how many outstanding state calls there are before choosing to resolve them. On Mon, Aug 5, 2019 at 5:43 PM Lukasz Cwik wrote: > Now I see what you m
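The idea of converting a clear plus appends into a single set call can be sketched as follows. This is only an illustration under stated assumptions: the tuple shapes for pending calls and the function name `coalesce` are hypothetical, and the time and space limits mentioned in the message are not modeled.

```python
from typing import List, Tuple

# Each pending call is ("clear", None) or ("append", value); both shapes are hypothetical.
StateCall = Tuple[str, object]


def coalesce(calls: List[StateCall]) -> List[StateCall]:
    """Collapse the queued calls for one state cell.

    Everything before the last clear is dropped, and that clear plus the
    appends after it become a single ("set", [values]) call.
    """
    last_clear = -1
    for index, (kind, _) in enumerate(calls):
        if kind == "clear":
            last_clear = index
    if last_clear == -1:
        return list(calls)  # Only blind appends: nothing to collapse.
    values = [value for kind, value in calls[last_clear + 1:] if kind == "append"]
    return [("set", values)]
```

A queue that contains no clear is returned unchanged, since blind appends cannot be turned into a set without knowing the existing contents.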

Re: Write-through-cache in State logic

2019-08-05 Thread Lukasz Cwik
eficial to have a dedicated fn api > operation to allow for such optimization. That's something that needs to be > determined with a profiler :) > > But the low hanging fruit is cross-bundle caching. > > Thomas > > On Mon, Aug 5, 2019 at 2:06 PM Lukasz Cwik wrote: > >> Thomas

Re: Latency of Google Dataflow with Pubsub

2019-08-05 Thread Lukasz Cwik
+dev On Mon, Aug 5, 2019 at 12:49 PM Dmitry Minaev wrote: > Hi there, > > I'm building streaming pipelines in Beam (using Google Dataflow runner) > and using Google Pubsub as a message broker. I've made a couple of > experiments with a very simple pipeline: consume events from Pubsub >

Re: Write-through-cache in State logic

2019-08-05 Thread Lukasz Cwik
> >> > >>> >> > @Robert >>> >> > I am interested to see the proposal. Can you provide me the link of >>> the proposal? >>> >> > >>> >> > [1]: >>> https://github.com/apache/beam/blob/db59a3df665e094f

[RESULT] [VOTE] Vendored Dependencies Release

2019-07-16 Thread Lukasz Cwik
I'm happy to announce that we have unanimously approved this release. There are 4 approving votes, 3 of which are binding: * Ismaël Mejía * Lukasz Cwik * Pablo Estrada There are no disapproving votes. Thanks everyone! On Tue, Jul 16, 2019 at 4:30 AM Ismaël Mejía wrote: > +1 > >

Re: Write-through-cache in State logic

2019-07-16 Thread Lukasz Cwik
User state is built on top of read, append and clear and not off a read and write paradigm to allow for blind appends. The optimization you speak of can be done completely inside the SDK without any additional protocol being required as long as you clear the state first and then append all your
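A minimal sketch of the point above, assuming a toy in-memory store: a write can be emulated with only clear and append, so no separate write operation is needed. The class `InMemoryBagState` and the helper `set_values` are hypothetical and are not part of the Fn API.

```python
from typing import Dict, List


class InMemoryBagState:
    """Toy stand-in for runner-side bag state, keyed by a state id (hypothetical)."""

    def __init__(self) -> None:
        self._values: Dict[str, List[bytes]] = {}

    def read(self, state_id: str) -> List[bytes]:
        return list(self._values.get(state_id, []))

    def append(self, state_id: str, value: bytes) -> None:
        # A blind append: no read is needed before adding a value.
        self._values.setdefault(state_id, []).append(value)

    def clear(self, state_id: str) -> None:
        self._values.pop(state_id, None)


def set_values(state: InMemoryBagState, state_id: str, values: List[bytes]) -> None:
    """Emulate a write using only a clear followed by appends."""
    state.clear(state_id)
    for value in values:
        state.append(state_id, value)


state = InMemoryBagState()
state.append("counts", b"1")
state.append("counts", b"2")
set_values(state, "counts", [b"3"])
result = state.read("counts")
```

After the clear, the SDK knows the full contents without reading, which is what makes caching the appended values locally safe in this sketch.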

Re: [VOTE] Vendored Dependencies Release

2019-07-15 Thread Lukasz Cwik
+1 On Mon, Jul 15, 2019 at 8:14 PM Pablo Estrada wrote: > +1 > verified hashes and signatures > > On Fri, Jul 12, 2019 at 9:40 AM Kai Jiang wrote: > >> +1 (non-binding) >> >> On Thu, Jul 11, 2019 at 8:27 PM Lukasz Cwik wrote: >> >>> Please revie

Re: Return types of Write transforms (aka best way to signal)

2019-07-15 Thread Lukasz Cwik
In the POutput case (4), does that mean we will have to compute all those outputs in the transform even if they aren't used? If yes, I prefer (6) because it allows for the transform structure to be modified to either produce these additional outputs only if they will be consumed instead of having

Re: Circular dependencies between DataflowRunner and google cloud IO

2019-07-15 Thread Lukasz Cwik
task you linked. > Is keeping the test in a separate package viable in your opinion? > > Thanks! > Michal > > On Fri, Jul 12, 2019 at 3:45 PM Lukasz Cwik wrote: > >> Yes, there is a dependency between Dataflow -> GCP IOs and this is >> expected since Dataflow depen

Re: Beam/Samza Ensuring At Least Once semantics

2019-07-12 Thread Lukasz Cwik
window that is not completely > processed. > > > > *From: *Lukasz Cwik > *Date: *Wednesday, July 10, 2019 at 11:07 AM > *To: *dev > *Cc: *"LeVeck, Matt" , "Deshpande, Omkar" < > omkar_deshpa...@intuit.com>, Xinyu Liu , Xinyu Liu > , Samarth Shetty ,

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Lukasz Cwik
Shannon Duncan > wrote: > >> Aha, makes sense. Thanks! >> >> On Fri, Jul 12, 2019 at 9:26 AM Lukasz Cwik wrote: >> >>> TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of())); >>> >>> On Fri, Jul 12, 2019 at 10:22 AM Shannon D

Re: Circular dependencies between DataflowRunner and google cloud IO

2019-07-12 Thread Lukasz Cwik
Yes, there is a dependency between Dataflow -> GCP IOs and this is expected since Dataflow depends on parts of those implementations for its own execution purposes. We definitely don't want GCP IOs depending on Dataflow since we would like users of other runners to still be able to use GCP IOs

[VOTE] Vendored Dependencies Release

2019-07-11 Thread Lukasz Cwik
Please review the release of the following artifacts that we vendor: * beam-vendor-grpc_1_21_0 * beam-vendor-guava-26_0-jre * beam-vendor-bytebuddy-1_9_3 Hi everyone, Please review and vote on the release candidate #3 for the org.apache.beam:beam-vendor-grpc_1_21_0:0.1,

Re: [VOTE] Vendored Dependencies Release

2019-07-10 Thread Lukasz Cwik
g/codehaus/mojo, com/google/errorprone, >> org/checkerframework, javax/annotation >> >> On Tue, Jul 9, 2019 at 3:34 PM Lukasz Cwik wrote: >> >>> Please review the release of the following artifacts that we vendor: >>> * beam-vendor-grpc_1_21

Re: Beam/Samza Ensuring At Least Once semantics

2019-07-10 Thread Lukasz Cwik
When you restart the application, are you resuming it from Samza's last commit? Since the exception is thrown after the GBK, all the data could be read from Kafka and forwarded to the GBK operator inside of Samza and checkpointed in Kafka before the exception is ever thrown. On Tue, Jul 9, 2019

Re: Phrase triggering jobs problem

2019-07-10 Thread Lukasz Cwik
This has happened in the past. Usually there is some issue where Jenkins isn't notified of new PRs by Github or doesn't see the PR phrases and hence Jenkins sits around idle. This is usually fixed after a few hours without any action on our part. On Wed, Jul 10, 2019 at 10:28 AM Katarzyna

[VOTE] Vendored Dependencies Release

2019-07-09 Thread Lukasz Cwik
Please review the release of the following artifacts that we vendor: * beam-vendor-grpc_1_21_0 * beam-vendor-guava-26_0-jre Hi everyone, Please review and vote on the release candidate #2 for the org.apache.beam:beam-vendor-grpc_1_21_0:0.1 and org.apache.beam:beam-vendor-guava-26_0-jre:0.1, as

Re: Apache Beam issue | Reading Avro files and pushing to Bigquery

2019-07-09 Thread Lukasz Cwik
+user (please use user@ for questions about using the product and restrict to dev@ for questions related to developing the product). Can you provide the rest of the failing reason (and any stacktraces from the workers related to the failures)? On Tue, Jul 9, 2019 at 11:04 AM Dhiraj Sardana

Re: Unable to start BEAM sql shell

2019-07-09 Thread Lukasz Cwik
Thanks for the fixes. I have reviewed both and merged them. On Tue, Jul 9, 2019 at 10:59 AM Kyle Weaver wrote: > I would also make sure that you are running the command from the root of > the repo. > > Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com > | +1650203 > >

Re: [Discuss] Create stackoverflow tags for python, java and go SDKs?

2019-07-09 Thread Lukasz Cwik
That sounds like a good idea to me. On Wed, Jul 3, 2019 at 10:45 AM Rui Wang wrote: > Hi Community, > > When reading apache-beam related questions in stackoverflow, it happens > that some questions only mention version number(e.g. 2.8.0) but not mention > which SDK related. Sometimes I can tell

Re: [VOTE] Vendored dependencies release process

2019-07-08 Thread Lukasz Cwik
Thanks for taking a look. I followed up on your questions. On Mon, Jul 8, 2019 at 3:58 PM Udi Meiri wrote: > I left some comments. Being new to the Beam releasing process, my question > might be trivial to someone actually performing the release. > > On Tue, Jul 2, 2019 at 4:49 PM

Re: [VOTE] Vendored dependencies release process

2019-07-06 Thread Lukasz Cwik
+1 On Wed, Jul 3, 2019 at 10:24 AM Jens Nyman wrote: > +1 > > On 2019/07/02 23:49:10, Lukasz Cwik wrote: > > Please vote based on the vendored dependencies release process as> > > discussed[1] and documented[2].> > > > > Please vote as follows:> >

Re: Stop using Perfkit Benchmarker tool in all tests?

2019-07-03 Thread Lukasz Cwik
works from Beam developers while we can avoid that > and use the existing tools (gradle, jenkins). > > Thanks! > > pt., 28 cze 2019 o 17:31 Lukasz Cwik napisał(a): > >> +1 for removing tests that are not maintained. >> >> Are there features in Perfkit that we would lik

Re: Wiki access?

2019-07-03 Thread Lukasz Cwik
I have added you. Thanks for helping out with the docs. On Wed, Jul 3, 2019 at 8:22 AM Ryan Skraba wrote: > Oof, sorry: ryanskraba > > Thanks in advance! There's a lot of great info in there. > > On Wed, Jul 3, 2019 at 5:03 PM Lukasz Cwik wrote: > >> Can

Re: Wiki access?

2019-07-03 Thread Lukasz Cwik
Can you share your login id for cwiki.apache.org? On Wed, Jul 3, 2019 at 7:21 AM Ryan Skraba wrote: > Hello -- I've been reading through a lot of Beam documentation recently, > and noting minor typos here and there... Is it possible to get Wiki access > to make fixes on the spot? > > Best

[VOTE] Vendored dependencies release process

2019-07-02 Thread Lukasz Cwik
Please vote based on the vendored dependencies release process as discussed[1] and documented[2]. Please vote as follows: +1: Adopt the vendored dependency release process -1: The vendored release process needs to change because ... Since many people in the US may be out due to the holiday

Re: Change of Behavior - JDBC Set Command

2019-07-02 Thread Lukasz Cwik
in/codegen/includes/parserImpls.ftl#L307 > [2] > https://github.com/apache/beam/blob/b2fd4e392ede19f03a48997252970b8bba8535f1/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcConnection.java#L82 > > On Fri, Jun 28, 2019 at 7:57 AM Lukasz Cwik wro

Re: BQ IO GC thrashing when specifying .withMethod(STREAMING_INSERTS)

2019-07-01 Thread Lukasz Cwik
I think the BQ streaming write buffers data into batches and sends them when used with STREAMING_INSERTS. Have you been able to ask the user to get a heap dump to see what was using the majority of memory? On Mon, Jul 1, 2019 at 2:34 PM Lukasz Cwik wrote: > I think the Bq > > On M

Re: BQ IO GC thrashing when specifying .withMethod(STREAMING_INSERTS)

2019-07-01 Thread Lukasz Cwik
I think the Bq On Mon, Jul 1, 2019 at 10:54 AM Mikhail Gryzykhin wrote: > Hello everybody, > > This question is regarding user post on StackOverflow > > . > > My understanding of

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-28 Thread Lukasz Cwik
Thanks for the explanation. On Fri, Jun 28, 2019 at 6:49 AM Reuven Lax wrote: > This happens when the watermark hops forward. In practice whenever there > is any backlog, this is the normal mode of operation. > > On Fri, Jun 28, 2019, 12:42 AM Lukasz Cwik wrote: > >>

Re: Stop using Perfkit Benchmarker tool in all tests?

2019-06-28 Thread Lukasz Cwik
+1 for removing tests that are not maintained. Are there features in Perfkit that we would like to be using that we aren't? Can we make the integration with Perfkit less brittle? If we aren't getting much and don't plan to get much value in the short term, removal makes sense to me. On Thu, Jun

Re: gRPC method to get a pipeline definition?

2019-06-28 Thread Lukasz Cwik
+dev On Fri, Jun 28, 2019 at 8:20 AM Chad Dombrova wrote: > > I think the simplest solution would be to have some kind of override/hook >> that allows Flink/Spark/... to provide storage. They already have a concept >> of a job and know how to store them so can we piggyback the Beam pipeline >>

Re: Change of Behavior - JDBC Set Command

2019-06-28 Thread Lukasz Cwik
y and we can avoid > this if we are able to just set the pipeline options by name in the first > place. In that case we can just use whatever PipelineOptions instance we > have at the moment without extra validation / reconciliation. > > Hope this makes sense. > > Regards, > Anton

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-27 Thread Lukasz Cwik
ark holds (which is how the timer holds up the watermark today, > as there is no timer watermark) is per key. Usually the input watermark > making a "hop" is not a problem, in fact it's the normal state of affairs. > > On Fri, Jun 28, 2019 at 1:08 AM Lukasz Cwik wrote: &

Re: [DISCUSS] Releasing Vendored Artifacts

2019-06-27 Thread Lukasz Cwik
Thanks Ismael for the feedback on the doc. If there isn't any additional feedback, I will start a process vote on the release procedure of vendored artifacts on Tuesday. On Tue, Jun 25, 2019 at 10:24 AM Lukasz Cwik wrote: > Ismael mentioned[1] that there is confusion about how to rele

Re: DirectRunner timers are not strictly time ordered

2019-06-27 Thread Lukasz Cwik
ng > I also experimented with at first (in DirectRunner), but it then turned > out, that it is equivalent to firing only timers for lowest timestamp. > On 6/20/19 9:52 PM, Reuven Lax wrote: > > I think BEAM-2535 is independent. > > On Thu, Jun 20, 2019 at 9:47 PM Lukasz Cwik wrote: >

Re: [DISCUSS] Solving timer ordering on immutable bundles

2019-06-27 Thread Lukasz Cwik
I'm confused as to why it is valid to advance the watermark to T3 in the original scenario. T1 and T2 should be treated as inputs to the function and hold the input watermark hence T1 should fire and if it doesn't produce any new timers before T2, then T2 should fire since the watermark will now

Re: Spotless exclusions

2019-06-26 Thread Lukasz Cwik
On Wed, Jun 26, 2019 at 4:22 PM Anton Kedin wrote: > Currently our spotless is configured globally [1] (for java at least) to > include all source files by '**/*.java'. And then we exclude things > explicitly. Don't know why, but these exclusions are ignored for me > sometimes, for example

[DISCUSS] Releasing Vendored Artifacts

2019-06-25 Thread Lukasz Cwik
Ismael mentioned[1] that there is confusion about how to release and validate vendored artifacts. I have created this doc[2] and could use guidance from the community to validate its contents. Feel free to comment on the doc or this thread. Note that I used our release guide[3] as a basis for

Re: [VOTE] Release vendored artifacts upgrading Guava usage to 26.0-jre, release candidate #1

2019-06-25 Thread Lukasz Cwik
> process of verification and in general of release of the vendored > dependencies, so probably it is worth to do this and add it to the > release guide [1] (or as an independent document) so we can do the > validation eagerly. > > [1] https://beam.apache.org/contribute/

Re: Change of Behavior - JDBC Set Command

2019-06-25 Thread Lukasz Cwik
That makes sense. I took a look at your PR, is there a way to do it without exposing the reflection capabilities to pipeline authors? On Mon, Jun 24, 2019 at 2:20 PM Alireza Samadian wrote: > Hi all, > > I am writing to ask if it is OK to slightly change the behaviour of SET > command in JDBC

Re: PTransform.expand() guarantees

2019-06-21 Thread Lukasz Cwik
On Fri, Jun 21, 2019 at 10:01 AM Alexey Romanenko wrote: > Thank you for answers, Lukasz. > > On 21 Jun 2019, at 18:15, Lukasz Cwik wrote: > >> Does Beam guarantee where (at “driver” or at "worker” of backend system) " >> *PTransform.expand()*” of

Re: PTransform.expand() guarantees

2019-06-21 Thread Lukasz Cwik
On Fri, Jun 21, 2019 at 9:07 AM Alexey Romanenko wrote: > Hello, > > I tried to find an answer in documentation for the questions below but I > haven’t managed to do that. Actually, there are 3 related questions: > > Does Beam guarantee where (at “driver” or at "worker” of backend system) " >

Re: Assigning Reviewers in GitHub?

2019-06-21 Thread Lukasz Cwik
Only a few people have permission to update the 'Reviewers' section and I believe you either have to be a project PMC member or committer to be able to update it which is why all people should use "R: @GITHUB-USERNAME" as specified in the contribution guide[1]. 1:

Re: [VOTE] Release vendored artifacts upgrading Guava usage to 26.0-jre, release candidate #1

2019-06-20 Thread Lukasz Cwik
t 9:51 AM Lukasz Cwik wrote: > >> Hi everyone, >> >> Please review the release of the following artifacts that we vendor: >> beam-vendor-guava-26_0-jre >> beam-vendor-grpc-1_21_0 >> >> Please vote as follows: >> [ ] +1, Approve the release >> [

[VOTE] Release vendored artifacts upgrading Guava usage to 26.0-jre, release candidate #1

2019-06-20 Thread Lukasz Cwik
Hi everyone, Please review the release of the following artifacts that we vendor: beam-vendor-guava-26_0-jre beam-vendor-grpc-1_21_0 Please vote as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available

Re: [Forked] BEAM-4046 (was [PROPOSAL] Introduce beam-sdks-java gradle project)

2019-06-17 Thread Lukasz Cwik
> [1] https://github.com/apache/beam/pull/8194 >>> >>> >>> >>> On Thu, Apr 11, 2019 at 1:04 AM Michael Luckey >>> wrote: >>> >>>> To my understanding, that's it, yes. Of course, there might be other >>>> places/plugins whic

Re: Jira permissions

2019-06-13 Thread Lukasz Cwik
Welcome, I have added you as a contributor and assigned BEAM-7542 to you. On Wed, Jun 12, 2019 at 9:18 PM Viktor Gerdin wrote: > >Hello > >My name it Viktor > >I've encountered an issue (BEAM-7542 >) and would like to >

DRAFT - Apache Beam Board Report - June '19

2019-06-13 Thread Lukasz Cwik
Hi all Our next project report to the ASF Board of Directors is due June 14th. I've seeded a draft here: https://docs.google.com/document/d/1GY16lzVKL-mPh4M560AtqPAB1kXEptkhcBymvFr-4z8/edit?usp=sharing Please help to eliminate all the TODOs by adding suggestions. Luke

Re: Help triaging Jira issues

2019-06-12 Thread Lukasz Cwik
I looked at automating the two in JIRA but got the unhelpful: "You are using Automation Lite for Jira. This is the free offering of Automation for Jira Pro and only contains a small subset of the many awesome features of the paid app. For example, project admins like yourself can can only create

Re: Python dependency compatibility badges

2019-06-12 Thread Lukasz Cwik
SGTM On Wed, Jun 12, 2019 at 8:53 AM Ahmet Altay wrote: > Looks like a nice improvement to me. To make it very explicit, it seems to > focus on compatibility issues with google managed libraries even though the > reports identify general old dependencies as well. > > On Wed, Jun 12, 2019 at

Re: Contributor permission request for Apache Beam Jira

2019-06-11 Thread Lukasz Cwik
You have been added. On Tue, Jun 11, 2019 at 11:56 AM Andy Wang wrote: > It's anyyw, forgot to add that to my first email. Thanks! > > On Tue, Jun 11, 2019, 11:31 AM Lukasz Cwik wrote: > >> Welcome, I tried to add you but there were multiple accounts with your >> na

Re: Contributor permission request for Apache Beam Jira

2019-06-11 Thread Lukasz Cwik
Welcome, I tried to add you but there were multiple accounts with your name. What is your JIRA id? On Mon, Jun 10, 2019 at 4:22 PM Andy Wang wrote: > Hello, > > My name is Andy and I'd like to contribute where I can to the project. > Been using the tool for over a year now and would like to
