Contributor permissions for Beam JIRA

2019-05-20 Thread Kamil Wasilewski
Hi, I am Kamil Wasilewski and I would like to start making improvements to the Beam Python SDK. I would like to assign JIRA issues to myself. Can someone mark me as a contributor? Thanks, Kamil

Re: Contributor permissions for Beam JIRA

2019-05-20 Thread Kamil Wasilewski
Here's my username: kamilwu Kamil On Mon, May 20, 2019 at 11:47 AM Maximilian Michels wrote: > Hi Kamil, > > That sounds great. Could you send me your JIRA username? I couldn't find > your account on JIRA. > > Thanks, > Max > > On 20.05.19 11:27, Kamil Was

[PROPOSAL] Storing, displaying and detecting anomalies in test results

2019-08-23 Thread Kamil Wasilewski
Hi all, Recently we did some research on how to visualize IO performance tests, Nexmark and Load test results better and how to detect regressions automatically in an easy way using tools dedicated for the job. We'd like to share a proposal with you:

Re: [PROPOSAL] Storing, displaying and detecting anomalies in test results

2019-09-05 Thread Kamil Wasilewski
> worked on internal benchmarking at Google - would you take a look please? > > On Fri, Aug 23, 2019 at 3:22 AM Kamil Wasilewski < > kamil.wasilew...@polidea.com> wrote: > >> Hi all, >> >> Recently we did some research on how to visualize IO performance tests,

Reading from BigQuery on portable runners in Python SDK

2019-10-01 Thread Kamil Wasilewski
Hi all, At the moment, we have a BigQuery native source for Python SDK, which can be used only by Dataflow runner. Consequently, it doesn't work on portable runners, such as Flink. Recently I have written a prototypical source which implements iobase.BoundedSource, so that other runners can read

Re: Reading from BigQuery on portable runners in Python SDK

2019-10-01 Thread Kamil Wasilewski
If anyone is interested, here is a link to my code: https://github.com/kamilwu/beam/tree/bounded-source-for-bq On Tue, Oct 1, 2019 at 11:17 AM Kamil Wasilewski < kamil.wasilew...@polidea.com> wrote: > Hi all, > > At the moment, we have a BigQuery native source for Python SDK, whic

Beam Python fails to run on macOS 10.15?

2019-10-10 Thread Kamil Wasilewski
Hi all, I recently updated macOS to 10.15 Catalina. Since then, I get the following error when I try to import the apache_beam package (in both Python 2.7 and 3.x): >>> import apache_beam [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already exists in database: [libprotobu

Python2 postcommit broken

2019-10-16 Thread Kamil Wasilewski
Hello all, I've noticed that all Python2 post-commit tests have been failing for the last two days [1]. The logs show it's because of an exception in the sdks:python:test-suites:portable:py2:crossLanguagePortableWordCount task: RuntimeError: IOError: [Errno 2] No such file or directory: '/tmp/beam-temp-py-wor

Re: Timeouting jobs do not notify builds@

2019-10-17 Thread Kamil Wasilewski
Thanks Łukasz for addressing this! In my opinion, such builds are de facto failed and should be treated in the same way. I think it's fine for now to notify builds@, but in the future we should also take this problem into account when developing an anomaly detection system. I left some comments on t

Re: [PROPOSAL] Storing, displaying and detecting anomalies in test results

2019-11-07 Thread Kamil Wasilewski
lo Estrada wrote: > >> Thanks Kamil for bringing this! >> +Manisha Bhardwaj +Mark Liu have >> worked on internal benchmarking at Google - would you take a look please? >> >> On Fri, Aug 23, 2019 at 3:22 AM Kamil Wasilewski < >> kamil.wasilew...@polidea.com>

Re: [PROPOSAL] Storing, displaying and detecting anomalies in test results

2019-11-07 Thread Kamil Wasilewski
Thanks for spotting this! It should be working fine now. On Thu, Nov 7, 2019 at 5:40 PM Dan Gazineu wrote: > Thank you for the update Kamil! Please fix the sharing options in the new > doc. > > On Thu, Nov 7, 2019 at 7:22 AM Kamil Wasilewski < > kamil.wasilew...@polidea.com>

Wiki edit access

2019-12-04 Thread Kamil Wasilewski
Hi all, I'm going to contribute to the documentation pages that describe the testing framework in Beam. May I get access to edit the Wiki? My username is kamilwu. Kamil

Re: Wiki edit access

2019-12-04 Thread Kamil Wasilewski
Thanks! On Wed, Dec 4, 2019 at 5:07 PM Maximilian Michels wrote: > Done ;) > > On 04.12.19 15:49, Kamil Wasilewski wrote: > > Hi all, > > > > I'm going to make a contribution to documentation pages that describe > > testing framework in Beam. May I get ac

Poor Python 3.x performance on Dataflow?

2019-12-06 Thread Kamil Wasilewski
Hi all, Python 2.7 won't be maintained past 2020 and that's why we want to migrate all Python performance tests in Beam from Python 2.7 to Python 3.7. However, I was surprised to see that after switching the Dataflow tests to Python 3.x, they are a few times slower. For example, the same ParDo test

Performance drops in Python PortableRunner tests

2019-12-20 Thread Kamil Wasilewski
Hi all, We have a couple of Python load tests running on Flink in which we are testing the performance of ParDo, GroupByKey, CoGroupByKey and Combine operations. Recently, I've discovered that the runtime of all those tests rose significantly. It happened between the 6th and 7th of December (t

Re: Performance drops in Python PortableRunner tests

2020-01-02 Thread Kamil Wasilewski
te data. This much of a change is quite surprising. Where is >> the pipeline for, say, "Python | ParDo | 2GB, 100 byte records, 10 >> iterations | Batch" and how does one run it? >> >> On Fri, Dec 20, 2019 at 6:50 AM Kamil Wasilewski >> wrote: >> >

Re: [ANNOUNCE] New committer: Kasia Kucharczyk

2020-01-03 Thread Kamil Wasilewski
Congrats Kasia, good job! On Fri, Jan 3, 2020 at 8:22 AM Michał Walenia wrote: > Congratulations, Kasia! > > On Thu, Jan 2, 2020 at 6:52 PM Valentyn Tymofieiev > wrote: > >> Congratulations, Kasia! >> >> On Thu, Jan 2, 2020 at 1:23 AM Katarzyna Kucharczyk < >> ka.kucharc...@gmail.com> wrote: >>

Re: Poor Python 3.x performance on Dataflow?

2020-01-10 Thread Kamil Wasilewski
>>> This is very surprising--I would expect the times to be quite similar. Do >> you have profiles for where the (difference in) time is spent? With >> differences like these, I wonder if there are issues with container >> setup (e.g. some things not being installed or cached) for P

[DISCUSS] Integrate Google Cloud AI functionalities

2020-01-14 Thread Kamil Wasilewski
Hi all, We’d like to implement a set of PTransforms that would allow users to use some of the Google Cloud AI services in Beam pipelines. Here's the full list of services and functionalities we’d like to integrate Beam with: * Video Intelligence [1] * Cloud Natural Language [2] * Cloud AI Plat

Re: [DISCUSS] Integrate Google Cloud AI functionalities

2020-01-15 Thread Kamil Wasilewski
than once. >> >> On Tue, Jan 14, 2020 at 7:43 AM Ismaël Mejía wrote: >> >>> Nice idea, IO looks like a good place for them but there is another path >>> that could fit this case: `sdks/java/extensions`, some module like >>> `google-cloud-platform-ai` in

Re: [DISCUSS] Autoformat python code with Black

2020-01-23 Thread Kamil Wasilewski
Thank you Michał for creating the ticket. I have some free time and I'd like to volunteer myself for this task. Indeed, it looks like there's consensus for `yapf`, so I'll try `yapf` first. Best, Kamil On Thu, Jan 23, 2020 at 10:37 AM Michał Walenia wrote: > Hi all, > I created a JIRA issue fo

Re: [DISCUSS] Autoformat python code with Black

2020-01-24 Thread Kamil Wasilewski
> COALESCE_BRACKETS, that will conform more to the style we are already > (mostly) following. > > > On Thu, Jan 23, 2020 at 1:59 AM Kamil Wasilewski > wrote: > > > > Thank you Michał for creating the ticket. I have some free time and I'd > like to volunte

Re: [ANNOUNCE] New committer: Michał Walenia

2020-01-27 Thread Kamil Wasilewski
Congrats, Michał! On Tue, Jan 28, 2020 at 3:03 AM Udi Meiri wrote: > Congratulations Michał! > > On Mon, Jan 27, 2020 at 3:49 PM Chamikara Jayalath > wrote: > >> Congrats Michał! >> >> On Mon, Jan 27, 2020 at 2:59 PM Reza Rokni wrote: >> >>> Congratulations buddy! >>> >>> On Tue, 28 Jan 2020,

Re: [DISCUSS] Autoformat python code with Black

2020-02-06 Thread Kamil Wasilewski
>> strict. What do you think? >>> >> > >>> >>> >> > >>> On Thu, Jan 23, 2020 at 8:37 PM Robert Bradshaw < >>> rober...@google.com> wrote: >>> >> > >>>> >>> >> >

Re: [ANNOUNCE] New committer: Alex Van Boxel

2020-02-20 Thread Kamil Wasilewski
Congrats! On Thu, Feb 20, 2020 at 2:58 PM Karolina Rosół wrote: > Congratulations Alex! > > Karolina Rosół > Polidea | Project Manager > > M: +48 606 630 236 <+48606630236> > E: karolina.ro...@polidea.com > [image: Polidea] > > Check out our

Re: Are there extra Beam Python test matchers available?

2020-02-28 Thread Kamil Wasilewski
Hi, You can use matchers from the hamcrest module. For example, assuming your PCollection consists of a single list, you can use something like this to test whether it contains a subset: import hamcrest as hc assert_that(pcoll, matches_all([hc.has_items(1, 3, 6, 10)])) Thanks, Kamil On Tue, Feb 25, 2020

Re: [ANNOUNCE] New Committer: Kamil Wasilewski

2020-03-02 Thread Kamil Wasilewski
>>>>> wrote: >>>>>> >>>>>>> Congrats, Kamil! >>>>>>> >>>>>>> On Fri, Feb 28, 2020 at 9:53 AM Valentyn Tymofieiev < >>>>>>> valen...@google.com> wrote: >>>>>>>

Re: Python Static Typing: Next Steps

2020-03-03 Thread Kamil Wasilewski
+1 for enabling mypy as a precommit job. This, however, could be a good occasion to rework the current PythonLint job. Since yapf has been introduced, some of the checks made by pylint/flake are now unnecessary and could be removed. This would speed up PythonLint quite a lot. I volunteer to help w

Re: Upcoming Apache Beam meetups in Warsaw

2020-03-03 Thread Kamil Wasilewski
work closely with three Apache Beam > committers (Katarzyna Kucharczyk, Kamil Wasilewski and Michał Walenia). > > Together with folks from Polidea we'd like to announce our plans towards > the upcoming Apache Beam meetups in Warsaw. The next date for the Beam > meetup we're c

Re: [EXTERNAL] Re: Java Build broken

2020-03-03 Thread Kamil Wasilewski
I had the same problem; removing Gradle's cache (`rm -rf ~/.gradle/caches`) seems to have solved the issue. On Tue, Feb 25, 2020 at 4:33 PM Pulasthi Supun Wickramasinghe < pulasthi...@gmail.com> wrote: > Hi Stefan, > > Yes, I am also still getting this error on my local setup, However, > strangel

Re: Jenkins jobs not running for my PR 10438

2020-03-30 Thread Kamil Wasilewski
Done. On Mon, Mar 30, 2020 at 4:58 PM Tomo Suzuki wrote: > Hi Beam committers, > (Thanks Ahmet) > > Would you retrigger the following 5 jobs for > https://github.com/apache/beam/pull/11156 ? > Run Dataflow ValidatesRunner > Run Java PreCommit > Run Portable_Python PreCommit > Run PythonFormatter

Re: CommunityMetrics precommit failing

2020-04-07 Thread Kamil Wasilewski
By the way, we have a JIRA for this: https://issues.apache.org/jira/browse/BEAM-8409 It would be great if someone with SSH access to the Jenkins workers took care of that. On Tue, Apr 7, 2020 at 3:04 PM Michał Walenia wrote: > Hi all, > I noticed that recently the community metrics precommit jo

New Grafana dashboards

2020-05-13 Thread Kamil Wasilewski
Hello everyone, I'm pleased to announce that we've just moved the dashboards gathering performance test execution times from Perfkit Explorer to Grafana. Here's a link to the new dashboards: http://metrics.beam.apache.org *Why Grafana?* Grafana is an open-source visualization tool. It offers better user

Re: New Grafana dashboards

2020-05-15 Thread Kamil Wasilewski
! >> -P. >> >> On Wed, May 13, 2020 at 8:43 AM Tyson Hamilton >> wrote: >> >>> The dashboards look great! Thank you. >>> >>> It would be nice if the 'Useful Links' section included links to Apache >>> Beam related material

Re: [ANNOUNCE] New committer: Robin Qiu

2020-05-19 Thread Kamil Wasilewski
Congrats! On Tue, May 19, 2020 at 5:33 PM Jan Lukavský wrote: > Congrats Robin! > On 5/19/20 5:01 PM, Tyson Hamilton wrote: > > Congratulations! > > On Tue, May 19, 2020 at 6:10 AM Omar Ismail wrote: > >> Congrats! >> >> On Tue, May 19, 2020 at 5:00 AM Gleb Kanterov wrote: >> >>> Congratulatio

Re: Fwd: [DISCUSSION] Use github actions for python wheels ?

2020-06-01 Thread Kamil Wasilewski
"unistd.h" C header is present on POSIX systems (MacOS and Linux), but not on Windows, therefore you can't build a wheel for Windows. I took a look and "statesampler_fast.pyx" uses "unistd.h" only because of the `usleep` function. Unless we use C++ which offers [1], the solution would be to search

Re: Kafka IO performance tests leaving behind disks on GCP

2020-06-05 Thread Kamil Wasilewski
I opened a PR to stop spawning new leftover GKE disks: https://github.com/apache/beam/pull/11931. Please take a look. On Fri, Jun 5, 2020 at 12:55 AM Udi Meiri wrote: > Hi, > I opened a bug on what seems to be leftover GKE disk images: > https://issues.apache.org/jira/browse/BEAM-10145 > > Can a

Running Beam pipeline using Spark on YARN

2020-06-23 Thread Kamil Wasilewski
Hi all, I'm trying to run a Beam pipeline using Spark on YARN. My pipeline is written in Python, so I need to use a portable runner. Does anybody know how I should configure the job server parameters, especially --spark-master-url? Is there anything else I need to be aware of while using such a setup?

Re: Running Beam pipeline using Spark on YARN

2020-06-24 Thread Kamil Wasilewski
ains untested as far as I know :) >>> >>> As I indicated in a comment, you can set --output_executable_path to >>> create a jar that you can then submit to yarn via spark-submit. >>> >>> If you can get this working, I'd additionally like to script t

Re: Python SDK ReadFromKafka: Timeout expired while fetching topic metadata

2020-07-13 Thread Kamil Wasilewski
I'd like to bump this thread up since I get the same error when trying to read from Kafka in Python SDK: *java.lang.UnsupportedOperationException: The ActiveBundle does not have a registered bundle checkpoint handler.* Can someone familiar with cross-language and Flink verify the problem? I use t

Re: Python SDK ReadFromKafka: Timeout expired while fetching topic metadata

2020-07-14 Thread Kamil Wasilewski
Never mind, I found this thread on the user list: https://lists.apache.org/thread.html/raeb69afbd820fdf32b3cf0a273060b6b149f80fa49c7414a1bb60528%40%3Cuser.beam.apache.org%3E, which answers my question. On Mon, Jul 13, 2020 at 4:10 PM Kamil Wasilewski < kamil.wasilew...@polidea.com> wrote: >

Re: Monitoring performance for releases

2020-07-21 Thread Kamil Wasilewski
> > The prerequisite is that we have all the stats in one place. They seem > to be scattered across http://metrics.beam.apache.org and > https://apache-beam-testing.appspot.com. > > Would it be possible to consolidate the two, i.e. use the Grafana-based > dashboard to load the legacy stats? I'm p

Is there an equivalent for --numberOfWorkerHarnessThreads in Python SDK?

2020-08-20 Thread Kamil Wasilewski
Hi all, As I stated in the title, is there an equivalent for --numberOfWorkerHarnessThreads in Python SDK? I've got a streaming pipeline in Python which suffers from OutOfMemory exceptions (I'm using Dataflow). Switching to highmem workers solved the issue, but I wonder if I can set a limit of thr

Re: Is there an equivalent for --numberOfWorkerHarnessThreads in Python SDK?

2020-08-21 Thread Kamil Wasilewski
/github.com/apache/beam/blob/017936f637b119f0b0c0279a226c9f92a2cf4f15/sdks/python/apache_beam/options/pipeline_options.py#L834 >> >> On Thu, Aug 20, 2020 at 7:33 AM Kamil Wasilewski < >> kamil.wasilew...@polidea.com> wrote: >> >>> Hi all, >>> >>> As

Shutting down Perfkit Explorer

2020-09-18 Thread Kamil Wasilewski
Hello everyone, Beam support for Python 2 is coming to an end. Consequently, we should make sure no Python 2 applications are running as a part of Beam's infrastructure. As you may know, Beam is still hosting a Python 2 application on Google App Engine. This application is Perfkit Explorer [1]. P

Re: Shutting down Perfkit Explorer

2020-09-22 Thread Kamil Wasilewski
Thanks. The application has been disabled. On Fri, Sep 18, 2020 at 8:46 PM Ahmet Altay wrote: > +1. Thank you for the cleanup. > > On Fri, Sep 18, 2020 at 8:24 AM Tyson Hamilton wrote: > >> +1 to removing, thank you Kamil. >> >> On Fri, Sep 18, 20

Re: Shutting down Perfkit Explorer

2020-09-23 Thread Kamil Wasilewski
>>>> to add a Cloud Datastore or Cloud Firestore database. Note that Cloud >>>> Datastore or Cloud Firestore always have an associated App Engine app and >>>> this app must not be disabled. >>>> New failure: >>>> https://ci-beam.apache.org/job/

Re: Shutting down Perfkit Explorer

2020-09-24 Thread Kamil Wasilewski
pect if we don't we'll just have a repeat of "we shut down app >> engine since it was just running a hello world, and the Datastore tests >> died". >> >> On Wed, Sep 23, 2020, 8:14 AM Kamil Wasilewski < >> kamil.wasilew...@polidea.com> wrote:

Re: Shutting down Perfkit Explorer

2020-09-24 Thread Kamil Wasilewski
The message has been updated: https://apache-beam-testing.appspot.com/ On Thu, Sep 24, 2020 at 12:07 PM Kamil Wasilewski < kamil.wasilew...@polidea.com> wrote: > I'm not sure if such a jira issue exists. I'm also not convinced that we > need a new one. New jira means th

Re: Shutting down Perfkit Explorer

2020-09-25 Thread Kamil Wasilewski
fter the battle, but how should we access the dashboards > now (load tests, nexmark etc...)? > > Are the dashboards lost or have they migrated to another environment ? > > Thanks > > Etienne > On 24/09/2020 13:51, Robert Burke wrote: > > LGTM > Good clear me

Adding CombineFn setup and teardown

2020-10-08 Thread Kamil Wasilewski
Hi all, Recently, I've been working on adding CombineFn.setup and CombineFn.teardown to the Python SDK [1] (the Java SDK is also on the list, but for now, it's only Python). Here's my implementation: https://github.com/apache/beam/pull/13048. Would someone be willing to take a look? The functionality is

Re: BeamSQL and Beam equivalent -- examples?

2020-11-02 Thread Kamil Wasilewski
Hi Austin, Did you take a look at Nexmark tests? Some of them have two versions: Beam and BeamSQL. It sounds like this is what you are looking for. For example: https://github.com/kamilwu/beam/blob/master/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query0.java http

Re: Jenkins trigger phrase "run seed job" not working?

2020-11-09 Thread Kamil Wasilewski
Has anyone noticed problems with running "run seed job" by committers when the author of the Pull Request is NOT a committer? For example: https://github.com/apache/beam/pull/13242. Neither I nor Valentyn could trigger the job. Does it mean that it's an author's username that really matters, not a

Re: [REMOTE WORKSHOPS] Introduction to Apache Beam - remote workshops Dec 3rd and Dec 10th

2020-11-20 Thread Kamil Wasilewski
S at >> Polidea and I'm working with great Apache Beam committers Michał Walenia & >> Kamil Wasilewski who will be carrying out the introductory remote workshops >> to Apache Beam on *Dec 3rd* and *Dec 10th*. >> >> If you're interested in taking part in the