Re: Jenkins jobs not running for my PR 10438

2020-01-21 Thread Kirill Kozlov
Thank you, Rui and Ankur!

On Tue, Jan 21, 2020 at 4:14 PM Ankur Goenka  wrote:

> Done
>
> On Tue, Jan 21, 2020 at 4:03 PM Kirill Kozlov 
> wrote:
>
>> Forgot to paste a link to a PR: https://github.com/apache/beam/pull/10649
>>
>> On Tue, Jan 21, 2020 at 3:47 PM Rui Wang  wrote:
>>
>>> Done, but I am not seeing tests triggered by those two commands.
>>>
>>> On Tue, Jan 21, 2020 at 3:42 PM Kirill Kozlov 
>>> wrote:
>>>
>>>> Hello again!
>>>>
>>>> Could someone trigger tests on this PR please?
>>>> *Run SQL postcommit*
>>>> *Run JavaBeamZetaSQL PreCommit*
>>>>
>>>> On Mon, Jan 20, 2020 at 2:38 PM Kirill Kozlov 
>>>> wrote:
>>>>
>>>>> Thank you, Ismaël!
>>>>>
>>>>> On Mon, Jan 20, 2020 at 2:14 PM Ismaël Mejía 
>>>>> wrote:
>>>>>
>>>>>> done
>>>>>>
>>>>>> On Mon, Jan 20, 2020 at 9:27 PM Kirill Kozlov <
>>>>>> kirillkoz...@google.com> wrote:
>>>>>>
>>>>>>> Hello Beam community,
>>>>>>>
>>>>>>> Can a committer re-run tests for this PR please?
>>>>>>> https://github.com/apache/beam/pull/10440
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> On Fri, Jan 17, 2020 at 4:44 PM Tomo Suzuki 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank you, Ahmet.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 17, 2020 at 7:22 PM Tomo Suzuki 
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Hi Beam committer,
>>>>>>>> >
>>>>>>>> > I would appreciate it if somebody could trigger the following checks for
>>>>>>>> https://github.com/apache/beam/pull/10631
>>>>>>>> >
>>>>>>>> > Run JavaPortabilityApi PreCommit
>>>>>>>> > Run Java PostCommit
>>>>>>>> > Run Java HadoopFormatIO Performance Test
>>>>>>>> > Run BigQueryIO Streaming Performance Test Java
>>>>>>>> > Run Dataflow ValidatesRunner
>>>>>>>> > Run Spark ValidatesRunner
>>>>>>>> > Run SQL Postcommit
>>>>>>>> >
>>>>>>>> > Regards,
>>>>>>>> > Tomo
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Tomo
>>>>>>>>
>>>>>>>


PreCommit Java Portability is failing

2020-01-21 Thread Kirill Kozlov
Hello beam developers!

Jenkins job page:
https://builds.apache.org/job/beam_PreCommit_JavaPortabilityApi_Cron/
The following task fails:
*:runners:google-cloud-dataflow-java:buildAndPushDockerContainer*
with the error:
*'command 'docker'' finished with non-zero exit value 1*

Is it possible that this task is failing due to recent INFRA changes?

-
Kirill


Re: Jenkins job execution policy

2020-01-14 Thread Kirill Kozlov
Thanks for working on this!

I have noticed that tests run for new PRs and force-pushed commits, but if
a test fails due to a flake I am unable to re-run it (ex: "Run Java
PreCommit").
A PR that has this issue: https://github.com/apache/beam/pull/10369.
Is this intended behaviour?

-
Kirill

On Tue, Jan 14, 2020 at 3:20 PM Luke Cwik  wrote:

> Does the approval list live beyond the lifetime of the jenkins machine (my
> initial impression is that the approval list disappears on Jenkins machine
> restart)?
>
> Also, I imagine that ASF wants an explicit way to see who is approved and
> who is denied which the plugin doesn't seem to allow.
>
> On Tue, Jan 14, 2020 at 3:11 PM Pablo Estrada  wrote:
>
>> I've merged https://github.com/apache/beam/pull/10582 to unblock
>> existing contributors that are having trouble getting their PRs tested
>> without committer help. We can discuss Kai's suggestion.
>>
>> Looking at https://github.com/jenkinsci/ghprb-plugin, it seems like the
>> 'add to whitelist' comment adds contributors permanently to a whitelist.
>> This would have more immediate results than the .asf.yaml file. It would be
>> harder to track who has the privilege, but it doesn't sound like that
>> concerns us, right?
>>
>> Thoughts from others?
>> -P.
>>
>> On Tue, Jan 14, 2020 at 1:43 PM Kai Jiang  wrote:
>>
>>> Nice! I took a look at the Beam Jenkins job properties
>>> (CommonJobProperties.groovy#L108-L111) and they use the
>>> jenkinsci/ghprb-plugin. It should support a committer commenting "add to
>>> whitelist" on a PR to add new contributors to the whitelist.
>>> Adding GitHub accounts to the .asf.yaml might be a little heavy if this
>>> approach works. Could we also test this method?
>>>
>>> Best,
>>> Kai
>>>
>>>
>>> On Tue, Jan 14, 2020 at 1:16 PM Pablo Estrada 
>>> wrote:
>>>
 I've added all the PR authors for the last 1000 merged PRs. I will
 merge in a few minutes. I'll have a follow up change to document this on
 the website.

 On Tue, Jan 14, 2020 at 11:29 AM Luke Cwik  wrote:

> Should we scrape all past contributors and add them to the file?
>
> On Tue, Jan 14, 2020 at 11:18 AM Kenneth Knowles 
> wrote:
>
>> Nice! This will help at least temporarily. We can see if it grows too
>> unwieldy. It is still unfriendly to newcomers.
>>
>> Kenn
>>
>> On Tue, Jan 14, 2020 at 11:06 AM Pablo Estrada 
>> wrote:
>>
>>> Hi all,
>>> ASF INFRA gave us a middle-ground sort of workaround for this by
>>> using .asf.yaml files. Here's a change to implement it[1], and
>>> documentation for the .asf.yaml file[2], as well as the relevant section
>>> for our case[3].
>>>
>>> I'll check the docs in [2] well before pushing to merge, just to be
>>> sure we're not breaking anything.
>>>
>>> [1] https://github.com/apache/beam/pull/10582
>>> [2]
>>> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories
>>>
>>> [3]
>>> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-JenkinsPRWhitelisting
>>>
>>> On Mon, Jan 13, 2020 at 3:29 PM Luke Cwik  wrote:
>>>
 I'm for going back to the status quo where anyone's PR ran the
 tests automatically or to the suggestion where users marked as 
 contributors
 had their tests run automatically (with the documentation update about
 how to link your GitHub/JIRA accounts).

 On Mon, Jan 13, 2020 at 2:45 AM Michał Walenia <
 michal.wale...@polidea.com> wrote:

> Hi,
> I wanted to decouple the conversation about solutions to the issue
> from job execution requests.
> We have 131 open PRs right now and 64 committers with job running
> privileges. From what I counted, more than 80 of those PRs are not 
> authored
> by committers.
> I think that having committers answer testing and retesting
> requests is a temporary solution and a permanent one should be 
> decided upon
> soon. While it's an inconvenience for contributors familiar with the
> workings of the project and the community, newcomers might be put off 
> by
> the fact that the tests don't run automatically on their pull requests
> (this is an industry standard which IMO should be upheld also in 
> Beam). The
> barrier of finding one of the committers who is active and willing to
> trigger
> their tests can make the entry to the project more difficult.
>
> I believe that the solution proposed by Kenneth in the Jira thread
> 
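The `.asf.yaml` workaround merged in apache/beam PR 10582 (linked earlier in this thread) whitelists GitHub users so Jenkins triggers their PR tests without committer help. Based on the INFRA documentation referenced above, the relevant section looks roughly like this (the usernames below are placeholders, not the real list):

```yaml
# .asf.yaml -- sketch of the Jenkins PR whitelisting section only.
# Usernames are placeholders; see the INFRA cwiki page for the full format.
jenkins:
  github_whitelist:
    - some-contributor
    - another-contributor
```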

Re: Failing Java PostCommit for Dataflow runner

2020-01-13 Thread Kirill Kozlov
Thanks for taking care of this!

On Mon, Jan 13, 2020 at 2:00 PM Boyuan Zhang  wrote:

> This problem is addressed by PR10564. Now all affected tests are back to
> green.
>
> On Mon, Jan 13, 2020 at 1:11 PM Luke Cwik  wrote:
>
>> This is being tracked in BEAM-9083
>>
>> On Mon, Jan 13, 2020 at 11:23 AM Boyuan Zhang  wrote:
>>
>>> Thanks Kirill! I'm going to look into it.
>>>
>>> On Mon, Jan 13, 2020 at 11:18 AM Kirill Kozlov 
>>> wrote:
>>>
>>>> Hello everyone!
>>>>
>>>> I have noticed that Jenkins tests for Dataflow runner [1] are failing
>>>> with a runtime exception. It looks like the issue originated here [2],
>>>> failed Dataflow job [3].
>>>> We should look into fixing it.
>>>>
>>>> Failing test:
>>>> :runners:google-cloud-dataflow-java:validatesRunnerLegacyWorkerTest »
>>>> org.apache.beam.sdk.transforms.ParDoTest$TimerTests » testOutputTimestamp
>>>> (29.723s)
>>>>
>>>> Exception thrown:
>>>>
>>>> java.lang.RuntimeException: Workflow failed. Causes: Unknown streaming 
>>>> source: test_stream
>>>>
>>>>
>>>> [1]
>>>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/
>>>> [2]
>>>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/4157/
>>>> [3]
>>>> https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2020-01-08_14_36_07-180707589785292440?project=apache-beam-testing
>>>>
>>>


Failing Java PostCommit for Dataflow runner

2020-01-13 Thread Kirill Kozlov
Hello everyone!

I have noticed that Jenkins tests for Dataflow runner [1] are failing with
a runtime exception. It looks like the issue originated here [2], failed
Dataflow job [3].
We should look into fixing it.

Failing test:
:runners:google-cloud-dataflow-java:validatesRunnerLegacyWorkerTest »
org.apache.beam.sdk.transforms.ParDoTest$TimerTests » testOutputTimestamp
(29.723s)

Exception thrown:

java.lang.RuntimeException: Workflow failed. Causes: Unknown streaming
source: test_stream


[1]
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/
[2]
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/4157/
[3]
https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2020-01-08_14_36_07-180707589785292440?project=apache-beam-testing


[Design Proposal] DataStore SQL Connector

2020-01-08 Thread Kirill Kozlov
Hello everyone!

I have written up a proposal [1] for a DataStore SQL connector. I would
love to hear comments and suggestions from the Beam dev community!

A quick summary:
DataStore [2] is a NoSQL database with a dynamic schema, where entities
(documents) are stored in kinds (analogous to tables). Each entity has a key
(a unique identifier) [3], which consists of a partition id and a path (the
path can be used to link to other entities).
The proposal is to implement *PTransforms* to perform conversions between
DataStore data types and Beam types: *EntityToRow* and *RowToEntity*. The
PTransforms can be used independently or via a *SQL Table* (which will use
them implicitly).
The SQL Table should allow users to specify the row schema field name in
which to store the key.

[1]
https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?usp=sharing
[2] https://cloud.google.com/datastore/
[3]
https://cloud.google.com/datastore/docs/concepts/entities#kinds_and_identifiers
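To make the proposed conversion concrete, here is a standalone sketch of the EntityToRow/RowToEntity idea. Plain Python dicts stand in for DataStore entities and Beam rows, and every name here (`entity_to_row`, `key_field`, the entity layout) is invented for illustration; none of this is the proposed connector's actual API.

```python
# Illustrative sketch only -- not the proposed connector's code. It models the
# idea from the proposal: flatten a DataStore-style entity into a flat row
# whose schema has a user-chosen field for the entity key.

def entity_to_row(entity, key_field="__key__"):
    """Flatten an entity shaped like
    {"key": {"partition_id": ..., "path": [...]}, "properties": {...}}
    into a flat row dict."""
    row = dict(entity["properties"])
    # Store the (partition_id, path) key under the user-specified field name.
    row[key_field] = (entity["key"]["partition_id"],
                      tuple(entity["key"]["path"]))
    return row

def row_to_entity(row, key_field="__key__"):
    """Inverse conversion: rebuild an entity from a row dict."""
    props = {k: v for k, v in row.items() if k != key_field}
    partition_id, path = row[key_field]
    return {"key": {"partition_id": partition_id, "path": list(path)},
            "properties": props}

entity = {
    "key": {"partition_id": "my-project", "path": [("Customer", 42)]},
    "properties": {"name": "Alice", "balance": 10},
}
row = entity_to_row(entity, key_field="customer_key")
# The conversion round-trips: RowToEntity(EntityToRow(e)) == e.
assert row_to_entity(row, key_field="customer_key") == entity
```

The round-trip property at the end is the main contract the two PTransforms would need to uphold when used independently of the SQL Table.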


Re: Edit access to Wiki

2020-01-03 Thread Kirill Kozlov
Thank you!

On Fri, Jan 3, 2020 at 10:39 AM Luke Cwik  wrote:

> I have added you. Happy editing.
>
> On Fri, Jan 3, 2020 at 10:31 AM Kirill Kozlov 
> wrote:
>
>> Hello everyone!
>>
>> I was hoping to add a design doc for SQL push-down [1] to the Wiki page
>> [2], but I need edit access.
>> What is the process for obtaining edit access?
>> My wiki username is: Kirill Kozlov
>>
>> [1]
>> https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit
>> [2] https://cwiki.apache.org/confluence/display/BEAM/Design+Documents
>>
>


Edit access to Wiki

2020-01-03 Thread Kirill Kozlov
Hello everyone!

I was hoping to add a design doc for SQL push-down [1] to the Wiki page
[2], but I need edit access.
What is the process for obtaining edit access?
My wiki username is: Kirill Kozlov

[1]
https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit
[2] https://cwiki.apache.org/confluence/display/BEAM/Design+Documents


Re: Intellij Issue in Imports

2019-12-27 Thread Kirill Kozlov
Hi Zohaib,

What command are you using to build and run the project?
Does building from command line work?

Try checking for problems via "File -> Project Structure -> Problems".

IntelliJ setup notes can be found here:
https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit


On Thu, Dec 26, 2019 at 11:20 PM Zohaib Baig 
wrote:

> Hi,
>
> According to the documentation, I have set up the Beam project from scratch
> in IntelliJ. It seems like some files have import issues and do not build;
> as a result, I wasn't able to test through the IDE (working on Windows).
>
> Is there any other configuration that I am missing?
>
> Thank you.
>
> [image: image.png]
>
>
>
> --
>
> *Muhammad Zohaib Baig*
> Senior Software Engineer
> Mobile: +92 3443060266 <+92%20344%203060266>
> Skype: mzobii.baig
>
> 
>


Quota limitation for Java tests

2019-12-09 Thread Kirill Kozlov
Hello everyone!

It looks like the JavaPostCommit Jenkins tests [1] are failing due to CPU quota
limitations.
Could someone please look into this?

[1]
https://builds.apache.org/job/beam_PostCommit_Java/4838/testReport/junit/org.apache.beam.examples.complete/TrafficMaxLaneFlowIT/testE2ETrafficMaxLaneFlow/

--
Kirill


Jenkins jobs are not being displayed on GitHub

2019-12-03 Thread Kirill Kozlov
Hello everyone!

It looks like, for PRs created within the last 30 minutes, the status of
Jenkins jobs is not being displayed.
The seed job appears to be stuck [1]: #5293 is waiting for #5292 to finish,
but #5293 shows that it is complete.

[1] https://builds.apache.org/job/beam_SeedJob/

--
Kirill


Re: Update on push-down for SQL IOs.

2019-12-02 Thread Kirill Kozlov
>
> ParquetIO, CassandraIO/HBaseIO/BigTableIO (all should be about the same),
> JdbcIO, IcebergIO (doesn't exist yet, but is basically generalized
> schema-aware files as I understand it).

I think that adding Jiras with a tag "starter" for implementing push-down
for all of the IO interfaces listed above would be a good start. The design
doc does have an example for project push-down; predicate push-down example
is in the works.
Hopefully, that will make it straightforward for new contributors.

On Thu, Nov 28, 2019 at 4:32 AM David Morávek 
wrote:

> Nice, this should bring a great performance improvement for SQL. Thanks
> for your work!
>
> On Thu, Nov 28, 2019 at 6:33 AM Kenneth Knowles  wrote:
>
>> Nice! Thanks for the very thorough summary. I think this will be a really
>> good thing for Beam. Most of the IO sources are very highly optimized for
>> querying and will do it more efficiently than the Beam runner when the
>> structure of the query matches. I'm really excited to see the performance
>> measurements.
>>
>> I have a thought: your update did not mention a few extensions that we
>> might consider: ParquetIO, CassandraIO/HBaseIO/BigTableIO (all should be
>> about the same), JdbcIO, IcebergIO (doesn't exist yet, but is basically
>> generalized schema-aware files as I understand it). Are these things you
>> are thinking about doing, or would these be Jiras that could potentially be
>> tagged "starter"? They seem complex but maybe your framework will make it
>> feasible for someone with slightly less experience to implement new
>> versions of what you have already finished?
>>
>> Kenn
>>
>> On Tue, Nov 26, 2019 at 12:19 PM Kirill Kozlov 
>> wrote:
>>
>>> Hello everyone!
>>>
>>> I have been working on a push-down feature and would like to give a
>>> brief update on what is done and is still under works.
>>>
>>> *Things that are done*:
>>> General API for SQL IOs to provide information about what
>>> filters/projects they support [1]:
>>> - *Projects* can be unsupported, supported with field reordering, or
>>> supported without field reordering.
>>> - *Predicate* is broken down into a conjunctive normal form (CNF) and
>>> passed to a validator class to check what parts are supported or
>>> unsupported by an IO.
>>>
>>> A Calcite rule [2] that checks for push-down support, constructs a new
>>> IO source Rel [3] with pushed-down projects and filters when applicable,
>>> and preserves unsupported filters/projects.
>>>
>>> BigQuery should perform push-down when running queries in DIRECT_READ
>>> method [4].
>>>
>>> MongoDB project push-down support is in a PR [5] and predicate support
>>> will be added soon.
>>>
>>>
>>> *Things that are in progress:*
>>> Documenting how developers can enable push-down for IOs that support it.
>>>
>>> Documenting certain limitations of BigQuery push-down (ex: comparing
>>> values of 2 columns is not supported at the moment, so it is being
>>> preserved in a Calc).
>>>
>>> Updating google-cloud-bigquerystorage to 0.117.0-beta. Earlier versions
>>> have a gRPC message limit set to ~11MB, which may cause some pipelines to
>>> break when reading from a table with rows larger than the limit.
>>>
>>> Adding some sort of performance tests to run continuously to
>>> measure speed-up and detect regressions.
>>>
>>> Deciding how cost should be computed for the IO source Rel with
>>> push-down [6]. Right now the following formula is used: cost of an IO
>>> without push-down minus the normalized (between 0.0 and 1.0) benefit of a
>>> performed push-down.
>>> The challenge here is to make the change to the cost small enough to not
>>> break join reordering, but large enough to make the optimizer favor
>>> pushed-down IO.
>>>
>>>
>>> If you have any suggestions/questions/concerns I would love to hear them.
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java#L36
>>> [2]
>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
>>> [3]
>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java
>>> [4]
>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L128
>>> [5] https://github.com/apache/beam/pull/10095
>>> [6] https://github.com/apache/beam/pull/10060
>>>
>>> --
>>> Kirill
>>>
>>


Update on push-down for SQL IOs.

2019-11-26 Thread Kirill Kozlov
Hello everyone!

I have been working on a push-down feature and would like to give a brief
update on what is done and is still under works.

*Things that are done*:
General API for SQL IOs to provide information about what filters/projects
they support [1]:
- *Projects* can be unsupported, supported with field reordering, or
supported without field reordering.
- *Predicate* is broken down into a conjunctive normal form (CNF) and
passed to a validator class to check what parts are supported or
unsupported by an IO.
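The CNF decomposition mentioned above can be illustrated with a small standalone sketch. Beam's rule operates on Calcite RexNode trees; the tuple representation and `to_cnf` function here are inventions purely to show the transformation, not Beam's code.

```python
# Illustrative CNF decomposition: splitting a predicate into AND-ed clauses
# that can each be checked (and pushed down) independently.
# An expression is an atom (string) or a tuple ("and"|"or", left, right).

def to_cnf(expr):
    """Return a list of clauses; each clause is a frozenset of OR-ed atoms."""
    if isinstance(expr, str):                 # atom: a single one-literal clause
        return [frozenset([expr])]
    op, left, right = expr
    l, r = to_cnf(left), to_cnf(right)
    if op == "and":                           # AND: concatenate clause lists
        return l + r
    if op == "or":                            # OR: distribute over clause pairs
        return [lc | rc for lc in l for rc in r]
    raise ValueError("unknown operator: %s" % op)

# (a AND b) OR c  ->  (a OR c) AND (b OR c)
clauses = to_cnf(("or", ("and", "a", "b"), "c"))
```

Each resulting clause can then be classified by the IO's validator: supported clauses are pushed to the source, unsupported ones are preserved in a Calc above it.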

A Calcite rule [2] that checks for push-down support, constructs a new IO
source Rel [3] with pushed-down projects and filters when applicable, and
preserves unsupported filters/projects.

BigQuery should perform push-down when running queries in DIRECT_READ
method [4].

MongoDB project push-down support is in a PR [5] and predicate support will
be added soon.


*Things that are in progress:*
Documenting how developers can enable push-down for IOs that support it.

Documenting certain limitations of BigQuery push-down (ex: comparing values
of 2 columns is not supported at the moment, so it is being preserved in a
Calc).

Updating google-cloud-bigquerystorage to 0.117.0-beta. Earlier versions
have a gRPC message limit set to ~11MB, which may cause some pipelines to
break when reading from a table with rows larger than the limit.

Adding some sort of performance tests to run continuously to
measure speed-up and detect regressions.

Deciding how cost should be computed for the IO source Rel with push-down
[6]. Right now the following formula is used: cost of an IO without
push-down minus the normalized (between 0.0 and 1.0) benefit of a performed
push-down.
The challenge here is to make the change to the cost small enough to not
break join reordering, but large enough to make the optimizer favor
pushed-down IO.
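As a rough illustration of the cost heuristic described above: because the subtracted benefit is normalized into [0.0, 1.0), the adjustment can never shift a cost by a full unit, which is what keeps join reordering stable while still breaking ties in favor of push-down. The function name, parameters, and the particular normalization below are all hypothetical; the real rule computes Calcite RelOptCost objects.

```python
# Hypothetical sketch of "cost without push-down minus a benefit normalized
# to [0.0, 1.0)". Not the actual rule's formula.

def pushed_down_cost(io_cost, rows_filtered_out, fields_dropped,
                     total_rows, total_fields):
    # Fraction of work avoided by the pushed-down filter and project,
    # capped strictly below 1.0 so the adjustment stays sub-unit.
    benefit = (0.5 * (rows_filtered_out / total_rows)
               + 0.5 * (fields_dropped / total_fields))
    return io_cost - min(benefit, 0.999)

base = 100.0
with_push_down = pushed_down_cost(base, rows_filtered_out=900,
                                  fields_dropped=3,
                                  total_rows=1000, total_fields=10)
# The pushed-down plan is always cheaper than the base, but never by >= 1.0.
assert base - 1.0 < with_push_down < base
```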


If you have any suggestions/questions/concerns I would love to hear them.

[1]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java#L36
[2]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
[3]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java
[4]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L128
[5] https://github.com/apache/beam/pull/10095
[6] https://github.com/apache/beam/pull/10060

--
Kirill


Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-20 Thread Kirill Kozlov
[ ] Beaver
[ ] Hedgehog
[X] Lemur
[X] Owl
[ ] Salmon
[ ] Trout
[ ] Robot dinosaur
[ ] Firefly
[ ] Cuttlefish
[ ] Dumbo Octopus
[X] Angler fish


On Wed, Nov 20, 2019, 08:38 Cyrus Maden  wrote:

> Here's my vote, but I'm curious about the distinction between salmon and
> trout mascots :)
>
> [ ] Beaver
> [ ] Hedgehog
> [ X] Lemur
> [ ] Owl
> [ X] Salmon
> [ ] Trout
> [ ] Robot dinosaur
> [ X] Firefly
> [ ] Cuttlefish
> [ ] Dumbo Octopus
> [ X] Angler fish
>
> On Wed, Nov 20, 2019 at 11:24 AM Allan Wilson 
> wrote:
>
>>
>>
>> On 11/20/19, 8:44 AM, "Ryan Skraba"  wrote:
>>
>> *** Vote for as many as you like, using this checklist as a template
>> 
>>
>> [] Beaver
>> [X] Hedgehog
>> [X ] Lemur
>> [ ] Owl
>> [ ] Salmon
>> [] Trout
>> [ ] Robot dinosaur
>> [ ] Firefly
>> [ ] Cuttlefish
>> [ ] Dumbo Octopus
>> [ ] Angler fish
>>
>>
>>


Re: [Discuss] Beam mascot

2019-11-15 Thread Kirill Kozlov
Angler fish? Found a few animated examples that may look interesting [1, 2,
3, 4].
[1] https://www.pinterest.com/pin/121175046208927340/
[2] https://www.pinterest.com/pin/353180795779533157/
[3] https://www.pinterest.com/pin/121175046208927334/
[4]
https://graphicriver.net/item/cartoon-angler-fish/16828573?ref=gvector_id=1413785108_back=true

On Fri, Nov 15, 2019 at 11:39 AM Aizhamal Nurmamat kyzy 
wrote:

> That last sketch of a cuttlefish with a hat is really really good. I vote
> for that.
>
> On Fri, Nov 15, 2019 at 10:33 AM Kenneth Knowles  wrote:
>
>> PSA: cuttlefish tentacles look more like their face, not their legs.
>> Please find attached illustrations to that effect. And also evidence that
>> they are occasionally fancy. Definitely *not* from a respectable designer
>> or anyone who could execute a professional logo.
>>
>> I do think the PMC needs to shepherd this. I would suggest starting with
>> an approval vote [1] for animal (no plant ideas?) and then an ASF-style
>> vote to record the validated result. From there, we can go through more
>> standard design process.
>>
>> Kenn
>>
>> [1] https://www.electionscience.org/library/approval-voting/
>>
>> On Fri, Nov 15, 2019 at 6:54 AM Maximilian Michels 
>> wrote:
>>
>>> It's great we're having this discussion and we came up with a lot of
>>> great ideas through it. However, it is unclear how we proceed from here.
>>> Certainly, we can't let designers work with an open-ended discussion on
>>> the type of mascot.
>>>
>>> Personally, I'm fine with _any_ kind of mascot, as long as it is made by
>>> a decent designer. The respectable designer I'm talking to was very
>>> generous to donate these sketches for free. I don't think that this is
>>> to be expected at all. Coming back to the original idea of hiring
>>> multiple designers, I don't see how we will pay those designers to all
>>> come up with a logo, unless we all donate money.
>>>
>>> The sketches I've sent might not look like much because there is still a
>>> decent process involved in coming up with the final logo which looks
>>> good in different sizes, possibly with many iterations. So far I've
>>> tried to incorporate as many of the suggestions here, but for the sake
>>> of protecting the designer, I think I'll have to stop doing that. It
>>> simply won't work.
>>>
>>> How to proceed from here? I think _any_ professionally executed logo
>>> will do; this is really more about the community agreeing for the
>>> greater good. Ultimately, I think the PMC might have to decide on how to
>>> proceed here.
>>>
>>> Cheers,
>>> Max
>>>
>>> On 15.11.19 13:59, Hannah Jiang wrote:
>>> > I also vote for firefly.
>>> >
>>> > On Wed, Nov 13, 2019 at 1:38 PM Valentyn Tymofieiev <
>>> valen...@google.com
>>> > > wrote:
>>> >
>>> > I like the firefly sketch a lot, it's my favorite so far.
>>> >
>>> > On Wed, Nov 13, 2019 at 12:58 PM Robert Bradshaw
>>> > mailto:rober...@google.com>> wrote:
>>> >
>>> > #37 from the sketches was the cuttlefish, which would put it at
>>> > (with
>>> > 4 votes) the most popular so far. I do like the firefly too.
>>> >
>>> > On Wed, Nov 13, 2019 at 12:03 PM Gris Cuevas >> > > wrote:
>>> >  >
>>> >  > Hi everyone, so exciting to see this convo taking off!
>>> >  >
>>> >  > I loved Alex's firefly! -- it can have so many cool
>>> > variations, and as a stuffed animal is very original.
>>> >  >
>>> >  > Other ideas I had are a caterpillar because it looks like a
>>> > data pipeline, lol or the beaver!
>>> >  >
>>> >  > Feedback on the current sketches.
>>> >  > - They resemble a lot either the Octocat or Totoro [1]
>>> >  > - I'd like to see something that is completely new and
>>> > original, pancakes from gRPC is an example[2]
>>> >  > - Something more caricaturesque is better, since we can
>>> dress
>>> > it up and modify it
>>> >  >
>>> >  > To move forward, it seems that the animals that were winners
>>> > in this thread are:
>>> >  >
>>> >  > Beaver (3)
>>> >  > Firefly (3)
>>> >  > Lemur or votes on sketches (3)
>>> >  > Cuttlefish (2)
>>> >  > Hedgehog (1)
>>> >  > Salmon (1)
>>> >  >
>>> >  > So let's focus the design proposals on the three winners:
>>> > beaver, firefly and lemur.
>>> >  > I'd like to see more options on beavers and fireflies, the
>>> > current sketch options I think are based on the cuttlefish and
>>> > the lemur (?)
>>> >  >
>>> >  > I think it's a good idea to get sketches from multiple
>>> > designers, since like someone else pointed out, we'll get
>>> > variations based on their personal styles, and someone else
>>> > mentioned here that we have teams/companies
>>> 

Re: [ANNOUNCE] New committer: Brian Hulette

2019-11-15 Thread Kirill Kozlov
Congratulations, Brian!

On Fri, Nov 15, 2019 at 10:56 AM Udi Meiri  wrote:

> Congrats Brian!
>
>
> On Fri, Nov 15, 2019 at 10:47 AM Ruoyun Huang  wrote:
>
>> Congrats Brian!
>>
>> On Fri, Nov 15, 2019 at 10:41 AM Robin Qiu  wrote:
>>
>>> Congrats, Brian!
>>>
>>> On Fri, Nov 15, 2019 at 10:02 AM Daniel Oliveira 
>>> wrote:
>>>
 Congratulations Brian! It's well deserved.

 On Fri, Nov 15, 2019, 9:37 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Congratulations, Brian!
>
> On 15 Nov 2019, at 18:27, Rui Wang  wrote:
>
> Congrats!
>
>
> -Rui
>
> On Fri, Nov 15, 2019 at 8:16 AM Thomas Weise  wrote:
>
>> Congratulations!
>>
>>
>> On Fri, Nov 15, 2019 at 6:34 AM Connell O'Callaghan <
>> conne...@google.com> wrote:
>>
>>> Well done Brian!!!
>>>
>>> Kenn thank you for sharing
>>>
>>> On Fri, Nov 15, 2019 at 6:31 AM Cyrus Maden 
>>> wrote:
>>>
 Congrats Brian!

 On Fri, Nov 15, 2019 at 5:25 AM Ismaël Mejía 
 wrote:

> Congratulations Brian!
> Happy to see this happening and eager to see more of your work!
>
> On Fri, Nov 15, 2019 at 11:02 AM Ankur Goenka 
> wrote:
> >
> > Congrats Brian!
> >
> > On Fri, Nov 15, 2019, 2:42 PM Jan Lukavský 
> wrote:
> >>
> >> Congrats Brian!
> >>
> >> On 11/15/19 9:58 AM, Reza Rokni wrote:
> >>
> >> Great news!
> >>
> >> On Fri, 15 Nov 2019 at 15:09, Gleb Kanterov 
> wrote:
> >>>
> >>> Congratulations!
> >>>
> >>> On Fri, Nov 15, 2019 at 5:44 AM Valentyn Tymofieiev <
> valen...@google.com> wrote:
> 
>  Congratulations, Brian!
> 
>  On Thu, Nov 14, 2019 at 6:25 PM jincheng sun <
> sunjincheng...@gmail.com> wrote:
> >
> > Congratulation Brian!
> >
> > Best,
> > Jincheng
> >
> >> Kyle Weaver  wrote on Fri, Nov 15, 2019 at 7:19 AM:
> >>
> >> Thanks for your contributions and congrats Brian!
> >>
> >> On Thu, Nov 14, 2019 at 3:14 PM Kenneth Knowles <
> k...@apache.org> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> Please join me and the rest of the Beam PMC in welcoming a
> new committer: Brian Hulette
> >>>
> >>> Brian introduced himself to dev@ earlier this year and
> has been contributing since then. His contributions to Beam include
> explorations of integration with Arrow, standardizing coders, 
> portability
> for schemas, and presentations at Beam events.
> >>>
> >>> In consideration of Brian's contributions, the Beam PMC
> trusts him with the responsibilities of a Beam committer [1].
> >>>
> >>> Thank you, Brian, for your contributions and looking
> forward to many more!
> >>>
> >>> Kenn, on behalf of the Apache Beam PMC
> >>>
> >>> [1]
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
> >>
> >>
> >>
>

>
>>
>> --
>> 
>> Ruoyun  Huang
>>
>>


Re: (Question) SQL integration tests for MongoDb

2019-11-08 Thread Kirill Kozlov
An alternative approach would be to manually start a MongoDb service, as is
done here:
https://github.com/apache/beam/blob/master/sdks/java/io/mongodb/src/test/java/org/apache/beam/sdk/io/mongodb/MongoDbIOTest.java#L85
Doing it as in the example above should solve my problem.
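For illustration, the start-the-service-yourself pattern can be sketched with a
plain stdlib stand-in; the `ServerSocket` below is only a placeholder for the
embedded MongoDB process that the linked test actually manages, so the class and
method names here are assumptions, not Beam code.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Toy stand-in for a manually started backing service (e.g. an embedded
// MongoDB): acquire a free port in setup, hand it to the test via pipeline
// options, and release it in teardown.
public class EmbeddedServiceSketch {
  private ServerSocket server;

  // Corresponds to a JUnit @BeforeClass-style setup method.
  public void setUp() throws IOException {
    server = new ServerSocket(0); // 0 lets the OS pick a free port
  }

  // The port the test would put into its pipeline options.
  public int getPort() {
    return server.getLocalPort();
  }

  // Corresponds to an @AfterClass-style teardown method.
  public void tearDown() throws IOException {
    server.close();
  }

  public static void main(String[] args) throws IOException {
    EmbeddedServiceSketch service = new EmbeddedServiceSketch();
    service.setUp();
    System.out.println(service.getPort() > 0); // prints true
    service.tearDown();
  }
}
```

The point of the pattern is that the test itself owns the service lifecycle, so
no Jenkins-provided host/port options are needed at all.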

Thank you for your help!

-
Kirill


On Fri, Nov 8, 2019, 03:09 Michał Walenia 
wrote:

> Won't the command be analogous to what is in the Javadoc of
> MongoDbReadWriteIT? It seems that you don't need to use
> `enableJavaPerformanceTesting`, as the `integrationTest` task parses the
> `pipelineOptions` parameter.
>
>
>
> On Thu, Nov 7, 2019 at 6:40 PM Kirill Kozlov 
> wrote:
>
>> Thank you for your response!
>>
>> I want to make sure that when tests run on Jenkins they are supplied with
>> pipeline options containing the hostName and port of a running MongoDb service.
>>
>> I'm writing an integration test for a MongoDb SQL adapter (located under
>> sdks/java/extensions/sql/meta/provider/mongodb).
>> I cannot simply use `enableJavaPerformanceTesting()`, because tests for
>> all adapters are run via the same build file, which has a custom task
>> "integrationTest".
>>
>> I hope this better explains the problem I am trying to tackle.
>>
>> -
>> Kirill
>>
>> On Thu, Nov 7, 2019, 03:36 Michał Walenia 
>> wrote:
>>
>>> Hi,
>>>
>>> What exactly are you trying to do? If you're looking for a way to
>>> provide pipeline options to the MongoDBIOIT, you can pass them via command
>>> line like this:
>>>
>>> ./gradlew integrationTest -p sdks/java/io/mongodb \
>>>   -DintegrationTestPipelineOptions='[
>>>     "--mongoDBHostName=1.2.3.4",
>>>     "--mongoDBPort=27017",
>>>     "--mongoDBDatabaseName=mypass",
>>>     "--numberOfRecords=1000"
>>>   ]' \
>>>   --tests org.apache.beam.sdk.io.mongodb.MongoDbIOIT \
>>>   -DintegrationTestRunner=direct
>>>
>>> Gradle tasks created with `enableJavaPerformanceTesting()` will allow
>>> such options to be passed.
>>>
>>> If you're trying to do something else, please let me know.
>>>
>>> Regards
>>> Michal
>>>
>>> On Thu, Nov 7, 2019 at 1:44 AM Kirill Kozlov 
>>> wrote:
>>>
>>>> Hi everyone!
>>>>
>>>> I am trying to test the MongoDb SQL table, but I am not quite sure how to
>>>> pass pipeline options with the hostName, port, and databaseName used by
>>>> Jenkins.
>>>>
>>>> It looks like the integration test for the MongoDbIO connector obtains
>>>> those values from the
>>>> 'beam/.test-infra/jenkins/job_PerformanceTests_MongoDBIO_IT.groovy' file
>>>> by calling the following methods in the 'build.gradle' file:
>>>> provideIntegrationTestingDependencies()
>>>> enableJavaPerformanceTesting()
>>>>
>>>> The SQL build file already has a task named 'integrationTest' defined and
>>>> does not let us call `enableJavaPerformanceTesting()`.
>>>>
>>>> I would really appreciate it if someone could provide me with a couple of
>>>> pointers on getting this to work.
>>>> pointers on getting this to work.
>>>>
>>>> -
>>>> Kirill
>>>>
>>>
>>>
>>> --
>>>
>>> Michał Walenia
>>> Polidea <https://www.polidea.com/> | Software Engineer
>>>
>>> M: +48 791 432 002 <+48791432002>
>>> E: michal.wale...@polidea.com
>>>
>>> Unique Tech
>>> Check out our projects! <https://www.polidea.com/our-work>
>>>
>>
>
> --
>
> Michał Walenia
> Polidea <https://www.polidea.com/> | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.wale...@polidea.com
>
> Unique Tech
> Check out our projects! <https://www.polidea.com/our-work>
>


Re: (Question) SQL integration tests for MongoDb

2019-11-07 Thread Kirill Kozlov
Thank you for your response!

I want to make sure that when tests run on Jenkins they are supplied with
pipeline options containing the hostName and port of a running MongoDb service.

I'm writing an integration test for a MongoDb SQL adapter (located under
sdks/java/extensions/sql/meta/provider/mongodb).
I cannot simply use `enableJavaPerformanceTesting()`, because tests for all
adapters are run via the same build file, which has a custom task
"integrationTest".

I hope this better explains the problem I am trying to tackle.

-
Kirill

On Thu, Nov 7, 2019, 03:36 Michał Walenia 
wrote:

> Hi,
>
> What exactly are you trying to do? If you're looking for a way to provide
> pipeline options to the MongoDBIOIT, you can pass them via command line
> like this:
>
> ./gradlew integrationTest -p sdks/java/io/mongodb \
>   -DintegrationTestPipelineOptions='[
>     "--mongoDBHostName=1.2.3.4",
>     "--mongoDBPort=27017",
>     "--mongoDBDatabaseName=mypass",
>     "--numberOfRecords=1000"
>   ]' \
>   --tests org.apache.beam.sdk.io.mongodb.MongoDbIOIT \
>   -DintegrationTestRunner=direct
>
> Gradle tasks created with `enableJavaPerformanceTesting()` will allow such
> options to be passed.
>
> If you're trying to do something else, please let me know.
>
> Regards
> Michal
>
> On Thu, Nov 7, 2019 at 1:44 AM Kirill Kozlov 
> wrote:
>
>> Hi everyone!
>>
>> I am trying to test the MongoDb SQL table, but I am not quite sure how to
>> pass pipeline options with the hostName, port, and databaseName used by
>> Jenkins.
>>
>> It looks like the integration test for the MongoDbIO connector obtains those
>> values from the
>> 'beam/.test-infra/jenkins/job_PerformanceTests_MongoDBIO_IT.groovy' file
>> by calling the following methods in the 'build.gradle' file:
>> provideIntegrationTestingDependencies()
>> enableJavaPerformanceTesting()
>>
>> The SQL build file already has a task named 'integrationTest' defined
>> and does not let us call `enableJavaPerformanceTesting()`.
>>
>> I would really appreciate it if someone could provide me with a couple of
>> pointers on getting this to work.
>>
>> -
>> Kirill
>>
>
>
> --
>
> Michał Walenia
> Polidea <https://www.polidea.com/> | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.wale...@polidea.com
>
> Unique Tech
> Check out our projects! <https://www.polidea.com/our-work>
>


(Question) SQL integration tests for MongoDb

2019-11-06 Thread Kirill Kozlov
Hi everyone!

I am trying to test the MongoDb SQL table, but I am not quite sure how to pass
pipeline options with the hostName, port, and databaseName used by Jenkins.
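For background, such pipeline options are plain `--name=value` strings; here is
a minimal, stdlib-only sketch of how they decompose. The option names come from
this thread, but the parser itself is purely illustrative and is not Beam's
PipelineOptionsFactory.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative parser for "--name=value" pipeline options. This is NOT how
// Beam parses options; it only shows the shape of the arguments involved.
public class OptionsSketch {
  static Map<String, String> parse(String[] args) {
    Map<String, String> opts = new HashMap<>();
    for (String arg : args) {
      if (arg.startsWith("--") && arg.contains("=")) {
        int eq = arg.indexOf('=');
        // Strip the leading "--" and split on the first '='.
        opts.put(arg.substring(2, eq), arg.substring(eq + 1));
      }
    }
    return opts;
  }

  public static void main(String[] args) {
    Map<String, String> opts = parse(new String[] {
      "--mongoDBHostName=localhost", "--mongoDBPort=27017"
    });
    System.out.println(opts.get("mongoDBPort")); // prints 27017
  }
}
```

Whatever mechanism supplies the options, the test ultimately just needs these
name/value pairs to reach it on the command line.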

It looks like the integration test for the MongoDbIO connector obtains those
values from the
'beam/.test-infra/jenkins/job_PerformanceTests_MongoDBIO_IT.groovy' file
by calling the following methods in the 'build.gradle' file:
provideIntegrationTestingDependencies()
enableJavaPerformanceTesting()

The SQL build file already has a task named 'integrationTest' defined
and does not let us call `enableJavaPerformanceTesting()`.

I would really appreciate it if someone could provide me with a couple of
pointers on getting this to work.

-
Kirill


Re: [Question] Cannot resolve symbol 'AutoValue_KafkaIO_WriteRecords'

2019-10-28 Thread Kirill Kozlov
I found this document [1] helpful when setting up IntelliJ to work with
Beam.
Make sure that (File | Settings | Build, Execution, Deployment | Build
Tools | Gradle | Runner) has "Delegate IDE build/run actions to Gradle"
checked and "Run tests using" set to "Gradle Test Runner".
Under (Project Structure | Problems) there should be no import problems; if
there are, try re-importing the project from scratch.
Also, try running the following task in the Beam folder: ./gradlew idea
Hope that helps.

[1]
https://docs.google.com/document/d/18eXrO9IYll4oOnFb53EBhOtIfx-JLOinTWZSIBFkLk4/edit

On Sun, Oct 27, 2019 at 7:09 AM lan.liang  wrote:

> Hi Team:
> I opened the Beam project in IDEA, but it is not working.
>
> It looks like some files are missing.
>
> In org.apache.beam.sdk.io.kafka.KafkaIO, I get these errors:
>
>    Cannot resolve symbol 'AutoValue_KafkaIO_WriteRecords'
>
>    Cannot resolve symbol 'AutoValue_KafkaIO_Write'
>
> If I missed something, please remind me.
>
> Thanks!
>
>
>
>
> - lan.liang
>


[DISCUSS] Beam SQL filter push-down

2019-09-30 Thread Kirill Kozlov
The objective is to create a universal way for Beam SQL IO APIs to support
filter/project push-down.
A proposed way to achieve that is by introducing an interface
responsible for identifying what portion(s) of a Calc can be moved down to the
IO layer, and by adding the following methods to the BeamSqlTable interface to
pass the necessary parameters to IO APIs:
- BeamSqlTableFilter supportsFilter(RexNode program, RexNode filter)
- Boolean supportsProjects()
- PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters,
List<String> fieldNames)
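To make the division of labor concrete, here is a stdlib-only toy model of the
proposed contract. The types and the equality-only support rule below are
illustrative stand-ins, not the real Beam/Calcite classes (RexNode,
BeamSqlTable) named above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of filter push-down: the "table" accepts the predicates it can
// evaluate at the IO layer and reports the ones the Calc must still apply.
class ToyTableFilter {
  private final List<String> unsupported = new ArrayList<>();

  ToyTableFilter(List<String> predicates) {
    for (String p : predicates) {
      // Illustrative rule: pretend the IO can only push down equality checks.
      if (!p.contains("=")) {
        unsupported.add(p);
      }
    }
  }

  // Predicates left over for the Calc node that sits above the IO.
  List<String> getNotSupported() {
    return unsupported;
  }
}

public class PushDownSketch {
  public static void main(String[] args) {
    ToyTableFilter filter =
        new ToyTableFilter(Arrays.asList("id = 5", "name LIKE 'a%'"));
    // Only the LIKE predicate remains for the Calc to evaluate.
    System.out.println(filter.getNotSupported()); // prints [name LIKE 'a%']
  }
}
```

In the real proposal the same split decides how much of the Calc can be deleted
from the plan once the IO promises to evaluate part of the filter itself.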

Please feel free to provide feedback and suggestions on this proposal.
Thank you!

Here is a more complete design doc:
https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing

--
Kirill Kozlov


New contributor to BEAM SQL

2019-09-16 Thread Kirill Kozlov
Hello everyone!

My name is Kirill Kozlov. I recently joined the Dataflow team at Google and
will be working on SQL filter push-down.
Can I get permission to work on issues in Jira? My username is: kirillkozlov
Looking forward to developing Beam together!

Thank you,
Kirill Kozlov