from:"Mikhail Gryzykhin"

Re: [Proposal] Adding Python Coverage Reports To CI/CD

2020-07-06 Thread Mikhail Gryzykhin

I wouldn't consider build time as a blocker to add report. Even if build
time is rather slower, we can run coverage report periodically as a
separate job and still get use of it.

On Mon, Jul 6, 2020, 2:38 PM Robert Bradshaw  wrote:

> This sounds useful to me, and as it's purely informational would be a
> low cost to try out. The one question is how it would impact build
> runtimes--do you have an estimate for what the cost is here?
>
> On Sun, Jul 5, 2020 at 1:14 PM Saavan Nanavati  wrote:
> >
> > Hey everyone,
> >
> > Currently, during the Jenkins build process, we don't generate any code
> coverage reports for the Python SDK. This email includes a proposal to
> generate python coverage reports during the pre-commit build, upload them
> to codecov.io for analysis and visualization, and automatically post the
> resulting stats back to GitHub PRs to help developers decide whether their
> tests need revision.
> >
> > You can view/comment on the proposal here, or the full text of the
> proposal at the end of this email. Let me know what you think, or if there
> are any suggestions for improvements. Thanks!
> >
> > Python Coverage Reports For CI/CD
> >
> > Author: Saavan Nanavati (saa...@google.com)
> >
> > Reviewer: Udi Meiri (eh...@google.com)
> >
> >
> > Overview
> >
> >
> > This is a proposal for generating code coverage reports for the Python
> SDK during Jenkins’ pre-commit phase, and uploading them to codecov.io
> for analysis, with integration back into GitHub using the service’s sister
> app.
> >
> >
> > This would extend the pre-commit build time but provide valuable
> information for developers to revise and improve their tests before their
> PR is merged, rather than after when it’s less likely developers will go
> back to improve their coverage numbers.
> >
> >
> > This particular 3rd party service has a litany of awesome benefits:
> >
> > It’s free for open-source projects
> >
> > It seamlessly integrates into GitHub via a comment-bot (example here)
> >
> > It overlays coverage report information directly onto GitHub code using
> Sourcegraph
> >
> > It requires no changes to Jenkins, thereby reducing the risk of breaking
> the live test-infra
> >
> > It’s extensible and can later be used for the Java & Go SDKs if it
> proves to be awesome
> >
> > It has an extremely responsive support team that’s happy to help
> open-source projects
> >
> >
> > A proof-of-concept can be seen here and here.
> >
> >
> > Goals
> >
> >
> > Provide coverage stats for the Python SDK that update with every
> pre-commit run
> >
> > Integrate these reports into GitHub so developers can take advantage of
> the information
> >
> > Open a discussion for how these coverage results can be utilized in code
> reviews
> >
> >
> > Non-Goals
> >
> > Calculate coverage statistics using external tests located outside of
> the Python SDK
> >
> >
> > This is ideal, but would require not only merging multiple coverage
> reports together but, more importantly, waiting for these tests to be
> triggered in the first place. The main advantage of calculating coverage
> during pre-commit is that developers can revise their PRs before merging,
> which is not guaranteed if this is a goal.
> >
> >
> > However, it could be something to explore for the future.
> >
> > Background
> >
> >
> > Providing code coverage for the Python SDK has been a problem since at
> least 2017 (BEAM-2762) with the previous solution being to calculate
> coverage in post-commit with coverage.py, and then sending the report to
> coveralls.io which would post to GitHub. At some point, this solution
> broke and the Tox environment used to compute coverage, cover, was turned
> off but still remains in the codebase.
> >
> >
> > There have been 4 main barriers, in the past, to re-implementing
> coverage that will be addressed here.
> >
> >
> > It’s difficult to unify coverage for some integration tests, especially
> ones that rely on 3rd party dependencies like GCP since it’s not possible
> to calculate coverage for the dependencies.
> >
> > As stated earlier, this is a non-goal for the proposal.
> >
> >
> > The test reporter outputs results in the same directory which sometimes
> causes previous results to be overwritten. This occurs when using different
> parameters for the same test (e.g. running a test with Dataflow vs
> DirectRunner).
> >
> > This was mainly a post-commit problem but it does require exploration
> since it could be an issue for pre-commit. However, even in the worst case,
> the coverage numbers would still be valuable since you can still see how
> coverage changed relatively between commits even if the absolute numbers
> are slightly inaccurate.
> >
> >
> > It’s time-consuming and non-trivial to modify and test changes to Jenkins
> >
> > We don’t need to - this proposal integrates directly with codecov.io,
> making Jenkins an irrelevant part of the testing infrastructure with
> regards to code coverage - it’s not just easier, it’s better because it
>

Python precommits fail due to compilation of pandas with cython

2020-06-26 Thread Mikhail Gryzykhin

Hi all,

Multiple python precommit jobs are failing due to cython failing to compile
pandas ([BEAM-10333] )
currently. I tried to debug this, but no success. Can someone help take a
look?

Thank you,
Mikhail.

Re: [DISCUSS] Dealing with @Ignored tests

2020-05-12 Thread Mikhail Gryzykhin

I wonder if we can add graph to community metrics showing ignored tests by
language/project/overall. That can be useful to see focus area.

On Tue, May 12, 2020 at 12:28 PM Jan Lukavský  wrote:

> +1, visualizing the number of ignored tests in a graph seems useful. Even
> better with some slices (e.g. per runner, module, ...).
> On 5/12/20 8:02 PM, Ahmet Altay wrote:
>
> +1 to generate a report instead of removing these tests. A report like
> this could help us with prioritization. It is easier to address issues when
> we can quantify how much of a problem it is.
>
> I am curious what we can do to incentivize reducing the number of
> flaky/ignored tests? A report itself might provide incentive, it is
> rewarding to see ignored tests numbers go down over time.
>
> On Mon, May 11, 2020 at 8:30 AM Luke Cwik  wrote:
>
>> Deleting ignored tests does lead us to losing the reason as to why the
>> test case was around so I would rather keep it around. I think it would be
>> more valuable to generate a report that goes on the website/wiki showing
>> stability of the modules (num tests, num passed, num skipped, num failed
>> (running averages over the past N runs)). We had discussed doing something
>> like this for ValidatesRunner so we could show which runner supports what
>> automatically.
>>
>> On Mon, May 11, 2020 at 12:53 AM Jan Lukavský  wrote:
>>
>>> I think that we do have Jira issues for ignored test, there should be no
>>> problem with that. The questionable point is that when test gets Ignored,
>>> people might consider the problem as "less painful" and postpone the
>>> correct solution until ... forever. I'd just like to discuss if people see
>>> this as an issue. If yes, should we do something about that, or if no,
>>> maybe we can create a rule that test marked as Ignored for long time might
>>> be deleted, because apparently is only a dead code.
>>> On 5/6/20 6:30 PM, Kenneth Knowles wrote:
>>>
>>> Good point.
>>>
>>> The raw numbers are available in the test run output. See
>>> https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PreCommit_Java_Cron/2718/testReport/
>>>  for
>>> the "skipped" column.
>>> And you get the same on console or Gradle Scan:
>>> https://scans.gradle.com/s/ml3jv5xctkrmg/tests?collapse-all
>>> This would be good to review periodically for obvious trouble spots.
>>>
>>> But I think you mean something more detailed. Some report with columns:
>>> Test Suite, Test Method, Jira, Date Ignored, Most Recent Update
>>>
>>> I think we can get most of this from Jira, if we just make sure that
>>> each ignored test has a Jira and they are all labeled in a consistent way.
>>> That would be the quickest way to get some result, even though it is not
>>> perfectly automated and audited.
>>>
>>> Kenn
>>>
>>> On Tue, May 5, 2020 at 2:41 PM Jan Lukavský  wrote:
>>>
 Hi,

 it seems we are accumulating test cases (see discussion in [1]) that
 are
 marked as @Ignored (mostly due to flakiness), which is generally
 undesirable. Associated JIRAs seem to be open for a long time, and this
 might generally cause that we loose code coverage. Would anyone have
 idea on how to visualize these Ignored tests better? My first idea
 would
 be something similar to "Beam dependency check report", but that seems
 to be not the best example (which is completely different issue :)).

 Jan

 [1] https://github.com/apache/beam/pull/11614

Re: metrics site not available

2020-04-24 Thread Mikhail Gryzykhin

Service back up.

On Fri, Apr 24, 2020 at 9:16 PM Mikhail Gryzykhin  wrote:

> Service is down with "insufficient cpu". It seems that we are not the only
> ones hit by the issue. I will try to restart pods and see if it helps.
>
> --Mikhail
>
> On Fri, Apr 24, 2020 at 4:42 PM Ahmet Altay  wrote:
>
>> Hi all,
>>
>> I am not able to access http://metrics.beam.apache.org/, I also
>> tried 104.154.241.245 and it has the same problem. As of last week both of
>> them were working.
>>
>> Does anyone know what might be the issue? Is there an open JIRA for this?
>>
>> Thank you!
>> Ahmet
>>
>

Re: metrics site not available

2020-04-24 Thread Mikhail Gryzykhin

Service is down with "insufficient cpu". It seems that we are not the only
ones hit by the issue. I will try to restart pods and see if it helps.

--Mikhail

On Fri, Apr 24, 2020 at 4:42 PM Ahmet Altay  wrote:

> Hi all,
>
> I am not able to access http://metrics.beam.apache.org/, I also
> tried 104.154.241.245 and it has the same problem. As of last week both of
> them were working.
>
> Does anyone know what might be the issue? Is there an open JIRA for this?
>
> Thank you!
> Ahmet
>

Re: Validates Runner on Java 11 and the Java SDK Harness

2020-04-22 Thread Mikhail Gryzykhin

+Paweł Pasterz 

On Wed, Apr 22, 2020, 13:23 Pablo Estrada  wrote:

> +Mikhail Gryzykhin  fyi : )
>
> On Tue, Apr 21, 2020 at 1:25 PM Ismaël Mejía  wrote:
>
>> I have been working in recent days on enabling the CI tests for Java 11 in
>> Jenkins for Flink (based on latest release 1.10.x, already merged) and
>> Spark
>> (based on upcoming release 3.x.x tested locally), so far we have good
>> progress
>> for both classical runners with only one test failing the complete suite!
>> So
>> soon we would be able to announce that we support Java 11 for our most
>> popular
>> open source classical runners.
>>
>> The next step is to tackle the portable validates runner tests, and when I
>> looked at those I realized that we are not publishing the Java SDK
>> Harness based
>> on Java 11. I would like to know if someone might be interested on taking
>> this
>> task so we can publish the Java 11 SDK Harness docker image too maybe as
>> part of
>> the next release.
>>
>> Anyone who knows the harness container part interested on creating a
>> JIRA and working on a fix for this?
>>
>> Regards,
>> Ismaël
>>
>

Website publish jobs fail recently

2020-04-14 Thread Mikhail Gryzykhin

Hi all,

Have anyone seen the following error of website publish?


*16:33:47* jekyll 3.6.3 | Error: Permission denied @ dir_s_mkdir -
/repo/build/website/generated-local-content/security

I tried the failed target locally and it succeeded. Seems there's some
issue with jenkins configuration.

Regards,
Mikhail.

Re: Task :model:pipeline:compileJava fails locally

2020-04-07 Thread Mikhail Gryzykhin

That look like the issue. Thank you.

On Tue, Apr 7, 2020 at 6:31 PM Kyle Weaver  wrote:

> Related thread:
> https://lists.apache.org/thread.html/r611fe469106585c5da4c1b2d9ecccfad7e2062ef0ce3b396b5fb5230%40%3Cdev.beam.apache.org%3E
>
> Reuven: "I fixed this by explicitly setting JAVA_HOME to point to my
> jdk1.8 directory."
> That also worked for me.
>
> On Tue, Apr 7, 2020 at 2:23 PM Mikhail Gryzykhin 
> wrote:
>
>> Hi all,
>>
>> I get the following error [1] when trying to build beam locally. Does
>> anyone have any idea what I might have configured wrong on my end?
>>
>> Regards,
>> Mikhail
>>
>> [1]
>> /work/beam/myfork1$ ./gradlew build
>> .
>> > Task :model:pipeline:compileJava
>> /home/migryz/work/beam/myfork1/model/pipeline/build/generated/source/proto/main/java/org/apache/beam/model/pipeline/v1/MetricsApi.java:6:
>> error: cannot access Object
>> public final class MetricsApi {
>>  ^
>>   bad class file: /modules/java.base/java/lang/Object.class
>> class file has wrong version 55.0, should be 53.0
>> Please remove or make sure it appears in the correct subdirectory of
>> the classpath.
>> 1 error
>>
>> > Task :model:pipeline:compileJava FAILED
>> .
>>
>

Task :model:pipeline:compileJava fails locally

2020-04-07 Thread Mikhail Gryzykhin

Hi all,

I get the following error [1] when trying to build beam locally. Does
anyone have any idea what I might have configured wrong on my end?

Regards,
Mikhail

[1]
/work/beam/myfork1$ ./gradlew build
.
> Task :model:pipeline:compileJava
/home/migryz/work/beam/myfork1/model/pipeline/build/generated/source/proto/main/java/org/apache/beam/model/pipeline/v1/MetricsApi.java:6:
error: cannot access Object
public final class MetricsApi {
 ^
  bad class file: /modules/java.base/java/lang/Object.class
class file has wrong version 55.0, should be 53.0
Please remove or make sure it appears in the correct subdirectory of
the classpath.
1 error

> Task :model:pipeline:compileJava FAILED
.

Re: Tests not getting triggered

2020-03-26 Thread Mikhail Gryzykhin

I see queueing time up to one hour:
http://metrics.beam.apache.org/d/_TNndF2iz/pre-commit-test-latency?orgId=1

So estimate slow pre-commits reaction.

On Thu, Mar 26, 2020 at 8:03 PM Ankur Goenka  wrote:

> Seems to be running now with some delay.
>
> On Thu, Mar 26, 2020 at 12:33 PM Luke Cwik  wrote:
>
>> I saw upwards of a 20 min delay before anything was triggered on a couple
>> of PRs.
>>
>> On Thu, Mar 26, 2020 at 12:32 PM Ankur Goenka  wrote:
>>
>>> Hi,
>>>
>>> I think the tests for PRs are not getting triggered example:
>>> https://github.com/apache/beam/pull/10870
>>>
>>> Can someone take a look.
>>>
>>> Thanks,
>>> Ankur
>>>
>>

Re: [Proposal] Slowly Changing Dimensions and Distributed Map Side Inputs (in Dataflow)

2020-03-23 Thread Mikhail Gryzykhin

UPD:
I have updated doc with API suggestions, please check on relevant section
of the doc [1]
<https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit#heading=h.5e78hch3k732>

--Mikhail

[1]
https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg/edit#heading=h.5e78hch3k732

On Thu, Jan 16, 2020 at 2:52 AM Reza Rokni  wrote:

> +1 To this proposal, this is a very common pattern requirement from users.
> With the following current workaround having seen a lot of traction:
>
>
> https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs
>
> Making this process simpler for users and Out Of the Box, would be a great
> win!
>
> I would also mention that ideally we will also cover the large distributed
> side inputs, but a lot of the core cases for this comes down to Side inputs
> that do fit in memory. Perhaps worth putting priorities on the work with
> the smaller side input tables having precedence. Unless the work will cover
> both cases in the same way of course.
>
> Cheers
>
> Reza
>
> On Thu, 19 Dec 2019 at 07:14, Kenneth Knowles  wrote:
>
>> I do think that the implementation concerns around larger side inputs are
>> relevant to most runners. Ideally there would be no model change necessary.
>> Triggers are harder and bring in consistency concerns, which are even more
>> likely to be relevant to all runners.
>>
>> Kenn
>>
>> On Wed, Dec 18, 2019 at 11:23 AM Luke Cwik  wrote:
>>
>>> Most of the doc is about how to support distributed side inputs in
>>> Dataflow and doesn't really cover how the Beam model (accumulating,
>>> discarding, retraction) triggers impact what are the "contents" of a
>>> PCollection in time and how this proposal for a limited set of side input
>>> shapes can work to support larger side inputs in Dataflow.
>>>
>>> On Tue, Dec 17, 2019 at 2:28 AM Jan Lukavský  wrote:
>>>
>>>> Hi Mikhail,
>>>> On 12/17/19 10:43 AM, Mikhail Gryzykhin wrote:
>>>>
>>>> inline
>>>>
>>>> On Tue, Dec 17, 2019 at 12:59 AM Jan Lukavský  wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I actually thought that the proposal refers to Dataflow only. If this
>>>>> is supposed to be general, can we remove the Dataflow/Windmill specific
>>>>> parts and replace them with generic ones?
>>>>>
>>>>  I'll look into rephrasing doc to keep Dataflow/Windmill as example.
>>>>
>>>> Cool, thanks!
>>>>
>>>> I'd have two more questions:
>>>>>
>>>>>  a) the proposal is named "Slowly changing", why is the rate of change
>>>>> essential to the proposal? Once running on event time, that should not
>>>>> matter, or what am I missing?
>>>>>
>>>> Within this proposal, it is suggested to make a full snapshot of data
>>>> on every re-read. This is generally expensive and setting time event to
>>>> short interval might cause issues. Otherwise it is not essential.
>>>>
>>>> Understood. This relates to table-stream duality, where the
>>>> requirements might relax once you don't have to convert table to stream by
>>>> re-reading it, but by being able to retrieve updates as you go (example
>>>> would be reading directly from kafka or any other "commit log" 
>>>> abstraction).
>>>>
>>>>  b) The description says: 'User wants to solve a stream enrichment
>>>>> problem. In brief request sounds like: ”I want to enrich each event in 
>>>>> this
>>>>> stream by corresponding data from given table.”'. That is understandable,
>>>>> but would it be better to enable the user to express this intent directly
>>>>> (via Join operation)? The actual implementation might be runner (and
>>>>> input!) specific. The analogy is that when doing group-by-key operation,
>>>>> runner can choose hash grouping or sort-merge grouping, but that is not
>>>>> (directly) expressed in user code. I'm not saying that we should not have
>>>>> low-level transforms, just asking if it would be better to leave this
>>>>> decision to the runner (at least in some cases). It might be the case that
>>>>> we want to make core SDK as low level as possible (and as reasonable), I
>>>>> just want to make sure that that is really the intent.
>>>>>
>>>>

Re: dataflow job was working fine last night and it isn't now

2020-02-04 Thread Mikhail Gryzykhin

Hi Alan,

+Valentyn Tymofieiev  Can you verify if my assumption
is correct?

It seems that the problem might come from dill version mismatch.  Dill
version should match on worker and user code. Between Beam 2.17 and Beam
2.18 we upgraded dill version to 0.3.1.1 which has an incompatible format
with earlier versions.

Which version of dill do you use when submitting pipeline?

Try using dill version below 0.3.1 with Beam 2.17 and earlier. And Dill
0.3.1 or above with Beam 2.18 and above.

Regards,
--Mikhail.


On Tue, Feb 4, 2020 at 10:04 AM Alan Krumholz 
wrote:

> I tried breaking apart my pipeline. Seems the step that breaks it is:
> beam.io.WriteToBigQuery
>
> Let me see if I can create a self contained example that breaks to share
> with you
>
> Thanks!
>
> On Tue, Feb 4, 2020 at 9:53 AM Pablo Estrada  wrote:
>
>> Hm that's odd. No changes to the pipeline? Are you able to share some of
>> the code?
>>
>> +Udi Meiri  do you have any idea what could be going
>> on here?
>>
>> On Tue, Feb 4, 2020 at 9:25 AM Alan Krumholz 
>> wrote:
>>
>>> Hi Pablo,
>>> This is strange... it doesn't seem to be the last beam release as last
>>> night it was already using 2.19.0 I wonder if it was some release from the
>>> DataFlow team (not beam related):
>>> Job typeBatch
>>> Job status Succeeded
>>> SDK version
>>> Apache Beam Python 3.5 SDK 2.19.0
>>> Region
>>> us-central1
>>> Start timeFebruary 3, 2020 at 9:28:35 PM GMT-8
>>> Elapsed time5 min 11 sec
>>>
>>> On Tue, Feb 4, 2020 at 9:15 AM Pablo Estrada  wrote:
>>>
 Hi Alan,
 could it be that you're picking up the new Apache Beam 2.19.0 release?
 Could you try depending on beam 2.18.0 to see if the issue surfaces when
 using the new release?

 If something was working and no longer works, it sounds like a bug.
 This may have to do with how we pickle (dill / cloudpickle) - see this
 question
 https://stackoverflow.com/questions/42960637/python-3-5-dill-pickling-unpickling-on-different-servers-keyerror-classtype
 Best
 -P.

 On Tue, Feb 4, 2020 at 6:22 AM Alan Krumholz 
 wrote:

> Hi,
>
> I was running a dataflow job in GCP last night and it was running fine.
> This morning this same exact job is failing with the following error:
>
> Error message from worker: Traceback (most recent call last): File
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py",
> line 286, in loads return dill.loads(s) File
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
> return load(file, ignore, **kwds) File
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
> return Unpickler(file, ignore=ignore, **kwds).load() File
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
> obj = StockUnpickler.load(self) File
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in
> _load_type return _reverse_typemap[name] KeyError: 'ClassType' During
> handling of the above exception, another exception occurred: Traceback
> (most recent call last): File
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py",
> line 648, in do_work work_executor.execute() File
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/executor.py", line
> 176, in execute op.start() File 
> "apache_beam/runners/worker/operations.py",
> line 649, in apache_beam.runners.worker.operations.DoOperation.start File
> "apache_beam/runners/worker/operations.py", line 651, in
> apache_beam.runners.worker.operations.DoOperation.start File
> "apache_beam/runners/worker/operations.py", line 652, in
> apache_beam.runners.worker.operations.DoOperation.start File
> "apache_beam/runners/worker/operations.py", line 261, in
> apache_beam.runners.worker.operations.Operation.start File
> "apache_beam/runners/worker/operations.py", line 266, in
> apache_beam.runners.worker.operations.Operation.start File
> "apache_beam/runners/worker/operations.py", line 597, in
> apache_beam.runners.worker.operations.DoOperation.setup File
> "apache_beam/runners/worker/operations.py", line 602, in
> apache_beam.runners.worker.operations.DoOperation.setup File
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py",
> line 290, in loads return dill.loads(s) File
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 275, in loads
> return load(file, ignore, **kwds) File
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 270, in load
> return Unpickler(file, ignore=ignore, **kwds).load() File
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 472, in load
> obj = StockUnpickler.load(self) File
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 577, in
> _load_type return _reverse_typemap[name] KeyError: 'ClassType'
>
>
> If I

Re: Using very slow changing stream of data (KV) as side input

2020-01-28 Thread Mikhail Gryzykhin

Hi Mohil,

I've responded to another thread you started. Please, check on:
https://lists.apache.org/thread.html/r93066c4a3daefba954b23dbdb4536159ff3ff29b06052b92e747da17%40%3Cdev.beam.apache.org%3E

Regards,
Mikhail.

On Tue, Jan 28, 2020 at 10:22 AM Mohil Khare  wrote:

> Hi,
> This is Mohil Khare from San Jose, California. I work in an early stage
> startup: Prosimo.
> We use Apache beam with gcp dataflow for all real time stats processing
> with Kafka and Pubsub as data source while elasticsearch and GCS as sinks.
>
> I am trying to solve the following use with sideinputs.
>
> INPUT:
> 1. We have a continuous stream of data coming from pubsub topicA. This
> data can be put in KV Pcollection and each data item can be uniquely
> identified with certain key.
> 2. We have a very slow changing stream of data coming from pubsub topicB
> i.e. you can say that stream of data comes for few mins on topicB followed
> by no activity for a long time period.   This stream of data can be again
> put in KV PCollection with same keys as above. NOTE: after long inactivity,
> it is possible that data comes for only certain keys.
>
> DESIRED OUTPUT/PROCESSING:
> 1. I want to use KV PCollection as sideinput to enrich data arriving in
> topicA. I think View.asMap can be a good choice for it.
> 2. After enriching data in topic A using sideinput data from topic B,
> write to GCS in a fixed window of 10 minutes
> 2.  Want to continue using above PCollectionView as sideinput as long as
> no new data arrives in topicB.
> 3. Whenever new data arrives in topicB, want to update PCollectionView Map
> only for set of Keys that arrived in new stream.
>
> My question is what should be the best approach to tackle this use case? I
> will really appreciate if someone can suggest some good solution.
>
> Thanks and Regards
> Mohil Khare
>

Re: [ANNOUNCE] New committer: Michał Walenia

2020-01-27 Thread Mikhail Gryzykhin

Congratulations Michal!

--Mikhail

On Mon, Jan 27, 2020 at 1:01 PM Kyle Weaver  wrote:

> Congratulations Michał! Looking forward to your future contributions :)
>
> Thanks,
> Kyle
>
> On Mon, Jan 27, 2020 at 12:47 PM Pablo Estrada  wrote:
>
>> Hi everyone,
>>
>> Please join me and the rest of the Beam PMC in welcoming a new
>> committer: Michał Walenia
>>
>> Michał has contributed to Beam in many ways, including the performance
>> testing infrastructure, and has even spoken at events about Beam.
>>
>> In consideration of his contributions, the Beam PMC trusts him with the
>> responsibilities of a Beam committer[1].
>>
>> Thanks for your contributions Michał!
>>
>> Pablo, on behalf of the Apache Beam PMC.
>>
>> [1]
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>

Re: Using very slow changing stream of data (KV) as side input

2020-01-27 Thread Mikhail Gryzykhin

Hi Mohil,

Please, take a look at.
https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs


Also, I have design doc out that handles similar case. I'm working on
prototyping it in python atm.
https://lists.apache.org/thread.html/r792fcf4b6adbce79ea1eb81592d29a3cee7aef768ba4615ac2d078ad%40%3Cdev.beam.apache.org%3E


Regards,
--Mikhail

On Mon, Jan 27, 2020 at 8:56 AM Mohil Khare  wrote:

> Hi,
> This is Mohil Khare from San Jose, California. I work in an early stage
> startup: Prosimo.
> We use Apache beam with gcp dataflow for all real time stats processing
> with Kafka and Pubsub as data source while elasticsearch and GCS as sinks.
>
> I am trying to solve the following use with sideinputs.
>
> INPUT:
> 1. We have a continuous stream of data coming from pubsub topicA. This
> data can be put in KV Pcollection and each data item can be uniquely
> identified with certain key.
> 2. We have a very slow changing stream of data coming from pubsub topicB
> i.e. you can say that stream of data comes for few mins on topicB followed
> by no activity for a long time period.   This stream of data can be again
> put in KV PCollection with same keys as above. NOTE: after long inactivity,
> it is possible that data comes for only certain keys.
>
> DESIRED OUTPUT/PROCESSING:
> 1. I want to use KV PCollection as sideinput to enrich data arriving in
> topicA. I think View.asMap can be a good choice for it.
> 2. After enriching data in topic A using sideinput data from topic B,
> write to GCS in a fixed window of 10 minutes
> 2.  Want to continue using above PCollectionView as sideinput as long as
> no new data arrives in topicB.
> 3. Whenever new data arrives in topicB, want to update PCollectionView Map
> only for set of Keys that arrived in new stream.
>
> My question is what should be the best approach to tackle this use case? I
> will really appreciate if someone can suggest some good solution.
>
> Thanks and Regards
> Mohil Khare
>
>
>
>
>

[ANNOUNCE] Beam 2.17.0 Released!

2020-01-10 Thread Mikhail Gryzykhin

The Apache Beam team is pleased to announce the release of version 2.17.0.

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes bug fixes, features, and improvements detailed on
the Beam blog: https://beam.apache.org/blog/2020/01/06/beam-2.17.0.html

Thanks to everyone who contributed to this release, and we hope you enjoy
using Beam 2.17.0.

[RESULT][VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Mikhail Gryzykhin

Hi all,

I'm happy to announce that we have approved this release.

There are 5 approving votes, 4 of which are binding (in order):
* Ahmet (al...@google.com);
* Luke (lc...@google.com);
* Reuven (re...@google.com);
* Robert (rober...@google.com);

There are no disapproving votes.

Thanks everyone!

Next step is to finalize the release (merge the docs/website/blog PRs,
publish artifacts). Please let me know if you have any questions.

Regards,
--Mikhail

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Mikhail Gryzykhin

Hi all,

I'm happy to announce that we have approved this release.

There are 5 approving votes, 4 of which are binding (in order):
* Ahmet (al...@google.com);
* Luke (lc...@google.com);
* Reuven (re...@google.com);
* Robert (rober...@google.com);

There are no disapproving votes.

Thanks everyone!

Next step is to finalize the release (merge the docs/website/blog PRs,
publish artifacts). Please let me know if you have any questions.

Regards,
--Mikhail

On Mon, Jan 6, 2020 at 10:55 AM Robert Bradshaw  wrote:

> Thanks. That's the right one. The signatures (and everything else) all
> look good now.
>
> Changing my vote to a +1.
>
> On Mon, Jan 6, 2020 at 9:13 AM Mikhail Gryzykhin 
> wrote:
>
>> KEYS files should be fixed now.
>>
>> On Mon, Jan 6, 2020 at 8:29 AM Robert Bradshaw 
>> wrote:
>>
>>> Yes, please update KEYS to have the correct key. (If you've never used
>>> the other one you could just remove it.)
>>>
>>> On Mon, Jan 6, 2020, 6:46 AM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> I see. Seems that the wrong key is imported into KEYS file. And header
>>>> is incorrect.
>>>>
>>>> --Mikhail
>>>>
>>>> On Mon, Jan 6, 2020 at 6:16 AM Mikhail Gryzykhin 
>>>> wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> I redownloaded binaries from
>>>>> https://dist.apache.org/repos/dist/dev/beam/2.17.0/ and ran
>>>>>
>>>>> gpg --verify apache-beam-2.17.0-source-release.zip.asc
>>>>> gpg: assuming signed data in 'apache-beam-2.17.0-source-release.zip'
>>>>> gpg: Signature made Mon 16 Dec 2019 09:17:23 PM PST
>>>>> gpg:using RSA key
>>>>> 53F72D4EEEF306D97736FE1065ABB07A8965E788
>>>>> gpg: Good signature from "Mikhail Gryzykhin "
>>>>> [ultimate]
>>>>>
>>>>> Signature is valid with key 53F72D4EEEF306D97736FE1065ABB07A8965E788.
>>>>> The key you received is different. Which binaries did you get that
>>>>> signature from?
>>>>>
>>>>> --Mikhail
>>>>>
>>>>> On Thu, Jan 2, 2020 at 4:53 PM Robert Bradshaw 
>>>>> wrote:
>>>>>
>>>>>> (Other than that everything looks fine.)
>>>>>>
>>>>>> On Thu, Jan 2, 2020 at 4:44 PM Robert Bradshaw 
>>>>>> wrote:
>>>>>> >
>>>>>> > -1
>>>>>> >
>>>>>> > I'm having trouble verifying the signatures on the release
>>>>>> artifacts.
>>>>>> > When I try to import the key from
>>>>>> > https://dist.apache.org/repos/dist/release/beam/KEYS I get
>>>>>> >
>>>>>> > pub   rsa4096 2019-10-22 [SC]
>>>>>> >   79552F5C2FD869A08E097F96841855FB73AFFC7F
>>>>>> > uid   [ unknown] Mikhail Gryzykhin (mikhail) <
>>>>>> mikh...@apache.org>
>>>>>> > sub   rsa4096 2019-10-22 [E]
>>>>>> >
>>>>>> > which is not the key that these artifacts were signed with.
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Jan 2, 2020 at 4:23 PM Reuven Lax  wrote:
>>>>>> > >
>>>>>> > > +1
>>>>>> > >
>>>>>> > > On Thu, Jan 2, 2020 at 3:02 PM Valentyn Tymofieiev <
>>>>>> valen...@google.com> wrote:
>>>>>> > >>
>>>>>> > >> +1. Validated Batch and Streaming quickstarts on Python 3.7
>>>>>> (using wheels) and Batch Mobile Gaming examples (user score, hourly team
>>>>>> score) on Dataflow.
>>>>>> > >>
>>>>>> > >> On Thu, Jan 2, 2020 at 11:23 AM Ahmet Altay 
>>>>>> wrote:
>>>>>> > >>>
>>>>>> > >>> This vote needs at least one more PMC vote before it can be
>>>>>> finalized. Could you please validate and vote?
>>>>>> > >>>
>>>>>> > >>> On Mon, Dec 23, 2019 at 9:44 AM Luke Cwik 
>>>>>> wrote:
>>>>>> > >>>>
>>>>>> > >>>> +1, I validated the Java quickstarts for the runners and the
>>>>>> issues I have brought up have been moved to a f

Re: [Proposal] Slowly Changing Dimensions support in Beam

2020-01-06 Thread Mikhail Gryzykhin

I've narrowed down the topic. This does not include any of Dataflow part
and is general for all runners.  Please visit
<https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg>
.

Changes:
* Changed title
* Narrowed topic to slowly changing dimensions support only.

This should simplify discussion and make it easier to come to conclusion.

Looking for comments on:
* API for new feature.
   * Currently there's single comment with alternative approach.
* General idea review.

Regards,
--Mikhail

On Wed, Dec 18, 2019 at 3:14 PM Kenneth Knowles  wrote:

> I do think that the implementation concerns around larger side inputs are
> relevant to most runners. Ideally there would be no model change necessary.
> Triggers are harder and bring in consistency concerns, which are even more
> likely to be relevant to all runners.
>
> Kenn
>
> On Wed, Dec 18, 2019 at 11:23 AM Luke Cwik  wrote:
>
>> Most of the doc is about how to support distributed side inputs in
>> Dataflow and doesn't really cover how the Beam model (accumulating,
>> discarding, retraction) triggers impact what are the "contents" of a
>> PCollection in time and how this proposal for a limited set of side input
>> shapes can work to support larger side inputs in Dataflow.
>>
>> On Tue, Dec 17, 2019 at 2:28 AM Jan Lukavský  wrote:
>>
>>> Hi Mikhail,
>>> On 12/17/19 10:43 AM, Mikhail Gryzykhin wrote:
>>>
>>> inline
>>>
>>> On Tue, Dec 17, 2019 at 12:59 AM Jan Lukavský  wrote:
>>>
>>>> Hi,
>>>>
>>>> I actually thought that the proposal refers to Dataflow only. If this
>>>> is supposed to be general, can we remove the Dataflow/Windmill specific
>>>> parts and replace them with generic ones?
>>>>
>>>  I'll look into rephrasing doc to keep Dataflow/Windmill as example.
>>>
>>> Cool, thanks!
>>>
>>> I'd have two more questions:
>>>>
>>>>  a) the proposal is named "Slowly changing", why is the rate of change
>>>> essential to the proposal? Once running on event time, that should not
>>>> matter, or what am I missing?
>>>>
>>> Within this proposal, it is suggested to make a full snapshot of data on
>>> every re-read. This is generally expensive and setting time event to short
>>> interval might cause issues. Otherwise it is not essential.
>>>
>>> Understood. This relates to table-stream duality, where the requirements
>>> might relax once you don't have to convert table to stream by re-reading
>>> it, but by being able to retrieve updates as you go (example would be
>>> reading directly from kafka or any other "commit log" abstraction).
>>>
>>>  b) The description says: 'User wants to solve a stream enrichment
>>>> problem. In brief request sounds like: ”I want to enrich each event in this
>>>> stream by corresponding data from given table.”'. That is understandable,
>>>> but would it be better to enable the user to express this intent directly
>>>> (via Join operation)? The actual implementation might be runner (and
>>>> input!) specific. The analogy is that when doing group-by-key operation,
>>>> runner can choose hash grouping or sort-merge grouping, but that is not
>>>> (directly) expressed in user code. I'm not saying that we should not have
>>>> low-level transforms, just asking if it would be better to leave this
>>>> decision to the runner (at least in some cases). It might be the case that
>>>> we want to make core SDK as low level as possible (and as reasonable), I
>>>> just want to make sure that that is really the intent.
>>>>
>>> The idea is to add basic operation with as small change as possible for
>>> current API.
>>> Ultimate goal is to have a Join/GBK operator that will choose proper
>>> strategy. However, I don't think that we have proper tools and view of how
>>> to choose best strategy at hand as of yet.
>>>
>>> OK, cool. That is where I would find it very much useful to have some
>>> sort of "goals", that we are targeting. I agree that there are some pieces
>>> missing in the puzzle as of now. But it would be good to know what these
>>> pieces are and what needs to be done to fulfill our goals. But this is
>>> probably not related to discussion of this proposal, but more related to
>>> the concept of BIP or similar.
>>>
>>> Thanks for the explanation.
>>>
>>> Thanks for the proposal!

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Mikhail Gryzykhin

KEYS files should be fixed now.

On Mon, Jan 6, 2020 at 8:29 AM Robert Bradshaw  wrote:

> Yes, please update KEYS to have the correct key. (If you've never used the
> other one you could just remove it.)
>
> On Mon, Jan 6, 2020, 6:46 AM Mikhail Gryzykhin  wrote:
>
>> I see. Seems that the wrong key is imported into KEYS file. And header is
>> incorrect.
>>
>> --Mikhail
>>
>> On Mon, Jan 6, 2020 at 6:16 AM Mikhail Gryzykhin 
>> wrote:
>>
>>> Hi Robert,
>>>
>>> I redownloaded binaries from
>>> https://dist.apache.org/repos/dist/dev/beam/2.17.0/ and ran
>>>
>>> gpg --verify apache-beam-2.17.0-source-release.zip.asc
>>> gpg: assuming signed data in 'apache-beam-2.17.0-source-release.zip'
>>> gpg: Signature made Mon 16 Dec 2019 09:17:23 PM PST
>>> gpg:using RSA key
>>> 53F72D4EEEF306D97736FE1065ABB07A8965E788
>>> gpg: Good signature from "Mikhail Gryzykhin "
>>> [ultimate]
>>>
>>> Signature is valid with key 53F72D4EEEF306D97736FE1065ABB07A8965E788.
>>> The key you received is different. Which binaries did you get that
>>> signature from?
>>>
>>> --Mikhail
>>>
>>> On Thu, Jan 2, 2020 at 4:53 PM Robert Bradshaw 
>>> wrote:
>>>
>>>> (Other than that everything looks fine.)
>>>>
>>>> On Thu, Jan 2, 2020 at 4:44 PM Robert Bradshaw 
>>>> wrote:
>>>> >
>>>> > -1
>>>> >
>>>> > I'm having trouble verifying the signatures on the release artifacts.
>>>> > When I try to import the key from
>>>> > https://dist.apache.org/repos/dist/release/beam/KEYS I get
>>>> >
>>>> > pub   rsa4096 2019-10-22 [SC]
>>>> >   79552F5C2FD869A08E097F96841855FB73AFFC7F
>>>> > uid   [ unknown] Mikhail Gryzykhin (mikhail) <
>>>> mikh...@apache.org>
>>>> > sub   rsa4096 2019-10-22 [E]
>>>> >
>>>> > which is not the key that these artifacts were signed with.
>>>> >
>>>> >
>>>> > On Thu, Jan 2, 2020 at 4:23 PM Reuven Lax  wrote:
>>>> > >
>>>> > > +1
>>>> > >
>>>> > > On Thu, Jan 2, 2020 at 3:02 PM Valentyn Tymofieiev <
>>>> valen...@google.com> wrote:
>>>> > >>
>>>> > >> +1. Validated Batch and Streaming quickstarts on Python 3.7 (using
>>>> wheels) and Batch Mobile Gaming examples (user score, hourly team score) on
>>>> Dataflow.
>>>> > >>
>>>> > >> On Thu, Jan 2, 2020 at 11:23 AM Ahmet Altay 
>>>> wrote:
>>>> > >>>
>>>> > >>> This vote needs at least one more PMC vote before it can be
>>>> finalized. Could you please validate and vote?
>>>> > >>>
>>>> > >>> On Mon, Dec 23, 2019 at 9:44 AM Luke Cwik 
>>>> wrote:
>>>> > >>>>
>>>> > >>>> +1, I validated the Java quickstarts for the runners and the
>>>> issues I have brought up have been moved to a future release.
>>>> > >>>>
>>>> > >>>> On Fri, Dec 20, 2019 at 8:09 PM Ahmet Altay 
>>>> wrote:
>>>> > >>>>>
>>>> > >>>>> +1, I validated the python2 quick starts using wheels. Thank
>>>> you for pushing the release this far.
>>>> > >>>>>
>>>> > >>>>> On Thu, Dec 19, 2019 at 1:27 PM Kenneth Knowles 
>>>> wrote:
>>>> > >>>>>>
>>>> > >>>>>> I verified the Java quickstart on Dataflow manually.
>>>> > >>>>>>
>>>> > >>>>>> Kenn
>>>> > >>>>>>
>>>> > >>>>>> On Wed, Dec 18, 2019 at 5:58 PM jincheng sun <
>>>> sunjincheng...@gmail.com> wrote:
>>>> > >>>>>>>
>>>> > >>>>>>> Thanks for drive this release Mikhail !
>>>> > >>>>>>>
>>>> > >>>>>>> I have found there is an incorrect release version for
>>>> release notes in PR[1], also left a question in PR[2].
>>>> > >>>>>>>
>>>> > >>>>>>>

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Mikhail Gryzykhin

I see. Seems that the wrong key is imported into KEYS file. And header is
incorrect.

--Mikhail

On Mon, Jan 6, 2020 at 6:16 AM Mikhail Gryzykhin  wrote:

> Hi Robert,
>
> I redownloaded binaries from
> https://dist.apache.org/repos/dist/dev/beam/2.17.0/ and ran
>
> gpg --verify apache-beam-2.17.0-source-release.zip.asc
> gpg: assuming signed data in 'apache-beam-2.17.0-source-release.zip'
> gpg: Signature made Mon 16 Dec 2019 09:17:23 PM PST
> gpg:using RSA key 53F72D4EEEF306D97736FE1065ABB07A8965E788
> gpg: Good signature from "Mikhail Gryzykhin "
> [ultimate]
>
> Signature is valid with key 53F72D4EEEF306D97736FE1065ABB07A8965E788. The
> key you received is different. Which binaries did you get that signature
> from?
>
> --Mikhail
>
> On Thu, Jan 2, 2020 at 4:53 PM Robert Bradshaw 
> wrote:
>
>> (Other than that everything looks fine.)
>>
>> On Thu, Jan 2, 2020 at 4:44 PM Robert Bradshaw 
>> wrote:
>> >
>> > -1
>> >
>> > I'm having trouble verifying the signatures on the release artifacts.
>> > When I try to import the key from
>> > https://dist.apache.org/repos/dist/release/beam/KEYS I get
>> >
>> > pub   rsa4096 2019-10-22 [SC]
>> >   79552F5C2FD869A08E097F96841855FB73AFFC7F
>> > uid   [ unknown] Mikhail Gryzykhin (mikhail) <
>> mikh...@apache.org>
>> > sub   rsa4096 2019-10-22 [E]
>> >
>> > which is not the key that these artifacts were signed with.
>> >
>> >
>> > On Thu, Jan 2, 2020 at 4:23 PM Reuven Lax  wrote:
>> > >
>> > > +1
>> > >
>> > > On Thu, Jan 2, 2020 at 3:02 PM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>> > >>
>> > >> +1. Validated Batch and Streaming quickstarts on Python 3.7 (using
>> wheels) and Batch Mobile Gaming examples (user score, hourly team score) on
>> Dataflow.
>> > >>
>> > >> On Thu, Jan 2, 2020 at 11:23 AM Ahmet Altay 
>> wrote:
>> > >>>
>> > >>> This vote needs at least one more PMC vote before it can be
>> finalized. Could you please validate and vote?
>> > >>>
>> > >>> On Mon, Dec 23, 2019 at 9:44 AM Luke Cwik  wrote:
>> > >>>>
>> > >>>> +1, I validated the Java quickstarts for the runners and the
>> issues I have brought up have been moved to a future release.
>> > >>>>
>> > >>>> On Fri, Dec 20, 2019 at 8:09 PM Ahmet Altay 
>> wrote:
>> > >>>>>
>> > >>>>> +1, I validated the python2 quick starts using wheels. Thank you
>> for pushing the release this far.
>> > >>>>>
>> > >>>>> On Thu, Dec 19, 2019 at 1:27 PM Kenneth Knowles 
>> wrote:
>> > >>>>>>
>> > >>>>>> I verified the Java quickstart on Dataflow manually.
>> > >>>>>>
>> > >>>>>> Kenn
>> > >>>>>>
>> > >>>>>> On Wed, Dec 18, 2019 at 5:58 PM jincheng sun <
>> sunjincheng...@gmail.com> wrote:
>> > >>>>>>>
>> > >>>>>>> Thanks for drive this release Mikhail !
>> > >>>>>>>
>> > >>>>>>> I have found there is an incorrect release version for release
>> notes in PR[1], also left a question in PR[2].
>> > >>>>>>>
>> > >>>>>>> But I do not think it's the blocker of the release :)
>> > >>>>>>>
>> > >>>>>>> Best,
>> > >>>>>>> Jincheng
>> > >>>>>>>
>> > >>>>>>> [1] https://github.com/apache/beam/pull/10401
>> > >>>>>>> [2] https://github.com/apache/beam/pull/10402
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> Ahmet Altay  于2019年12月19日周四 上午3:31写道：
>> > >>>>>>>>
>> > >>>>>>>> I validated python quickstarts with python 2. Wheels file are
>> missing but they work otherwise. Once the wheel files are added I will add
>> my vote.
>> > >>>>>>>>
>> > >>>>>>>> On Wed, Dec 18, 2019 at 10:00 AM Luke Cwik 
>> wrote:
>> > >>>>&

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Mikhail Gryzykhin

Hi Robert,

I redownloaded binaries from
https://dist.apache.org/repos/dist/dev/beam/2.17.0/ and ran

gpg --verify apache-beam-2.17.0-source-release.zip.asc
gpg: assuming signed data in 'apache-beam-2.17.0-source-release.zip'
gpg: Signature made Mon 16 Dec 2019 09:17:23 PM PST
gpg:using RSA key 53F72D4EEEF306D97736FE1065ABB07A8965E788
gpg: Good signature from "Mikhail Gryzykhin " [ultimate]

Signature is valid with key 53F72D4EEEF306D97736FE1065ABB07A8965E788. The
key you received is different. Which binaries did you get that signature
from?

--Mikhail

On Thu, Jan 2, 2020 at 4:53 PM Robert Bradshaw  wrote:

> (Other than that everything looks fine.)
>
> On Thu, Jan 2, 2020 at 4:44 PM Robert Bradshaw 
> wrote:
> >
> > -1
> >
> > I'm having trouble verifying the signatures on the release artifacts.
> > When I try to import the key from
> > https://dist.apache.org/repos/dist/release/beam/KEYS I get
> >
> > pub   rsa4096 2019-10-22 [SC]
> >   79552F5C2FD869A08E097F96841855FB73AFFC7F
> > uid   [ unknown] Mikhail Gryzykhin (mikhail)  >
> > sub   rsa4096 2019-10-22 [E]
> >
> > which is not the key that these artifacts were signed with.
> >
> >
> > On Thu, Jan 2, 2020 at 4:23 PM Reuven Lax  wrote:
> > >
> > > +1
> > >
> > > On Thu, Jan 2, 2020 at 3:02 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
> > >>
> > >> +1. Validated Batch and Streaming quickstarts on Python 3.7 (using
> wheels) and Batch Mobile Gaming examples (user score, hourly team score) on
> Dataflow.
> > >>
> > >> On Thu, Jan 2, 2020 at 11:23 AM Ahmet Altay  wrote:
> > >>>
> > >>> This vote needs at least one more PMC vote before it can be
> finalized. Could you please validate and vote?
> > >>>
> > >>> On Mon, Dec 23, 2019 at 9:44 AM Luke Cwik  wrote:
> > >>>>
> > >>>> +1, I validated the Java quickstarts for the runners and the issues
> I have brought up have been moved to a future release.
> > >>>>
> > >>>> On Fri, Dec 20, 2019 at 8:09 PM Ahmet Altay 
> wrote:
> > >>>>>
> > >>>>> +1, I validated the python2 quick starts using wheels. Thank you
> for pushing the release this far.
> > >>>>>
> > >>>>> On Thu, Dec 19, 2019 at 1:27 PM Kenneth Knowles 
> wrote:
> > >>>>>>
> > >>>>>> I verified the Java quickstart on Dataflow manually.
> > >>>>>>
> > >>>>>> Kenn
> > >>>>>>
> > >>>>>> On Wed, Dec 18, 2019 at 5:58 PM jincheng sun <
> sunjincheng...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Thanks for drive this release Mikhail !
> > >>>>>>>
> > >>>>>>> I have found there is an incorrect release version for release
> notes in PR[1], also left a question in PR[2].
> > >>>>>>>
> > >>>>>>> But I do not think it's the blocker of the release :)
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Jincheng
> > >>>>>>>
> > >>>>>>> [1] https://github.com/apache/beam/pull/10401
> > >>>>>>> [2] https://github.com/apache/beam/pull/10402
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Ahmet Altay  于2019年12月19日周四 上午3:31写道：
> > >>>>>>>>
> > >>>>>>>> I validated python quickstarts with python 2. Wheels file are
> missing but they work otherwise. Once the wheel files are added I will add
> my vote.
> > >>>>>>>>
> > >>>>>>>> On Wed, Dec 18, 2019 at 10:00 AM Luke Cwik 
> wrote:
> > >>>>>>>>>
> > >>>>>>>>> I verified the release and ran the quickstarts and found that
> release 2.16 broke Apache Nemo runner which is also an issue for 2.17.0 RC
> #2. It is caused by a backwards incompatible change in ParDo.MultiOutput
> where getSideInputs return value was changed from List to Map as part of
> https://github.com/apache/beam/pull/9275. I filed
> https://issues.apache.org/jira/browse/BEAM-8989 to track the issue.
> > >>>>>>>>>
> > >>>>>>>>> Should we re-add the method back in 2.1

[VOTE] Release 2.17.0, release candidate #2

2019-12-17 Thread Mikhail Gryzykhin

Hi everyone,


Please review and vote on the release candidate #2 for the version 2.17.0,
as follows:

[ ] +1, Approve the release

[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:

* JIRA release notes [1],

* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
53F72D4EEEF306D97736FE1065ABB07A8965E788

 [3],

* all artifacts to be deployed to the Maven Central Repository [4],

* source code tag "v2.17.0-RC2" [5],

* website pull request listing the release [6], publishing the API
reference manual [7], and the blog post [8].

* Python artifacts are deployed along with the source release to the
dist.apache.org [2].

* Validation sheet with a tab for 2.17.0 release to help with validation
[9].

* Docker images published to Docker Hub [10].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,

--Mikhail

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345970=12319527

[2] https://dist.apache.org/repos/dist/dev/beam/2.17.0/

[3] https://dist.apache.org/repos/dist/release/beam/KEYS

[4] https://repository.apache.org/content/repositories/orgapachebeam-1087/

[5] https://github.com/apache/beam/tree/v2.17.0-RC2

[6] https://github.com/apache/beam/pull/10401

[7] https://github.com/apache/beam-site/pull/594

[8] https://github.com/apache/beam/pull/10402

[9]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=803858785

[10] https://hub.docker.com/u/apachebeam

Re: [Proposal] Slowly Changing Dimensions and Distributed Map Side Inputs (in Dataflow)

2019-12-17 Thread Mikhail Gryzykhin

inline

On Tue, Dec 17, 2019 at 12:59 AM Jan Lukavský  wrote:

> Hi,
>
> I actually thought that the proposal refers to Dataflow only. If this is
> supposed to be general, can we remove the Dataflow/Windmill specific parts
> and replace them with generic ones?
>
 I'll look into rephrasing doc to keep Dataflow/Windmill as example.

> I'd have two more questions:
>
>  a) the proposal is named "Slowly changing", why is the rate of change
> essential to the proposal? Once running on event time, that should not
> matter, or what am I missing?
>
Within this proposal, it is suggested to make a full snapshot of data on
every re-read. This is generally expensive and setting time event to short
interval might cause issues. Otherwise it is not essential.

>  b) The description says: 'User wants to solve a stream enrichment
> problem. In brief request sounds like: ”I want to enrich each event in this
> stream by corresponding data from given table.”'. That is understandable,
> but would it be better to enable the user to express this intent directly
> (via Join operation)? The actual implementation might be runner (and
> input!) specific. The analogy is that when doing group-by-key operation,
> runner can choose hash grouping or sort-merge grouping, but that is not
> (directly) expressed in user code. I'm not saying that we should not have
> low-level transforms, just asking if it would be better to leave this
> decision to the runner (at least in some cases). It might be the case that
> we want to make core SDK as low level as possible (and as reasonable), I
> just want to make sure that that is really the intent.
>
The idea is to add basic operation with as small change as possible for
current API.
Ultimate goal is to have a Join/GBK operator that will choose proper
strategy. However, I don't think that we have proper tools and view of how
to choose best strategy at hand as of yet.

> Thanks for the proposal!
>
> Jan
> On 12/17/19 12:01 AM, Kenneth Knowles wrote:
>
> I want to highlight that this design works for definitely more runners
> than just Dataflow. I see two pieces of it that I want to bring onto the
> thread:
>
> 1. A new kind of "unbounded source" which is a periodic refresh of a
> bounded source, and use that as a side input. Each main input element has a
> window that maps to a specific refresh of the side input.
> 2. Distributed map side inputs: supporting very large lookup tables, but
> with consistency challenges. Even the part about "windmill API" probably
> applies to other runners
>
> So I hope the title and "Objective" section do not cause people to stop
> reading.
>
> Kenn
>
> On Mon, Dec 16, 2019 at 11:36 AM Mikhail Gryzykhin 
> wrote:
>
>> +some people explicitly
>>
>> Can you please check on the doc and comment if it looks fine?
>>
>> Thank you,
>> --Mikhail
>>
>> On Tue, Dec 10, 2019 at 1:43 PM Mikhail Gryzykhin 
>> wrote:
>>
>>> "Good news, everyone-"
>>> ―Farnsworth
>>>
>>> Hi everyone,
>>>
>>> Recently, I was looking into relaxing limitations on side inputs in
>>> Dataflow runner. As part of it, I came up with design proposal for
>>> standardizing slowly changing dimensions use case in Beam and relevant
>>> changes to add support for distributed map side inputs.
>>>
>>> Please review and comment on design doc.
>>> <https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg>
>>>  [1]
>>>
>>> Thank you,
>>> Mikhail.
>>>
>>> -
>>>
>>> [1]
>>> https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg
>>>
>>>

[Proposal] Slowly Changing Dimensions and Distributed Map Side Inputs (in Dataflow)

2019-12-10 Thread Mikhail Gryzykhin

"Good news, everyone-"
―Farnsworth

Hi everyone,

Recently, I was looking into relaxing limitations on side inputs in
Dataflow runner. As part of it, I came up with design proposal for
standardizing slowly changing dimensions use case in Beam and relevant
changes to add support for distributed map side inputs.

Please review and comment on design doc.

 [1]

Thank you,
Mikhail.

-

[1]
https://docs.google.com/document/d/1LDY_CtsOJ8Y_zNv1QtkP6AGFrtzkj1q5EW_gSChOIvg

Re: [UPDATE] Preparing for Beam 2.17.0 release

2019-12-06 Thread Mikhail Gryzykhin

UPD:
All PRs are merged and tickets are closed. Building RC.

--Mikhail

On Tue, Nov 26, 2019 at 10:10 AM Mikhail Gryzykhin 
wrote:

> Hello everybody,
>
> Got release branch green except gradle build that timeout and fails with
> go tests that look like flake.
>
> I'll go over remaining PRs and Jiras today and do final tests validation.
> Will start RC process afterwards.
>
> --Mikhail
>
> On Fri, Nov 22, 2019 at 9:29 PM Jan Lukavský  wrote:
>
>> Hi Mikhail,
>> I created PR for [BEAM-8812]. It is linked in the JIRA.
>> Jan
>>
>> Dne 23. 11. 2019 0:45 napsal uživatel Mikhail Gryzykhin <
>> mig...@google.com>:
>>
>> UPD:
>> on current branch there's timeout on gradle build job, I'm mitigating it
>> by increasing job time. Seems that this job runs most of python tests. We
>> might look into adjusting the target.
>>
>> Second failure is https://issues.apache.org/jira/browse/BEAM-8812 . I
>> would really appreciate if someone can help me debug this one.
>>
>> --Mikhail
>>
>> On Tue, Nov 19, 2019 at 10:14 PM Kenneth Knowles  wrote:
>>
>> I've poked through the bugs and there do seem to be a few that are
>> finished and a few that may not be started that should probably be deferred
>> if they can be triaged to not be blockers.
>>
>> Kenn
>>
>> On Fri, Nov 15, 2019 at 2:13 PM Mikhail Gryzykhin 
>> wrote:
>>
>> Hi everyone,
>>
>> There's still an outstanding cherry-pick PR that I can't merge due to
>> tests failing on it and release branch validation PR
>> <https://github.com/apache/beam/pull/9884>. Once I get tests green, I'll
>> send another update and review outstanding open issues.
>>
>> --Mikhail
>>
>> On Fri, Nov 15, 2019 at 10:40 AM Thomas Weise  wrote:
>>
>> Any update regarding the release?
>>
>> The list still shows 10 open issues:
>>
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.17.0%20and%20resolution%20is%20EMPTY
>>
>> Is the RC blocked on those?
>>
>>
>>
>>
>>
>>
>> On Mon, Oct 28, 2019 at 12:46 PM Ahmet Altay  wrote:
>>
>>
>>
>> On Mon, Oct 28, 2019 at 12:44 PM Gleb Kanterov  wrote:
>>
>> It looks like BigQueryIO DIRECT_READ is broken since 2.16.0, I've added a
>> ticket describing the problem and possible fix, see BEAM-8504
>> <https://issues.apache.org/jira/browse/BEAM-8504> [1].
>>
>>
>> Should this be added to 2.16 blog post as a known issue?
>>
>>
>>
>> [1]: https://issues.apache.org/jira/browse/BEAM-8504
>>
>> On Wed, Oct 23, 2019 at 9:19 PM Kenneth Knowles  wrote:
>>
>> I opened https://github.com/apache/beam/pull/9862 to raise the
>> documentation of Fix Version to the top level. It also includes the write
>> up of Jira priorities, to make clear that "Blocker" priority does not refer
>> to release blocking.
>>
>> On Wed, Oct 23, 2019 at 11:16 AM Kenneth Knowles  wrote:
>>
>> I've gone over the tickets and removed Fix Version from many of them that
>> do not seem to be critical defects. If I removed Fix Version from a ticket
>> you care about, please feel free to add it back. I am not trying to decide
>> what is in/out of the release, just trying to triage the Jira data to match
>> expected practices.
>>
>> It should probably be documented somewhere outside of the release guide.
>> As far as I can tell, the fact that we triage them down to zero is the only
>> place we mention that it is used to indicate release blockers and not used
>> for feature targets.
>>
>> Kenn
>>
>> On Wed, Oct 23, 2019 at 10:40 AM Kenneth Knowles  wrote:
>>
>>  Wow, 28 release blocking tickets! That is the most I've ever seen, by
>> far. Many appear to be feature requests, not release-blocking defects. I
>> believe this is not according to our normal best practice. The release
>> cadence should not wait for features in progress, with exceptions discussed
>> on dev@. As a matter of best practice, I think we should triage feature
>> requests to not have Fix Version set until it has been discussed on dev@.
>>
>> Kenn
>>
>> On Wed, Oct 23, 2019 at 9:55 AM Mikhail Gryzykhin 
>> wrote:
>>
>> Hi all,
>>
>> Beam 2.17 release branch cut is scheduled today (2019/10/23) according to
>> the release calendar [1].  I'll start working on the branch cutoff and
>> later work on cherry picking blocker fixes.
>>
>> If you have release blocking issues for 2.17

Re: [UPDATE] Preparing for Beam 2.17.0 release

2019-11-26 Thread Mikhail Gryzykhin

Hello everybody,

Got release branch green except gradle build that timeout and fails with go
tests that look like flake.

I'll go over remaining PRs and Jiras today and do final tests validation.
Will start RC process afterwards.

--Mikhail

On Fri, Nov 22, 2019 at 9:29 PM Jan Lukavský  wrote:

> Hi Mikhail,
> I created PR for [BEAM-8812]. It is linked in the JIRA.
> Jan
>
> Dne 23. 11. 2019 0:45 napsal uživatel Mikhail Gryzykhin  >:
>
> UPD:
> on current branch there's timeout on gradle build job, I'm mitigating it
> by increasing job time. Seems that this job runs most of python tests. We
> might look into adjusting the target.
>
> Second failure is https://issues.apache.org/jira/browse/BEAM-8812 . I
> would really appreciate if someone can help me debug this one.
>
> --Mikhail
>
> On Tue, Nov 19, 2019 at 10:14 PM Kenneth Knowles  wrote:
>
> I've poked through the bugs and there do seem to be a few that are
> finished and a few that may not be started that should probably be deferred
> if they can be triaged to not be blockers.
>
> Kenn
>
> On Fri, Nov 15, 2019 at 2:13 PM Mikhail Gryzykhin 
> wrote:
>
> Hi everyone,
>
> There's still an outstanding cherry-pick PR that I can't merge due to
> tests failing on it and release branch validation PR
> <https://github.com/apache/beam/pull/9884>. Once I get tests green, I'll
> send another update and review outstanding open issues.
>
> --Mikhail
>
> On Fri, Nov 15, 2019 at 10:40 AM Thomas Weise  wrote:
>
> Any update regarding the release?
>
> The list still shows 10 open issues:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.17.0%20and%20resolution%20is%20EMPTY
>
> Is the RC blocked on those?
>
>
>
>
>
>
> On Mon, Oct 28, 2019 at 12:46 PM Ahmet Altay  wrote:
>
>
>
> On Mon, Oct 28, 2019 at 12:44 PM Gleb Kanterov  wrote:
>
> It looks like BigQueryIO DIRECT_READ is broken since 2.16.0, I've added a
> ticket describing the problem and possible fix, see BEAM-8504
> <https://issues.apache.org/jira/browse/BEAM-8504> [1].
>
>
> Should this be added to 2.16 blog post as a known issue?
>
>
>
> [1]: https://issues.apache.org/jira/browse/BEAM-8504
>
> On Wed, Oct 23, 2019 at 9:19 PM Kenneth Knowles  wrote:
>
> I opened https://github.com/apache/beam/pull/9862 to raise the
> documentation of Fix Version to the top level. It also includes the write
> up of Jira priorities, to make clear that "Blocker" priority does not refer
> to release blocking.
>
> On Wed, Oct 23, 2019 at 11:16 AM Kenneth Knowles  wrote:
>
> I've gone over the tickets and removed Fix Version from many of them that
> do not seem to be critical defects. If I removed Fix Version from a ticket
> you care about, please feel free to add it back. I am not trying to decide
> what is in/out of the release, just trying to triage the Jira data to match
> expected practices.
>
> It should probably be documented somewhere outside of the release guide.
> As far as I can tell, the fact that we triage them down to zero is the only
> place we mention that it is used to indicate release blockers and not used
> for feature targets.
>
> Kenn
>
> On Wed, Oct 23, 2019 at 10:40 AM Kenneth Knowles  wrote:
>
>  Wow, 28 release blocking tickets! That is the most I've ever seen, by
> far. Many appear to be feature requests, not release-blocking defects. I
> believe this is not according to our normal best practice. The release
> cadence should not wait for features in progress, with exceptions discussed
> on dev@. As a matter of best practice, I think we should triage feature
> requests to not have Fix Version set until it has been discussed on dev@.
>
> Kenn
>
> On Wed, Oct 23, 2019 at 9:55 AM Mikhail Gryzykhin 
> wrote:
>
> Hi all,
>
> Beam 2.17 release branch cut is scheduled today (2019/10/23) according to
> the release calendar [1].  I'll start working on the branch cutoff and
> later work on cherry picking blocker fixes.
>
> If you have release blocking issues for 2.17 please mark their "Fix
> Version" as 2.17.0 [2]. This tag is already created in JIRA in case you
> would like to move any non-blocking issues to that version.
>
> There is a decent amount of open bugs to be resolved in 2.17.0 [2] and
> only 4 [3] are marked as blockers. Please, review those if these bugs are
> actually to be resolved in 2.17.0 and prioritize fixes if possible.
>
> Any thoughts, comments, objections?
>
> Regards.
> Mikhail.
>
>
> [1]
> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
> [2]
> https://issues.apache.org/jir

Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-25 Thread Mikhail Gryzykhin

[ ] Beaver
[X] Hedgehog
[] Lemur
[X] Owl
[ ] Salmon
[ ] Trout
[X] Robot dinosaur
[ ] Firefly
[ ] Cuttlefish
[ ] Dumbo Octopus
[ ] Angler fish
[X] Honey Badger

Re: [UPDATE] Preparing for Beam 2.17.0 release

2019-11-22 Thread Mikhail Gryzykhin

UPD:
on current branch there's timeout on gradle build job, I'm mitigating it by
increasing job time. Seems that this job runs most of python tests. We
might look into adjusting the target.

Second failure is https://issues.apache.org/jira/browse/BEAM-8812 . I would
really appreciate if someone can help me debug this one.

--Mikhail

On Tue, Nov 19, 2019 at 10:14 PM Kenneth Knowles  wrote:

> I've poked through the bugs and there do seem to be a few that are
> finished and a few that may not be started that should probably be deferred
> if they can be triaged to not be blockers.
>
> Kenn
>
> On Fri, Nov 15, 2019 at 2:13 PM Mikhail Gryzykhin 
> wrote:
>
>> Hi everyone,
>>
>> There's still an outstanding cherry-pick PR that I can't merge due to
>> tests failing on it and release branch validation PR
>> <https://github.com/apache/beam/pull/9884>. Once I get tests green, I'll
>> send another update and review outstanding open issues.
>>
>> --Mikhail
>>
>> On Fri, Nov 15, 2019 at 10:40 AM Thomas Weise  wrote:
>>
>>> Any update regarding the release?
>>>
>>> The list still shows 10 open issues:
>>>
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.17.0%20and%20resolution%20is%20EMPTY
>>>
>>> Is the RC blocked on those?
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Oct 28, 2019 at 12:46 PM Ahmet Altay  wrote:
>>>
>>>>
>>>>
>>>> On Mon, Oct 28, 2019 at 12:44 PM Gleb Kanterov 
>>>> wrote:
>>>>
>>>>> It looks like BigQueryIO DIRECT_READ is broken since 2.16.0, I've
>>>>> added a ticket describing the problem and possible fix, see BEAM-8504
>>>>> <https://issues.apache.org/jira/browse/BEAM-8504> [1].
>>>>>
>>>>
>>>> Should this be added to 2.16 blog post as a known issue?
>>>>
>>>>
>>>>>
>>>>> [1]: https://issues.apache.org/jira/browse/BEAM-8504
>>>>>
>>>>> On Wed, Oct 23, 2019 at 9:19 PM Kenneth Knowles 
>>>>> wrote:
>>>>>
>>>>>> I opened https://github.com/apache/beam/pull/9862 to raise the
>>>>>> documentation of Fix Version to the top level. It also includes the write
>>>>>> up of Jira priorities, to make clear that "Blocker" priority does not 
>>>>>> refer
>>>>>> to release blocking.
>>>>>>
>>>>>> On Wed, Oct 23, 2019 at 11:16 AM Kenneth Knowles 
>>>>>> wrote:
>>>>>>
>>>>>>> I've gone over the tickets and removed Fix Version from many of them
>>>>>>> that do not seem to be critical defects. If I removed Fix Version from a
>>>>>>> ticket you care about, please feel free to add it back. I am not trying 
>>>>>>> to
>>>>>>> decide what is in/out of the release, just trying to triage the Jira 
>>>>>>> data
>>>>>>> to match expected practices.
>>>>>>>
>>>>>>> It should probably be documented somewhere outside of the release
>>>>>>> guide. As far as I can tell, the fact that we triage them down to zero 
>>>>>>> is
>>>>>>> the only place we mention that it is used to indicate release blockers 
>>>>>>> and
>>>>>>> not used for feature targets.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Wed, Oct 23, 2019 at 10:40 AM Kenneth Knowles 
>>>>>>> wrote:
>>>>>>>
>>>>>>>>  Wow, 28 release blocking tickets! That is the most I've ever seen,
>>>>>>>> by far. Many appear to be feature requests, not release-blocking 
>>>>>>>> defects. I
>>>>>>>> believe this is not according to our normal best practice. The release
>>>>>>>> cadence should not wait for features in progress, with exceptions 
>>>>>>>> discussed
>>>>>>>> on dev@. As a matter of best practice, I think we should triage
>>>>>>>> feature requests to not have Fix Version set until it has been 
>>>>>>>> discussed on
>>>>>>>> dev@.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>

Re: [UPDATE] Preparing for Beam 2.17.0 release

2019-11-15 Thread Mikhail Gryzykhin

Hi everyone,

There's still an outstanding cherry-pick PR that I can't merge due to tests
failing on it and release branch validation PR
<https://github.com/apache/beam/pull/9884>. Once I get tests green, I'll
send another update and review outstanding open issues.

--Mikhail

On Fri, Nov 15, 2019 at 10:40 AM Thomas Weise  wrote:

> Any update regarding the release?
>
> The list still shows 10 open issues:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.17.0%20and%20resolution%20is%20EMPTY
>
> Is the RC blocked on those?
>
>
>
>
>
>
> On Mon, Oct 28, 2019 at 12:46 PM Ahmet Altay  wrote:
>
>>
>>
>> On Mon, Oct 28, 2019 at 12:44 PM Gleb Kanterov  wrote:
>>
>>> It looks like BigQueryIO DIRECT_READ is broken since 2.16.0, I've added
>>> a ticket describing the problem and possible fix, see BEAM-8504
>>> <https://issues.apache.org/jira/browse/BEAM-8504> [1].
>>>
>>
>> Should this be added to 2.16 blog post as a known issue?
>>
>>
>>>
>>> [1]: https://issues.apache.org/jira/browse/BEAM-8504
>>>
>>> On Wed, Oct 23, 2019 at 9:19 PM Kenneth Knowles  wrote:
>>>
>>>> I opened https://github.com/apache/beam/pull/9862 to raise the
>>>> documentation of Fix Version to the top level. It also includes the write
>>>> up of Jira priorities, to make clear that "Blocker" priority does not refer
>>>> to release blocking.
>>>>
>>>> On Wed, Oct 23, 2019 at 11:16 AM Kenneth Knowles 
>>>> wrote:
>>>>
>>>>> I've gone over the tickets and removed Fix Version from many of them
>>>>> that do not seem to be critical defects. If I removed Fix Version from a
>>>>> ticket you care about, please feel free to add it back. I am not trying to
>>>>> decide what is in/out of the release, just trying to triage the Jira data
>>>>> to match expected practices.
>>>>>
>>>>> It should probably be documented somewhere outside of the release
>>>>> guide. As far as I can tell, the fact that we triage them down to zero is
>>>>> the only place we mention that it is used to indicate release blockers and
>>>>> not used for feature targets.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Wed, Oct 23, 2019 at 10:40 AM Kenneth Knowles 
>>>>> wrote:
>>>>>
>>>>>>  Wow, 28 release blocking tickets! That is the most I've ever seen,
>>>>>> by far. Many appear to be feature requests, not release-blocking 
>>>>>> defects. I
>>>>>> believe this is not according to our normal best practice. The release
>>>>>> cadence should not wait for features in progress, with exceptions 
>>>>>> discussed
>>>>>> on dev@. As a matter of best practice, I think we should triage
>>>>>> feature requests to not have Fix Version set until it has been discussed 
>>>>>> on
>>>>>> dev@.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Wed, Oct 23, 2019 at 9:55 AM Mikhail Gryzykhin 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Beam 2.17 release branch cut is scheduled today (2019/10/23)
>>>>>>> according to the release calendar [1].  I'll start working on the
>>>>>>> branch cutoff and later work on cherry picking blocker fixes.
>>>>>>>
>>>>>>> If you have release blocking issues for 2.17 please mark their "Fix
>>>>>>> Version" as 2.17.0 [2]. This tag is already created in JIRA in case you
>>>>>>> would like to move any non-blocking issues to that version.
>>>>>>>
>>>>>>> There is a decent amount of open bugs to be resolved in 2.17.0 [2]
>>>>>>> and only 4 [3] are marked as blockers. Please, review those if these 
>>>>>>> bugs
>>>>>>> are actually to be resolved in 2.17.0 and prioritize fixes if possible.
>>>>>>>
>>>>>>> Any thoughts, comments, objections?
>>>>>>>
>>>>>>> Regards.
>>>>>>> Mikhail.
>>>>>>>
>>>>>>>
>>>>>>> [1]
>>>>>>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>>>>>>> [2]
>>>>>>> https://issues.apache.org/jira/browse/BEAM-8457?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Reopened%2C%20Open%2C%20%22In%20Progress%22%2C%20%22Under%20Discussion%22%2C%20%22In%20Implementation%22%2C%20%22Triage%20Needed%22)%20AND%20fixVersion%20%3D%202.17.0
>>>>>>> [3]
>>>>>>> https://issues.apache.org/jira/browse/BEAM-8457?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Reopened%2C%20Open%2C%20%22In%20Progress%22%2C%20%22Under%20Discussion%22%2C%20%22In%20Implementation%22%2C%20%22Triage%20Needed%22)%20AND%20priority%20%3D%20Blocker%20AND%20fixVersion%20%3D%202.17.0
>>>>>>>
>>>>>>

Re: Python Precommit duration pushing 2 hours

2019-11-14 Thread Mikhail Gryzykhin

Hi Everyone,

Python precommit phrase timeouts for (roughly) 80% of the jobs in 2 hours.
This also blocks release branch validation. I suggest to bump the timeout
to 3 hours while we are working on a proper solution. This way many people
can get unblocked.

I believe the change can be rather small:
https://github.com/apache/beam/pull/10121

--Mikhail



On Mon, Nov 11, 2019 at 5:24 PM Ning Kang  wrote:

> I'm removing the additional interactive test env + suite and add
> [interactive] dependencies as extra dependencies in tests_require:
> https://github.com/apache/beam/pull/10068
>
> On Mon, Nov 11, 2019 at 2:15 PM Robert Bradshaw 
> wrote:
>
>> On Fri, Nov 8, 2019 at 5:45 PM Ahmet Altay  wrote:
>> >
>> > I looked at the log but I could not figure what is causing the timeout
>> because the gradle scan links are missing. I sampled a few of the
>> successful jobs, It seems like python 3.7 and python 2 are running 3 tests
>> in serial {interactive, py37cython, py37gcp} and {docs, py27cython,
>> py27gcp} respectively. These two versions are pushing the total time
>> because other variants are now only running {cython, gcp} versions.
>> >
>> > I suggest breaking up docs, and interactive into 2 separate suites of
>> their own. docs is actually faster than interactive,just separating that
>> out to a new suite might help.
>> >
>> > Interactive was recently added (
>> https://github.com/apache/beam/pull/9741). +Ning Kang could you separate
>> interactive to new suite?
>>
>> I would ask why interactive is a separate tox configuration at all; I
>> don't think there's a need to run every test again with a couple of
>> extra dependencies (adding ~30 minutes to every presumbit). I think it
>> would be much more valuable to run the (presumably relatively small)
>> set of interactive tests in all modes.
>>
>> (The other suites are to guerentee the tests specifically run
>> *without* installing gcp and *without* compiling with Cython.)
>>
>> > On Fri, Nov 8, 2019 at 11:09 AM Robert Bradshaw 
>> wrote:
>> >>
>> >> Just saw another 2-hour timeout:
>> >> https://builds.apache.org/job/beam_PreCommit_Python_Commit/9440/ , so
>> >> perhaps we're not out of the woods yet (though in general things have
>> >> been a lot better).
>> >>
>> >> On Tue, Nov 5, 2019 at 10:52 AM Ahmet Altay  wrote:
>> >> >
>> >> > GCP tests are already on separate locations. IO related tests are
>> under /sdks/python/apache_beam/io/gcp and Dataflow related tests are under
>> sdks/python/apache_beam/runners/dataflow. It should be a matter of changing
>> gradle files to run either one of the base tests or GCP tests depending on
>> the types of changes. I do not expect this to have any material impact on
>> the precommit times because these two test suites take about exactly the
>> same time to complete.
>> >> >
>> >> > #9985 is merged now. Precommit times on master branch dropped to ~1h
>> 20 for the last 5 runs.
>> >> >
>> >> > On Tue, Nov 5, 2019 at 10:12 AM David Cavazos 
>> wrote:
>> >> >>
>> >> >> +1 to moving the GCP tests outside of core. If there are issues
>> that only show up on GCP tests but not in core, it might be an indication
>> that there needs to be another test in core covering that, but I think that
>> should be pretty rare.
>> >> >>
>> >> >> On Mon, Nov 4, 2019 at 8:33 PM Kenneth Knowles 
>> wrote:
>> >> >>>
>> >> >>> +1 to moving forward with this
>> >> >>>
>> >> >>> Could we move GCP tests outside the core? Then only code changes
>> touches/affecting GCP would cause them to run in precommit. Could still run
>> them in postcommit in their own suite. If the core has reasonably stable
>> abstractions that the connectors are built on, this should not change
>> coverage much.
>> >> >>>
>> >> >>> Kenn
>> >> >>>
>> >> >>> On Mon, Nov 4, 2019 at 1:55 PM Ahmet Altay 
>> wrote:
>> >> 
>> >>  PR for the proposed change:
>> https://github.com/apache/beam/pull/9985
>> >> 
>> >>  On Mon, Nov 4, 2019 at 1:35 PM Udi Meiri 
>> wrote:
>> >> >
>> >> > +1
>> >> >
>> >> > On Mon, Nov 4, 2019 at 12:09 PM Robert Bradshaw <
>> rober...@google.com> wrote:
>> >> >>
>> >> >> +1, this seems like a good step with a clear win.
>> >> >>
>> >> >> On Mon, Nov 4, 2019 at 12:06 PM Ahmet Altay 
>> wrote:
>> >> >> >
>> >> >> > Python precommits are still timing out on #9925. I am
>> guessing that means this change would not be enough.
>> >> >> >
>> >> >> > I am proposing cutting down the number of test variants we
>> run in precommits. Currently for each version we ran the following variants
>> serially:
>> >> >> > - base: Runs all unit tests with tox
>> >> >> > - Cython: Installs cython and runs all unit tests as base
>> version. The original purpose was to ensure that tests pass with or without
>> cython. There is probably a huge overlap with base. (IIRC only a few coders
>> have different slow vs fast tests.)
>> >> >> > - GCP: Installs GCP dependencies and tests all base +
>> additional gcp

Re: Quota issues again

2019-10-29 Thread Mikhail Gryzykhin

IIRC currently, post-commit doesn't run pre-commits. However we have
precommit_cron jobs that run pre-commits periodically. However it sums up
to dozens of jobs that is really hard to monitor.

If we split things even further, we definitely need to combine result into
something more easily trackable.

Also making post-commits bigger is not that good idea either, since it will
make them even more flaky and any PR that needs to run them can get stuck
forever.

Main point is that we want to do some work around improving monitoring, not
simply make more post-commits, or bigger post-commits.

On Tue, Oct 29, 2019 at 9:56 AM Chad Dombrova  wrote:

>
> +1 for splitting pre-commit tests into smaller modules. However in this
>> case we need to run all the small tests periodically and have some combined
>> flag or dashboard for regular monitoring. Otherwise we might not run/check
>> on big amount of tests.
>>
>
> post-commit seems like the best place for that, no?
>
>
>

Re: Quota issues again

2019-10-29 Thread Mikhail Gryzykhin

+1 for splitting pre-commit tests into smaller modules. However in this
case we need to run all the small tests periodically and have some combined
flag or dashboard for regular monitoring. Otherwise we might not run/check
on big amount of tests.


On Mon, Oct 28, 2019 at 6:39 PM Kenneth Knowles  wrote:

> It may also be advantageous to separate most submodules to not run a giant
> generic Java precommit. Each IO really only needs its own, and to register
> itself in the global Java precommit run only for the core. The bookkeeping
> may become quite a lot, but this is the natural structure.
>
> Kenn
>
> On Mon, Oct 28, 2019 at 6:12 PM Chad Dombrova  wrote:
>
>> Can we get more aggressive about separating tests into groups by those
>> that are dependent on other languages and those that are not?  I think we
>> could dramatically reduce our backlog if we didn’t run all of the Java
>> tests every time a commit is made that only affects python code, and vice
>> versa.
>>
>> -chad
>>
>>
>> On Mon, Oct 28, 2019 at 3:05 PM Mikhail Gryzykhin 
>> wrote:
>>
>>> Quota jira issue:
>>> https://issues.apache.org/jira/browse/BEAM-8195
>>>
>>> On Mon, Oct 28, 2019 at 2:05 PM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>>
>>>> While validating release branch, I got failure due Quota again. Also, 
>>>> current queue time for jobs is more than 1.5 hours.
>>>>
>>>>
>>>> I'm not sure if it is worth starting another thread on tests efficiency, 
>>>> but still want to keep this mail to highlight the issues.
>>>>
>>>>
>>>> See PS for links.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> --Mikhail
>>>>
>>>>
>>>> PS:
>>>>
>>>> https://builds.apache.org/job/beam_PostCommit_Go_PR/71/consoleFull
>>>>
>>>> *13:46:25* 2019/10/28 20:46:25 Test wordcount:kinglear failed: googleapi: 
>>>> Error 429: Quota exceeded for quota metric 
>>>> 'dataflow.googleapis.com/create_requests' and limit 
>>>> 'CreateRequestsPerMinutePerUser' of service 'dataflow.googleapis.com' for 
>>>> consumer 'project_number:844138762903'., rateLimitExceeded
>>>>
>>>>
>>>> Queue time:
>>>>
>>>> http://metrics.beam.apache.org/d/_TNndF2iz/pre-commit-test-latency?orgId=1
>>>>
>>>>

Re: Quota issues again

2019-10-28 Thread Mikhail Gryzykhin

Quota jira issue:
https://issues.apache.org/jira/browse/BEAM-8195

On Mon, Oct 28, 2019 at 2:05 PM Mikhail Gryzykhin  wrote:

> Hi everyone,
>
>
> While validating release branch, I got failure due Quota again. Also, current 
> queue time for jobs is more than 1.5 hours.
>
>
> I'm not sure if it is worth starting another thread on tests efficiency, but 
> still want to keep this mail to highlight the issues.
>
>
> See PS for links.
>
>
> Regards,
>
> --Mikhail
>
>
> PS:
>
> https://builds.apache.org/job/beam_PostCommit_Go_PR/71/consoleFull
>
> *13:46:25* 2019/10/28 20:46:25 Test wordcount:kinglear failed: googleapi: 
> Error 429: Quota exceeded for quota metric 
> 'dataflow.googleapis.com/create_requests' and limit 
> 'CreateRequestsPerMinutePerUser' of service 'dataflow.googleapis.com' for 
> consumer 'project_number:844138762903'., rateLimitExceeded
>
>
> Queue time:
>
> http://metrics.beam.apache.org/d/_TNndF2iz/pre-commit-test-latency?orgId=1
>
>

Quota issues again

2019-10-28 Thread Mikhail Gryzykhin

Hi everyone,


While validating release branch, I got failure due Quota again. Also,
current queue time for jobs is more than 1.5 hours.


I'm not sure if it is worth starting another thread on tests
efficiency, but still want to keep this mail to highlight the issues.


See PS for links.


Regards,

--Mikhail


PS:

https://builds.apache.org/job/beam_PostCommit_Go_PR/71/consoleFull

*13:46:25* 2019/10/28 20:46:25 Test wordcount:kinglear failed:
googleapi: Error 429: Quota exceeded for quota metric
'dataflow.googleapis.com/create_requests' and limit
'CreateRequestsPerMinutePerUser' of service 'dataflow.googleapis.com'
for consumer 'project_number:844138762903'., rateLimitExceeded


Queue time:

http://metrics.beam.apache.org/d/_TNndF2iz/pre-commit-test-latency?orgId=1

Re: Beam 2.17.0 Release Tracking

2019-10-24 Thread Mikhail Gryzykhin

Thank you for updated link Thomas.

UPD:
Snapshot build completed
<https://builds.apache.org/job/beam_Release_NightlySnapshot/611/>.


On Thu, Oct 24, 2019 at 11:54 AM Thomas Weise 
wrote:

> Thanks Mikhail!
>
> JIRA issues pointed to a resolved ticket.
>
> This should list the open items:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.17.0%20and%20resolution%20is%20EMPTY
>
>
> On Thu, Oct 24, 2019 at 11:16 AM Mikhail Gryzykhin 
> wrote:
>
>> Hello everyone,
>>
>> Bream 2.17.0 release branch is cut
>> <https://github.com/apache/beam/tree/release-2.17.0>. Next steps are to
>> build snapshot and validate branch.
>>
>> Follow on blocking Jira issues
>> <https://issues.apache.org/jira/browse/BEAM-8403?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Reopened%2C%20Open%2C%20%22In%20Progress%22%2C%20%22Under%20Discussion%22%2C%20%22In%20Implementation%22%2C%20%22Triage%20Needed%22)%20AND%20fixVersion%20%3D%202.17.0>
>>  as
>> well.
>>
>> --Mikhail
>>
>>

Beam 2.17.0 Release Tracking

2019-10-24 Thread Mikhail Gryzykhin

Hello everyone,

Bream 2.17.0 release branch is cut
. Next steps are to
build snapshot and validate branch.

Follow on blocking Jira issues

as
well.

--Mikhail

[UPDATE] Preparing for Beam 2.17.0 release

2019-10-23 Thread Mikhail Gryzykhin

Hi all,

Beam 2.17 release branch cut is scheduled today (2019/10/23) according to
the release calendar [1].  I'll start working on the branch cutoff and
later work on cherry picking blocker fixes.

If you have release blocking issues for 2.17 please mark their "Fix
Version" as 2.17.0 [2]. This tag is already created in JIRA in case you
would like to move any non-blocking issues to that version.

There is a decent amount of open bugs to be resolved in 2.17.0 [2] and only
4 [3] are marked as blockers. Please, review those if these bugs are
actually to be resolved in 2.17.0 and prioritize fixes if possible.

Any thoughts, comments, objections?

Regards.
Mikhail.


[1]
https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
[2]
https://issues.apache.org/jira/browse/BEAM-8457?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Reopened%2C%20Open%2C%20%22In%20Progress%22%2C%20%22Under%20Discussion%22%2C%20%22In%20Implementation%22%2C%20%22Triage%20Needed%22)%20AND%20fixVersion%20%3D%202.17.0
[3]
https://issues.apache.org/jira/browse/BEAM-8457?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Reopened%2C%20Open%2C%20%22In%20Progress%22%2C%20%22Under%20Discussion%22%2C%20%22In%20Implementation%22%2C%20%22Triage%20Needed%22)%20AND%20priority%20%3D%20Blocker%20AND%20fixVersion%20%3D%202.17.0

CWiki edit rights.

2019-10-21 Thread Mikhail Gryzykhin

Hello everybody,

Just a friendly heads up.

Seems that CWiki changed authentication approach and people with non-apache
logins might have lost rights to edit Beam pages. So if you don't see
"Edit" button, that might be the case.

For committers: use your apache ldap. For others, I guess the process is
the same: ask for access on mailing list if needed.

Regards,
Mikhail.

[PROPOSAL] Preparing for Beam 2.17.0 release

2019-10-15 Thread Mikhail Gryzykhin

Hi all,

Beam 2.17 release branch cut is scheduled on Oct 23 according to the
release calendar
[1]. I would like to volunteer myself to do this release. The plan is to
cut the branch on that date, and cherrypick release-blocking fixes
afterwards if any.

If you have release blocking issues for 2.17 please mark their "Fix
Version" as 2.17.0 [2]. This tag is already created in JIRA in case you
would like to move any non-blocking issues to that version.

Any thoughts, comments, objections?

Regards.
Mikhail Gryzykhin

[1]
https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
[2]
https://issues.apache.org/jira/browse/BEAM-8403?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Reopened%2C%20Open%2C%20%22In%20Progress%22%2C%20%22Under%20Discussion%22%2C%20%22In%20Implementation%22%2C%20%22Triage%20Needed%22)%20AND%20fixVersion%20%3D%202.17.0

Re: [VOTE] Sign a pledge to discontinue support of Python 2 in 2020.

2019-10-02 Thread Mikhail Gryzykhin

+1

On Tue, Oct 1, 2019 at 6:24 PM Ankur Goenka  wrote:

> +1
>
> On Tue, Oct 1, 2019 at 4:27 PM Ruoyun Huang  wrote:
>
>> +1
>>
>> On Tue, Oct 1, 2019 at 3:52 PM Rui Wang  wrote:
>>
>>> +1
>>>
>>> I needed to use https://python3statement.org to access the website BTW
>>> (https, not http).
>>>
>>>
>>> -Rui
>>>
>>> On Tue, Oct 1, 2019 at 3:29 PM Cam Mach  wrote:
>>>
 +1



 On Tue, Oct 1, 2019 at 9:44 AM Udi Meiri  wrote:

> +1
>
> On Tue, Oct 1, 2019 at 3:22 AM Łukasz Gajowy 
> wrote:
>
>> +1
>>
>> wt., 1 paź 2019 o 11:29 Maximilian Michels 
>> napisał(a):
>>
>>> +1
>>>
>>> On 30.09.19 23:03, Reza Rokni wrote:
>>> > +1
>>> >
>>> > On Tue, 1 Oct 2019 at 13:54, Tanay Tummalapalli <
>>> ttanay...@gmail.com
>>> > > wrote:
>>> >
>>> > +1
>>> >
>>> > On Tue, Oct 1, 2019 at 8:19 AM Suneel Marthi <
>>> smar...@apache.org
>>> > > wrote:
>>> >
>>> > +1
>>> >
>>> > On Mon, Sep 30, 2019 at 10:33 PM Manu Zhang
>>> > mailto:owenzhang1...@gmail.com>>
>>> wrote:
>>> >
>>> > +1
>>> >
>>> > On Tue, Oct 1, 2019 at 9:44 AM Austin Bennett
>>> > >> > > wrote:
>>> >
>>> > +1
>>> >
>>> > On Mon, Sep 30, 2019 at 5:22 PM Valentyn Tymofieiev
>>> > mailto:valen...@google.com>>
>>> wrote:
>>> >
>>> > Hi everyone,
>>> >
>>> > Please vote whether to sign a pledge on behalf
>>> of
>>> > Apache Beam to sunset Beam Python 2 offering
>>> (in new
>>> > releases) in 2020 on http://python3stament.org
>>>  as
>>> > follows:
>>> >
>>> > [ ] +1: Sign a pledge to discontinue support of
>>> > Python 2 in Beam in 2020.
>>> > [ ] -1: Do not sign a pledge to discontinue
>>> support
>>> > of Python 2 in Beam in 2020.
>>> >
>>> > The motivation and details for this vote were
>>> > discussed in [1, 2]. Please follow up in [2]
>>> if you
>>> > have any questions.
>>> >
>>> > This is a procedural vote [3] that will follow
>>> the
>>> > majority approval rules and will be open for at
>>> > least 72 hours.
>>> >
>>> > Thanks,
>>> > Valentyn
>>> >
>>> > [1]
>>> >
>>> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
>>> > [2]
>>> >
>>> https://lists.apache.org/thread.html/456631fe1a696c537ef8ebfee42cd3ea8121bf7c639c52da5f7032e7@%3Cdev.beam.apache.org%3E
>>> > [3]
>>> https://www.apache.org/foundation/voting.html
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > This email may be confidential and privileged. If you received
>>> this
>>> > communication by mistake, please don't forward it to anyone else,
>>> please
>>> > erase all copies and attachments, and please let me know that it
>>> has
>>> > gone to the wrong person.
>>> >
>>> > The above terms reflect a potential business arrangement, are
>>> provided
>>> > solely as a basis for further discussion, and are not intended to
>>> be and
>>> > do not constitute a legally binding obligation. No legally binding
>>> > obligations will be created, implied, or inferred until an
>>> agreement in
>>> > final form is executed in writing by all parties involved.
>>> >
>>>
>>
>>
>> --
>> 
>> Ruoyun  Huang
>>
>>

Re: [ANNOUNCE] New committer: Alan Myrvold

2019-09-30 Thread Mikhail Gryzykhin

Congratulations!

On Mon, Sep 30, 2019 at 9:47 AM David Cavazos  wrote:

> Congratulations Alan!
>
> On Mon, Sep 30, 2019 at 7:57 AM Connell O'Callaghan 
> wrote:
>
>> Congratulations Alan - well done!!! Ahmet thank you for sharing this
>> great news!!!
>>
>> On Mon, Sep 30, 2019 at 7:34 AM Łukasz Gajowy  wrote:
>>
>>> Congratulations :)
>>>
>>> pon., 30 wrz 2019 o 15:41 Reza Rokni  napisał(a):
>>>
 Woohoo Congratulations!

 On Mon, 30 Sep 2019 at 21:06, Thomas Weise  wrote:

> Congratulations, Alan!
>
>
> On Mon, Sep 30, 2019 at 4:47 AM Ismaël Mejía 
> wrote:
>
>> Congrats Alan!
>>
>> On Mon, Sep 30, 2019, 11:20 AM Tanay Tummalapalli <
>> ttanay...@gmail.com> wrote:
>>
>>> Congratulations, Alan!
>>>
>>>
>>> On Mon, Sep 30, 2019 at 1:03 PM Gleb Kanterov 
>>> wrote:
>>>
 Congratulations!

 On Sat, Sep 28, 2019 at 12:07 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> Congratulations, Alan. Well deserved.
>
> On Fri, Sep 27, 2019 at 2:09 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> Congrats Alan!!
>>
>> On Fri, Sep 27, 2019 at 1:49 PM Jan Lukavský 
>> wrote:
>>
>>> Congrats Alan!
>>> On 9/27/19 10:22 PM, Mark Liu wrote:
>>>
>>> Congratulations Alan!!!
>>>
>>> On Fri, Sep 27, 2019 at 12:55 PM Ning Kang 
>>> wrote:
>>>
 Congrats Alan!

 On Fri, Sep 27, 2019 at 12:02 PM Ankur Goenka <
 goe...@google.com> wrote:

> Congratulations Alan!
>
> On Fri, Sep 27, 2019 at 11:17 AM Yichi Zhang <
> zyi...@google.com> wrote:
>
>> Congrats, Alan!
>>
>> On Fri, Sep 27, 2019 at 10:26 AM Robin Qiu <
>> robi...@google.com> wrote:
>>
>>> Congrats, Alan!
>>>
>>> On Fri, Sep 27, 2019 at 10:15 AM Hannah Jiang <
>>> hannahji...@google.com> wrote:
>>>
 Congrats Alan!

 On Fri, Sep 27, 2019 at 9:57 AM Ruoyun Huang <
 ruo...@google.com> wrote:

> Congratulations, Alan!
>
>
> On Fri, Sep 27, 2019 at 9:55 AM Rui Wang <
> ruw...@google.com> wrote:
>
>> Congrats!
>>
>> -Rui
>>
>> On Fri, Sep 27, 2019 at 9:54 AM Pablo Estrada <
>> pabl...@google.com> wrote:
>>
>>> Yooh! : D
>>>
>>> On Fri, Sep 27, 2019 at 9:53 AM Yifan Zou <
>>> yifan...@google.com> wrote:
>>>
 Congratulations, Alan!

 On Fri, Sep 27, 2019 at 9:18 AM Ahmet Altay <
 al...@google.com> wrote:

> Hi,
>
> Please join me and the rest of the Beam PMC in
> welcoming a new
> committer: Alan Myrvold
>
> Alan has been a long time Beam contributor. His
> contributions made Beam more productive and friendlier 
> [1] for all
> contributors with significant improvements to Beam 
> release process,
> automation, and infrastructure.
>
> In consideration of Alan's contributions, the Beam PMC
> trusts him
> with the responsibilities of a Beam committer [2].
>
> Thank you, Alan, for your contributions and looking
> forward to many more!
>
> Ahmet, on behalf of the Apache Beam PMC
>
> [1]
> https://beam-summit-na-2019.firebaseapp.com/schedule/2019-09-11?sessionId=1126
> [2] https://beam.apache.org/contribute/become-a-
> committer/#an-apache-beam-committer
>

>
> --
> 
> Ruoyun  Huang
>
>

 --
 Cheers,
 Gleb

>>>

 --

 This email may be confidential and privileged. If you received this
 communication by mistake, please don't forward it to anyone else, please
 erase all copies and attachments, and please let me know that it has gone
 to the wrong person.

 The above terms

Re: Collecting feedback for Beam usage

2019-09-24 Thread Mikhail Gryzykhin

I'm with Luke on this. We can add a set of flags to send home stats and
crash dumps if user agrees. If we keep code isolated, it will be easy
enough for user to check what is being sent.

One more heavy-weight option is to also allow user configure and persist
what information he is ok with sharing.

--Mikhail


On Tue, Sep 24, 2019 at 10:02 AM Lukasz Cwik  wrote:

> Why not add a flag to the SDK that would do the phone home when specified?
>
> From a support perspective it would be useful to know:
> * SDK version
> * Runner
> * SDK provided PTransforms that are used
> * Features like user state/timers/side inputs/splittable dofns/...
> * Graph complexity (# nodes, # branches, ...)
> * Pipeline failed or succeeded
>
> On Mon, Sep 23, 2019 at 3:18 PM Robert Bradshaw 
> wrote:
>
>> On Mon, Sep 23, 2019 at 3:08 PM Brian Hulette 
>> wrote:
>> >
>> > Would people actually click on that link though? I think Kyle has a
>> point that in practice users would only find and click on that link when
>> they're having some kind of issue, especially if the link has "feedback" in
>> it.
>>
>> I think the idea is that we would make the link very light-weight,
>> kind of like a survey (but even easier as it's pre-populated).
>> Basically an opt-in phone-home. If we don't collect any personal data
>> (not even IP/geo, just (say) version + runner, all visible in the
>> URL), no need to guard/anonymize (and this may be sufficient--I don't
>> think we have to worry about spammers and ballot stuffers given the
>> target audience). If we can catch people while they wait for their
>> pipeline to start up (and/or complete), this is a great time to get
>> some feedback.
>>
>> > I agree usage data would be really valuable, but I'm not sure that this
>> approach would get us good data. Is there a way to get download statistics
>> for the different runner artifacts? Maybe that could be a better metric to
>> compare usage.
>>
>> This'd be useful too, but hard to get and very noisy.
>>
>> >
>> > On Mon, Sep 23, 2019 at 2:57 PM Ankur Goenka  wrote:
>> >>
>> >> I agree, these are the questions that need to be answered.
>> >> The data can be anonymize and stored as public data in BigQuery or
>> some other place.
>> >>
>> >> The intent is to get the usage statistics so that we can get to know
>> what people are using Flink or Spark etc and not intended for discussion or
>> a help channel.
>> >> I also think that we don't need to monitor this actively as it's more
>> like a survey rather than active channel to get issues resolved.
>> >>
>> >> If we think its useful for the community then we come up with the
>> solution as to how can we do this (similar to how we released the container
>> images).
>> >>
>> >>
>> >>
>> >> On Fri, Sep 20, 2019 at 4:38 PM Kyle Weaver 
>> wrote:
>> >>>
>> >>> There are some logistics that would need worked out. For example,
>> Where would the data go? Who would own it?
>> >>>
>> >>> Also, I'm not convinced we need yet another place to discuss Beam
>> when we already have discussed the challenge of simultaneously monitoring
>> mailing lists, Stack Overflow, Slack, etc. While "how do you use Beam" is
>> certainly an interesting question, and I'd be curious to know that >= X
>> many people use a certain runner, I'm not sure answers to these questions
>> are as useful for guiding the future of Beam as discussions on the
>> dev/users lists, etc. as the latter likely result in more depth/specific
>> feedback.
>> >>>
>> >>> However, I do think it could be useful in general to include links
>> directly in the console output. For example, maybe something along the
>> lines of "Oh no, your Flink pipeline crashed! Check Jira/file a bug/ask the
>> mailing list."
>> >>>
>> >>> Kyle Weaver | Software Engineer | github.com/ibzib |
>> kcwea...@google.com
>> >>>
>> >>>
>> >>> On Fri, Sep 20, 2019 at 4:14 PM Ankur Goenka 
>> wrote:
>> 
>>  Hi,
>> 
>>  At the moment we don't really have a good way to collect any usage
>> statistics for Apache Beam. Like runner used etc. As many of the users
>> don't really have a way to report their usecase.
>>  How about if we create a feedback page where users can add their
>> pipeline details and usecase.
>>  Also, we can start printing the link to this page when user launch
>> the pipeline in the command line.
>>  Example:
>>  $ python my_pipeline.py --runner DirectRunner --input /tmp/abc
>> 
>>  Starting pipeline
>>  Please use
>> http://feedback.beam.org?args=runner=DirectRunner,input=/tmp/abc
>>  Pipeline started
>>  ..
>> 
>>  Using a link and not publishing the data automatically will give
>> user control over what they publish and what they don't. We can enhance the
>> text and usage further but the basic idea is to ask for user feeback at
>> each run of the pipeline.
>>  Let me know what you think.
>> 
>> 
>>  Thanks,
>>  Ankur
>>
>

Re: Hackathon @BeamSummit @ApacheCon

2019-09-06 Thread Mikhail Gryzykhin

I'll be in most of the week and will join gladly.

On Thu, Sep 5, 2019, 14:32 Chad Dombrova  wrote:

> Has a date and time been picked for this?  I'll be there for part of the
> week and would love to join.
>
> On Tue, Sep 3, 2019 at 11:31 AM Brian Hulette  wrote:
>
>> I will be around all week as well and would love to help with a Beam
>> hackathon in any way :)
>>
>> On Thu, Aug 29, 2019 at 9:46 AM Maximilian Michels 
>> wrote:
>>
>>> Hey,
>>>
>>> I'm in as well! Austin and I recently talked about how we could organize
>>> the hackathon. Likely it will be an hour per day for exchanging ideas
>>> and learning about Beam. For example, there has been interest from the
>>> Apache Streams project to discuss points for collaboration.
>>>
>>> We will soon announce the exact hours.
>>>
>>> Cheers,
>>> Max
>>>
>>> On 23.08.19 05:06, Kenneth Knowles wrote:
>>> > I will be at Beam Summit / ApacheCon NA and would love to drop by a
>>> > hackathon room if one is arranged. Really excited for both my first
>>> > ApacheCon and Beam Summit (finally!)
>>> >
>>> > Kenn
>>> >
>>> > On Thu, Aug 22, 2019 at 10:18 AM Austin Bennett
>>> > mailto:whatwouldausti...@gmail.com>>
>>> wrote:
>>> >
>>> > And, for clarity, especially focused on Hackathon times on Monday
>>> > and/or Tuesday of ApacheCon, to not conflict with BeamSummit
>>> sessions.
>>> >
>>> > On Thu, Aug 22, 2019 at 9:47 AM Austin Bennett
>>> > mailto:whatwouldausti...@gmail.com>>
>>> > wrote:
>>> >
>>> > Less than 3 weeks till Beam Summit @ApacheCon!
>>> >
>>> > We are to be in Vegas for BeamSummit and ApacheCon in a few
>>> weeks.
>>> >
>>> > Likely to reserve space in the Hackathon Room to accomplish
>>> some
>>> > tasks:
>>> > * Help Users
>>> > * Build Beam
>>> > * Collaborate with other projects
>>> > * etc
>>> >
>>> > If you're to be around (or not) let us know how you'd like to
>>> be
>>> > involved.  Also, please share and surface anything that would
>>> be
>>> > good for us to look at (and, esp. any beginner tasks, in case
>>> we
>>> > can entice some new contributors).
>>> >
>>> >
>>> > P.S.  See BeamSummit.org, if you're thinking of attending -
>>> > there's a discount code.
>>> >
>>>
>>

Re: Dataflow worker overview graphs

2019-08-08 Thread Mikhail Gryzykhin

Unfortunately no, I don't have those for streaming explicitly.

However most of code is shared between streaming and batch with main
difference in initialization. Same goes for boilerplate parts of legacy vs
FnApi.

If you happen to create anything similar for streaming, please update page
and let me know. Also I'll update this page with relevant changes once I
get back to worker.

--Mikhail

On Thu, Aug 8, 2019 at 2:13 PM Ankur Goenka  wrote:

> Thanks Mikhail. This is really useful.
> Do you also have something similar for Streaming use case. More
> specifically for Portable (fn_api) based streaming pipelines.
>
>
> On Thu, Aug 8, 2019 at 2:08 PM Mikhail Gryzykhin 
> wrote:
>
>> Hello everybody,
>>
>> Just wanted to share that I have found some graphs for dataflow worker I
>> created while starting working on it. They cover specific scenarios, but
>> may be useful for newcomers, so I put them into this wiki page
>> <https://cwiki.apache.org/confluence/display/BEAM/Dataflow+Worker+overview+graphs>
>> .
>>
>> If you feel they belong to some other location, please let me know.
>>
>> Regards,
>> Mikhail.
>>
>

Dataflow worker overview graphs

2019-08-08 Thread Mikhail Gryzykhin

Hello everybody,

Just wanted to share that I have found some graphs for dataflow worker I
created while starting working on it. They cover specific scenarios, but
may be useful for newcomers, so I put them into this wiki page

.

If you feel they belong to some other location, please let me know.

Regards,
Mikhail.

Re: [DISCUSS] Moving FakeBigQueryServices to main/ rather than test/

2019-07-30 Thread Mikhail Gryzykhin

+1
It is completely worth it.

On Tue, Jul 30, 2019 at 8:50 PM Rui Wang  wrote:

> +1.
>
> I did something similar before: move TestBoundedTable to BeamSQL main to
> allow another module tests use it.
>
>
> -Rui
>
> On Tue, Jul 30, 2019 at 6:13 PM Pablo Estrada  wrote:
>
>> Hello all,
>> I found some test utilities that we use to write unit tests for
>> transforms that read/write to/from BigQuery. These are all the
>> non-(*IT.java/*Test.java) classes in [1].
>>
>> I believe that users may want to write tests for their own pipelines that
>> may rely on complex DynamicDestination logic (imagine streaming, or side
>> inputs for on-the-fly schema computation, or other tricky issues).
>>
>> I think it makes sense to move these classes to
>> org.apache.beam.io.gcp.bigquery.testing, and publish them in the release.
>> Thoughts?
>>
>> -P.
>>
>> [1]
>> https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery
>>
>

Beam metrics update

2019-07-26 Thread Mikhail Gryzykhin

Hello everybody,

I'm working on improving deployment scripts for beam metrics site
 and going to do some updates
over the weekend. This might bring site down for short periods of time.

Please respond to this message if you require metrics dashboards up.

Regards,
Mikhail.

Re: Proposal: Add permanent url to community metrics dashboard

2019-07-22 Thread Mikhail Gryzykhin

Thank you for starting this.

I'm working on some maintenance on metrics site meanwhile. Looking to
disable auth and intializing dashboards from github sources. We might have
to look into proper deployment afterwards as well.

--Mikhail

On Mon, Jul 22, 2019 at 4:11 PM Pablo Estrada  wrote:

> Hi all,
> I've filed https://issues.apache.org/jira/browse/INFRA-18786 for this.
> Thanks!
> -P.
>
> On Thu, Jul 18, 2019 at 1:38 PM Mikhail Gryzykhin 
> wrote:
>
>> +1 explicitly
>>
>> On Thu, Jul 18, 2019 at 1:46 AM Łukasz Gajowy 
>> wrote:
>>
>>> +1 for pushing this forward. The url "metrics.beam.apache.org" looks
>>> good to me and is generic enough - this is good in case we want to display
>>> not only "community metrics" in grafana but for eg. IOIT or load tests
>>> resutls.
>>>
>>> Thanks!
>>>
>>> czw., 18 lip 2019 o 00:48 Mikhail Gryzykhin 
>>> napisał(a):
>>>
>>>> Thank you Alan, that's an interesting link.
>>>>
>>>> Latest Grafana version in docker is v6.2.5, so issues on that list are
>>>> not applicable. We should be fine on this front. Should update container
>>>> version of grafana running on service though.
>>>>
>>>> @Pablo
>>>> I feel it's best for PMC to start conversation with INFRA. I can follow
>>>> up on it if you CC me.
>>>>
>>>> Regards,
>>>> Mikhail.
>>>>
>>>>
>>>> On Wed, Jul 17, 2019 at 2:46 PM Alan Myrvold 
>>>> wrote:
>>>>
>>>>> Are all of the CVE issues fixed at the version in use?
>>>>> https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=grafana
>>>>> XSS isn't much of a concern until there is a hostname associated.
>>>>>
>>>>> On Wed, Jul 17, 2019 at 2:17 PM Pablo Estrada 
>>>>> wrote:
>>>>>
>>>>>> I'd like to move this forward. Mikhail, would you be interested in
>>>>>> filing an issue with Infra to see if it's possible? I can do it if you
>>>>>> prefer.
>>>>>>
>>>>>> It seems that the concerns related to these dashboards showing up in
>>>>>> search results have been addressed. Does the community have any other
>>>>>> concern around this before we can move it forward?
>>>>>> Best
>>>>>> -P.
>>>>>>
>>>>>> On Wed, May 22, 2019 at 8:53 AM Kenneth Knowles 
>>>>>> wrote:
>>>>>>
>>>>>>> I suggest asking infra about the best way to proceed, so that we
>>>>>>> don't vote on something that doesn't work for them. This might be 
>>>>>>> something
>>>>>>> handy to spin up easily for any Apache project using similar tools.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Tue, May 21, 2019 at 1:02 PM Mikhail Gryzykhin 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Current http://104.154.241.245/robots.txt is already disallow all,
>>>>>>>> so we are good here.
>>>>>>>>
>>>>>>>> On Tue, May 21, 2019 at 12:57 PM Ahmet Altay 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> If SSL is a concern that makes sense, I am not familiar with that
>>>>>>>>> enough to suggest whether another way to do this exists or not.
>>>>>>>>>
>>>>>>>>> It will be good to check that we can set robots.txt properly from
>>>>>>>>> the begging if we go down this path.
>>>>>>>>>
>>>>>>>>> On Mon, May 20, 2019 at 10:54 AM Mikhail Gryzykhin <
>>>>>>>>> mig...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> @Ahmet Altay 
>>>>>>>>>> Thank you for the comment.
>>>>>>>>>>
>>>>>>>>>> Point on search engines is really good. If that happens we can
>>>>>>>>>> look into configuring robots.txt to notify search engines to ignore 
>>>>>>>>>> whole
>>>>>>>>>> domain.
>>>>>>>>>> The link is a redirect to static IP. So it is still confusing.
>>>>>>>&

Re: Proposal: Add permanent url to community metrics dashboard

2019-07-18 Thread Mikhail Gryzykhin

+1 explicitly

On Thu, Jul 18, 2019 at 1:46 AM Łukasz Gajowy 
wrote:

> +1 for pushing this forward. The url "metrics.beam.apache.org" looks good
> to me and is generic enough - this is good in case we want to display not
> only "community metrics" in grafana but for eg. IOIT or load tests
> resutls.
>
> Thanks!
>
> czw., 18 lip 2019 o 00:48 Mikhail Gryzykhin 
> napisał(a):
>
>> Thank you Alan, that's an interesting link.
>>
>> Latest Grafana version in docker is v6.2.5, so issues on that list are
>> not applicable. We should be fine on this front. Should update container
>> version of grafana running on service though.
>>
>> @Pablo
>> I feel it's best for PMC to start conversation with INFRA. I can follow
>> up on it if you CC me.
>>
>> Regards,
>> Mikhail.
>>
>>
>> On Wed, Jul 17, 2019 at 2:46 PM Alan Myrvold  wrote:
>>
>>> Are all of the CVE issues fixed at the version in use?
>>> https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=grafana
>>> XSS isn't much of a concern until there is a hostname associated.
>>>
>>> On Wed, Jul 17, 2019 at 2:17 PM Pablo Estrada 
>>> wrote:
>>>
>>>> I'd like to move this forward. Mikhail, would you be interested in
>>>> filing an issue with Infra to see if it's possible? I can do it if you
>>>> prefer.
>>>>
>>>> It seems that the concerns related to these dashboards showing up in
>>>> search results have been addressed. Does the community have any other
>>>> concern around this before we can move it forward?
>>>> Best
>>>> -P.
>>>>
>>>> On Wed, May 22, 2019 at 8:53 AM Kenneth Knowles 
>>>> wrote:
>>>>
>>>>> I suggest asking infra about the best way to proceed, so that we don't
>>>>> vote on something that doesn't work for them. This might be something 
>>>>> handy
>>>>> to spin up easily for any Apache project using similar tools.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Tue, May 21, 2019 at 1:02 PM Mikhail Gryzykhin 
>>>>> wrote:
>>>>>
>>>>>> Current http://104.154.241.245/robots.txt is already disallow all,
>>>>>> so we are good here.
>>>>>>
>>>>>> On Tue, May 21, 2019 at 12:57 PM Ahmet Altay 
>>>>>> wrote:
>>>>>>
>>>>>>> If SSL is a concern that makes sense, I am not familiar with that
>>>>>>> enough to suggest whether another way to do this exists or not.
>>>>>>>
>>>>>>> It will be good to check that we can set robots.txt properly from
>>>>>>> the begging if we go down this path.
>>>>>>>
>>>>>>> On Mon, May 20, 2019 at 10:54 AM Mikhail Gryzykhin <
>>>>>>> mig...@google.com> wrote:
>>>>>>>
>>>>>>>> @Ahmet Altay 
>>>>>>>> Thank you for the comment.
>>>>>>>>
>>>>>>>> Point on search engines is really good. If that happens we can look
>>>>>>>> into configuring robots.txt to notify search engines to ignore whole 
>>>>>>>> domain.
>>>>>>>> The link is a redirect to static IP. So it is still confusing.
>>>>>>>>
>>>>>>>> Having domain name will allow for getting SSL associated with it
>>>>>>>> and will allow to keep same address even if IP changes (say we want to 
>>>>>>>> move
>>>>>>>> to other hoster).
>>>>>>>>
>>>>>>>
>>>>>>> I suppose short link will also allow us to change the host very
>>>>>>> similar to a domain name. That is a minor point anyway.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Given two points above, I still consider that having explicit name
>>>>>>>> will be beneficial. If there's some other way to get SSL cert and 
>>>>>>>> benefit
>>>>>>>> of static name I'm eager to utilize it.
>>>>>>>>
>>>>>>>> On Mon, May 20, 2019 at 10:43 AM Ahmet Altay 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Mikhail,
>>>>>>>>>

Re: Proposal: Add permanent url to community metrics dashboard

2019-07-17 Thread Mikhail Gryzykhin

Thank you Alan, that's an interesting link.

Latest Grafana version in docker is v6.2.5, so issues on that list are not
applicable. We should be fine on this front. Should update container
version of grafana running on service though.

@Pablo
I feel it's best for PMC to start conversation with INFRA. I can follow up
on it if you CC me.

Regards,
Mikhail.


On Wed, Jul 17, 2019 at 2:46 PM Alan Myrvold  wrote:

> Are all of the CVE issues fixed at the version in use?
> https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=grafana
> XSS isn't much of a concern until there is a hostname associated.
>
> On Wed, Jul 17, 2019 at 2:17 PM Pablo Estrada  wrote:
>
>> I'd like to move this forward. Mikhail, would you be interested in filing
>> an issue with Infra to see if it's possible? I can do it if you prefer.
>>
>> It seems that the concerns related to these dashboards showing up in
>> search results have been addressed. Does the community have any other
>> concern around this before we can move it forward?
>> Best
>> -P.
>>
>> On Wed, May 22, 2019 at 8:53 AM Kenneth Knowles  wrote:
>>
>>> I suggest asking infra about the best way to proceed, so that we don't
>>> vote on something that doesn't work for them. This might be something handy
>>> to spin up easily for any Apache project using similar tools.
>>>
>>> Kenn
>>>
>>> On Tue, May 21, 2019 at 1:02 PM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> Current http://104.154.241.245/robots.txt is already disallow all, so
>>>> we are good here.
>>>>
>>>> On Tue, May 21, 2019 at 12:57 PM Ahmet Altay  wrote:
>>>>
>>>>> If SSL is a concern that makes sense, I am not familiar with that
>>>>> enough to suggest whether another way to do this exists or not.
>>>>>
>>>>> It will be good to check that we can set robots.txt properly from the
>>>>> begging if we go down this path.
>>>>>
>>>>> On Mon, May 20, 2019 at 10:54 AM Mikhail Gryzykhin 
>>>>> wrote:
>>>>>
>>>>>> @Ahmet Altay 
>>>>>> Thank you for the comment.
>>>>>>
>>>>>> Point on search engines is really good. If that happens we can look
>>>>>> into configuring robots.txt to notify search engines to ignore whole 
>>>>>> domain.
>>>>>> The link is a redirect to static IP. So it is still confusing.
>>>>>>
>>>>>> Having domain name will allow for getting SSL associated with it and
>>>>>> will allow to keep same address even if IP changes (say we want to move 
>>>>>> to
>>>>>> other hoster).
>>>>>>
>>>>>
>>>>> I suppose short link will also allow us to change the host very
>>>>> similar to a domain name. That is a minor point anyway.
>>>>>
>>>>>
>>>>>>
>>>>>> Given two points above, I still consider that having explicit name
>>>>>> will be beneficial. If there's some other way to get SSL cert and benefit
>>>>>> of static name I'm eager to utilize it.
>>>>>>
>>>>>> On Mon, May 20, 2019 at 10:43 AM Ahmet Altay 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Mikhail,
>>>>>>>
>>>>>>> Thank you for your work on this. I have some comments:
>>>>>>>
>>>>>>> - There is already a short link (
>>>>>>> https://s.apache.org/beam-community-metrics). Would a link from
>>>>>>> contributing to beam page (if there is not one already) sufficient> 
>>>>>>> People
>>>>>>> can bookmark the short link if they need to quickly access.
>>>>>>> - Metrics is a developer facing tool. If it has its own subdomain
>>>>>>> and start showing up in web search results, it will be a confusing 
>>>>>>> landing
>>>>>>> page for people simply searching for "beam metrics". I believe there is
>>>>>>> some value in having a single domain and linking to various things from
>>>>>>> there. This would be similar to how we link to jira, wiki, mailing list
>>>>>>> archives.
>>>>>>>
>>>>>>> Ahmet
>>>>>>>
>>>>>>> On Fri, May 17, 2019 at 9:26 PM Mikhail Gryzykhin <

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Mikhail Gryzykhin

Congratulations!

On Tue, Jul 16, 2019 at 10:36 AM Ankur Goenka  wrote:

> Congratulations Robert!
>
> Go GO!
>
> On Tue, Jul 16, 2019 at 10:34 AM Rui Wang  wrote:
>
>> Congrats!
>>
>>
>> -Rui
>>
>> On Tue, Jul 16, 2019 at 10:32 AM Udi Meiri  wrote:
>>
>>> Congrats Robert B.!
>>>
>>> On Tue, Jul 16, 2019 at 10:23 AM Ahmet Altay  wrote:
>>>
 Hi,

 Please join me and the rest of the Beam PMC in welcoming a new
 committer: Robert Burke.

 Robert has been contributing to Beam and actively involved in the
 community for over a year. He has been actively working on Go SDK, helping
 users, and making it easier for others to contribute [1].

 In consideration of Robert's contributions, the Beam PMC trusts him
 with the responsibilities of a Beam committer [2].

 Thank you, Robert, for your contributions and looking forward to many
 more!

 Ahmet, on behalf of the Apache Beam PMC

 [1]
 https://lists.apache.org/thread.html/8f729da2d3009059d7a8b2d8624446be161700dcfa953939dd3530c6@%3Cdev.beam.apache.org%3E
 [2] https://beam.apache.org/contribute/become-a-committer
 /#an-apache-beam-committer

>>>

BQ IO GC thrashing when specifying .withMethod(STREAMING_INSERTS)

2019-07-01 Thread Mikhail Gryzykhin

Hello everybody,

This question is regarding user post on StackOverflow

.

My understanding of problem is that setting .withMethod(STREAMING_INSERTS)
on BigQueryIO sink causes GC thrashing on big amount of entries.

Is there a known issue or information how to start triaging this?

Search on Jira shown me this ticket, but it is not directly connected with
the issue: https://issues.apache.org/jira/browse/BEAM-7666

Thank you,
Mikhail.

Re: Using Grafana to display test metrics and alert anomalies

2019-06-26 Thread Mikhail Gryzykhin

Hi Łukasz,

See answers inline.

Regard,
Mikhail.

On Wed, Jun 26, 2019 at 7:47 AM Łukasz Gajowy  wrote:

> Hi Mikhail!
>
> Together with Kamil we're investigating the possibilities of creating
> alerts for anomalies for the metrics collected from various tests (load, IO
> tests, other performance tests). This is unfortunately impossible to do in
> Perfkit explorer tool that we're using for displaying the metrics right now
> [1]. Therefore we're considering a switch to some other solution.
>

> This is why we'd like to ask you some questions about the Community
> Metrics tool. It is set up using Grafana that has the alerting feature out
> of the box so it is a natural candidate. Moreover, it let's keep the
> infrastructure as code which is also a big plus. Unfortunately alerting
> feature does not work with BigQuery as a data source for Grafana [2].
>
I'd say it is worth looking into. Keeping most of metrics in one place is
much more convenient than have multiple tools in different locations.

>
> The questions:
>
>1. What do you think of adding test related metrics as separate
>dashboards in the existing Grafana instance?
>
> This should be completely fine and I don't see any blockers.

>
>1. We were thinking of setting up a cloud SQL Postgres instance for
>storing test metrics and reference this source in our Grafana dashboards.
>Won't this approach collide in any way with existing setup?
>
> You can reuse existing PSQL DB that is utilized by Grafana. It's already
hosted in GCP and should be available for you to use. Some permissions
configuration might need configuring though. Even if you utilize separate
DB, Grafana supports multiple data sources, so there should be no issues.

>
>1. Have you tried setting up alerts in Grafana for community metrics?
>Do you expect any blockers there?
>
> We do not have email configured in Grafana, however we utilize alerts on
freshness dashboard that are later checked by metrics prober job

.


> I also CCed the devlist for visibility and comments (if any).
>
> Thanks!
> Łukasz
>
> [1] https://s.apache.org/io-test-dashboards
> [2] https://github.com/doitintl/bigquery-grafana/issues/67
>

Re: [ANNOUNCE] New committer: Mikhail Gryzykhin

2019-06-24 Thread Mikhail Gryzykhin

Thank you everyone.

On Mon, Jun 24, 2019 at 2:28 AM Aizhamal Nurmamat kyzy 
wrote:

> Congrats Misha!
>
> On Mon, Jun 24, 2019 at 11:23 Łukasz Gajowy  wrote:
>
>> Congratulations Mikhail!
>>
>> pt., 21 cze 2019 o 22:09 Ruoyun Huang  napisał(a):
>>
>>> Congratulations! Mikhail!
>>>
>>>
>>> On Fri, Jun 21, 2019 at 1:00 PM Yichi Zhang  wrote:
>>>
>>>> Congrats!
>>>>
>>>> On Fri, Jun 21, 2019 at 11:55 AM Tanay Tummalapalli <
>>>> ttanay...@gmail.com> wrote:
>>>>
>>>>> Congratulations!
>>>>>
>>>>> On Fri, Jun 21, 2019 at 10:35 PM Rui Wang  wrote:
>>>>>
>>>>>> Congrats!
>>>>>>
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>> On Fri, Jun 21, 2019 at 9:58 AM Robin Qiu  wrote:
>>>>>>
>>>>>>> Congrats, Mikhail!
>>>>>>>
>>>>>>> On Fri, Jun 21, 2019 at 9:12 AM Alexey Romanenko <
>>>>>>> aromanenko@gmail.com> wrote:
>>>>>>>
>>>>>>>> Congrats, Mikhail!
>>>>>>>>
>>>>>>>> On 21 Jun 2019, at 18:01, Anton Kedin  wrote:
>>>>>>>>
>>>>>>>> Congrats!
>>>>>>>>
>>>>>>>> On Fri, Jun 21, 2019 at 3:55 AM Reza Rokni  wrote:
>>>>>>>>
>>>>>>>>> Congratulations!
>>>>>>>>>
>>>>>>>>> On Fri, 21 Jun 2019, 12:37 Robert Burke, 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Congrats
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 21, 2019, 12:29 PM Thomas Weise 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>>>>>>>> committer: Mikhail Gryzykhin.
>>>>>>>>>>>
>>>>>>>>>>> Mikhail has been contributing to Beam and actively involved in
>>>>>>>>>>> the community for over a year. He developed the community build 
>>>>>>>>>>> dashboard
>>>>>>>>>>> [1] and added substantial improvements to our build infrastructure.
>>>>>>>>>>> Mikhail's work also covers metrics, contributor documentation, 
>>>>>>>>>>> development
>>>>>>>>>>> process improvements and other areas.
>>>>>>>>>>>
>>>>>>>>>>> In consideration of Mikhail's contributions, the Beam PMC trusts
>>>>>>>>>>> him with the responsibilities of a Beam committer [2].
>>>>>>>>>>>
>>>>>>>>>>> Thank you, Mikhail, for your contributions and looking forward
>>>>>>>>>>> to many more!
>>>>>>>>>>>
>>>>>>>>>>> Thomas, on behalf of the Apache Beam PMC
>>>>>>>>>>>
>>>>>>>>>>> [1] https://s.apache.org/beam-community-metrics
>>>>>>>>>>> [2]
>>>>>>>>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>
>>> --
>>> 
>>> Ruoyun  Huang
>>>
>>>

Re: PubSubIT tests topic cleanup?

2019-05-24 Thread Mikhail Gryzykhin

Last time it was decided to manually cleanup topics and postpone fix. My
estimate was that we need to cleanup topics about every two months.

I think, we should cleanup topics manually to mitigate issue and prioritize
proper fix.

On Fri, May 24, 2019, 12:00 Pablo Estrada  wrote:

> Seems like Mikhail created https://issues.apache.org/jira/browse/BEAM-6610 
> last
> time ^^'
>
> On Fri, May 24, 2019 at 11:58 AM Kenneth Knowles  wrote:
>
>> Is there a jira tracking this?
>>
>> Kenn
>>
>> On Fri, May 24, 2019, 11:50 Andrew Pilloud  wrote:
>>
>>> This came up on the list in before in February:
>>>
>>> https://lists.apache.org/thread.html/38384d193e6f0af89f00a583e56cff93b18cfaebbf84e743eb900bc5@%3Cdev.beam.apache.org%3E
>>>
>>> We should be cleaning up topics, but it sounds like we aren't.
>>>
>>> Andrew
>>>
>>> On Fri, May 24, 2019 at 11:42 AM Pablo Estrada 
>>> wrote:
>>>
 I've found a bunch of topics created by PubSub integration tests - they
 dont seem to be getting cleaned up, perhaps?

 614 name:
 projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSelectsPayloadContent-
 614 name:
 projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-
 222 name:
 projects/apache-beam-testing/topics/integ-test-PubsubReadIT-testReadPublicData-

>>>

Quota: In use IP-adresses

2019-05-23 Thread Mikhail Gryzykhin

Hello everybody,

Some of our jobs fail with 1/0 in use IP-addresses quota exception.

Seems that we spin-up too many VMs and run out of IP-addresses. Should we
bump the quota to mitigate the issue?

Regards,
Mikhail.

---
https://issues.apache.org/jira/browse/BEAM-7410

Re: Beam dashboards

2019-05-21 Thread Mikhail Gryzykhin

@Łukasz Gajowy 

Reviving old thread:
Grafana doesn't support BigQuery officially, but recently there were news
with unofficial BQ plugin:
* https://blog.doit-intl.com/power-grafana-with-google-bigquery-6822443a7f99
* https://github.com/doitintl/bigquery-grafana

I didn't try it yet, but is looks promising.

--Mikhail

On Tue, Sep 25, 2018 at 4:37 AM Łukasz Gajowy 
wrote:

> Nice! The Grafana dashboards look great!
>
> Side question: do you think Grafana can be used to present results stored
> in BigQuery? Maybe in the future, we could use Grafana to show every
> testing dashboards (IO tests, nexmark, currently developed load tests) with
> this tool leveraging its more advanced features such as dynamic data
> ranges, alerts, ad-hoc filters etc[1]?
>
> [1] https://grafana.com/grafana
>
> pon., 17 wrz 2018 o 19:14 Mikhail Gryzykhin 
> napisał(a):
>
>> Thank you for feedback.
>>
>> I had the idea of adding the list of tests with duration. Unfortunately,
>> I'm not clear on how to represent such metric. One of the ideas I had is to
>> show graph with top 10 slowest tests for each job run. However we have too
>> many different test jobs. I'm open to brainstorming ideas in this area.
>>
>> Meanwhile I'll play around with different options and see what I can come
>> up with.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>> On Sun, Sep 16, 2018 at 4:04 AM Maximilian Michels 
>> wrote:
>>
>>> Thanks Mikhail, that will help to identify flaky or slow tests. At the
>>> size of the Beam code base, such statistics are extremely helpful.
>>>
>>> If we had a list of test cases ordered by test duration, that would be a
>>> great addition.
>>>
>>> On 14.09.18 00:30, Connell O'Callaghan wrote:
>>> > Thank you Mikhail for sharing this and to everyone involved in these
>>> > improvements!!! It will be great to hear about progress and any
>>> blockers
>>> > encountered with this work.
>>> >
>>> > On Thu, Sep 13, 2018 at 11:12 AM Mikhail Gryzykhin >> > <mailto:mig...@google.com>> wrote:
>>> >
>>> > Hi everyone,
>>> >
>>> > Huygaa and me are working on creating dashboards for Beam.
>>> >
>>> > So far I created a POC dashboard for post-commit greenness
>>> > <http://104.154.241.245/d/D81lW0pmk/post-commit-tests> that
>>> includes
>>> > a list of latest failed jobs and dashboard for pre-commits
>>> > <http://104.154.241.245/d/_TNndF2iz/pre-commit-tests> duration,
>>> that
>>> > one contains only duration so far.
>>> >
>>> > We have work ongoing for adding github statistics.
>>> >
>>> > Please, check those out. Any feedback is welcome.
>>> >
>>> > Small hints:
>>> > * You can change time range you're looking at in to-right corner.
>>> > * You can choose dashboard by clicking dashboard name on top-left
>>> > corner.
>>> >
>>> > As a short insight:
>>> > Thank you everyone who helps fixing post-commit tests flakes. We
>>> > moved from ~30% to 75-85% successful runs.
>>> >
>>> > Best regards,
>>> > --Mikhail
>>> >
>>> > Have feedback <http://go/migryz-feedback>?
>>> >
>>>
>>

Re: Proposal: Add permanent url to community metrics dashboard

2019-05-21 Thread Mikhail Gryzykhin

Current http://104.154.241.245/robots.txt is already disallow all, so we
are good here.

On Tue, May 21, 2019 at 12:57 PM Ahmet Altay  wrote:

> If SSL is a concern that makes sense, I am not familiar with that enough
> to suggest whether another way to do this exists or not.
>
> It will be good to check that we can set robots.txt properly from the
> begging if we go down this path.
>
> On Mon, May 20, 2019 at 10:54 AM Mikhail Gryzykhin 
> wrote:
>
>> @Ahmet Altay 
>> Thank you for the comment.
>>
>> Point on search engines is really good. If that happens we can look into
>> configuring robots.txt to notify search engines to ignore whole domain.
>> The link is a redirect to static IP. So it is still confusing.
>>
>> Having domain name will allow for getting SSL associated with it and will
>> allow to keep same address even if IP changes (say we want to move to other
>> hoster).
>>
>
> I suppose short link will also allow us to change the host very similar to
> a domain name. That is a minor point anyway.
>
>
>>
>> Given two points above, I still consider that having explicit name will
>> be beneficial. If there's some other way to get SSL cert and benefit of
>> static name I'm eager to utilize it.
>>
>> On Mon, May 20, 2019 at 10:43 AM Ahmet Altay  wrote:
>>
>>> Hi Mikhail,
>>>
>>> Thank you for your work on this. I have some comments:
>>>
>>> - There is already a short link (
>>> https://s.apache.org/beam-community-metrics). Would a link from
>>> contributing to beam page (if there is not one already) sufficient> People
>>> can bookmark the short link if they need to quickly access.
>>> - Metrics is a developer facing tool. If it has its own subdomain and
>>> start showing up in web search results, it will be a confusing landing page
>>> for people simply searching for "beam metrics". I believe there is some
>>> value in having a single domain and linking to various things from there.
>>> This would be similar to how we link to jira, wiki, mailing list archives.
>>>
>>> Ahmet
>>>
>>> On Fri, May 17, 2019 at 9:26 PM Mikhail Gryzykhin <
>>> gryzykhin.mikh...@gmail.com> wrote:
>>>
>>>> @Aizamat
>>>> Code is not generalized and is project specific in some places. But it
>>>> is small and pretty straightforward so can be ported easily. Whole thing
>>>> can be started locally with a single docker command, so it's easy to try it
>>>> out.
>>>>
>>>> On Fri, May 17, 2019, 19:33 Aizhamal Nurmamat kyzy 
>>>> wrote:
>>>>
>>>>> Hi Mikhail,
>>>>>
>>>>> I think this dashboard is amazing, and would love to have an easy
>>>>> access to it. So here is my non binding +1.
>>>>>
>>>>> On the side note, how easy is to recreate it for other Apache
>>>>> projects? ;)
>>>>>
>>>>> Thanks,
>>>>> Aizhamal
>>>>>
>>>>> *From: *Mikhail Gryzykhin 
>>>>> *Date: *Fri, May 17, 2019 at 6:49 PM
>>>>> *To: *dev
>>>>>
>>>>> Hello everyone,
>>>>>>
>>>>>> Some time ago we started community metrics dashboard.
>>>>>> <https://s.apache.org/beam-community-metrics> However we never had
>>>>>> added a permanent URL for it. This is really inconvenient to use, since
>>>>>> only available way to access dashboard is by IP-address.
>>>>>>
>>>>>> In this tread I'd like to:
>>>>>> 1. Vote to assign metrics.beam.apache.org to metrics dashboard (
>>>>>> http://104.154.241.245).
>>>>>> 2. Gather information on how to do it. I can assume only following
>>>>>> steps so far: a) vote b) once vote is complete, contact Apache INFRA to
>>>>>> help with this.
>>>>>>
>>>>>> Regards,
>>>>>> Mikhail.
>>>>>>
>>>>>>

Re: Proposal: Add permanent url to community metrics dashboard

2019-05-20 Thread Mikhail Gryzykhin

@Ahmet Altay 
Thank you for the comment.

Point on search engines is really good. If that happens we can look into
configuring robots.txt to notify search engines to ignore whole domain.
The link is a redirect to static IP. So it is still confusing.

Having domain name will allow for getting SSL associated with it and will
allow to keep same address even if IP changes (say we want to move to other
hoster).

Given two points above, I still consider that having explicit name will be
beneficial. If there's some other way to get SSL cert and benefit of static
name I'm eager to utilize it.

On Mon, May 20, 2019 at 10:43 AM Ahmet Altay  wrote:

> Hi Mikhail,
>
> Thank you for your work on this. I have some comments:
>
> - There is already a short link (
> https://s.apache.org/beam-community-metrics). Would a link from
> contributing to beam page (if there is not one already) sufficient> People
> can bookmark the short link if they need to quickly access.
> - Metrics is a developer facing tool. If it has its own subdomain and
> start showing up in web search results, it will be a confusing landing page
> for people simply searching for "beam metrics". I believe there is some
> value in having a single domain and linking to various things from there.
> This would be similar to how we link to jira, wiki, mailing list archives.
>
> Ahmet
>
> On Fri, May 17, 2019 at 9:26 PM Mikhail Gryzykhin <
> gryzykhin.mikh...@gmail.com> wrote:
>
>> @Aizamat
>> Code is not generalized and is project specific in some places. But it is
>> small and pretty straightforward so can be ported easily. Whole thing can
>> be started locally with a single docker command, so it's easy to try it out.
>>
>> On Fri, May 17, 2019, 19:33 Aizhamal Nurmamat kyzy 
>> wrote:
>>
>>> Hi Mikhail,
>>>
>>> I think this dashboard is amazing, and would love to have an easy access
>>> to it. So here is my non binding +1.
>>>
>>> On the side note, how easy is to recreate it for other Apache projects?
>>> ;)
>>>
>>> Thanks,
>>> Aizhamal
>>>
>>> *From: *Mikhail Gryzykhin 
>>> *Date: *Fri, May 17, 2019 at 6:49 PM
>>> *To: *dev
>>>
>>> Hello everyone,
>>>>
>>>> Some time ago we started community metrics dashboard.
>>>> <https://s.apache.org/beam-community-metrics> However we never had
>>>> added a permanent URL for it. This is really inconvenient to use, since
>>>> only available way to access dashboard is by IP-address.
>>>>
>>>> In this tread I'd like to:
>>>> 1. Vote to assign metrics.beam.apache.org to metrics dashboard (
>>>> http://104.154.241.245).
>>>> 2. Gather information on how to do it. I can assume only following
>>>> steps so far: a) vote b) once vote is complete, contact Apache INFRA to
>>>> help with this.
>>>>
>>>> Regards,
>>>> Mikhail.
>>>>
>>>>

Re: Proposal: Add permanent url to community metrics dashboard

2019-05-17 Thread Mikhail Gryzykhin

@Aizamat
Code is not generalized and is project specific in some places. But it is
small and pretty straightforward so can be ported easily. Whole thing can
be started locally with a single docker command, so it's easy to try it out.

On Fri, May 17, 2019, 19:33 Aizhamal Nurmamat kyzy 
wrote:

> Hi Mikhail,
>
> I think this dashboard is amazing, and would love to have an easy access
> to it. So here is my non binding +1.
>
> On the side note, how easy is to recreate it for other Apache projects? ;)
>
> Thanks,
> Aizhamal
>
> *From: *Mikhail Gryzykhin 
> *Date: *Fri, May 17, 2019 at 6:49 PM
> *To: *dev
>
> Hello everyone,
>>
>> Some time ago we started community metrics dashboard.
>> <https://s.apache.org/beam-community-metrics> However we never had added
>> a permanent URL for it. This is really inconvenient to use, since only
>> available way to access dashboard is by IP-address.
>>
>> In this tread I'd like to:
>> 1. Vote to assign metrics.beam.apache.org to metrics dashboard (
>> http://104.154.241.245).
>> 2. Gather information on how to do it. I can assume only following steps
>> so far: a) vote b) once vote is complete, contact Apache INFRA to help with
>> this.
>>
>> Regards,
>> Mikhail.
>>
>>

Proposal: Add permanent url to community metrics dashboard

2019-05-17 Thread Mikhail Gryzykhin

Hello everyone,

Some time ago we started community metrics dashboard.
 However we never had added a
permanent URL for it. This is really inconvenient to use, since only
available way to access dashboard is by IP-address.

In this tread I'd like to:
1. Vote to assign metrics.beam.apache.org to metrics dashboard (
http://104.154.241.245).
2. Gather information on how to do it. I can assume only following steps so
far: a) vote b) once vote is complete, contact Apache INFRA to help with
this.

Regards,
Mikhail.

Re: Postcommit kiosk dashboard

2019-05-17 Thread Mikhail Gryzykhin

Hi Kyle,

Currently available dashboard documentation is located on cwiki
<https://cwiki.apache.org/confluence/display/BEAM/Community+Metrics> and
github <https://github.com/apache/beam/tree/master/.test-infra/metrics>.

I'll start a separate thread with voting on getting it a permanent url.
Initially it was started as prototype and we decided to let it run without
url while we check whether it will be used by people. Recently I got
several requests to add additional graphs to dash, so I believe it is
utilized enough to justify extra work for getting a url for it.

Regards,
--Mikhail

On Fri, May 17, 2019 at 5:55 PM Kyle Weaver  wrote:

> Hmm just wasn't working on my desktop; maybe a proxy issue.
>
> How about giving this a permanent URL and maybe some documentation?
>
> Thanks
>
> On Fri, May 17, 2019 at 4:28 PM Pablo Estrada  wrote:
>
>> The dashboard's here:
>> http://104.154.241.245/d/8N6LVeCmk/post-commits-status-dashboard?orgId=1
>>
>> Or are you looking for something else in particular?
>> Best
>> -P.
>>
>> On Fri, May 17, 2019 at 4:18 PM Kyle Weaver  wrote:
>>
>>> Whatever happened to this dashboard? Having to manually maintain
>>> multiple lists of long links is a pain, and error-prone to boot.
>>>
>>> (Sorry for resurrecting a month-old thread)
>>>
>>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>> | +1650203
>>>
>>>
>>> On Fri, Apr 19, 2019 at 2:53 AM Ismaël Mejía  wrote:
>>>
>>>> Catching up on this one, nice dashboard !
>>>> Some jobs are misisng e.g. validatesRunner for both Spark and Flink.
>>>> I suppose those are important if this may eventually replace the
>>>> README as Thomas suggests.
>>>>
>>>> On Fri, Mar 15, 2019 at 2:18 AM Thomas Weise  wrote:
>>>> >
>>>> > This is very nice!
>>>> >
>>>> > Perhaps it can also replace this manually maintained list?
>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md
>>>> >
>>>> >
>>>> > On Thu, Mar 14, 2019 at 1:01 PM Mikhail Gryzykhin 
>>>> wrote:
>>>> >>
>>>> >> Addressed comments:
>>>> >> 1. Added precommits.
>>>> >> 2. Limited timeframe to 7 days. This removed old jobs from table.
>>>> >> 2.1 We keep history of all jobs in separate DB that's used by
>>>> grafana. Some of deprecated jobs come from there.
>>>> >>
>>>> >> --Mikhail
>>>> >>
>>>> >> Have feedback?
>>>> >>
>>>> >>
>>>> >> On Thu, Mar 14, 2019 at 12:03 PM Michael Luckey 
>>>> wrote:
>>>> >>>
>>>> >>> Very nice!
>>>> >>>
>>>> >>> Two questions though:
>>>> >>> - the links on the left should point somewhere?
>>>> >>> - where are the beam_PostCommit_[Java|GO]_GradleBuild coming from?
>>>> Cant find them on Jenkins...
>>>> >>>
>>>> >>> On Thu, Mar 14, 2019 at 7:20 PM Mikhail Gryzykhin <
>>>> mig...@google.com> wrote:
>>>> >>>>
>>>> >>>> we already have https://s.apache.org/beam-community-metrics
>>>> >>>>
>>>> >>>> --Mikhail
>>>> >>>>
>>>> >>>> Have feedback?
>>>> >>>>
>>>> >>>>
>>>> >>>> On Thu, Mar 14, 2019 at 11:15 AM Pablo Estrada 
>>>> wrote:
>>>> >>>>>
>>>> >>>>> Woaahhh very fanc... this is great. Thanks so much. Love it.
>>>> - I also like the Code Velocity dashboard that you've added.
>>>> >>>>>
>>>> >>>>> Let's make these more discoverable. How about adding a shortlink?
>>>> s.apache.org/beam-dash ? : )
>>>> >>>>> Best
>>>> >>>>> -P.
>>>> >>>>>
>>>> >>>>> On Thu, Mar 14, 2019 at 10:58 AM Mikhail Gryzykhin <
>>>> mig...@google.com> wrote:
>>>> >>>>>>
>>>> >>>>>> Hi everyone,
>>>> >>>>>>
>>>> >>>>>> I've added a kiosk style post-commit status dashboard that can
>>>> help decorate your office space with green and red colors.
>>>> >>>>>>
>>>> >>>>>> Regards,
>>>> >>>>>> --Mikhail
>>>> >>>>>>
>>>> >>>>>> Have feedback?
>>>>
>>> --
> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
> | +1650203
>

Re: [ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-14 Thread Mikhail Gryzykhin

Congratulations Pablo!

On Tue, May 14, 2019, 20:25 Kenneth Knowles  wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming Pablo Estrada to
> join the PMC.
>
> Pablo first picked up BEAM-722 in October of 2016 and has been a steady
> part of the Beam community since then. In addition to technical work on
> Beam Python & Java & runners, I would highlight how Pablo grows Beam's
> community by helping users, working on GSoC, giving talks at Beam Summits
> and other OSS conferences including Flink Forward, and holding training
> workshops. I cannot do justice to Pablo's contributions in a single
> paragraph.
>
> Thanks Pablo, for being a part of Beam.
>
> Kenn
>

Re: [ANNOUNCE] New committer announcement: Mark Liu

2019-05-09 Thread Mikhail Gryzykhin

Congratulations Mark!

*From: *Kenneth Knowles 
*Date: *Sun, Mar 24, 2019 at 9:40 PM
*To: *dev

Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new committer:
> Mark Liu.
>
> Mark has been contributing to Beam since late 2016! He has proposed 100+
> pull requests. Mark was instrumental in expanding test and infrastructure
> coverage, especially for Python. In consideration of Mark's
> contributions, the Beam PMC trusts Mark with the responsibilities of a Beam
>  committer [1].
>
> Thank you, Mark, for your contributions.
>
> Kenn
>
> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-
> committer
>

Re: Removing Java Reference Runner code

2019-04-26 Thread Mikhail Gryzykhin

+1 to remove overall. We removed all tests for ULR already and when we did
that, tests were red. Removing code base is a natural next step.

It is a valid point that we should have a way to run portable pipelines
locally with Python ULR.

I don't believe that a Java person working with Java SDK should actually
debug worker in most cases. If we have a situation when SDK dev have to
debug runner retularly, we should improve runner logging and error
reporting. This can be a great exercise of improving testability. As well
as a good requirement if we want to eventually split mono-repo.

--Mikhail

On Fri, Apr 26, 2019 at 12:36 PM Boyuan Zhang  wrote:

> Another concern from me is, will it be difficult for a Java person (who
> developing Java SDK) to figure out what's going on in Python ULR when
> debugging?
>
> On Fri, Apr 26, 2019 at 12:05 PM Kenneth Knowles  wrote:
>
>> Good points. Distilling one single item: can I, today, run the Java SDK's
>> suite of ValidatesRunner command against the Python ULR + Java SDK Harness,
>> in a single Gradle command?
>>
>> Kenn
>>
>> On Fri, Apr 26, 2019 at 9:54 AM Anton Kedin  wrote:
>>
>>> If there is no plans to invest in ULR then it makes sense to remove it.
>>>
>>> Going forward, however, I think we should try to document the higher
>>> level approach we're taking with runners (and portability) now that we have
>>> something working and can reflect on it. For example, couple of things that
>>> are not 100% clear to me:
>>>  - if the focus is on python runner for portability efforts, how does
>>> java SDK (and other languages) tie into this? E.g. how do we run, test,
>>> measure, and develop things (pipelines, aspects of the SDK, runner);
>>>  - what's our approach to developing new features, should we make sure
>>> python runner supports them as early as possible (e.g. schemas and SQL)?
>>>  - java DirectRunner is still there:
>>> - it is still the primary tool for java SDK development purposes,
>>> and as Kenn mentioned in the linked threads it adds value by making sure
>>> users don't rely on implementation details of specific runners. Do we have
>>> a similar story for portable scenarios?
>>> - I assume that extra validations in the DirectRunner have impact on
>>> performance in various ways (potentially non-deterministic). While this
>>> doesn't matter in some cases, it might do in others. Having a local runner
>>> that is (better) optimized for execution would probably make more sense for
>>> perf measurements, integration tests, and maybe even local production jobs.
>>> Is this something potentially worth looking into?
>>>
>>> Regards,
>>> Anton
>>>
>>>
>>> On Fri, Apr 26, 2019 at 4:41 AM Maximilian Michels 
>>> wrote:
>>>
 Thanks for following up with this. I have mixed feelings to see the
 portable Java DirectRunner go, but I'm in favor of this change because
 it removes a lot of code that we do not really make use of.

 -Max

 On 26.04.19 02:58, Kenneth Knowles wrote:
 > Thanks for providing all this background on the PR. It is very easy
 to
 > see where it came from. Definitely nice to have less code and fewer
 > things that can break. Perhaps lazy consensus is enough.
 >
 > Kenn
 >
 > On Thu, Apr 25, 2019 at 4:01 PM Daniel Oliveira <
 danolive...@google.com
 > > wrote:
 >
 > Hey everyone,
 >
 > I made a preliminary PR for removing all the Java Reference Runner
 > code (PR-8380 ) since I
 > wanted to see if it could be done easily. It seems to be working
 > fine, so I wanted to open up this discussion to make sure people
 are
 > still in agreement on getting rid of this code and that people
 don't
 > have any concerns.
 >
 > For those who need additional context about this, this previous
 > thread
 > <
 https://lists.apache.org/thread.html/b235f8ee55a737ea399756edd80b1218ed34d3439f7b0ed59bfa8e40@%3Cdev.beam.apache.org%3E
 >
 > is where we discussed deprecating the Java Reference Runner (in
 some
 > places it's called the ULR or Universal Local Runner, but it's the
 > same thing). Then there's this thread
 > <
 https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E
 >
 > where we discussed removing the code from the repo since it's been
 > deprecated.
 >
 > If no one has any objections to trying to remove the code I'll
 have
 > someone review the PR I wrote and start a vote to have it merged.
 >
 > Thanks,
 > Daniel Oliveira
 >

>>>

Re: New contributor to Beam

2019-04-17 Thread Mikhail Gryzykhin

Welcome!

--Mikhail

On Wed, Apr 17, 2019 at 9:58 AM Melissa Pashniak 
wrote:

>
> Welcome Cyrus!
>
>
> On Wed, Apr 17, 2019 at 7:31 AM Jean-Baptiste Onofré 
> wrote:
>
>> Welcome !
>>
>> Regards
>> JB
>>
>> On 17/04/2019 16:05, Cyrus Maden wrote:
>> > Hi all!
>> >
>> > My name's Cyrus and I'd like to start contributing to Beam. I'm a
>> > technical writer so I'm particularly looking forward to contributing to
>> > the Beam docs. Could someone add me as a contributor on JIRA so I can
>> > create and assign tickets?
>> >
>> > My JIRA name is: *cyrusmaden*
>> > *
>> > *
>> > Excited to be a part of this community and to work with ya'll!
>> >
>> > Best,
>> > Cyrus
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>

Comparison of Beam on X vs X

2019-04-15 Thread Mikhail Gryzykhin

Hi everyone,

I've recently got curious of what are benefits/drawbacks for Beam on X vs
X, where X is relevant runner (Spark, Hadoop, etc).

I wonder, if anyone did similar research already and might have some
documents/tables/references available?

Sample topics of curiosity:
* performance of similar pipelines
* ease of development
* debugability
* extra/missing functionality (we have capability matrix
)
* other topics?

Regards,
--Mikhail

Re: ParDo Execution Time stat is always 0

2019-04-04 Thread Mikhail Gryzykhin

Hi everyone,

Quick summary on python and Dataflow Runner:
Python SDK already reports:
- MSec
- User metrics (int64 and distribution)
- PCollection Element Count
- Work on MeanByteCount for pcollection is ongoing here
.

Dataflow Runner:
- all metrics listed above are passed through to Dataflow.

Ryan can give more information on Flink Runner. I also see Maximilian on
some of relevant PRs, so he might comment on this as well.

Regards,
Mikhail.


On Thu, Apr 4, 2019 at 10:43 AM Pablo Estrada  wrote:

> Hello guys!
> Alex, Mikhail and Ryan are working on support for metrics in the
> portability framework. The support on the SDK is pretty advanced AFAIK*,
> and the next step is to get the metrics back into the runner. Lukazs and
> myself are working on a project that depends on this too, so I'm adding
> everyone so we can get an idea of what's missing.
>
> I believe:
> - User metrics are fully wired up in the SDK
> - State sampler (timing) metrics are wired up as well (is that right, +Alex
> Amato ?)
> - Work is ongoing to send the updates back to Flink.
> - What is the plan for making metrics queriable from Flink? +Ryan Williams
> 
>
> Thanks!
> -P.
>
>
>
> On Wed, Apr 3, 2019 at 12:02 PM Thomas Weise  wrote:
>
>> I believe this is where the metrics are supplied:
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py
>>
>> git grep process_bundle_msecs   yields results for dataflow worker only
>>
>> There isn't any test coverage for the Flink runner:
>>
>>
>> https://github.com/apache/beam/blob/d38645ae8758d834c3e819b715a66dd82c78f6d4/sdks/python/apache_beam/runners/portability/flink_runner_test.py#L181
>>
>>
>>
>> On Wed, Apr 3, 2019 at 10:45 AM Akshay Balwally 
>> wrote:
>>
>>> Should have added- I'm using Python sdk, Flink runner
>>>
>>> On Wed, Apr 3, 2019 at 10:32 AM Akshay Balwally 
>>> wrote:
>>>
 Hi,
 I'm hoping to get metrics on the amount of time spent on each operator,
 so it seams like the stat


 {organization_specific_prefix}.operator.beam-metric-pardo_execution_time-process_bundle_msecs-v1.gauge.mean

 would be pretty helpful. But in practice, this stat always shows 0,
 which I interpret as 0 milliseconds spent per bundle, which can't be
 correct (other stats show that the operators are running, and timers within
 the operators show more reasonable times). Is this a known bug?


 --
 *Akshay Balwally*
 Software Engineer
 937.271.6469 <+19372716469>
 [image: Lyft] 

>>>
>>>
>>> --
>>> *Akshay Balwally*
>>> Software Engineer
>>> 937.271.6469 <+19372716469>
>>> [image: Lyft] 
>>>
>>

Re: Quieten javadoc generation

2019-04-02 Thread Mikhail Gryzykhin

+1 to suppress warnings globally. If we care about an issue, it should be
error.

On Tue, Apr 2, 2019 at 5:38 AM Alexey Romanenko 
wrote:

> +1 to suppress such warnings globally. IMO, usually, meaningful Javadoc
> description is quite enough to understand what this method does.
>
> On 1 Apr 2019, at 18:21, Kenneth Knowles  wrote:
>
> Personally, I would like to suppress the warnings globally. I think
> requiring javadoc everywhere is already enough to remind someone to write
> something meaningful. And I think @param rarely adds anything beyond the
> function signature and @return rarely adds anything beyond the description.
>
> Kenn
>
> On Mon, Apr 1, 2019 at 6:53 AM Michael Luckey  wrote:
>
>> Hi,
>>
>> currently our console output gets cluttered by thousands of Javadoc
>> warnings [1]. Most of them are warnings caused by missinlng @return
>> or @param tags  [2].
>>
>> So currently, this signal is completely ignored, and even worse, makes it
>> difficult to parse through the log.
>>
>> As I could not find a previous discussion on the list on how to handle
>> param/return on java docs, I felt the need to ask here first, how we would
>> like to improve this situation.
>>
>> Some options
>> 1. fix those warnings
>> 2. do not insist on those tags being present and disable doclint warnings
>> (probably not doable on tag granularity). This is already done on doc
>> aggregation task [3]
>>
>> Thoughts?
>>
>>
>> [1] https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/console
>> [2] https://builds.apache.org/job/beam_PreCommit_Java_Cron/1131/java/
>> [3]
>> https://github.com/apache/beam/blob/master/sdks/java/javadoc/build.gradle#L77-L78
>>
>>
>

Re: Debugging :beam-sdks-java-io-hadoop-input-format:test

2019-04-02 Thread Mikhail Gryzykhin

Hi everyone,

I created BEAM-6974 <https://issues.apache.org/jira/browse/BEAM-6974>. This
test and beam-sdks-java-io-cassandra tests fail often in our Pre-Commit
jobs. Can someone look into this?

Thank you,
Mikhail.


On Thu, Mar 28, 2019 at 12:31 PM Mikhail Gryzykhin 
wrote:

> I've seen it couple of times already and just got another repro:
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/5011/consoleFull
>
> On Thu, Mar 28, 2019 at 8:55 AM Alexey Romanenko 
> wrote:
>
>> Hi Mikhail,
>>
>> We had a flaky “HIFIOWithEmbeddedCassandraTest” a while ago and it was
>> caused by issue with launching of embedded Cassandra cluster. Then it was
>> fixed by Etienne Chauchot's PR [1]
>> Though, I don’t see any similar error messages in your Jenkins job log,
>> so, I’m not sure it’s the same issue.
>>
>> Have you seen this fail only once or several times already?
>>
>> [1] https://github.com/apache/beam/pull/8000
>>
>> On 27 Mar 2019, at 22:24, Mikhail Gryzykhin  wrote:
>>
>> Hi everyone,
>>
>> I have a pre-commit job that fails on
>> *:beam-sdks-java-io-hadoop-input-format:test*
>> <https://builds.apache.org/job/beam_PreCommit_Java_Phrase/834/consoleFull>.
>> Relevant PR. <https://github.com/apache/beam/pull/8131>
>>
>> Target doesn't have any explicit log associated with it. Running same
>> target in local doesn't give me much help. It seem to fail somewhere in
>> native runtime.
>>
>> Can someone help with tackling this issue?
>>
>> Regards,
>> Mikhail.
>>
>>
>>
>>

Re: Removing :beam-website:testWebsite from gradle build target

2019-04-01 Thread Mikhail Gryzykhin

+1 on this. I'd prefer to have this as pre-commit only.

On Mon, Apr 1, 2019 at 9:09 AM Andrew Pilloud  wrote:

> +1 on this, particularly removing the dead link checker from default
> tests. It is effectively testing that ~20 random websites are up. I wonder
> if there is a way to limit it to locally testing links within the beam site?
>
> On Mon, Apr 1, 2019 at 3:54 AM Michael Luckey  wrote:
>
>> Hi,
>>
>> after playing around with Gradle build for a while, I would like to
>> suggest to remove ':beam-website:testWebsite target from Gradle's check
>> task.
>>
>> Rationale:
>> - the task seems to be very flaky. In fact, I always need to add '-x
>> :beam-website:testWebsite' to my build [1]
>> - task uses docker, which imho adds a (unnecessary) severe constraint on
>> the build task. E.g. A part time user is unable to execute these tests in a
>> docker environment
>> - these tests are accessing production environment. So myself hitting the
>> build several times an hour could be considered a DOS attack.
>>
>> Of course, these tests add lots of value and should definitely be
>> executed, but wouldn't it be sufficient, to run this task only dedicated,
>> i.e. by an explicit call to ':beam-website:testWebsite' o
>> ':websitePreCommit'? Any thoughts?
>>
>> best,
>>
>> michel
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-6760
>>
>

Re: Debugging :beam-sdks-java-io-hadoop-input-format:test

2019-03-28 Thread Mikhail Gryzykhin

I've seen it couple of times already and just got another repro:
https://builds.apache.org/job/beam_PreCommit_Java_Commit/5011/consoleFull

On Thu, Mar 28, 2019 at 8:55 AM Alexey Romanenko 
wrote:

> Hi Mikhail,
>
> We had a flaky “HIFIOWithEmbeddedCassandraTest” a while ago and it was
> caused by issue with launching of embedded Cassandra cluster. Then it was
> fixed by Etienne Chauchot's PR [1]
> Though, I don’t see any similar error messages in your Jenkins job log,
> so, I’m not sure it’s the same issue.
>
> Have you seen this fail only once or several times already?
>
> [1] https://github.com/apache/beam/pull/8000
>
> On 27 Mar 2019, at 22:24, Mikhail Gryzykhin  wrote:
>
> Hi everyone,
>
> I have a pre-commit job that fails on
> *:beam-sdks-java-io-hadoop-input-format:test*
> <https://builds.apache.org/job/beam_PreCommit_Java_Phrase/834/consoleFull>.
> Relevant PR. <https://github.com/apache/beam/pull/8131>
>
> Target doesn't have any explicit log associated with it. Running same
> target in local doesn't give me much help. It seem to fail somewhere in
> native runtime.
>
> Can someone help with tackling this issue?
>
> Regards,
> Mikhail.
>
>
>
>

Re: Frequent failures on beam8

2019-03-27 Thread Mikhail Gryzykhin

And another one.
beam14 OOMs

On Mon, Mar 25, 2019 at 5:54 PM Yifan Zou  wrote:

> the beam8 is disabled by now.
>
> On Mon, Mar 25, 2019 at 2:06 PM Mikhail Gryzykhin 
> wrote:
>
>> Yifan is looking into this.
>>
>> On Mon, Mar 25, 2019 at 1:55 PM Boyuan Zhang  wrote:
>>
>>> Hey all,
>>>
>>> Could anyone help take a look at beam8
>>> <https://builds.apache.org/computer/beam8/builds>? Seems like there are
>>> many tests failed on beam8 owing to infra problems.
>>>
>>> Thanks!
>>>
>>

Re: New contributor

2019-03-27 Thread Mikhail Gryzykhin

Welcome Niklas.

This is another location with useful resources for contributors:
https://cwiki.apache.org/confluence/display/BEAM/Developer+Guides (contributor
guide has link to this as well though)

On Wed, Mar 27, 2019 at 10:54 AM Connell O'Callaghan 
wrote:

> Welcome Niklas - given your background it will be very interesting to see
> your contributions.
>
> On Wed, Mar 27, 2019 at 10:29 AM Mark Liu  wrote:
>
>> Welcome!
>>
>> Mark
>>
>> On Wed, Mar 27, 2019 at 10:09 AM Lukasz Cwik  wrote:
>>
>>> Welcome. The getting started[1] and contribution guides[2] are most
>>> useful. I have also added you as a contributor to the JIRA project.
>>>
>>> 1: https://beam.apache.org/get-started/beam-overview/
>>> 2: https://beam.apache.org/contribute/
>>>
>>> On Wed, Mar 27, 2019 at 9:38 AM Niklas Hansson <
>>> niklas.sven.hans...@gmail.com> wrote:
>>>
 Hi!

 I work as a data scientist within banking but will switch over to
 manufacturing the next month. I would like to contribute to Beam and
 especially the Python SDK. Could you add me as a contributor?

 I am new to open source contribution so feel free to give me any advice
 or point me in the right direction. Plan to start off with some of the
 starter tasks from the Jira board.

 Best regards
 Niklas

>>>

Debugging :beam-sdks-java-io-hadoop-input-format:test

2019-03-27 Thread Mikhail Gryzykhin

Hi everyone,

I have a pre-commit job that fails on
*:beam-sdks-java-io-hadoop-input-format:test*
.
Relevant PR. 

Target doesn't have any explicit log associated with it. Running same
target in local doesn't give me much help. It seem to fail somewhere in
native runtime.

Can someone help with tackling this issue?

Regards,
Mikhail.

Re: Build blocking on

2019-03-26 Thread Mikhail Gryzykhin

I believe what happens is that testPy2Gcp actually runs integration tests
that try to connect to GCP. Without having GCP cluster and configuration on
your machine I'd expect these tests to fail.

I'd say we should remove testPy2Gcp task from "build" task and explicitly
keep it as integration test.

--Mikhail


On Tue, Mar 26, 2019 at 3:12 PM Michael Luckey  wrote:

>
>
> On Tue, Mar 26, 2019 at 10:29 PM Udi Meiri  wrote:
>
>> Luckey, I couldn't recreate your issue, but I still haven't done a full
>> build.
>> I created a new GCE VM with using the ubuntu-1804-bionic-v20190212a image
>> (n1-standard-4 machine type).
>>
>> Ran the following:
>> sudo apt-get update
>> sudo apt-get install python-pip
>> sudo apt-get install python-virtualenv
>> git clone https://github.com/apache/beam.git
>> cd beam
>> ./gradlew :beam-sdks-python:testPy2Gcp
>> [failed: no JAVA_HOME]
>> sudo apt-get install openjdk-8-jdk
>> ./gradlew :beam-sdks-python:testPy2Gcp
>>
>> Got: BUILD SUCCESSFUL in 7m 52s
>>
>
> Nice. Thanks a lot for your help here.
>
> If I understand correctly, this VM is already located within gcp. Could it
> already have some setup, which needs to be done on 'my' VM? For instance I
> was contemplating about that test trying 'to call home', but as I am
> (unfortunately ;) no googler and do not have any gcp specific setup, fails
> here but misses to timeout? This is just some weird assumption, did not yet
> look into the actual implementation.
>
> Which I seemingly need to do here :(
>
>
>> Then I tried:
>> ./gradlew build
>>
>> And ran out of disk space. :) (beam/ is taking 4.5G and the VM boot disk
>> is 10G total)
>>
>
> Ouch :D
>
>
>>
>> On Tue, Mar 26, 2019 at 1:35 PM Robert Burke  wrote:
>>
>>> Michael, your concern is reasonable, especially with the experience with
>>> python, though that does help me bootstrap this work. :)
>>>
>>> The go tools provide caching and avoid redoing work if the source files
>>> haven't changed. This applies most particularly for `go build` and `go
>>> test`. As long as the go code isn't changing at every invocation, this
>>> should be fine. I'm not aware of the same being the case for the usual
>>> python tools.
>>>
>>>  The real trick is ensuring a valid and consistent environment for the
>>> go code.
>>>
>>> The environment question becomes easier for everyone by moving to go
>>> modules, which were designed to provide these kinds of consistent builds.
>>> It also avoids needing a GOPATH set. Any directory is permitted, as long as
>>> the go.mod is present.
>>>
>>> (The Go SDK doesn't yet us go modules, so go.mod and go.sum aren't yet
>>> in the repo.)
>>>
>>> The main blocker is see is updating the Jenkins machines to have the
>>> latest version of Go (1.12) instead of 1.10, which doesn't support modules.
>>> This only blocks a final submission, rather than the work fortunately.
>>>
>>> On Tue, Mar 26, 2019, 1:08 PM Udi Meiri  wrote:
>>>
 "rm -r ~/.gradle/go/repo/" worked for me (there was more than one
 package with issues).
 My ~/.bashrc has
   export GOPATH=$HOME/go
 so maybe that's making the difference in my setup.

 On Tue, Mar 26, 2019 at 11:28 AM Thomas Weise  wrote:

> Can this be addressed by having "clean" remove all state that gogradle
> leaves behind? This staleness issue has bitten me a few times also and it
> would be good to have a reliable way to deal with it, even if it involves
> an extra clean.
>
>
> On Tue, Mar 26, 2019 at 11:14 AM Michael Luckey 
> wrote:
>
>> @Udi
>> Did you try to just delete the
>> '/usr/local/google/home/ehudm/.gradle/go/repo/cloud.google.com'
>> folder?
>>
>> @Robert
>> As said before, I am a bit scared about the implications. Shelling
>> out is done by python, and from build perspective, this does not work 
>> very
>> well, unfortunately. I.e. no caching, up-to-date checks etc...
>>
>> But of course, we need to play with this a bit more.
>>
>> On Tue, Mar 26, 2019 at 6:24 PM Robert Burke 
>> wrote:
>>
>>> Reading the error from the gradle scan, it largely looks like some
>>> part of the GCP dependencies for the build depends on a package, where 
>>> the
>>> commit version is no longer around. The main issue with gogradle is that
>>> it's entirely distinct from the usual Go workflow, which means deps 
>>> users
>>> use are likely to be different to what's in the lock file.
>>>
>>> This work will be tracked in
>>> https://issues.apache.org/jira/browse/BEAM-5379
>>> GoGradle hasn't moved to support the new-go way of handling deps, so
>>> my inclination is to simplify to simple scripts for Gradle that shell 
>>> out
>>> the to Go tool for handling Go dep management, over trying to fix 
>>> GoGradle.
>>>
>>> On Tue, 26 Mar 2019 at 09:43, Udi Meiri  wrote:
>>>
 Robert, from what I recall it's not flaky for me - it consistently

Re: Frequent failures on beam8

2019-03-25 Thread Mikhail Gryzykhin

Yifan is looking into this.

On Mon, Mar 25, 2019 at 1:55 PM Boyuan Zhang  wrote:

> Hey all,
>
> Could anyone help take a look at beam8
> ? Seems like there are
> many tests failed on beam8 owing to infra problems.
>
> Thanks!
>

Re: [PROPOSAL] commit granularity in master

2019-03-22 Thread Mikhail Gryzykhin

I agree with keeping history clean.

Although, Small commits like address PR comments are useful during review
process. They allow reviewer to see only new changes, not review whole diff
again. Best to squash then before/on merge though.

On Fri, Mar 22, 2019, 07:34 Ismaël Mejía  wrote:

> > I like the extra delimitation the brackets give, worth the two extra
> > characters to me. More importantly, it's nice to have consistency, and
> > the only way to be consistent with the past is leave them there.
>
> My point with the brackets is that we are 'getting close' to 10K issue
> so we will then have 3 chars less, probably it does not change much
> but still.
>
> On Fri, Mar 22, 2019 at 3:19 PM Robert Bradshaw 
> wrote:
> >
> > On Fri, Mar 22, 2019 at 3:02 PM Ismaël Mejía  wrote:
> > >
> > > It is good to remind committers of their responsability on the
> > > 'cleanliness' of the merged code. Github sadly does not have an easy
> > > interface to do this and this should be done manually in many cases,
> > > sadly I have seen many committers just merging code with multiple
> > > 'fixup' style commits by clicking Github's merge button. Maybe it is
> > > time to find a way to automatically detect these cases and disallow
> > > the merge or maybe we should reconsider the policy altogether if they
> > > are people who don't see the value of this.
> >
> > I agree about keeping our history clean and useful, and think those
> > four points summarize things well (but a clarification on fixup
> > commits would be good).
> >
> > +1 to an automated check that there are many extraneous commits.
> > Anything the person hitting the merge button would easily see before
> > doing the merge.
> >
> > > I would like to propose a small modification to the commit title style
> > > on that guide. We use two brackets to enclose the issue id, but that
> > > really does not improve much the readibility and uses 2 extra spaces
> > > of the already short space title, what about getting rid of them?
> > >
> > > Current style: "[BEAM-] Commit title"
> > > Proposed style: "BEAM- Commit title"
> > >
> > > Any ideas or opinons pro/con ?
> >
> > I like the extra delimitation the brackets give, worth the two extra
> > characters to me. More importantly, it's nice to have consistency, and
> > the only way to be consistent with the past is leave them there.
> >
> > > On Fri, Mar 22, 2019 at 2:32 PM Etienne Chauchot 
> wrote:
> > > >
> > > > Thanks Alexey to point this out. I did not know about these 4 points
> in the guide. I agree with them also. I would just add "Avoid keeping in
> history formatting messages such as checktyle or spotless fixes"
> > > > If it is ok, I'll submit a PR to add this point.
> > > > Le vendredi 22 mars 2019 à 11:33 +0100, Alexey Romanenko a écrit :
> > > >
> > > > Etienne, thanks for bringing this topic.
> > > >
> > > > I think it was already discussed several times before and we have
> finally came to what we have in the current Committer guide “Granularity of
> changes" [1].
> > > >
> > > > Personally, I completely agree with these 4 rules presented there.
> The main concern is that all committers should follow them as well,
> otherwise we still have sometimes a bunch of small commits with
> inexpressive messages (I believe they were added during review process and
> were not squashed before merging).
> > > >
> > > > In my opinion, the most important rule is that every commit should
> be atomic in terms of added/fixed functionality and rolling it back should
> not break master branch.
> > > >
> > > > [1]
> https://beam.apache.org/contribute/committer-guide/#pull-request-review-objectives
> > > >
> > > >
> > > > On 22 Mar 2019, at 10:16, Etienne Chauchot 
> wrote:
> > > >
> > > > Hi all,
> > > > It has already been discussed partially but I would like that we
> agree on the commit granularity that we want in our history.
> > > > Some features were squashed to only one commit which seems a bit too
> granular to me for a big feature.
> > > > On the contrary I see PRs with very small commits such as "apply
> spotless" or "fix checkstyle".
> > > >
> > > > IMHO I think a good commit size is an isolable portion of a feature
> such as for ex "implement Read part of Kudu IO" or "reduce concurrency in
> Test A". Such a granularity allows to isolate problems easily (git bisect
> for ex) and rollback only a part if necessary.
> > > > WDYT about:
> > > > - squashing non meaningful commits such as "apply review comments"
> (and rather state what they do and group them if needed), or "apply
> spotless" or "fix checkstyle"
> > > > - trying to stick to a commit size as described above
> > > >
> > > > => and of course update the contribution guide at the end
> > > > ?
> > > >
> > > > Best
> > > > Etienne
> > > >
> > > >
>

[BEAM-6862] Adding pyhamcrest library to python container

2019-03-21 Thread Mikhail Gryzykhin

Hi everyone,

Recently, there was added a test for verifying metrics in python SDK (
PR-8038 ).

This PR causes beam_PostCommit_Py_ValCont job to fail

due to lack of pyhamcrest library in python SDK container.

I have created a PR-8107  that
adds relevant library and it fixes test. However I want to first confirm
that we are OK with adding library used for testing to prod container or
whether we want to utilize some other approach for testing.

For a background: we already have tenacity library added this way.

Regards,
--Mikhail

Have feedback ?

Re: Python PVR Reference post-commit tests failing

2019-03-14 Thread Mikhail Gryzykhin

@Kenneth
If we disable tests, I'd call Java ULR a dead code.

One of the better compromises:
1. disable tests.
2. Add tag to the last commit where Java ULR existed.
3. Remove Java ULR from head.

Keeping history, no extra dead code at head.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Thu, Mar 14, 2019 at 1:02 PM Ankur Goenka  wrote:

> On that note, we should also think about adding PVR for python reference
> runners. Jira: https://issues.apache.org/jira/browse/BEAM-6837
>
>
> On Thu, Mar 14, 2019 at 12:57 PM Kenneth Knowles  wrote:
>
>> How about this compromise:
>>
>> 1. disable the test since clearly no one is relying on the functionality
>> that is broken
>> 2. leave the Java ULR as-is for now, and a volunteer can pick it up and
>> make it work if they want
>>
>> Kenn
>>
>> On Thu, Mar 14, 2019 at 11:41 AM Mikhail Gryzykhin 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> We have Python PVR Reference post-commit tests failing for quite some
>>> time now. These are tests for java reference runner.
>>>
>>> According to this thread
>>> <https://lists.apache.org/thread.html/b235f8ee55a737ea399756edd80b1218ed34d3439f7b0ed59bfa8e40@%3Cdev.beam.apache.org%3E>,
>>> we are deciding what to do with java reference runner and might want to
>>> remove it from code base.
>>>
>>> My question is: do we want to a) invest time in fixing python PVR tests,
>>> or b) disable this test and start cleaning up code?
>>>
>>> a) Is worth it if we want to invest into java reference runner in the
>>> future.
>>> b) Is worth if we want to invest into Python and forfeit java reference
>>> runner.
>>>
>>> Option b) seem more reasonable to me atm, since most people lean towards
>>> going forward with Python reference runner.
>>>
>>> Please, share your thoughts.
>>>
>>> Regards,
>>> --Mikhail
>>>
>>> Have feedback <http://go/migryz-feedback>?
>>>
>>

Re: Postcommit kiosk dashboard

2019-03-14 Thread Mikhail Gryzykhin

Addressed comments:
1. Added precommits.
2. Limited timeframe to 7 days. This removed old jobs from table.
2.1 We keep history of all jobs in separate DB that's used by grafana. Some
of deprecated jobs come from there.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Thu, Mar 14, 2019 at 12:03 PM Michael Luckey  wrote:

> Very nice!
>
> Two questions though:
> - the links on the left should point somewhere?
> - where are the beam_PostCommit_[Java|GO]_GradleBuild coming from? Cant
> find them on Jenkins...
>
> On Thu, Mar 14, 2019 at 7:20 PM Mikhail Gryzykhin 
> wrote:
>
>> we already have https://s.apache.org/beam-community-metrics
>>
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>> On Thu, Mar 14, 2019 at 11:15 AM Pablo Estrada 
>> wrote:
>>
>>> Woaahhh very fanc... this is great. Thanks so much. Love it. - I
>>> also like the Code Velocity dashboard that you've added.
>>>
>>> Let's make these more discoverable. How about adding a shortlink?
>>> s.apache.org/beam-dash ? : )
>>> Best
>>> -P.
>>>
>>> On Thu, Mar 14, 2019 at 10:58 AM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I've added a kiosk style
>>>> <http://104.154.241.245/d/8N6LVeCmk/post-commits-status-dashboard?refresh=30s=1>
>>>>  post-commit
>>>> status dashboard
>>>> <http://104.154.241.245/d/8N6LVeCmk/post-commits-status-dashboard?refresh=30s=1>
>>>> that can help decorate your office space with green and red colors.
>>>>
>>>> Regards,
>>>> --Mikhail
>>>>
>>>> Have feedback <http://go/migryz-feedback>?
>>>>
>>>

Re: Postcommit kiosk dashboard

2019-03-14 Thread Mikhail Gryzykhin

@Kenneth
Good point.

That's possible and I thought of it. However then total list of jobs will
overflow the size of most screens, so it will not work as kiosk any more.
Our tests jobs list is way too long.

Also, I would not worry about those as much, since people get aware of
pre-commits failures much faster via failing pre-commits on their PRs.

--Mikhail

Have feedback <http://go/migryz-feedback>?

On Thu, Mar 14, 2019 at 11:20 AM Kenneth Knowles  wrote:

> This is great!
>
> "PreCommit_Cron" are actually very important postcommit runs. Can you add
> them?
>
> Kenn
>
> On Thu, Mar 14, 2019 at 11:15 AM Pablo Estrada  wrote:
>
>> Woaahhh very fanc... this is great. Thanks so much. Love it. - I also
>> like the Code Velocity dashboard that you've added.
>>
>> Let's make these more discoverable. How about adding a shortlink?
>> s.apache.org/beam-dash ? : )
>> Best
>> -P.
>>
>> On Thu, Mar 14, 2019 at 10:58 AM Mikhail Gryzykhin 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I've added a kiosk style
>>> <http://104.154.241.245/d/8N6LVeCmk/post-commits-status-dashboard?refresh=30s=1>
>>>  post-commit
>>> status dashboard
>>> <http://104.154.241.245/d/8N6LVeCmk/post-commits-status-dashboard?refresh=30s=1>
>>> that can help decorate your office space with green and red colors.
>>>
>>> Regards,
>>> --Mikhail
>>>
>>> Have feedback <http://go/migryz-feedback>?
>>>
>>

Re: JIRA hygiene

2019-03-14 Thread Mikhail Gryzykhin

I believe that there are too many scenarios that we have to cover if we are
to design a generic approach. Common pattern I've seen most times is when
assignee on the ticket, who's usually author of relevant PR, is expected to
either resolve ticket or pass it to the feature owner for verification.

We can have a bot that will check stale assigned tickets and poke
assignees. Can go further and allow bot to unassign tickets if no response
comes and remove "triaged" label. This will always highlight all
non-updated tickets and keep forgotten tickets in available pool. Giving a
hint to pass ownership of ticket to committer (or person who merged PR) can
be a simple answer for contributors who are not sure whether ticket can be
closed.

--Mikhail

Have feedback ?

On Wed, Mar 13, 2019 at 6:00 PM Michael Luckey  wrote:

> Totally agree. The contributor is most likely be the better target. But as
> she is probably less familiar with the process, we might be better of to
> put the responsibility on the committer to kindly ask/discuss with her how
> to proceed with corresponding jira ticket?
>
> On Thu, Mar 14, 2019 at 1:18 AM Ahmet Altay  wrote:
>
>> I agree with defining the workflow for closing JIRAs. Would not
>> contributor be in a better position to close JIRAs or keep it open? It
>> would make sense for the committer to ask about this but I think
>> contributor (presumably the person who is the assignee of the JIRA) could
>> be the responsible party for updating their JIRAs. On the other hand, I
>> understand the argument that committer could do this at the time of merging
>> and fill a gap in the process.
>>
>> On Wed, Mar 13, 2019 at 4:59 PM Michael Luckey 
>> wrote:
>>
>>> Hi,
>>>
>>> definitely +1 to properly establish a workflow to maintain jira status.
>>> Naively I d think, the reporter should close as she is the one to confirm
>>> whether the reported issue is fixed or not. But for obvious reasons that
>>> will not work here, so - although it puts another burden on committers, you
>>> are probably right that the committer is the best choice to ensure that the
>>> ticket gets promoted. Whether it will be resolved or clarified what's still
>>> to be done.
>>>
>>> Looking into the current state, we seem to have tons of issues whith
>>> merged PRs, which for anyone trying to find an existing jira issue to work
>>> on makes it unnecessary difficult to decide whether to look into that or
>>>  not. From my personal experience, it is somehow frustrating going through
>>> open issues, selecting one and after investing some (or even more) time to
>>> first understand a problem and then the PR to realise nothing has to be
>>> done anymore. Or not knowing what's left out and for what reason. But of
>>> course, this is another issue which we definitely need to invest time into
>>> - kenn already asked for our support here.
>>>
>>> thx,
>>>
>>> michel
>>>
>>> On Tue, Mar 12, 2019 at 11:30 AM Etienne Chauchot 
>>> wrote:
>>>
 Hi Thomas,

 I agree, the committer that merges a PR should close the ticket. And,
 if needed, he could discuss with the author (inside the PR) to assess if
 the PR covers the ticket scope.

 This is the rule I apply to myself when I merge a PR (even thought it
 has happened that I forgot to close one or two tickets :) ) .

 Etienne

 Le lundi 11 mars 2019 à 14:17 -0700, Thomas Weise a écrit :

 JIRA probably deserves a separate discussion. It is messy.. We also
 have examples of tickets being referenced by users that were not closed,
 although the feature long implemented or issue fixed.

 There is no clear ownership in our workflow.

 A while ago I proposed in another context to make resolving JIRA part
 of committer duty. I would like to bring this up for discussion again:

 https://github.com/apache/beam/pull/7129#discussion_r236405202

 Thomas

 On Mon, Mar 11, 2019 at 1:47 PM Ahmet Altay  wrote:

 I agree this is a good idea. I used the same technique for 2.11 blog
 post (JIRA release notes -> editorialized list + diffed the dependencies).

 On Mon, Mar 11, 2019 at 1:40 PM Kenneth Knowles 
 wrote:

 That is a good idea. The blog post is probably the main avenue where
 folks will find out about new features or big fixes.

 When I did 2.10.0 I just used the automated Jira release notes and
 pulled out significant things based on my judgment. I would also suggest
 that our Jira hygiene could be significantly improved to make this process
 more effective.

 +1 to improving JIRA notes as well. Often times issues are closed with
 no real comments on what happened, has it been resolved or not. It becomes
 an exercise on reading the linked PRs to figure out what happened.

 Kenn

 On Mon, Mar 11, 2019 at 1:04 PM Thomas Weise  wrote:

 Ahmet,

Python PVR Reference post-commit tests failing

2019-03-14 Thread Mikhail Gryzykhin

Hi everyone,

We have Python PVR Reference post-commit tests failing for quite some time
now. These are tests for java reference runner.

According to this thread
,
we are deciding what to do with java reference runner and might want to
remove it from code base.

My question is: do we want to a) invest time in fixing python PVR tests, or
b) disable this test and start cleaning up code?

a) Is worth it if we want to invest into java reference runner in the
future.
b) Is worth if we want to invest into Python and forfeit java reference
runner.

Option b) seem more reasonable to me atm, since most people lean towards
going forward with Python reference runner.

Please, share your thoughts.

Regards,
--Mikhail

Have feedback ?

Re: Postcommit kiosk dashboard

2019-03-14 Thread Mikhail Gryzykhin

we already have https://s.apache.org/beam-community-metrics

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Thu, Mar 14, 2019 at 11:15 AM Pablo Estrada  wrote:

> Woaahhh very fanc... this is great. Thanks so much. Love it. - I also
> like the Code Velocity dashboard that you've added.
>
> Let's make these more discoverable. How about adding a shortlink?
> s.apache.org/beam-dash ? : )
> Best
> -P.
>
> On Thu, Mar 14, 2019 at 10:58 AM Mikhail Gryzykhin 
> wrote:
>
>> Hi everyone,
>>
>> I've added a kiosk style
>> <http://104.154.241.245/d/8N6LVeCmk/post-commits-status-dashboard?refresh=30s=1>
>>  post-commit
>> status dashboard
>> <http://104.154.241.245/d/8N6LVeCmk/post-commits-status-dashboard?refresh=30s=1>
>> that can help decorate your office space with green and red colors.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>

Postcommit kiosk dashboard

2019-03-14 Thread Mikhail Gryzykhin

Hi everyone,

I've added a kiosk style

post-commit
status dashboard

that can help decorate your office space with green and red colors.

Regards,
--Mikhail

Have feedback ?

Re: Build broken: repo.spring.io is down

2019-03-12 Thread Mikhail Gryzykhin

Thanks Kyle, that worked.

Does anyone know the reason why we declare same repositories in two
different locations?
Can we remove one of duplicates?

Regards,
--Mikhail

Have feedback <http://go/migryz-feedback>?


On Tue, Mar 12, 2019 at 1:14 PM Kyle Weaver  wrote:

> I commented out this line and it built fine:
> https://github.com/apache/beam/blob/c41e4fbbeb6ec622a0072e01afcba95428faafb9/buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy#L44
>
> Kyle Weaver |  Software Engineer |  kcwea...@google.com |  +1650203
>
>
> On Tue, Mar 12, 2019 at 1:03 PM Mikhail Gryzykhin 
> wrote:
>
>> I tried to replace repo.sprint.io
>> <https://github.com/apache/beam/blob/master/buildSrc/build.gradle#L30>
>> with mavenCentral() that seem to have relevant plugin (propdeps-plugin
>> <https://mvnrepository.com/artifact/io.spring.gradle/propdeps-plugin/0.0.9.RELEASE>
>> in my case), but gradle still fails to fetch it.
>>
>> Did anyone else had success? Build fails on my local machine as well.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>> On Tue, Mar 12, 2019 at 11:25 AM Kyle Weaver  wrote:
>>
>>> Looks like this is still ongoing. Would greatly appreciate a fix if
>>> anyone's got one.
>>>
>>> Thanks,
>>> Kyle
>>>
>>> Kyle Weaver |  Software Engineer |  kcwea...@google.com |  +1650203
>>>
>>>
>>> On Tue, Mar 12, 2019 at 8:17 AM Maximilian Michels 
>>> wrote:
>>>
>>>> FYI: Our build system is broken at the moment due to
>>>> https://repo.spring.io being down.
>>>>
>>>> If this is not a temporary issue, we could try to switch to a different
>>>> repository.
>>>>
>>>> 16:07:02 FAILURE: Build failed with an exception.
>>>> 16:07:02
>>>> 16:07:02 * What went wrong:
>>>> 16:07:02 Execution failed for task ':beam-model-pipeline:compileJava'.
>>>> 16:07:02 > Could not resolve all files for configuration
>>>> ':beam-model-pipeline:errorprone'.
>>>> 16:07:02> Could not resolve
>>>> com.google.errorprone:error_prone_core:latest.release.
>>>> 16:07:02  Required by:
>>>> 16:07:02  project :beam-model-pipeline
>>>> 16:07:02   > Failed to list versions for
>>>> com.google.errorprone:error_prone_core.
>>>> 16:07:02  > Unable to load Maven meta-data from
>>>>
>>>> https://repo.spring.io/plugins-release/com/google/errorprone/error_prone_core/maven-metadata.xml
>>>> .
>>>> 16:07:02 > Could not HEAD
>>>> '
>>>> https://repo.spring.io/plugins-release/com/google/errorprone/error_prone_core/maven-metadata.xml
>>>> '.
>>>> 16:07:02> Read timed out
>>>> 16:07:02
>>>>
>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4722/console
>>>>
>>>>

Re: Build broken: repo.spring.io is down

2019-03-12 Thread Mikhail Gryzykhin

I tried to replace repo.sprint.io
 with
mavenCentral() that seem to have relevant plugin (propdeps-plugin

in my case), but gradle still fails to fetch it.

Did anyone else had success? Build fails on my local machine as well.

Regards,
--Mikhail

Have feedback ?


On Tue, Mar 12, 2019 at 11:25 AM Kyle Weaver  wrote:

> Looks like this is still ongoing. Would greatly appreciate a fix if
> anyone's got one.
>
> Thanks,
> Kyle
>
> Kyle Weaver |  Software Engineer |  kcwea...@google.com |  +1650203
>
>
> On Tue, Mar 12, 2019 at 8:17 AM Maximilian Michels  wrote:
>
>> FYI: Our build system is broken at the moment due to
>> https://repo.spring.io being down.
>>
>> If this is not a temporary issue, we could try to switch to a different
>> repository.
>>
>> 16:07:02 FAILURE: Build failed with an exception.
>> 16:07:02
>> 16:07:02 * What went wrong:
>> 16:07:02 Execution failed for task ':beam-model-pipeline:compileJava'.
>> 16:07:02 > Could not resolve all files for configuration
>> ':beam-model-pipeline:errorprone'.
>> 16:07:02> Could not resolve
>> com.google.errorprone:error_prone_core:latest.release.
>> 16:07:02  Required by:
>> 16:07:02  project :beam-model-pipeline
>> 16:07:02   > Failed to list versions for
>> com.google.errorprone:error_prone_core.
>> 16:07:02  > Unable to load Maven meta-data from
>>
>> https://repo.spring.io/plugins-release/com/google/errorprone/error_prone_core/maven-metadata.xml
>> .
>> 16:07:02 > Could not HEAD
>> '
>> https://repo.spring.io/plugins-release/com/google/errorprone/error_prone_core/maven-metadata.xml
>> '.
>> 16:07:02> Read timed out
>> 16:07:02
>>
>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4722/console
>>
>>

Re: Python precommit duration is above 1hr

2019-03-11 Thread Mikhail Gryzykhin

That's cool! Thank you for working on this.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Mon, Mar 11, 2019 at 10:49 AM Mark Liu  wrote:

> Sorry for missing this thread in my inbox.
>
> Yes, I'm actively working on pull/7675
> <https://github.com/apache/beam/pull/7675> which works pretty well and is
> under review. At first, I tried detox but the test console output are all
> mixed together which makes debugging extremely hard. We also lost many
> advantages of Gradle and scan UI with detox.
>
> pull/7675 <https://github.com/apache/beam/pull/7675> use Gradle
> parallelism and run tox tasks in there.
> https://scans.gradle.com/s/f3fkqqmiosejm is an example run.
>
> Mark
>
> On Sun, Mar 10, 2019 at 8:19 AM Robbe Sneyders 
> wrote:
>
>> Yes, this is largely due to the addition of Python 3 test suites.
>>
>> Running tests in parallel is actively being investigated by +Mark Liu
>>  in this Jira ticket [1] and this PR [2]. We will
>> add other Python 3.6 and 3.7 test suites only to postcommit until then.
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-6527
>> [2] https://github.com/apache/beam/pull/7675
>>
>> Kind regards,
>> Robbe
>>
>> [image: https://ml6.eu] <https://ml6.eu/>
>>
>> * Robbe Sneyders*
>>
>> ML6 Gent
>> <https://www.google.be/maps/place/ML6/@51.037408,3.7044893,17z/data=!3m1!4b1!4m5!3m4!1s0x47c37161feeca14b:0xb8f72585fdd21c90!8m2!3d51.037408!4d3.706678?hl=nl>
>>
>> M: +32 474 71 31 08
>>
>>
>> On Sat, 9 Mar 2019 at 20:22, Robert Bradshaw  wrote:
>>
>>> Perhaps this is the duplication of all (or at least most) previously
>>> existing tests for running under Python 3. I agree that this is excessive;
>>> we should probably split out Py2, Py3, and the linters into separate
>>>  targets.
>>>
>>> We could look into using detox or retox to parallelize the testing as
>>> well. (The issue last time was suppression of output on timeout, but that
>>> can be worked around by adding timeouts to the individual tox targets.)
>>>
>>> On Fri, Mar 8, 2019 at 11:26 PM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Seems that our python pre-commits grow up in time really fast
>>>> <http://104.154.241.245/d/_TNndF2iz/pre-commit-test-latency?orgId=1=now-6M=now>
>>>> .
>>>>
>>>> Did anyone follow trend or know what are the biggest changes that
>>>> happened with python lately?
>>>>
>>>> I don't see a single jump, but duration of pre-commits almost doubled
>>>> since new year.
>>>>
>>>> [image: image.png]
>>>>
>>>> Regards,
>>>> --Mikhail
>>>>
>>>> Have feedback <http://go/migryz-feedback>?
>>>>
>>>

Python precommit duration is above 1hr

2019-03-08 Thread Mikhail Gryzykhin

Hi everyone,

Seems that our python pre-commits grow up in time really fast

.

Did anyone follow trend or know what are the biggest changes that happened
with python lately?

I don't see a single jump, but duration of pre-commits almost doubled since
new year.

[image: image.png]

Regards,
--Mikhail

Have feedback ?

Re: New Contributor

2019-03-05 Thread Mikhail Gryzykhin

Welcome to the community!

--Mikhail

Have feedback ?


On Tue, Mar 5, 2019 at 1:53 PM Ruoyun Huang  wrote:

> Welcome Boris!
>
> On Tue, Mar 5, 2019 at 1:34 PM Ahmet Altay  wrote:
>
>> Welcome Boris!
>>
>> On Mon, Mar 4, 2019 at 5:40 PM Ismaël Mejía  wrote:
>>
>>> Done, welcome!
>>>
>>> On Tue, Mar 5, 2019 at 1:25 AM Boris Shkolnik  wrote:
>>> >
>>> >
>>> > Hi,
>>> >
>>> > My name is Boris Shkolnik. I am a committer in Hadoop and Samza Apache
>>> projects.
>>> > I would like to contribute to beam.
>>> > Could you please add me to the beam project.
>>> >
>>> > My user name is boryas @apache.org
>>> >
>>> > Thanks,
>>> > -Boris.
>>>
>>
>
> --
> 
> Ruoyun  Huang
>
>

Re: Beam Jenkins job summary available in .test-infra/jenkins/REAMDE.md

2019-02-26 Thread Mikhail Gryzykhin

This looks nice.

Would also be great if we can add links to docs on how to re-run relevant
job/tests on local machine.

Regards,
--Mikhail

Have feedback ?


On Mon, Feb 25, 2019 at 9:38 AM Mark Liu  wrote:

> Glad to hear that!
>
> Thanks,
> Mark
>
> On Sat, Feb 16, 2019 at 1:53 PM Maximilian Michels  wrote:
>
>> Hi Mark,
>>
>> That's super useful. I often end up using Jenkins' search which isn't
>> all that great and finds jobs from all Apache projects. Thank you!
>>
>> Cheers,
>> Max
>>
>> On 15.02.19 22:34, Mark Liu wrote:
>> > TL;DR: Check out .test-infra/jenkins/REAMDE.md
>> > <
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md> for
>>
>> > Beam Jenkins job summary!
>> >
>> > Hi folks,
>> >
>> > I found it's difficult for me to quickly find particular Jenkins job
>> > link or PR trigger phrase during development and PR review. So I
>> > collected some useful job information from groovy files and put them in
>> > .test-infra/jenkins/REAMDE.md
>> > <
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md>.
>>
>> > And also linked this file from PR template
>> > <
>> https://github.com/apache/beam/blob/master/.github/PULL_REQUEST_TEMPLATE.md>.
>>
>> > Due to large number of jobs we currently running, I group them into few
>> > tables: PreCommit, PopstCommit, Performance, Inventory and Others.
>> > Hopefully this's clear and also helpful to other contributors.
>> >
>> > Since the README is generated based on current state of Jenkins groovy
>> > files, so unfortunately any further changes won't be reflected there
>> > without manual update.
>> >
>> > Thanks,
>> > Mark
>> >
>>
>

Re: Beam Community Metrics

2019-02-19 Thread Mikhail Gryzykhin

Hi everyone,

Regarding empty list of dashboards: Most likely you're looking at recently
viewed dashboards. Top-left corner has a drop-down menu with full list of
available dashboards. I've added explicit ticket to add landing page.


Regarding hardcoded IP: unfortunately, I didn't manage to figure out proper
process on acquiring of proper domain name in time I had available to
implement metrics site. Any help with this would be appreciated.

Scott added information on metrics at
https://cwiki.apache.org/confluence/display/BEAM/Community+Metrics
I believe that this documentation was located on CWiki since it is
Beam-dev-relevant, not user-relevant information.

Regards,
--Mikhail

Have feedback ?


On Tue, Feb 19, 2019 at 9:13 AM Scott Wegner  wrote:

> > Also blank for me. Maybe Kenn and I need to sign in?
>
> I think I see what's going on. When I hit
> https://s.apache.org/beam-community-metrics-infra I see a list of
> "Recently viewed dashboards" which you won't have on first pageview.
>
> To see the available dashboards and navigate to them, find the drop-down
> arrow next to "Home" in the top navbar. This will display the list of
> dashboards available that you can drill into.
>
> Does that solve your issue?
>
> On Tue, Feb 19, 2019 at 8:37 AM Brian Hulette  wrote:
>
>> Also blank for me. Maybe Kenn and I need to sign in?
>>
>> On Tue, Feb 19, 2019 at 8:09 AM Scott Wegner  wrote:
>>
>>> > I think it would be good to have a build infra section under
>>> https://beam.apache.org/contribute/ that contains the dashboard link
>>> and other related information.
>>>
>>> I previously wrote a page for the wiki:
>>> https://cwiki.apache.org/confluence/display/BEAM/Community+Metrics . I
>>> chose the wiki over website because Community Metrics is Beam
>>> contributor-focused (as opposed to for Beam users), and not tied to a
>>> release. If it would fit better on the website, we can move it.
>>>
>>> > Hmm, currently blank for me.
>>>
>>> Interesting. Works for me now. We have prober tests that verify that the
>>> infrastructure is healthy:
>>> https://builds.apache.org/job/beam_Prober_CommunityMetrics/ I wonder
>>> what happened in this case.
>>>
>>> > Also a hardcoded i.p. ? We should get it a proper DNS and health
>>> check, perhaps?
>>>
>>> Yes, ugly indeed. Mikhail was previously looking into getting an SSL
>>> cert for HTTPS / login support; a proper domain name seems appropriate too.
>>> I believe the previously roadblock was finding funding for it.
>>>
>>>
>>>
>>> On Sat, Feb 16, 2019 at 7:50 PM Kenneth Knowles  wrote:
>>>
 Hmm, currently blank for me. Also a hardcoded i.p. ? We should get it a
 proper DNS and health check, perhaps?

 Kenn

 On Sat, Feb 16, 2019 at 9:14 AM Thomas Weise  wrote:

> This is super useful information.
>
> I think it would be good to have a build infra section under
> https://beam.apache.org/contribute/ that contains the dashboard link
> and other related information.
>
> WDYT?
>
>
> On Mon, Oct 29, 2018 at 3:26 AM Maximilian Michels 
> wrote:
>
>> Hi Scott,
>>
>> Thanks for sharing the progress. The test metrics are super helpful.
>> I'm
>> particularly looking forward to the PR metrics which could be useful
>> for
>> improving interaction within the community and with new contributors.
>>
>> -Max
>>
>> On 26.10.18 07:36, Scott Wegner wrote:
>> > I want to summarize some of the great work done this summer by
>> Mikhail,
>> > Udi, and Huygaa to visualize and track some project/community
>> health
>> > metrics for Beam. Specifically, they've helped to build dashboards
>> for:
>> > * Test suite health (pre-commit speed, post-commit reliability)
>> > * Pull Request health (code review latency, PR load per reviewer)
>> >
>> > Check it out here: https://s.apache.org/beam-community-metrics,
>> and
>> > please leave feedback on this thread or under our umbrella JIRA
>> item:
>> > BEAM-5862.
>> >
>> > There's some new infrastructure behind this which is hosted
>> alongside
>> > our Jenkins resources on Google Cloud. I want to ensure this
>> doesn't
>> > become a burden for the community, so I've written up a maintenance
>> plan
>> > here: https://s.apache.org/beam-community-metrics-infra. That link
>> > contains more details on the metrics pipeline architecture
>> components,
>> > the design discussions which lead to building them, and my proposal
>> for
>> > documenting and monitoring the infrastructure.
>> >
>> > There was a ton of discussion [1][2][3] that helped shape the
>> dashboards
>> > we've come up with. There's a whole lot we didn't get to, but the
>> source
>> > code is documented and checked-in [4], and I encourage others in
>> the
>> >

Re: Searching Jenkins for particular test failures

2019-02-13 Thread Mikhail Gryzykhin

You can look at this url:
https://builds.apache.org/job/beam_PreCommit_Java_Phrase/718/testReport/junit/org.apache.beam.runners.fnexecution.data/GrpcDataServiceTest/testMessageReceivedBySingleClientWhenThereAreMultipleClients/history/

to get there
failed job -> test result -> navigate to failed test -> history

I remember having issue getting there for succeeded tests. So usually just
replaced relevant fields in url.

This seems close to what you're looking for. Has a limited history though,
since Jenkins retires old builds.

--Mikhail

Have feedback ?


On Tue, Feb 12, 2019 at 11:24 AM Brian Hulette  wrote:

> Hey everyone,
> I've been taking a look at some flakey tests as a way to get more familiar
> with the codebase. The first thing I want to do when investigating a
> particular test is find all the builds where it failed - but unfortunately
> this has proven rather challenging.
>
> For example, when I was looking at BEAM-6512 [1], I wanted to find all
> builds where `testMessageReceivedBySingleClientWhenThereAreMultipleClients`
> failed. Fortunately this test was (is) flaking relatively often, so it
> didn't take too long to just step through the
> Java_PreCommit_{Commit,Phrase,Cron} builds in Jenkins and find a couple of
> examples, like Cron #900 [2]. But what should I do for a flake that doesn't
> occur so often, or a flake that has actually already been resolved? Ideally
> I could confirm it hasn't happened in any recent builds and close the
> corresponding Jira in the latter case. Or find a couple of occurrences in
> the former case.
>
> klk@ suggested searching builds@ archives which was a great idea, but
> sadly I don't think it will work. When I searched for the BEAM-6512 failure
> [3], it did turn up a couple of builds, but note that Cron #900 is not
> present in the results. It looks like the relevant line was truncated in
> that build's email, so the results are not complete. Also Commit and Phrase
> build failures don't seem to go to builds@, which makes a lot of sense
> since I'm sure there's a lot of churn there, but failures in those builds
> could certainly be relevant for this search.
>
> I would think there would be some way to search Jenkins by test name, but
> I can't find it if it's there. Is anyone aware of a way to do this? I'm
> sure I could hack something together with curl and the Jenkins API but I'd
> rather not go that route if there's something easier I'm missing.
>
> [1] https://issues.apache.org/jira/browse/BEAM-6512
> [2] https://builds.apache.org/job/beam_PreCommit_Java_Cron/900/
> [3]
> https://lists.apache.org/list.html?bui...@beam.apache.org:lte=1M:testMessageReceivedBySingleClientWhenThereAreMultipleClients
>

FYI: beam11 bad worker

2019-02-11 Thread Mikhail Gryzykhin

Hi everyone,

Small update:
We have a bad jenkins executor that fails all builds. You might experience
pre/post commit failures.

Yifan follows up on this.

Regards,
--Mikhail

Have feedback ?

Re: master broken (findBugs)

2019-02-08 Thread Mikhail Gryzykhin

+Ismael (commit author) explicitly

--Mikhail

Have feedback ?


On Fri, Feb 8, 2019 at 10:47 AM Michael Luckey  wrote:

> Hi,
>
> currently PRs are failing due to
>
> *19:14:30* FAILURE: Build completed with 2 failures.*19:14:30* *19:14:30* 1: 
> Task failed with an exception.*19:14:30* ---*19:14:30* * What went 
> wrong:*19:14:30* Execution failed for task 
> ':beam-runners-google-cloud-dataflow-java-fn-api-worker:findbugsMain'.*19:14:30*
>  > FindBugs rule violations were found. See the report at: 
> file:///home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Commit/src/runners/google-cloud-dataflow-java/worker/build/reports/findbugs/main.xml
>
>
> I traced it down to [1]. Which, from commit message is confusing me, because 
> now fund bugs is unhappy.
>
>
> Is it possible to revert? Or something else we should do?
>
>
> Thanks,
>
>
> michel
>
>
> [1] 
> https://github.com/apache/beam/commit/72627eb9f0b556574c23ceea06b3547d96c4107d
>
>

Re: master broken (findBugs)

2019-02-08 Thread Mikhail Gryzykhin

created BEAM-6637 <https://issues.apache.org/jira/browse/BEAM-6637> to
track this.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Fri, Feb 8, 2019 at 1:52 PM Mikhail Gryzykhin  wrote:

> +Ismael (commit author) explicitly
>
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Fri, Feb 8, 2019 at 10:47 AM Michael Luckey 
> wrote:
>
>> Hi,
>>
>> currently PRs are failing due to
>>
>> *19:14:30* FAILURE: Build completed with 2 failures.*19:14:30* *19:14:30* 1: 
>> Task failed with an exception.*19:14:30* ---*19:14:30* * What went 
>> wrong:*19:14:30* Execution failed for task 
>> ':beam-runners-google-cloud-dataflow-java-fn-api-worker:findbugsMain'.*19:14:30*
>>  > FindBugs rule violations were found. See the report at: 
>> file:///home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Commit/src/runners/google-cloud-dataflow-java/worker/build/reports/findbugs/main.xml
>>
>>
>> I traced it down to [1]. Which, from commit message is confusing me, because 
>> now fund bugs is unhappy.
>>
>>
>> Is it possible to revert? Or something else we should do?
>>
>>
>> Thanks,
>>
>>
>> michel
>>
>>
>> [1] 
>> https://github.com/apache/beam/commit/72627eb9f0b556574c23ceea06b3547d96c4107d
>>
>>

Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Mikhail Gryzykhin

Thank you for quick response Andrew.

I'll cleanup these. I'll keep the bug open and assign it to @Kenneth Knowles
 who's working on SQK for follow up: we need a way to
automatically cleanup topics.

Current suggestions:
1. Make SQL cleanup created topics
2. Cleanup topics created by SQL in tests
3. Increase quota so that topics have enough time to be cleaned up
automatically.

Regards,
--Mikhail

Have feedback <http://go/migryz-feedback>?


On Wed, Feb 6, 2019 at 8:05 AM Andrew Pilloud  wrote:

> SQL doesn't cleanup pubsub subscriptions. Feel free to delete those.
>
> Andrew
>
> On Wed, Feb 6, 2019, 8:01 AM Mikhail Gryzykhin  wrote:
>
>> +Kenneth Knowles  you're working on SQL recently, so
>> might provide some info.
>>
>> I see a lot of topics of format
>> rojects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721
>> <https://pantheon.corp.google.com/cloudpubsub/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721?project=apache-beam-testing>.
>> Seems we do not cleanup properly.
>>
>> Is it safe to cleanup topics with this name?
>>
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>> On Wed, Feb 6, 2019 at 7:46 AM Mikhail Gryzykhin 
>> wrote:
>>
>>> Minor UPD:
>>> As expected it fails most of our test jobs, since we use Pub/Subs in
>>> many tests.
>>>
>>> --Mikhail
>>>
>>> Have feedback <http://go/migryz-feedback>?
>>>
>>>
>>> On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Our python pipelines failed with limit exceeded error
>>>> <https://builds.apache.org/job/beam_PostCommit_Python_Verify/7313/console>
>>>> :
>>>>
>>>> ResourceExhausted: 429 Your project has exceeded a limit: 
>>>> (type="topics-per-project", current=1, maximum=1).
>>>>
>>>>
>>>> Does anyone know if there were new tests that use topics added recently?
>>>>
>>>> I tried to see list of topics, but UI fails
>>>> <https://pantheon.corp.google.com/cloudpubsub/topicList?project=apache-beam-testing>
>>>> to load. Will see if I can use APIs to investigate.
>>>>
>>>> If anyone has good insight, please, pick up BEAM-6610
>>>> <https://issues.apache.org/jira/browse/BEAM-6610>.
>>>>
>>>> --Mikhail
>>>>
>>>> Have feedback <http://go/migryz-feedback>?
>>>>
>>>

Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Mikhail Gryzykhin

+Kenneth Knowles  you're working on SQL recently, so might
provide some info.

I see a lot of topics of format
rojects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721
<https://pantheon.corp.google.com/cloudpubsub/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721?project=apache-beam-testing>.
Seems we do not cleanup properly.

Is it safe to cleanup topics with this name?

--Mikhail

Have feedback <http://go/migryz-feedback>?

On Wed, Feb 6, 2019 at 7:46 AM Mikhail Gryzykhin  wrote:

> Minor UPD:
> As expected it fails most of our test jobs, since we use Pub/Subs in many
> tests.
>
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin 
> wrote:
>
>> Hi everyone,
>>
>> Our python pipelines failed with limit exceeded error
>> <https://builds.apache.org/job/beam_PostCommit_Python_Verify/7313/console>
>> :
>>
>> ResourceExhausted: 429 Your project has exceeded a limit: 
>> (type="topics-per-project", current=1, maximum=1).
>>
>>
>> Does anyone know if there were new tests that use topics added recently?
>>
>> I tried to see list of topics, but UI fails
>> <https://pantheon.corp.google.com/cloudpubsub/topicList?project=apache-beam-testing>
>> to load. Will see if I can use APIs to investigate.
>>
>> If anyone has good insight, please, pick up BEAM-6610
>> <https://issues.apache.org/jira/browse/BEAM-6610>.
>>
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>

1 2 >

1 - 100 of 183 matches

Mail list logo