Re: Command line to run DatastoreIO integration tests for Java

2021-07-29 Thread Alex Amato
I was hoping for the command line to run it, so that the test could be
tweaked to inject an error and ensure the error handling code works as
expected.
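
For reference, this is roughly the shape of command I was hoping for (the
task and property names below are my guesses, assuming the SQL extension
module wires up the usual Gradle integrationTest task and pipeline-options
property; I haven't confirmed them):

./gradlew :sdks:java:extensions:sql:integrationTest \
  --tests org.apache.beam.sdk.extensions.sql.meta.provider.datastore.DataStoreReadWriteIT \
  -DintegrationTestPipelineOptions='["--project=<my-gcp-project>", "--tempRoot=gs://<my-bucket>/tmp"]'

Something like that would make it easy to tweak the test locally, inject an
error, and re-run just that one class.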

On Wed, Jul 28, 2021 at 8:34 PM Ke Wu  wrote:

> Commenting on the PR with "Run SQL PostCommit" would trigger the post commit
> integration tests for SQL, which I suppose includes DataStoreReadWriteIT.
>
> Let me know whether or not this is sufficient.
>
> Best,
> Ke
>
> On Jul 28, 2021, at 12:20 PM, Alex Amato  wrote:
>
> Is it possible to run a Datastore IO integration test to test this PR?
>
> https://github.com/apache/beam/pull/15183/files
>
> Probably this test can be run somehow, though I don't know the Gradle
> command to run it:
>
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/datastore/DataStoreReadWriteIT.java
>
> Does anyone know how to run this test?
>
>
>


Re: Java precommit failing (though no tests are failing)

2021-06-17 Thread Alex Amato
Hmm, perhaps it only happens sometimes. The other half of the time I "Run
Java Precommit" on this PR, I hit this different failure:

It isn't obvious to me whether this is related to my PR:
https://github.com/apache/beam/pull/14804
I only added some Precondition checks, but I don't see those failing
anywhere.
(Unless something indirect is causing it and the stack trace for that is not
printed, e.g. in a subprocess.)

Any ideas? Are these tests known to be failing right now?

https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3742/#showFailuresLink

 Test Result (32 failures / +32)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteScriptedUpsert
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testReadWithMetadata
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithIndexFn
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testMaxParallelRequestsPerWindow
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteRetryValidRequest
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithMaxBatchSize
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteRetry
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testReadWithQueryString
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSizes
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithMaxBatchSizeBytes
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithDocVersion
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithAllowableErrors
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithTypeFn
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteScriptedUpsert
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testReadWithQueryValueProvider
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSplit
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteRetryValidRequest
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithDocVersion
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSizes
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testMaxParallelRequestsPerWindow
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testReadWithQueryString
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWritePartialUpdate
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithMaxBatchSizeBytes
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testDefaultRetryPredicate
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithIndexFn
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithRouting
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteRetry
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testReadWithMetadata
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteFullAddressing
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithMaxBatchSize
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWriteWithIsDeleteFn
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testWrite
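
If it helps to check whether these reproduce outside Jenkins, something like
this should run just that test class locally (the Elasticsearch test module
path below is a guess on my part; adjust to whichever elasticsearch-tests-*
module applies):

./gradlew :sdks:java:io:elasticsearch-tests:elasticsearch-tests-7:test \
  --tests "org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest" --rerun-tasks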

On Wed, Jun 16, 2021 at 5:24 PM Robert Burke  wrote:

> Very odd as those paths do resolve now, redirecting to their pkg.go.dev
> paths. Very odd. This feels transient, but it's not clear why that would
> return a 404 vs some other error.
>
> On Wed, 16 Jun 2021 at 15:39, Kyle Weaver  wrote:
>
>> For tasks without structured JUnit output, we have to scroll up / ctrl-f / 
>> grep for more logs. In this case it looks like it was probably a server-side 
>> issue. These links work for me, so I'm assuming the problem has been 
>> resolved.
>>
>>
>> 11:31:04 > Task :release:go-licenses:java:dockerRun
>> 11:31:04 package google.golang.org/protobuf/reflect/protoreflect: unrecognized import path "google.golang.org/protobuf/reflect/protoreflect": reading https://google.golang.org/protobuf/reflect/protoreflect?go-get=1: 404 Not Found
>> 11:31:04 package google.golang.org/protobuf/runtime/protoimpl: unrecognized import path "google.golang.org/protobuf/runtime/protoimpl": reading https://google.golang.org/protobuf/runtime/protoimpl?go-get=1: 404 Not Found
>> 11:31:04 package google.golang.org/protobuf/types/descriptorpb: unrecognized import path "google.golang.org/protobuf/types/descriptorpb": reading https://google.golang.org/protobuf/types/descriptorpb?go-get=1: 404 Not Found
>> 11:31:04 package google.golang.org/protobuf/types/known/durationpb: unrecognized import path "google.golang.org/protobuf/types/known/durationpb": reading https://google.golang.org/protobuf/types/known/durationpb?go-get=1: 404 Not Found
>>
>>
>>
>> On Wed, Jun 16, 2021 at

Java precommit failing (though no tests are failing)

2021-06-16 Thread Alex Amato
For PR: https://github.com/apache/beam/pull/14804

Is something wrong with this machine, preventing it from running Docker?
It seems to keep happening after retrying the run a few times as well.

Is there anything I can do here to move my PR forward and get it merged?

https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3735/consoleFull

11:36:42 > Task :sdks:java:core:buildDependents
11:36:42
11:36:42 FAILURE: Build failed with an exception.
11:36:42
11:36:42 * What went wrong:
11:36:42 Execution failed for task ':release:go-licenses:java:dockerRun'.
11:36:42 > Process 'command 'docker'' finished with non-zero exit value 1
11:36:42
11:36:42 * Try:
11:36:42 Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
11:36:42
11:36:42 * Get more help at https://help.gradle.org
11:36:42
11:36:42 Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
11:36:42 Use '--warning-mode all' to show the individual deprecation warnings.
11:36:42 See https://docs.gradle.org/6.8.3/userguide/command_line_interface.html#sec:command_line_warnings
11:36:42
11:36:42 BUILD FAILED in 7m 20s
11:36:42 1134 actionable tasks: 420 executed, 712 from cache, 2 up-to-date
11:36:43
11:36:43 Publishing build scan...
11:36:43 https://gradle.com/s/3yfusnsnfll62
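
In case it helps whoever picks this up: rerunning just the failing task
locally with the extra logging Gradle suggests should surface the underlying
docker error (assuming docker is installed and runnable on that machine):

./gradlew :release:go-licenses:java:dockerRun --info --stacktrace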


Fwd: One Pager - Test Command Line Discoverability in Beam

2021-05-25 Thread Alex Amato
Friendly ping. I'll wait for more suggestions until the end of the week, then
close this out.

-- Forwarded message -
From: Alex Amato 
Date: Fri, May 21, 2021 at 2:54 PM
Subject: One Pager - Test Command Line Discoverability in Beam
To: dev 


Hi, I have had some issues determining how to run Beam tests. I have
written a one pager
<https://docs.google.com/document/d/1qGkBmHUOIVM2KfPyKhJL6oH2nAgX9Vkl0z9ZydEoWzA/edit>
for review and would like your feedback on solving the following problem:

"A Beam developer is looking at a test file, such as
“BigQueryTornadoesIT.java” and wants to run this test. But they do not know
the command line they need to type to run this test."

I would like your feedback to get toward a more concrete proposal. A few
possible solutions are mentioned in the proposal, but any solution that makes
it very easy to understand how to run a test is a viable option as well.
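
As a concrete illustration of the kind of answer I'd like to be discoverable,
the generic Gradle pattern is something like the following, where the module
path is exactly the part that is hard to discover (and integration tests
usually need extra pipeline options on top of this):

./gradlew :<module-path>:test --tests "*BigQueryTornadoesIT"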

Cheers,
Alex


One Pager - Test Command Line Discoverability in Beam

2021-05-21 Thread Alex Amato
Hi, I have had some issues determining how to run Beam tests. I have
written a one pager
<https://docs.google.com/document/d/1qGkBmHUOIVM2KfPyKhJL6oH2nAgX9Vkl0z9ZydEoWzA/edit>
for review and would like your feedback on solving the following problem:

"A Beam developer is looking at a test file, such as
“BigQueryTornadoesIT.java” and wants to run this test. But they do not know
the command line they need to type to run this test."

I would like your feedback to get toward a more concrete proposal. A few
possible solutions are mentioned in the proposal, but any solution that makes
it very easy to understand how to run a test is a viable option as well.

Cheers,
Alex


Re: pb2 file generation

2021-05-17 Thread Alex Amato
Hmm, I don't recall whether I needed to include the generated files in my PRs
or not when I modified these in the past.
After changing the relevant .proto files, like metrics.proto
<https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/model/pipeline/src/main/proto/metrics.proto#L311>,
you should be able to run this from the Python SDK dir inside a virtualenv
(https://cwiki.apache.org/confluence/display/BEAM/Python+Tips) to generate
the pb2 files from the protos:

python3 setup.py sdist


Example PR changing proto files
https://github.com/apache/beam/commit/0432f138f2bfb8d4d9543c4569581bdd3f8782db

We generally don't modify or check in the generated _pb2.py files. Just
let the command above generate them, which happens as part of
building the protos.
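
Putting that together, the full sequence from the Python Tips page (run from
sdks/python inside a Python 3 virtualenv) should be:

pip install -r build-requirements.txt
pip install -e .[gcp,test]
python3 setup.py sdist

and the _pb2.py files get regenerated as part of that build.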





On Mon, May 17, 2021 at 12:29 PM Tyson Hamilton  wrote:

> +Alex Amato  may be able to help with this.
>
> On Mon, May 17, 2021 at 7:35 AM Miguel Hernández Sandoval <
> rogelio.hernan...@wizeline.com> wrote:
>
>> Hi team,
>> I am working on BEAM-11984[1]. I want to add some labels to
>> metrics_pb2_urns.py. However, I don't know the process to generate/update
>> those files. I was hoping you could give me some guidance on this.
>>
>> Thank you all!
>>
>> -Mike
>>
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-11984
>>
>>
>>
>>
>>
>>
>>
>>
>
>


Re: [PROPOSAL] Remove pylint format checks

2021-04-09 Thread Alex Amato
Whatever the decision is, please update the instructions here :)
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips

(And if possible, let's have one simple, easy-to-remember command to run all
Python lint/formatting, possibly using a wrapper script along the lines of
the sketch below.)
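
For example, a rough sketch of such a wrapper (assuming it lives in
sdks/python and runs inside a virtualenv with the three tools installed; this
is not an existing script in the repo):

#!/bin/bash
set -e
# Sort imports, then format code, then format docstrings, all in place.
isort apache_beam
yapf --in-place --parallel --recursive apache_beam
docformatter --in-place --recursive apache_beam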


On Fri, Apr 9, 2021 at 4:59 PM Robert Bradshaw  wrote:

> I'd be happy with yapf + docformatter + isort, but I'd like to understand
> why yapf lets breakable lines go longer than 80 chars.
>
> On Fri, Apr 9, 2021 at 4:19 PM Brian Hulette  wrote:
>
>> Currently we have two different format checks for the Python SDK. Most
>> format checks are handled by yapf, which is nice since it is also capable
>> of re-writing the code to make it pass the checks done in CI. However we
>> *also* have some formatting checks still enabled in our .pylintrc [1], and
>> pylint has no such capability.
>>
>> Generally yapf's output just passes these pylint format checks, but not
>> always. For example yapf is lenient about lines over the column limit, and
>> pylint is not. So things like [2] can happen even on a PR formatted by
>> yapf. This is frustrating because it requires manual changes.
>>
>> I experimented with the yapf config to see if we can make it strict about
>> the column limit, but it doesn't seem to be possible. So instead I'd like
>> to propose that we just remove the pylint format checks, and rely on yapf's
>> checks alone.
>>
>> There are a couple issues here:
>> - we'd need to be ok with yapf deciding that some lines can be >80
>> characters
>> - yapf has no opinion whatsoever about docstrings [3], so the only thing
>> checking them is pylint. We might work around this by setting up
>> docformatter [4].
>>
>> Personally I'm ok with this if it means Python code formatting can be
>> completely automated with a single script that runs yapf, docformatter, and
>> isort.
>>
>> Brian
>>
>> [1]
>> https://github.com/apache/beam/blob/2408d0c11337b45e289736d4d7483868e717760c/sdks/python/.pylintrc#L165
>> [2]
>> https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Commit/9088/console
>> [3] https://github.com/google/yapf/issues/279
>> [4] https://github.com/myint/docformatter
>>
>


Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Alex Amato
How about a PCollection containing every element which was successfully
written? Basically the same things which were passed into it.

Then you could act on every element after it has been successfully written to
the sink.

On Wed, Mar 24, 2021 at 3:16 PM Robert Bradshaw  wrote:

> On Wed, Mar 24, 2021 at 2:36 PM Ismaël Mejía  wrote:
>
>> +dev
>>
>> Since we all agree that we should return something different than
>> PDone the real question is what should we return.
>>
>
> My proposal is that one returns a PCollection that consists,
> internally, of something contentless like nulls. This is future compatible
> with returning something more meaningful based on the source or write
> process itself, but at least this would be followable.
>
>
>> As a reminder we had a pretty interesting discussion about this
>> already in the past but uniformization of our return values has not
>> happened.
>> This thread is worth reading for Vincent or anyone who wants to
>> contribute Write transforms that return.
>>
>> https://lists.apache.org/thread.html/d1a4556a1e13a661cce19021926a5d0997fbbfde016d36989cf75a07%40%3Cdev.beam.apache.org%3E
>
>
> Yeah, we should go ahead and finally do something.
>
>
>>
>> > Returning PDone is an anti-pattern that should be avoided, but changing
>> it now would be backwards incompatible.
>>
>> Periodic reminder most IOs are still Experimental so I suppose it is
>> worth to the maintainers to judge if the upgrade to return someething
>> different of PDone is worth, in that case we can deprecate and remove
>> the previous signature in short time (2 releases was the average for
>> previous cases).
>>
>>
>> On Wed, Mar 24, 2021 at 10:24 PM Alexey Romanenko
>>  wrote:
>> >
>> > I thought that was said about returning a PCollection of write results,
>> as it's done in other IOs (as I mentioned as examples) that have
>> _additional_ write methods, like "withWriteResults()" etc., that return
>> PTransform<…, PCollection>.
>> > In this case, we keep backwards compatibility and just add new
>> functionality. Though, we need to follow the same pattern for user API and
>> maybe even naming for this feature across different IOs (like we have for
>> "readAll()" methods).
>> >
>> >  I agree that we have to avoid returning PDone for such cases.
>> >
>> > On 24 Mar 2021, at 20:05, Robert Bradshaw  wrote:
>> >
>> > Returning PDone is an anti-pattern that should be avoided, but changing
>> it now would be backwards incompatible. PRs to add non-PDone returning
>> variants (probably as another option to the builders) that compose well
>> with Wait, etc. would be welcome.
>> >
>> > On Wed, Mar 24, 2021 at 11:14 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>> >>
>> >> In this way, I think “Wait” PTransform should work for you but, as it
>> was mentioned before, it doesn’t work with PDone, only with PCollection as
>> a signal.
>> >>
>> >> Since you already adjusted your own writer for that, it would be great
>> to contribute it back to Beam in the way it was done for other IOs (for
>> example, JdbcIO [1] or BigtableIO [2]).
>> >>
>> >> In general, I think we need to have it for all IOs, at least to use
>> with "Wait", because this pattern is required quite often.
>> >>
>> >> [1]
>> https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L1078
>> >> [2]
>> https://github.com/apache/beam/blob/ab1dfa13a983d41669e70e83b11f58a83015004c/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java#L715
>> >>
>> >> On 24 Mar 2021, at 18:01, Vincent Marquez 
>> wrote:
>> >>
>> >> No, it only needs to ensure that one record seen on Pubsub has
>> successfully written to a database.  So "record by record" is fine, or even
>> "bundle".
>> >>
>> >> ~Vincent
>> >>
>> >>
>> >> On Wed, Mar 24, 2021 at 9:49 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>> >>>
>> >>> Do you want to wait until ALL records are written to Cassandra and
>> then write all successfully written records to PubSub, or should it be
>> performed "record by record"?
>> >>>
>> >>> On 24 Mar 2021, at 04:58, Vincent Marquez 
>> wrote:
>> >>>
>> >>> I have a common use case where my pipeline looks like this:
>> >>> CassandraIO.readAll -> Aggregate -> CassandraIO.write ->
>> PubSubIO.write
>> >>>
>> >>> I do NOT want my pipeline to look like the following:
>> >>>
>> >>> CassandraIO.readAll -> Aggregate -> CassandraIO.write
>> >>>                                          |
>> >>>                                          -> PubsubIO.write
>> >>>
>> >>> Because I need to ensure that only items written to Pubsub have
>> successfully finished a (quorum) write.
>> >>>
>> >>> Since CassandraIO.write is a PTransform that returns PDone, I can't
>> actually use it here, so I often roll my own 'writer', but maybe there is a
>> recommended way of doing this?
>> >>>
>> >>> Thanks in advance for any help.
>> 

Re: [Question] What is the best Beam datatype to map to BigQuery's BIGNUMERIC?

2021-03-19 Thread Alex Amato
Just skimming through the BigQuery docs quickly: they don't explicitly say
anywhere what types you should use in Java or Python for BigDecimal.
However, their APIs deal with JSON strings, which can represent large
integers as strings.
https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll

We have two methods of interacting with BigQuery. One is BigQuery export and
load jobs, which copy JSON files in and out of BigQuery.

The other is the insertAll API; the code that calls it is here:
https://github.com/apache/beam/blob/adf85a1fa18cb45fd14fe2537e26bac69b158f87/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L845

For this to work with the BigNumeric/BigDecimal type, the code would need to
construct these JSON strings properly if given a BigInteger (or something
similar), copying the values from the BigQuery class
com.google.api.services.bigquery.model.TableRow.
Since this is a BigQuery model class, hopefully it has something to support
this...

It's not entirely clear to me whether they do or do not support this in their
TableRow class. Looking at TableCell, you can set it with Objects, so perhaps
they do support something like BigInteger:
https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableCell.html

The BigQuery team may have an answer to this question. You may wish to ask
about this on Stack Overflow.
I would show them the TableRow and TableCell javadocs and ask whether
BigNumeric or BigDecimal can be used with them, and what type you should set
on the TableCell in Java.

Hope that helps you get toward a solution






On Fri, Mar 19, 2021 at 12:07 PM Mingyu Zhong  wrote:

> Beam types and BigQuery types do not have 1:1 mapping. For example, Beam
> has a few integer types and BigQuery has only INT64. So far, each Beam type
> is mapped to 1 BigQuery type. This is the first case where a Beam type can
> potentially map to 2 BigQuery types.
>
> It would be ideal if we could provide a way for users to choose which
> BigQuery type to use. If they know their values fit in NUMERIC, they can
> choose NUMERIC. Otherwise they can choose BIGNUMERIC.
>
> IIRC, the mapping in question [1] is used only in
> BigQueryUtils.toTableSchema [2], which is called in
> BigQueryTable.buildIOWriter [3] and schema inference in WriteResult [4].
>
> For [3], BigQueryTable also has a member BigQueryUtils.ConversionOptions
> [5]. We can add an option to BigQueryUtils.ConversionOptions to specify
> whether to convert Decimal to NUMERIC or BIGNUMERIC, and pass
> BigQueryUtils.ConversionOptions as a new argument to
> BigQueryUtils.toTableSchema.
>
> For [4], I'm not sure if we can add BigQueryUtils.ConversionOptions to
> WriteResult but even if we can't, users can specify the schema instead of
> using the inferred schema, so it seems fine to keep mapping Decimal to
> NUMERIC in schema inference.
>
> Does this sound reasonable?
>
> [1]
> https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L180
> [2]
> https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L391
> [3]
> https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L192
> [4]
> https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2604
> [5]
> https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L81
>
> On 2021/03/18 19:40:07, Reuven Lax  wrote:
> > It does not, which might've been a mistake. The user can pass in an
> > arbitrary BigDecimal object, and we will encode whatever scale parameter
> > is encoded. This means that for DECIMAL, each record encodes the scale.
> >
> > On Thu, Mar 18, 2021 at 12:33 PM Mingyu Zhong  wrote:
> >
> > > Just wanted to clarify: BigQuery BIGNUMERIC type costs more than NUMERIC
> > > type, so if NUMERIC is sufficient, the users likely won't want to switch to
> > > BIGNUMERIC.
> > >
> > > Does Beam DECIMAL datatype contain the precision/scale parameters in the
> > > metadata? If so, can we use those parameters to determine the mapped type?
> > >
> > > On Thu, Mar 18, 2021 at 12:08 PM Brian Hulette  wrote:
> > >
> > >> Hi Vachan,
> > >> I don't think Beam DECIMAL is really a great mapping for either
> > >> BigQuery's NUMERIC or BIGNUMERIC type. Beam's DECIMAL represents arbitrary
> > >> precision decimals [1] to map well to Java's 

Re: Adding flaky postcommit test suite

2021-03-16 Thread Alex Amato
Is it possible to make the presubmit auto-retry all failed tests a few
times (and maybe generate a report listing the flaky tests)?
Then you don't need to disable/isolate the flaky tests.

If this is not possible, or hard to set up, then manually moving them to a
different suite sounds like a good idea.

On Tue, Mar 16, 2021 at 2:11 PM Pablo Estrada  wrote:

> Hi all,
> In Beam, we sometimes hit the issue of having one or two test cases that
> are particularly flaky, and we deactivate them.
> This is completely reasonable to me, because we need to keep good testing
> signal on our primary suites.
> The danger of deactivating these tests is that, although we have good
> practices to file JIRA issues to re-enable them, it is still easy for these
> issues and tests to be forgotten.
> Of course, ideally, the solution is "do not forget old deactivated tests"
> - and we should adopt practices to ensure that.
>
> I think, to strengthen our practices, we can reinforce them with a
> pragmatic choice: Instead of fully deactivating tests, we can make them run
> in a separate suite of Flaky tests. Why would this help?
>
> - It would allow us to make sure that flaky tests continue to *be able to
> run*.
> - It would remind us that we have flaky tests that need fixing.
> - It would allow us to experiment fixes to these tests on the Flaky suite,
> and once they're reliable, move them to the main suite.
>
> Does this make sense to others?
> Best
> -P.
>


Re: Please help triage issues!

2021-03-15 Thread Alex Amato
(Do I need certain permissions to be able to do this?)

On Mon, Mar 15, 2021 at 9:47 AM Alex Amato  wrote:

> Would you mind posting a screenshot of exactly where you are supposed to
> click to move a jira issue to "Open" status? I honestly can't find where to
> click. I don't see the option in the edit dialog box
>
> On Sun, Mar 14, 2021 at 8:03 PM Kenneth Knowles  wrote:
>
>> No need for feeling any guilt :-)
>>
>> I'm just hoping that by everyone randomly doing a very small amount of
>> work, this could be in good shape very quickly. I've done a number of bulk
>> edits like automated dependency upgrade requests which brings the number
>> down to just over 600.
>>
>> Your message does highlight some easy cases: issues filed to track your
>> own feature work. I did build automation for this: "On Issue Created" ->
>> "If Assignee == Issue Creator" -> "Transition to 'Open'". If the automation
>> isn't working, that can probably be fixed. Some of the issues might just
>> predate the automation.
>>
>> To be super clear: I don't mean to ask anyone to waste time looking at
>> things that don't need attention, but to be able to notice things that do
>> need attention. I did a few manually too, and the components, issue type,
>> and priority very often need fixing up. I especially want to get untriaged
>> P0s and P1s to zero.
>>
>> Kenn
>>
>> On Fri, Mar 12, 2021 at 5:07 PM Tyson Hamilton 
>> wrote:
>>
>>> I'm guilty of creating issues and not moving them to 'open'. I'll do
>>> better to move them to open in the future. To recompense I will spend some
>>> additional time triaging =)
>>>
>>> Thanks for the review of the flow.
>>>
>>> On Thu, Mar 11, 2021 at 12:39 PM Kenneth Knowles 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> You may or may not think about this very often, but our Jira workflow
>>>> goes like this:
>>>>
>>>> Needs Triage --> Open --> In Progress --> Resolved
>>>>
>>>> "Needs Triage" means someone needs to look at it briefly:
>>>>
>>>>  - component(s)
>>>>  - label(s)
>>>>  - issue type
>>>>  - priority (see https://beam.apache.org/contribute/jira-priorities/)
>>>>  - if appropriate, ping someone or write to dev@ especially for P1 and
>>>> P0
>>>>
>>>> Then transition the issue to "Open".
>>>>
>>>> Currently there is a big backlog but I don't think it is actually
>>>> accurate. I also think we have enough people to keep up with this and even
>>>> to eliminate the backlog pretty quick.
>>>>
>>>> Here are some things you can do when you are waiting for Jenkins tests
>>>> to complete:
>>>>
>>>>  - check your assigned issues
>>>>  - open up this filter and triage a couple issues at random:
>>>> https://issues.apache.org/jira/issues/?filter=12345682
>>>>
>>>> 800+ may seem like a lot, but dev@ had 65 participants in the last 28
>>>> days (126 participants in the last 3 months). I would guess it averages
>>>> less than a minute per issue so this could be done in less than a day,
>>>> especially considering our CI times :-)
>>>>
>>>> Kenn
>>>>
>>>>


Re: Please help triage issues!

2021-03-15 Thread Alex Amato
Would you mind posting a screenshot of exactly where you are supposed to
click to move a jira issue to "Open" status? I honestly can't find where to
click. I don't see the option in the edit dialog box

On Sun, Mar 14, 2021 at 8:03 PM Kenneth Knowles  wrote:

> No need for feeling any guilt :-)
>
> I'm just hoping that by everyone randomly doing a very small amount of
> work, this could be in good shape very quickly. I've done a number of bulk
> edits like automated dependency upgrade requests which brings the number
> down to just over 600.
>
> Your message does highlight some easy cases: issues filed to track your
> own feature work. I did build automation for this: "On Issue Created" ->
> "If Assignee == Issue Creator" -> "Transition to 'Open'". If the automation
> isn't working, that can probably be fixed. Some of the issues might just
> predate the automation.
>
> To be super clear: I don't mean to ask anyone to waste time looking at
> things that don't need attention, but to be able to notice things that do
> need attention. I did a few manually too, and the components, issue type,
> and priority very often need fixing up. I especially want to get untriaged
> P0s and P1s to zero.
>
> Kenn
>
> On Fri, Mar 12, 2021 at 5:07 PM Tyson Hamilton  wrote:
>
>> I'm guilty of creating issues and not moving them to 'open'. I'll do
>> better to move them to open in the future. To recompense I will spend some
>> additional time triaging =)
>>
>> Thanks for the review of the flow.
>>
>> On Thu, Mar 11, 2021 at 12:39 PM Kenneth Knowles  wrote:
>>
>>> Hi all,
>>>
>>> You may or may not think about this very often, but our Jira workflow
>>> goes like this:
>>>
>>> Needs Triage --> Open --> In Progress --> Resolved
>>>
>>> "Needs Triage" means someone needs to look at it briefly:
>>>
>>>  - component(s)
>>>  - label(s)
>>>  - issue type
>>>  - priority (see https://beam.apache.org/contribute/jira-priorities/)
>>>  - if appropriate, ping someone or write to dev@ especially for P1 and
>>> P0
>>>
>>> Then transition the issue to "Open".
>>>
>>> Currently there is a big backlog but I don't think it is actually
>>> accurate. I also think we have enough people to keep up with this and even
>>> to eliminate the backlog pretty quick.
>>>
>>> Here are some things you can do when you are waiting for Jenkins tests
>>> to complete:
>>>
>>>  - check your assigned issues
>>>  - open up this filter and triage a couple issues at random:
>>> https://issues.apache.org/jira/issues/?filter=12345682
>>>
>>> 800+ may seem like a lot, but dev@ had 65 participants in the last 28
>>> days (126 participants in the last 3 months). I would guess it averages
>>> less than a minute per issue so this could be done in less than a day,
>>> especially considering our CI times :-)
>>>
>>> Kenn
>>>
>>>


Re: Can we solve WindowFn.getOutputTime another way?

2021-02-10 Thread Alex Amato
On Wed, Feb 10, 2021 at 12:14 PM Kenneth Knowles  wrote:

> On a PR (https://github.com/apache/beam/pull/13927) we got into a
> discussion of a very old and strange feature of Beam that I think we should
> revisit.
>
> The WindowFn has the ability to shift timestamps forward in order to
> unblock downstream watermarks. Why? Specifically in this situation:
>
>  - aggregation/GBK with overlapping windows like SlidingWindows
>  - timestamp combiner of the aggregated outputs is EARLIEST of the inputs
>  - there is another downstream aggregation/GBK
>
> The output watermark of the upstream aggregation is held to the minimum of
> the inputs. When an output is emitted, we desire the output to flow through
> the rest of the pipeline without delay. However, the downstream aggregation
> can (and often will) be delayed by the window size because of *watermark
> holds in other later windows that are not released until those windows
> output.*
>
Could you describe this a bit more? Why would later windows hold up the
watermark for upstream steps? (Is it due to some subtlety, such as tracking
the watermark for each stage rather than for each step?)


>
> To avoid this problem, element x in window w will have its timestamp
> shifted to not overlap with any earlier windows. It is a weird behavior. It
> fixes the watermark hold problem but introduces a strange output with a
> mysterious timestamp that is hard to justify.
>
> Any other ideas?
>
> Kenn
>


Re: Docker Development Environment

2020-11-30 Thread Alex Amato
If any of these are suitable for at least some development, I propose we
merge them and update them with fixes later, rather than trying to get
things 100% working in the first PR.

Looks like this one was opened in early Sept and never got merged, which
is a pretty long time. Perhaps it was abandoned in favor of the later one?
https://github.com/apache/beam/pull/12837

This one looks like it's failing on just a few tests (which may be
addressed soon, but the PR was opened 19 days ago).
https://github.com/apache/beam/pull/13308
(Can we set a deadline for this, and just say we merge it by the end of the
week, regardless of whether the last two tests can be fixed or not?)

(Would like to nudge this along, as it's been a pain point for many for a
while now).

Thanks for the work here Niels, Omar and Sam.
Looking forward to giving this a try :)
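
For anyone who wants to try the general pattern before either PR lands, the
workflow is basically this (the image name and Dockerfile path here are
hypothetical placeholders, not what either PR actually uses):

docker build -t beam-dev -f <path/to/dev/Dockerfile> .
docker run -it --rm -v "$(pwd)":/home/beam -w /home/beam beam-dev bash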


On Mon, Nov 30, 2020 at 11:32 AM Brian Hulette  wrote:

> I agree this is a good idea. I remember my first experience with Beam
> development - I ran through the steps at [1] and had `./gradlew check`
> fail. I don't think I ever got it working before moving on and just running
> more specific tasks.
> It would be great if we had a reliable way for new contributors to
> establish an environment that can successfully run `./gradlew check`.
>
> Niels Basjes' PR (https://github.com/apache/beam/pull/13308) seems to be
> close to that, so I think we should focus on getting that working and
> iterate from there. Omar concurred with that in
> https://github.com/apache/beam/pull/12837.
>
> [1] https://beam.apache.org/contribute/#development-setup
>
>
> On Wed, Nov 25, 2020 at 3:39 PM Ahmet Altay  wrote:
>
>> Thank you for doing this.
>>
>> I have seen a few related PRs. Connecting them here in case these efforts
>> could be combined:
>> - https://github.com/apache/beam/pull/12837 (/cc +Omar Ismail
>>  )
>> - https://github.com/apache/beam/pull/13308
>>
>> Ahmet
>>
>> On Wed, Nov 25, 2020 at 2:53 PM Sam Rohde  wrote:
>>
>>> Hi All,
>>>
>>> I got tired of my local dev environment being ruined by updates so I
>>> made a container for Apache Beam development work. What this does is create
>>> a Docker container from the Ubuntu Groovy image and load it up with all the
>>> necessary libraries/utilities for Apache Beam development. Then I run an
>>> interactive shell in the Docker container where I do my work.
>>>
>>> This is a nice way for new contributors to easily get started. However
>>> with the container in its current form, I don't know if this will help
>>> other people because it is tied closely with my workflow (using VIM,
>>> YouCompleteMe, for Python). But I think it can be a nice starting point for
>>> improvements. For example:
>>>
>>>- Sharing the host display with Docker to start GUI applications
>>>(like IntelliJ) in the container
>>>- Adding Golang development support
>>>
>>> Here's a draft PR , let me
>>> know what you think, how it can be improved, and whether it's a good idea
>>> for us to have a dev container like this.
>>>
>>> Regards,
>>> Sam
>>>
>>>


Re: beam flink-runner distribution implementation

2020-11-19 Thread Alex Amato
Are you referring to a "Flink Gauge" or a "Beam Gauge"? Are you suggesting
packaging it as a "Flink Histogram" (i.e., a Flink-runner-specific concept
of histograms)? If so, that seems fine and I have no comment here.

FWIW,
I proposed a "Beam Histogram" metric (bucket counts).
https://s.apache.org/beam-histogram-metrics

(No runner implements this, and most likely I will not be pursuing it
further, due to a change of priority/interest around the metric I was
interested in using it for.)
I was intending to use it for a specific set of metrics (no plans to
provide a user-defined Histogram metric API):
https://s.apache.org/beam-gcp-debuggability

I don't think we should pursue any plans to package "Beam Distributions" as
"Beam Histograms", as a "Beam Histogram" is essentially several counters (one
for each bucket). Changing all usage of beam.distribution to "Beam
Histograms" would have performance implications, and is not advised. If at
some point "Beam Histograms" are implemented, migrating the usage of
Metrics.distribution to histograms should be done on an individual basis.





On Thu, Nov 19, 2020 at 5:47 PM Robert Bradshaw  wrote:

> Gauge certainly seems wrong for DistributionResult. Yes, using a
> Histogram would be a welcome PR.
>
> On Thu, Nov 19, 2020 at 12:58 PM Kyle Weaver  wrote:
> >
> > What are the advantages of using a Histogram instead of a Gauge?
> >
> > Also, check out this design doc for adding histogram metrics to Beam if
> you haven't already: http://s.apache.org/beam-metrics-api (Not sure what
> the current status is.)
> >
> > On Wed, Nov 18, 2020 at 1:37 PM Richard Moorhead <
> richard.moorh...@gmail.com> wrote:
> >>
> >> Beam's DistributionResult is implemented as a Gauge within the Flink
> runner. Can someone explain the rationale behind this? Would a PR to
> utilize a Histogram be acceptable?
>


Re: Unable to run Python formatter (Are the instructions out of date?)

2020-11-04 Thread Alex Amato
Well, running as sudo just hits a different error

(my-virtual-env-3.6.10) ajamato@ajamato-linux0:~/beam/sdks/python$ sudo tox
-e py36-lint
[sudo] password for ajamato:
GLOB sdist-make: /usr/local/google/home/ajamato/beam/sdks/python/setup.py
py36-lint create:
/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint
ERROR: invocation failed (exit code 1), logfile:
/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/log/py36-lint-0.log
===
log start

RuntimeError: failed to query /usr/bin/python3.6 with code 1 err:
'Traceback (most recent call last):\n  File
"/usr/lib/python3/dist-packages/virtualenv/discovery/py_info.py", line 16,
in \nfrom distutils import dist\nImportError: cannot import
name \'dist\'\n'


log end
=
ERROR: InvocationError for command /usr/bin/python3 -m virtualenv
--no-download --python /usr/bin/python3.6 py36-lint (exited with code 1)

summary
_
ERROR:   py36-lint: InvocationError for command /usr/bin/python3 -m
virtualenv --no-download --python /usr/bin/python3.6 py36-lint (exited with
code 1)

On Wed, Nov 4, 2020 at 10:16 PM Alex Amato  wrote:

> I see. Well, I have set up a new virtualenv using pyenv for Python 3.6.10.
>
> Then I ran these steps from the Python Tips guide, plus the tox and pytest installs:
>
> # Install setup.py requirements.
> (env) $ pip install -r build-requirements.txt
>
> # Install packages.
> (env) $ pip install -e .[gcp,test]
>
> (env) $ pip install pytest
>
> (env) $ pip install tox
> Tried running tox -e py36-lint and tox -e py37-lint
>
>
> (I realized that I might only be able to run py36-lint from my 3.6.10
> environment, but I'm not sure.)
> Got this for py36-lint:
>
> ===
> log start
> 
> ERROR: invocation failed (exit code 1), logfile:
> /usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/log/py36-lint-1.log
> ===
> log start
> 
> Traceback (most recent call last):
>   File "target/.tox/py36-lint/bin/pip", line 5, in 
> from pip._internal.cli.main import main
>   File
> "/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/lib/python3.6/site-packages/pip/_internal/cli/main.py",
> line 10, in 
> from pip._internal.cli.autocompletion import autocomplete
>   File
> "/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/lib/python3.6/site-packages/pip/_internal/cli/autocompletion.py",
> line 9, in 
> from pip._internal.cli.main_parser import create_main_parser
>   File
> "/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/lib/python3.6/site-packages/pip/_internal/cli/main_parser.py",
> line 7, in 
> from pip._internal.cli import cmdoptions
>   File
> "/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/lib/python3.6/site-packages/pip/_internal/cli/cmdoptions.py",
> line 24, in 
> from pip._internal.cli.progress_bars import BAR_TYPES
>   File
> "/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/lib/python3.6/site-packages/pip/_internal/cli/progress_bars.py",
> line 7, in 
> from pip._vendor import six
> ImportError: cannot import name 'six'
>
> 
> log end
> =
> ERRO

Re: Unable to run Python formatter (Are the instructions out of date?)

2020-11-04 Thread Alex Amato
I see. Well, I have set up a new virtualenv using pyenv for Python 3.6.10.

Then I ran these steps from the Python Tips guide, plus the tox and pytest installs:

# Install setup.py requirements.
(env) $ pip install -r build-requirements.txt

# Install packages.
(env) $ pip install -e .[gcp,test]

(env) $ pip install pytest

(env) $ pip install tox
Tried running tox -e py36-lint and tox -e py37-lint


(I realized that I might only be able to run py36-lint from my 3.6.10
environment, but I'm not sure.)
Got this for py36-lint:

===
log start

ERROR: invocation failed (exit code 1), logfile:
/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/log/py36-lint-1.log
===
log start

Traceback (most recent call last):
  File "target/.tox/py36-lint/bin/pip", line 5, in 
from pip._internal.cli.main import main
  File
"/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/lib/python3.6/site-packages/pip/_internal/cli/main.py",
line 10, in 
from pip._internal.cli.autocompletion import autocomplete
  File
"/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/lib/python3.6/site-packages/pip/_internal/cli/autocompletion.py",
line 9, in 
from pip._internal.cli.main_parser import create_main_parser
  File
"/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/lib/python3.6/site-packages/pip/_internal/cli/main_parser.py",
line 7, in 
from pip._internal.cli import cmdoptions
  File
"/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/lib/python3.6/site-packages/pip/_internal/cli/cmdoptions.py",
line 24, in 
from pip._internal.cli.progress_bars import BAR_TYPES
  File
"/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/lib/python3.6/site-packages/pip/_internal/cli/progress_bars.py",
line 7, in 
from pip._vendor import six
ImportError: cannot import name 'six'


log end
=
ERROR: could not install deps [-rbuild-requirements.txt]; v =
InvocationError('/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/bin/python
target/.tox/py36-lint/bin/pip install --retries 10
-rbuild-requirements.txt', 1)

summary
_
ERROR:   py36-lint: could not install deps [-rbuild-requirements.txt]; v =
InvocationError('/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py36-lint/bin/python
target/.tox/py36-lint/bin/pip install --retries 10
-rbuild-requirements.txt', 1)
==

I tried pip install six as well, but I am met with:
Requirement already satisfied: six in
/usr/local/google/home/ajamato/.pyenv/versions/3.6.10/envs/my-virtual-env-3.6.10/lib/python3.6/site-packages
(1.15.0)



I am guessing something is preventing tox from doing some steps? Does one
normally run tox under sudo?


On Wed, Nov 4, 2020 at 10:05 PM Chad Dombrova  wrote:

>
> All of these are great suggestions. I think what I really need though is
>> some way to figure out how to cleanly install (perhaps reinstalling)
>> everything I need to run all these commands. tox, yapf,
>>
>
> tox should be the only thing you need to install.  After that, tox will
> install whatever you need to run the tests.  pre-commit accomplishes
> something similar, but just for the pre-commit git hooks.
>
> -chad
>
>


Re: Unable to run Python formatter (Are the instructions out of date?)

2020-11-04 Thread Alex Amato
All of these are great suggestions. I think what I really need, though, is
some way to figure out how to cleanly install (perhaps reinstall)
everything I need to run all these commands: tox, yapf, etc.

As it is, I keep getting errors, try to install a dep I think I am missing,
rinse and repeat, and never quite get to a state where I have reliable tooling.
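
In case anyone else ends up in the same state, the cleanest path I can piece
together from this thread is roughly the following (the Python version and
venv location are just an example; any fresh Python 3 virtualenv should do):

python3.7 -m venv ~/beam-venv && source ~/beam-venv/bin/activate
pip install -U pip setuptools tox pre-commit
cd sdks/python
tox -e py37-lint
tox -e py3-yapf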

On Mon, Nov 2, 2020 at 12:40 PM Sam Rohde  wrote:

> I personally run `tox -e py37-lint` and `tox -e py3-yapf` from the
> root/sdks/python directory and that catches most stuff. If you are adding
> type annotations then also running `tox -e py37-mypy` is a good choice.
> Note that tox supports tab completion, so you can see all the different
> options by double-pressing tab with `tox -e` in the root/sdks/python
> directory.
>
> On Wed, Oct 28, 2020 at 8:52 PM Alex Amato  wrote:
>
>> Thanks Chad, this was helpful. :)
>>
>> Btw, I think this helps my PR format somewhat, but some more checks are
>> run, not covered by this tool, when I push the PR.
>>
>> My PR is running more checks under
>> *:sdks:python:test-suites:tox:py37:mypyPy37*
>>
>> I am curious if anyone knows a good command line to try before pushing
>> PRs to catch these issues locally first? (I had one in the past, but I
>> think its outdated).
>>
>>
>>
>> On Wed, Oct 28, 2020 at 8:41 PM Pablo Estrada  wrote:
>>
>>> woah I didn't know about this tool at all Chad. It looks nice : )
>>> FWIW, if you feel up to it, I've given you edit access to the Beam wiki (
>>> https://cwiki.apache.org/confluence/display/BEAM) in case you'd like to
>>> add the tip.
>>> Thanks!
>>> -P.
>>>
>>> On Wed, Oct 28, 2020 at 8:09 PM Chad Dombrova  wrote:
>>>
>>>> I would like to edit it!  I have an Apache account and I am a committer,
>>>> but IIRC I could not edit it with my normal credentials.
>>>>
>>>>
>>>> On Wed, Oct 28, 2020 at 8:02 PM Robert Burke 
>>>> wrote:
>>>>
>>>>> (it's a wiki, so anyone who requests and account can improve it)
>>>>>
>>>>> On Wed, Oct 28, 2020, 7:45 PM Chad Dombrova  wrote:
>>>>>
>>>>>> It’s unfortunate that those instructions don’t include pre-commit,
>>>>>> which is by far the easiest way to do this.
>>>>>>
>>>>>> To set it up:
>>>>>>
>>>>>> pip install pre-commit
>>>>>> pre-commit install
>>>>>>
>>>>>> Install sets up git pre-commit hooks so that it will run yapf and
>>>>>> pylint on changed files every time you commit (you’ll need python3.7. I
>>>>>> think it should be possible to loosen this, as this has been an annoyance
>>>>>> for me)
>>>>>>
>>>>>> To skip running the check on commit add -n:
>>>>>>
>>>>>> git commit -nm "blah blah"
>>>>>>
>>>>>> Alternatively, to run the check manually on changed files (pre-commit
>>>>>> install is not required to run it this way):
>>>>>>
>>>>>> pre-commit run yapf
>>>>>>
>>>>>> Or on all files:
>>>>>>
>>>>>> pre-commit run -a yapf
>>>>>>
>>>>>> More info here: https://pre-commit.com/#config-language_version
>>>>>>
>>>>>> On Wed, Oct 28, 2020 at 6:46 PM Alex Amato 
>>>>>> wrote:
>>>>>>
>>>>>>> I tried both the tox and yapf instructions on the python tips page
>>>>>>> <https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-Formatting>.
>>>>>>> And the gradle target which failed on PR precommit. I am wondering if 
>>>>>>> there
>>>>>>> is something additional I need to setup?
>>>>>>>
>>>>>>> Here is the output from all three approaches I attempted.
>>>>>>> Any ideas how to get this working?
>>>>>>>
>>>>>>> *(ajamato_env2) ajamato@ajamato-linux0:~/beam/sdks/python$ git diff
>>>>>>> --name-only --relative bigquery_python_sdk origin/master | xargs yapf
>>>>>>> --in-place*
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/local/google/home/ajamato/.local/bin/yapf", line 8, in
>>>>>>> 
>

Re: Unable to run Python formatter (Are the instructions out of date?)

2020-10-28 Thread Alex Amato
Thanks Chad, this was helpful. :)

Btw, I think this helps my PR format somewhat, but some more checks are
run, not covered by this tool, when I push the PR.

My PR is running more checks under
*:sdks:python:test-suites:tox:py37:mypyPy37*

I am curious if anyone knows a good command line to try before pushing PRs
to catch these issues locally first? (I had one in the past, but I think
it's outdated.)
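
(Presumably either of these would reproduce that check locally, though I
haven't verified it yet: the same Gradle target the PR runs, or the
corresponding tox environment from sdks/python.)

./gradlew :sdks:python:test-suites:tox:py37:mypyPy37
tox -e py37-mypy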



On Wed, Oct 28, 2020 at 8:41 PM Pablo Estrada  wrote:

> woah I didn't know about this tool at all Chad. It looks nice : )
> FWIW, if you feel up to it, I've given you edit access to the Beam wiki (
> https://cwiki.apache.org/confluence/display/BEAM) in case you'd like to
> add the tip.
> Thanks!
> -P.
>
> On Wed, Oct 28, 2020 at 8:09 PM Chad Dombrova  wrote:
>
>> I would like to edit it!  I have an Apache account and I am a committer,
>> but IIRC I could not edit it with my normal credentials.
>>
>>
>> On Wed, Oct 28, 2020 at 8:02 PM Robert Burke  wrote:
>>
>>> (it's a wiki, so anyone who requests and account can improve it)
>>>
>>> On Wed, Oct 28, 2020, 7:45 PM Chad Dombrova  wrote:
>>>
>>>> It’s unfortunate that those instructions don’t include pre-commit,
>>>> which is by far the easiest way to do this.
>>>>
>>>> To set it up:
>>>>
>>>> pip install pre-commit
>>>> pre-commit install
>>>>
>>>> Install sets up git pre-commit hooks so that it will run yapf and
>>>> pylint on changed files every time you commit (you’ll need python3.7. I
>>>> think it should be possible to loosen this, as this has been an annoyance
>>>> for me)
>>>>
>>>> To skip running the check on commit add -n:
>>>>
>>>> git commit -nm "blah blah"
>>>>
>>>> Alternatively, to run the check manually on changed files (pre-commit
>>>> install is not required to run it this way):
>>>>
>>>> pre-commit run yapf
>>>>
>>>> Or on all files:
>>>>
>>>> pre-commit run -a yapf
>>>>
>>>> More info here: https://pre-commit.com/#config-language_version
>>>>
>>>> On Wed, Oct 28, 2020 at 6:46 PM Alex Amato  wrote:
>>>>
>>>>> I tried both the tox and yapf instructions on the python tips page
>>>>> <https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-Formatting>.
>>>>> And the gradle target which failed on PR precommit. I am wondering if 
>>>>> there
>>>>> is something additional I need to setup?
>>>>>
>>>>> Here is the output from all three approaches I attempted. Any
>>>>> ideas how to get this working?
>>>>>
>>>>> *(ajamato_env2) ajamato@ajamato-linux0:~/beam/sdks/python$ git diff
>>>>> --name-only --relative bigquery_python_sdk origin/master | xargs yapf
>>>>> --in-place*
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/local/google/home/ajamato/.local/bin/yapf", line 8, in
>>>>> 
>>>>> sys.exit(run_main())
>>>>>   File
>>>>> "/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/__init__.py",
>>>>> line 365, in run_main
>>>>> sys.exit(main(sys.argv))
>>>>>   File
>>>>> "/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/__init__.py",
>>>>> line 135, in main
>>>>> verbose=args.verbose)
>>>>>   File
>>>>> "/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/__init__.py",
>>>>> line 204, in FormatFiles
>>>>> in_place, print_diff, verify, quiet, verbose)
>>>>>   File
>>>>> "/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/__init__.py",
>>>>> line 233, in _FormatFile
>>>>> logger=logging.warning)
>>>>>   File
>>>>> "/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/yapflib/yapf_api.py",
>>>>> line 100, in FormatFile
>>>>> verify=verify)
>>>>>   File
>>>>> "/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/yapflib/yapf_api.py",
>>>>> line 147, in FormatCode
>>>>> tree = pytree_utils.ParseCodeToTree(unformatted_source)
>>>>>   File
>>&g

Unable to run Python formatter (Are the instructions out of date?)

2020-10-28 Thread Alex Amato
I tried both the tox and yapf instructions on the python tips page
<https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-Formatting>,
and the Gradle target which failed on PR precommit. I am wondering if there
is something additional I need to set up?

Here is the output from all three approaches I attempted. Any
ideas how to get this working?

*(ajamato_env2) ajamato@ajamato-linux0:~/beam/sdks/python$ git diff
--name-only --relative bigquery_python_sdk origin/master | xargs yapf
--in-place*
Traceback (most recent call last):
  File "/usr/local/google/home/ajamato/.local/bin/yapf", line 8, in 
sys.exit(run_main())
  File
"/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/__init__.py",
line 365, in run_main
sys.exit(main(sys.argv))
  File
"/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/__init__.py",
line 135, in main
verbose=args.verbose)
  File
"/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/__init__.py",
line 204, in FormatFiles
in_place, print_diff, verify, quiet, verbose)
  File
"/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/__init__.py",
line 233, in _FormatFile
logger=logging.warning)
  File
"/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/yapflib/yapf_api.py",
line 100, in FormatFile
verify=verify)
  File
"/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/yapflib/yapf_api.py",
line 147, in FormatCode
tree = pytree_utils.ParseCodeToTree(unformatted_source)
  File
"/usr/local/google/home/ajamato/.local/lib/python2.7/site-packages/yapf/yapflib/pytree_utils.py",
line 127, in ParseCodeToTree
raise e
  File "apache_beam/metrics/execution.pxd", line 18
cimport cython
 ^
SyntaxError: invalid syntax

*(ajamato_env2) ajamato@ajamato-linux0:~/beam/sdks/python$ tox -e py3-yapf*
GLOB sdist-make: /usr/local/google/home/ajamato/beam/sdks/python/setup.py
py3-yapf create:
/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py3-yapf
ERROR: invocation failed (exit code 1), logfile:
/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py3-yapf/log/py3-yapf-0.log
===
log start

RuntimeError: failed to build image pkg_resources because:
Traceback (most recent call last):
  File
"/usr/lib/python3/dist-packages/virtualenv/seed/embed/via_app_data/via_app_data.py",
line 60, in _install
installer.install(creator.interpreter.version_info)
  File
"/usr/lib/python3/dist-packages/virtualenv/seed/embed/via_app_data/pip_install/base.py",
line 42, in install
self._sync(filename, into)
  File
"/usr/lib/python3/dist-packages/virtualenv/seed/embed/via_app_data/pip_install/copy.py",
line 13, in _sync
copy(src, dst)
  File "/usr/lib/python3/dist-packages/virtualenv/util/path/_sync.py", line
53, in copy
method(norm(src), norm(dest))
  File "/usr/lib/python3/dist-packages/virtualenv/util/path/_sync.py", line
64, in copytree
shutil.copy(src_f, dest_f)
  File "/usr/lib/python3.8/shutil.py", line 415, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.8/shutil.py", line 261, in copyfile
with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory:
'/usr/local/google/home/ajamato/beam/sdks/python/target/.tox/py3-yapf/lib/python3.8/site-packages/pkg_resources/_vendor/packaging/__init__.py'



log end
=
ERROR: InvocationError for command /usr/bin/python3 -m virtualenv
--no-download --python /usr/bin/python3 py3-yapf (exited with code 1)

summary
_
ERROR:   py3-yapf: InvocationError for command /usr/bin/python3 -m
virtualenv --no-download --python /usr/bin/python3 py3-yapf (exited with
code 1)
(ajamato_env2) ajamato@ajamato-linux0:~/beam/sdks/python$



*ajamato@ajamato-linux0:~/beam$ ./gradlew
:sdks:python:test-suites:tox:py38:formatter*
To honour the JVM settings for this build a new JVM will be forked. Please
consider using the daemon:
https://docs.gradle.org/6.6.1/userguide/gradle_daemon.html.
Daemon will be stopped at the end of the build stopping after processing
Configuration on demand is an incubating feature.

> Task :sdks:python:test-suites:tox:py38:formatter
GLOB sdist-make:
/usr/local/google/home/ajamato/beam/sdks/python/test-suites/tox/py38/build/srcs/sdks/python/setup.py
py3-yapf-check recreate:

Re: Guarding Python setup function in __main__ checker

2020-10-28 Thread Alex Amato
Found this Stack Overflow question:
https://stackoverflow.com/questions/59831397/run-setup-function-from-setuptools-only-if-name-main

From what I understand, normally it's not needed since setup.py is generally
not imported. But yes, a main guard would prevent code from running if
setup.py is imported.
I think it would be best to first understand why setup.py is being imported in
the first place, which is a bit odd.

Does one of our dependencies or tools do this? Is that tool making an
assumption that we have main-guarded our setup.py?
It seems like run_integration_test.sh
runs setup.py itself, via nose. Is this an intended behaviour of nose? Is it
doing some setup first and then running setup.py?



On Wed, Oct 28, 2020 at 1:38 PM Heejong Lee  wrote:

> I've encountered the following error while I was testing Python
> integration tests via `run_integration_test.sh` on MacOS:
>
> Traceback (most recent call last):
>   File "", line 1, in 
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/multiprocessing/spawn.py",
> line 116, in spawn_main
> exitcode = _main(fd, parent_sentinel)
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/multiprocessing/spawn.py",
> line 125, in _main
> prepare(preparation_data)
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/multiprocessing/spawn.py",
> line 236, in prepare
> _fixup_main_from_path(data['init_main_from_path'])
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/multiprocessing/spawn.py",
> line 287, in _fixup_main_from_path
> main_content = runpy.run_path(main_path,
>   File "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/runpy.py", line
> 265, in run_path
> return _run_module_code(code, init_globals, run_name,
>   File "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/runpy.py", line
> 97, in _run_module_code
> _run_code(code, mod_globals, init_globals,
>   File "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/runpy.py", line
> 87, in _run_code
> exec(code, run_globals)
>   File "/Users/heejong/Work/beam/sdks/python/setup.py", line 259, in
> 
> setuptools.setup(
>   File
> "/Users/heejong/Work/beam/build/gradleenv/192237/lib/python3.8/site-packages/setuptools/__init__.py",
> line 153, in setup
> return distutils.core.setup(**attrs)
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/distutils/core.py",
> line 148, in setup
> dist.run_commands()
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/distutils/dist.py",
> line 966, in run_commands
> self.run_command(cmd)
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/distutils/dist.py",
> line 985, in run_command
> cmd_obj.run()
>   File
> "/Users/heejong/Work/beam/build/gradleenv/192237/lib/python3.8/site-packages/nose/commands.py",
> line 158, in run
> TestProgram(argv=argv, config=self.__config)
>   File
> "/Users/heejong/Work/beam/build/gradleenv/192237/lib/python3.8/site-packages/nose/core.py",
> line 118, in __init__
> unittest.TestProgram.__init__(
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/unittest/main.py", line
> 100, in __init__
> self.parseArgs(argv)
>   File
> "/Users/heejong/Work/beam/build/gradleenv/192237/lib/python3.8/site-packages/nose/core.py",
> line 145, in parseArgs
> self.config.configure(argv, doc=self.usage())
>   File
> "/Users/heejong/Work/beam/build/gradleenv/192237/lib/python3.8/site-packages/nose/config.py",
> line 346, in configure
> self.plugins.configure(options, self)
>   File
> "/Users/heejong/Work/beam/build/gradleenv/192237/lib/python3.8/site-packages/nose/plugins/manager.py",
> line 284, in configure
> cfg(options, config)
>   File
> "/Users/heejong/Work/beam/build/gradleenv/192237/lib/python3.8/site-packages/nose/plugins/manager.py",
> line 99, in __call__
> return self.call(*arg, **kw)
>   File
> "/Users/heejong/Work/beam/build/gradleenv/192237/lib/python3.8/site-packages/nose/plugins/manager.py",
> line 167, in simple
> result = meth(*arg, **kw)
>   File
> "/Users/heejong/Work/beam/build/gradleenv/192237/lib/python3.8/site-packages/nose_xunitmp.py",
> line 42, in configure
> manager = multiprocessing.Manager()
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/multiprocessing/context.py",
> line 57, in Manager
> m.start()
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/multiprocessing/managers.py",
> line 579, in start
> self._process.start()
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/multiprocessing/process.py",
> line 121, in start
> self._popen = self._Popen(self)
>   File
> "/Users/heejong/.pyenv/versions/3.8.6/lib/python3.8/multiprocessing/context.py",
> line 284, in _Popen
> return Popen(process_obj)
>   File
> 

Java precommit errors "cannot find symbol"

2020-10-13 Thread Alex Amato
I am seeing cannot find symbol errors in two separate PR precommits.

I am not sure how to get more specific info, but they fail with "cannot
find symbol" in files:
ContextualTextIO.java:246

BigQueryIO.java:723


Build links:
https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/14029/java/new/

https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/14041/java/

PRs:
https://github.com/apache/beam/pull/13083

https://github.com/apache/beam/pull/13078

This one seems like it would be broken for everyone, as I am seeing it in
two separate precommits. Any ideas on how to find the specific symbols and
diagnose/fix this issue?

Thanks for your help in advance :),
Alex


PCollectionVisualizationTest.test_dynamic_plotting_return_handle failing in precommit

2020-10-06 Thread Alex Amato
I am seeing this failure in the precommit of a PR where I am trying to
update the Dataflow container reference.

I would have filed a JIRA issue as well, but I can't seem to load the
website right now. Is this test known to be flaky or something? Has it
regressed? I don't suspect this interactive runner test is using the new
container referenced in the PR, so I didn't think this PR would affect
this test. Though I could be wrong.

I will rerun for now.
Please let me know if you have any suggestions

Details
---
https://ci-beam.apache.org/job/beam_PreCommit_Python_Phrase/2241/testReport/junit/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_return_handle/

Error Message

AssertionError: None is not an instance of 

Stacktrace

self = 


def test_dynamic_plotting_return_handle(self):
  h = pv.visualize(
  self._stream, dynamic_plotting_interval=1, display_facets=True)
> self.assertIsInstance(h, timeloop.Timeloop)
E AssertionError: None is not an instance of 

apache_beam/runners/interactive/display/pcoll_visualization_test.py:93:
AssertionError

Standard Output



   0
0  0
1  1
2  2
3  3
4  4


Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics

2020-09-08 Thread Alex Amato
Hi,

Just wanted to mention that I updated this document with one detail
https://s.apache.org/beam-gcp-debuggability


Sept 8, 2020
   - Clarified that the InstructionRequest/Control Channel will be used in
     “Proposal: SDKHs to Report non-bundle metrics.”

May 15, 2020
   - Completed review with beam dev list.


PTAL, and LMK what you think

On Fri, May 15, 2020 at 6:02 PM Alex Amato  wrote:

> Thanks everyone. I was able to collect a lot of good feedback from
> everyone who contributed. I am going to wrap it up for now and label the
> design as "Design Finalized (Unimplemented)".
>
> I really believe we have made a much better design than I initially wrote
> up. I couldn't have done it without the help of everyone who offered their
> time, energy and viewpoints. :)
>
> Thanks again, please let me know if you see any major issues with the
> design still. I think I have enough information to begin some
> implementation as soon as I have some time in the coming weeks.
> Alex
>
> https://s.apache.org/beam-gcp-debuggability
> https://s.apache.org/beam-histogram-metrics
>
> On Thu, May 14, 2020 at 5:22 PM Alex Amato  wrote:
>
>> Thanks to all who have spent their time on this, there were many great
>> suggestions, just another reminder that tomorrow I will be finalizing the
>> documents, unless there are any major objections left. Please take a look
>> at it if you are interested.
>>
>> I will still welcome feedback at any time :).
>>
>> But I believe we have gathered enough information to produce a good
>> design, which I will start to work on soon.
>> I will begin to build the necessary subset of the new features proposed
>> to support the BigQueryIO metrics use case, proposed.
>> I will likely start with the python SDK first.
>>
>> https://s.apache.org/beam-gcp-debuggability
>> https://s.apache.org/beam-histogram-metrics
>>
>>
>> On Wed, May 13, 2020 at 3:07 PM Alex Amato  wrote:
>>
>>> Thanks again for more feedback :). I have iterated on things again. I'll
>>> report back at the end of the week. If there are no major disagreements
>>> still, I'll close the discussion, believe it to be in a good enough state
>>> to start some implementation. But welcome feedback.
>>>
>>> Latest changes are changing the exponential format to allow denser
>>> buckets. Using only two MonitoringInfoSpec now for all of the IOs to use.
>>> Requiring some labels, but allowing optional
>>> ones for specific IOs to provide more contents.
>>>
>>> https://s.apache.org/beam-gcp-debuggability
>>> https://s.apache.org/beam-histogram-metrics
>>>
>>> On Mon, May 11, 2020 at 4:24 PM Alex Amato  wrote:
>>>
>>>> Thanks for the great feedback so far :). I've included many new ideas,
>>>> and made some revisions. Both docs have changed a fair bit since the
>>>> initial mail out.
>>>>
>>>> https://s.apache.org/beam-gcp-debuggability
>>>> https://s.apache.org/beam-histogram-metrics
>>>>
>>>> PTAL and let me know what you think, and hopefully we can resolve major
>>>> issues by the end of the week. I'll try to finalize things by then, but of
>>>> course always stay open to your great ideas. :)
>>>>
>>>> On Wed, May 6, 2020 at 6:19 PM Alex Amato  wrote:
>>>>
>>>>> Thanks everyone so far for taking a look so far :).
>>>>>
>>>>> I am hoping to have this finalize the two reviews by the end of next
>>>>> week, May 15th.
>>>>>
>>>>> I'll continue to follow up on feedback and make changes, and I will
>>>>> add some more mentions to the documents to draw attention
>>>>>
>>>>> https://s.apache.org/beam-gcp-debuggability
>>>>>  https://s.apache.org/beam-histogram-metrics
>>>>>
>>>>> On Wed, May 6, 2020 at 10:00 AM Luke Cwik  wrote:
>>>>>
>>>>>> Thanks, also took a look and left some comments.
>>>>>>
>>>>>> On Tue, May 5, 2020 at 6:24 PM Alex Amato  wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I created another design document. This time for GCP IO
>>>>>>> Debuggability Metrics. Which defines some new metrics to collect in the 
>>>>>>> GCP
>>>>>>> IO libraries. This is for monitoring request counts and request 
>>>>>>> latencies.
>>>>>>>
>>>>>>> Please take a look and let me know what you think:
>>>>>>> https://s.apache.org/beam-gcp-debuggability
>>>>>>>
>>>>>>> I also sent out a separate design yesterday (
>>>>>>> https://s.apache.org/beam-histogram-metrics) which is related as
>>>>>>> this document uses a Histogram style metric :).
>>>>>>>
>>>>>>> I would love some feedback to make this feature the best possible :D,
>>>>>>> Alex
>>>>>>>
>>>>>>


Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)

2020-09-08 Thread Alex Amato
Hi again, just reviving this thread to mention that I updated the doc with
a few sections:
https://s.apache.org/beam-histogram-metrics


Sept 8, 2020
   - Added alternative section: “Collect Moment Sketch Variables Instead of
     Bucket Counts” (recommend not pursuing, due to opposing trade offs and
     significant implementation/maintenance challenge. But may be worth
     pursuing in a future MonitoringInfo type).
   - Add distribution variables: min, max, sum, count
   - Added alternative section: “Update all distribution metrics to be
     Histograms” (recommend not pursuing, update to histogramDistribution on
     a case by case basis, due to performance concerns).

May 15, 2020
   - Completed review with beam dev list.


PTAL and LMK what you think :)

On Wed, May 6, 2020 at 9:58 AM Luke Cwik  wrote:

> Thanks Alex, I had some minor comments.
>
> On Mon, May 4, 2020 at 4:04 PM Alex Amato  wrote:
>
>> Thanks Ismaël :). Done
>>
>> On Mon, May 4, 2020 at 3:59 PM Ismaël Mejía  wrote:
>>
>>> Moving the short link to this thread
>>> https://s.apache.org/beam-histogram-metrics
>>>
>>> Alex can you add this link (and any other of your documents that may
>>> not be there) to
>>> https://cwiki.apache.org/confluence/display/BEAM/Design+Documents
>>>
>>>
>>> On Tue, May 5, 2020 at 12:51 AM Pablo Estrada 
>>> wrote:
>>> >
>>> > FYI +Boyuan Zhang worked on implementing a histogram metric that was
>>> performance-optimized into outer space for Python : ) - I don't recall if
>>> she ended up getting it merged, but it's worth looking at the work. I also
>>> remember Scott Wegner wrote the metrics for Java.
>>> >
>>> > Best
>>> > -P.
>>> >
>>> > On Mon, May 4, 2020 at 3:33 PM Alex Amato  wrote:
>>> >>
>>> >> Hello,
>>> >>
>>> >> I have created a proposal for Apache Beam FN API to support Histogram
>>> Style Metrics. Which defines a method to collect Histogram style metrics
>>> and pass them over the FN API.
>>> >>
>>> >> I would love to hear your feedback in order to improve this proposal,
>>> please let me know what you think. Thanks for taking a look :)
>>> >> Alex
>>>
>>


Re: Percentile metrics in Beam

2020-09-08 Thread Alex Amato
I've updated the Histogram Style Metrics design
<https://s.apache.org/beam-histogram-metrics> for the FN API, with a
section exploring the Moment Sketch. PTAL at the “Collect Moment Sketch
Variables Instead of Bucket Counts” section, and see the assessment. LMK
what you think :)


Sept 8, 2020
   - Added alternative section: “Collect Moment Sketch Variables Instead of
     Bucket Counts” (recommend not pursuing, due to opposing trade offs and
     significant implementation/maintenance challenge. But may be worth
     pursuing in a future MonitoringInfo type).
   - Add distribution variables: min, max, sum, count
   - Added alternative section: “Update all distribution metrics to be
     Histograms” (recommend not pursuing, update to histogramDistribution on
     a case by case basis, due to performance concerns).

May 15, 2020
   - Completed review with beam dev list.


--
Re @Lukasz Cwik 
I saw that code you linked, which calls into
linearTimeIncrementHistogramCounters, but got confused a bit more when I
tried to dive into the implementation.
(Relevant, since I would need to port parts of this to python for the SDK,
and C++ for the RunnerHarness side)

I asked a data science peer to help me understand this a bit more. I was
trying to get the equation for the CDF, and added a section on how to derive
the CDF in the doc.
If I understand correctly, we need to calculate the theta values (which
depend on the current moment sketch variables) to calculate the CDF, which
can then be used to estimate bucket counts, via an integral equation in the
paper.
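
To make sure I'm reading it right, the recovery step I have in mind is just
differencing that estimated CDF at our configured bucket boundaries (a rough
sketch of my understanding, not the paper's exact algorithm; cdf here stands
for whatever estimate the solver produces):

def bucket_counts_from_cdf(cdf, boundaries, total_count):
    # boundaries: [b0, b1, ..., bk]; returns estimated counts for
    # [-inf, b0), [b0, b1), ..., [bk, +inf)
    points = [0.0] + [cdf(b) for b in boundaries] + [1.0]
    return [total_count * (hi - lo) for lo, hi in zip(points, points[1:])]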





On Tue, Aug 18, 2020 at 11:35 AM Luke Cwik  wrote:

> getPMForCDF[1] seems to return a CDF and you can choose the split points
> (b0, b1, b2, ...).
>
> 1:
> https://github.com/stanford-futuredata/msketch/blob/cf4e49e860761f48ebdeb00f650ce997c46073e2/javamsketch/quantilebench/src/main/java/yahoo/DoublesPmfCdfImpl.java#L16
>
> On Tue, Aug 18, 2020 at 11:20 AM Alex Amato  wrote:
>
>> I'm a bit confused, are you sure that it is possible to derive the CDF?
>> Using the moments variables.
>>
>> The linked implementation on github seems to not use a derived CDF
>> equation, but instead using some sampling technique (which I can't fully
>> grasp yet) to estimate how many elements are in each bucket.
>>
>> linearTimeIncrementHistogramCounters
>>
>> https://github.com/stanford-futuredata/msketch/blob/cf4e49e860761f48ebdeb00f650ce997c46073e2/javamsketch/quantilebench/src/main/java/yahoo/DoublesPmfCdfImpl.java#L117
>>
>> Calls into .get() to do some sort of sampling
>>
>> https://github.com/stanford-futuredata/msketch/blob/cf4e49e860761f48ebdeb00f650ce997c46073e2/javamsketch/quantilebench/src/main/java/yahoo/DirectDoublesSketchAccessor.java#L29
>>
>>
>>
>> On Tue, Aug 18, 2020 at 9:52 AM Ke Wu  wrote:
>>
>>> Hi Alex,
>>>
>>> It is great to know you are working on the metrics. Do you have any
>>> concern if we add a Histogram type metrics in Samza Runner itself for now
>>> so we can start using it before a generic histogram metrics can be
>>> introduced in the Metrics class?
>>>
>>> Best,
>>> Ke
>>>
>>> On Aug 18, 2020, at 12:57 AM, Gleb Kanterov  wrote:
>>>
>>> Hi Alex,
>>>
>>> I'm not sure about restoring histogram, because the use-case I had in
>>> the past used percentiles. As I understand it, you can approximate
>>> histogram if you know percentiles and total count. E.g. 5% of values fall
>>> into [P95, +INF) bucket, other 5% [P90, P95), etc. I don't understand the
>>> paper well enough to say how it's going to work if given bucket boundaries
>>> happen to include a small number of values. I guess it's a similar kind of
>>> trade-off when we need to choose boundaries if we want to get percentiles
>>> from histogram buckets. I see primarily moment sketch as a method intended
>>> to approximate percentiles, not histogram buckets.
>>>
>>> /Gleb
>>>
>>> On Tue, Aug 18, 2020 at 2:13 AM Alex Amato  wrote:
>>>
>>>> Hi Gleb, and Luke
>>>>
>>>> I was reading through the paper, blog and github you linked to. One
>>>> thing I can't figure out is if it's possible to use the Moment Sketch to
>>>> restore an original histogram.
>>>> Given bucket boundaries: b0, b1, b2, b3, ...
>>>> Can we obtain the counts for the number of values inserted each of the
>>>> ranges: [-INF, B0), … [Bi, Bi+1), …
>>>> (This is a requirement I need)
>>>>
>>>> Not be confused with the percentile/threshold based queries discussed
>>>> in the b

How does groupIntoBatches behave when there are too few elements for a key?

2020-08-26 Thread Alex Amato
How does groupIntoBatches behave when there are too few elements for a key
(less than the provided batch size)?

Based on how it's described, it's not clear to me that the elements will ever
emit. Can this cause stuckness in this case?
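
To make the scenario concrete, this is the kind of pipeline I have in mind (a
minimal sketch; the key, values and batch size are made up):

import apache_beam as beam

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create([('key', i) for i in range(3)])  # only 3 elements for the key
        | beam.GroupIntoBatches(10)                    # batch size larger than 3
        | beam.Map(print))                             # will this batch ever emit?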


Re: Still seeing these wheel build emails on my fork

2020-08-19 Thread Alex Amato
Thanks, done

On Tue, Aug 18, 2020 at 8:49 PM Rui Wang  wrote:

> I solved this by disabling github actions in my forked repo:
> https://github.community/t/how-to-turn-off-github-actions-for-a-repo/16875
>
>
> -Rui
>
> On Tue, Aug 18, 2020 at 8:23 PM Alex Amato  wrote:
>
>> Asked about this a few weeks ago, I rebased from master as was suggested,
>> but I am still seeing these. I am guessing this is wasting our resources
>> somehow? :(
>>
>>
>> On Tue, Aug 18, 2020 at 7:28 PM Alex Amato 
>> wrote:
>>
>>> Run failed for master (010adc5)
>>>
>>> Repository: ajamato/beam
>>> Workflow: Build python wheels
>>> Duration: 8 minutes and 20.0 seconds
>>> Finished: 2020-08-19 02:28:43 UTC
>>>
>>> View results <https://github.com/ajamato/beam/actions/runs/214541528>
>>> Jobs:
>>>
>>>- build_source <https://github.com/ajamato/beam/runs/1001143594>
>>>succeeded (0 annotations)
>>>- Build wheels on ubuntu-latest
>>><https://github.com/ajamato/beam/runs/1001149597> cancelled (2
>>>annotations)
>>>- Build wheels on macos-latest
>>><https://github.com/ajamato/beam/runs/1001149606> failed (1
>>>annotation)
>>>- Build wheels on windows-latest
>>><https://github.com/ajamato/beam/runs/1001149615> cancelled (2
>>>annotations)
>>>- Prepare GCS <https://github.com/ajamato/beam/runs/1001149638>
>>>skipped (0 annotations)
>>>- Upload source to GCS bucket
>>><https://github.com/ajamato/beam/runs/1001149658> skipped (0
>>>annotations)
>>>- Tag repo nightly <https://github.com/ajamato/beam/runs/1001159805>
>>>skipped (0 annotations)
>>>- Upload wheels to GCS bucket
>>><https://github.com/ajamato/beam/runs/1001159822> skipped (0
>>>annotations)
>>>- List files on Google Cloud Storage Bucket
>>><https://github.com/ajamato/beam/runs/1001159835> skipped (0
>>>annotations)
>>>
>>> —
>>> You are receiving this because this workflow ran on your branch.
>>> Manage your GitHub Actions notifications here
>>> <https://github.com/settings/notifications>.
>>>
>>


Still seeing these wheel build emails on my fork

2020-08-18 Thread Alex Amato
Asked about this a few weeks ago, I rebased from master as was suggested,
but I am still seeing these. I am guessing this is wasting our resources
somehow? :(


On Tue, Aug 18, 2020 at 7:28 PM Alex Amato  wrote:

> Run failed for master (010adc5)
>
> Repository: ajamato/beam
> Workflow: Build python wheels
> Duration: 8 minutes and 20.0 seconds
> Finished: 2020-08-19 02:28:43 UTC
>
> View results <https://github.com/ajamato/beam/actions/runs/214541528>
> Jobs:
>
>- build_source <https://github.com/ajamato/beam/runs/1001143594>
>succeeded (0 annotations)
>- Build wheels on ubuntu-latest
><https://github.com/ajamato/beam/runs/1001149597> cancelled (2
>annotations)
>- Build wheels on macos-latest
><https://github.com/ajamato/beam/runs/1001149606> failed (1 annotation)
>- Build wheels on windows-latest
><https://github.com/ajamato/beam/runs/1001149615> cancelled (2
>annotations)
>- Prepare GCS <https://github.com/ajamato/beam/runs/1001149638>
>skipped (0 annotations)
>- Upload source to GCS bucket
><https://github.com/ajamato/beam/runs/1001149658> skipped (0
>annotations)
>- Tag repo nightly <https://github.com/ajamato/beam/runs/1001159805>
>skipped (0 annotations)
>- Upload wheels to GCS bucket
><https://github.com/ajamato/beam/runs/1001159822> skipped (0
>annotations)
>- List files on Google Cloud Storage Bucket
><https://github.com/ajamato/beam/runs/1001159835> skipped (0
>annotations)
>
> —
> You are receiving this because this workflow ran on your branch.
> Manage your GitHub Actions notifications here
> <https://github.com/settings/notifications>.
>


Re: Percentile metrics in Beam

2020-08-18 Thread Alex Amato
I'm a bit confused. Are you sure that it is possible to derive the CDF using
the moment variables?

The linked implementation on GitHub seems not to use a derived CDF equation,
but instead uses some sampling technique (which I can't fully grasp yet) to
estimate how many elements are in each bucket.

linearTimeIncrementHistogramCounters
https://github.com/stanford-futuredata/msketch/blob/cf4e49e860761f48ebdeb00f650ce997c46073e2/javamsketch/quantilebench/src/main/java/yahoo/DoublesPmfCdfImpl.java#L117

Calls into .get() to do some sort of sampling
https://github.com/stanford-futuredata/msketch/blob/cf4e49e860761f48ebdeb00f650ce997c46073e2/javamsketch/quantilebench/src/main/java/yahoo/DirectDoublesSketchAccessor.java#L29



On Tue, Aug 18, 2020 at 9:52 AM Ke Wu  wrote:

> Hi Alex,
>
> It is great to know you are working on the metrics. Do you have any
> concern if we add a Histogram type metrics in Samza Runner itself for now
> so we can start using it before a generic histogram metrics can be
> introduced in the Metrics class?
>
> Best,
> Ke
>
> On Aug 18, 2020, at 12:57 AM, Gleb Kanterov  wrote:
>
> Hi Alex,
>
> I'm not sure about restoring histogram, because the use-case I had in the
> past used percentiles. As I understand it, you can approximate histogram if
> you know percentiles and total count. E.g. 5% of values fall into
> [P95, +INF) bucket, other 5% [P90, P95), etc. I don't understand the paper
> well enough to say how it's going to work if given bucket boundaries happen
> to include a small number of values. I guess it's a similar kind of
> trade-off when we need to choose boundaries if we want to get percentiles
> from histogram buckets. I see primarily moment sketch as a method intended
> to approximate percentiles, not histogram buckets.
>
> /Gleb
>
> On Tue, Aug 18, 2020 at 2:13 AM Alex Amato  wrote:
>
>> Hi Gleb, and Luke
>>
>> I was reading through the paper, blog and github you linked to. One thing
>> I can't figure out is if it's possible to use the Moment Sketch to restore
>> an original histogram.
>> Given bucket boundaries: b0, b1, b2, b3, ...
>> Can we obtain the counts for the number of values inserted each of the
>> ranges: [-INF, B0), … [Bi, Bi+1), …
>> (This is a requirement I need)
>>
>> Not be confused with the percentile/threshold based queries discussed in
>> the blog.
>>
>> Luke, were you suggesting collecting both and sending both over the FN
>> API wire? I.e. collecting both
>>
>>- the variables to represent the Histogram as suggested in
>>https://s.apache.org/beam-histogram-metrics:
>>- In addition to the moment sketch variables
>>
>> <https://blog.acolyer.org/2018/10/31/moment-based-quantile-sketches-for-efficient-high-cardinality-aggregation-queries/>
>>.
>>
>> I believe that would be feasible, as we would still retain the Histogram
>> data. I don't think we can restore the Histograms with just the Sketch, if
>> that was the suggestion. Please let me know if I misunderstood.
>>
>> If that's correct, I can write up the benefits and drawbacks I see for
>> both approaches.
>>
>>
>> On Mon, Aug 17, 2020 at 9:23 AM Luke Cwik  wrote:
>>
>>> That is an interesting suggestion to change to use a sketch.
>>>
>>> I believe having one metric URN that represents all this information
>>> grouped together would make sense instead of attempting to aggregate
>>> several metrics together. The underlying implementation of using
>>> sum/count/max/min would stay the same but we would want a single object
>>> that abstracts this complexity away for users as well.
>>>
>>> On Mon, Aug 17, 2020 at 3:42 AM Gleb Kanterov  wrote:
>>>
>>>> Didn't see proposal by Alex before today. I want to add a few more
>>>> cents from my side.
>>>>
>>>> There is a paper Moment-based quantile sketches for efficient high
>>>> cardinality aggregation queries [1], a TL;DR that for some N (around 10-20
>>>> depending on accuracy) we need to collect SUM(log^N(X)) ... log(X),
>>>> COUNT(X), SUM(X), SUM(X^2)... SUM(X^N), MAX(X), MIN(X). Given aggregated
>>>> numbers, it uses solver for Chebyshev polynomials to get quantile number,
>>>> and there is already Java implementation for it on GitHub [2].
>>>>
>>>> This way we can express quantiles using existing metric types in Beam,
>>>> that can be already done without SDK or runner changes. It can fit nicely
>>>> into existing runners and can be abstracted over if needed. I think this is
>>>> also one of the best im

Re: Percentile metrics in Beam

2020-08-17 Thread Alex Amato
Hi Gleb, and Luke

I was reading through the paper, blog and github you linked to. One thing I
can't figure out is if it's possible to use the Moment Sketch to restore an
original histogram.
Given bucket boundaries: b0, b1, b2, b3, ...
Can we obtain the counts for the number of values inserted each of the
ranges: [-INF, B0), … [Bi, Bi+1), …
(This is a requirement I need)

Not to be confused with the percentile/threshold-based queries discussed in
the blog.

Luke, were you suggesting collecting both and sending both over the FN API
wire? I.e. collecting both

   - the variables to represent the Histogram as suggested in
     https://s.apache.org/beam-histogram-metrics
   - in addition to the moment sketch variables
     <https://blog.acolyer.org/2018/10/31/moment-based-quantile-sketches-for-efficient-high-cardinality-aggregation-queries/>

I believe that would be feasible, as we would still retain the Histogram
data. I don't think we can restore the Histograms with just the Sketch, if
that was the suggestion. Please let me know if I misunderstood.

If that's correct, I can write up the benefits and drawbacks I see for both
approaches.


On Mon, Aug 17, 2020 at 9:23 AM Luke Cwik  wrote:

> That is an interesting suggestion to change to use a sketch.
>
> I believe having one metric URN that represents all this information
> grouped together would make sense instead of attempting to aggregate
> several metrics together. The underlying implementation of using
> sum/count/max/min would stay the same but we would want a single object
> that abstracts this complexity away for users as well.
>
> On Mon, Aug 17, 2020 at 3:42 AM Gleb Kanterov  wrote:
>
>> Didn't see proposal by Alex before today. I want to add a few more cents
>> from my side.
>>
>> There is a paper Moment-based quantile sketches for efficient high
>> cardinality aggregation queries [1], a TL;DR that for some N (around 10-20
>> depending on accuracy) we need to collect SUM(log^N(X)) ... log(X),
>> COUNT(X), SUM(X), SUM(X^2)... SUM(X^N), MAX(X), MIN(X). Given aggregated
>> numbers, it uses solver for Chebyshev polynomials to get quantile number,
>> and there is already Java implementation for it on GitHub [2].
>>
>> This way we can express quantiles using existing metric types in Beam,
>> that can be already done without SDK or runner changes. It can fit nicely
>> into existing runners and can be abstracted over if needed. I think this is
>> also one of the best implementations, it has < 1% error rate for 200 bytes
>> of storage, and quite efficient to compute. Did we consider using that?
>>
>> [1]:
>> https://blog.acolyer.org/2018/10/31/moment-based-quantile-sketches-for-efficient-high-cardinality-aggregation-queries/
>> [2]: https://github.com/stanford-futuredata/msketch
>>
>> On Sat, Aug 15, 2020 at 6:15 AM Alex Amato  wrote:
>>
>>> The distinction here is that even though these metrics come from user
>>> space, we still gave them specific URNs, which imply they have a specific
>>> format, with specific labels, etc.
>>>
>>> That is, we won't be packaging them into a USER_HISTOGRAM urn. That URN
>>> would have less expectation for its format. Today the USER_COUNTER just
>>> expects like labels (TRANSFORM, NAME, NAMESPACE).
>>>
>>> We didn't decide on making a private API. But rather an API available to
>>> user code for populating metrics with specific labels, and specific URNs.
>>> The same API could pretty much be used for user USER_HISTOGRAM. with a
>>> default URN chosen.
>>> Thats how I see it in my head at the moment.
>>>
>>>
>>> On Fri, Aug 14, 2020 at 8:52 PM Robert Bradshaw 
>>> wrote:
>>>
>>>> On Fri, Aug 14, 2020 at 7:35 PM Alex Amato  wrote:
>>>> >
>>>> > I am only tackling the specific metrics covered in (for the python
>>>> SDK first, then Java). To collect latency of IO API RPCS, and store it in a
>>>> histogram.
>>>> > https://s.apache.org/beam-gcp-debuggability
>>>> >
>>>> > User histogram metrics are unfunded, as far as I know. But you should
>>>> be able to extend what I do for that project to the user metric use case. I
>>>> agree, it won't be much more work to support that. I designed the histogram
>>>> with the user histogram case in mind.
>>>>
>>>> From the portability point of view, all metrics generated in users
>>>> code (and SDK-side IOs are "user code") are user metrics. But
>>>> regardless of how things are named, once we have histogram metrics
>>>> crossing the FnAP

Re: Percentile metrics in Beam

2020-08-14 Thread Alex Amato
The distinction here is that even though these metrics come from user
space, we still gave them specific URNs, which imply they have a specific
format, with specific labels, etc.

That is, we won't be packaging them into a USER_HISTOGRAM urn. That URN
would have less expectation for its format. Today the USER_COUNTER just
expects labels like (TRANSFORM, NAME, NAMESPACE).

We didn't decide on making a private API, but rather an API available to
user code for populating metrics with specific labels and specific URNs.
The same API could pretty much be used for the user USER_HISTOGRAM case, with
a default URN chosen.
That's how I see it in my head at the moment.
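
Purely to illustrate the shape I'm picturing (the class, URNs and labels
below are hypothetical sketches, not existing Beam APIs):

from dataclasses import dataclass, field

@dataclass
class HistogramCell:
    urn: str
    labels: dict
    boundaries: tuple = (10, 100, 1000)   # made-up bucket boundaries (ms)
    counts: list = field(default_factory=list)

    def __post_init__(self):
        self.counts = [0] * (len(self.boundaries) + 1)

    def update(self, value):
        i = sum(1 for b in self.boundaries if value >= b)
        self.counts[i] += 1

# An IO metric: specific URN plus IO-specific labels.
io_latency = HistogramCell(
    urn='beam:metric:io:api_request_latencies:v1',   # hypothetical URN
    labels={'SERVICE': 'BigQuery', 'METHOD': 'insertAll'})

# A user metric: a default USER_HISTOGRAM URN plus the usual user labels.
user_hist = HistogramCell(
    urn='beam:metric:user:histogram:v1',             # hypothetical URN
    labels={'TRANSFORM': 'MyDoFn', 'NAMESPACE': 'my.ns', 'NAME': 'latency'})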


On Fri, Aug 14, 2020 at 8:52 PM Robert Bradshaw  wrote:

> On Fri, Aug 14, 2020 at 7:35 PM Alex Amato  wrote:
> >
> > I am only tackling the specific metrics covered in (for the python SDK
> first, then Java). To collect latency of IO API RPCS, and store it in a
> histogram.
> > https://s.apache.org/beam-gcp-debuggability
> >
> > User histogram metrics are unfunded, as far as I know. But you should be
> able to extend what I do for that project to the user metric use case. I
> agree, it won't be much more work to support that. I designed the histogram
> with the user histogram case in mind.
>
> From the portability point of view, all metrics generated in users
> code (and SDK-side IOs are "user code") are user metrics. But
> regardless of how things are named, once we have histogram metrics
> crossing the FnAPI boundary all the infrastructure will be in place.
> (At least the plan as I understand it shouldn't use private APIs
> accessible only by the various IOs but not other SDK-level code.)
>
> > On Fri, Aug 14, 2020 at 5:47 PM Robert Bradshaw 
> wrote:
> >>
> >> Once histograms are implemented in the SDK(s) (Alex, you're tackling
> >> this, right?) it shoudn't be much work to update the Samza worker code
> >> to publish these via the Samza runner APIs (in parallel with Alex's
> >> work to do the same on Dataflow).
> >>
> >> On Fri, Aug 14, 2020 at 5:35 PM Alex Amato  wrote:
> >> >
> >> > Noone has any plans currently to work on adding a generic histogram
> metric, at the moment.
> >> >
> >> > But I will be actively working on adding it for a specific set of
> metrics in the next quarter or so
> >> > https://s.apache.org/beam-gcp-debuggability
> >> >
> >> > After that work, one could take a look at my PRs for reference to
> create new metrics using the same histogram. One may wish to implement the
> UserHistogram use case and use that in the Samza Runner
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Aug 14, 2020 at 5:25 PM Ke Wu  wrote:
> >> >>
> >> >> Thank you Robert and Alex. I am not running a Beam job in Google
> Cloud but with Samza Runner, so I am wondering if there is any ETA to add
> the Histogram metrics in Metrics class so it can be mapped to the
> SamzaHistogram metric to the actual emitting.
> >> >>
> >> >> Best,
> >> >> Ke
> >> >>
> >> >> On Aug 14, 2020, at 4:44 PM, Alex Amato  wrote:
> >> >>
> >> >> One of the plans to use the histogram data is to send it to Google
> Monitoring to compute estimates of percentiles. This is done using the
> bucket counts and bucket boundaries.
> >> >>
> >> >> Here is a describing of roughly how its calculated.
> >> >>
> https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated
> >> >> This is a non exact estimate. But plotting the estimated percentiles
> over time is often easier to understand and sufficient.
> >> >> (An alternative is a heatmap chart representing histograms over
> time. I.e. a histogram for each window of time).
> >> >>
> >> >>
> >> >> On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw 
> wrote:
> >> >>>
> >> >>> You may be interested in the propose histogram metrics:
> >> >>>
> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit
> >> >>>
> >> >>> I think it'd be reasonable to add percentiles as its own metric type
> >> >>> as well. The tricky bit (though there are lots of resources on this)
> >> >>> is that one would have to publish more than just the percentiles
> from
> >> >>> each worker to be able to compute the final percentiles across all
> >> >>> workers.
> >> >

Re: Percentile metrics in Beam

2020-08-14 Thread Alex Amato
I am only tackling the specific metrics covered in the doc below (for the
Python SDK first, then Java): collecting the latency of IO API RPCs and
storing it in a histogram.
https://s.apache.org/beam-gcp-debuggability

User histogram metrics are unfunded, as far as I know. But you should be
able to extend what I do for that project to the user metric use case. I
agree, it won't be much more work to support that. I designed the histogram
with the user histogram case in mind.

On Fri, Aug 14, 2020 at 5:47 PM Robert Bradshaw  wrote:

> Once histograms are implemented in the SDK(s) (Alex, you're tackling
> this, right?) it shoudn't be much work to update the Samza worker code
> to publish these via the Samza runner APIs (in parallel with Alex's
> work to do the same on Dataflow).
>
> On Fri, Aug 14, 2020 at 5:35 PM Alex Amato  wrote:
> >
> > Noone has any plans currently to work on adding a generic histogram
> metric, at the moment.
> >
> > But I will be actively working on adding it for a specific set of
> metrics in the next quarter or so
> > https://s.apache.org/beam-gcp-debuggability
> >
> > After that work, one could take a look at my PRs for reference to create
> new metrics using the same histogram. One may wish to implement the
> UserHistogram use case and use that in the Samza Runner
> >
> >
> >
> >
> > On Fri, Aug 14, 2020 at 5:25 PM Ke Wu  wrote:
> >>
> >> Thank you Robert and Alex. I am not running a Beam job in Google Cloud
> but with Samza Runner, so I am wondering if there is any ETA to add the
> Histogram metrics in Metrics class so it can be mapped to the
> SamzaHistogram metric to the actual emitting.
> >>
> >> Best,
> >> Ke
> >>
> >> On Aug 14, 2020, at 4:44 PM, Alex Amato  wrote:
> >>
> >> One of the plans to use the histogram data is to send it to Google
> Monitoring to compute estimates of percentiles. This is done using the
> bucket counts and bucket boundaries.
> >>
> >> Here is a describing of roughly how its calculated.
> >>
> https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated
> >> This is a non exact estimate. But plotting the estimated percentiles
> over time is often easier to understand and sufficient.
> >> (An alternative is a heatmap chart representing histograms over time.
> I.e. a histogram for each window of time).
> >>
> >>
> >> On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw 
> wrote:
> >>>
> >>> You may be interested in the propose histogram metrics:
> >>>
> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit
> >>>
> >>> I think it'd be reasonable to add percentiles as its own metric type
> >>> as well. The tricky bit (though there are lots of resources on this)
> >>> is that one would have to publish more than just the percentiles from
> >>> each worker to be able to compute the final percentiles across all
> >>> workers.
> >>>
> >>> On Fri, Aug 14, 2020 at 4:05 PM Ke Wu  wrote:
> >>> >
> >>> > Hi everyone,
> >>> >
> >>> > I am looking to add percentile metrics (p50, p90 etc) to my beam job
> but I only find Counter, Gauge and Distribution metrics. I understand that
> I can calculate percentile metrics in my job itself and use Gauge to emit,
> however this is not an easy approach. On the other hand, Distribution
> metrics sounds like the one to go to according to its documentation: "A
> metric that reports information about the distribution of reported
> values.”, however it seems that it is intended for SUM, COUNT, MIN, MAX.
> >>> >
> >>> > The question(s) are:
> >>> >
> >>> > 1. is Distribution metric only intended for sum, count, min, max?
> >>> > 2. If Yes, can the documentation be updated to be more specific?
> >>> > 3. Can we add percentiles metric support, such as Histogram, with
> configurable list of percentiles to emit?
> >>> >
> >>> > Best,
> >>> > Ke
> >>
> >>
>


Re: Percentile metrics in Beam

2020-08-14 Thread Alex Amato
No one currently has plans to work on adding a generic histogram metric.

But I will be actively working on adding it for a specific set of metrics
in the next quarter or so
https://s.apache.org/beam-gcp-debuggability

After that work, one could take a look at my PRs for reference to create
new metrics using the same histogram. One may wish to implement the
UserHistogram use case and use that in the Samza Runner




On Fri, Aug 14, 2020 at 5:25 PM Ke Wu  wrote:

> Thank you Robert and Alex. I am not running a Beam job in Google Cloud but
> with Samza Runner, so I am wondering if there is any ETA to add the
> Histogram metrics in Metrics class so it can be mapped to the
> SamzaHistogram
> <http://samza.apache.org/learn/documentation/versioned/api/javadocs/org/apache/samza/metrics/SamzaHistogram.html>
>  metric
> to the actual emitting.
>
> Best,
> Ke
>
> On Aug 14, 2020, at 4:44 PM, Alex Amato  wrote:
>
> One of the plans to use the histogram data is to send it to Google
> Monitoring to compute estimates of percentiles. This is done using the
> bucket counts and bucket boundaries.
>
> Here is a describing of roughly how its calculated.
>
> https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated
> This is a non exact estimate. But plotting the estimated percentiles over
> time is often easier to understand and sufficient.
> (An alternative is a heatmap chart representing histograms over time. I.e.
> a histogram for each window of time).
>
>
> On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw 
> wrote:
>
>> You may be interested in the propose histogram metrics:
>>
>> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit
>>
>> I think it'd be reasonable to add percentiles as its own metric type
>> as well. The tricky bit (though there are lots of resources on this)
>> is that one would have to publish more than just the percentiles from
>> each worker to be able to compute the final percentiles across all
>> workers.
>>
>> On Fri, Aug 14, 2020 at 4:05 PM Ke Wu  wrote:
>> >
>> > Hi everyone,
>> >
>> > I am looking to add percentile metrics (p50, p90 etc) to my beam job
>> but I only find Counter, Gauge and Distribution metrics. I understand that
>> I can calculate percentile metrics in my job itself and use Gauge to emit,
>> however this is not an easy approach. On the other hand, Distribution
>> metrics sounds like the one to go to according to its documentation: "A
>> metric that reports information about the distribution of reported
>> values.”, however it seems that it is intended for SUM, COUNT, MIN, MAX.
>> >
>> > The question(s) are:
>> >
>> > 1. is Distribution metric only intended for sum, count, min, max?
>> > 2. If Yes, can the documentation be updated to be more specific?
>> > 3. Can we add percentiles metric support, such as Histogram, with
>> configurable list of percentiles to emit?
>> >
>> > Best,
>> > Ke
>>
>
>


Added instructions on how to use snapshot/wheel versions of beam to the wiki

2020-08-14 Thread Alex Amato
I added a few instructions for Java and Python on how to use snapshot/wheel
versions of the Beam SDKs to the Python Tips and Java Tips sections.

   - How do I use a snapshot Beam Java SDK version?
   - Use a Wheel SDK for Python

Feel free to change or modify it. LMK if you think it needs correction.
Alex


Re: Percentile metrics in Beam

2020-08-14 Thread Alex Amato
One of the plans to use the histogram data is to send it to Google
Monitoring to compute estimates of percentiles. This is done using the
bucket counts and bucket boundaries.

Here is a description of roughly how it's calculated.
https://stackoverflow.com/questions/59635115/gcp-console-how-are-percentile-charts-calculated
This is a non exact estimate. But plotting the estimated percentiles over
time is often easier to understand and sufficient.
(An alternative is a heatmap chart representing histograms over time. I.e.
a histogram for each window of time).
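
For concreteness, the estimate is roughly a linear interpolation inside the
bucket that contains the target rank (a sketch of my reading of that answer,
not Google Monitoring's actual code; finite buckets only, for simplicity):

def estimate_percentile(boundaries, counts, pct):
    # boundaries: [b0, ..., bk]; counts[i] is the count for [boundaries[i], boundaries[i+1])
    total = sum(counts)
    target = pct / 100.0 * total
    seen = 0
    for i, c in enumerate(counts):
        if c > 0 and seen + c >= target:
            lo, hi = boundaries[i], boundaries[i + 1]
            return lo + (hi - lo) * (target - seen) / c
        seen += c
    return boundaries[-1]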


On Fri, Aug 14, 2020 at 4:16 PM Robert Bradshaw  wrote:

> You may be interested in the propose histogram metrics:
>
> https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit
>
> I think it'd be reasonable to add percentiles as its own metric type
> as well. The tricky bit (though there are lots of resources on this)
> is that one would have to publish more than just the percentiles from
> each worker to be able to compute the final percentiles across all
> workers.
>
> On Fri, Aug 14, 2020 at 4:05 PM Ke Wu  wrote:
> >
> > Hi everyone,
> >
> > I am looking to add percentile metrics (p50, p90 etc) to my beam job but
> I only find Counter, Gauge and Distribution metrics. I understand that I
> can calculate percentile metrics in my job itself and use Gauge to emit,
> however this is not an easy approach. On the other hand, Distribution
> metrics sounds like the one to go to according to its documentation: "A
> metric that reports information about the distribution of reported
> values.”, however it seems that it is intended for SUM, COUNT, MIN, MAX.
> >
> > The question(s) are:
> >
> > 1. is Distribution metric only intended for sum, count, min, max?
> > 2. If Yes, can the documentation be updated to be more specific?
> > 3. Can we add percentiles metric support, such as Histogram, with
> configurable list of percentiles to emit?
> >
> > Best,
> > Ke
>


Re: Using --sdk_location with python fails with a TypeError

2020-08-13 Thread Alex Amato
I changed the .whl I was passing in to:
--sdk_location=
https://storage.googleapis.com/beam-wheels-staging/master/699f872ea1ef3bdb1588a029fc6b1e3185e986a6-207696119/apache_beam-2.25.0.dev0-cp36-cp36m-macosx_10_9_x86_64.whl


and also tried

--sdk_location=
https://storage.googleapis.com/beam-wheels-staging/master/699f872ea1ef3bdb1588a029fc6b1e3185e986a6-207696119/apache-beam-2.25.0.dev0.zip


python --version

Python 3.6.8

In both cases the same TypeError occurs.
https://paste.googleplex.com/6275630654029824



On Thu, Aug 13, 2020 at 3:52 PM Valentyn Tymofieiev 
wrote:

> You are passing a python 2.7 wheel to a job that was launched on python
> 3.6.
>
> You need to select a correct wheel for the platform or pass source
> distribution (zip/tag.gz).
>
> On Thu, Aug 13, 2020, 15:20 Alex Amato  wrote:
>
>> I was trying to use the --sdk_location parameter in a python pipeline, to
>> allow users to run a snapshot SDK. Though it looks like it hit a type error
>> after downloading the .wdl file.
>>
>> Perhaps this code is assuming that remote files downloaded are text type,
>> not bytes type? Have I done something wrong? Or is this a bug? Any ideas?
>>
>> Thanks for taking a look,
>> Alex
>>
>> Using the --sdk_location parameter (Full command line
>> <https://paste.googleplex.com/5792777008840704>)
>> --sdk_location=
>> https://storage.googleapis.com/beam-wheels-staging/master/94f9e7fd4cae0f8aa6587d2cf14887f1c4827485-198203585/apache_beam-2.24.0.dev0-cp27-cp27m-macosx_10_9_x86_64.whl
>>
>> INFO:apache_beam.runners.portability.stager:Failed to download Artifact
>> from
>> https://storage.googleapis.com/beam-wheels-staging/master/94f9e7fd4cae0f8aa6587d2cf14887f1c4827485-198203585/apache_beam-2.24.0.dev0-cp27-cp27m-macosx_10_9_x86_64.whl
>> Traceback (most recent call last):
>>   File
>> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py",
>> line 193, in _run_module_as_main
>> "__main__", mod_spec)
>>   File
>> "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py",
>> line 85, in _run_code
>> exec(code, run_globals)
>>   File
>> "/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/examples/wordcount.py",
>> line 142, in 
>> run()
>>   File
>> "/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/examples/wordcount.py",
>> line 121, in run
>> result = p.run()
>>   File
>> "/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/pipeline.py",
>> line 521, in run
>> allow_proto_holders=True).run(False)
>>   File
>> "/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/pipeline.py",
>> line 534, in run
>> return self.runner.run_pipeline(self, self._options)
>>   File
>> "/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>> line 479, in run_pipeline
>> artifacts=environments.python_sdk_dependencies(options)))
>>   File
>> "/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/transforms/environments.py",
>> line 611, in python_sdk_dependencies
>> staged_name in stager.Stager.create_job_resources(options, tmp_dir))
>>   File
>> "/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/runners/portability/stager.py",
>> line 235, in create_job_resources
>> resources.extend(Stager._create_beam_sdk(sdk_remote_location,
>> temp_dir))
>>   File
>> "/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/runners/portability/stager.py",
>> line 657, in _create_beam_sdk
>> Stager._download_file(sdk_remote_location, local_download_file)
>>   File
>> "/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/runners/portability/stager.py",
>> line 375, in _download_file
>> f.write(content)
>> TypeError: write() argument must be str, not bytes
>>
>>
>>


Using --sdk_location with python fails with a TypeError

2020-08-13 Thread Alex Amato
I was trying to use the --sdk_location parameter in a Python pipeline, to
allow users to run a snapshot SDK. Though it looks like it hit a TypeError
after downloading the .whl file.

Perhaps this code is assuming that remote files downloaded are text type,
not bytes type? Have I done something wrong? Or is this a bug? Any ideas?
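
To illustrate my guess at the issue (a minimal sketch, not the actual stager
code): if the local file is opened in text mode while the downloaded content
is bytes, write() raises exactly this TypeError; opening in binary mode
avoids it.

import urllib.request

def download_file(url, local_path):
    with urllib.request.urlopen(url) as response:
        content = response.read()        # bytes
    with open(local_path, 'wb') as f:    # open(..., 'w') would raise the TypeError
        f.write(content)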

Thanks for taking a look,
Alex

Using the --sdk_location parameter (Full command line)
--sdk_location=
https://storage.googleapis.com/beam-wheels-staging/master/94f9e7fd4cae0f8aa6587d2cf14887f1c4827485-198203585/apache_beam-2.24.0.dev0-cp27-cp27m-macosx_10_9_x86_64.whl

INFO:apache_beam.runners.portability.stager:Failed to download Artifact
from
https://storage.googleapis.com/beam-wheels-staging/master/94f9e7fd4cae0f8aa6587d2cf14887f1c4827485-198203585/apache_beam-2.24.0.dev0-cp27-cp27m-macosx_10_9_x86_64.whl
Traceback (most recent call last):
  File
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py",
line 193, in _run_module_as_main
"__main__", mod_spec)
  File
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py",
line 85, in _run_code
exec(code, run_globals)
  File
"/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/examples/wordcount.py",
line 142, in 
run()
  File
"/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/examples/wordcount.py",
line 121, in run
result = p.run()
  File
"/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/pipeline.py",
line 521, in run
allow_proto_holders=True).run(False)
  File
"/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/pipeline.py",
line 534, in run
return self.runner.run_pipeline(self, self._options)
  File
"/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
line 479, in run_pipeline
artifacts=environments.python_sdk_dependencies(options)))
  File
"/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/transforms/environments.py",
line 611, in python_sdk_dependencies
staged_name in stager.Stager.create_job_resources(options, tmp_dir))
  File
"/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/runners/portability/stager.py",
line 235, in create_job_resources
resources.extend(Stager._create_beam_sdk(sdk_remote_location, temp_dir))
  File
"/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/runners/portability/stager.py",
line 657, in _create_beam_sdk
Stager._download_file(sdk_remote_location, local_download_file)
  File
"/Users/ajamato/beam/beam-sdk-download-test/venv/lib/python3.6/site-packages/apache_beam/runners/portability/stager.py",
line 375, in _download_file
f.write(content)
TypeError: write() argument must be str, not bytes


Re: Making reporting bugs/feature request easier

2020-08-04 Thread Alex Amato
May I suggest we print a URL (and a message) you can use to file bugs, in the
command line output when you run a Beam pipeline. (And in any other user
interface we use for Beam; some of the runner-specific UIs may want to link
to this as well.)

On Tue, Aug 4, 2020 at 9:16 AM Alexey Romanenko 
wrote:

> Great topic, thanks Griselda for raising this question.
>
> I’d prefer to keep Jira as the only one main issue tracker and use other
> suggested ways, like emails, Git issues, web form or dedicated Slack
> channel, as different interfaces designed to simplify a way how users can
> submit an issue. But in any case it will require an attention of Beam
> contributors to properly create Jira issue and send back a link that can be
> followed for updates.
>
> On 31 Jul 2020, at 20:22, Robert Burke  wrote:
>
> I do like the idea of the "wrong ways" to raise issues pointing to the
> correct ways.
>
> On Fri, Jul 31, 2020, 10:57 AM Brian Hulette  wrote:
>
>> I think I'd prefer continuing to use jira, but GitHub issues are
>> certainly much more discoverable for our users. The Arrow project uses
>> GitHub issues as a way to funnel users to the mailing lists and JIRA. When
>> users go to file an issue they're first given two options [1]:
>>
>> - Ask a question -> Please ask questions at u...@arrow.apache.org
>> - Report an issue -> Please report bugs and request features on JIRA.
>>
>> With accompanying links for each option. The user@ link actually
>> takes you to the new issue page, with a template strongly encouraging you
>> to file a jira or subscribe to the mailing lists.
>> Despite all these barriers people do still file github issues, and they
>> need to be triaged (usually they just receive a comment asking the reporter
>> to file a jira or linking to an existing jira), but the volume isn't that
>> high.
>>
>> Maybe we could consider something like that?
>>
>> Brian
>>
>> [1] https://github.com/apache/arrow/issues/new/choose
>>
>> On Thu, Jul 30, 2020 at 2:45 PM Robert Bradshaw 
>> wrote:
>>
>>> On Wed, Jul 29, 2020 at 7:12 PM Kenneth Knowles  wrote:
>>>

 On Wed, Jul 29, 2020 at 11:08 AM Robert Bradshaw 
 wrote:

> +1 to a simple link that fills in most of the fields in JIRA, though
> this doesn't solve the issue of having to sign up just to submit a bug
> report. Just using the users list isn't a bad idea either--we could easily
> create a script that ensures all threads that have a message like "we
> should file a JIRA for this" are followed up with a message like "JIRA
> filed at ...". (That doesn't mean it won't language on the tracker.)
>
> I think it's worth seriously considering just using Github's issue
> tracker, since that's where our users are. Is there anything in we 
> actually
> use in JIRA that'd be missing?
>

 Pretty big question. Just noting to start that Apache projects
 certainly can and do use GitHub issues. Here is a quick inventory of things
 that are used in a meaningful way:

  - Priorities (with GitHub Issues I think you roll your own with labels)
  - Issue types (with GitHub Issues I think you roll your own with
 labels)
  - Distinct "Triage Needed" state (also labels; anything lacking the
 "triaged" label)
  - Distinguishing "Open" and "In Progress" (also labels? can use
 Projects/Milestones - I forget exactly which - to keep a kanban-ish status)

>>>
>>> Yes, basically everything can is done with labels. Whether having one
>>> hammer is good, well, there are pros and cons.
>>>
>>>
  - Our new automation: "stale-assigned" and subsequent unassign;
 "stale-P2" and subsequent downgrade

>>>
>>> Github has a very nice ReST API, making things like this very easy.
>>>
>>>
  - Fix Version for associating fixes with releases

>>>
>>> This information is typically intrinsic with when the commits were
>>> applied and the bug closed. It's pretty typical to use milestones for a
>>> release, and then tag "blockers" to it. (IMHO, this is better than having
>>> the default always be the next release, and bumping all open bugs every
>>> release that comes out.) Milestones can be used to track other work as
>>> well.
>>>
>>>
  - Affect Version, while not used much, is still helpful to have
  - Components, since our repo is really a mini mono repo. Again, labels.
  - Kanban boards (milestones/projects maybe kinda)
  - Reports (not really same level, but maybe OK?)

 Fairly recently I worked on a project that tried to use GitHub Issues
 and Projects and Milestones and whatnot and it was OK but not great. Jira's
 complexity is largely essential / not really complex but just visually
 busy. The two are not really even comparable offerings. There may be third
 party integrations that add some of what you'd want.

>>>
>>> Yeah, I agree Github issues is not as full featured. One thing I miss
>>> from other products is dependencies 

Re: Email about build runs on my fork.

2020-08-03 Thread Alex Amato
Thanks, haven't seen any emails since rebasing to master.

On Sun, Aug 2, 2020 at 5:09 AM Tobiasz Kędzierski <
tobiasz.kedzier...@polidea.com> wrote:

> Hi Alex,
>
> After rebase on the latest master scheduled workflow should not run,
> condition for scheduled job was extended to prevent situations like this.
> As mentioned by Robert, you can disable gh action in case you don't need
> it.
>
> BR
> Tobiasz
>
> On Thu, Jul 30, 2020 at 9:18 PM Robert Burke  wrote:
>
>> You can disable GitHub actions on your own repos via the UI or via a code
>> change
>>
>> https://github.community/t/how-can-i-disable-a-github-action/17049
>>
>> On Thu, Jul 30, 2020, 12:15 PM Ahmet Altay  wrote:
>>
>>> /cc +tobiasz.kedzier...@polidea.com  +Emily
>>> Ye  -- this question is related to one of the
>>> recent github action prs.
>>>
>>> On Thu, Jul 30, 2020 at 10:23 AM Alex Amato  wrote:
>>>
>>>> Hi,
>>>>
>>>> I received this email indicating some build was running on my fork,
>>>> though I had not been doing any work on that fork for the last few weeks.
>>>>
>>>> I don't really need to run these builds on my fork and don't think we
>>>> need to waste resources on this. Is there some way to prevent forks from
>>>> doing this?
>>>>
>>>> I just rebased my own fork from apache beam master now. I am not sure
>>>> if that will stop it or not, but it should now be up to date.
>>>>
>>>> -- Forwarded message -
>>>> From: Alex Amato 
>>>> Date: Wed, Jul 29, 2020 at 7:47 PM
>>>> Subject: [ajamato/beam] Run failed: Build python wheels - master
>>>> (9ca80ae)
>>>> To: ajamato/beam 
>>>> Cc: Ci activity 
>>>>
>>>>
>>>> Run failed for master (9ca80ae)
>>>>
>>>> Repository: ajamato/beam
>>>> Workflow: Build python wheels
>>>> Duration: 29 minutes and 27.0 seconds
>>>> Finished: 2020-07-30 02:47:17 UTC
>>>>
>>>> View results <https://github.com/ajamato/beam/actions/runs/187983527>
>>>> Jobs:
>>>>
>>>>- build_source <https://github.com/ajamato/beam/runs/925886578>
>>>>succeeded (0 annotations)
>>>>- Build wheels on ubuntu-latest
>>>><https://github.com/ajamato/beam/runs/925892640> succeeded (0
>>>>annotations)
>>>>- Build wheels on macos-latest
>>>><https://github.com/ajamato/beam/runs/925892647> succeeded (0
>>>>annotations)
>>>>- Prepare GCS <https://github.com/ajamato/beam/runs/925892670>
>>>>succeeded (0 annotations)
>>>>- Upload source to GCS bucket
>>>><https://github.com/ajamato/beam/runs/925893275> failed (1
>>>>annotation)
>>>>- Tag repo nightly <https://github.com/ajamato/beam/runs/925940497>
>>>>succeeded (0 annotations)
>>>>- Upload wheels to GCS bucket (ubuntu-latest)
>>>><https://github.com/ajamato/beam/runs/925940517> cancelled (2
>>>>annotations)
>>>>- Upload wheels to GCS bucket (macos-latest)
>>>><https://github.com/ajamato/beam/runs/925940521> failed (1
>>>>annotation)
>>>>- List files on Google Cloud Storage Bucket
>>>><https://github.com/ajamato/beam/runs/925941387> skipped (0
>>>>annotations)
>>>>
>>>> —
>>>> You are receiving this because this workflow ran on your branch.
>>>> Manage your GitHub Actions notifications here
>>>> <https://github.com/settings/notifications>.
>>>>
>>>


Email about build runs on my fork.

2020-07-30 Thread Alex Amato
Hi,

I received this email indicating some build was running on my fork, though
I had not been doing any work on that fork for the last few weeks.

I don't really need to run these builds on my fork and don't think we need
to waste resources on this. Is there some way to prevent forks from doing
this?

I just rebased my own fork from apache beam master now. I am not sure if
that will stop it or not, but it should now be up to date.

-- Forwarded message -
From: Alex Amato 
Date: Wed, Jul 29, 2020 at 7:47 PM
Subject: [ajamato/beam] Run failed: Build python wheels - master (9ca80ae)
To: ajamato/beam 
Cc: Ci activity 


Run failed for master (9ca80ae)

Repository: ajamato/beam
Workflow: Build python wheels
Duration: 29 minutes and 27.0 seconds
Finished: 2020-07-30 02:47:17 UTC

View results <https://github.com/ajamato/beam/actions/runs/187983527>
Jobs:

   - build_source <https://github.com/ajamato/beam/runs/925886578>
   succeeded (0 annotations)
   - Build wheels on ubuntu-latest
   <https://github.com/ajamato/beam/runs/925892640> succeeded (0
   annotations)
   - Build wheels on macos-latest
   <https://github.com/ajamato/beam/runs/925892647> succeeded (0
   annotations)
   - Prepare GCS <https://github.com/ajamato/beam/runs/925892670> succeeded
   (0 annotations)
   - Upload source to GCS bucket
   <https://github.com/ajamato/beam/runs/925893275> failed (1 annotation)
   - Tag repo nightly <https://github.com/ajamato/beam/runs/925940497>
   succeeded (0 annotations)
   - Upload wheels to GCS bucket (ubuntu-latest)
   <https://github.com/ajamato/beam/runs/925940517> cancelled (2
   annotations)
   - Upload wheels to GCS bucket (macos-latest)
   <https://github.com/ajamato/beam/runs/925940521> failed (1 annotation)
   - List files on Google Cloud Storage Bucket
   <https://github.com/ajamato/beam/runs/925941387> skipped (0 annotations)

—
You are receiving this because this workflow ran on your branch.
Manage your GitHub Actions notifications here
<https://github.com/settings/notifications>.


Is there an easy way to figure out why my build failed?

2020-06-30 Thread Alex Amato
Often I see the build failing, but on the next page there are no warnings
and no errors.

Then when you dive into the full log, it slows down the browser and there
is no obvious ctrl-f keyword to find the error ("error" yields over 100
results, and the error isn't always at the bottom). Is there a
faster/better way to do it?

There is a log about the build timing out, but I don't really know what
timed out or where to look next.

Is 120 min a long enough time? Did something recently happen? If so, can we
increase the timeout until we debug the regression?

https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/12017/

https://issues.apache.org/jira/browse/BEAM-10390

Thanks, I would appreciate any ideas :)
Alex


Re: Blocked: Precommit failing pull_licenses_java.py in python PR

2020-06-29 Thread Alex Amato
This is failing in two of my PRs now. It looks like this tool has already
dealt with these errors by adding retries. But it only retries 3 times per
URL.

Here is a PR which changes it from 3 to 9 retries.
https://github.com/apache/beam/pull/12130
Would it be possible to merge this? Hopefully this will resolve the issue
for everyone :)


On Mon, Jun 29, 2020 at 5:45 PM Alex Amato  wrote:

> My mistake, it looks like this is the failure:
> https://issues.apache.org/jira/browse/BEAM-10381
>
> I'll keep running it locally to see if it will pass, to see if the flake
> theory makes sense
>
> On Mon, Jun 29, 2020 at 5:38 PM Ahmet Altay  wrote:
>
>> It might be a flake? I restarted the "Run Python2_PVR_Flink PreCommit"
>> test. Is the JIRA link correct, it does not look directly related.
>>
>> On Mon, Jun 29, 2020 at 5:34 PM Alex Amato  wrote:
>>
>>> I thought this was a bit odd as this PR doesn't change java code or deps.
>>>
>>> Details in JIRA:
>>>
>>> https://issues.apache.org/jira/projects/BEAM/issues/BEAM-10308?filter=allopenissues
>>>
>>> 404s trying to download this file:
>>>
>>>
>>> https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.8.5/hadoop-mapreduce-client-jobclient-2.8.5.jar
>>>
>>>
>>> Any ideas why this is happening and if there is a workaround to unblock
>>> my PR <https://github.com/apache/beam/pull/12084>?
>>>
>>


Unsure why Warning shows up for unmodified file on Java PR

2020-06-29 Thread Alex Amato
PR: https://github.com/apache/beam/pull/12083

Java ("Run Java PreCommit") is failing -
https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/12011/

When I dug into the console log, I found the error details:
https://screenshot.googleplex.com/EJkhH9Bq8en

*15:14:55* 
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Commit/src/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java:709:
error: cannot find symbol*15:14:55*   public abstract static class
TypedRead extends PTransform> {


I actually don't modify BigQueryIO.java in my PR, but I do modify other
files in the same folder. PTransform looks like it is imported properly in
that file as well.

Thanks for taking a look, any ideas would be appreciated :)


Re: Blocked: Precommit failing pull_licenses_java.py in python PR

2020-06-29 Thread Alex Amato
My mistake, it looks like this is the failure:
https://issues.apache.org/jira/browse/BEAM-10381

I'll keep running it locally to see if it will pass and whether the flake
theory makes sense.

On Mon, Jun 29, 2020 at 5:38 PM Ahmet Altay  wrote:

> It might be a flake? I restarted the "Run Python2_PVR_Flink PreCommit"
> test. Is the JIRA link correct, it does not look directly related.
>
> On Mon, Jun 29, 2020 at 5:34 PM Alex Amato  wrote:
>
>> I thought this was a bit odd as this PR doesn't change java code or deps.
>>
>> Details in JIRA:
>>
>> https://issues.apache.org/jira/projects/BEAM/issues/BEAM-10308?filter=allopenissues
>>
>> 404s trying to download this file:
>>
>>
>> https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.8.5/hadoop-mapreduce-client-jobclient-2.8.5.jar
>>
>>
>> Any ideas why this is happening and if there is a workaround to unblock
>> my PR <https://github.com/apache/beam/pull/12084>?
>>
>


Blocked: Precommit failing pull_licenses_java.py in python PR

2020-06-29 Thread Alex Amato
I thought this was a bit odd as this PR doesn't change java code or deps.

Details in JIRA:
https://issues.apache.org/jira/projects/BEAM/issues/BEAM-10308?filter=allopenissues

404s trying to download this file:

https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.8.5/hadoop-mapreduce-client-jobclient-2.8.5.jar


Any ideas why this is happening and if there is a workaround to unblock my
PR?


Error in FlinkRunnerTest.test_external_transforms

2020-06-26 Thread Alex Amato
Hi,

I was wondering if this is something wrong with my PR or an issue in master.
Thanks for your help.

Seeing this in my PR's presubmit
https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Commit/5382/

Logs


==
ERROR: test_external_transforms (__main__.FlinkRunnerTest)
--
Traceback (most recent call last):
 Timed out after 60 seconds. 
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python2_PVR_Flink_Commit/src/sdks/python/apache_beam/runners/portability/flink_runner_test.py",
line 204, in test_external_transforms

assert_that(res, equal_to([i for i in range(1, 10)]))
# Thread: 
  File "apache_beam/pipeline.py", line 547, in __exit__
self.run().wait_until_finish()

# Thread: 
  File "apache_beam/runners/portability/portable_runner.py", line 543,
in wait_until_finish
self._observe_state(message_thread)
  File "apache_beam/runners/portability/portable_runner.py", line 552,
in _observe_state

for state_response in self._state_stream:
# Thread: <_Worker(Thread-110, started daemon 140197924693760)>
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python2_PVR_Flink_Commit/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_channel.py",
line 413, in next
return self._next()

  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python2_PVR_Flink_Commit/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_channel.py",
line 697, in _next
# Thread: <_MainThread(MainThread, started 140200366741248)>
_common.wait(self._state.condition.wait, _response_ready)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python2_PVR_Flink_Commit/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_common.py",
line 138, in wait
_wait_once(wait_fn, MAXIMUM_WAIT_TIMEOUT, spin_cb)

  File 
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python2_PVR_Flink_Commit/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_common.py",
line 103, in _wait_once
wait_fn(timeout=timeout)
# Thread: 
  File "/usr/lib/python2.7/threading.py", line 359, in wait
_sleep(delay)
  File "apache_beam/runners/portability/portable_runner_test.py", line
82, in handler
raise BaseException(msg)
BaseException: Timed out after 60 seconds.


# Thread: <_Worker(Thread-18, started daemon 140198537066240)>

# Thread: 

--
# Thread: <_Worker(Thread-19, started daemon 140198528673536)>

Ran 82 tests in 461.409s

FAILED (errors=1, skipped=15)


Commands to detect style issues quickly before sending PR

2020-06-26 Thread Alex Amato
I sent out some PRs a few days ago, and quickly discovered a bunch of
errors and have been spending most of my time playing whack-a-mole without
knowing how to repro them all locally.

I asked this a few years ago, and wanted to make sure I have something up
to date to work with. Ideally, I'd like a single command line for
simplicity. Here is what I've been using. I'm not sure if we have a script
or gradle target which already covers this or not.

*Java*
time ./gradlew spotlessApply && ./gradlew checkstyleMain checkstyleTest
javadoc spotbugsMain compileJava compileTestJava

*Python *
./gradlew  :sdks:python:test-suites:tox:py2:lintPy27_3 && ./gradlew
:sdks:python:test-suites:tox:py37:lintPy37
&& ./gradlew :sdks:python:test-suites:tox:py38:formatter

(I think this might be correct, maybe there is a faster way to run it
directly with tox as well)


Though the python command is failing for me, perhaps I need to install
another python version. I think we have setup steps for those in the wiki...


creating
build/temp.linux-x86_64-3.8/third_party/protobuf/src/google/protobuf/util/internal

creating
build/temp.linux-x86_64-3.8/third_party/protobuf/src/google/protobuf/stubs

creating
build/temp.linux-x86_64-3.8/third_party/protobuf/src/google/protobuf/io

x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare
-DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat
-Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat
-Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC
-DHAVE_PTHREAD=1 -I. -Igrpc_root -Igrpc_root/include
-Ithird_party/protobuf/src -I/usr/include/python3.8
-I/usr/local/google/home/ajamato/beam/build/gradleenv/-1227304282/include/python3.8
-c grpc_tools/_protoc_compiler.cpp -o
build/temp.linux-x86_64-3.8/grpc_tools/_protoc_compiler.o -std=c++11
-fno-wrapv -frtti

grpc_tools/_protoc_compiler.cpp:216:10: fatal error: Python.h: No such
file or directory

  216 | #include "Python.h"

  |  ^~

compilation terminated.

error: command 'x86_64-linux-gnu-gcc' failed with exit status 1



ERROR: Command errored out with exit status 1:
/usr/local/google/home/ajamato/beam/build/gradleenv/-1227304282/bin/python3.8
-u -c 'import sys, setuptools, tokenize; sys.argv[0] =
'"'"'/tmp/pip-install-xmf_k_sy/grpcio-tools/setup.py'"'"';
__file__='"'"'/tmp/pip-install-xmf_k_sy/grpcio-tools/setup.py'"'"';f=getattr(tokenize,
'"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"',
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))'
install --record /tmp/pip-record-9bhhuq55/install-record.txt
--single-version-externally-managed --compile --install-headers
/usr/local/google/home/ajamato/beam/build/gradleenv/-1227304282/include/site/python3.8/grpcio-tools
Check the logs for full command output.


*> Task :sdks:python:test-suites:tox:py38:setupVirtualenv* FAILED


Re: I can't view my build scans

2020-06-25 Thread Alex Amato
Thanks for the suggestion Luke.

Unfortunately, this didn't resolve the issue. Maybe I will need to wait for
2.4.3. :/. Or I will have to remove spotless locally.

Maybe I should chime in on that thread as well and let them know it's
occurring for me on 2.4.2.

When I used 2.4.2 it had the same issue (seemed to warn that it also wanted
to use 2.3). I'm not sure where this warning is coming from, and if
resolving it would even help at all.

*BUILD FAILED* in 23s

1 actionable task: 1 executed


WARNING: Several versions of the build scan plugin were applied: [2.4.2,
2.3].

The build scan data was captured by version [2.4.2].

This is often caused by multiple init scripts and/or build scripts applying
the plugin.


Publishing build scan...

https://gradle.com/s/rdkcndx7zmdio



When I used 2.3, no warning was shown

*BUILD FAILED* in 21s

1 actionable task: 1 executed


Publishing build scan...

https://gradle.com/s/bzzudejssn4mw




On Thu, Jun 25, 2020 at 3:58 PM Luke Cwik  wrote:

> I have had a similar problem in the past and had reached out to
> the company on the gradle forum[1]. They debugged that it was an issue with
> how some plugin was generating some data that was causing issues on their
> end. They suggested using a newer version and eventually fixed the problem
> on their end as well. You could try the same.
>
> 1:
> https://discuss.gradle.org/t/your-build-scan-could-not-be-displayed-what-does-this-mean/33302
>
>
> On Thu, Jun 25, 2020 at 3:52 PM Alex Amato  wrote:
>
>> Hi, for some reason I get this error when I open my build scan URLs.
>>
>> Any ideas why this is occurring? :(
>> Thanks for taking a look at this.
>>
>> https://scans.gradle.com/s/wvqklwnjrl3ky
>>
>> Here is where I enable build scans. Any issue with the plugin version or
>> something? Though, I also had it occur when I commented out the plugin and
>> passed in --scan manually.
>>
>> ajamato@ajamato-linux0:~/beam$ cat ~/.gradle/init.d/buildScan.gradle
>>
>>
>> initscript {
>>
>>   repositories {
>>
>> gradlePluginPortal()
>>
>>   }
>>
>>   dependencies {
>>
>> classpath 'com.gradle:build-scan-plugin:2.0.2'
>>
>>   }
>>
>> }
>>
>> rootProject {
>>
>>   apply plugin: com.gradle.scan.plugin.BuildScanPlugin
>>
>>   buildScan {
>>
>> publishOnFailure()
>>
>> termsOfServiceUrl = 'https://gradle.com/terms-of-service'
>>
>> termsOfServiceAgree = 'yes'
>>
>>   }
>>
>> }
>>
>>
>>


Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics

2020-05-15 Thread Alex Amato
Thanks everyone. I was able to collect a lot of good feedback from everyone
who contributed. I am going to wrap it up for now and label the design as
"Design Finalized (Unimplemented)".

I really believe we have made a much better design than I initially wrote
up. I couldn't have done it without the help of everyone who offered their
time, energy and viewpoints. :)

Thanks again, please let me know if you see any major issues with the
design still. I think I have enough information to begin some
implementation as soon as I have some time in the coming weeks.
Alex

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics

On Thu, May 14, 2020 at 5:22 PM Alex Amato  wrote:

> Thanks to all who have spent their time on this, there were many great
> suggestions, just another reminder that tomorrow I will be finalizing the
> documents, unless there are any major objections left. Please take a look
> at it if you are interested.
>
> I will still welcome feedback at any time :).
>
> But I believe we have gathered enough information to produce a good
> design, which I will start to work on soon.
> I will begin to build the necessary subset of the new features proposed to
> support the BigQueryIO metrics use case, proposed.
> I will likely start with the python SDK first.
>
> https://s.apache.org/beam-gcp-debuggability
> https://s.apache.org/beam-histogram-metrics
>
>
> On Wed, May 13, 2020 at 3:07 PM Alex Amato  wrote:
>
>> Thanks again for more feedback :). I have iterated on things again. I'll
>> report back at the end of the week. If there are no major disagreements
>> still, I'll close the discussion, believe it to be in a good enough state
>> to start some implementation. But welcome feedback.
>>
>> Latest changes are changing the exponential format to allow denser
>> buckets. Using only two MonitoringInfoSpec now for all of the IOs to use.
>> Requiring some labels, but allowing optional
>> ones for specific IOs to provide more contents.
>>
>> https://s.apache.org/beam-gcp-debuggability
>> https://s.apache.org/beam-histogram-metrics
>>
>> On Mon, May 11, 2020 at 4:24 PM Alex Amato  wrote:
>>
>>> Thanks for the great feedback so far :). I've included many new ideas,
>>> and made some revisions. Both docs have changed a fair bit since the
>>> initial mail out.
>>>
>>> https://s.apache.org/beam-gcp-debuggability
>>> https://s.apache.org/beam-histogram-metrics
>>>
>>> PTAL and let me know what you think, and hopefully we can resolve major
>>> issues by the end of the week. I'll try to finalize things by then, but of
>>> course always stay open to your great ideas. :)
>>>
>>> On Wed, May 6, 2020 at 6:19 PM Alex Amato  wrote:
>>>
>>>> Thanks everyone so far for taking a look so far :).
>>>>
>>>> I am hoping to have this finalize the two reviews by the end of next
>>>> week, May 15th.
>>>>
>>>> I'll continue to follow up on feedback and make changes, and I will add
>>>> some more mentions to the documents to draw attention
>>>>
>>>> https://s.apache.org/beam-gcp-debuggability
>>>>  https://s.apache.org/beam-histogram-metrics
>>>>
>>>> On Wed, May 6, 2020 at 10:00 AM Luke Cwik  wrote:
>>>>
>>>>> Thanks, also took a look and left some comments.
>>>>>
>>>>> On Tue, May 5, 2020 at 6:24 PM Alex Amato  wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I created another design document. This time for GCP IO Debuggability
>>>>>> Metrics. Which defines some new metrics to collect in the GCP IO 
>>>>>> libraries.
>>>>>> This is for monitoring request counts and request latencies.
>>>>>>
>>>>>> Please take a look and let me know what you think:
>>>>>> https://s.apache.org/beam-gcp-debuggability
>>>>>>
>>>>>> I also sent out a separate design yesterday (
>>>>>> https://s.apache.org/beam-histogram-metrics) which is related as
>>>>>> this document uses a Histogram style metric :).
>>>>>>
>>>>>> I would love some feedback to make this feature the best possible :D,
>>>>>> Alex
>>>>>>
>>>>>


Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics

2020-05-14 Thread Alex Amato
Thanks to all who have spent their time on this; there were many great
suggestions. Just another reminder that tomorrow I will be finalizing the
documents, unless there are any major objections left. Please take a look
at them if you are interested.

I will still welcome feedback at any time :).

But I believe we have gathered enough information to produce a good design,
which I will start to work on soon.
I will begin to build the necessary subset of the new features proposed to
support the BigQueryIO metrics use case.
I will likely start with the python SDK first.

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics


On Wed, May 13, 2020 at 3:07 PM Alex Amato  wrote:

> Thanks again for more feedback :). I have iterated on things again. I'll
> report back at the end of the week. If there are no major disagreements
> still, I'll close the discussion, believe it to be in a good enough state
> to start some implementation. But welcome feedback.
>
> Latest changes are changing the exponential format to allow denser
> buckets. Using only two MonitoringInfoSpec now for all of the IOs to use.
> Requiring some labels, but allowing optional
> ones for specific IOs to provide more contents.
>
> https://s.apache.org/beam-gcp-debuggability
> https://s.apache.org/beam-histogram-metrics
>
> On Mon, May 11, 2020 at 4:24 PM Alex Amato  wrote:
>
>> Thanks for the great feedback so far :). I've included many new ideas,
>> and made some revisions. Both docs have changed a fair bit since the
>> initial mail out.
>>
>> https://s.apache.org/beam-gcp-debuggability
>> https://s.apache.org/beam-histogram-metrics
>>
>> PTAL and let me know what you think, and hopefully we can resolve major
>> issues by the end of the week. I'll try to finalize things by then, but of
>> course always stay open to your great ideas. :)
>>
>> On Wed, May 6, 2020 at 6:19 PM Alex Amato  wrote:
>>
>>> Thanks everyone so far for taking a look so far :).
>>>
>>> I am hoping to have this finalize the two reviews by the end of next
>>> week, May 15th.
>>>
>>> I'll continue to follow up on feedback and make changes, and I will add
>>> some more mentions to the documents to draw attention
>>>
>>> https://s.apache.org/beam-gcp-debuggability
>>>  https://s.apache.org/beam-histogram-metrics
>>>
>>> On Wed, May 6, 2020 at 10:00 AM Luke Cwik  wrote:
>>>
>>>> Thanks, also took a look and left some comments.
>>>>
>>>> On Tue, May 5, 2020 at 6:24 PM Alex Amato  wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I created another design document. This time for GCP IO Debuggability
>>>>> Metrics. Which defines some new metrics to collect in the GCP IO 
>>>>> libraries.
>>>>> This is for monitoring request counts and request latencies.
>>>>>
>>>>> Please take a look and let me know what you think:
>>>>> https://s.apache.org/beam-gcp-debuggability
>>>>>
>>>>> I also sent out a separate design yesterday (
>>>>> https://s.apache.org/beam-histogram-metrics) which is related as this
>>>>> document uses a Histogram style metric :).
>>>>>
>>>>> I would love some feedback to make this feature the best possible :D,
>>>>> Alex
>>>>>
>>>>


Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics

2020-05-13 Thread Alex Amato
Thanks again for more feedback :). I have iterated on things again, and I'll
report back at the end of the week. If there are still no major disagreements,
I'll close the discussion, believing it to be in a good enough state
to start some implementation. But I always welcome feedback.

The latest changes adjust the exponential format to allow denser buckets,
use only two MonitoringInfoSpecs now for all of the IOs, and require
some labels while allowing optional
ones so that specific IOs can provide more detail.

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics

On Mon, May 11, 2020 at 4:24 PM Alex Amato  wrote:

> Thanks for the great feedback so far :). I've included many new ideas, and
> made some revisions. Both docs have changed a fair bit since the initial
> mail out.
>
> https://s.apache.org/beam-gcp-debuggability
> https://s.apache.org/beam-histogram-metrics
>
> PTAL and let me know what you think, and hopefully we can resolve major
> issues by the end of the week. I'll try to finalize things by then, but of
> course always stay open to your great ideas. :)
>
> On Wed, May 6, 2020 at 6:19 PM Alex Amato  wrote:
>
>> Thanks everyone so far for taking a look so far :).
>>
>> I am hoping to have this finalize the two reviews by the end of next
>> week, May 15th.
>>
>> I'll continue to follow up on feedback and make changes, and I will add
>> some more mentions to the documents to draw attention
>>
>> https://s.apache.org/beam-gcp-debuggability
>>  https://s.apache.org/beam-histogram-metrics
>>
>> On Wed, May 6, 2020 at 10:00 AM Luke Cwik  wrote:
>>
>>> Thanks, also took a look and left some comments.
>>>
>>> On Tue, May 5, 2020 at 6:24 PM Alex Amato  wrote:
>>>
>>>> Hello,
>>>>
>>>> I created another design document. This time for GCP IO Debuggability
>>>> Metrics. Which defines some new metrics to collect in the GCP IO libraries.
>>>> This is for monitoring request counts and request latencies.
>>>>
>>>> Please take a look and let me know what you think:
>>>> https://s.apache.org/beam-gcp-debuggability
>>>>
>>>> I also sent out a separate design yesterday (
>>>> https://s.apache.org/beam-histogram-metrics) which is related as this
>>>> document uses a Histogram style metric :).
>>>>
>>>> I would love some feedback to make this feature the best possible :D,
>>>> Alex
>>>>
>>>


Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics

2020-05-11 Thread Alex Amato
Thanks for the great feedback so far :). I've included many new ideas, and
made some revisions. Both docs have changed a fair bit since the initial
mail out.

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics

PTAL and let me know what you think, and hopefully we can resolve major
issues by the end of the week. I'll try to finalize things by then, but of
course always stay open to your great ideas. :)

On Wed, May 6, 2020 at 6:19 PM Alex Amato  wrote:

> Thanks everyone so far for taking a look so far :).
>
> I am hoping to have this finalize the two reviews by the end of next week,
> May 15th.
>
> I'll continue to follow up on feedback and make changes, and I will add
> some more mentions to the documents to draw attention
>
> https://s.apache.org/beam-gcp-debuggability
>  https://s.apache.org/beam-histogram-metrics
>
> On Wed, May 6, 2020 at 10:00 AM Luke Cwik  wrote:
>
>> Thanks, also took a look and left some comments.
>>
>> On Tue, May 5, 2020 at 6:24 PM Alex Amato  wrote:
>>
>>> Hello,
>>>
>>> I created another design document. This time for GCP IO Debuggability
>>> Metrics. Which defines some new metrics to collect in the GCP IO libraries.
>>> This is for monitoring request counts and request latencies.
>>>
>>> Please take a look and let me know what you think:
>>> https://s.apache.org/beam-gcp-debuggability
>>>
>>> I also sent out a separate design yesterday (
>>> https://s.apache.org/beam-histogram-metrics) which is related as this
>>> document uses a Histogram style metric :).
>>>
>>> I would love some feedback to make this feature the best possible :D,
>>> Alex
>>>
>>


Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics

2020-05-06 Thread Alex Amato
Thanks everyone for taking a look so far :).

I am hoping to finalize the two reviews by the end of next week,
May 15th.

I'll continue to follow up on feedback and make changes, and I will add
some more mentions to the documents to draw attention.

https://s.apache.org/beam-gcp-debuggability
 https://s.apache.org/beam-histogram-metrics

On Wed, May 6, 2020 at 10:00 AM Luke Cwik  wrote:

> Thanks, also took a look and left some comments.
>
> On Tue, May 5, 2020 at 6:24 PM Alex Amato  wrote:
>
>> Hello,
>>
>> I created another design document. This time for GCP IO Debuggability
>> Metrics. Which defines some new metrics to collect in the GCP IO libraries.
>> This is for monitoring request counts and request latencies.
>>
>> Please take a look and let me know what you think:
>> https://s.apache.org/beam-gcp-debuggability
>>
>> I also sent out a separate design yesterday (
>> https://s.apache.org/beam-histogram-metrics) which is related as this
>> document uses a Histogram style metric :).
>>
>> I would love some feedback to make this feature the best possible :D,
>> Alex
>>
>


[Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics

2020-05-05 Thread Alex Amato
Hello,

I created another design document, this time for GCP IO Debuggability
Metrics, which defines some new metrics to collect in the GCP IO libraries.
This is for monitoring request counts and request latencies.

Please take a look and let me know what you think:
https://s.apache.org/beam-gcp-debuggability

I also sent out a separate design yesterday (
https://s.apache.org/beam-histogram-metrics) which is related as this
document uses a Histogram style metric :).

I would love some feedback to make this feature the best possible :D,
Alex


Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)

2020-05-04 Thread Alex Amato
Thanks Ismaël :). Done

On Mon, May 4, 2020 at 3:59 PM Ismaël Mejía  wrote:

> Moving the short link to this thread
> https://s.apache.org/beam-histogram-metrics
>
> Alex can you add this link (and any other of your documents that may
> not be there) to
> https://cwiki.apache.org/confluence/display/BEAM/Design+Documents
>
>
> On Tue, May 5, 2020 at 12:51 AM Pablo Estrada  wrote:
> >
> > FYI +Boyuan Zhang worked on implementing a histogram metric that was
> performance-optimized into outer space for Python : ) - I don't recall if
> she ended up getting it merged, but it's worth looking at the work. I also
> remember Scott Wegner wrote the metrics for Java.
> >
> > Best
> > -P.
> >
> > On Mon, May 4, 2020 at 3:33 PM Alex Amato  wrote:
> >>
> >> Hello,
> >>
> >> I have created a proposal for Apache Beam FN API to support Histogram
> Style Metrics. Which defines a method to collect Histogram style metrics
> and pass them over the FN API.
> >>
> >> I would love to hear your feedback in order to improve this proposal,
> please let me know what you think. Thanks for taking a look :)
> >> Alex
>


[Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)

2020-05-04 Thread Alex Amato
Hello,

I have created a proposal for Apache Beam FN API to support Histogram Style
Metrics, which defines a method to collect Histogram style metrics and pass
them over the FN API.

I would love to hear your feedback in order to improve this
proposal, please let me know what you think. Thanks for taking a look :)
Alex


Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics

2020-05-04 Thread Alex Amato
Sorry, wrong link. Let's close this thread and I'll send another...

On Mon, May 4, 2020 at 3:28 PM Pablo Estrada  wrote:

> Hi Alex!
> Thanks for the proposal. I've created
> https://s.apache.org/beam-histogram-metrics
>
> On Mon, May 4, 2020 at 2:44 PM Alex Amato  wrote:
>
>> Hello,
>>
>> I have created a proposal for Apache Beam FN API to support Histogram
>> Style Metrics
>> <https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit#heading=h.c6fjf0g6rsbc>.
>> Which defines a method to collect Histogram style metrics and pass them
>> over the FN API.
>>
>> Also, I would appreciate it if someone could generate an s.apache.org
>> link for this document? Unless there is some way for me to do it myself.
>>
>> I would love to hear your feedback in order to improve this
>> proposal, please let me know what you think. Thanks for taking a look :)
>> Alex
>>
>


[Proposal] Apache Beam Fn API - Histogram Style Metrics

2020-05-04 Thread Alex Amato
Hello,

I have created a proposal for Apache Beam FN API to support Histogram Style
Metrics, which defines a method to collect Histogram style metrics and pass
them over the FN API.

Also, I would appreciate it if someone could generate an s.apache.org link
for this document, unless there is some way for me to do it myself.

I would love to hear your feedback in order to improve this
proposal, please let me know what you think. Thanks for taking a look :)
Alex


Re: [VOTE + INPUT] Beam Mascot Designs, 2nd iteration - Deadline Friday, March 27

2020-03-26 Thread Alex Amato
1. Do you prefer red or black colored line art?

Black



On Thu, Mar 26, 2020 at 11:07 AM Pablo Estrada  wrote:

> 1. I am slightly inclined for black lines
> 2. I have two pieces of feedback:
> - It feels like the head and the eyes on this design are less elongated. I
> think I liked the more oval-like eyes from previous designs. (And the head
> also became less oval-like?, maybe?)
> - I like the white hand tips and stripes in the body from slide 22 of your
> previous deck. These lines are very easy to draw but I think they make the
> Firefly less flat.
>
> That's it from me!
> Thanks Julian, it's looking really good.
> Best
> -P.
>
> On Wed, Mar 25, 2020 at 10:00 PM Kenneth Knowles  wrote:
>
>> I assume that when this bug moves fast the tail will leave a cool light
>> trail.
>>
>> Kenn
>>
>> On Wed, Mar 25, 2020 at 5:45 PM Daniel Oliveira 
>> wrote:
>>
>>> 1. Do you prefer red or black colored line art?
>>>
>>>
>>> Red.
>>>
>>>
 2. Do you have any additional feedback about the mascot's shape or
 features?
>>>
>>>
>>> Love the new tail and new shadows.
>>>
>>> I like the wings better with color, but they still feel a bit dull to
>>> me. I feel they would be improved by having more vibrant colors near the
>>> tips, and possibly by going with more yellow-ish colors closer to the Beam
>>> logo. Compare with the wings from slide 10 of your previous deck
>>> ,
>>> which I like much better. Having the more vibrant color near the tips of
>>> the wings also pairs well with the new tail, which does the same thing with
>>> its yellow light.
>>>
>>> On Wed, Mar 25, 2020 at 12:11 PM Julian Bruno 
>>> wrote:
>>>
 Hello Apache Beam Community,

 Together with Aizhamal and her team, we have been working on the design
 of the Apache Beam mascot.

 We now need input from the community to continue moving forward with
 the design. Please share your input no later than Friday, March 27, at noon
 Pacific Time. Below you will find a link to the presentation of the work
 process and we are eager to know what you think of the current design [1].

 Our questions to you:

 1. Do you prefer red or black colored line art?

 2. Do you have any additional feedback about the mascot's shape or
 features?

 Please reply inline, so it is clear what exactly you are referring to. The
 vote and input phase will be open until Friday, March 27, at 12 pm Pacific
 Time. We will incorporate the feedback to the next design iteration of
 the mascot.

 Thank you,


 Julian Bruno // Visual Artist & Graphic Designer
  (510) 367-0551 / SF Bay Area, CA
 www.instagram.com/julbro.art

 [1]

  Mascot Weekly Update - 3/25
 



 ᐧ

>>>


Re: Thoughts on Covid19

2020-03-18 Thread Alex Amato
Well, you could try scaling it as an app to connect people. A simple web
architecture would be fastest to set up.

But I think a lot of people won't be able to use an app; if you had a phone
number with some operators to collect their information, then it could be
possible to get those users assistance.
There might be some privacy and security issues too around taking and
publishing people's information. So I am not too sure how to navigate that.

On Wed, Mar 18, 2020 at 3:48 PM Jan Lukavský  wrote:

> Hi Alex,
>
> great idea, thanks for that! Can we think of a solution that would be a
> little more scalable? Can we (e.g. via a mobile app) help connect people
> who need help with people who might offer help? Can we do this in
> reasonable time?
> On 3/18/20 11:42 PM, Alex Amato wrote:
>
> Here is one thing many people could do:
> - Contact your neighbors (leave a note on their door with your phone
> number) and find out if anyone is high risk and does not want to risk
> leaving their home. If you are lower risk and willing to go out. Insist
> that you can help them and obtain supplies for them. Or help them order
> online if they don't know how.
> - If there are neighbours who live alone, also give them your phone
> number. Help keep track of them in case they get sick.
>
> More technical and farfetched idea:
> - Building custom ventilators. In some locations they are already out of
> respirators, and they will need more. You could donate these to a hospital,
> though I am not sure if they would use them (but they might be willing to
> if there is no other option).
> There are a few blogs on how to build these from supplies available in a
> crisis. A little bit of DIY knowhow and it may be possible to build a few.
> Even a few low quality ventilators could save some lives. Though, it may be
> possible there are more skilled people or local shops already doing this.
> Helping them get supplies and funds is another option.
> https://www.instructables.com/id/The-Pandemic-Ventilator/
>
>
>
> On Wed, Mar 18, 2020 at 3:27 PM Jan Lukavský  wrote:
>
>> Hi,
>>
>> I'm taking this opportunity to speak to this "streaming first" and
>> "datadriven" community to try to do a little brainstorming. I'm not
>> trying to create any panic, I'd like to start a serious discussion about
>> solving a problem. I'm well aware this is not the primary use-case for
>> this mailing list, but we are in a sort of special situation. I think we
>> might share a know-how that might help people and so we could take
>> advantage of that. Currently, the biggest concern (at least in Europe)
>> seems to be separating people as much as possible. My questions would be:
>>
>>   - Can we try to think of ways to help people achieve better
>> separation? There are places people must go to (e.g. shopping food), can
>> we help planning this so that there are less peaks?
>>
>>   - Can we find any other ways to help prevent the virus spread? Or any
>> other benefits we can do for people (e.g. missing medical supplies,
>> missing work force, ...)
>>
>>   - Does anyone have any infrastructure or data that can be used for this?
>>
>>   - Would people be interested in investing some of their (hacking) time
>> to implement any "global" precaution(s)? IMO there seems to be no
>> "local" solution to this, currently.
>>
>> These are only a few questions from the top of my head, please feel free
>> to add any thoughts.
>>
>> Cheers,
>>
>>   Jan
>>
>>


Re: Thoughts on Covid19

2020-03-18 Thread Alex Amato
Here is one thing many people could do:
- Contact your neighbors (leave a note on their door with your phone
number) and find out if anyone is high risk and does not want to risk
leaving their home. If you are lower risk and willing to go out, insist
that you can help them and obtain supplies for them. Or help them order
online if they don't know how.
- If there are neighbours who live alone, also give them your phone number.
Help keep track of them in case they get sick.

More technical and farfetched idea:
- Building custom ventilators. In some locations they are already out of
respirators, and they will need more. You could donate these to a hospital,
though I am not sure if they would use them (but they might be willing to
if there is no other option).
There are a few blogs on how to build these from supplies available in a
crisis. A little bit of DIY knowhow and it may be possible to build a few.
Even a few low quality ventilators could save some lives. Though, it may be
possible there are more skilled people or local shops already doing this.
Helping them get supplies and funds is another option.
https://www.instructables.com/id/The-Pandemic-Ventilator/



On Wed, Mar 18, 2020 at 3:27 PM Jan Lukavský  wrote:

> Hi,
>
> I'm taking this opportunity to speak to this "streaming first" and
> "datadriven" community to try to do a little brainstorming. I'm not
> trying to create any panic, I'd like to start a serious discussion about
> solving a problem. I'm well aware this is not the primary use-case for
> this mailing list, but we are in a sort of special situation. I think we
> might share a know-how that might help people and so we could take
> advantage of that. Currently, the biggest concern (at least in Europe)
> seems to be separating people as much as possible. My questions would be:
>
>   - Can we try to think of ways to help people achieve better
> separation? There are places people must go to (e.g. shopping food), can
> we help planning this so that there are less peaks?
>
>   - Can we find any other ways to help prevent the virus spread? Or any
> other benefits we can do for people (e.g. missing medical supplies,
> missing work force, ...)
>
>   - Does anyone have any infrastructure or data that can be used for this?
>
>   - Would people be interested in investing some of their (hacking) time
> to implement any "global" precaution(s)? IMO there seems to be no
> "local" solution to this, currently.
>
> These are only a few questions from the top of my head, please feel free
> to add any thoughts.
>
> Cheers,
>
>   Jan
>
>


Re: Beam Emitted Metrics Reference

2020-03-02 Thread Alex Amato
MonitoringInfoSpecs is effectively a list of metrics
<https://github.com/apache/beam/blob/c0b60195b17b7cdc46d6ad6548cd41a967e71cde/model/pipeline/src/main/proto/metrics.proto#L60>,
but its purpose is to simply define how SDKs should populate MonitoringInfo
protos for a RunnerHarness to interpret.

These metrics are provided by the Java and Python SDKs, and Go will soon
provide all of them as well.

But there is no requirement for a particular runner to support any of these
metrics. The DataflowRunner will support these, and the metrics are
accessible via Dataflow APIs. I am not sure of the state of other runners.




On Mon, Mar 2, 2020 at 2:47 AM Etienne Chauchot 
wrote:

> Hi,
>
> There is a doc about metrics here:
> https://beam.apache.org/documentation/programming-guide/#metrics
>
> You can also export the metrics to sinks (REST http endpoint and
> Graphite), see MetricsOptions class for configuration.
>
> Still, there is no doc for export on website, I'll add some
>
> Best
>
> Etienne
> On 28/02/2020 01:07, Pablo Estrada wrote:
>
> Hi Daniel!
> I think +Alex Amato  had tried to have an inventory
> of metrics at some point.
> Other than that, I don't think we have a document outlining them.
>
> Can you talk about what you plan to do with them? Do you plan to export
> them somehow? Do you plan to add your own?
> Best
> -P.
>
> On Thu, Feb 27, 2020 at 11:33 AM Daniel Chen  wrote:
>
>> Hi all,
>>
>> I some questions about the reference to the framework metrics emitted by
>> Beam. I would like to leverage these metrics to allow better monitoring of
>> by Beam jobs but cannot find any references to the description or a
>> complete set of emitted metrics.
>>
>> Do we have this information documented anywhere?
>>
>> Thanks,
>> Daniel
>>
>


Re: Updating Metrics Counter in user defined thread

2020-01-17 Thread Alex Amato
The one workaround I can suggest is, if it's at all possible, to parallelize
the work by keying the data. This requires modifying the pipeline, i.e. the
first ParDo produces elements for different keys. Then follow that with a
GBK. Then the downstream ParDo will have a thread for every key. If those
threads all compute something and you need to combine those results in a
single place, you may need to produce elements which again rekey the data,
follow that with another GBK and a combiner.

Something sort of like this, if I understand correctly.

ParDo
GBK
ParDo
GBK
Combiner
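
To make the shape concrete, here is a minimal sketch in the Beam Java SDK. It
is only an illustration of the pattern above, not code from any real pipeline:
the Work and Result types and the chooseKey() and process() helpers are
hypothetical, and imports from org.apache.beam.sdk.transforms,
org.apache.beam.sdk.values and org.apache.beam.sdk.metrics are elided. The
point is that the per-key ParDo after the GroupByKey runs on the runner's own
worker threads, so Metrics updates there are recorded normally.

// Sketch only: Work, Result, chooseKey() and process() are hypothetical.
PCollection<KV<String, Work>> keyed = input.apply("FanOutByKey",
    ParDo.of(new DoFn<Work, KV<String, Work>>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        // Spread elements across keys so the runner parallelizes the work,
        // instead of spawning user threads inside the DoFn.
        c.output(KV.of(chooseKey(c.element()), c.element()));
      }
    }));

PCollection<Result> results = keyed
    .apply(GroupByKey.<String, Work>create())
    .apply("ProcessPerKey",
        ParDo.of(new DoFn<KV<String, Iterable<Work>>, Result>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            for (Work w : c.element().getValue()) {
              // This runs on the bundle-processing thread, so the metrics
              // container is set and the counter update is recorded.
              Metrics.counter("test", "counter2").inc();
              c.output(process(w));
            }
          }
        }));

// If the per-key results need to be combined in one place, rekey them
// (e.g. onto a single key), apply another GroupByKey, and combine.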

But this workaround may not necessarily work for all problems. The
metrics are designed to be aggregated within a single UDF, ParDo or
Combiner, so if you needed the counters to be aggregated across all of
these operations as well, then this may not work.

The baked-in assumption of using the thread-local setup is that
parallelism is typically handled by Beam, rather than introducing a
separate threading model. Though, perhaps breaking out of this threading
model is more common than we initially thought.

I hope that's helpful; sorry we don't have an easy fix.

On Fri, Jan 17, 2020 at 11:39 AM Robert Bradshaw 
wrote:

> Yes, this is an issue with how counters are implemented, and there's
> no good workaround. (We could use inheritable thread locals in Java,
> but that assumes the lifetime of the thread does not outlive the
> lifetime of the DoFn, and would probably work poorly with
> threadpools). In the meantime, one can update (say) a Map in the
> spawned threads and let the main thread in processElement (and likely
> finishBundle) increment the metrics in a threadsafe way based on the
> contents of the map.
>
> On Fri, Jan 17, 2020 at 11:29 AM Yixing Zhang 
> wrote:
> >
> > Hi Beam Committers,
> >
> > I am a developer on Beam Samza runner. Currently, we are seeing some
> issues where our users failed to update Metrics in their thread. I am
> wondering if anyone has suggestions on this issue.
> >
> > Problem:
> > MetricsContainer is ThreadLocal in MetricsEnvironment. Whenever
> DelegatingCounter.inc() is called. It tries to find the MetricsContainer in
> the current thread and update the corresponding CounterCell. For Samza
> runner, we have a FnWithMetricsWrapper to set the MetricsContainer for the
> current thread before each DoFn is run. However, if users define their own
> threads inside a Pardo function and try to update the Metrics in their
> threads, they will fail to update the Metrics and get error log "Unable to
> update metrics on the current thread".
> >
> > Example:
> >
> > pipeline
> > .apply(Create.of(inputData))
> > .apply(ParDo.of(new DoFn, Void>() {
> >   @ProcessElement
> >   public void processElement(ProcessContext context) {
> > Metrics.counter("test", "counter1").inc();
> > Thread thread = new Thread(() -> {
> >   Metrics.counter("test", "counter2").inc();
> > }, "a user-defined thread");
> > thread.start();
> >   }
> > }));
> >
> > In this case, counter1 can be updated but counter2 cannot be updated
> because MetricsContainer has not been set in their thread.
> >
> > We don't have any control of user-defined threads. So, it seems
> impossible for our runner to set the MetricsContainer for their threads.
> Can someone give me some suggestions either from developer's perspective or
> from user's perspective about how to make this use case work?
> >
> > Thanks,
> > Yixing
> >
>


Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-20 Thread Alex Amato
[ ] Beaver
[ ] Hedgehog
[ ] Lemur
[ ] Owl
[ ] Salmon
[ ] Trout
[X] Robot dinosaur
[ ] Firefly
[ ] Cuttlefish
[ ] Dumbo Octopus
[ ] Angler fish


On Wed, Nov 20, 2019 at 9:15 AM Kirill Kozlov 
wrote:

> [ ] Beaver
> [ ] Hedgehog
> [X] Lemur
> [X] Owl
> [ ] Salmon
> [ ] Trout
> [ ] Robot dinosaur
> [ ] Firefly
> [ ] Cuttlefish
> [ ] Dumbo Octopus
> [X] Angler fish
>
>
> On Wed, Nov 20, 2019, 08:38 Cyrus Maden  wrote:
>
>> Here's my vote, but I'm curious about the distinction between salmon and
>> trout mascots :)
>>
>> [ ] Beaver
>> [ ] Hedgehog
>> [ X] Lemur
>> [ ] Owl
>> [ X] Salmon
>> [ ] Trout
>> [ ] Robot dinosaur
>> [ X] Firefly
>> [ ] Cuttlefish
>> [ ] Dumbo Octopus
>> [ X] Angler fish
>>
>> On Wed, Nov 20, 2019 at 11:24 AM Allan Wilson 
>> wrote:
>>
>>>
>>>
>>> On 11/20/19, 8:44 AM, "Ryan Skraba"  wrote:
>>>
>>> *** Vote for as many as you like, using this checklist as a template
>>> 
>>>
>>> [] Beaver
>>> [X] Hedgehog
>>> [X ] Lemur
>>> [ ] Owl
>>> [ ] Salmon
>>> [] Trout
>>> [ ] Robot dinosaur
>>> [ ] Firefly
>>> [ ] Cuttlefish
>>> [ ] Dumbo Octopus
>>> [ ] Angler fish
>>>
>>>
>>>


Re: Gauge Metrics

2019-10-15 Thread Alex Amato
Would you elaborate on what you are expecting the behaviour to look like?
Ideally your runner would export gauges at a periodic interval.

The design of gauge is inherently unable to handle multiple updates to it
around the same time.

Consider the case of multiple machines reporting the gauge at the same
time. You can pick the one with the largest timestamp on each machine. Then,
when reported to a central metric service, it cannot compare timestamps in
a meaningful way, since they come from different machines with out-of-sync
clocks. Racy threads can be an issue as well (multiple bundles reporting
separate values for the gauge; the order is arbitrary based on thread
execution order, even on the same machine).

The current thinking around this, IIRC, is to try to document this and make
it clear in the usage of gauge:

   1. Gauges should only be used for values which are updated infrequently.
   2. Different gauge values reported from different workers near the same
   time cannot be reliably aggregated together into a single, "most recent"
   value.
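
As a small, hedged illustration of that guidance using the Beam Java metrics
API: the namespace, metric name, and currentQueueDepth() helper below are made
up for the example, and the imports of org.apache.beam.sdk.metrics.Gauge and
org.apache.beam.sdk.metrics.Metrics are elided. The value should be something
slowly changing that a single "latest observed" number can represent.

// Sketch only, inside a DoFn: report a slowly-changing value, not an aggregate.
private final Gauge queueDepth = Metrics.gauge("my-namespace", "queue_depth");

@ProcessElement
public void processElement(ProcessContext c) {
  long depth = currentQueueDepth();  // hypothetical, changes infrequently
  // Each set() overwrites the previous value; values reported by different
  // workers around the same time cannot be merged into one "most recent" value.
  queueDepth.set(depth);
}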






On Tue, Oct 15, 2019 at 9:55 AM Maximilian Michels  wrote:

> Hi,
>
> While adding metrics for the Python state cache [1], I was wondering
> about the story of Gauges in Beam. It seems like we only keep a value at
> a time and use a combiner [2] that replaces an old, possibly not
> reported gauge result, with a newer gauge result based on their timestamps.
>
> This behavior is an issue because if the SDK reports faster than the
> Runner queries, metrics will just be swallowed. Gauges seem important to
> get right because often users want to see all the values, e.g. in case
> of spikes in the data.
>
> What do you think about keeping all gauge values until they are reported?
>
> Thanks,
> Max
>
> [1] https://github.com/apache/beam/pull/9769
> [2]
>
> https://github.com/apache/beam/blob/fa74467b82e78962e9f170ad0d95fa6b345add67/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsContainerStepMap.java#L134
>


Re: [portability] Removing the old portable metrics API...

2019-10-09 Thread Alex Amato
@Robert Bradshaw  Dataflow is updated to use
MonitoringInfos.

This is specifically referring to the FN API Layer. Beam Python and Beam
Java export metrics using the new APIs. And the DataflowRunner harness is
consuming and using those. When I was removed from that project, most of
the metrics were implemented in the
Python and Java SDKs as MonitoringInfos.



Metric                     | Java SDK                                      | Python SDK | Go SDK
User Counters              | Done                                          | Done       | Legacy FN API
User Distributions         | Done                                          | Done       | Legacy FN API
Execution Time Start       | Done                                          | Done       | Not Started
Execution Time Process()   | Done                                          | Done       | Not Started
Execution Time Finish()    | Done                                          | Done       | Not Started
Element Count              | Done                                          | Done       | Legacy FN API
Sampled PColl Byte Size    | Pending (PR/8416; handoff instructions in     | Done       | Legacy FN API
                           | BEAM-7462)                                    |            |

And the Dataflow Java Runner Harness was consuming this. +Mikhail Gryzykhin
 implemented the runner harness layer.

To delete the deprecated stuff, we would need to get the Go SDK on
MonitoringInfos for what it has implemented so far.

Integration test coverage could be increased. But we wrote this test.


On Wed, Oct 9, 2019 at 10:51 AM Luke Cwik  wrote:

> One way would be to report both so this way we don't need to update the
> Dataflow Java implementation but other runners using the new API get all
> the metrics.
>
> On Mon, Oct 7, 2019 at 10:00 AM Robert Bradshaw 
> wrote:
>
>> Yes, Dataflow still uses the old API, for both counters and for its
>> progress/autoscaling mechanisms. We'd need to convert that over as
>> well (which is on the TODO list but lower than finishing up support
>> for portability in general).
>>
>> On Mon, Oct 7, 2019 at 9:56 AM Robert Burke  wrote:
>> >
>> > The Go SDK uses the old API [1], but it shouldn't be too hard to
>> migrate it.
>> >
>> > The main thing I'd want to do at the same time is move the dependencies
>> on the protos out of that package and have those live only in the harness
>> package [2]. I wasn't aware of that particular separation of concerns until
>> much later, but allows for alternative harness implementations.
>> >
>> > I have some other work to get the Per-DoFn profiling metrics (eleemnt
>> count, size, time) into the Go SDK this quarter, so I can handle this then.
>> >
>> > [1]
>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/metrics/metrics.go#L474
>> > [2]
>> https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/core/runtime/harness
>> >
>> > On Fri, Oct 4, 2019 at 6:14 PM Pablo Estrada 
>> wrote:
>> >>
>> >> Hello devs,
>> >> I recently took a look at how Dataflow is retrieving metrics from the
>> Beam SDK harnesses, and noticed something. As you may (or may not)
>> remember, the portability API currently has two ways of reporting metrics.
>> Namely, the newer MonitoringInfo API[1], and the older Metrics one[2].
>> >>
>> >> This is somewhat troublesome because now we have two things that do
>> the same thing. The SDKs report double the amount of metrics[3][4], and I
>> bet it's confusing for runner implementers.
>> >>
>> >> Luckily, it seems like the Flink and Spark runners do use the new API
>> [5][6] - yay! : ) - so I guess then the only runner that uses the old API
>> is Dataflow? (internally)
>> >>
>> >> Which way does the Samza runner use? +Hai Lu?
>> >> How about the Go SDK +Robert Burke ? - Ah I bet this uses the old API?
>> >>
>> >> If they all use the MonitoringInfos, we may be able to clean up the
>> old api, and move to the new one (somewhat)soon : )
>> >>
>> >> [1]
>> https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L395
>> >> [2]
>> https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L391
>> >> [3]
>> https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L406-L414
>> >> [4]
>> https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L378-L384
>> >>
>> >> [5]
>> https://github.com/apache/beam/blob/44fa33e6518574cb9561f47774e218e0910093fe/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L94-L97
>> >> [6]
>> https://github.com/apache/beam/blob/932bd80a17171bd2d8157820ffe09e8389a52b9b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L219-L226
>>
>


Re: ParDo Execution Time stat is always 0

2019-07-15 Thread Alex Amato
Perhaps no metric at all should be returned, instead of 0, which is an
incorrect value.

Also, is there a reason to have state_sampler_slow at all then, if it's not
intended to be implemented?

On Mon, Jul 15, 2019 at 5:03 PM Kyle Weaver  wrote:

> Pablo, what about setting a lower sampling rate? Or would that lead to
> poor results?
>
> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
> | +1650203
>
>
> On Mon, Jul 15, 2019 at 4:44 PM Pablo Estrada  wrote:
>
>> @Thomas do you think this is a problem of documentation, or a missing
>> feature?
>>
>> We did not add support for it without cython because the cost of locking
>> and checking every 200ms in Python would be too high - that's why this is
>> only implemented in the optimized Cython codepath. I think it makes sense
>> to document this, rather than adding the support, as it would be really
>> expensive. What are your thoughts?
>>
>> Best
>> -P.
>>
>> On Mon, Jul 15, 2019, 1:48 PM Thomas Weise  wrote:
>>
>>> That's great, but I think the JIRA needs to remain open since w/o Cython
>>> the metric still doesn't work.
>>>
>>> It would however be helpful to add a comment regarding your findings.
>>>
>>>
>>> On Mon, Jul 15, 2019 at 1:46 PM Rakesh Kumar 
>>> wrote:
>>>
>>>>
>>>> Installing cython in the application environment fixed the issue. Now I
>>>> am able to see the operator metrics ({organization_specific_prefix}
>>>> .operator.beam-metric-pardo_execution_time-process_bundle_
>>>> msecs-v1.gauge.mean)
>>>>
>>>> Thanks Ankur for looking into it and providing support.
>>>>
>>>> I am going to close  https://issues.apache.org/jira/browse/BEAM-7058 if
>>>> no one has any objection?
>>>>
>>>>
>>>> On Thu, Apr 11, 2019 at 7:13 AM Thomas Weise  wrote:
>>>>
>>>>> Tracked as https://issues.apache.org/jira/browse/BEAM-7058
>>>>>
>>>>>
>>>>> On Wed, Apr 10, 2019 at 11:38 AM Pablo Estrada 
>>>>> wrote:
>>>>>
>>>>>> This sounds like a bug then? +Alex Amato 
>>>>>>
>>>>>> On Wed, Apr 10, 2019 at 3:59 AM Maximilian Michels 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi @all,
>>>>>>>
>>>>>>>  From a quick debugging session, I conclude that the wiring is in
>>>>>>> place
>>>>>>> for the Flink Runner. There is a ProgressReporter that reports
>>>>>>> MonitoringInfos to Flink, in a similar fashion as the "legacy"
>>>>>>> Runner.
>>>>>>>
>>>>>>> The bundle duration metrics are 0, but the element count gets
>>>>>>> reported
>>>>>>> correctly. It appears to be an issue of the Python/Java harness
>>>>>>> because
>>>>>>> "ProcessBundleProgressResponse" contains only 0 values for the
>>>>>>> bundle
>>>>>>> duration.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Max
>>>>>>>
>>>>>>> On 04.04.19 19:54, Mikhail Gryzykhin wrote:
>>>>>>> > Hi everyone,
>>>>>>> >
>>>>>>> > Quick summary on python and Dataflow Runner:
>>>>>>> > Python SDK already reports:
>>>>>>> > - MSec
>>>>>>> > - User metrics (int64 and distribution)
>>>>>>> > - PCollection Element Count
>>>>>>> > - Work on MeanByteCount for pcollection is ongoing here
>>>>>>> > <https://github.com/apache/beam/pull/8062>.
>>>>>>> >
>>>>>>> > Dataflow Runner:
>>>>>>> > - all metrics listed above are passed through to Dataflow.
>>>>>>> >
>>>>>>> > Ryan can give more information on Flink Runner. I also see
>>>>>>> Maximilian on
>>>>>>> > some of relevant PRs, so he might comment on this as well.
>>>>>>> >
>>>>>>> > Regards,
>>>>>>> > Mikhail.
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Apr 4, 2019 at 10:43 AM Pablo Estrada >>>>>> > <

Re: Bucketed histogram metrics in beam. Anyone currently looking into this?

2019-07-15 Thread Alex Amato
Thanks Steve, is your fork available for me to see? Would you mind linking
me to the PRs you introduced to add the histogram support to the dataflow
worker

On Fri, Jul 12, 2019 at 11:52 AM Steve Niemitz  wrote:

> I've been doing some experiments in my own fork of the Dataflow worker
> using HdrHistogram [1] to record histograms.  I export them to our own
> stats collector, not Stackdriver, but have been having good success with
> them.
>
> The problem is that the dataflow worker metrics implementation is totally
> different than the beam metrics implementation, but the concept would
> translate pretty easily I imagine.
>
> [1] https://github.com/HdrHistogram/HdrHistogram
>
> On Fri, Jul 12, 2019 at 1:33 PM Pablo Estrada  wrote:
>
>> I am not aware of anyone working on this. I do recall a couple things:
>>
>> - These metrics can be very large in terms of space. Users may cause
>> themselves trouble if they define too many of them.
>> - Not enough reason not to do it, but certainly worth considering.
>> - There is some code added by Boyuan to develop highly efficient
>> histogram-type metrics.
>>
>> Best
>> -P.
>>
>> On Fri, Jul 12, 2019 at 10:21 AM Alex Amato  wrote:
>>
>>> Hi,
>>>
>>> I was wondering if anyone has any plans to introduce bucketed
>>> histogram to beam (different from Distribution, which is just min, max, sum
>>> and count values)? I have some thoughts about how it could be done so that
>>> it integrates with stackdriver.
>>>
>>> Essentially I am referring to a timeseries of histograms, displaying
>>> buckets of values at fixed windows in time.
>>>
>>


Bucketed histogram metrics in beam. Anyone currently looking into this?

2019-07-12 Thread Alex Amato
Hi,

I was wondering if anyone has any plans to introduce bucketed histogram to
beam (different from Distribution, which is just min, max, sum and count
values)? I have some thoughts about how it could be done so that it
integrates with stackdriver.

Essentially I am referring to a timeseries of histograms, displaying
buckets of values at fixed windows in time.
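
To make this concrete, here is a rough, hypothetical sketch (plain Python, not
an existing Beam API; the class and method names are invented for illustration)
of recording values into fixed buckets and keeping one snapshot per time window:

import bisect
import collections
import threading


class FixedBucketHistogram(object):
  """Illustrative sketch only: counts values into fixed bucket boundaries."""

  def __init__(self, boundaries):
    # Half-open buckets: (-inf, b0), [b0, b1), ..., [b_last, +inf).
    self._boundaries = sorted(boundaries)
    self._counts = [0] * (len(self._boundaries) + 1)
    self._lock = threading.Lock()

  def record(self, value):
    index = bisect.bisect_right(self._boundaries, value)
    with self._lock:
      self._counts[index] += 1

  def snapshot_and_reset(self):
    # Returns the per-bucket counts for the window that just ended.
    with self._lock:
      counts, self._counts = self._counts, [0] * len(self._counts)
    return counts


# A timeseries of histograms: one bucket-count snapshot per fixed window.
hist = FixedBucketHistogram(boundaries=[1, 5, 25, 125])
timeseries = collections.OrderedDict()
for window in range(3):
  for value in (0.5, 3, 3, 40, 600):  # stand-in for observed latencies
    hist.record(value)
  timeseries[window] = hist.snapshot_and_reset()
print(timeseries)

A runner or exporter would then map each snapshot onto whatever bucketing
scheme the backend (for example Stackdriver) expects.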


Re: 1 Million Lines of Code (1 MLOC)

2019-05-31 Thread Alex Amato
Interesting, so if we play with https://github.com/cgag/loc we could break
it down further? I.e. test files vs code files? Which folders, etc. That
could be interesting as well.

On Fri, May 31, 2019 at 4:20 PM Brian Hulette  wrote:

> Dennis Nedry needed 2 million lines of code to control Jurassic Park, and
> he only had to manage eight computers! I think we may actually need to pick
> up the pace.
>
> On Fri, May 31, 2019 at 4:11 PM Anton Kedin  wrote:
>
>> And to reduce the effort of future rewrites we should start doing it on a
>> schedule. I propose we start over once a week :)
>>
>> On Fri, May 31, 2019 at 4:02 PM Lukasz Cwik  wrote:
>>
>>> 1 million lines is too much, time to delete the entire project and start
>>> over again, :-)
>>>
>>> On Fri, May 31, 2019 at 3:12 PM Ankur Goenka  wrote:
>>>
 Thanks for sharing.
 These are really interesting metrics.
 One use I can see is to track LOC vs Comments to make sure that we keep
 up with the practice of writing maintainable code.

 On Fri, May 31, 2019 at 3:04 PM Ismaël Mejía  wrote:

> I was checking some metrics in our codebase and found by chance that
> we have passed the 1 million lines of code (MLOC). Of course lines of
> code may not matter much but anyway it is interesting to see the size
> of our project at this moment.
>
> This is the detailed information returned by loc [1]:
>
>
> -----------------------------------------------------------------------
>  Language        Files      Lines      Blank    Comment       Code
> -----------------------------------------------------------------------
>  Java             3681     673007      78265     140753     453989
>  Python            497     131082      22560      13378      95144
>  Go                333     105775      13681      11073      81021
>  Markdown          205      31989       6526          0      25463
>  Plain Text         11      21979       6359          0      15620
>  Sass               92       9867       1434       1900       6533
>  JavaScript         19       5157       1197        467       3493
>  YAML               14       4601        454       1104       3043
>  Bourne Shell       30       3874        470       1028       2376
>  Protobuf           17       4258        677       1373       2208
>  XML                17       2789        296        559       1934
>  Kotlin             19       3501        347       1370       1784
>  HTML               60       2447        148        914       1385
>  Batch               3        249         57          0        192
>  INI                 1        206         21         16        169
>  C++                 2         72          4         36         32
>  Autoconf            1         21          1         16          4
> -----------------------------------------------------------------------
>  Total            5002    1000874     132497     173987     694390
> -----------------------------------------------------------------------
>
> [1] https://github.com/cgag/loc
>



Re: How do I debug failing runners:google-cloud-dataflow-java:examples:verifyFnApiWorker task in presubmit

2019-05-28 Thread Alex Amato
PR link
https://github.com/apache/beam/pull/8416

On Tue, May 28, 2019 at 4:25 PM Alex Amato  wrote:

> I've had a lingering PR for about a month now. I'm trying to get this
> passing presubmits and submitted, but I don't have enough output from the
> failing task to debug this.
>
> I think it's from a wordcount timeout, but I don't know how to get more
> info. I don't think it's a dataflow job with any links to its running page.
> Can this test be launched somehow in a debugger?
>
>
> https://builds.apache.org/job/beam_PreCommit_JavaPortabilityApi_Commit/3210/console
>
> *19:09:11* Build timed out (after 120 minutes). Marking the build as 
> aborted.*19:09:11* Build was aborted*19:09:11* Recording test 
> results*19:09:13* *19:09:13* >* Task 
> :runners:google-cloud-dataflow-java:examples:verifyFnApiWorker* 
> FAILED*19:09:13* Could not stop 
> org.gradle.internal.actor.internal.DefaultActorFactory$NonBlockingActor@3bd68689.*19:09:13*
>  org.gradle.internal.dispatch.DispatchException: Could not dispatch message 
> [MethodInvocation method: 
> processTestClass(DefaultTestClassRunInfo(org.apache.beam.examples.WordCountIT))].*19:09:13*
>   at 
> org.gradle.internal.dispatch.ExceptionTrackingFailureHandler.dispatchFailed(ExceptionTrackingFailureHandler.java:34)*19:09:13*
>at 
> org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:31)*19:09:13*
>  at 
> org.gradle.internal.dispatch.AsyncDispatch.dispatchMessages(AsyncDispatch.java:87)*19:09:13*
>  at 
> org.gradle.internal.dispatch.AsyncDispatch.access$000(AsyncDispatch.java:36)*19:09:13*
>at 
> org.gradle.internal.dispatch.AsyncDispatch$1.run(AsyncDispatch.java:71)*19:09:13*
> at 
> org.gradle.internal.concurrent.InterruptibleRunnable.run(InterruptibleRunnable.java:42)*19:09:13*
> at 
> org.gradle.internal.operations.CurrentBuildOperationPreservingRunnable.run(CurrentBuildOperationPreservingRunnable.java:42)*19:09:13*
> at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)*19:09:13*
>  at 
> org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)*19:09:13*
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)*19:09:13*
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)*19:09:13*
> at 
> org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)*19:09:13*
>   at java.lang.Thread.run(Thread.java:748)*19:09:13* Caused by: 
> org.gradle.process.internal.ExecException: Process 'Gradle Test Executor 1' 
> finished with non-zero exit value 143*19:09:13* This problem might be caused 
> by incorrect test process configuration.*19:09:13* Please refer to the test 
> execution section in the User Manual at 
> https://docs.gradle.org/5.2.1/userguide/java_testing.html#sec:test_execution*19:09:13*
>at 
> org.gradle.api.internal.tasks.testing.worker.ForkingTestClassProcessor.stop(ForkingTestClassProcessor.java:163)*19:09:13*
> at 
> org.gradle.api.internal.tasks.testing.processors.RestartEveryNTestClassProcessor.endBatch(RestartEveryNTestClassProcessor.java:77)*19:09:13*
>  at 
> org.gradle.api.internal.tasks.testing.processors.RestartEveryNTestClassProcessor.processTestClass(RestartEveryNTestClassProcessor.java:55)*19:09:13*
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)*19:09:13*at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)*19:09:13*
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*19:09:13*
>   at java.lang.reflect.Method.invoke(Method.java:498)*19:09:13*   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)*19:09:13*
>at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)*19:09:13*
>at 
> org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:29)*19:09:13*
>  ... 11 more*19:09:14* Setting status of 
> 59fcfe4a0e37f9a057b03d1d8bb35f35d7748252 to FAILURE with url 
> https://builds.apache.org/job/beam_PreCommit_JavaPortabilityApi_Commit/3210/ 
> and message: 'FAILURE*19:09:14*  '*19:09:14* Using context: 
> JavaPortabilityApi ("Run JavaPortabilityApi PreCommit")*19:09:14* The message 
> received from the daemon indicates that the daemon has disappeared.*19:09:14* 
> Build request sent: Build{id=7ed96db1-2c38-4d51-b65d-d2ced0ec2e6d, 
> currentDir=/home/jenkins/jenkins-slave/workspace/beam_PreCommit_JavaPortabilityApi_Commit/src}*19:09:14*
>  Attempting to read 

How do I debug failing runners:google-cloud-dataflow-java:examples:verifyFnApiWorker task in presubmit

2019-05-28 Thread Alex Amato
I've had a lingering PR for about a month now. I'm trying to get this
passing presubmits and submitted, but I don't have enough output from the
failing task to debug this.

I think it's from a wordcount timeout, but I don't know how to get more
info. I don't think it's a dataflow job with any links to its running page.
Can this test be launched somehow in a debugger?

https://builds.apache.org/job/beam_PreCommit_JavaPortabilityApi_Commit/3210/console

*19:09:11* Build timed out (after 120 minutes). Marking the build as
aborted.*19:09:11* Build was aborted*19:09:11* Recording test
results*19:09:13* *19:09:13* >* Task
:runners:google-cloud-dataflow-java:examples:verifyFnApiWorker*
FAILED*19:09:13* Could not stop
org.gradle.internal.actor.internal.DefaultActorFactory$NonBlockingActor@3bd68689.*19:09:13*
org.gradle.internal.dispatch.DispatchException: Could not dispatch
message [MethodInvocation method:
processTestClass(DefaultTestClassRunInfo(org.apache.beam.examples.WordCountIT))].*19:09:13*
at 
org.gradle.internal.dispatch.ExceptionTrackingFailureHandler.dispatchFailed(ExceptionTrackingFailureHandler.java:34)*19:09:13*
at 
org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:31)*19:09:13*
at 
org.gradle.internal.dispatch.AsyncDispatch.dispatchMessages(AsyncDispatch.java:87)*19:09:13*
at 
org.gradle.internal.dispatch.AsyncDispatch.access$000(AsyncDispatch.java:36)*19:09:13*
at 
org.gradle.internal.dispatch.AsyncDispatch$1.run(AsyncDispatch.java:71)*19:09:13*
at 
org.gradle.internal.concurrent.InterruptibleRunnable.run(InterruptibleRunnable.java:42)*19:09:13*
at 
org.gradle.internal.operations.CurrentBuildOperationPreservingRunnable.run(CurrentBuildOperationPreservingRunnable.java:42)*19:09:13*
at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)*19:09:13*
at 
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)*19:09:13*
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)*19:09:13*
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)*19:09:13*
at 
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)*19:09:13*
at java.lang.Thread.run(Thread.java:748)*19:09:13* Caused by:
org.gradle.process.internal.ExecException: Process 'Gradle Test
Executor 1' finished with non-zero exit value 143*19:09:13* This
problem might be caused by incorrect test process
configuration.*19:09:13* Please refer to the test execution section in
the User Manual at
https://docs.gradle.org/5.2.1/userguide/java_testing.html#sec:test_execution*19:09:13*
at 
org.gradle.api.internal.tasks.testing.worker.ForkingTestClassProcessor.stop(ForkingTestClassProcessor.java:163)*19:09:13*
at 
org.gradle.api.internal.tasks.testing.processors.RestartEveryNTestClassProcessor.endBatch(RestartEveryNTestClassProcessor.java:77)*19:09:13*
at 
org.gradle.api.internal.tasks.testing.processors.RestartEveryNTestClassProcessor.processTestClass(RestartEveryNTestClassProcessor.java:55)*19:09:13*
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)*19:09:13*   at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)*19:09:13*
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*19:09:13*
at java.lang.reflect.Method.invoke(Method.java:498)*19:09:13*   at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)*19:09:13*
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)*19:09:13*
at 
org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:29)*19:09:13*
... 11 more*19:09:14* Setting status of
59fcfe4a0e37f9a057b03d1d8bb35f35d7748252 to FAILURE with url
https://builds.apache.org/job/beam_PreCommit_JavaPortabilityApi_Commit/3210/
and message: 'FAILURE*19:09:14*  '*19:09:14* Using context:
JavaPortabilityApi ("Run JavaPortabilityApi PreCommit")*19:09:14* The
message received from the daemon indicates that the daemon has
disappeared.*19:09:14* Build request sent:
Build{id=7ed96db1-2c38-4d51-b65d-d2ced0ec2e6d,
currentDir=/home/jenkins/jenkins-slave/workspace/beam_PreCommit_JavaPortabilityApi_Commit/src}*19:09:14*
Attempting to read last messages from the daemon log...*19:09:14*
Daemon pid: 23758*19:09:14*   log file:
/home/jenkins/.gradle/daemon/5.2.1/daemon-23758.out.log*19:09:14*
- Last  20 lines from daemon log file - daemon-23758.out.log
-*19:09:14* at
org.gradle.process.internal.DefaultExecHandle.execExceptionFor(DefaultExecHandle.java:232)*19:09:14*
at 
org.gradle.process.internal.DefaultExecHandle.setEndStateInfo(DefaultExecHandle.java:209)*19:09:14*
at 

CassandraIOTest failing in presubmit

2019-05-17 Thread Alex Amato
https://issues.apache.org/jira/browse/BEAM-7355?filter=-2

Error Message

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
tried for query failed (tried: localhost/127.0.0.1:9042
(com.datastax.driver.core.exceptions.TransportException: [localhost/
127.0.0.1:9042] Cannot connect))
Stacktrace

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
tried for query failed (tried: localhost/127.0.0.1:9042
(com.datastax.driver.core.exceptions.TransportException: [localhost/
127.0.0.1:9042] Cannot connect)) at
com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:268)
at
com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:107)
at
com.datastax.driver.core.Cluster$Manager.negotiateProtocolVersionAndConnect(Cluster.java:1652)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1571) at
com.datastax.driver.core.Cluster.init(Cluster.java:208) at
com.datastax.driver.core.Cluster.connectAsync(Cluster.java:376) at
com.datastax.driver.core.Cluster.connectAsync(Cluster.java:355) at
com.datastax.driver.core.Cluster.connect(Cluster.java:305) at
info.archinnov.achilles.embedded.AchillesInitializer.initializeFromParameters(AchillesInitializer.java:63)
at
info.archinnov.achilles.embedded.CassandraEmbeddedServer.(CassandraEmbeddedServer.java:64)
at
info.archinnov.achilles.embedded.CassandraEmbeddedServerBuilder.buildNativeCluster(CassandraEmbeddedServerBuilder.java:535)
at
org.apache.beam.sdk.io.cassandra.CassandraIOTest.beforeClass(CassandraIOTest.java:131)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at
org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
at
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) at
org.junit.rules.RunRules.evaluate(RunRules.java:20) at
org.junit.runners.ParentRunner.run(ParentRunner.java:396) at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at

Re: [BEAM-7164] Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-04-30 Thread Alex Amato
Thanks, updated the JIRA with a link to this thread and a note of what
could be done.

On Mon, Apr 29, 2019 at 10:29 AM Udi Meiri  wrote:

> Pip has a --cache-dir which should be safe with concurrent writes.
>
> On Fri, Apr 26, 2019 at 3:59 PM Ahmet Altay  wrote:
>
>> It is possible to download dependencies with pip to a local directory and
>> install from there [1]. As a side benefit this is supposed to speed up the
>> installation process. Since we setup virtualenv multiple times, this could
>> actually help us in a single run. And if we can keep this cache across test
>> runs we can reduce flakiness.
>>
>> [1]
>> https://pip.pypa.io/en/latest/user_guide/#installing-from-local-packages
>>
>> On Fri, Apr 26, 2019 at 3:42 PM Valentyn Tymofieiev 
>> wrote:
>>
>>> We do retry certain inherently flaky tests, for example, see[1]. This
>>> practice should be used with caution, see discussion [2].
>>>
>>> However retrying an individual test would not avoid the flake that Alex
>>> brought up in this thread, we'd have to retry setupVirtualEnv task that is
>>> executed once per suite of tests. Retrying just that task is different from
>>> retrying the whole suite.
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/516cdb6401d9fb7adb004de472771fb1fb3a92af/sdks/python/apache_beam/runners/worker/statesampler_test.py#L41,
>>> this was discussed
>>> [2]
>>> https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af@%3Cdev.beam.apache.org%3E
>>>
>>>
>>> On Fri, Apr 26, 2019 at 3:30 PM Udi Meiri  wrote:
>>>
>>>> Alex, I changed my mind: I'm okay retrying single tests, just not
>>>> entire suites of tests (e.g. if precommits take an hour, retrying the run
>>>> takes up an additional hour on the Jenkins machine).
>>>> This is more of an issue in Python, where gradle does not (currently)
>>>> have insight into which tests failed and how to retry just them.
>>>>
>>>>
>>>>
>>>> On Fri, Apr 26, 2019 at 2:17 PM Alex Amato  wrote:
>>>>
>>>>> @Udi Meiri , Is this true if the specific tests are
>>>>> rerun? I don't think we should rerun all tests.
>>>>>
>>>>> On Fri, Apr 26, 2019 at 12:11 PM Valentyn Tymofieiev <
>>>>> valen...@google.com> wrote:
>>>>>
>>>>>> Preinstalling dependencies may affect the dependency resolution, and
>>>>>> we may end up testing a different configuration than a user would have
>>>>>> after installing beam into a clean environment.
>>>>>>
>>>>>> I do think pip uses cache, unless one specifies "--no-cache-dir". By
>>>>>> default the cache is ~/.cache/pip. Looking up the log message in OP, we 
>>>>>> can
>>>>>> see several "Using cached..." log entries. Not sure why futures was not
>>>>>> fetched from cache or PyPi. Perhaps it is also a pip flake.
>>>>>>
>>>>>> I would be against wiping flakes under the rug by rerunning the whole
>>>>>> suite after an error, but re-rerunning parts of the test environment set
>>>>>> up, that are prone to environmental flakes, such as setupVirtualEnv seems
>>>>>> reasonable. I agree with Udi that care should be taken to not overload
>>>>>> Jenkins (e.g. retries should be limited)
>>>>>>
>>>>>


Re: [BEAM-7164] Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-04-26 Thread Alex Amato
@Udi Meiri , Is this true if the specific tests are
rerun? I don't think we should rerun all tests.

On Fri, Apr 26, 2019 at 12:11 PM Valentyn Tymofieiev 
wrote:

> Preinstalling dependencies may affect the dependency resolution, and we
> may end up testing a different configuration than a user would have after
> installing beam into a clean environment.
>
> I do think pip uses cache, unless one specifies "--no-cache-dir". By
> default the cache is ~/.cache/pip. Looking up the log message in OP, we can
> see several "Using cached..." log entries. Not sure why futures was not
> fetched from cache or PyPi. Perhaps it is also a pip flake.
>
> I would be against wiping flakes under the rug by rerunning the whole
> suite after an error, but re-rerunning parts of the test environment set
> up, that are prone to environmental flakes, such as setupVirtualEnv seems
> reasonable. I agree with Udi that care should be taken to not overload
> Jenkins (e.g. retries should be limited)
>


Good command to run before pushing java PRs.

2019-04-26 Thread Alex Amato
I asked about this on the dev list in the past. Just wanted to give an FYI
that some of the command names changed. "findBugsMain" -> "spotBugsMain".

FWIW, I now use this command:
./gradlew spotlessApply && ./gradlew checkstyleMain checkstyleTest javadoc
spotbugsMain compileJava compileTestJava
Hope this is useful, I put it on the wiki as well.


[BEAM-7165] FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-04-26 Thread Alex Amato
https://issues.apache.org/jira/browse/BEAM-7165

https://builds.apache.org/job/beam_PreCommit_Java_Commit/5634/testReport/junit/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/

Note: This test was flakey and fixed in BEAM-6491; I filed this new ticket
since I am not sure if it's the same issue.
Stacktrace

java.lang.AssertionError:
FileIO.MatchAll/Reshuffle.ViaRandomKey/Values/Values/Map/ParMultiDo(Anonymous).output:
Expected: iterable with items
[,
,
] in any
order but: not matched:
 at
org.apache.beam.sdk.testing.PAssert$PAssertionSite.capture(PAssert.java:169)
at org.apache.beam.sdk.testing.PAssert.that(PAssert.java:393) at
org.apache.beam.sdk.testing.PAssert.that(PAssert.java:385) at
org.apache.beam.sdk.io.FileIOTest.testMatchWatchForNewFiles(FileIOTest.java:262)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
at
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:265)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) at
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:349) at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314) at
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312) at
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292) at
org.junit.runners.ParentRunner.run(ParentRunner.java:396) at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at

Re: [BEAM-7164] Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-04-26 Thread Alex Amato
It would be ideal to not need manual steps. If known flaky tests can be
auto-retried, that would be a great improvement.
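
As a strawman, a bounded retry with back-off around the environment-setup step
could look something like the following (illustrative Python only; the helper
and the setup_virtualenv placeholder are invented, not an existing gradle or
Jenkins hook):

import random
import time


def retry_with_backoff(action, max_attempts=3, base_delay_secs=5.0):
  """Hypothetical helper: retries a flaky setup action a bounded number of times."""
  for attempt in range(1, max_attempts + 1):
    try:
      return action()
    except Exception as error:  # e.g. a transient pypi or network failure
      if attempt == max_attempts:
        raise
      # Exponential back-off with a little jitter so parallel jobs don't stampede.
      delay = base_delay_secs * (2 ** (attempt - 1)) + random.uniform(0, 1)
      print('Attempt %d failed (%s); retrying in %.1fs' % (attempt, error, delay))
      time.sleep(delay)


_attempts = {'count': 0}


def setup_virtualenv():
  # Placeholder for the real setupVirtualenv work (create the env, pip install
  # the deps); it fails twice here to simulate a transient flake.
  _attempts['count'] += 1
  if _attempts['count'] < 3:
    raise RuntimeError('simulated transient pypi failure')
  return 'virtualenv ready'


print(retry_with_backoff(setup_virtualenv, base_delay_secs=0.1))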

On Fri, Apr 26, 2019 at 11:24 AM Valentyn Tymofieiev 
wrote:

> We could do something along the lines of retry with a back-off. Note that
> Java tests also have this problem as we sometimes fail to fetch packages
> from Maven Central.
>
> On Fri, Apr 26, 2019 at 11:19 AM Pablo Estrada  wrote:
>
>> hm no, these are somewhat common. Yes, I think we could have retries to
>> try to fix this sort of problem.
>>
>> Perhaps a mixture of reusing a virtualenv, and having retries when
>> creating it?
>>
>> On Fri, Apr 26, 2019 at 11:15 AM Alex Amato  wrote:
>>
>>> Okay but this occurred on jenkins. So does the machine need an update?
>>>
>>> On Fri, Apr 26, 2019 at 10:43 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
>>>> I think you hit a pypi flake.
>>>>
>>>> pip install futures>=2.2.0 works fine for me.
>>>>
>>>> On Fri, Apr 26, 2019 at 9:41 AM Alex Amato  wrote:
>>>>
>>>>> Would be nice to fix this as it can slow down PRs. I am not sure if this 
>>>>> one is fixed on retry yet or not.
>>>>>
>>>>>
>>>>>
>>>>> *https://issues.apache.org/jira/browse/BEAM-7164?filter=-2 
>>>>> <https://issues.apache.org/jira/browse/BEAM-7164?filter=-2>*
>>>>>
>>>>>
>>>>>
>>>>> *https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull
>>>>> <https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull>*
>>>>>
>>>>>
>>>>> *18:05:44* >* Task 
>>>>> :beam-sdks-python-test-suites-dataflow:setupVirtualenv**18:05:44* New 
>>>>> python executable in 
>>>>> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python2.7*18:05:44*
>>>>>  Also creating executable in 
>>>>> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python*18:05:44*
>>>>>  Installing setuptools, pkg_resources, pip, wheel...done.*18:05:44* 
>>>>> Running virtualenv with interpreter /usr/bin/python2.7*18:05:44* 
>>>>> DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 
>>>>> 2020. Please upgrade your Python as Python 2.7 won't be maintained after 
>>>>> that date. A future version of pip will drop support for Python 
>>>>> 2.7.*18:05:44* Collecting tox==3.0.0*18:05:44*   Using cached 
>>>>> https://files.pythonhosted.org/packages/e6/41/4dcfd713282bf3213b0384320fa8841e4db032ddcb80bc08a540159d42a8/tox-3.0.0-py2.py3-none-any.whl*18:05:44*
>>>>>  Collecting grpcio-tools==1.3.5*18:05:44*   Using cached 
>>>>> https://files.pythonhosted.org/packages/05/f6/0296e29b1bac6f85d2a8556d48adf825307f73109a3c2c17fb734292db0a/grpcio_tools-1.3.5-cp27-cp27mu-manylinux1_x86_64.whl*18:05:44*
>>>>>  Collecting pluggy<1.0,>=0.3.0 (from tox==3.0.0)*18:05:44*   Using cached 
>>>>> https://files.pythonhosted.org/packages/84/e8/4ddac125b5a0e84ea6ffc93cfccf1e7ee1924e88f53c64e98227f0af2a5f/pluggy-0.9.0-py2.py3-none-any.whl*18:05:44*
>>>>>  Collecting six (from tox==3.0.0)*18:05:44*   Using cached 
>>>>> https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl*18:05:44*
>>>>>  Collecting virtualenv>=1.11.2 (from tox==3.0.0)*18:05:44*   Using cached 
>>>>> https://files.pythonhosted.org/packages/4f/ba/6f9315180501d5ac3e707f19fcb1764c26cc6a9a31af05778f7c2383eadb/virtualenv-16.5.0-py2.py3-none-any.whl*18:05:44*
>>>>>  Collecting py>=1.4.17 (from tox==3.0.0)*18:05:44*   Using cached 
>>>>> https://files.pythonhosted.org/packages/76/bc/394ad449851729244a97857ee14d7cba61ddb268dce3db538ba2f2ba1f0f/py-1.8.0-py2.py3-none-any.whl*18:05:44*
>>>>>  Collecting grpcio>=1.3.5 (from grpcio-tools==1.3.5)*18:05:44*   Using 
>>>>> cached 
>>>>> https://files.pythonhosted.org/packages/7c/59/4da8df60a74f4af73ede9d92a75ca85c94bc2a109d5f67061496e8d496b2/grpcio-1.20.0-cp27-cp27mu-manylinux1_x86_64.whl*18:05:44*
>>>>>  Collecting protobuf>=3.2.0 (from grpcio-tools==1.3.5)*18:05:44*   Using 
>>>>> cached 
>>>>> https://files.pythonhosted.org/packages/ea/72/5eadea03b06ca1320be2433ef2236155da17806b700efc92677ee99ae119/protobuf-3.7.1-cp27-cp27mu-manylinux1_x86_64.whl*18:05:44*
>>>>>  Collecting futures>=2.2.0; python_version < "3.2" (from 
>>>>> grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:44*   ERROR: Could not find a 
>>>>> version that satisfies the requirement futures>=2.2.0; python_version < 
>>>>> "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5) (from versions: 
>>>>> none)*18:05:44* ERROR: No matching distribution found for futures>=2.2.0; 
>>>>> python_version < "3.2" (from 
>>>>> grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:46* *18:05:46* >* Task 
>>>>> :beam-sdks-python-test-suites-dataflow:setupVirtualenv* FAILED*18:05:46*
>>>>>
>>>>>  
>>>>> <https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull>
>>>>>
>>>>>
>>>>>
>>>>>


Re: [BEAM-7164] Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-04-26 Thread Alex Amato
Okay but this occurred on jenkins. So does the machine need an update?

On Fri, Apr 26, 2019 at 10:43 AM Valentyn Tymofieiev 
wrote:

> I think you hit a pypi flake.
>
> pip install futures>=2.2.0 works fine for me.
>
> On Fri, Apr 26, 2019 at 9:41 AM Alex Amato  wrote:
>
>> Would be nice to fix this as it can slow down PRs. I am not sure if this one 
>> is fixed on retry yet or not.
>>
>>
>>
>> *https://issues.apache.org/jira/browse/BEAM-7164?filter=-2 
>> <https://issues.apache.org/jira/browse/BEAM-7164?filter=-2>*
>>
>>
>>
>> *https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull
>> <https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull>*
>>
>>
>> *18:05:44* >* Task 
>> :beam-sdks-python-test-suites-dataflow:setupVirtualenv**18:05:44* New python 
>> executable in 
>> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python2.7*18:05:44*
>>  Also creating executable in 
>> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python*18:05:44*
>>  Installing setuptools, pkg_resources, pip, wheel...done.*18:05:44* Running 
>> virtualenv with interpreter /usr/bin/python2.7*18:05:44* DEPRECATION: Python 
>> 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your 
>> Python as Python 2.7 won't be maintained after that date. A future version 
>> of pip will drop support for Python 2.7.*18:05:44* Collecting 
>> tox==3.0.0*18:05:44*   Using cached 
>> https://files.pythonhosted.org/packages/e6/41/4dcfd713282bf3213b0384320fa8841e4db032ddcb80bc08a540159d42a8/tox-3.0.0-py2.py3-none-any.whl*18:05:44*
>>  Collecting grpcio-tools==1.3.5*18:05:44*   Using cached 
>> https://files.pythonhosted.org/packages/05/f6/0296e29b1bac6f85d2a8556d48adf825307f73109a3c2c17fb734292db0a/grpcio_tools-1.3.5-cp27-cp27mu-manylinux1_x86_64.whl*18:05:44*
>>  Collecting pluggy<1.0,>=0.3.0 (from tox==3.0.0)*18:05:44*   Using cached 
>> https://files.pythonhosted.org/packages/84/e8/4ddac125b5a0e84ea6ffc93cfccf1e7ee1924e88f53c64e98227f0af2a5f/pluggy-0.9.0-py2.py3-none-any.whl*18:05:44*
>>  Collecting six (from tox==3.0.0)*18:05:44*   Using cached 
>> https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl*18:05:44*
>>  Collecting virtualenv>=1.11.2 (from tox==3.0.0)*18:05:44*   Using cached 
>> https://files.pythonhosted.org/packages/4f/ba/6f9315180501d5ac3e707f19fcb1764c26cc6a9a31af05778f7c2383eadb/virtualenv-16.5.0-py2.py3-none-any.whl*18:05:44*
>>  Collecting py>=1.4.17 (from tox==3.0.0)*18:05:44*   Using cached 
>> https://files.pythonhosted.org/packages/76/bc/394ad449851729244a97857ee14d7cba61ddb268dce3db538ba2f2ba1f0f/py-1.8.0-py2.py3-none-any.whl*18:05:44*
>>  Collecting grpcio>=1.3.5 (from grpcio-tools==1.3.5)*18:05:44*   Using 
>> cached 
>> https://files.pythonhosted.org/packages/7c/59/4da8df60a74f4af73ede9d92a75ca85c94bc2a109d5f67061496e8d496b2/grpcio-1.20.0-cp27-cp27mu-manylinux1_x86_64.whl*18:05:44*
>>  Collecting protobuf>=3.2.0 (from grpcio-tools==1.3.5)*18:05:44*   Using 
>> cached 
>> https://files.pythonhosted.org/packages/ea/72/5eadea03b06ca1320be2433ef2236155da17806b700efc92677ee99ae119/protobuf-3.7.1-cp27-cp27mu-manylinux1_x86_64.whl*18:05:44*
>>  Collecting futures>=2.2.0; python_version < "3.2" (from 
>> grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:44*   ERROR: Could not find a 
>> version that satisfies the requirement futures>=2.2.0; python_version < 
>> "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5) (from versions: 
>> none)*18:05:44* ERROR: No matching distribution found for futures>=2.2.0; 
>> python_version < "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:46* 
>> *18:05:46* >* Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv* 
>> FAILED*18:05:46*
>>
>>  
>> <https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull>
>>
>>
>>
>>


[BEAM-7164] Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-04-26 Thread Alex Amato
Would be nice to fix this as it can slow down PRs. I am not sure if
this one is fixed on retry yet or not.



*https://issues.apache.org/jira/browse/BEAM-7164?filter=-2
*



*https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull
*


*18:05:44* >* Task
:beam-sdks-python-test-suites-dataflow:setupVirtualenv**18:05:44* New
python executable in
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python2.7*18:05:44*
Also creating executable in
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python*18:05:44*
Installing setuptools, pkg_resources, pip, wheel...done.*18:05:44*
Running virtualenv with interpreter /usr/bin/python2.7*18:05:44*
DEPRECATION: Python 2.7 will reach the end of its life on January 1st,
2020. Please upgrade your Python as Python 2.7 won't be maintained
after that date. A future version of pip will drop support for Python
2.7.*18:05:44* Collecting tox==3.0.0*18:05:44*   Using cached
https://files.pythonhosted.org/packages/e6/41/4dcfd713282bf3213b0384320fa8841e4db032ddcb80bc08a540159d42a8/tox-3.0.0-py2.py3-none-any.whl*18:05:44*
Collecting grpcio-tools==1.3.5*18:05:44*   Using cached
https://files.pythonhosted.org/packages/05/f6/0296e29b1bac6f85d2a8556d48adf825307f73109a3c2c17fb734292db0a/grpcio_tools-1.3.5-cp27-cp27mu-manylinux1_x86_64.whl*18:05:44*
Collecting pluggy<1.0,>=0.3.0 (from tox==3.0.0)*18:05:44*   Using
cached 
https://files.pythonhosted.org/packages/84/e8/4ddac125b5a0e84ea6ffc93cfccf1e7ee1924e88f53c64e98227f0af2a5f/pluggy-0.9.0-py2.py3-none-any.whl*18:05:44*
Collecting six (from tox==3.0.0)*18:05:44*   Using cached
https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl*18:05:44*
Collecting virtualenv>=1.11.2 (from tox==3.0.0)*18:05:44*   Using
cached 
https://files.pythonhosted.org/packages/4f/ba/6f9315180501d5ac3e707f19fcb1764c26cc6a9a31af05778f7c2383eadb/virtualenv-16.5.0-py2.py3-none-any.whl*18:05:44*
Collecting py>=1.4.17 (from tox==3.0.0)*18:05:44*   Using cached
https://files.pythonhosted.org/packages/76/bc/394ad449851729244a97857ee14d7cba61ddb268dce3db538ba2f2ba1f0f/py-1.8.0-py2.py3-none-any.whl*18:05:44*
Collecting grpcio>=1.3.5 (from grpcio-tools==1.3.5)*18:05:44*   Using
cached 
https://files.pythonhosted.org/packages/7c/59/4da8df60a74f4af73ede9d92a75ca85c94bc2a109d5f67061496e8d496b2/grpcio-1.20.0-cp27-cp27mu-manylinux1_x86_64.whl*18:05:44*
Collecting protobuf>=3.2.0 (from grpcio-tools==1.3.5)*18:05:44*
Using cached 
https://files.pythonhosted.org/packages/ea/72/5eadea03b06ca1320be2433ef2236155da17806b700efc92677ee99ae119/protobuf-3.7.1-cp27-cp27mu-manylinux1_x86_64.whl*18:05:44*
Collecting futures>=2.2.0; python_version < "3.2" (from
grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:44*   ERROR: Could not find
a version that satisfies the requirement futures>=2.2.0;
python_version < "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5) (from
versions: none)*18:05:44* ERROR: No matching distribution found for
futures>=2.2.0; python_version < "3.2" (from
grpcio>=1.3.5->grpcio-tools==1.3.5)*18:05:46* *18:05:46* >* Task
:beam-sdks-python-test-suites-dataflow:setupVirtualenv*
FAILED*18:05:46*

 


Re: Implementation an S3 file system for python SDK - Updated

2019-04-08 Thread Alex Amato
+Lukasz Cwik , +Boyuan Zhang , +Lara
Schmidt 

Should splittable DoFn be considered in this design, in order to split and
scale the source step properly?

On Mon, Apr 8, 2019 at 9:11 AM Ahmet Altay  wrote:

> +dev  +Pablo Estrada  +Chamikara
> Jayalath  +Udi Meiri 
>
> Thank you Pasan. I quickly looked at the proposal and it looks good. Added
> a few folks who could offer additional feedback.
>
> On Mon, Apr 8, 2019 at 12:13 AM Pasan Kamburugamuwa <
> pasankamburugamu...@gmail.com> wrote:
>
>> Hi,
>>
>> I have updated the project proposal according to the given feedback. So
>> can you guys check my proposal again and give me your feedback about the
>> corrections I have made?
>>
>> Here is the link to the updated project proposal
>>
>> https://docs.google.com/document/d/1i_PoIrbmhNgwKCS1TYWC28A9RsyZQFsQCJic3aCXO-8/edit?usp=sharing
>>
>> Thank you
>> Pasan Kamburugamuwa
>>
>


test_split_crazy_sdf broken in python presubmit. 'DataInputOperation' object has no attribute 'index'

2019-04-04 Thread Alex Amato
https://jira.apache.org/jira/browse/BEAM-7006

https://builds.apache.org/job/beam_PreCommit_Python_Phrase/331/testReport/junit/apache_beam.runners.portability.fn_api_runner_test/FnApiRunnerSplitTest/test_split_crazy_sdf_2/

Traceback (most recent call last): File
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
line 157, in _execute response = task() File
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
line 216, in  lambda: self.progress_worker.do_instruction(request),
request) File
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
line 312, in do_instruction request.instruction_id) File
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/sdk_worker.py",
line 354, in process_bundle_split
process_bundle_split=processor.try_split(request)) File
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py",
line 588, in try_split desired_split.estimated_input_elements) File
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/worker/bundle_processor.py",
line 144, in try_split if total_buffer_size < self.index + 1:
AttributeError: 'DataInputOperation' object has no attribute 'index'


[BEAM-6835] beam-sdks-python-test-suites-tox-py35:sdist FAILED on clean branch

2019-03-14 Thread Alex Amato
https://issues.apache.org/jira/browse/BEAM-6835
Repro:
./gradlew lint :beam-sdks-python:docs

I tested this on a branch I just rebased from master, and the same issue is
occurring. I also recreated the virtual env.
https://gradle.com/s/lnlmhu2cggreg

I deleted the build directories, but it still generates multiple files and
fails.

---
> Task :beam-sdks-python-test-suites-tox-py35:sdist FAILED

FAILURE: Build completed with 2 failures.

1: Task failed with an exception.
---
* What went wrong:
Execution failed for task ':beam-sdks-python:sdist'.
> Expected directory '/usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/build' to contain exactly one file,
however, it contains more than one file.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or
--debug option to get more log output. Run with --scan to get full insights.
==

2: Task failed with an exception.
---
* What went wrong:
Execution failed for task ':beam-sdks-python-test-suites-tox-py35:sdist'.
> Expected directory '/usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/test-suites/tox/py35/build' to contain
exactly one file, however, it contains more than one file.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or
--debug option to get more log output. Run with --scan to get full insights.
==

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible
with Gradle 6.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See
https://docs.gradle.org/5.2.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 17s
4 actionable tasks: 4 executed

Publishing build scan...
https://gradle.com/s/lnlmhu2cggreg


Re: Looking for another reviewer and/or committer for beam metrics code.

2019-03-13 Thread Alex Amato
Thanks Pablo, appreciate that, I'll @ you again on the relevant PRs.

On Tue, Mar 12, 2019 at 6:24 PM Pablo Estrada  wrote:

> I've been looking at these recently, but I've opted to leave some stuff to
> Robert. I'm happy to pick them back up if we're looking to reduce the load
> on a single one : )
> Best
> -P.
>
> On Tue, Mar 12, 2019 at 5:29 PM Alex Amato  wrote:
>
>> Hi,
>>
>> Mikhail, Ryan and I have been working on some metric related PRs
>> recently. Robert has been reviewing and committing these changes, but we
>> would like another reviewer to get another opinion and reduce the volume
>> for Robert.
>>
>> Ryan's goal is to implement the GetMetrics API
>> <https://docs.google.com/document/d/1p7mRCUEigkrWickqCLCHBshrqQ97YIv1E5cZxJTKx3I/edit?ouid=113939718880580928184=docs_home=true>
>>  and
>> enable querying for metrics in the Flink Runner.
>>
>> Mikhail and I are finishing adding some standard metrics for the python
>> SDK and the Dataflow Runner, and writing integration tests (SDK metrics
>> design <https://s.apache.org/beam-fn-api-metrics>):
>>
>>- Element Count
>>- MeanByteCount
>>- User Distribution Metrics
>>
>> Please let us know if you would be interested in learning more about
>> metrics and get involved in reviewing the related PRs and we can add you as
>> a reviewer.
>>
>> Relevant PRs;
>> https://github.com/apache/beam/pull/7936
>> https://github.com/apache/beam/pull/7995
>> https://github.com/apache/beam/pull/8032
>> https://github.com/apache/beam/pull/7971
>> https://github.com/apache/beam/pull/7915
>> https://github.com/apache/beam/pull/7899
>>
>>
>>


Looking for another reviewer and/or committer for beam metrics code.

2019-03-12 Thread Alex Amato
Hi,

Mikhail, Ryan and I have been working on some metric related PRs recently.
Robert has been reviewing and committing these changes, but we would like
another reviewer to get another opinion and reduce the volume for Robert.

Ryan's goal is to implement the GetMetrics API and enable querying for
metrics in the Flink Runner.

Mikhail and I are finishing adding some standard metrics for the python SDK
and the Dataflow Runner, and writing integration tests (SDK metrics design):

   - Element Count
   - MeanByteCount
   - User Distribution Metrics
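
For anyone newer to this area, the user distribution metrics above are the
ones a pipeline author updates directly from a DoFn, along these lines (the
namespace and metric name below are just placeholders):

import apache_beam as beam
from apache_beam.metrics import Metrics


class MeasureElementSize(beam.DoFn):
  def __init__(self):
    # Placeholder namespace/name; a distribution tracks min, max, sum and count.
    self.size_dist = Metrics.distribution('my_namespace', 'element_size')

  def process(self, element):
    self.size_dist.update(len(element))
    yield element

The work in the PRs below is about reporting values like these through the
portability API and making them queryable from the runners.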

Please let us know if you would be interested in learning more about
metrics and get involved in reviewing the related PRs and we can add you as
a reviewer.

Relevant PRs;
https://github.com/apache/beam/pull/7936
https://github.com/apache/beam/pull/7995
https://github.com/apache/beam/pull/8032
https://github.com/apache/beam/pull/7971
https://github.com/apache/beam/pull/7915
https://github.com/apache/beam/pull/7899


Re: pylint command failing without an error? Blocked

2019-03-12 Thread Alex Amato
Ahh, thanks, somehow I missed that.

On Tue, Mar 12, 2019 at 11:32 AM Ahmet Altay  wrote:

> Error is in the logs you shared (from gradle scan)
>
> Running pycodestyle for module apache_beam  gen_protos.py  setup.py
> test_config.py:
> -> apache_beam/runners/dataflow/dataflow_metrics.py:49:1: E303 too many
> blank lines (4)
> -> apache_beam/runners/dataflow/dataflow_metrics.py:285:3: E303 too many
> blank lines (2)
> Command exited with non-zero status 1
>
> On Tue, Mar 12, 2019 at 11:27 AM Alex Amato  wrote:
>
>> Not sure how to proceed with my PR; this error says my code is 10/10, but
>> the command is still returning an error and fails.
>>
>> https://scans.gradle.com/s/47d4pkuf4tp46
>>
>>
>> :beam-sdks-python:lintPy27 FAILED
>> GLOB sdist-make: /usr/local/google/home/ajamato/go/src/
>> github.com/apache/beam/sdks/python/setup.py
>> py27-lint recreate: /usr/local/google/home/ajamato/go/src/
>> github.com/apache/beam/sdks/python/target/.tox/py27-lint
>> py27-lint installdeps: pycodestyle==2.3.1, pylint==1.9.3, future==0.16.0,
>> isort==4.2.15, flake8==3.5.0
>> WARNING:Discarding $PYTHONPATH from environment, to override specify
>> PYTHONPATH in 'passenv' in your configuration.
>> py27-lint inst: /usr/local/google/home/ajamato/go/src/
>> github.com/apache/beam/sdks/python/target/.tox/dist/apache-beam-2.12.0.dev0.zip
>> py27-lint installed: DEPRECATION: Python 2.7 will reach the end of its
>> life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't
>> be maintained after that date. A future version of pip will drop support
>> for Python
>> 2.7.,apache-beam==2.12.0.dev0,astroid==1.6.5,avro==1.8.2,backports.functools-lru-cache==1.5,certifi==2019.3.9,chardet==3.0.4,configparser==3.7.3,crcmod==1.7,dill==0.2.9,docopt==0.6.2,enum34==1.1.6,fastavro==0.21.19,flake8==3.5.0,funcsigs==1.0.2,future==0.16.0,futures==3.2.0,grpcio==1.19.0,hdfs==2.2.2,httplib2==0.11.3,idna==2.8,isort==4.2.15,lazy-object-proxy==1.3.1,mccabe==0.6.1,mock==2.0.0,monotonic==1.5,nose==1.3.7,numpy==1.16.2,oauth2client==3.0.0,pandas==0.23.4,parameterized==0.6.3,pbr==5.1.3,protobuf==3.7.0,pyarrow==0.11.1,pyasn1==0.4.5,pyasn1-modules==0.2.4,pycodestyle==2.3.1,pydot==1.2.4,pyflakes==1.6.0,PyHamcrest==1.9.0,pylint==1.9.3,pyparsing==2.3.1,python-dateutil==2.8.0,pytz==2018.9,PyVCF==0.6.8,PyYAML==3.13,requests==2.21.0,rsa==4.0,singledispatch==3.4.0.3,six==1.12.0,tenacity==5.0.3,typing==3.6.6,urllib3==1.24.1,wrapt==1.11.1
>> py27-lint runtests: PYTHONHASHSEED='4209602030'
>> py27-lint runtests: commands[0] | pylint --version
>> Using config file /usr/local/google/home/ajamato/go/src/
>> github.com/apache/beam/sdks/python/.pylintrc
>> pylint 1.9.3,
>> astroid 1.6.5
>> Python 2.7.14+ (default, Dec 5 2017, 15:17:02)
>> [GCC 7.2.0]
>> py27-lint runtests: commands[1] | pip --version
>> pip 19.0.3 from /usr/local/google/home/ajamato/go/src/
>> github.com/apache/beam/sdks/python/target/.tox/py27-lint/lib/python2.7/site-packages/pip
>> (python 2.7)
>> py27-lint runtests: commands[2] | /usr/local/google/home/ajamato/go/src/
>> github.com/apache/beam/sdks/python/scripts/run_tox_cleanup.sh
>> py27-lint runtests: commands[3] | time
>> /usr/local/google/home/ajamato/go/src/
>> github.com/apache/beam/sdks/python/scripts/run_pylint.sh
>> Skipping lint for generated files: bigquery_v2_client.py,
>> bigquery_v2_messages.py, dataflow_v1b3_client.py,
>> dataflow_v1b3_messages.py, storage_v1_client.py, storage_v1_messages.py,
>> proto2_coder_test_messages_pb2.py, beam_artifact_api_pb2_grpc.py,
>> beam_artifact_api_pb2.py, beam_expansion_api_pb2_grpc.py,
>> beam_expansion_api_pb2.py, beam_fn_api_pb2_grpc.py, beam_fn_api_pb2.py,
>> beam_job_api_pb2_grpc.py, beam_job_api_pb2.py,
>> beam_provision_api_pb2_grpc.py, beam_provision_api_pb2.py,
>> beam_runner_api_pb2_grpc.py, beam_runner_api_pb2.py, endpoints_pb2_grpc.py,
>> endpoints_pb2.py, metrics_pb2_grpc.py, metrics_pb2.py,
>> standard_window_fns_pb2_grpc.py, standard_window_fns_pb2.py
>> Running pylint for module apache_beam gen_protos.py setup.py
>> test_config.py:
>> Using config file /usr/local/google/home/ajamato/go/src/
>> github.com/apache/beam/sdks/python/.pylintrc
>> 
>> Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
>> Running pycodestyle for module apache_beam gen_protos.py setup.py
>> test_config.py:
>> apache_beam/runners/dataflow/dataflow_metrics.py:49:1: E303 too many
>> blank lines (4)
>> apache_beam/runners/dataflow/dataflow_metrics.py:285:3: E303 too many
>> blank lines (2)
>

pylint command failing without an error? Blocked

2019-03-12 Thread Alex Amato
Not sure how to proceed with my PR; this error says my code is 10/10, but
the command is still returning an error and fails.

https://scans.gradle.com/s/47d4pkuf4tp46


:beam-sdks-python:lintPy27 FAILED
GLOB sdist-make: /usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/setup.py
py27-lint recreate: /usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/target/.tox/py27-lint
py27-lint installdeps: pycodestyle==2.3.1, pylint==1.9.3, future==0.16.0,
isort==4.2.15, flake8==3.5.0
WARNING:Discarding $PYTHONPATH from environment, to override specify
PYTHONPATH in 'passenv' in your configuration.
py27-lint inst: /usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/target/.tox/dist/apache-beam-2.12.0.dev0.zip
py27-lint installed: DEPRECATION: Python 2.7 will reach the end of its life
on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be
maintained after that date. A future version of pip will drop support for
Python
2.7.,apache-beam==2.12.0.dev0,astroid==1.6.5,avro==1.8.2,backports.functools-lru-cache==1.5,certifi==2019.3.9,chardet==3.0.4,configparser==3.7.3,crcmod==1.7,dill==0.2.9,docopt==0.6.2,enum34==1.1.6,fastavro==0.21.19,flake8==3.5.0,funcsigs==1.0.2,future==0.16.0,futures==3.2.0,grpcio==1.19.0,hdfs==2.2.2,httplib2==0.11.3,idna==2.8,isort==4.2.15,lazy-object-proxy==1.3.1,mccabe==0.6.1,mock==2.0.0,monotonic==1.5,nose==1.3.7,numpy==1.16.2,oauth2client==3.0.0,pandas==0.23.4,parameterized==0.6.3,pbr==5.1.3,protobuf==3.7.0,pyarrow==0.11.1,pyasn1==0.4.5,pyasn1-modules==0.2.4,pycodestyle==2.3.1,pydot==1.2.4,pyflakes==1.6.0,PyHamcrest==1.9.0,pylint==1.9.3,pyparsing==2.3.1,python-dateutil==2.8.0,pytz==2018.9,PyVCF==0.6.8,PyYAML==3.13,requests==2.21.0,rsa==4.0,singledispatch==3.4.0.3,six==1.12.0,tenacity==5.0.3,typing==3.6.6,urllib3==1.24.1,wrapt==1.11.1
py27-lint runtests: PYTHONHASHSEED='4209602030'
py27-lint runtests: commands[0] | pylint --version
Using config file /usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/.pylintrc
pylint 1.9.3,
astroid 1.6.5
Python 2.7.14+ (default, Dec 5 2017, 15:17:02)
[GCC 7.2.0]
py27-lint runtests: commands[1] | pip --version
pip 19.0.3 from /usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/target/.tox/py27-lint/lib/python2.7/site-packages/pip
(python 2.7)
py27-lint runtests: commands[2] | /usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/scripts/run_tox_cleanup.sh
py27-lint runtests: commands[3] | time
/usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/scripts/run_pylint.sh
Skipping lint for generated files: bigquery_v2_client.py,
bigquery_v2_messages.py, dataflow_v1b3_client.py,
dataflow_v1b3_messages.py, storage_v1_client.py, storage_v1_messages.py,
proto2_coder_test_messages_pb2.py, beam_artifact_api_pb2_grpc.py,
beam_artifact_api_pb2.py, beam_expansion_api_pb2_grpc.py,
beam_expansion_api_pb2.py, beam_fn_api_pb2_grpc.py, beam_fn_api_pb2.py,
beam_job_api_pb2_grpc.py, beam_job_api_pb2.py,
beam_provision_api_pb2_grpc.py, beam_provision_api_pb2.py,
beam_runner_api_pb2_grpc.py, beam_runner_api_pb2.py, endpoints_pb2_grpc.py,
endpoints_pb2.py, metrics_pb2_grpc.py, metrics_pb2.py,
standard_window_fns_pb2_grpc.py, standard_window_fns_pb2.py
Running pylint for module apache_beam gen_protos.py setup.py test_config.py:
Using config file /usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/.pylintrc

Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Running pycodestyle for module apache_beam gen_protos.py setup.py
test_config.py:
apache_beam/runners/dataflow/dataflow_metrics.py:49:1: E303 too many blank
lines (4)
apache_beam/runners/dataflow/dataflow_metrics.py:285:3: E303 too many blank
lines (2)
Command exited with non-zero status 1
499.83user 15.71system 1:36.48elapsed 534%CPU (0avgtext+0avgdata
502996maxresident)k
8inputs+224outputs (0major+816639minor)pagefaults 0swaps
ERROR: InvocationError for command '/usr/bin/time
/usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/scripts/run_pylint.sh' (exited with code
1)
___ summary

ERROR: py27-lint: commands failed

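A note on why the task fails despite the clean pylint score: per the log above,
run_pylint.sh runs pycodestyle after pylint, so the 10.00/10 rating does not help
once pycodestyle reports the two E303 findings; the script then exits 1 and the
lintPy27 task fails. E303 is "too many blank lines": pycodestyle's defaults allow
at most two consecutive blank lines at module level and one inside a function or
class body. A minimal, hypothetical sketch (not the real dataflow_metrics.py code):

def flatten_results(metric_results):
  # Hypothetical helper, for illustration only.
  flattened = []


  for result in metric_results:  # pycodestyle reports "E303 too many blank lines (2)" here
    flattened.append(result)
  return flattened


def flatten_results_fixed(metric_results):
  # Same logic with the run of blank lines collapsed, which clears E303.
  flattened = []
  for result in metric_results:
    flattened.append(result)
  return flattened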

Re: [BEAM-6761] Pydoc is giving cryptic error messages, blocking my PR :(

2019-03-01 Thread Alex Amato
Yup, copying and pasting your comment worked. Thanks for the tip Udi!
Helped me out a lot.

I guess this tool ignores all the comments when counting lines? That makes it
hard for us to use, because we have a commented header.

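For reference, the key part of the fix quoted below is the "::" marker followed by
a blank line: it makes Sphinx treat the indented snippet as a literal block instead
of trying to parse it as reStructuredText, which is what produces the "Unexpected
indentation" / "unexpected unindent" warnings. A minimal sketch with a hypothetical
module docstring (not the real metric_result_matchers.py header):

"""Matchers for validating metrics in PipelineResults.

Example usage:
::

  result = my_pipeline.run()
  all_metrics = result.metrics().all_metrics()
  errors = metric_result_matchers.verify_all(all_metrics, matchers)
"""
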
On Fri, Mar 1, 2019 at 3:46 PM Udi Meiri  wrote:

> Try making it a literal block:
>
> """MetricResult matchers for validating metrics in PipelineResults.
>
> example usage:
> ::
>
> result = my_pipeline.run()
> all_metrics = result.metrics().all_metrics()
>
> matchers = [
>   MetricResultMatcher(
>   namespace='myNamespace',
>   name='myName',
>   step='myStep',
>   labels={
>   'pcollection': 'myCollection',
>   'myCustomKey': 'myCustomValue'
>   },
>   attempted=42,
>   committed=42
>   )
> ]
> errors = metric_result_matchers.verify_all(all_metrics, matchers)
> self.assertFalse(errors, errors)
>
> """
> https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html
>
> On Fri, Mar 1, 2019 at 3:31 PM Udi Meiri  wrote:
>
>> I think it's referring to the big comment at the top of the
>> sdks/python/apache_beam/testing/metric_result_matchers.py.
>> The line numbers are relative to the beginning of the block.
>>
>> On Fri, Mar 1, 2019 at 2:21 PM Alex Amato  wrote:
>>
>>> BEAM-6761 <https://issues.apache.org/jira/browse/BEAM-6761?filter=-2>
>>>
>>> This is blocking my PR at the moment; the output doesn't seem to match
>>> the file, and I am not sure how to proceed.
>>>
>>> pydoc Output
>>>
>>> https://scans.gradle.com/s/im6t66hhy4bdq/console-log?task=:beam-sdks-python:docs#L3
>>>
>>> Files
>>> https://github.com/apache/beam/pull/7936/files
>>>
>>>
>>>
>>> /usr/local/google/home/ajamato/go/src/
>>> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
>>> of apache_beam.testing.metric_result_matchers:13: WARNING: Unexpected
>>> indentation.
>>> /usr/local/google/home/ajamato/go/src/
>>> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
>>> of apache_beam.testing.metric_result_matchers:15: WARNING: Block quote ends
>>> without a blank line; unexpected unindent.
>>> /usr/local/google/home/ajamato/go/src/
>>> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
>>> of apache_beam.testing.metric_result_matchers:18: WARNING: Definition list
>>> ends without a blank line; unexpected unindent.
>>> /usr/local/google/home/ajamato/go/src/
>>> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
>>> of apache_beam.testing.metric_result_matchers:19: WARNING: Definition list
>>> ends without a blank line; unexpected unindent.
>>> /usr/local/google/home/ajamato/go/src/
>>> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
>>> of apache_beam.testing.metric_result_matchers:21: WARNING: Unexpected
>>> indentation.
>>> /usr/local/google/home/ajamato/go/src/
>>> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
>>> of apache_beam.testing.metric_result_matchers:22: WARNING: Block quote ends
>>> without a blank line; unexpected unindent.
>>>
>>>
>>>
>>> = copy of the file in its current state (I will probably modify the
>>> PR 
>>>
>>> https://pastebin.com/8bWrPZVJ
>>>
>>>
>>>
>>>


[BEAM-6761] Pydoc is giving cryptic error messages, blocking my PR :(

2019-03-01 Thread Alex Amato
BEAM-6761 

This is blocking my PR at the moment; the output doesn't seem to match the
file, and I am not sure how to proceed.

pydoc Output
https://scans.gradle.com/s/im6t66hhy4bdq/console-log?task=:beam-sdks-python:docs#L3


Files
https://github.com/apache/beam/pull/7936/files




/usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
of apache_beam.testing.metric_result_matchers:13: WARNING: Unexpected
indentation.
/usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
of apache_beam.testing.metric_result_matchers:15: WARNING: Block quote ends
without a blank line; unexpected unindent.
/usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
of apache_beam.testing.metric_result_matchers:18: WARNING: Definition list
ends without a blank line; unexpected unindent.
/usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
of apache_beam.testing.metric_result_matchers:19: WARNING: Definition list
ends without a blank line; unexpected unindent.
/usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
of apache_beam.testing.metric_result_matchers:21: WARNING: Unexpected
indentation.
/usr/local/google/home/ajamato/go/src/
github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
of apache_beam.testing.metric_result_matchers:22: WARNING: Block quote ends
without a blank line; unexpected unindent.



= copy of the file in its current state (I will probably modify the PR


https://pastebin.com/8bWrPZVJ


[BEAM-6759] CassandraIOTest failing in presubmit in multiple PRs

2019-03-01 Thread Alex Amato
https://issues.apache.org/jira/browse/BEAM-6759


Hi, I have seen this test failing in presubmit in multiple PRs, where it does
not seem to be related to the changes in those PRs. Any ideas why this is
failing at the moment?

CassandraIOTest - scans

https://builds.apache.org/job/beam_PreCommit_Java_Commit/4586/testReport/junit/org.apache.beam.sdk.io.cassandra/CassandraIOTest/classMethod/

https://scans.gradle.com/s/btppkeky63a5g/console-log?task=:beam-sdks-java-io-cassandra:test#L7



java.lang.NullPointerException at
org.cassandraunit.utils.EmbeddedCassandraServerHelper.dropKeyspacesWithNativeDriver(EmbeddedCassandraServerHelper.java:285)
at
org.cassandraunit.utils.EmbeddedCassandraServerHelper.dropKeyspaces(EmbeddedCassandraServerHelper.java:281)
at
org.cassandraunit.utils.EmbeddedCassandraServerHelper.cleanEmbeddedCassandra(EmbeddedCassandraServerHelper.java:193)
at
org.apache.beam.sdk.io.cassandra.CassandraIOTest.stopCassandra(CassandraIOTest.java:129)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at
org.junit.internal.runners.statements.RunAfters.invokeMethod(RunAfters.java:46)
at
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
at org.junit.runners.ParentRunner.run(ParentRunner.java:396) at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
at java.lang.Thread.run(Thread.java:748)


Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Alex Amato
@Michael, no particular reason. I think Kenn's suggestion makes more sense.

On Mon, Feb 25, 2019 at 10:36 AM Udi Meiri  wrote:

> Talking about Python:
> I only know of "./gradlew lint", which includes style and some py3
> compliance checking.
> There is no auto-fix like spotlessApply AFAIK.
>
> As a side-note, I really dislike our python line continuation indent rule,
> since pycharm can't be configured to adhere to it and I find myself
> manually adjusting whitespace all the time.
>
>
> On Mon, Feb 25, 2019 at 10:22 AM Kenneth Knowles  wrote:
>
>> FWIW gradle is a depgraph-based build system. You can gain a few seconds
>> by putting all but spotlessApply in one command.
>>
>> ./gradlew spotlessApply && ./gradlew checkstyleMain checkstyleTest
>> javadoc findbugsMain compileTestJava compileJava
>>
>> It might be clever to define a meta-task. Gradle's "base plugin" has the
>> notable tasks check (build and run tests), assemble (make artifacts), and
>> build (assemble + check, badly named!)
>>
>> I think something like "everything except running tests and building
>> artifacts" might be helpful.
>>
>> Kenn
>>
>> On Mon, Feb 25, 2019 at 10:13 AM Alex Amato  wrote:
>>
>>> I made a thread about this a while back for Java, but I don't think the
>>> same commands like spotless work for Python.
>>>
>>> auto-fixing lint issues
>>> running quick checks which would fail the PR (without running the
>>> whole precommit?)
>>> Something like findbugs to detect common issues (e.g. py3 compliance)
>>>
>>> FWIW, this is what I have been using for java. It will catch pretty much
>>> everything except presubmit test failures.
>>>
>>> ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
>>> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
>>> compileTestJava && ./gradlew compileJava
>>>
>>

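For the Python side of this, a hedged sketch of one way to get a quick local style
check on just the files you touched, without running the whole tox environment.
This is not official Beam tooling; it simply drives the pycodestyle package (the
same checker run_pylint.sh runs) directly, with an example file path:

import pycodestyle

# Hypothetical quick pre-push check; pass whichever files you changed.
style = pycodestyle.StyleGuide()
report = style.check_files(['apache_beam/runners/dataflow/dataflow_metrics.py'])
print('%d style issue(s) found' % report.total_errors)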

What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Alex Amato
I made a thread about this a while back for Java, but I don't think the same
commands like spotless work for Python.

auto-fixing lint issues
running quick checks which would fail the PR (without running the whole
precommit?)
Something like findbugs to detect common issues (e.g. py3 compliance)

FWIW, this is what I have been using for java. It will catch pretty much
everything except presubmit test failures.

./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
compileTestJava && ./gradlew compileJava


Re: Build Error This version of Gradle requires version 2.0.2 of the build scan plugin or later.

2019-02-19 Thread Alex Amato
I renamed my .gradle folder to make gradle generate a new one, and it was
working again for me.

On Tue, Feb 19, 2019 at 11:24 AM Michael Luckey  wrote:

> You adopted .gradle according to
> https://cwiki.apache.org/confluence/display/BEAM/Gradle+Tips -> Create
> Build Scan on failed builds
>
> So you probably do not want to delete, but edit.
>
> I missed that piece during migration to gradle 5. Sorry for the
> inconvenience
>
> On Tue, Feb 19, 2019 at 8:18 PM Alex Amato  wrote:
>
>> Is this some sort of locally installed dep? Can I just delete the .gradle
>> folder?
>>
>> On Tue, Feb 19, 2019 at 11:17 AM Alex Amato  wrote:
>>
>>> My gradle knowledge is a bit limited, printing that file:
>>>
>>> ajamato@ajamato-linux0:~/go/src/github.com/apache/beam$ cat
>>> /usr/local/google/home/ajamato/.gradle/init.d/buildScan.gradle
>>> initscript {
>>> repositories {
>>> maven { url 'https://plugins.gradle.org/m2' }
>>> }
>>> dependencies {
>>> classpath 'com.gradle:build-scan-plugin:1.13.1'
>>> }
>>> }
>>> rootProject {
>>> apply plugin: com.gradle.scan.plugin.BuildScanPlugin
>>> buildScan {
>>>     publishOnFailure()
>>> termsOfServiceUrl = 'https://gradle.com/terms-of-service'
>>> termsOfServiceAgree = 'yes'
>>> }
>>> }
>>>
>>>
>>>
>>>
>>> On Tue, Feb 19, 2019 at 10:59 AM Alex Amato  wrote:
>>>
>>>> Is there some local step I should take to upgrade the plugin? I assumed
>>>> the correct version is pulled in through gradle.
>>>>
>>>>
>>>>
>>>> https://github.com/apache/beam/blob/6ce9701dff711906e09eb163eeb7bdde47220c08/build.gradle#L22
>>>>
>>>> Though, does the use of apply false not override my local version or
>>>> something?
>>>>
>>>> > Task :buildSrc:build UP-TO-DATE
>>>> FAILURE: Build failed with an exception.
>>>> * Where:
>>>> Initialization script
>>>> '/usr/local/google/home/ajamato/.gradle/init.d/buildScan.gradle' line: 10
>>>> * What went wrong:
>>>> Failed to apply plugin [class 'com.gradle.scan.plugin.BuildScanPlugin']
>>>> > This version of Gradle requires version 2.0.2 of the build scan
>>>> plugin or later.
>>>>   Please see
>>>> https://gradle.com/scans/help/gradle-incompatible-plugin-version for
>>>> more information.
>>>> * Try:
>>>> Run with --stacktrace option to get the stack trace. Run with --info or
>>>> --debug option to get more log output. Run with --scan to get full 
>>>> insights.
>>>> * Get more help at https://help.gradle.org
>>>> BUILD FAILED in 0s
>>>> This version of Gradle requires version 2.0.2 of the build scan plugin
>>>> or later.
>>>> Please see
>>>> https://gradle.com/scans/help/gradle-incompatible-plugin-version for
>>>> more information.
>>>>
>>>>
>>>>


Re: Build Error This version of Gradle requires version 2.0.2 of the build scan plugin or later.

2019-02-19 Thread Alex Amato
My gradle knowledge is a bit limited, printing that file:

ajamato@ajamato-linux0:~/go/src/github.com/apache/beam$ cat
/usr/local/google/home/ajamato/.gradle/init.d/buildScan.gradle
initscript {
  repositories {
    maven { url 'https://plugins.gradle.org/m2' }
  }
  dependencies {
    classpath 'com.gradle:build-scan-plugin:1.13.1'
  }
}
rootProject {
  apply plugin: com.gradle.scan.plugin.BuildScanPlugin
  buildScan {
    publishOnFailure()
    termsOfServiceUrl = 'https://gradle.com/terms-of-service'
    termsOfServiceAgree = 'yes'
  }
}




On Tue, Feb 19, 2019 at 10:59 AM Alex Amato  wrote:

> Is there some local step I should take to upgrade the plugin? I assumed
> the correct version is pulled in through gradle.
>
>
>
> https://github.com/apache/beam/blob/6ce9701dff711906e09eb163eeb7bdde47220c08/build.gradle#L22
>
> Though, does the use of apply false not override my local version or
> something?
>
> > Task :buildSrc:build UP-TO-DATE
> FAILURE: Build failed with an exception.
> * Where:
> Initialization script
> '/usr/local/google/home/ajamato/.gradle/init.d/buildScan.gradle' line: 10
> * What went wrong:
> Failed to apply plugin [class 'com.gradle.scan.plugin.BuildScanPlugin']
> > This version of Gradle requires version 2.0.2 of the build scan plugin
> or later.
>   Please see
> https://gradle.com/scans/help/gradle-incompatible-plugin-version for more
> information.
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or
> --debug option to get more log output. Run with --scan to get full insights.
> * Get more help at https://help.gradle.org
> BUILD FAILED in 0s
> This version of Gradle requires version 2.0.2 of the build scan plugin or
> later.
> Please see
> https://gradle.com/scans/help/gradle-incompatible-plugin-version for more
> information.
>
>
>


Re: Build Error This version of Gradle requires version 2.0.2 of the build scan plugin or later.

2019-02-19 Thread Alex Amato
Is this some sort of locally installed dep? Can I just delete the .gradle
folder?

On Tue, Feb 19, 2019 at 11:17 AM Alex Amato  wrote:

> My gradle knowledge is a bit limited, printing that file:
>
> ajamato@ajamato-linux0:~/go/src/github.com/apache/beam$ cat
> /usr/local/google/home/ajamato/.gradle/init.d/buildScan.gradle
> initscript {
> repositories {
> maven { url 'https://plugins.gradle.org/m2' }
> }
> dependencies {
> classpath 'com.gradle:build-scan-plugin:1.13.1'
> }
> }
> rootProject {
> apply plugin: com.gradle.scan.plugin.BuildScanPlugin
> buildScan {
> publishOnFailure()
> termsOfServiceUrl = 'https://gradle.com/terms-of-service'
> termsOfServiceAgree = 'yes'
>     }
> }
>
>
>
>
> On Tue, Feb 19, 2019 at 10:59 AM Alex Amato  wrote:
>
>> Is there some local step I should take to upgrade the plugin? I assumed
>> the correct version is pulled in through gradle.
>>
>>
>>
>> https://github.com/apache/beam/blob/6ce9701dff711906e09eb163eeb7bdde47220c08/build.gradle#L22
>>
>> Though, does the use of apply false not override my local version or
>> something?
>>
>> > Task :buildSrc:build UP-TO-DATE
>> FAILURE: Build failed with an exception.
>> * Where:
>> Initialization script
>> '/usr/local/google/home/ajamato/.gradle/init.d/buildScan.gradle' line: 10
>> * What went wrong:
>> Failed to apply plugin [class 'com.gradle.scan.plugin.BuildScanPlugin']
>> > This version of Gradle requires version 2.0.2 of the build scan plugin
>> or later.
>>   Please see
>> https://gradle.com/scans/help/gradle-incompatible-plugin-version for
>> more information.
>> * Try:
>> Run with --stacktrace option to get the stack trace. Run with --info or
>> --debug option to get more log output. Run with --scan to get full insights.
>> * Get more help at https://help.gradle.org
>> BUILD FAILED in 0s
>> This version of Gradle requires version 2.0.2 of the build scan plugin or
>> later.
>> Please see
>> https://gradle.com/scans/help/gradle-incompatible-plugin-version for
>> more information.
>>
>>
>>


Build Error This version of Gradle requires version 2.0.2 of the build scan plugin or later.

2019-02-19 Thread Alex Amato
Is there some local step I should take to upgrade the plugin? I assumed the
correct version is pulled in through gradle.


https://github.com/apache/beam/blob/6ce9701dff711906e09eb163eeb7bdde47220c08/build.gradle#L22

Though, does the use of apply false not override my local version or
something?

> Task :buildSrc:build UP-TO-DATE
FAILURE: Build failed with an exception.
* Where:
Initialization script
'/usr/local/google/home/ajamato/.gradle/init.d/buildScan.gradle' line: 10
* What went wrong:
Failed to apply plugin [class 'com.gradle.scan.plugin.BuildScanPlugin']
> This version of Gradle requires version 2.0.2 of the build scan plugin or
later.
  Please see
https://gradle.com/scans/help/gradle-incompatible-plugin-version for more
information.
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or
--debug option to get more log output. Run with --scan to get full insights.
* Get more help at https://help.gradle.org
BUILD FAILED in 0s
This version of Gradle requires version 2.0.2 of the build scan plugin or
later.
Please see https://gradle.com/scans/help/gradle-incompatible-plugin-version
for more information.


Re: Signing off

2019-02-15 Thread Alex Amato
Thanks for your contributions, Scott. We will miss you.

On Fri, Feb 15, 2019 at 7:08 AM Etienne Chauchot 
wrote:

> Thank you for your contributions Scott ! Your new project seems very fun.
> Enjoy !
>
> Etienne
>
> On Friday, February 15, 2019 at 15:01 +0100, Ismaël Mejía wrote:
>
> Your work and willingness to make Beam better will be missed.
>
> Good luck for the next phase!
>
>
> On Fri, Feb 15, 2019 at 1:39 PM Łukasz Gajowy  wrote:
>
>
> Good luck!
>
>
> On Fri, Feb 15, 2019 at 11:24 Alexey Romanenko  wrote:
>
>
> Good luck, Scott, with your new adventure!
>
>
> On 15 Feb 2019, at 11:22, Maximilian Michels  wrote:
>
>
> Thank you for your contributions Scott. Best of luck!
>
>
> On 15.02.19 10:48, Michael Luckey wrote:
>
>
> Hi Scott,
>
> yes, thanks for all your time and all the best!
>
> michel
>
> On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles  > wrote:
>
>+1
>
>Thanks for the contributions to community & code, and enjoy the new
>
>chapter!
>
>Kenn
>
>On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise 
>> wrote:
>
>Hi Scott,
>
>Thank you for the many contributions to Beam and best of luck
>
>with the new endeavor!
>
>Thomas
>
>On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner 
>> wrote:
>
>I wanted to let you all know that I've decided to pursue a
>
>new adventure in my career, which will take me away from
>
>Apache Beam development.
>
>It's been a fun and fulfilling journey. Apache Beam has been
>
>my first significant experience working in open source. I'm
>
>inspired observing how the community has come together to
>
>deliver something great.
>
>Thanks for everything. If you're curious what's next: I'll
>
>be working on Federated Learning at Google:
>
>
> https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>
>Take care,
>
>Scott
>
>Got feedback? tinyurl.com/swegner-feedback
>
>
>
>
>
>


[BEAM-6671] Possible dependency issue in 2.9.0 NoSuchFieldError

2019-02-14 Thread Alex Amato
Filed this:
https://issues.apache.org/jira/browse/BEAM-6671

I received a report from a Dataflow user encountering this in Beam 2.9.0
when creating a Spanner instance. I wanted to post this here because this
error has been tied to dependency conflicts in the past (
https://stackoverflow.com/questions/46684071/error-using-spannerio-in-apache-beam).
Does anyone have an idea of the root cause here? I am trying to get a bit
more information from the user in the meantime, to see if they added any
extra deps of their own. But I wanted to mention it here as well.


java.lang.NoSuchFieldError:
internal_static_google_rpc_LocalizedMessage_fieldAccessorTable
at
com.google.rpc.LocalizedMessage.internalGetFieldAccessorTable(LocalizedMessage.java:90)
at
com.google.protobuf.GeneratedMessageV3.getDescriptorForType(GeneratedMessageV3.java:121)
at io.grpc.protobuf.ProtoUtils.keyForProto(ProtoUtils.java:67)
at
com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor.(SpannerErrorInterceptor.java:47)
at
com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.(GrpcSpannerRpc.java:136)
at
com.google.cloud.spanner.SpannerOptions$DefaultSpannerRpcFactory.create(SpannerOptions.java:73)


Is there a reason why these are error logs? Missing required coder_id on grpc_port

2019-02-12 Thread Alex Amato
These errors are very spammy in certain jobs. I was wondering if we could
reduce the log level, or put some conditions around this?

https://github.com/apache/beam/search?q=Missing+required+coder_id+on+grpc_port_q=Missing+required+coder_id+on+grpc_port

