Re: Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Michał Walenia
That's interesting; the command works for me (it crashes on communication
with GCP, but that was expected). From the log output it seems you're using
Windows.
Did you use WSL to run this? Do you have an option to use Linux to check
this command? The JSON created in the Gradle script may be treated
differently in CMD (due to quoting differences between it and Bash/Zsh/
other shells).

I'd try running this somewhere other than CMD.
Good luck!

@Brian: the -DbeamTestPipelineOptions parameter is constructed by Gradle from
an object via JsonOutput, so it should be valid JSON when read back.
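
For reference, a rough sketch of the quoting difference (untested, values
taken from this thread): in Bash/Zsh single quotes protect the JSON,

  -DbeamTestPipelineOptions='["--project=apache-beam-testing"]'

while in CMD single quotes are ordinary characters, so the inner double
quotes would need to be escaped instead, something like:

  -DbeamTestPipelineOptions="[\"--project=apache-beam-testing\"]"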

On Thu, Nov 28, 2019 at 7:39 AM Rehman Murad Ali <
rehman.murad...@venturedive.com> wrote:

> Thank you, Brian and Michal, for replying. Here is the full command:
>
> ./gradlew
> :runners:google-cloud-dataflow-java:examples:preCommitLegacyWorker
> -PdataflowProject=apache-beam-testing -Pproject=apache-beam-testing
> -PgcpProject=apache-beam-testing
> -PgcsTempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java
> -PdataflowTempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java
>
>
> Source: Confluence Java Tips
> 
>
>
>
> *Thanks & Regards*
>
>
> *Rehman Murad Ali*
>
> Software Engineer
> Mobile: +92 3452076766
> Skype: rehman,muradali
>
> 
>
>
> On Thu, Nov 28, 2019 at 10:42 AM Zohaib Baig 
> wrote:
>
>> +Rehman Murad Ali 
>>
>> On Thu, Nov 28, 2019 at 2:58 AM Brian Hulette 
>> wrote:
>>
>>> It looks like you passed an argument like
>>> -DbeamTestPipelineOptions 
>>> "[--project=apache-beam-testing,--tempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java,--runner=TestDataflowRunner,--dataflowWorkerJar=D:\\Workspace\\apache\\beam\\runners\\google-cloud-dataflow-java\\worker\\legacy-worker\\build\\libs\\beam-runners-google-cloud-dataflow-java-legacy-worker-2.18.0-SNAPSHOT.jar,]",
>>> but the string inside the quotes needs to be a valid JSON array of strings.
>>> If you change it to something like -DbeamTestPipelineOptions
>>> '["--project=apache-beam-testing",...]' you should get past that error.
>>>
>>> Agree with Michał though that we could help best if you share your full
>>> command line.
>>>
>>> Brian
>>>
>>> On Wed, Nov 27, 2019 at 8:20 AM Michał Walenia <
>>> michal.wale...@polidea.com> wrote:
>>>
 Hi,
 can you please post the command you used in the terminal? It seems you
 used a wrong combination of quotes, but I'd need to see it to be sure.
 Cheers,
 Michal

 On Wed, Nov 27, 2019 at 5:11 PM Rehman Murad Ali <
 rehman.murad...@venturedive.com> wrote:

> Hi Community,
>
> I have recently been trying to test Dataflow jobs with Beam. I have set
> up a Google Cloud account and tried to copy a file from the local system
> to Google Cloud Storage (which works fine).
>
> Now I am trying to run the preCommitLegacyWorker task locally and I am
> getting the following error:
>
> Unable to instantiate test options from system property
> beamTestPipelineOptions
>
>
> Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected
> character ('-' (code 45)) in numeric value: expected digit (0-9) to follow
> minus sign, for valid numeric value
>  at [Source:
> (String)"[--project=apache-beam-testing,--tempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java,--runner=TestDataflowRunner,--dataflowWorkerJar=D:\\Workspace\\apache\\beam\\runners\\google-cloud-dataflow-java\\worker\\legacy-worker\\build\\libs\\beam-runners-google-cloud-dataflow-java-legacy-worker-2.18.0-SNAPSHOT.jar,]";
> line: 1, column: 4]
> at
> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
> at
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:693)
> at
> com.fasterxml.jackson.core.base.ParserMinimalBase.reportUnexpectedNumberChar(ParserMinimalBase.java:541)
> at
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleInvalidNumberStart(ReaderBasedJsonParser.java:1637)
> at
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._parseNegNumber(ReaderBasedJsonParser.java:1391)
> at
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:742)
> at
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextTextValue(ReaderBasedJsonParser.java:1160)
> at
> com.fasterxml.jackson.databind.deser.std.StringArrayDeserializer.deserialize(StringArrayDeserializer.java:145)
>
>
>
> Any help would be appreciated.
>
>
>
> *Thanks & Regards*
>
>
> *Rehman Murad Ali*
>
> Software Engineer
> Mobile: +92 3452076766 <+92%20345%202076766>
> Skype: rehman,muradali
>
> 
>


 --

 Michał Walenia
 Polidea  | Software Engineer

 M: +48 791 432 002 

Re: Full stream-stream join semantics

2019-11-27 Thread Reza Rokni
Hi,

With regards to the processing needed for sort:
The first naive implementation of the prototype did a read and sort for
every Timer that fired (timers were set to fire for every LHS element
timestamp, a property of the use case we were looking at). This worked but
was very slow, as you would expect, so we changed things to make use of
bundle boundaries as a way to reduce the number of sorts, by storing the
sorted list in a static map (Key-Window as key) for the duration of the
bundle. It was very effective for the use case, but added a lot of
technical debt and hard-to-diagnose potential bugs...
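
For the curious, a minimal sketch of that bundle-scoped cache (the class name
and single-Long element type are made up for illustration; the real prototype
also keyed by window and drove emission from timers):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

class BundleCachedSortFn extends DoFn<KV<String, Long>, KV<String, Long>> {
  // Sorted view of the buffered values per key, kept for one bundle only.
  private static final Map<String, List<Long>> SORT_CACHE = new ConcurrentHashMap<>();

  @StateId("buffer")
  private final StateSpec<BagState<Long>> bufferSpec = StateSpecs.bag();

  @FinishBundle
  public void finishBundle() {
    SORT_CACHE.clear(); // drop cached sorts when the bundle ends
  }

  @ProcessElement
  public void process(
      @Element KV<String, Long> element,
      @StateId("buffer") BagState<Long> buffer,
      OutputReceiver<KV<String, Long>> out) {
    buffer.add(element.getValue());
    // Read and sort once per key per bundle instead of once per element.
    // This is also where the hard-to-diagnose bugs hide: the cached list
    // goes stale for elements added later in the same bundle.
    List<Long> sorted =
        SORT_CACHE.computeIfAbsent(element.getKey(), k -> {
          List<Long> values = new ArrayList<>();
          buffer.read().forEach(values::add);
          values.sort(Comparator.naturalOrder());
          return values;
        });
    out.output(KV.of(element.getKey(), sorted.get(0))); // e.g. emit the minimum
  }
}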

With regards to memory needs:
In our use case, while there were a lot of elements, the elements were small
in size, and even in batch mode we could process all of the data without
OOM. But we would want a generalized solution not to have to rely on this
property in batch mode, of course.

Just a thought, Jan: as a temporary solution for your use case, would
stripping down the element to just the timestamp & join key allow the data to
fit into memory in batch processing mode? It would require more work
afterwards to add back the other properties (an LHS and an RHS pass, I
think...), which could make it prohibitive...?

Cheers
Reza

On Thu, 28 Nov 2019 at 14:12, Kenneth Knowles  wrote:

> Yes, I am suggesting to add more intelligent state data structures for
> just that sort of join. I tagged Reza because his work basically does it,
> but explicitly pulls a BagState into memory and sorts it. We just need to
> avoid that. It is the sort of thing that already exists in some engines so
> there's proof of concept :-). Jan makes the good point that when executing
> the same join in batch you wouldn't use the same algorithm, because the
> disorder will be unbounded. In Beam you'd want a PTransform that expands
> differently based on whether the inputs are bounded or unbounded.
>
> Kenn
>
> On Tue, Nov 26, 2019 at 4:16 AM David Morávek 
> wrote:
>
>> Yes, in the batch case with long-term historical data, this would be
>> O(n^2), as it is basically a bubble sort. If you have a large # of updates
>> for a single key, this would be super expensive.
>>
>> Kenn, can this be re-implemented with your solution?
>>
>> On Tue, Nov 26, 2019 at 1:10 PM Jan Lukavský  wrote:
>>
>>> Functionally yes. But this straightforward solution is not working for
>>> me for two main reasons:
>>>
>>>  - it either blows up state in the batch case, or the time complexity of
>>> the sort would be O(n^2) (and reprocessing several years of dense
>>> time-series data makes it a no-go)
>>>
>>>  - it is not reusable for different time-ordering needs, because the
>>> logic implemented purely in user-space cannot be transferred to a different
>>> problem (there are two states needed, one for the buffer, the other for
>>> user state) and extending DoFns does not work (we cannot create an abstract
>>> SortedDoFn, because of the state annotation definitions)
>>>
>>> Jan
>>> On 11/26/19 12:56 PM, David Morávek wrote:
>>>
>>> Hi,
>>>
>>> I think what Jan has in mind would look something like this
>>> , if
>>> implemented in user code. Am I right?
>>>
>>> D.
>>>
>>>
>>> On Tue, Nov 26, 2019 at 10:23 AM Jan Lukavský  wrote:
>>>

 On 11/25/19 11:45 PM, Kenneth Knowles wrote:



 On Mon, Nov 25, 2019 at 1:56 PM Jan Lukavský  wrote:

> Hi Rui,
>
> > Hi Kenn, you think a stateful-DoFn-based join can emit joined rows
> that never need to be retracted, because in the stateful DoFn case joined
> rows will be controlled by timers and emitted only once? If so I will agree
> with it. Generally speaking, whether rows are emitted only once is the
> factor that decides if retractions are needed.
>
> that would imply buffering elements up until the watermark, then sorting,
> and so reduces to option a) again, is that true? This also has to deal
> with allowed lateness; that would mean that with allowed lateness greater
> than zero, there can still be multiple firings and so retractions are
> needed.
>
 Specifically, when I say "bi-temporal join" I mean
 unbounded-to-unbounded join where one of the join conditions is that
 elements are within event time distance d of one another. An element at
 time t will be saved until time t + 2d and then garbage collected. Every
 matching pair can be emitted immediately.

 OK, this might simplify things a little. Is there a design doc for
 that? If there are multiple LHS elements within event time distance from the
 RHS element, which one should be joined? I suppose all of them, but that is
 not "(time-varying-)relational" join semantics. In that semantics only the
 last element must be joined, because that is how a (classical) relational
 database would see the relation at time T (the old record would have been
 overwritten and not be part of the output). Because of the time distance
 constraint 

Re: Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Rehman Murad Ali
Thank you, Brian and Michal, for replying. Here is the full command:

./gradlew
:runners:google-cloud-dataflow-java:examples:preCommitLegacyWorker
-PdataflowProject=apache-beam-testing -Pproject=apache-beam-testing
-PgcpProject=apache-beam-testing
-PgcsTempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java
-PdataflowTempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java


Source: Confluence Java Tips




*Thanks & Regards*


*Rehman Murad Ali*

Software Engineer
Mobile: +92 3452076766
Skype: rehman,muradali




On Thu, Nov 28, 2019 at 10:42 AM Zohaib Baig 
wrote:

> +Rehman Murad Ali 
>
> On Thu, Nov 28, 2019 at 2:58 AM Brian Hulette  wrote:
>
>> It looks like you passed an argument like
>> -DbeamTestPipelineOptions 
>> "[--project=apache-beam-testing,--tempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java,--runner=TestDataflowRunner,--dataflowWorkerJar=D:\\Workspace\\apache\\beam\\runners\\google-cloud-dataflow-java\\worker\\legacy-worker\\build\\libs\\beam-runners-google-cloud-dataflow-java-legacy-worker-2.18.0-SNAPSHOT.jar,]",
>> but the string inside the quotes needs to be a valid JSON array of strings.
>> If you change it to something like -DbeamTestPipelineOptions
>> '["--project=apache-beam-testing",...]' you should get past that error.
>>
>> Agree with Michał though that we could help best if you share your full
>> command line.
>>
>> Brian
>>
>> On Wed, Nov 27, 2019 at 8:20 AM Michał Walenia <
>> michal.wale...@polidea.com> wrote:
>>
>>> Hi,
>>> can you please post the command you used in the terminal? It seems you
>>> used a wrong combination of quotes, but I'd need to see it to be sure.
>>> Cheers,
>>> Michal
>>>
>>> On Wed, Nov 27, 2019 at 5:11 PM Rehman Murad Ali <
>>> rehman.murad...@venturedive.com> wrote:
>>>
 Hi Community,

 I have recently been trying to test Dataflow jobs with Beam. I have
 set up a Google Cloud account and tried to copy a file from the local system
 to Google Cloud Storage (which works fine).

 Now I am trying to run the preCommitLegacyWorker task locally and I am
 getting the following error:

 Unable to instantiate test options from system property
 beamTestPipelineOptions


 Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected
 character ('-' (code 45)) in numeric value: expected digit (0-9) to follow
 minus sign, for valid numeric value
  at [Source:
 (String)"[--project=apache-beam-testing,--tempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java,--runner=TestDataflowRunner,--dataflowWorkerJar=D:\\Workspace\\apache\\beam\\runners\\google-cloud-dataflow-java\\worker\\legacy-worker\\build\\libs\\beam-runners-google-cloud-dataflow-java-legacy-worker-2.18.0-SNAPSHOT.jar,]";
 line: 1, column: 4]
 at
 com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
 at
 com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:693)
 at
 com.fasterxml.jackson.core.base.ParserMinimalBase.reportUnexpectedNumberChar(ParserMinimalBase.java:541)
 at
 com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleInvalidNumberStart(ReaderBasedJsonParser.java:1637)
 at
 com.fasterxml.jackson.core.json.ReaderBasedJsonParser._parseNegNumber(ReaderBasedJsonParser.java:1391)
 at
 com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:742)
 at
 com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextTextValue(ReaderBasedJsonParser.java:1160)
 at
 com.fasterxml.jackson.databind.deser.std.StringArrayDeserializer.deserialize(StringArrayDeserializer.java:145)



 Any help would be appreciated.



 *Thanks & Regards*


 *Rehman Murad Ali*

 Software Engineer
 Mobile: +92 3452076766 <+92%20345%202076766>
 Skype: rehman,muradali

 

>>>
>>>
>>> --
>>>
>>> Michał Walenia
>>> Polidea  | Software Engineer
>>>
>>> M: +48 791 432 002 <+48791432002>
>>> E: michal.wale...@polidea.com
>>>
>>> Unique Tech
>>> Check out our projects! 
>>>
>>
>
> --
>
> *Muhammad Zohaib Baig*
> Senior Software Engineer
> Mobile: +92 3443060266
> Skype: mzobii.baig
>
> 
>


Re: Python staging file weirdness

2019-11-27 Thread Valentyn Tymofieiev
Test jobs specify[1] a requirements.txt file that contains two entries:
pyhamcrest and mock.

We download[2] sources of the packages specified in the requirements file,
and the packages they depend on. While doing so, it appears that we use a
cache directory on Jenkins to store the sources of the packages [3], perhaps
to save a trip to PyPI and reduce PyPI flakiness. Then we stage the entire
cache directory[4], which includes all packages ever cached. Over time, the
versions that our required packages depend on change, but I guess we don't
clean the cache on Jenkins workers.

[1]
https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/scripts/run_integration_test.sh#L197
[2]
https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L469
[3]
https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L161

[4]
https://github.com/apache/beam/blob/438055c95116f4e6e419e5faa9c42f7d329c421c/sdks/python/apache_beam/runners/portability/stager.py#L172

On Wed, Nov 27, 2019 at 11:55 AM Udi Meiri  wrote:

> I was investigating a Dataflow postcommit test failure (endpoints_pb2
> missing), and saw this in the staging directory:
>
> $ gsutil ls 
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/PyHamcrest-1.9.0.tar.gz
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/dataflow-worker.jar
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/dataflow_python_sdk.tar
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/funcsigs-1.0.2.tar.gz
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/mock-3.0.5.tar.gz
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/pipeline.pb
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/requirements.txt
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.2.0.zip
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.4.0.zip
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.5.0.zip
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.5.1.zip
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.6.0.zip
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-42.0.0.zip
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-42.0.1.zip
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/six-1.12.0.tar.gz
> gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/six-1.13.0.tar.gz
>
>
> Does anyone know why so many versions of setuptools need to be staged?
> Shouldn't 1 be enough?
>


Re: Full stream-stream join semantics

2019-11-27 Thread Kenneth Knowles
Yes, I am suggesting to add more intelligent state data structures for just
that sort of join. I tagged Reza because his work basically does it, but
explicitly pulls a BagState into memory and sorts it. We just need to avoid
that. It is the sort of thing that already exists in some engines so
there's proof of concept :-). Jan makes the good point that when executing
the same join in batch you wouldn't use the same algorithm, because the
disorder will be unbounded. In Beam you'd want a PTransform that expands
differently based on whether the inputs are bounded or unbounded.
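
A minimal sketch of that dispatch (illustrative names only, not an existing
Beam API):

import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;

abstract class BoundednessAwareJoin<InT, OutT>
    extends PTransform<PCollection<InT>, PCollection<OutT>> {

  @Override
  public PCollection<OutT> expand(PCollection<InT> input) {
    if (input.isBounded() == PCollection.IsBounded.BOUNDED) {
      // Batch: disorder is unbounded, so a shuffle/sort-merge expansion fits.
      return expandBatch(input);
    }
    // Streaming: buffer in keyed state and emit as the watermark advances.
    return expandStreaming(input);
  }

  abstract PCollection<OutT> expandBatch(PCollection<InT> input);

  abstract PCollection<OutT> expandStreaming(PCollection<InT> input);
}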

Kenn

On Tue, Nov 26, 2019 at 4:16 AM David Morávek 
wrote:

> Yes, in the batch case with long-term historical data, this would be O(n^2),
> as it is basically a bubble sort. If you have a large # of updates for a
> single key, this would be super expensive.
>
> Kenn, can this be re-implemented with your solution?
>
> On Tue, Nov 26, 2019 at 1:10 PM Jan Lukavský  wrote:
>
>> Functionally yes. But this straightforward solution is not working for me
>> for two main reasons:
>>
>>  - it either blows up state in the batch case, or the time complexity of
>> the sort would be O(n^2) (and reprocessing several years of dense
>> time-series data makes it a no-go)
>>
>>  - it is not reusable for different time-ordering needs, because the
>> logic implemented purely in user-space cannot be transferred to a different
>> problem (there are two states needed, one for the buffer, the other for
>> user state) and extending DoFns does not work (we cannot create an abstract
>> SortedDoFn, because of the state annotation definitions)
>>
>> Jan
>> On 11/26/19 12:56 PM, David Morávek wrote:
>>
>> Hi,
>>
>> I think what Jan has in mind would look something like this
>> , if
>> implemented in user code. Am I right?
>>
>> D.
>>
>>
>> On Tue, Nov 26, 2019 at 10:23 AM Jan Lukavský  wrote:
>>
>>>
>>> On 11/25/19 11:45 PM, Kenneth Knowles wrote:
>>>
>>>
>>>
>>> On Mon, Nov 25, 2019 at 1:56 PM Jan Lukavský  wrote:
>>>
 Hi Rui,

 > Hi Kenn, you think a stateful-DoFn-based join can emit joined rows that
 never need to be retracted, because in the stateful DoFn case joined rows
 will be controlled by timers and emitted only once? If so I will agree with
 it. Generally speaking, whether rows are emitted only once is the factor
 that decides if retractions are needed.

 that would imply buffering elements up until the watermark, then sorting,
 and so reduces to option a) again, is that true? This also has to deal
 with allowed lateness; that would mean that with allowed lateness greater
 than zero, there can still be multiple firings and so retractions are
 needed.

>>> Specifically, when I say "bi-temporal join" I mean
>>> unbounded-to-unbounded join where one of the join conditions is that
>>> elements are within event time distance d of one another. An element at
>>> time t will be saved until time t + 2d and then garbage collected. Every
>>> matching pair can be emitted immediately.
>>>
>>> OK, this might simplify things a little. Is there a design doc for that?
>>> If there are multiple LHS elements within event time distance from the RHS
>>> element, which one should be joined? I suppose all of them, but that is not
>>> "(time-varying-)relational" join semantics. In that semantics only the last
>>> element must be joined, because that is how a (classical) relational
>>> database would see the relation at time T (the old record would have been
>>> overwritten and not be part of the output). Because of the time distance
>>> constraint this is different from the join I have in mind, because that
>>> simply joins every LHS element(s) to the most recent RHS element(s) and vice
>>> versa, without any additional time constraints (that is, the RHS "update"
>>> can happen arbitrarily far in the past).
>>>
>>> Jan
>>>
>>>
>>> In the triggered CoGBK + join-product implementation, you do need
>>> retractions as a model concept. But you don't need full support, since they
>>> only need to be shipped as deltas and only from the CoGBK to the
>>> join-product transform where they are all consumed to create only positive
>>> elements. Again a delay is not required; this yields correct results with
>>> the "always" trigger.
>>>
>>> Neither case requires waiting or time sorting a whole buffer. The
>>> bi-temporal join requires something more, in a way, since you need to query
>>> by time range and GC time prefixes.
>>>
>>> Kenn
>>>
>>> Jan
 On 11/25/19 10:17 PM, Rui Wang wrote:



 On Mon, Nov 25, 2019 at 11:29 AM Jan Lukavský  wrote:

>
> On 11/25/19 7:47 PM, Kenneth Knowles wrote:
>
>
>
> On Sun, Nov 24, 2019 at 12:57 AM Jan Lukavský  wrote:
>
>> I can put down a design document, but before that I need to clarify
>> some things for me. I'm struggling to put all of this into a bigger
>> picture. Sorry if the arguments are circulating, but I didn't notice any
>> proposal of how 

Re: Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Zohaib Baig
+Rehman Murad Ali 

On Thu, Nov 28, 2019 at 2:58 AM Brian Hulette  wrote:

> It looks like you passed an argument like
> -DbeamTestPipelineOptions 
> "[--project=apache-beam-testing,--tempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java,--runner=TestDataflowRunner,--dataflowWorkerJar=D:\\Workspace\\apache\\beam\\runners\\google-cloud-dataflow-java\\worker\\legacy-worker\\build\\libs\\beam-runners-google-cloud-dataflow-java-legacy-worker-2.18.0-SNAPSHOT.jar,]",
> but the string inside the quotes needs to be a valid JSON array of strings.
> If you change it to something like -DbeamTestPipelineOptions
> '["--project=apache-beam-testing",...]' you should get past that error.
>
> Agree with Michał though that we could help best if you share your full
> command line.
>
> Brian
>
> On Wed, Nov 27, 2019 at 8:20 AM Michał Walenia 
> wrote:
>
>> Hi,
>> can you please post the command you used in the terminal? It seems you
>> used a wrong combination of quotes, but I'd need to see it to be sure.
>> Cheers,
>> Michal
>>
>> On Wed, Nov 27, 2019 at 5:11 PM Rehman Murad Ali <
>> rehman.murad...@venturedive.com> wrote:
>>
>>> Hi Community,
>>>
>>> I have recently been trying to test Dataflow jobs with Beam. I have set
>>> up a Google Cloud account and tried to copy a file from the local system
>>> to Google Cloud Storage (which works fine).
>>>
>>> Now I am trying to run the preCommitLegacyWorker task locally and I am
>>> getting the following error:
>>>
>>> Unable to instantiate test options from system property
>>> beamTestPipelineOptions
>>>
>>>
>>> Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected
>>> character ('-' (code 45)) in numeric value: expected digit (0-9) to follow
>>> minus sign, for valid numeric value
>>>  at [Source:
>>> (String)"[--project=apache-beam-testing,--tempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java,--runner=TestDataflowRunner,--dataflowWorkerJar=D:\\Workspace\\apache\\beam\\runners\\google-cloud-dataflow-java\\worker\\legacy-worker\\build\\libs\\beam-runners-google-cloud-dataflow-java-legacy-worker-2.18.0-SNAPSHOT.jar,]";
>>> line: 1, column: 4]
>>> at
>>> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
>>> at
>>> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:693)
>>> at
>>> com.fasterxml.jackson.core.base.ParserMinimalBase.reportUnexpectedNumberChar(ParserMinimalBase.java:541)
>>> at
>>> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleInvalidNumberStart(ReaderBasedJsonParser.java:1637)
>>> at
>>> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._parseNegNumber(ReaderBasedJsonParser.java:1391)
>>> at
>>> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:742)
>>> at
>>> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextTextValue(ReaderBasedJsonParser.java:1160)
>>> at
>>> com.fasterxml.jackson.databind.deser.std.StringArrayDeserializer.deserialize(StringArrayDeserializer.java:145)
>>>
>>>
>>>
>>> Any help would be appreciated.
>>>
>>>
>>>
>>> *Thanks & Regards*
>>>
>>>
>>> *Rehman Murad Ali*
>>>
>>> Software Engineer
>>> Mobile: +92 3452076766 <+92%20345%202076766>
>>> Skype: rehman,muradali
>>>
>>> 
>>>
>>
>>
>> --
>>
>> Michał Walenia
>> Polidea  | Software Engineer
>>
>> M: +48 791 432 002 <+48791432002>
>> E: michal.wale...@polidea.com
>>
>> Unique Tech
>> Check out our projects! 
>>
>

-- 

*Muhammad Zohaib Baig*
Senior Software Engineer
Mobile: +92 3443060266
Skype: mzobii.baig




Re: Update on push-down for SQL IOs.

2019-11-27 Thread Kenneth Knowles
Nice! Thanks for the very thorough summary. I think this will be a really
good thing for Beam. Most of the IO sources are very highly optimized for
querying and will do it more efficiently than the Beam runner when the
structure of the query matches. I'm really excited to see the performance
measurements.

I have a thought: your update did not mention a few extensions that we
might consider: ParquetIO, CassandraIO/HBaseIO/BigTableIO (all should be
about the same), JdbcIO, IcebergIO (doesn't exist yet, but is basically
generalized schema-aware files as I understand it). Are these things you
are thinking about doing, or would these be Jiras that could potentially be
tagged "starter"? They seem complex but maybe your framework will make it
feasible for someone with slightly less experience to implement new
versions of what you have already finished?

Kenn

On Tue, Nov 26, 2019 at 12:19 PM Kirill Kozlov 
wrote:

> Hello everyone!
>
> I have been working on a push-down feature and would like to give a brief
> update on what is done and what is still in the works.
>
> *Things that are done*:
> General API for SQL IOs to provide information about what filters/projects
> they support [1]:
> - *Filter* can be unsupported, supported with field reordering, and
> supported without field reordering.
> - *Predicate* is broken down into a conjunctive normal form (CNF) and
> passed to a validator class to check what parts are supported or
> unsupported by an IO.
>
> A Calcite rule [2] that checks for push-down support, constructs a new IO
> source Rel [3] with pushed-down projects and filters when applicable, and
> preserves unsupported filters/projects.
>
> BigQuery should perform push-down when running queries in DIRECT_READ
> method [4].
>
> MongoDB project push-down support is in a PR [5] and predicate support
> will be added soon.
>
>
> *Things that are in progress:*
> Documenting how developers can enable push-down for IOs that support it.
>
> Documenting certain limitations of BigQuery push-down (ex: comparing the
> values of 2 columns is not supported at the moment, so such predicates are
> preserved in a Calc).
>
> Updating google-cloud-bigquerystorage to 0.117.0-beta. Earlier versions
> have a gRPC message limit set to ~11MB, which may cause some pipelines to
> break when reading from a table with rows larger than the limit.
>
> Adding some sort of performance tests to run continuously to
> measure speed-up and detect regressions.
>
> Deciding how cost should be computed for the IO source Rel with push-down
> [6]. Right now the following formula is used: cost of an IO without
> push-down minus the normalized (between 0.0 and 1.0) benefit of a performed
> push-down.
> The challenge here is to make the change to the cost small enough to not
> break join reordering, but large enough to make the optimizer favor
> pushed-down IO.
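>
> As a rough illustration only (not the actual Calcite cost classes), the
> shape of that adjustment is:
>
>   // benefit is normalized to [0.0, 1.0]; the delta must stay small enough
>   // not to break join reordering, yet large enough to prefer the push-down.
>   double pushDownCost(double costWithoutPushDown, double normalizedBenefit) {
>     return costWithoutPushDown - normalizedBenefit;
>   }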
>
>
> If you have any suggestions/questions/concerns I would love to hear them.
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java#L36
> [2]
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
> [3]
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java
> [4]
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L128
> [5] https://github.com/apache/beam/pull/10095
> [6] https://github.com/apache/beam/pull/10060
>
> --
> Kirill
>


Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Kenneth Knowles
On Wed, Nov 27, 2019 at 1:04 PM Elliotte Rusty Harold 
wrote:

> On Wed, Nov 27, 2019 at 1:12 PM Kenneth Knowles  wrote:
> >
>
> > *Opt-in*: This is a powerful idea that I think changes everything.
> >- for an experimental new IO, a separate artifact; this way we can
> also see downloads
> >- for experimental code fragments, add checkState that the relevant
> experiment is turned on via flags
>
> To be clear the experimental artifact would have the same group ID and
> artifact ID but a different version than the non-experimental
> artifacts?  E.g.
> org.apache.beam:beam-runners-gcp-gcemd:2.4.0-experimental
>
> That could work. Changing the artifact ID or the package name would
> risk split package issues and diamond dependency problems. We'd still
> need to be careful about mixing experimental and non-experimental
> artifacts.
>

That's clever! I think using the classifier might be better than a modified
version number, e.g. org.apache.beam:beam-io-mydb:2.4.0:experimental

My prior idea was much less clever: for any version 2.X there would be either
beam-io-mydb-experimental or beam-io-mydb (after graduation), so no
problem with a split package. There would be no "same artifact id" concern.

Your idea would allow us to ship two variants of the library, if we
developed the tooling for it. I think doing the stripping of experimental
bits and ensuring they both compile might be tricky unless we are stripping
rather disjoint pieces of the library.

Kenn


Python interactive runner: test dependencies removed

2019-11-27 Thread Udi Meiri
As part of a move to stop using the deprecated (and racy) setup.py
keywords setup_requires and tests_require, interactive runner dependencies
have been removed from tests in
https://github.com/apache/beam/pull/10227

If this breaks any tests, please let me know.




Re: [PROPOSAL] Preparing for Beam 2.18 release

2019-11-27 Thread Valentyn Tymofieiev
+1. Thanks, Udi!

On Wed, Nov 27, 2019 at 12:58 PM Ahmet Altay  wrote:

> Thank you Udi for keeping the release cadence. +1 to cutting 2.18.0 branch
> on time.
>
> On Thu, Nov 21, 2019 at 10:07 AM Udi Meiri  wrote:
>
>> Thanks Cham. Tomo, if there are any dependencies you believe are blockers
>> please mark them.
>> Also, only the sub-tasks
>> 
>> seem to be real upgrade tickets (so only ~150).
>>
>> Back to the original intent of my message, do we have consensus on who
>> will do the 2.18 release?
>>
>> On Wed, Nov 20, 2019 at 6:05 PM Tomo Suzuki  wrote:
>>
>>> Thank you for response.
>>>
>>> On Wed, Nov 20, 2019 at 16:49 Chamikara Jayalath 
>>> wrote:
>>>


 On Wed, Nov 20, 2019 at 1:04 PM Tomo Suzuki  wrote:

> Hi Udi,
>
> (Question) I started learning how Beam dependencies are maintained
> through releases. https://beam.apache.org/contribute/dependencies/
>  says
>
>
> *Beam community has agreed on following policies regarding upgrading
> dependencies.*
>
> ...
>
> *A significantly outdated dependency (identified manually or through
> the automated Jenkins job) should result in a JIRA that is a blocker for
> the next release. Release manager may choose to push the blocker to the
> subsequent release or downgrade from a blocker.*
>
>
> Is the statement above still valid? We have ~250 automatically created
> tickets [1] for dependency upgrade.
>

 I think it's up to the release manager as mentioned in the statement.
 We surely don't want to block releases by all these JIRAs but Beam
 community and/or release manager may decide to make some of these blockers
 if needed.
 I don't think the tool automatically makes the auto generated JIRAs
 release blockers.

 Thanks,
 Cham


> [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20dependencies%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>
> Regards,
> Tomo
>
>
> On Wed, Nov 20, 2019 at 3:48 PM Udi Meiri  wrote:
>
>> Hi all,
>>
>> The next (2.18) release branch cut is scheduled for Dec 4, according
>> to the calendar
>> 
>> .
>> I would like to volunteer myself to do this release.
>> The plan is to cut the branch on that date, and cherrypick 
>> release-blocking
>> fixes afterwards if any.
>>
>> Any unresolved release blocking JIRA issues for 2.18 should have
>> their "Fix Version/s" marked as "2.18.0".
>>
>> Any comments or objections?
>>
>>
>
> --
> Regards,
> Tomo
>
 --
>>> Regards,
>>> Tomo
>>>
>>


Re: Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Brian Hulette
It looks like you passed an argument like
-DbeamTestPipelineOptions
"[--project=apache-beam-testing,--tempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java,--runner=TestDataflowRunner,--dataflowWorkerJar=D:\\Workspace\\apache\\beam\\runners\\google-cloud-dataflow-java\\worker\\legacy-worker\\build\\libs\\beam-runners-google-cloud-dataflow-java-legacy-worker-2.18.0-SNAPSHOT.jar,]",
but the string inside the quotes needs to be a valid JSON array of strings.
If you change it to something like -DbeamTestPipelineOptions
'["--project=apache-beam-testing",...]' you should get past that error.

Agree with Michał though that we could help best if you share your full
command line.

Brian

On Wed, Nov 27, 2019 at 8:20 AM Michał Walenia 
wrote:

> Hi,
> can you please post the command you used in the terminal? It seems you
> used a wrong combination of quotes, but I'd need to see it to be sure.
> Cheers,
> Michal
>
> On Wed, Nov 27, 2019 at 5:11 PM Rehman Murad Ali <
> rehman.murad...@venturedive.com> wrote:
>
>> Hi Community,
>>
>> I have recently been trying to test Dataflow jobs with Beam. I have set
>> up a Google Cloud account and tried to copy a file from the local system
>> to Google Cloud Storage (which works fine).
>>
>> Now I am trying to run the preCommitLegacyWorker task locally and I am
>> getting the following error:
>>
>> Unable to instantiate test options from system property
>> beamTestPipelineOptions
>>
>>
>> Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected
>> character ('-' (code 45)) in numeric value: expected digit (0-9) to follow
>> minus sign, for valid numeric value
>>  at [Source:
>> (String)"[--project=apache-beam-testing,--tempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java,--runner=TestDataflowRunner,--dataflowWorkerJar=D:\\Workspace\\apache\\beam\\runners\\google-cloud-dataflow-java\\worker\\legacy-worker\\build\\libs\\beam-runners-google-cloud-dataflow-java-legacy-worker-2.18.0-SNAPSHOT.jar,]";
>> line: 1, column: 4]
>> at
>> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
>> at
>> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:693)
>> at
>> com.fasterxml.jackson.core.base.ParserMinimalBase.reportUnexpectedNumberChar(ParserMinimalBase.java:541)
>> at
>> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleInvalidNumberStart(ReaderBasedJsonParser.java:1637)
>> at
>> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._parseNegNumber(ReaderBasedJsonParser.java:1391)
>> at
>> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:742)
>> at
>> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextTextValue(ReaderBasedJsonParser.java:1160)
>> at
>> com.fasterxml.jackson.databind.deser.std.StringArrayDeserializer.deserialize(StringArrayDeserializer.java:145)
>>
>>
>>
>> Any help would be appreciated.
>>
>>
>>
>> *Thanks & Regards*
>>
>>
>> *Rehman Murad Ali*
>>
>> Software Engineer
>> Mobile: +92 3452076766 <+92%20345%202076766>
>> Skype: rehman,muradali
>>
>> 
>>
>
>
> --
>
> Michał Walenia
> Polidea  | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.wale...@polidea.com
>
> Unique Tech
> Check out our projects! 
>


Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Elliotte Rusty Harold
On Wed, Nov 27, 2019 at 1:12 PM Kenneth Knowles  wrote:
>

> *Opt-in*: This is a powerful idea that I think changes everything.
>- for an experimental new IO, a separate artifact; this way we can also 
> see downloads
>- for experimental code fragments, add checkState that the relevant 
> experiment is turned on via flags

To be clear the experimental artifact would have the same group ID and
artifact ID but a different version than the non-experimental
artifacts?  E.g.
org.apache.beam:beam-runners-gcp-gcemd:2.4.0-experimental

That could work. Changing the artifact ID or the package name would
risk split package issues and diamond dependency problems. We'd still
need to be careful about mixing experimental and non-experimental
artifacts.



-- 
Elliotte Rusty Harold
elh...@ibiblio.org


Re: [PROPOSAL] Preparing for Beam 2.18 release

2019-11-27 Thread Ahmet Altay
Thank you Udi for keeping the release cadence. +1 to cutting 2.18.0 branch
on time.

On Thu, Nov 21, 2019 at 10:07 AM Udi Meiri  wrote:

> Thanks Cham. Tomo, if there are any dependencies you believe are blockers
> please mark them.
> Also, only the sub-tasks
> 
> seem to be real upgrade tickets (so only ~150).
>
> Back to the original intent of my message, do we have consensus on who
> will do the 2.18 release?
>
> On Wed, Nov 20, 2019 at 6:05 PM Tomo Suzuki  wrote:
>
>> Thank you for response.
>>
>> On Wed, Nov 20, 2019 at 16:49 Chamikara Jayalath 
>> wrote:
>>
>>>
>>>
>>> On Wed, Nov 20, 2019 at 1:04 PM Tomo Suzuki  wrote:
>>>
 Hi Udi,

 (Question) I started learning how Beam dependencies are maintained
 through releases. https://beam.apache.org/contribute/dependencies/ says


 *Beam community has agreed on following policies regarding upgrading
 dependencies.*

 ...

 *A significantly outdated dependency (identified manually or through
 the automated Jenkins job) should result in a JIRA that is a blocker for
 the next release. Release manager may choose to push the blocker to the
 subsequent release or downgrade from a blocker.*


 Is the statement above still valid? We have ~250 automatically created
 tickets [1] for dependency upgrade.

>>>
>>> I think it's up to the release manager as mentioned in the statement. We
>>> surely don't want to block releases by all these JIRAs but Beam community
>>> and/or release manager may decide to make some of these blockers if needed.
>>> I don't think the tool automatically makes the auto generated JIRAs
>>> release blockers.
>>>
>>> Thanks,
>>> Cham
>>>
>>>
 [1]:
 https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20dependencies%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC

 Regards,
 Tomo


 On Wed, Nov 20, 2019 at 3:48 PM Udi Meiri  wrote:

> Hi all,
>
> The next (2.18) release branch cut is scheduled for Dec 4, according
> to the calendar
> 
> .
> I would like to volunteer myself to do this release.
> The plan is to cut the branch on that date, and cherrypick 
> release-blocking
> fixes afterwards if any.
>
> Any unresolved release blocking JIRA issues for 2.18 should have their
> "Fix Version/s" marked as "2.18.0".
>
> Any comments or objections?
>
>

 --
 Regards,
 Tomo

>>> --
>> Regards,
>> Tomo
>>
>


Re: Detecting resources to stage

2019-11-27 Thread Gleb Kanterov
Agree, this makes sense.

On Wed, Nov 27, 2019 at 6:23 PM Luke Cwik  wrote:

> That looks good as well.
>
> I would suggest that we make the classpath scanning system pluggable using
> PipelineOptions. For example in GcpOptions[1], we use two default instance
> factories. The first one controls which class is used as the factory[2] and
> the second one instantiates an instance of that class and creates the
> credential[3]. The same strategy could be added where there is a default
> instance factory for the set of resources and another option which controls
> which class is instantiated to provide that default.
>
> Do you think that we could make the default always:
> new ClassGraph()
>   .addClassLoader(classLoader)
>   .getClasspathURLs();
>
> 1:
> https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159
> 2:
> https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L144
> 3:
> https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159
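>
> A rough sketch of that shape (every class name below is hypothetical, just
> mirroring the GcpOptions pattern):
>
> import java.util.Collections;
> import java.util.List;
> import org.apache.beam.sdk.options.Default;
> import org.apache.beam.sdk.options.DefaultValueFactory;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.util.InstanceBuilder;
>
> interface ClasspathScanner {
>   List<String> scan(); // resource paths to stage
> }
>
> // Hypothetical default; a real one would delegate to classgraph.
> class ClassGraphScanner implements ClasspathScanner {
>   @Override
>   public List<String> scan() {
>     return Collections.emptyList();
>   }
> }
>
> public interface StagingOptions extends PipelineOptions {
>   // First option: controls which scanner implementation is used.
>   @Default.Class(ClassGraphScanner.class)
>   Class<? extends ClasspathScanner> getClasspathScannerClass();
>   void setClasspathScannerClass(Class<? extends ClasspathScanner> value);
>
>   // Second option: the detected resources, computed via the class above.
>   @Default.InstanceFactory(ScannedResourcesFactory.class)
>   List<String> getResourcesToStage();
>   void setResourcesToStage(List<String> value);
> }
>
> class ScannedResourcesFactory implements DefaultValueFactory<List<String>> {
>   @Override
>   public List<String> create(PipelineOptions options) {
>     StagingOptions staging = options.as(StagingOptions.class);
>     ClasspathScanner scanner =
>         InstanceBuilder.ofType(ClasspathScanner.class)
>             .fromClass(staging.getClasspathScannerClass())
>             .build();
>     return scanner.scan();
>   }
> }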
>
> On Wed, Nov 27, 2019 at 8:19 AM Gleb Kanterov  wrote:
>
>> I didn't think it through, but this is something I have in mind. Keep
>> existing implementation for URLClassLoader, and use URLClassLoader for
>> experimental support of Java 11.
>>
>> List<URL> urls;
>> if (classLoader instanceof URLClassLoader) {
>>   urls = Arrays.asList(((URLClassLoader) classLoader).getURLs());
>> } else {
>>   urls = new ClassGraph()
>>       .addClassLoader(classLoader)
>>       .getClasspathURLs();
>> }
>>
>> On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy 
>> wrote:
>>
>>> This looks promising. Do you think you could share your code as well?
>>>
>>> That part sounds very calming:
>>> "ClassGraph is fully compatible with the new JPMS module system (Project
>>> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the
>>> module path. However, the code is also fully backwards compatible with JDK
>>> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and
>>> all interaction with the module system is implemented via reflection for
>>> backwards compatibility)."
>>>
>>> I'm currently working on rebuilding the classpath detection mechanism so
>>> that it scans java.class.path when URLClassLoader cannot be used (as Luke
>>> suggested) but if we decide to use classgraph it should be relatively easy
>>> to do that instead. Moreover, I want to enable the possibility of injecting
>>> any algorithm implementation through pipeline options - this will enable
>>> third-party vendors to inject their custom implementations if needed (SPI
>>> pattern that was mentioned at some point in a jira ticket). I think I'm
>>> pretty close to finishing that.
>>>
>>> Thanks!
>>>
>>> śr., 27 lis 2019 o 15:24 Gleb Kanterov  napisał(a):
>>>
 Today I tried using the classgraph [1] library to scan the classpath in
 Java 11 instead of using URLClassLoader, and after that, the job worked on
 Dataflow. The logic of scanning the classpath is pretty sophisticated [2],
 and classgraph doesn't have any dependencies. I'm wondering if we can
 relocate it into the java-core jar and use it for non-URLClassLoaders?

 [1]: https://github.com/classgraph/classgraph
 [2]:
 https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java

 On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik  wrote:

> I believe the closest suggestion[1] we had that worked for Java 11 and
> maintained backwards compatibility was to use the URLClassLoader to infer
> the resources and if we couldn't do that then look at the java.class.path
> system property to do the inference, otherwise fail and force the users to
> tell us what to stage. There are too many scenarios where we will do it wrong
> because of how people package and deploy their code whether it is an
> embedded application server or some other application container with a
> security manager that will prevent us from doing the right thing.
>
> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw 
> wrote:
>
>> Note that resources are more properly tied to specific operations and
>> stages, not to the entire pipeline. This is especially true in the
>> face of libraries (which should have the ability to declare their own
>> resources) and cross-language.
>>
>> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy 
>> wrote:
>> >
>> > I figured that it would be good to bump this thread for greater
>> visibility even though I don't have a strong opinion about this (yet -
>> hopefully, I will know more later to 

Python staging file weirdness

2019-11-27 Thread Udi Meiri
I was investigating a Dataflow postcommit test failure (endpoints_pb2
missing), and saw this in the staging directory:

$ gsutil ls 
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/PyHamcrest-1.9.0.tar.gz
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/dataflow-worker.jar
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/dataflow_python_sdk.tar
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/funcsigs-1.0.2.tar.gz
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/mock-3.0.5.tar.gz
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/pipeline.pb
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/requirements.txt
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.2.0.zip
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.4.0.zip
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.5.0.zip
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.5.1.zip
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-41.6.0.zip
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-42.0.0.zip
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/setuptools-42.0.1.zip
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/six-1.12.0.tar.gz
gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1126202146-314738.1574799706.314882/six-1.13.0.tar.gz


Does anyone know why so many versions of setuptools need to be staged?
Shouldn't 1 be enough?




Re: [DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Andrew Pilloud
What we need is an annotation checker to ensure every public method is
tagged either @Experimental or @Deprecated. That way there will be no
confusion about what we expect to be stable. If we really want to
offer stable APIs, there exist many tools (such as JAPICC[1]) to ensure
we don't make breaking changes. Without some actual tests we are just
hoping we don't break anything, so everything is actually
@Experimental.

Andrew

[1] https://lvc.github.io/japi-compliance-checker/

On Wed, Nov 27, 2019 at 10:12 AM Kenneth Knowles  wrote:
>
> Hi all
>
> I wanted to start a dedicated thread to the discussion of how to manage our 
> @Experimental annotations, API evolution in general, etc.
>
> After some email back-and-forth this will get too big so then I will try to 
> summarize into a document. But I think a thread to start with makes sense.
>
> Problem statement:
>
> 1. Users need stable APIs so their software can just keep working
> 2. Breaking changes are necessary to achieve correctness / high quality
>
> Neither of these is actually universally true. Many changes don't really hurt 
> users, and some APIs are so obvious they don't require adjustment, or 
> continuing to use an inferior API is OK since at least correctness is 
> possible.
>
> But we have had too many breaking changes in Beam, some quite late, for the 
> purposes of fixing major data loss bugs, design errors, changes in underlying 
> services, and usability. [1] So I take for granted that we do need to make 
> these changes.
>
> So the problem becomes:
>
> 1. Users need to know *which* APIs are frozen, clearly and with enough buy-in 
> that changes don't surprise them
> 2. Useful APIs that are not technically frozen but never change will still 
> get usage and should "graduate"
>
> Current status:
>
>  - APIs (classes, methods, etc) can be marked "experimental" with annotations 
> in languages
>  - "experimental" features are shipped in the same jar with non-experimental 
> bits; sometimes it is just a couple methods or classes
>  - "experimental" APIs are supposed to allow breaking changes
>  - there is no particular process for removing "experimental" status
>  - we do go through "deprecation" process even for experimental things
>
> Downsides to this:
>
>  - tons of Beam has become very mature but still "experimental" so it isn't 
> really safe to make breaking changes
>  - users are not really alerted that well to when they are using unstable 
> pieces
>  - we don't have an easy way to determine the impact of any breaking changes
>  - we also don't have a clear policy or guidance around underlying 
> services/client libs making breaking changes (such as services rejecting 
> older clients)
>  - having something both "experimental" and "deprecated" is maybe confusing, 
> but also just deleting experimental stuff is not safe in the current state of 
> things
>
> Some proposals that I can think of people made:
>
>  - making experimental features opt-in only (for example by a separate dep or 
> a flag)
>  - putting a version next to any experimental annotation and force review at 
> that time (lots of objections to this, but noting it for completeness)
>  - reviews for graduating on a case-by-case basis, with dev@ thread and maybe 
> vote
>  - try to improve our ability to know usage of experimental features (really, 
> all features!)
>
> I will start with my own thoughts from here:
>
> *Opt-in*: This is a powerful idea that I think changes everything.
>- for an experimental new IO, a separate artifact; this way we can also 
> see downloads
>- for experimental code fragments, add checkState that the relevant 
> experiment is turned on via flags
>
> *Graduation*: Once things are opt-in, the drive to graduate them will be 
> stronger than it is today. I think vote is appropriate, with rationale 
> including usage and test coverage and stability, since it is a commitment by 
> the community to maintain the code, which constitutes most of the TCO of code.
>
> *Tracking*:
>  - We should know what experiments we have and how old they are.
>  - It means that just tagging methods and classes "@Experimental" doesn't 
> really work. I think that is probably a good thing. It is confusing to have 
> hundreds of tiny experiments. We can target larger-scale experiments.
>  - If we regularly poll on twitter or user@ about features then it might 
> become a source of OK signal, for things where we cannot look at download 
> stats.
>
> I think with these three approaches, the @Experimental annotation is actually 
> obsolete. We could still use it to drive some kind of annotation processor to 
> ensure "if there is @Experimental then there is a checkState" but I don't 
> have experience doing such things.
>
> Kenn
>
> [1] 
> https://lists.apache.org/thread.html/1bfe7aa55f8d77c4ddfde39595c9473b233edfcc3255ed38b3f85612@%3Cdev.beam.apache.org%3E


[DISCUSS] @Experimental annotations - processes and alternatives

2019-11-27 Thread Kenneth Knowles
Hi all

I wanted to start a dedicated thread to the discussion of how to manage
our @Experimental annotations, API evolution in general, etc.

After some email back-and-forth this will get too big so then I will try to
summarize into a document. But I think a thread to start with makes sense.

Problem statement:

1. Users need stable APIs so their software can just keep working
2. Breaking changes are necessary to achieve correctness / high quality

Neither of these is actually universally true. Many changes don't really
hurt users, and some APIs are so obvious they don't require adjustment, or
continuing to use an inferior API is OK since at least correctness is
possible.

But we have had too many breaking changes in Beam, some quite late, for the
purposes of fixing major data loss bugs, design errors, changes in
underlying services, and usability. [1] So I take for granted that we do
need to make these changes.

So the problem becomes:

1. Users need to know *which* APIs are frozen, clearly and with enough
buy-in that changes don't surprise them
2. Useful APIs that are not technically frozen but never change will still
get usage and should "graduate"

Current status:

 - APIs (classes, methods, etc) can be marked "experimental" with
annotations in languages
 - "experimental" features are shipped in the same jar with
non-experimental bits; sometimes it is just a couple methods or classes
 - "experimental" APIs are supposed to allow breaking changes
 - there is no particular process for removing "experimental" status
 - we do go through "deprecation" process even for experimental things

Downsides to this:

 - tons of Beam has become very mature but still "experimental" so it isn't
really safe to make breaking changes
 - users are not really alerted that well to when they are using unstable
pieces
 - we don't have an easy way to determine the impact of any breaking changes
 - we also don't have a clear policy or guidance around underlying
services/client libs making breaking changes (such as services rejecting
older clients)
 - having something both "experimental" and "deprecated" is maybe
confusing, but also just deleting experimental stuff is not safe in the
current state of things

Some proposals that I can think of people made:

 - making experimental features opt-in only (for example by a separate dep
or a flag)
 - putting a version next to any experimental annotation and force review
at that time (lots of objections to this, but noting it for completeness)
 - reviews for graduating on a case-by-case basis, with dev@ thread and
maybe vote
 - try to improve our ability to know usage of experimental features
(really, all features!)

I will start with my own thoughts from here:

*Opt-in*: This is a powerful idea that I think changes everything.
   - for an experimental new IO, a separate artifact; this way we can also
see downloads
   - for experimental code fragments, add checkState that the relevant
experiment is turned on via flags
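
A minimal sketch of that second bullet (the experiment name is made up, and I
am assuming ExperimentalOptions.hasExperiment as the flag lookup):

import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptions;

final class ExperimentGate {
  private ExperimentGate() {}

  // Call from an experimental code path; it refuses to run unless the user
  // explicitly opted in, e.g. with --experiments=use_fancy_io.
  static void checkOptedIn(PipelineOptions options, String experiment) {
    if (!ExperimentalOptions.hasExperiment(options, experiment)) {
      throw new IllegalStateException(
          "Experimental feature; pass --experiments=" + experiment + " to opt in.");
    }
  }
}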

*Graduation*: Once things are opt-in, the drive to graduate them will be
stronger than it is today. I think vote is appropriate, with rationale
including usage and test coverage and stability, since it is a commitment
by the community to maintain the code, which constitutes most of the TCO of
code.

*Tracking*:
 - We should know what experiments we have and how old they are.
 - It means that just tagging methods and classes "@Experimental" doesn't
really work. I think that is probably a good thing. It is confusing to have
hundreds of tiny experiments. We can target larger-scale experiments.
 - If we regularly poll on twitter or user@ about features then it might
become a source of OK signal, for things where we cannot look at download
stats.

I think with these three approaches, the @Experimental annotation is
actually obsolete. We could still use it to drive some kind of annotation
processor to ensure "if there is @Experimental then there is a checkState"
but I don't have experience doing such things.

Kenn

[1]
https://lists.apache.org/thread.html/1bfe7aa55f8d77c4ddfde39595c9473b233edfcc3255ed38b3f85612@%3Cdev.beam.apache.org%3E


Re: [EXT] Re: using avro instead of json for BigQueryIO.Write

2019-11-27 Thread Chuck Yang
I would love to fix this, but I'm not sure I have the bandwidth at the
moment. Anyway, I created the JIRA here:
https://jira.apache.org/jira/browse/BEAM-8841

Thanks!
Chuck



Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-27 Thread Kenneth Knowles
On Tue, Nov 26, 2019 at 7:00 PM Chamikara Jayalath 
wrote:

>
>
> On Tue, Nov 26, 2019 at 6:17 PM Reza Rokni  wrote:
>
>> With regards to @Experimental, there are a couple of discussions around
>> its usage (or rather overuse!) on dev@. It is something that we need to
>> clean up (some of those IOs have now been used in production environments
>> for years!).
>>
>
> I agree that we should move some IO connectors out of the experimental
> state, and probably this should be a separate discussion. Also, this issue
> goes beyond IO connectors, since there are other parts of the code that are
> marked as experimental as well, sometimes for a good reason (for example,
> SDF).
>

Yes, let's have a separate thread on @Experimental. There are a ton of
threads that drift into talking about it, and they all seem to agree it isn't
working. Only one thread* addressed it directly, and that one was about
something a bit more specific:
https://lists.apache.org/thread.html/302bd51c77feb5c9ce39882316d391535a0fc92e7608a623d9139160@%3Cdev.beam.apache.org%3E




> On Tue, Nov 26, 2019 at 8:21 AM Alexey Romanenko 
 wrote:

> AFAICT, all AWS SDK V1 IOs (SnsIO, SqsIO, DynamoDBIO, KinesisIO) are
> marked as "Experimental". So, it should not be a problem to gracefully
> deprecate and finally remove them. We already did a similar procedure for
> “HadoopInputFormatIO”, which was renamed to just “HadoopFormatIO” (since it
> started to support HadoopOutputFormat as well). The old “HadoopInputFormatIO”
> was deprecated and removed after *3 consecutive* Beam releases (as we
> agreed on the mailing list).
>
> At the same time, some users, for various reasons, would not be able or
> willing to move to AWS SDK V2. So I’d prefer to just deprecate the AWS SDK
> V1 IOs and accept new features/fixes *only* for the V2 IOs.
>

> +1 for deprecating AWS V1 IO connectors, as opposed to removing them,
> unless we can confirm that usage is extremely limited.
>

+1 to deprecate as soon as there is an alternative.

Trying to measure usage is a good idea, but hard. If the maven coordinates
were different we could look at download numbers and dependencies.


Talking about the “Experimental” annotation. Sorry in advance if I missed
> that and am switching the subject a bit, but do we have clear rules or an
> agreement on when an IO becomes stable and should not be marked as
> experimental anymore? *Most* of our Java IOs are marked as Experimental,
> but many of them have been used in production by real users under real
> load. Does it mean that they are ready to be declared stable in terms of
> API? Perhaps this topic deserves a new discussion if there are several
> opinions on that.
>

> Probably, the decision to move component APIs (for example, an IO
> connector) out of the experimental state should be made on a case-by-case
> basis.
>

Let's repeat these good points on a dedicated thread.

Kenn



>
> Thanks,
> Cham
>
>
>>
> On 26 Nov 2019, at 00:39, Luke Cwik  wrote:
>
> Phase I sounds fine.
>
> Apache Beam follows semantic versioning, and I believe removing the IOs
> will be a backwards-incompatible change unless they were marked
> experimental, which will be a problem for Phase 2.
>
> What is the feasibility of making the V1 transforms wrappers around V2?
>
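
A minimal sketch of that wrapper idea; SnsV1Write and SnsV2Write below are
hypothetical stand-ins for this discussion, not the actual Beam IO classes:

import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;

// Hypothetical V2 transform; the real AWS SDK V2 logic would live here.
class SnsV2Write extends PTransform<PCollection<String>, PDone> {
  @Override
  public PDone expand(PCollection<String> input) {
    return PDone.in(input.getPipeline());
  }
}

// Hypothetical V1 transform kept only as a thin shim over its V2 counterpart,
// so V1 users transparently get V2 behavior during the deprecation window.
@Deprecated
class SnsV1Write extends PTransform<PCollection<String>, PDone> {
  @Override
  public PDone expand(PCollection<String> input) {
    return input.apply(new SnsV2Write());
  }
}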
> On Mon, Nov 25, 2019 at 1:46 PM Cam Mach  wrote:
>
>> Hello Beam Devs,
>>
>> I have been working on the migration of the Amazon Web Services IO
>> connectors to the new AWS SDK for Java V2. The goal is to have an updated
>> implementation aligned with the most recent AWS improvements. So far we
>> have already migrated the connectors for AWS SNS, SQS, and DynamoDB.
>>
>> In the meantime, some contributions are still going into the V1 IOs. So
>> far we have dealt with those by porting (or asking contributors to port)
>> the changes into the V2 IOs too, because we don’t want the features of the
>> two versions to diverge, but this may quickly become a maintenance issue.
>> So we want to discuss a plan to stop supporting (deprecate) the V1 IOs and
>> encourage users to move to V2.
>>
>> Phase I (ASAP):
>>
>>- Mark migrated AWS V1 IOs as deprecated
>>- Document migration path to V2
>>
>> Phase II (end of 2020):
>>
>>- Decide a date or Beam release to remove the V1 IOs
>>- Send a notification to the community 3 months before we remove
>>them
>>- Completely get rid of V1 IOs
>>
>>
>> Please let me know what you think, or if you see any potential issues.
>>
>> Thanks,
>> Cam Mach
>>
>>
>
>>

Re: Detecting resources to stage

2019-11-27 Thread Luke Cwik
That looks good as well.

I would suggest that we make the classpath scanning system pluggable using
PipelineOptions. For example, in GcpOptions [1], we use two default instance
factories: the first controls which class is used as the factory [2], and
the second instantiates an instance of that class and creates the
credential [3]. The same strategy could be used here, with a default
instance factory for the set of resources and another option which controls
which class is instantiated to provide that default.

Do you think that we could make the default always:
new ClassGraph()
  .addClassLoader(classLoader)
  .getClasspathURLs();
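
Combining those two ideas, a rough sketch of what the pluggable option could
look like; StagedResourcesOptions and ClassGraphScanFactory are illustrative
names, not existing Beam classes:

import io.github.classgraph.ClassGraph;
import java.net.URL;
import java.util.List;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.DefaultValueFactory;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

// Sketch: the detected resources are an option, and their default value is
// produced by a pluggable factory, mirroring the GcpOptions pattern above.
interface StagedResourcesOptions extends PipelineOptions {
  @Description("Resources to stage; defaults to a ClassGraph classpath scan.")
  @Default.InstanceFactory(ClassGraphScanFactory.class)
  List<URL> getResourcesToStage();

  void setResourcesToStage(List<URL> value);

  class ClassGraphScanFactory implements DefaultValueFactory<List<URL>> {
    @Override
    public List<URL> create(PipelineOptions options) {
      return new ClassGraph()
          .addClassLoader(Thread.currentThread().getContextClassLoader())
          .getClasspathURLs();
    }
  }
}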

1:
https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159
2:
https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L144
3:
https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159

On Wed, Nov 27, 2019 at 8:19 AM Gleb Kanterov  wrote:

> I didn't think it through, but this is something I have in mind: keep the
> existing implementation for URLClassLoader, and use ClassGraph for
> experimental support of Java 11.
>
> List<URL> urls;
> if (classLoader instanceof URLClassLoader) {
>   urls = Arrays.asList(((URLClassLoader) classLoader).getURLs());
> } else {
>   urls = new ClassGraph()
>       .addClassLoader(classLoader)
>       .getClasspathURLs();
> }
>
> On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy 
> wrote:
>
>> This looks promising. Do you think you could share your code as well?
>>
>> That part sounds very calming:
>> "ClassGraph is fully compatible with the new JPMS module system (Project
>> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the
>> module path. However, the code is also fully backwards compatible with JDK
>> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and
>> all interaction with the module system is implemented via reflection for
>> backwards compatibility)."
>>
>> I'm currently working on rebuilding the classpath detection mechanism so
>> that it scans java.class.path when URLClassLoader cannot be used (as Luke
>> suggested) but if we decide to use classgraph it should be relatively easy
>> to do that instead. Moreover, I want to enable the possibility of injecting
>> any algorithm implementation through pipeline options - this will enable
>> third-party vendors to inject their custom implementations if needed (SPI
>> pattern that was mentioned at some point in a jira ticket). I think I'm
>> pretty close to finishing that.
>>
>> Thanks!
>>
>> śr., 27 lis 2019 o 15:24 Gleb Kanterov  napisał(a):
>>
>>> Today I tried using classgraph [1] library to scan classpath in Java 11
>>> instead of using URLClassLoader, and after that, the job worked on
>>> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
>>> classgraph doesn't have any dependencies. I'm wondering if we can relocate
>>> it to the java-core jar and use it for non-URLClassLoaders?
>>>
>>> [1]: https://github.com/classgraph/classgraph
>>> [2]:
>>> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java
>>>
>>> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik  wrote:
>>>
 I believe the closest suggestion[1] we had that worked for Java 11 and
 maintained backwards compatibility was to use the URLClassLoader to infer
 the resources; if we couldn't do that, then look at the java.class.path
 system property to do the inference; otherwise fail and force the users to
 tell us what to stage. There are too many scenarios where we will do it wrong
 because of how people package and deploy their code whether it is an
 embedded application server or some other application container with a
 security manager that will prevent us from doing the right thing.

 On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw 
 wrote:

> Note that resources are more properly tied to specific operations and
> stages, not to the entire pipeline. This is especially true in the
> face of libraries (which should have the ability to declare their own
> resources) and cross-language.
>
> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy 
> wrote:
> >
> > I figured that it would be good to bump this thread for greater
> visibility even though I don't have a strong opinion about this (yet -
> hopefully, I will know more later to be able to share ;) ).
> >
> > Answering the questions Luke asked will unblock this issue:
> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed
> for Java 11 migration (the current detection mechanism does not work with
> Java > 8).

Hadoop client version 2.8.5 from 2.7 (EOL)

2019-11-27 Thread Tomo Suzuki
Hi Beam developers,

I created a PR to upgrade the Hadoop client version:
https://github.com/apache/beam/pull/10222. However, I don't have a Hadoop
cluster to test this against.

Can anybody check whether this change is compatible with a real Hadoop
2.7 / 2.8 cluster?

-- 
Regards,
Tomo


Re: Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Michał Walenia
Hi,
can you please post the command you used in the terminal? It seems you used
a wrong combination of quotes, but I'd need to see it to be sure.
Cheers,
Michal

On Wed, Nov 27, 2019 at 5:11 PM Rehman Murad Ali <
rehman.murad...@venturedive.com> wrote:

> Hi Community,
>
> I have recently been trying to test Dataflow jobs with Beam. I have set
> up a Google Cloud account and tried to copy a file from the local system
> to Google Cloud Storage (which works fine).
>
> Now I am trying to run the preCommitLegacyWorker task locally and I am
> getting the following error:
>
> Unable to instantiate test options from system property
> beamTestPipelineOptions
>
>
> Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected
> character ('-' (code 45)) in numeric value: expected digit (0-9) to follow
> minus sign, for valid numeric value
>  at [Source:
> (String)"[--project=apache-beam-testing,--tempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java,--runner=TestDataflowRunner,--dataflowWorkerJar=D:\\Workspace\\apache\\beam\\runners\\google-cloud-dataflow-java\\worker\\legacy-worker\\build\\libs\\beam-runners-google-cloud-dataflow-java-legacy-worker-2.18.0-SNAPSHOT.jar,]";
> line: 1, column: 4]
> at
> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
> at
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:693)
> at
> com.fasterxml.jackson.core.base.ParserMinimalBase.reportUnexpectedNumberChar(ParserMinimalBase.java:541)
> at
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleInvalidNumberStart(ReaderBasedJsonParser.java:1637)
> at
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._parseNegNumber(ReaderBasedJsonParser.java:1391)
> at
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:742)
> at
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextTextValue(ReaderBasedJsonParser.java:1160)
> at
> com.fasterxml.jackson.databind.deser.std.StringArrayDeserializer.deserialize(StringArrayDeserializer.java:145)
>
>
>
> Any help would be appreciated.
>
>
>
> *Thanks & Regards*
>
>
> *Rehman Murad Ali*
>
> Software Engineer
> Mobile: +92 3452076766
> Skype: rehman,muradali
>
> 
>


-- 

Michał Walenia
Polidea  | Software Engineer

M: +48 791 432 002 <+48791432002>
E: michal.wale...@polidea.com

Unique Tech
Check out our projects! 


Re: Detecting resources to stage

2019-11-27 Thread Gleb Kanterov
I didn't think it through, but this is something I have in mind: keep the
existing implementation for URLClassLoader, and use ClassGraph for
experimental support of Java 11.

List<URL> urls;
if (classLoader instanceof URLClassLoader) {
  urls = Arrays.asList(((URLClassLoader) classLoader).getURLs());
} else {
  urls = new ClassGraph()
      .addClassLoader(classLoader)
      .getClasspathURLs();
}

On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy 
wrote:

> This looks promising. Do you think you could share your code as well?
>
> That part sounds very calming:
> "ClassGraph is fully compatible with the new JPMS module system (Project
> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the
> module path. However, the code is also fully backwards compatible with JDK
> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and
> all interaction with the module system is implemented via reflection for
> backwards compatibility)."
>
> I'm currently working on rebuilding the classpath detection mechanism so
> that it scans java.class.path when URLClassLoader cannot be used (as Luke
> suggested) but if we decide to use classgraph it should be relatively easy
> to do that instead. Moreover, I want to enable the possibility of injecting
> any algorithm implementation through pipeline options - this will enable
> third-party vendors to inject their custom implementations if needed (SPI
> pattern that was mentioned at some point in a jira ticket). I think I'm
> pretty close to finishing that.
>
> Thanks!
>
> śr., 27 lis 2019 o 15:24 Gleb Kanterov  napisał(a):
>
>> Today I tried using classgraph [1] library to scan classpath in Java 11
>> instead of using URLClassLoader, and after that, the job worked on
>> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
>> classgraph doesn't have any dependencies. I'm wondering if we can relocate
>> it to the java-core jar and use it for non-URLClassLoaders?
>>
>> [1]: https://github.com/classgraph/classgraph
>> [2]:
>> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java
>>
>> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik  wrote:
>>
>>> I believe the closest suggestion[1] we had that worked for Java 11 and
>>> maintained backwards compatibility was to use the URLClassLoader to infer
>>> the resources; if we couldn't do that, then look at the java.class.path
>>> system property to do the inference; otherwise fail and force the users to
>>> tell us what to stage. There are too many scenarios where we will do it wrong
>>> because of how people package and deploy their code whether it is an
>>> embedded application server or some other application container with a
>>> security manager that will prevent us from doing the right thing.
>>>
>>> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw 
>>> wrote:
>>>
 Note that resources are more properly tied to specific operations and
 stages, not to the entire pipeline. This is especially true in the
 face of libraries (which should have the ability to declare their own
 resources) and cross-language.

 On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy 
 wrote:
 >
 > I figured that it would be good to bump this thread for greater
 visibility even though I don't have a strong opinion about this (yet -
 hopefully, I will know more later to be able to share ;) ).
 >
 > Answering the questions Luke asked will unblock this issue:
 https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed
 for Java 11 migration (current detecting mechanism does not work with java
 > 8).
 >
 >
 >>
 >> That said letting the user resolve the jars to stage can be saner
 instead of assuming it is in the classpath/loader. I already have a few
 cases where it will fail cause the transforms load the jars from outside
 the app classloader (transforms are isolated).
 >
 >
 >
 > If I understand correctly, at least in Dataflow runner, if users want
 to provide custom resources to stage, they can use filesToStage pipeline
 option. Once the option is not null, the runner doesn't detect the
>>> resources automatically and stages the resources listed in the option
 instead. I think this should be the approach common for all runners (if it
 is not the case already).

>>>
>>> Your understanding is correct and consistency across runners for a
>>> pipeline option is good for our users.
>>>
>>>
 >
 > Thanks,
 > Łukasz
 >
 >
 >

>>>
>>> 1: https://github.com/apache/beam/pull/8775
>>>
>>


Exception at testing DataFlow with preCommitLegacyWorker

2019-11-27 Thread Rehman Murad Ali
Hi Community,

I have recently been trying to test Dataflow jobs with Beam. I have set up
a Google Cloud account and tried to copy a file from the local system to
Google Cloud Storage (which works fine).

Now I am trying to run the preCommitLegacyWorker task locally and I am
getting the following error:

Unable to instantiate test options from system property
beamTestPipelineOptions


Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected
character ('-' (code 45)) in numeric value: expected digit (0-9) to follow
minus sign, for valid numeric value
 at [Source:
(String)"[--project=apache-beam-testing,--tempRoot=gs://venturedive-beamers-shoaib-mszb/rehman-java,--runner=TestDataflowRunner,--dataflowWorkerJar=D:\\Workspace\\apache\\beam\\runners\\google-cloud-dataflow-java\\worker\\legacy-worker\\build\\libs\\beam-runners-google-cloud-dataflow-java-legacy-worker-2.18.0-SNAPSHOT.jar,]";
line: 1, column: 4]
at
com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
at
com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:693)
at
com.fasterxml.jackson.core.base.ParserMinimalBase.reportUnexpectedNumberChar(ParserMinimalBase.java:541)
at
com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleInvalidNumberStart(ReaderBasedJsonParser.java:1637)
at
com.fasterxml.jackson.core.json.ReaderBasedJsonParser._parseNegNumber(ReaderBasedJsonParser.java:1391)
at
com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:742)
at
com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextTextValue(ReaderBasedJsonParser.java:1160)
at
com.fasterxml.jackson.databind.deser.std.StringArrayDeserializer.deserialize(StringArrayDeserializer.java:145)



Any help would be appreciated.



*Thanks & Regards*


*Rehman Murad Ali*

Software Engineer
Mobile: +92 3452076766
Skype: rehman,muradali




Re: Detecting resources to stage

2019-11-27 Thread Łukasz Gajowy
This looks promising. Do you think you could share your code as well?

That part sounds very calming:
"ClassGraph is fully compatible with the new JPMS module system (Project
Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the
module path. However, the code is also fully backwards compatible with JDK
7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and
all interaction with the module system is implemented via reflection for
backwards compatibility)."

I'm currently working on rebuilding the classpath detection mechanism so
that it scans java.class.path when URLClassLoader cannot be used (as Luke
suggested) but if we decide to use classgraph it should be relatively easy
to do that instead. Moreover, I want to enable the possibility of injecting
any algorithm implementation through pipeline options - this will enable
third-party vendors to inject their custom implementations if needed (SPI
pattern that was mentioned at some point in a jira ticket). I think I'm
pretty close to finishing that.

Thanks!

śr., 27 lis 2019 o 15:24 Gleb Kanterov  napisał(a):

> Today I tried using classgraph [1] library to scan classpath in Java 11
> instead of using URLClassLoader, and after that, the job worked on
> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
> classgraph doesn't have any dependencies. I'm wondering if we can relocate
> it to the java-core jar and use it for non-URLClassLoaders?
>
> [1]: https://github.com/classgraph/classgraph
> [2]:
> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java
>
> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik  wrote:
>
>> I believe the closest suggestion[1] we had that worked for Java 11 and
>> maintained backwards compatibility was to use the URLClassLoader to infer
>> the resources; if we couldn't do that, then look at the java.class.path
>> system property to do the inference; otherwise fail and force the users to
>> tell us what to stage. There are too many scenarios where we will do it wrong
>> because of how people package and deploy their code whether it is an
>> embedded application server or some other application container with a
>> security manager that will prevent us from doing the right thing.
>>
>> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw 
>> wrote:
>>
>>> Note that resources are more properly tied to specific operations and
>>> stages, not to the entire pipeline. This is especially true in the
>>> face of libraries (which should have the ability to declare their own
>>> resources) and cross-language.
>>>
>>> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy 
>>> wrote:
>>> >
>>> > I figured that it would be good to bump this thread for greater
>>> visibility even though I don't have a strong opinion about this (yet -
>>> hopefully, I will know more later to be able to share ;) ).
>>> >
>>> > Answering the questions Luke asked will unblock this issue:
>>> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed
>>> for Java 11 migration (current detecting mechanism does not work with java
>>> > 8).
>>> >
>>> >
>>> >>
>>> >> That said letting the user resolve the jars to stage can be saner
>>> instead of assuming it is in the classpath/loader. I already have a few
>>> cases where it will fail cause the transforms load the jars from outside
>>> the app classloader (transforms are isolated).
>>> >
>>> >
>>> >
>>> > If I understand correctly, at least in Dataflow runner, if users want
>>> to provide custom resources to stage, they can use filesToStage pipeline
>>> option. Once the option is not null, the runner doesn't detect the
>>> resources automatically and stages the resources listed in the option
>>> instead. I think this should be the approach common for all runners (if it
>>> is not the case already).
>>>
>>
>> Your understanding is correct and consistency across runners for a
>> pipeline option is good for our users.
>>
>>
>>> >
>>> > Thanks,
>>> > Łukasz
>>> >
>>> >
>>> >
>>>
>>
>> 1: https://github.com/apache/beam/pull/8775
>>
>


Re: Detecting resources to stage

2019-11-27 Thread Gleb Kanterov
Today I tried using classgraph [1] library to scan classpath in Java 11
instead of using URLClassLoader, and after that, the job worked on
Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
classgraph doesn't have any dependencies. I'm wondering if we can relocate
it to the java-core jar and use it for non-URLClassLoaders?

[1]: https://github.com/classgraph/classgraph
[2]:
https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java

On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik  wrote:

> I believe the closest suggestion[1] we had that worked for Java 11 and
> maintained backwards compatibility was to use the URLClassLoader to infer
> the resources; if we couldn't do that, then look at the java.class.path
> system property to do the inference; otherwise fail and force the users to
> tell us what to stage. There are too many scenarios where we will do it wrong
> because of how people package and deploy their code whether it is an
> embedded application server or some other application container with a
> security manager that will prevent us from doing the right thing.
>
> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw 
> wrote:
>
>> Note that resources are more properly tied to specific operations and
>> stages, not to the entire pipeline. This is especially true in the
>> face of libraries (which should have the ability to declare their own
>> resources) and cross-language.
>>
>> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy  wrote:
>> >
>> > I figured that it would be good to bump this thread for greater
>> visibility even though I don't have a strong opinion about this (yet -
>> hopefully, I will know more later to be able to share ;) ).
>> >
>> > Answering the questions Luke asked will unblock this issue:
>> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed
>> for Java 11 migration (current detecting mechanism does not work with java
>> > 8).
>> >
>> >
>> >>
>> >> That said letting the user resolve the jars to stage can be saner
>> instead of assuming it is in the classpath/loader. I already have a few
>> cases where it will fail cause the transforms load the jars from outside
>> the app classloader (transforms are isolated).
>> >
>> >
>> >
>> > If I understand correctly, at least in Dataflow runner, if users want
>> to provide custom resources to stage, they can use filesToStage pipeline
>> option. Once the option is not null, the runner doesn't detect the
>> resources automatically and stages the resources listed in the option
>> instead. I think this should be the approach common for all runners (if it
>> is not the case already).
>>
>
> Your understanding is correct and consistency across runners for a
> pipeline option is good for our users.
>
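
For reference, a short sketch of setting that option explicitly, assuming
the Dataflow runner's Java options; the jar paths are placeholders and
`args` is the usual argument array from main():

import java.util.Arrays;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// When filesToStage is set, automatic classpath detection is skipped and
// exactly these files are staged.
DataflowPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
options.setFilesToStage(
    Arrays.asList("/path/to/pipeline.jar", "/path/to/extra-dep.jar"));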
>
>> >
>> > Thanks,
>> > Łukasz
>> >
>> >
>> >
>>
>
> 1: https://github.com/apache/beam/pull/8775
>


Re: goVet and clickHouse tests failing

2019-11-27 Thread Elliotte Rusty Harold
I did get through this one, and made the classic mistake of not
immediately writing down the steps I took. I believe it involved some
combination of setting Go paths in environment variables. I seem to have
added this to the end of my .profile:

export GOROOT=/usr/local/go
export GOPATH=$HOME/go/packages
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH

According to my history, I also ran

go get github.com/linkedin/goavro

I really need to dedicate some time to cleaning up the build
documentation. See https://issues.apache.org/jira/browse/BEAM-8798

Right now we have overlapping and sometimes contradictory docs in four
different places:

README.md
CONTRIBUTING.md
https://cwiki.apache.org/confluence/display/BEAM/Contributor+FAQ
https://beam.apache.org/contribute/

We should probably pick one as the source of truth and rewrite the
other three to simply point to it. I propose putting all checkout,
build, test, commit, and push instructions in CONTRIBUTING.md in the
repo. What do folks think?

On Wed, Nov 27, 2019 at 3:51 AM Amogh Tiwari  wrote:
>
> Hi Elliotte,
> I am facing a similar goVet issue. It would be great if you could guide me
> through the solution. Please let me know the steps that you followed.
> Regards,
> Amogh
>
> On Thu, Nov 21, 2019 at 6:06 PM Elliotte Rusty Harold  
> wrote:
>>
>> Tentatively, the goVet issue does seem to have been an issue with my
>> Go install I have now cleaned up. The clickhouse issue remains, as do
>> several others I'm working through.
>>
>> I've filed https://issues.apache.org/jira/browse/BEAM-8798 to
>> consolidate and update the instructions for getting to a working
>> build.
>>
>> On Thu, Nov 21, 2019 at 6:10 AM Elliotte Rusty Harold
>>  wrote:
>> >
>> > I'm slowly working my way through getting the tests to run and pass.
>> > We have a lot of work to do on the contributing docs to explain how to
>> > setup and run the build. There's clearly a lot of knowledge in
>> > developers' heads and workstations that hasn't yet made it into the
>> > docs.
>> >
>> > The latest is a problem finding "github.com/linkedin/goavro" when I
>> > run goVet. I'm not a go person. Is this something that requires an
>> > extra install? If so, how is it installed? Or is this some error in
>> > the build.gradles? Or perhaps my go config is borked and gradle is
>> > looking in the wrong directory?
>> >
>> > > Task :sdks:go:examples:resolveBuildDependencies
>> > Resolving ./github.com/apache/beam/sdks/go@/home/elharo/beam/sdks/go
>> > .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/examples/vendor/github.com/apache/beam/sdks/go/pkg/beam/io/avroio/avroio.go:28:2:
>> > cannot find package "github.com/linkedin/goavro" in any of:
>> > 
>> > /home/elharo/beam/sdks/go/examples/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/examples/vendor/github.com/linkedin/goavro
>> > (vendor tree)
>> > 
>> > /home/elharo/.gradle/go/binary/1.12/go/src/github.com/linkedin/goavro
>> > (from $GOROOT)
>> > 
>> > /home/elharo/beam/sdks/go/examples/.gogradle/project_gopath/src/github.com/linkedin/goavro
>> > (from $GOPATH)
>> >
>> > > Task :sdks:go:examples:goVet FAILED
>> >
>> > I'm also seeing failures in ClickHouseIOTest:
>> >
>> > > Task :sdks:java:io:clickhouse:test
>> >
>> > org.apache.beam.sdk.io.clickhouse.ClickHouseIOTest > classMethod FAILED
>> > java.lang.IllegalStateException
>> >
>> > org.apache.beam.sdk.io.clickhouse.ClickHouseIOTest > classMethod FAILED
>> > java.lang.NullPointerException
>> >
>> > org.apache.beam.sdk.io.clickhouse.AtomicInsertTest > classMethod FAILED
>> > java.lang.IllegalStateException
>> >
>> > org.apache.beam.sdk.io.clickhouse.AtomicInsertTest > classMethod FAILED
>> > java.lang.NullPointerException
>> >
>> > --
>> > Elliotte Rusty Harold
>> > elh...@ibiblio.org
>>
>>
>>
>> --
>> Elliotte Rusty Harold
>> elh...@ibiblio.org



-- 
Elliotte Rusty Harold
elh...@ibiblio.org


Re: real real-time beam

2019-11-27 Thread Jan Lukavský
> Trigger firings can have decreasing event timestamps w/ the minimum 
timestamp combiner*. I do think the issue at hand is best analyzed in 
terms of the explicit ordering on panes. And I do think we need to have 
an explicit guarantee or annotation strong enough to describe a 
correct-under-all-allowed runners sink. Today an antagonistic runner 
could probably break a lot of things.


Thanks for this insight. I didn't know about the relation between
trigger firing (event) time - which is always non-decreasing - and the
resulting timestamp of the output pane - which can be affected by the
timestamp combiner and decrease in the cases you describe. What actually
correlates with the pane index at all times is the processing time of the
trigger firings. Would you say that the "annotation that would guarantee
ordering of panes" could be viewed as a time-ordering annotation with an
additional time domain (event time, processing time)? Could these two then
be viewed as a single annotation with a distinguishing parameter?


@RequiresTimeSortedInput(Domain.PANE_INDEX | Domain.EVENT_TIME)

?

Event time should probably be made the default, because that information
is accessible with every WindowedValue, while the pane index is available
only after a GBK (or, generally, might be available after every keyed
sequential operation, but is missing after a source, for instance).
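
For concreteness, a sketch of how such an annotation might be applied; the
annotation and its Domain parameter are hypothetical, defined here only for
this discussion and not part of Beam today:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Hypothetical annotation and domain enum, sketched for this discussion.
enum Domain { EVENT_TIME, PANE_INDEX }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RequiresTimeSortedInput {
  Domain value() default Domain.EVENT_TIME;
}

class OrderedConsumerFn extends DoFn<KV<String, Long>, Void> {
  @ProcessElement
  @RequiresTimeSortedInput(Domain.PANE_INDEX)
  public void process(ProcessContext c) {
    // A supporting runner would deliver each key's elements here ordered
    // in the requested domain.
  }
}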


Jan

On 11/27/19 1:32 AM, Kenneth Knowles wrote:



On Tue, Nov 26, 2019 at 1:00 AM Jan Lukavský > wrote:


> I will not try to formalize this notion in this email. But I
will note that since it is universally assured, it would be zero
cost and significantly safer to formalize it and add an annotation
noting it was required. It has nothing to do with event time
ordering, only trigger firing ordering.

I cannot agree with the last sentence (and I'm really not doing
this on purpose :-)). Panes generally arrive out of order, as
mentioned several times in the discussions linked from this
thread. If we want to ensure "trigger firing ordering", we can use
the pane index, that is correct. But - that is actually equivalent
to sorting by event time, because pane index order will be
(nearly) the same as event time order. This is due to the fact that
pane index and event time correlate (both are monotonic).

Trigger firings can have decreasing event timestamps w/ the minimum 
timestamp combiner*. I do think the issue at hand is best analyzed in 
terms of the explicit ordering on panes. And I do think we need to 
have an explicit guarantee or annotation strong enough to describe a 
correct-under-all-allowed runners sink. Today an antagonistic runner 
could probably break a lot of things.


Kenn

*In fact, they can decrease via the "maximum" timestamp combiner,
because timestamp combiners actually only apply to the elements of that
particular pane. This is weird, and maybe a design bug, but good to
know about.
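
For reference, a small sketch of where that combiner choice is made,
assuming Beam's Java windowing API; the input collection `events` (of type
PCollection<KV<String, Long>>) is assumed:

import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.TimestampCombiner;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// With EARLIEST, each pane's timestamp is the minimum timestamp of the
// elements in that pane, so a late pane carrying an early element can get
// a smaller timestamp than an earlier pane.
PCollection<KV<String, Long>> counts =
    events
        .apply(
            Window.<KV<String, Long>>into(FixedWindows.of(Duration.standardMinutes(1)))
                .withTimestampCombiner(TimestampCombiner.EARLIEST)
                .triggering(
                    AfterWatermark.pastEndOfWindow()
                        .withLateFirings(AfterPane.elementCountAtLeast(1)))
                .withAllowedLateness(Duration.standardHours(1))
                .accumulatingFiredPanes())
        .apply(Count.perKey());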


The pane index "only" solves the issue of preserving ordering even
in cases where there are multiple firings within the same timestamp
(regardless of granularity). This was mentioned in the initial
discussion about event time ordering, and is part of the design
doc - users should be allowed to provide a UDF for extracting a
time-correlated ordering field (which means the ability to choose a
preferred, or authoritative, observer which assigns unambiguous
ordering to events). Examples of this might include Kafka offsets
as well, or any queue index for that matter. This is not yet
implemented, but could (should) be in the future.

The only case where these two things are (somewhat) different is
the case mentioned by @Steve - if the output is a stateless ParDo,
which will get fused. But that is only because the processing is
single-threaded per key, and therefore the ordering is implied by
timer ordering (and be careful here: many runners don't have this
ordering 100% correct as of now - this problem luckily appears
only when there are multiple timers per key). Moreover, if there
is a failure, the output might (would) go back in
time anyway. If there were a shuffle operation after
GBK/Combine, the ordering would no longer be guaranteed and would
have to be explicitly taken care of.

Last note, I must agree with @Rui that all these discussions are
very much related to retractions (precisely the ability to
implement them).

Jan

On 11/26/19 7:34 AM, Kenneth Knowles wrote:

Hi Aaron,

Another insightful observation.

Whenever an aggregation (GBK / Combine per key) has a trigger
firing, there is a per-key sequence number attached. It is
included in metadata known as "PaneInfo" [1]. The value of
PaneInfo.getIndex() is colloquially referred to as the "pane
index". You can also make use of the "on time index" if you like.
The best way to access this 
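
A minimal sketch of reading that pane index inside a DoFn, assuming Beam's
Java SDK; the class name is illustrative:

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;
import org.apache.beam.sdk.values.KV;

// Sketch: emit each element's per-key trigger-firing sequence number.
class PaneIndexFn extends DoFn<KV<String, Long>, KV<String, Long>> {
  @ProcessElement
  public void process(ProcessContext c) {
    PaneInfo pane = c.pane();
    long paneIndex = pane.getIndex(); // 0, 1, 2, ... per key and window
    // Downstream logic can use the index to detect or restore firing order.
    c.output(KV.of(c.element().getKey(), paneIndex));
  }
}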

Re: goVet and clickHouse tests failing

2019-11-27 Thread Amogh Tiwari
Hi Elliotte,
I am facing a similar goVet issue. It would be great if you could guide me
through the solution. Please let me know the steps that you followed.
Regards,
Amogh

On Thu, Nov 21, 2019 at 6:06 PM Elliotte Rusty Harold 
wrote:

> Tentatively, the goVet issue does seem to have been an issue with my
> Go install I have now cleaned up. The clickhouse issue remains, as do
> several others I'm working through.
>
> I've filed https://issues.apache.org/jira/browse/BEAM-8798 to
> consolidate and update the instructions for getting to a working
> build.
>
> On Thu, Nov 21, 2019 at 6:10 AM Elliotte Rusty Harold
>  wrote:
> >
> > I'm slowly working my way through getting the tests to run and pass.
> > We have a lot of work to do on the contributing docs to explain how to
> > setup and run the build. There's clearly a lot of knowledge in
> > developers' heads and workstations that hasn't yet made it into the
> > docs.
> >
> > The latest is a problem finding "github.com/linkedin/goavro" when I
> > run goVet. I'm not a go person. Is this something that requires an
> > extra install? If so, how is it installed? Or is this some error in
> > the build.gradles? Or perhaps my go config is borked and gradle is
> > looking in the wrong directory?
> >
> > > Task :sdks:go:examples:resolveBuildDependencies
> > Resolving ./github.com/apache/beam/sdks/go@/home/elharo/beam/sdks/go
> > .gogradle/project_gopath/src/
> github.com/apache/beam/sdks/go/examples/vendor/github.com/apache/beam/sdks/go/pkg/beam/io/avroio/avroio.go:28:2
> :
> > cannot find package "github.com/linkedin/goavro" in any of:
> > /home/elharo/beam/sdks/go/examples/.gogradle/project_gopath/src/
> github.com/apache/beam/sdks/go/examples/vendor/github.com/linkedin/goavro
> > (vendor tree)
> > /home/elharo/.gradle/go/binary/1.12/go/src/
> github.com/linkedin/goavro
> > (from $GOROOT)
> > /home/elharo/beam/sdks/go/examples/.gogradle/project_gopath/src/
> github.com/linkedin/goavro
> > (from $GOPATH)
> >
> > > Task :sdks:go:examples:goVet FAILED
> >
> > I'm also seeing failures in ClickHouseIOTest:
> >
> > > Task :sdks:java:io:clickhouse:test
> >
> > org.apache.beam.sdk.io.clickhouse.ClickHouseIOTest > classMethod FAILED
> > java.lang.IllegalStateException
> >
> > org.apache.beam.sdk.io.clickhouse.ClickHouseIOTest > classMethod FAILED
> > java.lang.NullPointerException
> >
> > org.apache.beam.sdk.io.clickhouse.AtomicInsertTest > classMethod FAILED
> > java.lang.IllegalStateException
> >
> > org.apache.beam.sdk.io.clickhouse.AtomicInsertTest > classMethod FAILED
> > java.lang.NullPointerException
> >
> > --
> > Elliotte Rusty Harold
> > elh...@ibiblio.org
>
>
>
> --
> Elliotte Rusty Harold
> elh...@ibiblio.org
>


Re: [ANNOUNCE] New committer: Daniel Oliveira

2019-11-27 Thread Ankur Goenka
Congrats Daniel!

On Mon, Nov 25, 2019 at 10:02 PM Tanay Tummalapalli 
wrote:

> Congratulations!
>
> On Mon, Nov 25, 2019 at 11:12 PM Mark Liu  wrote:
>
>> Congratulations, Daniel!
>>
>> On Mon, Nov 25, 2019 at 9:31 AM Ahmet Altay  wrote:
>>
>>> Congratulations, Daniel!
>>>
>>> On Sat, Nov 23, 2019 at 3:47 AM jincheng sun 
>>> wrote:
>>>

 Congrats, Daniel!
 Best,
 Jincheng

 Alexey Romanenko  于2019年11月22日周五 下午5:47写道:

> Congratulations, Daniel!
>
> On 22 Nov 2019, at 09:18, Jan Lukavský  wrote:
>
> Congrats Daniel!
> On 11/21/19 10:11 AM, Gleb Kanterov wrote:
>
> Congratulations!
>
> On Thu, Nov 21, 2019 at 6:24 AM Thomas Weise  wrote:
>
>> Congratulations!
>>
>>
>> On Wed, Nov 20, 2019, 7:56 PM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> Congrats!!
>>>
>>> On Wed, Nov 20, 2019 at 5:21 PM Daniel Oliveira <
>>> danolive...@google.com> wrote:
>>>
 Thank you everyone! I won't let you down. o7

 On Wed, Nov 20, 2019 at 2:12 PM Ruoyun Huang 
 wrote:

> Congrats Daniel!
>
> On Wed, Nov 20, 2019 at 1:58 PM Robert Burke 
> wrote:
>
>> Congrats Daniel! Much deserved.
>>
>> On Wed, Nov 20, 2019, 12:49 PM Udi Meiri 
>> wrote:
>>
>>> Congrats Daniel!
>>>
>>> On Wed, Nov 20, 2019 at 12:42 PM Kyle Weaver <
>>> kcwea...@google.com> wrote:
>>>
 Congrats Dan! Keep up the good work :)

 On Wed, Nov 20, 2019 at 12:41 PM Cyrus Maden 
 wrote:

> Congratulations! This is great news.
>
> On Wed, Nov 20, 2019 at 3:24 PM Rui Wang 
> wrote:
>
>> Congrats!
>>
>>
>> -Rui
>>
>> On Wed, Nov 20, 2019 at 11:48 AM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>>
>>> Congrats, Daniel!
>>>
>>> On Wed, Nov 20, 2019 at 11:47 AM Kenneth Knowles <
>>> k...@apache.org> wrote:
>>>
 Hi all,

 Please join me and the rest of the Beam PMC in welcoming a
 new committer: Daniel Oliveira

 Daniel introduced himself to dev@ over two years ago and
 has contributed in many ways since then. Daniel has 
 contributed to general
 project health, the portability framework, and all three 
 languages: Java,
 Python SDK, and Go. I would like to particularly highlight how 
 he deleted
 12k lines of dead reference runner code [1].

 In consideration of Daniel's contributions, the Beam PMC
 trusts him with the responsibilities of a Beam committer [2].

 Thank you, Daniel, for your contributions and looking
 forward to many more!

 Kenn, on behalf of the Apache Beam PMC

 [1] https://github.com/apache/beam/pull/8380
 [2]
 https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer

>>>
>
> --
> 
> Ruoyun  Huang
>
>
>