Re: [VOTE] Release 2.23.0, release candidate #2

2020-07-22 Thread Ahmet Altay
+1 - I validated py3 quickstarts.

On Wed, Jul 22, 2020 at 6:21 PM Valentyn Tymofieiev 
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #2 for the version
> 2.23.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2],
> which is signed with the key with fingerprint 1DF50603225D29A4 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.23.0-RC2" [5],
> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
> * Java artifacts were built with Maven 3.6.0 and Oracle JDK 1.8.0_201-b09 .
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.23.0 release to help with validation
> [9].
> * Docker images published to Docker Hub [10].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release manager.
>
> [1]
> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12347145
> [2] https://dist.apache.org/repos/dist/dev/beam/2.23.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1106/
> [5] https://github.com/apache/beam/tree/v2.23.0-RC2
> [6] https://github.com/apache/beam/pull/12212
> [7] https://github.com/apache/beam-site/pull/605
> [8] https://github.com/apache/beam/pull/12213
> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=596347973
> [10] https://hub.docker.com/search?q=apache%2Fbeam=image
>


Re: [BROKEN] Please add "Fix Version" when resolving or closing Jiras

2020-07-22 Thread Brian Hulette
Is setting the Resolution broken as well? I realized I've been closing
jiras with Resolution "Unresolved" and I can't actually change it to
"Fixed".

On Tue, Jul 21, 2020 at 7:19 AM Maximilian Michels  wrote:

> Also, a friendly reminder to always close the JIRA issue after merging a
> fix. It's easy to forget.
>
> On 20.07.20 21:04, Kenneth Knowles wrote:
> > Hi all,
> >
> > In working on our Jira automation, I've messed up our Jira workflow. It
> > will no longer prompt you to fill in "Fix Version" when you resolve or
> > close an issue. I will be working with infra to restore this. In the
> > meantime, please try to remember to add a Fix Version to each issue that
> > you close, so that we get automated detailed release notes.
> >
> > Kenn
>
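
Until the workflow prompt is restored, a Fix Version can still be attached to an issue by hand, for example through Jira's REST API (PUT /rest/api/2/issue/{key}). A minimal sketch of the update payload; the version name "2.24.0" and the issue key are assumptions for illustration:

```python
import json

# Payload for: PUT /rest/api/2/issue/BEAM-XXXXX
# Adds a Fix Version entry to the issue's fixVersions field.
payload = {
    "update": {
        "fixVersions": [
            {"add": {"name": "2.24.0"}}  # assumed release name
        ]
    }
}

print(json.dumps(payload))
```

The same payload works from curl or any HTTP client with basic auth against the ASF Jira instance.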


Re: ReadFromKafka returns error - RuntimeError: cannot encode a null byte[]

2020-07-22 Thread Robert Bradshaw
On Sat, Jul 18, 2020 at 12:08 PM Chamikara Jayalath 
wrote:

>
>
> On Fri, Jul 17, 2020 at 10:04 PM ayush sharma <1705ay...@gmail.com> wrote:
>
>> Thank you guys for the reply. I am really stuck and could not proceed
>> further.
>> Yes, the previous trial published message had null key.
>> But when I send key:value pair through producer using
>>
>> ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic
>> mytopic --property *"parse.key=true" --property "key.separator=:"*
>> > tryKey:tryValue
>>
>> I do not get any error but beam does not print the received message. Here
>> is how my pipeline looks like,
>> result = (
>> pipeline
>>
>> | "Read from kafka" >> ReadFromKafka(
>> consumer_config={
>> "bootstrap.servers": 'localhost:9092',
>> },
>> topics=['mytopic'],
>> expansion_service='localhost:8097',
>> )
>>
>> | "print" >> beam.Map(print)
>> )
>>
>>
> I suspect DirectRunner in LOOPBACK mode might not be working for
> cross-language transforms today.
>

When running a streaming pipeline, the DirectRunner falls back to the old
runner, which does not support cross-language transforms.
https://issues.apache.org/jira/browse/BEAM-7514

> Please note that the cross-language transforms framework is fairly new [1] and
> we are adding support for various runners and environment configurations.
> Can you try with Flink in DOCKER mode ?
>
>
>> If this is not the way we make beam and kafka communicate then please
>> share a working example which showcases how a message published in kafka
>> gets received by beam while streaming.
>>
>
> I'm adding an example but I've only tested this with Dataflow so far. I hope
> to test that example for more runners and add additional instructions
> there.
> https://github.com/apache/beam/pull/12188
>
> Thanks,
> Cham
>
> [1] https://beam.apache.org/roadmap/connectors-multi-sdk/
>
>>
>> Regards,
>> Ayush Sharma
>>
>> On Fri, Jul 17, 2020 at 11:39 PM Chamikara Jayalath 
>> wrote:
>>
>>> Yes, seems like this is due to the key being null. XLang KafkaIO has to
>>> be updated to support this. You should not run into this error if you
>>> publish keys and values that are not null.
>>>
>>>
>>>
>>>
>>> On Fri, Jul 17, 2020 at 8:04 PM Luke Cwik  wrote:
>>>
 +dev 

 On Fri, Jul 17, 2020 at 8:03 PM Luke Cwik  wrote:

> +Heejong Lee  +Chamikara Jayalath
> 
>
> Do you know if your trial record has an empty key or value?
> If so, then you hit a bug and it seems as though there was a miss in
> supporting this use case.
>
> Heejong and Cham,
> It looks like the Javadoc for ByteArrayDeserializer and other
> Deserializers can return null[1, 2] and we aren't using
> NullableCoder.of(ByteArrayCoder.of()) in the expansion[3]. Note that the
> non XLang KafkaIO does this correctly in its regular coder inference
> logic [4]. I filed BEAM-10529 [5].
>
> 1:
> https://kafka.apache.org/21/javadoc/org/apache/kafka/common/serialization/ByteArrayDeserializer.html#deserialize-java.lang.String-byte:A-
> 2:
> https://kafka.apache.org/21/javadoc/org/apache/kafka/common/serialization/StringDeserializer.html#deserialize-java.lang.String-byte:A-
> 3:
> https://github.com/apache/beam/blob/af2d6b0379d64b522ecb769d88e9e7e7b8900208/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L478
> 4:
> https://github.com/apache/beam/blob/af2d6b0379d64b522ecb769d88e9e7e7b8900208/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/LocalDeserializerProvider.java#L85
> 5: https://issues.apache.org/jira/browse/BEAM-10529
>
>
> On Fri, Jul 17, 2020 at 8:51 AM ayush sharma <1705ay...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am trying to build a streaming beam pipeline in python which should
>> capture messages from kafka and then execute further stages of data
>> fetching from other sources and aggregation. The step-by-step process of
>> what I have built till now is:
>>
>>1.
>>
>>Running Kafka instance on localhost:9092
>>
>>./bin/kafka-server-start.sh ./config/server.properties
>>2.
>>
>>Run beam-flink job server using docker
>>
>>docker run --net=host apache/beam_flink1.10_job_server:latest
>>3.
>>
>>Run beam-kafka pipeline
>>
>> import apache_beam as beam
>> from apache_beam.io.external.kafka import ReadFromKafka, WriteToKafka
>> from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
>>
>> if __name__ == '__main__':
>> options = PipelineOptions([
>> "--job_endpoint=localhost:8099",
>> "--environment_type=LOOPBACK",
>> "--streaming",
>> "--environment_config={\"command\":\"/opt/apache/beam/boot\"}",
>> ])
>>
>> options = options.view_as(StandardOptions)
>> 

Re: ReadFromKafka returns error - RuntimeError: cannot encode a null byte[]

2020-07-22 Thread Chamikara Jayalath
Yeah, this is a known issue. According to +Boyuan Zhang's comment in the
bug, you should still be able to read as long as the Kafka cluster is set
up to auto-commit, and these errors can be safely ignored. For example,
you can set "enable.auto.commit" to "true" in the consumer config passed
to ReadFromKafka.
I haven't tried this myself, though, so please comment in the JIRA if this
is a true blocker for reading from Kafka; that will help us identify the
real priority of this JIRA.
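
The suggested workaround can be sketched as follows. The property names are standard Kafka consumer configs, forwarded unchanged to the Java KafkaIO consumer; the broker address and commit interval are assumptions for illustration:

```python
# Sketch of the auto-commit workaround described above. These are plain
# Kafka consumer properties, passed through by
# ReadFromKafka(consumer_config=...).
def kafka_consumer_config(bootstrap_servers="localhost:9092"):
    return {
        "bootstrap.servers": bootstrap_servers,
        "enable.auto.commit": "true",        # Kafka tracks offsets itself
        "auto.commit.interval.ms": "5000",   # assumed commit interval
    }

config = kafka_consumer_config()
print(config["enable.auto.commit"])  # → true
```

With this config, the offset-commit errors mentioned in the bug should be ignorable, since the cluster commits offsets on its own.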

Thanks,
Cham

On Mon, Jul 20, 2020 at 2:40 PM ayush sharma <1705ay...@gmail.com> wrote:

> Is there any workaround to this issue?
>
> On Mon, Jul 20, 2020 at 5:33 PM ayush sharma <1705ay...@gmail.com> wrote:
>
>> Thank you for the suggestions. I tried using FlinkRunner, and
>> setting environment_type to either DOCKER or LOOPBACK gives an error -
>> java.lang.UnsupportedOperationException: The ActiveBundle does not have a
>> registered bundle checkpoint handler.
>>
>> I found that this issue has been reported (
>> https://issues.apache.org/jira/browse/BEAM-6868) and hence upvoting it.
>> Thank you for the prompt responses and looking forward to using this
>> feature in the future.
>>
>> Regards,
>> Ayush.
>>
>> On Sat, Jul 18, 2020 at 3:14 PM Chamikara Jayalath 
>> wrote:
>>
>>>
>>>
>>> On Fri, Jul 17, 2020 at 10:04 PM ayush sharma <1705ay...@gmail.com>
>>> wrote:
>>>
 Thank you guys for the reply. I am really stuck and could not proceed
 further.
 Yes, the previous trial published message had null key.
 But when I send key:value pair through producer using

 ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic
 mytopic --property *"parse.key=true" --property "key.separator=:"*
 > tryKey:tryValue

 I do not get any error but beam does not print the received message.
 Here is how my pipeline looks like,
 result = (
 pipeline

 | "Read from kafka" >> ReadFromKafka(
 consumer_config={
 "bootstrap.servers": 'localhost:9092',
 },
 topics=['mytopic'],
 expansion_service='localhost:8097',
 )

 | "print" >> beam.Map(print)
 )


>>> I suspect DirectRunner in LOOPBACK mode might not be working for
>>> cross-language transforms today. Please note that the cross-language
>>> transforms framework is fairly new [1] and we are adding support for
>>> various runners and environment configurations.
>>> Can you try with Flink in DOCKER mode ?
>>>
>>>
 If this is not the way we make beam and kafka communicate then please
 share a working example which showcases how a message published in kafka
 gets received by beam while streaming.

>>>
>>> I'm adding an example but I've only tested this with Dataflow so far. I
>>> hope to test that example for more runners and add additional instructions
>>> there.
>>> https://github.com/apache/beam/pull/12188
>>>
>>> Thanks,
>>> Cham
>>>
>>> [1] https://beam.apache.org/roadmap/connectors-multi-sdk/
>>>

 Regards,
 Ayush Sharma

 On Fri, Jul 17, 2020 at 11:39 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

> Yes, seems like this is due to the key being null. XLang KafkaIO has
> to be updated to support this. You should not run into this error if you
> publish keys and values that are not null.
>
>
>
>
> On Fri, Jul 17, 2020 at 8:04 PM Luke Cwik  wrote:
>
>> +dev 
>>
>> On Fri, Jul 17, 2020 at 8:03 PM Luke Cwik  wrote:
>>
>>> +Heejong Lee  +Chamikara Jayalath
>>> 
>>>
>>> Do you know if your trial record has an empty key or value?
>>> If so, then you hit a bug and it seems as though there was a miss
>>> supporting this usecase.
>>>
>>> Heejong and Cham,
>>> It looks like the Javadoc for ByteArrayDeserializer and other
>>> Deserializers can return null[1, 2] and we aren't using
>>> NullableCoder.of(ByteArrayCoder.of()) in the expansion[3]. Note that the
>>> non XLang KafkaIO does this correctly in its regular coder inference
>>> logic [4]. I filed BEAM-10529 [5].
>>>
>>> 1:
>>> https://kafka.apache.org/21/javadoc/org/apache/kafka/common/serialization/ByteArrayDeserializer.html#deserialize-java.lang.String-byte:A-
>>> 2:
>>> https://kafka.apache.org/21/javadoc/org/apache/kafka/common/serialization/StringDeserializer.html#deserialize-java.lang.String-byte:A-
>>> 3:
>>> https://github.com/apache/beam/blob/af2d6b0379d64b522ecb769d88e9e7e7b8900208/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L478
>>> 4:
>>> https://github.com/apache/beam/blob/af2d6b0379d64b522ecb769d88e9e7e7b8900208/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/LocalDeserializerProvider.java#L85
>>> 5: https://issues.apache.org/jira/browse/BEAM-10529
>>>
>>>
>>> On Fri, Jul 17, 2020 at 8:51 AM 

Re: Jenkins trigger phrase "run seed job" not working?

2020-07-22 Thread Ahmet Altay
+Damian Gadomski , it might be related to this
change: https://github.com/apache/beam/pull/12319.

/cc +Tyson Hamilton 

On Wed, Jul 22, 2020 at 1:17 PM Udi Meiri  wrote:

> HI,
> I'm trying to test a groovy change but I can't seem to trigger the seed
> job. It worked yesterday so I'm not sure what changed.
>
> https://github.com/apache/beam/pull/12326
>
>


Jenkins trigger phrase "run seed job" not working?

2020-07-22 Thread Udi Meiri
HI,
I'm trying to test a groovy change but I can't seem to trigger the seed
job. It worked yesterday so I'm not sure what changed.

https://github.com/apache/beam/pull/12326




Re: No space left on device - beam-jenkins 1 and 7

2020-07-22 Thread Robert Bradshaw
On Wed, Jul 22, 2020 at 11:57 AM Tyson Hamilton  wrote:

> Ah I see, thanks Kenn. I found some advice from the Apache infra wiki that
> also suggests using a tmpdir inside the workspace [1]:
>
> Procedures Projects can take to clean up disk space
>
> Projects can help themselves and Infra by taking some basic steps to help
> clean up their jobs after themselves on the build nodes.
>
>
>
>1. Use a ./tmp dir in your jobs workspace. That way it gets cleaned up
>when job workspaces expire.
>
>
Tests should be (able to be) written to use the standard temporary file
mechanisms, with the environment on Jenkins set up so that those files land in
the respective workspaces. Ideally this should be as simple as setting
the TMPDIR (or similar) environment variable (and making sure it exists/is
writable).
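
As a concrete illustration, Python's standard tempfile module already honors TMPDIR, so redirecting temp files into a job workspace needs no code changes. A sketch (the ./tmp location mirrors the Infra advice quoted above; the "workspace" here is just the current directory):

```python
import os
import tempfile

# Point TMPDIR at a ./tmp dir inside the (simulated) job workspace;
# the standard tempfile APIs pick it up automatically.
workspace_tmp = os.path.join(os.getcwd(), "tmp")
os.makedirs(workspace_tmp, exist_ok=True)
os.environ["TMPDIR"] = workspace_tmp
tempfile.tempdir = None  # force tempfile to re-read TMPDIR

with tempfile.NamedTemporaryFile() as f:
    # Files created via the standard APIs now land in the workspace.
    assert f.name.startswith(workspace_tmp)
```

When the workspace expires, Jenkins wipes ./tmp along with it, which is exactly the cleanup behavior discussed in this thread.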

>
>1. Configure your jobs to wipe workspaces on start or finish.
>2. Configure your jobs to only keep 5 or 10 previous builds.
>3. Configure your jobs to only keep 5 or 10 previous artifacts.
>
>
>
> [1]:
> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>
> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles  wrote:
>
>> Those file listings look like the result of using standard temp file APIs
>> but with TMPDIR set to /tmp.
>>
>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton 
>> wrote:
>>
>>> Jobs are hermetic as far as I can tell and use unique subdirectories
>>> inside of /tmp. Here is a quick look into two examples:
>>>
>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>>> head -n 20
>>> 1.6G  2020-07-21 02:25  .
>>> 242M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4
>>> 242M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT
>>> 242M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME
>>> 242M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB
>>> 242M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q
>>> 242M  2020-07-17 18:35  ./beam-pipeline-temp79qot2
>>> 236M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>>> 236M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>>> 236M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>>> 236M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>>> 236M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>>> 236M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmpvN7vWA
>>> 3.7M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmprlh_di
>>> 3.7M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>>> 3.7M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>>> 3.7M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>>> 3.7M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmptYF1v1
>>> 3.7M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmplfV0Rg
>>> 2.7M  2020-07-17 20:10  ./pip-install-q9l227ef
>>>
>>>
>>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>>> head -n 20
>>> 817M  2020-07-21 02:26  .
>>> 242M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM
>>> 242M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3
>>> 242M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq
>>> 236M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>>> 236M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>>> 236M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>>> 3.7M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>>> 3.7M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>>> 3.7M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>>> 2.0M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>>> 2.0M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>>> 2.0M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>>> 1.2M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmphgdj35
>>> 1.2M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>>> 1.2M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>>> 992K  2020-07-12 12:00  ./junit642086915811430564
>>> 988K  2020-07-12 12:00  ./junit642086915811430564/beam
>>> 984K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes
>>> 980K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes/0
>>>
>>>
>>>
>>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri  wrote:
>>>
 You're right, job workspaces should be hermetic.



 On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles 
 wrote:

> I'm probably late to this discussion and missing something, but why
> are we writing to /tmp at all? I would expect TMPDIR to point somewhere
> inside the job directory that will be wiped by Jenkins, and I would expect
> code to always create temp files via APIs that respect this. Is Jenkins 
> not
> cleaning up? Do we not have the ability to set this 

Re: No space left on device - beam-jenkins 1 and 7

2020-07-22 Thread Tyson Hamilton
Ah I see, thanks Kenn. I found some advice from the Apache infra wiki that
also suggests using a tmpdir inside the workspace [1]:

Procedures Projects can take to clean up disk space

Projects can help themselves and Infra by taking some basic steps to help
clean up their jobs after themselves on the build nodes.



   1. Use a ./tmp dir in your jobs workspace. That way it gets cleaned up
   when job workspaces expire.
   2. Configure your jobs to wipe workspaces on start or finish.
   3. Configure your jobs to only keep 5 or 10 previous builds.
   4. Configure your jobs to only keep 5 or 10 previous artifacts.



[1]:
https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes

On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles  wrote:

> Those file listings look like the result of using standard temp file APIs
> but with TMPDIR set to /tmp.
>
> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton  wrote:
>
>> Jobs are hermetic as far as I can tell and use unique subdirectories
>> inside of /tmp. Here is a quick look into two examples:
>>
>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>> head -n 20
>> 1.6G  2020-07-21 02:25  .
>> 242M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4
>> 242M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT
>> 242M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME
>> 242M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB
>> 242M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q
>> 242M  2020-07-17 18:35  ./beam-pipeline-temp79qot2
>> 236M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>> 236M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>> 236M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>> 236M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>> 236M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>> 236M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmpvN7vWA
>> 3.7M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmprlh_di
>> 3.7M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>> 3.7M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>> 3.7M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
>> 3.7M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmptYF1v1
>> 3.7M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmplfV0Rg
>> 2.7M  2020-07-17 20:10  ./pip-install-q9l227ef
>>
>>
>> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
>> head -n 20
>> 817M  2020-07-21 02:26  .
>> 242M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM
>> 242M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3
>> 242M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq
>> 236M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpstXoL0
>> 236M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpnnVn65
>> 236M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
>> 3.7M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
>> 3.7M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
>> 3.7M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
>> 2.0M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpoj3orz
>> 2.0M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmptng9sZ
>> 2.0M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpWp6njc
>> 1.2M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmphgdj35
>> 1.2M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
>> 1.2M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
>> 992K  2020-07-12 12:00  ./junit642086915811430564
>> 988K  2020-07-12 12:00  ./junit642086915811430564/beam
>> 984K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes
>> 980K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes/0
>>
>>
>>
>> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri  wrote:
>>
>>> You're right, job workspaces should be hermetic.
>>>
>>>
>>>
>>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles  wrote:
>>>
 I'm probably late to this discussion and missing something, but why are
 we writing to /tmp at all? I would expect TMPDIR to point somewhere inside
 the job directory that will be wiped by Jenkins, and I would expect code to
 always create temp files via APIs that respect this. Is Jenkins not
 cleaning up? Do we not have the ability to set this up? Do we have bugs in
 our code (that we could probably find by setting TMPDIR to somewhere
 not-/tmp and running the tests without write permission to /tmp, etc)

 Kenn

 On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay  wrote:

> Related to workspace directory growth, +Udi Meiri  filed
> a relevant issue previously (
> https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
> workspace directory after successful jobs. Alternatively, 

Re: Monitoring performance for releases

2020-07-22 Thread Robert Bradshaw
On Tue, Jul 21, 2020 at 9:58 AM Thomas Weise  wrote:

> It appears that there is coverage missing in the Grafana dashboards (it
> could also be that I just don't find it).
>
> For example:
> https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056
>
> The GBK and ParDo tests have a selection for {batch, streaming} and SDK.
> No coverage for streaming and python? There is also no runner option
> currently.
>
> We have seen repeated regressions with streaming, Python, Flink. The test
> has been contributed. It would be great if the results can be covered as
> part of release verification.
>

Even better would be if we could use these dashboards (plus alerting or
similar?) to find issues before release verification. It's much easier to
fix things earlier.


>
> Thomas
>
>
>
> On Tue, Jul 21, 2020 at 7:55 AM Kamil Wasilewski <
> kamil.wasilew...@polidea.com> wrote:
>
>> The prerequisite is that we have all the stats in one place. They seem
>>> to be scattered across http://metrics.beam.apache.org and
>>> https://apache-beam-testing.appspot.com.
>>>
>>> Would it be possible to consolidate the two, i.e. use the Grafana-based
>>> dashboard to load the legacy stats?
>>
>>
>> I'm pretty sure that all dashboards have been moved to
>> http://metrics.beam.apache.org. Let me know if I missed something during
>> the migration.
>>
>> I think we should turn off https://apache-beam-testing.appspot.com in
>> the near future. New Grafana-based dashboards have been working seamlessly
>> for some time now and there's no point in maintaining the older solution.
>> We'd also avoid ambiguity in where the stats should be looked for.
>>
>> Kamil
>>
>> On Tue, Jul 21, 2020 at 4:17 PM Maximilian Michels 
>> wrote:
>>
>>> > It doesn't support https. I had to add an exception to the HTTPS
>>> Everywhere extension for "metrics.beam.apache.org".
>>>
>>> *facepalm* Thanks Udi! It would always hang on me because I use HTTPS
>>> Everywhere.
>>>
>>> > To be explicit, I am supporting the idea of reviewing the release
>>> guide but not changing the release process for the already in-progress
>>> release.
>>>
>>> I consider the release guide immutable for the process of a release.
>>> Thus, a change to the release guide can only affect new upcoming
>>> releases, not an in-process release.
>>>
>>> > +1 and I think we can also evaluate whether flaky tests should be
>>> reviewed as release blockers or not. Some flaky tests would be hiding real
>>> issues our users could face.
>>>
>>> Flaky tests are also worth taking into account when releasing, but a
>>> little harder to find because they may just happen to pass while building
>>> the release. It is possible, though, if we strictly capture flaky tests
>>> via JIRA and mark them with the Fix Version for the release.
>>>
>>> > We keep accumulating dashboards and
>>> > tests that few people care about, so it is probably worth that we use
>>> > them or get a way to alert us of regressions during the release cycle
>>> > to catch this even before the RCs.
>>>
>>> +1 The release guide should be explicit about which performance test
>>> results to evaluate.
>>>
>>> The prerequisite is that we have all the stats in one place. They seem
>>> to be scattered across http://metrics.beam.apache.org and
>>> https://apache-beam-testing.appspot.com.
>>>
>>> Would it be possible to consolidate the two, i.e. use the Grafana-based
>>> dashboard to load the legacy stats?
>>>
>>> For the evaluation during the release process, I suggest to use a
>>> standardized set of performance tests for all runners, e.g.:
>>>
>>> - Nexmark
>>> - ParDo (Classic/Portable)
>>> - GroupByKey
>>> - IO
>>>
>>>
>>> -Max
>>>
>>> On 21.07.20 01:23, Ahmet Altay wrote:
>>> >
>>> > On Mon, Jul 20, 2020 at 3:07 PM Ismaël Mejía wrote:
>>> >
>>> > +1
>>> >
>>> > This is not in the release guide and we should probably re-evaluate
>>> > if this should be a release blocking reason.
>>> > Of course, exceptionally, a performance regression could be motivated
>>> > by a correctness fix or a worthwhile refactor, so we should consider this.
>>> >
>>> >
>>> > +1 and I think we can also evaluate whether flaky tests should be
>>> > reviewed as release blockers or not. Some flaky tests would be hiding
>>> > real issues our users could face.
>>> >
>>> > To be explicit, I am supporting the idea of reviewing the release
>>> guide
>>> > but not changing the release process for the already in-progress
>>> release.
>>> >
>>> >
>>> > We have been tracking and fixing performance regressions multiple
>>> > times found simply by checking the nexmark tests including on the
>>> > ongoing 2.23.0 release so value is there. Nexmark does not cover
>>> yet
>>> > python and portable runners so we are probably still missing many
>>> > issues and it is worth to work on this. In any case we should
>>> probably
>>> > decide what validations matter. We keep accumulating dashboards and
>>> > 

Re: Beam Jenkins Migration

2020-07-22 Thread Kenneth Knowles
Are Spark and Flink runners benchmarking against local clusters on the
Jenkins VMs? Needless to say that is not a very controlled environment (and
of course not realistic scale). That is probably why Dataflow was not
affected. Is it possible that simply the different version of the Jenkins
worker software and/or the instructions from the Cloudbees instance cause
differing load?

Kenn

On Tue, Jul 21, 2020 at 4:17 PM Valentyn Tymofieiev 
wrote:

> FYI it looks like the transition to new Jenkins CI is visible on Nexmark
> performance graphs[1][2]. Are new VM nodes less performant than old ones?
>
> [1] http://
> 104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1=1587597387737=1595373387737=batch=All=All
> [2]
> https://issues.apache.org/jira/browse/BEAM-10542?focusedCommentId=17162374=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17162374
>
> On Thu, Jun 18, 2020 at 3:32 PM Tyson Hamilton  wrote:
>
>> Currently no. We're already experiencing a backlog of builds so the
>> additional load would be a problem. I've opened two related issues that I
>> think need completion before allowing non-committers to trigger tests:
>>
>> Load sharing improvements:
>> https://issues.apache.org/jira/browse/BEAM-10281
>> Admin access (maybe not required but nice to have):
>> https://issues.apache.org/jira/browse/BEAM-10280
>>
>> I created https://issues.apache.org/jira/browse/BEAM-10282 to track
>> opening up triggering for non-committers.
>>
>> On Thu, Jun 18, 2020 at 3:30 PM Luke Cwik  wrote:
>>
>>> Was about to ask the same question, so can non-committers trigger the
>>> tests now?
>>>
>>> On Thu, Jun 18, 2020 at 11:54 AM Heejong Lee  wrote:
>>>
 This is awesome. Could non-committers also trigger the test now?

 On Wed, Jun 17, 2020 at 6:12 AM Damian Gadomski <
 damian.gadom...@polidea.com> wrote:

> Hello,
>
> Good news, we've just migrated to the new CI:
> https://ci-beam.apache.org. As from now beam projects at
> builds.apache.org are disabled.
>
> If you experience any issues with the new setup please let me know,
> either here or on ASF slack.
>
> Regards,
> Damian
>
> On Mon, Jun 15, 2020 at 10:40 PM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
>
>> Happy to see your positive response :)
>>
>> @Udi Meiri, Thanks for pointing that out. I've checked it and indeed
>> it needs some attention.
>>
>> There are two things here, based on my research:
>>
>>- data uploaded by performance and load tests by the jobs,
>>directly to the influx DB - that should be handled automatically as 
>> new
>>jobs will upload the same data in the same way
>>- data fetched using Jenkins API by the metrics tool
>>(syncjenkins.py) - here the situation is a bit more complex as the 
>> script
>>relies on the build number (it's used actually as a time reference and
>>primary key in the DB is created from it). To avoid refactoring of the
>>script and database migration to use timestamp instead of build 
>> number I've
>>just "fast-forwarded" the numbers on the new
>>https://ci-beam.apache.org to follow current numbering from the
>>old CI. Therefore simple replacement of the Jenkins URL in the metrics
>>scripts should do the trick to have continuous metrics data. I'll 
>> check
>>that tomorrow on my local grafana instance.
>>
>> Please let me know if there's anything that I missed.
>>
>> Regards,
>> Damian
>>
>> On Mon, Jun 15, 2020 at 8:05 PM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Great! Thank you for working on this and letting us know.
>>>
>>> On 12 Jun 2020, at 16:58, Damian Gadomski <
>>> damian.gadom...@polidea.com> wrote:
>>>
>>> Hello,
>>>
>>> During the last few days, I was preparing for the Beam Jenkins
>>> migration from builds.apache.org to ci-beam.apache.org. The new
>>> Jenkins Master will be dedicated only for Beam related jobs, all Beam
>>> Committers will have build configure access, and Beam PMC will have 
>>> Admin
>>> (GUI) Access.
>>>
>>> We (in cooperation with Infra) are almost ready for the migration
>>> itself and I want to share with you the details of our plan. We are
>>> planning to start the migration next week, most likely on Tuesday. I'll
>>> keep you updated on the progress. We do not expect any issues or outages
>>> of the CI services; everything should be more or less unnoticeable.
>>> unnoticeable.
>>> Just don't be surprised that the Jenkins URL will change to
>>> https://ci-beam.apache.org
>>>
>>> If you are curious, here are the steps that we are going to take:
>>>
>>> 1. Create 16 new CI nodes that will be connected to the new CI. We
>>> will then have simultaneously running two CI servers.
>>> 2. 

Re: Beam Jenkins Migration

2020-07-22 Thread Damian Gadomski
Hey, thanks for pointing that out. As I replied in the issue, the nodes
should have exactly the same configuration. They are all `n1-highmem-16 (16
vCPUs, 104 GB memory)` - exactly as on the old CI. They were also created
from the same disk images and the disk type is also the same (Standard
persistent disk, 500GB).

We increased the number of workers on the nodes (and therefore the number
of concurrent jobs running on them), but that's unrelated, as we did it a
few days after the migration. The performance graphs show an immediate effect.

Regards,
Damian

On Wed, Jul 22, 2020 at 1:17 AM Valentyn Tymofieiev 
wrote:

> FYI it looks like the transition to new Jenkins CI is visible on Nexmark
> performance graphs[1][2]. Are new VM nodes less performant than old ones?
>
> [1] http://
> 104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1=1587597387737=1595373387737=batch=All=All
> [2]
> https://issues.apache.org/jira/browse/BEAM-10542?focusedCommentId=17162374=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17162374
>
> On Thu, Jun 18, 2020 at 3:32 PM Tyson Hamilton  wrote:
>
>> Currently no. We're already experiencing a backlog of builds so the
>> additional load would be a problem. I've opened two related issues that I
>> think need completion before allowing non-committers to trigger tests:
>>
>> Load sharing improvements:
>> https://issues.apache.org/jira/browse/BEAM-10281
>> Admin access (maybe not required but nice to have):
>> https://issues.apache.org/jira/browse/BEAM-10280
>>
>> I created https://issues.apache.org/jira/browse/BEAM-10282 to track
>> opening up triggering for non-committers.
>>
>> On Thu, Jun 18, 2020 at 3:30 PM Luke Cwik  wrote:
>>
>>> Was about to ask the same question, so can non-committers trigger the
>>> tests now?
>>>
>>> On Thu, Jun 18, 2020 at 11:54 AM Heejong Lee  wrote:
>>>
 This is awesome. Could non-committers also trigger the test now?

 On Wed, Jun 17, 2020 at 6:12 AM Damian Gadomski <
 damian.gadom...@polidea.com> wrote:

> Hello,
>
> Good news, we've just migrated to the new CI:
> https://ci-beam.apache.org. As from now beam projects at
> builds.apache.org are disabled.
>
> If you experience any issues with the new setup please let me know,
> either here or on ASF slack.
>
> Regards,
> Damian
>
> On Mon, Jun 15, 2020 at 10:40 PM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
>
>> Happy to see your positive response :)
>>
>> @Udi Meiri, Thanks for pointing that out. I've checked it and indeed
>> it needs some attention.
>>
>> There are two things, based on my research:
>>
>>- data uploaded directly to the influx DB by the performance and load
>>test jobs - that should be handled automatically, as the new jobs will
>>upload the same data in the same way
>>- data fetched using the Jenkins API by the metrics tool
>>(syncjenkins.py) - here the situation is a bit more complex, as the
>>script relies on the build number (it's actually used as a time
>>reference, and the primary key in the DB is created from it). To avoid
>>refactoring the script and migrating the database to use a timestamp
>>instead of the build number, I've just "fast-forwarded" the numbers on
>>the new https://ci-beam.apache.org to follow the current numbering
>>from the old CI. Therefore, simply replacing the Jenkins URL in the
>>metrics scripts should do the trick to have continuous metrics data.
>>I'll check that tomorrow on my local grafana instance.
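The build-number dependency described above can be sketched roughly as follows. This is a hypothetical simplification, not the actual code of the Beam metrics tool (syncjenkins.py); the function names are illustrative:

```python
# Rough sketch of why the metrics sync depends on build numbers.
# In a syncjenkins.py-style tool the DB primary key is derived from the
# Jenkins build number, so the new master must continue the old master's
# numbering ("fast-forwarding") for keys to stay unique and monotonic.

def primary_key(job_name: str, build_number: int) -> str:
    """Derive the DB primary key from the job name and build number."""
    return f"{job_name}#{build_number}"

def fast_forward_start(last_old_ci_build: int) -> int:
    """First build number on the new CI so numbering stays continuous."""
    return last_old_ci_build + 1

old_ci_keys = [primary_key("beam_PostCommit_Java", n) for n in (4711, 4712)]
new_ci_keys = [primary_key("beam_PostCommit_Java", fast_forward_start(4712))]
# No collisions between keys synced from the old and the new CI:
assert not set(old_ci_keys) & set(new_ci_keys)
```

With continuous numbering, swapping the Jenkins base URL in the metrics scripts is indeed the only change needed for unbroken metrics data.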
>>
>> Please let me know if there's anything that I missed.
>>
>> Regards,
>> Damian
>>
>> On Mon, Jun 15, 2020 at 8:05 PM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Great! Thank you for working on this and letting us know.
>>>
>>> On 12 Jun 2020, at 16:58, Damian Gadomski <
>>> damian.gadom...@polidea.com> wrote:
>>>
>>> Hello,
>>>
>>> During the last few days, I was preparing for the Beam Jenkins
>>> migration from builds.apache.org to ci-beam.apache.org. The new
>>> Jenkins Master will be dedicated only for Beam related jobs, all Beam
>>> Committers will have build configure access, and Beam PMC will have 
>>> Admin
>>> (GUI) Access.
>>>
>>> We (in cooperation with Infra) are almost ready for the migration
>>> itself and I want to share with you the details of our plan. We are
>>> planning to start the migration next week, most likely on Tuesday. I'll
>>> keep you updated on the progress. We do not expect any issues or any
>>> outage of the CI services; everything should be more or less
>>> unnoticeable.
>>> Just don't be surprised that the Jenkins URL will change to
>>> https://ci-beam.apache.org
>>>
>>> If you are curious, here are the steps that we are going 

Re: No space left on device - beam-jenkins 1 and 7

2020-07-22 Thread Kenneth Knowles
Those file listings look like the result of using standard temp file APIs
but with TMPDIR set to /tmp.
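Kenn's observation can be verified quickly with Python's standard tempfile API, which honors TMPDIR; a minimal sketch (the workspace path here is illustrative):

```python
import os
import tempfile

# Standard temp-file APIs honor $TMPDIR: point it inside the job
# workspace and new temp dirs land there instead of the shared /tmp.
workspace_tmp = os.path.join(os.getcwd(), "jenkins-ws-tmp")  # illustrative
os.makedirs(workspace_tmp, exist_ok=True)

os.environ["TMPDIR"] = workspace_tmp
tempfile.tempdir = None  # drop the cached default so TMPDIR is re-read

scratch = tempfile.mkdtemp(prefix="beam-pipeline-temp")
assert scratch.startswith(workspace_tmp)  # no longer writes to /tmp
```

A directory created this way is wiped along with the workspace; anything still landing in /tmp after such a setting would point to code bypassing the standard temp-file APIs.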

On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton  wrote:

> Jobs are hermetic as far as I can tell and use unique subdirectories
> inside of /tmp. Here is a quick look into two examples:
>
> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
> head -n 20
> 1.6G  2020-07-21 02:25  .
> 242M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4
> 242M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT
> 242M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME
> 242M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB
> 242M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q
> 242M  2020-07-17 18:35  ./beam-pipeline-temp79qot2
> 236M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
> 236M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
> 236M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
> 236M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
> 236M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmpWy1vWX
> 236M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmpvN7vWA
> 3.7M  2020-07-17 18:48  ./beam-pipeline-temp3ybuY4/tmprlh_di
> 3.7M  2020-07-17 18:46  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
> 3.7M  2020-07-17 18:44  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
> 3.7M  2020-07-17 18:42  ./beam-pipeline-tempJ4EpyB/tmpLTb6Mj
> 3.7M  2020-07-17 18:39  ./beam-pipeline-tempepea7Q/tmptYF1v1
> 3.7M  2020-07-17 18:35  ./beam-pipeline-temp79qot2/tmplfV0Rg
> 2.7M  2020-07-17 20:10  ./pip-install-q9l227ef
>
>
> @apache-ci-beam-jenkins-11:/tmp$ sudo du -ah --time . | sort -rhk 1,1 |
> head -n 20
> 817M  2020-07-21 02:26  .
> 242M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM
> 242M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3
> 242M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq
> 236M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpstXoL0
> 236M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpnnVn65
> 236M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpRF0iNs
> 3.7M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpbJjUAQ
> 3.7M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmpsmmzqe
> 3.7M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmp5b3ZvY
> 2.0M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmpoj3orz
> 2.0M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmptng9sZ
> 2.0M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpWp6njc
> 1.2M  2020-07-19 12:14  ./beam-pipeline-tempUTXqlM/tmphgdj35
> 1.2M  2020-07-19 12:11  ./beam-pipeline-tempx3Yno3/tmp8ySXpm
> 1.2M  2020-07-19 12:05  ./beam-pipeline-tempyCrMYq/tmpNVEJ4e
> 992K  2020-07-12 12:00  ./junit642086915811430564
> 988K  2020-07-12 12:00  ./junit642086915811430564/beam
> 984K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes
> 980K  2020-07-12 12:00  ./junit642086915811430564/beam/nodes/0
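The `du ... | sort | head` inspection shown above can be approximated with a small script, e.g. to flag leftover pipeline temp directories by age (a sketch; the prefix and age threshold are assumptions, not project policy):

```python
import os
import time

def stale_temp_dirs(root, max_age_hours=24.0, prefix="beam-pipeline-temp"):
    """List leftover pipeline temp dirs under root older than the threshold."""
    cutoff = time.time() - max_age_hours * 3600
    stale = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        # Only directories matching the pipeline temp naming pattern count.
        if name.startswith(prefix) and os.path.isdir(path):
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

Running `stale_temp_dirs("/tmp")` on a node like the ones above would list the multi-hundred-MB `beam-pipeline-temp*` directories that outlived their jobs.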
>
>
>
> On Mon, Jul 20, 2020 at 6:46 PM Udi Meiri  wrote:
>
>> You're right, job workspaces should be hermetic.
>>
>>
>>
>> On Mon, Jul 20, 2020 at 1:24 PM Kenneth Knowles  wrote:
>>
>>> I'm probably late to this discussion and missing something, but why are
>>> we writing to /tmp at all? I would expect TMPDIR to point somewhere inside
>>> the job directory that will be wiped by Jenkins, and I would expect code to
>>> always create temp files via APIs that respect this. Is Jenkins not
>>> cleaning up? Do we not have the ability to set this up? Do we have bugs in
>>> our code (that we could probably find by setting TMPDIR to somewhere
>>> not-/tmp and running the tests without write permission to /tmp, etc)
>>>
>>> Kenn
>>>
>>> On Mon, Jul 20, 2020 at 11:39 AM Ahmet Altay  wrote:
>>>
 Related to workspace directory growth, +Udi Meiri  filed
 a relevant issue previously (
 https://issues.apache.org/jira/browse/BEAM-9865) for cleaning up
 workspace directory after successful jobs. Alternatively, we can consider
 periodically cleaning up the /src directories.

 I would suggest moving the cron task from internal cron scripts to the
 inventory job (
 https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L51).
 That way, we can see all the cron jobs as part of the source tree, adjust
 frequencies and clean up codes with PRs. I do not know how internal cron
 scripts are created, maintained, and how would they be recreated for new
 worker instances.

 /cc +Tyson Hamilton 

 On Mon, Jul 20, 2020 at 4:50 AM Damian Gadomski <
 damian.gadom...@polidea.com> wrote:

> Hey,
>
> I've recently created a solution for the growing /tmp directory. Part
> of it is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
> intentionally not 

Re: Errorprone plugin fails for release branches <2.22.0

2020-07-22 Thread Alexey Romanenko
We do, and it includes SparkJobServerDriver. I guess the source was needed
before just to build a JobServer Docker image for SparkRunner but, starting
from 2.20, we publish them as part of the release process [1]. So we just
need to update the documentation accordingly [2].

[1] https://issues.apache.org/jira/browse/BEAM-9022
[2] https://issues.apache.org/jira/browse/BEAM-9857

> On 22 Jul 2020, at 13:12, Maximilian Michels  wrote:
> 
>> On the SparkRunner page, we advise users to download Beam sources and build 
>> JobService. So I think it would be better just to add a note there about 
>> this issue with old branches. 
> 
> Why is that? Don't we publish the Spark job server jar?
> 
> -Max
> 
> On 21.07.20 18:20, Alexey Romanenko wrote:
>> On the SparkRunner page, we advise users to download Beam sources and build 
>> JobService. So I think it would be better just to add a note there about 
>> this issue with old branches.
>>> On 20 Jul 2020, at 22:29, Kenneth Knowles wrote:
>>> 
>>> I think it is fine to fix it in branches. I do not see too much value in 
>>> fixing it except in branches you know you are going to use.
>>> 
>>> The "Downloads" page is for users and only mentioned the voted source 
>>> releases, maven central, and pypi. There is nothing to do with GitHub or 
>>> ongoing branches there. I don't think un-published cherrypicks to branches 
>>> matter to users. Did you mean some other place?
>>> 
>>> Kenn
>>> 
>>> On Mon, Jul 20, 2020 at 9:44 AM Alexey Romanenko wrote:
>>> 
>>>Then, would it be ok to fix it in branches (the question is how
>>>many branches we should fix?) with additional commit and mention
>>>that on “Downloads" page?
>>> 
On 8 Jul 2020, at 21:24, Kenneth Knowles wrote:
 
 
 
On Wed, Jul 8, 2020 at 12:07 PM Kyle Weaver wrote:
 
> To fix on previous release branches, we would need to make
a new release, is it not? Since hashes would change..
 
Would it be alright to patch the release branches on Github
and leave the released source as-is? Github release branches
themselves aren't release artifacts, so I think it should be
okay to patch them without making a new release.
 
 
Yea. There are tags for the exact hashes that RCs were built
from. The release branch is fine to get new commits, and then if
anyone wants to build a patch release they will get those commits.
 
Kenn
 
On Wed, Jul 8, 2020 at 11:59 AM Pablo Estrada wrote:
 
Ah that's annoying that a dependency would be removed
from maven. I thought that was not meant to happen? This
must be an issue happening for many other projects...
Why is errorprone a dependency anyway?
 
To fix on previous release branches, we would need to
make a new release, is it not? Since hashes would change..
 
On Wed, Jul 8, 2020 at 10:21 AM Alexey Romanenko wrote:
 
Hi Max,
 
I’m +1 for back porting as well but that seems quite
complicated since we distribute release source code
from https://archive.apache.org/ 
 
Perhaps, we should just warn users about this issue
and how to workaround it.
 
Any other ideas?
 
> On 8 Jul 2020, at 11:46, Maximilian Michels wrote:
>
> Hi Alexey,
>
> I also came across this issue when building a
custom Beam version. I applied the same fix
(https://github.com/apache/beam/pull/11527) which you
have mentioned.
>
> It appears that the Maven dependencies changed or
are no longer available which causes the missing
class files.
>
> +1 for backporting the fix to the release branches.
>
> Cheers,
> Max
 

Re: Errorprone plugin fails for release branches <2.22.0

2020-07-22 Thread Maximilian Michels
On the SparkRunner page, we advise users to download Beam sources and build JobService. So I think it would be better just to add a note there about this issue with old branches. 


Why is that? Don't we publish the Spark job server jar?

-Max

On 21.07.20 18:20, Alexey Romanenko wrote:
On the SparkRunner page, we advise users to download Beam sources and 
build JobService. So I think it would be better just to add a note there 
about this issue with old branches.


On 20 Jul 2020, at 22:29, Kenneth Knowles wrote:


I think it is fine to fix it in branches. I do not see too much value 
in fixing it except in branches you know you are going to use.


The "Downloads" page is for users and only mentioned the voted source 
releases, maven central, and pypi. There is nothing to do with GitHub 
or ongoing branches there. I don't think un-published cherrypicks to 
branches matter to users. Did you mean some other place?


Kenn

On Mon, Jul 20, 2020 at 9:44 AM Alexey Romanenko wrote:


Then, would it be ok to fix it in branches (the question is how
many branches we should fix?) with additional commit and mention
that on “Downloads" page?


On 8 Jul 2020, at 21:24, Kenneth Knowles wrote:



On Wed, Jul 8, 2020 at 12:07 PM Kyle Weaver wrote:

> To fix on previous release branches, we would need to make
a new release, is it not? Since hashes would change..

Would it be alright to patch the release branches on Github
and leave the released source as-is? Github release branches
themselves aren't release artifacts, so I think it should be
okay to patch them without making a new release.


Yea. There are tags for the exact hashes that RCs were built
from. The release branch is fine to get new commits, and then if
anyone wants to build a patch release they will get those commits.

Kenn

On Wed, Jul 8, 2020 at 11:59 AM Pablo Estrada wrote:

Ah that's annoying that a dependency would be removed
from maven. I thought that was not meant to happen? This
must be an issue happening for many other projects...
Why is errorprone a dependency anyway?

To fix on previous release branches, we would need to
make a new release, is it not? Since hashes would change..

On Wed, Jul 8, 2020 at 10:21 AM Alexey Romanenko wrote:

Hi Max,

I’m +1 for back porting as well but that seems quite
complicated since we distribute release source code
from https://archive.apache.org/
Perhaps, we should just warn users about this issue
and how to workaround it.

Any other ideas?

> On 8 Jul 2020, at 11:46, Maximilian Michels wrote:
>
> Hi Alexey,
>
> I also came across this issue when building a
custom Beam version. I applied the same fix
(https://github.com/apache/beam/pull/11527) which you
have mentioned.
>
> It appears that the Maven dependencies changed or
are no longer available which causes the missing
class files.
>
> +1 for backporting the fix to the release branches.
>
> Cheers,
> Max
>
> On 08.07.20 11:36, Alexey Romanenko wrote:
>> Hello,
>> Some days ago I noticed that I can’t build the
project from old release branches . For example, I
wanted to build and run Spark Job Server from
“release-2.20.0” branch and it failed:
>> ./gradlew :runners:spark:job-server:runShadow
—stacktrace
>> * Exception is:
>> org.gradle.api.tasks.TaskExecutionException:
Execution failed for task ':model:pipeline:compileJava’.
>> …
>> Caused by: org.gradle.internal.UncheckedException:
java.lang.ClassNotFoundException:
com.google.errorprone.ErrorProneCompiler$Builder
>> …
>> I experienced the same issue for “release-2.19.0”
and “release-2.21.0” branches. I didn’t check older
branches, but it seems to be a global issue for
“net.ltgt.gradle:gradle-errorprone-plugin:0.0.13".
>> This is already known issue and it was fixed for
2.22.0 [1] a while ago. By applying a fix from [2] on
top of previous branch, for example,