Re: [DISCUSS] Migrate Jira to GitHub Issues?

2021-12-07 Thread Jarek Potiuk
> Do I understand correctly that this transition (if it will happen) includes 
> the transfer of all Beam Jira archive to GitHub issues with a proper 
> statuses/comments/refs/etc? If not, what are the options?

Suggestion from the experience of Airflow again - you can look it up
in our notes.

We initially tried copying the issues manually or in bulk, but
eventually we decided to tap into the wisdom and cooperation of our
community.

We migrated some (not many) important things only and asked our users
to move the important issues if they think they are still
relevant/important to them. We closed the JIRA for entry and left the
issues in JIRA in read-only state so that we could always refer to
them if needed.

So rather than proactively copying the issues, we asked the users to
decide which issues are important to them and to move those
themselves, and we left the option of moving an issue reactively if
someone came back to it later.

That turned out to be a smart decision considering the effort it would
take to move the issues properly vs. the results achieved. It also
helped us to clean out some "stale/useless/not important" issues.

We had 1719 open JIRA issues when we migrated. Over the course of
~1.5 years (since about April 2020) we've had ~140 issues that refer
to any of the JIRA issues:
https://github.com/apache/airflow/issues?q=is%3Aissue+is%3Aclosed+%22https%3A%2F%2Fissues.apache.org%2Fjira%22+
Currently we have > 4500 GH issues (3700 closed, 800 open).

This means that, roughly speaking, < 10% of the original open JIRA
issues were actually somewhat valuable, and they are < 5% of today's
numbers. Of course some of the new GH issues duplicated those JIRA
ones. But not many I think, especially since those issues in JIRA
referred mostly to older Airflow versions.
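
For reference, the two percentages follow directly from the rounded
figures quoted in this message. A quick sketch of the arithmetic (the
variable names are only illustrative, and the counts are the
approximate ones given above, not exact numbers):

```python
# Approximate figures quoted above (not exact counts).
open_jira_at_migration = 1719      # open JIRA issues at migration time
gh_issues_referencing_jira = 140   # GH issues that mention a JIRA issue URL
total_gh_issues = 4500             # current GH issues (open + closed), lower bound

share_of_old_jira = gh_issues_referencing_jira / open_jira_at_migration
share_of_gh_today = gh_issues_referencing_jira / total_gh_issues

print(f"share of original open JIRA issues: {share_of_old_jira:.1%}")
print(f"share of today's GH issues:         {share_of_gh_today:.1%}")
```

That works out to roughly 8% and 3%, consistent with the "< 10%" and
"< 5%" estimates.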

One more comment on the migration - I STRONGLY recommend using
well-designed templates for GH issues from day one, which significantly
improves the quality of issues, and using Discussions as the place
where you move unclear/not reproducible issues (for example by
guiding users to use Discussions if they have no clearly reproducible
case). This significantly reduces the "bad issue overload" (see also
the more detailed comments in
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191332632).

I personally think a well-designed issue entry process for new issues
is more important than migrating old issues in bulk, especially if you
ask users to help: they will either have to make a structured entry
(with potentially more detailed information/reproducibility), or they
will decide themselves that opening a GitHub discussion is better than
opening an issue if they do not have a reproducible case. Or they will
give up if too much information is needed (but this means that their
issue is essentially not that important IMHO).

But this is just friendly advice from the experience of those who did
it quite some time ago :)

J.

On Wed, Dec 8, 2021 at 1:08 AM Brian Hulette wrote:
>
> At this point I just wanted to see if the community is interested in such a 
> change or if there are any hard blockers. If we do go down this path I think 
> we should port jiras over to GH Issues. You're right this isn't trivial, 
> there's no ready-made solution we can use, we'd need to decide on a mapping 
> for everything and write a tool to do the migration. It sounds like there may 
> be other work in this area we can build on (e.g. Airflow may have made a tool 
> we can work from?).
>
> I honestly don't have much experience with GH Issues so I can't provide 
> concrete examples of better usability (maybe Jarek can?). From my perspective:
> - I hear a lot of grumbling about jira, and a lot of praise for GitHub Issues.
> - Most new users/contributors already have a GitHub account, and very few 
> already have an ASF account. It sounds silly, but I'm sure this is a barrier 
> for engaging with the community. Filing an issue, or commenting on one to 
> provide additional context, or asking a clarifying question about a starter 
> task should be very quick and easy - I bet a lot of these interactions are 
> blocked at the jira registration page.
>
> Brian
>
> On Tue, Dec 7, 2021 at 9:04 AM Alexey Romanenko wrote:
>>
>> Do I understand correctly that this transition (if it will happen) includes 
>> the transfer of all Beam Jira archive to GitHub issues with a proper 
>> statuses/comments/refs/etc? If not, what are the options?
>>
>> Since this transfer looks quite complicated at the first glance, what are 
>> the real key advantages (some concrete examples are very appreciated) to 
>> initiate this process and what are the show-stoppers for us with a current 
>> Jira workflow?
>>
>> —
>> Alexey
>>
>> On 6 Dec 2021, at 19:48, Udi Meiri wrote:
>>
>> +1 on migrating to GH issues.
>> We will need to update our release process. Hopefully it'll make it simpler.
>>
>>
>> On Sat, Dec 4, 2021 at 2:35 AM Jarek Potiuk wrote:
>>>

Help configuring and running jenkins tests on JDK 17

2021-12-07 Thread Fernando Morales Martinez
Hi everyone!
As part of this JIRA task, a few weeks ago I added new
Jenkins tests and configurations to run such tests against the Java v17 SDK.

I'm currently facing two issues in making the whole change work:

1.- Even after running the seed job, most of the new tests are not being
picked up. The status shown is: This project is currently disabled. Does
anyone know how to enable the tests? Weird thing is, another set of tests
were picked up. Some failed, some were successful.

2.- As for the failing ones, the error I'm getting has to do with the path
of the Java 17 SDK, which I set up as */usr/lib/jvm/java-17-openjdk-amd64*,
basically the same convention as SDK 8 (*/usr/lib/jvm/java-8-openjdk-amd64*)
and 11 (*/usr/lib/jvm/java-11-openjdk-amd64*). I don't see any other
configuration change as part of the efforts made when setting up the Java
11 SDK harness.
Does anyone know what could be causing these errors?

Thanks!
-Fer

-- 
*This email and its contents (including any attachments) are being sent to
you on the condition of confidentiality and may be protected by legal
privilege. Access to this email by anyone other than the intended recipient
is unauthorized. If you are not the intended recipient, please immediately
notify the sender by replying to this message and delete the material
immediately from your system. Any further use, dissemination, distribution
or reproduction of this email is strictly prohibited. Further, no
representation is made with respect to any content contained in this email.*


GCS staging location when uploading artifacts from python for dataflow

2021-12-07 Thread Steve Niemitz
I noticed that the python dataflow runner appends some "uniqueness" (the
timestamp) [1] to the staging directory when staging artifacts for a
dataflow job.  This is very suboptimal because it makes caching artifacts
between job runs useless.

The JVM runner doesn't do this. Is there a good reason the Python one
does, or is this just an oversight that hasn't been fixed yet?

[1]
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L467
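
To illustrate the caching concern, here is a minimal sketch. It is not
the actual apiclient.py code: `timestamped_staging_path` and
`content_addressed_staging_path` are hypothetical helpers, and the
bucket name is made up. With a per-run timestamp in the path, the same
artifact lands at a different GCS location on every submission, whereas
a path derived only from the artifact's content would be stable across
runs:

```python
import hashlib
import posixpath


def timestamped_staging_path(staging_root: str, job_name: str,
                             run_timestamp: float, artifact_name: str) -> str:
    """Path that embeds a per-run timestamp (the behaviour described above)."""
    run_dir = f"{job_name}.{run_timestamp:.6f}"
    return posixpath.join(staging_root, run_dir, artifact_name)


def content_addressed_staging_path(staging_root: str, artifact_bytes: bytes,
                                   artifact_name: str) -> str:
    """Path that depends only on the artifact content, so it is reusable."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return posixpath.join(staging_root, digest, artifact_name)


root = "gs://example-bucket/staging"  # hypothetical bucket
payload = b"same wheel contents on every run"

# Two submissions of the same job, one second apart.
first = timestamped_staging_path(root, "myjob", 1638900000.0, "deps.whl")
second = timestamped_staging_path(root, "myjob", 1638900001.0, "deps.whl")
print(first != second)  # a new location each run, so nothing can be reused

stable_a = content_addressed_staging_path(root, payload, "deps.whl")
stable_b = content_addressed_staging_path(root, payload, "deps.whl")
print(stable_a == stable_b)  # same location, so an existing upload can be skipped
```

Content addressing is only one possible alternative; simply dropping
the timestamp would also make the location stable, at the cost of
possible collisions between unrelated jobs.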


Re: [DISCUSS] Migrate Jira to GitHub Issues?

2021-12-07 Thread Alexey Romanenko
Do I understand correctly that this transition (if it will happen) includes the 
transfer of all Beam Jira archive to GitHub issues with a proper 
statuses/comments/refs/etc? If not, what are the options?

Since this transfer looks quite complicated at first glance, what are the 
real key advantages (some concrete examples would be much appreciated) of 
initiating this process, and what are the show-stoppers for us with the 
current Jira workflow?

—
Alexey

> On 6 Dec 2021, at 19:48, Udi Meiri wrote:
> 
> +1 on migrating to GH issues.
> We will need to update our release process. Hopefully it'll make it simpler.
> 
> 
> On Sat, Dec 4, 2021 at 2:35 AM Jarek Potiuk wrote:
> Just to add a comment on those requirements Kenneth, looking into the
> near future.
> 
> Soon GitHub issues will open for GA a whole new way of interacting
> with the issues (without removing the current way) which will greatly
> improve (I think) all aspects of what you mentioned. The issues (and
> associated projects) will gain new capabilities:
> 
> * structured metadata that you will be able to define (much better
> than unstructured labels)
> * table-like visualisations which will allow for fast, bulk,
> keyboard-driven management
> * better automation of workflows
> * complete APIs to manage the issues (good for GitHub Actions
> integration for example)
> 
> Re: assigning by non-committers is one of the things that won't work
> currently. Only committers can assign the issues, and only if a user
> commented on the issue. But it works nicely - when a user comments "I
> want to work on that issue", a committer assigns the user. And it
> could be easily automated as well.
> 
> You can see what it is about here: https://github.com/features/issues
> 
> 
> They are currently in "Public Beta" and heading towards General
> Availability, but it is not available to "open" projects yet. However
> I have a promise from the GitHub Product manager (my friend heads the
> team implementing it) that ASF will be the first on the list when
> public projects are enabled, because it looks like it will make
> our triaging and organisation much better.
> 
> J.
> 
> On Sat, Dec 4, 2021 at 1:46 AM Kenneth Knowles wrote:
> >
> > This sounds really good to me. Much more familiar to newcomers. I think we 
> > end up doing a lot more ad hoc stuff with labels, yes? Probably worth 
> > having a specific plan. Things I care about:
> >
> > - priorities with documented meaning
> > - targeting issues to future releases
> > - basic visualizations (mainly total vs open issues over time)
> > - tags / components
> > - editing/assigning by non-committers
> > - workflow supporting "needs triage" (default) -> open -> resolved
> >
> > I think a lot of the above is done via ad hoc labels but I'm not sure if 
> > there are other fancy ways to do it.
> >
> > Anyhow we should switch even if there is a feature gap for the sake of 
> > community.
> >
> > Kenn
> >
> > On Fri, Dec 3, 2021 at 3:06 PM David Huntsperger wrote:
> >>
> >> Yes, please. I can help clean up the website issues as part of a migration.
> >>
> >> On Fri, Dec 3, 2021 at 1:46 PM Robert Burke wrote:
> >>>
> >>> A similar thing happened when Go migrated to GH issues for everything 
> >>> from language feature proposals to bugs. It is much easier than the very 
> >>> Gerrit-driven process it was before, and user discussions are far more 
> >>> discoverable by users: they usually already have a GH account, and don't 
> >>> need to create a new separate one.
> >>>
> >>> GitHub does seem to permit user directed templates for issues so we can 
> >>> simplify issue triage by users: Eg for Go there are a number of requests 
> >>> one can make: https://github.com/golang/go/issues/new/choose 
> >>> 
> >>>
> >>> On Fri, Dec 3, 2021, 12:17 PM Andy Ye wrote:
> 
>  Chiming in from the perspective of a new Beam contributor. +1 on GitHub 
>  issues. I feel like it would be easier to learn about and contribute to 
>  existing issues/bugs if they were tracked in the same place as the 
>  source code, rather than bouncing back and forth between two 
>  different sites.
> 
>  On Fri, Dec 3, 2021 at 1:18 PM Jarek Potiuk wrote:
> >
> > Comment from a friendly outsider.
> >
> > TL;DR: Yes. Do migrate. Highly recommended.
> >
> > There were already similar discussions happening recently (community
> > and infra mailing lists) and as a result I captured Airflow's
> > experiences and recommendations in the BUILD wiki. You might find some
> > hints and suggestions to follow as well as our experiences at Airflow:
> > 

Default output timestamp of processing-time timers

2021-12-07 Thread Steve Niemitz
If I have a processing-time timer, is there any way to automatically set
the output timestamp to the timer firing timestamp (similar to how
event-time timers work)?

A common use case would be to do something like:
timer.offset(X).align(Y).setRelative()

but have the output timestamp be the firing timestamp.  In order to do this
now you need to re-calculate the output timestamp (using the same logic as
the timer does internally) and manually use withOutputTimestamp.
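
A rough sketch of the re-calculation described above, written in Python
with plain millisecond integers for brevity. `firing_timestamp_ms` is a
hypothetical helper, and it assumes that align rounds the offset target
up to the next period boundary; the timer's internal logic may differ in
detail:

```python
def firing_timestamp_ms(now_ms: int, offset_ms: int, period_ms: int) -> int:
    """Target time for a relative timer: add the offset, then align upwards.

    Assumption: alignment moves the target to the next multiple of
    period_ms, leaving it unchanged if it already sits on a boundary.
    """
    target = now_ms + offset_ms
    if period_ms <= 0:
        return target  # no alignment requested
    remainder = target % period_ms
    return target if remainder == 0 else target + period_ms - remainder


# Example: now = 12:00:07.250, offset = 10 s, align to 1 minute boundaries.
now = (12 * 3600 + 7) * 1000 + 250
output_ts = firing_timestamp_ms(now, offset_ms=10_000, period_ms=60_000)
print(output_ts)  # the value one would then pass to withOutputTimestamp
```

In this example the target lands on 12:01:00. The point is that this
duplicates logic the timer already performs when it is set.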

I'm not sure what the API would look like here, but it would also be nice
to allow event-time timers to do the same in reverse (use the element
input timestamp rather than the firing timestamp).  Maybe something like
`withDefaultOutputTimestampFrom(...)` and an enum of FIRING_TIMESTAMP,
ELEMENT_TIMESTAMP?


Re: [VOTE] Release 2.35.0, release candidate #2

2021-12-07 Thread Alexey Romanenko
+1 (binding). Tested with "beam-samples" (Java SDK, Spark Runner), but it 
seems we need to do another RC because of Reuven’s issue.

—
Alexey

> On 7 Dec 2021, at 05:58, Reuven Lax wrote:
> 
> The fix is in https://github.com/apache/beam/pull/16146. Unfortunately it is 
> slightly more involved than simply marking a variable transient. (I looked 
> to see if we could roll back the original commit that caused the regression, 
> but it looks like that might be even more complicated at this point.)
> 
> On Mon, Dec 6, 2021 at 7:54 PM Reuven Lax wrote:
> -1
> 
> Unfortunately the regression in BigQueryIO wasn't fixed - one case was missed 
> due to an unsynced branch.
> 
> On Mon, Dec 6, 2021 at 10:57 AM Chamikara Jayalath wrote:
> +1 (binding).
> Validated some Java and multi-language scenarios and updated the spreadsheet.
> 
> Thanks,
> Cham
> 
> On Fri, Dec 3, 2021 at 7:35 PM Ahmet Altay wrote:
> +1 (binding) - I validated a few of the python quickstarts. 
> 
> Thank you Valentyn!
> 
> On Thu, Dec 2, 2021 at 10:47 PM Valentyn Tymofieiev wrote:
> Hi everyone,
> 
> Please review and vote on the release candidate #2 for the version 2.35.0, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> Reviewers are encouraged to test their own use cases with the release 
> candidate, and vote +1 if no issues are found.
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2], 
> which is signed with the key with fingerprint 
> A182A23D3AA31B4874BE23FAE0256093E9D4DB9A [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.35.0-RC2" [5],
> * website pull request listing the release [6], the blog post [6], and 
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle 6.9.1 and Oracle JDK 1.8.0_201.
> * Python artifacts are deployed along with the source release to 
> dist.apache.org [2] and PyPI [8].
> * Validation sheet with a tab for 2.35.0 release to help with validation [9].
> * Docker images published to Docker Hub [10].
> 
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
> 
> For guidelines on how to try the release in your projects, check out our blog 
> post at https://beam.apache.org/blog/validate-beam-release/
> 
> Thanks,
> Valentyn
> 
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12350406
> [2] https://dist.apache.org/repos/dist/dev/beam/2.35.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1240/
> [5] https://github.com/apache/beam/tree/v2.35.0-RC2
> [6] https://github.com/apache/beam/pull/16115
> [7] https://github.com/apache/beam-site/pull/621
> [8] https://pypi.org/project/apache-beam/2.35.0rc2/
> [9] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=.
> [10] https://hub.docker.com/search?q=apache%2Fbeam=image


Re: Contributor permission for Beam Jira Tickets

2021-12-07 Thread Alexey Romanenko
Done. 

Welcome to Beam, Daniela!

—
Alexey

> On 6 Dec 2021, at 23:35, Daniela Martín wrote:
> 
> Hello everyone,
> 
> Hope you are doing well. 
> 
> I'm Daniela and I'm currently working at Wizeline. I would like to be added 
> as a contributor in the Beam Jira issue tracker to assign myself to a couple 
> of Beam tasks. 
> 
> My JiraID is: danimartin
> 
> Thank you in advance!
> 
> Regards,
> -- 
> Daniela Martín (She/Her) | Site Reliability Engineer
> daniela.mar...@wizeline.com 
> Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal.