Re: contributor permission for Beam Jira tickets

2022-08-12 Thread Kanishk Karanawat
Sounds good Robert. I had initially come across this link [1]. But it looks
like GitHub is used for tracking tasks now.

Thanks.

[1]
https://cwiki.apache.org/confluence/display/BEAM/Beam+Jira+Beginner%27s+Guide


On Sat, Aug 13, 2022 at 12:51 AM Robert Burke  wrote:

> Beam uses GitHub Issues directly these days.
>
> Commenting
>
> .take-issue
>
> will assign an issue to you.
>
> See beam.apache.org/contribute for more details.
>
> That said:
>
> Do we still have instructions that point to the legacy tracker? Could you
> link them, if so?
>
> Or do you need to make a modification to an older issue for some reason?
>
> On Fri, Aug 12, 2022, 9:28 PM Kanishk Karanawat  wrote:
>
>> Hi,
>>
>>
>> Could someone kindly add me as a contributor to Beam's Jira issue
>> tracker? I would like to create/assign tickets. My ASF Jira username is
>> *Kanishk.k*
>>
>>
>> Thanks,
>>
>> Kanishk
>>
>

-- 
Kanishk Karanawat


Re: contributor permission for Beam Jira tickets

2022-08-12 Thread Robert Burke
Beam uses GitHub Issues directly these days.

Commenting

.take-issue

will assign an issue to you.

See beam.apache.org/contribute for more details.

That said:

Do we still have instructions that point to the legacy tracker? Could you
link them, if so?

Or do you need to make a modification to an older issue for some reason?

On Fri, Aug 12, 2022, 9:28 PM Kanishk Karanawat  wrote:

> Hi,
>
>
> Could someone kindly add me as a contributor to Beam's Jira issue tracker? I
> would like to create/assign tickets. My ASF Jira username is *Kanishk.k*
>
>
> Thanks,
>
> Kanishk
>


contributor permission for Beam Jira tickets

2022-08-12 Thread Kanishk Karanawat
Hi,


Could someone kindly add me as a contributor to Beam's Jira issue tracker? I
would like to create/assign tickets. My ASF Jira username is *Kanishk.k*


Thanks,

Kanishk


Re: [VOTE] Release 2.41.0, release candidate #1

2022-08-12 Thread Ahmet Altay via dev
+1 - I validated python quickstarts on direct runner.

Thank you Kiley!



On Thu, Aug 11, 2022 at 9:56 PM Kiley Sok via dev 
wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 2.41.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found.
>
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint
> 4D5731CC0AA38097D091EB091E7B28884452AE5D [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.41.0-RC1" [5],
> * website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle 7.4 and OpenJDK/Oracle JDK
> 1.8.0_232.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> * Validation sheet with a tab for 2.41.0 release to help with validation
> [9].
> * Docker images published to Docker Hub [10].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> For guidelines on how to try the release in your projects, check out our
> blog post at https://beam.apache.org/blog/validate-beam-release/.
>
> Thanks,
> Release Manager
>
> [1] https://github.com/apache/beam/milestone/3
> [2] https://dist.apache.org/repos/dist/dev/beam/2.41.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1282/
> [5] https://github.com/apache/beam/tree/v2.41.0-RC1
> [6] https://github.com/apache/beam/pull/22706
> [7] https://github.com/apache/beam-site/pull/633
> [8] https://pypi.org/project/apache-beam/2.41.0rc1/
> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=331459080
> [10] https://hub.docker.com/search?q=apache%2Fbeam=image
>


Re: Representation of logical type beam:logical_type:datetime:v1

2022-08-12 Thread Brian Hulette via dev
Ah sorry, I forgot that INT64 is encoded with VarIntCoder, so we can't
simulate TimestampCoder with a logical type.

I think the ideal end state would be to have a well-defined
beam:logical_type:millis_instant that we use for cross-language (when
appropriate), and never use DATETIME at cross-language boundaries. Would it
be possible to add millis_instant, and use that for JDBC read/write instead
of DATETIME?

Separately we could consider how to resolve the conflicting definitions of
beam:logical_type:datetime:v1. I'm not quite sure how/if we can do that
without breaking pipeline update.

Brian


On Fri, Aug 12, 2022 at 7:50 AM Yi Hu via dev  wrote:

> Hi Cham,
>
> Thanks for the comments.
>
>
>>
>>>
>>> ii. "beam:logical_type:instant:v1" is still backed by INT64, but in
>>> implementation it will use BigEndianLongCoder to encode/decode the stream.
>>>
>>>
>> Is this to be compatible with the current Java implementation ? And we
>> have to update other SDKs to use big endian coders when encoding/decoding
>> the "beam:logical_type:instant:v1" logical type ?
>>
>>
> Yes, and the proposal is aimed to keep the Java SDK change minimal; we
> have to update other SDKs to make it work. Currently python and go sdk does
> not implement "beam:logical_type:datetime:v1" (will
> be "beam:logical_type:instant:v1") at all.
>
>
>>
>>
>>> For the second step ii, the problem is that there is a primitive type
>>> backed by a fixed length integer coder. Currently INT8, INT16, INT32,
>>> INT64... are all backed by VarInt (and there is ongoing work to use fixed
>>> size big endian to encode INT8, INT16 (
>>> https://github.com/apache/beam/issues/19815)). Ideally I would think
>>> (INT8, INT16, INT32, INT64) are all fixed and having a generic (INT)
>>> primitive type is backed by VarInt. But this may be a more substantial
>>> change for the current code base.
>>>
>>
>> I'm a bit confused by this. Did you mean that there's *no* primitive
>> type backed by a fixed length integer coder ? Also, by primitive, I'm
>> assuming you mean Beam Schema types here.
>>
>>
> Yes I mean Beam Schema types here. The proto for datetime(instant) logical
> type is constructed here:
> https://github.com/apache/beam/blob/cf9ea1f442636f781b9f449e953016bb39622781/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaTranslation.java#L202
> It is represented by an INT64 atomic type. In cross-language case, another
> SDK receives proto and decodes the stream according to the proto. Currently
> I do not see an atomic type that will be decoded using a fixed-length
> BigEndianLong coder. INT8, ..., INT64 will all be decoded with VarInt.
>
> As a workaround in the PR (#22561), in python's RowCoder I explicitly set
> the coder for "beam:logical_type:datetime:v1" (will
> be "beam:logical_type:instant:v1") to be TimestampCoder. I do not find a
> way to keep the logic contained in the logical type implementation, e.g. in
> to_language_type and to_representation_type method. To do this I will need
> an atomic type that is decoded using the BigEndianLong coder.
> Please point out if I was wrong.
>
> Best,
> Yi
>


Re: Design Doc for Controlling Batching in RunInference

2022-08-12 Thread Brian Hulette via dev
Hi Andy,

Thanks for writing this up! This seems like something that Batched DoFns
could help with. Could we make a BatchConverter [1] that represents the
necessary transformations here, and define RunInference as a Batched DoFn?
Note that the Numpy BatchConverter already enables users to specify a batch
dimension using a custom typehint, like NumpyArray[np.int64, (N, 10)] (the
N identifies the batch dimension) [2]. I think we could do something
similar, but with pytorch types. It's likely we'd need to define our own
typehints though, I suspect pytorch typehints aren't already parameterized
by size.

Brian


[1]
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
[2]
https://github.com/apache/beam/blob/3173b503beaf30c4d32a4a39c709fd81e8161907/sdks/python/apache_beam/typehints/batch_test.py#L42

On Fri, Aug 12, 2022 at 12:36 PM Andy Ye via dev 
wrote:

> Hi everyone,
>
> I've written up a design doc
> 
>  [1]
> on controlling batching in RunInference. I'd appreciate any feedback.
> Thanks!
>
> Summary:
> Add a custom stacking function to RunInference to enable users to control
> how they want their data to be stacked. This addresses issues regarding
> data that have existing batching dimensions, or different sizes.
>
> Best,
> Andy
>
> [1]
> https://docs.google.com/document/d/1l40rOTOEqrQAkto3r_AYq8S_L06dDgoZu-4RLKAE6bo/edit#
>


Design Doc for Controlling Batching in RunInference

2022-08-12 Thread Andy Ye via dev
Hi everyone,

I've written up a design doc

[1]
on controlling batching in RunInference. I'd appreciate any feedback.
Thanks!

Summary:
Add a custom stacking function to RunInference to enable users to control
how they want their data to be stacked. This addresses issues regarding
data that have existing batching dimensions, or different sizes.

Best,
Andy

[1]
https://docs.google.com/document/d/1l40rOTOEqrQAkto3r_AYq8S_L06dDgoZu-4RLKAE6bo/edit#


Re: Community event cooperation

2022-08-12 Thread Brittany Hermann via dev
Hi!

Would 8:00 AM PST on Wednesday, August 17th work for you?

On Thu, Aug 11, 2022 at 8:49 PM 曾辉  wrote:

> Hi, Ahmet,
>
> Thank you for reaching out. We will definitely post the consensus we
> reached at the meeting here and determine when it will be held,.
> Now waiting for Brittany, and Danielle to reply when it is convenient for
> the meeting
>
> Ahmet Altay  于2022年8月12日周五 10:45写道:
>
>> Thank you Niko, Brittany, Danielle. Sharing about Beam in different
>> regions sounds like a good idea.
>>
>> Once you have a call, could you please summarize your discussion and the
>> next steps here?
>>
>> On Wed, Aug 10, 2022 at 9:37 PM 曾辉  wrote:
>>
>>> The following is our detailed introduction
>>>
>>> 曾辉  于2022年8月11日周四 11:21写道:
>>>
 Great, I have time.
 because our time zones are different, we recommend you make an
 appointment for a ZOOM meeting there. Pacific time can be selected from
 8-10 in the morning or at night. You can choose a specific day, and we will
 attend the meeting on time.

>>>
>> Maybe I am missing something. Was there supposed to be a link for ZOOM ?
>>
>>
>>>
 Brittany Hermann  于2022年8月11日周四 00:22写道:

> Hi Niko,
>
> It's great to meet you! @Danielle Syse  and I would
> love to set up a quick call with you to learn more and see how we could
> collaborate. Are you available next week?
>
> On Wed, Aug 10, 2022 at 8:21 AM 曾辉  wrote:
>
>> Hi, Ahmet,
>>
>> Thank you for reaching out. In fact, the friend I'm currently
>> looking for may be an organizer who can help us coordinate events. Of
>> course, the form of our cooperation is very open, either online or 
>> offline.
>> We can organize events in Silicon Valley, USA. Next event, you know, we
>> have a community manager in the United States, and there are some 
>> lecturers
>> who can provide topics. There is no space limit online. As for the form 
>> of
>> cooperation, I will give some examples. I hope to share with friends in 
>> the
>> Beam community. Let's discuss and see which one you like better on your 
>> side
>>
>> Here, I reiterate a position, I want to cooperate with the excellent
>> Apache Beam community for the purpose of technical exchange, and I found
>> that there are users who use DolphinScheduler and Beam at the same time,
>> but I am worried that this user cannot speak in English. I'll keep trying
>> to get in touch with him.
>>
>> On the other hand, we found that the Apache Beam community has done
>> little evangelism work in China. In fact, we have a great influence in 
>> the
>> field of big data technology in China, and we can help the Apache Beam
>> community to expand its influence in China. We have done technical 
>> exchange
>> activities with many excellent apache projects, which is a good form of
>> evangelism, which can not only allow more people to learn knowledge but
>> also let more people know about our project.
>>
>> So, from this aspect, I also hope to jointly hold a Meetup with the
>> Apache Beam community. The basic introduction to Apache DolphinScheduler 
>> I
>> have put in the google documentation in the first email, and I will
>> introduce our detailed presentation later. send over.
>>
>> As you said, I think possible collaborations are the following:
>> First, one or more speakers from the DolphinScheduler community
>> become Apache Beam online/offline sharing guests. We can all be offline 
>> in
>> the United States.
>>
>> Second, we invite developers from the Apache Beam community to join
>> our Meetup and be a guest speakers to give a speech
>>
>> Third, what I have been asking is that we organize a meetup together.
>> Both parties will invest resources to hold this technical exchange 
>> meetup,
>> collide with the developers on speech topics, and promote together, and 
>> we
>> will give the greatest support in China. This event, let more developers
>> know about Apache Beam
>>
>> Fourth, you can even just mention each other's names in the process
>> of speaking at their respective community technical activities. I think 
>> it
>> is also a way of cooperation.
>>
>> Fifth, it is a pity to miss the Beam summit, otherwise, we can also
>> use this identity to cooperate
>>
>> Speaking of this, I still want to hear it. At present, if the Beam
>> community can do technical exchange activities with other open source
>> communities in the field of big data, which cooperation method is more
>> preferred, and I look forward to your reply
>>
>> the best
>> Niko
>>
>> Ahmet Altay via dev  于2022年8月9日周二 07:03写道:
>>
>>> Hi Niko,
>>>
>>> Thank you for reaching out. We do have contributors who might be
>>> interested in participating but they 

Re: Representation of logical type beam:logical_type:datetime:v1

2022-08-12 Thread Yi Hu via dev
Hi Cham,

Thanks for the comments.


>
>>
>> ii. "beam:logical_type:instant:v1" is still backed by INT64, but in
>> implementation it will use BigEndianLongCoder to encode/decode the stream.
>>
>>
> Is this to be compatible with the current Java implementation ? And we
> have to update other SDKs to use big endian coders when encoding/decoding
> the "beam:logical_type:instant:v1" logical type ?
>
>
Yes, and the proposal is aimed to keep the Java SDK change minimal; we have
to update other SDKs to make it work. Currently python and go sdk does not
implement "beam:logical_type:datetime:v1" (will
be "beam:logical_type:instant:v1") at all.


>
>
>> For the second step ii, the problem is that there is a primitive type
>> backed by a fixed length integer coder. Currently INT8, INT16, INT32,
>> INT64... are all backed by VarInt (and there is ongoing work to use fixed
>> size big endian to encode INT8, INT16 (
>> https://github.com/apache/beam/issues/19815)). Ideally I would think
>> (INT8, INT16, INT32, INT64) are all fixed and having a generic (INT)
>> primitive type is backed by VarInt. But this may be a more substantial
>> change for the current code base.
>>
>
> I'm a bit confused by this. Did you mean that there's *no* primitive type
> backed by a fixed length integer coder ? Also, by primitive, I'm assuming
> you mean Beam Schema types here.
>
>
Yes I mean Beam Schema types here. The proto for datetime(instant) logical
type is constructed here:
https://github.com/apache/beam/blob/cf9ea1f442636f781b9f449e953016bb39622781/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaTranslation.java#L202
It is represented by an INT64 atomic type. In cross-language case, another
SDK receives proto and decodes the stream according to the proto. Currently
I do not see an atomic type that will be decoded using a fixed-length
BigEndianLong coder. INT8, ..., INT64 will all be decoded with VarInt.

As a workaround in the PR (#22561), in python's RowCoder I explicitly set
the coder for "beam:logical_type:datetime:v1" (will
be "beam:logical_type:instant:v1") to be TimestampCoder. I do not find a
way to keep the logic contained in the logical type implementation, e.g. in
to_language_type and to_representation_type method. To do this I will need
an atomic type that is decoded using the BigEndianLong coder.
Please point out if I was wrong.

Best,
Yi


Beam High Priority Issue Report (69)

2022-08-12 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P0 Issues:

https://github.com/apache/beam/issues/22696 [Bug]: Contact us. The Action of 
the purpose "Contact users and contributors in real time through the ASF slack 
workspace" contains  the word "#general" without the link


Unassigned P1 Issues:

https://github.com/apache/beam/issues/22642 [Bug]: Dataflow fails to drain a 
job when using BigQuery (java sdk v.2.38)
https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow 
SideInput LoadTests failing
https://github.com/apache/beam/issues/22321 
PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing 
on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and 
fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at 
getConnection() in WriteFn
https://github.com/apache/beam/issues/22283 [Bug]: Python Lots of fn runner 
test items cost exactly 5 seconds to run
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer 
whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 
failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in 
beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam 
PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 
failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 
--dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions 
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with 
writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing 
new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21468 
beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load 
tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on 
failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable 
ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in 
beam_PostCommit_Java_DataflowV2  
https://github.com/apache/beam/issues/21270 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being 
accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate 
BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked 
Dataflow Pipeline 
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21261 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21257 Either Create or DirectRunner fails 
to produce all elements to the following transform