Re: [Apache Beam Go improvements] Lazy map side inputs

2021-03-29 Thread Ahmet Altay
Adding some folks who might be able to help: @Robert Burke, @Kenneth Knowles,
@Tyson Hamilton

On Mon, Mar 29, 2021 at 2:31 PM Miguel Anzo Palomo wrote:

> Hi,
> I was checking out this task, BEAM-3293, and I'm having some
> issues fully understanding the idea of how side inputs work internally. Is
> there any resource or specific example that can help to better understand
> how they work and why/where the lazy map implementation would help so I can
> get a better grasp of the task?
>
> Thanks
>
> --
>
> Miguel Angel Anzo Palomo | WIZELINE
>
> Software Engineer
>
> miguel.a...@wizeline.com
>
> Remote Office
>
> *This email and its contents (including any attachments) are being sent
> to you on the condition of confidentiality and may be protected by
> legal privilege. Access to this email by anyone other than the intended
> recipient is unauthorized. If you are not the intended recipient, please
> immediately notify the sender by replying to this message and delete the
> material immediately from your system. Any further use, dissemination,
> distribution or reproduction of this email is strictly prohibited. Further,
> no representation is made with respect to any content contained in this
> email.*
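
As background for the question above: a map side input lets a DoFn look up
values by key while it processes the main input. The sketch below is a
conceptual, standard-library-only illustration of why a lazy map can help. It
is not the Go SDK's implementation and may not match what BEAM-3293 asks for.
`STORE`, `fetch_values_for_key`, `materialize_eagerly`, and `LazyMapView` are
hypothetical names, and the `fetch` callable is an assumed stand-in for
whatever per-key read a runner exposes.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical backing store standing in for side-input data held by the runner.
STORE: Dict[str, List[int]] = {f"key{i}": [i, i * 10] for i in range(1000)}
fetch_calls: List[str] = []  # records every per-key read, for illustration


def fetch_values_for_key(key: str) -> List[int]:
    """Pretend remote read of all values for one key (an assumption)."""
    fetch_calls.append(key)
    return STORE.get(key, [])


def materialize_eagerly(
    keys: Iterable[str], fetch: Callable[[str], List[int]]
) -> Dict[str, List[int]]:
    """Eager strategy: read every key up front, whether or not it is used."""
    return {k: fetch(k) for k in keys}


class LazyMapView:
    """Lazy strategy: read a key only on first lookup, then cache it."""

    def __init__(self, fetch: Callable[[str], List[int]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, List[int]] = {}

    def lookup(self, key: str) -> List[int]:
        if key not in self._cache:
            self._cache[key] = self._fetch(key)
        return self._cache[key]


eager = materialize_eagerly(STORE.keys(), fetch_values_for_key)
eager_reads = len(fetch_calls)  # one read per key in the store

fetch_calls.clear()
lazy = LazyMapView(fetch_values_for_key)
lazy.lookup("key7")
lazy.lookup("key7")  # served from the cache, no second read
lazy_reads = len(fetch_calls)

print(eager_reads, lazy_reads)
```

When a DoFn touches only a few keys of a large map, the lazy view does work
proportional to the keys actually looked up rather than to the size of the map.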


Flaky test issue report

2021-03-29 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests. These are P1 issues 
because they have a major negative impact on the community and make it hard to 
determine the quality of the software.

BEAM-12061: beam_PostCommit_SQL failing on 
KafkaTableProviderIT.testFakeNested 
(https://issues.apache.org/jira/browse/BEAM-12061)
BEAM-12020: :sdks:java:container:java8:docker failing missing licenses 
(https://issues.apache.org/jira/browse/BEAM-12020)
BEAM-12019: 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky (https://issues.apache.org/jira/browse/BEAM-12019)
BEAM-11792: Python precommit failed (flaked?) installing package  
(https://issues.apache.org/jira/browse/BEAM-11792)
BEAM-11733: [beam_PostCommit_Java] [testFhirIO_Import|export] flaky 
(https://issues.apache.org/jira/browse/BEAM-11733)
BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (https://issues.apache.org/jira/browse/BEAM-11666)
BEAM-11662: elasticsearch tests failing 
(https://issues.apache.org/jira/browse/BEAM-11662)
BEAM-11661: hdfsIntegrationTest flake: network not found (py38 postcommit) 
(https://issues.apache.org/jira/browse/BEAM-11661)
BEAM-11646: beam_PostCommit_XVR_Spark failing 
(https://issues.apache.org/jira/browse/BEAM-11646)
BEAM-11645: beam_PostCommit_XVR_Flink failing 
(https://issues.apache.org/jira/browse/BEAM-11645)
BEAM-11541: testTeardownCalledAfterExceptionInProcessElement flakes on 
direct runner. (https://issues.apache.org/jira/browse/BEAM-11541)
BEAM-11540: Linter sometimes flakes on apache_beam.dataframe.frames_test 
(https://issues.apache.org/jira/browse/BEAM-11540)
BEAM-11493: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyAndWindows
 (https://issues.apache.org/jira/browse/BEAM-11493)
BEAM-11492: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMergingWindows
 (https://issues.apache.org/jira/browse/BEAM-11492)
BEAM-11491: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMultipleWindows
 (https://issues.apache.org/jira/browse/BEAM-11491)
BEAM-11490: Spark test failure: 
org.apache.beam.sdk.transforms.ReifyTimestampsTest.inValuesSucceeds 
(https://issues.apache.org/jira/browse/BEAM-11490)
BEAM-11489: Spark test failure: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-11489)
BEAM-11488: Spark test failure: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedCounterMetrics
 (https://issues.apache.org/jira/browse/BEAM-11488)
BEAM-11487: Spark test failure: 
org.apache.beam.sdk.transforms.WithTimestampsTest.withTimestampsShouldApplyTimestamps
 (https://issues.apache.org/jira/browse/BEAM-11487)
BEAM-11486: Spark test failure: 
org.apache.beam.sdk.testing.PAssertTest.testSerializablePredicate 
(https://issues.apache.org/jira/browse/BEAM-11486)
BEAM-11485: Spark test failure: 
org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineNullValues 
(https://issues.apache.org/jira/browse/BEAM-11485)
BEAM-11484: Spark test failure: 
org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics 
(https://issues.apache.org/jira/browse/BEAM-11484)
BEAM-11483: Spark PostCommit Test Improvements 
(https://issues.apache.org/jira/browse/BEAM-11483)
BEAM-10995: Java + Universal Local Runner: 
WindowingTest.testWindowPreservation fails 
(https://issues.apache.org/jira/browse/BEAM-10995)
BEAM-10987: stager_test.py::StagerTest::test_with_main_session flaky on 
windows py3.6,3.7 (https://issues.apache.org/jira/browse/BEAM-10987)
BEAM-10968: flaky test: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-10968)
BEAM-10955: Flink Java Runner test flake: Could not find Flink job  
(https://issues.apache.org/jira/browse/BEAM-10955)
BEAM-10923: Python requirements installation in docker container is flaky 
(https://issues.apache.org/jira/browse/BEAM-10923)
BEAM-10901: Flaky test: 
PipelineInstrumentTest.test_able_to_cache_intermediate_unbounded_source_pcollection
 (https://issues.apache.org/jira/browse/BEAM-10901)
BEAM-10899: test_FhirIO_exportFhirResourcesGcs flake with OOM 
(https://issues.apache.org/jira/browse/BEAM-10899)
BEAM-10866: PortableRunnerTestWithSubprocesses.test_register_finalizations 
flaky on macOS (https://issues.apache.org/jira/browse/BEAM-10866)
BEAM-10763: Spotless flake (NullPointerException) 
(https://issues.apache.org/jira/browse/BEAM-10763)
BEAM-10590: BigQueryQueryToTableIT flaky: test_big_query_new_types 
(https://issues.apache.org/jira/browse/BEAM-10590)
BEAM-10589: Samza ValidatesRunner failure: 
testParDoWithSi

P1 issues report

2021-03-29 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests.

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

BEAM-12060: beam_PostCommit_Go_VR_Flink and beam_PostCommit_Go_VR_Spark 
failing since Mar 23, 2021 6:00:00 AM 
(https://issues.apache.org/jira/browse/BEAM-12060)
BEAM-12056: [2.29.0 cherrypick] DataframeTransfom, BatchRowsAsDataFrame do 
not preserve field order when schema created with beam.Row 
(https://issues.apache.org/jira/browse/BEAM-12056)
BEAM-12050: ParDoTest TimerTests that use TestStream failing for portable 
FlinkRunner (https://issues.apache.org/jira/browse/BEAM-12050)
BEAM-11965: testSplitQueryFnWithLargeDataset timeout failures 
(https://issues.apache.org/jira/browse/BEAM-11965)
BEAM-11961: InfluxDBIOIT failing with unauthorized error 
(https://issues.apache.org/jira/browse/BEAM-11961)
BEAM-11959: Python Beam SDK Harness hangs when installing pip packages 
(https://issues.apache.org/jira/browse/BEAM-11959)
BEAM-11922: 
org.apache.beam.examples.cookbook.MapClassIntegrationIT.testDataflowMapState 
has been failing in master (https://issues.apache.org/jira/browse/BEAM-11922)
BEAM-11886: MapState and SetState failing tests on Dataflow streaming 
(https://issues.apache.org/jira/browse/BEAM-11886)
BEAM-11828: JmsIO is not acknowledging messages correctly 
(https://issues.apache.org/jira/browse/BEAM-11828)
BEAM-11772: GCP BigQuery sink (file loads) uses runner determined sharding 
for unbounded data (https://issues.apache.org/jira/browse/BEAM-11772)
BEAM-11755: Cross-language consistency (RequiresStableInputs) is quietly 
broken (at least on portable flink runner) 
(https://issues.apache.org/jira/browse/BEAM-11755)
BEAM-11578: `dataflow_metrics` (python) fails with TypeError (when int 
overflowing?) (https://issues.apache.org/jira/browse/BEAM-11578)
BEAM-11576: Go ValidatesRunner failure: TestFlattenDup on Dataflow Runner 
(https://issues.apache.org/jira/browse/BEAM-11576)
BEAM-11434: Expose Spanner admin/batch clients in Spanner Accessor 
(https://issues.apache.org/jira/browse/BEAM-11434)
BEAM-11227: Upgrade beam-vendor-grpc-1_26_0-0.3 to fix CVE-2020-27216 
(https://issues.apache.org/jira/browse/BEAM-11227)
BEAM-11148: Kafka commitOffsetsInFinalize OOM on Flink 
(https://issues.apache.org/jira/browse/BEAM-11148)
BEAM-11017: Timer with dataflow runner can be set multiple times (dataflow 
runner) (https://issues.apache.org/jira/browse/BEAM-11017)
BEAM-10883: XmlIO parsing of multibyte characters 
(https://issues.apache.org/jira/browse/BEAM-10883)
BEAM-10861: Adds URNs and payloads to PubSub transforms 
(https://issues.apache.org/jira/browse/BEAM-10861)
BEAM-10663: CrossLanguageKafkaIOTest broken on Flink Runner 
(https://issues.apache.org/jira/browse/BEAM-10663)
BEAM-10617: python CombineGlobally().with_fanout() cause duplicate combine 
results for sliding windows (https://issues.apache.org/jira/browse/BEAM-10617)
BEAM-10573: CSV files are loaded several times if they are too large 
(https://issues.apache.org/jira/browse/BEAM-10573)
BEAM-10569: SpannerIO tests don't actually assert anything. 
(https://issues.apache.org/jira/browse/BEAM-10569)
BEAM-10288: Quickstart documents are out of date 
(https://issues.apache.org/jira/browse/BEAM-10288)
BEAM-10244: Populate requirements cache fails on poetry-based packages 
(https://issues.apache.org/jira/browse/BEAM-10244)
BEAM-10100: FileIO writeDynamic with AvroIO.sink not writing all data 
(https://issues.apache.org/jira/browse/BEAM-10100)
BEAM-9917: BigQueryBatchFileLoads dynamic destination 
(https://issues.apache.org/jira/browse/BEAM-9917)
BEAM-9564: Remove insecure ssl options from MongoDBIO 
(https://issues.apache.org/jira/browse/BEAM-9564)
BEAM-9455: Environment-sensitive provisioning for Dataflow 
(https://issues.apache.org/jira/browse/BEAM-9455)
BEAM-9154: Move Chicago Taxi Example to Python 3 
(https://issues.apache.org/jira/browse/BEAM-9154)
BEAM-8407: [SQL] Some Hive tests throw NullPointerException, but get marked 
as passing (Direct Runner) (https://issues.apache.org/jira/browse/BEAM-8407)
BEAM-7717: PubsubIO watermark tracking hovers near start of epoch 
(https://issues.apache.org/jira/browse/BEAM-7717)
BEAM-7716: PubsubIO returns empty message bodies for all messages read 
(https://issues.apache.org/jira/browse/BEAM-7716)
BEAM-7195: BigQuery - 404 errors for 'table not found' when using dynamic 
destinations - sometimes, new table fails to get created 
(https://issues.apache.org/jira/browse/BEAM-7195)
BEAM-6839: User reports protobuf ClassChangeError running against 2.6.0 or 
above (https://issues.apache.org/jira/browse/BEAM-6839)
BEAM-6466: KafkaIO doesn't commit offsets while being used as bounded 
source (https://issues.apache.org/jira/browse/BEAM-6466)
BEAM-5997: EVENT_TIME timer throws ex

Hi Team

2021-03-29 Thread Uday Singh
Hi Everyone,

This is Uday, and I will be working on Apache Beam from the GCP Dataflow team. 
Looking forward to contributing to the community.

Thanks
Uday


Re: Hi Team

2021-03-29 Thread Aizhamal Nurmamat kyzy
Welcome, Uday!

On Mon, Mar 29, 2021 at 12:57 PM Ahmet Altay  wrote:

> Welcome Uday!
>
> On Mon, Mar 29, 2021 at 11:23 AM Uday Singh  wrote:
>
>> Hi Everyone,
>>
>> This is Uday, and I will be working on Apache Beam from the GCP Dataflow team.
>> Looking forward to contributing to the community.
>>
>> Thanks
>> Uday
>>
>


[Apache Beam Go improvements] Lazy map side inputs

2021-03-29 Thread Miguel Anzo Palomo
Hi,
I was checking out this task, BEAM-3293, and I'm having some
issues fully understanding the idea of how side inputs work internally. Is
there any resource or specific example that can help to better understand
how they work and why/where the lazy map implementation would help so I can
get a better grasp of the task?

Thanks

-- 

Miguel Angel Anzo Palomo | WIZELINE

Software Engineer

miguel.a...@wizeline.com

Remote Office



BEAM-9114 - BigQueryIO TableRowParser Arrow & Avro (Java SDK)

2021-03-29 Thread Isidro Martinez
Hello Team,

I'm taking a look at ticket BEAM-9114. It seems it was opened in PR 10369
(currently closed), but that PR wasn't merged into master. Do you think we
should close this ticket, or is this ticket related to another part of the
code?



Re: Hi Team

2021-03-29 Thread Ahmet Altay
Welcome Uday!

On Mon, Mar 29, 2021 at 11:23 AM Uday Singh  wrote:

> Hi Everyone,
>
> This is Uday, and I will be working on Apache Beam from the GCP Dataflow team.
> Looking forward to contributing to the community.
>
> Thanks
> Uday
>


Re: Please help triage issues!

2021-03-29 Thread Kenneth Knowles
We are down to about 550.

I randomly selected some long-time contributors who I am sure know about
components and priorities well enough. There are 10-15 issues across a
number of people. If these are already good, then it would close out a lot
of them and help focus on the ones that need attention.

This Jira search searches by "current user" so you should see the bugs that
you have reported that are still marked as "Triage Needed". Take a quick
look and if you are confident you got the components, priority, labels
(especially "currently-failing" and "flake") then you could bulk edit them
to "Open" status:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20%3D%20%22Triage%20Needed%22%20AND%20reporter%20in%20(currentUser())
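
For reference, the URL-encoded JQL in the link above can be decoded with the
Python standard library. This is only a small sketch for reading the query; the
variable names are arbitrary.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20"
    "status%20%3D%20%22Triage%20Needed%22%20AND%20reporter%20in%20(currentUser())"
)

# Split off the query string, then percent-decode the "jql" parameter.
jql = parse_qs(urlsplit(url).query)["jql"][0]
print(jql)
```

The decoded query filters on project, status, and reporter, which is why each
person who opens the link sees only the issues they reported.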

Kenn

On Mon, Mar 15, 2021 at 10:28 AM Tyson Hamilton  wrote:

> There is a 'Triaged' button that I click:
> https://photos.app.goo.gl/Ub5Qwnpp6aFrmaDZ9
>
> On Mon, Mar 15, 2021 at 9:48 AM Alex Amato  wrote:
>
>> (Do I need certain permissions to be able to do this?)
>>
>> On Mon, Mar 15, 2021 at 9:47 AM Alex Amato  wrote:
>>
>>> Would you mind posting a screenshot of exactly where you are supposed to
>>> click to move a jira issue to "Open" status? I honestly can't find where to
>>> click. I don't see the option in the edit dialog box
>>>
>>> On Sun, Mar 14, 2021 at 8:03 PM Kenneth Knowles  wrote:
>>>
>>>> No need for feeling any guilt :-)
>>>>
>>>> I'm just hoping that by everyone randomly doing a very small amount of
>>>> work, this could be in good shape very quickly. I've done a number of bulk
>>>> edits like automated dependency upgrade requests which brings the number
>>>> down to just over 600.
>>>>
>>>> Your message does highlight some easy cases: issues filed to track your
>>>> own feature work. I did build automation for this: "On Issue Created" ->
>>>> "If Assignee == Issue Creator" -> "Transition to 'Open'". If the automation
>>>> isn't working, that can probably be fixed. Some of the issues might just
>>>> predate the automation.
>>>>
>>>> To be super clear: I don't mean to ask anyone to waste time looking at
>>>> things that don't need attention, but to be able to notice things that do
>>>> need attention. I did a few manually too, and the components, issue type,
>>>> and priority very often need fixing up. I especially want to get untriaged
>>>> P0s and P1s to zero.
>>>>
>>>> Kenn
>>>>
>>>> On Fri, Mar 12, 2021 at 5:07 PM Tyson Hamilton wrote:
>>>>
>>>>> I'm guilty of creating issues and not moving them to 'open'. I'll do
>>>>> better to move them to open in the future. To recompense I will spend some
>>>>> additional time triaging =)
>>>>>
>>>>> Thanks for the review of the flow.
>>>>>
>>>>> On Thu, Mar 11, 2021 at 12:39 PM Kenneth Knowles wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> You may or may not think about this very often, but our Jira workflow
>>>>>> goes like this:
>>>>>>
>>>>>> Needs Triage --> Open --> In Progress --> Resolved
>>>>>>
>>>>>> "Needs Triage" means someone needs to look at it briefly:
>>>>>>
>>>>>>  - component(s)
>>>>>>  - label(s)
>>>>>>  - issue type
>>>>>>  - priority (see https://beam.apache.org/contribute/jira-priorities/)
>>>>>>  - if appropriate, ping someone or write to dev@, especially for P1
>>>>>>    and P0
>>>>>>
>>>>>> Then transition the issue to "Open".
>>>>>>
>>>>>> Currently there is a big backlog but I don't think it is actually
>>>>>> accurate. I also think we have enough people to keep up with this and
>>>>>> even to eliminate the backlog pretty quick.
>>>>>>
>>>>>> Here are some things you can do when you are waiting for Jenkins
>>>>>> tests to complete:
>>>>>>
>>>>>>  - check your assigned issues
>>>>>>  - open up this filter and triage a couple issues at random:
>>>>>> https://issues.apache.org/jira/issues/?filter=12345682
>>>>>>
>>>>>> 800+ may seem like a lot, but dev@ had 65 participants in the last
>>>>>> 28 days (126 participants in the last 3 months). I would guess it
>>>>>> averages less than a minute per issue so this could be done in less
>>>>>> than a day, especially considering our CI times :-)
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>>
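
Kenn's back-of-the-envelope estimate in the original message (800+ issues, 65
participants, under a minute per issue) can be worked through as below. The
three inputs are taken from the thread; treating them as exact values and
assuming an even split across participants is a simplification.

```python
issues = 800            # "800+" backlog, taken as a lower bound
participants = 65       # dev@ participants in the last 28 days
minutes_per_issue = 1   # "less than a minute per issue", taken as an upper bound

issues_each = issues / participants
minutes_each = issues_each * minutes_per_issue
total_hours = issues * minutes_per_issue / 60

print(f"{issues_each:.1f} issues, about {minutes_each:.0f} minutes per participant")
print(f"{total_hours:.1f} person-hours in total")
```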


Re: BEAM-449 and BEAM-621 PRs request for review

2021-03-29 Thread Brian Hulette
Hi Vitaly,
It looks like Kenn is helping out with the BEAM-449 PR; I can look at the
one for BEAM-621.

Brian

On Fri, Mar 26, 2021 at 3:27 AM Vitaly Terentyev <
vitaly.terent...@akvelon.com> wrote:

> Hello devs,
>
> I am new to Beam. I recently assigned myself and worked on these two
> issues: BEAM-449, BEAM-621.
> Can someone check my PRs and review them? Here they are:
> PR for BEAM-449.
> PR for BEAM-621.
>
> Best regards,
> Vitaly
>
>
>


Re: Python Dataframe API issue

2021-03-29 Thread Brian Hulette
Thanks for the feedback and the bug report Xinyu! I really appreciate it.

Brian

On Thu, Mar 25, 2021 at 6:04 PM Xinyu Liu  wrote:

> Np, thanks for quickly identifying the fix.
>
> Btw, I am very happy about Beam Python supporting the same Pandas
> dataframe api. It's super user-friendly to both devs and data scientists.
> Really cool work!
>
> Thanks,
> Xinyu
>
> On Thu, Mar 25, 2021 at 4:53 PM Robert Bradshaw 
> wrote:
>
>> Thanks, Xinyu, for finding this!
>>
>> On Thu, Mar 25, 2021 at 4:48 PM Kenneth Knowles  wrote:
>>
>>> Cloned to https://issues.apache.org/jira/browse/BEAM-12056
>>>
>>> On Thu, Mar 25, 2021 at 4:46 PM Brian Hulette 
>>> wrote:
>>>
>>>> Yes this looks like https://issues.apache.org/jira/browse/BEAM-11929,
>>>> I removed it from the release blockers since there is a workaround (use a
>>>> NamedTuple type), but it's probably worth cherrypicking the fix.
>>>>
>>>> On Thu, Mar 25, 2021 at 4:44 PM Robert Bradshaw wrote:
>>>>
>>>>> This could be https://issues.apache.org/jira/browse/BEAM-11929
>>>>>
>>>>> On Thu, Mar 25, 2021 at 4:26 PM Robert Bradshaw wrote:
>>>>>
>>>>>> This is definitely wrong. Looking into what's going on here, but this
>>>>>> seems severe enough to be a blocker for the next release.
>>>>>>
>>>>>> On Thu, Mar 25, 2021 at 3:39 PM Xinyu Liu wrote:
>>>>>>
>>>>>>> Hi, folks,
>>>>>>>
>>>>>>> I am playing around with the Python Dataframe API, and seemingly hit a
>>>>>>> schema issue when converting a pcollection to a dataframe. I wrote the
>>>>>>> following code for a simple test:
>>>>>>>
>>>>>>> import apache_beam as beam
>>>>>>> from apache_beam.dataframe.convert import to_dataframe
>>>>>>> from apache_beam.dataframe.convert import to_pcollection
>>>>>>>
>>>>>>> p = beam.Pipeline()
>>>>>>> data = p | beam.Create([('a', ''), ('b', '')]) | beam.Map(
>>>>>>>     lambda x: beam.Row(word=x[0], val=x[1]))
>>>>>>> _ = data | beam.Map(print)
>>>>>>> p.run()
>>>>>>>
>>>>>>> This shows the following:
>>>>>>> Row(val='', word='a')
>>>>>>> Row(val='', word='b')
>>>>>>>
>>>>>>> But if I use to_dataframe() to convert it into a df, it seems the
>>>>>>> schema is reversed:
>>>>>>>
>>>>>>> df = to_dataframe(data)
>>>>>>> dataCopy = to_pcollection(df)
>>>>>>> _ = dataCopy | beam.Map(print)
>>>>>>> p.run()
>>>>>>>
>>>>>>> I got:
>>>>>>> BeamSchema_4100b64e_16e9_467d_932e_5fc2e4acaca7(word='', val='a')
>>>>>>> BeamSchema_4100b64e_16e9_467d_932e_5fc2e4acaca7(word='', val='b')
>>>>>>>
>>>>>>> It seems the columns 'word' and 'val' are now swapped. The problem seems
>>>>>>> to happen during to_dataframe(). If I print out df['word'], I get ''
>>>>>>> and ''. I am not sure whether I am doing something wrong or there is
>>>>>>> an issue in the schema conversion. Could someone help me take a look?
>>>>>>>
>>>>>>> Thanks, Xinyu
>>>>>>>
>>>>>>
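
The workaround mentioned in the thread (use a NamedTuple type) relies on the
field order being fixed by the class definition rather than by keyword-argument
order. The sketch below shows only that property, using the standard library.
`WordVal` is a hypothetical name, the values are placeholders, and how the type
is attached to a Beam transform is deliberately not shown.

```python
from typing import NamedTuple


class WordVal(NamedTuple):
    """Hypothetical row type; fields keep their declaration order."""
    word: str
    val: str


# Placeholder values, not the data from the original report.
rows = [WordVal(word=w, val=v) for w, v in [("a", "x"), ("b", "y")]]

# Keyword order at the call site does not change the field order.
swapped_call = WordVal(val="y", word="b")

print(WordVal._fields)
print(rows[0], swapped_call)
```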


Re: BEAM-9613 - BigQueryUtils and Avro

2021-03-29 Thread Matthew Ouyang
It looks like some unit tests failed.  I'm unable to reproduce it in my
environment because the error I received (described in the PR) suggests I
can't even start the job in the first place.  Any help would be
appreciated.  Ideally I'd like to have this included in the next possible
release (2.30.0 it looks like) since my team had to do a few workarounds
to get past these issues.

On Sun, Mar 28, 2021 at 2:25 AM Matthew Ouyang 
wrote:

> Thank you for the feedback.  I walked back my initial approach in favour
> of Brian's option (2) and also implemented a fix for lists as well (
> https://github.com/apache/beam/pull/14350).  I agree with the tradeoff
> Brian pointed out as it is consistent with the rest of that component.  If
> performance ends up being a problem BigQueryUtils could have a different
> mode for TableRow and Avro.
>
> On Tue, Mar 23, 2021 at 1:47 PM Reuven Lax  wrote:
>
>> Logically, all JSON values are strings. We often have put other objects in
>> there, which I believe works simply because of the implicit .toString()
>> method on those objects, but I'm not convinced this is really correct.
>>
>> On Tue, Mar 23, 2021 at 6:38 AM Brian Hulette 
>> wrote:
>>
>>> Thank you for digging into this and figuring out how this bug was
>>> introduced!
>>>
>>> In the long-term I think it would be preferable to avoid
>>> TableRow altogether in order to do a schema-aware read of avro data from
>>> BQ. We can go directly from Avro GenericRecord to Beam Rows now [1]. This
>>> would also be preferable for Arrow, where we could produce Row instances
>>> that are references into an underlying arrow RecordBatch (using
>>> RowWithGetters), rather than having to materialize each row to make a
>>> TableRow instance.
>>>
>>> For a short-term fix there are two options, both came up in Reuven's
>>> comments on BEAM-9613:
>>>
>>>> However if we expect to get Long, Double, etc. objects in the TableRow,
>>>> then this mapping code needs to handle those objects. Handling them
>>>> directly would be more efficient - converting to a String would simply be a
>>>> stopgap "one-line" fix.
>>>
>>>
>>> 1. handle them directly [in BigQueryUtils], this is what you've done in
>>> https://github.com/mouyang/beam/commit/326b291ab333c719a9f54446c34611581ea696eb
>>> 2. convert to a String [in BigQueryAvroUtils]
>>>
>>> I don't have a strong preference but I think (2) is a cleaner, albeit
>>> less performant, solution. It seems that BigQueryUtils expects all values
>>> in TableRow instances to be String instances. Since BigQueryAvroUtils is
>>> just a shim to convert GenericRecord to TableRow for use in BigQueryUtils,
>>> it should comply with that interface, rather than making BigQueryUtils work
>>> around the discrepancy.
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
>>>
>>> On Mon, Mar 22, 2021 at 5:59 PM Matthew Ouyang 
>>> wrote:
>>>
>>>> I'm working on fixing BEAM-9613 because I encountered this issue as a
>>>> result of using BigQueryIO.readTableRowsWithSchema()
>>>>
>>>>    1. BEAM-7526 provided support for Lists and Maps that came from the
>>>>    JSON export format
>>>>    2. BEAM-2879 switched the export format from JSON to Avro.  This
>>>>    caused issue BEAM-9613 since Avro no longer treated BQ BOOLEAN and
>>>>    FLOAT as a Java String but rather Java Boolean and Double.
>>>>    3. The switch from JSON to Avro also introduced an issue with
>>>>    fields with BQ REPEATED mode because fields of this mode.
>>>>
>>>> I have a simple fix to handle BQ BOOLEAN and FLOAT (
>>>> https://github.com/mouyang/beam/commit/326b291ab333c719a9f54446c34611581ea696eb)
>>>> however I'm a bit uncomfortable with it because
>>>>
>>>>    1. This would introduce mixing of both the JSON and Avro export
>>>>    formats,
>>>>    2. BEAM-8933 while still in progress would introduce Arrow and risk
>>>>    a regression,
>>>>    3. I haven't made a fix for REPEATED mode yet, but tests that use
>>>>    BigQueryUtilsTest.BQ_ARRAY_ROW would have to change (
>>>>    https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtilsTest.java#L306-L311).
>>>>    I don't know if I should change this because I don't know if Beam
>>>>    wishes to continue support for the JSON format.
>>>>
>>>> I'd like to incorporate these changes in my project as soon as
>>>> possible, but I also need guidance on next steps that would be in line with
>>>> the general direction of the project.  I'm looking forward to any
>>>> feedback.  Thanks.
>>>>
>>>
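
Option (2) from Brian's message, converting typed values to strings in the shim
so the downstream code keeps seeing only strings, can be sketched as follows.
This is a Python analogue for illustration only, not the Java code in
BigQueryAvroUtils. `to_string_value` is a hypothetical helper, and the
lowercase "true"/"false" spelling for booleans is an assumption.

```python
from typing import Any


def to_string_value(value: Any) -> Any:
    """Recursively convert scalars to strings, keeping list/dict structure."""
    if value is None:
        return None
    if isinstance(value, bool):  # check before numbers: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):  # repeated fields
        return [to_string_value(v) for v in value]
    if isinstance(value, dict):  # nested records
        return {k: to_string_value(v) for k, v in value.items()}
    return str(value)


# Hypothetical record with typed values, as a typed export format might yield.
record = {"flag": True, "score": 1.5, "tags": [1, 2], "nested": {"ok": False}}
row = to_string_value(record)
print(row)
```

The trade-off is the one discussed in the thread: the conversion costs an extra
pass over each record, but the consumer of the row does not need to change.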


Beam Dependency Check Report (2021-03-29)

2021-03-29 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:


Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
chromedriver-binary | 88.0.4324.96.0 | 90.0.4430.24.0 | 2021-01-25 | 2021-03-22 | BEAM-10426
dill | 0.3.1.1 | 0.3.3 | 2019-10-07 | 2020-11-02 | BEAM-11167
google-cloud-bigquery | 1.28.0 | 2.13.1 | 2020-10-05 | 2021-03-29 | BEAM-5537
google-cloud-datastore | 1.15.3 | 2.1.0 | 2020-11-16 | 2020-12-07 | BEAM-8443
google-cloud-dlp | 1.0.0 | 3.0.1 | 2020-06-29 | 2021-02-01 | BEAM-10344
google-cloud-language | 1.3.0 | 2.0.0 | 2020-10-26 | 2020-10-26 | BEAM-8
google-cloud-pubsub | 1.7.0 | 2.3.0 | 2020-07-20 | 2021-02-15 | BEAM-5539
google-cloud-spanner | 1.19.1 | 3.3.0 | 2020-11-16 | 2021-03-29 | BEAM-10345
google-cloud-videointelligence | 1.16.1 | 2.0.0 | 2020-11-23 | 2020-11-23 | BEAM-11319
google-cloud-vision | 1.0.0 | 2.2.0 | 2020-03-24 | 2021-02-15 | BEAM-9581
grpcio-tools | 1.30.0 | 1.36.1 | 2020-06-29 | 2021-03-08 | BEAM-9582
idna | 2.10 | 3.1 | 2021-01-04 | 2021-01-11 | BEAM-9328
mock | 2.0.0 | 4.0.3 | 2019-05-20 | 2020-12-14 | BEAM-7369
mypy-protobuf | 1.18 | 2.4 | 2020-03-24 | 2021-02-08 | BEAM-10346
nbconvert | 5.6.1 | 6.0.7 | 2020-10-05 | 2020-10-05 | BEAM-11007
Pillow | 7.2.0 | 8.1.2 | 2020-10-19 | 2021-03-08 | BEAM-11071
PyHamcrest | 1.10.1 | 2.0.2 | 2020-01-20 | 2020-07-08 | BEAM-9155
pytest | 4.6.11 | 6.2.2 | 2020-07-08 | 2021-02-01 | BEAM-8606
pytest-xdist | 1.34.0 | 2.2.1 | 2020-08-17 | 2021-02-15 | BEAM-10713
tenacity | 5.1.5 | 7.0.0 | 2019-11-11 | 2021-03-08 | BEAM-8607

High Priority Dependency Updates Of Beam Java SDK:


Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
com.azure:azure-core | 1.6.0 | 1.14.1 | 2020-07-02 | 2021-03-19 | BEAM-11888
com.azure:azure-identity | 1.0.8 | 1.3.0-beta.2 | 2020-07-07 | 2021-03-11 | BEAM-11814
com.azure:azure-storage-common | 12.8.0 | 12.11.0-beta.1 | 2020-08-13 | 2021-02-11 | BEAM-11889
com.datastax.cassandra:cassandra-driver-core | 3.10.2 | 4.0.0 | 2020-08-26 | 2019-03-18 | BEAM-8674
com.esotericsoftware:kryo | 4.0.2 | 5.0.4 | 2018-03-20 | 2021-03-12 | BEAM-5809
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
com.fasterxml.jackson.module:jackson-module-scala_2.11 | 2.10.2 | 2.12.2 | 2020-01-05 | 2021-03-04 | BEAM-11603
com.fasterxml.jackson.module:jackson-module-scala_2.12 | 2.10.2 | 2.12.2 | 2020-01-05 | 2021-03-04 | BEAM-11973
com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.33.0 | 0.38.0 | 2020-09-14 | 2021-03-08 | BEAM-6645
com.google.api.grpc:grpc-google-cloud-pubsublite-v1 | 0.7.0 | 0.12.0 | 2020-12-08 | 2021-03-18 | BEAM-11008
com.google.api.grpc:proto-google-cloud-bigquerystorage-v1 | 1.8.0 | 1.16.1 | 2021-01-05 | 2021-03-26 | BEAM-11890
com.google.api.grpc:proto-google-cloud-bigquerystorage-v1beta2 | 0.108.0 | 0.116.1 | 2021-01-05 | 2021-03-26 | BEAM-11891
com.google.api.grpc:proto-google-cloud-dlp-v2 | 1.1.4 | 2.3.0 | 2020-05-04 | 2021-03-11 | BEAM-11892
com.google.api.grpc:proto-google-cloud-pubsublite-v1 | 0.7.0 | 0.12.0 | 2020-12-08 | 2021-03-18 | BEAM-11009
com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1 | 3.2.1 | 6.0.0 | 2021-01-06 | 2021-03-22 | BEAM-8682
com.google.api.grpc:proto-google-cloud-spanner-v1 | 3.2.1 | 6.0.0 | 2021-01-06 | 2021-03-22 | BEAM-11893
com.google.api.grpc:proto-google-cloud-video-intelligence-v1 | 1.2.0 | 1.6.0 | 2020-03-10 | 2021-03-11 | BEAM-11894
com.google.api.grpc:proto-google-cloud-vision-v1 | 1.81.3 | 1.102.0 | 2020-04-07 | 2021-03-11 | BEAM-11895
com.google.apis:google-api-services-bigquery | v2-rev20210219-1.31.0 | v2-rev20210313-1.31.0 | 2021-02-26 | 2021-