date:20200305

Re: Proposal: Beam to use GCP Libraries BOM

2020-03-05 Thread Ismaël Mejía

+1 Sounds like a good improvement for users and maintainers !

On Thu, Mar 5, 2020 at 6:59 AM Alex Van Boxel  wrote:
>
> +1, I can remember the countless hours that we fought with Google 
> dependencies.
>
> On Thu, Mar 5, 2020, 04:07 Chamikara Jayalath  wrote:
>>
>> +1 for this.
>>
>> This will make life easy for many of our users and will help us keep GCP 
>> related dependencies compatible (which has not been easy in the past).
>>
>> On Wed, Mar 4, 2020 at 2:16 PM Tomo Suzuki  wrote:
>>>
>>> Hi Beam developers,
>>>
>>> Shall we use GCP Libraries BOM [1] to specify the Google-related library 
>>> versions in Beam?
>>>
>>> I've been working on Beam's dependency upgrades for the past few months.
>>> It's time to consider a long-term solution that keeps the libraries up to date
>>> with little maintenance effort. To achieve that, I propose that Beam use the GCP
>>> Libraries BOM to set the Google-related library versions, rather than the
>>> current approach of changing each of the ~30 Google libraries with
>>> individual PRs [2].
>>>
>>> Once the proposal is implemented, the Beam project would upgrade the BOM version
>>> to upgrade these Google-related libraries. We would still need to ensure that the
>>> libraries in the GCP Libraries BOM are compatible with Beam's other dependencies.
>>> (Linkage Checker will help with this job.) I believe adopting the GCP
>>> Libraries BOM will resolve many of the incompatibilities we have seen in
>>> gax, gRPC, google-cloud-core, and so on, with minimal effort from Beam's
>>> developers.
>>>
>>> I created an issue to track this: BEAM-9444 [3]. I would appreciate it if you
>>> could share questions or feedback (thumbs-up / concerns).
>>>
>>> [1]: 
>>> https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM
>>> [2]: https://github.com/apache/beam/pulls?page=1=is%3Apr+author%3Asuztomo
>>> [3]: https://issues.apache.org/jira/browse/BEAM-9444
>>>
>>> --
>>> Regards,
>>> Tomo

Re: [Discuss] Propose Calcite Vendor Release (1.22.0)

2020-03-05 Thread Ismaël Mejía

The Calcite vote has already passed, so this is good to go. Thanks for
volunteering, Rui.
https://lists.apache.org/thread.html/r4962a4a2bacf481f2ee1064806b78829d96385c2e4a3c0ecb24a55a2%40%3Cdev.calcite.apache.org%3E

On Thu, Mar 5, 2020 at 8:10 AM Kai Jiang  wrote:
>
> Thanks, Rui! Big +1 for the Calcite vendor release (1.22.0).
> Curious, what's the progress of the official Calcite 1.22.0 release? I saw
> the Calcite community just passed the vote for 1.22.0 rc3.
>
> Best,
> Kai
>
>
> On Wed, Mar 4, 2020 at 9:24 PM Rui Wang  wrote:
>>
>> Hi Community,
>>
>> As Calcite is close to finishing its 1.22.0 release, I want to propose a
>> Calcite vendor release, and I volunteer to be the release manager.
>>
>> I will wait until next Monday (03/09) to kick off the release if there are no
>> objections.
>>
>>
>> Best,
>> Rui Wang

[DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Taher Koitawala

Hi All,
 We have been using Apache Beam extensively to process huge amounts
of data. While Beam is really powerful and can solve a huge number of use
cases, a Beam job's development and testing time is significantly high.

   This gap can be filled with Beam SQL, where a complete SQL-based
interface can reduce development and testing time to a matter of minutes. It
also makes Apache Beam more user-friendly, so that a wide audience
with different analytical skill sets can interact with it.

Beam SQL currently still needs to be used programmatically, so I
propose the following additions/improvements.

*Note: Whilst the examples given below are GCP-biased, they apply to
other sources in a generic manner.*

For example: imagine a user who wants to write a stream processing job on
Google Cloud Dataflow. The user wants to process credit card transaction
streams from Google Cloud PubSub (something like Kafka) and enrich each
record of the stream with some data that is stored in Google Cloud Spanner.
After enrichment, the user wishes to write the enriched data to Google
Cloud BigQuery.

Given below are the queries which the user should be able to run on Beam;
the rest should be handled automatically by the framework.

//Infer schema from Spanner table upon table creation
CREATE TABLE SPANNER_CARD_INFO
OPTIONS (
  ProjectId: "gcp-project",
  InstanceId: "spanner-instance-id",
  Database: "some-database",
  Table: "card_info",
  CloudResource: "SPANNER",
  CreateTableIfNotExists: "FALSE"
)

//Apply schema to each record read from pubsub, and then apply SQL.
CREATE TABLE TRANSACTIONS_PUBSUB_TOPIC
OPTIONS (
  ProjectId: "gcp-project",
  Topic: "card-transactions",
  CloudResource: "PUBSUB",
  SubscriptionId: "subscriptionId-1",
  CreateTopicIfNotExists: "FALSE",
  CreateSubscriptionIfNotExist: "TRUE",
  RecordType: "JSON", //Possible values: Avro, JSON, TSV, etc.
  JsonRecordSchema: '{
    "card_number": "INT",
    "amount": "DOUBLE",
    "eventTimeStamp": "EVENT_TIME"
  }'
)

//Create table in BigQuery if not exists and insert
CREATE TABLE TRANSACTION_HISTORY
OPTIONS (
  ProjectId: "gcp-project",
  CloudResource: "BIGQUERY",
  dataset: "dataset1",
  table: "table1",
  CreateTableIfNotExists: "TRUE",
  TableSchema: '{
    "card_number": "INT",
    "first_name": "STRING",
    "last_name": "STRING",
    "phone": "INT",
    "city": "STRING",
    "amount": "FLOAT",
    "eventtimestamp": "INT"
  }'
)

//Actual query that should get translated into a Beam DAG
INSERT INTO TRANSACTION_HISTORY
SELECT
  pubsub.card_number, spanner.first_name, spanner.last_name, spanner.phone,
  spanner.city, pubsub.amount, pubsub.eventTimeStamp
FROM TRANSACTIONS_PUBSUB_TOPIC pubsub JOIN SPANNER_CARD_INFO spanner ON
(pubsub.card_number = spanner.card_number);



Also consider that if any of the sources or sinks change, we only need to change
the SQL and we are done!

Please let me know your thoughts about this.

Regards,
Taher Koitawala

Re: [EXTERNAL] Re: Java Build broken

2020-03-05 Thread Maximilian Michels

Good find, Thomas! It looks like it is for testing releases because they 
are staged to this repository. IMHO there is no need for it to be 
enabled by default.

-Max

On 04.03.20 23:06, Thomas Weise wrote:
I ran into this problem today and found that removing 
https://oss.sonatype.org/content/repositories/staging/ from buildSrc/src/main/groovy/org/apache/beam/gradle/Repositories.groovy 
also resolves the issue.

Is it possible that a flaky repository can poison the gradle cache? Do 
we need this repository entry at all?

On Tue, Mar 3, 2020 at 7:06 AM Pulasthi Supun Wickramasinghe <pulasthi...@gmail.com> wrote:

Thanks, that seems to have fixed the issue.

Best Regards,
Pulasthi

On Tue, Mar 3, 2020 at 5:47 AM Kamil Wasilewski <kamil.wasilew...@polidea.com> wrote:

I had the same problem, it seems that removing Gradle's cache
(`rm -rf ~/.gradle/caches`) solved the issue.

On Tue, Feb 25, 2020 at 4:33 PM Pulasthi Supun Wickramasinghe <pulasthi...@gmail.com> wrote:

Hi Stefan,

Yes, I am also still getting this error on my local setup.
Strangely, however, I am not getting it on my laptop. I
tried manually installing the missing 'error_prone'
dependencies to Maven, but then got some other error.
Might this be some kind of cache issue?

Best Regards,
Pulasthi

On Tue, Feb 25, 2020 at 5:38 AM Stefan Djelekar <stefan.djele...@redbox.com> wrote:

Hi all,

No, this is not yet fixed.

@Pulasthi do you still get the same error?

@Maximilian I don’t have any overrides.

It looks like the localhost build references

https://oss.sonatype.org/content/repositories/staging/com/google/errorprone/error_prone_check_api/2.3.4/

instead of

https://mvnrepository.com/artifact/com.google.errorprone/error_prone_check_api/2.3.4

and the first link returns 404.

Can you please advise?

All the best,

Stefan

*From:* Pulasthi Supun Wickramasinghe <pulasthi...@gmail.com>
*Sent:* Tuesday, February 18, 2020 5:11 PM
*To:* dev <dev@beam.apache.org>
*Cc:* Stefan Djelekar <stefan.djele...@redbox.com>
*Subject:* [EXTERNAL] Re: Java Build broken

Hi All,

Was this issue resolved? I suddenly started to get the same error
on my local build.

Best Regards,

Pulasthi

On Thu, Jan 23, 2020 at 10:17 AM Maximilian Michels <m...@apache.org> wrote:

Do you have any overrides in your ~/.m2/settings.xml? The artifacts
should be found as part of Maven central, e.g.

https://mvnrepository.com/artifact/com.google.errorprone/error_prone_check_api

Cheers,
Max

On 23.01.20 11:11, Stefan Djelekar wrote:
 > Hi guys,
 >
 > It’s been days now since the build for the Java SDK broke.
 >
 > Namely, the pipeline is successful on Jenkins, but it fails on my localhost
 > with an error in task model:pipeline:compileJava. As I've seen, the last couple
 > of builds were served from cache, so maybe that is the reason why it's
 > green. I confirmed the same thing happened to other devs as well.
 >
 > 22:49:34 > Task :model:pipeline:compileJava FROM-CACHE
 >
 > It looks like it’s related to a mismatch of the com.google.errorprone library
 > version. Can someone please take a look, as this is a blocker to
 > localhost development?
 >
 > Cheers,
 >
 > *Stefan Đelekar*
 >
 > Software Engineer
 >
 > Mobile +381 65 22 33 293
 >
 >

Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Andrew Pilloud

I believe we have this functionality already:
https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/

Existing GCP tables can also be loaded through the GCP datacatalog
metastore. What are you proposing that is new?

Andrew
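
For readers comparing the two syntaxes, here is a rough sketch of how the
Pub/Sub source from the proposal might be declared with the existing
CREATE EXTERNAL TABLE extension linked above. The project and topic names
are taken from the proposal; the payload field names and types are
assumptions made for illustration, and the exact grammar should be checked
against the linked page.

```sql
-- Sketch only: registers an existing Pub/Sub topic as a Beam SQL table.
-- The three-column shape (event_timestamp, attributes, payload) follows the
-- documented Pub/Sub table layout; the payload fields are assumed.
CREATE EXTERNAL TABLE transactions_pubsub_topic (
  event_timestamp TIMESTAMP,
  attributes MAP<VARCHAR, VARCHAR>,
  payload ROW<card_number BIGINT, amount DOUBLE>
)
TYPE pubsub
LOCATION 'projects/gcp-project/topics/card-transactions';

-- Message fields are then read through the payload row.
SELECT t.payload.card_number, t.payload.amount, t.event_timestamp
FROM transactions_pubsub_topic t;
```

Unlike the proposed OPTIONS block, the source kind is given by TYPE and the
resource by LOCATION, and the schema is written as ordinary column
definitions rather than as an embedded JSON string.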



Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Taher Koitawala

Also auto creation is not there

On Thu, Mar 5, 2020 at 3:59 PM Taher Koitawala  wrote:

> The proposal is to add more sources and also to add event-time or
> processing-time enhancements on top of them later.
>
> On Thu, Mar 5, 2020 at 3:50 PM Andrew Pilloud  wrote:
>
>> I believe we have this functionality already:
>> https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/
>>
>> Existing GCP tables can also be loaded through the GCP datacatalog
>> metastore. What are you proposing that is new?
>>
>> Andrew

Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Taher Koitawala

The proposal is to add more sources and also to add event-time or
processing-time enhancements on top of them later.

On Thu, Mar 5, 2020 at 3:50 PM Andrew Pilloud  wrote:

> I believe we have this functionality already:
> https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/
>
> Existing GCP tables can also be loaded through the GCP datacatalog
> metastore. What are you proposing that is new?
>
> Andrew

No space left on device - beam-jenkins 1 and 7

2020-03-05 Thread Michał Walenia

Hi there,
it seems we have a problem with Jenkins workers again. Nodes 1 and 7 both
fail jobs with "No space left on device".
Who is the best person to contact in these cases (someone with access
permissions to the workers)?

I have also noticed that such errors have become more and more frequent
recently, and I'd like to discuss how this can be remedied. Can a cleanup
task be automated on Jenkins somehow?

Regards
Michal

-- 

Michał Walenia
Polidea  | Software Engineer

M: +48 791 432 002 <+48791432002>
E: michal.wale...@polidea.com


Re: Proposal: Beam to use GCP Libraries BOM

2020-03-05 Thread Filipe Regadas

Big +1, this is a step in the right direction. Checking against Beam's
other direct and transitive deps is crucial, since the referenced BOM
covers only a small part of them. Apache Commons, Jackson, `com.google.{api,
apis, cloud}`, and slf4j come to mind.

Filipe Regadas



Re: [VOTE] Upgrade gradle to 6.2

2020-03-05 Thread Alex Van Boxel

I will

 _/
_/ Alex Van Boxel


On Thu, Mar 5, 2020 at 8:17 PM Ismaël Mejía  wrote:

> Looks like we have consensus on this one. Can you create a JIRA to
> track this, Alex?
> For those interested in the improvements we can gain with the move to
> version 6.x.x, I found this interesting presentation and associated repo:
> https://melix.github.io/gradle-6-whats-new/#/
>
> https://github.com/melix/gradle-6-whats-new/tree/master/demos/hello-gradle-6
>
> On Tue, Feb 25, 2020 at 7:24 PM Luke Cwik  wrote:
> >
> > +1
> >
> > On Tue, Feb 25, 2020 at 12:49 AM Gleb Kanterov  wrote:
> >>
> >> +1 (non-binding)
> >>
> >> On Tue, Feb 25, 2020 at 9:38 AM Ismaël Mejía  wrote:
> >>>
> >>> +1 great to have our build updated, please share if there are new
> >>> interesting features/plugin advantages we can benefit from too.
> >>>
> >>> On Tue, Feb 25, 2020 at 8:24 AM Jean-Baptiste Onofré wrote:
> >>>>
> >>>> Hi Alex
> >>>>
> >>>> I also have a couple of contacts at Gradle. Let me know if needed.
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On Tue, Feb 25, 2020 at 08:20, Alex Van Boxel wrote:
> >>>>>
> >>>>> OK, great. I know someone who works at Gradle, so I can ping them
> >>>>> when I have some problems.
> >>>>>
> >>>>> If there are any other known pitfalls I can expect, let me know :-)
> >>>>>
> >>>>>  _/
> >>>>> _/ Alex Van Boxel
> >>>>>
> >>>>>
> >>>>> On Tue, Feb 25, 2020 at 7:20 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> It makes sense.
> >>>>>>
> >>>>>> Thanks.
> >>>>>> Regards
> >>>>>> JB
> >>>>>>
> >>>>>> On Mon, Feb 24, 2020 at 22:37, Alex Van Boxel wrote:
> >>>>>>>
> >>>>>>> Any objections to me upgrading Gradle to 6.2? If OK, this will be
> >>>>>>> done over several commits where I will:
> >>>>>>>
> >>>>>>> Upgrade plugins
> >>>>>>> Upgrade gradle to 6.2
> >>>>>>> See where we can use some of the new features
> >>>>>>>
> >>>>>>>
> >>>>>>>  _/
> >>>>>>> _/ Alex Van Boxel

Re: [Discuss] Propose Calcite Vendor Release (1.22.0)

2020-03-05 Thread Xinyu Liu

Thanks, Rui! We've been waiting for the new version of Calcite which has
the fix to unflatten the fields. Seems this version will come with it.

Thanks,
Xinyu


Re: Proposal: Beam to use GCP Libraries BOM

2020-03-05 Thread Tomo Suzuki

> Do Spark or Flink have BOMs?

Not that I know of. I couldn't find "bom" in their artifacts [1, 2].

[1]: https://search.maven.org/search?q=g:org.apache.flink
[2]: https://search.maven.org/search?q=g:org.apache.spark


On Thu, Mar 5, 2020 at 1:46 PM Kenneth Knowles  wrote:

> +1 and you have phrased the benefits and limitations well. We have plenty
> of not-Google-related dependencies that use Guava and protobuf (I know of
> Calcite, Cassandra, Kinesis, and Spark) so there's still work in managing
> deps, but the BOM should make it a lot easier to upgrade all these tightly
> coupled libraries that Google ships from their head.
>
> Do Spark or Flink have BOMs? I wonder if there's an opportunity to catch
> incompatible deps at a larger scale by comparing and merging a half dozen
> BOMs (although in the limit it approximately expands to one per runner and
> one per IO and projects mature and become independent)
>
> Kenn
>
> On Thu, Mar 5, 2020 at 10:05 AM Luke Cwik  wrote:
>
>> How would the Apache Beam BOM and GCP BOM work together?
>>

-- 
Regards,
Tomo

Re: [VOTE] Upgrade gradle to 6.2

2020-03-05 Thread Alex Van Boxel

https://issues.apache.org/jira/browse/BEAM-9456

I'll take it step by step. Expect slow progress, as until now I haven't focused
on Python, Go and the other runners, so please be patient.


 _/
_/ Alex Van Boxel


On Thu, Mar 5, 2020 at 9:13 PM Jean-Baptiste Onofre  wrote:

> Fair enough, we have the consensus, so agree to create Jira and move
> forward about this update.
>
> Regards
> JB

Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Andrew Pilloud

For BigQueryIO, "CREATE EXTERNAL TABLE" does exactly what you describe in
"CREATE TABLE". You could add a table property to set the CreateDisposition
if you wanted to change that behavior.
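
To make the table-property idea concrete, here is a rough sketch in Python (a model only, not Beam code). It assumes the table properties arrive as a JSON object string, and it uses a hypothetical property key "createDisposition"; no such key exists today as far as this thread says. The two values are the names of BigQueryIO's CreateDisposition enum, and the default mirrors the create-as-needed behaviour described above.

```python
import json

# Names of BigQueryIO's CreateDisposition values.
VALID_DISPOSITIONS = {"CREATE_IF_NEEDED", "CREATE_NEVER"}

# Assumed default: the table is created as needed unless told otherwise.
DEFAULT_DISPOSITION = "CREATE_IF_NEEDED"


def create_disposition(tblproperties_json):
    """Pick a create disposition from a JSON table-properties string.

    "createDisposition" is a hypothetical property key used for illustration.
    """
    props = json.loads(tblproperties_json) if tblproperties_json else {}
    value = props.get("createDisposition", DEFAULT_DISPOSITION)
    if value not in VALID_DISPOSITIONS:
        raise ValueError("unknown create disposition: %r" % (value,))
    return value
```

With this model, a table declared without the property keeps the current behaviour, and only users who set it to CREATE_NEVER opt out of creation.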

Andrew

On Thu, Mar 5, 2020 at 11:10 AM Rui Wang  wrote:

> "CREATE TABLE" can be used to indicate that, if a table does not exist, BeamSQL
> will create it in the storage system if allowed, while "CREATE EXTERNAL
> TABLE" can be used only for registering a table, whether or not the table
> exists. That way BeamSQL provides a finer-grained way to distinguish
> the different behaviours.
>
> In both cases BeamSQL does not store the table. Another approach is to
> leverage the options/table property to specify the expected behaviour.
>
>
> -Rui
>
> On Thu, Mar 5, 2020 at 10:55 AM Andrew Pilloud 
> wrote:
>
>> I'm not following the "CREATE TABLE" vs "CREATE EXTERNAL TABLE"
>> distinction. We added the "EXTERNAL" to make it clear that Beam wasn't
>> storing the table. Most of our current table providers will create the
>> underlying table as needed.
>>
>> Andrew
>>
>> On Thu, Mar 5, 2020 at 10:47 AM Rui Wang  wrote:
>>
>>> There are two new things in the proposal:
>>> 1. A Spanner source in SQL. (You are welcome to contribute it.)
>>> 2. A CREATE TABLE statement rather than CREATE EXTERNAL TABLE (the difference is
>>> whether the table is assumed to exist or not)
>>>
>>>
>>> There is a table property in the statement already that you can reuse to
>>> save your options.
>>>
>>>
>>> -Rui
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Mar 5, 2020 at 2:30 AM Taher Koitawala 
>>> wrote:
>>>
 Also auto creation is not there

 On Thu, Mar 5, 2020 at 3:59 PM Taher Koitawala 
 wrote:

> The proposal is to add more sources and also to add event-time or
> processing-time enhancements on top of them
>
> On Thu, Mar 5, 2020 at 3:50 PM Andrew Pilloud 
> wrote:
>
>> I believe we have this functionality already:
>> https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/
>>
>> Existing GCP tables can also be loaded through the GCP datacatalog
>> metastore. What are you proposing that is new?
>>
>> Andrew
>>
>>
>> On Thu, Mar 5, 2020, 12:29 AM Taher Koitawala 
>> wrote:
>>
>>> Hi All,
>>> We have been using Apache Beam extensively to process huge
>>> amounts of data. While Beam is really powerful and can solve a huge number
>>> of use cases, a Beam job's development and testing time is significantly
>>> high.
>>>
>>> This gap can be filled with Beam SQL, where a complete SQL-based
>>> interface can reduce development and testing time to a matter of minutes. It
>>> also makes Apache Beam more user friendly, letting a wide audience
>>> with different analytical skill sets interact with it.
>>>
>>> The current Beam SQL still needs to be used programmatically, and
>>> so I propose the following additions/improvements.
>>>
>>> *Note: Whilst the examples given below are GCP-biased, they
>>> apply to other sources in a generic manner.*
>>>
>>> For example: imagine a user who wants to write a stream processing
>>> job on Google Cloud Dataflow. The user wants to process credit card
>>> transaction streams from Google Cloud PubSub (something like Kafka) and
>>> enrich each record of the stream with some data that is stored in Google
>>> Cloud Spanner. After enrichment, the user wishes to write the resulting data
>>> to Google Cloud BigQuery.
>>>
>>> Given below are the queries that the user should be able to fire on
>>> Beam; the rest should be handled automatically by the framework.
>>>
>>> // Infer schema from Spanner table upon table creation
>>> CREATE TABLE SPANNER_CARD_INFO
>>> OPTIONS (
>>>   ProjectId: “gcp-project”,
>>>   InstanceId: “spanner-instance-id”,
>>>   Database: “some-database”,
>>>   Table: “card_info”,
>>>   CloudResource: “SPANNER”,
>>>   CreateTableIfNotExists: “FALSE”
>>> )
>>>
>>> // Apply schema to each record read from pubsub, and then apply SQL.
>>> CREATE TABLE TRANSACTIONS_PUBSUB_TOPIC
>>> OPTIONS (
>>>   ProjectId: “gcp-project”,
>>>   Topic: “card-transactions”,
>>>   CloudResource: “PUBSUB”,
>>>   SubscriptionId: “subscriptionId-1”,
>>>   CreateTopicIfNotExists: “FALSE”,
>>>   CreateSubscriptionIfNotExist: “TRUE”,
>>>   RecordType: “JSON”, // Possible values: Avro, JSON, TSV, etc.
>>>   JsonRecordSchema: “{
>>>     “CardNumber”: “INT”,
>>>     “Amount”: “DOUBLE”,
>>>     “eventTimeStamp”: “EVENT_TIME”
>>>   }”
>>> )
>>>
>>> //Create table in BigQuery if not exists and insert
>>>
>>> CREATE TABLE TRANSACTION_HISTORY
>>>
>>>

Re: [VOTE] Vendored Dependencies Release gRPC 1.26.0 v0.3 for BEAM-9288 RC #3

2020-03-05 Thread Ismaël Mejía

+1 (binding)

Verified signatures
Verified that there are no conscrypt classes or binaries in jar
Verified pom.xml has runtime dependency on conscrypt
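
For anyone who wants to repeat the second check, a minimal sketch of one way to scan a jar for conscrypt entries. This is not necessarily how the check above was done: the case-insensitive substring match on "conscrypt" and the jar file name in the usage note are assumptions for illustration.

```python
import zipfile


def find_conscrypt_entries(jar_path):
    """Return the names of all jar entries whose path mentions conscrypt."""
    # A jar is a zip archive, so the standard zipfile module can list it.
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if "conscrypt" in name.lower()]
```

Calling find_conscrypt_entries on the staged jar (for example a local copy named beam-vendor-grpc-1_26_0-0.3.jar, a hypothetical file name) should return an empty list if no conscrypt classes or native binaries were bundled.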

On Thu, Mar 5, 2020 at 9:14 PM Jean-Baptiste Onofre  wrote:
>
> +1 (binding)
>
> Regards
> JB
>
> On Mar 5, 2020, at 19:55, Luke Cwik wrote:
>
> Please review the release of the following artifacts that we vendor:
>  * beam-vendor-grpc-1_26_0
>
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 0.3, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * the official Apache source release to be deployed to dist.apache.org [1], 
> which is signed with the key with fingerprint 
> EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
> * all artifacts to be deployed to the Maven Central Repository [3],
> * commit hash "28d05d317a39b5d60a31988c69d1fb8a9c5006fc" [4],
>
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
> [3] https://repository.apache.org/content/repositories/orgapachebeam-1098/
> [4] 
> https://github.com/apache/beam/commit/28d05d317a39b5d60a31988c69d1fb8a9c5006fc
>
>

Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Rui Wang

"CREATE TABLE" can be used to indicate that, if a table does not exist, BeamSQL
will create it in the storage system if allowed, while "CREATE EXTERNAL
TABLE" can be used only for registering a table, whether or not the table
exists. That way BeamSQL provides a finer-grained way to distinguish
the different behaviours.

In both cases BeamSQL does not store the table. Another approach is to
leverage the options/table property to specify the expected behaviour.
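
As a rough illustration of the distinction, here is a small Python model of the two statements as described in this message. It is only a sketch of the proposed semantics, not Beam code; the function name, its parameters and the action strings are all made up for the example.

```python
def plan_table_actions(statement, table_exists, creation_allowed=True):
    """Return the actions BeamSQL would take under the proposed semantics.

    Hypothetical model: both statements register the table, and only
    CREATE TABLE may also create a missing table in the storage system.
    """
    if statement not in ("CREATE TABLE", "CREATE EXTERNAL TABLE"):
        raise ValueError("unsupported statement: %r" % (statement,))
    actions = ["register"]
    if statement == "CREATE TABLE" and not table_exists and creation_allowed:
        actions.append("create in storage system")
    return actions
```

In this model neither statement stores the table in BeamSQL itself, which matches the point below; the only difference is whether a missing table gets created.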


-Rui

On Thu, Mar 5, 2020 at 10:55 AM Andrew Pilloud  wrote:

> I'm not following the "CREATE TABLE" vs "CREATE EXTERNAL TABLE"
> distinction. We added the "EXTERNAL" to make it clear that Beam wasn't
> storing the table. Most of our current table providers will create the
> underlying table as needed.
>
> Andrew
>
> On Thu, Mar 5, 2020 at 10:47 AM Rui Wang  wrote:
>
>> There are two new things in the proposal:
>> 1. A Spanner source in SQL. (You are welcome to contribute it.)
>> 2. A CREATE TABLE statement rather than CREATE EXTERNAL TABLE (the difference is
>> whether the table is assumed to exist or not)
>>
>>
>> There is a table property in the statement already that you can reuse to
>> save your options.
>>
>>
>> -Rui
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 5, 2020 at 2:30 AM Taher Koitawala 
>> wrote:
>>
>>> Also auto creation is not there
>>>
>>> On Thu, Mar 5, 2020 at 3:59 PM Taher Koitawala 
>>> wrote:
>>>
 The proposal is to add more sources and also to add event-time or
 processing-time enhancements on top of them

 On Thu, Mar 5, 2020 at 3:50 PM Andrew Pilloud 
 wrote:

> I believe we have this functionality already:
> https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/
>
> Existing GCP tables can also be loaded through the GCP datacatalog
> metastore. What are you proposing that is new?
>
> Andrew
>
>
> On Thu, Mar 5, 2020, 12:29 AM Taher Koitawala 
> wrote:
>
>> Hi All,
>> We have been using Apache Beam extensively to process huge
>> amounts of data. While Beam is really powerful and can solve a huge number
>> of use cases, a Beam job's development and testing time is significantly
>> high.
>>
>> This gap can be filled with Beam SQL, where a complete SQL-based
>> interface can reduce development and testing time to a matter of minutes. It
>> also makes Apache Beam more user friendly, letting a wide audience
>> with different analytical skill sets interact with it.
>>
>> The current Beam SQL still needs to be used programmatically, and
>> so I propose the following additions/improvements.
>>
>> *Note: Whilst the examples given below are GCP-biased, they apply
>> to other sources in a generic manner.*
>>
>> For example: imagine a user who wants to write a stream processing
>> job on Google Cloud Dataflow. The user wants to process credit card
>> transaction streams from Google Cloud PubSub (something like Kafka) and
>> enrich each record of the stream with some data that is stored in Google
>> Cloud Spanner. After enrichment, the user wishes to write the resulting data
>> to Google Cloud BigQuery.
>>
>> Given below are the queries that the user should be able to fire on
>> Beam; the rest should be handled automatically by the framework.
>>
>> // Infer schema from Spanner table upon table creation
>> CREATE TABLE SPANNER_CARD_INFO
>> OPTIONS (
>>   ProjectId: “gcp-project”,
>>   InstanceId: “spanner-instance-id”,
>>   Database: “some-database”,
>>   Table: “card_info”,
>>   CloudResource: “SPANNER”,
>>   CreateTableIfNotExists: “FALSE”
>> )
>>
>> // Apply schema to each record read from pubsub, and then apply SQL.
>> CREATE TABLE TRANSACTIONS_PUBSUB_TOPIC
>> OPTIONS (
>>   ProjectId: “gcp-project”,
>>   Topic: “card-transactions”,
>>   CloudResource: “PUBSUB”,
>>   SubscriptionId: “subscriptionId-1”,
>>   CreateTopicIfNotExists: “FALSE”,
>>   CreateSubscriptionIfNotExist: “TRUE”,
>>   RecordType: “JSON”, // Possible values: Avro, JSON, TSV, etc.
>>   JsonRecordSchema: “{
>>     “CardNumber”: “INT”,
>>     “Amount”: “DOUBLE”,
>>     “eventTimeStamp”: “EVENT_TIME”
>>   }”
>> )
>>
>> // Create table in BigQuery if not exists and insert
>> CREATE TABLE TRANSACTION_HISTORY
>> OPTIONS (
>>   ProjectId: “gcp-project”,
>>   CloudResource: “BIGQUERY”,
>>   dataset: “dataset1”,
>>   table: “table1”,
>>   CreateTableIfNotExists: “TRUE”,
>>   TableSchema: “
>>   {
>>     “card_number”: “INT”,
>>     “first_name”: “STRING”,
>>     “last_name”: “STRING”,
>>     “phone”: “INT”,
>>
>>

Re: [VOTE] Upgrade gradle to 6.2

2020-03-05 Thread Jean-Baptiste Onofre

Fair enough, we have consensus, so I agree to create a Jira and move forward
with this update.

Regards
JB

> On Mar 5, 2020, at 20:16, Ismaël Mejía wrote:
> 
> Looks like we have consensus on this one. Can you create a JIRA to
> track this, Alex?
> I found this interesting presentation and associated repo for those
> interested in the new improvements we can gain with the move to version
> 6.x.x:
> https://melix.github.io/gradle-6-whats-new/#/
> https://github.com/melix/gradle-6-whats-new/tree/master/demos/hello-gradle-6
> 
> On Tue, Feb 25, 2020 at 7:24 PM Luke Cwik  wrote:
>> 
>> +1
>> 
>> On Tue, Feb 25, 2020 at 12:49 AM Gleb Kanterov  wrote:
>>> 
>>> +1 (non-binding)
>>> 
>>> On Tue, Feb 25, 2020 at 9:38 AM Ismaël Mejía  wrote:
 
 +1 great to have our build updated, please share if there are new 
 interesting features/plugin advantages we can benefit from too.
 
 On Tue, Feb 25, 2020 at 8:24 AM Jean-Baptiste Onofré  
 wrote:
> 
> Hi Alex
> 
> I also have a couple of contacts at Gradle. Let me know if needed.
> 
> Regards
> JB
> 
> On Tue, Feb 25, 2020 at 08:20, Alex Van Boxel wrote:
>> 
>> OK, great. I know someone who works at Gradle, so I can ping them when
>> I have problems.
>> 
>> If there are any other known pitfalls I can expect, let me know :-)
>> 
>> _/
>> _/ Alex Van Boxel
>> 
>> 
>> On Tue, Feb 25, 2020 at 7:20 AM Jean-Baptiste Onofré wrote:
>> 
>> +1
>> 
>> It makes sense.
>> 
>> Thanks.
>> Regards
>> JB
>> 
>> On Mon, Feb 24, 2020 at 22:37, Alex Van Boxel wrote:
>> 
>> Any objections to my upgrading Gradle to 6.2? If that's OK, this will be done
>> over several commits in which I will:
>> 
>> Upgrade plugins
>> Upgrade gradle to 6.2
>> See where we can use some of the new features
>> 
>> 
>> _/
>> _/ Alex Van Boxel

Re: [VOTE] Vendored Dependencies Release gRPC 1.26.0 v0.3 for BEAM-9288 RC #3

2020-03-05 Thread Jean-Baptiste Onofre

+1 (binding)

Regards
JB

> On Mar 5, 2020, at 19:55, Luke Cwik wrote:
> 
> Please review the release of the following artifacts that we vendor:
>  * beam-vendor-grpc-1_26_0
> 
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 0.3, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
> The complete staging area is available for your review, which includes:
> * the official Apache source release to be deployed to dist.apache.org 
>  [1], which is signed with the key with fingerprint 
> EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
> * all artifacts to be deployed to the Maven Central Repository [3],
> * commit hash "28d05d317a39b5d60a31988c69d1fb8a9c5006fc" [4],
> 
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Release Manager
> 
> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/ 
> 
> [2] https://dist.apache.org/repos/dist/release/beam/KEYS 
> 
> [3] https://repository.apache.org/content/repositories/orgapachebeam-1098/ 
> 
> [4] 
> https://github.com/apache/beam/commit/28d05d317a39b5d60a31988c69d1fb8a9c5006fc
>  
>

Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Rui Wang

Back to this proposal, I think it's ok if there is a need to
further distinguish the create/not create behaviour by either options or
using "create external table/create table".

-Rui

On Thu, Mar 5, 2020 at 11:19 AM Andrew Pilloud  wrote:

> For BigQueryIO, "CREATE EXTERNAL TABLE" does exactly what you describe in
> "CREATE TABLE". You could add a table property to set the CreateDisposition
> if you wanted to change that behavior.
>
> Andrew
>
> On Thu, Mar 5, 2020 at 11:10 AM Rui Wang  wrote:
>
>> "CREATE TABLE" can be used to indicate that, if a table does not exist, BeamSQL
>> will create it in the storage system if allowed, while "CREATE EXTERNAL
>> TABLE" can be used only for registering a table, whether or not the table
>> exists. That way BeamSQL provides a finer-grained way to distinguish
>> the different behaviours.
>>
>> In both cases BeamSQL does not store the table. Another approach is to
>> leverage the options/table property to specify the expected behaviour.
>>
>>
>> -Rui
>>
>> On Thu, Mar 5, 2020 at 10:55 AM Andrew Pilloud 
>> wrote:
>>
>>> I'm not following the "CREATE TABLE" vs "CREATE EXTERNAL TABLE"
>>> distinction. We added the "EXTERNAL" to make it clear that Beam wasn't
>>> storing the table. Most of our current table providers will create the
>>> underlying table as needed.
>>>
>>> Andrew
>>>
>>> On Thu, Mar 5, 2020 at 10:47 AM Rui Wang  wrote:
>>>
 There are two new things in the proposal:
 1. A Spanner source in SQL. (You are welcome to contribute it.)
 2. A CREATE TABLE statement rather than CREATE EXTERNAL TABLE (the difference is
 whether the table is assumed to exist or not)

 There is a table property in the statement already that you can reuse
 to save your options.

 -Rui

 On Thu, Mar 5, 2020 at 2:30 AM Taher Koitawala 
 wrote:

> Also auto creation is not there
>
> On Thu, Mar 5, 2020 at 3:59 PM Taher Koitawala 
> wrote:
>
>> The proposal is to add more sources and also to add event-time or
>> processing-time enhancements on top of them
>>
>> On Thu, Mar 5, 2020 at 3:50 PM Andrew Pilloud 
>> wrote:
>>
>>> I believe we have this functionality already:
>>> https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/
>>>
>>> Existing GCP tables can also be loaded through the GCP datacatalog
>>> metastore. What are you proposing that is new?
>>>
>>> Andrew
>>>
>>>
>>> On Thu, Mar 5, 2020, 12:29 AM Taher Koitawala 
>>> wrote:
>>>
 Hi All,
 We have been using Apache Beam extensively to process huge
 amounts of data. While Beam is really powerful and can solve a huge number
 of use cases, a Beam job's development and testing time is significantly
 high.

 This gap can be filled with Beam SQL, where a complete SQL-based
 interface can reduce development and testing time to a matter of minutes. It
 also makes Apache Beam more user friendly, letting a wide audience
 with different analytical skill sets interact with it.

 The current Beam SQL still needs to be used programmatically,
 and so I propose the following additions/improvements.

 *Note: Whilst the examples given below are GCP-biased, they
 apply to other sources in a generic manner.*

 For example: imagine a user who wants to write a stream processing
 job on Google Cloud Dataflow. The user wants to process credit card
 transaction streams from Google Cloud PubSub (something like Kafka) and
 enrich each record of the stream with some data that is stored in Google
 Cloud Spanner. After enrichment, the user wishes to write the resulting data
 to Google Cloud BigQuery.

 Given below are the queries that the user should be able to fire
 on Beam; the rest should be handled automatically by the framework.

 // Infer schema from Spanner table upon table creation
 CREATE TABLE SPANNER_CARD_INFO
 OPTIONS (
   ProjectId: “gcp-project”,
   InstanceId: “spanner-instance-id”,
   Database: “some-database”,
   Table: “card_info”,
   CloudResource: “SPANNER”,
   CreateTableIfNotExists: “FALSE”
 )

 // Apply schema to each record read from pubsub, and then apply SQL.
 CREATE TABLE TRANSACTIONS_PUBSUB_TOPIC
 OPTIONS (
   ProjectId: “gcp-project”,
   Topic: “card-transactions”,
   CloudResource: “PUBSUB”,
   SubscriptionId: “subscriptionId-1”,
   CreateTopicIfNotExists: “FALSE”,

Re: [VOTE] Upgrade gradle to 6.2

2020-03-05 Thread Ismaël Mejía

Looks like we have consensus on this one. Can you create a JIRA to
track this, Alex?
I found this interesting presentation and associated repo for those
interested in the new improvements we can gain with the move to version
6.x.x:
https://melix.github.io/gradle-6-whats-new/#/
https://github.com/melix/gradle-6-whats-new/tree/master/demos/hello-gradle-6

On Tue, Feb 25, 2020 at 7:24 PM Luke Cwik  wrote:
>
> +1
>
> On Tue, Feb 25, 2020 at 12:49 AM Gleb Kanterov  wrote:
>>
>> +1 (non-binding)
>>
>> On Tue, Feb 25, 2020 at 9:38 AM Ismaël Mejía  wrote:
>>>
>>> +1 great to have our build updated, please share if there are new 
>>> interesting features/plugin advantages we can benefit from too.
>>>
>>> On Tue, Feb 25, 2020 at 8:24 AM Jean-Baptiste Onofré  
>>> wrote:

 Hi Alex

 I also have a couple of contacts at Gradle. Let me know if needed.

 Regards
 JB

 On Tue, Feb 25, 2020 at 08:20, Alex Van Boxel wrote:
>
> OK, great. I know someone who works at Gradle, so I can ping them when I
> have problems.
>
> If there are any other known pitfalls I can expect, let me know :-)
>
>  _/
> _/ Alex Van Boxel
>
>
> On Tue, Feb 25, 2020 at 7:20 AM Jean-Baptiste Onofré wrote:
>
> +1
>
> It makes sense.
>
> Thanks.
> Regards
> JB
>
> On Mon, Feb 24, 2020 at 22:37, Alex Van Boxel wrote:
>
> Any objections to my upgrading Gradle to 6.2? If that's OK, this will be done
> over several commits in which I will:
>
> Upgrade plugins
> Upgrade gradle to 6.2
> See where we can use some of the new features
>
>
>  _/
> _/ Alex Van Boxel

Re: [Discuss] Propose Calcite Vendor Release (1.22.0)

2020-03-05 Thread Robin Qiu

+1

Thanks Rui for proposing this. Bringing in the newest version of Calcite
will also simplify our codebase [1] and resolve some existing issues [2]

[1] https://issues.apache.org/jira/browse/BEAM-9190
[2] https://issues.apache.org/jira/browse/BEAM-9191

On Thu, Mar 5, 2020 at 11:42 AM Xinyu Liu  wrote:

> Thanks, Rui! We've been waiting for the new version of Calcite which has
> the fix to unflatten the fields. Seems this version will come with it.
>
> Thanks,
> Xinyu
>
> On Thu, Mar 5, 2020 at 12:41 AM Ismaël Mejía  wrote:
>
>> The calcite vote already passed so this is good to go, thanks for
>> volunteering Rui.
>>
>> https://lists.apache.org/thread.html/r4962a4a2bacf481f2ee1064806b78829d96385c2e4a3c0ecb24a55a2%40%3Cdev.calcite.apache.org%3E
>>
>> On Thu, Mar 5, 2020 at 8:10 AM Kai Jiang  wrote:
>> >
>> > Thanks, Rui! Big +1 for the Calcite vendor release (1.22.0).
>> > Out of curiosity, what's the progress of the official Calcite 1.22.0 release? I saw
>> > that the Calcite community just passed the vote for 1.22.0 rc3.
>> >
>> > Best,
>> > Kai
>> >
>> >
>> > On Wed, Mar 4, 2020 at 9:24 PM Rui Wang  wrote:
>> >>
>> >> Hi Community,
>> >>
>> >> As Calcite is close to finishing its 1.22.0 release, I want to
>> >> propose a Calcite vendor release, and I volunteer to be the release
>> >> manager.
>> >>
>> >> I will wait until next Monday (03/09) to kick off the release if there
>> >> is no objection.
>> >>
>> >>
>> >> Best,
>> >> Rui Wang
>>
>

Run Python PreCommit break?

2020-03-05 Thread Rui Wang

Hi Community,

Is the Python PreCommit broken? I have observed a consistent test case
failure in
apache_beam.runners.portability.portable_runner_test.PortableRunnerTest.test_group_by_key
[1] on the release branch.


It might have been fixed on the master branch. Does anyone have any insight
into it?




[1]:
https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1535/testReport/junit/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_group_by_key/


-Rui

Re: Contributing Twister2 runner to Apache Beam

2020-03-05 Thread Kenneth Knowles

I agree with both of you, mostly :-)

The monorepo approach doesn't work/scale well for shipped libraries (name a
Google library that silently just works and never causes any dependency
problems) and the pain we feel has been constant and increasing, but I
don't think we are at the breaking point.

But Google's big monorepo [1] demonstrates similar benefits to what Kyle
describes. In the early stages, not having to think too hard about
build/test infra, and being able to share it everywhere, is a big help, and it scales
well. Eventually, shipping test utility libraries and compliance suites can
be equivalent. And to your point - it is very helpful for users to know
that they can use CassandraIO with the other Beam artifacts. This is why
Google requires the whole big repo to depend on a single version of any
externally-controlled artifact. But, yes, as a consequence it is
preposterously difficult to stay up to date, since literally anything can
block progress. You need a unified escalation chain for that policy to make
sense. It is the definition of a healthy Apache project to *not* have that
(PMC is different).

Independent dependencies, independent git histories, and independent
release cadence/process are all separate discussions.

It is a broader question than this particular contribution, so let's merge
this runner before changing our whole way of doing things :-)

Kenn

[1]
https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
(really quite a balanced analysis)

On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver  wrote:

> > Should runners, current and future, be in the same repository as Beam
> > core?
>
> In the distant past, runners lived in their own repositories, and then
> were donated to Beam. But Beam's current uber-repo setup allows a lot of
> convenience. For example, a ton of code (including core functionality and
> tests) is shared directly between runners, which is useful for keeping
> runners up to date and ensuring consistent behavior between them (in other
> words, maintainable and reliable).
>
> Generally, it is up to the authors of a particular Beam related
> project/subproject to decide whether to host their code in Beam or in a
> different repo, and up to the community to decide whether to take on the
> donation, as discussed in previous threads on the Twister2 runner. In this
> case, it seems there is agreement between the Twister2 runner authors and
> the community that the runner can be hosted in Beam proper.
>
> There are examples of successful independent Beam projects, such as
> Spotify's Scio, but having an independent project with its own releases
> requires a lot of dedicated resources, and the bar for entry for extending
> Beam should not be that high. All that's required of subproject authors is
> that they keep the subproject in step with Beam. If they can't maintain it
> any longer, the subproject can be allowed to bitrot without getting in
> anyone's way. On the other hand, I'm not sure of the details with
> Cassandra, but in general, a subproject should not have "the ability to
> block progress" just because it is contained in the Beam uber-repo.
>
> tl;dr Having an uber repo generally seems to work for Beam. Exceptions are
> few enough to be handled on a case-by-case basis.
>
> On Wed, Mar 4, 2020 at 11:12 AM Elliotte Rusty Harold 
> wrote:
>
>> Generic question without commenting on Twister2 specifically:
>>
>> Should runners, current and future, be in the same repository as Beam
>> core? Can or should they be completely separate products with their
>> own release cycles?
>>
>> Generally, loose coupling leads to more maintainable, reliable
>> projects. Specifically, Cassandra is holding back some other changes
>> in Beam and I really wish it didn't have the ability to block
>> progress. The more different runners we have in core, the worse this
>> problem is likely to become.
>>
>>
>> On Wed, Mar 4, 2020 at 2:03 PM Pulasthi Supun Wickramasinghe
>>  wrote:
>> >
>> > Hi
>> >
>> > I believe the pull request is pretty complete now, with the help of
>> > Ismaël. Kenn, would you be able to take a look at it and suggest any
>> > changes if needed? The build checks and validation tests are passing at
>> > the moment. I will start working on the documentation that you mentioned
>> > in an earlier email separately.
>> >
>> > Best Regards,
>> > Pulasthi
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Feb 18, 2020 at 1:45 PM Pulasthi Supun Wickramasinghe <pulasthi...@gmail.com> wrote:
>> >>
>> >> Hi All,
>> >>
>> >> I have created the initial pull request [1] to contribute the Twister2
>> >> Beam runner to the Apache Beam codebase. More information on Twister2 can
>> >> be found here [2] and the Twister2 codebase is available here [3]. At the
>> >> moment only batch mode is supported in the runner, but we are planning to
>> >> add stream support and implement a portable runner for Twister2 in the near
>> >> future.
>> >>
>> >> As Kenn pointed out

Re: Contributing Twister2 runner to Apache Beam

2020-03-05 Thread Robert Bradshaw

I think we will get to a point where it makes sense for runners to
live in their own repositories, with their own release cadence, but
we're not at that point yet. One prerequisite is a stable API--we're
closing in on that with the portability protos, but many (java)
runners actually share the common runner core libraries and that is
even less set in stone.

On the other hand, taking responsibility for maintaining all runners
is not a tenable or scalable position for the Beam project. If a
runner is merged, it should be understood that it can be "un-merged"
if it causes a maintenance burden. A completely separate
project/repository makes this less messy.

On Thu, Mar 5, 2020 at 10:01 AM Kenneth Knowles  wrote:
>
> I agree with both of you, mostly :-)
>
> The monorepo approach doesn't work/scale well for shipped libraries (name a 
> Google library that silently just works and never causes any dependency 
> problems) and the pain we feel has been constant and increasing, but I don't 
> think we are at the breaking point.
>
> But Google's big monorepo [1] demonstrates similar benefits to what Kyle 
> describes. In the early stages the benefit of not having to think too hard 
> about build/test infra and share it everywhere is a big help, and it scales 
> well. Eventually, shipping test utility libraries and compliance suites can 
> be equivalent. And to your point - it is very helpful for users to know that 
> they can use CassandraIO with the other Beam artifacts. This is why Google 
> requires the whole big repo to depend on a single version of any 
> externally-controlled artifact. But, yes, as a consequence it is 
> preposterously difficult to stay up to date, since literally anything can 
> block progress. You need a unified escalation chain for that policy to make 
> sense. It is the definition of a healthy Apache project to *not* have that 
> (PMC is different).
>
> Independent dependencies, independent git histories, and independent release 
> cadence/process are all separate discussions.
>
> It is a broader question than this particular contribution, so let's merge 
> this runner before changing our whole way of doing things :-)
>
> Kenn
>
> [1] 
> https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
>  (really quite a balanced analysis)
>
> On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver  wrote:
>>
>> > Should runners, current and future, be in the same repository as Beam
>> > core?
>>
>> In the distant past, runners lived in their own repositories, and then were 
>> donated to Beam. But Beam's current uber-repo setup allows a lot of 
>> convenience. For example, a ton of code (including core functionality and 
>> tests) is shared directly between runners, which is useful for keeping 
>> runners up to date and ensuring consistent behavior between them (in other 
>> words, maintainable and reliable).
>>
>> Generally, it is up to the authors of a particular Beam related 
>> project/subproject to decide whether to host their code in Beam or in a 
>> different repo, and up to the community to decide whether to take on the 
>> donation, as discussed in previous threads on the Twister2 runner. In this 
>> case, it seems there is agreement between the Twister2 runner authors and 
>> the community that the runner can be hosted in Beam proper.
>>
>> There are examples of successful independent Beam projects, such as 
>> Spotify's Scio, but having an independent project with its own releases 
>> requires a lot of dedicated resources, and the bar for entry for extending 
>> Beam should not be that high. All that's required of subproject authors is 
>> that they keep the subproject in step with Beam. If they can't maintain it 
>> any longer, the subproject can be allowed to bitrot without getting in 
>> anyone's way. On the other hand, I'm not sure of the details with Cassandra, 
>> but in general, a subproject should not have "the ability to block progress" 
>> just because it is contained in the Beam uber-repo.
>>
>> tl;dr Having an uber repo generally seems to work for Beam. Exceptions are 
>> few enough to be handled on a case-by-case basis.
>>
>> On Wed, Mar 4, 2020 at 11:12 AM Elliotte Rusty Harold  
>> wrote:
>>>
>>> Generic question without commenting on Twister2 specifically:
>>>
>>> Should runners, current and future, be in the same repository as Beam
>>> core? Can or should they be completely separate products with their
>>> own release cycles?
>>>
>>> Generally, loose coupling leads to more maintainable, reliable
>>> projects. Specifically, Cassandra is holding back some other changes
>>> in Beam and I really wish it didn't have the ability to block
>>> progress. The more different runners we have in core, the worse this
>>> problem is likely to become.
>>>
>>>
>>> On Wed, Mar 4, 2020 at 2:03 PM Pulasthi Supun Wickramasinghe
>>>  wrote:
>>> >
>>> > Hi
>>> >
>>> > I believe the pull request is pretty complete now with the help of 
>>> >

Re: Run Python PreCommit break?

2020-03-05 Thread Robert Bradshaw

https://github.com/apache/beam/pull/11021 for getting rid of these
vestigial error logs.

On Thu, Mar 5, 2020 at 1:21 PM Rui Wang  wrote:
>
> Hi Community,
>
> Is python precommit breaking? I have observed a consistent test case failure 
> from 
> apache_beam.runners.portability.portable_runner_test.PortableRunnerTest.test_group_by_key
>  [1] in the release branch.
>
>
> It might have been fixed in the master branch. Does anyone have insight on it?
>
>
>
>
> [1]: 
> https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1535/testReport/junit/apache_beam.runners.portability.portable_runner_test/PortableRunnerTest/test_group_by_key/
>
>
> -Rui

Re: [Discuss] Propose Calcite Vendor Release (1.22.0)

2020-03-05 Thread Kenneth Knowles

+1 thanks!

On Thu, Mar 5, 2020 at 12:50 PM Robin Qiu  wrote:

> +1
>
> Thanks Rui for proposing this. Bringing in the newest version of Calcite
> will also simplify our codebase [1] and resolve some existing issues [2]
>
> [1] https://issues.apache.org/jira/browse/BEAM-9190
> [2] https://issues.apache.org/jira/browse/BEAM-9191
>
> On Thu, Mar 5, 2020 at 11:42 AM Xinyu Liu  wrote:
>
>> Thanks, Rui! We've been waiting for the new version of Calcite which has
>> the fix to unflatten the fields. Seems this version will come with it.
>>
>> Thanks,
>> Xinyu
>>
>> On Thu, Mar 5, 2020 at 12:41 AM Ismaël Mejía  wrote:
>>
>>> The calcite vote already passed so this is good to go, thanks for
>>> volunteering Rui.
>>>
>>> https://lists.apache.org/thread.html/r4962a4a2bacf481f2ee1064806b78829d96385c2e4a3c0ecb24a55a2%40%3Cdev.calcite.apache.org%3E
>>>
>>> On Thu, Mar 5, 2020 at 8:10 AM Kai Jiang  wrote:
>>> >
>>> > Thanks, Rui! Big +1 for calcite vendor release (1.22.0)
>>> > Curious, what's the progress of the official Calcite 1.22.0 release? I saw
>>> the Calcite community just passed the vote for 1.22.0 rc3.
>>> >
>>> > Best,
>>> > Kai
>>> >
>>> >
>>> > On Wed, Mar 4, 2020 at 9:24 PM Rui Wang  wrote:
>>> >>
>>> >> Hi Community,
>>> >>
>>> >> As Calcite is close to finishing its 1.22.0 release, I want to
>>> propose a Calcite vendor release, and I volunteer to be the release
>>> manager.
>>> >>
>>> >> I will wait until next Monday (03/09) to kick off the release if there
>>> is no objection.
>>> >>
>>> >>
>>> >> Best,
>>> >> Rui Wang
>>>
>>

Re: [VOTE] Vendored Dependencies Release gRPC 1.26.0 v0.3 for BEAM-9288 RC #2

2020-03-05 Thread Luke Cwik

Cancelling this release: I built from the wrong commit ID, which I should
have caught before sending this out.

On Thu, Mar 5, 2020 at 10:45 AM Luke Cwik  wrote:

> Please review the release of the following artifacts that we vendor:
>  * beam-vendor-grpc-1_26_0
>
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 0.3, as
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * the official Apache source release to be deployed to dist.apache.org [1],
> which is signed with the key with
> fingerprint EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
> * all artifacts to be deployed to the Maven Central Repository [3],
> * commit hash "dd4c01ff9f6abcd7bd8d3fc76e89325e409fdd46" [4],
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
> [3] https://repository.apache.org/content/repositories/orgapachebeam-1097/
> [4]
> https://github.com/apache/beam/commit/dd4c01ff9f6abcd7bd8d3fc76e89325e409fdd46
>

Re: Proposal: Beam to use GCP Libraries BOM

2020-03-05 Thread Tomo Suzuki

> How would the Apache Beam BOM and GCP BOM work together?

I envision a new "Beam GCP BOM" that imports the existing Beam BOM and the
GCP Libraries BOM, with the necessary overrides (such as the Guava version).
This clarifies which versions of the Google libraries are compatible with a
given Beam version.
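
As a rough illustration of how such a layered BOM could resolve versions,
here is a minimal Python sketch. It models a BOM as a mapping from artifact
coordinates to versions and assumes Maven-style precedence: explicit entries
win over imported BOMs, and among imports the first one declared wins. The
helper name `resolve_bom`, the BOM contents, and all version numbers are
made up for illustration; they are not the real Beam or GCP BOM contents.

```python
from typing import Dict, List

Bom = Dict[str, str]  # "groupId:artifactId" -> version


def resolve_bom(imports: List[Bom], overrides: Bom) -> Bom:
    """Merge imported BOMs, then apply explicit overrides.

    Assumed precedence (modelled on Maven dependencyManagement imports):
    explicit overrides > earlier imports > later imports.
    """
    resolved: Bom = {}
    for bom in imports:
        for artifact, version in bom.items():
            # setdefault keeps the version from the earliest import.
            resolved.setdefault(artifact, version)
    # Explicit entries always win over anything imported.
    resolved.update(overrides)
    return resolved


# Hypothetical BOM contents; the versions are placeholders, not real pins.
beam_bom = {
    "org.apache.beam:beam-sdks-java-core": "2.20.0",
    "com.google.guava:guava": "25.1-jre",
}
gcp_libraries_bom = {
    "com.google.guava:guava": "28.2-android",
    "io.grpc:grpc-core": "1.27.2",
    "com.google.api:gax": "1.54.0",
}

# A hypothetical "Beam GCP BOM": import both, then pin Guava explicitly.
beam_gcp_bom = resolve_bom(
    imports=[beam_bom, gcp_libraries_bom],
    overrides={"com.google.guava:guava": "28.2-jre"},
)

for artifact, version in sorted(beam_gcp_bom.items()):
    print(f"{artifact}:{version}")
```

Without the explicit override, the Guava version from the first imported BOM
would be used, which is why the import order and the override list both
matter in this model.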


On Thu, Mar 5, 2020 at 1:05 PM Luke Cwik  wrote:

> How would the Apache Beam BOM and GCP BOM work together?
>
> On Thu, Mar 5, 2020 at 7:25 AM Filipe Regadas 
> wrote:
>
>> Big +1, this is a step in the right direction, and checking against Beam's
>> other direct and transitive deps is crucial since the referenced BOM only
>> covers a small part of them. Apache Commons, Jackson, `com.google.{api,
>> apis, cloud}`, and slf4j come to mind.
>>
>> Filipe Regadas
>>
>>
>> On Thu, Mar 5, 2020 at 3:33 AM Ismaël Mejía  wrote:
>>
>>> +1 Sounds like a good improvement for users and maintainers !
>>>
>>> On Thu, Mar 5, 2020 at 6:59 AM Alex Van Boxel  wrote:
>>> >
>>> > +1, I can remember the countless hours that we fought with Google
>>> dependencies.
>>> >
>>> > On Thu, Mar 5, 2020, 04:07 Chamikara Jayalath 
>>> wrote:
>>> >>
>>> >> +1 for this.
>>> >>
>>> >> This will make life easy for many of our users and will help us keep
>>> GCP related dependencies compatible (which has not been easy in the past).
>>> >>
>>> >> On Wed, Mar 4, 2020 at 2:16 PM Tomo Suzuki 
>>> wrote:
>>> >>>
>>> >>> Hi Beam developers,
>>> >>>
>>> >>> Shall we use GCP Libraries BOM [1] to specify the Google-related
>>> library versions in Beam?
>>> >>>
>>> >>> I've been working on Beam's dependency upgrades in the past few
>>> months. It's time to consider a long-term solution to keep the libraries
>>> up-to-date with small maintenance effort. To achieve that, I propose Beam
>>> to use GCP Libraries BOM to set the Google-related library versions, rather
>>> than the current way of making changes in each of ~30 Google libraries with
>>> individual PRs [2].
>>> >>>
>>> >>> After the proposal is implemented, Beam project upgrades the BOM
>>> version to upgrade these Google-related libraries. This still needs to
>>> ensure the libraries in GCP Library BOM are compatible with Beam's other
>>> dependencies. (Linkage Checker will help with this job.) I believe
>>> onboarding GCP Libraries BOM will solve lots of incompatibilities which we
>>> have seen in gax, gRPC, google-cloud-core, and so on with minimal effort in
>>> Beam's developers.
>>> >>>
>>> >>> Created an issue to track this: BEAM-9444 [3]. I appreciate if you
>>> can share questions or feedback (thumbs-up / concerns).
>>> >>>
>>> >>> [1]:
>>> https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM
>>> >>> [2]:
>>> https://github.com/apache/beam/pulls?page=1=is%3Apr+author%3Asuztomo
>>> >>> [3]: https://issues.apache.org/jira/browse/BEAM-9444
>>> >>>
>>> >>> --
>>> >>> Regards,
>>> >>> Tomo
>>>
>>

-- 
Regards,
Tomo

[VOTE] Vendored Dependencies Release gRPC 1.26.0 v0.3 for BEAM-9288 RC #3

2020-03-05 Thread Luke Cwik

Please review the release of the following artifacts that we vendor:
 * beam-vendor-grpc-1_26_0

Hi everyone,
Please review and vote on the release candidate #1 for the version 0.3, as
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* the official Apache source release to be deployed to dist.apache.org [1],
which is signed with the key with
fingerprint EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
* all artifacts to be deployed to the Maven Central Repository [3],
* commit hash "28d05d317a39b5d60a31988c69d1fb8a9c5006fc" [4],

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1] https://dist.apache.org/repos/dist/dev/beam/vendor/
[2] https://dist.apache.org/repos/dist/release/beam/KEYS
[3] https://repository.apache.org/content/repositories/orgapachebeam-1098/
[4]
https://github.com/apache/beam/commit/28d05d317a39b5d60a31988c69d1fb8a9c5006fc

Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Andrew Pilloud

I'm not following the "CREATE TABLE" vs "CREATE EXTERNAL TABLE"
distinction. We added the "EXTERNAL" to make it clear that Beam wasn't
storing the table. Most of our current table providers will create the
underlying table as needed.

Andrew

On Thu, Mar 5, 2020 at 10:47 AM Rui Wang  wrote:

> There are two new things in the proposal:
> 1. A Spanner source in SQL. (Contributions are welcome.)
> 2. A CREATE TABLE statement rather than CREATE EXTERNAL TABLE (the difference
> is whether the table is assumed to exist or not)
>
>
> There is a table property in the statement already that you can reuse to
> save your options.
>
>
> -Rui
>
>
>
>
>
>
>
> On Thu, Mar 5, 2020 at 2:30 AM Taher Koitawala  wrote:
>
>> Also auto creation is not there
>>
>> On Thu, Mar 5, 2020 at 3:59 PM Taher Koitawala 
>> wrote:
>>
>>> The proposal is to add more sources and also have event-time or
>>> processing-time enhancements on top of them
>>>
>>> On Thu, Mar 5, 2020 at 3:50 PM Andrew Pilloud 
>>> wrote:
>>>
>>>> I believe we have this functionality already:
>>>> https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/
>>>>
>>>> Existing GCP tables can also be loaded through the GCP datacatalog
>>>> metastore. What are you proposing that is new?
>>>>
>>>> Andrew
>>>>
>>>>
>>>> On Thu, Mar 5, 2020, 12:29 AM Taher Koitawala 
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>> We have been using Apache Beam extensively to process huge
>>>>> amounts of data. While Beam is really powerful and can solve a huge number
>>>>> of use cases, a Beam job's development and testing time is significantly
>>>>> high.
>>>>>
>>>>> This gap can be filled with Beam SQL, where a complete SQL-based
>>>>> interface can reduce development and testing time to a matter of minutes.
>>>>> It also makes Apache Beam more user friendly, so that a wide audience
>>>>> with different analytical skill sets can interact with it.
>>>>>
>>>>> The current Beam SQL still needs to be used programmatically, and
>>>>> so I propose the following additions/improvements.
>>>>>
>>>>> *Note: Whilst the examples given below are GCP-biased, they apply
>>>>> to other sources in a generic manner.*
>>>>>
>>>>> For example: imagine a user who wants to write a stream processing job
>>>>> on Google Cloud Dataflow. The user wants to process credit card
>>>>> transaction streams from Google Cloud PubSub (something like Kafka) and
>>>>> enrich each record of the stream with some data that is stored in Google
>>>>> Cloud Spanner. After enrichment, the user wishes to write the resulting
>>>>> data to Google Cloud BigQuery.
>>>>>
>>>>> Given below are the queries which the user should be able to fire on
>>>>> Beam; the rest should be handled automatically by the framework.
>>>>>
>>>>> // Infer schema from Spanner table upon table creation
>>>>> CREATE TABLE SPANNER_CARD_INFO
>>>>> OPTIONS (
>>>>>   ProjectId: "gcp-project",
>>>>>   InstanceId: "spanner-instance-id",
>>>>>   Database: "some-database",
>>>>>   Table: "card_info",
>>>>>   CloudResource: "SPANNER",
>>>>>   CreateTableIfNotExists: "FALSE"
>>>>> )
>>>>>
>>>>> // Apply schema to each record read from pubsub, and then apply SQL.
>>>>> CREATE TABLE TRANSACTIONS_PUBSUB_TOPIC
>>>>> OPTIONS (
>>>>>   ProjectId: "gcp-project",
>>>>>   Topic: "card-transactions",
>>>>>   CloudResource: "PUBSUB",
>>>>>   SubscriptionId: "subscriptionId-1",
>>>>>   CreateTopicIfNotExists: "FALSE",
>>>>>   CreateSubscriptionIfNotExist: "TRUE",
>>>>>   RecordType: "JSON", // Possible values: Avro, JSON, TSV, etc.
>>>>>   JsonRecordSchema: '{
>>>>>     "CardNumber": "INT",
>>>>>     "Amount": "DOUBLE",
>>>>>     "eventTimeStamp": "EVENT_TIME"
>>>>>   }'
>>>>> )
>>>>>
>>>>> // Create table in BigQuery if not exists and insert
>>>>> CREATE TABLE TRANSACTION_HISTORY
>>>>> OPTIONS (
>>>>>   ProjectId: "gcp-project",
>>>>>   CloudResource: "BIGQUERY",
>>>>>   dataset: "dataset1",
>>>>>   table: "table1",
>>>>>   CreateTableIfNotExists: "TRUE",
>>>>>   TableSchema: '{
>>>>>     "card_number": "INT",
>>>>>     "first_name": "STRING",
>>>>>     "last_name": "STRING",
>>>>>     "phone": "INT",
>>>>>     "city": "STRING",
>>>>>     "amount": "FLOAT",
>>>>>     "eventtimestamp": "INT"
>>>>>   }'
>>>>> )
>>>>>
>>>>> // Actual query that should get translated into a Beam DAG
>>>>> INSERT INTO TRANSACTION_HISTORY
>>>>> SELECT
>>>>>   pubsub.card_number, spanner.first_name, spanner.last_name,
>>>>>   spanner.phone, spanner.city, pubsub.amount, pubsub.eventTimeStamp
>>>>> FROM TRANSACTIONS_PUBSUB_TOPIC pubsub JOIN SPANNER_CARD_INFO spanner
>>>>> ON (pubsub.card_number = spanner.card_number);
>>>>>
>>>>> Also consider that if any of the sources or sinks change, we only need
>>>>> to change the SQL, and we are done!
>>>>>
>>>>> Please let me know your thoughts about this.
>>>>>
>>>>> Regards,
>>>>> Taher

Re: [VOTE] Vendored Dependencies Release gRPC 1.26.0 v0.3 for BEAM-9288 RC #3

2020-03-05 Thread Luke Cwik

+1 (binding)
Verified that conscrypt jars and .so files don't appear in the jar.

On Thu, Mar 5, 2020 at 10:55 AM Luke Cwik  wrote:

> Please review the release of the following artifacts that we vendor:
>  * beam-vendor-grpc-1_26_0
>
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 0.3, as
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * the official Apache source release to be deployed to dist.apache.org [1],
> which is signed with the key with
> fingerprint EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
> * all artifacts to be deployed to the Maven Central Repository [3],
> * commit hash "28d05d317a39b5d60a31988c69d1fb8a9c5006fc" [4],
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1] https://dist.apache.org/repos/dist/dev/beam/vendor/
> [2] https://dist.apache.org/repos/dist/release/beam/KEYS
> [3] https://repository.apache.org/content/repositories/orgapachebeam-1098/
> [4]
> https://github.com/apache/beam/commit/28d05d317a39b5d60a31988c69d1fb8a9c5006fc
>

Re: Proposal: Beam to use GCP Libraries BOM

2020-03-05 Thread Luke Cwik

How would the Apache Beam BOM and GCP BOM work together?

On Thu, Mar 5, 2020 at 7:25 AM Filipe Regadas 
wrote:

> Big +1, this is a step in the right direction, and checking against Beam's
> other direct and transitive deps is crucial since the referenced BOM only
> covers a small part of them. Apache Commons, Jackson, `com.google.{api,
> apis, cloud}`, and slf4j come to mind.
>
> Filipe Regadas
>
>
> On Thu, Mar 5, 2020 at 3:33 AM Ismaël Mejía  wrote:
>
>> +1 Sounds like a good improvement for users and maintainers !
>>
>> On Thu, Mar 5, 2020 at 6:59 AM Alex Van Boxel  wrote:
>> >
>> > +1, I can remember the countless hours that we fought with Google
>> dependencies.
>> >
>> > On Thu, Mar 5, 2020, 04:07 Chamikara Jayalath 
>> wrote:
>> >>
>> >> +1 for this.
>> >>
>> >> This will make life easy for many of our users and will help us keep
>> GCP related dependencies compatible (which has not been easy in the past).
>> >>
>> >> On Wed, Mar 4, 2020 at 2:16 PM Tomo Suzuki  wrote:
>> >>>
>> >>> Hi Beam developers,
>> >>>
>> >>> Shall we use GCP Libraries BOM [1] to specify the Google-related
>> library versions in Beam?
>> >>>
>> >>> I've been working on Beam's dependency upgrades in the past few
>> months. It's time to consider a long-term solution to keep the libraries
>> up-to-date with small maintenance effort. To achieve that, I propose Beam
>> to use GCP Libraries BOM to set the Google-related library versions, rather
>> than the current way of making changes in each of ~30 Google libraries with
>> individual PRs [2].
>> >>>
>> >>> After the proposal is implemented, Beam project upgrades the BOM
>> version to upgrade these Google-related libraries. This still needs to
>> ensure the libraries in GCP Library BOM are compatible with Beam's other
>> dependencies. (Linkage Checker will help with this job.) I believe
>> onboarding GCP Libraries BOM will solve lots of incompatibilities which we
>> have seen in gax, gRPC, google-cloud-core, and so on with minimal effort in
>> Beam's developers.
>> >>>
>> >>> Created an issue to track this: BEAM-9444 [3]. I appreciate if you
>> can share questions or feedback (thumbs-up / concerns).
>> >>>
>> >>> [1]:
>> https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM
>> >>> [2]:
>> https://github.com/apache/beam/pulls?page=1=is%3Apr+author%3Asuztomo
>> >>> [3]: https://issues.apache.org/jira/browse/BEAM-9444
>> >>>
>> >>> --
>> >>> Regards,
>> >>> Tomo
>>
>

Re: [DISCUSS] Query external resources as Tables with Beam SQL

2020-03-05 Thread Rui Wang

There are two new things in the proposal:
1. A Spanner source in SQL. (Contributions are welcome.)
2. A CREATE TABLE statement rather than CREATE EXTERNAL TABLE (the difference
is whether the table is assumed to exist or not)

There is a table property in the statement already that you can reuse to
save your options.
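
For reference, the existing statement shape can be sketched as follows. This
is only a minimal illustration: `create_external_table` is a hypothetical
helper, the table and topic names are placeholders, and the
`timestampAttributeKey` property key is assumed from the Pub/Sub section of
the CREATE EXTERNAL TABLE documentation, so please check it against the docs
before relying on it. The helper does no quoting or escaping.

```python
import json
from typing import Dict, List, Optional, Tuple


def create_external_table(
    name: str,
    columns: List[Tuple[str, str]],
    table_type: str,
    location: Optional[str] = None,
    properties: Optional[Dict[str, str]] = None,
) -> str:
    """Render a Beam SQL CREATE EXTERNAL TABLE statement as a string.

    Hypothetical helper for illustration only: it does not escape quotes
    in the location or in property values.
    """
    cols = ", ".join(f"{col} {sql_type}" for col, sql_type in columns)
    lines = [f"CREATE EXTERNAL TABLE {name} ({cols})", f"TYPE {table_type}"]
    if location is not None:
        lines.append(f"LOCATION '{location}'")
    if properties:
        # Options travel as a JSON object inside TBLPROPERTIES.
        lines.append(f"TBLPROPERTIES '{json.dumps(properties)}'")
    return "\n".join(lines)


# Placeholder names; the property key below is an assumption (see above).
statement = create_external_table(
    name="transactions",
    columns=[
        ("event_timestamp", "TIMESTAMP"),
        ("attributes", "MAP<VARCHAR, VARCHAR>"),
        ("payload", "ROW<card_number BIGINT, amount DOUBLE>"),
    ],
    table_type="pubsub",
    location="projects/gcp-project/topics/card-transactions",
    properties={"timestampAttributeKey": "ts"},
)
print(statement)
```

The point of the sketch is that per-table options already have a home in
TBLPROPERTIES, so new options would not necessarily need new syntax.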

-Rui

On Thu, Mar 5, 2020 at 2:30 AM Taher Koitawala  wrote:

> Also auto creation is not there
>
> On Thu, Mar 5, 2020 at 3:59 PM Taher Koitawala  wrote:
>
>> The proposal is to add more sources and also have event-time or
>> processing-time enhancements on top of them
>>
>> On Thu, Mar 5, 2020 at 3:50 PM Andrew Pilloud 
>> wrote:
>>
>>> I believe we have this functionality already:
>>> https://beam.apache.org/documentation/dsls/sql/extensions/create-external-table/
>>>
>>> Existing GCP tables can also be loaded through the GCP datacatalog
>>> metastore. What are you proposing that is new?
>>>
>>> Andrew
>>>
>>>
>>> On Thu, Mar 5, 2020, 12:29 AM Taher Koitawala 
>>> wrote:
>>>
>>>> Hi All,
>>>> We have been using Apache Beam extensively to process huge
>>>> amounts of data. While Beam is really powerful and can solve a huge number
>>>> of use cases, a Beam job's development and testing time is significantly
>>>> high.
>>>>
>>>> This gap can be filled with Beam SQL, where a complete SQL-based
>>>> interface can reduce development and testing time to a matter of minutes.
>>>> It also makes Apache Beam more user friendly, so that a wide audience
>>>> with different analytical skill sets can interact with it.
>>>>
>>>> The current Beam SQL still needs to be used programmatically, and so
>>>> I propose the following additions/improvements.
>>>>
>>>> *Note: Whilst the examples given below are GCP-biased, they apply
>>>> to other sources in a generic manner.*
>>>>
>>>> For example: imagine a user who wants to write a stream processing job
>>>> on Google Cloud Dataflow. The user wants to process credit card
>>>> transaction streams from Google Cloud PubSub (something like Kafka) and
>>>> enrich each record of the stream with some data that is stored in Google
>>>> Cloud Spanner. After enrichment, the user wishes to write the resulting
>>>> data to Google Cloud BigQuery.
>>>>
>>>> Given below are the queries which the user should be able to fire on
>>>> Beam; the rest should be handled automatically by the framework.
>>>>
>>>> // Infer schema from Spanner table upon table creation
>>>> CREATE TABLE SPANNER_CARD_INFO
>>>> OPTIONS (
>>>>   ProjectId: "gcp-project",
>>>>   InstanceId: "spanner-instance-id",
>>>>   Database: "some-database",
>>>>   Table: "card_info",
>>>>   CloudResource: "SPANNER",
>>>>   CreateTableIfNotExists: "FALSE"
>>>> )
>>>>
>>>> // Apply schema to each record read from pubsub, and then apply SQL.
>>>> CREATE TABLE TRANSACTIONS_PUBSUB_TOPIC
>>>> OPTIONS (
>>>>   ProjectId: "gcp-project",
>>>>   Topic: "card-transactions",
>>>>   CloudResource: "PUBSUB",
>>>>   SubscriptionId: "subscriptionId-1",
>>>>   CreateTopicIfNotExists: "FALSE",
>>>>   CreateSubscriptionIfNotExist: "TRUE",
>>>>   RecordType: "JSON", // Possible values: Avro, JSON, TSV, etc.
>>>>   JsonRecordSchema: '{
>>>>     "CardNumber": "INT",
>>>>     "Amount": "DOUBLE",
>>>>     "eventTimeStamp": "EVENT_TIME"
>>>>   }'
>>>> )
>>>>
>>>> // Create table in BigQuery if not exists and insert
>>>> CREATE TABLE TRANSACTION_HISTORY
>>>> OPTIONS (
>>>>   ProjectId: "gcp-project",
>>>>   CloudResource: "BIGQUERY",
>>>>   dataset: "dataset1",
>>>>   table: "table1",
>>>>   CreateTableIfNotExists: "TRUE",
>>>>   TableSchema: '{
>>>>     "card_number": "INT",
>>>>     "first_name": "STRING",
>>>>     "last_name": "STRING",
>>>>     "phone": "INT",
>>>>     "city": "STRING",
>>>>     "amount": "FLOAT",
>>>>     "eventtimestamp": "INT"
>>>>   }'
>>>> )
>>>>
>>>> // Actual query that should get translated into a Beam DAG
>>>> INSERT INTO TRANSACTION_HISTORY
>>>> SELECT
>>>>   pubsub.card_number, spanner.first_name, spanner.last_name,
>>>>   spanner.phone, spanner.city, pubsub.amount, pubsub.eventTimeStamp
>>>> FROM TRANSACTIONS_PUBSUB_TOPIC pubsub JOIN SPANNER_CARD_INFO spanner
>>>> ON (pubsub.card_number = spanner.card_number);
>>>>
>>>> Also consider that if any of the sources or sinks change, we only need
>>>> to change the SQL, and we are done!
>>>>
>>>> Please let me know your thoughts about this.
>>>>
>>>> Regards,
>>>> Taher Koitawala

[VOTE] Vendored Dependencies Release gRPC 1.26.0 v0.3 for BEAM-9288 RC #2

2020-03-05 Thread Luke Cwik

Please review the release of the following artifacts that we vendor:
 * beam-vendor-grpc-1_26_0

Hi everyone,
Please review and vote on the release candidate #1 for the version 0.3, as
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* the official Apache source release to be deployed to dist.apache.org [1],
which is signed with the key with
fingerprint EAD5DE293F4A03DD2E77565589E68A56E371CCA2 [2],
* all artifacts to be deployed to the Maven Central Repository [3],
* commit hash "dd4c01ff9f6abcd7bd8d3fc76e89325e409fdd46" [4],

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1] https://dist.apache.org/repos/dist/dev/beam/vendor/
[2] https://dist.apache.org/repos/dist/release/beam/KEYS
[3] https://repository.apache.org/content/repositories/orgapachebeam-1097/
[4]
https://github.com/apache/beam/commit/dd4c01ff9f6abcd7bd8d3fc76e89325e409fdd46

Re: Proposal: Beam to use GCP Libraries BOM

2020-03-05 Thread Kenneth Knowles

+1 and you have phrased the benefits and limitations well. We have plenty
of not-Google-related dependencies that use Guava and protobuf (I know of
Calcite, Cassandra, Kinesis, and Spark) so there's still work in managing
deps, but the BOM should make it a lot easier to upgrade all these tightly
coupled libraries that Google ships from their head.

Do Spark or Flink have BOMs? I wonder if there's an opportunity to catch
incompatible deps at a larger scale by comparing and merging a half dozen
BOMs (although in the limit it approximately expands to one per runner and
one per IO as projects mature and become independent).
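
A minimal Python sketch of what comparing several BOMs might look like is
below. It treats each BOM as a plain mapping from artifact coordinates to a
pinned version and reports artifacts pinned to more than one version. The
BOM names, artifacts, and versions are invented for illustration, and a
version mismatch is only a hint: it does not by itself prove a linkage
incompatibility.

```python
from collections import defaultdict
from typing import Dict, Set

Bom = Dict[str, str]  # "groupId:artifactId" -> version


def find_conflicts(boms: Dict[str, Bom]) -> Dict[str, Dict[str, Set[str]]]:
    """Return artifacts that different BOMs pin to different versions.

    The result maps artifact -> {version -> names of the BOMs pinning it}.
    """
    seen: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for bom_name, pins in boms.items():
        for artifact, version in pins.items():
            seen[artifact][version].add(bom_name)
    # Keep only artifacts with at least two distinct pinned versions.
    return {
        artifact: dict(by_version)
        for artifact, by_version in seen.items()
        if len(by_version) > 1
    }


# Hypothetical BOMs; names and versions are placeholders.
boms = {
    "gcp-libraries-bom": {
        "com.google.guava:guava": "28.2-android",
        "com.google.protobuf:protobuf-java": "3.11.4",
    },
    "runner-x-bom": {
        "com.google.guava:guava": "14.0.1",
        "com.google.protobuf:protobuf-java": "3.11.4",
    },
    "io-y-bom": {
        "com.google.guava:guava": "28.2-android",
    },
}

conflicts = find_conflicts(boms)
for artifact, by_version in sorted(conflicts.items()):
    for version, owners in sorted(by_version.items()):
        print(f"{artifact} {version}: {', '.join(sorted(owners))}")
```

A real check would still need something like the Linkage Checker mentioned
earlier in the thread to decide whether a given mismatch actually breaks
anything.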

Kenn

On Thu, Mar 5, 2020 at 10:05 AM Luke Cwik  wrote:

> How would the Apache Beam BOM and GCP BOM work together?
>
> On Thu, Mar 5, 2020 at 7:25 AM Filipe Regadas 
> wrote:
>
>> Big +1, this is a step in the right direction, and checking against Beam's
>> other direct and transitive deps is crucial since the referenced BOM only
>> covers a small part of them. Apache Commons, Jackson, `com.google.{api,
>> apis, cloud}`, and slf4j come to mind.
>>
>> Filipe Regadas
>>
>>
>> On Thu, Mar 5, 2020 at 3:33 AM Ismaël Mejía  wrote:
>>
>>> +1 Sounds like a good improvement for users and maintainers !
>>>
>>> On Thu, Mar 5, 2020 at 6:59 AM Alex Van Boxel  wrote:
>>> >
>>> > +1, I can remember the countless hours that we fought with Google
>>> dependencies.
>>> >
>>> > On Thu, Mar 5, 2020, 04:07 Chamikara Jayalath 
>>> wrote:
>>> >>
>>> >> +1 for this.
>>> >>
>>> >> This will make life easy for many of our users and will help us keep
>>> GCP related dependencies compatible (which has not been easy in the past).
>>> >>
>>> >> On Wed, Mar 4, 2020 at 2:16 PM Tomo Suzuki 
>>> wrote:
>>> >>>
>>> >>> Hi Beam developers,
>>> >>>
>>> >>> Shall we use GCP Libraries BOM [1] to specify the Google-related
>>> library versions in Beam?
>>> >>>
>>> >>> I've been working on Beam's dependency upgrades in the past few
>>> months. It's time to consider a long-term solution to keep the libraries
>>> up-to-date with small maintenance effort. To achieve that, I propose Beam
>>> to use GCP Libraries BOM to set the Google-related library versions, rather
>>> than the current way of making changes in each of ~30 Google libraries with
>>> individual PRs [2].
>>> >>>
>>> >>> After the proposal is implemented, Beam project upgrades the BOM
>>> version to upgrade these Google-related libraries. This still needs to
>>> ensure the libraries in GCP Library BOM are compatible with Beam's other
>>> dependencies. (Linkage Checker will help with this job.) I believe
>>> onboarding GCP Libraries BOM will solve lots of incompatibilities which we
>>> have seen in gax, gRPC, google-cloud-core, and so on with minimal effort in
>>> Beam's developers.
>>> >>>
>>> >>> Created an issue to track this: BEAM-9444 [3]. I appreciate if you
>>> can share questions or feedback (thumbs-up / concerns).
>>> >>>
>>> >>> [1]:
>>> https://github.com/GoogleCloudPlatform/cloud-opensource-java/wiki/The-Google-Cloud-Platform-Libraries-BOM
>>> >>> [2]:
>>> https://github.com/apache/beam/pulls?page=1=is%3Apr+author%3Asuztomo
>>> >>> [3]: https://issues.apache.org/jira/browse/BEAM-9444
>>> >>>
>>> >>> --
>>> >>> Regards,
>>> >>> Tomo
>>>
>>

