[jira] [Work logged] (BEAM-8550) @RequiresTimeSortedInput DoFn annotation

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8550?focusedWorklogId=382757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382757
 ]

ASF GitHub Bot logged work on BEAM-8550:


Author: ASF GitHub Bot
Created on: 06/Feb/20 07:28
Start Date: 06/Feb/20 07:28
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #8774: [BEAM-8550] 
Requires time sorted input
URL: https://github.com/apache/beam/pull/8774
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382757)
Time Spent: 10h 50m  (was: 10h 40m)

> @RequiresTimeSortedInput DoFn annotation
> 
>
> Key: BEAM-8550
> URL: https://issues.apache.org/jira/browse/BEAM-8550
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Implement a new annotation {{@RequiresTimeSortedInput}} for stateful DoFn as 
> described in the [design 
> document|https://docs.google.com/document/d/1ObLVUFsf1NcG8ZuIZE4aVy2RYKx2FfyMhkZYWPnI9-c/edit?usp=sharing].
>  The first implementation will assume that:
>   - time is defined by the timestamp in the associated WindowedValue
>   - allowed lateness is explicitly zero and all late elements are dropped 
> (due to being out of order)
> The above properties are considered temporary and will be resolved by 
> subsequent (backwards-compatible) extensions.
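
The two assumptions in the description can be illustrated without Beam itself. What follows is a minimal sketch in plain Java, not the implementation from the pull request: the class `TimeSortedBufferSketch`, the `Stamped` holder and every method name are hypothetical. It assumes each element carries a millisecond timestamp, buffers elements until a watermark passes them, emits them in timestamp order, and drops anything that arrives behind the watermark (allowed lateness of zero).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names come from Beam.
final class TimeSortedBufferSketch {

  /** A value paired with its event timestamp in milliseconds. */
  static final class Stamped {
    final long timestampMillis;
    final String value;

    Stamped(long timestampMillis, String value) {
      this.timestampMillis = timestampMillis;
      this.value = value;
    }
  }

  // Elements wait here, ordered by timestamp, until the watermark passes them.
  private final PriorityQueue<Stamped> buffer =
      new PriorityQueue<>(Comparator.comparingLong((Stamped s) -> s.timestampMillis));
  private long watermarkMillis = Long.MIN_VALUE;
  private int dropped = 0;

  /** Buffers an element, or drops it when it is already behind the watermark. */
  boolean offer(Stamped element) {
    if (element.timestampMillis < watermarkMillis) {
      dropped++; // allowed lateness is zero, so late data is discarded
      return false;
    }
    buffer.add(element);
    return true;
  }

  /** Advances the watermark and returns, in timestamp order, every element before it. */
  List<Stamped> advanceWatermark(long newWatermarkMillis) {
    watermarkMillis = Math.max(watermarkMillis, newWatermarkMillis);
    List<Stamped> ready = new ArrayList<>();
    while (!buffer.isEmpty() && buffer.peek().timestampMillis < watermarkMillis) {
      ready.add(buffer.poll());
    }
    return ready;
  }

  int droppedCount() {
    return dropped;
  }

  public static void main(String[] args) {
    TimeSortedBufferSketch sketch = new TimeSortedBufferSketch();
    sketch.offer(new Stamped(30, "c"));
    sketch.offer(new Stamped(10, "a"));
    sketch.offer(new Stamped(20, "b"));
    for (Stamped s : sketch.advanceWatermark(25)) {
      System.out.println(s.timestampMillis + " " + s.value);
    }
    sketch.offer(new Stamped(5, "late"));
    System.out.println("dropped: " + sketch.droppedCount());
  }
}
```

The priority queue stands in for whatever buffering state a runner would use; the point is only that output order follows timestamps and that late elements never reach the user code.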



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-9037) Instant and duration as logical type

2020-02-05 Thread Alex Van Boxel (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Van Boxel closed BEAM-9037.


> Instant and duration as logical type 
> -
>
> Key: BEAM-9037
> URL: https://issues.apache.org/jira/browse/BEAM-9037
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The proto schema includes Timestamp and Duration with nano precision. The 
> logical types should be promoted to the core logical types, so they can be 
> handled on various IOs as standard mandatory conversions.
> This means that the logical type should not use the proto-specific Timestamp and 
> Duration but the Java 8 Instant and Duration.
> See discussion in the design document:
> [https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.9uhml95iygqr]
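
As a rough illustration of the mapping the description asks for, the sketch below converts a seconds-plus-nanos pair (the layout of the protobuf well-known types `Timestamp` and `Duration`) into the Java 8 `java.time.Instant` and `java.time.Duration` without losing nanosecond precision. The class `NanoPrecisionSketch` and its method names are hypothetical and are not taken from the pull request; only the `java.time` calls are real API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; it does not depend on protobuf or Beam.
final class NanoPrecisionSketch {

  /** Builds an Instant from epoch seconds plus a nanosecond adjustment. */
  static Instant toInstant(long seconds, int nanos) {
    return Instant.ofEpochSecond(seconds, nanos);
  }

  /** Builds a Duration from whole seconds plus a nanosecond adjustment. */
  static Duration toDuration(long seconds, int nanos) {
    return Duration.ofSeconds(seconds, nanos);
  }

  public static void main(String[] args) {
    Instant instant = toInstant(1L, 123_456_789);
    // The nanosecond field survives, whereas a millisecond view truncates it.
    System.out.println(instant.getNano());
    System.out.println(instant.toEpochMilli());
    System.out.println(toDuration(2L, 5).toNanos());
  }
}
```

Comparing `getNano()` with `toEpochMilli()` shows why a millisecond-based type cannot round-trip these values.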





[jira] [Work logged] (BEAM-9037) Instant and duration as logical type

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9037?focusedWorklogId=382747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382747
 ]

ASF GitHub Bot logged work on BEAM-9037:


Author: ASF GitHub Bot
Created on: 06/Feb/20 06:12
Start Date: 06/Feb/20 06:12
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #10486: [BEAM-9037] 
Instant and duration as logical type
URL: https://github.com/apache/beam/pull/10486#issuecomment-582753782
 
 
   Thank you! Happy Happy :-)
 



Issue Time Tracking
---

Worklog Id: (was: 382747)
Time Spent: 5h  (was: 4h 50m)

> Instant and duration as logical type 
> -
>
> Key: BEAM-9037
> URL: https://issues.apache.org/jira/browse/BEAM-9037
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The proto schema includes Timestamp and Duration with nano precision. The 
> logical types should be promoted to the core logical types, so they can be 
> handled on various IOs as standard mandatory conversions.
> This means that the logical type should not use the proto-specific Timestamp and 
> Duration but the Java 8 Instant and Duration.
> See discussion in the design document:
> [https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.9uhml95iygqr]





[jira] [Work logged] (BEAM-9178) Support ZetaSQL TIMESTAMP functions in BeamSQL

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9178?focusedWorklogId=382729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382729
 ]

ASF GitHub Bot logged work on BEAM-9178:


Author: ASF GitHub Bot
Created on: 06/Feb/20 05:33
Start Date: 06/Feb/20 05:33
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on issue #10634: [BEAM-9178] 
Support all ZetaSQL TIMESTAMP functions
URL: https://github.com/apache/beam/pull/10634#issuecomment-582744585
 
 
   @apilloud Could you help run precommit tests and merge this PR?
 



Issue Time Tracking
---

Worklog Id: (was: 382729)
Time Spent: 2h 20m  (was: 2h 10m)

> Support ZetaSQL TIMESTAMP functions in BeamSQL
> --
>
> Key: BEAM-9178
> URL: https://issues.apache.org/jira/browse/BEAM-9178
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Support *all* TIMESTAMP functions defined in ZetaSQL (BigQuery Standard SQL). 
> See the full list of functions below:
> [https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions]





[jira] [Work logged] (BEAM-9178) Support ZetaSQL TIMESTAMP functions in BeamSQL

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9178?focusedWorklogId=382728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382728
 ]

ASF GitHub Bot logged work on BEAM-9178:


Author: ASF GitHub Bot
Created on: 06/Feb/20 05:32
Start Date: 06/Feb/20 05:32
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on issue #10634: [BEAM-9178] 
Support all ZetaSQL TIMESTAMP functions
URL: https://github.com/apache/beam/pull/10634#issuecomment-582744585
 
 
   @apilloud Could you help run precommit tests and merge this PR?
 



Issue Time Tracking
---

Worklog Id: (was: 382728)
Time Spent: 2h 10m  (was: 2h)

> Support ZetaSQL TIMESTAMP functions in BeamSQL
> --
>
> Key: BEAM-9178
> URL: https://issues.apache.org/jira/browse/BEAM-9178
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Support *all* TIMESTAMP functions defined in ZetaSQL (BigQuery Standard SQL). 
> See the full list of functions below:
> [https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions]





[jira] [Work logged] (BEAM-9178) Support ZetaSQL TIMESTAMP functions in BeamSQL

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9178?focusedWorklogId=382727&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382727
 ]

ASF GitHub Bot logged work on BEAM-9178:


Author: ASF GitHub Bot
Created on: 06/Feb/20 05:31
Start Date: 06/Feb/20 05:31
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #10634: [BEAM-9178] 
Support all ZetaSQL TIMESTAMP functions
URL: https://github.com/apache/beam/pull/10634#discussion_r374846150
 
 

 ##
 File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java
 ##
 @@ -770,88 +765,34 @@ private RexNode convertResolvedFunctionCall(
   throw new UnsupportedOperationException(
   "Only support TUMBLE, HOP AND SESSION functions right now.");
   }
-} else if (functionCall.getFunction().getGroup().equals("ZetaSQL")) {
-  op =
-  SqlStdOperatorMappingTable.ZETASQL_FUNCTION_TO_CALCITE_SQL_OPERATOR.get(
-  functionCall.getFunction().getName());
-
+} else if ("ZetaSQL".equals(funGroup)) {
   if (op == null) {
-throw new UnsupportedOperationException(
-"Does not support ZetaSQL function: " + functionCall.getFunction().getName());
+TypeKind returnType =
+SqlStdOperatorMappingTable.ZETASQL_FUNCTION_NAME_TO_RETURN_TYPE.get(funName);
+if (returnType != null) {
+  op =
+  SqlOperators.createSimpleSqlFunction(
+  funName, ZetaSqlUtils.zetaSqlTypeToCalciteType(returnType));
+} else {
+  throw new UnsupportedOperationException("Does not support ZetaSQL function: " + funName);
+}
   }
 
-  // There are different processes to handle argument conversion because INTERVAL is not a
-  // type in ZetaSQL.
-  if (FUNCTION_FAMILY_DATE_ADD.contains(functionCall.getFunction().getName())) {
-return convertTimestampAddFunction(functionCall, columnList, fieldList);
-  } else {
-for (ResolvedExpr expr : functionCall.getArgumentList()) {
-  operands.add(convertRexNodeFromResolvedExpr(expr, columnList, fieldList));
-}
+  for (ResolvedExpr expr : functionCall.getArgumentList()) {
+operands.add(convertRexNodeFromResolvedExpr(expr, columnList, fieldList));
   }
 } else {
-  throw new UnsupportedOperationException(
-  "Does not support function group: " + functionCall.getFunction().getGroup());
+  throw new UnsupportedOperationException("Does not support function group: " + funGroup);
 }
 
 SqlOperatorRewriter rewriter =
-SqlStdOperatorMappingTable.ZETASQL_FUNCTION_TO_CALCITE_SQL_OPERATOR_REWRITER.get(
-functionCall.getFunction().getName());
+SqlStdOperatorMappingTable.ZETASQL_FUNCTION_TO_CALCITE_SQL_OPERATOR_REWRITER.get(funName);
 
 if (rewriter != null) {
-  ret = rewriter.apply(rexBuilder(), operands);
+  return rewriter.apply(rexBuilder(), operands);
 } else {
-  ret = rexBuilder().makeCall(op, operands);
-}
-return ret;
-  }
-
-  private RexNode convertTimestampAddFunction(
-  ResolvedFunctionCall functionCall,
-  List columnList,
-  List fieldList) {
-
-TimeUnit unit =
-TIME_UNIT_CASTING_MAP.get(
-((ResolvedLiteral) functionCall.getArgumentList().get(2)).getValue().getEnumValue());
-
-if ((unit == TimeUnit.MICROSECOND) || (unit == TimeUnit.NANOSECOND)) {
 
 Review comment:
   Yeah, right now it might work inconsistently. We can throw an exception 
during timestamp value conversion 
([here](https://github.com/apache/beam/blob/39c4a9fafb82dce126870a129c7848470d21006d/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlUtils.java#L186)) 
if we see microsecond precision used, but I am not sure if we should do that. 
One downside of that is we might break functions unnecessarily (e.g. 
CURRENT_TIMESTAMP). Or maybe we can leave it as it is and see if we can fully 
fix the precision problem? WDYT?
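
The trade-off raised in the review comment can be sketched in plain Java. This is an illustration under stated assumptions, not code from the pull request: it assumes the incoming timestamp is a count of microseconds since the epoch and the target representation only holds milliseconds. The class `TimestampPrecisionSketch` and both method names are hypothetical.

```java
// Hypothetical helper showing two ways to narrow microseconds to milliseconds.
final class TimestampPrecisionSketch {

  /** Strict option: refuse any value that carries sub-millisecond precision. */
  static long microsToMillisExact(long micros) {
    if (micros % 1000 != 0) {
      throw new IllegalArgumentException(
          "Timestamp has sub-millisecond precision: " + micros + " micros");
    }
    return micros / 1000;
  }

  /** Lenient option: silently discard the sub-millisecond part (rounds toward negative infinity). */
  static long microsToMillisTruncating(long micros) {
    return Math.floorDiv(micros, 1000L);
  }

  public static void main(String[] args) {
    System.out.println(microsToMillisExact(1_500_000L));
    System.out.println(microsToMillisTruncating(1_500_123L));
  }
}
```

The strict variant makes precision loss visible but would reject any value produced with microsecond resolution, which is the concern about breaking functions such as CURRENT_TIMESTAMP; the lenient variant keeps them working at the cost of silent truncation.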
 



Issue Time Tracking
---

Worklog Id: (was: 382727)
Time Spent: 2h  (was: 1h 50m)

> Support ZetaSQL TIMESTAMP functions in BeamSQL
> --
>
> Key: BEAM-9178
> URL: https://issues.apache.org/jira/browse/BEAM-9178
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Yueyang 

[jira] [Work logged] (BEAM-9240) Check for Nullability in typesEqual() method of FieldType class

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9240?focusedWorklogId=382702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382702
 ]

ASF GitHub Bot logged work on BEAM-9240:


Author: ASF GitHub Bot
Created on: 06/Feb/20 03:19
Start Date: 06/Feb/20 03:19
Worklog Time Spent: 10m 
  Work Description: rahul8383 commented on issue #10744: [BEAM-9240]: Check 
for Nullability in typesEqual() method of FieldTyp…
URL: https://github.com/apache/beam/pull/10744#issuecomment-582717462
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 382702)
Time Spent: 1h  (was: 50m)

> Check for Nullability in typesEqual() method of FieldType class
> ---
>
> Key: BEAM-9240
> URL: https://issues.apache.org/jira/browse/BEAM-9240
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.18.0
>Reporter: Rahul Patwari
>Assignee: Rahul Patwari
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {{If two schemas are created like this:}}
> {{Schema schema1 = Schema.builder().addStringField("col1").build();}}
>  {{Schema schema2 = Schema.builder().addNullableField("col1", 
> FieldType.STRING).build();}}
>  
> {{schema1.typesEqual(schema2) returns "true" even though the schemas differ 
> by Nullability}}
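
The reported behaviour can be reproduced in miniature without Beam. The sketch below uses a hypothetical `FieldTypeSketch` class (not Beam's `FieldType`) to contrast a comparison that ignores nullability with one that checks it, which is the change the issue title asks for.

```java
// Hypothetical stand-in for a schema field type; not Beam's FieldType.
final class NullabilitySketch {

  static final class FieldTypeSketch {
    final String typeName;
    final boolean nullable;

    FieldTypeSketch(String typeName, boolean nullable) {
      this.typeName = typeName;
      this.nullable = nullable;
    }

    /** Mirrors the reported bug: only the type name is compared. */
    boolean typesEqualIgnoringNullability(FieldTypeSketch other) {
      return typeName.equals(other.typeName);
    }

    /** Mirrors the requested behaviour: nullability must match as well. */
    boolean typesEqual(FieldTypeSketch other) {
      return typeName.equals(other.typeName) && nullable == other.nullable;
    }
  }

  public static void main(String[] args) {
    FieldTypeSketch required = new FieldTypeSketch("STRING", false);
    FieldTypeSketch optional = new FieldTypeSketch("STRING", true);
    System.out.println(required.typesEqualIgnoringNullability(optional));
    System.out.println(required.typesEqual(optional));
  }
}
```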





[jira] [Commented] (BEAM-6807) Implement an Azure blobstore filesystem for Python SDK

2020-02-05 Thread Aldair Coronel (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031224#comment-17031224
 ] 

Aldair Coronel commented on BEAM-6807:
--

Hello! Will this task be available for GSoC 2020? If so, I'd love to implement 
it.

> Implement an Azure blobstore filesystem for Python SDK
> --
>
> Key: BEAM-6807
> URL: https://issues.apache.org/jira/browse/BEAM-6807
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Priority: Major
>  Labels: GSoC2019, azure, azureblob, gsoc, gsoc2019, mentor, 
> outreachy19dec
>
> This is similar to BEAM-2572, but for Azure's blobstore.





[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382698
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 03:00
Start Date: 06/Feb/20 03:00
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10762: [BEAM-3453] Use 
default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#issuecomment-582713735
 
 
   The tests are triggered but the test results do not appear in the UI - I looked 
up the test logs on Jenkins builds for targets I ran...
 



Issue Time Tracking
---

Worklog Id: (was: 382698)
Time Spent: 3h  (was: 2h 50m)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.





[jira] [Work logged] (BEAM-9230) Enable CrossLanguageValidateRunner test for Spark runner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9230?focusedWorklogId=382691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382691
 ]

ASF GitHub Bot logged work on BEAM-9230:


Author: ASF GitHub Bot
Created on: 06/Feb/20 02:23
Start Date: 06/Feb/20 02:23
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10758: [BEAM-9230] 
Enable CrossLanguageValidateRunner test for Spark runner
URL: https://github.com/apache/beam/pull/10758#issuecomment-582705769
 
 
   Thanks Heejong. This is great.
   
   LGTM
 



Issue Time Tracking
---

Worklog Id: (was: 382691)
Time Spent: 2h 10m  (was: 2h)

> Enable CrossLanguageValidateRunner test for Spark runner
> 
>
> Key: BEAM-9230
> URL: https://issues.apache.org/jira/browse/BEAM-9230
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Enable CrossLanguageValidateRunner test for Spark runner





[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382689
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 02:13
Start Date: 06/Feb/20 02:13
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10762: [BEAM-3453] Use 
default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#issuecomment-582703495
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 382689)
Time Spent: 2h 50m  (was: 2h 40m)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.





[jira] [Work logged] (BEAM-9230) Enable CrossLanguageValidateRunner test for Spark runner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9230?focusedWorklogId=382688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382688
 ]

ASF GitHub Bot logged work on BEAM-9230:


Author: ASF GitHub Bot
Created on: 06/Feb/20 02:13
Start Date: 06/Feb/20 02:13
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10758: [BEAM-9230] 
Enable CrossLanguageValidateRunner test for Spark runner
URL: https://github.com/apache/beam/pull/10758#issuecomment-582703470
 
 
   Run CommunityMetrics PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 382688)
Time Spent: 2h  (was: 1h 50m)

> Enable CrossLanguageValidateRunner test for Spark runner
> 
>
> Key: BEAM-9230
> URL: https://issues.apache.org/jira/browse/BEAM-9230
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Enable CrossLanguageValidateRunner test for Spark runner





[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382686
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 02:12
Start Date: 06/Feb/20 02:12
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on issue #10762: [BEAM-3453] Use 
default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#issuecomment-582703132
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 382686)
Time Spent: 2.5h  (was: 2h 20m)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.





[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382687
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 02:12
Start Date: 06/Feb/20 02:12
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on issue #10762: [BEAM-3453] Use 
default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#issuecomment-582703189
 
 
   Run PythonLint PreCommit
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 382687)
Time Spent: 2h 40m  (was: 2.5h)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.





[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382685
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 02:11
Start Date: 06/Feb/20 02:11
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on issue #10762: [BEAM-3453] Use 
default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#issuecomment-582702982
 
 
   Retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 382685)
Time Spent: 2h 20m  (was: 2h 10m)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.





[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382684
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 02:10
Start Date: 06/Feb/20 02:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10762: [BEAM-3453] Use 
default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#issuecomment-582702732
 
 
   My last comment did not trigger tests. (@tvalentyn - how did you resolve it 
last time?)
 



Issue Time Tracking
---

Worklog Id: (was: 382684)
Time Spent: 2h 10m  (was: 2h)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.





[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382683
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 02:10
Start Date: 06/Feb/20 02:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10762: [BEAM-3453] Use 
default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#issuecomment-582702541
 
 
   ping to trigger tests.
 



Issue Time Tracking
---

Worklog Id: (was: 382683)
Time Spent: 2h  (was: 1h 50m)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.





[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382681
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 02:09
Start Date: 06/Feb/20 02:09
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10762: [BEAM-3453] Use 
default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#issuecomment-582702417
 
 
   ping to trigger tests.
 



Issue Time Tracking
---

Worklog Id: (was: 382681)
Time Spent: 1h 40m  (was: 1.5h)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.





[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382682
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 02:09
Start Date: 06/Feb/20 02:09
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10762: [BEAM-3453] Use 
default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#issuecomment-582702541
 
 
   ping to trigger tests.
 



Issue Time Tracking
---

Worklog Id: (was: 382682)
Time Spent: 1h 50m  (was: 1h 40m)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2645) Implement DisplayData translation to/from protos

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2645?focusedWorklogId=382672=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382672
 ]

ASF GitHub Bot logged work on BEAM-2645:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:42
Start Date: 06/Feb/20 01:42
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10770: [BEAM-2645] 
Define the display data model type
URL: https://github.com/apache/beam/pull/10770
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 382672)
Time Spent: 1h 10m  (was: 1h)

> Implement DisplayData translation to/from protos
> 
>
> Key: BEAM-2645
> URL: https://issues.apache.org/jira/browse/BEAM-2645
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2645) Implement DisplayData translation to/from protos

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2645?focusedWorklogId=382670=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382670
 ]

ASF GitHub Bot logged work on BEAM-2645:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:41
Start Date: 06/Feb/20 01:41
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10770: [BEAM-2645] 
Define the display data model type
URL: https://github.com/apache/beam/pull/10770#discussion_r375602370
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1205,58 +1207,58 @@ message FunctionSpec {
   bytes payload = 3;
 }
 
-// TODO: transfer javadoc here
-message DisplayData {
-
-  // (Required) The list of display data.
-  repeated Item items = 1;
-
-  // A complete identifier for a DisplayData.Item
-  message Identifier {
-
-    // (Required) The transform originating this display data.
-    string transform_id = 1;
-
-    // (Optional) The URN indicating the type of the originating transform,
-    // if there is one.
-    string transform_urn = 2;
-
-    string key = 3;
+// A set of well known URNs describing display data.
+//
+// All descriptions must contain how the value should be classified and how it
+// is encoded. Note that some types are logical types which convey contextual
+// information about the pipeline in addition to an encoding while others only
+// specify the encoding itself.
+message StandardDisplayData {
+  enum DisplayData {
+    // A string label and value. Has a payload containing an encoded
+    // LabelledStringPayload.
+    LABELLED_STRING = 0 [(beam_urn) = "beam:display_data:labelled_string:v1"];
 
 Review comment:
   It doesn't matter for these constant fields since we don't expect to use the 
enum as an actual field anywhere on a message and only use them for lookups. We 
can always change it if we need to in the future and it won't impact execution 
or existing pipelines either.
 



Issue Time Tracking
---

Worklog Id: (was: 382670)
Time Spent: 50m  (was: 40m)

> Implement DisplayData translation to/from protos
> 
>
> Key: BEAM-2645
> URL: https://issues.apache.org/jira/browse/BEAM-2645
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2645) Implement DisplayData translation to/from protos

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2645?focusedWorklogId=382671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382671
 ]

ASF GitHub Bot logged work on BEAM-2645:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:41
Start Date: 06/Feb/20 01:41
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10770: [BEAM-2645] Define 
the display data model type
URL: https://github.com/apache/beam/pull/10770#issuecomment-582695933
 
 
   This is not used in Python or Go.
 



Issue Time Tracking
---

Worklog Id: (was: 382671)
Time Spent: 1h  (was: 50m)

> Implement DisplayData translation to/from protos
> 
>
> Key: BEAM-2645
> URL: https://issues.apache.org/jira/browse/BEAM-2645
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9037) Instant and duration as logical type

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9037?focusedWorklogId=382667=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382667
 ]

ASF GitHub Bot logged work on BEAM-9037:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:38
Start Date: 06/Feb/20 01:38
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10486: 
[BEAM-9037] Instant and duration as logical type
URL: https://github.com/apache/beam/pull/10486
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 382667)
Time Spent: 4h 50m  (was: 4h 40m)

> Instant and duration as logical type 
> -
>
> Key: BEAM-9037
> URL: https://issues.apache.org/jira/browse/BEAM-9037
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The proto schema includes Timestamp and Duration with nano precision. The 
> logical types should be promoted to the core logical types, so they can be 
> handled on various IO's as standard mandatory conversions.
> This means that the logical type should not use the proto-specific Timestamp and 
> Duration but the Java 8 Instant and Duration.
> See discussion in the design document:
> [https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.9uhml95iygqr]
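
To make the nano-precision point concrete, here is a minimal Python sketch of the seconds-plus-nanos representation that both the proto Timestamp and the Java 8 Instant use. The names NanoInstant, from_epoch_nanos and to_epoch_nanos are hypothetical and are not part of Beam; the sketch only illustrates the split into whole seconds and a non-negative nanosecond remainder.

```python
from typing import NamedTuple

NANOS_PER_SECOND = 1_000_000_000


class NanoInstant(NamedTuple):
    """Hypothetical value type: whole seconds plus a 0..999,999,999 nano part."""
    seconds: int
    nanos: int


def from_epoch_nanos(total: int) -> NanoInstant:
    # divmod floors towards negative infinity, so nanos is never negative,
    # even for instants before the epoch.
    seconds, nanos = divmod(total, NANOS_PER_SECOND)
    return NanoInstant(seconds, nanos)


def to_epoch_nanos(instant: NanoInstant) -> int:
    # Inverse of from_epoch_nanos.
    return instant.seconds * NANOS_PER_SECOND + instant.nanos
```

Keeping the nano part non-negative means a pre-epoch instant such as -1 ns is stored as seconds = -1, nanos = 999999999, so the pair round-trips without loss.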



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9037) Instant and duration as logical type

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9037?focusedWorklogId=382666=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382666
 ]

ASF GitHub Bot logged work on BEAM-9037:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:38
Start Date: 06/Feb/20 01:38
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10486: [BEAM-9037] 
Instant and duration as logical type
URL: https://github.com/apache/beam/pull/10486#issuecomment-582695162
 
 
   Sorry for the delay, I was out of the office for a few days. Merging now!
 



Issue Time Tracking
---

Worklog Id: (was: 382666)
Time Spent: 4h 40m  (was: 4.5h)

> Instant and duration as logical type 
> -
>
> Key: BEAM-9037
> URL: https://issues.apache.org/jira/browse/BEAM-9037
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The proto schema includes Timestamp and Duration with nano precision. The 
> logical types should be promoted to the core logical types, so they can be 
> handled on various IO's as standard mandatory conversions.
> This means that the logical type should not use the proto-specific Timestamp and 
> Duration but the Java 8 Instant and Duration.
> See discussion in the design document:
> [https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit#heading=h.9uhml95iygqr]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5391) Use repeated fields for model for ordering of input/output/etc maps

2020-02-05 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031181#comment-17031181
 ] 

Luke Cwik commented on BEAM-5391:
-

The plan is to document how names are munged by runners as described in 
BEAM-3203 and ensure that runners do this. This should allow for SDKs to use 
schemes like i0, i1, ... without needing to worry about the runner breaking 
this except for a few expansions which will be documented.
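
As a rough illustration of the i0, i1, ... scheme mentioned above, the following Python sketch encodes positional order into tags and recovers it again. The helper names positional_tags and ordered_inputs are hypothetical, and the sketch assumes the runner leaves the tags untouched, which is exactly the guarantee the comment says still needs to be documented.

```python
import re
from typing import Dict, List

# Matches tags of the form "i0", "i1", ... and captures the index.
_TAG = re.compile(r"^i(\d+)$")


def positional_tags(count: int) -> List[str]:
    """Generate sequential tags for `count` positional side inputs."""
    return ["i%d" % i for i in range(count)]


def ordered_inputs(inputs: Dict[str, str]) -> List[str]:
    """Return the mapped values ordered by the index embedded in each tag."""
    indexed = []
    for tag, pcollection_id in inputs.items():
        match = _TAG.match(tag)
        if match is None:
            raise ValueError("unexpected side input tag: %r" % tag)
        indexed.append((int(match.group(1)), pcollection_id))
    # Sort numerically so that "i10" comes after "i2".
    return [pcollection_id for _, pcollection_id in sorted(indexed)]
```

Sorting on the parsed integer rather than on the tag string matters once there are ten or more inputs, since "i10" sorts before "i2" lexicographically.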

> Use repeated fields for model for ordering of input/output/etc maps
> ---
>
> Key: BEAM-5391
> URL: https://issues.apache.org/jira/browse/BEAM-5391
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Go SDK uses positional side input tagging and needs the ordering. It 
> cannot as easily encode it into a sequential tag ("i0", "i1", ..) because 
> runners muck with the naming when doing certain optimizations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=382665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382665
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:36
Start Date: 06/Feb/20 01:36
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10593: [BEAM-7746] Add 
typing for try_split
URL: https://github.com/apache/beam/pull/10593#issuecomment-582694781
 
 
   Fantastic.  That's what I was hoping for.   Once some of the outstanding 
typing PRs get merged the noise in the mypy output should be greatly reduced, 
so that should make things easier. 
 



Issue Time Tracking
---

Worklog Id: (was: 382665)
Time Spent: 60h 50m  (was: 60h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 60h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  
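
For readers unfamiliar with PEP 484, a small sketch of what such annotations look like follows. The functions first_positive and receive are hypothetical examples and are not taken from the Beam code base; they only show how a static analyzer like mypy can distinguish a function that may return a value from one that never does.

```python
from typing import Iterable, List, Optional


def first_positive(values: Iterable[int]) -> Optional[int]:
    """Return the first value greater than zero, or None if there is none."""
    for value in values:
        if value > 0:
            return value
    return None


def receive(buffer: List[int], value: int) -> None:
    """Append a value to the buffer; annotated as returning nothing."""
    buffer.append(value)
```

With these annotations, a checker would be expected to flag code that uses the result of receive() or that treats the result of first_positive() as an int without handling None.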



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-5391) Use repeated fields for model for ordering of input/output/etc maps

2020-02-05 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-5391.
-
Fix Version/s: Not applicable
 Assignee: Luke Cwik
   Resolution: Won't Fix

There isn't enough value to do this. Also, no progress has been made in 1.5 
years, and we are working towards releasing a stable version across Beam versions.

> Use repeated fields for model for ordering of input/output/etc maps
> ---
>
> Key: BEAM-5391
> URL: https://issues.apache.org/jira/browse/BEAM-5391
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Go SDK uses positional side input tagging and needs the ordering. It 
> cannot as easily encode it into a sequential tag ("i0", "i1", ..) because 
> runners muck with the naming when doing certain optimizations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-3223) PTransform spec should not reuse FunctionSpec

2020-02-05 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik closed BEAM-3223.
---
Fix Version/s: Not applicable
 Assignee: Luke Cwik
   Resolution: Won't Fix

There isn't enough value to do this. Also, no progress has been made in 2+ years, 
and we are working towards releasing a stable version across Beam versions.

> PTransform spec should not reuse FunctionSpec
> -
>
> Key: BEAM-3223
> URL: https://issues.apache.org/jira/browse/BEAM-3223
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>
> We should add a new type instead, TransformSpec, say, or just inline a URN 
> and payload. It's confusing otherwise.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-6655) Support progress reporting over FnAPI

2020-02-05 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik closed BEAM-6655.
---
Fix Version/s: Not applicable
   Resolution: Duplicate

> Support progress reporting over FnAPI
> -
>
> Key: BEAM-6655
> URL: https://issues.apache.org/jira/browse/BEAM-6655
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-harness, sdk-py-harness
>Reporter: Robert Bradshaw
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-02-05 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-8271.
-
Fix Version/s: 2.20.0
   Resolution: Fixed

Based on the comments in the PR, it seems we agreed to keep the 
field as bytes and fixed the type hints to appropriately account for the field 
being bytes.

> StateGetRequest/Response continuation_token should be string
> 
>
> Key: BEAM-8271
> URL: https://issues.apache.org/jira/browse/BEAM-8271
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I've been working on adding typing to the Python code and I came across a 
> discrepancy regarding the type of the continuation token.  The .proto 
> defines it as bytes, but the code treats it as a string (i.e. unicode):
>  
> {code:java}
> // A request to get state.
> message StateGetRequest {
>   // (Optional) If specified, signals to the runner that the response
>   // should resume from the following continuation token.
>   //
>   // If unspecified, signals to the runner that the response should start
>   // from the beginning of the logical continuable stream.
>   bytes continuation_token = 1;
> }
> // A response to get state representing a logical byte stream which can be
> // continued using the state API.
> message StateGetResponse {
>   // (Optional) If specified, represents a token which can be used with the
>   // state API to get the next chunk of this logical byte stream. The end of
>   // the logical byte stream is signalled by this field being unset.
>   bytes continuation_token = 1;
>   // Represents a part of a logical byte stream. Elements within
>   // the logical byte stream are encoded in the nested context and
>   // concatenated together.
>   bytes data = 2;
> } 
> {code}
> From FnApiRunner.StateServicer:
> {code:python}
> def blocking_get(self, state_key, continuation_token=None):
>   with self._lock:
>     full_state = self._state[self._to_key(state_key)]
>     if self._use_continuation_tokens:
>       # The token is "nonce:index".
>       if not continuation_token:
>         token_base = 'token_%x' % len(self._continuations)
>         self._continuations[token_base] = tuple(full_state)
>         return b'', '%s:0' % token_base
>       else:
>         token_base, index = continuation_token.split(':')
>         ix = int(index)
>         full_state = self._continuations[token_base]
>         if ix == len(full_state):
>           return b'', None
>         else:
>           return full_state[ix], '%s:%d' % (token_base, ix + 1)
>     else:
>       assert not continuation_token
>       return b''.join(full_state), None
> {code}
> {code}
> This could be a problem in python3.  
> All other id values are string, whereas bytes is reserved for data, so I 
> think that the proto should be changed to string. 
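
For comparison, here is a minimal sketch of the same paging logic with the token handled as bytes throughout, which matches the proto field type. The class InMemoryState and its append method are hypothetical stand-ins and are not Beam's actual StateServicer; the sketch only shows that the "nonce:index" token can be built and parsed without ever becoming a str.

```python
import threading
from typing import Dict, List, Optional, Tuple


class InMemoryState:
    """Hypothetical stand-in for a state servicer; not Beam's actual class."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: Dict[bytes, List[bytes]] = {}
        self._continuations: Dict[bytes, Tuple[bytes, ...]] = {}

    def append(self, key: bytes, chunk: bytes) -> None:
        with self._lock:
            self._state.setdefault(key, []).append(chunk)

    def blocking_get(
        self, key: bytes, continuation_token: Optional[bytes] = None
    ) -> Tuple[bytes, Optional[bytes]]:
        with self._lock:
            if not continuation_token:
                # First call: snapshot the state and hand out token "nonce:0".
                token_base = b'token_%x' % len(self._continuations)
                self._continuations[token_base] = tuple(self._state.get(key, ()))
                return b'', b'%s:0' % token_base
            # Later calls: the token is b"nonce:index", parsed as bytes.
            token_base, index = continuation_token.split(b':')
            ix = int(index)
            chunks = self._continuations[token_base]
            if ix == len(chunks):
                return b'', None
            return chunks[ix], b'%s:%d' % (token_base, ix + 1)
```

A caller would keep passing the returned token back in until it comes back as None, concatenating the data chunks along the way. Bytes formatting with % requires Python 3.5 or later.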



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=382662=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382662
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:22
Start Date: 06/Feb/20 01:22
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #10785: [BEAM-8630] 
Add logical types, make public
URL: https://github.com/apache/beam/pull/10785
 
 
   Found some missing type translation when wiring 
`beamRowToZetaSqlStructValue` and `createZetaSqlStructTypeFromBeamSchema` up to 
ZetaSQL compliance tests.
   
   
   

[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=382663=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382663
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:22
Start Date: 06/Feb/20 01:22
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #10785: [BEAM-8630] Add 
logical types, make public
URL: https://github.com/apache/beam/pull/10785#issuecomment-582691458
 
 
   R: @robinyqiu 
 



Issue Time Tracking
---

Worklog Id: (was: 382663)
Time Spent: 10h  (was: 9h 50m)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 10h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8172) Add a label field to FunctionSpec proto

2020-02-05 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031172#comment-17031172
 ] 

Luke Cwik edited comment on BEAM-8172 at 2/6/20 1:19 AM:
-

The display_data on the PTransform is specifically meant to enable this. Please 
use that instead. We can add display_data for other types where it would be 
useful.


was (Author: lcwik):
The display_data on the PTransform is specifically meant to enable this. Please 
use that instead.

> Add a label field to FunctionSpec proto
> ---
>
> Key: BEAM-8172
> URL: https://issues.apache.org/jira/browse/BEAM-8172
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
> Fix For: Not applicable
>
>
> The FunctionSpec payload is opaque outside of the native environment, which 
> can make debugging pipeline protos difficult.  It would be very useful if the 
> FunctionSpec had an optional human readable "label", for debugging and 
> presenting in UIs and error messages.
> For example, in python, if the payload is an instance of a class, we could 
> attempt to provide a string that represents the dotted path to that class, 
> "mymodule.MyClass".  In the case of coders, we could use the label to hold 
> the type hint:  "Optional[Iterable[mymodule.MyClass]]".
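
The dotted-path label described above could be derived roughly as follows. This is only a sketch: debug_label is a hypothetical helper, not an existing Beam function, and it assumes the payload is either a class or an instance of one.

```python
from typing import Any


def debug_label(payload: Any) -> str:
    """Return a 'module.ClassName' style label for a class or an instance."""
    cls = payload if isinstance(payload, type) else type(payload)
    # __qualname__ also covers nested classes, e.g. "Outer.Inner".
    return "%s.%s" % (cls.__module__, cls.__qualname__)
```

For an instance of a class MyClass defined in mymodule, this would yield "mymodule.MyClass", the form given in the description.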



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8172) Add a label field to FunctionSpec proto

2020-02-05 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031172#comment-17031172
 ] 

Luke Cwik edited comment on BEAM-8172 at 2/6/20 1:18 AM:
-

The display_data on the PTransform is specifically meant to enable this. Please 
use that instead.


was (Author: lcwik):
The display_data on the PTransform is specifically meant to enable this.

> Add a label field to FunctionSpec proto
> ---
>
> Key: BEAM-8172
> URL: https://issues.apache.org/jira/browse/BEAM-8172
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>
> The FunctionSpec payload is opaque outside of the native environment, which 
> can make debugging pipeline protos difficult.  It would be very useful if the 
> FunctionSpec had an optional human readable "label", for debugging and 
> presenting in UIs and error messages.
> For example, in python, if the payload is an instance of a class, we could 
> attempt to provide a string that represents the dotted path to that class, 
> "mymodule.MyClass".  In the case of coders, we could use the label to hold 
> the type hint:  "Optional[Iterable[mymodule.MyClass]]".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8172) Add a label field to FunctionSpec proto

2020-02-05 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-8172.
-
Fix Version/s: Not applicable
   Resolution: Won't Fix

> Add a label field to FunctionSpec proto
> ---
>
> Key: BEAM-8172
> URL: https://issues.apache.org/jira/browse/BEAM-8172
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
> Fix For: Not applicable
>
>
> The FunctionSpec payload is opaque outside of the native environment, which 
> can make debugging pipeline protos difficult.  It would be very useful if the 
> FunctionSpec had an optional human readable "label", for debugging and 
> presenting in UIs and error messages.
> For example, in python, if the payload is an instance of a class, we could 
> attempt to provide a string that represents the dotted path to that class, 
> "mymodule.MyClass".  In the case of coders, we could use the label to hold 
> the type hint:  "Optional[Iterable[mymodule.MyClass]]".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8172) Add a label field to FunctionSpec proto

2020-02-05 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031172#comment-17031172
 ] 

Luke Cwik commented on BEAM-8172:
-

The display_data on the PTransform is specifically meant to enable this.

> Add a label field to FunctionSpec proto
> ---
>
> Key: BEAM-8172
> URL: https://issues.apache.org/jira/browse/BEAM-8172
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>
> The FunctionSpec payload is opaque outside of the native environment, which 
> can make debugging pipeline protos difficult.  It would be very useful if the 
> FunctionSpec had an optional human readable "label", for debugging and 
> presenting in UIs and error messages.
> For example, in python, if the payload is an instance of a class, we could 
> attempt to provide a string that represents the dotted path to that class, 
> "mymodule.MyClass".  In the case of coders, we could use the label to hold 
> the type hint:  "Optional[Iterable[mymodule.MyClass]]".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3221) Model pipeline representation improvements

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3221?focusedWorklogId=382656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382656
 ]

ASF GitHub Bot logged work on BEAM-3221:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:13
Start Date: 06/Feb/20 01:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10779: [BEAM-3221] Clarify 
documentation for StandardTransforms.Primitives, Pipeline, and PTransform.
URL: https://github.com/apache/beam/pull/10779#issuecomment-582689320
 
 
   R: @TheNeuralBit 
 



Issue Time Tracking
---

Worklog Id: (was: 382656)
Time Spent: 1h 40m  (was: 1.5h)

> Model pipeline representation improvements
> --
>
> Key: BEAM-3221
> URL: https://issues.apache.org/jira/browse/BEAM-3221
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Collection of various (breaking) tweaks to the Runner API, notably the 
> pipeline representation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=382647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382647
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:05
Start Date: 06/Feb/20 01:05
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10784: [BEAM-7746] 
DoFnRunner.receive() does not return a value
URL: https://github.com/apache/beam/pull/10784
 
 
   This must be some cruft left over from a refactor, because I can't see any 
way that `receive` could possibly return a value. 
   
   
   

[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382655
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:12
Start Date: 06/Feb/20 01:12
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10769: [BEAM-8889] Upgrades 
gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582688942
 
 
   Try upgrading Guava to the version that has the missing methods, then get 
(potential) errors from tests and from beam-linkage-check.sh
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382655)
Remaining Estimate: 159h 10m  (was: 159h 20m)
Time Spent: 8h 50m  (was: 8h 40m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 8h 50m
>  Remaining Estimate: 159h 10m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class for accessing Google Cloud Storage in Apache Beam. The 
> current implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
> , an abstract class that provides basic IO capability and eventually 
> creates the channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to 
> be more flexible because it can easily pick up new changes, e.g. a new 
> channel implementation using a new protocol, without code change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=382651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382651
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:07
Start Date: 06/Feb/20 01:07
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10745: [BEAM-9219] 
Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745#issuecomment-582687779
 
 
   beam_PreCommit_Website_Stage_GCS
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382651)
Time Spent: 6h 10m  (was: 6h)

> Streamline creation of Python and Java dependencies pages
> -
>
> Key: BEAM-9219
> URL: https://issues.apache.org/jira/browse/BEAM-9219
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: David Wrede
>Priority: Minor
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> This issue is about keeping the Python and Java SDK dependency pages 
> relevant and up-to-date while reducing the time it takes to provide that 
> information. The current method of scraping and 
> copying dependencies into a table for every release is a non-trivial task 
> because of the semi-automated workflows done by the tech writers on the 
> website.
> In an effort to provide accurate dependency listings that are always in sync 
> with SDK releases, referring people to the appropriate places in the source 
> code (or through CLI commands) should provide people the information they are 
> looking for and not require the creation and maintenance of an automated 
> tooling solution to generate the dependency tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=382650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382650
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:07
Start Date: 06/Feb/20 01:07
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10745: [BEAM-9219] 
Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745#issuecomment-582687688
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382650)
Time Spent: 6h  (was: 5h 50m)

> Streamline creation of Python and Java dependencies pages
> -
>
> Key: BEAM-9219
> URL: https://issues.apache.org/jira/browse/BEAM-9219
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: David Wrede
>Priority: Minor
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> This issue is about keeping the Python and Java SDK dependency pages 
> relevant and up-to-date while reducing the time it takes to provide that 
> information. The current method of scraping and 
> copying dependencies into a table for every release is a non-trivial task 
> because of the semi-automated workflows done by the tech writers on the 
> website.
> In an effort to provide accurate dependency listings that are always in sync 
> with SDK releases, referring people to the appropriate places in the source 
> code (or through CLI commands) should provide people the information they are 
> looking for and not require the creation and maintenance of an automated 
> tooling solution to generate the dependency tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=382653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382653
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:07
Start Date: 06/Feb/20 01:07
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10745: [BEAM-9219] 
Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745#issuecomment-582687812
 
 
   beam_PreCommit_Website
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382653)
Time Spent: 6h 20m  (was: 6h 10m)

> Streamline creation of Python and Java dependencies pages
> -
>
> Key: BEAM-9219
> URL: https://issues.apache.org/jira/browse/BEAM-9219
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: David Wrede
>Priority: Minor
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> This issue is about keeping the Python and Java SDK dependency pages 
> relevant and up-to-date while reducing the time it takes to provide that 
> information. The current method of scraping and 
> copying dependencies into a table for every release is a non-trivial task 
> because of the semi-automated workflows done by the tech writers on the 
> website.
> In an effort to provide accurate dependency listings that are always in sync 
> with SDK releases, referring people to the appropriate places in the source 
> code (or through CLI commands) should provide people the information they are 
> looking for and not require the creation and maintenance of an automated 
> tooling solution to generate the dependency tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=382648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382648
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:05
Start Date: 06/Feb/20 01:05
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10784: [BEAM-7746] 
DoFnRunner.receive() does not return a value
URL: https://github.com/apache/beam/pull/10784#issuecomment-582687283
 
 
   R: @udim 
   R: @robertwb 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382648)
Time Spent: 60h 40m  (was: 60.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 60h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382646
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:03
Start Date: 06/Feb/20 01:03
Worklog Time Spent: 10m 
  Work Description: veblush commented on issue #10769: [BEAM-8889] Upgrades 
gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582686709
 
 
   @suztomo I'm curious about that, too. @chamikaramj might know the answer?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382646)
Remaining Estimate: 159h 20m  (was: 159.5h)
Time Spent: 8h 40m  (was: 8.5h)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 8h 40m
>  Remaining Estimate: 159h 20m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class for accessing Google Cloud Storage in Apache Beam. The 
> current implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
> , an abstract class that provides basic IO capability and eventually 
> creates the channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to 
> be more flexible because it can easily pick up new changes, e.g. a new 
> channel implementation using a new protocol, without code change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382645
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 01:02
Start Date: 06/Feb/20 01:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10762: [BEAM-3453] Use 
default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#issuecomment-582686497
 
 
   LGTM, thank you. @angoenka do you have additional comments?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382645)
Time Spent: 1.5h  (was: 1h 20m)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382643
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:57
Start Date: 06/Feb/20 00:57
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10769: [BEAM-8889] Upgrades 
gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582685311
 
 
   @veblush What prevents us from upgrading Guava? (Last time I checked it was 
Cassandra.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382643)
Remaining Estimate: 159.5h  (was: 159h 40m)
Time Spent: 8.5h  (was: 8h 20m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 8.5h
>  Remaining Estimate: 159.5h
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class for accessing Google Cloud Storage in Apache Beam. The 
> current implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
> , an abstract class that provides basic IO capability and eventually 
> creates the channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to 
> be more flexible because it can easily pick up new changes, e.g. a new 
> channel implementation using a new protocol, without code change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5504) PubsubAvroTable

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5504?focusedWorklogId=382641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382641
 ]

ASF GitHub Bot logged work on BEAM-5504:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:49
Start Date: 06/Feb/20 00:49
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #10487: [BEAM-5504] 
Introduce PubsubAvroTable
URL: https://github.com/apache/beam/pull/10487#discussion_r375588906
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/AvroPubsubMessageToRow.java
 ##
 @@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.pubsub;
+
+import com.google.auto.value.AutoValue;
+import java.io.Serializable;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.utils.AvroUtils;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.sdk.values.TupleTagList;
+import org.joda.time.Instant;
+
+/** Read side converter for {@link PubsubMessage} with Avro payload. */
+@Internal
+@Experimental
+@AutoValue
+public abstract class AvroPubsubMessageToRow extends PubsubMessageToRow 
implements Serializable {
+
+  @Override
+  public PCollectionTuple expand(PCollection input) {
+PCollectionTuple rows =
+input.apply(
+ParDo.of(
+useFlatSchema()
+? new FlatSchemaPubsubMessageToRow(messageSchema(), 
useDlq())
+: new NestedSchemaPubsubMessageToRow(messageSchema(), 
useDlq()))
+.withOutputTags(
+MAIN_TAG, useDlq() ? TupleTagList.of(DLQ_TAG) : 
TupleTagList.empty()));
+return rows;
+  }
+
+  public static Builder builder() {
+return new AutoValue_AvroPubsubMessageToRow.Builder();
+  }
+
+  @Internal
+  private static class FlatSchemaPubsubMessageToRow extends 
DoFn {
+
+private final Schema messageSchema;
+
+private final boolean useDlq;
+
+protected FlatSchemaPubsubMessageToRow(Schema messageSchema, boolean 
useDlq) {
+  this.messageSchema = messageSchema;
+  this.useDlq = useDlq;
+}
+
+private GenericRecord parsePayload(PubsubMessage pubsubMessage) {
+  byte[] avroPayload = pubsubMessage.getPayload();
+
+  // Construct payload flat schema.
+  Schema payloadSchema =
+  new Schema(
+  messageSchema.getFields().stream()
+  .filter(field -> !TIMESTAMP_FIELD.equals(field.getName()))
+  .collect(Collectors.toList()));
+  org.apache.avro.Schema avroSchema = 
AvroUtils.toAvroSchema(payloadSchema);
+  return AvroUtils.toGenericRecord(avroPayload, avroSchema);
+}
+
+private Object getValuedForFieldFlatSchema(Field field, Instant timestamp, 
Row payload) {
+  String fieldName = field.getName();
+  if (TIMESTAMP_FIELD.equals(fieldName)) {
+return timestamp;
+  } else {
+return payload.getValue(fieldName);
+  }
+}
+
+@ProcessElement
+public void processElement(ProcessContext context) {
+  try {
+GenericRecord record = parsePayload(context.element());
+System.out.println(record);
+Row row = AvroUtils.toBeamRowStrict(record, null);
+List values =
+messageSchema.getFields().stream()
+.map(field -> getValuedForFieldFlatSchema(field, 
context.timestamp(), row))
+.collect(Collectors.toList());
+

[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=382642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382642
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:49
Start Date: 06/Feb/20 00:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10593: [BEAM-7746] Add 
typing for try_split
URL: https://github.com/apache/beam/pull/10593#issuecomment-582683585
 
 
   OK, I'm going to merge this and let @boyuanzz build on top of it. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382642)
Time Spent: 60h 20m  (was: 60h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 60h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5504) PubsubAvroTable

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5504?focusedWorklogId=382640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382640
 ]

ASF GitHub Bot logged work on BEAM-5504:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:48
Start Date: 06/Feb/20 00:48
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #10487: [BEAM-5504] 
Introduce PubsubAvroTable
URL: https://github.com/apache/beam/pull/10487#discussion_r375588906
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/AvroPubsubMessageToRow.java
 ##
 @@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.pubsub;
+
+import com.google.auto.value.AutoValue;
+import java.io.Serializable;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Internal;
+import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.Field;
+import org.apache.beam.sdk.schemas.utils.AvroUtils;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionTuple;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.sdk.values.TupleTagList;
+import org.joda.time.Instant;
+
+/** Read side converter for {@link PubsubMessage} with Avro payload. */
+@Internal
+@Experimental
+@AutoValue
+public abstract class AvroPubsubMessageToRow extends PubsubMessageToRow 
implements Serializable {
+
+  @Override
+  public PCollectionTuple expand(PCollection input) {
+PCollectionTuple rows =
+input.apply(
+ParDo.of(
+useFlatSchema()
+? new FlatSchemaPubsubMessageToRow(messageSchema(), 
useDlq())
+: new NestedSchemaPubsubMessageToRow(messageSchema(), 
useDlq()))
+.withOutputTags(
+MAIN_TAG, useDlq() ? TupleTagList.of(DLQ_TAG) : 
TupleTagList.empty()));
+return rows;
+  }
+
+  public static Builder builder() {
+return new AutoValue_AvroPubsubMessageToRow.Builder();
+  }
+
+  @Internal
+  private static class FlatSchemaPubsubMessageToRow extends 
DoFn {
+
+private final Schema messageSchema;
+
+private final boolean useDlq;
+
+protected FlatSchemaPubsubMessageToRow(Schema messageSchema, boolean 
useDlq) {
+  this.messageSchema = messageSchema;
+  this.useDlq = useDlq;
+}
+
+private GenericRecord parsePayload(PubsubMessage pubsubMessage) {
+  byte[] avroPayload = pubsubMessage.getPayload();
+
+  // Construct payload flat schema.
+  Schema payloadSchema =
+  new Schema(
+  messageSchema.getFields().stream()
+  .filter(field -> !TIMESTAMP_FIELD.equals(field.getName()))
+  .collect(Collectors.toList()));
+  org.apache.avro.Schema avroSchema = AvroUtils.toAvroSchema(payloadSchema);
+  return AvroUtils.toGenericRecord(avroPayload, avroSchema);
+}
+
+private Object getValuedForFieldFlatSchema(Field field, Instant timestamp, Row payload) {
+  String fieldName = field.getName();
+  if (TIMESTAMP_FIELD.equals(fieldName)) {
+return timestamp;
+  } else {
+return payload.getValue(fieldName);
+  }
+}
+
+@ProcessElement
+public void processElement(ProcessContext context) {
+  try {
+GenericRecord record = parsePayload(context.element());
+System.out.println(record);
+Row row = AvroUtils.toBeamRowStrict(record, null);
+List values =
+messageSchema.getFields().stream()
+.map(field -> getValuedForFieldFlatSchema(field, context.timestamp(), row))
+.collect(Collectors.toList());
+

[jira] [Work logged] (BEAM-5504) PubsubAvroTable

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5504?focusedWorklogId=382639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382639
 ]

ASF GitHub Bot logged work on BEAM-5504:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:47
Start Date: 06/Feb/20 00:47
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #10487: [BEAM-5504] 
Introduce PubsubAvroTable
URL: https://github.com/apache/beam/pull/10487#discussion_r375588508
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -339,6 +350,35 @@ public static Schema toBeamSchema(org.apache.avro.Schema schema) {
 return toAvroSchema(beamSchema, null, null);
   }
 
+  /** Convert a {@link GenericRecord} to a corresponding array of bytes. */
+  public static byte[] toBytes(GenericRecord record) {
+org.apache.avro.Schema schema = record.getSchema();
+DatumWriter datumWriter = new GenericDatumWriter<>(schema);
+try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
+  Encoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
+  datumWriter.write(record, encoder);
+  encoder.flush();
+  return outputStream.toByteArray();
+} catch (IOException exception) {
+  throw new RuntimeException("Failed to serialize generic record", exception);
+}
+  }
+
+  /** Convert an array of bytes to a {@link GenericRecord} with the target schema. */
+  public static GenericRecord toGenericRecord(byte[] bytes, org.apache.avro.Schema schema) {
+DatumReader datumReader = new GenericDatumReader<>(schema);
+try (InputStream inputStream = new SeekableByteArrayInput(bytes)) {
+  Decoder decoder = DecoderFactory.get().binaryDecoder(inputStream, null);
+  GenericRecord record = datumReader.read(null, decoder);
+  if (record != null) {
+return record;
+  }
+} catch (IOException exception) {
+  throw new RuntimeException("Failed to extract the record from the payload.", exception);
+}
+throw new RuntimeException("No record is extracted from the payload");
+  }
+
 
 Review comment:
   I think `AvroUtils.java` is a good place. Coder does not seem to be used as a 
util class. 
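
For readers unfamiliar with what `toBytes` and `toGenericRecord` exchange: Avro's binary encoding carries no field names or types, which is why the decoding side needs the schema. Below is a minimal, standard-library-only Python sketch of that idea for a record with one `long` and one `string` field. It does not use the Avro library or Beam's `AvroUtils`; the function names and the two-field schema are hypothetical, and only the zigzag varint and length-prefixed string rules are taken from the Avro specification.

```python
import io

# Hypothetical schema for illustration. Field order matters because the
# binary form contains values only, with no field names.
SCHEMA = [("id", "long"), ("name", "string")]


def encode_long(n):
    """Encode a 64-bit signed integer as an Avro zigzag varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes map to small unsigned values
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(stream):
    """Read one zigzag varint from a binary stream."""
    shift = 0
    z = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("payload ended inside a varint")
        z |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)  # undo zigzag


def to_bytes(record):
    """Serialize a dict by writing its fields in schema order."""
    out = bytearray()
    for name, kind in SCHEMA:
        if kind == "long":
            out += encode_long(record[name])
        else:  # "string": byte length as a long, then the UTF-8 bytes
            data = record[name].encode("utf-8")
            out += encode_long(len(data)) + data
    return bytes(out)


def to_record(payload):
    """Deserialize a payload; the schema tells us how to read each field."""
    stream = io.BytesIO(payload)
    record = {}
    for name, kind in SCHEMA:
        if kind == "long":
            record[name] = decode_long(stream)
        else:
            length = decode_long(stream)
            record[name] = stream.read(length).decode("utf-8")
    return record


original = {"id": 12345, "name": "beam"}
assert to_record(to_bytes(original)) == original
```

Decoding the same bytes with a different field order or type list would silently yield a different record, which is one reason to pass the target schema explicitly, as `toGenericRecord` does.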
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382639)
Time Spent: 3h 50m  (was: 3h 40m)

> PubsubAvroTable
> ---
>
> Key: BEAM-5504
> URL: https://issues.apache.org/jira/browse/BEAM-5504
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Jing Chen
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382636
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:41
Start Date: 06/Feb/20 00:41
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10769: [BEAM-8889] 
Upgrades gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582681712
 
 
   Run Java PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 382636)
Remaining Estimate: 159h 40m  (was: 159h 50m)
Time Spent: 8h 20m  (was: 8h 10m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 8h 20m
>  Remaining Estimate: 159h 40m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class for accessing Google Cloud Storage in Apache Beam. The current 
> implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data, rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java],
>  an abstract class that provides basic IO capability and ultimately 
> creates the channel objects. This request is to update GcsUtil to use 
> GoogleCloudStorage to create read and write channels. This is expected to be 
> more flexible because GcsUtil can then pick up new changes, e.g. a new channel 
> implementation using a new protocol, without code changes.





[jira] [Work logged] (BEAM-9250) Improve beam release script based on 2.19.0 release experience

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9250?focusedWorklogId=382637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382637
 ]

ASF GitHub Bot logged work on BEAM-9250:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:41
Start Date: 06/Feb/20 00:41
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #10782: [BEAM-9250] 
Update verify_release_build script to run python tests with dev version.
URL: https://github.com/apache/beam/pull/10782
 
 
   Update script to run python dataflow batch tests with dev version.
   Example: https://github.com/apache/beam/pull/10781/files
   
   R: @aaltay 
   
   
   

[jira] [Work logged] (BEAM-5504) PubsubAvroTable

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5504?focusedWorklogId=382633=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382633
 ]

ASF GitHub Bot logged work on BEAM-5504:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:38
Start Date: 06/Feb/20 00:38
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #10487: [BEAM-5504] 
Introduce PubsubAvroTable
URL: https://github.com/apache/beam/pull/10487#issuecomment-582681127
 
 
   Sorry, I was busy with something else and didn't review this PR in time. 
Many thanks to @TheNeuralBit for jumping in to review it!
 



Issue Time Tracking
---

Worklog Id: (was: 382633)
Time Spent: 3h 40m  (was: 3.5h)

> PubsubAvroTable
> ---
>
> Key: BEAM-5504
> URL: https://issues.apache.org/jira/browse/BEAM-5504
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Jing Chen
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=382631=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382631
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:35
Start Date: 06/Feb/20 00:35
Worklog Time Spent: 10m 
  Work Description: davidwrede commented on pull request #10745: 
[BEAM-9219] Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745#discussion_r375585355
 
 

 ##
 File path: website/src/documentation/sdks/java-dependencies.md
 ##
 @@ -26,336 +26,24 @@ behavior in the service. If you are using any of these 
packages in your code, be
 aware that some libraries are not forward-compatible and you may need to pin to
 the listed versions that will be in scope during execution.
 
-To see the compile and runtime dependencies for your Beam SDK version, 
expand
-the relevant section below.
+Compile and runtime dependencies for your Beam SDK version are listed in 
`BeamModulePlugin.groovy` in the Beam repository. To view them, perform the 
following steps:
 
-2.9.0
+1. Open `BeamModulePlugin.groovy`.
 
-Beam SDK for Java 2.9.0 has the following compile and runtime 
dependencies.
+```
+
https://raw.githubusercontent.com/apache/beam/v/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 
 Review comment:
   I don't think the disambiguation is required here given the text and example 
below the code block, but since you're the second person to bring it up, I'll 
add the change in. :)
   
 



Issue Time Tracking
---

Worklog Id: (was: 382631)
Time Spent: 5h 50m  (was: 5h 40m)

> Streamline creation of Python and Java dependencies pages
> -
>
> Key: BEAM-9219
> URL: https://issues.apache.org/jira/browse/BEAM-9219
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: David Wrede
>Priority: Minor
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> This issue is about the need to address keeping both Python and Java SDK 
> dependency pages more relevant and up-to-date while reducing the amount of 
> time it takes to provide that information. The current method of scraping and 
> copying dependencies into a table for every release is a non-trivial task 
> because of the semi-automated workflows done by the tech writers on the 
> website.
> In an effort to provide accurate dependency listings that are always in sync 
> with SDK releases, referring people to the appropriate places in the source 
> code (or through CLI commands) should provide people the information they are 
> looking for and not require the creation and maintenance of an automated 
> tooling solution to generate the dependency tables.





[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382628
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:32
Start Date: 06/Feb/20 00:32
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #10769: [BEAM-8889] 
Upgrades gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582679619
 
 
   It seems the commands are not working. Let's wait a bit to see whether the 
latency is just high.
 



Issue Time Tracking
---

Worklog Id: (was: 382628)
Remaining Estimate: 159h 50m  (was: 160h)
Time Spent: 8h 10m  (was: 8h)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 8h 10m
>  Remaining Estimate: 159h 50m
>





[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382624=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382624
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:29
Start Date: 06/Feb/20 00:29
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #10769: [BEAM-8889] 
Upgrades gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582679104
 
 
   Run Java PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 382624)
Remaining Estimate: 160h  (was: 160h 10m)
Time Spent: 8h  (was: 7h 50m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 8h
>  Remaining Estimate: 160h
>





[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382623=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382623
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:29
Start Date: 06/Feb/20 00:29
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #10769: [BEAM-8889] 
Upgrades gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582679038
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 382623)
Remaining Estimate: 160h 10m  (was: 160h 20m)
Time Spent: 7h 50m  (was: 7h 40m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 7h 50m
>  Remaining Estimate: 160h 10m
>





[jira] [Work logged] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?focusedWorklogId=382622=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382622
 ]

ASF GitHub Bot logged work on BEAM-8271:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:23
Start Date: 06/Feb/20 00:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10595: [BEAM-8271] 
Properly encode/decode StateGetRequest/Response continuation token
URL: https://github.com/apache/beam/pull/10595
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 382622)
Time Spent: 1.5h  (was: 1h 20m)

> StateGetRequest/Response continuation_token should be string
> 
>
> Key: BEAM-8271
> URL: https://issues.apache.org/jira/browse/BEAM-8271
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I've been working on adding typing to the python code and I came across a 
> discrepancy regarding the type of the continuation token.  The .proto 
> defines it as bytes, but the code treats it as a string (i.e. unicode):
>  
> {code:java}
> // A request to get state.
> message StateGetRequest {
>   // (Optional) If specified, signals to the runner that the response
>   // should resume from the following continuation token.
>   //
>   // If unspecified, signals to the runner that the response should start
>   // from the beginning of the logical continuable stream.
>   bytes continuation_token = 1;
> }
> // A response to get state representing a logical byte stream which can be
> // continued using the state API.
> message StateGetResponse {
>   // (Optional) If specified, represents a token which can be used with the
>   // state API to get the next chunk of this logical byte stream. The end of
>   // the logical byte stream is signalled by this field being unset.
>   bytes continuation_token = 1;
>   // Represents a part of a logical byte stream. Elements within
>   // the logical byte stream are encoded in the nested context and
>   // concatenated together.
>   bytes data = 2;
> } 
> {code}
> From FnApiRunner.StateServicer:
> {code:python}
> def blocking_get(self, state_key, continuation_token=None):
>   with self._lock:
>     full_state = self._state[self._to_key(state_key)]
>     if self._use_continuation_tokens:
>       # The token is "nonce:index".
>       if not continuation_token:
>         token_base = 'token_%x' % len(self._continuations)
>         self._continuations[token_base] = tuple(full_state)
>         return b'', '%s:0' % token_base
>       else:
>         token_base, index = continuation_token.split(':')
>         ix = int(index)
>         full_state = self._continuations[token_base]
>         if ix == len(full_state):
>           return b'', None
>         else:
>           return full_state[ix], '%s:%d' % (token_base, ix + 1)
>     else:
>       assert not continuation_token
>       return b''.join(full_state), None
> {code}
> This could be a problem in python3.  
> All other id values are string, whereas bytes is reserved for data, so I 
> think that the proto should be changed to string. 
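
To make the Python 3 concern in the description concrete, here is a small sketch. It is not Beam code: `make_token` and `parse_token` are hypothetical helpers that only mirror the `'%s:%d'` formatting and the `split(':')` call in `blocking_get`. It shows that the generated token is a `str`, and that a token which comes back as `bytes` (as a proto `bytes` field would deliver it) cannot be split with a `str` separator on Python 3.

```python
def make_token(token_base, index):
    # Same formatting as in blocking_get: the result is a text string.
    return '%s:%d' % (token_base, index)


def parse_token(token):
    # Same parsing as in blocking_get: split on a text separator.
    token_base, index = token.split(':')
    return token_base, int(index)


token = make_token('token_0', 1)
assert isinstance(token, str)
assert parse_token(token) == ('token_0', 1)

# Simulate the token after a round trip through a proto `bytes` field.
wire_token = token.encode('utf-8')
try:
    parse_token(wire_token)
except TypeError as exc:
    # On Python 3, bytes.split() does not accept a str separator.
    print('bytes token rejected:', exc)
```

On Python 2 the two types are interchangeable here, which is presumably why the mismatch went unnoticed; either changing the proto field to `string` or encoding and decoding explicitly at the boundary would remove the ambiguity.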



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=382618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382618
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:13
Start Date: 06/Feb/20 00:13
Worklog Time Spent: 10m 
  Work Description: davidwrede commented on pull request #10745: 
[BEAM-9219] Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745#discussion_r375579487
 
 

 ##
 File path: website/src/documentation/sdks/python-dependencies.md
 ##
 @@ -26,460 +26,41 @@ behavior in the service. If you are using any of these 
packages in your code, be
 aware that some libraries are not forward-compatible and you may need to pin to
 the listed versions that will be in scope during execution.
 
-To see the compile and runtime dependencies for your Beam SDK version, 
expand
-the relevant section below.
+Dependencies for your Beam SDK version are listed in `setup.py` in the Beam 
repository. To view them, perform the following steps:
 
-2.17.0
+1. Open `setup.py`.
 
-Beam SDK for Python 2.17.0 has the following compile and runtime 
dependencies.
+```
+
https://raw.githubusercontent.com/apache/beam/v/sdks/python/setup.py
+```
+
+Replace `` with the major.minor.patch version of the SDK. 
For example, 
{:target="_blank"}
 will provide the dependencies for the 2.18.0 release.
 
 Review comment:
   As long as that is consistent between versions of the file (for supported 
SDKs), then I think that makes sense. 
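
The manual steps in the diff (open `setup.py` for a release tag, then read `REQUIRED_PACKAGES`) could also be scripted. The following is a minimal sketch under two assumptions that the thread does not confirm: that release tags follow the `v` plus major.minor.patch pattern implied by the 2.18.0 example, and that `REQUIRED_PACKAGES` is assigned a plain list literal. `setup_py_url` and `required_packages` are hypothetical helper names, and the sample text is made up for illustration rather than copied from a real release.

```python
import ast


def setup_py_url(version):
    """Build the raw setup.py URL for a release (tag pattern is an assumption)."""
    return (
        "https://raw.githubusercontent.com/apache/beam/v%s/sdks/python/setup.py"
        % version
    )


def required_packages(setup_py_source, name="REQUIRED_PACKAGES"):
    """Return the list assigned to `name` without executing setup.py."""
    tree = ast.parse(setup_py_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    # Only works if the value is a literal list of strings.
                    return ast.literal_eval(node.value)
    raise KeyError(name)


# Made-up sample standing in for the downloaded file contents.
SAMPLE = """
REQUIRED_PACKAGES = [
    'crcmod>=1.7,<2.0',
    'pytz>=2018.3',
]
"""

print(setup_py_url("2.18.0"))
print(required_packages(SAMPLE))
```

Parsing with `ast` avoids importing or running `setup.py`. If a given release builds the list conditionally, `ast.literal_eval` will raise, and reading the file by hand as described in the diff remains the fallback.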
 



Issue Time Tracking
---

Worklog Id: (was: 382618)
Time Spent: 5.5h  (was: 5h 20m)

> Streamline creation of Python and Java dependencies pages
> -
>
> Key: BEAM-9219
> URL: https://issues.apache.org/jira/browse/BEAM-9219
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: David Wrede
>Priority: Minor
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=382620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382620
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:13
Start Date: 06/Feb/20 00:13
Worklog Time Spent: 10m 
  Work Description: davidwrede commented on pull request #10745: 
[BEAM-9219] Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745#discussion_r375579600
 
 

 ##
 File path: website/src/documentation/sdks/python-dependencies.md
 ##
 @@ -26,460 +26,41 @@ behavior in the service. If you are using any of these 
packages in your code, be
 aware that some libraries are not forward-compatible and you may need to pin to
 the listed versions that will be in scope during execution.
 
-To see the compile and runtime dependencies for your Beam SDK version, 
expand
-the relevant section below.
+Dependencies for your Beam SDK version are listed in `setup.py` in the Beam 
repository. To view them, perform the following steps:
 
-2.17.0
+1. Open `setup.py`.
 
-Beam SDK for Python 2.17.0 has the following compile and runtime 
dependencies.
+```
+
https://raw.githubusercontent.com/apache/beam/v/sdks/python/setup.py
+```
+
+Replace `` with the major.minor.patch version of the SDK. 
For example, 
{:target="_blank"}
 will provide the dependencies for the 2.18.0 release.
+
+
+2. Review the core dependency list under `REQUIRED_PACKAGES`.
 
-
-PackageVersion
-  avro-python3=1.8.1,2.0.0; python_version = 
"3.0"
-  avro=1.8.1,2.0.0; python_version  
"3.0"
-  cachetools=3.1.0,4
-  crcmod=1.7,2.0
-  dill=0.3.0,0.3.1
-  fastavro=0.21.4,0.22
-  funcsigs=1.0.2,2; python_version  
"3.0"
-  future=0.16.0,1.0.0
-  futures=3.2.0,4.0.0; python_version  
"3.0"
-  google-apitools=0.5.28,0.5.29
-  google-cloud-bigquery=1.6.0,1.18.0
-  google-cloud-bigtable=0.31.1,1.1.0
-  google-cloud-core=0.28.1,2
-  google-cloud-datastore=1.7.1,1.8.0
-  google-cloud-pubsub=0.39.0,1.1.0
-  googledatastore=7.0.1,7.1; python_version  
"3.0"
-  grpcio=1.12.1,2
-  hdfs=2.1.0,3.0.0
-  httplib2=0.8,=0.12.0
-  mock=1.0.1,3.0.0
-  oauth2client=2.0.1,4
-  proto-google-cloud-datastore-v1=0.90.0,=0.90.4; 
python_version  "3.0"
-  protobuf=3.5.0.post1,4
-  pyarrow=0.15.1,0.16.0; python_version = "3.0" 
or platform_system != "Windows"
-  pydot=1.2.0,2
-  pymongo=3.8.0,4.0.0
-  python-dateutil=2.8.0,3
-  pytz=2018.3
-  pyvcf=0.6.8,0.7.0; python_version  
"3.0"
-  typing=3.6.0,3.7.0; python_version  
"3.5.0"
-
+**Note:** If you require [extra features]({{ site.baseurl 
}}/get-started/quickstart-py#extra-requirements) such as `gcp` or `test`, you 
should review the lists under `REQUIRED_TEST_PACKAGES`, `GCP_REQUIREMENTS`, or 
`INTERACTIVE_BEAM` for additional dependencies. 
 
-
+You can also retrieve the dependency list from the command line using the 
following process:
 
-2.16.0
+1.  Create a clean virtual environment on your local machine.
 
-Beam SDK for Python 2.16.0 has the following compile and
-  runtime dependencies.
+Python 3:
 
-
-  PackageVersion
-  avro-python3=1.8.1,2.0.0; python_version = 
"3.0"
-  avro=1.8.1,2.0.0; python_version  
"3.0"
-  cachetools=3.1.0,4
-  crcmod=1.7,2.0
-  dill=0.3.0,0.3.1
-  fastavro=0.21.4,0.22
-  funcsigs=1.0.2,2; python_version  
"3.0"
-  future=0.16.0,1.0.0
-  futures=3.2.0,4.0.0; python_version  
"3.0"
-  google-apitools=0.5.28,0.5.29
-  google-cloud-bigquery=1.6.0,1.18.0
-  google-cloud-bigtable=0.31.1,1.1.0
-  google-cloud-core=0.28.1,2
-  google-cloud-datastore=1.7.1,1.8.0
-  google-cloud-pubsub=0.39.0,1.1.0
-  googledatastore=7.0.1,7.1; python_version  
"3.0"
-  grpcio=1.12.1,2
-  hdfs=2.1.0,3.0.0
-  httplib2=0.8,=0.12.0
-  mock=1.0.1,3.0.0
-  oauth2client=2.0.1,4
-  proto-google-cloud-datastore-v1=0.90.0,=0.90.4; 
python_version  "3.0"
-  protobuf=3.5.0.post1,4
-  pyarrow=0.11.1,0.15.0; python_version = "3.0" 
or platform_system != "Windows"
-  pydot=1.2.0,2
-  pymongo=3.8.0,4.0.0
-  python-dateutil=2.8.0,3
-  pytz=2018.3
-  pyvcf=0.6.8,0.7.0; python_version  
"3.0"
-  pyyaml=3.12,4.0.0
-  typing=3.6.0,3.7.0; python_version  
"3.5.0"
-
+```
+$ python3 -m venv env && source env/bin/activate
+```
+
+Python 2: 
 
-
+```
+$ pip install virtualenv && virtualenv env && source env/bin/activate
+```
 
-2.15.0
+2. [Install the Beam Python SDK]({{ site.baseurl 
}}/get-started/quickstart-py/#download-and-install).
 
-Beam SDK for Python 2.15.0 has the following compile and
-  runtime dependencies.
-
-  PackageVersion
-  avro-python3=1.8.1,2.0.0; python_version = 
"3.0"
-  avro=1.8.1,2.0.0; python_version  
"3.0"
-  cachetools=3.1.0,4
-  crcmod=1.7,2.0
-  dill=0.2.9,0.2.10
-  fastavro=0.21.4,0.22
-  future=0.16.0,1.0.0
-  
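
The manual step proposed in this diff (open `setup.py` and read the list under `REQUIRED_PACKAGES`) could also be done programmatically. The following is only a sketch under stated assumptions: `required_packages` is a hypothetical helper, and it assumes the list is a plain top-level list of string literals in the file.

```python
import ast


def required_packages(setup_py_source, list_name='REQUIRED_PACKAGES'):
  """Returns the entries of a top-level list assignment in setup.py source.

  Hypothetical helper: it assumes `list_name` is assigned a literal list
  (no function calls or concatenation) at module level.
  """
  tree = ast.parse(setup_py_source)
  for node in tree.body:
    if isinstance(node, ast.Assign):
      for target in node.targets:
        if isinstance(target, ast.Name) and target.id == list_name:
          # literal_eval accepts an AST node and only evaluates literals.
          return ast.literal_eval(node.value)
  raise KeyError(list_name)
```

The same helper could be pointed at `GCP_REQUIREMENTS` or `REQUIRED_TEST_PACKAGES` through the `list_name` argument, provided those lists are also plain literals.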

[jira] [Work logged] (BEAM-9213) FlinkRunner ignores --flink_submit_uber_jar when master unset

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9213?focusedWorklogId=382619=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382619
 ]

ASF GitHub Bot logged work on BEAM-9213:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:13
Start Date: 06/Feb/20 00:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10708: [BEAM-9213] 
throw error when flink_submit_uber_jar and not flink_master
URL: https://github.com/apache/beam/pull/10708#discussion_r375579498
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner.py
 ##
 @@ -51,8 +51,11 @@ def default_job_server(self, options):
 flink_master = self.add_http_scheme(
 flink_options.flink_master)
 flink_options.flink_master = flink_master
-if (flink_options.flink_submit_uber_jar
-and flink_master not in MAGIC_HOST_NAMES):
+if flink_options.flink_submit_uber_jar:
+  if flink_master in MAGIC_HOST_NAMES:
+raise ValueError(
+'Cannot use flink_submit_uber_jar with flink_master %s'
+% flink_master)
 
 Review comment:
   Certainly. I would prefer that users not have to set this parameter at 
all--it would be there as an opt-out just in case (or possibly for testing). 
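
For readers following the diff, the check under discussion can be sketched in isolation. This is a minimal sketch, not the Beam implementation: `check_uber_jar_options` is a hypothetical helper, and the values in `MAGIC_HOST_NAMES` are an assumption rather than something shown in this hunk.

```python
# Assumed values; the diff only references the name MAGIC_HOST_NAMES.
MAGIC_HOST_NAMES = ['[local]', '[auto]']


def check_uber_jar_options(flink_master, flink_submit_uber_jar):
  """Returns True if an uber jar should be submitted, False otherwise.

  Raises ValueError when flink_submit_uber_jar is requested but flink_master
  is one of the magic host names, mirroring the error added in the diff.
  """
  if flink_submit_uber_jar:
    if flink_master in MAGIC_HOST_NAMES:
      raise ValueError(
          'Cannot use flink_submit_uber_jar with flink_master %s'
          % flink_master)
    return True
  return False
```

Raising early turns a silently ignored flag into an explicit configuration error, which is what the issue description asks for.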
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382619)
Time Spent: 1h 40m  (was: 1.5h)

> FlinkRunner ignores --flink_submit_uber_jar when master unset
> -
>
> Key: BEAM-9213
> URL: https://issues.apache.org/jira/browse/BEAM-9213
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Instead, an error should be thrown to let the user know that 
> flink_submit_uber_jar is incompatible with auto/local master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382617=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382617
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:12
Start Date: 06/Feb/20 00:12
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10769: [BEAM-8889] Upgrades 
gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582674458
 
 
   @veblush OK. Then let's see how the post commit checks work.
   
   Run Java PostCommit
   Run Java HadoopFormatIO Performance Test
   Run BigQueryIO Streaming Performance Test Java
   Run Dataflow ValidatesRunner
   Run Spark ValidatesRunner
   Run SQL Postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 382617)
Remaining Estimate: 160h 20m  (was: 160.5h)
Time Spent: 7h 40m  (was: 7.5h)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 7h 40m
>  Remaining Estimate: 160h 20m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class to access Google Cloud Storage on Apache Beam. Current 
> implementation directly creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel by itself to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability which eventually 
> creates channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channel, which is expected 
> flexible because it can easily pick up the new change; e.g. new channel 
> implementation using new protocol without code change.





[jira] [Work logged] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3453?focusedWorklogId=382616=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382616
 ]

ASF GitHub Bot logged work on BEAM-3453:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:11
Start Date: 06/Feb/20 00:11
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10762: 
[BEAM-3453] Use default project when creating pubsub subscription
URL: https://github.com/apache/beam/pull/10762#discussion_r375579010
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py
 ##
 @@ -516,7 +524,7 @@ def get_subscription(cls, transform, project, 
short_topic_name,
 
 sub_client = pubsub.SubscriberClient()
 sub_name = sub_client.subscription_path(
-project, 'beam_%d_%x' % (int(time.time()), random.randrange(1 << 32)))
+sub_project, 'beam_%d_%x' % (int(time.time()), random.randrange(1 << 
32)))
 
 Review comment:
   According to our offline discussion, I've removed the code that gets the 
project from google.auth and we now only rely on the project specified in 
pipeline_options. PTAL again. Thanks!
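
As a rough illustration of the naming scheme in the changed line, the snippet below builds the same kind of throwaway subscription name without a Pub/Sub client. It is a sketch only: `temp_subscription_path` is a hypothetical helper, and the `projects/<project>/subscriptions/<name>` layout is assumed to match what the client's `subscription_path` returns.

```python
import random
import time


def temp_subscription_path(sub_project):
  """Builds a temporary subscription path under the given project.

  The name combines the current Unix time with a random 32-bit value in hex,
  following the 'beam_%d_%x' pattern from the diff.
  """
  name = 'beam_%d_%x' % (int(time.time()), random.randrange(1 << 32))
  return 'projects/%s/subscriptions/%s' % (sub_project, name)
```

Passing the project from pipeline options as `sub_project` keeps the subscription in the user's own project even when the topic lives in a public one.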
 



Issue Time Tracking
---

Worklog Id: (was: 382616)
Time Spent: 1h 20m  (was: 1h 10m)

> Allow usage of public Google PubSub topics in Python DirectRunner
> -
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Affects Versions: 2.2.0
>Reporter: Charles Chen
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from 
> public Google Cloud PubSub topics.  We should allow this functionality so 
> that users can more easily test Beam Python's streaming functionality.





[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=382615=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382615
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:10
Start Date: 06/Feb/20 00:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r375574532
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+// A URN for artifacts described by HTTP links.
+// payload: a string for an artifact HTTP URL
+HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
+// A URN for artifacts hosted on PYPI.
+// artifact_id: a PYPI project name
+// version_range: a PYPI compatible version string
+// payload: None
+PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"];
+// A URN for artifacts hosted on Maven central.
+// artifact_id: [maven group id]:[maven artifact id]
+// version_range: a Maven compatible version string
+// payload: None
+MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // A path to an artifact file on a local system.
+  string local_path = 1;
+  // A generated staged name (no path).
+  string staged_name = 2;
 
 Review comment:
   Is this specific to files? 
 



Issue Time Tracking
---

Worklog Id: (was: 382615)
Time Spent: 50m  (was: 40m)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.





[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=382610=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382610
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:10
Start Date: 06/Feb/20 00:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r375575923
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+// A URN for artifacts described by HTTP links.
+// payload: a string for an artifact HTTP URL
+HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
+// A URN for artifacts hosted on PYPI.
+// artifact_id: a PYPI project name
+// version_range: a PYPI compatible version string
+// payload: None
+PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"];
+// A URN for artifacts hosted on Maven central.
+// artifact_id: [maven group id]:[maven artifact id]
+// version_range: a Maven compatible version string
+// payload: None
+MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // A path to an artifact file on a local system.
+  string local_path = 1;
+  // A generated staged name (no path).
+  string staged_name = 2;
+}
+
+message ArtifactInformation {
+  string urn = 1;
 
 Review comment:
   s/urn/type?
 



Issue Time Tracking
---

Worklog Id: (was: 382610)
Time Spent: 40m  (was: 0.5h)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.





[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=382613=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382613
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:10
Start Date: 06/Feb/20 00:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r375575037
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+// A URN for artifacts described by HTTP links.
+// payload: a string for an artifact HTTP URL
+HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
+// A URN for artifacts hosted on PYPI.
+// artifact_id: a PYPI project name
+// version_range: a PYPI compatible version string
+// payload: None
+PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"];
+// A URN for artifacts hosted on Maven central.
+// artifact_id: [maven group id]:[maven artifact id]
+// version_range: a Maven compatible version string
+// payload: None
+MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // A path to an artifact file on a local system.
+  string local_path = 1;
 
 Review comment:
   It seems this only makes sense on a shared filesystem. (And could result in 
confusion otherwise.) Perhaps we should also store a hash (e.g. sha256) so a 
check of "sameness" can be performed? 
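
The hash-based "sameness" check suggested here could look roughly like the following. This is a sketch under assumptions: `file_sha256` and `same_artifact` are hypothetical helpers, and the idea that the expected digest travels alongside `local_path` is not part of the proto in this diff.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
  """Returns the hex SHA-256 digest of a file, read in fixed-size chunks."""
  digest = hashlib.sha256()
  with open(path, 'rb') as f:
    # iter() with a sentinel stops when read() returns empty bytes at EOF.
    for chunk in iter(lambda: f.read(chunk_size), b''):
      digest.update(chunk)
  return digest.hexdigest()


def same_artifact(path, expected_sha256):
  """Checks a local file against a digest recorded when it was staged."""
  return file_sha256(path) == expected_sha256
```

Reading in chunks keeps memory use flat for large jars, and a mismatch would signal that the path resolves to a different file on this machine.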
 



Issue Time Tracking
---

Worklog Id: (was: 382613)
Time Spent: 40m  (was: 0.5h)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.





[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=382608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382608
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:10
Start Date: 06/Feb/20 00:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r375572386
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
 
 Review comment:
   Nit: separate these with a newline, so the definitions cling more tightly to 
the constants. 
 



Issue Time Tracking
---

Worklog Id: (was: 382608)
Time Spent: 0.5h  (was: 20m)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.





[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=382611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382611
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:10
Start Date: 06/Feb/20 00:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r375573106
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+// A URN for artifacts described by HTTP links.
+// payload: a string for an artifact HTTP URL
+HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
+// A URN for artifacts hosted on PYPI.
+// artifact_id: a PYPI project name
+// version_range: a PYPI compatible version string
+// payload: None
+PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"];
+// A URN for artifacts hosted on Maven central.
+// artifact_id: [maven group id]:[maven artifact id]
+// version_range: a Maven compatible version string
+// payload: None
+MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // A path to an artifact file on a local system.
+  string local_path = 1;
+  // A generated staged name (no path).
+  string staged_name = 2;
+}
+
+message ArtifactInformation {
+  string urn = 1;
+  bytes payload = 2;
+  string artifact_id = 3;
 
 Review comment:
   It seems that payload and artifact_id are redundant. Merge them? Should 
version_range be part of the payload as well? Or, if it stays a top-level 
attribute, should it use a standard format? I'm not sure there is one, but if 
there is, it'd be nice to be able to do generic intersections and merging.
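
To make "generic intersections" concrete, here is a small sketch that assumes a deliberately simplified range format: a half-open interval of dotted numeric versions. Both helpers and the format itself are hypothetical; real PyPI and Maven version specifiers are richer than this.

```python
def parse_version(text, width=3):
  """Parses '1.8.1' into (1, 8, 1), padding short versions with zeros."""
  parts = [int(part) for part in text.split('.')]
  parts += [0] * (width - len(parts))
  return tuple(parts)


def intersect(range_a, range_b):
  """Intersects two (lower_inclusive, upper_exclusive) version ranges.

  Returns the overlapping range, or None if the ranges do not overlap.
  """
  lower = max(range_a[0], range_b[0])
  upper = min(range_a[1], range_b[1])
  return (lower, upper) if lower < upper else None
```

With a shared format like this, a runner could merge requirements from several environments without knowing which package manager they came from.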
 



Issue Time Tracking
---

Worklog Id: (was: 382611)
Time Spent: 40m  (was: 0.5h)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.





[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=382614=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382614
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:10
Start Date: 06/Feb/20 00:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r375572820
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+// A URN for artifacts described by HTTP links.
+// payload: a string for an artifact HTTP URL
+HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
 
 Review comment:
   Or should this be URL (or maybe REMOTE) to allow for ftp, etc? It could be 
argued we want to restrict to HTTP(S), though I don't know what would be gained 
by that. 
 



Issue Time Tracking
---

Worklog Id: (was: 382614)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.





[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=382612=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382612
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:10
Start Date: 06/Feb/20 00:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r375576529
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+// A URN for artifacts described by HTTP links.
+// payload: a string for an artifact HTTP URL
+HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
+// A URN for artifacts hosted on PYPI.
+// artifact_id: a PYPI project name
+// version_range: a PYPI compatible version string
+// payload: None
+PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"];
+// A URN for artifacts hosted on Maven central.
+// artifact_id: [maven group id]:[maven artifact id]
+// version_range: a Maven compatible version string
+// payload: None
+MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // A path to an artifact file on a local system.
+  string local_path = 1;
+  // A generated staged name (no path).
+  string staged_name = 2;
+}
+
+message ArtifactInformation {
+  string urn = 1;
+  bytes payload = 2;
+  string artifact_id = 3;
+  string version_range = 4;
+}
 
 Review comment:
   Seems we also need a staging name and/or role field. For embedded, there's 
not even a file extension to go off of, and even if we have an extension it may 
be ambiguous, e.g. for a .tar.gz file, is it something that should be pip 
installed, or something that should simply be placed in an accessible location 
as data?
 



Issue Time Tracking
---

Worklog Id: (was: 382612)
Time Spent: 40m  (was: 0.5h)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.





[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=382609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382609
 ]

ASF GitHub Bot logged work on BEAM-9229:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:10
Start Date: 06/Feb/20 00:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10733: [BEAM-9229] 
Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r375578623
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }
 
+message StandardArtifacts {
+  enum Types {
+// A URN for artifacts stored in a local directory.
+// payload: ArtifactFilePayload.
+FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+// A URN for artifacts embedded in ArtifactInformation proto.
+// payload: raw data bytes.
+EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
 
 Review comment:
   I think we'll need the ability to serve up bytes that don't fit into a proto 
(e.g. huge jar files). Perhaps this could be yet another type, and a streaming 
protocol added to the extension service that allows one to retrieve the bytes 
given a token + environment? 
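
The shape of such a streaming retrieval can be illustrated without any RPC layer. The sketch below only shows the chunking idea: `iter_chunks` and `reassemble` are hypothetical helpers, the chunk size is arbitrary, and no service, token, or environment lookup from the proposal is modeled.

```python
def iter_chunks(data, chunk_size):
  """Yields consecutive slices of `data`, each at most `chunk_size` bytes."""
  if chunk_size <= 0:
    raise ValueError('chunk_size must be positive')
  for start in range(0, len(data), chunk_size):
    yield data[start:start + chunk_size]


def reassemble(chunks):
  """Joins streamed chunks back into the original byte string."""
  return b''.join(chunks)
```

In a real service each yielded slice would become one streamed response message, so no single message has to hold the whole artifact.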
 



Issue Time Tracking
---

Worklog Id: (was: 382609)
Time Spent: 0.5h  (was: 20m)

> Adding dependency information to Environment proto
> --
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.





[jira] [Work logged] (BEAM-9175) Introduce an autoformatting tool to Python SDK

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9175?focusedWorklogId=382606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382606
 ]

ASF GitHub Bot logged work on BEAM-9175:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:54
Start Date: 05/Feb/20 23:54
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10684: [BEAM-9175] Introduce 
an autoformatting tool to Python SDK
URL: https://github.com/apache/beam/pull/10684#issuecomment-582669597
 
 
   Is it possible to update some docs/wiki, explaining how users could run this 
locally?
 



Issue Time Tracking
---

Worklog Id: (was: 382606)
Time Spent: 8h 20m  (was: 8h 10m)

> Introduce an autoformatting tool to Python SDK
> --
>
> Key: BEAM-9175
> URL: https://issues.apache.org/jira/browse/BEAM-9175
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Michał Walenia
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> It seems there are three main options:
>  * black - very simple, but not configurable at all (except for line length), 
> would drastically change code style
>  * yapf - more options to tweak, can omit parts of code
>  * autopep8 - more similar to spotless - only touches code that breaks 
> formatting guidelines, can use pycodestyle and flake8 as configuration
>  The rigidity of Black makes it unusable for Beam.





[jira] [Work logged] (BEAM-5504) PubsubAvroTable

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5504?focusedWorklogId=382605=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382605
 ]

ASF GitHub Bot logged work on BEAM-5504:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:52
Start Date: 05/Feb/20 23:52
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10487: [BEAM-5504] 
Introduce PubsubAvroTable
URL: https://github.com/apache/beam/pull/10487#issuecomment-582669130
 
 
   Nope, not blocked! I just wanted to check in in case there's anything I can 
do to help :)
 



Issue Time Tracking
---

Worklog Id: (was: 382605)
Time Spent: 3.5h  (was: 3h 20m)

> PubsubAvroTable
> ---
>
> Key: BEAM-5504
> URL: https://issues.apache.org/jira/browse/BEAM-5504
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Jing Chen
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=382603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382603
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:49
Start Date: 05/Feb/20 23:49
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10745: [BEAM-9219] 
Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745#discussion_r375572082
 
 

 ##
 File path: website/src/documentation/sdks/python-dependencies.md
 ##
 @@ -26,460 +26,41 @@ behavior in the service. If you are using any of these 
packages in your code, be
 aware that some libraries are not forward-compatible and you may need to pin to
 the listed versions that will be in scope during execution.
 
-To see the compile and runtime dependencies for your Beam SDK version, 
expand
-the relevant section below.
+Dependencies for your Beam SDK version are listed in `setup.py` in the Beam 
repository. To view them, perform the following steps:
 
-2.17.0
+1. Open `setup.py`.
 
-Beam SDK for Python 2.17.0 has the following compile and runtime 
dependencies.
+```
+
https://raw.githubusercontent.com/apache/beam/v/sdks/python/setup.py
+```
+
+Replace `` with the major.minor.patch version of the SDK. 
For example, 
{:target="_blank"}
 will provide the dependencies for the 2.18.0 release.
 
 Review comment:
   I looked at the staged website: I think the documentation would be more 
user-friendly if the link we give in "For example" pointed to the place in the 
file where the requirements are defined instead of the beginning of the file.
   We could use 'https://github.com/apache/beam/...' instead of the raw URL for 
this purpose. What do you think? 
   
   Same about BeamModulePlugin reference.
   
 



Issue Time Tracking
---

Worklog Id: (was: 382603)
Time Spent: 5h 10m  (was: 5h)

> Streamline creation of Python and Java dependencies pages
> -
>
> Key: BEAM-9219
> URL: https://issues.apache.org/jira/browse/BEAM-9219
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: David Wrede
>Priority: Minor
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> This issue is about the need to address keeping both Python and Java SDK 
> dependency pages more relevant and up-to-date while reducing the amount of 
> time it takes to provide that information. The current method of scraping and 
> copying dependencies into a table for every release is a non-trivial task 
> because of the semi-automated workflows done by the tech writers on the 
> website.
> In an effort to provide accurate dependency listings that are always in sync 
> with SDK releases, referring people to the appropriate places in the source 
> code (or through CLI commands) should provide people the information they are 
> looking for and not require the creation and maintenance of an automated 
> tooling solution to generate the dependency tables.
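As an illustration of the "through CLI commands" route mentioned in the description above, here is a minimal Python sketch that prints the requirements declared by a locally installed distribution. It assumes Python 3.8+ (for `importlib.metadata`) and that the distribution name is `apache-beam`; the helper name `declared_requirements` is hypothetical and is not part of Beam or of the PR under review.

```python
from importlib import metadata


def declared_requirements(dist_name):
    """Return the requirement strings a locally installed distribution declares.

    Returns None when the distribution is not installed in this environment.
    """
    try:
        requirements = metadata.requires(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # metadata.requires() itself returns None when no requirements are declared.
    return sorted(requirements or [])


if __name__ == "__main__":
    # "apache-beam" is assumed to be the installed distribution name.
    reqs = declared_requirements("apache-beam")
    if reqs is None:
        print("apache-beam is not installed in this environment")
    else:
        for line in reqs:
            print(line)
```

Run inside a clean virtual environment after installing the SDK, this lists the same specifiers that `setup.py` declares for that release, including the ones guarded by extras markers.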



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=382604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382604
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:49
Start Date: 05/Feb/20 23:49
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10745: [BEAM-9219] 
Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745#discussion_r375572630
 
 

 ##
 File path: website/src/documentation/sdks/python-dependencies.md
 ##
 @@ -26,460 +26,41 @@ behavior in the service. If you are using any of these 
packages in your code, be
 aware that some libraries are not forward-compatible and you may need to pin to
 the listed versions that will be in scope during execution.
 
-To see the compile and runtime dependencies for your Beam SDK version, 
expand
-the relevant section below.
+Dependencies for your Beam SDK version are listed in `setup.py` in the Beam 
repository. To view them, perform the following steps:
 
-2.17.0
+1. Open `setup.py`.
 
-Beam SDK for Python 2.17.0 has the following compile and runtime 
dependencies.
+```
+
https://raw.githubusercontent.com/apache/beam/v/sdks/python/setup.py
+```
+
+Replace `` with the major.minor.patch version of the SDK. 
For example, 
{:target="_blank"}
 will provide the dependencies for the 2.18.0 release.
+
+
+2. Review the core dependency list under `REQUIRED_PACKAGES`.
 
-
-  Package | Version
-  avro-python3 | >=1.8.1,<2.0.0; python_version >= "3.0"
-  avro | >=1.8.1,<2.0.0; python_version < "3.0"
-  cachetools | >=3.1.0,<4
-  crcmod | >=1.7,<2.0
-  dill | >=0.3.0,<0.3.1
-  fastavro | >=0.21.4,<0.22
-  funcsigs | >=1.0.2,<2; python_version < "3.0"
-  future | >=0.16.0,<1.0.0
-  futures | >=3.2.0,<4.0.0; python_version < "3.0"
-  google-apitools | >=0.5.28,<0.5.29
-  google-cloud-bigquery | >=1.6.0,<1.18.0
-  google-cloud-bigtable | >=0.31.1,<1.1.0
-  google-cloud-core | >=0.28.1,<2
-  google-cloud-datastore | >=1.7.1,<1.8.0
-  google-cloud-pubsub | >=0.39.0,<1.1.0
-  googledatastore | >=7.0.1,<7.1; python_version < "3.0"
-  grpcio | >=1.12.1,<2
-  hdfs | >=2.1.0,<3.0.0
-  httplib2 | >=0.8,<=0.12.0
-  mock | >=1.0.1,<3.0.0
-  oauth2client | >=2.0.1,<4
-  proto-google-cloud-datastore-v1 | >=0.90.0,<=0.90.4; python_version < "3.0"
-  protobuf | >=3.5.0.post1,<4
-  pyarrow | >=0.15.1,<0.16.0; python_version >= "3.0" or platform_system != "Windows"
-  pydot | >=1.2.0,<2
-  pymongo | >=3.8.0,<4.0.0
-  python-dateutil | >=2.8.0,<3
-  pytz | >=2018.3
-  pyvcf | >=0.6.8,<0.7.0; python_version < "3.0"
-  typing | >=3.6.0,<3.7.0; python_version < "3.5.0"
-
+**Note:** If you require [extra features]({{ site.baseurl 
}}/get-started/quickstart-py#extra-requirements) such as `gcp` or `test`, you 
should review the lists under `REQUIRED_TEST_PACKAGES`, `GCP_REQUIREMENTS`, or 
`INTERACTIVE_BEAM` for additional dependencies. 
 
-
+You can also retrieve the dependency list from the command line using the 
following process:
 
-2.16.0
+1.  Create a clean virtual environment on your local machine.
 
-Beam SDK for Python 2.16.0 has the following compile and
-  runtime dependencies.
+Python 3:
 
-
-  Package | Version
-  avro-python3 | >=1.8.1,<2.0.0; python_version >= "3.0"
-  avro | >=1.8.1,<2.0.0; python_version < "3.0"
-  cachetools | >=3.1.0,<4
-  crcmod | >=1.7,<2.0
-  dill | >=0.3.0,<0.3.1
-  fastavro | >=0.21.4,<0.22
-  funcsigs | >=1.0.2,<2; python_version < "3.0"
-  future | >=0.16.0,<1.0.0
-  futures | >=3.2.0,<4.0.0; python_version < "3.0"
-  google-apitools | >=0.5.28,<0.5.29
-  google-cloud-bigquery | >=1.6.0,<1.18.0
-  google-cloud-bigtable | >=0.31.1,<1.1.0
-  google-cloud-core | >=0.28.1,<2
-  google-cloud-datastore | >=1.7.1,<1.8.0
-  google-cloud-pubsub | >=0.39.0,<1.1.0
-  googledatastore | >=7.0.1,<7.1; python_version < "3.0"
-  grpcio | >=1.12.1,<2
-  hdfs | >=2.1.0,<3.0.0
-  httplib2 | >=0.8,<=0.12.0
-  mock | >=1.0.1,<3.0.0
-  oauth2client | >=2.0.1,<4
-  proto-google-cloud-datastore-v1 | >=0.90.0,<=0.90.4; python_version < "3.0"
-  protobuf | >=3.5.0.post1,<4
-  pyarrow | >=0.11.1,<0.15.0; python_version >= "3.0" or platform_system != "Windows"
-  pydot | >=1.2.0,<2
-  pymongo | >=3.8.0,<4.0.0
-  python-dateutil | >=2.8.0,<3
-  pytz | >=2018.3
-  pyvcf | >=0.6.8,<0.7.0; python_version < "3.0"
-  pyyaml | >=3.12,<4.0.0
-  typing | >=3.6.0,<3.7.0; python_version < "3.5.0"
-
+```
+$ python3 -m venv env && source env/bin/activate
+```
+
+Python 2: 
 
-
+```
+$ pip install virtualenv && virtualenv env && source env/bin/activate
+```
 
-2.15.0
+2. [Install the Beam Python SDK]({{ site.baseurl 
}}/get-started/quickstart-py/#download-and-install).
 
-Beam SDK for Python 2.15.0 has the following compile and
-  runtime dependencies.
-
-  Package | Version
-  avro-python3 | >=1.8.1,<2.0.0; python_version >= "3.0"
-  avro | >=1.8.1,<2.0.0; python_version < "3.0"
-  cachetools | >=3.1.0,<4
-  crcmod | >=1.7,<2.0
-  dill | >=0.2.9,<0.2.10
-  fastavro | >=0.21.4,<0.22
-  future | >=0.16.0,<1.0.0
-  

[jira] [Work logged] (BEAM-2645) Implement DisplayData translation to/from protos

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2645?focusedWorklogId=382602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382602
 ]

ASF GitHub Bot logged work on BEAM-2645:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:43
Start Date: 05/Feb/20 23:43
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10770: [BEAM-2645] 
Define the display data model type
URL: https://github.com/apache/beam/pull/10770#discussion_r375569755
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -1205,58 +1207,58 @@ message FunctionSpec {
   bytes payload = 3;
 }
 
-// TODO: transfer javadoc here
-message DisplayData {
-
-  // (Required) The list of display data.
-  repeated Item items = 1;
-
-  // A complete identifier for a DisplayData.Item
-  message Identifier {
-
-// (Required) The transform originating this display data.
-string transform_id = 1;
-
-// (Optional) The URN indicating the type of the originating transform,
-// if there is one.
-string transform_urn = 2;
-
-string key = 3;
+// A set of well known URNs describing display data.
+//
+// All descriptions must contain how the value should be classified and how it
+// is encoded. Note that some types are logical types which convey contextual
+// information about the pipeline in addition to an encoding while others only
+// specify the encoding itself.
+message StandardDisplayData {
+  enum DisplayData {
+// A string label and value. Has a payload containing an encoded
+// LabelledStringPayload.
+LABELLED_STRING = 0 [(beam_urn) = "beam:display_data:labelled_string:v1"];
 
 Review comment:
   Isn't it best practice to start with 1, so that unset == 0?
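The concern behind this review question can be sketched in Python. The snippet below is only an analogy for proto3 behaviour, where an enum field that was never set reads back as the value numbered 0. `UNSPECIFIED`, `read_enum`, and the dict-based "message" are hypothetical stand-ins, not names from the PR or from the protobuf library, and whether the concern applies to an enum that mainly carries URN annotations is for the reviewers to decide.

```python
from enum import IntEnum


class ZeroIsRealValue(IntEnum):
    # Mirrors the numbering under review: the first real value is 0.
    LABELLED_STRING = 0


class ZeroIsUnspecified(IntEnum):
    # Hypothetical alternative: 0 is reserved as an "unset" sentinel.
    UNSPECIFIED = 0
    LABELLED_STRING = 1


def read_enum(enum_cls, message, field="type"):
    """Read an enum field from a dict standing in for a decoded message.

    A field that is absent reads as 0, which is how proto3 treats unset enums.
    """
    return enum_cls(message.get(field, 0))


if __name__ == "__main__":
    empty_message = {}
    # With 0 as a real value, "unset" cannot be told apart from LABELLED_STRING.
    print(read_enum(ZeroIsRealValue, empty_message).name)
    # With 0 reserved, an unset field is recognisable as such.
    print(read_enum(ZeroIsUnspecified, empty_message).name)
```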
 



Issue Time Tracking
---

Worklog Id: (was: 382602)
Time Spent: 40m  (was: 0.5h)

> Implement DisplayData translation to/from protos
> 
>
> Key: BEAM-2645
> URL: https://issues.apache.org/jira/browse/BEAM-2645
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3221) Model pipeline representation improvements

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3221?focusedWorklogId=382601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382601
 ]

ASF GitHub Bot logged work on BEAM-3221:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:35
Start Date: 05/Feb/20 23:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10779: [BEAM-3221] 
Clarify documentation for StandardTransforms.Primitives, Pipeline, and 
PTransform.
URL: https://github.com/apache/beam/pull/10779
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-9230) Enable CrossLanguageValidateRunner test for Spark runner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9230?focusedWorklogId=382600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382600
 ]

ASF GitHub Bot logged work on BEAM-9230:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:29
Start Date: 05/Feb/20 23:29
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10758: [BEAM-9230] 
Enable CrossLanguageValidateRunner test for Spark runner
URL: https://github.com/apache/beam/pull/10758#issuecomment-582662988
 
 
   Run XVR_Spark PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 382600)
Time Spent: 1h 50m  (was: 1h 40m)

> Enable CrossLanguageValidateRunner test for Spark runner
> 
>
> Key: BEAM-9230
> URL: https://issues.apache.org/jira/browse/BEAM-9230
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Enable CrossLanguageValidateRunner test for Spark runner



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=382599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382599
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:27
Start Date: 05/Feb/20 23:27
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10745: [BEAM-9219] 
Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745#issuecomment-582662323
 
 
   Looks like Jenkins triggered the runs, but did not propagate the signal 
onto the PR:
   https://builds.apache.org/job/beam_PreCommit_Website_Commit/3228/
   https://builds.apache.org/job/beam_PreCommit_Website_Stage_GCS_Commit/3001/
   
   The runs are passing. 
   
   Staging link: 
http://apache-beam-website-pull-requests.storage.googleapis.com/10745/index.html
   
 



Issue Time Tracking
---

Worklog Id: (was: 382599)
Time Spent: 5h  (was: 4h 50m)

> Streamline creation of Python and Java dependencies pages
> -
>
> Key: BEAM-9219
> URL: https://issues.apache.org/jira/browse/BEAM-9219
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: David Wrede
>Priority: Minor
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> This issue is about the need to address keeping both Python and Java SDK 
> dependency pages more relevant and up-to-date while reducing the amount of 
> time it takes to provide that information. The current method of scraping and 
> copying dependencies into a table for every release is a non-trivial task 
> because of the semi-automated workflows done by the tech writers on the 
> website.
> In an effort to provide accurate dependency listings that are always in sync 
> with SDK releases, referring people to the appropriate places in the source 
> code (or through CLI commands) should provide people the information they are 
> looking for and not require the creation and maintenance of an automated 
> tooling solution to generate the dependency tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=382597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382597
 ]

ASF GitHub Bot logged work on BEAM-3301:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:26
Start Date: 05/Feb/20 23:26
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #10778: [BEAM-3301] 
Small cleanup to KV Decoder.
URL: https://github.com/apache/beam/pull/10778
 
 
   This enables nested KV coders to be decoded properly (the previous
   code would drop nested KV's value). Currently there are no nested KV
   coders present in the Go SDK, but it will be needed for SDF support.
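To illustrate what decoding a nested KV involves, here is a small Python sketch. It is not the Go SDK's decoder: the coder description (the string `"bytes"` or a tuple `("kv", key_coder, value_coder)`), the one-byte length prefix, and the helper names `decode` and `leaf` are all assumptions made for the example. The point is only that the decoder recurses into the key and value coders, so a value that is itself a KV is decoded in full rather than dropped.

```python
import io


def decode(coder, stream):
    """Decode one element described by `coder` from a binary stream.

    `coder` is either the string "bytes" (a leaf) or a tuple
    ("kv", key_coder, value_coder). Both forms are hypothetical stand-ins.
    """
    if coder == "bytes":
        length = stream.read(1)[0]  # one-byte length prefix (assumed format)
        return stream.read(length)
    _, key_coder, value_coder = coder
    # Recurse on both sides so a key or value may itself be a KV.
    key = decode(key_coder, stream)
    value = decode(value_coder, stream)
    return (key, value)


def leaf(payload):
    """Encode a leaf value with the assumed one-byte length prefix."""
    return bytes([len(payload)]) + payload


if __name__ == "__main__":
    nested = ("kv", "bytes", ("kv", "bytes", "bytes"))
    data = leaf(b"k") + leaf(b"a") + leaf(b"b")
    print(decode(nested, io.BytesIO(data)))
```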
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=382598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382598
 ]

ASF GitHub Bot logged work on BEAM-3301:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:26
Start Date: 05/Feb/20 23:26
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #10778: [BEAM-3301] Small 
cleanup to KV Decoder.
URL: https://github.com/apache/beam/pull/10778#issuecomment-582662103
 
 
   R: @lostluck 
 



Issue Time Tracking
---

Worklog Id: (was: 382598)
Time Spent: 20m  (was: 10m)

> Go SplittableDoFn support
> -
>
> Key: BEAM-3301
> URL: https://issues.apache.org/jira/browse/BEAM-3301
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Reporter: Henning Rohde
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SDFs will be the only way to add streaming and liquid sharded IO for Go.
> Design doc: https://s.apache.org/splittable-do-fn



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382596
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 05/Feb/20 23:18
Start Date: 05/Feb/20 23:18
Worklog Time Spent: 10m 
  Work Description: veblush commented on issue #10769: [BEAM-8889] Upgrades 
gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582659649
 
 
   @suztomo I chatted with @medb, the author of gcsio, and he confirmed that 
gcsio.cooplock & gcsio.testing won't be used by Beam because they are controlled 
by the option. So this looks like a false positive. WDYT?
 



Issue Time Tracking
---

Worklog Id: (was: 382596)
Remaining Estimate: 160.5h  (was: 160h 40m)
Time Spent: 7.5h  (was: 7h 20m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 7.5h
>  Remaining Estimate: 160.5h
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is the primary class for accessing Google Cloud Storage in Apache Beam. The 
> current implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data, rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
> , an abstract class that provides basic IO capability and eventually 
> creates the channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels. That should be more 
> flexible because GcsUtil can then pick up new changes (e.g. a new channel 
> implementation using a new protocol) without code changes.
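The design idea in the description above can be sketched in Python as a generic illustration. None of these classes exist in Beam or gcsio: `Storage`, `InMemoryStorage`, and `GcsUtilSketch` are hypothetical names. The sketch shows the utility depending on an abstract storage type that hands out channels, instead of constructing a concrete channel class itself, so that a different channel implementation can be swapped in without touching the utility.

```python
import io
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical stand-in for an abstraction that creates channels."""

    @abstractmethod
    def open(self, path):
        """Return a readable binary channel for `path`."""


class InMemoryStorage(Storage):
    """One possible implementation; a new protocol would be another subclass."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def open(self, path):
        return io.BytesIO(self._objects[path])


class GcsUtilSketch:
    """Depends only on the abstraction, never on a concrete channel class."""

    def __init__(self, storage):
        self._storage = storage

    def read_all(self, path):
        with self._storage.open(path) as channel:
            return channel.read()


if __name__ == "__main__":
    storage = InMemoryStorage({"gs://bucket/object": b"hello"})
    print(GcsUtilSketch(storage).read_all("gs://bucket/object"))
```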



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9253) SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9253:
---
Status: Open  (was: Triage Needed)

> SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast 
> to java.lang.Long
> --
>
> Key: BEAM-9253
> URL: https://issues.apache.org/jira/browse/BEAM-9253
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Tomo Suzuki
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.20.0
>
> Attachments: VarLongCoder_is_applied_Integer.png
>
>
> The SQL PostCommit check has been failing since build number 3924. The first 
> failure: https://builds.apache.org/job/beam_PostCommit_SQL/3924/
> {noformat}
> org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogGCSIT.testReadFromGCS
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
>   at org.apache.beam.sdk.coders.VarLongCoder.encode(VarLongCoder.java:35)
>   at 
> org.apache.beam.sdk.coders.RowCoderGenerator$EncodeInstruction.encodeDelegate(RowCoderGenerator.java:239)
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$WZzHUNls.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$WZzHUNls.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.schemas.SchemaCoder.encode(SchemaCoder.java:166)
>   at 
> org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:98)
> {noformat}
> History: https://builds.apache.org/job/beam_PostCommit_SQL/
> Pull request before the failure: 
> https://github.com/apache/beam/pull/10563#issuecomment-582384925
> DataCatalogGCSIT does not run locally: 
> https://gist.github.com/suztomo/43de4ff1f0458e801fef1a16a28da301
> DataCatalogGCSIT reads this Google Cloud Storage object:
> {code:java}
> String gcsEntryId =
> "`datacatalog`" // this is part of the resource name in DataCatalog, 
> so it has to be
> + ".`entry`" // different from the table provider name ("dc" in 
> this test)
> + ".`apache-beam-testing`"
> + ".`us-central1`"
> + ".`samples`"
> + ".`integ_test_small_csv_test_1`";
> {code}
> but I cannot see the content.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9255) Beam Python Docs for 2.19 is not avilable

2020-02-05 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9255.

Fix Version/s: Not applicable
   Resolution: Fixed

> Beam Python Docs for 2.19 is not avilable
> -
>
> Key: BEAM-9255
> URL: https://issues.apache.org/jira/browse/BEAM-9255
> Project: Beam
>  Issue Type: Bug
>  Components: project-management
>Affects Versions: 2.19.0
>Reporter: Chethan UK
>Assignee: Boyuan Zhang
>Priority: Minor
>  Labels: documentaion
> Fix For: Not applicable
>
>
> Docs for 2.19.0 is [not 
> available|[https://beam.apache.org/releases/pydoc/2.19.0/]] while package has 
> been [released in pypi|[https://pypi.org/project/apache-beam/2.19.0/]]





[jira] [Commented] (BEAM-9255) Beam Python Docs for 2.19 is not avilable

2020-02-05 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031107#comment-17031107
 ] 

Ismaël Mejía commented on BEAM-9255:


Thanks [~boyuanz] !

> Beam Python Docs for 2.19 is not avilable
> -
>
> Key: BEAM-9255
> URL: https://issues.apache.org/jira/browse/BEAM-9255
> Project: Beam
>  Issue Type: Bug
>  Components: project-management
>Affects Versions: 2.19.0
>Reporter: Chethan UK
>Assignee: Boyuan Zhang
>Priority: Minor
>  Labels: documentaion
>
> Docs for 2.19.0 is [not 
> available|[https://beam.apache.org/releases/pydoc/2.19.0/]] while package has 
> been [released in pypi|[https://pypi.org/project/apache-beam/2.19.0/]]





[jira] [Work logged] (BEAM-9231) Annotate as Experimental/Internal missing classes in beam-sdks-java-core

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9231?focusedWorklogId=382586=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382586
 ]

ASF GitHub Bot logged work on BEAM-9231:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:57
Start Date: 05/Feb/20 22:57
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10739: [BEAM-9231] Annotate 
as Experimental/Internal missing classes in beam-sdks-java-core
URL: https://github.com/apache/beam/pull/10739#issuecomment-582653186
 
 
   Requested fixes done. I also added a missing Experimental tag (for 
`DoFn.OnTimersContext`). PTAL again @lukecwik 
 



Issue Time Tracking
---

Worklog Id: (was: 382586)
Time Spent: 5h 10m  (was: 5h)

> Annotate as Experimental/Internal missing classes in beam-sdks-java-core
> 
>
> Key: BEAM-9231
> URL: https://issues.apache.org/jira/browse/BEAM-9231
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: portability
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> For some extra context I was studying the evolution of our APIs between 
> versions in particular for beam-sdks-java-core and noticed that some parts 
> were not well classified as Experimental in particular classes (and 
> transforms)  for portability and SplittableDoFn.





[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382585
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:56
Start Date: 05/Feb/20 22:56
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10769: [BEAM-8889] Upgrades 
gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582652943
 
 
   The second one seems wrong. Filed a bug in our tool: 
https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1185 .
   
 



Issue Time Tracking
---

Worklog Id: (was: 382585)
Remaining Estimate: 160h 40m  (was: 160h 50m)
Time Spent: 7h 20m  (was: 7h 10m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 7h 20m
>  Remaining Estimate: 160h 40m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class to access Google Cloud Storage on Apache Beam. The current 
> implementation directly creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel by itself to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability which eventually 
> creates channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to be 
> more flexible because it can easily pick up new changes, e.g. a new channel 
> implementation using a new protocol, without code changes.





[jira] [Work logged] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2535?focusedWorklogId=382581=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382581
 ]

ASF GitHub Bot logged work on BEAM-2535:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:43
Start Date: 05/Feb/20 22:43
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #10627: [BEAM-2535] 
Support outputTimestamp and watermark holds in processing timers.
URL: https://github.com/apache/beam/pull/10627#discussion_r375546026
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
 ##
 @@ -3806,6 +3812,90 @@ public void onTimer(
   pipeline.run();
 }
 
+  @Test
+  @Category({
+    ValidatesRunner.class,
+    UsesStatefulParDo.class,
+    UsesTimersInParDo.class,
+    UsesTestStreamWithProcessingTime.class,
+    UsesTestStreamWithOutputTimestamp.class
+  })
+  public void testOutputTimestampWithProcessingTime() {
+    final String timerId = "foo";
+    DoFn<KV<String, Integer>, KV<String, Integer>> fn1 =
+        new DoFn<KV<String, Integer>, KV<String, Integer>>() {
+
+          @TimerId(timerId)
+          private final TimerSpec timer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
+
+          @ProcessElement
+          public void processElement(
+              @TimerId(timerId) Timer timer,
+              @Timestamp Instant timestamp,
+              OutputReceiver<KV<String, Integer>> o) {
+            timer
+                .withOutputTimestamp(timestamp.plus(Duration.standardSeconds(5)))
+                .offset(Duration.standardSeconds(10))
+                .setRelative();
+            // Output a message. This will cause the next DoFn to set a timer as well.
+            o.output(KV.of("foo", 100));
+          }
+
+          @OnTimer(timerId)
+          public void onTimer(OnTimerContext c, BoundedWindow w) {}
+        };
+
+    DoFn<KV<String, Integer>, Integer> fn2 =
+        new DoFn<KV<String, Integer>, Integer>() {
+
+          @TimerId(timerId)
+          private final TimerSpec timer = TimerSpecs.timer(TimeDomain.EVENT_TIME);
+
+          @StateId("timerFired")
+          final StateSpec<ValueState<Boolean>> timerFiredState = StateSpecs.value();
+
+          @ProcessElement
+          public void processElement(
+              @TimerId(timerId) Timer timer,
+              @StateId("timerFired") ValueState<Boolean> timerFiredState) {
+            Boolean timerFired = timerFiredState.read();
+            assertTrue(timerFired == null || !timerFired);
+            // Set a timer to 8. This is earlier than the previous DoFn's timer, but after the
+            // previous
+            // DoFn timer's watermark hold. This timer should not fire until the previous timer
+            // fires and removes
+            // the watermark hold.
+            timer.offset(Duration.standardSeconds(8)).setRelative();
 
 Review comment:
   just use timer.set, not timer.offset
 



Issue Time Tracking
---

Worklog Id: (was: 382581)
Time Spent: 22h 50m  (was: 22h 40m)

> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 22h 50m
>  Remaining Estimate: 0h
>
> Today, we have insufficient control over the event time timestamp of elements 
> output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
>  2. For a processing time timer, it is the current input watermark at the 
> time of processing.
> But for both of these, we may want to reserve the right to output a 
> particular time, aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
> sure output is not droppable, but does not fully explain window expiration 
> and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, 
> timers should be viewed as another channel of input, with a watermark, and 
> items on that channel _all need event time timestamps even if they are 
> delivered according to a different time domain_.
> I propose that the specification for when a timer should fire should be 
> separated (with nice defaults) from the specification 

[jira] [Work logged] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2535?focusedWorklogId=382583=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382583
 ]

ASF GitHub Bot logged work on BEAM-2535:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:43
Start Date: 05/Feb/20 22:43
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #10627: [BEAM-2535] 
Support outputTimestamp and watermark holds in processing timers.
URL: https://github.com/apache/beam/pull/10627#discussion_r375549630
 
 

 ##
 File path: 
runners/direct-java/src/main/java/org/apache/beam/runners/direct/WatermarkManager.java
 ##
 @@ -597,20 +597,29 @@ public synchronized Instant getEarliestTimerTimestamp() {
   Instant earliest = THE_END_OF_TIME.get();
   for (NavigableSet timers : processingTimers.values()) {
 
 Review comment:
   What about the other getEarliestTimerTimestamp function? Does that need to 
be updated?
 



Issue Time Tracking
---

Worklog Id: (was: 382583)
Time Spent: 23h  (was: 22h 50m)

> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 23h
>  Remaining Estimate: 0h
>
> Today, we have insufficient control over the event time timestamp of elements 
> output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
>  2. For a processing time timer, it is the current input watermark at the 
> time of processing.
> But for both of these, we may want to reserve the right to output a 
> particular time, aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
> sure output is not droppable, but does not fully explain window expiration 
> and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, 
> timers should be viewed as another channel of input, with a watermark, and 
> items on that channel _all need event time timestamps even if they are 
> delivered according to a different time domain_.
> I propose that the specification for when a timer should fire should be 
> separated (with nice defaults) from the specification of the event time of 
> resulting outputs. These timestamps will determine a side channel with a new 
> "timer watermark" that constrains the output watermark.
>  - We still need to fire event time timers according to the input watermark, 
> so that event time timers fire.
>  - Late data dropping and window expiration will be in terms of the minimum 
> of the input watermark and the timer watermark. In this way, whenever a timer 
> is set, the window is not going to be garbage collected.
>  - We will need to make sure we have a way to "wake up" a window once it is 
> expired; this may be as simple as exhausting the timer channel as soon as the 
> input watermark indicates expiration of a window
> This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It 
> seems reasonable to use timers as an implementation detail (e.g. in 
> runners-core utilities) without wanting any of this additional machinery. For 
> example, if there is no possibility of output from the timer callback.
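
The proposal's rule that late data dropping and window expiration use the minimum of the input watermark and the timer watermark can be sketched in plain Java. This is an illustration under stated assumptions, not Beam runner code: all names are hypothetical, timestamps are plain `long` milliseconds, and the "timer watermark" is assumed here to be the earliest output timestamp among pending timers.

```java
// Hypothetical illustration of the proposed semantics; not Beam code.
final class TimerWatermarkSketch {

  // Stand-in for "no pending timers": nothing holds the watermark back.
  static final long END_OF_TIME = Long.MAX_VALUE;

  // Assumed definition: earliest output timestamp among pending timers.
  static long timerWatermark(long[] pendingOutputTimestamps) {
    long earliest = END_OF_TIME;
    for (long ts : pendingOutputTimestamps) {
      earliest = Math.min(earliest, ts);
    }
    return earliest;
  }

  // Watermark used for late-data dropping and window expiration:
  // the minimum of the input watermark and the timer watermark.
  static long effectiveWatermark(long inputWatermark, long[] pendingOutputTimestamps) {
    return Math.min(inputWatermark, timerWatermark(pendingOutputTimestamps));
  }

  // A window expires once the effective watermark passes its end plus
  // the allowed lateness. Small values are assumed, so no overflow check.
  static boolean windowExpired(
      long windowMaxTimestamp,
      long allowedLateness,
      long inputWatermark,
      long[] pendingOutputTimestamps) {
    return effectiveWatermark(inputWatermark, pendingOutputTimestamps)
        > windowMaxTimestamp + allowedLateness;
  }
}
```

With an input watermark of 20 and a pending timer whose output timestamp is 5, a window ending at 10 is not expired in this sketch, which matches the description's point that a window is not garbage collected while a timer is set.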





[jira] [Work logged] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2535?focusedWorklogId=382582=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382582
 ]

ASF GitHub Bot logged work on BEAM-2535:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:43
Start Date: 05/Feb/20 22:43
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #10627: [BEAM-2535] 
Support outputTimestamp and watermark holds in processing timers.
URL: https://github.com/apache/beam/pull/10627#discussion_r375548830
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
 ##
 @@ -3806,6 +3812,90 @@ public void onTimer(
   pipeline.run();
 }
 
+  @Test
+  @Category({
+    ValidatesRunner.class,
+    UsesStatefulParDo.class,
+    UsesTimersInParDo.class,
+    UsesTestStreamWithProcessingTime.class,
+    UsesTestStreamWithOutputTimestamp.class
+  })
+  public void testOutputTimestampWithProcessingTime() {
+    final String timerId = "foo";
+    DoFn<KV<String, Integer>, KV<String, Integer>> fn1 =
+        new DoFn<KV<String, Integer>, KV<String, Integer>>() {
+
+          @TimerId(timerId)
+          private final TimerSpec timer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
+
+          @ProcessElement
+          public void processElement(
+              @TimerId(timerId) Timer timer,
+              @Timestamp Instant timestamp,
+              OutputReceiver<KV<String, Integer>> o) {
+            timer
+                .withOutputTimestamp(timestamp.plus(Duration.standardSeconds(5)))
+                .offset(Duration.standardSeconds(10))
+                .setRelative();
+            // Output a message. This will cause the next DoFn to set a timer as well.
+            o.output(KV.of("foo", 100));
+          }
+
+          @OnTimer(timerId)
+          public void onTimer(OnTimerContext c, BoundedWindow w) {}
+        };
+
+    DoFn<KV<String, Integer>, Integer> fn2 =
+        new DoFn<KV<String, Integer>, Integer>() {
+
+          @TimerId(timerId)
+          private final TimerSpec timer = TimerSpecs.timer(TimeDomain.EVENT_TIME);
+
+          @StateId("timerFired")
+          final StateSpec<ValueState<Boolean>> timerFiredState = StateSpecs.value();
+
+          @ProcessElement
+          public void processElement(
+              @TimerId(timerId) Timer timer,
+              @StateId("timerFired") ValueState<Boolean> timerFiredState) {
+            Boolean timerFired = timerFiredState.read();
+            assertTrue(timerFired == null || !timerFired);
+            // Set a timer to 8. This is earlier than the previous DoFn's timer, but after the
+            // previous
+            // DoFn timer's watermark hold. This timer should not fire until the previous timer
+            // fires and removes
+            // the watermark hold.
+            timer.offset(Duration.standardSeconds(8)).setRelative();
+          }
+
+          @OnTimer(timerId)
+          public void onTimer(
+              @StateId("timerFired") ValueState<Boolean> timerFiredState,
+              OutputReceiver<Integer> o) {
+            timerFiredState.write(true);
+            o.output(100);
+          }
+        };
+
+    TestStream<KV<String, Integer>> stream =
+        TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of()))
+            .advanceProcessingTime(Duration.standardSeconds(1))
+            // Cause fn2 to set a timer.
+            .addElements(KV.of("key", 1))
+            // Normally this would cause fn2's timer to expire, but it shouldn't here because of
+            // the output timestamp.
+            .advanceProcessingTime(Duration.standardSeconds(9))
 
 Review comment:
   I think you might have to advance processing time to at least 11.
   
   You also need to advance the watermark to at least 10 to allow the timer to 
fire. Right now this isn't testing anything, because the input watermark is 
preventing the timer from firing.
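
The first point in the comment above (processing time must reach at least 11) follows from simple arithmetic on the quoted test, sketched below in plain Java. The names are hypothetical and the firing rule (a processing-time timer fires once the clock reaches its target) is an assumption for illustration; the sketch does not model the event-time watermark condition raised in the second point.

```java
// Hypothetical arithmetic sketch; not Beam's timer implementation.
final class ProcessingTimerArithmeticSketch {

  // A relative timer targets "time when set" plus its offset.
  static long firingTimeSeconds(long setAtSeconds, long offsetSeconds) {
    return setAtSeconds + offsetSeconds;
  }

  // Assumed rule: the timer is eligible once the clock reaches the target.
  static boolean eligibleToFire(long nowSeconds, long setAtSeconds, long offsetSeconds) {
    return nowSeconds >= firingTimeSeconds(setAtSeconds, offsetSeconds);
  }

  public static void main(String[] args) {
    long setAt = 1;       // the stream advances processing time by 1s before the element
    long offset = 10;     // fn1 sets its timer with a 10s offset
    long now = setAt + 9; // the stream then advances processing time by 9s
    System.out.println(firingTimeSeconds(setAt, offset));   // target time in seconds
    System.out.println(eligibleToFire(now, setAt, offset)); // whether the timer may fire
  }
}
```

Under these assumptions the timer set by fn1 targets second 11, so advancing the clock to second 10 leaves it pending.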
 



Issue Time Tracking
---

Worklog Id: (was: 382582)
Time Spent: 23h  (was: 22h 50m)

> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 23h
>  Remaining Estimate: 0h
>
> Today, we have 

[jira] [Assigned] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-05 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang reassigned BEAM-9228:
--

Assignee: Hannah Jiang

> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>
> A user reported the following issue.
> -
> I have a set of tfrecord files, obtained by converting parquet files with 
> Spark. Each file is roughly 1GB and I have 11 of those.
> I would expect simple statistics gathering (ie counting number of items of 
> all files) to scale linearly with respect to the number of cores on my system.
> I am able to reproduce the issue with the minimal snippet below
> {code:java}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.runners.portability import fn_api_runner
> from apache_beam.portability.api import beam_runner_api_pb2
> from apache_beam.portability import python_urns
> import sys
> pipeline_options = PipelineOptions(['--direct_num_workers', '4'])
> file_pattern = 'part-r-00*'
> runner=fn_api_runner.FnApiRunner(
>   default_environment=beam_runner_api_pb2.Environment(
>   urn=python_urns.SUBPROCESS_SDK,
>   payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
> % sys.executable.encode('ascii')))
> p = beam.Pipeline(runner=runner, options=pipeline_options)
> lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern)
>  | beam.combiners.Count.Globally()
>  | beam.io.WriteToText('/tmp/output'))
> p.run()
> {code}
> Only one combination of apache_beam revision / worker type seems to work (I 
> refer to https://beam.apache.org/documentation/runners/direct/ for the worker 
> types)
> * beam 2.16: neither multithread nor multiprocess achieves high cpu usage on 
> multiple cores
> * beam 2.17: able to achieve high cpu usage on all 4 cores
> * beam 2.18: not tested the multithreaded mode but the multiprocess mode fails 
> when trying to serialize the Environment instance most likely because of a 
> change from 2.17 to 2.18.
> I also tried briefly SparkRunner with version 2.16 but was not able to achieve 
> any throughput.
> What is the recommended way to achieve what I am trying to do? How can I 
> troubleshoot?
> --
> This is caused by [this 
> PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60].
> A [workaround|https://github.com/apache/beam/pull/10729] was tried, which 
> rolls back iobase.py so that it does not use _SDFBoundedSourceWrapper. This 
> confirmed that data is then distributed to multiple workers; however, there 
> are some regressions with SDF wrapper tests.





[jira] [Updated] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-05 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-9228:
---
Status: Open  (was: Triage Needed)

> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0
>Reporter: Hannah Jiang
>Priority: Major
>
> A user reported the following issue.
> -
> I have a set of tfrecord files, obtained by converting parquet files with 
> Spark. Each file is roughly 1GB and I have 11 of those.
> I would expect simple statistics gathering (ie counting number of items of 
> all files) to scale linearly with respect to the number of cores on my system.
> I am able to reproduce the issue with the minimal snippet below
> {code:java}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.runners.portability import fn_api_runner
> from apache_beam.portability.api import beam_runner_api_pb2
> from apache_beam.portability import python_urns
> import sys
> pipeline_options = PipelineOptions(['--direct_num_workers', '4'])
> file_pattern = 'part-r-00*'
> runner=fn_api_runner.FnApiRunner(
>   default_environment=beam_runner_api_pb2.Environment(
>   urn=python_urns.SUBPROCESS_SDK,
>   payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
> % sys.executable.encode('ascii')))
> p = beam.Pipeline(runner=runner, options=pipeline_options)
> lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern)
>  | beam.combiners.Count.Globally()
>  | beam.io.WriteToText('/tmp/output'))
> p.run()
> {code}
> Only one combination of apache_beam revision / worker type seems to work (I 
> refer to https://beam.apache.org/documentation/runners/direct/ for the worker 
> types)
> * beam 2.16: neither multithread nor multiprocess achieves high cpu usage on 
> multiple cores
> * beam 2.17: able to achieve high cpu usage on all 4 cores
> * beam 2.18: not tested the multithreaded mode but the multiprocess mode fails 
> when trying to serialize the Environment instance most likely because of a 
> change from 2.17 to 2.18.
> I also tried briefly SparkRunner with version 2.16 but was not able to achieve 
> any throughput.
> What is the recommended way to achieve what I am trying to do? How can I 
> troubleshoot?
> --
> This is caused by [this 
> PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60].
> A [workaround|https://github.com/apache/beam/pull/10729] was tried, which 
> rolls back iobase.py so that it does not use _SDFBoundedSourceWrapper. This 
> confirmed that data is then distributed to multiple workers; however, there 
> are some regressions with SDF wrapper tests.





[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=382580=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382580
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:36
Start Date: 05/Feb/20 22:36
Worklog Time Spent: 10m 
  Work Description: veblush commented on issue #10769: [BEAM-8889] Upgrades 
gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-582646693
 
 
   @suztomo Oh, I missed the messages in between. For those dependencies, what do 
I need to do to address them? For the first one, using the old function that 
is available in that version would probably be feasible. The last one seems a 
bit strange.
 



Issue Time Tracking
---

Worklog Id: (was: 382580)
Remaining Estimate: 160h 50m  (was: 161h)
Time Spent: 7h 10m  (was: 7h)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 7h 10m
>  Remaining Estimate: 160h 50m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class to access Google Cloud Storage on Apache Beam. The current 
> implementation directly creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel by itself to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability which eventually 
> creates channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to be 
> more flexible because it can easily pick up new changes, e.g. a new channel 
> implementation using a new protocol, without code changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9230) Enable CrossLanguageValidateRunner test for Spark runner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9230?focusedWorklogId=382577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382577
 ]

ASF GitHub Bot logged work on BEAM-9230:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:31
Start Date: 05/Feb/20 22:31
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10758: [BEAM-9230] Enable 
CrossLanguageValidateRunner test for Spark runner
URL: https://github.com/apache/beam/pull/10758#issuecomment-582645305
 
 
   Run XVR_Spark PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 382577)
Time Spent: 1h 40m  (was: 1.5h)

> Enable CrossLanguageValidateRunner test for Spark runner
> 
>
> Key: BEAM-9230
> URL: https://issues.apache.org/jira/browse/BEAM-9230
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Enable CrossLanguageValidateRunner test for Spark runner





[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=382570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382570
 ]

ASF GitHub Bot logged work on BEAM-8979:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:23
Start Date: 05/Feb/20 22:23
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10734: [BEAM-8979] 
reintroduce mypy-protobuf stub generation
URL: https://github.com/apache/beam/pull/10734#issuecomment-582642317
 
 
   @udim, this works, but it's pip-installing mypy_protobuf from my GitHub repo. 
 I've opened a PR on that project to fix this, but there's no telling how long 
it will take them to merge it and make a new release to PyPI.  Should we merge 
this as-is and make a new Jira for fixing it when a new version of 
mypy_protobuf is released, or should we wait for the official release?
   
 



Issue Time Tracking
---

Worklog Id: (was: 382570)
Time Spent: 8h 10m  (was: 8h)

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in <module>
> 10:46:32     'mypy': generate_protos_first(mypy),
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
> 10:46:32     return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32     dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
> 10:46:32     self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
> 10:46:32     cmd_obj.run()
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", line 44, in run
> 10:46:32     self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
> 10:46:32     self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
> 10:46:32     cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32     gen_protos.generate_proto_files(log=log)
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", line 144, in generate_proto_files
> 10:46:32     '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for details): 1
> {code}
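
The log line "protoc-gen-mypy: program not found or is not executable" suggests that protoc could not locate the plugin executable, even though pip reports mypy-protobuf==1.12 as installed. protoc handles a --mypy_out flag by looking for an executable named protoc-gen-mypy on PATH, so one possible cause (an assumption, not confirmed by this log) is that the virtualenv's bin directory was not on PATH when gen_protos.py invoked protoc. A minimal sketch for checking this from the same Python environment follows; the helper names find_protoc_plugin and describe_plugin are hypothetical and not part of Beam:

```python
import shutil
from typing import Optional


def find_protoc_plugin(name: str = "protoc-gen-mypy",
                       path: Optional[str] = None) -> Optional[str]:
    """Return the absolute path of a protoc plugin executable, or None.

    `path` is an optional PATH-style string; when omitted, the PATH of the
    current process is searched, which is also where protoc would look.
    """
    return shutil.which(name, path=path)


def describe_plugin(name: str = "protoc-gen-mypy",
                    path: Optional[str] = None) -> str:
    """Build a one-line, human-readable report about the plugin lookup."""
    location = find_protoc_plugin(name, path)
    if location is None:
        return f"{name}: not found on PATH"
    return f"{name}: found at {location}"


if __name__ == "__main__":
    # Run this with the same interpreter and environment as the failing build.
    print(describe_plugin())
```

If the report says the plugin is not found, comparing PATH against the bin directory of the gradleenv virtualenv shown in the log would be a reasonable next step.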
>  
> This is what I have tried so far to resolve this (without being successful):
>  * Including 

[jira] [Work logged] (BEAM-9230) Enable CrossLanguageValidateRunner test for Spark runner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9230?focusedWorklogId=382566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382566
 ]

ASF GitHub Bot logged work on BEAM-9230:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:17
Start Date: 05/Feb/20 22:17
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10758: [BEAM-9230] Enable 
CrossLanguageValidateRunner test for Spark runner
URL: https://github.com/apache/beam/pull/10758#issuecomment-582640104
 
 
   run seed job
 



Issue Time Tracking
---

Worklog Id: (was: 382566)
Time Spent: 1.5h  (was: 1h 20m)

> Enable CrossLanguageValidateRunner test for Spark runner
> 
>
> Key: BEAM-9230
> URL: https://issues.apache.org/jira/browse/BEAM-9230
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Enable CrossLanguageValidateRunner test for Spark runner





[jira] [Work logged] (BEAM-9230) Enable CrossLanguageValidateRunner test for Spark runner

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9230?focusedWorklogId=382565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382565
 ]

ASF GitHub Bot logged work on BEAM-9230:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:15
Start Date: 05/Feb/20 22:15
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10758: [BEAM-9230] Enable 
CrossLanguageValidateRunner test for Spark runner
URL: https://github.com/apache/beam/pull/10758#issuecomment-582639284
 
 
   Run XVR_Spark PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 382565)
Time Spent: 1h 20m  (was: 1h 10m)

> Enable CrossLanguageValidateRunner test for Spark runner
> 
>
> Key: BEAM-9230
> URL: https://issues.apache.org/jira/browse/BEAM-9230
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Enable CrossLanguageValidateRunner test for Spark runner





[jira] [Work logged] (BEAM-9231) Annotate as Experimental/Internal missing classes in beam-sdks-java-core

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9231?focusedWorklogId=382564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382564
 ]

ASF GitHub Bot logged work on BEAM-9231:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:14
Start Date: 05/Feb/20 22:14
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10739: [BEAM-9231] 
Annotate as Experimental/Internal missing classes in beam-sdks-java-core
URL: https://github.com/apache/beam/pull/10739#discussion_r375527310
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java
 ##
 @@ -53,6 +55,7 @@
  * service. Note that this is a low-level API and mainly for internal use. A 
user may want to use
  * high-level wrapper classes rather than this one.
  */
+@Experimental(Kind.PORTABILITY)
 
 Review comment:
   I prefer to make an exception for this one because of 
https://issues.apache.org/jira/browse/BEAM-8546. This class is in the wrong 
module, but nobody has moved it to `sdks/java/core` where it belongs. Annotating 
it now ensures that the 'experimental' annotation is not forgotten once the 
class is moved there.
 



Issue Time Tracking
---

Worklog Id: (was: 382564)
Time Spent: 5h  (was: 4h 50m)

> Annotate as Experimental/Internal missing classes in beam-sdks-java-core
> 
>
> Key: BEAM-9231
> URL: https://issues.apache.org/jira/browse/BEAM-9231
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: portability
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> For some extra context: I was studying the evolution of our APIs between 
> versions, in particular for beam-sdks-java-core, and noticed that some parts 
> were not properly marked as Experimental, in particular classes (and 
> transforms) for portability and SplittableDoFn.





[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=382562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382562
 ]

ASF GitHub Bot logged work on BEAM-9219:


Author: ASF GitHub Bot
Created on: 05/Feb/20 22:10
Start Date: 05/Feb/20 22:10
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #10745: 
[BEAM-9219] Streamline creation of Python and Java dependencies pages
URL: https://github.com/apache/beam/pull/10745#discussion_r375536526
 
 

 ##
 File path: website/src/documentation/sdks/python-dependencies.md
 ##
 @@ -26,460 +26,41 @@ behavior in the service. If you are using any of these 
packages in your code, be
 aware that some libraries are not forward-compatible and you may need to pin to
 the listed versions that will be in scope during execution.
 
-To see the compile and runtime dependencies for your Beam SDK version, 
expand
-the relevant section below.
+Dependencies for your Beam SDK version are listed in `setup.py` in the Beam 
repository. To view them, perform the following steps:
 
-2.17.0
+1. Open `setup.py`.
 
-Beam SDK for Python 2.17.0 has the following compile and runtime 
dependencies.
+```
+
https://raw.githubusercontent.com/apache/beam/v/sdks/python/setup.py
+```
+
+Replace `` with the major.minor.patch version of the SDK. 
For example, 
{:target="_blank"}
 will provide the dependencies for the 2.18.0 release.
+
+
+2. Review the core dependency list under `REQUIRED_PACKAGES`.
 
-
-PackageVersion
-  avro-python3=1.8.1,2.0.0; python_version = 
"3.0"
-  avro=1.8.1,2.0.0; python_version  
"3.0"
-  cachetools=3.1.0,4
-  crcmod=1.7,2.0
-  dill=0.3.0,0.3.1
-  fastavro=0.21.4,0.22
-  funcsigs=1.0.2,2; python_version  
"3.0"
-  future=0.16.0,1.0.0
-  futures=3.2.0,4.0.0; python_version  
"3.0"
-  google-apitools=0.5.28,0.5.29
-  google-cloud-bigquery=1.6.0,1.18.0
-  google-cloud-bigtable=0.31.1,1.1.0
-  google-cloud-core=0.28.1,2
-  google-cloud-datastore=1.7.1,1.8.0
-  google-cloud-pubsub=0.39.0,1.1.0
-  googledatastore=7.0.1,7.1; python_version  
"3.0"
-  grpcio=1.12.1,2
-  hdfs=2.1.0,3.0.0
-  httplib2=0.8,=0.12.0
-  mock=1.0.1,3.0.0
-  oauth2client=2.0.1,4
-  proto-google-cloud-datastore-v1=0.90.0,=0.90.4; 
python_version  "3.0"
-  protobuf=3.5.0.post1,4
-  pyarrow=0.15.1,0.16.0; python_version = "3.0" 
or platform_system != "Windows"
-  pydot=1.2.0,2
-  pymongo=3.8.0,4.0.0
-  python-dateutil=2.8.0,3
-  pytz=2018.3
-  pyvcf=0.6.8,0.7.0; python_version  
"3.0"
-  typing=3.6.0,3.7.0; python_version  
"3.5.0"
-
+**Note:** If you require [extra features]({{ site.baseurl 
}}/get-started/quickstart-py#extra-requirements) such as `gcp` or `test`, you 
should review the lists under `REQUIRED_TEST_PACKAGES`, `GCP_REQUIREMENTS`, or 
`INTERACTIVE_BEAM` for additional dependencies. 
 
-
+You can also retrieve the dependency list from the command line using the 
following process:
 
-2.16.0
+1.  Create a clean virtual environment on your local machine.
 
-Beam SDK for Python 2.16.0 has the following compile and
-  runtime dependencies.
+Python 3:
 
-
-  PackageVersion
-  avro-python3=1.8.1,2.0.0; python_version = 
"3.0"
-  avro=1.8.1,2.0.0; python_version  
"3.0"
-  cachetools=3.1.0,4
-  crcmod=1.7,2.0
-  dill=0.3.0,0.3.1
-  fastavro=0.21.4,0.22
-  funcsigs=1.0.2,2; python_version  
"3.0"
-  future=0.16.0,1.0.0
-  futures=3.2.0,4.0.0; python_version  
"3.0"
-  google-apitools=0.5.28,0.5.29
-  google-cloud-bigquery=1.6.0,1.18.0
-  google-cloud-bigtable=0.31.1,1.1.0
-  google-cloud-core=0.28.1,2
-  google-cloud-datastore=1.7.1,1.8.0
-  google-cloud-pubsub=0.39.0,1.1.0
-  googledatastore=7.0.1,7.1; python_version  
"3.0"
-  grpcio=1.12.1,2
-  hdfs=2.1.0,3.0.0
-  httplib2=0.8,=0.12.0
-  mock=1.0.1,3.0.0
-  oauth2client=2.0.1,4
-  proto-google-cloud-datastore-v1=0.90.0,=0.90.4; 
python_version  "3.0"
-  protobuf=3.5.0.post1,4
-  pyarrow=0.11.1,0.15.0; python_version = "3.0" 
or platform_system != "Windows"
-  pydot=1.2.0,2
-  pymongo=3.8.0,4.0.0
-  python-dateutil=2.8.0,3
-  pytz=2018.3
-  pyvcf=0.6.8,0.7.0; python_version  
"3.0"
-  pyyaml=3.12,4.0.0
-  typing=3.6.0,3.7.0; python_version  
"3.5.0"
-
+```
+$ python3 -m venv env && source env/bin/activate
+```
+
+Python 2: 
 
-
+```
+$ pip install virtualenv && virtualenv env && source env/bin/activate
+```
 
-2.15.0
+2. [Install the Beam Python SDK]({{ site.baseurl 
}}/get-started/quickstart-py/#download-and-install).
 
 Review comment:
   Got it
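
As a rough illustration of step 2 in the proposed page (reviewing the list under `REQUIRED_PACKAGES`), the list can also be read programmatically once the contents of `setup.py` are available as text. The sketch below assumes `REQUIRED_PACKAGES` is a module-level list of plain string literals; the function name `extract_list_assignment` and the sample requirement strings are hypothetical placeholders, not taken from Beam:

```python
import ast
from typing import List


def extract_list_assignment(source: str,
                            variable: str = "REQUIRED_PACKAGES") -> List[str]:
    """Return the literal list assigned to `variable` at module level.

    The source is parsed but never executed, so setup.py is not run.
    Raises KeyError if no such module-level assignment exists.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == variable:
                    # literal_eval only accepts literals, so a list built
                    # from expressions would raise ValueError here.
                    return list(ast.literal_eval(node.value))
    raise KeyError(variable)


# Placeholder content standing in for a downloaded setup.py.
SAMPLE = """
REQUIRED_PACKAGES = [
    'example-package>=1.0,<2.0',
    'other-package>=2018.3',
]
"""

if __name__ == "__main__":
    for requirement in extract_list_assignment(SAMPLE):
        print(requirement)
```

The same function can be pointed at `REQUIRED_TEST_PACKAGES` or `GCP_REQUIREMENTS` by passing a different `variable`, provided those lists are also plain literals.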
 


[jira] [Commented] (BEAM-9253) SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast to java.lang.Long

2020-02-05 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031076#comment-17031076
 ] 

Tomo Suzuki commented on BEAM-9253:
---

[~bhulette] Once you fix the schema, would you run "Run SQL Postcommit" in 
https://github.com/apache/beam/pull/10765 to confirm the effect?

> SQL PostCommit failure: ClassCastException: java.lang.Integer cannot be cast 
> to java.lang.Long
> --
>
> Key: BEAM-9253
> URL: https://issues.apache.org/jira/browse/BEAM-9253
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Tomo Suzuki
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.20.0
>
> Attachments: VarLongCoder_is_applied_Integer.png
>
>
> The SQL PostCommit check has been failing since build number 3924. The first 
> failure: https://builds.apache.org/job/beam_PostCommit_SQL/3924/
> {noformat}
> org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogGCSIT.testReadFromGCS
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
>   at org.apache.beam.sdk.coders.VarLongCoder.encode(VarLongCoder.java:35)
>   at 
> org.apache.beam.sdk.coders.RowCoderGenerator$EncodeInstruction.encodeDelegate(RowCoderGenerator.java:239)
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$WZzHUNls.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$WZzHUNls.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.schemas.SchemaCoder.encode(SchemaCoder.java:166)
>   at 
> org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:98)
> {noformat}
> History: https://builds.apache.org/job/beam_PostCommit_SQL/
> Pull request before the failure: 
> https://github.com/apache/beam/pull/10563#issuecomment-582384925
> DataCatalogGCSIT does not run locally: 
> https://gist.github.com/suztomo/43de4ff1f0458e801fef1a16a28da301
> DataCatalogGCSIT reads this Google Cloud Storage object:
> {code:java}
> String gcsEntryId =
>     "`datacatalog`" // this is part of the resource name in DataCatalog, so it has to be
>         + ".`entry`" // different from the table provider name ("dc" in this test)
>         + ".`apache-beam-testing`"
>         + ".`us-central1`"
>         + ".`samples`"
>         + ".`integ_test_small_csv_test_1`";
> {code}
> but I cannot see the content.




