[jira] [Work logged] (BEAM-7642) (Dataflow) Python AfterProcessingTime fires before defined time

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7642?focusedWorklogId=300427&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300427
 ]

ASF GitHub Bot logged work on BEAM-7642:


Author: ASF GitHub Bot
Created on: 23/Aug/19 18:20
Start Date: 23/Aug/19 18:20
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9397: [BEAM-7642] 
Fix python sdk AfterProcessingTime unit discrepancy
URL: https://github.com/apache/beam/pull/9397
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300427)
Time Spent: 1h 20m  (was: 1h 10m)

> (Dataflow) Python AfterProcessingTime fires before defined time
> ---
>
> Key: BEAM-7642
> URL: https://issues.apache.org/jira/browse/BEAM-7642
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Mikhail Gryzykhin
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [https://stackoverflow.com/questions/56700913/why-my-data-in-apache-beam-is-emitted-after-a-few-minutes-instead-of-10h]
>  
> A user on StackOverflow reports that AfterProcessingTime on a global window 
> fires before the allocated time (10h). It emits events within the first 
> minutes/seconds and then drops the rest because the window has closed.
> The code the user provided seems valid. I tried to verify the time units 
> accepted by the method, but I couldn't trace them all the way through the code.
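A hedged illustration of how a unit discrepancy of this kind plays out (plain Python, not the Beam API): a delay intended as 10 hours expressed in seconds, but misread by another layer as milliseconds, fires after only 36 seconds.

```python
def to_seconds(delay, unit):
    """Normalize a processing-time delay to seconds; `unit` is 's' or 'ms'."""
    return delay / 1000.0 if unit == 'ms' else float(delay)

ten_hours = 10 * 60 * 60            # 36000, the 10h delay from the report
# Correctly interpreted as seconds:
assert to_seconds(ten_hours, 's') == 36000.0
# Misread as milliseconds, the trigger would fire after just 36 seconds:
assert to_seconds(ten_hours, 'ms') == 36.0
```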



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7988) Error should include runner name when runner is invalid

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7988?focusedWorklogId=300428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300428
 ]

ASF GitHub Bot logged work on BEAM-7988:


Author: ASF GitHub Bot
Created on: 23/Aug/19 18:22
Start Date: 23/Aug/19 18:22
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9355: [BEAM-7988] 
py: include runner name when runner is invalid
URL: https://github.com/apache/beam/pull/9355
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300428)
Time Spent: 20m  (was: 10m)

> Error should include runner name when runner is invalid
> ---
>
> Key: BEAM-7988
> URL: https://issues.apache.org/jira/browse/BEAM-7988
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> "TypeError: Runner must be a PipelineRunner object or the name of a 
> registered runner." would be a more helpful error message if it included the 
> runner that was found to be incorrect.
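A minimal sketch of the improvement (hypothetical names, not the actual Beam code): include the offending value in the TypeError so the user can see what was rejected.

```python
class PipelineRunner:
    """Stand-in for the real PipelineRunner base class."""

# Illustrative registry; the real SDK resolves names differently.
_REGISTERED_RUNNERS = {'DirectRunner', 'DataflowRunner'}

def check_runner(runner):
    """Raise a TypeError that names the invalid runner, per the issue."""
    if isinstance(runner, PipelineRunner) or runner in _REGISTERED_RUNNERS:
        return runner
    raise TypeError(
        'Runner %r must be a PipelineRunner object or the name of a '
        'registered runner.' % (runner,))
```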



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform

2019-08-23 Thread Rahul Patwari (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914535#comment-16914535
 ] 

Rahul Patwari commented on BEAM-6114:
-

To support all types of Joins with Seekable Join Bounded, we have to enhance 
[BeamJoinTransforms.JoinAsLookup|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L175]
 as the current logic for 
[LookupJoin|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L236]
 doesn't take care of all the join types.

We also have to introduce a method in 
[BeamSqlSeekableTable|https://github.com/apache/beam/blob/862b04c100360c58fb4e867d97d9c49525286874/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java#L30]
 to get all the rows.

And we have to 

> SQL join selection should be done in planner, not in expansion to PTransform
> 
>
> Key: BEAM-6114
> URL: https://issues.apache.org/jira/browse/BEAM-6114
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rahul Patwari
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently Beam SQL joins all go through a single physical operator which has 
> a single PTransform that does all join algorithms based on properties of its 
> input PCollections as well as the relational algebra.
> A first step is to make the needed information part of the relational 
> algebra, so it can choose a PTransform based on that, and the PTransforms can 
> be simpler.
> Second step is to have separate (physical) relational operators for different 
> join algorithms.
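The two steps in the description can be sketched as a selection function the planner evaluates from properties already captured in the relational algebra, so each physical operator's PTransform stays simple. This is illustrative pseudologic with hypothetical operator names, not Beam's actual planner:

```python
def select_join_operator(left_bounded, right_bounded, right_seekable=False):
    """Pick a physical join operator from input properties at planning time,
    instead of branching inside a single PTransform at expansion time."""
    if right_seekable:
        return 'LookupJoin'          # join against a seekable table
    if left_bounded and right_bounded:
        return 'CoGroupByKeyJoin'    # both sides bounded
    return 'SideInputJoin'           # at least one unbounded side
```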



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform

2019-08-23 Thread Rahul Patwari (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914535#comment-16914535
 ] 

Rahul Patwari edited comment on BEAM-6114 at 8/23/19 6:31 PM:
--

To support all types of Joins with Seekable Join Bounded, we have to enhance 
[BeamJoinTransforms.JoinAsLookup|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L175],
 as the current logic for 
[LookupJoin|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L236],
 doesn't take care of all the join types.

We also have to introduce a method in 
[BeamSqlSeekableTable|https://github.com/apache/beam/blob/862b04c100360c58fb4e867d97d9c49525286874/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java#L30]
 to get all the rows.


was (Author: rahul8383):
To support all types of Joins with Seekable Join Bounded, we have to enhance 
[BeamJoinTransforms.JoinAsLookup|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L175]
 as the current logic for 
[LookupJoin|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L236]
 doesn't take care of all the join types.

We also have to introduce a method in 
[BeamSqlSeekableTable|https://github.com/apache/beam/blob/862b04c100360c58fb4e867d97d9c49525286874/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java#L30]
 to get all the rows.

And we have to 

> SQL join selection should be done in planner, not in expansion to PTransform
> 
>
> Key: BEAM-6114
> URL: https://issues.apache.org/jira/browse/BEAM-6114
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rahul Patwari
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently Beam SQL joins all go through a single physical operator which has 
> a single PTransform that does all join algorithms based on properties of its 
> input PCollections as well as the relational algebra.
> A first step is to make the needed information part of the relational 
> algebra, so it can choose a PTransform based on that, and the PTransforms can 
> be simpler.
> Second step is to have separate (physical) relational operators for different 
> join algorithms.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform

2019-08-23 Thread Rahul Patwari (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914535#comment-16914535
 ] 

Rahul Patwari edited comment on BEAM-6114 at 8/23/19 6:34 PM:
--

To support all types of Joins with Seekable Join Bounded, we have to enhance 
[BeamJoinTransforms.JoinAsLookup|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L175]
 as the current logic for 
[LookupJoin|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L236],
 doesn't take care of all the join types.

We also have to introduce a method in 
[BeamSqlSeekableTable|https://github.com/apache/beam/blob/862b04c100360c58fb4e867d97d9c49525286874/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java#L30]
 to get all the rows.


was (Author: rahul8383):
To support all types of Joins with Seekable Join Bounded, we have to enhance 
[BeamJoinTransforms.JoinAsLookup|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L175],
 as the current logic for 
[LookupJoin|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L236],
 doesn't take care of all the join types.

We also have to introduce a method in 
[BeamSqlSeekableTable|https://github.com/apache/beam/blob/862b04c100360c58fb4e867d97d9c49525286874/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java#L30]
 to get all the rows.

> SQL join selection should be done in planner, not in expansion to PTransform
> 
>
> Key: BEAM-6114
> URL: https://issues.apache.org/jira/browse/BEAM-6114
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rahul Patwari
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently Beam SQL joins all go through a single physical operator which has 
> a single PTransform that does all join algorithms based on properties of its 
> input PCollections as well as the relational algebra.
> A first step is to make the needed information part of the relational 
> algebra, so it can choose a PTransform based on that, and the PTransforms can 
> be simpler.
> Second step is to have separate (physical) relational operators for different 
> join algorithms.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7973) Python doesn't shut down Flink job server properly

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7973?focusedWorklogId=300435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300435
 ]

ASF GitHub Bot logged work on BEAM-7973:


Author: ASF GitHub Bot
Created on: 23/Aug/19 18:34
Start Date: 23/Aug/19 18:34
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9340: [BEAM-7973] py: shut 
down Flink job server automatically
URL: https://github.com/apache/beam/pull/9340#issuecomment-524418714
 
 
   R: @mxm 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300435)
Time Spent: 0.5h  (was: 20m)

> Python doesn't shut down Flink job server properly
> --
>
> Key: BEAM-7973
> URL: https://issues.apache.org/jira/browse/BEAM-7973
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Using the new Python FlinkRunner [1], a new job server is created and the job 
> succeeds, but the job server is seemingly not shut down properly when the 
> Python command exits. Specifically, the java -jar command that started the job 
> server is still running in the background, eating up memory.
> Relevant args:
> python ...
>  --runner FlinkRunner \ 
>  --flink_job_server_jar $FLINK_JOB_SERVER_JAR ...
> [1] [https://github.com/apache/beam/pull/9043]
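One way to guarantee cleanup, sketched here as an assumption rather than the actual fix in the linked PR, is to register termination of the spawned job-server process at interpreter exit:

```python
import atexit
import subprocess

def start_job_server(command):
    """Start the job server subprocess and ensure it is terminated when the
    Python process exits, so no `java -jar` lingers in the background."""
    proc = subprocess.Popen(command)

    def stop():
        if proc.poll() is None:          # still running
            proc.terminate()
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                proc.kill()              # escalate if it ignores SIGTERM

    atexit.register(stop)
    return proc, stop
```

Usage would be along the lines of `start_job_server(['java', '-jar', flink_job_server_jar])`, assuming the jar path comes from the `--flink_job_server_jar` option shown above.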



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8085) PubsubMessageWithAttributesCoder should not produce null attributes map

2019-08-23 Thread Chad Dombrova (Jira)
Chad Dombrova created BEAM-8085:
---

 Summary: PubsubMessageWithAttributesCoder should not produce null 
attributes map
 Key: BEAM-8085
 URL: https://issues.apache.org/jira/browse/BEAM-8085
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Reporter: Chad Dombrova


Hi, I just got caught by an issue where PubsubMessage.getAttributeMap() 
returned null, because the message was created by 
PubsubMessageWithAttributesCoder, which uses a NullableCoder for attributes.

Here are the relevant code snippets:

{code:java}
public class PubsubMessageWithAttributesCoder extends CustomCoder<PubsubMessage> {
  // A message's payload can not be null
  private static final Coder<byte[]> PAYLOAD_CODER = ByteArrayCoder.of();
  // A message's attributes can be null.
  private static final Coder<Map<String, String>> ATTRIBUTES_CODER =
      NullableCoder.of(MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()));

  @Override
  public PubsubMessage decode(InputStream inStream) throws IOException {
    return decode(inStream, Context.NESTED);
  }

  @Override
  public PubsubMessage decode(InputStream inStream, Context context) throws IOException {
    byte[] payload = PAYLOAD_CODER.decode(inStream);
    Map<String, String> attributes = ATTRIBUTES_CODER.decode(inStream, context);
    return new PubsubMessage(payload, attributes);
  }
}

{code}
 
{code:java}
public class PubsubMessage {

  private byte[] message;
  private Map<String, String> attributes;

  public PubsubMessage(byte[] payload, Map<String, String> attributes) {
    this.message = payload;
    this.attributes = attributes;
  }

  /** Returns the main PubSub message. */
  public byte[] getPayload() {
    return message;
  }

  /** Returns the given attribute value. If no such attribute exists, returns null. */
  @Nullable
  public String getAttribute(String attribute) {
    checkNotNull(attribute, "attribute");
    return attributes.get(attribute);
  }

  /** Returns the full map of attributes. This is an unmodifiable map. */
  public Map<String, String> getAttributeMap() {
    return attributes;
  }
}
{code}

There are a handful of potential solutions:

# Remove the NullableCoder
# In PubsubMessageWithAttributesCoder.decode, check for null and create an 
empty Map before instantiating PubsubMessage
# Allow attributes to be null in the PubsubMessage constructor, but create an 
empty Map if it is null (similar to above, but handled in PubsubMessage)
# Allow PubsubMessage.attributes to be nullable, and indicate it as such
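Option 2 above, sketched in Python for brevity (the real fix would of course live in the Java coder): normalize a possibly-null decoded map to an empty one before constructing the message, so attribute accessors never hand back null.

```python
def normalize_attributes(decoded):
    """Return an empty dict when the decoded attribute map is None, so a
    getAttributeMap()-style accessor never returns null/None."""
    return {} if decoded is None else dict(decoded)

assert normalize_attributes(None) == {}
assert normalize_attributes({'k': 'v'}) == {'k': 'v'}
```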




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform

2019-08-23 Thread Rahul Patwari (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914535#comment-16914535
 ] 

Rahul Patwari edited comment on BEAM-6114 at 8/23/19 6:45 PM:
--

To support all types of Joins with Seekable Join Bounded, we have to enhance 
[BeamJoinTransforms.JoinAsLookup|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L175]
 as the current logic for 
[LookupJoin|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L236],
 doesn't take care of all the join types.

We also have to introduce a method in 
[BeamSqlSeekableTable|https://github.com/apache/beam/blob/862b04c100360c58fb4e867d97d9c49525286874/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java#L30]
 to get all the rows.

I think Once triggering will also apply to Unbounded data with Default Trigger.


was (Author: rahul8383):
To support all types of Joins with Seekable Join Bounded, we have to enhance 
[BeamJoinTransforms.JoinAsLookup|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L175]
 as the current logic for 
[LookupJoin|https://github.com/apache/beam/blob/ad92513154fca8a7ef0a2f44b9ff54c005cbc565/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamJoinTransforms.java#L236],
 doesn't take care of all the join types.

We also have to introduce a method in 
[BeamSqlSeekableTable|https://github.com/apache/beam/blob/862b04c100360c58fb4e867d97d9c49525286874/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java#L30]
 to get all the rows.

> SQL join selection should be done in planner, not in expansion to PTransform
> 
>
> Key: BEAM-6114
> URL: https://issues.apache.org/jira/browse/BEAM-6114
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rahul Patwari
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently Beam SQL joins all go through a single physical operator which has 
> a single PTransform that does all join algorithms based on properties of its 
> input PCollections as well as the relational algebra.
> A first step is to make the needed information part of the relational 
> algebra, so it can choose a PTransform based on that, and the PTransforms can 
> be simpler.
> Second step is to have separate (physical) relational operators for different 
> join algorithms.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7864) Portable spark Reshuffle coder cast exception

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=300447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300447
 ]

ASF GitHub Bot logged work on BEAM-7864:


Author: ASF GitHub Bot
Created on: 23/Aug/19 19:24
Start Date: 23/Aug/19 19:24
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9410: [BEAM-7864] fix Spark 
reshuffle translation with Python SDK
URL: https://github.com/apache/beam/pull/9410#issuecomment-524433753
 
 
   Run Python Spark ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300447)
Time Spent: 40m  (was: 0.5h)

> Portable spark Reshuffle coder cast exception
> -
>
> Key: BEAM-7864
> URL: https://issues.apache.org/jira/browse/BEAM-7864
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Running :sdks:python:test-suites:portable:py35:portableWordCountBatch in 
> either loopback or docker mode on master fails with the following exception:
>  
> java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder 
> cannot be cast to org.apache.beam.sdk.coders.KvCoder
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400)
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147)
>  at 
> org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7864) Portable spark Reshuffle coder cast exception

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=300464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300464
 ]

ASF GitHub Bot logged work on BEAM-7864:


Author: ASF GitHub Bot
Created on: 23/Aug/19 19:43
Start Date: 23/Aug/19 19:43
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9410: [BEAM-7864] fix Spark 
reshuffle translation with Python SDK
URL: https://github.com/apache/beam/pull/9410#issuecomment-524438917
 
 
   R: @iemejia Any particular reason we need to separate the keys and values in 
the current reshuffle translation? 
https://github.com/apache/beam/blob/c5d45331796693ad48c0cceaf1e4d9903c1d98fb/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/GroupCombineFunctions.java#L191-L197
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300464)
Time Spent: 50m  (was: 40m)

> Portable spark Reshuffle coder cast exception
> -
>
> Key: BEAM-7864
> URL: https://issues.apache.org/jira/browse/BEAM-7864
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Running :sdks:python:test-suites:portable:py35:portableWordCountBatch in 
> either loopback or docker mode on master fails with the following exception:
>  
> java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder 
> cannot be cast to org.apache.beam.sdk.coders.KvCoder
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400)
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147)
>  at 
> org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=300484&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300484
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 23/Aug/19 20:17
Start Date: 23/Aug/19 20:17
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9388: [BEAM-7909] 
upgrade python lib versions to match to dataflow worker
URL: https://github.com/apache/beam/pull/9388
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300484)
Time Spent: 7h 10m  (was: 7h)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7980) External environment with containerized worker pool

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7980?focusedWorklogId=300487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300487
 ]

ASF GitHub Bot logged work on BEAM-7980:


Author: ASF GitHub Bot
Created on: 23/Aug/19 20:22
Start Date: 23/Aug/19 20:22
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9398: [BEAM-7980] 
Exactly once artifact retrieval for Python SDK worker pool
URL: https://github.com/apache/beam/pull/9398#discussion_r31726
 
 

 ##
 File path: sdks/python/container/boot.go
 ##
 @@ -105,18 +107,57 @@ func main() {
 
// (2) Retrieve and install the staged packages.
 
-   dir := filepath.Join(*semiPersistDir, "staged")
+   func() {
 
-   files, err := artifact.Materialize(ctx, *artifactEndpoint, 
info.GetRetrievalToken(), dir)
-   if err != nil {
-   log.Fatalf("Failed to retrieve staged files: %v", err)
-   }
+installCompleteFile := filepath.Join(os.TempDir(), 
"beam.install.complete")
 
-   // TODO(herohde): the packages to install should be specified 
explicitly. It
-   // would also be possible to install the SDK in the Dockerfile.
-   if setupErr := installSetupPackages(files, dir); setupErr != nil {
-   log.Fatalf("Failed to install required packages: %v", setupErr)
-   }
+   // skip if install already complete
+   _, err = os.Stat(installCompleteFile)
+   if err == nil {
+   return
+   }
+
+   // lock to guard from concurrent artifact retrieval and 
installation,
+   // when called by child processes in a worker pool
+   lock, err := lockfile.New(filepath.Join(os.TempDir(), 
"beam.install.lck"))
+   if err != nil {
+   log.Fatalf("Cannot init artifact retrieval lock: %v", 
err)
+   }
+
+   for err = lock.TryLock(); err != nil; err = lock.TryLock() {
+   switch err {
+   case lockfile.ErrBusy, lockfile.ErrNotExist:
+   time.Sleep(5 * time.Second)
+   log.Printf("Worker %v waiting for artifact 
retrieval lock: %v", *id, lock)
+   default:
+   log.Fatalf("Worker %v could not obtain artifact 
retrieval lock: %v", *id, err)
+   }
+   }
+   defer lock.Unlock()
+
+   // skip if install already complete
+   _, err = os.Stat(installCompleteFile)
+   if err == nil {
+   return
 
 Review comment:
   This is within the lock section. defer ... above executes after the 
completion of this function.
   
   I think we should block all workers as intended here, because workers will 
need the environment to be set up first.
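The structure under review is double-checked locking around a one-time install. A Python sketch of the same pattern (a hypothetical helper, not boot.go itself): a fast-path check of the completion marker, then a re-check under the lock before installing.

```python
import os

def ensure_installed(marker_path, lock, install):
    """Run `install` exactly once across workers: fast-path check of a
    completion marker, then re-check it under the lock before installing."""
    if os.path.exists(marker_path):       # fast path, no lock taken
        return
    with lock:                            # blocks peers during install
        if os.path.exists(marker_path):   # re-check: a peer may have finished
            return
        install()
        open(marker_path, 'w').close()    # publish completion
```

The second check mirrors the review thread: workers that lose the race still block on the lock until the environment is set up, then skip the install.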
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300487)
Time Spent: 7h 20m  (was: 7h 10m)

> External environment with containerized worker pool
> ---
>
> Key: BEAM-7980
> URL: https://issues.apache.org/jira/browse/BEAM-7980
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Augment Beam Python docker image and boot.go so that it can be used to launch 
> BeamFnExternalWorkerPoolServicer.
> [https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.lnhm75dhvhi0]
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8045) Publish custom windows pattern

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8045?focusedWorklogId=300489&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300489
 ]

ASF GitHub Bot logged work on BEAM-8045:


Author: ASF GitHub Bot
Created on: 23/Aug/19 20:26
Start Date: 23/Aug/19 20:26
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9406: [BEAM-8045] 
Custom windows patterns
URL: https://github.com/apache/beam/pull/9406#discussion_r317289450
 
 

 ##
 File path: website/src/documentation/patterns/custom-window.md
 ##
 @@ -0,0 +1,112 @@
+---
+layout: section
+title: "Custom window patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/custom-window/
+---
+
+
+# Custom window patterns
+The samples on this page demonstrate common custom window patterns. You can 
create custom windows with [`WindowFn` functions]({{ site.baseurl 
}}/documentation/programming-guide/#windowing#provided-windowing-functions). 
For more information, see the [programming guide section on windowing]({{ 
site.baseurl }}/documentation/programming-guide/#windowing).
+
+**Note**: Merging windows isn't supported in Python.
 
 Review comment:
   _custom_ merging windows isn't supported in Python.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300489)
Remaining Estimate: 0h
Time Spent: 10m

> Publish custom windows pattern
> --
>
> Key: BEAM-8045
> URL: https://issues.apache.org/jira/browse/BEAM-8045
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7980) External environment with containerized worker pool

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7980?focusedWorklogId=300500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300500
 ]

ASF GitHub Bot logged work on BEAM-7980:


Author: ASF GitHub Bot
Created on: 23/Aug/19 20:46
Start Date: 23/Aug/19 20:46
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9398: [BEAM-7980] 
Exactly once artifact retrieval for Python SDK worker pool
URL: https://github.com/apache/beam/pull/9398#discussion_r317295950
 
 

 ##
 File path: sdks/python/container/boot.go
 ##
 @@ -105,18 +107,57 @@ func main() {
 
// (2) Retrieve and install the staged packages.
 
-   dir := filepath.Join(*semiPersistDir, "staged")
+   func() {

-   files, err := artifact.Materialize(ctx, *artifactEndpoint, info.GetRetrievalToken(), dir)
-   if err != nil {
-           log.Fatalf("Failed to retrieve staged files: %v", err)
-   }
+       installCompleteFile := filepath.Join(os.TempDir(), "beam.install.complete")

-   // TODO(herohde): the packages to install should be specified explicitly. It
-   // would also be possible to install the SDK in the Dockerfile.
-   if setupErr := installSetupPackages(files, dir); setupErr != nil {
-           log.Fatalf("Failed to install required packages: %v", setupErr)
-   }
+       // skip if install already complete
+       _, err = os.Stat(installCompleteFile)
+       if err == nil {
+           return

 Review comment:
   There is no need to enter the locked section when the install was already 
completed (hence the comment above).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300500)
Time Spent: 7.5h  (was: 7h 20m)

> External environment with containerized worker pool
> ---
>
> Key: BEAM-7980
> URL: https://issues.apache.org/jira/browse/BEAM-7980
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Augment Beam Python docker image and boot.go so that it can be used to launch 
> BeamFnExternalWorkerPoolServicer.
> [https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.lnhm75dhvhi0]
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform

2019-08-23 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914549#comment-16914549
 ] 

Rui Wang commented on BEAM-6114:


You are right. Supporting all join types for Seekable Join Bounded would require 
enhancements, especially to the seekable table interface (returning all rows is, 
however, not what that interface is designed for).

I think your original idea is practical: only support OUTER JOIN when the outer 
side is the non-seekable side, and throw an exception otherwise.
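
One possible reading of that restriction, sketched as a validation check (the 
join-type names and the seekable flags here are assumptions for illustration, not 
Beam SQL's actual representation):

```python
def check_seekable_join(join_type, left_seekable, right_seekable):
    """Reject join shapes the seekable-join path can't serve (sketch)."""
    if join_type == 'LEFT_OUTER' and right_seekable and not left_seekable:
        return True   # preserved side streams; seekable side is looked up
    if join_type == 'RIGHT_OUTER' and left_seekable and not right_seekable:
        return True
    if join_type == 'INNER' and left_seekable != right_seekable:
        return True   # either side may serve as the lookup table
    raise ValueError('unsupported seekable join: %s' % join_type)
```

Under this sketch, a FULL OUTER join (or an outer join whose preserved side is 
the seekable one) raises rather than silently producing incomplete results.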


  

> SQL join selection should be done in planner, not in expansion to PTransform
> 
>
> Key: BEAM-6114
> URL: https://issues.apache.org/jira/browse/BEAM-6114
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rahul Patwari
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently Beam SQL joins all go through a single physical operator which has 
> a single PTransform that does all join algorithms based on properties of its 
> input PCollections as well as the relational algebra.
> A first step is to make the needed information part of the relational 
> algebra, so it can choose a PTransform based on that, and the PTransforms can 
> be simpler.
> Second step is to have separate (physical) relational operators for different 
> join algorithms.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=300444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300444
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 23/Aug/19 19:16
Start Date: 23/Aug/19 19:16
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #8915: [DO NOT MERGE] 
[BEAM-4046] Remove old project name mappings.
URL: https://github.com/apache/beam/pull/8915#issuecomment-524431626
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300444)
Time Spent: 41.5h  (was: 41h 20m)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 41.5h
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (BEAM-8083) Support INT32 type

2019-08-23 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914569#comment-16914569
 ] 

Rui Wang edited comment on BEAM-8083 at 8/23/19 7:48 PM:
-

Hi Gleb,

The missing support of INT32 is because of 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L100.

If you set the mode to ProductMode.PRODUCT_INTERNAL, INT32 support in the analyzer 
will come back. PRODUCT_EXTERNAL actually disables the INT32 type in the analyzer 
directly. You can think of the external mode as a subset of the internal mode that 
doesn't sacrifice much: INT64 can represent INT32, etc. 

However, I don't remember to what extent we support INT32 in Beam ZetaSQL. 

If on your side you have to use INT32, it would make sense to add control of the 
mode in BeamSqlPipelineOptions so we can enable or disable it on demand. Supporting 
INT32 should be very straightforward, or may already partially exist in Beam ZetaSQL.
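
The "INT64 can represent INT32" point is just lossless widening — every int32 
value fits in an int64. A minimal sketch of that check (names are illustrative):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def widen_int32_to_int64(v):
    """Validate an int32 value and widen it to int64 (lossless; sketch)."""
    if not (INT32_MIN <= v <= INT32_MAX):
        raise ValueError('not an int32 value: %d' % v)
    return v  # every int32 value is already a valid int64 value
```

The reverse direction (narrowing int64 to int32) is what would lose information, 
which is why the external mode can get by with INT64 alone.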




was (Author: amaliujia):
Hi Gleb,

Missing support of INT32 it is because 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L100.

If you set the mode to ProductMode.PRODUCT_INTERNAL,  INT32 support will back. 
PPRODUCT_EXTERNAL actually disables INT32 type in analyzer directly. You can 
think of the external mode is a subset of internal mode without sacrificing too 
much: INT64 can represent INT32 , etc. 

If on your side you have to use INT32, it would make sense to add the control 
of mode in BeamSqlPipelineOption so we can open/close it on demand. Supporting 
int32 will be very straightforward or already partially exists.



> Support INT32 type
> --
>
> Key: BEAM-8083
> URL: https://issues.apache.org/jira/browse/BEAM-8083
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Priority: Critical
>
> Test to reproduce
> {code}
> @Test
>   public void selectInt32Parameter() {
> String sql = "SELECT @p0";
> ImmutableMap<String, Value> params = ImmutableMap.of("p0", 
> Value.createInt32Value(123));
> ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
> BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql, 
> params);
> PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, 
> beamRelNode);
> final Schema schema = Schema.builder().addInt32Field("field1").build();
> 
> PAssert.that(stream).containsInAnyOrder(Row.withSchema(schema).addValues(123).build());
> 
> pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
>   }
> {code}
> {code}
> Parameter p0 has unsupported type: INT32
> org.apache.beam.repackaged.sql.com.google.zetasql.SqlException: Parameter p0 
> has unsupported type: INT32
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.Analyzer.analyzeStatement(Analyzer.java:57)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.lambda$addBuiltinFunctionsToCatalog$1(SqlAnalyzer.java:149)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:157)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:80)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecZetaSQLTest.selectInt32Parameter(ZetaSQLDialectSpecZetaSQLTest.java:3204)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>

[jira] [Updated] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-08-23 Thread Ning Kang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-7926:

Description: 
Support auto plotting / charting of materialized data of a given PCollection 
with Interactive Beam.

Say an Interactive Beam pipeline defined as

p = create_pipeline()

pcoll = p | 'Transform' >> transform()

The user can call a single function and get auto-magical charting of the data as 
the materialized pcoll.

e.g., visualize(pcoll)

  was:
Support auto plotting / charting of materialized data of a given PCollection 
with interactive Beam.

Say an iBeam pipeline defined as

p = ibeam.create_pipeline()

pcoll = p | 'Transform' >> transform()

The use can call a single function and get auto-magical charting of the data as 
materialized pcoll.

e.g., ibeam.visualize(pcoll)


> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-python
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as the materialized pcoll.
> e.g., visualize(pcoll)
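
A minimal sketch of what such a single entry point might do — dispatch on the 
shape of the materialized data. The chart names and heuristics below are 
assumptions for illustration, not Interactive Beam's actual behavior:

```python
def choose_chart(rows):
    """Pick a chart type from materialized PCollection contents (sketch)."""
    if rows and all(isinstance(r, (int, float)) for r in rows):
        return 'histogram'   # plain numeric values
    if rows and all(isinstance(r, tuple) and len(r) == 2 for r in rows):
        return 'bar'         # key/value pairs
    return 'table'           # fall back to a raw table view
```

A `visualize(pcoll)` built on this would materialize the PCollection, run the 
dispatch, and hand the rows to the matching plotting routine.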



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (BEAM-7988) Error should include runner name when runner is invalid

2019-08-23 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-7988.
---
Fix Version/s: 2.16.0
   Resolution: Fixed

> Error should include runner name when runner is invalid
> ---
>
> Key: BEAM-7988
> URL: https://issues.apache.org/jira/browse/BEAM-7988
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Trivial
> Fix For: 2.16.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> "TypeError: Runner must be a PipelineRunner object or the name of a 
> registered runner." would be a more helpful error message if it included the 
> runner that was found to be incorrect.
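
A sketch of the improved error, with the offending value included in the message. 
The resolver and registry below are stand-ins, not the SDK's actual code:

```python
_REGISTERED_RUNNERS = {'DirectRunner': object}   # stand-in runner registry

def create_runner(runner):
    """Resolve a runner by name, naming the bad value in the error (sketch)."""
    if isinstance(runner, str) and runner in _REGISTERED_RUNNERS:
        return _REGISTERED_RUNNERS[runner]()
    raise TypeError(
        'Runner %r must be a PipelineRunner object or the name of a '
        'registered runner.' % (runner,))
```

With the `%r` in place, a typo like `create_runner('DirectRunnr')` tells the user 
exactly which value was rejected.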



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300494
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 20:32
Start Date: 23/Aug/19 20:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9188: [BEAM-7886] Make row 
coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524452566
 
 
  Can we just use SchemaCoder as the name? I agree that this does not matter much. 
Would changing this in the future be difficult?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300494)
Time Spent: 8h 40m  (was: 8.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-6783) byte[] breaks in BeamSQL codegen

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6783?focusedWorklogId=300461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300461
 ]

ASF GitHub Bot logged work on BEAM-6783:


Author: ASF GitHub Bot
Created on: 23/Aug/19 19:38
Start Date: 23/Aug/19 19:38
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9415: [BEAM-6783] 
Unignore ZetaSQLDialectSpecTestZetaSQL#testSelectBytes
URL: https://github.com/apache/beam/pull/9415
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300461)
Time Spent: 5h 10m  (was: 5h)

> byte[] breaks in BeamSQL codegen
> 
>
> Key: BEAM-6783
> URL: https://issues.apache.org/jira/browse/BEAM-6783
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Gleb Kanterov
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Calcite will call `byte[].toString` because BeamSQL codegen read byte[] from 
> Row to calcite (see: 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L334).
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8080?focusedWorklogId=300462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300462
 ]

ASF GitHub Bot logged work on BEAM-8080:


Author: ASF GitHub Bot
Created on: 23/Aug/19 19:39
Start Date: 23/Aug/19 19:39
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9414: [BEAM-8080] [SQL] 
Fix relocation of com.google.types
URL: https://github.com/apache/beam/pull/9414#issuecomment-524437861
 
 
   Run SQL PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300462)
Time Spent: 0.5h  (was: 20m)

> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
> ---
>
> Key: BEAM-8080
> URL: https://issues.apache.org/jira/browse/BEAM-8080
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Critical
> Fix For: 2.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380)
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8083) Support INT32 type

2019-08-23 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914569#comment-16914569
 ] 

Rui Wang commented on BEAM-8083:


Hi Gleb,

The missing support of INT32 is because of 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SqlAnalyzer.java#L100.

If you set the mode to ProductMode.PRODUCT_INTERNAL, INT32 support will come back. 
PRODUCT_EXTERNAL actually disables the INT32 type in the analyzer directly. You can 
think of the external mode as a subset of the internal mode that doesn't sacrifice 
much: INT64 can represent INT32, etc. 

If on your side you have to use INT32, it would make sense to add control of the 
mode in BeamSqlPipelineOptions so we can enable or disable it on demand. Supporting 
INT32 should be very straightforward, or may already partially exist.



> Support INT32 type
> --
>
> Key: BEAM-8083
> URL: https://issues.apache.org/jira/browse/BEAM-8083
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Priority: Critical
>
> Test to reproduce
> {code}
> @Test
>   public void selectInt32Parameter() {
> String sql = "SELECT @p0";
> ImmutableMap<String, Value> params = ImmutableMap.of("p0", 
> Value.createInt32Value(123));
> ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
> BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql, 
> params);
> PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, 
> beamRelNode);
> final Schema schema = Schema.builder().addInt32Field("field1").build();
> 
> PAssert.that(stream).containsInAnyOrder(Row.withSchema(schema).addValues(123).build());
> 
> pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
>   }
> {code}
> {code}
> Parameter p0 has unsupported type: INT32
> org.apache.beam.repackaged.sql.com.google.zetasql.SqlException: Parameter p0 
> has unsupported type: INT32
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.Analyzer.analyzeStatement(Analyzer.java:57)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.lambda$addBuiltinFunctionsToCatalog$1(SqlAnalyzer.java:149)
>   at 
> java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>   at 
> java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>   at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
>   at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:157)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:80)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecZetaSQLTest.selectInt32Parameter(ZetaSQLDialectSpecZetaSQLTest.java:3204)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>   at 
> 

[jira] [Assigned] (BEAM-8042) Parsing of aggregate query fails

2019-08-23 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang reassigned BEAM-8042:
--

Assignee: Rui Wang

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
>
> {code}
> SELECT
>   key,
>   COUNT(*) as f1,
>   SUM(has_f2) AS f2,
>   SUM(has_f3) AS f3,
>   SUM(has_f4) AS f4,
>   SUM(has_f5) AS f5,
>   SUM(has_f6) AS f6,
>   SUM(has_f7) AS f7
> FROM xxx
> GROUP BY key
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8080?focusedWorklogId=300467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300467
 ]

ASF GitHub Bot logged work on BEAM-8080:


Author: ASF GitHub Bot
Created on: 23/Aug/19 19:53
Start Date: 23/Aug/19 19:53
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9414: [BEAM-8080] 
[SQL] Fix relocation of com.google.types
URL: https://github.com/apache/beam/pull/9414#discussion_r317280131
 
 

 ##
 File path: sdks/java/extensions/sql/build.gradle
 ##
 @@ -49,6 +50,7 @@ applyJavaNature(
 relocate "com.google.logging", getJavaRelocatedPath("com.google.logging")
 relocate "com.google.longrunning", 
getJavaRelocatedPath("com.google.longrunning")
 relocate "com.google.rpc", getJavaRelocatedPath("com.google.rpc")
+relocate "com.google.api", getJavaRelocatedPath("com.google.api")
 
 Review comment:
  I deleted this line because it breaks the SQL postcommit.
   
   Honestly, I am not sure how this line fixes BEAM-8080. Could we have a minimal 
relocation (or a minimal relocation exclusion) that both fixes BEAM-8080 and passes 
the SQL postcommit?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300467)
Time Spent: 40m  (was: 0.5h)

> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
> ---
>
> Key: BEAM-8080
> URL: https://issues.apache.org/jira/browse/BEAM-8080
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Critical
> Fix For: 2.16.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380)
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8042) Parsing of aggregate query fails

2019-08-23 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-8042:
---
Status: Open  (was: Triage Needed)

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Critical
>
> {code}
> SELECT
>   key,
>   COUNT(*) as f1,
>   SUM(has_f2) AS f2,
>   SUM(has_f3) AS f3,
>   SUM(has_f4) AS f4,
>   SUM(has_f5) AS f5,
>   SUM(has_f6) AS f6,
>   SUM(has_f7) AS f7
> FROM xxx
> GROUP BY key
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}
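Reading the trace (an interpretation, not a verified diagnosis): the LogicalProject in the rel dump exposes seven fields (key, f1..f6, indices 0-6), while the LogicalAggregate still references input $7, so AggregateProjectMergeRule indexes past the end of the project's field list. A tiny Python sketch of that mismatch:

```python
# Sketch of the index mismatch suggested by the rel dump above.
# 'project_fields' mirrors LogicalProject(key=$0, f1=$1, ..., f6=$6);
# the aggregate's reference to $7 has no corresponding field.
project_fields = ["key", "f1", "f2", "f3", "f4", "f5", "f6"]  # indices 0..6

def resolve_input_ref(index):
    # AggregateProjectMergeRule rewrites aggregate input refs through the
    # project's field list; an out-of-range index fails, analogous to the
    # ArrayIndexOutOfBoundsException: 7 in the trace.
    return project_fields[index]

assert resolve_input_ref(6) == "f6"   # last valid reference
try:
    resolve_input_ref(7)              # the aggregate's $7
except IndexError as e:
    print("out of bounds:", e)
```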





[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails

2019-08-23 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914574#comment-16914574
 ] 

Rui Wang commented on BEAM-8042:


[~kanterov] I will take this one.

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
>
> {code}
> SELECT
>   key,
>   COUNT(*) as f1,
>   SUM(has_f2) AS f2,
>   SUM(has_f3) AS f3,
>   SUM(has_f4) AS f4,
>   SUM(has_f5) AS f5,
>   SUM(has_f6) AS f6,
>   SUM(has_f7) AS f7
> FROM xxx
> GROUP BY key
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}





[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=300485&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300485
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 23/Aug/19 20:18
Start Date: 23/Aug/19 20:18
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9388: [BEAM-7909] upgrade 
python lib versions to match to dataflow worker
URL: https://github.com/apache/beam/pull/9388#issuecomment-524448436
 
 
   Thank you!
   
   @Hannah-Jiang could you also file another JIRA issue to find a process for 
keeping this list in sync with setup.py?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300485)
Time Spent: 7h 20m  (was: 7h 10m)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-8084) List transitive dependencies

2019-08-23 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914589#comment-16914589
 ] 

Ahmet Altay commented on BEAM-8084:
---

cc: [~tvalentyn]

> List transitive dependencies
> 
>
> Key: BEAM-8084
> URL: https://issues.apache.org/jira/browse/BEAM-8084
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Priority: Major
>
> Beam has a list of dependencies (listed in setup.py). These are Beam's direct 
> dependencies. Each direct dependency in turn has its own list of 
> dependencies; these are Beam's transitive dependencies. Beam depends on them 
> indirectly and does not list them in setup.py.
> In a worker environment it is beneficial to lock the versions of all Beam 
> dependencies, including the indirect ones.
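For illustration only (the usual tools are `pip freeze` or `pipdeptree`; the function name below is mine, not Beam's), the transitive closure can be computed from installed package metadata using only the standard library:

```python
import re
from importlib import metadata

def transitive_deps(dist_name, seen=None):
    """Collect direct and transitive dependencies of an installed
    distribution by walking Requires-Dist metadata recursively."""
    seen = set() if seen is None else seen
    try:
        requires = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return seen  # distribution not installed; nothing to walk
    for req in requires:
        if ";" in req and "extra ==" in req:
            continue  # skip extras-only requirements
        name = re.match(r"[A-Za-z0-9._-]+", req).group(0)
        if name not in seen:
            seen.add(name)
            transitive_deps(name, seen)  # recurse into the dependency
    return seen

# Every package in this set would need its version pinned in a worker
# environment, not just the direct dependencies listed in setup.py.
print(sorted(transitive_deps("pip")))
```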





[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=300493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300493
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 23/Aug/19 20:31
Start Date: 23/Aug/19 20:31
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move 
release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411#issuecomment-524452373
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 300493)
Time Spent: 40m  (was: 0.5h)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after the release branch is cut. 
> Basically it does two things: 1. verify the Gradle build with -PisRelease turned 
> on; 2. create a PR and run all PostCommit jobs against the release branch. 
> However, the release manager hit many pain points when running this script:
> 1. A lot of environment setup is required, and some tooling installs easily 
> broke the script.
> 2. Running the Gradle build locally takes an extremely long time.
> 3. Automatic PR creation (using hub) doesn't work.
> We can move the Gradle build to Jenkins in order to get rid of the environment 
> setup work.





[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=300631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300631
 ]

ASF GitHub Bot logged work on BEAM-5878:


Author: ASF GitHub Bot
Created on: 24/Aug/19 00:09
Start Date: 24/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: lazylynx commented on issue #9237: [BEAM-5878] support 
DoFns with Keyword-only arguments
URL: https://github.com/apache/beam/pull/9237#issuecomment-524497249
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 300631)
Time Spent: 8h 20m  (was: 8h 10m)

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python 3.0, inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new method 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add Py3-only unit tests that cover DoFns with keyword-only 
> arguments once Beam Python 3 tests are in good shape.
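A minimal illustration (hypothetical DoFn-like signature, not Beam code) of why `inspect.getfullargspec` is needed: keyword-only arguments show up in `kwonlyargs`, which the older `getargspec` could not report:

```python
import inspect

def process(element, *, timestamp=None, window=None):
    # 'timestamp' and 'window' are keyword-only (PEP 3102): they can
    # only be passed by name, never positionally.
    return element

spec = inspect.getfullargspec(process)
assert spec.args == ["element"]
assert spec.kwonlyargs == ["timestamp", "window"]
assert spec.kwonlydefaults == {"timestamp": None, "window": None}
```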





[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300636
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 24/Aug/19 00:32
Start Date: 24/Aug/19 00:32
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524499580
 
 
   @reuvenlax I put up a separate PR for this at #9424, since it ended up being 
a pretty large diff.
   
   I removed `RowSize` and `RowSizeTest` since they don't seem to be used, and 
that let me remove some `RowCoder` functionality; the rest of the 
functionality (assigning IDs to schemas, `CODER_MAP`, `coderForFieldType`, 
etc.) is now in `SchemaCoder`.
   
   `RowCoder` is still there but just as a trivial sub-class of 
`SchemaCoder`.
 



Issue Time Tracking
---

Worklog Id: (was: 300636)
Time Spent: 9h 50m  (was: 9h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300635
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 24/Aug/19 00:32
Start Date: 24/Aug/19 00:32
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524499580
 
 
   @reuvenlax I put up a separate PR for this at #9424, since it ended up being 
a pretty large diff.
   
   I removed `RowSize` and `RowSizeTest` since they don't seem to be used, and 
that let me remove some `RowCoder` functionality; the rest of the 
functionality (assigning IDs to schemas, `CODER_MAP`, `coderForFieldType`, 
etc.) is now in `SchemaCoder`.
   
   `RowCoder` is still there but just as a trivial sub-class of 
`SchemaCoder`.
 



Issue Time Tracking
---

Worklog Id: (was: 300635)
Time Spent: 9h 40m  (was: 9.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=300638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300638
 ]

ASF GitHub Bot logged work on BEAM-7995:


Author: ASF GitHub Bot
Created on: 24/Aug/19 00:59
Start Date: 24/Aug/19 00:59
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9364: BEAM-7995: python 
PGBKCVOperation using wrong timestamp
URL: https://github.com/apache/beam/pull/9364#issuecomment-524502458
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 300638)
Time Spent: 1.5h  (was: 1h 20m)

> IllegalStateException: TimestampCombiner moved element from to earlier time 
> in Python
> -
>
> Key: BEAM-7995
> URL: https://issues.apache.org/jira/browse/BEAM-7995
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I'm looking into a bug I found internally when using Beam portable API 
> (Python) on our own Samza runner. 
>  
> The pipeline looks something like this:
>  
>     (p
>      | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
>      | 'transform' >> beam.Map(lambda event: process_event(event))
>      | 'window' >> beam.WindowInto(FixedWindows(15))
>      | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())*
>      ...
>  
> The problem comes from the combiners which cause the following exception on 
> Java side:
>  
> Caused by: java.lang.IllegalStateException: TimestampCombiner moved element 
> from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z 
> for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z)
>     at 
> org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
>     at 
> org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136)
>  
> The exception happens here 
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116]
>  when we check the shifted timestamp to ensure it's before the timestamp.
>  
>     if (shifted.isBefore(timestamp)) {
>       throw new IllegalStateException(
>           String.format(
>               "TimestampCombiner moved element from %s to earlier time %s for 
> window %s",
>               BoundedWindow.formatTimestamp(timestamp),
>               BoundedWindow.formatTimestamp(shifted),
>               window));
>     }
>  
> As you can see from the exception, the "shifted" is "XXX 44.999" while the 
> "timestamp" is "XXX 45.000". The "44.999" is coming from 
> [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]:
>  
>     @Override
>     public Instant merge(BoundedWindow intoWindow, Iterable<? extends Instant> mergingTimestamps) {
>       return intoWindow.maxTimestamp();
>     }
>  
> where intoWindow.maxTimestamp() is:
>  
>   /** Returns the largest timestamp that can be included in this window. */
>   @Override
>   public Instant maxTimestamp() {
>     *// end not inclusive*
>     return *end.minus(1)*;
>   }
>  
> Hence, the "44.*999*". 
>  
> And the "45.000" comes from the Python side when the combiner output results 
> as pre GBK operation: 
> [operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py#L889]
>  
>     if windows is 0:
>       self.output(_globally_windowed_value.with_value((key, value)))
>     else:
>       self.output(WindowedValue((key, value), *windows[0].end*, windows))
>  
> Here when we generate the window value, the timestamp is assigned to the 
> closed interval end (45.000) as opposed to open interval end (44.999)
>  
> Clearly the "end of window" definition is a bit inconsistent across Python 
> and Java. I'm yet to try this on 
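The 1 ms discrepancy described above can be modeled in a few lines (a simplified stand-in for Beam's window classes, not the actual implementation):

```python
class IntervalWindow:
    """Simplified stand-in for Beam's interval window: [start, end),
    with timestamps in milliseconds. Not the real implementation."""

    def __init__(self, start_ms, end_ms):
        self.start = start_ms
        self.end = end_ms  # exclusive bound

    def max_timestamp(self):
        # Largest timestamp that can be *included* in the window: the
        # exclusive end minus one millisecond, mirroring Java's
        # IntervalWindow.maxTimestamp() (end.minus(1)).
        return self.end - 1

w = IntervalWindow(30_000, 45_000)  # [..:34:30.000, ..:34:45.000)
java_hold = w.max_timestamp()       # END_OF_WINDOW combiner: ...44.999
python_output_ts = w.end            # PGBKCVOperation used windows[0].end: ...45.000
# The shifted hold (44.999) lands before the element timestamp (45.000),
# which is exactly the IllegalStateException quoted in the report above.
assert python_output_ts - java_hold == 1
```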

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300639
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 24/Aug/19 01:02
Start Date: 24/Aug/19 01:02
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r317338268
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python  Schema
+np.int8 <-> BYTE
+np.int16<-> INT16
+np.int32<-> INT32
+np.int64<-> INT64
+int ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float   ---/
+bool<-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes   <-> BYTES
+ByteString  ---/
+
+py2:
+unicode <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (unicode, schema_pb2.AtomicType.STRING),
+    (bool, schema_pb2.AtomicType.BOOLEAN),
+    (bytes if sys.version_info.major >= 3 else ByteString,
+     schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+    # In python 3, this is a no-op because str == unicode,
+    # but in python 2 it overrides the bytes -> BYTES mapping.
+    str: schema_pb2.AtomicType.STRING,
 
 Review comment:
   I think that makes sense. It doesn't seem too onerous to ask people to use 
unicode in python 2. And that's a good point that it's a backwards compatible 
change if we find out otherwise. I pushed a commit that does this: 
https://github.com/apache/beam/pull/9188/commits/ecaf73cd8eb82739c21f0a3551efa0d03cc842b6
 


[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=300628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300628
 ]

ASF GitHub Bot logged work on BEAM-5878:


Author: ASF GitHub Bot
Created on: 24/Aug/19 00:00
Start Date: 24/Aug/19 00:00
Worklog Time Spent: 10m 
  Work Description: lazylynx commented on issue #9237: [BEAM-5878] support 
DoFns with Keyword-only arguments
URL: https://github.com/apache/beam/pull/9237#issuecomment-524495973
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 300628)
Time Spent: 8h 10m  (was: 8h)

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python 3.0, inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new method 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add Py3-only unit tests that cover DoFns with keyword-only 
> arguments once Beam Python 3 tests are in good shape.





[jira] [Work logged] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7859?focusedWorklogId=300630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300630
 ]

ASF GitHub Bot logged work on BEAM-7859:


Author: ASF GitHub Bot
Created on: 24/Aug/19 00:03
Start Date: 24/Aug/19 00:03
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9421: [BEAM-7859] set SDK 
worker parallelism to 1 in word count test
URL: https://github.com/apache/beam/pull/9421#issuecomment-524496433
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 300630)
Time Spent: 40m  (was: 0.5h)

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The error was observed during Beam 2.14.0 release validation, see: 
> https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831
> Looks like master currently fails with a different error, both in Loopback 
> and Docker modes.
> [~ibzib] [~altay] [~robertwb] [~angoenka]





[jira] [Work logged] (BEAM-6858) Support side inputs injected into a DoFn

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6858?focusedWorklogId=300508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300508
 ]

ASF GitHub Bot logged work on BEAM-6858:


Author: ASF GitHub Bot
Created on: 23/Aug/19 20:57
Start Date: 23/Aug/19 20:57
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #9275: [BEAM-6858] Support 
side inputs injected into a DoFn
URL: https://github.com/apache/beam/pull/9275#issuecomment-524459684
 
 
   Run Dataflow ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 300508)
Time Spent: 8.5h  (was: 8h 20m)

> Support side inputs injected into a DoFn
> 
>
> Key: BEAM-6858
> URL: https://issues.apache.org/jira/browse/BEAM-6858
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Beam currently supports injecting main inputs into a DoFn process method. A 
> user can write the following:
> @ProcessElement public void process(@Element InputT element)
> And Beam will (using ByteBuddy code generation) inject the input element into 
> the process method.
> We would like to also support the same for side inputs. For example:
> @ProcessElement public void process(@Element InputT element, 
> @SideInput("tag1") String input1, @SideInput("tag2") Integer input2) 
> This requires the existing process-method analysis framework to capture these 
> side inputs. The ParDo code would have to verify the type of the side input 
> and include them in the list of side inputs. This would also eliminate the 
> need for the user to explicitly call withSideInputs on the ParDo.
>  





[jira] [Work logged] (BEAM-7980) External environment with containerized worker pool

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7980?focusedWorklogId=300511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300511
 ]

ASF GitHub Bot logged work on BEAM-7980:


Author: ASF GitHub Bot
Created on: 23/Aug/19 21:07
Start Date: 23/Aug/19 21:07
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #9398: [BEAM-7980] Exactly 
once artifact retrieval for Python SDK worker pool
URL: https://github.com/apache/beam/pull/9398#issuecomment-524462613
 
 
   FYI I'm gone until Sept 3rd; please don't block unnecessarily on further 
feedback from me.
 



Issue Time Tracking
---

Worklog Id: (was: 300511)
Time Spent: 7h 40m  (was: 7.5h)

> External environment with containerized worker pool
> ---
>
> Key: BEAM-7980
> URL: https://issues.apache.org/jira/browse/BEAM-7980
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Augment Beam Python docker image and boot.go so that it can be used to launch 
> BeamFnExternalWorkerPoolServicer.
> [https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.lnhm75dhvhi0]
>  





[jira] [Commented] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-08-23 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914638#comment-16914638
 ] 

Kyle Weaver commented on BEAM-7859:
---

The error happens because work is happening in different Docker containers, 
which do not have access to each other's file systems. We need all the work to 
happen in a single container for this to be fixed. This requires:
 # BEAM-7600 is merged (allows reuse of the SDK harness)
 # The option --sdk_worker_parallelism=1 needs to be set, directly (which I 
prefer, because tracking down defaults is too confusing – BEAM-7657) and/or by 
modifying the Spark runner to default to 1, as Flink does [1]. (Right now, we 
are instead using n_cores-1.)

[1] 
[https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobInvoker.java#L73-L75]
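
As a sketch of what setting the option explicitly amounts to (argparse stands 
in for Beam's real PipelineOptions parsing here; only the flag name is taken 
from the comment above, the rest is illustrative):

```python
import argparse

# Hypothetical stand-in for Beam's pipeline-option parsing; the real flag
# is handled by PipelineOptions. This only illustrates the explicit-vs-default
# distinction discussed above.
parser = argparse.ArgumentParser()
parser.add_argument("--sdk_worker_parallelism", type=int, default=0)

# Pinning parallelism to 1 keeps all work in a single SDK worker container,
# so all stages share one filesystem.
opts = parser.parse_args(["--sdk_worker_parallelism=1"])
print(opts.sdk_worker_parallelism)  # 1
```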

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>
> The error was observed during Beam 2.14.0 release validation, see: 
> https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831
> Looks like master currently fails with a different error, both in Loopback 
> and Docker modes.
> [~ibzib] [~altay] [~robertwb] [~angoenka]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7859?focusedWorklogId=300528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300528
 ]

ASF GitHub Bot logged work on BEAM-7859:


Author: ASF GitHub Bot
Created on: 23/Aug/19 21:39
Start Date: 23/Aug/19 21:39
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9421: [BEAM-7859] set 
SDK worker parallelism to 1 in word count test
URL: https://github.com/apache/beam/pull/9421
 
 
   This does not affect the Flink runner because of its default behavior:
   
   
https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkJobInvoker.java#L73-L75
   
   However, I would like to make this explicit, because work must happen in a 
shared filesystem for this job to succeed.
   
   R: @angoenka 
   

[jira] [Work logged] (BEAM-8086) Add isRetraction() to WindowedValue

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8086?focusedWorklogId=300539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300539
 ]

ASF GitHub Bot logged work on BEAM-8086:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:05
Start Date: 23/Aug/19 22:05
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9422: [BEAM-8086] add 
isRetraction() to WindowedValue.
URL: https://github.com/apache/beam/pull/9422#issuecomment-524476898
 
 
   An open question is whether the default return value should be `false` 
rather than throwing an exception. This could be left for future clarification.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300539)
Time Spent: 20m  (was: 10m)

> Add isRetraction() to WindowedValue
> ---
>
> Key: BEAM-8086
> URL: https://issues.apache.org/jira/browse/BEAM-8086
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7711) Support DATETIME as a logical type in BeamSQL

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7711?focusedWorklogId=300540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300540
 ]

ASF GitHub Bot logged work on BEAM-7711:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:06
Start Date: 23/Aug/19 22:06
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #8994: [BEAM-7711] Add 
DATETIME as a logical type in BeamSQL
URL: https://github.com/apache/beam/pull/8994#issuecomment-524477076
 
 
   There is still an open question about type mapping. I will close this PR 
since I am not working on it, so other people can pick it up.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300540)
Time Spent: 3h 20m  (was: 3h 10m)

> Support DATETIME as a logical type in BeamSQL
> -
>
> Key: BEAM-7711
> URL: https://issues.apache.org/jira/browse/BEAM-7711
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> DATETIME as a type represents a year, month, day, hour, minute, second, and 
> subsecond (millis).
> It ranges from 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7711) Support DATETIME as a logical type in BeamSQL

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7711?focusedWorklogId=300541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300541
 ]

ASF GitHub Bot logged work on BEAM-7711:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:06
Start Date: 23/Aug/19 22:06
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #8994: [BEAM-7711] 
Add DATETIME as a logical type in BeamSQL
URL: https://github.com/apache/beam/pull/8994
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300541)
Time Spent: 3.5h  (was: 3h 20m)

> Support DATETIME as a logical type in BeamSQL
> -
>
> Key: BEAM-7711
> URL: https://issues.apache.org/jira/browse/BEAM-7711
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> DATETIME as a type represents a year, month, day, hour, minute, second, and 
> subsecond (millis).
> It ranges from 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300542
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:13
Start Date: 23/Aug/19 22:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r317318110
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python  Schema
+np.int8 <-> BYTE
+np.int16<-> INT16
+np.int32<-> INT32
+np.int64<-> INT64
+int ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float   ---/
+bool<-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes   <-> BYTES
+ByteString  ---/
+
+py2:
+unicode <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (unicode, schema_pb2.AtomicType.STRING),
+    (bool, schema_pb2.AtomicType.BOOLEAN),
+    (bytes if sys.version_info.major >= 3 else ByteString,
+     schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+    # In python 3, this is a no-op because str == unicode,
+    # but in python 2 it overrides the bytes -> BYTES mapping.
+    str: schema_pb2.AtomicType.STRING,
 
 Review comment:
   Thinking about this some more, what about just rejecting str in Python 2 
(forcing the user to say unicode)? We can loosen things if this becomes too 
cumbersome in the future (but going the other way is backwards incompatible). 
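
To illustrate the lookup direction under discussion, a minimal sketch with 
plain string tags standing in for the schema_pb2 enum values; the rejection 
branch mirrors the reviewer's suggestion and is an assumption, not merged 
behavior:

```python
import sys

# Stand-in for the PRIMITIVE_TO_ATOMIC_TYPE mapping in the diff above,
# with string tags replacing schema_pb2 enum values for illustration.
PRIMITIVE_TO_ATOMIC_TYPE = {
    int: "INT64",
    float: "DOUBLE",
    bool: "BOOLEAN",
    bytes: "BYTES",
    str: "STRING",
}

def atomic_type_for(typ):
    # Rejecting `str` on Python 2, as suggested, would force users to
    # write `unicode` explicitly for STRING fields.
    if sys.version_info.major < 3 and typ is str:
        raise TypeError("use unicode for STRING fields on Python 2")
    return PRIMITIVE_TO_ATOMIC_TYPE[typ]

print(atomic_type_for(str))  # "STRING" on Python 3
```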
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking

[jira] [Work logged] (BEAM-8014) Use OffsetRange as restriction for OffsetRestrictionTracker

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8014?focusedWorklogId=300547=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300547
 ]

ASF GitHub Bot logged work on BEAM-8014:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:22
Start Date: 23/Aug/19 22:22
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9376: [BEAM-8014] 
Using OffsetRange as restriction for OffsetRestrictionTracker
URL: https://github.com/apache/beam/pull/9376#discussion_r317319705
 
 

 ##
 File path: sdks/python/apache_beam/io/restriction_trackers.py
 ##
 @@ -159,7 +162,8 @@ def try_split(self, fraction_of_remainder):
         cur + int(max(1, (self._range.stop - cur) * fraction_of_remainder)))
     if split_point < self._range.stop:
       prev_stop, self._range.stop = self._range.stop, split_point
-      return (self._range.start, split_point), (split_point, prev_stop)
+      return (OffsetRange(self._range.start, split_point),
+              OffsetRange(split_point, prev_stop))
 
 Review comment:
   I wonder if a `OffsetRange.split(split_pos)` method would make sense here? 
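
A minimal sketch of the suggested helper; OffsetRange here is a toy stand-in 
rather than the real apache_beam.io.restriction_trackers class, and split() 
is the hypothetical addition:

```python
class OffsetRange(object):
    """Toy stand-in for Beam's OffsetRange; split() sketches the
    reviewer's suggestion and is not part of the real class."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop  # exclusive

    def split(self, split_pos):
        # Return the primary and residual sub-ranges at split_pos.
        assert self.start < split_pos < self.stop
        return (OffsetRange(self.start, split_pos),
                OffsetRange(split_pos, self.stop))

primary, residual = OffsetRange(0, 100).split(40)
print(primary.start, primary.stop, residual.start, residual.stop)  # 0 40 40 100
```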
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300547)
Time Spent: 20m  (was: 10m)

> Use OffsetRange as restriction for OffsetRestrictionTracker
> ---
>
> Key: BEAM-8014
> URL: https://issues.apache.org/jira/browse/BEAM-8014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=300552=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300552
 ]

ASF GitHub Bot logged work on BEAM-7995:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:29
Start Date: 23/Aug/19 22:29
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9364: BEAM-7995: 
python PGBKCVOperation using wrong timestamp
URL: https://github.com/apache/beam/pull/9364#discussion_r317320944
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operations.py
 ##
 @@ -886,7 +886,7 @@ def output_key(self, wkey, accumulator):
     if windows is 0:
       self.output(_globally_windowed_value.with_value((key, value)))
     else:
-      self.output(WindowedValue((key, value), windows[0].end, windows))
+      self.output(WindowedValue((key, value), windows[0].max_timestamp(), windows))
 
 Review comment:
   This change certainly fixes a bug (though I agree the docs could and should 
be cleaned up). 
   
   We should file a JIRA that we're not supporting more general timestamp 
combining fns here. 
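
The distinction the fix relies on can be illustrated with a toy window model 
(floats stand in for Beam's millisecond Timestamps, and this IntervalWindow 
is not the real apache_beam.transforms.window class):

```python
class IntervalWindow(object):
    """Toy model of a fixed window with an exclusive end timestamp."""

    def __init__(self, start, end):
        self.start = start
        self.end = end  # exclusive bound

    def max_timestamp(self):
        # Largest timestamp that can be *included* in the window:
        # one unit (1 ms in Beam) before the exclusive end.
        return self.end - 0.001

w = IntervalWindow(30.0, 45.0)
# Emitting at w.end (45.000) lands outside the window and trips the
# WatermarkHold check; w.max_timestamp() (44.999) stays inside it.
print(w.max_timestamp() < w.end)  # True
```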
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300552)
Time Spent: 1h  (was: 50m)

> IllegalStateException: TimestampCombiner moved element from to earlier time 
> in Python
> -
>
> Key: BEAM-7995
> URL: https://issues.apache.org/jira/browse/BEAM-7995
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I'm looking into a bug I found internally when using Beam portable API 
> (Python) on our own Samza runner. 
>  
> The pipeline looks something like this:
>  
>     (p
>      | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
>      | 'transform' >> beam.Map(lambda event: process_event(event))
>      | 'window' >> beam.WindowInto(FixedWindows(15))
>      | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())*
>      ...
>  
> The problem comes from the combiners which cause the following exception on 
> Java side:
>  
> Caused by: java.lang.IllegalStateException: TimestampCombiner moved element 
> from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z 
> for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z)
>     at 
> org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
>     at 
> org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136)
>  
> The exception happens here 
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116]
>  when we check the shifted timestamp to ensure it's before the timestamp.
>  
>     if (shifted.isBefore(timestamp)) {
>       throw new IllegalStateException(
>           String.format(
>               "TimestampCombiner moved element from %s to earlier time %s for 
> window %s",
>               BoundedWindow.formatTimestamp(timestamp),
>               BoundedWindow.formatTimestamp(shifted),
>               window));
>     }
>  
> As you can see from the exception, the "shifted" is "XXX 44.999" while the 
> "timestamp" is "XXX 45.000". The "44.999" is coming from 
> [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]:
>  
>     @Override
>     public Instant merge(BoundedWindow intoWindow, Iterable<? extends Instant> mergingTimestamps) {
>       return intoWindow.maxTimestamp();
>     }
>  
> where intoWindow.maxTimestamp() is:
>  
>   /** Returns the largest timestamp that can be included in this window. */
>   @Override
>   public Instant maxTimestamp() {
>     *// end not inclusive*
>     return *end.minus(1)*;
>   }
>  
> Hence, the "44.*999*". 
>  
> And the "45.000" comes from the Python side when the combiner output results 
> as pre GBK 

[jira] [Commented] (BEAM-7998) MatchesFiles or MatchAll seems to return the same element several times

2019-08-23 Thread Jerome MASSOT (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914688#comment-16914688
 ] 

Jerome MASSOT commented on BEAM-7998:
-

Hi Pablo,

Additional information: MatchFiles() and ReadMatches() return the same file 
several times, each time with a different metadata.path attached.

I know this because I hash the returned metadata.path to use it as a primary 
key in a SQL database, and I cannot remove the duplicates because their 
primary keys differ.

Given Python's hash() implementation, the only possible cause of different 
hashes is a different metadata.path.

Good luck

Best regards

Jerome

> MatchesFiles or MatchAll seems to return the same element several times
> --
>
> Key: BEAM-7998
> URL: https://issues.apache.org/jira/browse/BEAM-7998
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-files
>Affects Versions: 2.14.0
> Environment: GCP for storage, DirectRunner and DataflowRunner both 
> have the problem. PyCharm on Win10 for IDE and dev environment.
>Reporter: Jerome MASSOT
>Assignee: Pablo Estrada
>Priority: Major
>
> Hi team,
> when I use MatchFiles with a wildcard on files located in a GCP bucket, the 
> MatchFiles transform returns the same file several times (at least twice).
> I have tried to follow the stack, and I can see that MatchAll is called 
> twice when I run the pipeline on a debug project where a single element is 
> present in the bucket.
> But I cannot say more than that, sorry.
> Best regards
> Jerome



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300606
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:08
Start Date: 23/Aug/19 23:08
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524488668
 
 
   Let me see if I can put together a patch that does that. I think I could 
just get rid of the current RowCoder and move its logic to SchemaCoder and 
RowCoderGenerator.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300606)
Time Spent: 9.5h  (was: 9h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=300605=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300605
 ]

ASF GitHub Bot logged work on BEAM-7739:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:08
Start Date: 23/Aug/19 23:08
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9067: [BEAM-7739] 
Implement ReadModifyWriteState in Python SDK
URL: https://github.com/apache/beam/pull/9067#issuecomment-524488554
 
 
   Ping?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300605)
Time Spent: 2h 40m  (was: 2.5h)

> Add ValueState in Python sdk
> 
>
> Key: BEAM-7739
> URL: https://issues.apache.org/jira/browse/BEAM-7739
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently ValueState is missing from Python Sdks but it is existing in Java 
> sdks. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=300610=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300610
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:11
Start Date: 23/Aug/19 23:11
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #9351: [BEAM-7909] support 
customized container for Python
URL: https://github.com/apache/beam/pull/9351#issuecomment-524489235
 
 
   LGTM. Thanks, Hannah.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300610)
Time Spent: 7.5h  (was: 7h 20m)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8087) PGBKCVOperation is undocumented

2019-08-23 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8087:
-

 Summary: PGBKCVOperation is undocumented
 Key: BEAM-8087
 URL: https://issues.apache.org/jira/browse/BEAM-8087
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Kyle Weaver


I can't tell at a glance exactly what PGBKCVOperation is doing, or even what it 
stands for. A docstring would be appreciated.

[https://github.com/apache/beam/blob/19d9f28094b6c0d621380736cd3980d46a40214b/sdks/python/apache_beam/runners/worker/operations.py#L814]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=300613=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300613
 ]

ASF GitHub Bot logged work on BEAM-7995:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:23
Start Date: 23/Aug/19 23:23
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9364: BEAM-7995: python 
PGBKCVOperation using wrong timestamp
URL: https://github.com/apache/beam/pull/9364#issuecomment-524491076
 
 
   This is only tangential to this PR, but while we're here, can someone tell 
me what "PGBKCV" stands for? :sweat_smile: I'd like to see this documented. 
https://issues.apache.org/jira/browse/BEAM-8087
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300613)
Time Spent: 1h 10m  (was: 1h)

> IllegalStateException: TimestampCombiner moved element from to earlier time 
> in Python
> -
>
> Key: BEAM-7995
> URL: https://issues.apache.org/jira/browse/BEAM-7995
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I'm looking into a bug I found internally when using Beam portable API 
> (Python) on our own Samza runner. 
>  
> The pipeline looks something like this:
>  
>     (p
>      | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
>      | 'transform' >> beam.Map(lambda event: process_event(event))
>      | 'window' >> beam.WindowInto(FixedWindows(15))
>      | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())*
>      ...
>  
> The problem comes from the combiners which cause the following exception on 
> Java side:
>  
> Caused by: java.lang.IllegalStateException: TimestampCombiner moved element 
> from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z 
> for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z)
>     at 
> org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
>     at 
> org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136)
>  
> The exception happens here 
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116]
>  when we check the shifted timestamp to ensure it's before the timestamp.
>  
>     if (shifted.isBefore(timestamp)) {
>       throw new IllegalStateException(
>           String.format(
>               "TimestampCombiner moved element from %s to earlier time %s for 
> window %s",
>               BoundedWindow.formatTimestamp(timestamp),
>               BoundedWindow.formatTimestamp(shifted),
>               window));
>     }
>  
> As you can see from the exception, the "shifted" is "XXX 44.999" while the 
> "timestamp" is "XXX 45.000". The "44.999" is coming from 
> [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]:
>  
>     @Override
>     public Instant merge(BoundedWindow intoWindow, Iterable<? extends Instant> mergingTimestamps) {
>       return intoWindow.maxTimestamp();
>     }
>  
> where intoWindow.maxTimestamp() is:
>  
>   /** Returns the largest timestamp that can be included in this window. */
>   @Override
>   public Instant maxTimestamp() {
>     *// end not inclusive*
>     return *end.minus(1)*;
>   }
>  
> Hence, the "44.*999*". 
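The one-millisecond mismatch described above can be reproduced with plain integers. This is a hedged sketch: `IntervalWindow` and `end_of_window_shift` are simplified stand-ins for the Beam classes, with timestamps as milliseconds since the epoch.

```python
class IntervalWindow(object):
    """Simplified stand-in for Beam's IntervalWindow (times in ms)."""
    def __init__(self, start, end):
        self.start = start
        self.end = end  # exclusive

    def max_timestamp(self):
        # Mirrors BoundedWindow.maxTimestamp(): end is not inclusive.
        return self.end - 1


def end_of_window_shift(window, timestamp):
    """Mirrors TimestampCombiner.END_OF_WINDOW: shift to maxTimestamp()."""
    return window.max_timestamp()


# Window [03:34:30.000, 03:34:45.000), as in the exception message.
window = IntervalWindow(1565840070000, 1565840085000)

# The Python side stamps the combined output with window.end (..45.000)...
timestamp = window.end
# ...but the Java WatermarkHold shifts it to maxTimestamp() (..44.999).
shifted = end_of_window_shift(window, timestamp)

# WatermarkHold.shift() then throws, because shifted < timestamp.
assert shifted < timestamp
assert timestamp - shifted == 1  # exactly one millisecond earlier
```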
>  
> And the "45.000" comes from the Python side, when the combiner outputs its 
> results as a pre-GBK operation: 
> [operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py#L889]
>  
>     if windows is 0:
>       self.output(_globally_windowed_value.with_value((key, value)))
>     else:
>       self.output(WindowedValue((key, value), *windows[0].end*, windows))
>  
> Here when we generate the window value, the timestamp is assigned to the 
> closed 

[jira] [Resolved] (BEAM-7642) (Dataflow) Python AfterProcessingTime fires before defined time

2019-08-23 Thread Yichi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang resolved BEAM-7642.
---
Resolution: Fixed

> (Dataflow) Python AfterProcessingTime fires before defined time
> ---
>
> Key: BEAM-7642
> URL: https://issues.apache.org/jira/browse/BEAM-7642
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Mikhail Gryzykhin
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [https://stackoverflow.com/questions/56700913/why-my-data-in-apache-beam-is-emitted-after-a-few-minutes-instead-of-10h]
>  
> A user on StackOverflow reports that AfterProcessingTime on the global window 
> fires before the allotted time (10h). It emits events within the first 
> minutes/seconds and then drops the rest because the window is closed.
> The code the user provided seems valid. I tried to verify the time units the 
> method accepts, but I couldn't track it all the way through the code.
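A minimal sketch of the kind of unit discrepancy involved (hypothetical helper, not the actual SDK code): `AfterProcessingTime`'s user-facing delay is written in one unit, and any layer that expects another needs a consistent conversion. Firing after seconds instead of hours is exactly the symptom of mixing the two.

```python
def delay_to_millis(delay_seconds):
    # The conversion that must be applied consistently everywhere
    # (illustrative; not the Beam SDK's actual conversion path).
    return int(delay_seconds * 1000)

ten_hours_s = 10 * 60 * 60
assert delay_to_millis(ten_hours_s) == 36000000

# If a layer forgets the conversion and reads the seconds value as
# milliseconds, the effective delay collapses to 36 seconds:
misread_as_ms = ten_hours_s  # 36000 "milliseconds"
assert misread_as_ms / 1000.0 == 36.0  # fires after 36 s, not 10 h
```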



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7600) Spark portable runner: reuse SDK harness

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7600?focusedWorklogId=300514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300514
 ]

ASF GitHub Bot logged work on BEAM-7600:


Author: ASF GitHub Bot
Created on: 23/Aug/19 21:24
Start Date: 23/Aug/19 21:24
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9095: [BEAM-7600] borrow SDK 
harness management code into Spark runner
URL: https://github.com/apache/beam/pull/9095#issuecomment-524467042
 
 
   Alright, I've finally ironed out the problems with this one. However, we 
should merge #9410 first for best results.
 



Issue Time Tracking
---

Worklog Id: (was: 300514)
Time Spent: 5.5h  (was: 5h 20m)

> Spark portable runner: reuse SDK harness
> 
>
> Key: BEAM-7600
> URL: https://issues.apache.org/jira/browse/BEAM-7600
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Right now, we're creating a new SDK harness every time an executable stage is 
> run [1], which is expensive. We should be able to re-use code from the Flink 
> runner to re-use the SDK harness [2].
>  
> [1] 
> [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L135]
> [2] 
> [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDefaultExecutableStageContext.java]
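The reuse idea can be sketched as a per-environment cache (illustrative Python; the real change reuses the Flink runner's Java executable-stage-context machinery referenced above):

```python
import functools

@functools.lru_cache(maxsize=None)
def get_stage_context(environment_id):
    # Stand-in for the expensive SDK-harness/environment setup that
    # currently happens once per executable stage.
    return {'environment': environment_id}

# Two stages with the same environment share one harness context.
a = get_stage_context('docker:python3')
b = get_stage_context('docker:python3')
assert a is b
```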





[jira] [Updated] (BEAM-8086) Add retraction flag to windowed value

2019-08-23 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-8086:
---
Status: Open  (was: Triage Needed)

> Add retraction flag to windowed value
> -
>
> Key: BEAM-8086
> URL: https://issues.apache.org/jira/browse/BEAM-8086
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>






[jira] [Created] (BEAM-8086) Add retraction flag to windowed value

2019-08-23 Thread Rui Wang (Jira)
Rui Wang created BEAM-8086:
--

 Summary: Add retraction flag to windowed value
 Key: BEAM-8086
 URL: https://issues.apache.org/jira/browse/BEAM-8086
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-java-core
Reporter: Rui Wang
Assignee: Rui Wang








[jira] [Work logged] (BEAM-8086) Add isRetraction() to WindowedValue

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8086?focusedWorklogId=300537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300537
 ]

ASF GitHub Bot logged work on BEAM-8086:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:01
Start Date: 23/Aug/19 22:01
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9422: [BEAM-8086] 
add isRetraciton() to WindowedValue.
URL: https://github.com/apache/beam/pull/9422
 
 
   We need a way to mark a WindowedValue as a retraction. The current idea is to 
separate the sub-classes of WindowedValue with and without the retraction flag 
to reduce the impact of enabling retractions. This is mostly because I plan to 
use a Boolean as the retraction flag, so each instance of WindowedValue would 
carry 8 bytes of overhead if we didn't have such a separation.
   
   In the future, as the retraction implementation work progresses, I will 
probably add more sub-classes of WindowedValue that implement isRetraction(), 
and those will be used for retracting mode.
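The separation described above can be sketched like this (a Python analogue of the Java design; in Java the saving is the absent boolean field on the common case):

```python
class WindowedValue(object):
    # Base class: no per-instance retraction field, so ordinary values
    # pay no memory cost for the feature.
    def is_retraction(self):
        return False


class RetractionWindowedValue(WindowedValue):
    # Only retraction values use this subclass; the "flag" is encoded
    # in the type rather than stored on every instance.
    def is_retraction(self):
        return True


assert not WindowedValue().is_retraction()
assert RetractionWindowedValue().is_retraction()
```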
   
   
   
[jira] [Updated] (BEAM-8015) Get logs for SDK worker Docker containers

2019-08-23 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8015:
--
Summary: Get logs for SDK worker Docker containers  (was: Get logs for 
Docker containers that fail to start up)

> Get logs for SDK worker Docker containers
> -
>
> Key: BEAM-8015
> URL: https://issues.apache.org/jira/browse/BEAM-8015
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, when SDK worker containers fail to start up properly, an exception 
> is thrown that provides no information about what happened. We can improve 
> debugging by keeping containers around long enough to log their logs before 
> removing them.
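A hedged sketch of that improvement (illustrative; the actual change lives in Beam's Java Docker environment code): fetch the container's logs before force-removing it, with the command runner injected so the flow stays self-contained and testable.

```python
def remove_container_with_logs(container_id, run, log):
    """Capture a container's logs before removing it.

    `run` executes a command list and returns its output; `log` records
    a line. Both are injected here instead of shelling out to a real
    Docker daemon.
    """
    log(run(['docker', 'logs', container_id]))
    run(['docker', 'rm', '-f', container_id])


# Example with a fake runner instead of a real Docker daemon:
calls = []
fake_run = lambda cmd: calls.append(cmd) or 'container stderr/stdout'
logged = []
remove_container_with_logs('abc123', fake_run, logged.append)
assert calls == [['docker', 'logs', 'abc123'],
                 ['docker', 'rm', '-f', 'abc123']]
assert logged == ['container stderr/stdout']
```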





[jira] [Work logged] (BEAM-8086) Add isRetraction() to WindowedValue

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8086?focusedWorklogId=300554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300554
 ]

ASF GitHub Bot logged work on BEAM-8086:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:32
Start Date: 23/Aug/19 22:32
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9422: [BEAM-8086] 
add isRetraciton() to WindowedValue.
URL: https://github.com/apache/beam/pull/9422#discussion_r317321495
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/util/WindowedValue.java
 ##
 @@ -141,6 +141,11 @@ public boolean isSingleWindowedValue() {
 return false;
   }
 
+  /** Returns {@code true} if this WindowedValue is a retraction. */
+  public boolean isRetraction() {
 
 Review comment:
   Should we add `@Experimental`? How do we decide which pieces of the SDK are user-facing?
 



Issue Time Tracking
---

Worklog Id: (was: 300554)
Time Spent: 40m  (was: 0.5h)

> Add isRetraction() to WindowedValue
> ---
>
> Key: BEAM-8086
> URL: https://issues.apache.org/jira/browse/BEAM-8086
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8086) Add isRetraction() to WindowedValue

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8086?focusedWorklogId=300553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300553
 ]

ASF GitHub Bot logged work on BEAM-8086:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:31
Start Date: 23/Aug/19 22:31
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9422: [BEAM-8086] add 
isRetraciton() to WindowedValue.
URL: https://github.com/apache/beam/pull/9422#issuecomment-524476898
 
 
   An open question is whether the default return value should be `false` rather 
than throwing an exception. That decision could be left for the future.
 



Issue Time Tracking
---

Worklog Id: (was: 300553)
Time Spent: 0.5h  (was: 20m)

> Add isRetraction() to WindowedValue
> ---
>
> Key: BEAM-8086
> URL: https://issues.apache.org/jira/browse/BEAM-8086
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8086) Add isRetraction() to WindowedValue

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8086?focusedWorklogId=300580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300580
 ]

ASF GitHub Bot logged work on BEAM-8086:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:38
Start Date: 23/Aug/19 22:38
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9422: [BEAM-8086] add 
isRetraciton() to WindowedValue.
URL: https://github.com/apache/beam/pull/9422#issuecomment-524483429
 
 
   @lukecwik Thanks for the idea. It reminds me that I actually got this 
suggestion in #9199 as well.
   
   Let me compare PaneInfo with WindowedValue, and I will summarize the pros and 
cons for further discussion. 
 



Issue Time Tracking
---

Worklog Id: (was: 300580)
Time Spent: 1h  (was: 50m)

> Add isRetraction() to WindowedValue
> ---
>
> Key: BEAM-8086
> URL: https://issues.apache.org/jira/browse/BEAM-8086
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8014) Use OffsetRange as restriction for OffsetRestrictionTracker

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8014?focusedWorklogId=300588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300588
 ]

ASF GitHub Bot logged work on BEAM-8014:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:44
Start Date: 23/Aug/19 22:44
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #9376: [BEAM-8014] 
Using OffsetRange as restriction for OffsetRestrictionTracker
URL: https://github.com/apache/beam/pull/9376#discussion_r317323491
 
 

 ##
 File path: sdks/python/apache_beam/testing/synthetic_pipeline.py
 ##
 @@ -520,12 +520,12 @@ def process(
   element,
   restriction_tracker=beam.DoFn.RestrictionParam(
   SyntheticSDFSourceRestrictionProvider())):
-for k in range(*restriction_tracker.current_restriction()):
-  if not restriction_tracker.try_claim(k):
-return
-  r = np.random.RandomState(k)
+cur = restriction_tracker.start_position()
+while restriction_tracker.try_claim(cur):
 
 Review comment:
   Yep, I can change back to for loop. 
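The two claiming styles under discussion are equivalent for a dense offset range. A toy tracker (illustrative; not the real `OffsetRestrictionTracker`, which also enforces monotonic claims) shows both:

```python
class OffsetRange(object):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop  # exclusive


class SimpleOffsetTracker(object):
    """Toy restriction tracker: claims positions inside the range."""
    def __init__(self, offset_range):
        self._range = offset_range

    def current_restriction(self):
        return self._range

    def try_claim(self, position):
        return self._range.start <= position < self._range.stop


# while-loop style: start at the range's start position and advance.
tracker = SimpleOffsetTracker(OffsetRange(0, 3))
claimed_while = []
cur = tracker.current_restriction().start
while tracker.try_claim(cur):
    claimed_while.append(cur)
    cur += 1

# for-loop style over the same restriction claims the same positions.
r = tracker.current_restriction()
claimed_for = [k for k in range(r.start, r.stop) if tracker.try_claim(k)]

assert claimed_while == claimed_for == [0, 1, 2]
```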
 



Issue Time Tracking
---

Worklog Id: (was: 300588)
Time Spent: 1h  (was: 50m)

> Use OffsetRange as restriction for OffsetRestrictionTracker
> ---
>
> Key: BEAM-8014
> URL: https://issues.apache.org/jira/browse/BEAM-8014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed pcollection

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=300589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300589
 ]

ASF GitHub Bot logged work on BEAM-7972:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:44
Start Date: 23/Aug/19 22:44
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9334: [BEAM-7972] Always 
use Global window in reshuffle and then apply wind…
URL: https://github.com/apache/beam/pull/9334#issuecomment-524484393
 
 
   OK, yuck, that JRH code is really bad (and possibly buggy). The windowing 
function need not be interpreted by the runner; it should just note that it's 
non-merging and pass things through. 
 



Issue Time Tracking
---

Worklog Id: (was: 300589)
Time Spent: 2h 20m  (was: 2h 10m)

> Portable Python Reshuffle does not work with windowed pcollection
> -
>
> Key: BEAM-7972
> URL: https://issues.apache.org/jira/browse/BEAM-7972
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> A streaming pipeline gets stuck when using Reshuffle with a windowed PCollection.
> The issue happens because the window function gets deserialized on the Java side, 
> which is not possible, so it defaults to the global window function, resulting 
> in a window-function mismatch later in the code.
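The fix direction in the PR title — shuffle under the global window, then re-apply the original windowing — can be sketched abstractly (illustrative only; the real transform manipulates Beam's WindowingStrategy rather than tuples):

```python
def reshuffle_windowed(elements, shuffle):
    """Sketch: carry each element's window through an opaque shuffle.

    `elements` is a list of (value, window) pairs; `shuffle` is any
    reordering function that need not understand windows.
    """
    # Pair each value with its window so the shuffle treats windows as
    # plain data (the "global window" trick), then restore the pairs.
    shuffled = shuffle([(v, w) for v, w in elements])
    return [(v, w) for v, w in shuffled]

data = [('a', 'w1'), ('b', 'w2')]
out = reshuffle_windowed(data, lambda xs: list(reversed(xs)))
assert sorted(out) == sorted(data)  # windows survive the shuffle
```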





[jira] [Work logged] (BEAM-7973) Python doesn't shut down Flink job server properly

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7973?focusedWorklogId=300591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300591
 ]

ASF GitHub Bot logged work on BEAM-7973:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:46
Start Date: 23/Aug/19 22:46
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9340: [BEAM-7973] py: shut 
down Flink job server automatically
URL: https://github.com/apache/beam/pull/9340#issuecomment-524484739
 
 
   I'm fine with exit. If the user doesn't want the job server hogging 
resources, they can just not call `wait_until_finish`, right?
 



Issue Time Tracking
---

Worklog Id: (was: 300591)
Time Spent: 40m  (was: 0.5h)

> Python doesn't shut down Flink job server properly
> --
>
> Key: BEAM-7973
> URL: https://issues.apache.org/jira/browse/BEAM-7973
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Using the new Python FlinkRunner [1], a new job server is created and the job 
> succeeds, but the job server is seemingly not shut down properly when the 
> Python command exits. Specifically, the java -jar command that started the job 
> server is still running in the background, eating up memory.
> Relevant args:
> python ...
>  --runner FlinkRunner \ 
>  --flink_job_server_jar $FLINK_JOB_SERVER_JAR ...
> [1] [https://github.com/apache/beam/pull/9043]
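One common pattern for fixing this (a sketch of the general technique, not necessarily what the PR implements) is to register the child process for cleanup when the Python interpreter exits:

```python
import atexit
import subprocess

def launch_job_server(cmd):
    """Start the job server and ensure it is reaped at interpreter exit.

    `cmd` would be something like ['java', '-jar', flink_job_server_jar];
    registering terminate() with atexit prevents a lingering `java -jar`
    process after the Python command returns.
    """
    proc = subprocess.Popen(cmd)
    atexit.register(proc.terminate)
    return proc
```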





[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300594
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:02
Start Date: 23/Aug/19 23:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r317326390
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python  Schema
+np.int8 <-> BYTE
+np.int16<-> INT16
+np.int32<-> INT32
+np.int64<-> INT64
+int ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float   ---/
+bool<-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes   <-> BYTES
+ByteString  ---/
+
+py2:
+unicode <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+self.by_id = {}
+self.by_typing = {}
+
+  def add(self, typing, schema):
+self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+result = self.by_id.get(unique_id, None)
+return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+result = self.by_id.get(unique_id, None)
+return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+(np.int8, schema_pb2.AtomicType.BYTE),
+(np.int16, schema_pb2.AtomicType.INT16),
+(np.int32, schema_pb2.AtomicType.INT32),
+(np.int64, schema_pb2.AtomicType.INT64),
+(np.float32, schema_pb2.AtomicType.FLOAT),
+(np.float64, schema_pb2.AtomicType.DOUBLE),
+(unicode, schema_pb2.AtomicType.STRING),
+(bool, schema_pb2.AtomicType.BOOLEAN),
+(bytes if sys.version_info.major >= 3 else ByteString,
+ schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+# In python 3, this is a no-op because str == unicode,
+# but in python 2 it overrides the bytes -> BYTES mapping.
+str: schema_pb2.AtomicType.STRING,
 
 Review comment:
   Do we really need to support python2? If this is going to be a burden in 
general, I would rather not add support for it.
 



Issue Time Tracking
---

Worklog Id: (was: 300594)
Time Spent: 9h 10m  (was: 9h)

> Make row coder a 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300604
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:07
Start Date: 23/Aug/19 23:07
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r317327150
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python  Schema
+np.int8 <-> BYTE
+np.int16<-> INT16
+np.int32<-> INT32
+np.int64<-> INT64
+int ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float   ---/
+bool<-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes   <-> BYTES
+ByteString  ---/
+
+py2:
+unicode <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+self.by_id = {}
+self.by_typing = {}
+
+  def add(self, typing, schema):
+self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+result = self.by_id.get(unique_id, None)
+return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+result = self.by_id.get(unique_id, None)
+return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+(np.int8, schema_pb2.AtomicType.BYTE),
+(np.int16, schema_pb2.AtomicType.INT16),
+(np.int32, schema_pb2.AtomicType.INT32),
+(np.int64, schema_pb2.AtomicType.INT64),
+(np.float32, schema_pb2.AtomicType.FLOAT),
+(np.float64, schema_pb2.AtomicType.DOUBLE),
+(unicode, schema_pb2.AtomicType.STRING),
+(bool, schema_pb2.AtomicType.BOOLEAN),
+(bytes if sys.version_info.major >= 3 else ByteString,
+ schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+# In python 3, this is a no-op because str == unicode,
+# but in python 2 it overrides the bytes -> BYTES mapping.
+str: schema_pb2.AtomicType.STRING,
 
 Review comment:
   I think it'll be harder to consistently exclude support for Python 2 for all 
schema use. 
 



Issue Time Tracking
---

Worklog Id: (was: 300604)
Time Spent: 9h 20m  (was: 9h 10m)

> Make row coder a standard coder and implement 

[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=300611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300611
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:14
Start Date: 23/Aug/19 23:14
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on issue #9351: [BEAM-7909] support 
customized container for Python
URL: https://github.com/apache/beam/pull/9351#issuecomment-524489675
 
 
   Also, Python portable failed on streaming wordcount; please fix.
 



Issue Time Tracking
---

Worklog Id: (was: 300611)
Time Spent: 7h 40m  (was: 7.5h)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (BEAM-7804) Fix unclear python programming guide document

2019-08-23 Thread Yichi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang resolved BEAM-7804.
---
Fix Version/s: 2.16.0
   Resolution: Fixed

> Fix unclear python programming guide document
> -
>
> Key: BEAM-7804
> URL: https://issues.apache.org/jira/browse/BEAM-7804
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [https://beam.apache.org/documentation/programming-guide/#additional-outputs],
>  section 4.5
> The last two code snippets don't provide enough context when switching the 
> language from Java to Python.





[jira] [Work logged] (BEAM-7995) IllegalStateException: TimestampCombiner moved element from to earlier time in Python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7995?focusedWorklogId=300621=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300621
 ]

ASF GitHub Bot logged work on BEAM-7995:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:40
Start Date: 23/Aug/19 23:40
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9364: BEAM-7995: python 
PGBKCVOperation using wrong timestamp
URL: https://github.com/apache/beam/pull/9364#issuecomment-524491076
 
 
   This is only tangential to this PR, but while we're here, can someone tell 
me what "PGBKCV" stands for? :sweat_smile: I'd like to see this documented. 
https://issues.apache.org/jira/browse/BEAM-8087
 



Issue Time Tracking
---

Worklog Id: (was: 300621)
Time Spent: 1h 20m  (was: 1h 10m)

> IllegalStateException: TimestampCombiner moved element from to earlier time 
> in Python
> -
>
> Key: BEAM-7995
> URL: https://issues.apache.org/jira/browse/BEAM-7995
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I'm looking into a bug I found internally when using Beam portable API 
> (Python) on our own Samza runner. 
>  
> The pipeline looks something like this:
>  
>     (p
>      | 'read' >> ReadFromKafka(cluster="tracking", topic="PageViewEvent")
>      | 'transform' >> beam.Map(lambda event: process_event(event))
>      | 'window' >> beam.WindowInto(FixedWindows(15))
>      | 'group' >> *beam.CombinePerKey(beam.combiners.CountCombineFn())*
>      ...
>  
> The problem comes from the combiners which cause the following exception on 
> Java side:
>  
> Caused by: java.lang.IllegalStateException: TimestampCombiner moved element 
> from 2019-08-15T03:34:*45.000*Z to earlier time 2019-08-15T03:34:*44.999*Z 
> for window [2019-08-15T03:34:30.000Z..2019-08-15T03:34:*45.000*Z)
>     at 
> org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154)
>     at 
> org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605)
>     at 
> org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349)
>     at 
> org.apache.beam.runners.core.GroupAlsoByWindowViaWindowSetNewDoFn.processElement(GroupAlsoByWindowViaWindowSetNewDoFn.java:136)
>  
> The exception happens here 
> [https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/WatermarkHold.java#L116]
>  when we check the shifted timestamp to ensure it's before the timestamp.
>  
>     if (shifted.isBefore(timestamp)) {
>       throw new IllegalStateException(
>           String.format(
>               "TimestampCombiner moved element from %s to earlier time %s for 
> window %s",
>               BoundedWindow.formatTimestamp(timestamp),
>               BoundedWindow.formatTimestamp(shifted),
>               window));
>     }
>  
> As you can see from the exception, the "shifted" is "XXX 44.999" while the 
> "timestamp" is "XXX 45.000". The "44.999" is coming from 
> [TimestampCombiner.END_OF_WINDOW|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/TimestampCombiner.java#L116]:
>  
>     @Override
>     public Instant merge(BoundedWindow intoWindow, Iterable<? extends Instant> mergingTimestamps) {
>       return intoWindow.maxTimestamp();
>     }
>  
> where intoWindow.maxTimestamp() is:
>  
>   /** Returns the largest timestamp that can be included in this window. */
>   @Override
>   public Instant maxTimestamp() {
>     *// end not inclusive*
>     return *end.minus(1)*;
>   }
>  
> Hence, the "44.*999*". 
>  
> And the "45.000" comes from the Python side when the combiner output results 
> as pre GBK operation: 
> [operations.py#PGBKCVOperation#output_key|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py#L889]
>  
>     if windows is 0:
>       self.output(_globally_windowed_value.with_value((key, value)))
>     else:
>       self.output(WindowedValue((key, value), *windows[0].end*, windows))
>  
> Here when we generate the window value, the timestamp is assigned to the 
> 
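The 44.999 vs. 45.000 mismatch comes down to Beam's end-exclusive window convention. A minimal sketch (toy classes, not Beam's actual API) of the arithmetic:

```python
# Toy model of an end-exclusive interval window: the window
# [03:34:30.000, 03:34:45.000) can hold timestamps up to 44.999 only.
class IntervalWindow:
    def __init__(self, start_ms, end_ms):
        self.start_ms = start_ms
        self.end_ms = end_ms  # exclusive bound

    def max_timestamp_ms(self):
        # Largest timestamp that can be included in this window.
        return self.end_ms - 1

w = IntervalWindow(30_000, 45_000)
element_ts = w.end_ms                            # 45_000, as emitted by the Python side
shifted = min(element_ts, w.max_timestamp_ms())  # END_OF_WINDOW combiner: 44_999
# shifted < element_ts is exactly the "moved element to earlier time"
# condition the Java WatermarkHold check rejects.
```

So an element stamped with `window.end` sits one millisecond past everything the window can hold, and shifting it to the window's maximum timestamp trips the consistency check.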

[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=300519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300519
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 23/Aug/19 21:30
Start Date: 23/Aug/19 21:30
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move 
release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411#issuecomment-524468593
 
 
   Run Release Gradle Build
 



Issue Time Tracking
---

Worklog Id: (was: 300519)
Time Spent: 50m  (was: 40m)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after the release branch is cut. 
> Basically it does two things: 1. verify the Gradle build with -PisRelease turned 
> on; 2. create a PR and run all PostCommit jobs against the release branch. 
> However, release managers hit several pain points when running this script:
> 1. The extensive environment setup and tooling installs easily break the 
> script.
> 2. Running the Gradle build locally takes an extremely long time.
> 3. Automatic PR creation (using hub) doesn't work.
> We can move the Gradle build to Jenkins to get rid of the environment setup 
> work.





[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300523
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 21:34
Start Date: 23/Aug/19 21:34
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524469704
 
 
   This is a subclass of schema coder.
   
   Ideally we should get rid of the current row coder (make it a utility
   class) and call this RowCoder.
   
   On Fri, Aug 23, 2019, 1:32 PM Ahmet Altay  wrote:
   
   > Can we just use SchemaCoder as the name? I agree that this does not matter
   > much. Would changing this in the future be difficult?
   >
   
 



Issue Time Tracking
---

Worklog Id: (was: 300523)
Time Spent: 8h 50m  (was: 8h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-8086) Add isRetraction() to WindowedValue

2019-08-23 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-8086:
---
Summary: Add isRetraction() to WindowedValue  (was: Add retraction flag to 
windowed value)

> Add isRetraction() to WindowedValue
> ---
>
> Key: BEAM-8086
> URL: https://issues.apache.org/jira/browse/BEAM-8086
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>






[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=300543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300543
 ]

ASF GitHub Bot logged work on BEAM-7616:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:20
Start Date: 23/Aug/19 22:20
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9401: [BEAM-7616] apitools use 
urllib with the global timeout. Set it to 60 seconds # to prevent network 
related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524479936
 
 
   > I looked at the postcommit tests, but I do not see the new log message. Not 
sure why. Save main session is not used in all post commits, so there is 
probably another reason. Feel free to take this over and try the other idea of 
applying it in init.
   
   I think that INFO logs are neither printed nor saved to xunit XML output if 
the test is successful. Ways around this:
   1. Add "--nologcapture" to `basicTestOpts` and increase the logging level 
to warning or error.
   1. Use `print()`
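A minimal sketch of why raising the level helps: a logger thresholded at WARNING drops INFO records entirely, so a message meant to survive passing-test output has to be emitted at WARNING or above (or via `print()`, which bypasses logging altogether). The logger name below is illustrative:

```python
import logging

# Logger thresholded at WARNING: INFO records are filtered before any
# handler (or nose's capture) ever sees them.
logger = logging.getLogger("timeout.demo")
logger.setLevel(logging.WARNING)

emitted_info = logger.isEnabledFor(logging.INFO)     # filtered out
emitted_warn = logger.isEnabledFor(logging.WARNING)  # passes the threshold
```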
 



Issue Time Tracking
---

Worklog Id: (was: 300543)
Time Spent: 4h  (was: 3h 50m)

> urlopen calls could get stuck without a timeout
> ---
>
> Key: BEAM-7616
> URL: https://issues.apache.org/jira/browse/BEAM-7616
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.14.0, 2.16.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8014) Use OffsetRange as restriction for OffsetRestrictionTracker

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8014?focusedWorklogId=300550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300550
 ]

ASF GitHub Bot logged work on BEAM-8014:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:26
Start Date: 23/Aug/19 22:26
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9376: [BEAM-8014] 
Using OffsetRange as restriction for OffsetRestrictionTracker
URL: https://github.com/apache/beam/pull/9376#discussion_r317320247
 
 

 ##
 File path: sdks/python/apache_beam/testing/synthetic_pipeline.py
 ##
 @@ -520,12 +520,12 @@ def process(
   element,
   restriction_tracker=beam.DoFn.RestrictionParam(
   SyntheticSDFSourceRestrictionProvider())):
-for k in range(*restriction_tracker.current_restriction()):
-  if not restriction_tracker.try_claim(k):
-return
-  r = np.random.RandomState(k)
+cur = restriction_tracker.start_position()
+while restriction_tracker.try_claim(cur):
 
 Review comment:
   I think a for loop would still be more natural here. 
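To illustrate the suggestion, here is a hedged sketch (with a toy tracker, not Beam's actual RestrictionTracker API) contrasting the while/try_claim style in the diff with the for-loop style the reviewer prefers; both claim the same positions:

```python
# Toy stand-in that tracks claims over the half-open range [start, stop).
class FakeTracker:
    def __init__(self, start, stop):
        self._start, self._stop = start, stop

    def current_restriction(self):
        return (self._start, self._stop)

    def try_claim(self, pos):
        # A real tracker also enforces monotonically increasing claims
        # and can shrink the range under a concurrent split.
        return pos < self._stop

def process_while(tracker):
    """while/try_claim style: advance a cursor until a claim fails."""
    out = []
    cur = tracker.current_restriction()[0]
    while tracker.try_claim(cur):
        out.append(cur)
        cur += 1
    return out

def process_for(tracker):
    """for-loop style: iterate the known range, bail out on a failed claim."""
    out = []
    for k in range(*tracker.current_restriction()):
        if not tracker.try_claim(k):
            break
        out.append(k)
    return out
```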
 



Issue Time Tracking
---

Worklog Id: (was: 300550)
Time Spent: 0.5h  (was: 20m)

> Use OffsetRange as restriction for OffsetRestrictionTracker
> ---
>
> Key: BEAM-8014
> URL: https://issues.apache.org/jira/browse/BEAM-8014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8014) Use OffsetRange as restriction for OffsetRestrictionTracker

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8014?focusedWorklogId=300551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300551
 ]

ASF GitHub Bot logged work on BEAM-8014:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:26
Start Date: 23/Aug/19 22:26
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9376: [BEAM-8014] 
Using OffsetRange as restriction for OffsetRestrictionTracker
URL: https://github.com/apache/beam/pull/9376#discussion_r317319851
 
 

 ##
 File path: sdks/python/apache_beam/io/restriction_trackers.py
 ##
 @@ -171,7 +175,7 @@ def checkpoint(self):
   else:
 end_position = self._current_position + 1
 
-  residual_range = (end_position, self._range.stop)
+  residual_range = OffsetRange(end_position, self._range.stop)
 
 Review comment:
   In which case this could be
   
   `self._range, residual_range = self._range.split(end_position)`
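A minimal sketch of what such a split helper could look like (a toy `OffsetRange` with a hypothetical `split_at` method, not Beam's actual class), returning the primary/residual pair in one call:

```python
# Toy offset range over the half-open interval [start, stop).
class OffsetRange:
    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    def split_at(self, pos):
        """Split [start, stop) into primary [start, pos) and residual [pos, stop)."""
        return OffsetRange(self.start, pos), OffsetRange(pos, self.stop)

r = OffsetRange(0, 100)
primary, residual = r.split_at(40)
```

With such a helper, checkpointing collapses to a single unpacking assignment, and the invariant `primary.stop == residual.start` holds by construction.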
 



Issue Time Tracking
---

Worklog Id: (was: 300551)
Time Spent: 0.5h  (was: 20m)

> Use OffsetRange as restriction for OffsetRestrictionTracker
> ---
>
> Key: BEAM-8014
> URL: https://issues.apache.org/jira/browse/BEAM-8014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8086) Add isRetraction() to WindowedValue

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8086?focusedWorklogId=300555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300555
 ]

ASF GitHub Bot logged work on BEAM-8086:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:35
Start Date: 23/Aug/19 22:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9422: [BEAM-8086] add 
isRetraction() to WindowedValue.
URL: https://github.com/apache/beam/pull/9422#issuecomment-524482797
 
 
   LATE/EARLY/... status is part of PaneInfo, would that be a better place?
 



Issue Time Tracking
---

Worklog Id: (was: 300555)
Time Spent: 50m  (was: 40m)

> Add isRetraction() to WindowedValue
> ---
>
> Key: BEAM-8086
> URL: https://issues.apache.org/jira/browse/BEAM-8086
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed pcollection

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=300586=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300586
 ]

ASF GitHub Bot logged work on BEAM-7972:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:42
Start Date: 23/Aug/19 22:42
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9334: [BEAM-7972] 
Always use Global window in reshuffle and then apply wind…
URL: https://github.com/apache/beam/pull/9334#discussion_r317323227
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -612,29 +611,23 @@ def restore_timestamps(element):
 for (value, timestamp) in values]
 
 else:
-  # The linter is confused.
-  # hash(1) is used to force "runtime" selection of _IdentityWindowFn
-  # pylint: disable=abstract-class-instantiated
-  cls = hash(1) and _IdentityWindowFn
-  window_fn = cls(
-  windowing_saved.windowfn.get_window_coder())
-
-  def reify_timestamps(element, timestamp=DoFn.TimestampParam):
+  def reify_timestamps(element,
+   timestamp=DoFn.TimestampParam,
+   window=DoFn.WindowParam):
 key, value = element
-return key, TimestampedValue(value, timestamp)
+# Transport the window as part of the value and restore it later.
+return key, TimestampedValue((value, window), timestamp)
 
 Review comment:
   Why would the timestamp be duplicated? The (window, timestamp, value) tuple 
seems best represented by a WindowedValue. (I would be OK with a plain old 
3-tuple as well, but only going half way seems odd). 
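A hedged sketch of the "transport the window in the value" pattern under discussion (function names are illustrative, not Beam's API): pack `(value, window)` alongside the timestamp before the shuffle, and rebuild all three pieces afterwards.

```python
# What crosses the shuffle: key -> (timestamp, (value, window)).
def reify(key, value, timestamp, window):
    return key, (timestamp, (value, window))

# Inverse: recover the (key, value) pair plus its timestamp and window.
def restore(key, stamped):
    timestamp, (value, window) = stamped
    return (key, value), timestamp, window
```

As the review comment notes, once the window rides along with the timestamp, the payload is effectively a windowed value, which is why carrying only one of the two looks like going halfway.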
 



Issue Time Tracking
---

Worklog Id: (was: 300586)
Time Spent: 2h 10m  (was: 2h)

> Portable Python Reshuffle does not work with windowed pcollection
> -
>
> Key: BEAM-7972
> URL: https://issues.apache.org/jira/browse/BEAM-7972
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Streaming pipeline gets stuck when using Reshuffle with windowed pcollection.
> The issue happens because the window function gets deserialized on the Java 
> side, which is not possible, so it defaults to the global window function, 
> resulting in a window-function mismatch later in the code.





[jira] [Work logged] (BEAM-8014) Use OffsetRange as restriction for OffsetRestrictionTracker

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8014?focusedWorklogId=300584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300584
 ]

ASF GitHub Bot logged work on BEAM-8014:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:42
Start Date: 23/Aug/19 22:42
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #9376: [BEAM-8014] 
Using OffsetRange as restriction for OffsetRestrictionTracker
URL: https://github.com/apache/beam/pull/9376#discussion_r317323192
 
 

 ##
 File path: sdks/python/apache_beam/io/restriction_trackers.py
 ##
 @@ -159,7 +162,8 @@ def try_split(self, fraction_of_remainder):
 cur + int(max(1, (self._range.stop - cur) * 
fraction_of_remainder)))
 if split_point < self._range.stop:
   prev_stop, self._range.stop = self._range.stop, split_point
-  return (self._range.start, split_point), (split_point, prev_stop)
+  return (OffsetRange(self._range.start, split_point),
 
 Review comment:
   Yes, I agree. There is already an `OffsetRange.split(bundle_size)` that 
performs the initial split. If we want to support splitting into two pieces 
at a given position, we need a different name, like `split_at`?
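The split-point arithmetic in the diff can be isolated into a small sketch (the function name is illustrative): claim at least one offset past the cursor, take `fraction_of_remainder` of the unclaimed work, and only report a split point strictly inside the range.

```python
def split_point(cur, stop, fraction_of_remainder):
    # Move at least one offset so the primary never ends up empty.
    point = cur + int(max(1, (stop - cur) * fraction_of_remainder))
    # A split at or past `stop` means there is nothing left to give away.
    return point if point < stop else None
```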
 



Issue Time Tracking
---

Worklog Id: (was: 300584)
Time Spent: 40m  (was: 0.5h)

> Use OffsetRange as restriction for OffsetRestrictionTracker
> ---
>
> Key: BEAM-8014
> URL: https://issues.apache.org/jira/browse/BEAM-8014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8014) Use OffsetRange as restriction for OffsetRestrictionTracker

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8014?focusedWorklogId=300585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300585
 ]

ASF GitHub Bot logged work on BEAM-8014:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:42
Start Date: 23/Aug/19 22:42
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #9376: [BEAM-8014] 
Using OffsetRange as restriction for OffsetRestrictionTracker
URL: https://github.com/apache/beam/pull/9376#discussion_r317323192
 
 

 ##
 File path: sdks/python/apache_beam/io/restriction_trackers.py
 ##
 @@ -159,7 +162,8 @@ def try_split(self, fraction_of_remainder):
 cur + int(max(1, (self._range.stop - cur) * 
fraction_of_remainder)))
 if split_point < self._range.stop:
   prev_stop, self._range.stop = self._range.stop, split_point
-  return (self._range.start, split_point), (split_point, prev_stop)
+  return (OffsetRange(self._range.start, split_point),
 
 Review comment:
   Yes, I agree. There is already an `OffsetRange.split(bundle_size)` that 
performs the initial split. If we want to support splitting into two pieces 
at a given position, we need a different name, like `split_at`?
 



Issue Time Tracking
---

Worklog Id: (was: 300585)
Time Spent: 50m  (was: 40m)

> Use OffsetRange as restriction for OffsetRestrictionTracker
> ---
>
> Key: BEAM-8014
> URL: https://issues.apache.org/jira/browse/BEAM-8014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails

2019-08-23 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914698#comment-16914698
 ] 

Rui Wang commented on BEAM-8042:


I cannot reproduce the exception from AggregateProjectMergeRule, but I can 
reproduce an IndexOutOfBoundsException. This needs further investigation.

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
>
> {code}
> SELECT
>   key,
>   COUNT(*) as f1,
>   SUM(has_f2) AS f2,
>   SUM(has_f3) AS f3,
>   SUM(has_f4) AS f4,
>   SUM(has_f5) AS f5,
>   SUM(has_f6) AS f6,
>   SUM(has_f7) AS f7
> FROM xxx
> GROUP BY key
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}





[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=300595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300595
 ]

ASF GitHub Bot logged work on BEAM-7616:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:04
Start Date: 23/Aug/19 23:04
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools 
use urllib with the global timeout. Set it to 60 seconds # to prevent network 
related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524488023
 
 
   I should be more clear. I looked at the stackdriver logs for some of the 
post commit tests that ran on Dataflow. I expected worker logs to have this 
info statement.
 



Issue Time Tracking
---

Worklog Id: (was: 300595)
Time Spent: 4h 10m  (was: 4h)

> urlopen calls could get stuck without a timeout
> ---
>
> Key: BEAM-7616
> URL: https://issues.apache.org/jira/browse/BEAM-7616
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.14.0, 2.16.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7859?focusedWorklogId=300596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300596
 ]

ASF GitHub Bot logged work on BEAM-7859:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:04
Start Date: 23/Aug/19 23:04
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9421: [BEAM-7859] set SDK 
worker parallelism to 1 in word count test
URL: https://github.com/apache/beam/pull/9421#issuecomment-524488028
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 300596)
Time Spent: 20m  (was: 10m)

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The error was observed during Beam 2.14.0 release validation, see: 
> https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831
> Looks like master currently fails with a different error, both in Loopback 
> and Docker modes.
> [~ibzib] [~altay] [~robertwb] [~angoenka]





[jira] [Work logged] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7859?focusedWorklogId=300597=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300597
 ]

ASF GitHub Bot logged work on BEAM-7859:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:04
Start Date: 23/Aug/19 23:04
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9421: [BEAM-7859] set SDK 
worker parallelism to 1 in word count test
URL: https://github.com/apache/beam/pull/9421#issuecomment-524488040
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 300597)
Time Spent: 0.5h  (was: 20m)

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The error was observed during Beam 2.14.0 release validation, see: 
> https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831
> Looks like master currently fails with a different error, both in Loopback 
> and Docker modes.
> [~ibzib] [~altay] [~robertwb] [~angoenka]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-6318) beam-sdks-python:setupVirtualenv sometimes fails due to a pip flake "No matching distribution found"

2019-08-23 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914699#comment-16914699
 ] 

Kyle Weaver commented on BEAM-6318:
---

Ran into this today: 

14:43:03 ERROR: Could not find a version that satisfies the requirement 
six<2,>=1.0.0 (from tox==3.11.1) (from versions: none)

[https://builds.apache.org/job/beam_PreCommit_Python_Commit/8270/console]

> beam-sdks-python:setupVirtualenv sometimes fails due to a pip flake "No 
> matching distribution found"
> 
>
> Key: BEAM-6318
> URL: https://issues.apache.org/jira/browse/BEAM-6318
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Andrew Pilloud
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PreCommit_Python_Cron/752/
> {code}
> 16:00:43 Collecting pluggy<1.0,>=0.3.0 (from tox==3.0.0)
> 16:00:43   Could not find a version that satisfies the requirement 
> pluggy<1.0,>=0.3.0 (from tox==3.0.0) (from versions: )
> 16:00:43 No matching distribution found for pluggy<1.0,>=0.3.0 (from 
> tox==3.0.0)
> 16:00:46 
> 16:00:46 > Task :beam-sdks-python:setupVirtualenv FAILED
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=300067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300067
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 23/Aug/19 06:09
Start Date: 23/Aug/19 06:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on issue #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#issuecomment-524185713
 
 
   Run Java PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300067)
Time Spent: 27h 10m  (was: 27h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 27h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8039) SUM(CASE WHEN xxx THEN 1 ELSE 0)

2019-08-23 Thread Gleb Kanterov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914005#comment-16914005
 ] 

Gleb Kanterov commented on BEAM-8039:
-

A workaround would be to support the COUNTIF function from ZetaSQL, which might 
be easier to implement than more generic rewrites.
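The equivalence behind this workaround can be sketched outside SQL (a toy
Python model, not Beam or ZetaSQL code): rewriting `SUM(CASE WHEN p THEN 1 ELSE
0)` as `COUNTIF(p)` preserves the result because both count the rows where the
predicate holds.

```python
# Toy rows and predicate standing in for the SQL table and CASE condition.
rows = [{"x": 1}, {"x": 5}, {"x": 7}, {"x": 2}]
pred = lambda r: r["x"] > 3

# SUM(CASE WHEN x > 3 THEN 1 ELSE 0): sum a 1/0 indicator per row.
sum_case = sum(1 if pred(r) else 0 for r in rows)

# COUNTIF(x > 3): count the rows where the predicate is true.
countif = sum(1 for r in rows if pred(r))

assert sum_case == countif == 2  # both count the same two matching rows
```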

> SUM(CASE WHEN xxx THEN 1 ELSE 0)
> 
>
> Key: BEAM-8039
> URL: https://issues.apache.org/jira/browse/BEAM-8039
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Major
>
> java.lang.RuntimeException: Aggregate function only accepts Column Reference 
> or CAST(Column Reference) as its input.
> I was able to rewrite the SQL using a WITH statement, and it seemed to work, but 
> it requires rewriting a lot of queries and makes them pretty much unreadable.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8042) Parsing of aggregate query fails

2019-08-23 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov updated BEAM-8042:

Description: 
{code}
SELECT
  key,
  COUNT(*) as f1,
  SUM(has_f2) AS f2,
  SUM(has_f3) AS f3,
  SUM(has_f4) AS f4,
  SUM(has_f5) AS f5,
  SUM(has_f6) AS f6,
  SUM(has_f7) AS f7
FROM xxx
GROUP BY key


Caused by: java.lang.RuntimeException: Error while applying rule 
AggregateProjectMergeRule, args 
[rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
 
rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
at 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
at 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
at 
org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
at 
org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
at 
... 39 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
at 
org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
at 
org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
at 
org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
at 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
... 48 more

{code}


  was:
SELECT
  key,
  COUNT(*) as f1,
  SUM(has_f2) AS f2,
  SUM(has_f3) AS f3,
  SUM(has_f4) AS f4,
  SUM(has_f5) AS f5,
  SUM(has_f6) AS f6,
  SUM(has_f7) AS f7
FROM xxx
GROUP BY key


Caused by: java.lang.RuntimeException: Error while applying rule 
AggregateProjectMergeRule, args 
[rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
 
rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
at 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
at 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
at 
org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
at 
org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
at 
... 39 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
at 
org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
at 
org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
at 
org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
at 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
... 48 more





> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Major
>
> {code}
> SELECT
>   key,
>   COUNT(*) as f1,
>   SUM(has_f2) AS f2,
>   SUM(has_f3) AS f3,
>   SUM(has_f4) AS f4,
>   SUM(has_f5) AS f5,
>   SUM(has_f6) AS f6,
>   SUM(has_f7) AS f7
> FROM xxx
> GROUP BY key
> Caused by: java.lang.RuntimeException: Error while 

[jira] [Updated] (BEAM-8042) Parsing of aggregate query fails

2019-08-23 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov updated BEAM-8042:

Priority: Critical  (was: Major)

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Critical
>
> {code}
> SELECT
>   key,
>   COUNT(*) as f1,
>   SUM(has_f2) AS f2,
>   SUM(has_f3) AS f3,
>   SUM(has_f4) AS f4,
>   SUM(has_f5) AS f5,
>   SUM(has_f6) AS f6,
>   SUM(has_f7) AS f7
> FROM xxx
> GROUP BY key
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}
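A minimal sketch of the failure mode (a toy Python model, not Calcite code): in
the trace, the LogicalProject exposes only output columns $0..$6, while the
LogicalAggregate still references input $7, so looking that column up in the
projection's field list overruns it.

```python
# The project's output field list from the trace has 7 entries, indices 0..6.
project_fields = ["key", "f1", "f2", "f3", "f4", "f5", "f6"]

# SUM($7) in the LogicalAggregate references an input column the project lacks.
aggregate_input_ref = 7

try:
    project_fields[aggregate_input_ref]  # mirrors RegularImmutableList.get(7)
    failed = False
except IndexError:  # Java surfaces this as ArrayIndexOutOfBoundsException: 7
    failed = True

assert failed
```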



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8040) NPE in table name resolver when selecting from a table that doesn't exist

2019-08-23 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov updated BEAM-8040:

Description: 
NullPointerException when selecting from a table that doesn't exist.

{code}
Caused by: java.lang.NullPointerException
at 
org.apache.beam.sdk.extensions.sql.zetasql.TableResolverImpl.assumeLeafIsTable(TableResolverImpl.java:42)
at 
org.apache.beam.sdk.extensions.sql.zetasql.TableResolution.resolveCalciteTable(TableResolution.java:48)
at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addTableToLeafCatalog(SqlAnalyzer.java:174)
at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.lambda$createPopulatedCatalog$0(SqlAnalyzer.java:132)
{code}


  was:
NullPointerException when selecting from a table that doesn't exist.


Caused by: java.lang.NullPointerException
at 
org.apache.beam.sdk.extensions.sql.zetasql.TableResolverImpl.assumeLeafIsTable(TableResolverImpl.java:42)
at 
org.apache.beam.sdk.extensions.sql.zetasql.TableResolution.resolveCalciteTable(TableResolution.java:48)
at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addTableToLeafCatalog(SqlAnalyzer.java:174)
at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.lambda$createPopulatedCatalog$0(SqlAnalyzer.java:132)



> NPE in table name resolver when selecting from a table that doesn't exist
> -
>
> Key: BEAM-8040
> URL: https://issues.apache.org/jira/browse/BEAM-8040
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Major
>
> NullPointerException when selecting from a table that doesn't exist.
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.TableResolverImpl.assumeLeafIsTable(TableResolverImpl.java:42)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.TableResolution.resolveCalciteTable(TableResolution.java:48)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addTableToLeafCatalog(SqlAnalyzer.java:174)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.lambda$createPopulatedCatalog$0(SqlAnalyzer.java:132)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8075) IndexOutOfBounds in LogicalProject

2019-08-23 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov updated BEAM-8075:

Description: 
{code}
SELECT payload.bankId, 
   SUM(payload.purchaseAmountCents) / 100 AS totalPurchase
FROM pubsub.topic.`instant-insights`.`retaildemo-online-transactions-json`
GROUP BY payload.bankId
{code}

Causes the workers to fail with:

{code}
Exception in thread "main" java.lang.RuntimeException: Error while applying 
rule ProjectToCalcRule, args 
[rel#9:LogicalProject.NONE(input=RelSubset#8,bankId=$0,totalPurchase=/(CAST($3):DOUBLE
 NOT NULL, 1E2))]
at org.apache
{code}

  was:
SELECT payload.bankId, 
   SUM(payload.purchaseAmountCents) / 100 AS totalPurchase
FROM pubsub.topic.`instant-insights`.`retaildemo-online-transactions-json`
GROUP BY payload.bankId

Causes the workers to fail with:

Exception in thread "main" java.lang.RuntimeException: Error while applying 
rule ProjectToCalcRule, args 
[rel#9:LogicalProject.NONE(input=RelSubset#8,bankId=$0,totalPurchase=/(CAST($3):DOUBLE
 NOT NULL, 1E2))]
at org.apache


> IndexOutOfBounds in LogicalProject
> --
>
> Key: BEAM-8075
> URL: https://issues.apache.org/jira/browse/BEAM-8075
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Major
>
> {code}
> SELECT payload.bankId, 
>SUM(payload.purchaseAmountCents) / 100 AS totalPurchase
> FROM pubsub.topic.`instant-insights`.`retaildemo-online-transactions-json`
> GROUP BY payload.bankId
> {code}
> Causes the workers to fail with:
> {code}
> Exception in thread "main" java.lang.RuntimeException: Error while applying 
> rule ProjectToCalcRule, args 
> [rel#9:LogicalProject.NONE(input=RelSubset#8,bankId=$0,totalPurchase=/(CAST($3):DOUBLE
>  NOT NULL, 1E2))]
> at org.apache
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=300145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300145
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 23/Aug/19 08:07
Start Date: 23/Aug/19 08:07
Worklog Time Spent: 10m 
  Work Description: yifanzou commented on pull request #9411: [BEAM-8079] 
Move release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411#discussion_r317017968
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_Build.groovy
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import CommonJobProperties as commonJobProperties
+
+job('beam_Release_Gradle_Build') {
 
 Review comment:
   Shall we add a comment to explain what this job does, and add a link to the 
release guide?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300145)
Time Spent: 20m  (was: 10m)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after the release branch is cut. 
> Basically it does two things: 1. verify the Gradle build with -PisRelease turned 
> on. 2. create a PR and run all PostCommit jobs against the release branch. 
> However, release managers hit several pain points when running this script:
> 1. It needs a lot of environment setup, and a broken tooling install easily 
> breaks the script.
> 2. Running the Gradle build locally takes an extremely long time.
> 3. Auto PR creation (using hub) doesn't work.
> We can move the Gradle build to Jenkins in order to get rid of the environment 
> setup work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-23 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov reassigned BEAM-8080:
---

Assignee: Gleb Kanterov

> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
> ---
>
> Key: BEAM-8080
> URL: https://issues.apache.org/jira/browse/BEAM-8080
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Critical
> Fix For: 2.16.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-23 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov updated BEAM-8080:

Description: 
{code}
java.lang.NoClassDefFoundError: 
org/apache/beam/repackaged/sql/com/google/type/Date
at 
org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380)
at 
org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146)
at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
at 
org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
{code}

> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
> ---
>
> Key: BEAM-8080
> URL: https://issues.apache.org/jira/browse/BEAM-8080
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Critical
> Fix For: 2.16.0
>
>
> {code}
> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380)
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-23 Thread Gleb Kanterov (Jira)
Gleb Kanterov created BEAM-8080:
---

 Summary: java.lang.NoClassDefFoundError: 
org/apache/beam/repackaged/sql/com/google/type/Date
 Key: BEAM-8080
 URL: https://issues.apache.org/jira/browse/BEAM-8080
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql-zetasql
Reporter: Gleb Kanterov
 Fix For: 2.16.0






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work started] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-23 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8080 started by Gleb Kanterov.
---
> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
> ---
>
> Key: BEAM-8080
> URL: https://issues.apache.org/jira/browse/BEAM-8080
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Critical
> Fix For: 2.16.0
>
>
> {code}
> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380)
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-23 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov updated BEAM-8080:

Status: Open  (was: Triage Needed)

> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
> ---
>
> Key: BEAM-8080
> URL: https://issues.apache.org/jira/browse/BEAM-8080
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Critical
> Fix For: 2.16.0
>
>
> {code}
> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380)
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8080?focusedWorklogId=300148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300148
 ]

ASF GitHub Bot logged work on BEAM-8080:


Author: ASF GitHub Bot
Created on: 23/Aug/19 08:14
Start Date: 23/Aug/19 08:14
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #9414: [BEAM-8080] 
[SQL] Fix relocation of com.google.types
URL: https://github.com/apache/beam/pull/9414
 
 
   Classes were relocated, but we never included the artifact containing them.
   
   
   

[jira] [Work logged] (BEAM-7980) External environment with containerized worker pool

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7980?focusedWorklogId=300153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300153
 ]

ASF GitHub Bot logged work on BEAM-7980:


Author: ASF GitHub Bot
Created on: 23/Aug/19 08:23
Start Date: 23/Aug/19 08:23
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9398: [BEAM-7980] 
Exactly once artifact retrieval for Python SDK worker pool
URL: https://github.com/apache/beam/pull/9398#discussion_r317022471
 
 

 ##
 File path: sdks/python/container/boot.go
 ##
 @@ -105,18 +107,57 @@ func main() {
 
// (2) Retrieve and install the staged packages.
 
-	dir := filepath.Join(*semiPersistDir, "staged")
+	func() {

-	files, err := artifact.Materialize(ctx, *artifactEndpoint, info.GetRetrievalToken(), dir)
-	if err != nil {
-		log.Fatalf("Failed to retrieve staged files: %v", err)
-	}
+		installCompleteFile := filepath.Join(os.TempDir(), "beam.install.complete")

-	// TODO(herohde): the packages to install should be specified explicitly. It
-	// would also be possible to install the SDK in the Dockerfile.
-	if setupErr := installSetupPackages(files, dir); setupErr != nil {
-		log.Fatalf("Failed to install required packages: %v", setupErr)
-	}
+		// skip if install already complete
+		_, err = os.Stat(installCompleteFile)
+		if err == nil {
+			return
+		}
+
+		// lock to guard from concurrent artifact retrieval and installation,
+		// when called by child processes in a worker pool
+		lock, err := lockfile.New(filepath.Join(os.TempDir(), "beam.install.lck"))
+		if err != nil {
+			log.Fatalf("Cannot init artifact retrieval lock: %v", err)
+		}
+
+		for err = lock.TryLock(); err != nil; err = lock.TryLock() {
+			switch err {
+			case lockfile.ErrBusy, lockfile.ErrNotExist:
+				time.Sleep(5 * time.Second)
+				log.Printf("Worker %v waiting for artifact retrieval lock: %v", *id, lock)
+			default:
+				log.Fatalf("Worker %v could not obtain artifact retrieval lock: %v", *id, err)
+			}
+		}
+		defer lock.Unlock()
+
+		// skip if install already complete
+		_, err = os.Stat(installCompleteFile)
+		if err == nil {
+			return
 Review comment:
   Why is this not within the locked section?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300153)
Time Spent: 7h 10m  (was: 7h)

> External environment with containerized worker pool
> ---
>
> Key: BEAM-7980
> URL: https://issues.apache.org/jira/browse/BEAM-7980
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Augment Beam Python docker image and boot.go so that it can be used to launch 
> BeamFnExternalWorkerPoolServicer.
> [https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.lnhm75dhvhi0]
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7980) External environment with containerized worker pool

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7980?focusedWorklogId=300152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300152
 ]

ASF GitHub Bot logged work on BEAM-7980:


Author: ASF GitHub Bot
Created on: 23/Aug/19 08:23
Start Date: 23/Aug/19 08:23
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9398: [BEAM-7980] 
Exactly once artifact retrieval for Python SDK worker pool
URL: https://github.com/apache/beam/pull/9398#discussion_r317022289
 
 

 ##
 File path: sdks/python/container/boot.go
 ##
 @@ -105,18 +107,57 @@ func main() {
 
// (2) Retrieve and install the staged packages.
 
-	dir := filepath.Join(*semiPersistDir, "staged")
+	func() {

-	files, err := artifact.Materialize(ctx, *artifactEndpoint, info.GetRetrievalToken(), dir)
-	if err != nil {
-		log.Fatalf("Failed to retrieve staged files: %v", err)
-	}
+		installCompleteFile := filepath.Join(os.TempDir(), "beam.install.complete")

-	// TODO(herohde): the packages to install should be specified explicitly. It
-	// would also be possible to install the SDK in the Dockerfile.
-	if setupErr := installSetupPackages(files, dir); setupErr != nil {
-		log.Fatalf("Failed to install required packages: %v", setupErr)
-	}
+		// skip if install already complete
+		_, err = os.Stat(installCompleteFile)
+		if err == nil {
+			return
 Review comment:
   I think this should live within the locked section.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 300152)
Time Spent: 7h 10m  (was: 7h)

> External environment with containerized worker pool
> ---
>
> Key: BEAM-7980
> URL: https://issues.apache.org/jira/browse/BEAM-7980
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Augment Beam Python docker image and boot.go so that it can be used to launch 
> BeamFnExternalWorkerPoolServicer.
> [https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.lnhm75dhvhi0]
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

