[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337197
 ]

ASF GitHub Bot logged work on BEAM-8513:


Author: ASF GitHub Bot
Created on: 01/Nov/19 05:54
Start Date: 01/Nov/19 05:54
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on pull request #9937: [BEAM-8513] 
Allow reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#discussion_r341458892
 
 

 ##
 File path: 
sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java
 ##
 @@ -59,29 +65,46 @@
 public class RabbitMqIOTest implements Serializable {
   private static final Logger LOG = 
LoggerFactory.getLogger(RabbitMqIOTest.class);
 
+  private static final int ONE_MINUTE_MS = 60 * 1000;
+
   private static int port;
+  private static String defaultPort;
+
   @ClassRule public static TemporaryFolder temporaryFolder = new 
TemporaryFolder();
 
   @Rule public transient TestPipeline p = TestPipeline.create();
 
-  private static transient Broker broker;
+  private static transient SystemLauncher launcher;
 
   @BeforeClass
   public static void beforeClass() throws Exception {
 port = NetworkTestHelper.getAvailableLocalPort();
 
+defaultPort = System.getProperty("qpid.amqp_port");
 
 Review comment:
   I think it's simple enough as it is.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337197)
Time Spent: 2h  (was: 1h 50m)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the 
> exchange
> 
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
> Environment: testing with DirectRunner
>Reporter: Nick Aldwin
>Assignee: Jean-Baptiste Onofré
>Priority: Critical
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from 
> it.  This is problematic with pre-existing exchanges (a relatively common 
> pattern), as there's no provided configuration for the exchange beyond 
> exchange type.  (We stumbled on this because RabbitMqIO always declares a 
> non-durable exchange, which fails if the exchange already exists as a durable 
> exchange)
>  
> A solution to this would be to allow RabbitMqIO to read from an exchange 
> without declaring it.  This pattern is already available for queues via the 
> `queueDeclare` flag.  I propose an `exchangeDeclare` flag which preserves 
> existing behavior except for skipping the call to `exchangeDeclare` before 
> binding the queue to the exchange.
>  
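The semantics of the proposed `exchangeDeclare` flag can be sketched in plain Java. This is an illustration only, not the actual RabbitMqIO code: the `AmqpChannel` interface here is a minimal stand-in for the real RabbitMQ client channel, and the recorded call names are invented for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;

public class ExchangeSetupSketch {
  /** Minimal stand-in for the two AMQP channel calls involved (hypothetical). */
  interface AmqpChannel {
    void exchangeDeclare(String exchange, String type);
    void queueBind(String queue, String exchange, String routingKey);
  }

  /**
   * Sketch of the proposed behavior: when exchangeDeclare is false, the
   * exchange is assumed to pre-exist (durable or not), so only the queue
   * binding is performed. Returns the calls made, for inspection.
   */
  static List<String> setupTopology(boolean exchangeDeclare, String exchange,
      String type, String queue, String routingKey) {
    List<String> calls = new ArrayList<>();
    AmqpChannel ch = new AmqpChannel() {
      @Override public void exchangeDeclare(String e, String t) { calls.add("declare:" + e); }
      @Override public void queueBind(String q, String e, String r) { calls.add("bind:" + q + "->" + e); }
    };
    if (exchangeDeclare) {
      ch.exchangeDeclare(exchange, type); // skipped for pre-existing exchanges
    }
    ch.queueBind(queue, exchange, routingKey);
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(setupTopology(true, "E", "fanout", "Q", "#"));  // [declare:E, bind:Q->E]
    System.out.println(setupTopology(false, "E", "fanout", "Q", "#")); // [bind:Q->E]
  }
}
```

With the flag off, the exchange's own properties (such as durability) are left entirely to whoever declared it, which avoids the durable/non-durable mismatch described above.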



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337198
 ]

ASF GitHub Bot logged work on BEAM-8513:


Author: ASF GitHub Bot
Created on: 01/Nov/19 05:54
Start Date: 01/Nov/19 05:54
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on pull request #9937: [BEAM-8513] 
Allow reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#discussion_r341458718
 
 

 ##
 File path: 
sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java
 ##
 @@ -82,28 +97,31 @@
  *
  * As for the {@link Read}, the {@link Write} is configured with a RabbitMQ 
URI.
  *
- * For instance, you can write to an exchange (providing the exchange type):
+ * Examples
  *
  * {@code
+ * // Publishing to a named, non-durable exchange, declared by Beam:
  * pipeline
  *   .apply(...) // provide PCollection
  *   
.apply(RabbitMqIO.write().withUri("amqp://user:password@localhost:5672").withExchange("EXCHANGE",
 "fanout"));
- * }
  *
- * For instance, you can write to a queue:
+ * // Publishing to an existing exchange
+ * pipeline
+ *   .apply(...) // provide PCollection
+ *   
.apply(RabbitMqIO.write().withUri("amqp://user:password@localhost:5672").withExchange("EXCHANGE"));
  *
- * {@code
+ * // Publishing to a named queue in the default exchange:
  * pipeline
  *   .apply(...) // provide PCollection
  *   
.apply(RabbitMqIO.write().withUri("amqp://user:password@localhost:5672").withQueue("QUEUE"));
- *
  * }
  */
 @Experimental(Experimental.Kind.SOURCE_SINK)
 public class RabbitMqIO {
   public static Read read() {
 return new AutoValue_RabbitMqIO_Read.Builder()
 .setQueueDeclare(false)
+.setExchangeDeclare(false)
 
 Review comment:
  I remember discussing that when I created the IO, though I don't remember 
why I didn't do it in the end. It makes sense anyway, thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 337198)
Time Spent: 2h 10m  (was: 2h)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the 
> exchange
> 
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
> Environment: testing with DirectRunner
>Reporter: Nick Aldwin
>Assignee: Jean-Baptiste Onofré
>Priority: Critical
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from 
> it.  This is problematic with pre-existing exchanges (a relatively common 
> pattern), as there's no provided configuration for the exchange beyond 
> exchange type.  (We stumbled on this because RabbitMqIO always declares a 
> non-durable exchange, which fails if the exchange already exists as a durable 
> exchange)
>  
> A solution to this would be to allow RabbitMqIO to read from an exchange 
> without declaring it.  This pattern is already available for queues via the 
> `queueDeclare` flag.  I propose an `exchangeDeclare` flag which preserves 
> existing behavior except for skipping the call to `exchangeDeclare` before 
> binding the queue to the exchange.
>  





[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337196
 ]

ASF GitHub Bot logged work on BEAM-8513:


Author: ASF GitHub Bot
Created on: 01/Nov/19 05:54
Start Date: 01/Nov/19 05:54
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on pull request #9937: [BEAM-8513] 
Allow reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#discussion_r341458821
 
 

 ##
 File path: 
sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java
 ##
 @@ -453,6 +524,8 @@ public boolean start() throws IOException {
 // we consume message without autoAck (we want to do the ack ourselves)
 channel.setDefaultConsumer(consumer);
 channel.basicConsume(queueName, false, consumer);
+  } catch (IOException e) {
+throw e;
 
 Review comment:
   Agree, no need. My bad.
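A tiny self-contained illustration of why the catch block flagged above is redundant. `basicConsume` here is a stand-in method, not the real RabbitMQ client call:

```java
import java.io.IOException;

public class RethrowSketch {
  /** Stand-in for a call that may throw, like channel.basicConsume(...). */
  static void basicConsume(boolean fail) throws IOException {
    if (fail) throw new IOException("simulated channel failure");
  }

  // The reviewed pattern: a catch clause that only rethrows the same
  // exception. Callers observe identical behavior without it, so the
  // try/catch can simply be deleted.
  static void withRedundantCatch(boolean fail) throws IOException {
    try {
      basicConsume(fail);
    } catch (IOException e) {
      throw e; // adds nothing: same exception, same declared type
    }
  }

  public static void main(String[] args) throws IOException {
    withRedundantCatch(false);
    basicConsume(false);
    System.out.println("identical behavior with or without the catch");
  }
}
```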
 



Issue Time Tracking
---

Worklog Id: (was: 337196)
Time Spent: 2h  (was: 1h 50m)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the 
> exchange
> 
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
> Environment: testing with DirectRunner
>Reporter: Nick Aldwin
>Assignee: Jean-Baptiste Onofré
>Priority: Critical
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from 
> it.  This is problematic with pre-existing exchanges (a relatively common 
> pattern), as there's no provided configuration for the exchange beyond 
> exchange type.  (We stumbled on this because RabbitMqIO always declares a 
> non-durable exchange, which fails if the exchange already exists as a durable 
> exchange)
>  
> A solution to this would be to allow RabbitMqIO to read from an exchange 
> without declaring it.  This pattern is already available for queues via the 
> `queueDeclare` flag.  I propose an `exchangeDeclare` flag which preserves 
> existing behavior except for skipping the call to `exchangeDeclare` before 
> binding the queue to the exchange.
>  





[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337199
 ]

ASF GitHub Bot logged work on BEAM-8513:


Author: ASF GitHub Bot
Created on: 01/Nov/19 05:54
Start Date: 01/Nov/19 05:54
Worklog Time Spent: 10m 
  Work Description: jbonofre commented on pull request #9937: [BEAM-8513] 
Allow reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#discussion_r341458995
 
 

 ##
 File path: 
sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java
 ##
 @@ -282,28 +424,4 @@ public void testWriteExchange() throws Exception {
   }
 }
   }
-
-  private static List generateRecords(int maxNumRecords) {
 
 Review comment:
   +1, it looks good to me.
 



Issue Time Tracking
---

Worklog Id: (was: 337199)
Time Spent: 2h 20m  (was: 2h 10m)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the 
> exchange
> 
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
> Environment: testing with DirectRunner
>Reporter: Nick Aldwin
>Assignee: Jean-Baptiste Onofré
>Priority: Critical
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from 
> it.  This is problematic with pre-existing exchanges (a relatively common 
> pattern), as there's no provided configuration for the exchange beyond 
> exchange type.  (We stumbled on this because RabbitMqIO always declares a 
> non-durable exchange, which fails if the exchange already exists as a durable 
> exchange)
>  
> A solution to this would be to allow RabbitMqIO to read from an exchange 
> without declaring it.  This pattern is already available for queues via the 
> `queueDeclare` flag.  I propose an `exchangeDeclare` flag which preserves 
> existing behavior except for skipping the call to `exchangeDeclare` before 
> binding the queue to the exchange.
>  





[jira] [Assigned] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange

2019-10-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré reassigned BEAM-8513:
--

Assignee: Jean-Baptiste Onofré

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the 
> exchange
> 
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
> Environment: testing with DirectRunner
>Reporter: Nick Aldwin
>Assignee: Jean-Baptiste Onofré
>Priority: Critical
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from 
> it.  This is problematic with pre-existing exchanges (a relatively common 
> pattern), as there's no provided configuration for the exchange beyond 
> exchange type.  (We stumbled on this because RabbitMqIO always declares a 
> non-durable exchange, which fails if the exchange already exists as a durable 
> exchange)
>  
> A solution to this would be to allow RabbitMqIO to read from an exchange 
> without declaring it.  This pattern is already available for queues via the 
> `queueDeclare` flag.  I propose an `exchangeDeclare` flag which preserves 
> existing behavior except for skipping the call to `exchangeDeclare` before 
> binding the queue to the exchange.
>  





[jira] [Updated] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange

2019-10-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré updated BEAM-8513:
---
Status: Open  (was: Triage Needed)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the 
> exchange
> 
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
> Environment: testing with DirectRunner
>Reporter: Nick Aldwin
>Assignee: Jean-Baptiste Onofré
>Priority: Critical
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from 
> it.  This is problematic with pre-existing exchanges (a relatively common 
> pattern), as there's no provided configuration for the exchange beyond 
> exchange type.  (We stumbled on this because RabbitMqIO always declares a 
> non-durable exchange, which fails if the exchange already exists as a durable 
> exchange)
>  
> A solution to this would be to allow RabbitMqIO to read from an exchange 
> without declaring it.  This pattern is already available for queues via the 
> `queueDeclare` flag.  I propose an `exchangeDeclare` flag which preserves 
> existing behavior except for skipping the call to `exchangeDeclare` before 
> binding the queue to the exchange.
>  





[jira] [Work logged] (BEAM-8503) Improve TestBigQuery and TestPubsub

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8503?focusedWorklogId=337169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337169
 ]

ASF GitHub Bot logged work on BEAM-8503:


Author: ASF GitHub Bot
Created on: 01/Nov/19 03:57
Start Date: 01/Nov/19 03:57
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9880: [BEAM-8503] 
Improve TestBigQuery and TestPubsub
URL: https://github.com/apache/beam/pull/9880#issuecomment-548657485
 
 
   run sql postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 337169)
Time Spent: 2h  (was: 1h 50m)

> Improve TestBigQuery and TestPubsub
> ---
>
> Key: BEAM-8503
> URL: https://issues.apache.org/jira/browse/BEAM-8503
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add better support for E2E BigQuery and Pubsub testing:
> - TestBigQuery should have the ability to insert data into the underlying 
> table before a test.
> - TestPubsub should have the ability to subscribe to the underlying topic and 
> read messages that were written during a test.





[jira] [Updated] (BEAM-8539) Clearly define the valid job state transitions

2019-10-31 Thread Chad Dombrova (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chad Dombrova updated BEAM-8539:

Description: 
The Beam job state transitions are ill-defined, which is a big problem for 
anything that relies on the values coming from JobAPI.GetStateStream.

I was hoping to find something like a state transition diagram in the docs so 
that I could determine the start state, the terminal states, and the valid 
transitions, but I could not find this. The code reveals that the SDKs differ 
on the fundamentals:

Java InMemoryJobService:
 * start state: *STOPPED*
 * run: about to submit to executor:  STARTING
 * run: actually running on executor:  RUNNING
 * terminal states: DONE, FAILED, CANCELLED, DRAINED

Python AbstractJobServiceServicer / LocalJobServicer:
 * start state: STARTING
 * terminal states: DONE, FAILED, CANCELLED, *STOPPED*

I think it would be good to make python work like Java, so that there is a 
difference in state between a job that has been prepared and one that has 
additionally been run.

It's hard to tell how far this problem has spread within the various runners.  
I think a simple thing that can be done to help standardize behavior is to 
implement the terminal states as an enum in the beam_job_api.proto, or create a 
utility function in each language for checking if a state is terminal, so that 
it's not left up to each runner to reimplement this logic.
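The proposed utility can be sketched as a single place that knows which states are terminal. This is an illustration, not the actual beam_job_api.proto enum; the state names follow the Java InMemoryJobService convention described above, under which STOPPED is a start state rather than terminal (the exact discrepancy at issue).

```java
public class JobStateSketch {
  /** Hypothetical job-state enum with the terminal check built in. */
  enum JobState {
    STOPPED, STARTING, RUNNING, DONE, FAILED, CANCELLED, DRAINED;

    /** One shared definition, so runners never reimplement this logic. */
    boolean isTerminal() {
      switch (this) {
        case DONE:
        case FAILED:
        case CANCELLED:
        case DRAINED:
          return true;
        default:
          return false;
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(JobState.RUNNING.isTerminal()); // false
    System.out.println(JobState.DRAINED.isTerminal()); // true
  }
}
```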

 

  was:
The Beam job state transitions are ill-defined, which is a big problem for 
anything that relies on the values coming from JobAPI.GetStateStream.

I was hoping to find something like a state transition diagram in the docs so 
that I could determine the start state, the terminal states, and the valid 
transitions, but I could not find this. The code reveals that the SDKs differ 
on the fundamentals:

Java InMemoryJobService:
 * start state (prepared): STOPPED
 * about to submit to executor:  STARTING
 * actually running on executor:  RUNNING
 * terminal states: DONE, FAILED, CANCELLED, DRAINED

Python AbstractJobServiceServicer / LocalJobServicer:
 * start state: STARTING
 * terminal states: DONE, FAILED, CANCELLED, STOPPED

I think it would be good to make python work like Java, so that there is a 
difference in state between a job that has been prepared and one that has been 
run.

It's hard to tell how far this problem has spread within the various runners.  
I think a simple thing that can be done to help standardize behavior is to 
implement the terminal states as an enum in the beam_job_api.proto, or create a 
utility function in each language for checking if a state is terminal, so that 
it's not left up to each runner to reimplement this logic.

 


> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run: about to submit to executor:  STARTING
>  * run: actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  





[jira] [Updated] (BEAM-8539) Clearly define the valid job state transitions

2019-10-31 Thread Chad Dombrova (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chad Dombrova updated BEAM-8539:

Description: 
The Beam job state transitions are ill-defined, which is a big problem for 
anything that relies on the values coming from JobAPI.GetStateStream.

I was hoping to find something like a state transition diagram in the docs so 
that I could determine the start state, the terminal states, and the valid 
transitions, but I could not find this. The code reveals that the SDKs differ 
on the fundamentals:

Java InMemoryJobService:
 * start state: *STOPPED*
 * run - about to submit to executor:  STARTING
 * run - actually running on executor:  RUNNING
 * terminal states: DONE, FAILED, CANCELLED, DRAINED

Python AbstractJobServiceServicer / LocalJobServicer:
 * start state: STARTING
 * terminal states: DONE, FAILED, CANCELLED, *STOPPED*

I think it would be good to make python work like Java, so that there is a 
difference in state between a job that has been prepared and one that has 
additionally been run.

It's hard to tell how far this problem has spread within the various runners.  
I think a simple thing that can be done to help standardize behavior is to 
implement the terminal states as an enum in the beam_job_api.proto, or create a 
utility function in each language for checking if a state is terminal, so that 
it's not left up to each runner to reimplement this logic.

 

  was:
The Beam job state transitions are ill-defined, which is a big problem for 
anything that relies on the values coming from JobAPI.GetStateStream.

I was hoping to find something like a state transition diagram in the docs so 
that I could determine the start state, the terminal states, and the valid 
transitions, but I could not find this. The code reveals that the SDKs differ 
on the fundamentals:

Java InMemoryJobService:
 * start state: *STOPPED*
 * run: about to submit to executor:  STARTING
 * run: actually running on executor:  RUNNING
 * terminal states: DONE, FAILED, CANCELLED, DRAINED

Python AbstractJobServiceServicer / LocalJobServicer:
 * start state: STARTING
 * terminal states: DONE, FAILED, CANCELLED, *STOPPED*

I think it would be good to make python work like Java, so that there is a 
difference in state between a job that has been prepared and one that has 
additionally been run.

It's hard to tell how far this problem has spread within the various runners.  
I think a simple thing that can be done to help standardize behavior is to 
implement the terminal states as an enum in the beam_job_api.proto, or create a 
utility function in each language for checking if a state is terminal, so that 
it's not left up to each runner to reimplement this logic.

 


> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  





[jira] [Commented] (BEAM-8539) Clearly define the valid job state transitions

2019-10-31 Thread Chad Dombrova (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964580#comment-16964580
 ] 

Chad Dombrova commented on BEAM-8539:
-

[~lcwik], [~mxm] you might be interested in this.

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run: about to submit to executor:  STARTING
>  * run: actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  





[jira] [Created] (BEAM-8539) Clearly define the valid job state transitions

2019-10-31 Thread Chad Dombrova (Jira)
Chad Dombrova created BEAM-8539:
---

 Summary: Clearly define the valid job state transitions
 Key: BEAM-8539
 URL: https://issues.apache.org/jira/browse/BEAM-8539
 Project: Beam
  Issue Type: Improvement
  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
Reporter: Chad Dombrova


The Beam job state transitions are ill-defined, which is a big problem for 
anything that relies on the values coming from JobAPI.GetStateStream.

I was hoping to find something like a state transition diagram in the docs so 
that I could determine the start state, the terminal states, and the valid 
transitions, but I could not find this. The code reveals that the SDKs differ 
on the fundamentals:

Java InMemoryJobService:
 * start state (prepared): STOPPED
 * about to submit to executor:  STARTING
 * actually running on executor:  RUNNING
 * terminal states: DONE, FAILED, CANCELLED, DRAINED

Python AbstractJobServiceServicer / LocalJobServicer:
 * start state: STARTING
 * terminal states: DONE, FAILED, CANCELLED, STOPPED

I think it would be good to make python work like Java, so that there is a 
difference in state between a job that has been prepared and one that has been 
run.

It's hard to tell how far this problem has spread within the various runners.  
I think a simple thing that can be done to help standardize behavior is to 
implement the terminal states as an enum in the beam_job_api.proto, or create a 
utility function in each language for checking if a state is terminal, so that 
it's not left up to each runner to reimplement this logic.

 





[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337165
 ]

ASF GitHub Bot logged work on BEAM-8347:


Author: ASF GitHub Bot
Created on: 01/Nov/19 03:43
Start Date: 01/Nov/19 03:43
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9820: 
[BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark
URL: https://github.com/apache/beam/pull/9820#discussion_r341442887
 
 

 ##
 File path: 
sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java
 ##
 @@ -376,16 +389,11 @@ public void finalizeCheckpoint() throws IOException {
   this.source = source;
   this.current = null;
   this.checkpointMark = checkpointMark != null ? checkpointMark : new 
RabbitMQCheckpointMark();
-  try {
-connectionHandler = new ConnectionHandler(source.spec.uri());
-  } catch (Exception e) {
-throw new IOException(e);
-  }
 }
 
 @Override
 public Instant getWatermark() {
-  return checkpointMark.oldestTimestamp;
+  return checkpointMark.latestTimestamp;
 
 Review comment:
   Ideally, the watermark would be informed by the RabbitMQ broker's backlog. 
The goal is for the watermark to be a good prediction of the fact "no future 
records will have timestamps before the watermark's timestamp".
   
  My impression is that you have made the watermark advance more reliably, in 
good ways, but it will perhaps result in a large amount of data being marked late. 
What do you think?
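The trade-off in the comment above can be made concrete with a hypothetical idle-advance heuristic (this is a sketch of the general technique, not the PR's actual change): track the newest observed record timestamp, but when the source sits idle, let the watermark follow processing time minus a slack bound so downstream windows can eventually fire. A small slack keeps windows live; a larger slack reduces the risk of genuinely delayed records being marked late.

```java
import java.time.Duration;
import java.time.Instant;

public class WatermarkSketch {
  /** Hypothetical watermark holder combining observed data and idle advance. */
  static class IdleAdvancingWatermark {
    private Instant latestObserved = Instant.EPOCH;

    /** Record a new element's event timestamp. */
    void observe(Instant recordTimestamp) {
      if (recordTimestamp.isAfter(latestObserved)) {
        latestObserved = recordTimestamp;
      }
    }

    /** Watermark at the given processing time, with an idle slack bound. */
    Instant current(Instant processingTime, Duration slack) {
      Instant idleBound = processingTime.minus(slack);
      // Never fall behind data we have already seen; otherwise follow
      // processing time so the watermark keeps moving while idle.
      return latestObserved.isAfter(idleBound) ? latestObserved : idleBound;
    }
  }

  public static void main(String[] args) {
    IdleAdvancingWatermark wm = new IdleAdvancingWatermark();
    Instant t0 = Instant.parse("2019-11-01T00:00:00Z");
    wm.observe(t0);
    // After two idle minutes the watermark has still advanced:
    System.out.println(wm.current(t0.plusSeconds(120), Duration.ofSeconds(30)));
    // prints 2019-11-01T00:01:30Z
  }
}
```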
 



Issue Time Tracking
---

Worklog Id: (was: 337165)
Time Spent: 40m  (was: 0.5h)

> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
> -
>
> Key: BEAM-8347
> URL: https://issues.apache.org/jira/browse/BEAM-8347
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
> Environment: testing has been done using the DirectRunner. I also 
> have DataflowRunner available
>Reporter: Daniel Robert
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I stumbled upon this and then saw a similar StackOverflow post: 
> [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]
> When calling `advance()`, if there are no messages, no state changes occur, 
> including no changes to the CheckpointMark or Watermark. If there is a 
> relatively constant rate of new messages coming in, this is not a problem. If 
> data is bursty and there are periods with no new messages coming in, the 
> watermark will never advance.
> Contrast this with some of the logic in PubsubIO, which makes provisions 
> for periods of inactivity to advance the watermark (although it, too, is 
> imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )
> The example given in the StackOverflow post is something like this:
>  
> {code:java}
> pipeline
>   .apply(RabbitMqIO.read()
>   .withUri("amqp://guest:guest@localhost:5672")
>   .withQueue("test"))
>   .apply("Windowing", 
> Window.into(
>   FixedWindows.of(Duration.standardSeconds(10)))
> .triggering(AfterWatermark.pastEndOfWindow())
> .withAllowedLateness(Duration.ZERO)
> .accumulatingFiredPanes()){code}
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a 
> window that never performs an on time trigger.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337164
 ]

ASF GitHub Bot logged work on BEAM-8347:


Author: ASF GitHub Bot
Created on: 01/Nov/19 03:43
Start Date: 01/Nov/19 03:43
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9820: 
[BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark
URL: https://github.com/apache/beam/pull/9820#discussion_r341442438
 
 

 ##
 File path: 
sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java
 ##
 @@ -376,16 +389,11 @@ public void finalizeCheckpoint() throws IOException {
   this.source = source;
   this.current = null;
   this.checkpointMark = checkpointMark != null ? checkpointMark : new 
RabbitMQCheckpointMark();
-  try {
-connectionHandler = new ConnectionHandler(source.spec.uri());
 
 Review comment:
   My impression is that your change here is an improvement.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337164)
Time Spent: 40m  (was: 0.5h)

> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
> -
>
> Key: BEAM-8347
> URL: https://issues.apache.org/jira/browse/BEAM-8347
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
> Environment: testing has been done using the DirectRunner. I also 
> have DataflowRunner available
>Reporter: Daniel Robert
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I stumbled upon this and then saw a similar StackOverflow post: 
> [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]
> When calling `advance()`, if there are no messages, no state changes occur, 
> including no changes to the CheckpointMark or Watermark. If there is a 
> relatively constant rate of new messages coming in, this is not a problem. If 
> data is bursty and there are periods with no new messages coming in, the 
> watermark will never advance.
> Contrast this with some of the logic in PubsubIO, which makes provisions 
> for periods of inactivity to advance the watermark (although it, too, is 
> imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )
> The example given in the StackOverflow post is something like this:
>  
> {code:java}
> pipeline
>   .apply(RabbitMqIO.read()
>   .withUri("amqp://guest:guest@localhost:5672")
>   .withQueue("test"))
>   .apply("Windowing", 
> Window.into(
>   FixedWindows.of(Duration.standardSeconds(10)))
> .triggering(AfterWatermark.pastEndOfWindow())
> .withAllowedLateness(Duration.ZERO)
> .accumulatingFiredPanes()){code}
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a 
> window that never performs an on time trigger.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337163
 ]

ASF GitHub Bot logged work on BEAM-7434:


Author: ASF GitHub Bot
Created on: 01/Nov/19 03:38
Start Date: 01/Nov/19 03:38
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9900: 
[BEAM-7434] [BEAM-5895] and [BEAM-5894] Upgrade to rabbit amqp-client 5.x
URL: https://github.com/apache/beam/pull/9900
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337163)
Time Spent: 50m  (was: 40m)

> RabbitMqIO uses a deprecated API
> 
>
> Key: BEAM-7434
> URL: https://issues.apache.org/jira/browse/BEAM-7434
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Reporter: Nicolas Delsaux
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The RabbitMqIO reader (UnboundedRabbitMqReader) uses the 
> QueueingConsumer, which is marked as deprecated on the RabbitMQ side. RabbitMqIO 
> should replace this consumer with the DefaultConsumer provided by RabbitMQ.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337162
 ]

ASF GitHub Bot logged work on BEAM-7434:


Author: ASF GitHub Bot
Created on: 01/Nov/19 03:36
Start Date: 01/Nov/19 03:36
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9900: 
[BEAM-7434] [BEAM-5895] and [BEAM-5894] Upgrade to rabbit amqp-client 5.x
URL: https://github.com/apache/beam/pull/9900#discussion_r341442028
 
 

 ##
 File path: 
sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java
 ##
 @@ -462,12 +449,14 @@ public boolean start() throws IOException {
 @Override
 public boolean advance() throws IOException {
   try {
-QueueingConsumer.Delivery delivery = consumer.nextDelivery(1000);
+Channel channel = connectionHandler.getChannel();
+// we consume message without autoAck (we want to do the ack ourselves)
+GetResponse delivery = channel.basicGet(queueName, false);
 
 Review comment:
   @chamikaramj @dpmills @reuvenlax might have good guidance here
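
   Switching from QueueingConsumer to `basicGet` with autoAck disabled means the reader has to track delivery tags itself and acknowledge them only once the checkpoint is finalized, so an unfinalized bundle leaves the messages requeueable. A minimal sketch of that bookkeeping, with a stub `AckChannel` interface standing in for the real amqp-client `Channel` (the names here are illustrative, not the RabbitMqIO internals):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of manual-ack checkpointing: deliveries are recorded
 * by delivery tag and acked in bulk only when the runner finalizes the
 * checkpoint. Not the actual RabbitMqIO or amqp-client API.
 */
class CheckpointMarkSketch {
  /** Stand-in for the single ack call we need from the real Channel. */
  interface AckChannel {
    void basicAck(long deliveryTag);
  }

  private final List<Long> pendingTags = new ArrayList<>();

  /** Remember the tag of a message fetched without autoAck; do NOT ack yet. */
  void onDelivery(long deliveryTag) {
    pendingTags.add(deliveryTag);
  }

  /** Called once the bundle is durably committed: ack everything pending. */
  void finalizeCheckpoint(AckChannel channel) {
    for (long tag : pendingTags) {
      channel.basicAck(tag);
    }
    pendingTags.clear();
  }

  int pendingCount() {
    return pendingTags.size();
  }
}
```

   If the checkpoint is never finalized (e.g. the worker crashes), the unacked messages stay on the broker and are redelivered, which is the at-least-once behavior the comment above is asking about.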
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337162)
Time Spent: 40m  (was: 0.5h)

> RabbitMqIO uses a deprecated API
> 
>
> Key: BEAM-7434
> URL: https://issues.apache.org/jira/browse/BEAM-7434
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Reporter: Nicolas Delsaux
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The RabbitMqIO reader (UnboundedRabbitMqReader) uses the 
> QueueingConsumer, which is marked as deprecated on the RabbitMQ side. RabbitMqIO 
> should replace this consumer with the DefaultConsumer provided by RabbitMQ.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337161
 ]

ASF GitHub Bot logged work on BEAM-8347:


Author: ASF GitHub Bot
Created on: 01/Nov/19 03:34
Start Date: 01/Nov/19 03:34
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9820: [BEAM-8347]: 
Consistently advance UnboundedRabbitMqReader watermark
URL: https://github.com/apache/beam/pull/9820#issuecomment-548653719
 
 
   Even though 2.17.0 has been cut, there is still time to cherry-pick. Sorry 
for the long review delay.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337161)
Time Spent: 0.5h  (was: 20m)

> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
> -
>
> Key: BEAM-8347
> URL: https://issues.apache.org/jira/browse/BEAM-8347
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
> Environment: testing has been done using the DirectRunner. I also 
> have DataflowRunner available
>Reporter: Daniel Robert
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I stumbled upon this and then saw a similar StackOverflow post: 
> [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]
> When calling `advance()`, if there are no messages, no state changes occur, 
> including no changes to the CheckpointMark or Watermark. If there is a 
> relatively constant rate of new messages coming in, this is not a problem. If 
> data is bursty and there are periods with no new messages coming in, the 
> watermark will never advance.
> Contrast this with some of the logic in PubsubIO, which makes provisions 
> for periods of inactivity to advance the watermark (although it, too, is 
> imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )
> The example given in the StackOverflow post is something like this:
>  
> {code:java}
> pipeline
>   .apply(RabbitMqIO.read()
>   .withUri("amqp://guest:guest@localhost:5672")
>   .withQueue("test"))
>   .apply("Windowing", 
> Window.into(
>   FixedWindows.of(Duration.standardSeconds(10)))
> .triggering(AfterWatermark.pastEndOfWindow())
> .withAllowedLateness(Duration.ZERO)
> .accumulatingFiredPanes()){code}
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a 
> window that never performs an on time trigger.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8514) ZetaSql should use cost-based optimization to take advantage of Join Reordering Rule and Push-Down Rule

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8514?focusedWorklogId=337150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337150
 ]

ASF GitHub Bot logged work on BEAM-8514:


Author: ASF GitHub Bot
Created on: 01/Nov/19 02:50
Start Date: 01/Nov/19 02:50
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9874: 
[BEAM-8514] ZetaSql should use cost based optimization
URL: https://github.com/apache/beam/pull/9874
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337150)
Time Spent: 20m  (was: 10m)

> ZetaSql should use cost-based optimization to take advantage of Join 
> Reordering Rule and Push-Down Rule
> ---
>
> Key: BEAM-8514
> URL: https://issues.apache.org/jira/browse/BEAM-8514
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Default config should use BeamCostModel, as well as tests with custom 
> configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner

2019-10-31 Thread Thomas Weise (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964541#comment-16964541
 ] 

Thomas Weise commented on BEAM-8243:


Looking forward to seeing the gradle commands disappear from 
[https://beam.apache.org/documentation/runners/flink/], and it would be good to 
add back running the wordcount example with the FlinkRunner. That might look 
really user-friendly now!

> Document behavior of FlinkRunner
> 
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> The Flink runner guide should include a couple of details:
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make 
> this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=337139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337139
 ]

ASF GitHub Bot logged work on BEAM-5878:


Author: ASF GitHub Bot
Created on: 01/Nov/19 02:18
Start Date: 01/Nov/19 02:18
Worklog Time Spent: 10m 
  Work Description: lazylynx commented on issue #9686: [WIP][BEAM-5878] 
update dill min version to 0.3.1.1 and add test for functions with Keyword-only 
arguments
URL: https://github.com/apache/beam/pull/9686#issuecomment-548639816
 
 
   @tvalentyn Sure. Please wait a while.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337139)
Time Spent: 15.5h  (was: 15h 20m)

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python in 3.0 inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new method 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add a Py3-only unit tests that covers DoFn's with keyword-only 
> arguments once Beam Python 3 tests are in a good shape.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner

2019-10-31 Thread Thomas Weise (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964534#comment-16964534
 ] 

Thomas Weise commented on BEAM-8243:


[~ibzib] [~robertwb] is this going to cover documentation for BEAM-8372?

> Document behavior of FlinkRunner
> 
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> The Flink runner guide should include a couple of details:
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make 
> this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-31 Thread Thomas Weise (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated BEAM-8372:
---
Fix Version/s: 2.17.0

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability, portability-flink
> Fix For: 2.17.0
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-31 Thread Thomas Weise (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated BEAM-8372:
---
Labels: portability portability-flink  (was: )

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability, portability-flink
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=337135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337135
 ]

ASF GitHub Bot logged work on BEAM-8435:


Author: ASF GitHub Bot
Created on: 01/Nov/19 01:57
Start Date: 01/Nov/19 01:57
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9836: [BEAM-8435] 
Implement PaneInfo computation for Python.
URL: https://github.com/apache/beam/pull/9836#discussion_r341425717
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -1162,12 +1173,33 @@ def process_timer(self, window_id, unused_name, 
time_domain, timestamp,
 if self.trigger_fn.should_fire(time_domain, timestamp,
window, context):
   finished = self.trigger_fn.on_fire(timestamp, window, context)
-  yield self._output(window, finished, state)
+  yield self._output(window, finished, state, timestamp,
+ time_domain == TimeDomain.WATERMARK)
 else:
   raise Exception('Unexpected time domain: %s' % time_domain)
 
-  def _output(self, window, finished, state):
+  def _output(self, window, finished, state, watermark, maybe_ontime):
 """Output window and clean up if appropriate."""
+index = state.get_state(window, self.INDEX)
+state.add_state(window, self.INDEX, 1)
+if watermark <= window.max_timestamp():
+  nonspeculative_index = -1
+  timing = windowed_value.PaneInfoTiming.EARLY
+  if state.get_state(window, self.NONSPECULATIVE_INDEX):
+logging.warning('Watermark moved backwards in time.')
+else:
+  nonspeculative_index = state.get_state(window, self.NONSPECULATIVE_INDEX)
+  state.add_state(window, self.NONSPECULATIVE_INDEX, 1)
+  timing = (
+  windowed_value.PaneInfoTiming.ON_TIME
+  if maybe_ontime and nonspeculative_index == 0
 
 Review comment:
   `maybe_ontime` is False or null here, so the pane can never be on_time.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337135)
Time Spent: 40m  (was: 0.5h)

> Allow access to PaneInfo from Python DoFns
> --
>
> Key: BEAM-8435
> URL: https://issues.apache.org/jira/browse/BEAM-8435
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> PaneInfoParam exists, but the plumbing to actually populate it at runtime was 
> never added. (Nor, clearly, were any tests...)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=337134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337134
 ]

ASF GitHub Bot logged work on BEAM-8435:


Author: ASF GitHub Bot
Created on: 01/Nov/19 01:57
Start Date: 01/Nov/19 01:57
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9836: [BEAM-8435] 
Implement PaneInfo computation for Python.
URL: https://github.com/apache/beam/pull/9836#discussion_r341426171
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -1162,12 +1173,33 @@ def process_timer(self, window_id, unused_name, 
time_domain, timestamp,
 if self.trigger_fn.should_fire(time_domain, timestamp,
window, context):
   finished = self.trigger_fn.on_fire(timestamp, window, context)
-  yield self._output(window, finished, state)
+  yield self._output(window, finished, state, timestamp,
+ time_domain == TimeDomain.WATERMARK)
 else:
   raise Exception('Unexpected time domain: %s' % time_domain)
 
-  def _output(self, window, finished, state):
+  def _output(self, window, finished, state, watermark, maybe_ontime):
 """Output window and clean up if appropriate."""
+index = state.get_state(window, self.INDEX)
+state.add_state(window, self.INDEX, 1)
+if watermark <= window.max_timestamp():
+  nonspeculative_index = -1
+  timing = windowed_value.PaneInfoTiming.EARLY
+  if state.get_state(window, self.NONSPECULATIVE_INDEX):
+logging.warning('Watermark moved backwards in time.')
 
 Review comment:
   A session window's max_timestamp can move forward, which can result in this log message.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337134)

> Allow access to PaneInfo from Python DoFns
> --
>
> Key: BEAM-8435
> URL: https://issues.apache.org/jira/browse/BEAM-8435
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PaneInfoParam exists, but the plumbing to actually populate it at runtime was 
> never added. (Nor, clearly, were any tests...)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=337133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337133
 ]

ASF GitHub Bot logged work on BEAM-8435:


Author: ASF GitHub Bot
Created on: 01/Nov/19 01:57
Start Date: 01/Nov/19 01:57
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9836: [BEAM-8435] 
Implement PaneInfo computation for Python.
URL: https://github.com/apache/beam/pull/9836#discussion_r341420088
 
 

 ##
 File path: sdks/python/apache_beam/pipeline_test.py
 ##
 @@ -602,6 +603,12 @@ def test_timestamp_param_map(self):
   p | Create([1, 2]) | beam.Map(lambda _, t=DoFn.TimestampParam: t),
   equal_to([MIN_TIMESTAMP, MIN_TIMESTAMP]))
 
+  def test_pane_info_param(self):
 
 Review comment:
   We could add one more test with an actual pane.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337133)
Time Spent: 0.5h  (was: 20m)

> Allow access to PaneInfo from Python DoFns
> --
>
> Key: BEAM-8435
> URL: https://issues.apache.org/jira/browse/BEAM-8435
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PaneInfoParam exists, but the plumbing to actually populate it at runtime was 
> never added. (Nor, clearly, were any tests...)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8534) XlangParquetIOTest failing

2019-10-31 Thread Heejong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964492#comment-16964492
 ] 

Heejong Lee commented on BEAM-8534:
---

Looks like there's a mismatch between the published Docker images and the 
locally built ones. When we run the test, it automatically pulls the published 
images and overwrites the locally built ones.

> XlangParquetIOTest failing
> --
>
> Key: BEAM-8534
> URL: https://issues.apache.org/jira/browse/BEAM-8534
> Project: Beam
>  Issue Type: Sub-task
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Heejong Lee
>Priority: Major
>
>  13:43:05 [grpc-default-executor-1] ERROR 
> org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while 
> trying to handle InstructionRequest 10
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
>   at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
>   at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
>   at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
>   at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
>   at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.InvalidClassException: 
> org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class 
> incompatible: stream classdesc serialVersionUID = -7089438576249123133, local 
> class serialVersionUID = -7141898054594373712
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>   at 
> 

[jira] [Work logged] (BEAM-7303) Move Portable Runner and other out of reference runner.

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7303?focusedWorklogId=337124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337124
 ]

ASF GitHub Bot logged work on BEAM-7303:


Author: ASF GitHub Bot
Created on: 01/Nov/19 00:56
Start Date: 01/Nov/19 00:56
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9936: [BEAM-7303] 
Move PortableRunner from runners.reference to java.sdks.portability package
URL: https://github.com/apache/beam/pull/9936#discussion_r341418994
 
 

 ##
 File path: 
sdks/java/portability/src/main/java/org/apache/beam/sdks/java/portability/PortableRunnerRegistrar.java
 ##
 @@ -15,7 +15,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.beam.runners.reference;
+package org.apache.beam.sdks.java.portability;
 
 Review comment:
   This can also go to
   `package org.apache.beam.runners.portability;`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337124)
Time Spent: 0.5h  (was: 20m)

> Move Portable Runner and other out of reference runner.
> ---
>
> Key: BEAM-7303
> URL: https://issues.apache.org/jira/browse/BEAM-7303
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PortableRunner is used by all portable runners (Flink, Spark, ...).
> We should move it out of the Reference Runner package to streamline the 
> dependencies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=337122=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337122
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 01/Nov/19 00:45
Start Date: 01/Nov/19 00:45
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r341417808
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -65,23 +66,32 @@
  private static final List<String> TEST_DATA =
  Arrays.asList("Apple", "Orange", "Banana", "Orange");
 
-  // Data Table: used by testReadSketchFromBigQuery())
+  // Data Table: used by tests reading sketches from BigQuery
   // Schema: only one STRING field named "data".
-  // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
-  private static final String DATA_TABLE_ID = "hll_data";
   private static final String DATA_FIELD_NAME = "data";
   private static final String DATA_FIELD_TYPE = "STRING";
   private static final String QUERY_RESULT_FIELD_NAME = "sketch";
-  private static final Long EXPECTED_COUNT = 3L;
 
-  // Sketch Table: used by testWriteSketchToBigQuery()
+  // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
+  private static final String DATA_TABLE_ID_NON_EMPTY = "hll_data_non_empty";
+  private static final Long EXPECTED_COUNT_NON_EMPTY = 3L;
+
+  // Content: empty
+  private static final String DATA_TABLE_ID_EMPTY = "hll_data_empty";
 
 Review comment:
   Yes it does and I have tried it (although it is not mentioned in the 
BigQuery documentation).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337122)
Time Spent: 36h 10m  (was: 36h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 36h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=337118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337118
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 01/Nov/19 00:29
Start Date: 01/Nov/19 00:29
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r341415540
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -126,22 +145,49 @@ public static void deleteDataset() throws Exception {
   }
 
   /**
-   * Test that HLL++ sketch computed in BigQuery can be processed by Beam. Hll 
sketch is computed by
-   * {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test verifies 
that we can run {@link
-   * HllCount.MergePartial} and {@link HllCount.Extract} on the sketch in Beam 
to get the correct
-   * estimated count.
+   * Test that non-empty HLL++ sketch computed in BigQuery can be processed by 
Beam.
+   *
+   * Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read 
into Beam; the test
 
 Review comment:
   Fixed! Thanks for the link, that's a good read. :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337118)
Time Spent: 36h  (was: 35h 50m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 36h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2019-10-31 Thread Boyuan Zhang (Jira)
Boyuan Zhang created BEAM-8537:
--

 Summary: Provide WatermarkEstimatorProvider for different types of 
WatermarkEstimator
 Key: BEAM-8537
 URL: https://issues.apache.org/jira/browse/BEAM-8537
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core, sdk-py-harness
Reporter: Boyuan Zhang
Assignee: Boyuan Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8538) Python typehints: support namedtuples

2019-10-31 Thread Udi Meiri (Jira)
Udi Meiri created BEAM-8538:
---

 Summary: Python typehints: support namedtuples
 Key: BEAM-8538
 URL: https://issues.apache.org/jira/browse/BEAM-8538
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Udi Meiri
Assignee: Udi Meiri


Two bugs:
1. convert_to_beam_type passes typing.NamedTuple through as-is. The code that's 
supposed to convert it to Any is broken (and the test is buggy too).
Fix: 
https://github.com/apache/beam/pull/9754/files#diff-9b2eac354738047a44814d4b429cR247

2. Possible bug: collections.namedtuple instances should also be converted to 
Any, or Tuple[Any, Any, ...], if other static type checkers do this as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8536) Migrate usage of DelayedBundleApplication.requested_execution_time to time duration

2019-10-31 Thread Boyuan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyuan Zhang reassigned BEAM-8536:
--

Assignee: Boyuan Zhang

> Migrate usage of DelayedBundleApplication.requested_execution_time to time 
> duration 
> 
>
> Key: BEAM-8536
> URL: https://issues.apache.org/jira/browse/BEAM-8536
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-java-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>
> In DelayedBundleApplication, we used to use an absolute time to represent 
> the rescheduling time. We want to switch to a relative time duration, which 
> requires a migration in the Java SDK and the Dataflow Java runner harness.
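
The conversion described above can be sketched as follows. This is a minimal, hypothetical illustration of going from an absolute requested execution time to a relative delay; the class and method names are made up for the example and are not Beam APIs.

```java
import java.time.Duration;
import java.time.Instant;

public class ResumeDelay {
    // Hypothetical sketch of the migration described above: instead of
    // shipping an absolute resume timestamp, compute the relative delay
    // from "now" and send that duration to the runner.
    static Duration toRelativeDelay(Instant requestedExecutionTime, Instant now) {
        Duration delay = Duration.between(now, requestedExecutionTime);
        // Never report a negative delay; resume immediately instead.
        return delay.isNegative() ? Duration.ZERO : delay;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2019-11-01T00:00:00Z");
        // A requested time 30s in the future becomes a 30s delay.
        System.out.println(toRelativeDelay(Instant.parse("2019-11-01T00:00:30Z"), now)); // PT30S
        // A requested time in the past becomes a zero delay.
        System.out.println(toRelativeDelay(Instant.parse("2019-10-31T23:59:00Z"), now)); // PT0S
    }
}
```

A relative duration avoids clock-skew problems between the SDK harness and the runner, which is presumably part of the motivation for the migration.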



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=337114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337114
 ]

ASF GitHub Bot logged work on BEAM-8451:


Author: ASF GitHub Bot
Created on: 01/Nov/19 00:26
Start Date: 01/Nov/19 00:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9865: [BEAM-8451] 
Fix interactive beam max recursion err
URL: https://github.com/apache/beam/pull/9865
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337114)
Time Spent: 1.5h  (was: 1h 20m)

> Interactive Beam example failing from stack overflow
> 
>
> Key: BEAM-8451
> URL: https://issues.apache.org/jira/browse/BEAM-8451
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python, runner-py-interactive
>Reporter: Igor Durovic
>Assignee: Igor Durovic
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
>  
> RecursionError: maximum recursion depth exceeded in __instancecheck__
> at 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405]
>  
> This occurred after the execution of the last cell in 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=337116=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337116
 ]

ASF GitHub Bot logged work on BEAM-8451:


Author: ASF GitHub Bot
Created on: 01/Nov/19 00:26
Start Date: 01/Nov/19 00:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9865: [BEAM-8451] Fix 
interactive beam max recursion err
URL: https://github.com/apache/beam/pull/9865#issuecomment-548620441
 
 
   Thanks all!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337116)
Time Spent: 1h 40m  (was: 1.5h)

> Interactive Beam example failing from stack overflow
> 
>
> Key: BEAM-8451
> URL: https://issues.apache.org/jira/browse/BEAM-8451
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python, runner-py-interactive
>Reporter: Igor Durovic
>Assignee: Igor Durovic
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
>  
> RecursionError: maximum recursion depth exceeded in __instancecheck__
> at 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405]
>  
> This occurred after the execution of the last cell in 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=337110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337110
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 01/Nov/19 00:23
Start Date: 01/Nov/19 00:23
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r341414587
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -126,22 +145,49 @@ public static void deleteDataset() throws Exception {
   }
 
   /**
-   * Test that HLL++ sketch computed in BigQuery can be processed by Beam. Hll 
sketch is computed by
-   * {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test verifies 
that we can run {@link
-   * HllCount.MergePartial} and {@link HllCount.Extract} on the sketch in Beam 
to get the correct
-   * estimated count.
+   * Test that non-empty HLL++ sketch computed in BigQuery can be processed by 
Beam.
+   *
+   * Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read 
into Beam; the test
+   * verifies that we can run {@link HllCount.MergePartial} and {@link 
HllCount.Extract} on the
+   * sketch in Beam to get the correct estimated count.
+   */
+  @Test
+  public void testReadNonEmptySketchFromBigQuery() {
+readSketchFromBigQuery(DATA_TABLE_ID_NON_EMPTY, EXPECTED_COUNT_NON_EMPTY);
+  }
+
+  /**
+   * Test that empty HLL++ sketch computed in BigQuery can be processed by 
Beam.
+   *
+   * Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read 
into Beam; the test
+   * verifies that we can run {@link HllCount.MergePartial} and {@link 
HllCount.Extract} on the
+   * sketch in Beam to get the correct estimated count.
*/
   @Test
-  public void testReadSketchFromBigQuery() {
-String tableSpec = String.format("%s.%s", DATASET_ID, DATA_TABLE_ID);
+  public void testReadEmptySketchFromBigQuery() {
+readSketchFromBigQuery(DATA_TABLE_ID_EMPTY, EXPECTED_COUNT_EMPTY);
+  }
+
+  private void readSketchFromBigQuery(String tableId, Long expectedCount) {
+String tableSpec = String.format("%s.%s", DATASET_ID, tableId);
 String query =
 String.format(
 "SELECT HLL_COUNT.INIT(%s) AS %s FROM %s",
 DATA_FIELD_NAME, QUERY_RESULT_FIELD_NAME, tableSpec);
+
 SerializableFunction<SchemaAndRecord, byte[]> parseQueryResultToByteArray =
-(SchemaAndRecord schemaAndRecord) ->
-// BigQuery BYTES type corresponds to Java java.nio.ByteBuffer type
-((ByteBuffer) 
schemaAndRecord.getRecord().get(QUERY_RESULT_FIELD_NAME)).array();
+input -> {
+  // BigQuery BYTES type corresponds to Java java.nio.ByteBuffer type
+  ByteBuffer sketch = (ByteBuffer) 
input.getRecord().get(QUERY_RESULT_FIELD_NAME);
+  if (sketch == null) {
+// Empty sketch is represented by null in BigQuery and by empty 
byte array in Beam
+return new byte[0];
+  } else {
+byte[] result = new byte[sketch.remaining()];
 
 Review comment:
   Exactly. We know that is the case by looking into `Avro`'s implementation, 
but compiler does not know that, and it gives a warning if we use `.array()`.
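
The distinction behind this review thread can be shown in isolation. This is a standalone sketch (not the Beam test code itself): `ByteBuffer.array()` exposes the entire backing array regardless of position and limit, and throws for read-only or non-array-backed buffers, whereas `get(byte[])` copies exactly the remaining bytes; the null check mirrors the empty-sketch handling quoted above.

```java
import java.nio.ByteBuffer;

public class ByteBufferCopy {
    // Copy only the bytes between position and limit, handling the null
    // case the diff above describes (BigQuery represents an empty sketch
    // as null; Beam represents it as an empty byte array).
    static byte[] copyRemaining(ByteBuffer sketch) {
        if (sketch == null) {
            return new byte[0];
        }
        byte[] result = new byte[sketch.remaining()];
        sketch.get(result); // unlike array(), this respects position/limit
        return result;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        buf.get(); // advance position past the first byte
        System.out.println(copyRemaining(buf).length);  // 3, not 4
        System.out.println(copyRemaining(null).length); // 0
    }
}
```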
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337110)
Time Spent: 35h 50m  (was: 35h 40m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 35h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=337108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337108
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 01/Nov/19 00:21
Start Date: 01/Nov/19 00:21
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r341414290
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -96,28 +106,37 @@
   public static void prepareDatasetAndDataTable() throws Exception {
 
 Review comment:
   Ah, nice catch!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337108)
Time Spent: 35h 40m  (was: 35.5h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 35h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8521) beam_PostCommit_XVR_Flink failing

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8521?focusedWorklogId=337107=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337107
 ]

ASF GitHub Bot logged work on BEAM-8521:


Author: ASF GitHub Bot
Created on: 01/Nov/19 00:14
Start Date: 01/Nov/19 00:14
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9939: [BEAM-8521] only build 
Python 2.7 for xlang test
URL: https://github.com/apache/beam/pull/9939#issuecomment-548618170
 
 
   I think we should submit this change even if it will not fix the test. 
There's no reason to build all versions of the Python SDK, and doing so takes 
a lot of time.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337107)
Time Spent: 40m  (was: 0.5h)

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8521
> URL: https://issues.apache.org/jira/browse/BEAM-8521
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink/
> Edit: Made subtasks for what appear to be two separate issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=337105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337105
 ]

ASF GitHub Bot logged work on BEAM-8472:


Author: ASF GitHub Bot
Created on: 31/Oct/19 23:59
Start Date: 31/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9868: [BEAM-8472] get 
default GCP region option (Python)
URL: https://github.com/apache/beam/pull/9868
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337105)
Time Spent: 1.5h  (was: 1h 20m)

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if the --region flag is not set. The 
> Google Cloud SDK generally tries to infer a default value in this case for 
> convenience; we should follow the same order of precedence. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]
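
One way to implement the fallback described above is to ask the Cloud SDK itself. This is a hedged sketch, not the Beam implementation: the environment variable `CLOUDSDK_COMPUTE_REGION` and the `gcloud config get-value compute/region` command come from the Cloud SDK documentation linked above; the class and method names here are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class DefaultRegion {
    // Follow the Cloud SDK's precedence: environment variable first,
    // then the gcloud config, else empty (caller applies us-central1).
    static String defaultRegion() {
        String env = System.getenv("CLOUDSDK_COMPUTE_REGION");
        if (env != null && !env.isEmpty()) {
            return env;
        }
        try {
            Process p =
                new ProcessBuilder("gcloud", "config", "get-value", "compute/region").start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line = r.readLine();
                return line == null ? "" : line.trim();
            }
        } catch (Exception e) {
            // gcloud not installed or not on PATH; fall back to the default.
            return "";
        }
    }

    public static void main(String[] args) {
        System.out.println("region: " + defaultRegion());
    }
}
```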



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8252) (Python SDK) Add worker_region and worker_zone options

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8252?focusedWorklogId=337104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337104
 ]

ASF GitHub Bot logged work on BEAM-8252:


Author: ASF GitHub Bot
Created on: 31/Oct/19 23:55
Start Date: 31/Oct/19 23:55
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9594: [BEAM-8252] 
Python: add worker_region and worker_zone options
URL: https://github.com/apache/beam/pull/9594
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337104)
Time Spent: 40m  (was: 0.5h)

> (Python SDK) Add worker_region and worker_zone options
> --
>
> Key: BEAM-8252
> URL: https://issues.apache.org/jira/browse/BEAM-8252
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8254) (Java SDK) Add workerRegion and workerZone options

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8254?focusedWorklogId=337102=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337102
 ]

ASF GitHub Bot logged work on BEAM-8254:


Author: ASF GitHub Bot
Created on: 31/Oct/19 23:53
Start Date: 31/Oct/19 23:53
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9961: [BEAM-8254] add 
workerRegion and workerZone options to Java SDK
URL: https://github.com/apache/beam/pull/9961
 
 
   Same as #9594.
   

[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337090=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337090
 ]

ASF GitHub Bot logged work on BEAM-8491:


Author: ASF GitHub Bot
Created on: 31/Oct/19 23:04
Start Date: 31/Oct/19 23:04
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9912: 
[BEAM-8491] Add ability for replacing transforms with multiple outputs
URL: https://github.com/apache/beam/pull/9912#discussion_r341398835
 
 

 ##
 File path: sdks/python/apache_beam/pipeline_test.py
 ##
 @@ -438,6 +438,101 @@ def get_replacement_transform(self, ptransform):
   p.replace_all([override])
   self.assertEqual(pcoll.producer.inputs[0].element_type, expected_type)
 
+
+  def test_ptransform_override_multiple_outputs(self):
+class _MultiParDoComposite(PTransform):
 
 Review comment:
   Is this used ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337090)
Time Spent: 50m  (was: 40m)

> Add ability for multiple output PCollections from composites
> 
>
> Key: BEAM-8491
> URL: https://issues.apache.org/jira/browse/BEAM-8491
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Python SDK has DoOutputTuples, which allows a single transform to have 
> multiple outputs. However, this does not include the ability for a composite 
> transform to have multiple output PCollections from different transforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=337092=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337092
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 31/Oct/19 23:04
Start Date: 31/Oct/19 23:04
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9960: 
[BEAM-3288] Guard against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960
 
 
   Some triggers can cause data loss. This PR prevents them from being used.
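
The shape of such a construction-time guard can be sketched generically. This is a hypothetical minimal model, not Beam's actual `Trigger` class: the idea is that a trigger that "may finish" can silently drop data arriving afterwards, so an unsafe one is rejected when the pipeline is built rather than at runtime.

```java
public class TriggerSafety {
    // Hypothetical trigger model: the only property the check needs is
    // whether the trigger can finish (and thereby drop later data).
    interface Trigger {
        boolean mayFinish();
    }

    // Fail fast at construction time instead of losing data at runtime.
    static void validateSafe(Trigger t) {
        if (t.mayFinish()) {
            throw new IllegalArgumentException(
                "Unsafe trigger: may finish and drop subsequent data;"
                    + " wrap it so that it repeats forever.");
        }
    }

    public static void main(String[] args) {
        validateSafe(() -> false); // a never-finishing trigger is accepted
        try {
            validateSafe(() -> true); // a finishing trigger is rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected"); // prints "rejected"
        }
    }
}
```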
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337091=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337091
 ]

ASF GitHub Bot logged work on BEAM-8491:


Author: ASF GitHub Bot
Created on: 31/Oct/19 23:04
Start Date: 31/Oct/19 23:04
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9912: 
[BEAM-8491] Add ability for replacing transforms with multiple outputs
URL: https://github.com/apache/beam/pull/9912#discussion_r341396394
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
@@ -263,31 +262,29 @@ def _replace_if_needed(self, original_transform_node):
 
       new_output = replacement_transform.expand(input_node)
 
-      new_output.element_type = None
-      self.pipeline._infer_result_type(replacement_transform, inputs,
-                                       new_output)
-
+      if isinstance(new_output, pvalue.PValue):
+        new_output.element_type = None
+        self.pipeline._infer_result_type(replacement_transform, inputs,
+                                         new_output)
       replacement_transform_node.add_output(new_output)
-      if not new_output.producer:
-        new_output.producer = replacement_transform_node
-
-      # We only support replacing transforms with a single output with
-      # another transform that produces a single output.
-      # TODO: Support replacing PTransforms with multiple outputs.
-      if (len(original_transform_node.outputs) > 1 or
-          not isinstance(original_transform_node.outputs[None],
-                         (PCollection, PDone)) or
-          not isinstance(new_output, (PCollection, PDone))):
-        raise NotImplementedError(
-            'PTransform overriding is only supported for PTransforms that '
-            'have a single output. Tried to replace output of '
-            'AppliedPTransform %r with %r.'
-            % (original_transform_node, new_output))
 
       # Recording updated outputs. This cannot be done in the same visitor
       # since if we dynamically update output type here, we'll run into
       # errors when visiting child nodes.
-      output_map[original_transform_node.outputs[None]] = new_output
+      if isinstance(new_output, pvalue.PValue):
+        if not new_output.producer:
+          new_output.producer = replacement_transform_node
+        output_map[original_transform_node.outputs[None]] = new_output
+      elif isinstance(new_output, (pvalue.DoOutputsTuple, tuple)):
+        for pcoll in new_output:
+          if not pcoll.producer:
+            pcoll.producer = replacement_transform_node
+          output_map[original_transform_node.outputs[pcoll.tag]] = pcoll
 
 Review comment:
   nit: add a comment that we expect output tags of original and replacement 
transforms to be the same. 
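The tag-matching assumption the reviewer asks to be documented can be sketched in plain Python (hypothetical names, not the actual pipeline.py code): outputs of the original and replacement transforms are paired by tag, so both sides must produce the same set of tags.

```python
def map_replacement_outputs(original_outputs, new_outputs):
    """Pair outputs of an original transform with those of its replacement.

    original_outputs: dict mapping tag -> original output value.
    new_outputs: iterable of (tag, replacement output value) pairs.
    Returns a dict mapping original value -> replacement value.
    """
    output_map = {}
    for tag, new_value in new_outputs:
        # Raises KeyError if the replacement emits a tag the original lacked,
        # i.e. the "output tags must match" expectation is violated.
        output_map[original_outputs[tag]] = new_value
    return output_map
```

This mirrors the `output_map[original_transform_node.outputs[pcoll.tag]] = pcoll` line in the diff above.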
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337091)
Time Spent: 50m  (was: 40m)

> Add ability for multiple output PCollections from composites
> 
>
> Key: BEAM-8491
> URL: https://issues.apache.org/jira/browse/BEAM-8491
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Python SDK has DoOutputsTuple, which allows a single transform to have 
> multiple outputs. However, this does not include the ability for a composite 
> transform to have multiple output PCollections from different transforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-31 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8365.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
>     PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` containing only the fields specified in 
> the `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.
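The push-down contract above can be illustrated with a minimal sketch (plain Python, rows as dicts; an assumption for illustration, not the Beam SQL API): the IO receives the list of fields the Calc actually uses and returns rows restricted to those fields.

```python
def project(rows, field_names):
    """Keep only the requested fields of each row, in the requested order.

    This is the dict-based analogue of buildIOReader receiving a
    fieldNames list and emitting a narrowed schema.
    """
    return [{name: row[name] for name in field_names} for row in rows]
```

A Calc whose projects and condition touch only `field_names` can then run unchanged on the narrowed rows.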



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8533) test_java_expansion_portable_runner failing

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8533?focusedWorklogId=337074=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337074
 ]

ASF GitHub Bot logged work on BEAM-8533:


Author: ASF GitHub Bot
Created on: 31/Oct/19 22:14
Start Date: 31/Oct/19 22:14
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9956: [BEAM-8533] Revert 
"[BEAM-8442] Remove duplicate code for bundle register in Pyth…
URL: https://github.com/apache/beam/pull/9956#issuecomment-548590139
 
 
   CrossLanguageValidatesRunner failed only 1 test 
([BEAM-8534](https://issues.apache.org/jira/browse/BEAM-8534)), where before it 
was failing 2.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337074)
Time Spent: 20m  (was: 10m)

> test_java_expansion_portable_runner failing
> ---
>
> Key: BEAM-8533
> URL: https://issues.apache.org/jira/browse/BEAM-8533
> Project: Beam
>  Issue Type: Sub-task
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> probable cause: 
> https://github.com/apache/beam/pull/9842#issuecomment-548496295
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> Error received from SDK harness for instruction 72: Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 173, in _execute
>     response = task()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 196, in <lambda>
>     self._execute(lambda: worker.do_instruction(work), work)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 358, in do_instruction
>     request.instruction_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 378, in process_bundle
>     instruction_id, request.process_bundle_descriptor_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
>     self.fns[bundle_descriptor_id],
> KeyError: u'1-37'
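The KeyError in the quoted traceback is consistent with a bundle-descriptor lookup racing an asynchronous registration: `process_bundle` reads `self.fns[bundle_descriptor_id]` before the register request has landed. A minimal sketch of a waiting lookup (hypothetical names, not the actual harness code):

```python
import threading

class DescriptorCache:
    """Block get() until the descriptor id is registered, instead of
    failing with KeyError when registration happens on another thread."""

    def __init__(self):
        self._fns = {}
        self._cond = threading.Condition()

    def register(self, descriptor_id, fn):
        with self._cond:
            self._fns[descriptor_id] = fn
            self._cond.notify_all()

    def get(self, descriptor_id, timeout=10):
        with self._cond:
            # wait_for returns False if the predicate is still false at timeout.
            if not self._cond.wait_for(
                    lambda: descriptor_id in self._fns, timeout):
                raise KeyError(descriptor_id)
            return self._fns[descriptor_id]
```

The revert in this PR instead restores synchronous registration, which removes the race at the source.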



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8536) Migrate usage of DelayedBundleApplication.requested_execution_time to time duration

2019-10-31 Thread Boyuan Zhang (Jira)
Boyuan Zhang created BEAM-8536:
--

 Summary: Migrate usage of 
DelayedBundleApplication.requested_execution_time to time duration 
 Key: BEAM-8536
 URL: https://issues.apache.org/jira/browse/BEAM-8536
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow, sdk-java-harness
Reporter: Boyuan Zhang


In DelayedBundleApplication, we used to represent the rescheduling time as an 
absolute timestamp. We want to switch to a relative time duration, which 
requires a migration in the Java SDK and the Dataflow Java runner harness.
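The migration amounts to a small conversion at the boundary: given the old absolute rescheduling time and the current time, the new field carries their difference, clamped at zero. A sketch (illustrative, not the SDK code):

```python
from datetime import datetime, timedelta, timezone

def to_relative_delay(requested_execution_time, now=None):
    """Convert an absolute rescheduling time into a relative duration.

    A time already in the past maps to a zero delay (reschedule now).
    """
    now = now or datetime.now(timezone.utc)
    return max(requested_execution_time - now, timedelta(0))
```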



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-10-31 Thread Chad Dombrova (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964406#comment-16964406
 ] 

Chad Dombrova commented on BEAM-8523:
-

WIP: https://github.com/apache/beam/pull/9959

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  
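The submitted/started/completed timestamps listed above fall out naturally if the job service records a (state, timestamp) pair at every transition. A hedged sketch of such a tracker (hypothetical names, not the Beam JobService API):

```python
import time

class JobStateTracker:
    """Record a timestamp at each state transition so a late-connecting
    client can recover when the job was submitted, started, or completed."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self.history = []  # list of (state, timestamp) in transition order

    def transition(self, state):
        self.history.append((state, self._clock()))

    def timestamp_of(self, state):
        """Return the first time the job entered `state`, or None."""
        for s, ts in self.history:
            if s == state:
                return ts
        return None
```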



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337072=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337072
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 31/Oct/19 21:57
Start Date: 31/Oct/19 21:57
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9959: WIP: 
[BEAM-8523] JobAPI: Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959
 
 
   When connecting to GetStateStream, the job service will first stream the 
history of missed state events, thereby catching up the client.
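The catch-up behaviour described here is a replay-then-tail pattern: the stream first yields the recorded history, then switches to live events. A minimal generator sketch (illustrative, not the job service implementation):

```python
def state_stream(history, live_events):
    """Yield missed state events first, then continue with live ones,
    so a newly connected client sees a complete, ordered sequence."""
    for event in history:
        yield event
    for event in live_events:
        yield event
```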
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-8532) Beam Python trigger driver sets incorrect timestamp for output windows.

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8532?focusedWorklogId=337071=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337071
 ]

ASF GitHub Bot logged work on BEAM-8532:


Author: ASF GitHub Bot
Created on: 31/Oct/19 21:49
Start Date: 31/Oct/19 21:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9958: [BEAM-8532] 
Output timestamp should be the inclusive, not exclusive, end of window.
URL: https://github.com/apache/beam/pull/9958
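The fix in the PR title reduces to one line of arithmetic: a window covers the half-open interval [start, end), so its output should be stamped at the largest timestamp inside the window, not at the exclusive end. A sketch, assuming (for illustration) a one-microsecond timestamp granularity:

```python
GRANULARITY_US = 1  # assumed granularity of timestamps, in microseconds

def output_timestamp(window_end_us):
    """Largest timestamp still inside a window ending (exclusively) at
    window_end_us: the inclusive end, i.e. end minus one granularity unit."""
    return window_end_us - GRANULARITY_US
```

Stamping at the exclusive end would place the output in the *next* window, which is the bug being corrected.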
 
 
   
   
   
   

[jira] [Work logged] (BEAM-8503) Improve TestBigQuery and TestPubsub

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8503?focusedWorklogId=337065=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337065
 ]

ASF GitHub Bot logged work on BEAM-8503:


Author: ASF GitHub Bot
Created on: 31/Oct/19 21:31
Start Date: 31/Oct/19 21:31
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9880: [BEAM-8503] 
Improve TestBigQuery and TestPubsub
URL: https://github.com/apache/beam/pull/9880#issuecomment-548577455
 
 
   Thanks that's a great tip! Here I thought people were just adding that 
"fixup!" for fun, I didn't realize it worked with `git rebase --autosquash`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337065)
Time Spent: 1h 50m  (was: 1h 40m)

> Improve TestBigQuery and TestPubsub
> ---
>
> Key: BEAM-8503
> URL: https://issues.apache.org/jira/browse/BEAM-8503
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Add better support for E2E BigQuery and Pubsub testing:
> - TestBigQuery should have the ability to insert data into the underlying 
> table before a test.
> - TestPubsub should have the ability to subscribe to the underlying topic and 
> read messages that were written during a test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8535) TextIO.read doesn't work with single wildcard with relative path

2019-10-31 Thread Tim (Jira)
Tim created BEAM-8535:
-

 Summary: TextIO.read doesn't work with single wildcard with 
relative path
 Key: BEAM-8535
 URL: https://issues.apache.org/jira/browse/BEAM-8535
 Project: Beam
  Issue Type: Bug
  Components: beam-model, io-java-files
Affects Versions: 2.16.0
 Environment: Mac High Sierra 10.13.6.   DirectRunner local. 
Reporter: Tim


It looks like the TextIO.read transform is not matching files when the glob 
wildcard starts with a * and the path is relative, i.e. /full/path/* and 
./path/f* work but ./path/* does not.

Reproduction steps using the word count example from the Beam Quick start for 
current version 2.16 ([https://beam.apache.org/get-started/quickstart-java/]) - 
{code:java}
$ mkdir test-folder && cp pom.xml ./test-folder
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
>      -Dexec.args="--inputFile=./test-folder/* --output=counts" -Pdirect-runner


{code}
 The above fails, although it is expected to find the pom.xml file. I tested 
the same way with 2.15 and it works as expected.
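One way the leading "./" can break wildcard matching is when patterns and paths are compared textually without normalization; `fnmatch` makes the asymmetry easy to see (a plain-Python illustration of the symptom, not Beam's actual FileSystems matcher):

```python
import fnmatch

def matches(path, pattern):
    """Textual glob match; no path normalization is applied, so
    './path/pom.xml' does not match 'path/*' and vice versa."""
    return fnmatch.fnmatch(path, pattern)
```

If either side is normalized (e.g. with `os.path.normpath`) before matching, the discrepancy disappears, which is consistent with this being a regression in how the relative prefix is handled.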



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=337060=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337060
 ]

ASF GitHub Bot logged work on BEAM-8472:


Author: ASF GitHub Bot
Created on: 31/Oct/19 21:19
Start Date: 31/Oct/19 21:19
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9868: [BEAM-8472] get default 
GCP region option (Python)
URL: https://github.com/apache/beam/pull/9868#issuecomment-548573257
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337060)
Time Spent: 1h 20m  (was: 1h 10m)

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if --region flag is not set. The Google 
> Cloud SDK generally tries to get a default value in this case for 
> convenience, which we should follow. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]
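The precedence order the issue asks for can be sketched as a simple fallback chain (illustrative only; `gcloud_config` stands in for shelling out to `gcloud config get-value compute/region`, and the exact sources Beam consults are an assumption here):

```python
import os

def default_region(flag_value=None, environ=None, gcloud_config=None,
                   fallback="us-central1"):
    """Resolve the region: explicit --region flag, then the gcloud
    environment variable, then gcloud's configured default, then the
    hardcoded fallback Beam uses today."""
    environ = environ if environ is not None else os.environ
    return (flag_value
            or environ.get("CLOUDSDK_COMPUTE_REGION")
            or gcloud_config
            or fallback)
```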



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8503) Improve TestBigQuery and TestPubsub

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8503?focusedWorklogId=337057=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337057
 ]

ASF GitHub Bot logged work on BEAM-8503:


Author: ASF GitHub Bot
Created on: 31/Oct/19 21:17
Start Date: 31/Oct/19 21:17
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9880: 
[BEAM-8503] Improve TestBigQuery and TestPubsub
URL: https://github.com/apache/beam/pull/9880#discussion_r341369467
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java
 ##
 @@ -172,6 +193,65 @@ public void publish(List<PubsubMessage> messages) throws IOException {
     pubsub.publish(eventsTopicPath, outgoingMessages);
   }
 
+  /** Pull up to 100 messages from {@link #subscriptionPath()}. */
+  public List<PubsubMessage> pull() throws IOException {
+    return pull(100);
+  }
+
+  /** Pull up to {@code maxBatchSize} messages from {@link #subscriptionPath()}. */
+  public List<PubsubMessage> pull(int maxBatchSize) throws IOException {
+    return pubsub.pull(0, subscriptionPath, 100, true).stream()
 
 Review comment:
   Ah I see. I dismissed the asynchronous option since it would require some 
sort of synchronization, but that may be simpler. I can look at doing that as a 
follow-up.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337057)
Time Spent: 1h 40m  (was: 1.5h)

> Improve TestBigQuery and TestPubsub
> ---
>
> Key: BEAM-8503
> URL: https://issues.apache.org/jira/browse/BEAM-8503
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add better support for E2E BigQuery and Pubsub testing:
> - TestBigQuery should have the ability to insert data into the underlying 
> table before a test.
> - TestPubsub should have the ability to subscribe to the underlying topic and 
> read messages that were written during a test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8533) test_java_expansion_portable_runner failing

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8533?focusedWorklogId=337055=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337055
 ]

ASF GitHub Bot logged work on BEAM-8533:


Author: ASF GitHub Bot
Created on: 31/Oct/19 21:15
Start Date: 31/Oct/19 21:15
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9956: [BEAM-8533] Revert 
"[BEAM-8442] Remove duplicate code for bundle register in Pyth…
URL: https://github.com/apache/beam/pull/9956#issuecomment-548571667
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337055)
Remaining Estimate: 0h
Time Spent: 10m

> test_java_expansion_portable_runner failing
> ---
>
> Key: BEAM-8533
> URL: https://issues.apache.org/jira/browse/BEAM-8533
> Project: Beam
>  Issue Type: Sub-task
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> probable cause: 
> https://github.com/apache/beam/pull/9842#issuecomment-548496295
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> Error received from SDK harness for instruction 72: Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 173, in _execute
>     response = task()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 196, in <lambda>
>     self._execute(lambda: worker.do_instruction(work), work)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 358, in do_instruction
>     request.instruction_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 378, in process_bundle
>     instruction_id, request.process_bundle_descriptor_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
>     self.fns[bundle_descriptor_id],
> KeyError: u'1-37'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=337051=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337051
 ]

ASF GitHub Bot logged work on BEAM-8451:


Author: ASF GitHub Bot
Created on: 31/Oct/19 21:08
Start Date: 31/Oct/19 21:08
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9865: [BEAM-8451] Fix 
interactive beam max recursion err
URL: https://github.com/apache/beam/pull/9865#issuecomment-548569362
 
 
   Hey @pabloem is this good to merge?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337051)
Time Spent: 1h 20m  (was: 1h 10m)

> Interactive Beam example failing from stack overflow
> 
>
> Key: BEAM-8451
> URL: https://issues.apache.org/jira/browse/BEAM-8451
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python, runner-py-interactive
>Reporter: Igor Durovic
>Assignee: Igor Durovic
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
>  
> RecursionError: maximum recursion depth exceeded in __instancecheck__
> at 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405]
>  
> This occurred after the execution of the last cell in 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb]
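A RecursionError in pipeline analysis is typically fixed by replacing recursive graph traversal with an explicit stack. A hedged sketch of the pattern (hypothetical `inputs` attribute; not the actual pipeline_analyzer code):

```python
def iter_producers(pcoll):
    """Walk the producer graph iteratively, yielding each node once.

    An explicit stack replaces recursion, so arbitrarily deep pipelines
    cannot exceed Python's recursion limit.
    """
    seen = set()
    stack = [pcoll]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        yield node
        stack.extend(getattr(node, 'inputs', ()))
```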



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=337053=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337053
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 31/Oct/19 21:09
Start Date: 31/Oct/19 21:09
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9956: Revert 
"[BEAM-8442] Remove duplicate code for bundle register in Pyth…
URL: https://github.com/apache/beam/pull/9956
 
 
   …on SDK harness"
   
   This reverts commit cce7a3bf30caed118db7bdc9f212117355853695.
   Asynchronous register was causing tests to fail 
([BEAM-8533](https://issues.apache.org/jira/browse/BEAM-8533))
   
   @mxm @sunjincheng121 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 

[jira] [Updated] (BEAM-8521) beam_PostCommit_XVR_Flink failing

2019-10-31 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8521:
--
Description: 
https://builds.apache.org/job/beam_PostCommit_XVR_Flink/

Edit: Made subtasks for what appear to be two separate issues.

  was:
https://builds.apache.org/job/beam_PostCommit_XVR_Flink/

There appear to be two failures:

test_java_expansion_portable_runner:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction 72: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 173, in _execute
    response = task()
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 196, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 358, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 378, in process_bundle
    instruction_id, request.process_bundle_descriptor_id)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
    self.fns[bundle_descriptor_id],
KeyError: u'1-37'

 XlangParquetIOTest:

[grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
    at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
    at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
    at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
    at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
    at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
    at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
    at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
    at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
    at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
    at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
    at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at

[jira] [Updated] (BEAM-8533) test_java_expansion_portable_runner failing

2019-10-31 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8533:
--
Status: Open  (was: Triage Needed)

> test_java_expansion_portable_runner failing
> ---
>
> Key: BEAM-8533
> URL: https://issues.apache.org/jira/browse/BEAM-8533
> Project: Beam
>  Issue Type: Sub-task
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> probable cause: 
> https://github.com/apache/beam/pull/9842#issuecomment-548496295
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction 72: Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 173, in _execute
>     response = task()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 196, in <lambda>
>     self._execute(lambda: worker.do_instruction(work), work)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 358, in do_instruction
>     request.instruction_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 378, in process_bundle
>     instruction_id, request.process_bundle_descriptor_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
>     self.fns[bundle_descriptor_id],
> KeyError: u'1-37'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8530) Dataflow portable runner fails timer ordering tests

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8530?focusedWorklogId=337041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337041
 ]

ASF GitHub Bot logged work on BEAM-8530:


Author: ASF GitHub Bot
Created on: 31/Oct/19 21:00
Start Date: 31/Oct/19 21:00
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9951: 
[BEAM-8530] disable UsesStrictEventTimeOrdering for portable dataflow
URL: https://github.com/apache/beam/pull/9951
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337041)
Time Spent: 40m  (was: 0.5h)

> Dataflow portable runner fails timer ordering tests
> ---
>
> Key: BEAM-8530
> URL: https://issues.apache.org/jira/browse/BEAM-8530
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.17.0
>Reporter: Jan Lukavský
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> According to 
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8534) XlangParquetIOTest failing

2019-10-31 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8534:
-

 Summary: XlangParquetIOTest failing
 Key: BEAM-8534
 URL: https://issues.apache.org/jira/browse/BEAM-8534
 Project: Beam
  Issue Type: Sub-task
  Components: test-failures
Reporter: Kyle Weaver
Assignee: Heejong Lee


[grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
    at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
    at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
    at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
    at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
    at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
    at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
    at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
    at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
    at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
    at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
    at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
    at

[jira] [Created] (BEAM-8533) test_java_expansion_portable_runner failing

2019-10-31 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8533:
-

 Summary: test_java_expansion_portable_runner failing
 Key: BEAM-8533
 URL: https://issues.apache.org/jira/browse/BEAM-8533
 Project: Beam
  Issue Type: Sub-task
  Components: test-failures
Reporter: Kyle Weaver
Assignee: Kyle Weaver


probable cause: https://github.com/apache/beam/pull/9842#issuecomment-548496295

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction 72: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 173, in _execute
    response = task()
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 196, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 358, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 378, in process_bundle
    instruction_id, request.process_bundle_descriptor_id)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
    self.fns[bundle_descriptor_id],
KeyError: u'1-37'





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337033
 ]

ASF GitHub Bot logged work on BEAM-8491:


Author: ASF GitHub Bot
Created on: 31/Oct/19 20:40
Start Date: 31/Oct/19 20:40
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9912: [BEAM-8491] Add 
ability for replacing transforms with multiple outputs
URL: https://github.com/apache/beam/pull/9912#issuecomment-548559353
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337033)
Time Spent: 40m  (was: 0.5h)

> Add ability for multiple output PCollections from composites
> 
>
> Key: BEAM-8491
> URL: https://issues.apache.org/jira/browse/BEAM-8491
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The Python SDK has DoOutputTuples which allows for a single transform to have 
> multiple outputs. However, this does not include the ability for a composite 
> transform to have multiple outputs PCollections from different transforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337028
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 31/Oct/19 20:23
Start Date: 31/Oct/19 20:23
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9892: [BEAM-8427] [SQL] 
buildIOWrite from MongoDb Table
URL: https://github.com/apache/beam/pull/9892#issuecomment-548553421
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337028)
Time Spent: 3.5h  (was: 3h 20m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337027
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 31/Oct/19 20:23
Start Date: 31/Oct/19 20:23
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9892: [BEAM-8427] [SQL] 
buildIOWrite from MongoDb Table
URL: https://github.com/apache/beam/pull/9892#issuecomment-548553421
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337027)
Time Spent: 3h 20m  (was: 3h 10m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8532) Beam Python trigger driver sets incorrect timestamp for output windows.

2019-10-31 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8532:
-

 Summary: Beam Python trigger driver sets incorrect timestamp for 
output windows.
 Key: BEAM-8532
 URL: https://issues.apache.org/jira/browse/BEAM-8532
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Robert Bradshaw
Assignee: Robert Bradshaw


The timestamp should lie in the window, otherwise re-windowing will not be 
idempotent. 

https://github.com/apache/beam/blob/release-2.16.0/sdks/python/apache_beam/transforms/trigger.py#L1183
 should be using {{window.max_timestamp()}} rather than {{.end}}
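The distinction matters because an interval window is half-open: its `end` is exclusive, so stamping output at `end` lands in the *next* window, while `max_timestamp()` (the last timestamp still inside the window) keeps re-windowing idempotent. A minimal sketch with toy classes (not Beam's actual `IntervalWindow`):

```python
class IntervalWindow:
    """Toy half-open window [start, end); not Beam's real class."""
    def __init__(self, start, end):
        self.start, self.end = start, end

    def max_timestamp(self):
        # Largest timestamp still inside the half-open interval.
        return self.end - 1

    def contains(self, t):
        return self.start <= t < self.end

def assign_fixed_window(t, size=10):
    # Assign t to the fixed window of the given size that contains it.
    start = (t // size) * size
    return IntervalWindow(start, start + size)

w = assign_fixed_window(3)      # window [0, 10)
bad = w.end                     # 10 -> falls into the next window [10, 20)
good = w.max_timestamp()        # 9  -> stays inside [0, 10)
assert not w.contains(bad)
assert w.contains(good)
# Re-windowing the max_timestamp yields the same window: idempotent.
assert assign_fixed_window(good).start == w.start
```

With `end` the re-windowed element would silently migrate to the following window, which is exactly the non-idempotence the bug report describes.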



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337018
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 31/Oct/19 20:09
Start Date: 31/Oct/19 20:09
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9806: [BEAM-8427] 
Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r341343191
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.io.Serializable;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
+import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.JsonToRow;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollection.IsBounded;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.Row;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting;
+import org.bson.Document;
+import org.bson.json.JsonMode;
+import org.bson.json.JsonWriterSettings;
+
+@Experimental
+public class MongoDbTable extends SchemaBaseBeamTable implements Serializable {
+  // Should match: 
mongodb://username:password@localhost:27017/database/collection
+  @VisibleForTesting
+  final Pattern locationPattern =
+  Pattern.compile(
+  
"(?mongodb://(?.*:.*@)?.+:\\d+)/(?.+)/(?.+)");
+
+  @VisibleForTesting final String dbCollection;
+  @VisibleForTesting final String dbName;
+  @VisibleForTesting final String dbUri;
+
+  MongoDbTable(Table table) {
+super(table.getSchema());
+
+String location = table.getLocation();
+Matcher matcher = locationPattern.matcher(location);
+checkArgument(
+matcher.matches(),
+"MongoDb location must be in the following format: 
'mongodb://(username:password@)?localhost:27017/database/collection'");
 
 Review comment:
   I see what you mean; replaced `?` with `[` `]` to be consistent with the 
MongoDbIO format.
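The location-parsing approach discussed in this review can be sketched with Python's `re` named groups. This is an illustrative sketch only: the group names and character classes below are assumptions, not the actual Java pattern from `MongoDbTable.java`.

```python
import re

# Hypothetical pattern mirroring the discussed location format:
#   mongodb://[username:password@]host:port/database/collection
LOCATION = re.compile(
    r"(?P<uri>mongodb://(?P<creds>[^:@/]+:[^@/]+@)?[^/]+:\d+)"
    r"/(?P<database>[^/]+)/(?P<collection>[^/]+)")

m = LOCATION.match("mongodb://user:pass@localhost:27017/db/coll")
assert m is not None
assert m.group("uri") == "mongodb://user:pass@localhost:27017"
assert m.group("database") == "db"
assert m.group("collection") == "coll"

# Credentials are optional, matching the '[username:password@]' form.
m2 = LOCATION.match("mongodb://localhost:27017/db/coll")
assert m2 is not None and m2.group("creds") is None
```

Named groups keep the extraction readable and make a malformed location fail fast with a clear error, which is the same motivation as the `checkArgument` call in the diff above.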
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337018)
Time Spent: 3h 10m  (was: 3h)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8531) DoFnTester should either be removed from the documentation or undeprecated.

2019-10-31 Thread Daniel Collins (Jira)
Daniel Collins created BEAM-8531:


 Summary: DoFnTester should either be removed from the 
documentation or undeprecated.
 Key: BEAM-8531
 URL: https://issues.apache.org/jira/browse/BEAM-8531
 Project: Beam
  Issue Type: Bug
  Components: beam-community, beam-model
Reporter: Daniel Collins
Assignee: Aizhamal Nurmamat kyzy


DoFnTester is the first method described in the "Test Your Pipeline" section 
[https://beam.apache.org/documentation/pipelines/test-your-pipeline/], but it 
is deprecated. Either the documentation should stop recommending it, or the 
class should be un-deprecated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=337016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337016
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 31/Oct/19 19:51
Start Date: 31/Oct/19 19:51
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9056: [BEAM-7746] 
Add python type hints
URL: https://github.com/apache/beam/pull/9056#discussion_r340904610
 
 

 ##
 File path: sdks/python/.pylintrc
 ##
 @@ -143,6 +143,7 @@ disable =
   unnecessary-lambda,
   unnecessary-pass,
   unneeded-not,
+  unsubscriptable-object,
 
 Review comment:
   ```python
    class _ListBuffer(List[bytes]):
      """Used to support partitioning of a list."""
      def partition(self, n):
        # type: (int) -> List[List[bytes]]
        return [self[k::n] for k in range(n)]  # <-- ERROR HERE
   ```
   
   ```python
   
    class SdfProcessSizedElements(DoOperation):
      ...
      def progress_metrics(self):
        # type: () -> beam_fn_api_pb2.Metrics.PTransform
        with self.lock:
          metrics = super(SdfProcessSizedElements, self).progress_metrics()
          current_element_progress = self.current_element_progress()
        if current_element_progress:
          assert self.input_info is not None
          metrics.active_elements.measured.input_element_counts[
              self.input_info[1]] = 1  # <-- ERROR HERE
          metrics.active_elements.fraction_remaining = (
              current_element_progress.fraction_remaining)
        return metrics
   ```
   
   Generally speaking, there are a handful of checks that overlap between mypy 
and pylint where pylint does a worse job.  In the second example above, 
`self.input_info` is an `Optional[Tuple]` that's initialized to `None`, so 
pylint thinks it's unsubscriptable, but it's initialized to a tuple elsewhere 
and there's a line just above the error that says `assert self.input_info is 
not None`, so it obviously is in this context.  These are all subtleties that 
mypy understands but pylint does not, so I think we'll see a trend of moving 
responsibility for these structural tests to mypy.
   
   > Perhaps a newer version of pylint could help?
   
   I'm already using the latest and greatest (one of the commits here ticks the 
version up from 2.4.2 to 2.4.3)
   
   
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337016)
Time Spent: 16h 50m  (was: 16h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=337015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337015
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 31/Oct/19 19:50
Start Date: 31/Oct/19 19:50
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9056: [BEAM-7746] 
Add python type hints
URL: https://github.com/apache/beam/pull/9056#discussion_r340904610
 
 

 ##
 File path: sdks/python/.pylintrc
 ##
 @@ -143,6 +143,7 @@ disable =
   unnecessary-lambda,
   unnecessary-pass,
   unneeded-not,
+  unsubscriptable-object,
 
 Review comment:
   ```python
   class _ListBuffer(List[bytes]):
     """Used to support partitioning of a list."""
     def partition(self, n):
       # type: (int) -> List[List[bytes]]
       return [self[k::n] for k in range(n)]  # <-- ERROR HERE
   ```
   
   ```python
   class SdfProcessSizedElements(DoOperation):
     ...
     def progress_metrics(self):
       # type: () -> beam_fn_api_pb2.Metrics.PTransform
       with self.lock:
         metrics = super(SdfProcessSizedElements, self).progress_metrics()
         current_element_progress = self.current_element_progress()
       if current_element_progress:
         assert self.input_info is not None
         metrics.active_elements.measured.input_element_counts[
             self.input_info[1]] = 1  # <-- ERROR HERE
         metrics.active_elements.fraction_remaining = (
             current_element_progress.fraction_remaining)
       return metrics
   ```
   
   Generally speaking, there are a handful of checks that overlap between mypy 
and pylint where pylint does a worse job.  In the second example above, 
`self.input_info` is an `Optional[Tuple]` that's initialized to `None`, so 
pylint thinks it's unsubscriptable, but it's initialized to a tuple elsewhere 
and there's a line just above the error that says `assert self.input_info is 
not None`, so it obviously is in this context.  These are all subtleties that 
mypy understands but pylint does not.
   
   > Perhaps a newer version of pylint could help?
   
   I'm already using the latest and greatest (one of the commits here ticks the 
version up from 2.4.2 to 2.4.3)
   
   
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337015)
Time Spent: 16h 40m  (was: 16.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=337014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337014
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 31/Oct/19 19:48
Start Date: 31/Oct/19 19:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-548540370
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337014)
Time Spent: 19.5h  (was: 19h 20m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=337013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337013
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 31/Oct/19 19:47
Start Date: 31/Oct/19 19:47
Worklog Time Spent: 10m 
  Work Description: MattMorgis commented on pull request #9955: [BEAM-2572] 
Python SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955
 
 
   Co-authored-by: Matthew Morgis 
   Co-authored-by: Tamera Lanham 
   
   This adds an AWS S3 file system implementation to the python SDK. 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337009
 ]

ASF GitHub Bot logged work on BEAM-8513:


Author: ASF GitHub Bot
Created on: 31/Oct/19 19:36
Start Date: 31/Oct/19 19:36
Worklog Time Spent: 10m 
  Work Description: drobert commented on issue #9937: [BEAM-8513] Allow 
reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#issuecomment-548535598
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337009)
Time Spent: 1h 50m  (was: 1h 40m)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the 
> exchange
> 
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
> Environment: testing with DirectRunner
>Reporter: Nick Aldwin
>Priority: Critical
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from 
> it.  This is problematic with pre-existing exchanges (a relatively common 
> pattern), as there's no provided configuration for the exchange beyond 
> exchange type.  (We stumbled on this because RabbitMqIO always declares a 
> non-durable exchange, which fails if the exchange already exists as a durable 
> exchange)
>  
> A solution to this would be to allow RabbitMqIO to read from an exchange 
> without declaring it.  This pattern is already available for queues via the 
> `queueDeclare` flag.  I propose an `exchangeDeclare` flag which preserves 
> existing behavior except for skipping the call to `exchangeDeclare` before 
> binding the queue to the exchange.
>  
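
The proposed behavior can be sketched at the AMQP level (a hypothetical helper written against a pika-style channel object; `setup_reader` and its parameters are illustrative, not the RabbitMqIO implementation):

```python
def setup_reader(channel, queue, exchange, routing_key, declare_exchange=True):
    """Bind a queue to an exchange, optionally skipping exchange declaration.

    With declare_exchange=False this mirrors the proposed flag: the
    pre-existing exchange is left untouched, so its settings (e.g. a durable
    exchange) cannot conflict with a fresh non-durable declaration.
    """
    if declare_exchange:
        # Current behavior: always declare the exchange, which fails if it
        # already exists with different properties.
        channel.exchange_declare(exchange=exchange, exchange_type='fanout')
    channel.queue_declare(queue=queue)
    channel.queue_bind(queue=queue, exchange=exchange, routing_key=routing_key)
```

With a real pika `BlockingChannel` these calls go to the broker; the point of the sketch is only that the bind step does not require the declare step.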



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337008
 ]

ASF GitHub Bot logged work on BEAM-8513:


Author: ASF GitHub Bot
Created on: 31/Oct/19 19:36
Start Date: 31/Oct/19 19:36
Worklog Time Spent: 10m 
  Work Description: drobert commented on issue #9937: [BEAM-8513] Allow 
reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#issuecomment-548498490
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337008)
Time Spent: 1h 40m  (was: 1.5h)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the 
> exchange
> 
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
> Environment: testing with DirectRunner
>Reporter: Nick Aldwin
>Priority: Critical
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from 
> it.  This is problematic with pre-existing exchanges (a relatively common 
> pattern), as there's no provided configuration for the exchange beyond 
> exchange type.  (We stumbled on this because RabbitMqIO always declares a 
> non-durable exchange, which fails if the exchange already exists as a durable 
> exchange)
>  
> A solution to this would be to allow RabbitMqIO to read from an exchange 
> without declaring it.  This pattern is already available for queues via the 
> `queueDeclare` flag.  I propose an `exchangeDeclare` flag which preserves 
> existing behavior except for skipping the call to `exchangeDeclare` before 
> binding the queue to the exchange.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=337007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337007
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 31/Oct/19 19:27
Start Date: 31/Oct/19 19:27
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] 
[SQL] Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#discussion_r341326775
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
 ##
 @@ -211,9 +211,8 @@ public BeamTableStatistics 
getTableStatistics(PipelineOptions options) {
 }
   }
 
-  // When project push-down is supported.
-  if ((options == PushDownOptions.PROJECT || options == 
PushDownOptions.BOTH)
-  && !fieldNames.isEmpty()) {
+  // When project push-down is supported or field reordering is needed.
 
 Review comment:
   Decided to not drop the `Calc` when fields are selected in a different order 
for now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337007)
Time Spent: 1h 50m  (was: 1h 40m)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=337006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337006
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 31/Oct/19 19:26
Start Date: 31/Oct/19 19:26
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] 
[SQL] Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#discussion_r341326424
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -123,7 +129,12 @@ public void onMatch(RelOptRuleCall call) {
 // 1. Calc only does projects and renames.
 //And
 // 2. Predicate can be completely pushed-down to IO level.
-if (isProjectRenameOnlyProgram(program) && 
tableFilter.getNotSupported().isEmpty()) {
+//And
+// 3. And IO supports project push-down OR all fields are projected by a 
Calc.
+if (isProjectRenameOnlyProgram(program)
+&& tableFilter.getNotSupported().isEmpty()
+&& (beamSqlTable.supportsProjects()
+|| calc.getRowType().getFieldCount() == 
calcInputRowType.getFieldCount())) {
 
 Review comment:
   I decided not to drop the `Calc` when project push-down is not supported.
   It might make sense to allow IOs to communicate to the Rule that they 
support field reordering, to decide whether a Calc should be dropped.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337006)
Time Spent: 1h 40m  (was: 1.5h)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8434) Allow trigger transcript tests to be run as ValidatesRunner tests.

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8434?focusedWorklogId=337005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337005
 ]

ASF GitHub Bot logged work on BEAM-8434:


Author: ASF GitHub Bot
Created on: 31/Oct/19 19:17
Start Date: 31/Oct/19 19:17
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9832: [BEAM-8434] 
Translate trigger transcripts into validates runner tests.
URL: https://github.com/apache/beam/pull/9832
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337005)
Time Spent: 3h 20m  (was: 3h 10m)

> Allow trigger transcript tests to be run as ValidatesRunner tests. 
> ---
>
> Key: BEAM-8434
> URL: https://issues.apache.org/jira/browse/BEAM-8434
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337003
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 31/Oct/19 19:11
Start Date: 31/Oct/19 19:11
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r341320380
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.io.Serializable;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
+import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.JsonToRow;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollection.IsBounded;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.Row;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting;
+import org.bson.Document;
+import org.bson.json.JsonMode;
+import org.bson.json.JsonWriterSettings;
+
+@Experimental
+public class MongoDbTable extends SchemaBaseBeamTable implements Serializable {
+  // Should match: 
mongodb://username:password@localhost:27017/database/collection
+  @VisibleForTesting
+  final Pattern locationPattern =
+  Pattern.compile(
+  
"(?<credsHostPort>mongodb://(?<usernamePassword>.*:.*@)?.+:\\d+)/(?<database>.+)/(?<collection>.+)");
+
+  @VisibleForTesting final String dbCollection;
+  @VisibleForTesting final String dbName;
+  @VisibleForTesting final String dbUri;
+
+  MongoDbTable(Table table) {
+super(table.getSchema());
+
+String location = table.getLocation();
+Matcher matcher = locationPattern.matcher(location);
+checkArgument(
+matcher.matches(),
+"MongoDb location must be in the following format: 
'mongodb://(username:password@)?localhost:27017/database/collection'");
+this.dbUri = matcher.group("credsHostPort"); // "mongodb://localhost:27017"
+this.dbName = matcher.group("database");
+this.dbCollection = matcher.group("collection");
+  }
+
+  @Override
+  public PCollection<Row> buildIOReader(PBegin begin) {
+// Read MongoDb Documents
+PCollection<Document> readDocuments =
+MongoDbIO.read()
+.withUri(dbUri)
+.withDatabase(dbName)
+.withCollection(dbCollection)
+.expand(begin);
+
+return readDocuments
+// TODO: figure out a way convert Document directly to Row.
+.apply("Convert Document to JSON", createParserParDo())
+.apply("Transform JSON to Row", JsonToRow.withSchema(getSchema()))
+.setRowSchema(getSchema());
+  }
+
+  @Override
+  public POutput buildIOWriter(PCollection<Row> input) {
+throw new UnsupportedOperationException("Writing to a MongoDB is not 
supported");
+  }
+
+  @Override
+  public IsBounded isBounded() {
+return IsBounded.BOUNDED;
+  }
+
+  @Override
+  public BeamTableStatistics getTableStatistics(PipelineOptions options) {
+return BeamTableStatistics.BOUNDED_UNKNOWN;
 
 Review comment:
   Awesome, thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, 

[jira] [Resolved] (BEAM-8008) Show error message from expansion service in Java External transform

2019-10-31 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee resolved BEAM-8008.
---
Fix Version/s: 2.16.0
   Resolution: Fixed

> Show error message from expansion service in Java External transform
> 
>
> Key: BEAM-8008
> URL: https://issues.apache.org/jira/browse/BEAM-8008
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Labels: portability
> Fix For: 2.16.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> show error message from expansion service in Java External transform



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=336999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336999
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 31/Oct/19 18:50
Start Date: 31/Oct/19 18:50
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r341311660
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.io.Serializable;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
+import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.JsonToRow;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollection.IsBounded;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.Row;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting;
+import org.bson.Document;
+import org.bson.json.JsonMode;
+import org.bson.json.JsonWriterSettings;
+
+@Experimental
+public class MongoDbTable extends SchemaBaseBeamTable implements Serializable {
+  // Should match: 
mongodb://username:password@localhost:27017/database/collection
+  @VisibleForTesting
+  final Pattern locationPattern =
+  Pattern.compile(
+  
"(?<credsHostPort>mongodb://(?<usernamePassword>.*:.*@)?.+:\\d+)/(?<database>.+)/(?<collection>.+)");
+
+  @VisibleForTesting final String dbCollection;
+  @VisibleForTesting final String dbName;
+  @VisibleForTesting final String dbUri;
+
+  MongoDbTable(Table table) {
+super(table.getSchema());
+
+String location = table.getLocation();
+Matcher matcher = locationPattern.matcher(location);
+checkArgument(
+matcher.matches(),
+"MongoDb location must be in the following format: 
'mongodb://(username:password@)?localhost:27017/database/collection'");
 
 Review comment:
   I guess I was expecting something like the MongoDbIO string you linked, 
where the parens (or square brackets) indicate on their own that something is 
optional. To me this looks like the `?` could be part of the actual location 
string. This is just a bikeshed; I'm sure it will get the point across to any 
reasonable user either way, so I'm fine with leaving it.
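
   For what it's worth, the format is easy to exercise; here is a sketch in 
Python's `re` module, with the named groups reconstructed from the 
`matcher.group(...)` calls in the quoted Java (the pattern is an 
approximation, not the exact Beam source):

   ```python
   import re

   LOCATION = re.compile(
       r'(?P<credsHostPort>mongodb://(?:.*:.*@)?.+:\d+)'
       r'/(?P<database>.+)/(?P<collection>.+)')


   def parse_location(location):
       # Mirrors the checkArgument guidance from the quoted constructor.
       m = LOCATION.match(location)
       if m is None:
           raise ValueError(
               "MongoDb location must be in the following format: "
               "'mongodb://(username:password@)?localhost:27017/database/collection'")
       return m.group('credsHostPort'), m.group('database'), m.group('collection')


   print(parse_location('mongodb://user:pass@localhost:27017/mydb/mycoll'))
   # -> ('mongodb://user:pass@localhost:27017', 'mydb', 'mycoll')
   ```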
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336999)
Time Spent: 2h 50m  (was: 2h 40m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement 

[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=336988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336988
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 31/Oct/19 18:28
Start Date: 31/Oct/19 18:28
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] 
[SQL] Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#discussion_r341301583
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
 ##
 @@ -211,9 +211,8 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) {
 }
   }
 
-  // When project push-down is supported.
-  if ((options == PushDownOptions.PROJECT || options == 
PushDownOptions.BOTH)
-  && !fieldNames.isEmpty()) {
+  // When project push-down is supported or field reordering is needed.
 
 Review comment:
   The problem I was trying to fix by modifying that `if` statement is the 
scenario where an IO supports only filter push-down and the predicate can be 
pushed down to the IO layer completely: all fields are selected, so there is 
no need to preserve the `Calc` to perform a project, but the selected fields 
are not in their original order.
   In that scenario the IO does need to reorder fields (either with a `Select` 
or by not dropping the `Calc`).
   
   I agree that the current `if` statement is incorrect and should look more 
like what it used to, but with an additional check. I'm not yet sure how to 
check that the `IORel` is not followed by a `CalcRel` and that the fields are 
selected out of the order they appear in the original schema, but I'm working 
on it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336988)
Time Spent: 1.5h  (was: 1h 20m)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336979
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 31/Oct/19 18:12
Start Date: 31/Oct/19 18:12
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add 
the PTuple
URL: https://github.com/apache/beam/pull/9954#issuecomment-548501188
 
 
   Right, it's similar but a much more powerful construct than the 
DoOutputsTuple. The DoOutputsTuple assumes a single, shared producer for each 
PCollection. The PTuple allows for each PCollection to be an output of a 
different PTransform. You can have much more interesting composites using a 
PTuple instead of a DoOutputsTuple.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336979)
Time Spent: 14h 10m  (was: 14h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336977
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 31/Oct/19 18:09
Start Date: 31/Oct/19 18:09
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add 
the PTuple
URL: https://github.com/apache/beam/pull/9954#issuecomment-548501188
 
 
   Right, it's similar but a much more powerful construct than the 
DoOutputsTuple. The DoOutputsTuple assumes a single, shared producer for each 
PCollection. This allows for each PCollection to be an output of a different 
PTransform. You can have much more interesting composites using a PTuple 
instead of a DoOutputsTuple.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336977)
Time Spent: 14h  (was: 13h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 14h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336976
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 31/Oct/19 18:08
Start Date: 31/Oct/19 18:08
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add 
the PTuple
URL: https://github.com/apache/beam/pull/9954#issuecomment-548501188
 
 
   Right, it's similar but a much more powerful construct than the 
DoOutputsTuple. The DoOutputsTuple assumes a single producer for each 
PCollection. This allows for each PCollection to be an output of a different 
PTransform. You can have much more interesting composites using a PTuple 
instead of a DoOutputsTuple.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336976)
Time Spent: 13h 50m  (was: 13h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336970
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 31/Oct/19 18:06
Start Date: 31/Oct/19 18:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9954: [BEAM-8335] Add the 
PTuple
URL: https://github.com/apache/beam/pull/9954#issuecomment-548500321
 
 
   This is very similar to DoOutputsTuple, but DoOutputsTuple is quite 
idiosyncratic and not super reusable. I think this LGTM. But to be sure, I'll 
also request review from r: @aaltay 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336970)
Time Spent: 13.5h  (was: 13h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336971
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 31/Oct/19 18:06
Start Date: 31/Oct/19 18:06
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add 
the PTuple
URL: https://github.com/apache/beam/pull/9954#issuecomment-548496431
 
 
   > Can you show some examples of how this would be used? Would this be 
exposed to users at all? Java has a PCollectionTuple that is used in transforms 
like CoGBK, and Flatten. In Python we use a tuple containing PCollections e.g. 
`(pcoll1, pcoll2, pcoll3) | Flatten()`.
   > 
   > How would `PTuple` be used?
   
   `PTuple`s will be used like:
   ```
   class SomeCustomComposite(PTransform):
     def expand(self, pcoll):
       def my_multi_do_fn(x):
         if isinstance(x, int):
           yield pvalue.TaggedOutput('number', x)
         if isinstance(x, str):
           yield pvalue.TaggedOutput('string', x)

       def printer(x):
         print(x)
         yield x

       outputs = pcoll | beam.ParDo(my_multi_do_fn).with_outputs()
       return pvalue.PTuple({
           'number': outputs.number | beam.ParDo(printer),
           'string': outputs.string | beam.ParDo(printer)
       })

   p = beam.Pipeline()

   main = p | SomeCustomComposite()

   # Prints out each element that is a number.
   numbers = main.number | beam.ParDo(...)

   # Prints out each element that is a string.
   strings = main.string | beam.ParDo(...)
   ```
   
   This is a fundamentally different type from what is available in the SDK, 
and a necessary one. It allows referencing an arbitrary number of PCollections 
by name, each with a different producer.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336971)
Time Spent: 13h 40m  (was: 13.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336963
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 31/Oct/19 18:03
Start Date: 31/Oct/19 18:03
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add 
the PTuple
URL: https://github.com/apache/beam/pull/9954#issuecomment-548496431
 
 
   > Can you show some examples of how this would be used? Would this be 
exposed to users at all? Java has a PCollectionTuple that is used in transforms 
like CoGBK, and Flatten. In Python we use a tuple containing PCollections e.g. 
`(pcoll1, pcoll2, pcoll3) | Flatten()`.
   > 
   > How would `PTuple` be used?
   
   `PTuple`s will be used like:
   ```
   class SomeCustomComposite(PTransform):
     def expand(self, pcoll):
       def my_multi_do_fn(x):
         if isinstance(x, int):
           yield pvalue.TaggedOutput('number', x)
         if isinstance(x, str):
           yield pvalue.TaggedOutput('string', x)

       def printer(x):
         print(x)
         yield x

       outputs = pcoll | beam.ParDo(my_multi_do_fn).with_outputs()
       return pvalue.PTuple({
           'number': outputs.number | beam.ParDo(printer),
           'string': outputs.string | beam.ParDo(printer)
       })

   p = beam.Pipeline()

   main = p | SomeCustomComposite()

   # Prints out each element that is a number.
   numbers = main.number | beam.ParDo(...)

   # Prints out each element that is a string.
   strings = main.string | beam.ParDo(...)
   ```
   
   This is a fundamentally different type from what is available in the SDK, 
and a necessary one. It allows referencing an arbitrary number of PCollections 
by name, each with a different producer.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336963)
Time Spent: 13h 20m  (was: 13h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=336961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336961
 ]

ASF GitHub Bot logged work on BEAM-8513:


Author: ASF GitHub Bot
Created on: 31/Oct/19 18:01
Start Date: 31/Oct/19 18:01
Worklog Time Spent: 10m 
  Work Description: drobert commented on issue #9937: [BEAM-8513] Allow 
reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#issuecomment-548389063
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336961)
Time Spent: 1h 20m  (was: 1h 10m)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the 
> exchange
> 
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
> Environment: testing with DirectRunner
>Reporter: Nick Aldwin
>Priority: Critical
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from 
> it.  This is problematic with pre-existing exchanges (a relatively common 
> pattern), as there's no provided configuration for the exchange beyond 
> exchange type.  (We stumbled on this because RabbitMqIO always declares a 
> non-durable exchange, which fails if the exchange already exists as a durable 
> exchange)
>  
> A solution to this would be to allow RabbitMqIO to read from an exchange 
> without declaring it.  This pattern is already available for queues via the 
> `queueDeclare` flag.  I propose an `exchangeDeclare` flag which preserves 
> existing behavior except for skipping the call to `exchangeDeclare` before 
> binding the queue to the exchange.
>  
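The proposed flag boils down to making one AMQP call conditional while keeping the bind. A minimal sketch of that decision logic, using hypothetical names rather than RabbitMqIO's actual API:

```python
def setup_operations(queue_declare, exchange=None, exchange_declare=True):
    """Return the AMQP setup calls a reader would issue under the proposal."""
    ops = []
    if queue_declare:
        ops.append("queueDeclare")
    if exchange is not None:
        if exchange_declare:
            # Skipped when the exchange pre-exists (e.g. a durable exchange
            # that a non-durable declare would conflict with).
            ops.append("exchangeDeclare")
        # The queue is still bound to the exchange either way.
        ops.append("queueBind")
    return ops
```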



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=336962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336962
 ]

ASF GitHub Bot logged work on BEAM-8513:


Author: ASF GitHub Bot
Created on: 31/Oct/19 18:02
Start Date: 31/Oct/19 18:02
Worklog Time Spent: 10m 
  Work Description: drobert commented on issue #9937: [BEAM-8513] Allow 
reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#issuecomment-548498490
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336962)
Time Spent: 1.5h  (was: 1h 20m)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the 
> exchange
> 
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-rabbitmq
> Environment: testing with DirectRunner
>Reporter: Nick Aldwin
>Priority: Critical
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from 
> it.  This is problematic with pre-existing exchanges (a relatively common 
> pattern), as there's no provided configuration for the exchange beyond 
> exchange type.  (We stumbled on this because RabbitMqIO always declares a 
> non-durable exchange, which fails if the exchange already exists as a durable 
> exchange)
>  
> A solution to this would be to allow RabbitMqIO to read from an exchange 
> without declaring it.  This pattern is already available for queues via the 
> `queueDeclare` flag.  I propose an `exchangeDeclare` flag which preserves 
> existing behavior except for skipping the call to `exchangeDeclare` before 
> binding the queue to the exchange.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8521) beam_PostCommit_XVR_Flink failing

2019-10-31 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964253#comment-16964253
 ] 

Kyle Weaver commented on BEAM-8521:
---

I think I have tracked down the Python failure: 
https://github.com/apache/beam/pull/9842#issuecomment-548496295

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8521
> URL: https://issues.apache.org/jira/browse/BEAM-8521
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink/
> There appear to be two failures:
> test_java_expansion_portable_runner:
> *11:13:37* java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: Error received from SDK harness for instruction 
> 72: Traceback (most recent call last):*11:13:37* File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 173, in _execute*11:13:37* response = task()*11:13:37* File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 196, in <lambda>*11:13:37* self._execute(lambda: 
> worker.do_instruction(work), work)*11:13:37* File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 358, in do_instruction*11:13:37* request.instruction_id)*11:13:37* File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 378, in process_bundle*11:13:37* instruction_id, 
> request.process_bundle_descriptor_id)*11:13:37* File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 311, in get*11:13:37* self.fns[bundle_descriptor_id],*11:13:37* 
> KeyError: u'1-37'
>  XlangParquetIOTest:
> *13:43:05* [grpc-default-executor-1] ERROR 
> org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while 
> trying to handle InstructionRequest 10 java.lang.IllegalArgumentException: 
> unable to deserialize Custom DoFn With Execution Info*13:43:05*at 
> org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)*13:43:05*
>  at 
> org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)*13:43:05*
> at 
> org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)*13:43:05*
>   at 
> org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.(DoFnPTransformRunnerFactory.java:197)*13:43:05*
> at 
> org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)*13:43:05*
>   at 
> org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)*13:43:05*
>   at 
> org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)*13:43:05*
> at 
> org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)*13:43:05*
> at 
> org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)*13:43:05*
>at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)*13:43:05*
>   at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)*13:43:05*
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)*13:43:05*
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)*13:43:05*
> at java.lang.Thread.run(Thread.java:748)*13:43:05* Caused by: 
> java.io.InvalidClassException: 
> org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class 
> incompatible: stream classdesc serialVersionUID = -7089438576249123133, local 
> class serialVersionUID = -7141898054594373712*13:43:05*   at 
> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)*13:43:05*  
>at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)*13:43:05*
> at 
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)*13:43:05*
>at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)*13:43:05*
>   at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)*13:43:05*  
>at 
> java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)*13:43:05*   
> at 
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)*13:43:05*  
>at 
> 

[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=336959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336959
 ]

ASF GitHub Bot logged work on BEAM-5878:


Author: ASF GitHub Bot
Created on: 31/Oct/19 17:57
Start Date: 31/Oct/19 17:57
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9686: [WIP][BEAM-5878] 
update dill min version to 0.3.1.1 and add test for functions with Keyword-only 
arguments
URL: https://github.com/apache/beam/pull/9686#issuecomment-548496713
 
 
   Hi @lazylynx , could you please give this PR another try after resolving 
conflicts? We should have fixed the test failures now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336959)
Time Spent: 15h 20m  (was: 15h 10m)

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python 3, inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new method 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add a Py3-only unit tests that covers DoFn's with keyword-only 
> arguments once Beam Python 3 tests are in a good shape.
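The difference between the two inspection APIs can be seen directly; `do_fn` below is an arbitrary example function, not taken from Beam:

```python
import inspect

def do_fn(element, *, side=None):
    # Keyword-only parameter: everything after the bare * must be passed by name.
    return element, side

# getfullargspec understands keyword-only arguments; the legacy getargspec
# raised ValueError on such signatures in Python 3 (and was later removed).
spec = inspect.getfullargspec(do_fn)
```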



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=336957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336957
 ]

ASF GitHub Bot logged work on BEAM-5878:


Author: ASF GitHub Bot
Created on: 31/Oct/19 17:56
Start Date: 31/Oct/19 17:56
Worklog Time Spent: 10m 
  Work Description: lazylynx commented on pull request #9686: 
[WIP][BEAM-5878] update dill min version to 0.3.1.1 and add test for functions 
with Keyword-only arguments
URL: https://github.com/apache/beam/pull/9686
 
 
   update dill minimum version to 0.3.1.1 in setup.py and re-add unittest for 
keyword-only arguments from #9237
   
   
   
[jira] [Resolved] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data sometimes fails due to infinite recursion during pickling on Python 3.7.

2019-10-31 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-8397.
---
Fix Version/s: 2.17.0
   Resolution: Fixed

> DataflowRunnerTest.test_remote_runner_display_data sometimes fails due to 
> infinite recursion during pickling on Python 3.7.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1. `python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'`
> currently fails if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 114 in wrapper
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1137 in save_cell
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
> ...
> {noformat}
> cc: [~yoshiki.obata]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336958
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 31/Oct/19 17:56
Start Date: 31/Oct/19 17:56
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add 
the PTuple
URL: https://github.com/apache/beam/pull/9954#issuecomment-548496431
 
 
   > Can you show some examples of how this would be used? Would this be 
exposed to users at all? Java has a PCollectionTuple that is used in transforms 
like CoGBK, and Flatten. In Python we use a tuple containing PCollections e.g. 
`(pcoll1, pcoll2, pcoll3) | Flatten()`.
   > 
   > How would `PTuple` be used?
   
   `PTuple`s will be used like:
   ```
   class SomeCustomTransform(PTransform):
     def expand(self, pcoll):
       def my_multi_do_fn(x):
         if isinstance(x, int):
           yield pvalue.TaggedOutput('number', x)
         if isinstance(x, str):
           yield pvalue.TaggedOutput('string', x)

       def printer(x):
         print(x)
         yield x

       outputs = pcoll | beam.ParDo(my_multi_do_fn).with_outputs()
       return pvalue.PTuple({
           'number': outputs.number | beam.ParDo(printer),
           'string': outputs.string | beam.ParDo(printer)
       })

   p = beam.Pipeline()

   main = p | SomeCustomTransform()

   # Prints out each element that is a number.
   numbers = main.number | beam.ParDo(...)

   # Prints out each element that is a string.
   strings = main.string | beam.ParDo(...)
   ```
   
   This is a fundamentally different, and necessary, type from what is available 
in the SDK. It allows referencing an arbitrary number of PCollections by 
name, each with a different producer.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336958)
Time Spent: 13h 10m  (was: 13h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=336956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336956
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 31/Oct/19 17:56
Start Date: 31/Oct/19 17:56
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9842: [BEAM-8442] Remove 
duplicate code for bundle register in Python SDK harness
URL: https://github.com/apache/beam/pull/9842#issuecomment-548496295
 
 
   I think this PR is causing 
[beam_PostCommit_XVR_Flink](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/)
 to fail ([BEAM-8521](https://issues.apache.org/jira/browse/BEAM-8521)). (There 
are two test failures; this PR seems related to the failure in 
`test_java_expansion_portable_runner`.) Example stack trace:
   
   ```
   11:13:37 File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 311, in get
   11:13:37 self.fns[bundle_descriptor_id],
   11:13:37 KeyError: u'1-37'
   ```
   
   I suspect the reason is that this PR changed the `register` instruction call 
from synchronous to asynchronous.
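
A minimal sketch of the suspected race (hypothetical class and method names; this is not the real Beam harness API): if registration completes asynchronously, a bundle-processing request can look up `self.fns[bundle_descriptor_id]` before the descriptor has been stored, raising exactly the kind of `KeyError` shown in the stack trace above.

```python
import threading
import time

class SdkWorkerSketch:
    """Hypothetical stand-in for the SDK worker; not the real Beam API."""

    def __init__(self):
        # Maps bundle_descriptor_id -> process bundle descriptor.
        self.fns = {}

    def register(self, descriptor_id, descriptor, delay=0.0):
        time.sleep(delay)  # simulate an asynchronous registration finishing late
        self.fns[descriptor_id] = descriptor

    def process_bundle(self, descriptor_id):
        # Raises KeyError if the register instruction has not finished yet.
        return self.fns[descriptor_id]

# Synchronous registration: the lookup always succeeds.
worker = SdkWorkerSketch()
worker.register('1-37', 'descriptor')
assert worker.process_bundle('1-37') == 'descriptor'

# Asynchronous registration: process_bundle can win the race.
worker2 = SdkWorkerSketch()
t = threading.Thread(target=worker2.register, args=('1-38', 'descriptor', 0.2))
t.start()
try:
    worker2.process_bundle('1-38')
    raced = False
except KeyError:
    raced = True  # mirrors the KeyError in the reported failure
t.join()
assert raced
```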
 



Issue Time Tracking
---

Worklog Id: (was: 336956)
Time Spent: 1h 10m  (was: 1h)

> Unify bundle register in Python SDK harness
> ---
>
> Key: BEAM-8442
> URL: https://issues.apache.org/jira/browse/BEAM-8442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> There are two methods for bundle registration in the Python SDK harness:
> `SdkHarness._request_register` and `SdkWorker.register`. They should be unified.





[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data sometimes fails due to infinite recursion during pickling on Python 3.7.

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=336955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336955
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 31/Oct/19 17:55
Start Date: 31/Oct/19 17:55
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9881: [BEAM-8397] 
Fix infinite recursion errors in test_remote_runner_display_data_test on Python 
3.7.
URL: https://github.com/apache/beam/pull/9881
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 336955)
Time Spent: 2.5h  (was: 2h 20m)

> DataflowRunnerTest.test_remote_runner_display_data sometimes fails due to 
> infinite recursion during pickling on Python 3.7.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1. `python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'`
> currently fails if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 114 in wrapper
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1137 in save_cell
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336952
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 31/Oct/19 17:47
Start Date: 31/Oct/19 17:47
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9954: [BEAM-8335] Add the 
PTuple
URL: https://github.com/apache/beam/pull/9954#issuecomment-548492290
 
 
   Can you show some examples of how this would be used? Would this be exposed 
to users at all? Java has a PCollectionTuple that is used in transforms like 
CoGBK, and Flatten. In Python we use a tuple containing PCollections e.g. 
`(pcoll1, pcoll2, pcoll3) | Flatten()`.
   
   How would `PTuple` be used?
 



Issue Time Tracking
---

Worklog Id: (was: 336952)
Time Spent: 13h  (was: 12h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=336951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336951
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 31/Oct/19 17:46
Start Date: 31/Oct/19 17:46
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9943: [BEAM-8508] 
[SQL] Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#discussion_r341279712
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
 ##
 @@ -211,9 +211,8 @@ public BeamTableStatistics 
getTableStatistics(PipelineOptions options) {
 }
   }
 
-  // When project push-down is supported.
-  if ((options == PushDownOptions.PROJECT || options == PushDownOptions.BOTH)
-  && !fieldNames.isEmpty()) {
+  // When project push-down is supported or field reordering is needed.
 
 Review comment:
   This is interesting. I wouldn't expect an IO to support field reordering 
unless it also supports push-down. Can you explain what is going on here?
 



Issue Time Tracking
---

Worklog Id: (was: 336951)
Time Spent: 1h 20m  (was: 1h 10m)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.





[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=336950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336950
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 31/Oct/19 17:46
Start Date: 31/Oct/19 17:46
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9943: [BEAM-8508] 
[SQL] Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#discussion_r341280735
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -123,7 +129,12 @@ public void onMatch(RelOptRuleCall call) {
 // 1. Calc only does projects and renames.
 //And
 // 2. Predicate can be completely pushed-down to IO level.
-if (isProjectRenameOnlyProgram(program) && tableFilter.getNotSupported().isEmpty()) {
+//And
+// 3. And IO supports project push-down OR all fields are projected by a Calc.
+if (isProjectRenameOnlyProgram(program)
+&& tableFilter.getNotSupported().isEmpty()
+&& (beamSqlTable.supportsProjects()
+|| calc.getRowType().getFieldCount() == calcInputRowType.getFieldCount())) {
 
 Review comment:
   I think this needs to verify the order as well as the count.
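
The point can be illustrated with a small sketch (the field lists are made up for illustration, not the actual row types in the PR): a count-only comparison accepts a Calc that merely reorders fields, while a positional comparison rejects it.

```python
calc_fields = ['name', 'id', 'value']   # fields as projected by the Calc
input_fields = ['id', 'name', 'value']  # fields as produced by the IO

def count_only_check(projected, produced):
    # The condition as written in the diff: only the field counts match.
    return len(projected) == len(produced)

def count_and_order_check(projected, produced):
    # The stricter condition the review asks for: same fields, same positions.
    return projected == produced

assert count_only_check(calc_fields, input_fields)           # reorder slips through
assert not count_and_order_check(calc_fields, input_fields)  # reorder is caught
```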
 



Issue Time Tracking
---

Worklog Id: (was: 336950)
Time Spent: 1h 20m  (was: 1h 10m)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336948
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 31/Oct/19 17:43
Start Date: 31/Oct/19 17:43
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add 
the PTuple
URL: https://github.com/apache/beam/pull/9954#issuecomment-548490521
 
 
   R: @pabloem for review please and thank you!
 



Issue Time Tracking
---

Worklog Id: (was: 336948)
Time Spent: 12h 50m  (was: 12h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336945=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336945
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 31/Oct/19 17:38
Start Date: 31/Oct/19 17:38
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9954: 
[BEAM-8335] Add the PTuple
URL: https://github.com/apache/beam/pull/9954
 
 
   This PR adds the PTuple, which is analogous to Java's PCollectionTuple.
   
   This is a very useful output type for composites that want to emit multiple 
PCollections, each with its own producer. It is a fundamentally different and 
more general type than the DoOutputsTuple, which assumes a single producer for 
all of its PCollections; the PTuple does not.
   
   This is being used in https://github.com/apache/beam/pull/9953.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   