[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
[ https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337197 ]

ASF GitHub Bot logged work on BEAM-8513:

Author: ASF GitHub Bot
Created on: 01/Nov/19 05:54
Start Date: 01/Nov/19 05:54
Worklog Time Spent: 10m

Work Description: jbonofre commented on pull request #9937: [BEAM-8513] Allow reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#discussion_r341458892

## File path: sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java
## @@ -59,29 +65,46 @@
 public class RabbitMqIOTest implements Serializable {
   private static final Logger LOG = LoggerFactory.getLogger(RabbitMqIOTest.class);
+  private static final int ONE_MINUTE_MS = 60 * 1000;
+  private static int port;
+  private static String defaultPort;
+
   @ClassRule public static TemporaryFolder temporaryFolder = new TemporaryFolder();
   @Rule public transient TestPipeline p = TestPipeline.create();
-  private static transient Broker broker;
+  private static transient SystemLauncher launcher;

   @BeforeClass
   public static void beforeClass() throws Exception {
     port = NetworkTestHelper.getAvailableLocalPort();
+    defaultPort = System.getProperty("qpid.amqp_port");

Review comment: I think it's simple enough as it is.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337197)
Time Spent: 2h (was: 1h 50m)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
> Issue Type: Improvement
> Components: io-java-rabbitmq
> Environment: testing with DirectRunner
> Reporter: Nick Aldwin
> Assignee: Jean-Baptiste Onofré
> Priority: Critical
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from it. This is problematic with pre-existing exchanges (a relatively common pattern), as there's no provided configuration for the exchange beyond exchange type. (We stumbled on this because RabbitMqIO always declares a non-durable exchange, which fails if the exchange already exists as a durable exchange.)
>
> A solution to this would be to allow RabbitMqIO to read from an exchange without declaring it. This pattern is already available for queues via the `queueDeclare` flag. I propose an `exchangeDeclare` flag which preserves existing behavior except for skipping the call to `exchangeDeclare` before binding the queue to the exchange.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
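The behavior proposed in the issue can be illustrated with a small, self-contained model (this is a sketch of the decision logic, not actual Beam or RabbitMQ client code; the class and method names are hypothetical): with `exchangeDeclare == false`, the reader still binds the queue to the named exchange but skips the `exchangeDeclare` call, so a pre-existing durable exchange is left untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the channel-setup sequence discussed in BEAM-8513.
// It returns the AMQP operations that would be issued for a given Read
// configuration, showing how the proposed exchangeDeclare flag skips only
// the declaration while keeping the queue-to-exchange binding.
public class ExchangeSetupSketch {
  static List<String> setupCalls(boolean queueDeclare, String exchange, boolean exchangeDeclare) {
    List<String> calls = new ArrayList<>();
    if (queueDeclare) {
      calls.add("queueDeclare");
    }
    if (exchange != null) {
      if (exchangeDeclare) {
        // Pre-fix behavior: the exchange was always declared (non-durable),
        // which fails against an existing durable exchange.
        calls.add("exchangeDeclare");
      }
      // Binding works against an already-existing exchange too.
      calls.add("queueBind");
    }
    return calls;
  }

  public static void main(String[] args) {
    // Reading from a pre-existing exchange: only the bind is issued.
    System.out.println(setupCalls(false, "EXCHANGE", false)); // [queueBind]
    // Legacy behavior: declare everything.
    System.out.println(setupCalls(true, "EXCHANGE", true)); // [queueDeclare, exchangeDeclare, queueBind]
  }
}
```

Under this model, the only change for existing users is that passing `exchangeDeclare(false)` drops the `exchangeDeclare` call; the binding and consumption path stays the same.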
[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
[ https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337198 ]

ASF GitHub Bot logged work on BEAM-8513:

Author: ASF GitHub Bot
Created on: 01/Nov/19 05:54
Start Date: 01/Nov/19 05:54
Worklog Time Spent: 10m

Work Description: jbonofre commented on pull request #9937: [BEAM-8513] Allow reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#discussion_r341458718

## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java
## @@ -82,28 +97,31 @@
 * As for the {@link Read}, the {@link Write} is configured with a RabbitMQ URI.
 *
- * For instance, you can write to an exchange (providing the exchange type):
+ * Examples
 *
 * {@code
+ * // Publishing to a named, non-durable exchange, declared by Beam:
 * pipeline
 *   .apply(...) // provide PCollection
 *   .apply(RabbitMqIO.write().withUri("amqp://user:password@localhost:5672").withExchange("EXCHANGE", "fanout"));
- * }
 *
- * For instance, you can write to a queue:
+ * // Publishing to an existing exchange:
+ * pipeline
+ *   .apply(...) // provide PCollection
+ *   .apply(RabbitMqIO.write().withUri("amqp://user:password@localhost:5672").withExchange("EXCHANGE"));
 *
- * {@code
+ * // Publishing to a named queue in the default exchange:
 * pipeline
 *   .apply(...) // provide PCollection
 *   .apply(RabbitMqIO.write().withUri("amqp://user:password@localhost:5672").withQueue("QUEUE"));
- *
 * }
 */
@Experimental(Experimental.Kind.SOURCE_SINK)
public class RabbitMqIO {
  public static Read read() {
    return new AutoValue_RabbitMqIO_Read.Builder()
        .setQueueDeclare(false)
+       .setExchangeDeclare(false)

Review comment: I remember I discussed about that when I created the IO. I don't remember why I didn't do that finally. It makes sense anyway, thanks!

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337198)
Time Spent: 2h 10m (was: 2h)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
> Issue Type: Improvement
> Components: io-java-rabbitmq
> Environment: testing with DirectRunner
> Reporter: Nick Aldwin
> Assignee: Jean-Baptiste Onofré
> Priority: Critical
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from it. This is problematic with pre-existing exchanges (a relatively common pattern), as there's no provided configuration for the exchange beyond exchange type. (We stumbled on this because RabbitMqIO always declares a non-durable exchange, which fails if the exchange already exists as a durable exchange.)
>
> A solution to this would be to allow RabbitMqIO to read from an exchange without declaring it. This pattern is already available for queues via the `queueDeclare` flag. I propose an `exchangeDeclare` flag which preserves existing behavior except for skipping the call to `exchangeDeclare` before binding the queue to the exchange.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
[ https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337196 ]

ASF GitHub Bot logged work on BEAM-8513:

Author: ASF GitHub Bot
Created on: 01/Nov/19 05:54
Start Date: 01/Nov/19 05:54
Worklog Time Spent: 10m

Work Description: jbonofre commented on pull request #9937: [BEAM-8513] Allow reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#discussion_r341458821

## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java
## @@ -453,6 +524,8 @@
 public boolean start() throws IOException {
   // we consume message without autoAck (we want to do the ack ourselves)
   channel.setDefaultConsumer(consumer);
   channel.basicConsume(queueName, false, consumer);
+ } catch (IOException e) {
+   throw e;

Review comment: Agree, no need. My bad.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337196)
Time Spent: 2h (was: 1h 50m)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
> Issue Type: Improvement
> Components: io-java-rabbitmq
> Environment: testing with DirectRunner
> Reporter: Nick Aldwin
> Assignee: Jean-Baptiste Onofré
> Priority: Critical
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from it. This is problematic with pre-existing exchanges (a relatively common pattern), as there's no provided configuration for the exchange beyond exchange type. (We stumbled on this because RabbitMqIO always declares a non-durable exchange, which fails if the exchange already exists as a durable exchange.)
>
> A solution to this would be to allow RabbitMqIO to read from an exchange without declaring it. This pattern is already available for queues via the `queueDeclare` flag. I propose an `exchangeDeclare` flag which preserves existing behavior except for skipping the call to `exchangeDeclare` before binding the queue to the exchange.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
[ https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337199 ]

ASF GitHub Bot logged work on BEAM-8513:

Author: ASF GitHub Bot
Created on: 01/Nov/19 05:54
Start Date: 01/Nov/19 05:54
Worklog Time Spent: 10m

Work Description: jbonofre commented on pull request #9937: [BEAM-8513] Allow reads from exchange-bound queue without declaring the exchange
URL: https://github.com/apache/beam/pull/9937#discussion_r341458995

## File path: sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java
## @@ -282,28 +424,4 @@
 public void testWriteExchange() throws Exception {
     }
   }
 }
-
- private static List generateRecords(int maxNumRecords) {

Review comment: +1, it looks good to me.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337199)
Time Spent: 2h 20m (was: 2h 10m)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
> Issue Type: Improvement
> Components: io-java-rabbitmq
> Environment: testing with DirectRunner
> Reporter: Nick Aldwin
> Assignee: Jean-Baptiste Onofré
> Priority: Critical
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from it. This is problematic with pre-existing exchanges (a relatively common pattern), as there's no provided configuration for the exchange beyond exchange type. (We stumbled on this because RabbitMqIO always declares a non-durable exchange, which fails if the exchange already exists as a durable exchange.)
>
> A solution to this would be to allow RabbitMqIO to read from an exchange without declaring it. This pattern is already available for queues via the `queueDeclare` flag. I propose an `exchangeDeclare` flag which preserves existing behavior except for skipping the call to `exchangeDeclare` before binding the queue to the exchange.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
[ https://issues.apache.org/jira/browse/BEAM-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Baptiste Onofré reassigned BEAM-8513:

Assignee: Jean-Baptiste Onofré

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
> Issue Type: Improvement
> Components: io-java-rabbitmq
> Environment: testing with DirectRunner
> Reporter: Nick Aldwin
> Assignee: Jean-Baptiste Onofré
> Priority: Critical
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from it. This is problematic with pre-existing exchanges (a relatively common pattern), as there's no provided configuration for the exchange beyond exchange type. (We stumbled on this because RabbitMqIO always declares a non-durable exchange, which fails if the exchange already exists as a durable exchange.)
>
> A solution to this would be to allow RabbitMqIO to read from an exchange without declaring it. This pattern is already available for queues via the `queueDeclare` flag. I propose an `exchangeDeclare` flag which preserves existing behavior except for skipping the call to `exchangeDeclare` before binding the queue to the exchange.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
[ https://issues.apache.org/jira/browse/BEAM-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Baptiste Onofré updated BEAM-8513:

Status: Open (was: Triage Needed)

> RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
>
> Key: BEAM-8513
> URL: https://issues.apache.org/jira/browse/BEAM-8513
> Project: Beam
> Issue Type: Improvement
> Components: io-java-rabbitmq
> Environment: testing with DirectRunner
> Reporter: Nick Aldwin
> Assignee: Jean-Baptiste Onofré
> Priority: Critical
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> The RabbitMqIO always declares an exchange if it is configured to read from it. This is problematic with pre-existing exchanges (a relatively common pattern), as there's no provided configuration for the exchange beyond exchange type. (We stumbled on this because RabbitMqIO always declares a non-durable exchange, which fails if the exchange already exists as a durable exchange.)
>
> A solution to this would be to allow RabbitMqIO to read from an exchange without declaring it. This pattern is already available for queues via the `queueDeclare` flag. I propose an `exchangeDeclare` flag which preserves existing behavior except for skipping the call to `exchangeDeclare` before binding the queue to the exchange.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8503) Improve TestBigQuery and TestPubsub
[ https://issues.apache.org/jira/browse/BEAM-8503?focusedWorklogId=337169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337169 ]

ASF GitHub Bot logged work on BEAM-8503:

Author: ASF GitHub Bot
Created on: 01/Nov/19 03:57
Start Date: 01/Nov/19 03:57
Worklog Time Spent: 10m

Work Description: kennknowles commented on issue #9880: [BEAM-8503] Improve TestBigQuery and TestPubsub
URL: https://github.com/apache/beam/pull/9880#issuecomment-548657485

run sql postcommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337169)
Time Spent: 2h (was: 1h 50m)

> Improve TestBigQuery and TestPubsub
>
> Key: BEAM-8503
> URL: https://issues.apache.org/jira/browse/BEAM-8503
> Project: Beam
> Issue Type: Improvement
> Components: testing
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Minor
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Add better support for E2E BigQuery and Pubsub testing:
> - TestBigQuery should have the ability to insert data into the underlying table before a test.
> - TestPubsub should have the ability to subscribe to the underlying topic and read messages that were written during a test.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chad Dombrova updated BEAM-8539:

Description:
The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream.

I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals:

Java InMemoryJobService:
* start state: *STOPPED*
* run: about to submit to executor: STARTING
* run: actually running on executor: RUNNING
* terminal states: DONE, FAILED, CANCELLED, DRAINED

Python AbstractJobServiceServicer / LocalJobServicer:
* start state: STARTING
* terminal states: DONE, FAILED, CANCELLED, *STOPPED*

I think it would be good to make Python work like Java, so that there is a difference in state between a job that has been prepared and one that has additionally been run.

It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic.

was:
The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream.

I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals:

Java InMemoryJobService:
* start state (prepared): STOPPED
* about to submit to executor: STARTING
* actually running on executor: RUNNING
* terminal states: DONE, FAILED, CANCELLED, DRAINED

Python AbstractJobServiceServicer / LocalJobServicer:
* start state: STARTING
* terminal states: DONE, FAILED, CANCELLED, STOPPED

I think it would be good to make Python work like Java, so that there is a difference in state between a job that has been prepared and one that has been run.

It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic.

> Clearly define the valid job state transitions
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
> Issue Type: Improvement
> Components: beam-model, runner-core, sdk-java-core, sdk-py-core
> Reporter: Chad Dombrova
> Priority: Major
>
> The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream. I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals:
>
> Java InMemoryJobService:
> * start state: *STOPPED*
> * run: about to submit to executor: STARTING
> * run: actually running on executor: RUNNING
> * terminal states: DONE, FAILED, CANCELLED, DRAINED
>
> Python AbstractJobServiceServicer / LocalJobServicer:
> * start state: STARTING
> * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
>
> I think it would be good to make Python work like Java, so that there is a difference in state between a job that has been prepared and one that has additionally been run.
>
> It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chad Dombrova updated BEAM-8539:

Description:
The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream.

I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals:

Java InMemoryJobService:
* start state: *STOPPED*
* run - about to submit to executor: STARTING
* run - actually running on executor: RUNNING
* terminal states: DONE, FAILED, CANCELLED, DRAINED

Python AbstractJobServiceServicer / LocalJobServicer:
* start state: STARTING
* terminal states: DONE, FAILED, CANCELLED, *STOPPED*

I think it would be good to make Python work like Java, so that there is a difference in state between a job that has been prepared and one that has additionally been run.

It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic.

was:
The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream.

I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals:

Java InMemoryJobService:
* start state: *STOPPED*
* run: about to submit to executor: STARTING
* run: actually running on executor: RUNNING
* terminal states: DONE, FAILED, CANCELLED, DRAINED

Python AbstractJobServiceServicer / LocalJobServicer:
* start state: STARTING
* terminal states: DONE, FAILED, CANCELLED, *STOPPED*

I think it would be good to make Python work like Java, so that there is a difference in state between a job that has been prepared and one that has additionally been run.

It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic.

> Clearly define the valid job state transitions
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
> Issue Type: Improvement
> Components: beam-model, runner-core, sdk-java-core, sdk-py-core
> Reporter: Chad Dombrova
> Priority: Major
>
> The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream. I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals:
>
> Java InMemoryJobService:
> * start state: *STOPPED*
> * run - about to submit to executor: STARTING
> * run - actually running on executor: RUNNING
> * terminal states: DONE, FAILED, CANCELLED, DRAINED
>
> Python AbstractJobServiceServicer / LocalJobServicer:
> * start state: STARTING
> * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
>
> I think it would be good to make Python work like Java, so that there is a difference in state between a job that has been prepared and one that has additionally been run.
>
> It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964580#comment-16964580 ]

Chad Dombrova commented on BEAM-8539:

[~lcwik], [~mxm] you might be interested in this.

> Clearly define the valid job state transitions
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
> Issue Type: Improvement
> Components: beam-model, runner-core, sdk-java-core, sdk-py-core
> Reporter: Chad Dombrova
> Priority: Major
>
> The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream. I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals:
>
> Java InMemoryJobService:
> * start state: *STOPPED*
> * run: about to submit to executor: STARTING
> * run: actually running on executor: RUNNING
> * terminal states: DONE, FAILED, CANCELLED, DRAINED
>
> Python AbstractJobServiceServicer / LocalJobServicer:
> * start state: STARTING
> * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
>
> I think it would be good to make Python work like Java, so that there is a difference in state between a job that has been prepared and one that has additionally been run.
>
> It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8539) Clearly define the valid job state transitions
Chad Dombrova created BEAM-8539:

Summary: Clearly define the valid job state transitions
Key: BEAM-8539
URL: https://issues.apache.org/jira/browse/BEAM-8539
Project: Beam
Issue Type: Improvement
Components: beam-model, runner-core, sdk-java-core, sdk-py-core
Reporter: Chad Dombrova

The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream.

I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals:

Java InMemoryJobService:
* start state (prepared): STOPPED
* about to submit to executor: STARTING
* actually running on executor: RUNNING
* terminal states: DONE, FAILED, CANCELLED, DRAINED

Python AbstractJobServiceServicer / LocalJobServicer:
* start state: STARTING
* terminal states: DONE, FAILED, CANCELLED, STOPPED

I think it would be good to make Python work like Java, so that there is a difference in state between a job that has been prepared and one that has been run.

It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
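The "utility function for checking if a state is terminal" suggested in the issue can be sketched in a few lines (illustrative only; the state names mirror the Beam Job API, and which states count as terminal follows the Java InMemoryJobService convention described above, which is exactly the point of disagreement with the Python servicer):

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch of a shared terminal-state check, so each runner does not
// re-derive its own notion of "terminal". State names follow the Beam
// Job API; the terminal set here follows the Java convention (DRAINED
// terminal, STOPPED not), per the description above.
public class JobStates {
  enum JobState { STOPPED, STARTING, RUNNING, DONE, FAILED, CANCELLED, DRAINED }

  private static final Set<JobState> TERMINAL =
      EnumSet.of(JobState.DONE, JobState.FAILED, JobState.CANCELLED, JobState.DRAINED);

  static boolean isTerminal(JobState state) {
    return TERMINAL.contains(state);
  }

  public static void main(String[] args) {
    System.out.println(isTerminal(JobState.DONE));    // true
    System.out.println(isTerminal(JobState.STOPPED)); // false under the Java convention
  }
}
```

Centralizing this (or encoding it directly in beam_job_api.proto) would make the Java/Python divergence over STOPPED an explicit, one-place decision instead of per-runner behavior.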
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337165 ]

ASF GitHub Bot logged work on BEAM-8347:

Author: ASF GitHub Bot
Created on: 01/Nov/19 03:43
Start Date: 01/Nov/19 03:43
Worklog Time Spent: 10m

Work Description: kennknowles commented on pull request #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark
URL: https://github.com/apache/beam/pull/9820#discussion_r341442887

## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java
## @@ -376,16 +389,11 @@
 public void finalizeCheckpoint() throws IOException {
   this.source = source;
   this.current = null;
   this.checkpointMark = checkpointMark != null ? checkpointMark : new RabbitMQCheckpointMark();
-  try {
-    connectionHandler = new ConnectionHandler(source.spec.uri());
-  } catch (Exception e) {
-    throw new IOException(e);
-  }
 }

 @Override
 public Instant getWatermark() {
-  return checkpointMark.oldestTimestamp;
+  return checkpointMark.latestTimestamp;

Review comment: Ideally, the watermark would be informed by the RabbitMQ broker's backlog. The goal is for the watermark to be a good prediction of the fact "no future records will have timestamps before the watermark's timestamp". My impression is that you have made the watermark advance more reliably, in good ways, but will perhaps result in a large amount of data being marked late. What do you think?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337165)
Time Spent: 40m (was: 0.5h)

> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
>
> Key: BEAM-8347
> URL: https://issues.apache.org/jira/browse/BEAM-8347
> Project: Beam
> Issue Type: Bug
> Components: io-java-rabbitmq
> Affects Versions: 2.15.0
> Environment: testing has been done using the DirectRunner. I also have DataflowRunner available
> Reporter: Daniel Robert
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> I stumbled upon this and then saw a similar StackOverflow post:
> https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance
>
> When calling `advance()`, if there are no messages, no state changes occur, including changes to the CheckpointMark or Watermark. If there is a relatively constant rate of new messages coming in, this is not a problem. If data is bursty, and there are periods of no new messages coming in, the watermark will never advance.
>
> Contrast this with some of the logic in PubsubIO, which makes provisions for periods of inactivity to advance the watermark (although it, too, is imperfect: https://issues.apache.org/jira/browse/BEAM-7322).
>
> The example given in the StackOverflow post is something like this:
>
> {code:java}
> pipeline
>     .apply(RabbitMqIO.read()
>         .withUri("amqp://guest:guest@localhost:5672")
>         .withQueue("test"))
>     .apply("Windowing",
>         Window.into(FixedWindows.of(Duration.standardSeconds(10)))
>             .triggering(AfterWatermark.pastEndOfWindow())
>             .withAllowedLateness(Duration.ZERO)
>             .accumulatingFiredPanes());
> {code}
>
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a window that never performs an on-time trigger.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337164 ]

ASF GitHub Bot logged work on BEAM-8347:

Author: ASF GitHub Bot
Created on: 01/Nov/19 03:43
Start Date: 01/Nov/19 03:43
Worklog Time Spent: 10m

Work Description: kennknowles commented on pull request #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark
URL: https://github.com/apache/beam/pull/9820#discussion_r341442438

## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java

@@ -376,16 +389,11 @@ public void finalizeCheckpoint() throws IOException {
     this.source = source;
     this.current = null;
     this.checkpointMark = checkpointMark != null ? checkpointMark : new RabbitMQCheckpointMark();
-    try {
-      connectionHandler = new ConnectionHandler(source.spec.uri());

Review comment: My impression is that your change here is an improvement.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337164)
Time Spent: 40m (was: 0.5h)

> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
> -----------------------------------------------------------------------------
>
> Key: BEAM-8347
> URL: https://issues.apache.org/jira/browse/BEAM-8347
> Project: Beam
> Issue Type: Bug
> Components: io-java-rabbitmq
> Affects Versions: 2.15.0
> Environment: testing has been done using the DirectRunner. I also have DataflowRunner available
> Reporter: Daniel Robert
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> I stumbled upon this and then saw a similar StackOverflow post:
> [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]
> When calling `advance()`, if there are no messages, no state changes: neither the CheckpointMark nor the Watermark is updated. If there is a relatively constant rate of new messages coming in, this is not a problem. If data is bursty, and there are periods of no new messages coming in, the watermark will never advance.
> Contrast this with some of the logic in PubsubIO, which makes provisions for periods of inactivity to advance the watermark (although it, too, is imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )
> The example given in the StackOverflow post is something like this:
>
> {code:java}
> pipeline
>     .apply(RabbitMqIO.read()
>         .withUri("amqp://guest:guest@localhost:5672")
>         .withQueue("test"))
>     .apply("Windowing",
>         Window.into(
>                 FixedWindows.of(Duration.standardSeconds(10)))
>             .triggering(AfterWatermark.pastEndOfWindow())
>             .withAllowedLateness(Duration.ZERO)
>             .accumulatingFiredPanes());
> {code}
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a window that never performs an on-time trigger.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API
[ https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337163 ]

ASF GitHub Bot logged work on BEAM-7434:

Author: ASF GitHub Bot
Created on: 01/Nov/19 03:38
Start Date: 01/Nov/19 03:38
Worklog Time Spent: 10m

Work Description: kennknowles commented on pull request #9900: [BEAM-7434] [BEAM-5895] and [BEAM-5894] Upgrade to rabbit amqp-client 5.x
URL: https://github.com/apache/beam/pull/9900

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337163)
Time Spent: 50m (was: 40m)

> RabbitMqIO uses a deprecated API
> --------------------------------
>
> Key: BEAM-7434
> URL: https://issues.apache.org/jira/browse/BEAM-7434
> Project: Beam
> Issue Type: Bug
> Components: io-java-rabbitmq
> Reporter: Nicolas Delsaux
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The RabbitMqIO reader (UnboundedRabbitMqReader) uses the QueueingConsumer, which is marked as deprecated on the RabbitMQ side. RabbitMqIO should replace this consumer with the DefaultConsumer provided by RabbitMQ.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API
[ https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=337162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337162 ]

ASF GitHub Bot logged work on BEAM-7434:

Author: ASF GitHub Bot
Created on: 01/Nov/19 03:36
Start Date: 01/Nov/19 03:36
Worklog Time Spent: 10m

Work Description: kennknowles commented on pull request #9900: [BEAM-7434] [BEAM-5895] and [BEAM-5894] Upgrade to rabbit amqp-client 5.x
URL: https://github.com/apache/beam/pull/9900#discussion_r341442028

## File path: sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java

@@ -462,12 +449,14 @@ public boolean start() throws IOException {
   @Override
   public boolean advance() throws IOException {
     try {
-      QueueingConsumer.Delivery delivery = consumer.nextDelivery(1000);
+      Channel channel = connectionHandler.getChannel();
+      // we consume message without autoAck (we want to do the ack ourselves)
+      GetResponse delivery = channel.basicGet(queueName, false);

Review comment: @chamikaramj @dpmills @reuvenlax might have good guidance here

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337162)
Time Spent: 40m (was: 0.5h)

> RabbitMqIO uses a deprecated API
> --------------------------------
>
> Key: BEAM-7434
> URL: https://issues.apache.org/jira/browse/BEAM-7434
> Project: Beam
> Issue Type: Bug
> Components: io-java-rabbitmq
> Reporter: Nicolas Delsaux
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The RabbitMqIO reader (UnboundedRabbitMqReader) uses the QueueingConsumer, which is marked as deprecated on the RabbitMQ side. RabbitMqIO should replace this consumer with the DefaultConsumer provided by RabbitMQ.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
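The hunk above switches from a push-based QueueingConsumer to pulling with `basicGet` and acking manually, so the pipeline can acknowledge a message only after its checkpoint is finalized. A small in-memory simulation of that pull-plus-manual-ack contract (pure Python, no broker; `ToyQueue` and its method names are illustrative stand-ins, not the RabbitMQ client API):

```python
from collections import deque

class ToyQueue:
    """Simulates a broker queue with basic_get-style delivery.

    Messages fetched without auto-ack stay 'unacked' until explicitly
    acked; nacked deliveries are requeued. Deferring the ack until a
    checkpoint finalizes is what gives the reader its at-least-once
    guarantee: an unacked message survives a worker failure.
    """

    def __init__(self):
        self._ready = deque()
        self._unacked = {}
        self._next_tag = 0

    def publish(self, body):
        self._ready.append(body)

    def basic_get(self):
        # Non-blocking pull: returns (delivery_tag, body), or None when
        # the queue is empty.
        if not self._ready:
            return None
        self._next_tag += 1
        body = self._ready.popleft()
        self._unacked[self._next_tag] = body
        return (self._next_tag, body)

    def basic_ack(self, tag):
        # Called once a checkpoint is finalized; the message is done.
        del self._unacked[tag]

    def basic_nack_all(self):
        # On failure, everything unacked goes back for redelivery.
        while self._unacked:
            _, body = self._unacked.popitem()
            self._ready.appendleft(body)
```

Reading two messages, acking only the first, then nacking the rest puts the second message back at the head of the queue for redelivery, which is the behavior an unfinalized checkpoint relies on.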
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337161 ]

ASF GitHub Bot logged work on BEAM-8347:

Author: ASF GitHub Bot
Created on: 01/Nov/19 03:34
Start Date: 01/Nov/19 03:34
Worklog Time Spent: 10m

Work Description: kennknowles commented on issue #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark
URL: https://github.com/apache/beam/pull/9820#issuecomment-548653719

Even though 2.17.0 has been cut, there is still time to cherrypick. Sorry for the large review delay.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337161)
Time Spent: 0.5h (was: 20m)

> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
> -----------------------------------------------------------------------------
>
> Key: BEAM-8347
> URL: https://issues.apache.org/jira/browse/BEAM-8347
> Project: Beam
> Issue Type: Bug
> Components: io-java-rabbitmq
> Affects Versions: 2.15.0
> Environment: testing has been done using the DirectRunner. I also have DataflowRunner available
> Reporter: Daniel Robert
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> I stumbled upon this and then saw a similar StackOverflow post:
> [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]
> When calling `advance()`, if there are no messages, no state changes: neither the CheckpointMark nor the Watermark is updated. If there is a relatively constant rate of new messages coming in, this is not a problem. If data is bursty, and there are periods of no new messages coming in, the watermark will never advance.
> Contrast this with some of the logic in PubsubIO, which makes provisions for periods of inactivity to advance the watermark (although it, too, is imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )
> The example given in the StackOverflow post is something like this:
>
> {code:java}
> pipeline
>     .apply(RabbitMqIO.read()
>         .withUri("amqp://guest:guest@localhost:5672")
>         .withQueue("test"))
>     .apply("Windowing",
>         Window.into(
>                 FixedWindows.of(Duration.standardSeconds(10)))
>             .triggering(AfterWatermark.pastEndOfWindow())
>             .withAllowedLateness(Duration.ZERO)
>             .accumulatingFiredPanes());
> {code}
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a window that never performs an on-time trigger.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
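The reporter's symptom — two unacked messages and a window that never fires on time — follows directly from how event-time triggering works: an `AfterWatermark.pastEndOfWindow()` pane can only fire once the watermark reaches the end of the window. A tiny illustration of 10-second fixed-window assignment and that firing condition (the helper names below are illustrative, not Beam APIs):

```python
def fixed_window(event_ts, size_s=10):
    """Return the [start, end) fixed window containing event_ts."""
    start = (event_ts // size_s) * size_s
    return (start, start + size_s)

def on_time_pane_fires(window, watermark):
    """An AfterWatermark-style on-time pane fires only once the
    watermark reaches the end of the window."""
    _, end = window
    return watermark >= end
```

An event at t=13 lands in window [10, 20). If the watermark stalls at 13 because no further messages arrive to advance it, the on-time pane never fires; once the watermark reaches 20 it does, which is exactly what the watermark fix in this PR is meant to guarantee.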
[jira] [Work logged] (BEAM-8514) ZetaSql should use cost-based optimization to take advantage of Join Reordering Rule and Push-Down Rule
[ https://issues.apache.org/jira/browse/BEAM-8514?focusedWorklogId=337150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337150 ]

ASF GitHub Bot logged work on BEAM-8514:

Author: ASF GitHub Bot
Created on: 01/Nov/19 02:50
Start Date: 01/Nov/19 02:50
Worklog Time Spent: 10m

Work Description: kennknowles commented on pull request #9874: [BEAM-8514] ZetaSql should use cost based optimization
URL: https://github.com/apache/beam/pull/9874

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337150)
Time Spent: 20m (was: 10m)

> ZetaSql should use cost-based optimization to take advantage of Join Reordering Rule and Push-Down Rule
> -------------------------------------------------------------------------------------------------------
>
> Key: BEAM-8514
> URL: https://issues.apache.org/jira/browse/BEAM-8514
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql-zetasql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Default config should use BeamCostModel, as well as tests with custom configuration.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964541#comment-16964541 ]

Thomas Weise commented on BEAM-8243:

Looking forward to seeing the gradle commands disappear from [https://beam.apache.org/documentation/runners/flink/], and it would be good to add back running the wordcount example with FlinkRunner. That might look really user friendly now!

> Document behavior of FlinkRunner
> --------------------------------
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink, website
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-flink
>
> The Flink runner guide should include a couple of details:
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=337139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337139 ]

ASF GitHub Bot logged work on BEAM-5878:

Author: ASF GitHub Bot
Created on: 01/Nov/19 02:18
Start Date: 01/Nov/19 02:18
Worklog Time Spent: 10m

Work Description: lazylynx commented on issue #9686: [WIP][BEAM-5878] update dill min version to 0.3.1.1 and add test for functions with Keyword-only arguments
URL: https://github.com/apache/beam/pull/9686#issuecomment-548639816

@tvalentyn Sure. Wait for a while.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337139)
Time Spent: 15.5h (was: 15h 20m)

> Support DoFns with Keyword-only arguments in Python 3.
> ------------------------------------------------------
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: yoshiki obata
> Priority: Minor
> Time Spent: 15.5h
> Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to define functions with keyword-only arguments. Currently Beam does not handle them correctly. [~ruoyu] pointed out [one place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118] in our codebase that we should fix: in Python 3.0, inspect.getargspec() will fail on functions with keyword-only arguments, but a new method [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec] supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add Py3-only unit tests that cover DoFns with keyword-only arguments once Beam Python 3 tests are in a good shape.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
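The inspection issue behind this ticket is easy to demonstrate: `inspect.getargspec()` cannot describe functions with keyword-only arguments, while `inspect.getfullargspec()` reports them separately via `kwonlyargs`. A short, self-contained illustration (the `emit` function is a made-up example, not a Beam API):

```python
import inspect

# A Python 3 function with a keyword-only argument: everything
# declared after the bare * must be passed by keyword.
def emit(element, *, timestamp=None):
    return (element, timestamp)

spec = inspect.getfullargspec(emit)
# Positional and keyword-only arguments are reported in separate
# fields, so type-hint machinery can inspect 'timestamp' instead of
# failing the way getargspec() did.
print(spec.args)            # ['element']
print(spec.kwonlyargs)      # ['timestamp']
print(spec.kwonlydefaults)  # {'timestamp': None}
```

This is why the decorators.py code referenced above needs to move to `getfullargspec()`: the older API has no field in which to return `timestamp` at all.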
[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964534#comment-16964534 ]

Thomas Weise commented on BEAM-8243:

[~ibzib] [~robertwb] is this going to cover documentation for BEAM-8372?

> Document behavior of FlinkRunner
> --------------------------------
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink, website
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-flink
>
> The Flink runner guide should include a couple of details:
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.
[ https://issues.apache.org/jira/browse/BEAM-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise updated BEAM-8372: --- Fix Version/s: 2.17.0 > Allow submission of Flink UberJar directly to flink cluster. > > > Key: BEAM-8372 > URL: https://issues.apache.org/jira/browse/BEAM-8372 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Labels: portability, portability-flink > Fix For: 2.17.0 > > Time Spent: 8.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.
[ https://issues.apache.org/jira/browse/BEAM-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise updated BEAM-8372: --- Labels: portability portability-flink (was: ) > Allow submission of Flink UberJar directly to flink cluster. > > > Key: BEAM-8372 > URL: https://issues.apache.org/jira/browse/BEAM-8372 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Labels: portability, portability-flink > Time Spent: 8.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns
[ https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=337135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337135 ]

ASF GitHub Bot logged work on BEAM-8435:

Author: ASF GitHub Bot
Created on: 01/Nov/19 01:57
Start Date: 01/Nov/19 01:57
Worklog Time Spent: 10m

Work Description: angoenka commented on pull request #9836: [BEAM-8435] Implement PaneInfo computation for Python.
URL: https://github.com/apache/beam/pull/9836#discussion_r341425717

## File path: sdks/python/apache_beam/transforms/trigger.py

@@ -1162,12 +1173,33 @@ def process_timer(self, window_id, unused_name, time_domain, timestamp,
       if self.trigger_fn.should_fire(time_domain, timestamp, window, context):
         finished = self.trigger_fn.on_fire(timestamp, window, context)
-        yield self._output(window, finished, state)
+        yield self._output(window, finished, state, timestamp,
+                           time_domain == TimeDomain.WATERMARK)
     else:
       raise Exception('Unexpected time domain: %s' % time_domain)

-  def _output(self, window, finished, state):
+  def _output(self, window, finished, state, watermark, maybe_ontime):
     """Output window and clean up if appropriate."""
+    index = state.get_state(window, self.INDEX)
+    state.add_state(window, self.INDEX, 1)
+    if watermark <= window.max_timestamp():
+      nonspeculative_index = -1
+      timing = windowed_value.PaneInfoTiming.EARLY
+      if state.get_state(window, self.NONSPECULATIVE_INDEX):
+        logging.warning('Watermark moved backwards in time.')
+    else:
+      nonspeculative_index = state.get_state(window, self.NONSPECULATIVE_INDEX)
+      state.add_state(window, self.NONSPECULATIVE_INDEX, 1)
+      timing = (
+          windowed_value.PaneInfoTiming.ON_TIME
+          if maybe_ontime and nonspeculative_index == 0

Review comment: `maybe_ontime` is False or null so pane can never be on_time

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337135) Time Spent: 40m (was: 0.5h) > Allow access to PaneInfo from Python DoFns > -- > > Key: BEAM-8435 > URL: https://issues.apache.org/jira/browse/BEAM-8435 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > PaneInfoParam exists, but the plumbing to actually populate it at runtime was > never added. (Nor, clearly, were any tests...) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns
[ https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=337134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337134 ]

ASF GitHub Bot logged work on BEAM-8435:

Author: ASF GitHub Bot
Created on: 01/Nov/19 01:57
Start Date: 01/Nov/19 01:57
Worklog Time Spent: 10m

Work Description: angoenka commented on pull request #9836: [BEAM-8435] Implement PaneInfo computation for Python.
URL: https://github.com/apache/beam/pull/9836#discussion_r341426171

## File path: sdks/python/apache_beam/transforms/trigger.py

@@ -1162,12 +1173,33 @@ def process_timer(self, window_id, unused_name, time_domain, timestamp,
       if self.trigger_fn.should_fire(time_domain, timestamp, window, context):
         finished = self.trigger_fn.on_fire(timestamp, window, context)
-        yield self._output(window, finished, state)
+        yield self._output(window, finished, state, timestamp,
+                           time_domain == TimeDomain.WATERMARK)
     else:
       raise Exception('Unexpected time domain: %s' % time_domain)

-  def _output(self, window, finished, state):
+  def _output(self, window, finished, state, watermark, maybe_ontime):
     """Output window and clean up if appropriate."""
+    index = state.get_state(window, self.INDEX)
+    state.add_state(window, self.INDEX, 1)
+    if watermark <= window.max_timestamp():
+      nonspeculative_index = -1
+      timing = windowed_value.PaneInfoTiming.EARLY
+      if state.get_state(window, self.NONSPECULATIVE_INDEX):
+        logging.warning('Watermark moved backwards in time.')

Review comment: Session window max_timestamp can move forward which can result in this log.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337134) > Allow access to PaneInfo from Python DoFns > -- > > Key: BEAM-8435 > URL: https://issues.apache.org/jira/browse/BEAM-8435 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > PaneInfoParam exists, but the plumbing to actually populate it at runtime was > never added. (Nor, clearly, were any tests...) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns
[ https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=337133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337133 ]

ASF GitHub Bot logged work on BEAM-8435:

Author: ASF GitHub Bot
Created on: 01/Nov/19 01:57
Start Date: 01/Nov/19 01:57
Worklog Time Spent: 10m

Work Description: angoenka commented on pull request #9836: [BEAM-8435] Implement PaneInfo computation for Python.
URL: https://github.com/apache/beam/pull/9836#discussion_r341420088

## File path: sdks/python/apache_beam/pipeline_test.py

@@ -602,6 +603,12 @@ def test_timestamp_param_map(self):
       p | Create([1, 2]) | beam.Map(lambda _, t=DoFn.TimestampParam: t),
       equal_to([MIN_TIMESTAMP, MIN_TIMESTAMP]))

+  def test_pane_info_param(self):

Review comment: We can add 1 more test with actual pane.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337133)
Time Spent: 0.5h (was: 20m)

> Allow access to PaneInfo from Python DoFns
> ------------------------------------------
>
> Key: BEAM-8435
> URL: https://issues.apache.org/jira/browse/BEAM-8435
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Robert Bradshaw
> Assignee: Robert Bradshaw
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> PaneInfoParam exists, but the plumbing to actually populate it at runtime was never added. (Nor, clearly, were any tests...)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
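The trigger.py hunk discussed in the reviews above derives PaneInfo timing from two counters: an overall pane index, and a "nonspeculative" index that counts only panes fired at or after the watermark passed the end of the window. A simplified standalone model of that classification (this is a sketch of the idea, not Beam's actual implementation or state API):

```python
EARLY, ON_TIME, LATE = 'EARLY', 'ON_TIME', 'LATE'

class PaneTracker:
    """Tracks per-window pane counters, mirroring the INDEX /
    NONSPECULATIVE_INDEX bookkeeping in the diff above."""

    def __init__(self):
        self.index = 0
        self.nonspeculative_index = 0

    def fire(self, watermark, window_end, from_watermark_timer=True):
        pane_index = self.index
        self.index += 1
        if watermark <= window_end:
            # The watermark has not passed the window yet, so this
            # pane is speculative.
            timing = EARLY
        else:
            ns = self.nonspeculative_index
            self.nonspeculative_index += 1
            # The first nonspeculative pane fired by the event-time
            # timer is the ON_TIME pane; everything after it is LATE.
            timing = ON_TIME if from_watermark_timer and ns == 0 else LATE
        return (pane_index, timing)
```

For a window ending at 20: a firing at watermark 15 yields pane (0, EARLY), the first firing after the watermark passes 20 yields (1, ON_TIME), and any later firing yields LATE. This also shows the reviewer's point: if `from_watermark_timer` (`maybe_ontime` in the diff) is never true, no pane can ever be classified ON_TIME.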
[jira] [Commented] (BEAM-8534) XlangParquetIOTest failing
[ https://issues.apache.org/jira/browse/BEAM-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964492#comment-16964492 ]

Heejong Lee commented on BEAM-8534:

Looks like there's a mismatch between published docker images vs locally built docker images. When we run the test, it automatically pulls the published ones and overwrites the locally built ones.

> XlangParquetIOTest failing
> --------------------------
>
> Key: BEAM-8534
> URL: https://issues.apache.org/jira/browse/BEAM-8534
> Project: Beam
> Issue Type: Sub-task
> Components: test-failures
> Reporter: Kyle Weaver
> Assignee: Heejong Lee
> Priority: Major
>
> 13:43:05 [grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
>   at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
>   at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
>   at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
>   at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
>   at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
[jira] [Work logged] (BEAM-7303) Move Portable Runner and other of reference runner.
[ https://issues.apache.org/jira/browse/BEAM-7303?focusedWorklogId=337124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337124 ]

ASF GitHub Bot logged work on BEAM-7303:

Author: ASF GitHub Bot
Created on: 01/Nov/19 00:56
Start Date: 01/Nov/19 00:56
Worklog Time Spent: 10m

Work Description: angoenka commented on pull request #9936: [BEAM-7303] Move PortableRunner from runners.reference to java.sdks.portability package
URL: https://github.com/apache/beam/pull/9936#discussion_r341418994

## File path: sdks/java/portability/src/main/java/org/apache/beam/sdks/java/portability/PortableRunnerRegistrar.java

@@ -15,7 +15,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.beam.runners.reference;
+package org.apache.beam.sdks.java.portability;

Review comment: This can also go to `package org.apache.beam.runners.portability;`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337124)
Time Spent: 0.5h (was: 20m)

> Move Portable Runner and other of reference runner.
> ---------------------------------------------------
>
> Key: BEAM-7303
> URL: https://issues.apache.org/jira/browse/BEAM-7303
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Ankur Goenka
> Assignee: Ankur Goenka
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> PortableRunner is used by all Flink, Spark ... . We should move it out of the Reference Runner package to streamline the dependencies.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=337122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337122 ]

ASF GitHub Bot logged work on BEAM-7013:

Author: ASF GitHub Bot
Created on: 01/Nov/19 00:45
Start Date: 01/Nov/19 00:45
Worklog Time Spent: 10m

Work Description: robinyqiu commented on pull request #9778: [BEAM-7013] Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r341417808

## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java

@@ -65,23 +66,32 @@
   private static final List<String> TEST_DATA =
       Arrays.asList("Apple", "Orange", "Banana", "Orange");

-  // Data Table: used by testReadSketchFromBigQuery()
+  // Data Table: used by tests reading sketches from BigQuery
   // Schema: only one STRING field named "data".
-  // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
-  private static final String DATA_TABLE_ID = "hll_data";
   private static final String DATA_FIELD_NAME = "data";
   private static final String DATA_FIELD_TYPE = "STRING";
   private static final String QUERY_RESULT_FIELD_NAME = "sketch";
-  private static final Long EXPECTED_COUNT = 3L;

-  // Sketch Table: used by testWriteSketchToBigQuery()
+  // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
+  private static final String DATA_TABLE_ID_NON_EMPTY = "hll_data_non_empty";
+  private static final Long EXPECTED_COUNT_NON_EMPTY = 3L;
+
+  // Content: empty
+  private static final String DATA_TABLE_ID_EMPTY = "hll_data_empty";

Review comment: Yes it does and I have tried it (although it is not mentioned in the BigQuery documentation).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337122)
Time Spent: 36h 10m (was: 36h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
> ----------------------------------------------------------------------------------------
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
> Issue Type: New Feature
> Components: extensions-java-sketching, sdk-java-core
> Reporter: Yueyang Qiu
> Assignee: Yueyang Qiu
> Priority: Major
> Fix For: 2.16.0
> Time Spent: 36h 10m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
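The integration test above exercises the init / merge-partial / extract pattern: per-shard HLL++ sketches are combined losslessly and the distinct-count estimate is read out at the end. To show just that shape without a real HLL++ implementation, here is a sketch that stands in exact Python sets for sketches — set union plays the role of the merge and `len` plays the role of the extract, so counts here are exact rather than approximate, and the function names are illustrative, not the `HllCount` API:

```python
def init_sketch(values):
    """Build a per-shard 'sketch' (an exact set stands in for HLL++)."""
    return set(values)

def merge_partial(sketches):
    """Combine partial sketches; for HLL++ this merge is lossless,
    which is what lets shards be aggregated in any order."""
    merged = set()
    for s in sketches:
        merged |= s
    return merged

def extract(sketch):
    """Read out the distinct-count estimate (exact in this stand-in)."""
    return len(sketch)

# Mirrors the test data: "Apple", "Orange", "Banana", "Orange" spread
# across shards, plus the empty-sketch case this PR adds coverage for.
shards = [init_sketch(["Apple", "Orange"]),
          init_sketch(["Banana", "Orange"]),
          init_sketch([])]
print(extract(merge_partial(shards)))  # 3, matching EXPECTED_COUNT_NON_EMPTY
```

The empty shard contributes nothing to the merge, which is the property the new `hll_data_empty` table in the diff is checking end to end against BigQuery's `HLL_COUNT.INIT`.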
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=337118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337118 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 01/Nov/19 00:29 Start Date: 01/Nov/19 00:29 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9778: [BEAM-7013] Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases URL: https://github.com/apache/beam/pull/9778#discussion_r341415540 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -126,22 +145,49 @@ public static void deleteDataset() throws Exception { } /** - * Test that HLL++ sketch computed in BigQuery can be processed by Beam. Hll sketch is computed by - * {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test verifies that we can run {@link - * HllCount.MergePartial} and {@link HllCount.Extract} on the sketch in Beam to get the correct - * estimated count. + * Test that non-empty HLL++ sketch computed in BigQuery can be processed by Beam. + * + * Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test Review comment: Fixed! Thanks for the link, that's a good read. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337118) Time Spent: 36h (was: 35h 50m) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 36h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
Boyuan Zhang created BEAM-8537: -- Summary: Provide WatermarkEstimatorProvider for different types of WatermarkEstimator Key: BEAM-8537 URL: https://issues.apache.org/jira/browse/BEAM-8537 Project: Beam Issue Type: Improvement Components: sdk-py-core, sdk-py-harness Reporter: Boyuan Zhang Assignee: Boyuan Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8538) Python typehints: support namedtuples
Udi Meiri created BEAM-8538: --- Summary: Python typehints: support namedtuples Key: BEAM-8538 URL: https://issues.apache.org/jira/browse/BEAM-8538 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri Assignee: Udi Meiri Two bugs: 1. convert_to_beam_type passes typing.NamedTuple as-is. The code that's supposed to convert it to Any is broken (and the test is buggy too). Fix: https://github.com/apache/beam/pull/9754/files#diff-9b2eac354738047a44814d4b429cR247 2. Possible bug: collections.namedtuple instances should also be converted to Any, or Tuple[Any, Any, ...], if other static type checkers do this as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
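The fallback behavior the issue asks for can be sketched in plain Python. This is a hypothetical helper for illustration only, not Beam's actual `convert_to_beam_type`; the `Any` sentinel stands in for Beam's `typehints.Any`. Both `typing.NamedTuple` subclasses and `collections.namedtuple` classes are tuple subclasses that carry a `_fields` attribute, which gives a single detection path for mapping them to `Any`:

```python
import collections
import typing

# Sentinel standing in for Beam's typehints.Any (illustrative only).
Any = object()

def to_beam_type(typ):
    """Map namedtuple-like types to Any, as the issue proposes (sketch)."""
    # Both typing.NamedTuple subclasses and collections.namedtuple classes
    # are tuple subclasses exposing a _fields attribute.
    if isinstance(typ, type) and issubclass(typ, tuple) and hasattr(typ, "_fields"):
        return Any
    return typ

class Point(typing.NamedTuple):
    x: int
    y: int

Pair = collections.namedtuple("Pair", ["a", "b"])

print(to_beam_type(Point) is Any)  # True: typing.NamedTuple subclass
print(to_beam_type(Pair) is Any)   # True: collections.namedtuple class
print(to_beam_type(int) is int)    # True: ordinary types pass through
```

A fuller fix could instead map such types to `Tuple[Any, ...]` of the right arity via `len(typ._fields)`, which is the second option the issue mentions.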
[jira] [Assigned] (BEAM-8536) Migrate usage of DelayedBundleApplication.requested_execution_time to time duration
[ https://issues.apache.org/jira/browse/BEAM-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyuan Zhang reassigned BEAM-8536: -- Assignee: Boyuan Zhang > Migrate usage of DelayedBundleApplication.requested_execution_time to time > duration > > > Key: BEAM-8536 > URL: https://issues.apache.org/jira/browse/BEAM-8536 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow, sdk-java-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > > In DelayedBundleApplication, we used to use an absolute time to represent > rescheduling time. We want to switch to a relative time duration, which > requires a migration in the Java SDK and the Dataflow Java runner harness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
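The migration described above amounts to converting an absolute "retry at time T" into a relative "retry after duration D". A minimal sketch of that conversion, with a hypothetical helper name (the real change lives in the Java SDK harness and uses proto timestamps/durations, not Python datetimes):

```python
from datetime import datetime, timedelta, timezone

def to_relative_delay(requested_execution_time: datetime,
                      now: datetime) -> timedelta:
    """Convert an absolute reschedule time into a relative delay (sketch).

    Clamps at zero so a bundle whose requested time has already passed
    is retried immediately rather than with a negative delay.
    """
    return max(requested_execution_time - now, timedelta(0))

now = datetime(2019, 11, 1, 0, 0, tzinfo=timezone.utc)
later = now + timedelta(seconds=30)
print(to_relative_delay(later, now))                       # 0:00:30
print(to_relative_delay(now - timedelta(seconds=5), now))  # 0:00:00 (clamped)
```

A relative duration also avoids the clock-skew problem of absolute times: the runner and the SDK harness no longer need agreeing wall clocks.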
[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow
[ https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=337114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337114 ] ASF GitHub Bot logged work on BEAM-8451: Author: ASF GitHub Bot Created on: 01/Nov/19 00:26 Start Date: 01/Nov/19 00:26 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9865: [BEAM-8451] Fix interactive beam max recursion err URL: https://github.com/apache/beam/pull/9865 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337114) Time Spent: 1.5h (was: 1h 20m) > Interactive Beam example failing from stack overflow > > > Key: BEAM-8451 > URL: https://issues.apache.org/jira/browse/BEAM-8451 > Project: Beam > Issue Type: Bug > Components: examples-python, runner-py-interactive >Reporter: Igor Durovic >Assignee: Igor Durovic >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > > RecursionError: maximum recursion depth exceeded in __instancecheck__ > at > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405] > > This occurred after the execution of the last cell in > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow
[ https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=337116=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337116 ] ASF GitHub Bot logged work on BEAM-8451: Author: ASF GitHub Bot Created on: 01/Nov/19 00:26 Start Date: 01/Nov/19 00:26 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9865: [BEAM-8451] Fix interactive beam max recursion err URL: https://github.com/apache/beam/pull/9865#issuecomment-548620441 Thanks all! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337116) Time Spent: 1h 40m (was: 1.5h) > Interactive Beam example failing from stack overflow > > > Key: BEAM-8451 > URL: https://issues.apache.org/jira/browse/BEAM-8451 > Project: Beam > Issue Type: Bug > Components: examples-python, runner-py-interactive >Reporter: Igor Durovic >Assignee: Igor Durovic >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > > RecursionError: maximum recursion depth exceeded in __instancecheck__ > at > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405] > > This occurred after the execution of the last cell in > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=337110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337110 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 01/Nov/19 00:23 Start Date: 01/Nov/19 00:23 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9778: [BEAM-7013] Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases URL: https://github.com/apache/beam/pull/9778#discussion_r341414587 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -126,22 +145,49 @@ public static void deleteDataset() throws Exception { } /** - * Test that HLL++ sketch computed in BigQuery can be processed by Beam. Hll sketch is computed by - * {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test verifies that we can run {@link - * HllCount.MergePartial} and {@link HllCount.Extract} on the sketch in Beam to get the correct - * estimated count. + * Test that non-empty HLL++ sketch computed in BigQuery can be processed by Beam. + * + * Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test + * verifies that we can run {@link HllCount.MergePartial} and {@link HllCount.Extract} on the + * sketch in Beam to get the correct estimated count. + */ + @Test + public void testReadNonEmptySketchFromBigQuery() { +readSketchFromBigQuery(DATA_TABLE_ID_NON_EMPTY, EXPECTED_COUNT_NON_EMPTY); + } + + /** + * Test that empty HLL++ sketch computed in BigQuery can be processed by Beam. + * + * Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test + * verifies that we can run {@link HllCount.MergePartial} and {@link HllCount.Extract} on the + * sketch in Beam to get the correct estimated count. 
*/ @Test - public void testReadSketchFromBigQuery() { -String tableSpec = String.format("%s.%s", DATASET_ID, DATA_TABLE_ID); + public void testReadEmptySketchFromBigQuery() { +readSketchFromBigQuery(DATA_TABLE_ID_EMPTY, EXPECTED_COUNT_EMPTY); + } + + private void readSketchFromBigQuery(String tableId, Long expectedCount) { +String tableSpec = String.format("%s.%s", DATASET_ID, tableId); String query = String.format( "SELECT HLL_COUNT.INIT(%s) AS %s FROM %s", DATA_FIELD_NAME, QUERY_RESULT_FIELD_NAME, tableSpec); + SerializableFunction parseQueryResultToByteArray = -(SchemaAndRecord schemaAndRecord) -> -// BigQuery BYTES type corresponds to Java java.nio.ByteBuffer type -((ByteBuffer) schemaAndRecord.getRecord().get(QUERY_RESULT_FIELD_NAME)).array(); +input -> { + // BigQuery BYTES type corresponds to Java java.nio.ByteBuffer type + ByteBuffer sketch = (ByteBuffer) input.getRecord().get(QUERY_RESULT_FIELD_NAME); + if (sketch == null) { +// Empty sketch is represented by null in BigQuery and by empty byte array in Beam +return new byte[0]; + } else { +byte[] result = new byte[sketch.remaining()]; Review comment: Exactly. We know that is the case by looking into `Avro`'s implementation, but compiler does not know that, and it gives a warning if we use `.array()`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337110) Time Spent: 35h 50m (was: 35h 40m) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 35h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
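The review thread above covers two subtleties of reading sketches back from BigQuery: a NULL value stands for the empty sketch (mapped to an empty byte array in Beam), and only the buffer's remaining bytes should be copied rather than the whole backing array. A Python stand-in for that parse function (illustrative names, not Beam code; `memoryview` slicing plays the role of reading a `ByteBuffer` from its position):

```python
def query_result_to_sketch_bytes(value, offset=0):
    """Parse a BigQuery BYTES query result into sketch bytes (sketch).

    BigQuery represents an empty HLL++ sketch as NULL, while Beam's
    HllCount transforms represent it as an empty byte string, so map
    None -> b"".
    """
    if value is None:
        return b""
    # Copy only the bytes from `offset` onward, analogous to copying a
    # ByteBuffer's remaining() bytes instead of calling .array(), which
    # would also grab bytes before the buffer's position.
    return bytes(memoryview(value)[offset:])

print(query_result_to_sketch_bytes(None))                       # b''
print(query_result_to_sketch_bytes(b"\x01\x02\x03", offset=1))  # b'\x02\x03'
```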
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=337108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337108 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 01/Nov/19 00:21 Start Date: 01/Nov/19 00:21 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9778: [BEAM-7013] Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases URL: https://github.com/apache/beam/pull/9778#discussion_r341414290 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -96,28 +106,37 @@ public static void prepareDatasetAndDataTable() throws Exception { Review comment: Ah, nice catch! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337108) Time Spent: 35h 40m (was: 35.5h) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 35h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?focusedWorklogId=337107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337107 ] ASF GitHub Bot logged work on BEAM-8521: Author: ASF GitHub Bot Created on: 01/Nov/19 00:14 Start Date: 01/Nov/19 00:14 Worklog Time Spent: 10m Work Description: ihji commented on issue #9939: [BEAM-8521] only build Python 2.7 for xlang test URL: https://github.com/apache/beam/pull/9939#issuecomment-548618170 I think we should submit this change even if it will not fix the test. There's no reason to build all versions of the Python SDK, and doing so takes a lot of time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337107) Time Spent: 40m (was: 0.5h) > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8521 > URL: https://issues.apache.org/jira/browse/BEAM-8521 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ > Edit: Made subtasks for what appear to be two separate issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=337105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337105 ] ASF GitHub Bot logged work on BEAM-8472: Author: ASF GitHub Bot Created on: 31/Oct/19 23:59 Start Date: 31/Oct/19 23:59 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9868: [BEAM-8472] get default GCP region option (Python) URL: https://github.com/apache/beam/pull/9868 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337105) Time Spent: 1.5h (was: 1h 20m) > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] -- This message was sent by Atlassian Jira (v8.3.4#803005)
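The linked gcloud docs define an order of precedence for default properties: an explicit flag wins, then environment configuration, then the gcloud config store, then a hard-coded fallback. A sketch of that resolution order with injectable lookups so it is testable without gcloud installed (the function name and fallback wiring are illustrative, not Beam's actual implementation; `CLOUDSDK_COMPUTE_REGION` and the `gcloud config get-value` command are real gcloud conventions):

```python
import subprocess

def default_gcp_region(flag_value=None, environ=None,
                       run=subprocess.check_output):
    """Resolve a default region following gcloud's precedence (sketch):
    1. explicit --region flag value
    2. CLOUDSDK_COMPUTE_REGION environment variable
    3. `gcloud config get-value compute/region`
    4. historical hard-coded default
    """
    environ = environ or {}
    if flag_value:
        return flag_value
    if environ.get("CLOUDSDK_COMPUTE_REGION"):
        return environ["CLOUDSDK_COMPUTE_REGION"]
    try:
        out = run(["gcloud", "config", "get-value", "compute/region"])
        region = out.decode("utf-8").strip()
        if region and region != "(unset)":
            return region
    except Exception:
        pass  # gcloud missing or misconfigured; fall through
    return "us-central1"  # the fallback the issue says Beam defaults to today

# Injected lookups exercise each precedence level:
print(default_gcp_region(flag_value="europe-west1"))            # europe-west1
print(default_gcp_region(environ={"CLOUDSDK_COMPUTE_REGION": "asia-east1"},
                         run=lambda cmd: b""))                  # asia-east1
print(default_gcp_region(run=lambda cmd: b"us-east1\n"))        # us-east1
print(default_gcp_region(run=lambda cmd: b"(unset)\n"))         # us-central1
```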
[jira] [Work logged] (BEAM-8252) (Python SDK) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8252?focusedWorklogId=337104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337104 ] ASF GitHub Bot logged work on BEAM-8252: Author: ASF GitHub Bot Created on: 31/Oct/19 23:55 Start Date: 31/Oct/19 23:55 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9594: [BEAM-8252] Python: add worker_region and worker_zone options URL: https://github.com/apache/beam/pull/9594 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337104) Time Spent: 40m (was: 0.5h) > (Python SDK) Add worker_region and worker_zone options > -- > > Key: BEAM-8252 > URL: https://issues.apache.org/jira/browse/BEAM-8252 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, sdk-py-core >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8254) (Java SDK) Add workerRegion and workerZone options
[ https://issues.apache.org/jira/browse/BEAM-8254?focusedWorklogId=337102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337102 ] ASF GitHub Bot logged work on BEAM-8254: Author: ASF GitHub Bot Created on: 31/Oct/19 23:53 Start Date: 31/Oct/19 23:53 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9961: [BEAM-8254] add workerRegion and workerZone options to Java SDK URL: https://github.com/apache/beam/pull/9961 Same as #9594.
[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites
[ https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337090=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337090 ] ASF GitHub Bot logged work on BEAM-8491: Author: ASF GitHub Bot Created on: 31/Oct/19 23:04 Start Date: 31/Oct/19 23:04 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #9912: [BEAM-8491] Add ability for replacing transforms with multiple outputs URL: https://github.com/apache/beam/pull/9912#discussion_r341398835 ## File path: sdks/python/apache_beam/pipeline_test.py ## @@ -438,6 +438,101 @@ def get_replacement_transform(self, ptransform): p.replace_all([override]) self.assertEqual(pcoll.producer.inputs[0].element_type, expected_type) + + def test_ptransform_override_multiple_outputs(self): +class _MultiParDoComposite(PTransform): Review comment: Is this used ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337090) Time Spent: 50m (was: 40m) > Add ability for multiple output PCollections from composites > > > Key: BEAM-8491 > URL: https://issues.apache.org/jira/browse/BEAM-8491 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > The Python SDK has DoOutputTuples which allows for a single transform to have > multiple outputs. However, this does not include the ability for a composite > transform to have multiple outputs PCollections from different transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time
[ https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=337092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337092 ] ASF GitHub Bot logged work on BEAM-3288: Author: ASF GitHub Bot Created on: 31/Oct/19 23:04 Start Date: 31/Oct/19 23:04 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9960: [BEAM-3288] Guard against unsafe triggers at construction time URL: https://github.com/apache/beam/pull/9960 Some triggers can cause data loss. This PR prevents them from being used.
[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites
[ https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337091=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337091 ] ASF GitHub Bot logged work on BEAM-8491: Author: ASF GitHub Bot Created on: 31/Oct/19 23:04 Start Date: 31/Oct/19 23:04 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #9912: [BEAM-8491] Add ability for replacing transforms with multiple outputs URL: https://github.com/apache/beam/pull/9912#discussion_r341396394 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -263,31 +262,29 @@ def _replace_if_needed(self, original_transform_node): new_output = replacement_transform.expand(input_node) - new_output.element_type = None - self.pipeline._infer_result_type(replacement_transform, inputs, - new_output) - + if isinstance(new_output, pvalue.PValue): +new_output.element_type = None +self.pipeline._infer_result_type(replacement_transform, inputs, + new_output) replacement_transform_node.add_output(new_output) - if not new_output.producer: -new_output.producer = replacement_transform_node - - # We only support replacing transforms with a single output with - # another transform that produces a single output. - # TODO: Support replacing PTransforms with multiple outputs. - if (len(original_transform_node.outputs) > 1 or - not isinstance(original_transform_node.outputs[None], - (PCollection, PDone)) or - not isinstance(new_output, (PCollection, PDone))): -raise NotImplementedError( -'PTransform overriding is only supported for PTransforms that ' -'have a single output. Tried to replace output of ' -'AppliedPTransform %r with %r.' -% (original_transform_node, new_output)) # Recording updated outputs. This cannot be done in the same visitor # since if we dynamically update output type here, we'll run into # errors when visiting child nodes. 
- output_map[original_transform_node.outputs[None]] = new_output + if isinstance(new_output, pvalue.PValue): +if not new_output.producer: + new_output.producer = replacement_transform_node +output_map[original_transform_node.outputs[None]] = new_output + elif isinstance(new_output, (pvalue.DoOutputsTuple, tuple)): +for pcoll in new_output: + if not pcoll.producer: +pcoll.producer = replacement_transform_node + output_map[original_transform_node.outputs[pcoll.tag]] = pcoll Review comment: nit: add a comment that we expect output tags of original and replacement transforms to be the same. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337091) Time Spent: 50m (was: 40m) > Add ability for multiple output PCollections from composites > > > Key: BEAM-8491 > URL: https://issues.apache.org/jira/browse/BEAM-8491 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > The Python SDK has DoOutputTuples which allows for a single transform to have > multiple outputs. However, this does not include the ability for a composite > transform to have multiple outputs PCollections from different transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
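The diff above builds an `output_map` pairing each original output with the replacement transform's output of the same tag. The core bookkeeping can be sketched in plain Python, with dicts standing in for `AppliedPTransform` outputs and strings standing in for PCollections (illustrative names, not Beam's internals); per the review nit, the original and replacement transforms are expected to produce the same set of tags:

```python
def remap_outputs(original_outputs, new_outputs):
    """Map each original output to the replacement output with the same
    tag (sketch of the output_map logic in the diff above)."""
    output_map = {}
    for tag, pcoll in new_outputs.items():
        if tag not in original_outputs:
            # Replacement transforms must reuse the original output tags.
            raise ValueError("replacement produced unknown tag: %r" % (tag,))
        output_map[original_outputs[tag]] = pcoll
    return output_map

# None is the tag of the main output; "side" is an additional tagged output.
orig = {None: "main_old", "side": "side_old"}
new = {None: "main_new", "side": "side_new"}
print(remap_outputs(orig, new))
# {'main_old': 'main_new', 'side_old': 'side_new'}
```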
[jira] [Resolved] (BEAM-8365) Add project push-down capability to IO APIs
[ https://issues.apache.org/jira/browse/BEAM-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8365. - Fix Version/s: 2.18.0 Resolution: Fixed > Add project push-down capability to IO APIs > --- > > Key: BEAM-8365 > URL: https://issues.apache.org/jira/browse/BEAM-8365 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.18.0 > > Time Spent: 8h 20m > Remaining Estimate: 0h > > * InMemoryTable should implement the following method: > {code:java} > public PCollection buildIOReader( > PBegin begin, BeamSqlTableFilter filters, List fieldNames);{code} > which should return a `PCollection` with only the fields specified in the `fieldNames` > list. > * Create a rule to push fields used by a Calc (in projects and in a > condition) down into the TestTable IO. > * Update that same Calc (from the previous step) to have proper input and > output schemas, removing unused fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8533) test_java_expansion_portable_runner failing
[ https://issues.apache.org/jira/browse/BEAM-8533?focusedWorklogId=337074=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337074 ] ASF GitHub Bot logged work on BEAM-8533: Author: ASF GitHub Bot Created on: 31/Oct/19 22:14 Start Date: 31/Oct/19 22:14 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9956: [BEAM-8533] Revert "[BEAM-8442] Remove duplicate code for bundle register in Pyth… URL: https://github.com/apache/beam/pull/9956#issuecomment-548590139 CrossLanguageValidatesRunner failed only 1 test ([BEAM-8534](https://issues.apache.org/jira/browse/BEAM-8534)), where before it was failing 2. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337074) Time Spent: 20m (was: 10m) > test_java_expansion_portable_runner failing > --- > > Key: BEAM-8533 > URL: https://issues.apache.org/jira/browse/BEAM-8533 > Project: Beam > Issue Type: Sub-task > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > probable cause: > https://github.com/apache/beam/pull/9842#issuecomment-548496295 > 11:13:37 java.util.concurrent.ExecutionException: java.lang.RuntimeException: > Error received from SDK harness for instruction 72: Traceback (most recent > call last):11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 173, in _execute*11:13:37* response = task()11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 196, in 11:13:37 self._execute(lambda: > worker.do_instruction(work), work)11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 358, in do_instruction*11:13:37* 
request.instruction_id)11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 378, in process_bundle*11:13:37* instruction_id, > request.process_bundle_descriptor_id)11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 311, in get*11:13:37* self.fns[bundle_descriptor_id],11:13:37 KeyError: > u'1-37' -- This message was sent by Atlassian Jira (v8.3.4#803005)
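The KeyError above comes from a bundle-descriptor lookup racing with registration. A minimal sketch of the failure mode; the names below are hypothetical, not the actual `sdk_worker` classes:

```python
# Illustrative sketch of the race behind the KeyError above: if bundle
# descriptors were registered asynchronously, a process_bundle instruction
# could arrive before its descriptor was in the registry, so the
# self.fns[bundle_descriptor_id] lookup failed.

class DescriptorRegistry:
    def __init__(self):
        self.fns = {}  # bundle_descriptor_id -> descriptor

    def register(self, descriptor_id, descriptor):
        # Synchronous register: completes before any bundle is processed,
        # which is the behavior the revert in BEAM-8533 restores.
        self.fns[descriptor_id] = descriptor

    def get(self, descriptor_id):
        if descriptor_id not in self.fns:
            # This is the u'1-37' failure mode from the traceback.
            raise KeyError(descriptor_id)
        return self.fns[descriptor_id]


registry = DescriptorRegistry()
registry.register('1-37', 'bundle-descriptor')
```

An asynchronous register would make `register` return before the dictionary write is guaranteed, so `get` for the same id could raise first.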
[jira] [Created] (BEAM-8536) Migrate usage of DelayedBundleApplication.requested_execution_time to time duration
Boyuan Zhang created BEAM-8536: -- Summary: Migrate usage of DelayedBundleApplication.requested_execution_time to time duration Key: BEAM-8536 URL: https://issues.apache.org/jira/browse/BEAM-8536 Project: Beam Issue Type: Improvement Components: runner-dataflow, sdk-java-harness Reporter: Boyuan Zhang In DelayedBundleApplication, we used to use an absolute time to represent the rescheduling time. We want to switch to a relative time duration, which requires a migration in the Java SDK and the Dataflow Java runner harness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
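The migration described above boils down to a small conversion. The function name and millisecond units are illustrative assumptions, not the `DelayedBundleApplication` proto:

```python
# Sketch of the migration described above: instead of storing the absolute
# wall-clock time at which a delayed bundle application should be retried,
# store the delay relative to "now".

def absolute_to_relative_ms(requested_execution_time_ms, now_ms):
    """Convert an absolute reschedule timestamp into a relative duration."""
    # Clamp at zero: a timestamp already in the past means "retry now".
    return max(0, requested_execution_time_ms - now_ms)
```

A relative duration avoids clock-skew issues between the SDK harness and the runner, since only the scheduler's own clock is consulted when the delay is applied.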
[jira] [Commented] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964406#comment-16964406 ] Chad Dombrova commented on BEAM-8523: - WIP: https://github.com/apache/beam/pull/9959 > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
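A sketch of the state-history idea from the WIP PR above: the job service records each (state, timestamp) pair and replays the missed events to a client that connects late. The class and method names here are hypothetical, not the Beam Job API:

```python
# Hypothetical sketch (not the Beam Job API) of keeping a timestamped
# job-state history so JobInfo can expose submitted/started/completed
# times and late clients can be caught up on missed transitions.
from dataclasses import dataclass, field


@dataclass
class JobStateHistory:
    events: list = field(default_factory=list)  # (state, timestamp) pairs

    def record(self, state, timestamp):
        self.events.append((state, timestamp))

    def replay(self):
        # A GetStateStream-style endpoint would first stream these
        # missed events before streaming live updates.
        return list(self.events)


history = JobStateHistory()
history.record('SUBMITTED', 100)   # prepared to the job service
history.record('RUNNING', 105)     # executor started
history.record('DONE', 250)        # executor reached a terminal state
```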
[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337072=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337072 ] ASF GitHub Bot logged work on BEAM-8523: Author: ASF GitHub Bot Created on: 31/Oct/19 21:57 Start Date: 31/Oct/19 21:57 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9959: WIP: [BEAM-8523] JobAPI: Give access to timestamped state change history URL: https://github.com/apache/beam/pull/9959 When connecting to GetStateStream, the job service will first stream the history of missed state events, thereby catching up the client. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-8532) Beam Python trigger driver sets incorrect timestamp for output windows.
[ https://issues.apache.org/jira/browse/BEAM-8532?focusedWorklogId=337071=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337071 ] ASF GitHub Bot logged work on BEAM-8532: Author: ASF GitHub Bot Created on: 31/Oct/19 21:49 Start Date: 31/Oct/19 21:49 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9958: [BEAM-8532] Output timestamp should be the inclusive, not exclusive, end of window. URL: https://github.com/apache/beam/pull/9958
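The windowing fix referenced above (PR #9958) concerns which timestamp a trigger driver assigns to output: for a window spanning [start, end), it must be the inclusive maximum timestamp, one unit of granularity before the exclusive end. A minimal sketch, assuming microsecond granularity as in Beam's timestamp model:

```python
# Sketch of the fix discussed above: for a window covering the half-open
# interval [start, end), the timestamp assigned to output must be the
# inclusive maximum timestamp, i.e. one granularity unit before the
# exclusive end.

TIMESTAMP_GRANULARITY_US = 1  # assumption: microsecond resolution


def max_timestamp_us(window_start_us, window_end_us):
    """Largest timestamp that still falls inside [start, end)."""
    assert window_end_us > window_start_us
    return window_end_us - TIMESTAMP_GRANULARITY_US
```

Using the exclusive end directly would place the output in the *next* window, which is exactly the bug being fixed.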
[jira] [Work logged] (BEAM-8503) Improve TestBigQuery and TestPubsub
[ https://issues.apache.org/jira/browse/BEAM-8503?focusedWorklogId=337065=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337065 ] ASF GitHub Bot logged work on BEAM-8503: Author: ASF GitHub Bot Created on: 31/Oct/19 21:31 Start Date: 31/Oct/19 21:31 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9880: [BEAM-8503] Improve TestBigQuery and TestPubsub URL: https://github.com/apache/beam/pull/9880#issuecomment-548577455 Thanks, that's a great tip! Here I thought people were just adding that "fixup!" for fun; I didn't realize it worked with `git rebase --autosquash`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337065) Time Spent: 1h 50m (was: 1h 40m) > Improve TestBigQuery and TestPubsub > --- > > Key: BEAM-8503 > URL: https://issues.apache.org/jira/browse/BEAM-8503 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > Add better support for E2E BigQuery and Pubsub testing: > - TestBigQuery should have the ability to insert data into the underlying > table before a test. > - TestPubsub should have the ability to subscribe to the underlying topic and > read messages that were written during a test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8535) TextIO.read doesn't work with single wildcard with relative path
Tim created BEAM-8535: - Summary: TextIO.read doesn't work with single wildcard with relative path Key: BEAM-8535 URL: https://issues.apache.org/jira/browse/BEAM-8535 Project: Beam Issue Type: Bug Components: beam-model, io-java-files Affects Versions: 2.16.0 Environment: Mac High Sierra 10.13.6. DirectRunner local. Reporter: Tim It looks like the TextIO.read transform does not match files when using a glob wildcard if the glob starts with a * and the path is relative, i.e. /full/path/* and ./path/f* work, but ./path/* does not. Reproduction steps, using the word count example from the Beam quick start for the current version 2.16 ([https://beam.apache.org/get-started/quickstart-java/]): {code:java} $ mkdir test-folder && cp pom.xml ./test-folder $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \ > -Dexec.args="--inputFile=./test-folder/* --output=counts" -Pdirect-runner {code} The above fails when it is expected to find the pom.xml file. I tested the same way with 2.15 and it works as expected. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=337060=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337060 ] ASF GitHub Bot logged work on BEAM-8472: Author: ASF GitHub Bot Created on: 31/Oct/19 21:19 Start Date: 31/Oct/19 21:19 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9868: [BEAM-8472] get default GCP region option (Python) URL: https://github.com/apache/beam/pull/9868#issuecomment-548573257 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337060) Time Spent: 1h 20m (was: 1h 10m) > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] -- This message was sent by Atlassian Jira (v8.3.4#803005)
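The precedence described in the issue above can be sketched as a pure function. `resolve_region` is hypothetical; the real change would shell out to `gcloud config get-value compute/region` and use us-central1 only as the last resort:

```python
# Pure-function sketch of the default-region precedence described above:
# an explicit --region flag wins, then the gcloud-configured default,
# then the hard-coded us-central1 fallback.

def resolve_region(flag_value, gcloud_output):
    """Pick a region: explicit --region flag, then the gcloud default."""
    if flag_value:
        return flag_value
    default = (gcloud_output or '').strip()
    # Treat empty output or '(unset)' as "no default configured"
    # (assumption about gcloud's output format).
    if default and default != '(unset)':
        return default
    return 'us-central1'
```

Keeping the lookup as a function of the captured gcloud output also keeps it easy to unit-test without invoking the Cloud SDK.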
[jira] [Work logged] (BEAM-8503) Improve TestBigQuery and TestPubsub
[ https://issues.apache.org/jira/browse/BEAM-8503?focusedWorklogId=337057=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337057 ] ASF GitHub Bot logged work on BEAM-8503: Author: ASF GitHub Bot Created on: 31/Oct/19 21:17 Start Date: 31/Oct/19 21:17 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9880: [BEAM-8503] Improve TestBigQuery and TestPubsub URL: https://github.com/apache/beam/pull/9880#discussion_r341369467 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java ## @@ -172,6 +193,65 @@ public void publish(List messages) throws IOException { pubsub.publish(eventsTopicPath, outgoingMessages); } + /** Pull up to 100 messages from {@link #subscriptionPath()}. */ + public List pull() throws IOException { +return pull(100); + } + + /** Pull up to {@code maxBatchSize} messages from {@link #subscriptionPath()}. */ + public List pull(int maxBatchSize) throws IOException { +return pubsub.pull(0, subscriptionPath, 100, true).stream() Review comment: Ah I see. I dismissed the asynchronous option since it would require some sort of synchronization, but that may be simpler. I can look at doing that as a follow-up. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337057) Time Spent: 1h 40m (was: 1.5h) > Improve TestBigQuery and TestPubsub > --- > > Key: BEAM-8503 > URL: https://issues.apache.org/jira/browse/BEAM-8503 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > Add better support for E2E BigQuery and Pubsub testing: > - TestBigQuery should have the ability to insert data into the underlying > table before a test. > - TestPubsub should have the ability to subscribe to the underlying topic and > read messages that were written during a test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8533) test_java_expansion_portable_runner failing
[ https://issues.apache.org/jira/browse/BEAM-8533?focusedWorklogId=337055=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337055 ] ASF GitHub Bot logged work on BEAM-8533: Author: ASF GitHub Bot Created on: 31/Oct/19 21:15 Start Date: 31/Oct/19 21:15 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9956: [BEAM-8533] Revert "[BEAM-8442] Remove duplicate code for bundle register in Pyth… URL: https://github.com/apache/beam/pull/9956#issuecomment-548571667 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337055) Remaining Estimate: 0h Time Spent: 10m > test_java_expansion_portable_runner failing > --- > > Key: BEAM-8533 > URL: https://issues.apache.org/jira/browse/BEAM-8533 > Project: Beam > Issue Type: Sub-task > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > probable cause: > https://github.com/apache/beam/pull/9842#issuecomment-548496295 > 11:13:37 java.util.concurrent.ExecutionException: java.lang.RuntimeException: > Error received from SDK harness for instruction 72: Traceback (most recent > call last):11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 173, in _execute*11:13:37* response = task()11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 196, in 11:13:37 self._execute(lambda: > worker.do_instruction(work), work)11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 358, in do_instruction*11:13:37* request.instruction_id)11:13:37 File > 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 378, in process_bundle*11:13:37* instruction_id, > request.process_bundle_descriptor_id)11:13:37 File > "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 311, in get*11:13:37* self.fns[bundle_descriptor_id],11:13:37 KeyError: > u'1-37' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow
[ https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=337051=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337051 ] ASF GitHub Bot logged work on BEAM-8451: Author: ASF GitHub Bot Created on: 31/Oct/19 21:08 Start Date: 31/Oct/19 21:08 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9865: [BEAM-8451] Fix interactive beam max recursion err URL: https://github.com/apache/beam/pull/9865#issuecomment-548569362 Hey @pabloem is this good to merge? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337051) Time Spent: 1h 20m (was: 1h 10m) > Interactive Beam example failing from stack overflow > > > Key: BEAM-8451 > URL: https://issues.apache.org/jira/browse/BEAM-8451 > Project: Beam > Issue Type: Bug > Components: examples-python, runner-py-interactive >Reporter: Igor Durovic >Assignee: Igor Durovic >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > > RecursionError: maximum recursion depth exceeded in __instancecheck__ > at > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405] > > This occurred after the execution of the last cell in > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=337053=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337053 ] ASF GitHub Bot logged work on BEAM-8442: Author: ASF GitHub Bot Created on: 31/Oct/19 21:09 Start Date: 31/Oct/19 21:09 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9956: Revert "[BEAM-8442] Remove duplicate code for bundle register in Pyth… URL: https://github.com/apache/beam/pull/9956 …on SDK harness" This reverts commit cce7a3bf30caed118db7bdc9f212117355853695. Asynchronous register was causing tests to fail ([BEAM-8533](https://issues.apache.org/jira/browse/BEAM-8533)) @mxm @sunjincheng121
[jira] [Updated] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8521: -- Description: https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ Edit: Made subtasks for what appear to be two separate issues. was: https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ There appear to be two failures:

test_java_expansion_portable_runner:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction 72: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 173, in _execute
    response = task()
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 196, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 358, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 378, in process_bundle
    instruction_id, request.process_bundle_descriptor_id)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
    self.fns[bundle_descriptor_id],
KeyError: u'1-37'

XlangParquetIOTest:

[grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
	at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
	at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
	at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
	at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
	at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
	at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
	at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
	at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
	at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
	at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
	at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
[jira] [Updated] (BEAM-8533) test_java_expansion_portable_runner failing
[ https://issues.apache.org/jira/browse/BEAM-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8533: -- Status: Open (was: Triage Needed) > test_java_expansion_portable_runner failing > --- > > Key: BEAM-8533 > URL: https://issues.apache.org/jira/browse/BEAM-8533 > Project: Beam > Issue Type: Sub-task > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > probable cause: > https://github.com/apache/beam/pull/9842#issuecomment-548496295 >
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction 72: Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 173, in _execute
>     response = task()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 196, in <lambda>
>     self._execute(lambda: worker.do_instruction(work), work)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 358, in do_instruction
>     request.instruction_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 378, in process_bundle
>     instruction_id, request.process_bundle_descriptor_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
>     self.fns[bundle_descriptor_id],
> KeyError: u'1-37'
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8530) Dataflow portable runner fails timer ordering tests
[ https://issues.apache.org/jira/browse/BEAM-8530?focusedWorklogId=337041=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337041 ] ASF GitHub Bot logged work on BEAM-8530: Author: ASF GitHub Bot Created on: 31/Oct/19 21:00 Start Date: 31/Oct/19 21:00 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9951: [BEAM-8530] disable UsesStrictEventTimeOrdering for portable dataflow URL: https://github.com/apache/beam/pull/9951 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337041) Time Spent: 40m (was: 0.5h) > Dataflow portable runner fails timer ordering tests > --- > > Key: BEAM-8530 > URL: https://issues.apache.org/jira/browse/BEAM-8530 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.17.0 >Reporter: Jan Lukavský >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > According to > https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8534) XlangParquetIOTest failing
Kyle Weaver created BEAM-8534: - Summary: XlangParquetIOTest failing Key: BEAM-8534 URL: https://issues.apache.org/jira/browse/BEAM-8534 Project: Beam Issue Type: Sub-task Components: test-failures Reporter: Kyle Weaver Assignee: Heejong Lee

[grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
	at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
	at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
	at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
	at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
	at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
	at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
	at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
	at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
	at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
	at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
	at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
[jira] [Created] (BEAM-8533) test_java_expansion_portable_runner failing
Kyle Weaver created BEAM-8533: - Summary: test_java_expansion_portable_runner failing Key: BEAM-8533 URL: https://issues.apache.org/jira/browse/BEAM-8533 Project: Beam Issue Type: Sub-task Components: test-failures Reporter: Kyle Weaver Assignee: Kyle Weaver probable cause: https://github.com/apache/beam/pull/9842#issuecomment-548496295

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction 72: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 173, in _execute
    response = task()
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 196, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 358, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 378, in process_bundle
    instruction_id, request.process_bundle_descriptor_id)
  File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
    self.fns[bundle_descriptor_id],
KeyError: u'1-37'
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites
[ https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337033=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337033 ] ASF GitHub Bot logged work on BEAM-8491: Author: ASF GitHub Bot Created on: 31/Oct/19 20:40 Start Date: 31/Oct/19 20:40 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9912: [BEAM-8491] Add ability for replacing transforms with multiple outputs URL: https://github.com/apache/beam/pull/9912#issuecomment-548559353 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337033) Time Spent: 40m (was: 0.5h) > Add ability for multiple output PCollections from composites > > > Key: BEAM-8491 > URL: https://issues.apache.org/jira/browse/BEAM-8491 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > The Python SDK has DoOutputTuples which allows for a single transform to have > multiple outputs. However, this does not include the ability for a composite > transform to have multiple output PCollections from different transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337028=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337028 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 31/Oct/19 20:23 Start Date: 31/Oct/19 20:23 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #9892: [BEAM-8427] [SQL] buildIOWrite from MongoDb Table URL: https://github.com/apache/beam/pull/9892#issuecomment-548553421 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337028) Time Spent: 3.5h (was: 3h 20m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337027=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337027 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 31/Oct/19 20:23 Start Date: 31/Oct/19 20:23 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #9892: [BEAM-8427] [SQL] buildIOWrite from MongoDb Table URL: https://github.com/apache/beam/pull/9892#issuecomment-548553421 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337027) Time Spent: 3h 20m (was: 3h 10m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8532) Beam Python trigger driver sets incorrect timestamp for output windows.
Robert Bradshaw created BEAM-8532: - Summary: Beam Python trigger driver sets incorrect timestamp for output windows. Key: BEAM-8532 URL: https://issues.apache.org/jira/browse/BEAM-8532 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Robert Bradshaw Assignee: Robert Bradshaw The timestamp should lie in the window, otherwise re-windowing will not be idempotent. https://github.com/apache/beam/blob/release-2.16.0/sdks/python/apache_beam/transforms/trigger.py#L1183 should be using {{window.max_timestamp()}} rather than {{.end}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
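The non-idempotence described above can be illustrated with plain fixed-window arithmetic (a simplified sketch, not Beam code: `assign_fixed_window` stands in for Beam's window assignment, and a one-unit offset stands in for `max_timestamp()`):

```python
def assign_fixed_window(timestamp, size):
    """Return the fixed window [start, start + size) containing timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)

size = 10
window = (0, 10)        # a fixed window covering timestamps 0..9
end = window[1]         # 10: the window's end, which lies in the *next* window
max_ts = window[1] - 1  # 9: the largest timestamp still inside the window

# Emitting output stamped with `end` breaks idempotence: re-windowing moves it.
assert assign_fixed_window(end, size) == (10, 20)
# Emitting at the window's max timestamp keeps it in the original window.
assert assign_fixed_window(max_ts, size) == (0, 10)
```

In Beam itself the timestamps are finer-grained than this one-unit sketch, but the effect is the same: a timestamp equal to the window's `end` falls into the following window when re-windowed.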
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337018=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337018 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 31/Oct/19 20:09 Start Date: 31/Oct/19 20:09 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9806: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/9806#discussion_r341343191 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java ## @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.io.Serializable;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
+import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.JsonToRow;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollection.IsBounded;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting;
+import org.bson.Document;
+import org.bson.json.JsonMode;
+import org.bson.json.JsonWriterSettings;
+
+@Experimental
+public class MongoDbTable extends SchemaBaseBeamTable implements Serializable {
+  // Should match: mongodb://username:password@localhost:27017/database/collection
+  @VisibleForTesting
+  final Pattern locationPattern =
+      Pattern.compile(
+          "(?mongodb://(?.*:.*@)?.+:\\d+)/(?.+)/(?.+)");
+
+  @VisibleForTesting final String dbCollection;
+  @VisibleForTesting final String dbName;
+  @VisibleForTesting final String dbUri;
+
+  MongoDbTable(Table table) {
+    super(table.getSchema());
+
+    String location = table.getLocation();
+    Matcher matcher = locationPattern.matcher(location);
+    checkArgument(
+        matcher.matches(),
+        "MongoDb location must be in the following format: 'mongodb://(username:password@)?localhost:27017/database/collection'");
Review comment: I see what you mean, replaced `?` with `[` `]` to be consistent with MongoDbIO format. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337018) Time Spent: 3h 10m (was: 3h) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
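For illustration, a pattern in the spirit of the one under review in the BEAM-8427 comment above can be sketched as follows. Note that the named groups in the quoted Java code were mangled in this email, so the group names and character classes below are assumptions, not the actual patch:

```python
import re

# Hypothetical reconstruction for illustration only; the real pattern lives
# in MongoDbTable.java and its group names may differ.
LOCATION = re.compile(
    r"(?P<uri>mongodb://(?P<credentials>[^:/]*:[^@/]*@)?[^/]+:\d+)"
    r"/(?P<database>[^/]+)/(?P<collection>[^/]+)")

m = LOCATION.match("mongodb://user:pass@localhost:27017/mydb/mycoll")
assert m is not None
assert m.group("uri") == "mongodb://user:pass@localhost:27017"
assert m.group("database") == "mydb"
assert m.group("collection") == "mycoll"

# Credentials are optional, matching the documented
# 'mongodb://[username:password@]host:port/database/collection' form.
assert LOCATION.match("mongodb://localhost:27017/mydb/mycoll") is not None
```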
[jira] [Created] (BEAM-8531) DoFnTester should either be removed from the documentation or undeprecated.
Daniel Collins created BEAM-8531: Summary: DoFnTester should either be removed from the documentation or undeprecated. Key: BEAM-8531 URL: https://issues.apache.org/jira/browse/BEAM-8531 Project: Beam Issue Type: Bug Components: beam-community, beam-model Reporter: Daniel Collins Assignee: Aizhamal Nurmamat kyzy DoFnTester is the first method described in the "Test Your Pipeline" section [https://beam.apache.org/documentation/pipelines/test-your-pipeline/], but it is deprecated. Either the documentation recommending its use should be removed, or the class should be undeprecated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=337016=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337016 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 31/Oct/19 19:51 Start Date: 31/Oct/19 19:51 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#discussion_r340904610 ## File path: sdks/python/.pylintrc ## @@ -143,6 +143,7 @@ disable =
   unnecessary-lambda,
   unnecessary-pass,
   unneeded-not,
+  unsubscriptable-object,
Review comment:
```python
class _ListBuffer(List[bytes]):
  """Used to support parititioning of a list."""
  def partition(self, n):
    # type: (int) -> List[List[bytes]]
    return [self[k::n] for k in range(n)]  # <-- ERROR HERE
```

```python
class SdfProcessSizedElements(DoOperation):
  ...
  def progress_metrics(self):
    # type: () -> beam_fn_api_pb2.Metrics.PTransform
    with self.lock:
      metrics = super(SdfProcessSizedElements, self).progress_metrics()
      current_element_progress = self.current_element_progress()
      if current_element_progress:
        assert self.input_info is not None
        metrics.active_elements.measured.input_element_counts[
            self.input_info[1]] = 1  # <-- ERROR HERE
        metrics.active_elements.fraction_remaining = (
            current_element_progress.fraction_remaining)
    return metrics
```
Generally speaking, there are a handful of checks that overlap between mypy and pylint where pylint does a worse job. In the second example above, `self.input_info` is an `Optional[Tuple]` that's initialized to `None`, so pylint thinks it's unsubscriptable, but it's initialized to a tuple elsewhere and there's a line just above the error that says `assert self.input_info is not None`, so it obviously is in this context. These are all subtleties that mypy understands but pylint does not, so I think we'll see a trend of moving responsibility for these structural tests to mypy.
> Perhaps a newer version of pylint could help?

I'm already using the latest and greatest (one of the commits here ticks the version up from 2.4.2 to 2.4.3) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337016) Time Spent: 16h 50m (was: 16h 40m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 16h 50m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
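The assert-based narrowing described in the review comment above can be shown in a minimal, self-contained sketch (a hypothetical class, not Beam code): after `assert self.input_info is not None`, mypy treats the `Optional` attribute as a plain tuple, so the subscript type-checks.

```python
from typing import Optional, Tuple

class Operation:
    """Hypothetical stand-in for the DoOperation discussed above."""

    def __init__(self):
        # Initialized to None; assigned a real tuple later in the lifecycle.
        self.input_info = None  # type: Optional[Tuple[str, int]]

    def start(self):
        self.input_info = ("main_input", 1)

    def input_index(self):
        # type: () -> int
        # mypy narrows Optional[Tuple[str, int]] to Tuple[str, int] after
        # this assert, so the subscript below passes the type checker.
        assert self.input_info is not None
        return self.input_info[1]

op = Operation()
op.start()
assert op.input_index() == 1
```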
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=337015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337015 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 31/Oct/19 19:50 Start Date: 31/Oct/19 19:50 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#discussion_r340904610 ## File path: sdks/python/.pylintrc ## @@ -143,6 +143,7 @@ disable =
   unnecessary-lambda,
   unnecessary-pass,
   unneeded-not,
+  unsubscriptable-object,
Review comment:
```python
class _ListBuffer(List[bytes]):
  """Used to support parititioning of a list."""
  def partition(self, n):
    # type: (int) -> List[List[bytes]]
    return [self[k::n] for k in range(n)]  # <-- ERROR HERE
```

```python
class SdfProcessSizedElements(DoOperation):
  ...
  def progress_metrics(self):
    # type: () -> beam_fn_api_pb2.Metrics.PTransform
    with self.lock:
      metrics = super(SdfProcessSizedElements, self).progress_metrics()
      current_element_progress = self.current_element_progress()
      if current_element_progress:
        assert self.input_info is not None
        metrics.active_elements.measured.input_element_counts[
            self.input_info[1]] = 1  # <-- ERROR HERE
        metrics.active_elements.fraction_remaining = (
            current_element_progress.fraction_remaining)
    return metrics
```
Generally speaking, there are a handful of checks that overlap between mypy and pylint where pylint does a worse job. In the second example above, `self.input_info` is an `Optional[Tuple]` that's initialized to `None`, so pylint thinks it's unsubscriptable, but it's initialized to a tuple elsewhere and there's a line just above the error that says `assert self.input_info is not None`, so it obviously is in this context. These are all subtleties that mypy understands but pylint does not.
> Perhaps a newer version of pylint could help?

I'm already using the latest and greatest (one of the commits here ticks the version up from 2.4.2 to 2.4.3) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337015) Time Spent: 16h 40m (was: 16.5h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 16h 40m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=337014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337014 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 31/Oct/19 19:48 Start Date: 31/Oct/19 19:48 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9741: [BEAM-7926] Visualize PCollection URL: https://github.com/apache/beam/pull/9741#issuecomment-548540370 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337014) Time Spent: 19.5h (was: 19h 20m) > Visualize PCollection with Interactive Beam > --- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 19.5h > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline is defined as > p = create_pipeline() > pcoll = p | 'Transform' >> transform() > The user can call a single function and get auto-magical charting of the data > of the materialized pcoll, > e.g., visualize(pcoll) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=337013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337013 ] ASF GitHub Bot logged work on BEAM-2572: Author: ASF GitHub Bot Created on: 31/Oct/19 19:47 Start Date: 31/Oct/19 19:47 Worklog Time Spent: 10m Work Description: MattMorgis commented on pull request #9955: [BEAM-2572] Python SDK S3 Filesystem URL: https://github.com/apache/beam/pull/9955 Co-authored-by: Matthew Morgis Co-authored-by: Tamera Lanham This adds an AWS S3 file system implementation to the python SDK.
[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
[ https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337009=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337009 ] ASF GitHub Bot logged work on BEAM-8513: Author: ASF GitHub Bot Created on: 31/Oct/19 19:36 Start Date: 31/Oct/19 19:36 Worklog Time Spent: 10m Work Description: drobert commented on issue #9937: [BEAM-8513] Allow reads from exchange-bound queue without declaring the exchange URL: https://github.com/apache/beam/pull/9937#issuecomment-548535598 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337009) Time Spent: 1h 50m (was: 1h 40m) > RabbitMqIO: Allow reads from exchange-bound queue without declaring the > exchange > > > Key: BEAM-8513 > URL: https://issues.apache.org/jira/browse/BEAM-8513 > Project: Beam > Issue Type: Improvement > Components: io-java-rabbitmq > Environment: testing with DirectRunner >Reporter: Nick Aldwin >Priority: Critical > Time Spent: 1h 50m > Remaining Estimate: 0h > > The RabbitMqIO always declares an exchange if it is configured to read from > it. This is problematic with pre-existing exchanges (a relatively common > pattern), as there's no provided configuration for the exchange beyond > exchange type. (We stumbled on this because RabbitMqIO always declares a > non-durable exchange, which fails if the exchange already exists as a durable > exchange) > > A solution to this would be to allow RabbitMqIO to read from an exchange > without declaring it. This pattern is already available for queues via the > `queueDeclare` flag. I propose an `exchangeDeclare` flag which preserves > existing behavior except for skipping the call to `exchangeDeclare` before > binding the queue to the exchange. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005)
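The declaration behavior described in this issue can be sketched as a toy model (hypothetical names, not the actual RabbitMqIO or AMQP client API): the reader issues only the declarations its flags allow, while the queue-to-exchange binding happens either way, so a pre-existing durable exchange is left untouched when `exchange_declare` is off.

```python
# Toy model of the proposed flag logic -- hypothetical names, not the real
# org.apache.beam.sdk.io.rabbitmq.RabbitMqIO API. Returns the list of AMQP
# operations a reader would issue against its channel.
def setup_channel(queue, exchange, queue_declare=True, exchange_declare=True):
    ops = []
    if queue_declare:
        ops.append(f"queueDeclare({queue})")
    if exchange is not None:
        if exchange_declare:
            # Current behavior: the exchange is always declared (non-durable),
            # which fails if a durable exchange of the same name exists.
            ops.append(f"exchangeDeclare({exchange})")
        # Binding works against a pre-existing exchange too.
        ops.append(f"queueBind({queue}, {exchange})")
    return ops
```

With `exchange_declare=False`, the declare call is skipped but the bind still happens, which is exactly the behavior the issue proposes.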
[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
[ https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=337008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337008 ] ASF GitHub Bot logged work on BEAM-8513: Author: ASF GitHub Bot Created on: 31/Oct/19 19:36 Start Date: 31/Oct/19 19:36 Worklog Time Spent: 10m Work Description: drobert commented on issue #9937: [BEAM-8513] Allow reads from exchange-bound queue without declaring the exchange URL: https://github.com/apache/beam/pull/9937#issuecomment-548498490 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337008) Time Spent: 1h 40m (was: 1.5h) > RabbitMqIO: Allow reads from exchange-bound queue without declaring the > exchange > > > Key: BEAM-8513 > URL: https://issues.apache.org/jira/browse/BEAM-8513 > Project: Beam > Issue Type: Improvement > Components: io-java-rabbitmq > Environment: testing with DirectRunner >Reporter: Nick Aldwin >Priority: Critical > Time Spent: 1h 40m > Remaining Estimate: 0h > > The RabbitMqIO always declares an exchange if it is configured to read from > it. This is problematic with pre-existing exchanges (a relatively common > pattern), as there's no provided configuration for the exchange beyond > exchange type. (We stumbled on this because RabbitMqIO always declares a > non-durable exchange, which fails if the exchange already exists as a durable > exchange) > > A solution to this would be to allow RabbitMqIO to read from an exchange > without declaring it. This pattern is already available for queues via the > `queueDeclare` flag. I propose an `exchangeDeclare` flag which preserves > existing behavior except for skipping the call to `exchangeDeclare` before > binding the queue to the exchange. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=337007=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337007 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 31/Oct/19 19:27 Start Date: 31/Oct/19 19:27 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r341326775 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java ## @@ -211,9 +211,8 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) { } } - // When project push-down is supported. - if ((options == PushDownOptions.PROJECT || options == PushDownOptions.BOTH) - && !fieldNames.isEmpty()) { + // When project push-down is supported or field reordering is needed. Review comment: Decided to not drop the `Calc` when fields are selected in a different order for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337007) Time Spent: 1h 50m (was: 1h 40m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=337006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337006 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 31/Oct/19 19:26 Start Date: 31/Oct/19 19:26 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r341326424 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java ## @@ -123,7 +129,12 @@ public void onMatch(RelOptRuleCall call) { // 1. Calc only does projects and renames. //And // 2. Predicate can be completely pushed-down to IO level. -if (isProjectRenameOnlyProgram(program) && tableFilter.getNotSupported().isEmpty()) { +//And +// 3. And IO supports project push-down OR all fields are projected by a Calc. +if (isProjectRenameOnlyProgram(program) +&& tableFilter.getNotSupported().isEmpty() +&& (beamSqlTable.supportsProjects() +|| calc.getRowType().getFieldCount() == calcInputRowType.getFieldCount())) { Review comment: I decided not to drop the `Calc` when project push-down is not supported. It might make sense to allow IOs to communicate to the Rule that they support field reordering, to decide whether a Calc should be dropped. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337006) Time Spent: 1h 40m (was: 1.5h) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8434) Allow trigger transcript tests to be run as ValidatesRunner tests.
[ https://issues.apache.org/jira/browse/BEAM-8434?focusedWorklogId=337005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337005 ] ASF GitHub Bot logged work on BEAM-8434: Author: ASF GitHub Bot Created on: 31/Oct/19 19:17 Start Date: 31/Oct/19 19:17 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9832: [BEAM-8434] Translate trigger transcripts into validates runner tests. URL: https://github.com/apache/beam/pull/9832 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337005) Time Spent: 3h 20m (was: 3h 10m) > Allow trigger transcript tests to be run as ValidatesRunner tests. > --- > > Key: BEAM-8434 > URL: https://issues.apache.org/jira/browse/BEAM-8434 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337003 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 31/Oct/19 19:11 Start Date: 31/Oct/19 19:11 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9806: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/9806#discussion_r341320380 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.io.Serializable; +import java.util.regex.Matcher; +import java.util.regex.Pattern; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics; +import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.io.mongodb.MongoDbIO; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.JsonToRow; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.ParDo.SingleOutput; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollection.IsBounded; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting; +import org.bson.Document; +import org.bson.json.JsonMode; +import org.bson.json.JsonWriterSettings; + +@Experimental +public class MongoDbTable extends SchemaBaseBeamTable implements Serializable { + // Should match: mongodb://username:password@localhost:27017/database/collection + @VisibleForTesting + final Pattern locationPattern = + Pattern.compile( + "(?<credsHostPort>mongodb://(?:.*:.*@)?.+:\\d+)/(?<database>.+)/(?<collection>.+)"); + + @VisibleForTesting final String dbCollection; + @VisibleForTesting final String dbName; + @VisibleForTesting final String dbUri; + + MongoDbTable(Table table) { +super(table.getSchema()); + +String location = table.getLocation(); +Matcher matcher = locationPattern.matcher(location); +checkArgument( +matcher.matches(), +"MongoDb location must be in the following format: 
'mongodb://(username:password@)?localhost:27017/database/collection'"); +this.dbUri = matcher.group("credsHostPort"); // "mongodb://localhost:27017" +this.dbName = matcher.group("database"); +this.dbCollection = matcher.group("collection"); + } + + @Override + public PCollection<Row> buildIOReader(PBegin begin) { +// Read MongoDb Documents +PCollection<Document> readDocuments = +MongoDbIO.read() +.withUri(dbUri) +.withDatabase(dbName) +.withCollection(dbCollection) +.expand(begin); + +return readDocuments +// TODO: figure out a way to convert Document directly to Row. +.apply("Convert Document to JSON", createParserParDo()) +.apply("Transform JSON to Row", JsonToRow.withSchema(getSchema())) +.setRowSchema(getSchema()); + } + + @Override + public POutput buildIOWriter(PCollection<Row> input) { +throw new UnsupportedOperationException("Writing to a MongoDB is not supported"); + } + + @Override + public IsBounded isBounded() { +return IsBounded.BOUNDED; + } + + @Override + public BeamTableStatistics getTableStatistics(PipelineOptions options) { +return BeamTableStatistics.BOUNDED_UNKNOWN; Review comment: Awesome, thanks! This is an automated message from the Apache Git Service. To respond to the message,
[jira] [Resolved] (BEAM-8008) Show error message from expansion service in Java External transform
[ https://issues.apache.org/jira/browse/BEAM-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee resolved BEAM-8008. --- Fix Version/s: 2.16.0 Resolution: Fixed > Show error message from expansion service in Java External transform > > > Key: BEAM-8008 > URL: https://issues.apache.org/jira/browse/BEAM-8008 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Labels: portability > Fix For: 2.16.0 > > Time Spent: 1h > Remaining Estimate: 0h > > show error message from expansion service in Java External transform -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=336999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336999 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 31/Oct/19 18:50 Start Date: 31/Oct/19 18:50 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9806: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/9806#discussion_r341311660 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import java.io.Serializable; +import java.util.regex.Matcher; +import java.util.regex.Pattern; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics; +import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable; +import org.apache.beam.sdk.extensions.sql.meta.Table; +import org.apache.beam.sdk.io.mongodb.MongoDbIO; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.JsonToRow; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.ParDo.SingleOutput; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollection.IsBounded; +import org.apache.beam.sdk.values.POutput; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting; +import org.bson.Document; +import org.bson.json.JsonMode; +import org.bson.json.JsonWriterSettings; + +@Experimental +public class MongoDbTable extends SchemaBaseBeamTable implements Serializable { + // Should match: mongodb://username:password@localhost:27017/database/collection + @VisibleForTesting + final Pattern locationPattern = + Pattern.compile( + "(?<credsHostPort>mongodb://(?:.*:.*@)?.+:\\d+)/(?<database>.+)/(?<collection>.+)"); + + @VisibleForTesting final String dbCollection; + @VisibleForTesting final String dbName; + @VisibleForTesting final String dbUri; + + MongoDbTable(Table table) { +super(table.getSchema()); + +String location = table.getLocation(); +Matcher matcher = locationPattern.matcher(location); +checkArgument( +matcher.matches(), +"MongoDb location must be in the following format: 
'mongodb://(username:password@)?localhost:27017/database/collection'"); Review comment: I guess I was expecting something like the MongoDbIO string you linked, where the parens (or square brackets) indicate something is optional on their own. To me this looks like the `?` could be part of the actual location string. This is just a bikeshed; I'm sure it will get the point across to any reasonable user either way, so I'm fine with leaving it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336999) Time Spent: 2h 50m (was: 2h 40m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement
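As a side note, the location pattern under discussion can be exercised stand-alone. Here is a Python approximation (Python spells named groups `(?P<name>...)` where Java uses `(?<name>...)`; the credentials group is left non-capturing since the code never reads it):

```python
import re

# Python approximation of the MongoDbTable location pattern.
LOCATION = re.compile(
    r"(?P<credsHostPort>mongodb://(?:.*:.*@)?.+:\d+)/(?P<database>.+)/(?P<collection>.+)"
)

def parse_location(location):
    """Split a MongoDB table location into (uri, database, collection)."""
    m = LOCATION.match(location)
    if not m:
        raise ValueError(
            "MongoDb location must be in the format "
            "'mongodb://(username:password@)?localhost:27017/database/collection'"
        )
    return m.group("credsHostPort"), m.group("database"), m.group("collection")
```

For example, `parse_location("mongodb://user:pass@localhost:27017/mydb/mycoll")` yields the URI `mongodb://user:pass@localhost:27017` plus database `mydb` and collection `mycoll`; the credentials part is optional, as the `?` after the group indicates.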
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=336988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336988 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 31/Oct/19 18:28 Start Date: 31/Oct/19 18:28 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r341301583 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java ## @@ -211,9 +211,8 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) { } } - // When project push-down is supported. - if ((options == PushDownOptions.PROJECT || options == PushDownOptions.BOTH) - && !fieldNames.isEmpty()) { + // When project push-down is supported or field reordering is needed. Review comment: The problem I was trying to fix by modifying that `if` statement is a scenario where an IO only supports filter push-down and the predicate can be completely pushed down to the IO layer, and all fields are selected (so there is no need to preserve the `Calc` to perform a project), but the selected fields are not in the original order. In that scenario the IO does need to reorder fields (either with a `Select` or by not dropping the `Calc`). I agree that the current `if` statement is incorrect and it should look more like what it used to, but with an additional check. Not quite sure how to check that the `IORel` is not followed by a `CalcRel` and that the fields are not selected in the order they are present in the original schema, but I'm working on it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336988) Time Spent: 1.5h (was: 1h 20m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
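The trade-off discussed in this thread can be illustrated with a toy planner sketch (hypothetical names; this is not Beam's actual BeamIOPushDownRule): the predicate is pushed into the "IO" scan either way, but when the IO cannot project or reorder fields, a Calc-like projection step has to remain on top of the scan.

```python
# Toy illustration of standalone filter push-down. The predicate always moves
# into the scan; the projection/reordering stays in a separate Calc-like step
# unless the IO also supports project push-down.
def plan_read(rows, predicate, selected, io_supports_projects):
    # Filter push-down: the predicate is applied at scan time.
    scanned = [r for r in rows if predicate(r)]
    if io_supports_projects:
        # IO returns only the selected fields, already ordered: no Calc needed.
        return [{f: r[f] for f in selected} for r in scanned], None
    # IO returns full rows; a Calc on top performs the projection/reordering.
    calc = lambda r: {f: r[f] for f in selected}
    return [calc(r) for r in scanned], "Calc"
```

Even when every field is selected, asking for them in a different order forces the "Calc" branch here, which is the field-reordering corner case the review comments describe.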
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336979 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 31/Oct/19 18:12 Start Date: 31/Oct/19 18:12 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#issuecomment-548501188 Right, it's similar but a much more powerful construct than the DoOutputsTuple. The DoOutputsTuple assumes a single, shared producer for each PCollection. The PTuple allows for each PCollection to be an output of a different PTransform. You can have much more interesting composites using a PTuple instead of a DoOutputsTuple. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336979) Time Spent: 14h 10m (was: 14h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 14h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336977=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336977 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 31/Oct/19 18:09 Start Date: 31/Oct/19 18:09 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#issuecomment-548501188 Right, it's similar but a much more powerful construct than the DoOutputsTuple. The DoOutputsTuple assumes a single, shared producer for each PCollection. This allows for each PCollection to be an output of a different PTransform. You can have much more interesting composites using a PTuple instead of a DoOutputsTuple. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336977) Time Spent: 14h (was: 13h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 14h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336976=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336976 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 31/Oct/19 18:08 Start Date: 31/Oct/19 18:08 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#issuecomment-548501188 Right, it's similar but a much more powerful construct than the DoOutputsTuple. The DoOutputsTuple assumes a single producer for each PCollection. This allows for each PCollection to be an output of a different PTransform. You can have much more interesting composites using a PTuple instead of a DoOutputsTuple. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336976) Time Spent: 13h 50m (was: 13h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 13h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336970 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 31/Oct/19 18:06 Start Date: 31/Oct/19 18:06 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#issuecomment-548500321 This is very similar to DoOutputsTuple, but DoOutputsTuple is quite a bit idiosyncratic, and not super reusable. I think this LGTM. But to be sure, I'll also request review from r: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336970) Time Spent: 13.5h (was: 13h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 13.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336971 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 31/Oct/19 18:06 Start Date: 31/Oct/19 18:06 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#issuecomment-548496431

> Can you show some examples of how this would be used? Would this be exposed to users at all? Java has a PCollectionTuple that is used in transforms like CoGBK, and Flatten. In Python we use a tuple containing PCollections e.g. `(pcoll1, pcoll2, pcoll3) | Flatten()`.
>
> How would `PTuple` be used?

`PTuple`s will be used like:

```
class SomeCustomComposite(PTransform):
  def expand(self, pcoll):
    def my_multi_do_fn(x):
      if isinstance(x, int):
        yield pvalue.TaggedOutput('number', x)
      if isinstance(x, str):
        yield pvalue.TaggedOutput('string', x)

    def printer(x):
      print(x)
      yield x

    outputs = pcoll | beam.ParDo(my_multi_do_fn).with_outputs()
    return pvalue.PTuple({
        'number': outputs.number | beam.ParDo(printer),
        'string': outputs.string | beam.ParDo(printer)
    })

p = beam.Pipeline()
main = p | SomeCustomComposite()

# Prints out each element that is a number.
numbers = main.number | beam.ParDo(...)

# Prints out each element that is a string.
strings = main.string | beam.ParDo(...)
```

This is a fundamentally different and necessary type than what is available in the SDK. It allows for referencing an arbitrary number of PCollections by name, each with a different producer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336971) Time Spent: 13h 40m (was: 13.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 13h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336963 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 31/Oct/19 18:03 Start Date: 31/Oct/19 18:03 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#issuecomment-548496431

> Can you show some examples of how this would be used? Would this be exposed to users at all? Java has a PCollectionTuple that is used in transforms like CoGBK, and Flatten. In Python we use a tuple containing PCollections e.g. `(pcoll1, pcoll2, pcoll3) | Flatten()`.
>
> How would `PTuple` be used?

`PTuple`s will be used like:

```
class SomeCustomComposite(PTransform):
  def expand(self, pcoll):
    def my_multi_do_fn(x):
      if isinstance(x, int):
        yield pvalue.TaggedOutput('number', x)
      if isinstance(x, str):
        yield pvalue.TaggedOutput('string', x)

    def printer(x):
      print(x)
      yield x

    outputs = pcoll | beam.ParDo(my_multi_do_fn).with_outputs()
    return pvalue.PTuple({
        'number': outputs.number | beam.ParDo(printer),
        'string': outputs.string | beam.ParDo(printer)
    })

p = beam.Pipeline()
main = p | SomeCustomComposite()

# Prints out each element that is a number.
numbers = main.number | beam.ParDo(...)

# Prints out each element that is a string.
strings = main.string | beam.ParDo(...)
```

This is a fundamentally different and necessary type than what is available in the SDK. It allows for referencing an arbitrary number of PCollections by name, each with a different producer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336963) Time Spent: 13h 20m (was: 13h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 13h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
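The access pattern described in the comments above — a named bag of PCollections, each possibly produced by a different transform — can be sketched as a plain Python class. This is only an illustrative stand-in, not Beam's actual `pvalue.PTuple` implementation; the plain strings standing in for PCollections and the method set are assumptions of the sketch.

```python
class PTuple:
    """Minimal sketch: a dict of tag -> value with attribute-style access.

    Unlike a DoOutputsTuple, nothing here assumes a single producer:
    each tagged entry may come from a different upstream transform.
    """

    def __init__(self, pcollections):
        # tag -> value (in Beam, the values would be PCollections)
        self._pcollections = dict(pcollections)

    def __getattr__(self, tag):
        # main.number style access by tag
        try:
            return self._pcollections[tag]
        except KeyError:
            raise AttributeError(tag)

    def __getitem__(self, tag):
        # main['number'] style access by tag
        return self._pcollections[tag]

    def __iter__(self):
        return iter(self._pcollections.values())


# Plain strings stand in for PCollections in this sketch.
main = PTuple({'number': 'numbers-pcoll', 'string': 'strings-pcoll'})
print(main.number)
print(main['string'])
```

Attribute access by tag is what makes the composite's outputs read naturally at the call site (`main.number | beam.ParDo(...)`), while iteration keeps the tuple usable wherever a plain collection of PCollections is expected.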
[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
[ https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=336961=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336961 ] ASF GitHub Bot logged work on BEAM-8513: Author: ASF GitHub Bot Created on: 31/Oct/19 18:01 Start Date: 31/Oct/19 18:01 Worklog Time Spent: 10m Work Description: drobert commented on issue #9937: [BEAM-8513] Allow reads from exchange-bound queue without declaring the exchange URL: https://github.com/apache/beam/pull/9937#issuecomment-548389063 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336961) Time Spent: 1h 20m (was: 1h 10m) > RabbitMqIO: Allow reads from exchange-bound queue without declaring the > exchange > > > Key: BEAM-8513 > URL: https://issues.apache.org/jira/browse/BEAM-8513 > Project: Beam > Issue Type: Improvement > Components: io-java-rabbitmq > Environment: testing with DirectRunner >Reporter: Nick Aldwin >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h > > The RabbitMqIO always declares an exchange if it is configured to read from > it. This is problematic with pre-existing exchanges (a relatively common > pattern), as there's no provided configuration for the exchange beyond > exchange type. (We stumbled on this because RabbitMqIO always declares a > non-durable exchange, which fails if the exchange already exists as a durable > exchange) > > A solution to this would be to allow RabbitMqIO to read from an exchange > without declaring it. This pattern is already available for queues via the > `queueDeclare` flag. I propose an `exchangeDeclare` flag which preserves > existing behavior except for skipping the call to `exchangeDeclare` before > binding the queue to the exchange. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8513) RabbitMqIO: Allow reads from exchange-bound queue without declaring the exchange
[ https://issues.apache.org/jira/browse/BEAM-8513?focusedWorklogId=336962=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336962 ] ASF GitHub Bot logged work on BEAM-8513: Author: ASF GitHub Bot Created on: 31/Oct/19 18:02 Start Date: 31/Oct/19 18:02 Worklog Time Spent: 10m Work Description: drobert commented on issue #9937: [BEAM-8513] Allow reads from exchange-bound queue without declaring the exchange URL: https://github.com/apache/beam/pull/9937#issuecomment-548498490 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336962) Time Spent: 1.5h (was: 1h 20m) > RabbitMqIO: Allow reads from exchange-bound queue without declaring the > exchange > > > Key: BEAM-8513 > URL: https://issues.apache.org/jira/browse/BEAM-8513 > Project: Beam > Issue Type: Improvement > Components: io-java-rabbitmq > Environment: testing with DirectRunner >Reporter: Nick Aldwin >Priority: Critical > Time Spent: 1.5h > Remaining Estimate: 0h > > The RabbitMqIO always declares an exchange if it is configured to read from > it. This is problematic with pre-existing exchanges (a relatively common > pattern), as there's no provided configuration for the exchange beyond > exchange type. (We stumbled on this because RabbitMqIO always declares a > non-durable exchange, which fails if the exchange already exists as a durable > exchange) > > A solution to this would be to allow RabbitMqIO to read from an exchange > without declaring it. This pattern is already available for queues via the > `queueDeclare` flag. I propose an `exchangeDeclare` flag which preserves > existing behavior except for skipping the call to `exchangeDeclare` before > binding the queue to the exchange. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964253#comment-16964253 ] Kyle Weaver commented on BEAM-8521: --- I think I have tracked down the Python failure: https://github.com/apache/beam/pull/9842#issuecomment-548496295

> beam_PostCommit_XVR_Flink failing
> ---------------------------------
>
> Key: BEAM-8521
> URL: https://issues.apache.org/jira/browse/BEAM-8521
> Project: Beam
> Issue Type: Bug
> Components: test-failures
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink/
> There appear to be two failures:
>
> test_java_expansion_portable_runner:
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction 72: Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 173, in _execute
>     response = task()
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 196, in <lambda>
>     self._execute(lambda: worker.do_instruction(work), work)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 358, in do_instruction
>     request.instruction_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 378, in process_bundle
>     instruction_id, request.process_bundle_descriptor_id)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
>     self.fns[bundle_descriptor_id],
> KeyError: u'1-37'
>
> XlangParquetIOTest:
> [grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
>   at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
>   at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
>   at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
>   at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
>   at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>   at
[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=336959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336959 ] ASF GitHub Bot logged work on BEAM-5878: Author: ASF GitHub Bot Created on: 31/Oct/19 17:57 Start Date: 31/Oct/19 17:57 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9686: [WIP][BEAM-5878] update dill min version to 0.3.1.1 and add test for functions with Keyword-only arguments URL: https://github.com/apache/beam/pull/9686#issuecomment-548496713 Hi @lazylynx , could you please give this PR another try after resolving conflicts? We should have fixed the test failures now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336959) Time Spent: 15h 20m (was: 15h 10m) > Support DoFns with Keyword-only arguments in Python 3. > ------------------------------------------------------ > > Key: BEAM-5878 > URL: https://issues.apache.org/jira/browse/BEAM-5878 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core > Reporter: Valentyn Tymofieiev > Assignee: yoshiki obata > Priority: Minor > Time Spent: 15h 20m > Remaining Estimate: 0h > > Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to > define functions with keyword-only arguments. > Currently Beam does not handle them correctly. [~ruoyu] pointed out [one > place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118] > in our codebase that we should fix: in Python 3.0 inspect.getargspec() > will fail on functions with keyword-only arguments, but a new method > [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec] > supports them. > There may be implications for our (best-effort) type-hints machinery. 
> We should also add Py3-only unit tests that cover DoFns with keyword-only > arguments once Beam Python 3 tests are in good shape. -- This message was sent by Atlassian Jira (v8.3.4#803005)
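The `inspect.getargspec()` vs `inspect.getfullargspec()` distinction raised in the issue is easy to demonstrate. A minimal sketch — the DoFn-style function and its parameter names below are hypothetical, not Beam code:

```python
import inspect

# A function with keyword-only arguments (PEP 3102): every parameter after
# the bare * must be passed by keyword.
def process(element, *, window=None, timestamp=None):
    return element

# The legacy inspect.getargspec() cannot describe such signatures (it was
# deprecated and removed in Python 3.11); inspect.getfullargspec() reports
# keyword-only parameters explicitly via kwonlyargs / kwonlydefaults.
spec = inspect.getfullargspec(process)
print(spec.args)            # positional parameters
print(spec.kwonlyargs)      # keyword-only parameter names, in definition order
print(spec.kwonlydefaults)  # mapping of keyword-only names to their defaults
```

Any introspection-based machinery (like Beam's type-hints decorators) needs to read `kwonlyargs` in addition to `args`, or keyword-only parameters are silently invisible to it.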
[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=336957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336957 ] ASF GitHub Bot logged work on BEAM-5878: Author: ASF GitHub Bot Created on: 31/Oct/19 17:56 Start Date: 31/Oct/19 17:56 Worklog Time Spent: 10m Work Description: lazylynx commented on pull request #9686: [WIP][BEAM-5878] update dill min version to 0.3.1.1 and add test for functions with Keyword-only arguments URL: https://github.com/apache/beam/pull/9686 update dill minimum version to 0.3.1.1 in setup.py and re-add unittest for keyword-only arguments from #9237

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch): [table of Jenkins build-status badges for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners; truncated in the original]
[jira] [Resolved] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data sometimes fails due to infinite recursion during pickling on Python 3.7.
[ https://issues.apache.org/jira/browse/BEAM-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev resolved BEAM-8397. --- Fix Version/s: 2.17.0 Resolution: Fixed > DataflowRunnerTest.test_remote_runner_display_data sometimes fails due to > infinite recursion during pickling on Python 3.7. > --- > > Key: BEAM-8397 > URL: https://issues.apache.org/jira/browse/BEAM-8397 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > `python ./setup.py test -s > apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data` > passes. > `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam > depends on dill==0.3.1.1.`python ./setup.py nosetests --tests > 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data` > fails currently if run on master. > The failure indicates infinite recursion during pickling: > {noformat} > test_remote_runner_display_data > (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... > Fatal Python error: Cannot recover from stack overflow. 
> Current thread 0x7f9d700ed740 (most recent call first): > File "/usr/lib/python3.7/pickle.py", line 479 in get > File "/usr/lib/python3.7/pickle.py", line 497 in save > File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple > File "/usr/lib/python3.7/pickle.py", line 504 in save > File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce > File > "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", > line 1394 in save_function > File "/usr/lib/python3.7/pickle.py", line 504 in save > File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems > File "/usr/lib/python3.7/pickle.py", line 856 in save_dict > File > "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", > line 910 in save_module_dict > File > "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py", > line 198 in new_save_module_dict > File "/usr/lib/python3.7/pickle.py", line 504 in save > File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple > File "/usr/lib/python3.7/pickle.py", line 504 in save > File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce > File > "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py", > line 114 in wrapper > File "/usr/lib/python3.7/pickle.py", line 504 in save > File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple > File "/usr/lib/python3.7/pickle.py", line 504 in save > File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce > File > "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", > line 1137 in save_cell > File "/usr/lib/python3.7/pickle.py", line 504 in save > File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple > File "/usr/lib/python3.7/pickle.py", line 504 in save > File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple > File "/usr/lib/python3.7/pickle.py", line 504 in save > File 
"/usr/lib/python3.7/pickle.py", line 638 in save_reduce > File > "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", > line 1394 in save_function > File "/usr/lib/python3.7/pickle.py", line 504 in save > File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems > File "/usr/lib/python3.7/pickle.py", line 856 in save_dict > File > "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", > line 910 in save_module_dict > File > "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py", > line 198 in new_save_module_dict > ... > {noformat} > cc: [~yoshiki.obata] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336958 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 31/Oct/19 17:56 Start Date: 31/Oct/19 17:56 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#issuecomment-548496431

> Can you show some examples of how this would be used? Would this be exposed to users at all? Java has a PCollectionTuple that is used in transforms like CoGBK, and Flatten. In Python we use a tuple containing PCollections e.g. `(pcoll1, pcoll2, pcoll3) | Flatten()`.
>
> How would `PTuple` be used?

`PTuple`s will be used like:

```
class SomeCustomTransform(PTransform):
  def expand(self, pcoll):
    def my_multi_do_fn(x):
      if isinstance(x, int):
        yield pvalue.TaggedOutput('number', x)
      if isinstance(x, str):
        yield pvalue.TaggedOutput('string', x)

    def printer(x):
      print(x)
      yield x

    outputs = pcoll | beam.ParDo(my_multi_do_fn).with_outputs()
    return pvalue.PTuple({
        'number': outputs.number | beam.ParDo(printer),
        'string': outputs.string | beam.ParDo(printer)
    })

p = beam.Pipeline()
main = p | SomeCustomTransform()

# Prints out each element that is a number.
numbers = main.number | beam.ParDo(...)

# Prints out each element that is a string.
strings = main.string | beam.ParDo(...)
```

This is a fundamentally different and necessary type than what is available in the SDK. It allows for referencing an arbitrary number of PCollections by name, each with a different producer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336958) Time Spent: 13h 10m (was: 13h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 13h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=336956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336956 ] ASF GitHub Bot logged work on BEAM-8442: Author: ASF GitHub Bot Created on: 31/Oct/19 17:56 Start Date: 31/Oct/19 17:56 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9842: [BEAM-8442] Remove duplicate code for bundle register in Python SDK harness URL: https://github.com/apache/beam/pull/9842#issuecomment-548496295 I think this PR is causing [beam_PostCommit_XVR_Flink](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/) to fail ([BEAM-8521](https://issues.apache.org/jira/browse/BEAM-8521)). (There are two test failures; this PR seems related to the failure in `test_java_expansion_portable_runner`.) Example stack trace:

```
File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 311, in get
  self.fns[bundle_descriptor_id],
KeyError: u'1-37'
```

I suspect the reason is that this PR changed the `register` instruction call from synchronous to asynchronous. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336956) Time Spent: 1h 10m (was: 1h) > Unify bundle register in Python SDK harness > ------------------------------------------- > > Key: BEAM-8442 > URL: https://issues.apache.org/jira/browse/BEAM-8442 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness > Reporter: sunjincheng > Assignee: sunjincheng > Priority: Major > Fix For: 2.18.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > There are two methods for bundle register in Python SDK harness: > `SdkHarness._request_register` and `SdkWorker.register`. It should be unified. -- This message was sent by Atlassian Jira (v8.3.4#803005)
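The synchronous-vs-asynchronous suspicion in the comment above can be illustrated with a toy model. This is not the actual `sdk_worker.py` code — the functions, the `fns` dict, and the descriptor IDs are hypothetical stand-ins — but it shows why dispatching `register` asynchronously can surface as a `KeyError` in `process_bundle`:

```python
import threading
import time

fns = {}  # descriptor_id -> bundle processor, populated by "register"

def register(descriptor_id, delay=0.0):
    time.sleep(delay)          # simulate the register request still in flight
    fns[descriptor_id] = object()

def process_bundle(descriptor_id):
    return fns[descriptor_id]  # KeyError if register has not completed yet

# Synchronous handling gives a happens-before guarantee:
register("1-36")
process_bundle("1-36")         # always finds the descriptor

# Asynchronous handling loses that guarantee:
t = threading.Thread(target=register, args=("1-37", 0.2))
t.start()
try:
    process_bundle("1-37")     # races with the in-flight register
except KeyError as e:
    print("KeyError:", e)      # the failure mode seen in the Jenkins log
t.join()
```

Unifying the two register code paths is fine in itself; the ordering guarantee between handling the `register` instruction and handling later `process_bundle` instructions is what has to be preserved.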
[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data sometimes fails due to infinite recursion during pickling on Python 3.7.
[ https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=336955=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336955 ] ASF GitHub Bot logged work on BEAM-8397: Author: ASF GitHub Bot Created on: 31/Oct/19 17:55 Start Date: 31/Oct/19 17:55 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #9881: [BEAM-8397] Fix infinite recursion errors in test_remote_runner_display_data_test on Python 3.7. URL: https://github.com/apache/beam/pull/9881 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336955) Time Spent: 2.5h (was: 2h 20m) > DataflowRunnerTest.test_remote_runner_display_data sometimes fails due to > infinite recursion during pickling on Python 3.7. > --- > > Key: BEAM-8397 > URL: https://issues.apache.org/jira/browse/BEAM-8397 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > `python ./setup.py test -s > apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data` > passes. > `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam > depends on dill==0.3.1.1.`python ./setup.py nosetests --tests > 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data` > fails currently if run on master. > The failure indicates infinite recursion during pickling: > {noformat} > test_remote_runner_display_data > (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... > Fatal Python error: Cannot recover from stack overflow. 
> Current thread 0x7f9d700ed740 (most recent call first):
> File "/usr/lib/python3.7/pickle.py", line 479 in get
> File "/usr/lib/python3.7/pickle.py", line 497 in save
> File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
> File "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", line 1394 in save_function
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
> File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
> File "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", line 910 in save_module_dict
> File "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py", line 198 in new_save_module_dict
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
> File "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py", line 114 in wrapper
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
> File "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", line 1137 in save_cell
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
> File "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", line 1394 in save_function
> File "/usr/lib/python3.7/pickle.py", line 504 in save
> File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
> File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
> File "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py", line 910 in save_module_dict
> File
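The stack above loops through dill's save_function and pickle's save_reduce/save_tuple frames, the signature of a self-referential reduce cycle. A minimal sketch of that failure mode, assuming nothing about Beam's or dill's actual code (the builtin `abs` is only a stand-in reconstructor): an object whose `__reduce__` places itself in the reconstructor's argument tuple recurses without terminating, because pickle saves the arguments before memoizing the object.

```python
import pickle

class SelfReducing:
    """Hypothetical illustration, not Beam's code: a reduce cycle."""

    def __reduce__(self):
        # `abs` is a stand-in reconstructor; what matters is that the
        # args tuple contains `self`. pickle saves the args tuple
        # *before* memoizing `self`, so it re-enters __reduce__ for the
        # same object on every level of the recursion.
        return (abs, (self,))

try:
    pickle.dumps(SelfReducing())
    outcome = "pickled"
except RecursionError:
    outcome = "RecursionError"
print(outcome)
```

Pure-Python pickling converts the runaway recursion into a catchable RecursionError; the `Fatal Python error` above is the harsher C-level variant of the same unbounded descent.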
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336952=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336952 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 31/Oct/19 17:47 Start Date: 31/Oct/19 17:47 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#issuecomment-548492290 Can you show some examples of how this would be used? Would this be exposed to users at all? Java has a PCollectionTuple that is used in transforms like CoGBK, and Flatten. In Python we use a tuple containing PCollections e.g. `(pcoll1, pcoll2, pcoll3) | Flatten()`. How would `PTuple` be used? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336952) Time Spent: 13h (was: 12h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 13h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
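To make the comparison concrete, here is a hedged sketch of what a PCollectionTuple-like type could look like in Python. This is a hypothetical illustration, not the implementation proposed in apache/beam#9954: plain lists stand in for PCollections, and the class is just a tag-to-value mapping where each member may come from a different producer.

```python
class PTuple:
    """Hypothetical sketch of a PCollectionTuple-like type: an
    immutable mapping from tags to PCollections, where each
    PCollection may have its own producer (unlike DoOutputsTuple,
    which assumes a single producing DoFn)."""

    def __init__(self, **tagged_pcollections):
        self._tagged = dict(tagged_pcollections)

    def __getitem__(self, tag):
        return self._tagged[tag]

    def __iter__(self):
        # Iterating yields the member PCollections, e.g. to feed
        # them to Flatten as a group.
        return iter(self._tagged.values())

    def keys(self):
        return self._tagged.keys()

# Plain lists stand in for PCollections from two different transforms.
result = PTuple(main=[1, 2, 3], errors=["bad row"])
print(result["errors"])          # -> ['bad row']
print(sorted(result.keys()))     # -> ['errors', 'main']
```

Under this sketch, the tuple-of-PCollections Flatten idiom still works (`tuple(result) | Flatten()` in Beam terms), while tagged access gives composites a stable way to name each output.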
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=336951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336951 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 31/Oct/19 17:46 Start Date: 31/Oct/19 17:46 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r341279712
## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
## @@ -211,9 +211,8 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) { } }
- // When project push-down is supported.
- if ((options == PushDownOptions.PROJECT || options == PushDownOptions.BOTH)
- && !fieldNames.isEmpty()) {
+ // When project push-down is supported or field reordering is needed.
Review comment: This is interesting. I wouldn't expect an IO to support field reordering unless it also supports push-down. Can you explain what is going on here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336951) Time Spent: 1h 20m (was: 1h 10m)
> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support project push-down.
> In order to accomplish that, some checks need to be added so that certain Calc and IO manipulations are not performed when only filter push-down is needed.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
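The Beam implementation under review is Java (BeamIOPushDownRule), but the core idea of standalone filter push-down can be sketched generically in Python, under assumed names (`split_filter`, `scan`): split the predicates into those the source can evaluate on supported fields (pushed down to the "IO level") and a residual that the Calc applies afterwards.

```python
def split_filter(predicates, pushable_fields):
    """Split (fields, fn) predicates into a pushed-down list (every
    referenced field is supported by the source) and a residual list
    evaluated after the scan."""
    pushed, residual = [], []
    for fields, fn in predicates:
        (pushed if set(fields) <= pushable_fields else residual).append(fn)
    return pushed, residual

def scan(rows, pushed):
    # Stand-in for the IO: applies pushed predicates while reading.
    return [r for r in rows if all(fn(r) for fn in pushed)]

rows = [{"id": 1, "name": "a"}, {"id": 5, "name": "b"}]
predicates = [
    (["id"], lambda r: r["id"] > 2),         # source supports "id"
    (["name"], lambda r: r["name"] != "a"),  # "name" is not pushable
]
pushed, residual = split_filter(predicates, {"id"})
out = [r for r in scan(rows, pushed) if all(fn(r) for fn in residual)]
print(out)  # -> [{'id': 5, 'name': 'b'}]
```

Note that nothing here touches projection: the source returns full rows, which is exactly the case the issue describes, filter push-down without project push-down.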
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=336950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336950 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 31/Oct/19 17:46 Start Date: 31/Oct/19 17:46 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r341280735
## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
## @@ -123,7 +129,12 @@ public void onMatch(RelOptRuleCall call) {
 // 1. Calc only does projects and renames.
 //And
 // 2. Predicate can be completely pushed-down to IO level.
-if (isProjectRenameOnlyProgram(program) && tableFilter.getNotSupported().isEmpty()) {
+//And
+// 3. And IO supports project push-down OR all fields are projected by a Calc.
+if (isProjectRenameOnlyProgram(program)
+&& tableFilter.getNotSupported().isEmpty()
+&& (beamSqlTable.supportsProjects()
+|| calc.getRowType().getFieldCount() == calcInputRowType.getFieldCount())) {
Review comment: I think this needs to verify the order as well as the count. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336950) Time Spent: 1h 20m (was: 1h 10m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
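The review comment's point, that a field-count check alone misses reordering, can be shown with a small Python sketch (a generic illustration under an assumed helper name, not the Beam Java code): two projections can have the same field count while one of them reorders the input.

```python
def is_identity_projection(projected_fields, input_fields):
    """A Calc does no projection work only if it keeps every input
    field in the original order. Comparing counts alone would accept
    ('b', 'a') against ('a', 'b'), so compare the full sequences."""
    return list(projected_fields) == list(input_fields)

print(is_identity_projection(["a", "b"], ["a", "b"]))  # True
print(is_identity_projection(["b", "a"], ["a", "b"]))  # False: reordered
print(is_identity_projection(["a"], ["a", "b"]))       # False: dropped
```

Under this check, a reordering Calc is kept in the plan (or handled by an IO that supports projects) instead of being silently dropped.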
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336948 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 31/Oct/19 17:43 Start Date: 31/Oct/19 17:43 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#issuecomment-548490521 R: @pabloem for review please and thank you! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336948) Time Spent: 12h 50m (was: 12h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 12h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=336945=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336945 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 31/Oct/19 17:38 Start Date: 31/Oct/19 17:38 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954 This PR adds the PTuple, which is analogous to Java's PCollectionTuple. This is a very useful output type for composites that want to emit multiple PCollections where each has its own producer. This is a fundamentally different and more general type than the DoOutputsTuple, which assumes a single producer for all of its PCollections; the PTuple does not. This is being used in https://github.com/apache/beam/pull/9953.