[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=96547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-96547 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 30/Apr/18 05:34
Start Date: 30/Apr/18 05:34
Worklog Time Spent: 10m

Work Description: chamikaramj closed pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/NaiveSpannerRead.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/NaiveSpannerRead.java
new file mode 100644
index 000..3b68d9f91a8
--- /dev/null
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/NaiveSpannerRead.java
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.spanner;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.spanner.BatchReadOnlyTransaction;
+import com.google.cloud.spanner.ResultSet;
+import com.google.cloud.spanner.Struct;
+import com.google.cloud.spanner.TimestampBound;
+import com.google.common.annotations.VisibleForTesting;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/** A naive version of Spanner read that doesn't use the Batch API. */
+@VisibleForTesting
+@AutoValue
+abstract class NaiveSpannerRead
+    extends PTransform<PCollection<ReadOperation>, PCollection<Struct>> {
+
+  public static NaiveSpannerRead create(SpannerConfig spannerConfig,
+      PCollectionView<Transaction> txView, TimestampBound timestampBound) {
+    return new AutoValue_NaiveSpannerRead(spannerConfig, txView, timestampBound);
+  }
+
+  abstract SpannerConfig getSpannerConfig();
+
+  @Nullable
+  abstract PCollectionView<Transaction> getTxView();
+
+  abstract TimestampBound getTimestampBound();
+
+  @Override
+  public PCollection<Struct> expand(PCollection<ReadOperation> input) {
+    PCollectionView<Transaction> txView = getTxView();
+    if (txView == null) {
+      Pipeline begin = input.getPipeline();
+      SpannerIO.CreateTransaction createTx = SpannerIO.createTransaction()
+          .withSpannerConfig(getSpannerConfig()).withTimestampBound(getTimestampBound());
+      txView = begin.apply(createTx);
+    }
+
+    return input.apply("Naive read from Cloud Spanner",
+        ParDo.of(new NaiveSpannerReadFn(getSpannerConfig(), txView)).withSideInputs(txView));
+  }
+
+  private static class NaiveSpannerReadFn extends DoFn<ReadOperation, Struct> {
+
+    private final SpannerConfig config;
+    @Nullable private final PCollectionView<Transaction> txView;
+    private transient SpannerAccessor spannerAccessor;
+
+    NaiveSpannerReadFn(SpannerConfig config, @Nullable PCollectionView<Transaction> transaction) {
+      this.config = config;
+      this.txView = transaction;
+    }
+
+    @Setup
+    public void setup() throws Exception {
+      spannerAccessor = config.connectToSpanner();
+    }
+
+    @Teardown
+    public void teardown() throws Exception {
+      spannerAccessor.close();
+    }
+
+    @ProcessElement
+    public void processElement(ProcessContext c) throws Exception {
+      Transaction tx = c.sideInput(txView);
+      ReadOperation op = c.element();
+      BatchReadOnlyTransaction context = spannerAccessor.getBatchClient()
+          .batchReadOnlyTransaction(tx.transactionId());
+      try (ResultSet resultSet = execute(op, context)) {
+        while (resultSet.next()) {
+          c.output(resultSet.getCurrentRowAsStruct());
+        }
+      }
+    }
+
+    private ResultSet exe
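The control flow of `NaiveSpannerRead` can be summarized without Beam or Spanner on the classpath. The following is a plain-Java model, not the PR's code: `Snapshot`, `NaiveReadModel`, `resolveSnapshot` and `readAll` are hypothetical names. It mirrors two points visible in the diff: `expand()` reuses the caller's transaction view or creates one with `SpannerIO.createTransaction()`, and `processElement()` reopens that same batch read-only transaction for every `ReadOperation` and emits every row of one unpartitioned result set. The body of `execute(...)` is cut off in the diff, so the model leaves execution to a caller-supplied function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Supplier;

/** Hypothetical stand-in for the shared read-only transaction (a fixed read timestamp). */
final class Snapshot {
  final long readTimestamp;

  Snapshot(long readTimestamp) {
    this.readTimestamp = readTimestamp;
  }
}

/** Hypothetical plain-Java model of the naive (non-batched) read path. */
final class NaiveReadModel {
  private NaiveReadModel() {}

  // Like expand(): use the caller's transaction if one was supplied, otherwise
  // create one, so every read operation is served from the same snapshot.
  static Snapshot resolveSnapshot(Snapshot provided, Supplier<Snapshot> create) {
    return provided != null ? provided : create.get();
  }

  // Like processElement(): run each operation once, unpartitioned, against the
  // shared snapshot and collect every row it returns, in order.
  static List<String> readAll(List<String> operations, Snapshot snapshot,
      BiFunction<String, Snapshot, List<String>> execute) {
    List<String> rows = new ArrayList<>();
    for (String op : operations) {
      rows.addAll(execute.apply(op, snapshot));
    }
    return rows;
  }
}
```

The loop is sequential only to keep the model small; in the real transform each `ReadOperation` is a separate element of the `ParDo` and may be processed on a different worker, with consistency coming from the shared side-input transaction.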
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=95775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95775 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 26/Apr/18 23:30
Start Date: 26/Apr/18 23:30
Worklog Time Spent: 10m

Work Description: mairbek commented on issue #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-384819222

@chamikaramj PTAL

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 95775)
Time Spent: 3.5h (was: 3h 20m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.4.0
> Reporter: Mairbek Khadikov
> Assignee: Mairbek Khadikov
> Priority: Major
> Fix For: 2.5.0
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read was migrated to use the batch API. The batch API
> provides abstractions to scale out reads from Spanner, but it requires the
> query to be root-partitionable. Root-partitionable queries cover the majority
> of use cases; however, there are cases where running an arbitrary query is
> useful. For example, one might read all the table names from
> information_schema.* and then read the contents of those tables in the next
> step.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
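The two-step use case from the issue description can be sketched as plain Java that only builds the query strings. Assumptions not taken from the PR: the class `TwoStepRead`, the method `perTableQueries`, and the exact `information_schema.tables` filter (user tables are assumed to have an empty `table_catalog` and `table_schema`). Per the issue, a metadata query like step 1 is the kind of statement that needs the non-batched path, which the PR exposes as `.withBatching(false)` on `SpannerIO.read()`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper sketching the "list tables, then read them" pattern. */
final class TwoStepRead {
  private TwoStepRead() {}

  // Step 1: list user tables (assumed filter: empty catalog and schema).
  // In SpannerIO this query would be read with a SpannerConfig plus
  // SpannerIO.read().withQuery(LIST_TABLES).withBatching(false).
  static final String LIST_TABLES =
      "SELECT t.table_name FROM information_schema.tables AS t"
          + " WHERE t.table_catalog = '' AND t.table_schema = ''";

  // Step 2: turn each table name returned by step 1 into a full-table query.
  // Names are assumed to be plain identifiers that need no quoting.
  static List<String> perTableQueries(List<String> tableNames) {
    List<String> queries = new ArrayList<>(tableNames.size());
    for (String name : tableNames) {
      queries.add("SELECT * FROM " + name);
    }
    return queries;
  }
}
```

Simple full-table scans like those produced in step 2 are typically root-partitionable, so that step could plausibly keep the default batched path; only the metadata query in step 1 needs batching disabled.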
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=95772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95772 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 26/Apr/18 23:25
Start Date: 26/Apr/18 23:25
Worklog Time Spent: 10m

Work Description: mairbek commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r184556370

## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
##
@@ -329,12 +333,26 @@ public ReadAll withTimestampBound(TimestampBound timestampBound) {
       return toBuilder().setTimestampBound(timestampBound).build();
     }
 
+    /** If true the uses Cloud Spanner batch API. */

Review comment:
Done. I think a `without` method would be more confusing.

Issue Time Tracking
---
Worklog Id: (was: 95772)
Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=95773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95773 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 26/Apr/18 23:25
Start Date: 26/Apr/18 23:25
Worklog Time Spent: 10m

Work Description: mairbek commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r184557523

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +184,59 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  private SpannerConfig createSpannerConfig() {
+    return SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+  }
+
+  @Test
+  public void testReadAllRecordsInDb() throws Exception {
+    DatabaseClient databaseClient = getDatabaseClient();
+
+    List<Mutation> mutations = new ArrayList<>();

Review comment:
Done

Issue Time Tracking
---
Worklog Id: (was: 95773)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=95774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95774 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 26/Apr/18 23:25
Start Date: 26/Apr/18 23:25
Worklog Time Spent: 10m

Work Description: mairbek commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r184556560

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =
+        spanner.getDatabaseClient(
+            DatabaseId.of(
+                project, options.getInstanceId(), databaseName));
+
+    List<Mutation> mutations = new ArrayList<>();
+    for (int i = 0; i < 5L; i++) {
+      mutations.add(
+          Mutation.newInsertOrUpdateBuilder(options.getTable())
+              .set("key")
+              .to((long) i)
+              .set("value")
+              .to(RandomUtils.randomAlphaNumeric(100))
+              .build());
+    }
+
+    databaseClient.writeAtLeastOnce(mutations);
+
+    SpannerConfig spannerConfig = SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+
+    PCollectionView<Transaction> tx =
+        p.apply(
+            SpannerIO.createTransaction()
+                .withSpannerConfig(spannerConfig)
+                .withTimestampBound(TimestampBound.strong()));
+
+    PCollection<Struct> allRecords = p.apply(SpannerIO.read()
+        .withSpannerConfig(spannerConfig)
+        .withBatching(false)

Review comment:
Ah, unfortunately the Cloud Spanner query planner is not open source, so I don't think we can implement this today.

Issue Time Tracking
---
Worklog Id: (was: 95774)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88687 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 07/Apr/18 02:24
Start Date: 07/Apr/18 02:24
Worklog Time Spent: 10m

Work Description: chamikaramj commented on issue #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-379425746

Please let me know when the comments are addressed.

Issue Time Tracking
---
Worklog Id: (was: 88687)
Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88178 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 18:15
Start Date: 05/Apr/18 18:15
Worklog Time Spent: 10m

Work Description: chamikaramj commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179554941

## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
##
@@ -329,12 +333,26 @@ public ReadAll withTimestampBound(TimestampBound timestampBound) {
       return toBuilder().setTimestampBound(timestampBound).build();
     }
 
+    /** If true the uses Cloud Spanner batch API. */

Review comment:
Can you clarify in the documentation that batching is the default? Alternatively, how about just having a method withoutBatching() that can be used to disable batching?

Issue Time Tracking
---
Worklog Id: (was: 88178)
Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88179 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 18:15
Start Date: 05/Apr/18 18:15
Worklog Time Spent: 10m

Work Description: chamikaramj commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179554941

## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
##
@@ -329,12 +333,26 @@ public ReadAll withTimestampBound(TimestampBound timestampBound) {
       return toBuilder().setTimestampBound(timestampBound).build();
     }
 
+    /** If true the uses Cloud Spanner batch API. */

Review comment:
Can you clarify in the documentation that batching is the default? Alternatively, how about just having a method withoutBatching() that can be used to disable batching?

Issue Time Tracking
---
Worklog Id: (was: 88179)
Time Spent: 2h 40m (was: 2.5h)
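The reviewer's request (document that batching is on by default) and the author's preference for a single boolean setter can be combined as in the sketch below. `ReadSettings` is a hypothetical immutable stand-in for the AutoValue-generated `Read`/`ReadAll` transforms; only the `withBatching(boolean)` name comes from the PR, and the javadoc wording is a suggestion, not the merged text.

```java
/** Hypothetical immutable settings object modelling the review discussion. */
final class ReadSettings {
  private final boolean batching;

  private ReadSettings(boolean batching) {
    this.batching = batching;
  }

  /** Creates settings with batching enabled, which is the default. */
  static ReadSettings create() {
    return new ReadSettings(true);
  }

  /**
   * If {@code true} (the default), reads use the Cloud Spanner batch API, which requires
   * root-partitionable queries. Pass {@code false} to run each query as a single,
   * unpartitioned read.
   */
  ReadSettings withBatching(boolean batching) {
    // Returns a copy, so a shared base configuration is never modified.
    return new ReadSettings(batching);
  }

  boolean getBatching() {
    return batching;
  }
}
```

A single `withBatching(boolean)` keeps one setter for both directions, at the cost of relying on the javadoc to state the default; a `withoutBatching()` method, as suggested in review, would make the default implicit in the method name instead.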
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88024 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 12:50
Start Date: 05/Apr/18 12:50
Worklog Time Spent: 10m

Work Description: iemejia commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179449485

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +184,59 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  private SpannerConfig createSpannerConfig() {
+    return SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+  }
+
+  @Test
+  public void testReadAllRecordsInDb() throws Exception {
+    DatabaseClient databaseClient = getDatabaseClient();
+
+    List<Mutation> mutations = new ArrayList<>();

Review comment:
I was referring to this part too. What I had in mind was something along the lines of a makeTableData method like [BigtableIOTest's one](https://github.com/apache/beam/blob/50a84326581941bc1edf573a0ad2b798ecb0f6a1/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java#L995), given that all the tests use the same data.

Issue Time Tracking
---
Worklog Id: (was: 88024)
Time Spent: 2h 20m (was: 2h 10m)
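A shared fixture in the spirit of the suggested `makeTableData` helper might look like the following sketch. `TestRow`, `TestData` and `makeTableData(int)` here are hypothetical; unlike the PR's test, which fills `value` with `RandomUtils.randomAlphaNumeric(100)`, this version uses deterministic values so that every test can assert against the same expected rows. In the integration test, each row would then become a mutation via `Mutation.newInsertOrUpdateBuilder(table).set("key").to(row.key).set("value").to(row.value).build()`, following the pattern in the `testReadAll` diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical (key, value) row matching the test table's two columns. */
final class TestRow {
  final long key;
  final String value;

  TestRow(long key, String value) {
    this.key = key;
    this.value = value;
  }
}

/** Hypothetical shared test-data builder, one source of rows for every test. */
final class TestData {
  private TestData() {}

  // Builds numRows rows with predictable values. The returned list is
  // read-only so that one test cannot alter the data another test expects.
  static List<TestRow> makeTableData(int numRows) {
    List<TestRow> rows = new ArrayList<>(numRows);
    for (int i = 0; i < numRows; i++) {
      rows.add(new TestRow(i, "value-" + i));
    }
    return Collections.unmodifiableList(rows);
  }
}
```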
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88021 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 12:49
Start Date: 05/Apr/18 12:49
Worklog Time Spent: 10m

Work Description: iemejia commented on issue #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-378924305

@chamikaramj It should be OK now. Can you please take a second look and merge if you agree?

Issue Time Tracking
---
Worklog Id: (was: 88021)
Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88019 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 12:46
Start Date: 05/Apr/18 12:46
Worklog Time Spent: 10m

Work Description: iemejia commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179449485

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +184,59 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  private SpannerConfig createSpannerConfig() {
+    return SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+  }
+
+  @Test
+  public void testReadAllRecordsInDb() throws Exception {
+    DatabaseClient databaseClient = getDatabaseClient();
+
+    List<Mutation> mutations = new ArrayList<>();

Review comment:
I was referring to this part too. What I had in mind was something along the lines of a makeTableData method like [BigtableIOTest's one](https://github.com/apache/beam/blob/50a84326581941bc1edf573a0ad2b798ecb0f6a1/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java#L995).

Issue Time Tracking
---
Worklog Id: (was: 88019)
Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88017 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 12:33
Start Date: 05/Apr/18 12:33
Worklog Time Spent: 10m

Work Description: iemejia commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179446138

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =
+        spanner.getDatabaseClient(
+            DatabaseId.of(
+                project, options.getInstanceId(), databaseName));
+
+    List<Mutation> mutations = new ArrayList<>();
+    for (int i = 0; i < 5L; i++) {
+      mutations.add(
+          Mutation.newInsertOrUpdateBuilder(options.getTable())
+              .set("key")
+              .to((long) i)
+              .set("value")
+              .to(RandomUtils.randomAlphaNumeric(100))
+              .build());
+    }
+
+    databaseClient.writeAtLeastOnce(mutations);
+
+    SpannerConfig spannerConfig = SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+
+    PCollectionView<Transaction> tx =
+        p.apply(
+            SpannerIO.createTransaction()
+                .withSpannerConfig(spannerConfig)
+                .withTimestampBound(TimestampBound.strong()));
+
+    PCollection<Struct> allRecords = p.apply(SpannerIO.read()
+        .withSpannerConfig(spannerConfig)
+        .withBatching(false)

Review comment:
I'm not sure I was clear. I don't intend to make this implicit or the IO 'smarter'; I think the explicit batching approach is OK. I was just wondering whether we can detect eagerly that the query is not root-partitionable when the user has chosen batching, so that it fails before the whole pipeline executes. Maybe we can do something like that in a separate JIRA (along with a test for it).

Issue Time Tracking
---
Worklog Id: (was: 88017)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87831 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 00:15
Start Date: 05/Apr/18 00:15
Worklog Time Spent: 10m

Work Description: mairbek commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179319302

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =
+        spanner.getDatabaseClient(
+            DatabaseId.of(
+                project, options.getInstanceId(), databaseName));
+
+    List<Mutation> mutations = new ArrayList<>();
+    for (int i = 0; i < 5L; i++) {
+      mutations.add(
+          Mutation.newInsertOrUpdateBuilder(options.getTable())
+              .set("key")
+              .to((long) i)
+              .set("value")
+              .to(RandomUtils.randomAlphaNumeric(100))
+              .build());
+    }
+
+    databaseClient.writeAtLeastOnce(mutations);
+
+    SpannerConfig spannerConfig = SpannerConfig.create()

Review comment:
Done

Issue Time Tracking
---
Worklog Id: (was: 87831)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87832 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 00:15
Start Date: 05/Apr/18 00:15
Worklog Time Spent: 10m
Work Description: mairbek commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179319684

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =
+        spanner.getDatabaseClient(
+            DatabaseId.of(
+                project, options.getInstanceId(), databaseName));
+
+    List<Mutation> mutations = new ArrayList<>();
+    for (int i = 0; i < 5L; i++) {
+      mutations.add(
+          Mutation.newInsertOrUpdateBuilder(options.getTable())
+              .set("key")
+              .to((long) i)
+              .set("value")
+              .to(RandomUtils.randomAlphaNumeric(100))
+              .build());
+    }
+
+    databaseClient.writeAtLeastOnce(mutations);
+
+    SpannerConfig spannerConfig = SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+
+    PCollectionView<Transaction> tx =
+        p.apply(
+            SpannerIO.createTransaction()
+                .withSpannerConfig(spannerConfig)
+                .withTimestampBound(TimestampBound.strong()));
+
+    PCollection<Struct> allRecords = p.apply(SpannerIO.read()
+        .withSpannerConfig(spannerConfig)
+        .withBatching(false)

Review comment:
So the alternative would be to catch the root-partitionable exception and fall back to a naive read. I prefer to keep this transparent flag here: we'd rather fail the pipeline and give the user feedback than silently run the inefficient query.

Issue Time Tracking
---
Worklog Id: (was: 87832)
Time Spent: 1.5h (was: 1h 20m)
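The trade-off in the review comment above (an explicit flag that fails loudly, versus catching the error and silently falling back to a naive read) can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the connector's code: ReadModeChooser, ReadMode and NotRootPartitionableException are hypothetical names, and whether the batch API accepts a query is modelled as a boolean rather than as a real call to Cloud Spanner.

```java
/**
 * Hypothetical model of the read-path decision discussed in review:
 * batching is on by default, and a query the batch API cannot partition
 * fails fast unless the user explicitly disables batching.
 */
final class ReadModeChooser {
  enum ReadMode { BATCH, NAIVE }

  static final class NotRootPartitionableException extends RuntimeException {
    NotRootPartitionableException(String query) {
      super("Query is not root-partitionable: " + query
          + " (disable batching explicitly to run it as a single, non-parallel read)");
    }
  }

  private ReadModeChooser() {}

  /**
   * @param batching the user's explicit flag (true by default)
   * @param rootPartitionable stand-in for whether the batch API accepted the query
   */
  static ReadMode choose(String query, boolean batching, boolean rootPartitionable) {
    if (!batching) {
      return ReadMode.NAIVE; // user opted in to the slower, single-reader path
    }
    if (!rootPartitionable) {
      // No silent fallback: surface the problem so the user can decide.
      throw new NotRootPartitionableException(query);
    }
    return ReadMode.BATCH;
  }

  public static void main(String[] args) {
    System.out.println(choose("SELECT * FROM Users", true, true));
    System.out.println(choose("SELECT * FROM information_schema.tables", false, false));
    try {
      choose("SELECT * FROM information_schema.tables", true, false);
    } catch (NotRootPartitionableException e) {
      System.out.println("Failed fast: " + e.getMessage());
    }
  }
}
```

The fallback alternative would replace the throw with return ReadMode.NAIVE; the PR author preferred the failure because it tells the user what happened instead of quietly running an inefficient, single-reader query.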
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87833 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 00:15
Start Date: 05/Apr/18 00:15
Worklog Time Spent: 10m
Work Description: mairbek commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179319684

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =
+        spanner.getDatabaseClient(
+            DatabaseId.of(
+                project, options.getInstanceId(), databaseName));
+
+    List<Mutation> mutations = new ArrayList<>();
+    for (int i = 0; i < 5L; i++) {
+      mutations.add(
+          Mutation.newInsertOrUpdateBuilder(options.getTable())
+              .set("key")
+              .to((long) i)
+              .set("value")
+              .to(RandomUtils.randomAlphaNumeric(100))
+              .build());
+    }
+
+    databaseClient.writeAtLeastOnce(mutations);
+
+    SpannerConfig spannerConfig = SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+
+    PCollectionView<Transaction> tx =
+        p.apply(
+            SpannerIO.createTransaction()
+                .withSpannerConfig(spannerConfig)
+                .withTimestampBound(TimestampBound.strong()));
+
+    PCollection<Struct> allRecords = p.apply(SpannerIO.read()
+        .withSpannerConfig(spannerConfig)
+        .withBatching(false)

Review comment:
So the alternative would be to catch the root-partitionable exception and fall back to a naive read. I prefer to keep this transparent flag here: we'd rather fail the pipeline and give the user feedback than silently run the inefficient query.

Issue Time Tracking
---
Worklog Id: (was: 87833)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87829 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 00:15
Start Date: 05/Apr/18 00:15
Worklog Time Spent: 10m
Work Description: mairbek commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179319227

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =

Review comment:
Done

Issue Time Tracking
---
Worklog Id: (was: 87829)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87830 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 05/Apr/18 00:15
Start Date: 05/Apr/18 00:15
Worklog Time Spent: 10m
Work Description: mairbek commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179319153

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
+  @Test
+  public void testReadAll() throws Exception {

Review comment:
It does call read all later on, but I've made the name more descriptive.

Issue Time Tracking
---
Worklog Id: (was: 87830)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87564 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 04/Apr/18 14:33
Start Date: 04/Apr/18 14:33
Worklog Time Spent: 10m
Work Description: iemejia commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179141144

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
+  @Test
+  public void testReadAll() throws Exception {

Review comment:
Minor nitpick: can you rename this? From its name I was expecting this one to use SpannerIO.readAll(). It may also be worth mentioning that it tests the naive case.

Issue Time Tracking
---
Worklog Id: (was: 87564)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87563 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 04/Apr/18 14:33
Start Date: 04/Apr/18 14:33
Worklog Time Spent: 10m
Work Description: iemejia commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179143055

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =

Review comment:
We can move the data creation code into a method, since every test reuses exactly the same code, and refactor the other tests to use it too.

Issue Time Tracking
---
Worklog Id: (was: 87563)
Time Spent: 1h (was: 50m)
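The refactoring suggested in the review comment above (one helper that creates the test data instead of repeating the loop in every test) might look roughly like this. It is a hypothetical sketch: in SpannerReadIT the helper would presumably return a List<Mutation> built with Mutation.newInsertOrUpdateBuilder(...) as in the quoted diff, but here a plain Row class stands in for Mutation, and TestData, makeRows and randomAlphaNumeric are illustrative names, so the example runs without the Spanner client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical shared test-data helper; Row stands in for a Spanner Mutation. */
final class TestData {
  static final class Row {
    final long key;
    final String value;

    Row(long key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  private static final String ALPHANUMERIC =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

  private TestData() {}

  /** Random string of the given length drawn from [A-Za-z0-9]. */
  static String randomAlphaNumeric(int length, Random random) {
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      sb.append(ALPHANUMERIC.charAt(random.nextInt(ALPHANUMERIC.length())));
    }
    return sb.toString();
  }

  /** Rows with keys 0..count-1 and 100-character random values, as in the quoted test. */
  static List<Row> makeRows(int count, Random random) {
    List<Row> rows = new ArrayList<>(count);
    for (long i = 0; i < count; i++) {
      rows.add(new Row(i, randomAlphaNumeric(100, random)));
    }
    return rows;
  }

  public static void main(String[] args) {
    // A fixed seed keeps the generated data reproducible across runs.
    for (Row row : makeRows(5, new Random(42))) {
      System.out.println(row.key + " -> " + row.value.substring(0, 10) + "...");
    }
  }
}
```

Each test would then call the helper once and write the result, rather than repeating the loop. Passing the Random in is a design choice: a fixed seed makes a failing run reproducible.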
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87565 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 04/Apr/18 14:33
Start Date: 04/Apr/18 14:33
Worklog Time Spent: 10m
Work Description: iemejia commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179146877

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =
+        spanner.getDatabaseClient(
+            DatabaseId.of(
+                project, options.getInstanceId(), databaseName));
+
+    List<Mutation> mutations = new ArrayList<>();
+    for (int i = 0; i < 5L; i++) {
+      mutations.add(
+          Mutation.newInsertOrUpdateBuilder(options.getTable())
+              .set("key")
+              .to((long) i)
+              .set("value")
+              .to(RandomUtils.randomAlphaNumeric(100))
+              .build());
+    }
+
+    databaseClient.writeAtLeastOnce(mutations);
+
+    SpannerConfig spannerConfig = SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+
+    PCollectionView<Transaction> tx =
+        p.apply(
+            SpannerIO.createTransaction()
+                .withSpannerConfig(spannerConfig)
+                .withTimestampBound(TimestampBound.strong()));
+
+    PCollection<Struct> allRecords = p.apply(SpannerIO.read()
+        .withSpannerConfig(spannerConfig)
+        .withBatching(false)

Review comment:
Is there a way to detect that a user is running a non-root-partitionable query without setting the right batching flag? I wonder whether it is worth creating a test for this error case, and if we can detect it early via some API call, maybe we should add the check to the expand. (I saw a TODO there, but I'm not sure it is for the same goal.)

Issue Time Tracking
---
Worklog Id: (was: 87565)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87562 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 04/Apr/18 14:33
Start Date: 04/Apr/18 14:33
Worklog Time Spent: 10m
Work Description: iemejia commented on a change in pull request #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179143531

## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
##
@@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =
+        spanner.getDatabaseClient(
+            DatabaseId.of(
+                project, options.getInstanceId(), databaseName));
+
+    List<Mutation> mutations = new ArrayList<>();
+    for (int i = 0; i < 5L; i++) {
+      mutations.add(
+          Mutation.newInsertOrUpdateBuilder(options.getTable())
+              .set("key")
+              .to((long) i)
+              .set("value")
+              .to(RandomUtils.randomAlphaNumeric(100))
+              .build());
+    }
+
+    databaseClient.writeAtLeastOnce(mutations);
+
+    SpannerConfig spannerConfig = SpannerConfig.create()

Review comment:
We can move this into a method (or a field) so we don't repeat the code as the other tests do. Can you please fix this in the other methods too?

Issue Time Tracking
---
Worklog Id: (was: 87562)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87525 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 04/Apr/18 12:59
Start Date: 04/Apr/18 12:59
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-378590991

retest this please

Issue Time Tracking
---
Worklog Id: (was: 87525)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87526 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 04/Apr/18 12:59
Start Date: 04/Apr/18 12:59
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-378590991

retest this please

Issue Time Tracking
---
Worklog Id: (was: 87526)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=86112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86112 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 30/Mar/18 18:29
Start Date: 30/Mar/18 18:29
Worklog Time Spent: 10m
Work Description: mairbek commented on issue #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-377590656

@vkedia PTAL

Issue Time Tracking
---
Worklog Id: (was: 86112)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=85807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-85807 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 29/Mar/18 22:02
Start Date: 29/Mar/18 22:02
Worklog Time Spent: 10m
Work Description: mairbek commented on issue #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-377386198

Please take a look

Issue Time Tracking
---
Worklog Id: (was: 85807)
Time Spent: 20m (was: 10m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.4.0
> Reporter: Mairbek Khadikov
> Assignee: Mairbek Khadikov
> Priority: Major
> Fix For: 2.5.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read was migrated to use the batch API. The batch API
> provides abstractions to scale out reads from Spanner, but it requires the
> query to be root-partitionable. Root-partitionable queries cover the majority
> of use cases; however, there are cases where running an arbitrary query is
> useful. One example is reading all the tables from information_schema.* and
> reading the content of the schema in the next step.
[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO
[ https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=85806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-85806 ]

ASF GitHub Bot logged work on BEAM-3973:
Author: ASF GitHub Bot
Created on: 29/Mar/18 22:02
Start Date: 29/Mar/18 22:02
Worklog Time Spent: 10m
Work Description: mairbek commented on issue #4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-377386198

Please take a look

Issue Time Tracking
---
Worklog Id: (was: 85806)
Time Spent: 10m
Remaining Estimate: 0h