[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-29 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=96547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-96547
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 30/Apr/18 05:34
Start Date: 30/Apr/18 05:34
Worklog Time Spent: 10m 
  Work Description: chamikaramj closed pull request #4946: [BEAM-3973] Adds 
a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/NaiveSpannerRead.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/NaiveSpannerRead.java
new file mode 100644
index 000..3b68d9f91a8
--- /dev/null
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/NaiveSpannerRead.java
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.spanner;
+
+import com.google.auto.value.AutoValue;
+import com.google.cloud.spanner.BatchReadOnlyTransaction;
+import com.google.cloud.spanner.ResultSet;
+import com.google.cloud.spanner.Struct;
+import com.google.cloud.spanner.TimestampBound;
+import com.google.common.annotations.VisibleForTesting;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/** A naive version of Spanner read that doesn't use the Batch API. */
+@VisibleForTesting
+@AutoValue
+abstract class NaiveSpannerRead
+    extends PTransform<PCollection<ReadOperation>, PCollection<Struct>> {
+
+  public static NaiveSpannerRead create(SpannerConfig spannerConfig,
+      PCollectionView<Transaction> txView, TimestampBound timestampBound) {
+    return new AutoValue_NaiveSpannerRead(spannerConfig, txView, timestampBound);
+  }
+
+  abstract SpannerConfig getSpannerConfig();
+
+  @Nullable
+  abstract PCollectionView<Transaction> getTxView();
+
+  abstract TimestampBound getTimestampBound();
+
+  @Override
+  public PCollection<Struct> expand(PCollection<ReadOperation> input) {
+    PCollectionView<Transaction> txView = getTxView();
+    if (txView == null) {
+      Pipeline begin = input.getPipeline();
+      SpannerIO.CreateTransaction createTx = SpannerIO.createTransaction()
+          .withSpannerConfig(getSpannerConfig()).withTimestampBound(getTimestampBound());
+      txView = begin.apply(createTx);
+    }
+
+    return input.apply("Naive read from Cloud Spanner",
+        ParDo.of(new NaiveSpannerReadFn(getSpannerConfig(), txView)).withSideInputs(txView));
+  }
+
+  private static class NaiveSpannerReadFn extends DoFn<ReadOperation, Struct> {
+
+    private final SpannerConfig config;
+    @Nullable private final PCollectionView<Transaction> txView;
+    private transient SpannerAccessor spannerAccessor;
+
+    NaiveSpannerReadFn(SpannerConfig config, @Nullable PCollectionView<Transaction> transaction) {
+      this.config = config;
+      this.txView = transaction;
+    }
+
+    @Setup
+    public void setup() throws Exception {
+      spannerAccessor = config.connectToSpanner();
+    }
+
+    @Teardown
+    public void teardown() throws Exception {
+      spannerAccessor.close();
+    }
+
+    @ProcessElement
+    public void processElement(ProcessContext c) throws Exception {
+      Transaction tx = c.sideInput(txView);
+      ReadOperation op = c.element();
+      BatchReadOnlyTransaction context = spannerAccessor.getBatchClient()
+          .batchReadOnlyTransaction(tx.transactionId());
+      try (ResultSet resultSet = execute(op, context)) {
+        while (resultSet.next()) {
+          c.output(resultSet.getCurrentRowAsStruct());
+        }
+      }
+    }
+
+    private ResultSet exe

[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=95775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95775
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 26/Apr/18 23:30
Start Date: 26/Apr/18 23:30
Worklog Time Spent: 10m 
  Work Description: mairbek commented on issue #4946: [BEAM-3973] Adds a 
parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-384819222
 
 
   @chamikaramj ptal


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 95775)
Time Spent: 3.5h  (was: 3h 20m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.4.0
> Reporter: Mairbek Khadikov
> Assignee: Mairbek Khadikov
> Priority: Major
> Fix For: 2.5.0
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read was migrated to use the batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. Root-partitionable queries cover the majority 
> of use cases; however, there are cases where running an arbitrary query is 
> useful, for example reading all the table names from 
> information_schema.* and then reading the content of those tables in the next 
> step.
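
The two-step use case in the description can be sketched in plain Java, with no Beam or Spanner dependency. This is only an illustration: the class `TableScanQueries`, its members, and the exact SQL text are assumptions and do not come from the pull request. The first query is the kind of arbitrary (non root-partitionable) query the issue has in mind; the second step turns each returned table name into a full-table query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of SpannerIO. */
final class TableScanQueries {

  /** Step 1: lists user tables. The SQL text is an assumption for this sketch. */
  static final String LIST_TABLES_QUERY =
      "SELECT t.table_name FROM information_schema.tables AS t"
          + " WHERE t.table_catalog = '' AND t.table_schema = ''";

  /** Step 2: builds one full-table query per table name returned by step 1. */
  static List<String> selectAllQueries(List<String> tableNames) {
    List<String> queries = new ArrayList<>(tableNames.size());
    for (String name : tableNames) {
      // Backticks quote the identifier so reserved words work as table names.
      queries.add("SELECT * FROM `" + name + "`");
    }
    return queries;
  }

  public static void main(String[] args) {
    // Stand-in for the rows that step 1 would return from Spanner.
    List<String> tables = Arrays.asList("users", "orders");
    for (String query : selectAllQueries(tables)) {
      System.out.println(query);
    }
  }
}
```

In a real pipeline the step-1 query would presumably be the read that runs with batching disabled, since it is the one that is not root-partitionable.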



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=95772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95772
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 26/Apr/18 23:25
Start Date: 26/Apr/18 23:25
Worklog Time Spent: 10m 
  Work Description: mairbek commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r184556370
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
 ##
 @@ -329,12 +333,26 @@ public ReadAll withTimestampBound(TimestampBound timestampBound) {
       return toBuilder().setTimestampBound(timestampBound).build();
     }
 
+    /** If true the uses Cloud Spanner batch API. */
 
 Review comment:
   Done. I think a `without` method would be more confusing.


Issue Time Tracking
---

Worklog Id: (was: 95772)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=95773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95773
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 26/Apr/18 23:25
Start Date: 26/Apr/18 23:25
Worklog Time Spent: 10m 
  Work Description: mairbek commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r184557523
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +184,59 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  private SpannerConfig createSpannerConfig() {
+    return SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+  }
+
+  @Test
+  public void testReadAllRecordsInDb() throws Exception {
+    DatabaseClient databaseClient = getDatabaseClient();
+
+    List<Mutation> mutations = new ArrayList<>();
 
 Review comment:
   Done


Issue Time Tracking
---

Worklog Id: (was: 95773)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-26 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=95774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-95774
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 26/Apr/18 23:25
Start Date: 26/Apr/18 23:25
Worklog Time Spent: 10m 
  Work Description: mairbek commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r184556560
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =
+        spanner.getDatabaseClient(
+            DatabaseId.of(
+                project, options.getInstanceId(), databaseName));
+
+    List<Mutation> mutations = new ArrayList<>();
+    for (int i = 0; i < 5L; i++) {
+      mutations.add(
+          Mutation.newInsertOrUpdateBuilder(options.getTable())
+              .set("key")
+              .to((long) i)
+              .set("value")
+              .to(RandomUtils.randomAlphaNumeric(100))
+              .build());
+    }
+
+    databaseClient.writeAtLeastOnce(mutations);
+
+    SpannerConfig spannerConfig = SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+
+    PCollectionView<Transaction> tx =
+        p.apply(
+            SpannerIO.createTransaction()
+                .withSpannerConfig(spannerConfig)
+                .withTimestampBound(TimestampBound.strong()));
+
+    PCollection<Struct> allRecords = p.apply(SpannerIO.read()
+        .withSpannerConfig(spannerConfig)
+        .withBatching(false)
 
 Review comment:
   Ah, unfortunately the Cloud Spanner query planner is not open source. I don't 
think we are able to implement this today.


Issue Time Tracking
---

Worklog Id: (was: 95774)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-06 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88687
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 07/Apr/18 02:24
Start Date: 07/Apr/18 02:24
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #4946: [BEAM-3973] Adds 
a parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-379425746
 
 
   Please let me know when the comments are addressed.


Issue Time Tracking
---

Worklog Id: (was: 88687)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-05 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88178
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 18:15
Start Date: 05/Apr/18 18:15
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that 
can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179554941
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
 ##
 @@ -329,12 +333,26 @@ public ReadAll withTimestampBound(TimestampBound timestampBound) {
       return toBuilder().setTimestampBound(timestampBound).build();
     }
 
+    /** If true the uses Cloud Spanner batch API. */
 
 Review comment:
   Can you clarify in the documentation that batching is the default? 
Alternatively, how about just having a method withoutBatching() that can be 
used to disable batching?
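
The two API shapes being weighed in this comment can be compared in a small sketch. Everything below is hypothetical: `ReadConfig` is a stand-in written for illustration and is not the real `SpannerIO.Read` class; only the method names `withBatching` and `withoutBatching` are taken from this thread. The sketch assumes batching defaults to enabled, as the comment asks the documentation to state.

```java
/** Hypothetical immutable read configuration; not the real SpannerIO.Read. */
final class ReadConfig {
  private final String query;
  private final boolean batching;

  private ReadConfig(String query, boolean batching) {
    this.query = query;
    this.batching = batching;
  }

  /** Batching is enabled by default (assumed default for this sketch). */
  static ReadConfig create() {
    return new ReadConfig(null, true);
  }

  ReadConfig withQuery(String newQuery) {
    return new ReadConfig(newQuery, batching);
  }

  /** Option A: explicit flag, so callers can switch batching on or off. */
  ReadConfig withBatching(boolean newBatching) {
    return new ReadConfig(query, newBatching);
  }

  /** Option B: one-way switch that can only disable batching. */
  ReadConfig withoutBatching() {
    return withBatching(false);
  }

  String getQuery() {
    return query;
  }

  boolean isBatching() {
    return batching;
  }

  public static void main(String[] args) {
    ReadConfig byDefault = ReadConfig.create().withQuery("SELECT 1");
    ReadConfig disabled = byDefault.withBatching(false);
    System.out.println(byDefault.isBatching()); // prints true
    System.out.println(disabled.isBatching());  // prints false
  }
}
```

Both options leave the original object untouched and return a modified copy, so the choice between them is mostly about how the call site reads.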


Issue Time Tracking
---

Worklog Id: (was: 88178)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-05 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88179
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 18:15
Start Date: 05/Apr/18 18:15
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on a change in pull request 
#4946: [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that 
can disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179554941
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
 ##
 @@ -329,12 +333,26 @@ public ReadAll withTimestampBound(TimestampBound timestampBound) {
       return toBuilder().setTimestampBound(timestampBound).build();
     }
 
+    /** If true the uses Cloud Spanner batch API. */
 
 Review comment:
   Can you clarify in the documentation that batching is the default? 
Alternatively, how about just having a method withoutBatching() that can be 
used to disable batching?


Issue Time Tracking
---

Worklog Id: (was: 88179)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-05 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88024
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 12:50
Start Date: 05/Apr/18 12:50
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179449485
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +184,59 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  private SpannerConfig createSpannerConfig() {
+    return SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+  }
+
+  @Test
+  public void testReadAllRecordsInDb() throws Exception {
+    DatabaseClient databaseClient = getDatabaseClient();
+
+    List<Mutation> mutations = new ArrayList<>();
 
 Review comment:
   I was also referring to this part. What I expected was more along the lines 
of a makeTableData method like [the one in BigtableIOTest](https://github.com/apache/beam/blob/50a84326581941bc1edf573a0ad2b798ecb0f6a1/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java#L995),
 given that all the tests are using the same data.
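
A shared test-data helper of the kind suggested here might look like the sketch below. It is written in plain Java so it stands alone: `TestRow` and `TestData` are hypothetical names, `TestRow` stands in for a Spanner `Mutation`, and the fixed seed is an assumption added so every test sees the same rows. Only the name `makeTableData` and the key/value row shape come from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical row type standing in for a Spanner Mutation. */
final class TestRow {
  final long key;
  final String value;

  TestRow(long key, String value) {
    this.key = key;
    this.value = value;
  }
}

/** Hypothetical shared fixture builder for the read tests. */
final class TestData {
  private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

  /** Builds numRows rows with keys 0..numRows-1 and seeded pseudo-random values. */
  static List<TestRow> makeTableData(int numRows, int valueLength, long seed) {
    Random random = new Random(seed);
    List<TestRow> rows = new ArrayList<>(numRows);
    for (int i = 0; i < numRows; i++) {
      StringBuilder value = new StringBuilder(valueLength);
      for (int j = 0; j < valueLength; j++) {
        value.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
      }
      rows.add(new TestRow(i, value.toString()));
    }
    return rows;
  }

  public static void main(String[] args) {
    List<TestRow> rows = makeTableData(5, 100, 42L);
    System.out.println(rows.size()); // prints 5
  }
}
```

Each test could then convert the returned rows into mutations once, instead of repeating the insert loop in every test method.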
   


Issue Time Tracking
---

Worklog Id: (was: 88024)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-05 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88021
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 12:49
Start Date: 05/Apr/18 12:49
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #4946: [BEAM-3973] Adds a 
parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-378924305
 
 
   @chamikaramj It should be OK now. Can you please take a second look and 
merge if you agree?


Issue Time Tracking
---

Worklog Id: (was: 88021)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-05 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88019
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 12:46
Start Date: 05/Apr/18 12:46
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179449485
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +184,59 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  private SpannerConfig createSpannerConfig() {
+    return SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+  }
+
+  @Test
+  public void testReadAllRecordsInDb() throws Exception {
+    DatabaseClient databaseClient = getDatabaseClient();
+
+    List<Mutation> mutations = new ArrayList<>();
 
 Review comment:
   I was also referring to this part. What I expected was more along the lines 
of a makeTableData method like [the one in BigtableIOTest](https://github.com/apache/beam/blob/50a84326581941bc1edf573a0ad2b798ecb0f6a1/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIOTest.java#L995).
   


Issue Time Tracking
---

Worklog Id: (was: 88019)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-05 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=88017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-88017
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 12:33
Start Date: 05/Apr/18 12:33
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179446138
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
     p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+    DatabaseClient databaseClient =
+        spanner.getDatabaseClient(
+            DatabaseId.of(
+                project, options.getInstanceId(), databaseName));
+
+    List<Mutation> mutations = new ArrayList<>();
+    for (int i = 0; i < 5L; i++) {
+      mutations.add(
+          Mutation.newInsertOrUpdateBuilder(options.getTable())
+              .set("key")
+              .to((long) i)
+              .set("value")
+              .to(RandomUtils.randomAlphaNumeric(100))
+              .build());
+    }
+
+    databaseClient.writeAtLeastOnce(mutations);
+
+    SpannerConfig spannerConfig = SpannerConfig.create()
+        .withProjectId(project)
+        .withInstanceId(options.getInstanceId())
+        .withDatabaseId(databaseName);
+
+    PCollectionView<Transaction> tx =
+        p.apply(
+            SpannerIO.createTransaction()
+                .withSpannerConfig(spannerConfig)
+                .withTimestampBound(TimestampBound.strong()));
+
+    PCollection<Struct> allRecords = p.apply(SpannerIO.read()
+        .withSpannerConfig(spannerConfig)
+        .withBatching(false)
 
 Review comment:
   I don't know if I was clear. I don't intend to make this implicit or the IO 
'smarter'; I think the explicit batching approach is OK. I was just wondering 
whether we can eagerly detect that the query is not root-partitionable when the 
user has chosen batching, and fail before executing the complete thing. Maybe we 
can do something like that in a different JIRA (as well as the test for it).


Issue Time Tracking
---

Worklog Id: (was: 88017)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87831
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 00:15
Start Date: 05/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: mairbek commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179319302
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
 p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+DatabaseClient databaseClient =
+spanner.getDatabaseClient(
+DatabaseId.of(
+project, options.getInstanceId(), databaseName));
+
+List mutations = new ArrayList<>();
+for (int i = 0; i < 5L; i++) {
+  mutations.add(
+  Mutation.newInsertOrUpdateBuilder(options.getTable())
+  .set("key")
+  .to((long) i)
+  .set("value")
+  .to(RandomUtils.randomAlphaNumeric(100))
+  .build());
+}
+
+databaseClient.writeAtLeastOnce(mutations);
+
+SpannerConfig spannerConfig = SpannerConfig.create()
 
 Review comment:
   Done


Issue Time Tracking
---

Worklog Id: (was: 87831)
Time Spent: 1h 20m  (was: 1h 10m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87832
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 00:15
Start Date: 05/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: mairbek commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179319684
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
 p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+DatabaseClient databaseClient =
+spanner.getDatabaseClient(
+DatabaseId.of(
+project, options.getInstanceId(), databaseName));
+
+List mutations = new ArrayList<>();
+for (int i = 0; i < 5L; i++) {
+  mutations.add(
+  Mutation.newInsertOrUpdateBuilder(options.getTable())
+  .set("key")
+  .to((long) i)
+  .set("value")
+  .to(RandomUtils.randomAlphaNumeric(100))
+  .build());
+}
+
+databaseClient.writeAtLeastOnce(mutations);
+
+SpannerConfig spannerConfig = SpannerConfig.create()
+.withProjectId(project)
+.withInstanceId(options.getInstanceId())
+.withDatabaseId(databaseName);
+
+PCollectionView tx =
+p.apply(
+SpannerIO.createTransaction()
+.withSpannerConfig(spannerConfig)
+.withTimestampBound(TimestampBound.strong()));
+
+PCollection allRecords = p.apply(SpannerIO.read()
+.withSpannerConfig(spannerConfig)
+.withBatching(false)
 
 Review comment:
   So the alternative would be to catch the root-partitionable exception and fall 
back to a naive read. I prefer to keep this transparent flag here: we'd rather 
fail the pipeline and give the user feedback than silently run the inefficient 
query.
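The trade-off described in this comment can be sketched in plain Java. This is only an illustration under stated assumptions: `ExplicitBatchingSketch`, `Query`, `NotRootPartitionableException`, `batchRead`, `naiveRead` and `readWithSilentFallback` are hypothetical stand-ins and are not part of Beam or the Cloud Spanner client. Only the idea of a caller-supplied batching flag comes from the PR.

```java
import java.util.Arrays;
import java.util.List;

// All names here are hypothetical stand-ins, not Beam or Cloud Spanner APIs.
final class ExplicitBatchingSketch {

  /** Stand-in for the error the batch path reports for an unsupported query. */
  static final class NotRootPartitionableException extends RuntimeException {
    NotRootPartitionableException(String sql) {
      super("Query is not root-partitionable: " + sql);
    }
  }

  /** Stand-in for a query plus the one property this sketch cares about. */
  static final class Query {
    final String sql;
    final boolean rootPartitionable;

    Query(String sql, boolean rootPartitionable) {
      this.sql = sql;
      this.rootPartitionable = rootPartitionable;
    }
  }

  /** Scalable path: rejects queries it cannot partition. */
  static List<String> batchRead(Query query) {
    if (!query.rootPartitionable) {
      throw new NotRootPartitionableException(query.sql);
    }
    return Arrays.asList("batch:" + query.sql);
  }

  /** Single-reader path: accepts any query but does not scale out. */
  static List<String> naiveRead(Query query) {
    return Arrays.asList("naive:" + query.sql);
  }

  /** Preferred: the flag alone selects the path, so a batch failure surfaces. */
  static List<String> read(Query query, boolean batching) {
    return batching ? batchRead(query) : naiveRead(query);
  }

  /** Rejected alternative, shown for contrast: degrade without telling the user. */
  static List<String> readWithSilentFallback(Query query) {
    try {
      return batchRead(query);
    } catch (NotRootPartitionableException e) {
      return naiveRead(query);
    }
  }
}
```

With the explicit flag, a query that the batch path cannot handle fails loudly when batching is requested, and the user decides whether to pay for the slower read. The fallback variant returns rows in both cases, which hides the inefficiency.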


Issue Time Tracking
---

Worklog Id: (was: 87832)
Time Spent: 1.5h  (was: 1h 20m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87833
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 00:15
Start Date: 05/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: mairbek commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179319684
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
 p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+DatabaseClient databaseClient =
+spanner.getDatabaseClient(
+DatabaseId.of(
+project, options.getInstanceId(), databaseName));
+
+List mutations = new ArrayList<>();
+for (int i = 0; i < 5L; i++) {
+  mutations.add(
+  Mutation.newInsertOrUpdateBuilder(options.getTable())
+  .set("key")
+  .to((long) i)
+  .set("value")
+  .to(RandomUtils.randomAlphaNumeric(100))
+  .build());
+}
+
+databaseClient.writeAtLeastOnce(mutations);
+
+SpannerConfig spannerConfig = SpannerConfig.create()
+.withProjectId(project)
+.withInstanceId(options.getInstanceId())
+.withDatabaseId(databaseName);
+
+PCollectionView tx =
+p.apply(
+SpannerIO.createTransaction()
+.withSpannerConfig(spannerConfig)
+.withTimestampBound(TimestampBound.strong()));
+
+PCollection allRecords = p.apply(SpannerIO.read()
+.withSpannerConfig(spannerConfig)
+.withBatching(false)
 
 Review comment:
   So the alternative would be to catch the root-partitionable exception and 
fall back to a naive read. I prefer to keep this transparent flag here: we'd 
rather fail the pipeline and give the user feedback than silently run the 
inefficient query.


Issue Time Tracking
---

Worklog Id: (was: 87833)
Time Spent: 1h 40m  (was: 1.5h)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87829
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 00:15
Start Date: 05/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: mairbek commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179319227
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
 p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+DatabaseClient databaseClient =
 
 Review comment:
   Done


Issue Time Tracking
---

Worklog Id: (was: 87829)
Time Spent: 1h 10m  (was: 1h)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87830
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 05/Apr/18 00:15
Start Date: 05/Apr/18 00:15
Worklog Time Spent: 10m 
  Work Description: mairbek commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179319153
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
 p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
 
 Review comment:
   It does call read all later on, but I've made the name more descriptive 


Issue Time Tracking
---

Worklog Id: (was: 87830)
Time Spent: 1h 10m  (was: 1h)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87564
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 04/Apr/18 14:33
Start Date: 04/Apr/18 14:33
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179141144
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
 p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
 
 Review comment:
   Minor nitpick: can you rename this? From its name I was expecting this one to 
use SpannerIO.readAll(). It may also be worth mentioning that it tests the 
naive case.


Issue Time Tracking
---

Worklog Id: (was: 87564)
Time Spent: 1h  (was: 50m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87563
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 04/Apr/18 14:33
Start Date: 04/Apr/18 14:33
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179143055
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
 p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+DatabaseClient databaseClient =
 
 Review comment:
   We can extract the data creation code into a method, since every test reuses 
exactly the same code, and refactor the other tests to use it too.
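A minimal sketch of the suggested extraction, written against plain Java collections so that it runs without the Spanner client on the classpath. `TestDataSketch`, `makeTestRows` and the local `randomAlphaNumeric` are hypothetical names; in the real test the helper would presumably build `Mutation` objects with the same "key" and "value" columns shown in the diff.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in: rows are plain maps rather than Spanner Mutation objects.
final class TestDataSketch {
  private static final String ALPHANUMERIC =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

  /** Returns a random string of the given length drawn from ALPHANUMERIC. */
  static String randomAlphaNumeric(Random random, int length) {
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      sb.append(ALPHANUMERIC.charAt(random.nextInt(ALPHANUMERIC.length())));
    }
    return sb.toString();
  }

  /** Builds rows with keys 0..count-1 and a 100-character random value each. */
  static List<Map<String, Object>> makeTestRows(int count, Random random) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (long key = 0; key < count; key++) {
      Map<String, Object> row = new LinkedHashMap<>();
      row.put("key", key);
      row.put("value", randomAlphaNumeric(random, 100));
      rows.add(row);
    }
    return rows;
  }
}
```

Each test would then call the helper once, for example `makeTestRows(5, new Random())`, instead of repeating the loop.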


Issue Time Tracking
---

Worklog Id: (was: 87563)
Time Spent: 1h  (was: 50m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87565
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 04/Apr/18 14:33
Start Date: 04/Apr/18 14:33
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179146877
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
 p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+DatabaseClient databaseClient =
+spanner.getDatabaseClient(
+DatabaseId.of(
+project, options.getInstanceId(), databaseName));
+
+List mutations = new ArrayList<>();
+for (int i = 0; i < 5L; i++) {
+  mutations.add(
+  Mutation.newInsertOrUpdateBuilder(options.getTable())
+  .set("key")
+  .to((long) i)
+  .set("value")
+  .to(RandomUtils.randomAlphaNumeric(100))
+  .build());
+}
+
+databaseClient.writeAtLeastOnce(mutations);
+
+SpannerConfig spannerConfig = SpannerConfig.create()
+.withProjectId(project)
+.withInstanceId(options.getInstanceId())
+.withDatabaseId(databaseName);
+
+PCollectionView tx =
+p.apply(
+SpannerIO.createTransaction()
+.withSpannerConfig(spannerConfig)
+.withTimestampBound(TimestampBound.strong()));
+
+PCollection allRecords = p.apply(SpannerIO.read()
+.withSpannerConfig(spannerConfig)
+.withBatching(false)
 
 Review comment:
   Is there a way to detect that a user is running a non-root-partitionable query 
without setting the right batching flag? I wonder if it is worth creating a 
test for this error case, and if we can detect it early via some call in the 
API, maybe we should add that check to expand. (I saw a TODO there, but I am not 
sure it is for the same goal.)
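The kind of early check being asked about could look roughly like the sketch below. It is an illustration only: `EagerValidationSketch`, `ReadSpec` and `validate` are hypothetical names, and the `isRootPartitionable` predicate is passed in by the caller precisely because whether the API offers such a check is the open question in this comment.

```java
import java.util.function.Predicate;

// Hypothetical names throughout; this does not describe Beam's SpannerIO.
final class EagerValidationSketch {

  /** Stand-in for a configured read: the query text and the batching flag. */
  static final class ReadSpec {
    final String sql;
    final boolean batching;

    ReadSpec(String sql, boolean batching) {
      this.sql = sql;
      this.batching = batching;
    }
  }

  /**
   * Intended to run while the pipeline is being built, before any data is
   * read. Throws if batching is requested for a query the supplied check
   * rejects; a read with batching disabled is always accepted.
   */
  static void validate(ReadSpec spec, Predicate<String> isRootPartitionable) {
    if (spec.batching && !isRootPartitionable.test(spec.sql)) {
      throw new IllegalArgumentException(
          "Query is not root-partitionable; disable batching to run it: " + spec.sql);
    }
  }
}
```

Failing at construction time would give the user the same feedback as a runtime failure, only sooner. The sketch says nothing about how the partitionability check itself would be implemented.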


Issue Time Tracking
---

Worklog Id: (was: 87565)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87562
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 04/Apr/18 14:33
Start Date: 04/Apr/18 14:33
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #4946: 
[BEAM-3973] Adds a parameter to the Cloud Spanner read connector that can 
disable batch API
URL: https://github.com/apache/beam/pull/4946#discussion_r179143531
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadIT.java
 ##
 @@ -193,6 +196,52 @@ public void testQuery() throws Exception {
 p.run();
   }
 
+  @Test
+  public void testReadAll() throws Exception {
+DatabaseClient databaseClient =
+spanner.getDatabaseClient(
+DatabaseId.of(
+project, options.getInstanceId(), databaseName));
+
+List mutations = new ArrayList<>();
+for (int i = 0; i < 5L; i++) {
+  mutations.add(
+  Mutation.newInsertOrUpdateBuilder(options.getTable())
+  .set("key")
+  .to((long) i)
+  .set("value")
+  .to(RandomUtils.randomAlphaNumeric(100))
+  .build());
+}
+
+databaseClient.writeAtLeastOnce(mutations);
+
+SpannerConfig spannerConfig = SpannerConfig.create()
 
 Review comment:
   We can move this into a method (or attribute) so we don't repeat the code as 
in the other tests. Can you please fix this in the other methods too?


Issue Time Tracking
---

Worklog Id: (was: 87562)
Time Spent: 1h  (was: 50m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87525
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 04/Apr/18 12:59
Start Date: 04/Apr/18 12:59
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #4946: [BEAM-3973] Adds a 
parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-378590991
 
 
   retest this please


Issue Time Tracking
---

Worklog Id: (was: 87525)
Time Spent: 40m  (was: 0.5h)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=87526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87526
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 04/Apr/18 12:59
Start Date: 04/Apr/18 12:59
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #4946: [BEAM-3973] Adds a 
parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-378590991
 
 
   retest this please


Issue Time Tracking
---

Worklog Id: (was: 87526)
Time Spent: 50m  (was: 40m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-03-30 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=86112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86112
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 30/Mar/18 18:29
Start Date: 30/Mar/18 18:29
Worklog Time Spent: 10m 
  Work Description: mairbek commented on issue #4946: [BEAM-3973] Adds a 
parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-377590656
 
 
   @vkedia PTAL


Issue Time Tracking
---

Worklog Id: (was: 86112)
Time Spent: 0.5h  (was: 20m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the 
> information_schema.* and reading the content of those tables in the next 
> step. 





[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-03-29 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=85807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-85807
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 29/Mar/18 22:02
Start Date: 29/Mar/18 22:02
Worklog Time Spent: 10m 
  Work Description: mairbek commented on issue #4946: [BEAM-3973] Adds a 
parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-377386198
 
 
   Please take a look


Issue Time Tracking
---

Worklog Id: (was: 85807)
Time Spent: 20m  (was: 10m)

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the information_schema.* 
> and reading the content of those tables in the next step. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-3973) Allow to disable batch API in SpannerIO

2018-03-29 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3973?focusedWorklogId=85806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-85806
 ]

ASF GitHub Bot logged work on BEAM-3973:


Author: ASF GitHub Bot
Created on: 29/Mar/18 22:02
Start Date: 29/Mar/18 22:02
Worklog Time Spent: 10m 
  Work Description: mairbek commented on issue #4946: [BEAM-3973] Adds a 
parameter to the Cloud Spanner read connector that can disable batch API
URL: https://github.com/apache/beam/pull/4946#issuecomment-377386198
 
 
   Please take a look


Issue Time Tracking
---

Worklog Id: (was: 85806)
Time Spent: 10m
Remaining Estimate: 0h

> Allow to disable batch API in SpannerIO
> ---
>
> Key: BEAM-3973
> URL: https://issues.apache.org/jira/browse/BEAM-3973
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.4.0
>Reporter: Mairbek Khadikov
>Assignee: Mairbek Khadikov
>Priority: Major
> Fix For: 2.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In 2.4.0, SpannerIO#read has been migrated to use batch API. The batch API 
> provides abstractions to scale out reads from Spanner, but it requires the 
> query to be root-partitionable. The root-partitionable queries cover majority 
> of the use cases, however there are examples when running arbitrary query is 
> useful. For example, reading all the table names from the information_schema.* 
> and reading the content of those tables in the next step. 


