[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=126097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-126097
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 23/Jul/18 16:06
Start Date: 23/Jul/18 16:06
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-407111915
 
 
   I missed that in the review; I think those interfaces should have kept
   their previous access.
   
   On Mon, Jul 23, 2018 at 7:25 AM Łukasz Gajowy 
   wrote:
   
   > Just FYI: Some of the interfaces became package-private (BigQueryService,
   > FakeBigQueryService, FakeDatasetService, etc.). This caused the Nexmark suites to
   > fail. Luckily, it seems that those classes were used only for testing
   > purposes (but outside the BigQueryIO module).
   >
   > Logs for reference:
   > https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Direct/112/console
   >
   > I had trouble thinking of a better solution than simply deleting the test,
   > because now it seems impossible to test the code in a similar way (please
   > correct me if I'm wrong): #6018 
   >
   > Maybe it's a good idea to leave at least BigQueryServices public so that
   > it could be used by withTestServices() in other places (like Nexmark)?
   > WDYT?
   >
   > CC: @calonso  @reuvenlax
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 126097)
Time Spent: 5.5h  (was: 5h 20m)

> Add error reason and table destination to BigQueryIO streaming failed inserts
> -
>
> Key: BEAM-4257
> URL: https://issues.apache.org/jira/browse/BEAM-4257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Carlos Alonso
>Assignee: Carlos Alonso
>Priority: Minor
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When using `BigQueryIO.Write` and calling `WriteResult.getFailedInserts()`, we 
> get a `PCollection<TableRow>`, which is fine, but to work properly with 
> the errors downstream it would be really valuable to have extended information, 
> such as the `InsertError` fields and the `TableReference` each row was routed to.
>  
> My suggestion is to create a new object that contains all that information 
> and return a `PCollection` of those instead.
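A minimal plain-Java sketch of the kind of object suggested above: a failed row bundled with its error reason and the table it was routed to, plus one example of downstream handling (grouping by destination). The names `FailedInsert` and `FailedInserts.byTable` are hypothetical, and plain JDK types stand in for `TableRow`, the `InsertError` fields and `TableReference` so the sketch compiles without Beam; it is not Beam's own class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical value type: a failed row plus why it failed and where it was headed.
// Plain JDK types stand in for TableRow, the InsertError fields and TableReference.
final class FailedInsert {
  private final Map<String, Object> row; // stand-in for TableRow
  private final String reason; // stand-in for an InsertError reason, e.g. "invalid"
  private final String table; // stand-in for TableReference, "project:dataset.table"

  FailedInsert(Map<String, Object> row, String reason, String table) {
    this.row = Objects.requireNonNull(row);
    this.reason = Objects.requireNonNull(reason);
    this.table = Objects.requireNonNull(table);
  }

  Map<String, Object> getRow() {
    return row;
  }

  String getReason() {
    return reason;
  }

  String getTable() {
    return table;
  }

  // Value semantics, so failures can be compared or deduplicated downstream.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof FailedInsert)) {
      return false;
    }
    FailedInsert other = (FailedInsert) o;
    return row.equals(other.row) && reason.equals(other.reason) && table.equals(other.table);
  }

  @Override
  public int hashCode() {
    return Objects.hash(row, reason, table);
  }
}

final class FailedInserts {
  private FailedInserts() {}

  // Groups failures by destination table, keeping tables in first-seen order,
  // so each table's errors can be handled separately downstream.
  static Map<String, List<FailedInsert>> byTable(List<FailedInsert> failures) {
    Map<String, List<FailedInsert>> grouped = new LinkedHashMap<>();
    for (FailedInsert failure : failures) {
      grouped.computeIfAbsent(failure.getTable(), t -> new ArrayList<>()).add(failure);
    }
    return grouped;
  }
}
```

Grouping by destination is only one possible use; keying on `getReason()` instead would route failures by error type.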



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=126042=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-126042
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 23/Jul/18 14:29
Start Date: 23/Jul/18 14:29
Worklog Time Spent: 10m 
  Work Description: lgajowy edited a comment on issue #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-407076438
 
 
   Just FYI: Some of the interfaces became package-private (BigQueryService, 
FakeBigQueryService, FakeDatasetService, etc.). This caused the Nexmark suites to 
fail. Luckily, it seems that those classes were used only for testing purposes 
(but outside the BigQueryIO module).
   
   Logs for reference: 
https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Direct/112/console
   
   I had trouble thinking of a better solution than simply deleting the test, 
because now it seems impossible to test the code in a similar way (please 
correct me if I'm wrong): https://github.com/apache/beam/pull/6018. Please 
merge it only if you strongly believe those interfaces/classes should remain 
package-private.
   
   Maybe it's a good idea to leave at least `BigQueryServices` public so that 
it could be used by `withTestServices()` in other places (like Nexmark)? WDYT?
   
   CC: @calonso @reuvenlax 
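To illustrate why visibility matters here: `withTestServices()` takes a services object, so a test in another package or module (such as Nexmark) can only pass in a fake if it can name and implement the services interface, and Java does not allow that when the interface is package-private. The sketch below shows the injection pattern with hypothetical names (`InsertService`, `FakeInsertService`, `Writer`) and plain JDK types; it is not Beam's `BigQueryServices` API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical service abstraction. In a real library it would have to be public so
// that tests in other packages could implement it; it is package-level here only so
// the whole sketch fits in one file.
interface InsertService {
  // Inserts rows into the table and returns the rows that failed.
  List<String> insertAll(String table, List<String> rows);
}

// Hypothetical in-memory fake: records accepted rows and rejects rows containing "bad".
final class FakeInsertService implements InsertService {
  final List<String> written = new ArrayList<>();

  @Override
  public List<String> insertAll(String table, List<String> rows) {
    List<String> failed = new ArrayList<>();
    for (String row : rows) {
      if (row.contains("bad")) {
        failed.add(row);
      } else {
        written.add(table + "/" + row);
      }
    }
    return failed;
  }
}

// Hypothetical immutable writer; withTestServices returns a copy that uses the fake.
final class Writer {
  private final InsertService services;

  Writer(InsertService services) {
    this.services = services;
  }

  Writer withTestServices(InsertService testServices) {
    return new Writer(testServices);
  }

  // Writes the rows through whichever services object was injected.
  List<String> write(String table, List<String> rows) {
    return Collections.unmodifiableList(services.insertAll(table, rows));
  }
}
```

If `InsertService` lived in another package without the `public` modifier, a test outside that package could not declare `FakeInsertService`, which matches the breakage described above.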
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 126042)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=126041=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-126041
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 23/Jul/18 14:28
Start Date: 23/Jul/18 14:28
Worklog Time Spent: 10m 
  Work Description: lgajowy edited a comment on issue #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-407076438
 
 
   Just FYI: Some of the interfaces became package-private (BigQueryService, 
FakeBigQueryService, FakeDatasetService, etc.). This caused the Nexmark suites to 
fail. Luckily, it seems that those classes were used only for testing purposes 
(but outside the BigQueryIO module).
   
   Logs for reference: 
https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Direct/112/console
   
   I had trouble thinking of a better solution than simply deleting the test, 
because now it seems impossible to test the code in a similar way (please 
correct me if I'm wrong): https://github.com/apache/beam/pull/6018. Please 
merge it only if you strongly believe those interfaces/classes should remain 
package-private.
   
   Maybe it's a good idea to leave at least `BigQueryServices` public so that 
it could be used by `withTestServices()` in other places (like Nexmark)? WDYT?
   
   CC: @calonso @reuvenlax 
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 126041)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=126040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-126040
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 23/Jul/18 14:25
Start Date: 23/Jul/18 14:25
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-407076438
 
 
   Just FYI: Some of the interfaces became package-private (BigQueryService, 
FakeBigQueryService, FakeDatasetService, etc.). This caused the Nexmark suites to 
fail. Luckily, it seems that those classes were used only for testing purposes 
(but outside the BigQueryIO module).
   
   Logs for reference: 
https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_Nexmark_Direct/112/console
   
   I had trouble thinking of a better solution than simply deleting the test, 
because now it seems impossible to test the code in a similar way (please 
correct me if I'm wrong): https://github.com/apache/beam/pull/6018
   
   Maybe it's a good idea to leave at least `BigQueryServices` public so that 
it could be used by `withTestServices()` in other places (like Nexmark)? WDYT?
   
   CC: @calonso @reuvenlax 
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 126040)
Time Spent: 5h  (was: 4h 50m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=125951=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-125951
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 23/Jul/18 09:47
Start Date: 23/Jul/18 09:47
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-407001727
 
 
   Great, thanks, more to come!!
   
   What should I do with [the JIRA 
issue](https://issues.apache.org/jira/browse/BEAM-4257)?




Issue Time Tracking
---

Worklog Id: (was: 125951)
Time Spent: 4h 50m  (was: 4h 40m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=125794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-125794
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 22/Jul/18 16:03
Start Date: 22/Jul/18 16:03
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-406877690
 
 
   Thanks for the high-quality work, and sorry this took so long! This is now 
merged.




Issue Time Tracking
---

Worklog Id: (was: 125794)
Time Spent: 4h 40m  (was: 4.5h)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=125793=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-125793
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 22/Jul/18 16:02
Start Date: 22/Jul/18 16:02
Worklog Time Spent: 10m 
  Work Description: reuvenlax closed pull request #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
index 26c96318375..e639f0c2acf 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
@@ -1052,6 +1052,7 @@ static String getExtractDestinationUri(String extractDestinationDir) {
 .setWriteDisposition(Write.WriteDisposition.WRITE_EMPTY)
 .setNumFileShards(0)
 .setMethod(Write.Method.DEFAULT)
+.setExtendedErrorInfo(false)
 .build();
   }
 
@@ -1158,6 +1159,8 @@ static String getExtractDestinationUri(String extractDestinationDir) {
 @Nullable
 abstract ValueProvider getCustomGcsTempLocation();
 
+abstract boolean getExtendedErrorInfo();
+
 abstract Builder toBuilder();
 
 @AutoValue.Builder
@@ -1203,6 +1206,8 @@ static String getExtractDestinationUri(String extractDestinationDir) {
 
   abstract Builder setCustomGcsTempLocation(ValueProvider customGcsTempLocation);
 
+  abstract Builder setExtendedErrorInfo(boolean extendedErrorInfo);
+
   abstract Write build();
 }
 
@@ -1482,6 +1487,16 @@ static String getExtractDestinationUri(String extractDestinationDir) {
   return toBuilder().setCustomGcsTempLocation(customGcsTempLocation).build();
 }
 
+/**
+ * Enables extended error information by enabling {@link WriteResult#getFailedInsertsWithErr()}
+ *
+ * ATM this only works if using {@link Method#STREAMING_INSERTS}. See {@link
+ * Write#withMethod(Method)}.
+ */
+public Write withExtendedErrorInfo() {
+  return toBuilder().setExtendedErrorInfo(true).build();
+}
+
 @VisibleForTesting
 /** This method is for test usage only */
 public Write withTestServices(BigQueryServices testServices) {
@@ -1666,7 +1681,8 @@ public WriteResult expand(PCollection input) {
 StreamingInserts streamingInserts =
 new StreamingInserts<>(getCreateDisposition(), dynamicDestinations)
 .withInsertRetryPolicy(retryPolicy)
-.withTestServices((getBigQueryServices()));
+.withTestServices((getBigQueryServices()))
+.withExtendedErrorInfo(getExtendedErrorInfo());
 return rowsWithDestination.apply(streamingInserts);
   } else {
 checkArgument(
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryInsertError.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryInsertError.java
new file mode 100644
index 000..cdd814587e8
--- /dev/null
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryInsertError.java
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import com.google.api.services.bigquery.model.TableDataInsertAllResponse;
+import com.google.api.services.bigquery.model.TableReference;
+import com.google.api.services.bigquery.model.TableRow;
+import java.util.Objects;
+
+/**
+ * Model definition for BigQueryInsertError.
+ *
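
The diff above adds `withExtendedErrorInfo()` as an immutable-builder option that defaults to `false`, with the builder generated by AutoValue. Below is a hand-written plain-Java sketch of the same pattern. `WriteConfig` and `WriteMethod` are hypothetical names, and the `validate()` check that rejects the flag without streaming inserts is an assumption of this sketch (the Javadoc in the diff only says the option currently works with `Method#STREAMING_INSERTS`), not Beam behavior.

```java
// Hypothetical stand-in for BigQueryIO.Write.Method.
enum WriteMethod {
  DEFAULT,
  FILE_LOADS,
  STREAMING_INSERTS
}

// Hypothetical immutable write configuration with a hand-written builder; the diff
// generates the equivalent with AutoValue. The flag defaults to false, as in the diff.
final class WriteConfig {
  private final WriteMethod method;
  private final boolean extendedErrorInfo;

  private WriteConfig(Builder builder) {
    this.method = builder.method;
    this.extendedErrorInfo = builder.extendedErrorInfo;
  }

  static WriteConfig create() {
    return new Builder().setMethod(WriteMethod.DEFAULT).setExtendedErrorInfo(false).build();
  }

  WriteMethod getMethod() {
    return method;
  }

  boolean getExtendedErrorInfo() {
    return extendedErrorInfo;
  }

  Builder toBuilder() {
    return new Builder().setMethod(method).setExtendedErrorInfo(extendedErrorInfo);
  }

  // Each with* method returns a modified copy and leaves this instance unchanged.
  WriteConfig withMethod(WriteMethod newMethod) {
    return toBuilder().setMethod(newMethod).build();
  }

  WriteConfig withExtendedErrorInfo() {
    return toBuilder().setExtendedErrorInfo(true).build();
  }

  // Assumption of this sketch only: reject the flag unless streaming inserts are used.
  void validate() {
    if (extendedErrorInfo && method != WriteMethod.STREAMING_INSERTS) {
      throw new IllegalStateException("extended error info requires STREAMING_INSERTS");
    }
  }

  static final class Builder {
    private WriteMethod method;
    private boolean extendedErrorInfo;

    Builder setMethod(WriteMethod method) {
      this.method = method;
      return this;
    }

    Builder setExtendedErrorInfo(boolean extendedErrorInfo) {
      this.extendedErrorInfo = extendedErrorInfo;
      return this;
    }

    WriteConfig build() {
      return new WriteConfig(this);
    }
  }
}
```

Setting the default to `false` in `create()`, as the diff does with `.setExtendedErrorInfo(false)`, means existing pipelines keep their behavior unless they opt in.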

[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=125447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-125447
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 20/Jul/18 10:38
Start Date: 20/Jul/18 10:38
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-406562550
 
 
   @reuvenlax I think it is working now




Issue Time Tracking
---

Worklog Id: (was: 125447)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=125384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-125384
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 20/Jul/18 08:07
Start Date: 20/Jul/18 08:07
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-406522379
 
 
   retest this please




Issue Time Tracking
---

Worklog Id: (was: 125384)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=125262=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-125262
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 19/Jul/18 22:16
Start Date: 19/Jul/18 22:16
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-406431360
 
 
   @calonso I apologize for the delay; I simply forgot to re-review this. The 
code looks good, and thanks for the detailed unit tests!
   
   There appears to be a compilation breakage. It might be because your branch is 
based on an old version of master, so I would try to rebase (`git fetch --all` 
followed by `git rebase origin/master`) and see if that fixes things. I'll merge this 
PR once compilation and tests pass.




Issue Time Tracking
---

Worklog Id: (was: 125262)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=124816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-124816
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 19/Jul/18 06:28
Start Date: 19/Jul/18 06:28
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-406168604
 
 
   retest this please




Issue Time Tracking
---

Worklog Id: (was: 124816)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=124815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-124815
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 19/Jul/18 06:26
Start Date: 19/Jul/18 06:26
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-406168296
 
 
   @reuvenlax @kennknowles Just a friendly reminder that I believe this PR is 
ready to review again and that the CI failure seems unrelated...




Issue Time Tracking
---

Worklog Id: (was: 124815)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-07-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=121892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-121892
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 11/Jul/18 13:12
Start Date: 11/Jul/18 13:12
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-404163811
 
 
   Hi @reuvenlax @kennknowles, I believe this is ready to review again, as the CI 
failure seems unrelated...




Issue Time Tracking
---

Worklog Id: (was: 121892)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=116871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116871
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 28/Jun/18 16:24
Start Date: 28/Jun/18 16:24
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-401093248
 
 
   We have turned on autoformatting of the codebase, which causes small 
conflicts across the board. You can probably safely rebase and just keep your 
changes. Like this:
   
   ```
   $ git rebase
   ... see some conflicts
   $ git diff
   ... confirmed that the conflicts are just autoformatting
   ... so we can just keep our changes and do our own autoformat
   ... (during a rebase, --theirs is the commit being replayed, i.e. our changes)
   $ git checkout --theirs -- .
   $ git add -u
   $ git rebase --continue
   $ ./gradlew spotlessJavaApply
   ```
   
   Please ping me if you run into any difficulty. 




Issue Time Tracking
---

Worklog Id: (was: 116871)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=112751=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-112751
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 18/Jun/18 13:35
Start Date: 18/Jun/18 13:35
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #5341: 
[BEAM-4257] Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#discussion_r196077548
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java
 ##
 @@ -41,11 +41,12 @@
  */
 @SystemDoFnInternal
 @VisibleForTesting
-class StreamingWriteFn
+class StreamingWriteFn
 
 Review comment:
   Rename T to be ErrorT, or something more descriptive here. T is often used 
for element type, so this is confusing.
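To illustrate the naming point, a dependency-free sketch in which the type parameter describes what the failure output carries, so `ErrorT` says more than `T`, which conventionally names the input element type. `ErrorContainer` is named later in this thread, but its `toError` method here, as well as `StreamingWriteSketch`, are hypothetical and do not reflect Beam's actual signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical: turns a failed row plus context into whatever the failure output carries. */
interface ErrorContainer<ErrorT> {
  ErrorT toError(Map<String, Object> row, String reason, String tableSpec);
}

/** Named ErrorT because it is the failure-output type, not the input element type. */
final class StreamingWriteSketch<ErrorT> {
  private final ErrorContainer<ErrorT> container;
  final List<ErrorT> failedOutput = new ArrayList<>();

  StreamingWriteSketch(ErrorContainer<ErrorT> container) {
    this.container = container;
  }

  /** Records a failed insert in the shape chosen by the container. */
  void onInsertFailure(Map<String, Object> row, String reason, String tableSpec) {
    failedOutput.add(container.toError(row, reason, tableSpec));
  }
}
```

The same class then serves a rows-only output or a richer one, depending only on the container passed in.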




Issue Time Tracking
---

Worklog Id: (was: 112751)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=112750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-112750
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 18/Jun/18 13:35
Start Date: 18/Jun/18 13:35
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #5341: 
[BEAM-4257] Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#discussion_r196073075
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
 ##
 @@ -1451,6 +1456,19 @@ static String getExtractDestinationUri(String 
extractDestinationDir) {
   return 
toBuilder().setCustomGcsTempLocation(customGcsTempLocation).build();
 }
 
+/**
+ * Enables extended error information by enabling {@link 
WriteResult#getFailedInsertsWithErr()}
+ *
+ * ATM this only works if using {@link Method#STREAMING_INSERTS}.
+ * See {@link Write#withMethod(Method)}.
+ *
+ * Disclaimer: Enabling this may cause your job not to be able to update
+ * (you may need to drain it before)
 
 Review comment:
   Remove this disclaimer.
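Independently of the disclaimer, the Javadoc's restriction (extended error information only with `Method#STREAMING_INSERTS`) can be enforced when the transform is configured. A minimal sketch, assuming a hypothetical `WriteConfig` and a simplified `Method` enum; it deliberately omits Beam's `DEFAULT` method, which is resolved from whether the input is bounded, and it is not Beam's actual validation code.

```java
/** Simplified stand-in for BigQueryIO.Write.Method (DEFAULT omitted on purpose). */
enum Method {
  FILE_LOADS,
  STREAMING_INSERTS
}

/** Hypothetical, immutable write configuration. */
final class WriteConfig {
  final Method method;
  final boolean extendedErrorInfo;

  WriteConfig(Method method, boolean extendedErrorInfo) {
    this.method = method;
    this.extendedErrorInfo = extendedErrorInfo;
  }

  /** Fails fast at configuration time instead of silently yielding no extended errors. */
  void validate() {
    if (extendedErrorInfo && method != Method.STREAMING_INSERTS) {
      throw new IllegalArgumentException(
          "Extended error information requires STREAMING_INSERTS, but method is " + method);
    }
  }
}
```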




Issue Time Tracking
---

Worklog Id: (was: 112750)
Time Spent: 2h 50m  (was: 2h 40m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=112753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-112753
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 18/Jun/18 13:35
Start Date: 18/Jun/18 13:35
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #5341: 
[BEAM-4257] Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#discussion_r196077021
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingInserts.java
 ##
 @@ -34,36 +34,47 @@
   private final CreateDisposition createDisposition;
   private final DynamicDestinations dynamicDestinations;
   private InsertRetryPolicy retryPolicy;
+  private boolean extendedErrorInfo;
 
   /** Constructor. */
   public StreamingInserts(CreateDisposition createDisposition,
DynamicDestinations dynamicDestinations) {
 this(createDisposition, dynamicDestinations, new BigQueryServicesImpl(),
-InsertRetryPolicy.alwaysRetry());
+InsertRetryPolicy.alwaysRetry(), false);
   }
 
   /** Constructor. */
   private StreamingInserts(CreateDisposition createDisposition,
   DynamicDestinations 
dynamicDestinations,
   BigQueryServices bigQueryServices,
-  InsertRetryPolicy retryPolicy) {
+  InsertRetryPolicy retryPolicy, boolean 
extendedErrorInfo) {
 
 Review comment:
   put on new line
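Applied to this hunk, the suggestion puts each constructor parameter on its own line. A dependency-free sketch of the same chaining pattern: the public constructor defaults `extendedErrorInfo` to `false`, and a copying `withExtendedErrorInfo()` flips it. `StreamingInsertsSketch` and its `String`-typed fields (standing in for `CreateDisposition` and `InsertRetryPolicy`) are hypothetical simplifications.

```java
/** Hypothetical, simplified mirror of the StreamingInserts construction pattern. */
final class StreamingInsertsSketch {
  private final String createDisposition;
  private final String retryPolicy;
  private final boolean extendedErrorInfo;

  /** Public entry point: extended error info is off unless explicitly requested. */
  StreamingInsertsSketch(String createDisposition) {
    this(createDisposition, "alwaysRetry", false);
  }

  /** One parameter per line, as the review asks. */
  private StreamingInsertsSketch(
      String createDisposition,
      String retryPolicy,
      boolean extendedErrorInfo) {
    this.createDisposition = createDisposition;
    this.retryPolicy = retryPolicy;
    this.extendedErrorInfo = extendedErrorInfo;
  }

  /** Returns a copy with extended error info enabled; the receiver is unchanged. */
  StreamingInsertsSketch withExtendedErrorInfo() {
    return new StreamingInsertsSketch(createDisposition, retryPolicy, true);
  }

  boolean hasExtendedErrorInfo() {
    return extendedErrorInfo;
  }
}
```

Returning a copy instead of mutating keeps an already-configured instance safe to reuse; the field in the diff above is not declared `final`, which this pattern would allow.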




Issue Time Tracking
---

Worklog Id: (was: 112753)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=112749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-112749
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 18/Jun/18 13:35
Start Date: 18/Jun/18 13:35
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #5341: 
[BEAM-4257] Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#discussion_r196072731
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryCoderProviderRegistrar.java
 ##
 @@ -35,6 +35,8 @@
   public List getCoderProviders() {
 return ImmutableList.of(
 CoderProviders.forCoder(TypeDescriptor.of(TableRow.class), 
TableRowJsonCoder.of()),
-CoderProviders.forCoder(TypeDescriptor.of(TableRowInfo.class), 
TableRowInfoCoder.of()));
+CoderProviders.forCoder(TypeDescriptor.of(TableRowInfo.class), 
TableRowInfoCoder.of()),
+CoderProviders.forCoder(
+TypeDescriptor.of(BigQueryInsertError.class), 
BigQueryInsertErrorCoder.of()));
 
 Review comment:
   Don't think this is needed - you can just explicitly call setCoder inside 
BigQueryIO
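A dependency-free sketch of the alternative the reviewer suggests: a coder attached directly to the output collection (as the later `StreamingWriteTables` hunk does with `failedInserts.setCoder(BigQueryInsertErrorCoder.of())`) takes precedence over registry inference, so no global registration is needed for that output. `CoderRegistrySketch`, `SketchCollection`, and `SketchCoder` are hypothetical models, not Beam's `CoderRegistry` or `PCollection`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical coder: only a name, enough to show which coder is chosen. */
final class SketchCoder {
  final String name;

  SketchCoder(String name) {
    this.name = name;
  }
}

/** Hypothetical global registry, analogous to registering a coder provider. */
final class CoderRegistrySketch {
  private final Map<Class<?>, SketchCoder> byType = new HashMap<>();

  void register(Class<?> type, SketchCoder coder) {
    byType.put(type, coder);
  }

  Optional<SketchCoder> lookup(Class<?> type) {
    return Optional.ofNullable(byType.get(type));
  }
}

/** Hypothetical output collection: an explicitly set coder wins over registry lookup. */
final class SketchCollection<T> {
  private final Class<T> elementType;
  private SketchCoder explicitCoder;

  SketchCollection(Class<T> elementType) {
    this.elementType = elementType;
  }

  SketchCollection<T> setCoder(SketchCoder coder) {
    this.explicitCoder = coder;
    return this;
  }

  SketchCoder resolveCoder(CoderRegistrySketch registry) {
    if (explicitCoder != null) {
      return explicitCoder;
    }
    return registry.lookup(elementType)
        .orElseThrow(() -> new IllegalStateException("No coder for " + elementType.getName()));
  }
}
```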




Issue Time Tracking
---

Worklog Id: (was: 112749)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-06-18 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=112752=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-112752
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 18/Jun/18 13:35
Start Date: 18/Jun/18 13:35
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #5341: 
[BEAM-4257] Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#discussion_r196075533
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryInsertErrorCoder.java
 ##
 @@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.SerializationFeature;
+import com.google.api.services.bigquery.model.TableDataInsertAllResponse;
+import com.google.api.services.bigquery.model.TableReference;
+import com.google.api.services.bigquery.model.TableRow;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.AtomicCoder;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.values.TypeDescriptor;
+
+/** A {@link Coder} that encodes BigQuery {@link BigQueryInsertError} objects. 
*/
+public class BigQueryInsertErrorCoder extends AtomicCoder 
{
+
+  public static BigQueryInsertErrorCoder of() {
+return INSTANCE;
+  }
+
+  @Override
+  public void encode(BigQueryInsertError value, OutputStream outStream) throws 
IOException {
+String errorStrValue = MAPPER.writeValueAsString(value.getError());
+StringUtf8Coder.of().encode(errorStrValue, outStream);
+
+TableRowJsonCoder.of().encode(value.getRow(), outStream);
+
+String tableStrValue = MAPPER.writeValueAsString(value.getTable());
 
 Review comment:
   No need to encode this class. You should be able to store just the string 
tablespec instead.
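A minimal sketch of the suggested encoding, using only `java.io`: the destination is written as a plain table-spec string (`project:dataset.table`) instead of a JSON-serialized `TableReference`. `InsertErrorValue` and `InsertErrorCoderSketch` are hypothetical; in Beam, `BigQueryHelpers.toTableSpec` and `BigQueryHelpers.parseTableSpec` convert between a `TableReference` and its spec string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical value: error and row as JSON strings, destination as a table spec. */
final class InsertErrorValue {
  final String errorJson;
  final String rowJson;
  final String tableSpec; // e.g. "my-project:my_dataset.my_table"

  InsertErrorValue(String errorJson, String rowJson, String tableSpec) {
    this.errorJson = errorJson;
    this.rowJson = rowJson;
    this.tableSpec = tableSpec;
  }
}

final class InsertErrorCoderSketch {
  // Note: writeUTF caps each string at 65535 encoded bytes; a production coder would use
  // a length-prefixed encoding such as Beam's StringUtf8Coder in a nested context.
  static byte[] encode(InsertErrorValue value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(value.errorJson);
      out.writeUTF(value.rowJson);
      out.writeUTF(value.tableSpec); // a plain string, not a serialized TableReference
    }
    return bytes.toByteArray();
  }

  static InsertErrorValue decode(byte[] encoded) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
      // Arguments are evaluated left to right, matching the write order above.
      return new InsertErrorValue(in.readUTF(), in.readUTF(), in.readUTF());
    }
  }
}
```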




Issue Time Tracking
---

Worklog Id: (was: 112752)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-06-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=111443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111443
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 13/Jun/18 08:39
Start Date: 13/Jun/18 08:39
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-396860741
 
 
   Hi @reuvenlax, just wanted to know whether you've had a chance to look at this, 
or whether you have an ETA, so I don't bother you unnecessarily. Regards




Issue Time Tracking
---

Worklog Id: (was: 111443)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=108564=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108564
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 04/Jun/18 13:23
Start Date: 04/Jun/18 13:23
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-394352749
 
 
   OK, no rush; I just wanted to make sure you were aware of the change. Have a 
nice trip!




Issue Time Tracking
---

Worklog Id: (was: 108564)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=108484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108484
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 04/Jun/18 09:46
Start Date: 04/Jun/18 09:46
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-394297199
 
 
   Hi - sorry for the late reply, as I'm traveling right now. I'll take
   another look at the PR.
   
   On Thu, May 31, 2018 at 11:18 AM Carlos Alonso 
   wrote:
   
   > Hi @reuvenlax, just a friendly
   > reminder about this PR.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub, or mute the thread.
   >
   




Issue Time Tracking
---

Worklog Id: (was: 108484)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-31 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=107604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107604
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 31/May/18 08:18
Start Date: 31/May/18 08:18
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-393451323
 
 
   Hi @reuvenlax, just a friendly reminder about this PR.




Issue Time Tracking
---

Worklog Id: (was: 107604)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=105559=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105559
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 24/May/18 12:21
Start Date: 24/May/18 12:21
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-391693882
 
 
   OK, let's see what this new approach looks like. Basically, I added a new 
`BigQueryIO.Write#withExtendedErrorInfo()` method so that users can specify 
whether they want it or not. It is disabled by default, so everything should 
behave as before (and existing jobs should remain updatable).
   
   If it is enabled, users must call the new 
`WriteResult#getFailedInsertsWithErr` to retrieve errors.
   
   Internally, I've made the streaming-related classes generic, but restricted 
to implementations of the `ErrorContainer` interface.
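A dependency-free sketch of the opt-in behavior described above, assuming a hypothetical `WriteResultSketch` whose elements are plain strings. With the flag off (the default), failures go to the legacy rows-only output and the pipeline shape is unchanged; with it on, they go to the extended output and the legacy getter rejects the call. In Beam the real names are `BigQueryIO.Write#withExtendedErrorInfo()` and `WriteResult#getFailedInsertsWithErr()`, as mentioned in this thread; the exception type and messages here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a write result with a legacy and an extended failure output. */
final class WriteResultSketch {
  private final boolean extendedErrorInfo;
  private final List<String> failedRows = new ArrayList<>();        // legacy: rows only
  private final List<String> failedRowsWithErr = new ArrayList<>(); // row + reason + table

  WriteResultSketch(boolean extendedErrorInfo) {
    this.extendedErrorInfo = extendedErrorInfo;
  }

  /** Routes a failure to whichever output the configuration enabled. */
  void recordFailure(String row, String reason, String tableSpec) {
    if (extendedErrorInfo) {
      failedRowsWithErr.add(row + " | " + reason + " | " + tableSpec);
    } else {
      failedRows.add(row);
    }
  }

  List<String> getFailedInserts() {
    if (extendedErrorInfo) {
      throw new IllegalStateException("Extended error info is enabled; use getFailedInsertsWithErr()");
    }
    return failedRows;
  }

  List<String> getFailedInsertsWithErr() {
    if (!extendedErrorInfo) {
      throw new IllegalStateException("Enable extended error info to use getFailedInsertsWithErr()");
    }
    return failedRowsWithErr;
  }
}
```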




Issue Time Tracking
---

Worklog Id: (was: 105559)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-21 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=103986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103986
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 21/May/18 12:47
Start Date: 21/May/18 12:47
Worklog Time Spent: 10m 
  Work Description: calonso commented on a change in pull request #5341: 
[BEAM-4257] Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#discussion_r189578728
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java
 ##
 @@ -105,8 +105,8 @@ public WriteResult expand(PCollection> input) {
 "StreamingWrite",
 ParDo.of(new StreamingWriteFn(bigQueryServices, retryPolicy, 
failedInsertsTag))
 .withOutputTags(mainOutputTag, 
TupleTagList.of(failedInsertsTag)));
-PCollection failedInserts = tuple.get(failedInsertsTag);
-failedInserts.setCoder(TableRowJsonCoder.of());
+PCollection failedInserts = 
tuple.get(failedInsertsTag);
+failedInserts.setCoder(BigQueryInsertErrorCoder.of());
 
 Review comment:
    




Issue Time Tracking
---

Worklog Id: (was: 103986)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=103782=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103782
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 20/May/18 22:49
Start Date: 20/May/18 22:49
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on a change in pull request #5341: 
[BEAM-4257] Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#discussion_r189472901
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java
 ##
 @@ -105,8 +105,8 @@ public WriteResult expand(PCollection> input) {
 "StreamingWrite",
 ParDo.of(new StreamingWriteFn(bigQueryServices, retryPolicy, 
failedInsertsTag))
 .withOutputTags(mainOutputTag, 
TupleTagList.of(failedInsertsTag)));
-PCollection failedInserts = tuple.get(failedInsertsTag);
-failedInserts.setCoder(TableRowJsonCoder.of());
+PCollection failedInserts = 
tuple.get(failedInsertsTag);
+failedInserts.setCoder(BigQueryInsertErrorCoder.of());
 
 Review comment:
   This is an update-incompatible change of the type we try to avoid. Many 
users of streaming rely on the ability to do online updates of running 
pipelines. Changing the type of a PCollection breaks that (the update will be 
rejected as incompatible), so we try to avoid this. The best solution might be 
to simply create a new PCollection and leave the old one around (but not 
published to).
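A much-simplified model of why the reviewer suggests adding a new output rather than retyping the existing one. It assumes, hypothetically, that compatibility only means every output of the running pipeline still exists with the same coder; the runner's real update check is more involved. `UpdateCheckSketch` is illustrative only.

```java
import java.util.Map;

/** Hypothetical, much-simplified model of a streaming update-compatibility check. */
final class UpdateCheckSketch {
  /**
   * Each map goes from an output name to the coder encoding its elements. The update is
   * accepted only if every output of the running pipeline still exists with the same coder;
   * outputs that are new in the replacement pipeline are allowed.
   */
  static boolean isCompatible(Map<String, String> running, Map<String, String> replacement) {
    for (Map.Entry<String, String> old : running.entrySet()) {
      if (!old.getValue().equals(replacement.get(old.getKey()))) {
        return false; // output removed or its element encoding changed
      }
    }
    return true;
  }
}
```

Under this model, retyping `failedInserts` is rejected, while keeping it and adding a separate extended output is accepted.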




Issue Time Tracking
---

Worklog Id: (was: 103782)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=103763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103763
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 20/May/18 18:39
Start Date: 20/May/18 18:39
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-390502110
 
 
   How about creating a new method in the `WriteResult` class to get this 
'extended errors collection'? 
   
   Something like `failedInsertsErrorInfo`, leaving the original 
`failedInserts` untouched.
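The additive alternative can be sketched without Beam at all: plain Java lists stand in for PCollections, and `WriteResultSketch`, `getFailedInserts` and `failedInsertsErrorInfo` are illustrative names (the last one taken from the suggestion above). The point of this shape is that the existing accessor keeps its element type, so its callers see no change, while the richer records are exposed through a separate accessor. For reference, Beam's released API took a similar additive route with `BigQueryIO.Write.withExtendedErrorInfo()` and `WriteResult.getFailedInsertsWithErr()`; the sketch is not that implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the additive API idea: keep the existing accessor and
// its element type, and expose the extended records through a new accessor.
// Lists stand in for PCollections; all names are illustrative only.
final class WriteResultSketch<RowT, ErrT> {
  private final List<ErrT> failures;
  private final Function<ErrT, RowT> rowOf; // extracts the original row from a record

  WriteResultSketch(List<ErrT> failures, Function<ErrT, RowT> rowOf) {
    this.failures = Collections.unmodifiableList(new ArrayList<>(failures));
    this.rowOf = rowOf;
  }

  // Unchanged accessor: callers that only want rows see the same type as before.
  List<RowT> getFailedInserts() {
    List<RowT> rows = new ArrayList<>(failures.size());
    for (ErrT f : failures) {
      rows.add(rowOf.apply(f));
    }
    return Collections.unmodifiableList(rows);
  }

  // New, separately named accessor for the extended error records.
  List<ErrT> failedInsertsErrorInfo() {
    return failures;
  }
}
```

Deriving the old view from the new records keeps a single source of truth; a real pipeline would instead emit two separately tagged outputs.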




Issue Time Tracking
---

Worklog Id: (was: 103763)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=103757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103757
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 20/May/18 17:45
Start Date: 20/May/18 17:45
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-390498916
 
 
   I'm not sure how to easily make this backwards compatible. Changing the
   type of the output PCollection (from PCollection<TableRow> to
   PCollection<BigQueryInsertError>) is not a compatible change.
   
   
   
   On Sun, May 20, 2018 at 5:27 AM Carlos Alonso 
   wrote:
   
   > Hi @reuvenlax. Many thanks for your
   > comments. Completely agree on the backwards incompatibility issue, my bad.
   > Will fix it ASAP.
   >
   > On the other comment, regarding extending the information in the retry
   > policy, I think I'm not following you. My idea is to have, for every
   > insertion error, the details of why it failed and the table it was headed
   > to (we use dynamic destinations), all of them in a PCollection so that the
   > pipeline can do something with them (as of today, we store them in a
   > dead-letter destination for further inspection; that's why the error and
   > the table destination are so important).


Issue Time Tracking
---

Worklog Id: (was: 103757)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=103748=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103748
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 20/May/18 15:24
Start Date: 20/May/18 15:24
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-390490319
 
 
   Thinking about this... Maybe you meant to say 'extend the information in the 
`TableRow`' instead of the retry policy?
   
   That could definitely be an option, given that `TableRow` accepts storing 
anything, but I think it is clearer to use a dedicated class that holds all the 
information one may need to debug/inspect the error.
   
   Another idea I had is to create a new method in the `WriteResult` class, 
something like `failedInsertsExtendedInfo`, instead of changing the original 
`failedInserts`.




Issue Time Tracking
---

Worklog Id: (was: 103748)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-20 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=103737=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103737
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 20/May/18 12:27
Start Date: 20/May/18 12:27
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-390477232
 
 
   Hi @reuvenlax. Many thanks for your comments. Completely agree on the 
backwards incompatibility issue, my bad. Will fix it ASAP.
   
   On the other comment, regarding extending the information in the retry 
policy, I think I'm not following you. My idea is to have, for every insertion 
error, the details of why it failed and the table it was headed to (we use 
dynamic destinations), all of them in a PCollection so that the pipeline can do 
something with them (as of today, we store them in a dead-letter destination 
for further inspection; that's why the error and the table destination are so 
important).
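With dynamic destinations, failures for many tables arrive in one stream, so the destination and the reason need to travel with each row. A minimal sketch of the dead-letter use case described above groups entries per destination table. It assumes a hypothetical `Failure` record (not Beam's `BigQueryInsertError`) holding a table spec, an error reason and the row already serialized as JSON; `DeadLetterSketch` and `byDestination` are likewise illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: build per-table dead-letter entries from failed rows
// whose destination and error reason are known.
final class DeadLetterSketch {

  private DeadLetterSketch() {}

  // Minimal hypothetical failure record; not Beam's BigQueryInsertError.
  static final class Failure {
    final String table;   // destination, e.g. "project:dataset.table"
    final String reason;  // error reason reported for the row
    final String rowJson; // the original row, serialized for storage

    Failure(String table, String reason, String rowJson) {
      this.table = table;
      this.reason = reason;
      this.rowJson = rowJson;
    }
  }

  // Groups "reason<TAB>row" lines by destination table; TreeMap keeps keys sorted.
  static Map<String, List<String>> byDestination(List<Failure> failures) {
    Map<String, List<String>> out = new TreeMap<>();
    for (Failure f : failures) {
      out.computeIfAbsent(f.table, t -> new ArrayList<>())
          .add(f.reason + "\t" + f.rowJson);
    }
    return out;
  }
}
```

Without the destination on each record, rows for different tables could not be told apart once they reach the dead-letter step.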




Issue Time Tracking
---

Worklog Id: (was: 103737)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=103653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-103653
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 19/May/18 06:01
Start Date: 19/May/18 06:01
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-390382315
 
 
   High-level thought: why do you need a new BigQueryInsertError class? Is 
there a reason why you can't simply extend the information in the retry policy?
   
   One issue with this PR is that it is backwards incompatible, meaning that 
current pipelines will have trouble with it. Unfortunately, we generally don't 
make backwards-incompatible changes except across major Beam versions.
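For comparison, a retry policy answers a yes/no question per failed row. The sketch below mimics that shape (Beam's `InsertRetryPolicy` exposes `shouldRetry(Context)` returning a `boolean`); the class name `RetryPolicySketch` and the set of reasons treated as persistent are assumptions for illustration. In this shape, whatever the policy inspects, it yields only a decision, not a record that could be emitted downstream.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy per-row retry decision, loosely shaped like a retry policy.
// The PERSISTENT set below is an assumption for this sketch.
final class RetryPolicySketch {

  private RetryPolicySketch() {}

  // Reasons treated as persistent here: retrying the same row will not help.
  private static final Set<String> PERSISTENT =
      new HashSet<>(Arrays.asList("invalid", "invalidQuery", "notImplemented"));

  // Returns only a yes/no decision; the error reasons are consumed here and
  // are not forwarded anywhere.
  static boolean shouldRetry(List<String> errorReasons) {
    for (String reason : errorReasons) {
      if (PERSISTENT.contains(reason)) {
        return false;
      }
    }
    return true;
  }
}
```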




Issue Time Tracking
---

Worklog Id: (was: 103653)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=102465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-102465
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 16/May/18 12:38
Start Date: 16/May/18 12:38
Worklog Time Spent: 10m 
  Work Description: calonso commented on issue #5341: [BEAM-4257] Increases 
BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-389503389
 
 
   Hi @akedin, this is my very first PR against Beam :) How does it look? Do 
you need any help/clarification?
   
   I think the failure is not related to my changes...
   
   Thanks!




Issue Time Tracking
---

Worklog Id: (was: 102465)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-14 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=101743=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-101743
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 14/May/18 14:23
Start Date: 14/May/18 14:23
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341#issuecomment-388835105
 
 
   FYI @akedin 




Issue Time Tracking
---

Worklog Id: (was: 101743)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (BEAM-4257) Add error reason and table destination to BigQueryIO streaming failed inserts

2018-05-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4257?focusedWorklogId=101259=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-101259
 ]

ASF GitHub Bot logged work on BEAM-4257:


Author: ASF GitHub Bot
Created on: 11/May/18 18:31
Start Date: 11/May/18 18:31
Worklog Time Spent: 10m 
  Work Description: calonso opened a new pull request #5341: [BEAM-4257] 
Increases BigQuery streaming error information
URL: https://github.com/apache/beam/pull/5341
 
 
   This PR introduces a new `BigQueryInsertError` class, returned for each 
row that could not be inserted, to help callers troubleshoot failures.
   
   Each `BigQueryInsertError` object contains: 
   * The `TableRow` that could not be inserted
   * The `InsertErrors` that were generated
   * The `TableReference` the `TableRow` was destined for.
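A dependency-free sketch of such a value class follows. Per the description above, the real `BigQueryInsertError` holds a `TableRow`, the `InsertErrors` and a `TableReference`; here those are replaced by a `Map`, a list of reason strings and a `project:dataset.table` string so the example runs on its own. `InsertErrorSketch` and its accessors are hypothetical names. Defensive copies plus value-based `equals`/`hashCode` make instances safe to pass around and easy to compare in tests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, dependency-free stand-in for the BigQueryInsertError described
// in the PR. Plain Java types replace the BigQuery model classes:
//   Map<String, Object> ~ TableRow       (the row that could not be inserted)
//   List<String>        ~ InsertErrors   (reduced here to error reason strings)
//   String              ~ TableReference (here a "project:dataset.table" spec)
final class InsertErrorSketch {
  private final Map<String, Object> row;
  private final List<String> errorReasons;
  private final String tableSpec;

  InsertErrorSketch(Map<String, Object> row, List<String> errorReasons, String tableSpec) {
    // Defensive, unmodifiable copies so a record cannot change after creation.
    this.row = Collections.unmodifiableMap(
        new LinkedHashMap<>(Objects.requireNonNull(row, "row")));
    this.errorReasons = Collections.unmodifiableList(
        new ArrayList<>(Objects.requireNonNull(errorReasons, "errorReasons")));
    this.tableSpec = Objects.requireNonNull(tableSpec, "tableSpec");
  }

  Map<String, Object> getRow() { return row; }

  List<String> getErrorReasons() { return errorReasons; }

  String getTableSpec() { return tableSpec; }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof InsertErrorSketch)) {
      return false;
    }
    InsertErrorSketch other = (InsertErrorSketch) o;
    return row.equals(other.row)
        && errorReasons.equals(other.errorReasons)
        && tableSpec.equals(other.tableSpec);
  }

  @Override
  public int hashCode() {
    return Objects.hash(row, errorReasons, tableSpec);
  }

  @Override
  public String toString() {
    return "InsertErrorSketch{table=" + tableSpec + ", reasons=" + errorReasons
        + ", row=" + row + "}";
  }
}
```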
   
   




Issue Time Tracking
---

Worklog Id: (was: 101259)
Time Spent: 10m
Remaining Estimate: 0h
