[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=114344=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-114344 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 21/Jun/18 14:10 Start Date: 21/Jun/18 14:10 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5713: [BEAM-4283] Fix naming of the BigQuery fields URL: https://github.com/apache/beam/pull/5713#issuecomment-399117062 thx @iemejia ! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 114344) Time Spent: 8h 40m (was: 8.5h) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Fix For: 2.6.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=114335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-114335 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 21/Jun/18 13:23 Start Date: 21/Jun/18 13:23 Worklog Time Spent: 10m Work Description: iemejia closed pull request #5713: [BEAM-4283] Fix naming of the BigQuery fields URL: https://github.com/apache/beam/pull/5713 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java b/sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java
index ceed04774c1..10317ea755a 100644
--- a/sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java
+++ b/sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java
@@ -145,10 +145,10 @@ public NexmarkPerf decode(InputStream inStream)
         new TableSchema()
             .setFields(
                 ImmutableList.of(
-                    new TableFieldSchema().setName("Runtime(sec)").setType("FLOAT"),
-                    new TableFieldSchema().setName("Events(/sec)").setType("FLOAT"),
+                    new TableFieldSchema().setName("runtimeSec").setType("FLOAT"),
+                    new TableFieldSchema().setName("eventsPerSec").setType("FLOAT"),
                     new TableFieldSchema()
-                        .setName("Size of the result collection")
+                        .setName("numResults")
                         .setType("INTEGER")));
 
     String tableSpec =
@@ -163,9 +163,9 @@ public NexmarkPerf decode(InputStream inStream)
         input -> {
           NexmarkPerf nexmarkPerf = input.getValue();
           TableRow row = new TableRow()
-              .set("Runtime(sec)", nexmarkPerf.runtimeSec)
-              .set("Events(/sec)", nexmarkPerf.eventsPerSec)
-              .set("Size of the result collection", nexmarkPerf.numResults);
+              .set("runtimeSec", nexmarkPerf.runtimeSec)
+              .set("eventsPerSec", nexmarkPerf.eventsPerSec)
+              .set("numResults", nexmarkPerf.numResults);
           return row;
         };
     BigQueryIO.Write io =
diff --git a/sdks/java/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java b/sdks/java/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java
index 6fc5dabb50e..0be5b3d2d8a 100644
--- a/sdks/java/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java
+++ b/sdks/java/nexmark/src/test/java/org/apache/beam/sdk/nexmark/PerfsToBigQueryTest.java
@@ -104,16 +104,16 @@ public void testSavePerfsToBigQuery() throws IOException, InterruptedException {
     assertEquals("Wrong number of rows inserted", 2, actualRows.size());
     List expectedRows = new ArrayList<>();
     TableRow row1 = new TableRow()
-        .set("Runtime(sec)", nexmarkPerf1.runtimeSec).set("Events(/sec)", nexmarkPerf1.eventsPerSec)
+        .set("runtimeSec", nexmarkPerf1.runtimeSec).set("eventsPerSec", nexmarkPerf1.eventsPerSec)
         // when read using TableRowJsonCoder the row field is boxed into an Integer, cast it to int
         // to for bowing into Integer in the expectedRows.
-        .set("Size of the result collection", (int) nexmarkPerf1.numResults);
+        .set("numResults", (int) nexmarkPerf1.numResults);
     expectedRows.add(row1);
     TableRow row2 = new TableRow()
-        .set("Runtime(sec)", nexmarkPerf2.runtimeSec).set("Events(/sec)", nexmarkPerf2.eventsPerSec)
+        .set("runtimeSec", nexmarkPerf2.runtimeSec).set("eventsPerSec", nexmarkPerf2.eventsPerSec)
         // when read using TableRowJsonCoder the row field is boxed into an Integer, cast it to int
         // to for bowing into Integer in the expectedRows.
-        .set("Size of the result collection", (int) nexmarkPerf2.numResults);
+        .set("numResults", (int) nexmarkPerf2.numResults);
     expectedRows.add(row2);
     assertThat(actualRows, containsInAnyOrder(Iterables.toArray(expectedRows, TableRow.class)));

Issue Time Tracking --- Worklog Id: (was: 114335) Time Spent: 8.5h (was: 8h 20m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=114281=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-114281 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 21/Jun/18 09:58 Start Date: 21/Jun/18 09:58 Worklog Time Spent: 10m Work Description: echauchot opened a new pull request #5713: [BEAM-4283] Fix naming of the BigQuery fields URL: https://github.com/apache/beam/pull/5713 Fix naming of the fields of the added BQ table. Naming error not seen in UT (see https://issues.apache.org/jira/browse/BEAM-4607) Follow this checklist to help us incorporate your contribution quickly and easily: - [X] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [X] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it. Issue Time Tracking --- Worklog Id: (was: 114281) Time Spent: 8h 20m (was: 8h 10m)
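Note on the rename in PR #5713: the thread does not say why the old names were wrong. A plausible reading is that names such as "Runtime(sec)" break BigQuery's classic column-naming rule (letters, digits and underscores only, starting with a letter or underscore), which a fake service in a unit test would not enforce. The sketch below checks names against that rule only. The class `FieldNameCheck` and the method `isValidFieldName` are hypothetical names for illustration, not part of Beam or of the BigQuery client, and the length limit is deliberately ignored.

```java
import java.util.regex.Pattern;

public class FieldNameCheck {
  // Assumption: the classic BigQuery column-name rule (letters, digits,
  // underscores; first character a letter or underscore). Length is not checked.
  private static final Pattern VALID = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  /** Returns true if the name satisfies the rule assumed above. */
  public static boolean isValidFieldName(String name) {
    return name != null && VALID.matcher(name).matches();
  }

  public static void main(String[] args) {
    String[] names = {
      "Runtime(sec)", "Events(/sec)", "Size of the result collection",
      "runtimeSec", "eventsPerSec", "numResults"
    };
    for (String name : names) {
      System.out.println(name + " -> " + isValidFieldName(name));
    }
  }
}
```

Under this rule the three old names are rejected and the three new ones are accepted.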
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=112374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-112374 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 15/Jun/18 16:06 Start Date: 15/Jun/18 16:06 Worklog Time Spent: 10m Work Description: chamikaramj closed pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
index 29fb8924f22..19aef00094d 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
@@ -1452,7 +1452,10 @@ static String getExtractDestinationUri(String extractDestinationDir) {
     }
 
     @VisibleForTesting
-    Write withTestServices(BigQueryServices testServices) {
+    /**
+     * This method is for test usage only
+     */
+    public Write withTestServices(BigQueryServices testServices) {
       checkArgument(testServices != null, "testServices can not be null");
       return toBuilder().setBigQueryServices(testServices).build();
     }
diff --git a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java
index 1295cc0fe2c..c4e5462306f 100644
--- a/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java
+++ b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.java
@@ -32,10 +32,12 @@
 import java.io.Serializable;
 import java.util.List;
 import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.values.ValueInSingleWindow;
 
 /** An interface for real, mock, or fake implementations of Cloud BigQuery services. */
-interface BigQueryServices extends Serializable {
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public interface BigQueryServices extends Serializable {
 
   /**
    * Returns a real, mock, or fake {@link JobService}.
diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeBigQueryServices.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeBigQueryServices.java
index 0a384e74e17..d581a925f6c 100644
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeBigQueryServices.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeBigQueryServices.java
@@ -23,6 +23,7 @@
 import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.util.List;
+import org.apache.beam.sdk.annotations.Experimental;
 import org.apache.beam.sdk.coders.Coder.Context;
 import org.apache.beam.sdk.coders.ListCoder;
@@ -30,16 +31,17 @@
 /**
  * A fake implementation of BigQuery's query service..
  */
-class FakeBigQueryServices implements BigQueryServices {
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class FakeBigQueryServices implements BigQueryServices {
   private JobService jobService;
   private FakeDatasetService datasetService;
 
-  FakeBigQueryServices withJobService(JobService jobService) {
+  public FakeBigQueryServices withJobService(JobService jobService) {
     this.jobService = jobService;
     return this;
   }
 
-  FakeBigQueryServices withDatasetService(FakeDatasetService datasetService) {
+  public FakeBigQueryServices withDatasetService(FakeDatasetService datasetService) {
     this.datasetService = datasetService;
     return this;
   }
diff --git a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeDatasetService.java b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeDatasetService.java
index 3526ed5ddd0..50d4b7af06d 100644
--- a/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeDatasetService.java
+++ b/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeDatasetService.java
@@ -39,6 +39,7 @@
 import java.util.concurrent.ThreadLocalRandom;
 import java.util.regex.Pattern;
 import javax.annotation.Nullable;
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=112345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-112345 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 15/Jun/18 14:52 Start Date: 15/Jun/18 14:52 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-397646791 Hi Cham, is it ready for merging? If so, can you merge it ? Issue Time Tracking --- Worklog Id: (was: 112345) Time Spent: 8h (was: 7h 50m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=111972=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111972 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 14/Jun/18 18:27 Start Date: 14/Jun/18 18:27 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-397393514 Retest this please Issue Time Tracking --- Worklog Id: (was: 111972) Time Spent: 7h 50m (was: 7h 40m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=111807=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111807 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 14/Jun/18 07:56 Start Date: 14/Jun/18 07:56 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-397206054 Hi Cham, thanks, done Issue Time Tracking --- Worklog Id: (was: 111807) Time Spent: 7h 40m (was: 7.5h)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=111433=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111433 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 13/Jun/18 07:52 Start Date: 13/Jun/18 07:52 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-396527944 Hi @chamikaramj , I fixed the serialization issue and some other things. It should be OK now. PTAL Issue Time Tracking --- Worklog Id: (was: 111433) Time Spent: 7.5h (was: 7h 20m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=111018=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111018 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 12/Jun/18 10:24 Start Date: 12/Jun/18 10:24 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-396527944 Hi Cham, I fixed the serialization issue and some other things. It should be OK now. PTAL Issue Time Tracking --- Worklog Id: (was: 111018) Time Spent: 7h 20m (was: 7h 10m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=111011=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111011 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 12/Jun/18 09:32 Start Date: 12/Jun/18 09:32 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-396527944 Hi Cham, I fixed the serialization issue and some other things. It should be OK now. PTAL Issue Time Tracking --- Worklog Id: (was: 111011) Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=110892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110892 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 11/Jun/18 23:29 Start Date: 11/Jun/18 23:29 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-396418368 @echauchot please let me know if this is ready for another look. Issue Time Tracking --- Worklog Id: (was: 110892) Time Spent: 7h (was: 6h 50m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109814=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109814 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 07/Jun/18 17:46 Start Date: 07/Jun/18 17:46 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-395507028 Yeah, probably just pass the values that you need from the options object to 'NexmarkUtils.tableSpec' instead of the full options object? Issue Time Tracking --- Worklog Id: (was: 109814) Time Spent: 6h 50m (was: 6h 40m)
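Note on the suggestion above: one way to read it is that the table-spec helper takes plain strings rather than the options object, so nothing non-serializable travels with it. The sketch below is only an illustration under that assumption. The class `TableSpecSketch`, its parameter list and the underscore-joined table name are hypothetical and are not the actual `NexmarkUtils.tableSpec` signature. The only thing taken from Beam is the `project:dataset.table` string form that BigQueryIO accepts as a table spec.

```java
public class TableSpecSketch {
  /**
   * Hypothetical helper: builds a "project:dataset.table" string from plain
   * values extracted from the options at pipeline construction time.
   */
  public static String tableSpec(
      String project, String dataset, String baseTableName,
      String queryName, String runner, String mode) {
    // One table per query/runner/mode; characters outside [A-Za-z0-9_]
    // are replaced so the table id stays a simple identifier.
    String table =
        String.join("_", baseTableName, queryName, runner, mode)
            .replaceAll("[^A-Za-z0-9_]", "_");
    return String.format("%s:%s.%s", project, dataset, table);
  }

  public static void main(String[] args) {
    // Example values only; none of them come from the thread.
    System.out.println(
        tableSpec("my-project", "nexmark", "perf", "query5", "DirectRunner", "batch"));
  }
}
```

Because every argument is a `String`, a lambda that calls this helper captures only serializable values.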
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109735=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109735 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 07/Jun/18 15:16 Start Date: 07/Jun/18 15:16 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-395458493 @chamikaramj I missed part of the stacktrace. I know where the serialization issue comes from: I use PipelineOptions in a SerializableFunction used to configure the IO. Issue Time Tracking --- Worklog Id: (was: 109735) Time Spent: 6h 40m (was: 6.5h)
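Note on the diagnosis above: the failure mode can be reproduced without Beam. In the sketch below, a serializable lambda that captures a non-serializable object cannot be written with Java serialization, while one that captures a pre-extracted `String` can. `SerFn` and `Options` are hypothetical stand-ins for Beam's `SerializableFunction` and `PipelineOptions`. They are not the Beam types, and the real `PipelineOptions` refuses serialization through its proxy handler rather than by simply lacking `Serializable`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CaptureSketch {
  /** Hypothetical stand-in for a serializable function interface. */
  public interface SerFn<I, O> extends Serializable {
    O apply(I input);
  }

  /** Hypothetical stand-in for an options object; deliberately not Serializable. */
  public static class Options {
    private final String dataset;

    public Options(String dataset) {
      this.dataset = dataset;
    }

    public String getDataset() {
      return dataset;
    }
  }

  /** Returns true if Java serialization can write the object. */
  public static boolean canSerialize(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  /** Problematic shape: the lambda captures the whole options object. */
  public static SerFn<String, String> capturingOptions(Options options) {
    return table -> options.getDataset() + "." + table;
  }

  /** Fixed shape: extract the needed field first, capture only the String. */
  public static SerFn<String, String> capturingValue(Options options) {
    String dataset = options.getDataset();
    return table -> dataset + "." + table;
  }

  public static void main(String[] args) {
    Options options = new Options("nexmark");
    System.out.println("captures options: " + canSerialize(capturingOptions(options)));
    System.out.println("captures value:   " + canSerialize(capturingValue(options)));
  }
}
```

Both lambdas compute the same result; only the captured state differs, which is what decides whether serialization succeeds.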
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109418 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 15:04 Start Date: 06/Jun/18 15:04 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-395100488 @chamikaramj I applied your comments, thanks for the feedback. The stacktrace for the serialization issue is in the gradle build scan. I copy it below
```
java.lang.IllegalArgumentException: unable to serialize DoFnAndMainOutput{doFn=org.apache.beam.sdk.io.gcp.bigquery.PrepareWrite$1@3819e3a6, mainOutputTag=Tag}
Caused by: java.io.NotSerializableException: PipelineOptions objects are not serializable and should not be embedded into transforms (did you capture a PipelineOptions object in a field or in an anonymous class?). Instead, if you're using a DoFn, access PipelineOptions at runtime via ProcessContext/StartBundleContext/FinishBundleContext.getPipelineOptions(), or pre-extract necessary fields from PipelineOptions at pipeline construction time.
    at org.apache.beam.sdk.options.ProxyInvocationHandler.writeObject(ProxyInvocationHandler.java:174)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1128)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:53)
    at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.construction.ParDoTranslation.translateDoFn(ParDoTranslation.java:462)
    at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.construction.ParDoTranslation$1.translateDoFn(ParDoTranslation.java:160)
    at
```
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109414 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 15:00 Start Date: 06/Jun/18 15:00 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-395100488 @chamikaramj I applied your comments, thanks for the feedback. The stacktrace for the serialization issue is in gradle build scan. I copy it bellow > java.lang.IllegalArgumentException: unable to serialize DoFnAndMainOutput{doFn=org.apache.beam.sdk.io.gcp.bigquery.PrepareWrite$1@3819e3a6, mainOutputTag=Tag}Open stacktrace Caused by: java.io.NotSerializableException: PipelineOptions objects are not serializable and should not be embedded into transforms (did you capture a PipelineOptions object in a field or in an anonymous class?). 
Instead, if you're using a DoFn, access PipelineOptions at runtime via ProcessContext/StartBundleContext/FinishBundleContext.getPipelineOptions(), or pre-extract necessary fields from PipelineOptions at pipeline construction time.Close stacktrace at org.apache.beam.sdk.options.ProxyInvocationHandler.writeObject(ProxyInvocationHandler.java:174) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1128) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) at org.apache.beam.sdk.util.SerializableUtils.serializeToByteArray(SerializableUtils.java:53) at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.construction.ParDoTranslation.translateDoFn(ParDoTranslation.java:462) at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.construction.ParDoTranslation$1.translateDoFn(ParDoTranslation.java:160) at
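The advice in the exception message above (pre-extract the necessary fields at pipeline construction time instead of capturing the options object) can be illustrated without Beam. The following is a minimal sketch in plain Java, not the fix applied in the PR: `FakeOptions`, `SerializableFn`, `capturingOptions`, `capturingField` and `isSerializable` are hypothetical names standing in for `PipelineOptions`, a serializable function such as a `DoFn`, and the serialization check a runner performs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class OptionsCaptureSketch {

  /** Hypothetical stand-in for PipelineOptions: deliberately NOT Serializable. */
  static class FakeOptions {
    final String tableSpec;

    FakeOptions(String tableSpec) {
      this.tableSpec = tableSpec;
    }
  }

  /** Hypothetical stand-in for a serializable user function such as a DoFn. */
  interface SerializableFn extends Serializable {
    String apply(String input);
  }

  /** Captures the whole options object, so Java serialization of the function fails. */
  static SerializableFn capturingOptions(FakeOptions options) {
    return input -> options.tableSpec + ":" + input;
  }

  /** Pre-extracts the needed field at construction time, so only a String is captured. */
  static SerializableFn capturingField(FakeOptions options) {
    String tableSpec = options.tableSpec;
    return input -> tableSpec + ":" + input;
  }

  /** Returns true if the object survives Java serialization. */
  static boolean isSerializable(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    FakeOptions options = new FakeOptions("project:dataset.table");
    System.out.println(isSerializable(capturingOptions(options)));
    System.out.println(isSerializable(capturingField(options)));
  }
}
```

The second variant is the "pre-extract" option named in the message; the other option it names, reading the options at runtime through the process context, needs the Beam SDK and is not shown here.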
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109325=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109325 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 10:15 Start Date: 06/Jun/18 10:15 Worklog Time Spent: 10m Work Description: asfgit commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-395018816 FAILURE --none-- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109325) Time Spent: 6h 10m (was: 6h) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109307=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109307 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 08:38 Start Date: 06/Jun/18 08:38 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193334164 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -74,22 +96,89 @@ void runAll(OptionT options, NexmarkLauncher nexmarkLauncher) throws IOException appendPerf(options.getPerfFilename(), configuration, perf); actual.put(configuration, perf); // Summarize what we've run so far. - saveSummary(null, configurations, actual, baseline, start); + saveSummary(null, configurations, actual, baseline, start, options); } } + if (options.getExportSummaryToBigQuery()){ +savePerfsToBigQuery(options, actual, null); + } } finally { if (options.getMonitorJobs()) { // Report overall performance. -saveSummary(options.getSummaryFilename(), configurations, actual, baseline, start); +saveSummary(options.getSummaryFilename(), configurations, actual, baseline, start, options); saveJavascript(options.getJavascriptFilename(), configurations, actual, baseline, start); } } - if (!successful) { throw new RuntimeException("Execution was not successful"); } } + @VisibleForTesting + static void savePerfsToBigQuery( + NexmarkOptions options, + Map perfs, + @Nullable FakeBigQueryServices fakeBigQueryServices) { +Pipeline pipeline = Pipeline.create(options); Review comment: An extra point is that the volume of data to insert into BQ is at most 12 (queries) x 3 fields for each Nexmark run, so I don't think there is any performance/speed concern here. This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109307) Time Spent: 5h 50m (was: 5h 40m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109308=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109308 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 08:38 Start Date: 06/Jun/18 08:38 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193334164 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -74,22 +96,89 @@ void runAll(OptionT options, NexmarkLauncher nexmarkLauncher) throws IOException appendPerf(options.getPerfFilename(), configuration, perf); actual.put(configuration, perf); // Summarize what we've run so far. - saveSummary(null, configurations, actual, baseline, start); + saveSummary(null, configurations, actual, baseline, start, options); } } + if (options.getExportSummaryToBigQuery()){ +savePerfsToBigQuery(options, actual, null); + } } finally { if (options.getMonitorJobs()) { // Report overall performance. -saveSummary(options.getSummaryFilename(), configurations, actual, baseline, start); +saveSummary(options.getSummaryFilename(), configurations, actual, baseline, start, options); saveJavascript(options.getJavascriptFilename(), configurations, actual, baseline, start); } } - if (!successful) { throw new RuntimeException("Execution was not successful"); } } + @VisibleForTesting + static void savePerfsToBigQuery( + NexmarkOptions options, + Map perfs, + @Nullable FakeBigQueryServices fakeBigQueryServices) { +Pipeline pipeline = Pipeline.create(options); Review comment: Also an extra point is that the data volume to insert into BQ is at maximum 12 (queries) x 3 fields for each nexmark run; thus I don't think there is any performance/speed concern here. This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109308) Time Spent: 6h (was: 5h 50m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109306=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109306 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 08:35 Start Date: 06/Jun/18 08:35 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r19303 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1452,7 +1452,10 @@ static String getExtractDestinationUri(String extractDestinationDir) { } @VisibleForTesting -Write withTestServices(BigQueryServices testServices) { +/** + * This method is for test usage only + */ +public Write withTestServices(BigQueryServices testServices) { Review comment: Yes, the injection is for testing only. I agree that the impl should not be public, and that a public FakeBigQueryServices, BigQueryServices interface and withTestServices will be useful for others who have the same needs as this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109306) Time Spent: 5h 40m (was: 5.5h) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. 
To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109304 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 08:31 Start Date: 06/Jun/18 08:31 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193332252 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java ## @@ -47,6 +47,10 @@ @JsonProperty public NexmarkUtils.SinkType sinkType = NexmarkUtils.SinkType.DEVNULL; + /** Shall we export the summary to BigQuery. */ Review comment: +1 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109304) Time Spent: 5.5h (was: 5h 20m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109270=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109270 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 01:21 Start Date: 06/Jun/18 01:21 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193266362 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeJobService.java ## @@ -78,7 +78,7 @@ /** * A fake implementation of BigQuery's job service. */ -class FakeJobService implements JobService, Serializable { +public class FakeJobService implements JobService, Serializable { Review comment: @Experimental(Experimental.Kind.SOURCE_SINK) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109270) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109269 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 01:21 Start Date: 06/Jun/18 01:21 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193266581 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java ## @@ -47,6 +47,10 @@ @JsonProperty public NexmarkUtils.SinkType sinkType = NexmarkUtils.SinkType.DEVNULL; + /** Shall we export the summary to BigQuery. */ Review comment: Please rewrite this comment (sounds like a TODO currently). What you wrote above is fine. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109269) Time Spent: 5h 10m (was: 5h) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109268=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109268 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 01:21 Start Date: 06/Jun/18 01:21 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193265905 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1452,7 +1452,10 @@ static String getExtractDestinationUri(String extractDestinationDir) { } @VisibleForTesting -Write withTestServices(BigQueryServices testServices) { +/** + * This method is for test usage only + */ +public Write withTestServices(BigQueryServices testServices) { Review comment: Yes. I assume "inject BigQueryServices to savePerfsToBigQuery()" is for testing. BigQueryServicesImpl should not be public. I don't think there's a point in making 'withTestServices' available for testing without making 'BigQueryServices' interface available. I think 'withTestServices' in combination with FakeBigQueryServices will be useful for other Beam components that need to test pipelines that write to BigQuery as well. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109268) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
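The test seam discussed in the review comment above (a public service interface, a fake implementation, and a `withTestServices` override on the write transform) can be sketched in plain Java. This is only an illustration of the pattern under stated assumptions, not Beam's code: `RowSink`, `FakeRowSink` and `Writer` are hypothetical names, and only the method name `withTestServices` mirrors the diff.

```java
import java.util.ArrayList;
import java.util.List;

public class TestServiceInjectionSketch {

  /** Hypothetical service interface; plays the role BigQueryServices plays in the diff. */
  interface RowSink {
    void insert(String table, String row);
  }

  /** Hypothetical in-memory fake; plays the role of FakeBigQueryServices. */
  static class FakeRowSink implements RowSink {
    final List<String> inserted = new ArrayList<>();

    @Override
    public void insert(String table, String row) {
      inserted.add(table + "|" + row);
    }
  }

  /** Hypothetical write transform with a default sink and a test-only override. */
  static class Writer {
    private final String table;
    private final RowSink sink;

    private Writer(String table, RowSink sink) {
      this.table = table;
      this.sink = sink;
    }

    static Writer to(String table) {
      // The default sink stands in for the real remote service; this sketch never reaches it.
      return new Writer(table, (t, r) -> {
        throw new UnsupportedOperationException("would call the remote service");
      });
    }

    /** For test usage only: returns a copy that writes to the supplied service. */
    Writer withTestServices(RowSink testSink) {
      return new Writer(table, testSink);
    }

    void write(String row) {
      sink.insert(table, row);
    }
  }

  public static void main(String[] args) {
    FakeRowSink fake = new FakeRowSink();
    Writer.to("nexmark_perf").withTestServices(fake).write("runtimeSec=1.5");
    System.out.println(fake.inserted);
  }
}
```

Keeping the interface and the fake public while the real implementation stays non-public is the split the reviewer asks for: callers in other modules can test against the fake without depending on the production service class.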
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109271=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109271 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 01:21 Start Date: 06/Jun/18 01:21 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193266493 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -300,4 +390,6 @@ public static void main(String[] args) throws IOException { NexmarkLauncher nexmarkLauncher = new NexmarkLauncher<>(options); new Main<>().runAll(options, nexmarkLauncher); } + + Review comment: Remove extra newlines. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109271) Time Spent: 5h 20m (was: 5h 10m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109267=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109267 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 06/Jun/18 01:21 Start Date: 06/Jun/18 01:21 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193266011 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeDatasetService.java ## @@ -46,7 +46,7 @@ import org.apache.beam.sdk.values.ValueInSingleWindow; /** A fake dataset service that can be serialized, for use in testReadFromTable. */ -class FakeDatasetService implements DatasetService, Serializable { +public class FakeDatasetService implements DatasetService, Serializable { Review comment: @Experimental(Experimental.Kind.SOURCE_SINK) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109267) Time Spent: 5h (was: 4h 50m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109015 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 10:04 Start Date: 05/Jun/18 10:04 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-394654301 @chamikaramj thanks for the review. Can you please answer my questions above? In the last commit I assumed that you meant what I wrote above. PTAL Also can you please take a look at the serialization issue and give me your opinion? Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109015) Time Spent: 4h 50m (was: 4h 40m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109010 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 09:58 Start Date: 05/Jun/18 09:58 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193001386 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1452,7 +1452,10 @@ static String getExtractDestinationUri(String extractDestinationDir) { } @VisibleForTesting -Write withTestServices(BigQueryServices testServices) { +/** + * This method is for test usage only + */ +public Write withTestServices(BigQueryServices testServices) { Review comment: You mean that I should make `BigQueryServices` public in addition to `FakeBigQueryServices`, annotate it experimental, inject `BigQueryServices` to `savePerfsToBigQuery()` and add bigQuery test artifacts only to the deps of nexmark test artifact to avoid deps problem that I was raising above. Is that right? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109010) Time Spent: 4h 40m (was: 4.5h) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. 
To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109008 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 09:52 Start Date: 05/Jun/18 09:52 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193012811 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -74,22 +96,89 @@ void runAll(OptionT options, NexmarkLauncher nexmarkLauncher) throws IOException appendPerf(options.getPerfFilename(), configuration, perf); actual.put(configuration, perf); // Summarize what we've run so far. - saveSummary(null, configurations, actual, baseline, start); + saveSummary(null, configurations, actual, baseline, start, options); } } + if (options.getExportSummaryToBigQuery()){ +savePerfsToBigQuery(options, actual, null); + } } finally { if (options.getMonitorJobs()) { // Report overall performance. -saveSummary(options.getSummaryFilename(), configurations, actual, baseline, start); +saveSummary(options.getSummaryFilename(), configurations, actual, baseline, start, options); saveJavascript(options.getJavascriptFilename(), configurations, actual, baseline, start); } } - if (!successful) { throw new RuntimeException("Execution was not successful"); } } + @VisibleForTesting + static void savePerfsToBigQuery( + NexmarkOptions options, + Map perfs, + @Nullable FakeBigQueryServices fakeBigQueryServices) { +Pipeline pipeline = Pipeline.create(options); Review comment: Yes, it is technically feasible to create a new PipelineOptions with runner == DirectRunner and use it in the second pipeline. What I'm not convinced about is that it would require shipping the direct runner libs in Nexmark even when we are running the queries on another runner. 
It might be problematic to have them in the fat jar deployed on a Spark cluster, for example. Currently, we only ship the direct runner libs on the classpath when we run JVM-local tests on the direct runner (profile). Please note that the other BigQuery addition to Nexmark (sinking the output PCollection to BigQuery) currently runs with the same runner as the queries, in the same pipeline. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 109008) Time Spent: 4.5h (was: 4h 20m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108999=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108999 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 09:28 Start Date: 05/Jun/18 09:28 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193001386 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1452,7 +1452,10 @@ static String getExtractDestinationUri(String extractDestinationDir) { } @VisibleForTesting -Write withTestServices(BigQueryServices testServices) { +/** + * This method is for test usage only + */ +public Write withTestServices(BigQueryServices testServices) { Review comment: You mean that I should make BigQueryServices public in addition to FakeBigQueryServices, annotate it experimental and add bigquery test artifacts only to the deps of nexmark test artifact to avoid deps problem that I was raising above. Is that right? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108999) Time Spent: 4h 20m (was: 4h 10m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. 
To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108996=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108996 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 09:23 Start Date: 05/Jun/18 09:23 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193001386 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1452,7 +1452,10 @@ static String getExtractDestinationUri(String extractDestinationDir) { } @VisibleForTesting -Write withTestServices(BigQueryServices testServices) { +/** + * This method is for test usage only + */ +public Write withTestServices(BigQueryServices testServices) { Review comment: You mean that I should make BigQueryServices public in addition to FakeBigQueryServices, annotate it experimental and add bigquery test artifacts only to the deps of nexmark test artifact. Is that right? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108996) Time Spent: 4h 10m (was: 4h) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. 
To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108985 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 09:14 Start Date: 05/Jun/18 09:14 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r193001386 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1452,7 +1452,10 @@ static String getExtractDestinationUri(String extractDestinationDir) { } @VisibleForTesting -Write withTestServices(BigQueryServices testServices) { +/** + * This method is for test usage only + */ +public Write withTestServices(BigQueryServices testServices) { Review comment: You mean that I should make BigQueryServices public in addition to FakeBigQueryServices, annotate it experimental and add bigquery test artifacts only to the deps of nexmark test artifact? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108985) Time Spent: 4h (was: 3h 50m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108980 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 09:05 Start Date: 05/Jun/18 09:05 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192998650 ## File path: sdks/java/nexmark/build.gradle ## @@ -42,6 +42,7 @@ configurations { } dependencies { + compile 'com.google.cloud:google-cloud-bigquery:1.28.0' Review comment: +1, duplicate comment, see above. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108980) Time Spent: 3h 50m (was: 3h 40m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108979 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 09:03 Start Date: 05/Jun/18 09:03 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192997887 ## File path: sdks/java/nexmark/build.gradle ## @@ -42,6 +42,7 @@ configurations { } dependencies { + compile 'com.google.cloud:google-cloud-bigquery:1.28.0' Review comment: Indeed, I thought I had removed it, but I may have missed it during the rebase. Thanks for pointing it out. Issue Time Tracking --- Worklog Id: (was: 108979) Time Spent: 3h 40m (was: 3.5h) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108901=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108901 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 00:58 Start Date: 05/Jun/18 00:58 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192918564 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/FakeBigQueryServices.java ## @@ -30,16 +30,16 @@ /** * A fake implementation of BigQuery's query service.. */ -class FakeBigQueryServices implements BigQueryServices { +public class FakeBigQueryServices implements BigQueryServices { Review comment: Please add "@Experimental(Experimental.Kind.SOURCE_SINK)". This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108901) Time Spent: 3h 20m (was: 3h 10m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108902 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 00:58 Start Date: 05/Jun/18 00:58 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192918714 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1452,7 +1452,10 @@ static String getExtractDestinationUri(String extractDestinationDir) { } @VisibleForTesting -Write withTestServices(BigQueryServices testServices) { +/** + * This method is for test usage only + */ +public Write withTestServices(BigQueryServices testServices) { Review comment: I think it's fine to make BigQueryServices public but add "@Experimental(Experimental.Kind.SOURCE_SINK)" so that we can remove it in the future if needed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108902) Time Spent: 3h 20m (was: 3h 10m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108900=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108900 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 00:58 Start Date: 05/Jun/18 00:58 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192920253 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -74,22 +96,89 @@ void runAll(OptionT options, NexmarkLauncher nexmarkLauncher) throws IOException appendPerf(options.getPerfFilename(), configuration, perf); actual.put(configuration, perf); // Summarize what we've run so far. - saveSummary(null, configurations, actual, baseline, start); + saveSummary(null, configurations, actual, baseline, start, options); } } + if (options.getExportSummaryToBigQuery()){ +savePerfsToBigQuery(options, actual, null); + } } finally { if (options.getMonitorJobs()) { // Report overall performance. -saveSummary(options.getSummaryFilename(), configurations, actual, baseline, start); +saveSummary(options.getSummaryFilename(), configurations, actual, baseline, start, options); saveJavascript(options.getJavascriptFilename(), configurations, actual, baseline, start); } } - if (!successful) { throw new RuntimeException("Execution was not successful"); } } + @VisibleForTesting + static void savePerfsToBigQuery( + NexmarkOptions options, + Map perfs, + @Nullable FakeBigQueryServices fakeBigQueryServices) { +Pipeline pipeline = Pipeline.create(options); Review comment: I'm still not sure why we can't create a new PipelineOptions object here (PipelineOptions newOptions = PipelineOptionsFactory.create()) and use DirectRunner. That will be much faster (and cleaner) than using the original options with original runner. 
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108900) Time Spent: 3h 20m (was: 3h 10m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108904 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 00:58 Start Date: 05/Jun/18 00:58 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192915615 ## File path: sdks/java/nexmark/build.gradle ## @@ -42,6 +42,7 @@ configurations { } dependencies { + compile 'com.google.cloud:google-cloud-bigquery:1.28.0' Review comment: We don't need this anymore, right? (Also, this dependency will result in a conflict until we upgrade gRPC, protobuf, and Spanner. See https://issues.apache.org/jira/browse/BEAM-4229.) Issue Time Tracking --- Worklog Id: (was: 108904) Time Spent: 3.5h (was: 3h 20m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108903=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108903 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 05/Jun/18 00:58 Start Date: 05/Jun/18 00:58 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192916538 ## File path: sdks/java/nexmark/build.gradle ## @@ -42,6 +42,7 @@ configurations { } dependencies { + compile 'com.google.cloud:google-cloud-bigquery:1.28.0' Review comment: We don't need this anymore, right ? Also, this can result in dependency conflicts, see https://issues.apache.org/jira/browse/BEAM-4229. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108903) Time Spent: 3.5h (was: 3h 20m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108610=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108610 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 04/Jun/18 14:35 Start Date: 04/Jun/18 14:35 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-394377009 @chamikaramj I implemented the solution you suggested, with a second pipeline that uses BigQueryIO. PTAL. Some comments: 1. I had to make FakeBigQuery* public. But what troubles me is that, as you did not want to make BigQueryServices public, I had to inject FakeBigQueryServices to use with BQIO.withTestServices(). So that makes a test artifact a dependency of the Nexmark production code. 2. I still have a serialization issue. I guess it is because of NexmarkPerf serialization (see the custom coder in Main). => I would prefer that we discuss the global design (point 1 and related) before I try to fix this second point. Thanks again Issue Time Tracking --- Worklog Id: (was: 108610) Time Spent: 3h 10m (was: 3h) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times.
To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
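Point 1 of the comment above (a test artifact becoming a dependency of production code) is the kind of coupling that is often avoided by letting production code depend only on an interface, while the fake stays in the test source set. The following is a minimal plain-Java sketch of that idea, not what the PR does: `PerfSink`, `RecordingPerfSink` and `PerfExporter` are hypothetical names that exist in neither Beam nor Nexmark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical abstraction: the only type the production code needs to know. */
interface PerfSink {
  void write(String query, double runtimeSec);
}

/** Hypothetical test double; in a real build it would live in the test source set. */
final class RecordingPerfSink implements PerfSink {
  private final List<String> rows = new ArrayList<>();

  @Override
  public void write(String query, double runtimeSec) {
    rows.add(query + "=" + runtimeSec);
  }

  /** Returns a read-only view of the rows recorded so far. */
  List<String> rows() {
    return Collections.unmodifiableList(rows);
  }
}

/** Hypothetical production code: it receives a sink and never references the fake. */
final class PerfExporter {
  private final PerfSink sink;

  PerfExporter(PerfSink sink) {
    this.sink = sink;
  }

  /** Writes one row per entry, in the iteration order of the map. */
  void export(Map<String, Double> runtimesByQuery) {
    for (Map.Entry<String, Double> entry : runtimesByQuery.entrySet()) {
      sink.write(entry.getKey(), entry.getValue());
    }
  }
}
```

With this shape, a test would construct `new PerfExporter(new RecordingPerfSink())`, while the production entry point would pass a real sink; whether that fits the BigQueryIO-based design discussed in this thread is left open.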
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108595 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 04/Jun/18 14:26 Start Date: 04/Jun/18 14:26 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192759136 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -146,13 +164,77 @@ private void appendPerf( private static final String LINE = "=="; - /** - * Print summary of {@code actual} vs (if non-null) {@code baseline}. - */ + /** Send {@code nexmarkPerf} to BigQuery. */ + @VisibleForTesting + static void writeQueryPerftoBigQuery( Review comment: In this PR we output the perf of a Nexmark run on a set of queries using a particular runner. We will end up having BQ tables per query x runner (pipelineOptions.getRunner()). So the pipeline that writes must be run with the same runner as the pipeline that tests the queries; otherwise we will output perf data as if it were gathered using DirectRunner each time. Issue Time Tracking --- Worklog Id: (was: 108595) Time Spent: 3h (was: 2h 50m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times.
To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
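The requirement stated above, one set of results per query and runner (and, per the issue description, per mode), could be met by deriving the destination table name from those values. A minimal sketch, assuming a purely illustrative naming scheme: the class `PerfTableNames`, the `base_query_runner_mode` pattern and the sanitizing rule are hypothetical and not taken from the PR.

```java
import java.util.Locale;

/** Hypothetical helper that builds one table name per query, runner and mode. */
final class PerfTableNames {
  private PerfTableNames() {}

  /** Replaces every character outside [A-Za-z0-9_] with an underscore. */
  static String sanitize(String part) {
    return part.replaceAll("[^A-Za-z0-9_]", "_");
  }

  /** Joins the sanitized parts with underscores; the mode is lower-cased first. */
  static String tableName(String base, String query, String runner, String mode) {
    return String.join(
        "_",
        sanitize(base),
        sanitize(query),
        sanitize(runner),
        sanitize(mode.toLowerCase(Locale.ROOT)));
  }
}
```

Under this scheme the runner name would come from the options of the run being measured, not from the runner that happens to execute the write, which is the distinction the comment above is drawing.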
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108161=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108161 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 01/Jun/18 18:31 Start Date: 01/Jun/18 18:31 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192479918 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -146,13 +164,77 @@ private void appendPerf( private static final String LINE = "=="; - /** - * Print summary of {@code actual} vs (if non-null) {@code baseline}. - */ + /** Send {@code nexmarkPerf} to BigQuery. */ + @VisibleForTesting + static void writeQueryPerftoBigQuery( Review comment: Using any runner (for the second pipeline) is fine. But can you clarify why you cannot use DirectRunner here? Using DirectRunner will be the easiest option if you want to push a small amount of data to BigQuery. You'll have to create a new PipelineOptions object for this (or update the "runner" property of the current PipelineOptions object). Issue Time Tracking --- Worklog Id: (was: 108161) Time Spent: 2h 50m (was: 2h 40m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times.
To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108159=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108159 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 01/Jun/18 18:30 Start Date: 01/Jun/18 18:30 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192479918 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -146,13 +164,77 @@ private void appendPerf( private static final String LINE = "=="; - /** - * Print summary of {@code actual} vs (if non-null) {@code baseline}. - */ + /** Send {@code nexmarkPerf} to BigQuery. */ + @VisibleForTesting + static void writeQueryPerftoBigQuery( Review comment: Using any runner (for the second pipeline) is fine. But can you clarify why you cannot use DirectRunner here ? Using DirectRunner will be the easiest option if you want to push a small amount of data to BigQuery. You'll have to create a new PipelineOptions object for this. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108159) Time Spent: 2h 40m (was: 2.5h) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. 
To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108087=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108087 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 01/Jun/18 14:14 Start Date: 01/Jun/18 14:14 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192408722 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -146,13 +164,77 @@ private void appendPerf( private static final String LINE = "=="; - /** - * Print summary of {@code actual} vs (if non-null) {@code baseline}. - */ + /** Send {@code nexmarkPerf} to BigQuery. */ + @VisibleForTesting + static void writeQueryPerftoBigQuery( Review comment: True, I'm using the BigQueryServices class. If you don't want it exposed, I can indeed use BigQueryIO in a second pipeline. I can create this pipeline once the run pipeline is finished, run it over a PCollection> and write this PCollection to BQ using BigQueryIO. The only thing is that the second pipeline will not run using DirectRunner, but using the runner configured in the pipelineOptions (the runner that we are currently testing with Nexmark). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108087) Time Spent: 2.5h (was: 2h 20m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=108086=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108086 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 01/Jun/18 14:12 Start Date: 01/Jun/18 14:12 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192408722 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -146,13 +164,77 @@ private void appendPerf( private static final String LINE = "=="; - /** - * Print summary of {@code actual} vs (if non-null) {@code baseline}. - */ + /** Send {@code nexmarkPerf} to BigQuery. */ + @VisibleForTesting + static void writeQueryPerftoBigQuery( Review comment: True, I'm using BigQueryService class. If you don't want it exposed, I can use BigQueryIO in a second pipeline indeed. I can create this pipeline once the run pipeline is finished and run it over of PCollection> and write this Pcollection to BQ using BigQueryIO. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 108086) Time Spent: 2h 20m (was: 2h 10m) > Export nexmark execution times to bigQuery > -- > > Key: BEAM-4283 > URL: https://issues.apache.org/jira/browse/BEAM-4283 > Project: Beam > Issue Type: Sub-task > Components: examples-nexmark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Nexmark only outputs the results collection to bigQuery and prints in the > console the execution times. 
To supervise Nexmark execution times, we need to > store them as well per runner/query/mode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=107695=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107695 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 31/May/18 15:10 Start Date: 31/May/18 15:10 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192133881 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -146,13 +164,77 @@ private void appendPerf( private static final String LINE = "=="; - /** - * Print summary of {@code actual} vs (if non-null) {@code baseline}. - */ + /** Send {@code nexmarkPerf} to BigQuery. */ + @VisibleForTesting + static void writeQueryPerftoBigQuery( Review comment: Using FakeBigQueryService for testing is fine, but it seems that here you are using the BigQueryService class (which is supposed to be an implementation detail of the Beam BigQuery connector) to publish results to real BigQuery, no? (Apologies if I misunderstood.) I was suggesting using a second Beam pipeline that uses DirectRunner (not the original pipeline) to publish results to BigQuery instead. I don't think we should expand the public interface of the BigQuery connector for this task. Issue Time Tracking --- Worklog Id: (was: 107695) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=107627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107627 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 31/May/18 09:51 Start Date: 31/May/18 09:51 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-393478182 @chamikaramj Thanks for your review, I answered all your comments. PTAL Issue Time Tracking --- Worklog Id: (was: 107627) Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=107626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107626 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 31/May/18 09:50 Start Date: 31/May/18 09:50 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192045764 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java ## @@ -420,6 +428,9 @@ public String toShortString() { if (sinkType != DEFAULT.sinkType) { sb.append(String.format("; sinkType:%s", sinkType)); } +if (exportSummaryToBigQuery != DEFAULT.exportSummaryToBigQuery) { Review comment: As with all the other configuration items, when we print the configuration we print a value only if it was set by the user (i.e. it differs from the default value). Issue Time Tracking --- Worklog Id: (was: 107626) Time Spent: 1h 50m (was: 1h 40m)
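The convention described in the review comment above (print a configuration item only when it differs from its default) can be sketched in plain Java. This is a minimal illustration, not the real `NexmarkConfiguration`: the class name `ConfigSketch`, the `"config"` prefix and the use of a `String` for `sinkType` are assumptions made so the example is self-contained.

```java
import java.util.Objects;

/** Hypothetical stand-in for a configuration class; not the real NexmarkConfiguration. */
class ConfigSketch {
  /** Instance holding the default value of every field. */
  static final ConfigSketch DEFAULT = new ConfigSketch();

  String sinkType = "DEVNULL";
  boolean exportSummaryToBigQuery = false;

  /** Returns a short description that lists only the fields differing from DEFAULT. */
  String toShortString() {
    StringBuilder sb = new StringBuilder("config");
    if (!Objects.equals(sinkType, DEFAULT.sinkType)) {
      sb.append(String.format("; sinkType:%s", sinkType));
    }
    if (exportSummaryToBigQuery != DEFAULT.exportSummaryToBigQuery) {
      sb.append(String.format("; exportSummaryToBigQuery:%s", exportSummaryToBigQuery));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    ConfigSketch c = new ConfigSketch();
    System.out.println(c.toShortString()); // nothing was changed, so only the prefix is printed
    c.exportSummaryToBigQuery = true;
    System.out.println(c.toShortString()); // the changed flag now appears
  }
}
```

Comparing against a shared `DEFAULT` instance keeps the printed summary short when a run only overrides one or two options.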
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=107625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107625 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 31/May/18 09:46 Start Date: 31/May/18 09:46 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192044761 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java ## @@ -47,6 +47,10 @@ @JsonProperty public NexmarkUtils.SinkType sinkType = NexmarkUtils.SinkType.DEVNULL; + /** Shall we export the summary to BigQuery. */ Review comment: If false, the summary is only output to the console. If true, the summary is output to the console and its content is also written to BigQuery tables per query x runner x mode. Issue Time Tracking --- Worklog Id: (was: 107625) Time Spent: 1h 40m (was: 1.5h)
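The flag semantics described in the comment above can be illustrated with a small plain-Java sketch. Everything here is hypothetical: `SummaryExportSketch`, its `report` method and the two lists standing in for the console and for BigQuery are assumptions for illustration and are not part of Nexmark.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the exportSummaryToBigQuery flag; not Nexmark code. */
class SummaryExportSketch {
  /** Lines written to the console (captured in a list so the behaviour is easy to check). */
  final List<String> console = new ArrayList<>();
  /** Rows that would be sent to BigQuery; a plain list stands in for the real tables. */
  final List<String> exported = new ArrayList<>();

  /** Always reports to the console; additionally exports when the flag is true. */
  void report(boolean exportSummaryToBigQuery, String query, String runner, String mode,
      double runtimeSec) {
    String line = query + "/" + runner + "/" + mode + " runtimeSec=" + runtimeSec;
    console.add(line);
    if (exportSummaryToBigQuery) {
      exported.add(line);
    }
  }

  public static void main(String[] args) {
    SummaryExportSketch s = new SummaryExportSketch();
    s.report(false, "query5", "direct", "batch", 12.5); // console only
    s.report(true, "query5", "direct", "streaming", 14.0); // console and export
    System.out.println(s.console.size() + " console lines, " + s.exported.size() + " exported");
  }
}
```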
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=107622=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107622 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 31/May/18 09:42 Start Date: 31/May/18 09:42 Worklog Time Spent: 10m Work Description: echauchot commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192043662 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -146,13 +164,77 @@ private void appendPerf( private static final String LINE = "=="; - /** - * Print summary of {@code actual} vs (if non-null) {@code baseline}. - */ + /** Send {@code nexmarkPerf} to BigQuery. */ + @VisibleForTesting + static void writeQueryPerftoBigQuery( Review comment: For your 2 questions: 1. No, here we output to BigQuery only the response time of the pipeline, so the pipeline needs to be finished. 2. What you suggest (a BigQuery client) is what I did at first; I asked you how I could unit test it, and you answered that I should use FakeBigQueryServices. But to use FakeBigQueryServices I needed to refactor everything to use its superclass (BigQueryServices). Issue Time Tracking --- Worklog Id: (was: 107622) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=107609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107609 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 31/May/18 08:40 Start Date: 31/May/18 08:40 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192022213 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java ## @@ -146,13 +164,77 @@ private void appendPerf( private static final String LINE = "=="; - /** - * Print summary of {@code actual} vs (if non-null) {@code baseline}. - */ + /** Send {@code nexmarkPerf} to BigQuery. */ + @VisibleForTesting + static void writeQueryPerftoBigQuery( Review comment: Can we perform this write using a Beam pipeline that uses DirectRunner and the BigQuery sink instead of using the BigQueryService class? I think you should either do that or use a BigQuery client library directly, instead of using the intermediate "BigQueryService" class in Beam, which is not intended to be a public utility. Issue Time Tracking --- Worklog Id: (was: 107609) Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=107611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107611 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 31/May/18 08:40 Start Date: 31/May/18 08:40 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192025438 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java ## @@ -420,6 +428,9 @@ public String toShortString() { if (sinkType != DEFAULT.sinkType) { sb.append(String.format("; sinkType:%s", sinkType)); } +if (exportSummaryToBigQuery != DEFAULT.exportSummaryToBigQuery) { Review comment: Why are we comparing with default values ? Issue Time Tracking --- Worklog Id: (was: 107611) Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=107610=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-107610 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 31/May/18 08:40 Start Date: 31/May/18 08:40 Worklog Time Spent: 10m Work Description: chamikaramj commented on a change in pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#discussion_r192024638 ## File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java ## @@ -47,6 +47,10 @@ @JsonProperty public NexmarkUtils.SinkType sinkType = NexmarkUtils.SinkType.DEVNULL; + /** Shall we export the summary to BigQuery. */ Review comment: How about "If true, exports the summary to BigQuery." ? Issue Time Tracking --- Worklog Id: (was: 107610) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=106361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-106361 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 28/May/18 15:24 Start Date: 28/May/18 15:24 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-392554240 @chamikaramj finally, in the production code, I used `BigQueryServices` included in `BigQueryIO` with fake windows/pane/timestamp instead of the regular big query client. Thus, I could use `FakeBigQueryServices` in the test. PTAL Issue Time Tracking --- Worklog Id: (was: 106361) Time Spent: 50m (was: 40m)
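The testing approach mentioned in the comment above (write through a service interface so that a fake can replace it in tests) can be sketched without any Beam dependency. This is only an illustration of the general pattern: `RowInserter`, `FakeRowInserter` and `writePerf` are hypothetical names, not Beam's `BigQueryServices` or `FakeBigQueryServices`, and the table spec string is made up. The field names follow those used in the PR #5713 diff quoted earlier in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the "inject a fake service" pattern; not Beam's actual API. */
class PerfWriterSketch {
  /** The only capability the writer needs from a BigQuery-like service. */
  interface RowInserter {
    void insert(String tableSpec, Map<String, Object> row);
  }

  /** In-memory fake for tests: records rows instead of calling a remote service. */
  static class FakeRowInserter implements RowInserter {
    final Map<String, List<Map<String, Object>>> tables = new LinkedHashMap<>();

    @Override
    public void insert(String tableSpec, Map<String, Object> row) {
      tables.computeIfAbsent(tableSpec, k -> new ArrayList<>()).add(row);
    }
  }

  /** Builds one row of performance figures and hands it to whichever inserter is supplied. */
  static void writePerf(RowInserter inserter, String tableSpec,
      double runtimeSec, double eventsPerSec, long numResults) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("runtimeSec", runtimeSec);
    row.put("eventsPerSec", eventsPerSec);
    row.put("numResults", numResults);
    inserter.insert(tableSpec, row);
  }

  public static void main(String[] args) {
    FakeRowInserter fake = new FakeRowInserter();
    writePerf(fake, "project:dataset.table", 12.5, 8000.0, 100L); // made-up table spec
    System.out.println(fake.tables);
  }
}
```

Because the production code depends only on the interface, a test can assert on the rows captured by the fake without contacting BigQuery.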
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=105854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105854 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 25/May/18 09:15 Start Date: 25/May/18 09:15 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-391985730 @chamikaramj Yes, as I wrote in the code comment, I took a look at FakeBigQueryServices and BigQueryServicesImpl; they are PCollection-oriented and, among other things, deal with windowing. My use case is a simple insert of values into BQ outside a PCollection. I was looking for a simpler way of testing. If it is the only way to test, I could create fake windows/panes, but this seems overkill for testing a simple insert done with the regular BigQuery client. Issue Time Tracking --- Worklog Id: (was: 105854) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=105852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105852 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 25/May/18 08:55 Start Date: 25/May/18 08:55 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-391985730 @chamikaramj Yes, as I wrote in the code comment, I took a look at FakeBigQueryServices and BigQueryServicesImpl; they are PCollection-oriented and, among other things, deal with windowing. My use case is a simple insert of values into BQ outside a PCollection. I was looking for a simpler way of testing. I did the insert using the regular BigQuery client. Issue Time Tracking --- Worklog Id: (was: 105852) Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=105850=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105850 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 25/May/18 08:47 Start Date: 25/May/18 08:47 Worklog Time Spent: 10m Work Description: echauchot commented on issue #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464#issuecomment-391985730 @chamikaramj Yes, as I wrote in the code comment, I took a look at FakeBigQueryServices and it is PCollection-oriented. My use case is a simple insert of values into BQ outside a PCollection. I was looking for a simpler way of testing. Issue Time Tracking --- Worklog Id: (was: 105850) Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-4283) Export nexmark execution times to bigQuery
[ https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=105551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105551 ] ASF GitHub Bot logged work on BEAM-4283: Author: ASF GitHub Bot Created on: 24/May/18 12:05 Start Date: 24/May/18 12:05 Worklog Time Spent: 10m Work Description: echauchot opened a new pull request #5464: [BEAM-4283] Write Nexmark execution times to bigquery URL: https://github.com/apache/beam/pull/5464 This writes Nexmark execution times to BigQuery so that Nexmark output can be integrated into perfkit dashboards. @jkff I did not write a proper test; can you please advise me on a way to properly unit test a simple insert (no PCollection) into BigQuery? I left a TODO and a comment in the test. Follow this checklist to help us incorporate your contribution quickly and easily: - [X] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [X] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Issue Time Tracking --- Worklog Id: (was: 105551) Time Spent: 10m Remaining Estimate: 0h