Re: [PR] Allow setting BigQuery endpoint [beam]
lukas-mi commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2379092944

I have the same question about how to actually use this feature. I would assume the interface would be something like `BigQueryIO.write.withEndpoint(emulatorEndpoint)`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
Re: [PR] Allow setting BigQuery endpoint [beam]
lukas-mi commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2379149978

I've tried the following:

```
PipelineOptionsFactory.register(BigQueryOptions.class);
BigQueryOptions options = PipelineOptionsFactory.create().as(BigQueryOptions.class);
options.setBigQueryProject(bqEmulator.getEmulatorHttpEndpoint());
options.setBigQueryProject(bqEmulator.getProjectId());

var pipeline = Pipeline.create(options);

var sinkTable = "%s.%s.%s".formatted(bqEmulator.getProjectId(), "test_dataset", "test_table");

var schema = new TableSchema()
    .setFields(List.of(new TableFieldSchema()
        .setName("testColumn")
        .setType("STRING")
        .setMode("REQUIRED")));

var row = new TableRow();
row.set("testColumn", "testValue");
var rows = Create.of(List.of(row));

pipeline
    .apply(rows)
    .apply(BigQueryIO
        .<TableRow>write()
        .to(sinkTable)
        .withFormatFunction(new SerializableFunction<TableRow, TableRow>() {
            @Override
            public TableRow apply(TableRow input) {
                return input;
            }
        })
        .withSchema(schema));

pipeline.run().waitUntilFinish();
```

I still get the following error:

```
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
GET https://bigquery.googleapis.com/bigquery/v2/projects/test-project/datasets/test_dataset?prettyPrint=false
{
  "code": 400,
  "errors": [
    {
      "domain": "global",
      "message": "The project test-project has not enabled BigQuery.",
      "reason": "invalid"
    }
  ],
  "message": "The project test-project has not enabled BigQuery.",
  "status": "INVALID_ARGUMENT"
}
```
Re: [PR] Allow setting BigQuery endpoint [beam]
kberezin-nshl commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2379649770

Hey @wattache @lukas-mi,

Unfortunately, there is currently a bug that causes the behavior you're seeing. [The fix for that](https://github.com/apache/beam/pull/32450) has already been merged, so hopefully you'll be able to use this feature with the next release.

In the meantime, you can use the following workaround. First, create this class:

```java
import java.io.IOException;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl;

// Forces the configured endpoint onto the options before each service is created.
public class WorkaroundBQServices implements BigQueryServices {

    private final String bigQueryEndpoint;

    public WorkaroundBQServices(String bigQueryEndpoint) {
        this.bigQueryEndpoint = bigQueryEndpoint;
    }

    @Override
    public JobService getJobService(BigQueryOptions options) {
        options.setBigQueryEndpoint(bigQueryEndpoint);
        return new BigQueryServicesImpl.JobServiceImpl(options);
    }

    @Override
    public DatasetService getDatasetService(BigQueryOptions options) {
        options.setBigQueryEndpoint(bigQueryEndpoint);
        return new BigQueryServicesImpl.DatasetServiceImpl(options);
    }

    @Override
    public WriteStreamService getWriteStreamService(BigQueryOptions options) {
        options.setBigQueryEndpoint(bigQueryEndpoint);
        return new BigQueryServicesImpl.WriteStreamServiceImpl(options);
    }

    @Override
    public StorageClient getStorageClient(BigQueryOptions options) throws IOException {
        options.setBigQueryEndpoint(bigQueryEndpoint);
        return new BigQueryServicesImpl.StorageClientImpl(options);
    }
}
```

Then, in your pipeline, set it as follows:

```java
BigQueryIO.<...>write()
    .withTestServices(new WorkaroundBQServices(options.getBigQueryEndpoint()))
```

As I said, once the new release comes out, you should be able to remove this workaround.
Re: [PR] Allow setting BigQuery endpoint [beam]
lukas-mi commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2379188460

I've updated the previous example:
- previously I called `setBigQueryProject` twice without calling `setBigQueryEndpoint`
- added dataset creation
- added the use of `withCustomGcsTempLocation`, otherwise an exception is thrown

```
PipelineOptionsFactory.register(BigQueryOptions.class);
BigQueryOptions options = PipelineOptionsFactory.create().as(BigQueryOptions.class);
options.setBigQueryEndpoint(bqEmulator.getEmulatorHttpEndpoint());
options.setBigQueryProject(bqEmulator.getProjectId());

var pipeline = Pipeline.create(options);

var sinkTable = "%s.%s.%s".formatted(bqEmulator.getProjectId(), "test_dataset", "test_table");

var datasetId = DatasetId.of(bqEmulator.getProjectId(), "test_dataset");
var datasetInfo = DatasetInfo.newBuilder(datasetId).build();
bqClient.create(datasetInfo);

var schema = new TableSchema()
    .setFields(List.of(new TableFieldSchema()
        .setName("testColumn")
        .setType("STRING")
        .setMode("REQUIRED")));

var row = new TableRow();
row.set("testColumn", "testValue");
var rows = Create.of(List.of(row));

pipeline
    .apply(rows)
    .apply(BigQueryIO
        .<TableRow>write()
        .to(sinkTable)
        .withFormatFunction(new SerializableFunction<TableRow, TableRow>() {
            @Override
            public TableRow apply(TableRow input) {
                return input;
            }
        })
        .withSchema(schema)
        .withCustomGcsTempLocation(new ValueProvider<String>() {
            @Override
            public String get() {
                return "/tmp";
            }

            @Override
            public @UnknownKeyFor @NonNull @Initialized boolean isAccessible() {
                return true;
            }
        })
    );

pipeline.run().waitUntilFinish();
```

Now I'm getting an exception related to serialization:

```
unable to serialize DoFnWithExecutionInformation{doFn=org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3@5b76b891, mainOutputTag=Tag, sideInputMapping={}, schemaInformation=DoFnSchemaInformation{elementConverters=[], fieldAccessDescriptor=*}}
java.lang.IllegalArgumentException: unable to serialize DoFnWithExecutionInformation{doFn=org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3@5b76b891, mainOutputTag=Tag, sideInputMapping={}, schemaInformation=DoFnSchemaInformation{elementConverters=[], fieldAccessDescriptor=*}}
```
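The thread does not establish the cause of the `unable to serialize` error above. With Java serialization in general, one common cause of this kind of failure is an anonymous inner class that holds a reference to a non-serializable enclosing instance; whether that applies to the pipeline above is an assumption, not something the thread confirms. The JDK-only sketch below illustrates only that general mechanism. All names in it (`SerializationCheck`, `PathProvider`, `Outer`, `FixedPath`) are hypothetical and unrelated to Beam's classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationCheck {

    /** Hypothetical stand-in for a serializable value holder. */
    interface PathProvider extends Serializable {
        String get();
    }

    /** Deliberately NOT Serializable, like a typical test class. */
    static final class Outer {
        private final String tmpDir;

        Outer(String tmpDir) {
            this.tmpDir = tmpDir;
        }

        PathProvider anonymous() {
            // Reads a field of Outer, so the anonymous instance keeps a reference to Outer.
            return new PathProvider() {
                @Override
                public String get() {
                    return tmpDir;
                }
            };
        }
    }

    /** Static nested class: carries only its own serializable state. */
    static final class FixedPath implements PathProvider {
        private static final long serialVersionUID = 1L;
        private final String path;

        FixedPath(String path) {
            this.path = path;
        }

        @Override
        public String get() {
            return path;
        }
    }

    /** Returns true if Java serialization accepts the whole object graph. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Outer("/tmp").anonymous())); // false
        System.out.println(canSerialize(new FixedPath("/tmp")));         // true
    }
}
```

If the same mechanism were at play in a pipeline, replacing anonymous classes with static nested (or top-level) classes would be one thing to try.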
Re: [PR] Allow setting BigQuery endpoint [beam]
wattache commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2378580478

Hello,

Thank you for this new feature. May I ask, @kberezin-nshl, how you intend users to use this new capability?

I've tried to use a test container for my pipeline, but I still get the error message `"The project test-project has not enabled BigQuery.",` with the URL being `https://bigquery.googleapis.com/bigquery/v2/projects/test-project/datasets/***/tables/***/insertAll?prettyPrint=false`.

I use TestContainer and TestPipeline:

```java
BigQueryEmulatorContainer container = new BigQueryEmulatorContainer("ghcr.io/goccy/bigquery-emulator:0.4.3");
String projectId = container.getProjectId();

org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions pipelineOptions =
    PipelineOptionsFactory.create().as(org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions.class);
pipelineOptions.setBigQueryEndpoint(container.getEmulatorHttpEndpoint());

BigQueryServicesImpl bqService = new BigQueryServicesImpl();
bqService.getJobService(pipelineOptions);

Session session = buildExpectedSession();
WriteResult writeResult = p.apply(Create.of(session))
    .apply(ParDo.of(new ProtobufSessionToBigQueryTableRow()))
    .apply(BigQueryIO.writeTableRows()
        .withTestServices(bqService)
        .to(row -> {
            return new TableDestination(projectId + ":***.***", null);
        }));
```

Thanks in advance for your help.
Re: [PR] Allow setting BigQuery endpoint [beam]
Abacn merged PR #32153:
URL: https://github.com/apache/beam/pull/32153
Re: [PR] Allow setting BigQuery endpoint [beam]
github-actions[bot] commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2301905700

Reminder, please take a look at this pr: @Abacn @damondouglas
Re: [PR] Allow setting BigQuery endpoint [beam]
kberezin-nshl commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2287992342

Thanks @Abacn, I've addressed your comments.
Re: [PR] Allow setting BigQuery endpoint [beam]
Abacn commented on code in PR #32153:
URL: https://github.com/apache/beam/pull/32153#discussion_r1715381602

## CHANGES.md: ##
@@ -68,6 +68,7 @@
 ## New Features / Improvements

 * X feature added (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
+* Added an ability to set BigQuery endpoint (Java) ([#28149](https://github.com/apache/beam/issues/28149)).

Review Comment:
Shall we note that this is for testing purposes? "Endpoint" can be ambiguous (e.g. a regional/zonal endpoint, like the Dataflow endpoint). Consider "BigQuery endpoint can be overridden via PipelineOptions, this enables BigQuery emulators (Java)".

## sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java: ##
@@ -1615,8 +1642,13 @@ private static BigQueryWriteClient newBigQueryWriteClient(BigQueryOptions option
             .setChannelsPerCpu(2)
             .build();

+    BigQueryWriteSettings.Builder builder = BigQueryWriteSettings.newBuilder();
+    String endpoint = options.getBigQueryEndpoint();

Review Comment:
`String` -> `@Nullable String`

## sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java: ##
@@ -1615,8 +1642,13 @@ private static BigQueryWriteClient newBigQueryWriteClient(BigQueryOptions option
             .setChannelsPerCpu(2)
             .build();

+    BigQueryWriteSettings.Builder builder = BigQueryWriteSettings.newBuilder();
+    String endpoint = options.getBigQueryEndpoint();
+    if (endpoint != null) {

Review Comment:
A safer guard is `Strings.isNullOrEmpty`.
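The last two review suggestions above (treat the endpoint as nullable, and guard with a null-or-empty check rather than `!= null`) can be illustrated with a small stand-alone sketch. It uses only the JDK and is not the code from the PR: `EndpointGuard`, `DEFAULT_ENDPOINT`, and `effectiveEndpoint` are hypothetical names, the default host is taken from the error URL quoted elsewhere in this thread, and `http://localhost:9050` is a placeholder emulator address. The local `isNullOrEmpty` helper follows the same contract as Guava's `Strings.isNullOrEmpty`.

```java
/** Hypothetical helper, not Beam code: decides which endpoint a client would use. */
final class EndpointGuard {

    // Assumed default, for illustration only.
    static final String DEFAULT_ENDPOINT = "https://bigquery.googleapis.com";

    private EndpointGuard() {}

    /** Same contract as Guava's Strings.isNullOrEmpty, reimplemented to stay JDK-only. */
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    /** Returns the override when it is usable, otherwise the default endpoint. */
    static String effectiveEndpoint(String override) {
        return isNullOrEmpty(override) ? DEFAULT_ENDPOINT : override;
    }

    public static void main(String[] args) {
        System.out.println(effectiveEndpoint(null));
        System.out.println(effectiveEndpoint(""));
        System.out.println(effectiveEndpoint("http://localhost:9050"));
    }
}
```

With a plain `!= null` check, an empty string would be passed through as the endpoint; the null-or-empty guard makes both cases fall back to the default.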
Re: [PR] Allow setting BigQuery endpoint [beam]
kberezin-nshl commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2285534783

@Abacn @damondouglas could you please have a look? Basically, I just followed the same pattern as `set/getGcsEndpoint` in `GcsOptions`, and this allows running pipelines against a BQ emulator.
Re: [PR] Allow setting BigQuery endpoint [beam]
kberezin-nshl commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2283417548

assign set of reviewers
Re: [PR] Allow setting BigQuery endpoint [beam]
github-actions[bot] commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2283419749

Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:

R: @Abacn for label java.
R: @damondouglas for label io.

Available commands:
- `stop reviewer notifications` - opt out of the automated review tooling
- `remind me after tests pass` - tag the comment author after tests pass
- `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).
Re: [PR] Allow setting BigQuery endpoint [beam]
github-actions[bot] commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2283342269

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`