Re: [PR] Allow setting BigQuery endpoint [beam]

2024-09-27 Thread via GitHub


lukas-mi commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2379092944

   I have the same question about how to actually use this feature. I would 
assume the interface would be something like 
`BigQueryIO.write().withEndpoint(emulatorEndpoint)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-09-27 Thread via GitHub


lukas-mi commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2379149978

   I've tried the following:
   ```java
   PipelineOptionsFactory.register(BigQueryOptions.class);
   BigQueryOptions options =
       PipelineOptionsFactory.create().as(BigQueryOptions.class);
   
   options.setBigQueryProject(bqEmulator.getEmulatorHttpEndpoint());
   options.setBigQueryProject(bqEmulator.getProjectId());
   
   var pipeline = Pipeline.create(options);
   
   var sinkTable = "%s.%s.%s".formatted(
       bqEmulator.getProjectId(), "test_dataset", "test_table");
   
   var schema = new TableSchema()
       .setFields(List.of(new TableFieldSchema()
           .setName("testColumn")
           .setType("STRING")
           .setMode("REQUIRED")));
   
   var row = new TableRow();
   row.set("testColumn", "testValue");
   var rows = Create.of(List.of(row));
   
   pipeline
       .apply(rows)
       .apply(BigQueryIO
           .<TableRow>write()
           .to(sinkTable)
           .withFormatFunction(new SerializableFunction<TableRow, TableRow>() {
             @Override
             public TableRow apply(TableRow input) {
               return input;
             }
           })
           .withSchema(schema));
   
   pipeline.run().waitUntilFinish();
   ```
   
   Still get the following error:
   ```
   com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
   GET https://bigquery.googleapis.com/bigquery/v2/projects/test-project/datasets/test_dataset?prettyPrint=false
   {
     "code": 400,
     "errors": [
       {
         "domain": "global",
         "message": "The project test-project has not enabled BigQuery.",
         "reason": "invalid"
       }
     ],
     "message": "The project test-project has not enabled BigQuery.",
     "status": "INVALID_ARGUMENT"
   }
   ```



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-09-27 Thread via GitHub


kberezin-nshl commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2379649770

   Hey, @wattache @lukas-mi 
   
   Unfortunately, currently there is a bug causing the behavior that you're 
seeing. [The fix for that](https://github.com/apache/beam/pull/32450) has been 
merged already, so hopefully you'll be able to use this feature with the next 
release.
   
   In the meantime, you can use the following workaround. First, create this 
class:
   ```java
   public class WorkaroundBQServices implements BigQueryServices {
   
     private final String bigQueryEndpoint;
   
     public WorkaroundBQServices(String bigQueryEndpoint) {
       this.bigQueryEndpoint = bigQueryEndpoint;
     }
   
     @Override
     public JobService getJobService(BigQueryOptions options) {
       options.setBigQueryEndpoint(bigQueryEndpoint);
       return new BigQueryServicesImpl.JobServiceImpl(options);
     }
   
     @Override
     public DatasetService getDatasetService(BigQueryOptions options) {
       options.setBigQueryEndpoint(bigQueryEndpoint);
       return new BigQueryServicesImpl.DatasetServiceImpl(options);
     }
   
     @Override
     public WriteStreamService getWriteStreamService(BigQueryOptions options) {
       options.setBigQueryEndpoint(bigQueryEndpoint);
       return new BigQueryServicesImpl.WriteStreamServiceImpl(options);
     }
   
     @Override
     public StorageClient getStorageClient(BigQueryOptions options) throws IOException {
       options.setBigQueryEndpoint(bigQueryEndpoint);
       return new BigQueryServicesImpl.StorageClientImpl(options);
     }
   }
   ```
   
   Then, in your pipeline, set it as:
   ```java
   BigQueryIO.<...>write()
       .withTestServices(new WorkaroundBQServices(options.getBigQueryEndpoint()))
   ```
   
   As I said, once the new release comes out, you should be able to remove this 
workaround.



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-09-27 Thread via GitHub


lukas-mi commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2379188460

   I've updated the previous example:
   - previously it called `setBigQueryProject` twice and never called 
`setBigQueryEndpoint`
   - added dataset creation
   - added `withCustomGcsTempLocation`; otherwise an exception is thrown
   
   ```java
   PipelineOptionsFactory.register(BigQueryOptions.class);
   BigQueryOptions options =
       PipelineOptionsFactory.create().as(BigQueryOptions.class);
   
   options.setBigQueryEndpoint(bqEmulator.getEmulatorHttpEndpoint());
   options.setBigQueryProject(bqEmulator.getProjectId());
   
   var pipeline = Pipeline.create(options);
   
   var sinkTable = "%s.%s.%s".formatted(
       bqEmulator.getProjectId(), "test_dataset", "test_table");
   
   var datasetId = DatasetId.of(bqEmulator.getProjectId(), "test_dataset");
   var datasetInfo = DatasetInfo.newBuilder(datasetId).build();
   bqClient.create(datasetInfo);
   
   var schema = new TableSchema()
       .setFields(List.of(new TableFieldSchema()
           .setName("testColumn")
           .setType("STRING")
           .setMode("REQUIRED")));
   
   var row = new TableRow();
   row.set("testColumn", "testValue");
   var rows = Create.of(List.of(row));
   
   pipeline
       .apply(rows)
       .apply(BigQueryIO
           .<TableRow>write()
           .to(sinkTable)
           .withFormatFunction(new SerializableFunction<TableRow, TableRow>() {
             @Override
             public TableRow apply(TableRow input) {
               return input;
             }
           })
           .withSchema(schema)
           .withCustomGcsTempLocation(new ValueProvider<String>() {
             @Override
             public String get() {
               return "/tmp";
             }
   
             @Override
             public @UnknownKeyFor @NonNull @Initialized boolean isAccessible() {
               return true;
             }
           }));
   
   pipeline.run().waitUntilFinish();
   ```
   Now I'm getting an exception related to serialization:
   ```
   unable to serialize DoFnWithExecutionInformation{doFn=org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3@5b76b891, mainOutputTag=Tag, sideInputMapping={}, schemaInformation=DoFnSchemaInformation{elementConverters=[], fieldAccessDescriptor=*}}
   java.lang.IllegalArgumentException: unable to serialize DoFnWithExecutionInformation{doFn=org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$3@5b76b891, mainOutputTag=Tag, sideInputMapping={}, schemaInformation=DoFnSchemaInformation{elementConverters=[], fieldAccessDescriptor=*}}
   ```
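
The thread does not identify the cause of this error. One common source of "unable to serialize" failures in Java is an anonymous class that uses state from its enclosing object: it then holds a reference to that object, so serializing it also tries to serialize the enclosing instance. Whether that applies to the anonymous `ValueProvider` above is an assumption. The sketch below shows the effect with plain Java serialization only; `CaptureDemo`, `Provider`, `Outer`, and `Fixed` are hypothetical stand-ins, not Beam types.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class CaptureDemo {
  /** Hypothetical stand-in for a serializable callback interface. */
  interface Provider extends Serializable {
    String get();
  }

  /** Not Serializable, like a typical test class. */
  static class Outer {
    private final String tempDir;

    Outer(String tempDir) {
      this.tempDir = tempDir;
    }

    Provider anonymous() {
      // Reads a field of Outer, so the anonymous class keeps a reference to Outer.this.
      return new Provider() {
        @Override
        public String get() {
          return tempDir;
        }
      };
    }
  }

  /** Static nested class: carries only its own value, no enclosing instance. */
  static final class Fixed implements Provider {
    private final String value;

    Fixed(String value) {
      this.value = value;
    }

    @Override
    public String get() {
      return value;
    }
  }

  /** Returns true if Java serialization accepts the object, false if it rejects it. */
  static boolean canSerialize(Object o) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(canSerialize(new Outer("/tmp").anonymous())); // false
    System.out.println(canSerialize(new Fixed("/tmp")));             // true
  }
}
```

If this is indeed the cause, a value holder that does not capture the test class, such as Beam's `ValueProvider.StaticValueProvider.of("/tmp")`, might avoid it; that has not been verified against this pipeline.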
   
   
   



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-09-27 Thread via GitHub


wattache commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2378580478

   Hello,
   
   Thank you for this new feature. @kberezin-nshl, may I ask how you intend 
users to use this new capability?
   
   I've tried to use Testcontainers for my pipeline, but I still get the error 
message `"The project test-project has not enabled BigQuery."` with the URL being 
`https://bigquery.googleapis.com/bigquery/v2/projects/test-project/datasets/***/tables/***/insertAll?prettyPrint=false`
   
   I use Testcontainers and TestPipeline:
   
   ```java
   BigQueryEmulatorContainer container =
       new BigQueryEmulatorContainer("ghcr.io/goccy/bigquery-emulator:0.4.3");
   
   String projectId = container.getProjectId();
   
   org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions pipelineOptions =
       PipelineOptionsFactory.create()
           .as(org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions.class);
   
   pipelineOptions.setBigQueryEndpoint(container.getEmulatorHttpEndpoint());
   BigQueryServicesImpl bqService = new BigQueryServicesImpl();
   bqService.getJobService(pipelineOptions);
   
   Session session = buildExpectedSession();
   
   WriteResult writeResult = p.apply(Create.of(session))
       .apply(ParDo.of(new ProtobufSessionToBigQueryTableRow()))
       .apply(BigQueryIO.writeTableRows()
           .withTestServices(bqService)
           .to(row -> {
             return new TableDestination(projectId + ":***.***", null);
           }));
   ```
   
   Thanks in advance for your help



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-08-21 Thread via GitHub


Abacn merged PR #32153:
URL: https://github.com/apache/beam/pull/32153



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-08-21 Thread via GitHub


github-actions[bot] commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2301905700

   Reminder, please take a look at this pr: @Abacn @damondouglas 



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-08-13 Thread via GitHub


kberezin-nshl commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2287992342

   Thanks @Abacn , addressed your comments



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-08-13 Thread via GitHub


Abacn commented on code in PR #32153:
URL: https://github.com/apache/beam/pull/32153#discussion_r1715381602


##
CHANGES.md:
##
@@ -68,6 +68,7 @@
 ## New Features / Improvements
 
 * X feature added (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
+* Added an ability to set BigQuery endpoint (Java) ([#28149](https://github.com/apache/beam/issues/28149)).

Review Comment:
   Shall we note that this is for testing purposes? "Endpoint" can be ambiguous 
(e.g. a regional/zonal endpoint, like the Dataflow endpoint).
   
   Consider "BigQuery endpoint can be overridden via PipelineOptions; this 
enables BigQuery emulators (Java)".



##
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java:
##
@@ -1615,8 +1642,13 @@ private static BigQueryWriteClient newBigQueryWriteClient(BigQueryOptions option
   .setChannelsPerCpu(2)
   .build();
 
+  BigQueryWriteSettings.Builder builder = BigQueryWriteSettings.newBuilder();
+  String endpoint = options.getBigQueryEndpoint();

Review Comment:
   `String` -> `@Nullable String`



##
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java:
##
@@ -1615,8 +1642,13 @@ private static BigQueryWriteClient newBigQueryWriteClient(BigQueryOptions option
   .setChannelsPerCpu(2)
   .build();
 
+  BigQueryWriteSettings.Builder builder = BigQueryWriteSettings.newBuilder();
+  String endpoint = options.getBigQueryEndpoint();
+  if (endpoint != null) {

Review Comment:
   a safer guard is `Strings.isNullOrEmpty`
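
To illustrate the difference behind this suggestion, here is a minimal plain-Java sketch. The `EndpointGuard` class, its `resolve` methods, and the `localhost` URL are hypothetical; `isNullOrEmpty` is a local stand-in for Guava's `Strings.isNullOrEmpty` so the snippet needs no dependencies. It is not the code from the PR.

```java
public class EndpointGuard {
  /** Local stand-in for Guava's Strings.isNullOrEmpty, to avoid a dependency. */
  static boolean isNullOrEmpty(String s) {
    return s == null || s.isEmpty();
  }

  /** Null-only check: an empty override is treated as a real endpoint. */
  static String resolveNullOnly(String override, String defaultEndpoint) {
    return override != null ? override : defaultEndpoint;
  }

  /** Null-or-empty check: an empty override falls back to the default. */
  static String resolve(String override, String defaultEndpoint) {
    return isNullOrEmpty(override) ? defaultEndpoint : override;
  }

  public static void main(String[] args) {
    String def = "https://bigquery.googleapis.com";
    System.out.println("[" + resolveNullOnly("", def) + "]");            // []
    System.out.println("[" + resolve("", def) + "]");                    // [https://bigquery.googleapis.com]
    System.out.println("[" + resolve("http://localhost:9050", def) + "]"); // [http://localhost:9050]
  }
}
```

With the null-only check, an option that is set to an empty string would be passed on as the endpoint; the null-or-empty check treats it the same as an unset option.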




Re: [PR] Allow setting BigQuery endpoint [beam]

2024-08-13 Thread via GitHub


kberezin-nshl commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2285534783

   @Abacn @damondouglas could you please have a look? 
   
   Basically, I just followed the same pattern as `set/getGcsEndpoint` in 
`GcsOptions`, which allows running pipelines against a BigQuery emulator.



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-08-12 Thread via GitHub


kberezin-nshl commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2283417548

   assign set of reviewers



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-08-12 Thread via GitHub


github-actions[bot] commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2283419749

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @Abacn for label java.
   R: @damondouglas for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).



Re: [PR] Allow setting BigQuery endpoint [beam]

2024-08-12 Thread via GitHub


github-actions[bot] commented on PR #32153:
URL: https://github.com/apache/beam/pull/32153#issuecomment-2283342269

   Checks are failing. Will not request review until checks are succeeding. If 
you'd like to override that behavior, comment `assign set of reviewers`

