I am using the bigqueryio transform and I am using the following struct to
collect a data row:
type Record {
source_service biquery.NullString
.. etc...
}
This works fine with the direct runner, but when I try it with the dataflow
runner, then I get the following exception:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error
received from SDK harness for instruction -41: execute failed: bigquery: schema
field source_service of type STRING is not assignable to struct field
source_service of type struct { StringVal string; Valid bool }
at
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:55)
at
com.google.cloud.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:274)
at
com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)
at
com.google.cloud.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:101)
at
com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
at
com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
at
com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
at
com.google.cloud.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:179)
at
com.google.cloud.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:107)
Suppressed: java.lang.IllegalStateException: Already closed.
at
org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:97)
at
com.google.cloud.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:93)
at
com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:89)
... 6 more
Looks like the bigquery API is failing to detect the nullable type NullString,
and instead is attempting a plain assignment. Could it be that some aspect of
the type information has been lost thus preventing the bigquery API from
identifying and handling NullString properly?