I am using a Google Kubernetes Engine (GKE) cluster with a Docker image that I built on-prem with PySpark 3.1.1 and pushed to a Google container registry.
The py module generates some 100 rows of random data and then writes it to a BigQuery table. Both the write to and the subsequent read from the BigQuery table show the correct number of rows:

    Populated BigQuery table test.randomData rows written is 100
    Reading from BigQuery table test.randomData rows read in is 100

However, the following operation fails:

    if df2.subtract(read_df).count() == 0:
        print("Data has been loaded OK to Oracle table")
    else:
        print("Data could not be loaded to Oracle table, quitting")
        sys.exit(1)

    21/08/06 21:58:45 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 8.0 (TID 11) (10.64.2.15 executor 1): java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
            at com.google.cloud.spark.bigquery.repackaged.io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:490)
            at com.google.cloud.spark.bigquery.repackaged.io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:257)
            at com.google.cloud.spark.bigquery.repackaged.io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:247)

Further down it shows:

    py4j.protocol.Py4JJavaError: An error occurred while calling o116.count.
    : org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:

OK, this may be specific to BigQuery, because as I recall this operation could be done against an Oracle table.

Thanks
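P.S. In case it helps to reproduce, here is a condensed, self-contained sketch of the flow. It is not my exact module: the staging bucket name and the random column are placeholders, and the options are the standard spark-bigquery-connector ones.

    import sys
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import rand

    spark = SparkSession.builder.appName("randomDataBQ").getOrCreate()

    # generate some 100 rows of random data
    df2 = spark.range(100).withColumn("randomVal", rand(seed=42))

    # write to BigQuery; temporaryGcsBucket is a placeholder staging bucket
    df2.write.format("bigquery") \
        .option("table", "test.randomData") \
        .option("temporaryGcsBucket", "some-staging-bucket") \
        .mode("overwrite") \
        .save()
    print(f"Populated BigQuery table test.randomData rows written is {df2.count()}")

    # read the table back
    read_df = spark.read.format("bigquery").option("table", "test.randomData").load()
    print(f"Reading from BigQuery table test.randomData rows read in is {read_df.count()}")

    # the count() in this comparison is where the error above is thrown;
    # messages kept verbatim from the snippet above (they still mention Oracle)
    if df2.subtract(read_df).count() == 0:
        print("Data has been loaded OK to Oracle table")
    else:
        print("Data could not be loaded to Oracle table, quitting")
        sys.exit(1)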