davidrabinowitz commented on pull request #30071:
URL: https://github.com/apache/spark/pull/30071#issuecomment-721355772
@HyukjinKwon
Should I create another PR aimed at master?
To test it, first create a table in BigQuery as follows:
```
bq load --source_format NEWLINE_DELIMITED_JSON <TABLE> vector_test.data.json vector_test.schema.json
```
The files are:
- vector_test.data.json:
```
{"name":"row1","num":"1","vector":{"type":"1","indices":[],"values":[1,2,3]}}
{"name":"row2","num":"2","vector":{"type":"1","indices":[],"values":[4,5,6]}}
{"name":"row3","num":"3","vector":{"type":"1","indices":[],"values":[7,8,9]}}
```
- vector_test.schema.json:
```
[
{
"mode": "NULLABLE",
"name": "name",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "num",
"type": "INTEGER"
},
{
"description": "{spark.type=vector}",
"fields": [
{
"mode": "NULLABLE",
"name": "type",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "size",
"type": "INTEGER"
},
{
"mode": "REPEATED",
"name": "indices",
"type": "INTEGER"
},
{
"mode": "REPEATED",
"name": "values",
"type": "FLOAT"
}
],
"mode": "NULLABLE",
"name": "vector",
"type": "RECORD"
}
]
```
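For reference, the rows above encode dense vectors: in Spark ML's serialized vector layout, `type = 1` with an empty `indices` array denotes a `DenseVector` whose elements are the `values` array (`type = 0` would be sparse). A minimal plain-Scala sketch of that interpretation (no Spark dependency; the object and helper names here are mine, not the connector's):

```scala
object VectorRows {
  // Rows mirroring vector_test.data.json: (name, type, indices, values).
  val rows: Seq[(String, Int, Seq[Int], Seq[Double])] = Seq(
    ("row1", 1, Nil, Seq(1.0, 2.0, 3.0)),
    ("row2", 1, Nil, Seq(4.0, 5.0, 6.0)),
    ("row3", 1, Nil, Seq(7.0, 8.0, 9.0))
  )

  // In the serialized VectorUDT layout, type 1 = dense (values hold the
  // whole vector) and type 0 = sparse (size + indices + values).
  def asDense(tpe: Int, values: Seq[Double]): Option[Seq[Double]] =
    if (tpe == 1) Some(values) else None

  def main(args: Array[String]): Unit =
    rows.foreach { case (name, tpe, _, values) =>
      println(s"$name -> ${asDense(tpe, values)}")
    }
}
```

This is only a model of the data layout, to make clear what the deserializer is expected to produce for each row.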
A GCP account is needed for this, but the amount of data and the operations involved are well within the free tier.
Run `spark-shell --packages com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.17.3` and enter the following commands:
```
val df = spark.read.format("com.google.cloud.spark.bigquery.v2.BigQueryDataSourceV2").load("<TABLE>")
df.printSchema()
df.show()
```
Note that when the format is changed to `bigquery`, a different read path is used which does not rely on the code generator and therefore does not suffer from this issue.