GitHub user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/15821
@rxin I have updated this to use Arrow 0.3 and addressed your other
comments; could you please give it another look when possible? Following up
on a couple of issues:
> Use SQLConf rather than a parameter for toPandas.

I removed the flag and instead used the conf "spark.sql.execution.arrow.enable",
which defaults to "false". I also added
"spark.sql.execution.arrow.maxRecordsPerBatch" to limit memory usage; the exact
behavior of that conf is still under discussion.
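For context, here is a rough sketch of how a user would enable the Arrow path
(the conf names match the ones above; the batch size shown is just an
illustrative value, not a recommended default):

```python
from pyspark.sql import SparkSession

# Enable the Arrow-backed toPandas path and cap records per Arrow batch.
spark = (SparkSession.builder
         .config("spark.sql.execution.arrow.enable", "true")
         .config("spark.sql.execution.arrow.maxRecordsPerBatch", "10000")
         .getOrCreate())

df = spark.range(1 << 20).selectExpr("id", "id % 7 AS bucket")
pdf = df.toPandas()  # converted via Arrow record batches when enabled
```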
> rather than defining the json using objects and serialize them, can we
> just put the json as a string inline? that'd be much easier to inspect ...

Here is a sample of a simple JSON file the tests use. It contains metadata
and a validity array in addition to the raw data, so even a simple case ends
up being a fairly large string, which is why I opted to generate the file
instead.
```
{
  "schema": {
    "fields": [
      {
        "name": "nullable_int",
        "type": {"name": "int", "isSigned": true, "bitWidth": 32},
        "nullable": true,
        "children": [],
        "typeLayout": {
          "vectors": [
            {"type": "VALIDITY", "typeBitWidth": 1},
            {"type": "DATA", "typeBitWidth": 32}
          ]
        }
      }
    ]
  },
  "batches": [
    {
      "count": 6,
      "columns": [
        {
          "name": "nullable_int",
          "count": 6,
          "VALIDITY": [1, 0, 0, 1, 0, 1],
          "DATA": [1, -1, 2, -2, 2147483647, -2147483648]
        }
      ]
    }
  ]
}
```
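To illustrate the tradeoff, here is a rough sketch (not the PR's actual test
code) of building the same document from plain objects; the helper name is
just a placeholder:

```python
import json

def int_field(name, nullable=True, bit_width=32):
    # Hypothetical helper mirroring the schema shape shown above.
    return {
        "name": name,
        "type": {"name": "int", "isSigned": True, "bitWidth": bit_width},
        "nullable": nullable,
        "children": [],
        "typeLayout": {
            "vectors": [
                {"type": "VALIDITY", "typeBitWidth": 1},
                {"type": "DATA", "typeBitWidth": bit_width},
            ]
        },
    }

doc = {
    "schema": {"fields": [int_field("nullable_int")]},
    "batches": [{
        "count": 6,
        "columns": [{
            "name": "nullable_int",
            "count": 6,
            "VALIDITY": [1, 0, 0, 1, 0, 1],
            "DATA": [1, -1, 2, -2, 2147483647, -2147483648],
        }],
    }],
}

# Write the generated file that the test reads back.
with open("nullable_int.json", "w") as f:
    json.dump(doc, f, indent=2)
```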
> Handle failure gracefully if arrow is not installed (or somehow package it
> with Spark?)

I just want to make sure I understood this the right way: it should stop
execution and print an error with a clear message, not log a message and then
continue execution without using pyarrow, correct?
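In other words, something along these lines (a minimal sketch; the function
name and message are placeholders, not the actual patch):

```python
def _require_pyarrow():
    # Fail fast with a clear message instead of silently falling back
    # to the non-Arrow path.
    try:
        import pyarrow  # noqa: F401
    except ImportError as e:
        raise ImportError(
            "toPandas with spark.sql.execution.arrow.enable=true requires "
            "pyarrow to be installed") from e
```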