GitHub user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20849
> The PR doesn't solve any practical use cases
It does. It allows many workarounds; for example, we can intentionally add
a custom delimiter so that it can support multiple-line-ish JSONs:
```
{
"a": 1
}
|^|
{
"b": 2
}
```
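Roughly, reading such a file could look like the sketch below (a minimal,
hedged example: the path is hypothetical, and it assumes a Spark version
whose JSON reader accepts a `lineSep` option):
```python
# A minimal sketch, assuming a Spark version whose JSON reader takes `lineSep`;
# the file path is hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read
      .option("lineSep", "|^|")              # use "|^|" as the record separator
      .json("/tmp/delimited_records.json"))  # hypothetical path to the sample above
df.show()
```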
Go and google CSV's case too.
> `encoding`? Only as an alias for `charset`.
Yes, `encoding`. It takes priority over `charset`. See `CSVOptions`.
Also, that's what we use in PySpark's CSV, isn't it?
https://github.com/apache/spark/blob/a9350d7095b79c8374fb4a06fd3f1a1a67615f6f/python/pyspark/sql/readwriter.py#L333
Shall we expose `encoding` and add an alias for `charset`?
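For reference, this is roughly how `encoding` is already used on the CSV
side in PySpark (a hedged sketch; the path is hypothetical, and `charset`
is the alias handled in `CSVOptions`):
```python
# A minimal sketch of the existing CSV `encoding` option in PySpark;
# the path is hypothetical. `charset` is accepted as an alias, with
# `encoding` taking priority when both are set.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read
      .option("header", "true")
      .option("encoding", "ISO-8859-1")   # a.k.a. charset
      .csv("/tmp/latin1_data.csv"))       # hypothetical path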
> I would definitely discuss how are you going to extend lineSep in your
PR: #20877 in the future to support Json Streaming for example. If you don't
have such vision, I would prefer to block your PR.
Why are you dragging an orthogonal thing into #20877? I don't think we
would fail to make a decision on the flexible option; I guess we have plenty
of time until 2.4.0.
Even if we fail to make a decision on the flexible option, we can expose
another option that provides that flexibility and forces `lineSep` to be unset,
can't we?
Is this flexible option also a part of your public release?