HyukjinKwon commented on a change in pull request #23665: [SPARK-26745][SQL]
Skip empty lines in JSON-derived DataFrames when skipParsing optimization in
effect
URL: https://github.com/apache/spark/pull/23665#discussion_r251356604
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/FailureSafeParser.scala
##########
@@ -55,11 +56,15 @@ class FailureSafeParser[IN](
def parse(input: IN): Iterator[InternalRow] = {
try {
- if (skipParsing) {
- Iterator.single(InternalRow.empty)
- } else {
- rawParser.apply(input).toIterator.map(row => toResultRow(Some(row), () => null))
- }
+ if (skipParsing) {
+ if (unparsedRecordIsNonEmpty(input)) {
Review comment:
> The case when a user sets a StructType for arrays can be excluded from the
count optimization in advance.
How are you going to exclude this without checking the input? The input itself
determines how many records are returned for each input `IN`.
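To make the point above concrete, here is a minimal sketch in plain Scala, with no Spark dependency. Everything in it (`CountSketch`, `parse`, `countWithoutParsing`, `countWithParsing`, `sample`) is a hypothetical stand-in rather than Spark's actual code, and the toy parser assumes array elements contain no commas. It only illustrates why a "one row per input" shortcut can disagree with a count that inspects each record:

```scala
// Toy model only: these names are hypothetical and are not Spark's classes.
object CountSketch {
  // Toy "parser": returns one string per row the record would produce.
  // Assumes array elements contain no commas (e.g. "{}"), which is enough here.
  def parse(record: String): Seq[String] = {
    val t = record.trim
    if (t.isEmpty) Seq.empty                       // blank line: no rows
    else if (t.startsWith("[") && t.endsWith("]")) {
      val body = t.substring(1, t.length - 1).trim
      if (body.isEmpty) Seq.empty                  // "[]": no rows
      else body.split(",").map(_.trim).toSeq       // one row per element
    } else Seq(t)                                  // single object: one row
  }

  // Shortcut modelled on the removed branch: one row per input, input ignored.
  def countWithoutParsing(records: Seq[String]): Int = records.size

  // Count obtained by actually looking at each record.
  def countWithParsing(records: Seq[String]): Int =
    records.map(parse(_).size).sum

  // One object, one array of three objects, one blank line.
  val sample: Seq[String] = Seq("{}", "[{}, {}, {}]", "")
}
```

For `CountSketch.sample`, the shortcut reports 3 while the parsing count reports 4, so under these assumptions the two only agree if the input is inspected.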
> Regarding empty (blank) strings, before #23543 they were considered bad
records (and appeared in results). And count() produced pretty consistent results.
I think this was a mistake. It had to be reverted, and it was reverted. An empty
string isn't JSON.
> I think we should answer a more generic question: which inputs conform to an
empty `StructType()`.
I think an empty `StructType` basically means an empty object `{}`. I
thought that was not controversial.
> Until we answer the above question, reverting #21909 just moves us
from one "bad" behavior to another "bad" behavior.
No, it will move us from one "bad" behaviour back to the previous behaviour,
which keeps backward compatibility.