GitHub user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/14638
I see. Thank you for the guidance. Initially, I thought this table property
skipped the starting rows of each file orthogonally, for all formats. I also
assumed that users would not use this property improperly.
But what you mean is that it has no meaning for columnar and vectorized
formats. So if a user sets this table property on a Parquet or ORC table, Spark
needs to ignore it.
If so, we should definitely find a place that applies to the TEXT format only.
By the way, do you have a more suitable location in mind than the current
`hadoopRDD.mapPartitionsWithIndex`?