Github user sujith71955 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22396#discussion_r217802972
--- Diff: docs/sql-programming-guide.md ---
@@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
- Since Spark 2.4, File listing for compute statistics is done in
parallel by default. This can be disabled by setting
`spark.sql.parallelFileListingInStatsComputation.enabled` to `False`.
- Since Spark 2.4, Metadata files (e.g. Parquet summary files) and
temporary files are not counted as data files when calculating table size
during Statistics computation.
- Since Spark 2.4, empty strings are saved as quoted empty strings `""`.
In version 2.3 and earlier, empty strings are equal to `null` values and do not
reflect to any characters in saved CSV files. For example, the row of `"a",
null, "", 1` was written as `a,,,1`. Since Spark 2.4, the same row is saved as
`a,,"",1`. To restore the previous behavior, set the CSV option `emptyValue` to
empty (not quoted) string.
+ - Since Spark 2.4, the `LOAD DATA` command supports wildcards in folder-level
paths on the local filesystem (e.g. `LOAD DATA LOCAL INPATH 'tmp/folder*/'`). In
older versions, a space in a folder or file name had to be written as `%20` (e.g.
`LOAD DATA INPATH 'tmp/folderName/myFile%20Name.csv'`); this usage is no longer
supported as of Spark 2.4. Since Spark 2.4, Spark supports a literal space
character in folder and file names (e.g. `LOAD DATA INPATH
'hdfs://tmp/folderName/file Name.csv'`), and the wildcard character `?` can be
used (e.g. `LOAD DATA INPATH 'hdfs://tmp/folderName/fileName?.csv'`).
--- End diff --
@cloud-fan We follow the same syntax as older versions for the LOAD command path.
The only difference is that in older versions the user could not use wildcard
characters at the folder level on the local filesystem. The new implementation
supports this, and HDFS supports the same syntax, so the behavior is now
consistent: all the usages I mentioned apply to both the local filesystem and HDFS.
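
As a rough illustration of the wildcard semantics described above, here is a minimal sketch in Python. It uses the standard `fnmatch` module as a stand-in and is not Spark's actual path resolution; the `matches` helper and the `data.csv` file name are hypothetical and exist only for this example.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, path: str) -> bool:
    """Compare pattern and path one segment at a time.

    Splitting on '/' keeps '*' and '?' from matching across folder
    boundaries. This is an assumption made for the illustration,
    not a statement about how Spark resolves LOAD DATA paths.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(s, p) for p, s in zip(pattern_parts, path_parts))


# '*' at the folder level matches any folder name with that prefix.
print(matches("tmp/folder*/data.csv", "tmp/folder1/data.csv"))
# '?' matches exactly one character in a file name.
print(matches("tmp/folderName/fileName?.csv", "tmp/folderName/fileName1.csv"))
print(matches("tmp/folderName/fileName?.csv", "tmp/folderName/fileName12.csv"))
# A literal space in a file name is matched as-is, with no '%20' encoding.
print(matches("tmp/folderName/file Name.csv", "tmp/folderName/file Name.csv"))
```

The same patterns are meant to behave the same way whether the path points at the local filesystem or at HDFS, which is the consistency this change aims for.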
For more details, please refer to the PR below, and let me know if anything needs
clarification.
Thanks
https://github.com/apache/spark/pull/20611