[GitHub] [spark] srowen commented on a diff in pull request #39907: [SPARK-42359][SQL] Support row skipping when reading CSV files

via GitHub Sun, 09 Jul 2023 08:57:58 -0700


srowen commented on code in PR #39907:
URL: https://github.com/apache/spark/pull/39907#discussion_r1257505804



##########
docs/sql-data-sources-csv.md:
##########
@@ -102,6 +102,12 @@ Data source options of CSV can be set via:
     <td>For reading, uses the first line as names of columns. For writing, writes the names of columns as the first line. Note that if the given path is a RDD of Strings, this header option will remove all lines same with the header if exists. CSV built-in functions ignore this option.</td>
     <td>read/write</td>
   </tr>
+  <tr>
+    <td><code>skipLines</code></td>
+    <td>0</td>
+    <td>Sets the number of non-empty, uncommented lines to skip before parsing CSV files. If the <code>header</code> option is set to <code>true</code>, the first line after the number of <code>skipLines</code> will be taken as the header.</td>
+    <td>read</td>
+  </tr>

Review Comment:
   When does a CSV file have multiple header rows?
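
   For context, here is a rough sketch of the semantics the proposed `skipLines` text describes, as I read it. This is plain Python, not Spark code: the helper name `read_csv_skipping` and its parameters are hypothetical, `#` as the comment marker is an assumption, and for simplicity it drops empty and commented lines everywhere rather than only in the preamble.

```python
import csv
import io


def read_csv_skipping(text, skip_lines=0, header=False, comment="#"):
    """Hypothetical helper mirroring the proposed skipLines description.

    Assumption: lines are newline-delimited, so quoted fields that
    contain embedded newlines are not handled by this sketch.
    """
    # Only non-empty, uncommented lines count toward skip_lines.
    candidates = [
        line
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith(comment)
    ]
    # Drop the first skip_lines candidate lines before parsing.
    remaining = candidates[skip_lines:]
    rows = list(csv.reader(io.StringIO("\n".join(remaining))))
    # With header=True, the first surviving line supplies the column names.
    if header and rows:
        return rows[0], rows[1:]
    return None, rows


sample = "exported by tool\n\n# a comment\nsecond preamble\nid,name\n1,a\n2,b\n"
names, rows = read_csv_skipping(sample, skip_lines=2, header=True)
```

   In this sample the two preamble lines are skipped, the blank line and the comment do not count toward the total, and `id,name` becomes the header.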



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

