[ https://issues.apache.org/jira/browse/SPARK-26406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-26406.
----------------------------------
    Resolution: Won't Fix

I won't fix this although I get the usecase and problem. Spark's being very conservative so let's only add absolutely required APIs or options only.

> Add option to skip rows when reading csv files
> ----------------------------------------------
>
>                 Key: SPARK-26406
>                 URL: https://issues.apache.org/jira/browse/SPARK-26406
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Thomas Kastl
>            Priority: Minor
>
> Real-world data can contain multiple header lines. Spark currently does not
> offer any way to skip more than one header row.
> Several workarounds are proposed on stackoverflow (manually editing each csv
> file by adding "#" to the rows and using the comment option, or filtering
> after reading) but all of them are workarounds with more or less obvious
> drawbacks and restrictions.
> The option
> {code:java}
> header=True{code}
> already treats the first row of csv files differently, so the argument that
> Spark wants to be row-order agnostic does not really hold here in my opinion.
> A solution like pandas'
> {code:java}
> skiprows={code}
> would be highly preferable.
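
For readers landing on this thread: the "filtering after reading" workaround mentioned in the issue can be sketched roughly as follows. This is only an illustration, not anything proposed in the ticket. The helper names (skip_leading_rows, parse_rows) and the n_skip parameter are hypothetical, and the approach assumes that all extra header lines sit at the start of the first partition, which is plausible for a single input file but not for a directory of files. The demonstration runs on plain Python lists standing in for partitions, so it needs no Spark installation.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator, List


def skip_leading_rows(partition_index: int,
                      lines: Iterable[str],
                      n_skip: int = 2) -> Iterator[str]:
    """Drop the first n_skip lines of partition 0; pass other partitions through.

    Hypothetical helper. Its (index, iterator) signature matches what
    RDD.mapPartitionsWithIndex expects in PySpark.
    """
    it = iter(lines)
    if partition_index == 0:
        return islice(it, n_skip, None)
    return it


def parse_rows(lines: Iterable[str]) -> List[List[str]]:
    """Parse the remaining lines as CSV records (hypothetical helper)."""
    return list(csv.reader(lines))


# Local stand-in for two partitions of one file with two extra header lines.
partitions = [
    ["exported by some tool", "units: m,s", "1,2", "3,4"],
    ["5,6", "7,8"],
]

rows: List[List[str]] = []
for idx, part in enumerate(partitions):
    rows.extend(parse_rows(skip_leading_rows(idx, part, n_skip=2)))

print(rows)

# With PySpark available, the same helper could be applied along these lines
# (assumption: `spark` is an existing SparkSession and `path` is one CSV file):
#   rdd = spark.sparkContext.textFile(path).mapPartitionsWithIndex(
#       lambda i, it: skip_leading_rows(i, it, n_skip=2))
#   df = spark.read.csv(rdd)
```

As the reporter notes, this kind of workaround has restrictions: it depends on row order within a partition and breaks down when several files, each with its own header block, are read in one call.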