[ https://issues.apache.org/jira/browse/SPARK-23554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruslan Dautkhanov resolved SPARK-23554.
---------------------------------------
    Resolution: Duplicate

> Hive's textinputformat.record.delimiter equivalent in Spark
> -----------------------------------------------------------
>
>                 Key: SPARK-23554
>                 URL: https://issues.apache.org/jira/browse/SPARK-23554
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>              Labels: csv, csvparser
>
> It would be great if Spark supported an option similar to Hive's
> {{textinputformat.record.delimiter}} in the spark-csv reader.
> We currently have to create Hive tables to work around the lack of this
> functionality in Spark itself.
> {{textinputformat.record.delimiter}} was introduced back in 2011, in the
> MapReduce era; see MAPREDUCE-2254.
> As an example, one of our most common use cases for
> {{textinputformat.record.delimiter}} is reading multiple lines of text that
> make up a single "record". The number of lines per "record" varies, so
> {{textinputformat.record.delimiter}} lets us process these files natively
> in Hadoop/Spark: a custom .map() function does the actual processing of
> those records, and we then convert the result to a DataFrame.
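
For illustration only, here is a minimal PySpark sketch of the multi-line "record" use case described in the issue. It passes textinputformat.record.delimiter to Hadoop's TextInputFormat through SparkContext.newAPIHadoopFile rather than through the CSV reader. The blank-line delimiter, the record layout (first line is an identifier, the remaining lines are the body), the column names, and the helper names record_delimiter_conf, parse_record, read_records, and to_dataframe are all assumptions made for this example and are not taken from the issue.

```python
# Hadoop classes used by SparkContext.newAPIHadoopFile.
INPUT_FORMAT = "org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
KEY_CLASS = "org.apache.hadoop.io.LongWritable"
VALUE_CLASS = "org.apache.hadoop.io.Text"


def record_delimiter_conf(delimiter):
    """Build the Hadoop configuration that sets the record delimiter."""
    return {"textinputformat.record.delimiter": delimiter}


def parse_record(record):
    """Hypothetical custom parser for one multi-line record.

    Assumed layout: the first non-empty line is an identifier and the
    remaining non-empty lines form the body. Returns None for records
    that contain no non-empty lines.
    """
    lines = [ln.strip() for ln in record.splitlines() if ln.strip()]
    if not lines:
        return None
    return (lines[0], len(lines) - 1, " ".join(lines[1:]))


def read_records(sc, path, delimiter="\n\n"):
    """Return an RDD of raw record strings split on `delimiter`.

    `sc` is an existing SparkContext. The blank-line default is an
    assumption; use whatever separates records in your files.
    """
    pairs = sc.newAPIHadoopFile(
        path,
        INPUT_FORMAT,
        KEY_CLASS,
        VALUE_CLASS,
        conf=record_delimiter_conf(delimiter),
    )
    # Each element is a (byte offset, record text) pair; keep the text.
    return pairs.map(lambda kv: kv[1])


def to_dataframe(spark, path, delimiter="\n\n"):
    """Parse every record and convert the result to a DataFrame.

    `spark` is an existing SparkSession; the column names are
    illustrative.
    """
    rows = (
        read_records(spark.sparkContext, path, delimiter)
        .map(parse_record)
        .filter(lambda row: row is not None)
    )
    return spark.createDataFrame(rows, ["id", "line_count", "body"])
```

With an active SparkSession named spark, to_dataframe(spark, "/path/to/files") would yield one row per delimiter-separated record. The path here is a placeholder.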