[jira] [Resolved] (SPARK-23554) Hive's textinputformat.record.delimiter equivalent in Spark

Ruslan Dautkhanov (JIRA) Thu, 12 Apr 2018 09:35:30 -0700

     [ https://issues.apache.org/jira/browse/SPARK-23554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ruslan Dautkhanov resolved SPARK-23554.
---------------------------------------
    Resolution: Duplicate

> Hive's textinputformat.record.delimiter equivalent in Spark
> -----------------------------------------------------------
>
>                 Key: SPARK-23554
>                 URL: https://issues.apache.org/jira/browse/SPARK-23554
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>              Labels: csv, csvparser
>
> It would be great if Spark supported an option similar to Hive's 
> {{textinputformat.record.delimiter}} in the spark-csv reader.
> We currently have to create Hive tables to work around the lack of this 
> functionality natively in Spark.
> {{textinputformat.record.delimiter}} was introduced back in 2011, in the 
> MapReduce era; see MAPREDUCE-2254.
> As an example, one of our most common use cases for 
> {{textinputformat.record.delimiter}} is reading multiple lines of text that 
> make up one "record". The number of lines per "record" varies, so 
> {{textinputformat.record.delimiter}} is a great solution for processing 
> these files natively in Hadoop/Spark: a custom .map() function does the 
> actual processing of those records, and we then convert the result to a 
> dataframe.
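
To illustrate the use case described above, here is a minimal pure-Python sketch of the record-delimiter idea: text is split on a custom delimiter instead of on newlines, so each "record" can span a varying number of lines, and a per-record function then does the actual processing. This is not Spark's or Hive's implementation. The names `split_records` and `parse_record`, the `||` delimiter, and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator


def split_records(chunks: Iterable[str], delimiter: str) -> Iterator[str]:
    """Yield delimiter-separated records from a stream of text chunks.

    Hypothetical helper: it mimics the effect of a custom record
    delimiter, including delimiters that straddle two chunks.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Every piece except the last is a complete record; the last piece
        # may be incomplete (or end in a partial delimiter), so keep it.
        *complete, buffer = buffer.split(delimiter)
        yield from complete
    if buffer:
        # Final record that was not followed by a delimiter.
        yield buffer


def parse_record(record: str) -> Dict[str, object]:
    """Hypothetical per-record step, standing in for the custom .map()."""
    lines = [line for line in record.splitlines() if line.strip()]
    return {"header": lines[0] if lines else "", "line_count": len(lines)}


# Assumed sample input: records separated by "||", each with a different
# number of lines. The second delimiter is split across two chunks.
chunks = ["id=1\nname=a\n||", "id=2\nname=b\nextra=c\n|", "|id=3\n"]
records = list(split_records(chunks, "||"))
parsed = [parse_record(r) for r in records]
```

Each element of `parsed` is a small dict describing one multi-line record, which is the shape of data one would then turn into dataframe rows.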



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

