[ 
https://issues.apache.org/jira/browse/SPARK-25517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-25517.
-----------------------------------
    Resolution: Duplicate

According to the comments on the PR, I'll close this as `Duplicate` for now.

> Spark DataFrame option inferSchema="true", dataFormat=MM/dd/yyyy, fails to 
> detect date type from the csv file while reading
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25517
>                 URL: https://issues.apache.org/jira/browse/SPARK-25517
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1
>         Environment: Spark 2.3.0
>            Reporter: Manoranjan Kumar
>            Priority: Major
>              Labels: easyfix
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Reading a CSV file with spark.read.format("csv").option("inferSchema", 
> true).option("dateFormat", "MM/dd/yyyy") does not infer the date type for a 
> column whose values are in the specified format (MM/dd/yyyy).
> For example:
> An employee CSV file (employee.csv) has the following two sample dummy 
> records (with header):
> emp_id,emp_name,joining_date,emp_age, emp_in_time,emp_salary
> 100,Bradd Pitt,09/25/2018,26,09/25/2018 10:12:36,10000.00
> 101,Angel Joli,08/20/2018,28,08/20/2018 11:32:58,12000.00
> When I read the above CSV file as a DataFrame like below:
> val empDF = spark.read.format("csv")
>   .option("inferSchema", true)
>   .option("dateFormat", "MM/dd/yyyy")
>   .option("timestampFormat", "MM/dd/yyyy HH:mm:ss")
>   .load("employee.csv")
> empDF.printSchema()
> Output:
> root
>  |-- emp_id: integer (nullable = true)
>  |-- emp_name: string (nullable = true)
>  |-- joining_date: string (nullable = true)
>  |-- emp_age: integer (nullable = true)
>  |-- emp_in_time: timestamp (nullable = true)
>  |-- emp_salary: double (nullable = true)
> Please notice the data types that Spark inferred for joining_date and 
> emp_in_time. For joining_date, it fails to detect the date type and the 
> column remains a string, whereas emp_in_time is correctly detected as a 
> timestamp.
> I struggled with this issue for a full day. When I dug into the Spark source 
> code, I found that the inference implementation for the date type is 
> missing, whereas the implementation for timestamp is present.
> I am new here (a first-time reporter); please get back to me if you need 
> further information or a live example with running code.
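
A possible workaround for the behaviour described above is to skip inferSchema and supply the schema explicitly, so that joining_date is declared as a date and parsed with the dateFormat option. The following is only a sketch under assumptions not stated in the report: the object name EmployeeCsv and the helper readEmployees are hypothetical, the header option is assumed because the sample file has a header row, and the column types are taken from the printSchema() output in the report.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical helper object; not part of Spark or of the original report.
object EmployeeCsv {
  // Explicit schema: joining_date is declared as DateType instead of inferred.
  val schema: StructType = StructType(Seq(
    StructField("emp_id", IntegerType),
    StructField("emp_name", StringType),
    StructField("joining_date", DateType),
    StructField("emp_age", IntegerType),
    StructField("emp_in_time", TimestampType),
    StructField("emp_salary", DoubleType)
  ))

  // Reads the CSV with the explicit schema. The two format options tell the
  // CSV parser how to read the date and timestamp columns.
  def readEmployees(spark: SparkSession, path: String): DataFrame =
    spark.read.format("csv")
      .option("header", "true") // assumption: the first line is a header row
      .option("dateFormat", "MM/dd/yyyy")
      .option("timestampFormat", "MM/dd/yyyy HH:mm:ss")
      .schema(schema)
      .load(path)
}
```

With an existing SparkSession named spark, EmployeeCsv.readEmployees(spark, "employee.csv").printSchema() should then list joining_date as a date, because the type comes from the supplied schema rather than from inference.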



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

