[jira] [Created] (SPARK-23612) Specify formats for individual DateType and TimestampType columns in schemas

Patrick Young (JIRA) Tue, 06 Mar 2018 08:30:35 -0800

Patrick Young created SPARK-23612:
-------------------------------------

             Summary: Specify formats for individual DateType and TimestampType 
columns in schemas
                 Key: SPARK-23612
                 URL: https://issues.apache.org/jira/browse/SPARK-23612
             Project: Spark
          Issue Type: Improvement
          Components: PySpark, SQL
    Affects Versions: 2.3.0
            Reporter: Patrick Young



[https://github.com/apache/spark/blob/407f67249639709c40c46917700ed6dd736daa7d/python/pyspark/sql/types.py#L162-L200]

It would be very helpful to be able to specify a format for each individual 
DateType or TimestampType column in a schema when reading CSV files, rather 
than a single format that applies to every column of that type:

{code:title=Bar.python|borderStyle=solid}
from pyspark.sql.types import DateType, StructField, StructType

# Currently can only do something like this (one format for all date columns):
spark.read.option("dateFormat", "yyyyMMdd").csv(...)

# Would like to be able to do something like this
# (DateType does not accept a format argument today):
schema = StructType([
    StructField("date1", DateType(format="MM/dd/yyyy"), True),
    StructField("date2", DateType(format="yyyyMMdd"), True),
])

spark.read.schema(schema).csv(...)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)