[ https://issues.apache.org/jira/browse/SPARK-21763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-21763.
----------------------------------
    Resolution: Invalid

I think this should be raised at https://github.com/crealytics/spark-excel instead.

> InferSchema option does not infer the correct schema (timestamp) from xlsx file.
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-21763
>                 URL: https://issues.apache.org/jira/browse/SPARK-21763
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Environment is my personal laptop.
>            Reporter: ANSHUMAN
>            Priority: Minor
>
> I have an xlsx file containing a date/time field (My Time) in the following format, with these sample records:
> 5/16/2017 12:19:00 AM
> 5/16/2017 12:56:00 AM
> 5/16/2017 1:17:00 PM
> 5/16/2017 5:26:00 PM
> 5/16/2017 6:26:00 PM
> I am reading the xlsx file in the following manner:
> {code:java}
> val inputDF = spark.sqlContext.read.format("com.crealytics.spark.excel")
>   .option("location","file:///C:/Users/file.xlsx")
>   .option("useHeader","true")
>   .option("treatEmptyValuesAsNulls","true")
>   .option("inferSchema","true")
>   .option("addColorColumns","false")
>   .load()
> {code}
> When I try to get the schema using
> {code:java}
> inputDF.printSchema()
> {code}
> the column type is reported as *Double*.
> Sometimes I even get the type as *String*.
> And when I print the data, I get the following output:
> +------------------+
> |           My Time|
> +------------------+
> |42871.014189814814|
> | 42871.03973379629|
> |42871.553773148145|
> | 42871.72765046296|
> | 42871.76887731482|
> +------------------+
> The above output is clearly not correct for the given input.
> Moreover, if I convert the xlsx file to csv format and read it, I get the correct output. Here is how I read the csv file:
> {code:java}
> spark.sqlContext.read.format("csv")
>   .option("header", "true")
>   .option("inferSchema", true)
>   .load(fileLocation)
> {code}
> Please look into the issue. I could not find an answer to it anywhere.
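For context on the Double values in the printed output: they look like Excel date serial numbers, i.e. days (with a fractional time-of-day part) counted from 1899-12-30 in Excel's default 1900 date system. The following is a minimal sketch of converting such a value back to a date/time in plain Scala. It assumes the workbook uses the 1900 date system and ignores time zones; the `ExcelSerial` object and `toLocalDateTime` method are hypothetical names used for illustration and are not part of Spark or spark-excel.

```scala
import java.time.{Duration, LocalDateTime}

// Hypothetical helper, not part of Spark or spark-excel.
object ExcelSerial {
  // Assumption: the workbook uses Excel's 1900 date system, for which
  // serial 0 corresponds to 1899-12-30 (valid for dates after Feb 1900).
  private val Epoch: LocalDateTime = LocalDateTime.of(1899, 12, 30, 0, 0)
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // The integer part of the serial is whole days, the fraction is time of day.
  // Rounding to milliseconds absorbs floating-point noise in the fraction.
  def toLocalDateTime(serial: Double): LocalDateTime = {
    val millis = math.round(serial * MillisPerDay)
    Epoch.plus(Duration.ofMillis(millis))
  }
}
```

Under these assumptions, `ExcelSerial.toLocalDateTime(42871.553773148145)` falls at approximately 1:17 PM on 5/16/2017, which is consistent with the third sample record in the report.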