You have forgotten a y: it must be MM/dd/yyyy
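The patterns here are Java SimpleDateFormat patterns, where the number of pattern letters changes how the year is interpreted. Python's strptime has an analogous behavior that can be demonstrated without a Spark session (this is a Python-only analogy, not Spark code): a year under the full-year directive is taken literally, which is exactly the 0015-instead-of-2015 symptom, while the two-digit directive applies a century pivot. If the source column really holds two-digit years, a two-letter year pattern ('yy' in SimpleDateFormat, '%y' in strptime) may be what is actually needed.

```python
from datetime import datetime

# A two-digit year under the two-digit directive gets a century pivot:
# "15" -> 2015 (like SimpleDateFormat's 'yy' pattern).
pivoted = datetime.strptime("08/16/15", "%m/%d/%y")

# A full-year directive takes its digits literally: "0015" stays year 15,
# which is the 0015-instead-of-2015 symptom seen in df2.show().
literal = datetime.strptime("08/16/0015", "%m/%d/%Y")

print(pivoted.year, literal.year)  # 2015 15
```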
> On 17. Aug 2017, at 21:30, Aakash Basu <aakash.spark....@gmail.com> wrote:
>
> Hi Palwell,
>
> Tried doing that, but it's becoming null for all the dates after the
> transformation with functions.
>
> df2 = dflead.select('Enter_Date', f.to_date(df2.Enter_Date))
>
> <image.png>
>
> Any insight?
>
> Thanks,
> Aakash.
>
>> On Fri, Aug 18, 2017 at 12:23 AM, Patrick Alwell <palw...@hortonworks.com> wrote:
>> Aakash,
>>
>> I've had similar issues with date-time formatting. Try using the functions
>> library from pyspark.sql and the DF withColumn() method.
>>
>> ——————————————————————————————
>>
>> from pyspark.sql import functions as f
>>
>> lineitem_df = lineitem_df.withColumn('shipdate', f.to_date(lineitem_df.shipdate))
>>
>> ——————————————————————————————
>>
>> You should have first ingested the column as a string, and then leveraged
>> the DF API to make the conversion to DateType.
>>
>> That should work.
>>
>> Kind regards,
>>
>> -Pat Alwell
>>
>>> On Aug 17, 2017, at 11:48 AM, Aakash Basu <aakash.spark....@gmail.com> wrote:
>>>
>>> Hey all,
>>>
>>> Thanks! I had a discussion with the person who authored that package and
>>> informed him about this bug, but in the meantime, with the same package, I
>>> found a small tweak to get the job done.
>>>
>>> Now that is fine; I'm getting the date as a string by predefining the
>>> schema, but I want to later convert it to a datetime format, which is making
>>> it this -
>>>
>>> from pyspark.sql.functions import from_unixtime, unix_timestamp
>>>
>>> df2 = dflead.select('Enter_Date',
>>>     from_unixtime(unix_timestamp('Enter_Date', 'MM/dd/yyy')).alias('date'))
>>>
>>> df2.show()
>>>
>>> <image.png>
>>>
>>> Which is not correct, as it is converting the 15 to 0015 instead of 2015.
>>> Do you guys think using the DateUtil package will solve this? Or any other
>>> solution with this built-in package?
>>>
>>> Please help!
>>>
>>> Thanks,
>>> Aakash.
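A note on the nulls above: in Spark 2.1, f.to_date() takes no format argument and only parses the default yyyy-MM-dd layout, so MM/dd/yyyy strings silently come back as null rather than raising. (The quoted line also references df2 on the right-hand side before df2 exists; presumably dflead.Enter_Date was meant.) A rough Python-only sketch of that lenient null-on-mismatch behavior, where parse_or_none is a made-up helper, not a Spark API:

```python
from datetime import datetime

def parse_or_none(value, fmt):
    """Return a date on success, None on mismatch -- loosely mirroring
    how Spark's to_date yields null instead of raising on bad input."""
    try:
        return datetime.strptime(value, fmt).date()
    except ValueError:
        return None

# Default-style input parses; MM/dd/yyyy-style input does not.
print(parse_or_none("2015-08-16", "%Y-%m-%d"))   # 2015-08-16
print(parse_or_none("08/16/2015", "%Y-%m-%d"))   # None
```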
>>>
>>>> On Thu, Aug 17, 2017 at 12:01 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>> You can use Apache POI DateUtil to convert a double to a Date
>>>> (https://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/DateUtil.html).
>>>> Alternatively you can try HadoopOffice
>>>> (https://github.com/ZuInnoTe/hadoopoffice/wiki); it supports Spark 1.x and
>>>> Spark 2.0 datasources.
>>>>
>>>> On 16. Aug 2017, at 20:15, Aakash Basu <aakash.spark....@gmail.com> wrote:
>>>>
>>>>> Hey Irving,
>>>>>
>>>>> Thanks for the quick revert. In Excel that column is purely a string; I
>>>>> actually want to import it as a string and later play around with the DF to
>>>>> convert it back to a date type, but the API itself is not allowing me to
>>>>> dynamically assign a schema to the DF, and I'm forced to use inferSchema,
>>>>> which converts all numeric columns to double (though I don't know how the
>>>>> date column is getting converted to double if it is a string in the Excel
>>>>> source).
>>>>>
>>>>> Thanks,
>>>>> Aakash.
>>>>>
>>>>> On 16-Aug-2017 11:39 PM, "Irving Duran" <irving.du...@gmail.com> wrote:
>>>>> I think there is a difference between the actual value in the cell and
>>>>> how Excel formats that cell. You probably want to import that field as
>>>>> a string, or not have it as a date format in Excel.
>>>>>
>>>>> Just a thought....
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Irving Duran
>>>>>
>>>>>> On Wed, Aug 16, 2017 at 12:47 PM, Aakash Basu
>>>>>> <aakash.spark....@gmail.com> wrote:
>>>>>> Hey all,
>>>>>>
>>>>>> Forgot to attach the link to the discussion about overriding the schema
>>>>>> through the external package:
>>>>>>
>>>>>> https://github.com/crealytics/spark-excel/pull/13
>>>>>>
>>>>>> You can see my comment there too.
>>>>>>
>>>>>> Thanks,
>>>>>> Aakash.
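The POI DateUtil suggestion points at the underlying cause: Excel stores dates as day-count serial numbers, which is why an inferred schema surfaces the column as double. A minimal pure-Python sketch of the conversion DateUtil performs, assuming the 1900 date system and the conventional 1899-12-30 epoch (this ignores Lotus's fictitious 1900-02-29 for serials below 61):

```python
from datetime import datetime, timedelta

# Conventional epoch for Excel's 1900 date system (post-Feb-1900 serials).
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial date (whole days since the epoch, with a
    fractional part for time of day) to a Python datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)

print(excel_serial_to_datetime(43831.0))   # 2020-01-01 00:00:00
print(excel_serial_to_datetime(43831.5))   # 2020-01-01 12:00:00
```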
>>>>>>
>>>>>>> On Wed, Aug 16, 2017 at 11:11 PM, Aakash Basu
>>>>>>> <aakash.spark....@gmail.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am working with PySpark (Python 3.6 and Spark 2.1.1) and trying to
>>>>>>> fetch data from an Excel file using
>>>>>>> spark.read.format("com.crealytics.spark.excel"), but it is inferring
>>>>>>> double for a date-type column.
>>>>>>>
>>>>>>> The detailed description is given here (the question I posted):
>>>>>>>
>>>>>>> https://stackoverflow.com/questions/45713699/inferschema-using-spark-read-formatcom-crealytics-spark-excel-is-inferring-d
>>>>>>>
>>>>>>> Found that it is probably a bug in the crealytics Excel read package.
>>>>>>>
>>>>>>> Can somebody help me with a workaround for this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Aakash.