Found a workaround: subtract 4881176 days when generating the Parquet files. I verified that Spark then reads the correct dates:
+----------+
|     as_of|
+----------+
|2016-09-30|
|2015-08-05|
+----------+

0: jdbc:drill:zk=local> create table dfs.tmp.`/test` as select DATE_ADD(cast(as_of AS date), -4881176) as as_of from table(dfs.`/tmp/test.txt`(type => 'text', fieldDelimiter => ',', extractHeader => true));

java -jar parquet-tools-1.6.1-SNAPSHOT.jar head -n3 /tmp/test
as_of = 17074
as_of = 16652

David Lee
Vice President | BlackRock
Phone: +1.415.670.2744 | Mobile: +1.415.706.6874

-----Original Message-----
From: rahul challapalli [mailto:[email protected]]
Sent: Tuesday, November 01, 2016 11:28 AM
To: user <[email protected]>
Subject: Re: Parquet Date Format Problem

The fix will be available with the Drill 1.9 release unless you want to build from source yourself.

On Tue, Nov 1, 2016 at 11:24 AM, Lee, David <[email protected]> wrote:

> Never mind.. Found the problem..
>
> https://issues.apache.org/jira/browse/DRILL-4203
>
> From: Lee, David
> Sent: Tuesday, November 01, 2016 11:21 AM
> To: '[email protected]' <[email protected]>
> Subject: Parquet Date Format Problem
>
> I created a parquet file using Drill, but date values in the parquet
> files don’t appear to be a logical INT32 type, and as such when I’m
> trying to read the parquet file in Spark it looks corrupted..
>
> Here’s my test case..
>
> A. Create a test.txt file in /tmp:
>
>     as_of
>     2016-09-30
>
> B. Convert it to parquet using Drill:
>
>     0: jdbc:drill:zk=local> create table dfs.tmp.`/test` as select
>     cast(as_of AS date) as as_of from table(dfs.`/tmp/test.txt`(type =>
>     'text', fieldDelimiter => ',', extractHeader => true));
>
> C. Read the new file using Drill, which looks fine:
>
>     0: jdbc:drill:zk=local> select * from dfs.`/tmp/test`;
>     +-------------+
>     | as_of       |
>     +-------------+
>     | 2016-09-30  |
>     +-------------+
>
> D. However, running parquet-tools on it gives a completely different
> result:
>
>     java -jar parquet-tools-1.6.1-SNAPSHOT.jar head -n3 /tmp/test
>     as_of = 4898250
>
>     java -jar parquet-tools-1.6.1-SNAPSHOT.jar schema /tmp/test/0_0_0.parquet
>     message root {
>       required int32 as_of (DATE);
>     }
>
> According to the Parquet docs.. 4898250 days after Jan 1st 1970 is
> sometime in the year 15,435..
>
> https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md
>
> DATE
> DATE is used for a logical date type, without a time of day. It
> must annotate an int32 that stores the number of days from the Unix
> epoch, 1 January 1970.
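As a sanity check on the numbers in this thread, here is a minimal Python sketch of the offset arithmetic. It only reproduces the values quoted above; the function and constant names are made up for illustration and are not part of Drill, Spark, or parquet-tools. One observation, offered as an inference and not something the thread states: 4881176 is exactly twice 2440588, the Julian Day Number of 1 January 1970, which would be consistent with an epoch shift being applied twice.

```python
from datetime import date, timedelta

# Parquet DATE: int32 number of days since the Unix epoch (per the quoted docs).
UNIX_EPOCH = date(1970, 1, 1)

# Julian Day Number of 1970-01-01 (standard astronomical constant).
JULIAN_DAY_OF_EPOCH = 2440588

# Offset subtracted in the workaround from this thread.
# Hypothetical name; the value itself is taken from the email.
DRILL_DATE_SHIFT = 4881176


def parquet_date(days: int) -> date:
    """Interpret an int32 Parquet DATE value as a calendar date."""
    return UNIX_EPOCH + timedelta(days=days)


def corrected_days(raw: int) -> int:
    """Apply the workaround: remove the shift from a raw stored value."""
    return raw - DRILL_DATE_SHIFT


if __name__ == "__main__":
    raw = 4898250  # value parquet-tools showed before the workaround
    fixed = corrected_days(raw)
    print(fixed, parquet_date(fixed))        # 17074 2016-09-30
    print(parquet_date(16652))               # 2015-08-05
    print(DRILL_DATE_SHIFT == 2 * JULIAN_DAY_OF_EPOCH)  # True
```

The corrected values, 17074 and 16652, match what parquet-tools printed after the workaround and map back to the two dates Spark displayed.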
