The version of Hive here is 0.13.1. Timestamps are working fine in Hive with Parquet, though. It looks like the fixes have been backported.
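If you want to double-check the backport on your own cluster, a minimal smoke test along these lines should do it (table and column names are invented, and the INSERT selects from an existing table because Hive 0.13 has no INSERT ... VALUES):

    -- Round-trip a TIMESTAMP through a Parquet-backed table; with the
    -- backported fixes the value should come back unchanged.
    CREATE TABLE ts_check (id INT, ts TIMESTAMP) STORED AS PARQUET;
    INSERT INTO TABLE ts_check
    SELECT 1, CAST('2015-02-04 03:51:00' AS TIMESTAMP)
    FROM some_existing_table LIMIT 1;
    SELECT id, ts FROM ts_check;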
Release notes: http://archive.cloudera.com/cdh5/cdh/5/hive-0.13.1-cdh5.3.0.releasenotes.html

On Wed, Feb 4, 2015 at 3:51 AM, Raviprasad N Pentakota <[email protected]> wrote:

> Hi,
> There were known issues with previous versions of Hive with the Parquet
> and timestamp combination. Please check which version of Hive you are
> using in your cluster.
>
> Regards,
> Ravi Prasad Pentakota
> India Software Lab, IBM Software Group
> Phone: +9180-43328520 Mobile: 919620959477
> e-mail: [email protected]
>
> From: Joshua Baxter <[email protected]>
> To: [email protected]
> Date: 02/03/2015 09:05 PM
> Subject: Re: --as-parquet-file, Oraoop and Decimal and Timestamp types
>
> I've had a little more luck with this after upgrading to CDH 5.3. The
> Oracle direct connector seems to be working well with HCatalog integration
> and the various output file formats. However, it seems that Parquet doesn't
> work with HCatalog integration. When using "stored as parquet" as the
> --hcatalog-storage-stanza, all the mappers are erroring with the below.
>
> 15/02/02 17:17:03 INFO mapreduce.Job: Task Id : attempt_1422914679712_0003_m_000042_1, Status : FAILED
> Error: java.lang.RuntimeException: Should never be used
>     at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:79)
>     at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:103)
>     at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:644)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> Has anyone had any luck sqooping directly to Parquet with Decimal and
> Timestamp types?
>
> On Tue, Dec 2, 2014 at 6:17 PM, Joshua Baxter <[email protected]> wrote:
>
> I'm using Sqoop, Oraoop and the --as-parquet-file switch to pull down
> partitions of a large fact table and getting some great speed. There are
> not any columns I can evenly split by with the default connector, but with
> Oraoop I can get evenly sized Parquet files that I can use directly in
> Impala and Hive without incurring remote reads. A couple of things I have
> noticed, though:
>
> - Decimal fields are getting exported as strings. SQOOP-1445 refers to
>   this, but it sounds like a fix isn't planned due to the HCatalog
>   support. Unfortunately the direct connectors, apart from Netezza, are
>   not currently supported.
> - You need to use the option -Doraoop.timestamp.string=false, otherwise
>   you get a Not in union ["long","null"]: 2014-07-24 00:00:00 exception
>   due to the intermediary file format. However, the resulting Parquet
>   column is a double rather than a Hive- or Impala-compatible timestamp
>   (a casting sketch follows below).
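For what it's worth, one way to make such files queryable in the meantime is a view that casts the landed types back. This is only a sketch with invented names, and it assumes the double written with -Doraoop.timestamp.string=false holds seconds since the epoch; verify that against a known row first, and divide by 1000 before the cast if it turns out to be milliseconds.

    -- Sketch only: fact_parquet and its columns are stand-ins for the real
    -- table, and DECIMAL(38,10) is a guessed precision/scale.
    CREATE VIEW fact_v AS
    SELECT
      col1, col2, col3, col4,
      CAST(col5 AS DECIMAL(38,10)) AS col5,  -- decimal landed as string (SQOOP-1445)
      CAST(col6 AS TIMESTAMP)      AS col6   -- Hive reads a double as epoch seconds here
    FROM fact_parquet;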
>
> Here is what I am running now:
>
> sqoop import -Doraoop.chunk.method=ROWID \
>   -Doraoop.timestamp.string=false \
>   -Doraoop.import.partitions=${PARTITION} \
>   --direct \
>   --connect jdbc:oracle:thin:@//${DATABASE} \
>   --table "${TABLE}" \
>   --columns COL1,COL2,COL3,COL4,COL5,COL6 \
>   --map-column-java COL1=Long,COL2=Long,COL3=Long,COL4=Long \
>   --m 48 \
>   --target-dir /user/joshba/LANDING_PAD/TABLE-${PARTITION}/ \
>   --delete-target-dir
>
> COL1-4 are stored as NUMBER(38,0) but don't hold anything more than the
> size of a long, so I've remapped those to save space. COL5 is a Decimal
> and COL6 is a DATE. Is there any way I can remap these as well so that
> they are written into the Parquet file as DECIMAL and timestamp-compatible
> types respectively, so there isn't a need to redefine these columns?
>
> Many Thanks
>
> Josh
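Until Sqoop and Oraoop can write Parquet DECIMAL and TIMESTAMP natively, a possible stopgap is to declare an external table whose types match what actually lands on disk and put the casting view sketched earlier on top of it. Everything below is illustrative: the table name, the guessed column types, and the partition value in the path.

    -- Sketch only: types mirror what the import actually writes, not what
    -- the Oracle table declares. Point LOCATION at a real partition dir.
    CREATE EXTERNAL TABLE fact_parquet (
      col1 BIGINT, col2 BIGINT, col3 BIGINT, col4 BIGINT,
      col5 STRING,  -- Decimal exported as string (SQOOP-1445)
      col6 DOUBLE   -- DATE exported as double via -Doraoop.timestamp.string=false
    )
    STORED AS PARQUET
    LOCATION '/user/joshba/LANDING_PAD/TABLE-20141201/';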
