RE: Accessing Hive "date" column in Pig

Gufran Mohammed Pathan Mon, 10 Nov 2014 06:36:18 -0800

Hi,



Being able to use HCatLoader() is becoming critical for me, as it seems to be 
the only way I can access ORC files. Does anyone have any ideas on how I can 
resolve either of the following issues:



1.       Use the OrcStorage() function in Pig 0.12.1. (Apparently PIG-3558, 
which added ORC support to Pig, is included in the HDP 2.1 distribution - 
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.7-pig.html).
However, executing A = load 'orc_file_path' using OrcStorage(); throws a 
ClassNotFoundException for the class 
"org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat".
Am I missing any Hive jars that I should be registering?



2.       Read ORC files using HCatLoader() for tables that have a "date" 
column. I'm facing the issue documented below. Is there any way I can change 
the metadata so that a column's "date" type becomes "chararray" or something 
else? I tried projecting the relation loaded with HCatLoader() into a 
Pig-supported schema, but it still didn't work (same error as documented 
below).



Any help on these issues would be much appreciated. Thanks in advance!



Gufran Pathan| +91 7760913355|



-----Original Message-----
From: Gufran Mohammed Pathan [mailto:gufran.path...@mu-sigma.com]
Sent: Saturday, November 08, 2014 6:50 PM
To: user@pig.apache.org
Subject: Accessing Hive "date" column in Pig



Hi,



I'm facing a problem reading Hive tables that have a "date" column from Pig. 
I get the error "java.lang.NoSuchMethodError: 
org.joda.time.DateTime.<init>(IIIII)V".



The relevant YARN logs for the job are below:



2014-11-08 22:45:17,132 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2014-11-08 22:45:17,163 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed org.apache.hive.hcatalog.mapreduce.HCatSplit@4248d35a
2014-11-08 22:45:17,278 INFO [main] org.apache.hive.hcatalog.mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe with properties {name=default.testdate4, numFiles=1, field.delim=,, columns.types=date, serialization.format=,, columns=dt, rawDataSize=0, numRows=0, serialization.lib=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, COLUMN_STATS_ACCURATE=true, totalSize=77, serialization.null.format=\N, transient_lastDdlTime=1415395168}
2014-11-08 22:45:17,364 INFO [main] org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
2014-11-08 22:45:17,411 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: A[1,4] C:  R:

2014-11-08 22:45:17,442 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.joda.time.DateTime.<init>(IIIII)V
                at org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:420)
                at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:456)
                at org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:374)
                at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
                at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:59)
                at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
                at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
                at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
                at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
                at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
                at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
                at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
                at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)





I ran the following Pig code:



pig -useHCatalog

A = load 'testdate4' using org.apache.hive.hcatalog.pig.HCatLoader();

dump A;





The "testdate4" Hive table details are as follows:





0: jdbc:hive2://hdpbox.musigma:10000> desc testdate4;

+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| dt        | date       |          |
+-----------+------------+----------+





0: jdbc:hive2://*****.*****:10000> select * from testdate4;

+---------------+
| testdate4.dt  |
+---------------+
| 2011-09-01    |
| 2012-08-11    |
| 1992-10-18    |
| 1996-03-20    |
| 2007-07-23    |
| 2008-04-15    |
| 2014-12-30    |
+---------------+



Could it be related to the joda-time version, as discussed in 
https://issues.apache.org/jira/browse/PIG-3953?
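
One way to test this theory, as a minimal sketch only: the descriptor 
(IIIII)V in the error denotes a constructor taking five int arguments, so a 
small reflection check can report whether the org.joda.time.DateTime visible 
on a given classpath has that constructor, and which jar it was loaded from. 
The class name JodaCtorCheck and the helper check() are hypothetical names 
made up for this illustration, and the result is only meaningful when the 
check runs with the same classpath that the map task sees.

```java
import java.security.CodeSource;

public class JodaCtorCheck {

    /**
     * Reports whether the named class exposes a public constructor taking
     * five ints, plus the location the class was loaded from.
     */
    static String check(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // CodeSource is null for classes loaded by the bootstrap loader.
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String origin = (src == null)
                    ? "bootstrap/unknown"
                    : String.valueOf(src.getLocation());
            try {
                // Same signature the NoSuchMethodError complains about.
                cls.getConstructor(int.class, int.class, int.class,
                                   int.class, int.class);
                return "FOUND in " + origin;
            } catch (NoSuchMethodException e) {
                return "MISSING in " + origin;
            }
        } catch (ClassNotFoundException e) {
            return "CLASS NOT ON CLASSPATH";
        }
    }

    public static void main(String[] args) {
        System.out.println(check("org.joda.time.DateTime"));
    }
}
```

If this reports MISSING together with a jar location, that jar would 
presumably be the joda-time copy being picked up ahead of the one 
HCatLoader expects.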



Any ideas on how I can resolve this?



Version details:

Hive - 0.13.0.2.1.7.0-784

Pig - 0.12.1.2.1.7.0-784

HDP - 2.1



P.S. I can't use "org.apache.hcatalog.pig.HCatLoader()" either, as it does not 
support the "date" type (ERROR 1200: Type date not present).



Gufran Pathan| +91 7760913355|



Disclaimer: http://www.mu-sigma.com/disclaimer.html
