> select count(*) from alexa_parquet;
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.tokenize(TypeInfoUtils.java:274)
>   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.<init>(TypeInfoUtils.java:293)
>   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:764)
>   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getColumnTypes(DataWritableReadSupport.java:76)
>   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:220)
>   at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
This might be an NPE triggered by a specific case of the type parser. I tested it on my current build with simple types and could not reproduce it, so a repro would need more detail on the column types:

hive> create temporary table x (x int) stored as parquet;
hive> insert into x values(1),(2);
hive> select count(*) from x where x.x > 1;
Status: DAG finished successfully in 0.18 seconds
OK
1
Time taken: 0.792 seconds, Fetched: 1 row(s)
hive>

Do you have INT96 in the schema?

> I'm currently evaluating Hive on Tez as an alternative to keeping the
> SparkSQL thrift server running all the time locking up resources.

Tez has a tunable for this in tez.am.session.min.held-containers (i.e. something small like 10). And HiveServer2 can be made to work similarly, because Spark's HiveThriftServer2.scala is a wrapper around Hive's ThriftBinaryCLIService.

Cheers,
Gopal
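
P.S. In case it helps, a minimal sketch of how that held-containers knob would look in tez-site.xml -- the value 10 here is just the illustrative small number from above, not a tuned recommendation:

  <property>
    <!-- Minimum number of containers the Tez AM keeps warm between queries
         in session mode; trades idle cluster capacity for lower query latency. -->
    <name>tez.am.session.min.held-containers</name>
    <value>10</value>
  </property>

The same property can be set per-session from the hive CLI with
"set tez.am.session.min.held-containers=10;" before submitting queries.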