> select count(*) from alexa_parquet;
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.tokenize(TypeInfoUtils.java:274)
>   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.<init>(TypeInfoUtils.java:293)
>   at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:764)
>   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getColumnTypes(DataWritableReadSupport.java:76)
>   at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:220)
>   at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
This might be an NPE triggered by a specific case of the type parser. I tested it on my current build with simple types and could not reproduce it, so a repro would need more detail on the column types:

hive> create temporary table x (x int) stored as parquet;
hive> insert into x values(1),(2);
hive> select count(*) from x where x.x > 1;
Status: DAG finished successfully in 0.18 seconds
OK
1
Time taken: 0.792 seconds, Fetched: 1 row(s)
hive>

Do you have INT96 in the schema?

> I'm currently evaluating Hive on Tez as an alternative to keeping the
> SparkSQL thrift server running all the time locking up resources.

Tez has a tunable for this in tez.am.session.min.held-containers (i.e. something small like 10). And HiveServer2 can be made to work similarly, because Spark's HiveThriftServer2.scala is a wrapper around Hive's ThriftBinaryCLIService.

Cheers,
Gopal
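
P.S. In case it helps, a minimal sketch of how that held-containers knob would look in tez-site.xml -- the value 10 here is just the illustrative small number from above, not a tuned recommendation:

  <property>
    <!-- Minimum number of containers the Tez AM keeps warm between queries
         in session mode; trades idle cluster capacity for lower query latency. -->
    <name>tez.am.session.min.held-containers</name>
    <value>10</value>
  </property>

The same property can be set per-session from the hive CLI with
"set tez.am.session.min.held-containers=10;" before submitting queries.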