Hello,

I was wondering if there is a way to force one representation or the other for 
tables in the Hive metastore. Some of our data can't be persisted in a 
Hive-compatible way, so Spark falls back to its Spark SQL specific format, 
leaving some of our tables stored in the Spark SQL format and some in the Hive 
format. It'd be nice if we could force it to always use the Spark SQL format, 
as changing the underlying data would be difficult.
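
To illustrate the kind of write I mean (a simplified sketch in Scala with a 
placeholder input path; the table name and partition columns are taken from 
the logs below, but this isn't our actual job):

  import org.apache.spark.sql.SparkSession

  // saveAsTable() registers the table in the Hive metastore; Spark first
  // tries a Hive-compatible layout and falls back to its Spark SQL specific
  // format when the schema or format can't be represented in Hive.
  val spark = SparkSession.builder()
    .appName("example")
    .enableHiveSupport()
    .getOrCreate()

  val df = spark.read.parquet("s3a://bucket/raw/")  // placeholder path

  df.write
    .format("parquet")
    .partitionBy("updatedTimeYear", "updatedTimeMonth",
                 "updatedTimeDay", "updatedTimeHour")
    .saveAsTable("default.tablenamehere")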

Some tables show this (Spark SQL format): 

17/05/18 21:50:29 WARN HiveExternalCatalog: Could not persist 
`default`.`tablenamehere` in a Hive compatible way. Persisting it into Hive 
metastore in Spark SQL specific format.

Others show this (Hive format):

17/05/18 21:52:27 INFO CatalystSqlParser: Parsing command: int
17/05/18 21:52:27 INFO CatalystSqlParser: Parsing command: int
17/05/18 21:52:27 INFO CatalystSqlParser: Parsing command: int
17/05/18 21:52:27 INFO CatalystSqlParser: Parsing command: int
17/05/18 21:52:27 INFO CatalystSqlParser: Parsing command: bigint
17/05/18 21:52:27 INFO CatalystSqlParser: Parsing command: 
struct<srcMac:binary,dstMac:binary,srcIp:binary,dstIp:binary,srcPort:int,dstPort:int,proto:string,layer3Proto:int,layer4Proto:int>

Another question, about persisting to S3: I'm getting the following for all of 
the tables: 

Caused by: MetaException(message:java.io.IOException: Got exception: 
java.io.IOException 
/username/sys_encrypted/staging/raw/updatedTimeYear=2017/updatedTimeMonth=5/updatedTimeDay=16/updatedTimeHour=23
 doesn't exist)
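
In case it's relevant, I can check from the same session whether that 
partition path is visible using the Hadoop FileSystem API (just a debugging 
sketch; the path is copied from the error, and which filesystem it resolves 
to depends on our S3 configuration):

  import org.apache.hadoop.fs.Path

  // Ask the filesystem Spark itself is configured with whether the
  // partition directory the metastore complains about actually exists.
  val p = new Path("/username/sys_encrypted/staging/raw/updatedTimeYear=2017/" +
    "updatedTimeMonth=5/updatedTimeDay=16/updatedTimeHour=23")
  val fs = p.getFileSystem(spark.sparkContext.hadoopConfiguration)
  println(s"exists = ${fs.exists(p)}")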

Thanks!
Justin
