Hi All.

I have a partitioned table in Hive. The use case is to drop one of the
partitions before inserting new data each time the Spark process runs. I
am using HiveContext to read and write (with dynamic partitions) and also
to alter the table to drop the partition before the insert. Everything runs
fine the first time, when the partition being inserted did not exist
before. However, if the partition existed and was dropped by the ALTER
TABLE command in the same process, the insert fails with the error
"FileNotFoundException: File does not exist: <hdfs table
location>/part_col=val1/part-00000". When the program is rerun as-is, it
succeeds, since the partition no longer exists when the process starts up.
This is Spark 1.3.0 on CDH 5.4.0.
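
For concreteness, here is a minimal sketch of the flow (table and column
names other than part_col are stand-ins; the actual statements in my job
differ in detail):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object DropThenInsert {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("DropThenInsert"))
    val hiveContext = new HiveContext(sc)

    // Enable dynamic partition inserts.
    hiveContext.setConf("hive.exec.dynamic.partition", "true")
    hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

    // Drop the target partition before re-inserting it.
    hiveContext.sql(
      "ALTER TABLE my_table DROP IF EXISTS PARTITION (part_col = 'val1')")

    // Dynamic partition insert. When the partition existed and was
    // dropped just above, this step fails with the FileNotFoundException
    // on .../part_col=val1/part-00000.
    hiveContext.sql(
      """INSERT OVERWRITE TABLE my_table PARTITION (part_col)
        |SELECT col1, col2, part_col FROM staging_table""".stripMargin)

    sc.stop()
  }
}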

Things I have tried:
- Putting a pause of up to 1 minute between the ALTER TABLE and the insert,
to give any possibly pending async background task time to finish.
- Creating a new HiveContext object and calling the insert with it, i.e.
calling the drop partition and the insert through separate HiveContext
objects (sketched below, after this list). The hope was that a new
HiveContext would be created with the correct state of the Hive metastore
at that moment and the insert would work.
- Creating a new SparkContext and a new HiveContext. More of a shot in the
dark: create a fresh set of contexts after the ALTER TABLE to try and
reload the state at that point in time.
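
In spark-shell style, attempt 2 looks roughly like this (attempt 3 was the
same, but with the SparkContext stopped and rebuilt between the drop and
the insert); the names are the same stand-ins as above:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("DropThenInsertRetry"))

// Drop the partition through the first HiveContext.
val dropCtx = new HiveContext(sc)
dropCtx.sql("ALTER TABLE my_table DROP IF EXISTS PARTITION (part_col = 'val1')")

// Insert through a second, freshly created HiveContext, hoping it picks
// up the post-drop metastore state.
val insertCtx = new HiveContext(sc)
insertCtx.setConf("hive.exec.dynamic.partition", "true")
insertCtx.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
insertCtx.sql(
  """INSERT OVERWRITE TABLE my_table PARTITION (part_col)
    |SELECT col1, col2, part_col FROM staging_table""".stripMargin)
// Still fails with the same FileNotFoundException.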

None of these have worked so far.

Any ideas, suggestions, or experiences along similar lines?

-- 
Regards,
Ranadip Chatterjee
