Problems with Hive Streaming. Compactions not working. Out of memory errors.

Diego Fustes Villadóniga Tue, 29 Nov 2016 00:06:12 -0800

Hi all,

We are trying to use Hive streaming to ingest data in real time from Flink. We 
send a batch of data to Hive every 5 seconds. We are working with version 
1.1.0-cdh5.8.2.


Ingestion works fine. However, compactions are not working; the log shows 
this error:

Unable to select next element for compaction, ERROR: could not serialize access 
due to concurrent update

In addition, when we run simple queries such as SELECT COUNT(1) FROM events, we 
get OutOfMemory errors, even though we have assigned 10 GB to each 
Mapper/Reducer. According to the logs, each map task tries to load all delta 
files until it fails, which does not make much sense to me.
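
To give an idea of the scale, here is a rough estimate of how many delta 
directories accumulate while compaction is not running. It assumes that each 
batch we send produces one delta directory, which is only an assumption (the 
real number depends on how the transaction batches are configured). The 
function name is purely illustrative.

```python
# Rough estimate of how many delta directories pile up when compaction never runs.
# Assumption (not confirmed by our logs): every batch we send ends up as one
# delta directory in the table or partition it is written to.

SECONDS_PER_DAY = 24 * 60 * 60
BATCH_INTERVAL_SECONDS = 5  # we send one batch every 5 seconds


def uncompacted_deltas(days: float, interval_seconds: int = BATCH_INTERVAL_SECONDS) -> int:
    """Number of delta directories written in `days` days with no compaction."""
    return int(days * SECONDS_PER_DAY // interval_seconds)


if __name__ == "__main__":
    for days in (1, 7):
        print(f"{days} day(s): {uncompacted_deltas(days)} delta directories")
```

Under that assumption, a single day without compaction would already leave 
17280 delta directories for a query to read.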


I think we have followed all the steps described in the documentation, so 
we are blocked at this point.

Could you help us?
