Hi all,

We are trying to use Hive streaming to ingest data in real time from Flink. We send a batch of data to Hive every 5 seconds. We are working with version 1.1.0-cdh5.8.2.
The ingestion works fine. However, compactions are not working. The log shows this error:

Unable to select next element for compaction, ERROR: could not serialize access due to concurrent update

In addition, when we run simple queries like SELECT COUNT(1) FROM events, we get OutOfMemory errors, even though we have assigned 10 GB to each Mapper/Reducer. According to the logs, each map task tries to load all of the delta files until it fails, which does not make much sense to me.

I think we have followed all the steps described in the documentation, so we are blocked at this point. Could you help us?