The OOM is most likely a side effect of not running compactions. Without compactions you never reduce the number of delta files that need to be read to materialize the data set on read.
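For reference, a minimal sketch of how to check on the compactor and force a compaction by hand, assuming the table from the original mail is called events and is not partitioned (a partitioned table needs a PARTITION (...) clause on the ALTER):

    -- List queued, running, and finished compactions known to the metastore.
    SHOW COMPACTIONS;

    -- Request a major compaction manually; a compactor Worker thread on the
    -- metastore picks it up and rewrites the base + delta files into a new base.
    ALTER TABLE events COMPACT 'major';

For the automatic path, hive.compactor.initiator.on must be true and hive.compactor.worker.threads must be greater than 0 on the metastore, otherwise no compaction is ever scheduled.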
On 11/29/16, 10:03 AM, "Alan Gates" <alanfga...@gmail.com> wrote:

>I'm guessing that this is an issue in the metastore database where it is
>unable to read from the transaction tables due to the ingestion rate.
>What version of Hive are you using? What database are you storing the
>metadata in?
>
>Alan.
>
>> On Nov 29, 2016, at 00:05, Diego Fustes Villadóniga <dfus...@oesia.com> wrote:
>>
>> Hi all,
>>
>> We are trying to use Hive streaming to ingest data in real time from
>> Flink. We send batches of data every 5 seconds to Hive. We are working
>> with version 1.1.0-cdh5.8.2.
>>
>> The ingestion works fine. However, compactions are not working; the log
>> shows this error:
>>
>> Unable to select next element for compaction, ERROR: could not
>> serialize access due to concurrent update
>>
>> In addition, when we run simple queries like SELECT COUNT(1) FROM
>> events, we are getting OutOfMemory errors, even though we have assigned
>> 10GB to each mapper/reducer. Looking at the logs, each map task tries
>> to load all delta files until it breaks, which does not make much sense
>> to me.
>>
>> I think that we have followed all the steps described in the
>> documentation, so we are blocked at this point.
>>
>> Could you help us?
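Since the original mail says the documented steps were followed, it may still be worth double-checking the table definition against what streaming ingest requires: a bucketed, ORC-backed table marked transactional. A sketch (column names and bucket count here are illustrative, not taken from the original mail):

    -- Minimal ACID table layout for Hive streaming ingest.
    CREATE TABLE events (
      id BIGINT,
      payload STRING
    )
    CLUSTERED BY (id) INTO 4 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true');

The client side also needs hive.txn.manager set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager and hive.support.concurrency set to true, otherwise the transaction machinery is bypassed entirely.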