The OOM is most likely a side effect of not running compactions. Without compactions you never reduce the number of delta files that need to be read to materialize the data set on read.
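For reference, a minimal sketch of how to check on the compactor and force a compaction by hand, assuming the table from the original mail is called events and is not partitioned (a partitioned table needs a PARTITION (...) clause on the ALTER):

    -- List queued, running, and finished compactions known to the metastore.
    SHOW COMPACTIONS;

    -- Request a major compaction manually; a compactor Worker thread on the
    -- metastore picks it up and rewrites the base + delta files into a new base.
    ALTER TABLE events COMPACT 'major';

For the automatic path, hive.compactor.initiator.on must be true and hive.compactor.worker.threads must be greater than 0 on the metastore, otherwise no compaction is ever scheduled.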
On 11/29/16, 10:03 AM, "Alan Gates" <alanfga...@gmail.com> wrote:

>I'm guessing that this is an issue in the metastore database where it is
>unable to read from the transaction tables due to the ingestion rate.
>What version of Hive are you using? What database are you storing the
>metadata in?
>
>Alan.
>
>> On Nov 29, 2016, at 00:05, Diego Fustes Villadóniga <dfus...@oesia.com> wrote:
>>
>> Hi all,
>>
>> We are trying to use Hive streaming to ingest data in real time from
>> Flink. We send batches of data every 5 seconds to Hive. We are working
>> with version 1.1.0-cdh5.8.2.
>>
>> The ingestion works fine. However, compactions are not working; the log
>> shows this error:
>>
>> Unable to select next element for compaction, ERROR: could not
>> serialize access due to concurrent update
>>
>> In addition, when we run simple queries like SELECT COUNT(1) FROM
>> events, we are getting OutOfMemory errors, even though we have assigned
>> 10GB to each mapper/reducer. Looking at the logs, each map task tries
>> to load all delta files until it breaks, which does not make much sense
>> to me.
>>
>> I think that we have followed all the steps described in the
>> documentation, so we are blocked at this point.
>>
>> Could you help us?
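Since the original mail says the documented steps were followed, it may still be worth double-checking the table definition against what streaming ingest requires: a bucketed, ORC-backed table marked transactional. A sketch (column names and bucket count here are illustrative, not taken from the original mail):

    -- Minimal ACID table layout for Hive streaming ingest.
    CREATE TABLE events (
      id BIGINT,
      payload STRING
    )
    CLUSTERED BY (id) INTO 4 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true');

The client side also needs hive.txn.manager set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager and hive.support.concurrency set to true, otherwise the transaction machinery is bypassed entirely.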