Re: FlinkSQL 1.11.1 reading from Kafka and writing to Hive (Parquet): OOM problem

Jingsong Li Tue, 15 Sep 2020 22:41:47 -0700

Could you try doing a keyBy on hashtid before writing?
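
To illustrate why a keyBy might help, here is a minimal sketch in plain Java (it is not Flink code and does not use the Flink API). It counts how many distinct hashtid partitions, and therefore open Parquet writers, a single sink subtask would see with and without routing by hashtid. The class name, method name, bucket count and parallelism are all hypothetical, and the modulo routing is a simplification: Flink's real keyBy hashes keys into key groups, so the actual split would be less even.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WriterCountSketch {

    /**
     * Returns the largest number of distinct hashtid partitions that any one
     * sink subtask receives. Each distinct partition stands for one open writer.
     */
    static int maxOpenWriters(int hashBuckets, int parallelism, boolean keyByHashtid) {
        List<Set<Integer>> partitionsPerSubtask = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitionsPerSubtask.add(new HashSet<>());
        }
        // Several rounds of records, each round containing every hashtid once.
        for (int round = 0; round < parallelism; round++) {
            for (int hashtid = 0; hashtid < hashBuckets; hashtid++) {
                int subtask = keyByHashtid
                        ? hashtid % parallelism            // simplified keyBy: same key, same subtask
                        : (hashtid + round) % parallelism; // no keyBy: every key reaches every subtask
                partitionsPerSubtask.get(subtask).add(hashtid);
            }
        }
        int max = 0;
        for (Set<Integer> partitions : partitionsPerSubtask) {
            max = Math.max(max, partitions.size());
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 100 hashtid values, sink parallelism 4.
        System.out.println(maxOpenWriters(100, 4, false)); // prints 100
        System.out.println(maxOpenWriters(100, 4, true));  // prints 25
    }
}
```

Under these assumptions each subtask keeps roughly 1/parallelism of the writers open once records are routed by hashtid, which is the effect the keyBy is meant to achieve.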

Best,
Jingsong


On Wed, Sep 16, 2020 at 9:36 AM wangenbao <[email protected]> wrote:

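> A question for the experts here:
> Has anyone run into the following problem?
>
> 1. I first read PB-format data from Kafka through the Table API, convert it into POJO objects, and register the result as a View;
> 2. Then I INSERT INTO a Hive table (Parquet format, Snappy compression) with three partition levels (day, hour, hashtid);
> 3. When the data is spread across relatively many partitions, an OOM occurs. Specifically, the log shows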
> parquet.hadoop.MemoryManager: Total allocation exceeds 50.00% (2,102,394,880 bytes) of heap memory
> Scaling row group sizes to 13.62% for 115 writers
> followed by java.lang.OutOfMemoryError: Java heap space
>
> I think the cause is the large number of Parquet writers. Has anyone seen a similar problem, and how can it be solved?
>
>
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
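
The numbers in the quoted log can be checked with a little arithmetic. The sketch below assumes each writer requested the parquet-mr default row group size of 128 MiB (the thread does not say whether `parquet.block.size` was overridden) and that the memory manager scales every writer by pool size divided by total requested size. The class and method names are hypothetical.

```java
public class RowGroupScalingSketch {

    // Assumption: parquet-mr default row group (block) size, not confirmed by the thread.
    static final long DEFAULT_ROW_GROUP_BYTES = 128L * 1024 * 1024;

    /** Scale factor applied to every writer once total requests exceed the pool. */
    static double scale(long poolBytes, int writers, long rowGroupBytes) {
        long requested = writers * rowGroupBytes;
        return requested <= poolBytes ? 1.0 : (double) poolBytes / requested;
    }

    public static void main(String[] args) {
        long poolBytes = 2_102_394_880L; // "50.00% ... of heap memory" from the log
        int writers = 115;               // writer count from the log

        double s = scale(poolBytes, writers, DEFAULT_ROW_GROUP_BYTES);
        long perWriterBytes = (long) (DEFAULT_ROW_GROUP_BYTES * s);

        System.out.printf("scale = %.2f%%%n", s * 100);          // prints scale = 13.62%
        System.out.println("bytes per writer = " + perWriterBytes);
    }
}
```

The computed 13.62% matches the log line, so the log is at least consistent with 115 writers each asking for a 128 MiB row group from a pool of about 2.1 GB. Each writer is then left with roughly 18 MB, which supports the suspicion that the number of simultaneously open writers is the problem.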



-- 
Best, Jingsong Lee
