date:20200823

Re: flink集成到cdh

2020-08-23 文章啤酒鸭2

mark一下，我也是1.11，目前整合不了



--
Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink 1.11.1 与HDP3.0.1中的hive集成，查询不出hive表数据

2020-08-23 文章 taochanglian

https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/hive/ 
中的 flink-sql-connector-hive-3.1.2 下载了么，放到lib里面了么?


在 2020/8/24 3:01, 黄蓉 写道:

各位好：

我使用的环境是HDP3.0.1的沙盒，flink是最新版本的1.11.1，从官网直接下载的编译好的jar包。我想测试flink与hive的集成，包括查询hive表的数据、写入数据到hive表等操作。目前我遇到问题就是通过flink 
sql 
client查询不出表数据，并且也不报错。但是该表在hive中查询是有记录的。其余的show 
tables，show database等语句都可以正常显示。


配置的hadoop环境变量如下：
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export HADOOP_HOME="/usr/hdp/3.0.1.0-187/hadoop"
export 
HADOOP_CLASSPATH="/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:/usr/hdp/current/hadoop-mapreduce-client/*:/usr/hdp/current/hadoop-mapreduce-client/lib/*"


sql-client配置文件如下：
tables: []
functions: []
catalogs:
   - name: myhive
 type: hive
 hive-conf-dir: /opt/hive-conf
execution:
  planner: blink
  type: batch
  result-mode: table
  max-table-result-rows: 100
  parallelism: 3
  max-parallelism: 128
  min-idle-state-retention: 0
  max-idle-state-retention: 0
  current-catalog: myhive
  current-database: default
  restart-strategy:
    type: fallback
deployment:
  response-timeout: 5000
  gateway-address: ""
  gateway-port: 0


请问出现这种情况是不是官网的flink包与hdp3.0.1不兼容？我需要自己重新编译flink吗？ 



Jessie
jessie...@gmail.com

Re: flink1.11 cdc使用

2020-08-23 文章 china_tao

支持。
insert into mysqlresult select k.vin,k.msgtime,d.brand_name from (SELECT
vin,max(msgtime) as msgtime,max(pts) as pts from kafkaSourceTable  group by
TUMBLE(rowtime, INTERVAL '10' SECOND),vin) AS k left join msyqlDimTable  FOR
SYSTEM_TIME AS OF k.pts AS d ON k.vin = d.vin

类似这样，先开10秒窗口获得kafka数据，然后join msyql维度表，然后插入mysql。
关键就是注意维度表lookup_cache_max-rows，lookup_cache_ttl这两个参数，设置维度表的更新时间。具体项目，具体对待，关键就是看看需要维度表支持多长时间的更新延迟。
另外，join维度表，目前应该只支持pts，不支持rowtime。



--
Sent from: http://apache-flink.147419.n8.nabble.com/

Flink log4j2 问题

2020-08-23 文章 guaishushu1...@163.com

SQL提交会出现这种问题？？？
Caused by: java.lang.IllegalArgumentException: Initial capacity must be at 
least one but was 0
at 
org.apache.logging.log4j.util.SortedArrayStringMap.(SortedArrayStringMap.java:102)
at 
org.apache.logging.log4j.core.impl.ContextDataFactory.createContextData(ContextDataFactory.java:109)
at 
org.apache.logging.log4j.core.impl.ContextDataFactory.(ContextDataFactory.java:57)
... 29 more



guaishushu1...@163.com

Re: flink 1.11.1 与HDP3.0.1中的hive集成，查询不出hive表数据

2020-08-23 文章 Rui Li

Hello,

HDP里的hive版本是多少啊？另外你要查的表是啥样的呢（describe formatted看一下）？

On Mon, Aug 24, 2020 at 3:02 AM 黄蓉  wrote:

> 各位好：
>
>
> 我使用的环境是HDP3.0.1的沙盒，flink是最新版本的1.11.1，从官网直接下载的编译好的jar包。我想测试flink与hive的集成，包括查询hive表的数据、写入数据到hive表等操作。目前我遇到问题就是通过flink
>
> sql client查询不出表数据，并且也不报错。但是该表在hive中查询是有记录的。其余的show tables，show
> database等语句都可以正常显示。
>
> 配置的hadoop环境变量如下：
> export HADOOP_CONF_DIR="/etc/hadoop/conf"
> export HADOOP_HOME="/usr/hdp/3.0.1.0-187/hadoop"
> export
>
> HADOOP_CLASSPATH="/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:/usr/hdp/current/hadoop-mapreduce-client/*:/usr/hdp/current/hadoop-mapreduce-client/lib/*"
>
> sql-client配置文件如下：
> tables: []
> functions: []
> catalogs:
> - name: myhive
>   type: hive
>   hive-conf-dir: /opt/hive-conf
> execution:
>planner: blink
>type: batch
>result-mode: table
>max-table-result-rows: 100
>parallelism: 3
>max-parallelism: 128
>min-idle-state-retention: 0
>max-idle-state-retention: 0
>current-catalog: myhive
>current-database: default
>restart-strategy:
>  type: fallback
> deployment:
>response-timeout: 5000
>gateway-address: ""
>gateway-port: 0
>
>
> 请问出现这种情况是不是官网的flink包与hdp3.0.1不兼容？我需要自己重新编译flink吗？
>
> Jessie
> jessie...@gmail.com
>
>

-- 
Best regards!
Rui Li

Re: 1.11版本，关于TableEnvironment.executeSql("insert into ...")，job名称设置的问题

2020-08-23 文章 DanielGu

>我通过表环境执行insert into语句提交作业，我该如何设置我的job名称呢？ 

Hi,
社区也有相关讨论
https://issues.apache.org/jira/browse/FLINK-18545
之前在有看到一些相关的解决方案,可以参考
https://www.jianshu.com/p/5981646cb1d4
https://www.cnblogs.com/Springmoon-venn/p/13375972.html
Best,
DanielGu



--
Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink on yarn配置问题

2020-08-23 文章赵一旦

比如今天尝试了一波命令：./bin/yarn-session.sh -nm test_flink -q -qu upd_security -s 1
-tm 3024MB -jm 3024MB
同时我设置了 export HADOOP_USER_NAME=xxx
，这个在启动的时候会看到日志：org.apache.flink.runtime.security.modules.HadoopModule  -
Hadoop user set to upd_security (auth:SIMPLE)。

然后报错：

2020-08-24 10:52:31 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli  -
Error while running the Flink session.
java.lang.RuntimeException: Couldn't get cluster description
at
org.apache.flink.yarn.YarnClusterDescriptor.getClusterDescription(YarnClusterDescriptor.java:1254)
at
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:534)
at
org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at
org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
Caused by: java.lang.NullPointerException: null
at
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getChildQueues(YarnClientImpl.java:587)
at
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getAllQueues(YarnClientImpl.java:557)
at
org.apache.flink.yarn.YarnClusterDescriptor.getClusterDescription(YarnClusterDescriptor.java:1247)
... 7 common frames omitted


 The program finished with the following exception:

java.lang.RuntimeException: Couldn't get cluster description
at
org.apache.flink.yarn.YarnClusterDescriptor.getClusterDescription(YarnClusterDescriptor.java:1254)
at
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:534)
at
org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$5(FlinkYarnSessionCli.java:785)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
at
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at
org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:785)
Caused by: java.lang.NullPointerException
at
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getChildQueues(YarnClientImpl.java:587)
at
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getAllQueues(YarnClientImpl.java:557)
at
org.apache.flink.yarn.YarnClusterDescriptor.getClusterDescription(YarnClusterDescriptor.java:1247)
... 7 more





caozhen  于2020年8月24日周一 上午10:00写道：

> 报错是 AM申请资源时vcore不够
>
> 1、可以确认当前队列是否有足够的vcore
> 2、确认当前队列允许允许的最大application数
>
> 我之前遇到这个问题是队列没有配置好资源导致
>
>
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/

?????? 1.11??????????TableEnvironment.executeSql("insert into ...")??job??????????????

2020-08-23 文章 Asahi Lee

??sqljob??




----
??: 
   "user-zh"

Re: flink on yarn配置问题

2020-08-23 文章 caozhen

报错是 AM申请资源时vcore不够

1、可以确认当前队列是否有足够的vcore
2、确认当前队列允许允许的最大application数

我之前遇到这个问题是队列没有配置好资源导致



--
Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink on yarn配置问题

2020-08-23 文章 caozhen

报错是 AM申请资源时vcore不够

1、可以确认当前队列是否有足够的vcore
2、确认当前队列允许允许的最大application数

我之前遇到这个问题是队列没有配置好资源导致



--
Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink-sql-gateway还会更新吗

2020-08-23 文章 shougou

也有同样的问题，同时也问一下，sql client 计划在哪个版本支持gateway模式？多谢



--
Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink on yarn配置问题

2020-08-23 文章 caozhen


报错是申请AM时vcore不够

1、可以确认下当前队列是否有剩余vcore数
2、确认当前队列允许的最大应用数是否超了


之前遇到过这个问题原因是队列没有分配资源，跟你的可能不一样




--
Sent from: http://apache-flink.147419.n8.nabble.com/

flink 1.11.1 与HDP3.0.1中的hive集成，查询不出hive表数据

2020-08-23 文章黄蓉


各位好：

   
我使用的环境是HDP3.0.1的沙盒，flink是最新版本的1.11.1，从官网直接下载的编译好的jar包。我想测试flink与hive的集成，包括查询hive表的数据、写入数据到hive表等操作。目前我遇到问题就是通过flink 
sql client查询不出表数据，并且也不报错。但是该表在hive中查询是有记录的。其余的show tables，show 
database等语句都可以正常显示。


配置的hadoop环境变量如下：
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export HADOOP_HOME="/usr/hdp/3.0.1.0-187/hadoop"
export 
HADOOP_CLASSPATH="/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:/usr/hdp/current/hadoop-mapreduce-client/*:/usr/hdp/current/hadoop-mapreduce-client/lib/*"


sql-client配置文件如下：
tables: []
functions: []
catalogs:
   - name: myhive
 type: hive
 hive-conf-dir: /opt/hive-conf
execution:
  planner: blink
  type: batch
  result-mode: table
  max-table-result-rows: 100
  parallelism: 3
  max-parallelism: 128
  min-idle-state-retention: 0
  max-idle-state-retention: 0
  current-catalog: myhive
  current-database: default
  restart-strategy:
type: fallback
deployment:
  response-timeout: 5000
  gateway-address: ""
  gateway-port: 0


请问出现这种情况是不是官网的flink包与hdp3.0.1不兼容？我需要自己重新编译flink吗？

Jessie
jessie...@gmail.com

Re: flink on yarn配置问题

2020-08-23 文章魏烽

Hi

一旦，提交任务的命令有嘛，可以发出来看看

或者在提交的时候指定一下提交任务到哪个队列

 原始邮件
发件人: 赵一旦
收件人: user-zh
发送时间: 2020年8月23日(周日) 22:58
主题: Re: flink on yarn配置问题


嗯，直观看是这个问题。想知道这个问题有啥常见原因？这个报错只是最终原因，但不一定是直接原因。因为这个yarn集群不可能没资源，我只是简单实验下，我们的yarn是个超级集群，不可能没资源。
我猜测会不会是其他问题，比如yarn队列不对，导致没资源？再或者不清楚可不可能与yarn的鉴权有关，我们的yarn集群应该是有用户权限和资源配额限制的，但理论上我是从另外一个集群上抄的配置，不清楚有没有搞错。
原机器是用于提交spark任务的，我主要复制了hadoop部分（yarn）到另一个机器（B），用B这台机器计划做flink任务的提交。

Zou Dan mailto:zoud...@163.com>> 于2020年8月23日周日 下午2:16写道：

> Hi, 一旦， root cause 应该是下面这个日志
> The number of requested virtual cores for application master 1 exceeds the
> maximum number of virtual cores 0 available in the Yarn Cluster.
>
> 我简单看了一下代码，应该是你们 yarn 节点上没有可用的资源，numYarnMaxVcores = 0
>
> > 2020年8月21日 下午11:11，赵一旦 mailto:hinobl...@gmail.com> 
> > >>
> 写道：
> >
> > The number of requested virtual cores for application master 1 exceeds
> the
> > maximum number of virtual cores 0 available in the Yarn Cluster.
>
>

Re: flink on yarn配置问题

2020-08-23 文章赵一旦

嗯，直观看是这个问题。想知道这个问题有啥常见原因？这个报错只是最终原因，但不一定是直接原因。因为这个yarn集群不可能没资源，我只是简单实验下，我们的yarn是个超级集群，不可能没资源。
我猜测会不会是其他问题，比如yarn队列不对，导致没资源？再或者不清楚可不可能与yarn的鉴权有关，我们的yarn集群应该是有用户权限和资源配额限制的，但理论上我是从另外一个集群上抄的配置，不清楚有没有搞错。
原机器是用于提交spark任务的，我主要复制了hadoop部分（yarn）到另一个机器（B），用B这台机器计划做flink任务的提交。

Zou Dan  于2020年8月23日周日 下午2:16写道：

> Hi, 一旦， root cause 应该是下面这个日志
> The number of requested virtual cores for application master 1 exceeds the
> maximum number of virtual cores 0 available in the Yarn Cluster.
>
> 我简单看了一下代码，应该是你们 yarn 节点上没有可用的资源，numYarnMaxVcores = 0
>
> > 2020年8月21日 下午11:11，赵一旦 mailto:hinobl...@gmail.com>>
> 写道：
> >
> > The number of requested virtual cores for application master 1 exceeds
> the
> > maximum number of virtual cores 0 available in the Yarn Cluster.
>
>

Re: Elasticsearch 写入问题

2020-08-23 文章 Benchao Li

你指的是用json format么？这个是json format的问题，当前的确是会把所有字段都写进去，不管是否为null。

我们内部也有类似需求，我们是修改了json format，可以允许忽略null的列。

 于2020年8月23日周日 下午6:11写道：

> 在flink1.11中使用table sql写入时，有一些字段为空时，依然被写入到elasticsearch，这个方式应该不太恰当。
>
>
>

-- 

Best,
Benchao Li

Elasticsearch 写入问题

2020-08-23 文章 abc15606

在flink1.11中使用table sql写入时，有一些字段为空时，依然被写入到elasticsearch，这个方式应该不太恰当。

Re: 1.11版本，关于TableEnvironment.executeSql("insert into ...")，job名称设置的问题

2020-08-23 文章 Benchao Li

FYI： 有一个issue[1] 正在跟进和解决这个问题

[1] https://issues.apache.org/jira/browse/FLINK-18545

Zou Dan  于2020年8月23日周日 下午2:29写道：

> 据我所知，这种执行方式目前没法设置 jobName
>
> > 2020年8月21日 上午11:11，Asahi Lee <978466...@qq.com> 写道：
> >
> > 你好！
> >   我通过表环境执行insert into语句提交作业，我该如何设置我的job名称呢？
> >
> >
> > 程序：
> > EnvironmentSettings bbSettings =
> EnvironmentSettings.newInstance().useBlinkPlanner().build();
> > TableEnvironment bsTableEnv = TableEnvironment.create(bbSettings);
> >
> > String sourceDDL = "CREATE TABLE datagen (  " +
> >" f_random INT,  " +
> >" f_random_str STRING,  " +
> >" ts AS localtimestamp,  " +
> >" WATERMARK FOR ts AS ts  " +
> >") WITH (  " +
> >" 'connector' = 'datagen',  " +
> >" 'rows-per-second'='10',  " +
> >" 'fields.f_random.min'='1',  " +
> >" 'fields.f_random.max'='5',  " +
> >" 'fields.f_random_str.length'='10'  " +
> >")";
> >
> > bsTableEnv.executeSql(sourceDDL);
> > Table datagen = bsTableEnv.from("datagen");
> >
> > System.out.println(datagen.getSchema());
> >
> > String sinkDDL = "CREATE TABLE print_table (" +
> >" f_random int," +
> >" c_val bigint, " +
> >" wStart TIMESTAMP(3) " +
> >") WITH ('connector' = 'print') ";
> > bsTableEnv.executeSql(sinkDDL);
> >
> > System.out.println(bsTableEnv.from("print_table").getSchema());
> >
> > Table table = bsTableEnv.sqlQuery("select f_random, count(f_random_str),
> TUMBLE_START(ts, INTERVAL '5' second) as wStart from datagen group by
> TUMBLE(ts, INTERVAL '5' second), f_random");
> > bsTableEnv.executeSql("insert into print_table select * from " + table);
>
>
>

-- 

Best,
Benchao Li

Re: 1.11版本，关于TableEnvironment.executeSql("insert into ...")，job名称设置的问题

2020-08-23 文章 Zou Dan

据我所知，这种执行方式目前没法设置 jobName

> 2020年8月21日 上午11:11，Asahi Lee <978466...@qq.com> 写道：
> 
> 你好！
>   我通过表环境执行insert into语句提交作业，我该如何设置我的job名称呢？
> 
> 
> 程序：
> EnvironmentSettings bbSettings = 
> EnvironmentSettings.newInstance().useBlinkPlanner().build();
> TableEnvironment bsTableEnv = TableEnvironment.create(bbSettings);
> 
> String sourceDDL = "CREATE TABLE datagen (  " +
>" f_random INT,  " +
>" f_random_str STRING,  " +
>" ts AS localtimestamp,  " +
>" WATERMARK FOR ts AS ts  " +
>") WITH (  " +
>" 'connector' = 'datagen',  " +
>" 'rows-per-second'='10',  " +
>" 'fields.f_random.min'='1',  " +
>" 'fields.f_random.max'='5',  " +
>" 'fields.f_random_str.length'='10'  " +
>")";
> 
> bsTableEnv.executeSql(sourceDDL);
> Table datagen = bsTableEnv.from("datagen");
> 
> System.out.println(datagen.getSchema());
> 
> String sinkDDL = "CREATE TABLE print_table (" +
>" f_random int," +
>" c_val bigint, " +
>" wStart TIMESTAMP(3) " +
>") WITH ('connector' = 'print') ";
> bsTableEnv.executeSql(sinkDDL);
> 
> System.out.println(bsTableEnv.from("print_table").getSchema());
> 
> Table table = bsTableEnv.sqlQuery("select f_random, count(f_random_str), 
> TUMBLE_START(ts, INTERVAL '5' second) as wStart from datagen group by 
> TUMBLE(ts, INTERVAL '5' second), f_random");
> bsTableEnv.executeSql("insert into print_table select * from " + table);

Re: taskmanager引用通用jdbc连接池的问题

2020-08-23 文章 Zou Dan

每个 taskmanager 对应一个独立的 JVM，如果你的实现是一个 JVM 中唯一一个连接池，那么就是 50 * 2

> 2020年8月21日 上午11:09，Bruce  写道：
> 
> M

Re: flink on yarn配置问题

2020-08-23 文章 Zou Dan

Hi, 一旦， root cause 应该是下面这个日志
The number of requested virtual cores for application master 1 exceeds the
maximum number of virtual cores 0 available in the Yarn Cluster.

我简单看了一下代码，应该是你们 yarn 节点上没有可用的资源，numYarnMaxVcores = 0

> 2020年8月21日 下午11:11，赵一旦 mailto:hinobl...@gmail.com>> 写道：
> 
> The number of requested virtual cores for application master 1 exceeds the
> maximum number of virtual cores 0 available in the Yarn Cluster.

Re: 如何设置FlinkSQL并行度

2020-08-23 文章 JasonLee

hi
checkpoint savepoint的问题可以看下这个
https://mp.weixin.qq.com/s/Vl6_GsGeG0dK84p9H2Ld0Q



--
Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink集成到cdh

Re: flink 1.11.1 与HDP3.0.1中的hive集成，查询不出hive表数据

Re: flink1.11 cdc使用

Flink log4j2 问题

Re: flink 1.11.1 与HDP3.0.1中的hive集成，查询不出hive表数据

Re: 1.11版本，关于TableEnvironment.executeSql("insert into ...")，job名称设置的问题

Re: flink on yarn配置问题

?????? 1.11??????????TableEnvironment.executeSql("insert into ...")??job??????????????

Re: flink on yarn配置问题

Re: flink on yarn配置问题

Re: flink-sql-gateway还会更新吗

Re: flink on yarn配置问题

flink 1.11.1 与HDP3.0.1中的hive集成，查询不出hive表数据

Re: flink on yarn配置问题

Re: flink on yarn配置问题

Re: Elasticsearch 写入问题

Elasticsearch 写入问题

Re: 1.11版本，关于TableEnvironment.executeSql("insert into ...")，job名称设置的问题

Re: 1.11版本，关于TableEnvironment.executeSql("insert into ...")，job名称设置的问题

Re: taskmanager引用通用jdbc连接池的问题

Re: flink on yarn配置问题

Re: 如何设置FlinkSQL并行度

22 matches

Site Navigation

Mail list logo

Footer information