The difference between Row and RowData

2020-09-08 Thread 刘首维
Hi all, what are the differences and the relationship between `org.apache.flink.types.Row` and `org.apache.flink.table.data.RowData`?

Re: Computing real-time GMV with Flink: how to handle an order amount that changes in the afternoon

2020-09-08 Thread Benchao Li
I don't know whether you are using SQL or the DataStream API. If it's SQL, I think you can do it like this: 1. On version 1.11+, you can use the binlog format directly; modifications to the data are then automatically mapped to update_before and update_after records, and Flink's internal operators, including aggregations, all handle such data correctly. For a query like select sum(xxx) from T group by yyy, the sum will be maintained automatically. 2. If you don't use the binlog mode and only take the latest record for the aggregation, you can also use the deduplication operator [1] to turn the append stream into a retract stream, so that downstream, again with the same
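
A hedged Table API sketch of option 1; the DDL columns, topic, and table names below are invented for illustration, and only the shape of the aggregation comes from the thread:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class GmvOverCdc {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Orders ingested as canal-json binlog records: an update arrives as an
        // update_before/update_after pair, which the SUM below handles automatically.
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id STRING," +
                "  seller_id STRING," +
                "  amount DECIMAL(10, 2)" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'order_binlog'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'format' = 'canal-json'" +
                ")");

        // When an order's amount changes from 1000 to 500, the sum is corrected automatically.
        tEnv.executeSql("SELECT seller_id, SUM(amount) AS gmv FROM orders GROUP BY seller_id")
            .print();
    }
}
```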

Manually modifying CK (checkpoint) state

2020-09-08 Thread Shuai Xia
Hi everyone, if I want to manually read and modify the state stored in a checkpoint, what can I use? I remember there used to be a utility class that supports this.
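
A minimal sketch of the State Processor API (flink-state-processor-api), which is likely the kind of utility being remembered; the checkpoint path, operator uid, and state name are placeholders, not values from the thread:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;

public class ReadCheckpointState {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Point this at a retained checkpoint or savepoint directory.
        ExistingSavepoint savepoint = Savepoint.load(
                env, "hdfs:///flink/checkpoints/<job-id>/chk-42", new MemoryStateBackend());

        // Read the list state registered under the given operator uid and state name.
        DataSet<String> state =
                savepoint.readListState("my-operator-uid", "my-state", Types.STRING);
        state.print();
    }
}
```

The same API can also write modified state back out as a new savepoint via Savepoint.create(...) combined with withOperator(...).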

Seeking an answer to a pyflink execute_insert question

2020-09-08 Thread whh_960101
Hello, my pyflink code is as follows, and I have two questions: 1. source = st_env.from_path('source') # st_env is a StreamTableEnvironment, source is the Kafka source table main_table = source.select("...") sub_table = source.select("...") main_table.execute_insert('sink').get_job_client().get_job_execution_result().result()

Re: Checkpoint metadata deleted by Flink after ZK connection issues

2020-09-08 Thread Cristian
> The job sub directory will be cleaned up when the job > finished/canceled/failed. What does this mean? Also, to clarify: I'm a very sloppy developer. My jobs crash ALL the time... and yet, the jobs would ALWAYS resume from the last checkpoint. The only cases where I expect Flink to clean

Re: Re: Re: Re: Re: Re: flink 1.11 StreamingFileWriter and sql-client issues

2020-09-08 Thread Jingsong Li
Hi kandy~ This may be caused by https://issues.apache.org/jira/browse/FLINK-19166; the upcoming 1.11.2 release will fix it. Please confirm and retry~ Best, Jingsong On Fri, Aug 14, 2020 at 7:22 PM kandy.wang wrote: > @Jingsong It's the orc format, and I've checked everything; there is still no commit. I think you should test this scenario yourselves > > On 2020-08-12 16:04:13, "Jingsong Li" wrote: >One more question: which format is it? csv or parquet. >

Re: Re: Adding partitions to hive metadata partially fails when using StreamingFileSink

2020-09-08 Thread Jingsong Li
Thanks a lot for the feedback; there does seem to be a real problem. I've created a JIRA to track it: https://issues.apache.org/jira/browse/FLINK-19166 It will be included in the upcoming 1.11.2 release Best, Jingsong On Wed, Sep 9, 2020 at 10:44 AM MuChen <9329...@qq.com> wrote: > hi, Rui Li: > The directories of the partitions that were never committed contain files in committed state, and after a manual add partition the data can be queried normally > >

Re: Checkpoint metadata deleted by Flink after ZK connection issues

2020-09-08 Thread Yang Wang
AFAIK, the HA data, including the Zookeeper metadata and the real data on DFS, will only be cleaned up when the Flink cluster has reached a terminal state. So if you are using a session cluster, the root cluster node on ZK will be cleaned up after you manually stop the session cluster. The job sub directory

Re: A question about deleting checkpoint directories

2020-09-08 Thread superainbower
hi | superainbower | superainbo...@163.com | On 2020-09-09 10:19, superainbower wrote: HI, I want to read the checkpoint metadata; import org.apache.flink.runtime.checkpoint.savepoint.Savepoint;

Re: Adding partitions to hive metadata partially fails when using StreamingFileSink

2020-09-08 Thread MuChen
hi, Rui Li: The directories of the partitions that were never committed contain files in committed state, and after a manual add partition the data can be queried normally: /user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-19/hour=07/part-b7d8f3c6-f1f3-40d4-a269-1ccf2c9a7720-0-1031 ------ Original message ------ From:

Re: Adding partitions to hive metadata partially fails when using StreamingFileSink

2020-09-08 Thread MuChen
The SQL: INSERT INTO rt_dwd.dwd_music_copyright_test SELECT url,md5,utime,title,singer,company,level, from_unixtime(cast(utime/1000 as int),'yyyy-MM-dd') ,from_unixtime(cast(utime/1000 as int),'HH') FROM music_source; ------ Original message ------ From:

Re: Re: Adding partitions to hive metadata partially fails when using StreamingFileSink

2020-09-08 Thread Jingsong Li
Could you also send the SQL that inserts into the Hive table? On Tue, Sep 8, 2020 at 9:44 PM Rui Li wrote: > Also please list the directories of the partitions that were not committed, and check what state the files inside are in > > On Tue, Sep 8, 2020 at 9:19 PM Rui Li wrote: > > > Did the job fail over? Or does the job finish successfully but some partitions are never committed? > > > > On Tue, Sep 8, 2020 at 5:20 PM MuChen <9329...@qq.com> wrote: > > > >> hi, Rui Li: > >>

Re: A question about deleting checkpoint directories

2020-09-08 Thread superainbower
HI, I want to read the checkpoint metadata; I tried import org.apache.flink.runtime.checkpoint.savepoint.Savepoint; but this savepoint class cannot be found on flink 1.11.1 | superainbower | superainbo...@163.com |

Re: Difficulties with Minio state storage

2020-09-08 Thread Yangze Guo
Hi, Rex, I've tried to use MinIO as the state backend and everything seems to work well. Just sharing my configuration: ``` s3.access-key: s3.secret-key: s3.endpoint: http://localhost:9000 s3.path.style.access: true state.checkpoints.dir: s3://flink/checkpoints ``` I think the problem might be caused

Difficulties with Minio state storage

2020-09-08 Thread Rex Fenley
Hello! I'm trying to test out Minio as state storage backend using docker-compose on my local machine but keep running into errors that seem strange to me. Any help would be much appreciated :) The problem: With the following environment: environment: - | FLINK_PROPERTIES=

Re: Computing real-time GMV with Flink: how to handle an order amount that changes in the afternoon

2020-09-08 Thread xuzh
O(∩_∩)O ------ Original message ------ From: "user-zh"

Re: State Storage Questions

2020-09-08 Thread Rex Fenley
Thanks a bunch! >For example, the Flink Kafka source operator's parallel instances maintain as operator state a mapping of partitions to offsets for the partitions that it is assigned to. This I think clarifies things. This is literally state for the operator to do its job, not really row data.

How to get Latency Tracking results?

2020-09-08 Thread Pankaj Chand
Hello, How do I visualize (or extract) the results for Latency Tracking for a Flink local cluster? I set "metrics.latency.interval 100" in the conf/flink-conf.yaml file, and started the cluster and SocketWindowWordCount job. However, I could not find the latency distributions anywhere in the web
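
Latency metrics are exposed through Flink's metric system rather than as a dedicated web UI page, so one way to see them is a metrics reporter. A hedged flink-conf.yaml sketch (the choice of the slf4j reporter is an assumption, not something from this thread):

```
metrics.latency.interval: 100

# Dump all metrics, including the latency histograms, to the logs every 30 seconds
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 30 SECONDS
```

The latency histograms should also be retrievable through the REST API and the metric selector in the web UI, since they are registered as job-level metrics.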

Re: Flink 1.8.3 GC issues

2020-09-08 Thread Josson Paul
Hi Piotr, 2) SystemProcessingTimeService holds the HeapKeyedStateBackend, and HeapKeyedStateBackend holds a lot of objects, and that is filling up the heap. 3) I am not using the Flink Kafka connector. We are using the Apache Beam Kafka connector instead. There is a change in the Apache Beam version. But the

Re: Performance issue associated with managed RocksDB memory

2020-09-08 Thread Yun Tang
Hi Juha, I planned to add some descriptions to the Flink documentation to give such hints; however, it involves too many RocksDB details, and in most cases increasing the managed memory size to a proper value avoids the problem. Since you have come across this and reported it on the user mailing list, and
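
For reference, a hedged flink-conf.yaml sketch of the knobs involved (the values are illustrative, not recommendations from this thread):

```
# Managed memory is shared by the RocksDB instances in a slot; raising it gives
# each instance larger write buffers and avoids constant flushing.
taskmanager.memory.managed.fraction: 0.4
# or set an absolute size instead:
# taskmanager.memory.managed.size: 2gb

# Set to false to let RocksDB size its own buffers instead of the managed budget.
state.backend.rocksdb.memory.managed: true
```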

Re: Checkpoint metadata deleted by Flink after ZK connection issues

2020-09-08 Thread Cristian
I'm using the standalone script to start the cluster. As far as I can tell, it's not easy to reproduce. We found that zookeeper lost a node around the time this happened, but all of our other 75 Flink jobs which use the same setup, version and zookeeper, didn't have any issues. They didn't

Re: Should the StreamingFileSink mark the files "finished" when all bounded input sources are depleted?

2020-09-08 Thread Aljoscha Krettek
Hi, this is indeed the correct behaviour right now. Which doesn't mean that it's the behaviour that we would like to have. The reason why we can't move the "pending" files to "final" is that we don't have a point where we can do this in an idempotent and retryable fashion. When we do

Re:

2020-09-08 Thread Timo Walther
You are using the old connectors. The new connectors are available via SQL DDL (and the execute_sql() API) as documented here: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/filesystem.html Maybe this will give you some performance boost, but certainly not

Re:

2020-09-08 Thread Violeta Milanović
Hi Timo, I actually tried many things, increasing jvm heap size and flink managed memory didn't help me. Running the same query without group by clause like this: select avg(transaction_amount) as avg_ta, avg(salary+bonus) as avg_income, avg(salary+bonus) - avg(transaction_amount) as

Re: Computing real-time GMV with Flink: how to handle an order amount that changes in the afternoon

2020-09-08 Thread ????????
------ Original message ------ From: "user-zh"

Re: Flink alert after database lookUp

2020-09-08 Thread Timo Walther
Hi Sunitha, what you are describing is a typical streaming enrichment. We need to enrich the stream with some data from a database. There are different strategies to handle this: 1) You are querying the database for every record. This is usually not what you want because it would slow down
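
The snippet above stops at strategy 1; a hedged sketch of that strategy with a naive cache bolted on (the JDBC URL, table, and column names are invented for illustration):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Enriches a stream of user ids with names from a database, caching lookups
// so the database is not queried for every single record.
public class DbEnrichment extends RichMapFunction<String, String> {

    private transient Connection conn;
    private transient Map<String, String> cache;

    @Override
    public void open(Configuration parameters) throws Exception {
        conn = DriverManager.getConnection("jdbc:mysql://db-host:3306/app", "user", "pw");
        cache = new HashMap<>();
    }

    @Override
    public String map(String userId) throws Exception {
        String name = cache.get(userId);
        if (name == null) {
            try (PreparedStatement ps =
                    conn.prepareStatement("SELECT name FROM users WHERE id = ?")) {
                ps.setString(1, userId);
                try (ResultSet rs = ps.executeQuery()) {
                    name = rs.next() ? rs.getString(1) : "unknown";
                }
            }
            cache.put(userId, name);
        }
        return userId + "," + name;
    }

    @Override
    public void close() throws Exception {
        if (conn != null) {
            conn.close();
        }
    }
}
```

A real job would bound and expire the cache, or use Flink's async I/O or temporal table joins, which are presumably among the later strategies in the full message.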

Re: FLINK DATASTREAM Processing Question

2020-09-08 Thread Timo Walther
Hi Vijay, one comment to add is that performance might suffer with multiple map() calls. For safety reasons, records between chained operators are serialized and deserialized so that the functions strictly cannot influence each other. If all functions of a pipeline are guaranteed to not modify
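
When that guarantee holds, the defensive copies can be disabled; a minimal runnable sketch (not from the thread itself):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ObjectReuseExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Skip the defensive copies between chained operators. Only safe when no
        // function mutates or holds on to records after collecting them.
        env.getConfig().enableObjectReuse();

        env.fromElements("a", "bb", "ccc")
           .map(new MapFunction<String, String>() {
               @Override
               public String map(String value) {
                   return value + "!";
               }
           })
           .print();

        env.execute("object-reuse-sketch");
    }
}
```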

Re: Re: Adding partitions to hive metadata partially fails when using StreamingFileSink

2020-09-08 Thread Rui Li
Also please list the directories of the partitions that were not committed, and check what state the files inside are in On Tue, Sep 8, 2020 at 9:19 PM Rui Li wrote: > Did the job fail over? Or does the job finish successfully but some partitions are never committed? > > On Tue, Sep 8, 2020 at 5:20 PM MuChen <9329...@qq.com> wrote: > >> hi, Rui Li: >> As you said, there are indeed such logs, but only for the partitions that were added successfully; there are no logs for the failed partitions: >> 2020-09-04 17:17:10,548 INFO

Re: Re: Adding partitions to hive metadata partially fails when using StreamingFileSink

2020-09-08 Thread Rui Li
Did the job fail over? Or does the job finish successfully but some partitions are never committed? On Tue, Sep 8, 2020 at 5:20 PM MuChen <9329...@qq.com> wrote: > hi, Rui Li: > As you said, there are indeed such logs, but only for the partitions that were added successfully; there are no logs for the failed partitions: > 2020-09-04 17:17:10,548 INFO org.apache.flink.streaming.api.operators. > AbstractStreamOperator [] - Partition {dt=2020-08-22,

Blob server: file not found

2020-09-08 Thread leiyanrui
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store. at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:135)

Re: Performance issue associated with managed RocksDB memory

2020-09-08 Thread Juha Mynttinen
Hey Yun, Thanks for the detailed answer. It clarified how things work, especially the role of the RocksDB arena and the arena block size. I think there's no real-world case where it would make sense to start a Flink job with RocksDB configured so that RocksDB flushes all the time, i.e.

Re: Checkpoint metadata deleted by Flink after ZK connection issues

2020-09-08 Thread Robert Metzger
Thanks a lot for reporting this problem here Cristian! I am not super familiar with the involved components, but the behavior you are describing doesn't sound right to me. Which entrypoint are you using? This is logged at the beginning, like this: "2020-09-08 14:45:32,807 INFO

Re: Flink Plan Visualizer

2020-09-08 Thread 黄潇
Hi, as far as I know, the JSON string obtained from env.getExecutionPlan() [1] only contains the stream graph information, so the graph drawn from it is the stream graph. After the job is submitted, the web UI shows the graph after operator chaining. [1] https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/execution_plans.html On Tue, Sep 8, 2020 at 8:00 PM, zilong xiao wrote: > hi, I'd like to ask whether the Flink Plan
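
A minimal sketch of obtaining that JSON (pipeline contents are placeholders):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PrintExecutionPlan {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1, 2, 3)
           .filter(i -> i > 1)
           .print();

        // Paste the printed JSON into https://flink.apache.org/visualizer/
        // to render the (pre-chaining) stream graph.
        System.out.println(env.getExecutionPlan());
    }
}
```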

Metric updates in MetricQueryService

2020-09-08 Thread liangji
Flink version 1.11.0. How are the values of the four metric types defined in MetricQueryService updated? For example, Kafka's current-offset is a Gauge type; how is the value of current-offset in the gauges map of MetricQueryService updated? I only see the addMetric method putting values, but from local debugging and code reading, that method is only used for the initial registration. How are the metrics updated while the job is running? -- Sent from: http://apache-flink.147419.n8.nabble.com/

Flink Plan Visualizer

2020-09-08 Thread zilong xiao
hi, I'd like to ask: can the Flink Plan Visualizer draw the job graph? From what I can find online, it seems it can only draw the stream graph from the execution plan?

Re: A question about deleting checkpoint directories

2020-09-08 Thread ????
Hi, with RocksDB incremental checkpoints, a new checkpoint may still reference files from earlier checkpoints until RocksDB compaction rewrites them, so an old checkpoint directory may still be needed by later checkpoints

A question about deleting checkpoint directories

2020-09-08 Thread superainbower
hi, a question: the state backend is rocksdb and flink-conf.yaml sets state.backend.incremental: true. There is a checkpoint directory on hdfs. I stopped the job right after a checkpoint completed, modified the code, and restarted the job from the previous checkpoint directory. Now that the new job has completed its own checkpoints, can the previous checkpoint directory be deleted? Since incremental checkpointing is configured, I don't know whether the old job's checkpoint directory is still needed by the new job.

Re: Problem with using a temporary table registered from a Table in flink sql

2020-09-08 Thread Jary Zhen
The flink 1.11 registerTable interface is defined as follows; #createTemporaryView(String, Table) is recommended instead: /** * Registers a {@link Table} under a unique name in the TableEnvironment's catalog. * Registered tables can be referenced in SQL queries. * * Temporary objects can shadow permanent ones. If a permanent object in a

Re: Problem with using a temporary table registered from a Table in flink sql

2020-09-08 Thread Jary Zhen
Try replacing tableEnv.registerTable("tableTmpA", tableA) with the tableEnv.createTemporaryView() api? On Tue, Sep 8, 2020 at 5:59 PM, me wrote: > Problem with using a temporary table registered from a Table in flink sql? > val tableA : Table = … > tableEnv.registerTable("tableTmpA", tableA) > > > tableEnv.sqlquery(“select * from tableTmpA”) ==>
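
A hedged Java version of the suggested replacement (the original thread uses Scala; the table contents here are made up):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TemporaryViewExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        Table tableA = tableEnv.fromDataStream(env.fromElements("a", "b"), "word");

        // Replaces the deprecated registerTable("tableTmpA", tableA)
        tableEnv.createTemporaryView("tableTmpA", tableA);

        Table result = tableEnv.sqlQuery("select * from tableTmpA");
        tableEnv.toAppendStream(result, Row.class).print();

        env.execute("temporary-view-sketch");
    }
}
```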

Re: Problem with using a temporary table registered from a Table in flink sql

2020-09-08 Thread 引领
What exactly do you mean by "cannot be used"? On 2020-09-08 17:59, me wrote: Problem with using a temporary table registered from a Table in flink sql? val tableA : Table = … tableEnv.registerTable("tableTmpA", tableA) tableEnv.sqlquery(“select * from tableTmpA”) ==> this cannot be used directly on flink 1.11. Has anyone used this feature successfully?

Re: Computing real-time GMV with Flink: how to handle an order amount that changes in the afternoon

2020-09-08 Thread ????
You can use canal to sync the mysql binlog | yrx73...@163.com | On 2020-09-08 18:24 … kafka

Re: Computing real-time GMV with Flink: how to handle an order amount that changes in the afternoon

2020-09-08 Thread ??????
…kafka ------ Original message ------ From:

Re: Computing real-time GMV with Flink: how to handle an order amount that changes in the afternoon

2020-09-08 Thread 1115098...@qq.com
In flink, an update is split into an update_before and an update_after (a delete plus an insert). So two messages go to kafka: 1: order 009, amount -1000; 2: order 009, amount 500. flink

Problem with using a temporary table registered from a Table in flink sql?

2020-09-08 Thread me
Problem with using a temporary table registered from a Table in flink sql? val tableA : Table = … tableEnv.registerTable("tableTmpA", tableA) tableEnv.sqlquery(“select * from tableTmpA”) ==> this cannot be used directly on flink 1.11. Has anyone used this feature successfully?

Computing real-time GMV with Flink: how to handle an order amount that changes in the afternoon

2020-09-08 Thread xuzh
Scenario: computing GMV in real time. Orders are in mysql, and the binlog is sent to kafka. kafka receives order 009 at 10:00 with an amount of 1000; the json reaches kafka and GMV increases by 1000. At 15:00 the amount of order 009 is changed to 500 and a new json record is sent to kafka.

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-08 Thread Dawid Wysakowicz
> The only one where I could see that users want different behaviour is > BATCH jobs on the DataStream API. I agree that processing-time does > not make much sense in batch jobs. However, if users have written some > business logic using processing-time timers their jobs will silently > not work if

Re: Re: Adding partitions to hive metadata partially fails when using StreamingFileSink

2020-09-08 Thread MuChen
hi, Rui Li: 2020-09-04 17:17:10,548 INFO org.apache.flink.streaming.api.operators.AbstractStreamOperator [] - Partition {dt=2020-08-22, hour=18} of table `hive_catalog`.`rt_dwd`.`dwd_music_copyright_test` is ready to

Re: Re: flink sql 1.11.1 could not insert hive orc record

2020-09-08 Thread Shuai Xia
The change is actually tiny, just that one line of code; it mainly depends on how you build it. -- From: 大罗 Sent: Tuesday, September 8, 2020 17:05 To: user-zh Subject: Re: Re: flink sql 1.11.1 could not insert hive orc record I think your answer points in the right direction for solving the problem. Is there a guideline, or a similar reference, so that I can modify the ORC source code myself, build it and use it? -- Sent from:

Re: Re: flink sql 1.11.1 could not insert hive orc record

2020-09-08 Thread 大罗
I think your answer points in the right direction for solving the problem. Is there a guideline, or a similar reference, so that I can modify the ORC source code myself, build it and use it? -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: flink sql 1.11.1 could not insert hive orc record

2020-09-08 Thread Shuai Xia
# The main cause is that newer Orc versions use WriterVersion ORC_517, # which older Hive versions cannot parse. # Implement your own OrcFile class and switch it back to the old version: static { CURRENT_WRITER = WriterVersion.HIVE_13083; memoryManager = null; } -- From: 大罗 Sent: Tuesday, September 8, 2020 16:55 To: user-zh Subject: Re: flink sql

Re: flink sql 1.11.1 could not insert hive orc record

2020-09-08 Thread Jingsong Li
Hi, flink-sql-orc_2.11-1.11.0.jar and flink-sql-connector-hive-2.2.0_2.11-1.11.0.jar currently cannot coexist, otherwise they conflict. Can you try removing flink-sql-orc? On Tue, Sep 8, 2020 at 4:55 PM 大罗 wrote: > Hi, the hive orc table in my example is not a transactional table, as shown: > > createtab_stmt > CREATE TABLE `dest_orc`( > `i` int) > PARTITIONED BY ( > `ts` string) > ROW

Re: flink sql 1.11.1 could not insert hive orc record

2020-09-08 Thread Shuai Xia
The flink-orc module only supports newer ORC versions; 2.1.1 cannot be supported. You can modify the ORC source code yourself. -- From: 大罗 Sent: Tuesday, September 8, 2020 16:55 To: user-zh Subject: Re: flink sql 1.11.1 could not insert hive orc record Hi, the hive orc table in my example is not a transactional table, as shown: createtab_stmt CREATE TABLE `dest_orc`( `i`

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-08 Thread Aljoscha Krettek
I agree with almost all of your points! The only one where I could see that users want different behaviour is BATCH jobs on the DataStream API. I agree that processing-time does not make much sense in batch jobs. However, if users have written some business logic using processing-time timers

Re: flink sql 1.11.1 could not insert hive orc record

2020-09-08 Thread 大罗
Hi, the hive orc table in my example is not a transactional table, as shown: createtab_stmt CREATE TABLE `dest_orc`( `i` int) PARTITIONED BY ( `ts` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'

Re: Questions about running HiveCatalog on flink 1.11.1

2020-09-08 Thread taochanglian
Please paste your code. On 2020/9/8 14:09, zhongbaoluo wrote: inserting data failed, and no exception was found either. yarn

Re: flink sql 1.11.1 could not insert hive orc record

2020-09-08 Thread taochanglian
Check whether your table is a transactional table; add 'transactional' = 'false' when creating the hive table. On 2020/9/8 16:26, 大罗 wrote: Hi, I'm using the hive catalog feature of flink sql 1.11.1 to insert data into a hive orc table. The versions I use are: Hadoop 3.0.0+cdh6.3.2 HDFS 3.0.0+cdh6.3.2 HBase 2.1.0+cdh6.3.2 Hive 2.1.1+cdh6.3.2 Flink 1.11.1 The hive orc table is defined as: create table dest_orc ( i int )

flink sql 1.11.1 could not insert hive orc record

2020-09-08 Thread 大罗
Hi, I'm using the hive catalog feature of flink sql 1.11.1 to insert data into a hive orc table. The versions I use are: Hadoop 3.0.0+cdh6.3.2 HDFS 3.0.0+cdh6.3.2 HBase 2.1.0+cdh6.3.2 Hive 2.1.1+cdh6.3.2 Flink 1.11.1 The hive orc table is defined as: create table dest_orc ( i int ) partitioned by (ts string) stored as orc TBLPROPERTIES( 'orc.compress' = 'SNAPPY'

Re: Does flink-sql 1.11 still not fully support checkpoints?

2020-09-08 Thread Peihui He
[image: image.png] If you redeploy and need to resume from the last cancel point, you have to specify a savepoint; the savepoint can be the last checkpoint taken before the cancel. On Tue, Sep 8, 2020 at 4:07 PM, 凌天荣 <466792...@qq.com> wrote: > We didn't specify a savepoint; we cancelled the job and then redeployed it > > > ------ Original message ------ > From: > "user-zh" >

Re: Does flink-sql 1.11 still not fully support checkpoints?

2020-09-08 Thread silence
If you stop the job manually and then want to restore it, you need to start it with (-s <path to the _metadata of the last checkpoint>): https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/checkpoints.html#resuming-from-a-retained-checkpoint -----Original Message----- From: 凌天荣 <466792...@qq.com> Sent: 2020-09-08 15:50 To: user-zh Subject: Does flink-sql 1.11 still not fully support checkpoints?
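
A hedged command-line sketch of that (-s …) restore; the checkpoint path, entry class, and jar name are placeholders:

```
# Resume a job from the metadata of a retained checkpoint
bin/flink run \
    -s hdfs:///flink/checkpoints/<job-id>/chk-42/_metadata \
    -c com.example.MyJob \
    my-job.jar
```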

Re: Questions about submitting jobs with flink 1.9.3 on yarn

2020-09-08 Thread zheng faaron
Hi, on the first question: jobs submitted per-job and jobs in a session cluster do not share the same UI at runtime. On the second question: you can configure yarn.containers.vcores Best, Faaron Zheng From: 宁吉浩 Sent: Monday, September 7, 2020 3:23:12 PM To: user-zh Subject: Questions about submitting jobs with flink 1.9.3 on yarn I submit jobs with bin/flink run -m yarn-cluster; I ran into two problems: 1.

Re: Does flink-sql 1.11 still not fully support checkpoints?

2020-09-08 Thread 凌天荣
We didn't specify a savepoint; we cancelled the job and then redeployed it ------ Original message ------ From: "user-zh"

Re: Does flink-sql 1.11 still not fully support checkpoints?

2020-09-08 Thread Peihui He
Did you specify a savepoint when restarting? On Tue, Sep 8, 2020 at 3:50 PM, 凌天荣 <466792...@qq.com> wrote: > enableCheckpointing is set in the code, but after the job was stopped and restarted, it still could not consume the data from the period it was down; that is, the checkpoint didn't take effect

Does flink-sql 1.11 still not fully support checkpoints?

2020-09-08 Thread 凌天荣
enableCheckpointing is set in the code, but after the job was stopped and restarted, it still could not consume the data from the period it was down; that is, the checkpoint didn't take effect

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-08 Thread Dawid Wysakowicz
Hey Aljoscha A couple of thoughts for the two remaining TODOs in the doc: # Processing Time Support in BATCH/BOUNDED execution mode I think there are two somewhat orthogonal problems around this topic:     1. Firing processing timers at the end of the job     2. Having processing timers in the

Questions about running HiveCatalog on flink 1.11.1

2020-09-08 Thread zhongbaoluo
Hello! I'd like to ask about running HiveCatalog on flink 1.11.1 in application mode with the jar on hdfs. Querying metadata and the like all succeed, but querying and inserting data fail, and no exception can be found. The yarn status is simply failed, and no data is written. Running locally, or submitting with the jar on the local machine, both work. I'd like to ask about this problem. Thanks ** Thanks & Best Regards! 杉欣集团-技术研究院 云平台

Re: State Storage Questions

2020-09-08 Thread Tzu-Li (Gordon) Tai
Hi! Operator state is bound to a single parallel operator instance; there is no partitioning happening here. It is typically used in Flink source and sink operators. For example, the Flink Kafka source operator's parallel instances maintain as operator state a mapping of partitions to offsets for
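
As an illustration of the concept (not code from the thread), a minimal sketch of a sink-like function that keeps a per-instance counter in operator state via CheckpointedFunction; the names are hypothetical:

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

// Each parallel instance keeps its own count as operator (non-keyed) state,
// analogous to how the Kafka source keeps its partition-to-offset map.
public class CountingSink implements SinkFunction<String>, CheckpointedFunction {

    private transient ListState<Long> checkpointedCount;
    private long count;

    @Override
    public void invoke(String value, Context context) {
        count++;
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        checkpointedCount.clear();
        checkpointedCount.add(count);
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        checkpointedCount = ctx.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("count", Types.LONG));
        // On restore (e.g. after rescaling), merge the redistributed entries.
        for (Long c : checkpointedCount.get()) {
            count += c;
        }
    }
}
```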