Migrating Flink SQL job state across storage systems
We are migrating Flink SQL jobs that currently run on Hadoop YARN to Kubernetes, and the state storage is moving from HDFS to object storage. We want the jobs to restore from their previous savepoints so that the upgrade is invisible to users. Because the Flink state files contain absolute paths, simply copying the files physically does not work. According to the official docs, the Flink State Processor API currently requires the operator uid and the Flink state type in order to read state, but the uids of a Flink SQL job are auto-generated, and we have no way to know the state types either. Is there an API that traverses all the state saved under a directory and writes it out to a directory on another file system? It feels like the State Processor API is better suited to jobs written with the DataStream API, and SQL jobs are almost impossible to handle with it. Is that right?
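For context, the State Processor API call shape the question refers to looks roughly like this. This is a minimal sketch against roughly Flink 1.16/1.17; the uid "my-operator-uid", the savepoint path, and the "count" state are hypothetical placeholders, and for a SQL job the uid and state layout are exactly the pieces that are hard to obtain:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.state.api.SavepointReader;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class ReadSavepointSketch {

    // Hypothetical reader: assumes the operator kept a ValueState<Long> named "count".
    static class CountReader extends KeyedStateReaderFunction<String, Long> {
        ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<Long> out) throws Exception {
            out.collect(count.value());
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Load an existing savepoint; the path and state backend must match the job.
        SavepointReader savepoint = SavepointReader.read(
                env, "hdfs:///savepoints/savepoint-xxxx", new HashMapStateBackend());

        // Both the operator uid and the exact state name/type are required up front.
        // For SQL jobs the uid is derived from the compiled plan and the state layout
        // is an implementation detail, which is the crux of the question above.
        savepoint.readKeyedState("my-operator-uid", new CountReader())
                 .print();

        env.execute();
    }
}
```

This sketch requires the Flink runtime and an actual savepoint, so it is illustrative of the API shape rather than directly runnable here.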
Re: Flink sql client doesn't work with "partition by" clause
Hello all,

I've realized that my previous mail had an error which caused invisible text, so I'm resending it below.

I have found that the Flink SQL Client doesn't work with the "partition by" clause. Is this a bug? It's a bit odd, since when I execute the same SQL with tableEnv.executeSql(statement) it works as expected. Has anyone tackled this kind of issue? I have tested on Flink 1.16.1. Thanks in advance.

- The code below only works with the executeSql method in the Table API, not with the SQL Client CLI:

    CREATE TABLE source_table (
        id STRING,
        status STRING,
        type STRING,
        `hour` INT
    ) PARTITIONED BY (`hour`) WITH (
        'connector' = 'filesystem',
        'path' = 'hdfs://${our_data_path}month=202307/day=20230714',
        'format' = 'parquet'
    );

    SELECT `hour` FROM source_table GROUP BY `hour`;

- The query below works both with the executeSql() method in the Table API and in the SQL Client:

    CREATE TABLE source_table_2 (
        id STRING,
        status STRING,
        type STRING
    ) WITH (
        'connector' = 'filesystem',
        'path' = 'hdfs://${out_data_path}/month=202307/day=20230714',
        'format' = 'parquet'
    );

    SELECT status FROM source_table_2 GROUP BY status;

Best,
dongwoo

On Fri, Jul 28, 2023 at 6:19 PM, Dongwoo Kim wrote:
Re: Suggestions for Open Source FLINK SQL editor
Hi Guozhen, our team also uses Flink as an ad-hoc query engine. Could we talk about it?

Guozhen Yang wrote on Thu, Jul 20, 2023 at 11:58:
> Hi Rajat,
>
> We are using Apache Zeppelin as our entry point for submitting Flink ad-hoc queries (and Spark jobs, actually).
>
> It supports interactive queries, data visualization, multiple data query engines, and multiple auth models. You can check out other features on its official website.
>
> But because of the inactivity of the Apache Zeppelin community (the last stable release was a year and a half ago), we need to do a bit of custom development and bug fixing on its master branch.
>
> On 2023/07/19 16:47:43 Rajat Ahuja wrote:
> > Hi team,
> >
> > I have set up a session cluster on k8s via the SQL Gateway. I am looking for an open-source Flink SQL editor that can submit SQL queries on top of the k8s session cluster. Any suggestions for a SQL editor to submit queries?
> >
> > Thanks
Flink sql client doesn't work with "partition by" clause
Hello all,

I have found that the Flink SQL Client doesn't work with the "partition by" clause. Is this a bug? It's a bit odd, since when I execute the same SQL with tableEnv.executeSql(statement) it works as expected. Has anyone tackled this kind of issue? I have tested on Flink 1.16.1. Thanks in advance.

- The code below only works with the executeSql method in the Table API, not with the SQL Client CLI:

    CREATE TABLE source_table (
        id STRING,
        status STRING,
        type STRING,
        `hour` INT
    ) PARTITIONED BY (`hour`) WITH (
        'connector' = 'filesystem',
        'path' = 'hdfs://${our_data_path}month=202307/day=20230714',
        'format' = 'parquet'
    );

    SELECT `hour` FROM source_table GROUP BY `hour`;

- The query below works both with the executeSql method in the Table API and in the SQL Client:

    CREATE TABLE source_table_2 (
        id STRING,
        status STRING,
        type STRING
    ) WITH (
        'connector' = 'filesystem',
        'path' = 'hdfs://${out_data_path}/month=202307/day=20230714',
        'format' = 'parquet'
    );

    SELECT status FROM source_table_2 GROUP BY status;

Best,
dongwoo
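For reference, the Table API path that reportedly works can be sketched as follows. This is a minimal sketch assuming a batch TableEnvironment; the DDL mirrors the one in the mail, and the hdfs path is a placeholder for the elided `${our_data_path}`:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PartitionedTableSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Same shape of DDL as in the mail; this path via executeSql is the one
        // reported to work, while the SQL Client reportedly rejects it.
        tableEnv.executeSql(
                "CREATE TABLE source_table (" +
                "  id STRING, status STRING, type STRING, `hour` INT" +
                ") PARTITIONED BY (`hour`) WITH (" +
                "  'connector' = 'filesystem'," +
                "  'path' = 'hdfs:///path/to/month=202307/day=20230714'," +
                "  'format' = 'parquet')");

        tableEnv.executeSql("SELECT `hour` FROM source_table GROUP BY `hour`").print();
    }
}
```

Running this needs the Flink Table planner, the filesystem connector, and the parquet format on the classpath, so it is a shape-of-the-API sketch rather than a standalone program.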
Flink kubernetes operator cpu limit configuration does not take effect
hi all,
On implementing a join over bounded streams with the DataStream API
Hi, as the subject says: how can I implement a join over bounded streams with the DataStream API? When I call join I am forced to specify a window. Can that be avoided, or do I have to use the SQL API instead? Thanks, 鱼
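The constraint described above is that the DataStream join is only exposed as a window join, so the call chain always needs a window assigner. A minimal sketch of that shape (the sample data, the timestamp-0 assigner, and the one-day window wide enough to cover all bounded input are illustrative assumptions, not a recommendation):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class BoundedJoinSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Two bounded inputs; event-time windows need timestamps, so assign a constant one.
        DataStream<Tuple2<String, Integer>> left =
                env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2))
                   .assignTimestampsAndWatermarks(
                       WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps()
                           .withTimestampAssigner((e, ts) -> 0L));
        DataStream<Tuple2<String, Integer>> right =
                env.fromElements(Tuple2.of("a", 10), Tuple2.of("b", 20))
                   .assignTimestampsAndWatermarks(
                       WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps()
                           .withTimestampAssigner((e, ts) -> 0L));

        // The DataStream join API forces .window(...); with bounded input, a single
        // window covering all timestamps effectively joins everything once the
        // final watermark arrives at end of input.
        left.join(right)
            .where(t -> t.f0)
            .equalTo(t -> t.f0)
            .window(TumblingEventTimeWindows.of(Time.days(1)))
            .apply(new JoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, String>() {
                @Override
                public String join(Tuple2<String, Integer> l, Tuple2<String, Integer> r) {
                    return l.f0 + ":" + (l.f1 + r.f1);
                }
            })
            .print();

        env.execute();
    }
}
```

Whether this wide-window workaround is preferable to switching to the Table/SQL API for a bounded join is a design choice the thread leaves open.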
Re: flink-job-history page freezes because there are too many jobs
This doesn't solve the root problem: we simply have a lot of jobs, and the business requires keeping several thousand of them.

阿华田
a15733178...@163.com

On Jul 28, 2023 at 11:28, Shammon FY wrote:
> Hi,
> You can control how many jobs are retained with the `jobstore.max-capacity` and `jobstore.expiration-time` options; see [1] for the details.
>
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#full-jobmanager-options
>
> Best,
> Shammon FY
>
> On Fri, Jul 28, 2023 at 10:17 AM 阿华田 wrote:
> > Our flink-job-history has already collected 5000+ jobs. When we click to view all jobs, the page freezes and becomes inaccessible. Does anyone have a good solution?
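For completeness, the two options Shammon mentions go into the JobManager's configuration (flink-conf.yaml); the values below are illustrative, not recommendations:

```yaml
# flink-conf.yaml -- bound what the JobManager's completed-job store retains
jobstore.max-capacity: 1000       # keep at most 1000 finished jobs
jobstore.expiration-time: 86400   # drop entries after 24 hours (value is in seconds)
```

Note these options only cap the completed-job history; as the reply above says, they do not help if the business genuinely needs thousands of jobs retained.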