flink1.12.2 CLI连接hive出现异常

2021-04-26 Thread 张锴
*使用flink1.12.2 CLI连接hive时,可以查看到catalog,也能看到指定的catalog下的tables,但是执行select 语句时就出现异常。* [ERROR] Could not execute SQL statement. Reason: org.apache.hadoop.ipc.RemoteException: Application with id 'application_1605840182730_29292' doesn't exist in RM. Please check that the job submission was suc at

Re: Flink消费Kafka报错:ByteArraySerializer is not an instance of org.apache.kafka.common.serialization.Serializer

2021-04-26 Thread Colar
感谢你的回复 我看了打的JAR包中是存在org.apache.kafka.common.serialization.Serializer和ByteArraySerializer 在Flink run-application过程中也并没有将集群中Kafka 1.1.1版本的依赖加载到类路径中,并且在HDFS的application_1619417125027_0011/lib下也没有发现Kafka相关的jar包 不太明白为什么会出现问题 这是我的打包配置: org.apache.maven.plugins

Re: Deployment/Memory Configuration/Scalability

2021-04-26 Thread Yangze Guo
Hi, Radoslav, > 1. Is it a good idea to have regular savepoints (say on a daily basis)? > 2. Is it possible to have high availability with Per-Job mode? Or maybe I > should go with session mode and make sure that my flink cluster is running a > single job? Yes, we can achieve HA with per-job

Re: flink sql 使用自定义函数 ,返回嵌套行, 查询报错 scala.MatchError

2021-04-26 Thread Shengkai Fang
Hi gen 我在1.13分支上验证了下你的case,发现能够跑通。建议cp下那个patch到自己的分支,再验证下。 Best, Shengkai Shengkai Fang 于2021年4月27日周二 上午11:46写道: > 请问你使用的是哪个版本? 这个似乎是一个已知的修复的bug[1] > > [1] https://github.com/apache/flink/pull/15548 > > gen 于2021年4月27日周二 上午9:40写道: > >> Hi, all >> >> 请教下为什么 无法通过t.* 将 自定义函数返回的嵌套字段查出来。 >> >>

Re: 问题:flink 1.13编译 flink-parquet报错 -类重复:org.apache.flink.formats.parquet.generated.SimpleRecord

2021-04-26 Thread Shengkai Fang
是不是没有删除之前生成的类,手动删除冲突的类试试。 Best, Shengkai HunterXHunter <1356469...@qq.com> 于2021年4月27日周二 上午10:58写道: > 查看发现 > > org.apache.avro > avro-maven-plugin > ${avro.version} >

Re: Flink消费Kafka报错:ByteArraySerializer is not an instance of org.apache.kafka.common.serialization.Serializer

2021-04-26 Thread Shengkai Fang
hi, Colar. Flink 使用的 Kafka 的版本是2.4.1,但是你的集群版本是1.1.1。看样子 作业运行时加载的是 集群上的 ByteArraySerializer,而不是 Flink 的 `flink-connector-kafka`中的。不太确定打成一个shade包能不能行。 Best, Shengkai Colar <523774...@qq.com> 于2021年4月26日周一 下午6:05写道: > 使用Flink 1.12.2 消费Kafka报错: > > 2021-04-26 17:39:39,802 WARN

Re: flink sql 使用自定义函数 ,返回嵌套行, 查询报错 scala.MatchError

2021-04-26 Thread Shengkai Fang
请问你使用的是哪个版本? 这个似乎是一个已知的修复的bug[1] [1] https://github.com/apache/flink/pull/15548 gen 于2021年4月27日周二 上午9:40写道: > Hi, all > > 请教下为什么 无法通过t.* 将 自定义函数返回的嵌套字段查出来。 > > tEnv.executeSql( > """ > | SELECT t.* FROM ( > | SELECT EvtParser(request) as t FROM parsed_nginx_log >

Re: DataStreamAPI 与flink sql疑问

2021-04-26 Thread Shengkai Fang
Flink支持将DataStream 转换成一个 Table,然后通过API进行操作。如果想跟SQL相结合,可以将Table注册成一个 temporary view。 Best, Shengkai HunterXHunter <1356469...@qq.com> 于2021年4月27日周二 上午9:46写道: > 你试过吗? > > > > -- > Sent from: http://apache-flink.147419.n8.nabble.com/ >

问题:flink 1.13编译 flink-parquet报错 -类重复:org.apache.flink.formats.parquet.generated.SimpleRecord

2021-04-26 Thread HunterXHunter
查看发现 org.apache.avro avro-maven-plugin ${avro.version} generate-sources

Re: Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Yik San Chan
Hi Dian, If that's the case, shall we reword "Attach custom python files for job." into "attach custom files that could be put in PYTHONPATH, e.g., .zip, .whl, etc." Best, Yik San On Tue, Apr 27, 2021 at 10:08 AM Dian Fu wrote: > Hi Yik San, > > All the files which could be put in the

Re: Confusing docs on python.archives

2021-04-26 Thread Dian Fu
For the command line arguments, it’s documented in https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/cli.html > 2021年4月27日 上午10:19,Dian Fu 写道: > > There are multiple ways to specify the target directory depending on how to > specify the python archives. > 1) API:

Re: Confusing docs on python.archives

2021-04-26 Thread Dian Fu
There are multiple ways to specify the target directory depending on how to specify the python archives. 1) API: add_python_archive(“file:///path/to/py_env .zip", "myenv"), see [1] for more details, 2) configuration: python.archives, e.g. file:///path/to/py_env.zip#myenv 3) command line

Re: Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Dian Fu
Hi Yik San, All the files which could be put in the PYTHONPATH are allowed here, e.g. .zip, .whl, etc. Regards, Dian > 2021年4月27日 上午8:16,Yik San Chan 写道: > > Hi Dian, > > It is still not clear to me - does it only allow Python files (.py), or not? > > Best, > Yik San > > On Mon, Apr 26,

flink sql 使用自定义函数 ,返回嵌套行, 查询报错 scala.MatchError

2021-04-26 Thread gen
Hi, all 请教下为什么 无法通过t.* 将 自定义函数返回的嵌套字段查出来。 tEnv.executeSql( """ | SELECT t.* FROM ( | SELECT EvtParser(request) as t FROM parsed_nginx_log | ) |""".stripMargin) 自定义函数 EvtParser @DataTypeHint("ROW") def eval(line: String) = {...} 详细代码 class

flink sql 使用自定义函数 ,返回嵌套行, 查询报错 scala.MatchError

2021-04-26 Thread gen
目前无法通过t.* 将嵌套的字段查询出来。 val schema = tEnv.executeSql( """ | SELECT t.* FROM ( | SELECT EvtParser(request) as t FROM parsed_nginx_log | ) |""".stripMargin).getTableSchema 其中自定义函数 EvtParser 定义如下。 @DataTypeHint("ROW") def eval(line: String) = {

Re: DataStreamAPI 与flink sql疑问

2021-04-26 Thread HunterXHunter
你试过吗? -- Sent from: http://apache-flink.147419.n8.nabble.com/

flink sql 使用cdc 同步数据到ES7,报错 Detail: 无法为包含1073741350字节的字符串缓冲区扩大525个更多字节

2021-04-26 Thread william
org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped. at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42) ~[flink-sql-connector-postgres-cdc-1.2.0.jar:1.2.0] at

Re: Flink missing Kafka records

2021-04-26 Thread Dan Hill
Hey Robert. Nothing weird. I was trying to find recent records (not the latest). No savepoints (just was running about ~1 day). No checkpoint issues (all successes). I don't know how many are missing. I removed the withIdleness. The other parts are very basic. The text logs look pretty

Re: Confusing docs on python.archives

2021-04-26 Thread Yik San Chan
Hi Dian, I wonder where can we specify the target directory? Best, Yik San On Mon, Apr 26, 2021 at 9:19 PM Dian Fu wrote: > Hi Yik San, > > It should be a typo issue. I guess it should be `If the target directory > name is specified, the archive file will be extracted to a directory with >

Re: Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Yik San Chan
Hi Dian, It is still not clear to me - does it only allow Python files (.py), or not? Best, Yik San On Mon, Apr 26, 2021 at 9:15 PM Dian Fu wrote: > Hi Yik San, > > 1) what `--pyFiles` is used for: > All the files specified via `--pyFiles` will be put in the PYTHONPATH of > the Python worker

Re: Checkpoint error - "The job has failed"

2021-04-26 Thread Dan Hill
Hey Yun and Robert, I'm using Flink v1.11.1. Robert, I'll send you a separate email with the logs. On Mon, Apr 26, 2021 at 12:46 AM Yun Tang wrote: > Hi Dan, > > I think you might use older version of Flink and this problem has been > resolved by FLINK-16753 [1] after Flink-1.10.3. > > > [1]

RE: [1.9.2] Flink SSL on YARN - NoSuchFileException

2021-04-26 Thread Hailu, Andreas [Engineering]
Hey Nico, thanks for your reply. I gave this a try and unfortunately had no luck. // ah -Original Message- From: Nico Kruber Sent: Wednesday, April 21, 2021 1:01 PM To: user@flink.apache.org Subject: Re: [1.9.2] Flink SSL on YARN - NoSuchFileException Hi Andreas, judging from [1], it

Re: Writing to Avro from pyflink

2021-04-26 Thread Edward Yang
Hi Dian, Thanks for trying it out, it ruled out a problem with the python code. I double checked the jar path and only included the jar you referenced without any luck. However, I tried creating a python 3.7 (had 3.8) environment for pyflink and the code worked without any errors! On Sun, Apr

Re: Flink Metric isBackPressured not available

2021-04-26 Thread David Anderson
The isBackPressured metric is a Boolean -- it reports true or false, rather than 1 or 0. The Flink web UI can not display it (it shows NaN); perhaps the same is true for Datadog. https://issues.apache.org/jira/browse/FLINK-15753 relates to this. Regards, David On Tue, Apr 13, 2021 at 12:13 PM

RE: [1.9.2] Flink SSL on YARN - NoSuchFileException

2021-04-26 Thread Hailu, Andreas [Engineering]
Hi Arvid, thanks for the reply. Our stores are world-readable, so I don’t think that it’s an access issue. All of our clients have the stores present through a shared mount as well. I’m able to see the shipped stores in the directory.info output when pulling the YARN logs, and can confirm the

Re: Correctly serializing "Number" as state in ProcessFunction

2021-04-26 Thread Robert Metzger
Quick comment on the kryo type registration and the messages you are seeing: The messages are expected: What the message is saying is that we are not serializing the type using Flink's POJO serializer, but we are falling back to Kryo. Since you are registering all the instances of Number that you

Re: Flink missing Kafka records

2021-04-26 Thread Robert Metzger
Hi Dan, Can you describe under which conditions you are missing records (after a machine failure, after a Kafka failure, after taking and restoring from a savepoint, ...). Are many records missing? Are "the first records" or the "latest records" missing? Any individual records missing, or larger

Re: The wrong Options of Kafka Connector, will make the cluster can not run any job

2021-04-26 Thread Robert Metzger
Thanks a lot for your message. This could be a bug in Flink. It seems that the archival of the execution graph is failing because some classes are unloaded. What I observe from your stack traces is that some classes are loaded from flink-dist_2.11-1.11.2.jar, while other classes are loaded from

Re: Task Local Recovery with mountable disks in the cloud

2021-04-26 Thread Till Rohrmann
Hi Sonam, sorry for the late reply. We were a bit caught in the midst of the feature freeze for the next major Flink release. In general, I think it is a very good idea to disaggregate the local state storage to make it reusable across TaskManager failures. However, it is also not trivial to do.

Re: Confusing docs on python.archives

2021-04-26 Thread Dian Fu
Hi Yik San, It should be a typo issue. I guess it should be `If the target directory name is specified, the archive file will be extracted to a directory with the specified name.` Regards, Dian > 2021年4月26日 下午8:57,Yik San Chan 写道: > > Hi community, > > In >

Re: Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Dian Fu
Hi Yik San, 1) what `--pyFiles` is used for: All the files specified via `--pyFiles` will be put in the PYTHONPATH of the Python worker during execution and then they will be available for the Python user-defined functions during execution. 2) validate for the files passed to `--pyFiles`

Confusing docs on python.archives

2021-04-26 Thread Yik San Chan
Hi community, In https://ci.apache.org/projects/flink/flink-docs-stable/dev/python/python_config.html#python-options , > For each archive file, a target directory is specified. If the target directory name is specified, the archive file will be extracted to a name can directory with the

Contradictory docs: python.files config can include not only python files

2021-04-26 Thread Yik San Chan
Hi community, In https://ci.apache.org/projects/flink/flink-docs-stable/dev/python/python_config.html, regarding python.files: > Attach custom python files for job. This makes readers think only Python files are allowed here. However, in

PyFlink: Shall we disallow relative URL for filesystem path?

2021-04-26 Thread Yik San Chan
Hi community, When using Filesystem SQL Connector, users need to provide a path. When running a PyFlink job using the mini-cluster mode by simply `python WordCount.py`, the path can be a relative path, such as, `words.txt`. However, trying to submit the job to `flink run` will fail without

flink 侧输出流类型转换问题

2021-04-26 Thread 张锴
flink版本使用1.12.2,业务需要用到了侧输出流,首先先创建了一个侧输出标签 val outputTagDate = OutputTag[String]("Date-side-output")。程序写完编译的时候报 type mismatch : found : org.apache.flink.streaming.api.scala.OutputTag[String] required: org.apache.flink.util.OutputTag[java.io.Serializable] Note: String <: java.io.Serializable,

Flink消费Kafka报错:ByteArraySerializer is not an instance of org.apache.kafka.common.serialization.Serializer

2021-04-26 Thread Colar
使用Flink 1.12.2 消费Kafka报错: 2021-04-26 17:39:39,802 WARN org.apache.flink.runtime.taskmanager.Task [] - TriggerWindow(SlidingEventTimeWindows(3, 5000), ReducingStateDescriptor{name=window-contents, defaultValue=null,

Re: 多个复杂算子保证精准一次性

2021-04-26 Thread 张锴
中间的状态也不能丢,两个都需要 hk__lrzy 于2021年4月25日周日 下午8:25写道: > 所有算子都需要维护。 > > > > -- > Sent from: http://apache-flink.147419.n8.nabble.com/ >

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Tony Wei
Hi Till, I have created the ticket to extend the description of `execution.targe`. https://issues.apache.org/jira/browse/FLINK-22476 best regards, Tony Wei 於 2021年4月26日 週一 下午5:24寫道: > Hi Till, Yangze, > > I think FLINK-15852 should solve my problem. > It is my fault that my flink version is

Flink Hive connector: hive-conf-dir supports hdfs URI, while hadoop-conf-dir supports local path only?

2021-04-26 Thread Yik San Chan
Hi community, This question is cross-posted on Stack Overflow https://stackoverflow.com/questions/67264156/flink-hive-connector-hive-conf-dir-supports-hdfs-uri-while-hadoop-conf-dir-sup In my current setup, local dev env can access testing env. I would like to run Flink job on local dev env,

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Tony Wei
Hi Till, Yangze, I think FLINK-15852 should solve my problem. It is my fault that my flink version is not 100% consistent with the community version, and FLINK-15852 is the one I missed. Thanks for your information. best regards, Till Rohrmann 於 2021年4月26日 週一 下午5:14寫道: > I think you are right

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Yangze Guo
If the GenericCLI is selected, then the execution.target should have been overwritten to "yarn-application" in GenericCLI#toConfiguration. It is odd that why the GenericCLI#isActive return false as the execution.target is defined in both flink-conf and command line. Best, Yangze Guo On Mon, Apr

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Till Rohrmann
I think you are right that the `GenericCLI` should be the first choice. >From the top of my head I do not remember why FlinkYarnSessionCli is still used. Maybe it is in order to support some Yarn specific cli option parsing. I assume it is either an oversight or some parsing has not been

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Yangze Guo
Hi, Till, I agree that we need to resolve the issue by overriding the configuration before selecting the CustomCommandLines. However, IIUC, after FLINK-15852 the GenericCLI should always be the first choice. Could you help me to understand why the FlinkYarnSessionCli can be activated? Best,

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Till Rohrmann
Hi Tony, I think you are right that Flink's cli does not behave super consistent at the moment. Case 2. should definitely work because `-t yarn-application` should overwrite what is defined in the Flink configuration. The problem seems to be that we don't resolve the configuration wrt the

Re: Flink Event specific window

2021-04-26 Thread Swagat Mishra
Hi Arvid, On 2 - I was referring to stateful functions as an alternative to windows, but in this particular use case, its not fitting in exactly I think, though a solution can be built around it. On the overall approach here what's the right way to use Flink SQL: Every event has the transaction

Deployment/Memory Configuration/Scalability

2021-04-26 Thread Radoslav Smilyanov
Hi all, I am having multiple questions regarding Flink :) Let me give you some background of what I have done so far. *Description* I am using Flink 1.11.2. My job is doing data enrichment. Data is consumed from 6 different kafka topics and it is joined via multiple CoProcessFunctions. On a

Re: MemoryStateBackend Issue

2021-04-26 Thread Matthias Pohl
I'm not sure what you're trying to achieve. Are you trying to simulate a task failure? Or are you trying to pick up the state from a stopped job? You could achieve the former one by killing the TaskManager instance or by throwing a custom failure as part of your job pipeline. The latter one can be

Re: Checkpoint error - "The job has failed"

2021-04-26 Thread Yun Tang
Hi Dan, I think you might use older version of Flink and this problem has been resolved by FLINK-16753 [1] after Flink-1.10.3. [1] https://issues.apache.org/jira/browse/FLINK-16753 Best Yun Tang From: Robert Metzger Sent: Monday, April 26, 2021 14:46 To: Dan

DataStreamAPI 与flink sql疑问

2021-04-26 Thread 张锴
flink版本使用的是1.12.2.。请问如果在Dstream 上用一些Operater,比如map ,flatmap,process等,可以在其重写的方法中使用tableEnv.sqlQuery("xxx") tableEnv.createTemporaryView(),这种sql吗,能这样结合吗?

Re: Re: pojo warning when using auto generated protobuf class

2021-04-26 Thread Yun Gao
Hi Prashant, Flink should always give warnings as long as the deduced result is GenericType, no matter it uses the default kryo serializer or the register one, thus if you have registered the type, you may simply ignore the warnings. To make sure it works, you may find the tm that the source

Re: kafka consumers partition count and parallelism

2021-04-26 Thread Robert Metzger
Hey Prashant, the Kafka Consumer parallelism is constrained by the number of partitions the topic(s) have. If you have configured the Kafka Consumer in Flink with a parallelism of 100, but your topic has only 20 partitions, 80 consumer instances in Flink will be idle. On Mon, Apr 26, 2021 at 2:54

Re: Checkpoint error - "The job has failed"

2021-04-26 Thread Robert Metzger
Hi Dan, can you provide me with the JobManager logs to take a look as well? (This will also tell me which Flink version you are using) On Mon, Apr 26, 2021 at 7:20 AM Dan Hill wrote: > My Flink job failed to checkpoint with a "The job has failed" error. The > logs contained no other recent

Re: Kubernetes Setup - JM as job vs JM as deployment

2021-04-26 Thread Yangze Guo
Hi, Gil IIUC, you want to deploy Flink cluster using YAML files yourselves and want to know whether the JM should be deployed as Job[1] or Deployment. If that is the case, as Matthias mentioned, Flink provides two ways to integrate with K8S [2][3], in [3] the JM will be deployed as a Deployment.

Re: when should `FlinkYarnSessionCli` be included for parsing CLI arguments?

2021-04-26 Thread Yangze Guo
Hi, Tony. What is the version of your flink-dist. AFAIK, this issue should be addressed in FLINK-15852[1]. Could you give the client log of case 2(set the log level to DEBUG would be better). [1] https://issues.apache.org/jira/browse/FLINK-15852 Best, Yangze Guo On Sun, Apr 25, 2021 at 11:33

Re: Read Hive table in Stream Mode use distinct cause heap OOM

2021-04-26 Thread Shengkai Fang
Hi, could you tell me which version do you use? I just want to check whether there are any problems. Best, Shengkai 张颖 于2021年4月25日周日 下午5:23写道: > hi,I met an appearance like this: > > this is my sql: > SELECT distinct header,label,reqsig,dense_feat,item_id_feat,user_id_feat > FROM

Re: Contiguity in SQL vs CEP

2021-04-26 Thread Dawid Wysakowicz
Hi, MATCH_RECOGNIZE clause in SQL standard does not support different contiguities. The MATCH_RECOGNIZE always uses the strict contiguity. Best, Dawid On 21/04/2021 00:02, tbud wrote: > There's 3 different types of Contiguity defined in the CEP documentation [1] > looping + non-looping --

Re: Contiguity and state storage in CEP library

2021-04-26 Thread Dawid Wysakowicz
Hi, Yes you are correct that if an event can not match any pattern it won't be stored in state. If you process your records in event time it might be stored for a little while before processing in order to sort the incoming records based on time. Once a Watermark with a higher timestamp comes it

Re: Watermarks in Event Time Temporal Join

2021-04-26 Thread Maciej Bryński
Hi Shengkai, Thanks for the answer. The question is do we need to determine if an event in the main stream is late. Let's look at interval join - event is emitted as soon as there is a match between left and right stream. I agree the watermark should pass on versioned table side, because this is

Re: Dynamic Table Options 被优化器去掉了

2021-04-26 Thread Shengkai Fang
hi, macial kk. 看样子是个bug,能提供以下你的ddl以及相关的环境吗?方便我们复现下问题。 Best, Shengkai plan的digest是不会打印connector的option的值的,因此你是没有办法通过plan来判断是否生效了。 macia kk 于2021年4月26日周一 上午12:31写道: > Hi > > 我有在使用 temporal Joini 的时候有设置 如果读取分区的相关的 dynamic > option,但是最后是没有生效的,我看全部使用的默认参数,打印出来了执行计划,逻辑执行计划是有的,优化之后没有了 >