Re: FlinkSQL submit query and then the jobmanager failed.

2021-01-24 Thread Matthias Pohl
Hi, thanks for reaching out to the community. I'm not an Hive nor Orc format expert. But could it be that this is a configuration problem? The error is caused by an ArrayIndexOutOfBounds exception in ValidReadTxnList.readFromString on an array generated by splitting a String using colons as

Re:Re: Re: Test failed in flink-end-to-end-tests/flink-end-to-end-tests-common-kafka

2021-01-24 Thread Smile@LETTers
Hi Matthias, Sorry for my miss leading. I mean kafka-schema-serializer rather than kafka-avro-serializer. io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe is in kafka-schema-serializer and kafka-schema-serializer should be a dependency of kafka-avro-serializer according to their pom.xml

Re: 退订

2021-01-24 Thread Leonard Xu
Hi 需要取消订阅邮件, 可以发送任意内容的邮件到 user-zh-unsubscr...@flink.apache.org 取消订阅来自 user-zh@flink.apache.org 邮件列表的邮件 邮件列表的订阅管理,请参考[1] 祝好, Leonard [1] https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list >

Re: flink sql 执行limit 很少的语句依然会暴增

2021-01-24 Thread zhang hao
flink run -py new_jdbc_source.py Traceback (most recent call last): File "new_jdbc_source.py", line 66, in st_env.execute_sql("select * from feature_bar_sink").print() File "/Users/derek.bao/dev-utils/flink-1.11.2/opt/python/pyflink.zip/pyflink/table/table_environment.py", line 543, in

Re: 退订

2021-01-24 Thread Far
退订,为什么不起作用 发自我的iPhone > 在 2021年1月24日,下午9:55,唐军亮 写道: >

Re: Flink sql去重问题

2021-01-24 Thread Leonard Xu
Hello 特殊的Top-N是说去重的语义是Top 1, 所以只用保留一个大小的堆,其底层实现和其他Top-N的数据结构不一样,并不需要维护一个堆, 其他的数据根据语义 要么被丢掉,要么撤回下发新值,另外这种有状态的算子,数据都是放在state里的,设置的TTL是生效的,表示state中的数据有效期时多久,这个数据会用来判断新来的数据是丢掉还是撤回旧值并下发新的值。 祝好, Leonard > 在 2021年1月22日,10:53,guaishushu1...@163.com 写道: > > >

Re: FlinkSQL1.12查询hive表很快finished;No more splits available

2021-01-24 Thread 赵一旦
这个其实还是挺乱的,我看了下hive-storage-api貌似也不是肯定没用。 我基于flink的data-stream-api的filesink方式写hive,orc格式文件。引入的是flink-orc包,内部依赖hive-storage-api中。这个我刚刚尝试去除换成hive-exec等,结果不行,因为少了部分类,比如 MapColumnVector等。 不过之前测试过写入没问题。所以看样子我data-stream写hive的时候是需要依赖flink-orc包,也就简介引入了hive-storage-api包,这是必须的。

Re: Caused by: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V

2021-01-24 Thread yujianbo
请教一下大佬后来如何解决,我的hadoop和hive版本跟您一致。 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Initializing broadcast state

2021-01-24 Thread Guowei Ma
Hi,Nick Normally you could not iterate all the keyed states, but the `BroadCastState` & `applyTokeyedState` could do that. For example, before you get the broadcast side elements you might choose to cache the non-broadcast element to the keyed state. After the broadcast elements arrive you need to

Re: FlinkSQL1.12查询hive表很快finished;No more splits available

2021-01-24 Thread 赵一旦
基于这个回答,还有另一个英文email中有个人说我的hive-common和hive-exec不一致的问题。

Re: FlinkSQL1.12查询hive表很快finished;No more splits available

2021-01-24 Thread 赵一旦
我hive版本应该是1.2.1,我看spark部分依赖的1.2.1的包。 此外,关于hive配置。此处我需要问下,flink集群需要有hive的依赖嘛是?我flink集群本身没加任何hive依赖。 只是在flink的sql-client启动的时候通过-l参数指定了部分包,这个包是基于flink官网文档给的那个flink-sql-connector-hive-1.2.2。 此外,在flink-client端的conf中(即classpath中)加了hive-site.xml配置,内部也仅指定了最基础的一些关于metastore的连接信息。 Rui Li 于2021年1月25日周一

Re: Initializing broadcast state

2021-01-24 Thread Nick Bendtner
Thanks Guowei. Another question I have is, what is the use of a broadcast state when I can update a map state or value state inside of the process broadcast element method and use that state to do a lookup in the process element method like this example

Unable to query/print the incomplete bucket state

2021-01-24 Thread Falak Kansal
Hi, greetings I am applying window operations on a datastream. Then I apply some transformation (it could be anything). Let's say I keep the window size to 1 minute and data is coming in a strictly increasing timestamp and let's say watermark is 1 ms (checkpointing is also enabled). There would

Re: FlinkSQL1.12查询hive表很快finished;No more splits available

2021-01-24 Thread Rui Li
你好, 关于分区字段的filter,flink与hive的隐式类型转换规则不同,建议在写where条件时按照分区字段的类型来指定常量。 关于读ORC的异常,请问你的hive版本是多少呢?另外hive配置中是否指定过hive.txn.valid.txns参数? On Sun, Jan 24, 2021 at 6:45 AM 赵一旦 wrote: > 补充(1)FlinkSQL的查询,对于分区字符串字段貌似必须加'',不加就查询不到?如上hour=02这种直接导致no more split。 >

Re: Initializing broadcast state

2021-01-24 Thread Guowei Ma
Hi, Nick You might need to handle it yourself If you have to process an element only after you get the broadcast state. For example, you could “cache” the element to the state and handle it when the element from the broadcast side elements are arrived. Specially if you are using the

Initializing broadcast state

2021-01-24 Thread Nick Bendtner
Hi guys, What is the way to initialize broadcast state(say with default values) before the first element shows up in the broadcasting stream? I do a lookup on the broadcast state to process transactions which come from another stream. The problem is the broadcast state is empty until the first

Re: Where should a secondary flow for late events processing be defined?

2021-01-24 Thread Guowei Ma
Hi, Jose What I understand your question is Your job has two stages. You want to handle the first stage differently according to the event time of the Stream A. It means that if the event time of Stream A is “too late” then you would enrich Stream A with the external system and or you would

Re: Caused by: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V

2021-01-24 Thread yang nick
应该是guava包冲突问题,请参考这篇文章(参考) https://blog.csdn.net/u012121587/article/details/103903162 董海峰(Sharp) 于2021年1月24日周日 上午9:11写道: > Hi,您好啊,我最近遇到一个问题,在社区里发过,但是没人回答,想请教您一下,烦请有空的时候回复一下,谢谢您啦。 > hadoop3.3.0 flink1.12 hive3.12 > I want to integrate hive and flink. After I configure the >

Re: Flink 1.11 checkpoint compatibility issue

2021-01-24 Thread Lu Niu
Hi, Thanks all for replying. 1. The code uses data stream api only. In the code, we use env.setMaxParallsim() api but not use any operator.setMaxParallsim() api. We do use setParallsim() on each operator. 2. We did set uids for each operator and we can find uids match in two savepoints. 3.

Re: Job execution graph state - INITIALIZING

2021-01-24 Thread Chesnay Schepler
INITIALIZING is the very first state a job is in. It is the state of a job that has been accepted by the JobManager, but the processing of said job has not started yet. In other words, INITIALIZING = submitted job, CREATED = data-structures and components required for scheduling have been

Where should a secondary flow for late events processing be defined?

2021-01-24 Thread Jose Velasco
Hi everybody! I'm so excited to be here asking a first question about Flink DataStream API. I have a basic enrichment pipeline (event time). Basically, there's a main stream A (Kafka source) being enriched with the info of 2 other streams: B and C. (Kafka sources as well). Basically, the

Re: Flink to BigTable

2021-01-24 Thread Niels Basjes
Hi, I haven't tried it myself yet but there is a Flink connector for HBase and I remember someone telling me that Google has made a library available which is effectively the HBase client which talks to BigTable in the backend. Like I said: I haven't tried this yet myself. Niels Basjes Op zo

Flink to BigTable

2021-01-24 Thread Pierre Oberholzer
Dear Community, I would like to use BigTable as a sink for a Flink job: 1) Is there a connector out-of-the-box ? 2) Can I use Datastream API ? 3) How can I optimally pass a sparse object (99% sparsity), i.e. ensure no key/value are created in BigTable for nulls ? I have searched the

Job execution graph state - INITIALIZING

2021-01-24 Thread Nikola Hrusov
Hello, I have looked into this issue: https://issues.apache.org/jira/browse/FLINK-16866 which supposedly adds "INITIALIZING" state. I tried to find the documentation here: - https://ci.apache.org/projects/flink/flink-docs-release-1.12/internals/job_scheduling.html#jobmanager-data-structures -

Re: Flink on yarn JDK 版本支持问题

2021-01-24 Thread Yun Tang
Hi, MaxMetaspaceSize 是在JDK8中新增的,用以取代以前的PermGen[1],JDK7中自然不支持。可以在hadoop集群中再安装JDK8,将 env.java.home 指向新的JDK [1] https://www.baeldung.com/java-permgen-metaspace#metaspace 祝好 唐云 From: Jacob <17691150...@163.com> Sent: Saturday, January 23, 2021 16:17 To:

Re: unsubscribe

2021-01-24 Thread Matthias Pohl
Hi Abhishek, unsubscribing works by sending an email to user-unsubscr...@flink.apache.org as stated in [1]. Best, Matthias [1] https://flink.apache.org/community.html#mailing-lists On Sun, Jan 24, 2021 at 3:06 PM Abhishek Jain wrote: > unsubscribe >

unsubscribe

2021-01-24 Thread Abhishek Jain
unsubscribe

退订

2021-01-24 Thread 唐军亮
退订

FlinkSQL submit query and then the jobmanager failed.

2021-01-24 Thread 赵一旦
As the title, my query sql is very simple, it just select all columns from a hive table(version 1.2.1; orc format). When the sql is submitted, after several seconds, the jobmanager is failed. Here is the Jobmanager's log. Does anyone can help to this problem? 2021-01-24 04:41:24,952 ERROR

Re: Flink 1.11 checkpoint compatibility issue

2021-01-24 Thread 赵一旦
I think you need provide all the parallelism information, such like the operator info 'Id: b0936afefebc629e050a0f423f44e6ba, maxparallsim: 4096'. What is the parallelism, the maxparallism maybe be generated from the parallelism you have set. Arvid Heise 于2021年1月22日周五 下午11:03写道: > Hi Lu, > > if