A question about offsets when Flink consumes Kafka

2019-08-26 Thread 1900
flink on yarn. Flink 1.7.2, Hadoop 2.8.5, Kafka 1.0.0. Flink consumes from Kafka; how are the Kafka offsets handled? Properties props = new Properties(); props.put("auto.offset.reset", "latest"); DataStream data = env.addSource(new ...Schema(), props));
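For context, a minimal runnable sketch of how the consumer start position interacts with auto.offset.reset in the Flink 1.7 Kafka connector; the topic name, broker address, and group id are assumptions, not from the original post:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaOffsetExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092"); // assumed broker address
            props.put("group.id", "example-group");        // assumed consumer group
            // Consulted only when the group has no committed offset for a partition.
            props.put("auto.offset.reset", "latest");

            FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("example-topic", new SimpleStringSchema(), props);
            // Default behavior: resume from the group's committed offsets, falling back
            // to auto.offset.reset when none exist. setStartFromLatest() or
            // setStartFromEarliest() would ignore committed offsets entirely.
            consumer.setStartFromGroupOffsets();

            DataStream<String> stream = env.addSource(consumer);
            stream.print();
            env.execute("kafka-offset-example");
        }
    }

Note that when checkpointing is enabled, offsets restored from Flink state take precedence over both of the mechanisms above.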

Loading dylibs

2019-08-26 Thread Vishwas Siravara
Hi guys, I have a flink application that loads a dylib like this: System.loadLibrary("vibesimplejava"); The application runs fine, but when I restart the job I get this exception: com.visa.aip.cryptolib.aipcyptoclient.EncryptionException: Unexpected error java.lang.UnsatisfiedLinkError: Native
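For readers hitting the same thing: a JVM lets a native library be bound to only one classloader at a time, and after a job restart the task runs in a fresh user-code classloader while "vibesimplejava" may still be bound to the old one. The usual remedy is to load the library from a class on Flink's parent classpath (e.g. a jar placed in lib/) rather than from user code. The sketch below only shows an idempotent guard against repeated loads within one classloader, with an assumed wrapper class:

    public final class NativeLoader {

        private static volatile boolean loaded = false;

        // Idempotent load guard: tolerate the duplicate-load error that appears
        // when the library is already bound in this JVM.
        public static synchronized void ensureLoaded() {
            if (loaded) {
                return;
            }
            try {
                System.loadLibrary("vibesimplejava");
            } catch (UnsatisfiedLinkError e) {
                String msg = e.getMessage();
                if (msg == null || !msg.contains("already loaded")) {
                    throw e; // genuinely missing library, not a duplicate load
                }
            }
            loaded = true;
        }

        private NativeLoader() {}
    }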

Re: How can TMs distribute evenly over Flink on YARN cluster?

2019-08-26 Thread Yang Wang
Hi Qi, If you want better isolation between different flink jobs and multi-tenant support, I suggest you use the per-job mode. Each flink job is a yarn application, and yarn uses cgroups to limit the resources used by each application. Best, Yang Qi Kang wrote on Mon, Aug 26, 2019 at 9:02 PM: >

Fwd: FLINK TABLE API UDAF QUESTION, For heap backends, the new state serializer must not be incompatible

2019-08-26 Thread orlando qi
-- Forwarded message - From: orlando qi Date: Fri, Aug 23, 2019 at 10:44 AM Subject: FLINK TABLE API UDAF QUESTION, For heap backends, the new state serializer must not be incompatible To: Hello everyone: I defined a UDAF function while using the FLINK TABLE API to achieve the
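For readers following this thread: a minimal sketch of a Table API UDAF of the kind being described; the class, accumulator, and fields are illustrative, not the poster's code. The serializer error in the subject typically appears when the accumulator class changes between taking the checkpoint and restoring from it:

    import org.apache.flink.table.functions.AggregateFunction;

    public class LatestValue extends AggregateFunction<String, LatestValue.Accum> {

        // Accumulator as a plain public POJO. Keeping this class stable across job
        // versions (or evolving it only in POJO-schema-compatible ways) is what avoids
        // "the new state serializer must not be incompatible" on restore.
        public static class Accum {
            public long ts;
            public String value;
        }

        @Override
        public Accum createAccumulator() {
            return new Accum();
        }

        @Override
        public String getValue(Accum acc) {
            return acc.value;
        }

        // Resolved by the runtime via reflection, called once per input row.
        public void accumulate(Accum acc, String value, Long ts) {
            if (value != null && ts != null && ts >= acc.ts) {
                acc.ts = ts;
                acc.value = value;
            }
        }
    }

Registered with tEnv.registerFunction("latest_val", new LatestValue()), it can then be used like a built-in aggregate in a GROUP BY query.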

Re: Flink 1.9 build failed

2019-08-26 Thread Eliza
Hi on 2019/8/27 11:35, Simon Su wrote: Could not resolve dependencies for project org.apache.flink:flink-s3-fs-hadoop:jar:1.9-SNAPSHOT: Could not find artifact org.apache.flink:flink-fs-hadoop-shaded:jar:tests:1.9-SNAPSHOT in maven-ali (http://maven.aliyun.com/nexus/content/groups/public/)

A question about computing ratios in real time

2019-08-26 Thread ddwcg
Self-joining a table to compute ratios at two granularities; for example, the query below computes monthly sales as a share of yearly sales. appendTable is registered from a stream computed upstream with tumbling windows, and tumble_end_time is the end time of a window. Will an inner join like this lose data? Is there a better way to compute the ratio? select months,monthAmount/yearAmount as amountRate from (select months,years,amount as monthAmount,tumble_end_time from appendTable) a join (select

Flink 1.9 build failed

2019-08-26 Thread Simon Su
Hi all, I’m trying to build the flink 1.9 release branch, and it raises an error like: Could not resolve dependencies for project org.apache.flink:flink-s3-fs-hadoop:jar:1.9-SNAPSHOT: Could not find artifact org.apache.flink:flink-fs-hadoop-shaded:jar:tests:1.9-SNAPSHOT in maven-ali

Re: FLINK TABLE API custom aggregate function (UDAF) fails to restore from checkpoint with an incompatible state serializer (For heap backends, the new state serializer must not be incompatible)

2019-08-26 Thread orlando qi
Nothing changed; I'm mainly testing whether restoring the job succeeds. import java.lang.{Double => JDouble, Long => JLong, String => JString} import com.vrv.bigdata.scala.datetime.DateTimeUtil import org.apache.flink.api.common.typeinfo.{TypeInformation, Types} import org.apache.flink.table.functions.AggregateFunction /** * Custom aggregate function: update a column's value

Re: Usage of ROW_NUMBER() OVER

2019-08-26 Thread Jark Wu
Hi, your query is not Top-N syntax. You can read this document first to learn the Top-N syntax: http://blink.flink-china.org/dev/table/sql.html#topn Best, Jark > On Aug 26, 2019, at 19:02, ddwcg <3149768...@qq.com> wrote: > > > The docs don't yet explain how to use Top-N; I tried row_number() over()
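For reference, the Top-N shape the linked document describes is an outer filter over a ROW_NUMBER() window, sketched here inside a Java Table API call; the table and column names are taken loosely from the quoted query and are illustrative:

    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    public class TopNExample {

        // Top-N in Blink SQL: an outer query filtering on a ROW_NUMBER() column.
        // "mytable" and its columns are illustrative and must already be registered.
        public static Table top3PerProvince(StreamTableEnvironment tEnv) {
            return tEnv.sqlQuery(
                "SELECT id, province, amount FROM (" +
                "  SELECT *," +
                "    ROW_NUMBER() OVER (PARTITION BY province ORDER BY amount DESC) AS row_num" +
                "  FROM mytable" +
                ") WHERE row_num <= 3");
        }
    }

Because rows can enter and leave the Top-N as new data arrives, the result is an updating table and must be consumed as a retract/upsert stream, which answers the quoted question.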

Re: Where can I find the Blink documentation in Flink 1.9?

2019-08-26 Thread Jark Wu
After Blink was merged into flink, it exists as one planner implementation, so its documentation lives together with flink's. For how to use the blink planner, see: https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/common.html#create-a-tableenvironment
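A minimal sketch following the linked page for selecting the Blink planner in Flink 1.9:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    public class BlinkPlannerSetup {

        // Flink 1.9 ships two planners; the Blink planner must be selected explicitly.
        public static StreamTableEnvironment create() {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
            return StreamTableEnvironment.create(env, settings);
        }
    }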

Re: Serialization question when using RocksDB as the Flink state backend

2019-08-26 Thread Jark Wu
I think Congxian is right. POJO Schema Evolution is the feature you want. Best, Jark > On Aug 26, 2019, at 21:52, Congxian Qiu wrote: > > hi, take a look at POJO Schema Evolution [1], supported since 1.8, and check whether it meets your needs; note that your class must satisfy Flink's > definition of a POJO [2] > [1] >

Re: Where can I find the Blink documentation in Flink 1.9?

2019-08-26 Thread Zili Chen
The Blink documentation should all be at [1]; it was never meant to change along with Flink releases (x Best, tison. [1] https://github.com/apache/flink/blob/blink/README.md rockey...@163.com wrote on Tue, Aug 27, 2019 at 10:18 AM: > > hi,all > Where can I find the Blink documentation in flink1.9? I've searched for ages without finding it 0.0 > > > rockey...@163.com > Have a good day !

Where can I find the Blink documentation in Flink 1.9?

2019-08-26 Thread rockey...@163.com
hi,all Where can I find the Blink documentation in flink1.9? I've searched for ages without finding it 0.0 rockey...@163.com Have a good day !

Reply: On the overly complex construction of the elasticSearch table sink

2019-08-26 Thread aven . wu
Hello: You can build the indexRequest yourself and set the id, type, source, and other fields. Does ElasticsearchSinkFunction meet your needs? From: Jark Wu Sent: Aug 26, 2019 18:00 Subject: Re: On the overly complex construction of the elasticSearch table sink > For an ETL job, can a specific field be designated as the es document id? I tried syncing detail data into es, but the id is randomly generated. As far as I know, it is not supported yet. You could open a JIRA to raise the need with the community. With the blink planner, you can use deduplicate
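A sketch of the suggestion above, assuming the ES6 DataStream connector; the event class, index, and field names are illustrative:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.flink.api.common.functions.RuntimeContext;
    import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
    import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.client.Requests;

    public class KeyedEsSink implements ElasticsearchSinkFunction<KeyedEsSink.MyEvent> {

        // Illustrative event type; any record with a natural key works the same way.
        public static class MyEvent {
            public String orderId;
            public String payload;
        }

        @Override
        public void process(MyEvent element, RuntimeContext ctx, RequestIndexer indexer) {
            Map<String, Object> source = new HashMap<>();
            source.put("payload", element.payload);

            IndexRequest request = Requests.indexRequest()
                .index("orders")         // illustrative index name
                .type("_doc")
                .id(element.orderId)     // explicit document id taken from the record
                .source(source);
            indexer.add(request);
        }
    }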

Re: Customize file assignments logic in flink application

2019-08-26 Thread Lu Niu
Yes, you are right. SplittableIterator will cause each worker to list all the files. Thanks! Best, Lu On Fri, Aug 16, 2019 at 12:33 AM Zhu Zhu wrote: > Hi Lu, > > I think it's OK to choose any way as long as it works. > Though I've no idea how you would extend SplittableIterator in your case. >

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-26 Thread Oytun Tez
Thanks Till and Zili! I see that the docker-flink repo now has 1.9 set up; we are only waiting for it to be pushed to Docker Hub. We should be fine once that is done. Thanks again! --- Oytun Tez *M O T A W O R D* The World's Fastest Human Translation Platform. oy...@motaword.com —

Re: type error with generics ..

2019-08-26 Thread Debasish Ghosh
Thanks for the clear explanation .. On Mon, Aug 26, 2019 at 10:34 PM Seth Wiesman wrote: > Hi Debasish, > > As it seems you're aware, TypeInformation is Flink’s internal type system > used for serialization between tasks and in/out of state backends. > > The issue you are seeing is because you are

Re: type error with generics ..

2019-08-26 Thread Debasish Ghosh
Actually the scala and java code are completely separate - in fact they are part of separate test suites. We have both scala and Java APIs in our application but they are completely separate .. and yeah, in Scala the implicits did the trick while I had to pass the TypeInformation explicitly with

Re: type error with generics ..

2019-08-26 Thread Rong Rong
Glad that you sorted it out, and sorry for the late reply. Yes, I think the problem is how your `TypeInformation` for `Data` is being passed to the DataStreamSource construct. Regarding why the scala side works but not java, there might've been something to do with the implicit variable passing for your

Re: Serialization question when using RocksDB as the Flink state backend

2019-08-26 Thread Congxian Qiu
hi, take a look at POJO Schema Evolution [1], supported since 1.8, and check whether it meets your needs; note that your class must satisfy Flink's definition of a POJO [2] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/schema_evolution.html [2]
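To illustrate [2], a sketch of a class that Flink's type extractor recognizes as a POJO and that can therefore use schema evolution; the names are illustrative:

    // Public class, public no-argument constructor, and public fields (or
    // getters/setters) make Flink pick the POJO serializer, which is the
    // precondition for state schema evolution since 1.8.
    public class SensorReading {

        public String sensorId;
        public long timestamp;
        public double value;

        // A field added in a later job version: with the POJO serializer,
        // restoring old state simply initializes it to its default value.
        public double calibrationOffset;

        public SensorReading() {} // required public no-arg constructor
    }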

Running flink example programs-WordCount

2019-08-26 Thread RAMALINGESWARA RAO THOTTEMPUDI
Hi, I am using the command " ./bin/flink run ./examples/batch/WordCount.jar --input /home/trrao/Desktop/ram2.txt --output /home/trrao/Desktop/ramop.txt " But I am getting " Caused by: java.net.ConnectException: Connection refused" Kindly give the correct way to run the wordcount example in

Re: How can TMs distribute evenly over Flink on YARN cluster?

2019-08-26 Thread Qi Kang
Hi Yang, Many thanks for your detailed explanation. We are using Hadoop 2.6.5, so setting the multiple-assignments-enabled parameter is not an option. BTW, do you prefer using a YARN session cluster rather than per-job clusters in this situation? These YARN nodes are almost dedicated to Flink

Usage of ROW_NUMBER() OVER

2019-08-26 Thread ddwcg
The docs don't yet explain how to use Top-N; I tried row_number() over() and it failed. Can Top-N produce a RetractStream? val monthstats = bsTableEnv.sqlQuery( """ |select |id,province,amount, |row_number() over(partition by id,province order by amount ) as rn |from mytable where type=1 |group by

Re: tumbling event time window , parallel

2019-08-26 Thread Fabian Hueske
I would use a regular ProcessFunction, not a WindowProcessFunction. The final WM depends on how the records were partitioned at the watermark assigner (and the assigner itself). AFAIK, the distribution of files to source reader tasks is not deterministic. Hence, the final WM changes from run to
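A sketch of such a ProcessFunction, routing records that are behind the current watermark to a side output; the Event type is a placeholder:

    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class LateRecordSplitter extends ProcessFunction<LateRecordSplitter.Event, LateRecordSplitter.Event> {

        // Illustrative event type; payload fields omitted.
        public static class Event {}

        public static final OutputTag<Event> LATE = new OutputTag<Event>("late") {};

        @Override
        public void processElement(Event event, Context ctx, Collector<Event> out) {
            Long ts = ctx.timestamp(); // the record's event-time timestamp, if assigned
            if (ts != null && ts < ctx.timerService().currentWatermark()) {
                // Behind the watermark: a downstream window would drop it as late.
                ctx.output(LATE, event);
            } else {
                out.collect(event);
            }
        }
    }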

Re: Error consuming Kafka with flink 1.9

2019-08-26 Thread ddwcg
Thank you for your patient answers; it was a local cache problem and is now solved. > On Aug 26, 2019, at 17:56, ddwcg <3149768...@qq.com> wrote: > > I added them all and it still fails. Below are my pom file and a screenshot of the libraries > > > > apache.snapshots > Apache Development Snapshot Repository > https://repository.apache.org/content/repositories/snapshots/ >

Re: How can TMs distribute evenly over Flink on YARN cluster?

2019-08-26 Thread Yang Wang
Hi Qi Kang, If you mean to spread all taskmanagers evenly across the yarn cluster, it is a pity that flink can do nothing about it. Each per-job flink cluster is an individual application on the yarn cluster; they do not know of the existence of the others. Could you share the yarn version? If it is above

RE: tumbling event time window , parallel

2019-08-26 Thread Hanan Yehudai
You said “ You can use a custom ProcessFunction and compare the timestamp of each record with the current watermark.”. Does the window process function have all the events – even the ones that are dropped due to lateness? From what I understand, the “iterable” argument contains the record

Re: Error consuming Kafka with flink 1.9

2019-08-26 Thread Jark Wu
It looks like you are depending on an old version of EnvironmentSettings, probably due to the local mvn cache. Try clearing the “~/.m2/repository/org/apache/flink/flink-table-api-java” directory. Best, Jark > On Aug 26, 2019, at 17:56, ddwcg <3149768...@qq.com> wrote: > > I added them all and it still fails. Below are my pom file and a screenshot of the libraries > > > > apache.snapshots > Apache Development

Re: On the overly complex construction of the elasticSearch table sink

2019-08-26 Thread Jark Wu
> For an ETL job, can a specific field be designated as the es document id? I tried syncing detail data into es, but the id is randomly generated. As far as I know, it is not supported yet. You could open a JIRA to raise the need with the community. With the blink planner, you can use deduplicate with keeping first row, a relatively lightweight deduplication that gives you a key (namely the dedup key). The docs are still under review; you can look at this PR for now:

Re: [DataStream API] Best practice to handle custom session window - time gap based but handles event for ending session

2019-08-26 Thread Fabian Hueske
Hi Jungtaek, Sorry for the slow reply and thanks for the feedback on the book! :-) As I said, I don't think Flink's windowing API is well suited to deal with the problem of manually terminated session windows due to the lack of support for splitting windows. Given that Spark has similar support for timers,

Re: tumbling event time window , parallel

2019-08-26 Thread Fabian Hueske
Hi, The paths of the files to read are distributed across all reader / source tasks and each task reads the files in order of their modification timestamp. The watermark generator is not aware of any files and just looks at the stream of records produced by the source tasks. You need to choose the

Re: OVER operator filtering out records

2019-08-26 Thread Fabian Hueske
Hi Vinod, This sounds like a watermark issue to me. The commonly used watermark strategies (like bounded out-of-order) are only advancing when there is a new record. Moreover, the current watermark is the minimum of the current watermarks of all input partitions. So, the watermark only moves
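As an illustration of the bounded out-of-order strategy mentioned above, a sketch with an assumed Event type and a 10-second bound:

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // Illustrative event type with an event-time field.
    class Event {
        public long ts;
    }

    public class EventTimestampAssigner extends BoundedOutOfOrdernessTimestampExtractor<Event> {

        public EventTimestampAssigner() {
            super(Time.seconds(10)); // tolerate up to 10 seconds of out-of-orderness
        }

        @Override
        public long extractTimestamp(Event event) {
            return event.ts;
        }
    }

It is attached with stream.assignTimestampsAndWatermarks(new EventTimestampAssigner()); as described above, the resulting watermark only advances while records keep arriving on every parallel input.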

Re: Error consuming Kafka with flink 1.9

2019-08-26 Thread ddwcg
hi, I specified the blink planner and still get the same error. object StreamingJob { def main(args: Array[String]) { val bsEnv = StreamExecutionEnvironment.getExecutionEnvironment val bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build() val bsTableEnv =

Re: Re: On the overly complex construction of the elasticSearch table sink

2019-08-26 Thread hb
For a statement without group by, e.g. just select * from table (detail rows), written to es via DDL: can a specific field be designated as the es primary-key id? I tried syncing the detail data into es, but the id is randomly generated. On 2019-08-26 15:47:53, "Jark Wu" wrote: >Yes, the descriptor and DDL can be used for this scenario: writing table query results directly to a sink. > >Best, >Jark > > > >> On Aug 26, 2019, at 16:44, 巫旭阳 wrote: >> >> Thanks for the answer, >> My intent is to build an ES table sink that can write table

How can TMs distribute evenly over Flink on YARN cluster?

2019-08-26 Thread Qi Kang
Hi, We got 3 Flink jobs running on a 10-node YARN cluster. The jobs were submitted in a per-job flavor, with same parallelism (10) and number of slots per TM (2). We originally assumed that TMs should automatically spread across the cluster, but what came out was just the opposite: All 5 TMs

Re: On the overly complex construction of the elasticSearch table sink

2019-08-26 Thread Jark Wu
Yes, the descriptor and DDL can be used for exactly this scenario: writing table query results directly to a sink. Best, Jark > On Aug 26, 2019, at 16:44, 巫旭阳 wrote: > > Thanks for the answer, > My intent is to build an ES table sink that writes table query results directly to ES, avoiding converting to a DataStream and writing via ESSink > > > > > > > On 2019-08-26 16:39:49, "Jark Wu" wrote: >> Hi , >> >> >> Elasticsearch6UpsertTableSink is marked as

Re: Re: On the overly complex construction of the elasticSearch table sink

2019-08-26 Thread 巫旭阳
Thanks for the answer. My intent is to build an ES table sink that writes table query results directly to ES, avoiding converting to a DataStream and writing via ESSink. On 2019-08-26 16:39:49, "Jark Wu" wrote: >Hi , > > >Elasticsearch6UpsertTableSink is marked @internal and is not meant to be constructed directly by users. >To register an ES sink, you can use the descriptor API, i.e. >org.apache.flink.table.descriptors.Elasticsearch. >Or use DDL

The constructor of Elasticsearch6UpsertTableSink is overly complex

2019-08-26 Thread 巫旭阳
public Elasticsearch6UpsertTableSink( boolean isAppendOnly, TableSchema schema, List<Host> hosts, String index, String docType, String keyDelimiter, String keyNullLiteral, SerializationSchema<Row> serializationSchema, XContentType contentType,
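A sketch of the descriptor-based registration Jark suggests above, which avoids calling this constructor directly; host, index, and field names are illustrative:

    import org.apache.flink.table.api.java.StreamTableEnvironment;
    import org.apache.flink.table.descriptors.Elasticsearch;
    import org.apache.flink.table.descriptors.Json;
    import org.apache.flink.table.descriptors.Schema;

    public class EsDescriptorExample {

        public static void register(StreamTableEnvironment tEnv) {
            tEnv.connect(
                    new Elasticsearch()
                        .version("6")
                        .host("localhost", 9200, "http")
                        .index("orders")
                        .documentType("order"))
                .withFormat(new Json().deriveSchema())
                .withSchema(new Schema()
                    .field("order_id", "VARCHAR")
                    .field("amount", "DOUBLE"))
                .inUpsertMode()
                .registerTableSink("EsOrders");
        }
    }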

Is it possible to register a custom TypeInfoFactory without using an annotation?

2019-08-26 Thread 杨力
I'd like to provide a custom serializer for a POJO class. But that class cannot be modified so it's not possible to add a @TypeInfo annotation to it. Are there any other ways to register one?
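One workaround sketch: instead of an annotation-registered factory, pass the custom TypeInformation explicitly wherever the type enters the job, so the annotation-based lookup is never consulted; ThirdPartyPojo stands in for the unmodifiable class:

    import java.util.Collections;
    import java.util.List;

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    // Stand-in for the third-party class that cannot be annotated.
    class ThirdPartyPojo {}

    public class ExplicitTypeInfoExample {

        // customTypeInfo would be the TypeInformation built around the custom serializer.
        public static DataStream<ThirdPartyPojo> source(StreamExecutionEnvironment env,
                                                        TypeInformation<ThirdPartyPojo> customTypeInfo) {
            List<ThirdPartyPojo> data = Collections.singletonList(new ThirdPartyPojo());
            // The explicit TypeInformation bypasses annotation-based factory lookup.
            return env.fromCollection(data, customTypeInfo);
        }
    }

For operator outputs, SingleOutputStreamOperator#returns(TypeInformation) serves the same purpose.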

RE: tumbling event time window , parallel

2019-08-26 Thread Hanan Yehudai
The data source is generated by an application that monitors some sort of sessions, with the EVENT_TIME column being the session end time. It is possible that the files will have out-of-order data, because of the async nature of the application writing files. While the EVENT_TIME is

Re: tumbling event time window , parallel

2019-08-26 Thread Fabian Hueske
Hi, Can you share a few more details about the data source? Are you continuously ingesting files from a folder? You are correct that the parallelism should not affect the results, but there are a few things that can affect that: 1) non-deterministic keys 2) out-of-order data with inappropriate

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-26 Thread Zili Chen
Hi Oytun, I think the intent is to publish flink-queryable-state-client-java without a scala suffix since it is scala-free. An artifact without the scala suffix has been published [2]. See also [1]. Best, tison. [1] https://issues.apache.org/jira/browse/FLINK-12602 [2]

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-26 Thread Till Rohrmann
The missing support for the Scala shell with Scala 2.12 was documented in the 1.7 release notes [1]. @Oytun, the docker image should be updated in a bit. Sorry for the inconvenience. Thanks for the pointer that flink-queryable-state-client-java_2.11 hasn't been published. We'll upload this in a

RE: timeout error while connecting to Kafka

2019-08-26 Thread Eyal Pe'er
Hi, Brief update. I tried to run the same code, but this time I used another Kafka cluster that I have, where the version is 0.11. The code runs fine without the timeout exception. In conclusion, it seems like the problem occurs only when consuming events from Kafka 0.9. Currently, I have no

Re: Re: Reply: Re: flink1.9 blink planner table DDL usage question

2019-08-26 Thread hb
Thanks, that solved it: specifying 'connector.version' = '0.11' works. There is very little official material and documentation on Blink SQL, so it's easy to run into problems during development. On 2019-08-26 14:26:15, "hb" <343122...@163.com> wrote: >The kafka version is kafka_2.11-1.1.0; >which kafka versions are supported? >On 2019-08-26 14:23:19, "pengcheng...@bonc.com.cn" wrote: >>Check the kafka version in your code; the error may be related to that >> >> >> >>pengcheng...@bonc.com.cn >>
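For reference, a sketch of the kind of DDL under discussion on the Flink 1.9 Blink planner, with the connector version pinned to '0.11'; the topic, schema, and broker address are illustrative:

    import org.apache.flink.table.api.java.StreamTableEnvironment;

    public class KafkaDdlExample {

        public static void createSource(StreamTableEnvironment tEnv) {
            tEnv.sqlUpdate(
                "CREATE TABLE user_log (" +
                "  user_id VARCHAR," +
                "  item_id VARCHAR," +
                "  ts TIMESTAMP" +
                ") WITH (" +
                "  'connector.type' = 'kafka'," +
                "  'connector.version' = '0.11'," +   // must match the broker/connector in use
                "  'connector.topic' = 'user_log'," +
                "  'connector.startup-mode' = 'latest-offset'," +
                "  'connector.properties.0.key' = 'bootstrap.servers'," +
                "  'connector.properties.0.value' = 'broker:9092'," +
                "  'update-mode' = 'append'," +
                "  'format.type' = 'json'," +
                "  'format.derive-schema' = 'true'" +
                ")");
        }
    }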

Re: type error with generics ..

2019-08-26 Thread Debasish Ghosh
Looks like using the following overload of StreamExecutionEnvironment.addSource, which takes a TypeInformation as well, does the trick .. env.addSource( FlinkSource.collectionSourceFunction(data), TypeInformation.of(Data.class) ) regards. On Mon, Aug 26, 2019 at 11:24 AM Debasish Ghosh
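An alternative sketch to passing TypeInformation at every addSource call: let the source declare its produced type via ResultTypeQueryable; Data here is a placeholder for the poster's class:

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    // Stand-in for the poster's element class.
    class Data {}

    public class DataSource implements SourceFunction<Data>, ResultTypeQueryable<Data> {

        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Data> ctx) throws Exception {
            while (running) {
                ctx.collect(new Data()); // emit elements; real logic elided
                Thread.sleep(1000);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        // Lets addSource(...) obtain the type without an explicit TypeInformation argument.
        @Override
        public TypeInformation<Data> getProducedType() {
            return TypeInformation.of(Data.class);
        }
    }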