Re: Python UDF from Java

2020-04-30 Thread jincheng sun
Thanks Flavio and thanks Marta, that's a good question, as many users want to know that! CC'ing the user-zh mailing list :) Best, Jincheng - Twitter: https://twitter.com/sunjincheng121 - Flavio Pompermaier wrote on Fri, May 1, 2020 at 7:04 AM: > Yes, that's awesome! I think this would

Re: Flink Task Manager GC overhead limit exceeded

2020-04-30 Thread shao.hongxiao
Hi Song, regarding "Please refer to this document [1] for more details": could you send the concrete link? I have also noticed that the memory figures shown in the Flink UI don't look right, and I'd like to read the relevant documentation carefully. Thanks! | 邵红晓 | Email: 17611022...@163.com | On 04/30/2020 12:08, Xintong Song wrote: Then I would suggest the following. - Check the task manager log to see if the '-D'

Re: Python UDF from Java

2020-04-30 Thread Flavio Pompermaier
Yes, that's awesome! I think this would be a really attractive feature to promote the usage of Flink. Thanks Marta, Flavio On Fri, May 1, 2020 at 12:26 AM Marta Paes Moreira wrote: > Hi, Flavio. > > Extending the scope of Python UDFs is described in FLIP-106 [1, 2] and is > planned for the

Re: Python UDF from Java

2020-04-30 Thread Marta Paes Moreira
Hi, Flavio. Extending the scope of Python UDFs is described in FLIP-106 [1, 2] and is planned for the upcoming 1.11 release, according to Piotr's last update. Hope this addresses your question! Marta [1]

Python UDF from Java

2020-04-30 Thread Flavio Pompermaier
Hi to all, is it possible to run a Python UDF from a Java job (using Table API or SQL)? Is there any reference? Best, Flavio

Re: Does it make sense to use init containers for job upgrades in kubernetes

2020-04-30 Thread bbaj...@gmail.com
Thanks all: 1) for now, we will try with in-house Kubernetes and see how it goes. 2) Till, cheers, I'll give it a stab, though likely I'll end up with an operator or some other workflow tool (I've gotten multiple weird looks when I mentioned the init container approach at work; I was mostly curious at this

Re: Wait for cancellation event with CEP

2020-04-30 Thread Till Rohrmann
Hi Maxim, I think your problem should be solvable with the CEP library: public static void main(String[] args) throws Exception { // set up the streaming execution environment final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
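
A minimal sketch of how such a pattern could look, assuming a POJO `Event` with `getType()` and `getId()` accessors (both names are assumptions, not from the thread). Timed-out partial matches are the forwards that never saw a cancel within the minute:

```java
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.PatternTimeoutFunction;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

// "forward" optionally followed by a matching "cancel" within one minute
Pattern<Event, ?> pattern = Pattern.<Event>begin("forward")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) { return "forward".equals(e.getType()); }
        })
        .followedBy("cancel")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) { return "cancel".equals(e.getType()); }
        })
        .within(Time.minutes(1));

// key by the id that uniquely matches both events; `events` is the input stream
PatternStream<Event> matches = CEP.pattern(events.keyBy(Event::getId), pattern);

// timed-out partial matches are the forwards without a cancel
OutputTag<Event> noCancel = new OutputTag<Event>("forward-without-cancel") {};
SingleOutputStreamOperator<Event> cancelled = matches.select(
        noCancel,
        (PatternTimeoutFunction<Event, Event>) (m, ts) -> m.get("forward").get(0),
        (PatternSelectFunction<Event, Event>) m -> m.get("cancel").get(0));
DataStream<Event> forwardsToProcess = cancelled.getSideOutput(noCancel);
```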

Re: Does it make sense to use init containers for job upgrades in kubernetes

2020-04-30 Thread Till Rohrmann
Hi Barisa, from what you've described I believe it could work. But I never tried it out. Maybe you could report back once you tried it out. I believe it would be interesting to hear your experience with this approach. One thing to note is that the approach hinges on the fact that the older

Re: join in sql without time interval

2020-04-30 Thread Jark Wu
Yes. Flink Table uses something like that, but via a lower-level API called `TwoInputStreamOperator`; see org.apache.flink.table.runtime.operators.join.stream.StreamingJoinOperator. And the state TTL in TableConfig does take effect on such a join query. Best, Jark On Thu, 30 Apr 2020 at 22:35, lec
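
As a minimal sketch, assuming an existing `StreamTableEnvironment tEnv` (Flink 1.10), the TableConfig state retention Jark mentions can be set like this:

```java
import org.apache.flink.api.common.time.Time;
import org.apache.flink.table.api.TableConfig;

TableConfig config = tEnv.getConfig();
// state for a join key becomes eligible for cleanup after 12h of inactivity
// and is guaranteed to be cleaned up after 24h
config.setIdleStateRetentionTime(Time.hours(12), Time.hours(24));
```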

Re: join in sql without time interval

2020-04-30 Thread lec ssmi
Thanks, but is the bottom layer of the Table API really implemented like this? Konstantin Knauf wrote on Thu, Apr 30, 2020 at 22:02: > Hi Lec Ssmi, > > yes, DataStream#connect on two streams both keyed on the productId with a > KeyedCoProcessFunction is the way to go. > > Cheers, > > Konstantin > > On Thu,

Re: Using logicalType in the Avro table format

2020-04-30 Thread Arvid Heise
Okay bummer, but not completely unexpected. The conversions should be automatically compiled into SpecificRecords. I'm not sure how the Table API is doing it internally; I just saw SpecificRecord in your stacktrace and figured to try it out. On Thu, Apr 30, 2020 at 3:35 PM Gyula Fóra wrote: >

Re: join in sql without time interval

2020-04-30 Thread Konstantin Knauf
Hi Lec Ssmi, yes, DataStream#connect on two streams both keyed on the productId with a KeyedCoProcessFunction is the way to go. Cheers, Konstantin On Thu, Apr 30, 2020 at 11:10 AM lec ssmi wrote: > Maybe, the connect method? > > lec ssmi wrote on Thu, Apr 30, 2020 at 3:59 PM: > >> Hi: >> As the
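
A minimal sketch of that approach, assuming `Order`/`Product` POJOs with `getProductId()`/`getId()` accessors (names are assumptions); each side buffers unmatched elements in keyed state so matches are emitted regardless of arrival order:

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

orders.keyBy(Order::getProductId)
      .connect(products.keyBy(Product::getId))
      .process(new KeyedCoProcessFunction<Long, Order, Product, Tuple2<Order, Product>>() {
          private transient ListState<Order> orderState;
          private transient ValueState<Product> productState;

          @Override
          public void open(Configuration parameters) {
              orderState = getRuntimeContext().getListState(
                      new ListStateDescriptor<>("orders", Order.class));
              productState = getRuntimeContext().getState(
                      new ValueStateDescriptor<>("product", Product.class));
          }

          @Override
          public void processElement1(Order order, Context ctx,
                  Collector<Tuple2<Order, Product>> out) throws Exception {
              Product product = productState.value();
              if (product != null) {
                  out.collect(Tuple2.of(order, product));
              } else {
                  orderState.add(order); // wait for the product side to arrive
              }
          }

          @Override
          public void processElement2(Product product, Context ctx,
                  Collector<Tuple2<Order, Product>> out) throws Exception {
              productState.update(product);
              for (Order order : orderState.get()) {
                  out.collect(Tuple2.of(order, product));
              }
              orderState.clear();
          }
      });
```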

Re: Flink monitoring: some metrics fetched by Prometheus do not contain the Flink job_name or job_id

2020-04-30 Thread 俞剑波
How can I set this via Java code? I'd appreciate some guidance, thanks a lot!!! 972684638 wrote on Thu, Apr 30, 2020 at 7:28 PM: > The metrics.reporter.promgateway.jobName setting can be set via Java code; when the job starts, set it to the job id; > also, remove the suffix setting; > > then aggregate by exported_job; > > > > ---Original Message--- > From: "俞剑波" Sent: Thursday, Apr 30, 2020, 7:19 PM > To: "user-zh" Subject: Re: Flink monitoring:

Re: Did my subscription succeed?

2020-04-30 Thread 俞剑波
OK a511955993 wrote on Thu, Apr 30, 2020 at 9:13 PM: > >

Re: Does it make sense to use init containers for job upgrades in kubernetes

2020-04-30 Thread Alexander Fedulov
Hi Barisa, it seems that there is no immediate answer to your concrete question here, so I wanted to ask you back a more general question: did you consider using the Community Edition of Ververica Platform for your purposes [1]

Re: Using logicalType in the Avro table format

2020-04-30 Thread Gyula Fóra
Hi Arvid! I tried it with Avro 1.9.2, and it led to the same error. It seems Avro cannot find the conversion class between LocalDateTime and timestamp-millis. Not sure how exactly this works; maybe we need to set the conversions ourselves? Thanks! Gyula On Thu, Apr 30, 2020 at 12:53 PM Arvid

Did my subscription succeed?

2020-04-30 Thread a511955993

Re: Using Stateful Functions within a Flink pipeline

2020-04-30 Thread Annemarie Burger
Hi Igal, Thanks for your responses. Regarding "having a pre-processing job that does whatever transformations necessary with the DataStream and outputs to Kafka / Kinesis, and then having a separate StateFun deployment that consumes from that transformed Kafka / Kinesis topic." I was wondering

Re: multiple joins in one job

2020-04-30 Thread Benchao Li
Hi lec, AFAIK, the time attribute is preserved after a time interval join. Could you share your DDL and SQL queries with us? lec ssmi wrote on Thu, Apr 30, 2020 at 5:48 PM: > Hi: >I need to join multiple stream tables using time interval join. The > problem is that the time attribute will disappear
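
For illustration, a minimal sketch (table and column names are assumptions) of chaining interval joins directly, which works precisely because the rowtime attribute survives each interval join:

```java
// A, B and C are assumed streaming tables registered on tEnv,
// each with a rowtime attribute named `rt`
Table result = tEnv.sqlQuery(
        "SELECT A.id, A.rt, B.b_val, C.c_val " +
        "FROM A " +
        "JOIN B ON A.id = B.id " +
        "  AND A.rt BETWEEN B.rt - INTERVAL '10' MINUTE AND B.rt " +
        "JOIN C ON A.id = C.id " +
        "  AND A.rt BETWEEN C.rt - INTERVAL '10' MINUTE AND C.rt");
```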

Reply: Flink monitoring: some metrics fetched by Prometheus do not contain the Flink job_name or job_id

2020-04-30 Thread 972684638
The metrics.reporter.promgateway.jobName setting can be set via Java code; when the job starts, set it to the job id. Also, remove the suffix setting. Then aggregate by exported_job. ---Original Message--- From: "俞剑波" https://blog.csdn.net/u013516966/article/details/103171484 4. I hope someone who has run into and solved this problem can help out and explain the solution. 俞剑波

Re: Flink monitoring: some metrics fetched by Prometheus do not contain the Flink job_name or job_id

2020-04-30 Thread 俞剑波
To give more detail about my situation: the cluster runs in *per job* mode and the metrics are all being collected. Here is my configuration and what I'm seeing: 1. flink-conf.yaml is configured like this: metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter metrics.reporter.promgateway.host: 10.20.0.200 metrics.reporter.promgateway.port: 9091

Re: Did my subscription request succeed?

2020-04-30 Thread 俞剑波
Haha, thanks, we meet again. zhisheng wrote on Thu, Apr 30, 2020 at 6:51 PM: > It's OK now > > 俞剑波 <15205029...@163.com> wrote on Thu, Apr 30, 2020 at 5:34 PM: > > > Did my mailing list subscription succeed? >

Re: Flink monitoring: some metrics fetched by Prometheus do not contain the Flink job_name or job_id

2020-04-30 Thread 俞剑波
I added that parameter and found it only resolves the label conflict; there is still no job_name. Without the parameter: flink_jobmanager_Status_JVM_Memory_Heap_Used{exported_job="myJobYJB4eea972f622437b738875b3e8e811a56",host="localhost",instance="pushgateway",job="pushgateway"} With the parameter:

Re: Using logicalType in the Avro table format

2020-04-30 Thread Arvid Heise
Hi Gyula, it may still be worth trying to upgrade to Avro 1.9.2 (it can never hurt) and seeing if that solves your particular problem. The code path in GenericDatumWriter is taking the conversion path, so it might just work. Of course that depends on the schema being correctly translated to a specific

Re: execution.checkpointing.tolerable-failed-checkpoints has no effect

2020-04-30 Thread zhisheng
It seems this parameter can be set individually within the job; you could try env.getCheckpointConfig().setTolerableCheckpointFailureNumber(); 蒋佳成(Jiacheng Jiang) <920334...@qq.com> wrote on Thu, Apr 30, 2020 at 3:07 PM: > hi > > Setting execution.checkpointing.tolerable-failed-checkpoints: > 100 in flink-conf.yaml has no effect; the default is 0, i.e. no failed checkpoints are tolerated, so a single failed checkpoint restarts the job. How should this value be set?
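
A minimal sketch of that per-job setting, assuming Flink 1.9+:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000); // checkpoint every minute
// tolerate up to 100 consecutive checkpoint failures before failing the job
env.getCheckpointConfig().setTolerableCheckpointFailureNumber(100);
```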

Wait for cancellation event with CEP

2020-04-30 Thread Maxim Parkachov
Hi everyone, I need to implement the following functionality. There is a Kafka topic where "forward" events arrive, and on the same topic there are "cancel" events. For each "forward" event I need to wait 1 minute for a possible "cancel" event. I can uniquely match both events. If a "cancel" event

Re: Configuring taskmanager.memory.task.off-heap.size in Flink 1.10

2020-04-30 Thread Jiahui Jiang
I see I see. Thank you so much! From: Xintong Song Sent: Wednesday, April 29, 2020 11:22 PM To: Jiahui Jiang Cc: user@flink.apache.org Subject: Re: Configuring taskmanager.memory.task.off-heap.size in Flink 1.10 That's pretty much it. I'm not very familiar with

Re: "Fill in" notification messages based on event time watermark

2020-04-30 Thread Manas Kale
Hi Timo and Piotrek, Thank you for the suggestions. I have been trying to set up unit tests at the operator granularity, and the blog post's testHarness examples certainly help a lot in this regard. I understood my problem - an upstream session window operator can only report the end of the
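
A minimal sketch of the operator-level testHarness setup referred to above, using Flink's `OneInputStreamOperatorTestHarness` test utility (the flatMap function here is a placeholder, not from the thread):

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.operators.StreamFlatMap;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
import org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness;

FlatMapFunction<String, String> fn = (value, out) -> out.collect(value.toUpperCase());

OneInputStreamOperatorTestHarness<String, String> harness =
        new OneInputStreamOperatorTestHarness<>(new StreamFlatMap<>(fn));
harness.open();
harness.processElement(new StreamRecord<>("hello", 10L));
// advancing the watermark is what fires event-time timers inside the operator,
// which is exactly the kind of behavior under test here
harness.processWatermark(new Watermark(100L));
// harness.getOutput() now contains the emitted records and watermarks
```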

multiple joins in one job

2020-04-30 Thread lec ssmi
Hi: I need to join multiple stream tables using time interval joins. The problem is that the time attribute disappears after the join, and pure SQL cannot declare the time attribute field again. So, to make it work, I need to insert the last result of the join into Kafka, and consume it

Broadcasting a custom object over a broadcast stream doesn't work

2020-04-30 Thread 牙牙
Hi all, I've run into a problem: when defining the SourceFunction for a broadcast (non-keyed) stream, the stream elements are instances of a hot-loaded concrete class A, which extends an abstract class B, which in turn implements an interface C; the SourceFunction's generic type T is declared as the abstract class B. Symptom: in the downstream operator's process function, the method that receives the broadcast stream never receives instances of class A, even though A implements Serializable and all private fields have getters and setters. Any pointers would be appreciated!

Did my subscription request succeed?

2020-04-30 Thread 俞剑波
Did my mailing list subscription succeed?

Re: FlinkSQL Retraction: how does it work?

2020-04-30 Thread Michael Ran
Is the declared update key tms_company? The result is: yuantong:2 zhongtong:2 On 2020-04-30 17:08:22, "wangl...@geekplus.com.cn" wrote: > > I implemented the Cainiao logistics order-counting example from https://yq.aliyun.com/articles/457392/ myself, reading from Kafka and writing to RDS; the RDS > table has no primary key and no unique key. > > INSERT INTO table_out select tms_company, count(distinct order_id) as

Re: join in sql without time interval

2020-04-30 Thread lec ssmi
Maybe, the connect method? lec ssmi wrote on Thu, Apr 30, 2020 at 3:59 PM: > Hi: > As the following SQL: > >SELECT * FROM Orders INNER JOIN Product ON Orders.productId = > Product.id > > If we use the DataStream API instead of SQL, how should it be achieved? > Because the APIs in DataStream only have

FlinkSQL Retraction: how does it work?

2020-04-30 Thread wangl...@geekplus.com.cn
I implemented the Cainiao logistics order-counting example from https://yq.aliyun.com/articles/457392/ myself, reading from Kafka and writing to RDS; the RDS table has no primary key and no unique key. INSERT INTO table_out select tms_company, count(distinct order_id) as order_cnt from (select order_id, LAST_VALUE(tms_company) AS tms_company from dwd_table group by order_id) group by

Re: doing demultiplexing using Apache flink

2020-04-30 Thread Alexander Fedulov
This, too, should be possible. Flink uses `StreamingFileSink` to transfer data to S3 [1]. You can pass it your custom bucket assigner [2
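
A minimal sketch of such a custom bucket assigner, assuming a record type `MyRecord` with a `getTenant()` accessor (both are assumptions) that determines the S3 prefix per record:

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

StreamingFileSink<MyRecord> sink = StreamingFileSink
        .forRowFormat(new Path("s3://my-bucket/output"), new SimpleStringEncoder<MyRecord>("UTF-8"))
        .withBucketAssigner(new BucketAssigner<MyRecord, String>() {
            @Override
            public String getBucketId(MyRecord element, Context context) {
                // one sub-directory (and hence one set of part files) per tenant
                return element.getTenant();
            }

            @Override
            public SimpleVersionedSerializer<String> getSerializer() {
                return SimpleVersionedStringSerializer.INSTANCE;
            }
        })
        .build();
```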

Re: Using logicalType in the Avro table format

2020-04-30 Thread Gyula Fóra
Hi! @Arvid: We are using Avro 1.8, I believe, but this problem seems to come from the Flink side, as Dawid mentioned. @Dawid: Sounds like a reasonable explanation; here are the actual queries to reproduce within the SQL client/Table API: CREATE TABLE source_table ( int_field INT,

Re: Savepoint memory overhead

2020-04-30 Thread Lasse Nedergaard
Hi Thanks for the reply. The link you provided made us think of some old RocksDB config we were still using; it could be the cause of our container-kill problems, so I will run a test without the specific RocksDB config. But we also see RocksDBExceptions "cannot allocate memory" while appending to a

Re: Flink monitoring: some metrics fetched by Prometheus do not contain the Flink job_name or job_id

2020-04-30 Thread 俞剑波
Yes, I am using it! Do you have a solution? I've been stuck on this for many days and would really appreciate the help, thanks a lot. 972684638 wrote on Thu, Apr 30, 2020 at 4:18 PM: > I'd like to know: are you using pushgateway? > > > > ---Original Message--- > From: "俞剑波" Sent: Thursday, Apr 30, 2020, 4:01 PM > To: "user-zh" Subject: Flink monitoring: some metrics fetched by Prometheus do not contain the Flink job_name or job_id > > The Flink cluster is per > >

Reply: Flink monitoring: some metrics fetched by Prometheus do not contain the Flink job_name or job_id

2020-04-30 Thread 972684638
I'd like to know: are you using pushgateway? ---Original Message--- From: "俞剑波"

Re: Re: Re: Reply: flink 1.9, error reading MapState with the State Processor API

2020-04-30 Thread guanyq
I've located the problem. It is related to this keyBy: the latter form can read the MapState, the former errors out.

Re: Using logicalType in the Avro table format

2020-04-30 Thread Dawid Wysakowicz
Hi Gyula, I have not verified it locally yet, but I think you are hitting yet another problem of the unfinished migration from the old TypeInformation-based type system to the new type system based on DataTypes. As far as I understand the problem, the information about the bridging class

Flink monitoring: some metrics fetched by Prometheus do not contain the Flink job_name or job_id

2020-04-30 Thread 俞剑波
The Flink cluster runs in per-job mode, so with higher parallelism a job has multiple taskmanagers across multiple machines. Using flink-metrics-prometheus we hit a problem: when collecting the taskmanagers' JVM metrics, the records returned by Prometheus contain no Flink job_name or job_id, so the data cannot be aggregated. Has anyone run into this? How did you solve it? Many thanks!

join in sql without time interval

2020-04-30 Thread lec ssmi
Hi: Consider the following SQL: SELECT * FROM Orders INNER JOIN Product ON Orders.productId = Product.id If we use the DataStream API instead of SQL, how should it be achieved? Because the APIs in DataStream only have Window Join and Interval Join, the official website says that to solve the

Re: Using logicalType in the Avro table format

2020-04-30 Thread Arvid Heise
Hi Gyula, can you please check which Avro version you are using? Avro only supports Java 8 time (java.time.LocalDateTime) after 1.9.2. Before that, everything was hardcoded to Joda-Time. However, I'm not entirely sure where the Java 8 time is coming from in your example, as I'm not familiar with the

Re: Re: Reply: flink 1.9, error reading MapState with the State Processor API

2020-04-30 Thread guanyq
I implemented another job that stores MapState myself, and with the same code I can read out all the MapState keys. On 2020-04-30 15:23:39, "guanyq" wrote: > public class MyKeyedMapState { public String key; public > List<String> value; public MyKeyedMapState() { > } public String getKey() { return key; > } public void setKey(String key)

Re: doing demultiplexing using Apache flink

2020-04-30 Thread Arvid Heise
Hi Dhurandar, if you use KafkaSerializationSchema [1], you can create a producer record, where you explicitly set the output topic. The topic can be arbitrarily calculated. You pass it while constructing the sink: stream.addSink(new FlinkKafkaProducer( topic, serSchema, // <--
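
A minimal sketch of that approach; the event type and the `topicFor()` routing helper are assumptions, not part of the connector API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

// route each record to a topic computed from the record itself
KafkaSerializationSchema<MyEvent> serSchema = (event, timestamp) ->
        new ProducerRecord<>(topicFor(event), // hypothetical routing helper
                event.toString().getBytes(StandardCharsets.UTF_8));

Properties kafkaProps = new Properties();
kafkaProps.setProperty("bootstrap.servers", "localhost:9092");

FlinkKafkaProducer<MyEvent> producer = new FlinkKafkaProducer<>(
        "default-topic", // default topic; the schema above always sets one explicitly
        serSchema,
        kafkaProps,
        FlinkKafkaProducer.Semantic.AT_LEAST_ONCE);
stream.addSink(producer);
```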

Re: Reply: flink 1.9, error reading MapState with the State Processor API

2020-04-30 Thread guanyq
public class MyKeyedMapState {
    public String key;
    public List<String> value;
    public MyKeyedMapState() { }
    public String getKey() { return key; }
    public void setKey(String key) { this.key = key; }
    public List<String> getValue() { return value; }

Reply: flink 1.10 […]

2020-04-30 Thread 蒋佳成(Jiacheng Jiang)
Hi Xintong, […] Java […] native memory […] API ---- From: "Xintong Song" http://apache-flink.147419.n8.nabble.com/Flink-tt1869.html

Re: Reply: flink 1.9, error reading MapState with the State Processor API

2020-04-30 Thread guanyq
Notice that I still get the error. On 2020-04-30 09:40:45, "shx" <17611022...@163.com> wrote: > Could you share the code that writes the state? One more question: for keyed state access, does your code read out the MapState for all keys? Thanks > | 邵红晓 | Email: 17611022...@163.com | > On 2020-04-30 09:04, guanyq wrote: > No Serializer is explicitly specified in the code; everything uses the default serialization. > On 2020-04-29

Re: Re: Reply: flink 1.9, error reading MapState with the State Processor API

2020-04-30 Thread guanyq
Are there any other possible causes? On 2020-04-30 10:25:32, "guanyq" wrote: The code is attached. "One more question: for keyed state access, does your code read out the MapState for all keys?" -- The code reads out all keys of the map state. On 2020-04-30 09:40:45, "shx" <17611022...@163.com> wrote: > Could you share the code that writes the state? One more question: for keyed state access, does your code read out the MapState for all keys? Thanks > | 邵红晓 |

execution.checkpointing.tolerable-failed-checkpoints has no effect

2020-04-30 Thread 蒋佳成(Jiacheng Jiang)
hi Setting execution.checkpointing.tolerable-failed-checkpoints: 100 in flink-conf.yaml has no effect; the default is 0, i.e. no failed checkpoints are tolerated, so a single failed checkpoint restarts the job. How should this value be set? best jungglge

Re: Savepoint memory overhead

2020-04-30 Thread Yun Tang
Hi Lasse Could you please give more details? 1. What is the memory configuration of your task manager, e.g. the process memory size and managed memory size? And how much does the memory grow once you hit the problem? 2. Did you use managed memory to control RocksDB? [1] 3. Why did you

Re: Reading from sockets using dataset api

2020-04-30 Thread Arvid Heise
Hi Kaan, not entirely sure I understand your solution. I gathered that you create a dataset of TCP addresses and then use flatMap to fetch and output the data? If so, then I think it's a good solution for batch processing (DataSet). It doesn't work in DataStream because it doesn't play well with