What should I take care of if I enable object reuse?

2019-03-13 Thread yinhua.dai
Hi Community, I saw from the documentation that we need to be careful about enabling the object reuse feature. Which parts should I check to avoid any issues? Can anyone help summarize? Thank you. // *enableObjectReuse() / disableObjectReuse()* By default, objects are not reused in Flink.
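The main hazard is sketched below in plain Java (no Flink dependency, a simulation rather than Flink's actual runtime): with object reuse enabled, the runtime may hand your function the *same* record instance on every invocation, so storing a reference across invocations silently captures only the latest value, while copying the value out stays safe.

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectReuseHazard {
    // Simulates a mutable record that a runtime reuses between function calls.
    static class Record {
        int value;
    }

    public static void main(String[] args) {
        Record reused = new Record();             // the single reused instance
        List<Record> cachedRefs = new ArrayList<>();
        List<Integer> copiedValues = new ArrayList<>();

        for (int v = 1; v <= 3; v++) {
            reused.value = v;                     // runtime overwrites the same object
            cachedRefs.add(reused);               // UNSAFE under reuse: stores the same reference 3 times
            copiedValues.add(reused.value);       // SAFE: copies the value out of the reused object
        }

        // Every cached reference now points at the last value.
        System.out.println(cachedRefs.get(0).value + " " + cachedRefs.get(1).value); // 3 3
        // The copies preserve the original values.
        System.out.println(copiedValues); // [1, 2, 3]
    }
}
```

So the rule of thumb: with reuse enabled, never hold on to input objects across calls (e.g. in operator state or collections); copy what you need.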

Re: How to join stream and dimension data in Flink?

2019-03-13 Thread 徐涛
Hi Hequn, Thanks a lot for your answer! That is very helpful. I still have some questions about stream-dimension joins and temporal table joins: 1. I found that the temporal table join is still a one-stream-driven join; I do not know why the dimension data join

Re: Set partition number of Flink DataSet

2019-03-13 Thread qi luo
Hi Ken, Do you mean that I can create a batch sink which writes to N files? That sounds viable, but since our data size is huge (billions of records & thousands of files), the performance may be unacceptable. I will check Blink and give it a try anyway. Thank you, Qi > On Mar 12, 2019, at

Re: Migrating Existing TTL State to 1.8

2019-03-13 Thread Ning Shi
Just wondering if anyone has any insights into the new TTL state cleanup feature mentioned below. Thanks, — Ning > On Mar 11, 2019, at 1:15 PM, Ning Shi wrote: > > It's exciting to see the TTL state cleanup feature in 1.8. I have a question > regarding the migration of existing TTL state to the

Challenges using Flink REST API

2019-03-13 Thread Wouter Zorgdrager
Hey all! I'm looking for some advice on the following; I'm working on an abstraction on top of Apache Flink to 'pipeline' Flink applications using Kafka. For deployment this means that all these Flink jobs are embedded into one jar and each job is started using a program argument (e.g. "--stage
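The dispatch pattern described here can be sketched as follows. This is a hedged illustration only: the `--stage` flag matches the poster's example, but the stage names and `runIngest`/`runEnrich` methods are hypothetical placeholders, not the poster's actual code. One fat jar's main method picks which Flink pipeline to build and execute from a program argument.

```java
public class StageDispatcher {

    public static void main(String[] args) {
        // Find the value following "--stage"; default to running everything.
        String stage = "all";
        for (int i = 0; i < args.length - 1; i++) {
            if ("--stage".equals(args[i])) {
                stage = args[i + 1];
            }
        }

        switch (stage) {
            case "ingest": runIngest(); break;
            case "enrich": runEnrich(); break;
            case "all":    runIngest(); runEnrich(); break;
            default: throw new IllegalArgumentException("Unknown stage: " + stage);
        }
    }

    // In a real job these would build a StreamExecutionEnvironment and call execute().
    static void runIngest() { System.out.println("starting ingest pipeline"); }
    static void runEnrich() { System.out.println("starting enrich pipeline"); }
}
```

With this layout, the REST API's jar-run endpoint can launch any single stage by passing `programArgs` such as `--stage enrich`.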

Re: Challenges using Flink REST API

2019-03-13 Thread Chesnay Schepler
You should get the full stacktrace if you upgrade to 1.7.2 . On 13.03.2019 09:55, Wouter Zorgdrager wrote: Hey all! I'm looking for some advice on the following; I'm working on an abstraction on top of Apache Flink to 'pipeline' Flink applications using Kafka. For deployment this means that

Will state TTL support event time cleanup in 1.8?

2019-03-13 Thread Sergei Poganshev
Do improvements introduced in https://issues.apache.org/jira/browse/FLINK-10471 add support for event time TTL?

Re: Will state TTL support event time cleanup in 1.8?

2019-03-13 Thread Stefan Richter
TTL based on event time is not part of 1.8, but is likely to be part of 1.9. > On 13. Mar 2019, at 13:17, Sergei Poganshev wrote: > > Do improvements introduced in > https://issues.apache.org/jira/browse/FLINK-10471 > add support for event >

Partitions and the number of cores/executors

2019-03-13 Thread mbilalce . dev
Hi, I am working with the Gelly graph library, but I think the question is applicable in general. I just want to confirm whether a single data partition in Flink is executed by only a single executor/core, i.e. multiple executors can't be utilized to process a single partition in parallel. So, if I need

Re: Understanding timestamp and watermark assignment errors

2019-03-13 Thread Konstantin Knauf
Hi Andrew, generally, this looks like a concurrency problem. Are you using asynchronous checkpointing? If so, could you check whether this issue also occurs with synchronous checkpointing? There have been reports recently that there might be a problem with some Kryo types. Can you set the logging

Custom Partitioner and Graph Algorithms

2019-03-13 Thread MBilal
Hi, I am observing a behaviour in the task statistics that I don't fully understand. Essentially I have created a partitioner that assigns all the edges to a single partition. I see an imbalance (in terms of records sent/received) in the task statistics of different instances of the same operator
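The effect of such a partitioner can be illustrated with a standalone plain-Java sketch. The interface below only mirrors the shape of Flink's `org.apache.flink.api.common.functions.Partitioner` (key plus partition count in, partition index out); it is an assumption for illustration, not the poster's code. Routing every key to partition 0 is exactly what makes one subtask receive all records while its siblings sit idle.

```java
import java.util.HashMap;
import java.util.Map;

public class SinglePartitionDemo {
    // Mirrors the shape of Flink's Partitioner<K>: (key, numPartitions) -> partition index.
    interface Partitioner<K> {
        int partition(K key, int numPartitions);
    }

    public static void main(String[] args) {
        Partitioner<Long> allToOne = (key, numPartitions) -> 0;  // every edge goes to partition 0
        Partitioner<Long> hashed   = (key, numPartitions) ->
                (int) (Math.abs(key) % numPartitions);           // balanced baseline for comparison

        int parallelism = 4;
        Map<Integer, Integer> countsAllToOne = new HashMap<>();
        Map<Integer, Integer> countsHashed = new HashMap<>();
        for (long edge = 0; edge < 100; edge++) {
            countsAllToOne.merge(allToOne.partition(edge, parallelism), 1, Integer::sum);
            countsHashed.merge(hashed.partition(edge, parallelism), 1, Integer::sum);
        }

        // All 100 records land on one subtask; the other 3 stay idle.
        System.out.println(countsAllToOne); // {0=100}
        System.out.println(countsHashed);   // {0=25, 1=25, 2=25, 3=25}
    }
}
```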

Re: Challenges using Flink REST API

2019-03-13 Thread Wouter Zorgdrager
Hi Chesnay, Unfortunately this is not true when I run the Flink 1.7.2 docker images. The response is still: { "errors": [ "org.apache.flink.client.program.ProgramInvocationException: The main method caused an error." ] } Regards, Wouter Zorgdrager Op wo 13 mrt. 2019 om 10:42

Re: Challenges using Flink REST API

2019-03-13 Thread Chesnay Schepler
Can you give me the stacktrace that is logged in the JobManager logs? On 13.03.2019 10:57, Wouter Zorgdrager wrote: Hi Chesnay, Unfortunately this is not true when I run the Flink 1.7.2 docker images. The response is still: { "errors": [

Re: Understanding timestamp and watermark assignment errors

2019-03-13 Thread Stefan Richter
Hi, I think this looks like the same problem as in this issue: https://issues.apache.org/jira/browse/FLINK-11420 Best, Stefan > On 13. Mar 2019, at 09:41, Konstantin Knauf wrote: > > Hi Andrew, > > generally, this looks like a

Re: Challenges using Flink REST API

2019-03-13 Thread Wouter Zorgdrager
Hey Chesnay, Actually, I was mistaken in stating that I got the full stacktrace in the JobManager logs; what I actually got there was: 2019-03-13 11:55:13,906 ERROR org.apache.flink.runtime.webmonitor.handlers.JarRunHandler - Exception occurred in REST handler:

Re: Challenges using Flink REST API

2019-03-13 Thread Chesnay Schepler
My bad, I was looking at the wrong code path. The linked issue isn't helpful, as it only slightly extends the exception message. You cannot get the stacktrace in 1.7.X nor in the current RC for 1.8.0 . I've filed https://issues.apache.org/jira/browse/FLINK-11902 to change this. The 1.8.0 RC

Re: Partitions and the number of cores/executors

2019-03-13 Thread Stefan Richter
Hi, Your assumption is right. Parallel processing is based on splitting inputs, and each split is only processed by one task instance at a time. Best, Stefan > On 13. Mar 2019, at 09:52, mbilalce@gmail.com wrote: > > Hi, > > I am working with Gelly graph library but I think the question

Batch jobs stalling after initial progress

2019-03-13 Thread Marko Mušnjak
Hi, I'm running flink batch jobs on EMR 5.21, and I'm seeing many (>50%) jobs stall and make no progress after some initial period. I've seen the behaviour earlier (5.17), but not nearly as much as now. The job is a fairly simple enrichment job, loading an avro metadata file, creating several

Re: K8s job cluster and cancel and resume from a save point ?

2019-03-13 Thread Vishal Santoshi
BTW, does 1.8 also fix the cancel-with-savepoint issue? That too is broken in 1.7.2: curl --header "Content-Type: application/json" --request POST --data '{"target-directory":"hdfs://nn-crunchy:8020/tmp/xyz14","cancel-job":true}'

Flink to MQ connector with checkpoint support for exactly-once semantics

2019-03-13 Thread min.tan
Hi, Our Flink jobs need to read messages from IBM MQ and write messages into IBM MQ. Just wondering if there are already some MQ connectors with a two-phase-commit SinkFunction, or with CheckpointListener and checkpoint functions, implemented to support exactly-once semantics. Many thanks in advance.
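Flink does ship an abstract `TwoPhaseCommitSinkFunction` (since 1.4) that such a connector could extend. Below is a standalone sketch of the protocol it enforces; the `Transaction` type and the in-memory "broker" are hypothetical stand-ins, not IBM MQ's API, and the real class hooks these methods into Flink's checkpoint lifecycle.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseCommitSketch {
    // In-memory stand-in for an MQ transactional session (hypothetical, not IBM MQ's API).
    static class Transaction {
        final List<String> buffered = new ArrayList<>();
    }

    // Stand-in for the broker's committed, consumer-visible messages.
    static final List<String> committedMessages = new ArrayList<>();

    // invoke(): write records into the open transaction, never directly to the broker.
    static void invoke(Transaction txn, String message) {
        txn.buffered.add(message);
    }

    // preCommit(): flush on the checkpoint barrier; afterwards the txn must be committable.
    static void preCommit(Transaction txn) { /* e.g. flush the session */ }

    // commit(): called once the checkpoint completes; messages become visible exactly once.
    static void commit(Transaction txn) {
        committedMessages.addAll(txn.buffered);
        txn.buffered.clear();
    }

    // abort(): called on failure/recovery; buffered messages are dropped, never duplicated.
    static void abort(Transaction txn) {
        txn.buffered.clear();
    }

    public static void main(String[] args) {
        Transaction txn = new Transaction();
        invoke(txn, "m1");
        invoke(txn, "m2");
        preCommit(txn);
        commit(txn); // checkpoint completed
        System.out.println(committedMessages); // [m1, m2]
    }
}
```

The key requirement on the MQ side is that a pre-committed transaction must survive a process restart and still be committable, otherwise exactly-once degrades to at-least-once.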

Re: Migrating Existing TTL State to 1.8

2019-03-13 Thread Stefan Richter
Hi, If you are worried about old state, you can combine the compaction-filter-based TTL with other cleanup strategies (see the docs). For example, if you set `cleanupFullSnapshot`, any expired state will be cleared when you take a savepoint, and you can then use that savepoint to bring the state into Flink 1.8.
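The `cleanupFullSnapshot` strategy mentioned above is set on the TTL configuration builder. A minimal configuration sketch (Flink 1.8-era API; the 7-day TTL value is an arbitrary example):

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))   // time-to-live for each state entry
    .cleanupFullSnapshot()      // expired entries are dropped when a full snapshot/savepoint is taken
    .build();
```

The resulting `ttlConfig` is then passed to the state descriptor via `enableTimeToLive(ttlConfig)`.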

Scala API support plans

2019-03-13 Thread Ilya Karpov
Hi guys, what are the plans for Scala API support? I'm asking because I've recently found that the Scala API has yet to catch up with the Java one. Thanks!

Re: Backoff strategies for async IO functions?

2019-03-13 Thread Rong Rong
Thanks for raising the concern @shuyi, and thanks for the explanation @konstantin. Glancing at the Flink documentation, it seems users have full control over the timeout behavior [1]. But unlike AsyncWaitOperator, it is not straightforward to access the internal state of the operator to, for example, put
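Since the async I/O API exposes a total timeout rather than a retry policy, a common workaround is to implement retries with backoff inside the async client call itself. Below is a hedged plain-Java sketch of capped exponential backoff with jitter; it only computes delays, and wiring it into an `asyncInvoke` implementation (e.g. via a scheduled executor) is deliberately left out. All names here are illustrative, not a Flink API.

```java
import java.util.Random;

public class ExponentialBackoff {
    private final long baseDelayMs;
    private final long maxDelayMs;
    private final Random jitter = new Random();

    ExponentialBackoff(long baseDelayMs, long maxDelayMs) {
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Delay before the given (0-based) retry attempt:
    // min(base * 2^attempt, max), plus up to 10% random jitter.
    long delayFor(int attempt) {
        long exp = baseDelayMs << Math.min(attempt, 30);      // cap the shift to avoid overflow
        long capped = Math.min(exp, maxDelayMs);
        long j = (long) (jitter.nextDouble() * capped * 0.1); // jitter de-synchronizes retry storms
        return capped + j;
    }

    public static void main(String[] args) {
        ExponentialBackoff b = new ExponentialBackoff(100, 10_000);
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println("attempt " + attempt + " -> ~" + b.delayFor(attempt) + " ms");
        }
    }
}
```

One design note: the retry budget must stay below the operator's configured async timeout, otherwise the timeout fires before the last retry completes.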

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-13 Thread Robert Metzger
@Bowen: I agree. Confluent Hub looks nicer, but it is on their company website. I guess the likelihood that they give out code from their company website is fairly low. @Nils: Beam's page is similar to our Ecosystem page, which we'll reactivate as part of this PR:

Re: Set partition number of Flink DataSet

2019-03-13 Thread Ken Krugler
Hi Qi, > On Mar 13, 2019, at 1:26 AM, qi luo wrote: > > Hi Ken, > > Do you mean that I can create a batch sink which writes to N files? Correct. > That sounds viable, but since our data size is huge (billions of records & > thousands of files), the performance may be unacceptable. The main

Re: Batch jobs stalling after initial progress

2019-03-13 Thread Ken Krugler
Hi Marko, Some things that have caused my jobs to run very slowly (though not completely stall): 1. Cross-joins generating huge result sets. 2. Joins causing very large spills to disk. 3. Slow external API access. With streaming, iterations can cause stalls, but I don't think that's true for

Re: How to join stream and dimension data in Flink?

2019-03-13 Thread Hequn Cheng
Hi Henry, These are good questions! I would rather not add the temporal and lateral prefixes in front of the join. The temporal table is a concept orthogonal to joins: we should say "join a temporal table" or "join a lateral table". 1. You can of course use a stream-stream join. Introducing the

Re: Set partition number of Flink DataSet

2019-03-13 Thread qi luo
Hi Ken, Agree. I will try partitionBy() to reduce the number of parallel sinks, and may also try sortPartition() so each sink can write files one by one. Looking forward to your solution. :) Thanks, Qi > On Mar 14, 2019, at 2:54 AM, Ken Krugler wrote: > > Hi Qi, > >> On Mar 13, 2019, at

Re: Conflicting Cassandra versions while using flink-connector-cassandra

2019-03-13 Thread Gustavo Momenté
Just created a PR trying to address this issue: https://github.com/apache/flink/pull/7980 what do you think? On Tue, Mar 12, 2019 at 23:23, Gustavo Momenté < momente.gust...@gmail.com> wrote: > Can I shade `flink-connector-cassandra` version? And if so do you know why > it isn't shaded

Flink: deduplicating two Kafka streams

2019-03-13 Thread 邓成刚【139】
Hi all, I have a question I'd like to ask. Problem description: I have two tables (streams from Kafka): table1(EVENTTIME, NEW_EVENT_ID, F4, F6) and table2(EVENTTIME, NEW_EVENT_ID, F2, F3). Both tables define a rowtime attribute on EVENTTIME. I want to UNION ALL the two streams and deduplicate the result, with a statement like: Table id_distinct = tableEnv.sqlQuery("select distinct EVENTTIME, NEW_EVENT_ID from (select

Blink HA: processes die right after startup

2019-03-13 Thread xiao...@chinaunicom.cn
Hi All, I set up Blink in HA mode, with JobManagers on (node1, node2) and TaskManagers on (node3, node4, node5). After startup, the process on node1 dies and the process on node2 fails to start, with the following errors: node1 JobManager log: ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint

Re: Question about dynamic updates (hot deployment) of Flink jobs

2019-03-13 Thread TsingJyujing
The mailing list cannot display images. Hot deployment is simply not possible, and never will be. Even if you change the configuration dynamically, the DAG won't change; only by modifying the configuration files and code and then redeploying can you support new business logic. Our current solution is to read some table configuration dynamically from Consul KV at startup. For a sink, you need to define the table schema (fields, types, and table name), as well as what to do on append and retract. For a source, you need to define the incoming data format and how to generate the specified table. If there is a common pattern (e.g. JSON-format sources, or sinks writing to JDBC), a configuration change can take effect with just a restart.

flink-question

2019-03-13 Thread 郭恒
Problem description: 1. I have a MySQL data source (SourceTable object) with three fields: id, name, my_time, of types int, String, and String respectively. I read the source with JDBCInputFormat, using the SQL statement: select id, name, my_time from mysql_source 2. When I operate on it with the SQL API, the statement is: select id, my_time, name from table_source (the retrieved order is int, String, String) 3. I defined a TableSink