Re: Reading Key-Value from Kafka | Eviction policy.

2019-09-26 Thread miki haiat
I'm sure there are several ways to implement it. Can you elaborate more on your use case? On Fri, Sep 27, 2019, 08:37 srikanth flink wrote: > Hi, > > My data source is Kafka, all these days have been reading the values from > Kafka stream to a table. The table just grows and runs into a heap

Reading Key-Value from Kafka | Eviction policy.

2019-09-26 Thread srikanth flink
Hi, My data source is Kafka; all these days I have been reading the values from the Kafka stream into a table. The table just grows and runs into a heap issue. I came across the eviction policy, which works only on keys, right? I have researched how to configure the environment file (Flink SQL) to read both key

Re: CSV Table source as data-stream in environment file

2019-09-26 Thread Dian Fu
You need to add the following configuration to make it run in streaming mode [1]: execution: type: streaming Regards, Dian [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sqlClient.html#environment-files
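
For reference, a minimal sketch of that section of the SQL Client environment file, per [1] (the parallelism line is illustrative):

    execution:
      type: streaming   # read sources as unbounded streams instead of batch data-sets
      parallelism: 1    # illustrative; other execution settings are unaffected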

Re: Subscribing to the mailing lists

2019-09-26 Thread Dian Fu
To subscribe to the mailing lists, you need to send an email to the following addresses: dev-subscr...@flink.apache.org , user-subscr...@flink.apache.org and user-zh-subscr...@flink.apache.org

Re: Debugging slow/failing checkpoints

2019-09-26 Thread Congxian Qiu
Hi Steve 1. Do you use exactly-once or at-least-once? 2. Do you use incremental checkpoints or not? 3. Do you have any timers, and where are the timers stored (heap or RocksDB)? You can check the config here [1] and try storing the timers in RocksDB. 4. Is the alignment time too long? 5. You can check if it is
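
For point 3, a minimal flink-conf.yaml sketch of moving timers into RocksDB, assuming the RocksDB state backend is already in use:

    state.backend: rocksdb
    # store timer state in RocksDB rather than on the heap
    state.backend.rocksdb.timer-service.factory: ROCKSDB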

Re: ** Help needed w.r.t. parallelism settings in Flink **

2019-09-26 Thread Zhu Zhu
Hi Akshay, For your questions: 1. One main purpose of maxParallelism is to decide the count of key groups. A key group is the bucket for keys when doing keyBy partitioning, so a larger maxParallelism indicates a finer granularity for key distribution, no matter whether the operator is stateful or not. 2. You
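
As a minimal sketch of the two settings (the values are illustrative; maxParallelism fixes the key-group count and so bounds later rescaling of keyed state):

    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(4)      // current operator parallelism
    env.setMaxParallelism(128) // key-group count; fixed for the lifetime of the job's state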

** Help needed w.r.t. parallelism settings in Flink **

2019-09-26 Thread Akshay Iyangar
Hi So we are running a Beam pipeline that uses Flink as its execution engine. We are currently on Flink 1.8. Per the Flink documentation I see that there is an option that allows you to set parallelism and maxParallelism. We actually want to set both so that we can dynamically scale the

Re: Flink job manager doesn't remove stale checkpoints

2019-09-26 Thread Clay Teeter
I looked into the disk issues and found that Fabian was on the right path. The checkpoints that were lingering were in fact in use. Thanks for the help! Clay On Thu, Sep 26, 2019 at 8:09 PM Clay Teeter wrote: > I see, I'll try turning off incremental checkpoints to see if that helps. > > re:

Debugging slow/failing checkpoints

2019-09-26 Thread Steven Nelson
I am working with an application that hasn't gone to production yet. We run Flink as a cluster within a K8s environment. It has the following attributes: 1) 2 Job Managers configured using HA, backed by Zookeeper and HDFS 2) 4 Task Managers 3) Configured to use RocksDB. The actual RocksDB files are

UnitTests and ProcessTimeWindows - Missing results

2019-09-26 Thread Clay Teeter
What is the best way to run unit tests on streams that contain ProcessTimeWindows? Example: def bufferDataStreamByProcessId(ds: DataStream[MaalkaRecord]): DataStream[MaalkaRecord] = { ds.map { r => println(s"data in: $r") // Data shows up here r }.keyBy { mr => val r =

Re: Flink job manager doesn't remove stale checkpoints

2019-09-26 Thread Clay Teeter
I see, I'll try turning off incremental checkpoints to see if that helps. re: disk space, I could see a scenario with my application where I could get 10,000+ checkpoints, if the checkpoints are additive. I'll let you know what I see. Thanks! Clay On Wed, Sep 25, 2019 at 5:40 PM Fabian Hueske
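
For reference, a minimal flink-conf.yaml sketch of turning incremental checkpoints off, assuming the backend is configured there rather than in code:

    state.backend: rocksdb
    # every checkpoint is then self-contained instead of referencing older ones
    state.backend.incremental: false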

Re: Flink- Heap Space running out

2019-09-26 Thread Fabian Hueske
Hi, I don't think that the memory configuration is the issue. The problem is the join query. The join does not have any temporal boundaries; therefore, both tables are completely stored in memory and never released. You can configure a memory eviction strategy via idle state retention [1], but you
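
A minimal sketch of the idle state retention setting referenced in [1] (Flink 1.9 Table API; the retention times and the query are illustrative):

    import org.apache.flink.api.common.time.Time
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.table.api.scala._
    import org.apache.flink.types.Row

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env)
    val qConfig = tableEnv.queryConfig
    // state for keys idle longer than the max time is dropped; results that
    // later touch an evicted key can become inaccurate
    qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24))
    val result = tableEnv.sqlQuery("SELECT ...") // the unbounded join query, elided here
    tableEnv.toRetractStream[Row](result, qConfig)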

Re: Flink- Heap Space running out

2019-09-26 Thread miki haiat
You can configure the task manager memory in the config.yaml file. What is the current configuration? On Thu, Sep 26, 2019, 17:14 Nishant Gupta wrote: > I am running a query to join a stream and a table as below. It is running > out of heap space. Even though it has enough heap space in flink

Re: Problems with java.utils

2019-09-26 Thread Dian Fu
Hi Nick, There is a package named "org.apache.flink.table.api.java" in Flink, and so the import of "org.apache.flink.table.api._" causes "org.apache.flink.table.api.java" to be imported. Then every import of a package starting with "java", such as "import java.util.ArrayList", will try to find the
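
Assuming the fix was the usual Scala workaround for this kind of shadowing, a minimal sketch: prefix the import with _root_ so resolution starts from the top-level package instead of from org.apache.flink.table.api:

    import org.apache.flink.table.api._
    // _root_ makes the import absolute, bypassing org.apache.flink.table.api.java
    import _root_.java.util.ArrayList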

Re: How to verify your code yourself before submitting it to the community

2019-09-26 Thread Dian Fu
1) If the build fails, check the failure reason; if it is unrelated to this PR, you can re-trigger Travis by commenting "@flinkbot run travis". 2) Locally you can verify with "mvn clean verify"; see [1] for details. Your change is doc-related, and such changes generally do not cause build failures. [1] https://flink.apache.org/contributing/contribute-code.html > On 26 Sep 2019, at 21:56, 高飞龙

Re: Problems with java.utils

2019-09-26 Thread Nicholas Walton
Dian That fixed the problem, thank you. It would appear that someone has taken it upon themselves to redefine part of the Java standard library in org.apache.flink.table.api._ Nick > On 26 Sep 2019, at 15:16, Dian Fu wrote: > > Hi Nick, > >>> [error]

Re: How to verify your code yourself before submitting it to the community

2019-09-26 Thread Zili Chen
I looked at your PR; the failure should be caused by an unstable test. Doc-related changes should have nothing to do with CI. Best, tison. Zili Chen wrote on Thu, 26 Sep 2019 at 22:21: > mvn verify runs the unit tests and the compile-time checks (e.g. checkstyle) > > Best, > tison. > > > 高飞龙 wrote on Thu, 26 Sep 2019 at 21:56: >> Hi, when I submitted a PR to the community, the build failed ( >> https://github.com/apache/flink/pull/9749#issuecomment-534149758) >> >>

Re: How to verify your code yourself before submitting it to the community

2019-09-26 Thread Zili Chen
mvn verify runs the unit tests and the compile-time checks (e.g. checkstyle). Best, tison. 高飞龙 wrote on Thu, 26 Sep 2019 at 21:56: > Hi, when I submitted a PR to the community, the build failed ( > https://github.com/apache/flink/pull/9749#issuecomment-534149758) > > What should I do? Is there a script I can run before submitting the PR to test the build locally? > > > > > > -- > > > > gaofeilong198...@163.com

Re: Problems with java.utils

2019-09-26 Thread Dian Fu
Hi Nick, >> [error] ……/src/main/scala/org/example/Job.scala:30:13: object util is >> not a member of package org.apache.flink.table.api.java >> [error] import java.util.ArrayList The error message shows that it tries to find "util.ArrayList" under package

Flink- Heap Space running out

2019-09-26 Thread Nishant Gupta
I am running a query to join a stream and a table as below. It is running out of heap space, even though it has enough heap space in the Flink cluster (60GB * 3). Is there an eviction strategy needed for this query? SELECT sourceKafka.* FROM sourceKafka INNER JOIN DefaulterTable ON

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-26 Thread Sean Hester
Thanks to everyone for all the replies. I think the original concern here with "just" relying on the HA option is that there are some disaster recovery and data center migration use cases where the continuity of the job managers is difficult to preserve. But those are admittedly very edgy use

Re: Problems with java.utils

2019-09-26 Thread Nicholas Walton
I've shrunk the problem down to a minimal size. The code: package org.example import org.apache.flink.table.api._ import org.apache.http.HttpHost import java.util.ArrayList object foo { val httpHosts = new ArrayList[HttpHost] httpHosts.add(new HttpHost("samwise.local", 9200, "http")) }

How to verify your code yourself before submitting it to the community

2019-09-26 Thread 高飞龙
Hi, when I submitted a PR to the community, the build failed (https://github.com/apache/flink/pull/9749#issuecomment-534149758). What should I do? Is there a script I can run before submitting the PR to test the build locally? -- gaofeilong198...@163.com

Re: Flink SQL update-mode set to retract in env file.

2019-09-26 Thread srikanth flink
Awesome, thanks! On Thu, Sep 26, 2019 at 5:50 PM Terry Wang wrote: > Hi, Srikanth~ > > In your code, > DataStream outStreamAgg = tableEnv.toRetractStream(resultTable, > Row.class).map(t -> {}); has converted the resultTable into a DataStream > that’s unrelated with tableApi, > And the

Re: Flink SQL update-mode set to retract in env file.

2019-09-26 Thread Terry Wang
Hi, Srikanth~ In your code, DataStream outStreamAgg = tableEnv.toRetractStream(resultTable, Row.class).map(t -> {}); has converted the resultTable into a DataStream that's unrelated to the Table API, and the following code `outStreamAgg.addSink(…)` is just a normal stream write to a FlinkKafka
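
As a minimal sketch of what that conversion produces (the placeholders stand in for the objects from the thread): toRetractStream emits (Boolean, Row) pairs where false marks a retraction, and an append-only Kafka sink cannot represent retractions, so they must be handled in the stream first:

    import org.apache.flink.streaming.api.functions.sink.SinkFunction
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.table.api.Table
    import org.apache.flink.table.api.scala._
    import org.apache.flink.types.Row

    val tableEnv: StreamTableEnvironment = ??? // the thread's existing environment
    val resultTable: Table = ???               // the aggregated result table
    val kafkaSink: SinkFunction[Row] = ???     // hypothetical FlinkKafkaProducer

    val retractStream: DataStream[(Boolean, Row)] =
      tableEnv.toRetractStream[Row](resultTable)
    retractStream
      .filter(_._1) // keep accumulate messages, dropping retractions (illustrative)
      .map(_._2)
      .addSink(kafkaSink)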

Re: Challenges Deploying Flink With Savepoints On Kubernetes

2019-09-26 Thread Yang Wang
Hi, Aleksandar The savepoint option in a standalone job cluster is optional. If you want to always recover from the latest checkpoint, then, just as Aleksandar and Yun Tang said, you could use the high-availability configuration. Make sure the cluster-id is not changed; I think the job could recover both at
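
For reference, a minimal flink-conf.yaml sketch of such a high-availability setup (addresses and paths are illustrative):

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-host:2181   # illustrative address
    high-availability.storageDir: hdfs:///flink/ha     # where job metadata is kept
    high-availability.cluster-id: /my-job-cluster      # keep stable across restarts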

Problems with java.utils

2019-09-26 Thread Nicholas Walton
I'm having a problem using ArrayList in Scala. The code is below: import org.apache.flink.core.fs._ import org.apache.flink.streaming.api._ import org.apache.flink.streaming.api.scala._ import org.apache.flink.table.api._ import org.apache.flink.table.api.scala._ import

Re: Multiple Job Managers in Flink HA Setup

2019-09-26 Thread Yang Wang
Hi Steven, I have tested the standalone cluster on Kubernetes with 2 jobmanagers. Submitting Flink jobs through either the active jobmanager web UI or the standby jobmanager web UI works fine. So I think the problem may be with your HAProxy. Does your HAProxy forward traffic to both the active and standby

Re: About using Hive UDFs in 1.9

2019-09-26 Thread like
Thanks a lot, but with the hive.xx_db.xx_udf form I tried, the UDF cannot be found; I had to use tableEnv.useCatalog("hive") and tableEnv.useDatabase("default"). On 26 Sep 2019 at 16:43, Terry Wang wrote: Question 1: for the `default` keyword error, have you tried the hive.`default`.xx_udf form? Escaping it this way should resolve the keyword error. Question 2: Flink 1.10 will support a modular plugin approach, which will be more convenient to use. Best, Terry Wang On
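
As a minimal sketch of the workaround described above (Flink 1.9; the UDF and table names follow the thread and are illustrative):

    import org.apache.flink.table.api.TableEnvironment

    val tableEnv: TableEnvironment = ??? // the thread's existing environment
    // make the hive catalog and its default database current so xx_udf resolves
    tableEnv.useCatalog("hive")
    tableEnv.useDatabase("default")
    val result = tableEnv.sqlQuery("SELECT xx_udf(col) FROM some_table")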

Re: Flink SQL update-mode set to retract in env file.

2019-09-26 Thread srikanth flink
Hi Terry Wang, Thanks for quick reply. I would like to understand more on your line " If the target table is a type of Kafka which implments AppendStreamTableSink, the update-mode will be append only". If your statement defines retract mode could not be used for Kafka sinks as it implements

CSV Table source as data-stream in environment file

2019-09-26 Thread Nishant Gupta
Hi Team, How do we define a CSV table source as a data-stream instead of a data-set in the environment file? Whether or not I mention update-mode: append, it takes the CSV file only as a data-set. Is there any detailed reference to the environment file configuration where sinks and sources are defined?
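
For reference, a minimal sketch of a CSV source table in the environment file, loosely following the SQL Client docs (name, path and fields are illustrative); per Dian Fu's reply above, whether it is read as a stream is governed by execution: type: streaming, not by the table definition:

    tables:
      - name: MyCsvSource        # illustrative name
        type: source-table
        update-mode: append
        connector:
          type: filesystem
          path: "/path/to/input.csv"
        format:
          type: csv
          fields:
            - name: f0
              type: VARCHAR
        schema:
          - name: f0
            type: VARCHAR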

Re: About using Hive UDFs in 1.9

2019-09-26 Thread Terry Wang
Question 1: for the `default` keyword error, have you tried the hive.`default`.xx_udf form? Escaping it this way should resolve the keyword error. Question 2: Flink 1.10 will support a modular plugin approach, which will be more convenient to use. Best, Terry Wang > On 25 Sep 2019, at 19:21, like wrote: > > Hi all: > Currently I'm hitting the following problems using Hive UDFs in version 1.9: >

Re: Flink SQL update-mode set to retract in env file.

2019-09-26 Thread Terry Wang
Hi srikanth~ The Flink SQL update-mode is inferred from the target table type. For now, there are three StreamTableSink types: `AppendStreamTableSink`, `UpsertStreamTableSink` and `RetractStreamTableSink`. If the target table is a Kafka table, which implements AppendStreamTableSink, the

org.apache.flink.streaming.api.operators.TimerHeapInternalTimer (remainder of the subject garbled in the archive)

2019-09-26 Thread (sender name garbled)
StreamQueryConfig queryConfig = tabEnv.queryConfig(); queryConfig.withIdleStateRetentionTime(Time.seconds(20), Time.minutes(6)); DataStream source = env.socketTextStream("localhost", 10028) .map(new MapFunction() {

Flink SQL update-mode set to retract in env file.

2019-09-26 Thread srikanth flink
How could I configure the environment file for Flink SQL with update-mode: retract? I have this for append: properties: - key: zookeeper.connect value: localhost:2181 - key: bootstrap.servers value: localhost:9092 - key: group.id value:
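
For reference, the append configuration from the question reflowed as a minimal sketch (the table name is illustrative and the truncated group.id value is left blank); per Terry Wang's reply above, the mode cannot simply be switched to retract here, since the Kafka table sink is append-only:

    tables:
      - name: sourceKafka          # illustrative name
        type: source-table
        update-mode: append        # the only mode the Kafka connector supports here
        connector:
          type: kafka
          properties:
            - key: zookeeper.connect
              value: localhost:2181
            - key: bootstrap.servers
              value: localhost:9092
            - key: group.id
              value:               # truncated in the original message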