Re: Best Flink SQL length proposal

2019-06-26 Thread JingsongLee
Hi Simon, Hope you can wrap them simply. In our scenario, there are also many jobs that have so many columns, the huge generated code not only lead to compile exception, but also lead to the code cannot be optimized by JIT. We are planning to introduce a Java code Splitter (analyze Java code

Re: Hello-world example of Flink Table API using a edited Calcite rule

2019-06-26 Thread JingsongLee
Hi Felipe: Yeah, you can use InMemoryExternalCatalog and CalciteConfig, but I don't quite understand what you mean. InMemoryExternalCatalog provides methods to create, drop, and alter (sub-)catalogs or tables. And CalciteConfig is for defining a custom Calcite configuration. They are two

Re: Apache Flink - Running application with different flink configurations (flink-conf.yml) on EMR

2019-06-26 Thread Xintong Song
Hi Singh, You can use the environment variable "FLINK_CONF_DIR" to specify path to the directory of config files. You can also override config options with command line arguments prefixed -D (for yarn-session.sh) or -yD (for 'flink run' command). Thank you~ Xintong Song On Wed, Jun 26, 2019

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread Biao Liu
Hi Ken, In regard to oversized input splits, it seems to be a rare case beyond my expectation. However it should be fixed definitely since input split can be user-defined. We should not assume it must be small. I agree with Stephan that maybe there is something unexpectedly involved in the input

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread qi luo
Hi Stephan, We have met similar issues described as Ken. Would all these issues be hopefully fixed in 1.9? Thanks, Qi > On Jun 26, 2019, at 10:50 PM, Stephan Ewen wrote: > > Hi Ken! > > Sorry to hear you are going through this experience. The major focus on > streaming so far means that

Re: Best Flink SQL length proposal

2019-06-26 Thread Simon Su
Hi Jiongsong Thanks for your reply. It seems that to wrap fields is a feasible way for me now. And there already exists another JIRA FLINK-8921 try to improve this. Thanks, Simon On 06/26/2019 19:21,JingsongLee wrote: Hi Simon: Does your code include the PR[1]? If include: try

Re: HDFS checkpoints for rocksDB state backend:

2019-06-26 Thread Congxian Qiu
Hi Andrea As the NoClassDefFoundError, could you please verify that there exist `org.apache.hadoop.hdfs.protocol.HdfsConstants*` *in your jar. Or could you use Arthas[1] to check if there exists the class when running the job? [1] https://github.com/alibaba/arthas Best, Congxian Andrea Spina

Re: open() setup method not being called for AggregateFunctions?

2019-06-26 Thread Piyush Narang
Circling back on this as I was able to dig in a bit more our specific use-case (Datastream API and we perform a window + groupby). It seems as though the planner is creating an AggregateAggFunction which currently isn’t a RichFunction. From what I understand, not allowing rich aggregation

Any tutorial/example/blogpost/doc for Hive Source and Hive Sink with Flink streaming job?

2019-06-26 Thread Elkhan Dadashov
Hey Flink community, Just getting started with Flink. Wanted to ask if there is any tutorial/example/blogpost/doc for Hive Source and Hive Sink with Flink streaming job? Thanks.

HDFS checkpoints for rocksDB state backend:

2019-06-26 Thread Andrea Spina
Dear community, I'm trying to use HDFS checkpoints in flink-1.6.4 with the following configuration state.backend: rocksdb state.checkpoints.dir: hdfs:// rbl1.stage.certilogo.radicalbit.io:8020/flink/checkpoint state.savepoints.dir: hdfs:// rbl1.stage.certilogo.radicalbit.io:8020/flink/savepoints

Re: [ANNOUNCEMENT] June 2019 Bay Area Apache Flink Meetup

2019-06-26 Thread Xuefu Zhang
Hi all, As a gentle reminder, the meetup [1] will be on today at 6:30pm at Zendesk, 1019 Market Street, SF. Come on in for enlightening talks as well as foods and drinks. See you there! Regards, Xuefu [1] https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/262216929/ On Fri, Jun 21,

Error when creating InMemoryExternalCatalog to populate using another stream.

2019-06-26 Thread Felipe Gutierrez
Hi, I am trying to use the InMemoryExternalCatalog to register a table using the Java Table API 1.8 I want to update this table during with another stream that I will be reading. Then I plan to use the values of my InMemoryExternalCatalog to execute other queries. Is that a reasonable plan to

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread Stephan Ewen
Hi Ken! Sorry to hear you are going through this experience. The major focus on streaming so far means that the DataSet API has stability issues at scale. So, yes, batch mode in current Flink version can be somewhat tricky. It is a big focus of Flink 1.9 to fix the batch mode, finally, and by

Apache Flink - Running application with different flink configurations (flink-conf.yml) on EMR

2019-06-26 Thread M Singh
Hi: I have a single EMR cluster with Flink and want to run multiple applications on it with different flink configurations.  Is there a way to  1. Pass the config file name for each application, or2. Overwrite the config parameters via command line arguments for the application.  This is similar

Re: Apache Flink - How to pass configuration params in the flink-config.yaml file to local execution environment

2019-06-26 Thread M Singh
Hey Folks:  Just wanted to see if you have any advice on this issue of passing config parameters to the application.  I've tried passing parameters by using ParameterTool parameterTool = ParameterTool.fromMap(config);StreamExecutionEnvironment env =

Re: Hello-world example of Flink Table API using a edited Calcite rule

2019-06-26 Thread Felipe Gutierrez
Hi JingsongLee, it is still not very clear to me. I imagine that I can create an InMemoryExternalCatalog and insert some tuples there (which will be in memory). Then I can use Calcite to use the values of my InMemoryExternalCatalog and change my plan. Is that correct? Do you have an example of

Re: Hello-world example of Flink Table API using a edited Calcite rule

2019-06-26 Thread JingsongLee
Hi Felipe: I think your approach is absolutely right. You can try to do some plan test just like [1]. You can find more CalciteConfigBuilder API test in [2].

Re: Best Flink SQL length proposal

2019-06-26 Thread JingsongLee
Hi Simon: Does your code include the PR[1]? If include: try set TableConfig.setMaxGeneratedCodeLength smaller (default 64000)? If exclude: Can you wrap some fields to a nested Row field to reduce field number. 1.https://github.com/apache/flink/pull/5613

Hello-world example of Flink Table API using a edited Calcite rule

2019-06-26 Thread Felipe Gutierrez
Hi, does someone have a simple example using Table API and a Calcite rule which change/optimize the query execution plan of a query in Flink? >From the official documentation, I know that I have to create a CalciteConfig object [1]. Then, I based my firsts tests on this stackoverflow post [2]

Best Flink SQL length proposal

2019-06-26 Thread Simon Su
Hi all, Currently I faced a problem caused by a long Flink SQL. My sql is like “insert into tableA select a, b, c …….from sourceTable”, I have more than 1000 columns in select target, so that’s the problem, flink code generator will generate a RichMapFunction class and contains a map

Re: checkpoint stage size的问题

2019-06-26 Thread ReignsDYL
这是web ui的监控 -- Sent from: http://apache-flink.147419.n8.nabble.com/

[FLINK-10868] job cannot be exited immediately if job manager is timed out for some reason

2019-06-26 Thread Anyang Hu
Hi ZhenQiu && Rohrmann: Currently I backport the FLINK-10868 to flink-1.5, most of my jobs (all batch jobs) can be exited immediately after applying for the failed container to the upper limit, but there are still some jobs cannot be exited immediately. Through the log, it is observed that these

Re: checkpoint stage size的问题

2019-06-26 Thread Yun Tang
你好 这个问题问得有点稍微宽泛,因为并没有描述你所认为的checkpoint state size越来越大的周期。checkpoint state size变大有几个原因: 1. 上游数据量增大。 2. window设置时间较长,尚未触发,导致window内积攒的数据比较大。 3. window的类型决定了所需要存储的state size较大。 可以参考社区的文档[1] window state的存储空间问题。另外,在上游数据量没有显著变化的时候,若干窗口周期后的checkpoint state

Re: checkpoint stage size的问题

2019-06-26 Thread ReignsDYL
我发现窗口的trigger只进行了fire,并没有进行purge,我不清楚是不是这个原因,或者还是有其他的原因。 -- Sent from: http://apache-flink.147419.n8.nabble.com/

来自小乐的邮件

2019-06-26 Thread 小乐

checkpoint stage size的问题

2019-06-26 Thread ReignsDYL
各位好,我的项目的流计算模型source(kafka)->filter->keyby->window->aggregate->sink(hbase),现在发现window的subtask的checkpoint的stage size越来越大,请问是什么原因啊? -- Sent from: http://apache-flink.147419.n8.nabble.com/

关于使用Flink建设基于CDC方式的OGG数据湖

2019-06-26 Thread 唐门小师兄
一:背景描述现有数据中心基于GP做数仓,但是在OGG数据到GP做贴源ODS过程耗费集群太多资源,导致集群性能瓶颈,故想考虑基于GP + HADOOP的架构来重构数据仓库。 二:思路将入库贴源ODS的工作交由Hadoop生态来完成,数据从OGG实时流入kafka时,都是类似DDL中的"I, U ,D"类型数据, 然后Flink采用动态表方式找出每条主键的最新记录,内存中的动态表将按天写入HDFS, 然后GP通过外部表的方式,将今天的数据加载到GP中与存量做meger更新,这样既完成性能瓶颈的调优, 数据流转图简要如下: