Flink Kafka consumer auto-commit timeout

2020-03-08 Thread Rong Rong
Hi All, I would like to bring back this discussion which I saw multiple times in previous ML threads [1], but there seem to have no solution if checkpointing is disabled. All of these ML reported exceptions have one common pattern: > *INFO* org.apache.kafka.clients.consumer.internals.AbstractCoo

Re: KafkaConsumer keeps getting InstanceAlreadyExistsException

2020-03-15 Thread Rong Rong
We also had seen this issue before running Flink apps in a shared cluster environment. Basically, Kafka is trying to register a JMX MBean[1] for application monitoring. This is only a WARN suggesting that you are registering more than one MBean with the same client id "consumer-1", it should not a

Re: KafkaConsumer keeps getting InstanceAlreadyExistsException

2020-03-18 Thread Rong Rong
information. There might be >>>> some slight impact on the accuracy of the consumer metrics, but should be >>>> almost ignorable because the partition discoverer is quite inactive >>>> compared with the actual consumer. >>>> >>>>

Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread Rong Rong
Congratulations Hequn, well deserved! -- Rong On Wed, Aug 7, 2019 at 8:30 AM wrote: > Congratulations, Hequn! > > > > *From:* Xintong Song > *Sent:* Wednesday, August 07, 2019 10:41 AM > *To:* d...@flink.apache.org > *Cc:* user > *Subject:* Re: [ANNOUNCE] Hequn becomes a Flink committer > > >

Re: how to get the code produced by Flink Code Generator

2019-08-07 Thread Rong Rong
+1. I think this would be a very nice way to provide more verbose feedback for debugging. -- Rong On Wed, Aug 7, 2019 at 9:28 AM Fabian Hueske wrote: > Hi Vincent, > > I don't think there is such a flag in Flink. > However, this sounds like a really good idea. > > Would you mind creating a Jira

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-14 Thread Rong Rong
Congratulations Andrey! On Wed, Aug 14, 2019 at 10:14 PM chaojianok wrote: > Congratulations Andrey! > At 2019-08-14 21:26:37, "Till Rohrmann" wrote: > >Hi everyone, > > > >I'm very happy to announce that Andrey Zagrebin accepted the offer of the > >Flink PMC to become a committer of the Flink

Re: Issue with FilterableTableSource and the logical optimizer rules

2019-08-19 Thread Rong Rong
Hi Itamar, The problem you described sounds similar to this ticket[1]. Can you try to see if following the solution listed resolves your issue? -- Rong [1] https://issues.apache.org/jira/browse/FLINK-12399 On Mon, Aug 19, 2019 at 8:56 AM Itamar Ravid wrote: > Hi, I’m facing a strange issue wi

Re: Problem with Flink on Yarn

2019-08-23 Thread Rong Rong
This seems like your Kerberos server is starting to issue invalid token to your job manager. Can you share how your Kerberos setting is configured? This might also relate to how your KDC servers are configured. -- Rong On Fri, Aug 23, 2019 at 7:00 AM Zhu Zhu wrote: > Hi Juan, > > Have you tried

Re: type error with generics ..

2019-08-24 Thread Rong Rong
Hi Debasish, I think the error refers to the output of your source instead of your result of the map function. E.g. DataStream ins = readStream(in, Data.class, serdeData)*.returns(new TypeInformation);* DataStream simples = ins.map((Data d) -> new Simple(d.getName())) .returns(new TypeHint(){}.get

Re: type error with generics ..

2019-08-25 Thread Rong Rong
.. > > (BTW I am not sure how to invoke returns on a DataStream and hence had to > do a fake map - any suggestions here ?) > > regards. > > On Sat, Aug 24, 2019 at 10:26 PM Rong Rong wrote: > >> Hi Debasish, >> >> I think the error refers to the output of

Re: type error with generics ..

2019-08-26 Thread Rong Rong
;>> Otherwise the type has to be specified explicitly using type information. >>> [info] at >>> org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:882) >>> [info] at >>> org.apache.flink.api.java.typeutils

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Rong Rong
Congratulations Zili! -- Rong On Wed, Sep 11, 2019 at 6:26 PM Hequn Cheng wrote: > Congratulations! > > Best, Hequn > > On Thu, Sep 12, 2019 at 9:24 AM Jark Wu wrote: > >> Congratulations Zili! >> >> Best, >> Jark >> >> On Wed, 11 Sep 2019 at 23:06, wrote: >> >> > Congratulations, Zili. >> >

Re: Extending Flink's SQL-Parser

2019-09-18 Thread Rong Rong
Hi Dominik, To add to Rui's answer. there are other examples I can think of on how to extend Calcite's DDL syntax is already in Calcite's Server module [1] and one of our open-sourced project [2]. you might want to check them out. -- Rong [1] https://github.com/apache/calcite/blob/master/server/

Re: Flink Join Time Window

2019-09-30 Thread Rong Rong
Hi Nishant, On a brief look. I think this is a problem with your 2nd query: > > *Table2*... > Table latestBadIps = tableEnv.sqlQuery("SELECT MAX(b_proctime) AS > mb_proctime, bad_ip FROM BadIP ***GROUP BY bad_ip***HAVING > MIN(b_proctime) > CURRENT_TIMESTAMP - INTERVAL '2' DAY "); > tableEnv.regi

Re: JDBC Table Sink doesn't seem to sink to database.

2019-10-17 Thread Rong Rong
Hi John, You are right. IMO the batch interval setting is used for increasing the JDBC execution performance purpose. The reason why your INSERT INTO statement with a `non_existing_table` the exception doesn't happen is because the JDBCAppendableSink does not check table existence beforehand. That

Re: JDBC Table Sink doesn't seem to sink to database.

2019-10-17 Thread Rong Rong
it to batch interval = 1 and it works fine. Anyways I > think the JDBC sink could have some improvements like batchInterval + time > interval execution. So if the batch doesn't fill up then execute what ever > is left on that time interval. > > On Thu, 17 Oct 2019 at 12:22, Ro

Re: JDBC Table Sink doesn't seem to sink to database.

2019-10-17 Thread Rong Rong
> > > On Thu, 17 Oct 2019 at 14:00, Rong Rong wrote: > >> Yes, I think having a time interval execution (for the AppendableSink) >> should be a good idea. >> Can you please open a Jira issue[1] for further discussion. >> >> -- >> Rong >> >>

Re: [DISCUSS] Semantic and implementation of per-job mode

2019-10-31 Thread Rong Rong
Hi All, Thanks @Tison for starting the discussion and I think we have very similar scenario with Theo's use cases. In our case we also generates the job graph using a client service (which serves multiple job graph generation from multiple user code) and we've found that managing the upload/downlo

Re: flink on yarn-cluster kerberos authentication for hbase

2019-11-08 Thread Rong Rong
Hi Can you share more information regarding how you currently setup your Kerberos that only works with Zookeeper? Does your HBase share the same KDC? -- Rong On Fri, Nov 8, 2019 at 12:33 AM venn wrote: > Thanks, I already seen, not work for me > > > > *发件人**:* Jaqie Chan > *发送时间:* Friday, No

Re: Limit max cpu usage per TaskManager

2019-11-09 Thread Rong Rong
Hi Lu, Yang is right. enabling cgroup isolation is probably the one you are looking for to control how Flink utilize the CPU resources. One more idea is to enable DominantResourceCalculator[1] which I think you've probably done so already. Found an interesting read[2] about this if you would lik

Re: [DISCUSS] Support configure remote flink jar

2019-11-23 Thread Rong Rong
Thanks @Tison for starting the discussion and sorry for joining so late. Yes, I think this is a very good idea. we already tweak the flink-yarn package internally to support something similar to what @Thomas mentioned: to support registering a Jar that has already uploaded to some DFS (needless to

Re: Apache Flink - Throttling stream flow

2019-11-27 Thread Rong Rong
Hi Mans, is this what you are looking for [1][2]? -- Rong [1] https://issues.apache.org/jira/browse/FLINK-11501 [2] https://github.com/apache/flink/pull/7679 On Mon, Nov 25, 2019 at 3:29 AM M Singh wrote: > Thanks Ciazhi & Thomas for your responses. > > I read the throttling example but want

Re: Flink ML feature

2019-12-12 Thread Rong Rong
Hi guys, Yes, as Till mentioned. The community is working on a new ML library and we are working closely with the Alink project to bring the algorithms. You can find more information regarding the new ML design architecture in FLIP-39 [1]. One of the major change is that the new ML library [2] wi

Re: Yarn Kerberos issue

2020-01-04 Thread Rong Rong
Hi Juan, Chesnay was right. If you are using CLI to launch your session cluster based on the document [1], you following the instruction to use kinit [2] first seems to be one of the right way to go. Another way of approaching it is to setup the kerberos settings in the flink-conf.yaml file [3]. F

Re: Completed job wasn't saved to archive

2020-01-09 Thread Rong Rong
Hi Pavel, Sorry for bringing this thread up so late. I was digging into the usage of the Flink history server and I found one situation where there would be no logs and no failure/success message from the cluster: In very rare case in our Flink-YARN session cluster: if an application master (JobMa

Re: Yarn Kerberos issue

2020-01-12 Thread Rong Rong
> > > >> Yang > > > > >> > > > > >> Juan Gentile 于2020年1月6日周一 下午3:55写道: > > > > >> > > > > >>> Hello Rong, Chesnay, > > > > >>> > > > &

Re: Yarn Kerberos issue

2020-01-13 Thread Rong Rong
hub.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/README.md#dt-renewal-renewers-and-yarn > > > > If you think that you need more information about our issue, we can > organize a call and discuss about it. > > > > Regards, > > J

Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Rong Rong
Congratulations, a big thanks to the release managers for all the hard works!! -- Rong On Wed, Feb 12, 2020 at 5:52 PM Yang Wang wrote: > Excellent work. Thanks Gary & Yu for being the release manager. > > > Best, > Yang > > Jeff Zhang 于2020年2月13日周四 上午9:36写道: > >> Congratulations! Really appre

Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-21 Thread Rong Rong
Congratulations Jingsong!! Cheers, Rong On Fri, Feb 21, 2020 at 8:45 AM Bowen Li wrote: > Congrats, Jingsong! > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann > wrote: > >> Congratulations Jingsong! >> >> Cheers, >> Till >> >> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao wrote: >> >>> Congr

Re: Over Window Not Processing Messages

2018-06-26 Thread Rong Rong
Hi Greg. Based on a quick test I cannot reproduce the issue, it is emitting messages correctly in the ITCase environment. can you share more information? Does the same problem happen if you use proctime? I am guessing this could be highly correlated with how you set your watermark strategy of your

Re: env.setStateBackend deprecated in 1.5 for FsStateBackend

2018-06-26 Thread Rong Rong
Hmm. If you have a wrapper function like this, it will not report deprecated warning. *def getFsStateBackend(path: String): StateBackend = return new FsStateBackend(path) * Since AbstractStateBackend implements StateBackend and *def setStateBackend(backend: StateBackend): StreamExecutionEnvironme

Re: Streaming

2018-06-27 Thread Rong Rong
Hi , Stream distinct accumulator is actually supported in SQL API [1]. The syntax is pretty much identical to the batch case. A simple example using the tumbling window will be. > SELECT COUNT(DISTINCT col) > FROM t > GROUP BY TUMBLE(rowtime, INTERVAL '1' MINUTE) I haven't added the support but

Re: error: object connectors is not a member of package org.apache.flink.streaming

2018-06-30 Thread Rong Rong
Hi Mich, Ted is correct, Flink release binary does not include any connectors and you will have to include the appropriate connector version. This is to avoid dependency conflicts between different Kafka releases. You probably need the specific Kafka connector version jar file as well, so in your

Re: How to use broadcast variables in data stream

2018-06-30 Thread Rong Rong
Hi Zhen, This might be a rather inefficient solution. We have encountered situations that we need to have some daily config update pushed to our flink streaming application, where the state is very large (but keyed). We end-up having a service to push that data into a separated kafka stream (which

Re: Displaying topic data with Flink streaming

2018-06-30 Thread Rong Rong
Hi Mich, How did you setup your local Kafka cluster, did you produce any message to it? Seems like you are using a standard local Kafka cluster setup for testing: "bootstrap.servers", "localhost:9092" "zookeeper.connect", "localhost:2181" so probably you need to manually produce some data, probab

Re: Passing type information to JDBCAppendTableSink

2018-07-01 Thread Rong Rong
Hi Chris, Looking at the code, seems like JDBCTypeUtil [1] is used for converting Flink TypeInformation into JDBC Type (Java.sql.type), and SQL_TIMESTAMP and SQL_TIME are both listed in the conversion mapping. However the JDBC types are different. Regarding the question whether your insert is cor

Re: The program didn't contain a Flink job

2018-07-02 Thread Rong Rong
Did you forget to call executionEnvironment.execute() after you define your Flink job? -- Rong On Mon, Jul 2, 2018 at 1:42 AM eSKa wrote: > Hello, We are currently running jobs on Flink 1.4.2. Our usecase is as > follows: > -service get request from customer > - we submit job to flink using Yar

Re: The program didn't contain a Flink job

2018-07-02 Thread Rong Rong
Hmm. That's strange. Can you explain a little more on how your YARN cluster is set up and how you configure the submission context? Also, did you try submitting the jobs in detach mode? Is this happening from time to time for one specific job graph? Or it is consistently throwing the exception fo

Re: Regarding external metastore like HIVE

2018-07-03 Thread Rong Rong
+1 on this feature, there have been a lot of pains for us trying to connect to external catalog / metastore as well. @shivam can you comment on the tickets regarding the specific use case and the type of external catalogs you are interested ? Thanks, Rong On Tue, Jul 3, 2018 at 3:16 AM Shivam Sh

Re: Passing type information to JDBCAppendTableSink

2018-07-05 Thread Rong Rong
;>>> - The types look OK. >>>> - You can also use Types.STRING, and Types.LONG instead of >>>> BasicTypeInfo.xxx >>>> - Beware that in the failure case, you might have multiple entries in >>>> the database table. Some databases sup

Re: Slide Window Compute Optimization

2018-07-05 Thread Rong Rong
Hi Yennie, AFAIK, the sliding window will in fact duplicate elements into multiple different streams. There's a discussion thread regarding this [1]. We are looking into some performance improvement, can you provide some more info regarding your use case? -- Rong [1] https://issues.apache.org/ji

Re: Description of Flink event time processing

2018-07-05 Thread Rong Rong
Hi Elias, Thanks for putting together the document. This is actually a very good, well-rounded document. I think you did not to enable access for comments for the link. Would you mind enabling comments for the google doc? Thanks, Rong On Thu, Jul 5, 2018 at 8:39 AM Fabian Hueske wrote: > Hi E

Re: Slide Window Compute Optimization

2018-07-06 Thread Rong Rong
ccurate the results? But Now it seems that the >> smaller step size of slide window,the longer time need to compute. Because >> once a element arrives, it will be processed in every window (number of >> windows = window size/step size)serially through one thread. >> >>

Re: Pass a method as parameter

2018-07-09 Thread Rong Rong
Hi Soheil, I think I knew what you meant by passing a function's name B to another method A: assuming in function "A" you are trying to dynamically load another function "B" based on either (1) some characteristic of the message in your data stream, or (2) some configuration during start up. And I

Re: Want to write Kafka Sink to SQL Client by Flink-1.5

2018-07-11 Thread Rong Rong
Hi Shivam, Thank you for interested in contributing to Kafka Sink for SQL client. Could you share your plan for implementation. I have some questions as there might have been some overlap with current implementations. On a higher level, 1. Are you using some type of metadata store to host topic sc

Re: Support for detached mode for Flink1.5 SQL Client

2018-07-11 Thread Rong Rong
Is the Gateway Mode [1] in the FLIP-24 SQL Client road map what you are looking for? -- Rong [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client On Tue, Jul 10, 2018 at 3:37 AM Shivam Sharma <28shivamsha...@gmail.com> wrote: > Hi All, > > Is there any way to run Flink1.5

Re: How to customize trigger for Count Time Window

2018-07-15 Thread Rong Rong
Hi Soheil, I don't think just overriding the window trigger function is sufficient, since your logic effectively changes the how elements are assigned to a window. Based on a quick scan I think your use case might be able to reuse the DynamicGapSessionWIndow [1], where you will have to create a cu

Re: Rest API calls

2018-08-01 Thread Rong Rong
Hi Yuvraj, Vino is right, having a customized function is probably the easiest at this moment. Alternatively, I think what you are looking for is very much similar to side-input feature of data stream[2]. Thanks, Rong [2] FLIP-17: https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+I

Re: How to submit flink job on yarn by java code

2018-08-15 Thread Rong Rong
I dont think your exception / code was attached. In general, this is largely depending on how your setup is. Are you trying to setup a long-running YARN session cluster or are you trying to directly use YARN cluster submit? [1]. We have an open-sourced project [2] with similar requirement submitti

Re: Error in KyroSerializer

2018-08-18 Thread Rong Rong
This sounds very much related to FLINK-10160 [1]. Would you mind upgrading your Flink version to 1.4.3 and try again? Thanks, Rong [1] https://issues.apache.org/jira/browse/FLINK-10160 On Fri, Aug 17, 2018 at 4:20 PM Pankaj Chaudhary wrote: > Hi, > > I am on Flink 1.4.2 and as part of my opera

Re: Error on SQL orderBy Error

2018-08-19 Thread Rong Rong
Hi Chris, This looks like a bug to me as val allOrders:Table = orderTable .select('id, 'order_date, 'amount, 'customer_id) .orderBy('id.asc) works perfectly fine. Could you file a bug report and kindly provide the Flink version you are using? I can take a look into it Thanks, Ron

Re: What's the advantage of using BroadcastState?

2018-08-19 Thread Rong Rong
Hi Paul, To add to Hequn's answer. Broadcast state can typically be used as "a low-throughput stream containing a set of rules which we want to evaluate against all elements coming from another stream" [1] So to add to the difference list is: whether it is "broadcast" across all keys if processing

Re: Error on SQL orderBy Error

2018-08-19 Thread Rong Rong
Filed https://issues.apache.org/jira/browse/FLINK-10172. Seems like it is affecting latest version. -- Rong On Sun, Aug 19, 2018 at 8:14 AM Rong Rong wrote: > Hi Chris, > > This looks like a bug to me as > val allOrders:Table = orderTable > .select('id,

Re: Kryo Serialization Issue

2018-08-26 Thread Rong Rong
This seems to be irrelevant to the issue for KyroSerializer in recent discussions [1]. which has been fixed in 1.4.3, 1.5.0 and 1.6.0. On a quick glance, this might have been a corrupted message in your decoding, for example a malformed JSON string. -- Rong [1] https://issues.apache.org/jira/brow

Re: AvroSchemaConverter and Tuple classes

2018-08-26 Thread Rong Rong
Yes you should be able to use Row instead of Tuple in your BatchTableSink. There's sections in Flink documentation regarding mapping of data types to table schemas [1]. and table can be converted into various typed DataStream [2] as well. Hope these are helpful. Thanks, Rong [1] https://ci.apache

Re: Cannot compile Wikipedia Edit Stream example

2018-09-02 Thread Rong Rong
Hi Can you elaborate more on how to reproduce the error? What's the maven archetype you use to generate the job, what's the flink version you used? what java version you used in Intellij? I am suspecting either there's a mixed up on Scala / Java scaffold. Since your Tuple should be org.apache.fli

Re: flink use hdfs DistributedCache

2018-09-02 Thread Rong Rong
I am not sure if this suits your use case, but Flink YARN cli does support transferring local resource to all YARN nodes. Simply use[1]: bin/flink run -m yarn-cluster -yt or bin/flink run -m yarn-cluster --yarnship should do the trick. It might have not been using the HDFS DistributedCache API t

Re: ListState - elements order

2018-09-13 Thread Rong Rong
I don't think ordering is guaranteed in the internal implementation, to the best of my knowledge. I agreed with Aljoscha, if there is no clear definition of ordering, it is assumed to be not preserved by default. -- Rong On Thu, Sep 13, 2018 at 7:30 PM vino yang wrote: > Hi Aljoscha, > > Regard

Re: Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-17 Thread Rong Rong
This is in fact a very strange behavior. To add to the discussion, when you mentioned: "raw Flink (windowed or not) nor when using Flink CEP", how were the comparisons being done? Also, were you able to get the results correct without the additional GROUP BY term of "foo" or "userId"? -- Rong On

Re: why same Sliding ProcessTime TimeWindow triggered twice

2018-09-17 Thread Rong Rong
I haven't dug too deep into the content. But seems like this line was the reason: .keyBy(s => s.endsWith("FRI")) essentially you are creating two key partitions (True, False) where each one of them has its own sliding window I believe. Can you printout the key space for each of th

Re: Question about Window Tigger

2018-09-19 Thread Rong Rong
Hi Chang, There were some previous discussion regarding how to debug watermark and window triggers[1]. Basically if there's no data for some partitions there's no way to advance watermark. As it would not be able to determine whether this is due to network failure or actually there's no data arriv

Re: Accessing Global State when processing KeyedStreams

2018-09-19 Thread Rong Rong
Hi Scott, Your use case seems to be a perfect fit for the Broadcast state pattern [1]. -- Rong [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/broadcast_state.html On Wed, Sep 19, 2018 at 7:11 AM Scott Sue wrote: > Hi, > > In our application, we receive Orde

Re: How to get the location of keytab when using flink on yarn

2018-09-24 Thread Rong Rong
Hi Just a quick thought on this: You might be able to use delegation token to access HBase[1]. It might be a more secure way instead of distributing your keytab over to all the YARN nodes. Hope this helps. -- Rong [1] https://wiki.apache.org/hadoop/Hbase/HBaseTokenAuthentication On Mon, Sep 24

Re: Does Flink SQL "in" operation has length limit?

2018-09-28 Thread Rong Rong
Hi Henry, Vino. I think IN operator was translated into either a RexSubQuery or a SqlStdOperatorTable.IN operator. I think Vino was referring to the first case. For the second case (I think that's what you are facing here), they are converted into tuples and the maximum we currently have in Flink

Re: Does Flink SQL "in" operation has length limit?

2018-09-28 Thread Rong Rong
Best, Hequn > > On Fri, Sep 28, 2018 at 9:27 PM Rong Rong wrote: > >> Hi Henry, Vino. >> >> I think IN operator was translated into either a RexSubQuery or a >> SqlStdOperatorTable.IN operator. >> I think Vino was referring to the first case. >>

Re: Does Flink SQL "in" operation has length limit?

2018-10-01 Thread Rong Rong
e corresponding code? I haven't looked into the code but we should >> definitely support this query. @Henry feel free to open an issue for it. >> >> Regards, >> Timo >> >> >> Am 28.09.18 um 19:14 schrieb Rong Rong: >> >> Yes. >> >&

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Rong Rong
Hi Xuefu, Thanks for putting together the overview. I would like to add some more on top of Timo's comments. 1,2. I agree with Timo that a proper catalog support should also address the metadata compatibility issues. I was actually wondering if you are referring to something like utilizing table s

Re: Reset kafka offets to latest on restart

2018-11-21 Thread Rong Rong
Hi Vishal, You can probably try using similar offset configuration as a service consumer. Maybe this will be useful to look at [1] Thanks, Rong [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration On Wed, Nov 21, 2018

Re: Questions about UDTF in flink SQL

2018-11-30 Thread Rong Rong
Hi Wangsan, If your require is essentially wha Jark describe, we already have a proposal following up [FLINK-9249] in its related/parent task: [FLINK-9484]. We are already implementing some of these internally and have one PR ready for review for FLINK-9294. Please kindly take a look and see if t

Re: Java Exapmle of Stochastic Outlier Selection

2019-01-12 Thread Rong Rong
Hi James, Usually Flink ML is highly integrated with Scala. I did poke around to and try to make the example work in Java and it does require a significant amount of effort, but you can try: First the implicit type parameters needs to be passed over to the execution environment to generate the Da

Re: Is the eval method invoked in a thread safe manner in Fink UDF

2019-01-13 Thread Rong Rong
According to the codegen result, I think each field is invoked sequentially. However, if you maintain internal state within your UDF, it is your responsibility to maintain the internal state consistency. Are you invoking external RPC in your "GetName" UDF method and that has to be async? -- Rong

Re: ElasticSearch RestClient throws NoSuchMethodError due to shade mechanism

2019-01-15 Thread Rong Rong
Hi Henry, I was not sure if this is the suggested way. but from what I understand of the pom file in elasticsearch5, you are allowed to change the sub version of the org.ealisticsearch.client via manually override using -Delasticsearch.version=5.x.x during maven build progress if you are using a

Re: TimeZone shift problem in Flink SQL

2019-01-25 Thread Rong Rong
Hi Henry, Unix epoch time values are always under GMT timezone, for example: - 1548162182001 <=> GMT: Tuesday, January 22, 2019 1:03:02.001 PM, or CST: Tuesday, January 22, 2019 9:03:02.001 PM. - 1548190982001 <=> GMT: Tuesday, January 22, 2019 9:03:02.001 PM, or CST: Wednesday, January 23, 2019 4

Re: Get nested Rows from Json string

2019-02-06 Thread Rong Rong
Hi François, I wasn't exactly sure this is a JSON object or JSON string you are trying to process. For a JSON string this [1] article might help. For a JSON object, I am assuming you are trying to convert it into a TableSource and processing using Table/SQL API, you could probably use the example

Re: Using custom evictor and trigger on Table API

2019-02-06 Thread Rong Rong
Hi Dongwon, There was a previous thread regarding this[1], unfortunately this is not supported yet. However there are some latest development proposal[2,3] to enhance the TableAPI which might be able to support your use case. -- Rong [1] http://apache-flink-user-mailing-list-archive.2336050.n4.

Re: Get nested Rows from Json string

2019-02-08 Thread Rong Rong
; I've changed Rows to Map, which ease the conversion process. > > Nevertheless I'm interested in any explanation about why row1.setField(i, > row2) appeends row2 at the end of row1. > > All the best > > François > > Le mer. 6 févr. 2019 à 19:33, Rong Rong a écrit :

Re: Can an Aggregate the key from a WindowedStream.aggregate()

2019-02-11 Thread Rong Rong
Hi Stephen, Chesney was right, you will have to use a more complex version of the window processing function. Perhaps your goal can be achieve by this specific function with incremental aggregation [1]. If not you can always use the regular process window function [2]. Both of these methods have a

Re: Is there a windowing strategy that allows a different offset per key?

2019-02-11 Thread Rong Rong
getKey(IN value)Hi Stephen, Yes, we had a discussion regarding for dynamic offsets and keys [1]. The main idea was the same: we don't have many complex operators after the window operator, thus a huge spike of traffic will occur after firing on the window boundary. In the discussion the best idea

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-13 Thread Rong Rong
Thanks Stephan for the great proposal. This would not only be beneficial for new users but also for contributors to keep track on all upcoming features. I think that better window operator support can also be separately group into its own category, as they affects both future DataStream API and b

Re: Impact of occasional big pauses in stream processing

2019-02-13 Thread Rong Rong
Hi Ajay, Flink handles "backpressure" in a graceful way so that it doesn't get affected when your processing pipeline is occasionally slowed down. I think the following articles will help [1,2]. In your specific case: the "KeyBy" operation will re-hash data so they can be reshuffled from all inpu

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Rong Rong
t;>> Because it is easier to update the roadmap on wiki compared to on flink web >>> site. And I guess we may need to update the roadmap very often at the >>> beginning as there's so many discussions and proposals in community >>> recently. We can move it into fli

Re: Impact of occasional big pauses in stream processing

2019-02-14 Thread Rong Rong
derstanding is that this should not impact other flink jobs. Is that > correct? > > > > Thanks. > > > > Ajay > > > > *From: *Andrey Zagrebin > *Date: *Thursday, February 14, 2019 at 5:09 AM > *To: *Rong Rong > *Cc: *"Aggarwal, Ajay" , "

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-21 Thread Rong Rong
Rong On Thu, Feb 21, 2019 at 8:50 AM Stephan Ewen wrote: > Hi Rong Rong! > > I would add the security / kerberos threads to the roadmap. They seem to > be advanced enough in the discussions so that there is clarity what will > come. > > For the window operator with slicing, I

Re: Metrics for number of "open windows"?

2019-02-21 Thread Rong Rong
Hi Andrew, I am assuming you are actually using customized windowAssigner, trigger and process function. I think the best way for you to keep in-flight, not-yet-triggered windows is to emit metrics in these 3 pieces. Upon looking at the window operator, I don't think there's a a metrics (guage) t

Re: SinkFunction.Context

2019-02-21 Thread Rong Rong
Hi Durga, 1. currentProcessingTime: refers to this operator(SinkFunction)'s system time at the moment of invoke 1a. the time you are referring to as "flink window got the message" is the currentProcessingTime() invoked at the window operator (which provided by the WindowContext similar to this one

Re: Flink window triggering and timing on connected streams

2019-02-26 Thread Rong Rong
Hi Andrew, To add to the answer Till and Hequn already provide. WindowOperator are operating on a per-key-group based. so as long as you only have one open session per partition key group, you should be able to manage the windowing using the watermark strategy Hequn mentioned. As Till mentioned, t

Re: [Flink-Question] In Flink parallel computing, how do different windows receive the data of their own partition, that is, how does Window determine which partition number the current Window belongs

2019-03-03 Thread Rong Rong
Hi I am not sure if I understand your question correctly, so will try to explain the flow how elements gets into window operators. Flink makes the partition assignment before invoking the operator to process element. For the word count example, WindowOperator is invoked by StreamInputProcessor[1]

Re: Schema Evolution on Dynamic Schema

2019-03-07 Thread Rong Rong
Hi Shahar, I wasn't sure which schema are you describing that is going to "evolve" (is it the registered_table? or the output sink?). It will be great if you can clarify more. For the example you provided, IMO it is more considered as logic change instead of schema evolution: - if you are changin

Re: Schema Evolution on Dynamic Schema

2019-03-08 Thread Rong Rong
y i can transform the Row object to my generated >class by maybe the Row's column names corresponding to the generated class >field names, though i don't see Row object has any notion of column names. > > Would love to hear your thoughts. If you want me to paste some code he

Re: Schema Evolution on Dynamic Schema

2019-03-09 Thread Rong Rong
Hi Shahar, >From my understanding, if you use "groupby" withAggregateFunctions, they save the accumulators to SQL internal states: which are invariant from your input schema. Based on what you described I think that's why it is fine for recovering from existing state. I think one confusion you mig

Re: Backoff strategies for async IO functions?

2019-03-13 Thread Rong Rong
Thanks for raising the concern @shuyi and the explanation @konstantin. Upon glancing on the Flink document, it seems like user have full control on the timeout behavior [1]. But unlike AsyncWaitOperator, it is not straightforward to access the internal state of the operator to, for example, put th

Re: Backoff strategies for async IO functions?

2019-03-19 Thread Rong Rong
; Cheers, > Till > > On Wed, Mar 13, 2019 at 5:42 PM Rong Rong wrote: > >> Thanks for raising the concern @shuyi and the explanation @konstantin. >> >> Upon glancing on the Flink document, it seems like user have full control >> on the timeout behavior [1]. But u

Re: Map UDF : The Nothing type cannot have a serializer

2019-03-21 Thread Rong Rong
Based on what I saw in the implementation, I think you meant to implement a ScalarFunction right? since you are only trying to structure a VarArg string into a Map. If my understanding was correct. I think the Map constructor[1] is something you might be able to leverage. It doesn't resolve your N

Re: Calcite SQL Map to Pojo Map

2019-03-28 Thread Rong Rong
If your conversion is done using a UDF you need to override the getResultType method [1] to explicitly specify the key and value type information. As generic erasure will not preseve the part of your code. Thanks, Rong [1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/udf

Re: Calcite SQL Map to Pojo Map

2019-03-29 Thread Rong Rong
I think the proper solution should not be Types.GENERIC(Map.class) as you will not be able to do any success processing with the return object. For example, Map['k', 'v'].get('k') will not work. I think there might be some problem like you suggested that they are handled as GenericType instead of

Re: Infinitely requesting for Yarn container in Flink 1.5

2019-03-29 Thread Rong Rong
Hi Qi, I think the problem may be related to another similar problem reported in a previous JIRA [1]. I think a PR is also in discussion. Thanks, Rong [1] https://issues.apache.org/jira/browse/FLINK-10868 On Fri, Mar 29, 2019 at 5:09 AM qi luo wrote: > Hello, > > Today we encountered an issue

Re: Source reinterpretAsKeyedStream

2019-03-29 Thread Rong Rong
Hi Adrienne, I think you should be able to reinterpretAsKeyedStream by passing in a DataStreamSource based on the ITCase example [1]. Can you share the full code/error logs or the IAE? -- Rong [1] https://github.com/apache/flink/blob/release-1.7.2/flink-streaming-java/src/test/java/org/apache/fl

Re: [ANNOUNCE] Apache Flink 1.8.0 released

2019-04-10 Thread Rong Rong
Congrats! Thanks Aljoscha for being the release manager and all for making the release possible. -- Rong On Wed, Apr 10, 2019 at 4:23 AM Stefan Richter wrote: > Congrats and thanks to Aljoscha for managing the release! > > Best, > Stefan > > > On 10. Apr 2019, at 13:01, Biao Liu wrote: > > >

Re: Flink JDBC: Disable auto-commit mode

2019-04-12 Thread Rong Rong
Hi Konstantinos, Seems like setting for auto commit is not directly possible in the current JDBCInputFormatBuilder. However there's a way to specify the fetch size [1] for your DB round-trip, doesn't that resolve your issue? Similarly in JDBCOutputFormat, a batching mode was also used to stash up

Re: Flink JDBC: Disable auto-commit mode

2019-04-15 Thread Rong Rong
> >> >> *From:* Papadopoulos, Konstantinos >> >> *Sent:* Δευτέρα, 15 Απριλίου 2019 12:30 μμ >> *To:* Fabian Hueske >> *Cc:* Rong Rong ; user >> *Subject:* RE: Flink JDBC: Disable auto-commit mode >> >> >> >> Hi Fabian, >>

  1   2   >