Re: Create Tuple Array dynamically fro a DS

2019-07-23 Thread Caizhi Weng
Hi Andres, In that case you should use `flatMap` method instead of `map` method. `flatMap` method allows you to return multiple elements and collect them all into one DS. This applies even if you have multiple contents in your DS. public static void main(String[] args) throws Exception {

Re: Memory constrains running Flink on Kubernetes

2019-07-23 Thread Xintong Song
Hi, Flink acquires these 'Status_JVM_Memory' metrics through the MXBean library. According to MXBean document, non-heap is "the Java virtual machine manages memory other than the heap (referred as non-heap memory)". Not sure whether that is equivalent to the metaspace. If the

Re: Create Tuple Array dynamically fro a DS

2019-07-23 Thread Andres Angel
Hello Weng, This definitely helps a lot, however I know my initial DS has a single row content then I would in theory just create a DS which is what I need. That is why I need to know how to create a new environment DS within a map function. thanks so much On Tue, Jul 23, 2019 at 11:41 PM

Re: Create Tuple Array dynamically fro a DS

2019-07-23 Thread Caizhi Weng
Hi Andres, Thanks for the detailed explanation. but apparently I can't create a new DS within a map function If you create a new DS within the map function, then you'll create as many DSs as the number of elements in the old DS which... doesn't seem to be your desired situation? I suppose you

subscribe

2019-07-23 Thread maybe love

Re: Create Tuple Array dynamically fro a DS

2019-07-23 Thread Andres Angel
Hello, Let me list properly the questions I have: * How to catch into a string the content of a DataStream? about this point basically I have a DS , the only way how I can use the content is within a map function , print , store the content somewhere or SQL queries. The point is that I need the

Re: Create Tuple Array dynamically fro a DS

2019-07-23 Thread Caizhi Weng
Hi Andres, Sorry I can't quite get your question... Do you mean that how to spilt the string into fields? There is a `split` method in java. You can give it a regexp and it will return an array containing all the split fields. Andres Angel 于2019年7月24日周三 上午10:28写道: > Hello Weng, > > thanks for

Re: Create Tuple Array dynamically fro a DS

2019-07-23 Thread Andres Angel
Hello Weng, thanks for your reply, however I'm struggling to somehow read the content of my DS with the payload that defines how many fields the message contains into a String. That is the reason why I thought into a map function for that DS. The Tuple part can change overtime can even pass from

Re: Create Tuple Array dynamically fro a DS

2019-07-23 Thread Caizhi Weng
Hi Andres, Are the payloads strings? If yes, one method is that you can store them as strings and process it further with user defined functions when you need to use them. Another method is that you can store them into arrays. Also, if the type of the first 3 fields are the same for the first

Re: add laplace to k means

2019-07-23 Thread Yun Gao
Hi alaa, In the KMeans example, in each iteration the new centers is computed in a map-reduce pattern. Each task maintains a part of points and it first choose the new center for each point, and then the new center of the sum(point) and num(point) is computed in the CentroidAccumulator,

Re: GroupBy result delay

2019-07-23 Thread Hequn Cheng
Hi Fanbin, Fabian is right, it should be a watermark problem. Probably, some tasks of the source don't have enough data to advance the watermark. Furthermore, you could also monitor event time through Flink web interface. I have answered a similar question on stackoverflow, see more details

Create within a map function of a DS a new register DS

2019-07-23 Thread Andres Angel
Hello everyone, I need to read an element from my DS and according to the content create on the flight a new DS and register it as new EnvironmentTable. I'm using the map function for my input DS, however when I try to use the variable env(environment, in my case StreamExecutionEnvironment ) I

Re: Transform from Table to DS

2019-07-23 Thread Andres Angel
This has been fixed now, something weird is that according to the documentation , I might include around 4 maven packages to properly work along with the TABLE/SQL API https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/ . However , I solved my issue working without :

Create Tuple Array dynamically fro a DS

2019-07-23 Thread Andres Angel
Hello everyone, I need to create dynamically the size of my Tuple that feeds a DS, let me explain it better. Let's assume the first payload I read has this format "filed1,field2,field3", then this might require a Tuple3<> but my payload later can be "field1,field2,field3,field4" then my Tuple

Re: Transform from Table to DS

2019-07-23 Thread Caizhi Weng
Hi Andres, Can you print your entire code (including the import section) in this post? It might be that this Exception has something to do with your import. If you are coding in a Java environment then you should import StreamTableEnvironment.java not StreamTableEnvironment.scala. Andres Angel

Re: Flink 1.8 run参数不一样

2019-07-23 Thread Zili Chen
你好,可以查看下 log/ 目录下的相关日志有没有这样一段 2019-07-24 09:34:36,507 WARN org.apache.flink.client.cli.CliFrontend - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli. java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException at

Flink 1.8 run参数不一样

2019-07-23 Thread 王佩
之前下载的Flink 1.8,运行bin/flink run --help,会有 yarn-cluster 的一些参数,如下: Options for yarn-cluster mode: -d,--detachedIf present, runs the job in detached mode -m,--jobmanager Address of the JobManager (master) to

Re: GroupBy result delay

2019-07-23 Thread Fanbin Bu
If I use proctime, the groupBy happens without any delay. On Tue, Jul 23, 2019 at 10:16 AM Fanbin Bu wrote: > not sure whether this is related: > > public SingleOutputStreamOperator assignTimestampsAndWatermarks( > AssignerWithPeriodicWatermarks timestampAndWatermarkAssigner) { > >//

Re: GroupBy result delay

2019-07-23 Thread Fanbin Bu
not sure whether this is related: public SingleOutputStreamOperator assignTimestampsAndWatermarks( AssignerWithPeriodicWatermarks timestampAndWatermarkAssigner) { // match parallelism to input, otherwise dop=1 sources could lead to some strange // behaviour: the watermark will creep

Re: GroupBy result delay

2019-07-23 Thread Fanbin Bu
Thanks Fabian for the prompt reply. I just started using Flink and this is a great community. The watermark setting is only accounting for 10 sec delay. Besides that, the local job on IntelliJ is running fine without issues. Here is the code: class EventTimestampExtractor(slack: Long = 0L)

Transform from Table to DS

2019-07-23 Thread Andres Angel
Hello guys I'm working on Java environment and I have a sample code as: Table schemafit = tenv.sqlQuery("Here is my query"); I need to turn this into a DS to print and any other transformation then I doing a sort of: DataStream resultSet = tenv.toAppendStream(schemafit, Row.class);

Re: Checkpoints timing out for no apparent reason

2019-07-23 Thread spoganshev
Looks like this is the issue: https://issues.apache.org/jira/browse/FLINK-11164 We'll try switching to 1.8 and see if it helps. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Fabian Hueske
Sure: /--> AsyncIO --\ STREAM --> ProcessFunc -- -- Union -- WindowFunc \--/ ProcessFunc keeps track of the unique keys per window duration and emits

Re: Timestamp(timezone) conversion bug in non blink Table/SQL runtime

2019-07-23 Thread Rong Rong
Hi Shuyi, I think there were some discussions in the mailing list [1,2] and JIRA tickets [3,4] that might be related. Since the table-blink planner doesn't produce such error, I think this problem is valid and should be fixed. Thanks, Rong [1]

Re: Re: MiniClusterResource class not found using AbstractTestBase

2019-07-23 Thread Juan Rodríguez Hortalá
Using that classifier worked, the code builds fine now, thanks a lot. I'm using 1.8.0 by the way Greetings, Juan On Tue, Jul 23, 2019 at 5:06 AM Haibo Sun wrote: > Hi, Juan > > It is dependent on "flink-runtime-*-tests.jar", so build.sbt should be > modified as follows: > > *scalaVersion :=

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-23 Thread Rong Rong
Hi Dongwon, Sorry for the late reply. I did try some experiment and seems like you are right: Setting the `.return()` type actually alter the underlying type of the DataStream from a GenericType into a specific RowTypeInfo. Please see the JIRA ticket [1] for more info. Regarding the approach,

Re: Execution environments for testing: local vs collection vs mini cluster

2019-07-23 Thread Juan Rodríguez Hortalá
Hi Bao, Thanks for your answer. 1. Integration tests for my project. 2. Both data stream and data sets On Mon, Jul 22, 2019 at 11:44 PM Biao Liu wrote: > Hi Juan, > > I'm not sure what you really want. Before giving some suggestions, could > you answer the questions below first? > > 1. Do

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Flavio Pompermaier
For each key I need to call an external REST service to get the current status and this is why I'd like to use Async IO. At the moment I do this in a process function but I'd like a cleaner solution (if possible). Do you think your proposal of forking could be a better option? Could you provide a

Re: CEP Pattern limit

2019-07-23 Thread Fabian Hueske
Hi Pedro, each pattern gets translated into one or more Flink operators. Hence, your Flink program becomes *very* large and requires much more time to be deployed. Hence, the timeout. I'd try to limit the size your job by grouping your patterns and creating an own job for each group. You can

Re: Flink and CDC

2019-07-23 Thread Flavio Pompermaier
Indeed Kafka connect is perfect but I think Flink could easily do the same without much work..this is what I'm asking for..if anybody has never thought about it

Re: 1.9 Release Timeline

2019-07-23 Thread Oytun Tez
Thank you for responding! I'll subscribe to dev@ --- Oytun Tez *M O T A W O R D* The World's Fastest Human Translation Platform. oy...@motaword.com — www.motaword.com On Tue, Jul 23, 2019 at 10:25 AM Timo Walther wrote: > Hi Oytun, > > the community is working hard to release 1.9. You can

Re: 1.9 Release Timeline

2019-07-23 Thread Timo Walther
Hi Oytun, the community is working hard to release 1.9. You can see the progress here [1] and on the dev@ mailing list. [1] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=328=detail Regards, Timo Am 23.07.19 um 15:52 schrieb Oytun Tez: Ping, any estimates? --- Oytun Tez

Re: Checkpoints timing out for no apparent reason

2019-07-23 Thread spoganshev
I've looked into this problem a little bit more. And it looks like the problem is caused by some problem with Kinesis sink. There is an exception in the logs at the moment in time when the job gets restored after being stalled for about 15 minutes: Encountered an unexpected expired iterator

Re: 1.9 Release Timeline

2019-07-23 Thread Oytun Tez
Ping, any estimates? --- Oytun Tez *M O T A W O R D* The World's Fastest Human Translation Platform. oy...@motaword.com — www.motaword.com On Thu, Jul 18, 2019 at 11:07 AM Oytun Tez wrote: > Hi team, > > 1.9 is bringing very exciting updates, State Processor API and MapState > migrations

Re: Extending REST API with new endpoints

2019-07-23 Thread Oytun Tez
Ping, any ideas? --- Oytun Tez *M O T A W O R D* The World's Fastest Human Translation Platform. oy...@motaword.com — www.motaword.com On Mon, Jul 22, 2019 at 9:39 AM Oytun Tez wrote: > I did take a look at it, but things got out of hand very quickly from > there on :D > > I see that

Re: [DISCUSS] Create a Flink ecosystem website

2019-07-23 Thread Oytun Tez
I agree with Robert – localization (this is what we do at MotaWord) is a maintenance work. If not maintained as well as mainstream, it will only damage and distant devs that use those local websites. Re: comments, I don't think people will really discuss furiously. But we at least need a system

Re: Flink SinkFunction for WebSockets

2019-07-23 Thread Oytun Tez
Hi Tim, I think this might be a useful sink for small interactions with outside. Are you planning to open source this? If yes, can you try to make it agnostic so that people can plug in their own WebSocket protocol – Stomp etc? :) We can publish this in the upcoming community website as an

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Fabian Hueske
OK, I see. What information will be send out via the async request? Maybe you can fork of a separate stream with the info that needs to be send to the external service and later union the result with the main stream before the window operator? Am Di., 23. Juli 2019 um 14:12 Uhr schrieb Flavio

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-23 Thread Dongwon Kim
Hi Fabian, Thanks for clarification :-) I could convert back and forth without worrying about it as I keep using Row type during the conversion (even though fields are added). Best, Dongwon On Tue, Jul 23, 2019 at 8:15 PM Fabian Hueske wrote: > Hi Dongwon, > > regarding the question about

Re: Flink on Mesos

2019-07-23 Thread Till Rohrmann
I'll take a look. Cheers, Till On Tue, Jul 23, 2019 at 3:07 PM Oleksandr Nitavskyi wrote: > Hey guys. > > > > We have also made implementation in Flink on Mesos component in order to > support network bandwidth configuration. > > > > Will somebody be able to have a look on our PR: >

Re: Flink Zookeeper HA: FileNotFoundException blob - Jobmanager not starting up

2019-07-23 Thread Till Rohrmann
Hi Richard, it looks as if the zNode of a completed job has not been properly removed. Without the logs of the respective JobMaster, it is hard to debug any further. However, I suspect that this is an instance of FLINK-11665. I am currently working on a fix for it. [1]

Re: Flink Zookeeper HA: FileNotFoundException blob - Jobmanager not starting up

2019-07-23 Thread Fabian Hueske
Good to know that you were able to fix the issue! I definitely agree that it would be good to know why this situation occurred. Am Di., 23. Juli 2019 um 14:38 Uhr schrieb Richard Deurwaarder < rich...@xeli.eu>: > Hi Fabian, > > I followed the advice of another flink user who mailed me directly,

Re: Flink on Mesos

2019-07-23 Thread Oleksandr Nitavskyi
Hey guys. We have also made implementation in Flink on Mesos component in order to support network bandwidth configuration. Will somebody be able to have a look on our PR: https://github.com/apache/flink/pull/8652 There are for sure some details to clarify. Cheers Oleksandr From: Till

add laplace to k means

2019-07-23 Thread alaa
Hallo I have used this k means code on Flink https://github.com/apache/flink/blob/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/clustering/KMeans.java and I would to add noise that follows Laplace distribution to the sum of data item and to the number

Memory constrains running Flink on Kubernetes

2019-07-23 Thread wvl
Hi, We're running a relatively simply Flink application that uses a bunch of state in RocksDB on Kubernetes. During the course of development and going to production, we found that we were often running into memory issues made apparent by Kubernetes OOMKilled and Java OOM log events. In order to

Re: Flink Zookeeper HA: FileNotFoundException blob - Jobmanager not starting up

2019-07-23 Thread Richard Deurwaarder
Hi Fabian, I followed the advice of another flink user who mailed me directly, he has the same problem and told me to use something like: rmr zgrep /flink/hunch/jobgraphs/1dccee15d84e1d2cededf89758ac2482 which allowed us to start the job again. It might be nice to investigate what went wrong as

Re: Flink and CDC

2019-07-23 Thread Flavio Pompermaier
Anyone else having experience on this topic that could provide additional feedback here? On Thu, Jul 18, 2019 at 1:18 PM Flavio Pompermaier wrote: > I think that using Kafka to get CDC events is fine. The problem, in my > case, is really about how to proceed: > 1) do I need to create Flink

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Flavio Pompermaier
The problem of bundling all records together within a window is that this solution doesn't scale (in the case of large time windows and number of events)..my requirement could be fulfilled by a keyed ProcessFunction but I think AsyncDataStream should provide a first-class support to keyed streams

Re:Re: MiniClusterResource class not found using AbstractTestBase

2019-07-23 Thread Haibo Sun
Hi, Juan It is dependent on "flink-runtime-*-tests.jar", so build.sbt should be modified as follows: scalaVersion := "2.11.0" val flinkVersion = "1.8.1" libraryDependencies ++= Seq( "org.apache.flink" %% "flink-test-utils" % flinkVersion % Test, "org.apache.flink" %%

Re: Use batch and stream environment in a single pipeline

2019-07-23 Thread Fabian Hueske
Hi, Right now it is not possible to mix batch and streaming environments in a job. You would need to implement the batch logic via the streaming API which is not always straightforward. However, the Flink community is spending a lot of effort on unifying batch and stream processing. So this will

Re: [Table API] ClassCastException when converting a table to DataStream

2019-07-23 Thread Fabian Hueske
Hi Dongwon, regarding the question about the conversion: If you keep using the Row type and not adding/removing fields, the conversion is pretty much for free right now. It will be a MapFunction (sometimes even not function at all) that should be chained with the other operators. Hence, it should

Re: Flink SinkFunction for WebSockets

2019-07-23 Thread Fabian Hueske
Hi Tim, One thing that might be interesting is that Flink might emit results more than once when a job recovers from a failure. It is up to the receiver to deal with that. Depending on the type of results this might be easy (idempotent updates) or impossible. Best, Fabian Am Fr., 19. Juli

Re: From Kafka Stream to Flink

2019-07-23 Thread Maatary Okouya
I would like to have a KTable, or maybe in Flink term a dynamic Table, that only contains the latest value for each keyed record. This would allow me to perform aggregation and join, based on the latest state of every record, as opposed to every record over time, or a period of time. On Sun, Jul

Re: AsyncDataStream on key of KeyedStream

2019-07-23 Thread Fabian Hueske
Hi Flavio, Not sure I understood the requirements correctly. Couldn't you just collect and bundle all records with a regular window operator and forward one record for each key-window to an AsyncIO operator? Best, Fabian Am Do., 18. Juli 2019 um 12:20 Uhr schrieb Flavio Pompermaier <

Re: [SURVEY] How many people implement Flink job based on the interface Program?

2019-07-23 Thread Flavio Pompermaier
I agree but you have to know in which jar a job is contained..when you upload the jar on our application you immediately know the qualified name of the job class and in which jar it belongs to. I think that when you upload a jar on Flink, Flink should list all available jobs inside it (IMHO)..it

Re: Flink Zookeeper HA: FileNotFoundException blob - Jobmanager not starting up

2019-07-23 Thread Fabian Hueske
Hi Richard, I hope you could resolve the problem in the meantime. Nonetheless, maybe Till (in CC) has an idea what could have gone wrong. Best, Fabian Am Mi., 17. Juli 2019 um 19:50 Uhr schrieb Richard Deurwaarder < rich...@xeli.eu>: > Hello, > > I've got a problem with our flink cluster

Re: Does Flink support raw generic types in a merged stream?

2019-07-23 Thread Fabian Hueske
Hi John, You could implement your own n-ary Either type. It's a bit of work because you'd need also a custom TypeInfo & Serializer but rather straightforward if you follow the implementation of Either. Best, Fabian Am Mi., 17. Juli 2019 um 16:28 Uhr schrieb John Tipper <

Re: Union of streams performance issue (10x)

2019-07-23 Thread Fabian Hueske
Hi Peter, The performance drops probably be due to de/serialization. When tasks are chained, records are simply forwarded as Java objects via method calls. When a task chain in broken into multiple operators, the records (Java objects) are serialized by the sending task, possibly shipped over the

Re: [SURVEY] How many people implement Flink job based on the interface Program?

2019-07-23 Thread Jeff Zhang
IIUC the list of jobs contained in jar means the jobs you defined in the pipeline. Then I don't think it is flink's responsibility to maintain the job list info, it is the job scheduler that define the pipeline. So the job scheduler should maintain the job list. Flavio Pompermaier

Re: timeout exception when consuming from kafka

2019-07-23 Thread Fabian Hueske
Hi Yitzchak, Thanks for reaching out. I'm not an expert on the Kafka consumer, but I think the number of partitions and the number of source tasks might be interesting to know. Maybe Gordon (in CC) has an idea of what's going wrong here. Best, Fabian Am Di., 23. Juli 2019 um 08:50 Uhr schrieb

Re: MiniClusterResource class not found using AbstractTestBase

2019-07-23 Thread Fabian Hueske
Hi Juan, Which Flink version do you use? Best, Fabian Am Di., 23. Juli 2019 um 06:49 Uhr schrieb Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.com>: > Hi, > > I'm trying to use AbstractTestBase in a test in order to use the mini > cluster. I'm using specs2 with Scala, so I cannot

Re: GroupBy result delay

2019-07-23 Thread Fabian Hueske
Hi Fanbin, The delay is most likely caused by the watermark delay. A window is computed when the watermark passes the end of the window. If you configured the watermark to be 10 minutes before the current max timestamp (probably to account for out of order data), then the window will be computed

Re: [SURVEY] How many people implement Flink job based on the interface Program?

2019-07-23 Thread Flavio Pompermaier
The jobs are somehow related to each other in the sense that we have a configurable pipeline where there are optional steps you can enable/disable (and thus we create a single big jar). Because of this, we have our application REST service that actually works also as a job scheduler and use the

Re: [DISCUSS] Create a Flink ecosystem website

2019-07-23 Thread Robert Metzger
Thanks a lot Marta for offering to write a blog post about the community site! I'm not sure if multi-language support for the site is a good idea. I see the packages site as something similar to GitHub or Jira. The page itself contains very view things we could actually translate. The package

Re: apache flink: Why checkpoint coordinator takes long time to get completion

2019-07-23 Thread Xiangyu Su
Hi Zili, here is the release notes for 1.8.1 https://flink.apache.org/news/2019/07/02/release-1.8.1.html But I could not find any ticket related to the "unexpected time-consuming", I have just tested our application with both versions, this issue is be able to reproduce every time with version

Re: [SURVEY] How many people implement Flink job based on the interface Program?

2019-07-23 Thread Jeff Zhang
Thanks Flavio, I get most of your points except one - Get the list of jobs contained in jar (ideally this is is true for every engine beyond Spark or Flink) Just curious to know how you submit job via rest api, if there're multiple jobs in one jar, then do you need to submit jar one time

Re: FsStateBackend,hdfs rpc api too much,FileCreated and FileDeleted is for what?

2019-07-23 Thread Yun Tang
Hi Andrew FilesCreated = CreateFileOps + FsDirMkdirOp Please refer to [1] and [2] to know the meaning of this metrics. [1]

Re: [SURVEY] How many people implement Flink job based on the interface Program?

2019-07-23 Thread Flavio Pompermaier
Hi Jeff, the thing about the manifest is really about to have a way to list multiple main classes in the jart (without the need to inspect every Java class or forcing a 1-to-1 between jar and job like it is now). My requirements were driven by the UI we're using in our framework: - Get the

Re: timeout exception when consuming from kafka

2019-07-23 Thread Yitzchak Lieberman
Hi. Another question - what will happen during a triggered checkpoint if one of the kafka brokers is unavailable? Will appreciate your insights. Thanks. On Mon, Jul 22, 2019 at 12:42 PM Yitzchak Lieberman < yitzch...@sentinelone.com> wrote: > Hi. > > I'm running a Flink application (version

Re: apache flink: Why checkpoint coordinator takes long time to get completion

2019-07-23 Thread Zili Chen
Hi Xiangyu, Could you share the corresponding JIRA that fixed this issue? Best, tison. Xiangyu Su 于2019年7月19日周五 下午8:47写道: > btw. it seems like this issue has been fixed in 1.8.1 > > On Fri, 19 Jul 2019 at 12:21, Xiangyu Su wrote: > >> Ok, thanks. >> >> and this time-consuming until now

Re: Execution environments for testing: local vs collection vs mini cluster

2019-07-23 Thread Biao Liu
Hi Juan, I'm not sure what you really want. Before giving some suggestions, could you answer the questions below first? 1. Do you want to write a unit test (or integration test) case for your project or for Flink? Or just want to run your job locally? 2. Which mode do you want to test?