Re: Running flink examples

2019-09-18 Thread Vijay Bhaskar
Can you check whether it's able to read the supplied input file properly or not? Regards Bhaskar On Wed, Sep 18, 2019 at 1:07 PM RAMALINGESWARA RAO THOTTEMPUDI < tr...@iitkgp.ac.in> wrote: > Hi Sir, > > I am trying to run the Flink programs, particularly PageRank. > > I have used the following com

Re: Running flink examples

2019-09-18 Thread RAMALINGESWARA RAO THOTTEMPUDI
Hi Sir, I am trying to run the Flink programs, particularly PageRank. I have used the following command: ./bin/flink run -d ./examples/batch/PageRank.jar --input /path/to/input It is running, but it is showing rankings for only 15 elements of my data. But I need to find the ranking of all elements o

Re: serialization issue in streaming job run with scala Future

2019-09-18 Thread Rafi Aroch
Hi Debasish, Have you taken a look at the AsyncIO API for running async operations? I think this is the preferred way of doing it. [1] So it would look something like this: class AsyncDatabaseRequest extends AsyncFunction[String, (String, String)] { /** The database specific client that can
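For reference, a minimal self-contained Scala sketch of that Async I/O pattern (the DatabaseClient below is a hypothetical stand-in for a real async-capable client, and the timeout/capacity values are arbitrary):

    import java.util.concurrent.TimeUnit
    import scala.concurrent.{ExecutionContext, Future}
    import scala.util.{Failure, Success}
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.scala.async.{AsyncFunction, ResultFuture}

    // hypothetical stand-in for a real database client
    class DatabaseClient(host: String) {
      def query(key: String): String = s"value-for-$key"
    }

    class AsyncDatabaseRequest extends AsyncFunction[String, (String, String)] {
      // transient + lazy: created on the task manager, never serialized with the closure
      @transient private lazy val client = new DatabaseClient("db-host")
      @transient private implicit lazy val ec: ExecutionContext = ExecutionContext.global

      override def asyncInvoke(key: String, resultFuture: ResultFuture[(String, String)]): Unit =
        Future(client.query(key)).onComplete {
          case Success(value) => resultFuture.complete(Iterable((key, value)))
          case Failure(t)     => resultFuture.completeExceptionally(t)
        }
    }

    // wiring, given some stream: DataStream[String]
    // AsyncDataStream.unorderedWait(stream, new AsyncDatabaseRequest, 1000, TimeUnit.MILLISECONDS, 100)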

Re: Property based testing

2019-09-18 Thread Indraneel R
Oh great! Thanks, Aaron, that was quite clear. I will give it a try! On Wed, Sep 18, 2019 at 8:29 PM Aaron Levin wrote: > Hey, > > I've used ScalaCheck to test flink applications. Basic idea is: > > * use ScalaCheck to generate some kind of collection > * use `fromCollection` in `StreamExecutionE

Re: Kafka producer failed with InvalidTxnStateException when performing commit transaction

2019-09-18 Thread Tony Wei
Hi Becket, One more thing: I have tried to restart other brokers without the active controller, but this exception might happen as well. So it should be independent of the active controller, like you said. Best, Tony Wei Tony Wei wrote on Wed, Sep 18, 2019 at 6:14 PM: > Hi Becket, > > I have reproduced this pr

Re: Client for Monitoring API!

2019-09-18 Thread Biao Liu
Hi Anis, Have you tried a Flink metric reporter? It's a better way to handle metrics than going through the REST API. Flink supports reporting metrics to external systems; you can find the list of supported systems here [1]. 1. https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/

Re: Flink on yarn use jar on hdfs

2019-09-18 Thread Yang Wang
Hi Shengnan, Sorry for the late reply. I will attach a PR to FLINK-13938 this week. If we specify the shared lib (-ysl), none of the jars located in the lib directory of the Flink client will be uploaded. Instead, we will use the HDFS path to set the LocalResource of YARN. And the visibility of LocalResource
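If that lands as described, client-side usage might look roughly like this (a sketch only; the -ysl option comes from the proposal above and is not in any released Flink version, and the HDFS path is a placeholder):

    # hypothetical, per FLINK-13938: reuse a lib directory already uploaded to HDFS
    ./bin/flink run -m yarn-cluster -ysl hdfs:///flink/release-1.9/lib ./examples/batch/WordCount.jar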

Re: Batch mode with Flink 1.8 unstable?

2019-09-18 Thread Ken Krugler
Hi Till, I tried out 1.9.0 with my workflow, and I am no longer running into the errors I described below, which is great! Just to recap, this is batch, per-job mode on YARN/EMR. Though I did run into a new issue, related to my previous problem when reading files written via SerializedOutputFo

Re: serialization issue in streaming job run with scala Future

2019-09-18 Thread Debasish Ghosh
OK, the above problem was due to some serialization issues, which we fixed by marking some of the fields transient. That fixes the serialization issues. But now, when I try to execute in a Future, I hit upon this: *java.util.concurrent.ExecutionException: Boxed Error* at scala.concurrent.impl.Pr
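For readers hitting the same serialization issue: the transient fix usually looks like the sketch below, keeping any non-serializable member out of the serialized closure and (re)creating it in open() on the task manager (HttpClient here is a hypothetical non-serializable dependency). As for the Boxed Error itself, scala.concurrent wraps fatal Throwables thrown inside a Future that way, so the underlying cause is in getCause.

    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.configuration.Configuration

    // hypothetical non-serializable helper
    class HttpClient { def get(url: String): String = s"response-for-$url" }

    class EnrichFn extends RichMapFunction[String, String] {
      // @transient: the field is not shipped when the function is serialized;
      // open() rebuilds it on each parallel task instance
      @transient private var client: HttpClient = _

      override def open(parameters: Configuration): Unit = {
        client = new HttpClient
      }

      override def map(value: String): String = client.get(value)
    }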

changing flink/kafka configs for stateful flink streaming applications

2019-09-18 Thread Abrar Sheikh
Hey all, One of the known limitations of Spark stateful streaming applications is that we cannot alter Spark configurations or Kafka configurations after the first run of the application; this has been explained well in https://www.linkedin.com/pulse/upgrading-running-spark-streamin

Re: Extending Flink's SQL-Parser

2019-09-18 Thread Rong Rong
Hi Dominik, To add to Rui's answer: other examples of extending Calcite's DDL syntax are already in Calcite's server module [1] and in one of our open-sourced projects [2]; you might want to check them out. -- Rong [1] https://github.com/apache/calcite/blob/master/server/

Re: Property based testing

2019-09-18 Thread Aaron Levin
Hey, I've used ScalaCheck to test Flink applications. The basic idea is: * use ScalaCheck to generate some kind of collection * use `fromCollection` in `StreamExecutionEnvironment` to create a `DataStream` * use `DataStreamUtils.collect` as a sink * plug my Flink logic between the collection source a
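A minimal sketch of that recipe (the map(_.toUpperCase) stands in for whatever logic is under test; assumes flink-streaming-scala and scalacheck on the classpath):

    import scala.collection.JavaConverters._
    import org.apache.flink.streaming.api.datastream.{DataStreamUtils => JDataStreamUtils}
    import org.apache.flink.streaming.api.scala._
    import org.scalacheck.{Gen, Prop}

    object UppercaseProp {
      val prop: Prop = Prop.forAll(Gen.nonEmptyListOf(Gen.alphaStr)) { xs =>
        val env = StreamExecutionEnvironment.createLocalEnvironment(1)
        // source: the ScalaCheck-generated collection
        val out = env.fromCollection(xs).map(_.toUpperCase)
        // sink: collect triggers execution and hands the results back to the test
        val results = JDataStreamUtils.collect(out.javaStream).asScala.toList
        results == xs.map(_.toUpperCase)
      }
    }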

Re: Client for Monitoring API!

2019-09-18 Thread Felipe Gutierrez
Yes, you can use Prometheus + Grafana. https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter https://felipeogutierrez.blogspot.com/2019/04/monitoring-apache-flink-with-prometheus.html Felipe On 2019/09/18 11:36:3
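For completeness, wiring up the Prometheus reporter is a small flink-conf.yaml change (a sketch for Flink 1.9; the port range below is an arbitrary example, and the flink-metrics-prometheus jar must first be copied from opt/ into lib/):

    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: 9250-9260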

Re: Time Window Flink SQL join

2019-09-18 Thread Fabian Hueske
Hi Nishant, You should model the query as a join with a time-versioned table [1]. The bad-ips table would be the time-versioned table [2]. Since it is a time-versioned table, it could even be updated with new IPs. This type of join will only keep the time-versioned table (the bad-ips in stat
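A rough Scala sketch of what that looks like with a temporal table function (assumes tEnv is a StreamTableEnvironment, badips is a Table carrying a hypothetical update_time time attribute, and sourceKafka/sourceKafkaMalicious are registered; all names mirror the thread but are illustrative):

    import org.apache.flink.table.api.scala._

    // version the bad-ips table by its update time, keyed on ip
    val badIpsFn = badips.createTemporalTableFunction('update_time, 'ip)
    tEnv.registerFunction("BadIps", badIpsFn)

    // each record joins against the bad-ips version valid at its own event time
    tEnv.sqlUpdate(
      """
        |INSERT INTO sourceKafkaMalicious
        |SELECT sourceKafka.*
        |FROM sourceKafka,
        |     LATERAL TABLE (BadIps(sourceKafka.timestamp_received))
        |WHERE sourceKafka.`source.ip` = ip
      """.stripMargin)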

Property based testing

2019-09-18 Thread Indraneel R
Hi All, Is there any property-based testing framework for Flink, like 'SparkTestingBase' for Spark? It would also be useful to know the standard practices for data testing of Flink pipelines. regards -Indraneel

Re: Time Window Flink SQL join

2019-09-18 Thread Nishant Gupta
Hi Fabian, Thanks for your reply. I have a continuous Kafka stream coming in and a static table of bad IPs, and I wanted to segregate records having a bad IP, so I was joining them. But with that, 60 GB of memory was getting exhausted, so I used the query below. Can you please guide me in this regard? On Wed, 18 S

Re: Time Window Flink SQL join

2019-09-18 Thread Fabian Hueske
Hi, The query that you wrote is not a time-windowed join. INSERT INTO sourceKafkaMalicious SELECT sourceKafka.* FROM sourceKafka INNER JOIN badips ON sourceKafka.`source.ip`=badips.ip WHERE sourceKafka.timestamp_received BETWEEN CURRENT_TIMESTAMP - INTERVAL '15' MINUTE AND CURRENT_TIMESTAMP; The
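For contrast, a genuine time-windowed join bounds the two streams' time attributes against each other instead of against the wall clock; with these tables it would look roughly like the sketch below (only meaningful if badips is itself a stream with a time attribute, here a hypothetical update_time):

    tEnv.sqlUpdate(
      """
        |INSERT INTO sourceKafkaMalicious
        |SELECT sourceKafka.*
        |FROM sourceKafka
        |INNER JOIN badips ON sourceKafka.`source.ip` = badips.ip
        |WHERE sourceKafka.timestamp_received BETWEEN
        |      badips.update_time - INTERVAL '15' MINUTE AND badips.update_time
      """.stripMargin)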

Time Window Flink SQL join

2019-09-18 Thread Nishant Gupta
Hi Team, I am running a query for a time-windowed join, as below: INSERT INTO sourceKafkaMalicious SELECT sourceKafka.* FROM sourceKafka INNER JOIN badips ON sourceKafka.`source.ip`=badips.ip WHERE sourceKafka.timestamp_received BETWEEN CURRENT_TIMESTAMP - INTERVAL '15' MINUTE AND CURRENT_TIMESTAMP; T

Client for Monitoring API!

2019-09-18 Thread Anis Nasir
Hey all, Is there any client library that we can use to fetch/store the metrics exposed through the Flink monitoring REST API? Regards, Anis

Re: Kafka producer failed with InvalidTxnStateException when performing commit transaction

2019-09-18 Thread Tony Wei
Hi Becket, I have reproduced this problem in our development environment. Below are the log messages at debug level. It seems that the exception was from broker-3, and I also found another error code on broker-2 during that time. There are other INVALID_TXN_STATE errors for other transaction ids. I just

Re: Difference between data stream window function and cep within

2019-09-18 Thread Joshua Fan
Hi Dian, Thank you for your explanation. After having a look at the source code, the CEP within just enforces a time interval on each state. Thank you. Yours sincerely Joshua On Wed, Sep 18, 2019 at 9:41 AM Dian Fu wrote: > Hi Joshua, > > There is no tumbling/sliding window underlyin
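For anyone following along, that interval is attached to the pattern itself rather than to a window, e.g. (a minimal sketch with a hypothetical Event type):

    import org.apache.flink.cep.scala.CEP
    import org.apache.flink.cep.scala.pattern.Pattern
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    case class Event(id: String, kind: String)

    // 'within' bounds the time from the first to the last event of each partial match,
    // checked per match attempt rather than per tumbling/sliding window
    val pattern = Pattern.begin[Event]("start").where(_.kind == "start")
      .followedBy("end").where(_.kind == "end")
      .within(Time.minutes(10))

    // val matches = CEP.pattern(events, pattern)  // events: DataStream[Event]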

Running flink examples

2019-09-18 Thread RAMALINGESWARA RAO THOTTEMPUDI
Hi Sir, I am trying to run the Flink programs, particularly PageRank. I have used the following command: ./bin/flink run -d ./examples/batch/PageRank.jar --input /path/to/input It is running, but it is showing rankings for only 15 elements of my data. But I need to find the ranking of all elements
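If memory serves, the bundled batch PageRank example reads its input via --pages and --links rather than --input, and silently falls back to a built-in sample dataset of 15 pages when those are missing, which would explain the 15 rankings. Something like the following should pick up your own data (paths and counts below are placeholders):

    ./bin/flink run -d ./examples/batch/PageRank.jar \
      --pages /path/to/pages --links /path/to/links \
      --output /path/to/result --numPages 1000 --iterations 10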