Strata San Jose

2018-02-08 Thread ashish pok
Wondering if any of the core Flink team members are planning to be at the conference? It would be great to meet in person. Thanks, -- Ashish

RE: Kafka as source for batch job

2018-02-08 Thread Marchant, Hayden
Forgot to mention that my target Kafka version is 0.11.x, with the aim to upgrade to 1.0 when the 1.0.x fixpack is released. From: Tzu-Li (Gordon) Tai [mailto:tzuli...@apache.org] Sent: Thursday, February 08, 2018 8:05 PM To: user@flink.apache.org; Marchant, Hayden [ICG-IT]

Measure the memory consumption of a job at runtime

2018-02-08 Thread Wang, Chao
Hi, I would like to measure the memory consumption of a job at runtime. I came across some discussion (here: https://stackoverflow.com/questions/35315522/flink-memory-usage ), and it seems that this was not possible two years ago. Is it possible with the current version, and if so, how can it be done?
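
Flink does not attribute memory to individual jobs out of the box, which is part of why the linked discussion is pessimistic. One crude option is to expose the heap usage of the TaskManager JVM a task runs in through a user-defined gauge metric. A minimal Scala sketch, assuming a pass-through operator inserted into the pipeline (the class and metric names are illustrative, not part of any Flink API):

    import java.lang.management.ManagementFactory

    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.configuration.Configuration
    import org.apache.flink.metrics.Gauge

    // Pass-through operator that registers a gauge reporting the heap usage of
    // the TaskManager JVM it runs in. The value is per-JVM, not per-job, and can
    // be read via the web UI, the REST API, or any configured metrics reporter.
    class HeapUsageReporter[T] extends RichMapFunction[T, T] {

      override def open(parameters: Configuration): Unit = {
        getRuntimeContext.getMetricGroup.gauge[Long, Gauge[Long]](
          "heapUsedBytes", // illustrative metric name
          new Gauge[Long] {
            override def getValue: Long =
              ManagementFactory.getMemoryMXBean.getHeapMemoryUsage.getUsed
          })
      }

      override def map(value: T): T = value
    }

Attaching it anywhere in the job, e.g. stream.map(new HeapUsageReporter[MyType]), makes the gauge appear under that operator's metric group.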

RE: Kafka as source for batch job

2018-02-08 Thread Marchant, Hayden
Hi Gordon, Actually our use case is that we have a start/end timestamp, and we plan on calling KafkaConsumer.offsetsForTimes to get the offsets for each partition. So, I guess our logic is different in that we have an 'and' predicate between each partition arriving at its offset, as opposed to the
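
For reference, a rough Scala sketch of resolving per-partition start and end offsets from timestamps with the plain Kafka 0.11 client; the topic name, timestamps and connection settings are placeholders, and offsetsForTimes may return null entries for partitions with no matching offset:

    import java.util.Properties

    import scala.collection.JavaConverters._

    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition

    object OffsetRangeLookup {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("group.id", "offset-range-lookup")
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")

        val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
        try {
          val partitions = consumer.partitionsFor("my-topic").asScala
            .map(p => new TopicPartition(p.topic(), p.partition()))

          val startTs = 1518048000000L // example start timestamp in ms
          val endTs   = 1518134400000L // example end timestamp in ms

          // offsetsForTimes returns, per partition, the earliest offset whose
          // timestamp is greater than or equal to the given timestamp
          val startOffsets = consumer.offsetsForTimes(
            partitions.map(tp => tp -> java.lang.Long.valueOf(startTs)).toMap.asJava)
          val endOffsets = consumer.offsetsForTimes(
            partitions.map(tp => tp -> java.lang.Long.valueOf(endTs)).toMap.asJava)

          partitions.foreach { tp =>
            val start = Option(startOffsets.get(tp)).map(_.offset)
            val end   = Option(endOffsets.get(tp)).map(_.offset)
            println(s"$tp: start=$start end=$end")
          }
        } finally {
          consumer.close()
        }
      }
    }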

dataset sort

2018-02-08 Thread david westwood
Hi: I would like to sort historical data using the DataSet API. env.setParallelism(10) val dataset = [(Long, String)] .. .partitionByRange(_._1) .sortPartition(_._1, Order.ASCENDING) .writeAsCsv("mydata.csv").setParallelism(1) the data is out of order (in local order) but .print() prints the
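
For context, a runnable sketch of the job being described, with placeholder input data. Note that sortPartition only sorts within each parallel partition; with partitionByRange followed by a parallelism-1 sink, the sorted ranges can still arrive at the single writer in arbitrary order, which would be one likely explanation for a locally ordered but globally unordered file:

    import org.apache.flink.api.common.operators.Order
    import org.apache.flink.api.scala._

    object SortedCsvJob {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        env.setParallelism(10)

        // placeholder input; in practice this would come from readCsvFile or similar
        val dataset: DataSet[(Long, String)] =
          env.fromElements((3L, "c"), (1L, "a"), (2L, "b"))

        dataset
          .partitionByRange(0)               // range-partition on the timestamp field
          .sortPartition(0, Order.ASCENDING) // sorts within each partition only
          .writeAsCsv("mydata.csv")
          .setParallelism(1)                 // single output file, but no global order guarantee

        env.execute("sorted csv")
      }
    }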

RE: Kafka as source for batch job

2018-02-08 Thread Tzu-Li (Gordon) Tai
Hi Marchant, Yes, I agree. In general, the isEndOfStream method has very ill-defined semantics, with different behaviors across different Kafka connector versions. This method will definitely need to be revisited in the future (we are thinking about a rework of the connector). What

Re: CEP for time series in csv-file

2018-02-08 Thread Timo Walther
You can also take a look at the Flink training from data Artisans and the code examples there. They also use CEP and basically also read from a file: http://training.data-artisans.com/exercises/CEP.html Regards, Timo On 2/8/18 at 6:09 PM, Kostas Kloudas wrote: Hi Esa, I think the best

Re: CEP for time series in csv-file

2018-02-08 Thread Kostas Kloudas
Hi Esa, I think the best place to start is the documentation available on the Flink website. Some pointers are the following: CEP documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/libs/cep.html

CEP for time series in csv-file

2018-02-08 Thread Esa Heikkinen
Hi I have csv-file(s) that contain an event in every row, and the first column is the timestamp of the event. The rest of the columns are data and "attributes" of the event. I'd like to write simple Scala code that: 1) reads the data of the csv-file 2) converts the data of the csv-file to be compatible with CEP 3) sets a pattern for CEP 4)
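
A minimal end-to-end sketch of those steps, assuming a "timestamp,type,value" file layout and the Flink 1.4 Scala CEP API; the file path, the Event fields and the pattern conditions are all placeholders:

    import org.apache.flink.cep.scala.CEP
    import org.apache.flink.cep.scala.pattern.Pattern
    import org.apache.flink.streaming.api.TimeCharacteristic
    import org.apache.flink.streaming.api.scala._

    // placeholder event type: timestamp plus two example attributes
    case class Event(ts: Long, kind: String, value: String)

    object CsvCepJob {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

        // 1) read the csv-file and 2) convert each row into an event with event-time timestamps
        val events: DataStream[Event] = env
          .readTextFile("/path/to/events.csv")
          .map { line =>
            val cols = line.split(",")
            Event(cols(0).toLong, cols(1), cols(2))
          }
          .assignAscendingTimestamps(_.ts)

        // 3) set a pattern: a "start" event directly followed by an "end" event
        val pattern = Pattern
          .begin[Event]("first").where(_.kind == "start")
          .next("second").where(_.kind == "end")

        // 4) apply the pattern and emit one string per match
        val matches: DataStream[String] = CEP
          .pattern(events, pattern)
          .select(m => s"${m("first").head.ts} -> ${m("second").head.ts}")

        matches.print()
        env.execute("CEP over CSV")
      }
    }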

RE: Kafka as source for batch job

2018-02-08 Thread Marchant, Hayden
Gordon, Thanks for the pointer. I did some searches for usages of isEndOfStream and it's a little confusing. I see that all implementors of DeserializationSchema must implement this method, but it's not called from anywhere central in the Flink streaming engine; rather, each source can decide

Developing and running Flink applications in Linux through Windows editors or IDE's ?

2018-02-08 Thread Esa Heikkinen
Hello I am a newbie with Flink. I'd like to develop my Flink Scala applications in a Windows IDE (for example IntelliJ IDEA) and run them in Linux (Ubuntu). Is that a good or a bad idea? Or is some kind of remote use possible? At the moment there is no graphical interface (GUI) in Linux. Or would it be

Re: Two issues when deploying Flink on DC/OS

2018-02-08 Thread Lasse Nedergaard
And we see the same too. Med venlig hilsen / Best regards Lasse Nedergaard > On 8 Feb 2018 at 11:58, Stavros Kontopoulos wrote: > > We see the same issue here (2): > 2018-02-08 10:55:11,447 ERROR >

Re: Two issues when deploying Flink on DC/OS

2018-02-08 Thread Stavros Kontopoulos
We see the same issue here (2): 2018-02-08 10:55:11,447 ERROR org.apache.flink.runtime.webmonitor.files.StaticFileServerHandler - Caught exception java.io.IOException: Connection reset by peer Stavros On Sat, Jan 13, 2018 at 9:59 PM, Eron Wright wrote: > Hello Dongwon, >

Re: Kafka as source for batch job

2018-02-08 Thread Tzu-Li (Gordon) Tai
Hi Hayden, Have you tried looking into `KeyedDeserializationSchema#isEndOfStream` [1]? I think that could be what you are looking for. It signals the end of the stream when consuming from Kafka. Cheers, Gordon On 8 February 2018 at 10:44:59 AM, Marchant, Hayden (hayden.march...@citi.com)
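
For illustration, a rough Scala sketch of a KeyedDeserializationSchema that keeps the Kafka coordinates in each record and flags end-of-stream once a configured end offset is reached; the record type and class name are made up for the example, and, as noted earlier in the thread, the exact stop behavior varies across connector versions and gives no clean per-partition guarantee:

    import org.apache.flink.api.common.typeinfo.TypeInformation
    import org.apache.flink.api.scala._
    import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema

    // hypothetical record type carrying the Kafka coordinates alongside the payload
    case class BoundedRecord(topic: String, partition: Int, offset: Long, value: String)

    // Signals end-of-stream when a record at or beyond the configured end offset
    // is seen. Returning true stops the consuming source task as a whole, so with
    // several partitions per task this is not a per-partition "read up to Y".
    class BoundedOffsetSchema(endOffset: Long) extends KeyedDeserializationSchema[BoundedRecord] {

      override def deserialize(messageKey: Array[Byte],
                               message: Array[Byte],
                               topic: String,
                               partition: Int,
                               offset: Long): BoundedRecord =
        BoundedRecord(topic, partition, offset, new String(message, "UTF-8"))

      override def isEndOfStream(nextElement: BoundedRecord): Boolean =
        nextElement.offset >= endOffset

      override def getProducedType: TypeInformation[BoundedRecord] =
        createTypeInformation[BoundedRecord]
    }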

Kafka as source for batch job

2018-02-08 Thread Marchant, Hayden
I know that traditionally Kafka is used as a source for a streaming job. In our particular case, we are looking at extracting records from a Kafka topic from a particular well-defined offset range (per partition) - i.e. from offset X to offset Y. In this case, we'd somehow want the application
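
One half of this is directly supported: the Flink Kafka consumer can be told to start from explicit per-partition offsets via setStartFromSpecificOffsets. Stopping at offset Y has no corresponding setter in this Flink version, which is what the replies above discuss. A sketch of the start side in Scala, with topic, offsets and connection settings as placeholders:

    import java.util.Properties

    import scala.collection.JavaConverters._

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011
    import org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema

    object BoundedKafkaRead {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        val props = new Properties()
        props.setProperty("bootstrap.servers", "localhost:9092")
        props.setProperty("group.id", "bounded-read")

        // offset X per partition: placeholder partitions and offsets
        val startOffsets: java.util.Map[KafkaTopicPartition, java.lang.Long] = Map(
          new KafkaTopicPartition("my-topic", 0) -> java.lang.Long.valueOf(23L),
          new KafkaTopicPartition("my-topic", 1) -> java.lang.Long.valueOf(31L)
        ).asJava

        val consumer = new FlinkKafkaConsumer011[String]("my-topic", new SimpleStringSchema(), props)
        consumer.setStartFromSpecificOffsets(startOffsets)

        env.addSource(consumer).print()
        env.execute("bounded kafka read")
      }
    }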

Re: Question about flink checkpoint

2018-02-08 Thread Fabian Hueske
Great, thank you! Best, Fabian 2018-02-07 23:52 GMT+01:00 Chengzhi Zhao : > Thanks, Fabian, > > I opened a JIRA ticket and I'd like to work on it if people think this > would be an improvement: > https://issues.apache.org/jira/browse/FLINK-8599 > > Best, > Chengzhi > >