Re: Adding Context To Logs

2016-06-02 Thread Chesnay Schepler
the job name is straight up not accessible from within a function. Until recently it wasn't even accessible /anywhere/ on the TaskManager. To get the job name in a function you would have to pass it from the TaskManager to the task, to the operator and (probably) to the RuntimeContext. If

Re: java.io.IOException: Couldn't access resultSet

2016-06-05 Thread Chesnay Schepler
you are not supposed to call open yourself. On 05.06.2016 11:05, David Olsen wrote: Following the sample on the flink website[1] to test jdbc I encountered an error "Couldn't access resultSet". It looks like nextRecord is called before the open() function. However I've called open() when I

Re: Custom keyBy(), look for similaties

2016-06-08 Thread Chesnay Schepler
the idea behind key-selectors is to extract a property on which you can do equality comparisons. let's get one question out of the way first: is your scoring algorithm transitive? as in if A==B and B==C, is it a given that A==C? because if not, there's just no way to group(=partition) the

Re: Does Flink allows for encapsulation of transformations?

2016-06-07 Thread Chesnay Schepler
from what i can tell from your code you are trying to execute a job within a job. This just doesn't work. your main method should look like this: public static void main(String[] args) throws Exception { double pi = new classPI().compute(); System.out.println("We estimate Pi to be: " + pi); } On

Re: Window start and end issue with TumblingProcessingTimeWindows

2016-06-07 Thread Chesnay Schepler
could you state a specific problem? On 07.06.2016 06:40, Soumya Simanta wrote: I've a simple program which takes some inputs from a command line (Socket stream) and then aggregates based on the key. When running this program on my local machine I see some output that is counter intuitive to

Re: java.io.IOException: Couldn't access resultSet

2016-06-06 Thread Chesnay Schepler
Caused by: java.lang.NullPointerException at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.nextRecord(JDBCInputFormat.java:164) ... 7 more Anything I should check as well? Thanks On 5 June 2016 at 17:26, Chesnay Schepler <ches...@apache.org <mai

Re: Does Flink allows for encapsulation of transformations?

2016-06-07 Thread Chesnay Schepler
e? Thanks a lot for the suggestions. On Tuesday, June 7, 2016 6:15 AM, Chesnay Schepler <ches...@apache.org> wrote: from what i can tell from your code you are trying to execute a job within a job. This just doesn't work. your main method should look like this: |publicstaticvoidm

Re: java.io.IOException: Couldn't access resultSet

2016-06-06 Thread Chesnay Schepler
false). Is the variable 'exhausted' supposed to act in that way (initialized to false, then check if hasNext() true, which unfortunately is always false)? I appreciate any suggestions. Thanks. On 6 June 2016 at 15:46, Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org&g

Re: Does Flink allows for encapsulation of transformations?

2016-06-10 Thread Chesnay Schepler
ter.collect().get(0)) .map(new Sampler()) .reduce(new SumReducer()) .map(new MapFunction<Long, Double >() { *Long N = NumIter.collect().get(0);* @Override public Double map(Long arg0) throws Exception { return arg0 *4.0/N; }}); }} Thanks a lot for your time. Ser On Tuesday, June 7, 2

Re: Parallel read text

2016-05-28 Thread Chesnay Schepler
ExecutionEnvironment.readTextFile will read the file in parallel. On 28.05.2016 09:59, David Olsen wrote: After searching on the internet I still do not find the answer (with key word like 'apache flink parallel read text') I am looking for. So asking here before jumping to write code ... My
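
Conceptually, the parallel read works by splitting the input and assigning splits to the parallel source instances; Flink's FileInputFormat does this with byte-range splits. A toy round-robin assignment, with lines standing in for splits (illustrative only, not Flink's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of parallel file reading: the input is divided into splits,
// and each of the `parallelism` reader instances gets a subset of them.
public class SplitAssignment {
    public static List<List<String>> assign(List<String> splits, int parallelism) {
        List<List<String>> assignment = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            assignment.add(new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            assignment.get(i % parallelism).add(splits.get(i)); // round-robin
        }
        return assignment;
    }
}
```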

Re: Reading Parameter values sent to partition

2016-05-28 Thread Chesnay Schepler
There are 2 flaws in your code: Let's start with the fundamental one: At no point do you associate your mapConf with the flatMap or even the job. Theoretically you should add it to the flatMap using the flatMap(...).withParameters(mapConf) method. But here's the second, more subtle flaw:

Re: Different log4j.properies

2016-05-30 Thread Chesnay Schepler
i believe the log4j.properties are always used for JM/TM logs, and log4j-yarn-session.properties strictly for pure YARN stuff. On 30.05.2016 18:30, Konstantin Knauf wrote: Hi everyone, basic question, but I just want to check if my current understanding of the log4j property files inside

Re: State key serializer has not been configured in the config.

2016-06-23 Thread Chesnay Schepler
We should adjust the error message to contain the keyed stream thingy. On 23.06.2016 10:11, Till Rohrmann wrote: Hi Jacob, the `ListState` abstraction is a state which we call partitioned/key-value state. As such, it is only possible to use it with a keyed stream. This means that you have to

Re: Getting the NumberOfParallelSubtask

2016-06-20 Thread Chesnay Schepler
Within the mapper you cannot access the parallelism of the following nor preceding operation. On 20.06.2016 15:56, Paschek, Robert wrote: Hi Mailing list, using a RichMapPartitionFunction i can access the total number m of this mapper utilized in my job with int m =

Re: Cassandra Connector Problem (Possible Guava Conflict?)

2016-06-26 Thread Chesnay Schepler
The problem is that the cassandra jar currently contains 2 shaded guavas. I already have a fix ready that suppresses the root pom's shade-plugin configuration inside the cassandra module. I will submit that next week. On 26.06.2016 17:46, Eamon Kavanagh wrote: Hey everyone, I'm having an

Re: How to ensure exactly-once semantics in output to Kafka?

2016-02-05 Thread Chesnay Schepler
Essentially what happens is the following: in between checkpoints all incoming data is stored within the operator state. when a checkpoint-complete operation arrives, the data is read from the operator state and written into kafka (or any system) if the job fails while storing records in
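
The buffer-then-flush pattern described above can be modeled in a few lines of self-contained Java (illustrative names only, not Flink's actual write-ahead sink API):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the write-ahead pattern: records are buffered between
// checkpoints ("operator state") and only handed to the external system
// (a list standing in for Kafka) once the checkpoint completes.
public class WriteAheadSinkModel {
    private final List<String> buffer = new ArrayList<>();       // operator state
    private final List<String> externalSystem = new ArrayList<>();

    public void invoke(String record) {
        buffer.add(record);                 // buffered, not yet visible externally
    }

    public void notifyCheckpointComplete() {
        externalSystem.addAll(buffer);      // flush buffered records
        buffer.clear();
    }

    public List<String> external() {
        return externalSystem;
    }
}
```

The interesting failure case, which the (truncated) message goes on to discuss, is presumably a crash in the middle of the flush, since that is where duplicates can sneak in.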

Re: JDBCInputFormat preparation with Flink 1.1-SNAPSHOT and Scala 2.11

2016-03-09 Thread Chesnay Schepler
). I might glean a hint from it. Prez Cannady p: 617 500 3378 e: revp...@opencorrelate.org <mailto:revp...@opencorrelate.org> GH: https://github.com/opencorrelate LI: https://www.linkedin.com/in/revprez On Mar 9, 2016, at 3:25 AM, Chesnay Schepler <ches...@apache.org <mailto:ches..

Re: FromIteratorFunction problems

2016-04-07 Thread Chesnay Schepler
hmm, maybe i was too quick with linking to the JIRA. As for an example: you can look at the streaming WindowJoin example. The sample data uses an Iterator. (ThrottledIterator) Note that the iterator implementation used is part of flink and also implements Serializable. On 07.04.2016 22:18,

Re: FromIteratorFunction problems

2016-04-07 Thread Chesnay Schepler
you will find some information regarding this issue in this JIRA: https://issues.apache.org/jira/browse/FLINK-2608 On 07.04.2016 22:18, Andrew Whitaker wrote: Hi, I'm trying to get a simple example of a source backed by an iterator working. Here's the code I've got: ```

Re: Yammer metrics

2016-04-08 Thread Chesnay Schepler
Apr 8, 2016 at 11:01 AM, Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: Flink currently doesn't expose any metrics beyond those shown in the Dashboard. I am currently working on integrating a new metrics system that is

Re: Yammer metrics

2016-04-08 Thread Chesnay Schepler
note that we still /could/ expose the option of using the yammer reporters; there isn't any technical limitation as of now that would prohibit that. On 08.04.2016 13:05, Chesnay Schepler wrote: I'm very much aware of how Yammer works. As the slides you linked show (near the end

Re: Yammer metrics

2016-04-08 Thread Chesnay Schepler
covers all use-cases with the help of JMXTrans and similar tools. Reporting to specific systems is something we want to add as well though. Regards, Chesnay Schepler On 08.04.2016 09:32, Sanne de Roever wrote: Hi, I´m looking into setting up monitoring for our (Flink) environment and realized

Re: Java I/O exception

2016-03-21 Thread Chesnay Schepler
I quickly went through the code: Flink gathers some data about the hardware available, like numberOfCPUCores / available physical memory. Now the physical memory part is apparently only used for logging / metrics display in the dashboard, so it's not a problem that you got it, it is simply not

Re: override file in flink

2016-03-22 Thread Chesnay Schepler
by using DataStream#writeAsCsv(String path, WriteMode writeMode) On 22.03.2016 12:18, subash basnet wrote: Hello all, I am trying to write the streaming data to file and update it recurrently with the streaming data. I get the following unable to override exception error: *Caused by:

Re: Oracle 11g number serialization: classcast problem

2016-03-23 Thread Chesnay Schepler
format to read the database with configurable parallelism. We are still working on it. If we get to something stable and working, we'll gladly share it. saluti, Stefano 2016-03-22 15:46 GMT+01:00 Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>>: The JDBC format

Re: Oracle 11g number serialization: classcast problem

2016-03-23 Thread Chesnay Schepler
On 23.03.2016 10:38, Chesnay Schepler wrote: On 23.03.2016 10:04, Stefano Bortoli wrote: I had a look at the JDBC input format, and it does indeed interpret BIGDECIMAL and NUMERIC values as double. This sounds more like a bug actually. Feel free to open a JIRA for this. Actually

Re: Unable to run the batch examples after running stream examples

2016-03-24 Thread Chesnay Schepler
basnet wrote: Hello Chesnay Schepler, I am running the latest flink examples from here <https://github.com/apache/flink> in eclipse. I had used the WikipediaAnalysis pom properties/dependency in batch pom as shown below: UTF-8 1.0.0 org.apache.flink flin

Re: Unable to submit flink job that uses Avro data

2016-03-23 Thread Chesnay Schepler
Could you be missing the call to execute()? On 23.03.2016 01:25, Tarandeep Singh wrote: Hi, I wrote a simple Flink job that uses Avro input format to read avro file and save the results in avro format. The job does not get submitted and the job client exits immediately. Same thing happens if
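
For context on why a missing execute() makes the client exit immediately: Flink programs only build a plan when transformations are declared; nothing actually runs until execute() is called. A toy model of that lazy behavior (plain Java, hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of lazy plan building: transformations are only recorded;
// nothing runs until execute() is called. This mirrors why forgetting
// env.execute() makes a Flink job do nothing and exit immediately.
public class LazyPlan {
    private final List<Runnable> plan = new ArrayList<>();
    public final List<String> results = new ArrayList<>();

    public LazyPlan addStep(String name) {
        plan.add(() -> results.add(name));  // recorded, not executed
        return this;
    }

    public void execute() {
        for (Runnable step : plan) {
            step.run();                     // only now does anything happen
        }
    }
}
```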

Re: Periodic actions

2016-03-03 Thread Chesnay Schepler
could the problem be as simple as the var active never being true? On 04.03.2016 03:08, shikhar wrote: I am trying to have my job also run a periodic action by using a custom source that emits a dummy element periodically and a sink that executes the callback, as shown in the code below. However as

Re: JDBCInputFormat preparation with Flink 1.1-SNAPSHOT and Scala 2.11

2016-03-09 Thread Chesnay Schepler
you can always create your own InputFormat, but there is no AbstractJDBCInputFormat if that's what you were looking for. When you say arbitrary tuple size, do you mean a) a size greater than 25, or b) tuples of different sizes? If a) unless you are fine with using nested tuples you won't get

Re: Java 8 and keyBy in 1.0.0

2016-03-30 Thread Chesnay Schepler
based on https://issues.apache.org/jira/browse/FLINK-3138 this is not supported for non-static methods. On 30.03.2016 10:33, Andrew Ge Wu wrote: Hi, This is not very obvious and looks like a bug. I have a lambda expression to get key from objects in stream: *This works:* stream.keyBy(value

Re: Oracle 11g number serialization: classcast problem

2016-03-22 Thread Chesnay Schepler
The JDBC formats don't make any assumption as to what DB backend is used. A JDBC float in general is returned as a double, since that was the recommended mapping i found when i wrote the formats. Is the INT returned as a double as well? Note: The (runtime) output type is in no way connected

Re: Flink log dir

2016-04-28 Thread Chesnay Schepler
according to https://issues.apache.org/jira/browse/FLINK-3678 it should be available in 1.0.3 On 28.04.2016 16:23, Flavio Pompermaier wrote: Hi to all, I'm using Flink 1.0.1 and I can't find how to change the log directory. In the current master I see that there's the

Re: Requesting the next InputSplit failed

2016-04-27 Thread Chesnay Schepler
Are you using your modified connector or the currently available one? On 27.04.2016 17:35, Flavio Pompermaier wrote: Hi to all, I'm running a Flink Job on a JDBC datasource and I obtain the following exception: java.lang.RuntimeException: Requesting the next InputSplit failed. at

Re: Cassandra sink wrt Counters

2016-05-10 Thread Chesnay Schepler
made; selectively re-read/write upon failure One of the key requisites is proper failure reporting though; if an update fails we /need to know/. As far as i know Cassandra doesn't make this guarantee. Regards, Chesnay Schepler On 10.05.2016 07:48, milind parikh wrote: Given FLINK 3311

Re: Sink Parallelism

2016-04-19 Thread Chesnay Schepler
The picture you reference does not really show how dataflows are connected. For a better picture, visit this link: https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#parallel-dataflows Let me know if this doesn't answer your question. On 19.04.2016 14:22,

Re: lost connection

2016-04-21 Thread Chesnay Schepler
Hello, the first step is always to check the logs under /log. The JobManager log in particular may contain clues as why no connection could be established. Regards, Chesnay On 21.04.2016 15:44, Radu Tudoran wrote: Hi, I am trying to submit a jar via the console (flink run my.jar). The

Re: lost connection

2016-04-21 Thread Chesnay Schepler
ersons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! *From:*Chesnay Schepler [mailto:ches...@apache.org] *Sent:* Thursday, April 21, 2016 3:58 PM *To:* user@flink.apache.org *Subjec

Re: Issue with running Flink Python jobs on cluster

2016-07-13 Thread Chesnay Schepler
Hello Geoffrey, How often does this occur? Flink distributes the user-code and the python library using the Distributed Cache. Either the file is deleted right after being created for some reason, or the DC returns a file name before the file was created (which shouldn't happen, it should

Re:

2016-07-17 Thread Chesnay Schepler
Hello Chen, you can access the set configuration in your rich function like this: public static final class Tokenizer extends RichFlatMapFunction<String, Tuple2<String, Integer>> { @Override public void flatMap(String value, Collector<Tuple2<String, Integer>> out) { ParameterTool parameters

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Chesnay Schepler
entific data that needs to be completed soon. Thank for your assistance, Geoffrey On Sun, Jul 17, 2016 at 4:04 AM Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: Please also post the job you're trying to run. On 17.07.2016 08:43, Geoffrey Mon wrote:

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Chesnay Schepler
Cheers, Geoffrey On Fri, Jul 15, 2016 at 4:15 AM Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: Could you write a java job that uses the Distributed cache to distribute files? If this fails then the DC is faulty,

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Chesnay Schepler
ut the cache. Cheers, Geoffrey On Fri, Jul 15, 2016 at 4:15 AM Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: Could you write a java job that uses the Distributed cache to distribute files? If this fails then the DC

Re: Issue with running Flink Python jobs on cluster

2016-07-19 Thread Chesnay Schepler
:58 AM Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: well now i know what the problem could be. You are trying to execute a job on a cluster (== not local), but have set the local flag to true. env.execute(local=True) Due to this fl

Re: Writing in flink clusters

2016-07-13 Thread Chesnay Schepler
Hello, Is that the complete error message? I'm a bit surprised it does not explicitly name any file name. If it really doesn't we should change that. Regards, Chesnay Schepler On 13.07.2016 15:35, Alexis Gendronneau wrote: Hi Roy, Have you looked on the nodes in charge of sink tasks ? You

Re: Issue with running Flink Python jobs on cluster

2016-07-15 Thread Chesnay Schepler
the problem. Thanks, Geoffrey On Wed, Jul 13, 2016 at 6:12 AM Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: Hello Geoffrey, How often does this occur? Flink distributes the user-code and the python library using

Re: JDBC sink in flink

2016-07-05 Thread Chesnay Schepler
that makes sense. Also what's the difference between a RichOutputFormat and a RichSinkFunction? Can I use JDBCOutputFormat as a sink in a stream? On Tue, Jul 5, 2016 at 3:53 PM, Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: Hello,

Re: JDBC sink in flink

2016-07-05 Thread Chesnay Schepler
Hello, an instance of the JDBCOutputFormat will use a single connection to send all values. Essentially - open(...) is called at the very start to create the connection - then all invoke/writeRecord calls are executed (using the same connection) - then close() is called to clean up. The
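
The lifecycle described above can be sketched as a self-contained stub (a counter stands in for the JDBC connection; this is not the real JDBCOutputFormat):

```java
// Sketch of the lifecycle described above: one connection per format instance,
// created in open(), reused by every writeRecord() call, released in close().
public class OutputFormatLifecycle {
    private boolean connectionOpen = false;
    private int recordsWritten = 0;
    private int connectionsCreated = 0;

    public void open() {
        connectionsCreated++;        // a single connection per instance
        connectionOpen = true;
    }

    public void writeRecord(String record) {
        if (!connectionOpen) {
            throw new IllegalStateException("open() was not called");
        }
        recordsWritten++;            // every record goes through the same connection
    }

    public void close() {
        connectionOpen = false;      // clean up the connection
    }

    public int connectionsCreated() { return connectionsCreated; }
    public int recordsWritten() { return recordsWritten; }
}
```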

Re: Error joining with Python API

2016-08-16 Thread Chesnay Schepler
looks like a bug, will look into it. :) On 16.08.2016 10:29, Ufuk Celebi wrote: I think that this is actually a bug in Flink. I'm cc'ing Chesnay who originally contributed the Python API. He can probably tell whether this is a bug in the Python API or on the Flink operator side of things. ;) On Mon,

Re: To get Schema for jdbc database in Flink

2017-02-08 Thread Chesnay Schepler
Hello, I don't understand why you explicitly need the schema since the batch JDBCInput-/Outputformats don't require it. That's kind of the nice thing about Rows. Would be cool if you could tell us what you're planning to do with the schema :) In any case, to get the schema within the plan

Re: To get Schema for jdbc database in Flink

2017-02-08 Thread Chesnay Schepler
the jdbc input source returns the datastream of Row and to write them into jdbc database we have to create a table which requires schema. Thanks Punit On 02/08/2017 08:22 AM, Chesnay Schepler wrote: Hello, I don't understand why you explicitly need the schema since the batch JDBCInput

Re: Unable to use Scala's BeanProperty with classes

2017-02-15 Thread Chesnay Schepler
Hello, There is an open PR about adding support for case classes to the cassandra sinks: https://github.com/apache/flink/pull/2633 You would have to checkout the branch and build it yourself. If this works for you it would be great if you could also give some feedback either here or in the

Re: RichMapFunction in DataStream, how do I set the parameters received in open?

2016-09-12 Thread Chesnay Schepler
Hello, you cannot pass a configuration in the Streaming API. This way of configuration is more of a relic from past times. The common way to configure a function is to pass the parameters through the constructor and store the values in a field. Regards, Chesnay On 12.09.2016 18:27,
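
The constructor-based configuration pattern recommended here, sketched in plain Java (a java.util.function.Function stands in for Flink's RichMapFunction):

```java
import java.io.Serializable;
import java.util.function.Function;

// The constructor-parameter pattern: configuration values are passed when
// the function is constructed and stored in a field that is serialized and
// shipped along with the function, instead of using a Configuration object.
public class MultiplyFunction implements Function<Integer, Integer>, Serializable {
    private final int factor;   // configured once, travels with the function

    public MultiplyFunction(int factor) {
        this.factor = factor;
    }

    @Override
    public Integer apply(Integer value) {
        return value * factor;
    }
}
```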

Re: Flink JDBC JDBCOutputFormat Open

2016-09-12 Thread Chesnay Schepler
Hello, the JDBC Sink completely ignores the taskNumber and parallelism. Regards, Chesnay On 12.09.2016 08:41, Swapnil Chougule wrote: Hi Team, I want to know how tasknumber & numtasks help in opening db connection in Flink JDBC JDBCOutputFormat Open. I checked with docs where it says:

Re: Custom(application) Metrics - Piggyback on Flink's metrics infra or not?

2016-09-22 Thread Chesnay Schepler
0, 2016 at 2:10 AM, Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: Hello Eswar, as far as I'm aware the general structure of the Flink's metric system is rather similar to DropWizard. You can use DropWizard metrics by creating

Re: Error joining with Python API

2016-08-17 Thread Chesnay Schepler
Found the issue, there was a missing tab in the chaining method... On 16.08.2016 12:12, Chesnay Schepler wrote: looks like a bug, will look into it. :) On 16.08.2016 10:29, Ufuk Celebi wrote: I think that this is actually a bug in Flink. I'm cc'ing Chesnay who originally contributed

Re: Apache siddhi into Flink

2016-08-29 Thread Chesnay Schepler
Hello Aparup, could you provide more information about Siddhi? How mature is it; how is the community? How does it compare to Flink's CEP library? What should this integration look like? Are you proposing to replace the current CEP library, or will they co-exist with different use-cases

Re: Flink JMX

2016-08-29 Thread Chesnay Schepler
ter.jmx_reporter.port: 8080-8082 But no hope. Am i miss anything ? Thank You On Mon, Aug 29, 2016 at 4:36 PM, Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: Hello, can you post the jmx config entries and give us mo

Re: Release Process

2016-11-04 Thread Chesnay Schepler
Hello, Every contribution to the master branch will be released as part of the next minor version, in your case this would be 1.2. We are currently aiming for a release in December. In between minor versions several bug-fix versions are released (1.1.1, 1.1.2 etc.). For these the community

Re: Flink and factories?

2016-10-19 Thread Chesnay Schepler
The functions are serialized when env.execute() is being executed. The thing is, as i understand it, that your singleton is simply not part of the serialized function, so it doesn't actually matter when the function is serialized. Storing the factory instance in the function shouldn't be too

Re: Task and Operator Monitoring via JMX / naming

2016-10-20 Thread Chesnay Schepler
This is completely unintended behavior; you should never have to adjust your topology so the metric system gets the names right. I'll take a deep look into this tomorrow ;) Regards, Chesnay On 20.10.2016 08:50, Philipp Bussche wrote: Some further observations: I had a Job which was taking

Re: Task and Operator Monitoring via JMX / naming

2016-10-20 Thread Chesnay Schepler
Well the issue is the following: the metric system assumes the following naming scheme for tasks based on the DataSet API and simple streaming jobs: [CHAIN] operatorName1 [=> operatorName2 [ ...]] To retrieve the operator name the above is split by "=>", giving us a String[] of all operator
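
The splitting scheme described above, as a small self-contained sketch (illustrative only, not the actual Flink code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the name splitting described above: a chained task name like
// "CHAIN Source => Map => Sink" is split on "=>" to recover the
// individual operator names.
public class ChainNameSplitter {
    public static List<String> operatorNames(String taskName) {
        // drop the optional "CHAIN " prefix
        String name = taskName.startsWith("CHAIN ") ? taskName.substring(6) : taskName;
        return Arrays.stream(name.split("=>"))
                     .map(String::trim)
                     .collect(Collectors.toList());
    }
}
```

The thread's problem follows directly: any user-supplied operator name that itself contains "=>" breaks this scheme.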

Re: org.apache.flink.core.fs.Path error?

2016-10-20 Thread Chesnay Schepler
.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:56) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584) at java.lang.Thread.run(Thread.java:745) *From:* Chesnay Schepler [mailto:ches...@apache.org <mailto

Re: org.apache.flink.core.fs.Path error?

2016-10-20 Thread Chesnay Schepler
Hello Radu, Flink can handle windows paths, this alone can't be the problem. If you could post the error you are getting we may pinpoint the issue, but right now i would suggest the usual: check that the path is indeed correct, that you have sufficient permissions to access the file. And

Re: org.apache.flink.core.fs.Path error?

2016-10-20 Thread Chesnay Schepler
taskmanager.Task.run(Task.java:584) at java.lang.Thread.run(Thread.java:745) *From:* Chesnay Schepler [mailto:ches...@apache.org] *Sent:* Thursday, October 20, 2016 2:22 PM *To:* user@flink.apache.org *Subject:* Re: org.apache.flink.core.fs.Path error? Hello Radu, Flink can handle windo

Re: Task and Operator Monitoring via JMX / naming

2016-10-15 Thread Chesnay Schepler
Hello Philipp, there is certainly something very wrong here. What you _should_ see is 6 entries, 1 for each operator; 2-3 more for the tasks the operators are executed in and the taskmanager stuff. Usually, operator metrics use the name that you configured, like "TokenMapStream", whereas

Re: Task and Operator Monitoring via JMX / naming

2016-10-15 Thread Chesnay Schepler
Hello Philipp, the relevant names are stored in the OperatorMetricGroup/TaskMetricGroup classes in flink-runtime. The name for a task is extracted directly from the TaskDeploymentDescriptor in TaskManagerJobMetricGroup#addTask(). The name for a streaming operator that the metric system uses

Re: Flink Metrics

2016-10-17 Thread Chesnay Schepler
Hello, we could also offer a small utility method that creates 3 flink meters, each reporting one rate of a DW meter. Timers weren't added since, as Till said, no one has requested them yet and we haven't found a proper internal use-case for them. Regards, Chesnay On 17.10.2016 09:52, Till

Re: Task and Operator Monitoring via JMX / naming

2016-10-20 Thread Chesnay Schepler
It will be in the master tomorrow. On 20.10.2016 18:50, Philipp Bussche wrote: Thanks Chesnay ! I am not too familiar with the release cycles here but was wondering when one could expect your fix to be in the master of Flink ? Should I create my own build for the moment maybe ? Thanks. --

Re: Many operations cause StackOverflowError with AWS EMR YARN cluster

2016-11-23 Thread Chesnay Schepler
Hello, implementing collect() in python is not that trivial and the gain is questionable. There is an inherent size limit (think 10mb), and it is a bit at odds with the deployment model of the Python API. Something easier would be to execute each iteration of the for-loop as a separate job

Re: Tame Flink UI?

2016-11-23 Thread Chesnay Schepler
? For example, instead of enumerating all metrics, maybe ask for the range? On Wed, Nov 16, 2016 at 2:05 PM, Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: Hello, The WebInterfaces first pulls a list of all available metrics for a specific taskmana

Re: Cassandra Connector

2016-11-22 Thread Chesnay Schepler
Hello, the CassandraSink is not implemented as a sink but as a special operator, so you wouldn't be able to use the addSink() method. (I can't remember the actual method being used.) There are also several different implementations for various types (tuples, pojo's, scala case classes) but

Re: Cassandra Connector

2016-11-22 Thread Chesnay Schepler
Actually this is a bit inaccurate. _Some_ implementations are not implemented as a sink. Also, you can in fact instantiate the sinks yourself as well, as in readings.addSink(new CassandraTupleSink(...)). On 22.11.2016 09:30, Chesnay Schepler wrote: Hello, the CassandraSink

Re: Tame Flink UI?

2016-11-16 Thread Chesnay Schepler
Hello, The WebInterface first pulls a list of all available metrics for a specific taskmanager/job/task (which is reasonable since how else would you select them), and then requests the values for all metrics by supplying the name of every single metric it just received, which is where

Re: Can we do batch writes on cassandra using flink while leveraging the locality?

2016-11-01 Thread Chesnay Schepler
Hello, the main issue that prevented us from writing batches is that there is a server-side limit as to how big a batch may be, however there was no way to tell how big the batch that you are currently building up actually is. Regarding locality, I'm not sure if a partitioner alone solves

Re: Parallelism and stateful mapping with Flink

2016-12-08 Thread Chesnay Schepler
Done. https://issues.apache.org/jira/browse/FLINK-5299 On 08.12.2016 16:50, Ufuk Celebi wrote: Would you like to open an issue for this for starters Chesnay? Would be good to fix for the upcoming release even. On 8 December 2016 at 16:39:58, Chesnay Schepler (ches...@apache.org) wrote

Re: conditional dataset output

2016-12-08 Thread Chesnay Schepler
Hello Lars, The only other way i can think of how this could be done is by wrapping the used outputformat in a custom format, which calls open on the wrapped outputformat when you receive the first record. This should work but is quite hacky, as it interferes with the format
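
The wrapper idea, sketched in self-contained Java with a simplified stand-in for Flink's OutputFormat interface (hypothetical names):

```java
// Sketch of the wrapper described above: the wrapped format's open() is
// deferred until the first record arrives, so an empty result produces no
// output at all. The OutputFormat interface here is a simplified stand-in.
public class LazyOpenFormat {
    interface OutputFormat {
        void open();
        void writeRecord(String record);
        void close();
    }

    private final OutputFormat wrapped;
    private boolean opened = false;

    public LazyOpenFormat(OutputFormat wrapped) {
        this.wrapped = wrapped;
    }

    public void writeRecord(String record) {
        if (!opened) {              // open the real format only on the first record
            wrapped.open();
            opened = true;
        }
        wrapped.writeRecord(record);
    }

    public void close() {
        if (opened) {
            wrapped.close();        // nothing was opened -> nothing to close
        }
    }

    public boolean wasOpened() { return opened; }
}
```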

Re: Parallelism and stateful mapping with Flink

2016-12-08 Thread Chesnay Schepler
It would be neat if we could support arrays as keys directly; it should boil down to checking the key type and in case of an array injecting a KeySelector that calls Arrays.hashCode(array). This worked for me when i ran into the same issue while experimenting with some stuff. The batch API
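
A minimal sketch of the suggested workaround, using plain-Java grouping in place of keyBy() (note that grouping purely on the hash can collide for distinct arrays, so this shows the idea rather than a complete solution):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch of the workaround described above: arrays don't have a value-based
// hashCode, so a key selector returning Arrays.hashCode(array) is used as
// the grouping key. Plain-Java grouping stands in for keyBy().
public class ArrayKeyGrouping {
    public static Map<Integer, Integer> countByArrayKey(int[][] records) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int[] record : records) {
            int key = Arrays.hashCode(record);   // the injected key selector
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```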

Re: How to analyze space usage of Flink algorithms

2016-12-09 Thread Chesnay Schepler
We do not measure how much data we are spilling to disk. On 09.12.2016 14:43, Fabian Hueske wrote: Hi, the heap mem usage should be available via Flink's metrics system. Not sure if that also captures spilled data. Chesnay (in CC) should know that. If the spilled data is not available as a

Re: How to retrieve values from yarn.taskmanager.env in a Job?

2016-12-12 Thread Chesnay Schepler
Hello, can you clarify one small thing for me: Do you want to access this parameter when you define the plan (aka when you call methods on the StreamExecutionEnvironment or DataStream instances) or from within your functions/operators? Regards, Chesnay Schepler On 12.12.2016 14:21, Till

Re: Reg. custom sinks in Flink

2016-12-12 Thread Chesnay Schepler
Regarding 2) I don't think so. That would require access to the datastax MappingManager. We could add something similar to the ClusterBuilder for that though. Regards, Chesnay On 12.12.2016 16:15, Meghashyam Sandeep V wrote: Hi Till, Thanks for the information. 1. What do you mean by

Re: Reg. custom sinks in Flink

2016-12-12 Thread Chesnay Schepler
d still use Pojo? Thanks, On Mon, Dec 12, 2016 at 7:26 AM, Chesnay Schepler <ches...@apache.org <mailto:ches...@apache.org>> wrote: Regarding 2) I don't think so. That would require access to the datastax MappingManager.

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-12 Thread Chesnay Schepler
FLINK-5470 is a duplicate of FLINK-5298 for which there is also an open PR. FLINK-5472 is imo invalid since the webserver does support https, you just have to enable it as per the security documentation. On 12.01.2017 16:20, Till Rohrmann wrote: I also found an issue:

Re: JVM Non Heap Memory

2016-12-05 Thread Chesnay Schepler
Hello Daniel, I'm afraid you stumbled upon a bug in Flink. Meters were not properly cleaned up, causing the underlying dropwizard meter update threads to not be shut down either. I've opened a JIRA and will open a PR soon. Thank you for

Re: JVM Non Heap Memory

2016-12-05 Thread Chesnay Schepler
://issues.apache.org/jira/browse/FLINK-5261 I added an issue to improve the documentation about cancellation (https://issues.apache.org/jira/browse/FLINK-5260). Which version of Flink are you using? Chesnay's fix will make it into the upcoming 1.1.4 release. On 5 December 2016 at 14:04:49, Chesnay Schepler (ches

Re: JVM Non Heap Memory

2016-12-05 Thread Chesnay Schepler
o it seems the fix of Meter's won't make it to 1.1.4 ? Best Regards, Daniel Santos On 12/05/2016 01:28 PM, Chesnay Schepler wrote: We don't have to include it in 1.1.4 since Meter's do not exist in 1.1; my bad for tagging it in JIRA for 1.1.4. On 05.12.2016 14:18, Ufuk Celebi wr

Re: Sequential/ordered map

2017-01-05 Thread Chesnay Schepler
So given an ordered list of texts, for each word find the earliest text it appears in? As Kostas said, when splitting the text into words wrap them in a Tuple2 containing the word and text index and group them by the word. As far as i can tell the next step would be a simple reduce that
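
The pair-and-min-reduce approach described above, sketched as plain Java instead of a Flink job (hypothetical method names):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the approach described above: each word is paired with the index
// of the text it appears in, grouped by word, and reduced with min() so only
// the earliest text index survives.
public class EarliestOccurrence {
    public static Map<String, Integer> earliestTextIndex(List<String> texts) {
        Map<String, Integer> earliest = new HashMap<>();
        for (int i = 0; i < texts.size(); i++) {
            for (String word : texts.get(i).split("\\s+")) {
                earliest.merge(word, i, Integer::min);  // the "min" reduce step
            }
        }
        return earliest;
    }
}
```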

Re: Debugging Python-Api fails with NoClassDefFoundError

2017-01-05 Thread Chesnay Schepler
Hello, all Flink dependencies of the Python API are marked as *provided* in the pom.xml similar to most connectors. By removing the provided tags in the pom.xml you should be able to run the PythonPlanBinder from the IDE. This was done to exclude these dependencies in the flink-python jar;

Re: [E] Re: unable to add more servers in zookeeper quorum peers in flink 1.2

2017-03-22 Thread Chesnay Schepler
I guess that's because the grouping is wrong. ^server\.([0-9])+[[:space:]]*\=([^: \#]+) should probably be ^server\.([0-9]+)[[:space:]]*\=([^: \#]+) Could you modify the .sh script as such and try again? Regards, Chesnay On 22.03.2017 16:10, kanagaraj.vengidas...@verizon.com wrote:
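
The effect of the changed grouping can be demonstrated directly: with ([0-9])+ the group is repeated, so it only retains the last digit it matched, while ([0-9]+) captures the whole server id. A simplified check (the [[:space:]] part of the shell regex is dropped here, since that POSIX class doesn't translate directly to Java's regex syntax):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Demonstrates the grouping bug described above, against a
// zookeeper-style "server.N=host" line.
public class RegexGroupingDemo {
    public static String serverId(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "server.12=zkhost1";
        // repeated group: only the last repetition is kept
        System.out.println(serverId("^server\\.([0-9])+\\=([^: \\#]+)", line)); // "2"
        // group around the quantifier: the whole id is captured
        System.out.println(serverId("^server\\.([0-9]+)\\=([^: \\#]+)", line)); // "12"
    }
}
```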

Re: Fail to call jobs/:jobid/cancel-with-savepoint/ with local Flink mini cluseter started by IDE

2017-03-29 Thread Chesnay Schepler
Hello, The cancel-with-savepoint command is not available in 0.10.1. I'm pretty sure it was added in 1.2, so you'll have to upgrade the runtime-web dependency. Note that all Flink dependencies should always have the same version. Regards, Chesnay On 28.03.2017 18:12, Sendoh wrote: Hi

Re: Web Dashboard reports 0 for sent/received bytes and records

2017-03-31 Thread Chesnay Schepler
Hello, they are collected and are accessible under the metrics tab. Currently we can't display them directly since the backing job data-structure only sees tasks (aka chains) and not the individual operators. There are some ongoing efforts to change that though, so we may be able to improve

Re: register metrics reporter on application level

2017-03-17 Thread Chesnay Schepler
Hello, there is currently no way to specify a reporter for a specific job. Regards, Chesnay On 17.03.2017 19:18, Mingmin Xu wrote: Hello all, I'm following https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/metrics.html to collect metrics. It seems the reporter is set

Re: Cassandra connector POJO - tombstone question

2017-04-12 Thread Chesnay Schepler
Hello, what I can do is add a hook, like we do for the ClusterBuilder, with which you can provide a set of options that will be used for every call to the mapper. This would give you access to all options that are listed on the page you linked. You can find an implementation of this here:

Re: Connector for REST End Point

2017-04-10 Thread Chesnay Schepler
something like that. Thanks Archit On Thu, Apr 6, 2017 at 1:15 PM, Chesnay Schepler <ches...@apache.org> wrote: Hello, Please give us a bit more information, I'm really not sure what you mean. :(

Re: Fetching metrics failed.

2017-04-20 Thread Chesnay Schepler
Hello, the MetricQueryService is used by the webUI to fetch metrics from the JobManager and all TaskManagers. It is only used when the webUI is accessed. Based on the logs you gave the TaskManager isn't killed by the JobManager; instead the JobManager only detected that the TaskManager

Re: Proper way to call a Python function in WindowFunction.apply()

2017-03-14 Thread Chesnay Schepler
Hey, Naturally this would imply that your script is available on all nodes, so you will have to distribute it manually. On 14.03.2017 17:23, Chesnay Schepler wrote: Hello, I would suggest implementing the RichWindowFunction instead, and instantiate Jep within open(), or maybe do some

Re: Proper way to call a Python function in WindowFunction.apply()

2017-03-14 Thread Chesnay Schepler
Hello, I would suggest implementing the RichWindowFunction instead, and instantiate Jep within open(), or maybe do some lazy instantiation within apply. Regards, Chesnay On 14.03.2017 15:47, 김동원 wrote: Hi all, What is the proper way to call a Python function in WindowFunction.apply()?
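The suggested pattern, initializing a heavyweight, non-serializable resource in open() rather than in the function's constructor, can be sketched without Flink dependencies. The `Interpreter` class below is a hypothetical stand-in for a Jep interpreter, and `WindowFn` only mirrors a RichWindowFunction's lifecycle; it is not the actual Flink API:

```java
public class LazyInitSketch {
    // Hypothetical stand-in for a heavyweight, non-serializable resource
    // such as a Jep Python interpreter.
    static class Interpreter {
        String invoke(String fn, int arg) {
            return fn + "(" + arg + ")";
        }
    }

    // Mirrors a RichWindowFunction: the resource is created once per task
    // in open() (or lazily on first use), and marked transient so it is
    // never shipped with the serialized function.
    static class WindowFn {
        private transient Interpreter interpreter;

        void open() {
            interpreter = new Interpreter(); // per-task initialization
        }

        String apply(int value) {
            if (interpreter == null) {
                interpreter = new Interpreter(); // lazy fallback inside apply
            }
            return interpreter.invoke("score", value);
        }
    }

    public static void main(String[] args) {
        WindowFn fn = new WindowFn();
        fn.open();
        System.out.println(fn.apply(3)); // prints "score(3)"
    }
}
```

The key point is that the constructor runs on the client and the object is serialized to the cluster, while open() runs on the TaskManager, which is the only safe place to create something like a native interpreter.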

Re: Flink 1.2 and Cassandra Connector

2017-03-06 Thread Chesnay Schepler
Hello, I believe the cassandra connector is not shading its dependencies properly. This didn't cause issues in the past since Flink used to have a dependency on codahale metrics as well. Please open a JIRA for this issue. Regards, Chesnay On 06.03.2017 11:32, Tarandeep Singh wrote: Hi

Re: Flink UI and records sent

2017-04-05 Thread Chesnay Schepler
Hey Flavio, it's unlikely that the counters skip a record. For the webUI these metrics are transported in 2 different ways: For running tasks they are fetched through the metric system; this provides no guarantee that the final count is ever displayed. For finished tasks the final count is

Re: FileNotFoundException when restoring checkpoint

2017-07-17 Thread Chesnay Schepler
Hello, If I recall correctly savepoints are always self-contained even if incremental checkpointing is enabled. However, this doesn't appear to be documented anywhere. As for the missing file, I'm looping in Stefan who is more knowledgeable about incremental checkpointing (and potentially
