Re: Queryable State

2017-01-25 Thread Nico Kruber
using it seems > pointless. Moreover while removing it I would take a second look at those > > functions: > > KvStateRequestSerializer::deserializeList > > KvStateRequestSerializer.serializeList > > > As I think they are not used at all even right now. Thanks for your time. > > Regar

Re: Inconsistent Results in Event Time based Sliding Window processing

2017-02-14 Thread Nico Kruber
Hi Sujit, this does indeed sound strange and we are not aware of any data loss issues. Are there any exceptions or other errors in the job/taskmanager logs? Do you have a minimal working example? Is it that whole windows are not processed or just single items inside a window? Nico On Tuesday,

Re: Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Nico Kruber
You do not require a plugin, but most probably this dependency was not fetched by Eclipse. Please try a "mvn clean package" in your project and see whether this helps Eclipse. Also, you may create a clean test project with mvn archetype:generate \

Re: Unable to use Scala's BeanProperty with classes

2017-02-14 Thread Nico Kruber
Hi Adarsh, thanks for reporting this. It should be fixed eventually. @Timo: do you have an idea for a work-around or quick-fix? Regards Nico On Tuesday, 14 February 2017 21:11:21 CET Adarsh Jain wrote: > I am getting the same problem when trying to do FlatMap operation on my > POJO class. > >

Re: Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Nico Kruber
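wing dependency that causes the problem: > > <dependency> > <groupId>org.apache.flink</groupId> > <artifactId>flink-test-utils_2.10</artifactId> > <version>1.2.0</version> > <type>test-jar</type> > <scope>test</scope> > </dependency> > > Best, > Flavio > > On Tue, Feb 14, 2017 at 2:51 PM, Nico Kruber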

Re: Inconsistent Results in Event Time based Sliding Window processing

2017-02-14 Thread Nico Kruber
s – A Northgate Public Services Company > <https://www.google.co.in/maps/place/Rave+Technologies/@19.0058078,72.823516 > ,17z/data=!3m1!4b1!4m5!3m4!1s0x3bae17fcde71c3b9:0x1e2a8c0c4a075145!8m2!3d19. > 0058078!4d72.8257047> > > > > Please consider the environment before p

Re: Queryable State

2017-01-16 Thread Nico Kruber
esolve > also my problem :) > > Regards > Dawid Wysakowicz > > 2017-01-13 18:50 GMT+01:00 Nico Kruber <n...@data-artisans.com>: > > Hi Dawid, > > I'll try to reproduce the error in the next couple of days. Can you also > > share > > the value

Re: Running streaming job on every node of cluster

2017-02-28 Thread Nico Kruber
inue this discussion in dev mail list? Nico, what do you > think? > > BR, Evgeny. > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/config.html#configuring-taskmanager-processing-slots > From: Nico Kruber<mailto:n...@data-artisans.

Re: Running streaming job on every node of cluster

2017-02-27 Thread Nico Kruber
Hi Evgeny, I tried to reproduce your example with the following code, having another console listening with "nc -l 12345" env.setParallelism(2); env.addSource(new SocketTextStreamFunction("localhost", 12345, " ", 3)) .map(new MapFunction() {

Re: Running streaming job on every node of cluster

2017-02-27 Thread Nico Kruber
oyment modes allow it. > > BR, Evgeny. > > From: Nico Kruber<mailto:n...@data-artisans.com> > Sent: 27 February 2017 at 20:07 > To: user@flink.apache.org<mailto:user@flink.apache.org> > Cc: Evgeny Kincharov<mailto:evgeny_kincha...@epam.com> > Те

Re: Sliding Windows Processing problem with Kafka Queue Event Time and BoundedOutOfOrdernessGenerator

2017-02-27 Thread Nico Kruber
Hi Sujit, actually, according to https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html#allowed-lateness the sliding window should fire each time for each element arriving late. Did you set the following for your window operator? .window() .allowedLateness() The

Re: Running streaming job on every node of cluster

2017-02-27 Thread Nico Kruber
this may also be a good read: https://ci.apache.org/projects/flink/flink-docs-release-1.2/concepts/runtime.html#task-slots-and-resources On Monday, 27 February 2017 18:40:48 CET Nico Kruber wrote: > What about setting the parallelism[1] to the total number of slots in your > c

Re: Queryable State

2017-01-13 Thread Nico Kruber
Hi Dawid, I'll try to reproduce the error in the next couple of days. Can you also share the value deserializer you use? Also, have you tried even smaller examples in the meantime? Did they work? As a side-note in general regarding the queryable state "sink" using ListState

Re: has insufficient permissions to access it - Error

2017-04-12 Thread Nico Kruber
th? Because I moved the file into my project > and IntelliJ did an autocomplete for the file path > > How can I use a relative file path? Because I work on two different systems > > > Marc > > > On 11.04.2017 at 10:19, Nico Kruber <n...@data-artisans.com> wrote: &g

Re: Aggregation problem.

2017-04-11 Thread Nico Kruber
maxBy() is still a member of org.apache.flink.api.scala.GroupedDataSet in the current sources - what did you upgrade flink to? Also please make sure the new version is used, or - if compiled from sources - try a "mvn clean install" to get rid of old intermediate files. Regards Nico On

Re: Aggregation problem.

2017-04-13 Thread Nico Kruber
/flink-1.2.0/flink-1.2.0-bin-hadoop27-scala_2.11.tgz). I am getting this error in eclipse Neon(3) > > Regards, > Kursat > > -Original Message- > From: Nico Kruber [mailto:n...@data-artisans.com] > Sent: Tuesday, April 11, 2017 3:34 PM > To: user@flink.

Re: questions on custom state with flink window

2017-03-10 Thread Nico Kruber
Hi Sai, 3) If you want to make "Managed Keyed State" queryable, you have to set it as queryable through the API, e.g.: final ValueStateDescriptor<Long> query1State = new ValueStateDescriptor<>("stateName", Long.class); query1State.setQueryable("queryName");

Re: Queryable State

2017-03-13 Thread Nico Kruber
Hi Chet, the following things may create the error you mentioned: * the job ID of the query must match the ID of the running job * the job is not running anymore * the queryableStateName does not match the string given to setQueryable("query-name") * the queried key does not exist (note that you

Re: OutOfMemory error (Direct buffer memory) while allocating the TaskManager off-heap memory

2017-03-06 Thread Nico Kruber
Hi Yassine, Thanks for reporting this. The problem you run into is due to start-local.sh, which we discourage in favour of start-cluster.sh, as the latter resembles real use cases better. In your case, start-local.sh starts a job manager with an embedded task manager but does not parse the task manager

Re: data loss after implementing checkpoint

2017-07-31 Thread Nico Kruber
es and recovers (or is cancelled and > restarted), will my configuration(as to which rules are enabled) still > hold? or do I have to persist the info into a backend? > > On Mon, Jul 10, 2017 at 7:36 PM, Nico Kruber <n...@data-artisans.com> wrote: > > Hi Aftab, > > looks like what

Re: state inside functions

2017-08-03 Thread Nico Kruber
Hi Peter, there's no need to worry about transient members as the operator itself is not serialized - only the state itself, depending on the state back-end. If you want your state to be recovered by checkpoints you should implement the open() method and initialise your state there as in your

Re: json mapper

2017-08-03 Thread Nico Kruber
Hi Peter, I'm no Scala developer but I may be able to help with some concepts: * a static reference used inside a [Map]Function will certainly cause problems when executed in parallel in the same JVM, e.g. a TaskManager with multiple slots, depending on whether this static object is stateful

Re: Event-time and first watermark

2017-08-03 Thread Nico Kruber
Hi Gwenhael, "A Watermark(t) declares that event time has reached time t in that stream, meaning that there should be no more elements from the stream with a timestamp t’ <= t (i.e. events with timestamps older or equal to the watermark)." [1] Therefore, they should be behind the actual event
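The quoted definition can be illustrated with a small plain-Java model. This is only a sketch under assumptions: the class name `WatermarkSketch` and the fixed out-of-orderness bound are made up for illustration and this is not Flink's API. It keeps the watermark a fixed distance behind the largest timestamp seen so far and treats any element with a timestamp t' <= watermark as late.

```java
/** Simplified model of a bounded-out-of-orderness watermark; hypothetical, not Flink's API. */
final class WatermarkSketch {
    private final long maxOutOfOrdernessMs;          // assumed tolerance for out-of-order events
    private long maxSeenTimestamp = Long.MIN_VALUE;  // largest event timestamp observed so far

    WatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** The watermark trails the largest seen timestamp by the configured bound. */
    long currentWatermark() {
        if (maxSeenTimestamp == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing seen yet
        }
        return maxSeenTimestamp - maxOutOfOrdernessMs;
    }

    /** Records an event and returns true if it is late, i.e. timestamp <= current watermark. */
    boolean observe(long eventTimestamp) {
        boolean late = eventTimestamp <= currentWatermark();
        maxSeenTimestamp = Math.max(maxSeenTimestamp, eventTimestamp);
        return late;
    }

    public static void main(String[] args) {
        WatermarkSketch w = new WatermarkSketch(1_000L);
        System.out.println(w.observe(5_000L)); // first event, not late
        System.out.println(w.observe(4_500L)); // out of order but ahead of the watermark (4000)
        System.out.println(w.observe(4_000L)); // equal to the watermark, counts as late
    }
}
```

Because the watermark stays behind the actual event timestamps, moderately out-of-order elements are still accepted.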

Re: Akka Quarantine & Old YARN Versions

2017-08-03 Thread Nico Kruber
Hi Konstantin, I dug through the linked pull requests (of https://issues.apache.org/jira/browse/FLINK-3347) a bit just to notice that the fix-version tag was wrong (should have been 1.2.1, not 1.2.0) but you have that already. In there, it was also mentioned that the quarantine monitor is

Re: Getting JobManager address and port within a running job

2017-08-03 Thread Nico Kruber
Assuming, from your previous email, that you fire up a LocalFlinkMiniCluster: this, afaik, does not process flink-conf.yaml but only the configuration given to it. If you start a "real" flink cluster, e.g. by bin/start-cluster.sh, it will show the behaviour you desired. Nico On Thursday, 3

Re: Akka Quarantine & Old YARN Versions

2017-08-04 Thread Nico Kruber
s configuration > just not documented? > > Cheers, > > Konstantin > > On 03.08.2017 17:11, Nico Kruber wrote: > > Hi Konstantin, > > I dug through the linked pull requests (of > > https://issues.apache.org/jira/browse/FLINK-3347) a bit just to notice &g

Re: replacement for KeyedStream.fold(..) ?

2017-08-03 Thread Nico Kruber
Hi Peter, although unfortunately not documented yet in [1] (rumor has it that that is going to change soon) and without a proper replacement note in the deprecation javadoc, two things come to mind for replacing fold(): * AggregateFunction and

Re: Can't find correct JobManager address, job fails with Queryable state

2017-08-03 Thread Nico Kruber
Hi Biplob, by starting a local environment the way you described, i.e. by using LocalStreamEnvironment.createLocalEnvironmentWithWebUI(conf); you are firing up a LocalFlinkMiniCluster which, by default, has the queryable state server disabled. You can enable it via:

Re: Flink multithreading, CoFlatMapFunction, CoProcessFunction, internal state

2017-08-16 Thread Nico Kruber
Hi Chao, 1) regarding the difference of CoFlatMapFunction/CoProcessFunction, let me quote the javadoc of the CoProcessFunction: "Contrary to the {@link CoFlatMapFunction}, this function can also query the time (both event and processing) and set timers, through the provided {@link Context}.

Re: Aggregation by key hierarchy

2017-08-16 Thread Nico Kruber
Hi Nico, > Thank you . This is pretty much what I am doing , was wondering if there is > a better way. > > If there are 10 dimensions on which I want to aggregate with 2 windows - > this would become about 20 different combinations > > Thank you > Basanth > > On Mon

Re: Question about Global Windows.

2017-08-16 Thread Nico Kruber
Hi Steve, are you sure a GlobalWindows assigner fits your needs? This may be the case if all your events always come in order and you do not ever have overlapping sessions since a GlobalWindows assigner simply puts all events (per key) into a single window (per key). If you have overlapping

Re: Standalone cluster - taskmanager settings ignored

2017-08-14 Thread Nico Kruber
Hi Marc, the master, i.e. JobManager, does not need to know which clients, i.e. TaskManager, are supposed to connect to it. Indeed, only the task managers need to know where to connect to and they will try to establish that connection and re-connect when losing it. Nico On Friday, 11 August

Re: Writing on Cassandra

2017-08-14 Thread Nico Kruber
If I see this correctly in the code, the CassandraSink is using the value of its input stream automatically, so in your case Tuple2> What you want is it to use only Tuple6 without the first

Re: How can I cancel a Flink job safely without a special stop message in the stream?

2017-08-14 Thread Nico Kruber
Hi, have you tried letting your source also implement the StoppableFunction interface as suggested by the SourceFunction javadoc? If a source is stopped, e.g. after identifying some special signal from the outside, it will continue processing all remaining events and the Flink program will

Re: Aggregation by key hierarchy

2017-08-14 Thread Nico Kruber
Hi Basanth, Let's assume you have records of the form Record = {timestamp, country, state, city, value} Then you'd like to create aggregates, e.g. the average, for the following combinations? 1) avg per country 2) avg per state and country 3) avg per city and state and country * You could create
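To make the three key levels concrete, here is a plain-Java sketch over an in-memory list. It is not Flink code and ignores windowing entirely; the class `HierarchyAverages`, the composite-key format with `/`, and the sample records are assumptions made for illustration. In a streaming job each level would correspond to grouping by a different key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical illustration of averaging along a country/state/city key hierarchy. */
final class HierarchyAverages {
    /** Record = {timestamp, country, state, city, value}, as in the message above. */
    static final class Rec {
        final long timestamp;
        final String country;
        final String state;
        final String city;
        final double value;

        Rec(long timestamp, String country, String state, String city, double value) {
            this.timestamp = timestamp;
            this.country = country;
            this.state = state;
            this.city = city;
            this.value = value;
        }
    }

    /** Groups records by the given key and averages their values per group. */
    static Map<String, Double> averageBy(List<Rec> records, Function<Rec, String> key) {
        return records.stream()
                .collect(Collectors.groupingBy(key, Collectors.averagingDouble(r -> r.value)));
    }

    // 1) avg per country
    static Map<String, Double> perCountry(List<Rec> records) {
        return averageBy(records, r -> r.country);
    }

    // 2) avg per state and country
    static Map<String, Double> perState(List<Rec> records) {
        return averageBy(records, r -> r.country + "/" + r.state);
    }

    // 3) avg per city and state and country
    static Map<String, Double> perCity(List<Rec> records) {
        return averageBy(records, r -> r.country + "/" + r.state + "/" + r.city);
    }

    /** Made-up sample data. */
    static List<Rec> sample() {
        return Arrays.asList(
                new Rec(1L, "US", "CA", "SF", 10.0),
                new Rec(2L, "US", "CA", "LA", 20.0),
                new Rec(3L, "US", "NY", "NYC", 30.0),
                new Rec(4L, "DE", "BE", "Berlin", 40.0));
    }

    public static void main(String[] args) {
        System.out.println(perCountry(sample()));
        System.out.println(perState(sample()));
        System.out.println(perCity(sample()));
    }
}
```

Each additional dimension adds another key function, which is why the number of combinations grows quickly.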

Re: One large WindowFunction vs. several smaller ones

2017-08-14 Thread Nico Kruber
Hi Maarten, If the Count-WF is counting the number of events per window and the Diff-WF is just comparing this number to the output of the previous window, then you do not need a WindowFunction for the Diff-WF after all: Just use your Count-WF and plug in a stateful map (also see [1]) afterwards

Re: Distribute crawling of a URL list using Flink

2017-08-14 Thread Nico Kruber
Hi Eranga and Kien, Flink supports asynchronous IO since version 1.2, see [1] for details. You basically pack your URL download into the asynchronous part and collect the resulting string for further processing in your pipeline. Nico [1]

Re: IllegalArgumentException when using elasticsearch as a sink

2017-08-14 Thread Nico Kruber
Just to be sure, can you try flink 1.3.2 which is supposed to fix FLINK-7133 and was released recently? Nico On Monday, 14 August 2017 03:19:06 CEST mingleizhang wrote: > BTW, My elastic search version is 2.3.3, not the jira FLINK-7133 by 1.7.1. > And I found 2.3.3 is not based on asm. My

Re: data loss after implementing checkpoint

2017-07-11 Thread Nico Kruber
.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION); > > after stateBackend, do I need to configure another dir for > checkpoints? How can I set this configuration in main method like I > did for stateBackend ? > > BR, > > On 10 July 2017 at 17:06, Nico Kruber <

Re: Nested Field Expressions with Rows

2017-07-10 Thread Nico Kruber
Can you show a minimal example of the query you are trying to run? Maybe Timo or Fabian (cc'd) can help. Nico On Friday, 7 July 2017 23:09:09 CEST Joshua Griffith wrote: > Hello, > > When using nested field expressions like “Account.Id" with nested rows, I > get the following error, “This type

Re: data loss after implementing checkpoint

2017-07-10 Thread Nico Kruber
Hi Aftab, looks like what you want is either an externalized checkpoint with RETAIN_ON_CANCELLATION mode [1] or a savepoint [2]. Ordinary checkpoints are deleted when the job is cancelled and only serve as a fault tolerance layer in case something goes wrong, i.e. machines fail, so that the

Re: problems starting the training exercise TaxiRideCleansing on local cluster

2017-07-10 Thread Nico Kruber
Hi Günter, unfortunately, I cannot reproduce your error. This is what I did (following http://training.data-artisans.com/devEnvSetup.html): * clone and build the flink-training-exercises project: git clone https://github.com/dataArtisans/flink-training-exercises.git cd flink-training-exercises

Re: System properties when submitting flink job to YARN Session

2017-07-10 Thread Nico Kruber
Hi Jins, I'm not sure whether you can define a system property, but you can include it in the program arguments of "flink run [OPTIONS] " You may also be able to define system properties but these are probably only valid in your main() function executed within the flink run script, not any

Re: Unable to make mapWithState work correctly

2017-07-25 Thread Nico Kruber
Hi Victor, from a quick look at your code, I think, you set up everything just fine (I'm not too familiar with Scala though) but the problem is probably somewhere else: As [1] states (a bit hidden maybe), checkpoints are only used to recover from failures, e.g. if you run your job on 2 task

Re: How can I set charset for flink sql?

2017-07-25 Thread Nico Kruber
Please, for the sake of making your email searchable, do not post stack traces as screenshots but rather text into your email. On Tuesday, 25 July 2017 12:18:56 CEST 程骥 wrote: > My sql like this(contain a Chinese word) > > Get exception when I submit the job to cluster. > > > > Is there

Re: Is joined stream WindowedStream?

2017-07-27 Thread Nico Kruber
Hi Wei, what do you mean by "windowedStream"? The result of dataStream.join(otherStream).where().equalTo() expects a window to be specified. In each window, based on the time and window characteristics you defined, both sources will collect elements that fit into the window and, at its end,

Re: a lot of connections in state "CLOSE_WAIT"

2017-06-30 Thread Nico Kruber
Hi XiangWei, this could be a resource leak, i.e. a socket not getting closed, but I was unable to reproduce that behaviour. Maybe Chesnay (cc'd) has an idea on how/where this may happen. Can you tell us a bit more on what you were doing / how the web interface was used? Is there a way to

Re: Writing groups of Windows to files

2017-06-30 Thread Nico Kruber
Looks like you are missing a window *function* that processes the window. From [1]: stream .keyBy(...) <- keyed versus non-keyed windows .window(...) <- required: "assigner" [.trigger(...)] <- optional: "trigger" (else default trigger)

Re: Great number of jobs and numberOfBuffers

2017-08-18 Thread Nico Kruber
rg] > Sent: Thursday, 17 August 2017 11:24 > To: Ufuk Celebi <u...@apache.org> > Cc: Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>; > user@flink.apache.org; Nico Kruber <n...@data-artisans.com> Subject: Re: > Great number of jobs and numberOfBuffers >

Re: Use Single Sink For All windows

2017-06-08 Thread Nico Kruber
How about using asynchronous I/O operations? https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html Nico On Tuesday, 6 June 2017 16:22:31 CEST rhashmi wrote: > because of parallelism i am seeing db contention. Wondering if i can merge > sink of multiple windows

Re: Add custom configuration files to TMs classpath on YARN

2017-06-21 Thread Nico Kruber
A workaround may be to use the DistributedCache. It apparently is not documented much but the JavaDoc mentions roughly how to use it: https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/ExecutionEnvironment.java#L954 /** * Registers a file at the

Re: Can't get my job restarted on job manager failures

2017-06-20 Thread Nico Kruber
cala:117) > at > scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599) at > scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) > at > scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597) > at >

Re: Can't get my job restarted on job manager failures

2017-06-20 Thread Nico Kruber
Hi Mike, have you configured zookeeper [1]? afaik, it is required for a high-availability (YARN) session and is used to store JobManager state. Without it, a recovery would not know what to recover from. Nico [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/

Re: Kafka watermarks

2017-06-20 Thread Nico Kruber
Can you clarify a bit more on what you want to achieve? Also, what is "BOOTE"? Nico On Tuesday, 20 June 2017 13:45:06 CEST nragon wrote: > When consuming from kafka should we use BOOTE inside consumer or after? > Thanks > > > > -- > View this message in context: >

Re: Kafka and Flink integration

2017-06-20 Thread Nico Kruber
No, this is only necessary if you want to register a custom serializer itself [1]. Also, in case you are wondering about registerKryoType() - this is only needed as a performance optimisation. What exactly is your problem? What are you trying to solve? (I can't read JFR files here, and from

Re: Kafka and Flink integration

2017-06-20 Thread Nico Kruber
mprove > this process. For instance, tuples vs general class types. Do you know if > it's worth it to map a custom object into tuple just for de/serialization > process? > > According to jfr analysis, kryo methods are hit a lot. > > [cid:image003.jpg@01D2E9E1.26D2D370] > >

Re: Access to time in aggregation, or aggregation in ProcessWindowFunction?

2017-06-20 Thread Nico Kruber
Hi William, I'm not quite sure what you are trying to achieve... What constitutes a "new event"? Is this based on some key? If so, you may group on that key, create a window and use a custom trigger [1] instead where you can react in onElement() and set up an event time timer for the first one

Re: Can't get my job restarted on job manager failures

2017-06-20 Thread Nico Kruber
453 INFO > org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking > the coordinator 2147483480 dead. > > > > > > Kind Regards, > Mike Pryakhin > > > On 20 Jun 2017, at 18:27, Nico Kruber <n...@data-artisans.com> wrote: > > &

Re: Flink and swapping question

2017-05-29 Thread Nico Kruber
FYI: taskmanager.sh sets this parameter but also states the following: # Long.MAX_VALUE in TB: This is an upper bound, much less direct memory will be used TM_MAX_OFFHEAP_SIZE="8388607T" Nico On Monday, 29 May 2017 15:19:47 CEST Aljoscha Krettek wrote: > Hi Flavio, > > Is this running on
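The value in that comment can be checked with a line of arithmetic, assuming the `T` suffix means 2^40 bytes as it does for JVM size options. The class name `OffHeapBound` is made up for this sketch.

```java
/** Checks the "Long.MAX_VALUE in TB" comment; class name is hypothetical. */
final class OffHeapBound {
    /** Largest whole number of 2^40-byte units that fits into a signed 64-bit byte count. */
    static long longMaxInTebibytes() {
        return Long.MAX_VALUE >> 40; // (2^63 - 1) / 2^40, rounded down
    }

    public static void main(String[] args) {
        System.out.println(longMaxInTebibytes() + "T"); // prints 8388607T
    }
}
```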

Re: Gelly and degree filtering

2017-05-30 Thread Nico Kruber
Does Martin's answer to a similar thread help? https://lists.apache.org/thread.html/000af2fb17a883b60f4a2359ebbeca42e3160c2167a88995c2ee28c2@%3Cuser.flink.apache.org%3E On Monday, 29 May 2017 19:38:20 CEST Martin Junghanns wrote: > Hi Ali :) > > You could compute the degrees beforehand (e.g.

Re: How can I increase Flink managed memory?

2017-05-30 Thread Nico Kruber
By default, Flink allocates a fraction of 0.7 (taskmanager.memory.fraction) of the free memory (total memory configured via taskmanager.heap.mb minus memory used for network buffers) for its managed memory. An absolute value may be set using taskmanager.memory.size (overrides the fraction
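As a worked example of that rule, the following sketch computes the managed memory from the three settings named above. It is a simplification under assumptions: the method name, the rounding, and the sample numbers (1024 MB heap, 64 MB of network buffers) are made up, and the real TaskManager measures free heap at startup rather than using the configured value directly.

```java
/** Simplified arithmetic for the managed-memory rule; hypothetical helper, not Flink code. */
final class ManagedMemorySketch {
    /**
     * @param heapMb           value of taskmanager.heap.mb
     * @param networkBuffersMb memory taken by network buffers (assumed input)
     * @param fraction         value of taskmanager.memory.fraction (default 0.7)
     * @param absoluteSizeMb   value of taskmanager.memory.size, or a value <= 0 if unset
     */
    static long managedMemoryMb(long heapMb, long networkBuffersMb,
                                double fraction, long absoluteSizeMb) {
        if (absoluteSizeMb > 0) {
            return absoluteSizeMb; // an absolute size overrides the fraction
        }
        return Math.round(fraction * (heapMb - networkBuffersMb));
    }

    public static void main(String[] args) {
        // 0.7 * (1024 - 64) = 672
        System.out.println(managedMemoryMb(1024, 64, 0.7, -1));
        // absolute size wins over the fraction
        System.out.println(managedMemoryMb(1024, 64, 0.7, 512));
    }
}
```

Raising either the heap size or the fraction therefore increases managed memory, while an absolute size pins it.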

Re: Does job restart resume from last known internal checkpoint?

2017-06-02 Thread Nico Kruber
Hi Moiz, didn't Timo's answer cover your questions? see here in case you didn't receive it: https://lists.apache.org/thread.html/a1a0d04e7707f4b0ac8b8b2f368110b898b2ba11463d32f9bba73968@%3Cuser.flink.apache.org%3E Nico On Thursday, 1 June 2017 20:30:59 CEST Moiz S Jinia wrote: > Bump.. > >

Re: Deterministic Update

2017-06-08 Thread Nico Kruber
yes - you need to implement the CheckpointedFunction interface. (as an example: our BucketingSink uses this) Nico On Thursday, 8 June 2017 06:44:10 CEST rhashmi wrote: > Is there any possibility to trigger sink operator on completion of > checkpoint? > > > > -- > View this message in context:

Re: Flink on kubernetes -> shell deployment

2017-06-08 Thread Nico Kruber
If you have access to the web dashboard, you probably have access to the Jobmanager in general and can submit jobs from your command line by passing flink run --jobmanager ... I've looped in Patrick in case I am missing something kubernetes-specific here. Nico On Wednesday, 7 June 2017

Re: Error running Flink job in Yarn-cluster mode

2017-06-08 Thread Nico Kruber
I'm no expert here, but are 4 yarn containers/task managers (-yn 4) not too many for 3 data nodes (=3 dn?)? also, isn't the YARN UI reflecting its own jobs, i.e. running flink, as opposed to running the actual flink job? or did you mean that the flink web ui (through yarn) showed the submitted

Re: Queryable State

2017-06-14 Thread Nico Kruber
Hi Chet, you should not see a org.apache.flink.runtime.query.UnknownKvStateKeyGroupLocation when querying an existing(!) key. However, if you query a key the non-registered TaskManager is responsible for, I suppose this is the exception you will get. Unfortunately, the queryable state API

Re: Rest API cancel-with-savepoint: 404s when passing path as target-directory

2017-09-20 Thread Nico Kruber
Hi Emily, I'm not familiar with the details of the REST API either but if this is a problem with the proxy, maybe it is already interpreting the encoded URL and passes it on un-encoded - have you tried encoding the path again? That is, encoding the percent-signs: http://
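What "encoding the path again" produces can be shown with the JDK's `URLEncoder`. This is only a sketch: the savepoint path `/tmp/savepoints` and the class name are hypothetical, and whether a given proxy actually needs the doubly encoded form is an assumption to be tested.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Shows single vs. double percent-encoding of a path; names are hypothetical. */
final class DoubleEncodeSketch {
    static String encodeOnce(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** Encodes the already-encoded string, so every '%' becomes "%25". */
    static String encodeTwice(String s) {
        return encodeOnce(encodeOnce(s));
    }

    public static void main(String[] args) {
        String path = "/tmp/savepoints"; // made-up target directory
        System.out.println(encodeOnce(path));  // %2Ftmp%2Fsavepoints
        System.out.println(encodeTwice(path)); // %252Ftmp%252Fsavepoints
    }
}
```

If the proxy decodes once before forwarding, the doubly encoded form arrives at the server as the singly encoded one.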

Re: Savepoints and migrating value state data types

2017-09-21 Thread Nico Kruber
Hi Marc, I assume you have set a UID for your CoProcessFunction as described in [1]? Also, can you provide the Flink version you are working with and the serializer you are using? If you have the UID set, your strategy seems to be the same as proposed by [2]: "Although it is not possible to

Re: on Wikipedia Edit Stream example

2017-09-21 Thread Nico Kruber
Hi Haibin, if you execute the program as in the Wiki edit example [1] from mvn as given or from the IDE, a local Flink environment will be set up which is not accessible from the outside by default. This is done by the call to StreamExecutionEnvironment.getExecutionEnvironment(); which also

Re: Question about concurrent checkpoints

2017-09-21 Thread Nico Kruber
Hi Narendra, according to [1], even with asynchronous state snapshots (see [2]), a checkpoint is only complete after all sinks have received the barriers and all (asynchronous) snapshots have been processed. Since, if the number of concurrent checkpoints is 0, no checkpoint barriers will be

Re: Question about concurrent checkpoints

2017-09-21 Thread Nico Kruber
On Thursday, 21 September 2017 20:08:01 CEST Narendra Joshi wrote: > Nico Kruber <n...@data-artisans.com> writes: > > according to [1], even with asynchronous state snapshots (see [2]), a > > checkpoint is only complete after all sinks have received the barriers and

Re: FLINK-6117 issue work around

2017-09-06 Thread Nico Kruber
I looked at the commit you cherry-picked and nothing in there explains the error you got. This rather sounds like something might be mixed up between (remaining artefacts of) flink 1.3 and 1.2. Can you verify that nothing of your flink 1.3 tests remains, e.g. running JobManager or TaskManager

Re: Building scala examples

2017-09-25 Thread Nico Kruber
Hi Michael, from what I see, Java and Scala examples reside in different packages, e.g. * org.apache.flink.streaming.scala.examples.async.AsyncIOExample vs. * org.apache.flink.streaming.examples.async.AsyncIOExample A quick run on the Flink 1.3. branch revealed flink-examples-

Re: Using HiveBolt from storm-hive with Flink-Storm compatibility wrapper

2017-09-25 Thread Nico Kruber
Hi Federico, I also did not find any implementation of a Hive sink, nor many details on this topic in general. Let me forward this to Timo and Fabian (cc'd) who may know more. Nico On Friday, 22 September 2017 12:14:32 CEST Federico D'Ambrosio wrote: > Hello everyone, > > I'd like to use the

Re: Can i use Avro serializer as default instead of kryo serializer in Flink for scala API ?

2017-09-25 Thread Nico Kruber
Hi Shashank, enabling Avro as the default de/serializer for Flink should be as simple as the following, according to [1] val env = StreamExecutionEnvironment.getExecutionEnvironment env.getConfig.enableForceAvro() I am, however, no expert on this and the implications regarding the use of Avro

Re: high-availability.jobmanager.port vs jobmanager.rpc.port

2017-09-25 Thread Nico Kruber
Hi Elias, indeed that looks strange but was introduced with FLINK-3172 [1] with an argument about using the same configuration key (as opposed to having two different keys as mentioned) starting at https://issues.apache.org/jira/browse/FLINK-3172?focusedCommentId=15091940#comment-15091940

Re: History Server

2017-09-25 Thread Nico Kruber
Hi Elias, in theory, it could be integrated into a single web interface, but this was not done so far. I guess the main reason for keeping it separate was probably to have a better separation of concerns as the history server is actually independent of the current JobManager execution and

Re: Stand alone blob server

2017-08-23 Thread Nico Kruber
Hi Anugrah, you can track the progress at the accompanying jira issue: https://issues.apache.org/jira/browse/FLINK-6916 Currently, roughly half of the tasks are done with a few remaining in PR review stage. Note that the actual implementation differs a bit from what was proposed in FLIP 19

Re: Aggregation by key hierarchy

2017-08-21 Thread Nico Kruber
ithout restarting ? > > On Wed, Aug 16, 2017 at 4:37 AM, Nico Kruber <n...@data-artisans.com> wrote: > > [back to the ml...] > > > > also including your other mail's additional content... > > > > > I have been able to do this by the following and repea

Re: Flink multithreading, CoFlatMapFunction, CoProcessFunction, internal state

2017-08-21 Thread Nico Kruber
tion, for maintaining custom states of my program logic I guess I > cannot use it. > > > Thank you, > Chao > > On 08/16/2017 03:31 AM, Nico Kruber wrote: > > Hi Chao, > > > > 1) regarding the difference of CoFlatMapFunction/CoProcessFunction, let me > > qu

Re: BlobCache and its functioning

2017-08-31 Thread Nico Kruber
Hi Federico, 1) Which version of Flink are you using? 2) Can you also share the JobManager log? 3) Why do you think, Flink is stuck at the BlobCache? Is it really blocked, or do you still have CPU load? Can you post stack traces of the TaskManager (TM) and JobManager processes when you think

Re: BlobCache and its functioning

2017-08-31 Thread Nico Kruber
st/127.0.0.1:43268 > 2017-08-30 16:01:58,162 DEBUG org.apache.flink.runtime.blob.BlobClient - GET content addressable BLOB > 6bde2f7a709181065c6710c2252a5846f361ad68 from /127.0.0.1:35743 > > 3) There actually was CPU load, but I thought Flink was stuck

Re: Sink -> Source

2017-09-01 Thread Nico Kruber
Hi Philipp, afaik, Flink doesn't offer this out-of-the-box. You could either hack something as suggested or use Kafka to glue different jobs together. Both may affect exactly/at-least once guarantees, however. Also refer to

Re: Building scala examples

2017-09-27 Thread Nico Kruber
to create and run my own streaming program. They only contain java > compiled class, if I am not mistaken. > > Let me try to create a scala example with similar build procedure. > > Thanks! > > > On Mon, Sep 25, 2017 at 10:41 PM, Nico Kruber <n...@data-artisans.com>

Re: Flink HA Zookeeper Connection Timeout

2017-11-13 Thread Nico Kruber
Hi Sathya, have you checked this yet? https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html I'm no expert on the HA setup, have you also tried Flink 1.3 just in case? Nico On Wednesday, 8 November 2017 04:02:47 CET Sathya Hariesh Prakash

Re: Flink 1.2.0->1.3.2 TaskManager reporting to JobManager

2017-11-28 Thread Nico Kruber
Hi Regina, can you explain a bit more on what you are trying to do and how this is set up? I quickly tried to reproduce locally by starting a cluster and could not see this behaviour. Also, can you try to increase the loglevel to INFO and see whether you see anything suspicious in the logs?

Re: Getting java.lang.ClassNotFoundException: for protobuf generated class

2017-11-22 Thread Nico Kruber
r -tf target/program.jar | grep MeasurementTable shows the class is > present, are there other dependencies missing? You may need to add runtime > dependencies into your pom or gradle.build file. > > On Tue, Nov 21, 2017 at 2:28 AM, Nico Kruber <n...@data-artisans.com> wro

Re: Correlation between data streams/operators and threads

2017-11-22 Thread Nico Kruber
t is used, and 3 are free. > > c) Attached > > d) Attached > > e) I'll try the debug mode in Eclipse. > > Thanks, > Shailesh > > On Fri, Nov 17, 2017 at 1:52 PM, Nico Kruber <n...@data-artisans.com> wrote: > > regarding 3. > > a) The taskmanager logs

Re: Flink 1.4.0 can not override JAVA_HOME for single-job deployment on YARN

2017-12-14 Thread Nico Kruber
Hi, are you running Flink in a JRE >= 8? We dropped Java 7 support for Flink 1.4. Nico On 14/12/17 12:35, 杨光 wrote: > Hi, > I am using flink single-job mode on YARN. After I upgrade flink > version from 1.3.2 to 1.4.0, the parameter > "yarn.taskmanager.env.JAVA_HOME" doesn’t work as before.

Re: ProgramInvocationException: Could not upload the jar files to the job manager / No space left on device

2017-12-14 Thread Nico Kruber
Hi Regina, judging from the exception you posted, this is not about storing the file in HDFS, but a step before that where the BlobServer first puts the incoming file into its local file system in the directory given by the `blob.storage.directory` configuration property. If this property is not

Re: Flink 1.4 with cassandra-connector: Shading error

2017-12-19 Thread Nico Kruber
Hi Dominik, nice assessment of the issue: in the version of the cassandra-driver we use there is even a comment about why: try { // prevent this string from being shaded Class.forName(String.format("%s.%s.channel.Channel", "io", "netty")); shaded = false; } catch (ClassNotFoundException

Re: Correlation between data streams/operators and threads

2017-11-17 Thread Nico Kruber
regarding 3. a) The taskmanager logs are missing, are there any? b) Also, the JobManager logs say you have 4 slots available in total - is this enough for your 5 devices scenario? c) The JobManager log, however, does not really reveal what it is currently doing, can you set the log level to

Re: Flink takes too much memory in record serializer.

2017-11-14 Thread Nico Kruber
We're actually also trying to have the serializer stateless in future and may be able to remove the intermediate serialization buffer which is currently growing on heap before we copy the data into the actual target buffer. This intermediate buffer grows and is pruned after serialization if it

Re: Streaming : a way to "key by partition id" without redispatching data

2017-11-13 Thread Nico Kruber
Hi Gwenhaël, several functions in Flink require keyed streams because they manage their internal state by key. These keys, however, should be independent of the current execution and its parallelism so that checkpoints may be restored to different levels of parallelism (for re-scaling, see

Re: JobManager shows TaskManager was lost/killed while TaskManger Process is still running and the network is OK.

2017-11-13 Thread Nico Kruber
From what I read in [1], simply add JVM options to env.java.opts as you would when you start a Java program yourself, so setting "-XX:+UseG1GC" should enable G1. Nico [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#common-options On Friday, 15 September

Re: Getting java.lang.ClassNotFoundException: for protobuf generated class

2017-11-21 Thread Nico Kruber
Hi Shankara, sorry for the late response, but honestly, I cannot think of a reason why some of your program's classes (using only a single jar file) are found while others are not, except for the class not being in the jar. Or there's some class loader issue in the Flink Beam runner (which I

Re: Negative values using latency marker

2017-11-03 Thread Nico Kruber
Hi Tovi, if I see this correctly, the LatencyMarker gets its initial timestamp during creation at the source and the latency is reported as a metric at a sink by comparing the initial timestamp with the current time. If the clocks between the two machines involved diverge, e.g. the sink's clock
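A small arithmetic sketch shows how diverging clocks can yield a negative value. The class name, the 5 ms transit time and the 20 ms clock offset are assumptions chosen for illustration; only the subtraction itself mirrors the mechanism described above.

```java
/** Illustrates negative latency under clock skew; all numbers are made up. */
final class LatencySkewSketch {
    /** Latency as a sink would compute it: its own clock minus the marker's creation timestamp. */
    static long reportedLatencyMs(long markerTimestampMs, long sinkClockNowMs) {
        return sinkClockNowMs - markerTimestampMs;
    }

    public static void main(String[] args) {
        long markerTimestampMs = 1_000L; // source clock when the marker was created
        long trueTransitMs = 5L;         // actual time spent between source and sink
        long sinkClockOffsetMs = -20L;   // sink clock runs 20 ms behind the source clock
        long sinkClockNowMs = markerTimestampMs + trueTransitMs + sinkClockOffsetMs;
        System.out.println(reportedLatencyMs(markerTimestampMs, sinkClockNowMs)); // prints -15
    }
}
```

Whenever the sink clock lags by more than the true transit time, the reported latency drops below zero.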

Re: Incremental checkpointing documentation

2017-11-03 Thread Nico Kruber
Hi Elias, let me answer the questions to the best of my knowledge, but in general I think this is as expected. (Let me give a link to the docs explaining the activation [1] for other readers first.) On Friday, 3 November 2017 01:11:52 CET Elias Levy wrote: > What is the interaction of

Re: Making external calls from a FlinkKafkaPartitioner

2017-11-03 Thread Nico Kruber
Hi Ron, imho your code should be fine (except for a potential visibility problem on the changes of the non-volatile partitionMap member, depending on your needs). The #open() method should be called (once) for each sink initialization (according to the javadoc) and then you should be fine with

Re: Negative values using latency marker

2017-11-06 Thread Nico Kruber
nd regards, > Tovi > -Original Message- > From: Nico Kruber [mailto:n...@data-artisans.com] > Sent: Friday, 03 November 2017 15:22 > To: user@flink.apache.org > Cc: Sofer, Tovi [ICG-IT] <ts72...@imceu.eu.ssmb.com> > Subject: Re: Negative values using latency m
