Re: Help using HBase with Flink 1.1.4

2017-01-16 Thread Ted Yu
Logged FLINK-5517 for upgrading hbase version to 1.3.0 On Mon, Jan 16, 2017 at 5:26 PM, Ted Yu wrote: > hbase uses Guava 12.0.1 and Flink uses 18.0 where Stopwatch.()V is > no longer accessible. > HBASE-14963 removes the use of Stopwatch at this location. > > hbase 1.3.0 RC

Re: Help using HBase with Flink 1.1.4

2017-01-16 Thread Ted Yu
hbase uses Guava 12.0.1 and Flink uses 18.0 where Stopwatch.()V is no longer accessible. HBASE-14963 removes the use of Stopwatch at this location. hbase 1.3.0 RC has passed voting period. Please use 1.3.0 where you wouldn't see the IllegalAccessError On Mon, Jan 16, 2017 at 4:50 PM, Giuliano

Help using HBase with Flink 1.1.4

2017-01-16 Thread Giuliano Caliari
Hello, I'm trying to use HBase on one of my stream transformations and I'm running into the Guava/Stopwatch dependency problem java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator Reading on the

Three input stream operator and back pressure

2017-01-16 Thread Dmitry Golubets
Hi, there are only *two *interfaces defined at the moment: *OneInputStreamOperator* and *TwoInputStreamOperator.* Is there any way to define an operator with arbitrary number of inputs? My another concern is how to maintain *backpressure *in the operator? Let's say I read events from two Kafka

Re: Fault tolerance guarantees of Elasticsearch sink in flink-elasticsearch2?

2017-01-16 Thread Andrew Roberts
Hi Gordon, Thanks for getting back to me. The ticket looks good, but I’m going to need to do something similar for our homegrown sinks. It sounds like just having the affected sinks participate in checkpointing is enough of a solution - is there anything special about `SinkFunction[T]`

Re: Measuring Execution Time

2017-01-16 Thread Charith Wickramarachchi
Thanks very much. Regards, Charith On Mon, Jan 16, 2017 at 12:49 PM, Ufuk Celebi wrote: > This is exposed via the REST API under: > > /jobs/:jobid/vertices/:vertexid/subtasktimes > > The SubtasksTimesHandler serves this endpoint. > > – Ufuk > > > On Mon, Jan 16, 2017 at 6:20

Re: Measuring Execution Time

2017-01-16 Thread Ufuk Celebi
This is exposed via the REST API under: /jobs/:jobid/vertices/:vertexid/subtasktimes The SubtasksTimesHandler serves this endpoint. – Ufuk On Mon, Jan 16, 2017 at 6:20 PM, Charith Wickramarachchi wrote: > Thanks very much. I noticed the times recorded in the web

Re: How to read from a Kafka topic from the beginning

2017-01-16 Thread Matthias J. Sax
-BEGIN PGP SIGNED MESSAGE- Hash: SHA512 I would put this differently: "auto.offset.reset" policy is only used, if there are no valid committed offsets for a topic. See here: http://stackoverflow.com/documentation/apache-kafka/5449/consumer-groups - -and-offset-management (don't be

Re: How to read from a Kafka topic from the beginning

2017-01-16 Thread Jonas
You also need to have a new for this to work. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-read-from-a-Kafka-topic-from-the-beginning-tp3522p11087.html Sent from the Apache Flink User Mailing List archive. mailing list archive at

Re: Can serialization be disabled between chains?

2017-01-16 Thread Ufuk Celebi
+1 to what Fabian said. Regarding the memory consumption: Flink's back pressure mechanisms also depends on this, because the availability of (network) buffers determines how fast operator can produce data. If no buffers are available, the producing operator will slow down. On Mon, Jan 16, 2017 at

Re: Restart the job from a checkpoint

2017-01-16 Thread Ufuk Celebi
Yes, exactly. This is a little cumbersome at the moment, but there are plans to improve this after 1.2 is released. – Ufuk On 16 January 2017 at 16:33:49, tao xiao (xiaotao...@gmail.com) wrote: > Hi Ufuk, > > Thank you for the reply. I want to know what the difference is between >

Re: Regarding caching the evicted elements and re-emitting them to the next window

2017-01-16 Thread Shaoxuan Wang
Hi Abdul, You may want to check out FLIP13 "side output" https://goo.gl/6KSYd0 . Once we have this feature, you should be able to collect the data to the external distributed storage, and use these data later on demand. BTW, can you explain your use case in more details, such that people here may

Re: Restart the job from a checkpoint

2017-01-16 Thread tao xiao
Hi Ufuk, Thank you for the reply. I want to know what the difference is between state.backend.fs.checkpoint.dir and state.checkpoints.dir in this case? Does state.checkpoint.dir store the metadata that points to the checkpoint that is stored in state.backend.fs.checkpoint.dir? On Mon, 16 Jan

Re: Apache Flink 1.1.4 - Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException

2017-01-16 Thread Vasiliki Kalavri
Hi Miguel, thank you for opening the issue! Changes/improvements to the documentation are also typically handled with JIRAs and pull requests [1]. Would you like to give it a try and improve the community detection docs? Cheers, -Vasia. [1]:

Re: Queryable State

2017-01-16 Thread Dawid Wysakowicz
Hi Nico, Ufuk, Thanks for diving into this issue. @Nico I don't think that's the problem. The code can be exactly reproduced in java. I am using other constructor for ListDescriptor than you did: You used: > public ListStateDescriptor(String name, TypeInformation typeInfo) > While I used: >

Re: Can serialization be disabled between chains?

2017-01-16 Thread Dmitry Golubets
First issue is not a problem with idiomatic Scala - we make all our data objects immutable. Second.. yeah, I guess it makes sense. Thanks for clarification. Best regards, Dmitry On Mon, Jan 16, 2017 at 1:27 PM, Fabian Hueske wrote: > One of the reasons is to ensure that data

Re: Can serialization be disabled between chains?

2017-01-16 Thread Fabian Hueske
One of the reasons is to ensure that data cannot be modified after it left a thread. A function that emits the same object several times (in order to reduce object creation & GC) might accidentally modify emitted records if they would be put as object in a queue. Moreover, it is easier to control

Re: Can serialization be disabled between chains?

2017-01-16 Thread Dmitry Golubets
Hi Ufuk, Do you know what's the reason for serialization of data between different threads? Also, thanks for the link! Best regards, Dmitry On Mon, Jan 16, 2017 at 1:07 PM, Ufuk Celebi wrote: > Hey Dmitry, > > this is not possible if I'm understanding you correctly. > > A

Re: Can serialization be disabled between chains?

2017-01-16 Thread Ufuk Celebi
Hey Dmitry, this is not possible if I'm understanding you correctly. A task chain is executed by a single task thread and hence it is not possible to continue processing before the record "leaves" the thread, which only happens when the next task thread or the network stack consumes it. Hand

Re: Queryable State

2017-01-16 Thread Nico Kruber
Hi Dawid, regarding the original code, I couldn't reproduce this with the Java code I wrote and my guess is that the second parameter of the ListStateDescriptor is wrong: .asQueryableState( "type-time-series-count", new

Re: Queryable State

2017-01-16 Thread Ufuk Celebi
Hey Dawid! I talked offline with Nico last week and he took this over. He also suggested to remove the list queryable state variant altogether which makes a lot of sense to me (at least with the current state of things). @Nico: could you open an issue for it? Nico also found a difference in your

Re: Apache Flink 1.1.4 - Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException

2017-01-16 Thread Miguel Coimbra
Hello, I created the JIRA issue at: https://issues.apache.org/jira/browse/FLINK-5506 Is it possible to submit suggestions to the documentation? If so, where can I do so? I actually did this based on the example at this page (possible Flink versions aside):

Re: Restart the job from a checkpoint

2017-01-16 Thread Ufuk Celebi
Hey! This is possible with the upcoming 1.2 version of Flink (also in the current snapshot version): https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/checkpoints.html#externalized-checkpoints You have to manually activate it via the checkpoint config (see docs). Ping me if you

Re: Events are assigned to wrong window

2017-01-16 Thread Nico
Hi Aljoscha, is was able to identify the root cause of the problem. It is my first map function using the ValueState. But first, the assignTimestampsAndWatermarks() is called after the connector to Kafka is generated: FlinkKafkaConsumer09 carFlinkKafkaConsumer09 = new

Re: Measuring Execution Time

2017-01-16 Thread Ufuk Celebi
Unfortunately no. You only get the complete execution time after the job has finished. What you can do is browse to the web interface and check the runtime for each operator there (asumming that each iterative process is a separate operator). Does this help? On Mon, Jan 16, 2017 at 7:36 AM,

Re: 答复: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-16 Thread Fabian Hueske
A user reported that outer joins on the Table API and SQL compute wrong results: https://issues.apache.org/jira/browse/FLINK-5498 2017-01-15 20:23 GMT+01:00 Till Rohrmann : > I found two problematic issues with Mesos HA mode which breaks it: > >

Re: State in flink jobs

2017-01-16 Thread Stephan Ewen
Hi! State is only persisted as part of checkpoints or savepoints. If both are not used, then state is not persistent across restarts. Stephan On Mon, Jan 16, 2017 at 8:47 AM, Janardhan Reddy < janardhan.re...@olacabs.com> wrote: > Hi, > > Is the value state persisted across job

Re: How to get help on ClassCastException when re-submitting a job

2017-01-16 Thread Stephan Ewen
Hi! I think Yury pointed out the correct diagnosis. Caching the classes across multiple jobs in the same session can cause these types of issues. For YARN single-job deployments, Flink 1.2 will not to any dynamic classloading any more, but start with everything in the application classpath. For

Re: How to get help on ClassCastException when re-submitting a job

2017-01-16 Thread Ufuk Celebi
@Giuliano: any updates? Very curious to figure out what's causing this. As Fabian said, this is most likely a class loading issue. Judging from the stack trace, you are not running with YARN but a standalone cluster. Is that correct? Class loading wise nothing changed between Flink 1.1 and Flink

Re: Kafka KeyedStream source

2017-01-16 Thread Fabian Hueske
Hi Niels, I think the biggest problem for keyed sources is that Flink must be able to co-locate key-partitioned state with the pre-partitioned data. This might work, if the key is the partition ID, i.e, not the original key attribue that was hashed to assign events to partitions. Flink could

Re: Strategies for Complex Event Processing with guaranteed data consistency

2017-01-16 Thread Kathleen Sharp
Hi Fabian, A case consists of all events sharing the same case id. This id is what we initially key the stream by. The order of these events is the trace. For example, caseid: case1, consisting of event1, event2, event3. Start time 11:00, end 11:05, run time 5 minutes caseid: case12, consisting