Hi, new Flink user here!
I found a discussion on user@flink.apache.org about using DynamoDB as a sink.
However, as noted, sinks have an at-least-once guarantee, so your operations
must be idempotent.
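To make the idempotency point concrete, here is a minimal, self-contained Java sketch. The "DynamoDB table" is simulated with a HashMap and the names are illustrative, not a real AWS API; the point is only that a keyed upsert replayed after a failure leaves the store unchanged:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: an at-least-once sink may deliver the same record twice after a
// failure. If each write is a keyed "put" (an upsert on a primary key),
// replaying a record is harmless because it overwrites with the same value.
public class IdempotentSink {
    private final Map<String, String> store = new HashMap<>();

    // Idempotent: writing (key, value) twice equals writing it once.
    public void upsert(String key, String value) {
        store.put(key, value);
    }

    public String get(String key) {
        return store.get(key);
    }

    public int size() {
        return store.size();
    }
}
```

A non-keyed operation such as "increment a counter" would not have this property, which is why replays then require deduplication on the sink side.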
However, another way to go about this (and correct me if I'm wrong) is to write
the state to the
I am trying to use an AWS EMR YARN cluster where the Flink code runs. In
one of the apply window functions, I try to set some values in Redis and it
fails. I have tried to access the same Redis with no Flink code and get/set
works, but from Flink I run into this exception. Any inputs on what might be
Hi Timur,
Great! A bootstrap action for Flink is good for AWS users. I think the
bootstrap action scripts would be placed in the `flink-contrib` directory.
If you want, one of the people in the Flink PMC will assign FLINK-1337 to you.
Regards,
Chiwan Park
> On Apr 6, 2016, at 3:36 AM, Timur Fayruzov
Hi Saiph,
all you have to do is to invoke the `javaStream` method on your Scala
DataStream.
Hope I've been helpful. :)
On Tue, Apr 5, 2016 at 7:35 PM, Saiph Kappa wrote:
> Hi,
>
> I'm programming in scala and using some extra libraries made in Java. My
> question is:
Hi,
I need some suggestions regarding accessing RDF triples from Flink. I'm
trying to integrate Flink in a pipeline where the input for Flink comes
from a SPARQL query on a Jena model. After modifying the triples using
Flink, I will perform a SPARQL update using Jena to save my changes.
Hi - I'm trying to identify bottlenecks in my Flink streaming job, and am
curious about the Back Pressure view in the job manager web UI. If there
are already docs for Back Pressure please feel free to just point me to
those. :)
When "Sampling in progress..." is displayed, what exactly is
Hi - as clarified in another thread [1] stateful operators store all of
their current state in the backend on each checkpoint. Just curious if
Kafka topics with log compaction have ever been considered as a possible
state backend?
Samza [2] uses RocksDB as a local state store, with all writes
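For illustration, here is a minimal, self-contained Java sketch of the compacted-changelog idea being asked about: every state update is appended to a changelog, compaction keeps only the latest entry per key, and state is restored by replaying the compacted log. Plain collections stand in for Kafka and the state backend; none of these names are a real Flink or Kafka API:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a changelog-backed state store with log compaction.
public class CompactedChangelog {
    // Compaction: the latest value per key wins; older entries are dropped.
    public static Map<String, Integer> compact(List<SimpleEntry<String, Integer>> log) {
        Map<String, Integer> compacted = new LinkedHashMap<>();
        for (SimpleEntry<String, Integer> e : log) {
            compacted.put(e.getKey(), e.getValue());
        }
        return compacted;
    }

    // Restore: replay the compacted log into an empty state store.
    public static Map<String, Integer> restore(Map<String, Integer> compacted) {
        return new HashMap<>(compacted);
    }
}
```

The attraction is that recovery cost is bounded by the number of live keys rather than the full update history, which is exactly what compaction guarantees.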
Hi Ufuk,
I thought so, but I am not sure when and where ;) I will let you know
if I come across it again.
Cheers,
Konstantin
On 05.04.2016 21:10, Ufuk Celebi wrote:
> Hey Zach and Konstantin,
>
> Great questions and answers. We can try to make this more explicit in the
> docs.
>
> On Tue,
Hey Zach and Konstantin,
Great questions and answers. We can try to make this more explicit in the docs.
On Tue, Apr 5, 2016 at 8:54 PM, Konstantin Knauf
wrote:
> To my knowledge Flink takes care of deleting old checkpoints (I think it
> says so in the
This worked when I ran my test code locally, but I'm seeing nothing reach my
sink when I try to run this in YARN (previously, when I just echoed all sums
to my sink, it would work).
Here's what my code looks like:
StreamExecutionEnvironment env =
Hi Zach,
some answers/comments inline.
Cheers
Konstantin
On 05.04.2016 20:39, Zach Cox wrote:
> Hi - I have some questions regarding Flink's checkpointing, specifically
> related to storing state in the backends.
>
> So let's say an operator in a streaming job is building up some state.
>
Hi - I have some questions regarding Flink's checkpointing, specifically
related to storing state in the backends.
So let's say an operator in a streaming job is building up some state. When
it receives barriers from all of its input streams, does it store *all* of
its state to the backend? I
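As a way to picture the question, here is a self-contained Java sketch of the behavior being asked about: once an operator has seen barriers on all of its inputs, it snapshots its *entire* current state, not a delta. The "backend" is simulated with a map from checkpoint id to a state copy; the names are illustrative, not Flink's actual classes:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a counting operator that snapshots its whole state per checkpoint.
public class SnapshottingOperator {
    private final Map<String, Long> counts = new HashMap<>();
    private final Map<Long, Map<String, Long>> backend = new HashMap<>();

    public void processElement(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    // Triggered once barriers from all inputs are aligned:
    // copy the complete current state to the backend.
    public void checkpoint(long checkpointId) {
        backend.put(checkpointId, new HashMap<>(counts));
    }

    // Recovery reads back the full snapshot for the given checkpoint.
    public Map<String, Long> restore(long checkpointId) {
        return new HashMap<>(backend.get(checkpointId));
    }
}
```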
Yes, Hadoop version was the culprit. It turns out that EMRFS requires at
least 2.4.0 (judging from the exception in the initial post, I was not able
to find the official requirements).
Rebuilding Flink with Hadoop 2.7.1 and with Scala 2.11 worked like a charm
and I was able to run WordCount using
Hey Timur,
if you are using EMR with IAM roles, Flink should work out of the box.
You don't need to change the Hadoop config and the IAM role takes care
of setting up all credentials at runtime. You don't need to hardcode
any keys in your application that way and this is the recommended way
to go
Hi,
I'm programming in scala and using some extra libraries made in Java. My
question is: how can I easily convert
"org.apache.flink.streaming.scala.DataStream" to
"org.apache.flink.streaming.api.datastream.DataStream"?
Thanks.
Hi
The following are missing in the ‘Powered by Flink’ list:
king.com
https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces88
Otto Group
http://data-artisans.com/how-we-selected-apache-flink-at-otto-group/
Hello Ufuk,
I'm using EMR 4.4.0.
Thanks,
Timur
On Tue, Apr 5, 2016 at 2:18 AM, Ufuk Celebi wrote:
> Hey Timur,
>
> which EMR version are you using?
>
> – Ufuk
>
> On Tue, Apr 5, 2016 at 1:43 AM, Timur Fayruzov
> wrote:
> > Thanks for the answer,
Hi everyone,
I would like to bring the "Powered by Flink" wiki page [1] to the attention
of Flink users who recently joined the Flink community. The list tracks
which organizations are using Flink.
If your company / university / research institute / ... is using Flink but
the name is not yet
Hey Robert!
This is currently not possible :-(, but this is a feature that is on
Flink's road map.
A very inconvenient workaround could be to manually query the REST
APIs [1] and dump the responses somewhere and query it there.
– Ufuk
[1]
Hi everyone,
I'm using Flink 0.10.2 to run some benchmarks on my cluster and I would
like to compare it to Spark 1.6.0. Spark has an eventLog property that I
can use to have the history written to HDFS, and then later view it offline
on the History Server.
Does Flink have a similar feature,
Aljoscha,
Thank you for your quick response.
Yes, I am using FsStateBackend, so I will try RocksDB backend.
Regards,
Hironori
2016-04-05 21:23 GMT+09:00 Aljoscha Krettek :
> Hi,
> I guess you are using the FsStateBackend, is that correct? You could try
> using the RocksDB
Hello,
I am trying to implement a windowed distinct count on a stream. In this
case, the state has to hold all distinct values in the window, so it can be
large.
In my test, if the state size becomes about 400MB, checkpointing takes
40sec and spends most of the Taskmanager's CPU.
Is there any good way to
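To make the state-size problem concrete, here is a minimal, self-contained Java sketch of an exact windowed distinct count. All names are illustrative (this is not Flink's windowing API); the point is that exact distinct counting must retain every distinct value, so state grows with cardinality rather than with the count:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: exact distinct count keeps one set of seen values per window.
public class WindowedDistinctCount {
    private final Map<Long, Set<String>> seenPerWindow = new HashMap<>();

    public void add(long window, String value) {
        seenPerWindow.computeIfAbsent(window, w -> new HashSet<>()).add(value);
    }

    public int distinctCount(long window) {
        return seenPerWindow.getOrDefault(window, Set.of()).size();
    }

    // Checkpoint cost is driven by the number of retained distinct values,
    // which is why high-cardinality streams produce large, slow checkpoints.
    public int retainedValues(long window) {
        return distinctCount(window);
    }
}
```

If an approximate answer is acceptable, a cardinality sketch such as HyperLogLog bounds the state to a few kilobytes per window, at the cost of a small counting error.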
Yes exactly. This is a feature which we still have to add.
On Tue, Apr 5, 2016 at 1:07 PM, Anwar Rizal wrote:
> Thanks Till.
>
> The only way I can change the behavior would be to post filter the result
> then.
>
> Anwar.
>
> On Tue, Apr 5, 2016 at 11:41 AM, Till Rohrmann
Thanks Till.
The only way I can change the behavior would be to post filter the result
then.
Anwar.
On Tue, Apr 5, 2016 at 11:41 AM, Till Rohrmann wrote:
> Hi Anwar,
>
> yes, once we have published the introductory blog post about the CEP
> library, we will also publish
Hi Robert,
I tried several paths and rmr before.
It stopped after 1-2 minutes. There was an exception on the shell.
Sorry, should have attached to the last mail.
Thanks,
Konstantin
On 05.04.2016 11:22, Robert Metzger wrote:
> I've tried reproducing the issue on a test cluster, but everything
By the way, the way I see to fix this is extending WindowAssigner with
an "isEventTime()" method and then allowing accumulating/lateness in the
WindowOperator only if this is true.
But it seems a bit hacky because it special-cases event-time. But then
again, maybe we need to special-case it ...
Hi Folks,
as part of my effort to improve the windowing in Flink [1] I also thought
about lateness, accumulating/discarding and window cleanup. I have some
ideas on this but I would love to get feedback from the community as I
think that these things are important for everyone doing event-time
Hello,
last week I got a problem where my job worked in local mode but could not
be serialized on the cluster. I assume that local mode does not really
serialize all the operators (the problem was with a custom map function)
and I need to enforce this behaviour in local mode or, better, be able
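One self-contained way to catch this class of problem early, sketched in plain Java (the mapper classes and names are illustrative, not Flink APIs): round the user function through Java serialization in a unit test, which is what fails on the cluster but is skipped locally.

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch: a serializability check you can run locally before deploying.
public class SerializationCheck {
    public static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // A map-function-like class holding only serializable state.
    public static class GoodMapper implements Serializable {
        private final int offset = 1;
        public int map(int x) { return x + offset; }
    }

    // Holds a Thread, which is not serializable, so shipping it to a
    // cluster would fail even though local execution works.
    public static class BadMapper implements Serializable {
        private final Thread worker = new Thread();
        public int map(int x) { return x; }
    }
}
```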
Hi Anwar,
yes, once we have published the introductory blog post about the CEP
library, we will also publish a more in-depth description of the approach
we have implemented. To spoil it a little bit: We have mainly followed the
paper “Efficient Pattern Matching over Event Streams” for the
I've tried reproducing the issue on a test cluster, but everything worked
fine.
Have you tried different values for "recovery.zookeeper.path.root" or only
one? Maybe the path you've put contains invalid data?
Regarding the client log you've sent: Did you manually stop the client or
did it stop
Hello all,
you can find my slides on Large-Scale Machine Learning with FlinkML here
(from SICS Data Science day and FOSDEM 2016):
http://www.slideshare.net/TheodorosVasiloudis/flinkml-large-scale-machine-learning-with-apache-flink
Best,
Theodore
On Mon, Apr 4, 2016 at 3:19 PM, Rubén Casado
Balaji, now I see it is my mistake: I wasn't clear enough in my
question, sorry. By "the project" I meant the Flink project itself. The
question is already answered.
Regards,
Andrew
Balaji Rajagopalan writes:
> In your pom file you can mention against which
Chiwan, thanks, got it! - and the build finished with success.
I'm still a little confused by the method used: a tool from tools/
changes files that are under Git control.
Regards,
Andrew
Chiwan Park writes:
> Hi Andrew,
>
> The method to build Flink with Scala 2.11
Hey Konstantin,
just looked at the logs and the cluster is started, but the job is
indeed never submitted.
I've forwarded this to Robert, because he is familiar with the YARN
client. I will look into how the client interacts with the ZooKeeper
root path.
– Ufuk
On Tue, Apr 5, 2016 at 9:18 AM,