Hi all,
What's the purpose of .keyBy() on ConnectedStream? How does it affect
.map() and .flatMap()?
I'm not finding a way to group stream elements based on a key, something
like a Window on a normal Stream, but for a ConnectedStream.
Regards,
Matt
Haha, I see. Thanks.
On 1/26/17, 1:48 PM, "Chen Qin" wrote:
>We worked around S3 and had a beer with our Hadoop engineers...
>
>
>
>--
>View this message in context:
Hi there, I just started investigating Flink and I'm curious if I'm
approaching my issue in the right way.
My current usecase is modeling a series of transformations, where I
start with some transformations, which when done can yield another
transformation, or a result to output to some sink, or
Hi Robert,
Thanks for the answer.
My code does actually contain both mapr streams and maprdb jars. here are
the steps I followed based on your suggestion:
1. I copied only the mapr-streams-*.jar and maprdb*.jar.
2. Then I tried to run my jar, but i got java.lang.noclassdeffounderror for
some
I have a project where i am reading in on a single DataStream from Kafka, then
sending to a variable number of handlers based on content of the recieved data,
after that i want to join them all. Since i do not know how many different
streams this will create, i cannot have a single "base" to
We worked around S3 and had a beer with our Hadoop engineers...
--
View this message in context:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-snapshotting-to-S3-Timeout-waiting-for-connection-from-pool-tp10994p11330.html
Sent from the Apache Flink User Mailing List
Hello Chesnay,
Thanks for the advice. I've begun adding multiple jobs per Python plan file
here: https://issues.apache.org/jira/browse/FLINK-5183 and
https://github.com/GEOFBOT/flink/tree/FLINK-5183
The functionality of the patch works. I am able to run multiple jobs per
file successfully, but
Hi,
can anyone help me with this problem? I don't get it. Forget the examples
below, I've created a copy / paste example to reproduce the problem of
incorrect results when using key-value state und windowOperator.
public class StreamingJob {
public static void main(String[] args) throws
Hi,
I am new here.
My name is Lior and I am working at Parallel Machines.
Was assigned recently to work on Flink run/use/improve :-)
I am using the FLINK_CONF_DIR environment variable to pass the config
location
to the start-cluster.sh (e.g. env
Yes, thank you Robert!
Best regards,
Dmitry
On Thu, Jan 26, 2017 at 4:55 PM, Robert Metzger wrote:
> Hi,
> Is this what you are looking for? https://ci.apache.org/
> projects/flink/flink-docs-release-1.2/monitoring/best_
> practices.html#parsing-command-line-arguments-and-
Hi,
Is this what you are looking for?
https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application
On Thu, Jan 26, 2017 at 5:38 PM, Dmitry Golubets
wrote:
> Hi,
>
Hi,
Is there a place for user defined configuration settings?
How to read them?
Best regards,
Dmitry
Hi Robert,
I ended up overriding Flink httpclient version number in main pom file and
recompiling it.
Thanks
Best regards,
Dmitry
On Thu, Jan 26, 2017 at 4:12 PM, Robert Metzger wrote:
> Hi Dmitry,
>
> I think this issue is new.
> Where is the AWS SDK dependency coming
Hi Dmitry,
I think this issue is new.
Where is the AWS SDK dependency coming from? Maybe you can resolve the
issue on your side for now.
I've filed a JIRA for this issue:
https://issues.apache.org/jira/browse/FLINK-5661
On Wed, Jan 25, 2017 at 8:24 PM, Dmitry Golubets
Hi Dominik,
You could measure the throughput at each task in your job to see if one
operator is causing the slowdown (for example using Flink's metrics system)
Maybe the backpressure view already helps finding the task that causes the
issue.
Did you check if there are enough resources available
Hi Ani,
This error is independent of cancel vs stop. Its an issue of loading the
MapR classes from the classloaders.
Do you user jars contain any MapR code (either mapr streams or maprdb)?
If so, I would recommend you to put these MapR libraries into the "lib/"
folder of Flink. They'll then be
Hi,
I would guess that the watermark generation does not work as expected.
I would recommend to log the extracted timestamps + the watermarks to
understand how time is progressing, and when watermarks are generated to
trigger a window computation.
On Tue, Jan 24, 2017 at 6:53 PM, Sujit Sakre
Hi Joe,
working on a KeyedStream means that the records are partitioned by that
key, i.e., all records with the same key are processed by the same thread.
Therefore, only on thread accesses the state for a particular key.
Other tasks do not have read or write access to the state of other tasks.
Hi Oriol,
The number of keys is related to the number of data-structures (NFAs) Flink is
going to create and keep.
Given this, it may make sense to try to reduce your key-space (or your
keyedStreams). Other than that, Flink
has not issue handling large numbers of keys.
Now, for the issue you
Hi Florian,
you can rate-limit the Kafka consumer by implementing a custom
DeserializationSchema that sleeps a bit from time to time (or at each
deserialization step)
On Tue, Jan 24, 2017 at 1:16 PM, Florian König
wrote:
> Hi Till,
>
> thank you for the very helpful
@jonas Flink's Fork-Join Pool drives only the actors, which are doing
coordination. Unless your job is permanently failing/recovering, they don't
do much.
On Thu, Jan 26, 2017 at 2:56 PM, Robert Metzger wrote:
> Hi Jonas,
>
> The good news is that your job is completely
Hi Jonas,
The good news is that your job is completely parallelizable. So if you are
running it on a cluster, you can scale it at least to the number of Kafka
partitions you have (actually even further, because the Kafka consumers are
not the issue).
I don't think that the scala (=akka) worker
If I have a keyed stream going in to a N node Flink stream processor, and I had
a job that was keeping a count using a ValueStateDescriptor (per key), would
that descriptor be synchronized among all the nodes?
i.e. Are the state descriptors interfaces (ValueStateDescriptor,
JProfiler
--
View this message in context:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Improving-Flink-Performance-tp11248p11311.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at
Nabble.com.
Hello everyone,
I'm using the CEP library for event stream processing.
I'm splitting the dataStream into different KeyedStreams using keyBy(). In
the KeyBy, I'm using a tuple of two elements, which means I may have
several millions of KeyedStreams, as I need to monitor all our customer's
users.
Offtopic: What profiler is it that you're using?
> On Jan 25, 2017, at 18:11, Jonas wrote:
>
> Images:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/n11305/Tv6KnR6.png
> and
>
26 matches
Mail list logo