Hello!
Has anyone used Flink in "production" for PLC anomaly detection?
Any pointers/docs to check?
Best regards,
Iván Venzor C.
Till,
Thanks again for putting this together. It is certainly along the lines of
what I want to accomplish, but I see a problem with it. In your code
you use a ValueState to store the priority queue. If you are expecting to
store a lot of values in the queue, then you are likely to be using
Why do you want to collect and iterate? Why not iterate on the DataStream
itself?
Maybe I didn't understand your use case completely.
Srikanth
On Tue, May 3, 2016 at 10:55 AM, Aljoscha Krettek
wrote:
> Hi,
> please keep in mind that we're dealing with streams. The Iterator
Hello,
Just to follow up on this issue: after collecting some data and setting up
additional tests, we have managed to pinpoint the issue to the
ScalaBuff-generated code that decodes enumerations. After switching to the
ScalaPB generator instead, the problem was gone.
One thing peculiar about
Hi,
please keep in mind that we're dealing with streams. The Iterator might
never finish.
Cheers,
Aljoscha
On Tue, 3 May 2016 at 16:35 Suneel Marthi wrote:
> DataStream<Centroid> newCentroids = new DataStream<>();
>
> Iterator<Centroid> iter
DataStream<Centroid> newCentroids = new DataStream<>();
Iterator<Centroid> iter = DataStreamUtils.collect(newCentroids);
List<Centroid> list = Lists.newArrayList(iter);
On Tue, May 3, 2016 at 10:26 AM, subash basnet wrote:
> Hello all,
>
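Stripped of the Guava helper, Lists.newArrayList(iter) simply drains the iterator eagerly. A plain-JDK sketch of what that call does, and of why it can only terminate when the stream behind the iterator is finite:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DrainIterator {

    // Equivalent of Guava's Lists.newArrayList(Iterator): pull every element
    // into a list. On the iterator returned by DataStreamUtils.collect this
    // loop only finishes when the underlying stream itself is finite.
    static <T> List<T> drain(Iterator<T> iter) {
        List<T> out = new ArrayList<>();
        while (iter.hasNext()) {
            out.add(iter.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> list = drain(Arrays.asList(1, 2, 3).iterator());
        System.out.println(list); // prints [1, 2, 3]
    }
}
```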
Hello Stefano,
Thank you, I found out just some time ago that I could use keyBy, but I
couldn't find how to set and get a broadcast variable in DataStream like
that of DataSet.
For example, in the code below we get the collection of *centroids* via
broadcast.
Eg: In KMeans.java
class X extends
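Since DataStream has no withBroadcastSet, one common workaround is to broadcast() the centroid stream and connect it to the point stream. This is only a sketch, not code from the thread: the Centroid and Point types, their id/distanceTo members, and the assignNearest helper are assumptions modeled on the batch KMeans example.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class BroadcastCentroidsSketch {

    // Emulates withBroadcastSet: every parallel instance of the CoFlatMap
    // sees every centroid, because the centroid stream is broadcast().
    public static DataStream<Tuple2<Integer, Point>> assignNearest(
            DataStream<Point> points, DataStream<Centroid> centroids) {

        return points
            .connect(centroids.broadcast())
            .flatMap(new CoFlatMapFunction<Point, Centroid, Tuple2<Integer, Point>>() {

                // Local copy of the "broadcast variable" on each parallel task.
                private final Map<Integer, Centroid> current = new HashMap<>();

                @Override
                public void flatMap1(Point p, Collector<Tuple2<Integer, Point>> out) {
                    int nearest = -1;
                    double best = Double.MAX_VALUE;
                    for (Centroid c : current.values()) {
                        double d = p.distanceTo(c); // assumed helper on Point
                        if (d < best) { best = d; nearest = c.id; }
                    }
                    out.collect(new Tuple2<>(nearest, p));
                }

                @Override
                public void flatMap2(Centroid c, Collector<Tuple2<Integer, Point>> out) {
                    current.put(c.id, c); // newer centroids replace older ones
                }
            });
    }
}
```

Be aware that there is no ordering guarantee between the two connected streams, so early points may arrive before any centroid has been seen.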
I'm not sure this is the right way to do it, but we were exploring all the
possibilities and this one was the most obvious. We also spent some time
studying how to do it, to gain a better understanding of Flink's internals.
What we want to do though is to integrate Flink with another distributed
Hi
I did all the settings required for cluster setup, but when I ran the
start-cluster.sh script, it only started one JobManager on the master node.
Logs are written only on the master node. The slaves don't have any logs,
and when I ran a program it said:
Resources available to scheduler: Number of
Hi!
The order in which the elements arrive in an iteration HEAD is the order in
which the last operator in the loop (the TAIL) produces them. If that is a
deterministic ordering (because of a sorted reduce, for example), then you
should be able to rely on the order.
Otherwise, the order of
Just had a quick chat with Aljoscha...
The first version of the aligned window code will still duplicate the
elements, later versions should be able to get rid of that.
On Tue, May 3, 2016 at 11:10 AM, Aljoscha Krettek
wrote:
> Hi,
> even with the optimized operator for
After fixing the clock issue on the application level, the latency is as
expected. Thanks again!
Robert
On Tue, May 3, 2016 at 9:54 AM, Robert Schmidtke
wrote:
> Hi Igor, thanks for your reply.
>
> As for your first point I'm not sure I understand correctly. I'm
Hi Simone,
you are right, the interfaces you extend are not considered to be public,
user-facing API.
Adding custom operators to the DataSet API touches many parts of the system
and is not straightforward.
The DataStream API has better support for custom operators.
Can you explain what kind of
Yeah, thanks for letting me know.
On 03-May-2016 2:40 PM, "Fabian Hueske" wrote:
> Yes, but be aware that your program runs with parallelism 1 if you do not
> configure the parallelism.
>
> 2016-05-03 11:07 GMT+02:00 Punit Naik :
>
>> Hi Stephen, Fabian
Hello Fabian,
we delved deeper, starting from the input you gave us, but a question arose.
We always assumed that runtime operators were open for extension without
modifying anything inside Flink, but it looks like this is not the case and
the documentation assumes that the developer is working to a
Hello all,
How could we perform *withBroadcastSet* and *groupBy* in a DataStream like
that of a DataSet, as in the KMeans code below:
DataSet<Centroid> newCentroids = points
    .map(new SelectNearestCenter()).*withBroadcastSet*(loop, "centroids")
    .map(new CountAppender()).*groupBy*(0).reduce(new
Hey Tarandeep,
I think the failures are unrelated. Regarding the number of network
buffers:
https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#configuring-the-network-buffers
The timeouts might occur because the task managers are pretty loaded.
I would suggest to
Hi,
even with the optimized operator for aligned time windows I would advise
against using long sliding windows with a small slide. The system will
internally create a lot of "buckets", i.e. each sliding window is treated
separately and the element is put into 1,440 buckets, in your case. With a
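The 1,440 figure follows directly from the window arithmetic. A quick sanity check (plain arithmetic, not Flink API):

```java
public class SlidingWindowBuckets {

    // Number of sliding windows ("buckets") each element belongs to:
    // window size divided by slide, assuming the slide evenly divides the size.
    static long buckets(long windowSizeMs, long slideMs) {
        return windowSizeMs / slideMs;
    }

    public static void main(String[] args) {
        long size = 24L * 60 * 60 * 1000; // a 24-hour window
        long slide = 60L * 1000;          // sliding by one minute
        System.out.println(buckets(size, slide)); // prints 1440
    }
}
```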
Yes, but be aware that your program runs with parallelism 1 if you do not
configure the parallelism.
2016-05-03 11:07 GMT+02:00 Punit Naik :
> Hi Stephen, Fabian
>
> setting "fs.output.always-create-directory" to true in flink-config.yml
> worked!
>
> On Tue, May 3, 2016
Hi Stephen, Fabian
setting "fs.output.always-create-directory" to true in flink-config.yml
worked!
On Tue, May 3, 2016 at 2:27 PM, Stephan Ewen wrote:
> Hi!
>
> There is the option to always create a directory:
> "fs.output.always-create-directory"
>
> See
>
There is a Scaladoc, but it does not cover all packages, unfortunately. In
the Scala API you can call transform without specifying a TypeInformation;
it works using implicits/context bounds.
On Tue, 3 May 2016 at 01:48 Srikanth wrote:
> Sorry for the previous incomplete
Hi!
There is the option to always create a directory:
"fs.output.always-create-directory"
See
https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#file-systems
Greetings,
Stephan
On Tue, May 3, 2016 at 9:26 AM, Punit Naik wrote:
> Hello
>
> I
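For reference, the setting mentioned above goes into the Flink configuration file. This is a config fragment; the key itself comes from the linked docs:

```yaml
# conf/flink-conf.yaml
# Always write output into a directory, even when parallelism is 1
fs.output.always-create-directory: true
```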
Hi Elias,
thanks for the long write-up. It's interesting that it actually kinda works
right now.
You might be interested in a design doc that we're currently working on. I
posted it on the dev list but here it is:
Hi Igor, thanks for your reply.
As for your first point I'm not sure I understand correctly. I'm ingesting
records at a rate of about 50k records per second, and those records are
fairly small. If I add a timestamp to each of them, I will have a lot more
data, which is not exactly what I want.
Did you specify a parallelism? The default parallelism of a Flink instance
is 1 [1].
You can set a different default parallelism in ./conf/flink-conf.yaml or
pass a job-specific parallelism with ./bin/flink using the -p flag [2].
More options to define parallelism are in the docs [3].
[1]
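Concretely, the two options look like this (the job jar path is a placeholder):

```yaml
# conf/flink-conf.yaml — default parallelism for all jobs
parallelism.default: 4
```

```shell
# per-job parallelism via the CLI
./bin/flink run -p 4 path/to/your-job.jar
```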
Hello
I executed my Flink code in Eclipse and it properly generated the output by
creating a folder (as specified in the string) and placing output files in
it.
But when I exported the project as a JAR and ran the same code using ./flink
run, it generated the output, but instead of creating a
Hi Elias!
There is a feature pending that uses an optimized version for aligned time
windows. In that case, elements would go into a single window pane, and the
full window would be composed of all panes it spans (in the case of sliding
windows). That should help a lot in those cases.
The