From the Flink documentation:
Conditions for a class to be treated as a POJO by Flink:
- The class must be public
- It must have a public constructor without arguments
- All fields either have to be public or there must be getters and
setters for all non-public fields.
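For illustration, a minimal sketch of a class that satisfies these conditions (the field names here are made up):

public class WordCount {
    public String word;    // public field: no getter/setter required

    private int count;     // non-public field: needs a getter and a setter

    public WordCount() {}  // public constructor without arguments

    public int getCount() { return count; }
    public void setCount(int count) { this.count = count; }
}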
In your example, the
issues -
but those were related to Java 8 lambdas.
Back then, bumping ASM to version 5 helped. Not sure if this is the same
problem, though, since you do not seem to use Java 8 lambdas...
On Fri, Apr 24, 2015 at 11:32 AM, Aljoscha Krettek aljos...@apache.org
wrote:
I'm looking
Hi,
I think the example could be made more concise by using the Table API.
http://ci.apache.org/projects/flink/flink-docs-master/libs/table.html
Please let us know if you have questions about that, it is still quite new.
On Fri, Jun 5, 2015 at 9:03 AM, hawin hawin.ji...@gmail.com wrote:
Hi
Yes, this code seems very reasonable. :D
The way to use this to modify a file on HDFS is to read the file,
then filter out some elements and write a new modified file that does
not contain the filtered-out elements. As said before, Flink (or
HDFS) does not allow in-place modification of files.
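A minimal sketch of that read-filter-write pattern, with made-up HDFS paths and filter condition:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.readTextFile("hdfs:///path/to/input")
   .filter(line -> !line.contains("bad"))   // drop the unwanted elements
   .writeAsText("hdfs:///path/to/output");  // write a new, filtered file
env.execute();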
:03 AM, Aljoscha Krettek aljos...@apache.org
wrote:
Hi,
actually, what do you want to know about Flink SQL?
Aljoscha
On Sat, Jun 6, 2015 at 2:22 AM, Hawin Jiang hawin.ji...@gmail.com
wrote:
Thanks all
Actually, I want to know more info about Flink SQL and Flink performance
Here
- part1
  - 1
  - 2
  - ...
- part2
  - 1
  - ...
- partX
Flink's file input format supports recursive directory scans, so you can
add new subfolders to dataSetRootFolder and read the full data set.
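A short sketch of how to enable this for a text file source; the "recursive.file.enumeration" parameter is the documented switch, the path is made up:

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
Configuration parameters = new Configuration();
parameters.setBoolean("recursive.file.enumeration", true);  // also scan subfolders
DataSet<String> lines = env
    .readTextFile("hdfs:///dataSetRootFolder")
    .withParameters(parameters);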
2015-06-05 9:58 GMT+02:00 Aljoscha Krettek aljos...@apache.org:
Hi,
I
@Ufuk: we probably should, yes.
On Thu, 18 Jun 2015 at 16:18 Tamara Mendt tammyme...@gmail.com wrote:
Great, thanks!
On Thu, Jun 18, 2015 at 4:16 PM, Ufuk Celebi u...@apache.org wrote:
Should we add this to the Javadoc of the eagerly executed operations?
On 18 Jun 2015, at 16:11, Maximilian
Hi,
could you please try replacing JavaDefaultStringSchema() with
SimpleStringSchema() in your first example. The one where you get this
exception:
org.apache.commons.lang3.SerializationException:
java.io.StreamCorruptedException: invalid stream header: 68617769
Cheers,
Aljoscha
On Fri, 26 Jun
Not yet, no. I created a Jira issue:
https://issues.apache.org/jira/browse/FLINK-2277
On Thu, 25 Jun 2015 at 14:48 Sebastian s...@apache.org wrote:
Is there a way to configure this setting for a delta iteration in the
scala API?
Best,
Sebastian
On 17.06.2015 10:04, Ufuk Celebi wrote:
,
Shiti
On Sun, Jun 14, 2015 at 4:10 PM, Shiti Saxena ssaxena@gmail.com
wrote:
I'll do the fix
On Sun, Jun 14, 2015 at 12:42 AM, Aljoscha Krettek aljos...@apache.org
wrote:
I merged your PR for the RowSerializer. Teaching the aggregators to deal
with null values should be a very
approach for the test case?
Thanks,
Shiti
On Sun, Jun 14, 2015 at 4:10 PM, Shiti Saxena ssaxena@gmail.com
wrote:
I'll do the fix
On Sun, Jun 14, 2015 at 12:42 AM, Aljoscha Krettek aljos...@apache.org
wrote:
I merged your PR for the RowSerializer. Teaching the aggregators to
deal
fix it.
On Thu, 11 Jun 2015 at 10:40 Aljoscha Krettek aljos...@apache.org wrote:
Cool, good to hear.
The PojoSerializer already handles null fields. The RowSerializer can be
modified in pretty much the same way. So you should start by looking at the
copy()/serialize()/deserialize() methods
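A hedged sketch of that null-marker technique, not the actual RowSerializer code: each field is prefixed with a flag so the read side knows whether a value follows.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

class NullAwareFieldIO {
    static void writeField(DataOutput out, String field) throws IOException {
        if (field == null) {
            out.writeBoolean(true);   // marker: field is null, no payload follows
        } else {
            out.writeBoolean(false);  // marker: a value follows
            out.writeUTF(field);
        }
    }

    static String readField(DataInput in) throws IOException {
        return in.readBoolean() ? null : in.readUTF();
    }
}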
Hi,
yes, I think the problem is that the RowSerializer does not support
null-values. I think we can add support for this, I will open a Jira issue.
Another problem I then see is that the aggregations can not properly deal
with null-values. This would need separate support.
Regards,
Aljoscha
On
on the issue with TupleSerializer or is someone working on it?
On Mon, Jun 15, 2015 at 11:20 AM, Aljoscha Krettek aljos...@apache.org
wrote:
Hi,
the reason why this doesn't work is that the TupleSerializer cannot deal
with null values:
@Test
def testAggregationWithNull(): Unit = {
val env
)?
On Tue, Jun 16, 2015 at 1:32 PM, Aljoscha Krettek aljos...@apache.org
wrote:
One more thing, it would be good if the TupleSerializer didn't write a
boolean for every field. A single integer could be used where one bit
specifies if a given field is null or not. (Maybe we should also add
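A small sketch of the bitmask idea, assuming at most 32 fields (a long, or several words, would be needed beyond that):

static int nullMask(Object[] fields) {
    int mask = 0;
    for (int i = 0; i < fields.length; i++) {
        if (fields[i] == null) {
            mask |= 1 << i;           // bit i set: field i is null
        }
    }
    return mask;
}

static boolean isNull(int mask, int i) {
    return (mask & (1 << i)) != 0;    // tested again on deserialization
}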
Thanks for fixing my oversight. :D
On Fri, May 29, 2015 at 3:05 PM, Márton Balassi
balassi.mar...@gmail.com wrote:
Thanks, Max.
On Fri, May 29, 2015 at 3:04 PM, Maximilian Michels m...@apache.org wrote:
Fixed it on the master.
The problem was some classes belonging to package
Hi,
good questions. Regarding 1: you are right, when the JobManager fails the state
is lost. Ufuk, Till and Stephan are currently working on making the
JobManager fault tolerant by having hot-standby JobManagers and storing the
important JobManager state in ZooKeeper. Maybe they can further comment on
Hi,
I looked into it. Right now, when the specified partitioner is FORWARD
the JobGraph that is generated from the StreamGraph will have the
POINT-TO-POINT pattern specified. This doesn't work, however, if the
parallelism differs, so the operators will not have a POINT-TO-POINT
connection in the
Hi Kristoffer,
I'm afraid not, but maybe Timo has some further information. In this
extended example we can see the problem:
https://gist.github.com/aljoscha/84cc363d13cf1dfe9364. The output is:
Type is: class org.apache.flink.examples.java8.wordcount.TypeTest$Thing
class
Hi Vincenzo,
regarding TaskManagers and how they execute the operations:
The TaskManager gets a class that is derived from AbstractInvokable. The
TaskManager will create an object from that class and then call methods to
facilitate execution. The two main methods are registerInputOutput() and
Hi,
I wanted to post something along the same lines but now I don't think the
approach with local top-ks and merging works. For example, if you want to
get the top-4 and do the pre-processing in two parallel instances, this
input data would lead to incorrect results:
1. Instance:
a 6
b 5
c 4
d 3
think.
Best,
Aljoscha
On Sun, 23 Aug 2015 at 23:07 Gyula Fóra gyula.f...@gmail.com wrote:
Hey,
I am not sure if I get it, why aren't the results correct?
You don't instantly get the global top-k, but you are always updating it
with the new local results.
Gyula
Aljoscha Krettek aljos
Hi,
no, this is unfortunately not fixed in the current master.
Cheers,
Aljoscha
On Tue, 28 Jul 2015 at 15:29 Ufuk Celebi u...@apache.org wrote:
Hey Phillip,
thanks for reporting the problem. I think your assessment is correct. If
the program is already finished, the threads throwing the
Hi Lydia,
it might work using new DataSet(javaSet) where DataSet
is org.apache.flink.api.scala.DataSet. I'm not sure, however. What is your
use case for this?
Cheers,
Aljoscha
On Mon, 13 Jul 2015 at 15:55 Lydia Ickler ickle...@googlemail.com wrote:
Hi guys,
is it possible to convert a Java
Hi,
that depends. How are you executing the program? Inside an IDE? By starting
a local cluster? And then, how big is your input data?
Cheers,
Aljoscha
On Wed, 15 Jul 2015 at 23:45 Vinh June hoangthevinh@gmail.com wrote:
I just realized that Flink program takes a lot of time to run, for
Hi,
your first example doesn't work because the SimpleStringSchema does not
work for sinks. You can use this modified serialization schema:
https://gist.github.com/aljoscha/e131fa8581f093915582. This works for both
source and sink (I think the current SimpleStringSchema is not correct and
should
Hi Gwenhaël,
are you using the one-yarn-cluster-per-job mode of Flink? I.e., you are
starting your Flink job with (from the doc):
flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096
./examples/flink-java-examples-0.10-SNAPSHOT-WordCount.jar
If you are, then this is almost possible on the current
which I run the code.
>
>
> Yes, I did forget to post here, but my program calls unionMessageStreams()
>
> On Wed, Oct 21, 2015 at 9:39 AM, Aljoscha Krettek
> <aljoscha.kret...@gmail.com> wrote:
> Hi Gayu,
> could it be that no data ever arrives on the second
So does the filter maybe filter out everything?
> On 21 Oct 2015, at 16:18, Gayu <gaa...@gmail.com> wrote:
>
> Yes, exactly.
>
> On Wed, Oct 21, 2015 at 10:17 AM, Aljoscha Krettek
> <aljoscha.kret...@gmail.com> wrote:
> So it is received in the filter but th
Hi,
first of all, am I correct to assume that
new SocketSource(hostName1, port, '\n', -1)
should be
new SocketTextStreamFunction(hostName1, port1, '\n', -1)
or are you using a custom built SocketSource for this?
If I replace it with SocketTextStreamFunction and execute it, the example runs and
certainly possible that I
> messed something up while refactoring to the API change. I will look at
> it further when I get a chance, but if you have any thoughts they are much
> appreciated.
>
>
> Thanks,
> Paul Hamilton
>
>
> On 10/17/15, 6:39 AM, "Aljos
Hi Paul,
the key-based state should now be fixed in the current 0.10-SNAPSHOT builds if
you want to continue playing around with it.
Cheers,
Aljoscha
> On 21 Oct 2015, at 19:40, Aljoscha Krettek <aljos...@apache.org> wrote:
>
> Hi Paul,
> good to hear that the windo
Hi,
I don’t think that alleviates the problem. Sometimes you might want the system
to continue even if stuff outside the UDF fails. For example, if a serializer
does not work because of a null value somewhere. You would, however, like to
get a message about this somewhere, I assume.
Cheers,
Hi,
these are some interesting ideas.
I have some thoughts, though, about the current implementation.
1. With Schema and Field you are basically re-implementing RowTypeInfo, so it
should not be required. Maybe just an easier way to create a RowTypeInfo.
2. Right now, in Flink the
curious thing is that in the 2nd statement .keyBy(t -> t.f1) works
> but .keyBy(1) does not, even though they do the same thing. I'm using Idea at
> the moment so it can be just another type inference problem with that IDE.
>
> cheers
> Martin
>
> On Tue, Nov 3, 2015 at 3:06 P
Hi,
where are you storing the results of each window computation? Maybe you
could also store them from inside a custom WindowFunction where you just count
the elements and then store the results.
On the other hand, adding a (1) field and doing a window reduce (à la
WordCount) is going to be
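A hedged sketch of the counting WindowFunction suggested above; the Event type is made up, and the interface shown is the one from more recent Flink versions:

import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class CountingWindowFunction
        implements WindowFunction<Event, Long, String, TimeWindow> {
    @Override
    public void apply(String key, TimeWindow window,
                      Iterable<Event> elements, Collector<Long> out) {
        long count = 0;
        for (Event ignored : elements) {
            count++;              // count the elements in this window
        }
        out.collect(count);       // or store the result externally right here
    }
}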
Hi Paul,
it’s good to see people interested in this. I sketched a Trigger that should
fit your requirements: https://gist.github.com/aljoscha/a7c6f22548e7d24bc4ac
You can use it like this:
DataStream<…> input = …
DataStream<…> result = input
    .keyBy("session-id")
Hi,
right now, the 0.10-SNAPSHOT is in a bit of a weird state. We still have
the old windowing API in there alongside the new one. To make your example
use the new API that actually uses the timestamps and watermarks you would
use the following code:
Will I be able to switch from event-time to
> processing- or ingestion-time without having to adjust my code?
>
> Best,
> Alex
>
> Aljoscha Krettek <aljos...@apache.org> schrieb am Mi., 7. Okt. 2015,
> 17:23:
>
>> Hi,
>> right now, the 0.10-SNAPSHOT
> return pojo.getTime();
> }
>
> @Override
> public long extractWatermark(Pojo pojo, long l) {
>     return Long.MIN_VALUE;
> }
>
> @Override
> public long getCurrentWatermark() {
>     return lastTimestamp - maxDelay;
Hi Martin,
the answer depends, because the current windowing implementation has some
problems. We are working on improving it in the 0.10 release, though.
If your elements arrive with strictly increasing timestamps and you have
parallelism=1 or don't perform any re-partitioning of data (which a
Hi Arnaud,
I think my answer to Gwenhaël could also be helpful to you:
are you using the one-yarn-cluster-per-job mode of Flink? I.e., you are
starting your Flink job with (from the doc):
flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096
Hi Jerry,
it should be possible to just use the Redis API inside a Flink operator,
for example a map or flatMap. You can use RichFunctions (
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#rich-functions)
to set up the connection and close it after the computation.
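For illustration, a hedged sketch using the Jedis client as a stand-in (the Redis API here is an assumption; the open()/close() lifecycle is the point):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import redis.clients.jedis.Jedis;

public class RedisLookupMap extends RichMapFunction<String, String> {
    private transient Jedis jedis;  // one client per parallel task instance

    @Override
    public void open(Configuration parameters) {
        jedis = new Jedis("localhost", 6379);  // set up the connection once
    }

    @Override
    public String map(String key) {
        return jedis.get(key);      // reuse the connection for every record
    }

    @Override
    public void close() {
        jedis.close();              // tear it down after the computation
    }
}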
Hi Martin,
maybe this is what you are looking for:
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#output-splitting
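For illustration, a hedged sketch of what that section describes, with a made-up Integer stream and split tags:

SplitStream<Integer> split = input.split(value ->
        Collections.singletonList(value % 2 == 0 ? "even" : "odd"));
DataStream<Integer> evens = split.select("even");
DataStream<Integer> odds  = split.select("odd");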
Regards,
Aljoscha
On Thu, 3 Sep 2015 at 12:02 Till Rohrmann wrote:
> Hi Martin,
>
> could grouping be a solution to your
Hi Rico,
I have a suspicion. What is the distribution of your keys? That is, are
there many unique keys, do the keys keep evolving, i.e. is it always new
and different keys?
Cheers,
Aljoscha
On Tue, 8 Sep 2015 at 13:44 Rico Bergmann wrote:
> I also see in the TM overview
> On 08.09.2015 at 20:05, Aljoscha Krettek <aljos...@apache.org> wrote:
>
> Hi Rico,
> I have a suspicion. What is the distribution of your keys? That is, are
> there many unique keys, do the keys keep evolving, i.e. is it always new
> and different keys?
>
> Cheers,
>
No, each operator would have its own local list.
In a distributed environment it is very tricky to keep global state across
all instances of operations (Flink does not support anything in this
direction). If you really need it then the only way is to set the
parallelism of the operator to 1. This
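A minimal sketch of that last route; StatefulListMapper is a hypothetical user function that keeps the list:

stream.map(new StatefulListMapper())  // hypothetical function keeping a local list
      .setParallelism(1);             // a single instance, hence a single list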
Hi Jack,
Stephan is right, this should work. Unfortunately the TypeAnalyzer does not
correctly detect that it cannot treat your Id class as a Pojo. I will add a
Jira issue for that. For the time being you can use this command to force
the system to use Kryo:
env.getConfig.enableForceKryo();
I
Hi Rico,
you should be able to get it with these steps:
git clone https://github.com/StephanEwen/incubator-flink.git flink
cd flink
git checkout -t origin/windows
This will get you on Stephan's windowing branch. Then you can do a
mvn clean install -DskipTests
to build it.
I will merge his
time. But I don't know whether this could be a setup problem. I
> noticed the OS load of my test system was around 90%. So it might be more a
> setup problem ...
>
> Thanks for your support so far.
>
> Cheers. Rico.
>
>
>
>
>
> Am 24.09.2015 um 09:33 schrieb Aljosch
Hi Philipp,
am I correct to assume that your tuples do not arrive in the order of the
timestamp that you extract? Unfortunately, for that case the current
windowing implementation does not work correctly. We are working hard on
fixing this for the upcoming 0.10 release, though. If you are
Hi Stefan,
I added a section in the documentation that describes the syntax of the
expressions. It is a bit bare bones but I hope it helps nonetheless.
https://ci.apache.org/projects/flink/flink-docs-master/libs/table.html
Cheers,
Aljoscha
On Wed, 16 Sep 2015 at 14:55 Aljoscha Krettek <al
> Philipp
>
>
> On 18.09.2015 17:05, Aljoscha Krettek wrote:
>
> Hi Philipp,
> am I correct to assume that your tuples do not arrive in the order of the
> timestamp that you extract? Unfortunately, for that case the current
> windowing implementation does not work corr
Right now, it is exactly "Object.hash % getNumberOfParallelSubtasks()"...
> On 09 Dec 2015, at 02:37, Radu Tudoran wrote:
>
> Object.hash % getNumberOfParallelSubtasks()
Hi Mihail,
could you please give some information about the number of keys that you are
expecting in the data and how big the elements are that you are processing in
the window?
Also, are there any other operations that could be taxing on memory? I think
the different exception you see for
Hi Niels,
I’m afraid this will not work. (If I understood correctly what you are trying
to do.) When the trigger is being serialized/deserialized each parallel
instance of the trigger has their own copy of the QueueSource object. Plus, a
separate instance of the QueueSource itself will be
Hi,
the problem could be that GValue is not Comparable. Could you try making it
implement Comparable (the Java Comparable)?
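A hedged sketch of that change; GValue's contents are not shown in this thread, so the field and the ordering below are made up:

public class GValue implements Comparable<GValue> {
    private final long value;  // hypothetical payload

    public GValue(long value) { this.value = value; }

    @Override
    public int compareTo(GValue other) {
        return Long.compare(this.value, other.value);
    }
}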
Cheers,
Aljoscha
> On 12 Dec 2015, at 20:43, Robert Metzger wrote:
>
> Hi,
>
> Can you check the log output in your IDE or the log files of the Flink
Hi Nirmalya,
when using count windows the window will trigger after “slide-size” elements
have been received. So, since in your example, slide-size is set to 1 it will
emit a new max for every element received, and once it has accumulated 4 elements it
will start removing one element for every new
Hi,
the current behavior is in fact that the window will be triggered every
“slide-size” elements and the computation will take into account the last
“window-size” elements. So for a window with window-size 10 and slide-size 5
the window will be triggered every 5 elements. This means that your
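For illustration, a sketch of such a sliding count window with those numbers (the element type is made up):

DataStream<Tuple2<String, Integer>> maxima = input
    .keyBy(0)            // key by the first tuple field
    .countWindow(10, 5)  // window-size 10, slide-size 5
    .max(1);             // a new max over the last 10 elements, every 5 elements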
Hi,
I’m afraid this is not possible right now. I’m also not sure about the Evictors
as a whole. Using them makes window operations very slow because all elements
in a window have to be kept, i.e. window results cannot be pre-aggregated.
Cheers,
Aljoscha
> On 15 Dec 2015, at 12:23, Radu Tudoran
Hi,
these are certainly valid use cases. As far as I know, the people who know most
in this area are on vacation right now. They should be back in a week, I think.
They should be able to give you a proper description of the current situation
and some pointers.
Cheers,
Aljoscha
> On 04 Jan
Hi Sebastian,
I’m afraid the people working on Flink don’t have much experience with
Cassandra. Maybe you could look into the Elasticsearch sink and adapt it to
write to Cassandra instead. That could be a valuable addition to Flink.
Cheers,
Aljoscha
> On 22 Dec 2015, at 14:36, syepes
from
> getCurrentWatermark() and emitting a watermark at every record)
>
> If I set
>
> StreamExecutionEnvironment.setParallelism(5);
>
> it does not work.
>
> So, if I understood you correctly, it is the opposite of what you were
> expecting?!
>
> Cheers,
nds 'onEventTime' calls per second.
>
> So thank you. I now understand I have to be more careful with these timers!.
>
> Niels Basjes
>
>
>
> On Fri, Nov 27, 2015 at 11:28 AM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
> Hi Niels,
> do the records that arrive from Kafka
Hi,
the function is in fact applied to the remaining elements (at least I hope it
is). So the first sentence should be the correct one.
Cheers,
Aljoscha
> On 28 Nov 2015, at 03:14, Nirmalya Sengupta
> wrote:
>
> Hello Fabian,
>
> From your reply to this thread:
I think what we will need at some point for this are approximate watermarks
> which correlate event and ingest time.
>
> I think they have similar concepts in Millwheel/Dataflow.
>
> Cheers,
> Gyula
> On Mon, Nov 30, 2015 at 5:29 PM Aljoscha Krettek <aljos...@apache.org
Hi,
as an addition. I don’t have a solution yet, for the general problem of what
happens when a parallel instance of a source never receives elements. This
watermark business is very tricky...
Cheers,
Aljoscha
> On 30 Nov 2015, at 17:20, Aljoscha Krettek <aljos...@apache.org> wrote
Hi,
the Evictor is very tricky to understand, I’m afraid. What happens when a
Trigger fires is the following:
1. Trigger fires
2. Evictor can remove elements from the window buffer
3. Window function processes the elements that remain in the window buffer
The tricky thing here is that the
Hi,
the problem here is that the system needs to be aware that Watermarks will be
flowing through the system. You can either do this via:
env.setStreamTimeCharacteristic(EventTime);
or:
env.getConfig().enableTimestamps();
I know, not very intuitive.
Cheers,
Aljoscha
> On 30 Nov 2015, at
Hi,
I’ll try to go into a bit more detail about the windows here. What you can do
is this:
DataStream<…> input = … // fields are (id, sum, count),
// where count is initialized to 1, similar to word count
DataStream<…> counts = input
Hi Niels,
do the records that arrive from Kafka already have the session ID or do you
want to assign them inside your Flink job based on the idle timeout?
For the rest of your problems you should be able to get by with what Flink
provides:
The triggering can be done using a custom Trigger that
Hi Anwar,
what Fabian wrote is completely right. I just want to give the reasoning for
why the CountTrigger behaves as it does. The idea was to have Triggers that
clearly focus on one thing and then at some point add combination triggers. For
example, an OrTrigger that triggers if either of
that
> events of different sessions cannot be intermingled. Isn't the idea of the
> keyBy expression below exactly not to have intermingled sessions by
> first grouping by session-ids?
>
> Cheers and thank you,
>
> Konstantin
>
> On 17.10.2015 14:39, Aljoscha Krettek wrote:
Hi,
I wrote a little example that could be what you are looking for:
https://github.com/dataArtisans/query-window-example
It basically implements a window operator with a modifiable window size that
also allows querying the current accumulated window contents using a second
input stream.
So, if I understood you correctly, it is the opposite of what you were
> expecting?!
>
> Cheers,
>
> Konstantin
>
>
> On 17.11.2015 11:32, Aljoscha Krettek wrote:
>> Hi,
>> actually, the bug is more subtle. Normally, it is not a problem that the
>> Times
but have an obstacle with
> org.apache.flink.runtime.state.AbstractStateBackend class. Where to get it? I
> guess it is stored in your local branch only. Would you please send me
> patches for the public branch or share the branch with me?
>
> Best regards,
> Roman
>
>
> 20
Yes, that’s what I meant.
> On 19 Nov 2015, at 12:08, Till Rohrmann <trohrm...@apache.org> wrote:
>
> You mean an additional start-up parameter for the `start-cluster.sh` script
> for the HA case? That could work.
>
> On Thu, Nov 19, 2015 at 11:54 AM, Aljoscha Kret
if
> necessary. However, the user is more likely to lose his state when shutting
> down the cluster.
>
> On Thu, Nov 19, 2015 at 10:55 AM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
>> I agree with Aljoscha. Many companies install Flink (and its config) in a
Hi,
@Konstantin: are you using event-time or processing-time windows? If you are
using processing time, then you can only do it the way Fabian suggested. The
problem here is, however, that the .keyBy().reduce() combination would emit a
new maximum for every element that arrives there and you
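For illustration, a hedged sketch of that combination (the element type is made up); note it emits an updated maximum for every arriving element:

DataStream<Tuple2<String, Integer>> runningMax = input
    .keyBy(0)
    .reduce((a, b) -> a.f1 >= b.f1 ? a : b);  // keep whichever element is larger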
Hi,
could you please try adding the lucene-core-4.10.4.jar file to the lib folder
of your Flink installation.
(https://repo1.maven.org/maven2/org/apache/lucene/lucene-core/4.10.4/)
Elasticsearch uses dependency injection to resolve the classes and Maven is not
really aware of this.
Also you could try adding
Hi,
could you try pulling the problem apart, i.e. determine at which point in
the pipeline you have duplicate data. Is it after the sources or in the
CoFlatMap or the Map after the reduce, for example?
Cheers,
Aljoscha
On Wed, 1 Jun 2016 at 17:11 Biplob Biswas wrote:
Hi James,
the TypeInformation must be available at the call site, not in the case
class definition. In your WindowFunction you are using a TestGen[String] so
it should suffice to add this line at some point before the call to apply():
implicit val testGenType =
Hi Jack,
right now this is not possible except when writing a custom operator. We
are working on support for a time-to-live setting on states, this should
solve your problem.
For writing a custom operator, check out DataStream.transform() and
StreamMap, which is the operator implementation for
The problem could be that open() is not called with a proper Configuration
object in streaming mode.
On Sun, 5 Jun 2016 at 19:33 Stephan Ewen wrote:
> Hi David!
>
> You are using the JDBC format that was written for the batch API in the
> streaming API.
>
> While that should
Hi,
I think the problem is that the case class has generic parameters. You can
try making TypeInformation for those parameters implicitly available at the
call site, i.e:
implicit val typeT = createTypeInformation[T] // where you insert the
specific type for T and do the same for the other
That's nice. Can you try it on your cluster with an added "reset" call on
the buffer?
On Tue, 7 Jun 2016 at 14:35 Flavio Pompermaier wrote:
> After "some" digging into this problem I'm quite convinced that the
> problem is caused by a missing reset of the buffer during the
Hi,
I'm afraid you're running into a bug in the special processing-time
window operator. A suggested workaround would be to switch to
characteristic IngestionTime and use TumblingEventTimeWindows.
I've also opened a Jira issue for the bug so that we can keep track of it:
on. So I applied a MapFunction to DataSet and put a
>> dummy value in the join field/key where it was null. Then in the join
>> function, I change it back to null.
>>
>> Best,
>> Tarandeep
>>
>> On Thu, Jun 9, 2016 at 7:06 AM, Aljoscha Krettek <aljos...@apache.o
Hi Josh,
I'll have to think a bit about that one. Once I have something I'll get
back to you.
Best,
Aljoscha
On Wed, 8 Jun 2016 at 21:47 Josh wrote:
> This is just a question about a potential use case for Flink:
>
> I have a Flink job which receives tuples with an event id
Hi,
the problem is that the KeySelector is an anonymous inner class and as such
has a reference to the outer RecordFilterer object. Normally, this would be
rectified by the closure cleaner but the cleaner is not used in
CoGroup.where(). I'm afraid this is a bug.
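Until that is fixed, a possible workaround sketch: define the KeySelector as a class that holds no reference to the enclosing object (the Record type and its getId() are assumptions here):

public static class IdKeySelector implements KeySelector<Record, Long> {
    @Override
    public Long getKey(Record value) {
        return value.getId();  // key on the id field only, nothing captured
    }
}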
Best,
Aljoscha
On Thu, 9 Jun 2016
Hi,
right now, the way to do it is by using a custom operator, i.e. a
OneInputStreamOperator. There you have the low-level control and can set
timers based on watermarks or processing time. You can, for example look at
StreamMap for a very simple operator or WindowOperator for an operator that
Hi Igor,
you might be interested in this doc about how we want to improve handling
of late data and some other things in the windowing API:
https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing
I've sent it around several times but you can never know
5, 2016 at 5:16 AM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Hi,
>> there is no functionality to have asynchronous calls in user functions in
>> Flink.
>>
>> The asynchronous action feature in Spark is also not meant for such
>
Hi Bart,
yup, this is a bug. AFAIK it is not yet known; would you like to open the Jira
issue for it? If not, I can also open one.
The problem is in the interaction of how chaining works in the streaming
API with object reuse. As you said, with how it is implemented it serially
calls the two map
server joined and rebalance
>> the processing? How is it done if I have a keyed stream and some custom
>> ValueState variables?
>>
>> Cheers,
>> Gosia
>>
>> 2016-05-25 11:32 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>
>>> Hi Gosia,
&
Hi Prateek,
this is a deprecated setting that affects how memory is allocated in Flink
Worker nodes. Since at least 1.0.0 the default behavior is the behavior
that would previously be requested by the --yst flag.
In short, you don't need the flag when running streaming programs. (Except
Robert
Hi,
while I think it would be possible to do it by creating a "meta sink" that
contains several RollingSinks, I think the approach of integrating it into
the current RollingSink is better.
I think it's mostly a question of style and architectural purity but also
of resource consumption and
Hi,
I'm afraid this is currently a shortcoming in the API. There is this open
Jira issue to track it: https://issues.apache.org/jira/browse/FLINK-3869.
We can't fix it before Flink 2.0, though, because we have to keep the API
stable on the Flink 1.x release line.
Cheers,
Aljoscha
On Mon, 13 Jun
Yes, this is correct. Right now we're basically using .hashCode() for
keying. (Which can be problematic in some cases.)
Beam, for example, clearly specifies that the encoded form of a value
should be used for all comparisons/hashing. This is more well defined but
can lead to slow performance in