Re: sharebuffer prune code

2018-05-28 Thread Shailesh Jain
Thanks Dawid. I'll rebase against your branch and test it. I'll report back
if I hit the issue again.

Regards,
Shailesh

On Sun, May 27, 2018 at 5:54 PM, Dawid Wysakowicz 
wrote:

> The logic for SharedBuffer, and as a result for pruning, will be changed in
> FLINK-9418 [1]. We plan to make it backwards compatible. There is
> already an open PR [2] (in review); you can check if the problem persists.
>
> Regards,
> Dawid
>
> [1] https://issues.apache.org/jira/browse/FLINK-9418
> [2] https://github.com/apache/flink/pull/6059
>
>
> On 24.05.2018 12:21, aitozi wrote:
> > Can you explain it more explicitly?
> >
> >
> >
>
>
>


Re: latency critical job

2018-05-28 Thread makeyang
Rong Rong:
My Flink version is 1.4.2.
Since we are using a Docker environment that shares disk I/O, we have observed
that a disk I/O spike caused by another process on the same physical machine
can lead to long operator processing times.






Re: latency critical job

2018-05-28 Thread makeyang
Timo:
thanks for your suggestion





Re: [ANNOUNCE] Apache Flink 1.5.0 release

2018-05-28 Thread Bowen Li
Congratulations, everyone!

On Mon, May 28, 2018 at 1:15 AM, Fabian Hueske  wrote:

> Thank you Till for serving as a release manager for Flink 1.5!
>
> 2018-05-25 19:46 GMT+02:00 Till Rohrmann :
>
> > Quick update: I had to update the date of the release blog post which
> also
> > changed the URL. It can now be found here:
> >
> > http://flink.apache.org/news/2018/05/25/release-1.5.0.html
> >
> > Cheers,
> > Till
> >
> > On Fri, May 25, 2018 at 7:03 PM, Hao Sun  wrote:
> >
> > > This is great. Thanks for the effort to get this out!
> > >
> > > On Fri, May 25, 2018 at 9:47 AM Till Rohrmann 
> > > wrote:
> > >
> > >> The Apache Flink community is very happy to announce the release of
> > >> Apache Flink 1.5.0.
> > >>
> > >> Apache Flink® is an open-source stream processing framework for
> > >> distributed, high-performing, always-available, and accurate data
> > streaming
> > >> applications.
> > >>
> > >> The release is available for download at:
> > >>
> > >> https://flink.apache.org/downloads.html
> > >>
> > >> Please check out the release blog post for an overview of the new
> > >> features and improvements and the list of contributors:
> > >>
> > >> http://flink.apache.org/news/2018/05/18/release-1.5.0.html
> > >>
> > >> The full release notes are available in Jira:
> > >>
> > >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> > >> projectId=12315522=12341764
> > >>
> > >> I would like to thank all contributors for working very hard on making
> > >> this release a success!
> > >>
> > >> Best,
> > >> Till
> > >>
> > >
> >
>


Re: Odd job failure

2018-05-28 Thread Elias Levy
On Mon, May 28, 2018 at 1:48 AM, Piotr Nowojski 
wrote:

> The most likely suspect is the standard Java problem of a dependency
> convergence issue. Please check that you are not pulling multiple Kafka
> versions into your class path. In particular, your job shouldn’t pull in any Kafka
> library except the one that comes from flink-connector-kafka-0.11 (which
> is 0.11.0.2).
>

Alas, that is not the case.  The job correctly includes kafka-clients:
0.11.0.2:

[warn] Found version conflict(s) in library dependencies; some are
suspected to be binary incompatible:
[warn]
[warn]  * org.apache.kafka:kafka-clients:0.11.0.2 is selected over
{0.10.2.1, 0.9.0.1}
[warn]  +- org.apache.flink:flink-connector-kafka-0.11_2.11:1.4.2
(depends on 0.11.0.2)
[warn]  +- org.apache.flink:flink-connector-kafka-0.9_2.11:1.4.2
(depends on 0.10.2.1)
[warn]  +- org.apache.flink:flink-connector-kafka-0.10_2.11:1.4.2
(depends on 0.10.2.1)
[warn]




> Please also consider upgrading your cluster at least to Kafka 0.11.0.2.
> Kafka 0.11.0.0 was a pretty unstable release, and we do not support it. Our
> connector depends on the Kafka 0.11.0.2 client, and while I don’t assume that
> there is any incompatibility between a 0.11.0.0 cluster and the 0.11.0.2 client,
> it definitely wouldn’t hurt to upgrade the cluster.
>

Thanks for the tip.  That said, this error should be unrelated to the
version of the cluster.


Debugging window processing: can I output window start/end times, prove correctness?

2018-05-28 Thread chrisr123

I am learning the tumbling and sliding window API and I was wondering what
API calls people use to determine whether their events are being assigned to
windows as they expect. For example, is there a way to print out the window
start and end times for windows as they are being processed, and to see which
events are contained in a window before an operation is performed?

I have a simple socket-reading stream application that uses Event Time. I
have written a test "producer"
app that generates events with timestamps, and sometimes I deliberately
force an event to be out of order
by using sleep(), etc.  However, I want to convince myself that events are
being included in sliding windows as
I expect. The main part of the app is this snippet:

DataStream<Tuple2<String, Double>> sensorStream = env
        .socketTextStream(parms.get("host"), parms.getInt("port"))
        .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessGenerator())
        .map(new RawSensorMapper());

// Compute:
// Sliding 10-second window every 5 seconds
DataStream<Tuple2<String, Double>> counts = sensorStream
        .keyBy(0)
        .timeWindow(Time.seconds(10), Time.seconds(5))
        .sum(1);

I just map each event to a Tuple2<String, Double>.

How can I code this to "see" the start and end times of each sliding window
as it's generated so I can see why certain events are or are not included
within a given window? 
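
Would something like the sketch below be the right approach? (This is just my
guess, assuming the stream elements are Tuple2<String, Double> and that
ProcessWindowFunction's Context exposes the window bounds via
window().getStart() / getEnd().)

DataStream<String> windowDebug = sensorStream
        .keyBy(0)
        .timeWindow(Time.seconds(10), Time.seconds(5))
        .process(new ProcessWindowFunction<Tuple2<String, Double>, String, Tuple, TimeWindow>() {
            @Override
            public void process(Tuple key, Context ctx,
                                Iterable<Tuple2<String, Double>> elements,
                                Collector<String> out) {
                long start = ctx.window().getStart(); // inclusive start, epoch millis
                long end = ctx.window().getEnd();     // exclusive end, epoch millis
                StringBuilder contents = new StringBuilder();
                for (Tuple2<String, Double> e : elements) {
                    contents.append(e).append(' ');
                }
                out.collect("key=" + key + " window=[" + start + "," + end + ") "
                        + "elements: " + contents);
            }
        });
windowDebug.print();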

The events come in looking like this (simulating data coming from a
sensor). I use an AssignerWithPeriodicWatermarks subclass to parse the event
time timestamp from each record. The "sensor1" field is used as the key.
The watermark is just System.currentTimeMillis() at the moment.

1527516714364 sensor1 28.33 
1527516714365 sensor1 311.42
1527516717365 sensor1 12.33 
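
For reference, here is a simplified sketch of the direction I'm considering
for the assigner (not exactly what I run today, since my current watermark is
based on the wall clock); it derives the watermark from the parsed event
timestamps instead, with an arbitrary out-of-orderness bound:

public class BoundedOutOfOrdernessGenerator implements AssignerWithPeriodicWatermarks<String> {

    private static final long MAX_OUT_OF_ORDERNESS = 3500; // 3.5 seconds, arbitrary

    private long currentMaxTimestamp = Long.MIN_VALUE + MAX_OUT_OF_ORDERNESS;

    @Override
    public long extractTimestamp(String element, long previousElementTimestamp) {
        // lines look like "1527516714364 sensor1 28.33"
        long timestamp = Long.parseLong(element.split("\\s+")[0]);
        currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
        return timestamp;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // watermark trails the highest event time seen so far by the allowed lateness
        return new Watermark(currentMaxTimestamp - MAX_OUT_OF_ORDERNESS);
    }
}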

Any tips appreciated.
Thanks








[ANNOUNCE] Final Reminder - Call for Presentations - Flink Forward Berlin 2018

2018-05-28 Thread Fabian Hueske
Hi all,

This is the final reminder for the call for presentations for Flink Forward
Berlin 2018.

*The call closes in 7 days* (June 4, 11:59 PM CEST).

Submit your talk and get to present your Apache Flink and stream processing
ideas, experiences, use cases, and best practices on September 4-5 in
Berlin.

https://flink-forward.org/call-for-presentations-submit-talk/

Best regards,
Fabian

(PC Chair for Flink Forward Berlin 2018)


Re: Job execution fails when parallelism is increased beyond 1

2018-05-28 Thread HarshithBolar
I'm using Flink 1.4.2





Re: Job execution fails when parallelism is increased beyond 1

2018-05-28 Thread Chesnay Schepler

Could you tell us which Flink version you are using?

On 28.05.2018 14:01, HarshithBolar wrote:

I'm submitting a Flink job to a cluster that has three Task Managers via the
Flink dashboard. When I set `Parallelism` to 1 (which is default),
everything runs as expected. But when I increase `Parallelism` to anything
more than 1, the job fails with the exception,

java.io.FileNotFoundException:
/tmp/flink-io-f91d7812-a411-4b58-a891-c9be1cde91da/08caeac37d6b8351daf6a3eb123a473106c56381b101f3e5d9704df9f78406a2.0.buffer
(No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at org.apache.flink.streaming.runtime.io.BufferSpiller.createSpillingChannel(BufferSpiller.java:259)
at org.apache.flink.streaming.runtime.io.BufferSpiller.<init>(BufferSpiller.java:120)
at org.apache.flink.streaming.runtime.io.BarrierBuffer.<init>(BarrierBuffer.java:149)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.<init>(StreamInputProcessor.java:129)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.init(OneInputStreamTask.java:56)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:235)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:745)

I've enabled checkpointing on my Flink job in "exactly once" mode, triggered
every 10 seconds. Here's my checkpoint configuration:

env.setStateBackend(new
RocksDBStateBackend(props.getFlinkCheckpointDataUri(), true));
env.enableCheckpointing(10000, EXACTLY_ONCE); // 10 seconds
CheckpointConfig config = env.getCheckpointConfig();
config.enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);


Should I configure something else other than simply entering `2` while
submitting the job in the dashboard?

EDIT: If I disable checkpoints and upload the job, it runs without any
error.








ML in Streaming API

2018-05-28 Thread Thodoris Bitsakis
Hello and thanks for the subscription!

I am using the Streaming API to develop an ML algorithm and I would like your
opinions regarding the following issues:

1) The input is read from a large file of d-dimensional points, and I
want to perform a parallel count window. In each parallel count window, I
want to apply a function that maintains a list of buckets in memory so that
it can be checkpointed (exploiting the state feature). For every parallel
count window, some (possibly zero) of the buckets will be updated or deleted.

My thoughts:
As there is no logical key and there is no parallel countWindowAll, is the
correct way to use a parallel flatMap operator? But then I assume
that I must implement custom buffering of the input data using ListState to
implement the count window? I could also use another ListState to
maintain the list of buckets in memory. But then every time I want to update
a specific buffer in the ListState, I must clear the ListState and reinsert
all buffers again (not optimal for big buffers)?

The other way is to use a deterministic pseudo-key and use
keyBy().countWindow(). The number of distinct keys would equal the
parallelism. In order to update some of the buckets for every key (parallel
instance), I am considering the use of MapState (UK = bucket index, UV = bucket
elements). In that case, I think the use of a pseudo-key is not the best
technique, and it also introduces an unnecessary data shuffle (keyBy)?

What is the best way? Or is there another way to solve the previous 
issues?
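
To make the second option concrete, here is a rough sketch of what I have in
mind (Point, Bucket and bucketIndexFor(...) are placeholders for my own types,
and the count-window threshold of 1000 is arbitrary):

int instances = 4; // intended operator parallelism

points
    .keyBy(p -> Math.abs(p.hashCode()) % instances) // deterministic pseudo-key
    .flatMap(new RichFlatMapFunction<Point, Bucket>() {
        private transient MapState<Integer, Bucket> buckets;
        private transient ValueState<Integer> count;

        @Override
        public void open(Configuration conf) {
            buckets = getRuntimeContext().getMapState(
                    new MapStateDescriptor<>("buckets", Integer.class, Bucket.class));
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Integer.class));
        }

        @Override
        public void flatMap(Point p, Collector<Bucket> out) throws Exception {
            int idx = bucketIndexFor(p);
            // update only the touched bucket instead of rewriting a whole list
            buckets.put(idx, Bucket.update(buckets.get(idx), p));
            int c = count.value() == null ? 1 : count.value() + 1;
            if (c >= 1000) { // emulate the count window
                for (Bucket b : buckets.values()) {
                    out.collect(b);
                }
                count.update(0);
            } else {
                count.update(c);
            }
        }
    });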

2) When there is no more input data (EOF), or when a user “asks” for a
partial evaluation of the ML algorithm through an external source, I want to
collect the lists of buckets from the parallel operator instances in another
reduce-style operator with parallelism 1 to compute the final list (classic
map-reduce scenario). When there is no user query or EOF, I don't want
the parallel operator instances to emit anything.

My thoughts: I don't know how the user will “ask” the Flink parallel
operator instances (parallel count window) to emit their results to the
downstream operator with parallelism 1. I also don't know how the operator
instances will know that the file has ended (if I use keyBy().countWindow() I
could use a custom trigger with a timer? And what about the flatMap case?).

The concept is that the list of buckets in each parallel operator instance
is a local Sketch, and I want to collect the local Sketches when the user
“asks” to calculate the final Sketch.
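
For illustration, the topology pattern I am imagining for the query path
(only a sketch; LocalSketch and Query are placeholders for my own types, and
the in-memory fields here are not checkpointed) is roughly:

DataStream<LocalSketch> localSketches = points
    .connect(queryStream.broadcast()) // control events reach every parallel instance
    .flatMap(new CoFlatMapFunction<Point, Query, LocalSketch>() {
        private final LocalSketch sketch = new LocalSketch(); // one per parallel instance

        @Override
        public void flatMap1(Point p, Collector<LocalSketch> out) {
            sketch.add(p); // keep updating the local sketch, emit nothing
        }

        @Override
        public void flatMap2(Query q, Collector<LocalSketch> out) {
            out.collect(sketch.copy()); // emit only when a query arrives
        }
    });

DataStream<LocalSketch> finalSketch = localSketches
    .flatMap(new FlatMapFunction<LocalSketch, LocalSketch>() {
        private final LocalSketch merged = new LocalSketch();

        @Override
        public void flatMap(LocalSketch s, Collector<LocalSketch> out) {
            merged.merge(s); // reduce-style merge of all local sketches
            out.collect(merged.copy());
        }
    })
    .setParallelism(1); // single instance sees every local sketch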

Any thoughts are very much appreciated!!! Thank you in advance.

 





Re: Writing Table API results to a csv file

2018-05-28 Thread Chris Ruegger
Fabian, Jörn:

Yes, that was indeed it.
When I added env.execute("MyApp"), it worked.
Thank you for your help.
-Chris




On Mon, May 28, 2018 at 5:03 AM, Fabian Hueske  wrote:

> Hi,
>
> Jörn is probably right.
>
> In contrast to print(), which immediately triggers an execution,
> writeToSink() just appends a sink operator and requires the execution to be
> triggered explicitly.
>
> The INFO messages of the TypeExtractor are "just" telling you that Row
> cannot be used as a POJO type, but that's fine here.
>
> Best, Fabian
>
> 2018-05-27 19:51 GMT+02:00 Jörn Franke :
>
>> Do you have the complete source?
>>
>> I am missing a env.execute at the end
>>
>> > On 27. May 2018, at 18:55, chrisr123  wrote:
>> >
>> > I'm using Flink 1.4.0
>> >
>> > I'm trying to save the results of a Table API query to a CSV file, but
>> I'm
>> > getting an error.
>> > Here are the details:
>> >
>> > My Input file looks like this:
>> > id,species,color,weight,name
>> > 311,canine,golden,75,dog1
>> > 312,canine,brown,22,dog2
>> > 313,feline,gray,8,cat1
>> >
>> > I run a query on this to select canines only, and I want to save this
>> to a
>> > csv file:
>> >
>> >ExecutionEnvironment env =
>> > ExecutionEnvironment.getExecutionEnvironment();
>> >BatchTableEnvironment tableEnv =
>> > TableEnvironment.getTableEnvironment(env);
>> >
>> >String inputPath = "location-of-source-file";
>> >CsvTableSource petsTableSource = CsvTableSource.builder()
>> >.path(inputPath)
>> >.ignoreFirstLine()
>> >.fieldDelimiter(",")
>> >.field("id", Types.INT())
>> >.field("species", Types.STRING())
>> >.field("color", Types.STRING())
>> >.field("weight", Types.DOUBLE())
>> >.field("name", Types.STRING())
>> >.build();
>> >
>> >// Register our table source
>> >tableEnv.registerTableSource("pets", petsTableSource);
>> >Table pets = tableEnv.scan("pets");
>> >
>> >Table counts = pets
>> >.groupBy("species")
>> >.select("species, species.count as count")
>> >.filter("species === 'canine'");
>> >
>> >DataSet<Row> result = tableEnv.toDataSet(counts, Row.class);
>> >result.print();
>> >
>> >// Write Results to File
>> >TableSink sink = new 
>> > CsvTableSink("/home/hadoop/output/pets",
>> ",");
>> >counts.writeToSink(sink);
>> >
>> > When I run this, I get the output from the result.print() call as this:
>> >
>> > canine,2
>> >
>> > but I do not see any results written
>> > to the file, and I see the error below.
>> > How can I save the results I'm seeing in stdout to a CSV file?
>> > Thanks!
>> >
>> >
>> >
>> > 2018-05-27 12:49:44,411 INFO  [main] typeutils.TypeExtractor
>> > (TypeExtractor.java:1873) - class org.apache.flink.types.Row does not
>> > contain a getter for field fields
>> > 2018-05-27 12:49:44,411 INFO  [main] typeutils.TypeExtractor
>> > (TypeExtractor.java:1876) - class org.apache.flink.types.Row does not
>> > contain a setter for field fields
>> > 2018-05-27 12:49:44,411 INFO  [main] typeutils.TypeExtractor
>> > (TypeExtractor.java:1911) - class org.apache.flink.types.Row is not a
>> valid
>> > POJO type because not all fields are valid POJO fields.
>> > 2018-05-27 12:49:44,411 INFO  [main] typeutils.TypeExtractor
>> > (TypeExtractor.java:1873) - class org.apache.flink.types.Row does not
>> > contain a getter for field fields
>> > 2018-05-27 12:49:44,412 INFO  [main] typeutils.TypeExtractor
>> > (TypeExtractor.java:1876) - class org.apache.flink.types.Row does not
>> > contain a setter for field fields
>> > 2018-05-27 12:49:44,412 INFO  [main] typeutils.TypeExtractor
>> > (TypeExtractor.java:1911) - class org.apache.flink.types.Row is not a
>> valid
>> > POJO type because not all fields are valid POJO fields.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>


-- 

Simplicity is the ultimate sophistication
--Leonardo DaVinci

www.rueggerconsultingllc.com


Re: Writing Table API results to a csv file

2018-05-28 Thread Fabian Hueske
Hi,

Jörn is probably right.

In contrast to print(), which immediately triggers an execution,
writeToSink() just appends a sink operator and requires the execution to be
triggered explicitly.
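
In code (using the variable names from your snippet), the missing piece is
just an explicit call after appending the sink, for example:

counts.writeToSink(sink);
env.execute("pets-to-csv"); // any job name works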

The INFO messages of the TypeExtractor are "just" telling you that Row
cannot be used as a POJO type, but that's fine here.

Best, Fabian

2018-05-27 19:51 GMT+02:00 Jörn Franke :

> Do you have the complete source?
>
> I am missing a env.execute at the end
>
> > On 27. May 2018, at 18:55, chrisr123  wrote:
> >
> > I'm using Flink 1.4.0
> >
> > I'm trying to save the results of a Table API query to a CSV file, but
> I'm
> > getting an error.
> > Here are the details:
> >
> > My Input file looks like this:
> > id,species,color,weight,name
> > 311,canine,golden,75,dog1
> > 312,canine,brown,22,dog2
> > 313,feline,gray,8,cat1
> >
> > I run a query on this to select canines only, and I want to save this to
> a
> > csv file:
> >
> >ExecutionEnvironment env =
> > ExecutionEnvironment.getExecutionEnvironment();
> >BatchTableEnvironment tableEnv =
> > TableEnvironment.getTableEnvironment(env);
> >
> >String inputPath = "location-of-source-file";
> >CsvTableSource petsTableSource = CsvTableSource.builder()
> >.path(inputPath)
> >.ignoreFirstLine()
> >.fieldDelimiter(",")
> >.field("id", Types.INT())
> >.field("species", Types.STRING())
> >.field("color", Types.STRING())
> >.field("weight", Types.DOUBLE())
> >.field("name", Types.STRING())
> >.build();
> >
> >// Register our table source
> >tableEnv.registerTableSource("pets", petsTableSource);
> >Table pets = tableEnv.scan("pets");
> >
> >Table counts = pets
> >.groupBy("species")
> >.select("species, species.count as count")
> >.filter("species === 'canine'");
> >
> >DataSet<Row> result = tableEnv.toDataSet(counts, Row.class);
> >result.print();
> >
> >// Write Results to File
> >TableSink sink = new 
> > CsvTableSink("/home/hadoop/output/pets",
> ",");
> >counts.writeToSink(sink);
> >
> > When I run this, I get the output from the result.print() call as this:
> >
> > canine,2
> >
> > but I do not see any results written
> > to the file, and I see the error below.
> > How can I save the results I'm seeing in stdout to a CSV file?
> > Thanks!
> >
> >
> >
> > 2018-05-27 12:49:44,411 INFO  [main] typeutils.TypeExtractor
> > (TypeExtractor.java:1873) - class org.apache.flink.types.Row does not
> > contain a getter for field fields
> > 2018-05-27 12:49:44,411 INFO  [main] typeutils.TypeExtractor
> > (TypeExtractor.java:1876) - class org.apache.flink.types.Row does not
> > contain a setter for field fields
> > 2018-05-27 12:49:44,411 INFO  [main] typeutils.TypeExtractor
> > (TypeExtractor.java:1911) - class org.apache.flink.types.Row is not a
> valid
> > POJO type because not all fields are valid POJO fields.
> > 2018-05-27 12:49:44,411 INFO  [main] typeutils.TypeExtractor
> > (TypeExtractor.java:1873) - class org.apache.flink.types.Row does not
> > contain a getter for field fields
> > 2018-05-27 12:49:44,412 INFO  [main] typeutils.TypeExtractor
> > (TypeExtractor.java:1876) - class org.apache.flink.types.Row does not
> > contain a setter for field fields
> > 2018-05-27 12:49:44,412 INFO  [main] typeutils.TypeExtractor
> > (TypeExtractor.java:1911) - class org.apache.flink.types.Row is not a
> valid
> > POJO type because not all fields are valid POJO fields.
> >
> >
> >
> >
> >
> >
> >
> >
> >
>


Re: Clarification in TumblingProcessing TimeWindow Documentation

2018-05-28 Thread Fabian Hueske
I agree, this should be fixed.

Thanks for noticing, Dhruv.
Would you mind creating a JIRA for this?

Thank you,
Fabian

2018-05-28 8:39 GMT+02:00 Bowen Li :

> Hi Dhruv,
>
> I can see it's confusing, and it does seem the comment should be improved.
> You can find a concrete explanation of tumbling windows and the relevant
> arguments at
> https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/windows.html#tumbling-windows
>
> Feel free to open a PR with a better comment.
>
> Thanks,
> Bowen
>
>
>
> On Sun, May 27, 2018 at 12:43 PM, Dhruv Kumar 
> wrote:
>
>> Hi
>>
> >> I was looking at TumblingProcessingTimeWindows.java and
> >> was a bit confused with the documentation at the start of this class. It
> >> says the following:
>>
>> /**
>> * A {@link WindowAssigner} that windows elements into windows based on
>> the current
>> * system time of the machine the operation is running on. Windows cannot
>> overlap.
>> *
>> * For example, in order to window into windows of 1 minute, every 10
>> seconds:
>> *  {@code
>> * DataStream> in = ...;
>> * KeyedStream> keyed = in.keyBy(...);
>> * WindowedStream, String, TimeWindows> windowed =
>> * keyed.window(TumblingProcessingTimeWindows.of(Time.of(1, MINUTES),
>> Time.of(10, SECONDS));
>> * } 
>> */
>> It says one can have tumbling windows of 1 minute, every 10 seconds.
> >> Doesn’t this become a sliding window then? SlidingProcessingTimeWindows.java
> >> has
>> the exact same documentation with just one tiny change (“Windows can
>> possibly overlap”). It seems to me that in the above documentation, the
>> second Time argument of 10 seconds is for providing the window offset (as
> >> confirmed here)
>> and not for starting the tumbling window every 10 seconds.
>>
>> Thanks
>>
>>
>> --
>> *Dhruv Kumar*
>> PhD Candidate
>> Department of Computer Science and Engineering
>> University of Minnesota
>> www.dhruvkumar.me
>>
>>
>


Re: Large number of sources in Flink Job

2018-05-28 Thread Fabian Hueske
Hi Chirag,

There have been some issues with very large execution graphs.
You might need to adjust the default configuration and configure larger
Akka buffers and/or timeouts.

Also, 2000 sources means that you run at least 2000 threads at once.

The FileInputFormat (and most of its sub-classes) in Flink 1.5.0 can be
configured to accept multiple directories.
This would be a preferred approach to creating one source per directory.
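Roughly (an untested sketch; I'm assuming the multi-path setter on
FileInputFormat is called setFilePaths, and the directory names below are
made up), that would look like:

TextInputFormat format = new TextInputFormat(new Path("/data/dir-0000"));
format.setFilePaths(
        new Path("/data/dir-0000"),
        new Path("/data/dir-0001"),
        new Path("/data/dir-0002")); // ... and so on for the remaining directories

DataStream<String> lines = env.createInput(format);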

Best, Fabian

2018-05-28 6:35 GMT+02:00 Chirag Dewan :

> Hi,
>
> I am working on a use case where my Flink job needs to collect data from
> thousands of sources.
>
> As an example, I want to collect data from more than 2000 File
> Directories, process(filter, transform) the data and distribute the
> processed data streams to 200 different directories.
>
> Are there any caveats I should know with such large number of sources,
> also taking into account per operator parallelism?
>
> Regards,
>
> Chirag
>
>


Re: Odd job failure

2018-05-28 Thread Piotr Nowojski
Hi,

I think that’s unlikely to happen. As far as I know, the only way to actually
unload classes in the JVM is when their class loader is garbage collected,
which means all references to it in the code must vanish. In other words,
it should never happen that a class is not found while anyone is still
referencing it.

The most likely suspect is the standard Java problem of a dependency convergence
issue. Please check that you are not pulling multiple Kafka versions into your
class path. In particular, your job shouldn’t pull in any Kafka library except the
one that comes from flink-connector-kafka-0.11 (which is 0.11.0.2).

Please also consider upgrading your cluster at least to Kafka 0.11.0.2. Kafka
0.11.0.0 was a pretty unstable release, and we do not support it. Our connector
depends on the Kafka 0.11.0.2 client, and while I don’t assume that there is any
incompatibility between a 0.11.0.0 cluster and the 0.11.0.2 client, it definitely
wouldn’t hurt to upgrade the cluster.

Piotrek

> On 26 May 2018, at 17:58, Elias Levy  wrote:
> 
> Piotr & Stephan,
> 
> Thanks for the replies.  Apologies for the late response.  I've been 
> traveling for the past month.
> 
> We've not observed this issue (spilling) again, but it is good to know that 
> 1.5 will use back-pressure based alignment.  I think for now we'll leave 
> task.checkpoint.alignment.max-size as is and work towards moving to 1.5 once 
> we confirm it is stable.
> 
> As for the java.lang.NoClassDefFoundError: 
> org/apache/kafka/clients/NetworkClient$1 error.  We see that one constantly 
> when jobs are canceled/restarted/upgraded.  We are using the 
> flink-connector-kafka-0.11 connector against a 0.11.0.0 cluster.  The error 
> indicates to me that the Kafka threads are not being fully shut down and they 
> are trying to reload the NetworkClient class but failing, maybe because the 
> code is no longer accessible via the class loader or some other reason.  
> 
> It looks like others are observing the same error.  Alexander Smirnov 
> reported it here on the list last month as well.
> 
> 
> On Thu, May 3, 2018 at 1:22 AM, Stephan Ewen  > wrote:
> Hi Elias!
> 
> Concerning the spilling of alignment data to disk:
> 
>   - In 1.4.x , you can set an upper limit via " 
> task.checkpoint.alignment.max-size ". See [1].
>   - In 1.5.x, the default is a back-pressure based alignment, which does not 
> spill any more.
> 
> Best,
> Stephan
> 
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#task-checkpoint-alignment-max-size
>  
> 
> 
> On Wed, May 2, 2018 at 1:37 PM, Piotr Nowojski  > wrote:
> Hi,
> 
> It might be some Kafka issue. 
> 
> From what you described your reasoning seems sound. For some reason TM3 fails 
> and is unable to restart and process any data, thus forcing spilling on 
> checkpoint barriers on TM1 and TM2.
> 
> I don’t know the reason behind java.lang.NoClassDefFoundError: 
> org/apache/kafka/clients/NetworkClient$1 errors, but it doesn’t seem to be 
> important in this case.
> 
> 1. What Kafka version are you using? Have you looked for any known Kafka 
> issues with those symptoms?
> 2. Maybe the easiest thing will be to reformat/reinstall/recreate TM3 AWS 
> image? It might be some system issue.
> 
> Piotrek
> 
>> On 28 Apr 2018, at 01:54, Elias Levy > > wrote:
>> 
>> We had a job on a Flink 1.4.2 cluster with three TMs experience an odd 
>> failure the other day.  It seems that it started as some sort of network 
>> event.  
>> 
>> It began with the 3rd TM starting to warn every 30 seconds about socket 
>> timeouts while sending metrics to DataDog.  This lasted for the whole outage.
>> 
>> Twelve minutes later, all TMs reported at nearly the same time that they had 
>> marked the Kafka coordinator as dead ("Marking the coordinator XXX (id: 
>> 2147482640 rack: null) dead for group ZZZ").  The job terminated and the 
>> system attempted to recover it.  Then things got into a weird state.
>> 
>> The following repeated six or seven times over a period of about 40 
>> minutes: 
>> TM attempts to restart the job, but only the first and second TMs show signs 
>> of doing so.  
>> The disk begins to fill up on TMs 1 and 2.  
>> TMs 1 & 2 both report java.lang.NoClassDefFoundError: 
>> org/apache/kafka/clients/NetworkClient$1 errors.  These were mentioned on 
>> this list earlier this month.  It is unclear if they are benign.
>> The job dies when the disks finally fills up on 1 and 2.
>> 
>> Looking at the backtrace logged when the disk fills up, I gather that Flink 
>> is buffering data coming from Kafka into one of my operators as a result of 
>> a barrier.  The job has a two input operator, with one input the 

Re: [ANNOUNCE] Apache Flink 1.5.0 release

2018-05-28 Thread Fabian Hueske
Thank you Till for serving as a release manager for Flink 1.5!

2018-05-25 19:46 GMT+02:00 Till Rohrmann :

> Quick update: I had to update the date of the release blog post which also
> changed the URL. It can now be found here:
>
> http://flink.apache.org/news/2018/05/25/release-1.5.0.html
>
> Cheers,
> Till
>
> On Fri, May 25, 2018 at 7:03 PM, Hao Sun  wrote:
>
> > This is great. Thanks for the effort to get this out!
> >
> > On Fri, May 25, 2018 at 9:47 AM Till Rohrmann 
> > wrote:
> >
> >> The Apache Flink community is very happy to announce the release of
> >> Apache Flink 1.5.0.
> >>
> >> Apache Flink® is an open-source stream processing framework for
> >> distributed, high-performing, always-available, and accurate data
> streaming
> >> applications.
> >>
> >> The release is available for download at:
> >>
> >> https://flink.apache.org/downloads.html
> >>
> >> Please check out the release blog post for an overview of the new
> >> features and improvements and the list of contributors:
> >>
> >> http://flink.apache.org/news/2018/05/18/release-1.5.0.html
> >>
> >> The full release notes are available in Jira:
> >>
> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> >> projectId=12315522=12341764
> >>
> >> I would like to thank all contributors for working very hard on making
> >> this release a success!
> >>
> >> Best,
> >> Till
> >>
> >
>


Re: Clarification in TumblingProcessing TimeWindow Documentation

2018-05-28 Thread Bowen Li
Hi Dhruv,

I can see it's confusing, and it does seem the comment should be improved.
You can find a concrete explanation of tumbling windows and the relevant
arguments at
https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/windows.html#tumbling-windows
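
To illustrate the difference (a quick, untested sketch, given a KeyedStream
named keyed as in the Javadoc example): the second Time argument of the
tumbling assigner is an offset, not a slide, so these two are not the same
thing:

// 1-minute tumbling windows, shifted by 10 seconds; windows still do not overlap
keyed.window(TumblingProcessingTimeWindows.of(Time.minutes(1), Time.seconds(10)));

// 1-minute windows evaluated every 10 seconds; windows overlap
keyed.window(SlidingProcessingTimeWindows.of(Time.minutes(1), Time.seconds(10)));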

Feel free to open a PR with a better comment.

Thanks,
Bowen



On Sun, May 27, 2018 at 12:43 PM, Dhruv Kumar  wrote:

> Hi
>
> I was looking at TumblingProcessingTimeWindows.java and
> was a bit confused with the documentation at the start of this class. It
> says the following:
>
> /**
> * A {@link WindowAssigner} that windows elements into windows based on
> the current
> * system time of the machine the operation is running on. Windows cannot
> overlap.
> *
> * For example, in order to window into windows of 1 minute, every 10
> seconds:
> * <pre> {@code
> * DataStream<Tuple2<String, Integer>> in = ...;
> * KeyedStream<Tuple2<String, Integer>, String> keyed = in.keyBy(...);
> * WindowedStream<Tuple2<String, Integer>, String, TimeWindows> windowed =
> *   keyed.window(TumblingProcessingTimeWindows.of(Time.of(1, MINUTES),
> Time.of(10, SECONDS));
> * } </pre>
> */
> It says one can have tumbling windows of 1 minute, every 10 seconds.
> Doesn’t this become a sliding window then? SlidingProcessingTimeWindows.java
> has
> the exact same documentation with just one tiny change (“Windows can
> possibly overlap”). It seems to me that in the above documentation, the
> second Time argument of 10 seconds is for providing the window offset (as
> confirmed here)
> and not for starting the tumbling window every 10 seconds.
>
> Thanks
>
>
> --
> *Dhruv Kumar*
> PhD Candidate
> Department of Computer Science and Engineering
> University of Minnesota
> www.dhruvkumar.me
>
>