Re: Sending watermarks into Kafka

2021-12-21 Thread Matthias J. Sax
Your high-level layout makes sense. However, I think there are a few problems doing it with Flink:


(1) How to encode the watermark? It could break downstream consumers that don't know what to do with it (e.g., crash on deserialization). There is no guarantee that only a downstream Flink job consumes the data (nor that the downstream Flink job was upgraded to understand those input watermarks).


Using a control message, the downstream KafkaConsumer would "filter" control messages / watermarks automatically, and users would opt in explicitly, thus providing a safe and backward-compatible upgrade path.
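
Without such broker-side support, every consumer has to do the filtering by hand today, roughly along these lines (just a sketch; the header name is whatever the producer chose, and processWatermark/processData stand for the application logic):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;

    // Manual filtering on the consumer side; this is exactly the work that
    // broker-level control messages would make unnecessary.
    void handle(ConsumerRecords<byte[], byte[]> records) {
        for (ConsumerRecord<byte[], byte[]> record : records) {
            if (record.headers().lastHeader("x-flink-watermark") != null) { // illustrative name
                processWatermark(record.value());
            } else {
                processData(record.value());
            }
        }
    }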



(2) About all the metadata: it would be hard for Flink to track all those things; it would be simpler to push them into the storage layer IMHO. For example, if Flink scales dynamically, it adds a new producer to the group and Kafka takes care of the rest. Thus, Flink only needs to provide the actual watermark timestamp.


One particular problem is the error-handling scenario: what happens if a producer fails and a downstream watermark is missing? What happens if Flink thinks a worker is dead and replaces it with a different one, but the worker is actually a zombie and might still write watermarks into the topic? Flink cannot fence off the zombie. -- Pushing this into Kafka makes those cases much easier to handle IMHO.



(3) About ordering: Kafka provides strict ordering guarantees per partition. Thus, there is no problem here. Of course, if there are multiple producers writing into the same partition, you get interleaved writes (that's why you need to know how many producers you have, to be able to reason about it downstream).



Hope this helps.

-Matthias




On 12/21/21 5:13 AM, Niels Basjes wrote:

Hi,

Like I said, I've only just started thinking about how this can be implemented (I'm currently still lacking a lot of knowledge).
So at this point I do not yet see why solving this in the transport (like Kafka) is easier than solving it in the processing engine (like Flink).
In the normal scenarios we have today, all watermarks are (re)created in the processing engine, so instinctively I would expect that to be the "right place".

Also, as far as I can see right now, in order to make this happen the watermarks must all be annotated with things like the applicationId (to handle multiple producers), the timestamp (duh), the taskId and the total number of tasks in the producing system: so the producers or the broker must attach this information to the watermarks.
It should also be able to handle dynamic scaling of producing applications, and the entering and leaving of producers on a topic is another thing to consider.
[Reading this back: is this the reason why it would be easier in the transport?]

I do realize that even if this is implemented in the processing engine, some constraints may be needed to allow this to work, like having some kind of ordering guarantee per partition in a topic.

Do you guys know of any article/blog/paper/mail discussion/... that
describes/discusses this?

Niels

On Mon, Dec 20, 2021 at 4:35 PM Matthias J. Sax 
wrote:


I think this problem should be tackled inside Kafka, not Flink.

Kafka already has internal control messages to write transaction
markers. Those could be extended to carry watermark information. It
would be best to generalize those as "user control messages" and
watermarks could just be one application.

In addition, we might need something like a "producer group" to track how many producers are writing into a partition: this would allow informing downstream consumers how many different watermarks they need to track.

It's not an easy problem to solve, but trying to solve it at the processing layer without integrating with the storage layer is even harder.

-Matthias

On 12/20/21 01:57, Niels Basjes wrote:

I'm reading the Pulsar PIP and noticed another thing to take into account: multiple applications (each with a different parallelism) that all write into the same topic.

On Mon, 20 Dec 2021, 10:45 Niels Basjes,  wrote:


Hi Till,

This morning I also realized what you call an 'effective watermark' is
indeed what is needed.
I'm going to read up on what Pulsar has planned.

What I realized is that the consuming application must be aware of the
parallelism of the producing application, which is independent of the
partitions in the intermediate transport.

Assume I produce with parallelism 2 into 5 Kafka partitions, which I then read with parallelism 3; then in the consuming (parallelism 3) application I must wait for watermarks from each original input (i.e. 2 of them) before I can continue.
Also we must assume that those watermarks are created at different timestamps.
So my current assessment is that the watermark records must include at least the timestamp, the number of the thread for this watermark and the total number of threads.

Niels


On Mon, Dec 20, 2021 at 10:10 AM Till Rohrmann 

Re: Sending watermarks into Kafka

2021-12-20 Thread Matthias J. Sax

I think this problem should be tackled inside Kafka, not Flink.

Kafka already has internal control messages to write transaction 
markers. Those could be extended to carry watermark information. It 
would be best to generalize those as "user control messages" and 
watermarks could just be one application.


In addition, we might need something like a "producer group" to track how many producers are writing into a partition: this would allow informing downstream consumers how many different watermarks they need to track.


It's not an easy problem to solve, but trying to solve it at the processing layer without integrating with the storage layer is even harder.


-Matthias

On 12/20/21 01:57, Niels Basjes wrote:

I'm reading the Pulsar PIP and noticed another thing to take into account: multiple applications (each with a different parallelism) that all write into the same topic.

On Mon, 20 Dec 2021, 10:45 Niels Basjes,  wrote:


Hi Till,

This morning I also realized what you call an 'effective watermark' is
indeed what is needed.
I'm going to read up on what Pulsar has planned.

What I realized is that the consuming application must be aware of the
parallelism of the producing application, which is independent of the
partitions in the intermediate transport.

Assume I produce with parallelism 2 into 5 Kafka partitions, which I then read with parallelism 3; then in the consuming (parallelism 3) application I must wait for watermarks from each original input (i.e. 2 of them) before I can continue.
Also we must assume that those watermarks are created at different timestamps.
So my current assessment is that the watermark records must include at least the timestamp, the number of the thread for this watermark and the total number of threads.
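
As a sketch (all names are made up, nothing like this exists yet), such a watermark record and the consumer-side bookkeeping could look roughly like this:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.OptionalLong;

    // Hypothetical payload of a watermark record written next to the data.
    class WatermarkMarker {
        long timestamp;      // the watermark itself
        int producerIndex;   // which producing thread emitted it
        int producerCount;   // total number of producing threads
    }

    // Consumer side: the effective watermark is the minimum over the latest
    // watermark seen from every producing thread.
    class EffectiveWatermarkTracker {
        private final Map<Integer, Long> latestPerProducer = new HashMap<>();

        OptionalLong update(WatermarkMarker marker) {
            latestPerProducer.put(marker.producerIndex, marker.timestamp);
            if (latestPerProducer.size() < marker.producerCount) {
                return OptionalLong.empty(); // not all producing threads have reported yet
            }
            return latestPerProducer.values().stream()
                    .mapToLong(Long::longValue)
                    .min();
        }
    }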

Niels


On Mon, Dec 20, 2021 at 10:10 AM Till Rohrmann 
wrote:


Hi Niels,

if you have multiple inputs going into a single Kafka partition then you
have to calculate the effective watermark by looking at the min watermark
from all inputs. You could insert a Flink operator that takes care of it
and then writes to a set of partitions in a 1:n relationship. Alternatively,
you could take a look at Pulsar, which wants to support this functionality
out of the box [1].

[1] https://github.com/apache/pulsar/issues/12267

Cheers,
Till

On Sun, Dec 19, 2021 at 4:46 PM Niels Basjes  wrote:


Hi,

About a year ago I spoke at the Flink Forward conference ( https://www.youtube.com/watch?v=wqRDyrE3dwg ) about handling development problems regarding streaming applications and handling the lack of events in a stream.
Something I spoke about towards the end of this talk was the idea to ship the watermarks of a Flink topology into the intermediate transport between applications so you wouldn't need to recreate them.

At that time it was just an idea, today I'm actually trying to build that and see if this idea is actually possible.

So the class of applications I work on usually do a keyBy on something like a SessionId, SensorId or IP address.
In low traffic scenarios this means that in Kafka some partitions are completely idle, which makes Windows/GroupBy type operations impossible (in my talk I explain it a lot better).

I have a test setup right now to play around with this and I'm running into a bit of a conceptual hurdle for which I'm looking for help.

My goal is to ship the watermarks from within a topology into Kafka and then let a follow-up application extract those watermarks again and simply continue.
The new SinkWriter interface has a void writeWatermark(Watermark watermark) method that seems intended for this kind of thing.
The basic operations like writing a watermark into Kafka, reading it again and then recreating the watermark work in my test setup (very messy code, but it works).
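
Roughly, the producing side of that experiment has this shape (a heavily simplified sketch, not the actual code; the header name is made up): from the SinkWriter's writeWatermark(Watermark) hook I broadcast the watermark to every partition of the topic, because a watermark has no key, and I mark the record with a header so a consumer can tell it apart from data records.

    import java.nio.ByteBuffer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    class WatermarkBroadcaster {

        // Send one copy of the watermark to every partition of the topic.
        void broadcastWatermark(KafkaProducer<byte[], byte[]> producer, String topic,
                                long watermarkTimestamp, int producerIndex, int producerCount) {
            int partitions = producer.partitionsFor(topic).size();
            byte[] payload = serialize(watermarkTimestamp, producerIndex, producerCount);
            for (int p = 0; p < partitions; p++) {
                ProducerRecord<byte[], byte[]> record = new ProducerRecord<>(topic, p, null, payload);
                record.headers().add("x-flink-watermark", new byte[] {1}); // illustrative header name
                producer.send(record);
            }
        }

        // Placeholder encoding: timestamp + producer index + producer count.
        private byte[] serialize(long timestamp, int producerIndex, int producerCount) {
            return ByteBuffer.allocate(16)
                    .putLong(timestamp)
                    .putInt(producerIndex)
                    .putInt(producerCount)
                    .array();
        }
    }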

My hurdle has to do with the combination of
- different parallelism numbers between Flink and Kafka (how do I ship 2
watermarks into 3 partitions)
- the fact that if you do a keyBy (both in Flink and Kafka) there is a
likely mismatch between the Flink 'partition' and the Kafka `partition`.
- processing speed differences between various threads (like session "A"
needs more CPU cycles/time/processing than session "B") will lead to
skewing of the progression between them.
- watermarks in separate threads in a single Flink topology are not
synchronized (they cannot and should not be).

Has anyone any pointers on possible ways to handle this?

Right now my only idea is to ship the watermark into all partitions (as they do not have a key!) and let the consuming application determine the "real watermark" based on the mix of watermarks coming in from the upstream threads.

All suggestions and ideas are appreciated.

--
Best regards / Met vriendelijke groeten,

Niels Basjes






--
Best regards / Met vriendelijke groeten,

Niels Basjes





Re: [ANNOUNCE] New PMC member: Arvid Heise

2021-06-16 Thread Matthias J. Sax
Congrats!

On 6/16/21 6:06 AM, Leonard Xu wrote:
> Congratulations, Arvid!
> 
> 
>> 在 2021年6月16日,20:08,Till Rohrmann  写道:
>>
>> Congratulations, Arvid!
>>
>> Cheers,
>> Till
>>
>> On Wed, Jun 16, 2021 at 1:47 PM JING ZHANG  wrote:
>>
>>> Congratulations, Arvid!
>>>
>>> Nicholas Jiang  于2021年6月16日周三 下午7:25写道:
>>>
 Congratulations, Arvid!



 --
 Sent from:
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/

>>>
> 


Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-05 Thread Matthias J. Sax

Seems, this will be fixed today:

https://twitter.com/natfriedman/status/1235613840659767298?s=19


-Matthias

On 3/5/20 8:37 AM, Stephan Ewen wrote:
> It looks like this feature still messes up email addresses, for
> example if you do a "git log | grep noreply" in the repo.
>
> Don't most PRs consist anyways of multiple commits where we want
> to preserve "refactor" and "feature" differentiation in the
> history, rather than squash everything?
>
> On Thu, Mar 5, 2020 at 4:54 PM Piotr Nowojski 
> wrote:
>
>> Hi,
>>
>> If it’s really not preserving ownership (I didn’t notice the
>> problem before), +1 for removing “squash and merge”.
>>
>> However -1 for removing “rebase and merge”. I didn’t see any
>> issues with it and I’m using it constantly.
>>
>> Piotrek
>>
>>> On 5 Mar 2020, at 16:40, Jark Wu  wrote:
>>>
>>> Hi all,
>>>
>>> Thanks for the feedbacks. But I want to clarify the motivation
>>> to disable "Squash and merge" is just because of the
>>> regression/bug of the missing author information. If GitHub
>>> fixes this later, I think it makes sense to bring this button
>>> back.
>>>
>>> Hi Stephan & Zhijiang,
>>>
>>> To be honest, I love the "Squash and merge" button and often
>>> use it. It saves me a lot of time to merge PRs, because pulling
>>> and pushing commits
>> in
>>> China is very unstable.
>>>
>>> I don't think the potential problems you mentioned is a
>>> "problem". For "Squash and merge", - "Merge commits": there is
>>> no "merge" commits, because GitHub will
>> squash
>>> commits and rebase the commit and then add to the master
>>> branch. - "This closes #" line to track back: when you
>>> click "Squash and merge", it allows you to edit the title and
>>> description, so you can add "This closes #" message to the
>>> description the same with in the local git. Besides, GitHub
>>> automatically append "(#)" after the
>> title,
>>> which is also helpful to track.
>>>
>>> Best, Jark
>>>
>>> On Thu, 5 Mar 2020 at 23:36, Robert Metzger
>>>  wrote:
>>>
 +1 for disabling this feature for now.

 Thanks a lot for spotting this!

 On Thu, Mar 5, 2020 at 3:54 PM Zhijiang
  wrote:

> +1 for disabling "Squash and merge" if feasible to do
> that.
>
> The possible benefit to use this button is for saving some
> efforts to squash some intermediate "[fixup]" commits
> during PR review. But it would bring more potential
> problems as mentioned below, missing author information and
> message of "This closes #", etc. Even it might cause
> unexpected format of long commit content
>> description
> if not handled carefully in the text box.
>
> Best, Zhijiang
>
>
> --
>
>
From:tison 
> Send Time:2020 Mar. 5 (Thu.) 21:34 To:dev
>  Subject:Re: [DISCUSS] Disable
> "Squash and merge" button for Flink repository on GitHub
>
> Hi Yadong,
>
> Maybe we firstly reach out INFRA team and see the reply
> from their
>> side.
>
> Since the actual operator is INFRA team, in the dev mailing
> list we can focus on motivation and wait for the reply.
>
> Best, tison.
>
>
> Yadong Xie  于2020年3月5日周四 下午9:29写道:
>
>> Hi Jark
>>
>> I think GitHub UI can not disable both the "Squash and merge" button and
>> "Rebase and merge" at the same time if there exists any protected branch in
>> the repository (according to GitHub rules).
>>
>> If we only leave the "merge commits" button, it will go against the
>> "require a linear commit history" rule here:
>>
>> https://help.github.com/en/github/administering-a-repository/requiring-a-linear-commit-history
>>
>>
>>
tison  于2020年3月5日周四 下午9:04写道:
>>
>>> For implement it, file a JIRA ticket in INFRA [1]
>>>
>>> Best, tison. [1]
>>> https://issues.apache.org/jira/projects/INFRA
>>>
>>>
>>> Stephan Ewen  于2020年3月5日周四 下午8:57写道:
>>>
 Big +1 to disable it.

 I have never been a fan, it has always caused
 problems: - Merge commits - weird alias emails - lost
 author information - commit message misses the "This
 closes #" line to track
 back
 commits to PRs/reviews.

 The button goes against best practice, it should go
 away.

 Best, Stephan


 On Thu, Mar 5, 2020 at 1:51 PM Yadong Xie
 
> wrote:

> Hi Jark There is a conversation about this here:
>
>

>>>
>>
>

>> https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797
>
>>
I think GitHub will fix it soon, it is a bug, not a feature :).
>
> Jingsong Li  于2020年3月5日周四 下

Re: [ANNOUNCE] New Flink PMC member Thomas Weise

2019-02-12 Thread Matthias J. Sax

Congrats!

On 2/12/19 3:26 AM, Dian Fu wrote:
> Congrats Thomas!
> 
> Regards, Dian
>> 在 2019年2月12日,下午6:58,Hequn Cheng  写道:
>> 
>> Congrats Thomas!
>> 
>> Best, Hequn
>> 
>> 
>> On Tue, Feb 12, 2019 at 6:53 PM Stefan Richter
>> > > wrote: Congrats Thomas!,
>> 
>> Best, Stefan
>> 
>>> Am 12.02.2019 um 11:20 schrieb Stephen Connolly
>>> >> >:
>>> 
>>> Congratulations to Thomas. I see that this is not his first
>>> time in the PMC rodeo... also somebody needs to update LDAP as
>>> he's not on https://people.apache.org/phonebook.html?pmc=flink
>>>  yet!
>>> 
>>> -stephenc
>>> 
>>> On Tue, 12 Feb 2019 at 09:59, Fabian Hueske >> > wrote: Hi everyone,
>>> 
>>> On behalf of the Flink PMC I am happy to announce Thomas Weise
>>> as a new member of the Apache Flink PMC.
>>> 
>>> Thomas is a long time contributor and member of our community.
>>>  He is starting and participating in lots of discussions on our
>>> mailing lists, working on topics that are of joint interest of
>>> Flink and Beam, and giving talks on Flink at many events.
>>> 
>>> Please join me in welcoming and congratulating Thomas!
>>> 
>>> Best, Fabian
>> 
> 
> 


Re: [ANNOUNCE] New committer Gary Yao

2018-09-07 Thread Matthias J. Sax
Congrats!

On 09/07/2018 08:15 AM, Timo Walther wrote:
> Congratulations, Gary!
> 
> Timo
> 
> 
> Am 07.09.18 um 16:46 schrieb Ufuk Celebi:
>> Great addition to the committers. Congrats, Gary!
>>
>> – Ufuk
>>
>>
>> On Fri, Sep 7, 2018 at 4:45 PM, Kostas Kloudas
>>  wrote:
>>> Congratulations Gary! Well deserved!
>>>
>>> Cheers,
>>> Kostas
>>>
 On Sep 7, 2018, at 4:43 PM, Fabian Hueske  wrote:

 Congratulations Gary!

 2018-09-07 16:29 GMT+02:00 Thomas Weise :

> Congrats, Gary!
>
> On Fri, Sep 7, 2018 at 4:17 PM Dawid Wysakowicz
> 
> wrote:
>
>> Congratulations Gary! Well deserved!
>>
>> On 07/09/18 16:00, zhangmingleihe wrote:
>>> Congrats Gary!
>>>
>>> Cheers
>>> Minglei
>>>
 在 2018年9月7日,下午9:59,Andrey Zagrebin
  写道:

 Congratulations Gary!

> On 7 Sep 2018, at 15:45, Stefan Richter
> > wrote:
> Congrats Gary!
>
>> Am 07.09.2018 um 15:14 schrieb Till Rohrmann
>> > :
>> Hi everybody,
>>
>> On behalf of the PMC I am delighted to announce Gary Yao as a new
>> Flink
>> committer!
>>
>> Gary started contributing to the project in June 2017. He helped
> with
>> the
>> Flip-6 implementation, implemented many of the new REST handlers,
>> fixed
>> Mesos issues and initiated the Jepsen-based distributed test
>> suite
>> which
>> uncovered several serious issues. Moreover, he actively helps
>> community
>> members on the mailing list and with PR reviews.
>>
>> Please join me in congratulating Gary for becoming a Flink
> committer!
>> Cheers,
>> Till
>>
> 





Re: [ANNOUNCE] New committer Piotr Nowojski

2018-06-22 Thread Matthias J. Sax

Congrats!

On 6/22/18 12:28 PM, Stefan Richter wrote:
> Congrats Piotr!
> 
>> Am 22.06.2018 um 21:26 schrieb Till Rohrmann 
>> :
>> 
>> Hi everybody,
>> 
>> On behalf of the PMC I am delighted to announce Piotr Nowojski as
>> a new Flink committer!
>> 
>> Piotr has been an active member of our community for more than a
>>  year. Among other things, he contributed the TwoPhaseCommitSink,
>>  worked extensively on improving Flink's network stack and is now
>>  contributing to stream SQL. He is also helping the community by
>>  reviewing PRs, answering questions and driving discussions on 
>> the mailing list.
>> 
>> Please join me in congratulating Piotr for becoming a Flink 
>> committer!
>> 
>> Cheers, Till
> 


Re: [ANNOUNCE] New committer: Sihua Zhou

2018-06-22 Thread Matthias J. Sax

Congrats!

On 6/22/18 10:33 AM, shimin yang wrote:
> Congrats!
> 
> On Sat, Jun 23, 2018 at 1:13 AM Chen Qin  
> wrote:
> 
>> Congrats!
>> 
>>> On Jun 22, 2018, at 9:48 AM, Ted Yu  
>>> wrote:
>>> 
>>> Congratulations Sihua!
>>> 
 On Fri, Jun 22, 2018 at 6:42 AM, zhangminglei 
 <18717838...@163.com>
>> wrote:
 
 Congrats! Sihua
 
 Cheers Minglei.
 
> 在 2018年6月22日,下午9:17,Till Rohrmann  写 
> 道:
> 
> Hi everybody,
> 
> On behalf of the PMC I am delighted to announce Sihua Zhou 
> as a new
>> Flink
> committer!
> 
> Sihua has been an active member of our community for 
> several months.
 Among
> other things, he helped developing Flip-6, improved
> Flink's state
 backends
> and fixed a lot of major and minor issues. Moreover, he is 
> helping the Flink community reviewing PRs, answering users 
> on the mailing list and proposing new features.
> 
> Please join me in congratulating Sihua for becoming a
> Flink committer!
> 
> Cheers, Till
 
 
 
>> 
> 


Re: [ANNOUNCE] Two new committers: Xingcan Cui and Nico Kruber

2018-05-08 Thread Matthias J. Sax

Congrats!

On 5/8/18 12:28 PM, Shuyi Chen wrote:
> Congratulations!
> 
> On Tue, May 8, 2018 at 12:18 PM, Dawid Wysakowicz < 
> wysakowicz.da...@gmail.com> wrote:
> 
>> Congratulations Nico and Xingcan! Well deserved!
>> 
>> 
>> On 08.05.2018 20:52, Fabian Hueske wrote:
>>> Hi everyone,
>>> 
>>> I'm happy to announce that two members of the Flink community
>>> accepted
>> the
>>> offer of the PMC to become committers.
>>> 
>>> * Xingcan Cui has been contributing to Flink for about a year,
>>> focusing
>> on
>>> Flink's relational APIs (SQL & Table API). In the past year,
>>> Xingcan has started design discussions, helped reviewing
>>> several pull requests, and replied to questions on the user
>>> mailing list.
>>> 
>>> * Nico Kruber is an active contributor since 1.5 years and
>>> worked mostly
>> on
>>> internal features, such as the blob manager and a new network
>>> stack. Nico answers many questions on the user mailing list,
>>> reports lots of bugs and is a very active PR reviewer.
>>> 
>>> Please join me in congratulating Xingcan and Nico.
>>> 
>>> Cheers, Fabian
>>> 
>> 
>> 
>> 
> 
> 


Re: [ANNOUNCE] New committer: Haohui Mai

2017-11-02 Thread Matthias J. Sax

Congrats!

On 11/1/17 9:40 PM, Dawid Wysakowicz wrote:
> Congratulations!
> 
> 01.11.2017 7:45 PM "Stephan Ewen"  napisał(a):
> 
>> Congrats and welcome!
>> 
>> On Wed, Nov 1, 2017 at 6:47 PM, Chen Qin 
>> wrote:
>> 
>>> Congratulations!
>>> 
>>> On Wed, Nov 1, 2017 at 2:41 AM, Aljoscha Krettek
>>>  wrote:
>>> 
 Congratulations! 
 
> On 1. Nov 2017, at 10:13, Shaoxuan Wang
>  wrote:
> 
> Congratulations!
> 
> On Wed, Nov 1, 2017 at 4:36 PM, Till Rohrmann
> 
 wrote:
> 
>> Congrats and welcome on board :-)
>> 
>> On Wed, Nov 1, 2017 at 9:14 AM, Fabian Hueske
>> 
 wrote:
>> 
>>> Hi everybody,
>>> 
>>> On behalf of the PMC I am delighted to announce Haohui
>>> Mai as a new
 Flink
>>> committer!
>>> 
>>> Haohui has been an active member of our community for
>>> several
>> months.
>>> Among other things, he made major contributions in
>>> ideas and code
>> to
 the
>>> SQL and Table APIs.
>>> 
>>> Please join me in congratulating Haohui for becoming a
>>> Flink
>>> committer!
>>> 
>>> Cheers, Fabian
>>> 
>> 
 
 
>>> 
>> 
> 


Re: [ANNOUNCE] New Flink PMC member: Chesnay Schepler

2017-07-28 Thread Matthias J. Sax

Congrats!

On 7/28/17 4:40 PM, Fabian Hueske wrote:
> Congrats and welcome Chesnay!
> 
> 2017-07-28 16:34 GMT+02:00 Kostas Kloudas
> :
> 
>> Congratulations Chesnay!
>> 
>>> On Jul 28, 2017, at 4:05 PM, Greg Hogan 
>>> wrote:
>>> 
>>> Developers,
>>> 
>>> On behalf of the Flink PMC I am delighted to announce Chesnay
>>> Schepler
>> as a member of the Flink PMC.
>>> 
>>> Chesnay is a longtime contributor, reviewer, and committer
>>> whose breadth
>> of work and knowledge covers nearly the entire codebase.
>>> 
>>> Please join me in congratulating Chesnay and welcoming him to
>>> bind his
>> votes, validate licenses, and sign releases!
>>> 
>>> Regards, Greg
>> 
>> 
> 


Re: [ANNOUNCE] New Flink committer Jincheng Sun

2017-07-10 Thread Matthias J. Sax

Congrats!

On 7/10/17 6:42 AM, Ted Yu wrote:
> Congratulations, Jincheng.
> 
> On Mon, Jul 10, 2017 at 6:17 AM, Fabian Hueske 
> wrote:
> 
>> Hi everybody,
>> 
>> On behalf of the PMC, I'm very happy to announce that Jincheng
>> Sun has accepted the invitation of the PMC to become a Flink
>> committer.
>> 
>> Since more than nine month, Jincheng is one of the most active
>> contributors of the Table API / SQL module. He has contributed
>> several major features, reported and fixed many bugs, and also
>> spent a lot of time reviewing pull requests.
>> 
>> Please join in me congratulating Jincheng for becoming a Flink
>> committer.
>> 
>> Thanks, Fabian
>> 
> 


Re: [ANNOUNCE] New Flink PMC member: Tzu-Li (Gordon) Tai

2017-07-10 Thread Matthias J. Sax

Congrats!

On 7/10/17 8:03 AM, jincheng sun wrote:
> Hi Gordon, Congratulations !!!
> 
> Cheers, Jincheng
> 
> 2017-07-10 22:44 GMT+08:00 Robert Metzger :
> 
>> Hi Everybody,
>> 
>> On behalf of the Flink PMC, I'm very excited to announce Gordon
>> as the latest addition to the Flink PMC.
>> 
>> Gordon is a very active community member, helping with a lot of
>> the release tasks, project discussions and of course work on the
>> codebase itself.
>> 
>> 
>> Regards, Robert
>> 
> 


Re: [ANNOUNCE] New committer: Dawid Wysakowicz

2017-06-19 Thread Matthias J. Sax

Congrats!

-Matthias

On 6/19/17 3:52 AM, Dawid Wysakowicz wrote:
> Thank you all for the warm welcome. I will do my best to be as
> helpful as possible.
> 


Re: [ANNOUNCE] New committer: Theodore Vasiloudis

2017-03-21 Thread Matthias J. Sax

Congrats!

On 3/21/17 8:59 AM, Greg Hogan wrote:
> Welcome, Theo, and great to have you onboard with Flink and ML!
> 
> 
>> On Mar 21, 2017, at 4:35 AM, Robert Metzger 
>> wrote:
>> 
>> Hi everybody,
>> 
>> On behalf of the PMC I am delighted to announce Theodore
>> Vasiloudis as a new Flink committer!
>> 
>> Theo has been a community member for a very long time and he is
>> one of the main drivers of the currently ongoing ML discussions
>> in Flink.
>> 
>> 
>> Welcome Theo and congratulations again for becoming a Flink
>> committer!
>> 
>> 
>> Regards, Robert
> 


Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-07 Thread Matthias J. Sax

Congrats!

On 2/7/17 12:55 PM, Haohui Mai wrote:
> Congratulations!
> 
> On Tue, Feb 7, 2017 at 12:17 PM Fabian Hueske 
> wrote:
> 
>> Hi everybody,
>> 
>> I'm very happy to announce that Jark Wu and Kostas Kloudas
>> accepted the invitation of the Flink PMC to become committers of
>> the Apache Flink project.
>> 
>> Jark and Kostas are longtime members of the Flink community. Both
>> are actively driving Flink's development and contributing to its 
>> community in many ways.
>> 
>> Please join me in welcoming Kostas and Jark as committers.
>> 
>> Fabian
>> 
> 


Re: Some thoughts about the lower-level Flink APIs

2016-08-15 Thread Matthias J. Sax
It really depends on the skill level of the developer. Using the low-level API requires thinking about many details (e.g., state handling) that could be done wrong.

As Flink gets a broader community, more people will use it who might not have the required skill level to deal with the low-level API. For more experienced users, it is of course a powerful tool!

I guess it boils down to the question of what type of developer Flink targets, and whether the low-level API should be advertised proactively or not. Also keep in mind that many people criticized Storm's low-level API as hard to program.
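
For reference, the kind of low-level operator Jamie describes below looks roughly like this (just a sketch, not compile-checked, and details differ between Flink versions; fault-tolerant state and timers are omitted):

    import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
    import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
    import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

    // A custom operator that is wired into the pipeline via DataStream#transform().
    class CountingOperator extends AbstractStreamOperator<Long>
            implements OneInputStreamOperator<String, Long> {

        private long count; // plain field, i.e. not fault tolerant in this sketch

        @Override
        public void processElement(StreamRecord<String> element) throws Exception {
            count++;
            output.collect(new StreamRecord<>(count));
        }
    }

    // Usage:
    // DataStream<Long> counts =
    //     input.transform("counter", TypeInformation.of(Long.class), new CountingOperator());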


-Matthias

On 08/15/2016 07:46 AM, Gyula Fóra wrote:
> Hi Jamie,
> 
> I agree that it is often much easier to work on the lower level APIs if you
> know what you are doing.
> 
> I think it would be nice to have very clean abstractions on that level so
> we could teach this to the users first, but currently I think it's not easy
> enough to be a good starting point.
> 
> The user needs to understand a lot about the system if they don't want to
> hurt other parts of the pipeline. For instance working with the
> stream records, propagating watermarks, working with state internals.
> 
> This all might be overwhelming at the first glance. But maybe we can slim
> some abstractions down to the point where this becomes kind of the
> extension of the RichFunctions.
> 
> Cheers,
> Gyula
> 
> On Sat, Aug 13, 2016, 17:48 Jamie Grier  wrote:
> 
>> Hey all,
>>
>> I've noticed a few times now when trying to help users implement particular
>> things in the Flink API that it can be complicated to map what they know
>> they are trying to do onto higher-level Flink concepts such as windowing or
>> Connect/CoFlatMap/ValueState, etc.
>>
>> At some point it just becomes easier to think about writing a Flink
>> operator yourself that is integrated into the pipeline with a transform()
>> call.
>>
>> It can just be easier to think at a more basic level.  For example I can
>> write an operator that can consume one or two input streams (should
>> probably be N), update state which is managed for me fault tolerantly, and
>> output elements or setup timers/triggers that give me callbacks from which
>> I can also update state or emit elements.
>>
>> When you think at this level you realize you can program just about
>> anything you want.  You can create whatever fault-tolerant data structures
>> you want, and easily execute robust stateful computation over data streams
>> at scale.  This is the real technology and power of Flink IMO.
>>
>> Also, at this level I don't have to think about the complexities of
>> windowing semantics, learn as much API, etc.  I can easily have some inputs
>> that are broadcast, others that are keyed, manage my own state in whatever
>> data structure makes sense, etc.  If I know exactly what I actually want to
>> do I can just do it with the full power of my chosen language, data
>> structures, etc.  I'm not "restricted" to trying to map everything onto
>> higher-level Flink constructs which is sometimes actually more complicated.
>>
>> Programming at this level is actually fairly easy to do but people seem a
>> bit afraid of this level of the API.  They think of it as low-level or
>> custom hacking..
>>
>> Anyway, I guess my thought is this..  Should we explain Flink to people at
>> this level *first*?  Show that you have nearly unlimited power and
>> flexibility to build what you want *and only then* from there explain the
>> higher level APIs they can use *if* those match their use cases well.
>>
>> Would this better demonstrate to people the power of Flink and maybe
>> *liberate* them a bit from feeling they have to map their problem onto a
>> more complex set of higher level primitives?  I see people trying to
>> shoe-horn what they are really trying to do, which is simple to explain in
>> english, onto windows, triggers, CoFlatMaps, etc, and this get's
>> complicated sometimes.  It's like an impedance mismatch.  You could just
>> solve the problem very easily programmed in straight Java/Scala.
>>
>> Anyway, it's very easy to drop down a level in the API and program whatever
>> you want but users don't seem to *perceive* it that way.
>>
>> Just some thoughts...  Any feedback?  Have any of you had similar
>> experiences when working with newer Flink users or as a newer Flink user
>> yourself?  Can/should we do anything to make the *lower* level API more
>> accessible/visible to users?
>>
>> -Jamie
>>
> 





Re: [ANNOUNCE] Flink 1.1.0 Released

2016-08-09 Thread Matthias J. Sax
Congrats!

On 08/09/2016 09:26 AM, Fabian Hueske wrote:
> Thanks Ufuk and everybody who contributed to the release!
> 
> Cheers, Fabian
> 
> 2016-08-08 20:41 GMT+02:00 Henry Saputra :
> 
>> Great work all. Great Thanks to Ufuk as RE :)
>>
>> On Monday, August 8, 2016, Stephan Ewen  wrote:
>>
>>> Great work indeed, and big thanks, Ufuk!
>>>
>>> On Mon, Aug 8, 2016 at 6:55 PM, Vasiliki Kalavri <
>>> vasilikikala...@gmail.com >
>>> wrote:
>>>
 yoo-hoo finally announced 
 Thanks for managing the release Ufuk!

 On 8 August 2016 at 18:36, Ufuk Celebi >
>>> wrote:

> The Flink PMC is pleased to announce the availability of Flink 1.1.0.
>
> On behalf of the PMC, I would like to thank everybody who contributed
> to the release.
>
> The release announcement:
> http://flink.apache.org/news/2016/08/08/release-1.1.0.html
>
> Release binaries:
> http://apache.openmirror.de/flink/flink-1.1.0/
>
> Please update your Maven dependencies to the new 1.1.0 version and
> update your binaries.
>
> – Ufuk
>

>>>
>>
> 





Re: Contribute to Flink

2016-07-08 Thread Matthias J. Sax
Hi Kevin,

welcome to the Flink community!

Did you have a look into the web-page:
https://flink.apache.org/how-to-contribute.html

If you have follow-up questions, just go for it

-Matthias

On 07/08/2016 09:12 AM, Kevin wrote:
> Hi,
> 
> I am relatively new to the development process of Apache Flink. Where
> can I start to help you developing Flink?
> 
> Kind regards,
> Kevin





Re: [DISCUSS] FLIP 1 - Flink Improvement Proposal

2016-07-08 Thread Matthias J. Sax
Done.

I replaced KIP -> FLIP and Kafka -> Flink. Still there are some things
we need to rework. Please have a look and propose changes.

https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Same for FLIP Template

https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template


-Matthias

On 07/08/2016 09:56 AM, Aljoscha Krettek wrote:
> Hi,
> I got this reply from one of the Kafka committers:
> 
> Thanks for sharing your intention to use a process similar to our KIP
> 
> process. You are more than welcome to copy the structures and docs that we
> 
> have for the KIP process. :)
> 
> Ismael
> 
> 
> So it seems we're good to go. @Matthias, since you are a Kafka contributor,
> could you maybe copy the relevant docs from the Kafka wiki to our wiki? The
> source for this page:
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> and
> the KIP template that is referenced there. If you want, I can take it from
> there and adapt it to Flink and then let everyone discuss it again.
> 
> 
> Cheers,
> 
> Aljoscha
> 
> On Thu, 7 Jul 2016 at 11:38 Aljoscha Krettek <aljos...@apache.org> wrote:
> 
>> I'll reach out to the Kafka community and ask if it's ok for us to "steal"
>> their process for this.
>>
>> On Thu, 7 Jul 2016 at 11:36 Aljoscha Krettek <aljos...@apache.org> wrote:
>>
>>> @Matthias: Yes, this is the reason why I like the KIP process and why I
>>> said "The problem with these is that a) the comments on the Google Docs are
>>> not reflected in Jira and the mailing list. There has been some very active
>>> discussion on some of the docs that most people would never notice.".
>>>
>>> On Thu, 7 Jul 2016 at 11:28 Robert Metzger <rmetz...@apache.org> wrote:
>>>
>>>> I also like the proposal. I think its an issue that Google Docs comments
>>>> are not reflected within ASF infra. Therefore, I'm +1 on discussing the
>>>> proposals on the mailing list.
>>>>
>>>> I agree that we need to clean up our wiki.
>>>>
>>>> On Thu, Jul 7, 2016 at 10:58 AM, Matthias J. Sax <mj...@apache.org>
>>>> wrote:
>>>>
>>>>> Just to point out one thing about Kafka KIPs and using the project
>>>> wiki:
>>>>>
>>>>> The wiki contains the current state of the proposal, while the
>>>>> discussion is covered over the dev-mailing list. IMHO, this makes a lot
>>>>> of sense, as people tend to follow the mailing list but not wiki
>>>>> changes. Furthermore, the mailing list tracks the whole discussion
>>>>> history, while the proposal is kept in a clean state and thus easy to
>>>> read.
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>> On 07/06/2016 10:09 PM, Aljoscha Krettek wrote:
>>>>>> Jip, that's why I referenced the Kafka process which is also in their
>>>>> wiki.
>>>>>>
>>>>>> On Wed, 6 Jul 2016 at 21:01 Stephan Ewen <se...@apache.org> wrote:
>>>>>>
>>>>>>> Yes, big +1
>>>>>>>
>>>>>>> I had actually talked about the same thing with some people as well.
>>>>>>>
>>>>>>> I am currently sketching a few FLIPs for things, like improvements
>>>> to
>>>>> the
>>>>>>> Yarn/Mesos/Kubernetes integration
>>>>>>>
>>>>>>>
>>>>>>> One thing we should do here is to actually structure the wiki a bit
>>>> to
>>>>> make
>>>>>>> it easier to find information and proposals.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 6, 2016 at 4:24 PM, Ufuk Celebi <u...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hey Aljoscha,
>>>>>>>>
>>>>>>>> thanks for this proposal. I've somehow missed it last week. I like
>>>> the
>>>>>>>> idea very much and agree with your assessment about the problems
>>>> with
>>>>>>>> the Google Doc approach.
>>>>>>>>
>>>>>>>> Regarding the process: I'm also in favour of adopting it from
>>>> Kafka. I
>>>>>>>> would not expect any problems with this, but we can post a

Re: Contributing

2016-07-08 Thread Matthias J. Sax
Done.

I replaced KIP -> FLIP and Kafka -> Flink. Still there are some things
we need to rework. Please have a look and propose changes.

https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Same for FLIP Template

https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template


-Matthias

On 07/08/2016 11:17 AM, Matthias J. Sax wrote:
> Hi Kevin,
> 
> welcome to the Flink community!
> 
> Did you have a look into the web-page:
> https://flink.apache.org/how-to-contribute.html
> 
> If you have follow-up questions, just go for it :)
> 
> -Matthias
> 
> On 07/08/2016 10:02 AM, Kevin Jacobs wrote:
>> Hi,
>>
>> I am relatively new to the development process of Apache Flink. Where
>> can I start to help you developing Flink?
>>
>> Kind regards,
>> Kevin
> 





Re: Contributing

2016-07-08 Thread Matthias J. Sax
Hi Kevin,

welcome to the Flink community!

Did you have a look into the web-page:
https://flink.apache.org/how-to-contribute.html

If you have follow-up questions, just go for it :)

-Matthias

On 07/08/2016 10:02 AM, Kevin Jacobs wrote:
> Hi,
> 
> I am relatively new to the development process of Apache Flink. Where
> can I start to help you developing Flink?
> 
> Kind regards,
> Kevin





Re: [DISCUSS] FLIP 1 - Flink Improvement Proposal

2016-07-07 Thread Matthias J. Sax
Just to point out one thing about Kafka KIPs and using the project wiki:

The wiki contains the current state of the proposal, while the discussion is carried out on the dev mailing list. IMHO, this makes a lot of sense, as people tend to follow the mailing list but not wiki changes. Furthermore, the mailing list tracks the whole discussion history, while the proposal is kept in a clean state and thus easy to read.

-Matthias


On 07/06/2016 10:09 PM, Aljoscha Krettek wrote:
> Jip, that's why I referenced the Kafka process which is also in their wiki.
> 
> On Wed, 6 Jul 2016 at 21:01 Stephan Ewen <se...@apache.org> wrote:
> 
>> Yes, big +1
>>
>> I had actually talked about the same thing with some people as well.
>>
>> I am currently sketching a few FLIPs for things, like improvements to the
>> Yarn/Mesos/Kubernetes integration
>>
>>
>> One thing we should do here is to actually structure the wiki a bit to make
>> it easier to find information and proposals.
>>
>>
>>
>>
>> On Wed, Jul 6, 2016 at 4:24 PM, Ufuk Celebi <u...@apache.org> wrote:
>>
>>> Hey Aljoscha,
>>>
>>> thanks for this proposal. I've somehow missed it last week. I like the
>>> idea very much and agree with your assessment about the problems with
>>> the Google Doc approach.
>>>
>>> Regarding the process: I'm also in favour of adopting it from Kafka. I
>>> would not expect any problems with this, but we can post a quick note
>>> to their ML.
>>>
>>> @Matthias: The name works for me. ;-)
>>>
>>> – Ufuk
>>>
>>> On Tue, Jun 28, 2016 at 10:19 PM, Matthias J. Sax <mj...@apache.org>
>>> wrote:
>>>> FLIP ?? Really? :D
>>>>
>>>> http://www.maya.tv/en/character/flip
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 06/28/2016 06:26 PM, Aljoscha Krettek wrote:
>>>>> I'm proposing to add a formal process for how we deal with (major)
>>>>> improvements to Flink and design docs. This has been mentioned several
>>>>> times recently but we never took any decisive action to actually
>>> implement
>>>>> such a process so here we go.
>>>>>
>>>>> Right now, we have Jira issues and sometimes we have design docs
>>> that we
>>>>> keep in Google Docs. Jamie recently added links to those that he could
>>> find
>>>>> on the mailing list to the Flink wiki:
>>>>> https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home.
>>> The
>>>>> problem with these is that a) the comments on the Google Docs are not
>>>>> reflected in Jira and the mailing list. There has been some very
>> active
>>>>> discussion on some of the docs that most people would never notice.
>> The
>>>>> community therefore might seem less active than it actually is. b) the
>>>>> documents are not very discoverable, if we had a clearly defined place
>>>>> where we put them and also prominently link to this on the Flink
>>> homepage
>>>>> this would greatly help people that try to find out about current
>>>>> developments.
>>>>>
>>>>> Kafka has a process like this:
>>>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>> .
>>>>> They call it KIP, for Kafka Improvement Proposal. We could either
>> adapt
>>>>> this for Flink or come up with our own process. Doing the former would
>>> save
>>>>> us a lot of time and I don't think the Kafka community would mind us
>>>>> copying their process. The subject also hints at this, our process
>>> could be
>>>>> called FLIP, for Flink Improvement Proposal.
>>>>>
>>>>> What do you think? Feedback is highly welcome. :-)
>>>>>
>>>>> Cheers,
>>>>> Aljoscha
>>>>>
>>>>
>>>
>>
> 





Re: [DISCUSS] FLIP 1 - Flink Improvement Proposal

2016-06-28 Thread Matthias J. Sax
FLIP ?? Really? :D

http://www.maya.tv/en/character/flip

-Matthias


On 06/28/2016 06:26 PM, Aljoscha Krettek wrote:
> I'm proposing to add a formal process for how we deal with (major)
> improvements to Flink and design docs. This has been mentioned several
> times recently but we never took any decisive action to actually implement
> such a process so here we go.
> 
> Right now, we have Jira issues and sometimes we have design docs that we
> keep in Google Docs. Jamie recently added links to those that he could find
> on the mailing list to the Flink wiki:
> https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home. The
> problem with these is that a) the comments on the Google Docs are not
> reflected in Jira and the mailing list. There has been some very active
> discussion on some of the docs that most people would never notice. The
> community therefore might seem less active than it actually is. b) the
> documents are not very discoverable, if we had a clearly defined place
> where we put them and also prominently link to this on the Flink homepage
> this would greatly help people that try to find out about current
> developments.
> 
> Kafka has a process like this:
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals.
> They call it KIP, for Kafka Improvement Proposal. We could either adapt
> this for Flink or come up with our own process. Doing the former would save
> us a lot of time and I don't think the Kafka community would mind us
> copying their process. The subject also hints at this, our process could be
> called FLIP, for Flink Improvement Proposal.
> 
> What do you think? Feedback is highly welcome. :-)
> 
> Cheers,
> Aljoscha
> 





Re: [Discussion] Query regarding Join

2016-06-13 Thread Matthias J. Sax
You need to do an outer join. However, there is no built-in support for
outer joins yet.

You can use a windowed CoGroup to implement the outer join yourself.
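
Adapting your example, it would look roughly like this (sketch, untested; it needs org.apache.flink.api.common.functions.CoGroupFunction and org.apache.flink.util.Collector):

    sourceStream.coGroup(destStream)
        .where(new ElementSelector())
        .equalTo(new ElementSelector())
        .window(TumblingEventTimeWindows.of(Time.milliseconds(10)))
        .apply(new CoGroupFunction<Integer, Integer, Integer>() {

            @Override
            public void coGroup(Iterable<Integer> first, Iterable<Integer> second,
                                Collector<Integer> out) {
                // Unlike join(), this is also called when one side is empty for a key,
                // so matched and unmatched elements can be told apart here and routed
                // to the sink accordingly.
                boolean matched = first.iterator().hasNext() && second.iterator().hasNext();
                for (Integer value : first) {
                    out.collect(value); // matched or left-only element
                }
                if (!matched) {
                    for (Integer value : second) {
                        out.collect(value); // right-only element
                    }
                }
            }
        }).print();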


-Matthias

On 06/13/2016 06:53 PM, Vinay Patil wrote:
> Hi,
> 
> I have a question regarding the join operation, consider the following
> dummy example:
> 
> StreamExecutionEnvironment env =
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
> DataStreamSource<Integer> sourceStream =
> env.fromElements(10,20,23,25,30,33,102,18);
> DataStreamSource<Integer> destStream = env.fromElements(20,30,40,50,60,10);
> 
> sourceStream.join(destStream)
> .where(new ElementSelector())
> .equalTo(new ElementSelector())
> .window(TumblingEventTimeWindows.of(Time.milliseconds(10)))
> .apply(new JoinFunction<Integer, Integer, Integer>() {
> 
> private static final long serialVersionUID = 1L;
> 
> @Override
> public Integer join(Integer paramIN1, Integer paramIN2) throws Exception {
> return paramIN1;
> }
> }).print();
> 
> I perfectly get the elements that are matching in both the streams; however,
> my requirement is to write these matched elements and also the unmatched
> elements to the sink (S3).
> 
> How do I get the unmatched elements from each stream ?
> 
> Regards,
> Vinay Patil
> 





Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-13 Thread Matthias J. Sax
Sounds like a good idea to me. We could include the Wikipedia article as well.

I was thinking about extending the article anyway (no time so far...); as of Flink 1.x the system is stable in large parts and it might be nice to have a high-level system description on Wikipedia, too.

-Matthias


On 05/13/2016 12:20 PM, Kostas Tzoumas wrote:
> Should we also add a component "Flink website and wiki" (minus the
> documentation) with an associated maintainer?
> 
> On Fri, May 13, 2016 at 12:17 PM, Timo Walther  wrote:
> 
>> +1 for from my side too
>>
>>
>>
>> On 13.05.2016 06:13, Chiwan Park wrote:
>>
>>> +1 for this proposal
>>>
>>
>>
>>
> 





Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Matthias J. Sax
+1 from my side.

Happy to be the maintainer for Storm compatibility (at least I guess
it's me, even though the correct spelling would be with two 't' :P)

-Matthias

On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> +1 for the proposal
> On May 12, 2016 12:13 PM, "Stephan Ewen"  wrote:
> 
>> Yes, Gabor Gevay, that did refer to you!
>>
>> Sorry for the ambiguity...
>>
>> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi >>
>> wrote:
>>
>>> +1 for the proposal
>>> @ggevay: I do think that it refers to you. :)
>>>
>>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay  wrote:
>>>
 Hello,

 There are at least three Gábors in the Flink community,  :) so
 assuming that the Gábor in the list of maintainers of the DataSet API
 is referring to me, I'll be happy to do it. :)

 Best,
 Gábor G.



 2016-05-10 11:24 GMT+02:00 Stephan Ewen :
> Hi everyone!
>
> We propose to establish some lightweight structures in the Flink open
> source community and development process,
> to help us better handle the increased interest in Flink (mailing
>> list
 and
> pull requests), while not overwhelming the
> committers, and giving users and contributors a good experience.
>
> This proposal is triggered by the observation that we are reaching
>> the
> limits of where the current community can support
> users and guide new contributors. The below proposal is based on
> observations and ideas from Till, Robert, and me.
>
> 
> Goals
> 
>
> We try to achieve the following
>
>   - Pull requests get handled in a timely fashion
>   - New contributors are better integrated into the community
>   - The community feels empowered on the mailing list.
> But questions that need the attention of someone that has deep
> knowledge of a certain part of Flink get their attention.
>   - At the same time, the committers that are knowledgeable about
>> many
 core
> parts do not get completely overwhelmed.
>   - We don't overlook threads that report critical issues.
>   - We always have a pretty good overview of what the status of
>> certain
> parts of the system are.
>   -> What are often encountered known issues
>   -> What are the most frequently requested features
>
>
> 
> Problems
> 
>
> Looking into the process, there are two big issues:
>
> (1) Up to now, we have been relying on the fact that everything just
> "organizes itself", driven by best effort. That assumes
> that everyone feels equally responsible for every part, question, and
> contribution. At the current state, this is impossible
> to maintain, it overwhelms the committers and contributors.
>
> Example: Pull requests are picked up by whoever wants to pick them
>> up.
 Pull
> requests that are a lot of work, have little
> chance of getting in, or relate to less active components are
>> sometimes
 not
> picked up. When contributors are pretty
> loaded already, it may happen that no one eventually feels
>> responsible
>>> to
> pick up a pull request, and it falls through the cracks.
>
> (2) There is no good overview of what are known shortcomings,
>> efforts,
 and
> requested features for different parts of the system.
> This information exists in various peoples' heads, but is not easily
> accessible for new people. The Flink JIRA is not well
> maintained, it is not easy to draw insights from that.
>
>
> ===
> The Proposal
> ===
>
> Since we are building a parallel system, the natural solution seems
>> to
 be:
> partition the workload ;-)
>
> We propose to define a set of components for Flink. Each component is
> maintained or tracked by one or more
> people - let's call them maintainers. It is important to note that we
 don't
> suggest the maintainers as an authoritative role, but
> simply as committers or contributors that visibly step up for a
>> certain
> component, and mainly track and drive the efforts
> pertaining to that component.
>
> It is also important to realize that we do not want to suggest that
 people
> get less involved with certain parts and components, because
> they are not the maintainers. We simply want to make sure that each
>>> pull
> request or question or contribution has in the end
> one person (or a small set of people) responsible for catching and
 tracking
> it, if it was not worked on by the pro-active
> community.
>
> For some components, having multiple maintainers will be helpful. In
>>> that
> case, one maintainer should be the "chair" or "lead"
> and make sure that no issue of that component gets lost between 

Re: Eclipse Problems

2016-04-27 Thread Matthias J. Sax
I guess removing .fromElements(Object...) would fix the problem. Not
sure, though, if we can remove the method due to API stability...

I don't see any other good solution (even if the current implementation
gives a nice behavior by accident...):

If you have a complex class hierarchy, it would be quite complex to find
out the correct common super-type. Using only .fromElements(Class,
X...) requires specifying the correct base type explicitly and has the
additional advantage that the compiler can check the type already (instead
of a potential later runtime error).


-Matthias


On 04/27/2016 03:07 PM, Till Rohrmann wrote:
> You’re completely right Mathias. The compiler shouldn’t allow something
> like env.fromElements(SubClass.class, new ParentClass()) if it weren’t for
> the overloaded method. Thus, the test case is somewhat bogus.
> 
> I’m actually wondering why the initial problem
> https://issues.apache.org/jira/browse/FLINK-3444 was solved this way. I
> think it would be better to automatically infer the common super type of
> all provided elements. Otherwise, you run into problems you’ve found out
> about.
> 
> Consequently, I think it is fine if you remove the fromElementsWithBaseType2
> test case.
> 
> Cheers,
> Till
> ​
> 
> On Wed, Apr 27, 2016 at 1:22 PM, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Hi Till,
>>
>> but StreamExecutionEnvironmentTest.fromElementWithBaseTypeTest2 does not
>> test was you describe -- even if it is intended to test it.
>>
>> It would test your describe scenario, if fromElements(Class, X...)
>> would be called, But this call is not possible because X is defined a
>> type Subclass and thus the provided object of Parentclass cannot be
>> handed over as type X. Therefore, fromElements(Object...) is called: of
>> course, this fails too, because now the type is derived as
>> Class (and not Subclass) and neither Subclass nor Parentclass
>> inherit from Class.
>>
>> The scenario you describe will never work -- if you remove the overload
>> fromElements(Object...) the code would not even compile as the compiler
>> can figure out from the generics that the call
>> fromElments(Subclass.class, new Parentclass()) is invalid.
>>
>> It is only possible to hand in "reverse inheritance types" for
>> fromElemenst(Object...). In this case, the first given Object defines
>> the type. Thus, if you call fromElements(new Subclass(), new
>> Parentclass()), the call will fail, as Parentclass is no subtype of
>> Subtype -- the call fromElements(new Parentclass() new Subclass()) would
>> succeed.
>>
>> Makes sense?
>>
>> Still no idea how to make it compile in Eclipse...
>>
>> -Matthias
>>
>> On 04/27/2016 10:21 AM, Till Rohrmann wrote:
>>> Thanks for looking into this problem Mathias. I think the Scala test
>> should
>>> be fixed as you've proposed.
>>>
>>> Concerning the
>> StreamExecutionEnvironmentTest.fromElementWithBaseTypeTest2,
>>> I think it shouldn't be changed. The reason is that the class defines the
>>> common base class of the elements. And the test makes sure that the
>>> fromElements call fails if you provide instances which are not of the
>>> specified type or a subclass of it. Thus, we should find another way to
>>> make it work with Eclipse.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Apr 26, 2016 at 9:41 PM, Matthias J. Sax <mj...@apache.org>
>> wrote:
>>>
>>>> Even if the fix works, I still have two issues in my Eclipse build...
>>>>
>>>> In
>>>>
>>>>
>>>>
>> flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>>>>
>>>> Eclipse cannot infer the integer type. It could be fixed if you make the
>>>> type explicit (as this is only a test, it might be nice to fix this --
>>>> let me know if I can push this or not)
>>>>
>>>>> diff --git
>>>>
>> a/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>>>>
>> b/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>>>>> index c2e13fe..f9ce3b8 100644
>>>>> ---
>>>>
>> a/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>>>>> +++
>>>>
>> b/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>>>>> @@ -29,7 +29,7 @@ private[extensions] abstract class Acce

Re: Eclipse Problems

2016-04-27 Thread Matthias J. Sax
Hi Till,

but StreamExecutionEnvironmentTest.fromElementWithBaseTypeTest2 does not
test what you describe -- even if it is intended to test it.

It would test the scenario you describe if fromElements(Class, X...)
were called. But this call is not possible because X is defined as
type Subclass, and thus the provided object of Parentclass cannot be
handed over as type X. Therefore, fromElements(Object...) is called: of
course, this fails too, because now the type is derived as
Class (and not Subclass), and neither Subclass nor Parentclass
inherits from Class.

The scenario you describe will never work -- if you removed the overload
fromElements(Object...), the code would not even compile, as the compiler
can figure out from the generics that the call
fromElements(Subclass.class, new Parentclass()) is invalid.

It is only possible to hand in "reverse inheritance types" for
fromElements(Object...). In this case, the first given Object defines
the type. Thus, if you call fromElements(new Subclass(), new
Parentclass()), the call will fail, as Parentclass is no subtype of
Subclass -- the call fromElements(new Parentclass(), new Subclass()) would
succeed.
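
To make that concrete, a small hypothetical sketch (the class names are
placeholders, not the actual test classes):

class Parentclass { }
class Subclass extends Parentclass { }

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// With Subclass.class as first argument, X would be Subclass, so the Parentclass
// instance does not fit "X..." and the call cannot bind to fromElements(Class, X...).
// It falls back to fromElements(Object...), where the type is derived from the first
// argument -- the Class object itself -- and the call fails at runtime.
env.fromElements(Subclass.class, new Parentclass());

// For fromElements(Object...) the first element defines the type:
env.fromElements(new Parentclass(), new Subclass()); // ok: Subclass is a Parentclass
env.fromElements(new Subclass(), new Parentclass()); // fails: Parentclass is no Subclass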

Makes sense?

Still no idea how to make it compile in Eclipse...

-Matthias

On 04/27/2016 10:21 AM, Till Rohrmann wrote:
> Thanks for looking into this problem Mathias. I think the Scala test should
> be fixed as you've proposed.
> 
> Concerning the StreamExecutionEnvironmentTest.fromElementWithBaseTypeTest2,
> I think it shouldn't be changed. The reason is that the class defines the
> common base class of the elements. And the test makes sure that the
> fromElements call fails if you provide instances which are not of the
> specified type or a subclass of it. Thus, we should find another way to
> make it work with Eclipse.
> 
> Cheers,
> Till
> 
> On Tue, Apr 26, 2016 at 9:41 PM, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Even if the fix works, I still have two issues in my Eclipse build...
>>
>> In
>>
>>
>> flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>>
>> Eclipse cannot infer the integer type. It could be fixed if you make the
>> type explicit (as this is only a test, it might be nice to fix this --
>> let me know if I can push this or not)
>>
>>> diff --git
>> a/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>> b/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>>> index c2e13fe..f9ce3b8 100644
>>> ---
>> a/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>>> +++
>> b/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>>> @@ -29,7 +29,7 @@ private[extensions] abstract class AcceptPFTestBase
>> extends TestLogger with JUni
>>>
>>>private val env = ExecutionEnvironment.getExecutionEnvironment
>>>
>>> -  protected val tuples = env.fromElements(1 -> "hello", 2 -> "world")
>>> +  protected val tuples = env.fromElements(new Integer(1) -> "hello",
>> new Integer(2) -> "world")
>>>protected val caseObjects = env.fromElements(KeyValuePair(1,
>> "hello"), KeyValuePair(2, "world"))
>>>
>>>protected val groupedTuples = tuples.groupBy(_._1)
>>
>> Furthermore, in
>>
>> flink-java/src/test/java/org/apache/flink/api/java/io/FromElementsTest.java
>>
>>> @Test
>>> public void fromElementsWithBaseTypeTest1() {
>>>   ExecutionEnvironment executionEnvironment =
>> ExecutionEnvironment.getExecutionEnvironment();
>>>   executionEnvironment.fromElements(ParentType.class, new SubType(1,
>> "Java"), new ParentType(1, "hello"));
>>> }
>>
>> and in
>>
>>
>> flink-streaming-java/src/test/java/org/apache/flink/streaming/api/StreamExecutionEnvironmentTest.java
>>
>>> @Test
>>> public void fromElementsWithBaseTypeTest1() {
>>>   StreamExecutionEnvironment env =
>> StreamExecutionEnvironment.getExecutionEnvironment();
>>>   env.fromElements(ParentClass.class, new SubClass(1, "Java"), new
>> ParentClass(1, "hello"));
>>> }
>>
>> In both cases, I get the error:
>>
>>   The method .fromElements(Object[]) is ambiguous
>>
>> No clue how to fix this, and why Eclipse does not bind to
>> .fromElements(Class, X). Any ideas?
>>
>> I also digger a little bit and for both test-classes there is a second
>> t

Re: Eclipse Problems

2016-04-26 Thread Matthias J. Sax
Even if the fix works, I still have two issues in my Eclipse build...

In

flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala

Eclipse cannot infer the integer type. It could be fixed if you make the
type explicit (as this is only a test, it might be nice to fix this --
let me know if I can push this or not)

> diff --git 
> a/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
>  
> b/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
> index c2e13fe..f9ce3b8 100644
> --- 
> a/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
> +++ 
> b/flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
> @@ -29,7 +29,7 @@ private[extensions] abstract class AcceptPFTestBase extends 
> TestLogger with JUni
>  
>private val env = ExecutionEnvironment.getExecutionEnvironment
>  
> -  protected val tuples = env.fromElements(1 -> "hello", 2 -> "world")
> +  protected val tuples = env.fromElements(new Integer(1) -> "hello", new 
> Integer(2) -> "world")
>protected val caseObjects = env.fromElements(KeyValuePair(1, "hello"), 
> KeyValuePair(2, "world"))
>  
>protected val groupedTuples = tuples.groupBy(_._1)

Furthermore, in

flink-java/src/test/java/org/apache/flink/api/java/io/FromElementsTest.java

> @Test
> public void fromElementsWithBaseTypeTest1() {
>   ExecutionEnvironment executionEnvironment = 
> ExecutionEnvironment.getExecutionEnvironment();
>   executionEnvironment.fromElements(ParentType.class, new SubType(1, 
> "Java"), new ParentType(1, "hello"));
> }

and in

flink-streaming-java/src/test/java/org/apache/flink/streaming/api/StreamExecutionEnvironmentTest.java

> @Test
> public void fromElementsWithBaseTypeTest1() {
>   StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
>   env.fromElements(ParentClass.class, new SubClass(1, "Java"), new 
> ParentClass(1, "hello"));
> }

In both cases, I get the error:

  The method .fromElements(Object[]) is ambiguous

No clue how to fix this, and why Eclipse does not bind to
.fromElements(Class, X). Any ideas?

I also dug around a little bit, and for both test classes there is a second
test method called "fromElementsWithBaseTypeTest2". If I understand this
test correctly, it also tries to bind to .fromElements(Class, X), but
this does not happen and .fromElements(Object[]) is called instead. Even if
there is still an exception, I got the impression that this test does
not do what was intended.

It might be good to change fromElementsWithBaseTypeTest2 to

> env.fromElements(new SubClass(1, "Java"), new ParentClass(1, "hello"));

(ie, remove the first Class parameter). Any comments on this?

-Matthias



On 04/25/2016 01:42 PM, Robert Metzger wrote:
> Cool, thank you for working on this!
> 
> On Mon, Apr 25, 2016 at 1:37 PM, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> I can confirm that the SO answer works.
>>
>> I will add a note to the Eclipse setup guide at the web site.
>>
>> -Matthias
>>
>>
>> On 04/25/2016 11:33 AM, Robert Metzger wrote:
>>> It seems that the user resolved the issue on SO, right?
>>>
>>> On Mon, Apr 25, 2016 at 11:31 AM, Ufuk Celebi <u...@apache.org> wrote:
>>>
>>>> On Mon, Apr 25, 2016 at 12:14 AM, Matthias J. Sax <mj...@apache.org>
>>>> wrote:
>>>>> What do you think about this?
>>>>
>>>> Hey Matthias!
>>>>
>>>> Thanks for bringing this up.
>>>>
>>>> I think it is very desirable to keep support for Eclipse. It's quite a
>>>> high barrier for new contributors to enforce a specific IDE (although
>>>> IntelliJ is gaining quite the user base I think :P).
>>>>
>>>> Do you have time to look into this?
>>>>
>>>> – Ufuk
>>>>
>>>
>>
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: Eclipse Problems

2016-04-25 Thread Matthias J. Sax
I can confirm that the SO answer works.

I will add a note to the Eclipse setup guide at the web site.

-Matthias


On 04/25/2016 11:33 AM, Robert Metzger wrote:
> It seems that the user resolved the issue on SO, right?
> 
> On Mon, Apr 25, 2016 at 11:31 AM, Ufuk Celebi <u...@apache.org> wrote:
> 
>> On Mon, Apr 25, 2016 at 12:14 AM, Matthias J. Sax <mj...@apache.org>
>> wrote:
>>> What do you think about this?
>>
>> Hey Matthias!
>>
>> Thanks for bringing this up.
>>
>> I think it is very desirable to keep support for Eclipse. It's quite a
>> high barrier for new contributors to enforce a specific IDE (although
>> IntelliJ is gaining quite the user base I think :P).
>>
>> Do you have time to look into this?
>>
>> – Ufuk
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: Problem with flink while development

2016-04-18 Thread Matthias J. Sax
If you work on plain Integer (or other non-POJO types), you need to
provide a KeySelector to make it work.

For your case, something like this:

.keyBy(new KeySelector<Integer, Integer>() {
    @Override
    public Integer getKey(Integer value) throws Exception {
        return value;
    }
})

As Saikat already mentioned, int-based index access to keys works only
for Flink's Tuple types.

For POJOs you can also access keys by name.

https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/index.html#datastream-transformations
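
Putting it together with your reduce, a minimal sketch could look like this
(untested; run it from a main method and replace print() with your actual sink):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

KeyedStream<Integer, Integer> keyed = env.fromElements(1, 2, 3)
    .keyBy(new KeySelector<Integer, Integer>() {
        @Override
        public Integer getKey(Integer value) throws Exception {
            return value; // the element itself is the key
        }
    });

DataStream<Integer> reduced = keyed.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2) throws Exception {
        return value1 + value2;
    }
});

reduced.print();
env.execute();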


-Matthias


On 04/18/2016 08:30 PM, Saikat Maitra wrote:
> Hello Jitendra,
> 
>  I am new to Flink community but may have seen this issue earlier. Can you
> try to use DataStream instead of KeyedStream integers4
> 
> Regards Saikat
> 
> On Mon, Apr 18, 2016 at 7:37 PM, Jitendra Agrawal <
> jitendra.agra...@northgateps.com> wrote:
> 
>> Hi Team,
>>
>> Problem Description :  When I was calling *reduce()* method on keyedStream
>> object then getting Ecxeption as
>>
>> "* org.apache.flink.api.common.InvalidProgramException: Specifying keys via
>> field positions is only valid for tuple data types. Type: Integer*".
>>
>>StreamExecutionEnvironment env =
>> StreamExecutionEnvironment.getExecutionEnvironment();
>>   KeyedStream integers4 = env.fromElements(1,2,3).keyBy(0);
>>   DataStream doubleIntegers4 = integers4.reduce(new
>> ReduceFunction() {
>>
>> @Override
>> public Integer reduce(Integer arg0, Integer arg1) throws Exception {
>> return arg0 + arg1;
>> }
>> });
>>
>>
>> can you please tell me how to solve this exception? or can you provide any
>> link or example where I get these complete Solution..
>>
>> your help will be appreciated..
>>
>> Thanks & Regards
>>
>>  *Jitendra Agrawal *
>>
>> Mumbai
>>
>> 9167539602
>>
>> --
>> This email is sent on behalf of Northgate Public Services (UK) Limited and
>> its associated companies including Rave Technologies (India) Pvt Limited
>> (together "Northgate Public Services") and is strictly confidential and
>> intended solely for the addressee(s).
>> If you are not the intended recipient of this email you must: (i) not
>> disclose, copy or distribute its contents to any other person nor use its
>> contents in any way or you may be acting unlawfully;  (ii) contact
>> Northgate Public Services immediately on +44(0)1908 264500 quoting the name
>> of the sender and the addressee then delete it from your system.
>> Northgate Public Services has taken reasonable precautions to ensure that
>> no viruses are contained in this email, but does not accept any
>> responsibility once this email has been transmitted.  You should scan
>> attachments (if any) for viruses.
>>
>> Northgate Public Services (UK) Limited, registered in England and Wales
>> under number 00968498 with a registered address of Peoplebuilding 2,
>> Peoplebuilding Estate, Maylands Avenue, Hemel Hempstead, Hertfordshire, HP2
>> 4NN.  Rave Technologies (India) Pvt Limited, registered in India under
>> number 117068 with a registered address of 2nd Floor, Ballard House, Adi
>> Marzban Marg, Ballard Estate, Mumbai, Maharashtra, India, 41.
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: Flink optimizer optimizations

2016-04-16 Thread Matthias J. Sax
Sure. WITHOUT.

Thanks. Good catch :)

On 04/16/2016 01:18 PM, Ufuk Celebi wrote:
> On Sat, Apr 16, 2016 at 1:05 PM, Matthias J. Sax <mj...@apache.org> wrote:
>> (with the need to sort the data, because both
>> datasets will be sorted on A already). Thus, the overhead of sorting in
>> the group might pay of in the join.
> 
> I think you meant to write withOUT the need to the sort the data, right?
> 



signature.asc
Description: OpenPGP digital signature


Re: Flink optimizer optimizations

2016-04-16 Thread Matthias J. Sax
Assume you have a groupBy followed by a join.

DataSet1 (not sorted)  -> groupBy(A) --> join(1.A == 2.A)
                                           ^
DataSet2 (sorted on A) --------------------+

For groupBy(A) of DataSet1 the optimizer can pick hash-grouping or the
more expensive sort-based grouping. If the optimizer picks
sort-based grouping, the join becomes super cheap because it can just
perform a merge-join (with the need to sort the data, because both
datasets will be sorted on A already). Thus, the overhead of sorting in
the group might pay off in the join.
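
As a rough, hypothetical DataSet example of that plan shape (the data and field
indices are made up; the point is only the groupBy followed by a join on the
same key):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Tuple2<Integer, Integer>> dataSet1 = env.fromElements(
        new Tuple2<>(1, 10), new Tuple2<>(2, 20), new Tuple2<>(1, 30));
DataSet<Tuple2<Integer, Integer>> dataSet2 = env.fromElements(
        new Tuple2<>(1, 100), new Tuple2<>(2, 200));

dataSet1.groupBy(0).sum(1)    // groupBy(A): hash- or sort-based grouping
        .join(dataSet2)
        .where(0).equalTo(0)  // join(1.A == 2.A): can reuse a sort order on field A
        .print();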

-Matthias

On 04/15/2016 10:50 PM, CPC wrote:
> Hi
> 
> When i look for what kind of optimizations flink does, i found
> https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals  is
> it up to date? Also i couldnt understand:
> 
> "Reusing of partitionings and sort orders across operators. If one operator
> leaves the data in partitioned fashion (and or sorted order), the next
> operator will automatically try and reuse these characteristics. The
> planning for this is done holistically and can cause earlier operators to
> pick more expensive algorithms, if they allow for better reusing of
> sort-order and partitioning."
> 
> Can you give example for "earlier operators to pick more expensive
> algorithms" ?
> 
> Regards
> 



signature.asc
Description: OpenPGP digital signature


Re: Issue deploying a topology to flink with a java api

2016-04-14 Thread Matthias J. Sax
change the version to 1.1-SNAPSHOT

On 04/14/2016 11:52 AM, star jlong wrote:
> One question which dependency of flink are you using because I'm using 
>  org.apache.flink 
> flink-storm-examples_2.11 
> 1.0.0
> And once I change the version to SNAPSHOT version, the pom.xml complains that 
> it could not satisfy the given dependency.
> 
> Le Jeudi 14 avril 2016 10h45, star jlong <jlongs...@yahoo.fr.INVALID> a 
> écrit :
>  
> 
>  Yes it is. 
> 
>     Le Jeudi 14 avril 2016 10h39, Matthias J. Sax <mj...@apache.org> a écrit :
>  
> 
>  For the fix, you need to use the current development version of Flink,
> ie, change your maven dependency from 1.0 to
> 1.1-SNAPSHOT
> 
One question: what is FlinkGitService.class? It only shows up where
you get the ClassLoader:
> 
>> ClassLoader loader = URLClassLoader.newInstance(new URL[] { new URL(path) }, 
>> FlinkGitService.class.getClassLoader());
> 
Is it the class that contains the methods deploy() and getFlinkTopology()?
> 
> -Matthias
> 
> On 04/14/2016 05:20 AM, star jlong wrote:
>> What I'm  trying to say is that to get submit the flink topology to flink, I 
>> had to do an invocation of the mainMethod(which contain the actaul topology) 
>> of my topology with the class java.lang.reflect.Method.That is if you a take 
>> look at the following the topology the mainMethod is buildTopologypublic 
>> class WordCountTopology {
>> public static void main(String[] args) throws Exception {
>>
>> Config conf = new Config();
>> conf.setDebug(true);
>> if (args != null && args.length > 0) {
>>
>> conf.setNumWorkers(1);
>> conf.setMaxTaskParallelism(1);
>> FlinkSubmitter.submitTopology(args[0], conf, buildTopology());
>>
>> }
>> // Otherwise, we are running locally
>> else {
>> conf.setMaxTaskParallelism(1);
>> FlinkLocalCluster cluster = new FlinkLocalCluster();
>> cluster.submitTopology("word-count", conf, buildTopology());
>> Thread.sleep(1);
>> }
>> }
>>
>> public static FlinkTopology buildTopology() {
>>
>> TopologyBuilder builder = new TopologyBuilder();
>>
>> builder.setSpout("spout", new RandomSentenceSpout(), 1);
>> builder.setBolt("split", new SplitSentence(), 
>> 1).shuffleGrouping("spout");
>> builder.setBolt("count", new WordCount(), 1).fieldsGrouping("split", new 
>> Fields("word"));
>>
>> builder.setBolt("writeIntoFile", new 
>> BoltFileSink("/home/username/wordcount.txt", new OutputFormatter() {
>> private static final long serialVersionUID = 1L;
>>
>> @Override
>> public String format(Tuple tuple) {
>> return tuple.toString();
>> }
>> }), 1).shuffleGrouping("count");
>>
>> return FlinkTopology.createTopology(builder);
>>
>> }
>> }That is the method that I want to invoke from my jar so that I will be able 
>> to do the submitting of the topology without any problem ie
>>
>> final FlinkClient cluster = 
>> FlinkClient.getConfiguredClient(conf);cluster.submitTopology(topologyId, 
>> uploadedJarLocation, getFlinkTopogy(String.format("file://%s", 
>> jarPath),properties.getProperty("topologyMainClass"), 
>> properties.getProperty("methodName")));
>> Where getFlinkTopology() return the contains actually topology
>>
>> But while doing that reflection I had an exception.
>>
>> Another question please. How do I make used of the hotflix of Till. 
>>
>> Le Jeudi 14 avril 2016 0h19, Matthias J. Sax <mj...@apache.org> a écrit :
>>   
>>
>>   I cannot follow completely in your last step when you fail. What do you
>> mean by "I'm stuck at the level when I want to copy that from the jar to
>> submit it to flink"?
>>
>> Btw: I copied the code from the SO question and it works for me on the
>> current master (which includes Till's hotfix).
>>
>> -Matthias
>>
>>
>> On 04/13/2016 09:39 PM, star jlong wrote:
>>> Thanks Matthias for the reply. 
>>> Maybe I should explain what I want to do better.My objective is to deploy a 
>>> flink topology on flink using java but in the production mode. For that 
>>> here are the step that I have taken.
>>> 1-Convert a sample wordcount storm topology to a flink topology as 
>>> indicated here 
>&g

Re: Issue deploying a topology to flink with a java api

2016-04-13 Thread Matthias J. Sax
Hi jstar,

I need to have a close look. But I am wondering why you use reflection
in the first place? Is there any specific reason for that?

Furthermore, the example provided in the maven-example project also covers
the case of submitting a topology to Flink via Java. Have a look at
org.apache.flink.storm.wordcount.WordCountRemoteBySubmitter

It contains a main() method and you can just run it as a regular Java
program in your IDE.

The SO question example should also work; it also contains a main()
method, so you should be able to run it.

Btw: If you use the Storm compatibility API, there is no reason to get an
ExecutionEnvironment in your code. This happens automatically with
FlinkClient/FlinkSubmitter.

Furthermore, I would recommend using FlinkSubmitter instead of
FlinkClient, as it is somewhat simpler to use.
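
For reference, a minimal sketch of the FlinkSubmitter path (assuming a
buildTopology() helper that returns a FlinkTopology, as in the word-count
examples):

Config conf = new Config();
conf.setNumWorkers(1);
// no ExecutionEnvironment needed in user code; the submitter takes care of it
FlinkSubmitter.submitTopology("word-count", conf, buildTopology());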

About the SO question: I guess the problem is the jar assembly. The user says

"Since I'using maven to handle my dependencies, I do a Mvn clean install
to obtain the jar."

I guess this is not sufficient to bundle a correct jar. Have a look into
the pom.xml of storm-examples. It uses Maven plug-ins to assemble the jar
correctly. (Regular Maven artifacts do not work for job submission...)

Will have a close look and follow up... Hope this helps already.

-Matthias

On 04/13/2016 06:23 PM, star jlong wrote:
> Thanks for the reply.
> @Stephen, I try using RemoteEnvironment to submit my topology to flink. 
> Here is the try that I did RemoteEnvironment remote = new 
> RemoteEnvironment(ipJobManager, 6123, jarPath); remote.execute();
> While running the program, this is the exception that I got.
> java.lang.RuntimeException: No data sinks have been created yet. A program 
> needs at least one sink that consumes data. Examples are writing the data set 
> or printing it.
>  
> 
> Le Mercredi 13 avril 2016 16h54, Till Rohrmann  a 
> écrit :
>  
> 
>  I think this is not the problem here since the problem is still happening
> on the client side when the FlinkTopology tries to copy the registered
> spouts. This happens before the job is submitted to the cluster. Maybe
> Mathias could chime in here.
> 
> Cheers,
> Till
> 
> On Wed, Apr 13, 2016 at 5:39 PM, Stephan Ewen  wrote:
> 
>> Hi!
>>
>> For flink standalone programs, you would use a "RemoteEnvironment"
>>
>> For Storm, I would use the "FlinkClient" in "org.apache.flink.storm.api".
>> That one should deal with jars, classloaders, etc for you.
>>
>> Stephan
>>
>>
>> On Wed, Apr 13, 2016 at 3:43 PM, star jlong 
>> wrote:
>>
>>> Thanks for the suggestion. Sure those examples are interesting and I have
>>> deploy them successfully on flink. The deployment is done the command
>> line
>>> that is doing something like
>>> bin/flink run example.jarBut what I want is to submit the topology to
>>> flink using a java program.
>>>
>>> Thanks.
>>>
>>> Le Mercredi 13 avril 2016 14h12, Chesnay Schepler <
>> ches...@apache.org>
>>> a écrit :
>>>
>>>
>>>   you can find examples here:
>>>
>>>
>> https://github.com/apache/flink/tree/master/flink-contrib/flink-storm-examples
>>>
>>> we haven't established yet that it is an API issue; it could very well
>>> be caused by the reflection magic you're using...
>>>
>>> On 13.04.2016 14:57, star jlong wrote:
 Ok, it seems like there an issue with the api. So please does anybody
>>> has a working example for deploying a topology using the flink dependency
>>> flink-storm_2.11 or any other will be welcoming.

 Thanks,
 jstar

   Le Mercredi 13 avril 2016 13h44, star jlong
>>>  a écrit :


   Hi Schepler,

 Thanks for the concerned. Yes I'm actaully having the same issue as
>>> indicated on that post because I'm the one that posted that issue.

   Le Mercredi 13 avril 2016 13h35, Chesnay Schepler <
>>> ches...@apache.org> a écrit :



>>>
>> http://stackoverflow.com/questions/36584784/issues-while-submitting-a-topology-to-apache-flink-using-the-flink-api

 On 13.04.2016 14:28, Till Rohrmann wrote:
> Hi jstar,
>
> what's exactly the problem you're observing?
>
> Cheers,
> Till
>
> On Wed, Apr 13, 2016 at 2:23 PM, star jlong
>>  wrote:
>
>> Hi there,
>>
>> I'm jstar. I have been playing around with flink. I'm very much
>>> interested
>> in submitting a topoloy  to flink using its api. As indicated
>> on stackoverflow, that is the try that I have given. But I was stuck
>>> with
>> some exception. Please any help will be welcoming.
>>
>>
>> Thanks.
>> jstar





>>>
>>>
>>>
>>>
>>>
>>
> 
>   
> 



signature.asc
Description: OpenPGP digital signature


Re: Broken links after doc restructuring

2016-04-10 Thread Matthias J. Sax
Done.

This is a preview: https://github.com/mjsax/flink-web/tree/broken-links

Please let me know if I can push it. (Maybe someone can verify, that I
did not miss anything...)

-Matthias


On 04/07/2016 04:41 PM, Stephan Ewen wrote:
> Hey Matthias!
> 
> The mailing list and feature requests are getting super many, hard to keep
> up and fix things within days...
> 
> Do you think you could fix those links? As a simple approach, I would
> suggest to
> 
>   - Truncate the history to drop everything earlier than "Flink" days (in
> your list before Hadoop Compatibility in Flink)
> 
>   - Links that point to a changing URL (docs-master or Github master
> branch) to point to the release around that time.
> 
>   - Links from news that have no match any more should probably be
> dropped...
> 
> Stephan
> 
> 
> 
> On Thu, Apr 7, 2016 at 2:54 PM, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Anyone?
>>
>> On 04/04/2016 05:06 PM, Matthias J. Sax wrote:
>>> Hi,
>>>
>>> I just stepped through the whole blog. Some stuff can get fixed easily,
>>> more links should just be removed, and for some I am not sure what to do
>>> about (quite old stuff).
>>>
>>> I put my thoughts about each broken link (or nothing if I have no idea how
>>> to deal with it). Please give feedback.
>>>
>>>
>>>
>>> ** Version 0.2 Released
>>> -> Link to the changelog:
>>> https://stratosphere.eu/wiki/doku.php/wiki:changesrelease0.2
>>>
>>>
>>>
>>> ** Stratosphere Demo Accepted for ICDE 2013 -> links to the paper and poster
>>> ->
>>>
>> https://flink.apache.org/assets/papers/optimizationOfDataFlowsWithUDFs_13.pdf
>>> ->
>>>
>> https://flink.apache.org/assets/papers/optimizationOfDataFlowsWithUDFs_poster_13.pdf
>>>
>>> Paper:
>> http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6544927
>>> Poster not sure (maybe ask Fabian or remove link)
>>>
>>>
>>>
>>> ** Stratosphere Demo Paper Accepted for BTW 2013
>>> -> https://flink.apache.org/assets/papers/Sopremo_Meteor%20BigData.pdf
>>>
>>>
>>>
>>> ** ICDE 2013 Demo Preview
>>> -> https://flink.apache.org/publications
>>>
>>> This section does not exist any more. But the paper is linked in another
>>> blog post anyway. Maybe, we can remove the whole blog post.
>>>
>>>
>>>
>>> ** Paper "All Roads Lead to Rome: Optimistic Recovery for Distributed
>>> Iterative Data Processing" accepted at CIKM 2013
>>> -> https://flink.apache.org/assets/papers/optimistic.pdf
>>>
>>> Available at https://dl.acm.org/citation.cfm?doid=2505515.2505753
>>>
>>>
>>>
>>> ** Stratosphere got accepted to the Hadoop Summit Europe in Amsterdam
>>> -> http://hadoopsummit.org/amsterdam/
>>> ->
>>>
>> https://hadoopsummit.uservoice.com/forums/196822-future-of-apache-hadoop/filters/top
>>>
>>> I would just remove those links.
>>>
>>>
>>>
>>> ** Stratosphere 0.4 Released
>>> ->
>>>
>> https://flink.apache.org/blog/tutorial/2014/01/12/0.4-migration-guide.html
>>> ->
>>>
>> https://flink.apache.org/blog/tutorial/2014/01/12/0.4-migration-guide.html
>>> ->
>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/0.4/programming_guides/iterations.html
>>> ->
>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/0.4/programming_guides/iterations.html
>>> ->
>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/0.4/program_execution/local_executor.html
>>> -> https://flink.apache.org/quickstart/
>>>
>>>
>>>
>>>
>>> ** Optimizer Plan Visualization Tool
>>> -> http://stratosphere.eu/docs/0.4/program_execution/web_interface.html
>>> -> http://stratosphere.eu/docs/0.4/program_execution/local_executor.html
>>>
>>>
>>>
>>> ** Use Stratosphere with Amazon Elastic MapReduce
>>> ->
>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/0.4/setup/yarn.html
>>>
>>>
>>>
>>> ** Stratosphere version 0.5 available
>>> -> http://stratosphere.eu/docs/0.5/
>>> -> http://stratosphere.eu/docs/0.5/programming_guides/examples_java.html
>>>
>>>
>>>
>>> ** Hadoop Compatibility in Flink

Re: Broken links after doc restructuring

2016-04-07 Thread Matthias J. Sax
Anyone?

On 04/04/2016 05:06 PM, Matthias J. Sax wrote:
> Hi,
> 
> I just stepped through the whole blog. Some stuff can get fixed easily,
> more links should just be removed, and for some I am not sure what to do
> about (quite old stuff).
> 
> I put my thoughts about each broken link (or nothing if I have no idea how
> to deal with it). Please give feedback.
> 
> 
> 
> ** Version 0.2 Released
> -> Link to the changelog:
> https://stratosphere.eu/wiki/doku.php/wiki:changesrelease0.2
> 
> 
> 
> ** Stratosphere Demo Accepted for ICDE 2013 -> links to the paper and poster
> ->
> https://flink.apache.org/assets/papers/optimizationOfDataFlowsWithUDFs_13.pdf
> ->
> https://flink.apache.org/assets/papers/optimizationOfDataFlowsWithUDFs_poster_13.pdf
> 
> Paper: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6544927
> Poster not sure (maybe ask Fabian or remove link)
> 
> 
> 
> ** Stratosphere Demo Paper Accepted for BTW 2013
> -> https://flink.apache.org/assets/papers/Sopremo_Meteor%20BigData.pdf
> 
> 
> 
> ** ICDE 2013 Demo Preview
> -> https://flink.apache.org/publications
> 
> This section does not exist any more. But the paper is linked in another
> blog post anyway. Maybe, we can remove the whole blog post.
> 
> 
> 
> ** Paper "All Roads Lead to Rome: Optimistic Recovery for Distributed
> Iterative Data Processing" accepted at CIKM 2013
> -> https://flink.apache.org/assets/papers/optimistic.pdf
> 
> Available at https://dl.acm.org/citation.cfm?doid=2505515.2505753
> 
> 
> 
> ** Stratosphere got accepted to the Hadoop Summit Europe in Amsterdam
> -> http://hadoopsummit.org/amsterdam/
> ->
> https://hadoopsummit.uservoice.com/forums/196822-future-of-apache-hadoop/filters/top
> 
> I would just remove those links.
> 
> 
> 
> ** Stratosphere 0.4 Released
> ->
> https://flink.apache.org/blog/tutorial/2014/01/12/0.4-migration-guide.html
> ->
> https://flink.apache.org/blog/tutorial/2014/01/12/0.4-migration-guide.html
> ->
> https://ci.apache.org/projects/flink/flink-docs-master/0.4/programming_guides/iterations.html
> ->
> https://ci.apache.org/projects/flink/flink-docs-master/0.4/programming_guides/iterations.html
> ->
> https://ci.apache.org/projects/flink/flink-docs-master/0.4/program_execution/local_executor.html
> -> https://flink.apache.org/quickstart/
> 
> 
> 
> 
> ** Optimizer Plan Visualization Tool
> -> http://stratosphere.eu/docs/0.4/program_execution/web_interface.html
> -> http://stratosphere.eu/docs/0.4/program_execution/local_executor.html
> 
> 
> 
> ** Use Stratosphere with Amazon Elastic MapReduce
> ->
> https://ci.apache.org/projects/flink/flink-docs-master/0.4/setup/yarn.html
> 
> 
> 
> ** Stratosphere version 0.5 available
> -> http://stratosphere.eu/docs/0.5/
> -> http://stratosphere.eu/docs/0.5/programming_guides/examples_java.html
> 
> 
> 
> ** Hadoop Compatibility in Flink
> ->
> https://ci.apache.org/projects/flink/flink-docs-release-0.7/hadoop_compatibility.html
> 
> Works; however, we might want to point to a newer version (maybe current
> master?)
> 
> 
> 
> ** January 2015 in the Flink community
> -> http://data-artisans.com/computing-recommendations-with-flink.html
> ->
> http://2015.hadoopsummit.org/amsterdam-blog/announcing-the-community-vote-session-winners-for-the-2015-hadoop-summit-europe/
> 
> 
> 
> ** Introducing Flink Streaming
> ->
> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#sources
> ->
> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#stream-connectors
> ->
> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#window-operators
> 
> Those can be fixed easily (point to current master)
> 
> 
> 
> ** February 2015 in the Flink community
> ->
> https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example
> -> https://github.com/apache/flink/tree/master/flink-staging/flink-table
> -> https://github.com/apache/flink/tree/master/flink-staging/flink-hcatalog
> 
> I would just removed those.
> 
> 
> 
> ** Peeking into Apache Flink's Engine Room
> ->
> https://ci.apache.org/projects/flink/flink-docs-master/apis/dataset_transformations.html#join-algorithm-hints
> 
> Can be fixed with
> https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#semantic-annotations
> 
> 
> 
> ** March 2015 in the Flink community
> -> http://data-artisans.com/dataflow.html
> -> https://github.com/apache/f

Re: Failing Test

2016-04-05 Thread Matthias J. Sax
Happened again after your fix:
https://travis-ci.org/apache/flink/jobs/120620482

-Matthias


On 04/01/2016 08:57 PM, Maximilian Michels wrote:
> Fixed with the resolution of https://issues.apache.org/jira/browse/FLINK-3689.
> 
> On Fri, Apr 1, 2016 at 12:40 PM, Maximilian Michels <m...@apache.org> wrote:
>> Hi Matthias,
>>
>> Thanks for spotting the test failure. It's actually a bug in the code
>> and not a test problem. Fixing it.
>>
>> Cheers,
>> Max
>>
>> On Fri, Apr 1, 2016 at 9:33 AM, Ufuk Celebi <u...@apache.org> wrote:
>>> Hey Matthias,
>>>
>>> the test has been only recently added with the resource management
>>> refactoring. It's probably just a too aggressive timeout for Travis.
>>>
>>> @Max: Did you ever see this fail?
>>>
>>> – Ufuk
>>>
>>> On Fri, Apr 1, 2016 at 9:24 AM, Matthias J. Sax <mj...@apache.org> wrote:
>>>> Anyone seen this before? One-time thing or test instability?
>>>>
>>>>> ClusterShutdownITCase.testClusterShutdown:71 assertion failed: timeout 
>>>>> (29848225634 nanoseconds) during expectMsgClass waiting for class 
>>>>> org.apache.flink.runtime.clusterframework.messages.StopClusterSuccessful
>>>>
>>>>
>>>> -Matthias
>>>>



signature.asc
Description: OpenPGP digital signature


Re: Broken links after doc restructuring

2016-04-04 Thread Matthias J. Sax
m/apache/flink/blob/master/flink-core/src/test/java/org/apache/flink/core/memory/benchmarks/MemorySegmentSpeedBenchmark.java



** Storm Compatibility in Apache Flink: How to run existing Storm
topologies on Flink
->
https://ci.apache.org/projects/flink/flink-docs-master/apis/storm_compatibility.html

Can be fixed easily.



** Flink Forward 2015
-> http://flink-forward.org/   point to FF16 now
-> http://flink-forward.org/?post_type=day



Just let me know what you think about it.


-Matthias


On 04/04/2016 03:46 PM, Ufuk Celebi wrote:
> Hey Matthias,
> 
> could you update the links in the posts or share a list of the posts
> with broken links so I can update them?
> 
> We can also provide redirects for the broken links. With 1.0, we now
> should be more careful to not break/redirect links...
> 
> 
> On Mon, Apr 4, 2016 at 2:53 PM, Matthias J. Sax <mj...@apache.org> wrote:
>> Hi,
>>
>> I just realized, that the restructuring of the programming guide in our
>> documentation
>> https://github.com/apache/flink/commit/ad267a4b199979536dd8a5572628eefc77d7e0f4)
>> broke a couple of links from the Flink blog.
>>
>> Should we update those older blog post? JIRA for it?
>>
>> For the future: would it be a good idea to maintain a "link directory"
>> to keep track of used links that must be updated in case of a similar
>> doc change in the future (for links that point to current master docs)?
>>
>>
>>
>> -Matthias
>>



signature.asc
Description: OpenPGP digital signature


Broken links after doc restructuring

2016-04-04 Thread Matthias J. Sax
Hi,

I just realized that the restructuring of the programming guide in our
documentation
(https://github.com/apache/flink/commit/ad267a4b199979536dd8a5572628eefc77d7e0f4)
broke a couple of links from the Flink blog.

Should we update those older blog posts? Should we open a JIRA for it?

For the future: would it be a good idea to maintain a "link directory"
to keep track of used links that must be updated in case of a similar
doc change in the future (for links that point to current master docs)?



-Matthias



signature.asc
Description: OpenPGP digital signature


Re: Failing Test

2016-04-02 Thread Matthias J. Sax
Thanks. Just tried it out and it works :)

On 04/01/2016 08:57 PM, Maximilian Michels wrote:
> Fixed with the resolution of https://issues.apache.org/jira/browse/FLINK-3689.
> 
> On Fri, Apr 1, 2016 at 12:40 PM, Maximilian Michels <m...@apache.org> wrote:
>> Hi Matthias,
>>
>> Thanks for spotting the test failure. It's actually a bug in the code
>> and not a test problem. Fixing it.
>>
>> Cheers,
>> Max
>>
>> On Fri, Apr 1, 2016 at 9:33 AM, Ufuk Celebi <u...@apache.org> wrote:
>>> Hey Matthias,
>>>
>>> the test has been only recently added with the resource management
>>> refactoring. It's probably just a too aggressive timeout for Travis.
>>>
>>> @Max: Did you ever see this fail?
>>>
>>> – Ufuk
>>>
>>> On Fri, Apr 1, 2016 at 9:24 AM, Matthias J. Sax <mj...@apache.org> wrote:
>>>> Anyone seen this before? One-time thing or test instability?
>>>>
>>>>> ClusterShutdownITCase.testClusterShutdown:71 assertion failed: timeout 
>>>>> (29848225634 nanoseconds) during expectMsgClass waiting for class 
>>>>> org.apache.flink.runtime.clusterframework.messages.StopClusterSuccessful
>>>>
>>>>
>>>> -Matthias
>>>>



signature.asc
Description: OpenPGP digital signature


Failing Test

2016-04-01 Thread Matthias J. Sax
Anyone seen this before? One-time thing or test instability?

> ClusterShutdownITCase.testClusterShutdown:71 assertion failed: timeout 
> (29848225634 nanoseconds) during expectMsgClass waiting for class 
> org.apache.flink.runtime.clusterframework.messages.StopClusterSuccessful


-Matthias



signature.asc
Description: OpenPGP digital signature


Re: Submission Problem

2016-03-31 Thread Matthias J. Sax
StormConfig is set as a global job parameter

FlinkClient.java line 337ff

> ExecutionConfig flinkConfig = topology.getExecutionEnvironment().getConfig();
> flinkConfig.setGlobalJobParameters(new StormConfig(conf));



On 03/31/2016 05:05 PM, Stephan Ewen wrote:
> Hmm, it is wrong that the JobManager tries to load that class directly from
> the actor message.
> All user code should be deserialized lazily.
> 
> How is that class passed? Implicitly through some config?
> 
> On Thu, Mar 31, 2016 at 4:51 PM, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Here we go...
>>
>> StormConfig.class is contained in the user jar file. I guess I need to
>> "register" it somehow? Or is it a class loading issue?
>>
>>
>>> 2016-03-31 16:47:33,095 ERROR akka.remote.EndpointWriter
>> - AssociationError [akka.tcp://flink@127.0.0.1:6123]
>> <- [akka.tcp://flink@127.0.0.1:32775]: Error
>> [org.apache.flink.storm.util.StormConfig] [
>>> java.lang.ClassNotFoundException: org.apache.flink.storm.util.StormConfig
>>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>   at java.lang.Class.forName0(Native Method)
>>>   at java.lang.Class.forName(Class.java:278)
>>>   at
>> java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:625)
>>>   at
>> akka.util.ClassLoaderObjectInputStream.resolveClass(ClassLoaderObjectInputStream.scala:19)
>>>   at
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>>   at
>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>>   at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>>   at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>   at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>   at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>   at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>   at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>   at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>   at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>   at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>   at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>   at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>   at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>   at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>   at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>   at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>>   at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>>   at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>   at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>   at
>> akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
>>>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>>>   at
>> akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
>>>   at
>> akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
>>>   at scala.util.Try$.apply(Try.scala:161)
>>>   at
>> akka.serialization.Serialization.deserialize(Serialization.scala:98)
>>>   at
>> akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
>>>   at
>> akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58)
>>>   at
>> akka.remote.DefaultMessage

Re: Submission Problem

2016-03-31 Thread Matthias J. Sax
Here we go...

StormConfig.class is contained in the user jar file. I guess I need to
"register" it somehow? Or is it a class loading issue?


> 2016-03-31 16:47:33,095 ERROR akka.remote.EndpointWriter  
>   - AssociationError [akka.tcp://flink@127.0.0.1:6123] <- 
> [akka.tcp://flink@127.0.0.1:32775]: Error 
> [org.apache.flink.storm.util.StormConfig] [
> java.lang.ClassNotFoundException: org.apache.flink.storm.util.StormConfig
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:278)
>   at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:625)
>   at 
> akka.util.ClassLoaderObjectInputStream.resolveClass(ClassLoaderObjectInputStream.scala:19)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at 
> akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>   at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
>   at 
> akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
>   at scala.util.Try$.apply(Try.scala:161)
>   at akka.serialization.Serialization.deserialize(Serialization.scala:98)
>   at 
> akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
>   at 
> akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58)
>   at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58)
>   at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:76)
>   at 
> akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
>   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>   at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> ]



On 03/31/2016 04:31 PM, Till Rohrmann wrote:
> Could you please rerun t

Re: Submission Problem

2016-03-31 Thread Matthias J. Sax
Enclosed are the logs... maybe you can make some sense out of them.

On 03/31/2016 02:52 PM, Till Rohrmann wrote:
> I would assume that something went wrong on the JobManager side. Could you
> check the logs if they contain something suspicious? Additionally you could
> turn on lifecycle event logging​ for Akka.
> 
> Cheers,
> Till
> ​
> 
2016-03-31 16:17:22,950 INFO  org.apache.flink.client.CliFrontend   - 
2016-03-31 16:17:22,951 INFO  org.apache.flink.client.CliFrontend   -  Starting Command Line Client (Version: 1.1-SNAPSHOT, Rev:bdf55b9, Date:30.03.2016 @ 19:19:23 CEST)
2016-03-31 16:17:22,951 INFO  org.apache.flink.client.CliFrontend   -  Current user: mjsax
2016-03-31 16:17:22,951 INFO  org.apache.flink.client.CliFrontend   -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.7/24.95-b01
2016-03-31 16:17:22,951 INFO  org.apache.flink.client.CliFrontend   -  Maximum heap size: 1769 MiBytes
2016-03-31 16:17:22,951 INFO  org.apache.flink.client.CliFrontend   -  JAVA_HOME: /usr/lib/jvm/java-7-openjdk-amd64
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   -  Hadoop version: 2.3.0
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   -  JVM Options:
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   - -Dlog.file=/home/mjsax/workspace_flink/flink/flink-dist/target/flink-1.1-SNAPSHOT-bin/flink-1.1-SNAPSHOT/log/flink-mjsax-client-T420s-dbis-mjsax.log
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   - -Dlog4j.configuration=file:/home/mjsax/workspace_flink/flink/flink-dist/target/flink-1.1-SNAPSHOT-bin/flink-1.1-SNAPSHOT/conf/log4j-cli.properties
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   - -Dlogback.configurationFile=file:/home/mjsax/workspace_flink/flink/flink-dist/target/flink-1.1-SNAPSHOT-bin/flink-1.1-SNAPSHOT/conf/logback.xml
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   -  Program Arguments:
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   - run
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   - /home/mjsax/workspace_flink/flink/flink-contrib/flink-storm-examples/target/WordCount-StormTopology.jar
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   -  Classpath: /home/mjsax/workspace_flink/flink/flink-dist/target/flink-1.1-SNAPSHOT-bin/flink-1.1-SNAPSHOT/lib/flink-dist_2.10-1.1-SNAPSHOT.jar:/home/mjsax/workspace_flink/flink/flink-dist/target/flink-1.1-SNAPSHOT-bin/flink-1.1-SNAPSHOT/lib/flink-python_2.10-1.1-SNAPSHOT.jar:/home/mjsax/workspace_flink/flink/flink-dist/target/flink-1.1-SNAPSHOT-bin/flink-1.1-SNAPSHOT/lib/log4j-1.2.17.jar:/home/mjsax/workspace_flink/flink/flink-dist/target/flink-1.1-SNAPSHOT-bin/flink-1.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar:::
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   - 
2016-03-31 16:17:22,953 INFO  org.apache.flink.client.CliFrontend   - Using configuration directory /home/mjsax/workspace_flink/flink/flink-dist/target/flink-1.1-SNAPSHOT-bin/flink-1.1-SNAPSHOT/conf
2016-03-31 16:17:22,954 INFO  org.apache.flink.client.CliFrontend   - Trying to load configuration file
2016-03-31 16:17:23,220 INFO  org.apache.flink.client.CliFrontend   - Running 'run' command.
2016-03-31 16:17:23,230 INFO  org.apache.flink.client.CliFrontend   - Building program from JAR file
2016-03-31 16:17:23,261 INFO  org.apache.flink.client.program.Client- Starting client actor system
2016-03-31 16:17:23,263 INFO  org.apache.flink.runtime.client.JobClient - Starting JobClient actor system
2016-03-31 16:17:23,655 INFO  akka.event.slf4j.Slf4jLogger  - Slf4jLogger started
2016-03-31 16:17:23,691 INFO  Remoting  - Starting remoting
2016-03-31 16:17:23,808 INFO  Remoting  - Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:39643]
2016-03-31 16:17:23,813 INFO  org.apache.flink.runtime.client.JobClient - Started JobClient actor system at 127.0.0.1:39643
2016-03-31 16:17:23,813 INFO  org.apache.flink.client.CliFrontend   - Starting execution of program
2016-03-31 16:17:23,813 INFO  org.apache.flink.client.program.Client 

Submission Problem

2016-03-31 Thread Matthias J. Sax
Hi,

I just tried to submit Flink's Storm-Topology example via command line:

bin/flink run
~/workspace_flink/flink/flink-contrib/flink-storm-examples/target/WordCount-StormTopology.jar


However, I get a timeout and the program is not submitted. I tracked the
problem down to the following statement:

JobClient -> line 211:

> Future future = jobManagerGateway.ask(
>   new JobManagerMessages.SubmitJob(
>   jobGraph,
>   ListeningBehaviour.DETACHED // only receive the Acknowledge for the job submission message
>   ),
>   timeout);
>   
> result = Await.result(future, timeout);

The jobManagerGateway has the "value":
AkkaActorGateway(akka.tcp://flink@127.0.0.1:6123/user/jobmanager, null)

Not sure why Await.result does not return, as the value of
jobManagerGateway seems to be correct. Any idea?
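
For reference, the plain Akka pattern behind this call looks roughly like the
sketch below (illustration only -- it uses Patterns.ask instead of Flink's
ActorGateway, and the message and timeout values are placeholders). A
TimeoutException from Await.result would mean that no Acknowledge ever came back:

import akka.actor.ActorRef;
import akka.pattern.Patterns;
import akka.util.Timeout;
import scala.concurrent.Await;
import scala.concurrent.Future;
import scala.concurrent.duration.FiniteDuration;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AskWithTimeoutSketch {
  public static Object submit(ActorRef jobManager, Object submitJobMessage) throws Exception {
    Timeout timeout = new Timeout(new FiniteDuration(60, TimeUnit.SECONDS));
    // send the message and obtain a future for the answer
    Future<Object> future = Patterns.ask(jobManager, submitJobMessage, timeout);
    try {
      // blocks until the JobManager answers or the timeout expires
      return Await.result(future, timeout.duration());
    } catch (TimeoutException e) {
      // no Acknowledge arrived in time -- the symptom described above
      throw new Exception("Job submission timed out", e);
    }
  }
}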


-Matthias





Re: StateBackend

2016-02-16 Thread Matthias J. Sax
Thanks for the input.

Just to clarify my understanding: by "Flink-embedded [...] scale out as
the Flink job scales out", you mean that each TM hosts an embedded state
backend service, ie, all those instances form a logically single but
distributed backend service? How is it ensured that the state is stored
reliably (ie, not on the same machine the state belongs to)? Is this
handled by the service automatically, or is it Flink's responsibility?

What do you mean by "They work nicely with savepoints, because every
Flink job has a copy of the state"?

The classification itself makes sense. I guess we should reflect this
in the documentation. Not sure if the code can/should reflect this -- I
doubt it.

-Matthias


On 02/16/2016 10:32 AM, Stephan Ewen wrote:
> I think this is actually a pretty good question. Right now, there are two
> different types of state backends:
> 
>   (1) Flink-embedded. They are independent of external services, scale out
> as the Flink job scales out, and are really mainly a way of storing and
> backing up key/value state.
> For example: MemoryStateBackend, FsStateBackend, RocksDBStateBackend
> They work nicely with savepoints, because every Flink job has a
> copy of the state.
> 
>   (2) Flink-connected: The state is outside Flink. The systems need to run
> separately and don't scale with Flink.
>Examples: DBStateBackend
>One advantage they have currently is that state in Flink is small,
> so checkpoints and restore are very cheap.
> 
> 
> I think we should start classifying the state backends like this.
> 
> 
> Greetings,
> Stephan
> 
> 
> On Mon, Feb 15, 2016 at 3:11 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
> 
>> Hi,
>> sorry about not answering but I wanted to wait since I already voiced my
>> opinion on the PR.
>>
>> I think it is better to assume an already running redis because it is
>> easier to work around clashes in running redis instances (ports, data
>> directory, and such). Then, however, care needs to be taken to make sure
>> that the state inside the one redis instance does not clash.
>>
>> Cheers,
>> Aljoscha
>>> On 15 Feb 2016, at 14:53, Matthias J. Sax <mj...@apache.org> wrote:
>>>
>>> Anyone?
>>>
>>> Otherwise, I suggest moving forward with the PR using the
>>> assumption that Redis must be started manually.
>>>
>>> -Matthias
>>>
>>> On 02/11/2016 08:28 PM, Matthias J. Sax wrote:
>>>> Hi,
>>>>
>>>> In Flink it is possible to have different backends for operator state. I
>>>> am wondering what the best approach for different state backends would
>> be.
>>>>
>>>> Let's assume the backend is a database server. The following questions
>>>> arise:
>>>>  - Should the database server be started manually by the user or can
>>>> Flink start the server automatically when it is used?
>>>>(this seems to be the approach for RocksDB as embedded servers)
>>>>  - Should each job use the same or individual backup server (or maybe a
>>>> mix of both?)
>>>>
>>>> I personally think that Flink should not start up a backup server but
>>>> assume that it is available when the job is submitted. This also allows
>>>> the user to start up multiple instances of the backup server and to
>>>> choose which one to use for each job individually.
>>>>
>>>> What do you think about it? I ask because of the current PR for Redis as
>>>> StateBackend:
>>>> https://github.com/apache/flink/pull/1617
>>>>
>>>> There is no embedded mode for Redis as for RocksDB.
>>>>
>>>> -Matthias
>>>>
>>>
>>
>>
> 





[jira] [Created] (FLINK-3406) Extend RabbitMQ source with interface StoppableFunction

2016-02-15 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3406:
--

 Summary: Extend RabbitMQ source with interface StoppableFunction
 Key: FLINK-3406
 URL: https://issues.apache.org/jira/browse/FLINK-3406
 Project: Flink
  Issue Type: Improvement
  Components: Streaming Connectors
Reporter: Matthias J. Sax


{{RMQSource}} is not stoppable right now. To make it stoppable, it must
implement {{StoppableFunction}}. Implementing method {{stop()}} must ensure
that the source stops receiving new messages from RabbitMQ and issues a final
checkpoint. Afterwards, {{run()}} must return.

When implementing this, keep in mind that the gathered checkpoint might later
be used as a savepoint.
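
A rough sketch of the intended shape (illustration only, not the actual
{{RMQSource}} code; package names and the RabbitMQ consumer call are placeholders):

{noformat}
import org.apache.flink.api.common.functions.StoppableFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class StoppableSourceSketch implements SourceFunction<String>, StoppableFunction {

  private volatile boolean isRunning = true;

  @Override
  public void run(SourceFunction.SourceContext<String> ctx) throws Exception {
    while (isRunning) {
      String msg = fetchNextMessage(); // placeholder for the RabbitMQ consumer call
      synchronized (ctx.getCheckpointLock()) {
        ctx.collect(msg);
      }
    }
    // returning from run() lets the task finish cleanly, so the last
    // checkpoint taken can also serve as a savepoint
  }

  @Override
  public void stop() {
    // stop receiving new messages; run() returns after the current element
    isRunning = false;
  }

  @Override
  public void cancel() {
    isRunning = false;
  }

  private String fetchNextMessage() {
    return "..."; // hypothetical helper, not part of the real connector
  }
}
{noformat}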



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: StateBackend

2016-02-15 Thread Matthias J. Sax
Anyone?

Otherwise, I suggest moving forward with the PR using the
assumption that Redis must be started manually.

-Matthias

On 02/11/2016 08:28 PM, Matthias J. Sax wrote:
> Hi,
> 
> In Flink it is possible to have different backends for operator state. I
> am wondering what the best approach for different state backends would be.
> 
> Let's assume the backend is a database server. The following questions
> arise:
>   - Should the database server be started manually by the user or can
> Flink start the server automatically when it is used?
> (this seems to be the approach for RocksDB as embedded servers)
>   - Should each job use the same or individual backup server (or maybe a
> mix of both?)
> 
> I personally think that Flink should not start up a backup server but
> assume that it is available when the job is submitted. This also allows
> the user to start up multiple instances of the backup server and to
> choose which one to use for each job individually.
> 
> What do you think about it? I ask because of the current PR for Redis as
> StateBackend:
> https://github.com/apache/flink/pull/1617
> 
> There is no embedded mode for Redis as for RocksDB.
> 
> -Matthias
> 





StateBackend

2016-02-11 Thread Matthias J. Sax
Hi,

In Flink it is possible to have different backends for operator state. I
am wondering what the best approach for different state backends would be.

Let's assume the backend is a database server. The following questions
arise:
  - Should the database server be started manually by the user or can
Flink start the server automatically when it is used?
(this seems to be the approach for RocksDB as embedded servers)
  - Should each job use the same or individual backup server (or maybe a
mix of both?)

I personally think that Flink should not start up a backup server but
assume that it is available when the job is submitted. This also allows
the user to start up multiple instances of the backup server and to
choose which one to use for each job individually.

What do you think about it? I ask because of the current PR for Redis as
StateBackend:
https://github.com/apache/flink/pull/1617

There is no embedded mode for Redis as for RocksDB.
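
For context, a Flink-embedded backend is selected per job roughly as in the
sketch below (API names as far as I recall for this version; the checkpoint
path is a placeholder). The open question above is whether an external backend
like Redis should be started by Flink or simply be reachable under a
user-provided address in the same way:

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BackendSelectionSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Flink-embedded: key/value state lives in Flink, checkpoints go to this path
    env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
    env.fromElements(1, 2, 3).print();
    env.execute("backend selection sketch");
  }
}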

-Matthias





Re: Want Flink startup issues :-)

2016-02-07 Thread Matthias J. Sax
For the road map ideas, there are often no JIRAs created yet. Mostly,
road map ideas are more complex things to get done, requiring design
documents and discussions before the actual coding can be done.

Usually, we create the JIRA (or multiple JIRAs) during the design phase.
So just watch the mailing list to keep track of the road map ideas you
are interested in. Of course, if you want to get started with any of
those, you can start the discussion on the mailing list yourself and also
start a design document etc.

Just be aware that this process will take some time, as the community
will give you a lot of feedback etc. If you want to get started more
quickly, working on an existing JIRA with limited scope is a good
starting point -- or you just do both in parallel ;)

-Matthias


On 02/06/2016 02:10 PM, Dongwon Kim wrote:
> Hi Chiwan!
> 
> That's what I wanted to know!
> Thanks!
> 
> Dongwon Kim
> 
> 2016-02-06 22:00 GMT+09:00 Chiwan Park :
>> Hi Dongwon,
>>
>> Yes, the things to do are picking an issue (by assigning the issue to you or 
>> commenting on the issue) and make changes and send a pull request for it.
>>
>> Welcome! :)
>>
>> Regards,
>> Chiwan Park
>>
>>> On Feb 6, 2016, at 3:31 PM, Dongwon Kim  wrote:
>>>
>>> Hi Fabian, Matthias, Robert!
>>>
>>> Thank you for welcoming me to the community :-)
>>> I'm taking a look at JIRA and "How to contribute" as you guys suggested.
>>> One trivial question is whether I just need to make a pull request
>>> after figuring out issues?
>>> Then I'll pick up any issue, figure it out, and then make a pull
>>> request by myself ;-)
>>>
>>> Meanwhile, I also read the roadmap and I find few plans capturing my 
>>> interest.
>>> - Making YARN resource dynamic
>>> - DataSet API Enhancements
>>> - Expose more runtime metrics
>>> Would any of you informs me of new or existing issues regarding the above?
>>>
>>> Thanks!
>>>
>>> Dongwon
>>>
>>> 2016-02-06 4:55 GMT+09:00 Fabian Hueske :
 Hi Dongwon,

 welcome to the Flink mailing list!
 What kind of issues are you interested in?

 - API / library features: DataSet API, DataStream API, SQL, StreamSQL,
 Graphs (Gelly)
 - Processing runtime: Batch, Streaming
 - Connectors to other systems: Stream sources/sinks
 - Web dashboard
 - Compatibility: Storm, Hadoop

 You can also have a look into Flink's issue tracker JIRA [1]. Right now, we
 have about 600 issues listed with any kind of difficulty and effort.
 If you find an issue that sounds interesting, just drop a note and we can
 give you some details about if you want to learn more.

 Best, Fabian

 [1]
 https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved

 2016-02-05 17:14 GMT+01:00 Dongwon Kim :

> Hello,
>
> I'm Dongwon Kim and I want to get involved in Flink community.
> Can anyone guide me through contributing to Flink with some startup 
> issues?
> Although my research interests lie in big data systems including Flink,
> Spark, MapReduce, and Tez, I've never participated in open source
> communities.
>
> FYI, I've done the following things for past few years:
> - I've studied Apache Hadoop (MRv1, MRv2, and YARN), Apache Tez, and
> Apache Spark through the source code.
> - My doctoral thesis is about improving the performance of MRv1 by
> making network pipelines between mappers and reducers like what Flink
> does.
> - I've used Ganglia to monitor the cluster performance and I've been
> interested in metrics and counters in big data systems.
> - I gave a talk named "a comparative performance evaluation of Flink"
> at last Flink Forward.
>
> I would really appreciate it if someone could help me get involved in the
> most promising ASF project :-)
>
> Greetings,
> Dongwon Kim
>
>>





[jira] [Created] (FLINK-3356) JobClientActorRecoveryITCase.testJobClientRecovery

2016-02-07 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3356:
--

 Summary: JobClientActorRecoveryITCase.testJobClientRecovery
 Key: FLINK-3356
 URL: https://issues.apache.org/jira/browse/FLINK-3356
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias J. Sax
Priority: Critical


https://travis-ci.org/apache/flink/jobs/107597706
https://travis-ci.org/mjsax/flink/jobs/107597700

{noformat}
Tests in error: 
  JobClientActorRecoveryITCase.testJobClientRecovery:135 » Timeout Futures 
timed...
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Want Flink startup issues :-)

2016-02-05 Thread Matthias J. Sax
Hi Dongwon,

very cool that you decided to join the community. Btw: very nice talk at
Flink Forward!

Fabian pointed out the most important things already.

One more thing I wanted to add (just in case you are not aware of it
already): there is a "How to contribute" section on the Flink web page:
https://flink.apache.org/how-to-contribute.html

This should also help to get you started. Looking forward to your first
pull request!

-Matthias


On 02/05/2016 08:55 PM, Fabian Hueske wrote:
> Hi Dongwon,
> 
> welcome to the Flink mailing list!
> What kind of issues are you interested in?
> 
> - API / library features: DataSet API, DataStream API, SQL, StreamSQL,
> Graphs (Gelly)
> - Processing runtime: Batch, Streaming
> - Connectors to other systems: Stream sources/sinks
> - Web dashboard
> - Compatibility: Storm, Hadoop
> 
> You can also have a look into Flink's issue tracker JIRA [1]. Right now, we
> have about 600 issues listed with any kind of difficulty and effort.
> If you find an issue that sounds interesting, just drop a note and we can
> give you some details about if you want to learn more.
> 
> Best, Fabian
> 
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved
> 
> 2016-02-05 17:14 GMT+01:00 Dongwon Kim :
> 
>> Hello,
>>
>> I'm Dongwon Kim and I want to get involved in Flink community.
>> Can anyone guide me through contributing to Flink with some startup issues?
>> Although my research interests lie in big data systems including Flink,
>> Spark, MapReduce, and Tez, I've never participated in open source
>> communities.
>>
>> FYI, I've done the following things for past few years:
>> - I've studied Apache Hadoop (MRv1, MRv2, and YARN), Apache Tez, and
>> Apache Spark through the source code.
>> - My doctoral thesis is about improving the performance of MRv1 by
>> making network pipelines between mappers and reducers like what Flink
>> does.
>> - I've used Ganglia to monitor the cluster performance and I've been
>> interested in metrics and counters in big data systems.
>> - I gave a talk named "a comparative performance evaluation of Flink"
>> at last Flink Forward.
>>
>> I would really appreciate it if someone could help me get involved in the
>> most promising ASF project :-)
>>
>> Greetings,
>> Dongwon Kim
>>
> 





New instable test?

2016-02-05 Thread Matthias J. Sax
Hi,

I had a failing build last night:
https://travis-ci.org/apache/flink/jobs/107116079

> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.782 sec 
> <<< FAILURE! - in org.apache.flink.runtime.client.JobClientActorRecoveryITCase
> testJobClientRecovery(org.apache.flink.runtime.client.JobClientActorRecoveryITCase)
>   Time elapsed: 20.153 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Communication with 
> JobManager failed: Job submission to the JobManager timed out.
>   at 
> org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:140)
>   at 
> org.apache.flink.runtime.minicluster.FlinkMiniCluster.submitJobAndWait(FlinkMiniCluster.scala:408)
>   at 
> org.apache.flink.runtime.minicluster.FlinkMiniCluster.submitJobAndWait(FlinkMiniCluster.scala:394)
>   at 
> org.apache.flink.runtime.minicluster.FlinkMiniCluster.submitJobAndWait(FlinkMiniCluster.scala:386)
>   at 
> org.apache.flink.runtime.client.JobClientActorRecoveryITCase$1.run(JobClientActorRecoveryITCase.java:107)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.flink.runtime.client.JobClientActorSubmissionTimeoutException: Job 
> submission to the JobManager timed out.
>   at 
> org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:256)
>   at 
> org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:88)
>   at 
> org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:68)
>   at 
> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
>   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>   at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Should I open a JIRA for it?

-Matthias





Re: New instable test?

2016-02-05 Thread Matthias J. Sax
And another one: https://travis-ci.org/mjsax/flink/jobs/107198383

This time it's
EventTimeWindowCheckpointingITCase.testPreAggregatedTumblingTimeWindow


-Matthias

On 02/05/2016 11:06 AM, Matthias J. Sax wrote:
> Hi,
> 
> I had a failing build last night:
> https://travis-ci.org/apache/flink/jobs/107116079
> 
>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.782 sec 
>> <<< FAILURE! - in 
>> org.apache.flink.runtime.client.JobClientActorRecoveryITCase
>> testJobClientRecovery(org.apache.flink.runtime.client.JobClientActorRecoveryITCase)
>>   Time elapsed: 20.153 sec  <<< ERROR!
>> org.apache.flink.runtime.client.JobExecutionException: Communication with 
>> JobManager failed: Job submission to the JobManager timed out.
>>  at 
>> org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:140)
>>  at 
>> org.apache.flink.runtime.minicluster.FlinkMiniCluster.submitJobAndWait(FlinkMiniCluster.scala:408)
>>  at 
>> org.apache.flink.runtime.minicluster.FlinkMiniCluster.submitJobAndWait(FlinkMiniCluster.scala:394)
>>  at 
>> org.apache.flink.runtime.minicluster.FlinkMiniCluster.submitJobAndWait(FlinkMiniCluster.scala:386)
>>  at 
>> org.apache.flink.runtime.client.JobClientActorRecoveryITCase$1.run(JobClientActorRecoveryITCase.java:107)
>>  at java.lang.Thread.run(Thread.java:745)
>> Caused by: 
>> org.apache.flink.runtime.client.JobClientActorSubmissionTimeoutException: 
>> Job submission to the JobManager timed out.
>>  at 
>> org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:256)
>>  at 
>> org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:88)
>>  at 
>> org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:68)
>>  at 
>> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
>>  at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>  at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
>>  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>  at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>  at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>  at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>  at 
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>  at 
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>  at 
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>  at 
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 
> 
> Should I open a JIRA for it?
> 
> -Matthias
> 





[jira] [Created] (FLINK-3344) EventTimeWindowCheckpointingITCase.testPreAggregatedTumblingTimeWindow

2016-02-05 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3344:
--

 Summary: 
EventTimeWindowCheckpointingITCase.testPreAggregatedTumblingTimeWindow
 Key: FLINK-3344
 URL: https://issues.apache.org/jira/browse/FLINK-3344
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias J. Sax
Priority: Critical


https://travis-ci.org/apache/flink/jobs/107198388
https://travis-ci.org/mjsax/flink/jobs/107198383

{noformat}
Maven produced no output for 300 seconds.
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3323) Nifi connector not documented

2016-02-03 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3323:
--

 Summary: Nifi connector not documented
 Key: FLINK-3323
 URL: https://issues.apache.org/jira/browse/FLINK-3323
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Matthias J. Sax
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Connector Documentation missing

2016-02-03 Thread Matthias J. Sax
FlumeSink is there. FlumeSource and FlumeTopology is all put in
comments... Not sure about it.

There is no JIRA for a Flume connector... Does anyone know the status of
Flume connector?


On 02/03/2016 02:45 PM, Suneel Marthi wrote:
> Most of the flume code is commented out IIRC?
> 
> Sent from my iPhone
> 
>> On Feb 3, 2016, at 8:24 AM, Matthias J. Sax <mj...@apache.org> wrote:
>>
>> Hi,
>>
>> I just observed that there are 7 flink-streaming-connectors available
>> but only 5 are documented on the web page.
>>
>> Flume and Nifi are not documented. Did we miss to extend the
>> documentation for both (which should have been part of the commit of the
>> code) or was this left out on purpose?
>>
>> -Matthias
>>
>>





Connector Documentation missing

2016-02-03 Thread Matthias J. Sax
Hi,

I just observed that there are 7 flink-streaming-connectors available
but only 5 are documented on the web page.

Flume and Nifi are not documented. Did we miss to extend the
documentation for both (which should have been part of the commit of the
code) or was this left out on purpose?

-Matthias






Re: Release Flink 1.0.0

2016-01-25 Thread Matthias J. Sax
Hi,

I would also like to get the STOP signal in. But I do not have time to
work on it this week... According to Till's comments, this will be the
last round of reviewing required. So I should be able to finish it by
3rd Feb, but I am not sure.

What do you think about it?

-Matthias

On 01/25/2016 04:29 PM, Aljoscha Krettek wrote:
> Hi,
> I think the refactoring of Partitioned State and the WindowOperator on state 
> work is almost ready. I also have the RocksDB state backend working. I’m 
> running some tests now on the cluster and should be able to open a PR 
> tomorrow.
> 
> 
>> On 25 Jan 2016, at 15:36, Stephan Ewen  wrote:
>>
>> I agree, with Gyula, one out-of-core state backend should be in. We are
>> pretty close to that. Aljoscha has done good work on extending test
>> coverage for state backends, so we should be pretty comfortable that it
>> works as well, once we integrate new state backends with the tests.
>>
>> There is a bit of work do do around extending the interface of the
>> key/value state. I would like to start a separate thread on that today or
>> tomorrow...
>>
>>
>>
>> On Mon, Jan 25, 2016 at 12:16 PM, Gyula Fóra  wrote:
>>
>>> Hi,
>>>
>>> I agree that getting Flink 1.0.0 out soon would be great as Flink is in a
>>> pretty solid state right now.
>>>
>>> I wonder whether it would make sense to include an out-of-core state
>>> backend in streaming core that can be used with partitioned/window states.
>>> I think if we are releasing 1.0.0 we should have a solid feature set for
>>> our strong streaming use-cases (in this case stateful and windowed
>>> computations) and this should be a part of that.
>>>
>>> I know that Aljoscha is working on a solution for this which will probably
>>> involve a heavy refactor of the State backend interfaces, and I am also
>>> working on a similar solution. Maybe it would be good to get at least one
>>> good robust solution for this in and definitely Aljoscha's refactor for the
>>> interfaces.
>>>
>>> If we decide to do this, I think this needs 1-2 extra weeks of proper
>>> testing so this might delay the schedule a little bit.
>>>
>>> What do you think?
>>>
>>> Gyula
>>>
>>>
>>>
>>> Robert Metzger  ezt írta (időpont: 2016. jan. 25., H,
>>> 11:54):
>>>
 Hi,

 I would like to release 1.0.0 in the next weeks.
 Looking at the JIRAs, I think we are going to close a lot of blocking
 issues soon. How about we do a first release candidate on Wednesday, 3.
 February?

 The first release candidate is most likely not going to pass the vote,
>>> the
 primary goal will be collecting a list of issues we need to address.

 There is also a Wiki page for the 1.0 release:
 https://cwiki.apache.org/confluence/display/FLINK/1.0+Release

 Please -1 to this message if 3. February is too soon for the first RC (it
 also means that we'll do a feature freeze around that time).

>>>
> 





Re: [ANNOUNCE] Chengxiang Li added as committer

2016-01-19 Thread Matthias J. Sax
Congrats and welcome Chengxiang!! :)

On 01/19/2016 12:56 PM, Kostas Tzoumas wrote:
> Welcome Chengxiang!!
> 
> On Tue, Jan 19, 2016 at 12:31 PM, Stephan Ewen  wrote:
> 
>> Good to have you on board!
>>
>> On Tue, Jan 19, 2016 at 11:29 AM, Maximilian Michels 
>> wrote:
>>
>>> Pleased to have you with us Chengxiang!
>>>
>>> Cheers,
>>> Max
>>>
>>> On Tue, Jan 19, 2016 at 11:13 AM, Chiwan Park 
>>> wrote:
 Congrats! Welcome Chengxiang Li!

> On Jan 19, 2016, at 7:13 PM, Vasiliki Kalavri <
>>> vasilikikala...@gmail.com> wrote:
>
> Congratulations! Welcome Chengxiang Li!
>
> On 19 January 2016 at 11:02, Fabian Hueske  wrote:
>
>> Hi everybody,
>>
>> I'd like to announce that Chengxiang Li accepted the PMC's offer to
>>> become
>> a committer of the Apache Flink project.
>>
>> Please join me in welcoming Chengxiang Li!
>>
>> Best, Fabian
>>

 Regards,
 Chiwan Park

>>>
>>
> 





Re: Emitting to non-declared output stream

2016-01-19 Thread Matthias J. Sax
Please ignore. Wrong list. Sorry!

On 01/19/2016 03:25 PM, Matthias J. Sax wrote:
> Hi,
> 
> currently, I am using Storm 0.9.3. For first tests on a new topology, I
> use LocalCluster. It happened to me that I emitted tuples to an output
> stream that I never declared (and thus never connected to). For this, I
> would expect an error message in the log. However, I don't get anything,
> which makes debugging very hard.
> 
> What do you think about it? Should I open a JIRA for it?
> 
> For real cluster deployment, I think the overhead of checking the output
> stream ID is too large and one can easily see the problem in the UI --
> the non-declared output streams that get tuples show up there. However,
> for LocalCluster, there is no UI and an error log message would be nice.
> 
> 
> -Matthias
> 





[jira] [Created] (FLINK-3238) EventTimeAllWindowCheckpointingITCase.testSlidingTimeWindow()

2016-01-15 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3238:
--

 Summary: 
EventTimeAllWindowCheckpointingITCase.testSlidingTimeWindow()
 Key: FLINK-3238
 URL: https://issues.apache.org/jira/browse/FLINK-3238
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias J. Sax


"Maven produced no output for 300 seconds."

https://travis-ci.org/mjsax/flink/jobs/102475719



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Git force pushing and deletion of branchs

2016-01-13 Thread Matthias J. Sax
+1

On 01/13/2016 11:51 AM, Fabian Hueske wrote:
> @Stephan: You mean all tags should be protected, not only those under rel?
> 
> 2016-01-13 11:43 GMT+01:00 Till Rohrmann :
> 
>> +1 for protecting the master branch.
>>
>> On Wed, Jan 13, 2016 at 11:42 AM, Li, Chengxiang 
>> wrote:
>>
>>> +1 on the original style.
>>> Master branch: disable force pushing to guard against misuse; feature
>>> branches: enable force pushing for flexible development.
>>>
>>> -Original Message-
>>> From: Gyula Fóra [mailto:gyf...@apache.org]
>>> Sent: Wednesday, January 13, 2016 6:36 PM
>>> To: dev@flink.apache.org
>>> Subject: Re: [DISCUSS] Git force pushing and deletion of branchs
>>>
>>> +1 for protecting the master branch.
>>>
>>> I also don't see any reason why anyone should force push there
>>>
>>> Gyula
>>>
>>> Fabian Hueske  ezt írta (időpont: 2016. jan. 13.,
>> Sze,
>>> 11:07):
>>>
 Hi everybody,

 Lately, ASF Infra has changed the write permissions of all Git
 repositories twice.

 Originally, it was not possible to force into the master branch.
 A few weeks ago, infra disabled also force pushing into other branches.

 Now, this has changed again after the issue was discussed with the ASF
 board.
 The current situation is the following:
 - force pushing is allowed on all branched, including master
 - branches and tags can be deleted (not sure if this applies as well
 for the master branch)
 - "the 'protected' portions of git to primarily focus on refs/tags/rel
 - thus any tags under rel, will have their entire commit history."

 I am not 100% sure which exact parts of the repository are protected
 now as I am not very much into the details of Git.
 However, I believe we need to create new tags under rel for our
 previous releases to protect them.

 In addition, I would like to propose to ask Infra to add protection
 for the master branch. I can only recall very few situations where
 changes had to be reverted. I am much more in favor of a reverting
 commit now and then compared to a branch that can be arbitrarily
>> changed.

 What do you think about this?

 Best, Fabian

>>>
>>
> 





[jira] [Created] (FLINK-3214) WindowCheckpointingITCase

2016-01-10 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3214:
--

 Summary: WindowCheckpointingITCase
 Key: FLINK-3214
 URL: https://issues.apache.org/jira/browse/FLINK-3214
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias J. Sax
Priority: Critical


No Output for 300 seconds. Build got canceled.

https://travis-ci.org/apache/flink/jobs/101407292



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3212) JobManagerCheckpointRecoveryITCase

2016-01-10 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3212:
--

 Summary: JobManagerCheckpointRecoveryITCase
 Key: FLINK-3212
 URL: https://issues.apache.org/jira/browse/FLINK-3212
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias J. Sax
Priority: Critical


{noformat}
Tests in error: 
JobManagerCheckpointRecoveryITCase.testCheckpointRecoveryFailure:354 » 
IllegalState
JobManagerCheckpointRecoveryITCase.testCheckpointedStreamingSumProgram:192 » IO
{noformat}
https://travis-ci.org/mjsax/flink/jobs/101407273



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Empty test-jar

2016-01-05 Thread Matthias J. Sax
Hi,

I want to use HttpTestClient from flink-runtime-web in flink-tests
module. However, the test-jar file

  flink-runtime-web-1.0-SNAPSHOT-tests.jar

is empty... Any ideas how to fix this?

-Matthias





[jira] [Created] (FLINK-3199) KafkaITCase.testOneToOneSources

2016-01-04 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3199:
--

 Summary: KafkaITCase.testOneToOneSources
 Key: FLINK-3199
 URL: https://issues.apache.org/jira/browse/FLINK-3199
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias J. Sax
Priority: Critical


https://travis-ci.org/mjsax/flink/jobs/100167558
{noformat}
Failed tests: 
KafkaITCase.testOneToOneSources:96->KafkaConsumerTestBase.runOneToOneExactlyOnceTest:521->KafkaTestBase.tryExecute:318
 Test failed: The program execution failed: Job execution failed.
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Streaming Iterations, no headOperator ?

2016-01-02 Thread Matthias J. Sax
Hi,

I am working on FLINK-1870 and my changes break some unit tests. The
problem is in streaming.api.IterateTest.

I tracked the problem down to StreamTask.registerInputOutput(). It calls
headOperator.setup(...). My changes depend on this call; however, there
is no head operator (ie, == null), and the call to setup(...) is not
made. Thus, for one operator the StreamingRuntimeContext member variable
"runtimeContext" is not initialized (ie, is null) and the test fails with
an NPE.

Can you give a short explanation about those code parts? What is the
condition for a missing headOperator? How can I ensure that setup() is
called for all operators?

You can find my code here:
https://github.com/mjsax/flink/tree/flink-1870-inputChannelIndex

Thanks! And happy new year!


-Matthias





Re: Flink and Clojure

2015-12-15 Thread Matthias J. Sax
Hi,

I had a closer look into this and opened a PR to fix the issue:
https://github.com/apache/flink/pull/1457

As I am afraid of side effects I am not aware of, please give feedback
on whether this fix can be merged or not...

Thx.

-Matthias

On 12/11/2015 06:26 PM, Nick Dimiduk wrote:
> Ah I see. This explains the issues I had with submitting streaming jobs
> that package JDBC drivers. Is there a second in the guide/docs about
> classloader considerations with Flink?
> 
> On Thu, Dec 10, 2015 at 11:53 PM, Stephan Ewen <se...@apache.org> wrote:
> 
>> Flink's classloading is different from Hadoop's.
>>
>> In Hadoop, the entire JVM is started with all classes (including the user
>> jar) in the classpath already. In Flink, jars are added dymanically, to
>> running JVMs with custom class loaders. That way, running worker/master
>> processes can accept new jars without restarts. Important for low-latency,
>> shells, etc
>>
>> Flink itself respects these classloaders whenever dynamically looking up a
>> class. It may be that Clojure is written such that it can only dynamically
>> instantiate what is on the original classpath.
>>
>>
>>
>> On Fri, Dec 11, 2015 at 1:31 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
>>
>>> As far as the jvm is concerned, clojure is just another library. You
>> should
>>> be able to package it up like any other dependency and submit the job.
>>> That's always how it worked in Hadoop/MR anyway...
>>>
>>> On Thu, Dec 10, 2015 at 3:22 PM, Matthias J. Sax <mj...@apache.org>
>> wrote:
>>>
>>>> Thanks for this idea.
>>>>
>>>> I extended my pom to include clojure-1.5.1.jar in my program jar.
>>>> However, the problem is still there... I did some research on the
>>>> Internet, and it seems I need to mess around with Clojure's class
>>>> loading strategy...
>>>>
>>>> -Matthias
>>>>
>>>> On 12/10/2015 06:47 PM, Nick Dimiduk wrote:
>>>>> I think Mattias's project is using maven though -- there's a pom in
>> the
>>>>> project that doesn't look generated. If you want to do it from lein,
>>>> maybe
>>>>> my old lein-hadoop [0] plugin can help?
>>>>>
>>>>> [0]: https://github.com/ndimiduk/lein-hadoop
>>>>>
>>>>> On Thu, Dec 10, 2015 at 8:54 AM, Robert Metzger <rmetz...@apache.org
>>>
>>>> wrote:
>>>>>
>>>>>> I had the same though as Nick. Maybe Leiningen allows to somehow
>>> build a
>>>>>> fat-jar containing the clojure standard library.
>>>>>>
>>>>>> On Thu, Dec 10, 2015 at 5:51 PM, Nick Dimiduk <ndimi...@apache.org>
>>>> wrote:
>>>>>>
>>>>>>> What happens when you follow the packaging examples provided in the
>>>> flink
>>>>>>> quick start archetypes? These have the maven-foo required to
>> package
>>> an
>>>>>>> uberjar suitable for flink submission. Can you try adding that step
>>> to
>>>>>> your
>>>>>>> pom.xml?
>>>>>>>
>>>>>>> On Thursday, December 10, 2015, Stephan Ewen <se...@apache.org>
>>> wrote:
>>>>>>>
>>>>>>>> This is a problem in Java.
>>>>>>>> I think you cannot dynamically modify the initial system class
>>> loader.
>>>>>>>>
>>>>>>>> What most apps do is check for the thread context class loader
>> when
>>>>>>>> dynamically loading classes. We can check and make sure that one
>> is
>>>>>> set,
>>>>>>>> but if Closure does not respect that, we have a problem.
>>>>>>>> Then Closure is not built for dynamic class loading.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Dec 10, 2015 at 5:15 PM, Matthias J. Sax <
>> mj...@apache.org
>>>>>>>> <javascript:;>> wrote:
>>>>>>>>
>>>>>>>>> Would it make sense (if possible?) for Flink to add the user jar
>>>>>>>>> dynamically to it's own classpath so Clojure can find it? Or
>>> somehow
>>>>>>>>> modify Clojure's class loader?
>>>>>>>>>
>>>>>>>>> The ja

Re: New Blog Post Draft

2015-12-11 Thread Matthias J. Sax
Awesome! Thanks a lot!

On 12/11/2015 11:18 AM, Slim Baltagi wrote:
> Hi Matthias
> 
> I already shared your blog at LinkedIn forums covering 255,758 members!
> 
> Big Data and Analytics: 160,316
> Hadoop Users: 74,333
> Big Data, Low Latency: 19,949
> Apache Storm: 955
> Apache Flink: 205
> 
> 
> Slim 
> 
> On Dec 11, 2015, at 4:10 AM, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Just published it. Spread the word :)
>>
>> Thanks for all your valuable feedback!
>>
>> On 12/10/2015 01:17 PM, Matthias J. Sax wrote:
>>> Thanks for all your feedback! I updated the PR.
>>>
>>> I would like to publish the post today. Please let me know if you have
>>> any more comments on the draft.
>>>
>>> -Matthias
>>>
>>> On 12/09/2015 08:12 PM, Vasiliki Kalavri wrote:
>>>> Thanks Matthias! This is a very nice blog post and reads easily.
>>>>
>>>> On 9 December 2015 at 19:21, Ufuk Celebi <u...@apache.org> wrote:
>>>>
>>>>> Great post! Thanks!
>>>>>
>>>>> I have also made some comments in the commit.
>>>>>
>>>>> – Ufuk
>>>>>
>>>>>> On 09 Dec 2015, at 14:19, Maximilian Michels <m...@apache.org> wrote:
>>>>>>
>>>>>> Hi Matthias,
>>>>>>
>>>>>> Thank you for the blog post. You had already shared a first draft with
>>>>>> me. This one looks even better!
>>>>>>
>>>>>> I've made some minor comments. +1 to merge if these are addressed.
>>>>>>
>>>>>> Cheers,
>>>>>> Max
>>>>>>
>>>>>> On Wed, Dec 9, 2015 at 1:20 PM, Matthias J. Sax <mj...@apache.org>
>>>>> wrote:
>>>>>>> Just updated the draft (thanks to Till and Slim for feedback) and opened
>>>>>>> a PR.
>>>>>>>
>>>>>>> https://github.com/apache/flink-web/pull/15
>>>>>>>
>>>>>>> @Slim: we discussed about benchmark result beforehand and decided to do
>>>>>>> a second blog post later on
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 12/09/2015 12:14 PM, Slim Baltagi wrote:
>>>>>>>> Matthias,
>>>>>>>>
>>>>>>>> This is great blog!
>>>>>>>>
>>>>>>>> I would like to suggest the following:
>>>>>>>> Change the title to: How to run your existing Storm applications on
>>>>> Apache Flink stream processing engine?
>>>>>>>> Fixing the few typos
>>>>>>>> For this reasons -> For these reasons
>>>>>>>> Storm compatibility package which allows users -> Storm compatibility
>>>>> package that allows users
>>>>>>>> we need to translated it  -> we need to translate it
>>>>>>>> in not available -> is not available
>>>>>>>> eg, StormWordCount.jar  -> e.g., StormWordCount.jar
>>>>>>>> Provide some benchmarks on running a storm application as it is versus
>>>>> running it on Flink.
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Slim Baltagi
>>>>>>>>
>>>>>>>> On Dec 9, 2015, at 5:10 AM, Robert Metzger <rmetz...@apache.org>
>>>>> wrote:
>>>>>>>>
>>>>>>>>> Great, thank you for writing the article.
>>>>>>>>>
>>>>>>>>> I like the general idea, but I've found some small typos.
>>>>>>>>> Can you open a pull request against the "flink-web" repo to make
>>>>> reviewing
>>>>>>>>> it easier?
>>>>>>>>>
>>>>>>>>> On Wed, Dec 9, 2015 at 11:32 AM, Matthias J. Sax <mj...@apache.org>
>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> after talking to several people and getting some feedback already, I
>>>>>>>>>> would like to suggest a new blog post for the project web site about
>>>>> the
>>>>>>>>>> Storm compatibility layer.
>>>>>>>>>>
>>>>>>>>>> You can find the draft here:
>>>>>>>>>>
>>>>>>>>>>
>>>>> https://github.com/mjsax/flink-web/blob/stormCompatibilityBlogPost/_posts/2015-12-07-storm-compatibility.md
>>>>>>>>>>
>>>>>>>>>> The missing (just not rendered) picture is this one:
>>>>>>>>>>
>>>>>>>>>>
>>>>> https://github.com/mjsax/flink-web/blob/stormCompatibilityBlogPost/img/blog/flink-storm.png
>>>>>>>>>>
>>>>>>>>>> Looking forward to your feedback!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> 





Re: Flink and Clojure

2015-12-10 Thread Matthias J. Sax
Thanks for this idea.

I extended my pom to include clojure-1.5.1.jar in my program jar.
However, the problem is still there... I did some research on the
Internet, and it seems I need to mess around with Clojure's class
loading strategy...
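
What the discussion below boils down to, as a plain Java sketch (illustration
only, not Flink or Clojure code): a lookup that goes through the thread context
class loader can see a dynamically added user jar, while a plain lookup
resolves against the class loader of the calling code only:

public class ContextClassLoaderSketch {

  // respects dynamically added jars, e.g. a submitted user jar,
  // provided the caller has set the context class loader accordingly
  public static Class<?> loadViaContextClassLoader(String name) throws ClassNotFoundException {
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    return Class.forName(name, true, cl);
  }

  // resolves against the class loader of the calling class -- for code shipped
  // in lib/ that is the startup classpath, so a user jar is not visible
  public static Class<?> loadViaDefiningClassLoader(String name) throws ClassNotFoundException {
    return Class.forName(name);
  }
}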

-Matthias

On 12/10/2015 06:47 PM, Nick Dimiduk wrote:
> I think Mattias's project is using maven though -- there's a pom in the
> project that doesn't look generated. If you want to do it from lein, maybe
> my old lein-hadoop [0] plugin can help?
> 
> [0]: https://github.com/ndimiduk/lein-hadoop
> 
> On Thu, Dec 10, 2015 at 8:54 AM, Robert Metzger <rmetz...@apache.org> wrote:
> 
>> I had the same though as Nick. Maybe Leiningen allows to somehow build a
>> fat-jar containing the clojure standard library.
>>
>> On Thu, Dec 10, 2015 at 5:51 PM, Nick Dimiduk <ndimi...@apache.org> wrote:
>>
>>> What happens when you follow the packaging examples provided in the flink
>>> quick start archetypes? These have the maven-foo required to package an
>>> uberjar suitable for flink submission. Can you try adding that step to
>> your
>>> pom.xml?
>>>
>>> On Thursday, December 10, 2015, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> This is a problem in Java.
>>>> I think you cannot dynamically modify the initial system class loader.
>>>>
>>>> What most apps do is check for the thread context class loader when
>>>> dynamically loading classes. We can check and make sure that one is
>> set,
>>>> but if Clojure does not respect that, we have a problem.
>>>> Then Clojure is not built for dynamic class loading.
>>>>
>>>>
>>>>
>>>> On Thu, Dec 10, 2015 at 5:15 PM, Matthias J. Sax <mj...@apache.org
>>>> <javascript:;>> wrote:
>>>>
>>>>> Would it make sense (if possible?) for Flink to add the user jar
>>>>> dynamically to its own classpath so Clojure can find it? Or somehow
>>>>> modify Clojure's class loader?
>>>>>
>>>>> The jars in lib are added to the classpath at startup. This makes it
>>>>> practically impossible to execute a Flink program that is written in
>>>>> Clojure right now...
>>>>>
>>>>>
>>>>> On 12/10/2015 05:09 PM, Aljoscha Krettek wrote:
>>>>>> Clojure is not considering the user-jar when trying to load the
>>> class.
>>>>>>
>>>>>>> On 10 Dec 2015, at 17:05, Matthias J. Sax <mj...@apache.org
>>>> <javascript:;>> wrote:
>>>>>>>
>>>>>>> Hi Squirrels,
>>>>>>>
>>>>>>> I was playing with a Flink Clojure WordCount example today.
>>>>>>> https://github.com/mjsax/flink-external/tree/master/flink-clojure
>>>>>>>
>>>>>>> After building the project with "mvn package" I tried to submit it
>>> to
>>>> a
>>>>>>> local cluster. Before I started the cluster, I manually copied
>>>>>>> "clojure-1.5.1.jar" into Flink's lib folder.
>>>>>>>
>>>>>>>> cp ~/.m2/repository/org/clojure/clojure/1.5.1/clojure-1.5.1.jar
>>> lib/
>>>>>>>> bin/start-local.sh
>>>>>>>
>>>>>>> However, when submitting the jar, I get an exception:
>>>>>>>
>>>>>>>> bin/flink run -c org.apache.flink.clojure.WordCount
>>>>>>>
>>>>>
>>>>
>>>
>> ~/workspace_flink/flink-external/flink-clojure/target/flink-clojure-0.10.0.jar
>>>>>>>
>>>>>>>
>>>>>>>> 
>>>>>>>> The program finished with the following exception:
>>>>>>>>
>>>>>>>> org.apache.flink.client.program.ProgramInvocationException: The
>>>>> program's entry point class 'org.apache.flink.clojure.WordCount'
>> threw
>>> an
>>>>> error during initialization.
>>>>>>>> at
>>>>>
>>>>
>>>
>> org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:585)
>>>>>>>> at
>>>>>
>>>>
>>>
>> org.apache.flink.client.program.PackagedProgram.(PackagedProgram.java:195)
>>>>>>>> at

Re: New Blog Post Draft

2015-12-10 Thread Matthias J. Sax
Thanks for all your feedback! I updated the PR.

I would like to publish the post today. Please let me know if you have
any more comments on the draft.

-Matthias

On 12/09/2015 08:12 PM, Vasiliki Kalavri wrote:
> Thanks Matthias! This is a very nice blog post and reads easily.
> 
> On 9 December 2015 at 19:21, Ufuk Celebi <u...@apache.org> wrote:
> 
>> Great post! Thanks!
>>
>> I have also made some comments in the commit.
>>
>> – Ufuk
>>
>>> On 09 Dec 2015, at 14:19, Maximilian Michels <m...@apache.org> wrote:
>>>
>>> Hi Matthias,
>>>
>>> Thank you for the blog post. You had already shared a first draft with
>>> me. This one looks even better!
>>>
>>> I've made some minor comments. +1 to merge if these are addressed.
>>>
>>> Cheers,
>>> Max
>>>
>>> On Wed, Dec 9, 2015 at 1:20 PM, Matthias J. Sax <mj...@apache.org>
>> wrote:
>>>> Just updated the draft (thanks to Till and Slim for feedback) and opened
>>>> a PR.
>>>>
>>>> https://github.com/apache/flink-web/pull/15
>>>>
>>>> @Slim: we discussed about benchmark result beforehand and decided to do
>>>> a second blog post later on
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 12/09/2015 12:14 PM, Slim Baltagi wrote:
>>>>> Matthias,
>>>>>
>>>>> This is great blog!
>>>>>
>>>>> I would like to suggest the following:
>>>>> Change the title to: How to run your existing Storm applications on
>> Apache Flink stream processing engine?
>>>>> Fixing the few typos
>>>>> For this reasons -> For these reasons
>>>>> Storm compatibility package which allows users -> Storm compatibility
>> package that allows users
>>>>> we need to translated it  -> we need to translate it
>>>>> in not available -> is not available
>>>>> eg, StormWordCount.jar  -> e.g., StormWordCount.jar
>>>>> Provide some benchmarks on running a storm application as it is versus
>> running it on Flink.
>>>>> Thanks
>>>>>
>>>>> Slim Baltagi
>>>>>
>>>>> On Dec 9, 2015, at 5:10 AM, Robert Metzger <rmetz...@apache.org>
>> wrote:
>>>>>
>>>>>> Great, thank you for writing the article.
>>>>>>
>>>>>> I like the general idea, but I've found some small typos.
>>>>>> Can you open a pull request against the "flink-web" repo to make
>> reviewing
>>>>>> it easier?
>>>>>>
>>>>>> On Wed, Dec 9, 2015 at 11:32 AM, Matthias J. Sax <mj...@apache.org>
>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> after talking to several people and getting some feedback already, I
>>>>>>> would like to suggest a new blog post for the project web site about
>> the
>>>>>>> Storm compatibility layer.
>>>>>>>
>>>>>>> You can find the draft here:
>>>>>>>
>>>>>>>
>> https://github.com/mjsax/flink-web/blob/stormCompatibilityBlogPost/_posts/2015-12-07-storm-compatibility.md
>>>>>>>
>>>>>>> The missing (just not rendered) picture is this one:
>>>>>>>
>>>>>>>
>> https://github.com/mjsax/flink-web/blob/stormCompatibilityBlogPost/img/blog/flink-storm.png
>>>>>>>
>>>>>>> Looking forward to your feedback!
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>
>>
> 





Flink and Clojure

2015-12-10 Thread Matthias J. Sax
Hi Squirrels,

I was playing with a Flink Clojure WordCount example today.
https://github.com/mjsax/flink-external/tree/master/flink-clojure

After building the project with "mvn package" I tried to submit it to a
local cluster. Before I started the cluster, I manually copied
"clojure-1.5.1.jar" into Flink's lib folder.

> cp ~/.m2/repository/org/clojure/clojure/1.5.1/clojure-1.5.1.jar lib/
> bin/start-local.sh

However, when submitting the jar, I get an exception:

> bin/flink run -c org.apache.flink.clojure.WordCount
~/workspace_flink/flink-external/flink-clojure/target/flink-clojure-0.10.0.jar


> 
> The program finished with the following exception:
> 
> org.apache.flink.client.program.ProgramInvocationException: The program's 
> entry point class 'org.apache.flink.clojure.WordCount' threw an error during 
> initialization.
> at 
> org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:585)
> at 
> org.apache.flink.client.program.PackagedProgram.(PackagedProgram.java:195)
> at org.apache.flink.client.CliFrontend.buildProgram(CliFrontend.java:784)
> at org.apache.flink.client.CliFrontend.run(CliFrontend.java:288)
> at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1050)
> at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1103)
> Caused by: java.lang.ExceptionInInitializerError
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:278)
> at 
> org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:578)
> ... 5 more
> Caused by: java.io.FileNotFoundException: Could not locate 
> org/apache/flink/clojure/WordCount__init.class or 
> org/apache/flink/clojure/WordCount.clj on classpath:
> at clojure.lang.RT.load(RT.java:443)
> at clojure.lang.RT.load(RT.java:411)
> at clojure.core$load$fn__5018.invoke(core.clj:5530)
> at clojure.core$load.doInvoke(core.clj:5529)
> at clojure.lang.RestFn.invoke(RestFn.java:408)
> at clojure.lang.Var.invoke(Var.java:415)
> at org.apache.flink.clojure.WordCount.(Unknown Source)
> ... 8 more

I am not sure why the class is not found. It is contained in the jar
file. I can fix this error by copying the user jar
(flink-clojure-0.10.0.jar) into Flink's lib-folder.

It seems that Flink is not looking into the user-jar when loading this
class. Can anybody explain why?

Thx.

-Matthias





New Blog Post Draft

2015-12-09 Thread Matthias J. Sax
Hi,

after talking to several people and getting some feedback already, I
would like to suggest a new blog post for the project web site about the
Storm compatibility layer.

You can find the draft here:
https://github.com/mjsax/flink-web/blob/stormCompatibilityBlogPost/_posts/2015-12-07-storm-compatibility.md

The missing (just not rendered) picture is this one:
https://github.com/mjsax/flink-web/blob/stormCompatibilityBlogPost/img/blog/flink-storm.png

Looking forward to your feedback!


-Matthias





Re: New Blog Post Draft

2015-12-09 Thread Matthias J. Sax
Just updated the draft (thanks to Till and Slim for feedback) and opened
a PR.

https://github.com/apache/flink-web/pull/15

@Slim: we discussed about benchmark result beforehand and decided to do
a second blog post later on


-Matthias

On 12/09/2015 12:14 PM, Slim Baltagi wrote:
> Matthias,
> 
> This is great blog!
> 
> I would like to suggest the following: 
> Change the title to: How to run your existing Storm applications on Apache 
> Flink stream processing engine?
> Fixing the few typos
> For this reasons -> For these reasons
> Storm compatibility package which allows users -> Storm compatibility package 
> that allows users 
>  we need to translated it  -> we need to translate it 
> in not available -> is not available
> eg, StormWordCount.jar  -> e.g., StormWordCount.jar
> Provide some benchmarks on running a storm application as it is versus 
> running it on Flink. 
> Thanks
> 
> Slim Baltagi
> 
> On Dec 9, 2015, at 5:10 AM, Robert Metzger <rmetz...@apache.org> wrote:
> 
>> Great, thank you for writing the article.
>>
>> I like the general idea, but I've found some small typos.
>> Can you open a pull request against the "flink-web" repo to make reviewing
>> it easier?
>>
>> On Wed, Dec 9, 2015 at 11:32 AM, Matthias J. Sax <mj...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> after talking to several people and getting some feedback already, I
>>> would like to suggest a new blog post for the project web site about the
>>> Storm compatibility layer.
>>>
>>> You can find the draft here:
>>>
>>> https://github.com/mjsax/flink-web/blob/stormCompatibilityBlogPost/_posts/2015-12-07-storm-compatibility.md
>>>
>>> The missing (just not rendered) picture is this one:
>>>
>>> https://github.com/mjsax/flink-web/blob/stormCompatibilityBlogPost/img/blog/flink-storm.png
>>>
>>> Looking forward to your feedback!
>>>
>>>
>>> -Matthias
>>>
>>>
> 
> 





[jira] [Created] (FLINK-3142) CheckpointCoordinatorTest fails

2015-12-07 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3142:
--

 Summary: CheckpointCoordinatorTest fails
 Key: FLINK-3142
 URL: https://issues.apache.org/jira/browse/FLINK-3142
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias J. Sax
Priority: Critical


https://travis-ci.org/mjsax/flink/jobs/95439203
{noformat}
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.713 sec <<< 
FAILURE! - in org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest
testMaxConcurrentAttempsWithSubsumption(org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest)
  Time elapsed: 60.145 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:621)
at org.junit.Assert.assertNotNull(Assert.java:631)
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest.testMaxConcurrentAttempsWithSubsumption(CheckpointCoordinatorTest.java:946)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


JobManagerCheckpointRecoveryITCase

2015-12-07 Thread Matthias J. Sax
Hi,

I had a failing build due to "Maven produced no output for 300 seconds."

It seems that
> JobManagerCheckpointRecoveryITCase.testCheckpointedStreamingSumProgram()

got stuck. Did anyone see this before? Should I open an "instable-test"
JIRA?

https://travis-ci.org/mjsax/flink/jobs/95439206

-Matthias





Re: Sliding Window Join (without duplicates)

2015-11-24 Thread Matthias J. Sax
Stephan is right. A tumbling window does not help. The last tuple of
window n and the first tuple of window n+1 are "close" to each other and
should be joined for example.

From a SQL-like point of view this is a very common case expressed as:

SELECT * FROM s1,s2 WHERE s1.key = s2.key AND |s1.ts - s2.ts| < window-size

I would not expect to get any duplicates here.

Basically, the window should move by one tuple (for each stream) and
join with all tuples from the other stream that are within the time
range (window size), where the ts of this new tuple defines the boundaries
of the window (ie, there are no "fixed" window boundaries as defined by
a time-slide).

Not sure how a "session window" can help here... I guess the most
generic window API allows defining a slide of one tuple and a window size of X
seconds. But I don't know how duplicates could be avoided...
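
For reference, this is the kind of join I mean, written against the current
DataStream API (sketch only; class names are from roughly the 0.10/1.0 API and
the key selectors are placeholders). With a window of 10 seconds sliding by 1
second, every element is assigned to 10 windows, so each matching pair is
emitted once per shared window:

import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SlidingJoinSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<Tuple2<String, Long>> s1 = env.fromElements(Tuple2.of("k", 1L), Tuple2.of("k", 2L));
    DataStream<Tuple2<String, Long>> s2 = env.fromElements(Tuple2.of("k", 10L));

    s1.join(s2)
      .where(new KeySelector<Tuple2<String, Long>, String>() {
        public String getKey(Tuple2<String, Long> t) { return t.f0; }
      })
      .equalTo(new KeySelector<Tuple2<String, Long>, String>() {
        public String getKey(Tuple2<String, Long> t) { return t.f0; }
      })
      // size 10s, slide 1s: one element belongs to 10 windows, so the
      // nested-loop join emits each matching pair up to 10 times
      .window(SlidingTimeWindows.of(Time.of(10, TimeUnit.SECONDS), Time.of(1, TimeUnit.SECONDS)))
      .apply(new JoinFunction<Tuple2<String, Long>, Tuple2<String, Long>, String>() {
        public String join(Tuple2<String, Long> a, Tuple2<String, Long> b) {
          return a.f0 + ": " + a.f1 + "," + b.f1;
        }
      })
      .print();

    env.execute("sliding window join sketch");
  }
}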

-Matthias

On 11/24/2015 11:04 AM, Stephan Ewen wrote:
> I understand Matthias' point. You want to join elements that occur within a
> time range of each other.
> 
> In a tumbling window, you have strict boundaries and a pair of elements
> that arrives such that one element is before the boundary and one after,
> they will not join. Hence the sliding windows.
> 
> What may be a solution here is a "session window" join...
> 
> On Tue, Nov 24, 2015 at 10:33 AM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
> 
>> Hi,
>> I’m not sure this is a problem. If a user specifies sliding windows then
>> one element can (and will) end up in several windows. If these are joined
>> then there will be multiple results. If the user does not want multiple
>> windows then tumbling windows should be used.
>>
>> IMHO, this is quite straightforward. But let’s see what others have to say.
>>
>> Cheers,
>> Aljoscha
>>> On 23 Nov 2015, at 20:36, Matthias J. Sax <mj...@apache.org> wrote:
>>>
>>> Hi,
>>>
>>> it seems that a join on the data streams with an overlapping sliding
>>> window produces duplicates in the output. The default implementation
>>> internally just uses two nested loops over both windows to compute the
>>> result.
>>>
>>> How can duplicates be avoided? Is there any way to do this right now? If
>>> not, should we add this?
>>>
>>>
>>> -Matthias
>>>
>>
>>
> 





Flink Master on Travis

2015-11-24 Thread Matthias J. Sax
Hi,

I just observed that a couple of builds failed recently due to
timeout... Is there anything we can do about this?

Two recent builds passed but took 1:56:25 and 1:50:47, which is close to the
2h time limit, too.

Just increasing the timeout does not seem like a good idea, I guess.


-Matthias





Re: withParameters() for Streaming API

2015-11-24 Thread Matthias J. Sax
We had this discussion a while ago.

If I recall correctly, "withParameters()" is not encouraged to be used with
the DataSet API either.

This is the thread:
https://mail-archives.apache.org/mod_mbox/flink-dev/201509.mbox/%3C55EC69CD.1070003%40apache.org%3E
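
Until there is an equivalent for the streaming API, the usual workaround is to
pass the values through the rich function's constructor, since the function
object is serialized and shipped with the job anyway. A minimal sketch (the
class and parameter names are just illustrative):

import org.apache.flink.api.common.functions.RichFilterFunction;

// Illustrative only: configuration values are kept as (serializable) fields
// of the rich function instead of being read from a Configuration in open().
public class PrefixFilter extends RichFilterFunction<String> {

    private final String prefix;

    public PrefixFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public boolean filter(String value) {
        return value.startsWith(prefix);
    }
}

// Usage: stream.filter(new PrefixFilter("[ERROR]"));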

-Matthias

On 11/24/2015 02:14 PM, Timo Walther wrote:
> Hi all,
> 
> I want to set the Configuration of a streaming operator and access it
> via the open method of the RichFunction.
> There is no possibility to set the Configuration of the open method at
> the moment, right? Can I open an issue for a withParameters() equivalent
> for the Streaming API?
> 
> Regards,
> Timo
> 





Sliding Window Join (without duplicates)

2015-11-23 Thread Matthias J. Sax
Hi,

it seems that a join on the data streams with an overlapping sliding
window produces duplicates in the output. The default implementation
internally just uses two nested loops over both windows to compute the
result.

How can duplicates be avoided? Is there any way to avoid them right now? If
not, should we add this?


-Matthias





Re: Storm Compatibility

2015-11-21 Thread Matthias J. Sax
Thanks for your feedback! This is very valuable :)

Please share your experience (positive and negative) when doing more
complex stuff. And don't hesitate to ask if you have any questions.

-Matthias

On 11/21/2015 06:04 PM, Naveen Madhire wrote:
> FYI, I just saw this email chain and thought of sharing my experience. I used
> the Storm Flink API a few days ago. Just a simple example worked well; however,
> I will be testing a few more next week.
> 
> One thing to note is, I had to include all Scala dependencies in the storm
> topology since the FlinkLocalCluster.java class uses LocalFlinkMiniCluster.scala
> 
> 
> Not sure if this is an issue but after including scala dependencies
> everything worked well. ;)
> 
> 
> On Fri, Nov 20, 2015 at 4:12 PM, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Multiple inputs per bolt is currently not supported. :(
>> FlinkTopologyBuilder has a bug. There is already a JIRA for it:
>> https://issues.apache.org/jira/browse/FLINK-2837
>>
>> I know already how to fix it (hope I can get it into 0.10.1)
>>
>> Removing FlinkTopologyBuilder does make sense (I did not do it because
>> the members we need to access are private). Your idea to get access via
>> reflection is good!
>>
>> Btw: can you also have a look here:
>> https://github.com/apache/flink/pull/1387
>> I would like to merge this ASAP but need some feedback.
>>
>> -Matthias
>>
>> On 11/20/2015 07:30 PM, Maximilian Michels wrote:
>>> I thought about the API changes again. It probably does make sense to
>>> keep the LocalCluster and StormSubmitter equivalent classes. That way,
>>> we don't break the Storm API too much. Users can stick to the pattern
>>> of using either FlinkCluster to execute locally or FlinkSubmitter to
>>> submit remotely. Still, we can save some code by reusing Storm's
>>> TopologyBuilder.
>>>
>>> I'll open a pull request with the changes. This also includes some
>>> more examples and features (e.g. multiple inputs per Bolt).
>>>
>>> On Mon, Nov 16, 2015 at 4:33 PM, Maximilian Michels <m...@apache.org>
>> wrote:
>>>> You are right in saying that both API approaches support executing
>>>> Storm jobs. However, I think the proposed changes make it much easier
>>>> to reuse Storm topologies. And here is why:
>>>>
>>>> 1. No existing classes need to be exchanged.
>>>>
>>>> A Storm topology stays like it is. If you already have it defined
>>>> somewhere, you simply pass it to the FlinkTopologyBuilder to create a
>>>> StreamExecutionEnvironment.
>>>>
>>>> 2. Storm and Flink have different runtime behavior.
>>>>
>>>> IMHO it makes more sense to make it transparent to the user that the
>>>> result of the translation is an actual Flink job executed by the Flink
>>>> runtime. Therefore, it makes sense to stick to the Flink way of
>>>> executing. Hiding this fact behind Storm dummy classes can create
>>>> problems for the user.
>>>>
>>>> 3. Code reuse
>>>>
>>>> As you can see in the proposed changes, it makes the implementation
>>>> much simpler while retaining the desired functionality. That also has an
>>>> impact on testability and maintainability.
>>>>
>>>> I can also understand your perspective. I wonder if we could get some
>>>> feedback from other people on the mailing list?
>>>>
>>>>
>>>> Let me also address your other comments and suggestions:
>>>>
>>>>> * You changed examples to use finite-spouts -- from a testing point of
>>>>> view this makes sense. However, the examples should show how to run an
>>>>> *unmodified* Storm topology in Flink.
>>>>
>>>> Good point. As far as I know we only test finite sources in the Flink
>>>> streaming tests. Using finite sources makes things much easier. I
>>>> would like to keep the tests simple like this. We can still have
>>>> separate tests to test the infinite attribute of the regular spouts.
>>>> The examples can be converted back to using the infinite spout. IMHO
>>>> the existing approach which involves waiting and killing of the
>>>> topology doesn't seem to be the cleanest solution.
>>>>
>>>>> * we should keep the local copy "unprocessedBolts" when creating a
>> Flink
>>>>> program to allow to re-submit the same topology object twice (or alter
>>>>> it a

Re: how to write dataset in a file?

2015-11-21 Thread Matthias J. Sax
I would not set

> ExecutionEnvironment env = 
> ExecutionEnvironment.createLocalEnvironment().setParallelism(1);

because this changes the default parallelism of *all* operators to one.
Instead, only set the parallelism of the **sink** to one (as described
here:
https://stackoverflow.com/questions/32580970/writeascsv-and-writeastext-is-unexpected/32581813#32581813)

filteredData.writeAsText("file:///output1.txt").setParallelism(1);
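
A minimal end-to-end sketch of that approach (the paths and the filter
predicate are placeholders); only the sink is forced to parallelism one, all
other operators keep the default parallelism:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class SingleFileOutput {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> data = env.readTextFile("file:///text1.txt");
        // The filter still runs with the default parallelism ...
        DataSet<String> errors = data.filter(line -> line.startsWith("[ERROR]"));
        // ... only the sink is set to parallelism 1, producing a single file.
        errors.writeAsText("file:///output1.txt").setParallelism(1);

        env.execute("single-file-output");
    }
}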

-Matthias

On 11/21/2015 02:23 PM, Márton Balassi wrote:
> Additionally, as having multiple files under /output1.txt is standard in the
> Hadoop ecosystem, you can transparently read all the files with
> env.readTextFile("/output1.txt").
> 
> You can also set parallelism on individual operators (e.g the file writer)
> if you really need a single output.
> 
> On Fri, Nov 20, 2015, 21:27 Suneel Marthi  wrote:
> 
>> You can write to a single output file by setting parallelism == 1
>>
>>  So final ExecutionEnvironment env = ExecutionEnvironment.
>> createLocalEnvironment().setParallelism(1);
>>
>> The reason you see multiple output files is that each worker is writing
>> to a different file.
>>
>> On Fri, Nov 20, 2015 at 10:06 PM, jun aoki  wrote:
>>
>>> Hi Flink community
>>>
>>> I know I'm mistaken but could not find what I want.
>>>
>>> final ExecutionEnvironment env =
>>> ExecutionEnvironment.createLocalEnvironment();
>>> DataSet<String> data = env.readTextFile("file:///text1.txt");
>>> FilterFunction<String> filter = new MyFilterFunction();  // looks for a
>>> line that starts with "[ERROR]"
>>> DataSet<String> filteredData = data.filter(filter);
>>> filteredData.writeAsText("file:///output1.txt");
>>> env.execute();
>>>
>>> Then I expect to get a single file /output1.txt , but actually get
>>> /output1.txt/1, /output1.txt/2, /output1.txt/3...
>>> I assumed I was getting a single file because the method signature says
>>> writeAsText(String filePath).  <-- filePath instead of directoryPath
>>> Also the Javadoc comment sounds like I assumed right.
>>>
>>>
>> https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/DataSet.java#L1354
>>>
>>> Can anyone tell if the method signature and document should be fixed? or
>> if
>>> I am missing some configuration?
>>>
>>> --
>>> -jun
>>>
>>
> 





Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Matthias J. Sax
Nice.

I just would need to get some feedback about it -- I had to change
something in a "hacky way"... Maybe there is a better solution for it...

https://github.com/apache/flink/pull/1387

If there is no better idea about solving the naming issue, I would merge
it into master (there is no 0.10.1 branch yet, right?)

-Matthias

On 11/20/2015 01:54 PM, Till Rohrmann wrote:
> If it's not API breaking, then it can be included imo.
> 
> On Fri, Nov 20, 2015 at 1:44 PM, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> If we find more bugs later on, we could have a 0.10.2, too.
>>
>> +1 for quick bug fix release.
>>
>> Question: should bug fix releases contain fixes for core components
>> only? I would have a fix for a bug in Storm compatibility -- not sure if
>> it should be included or not
>>
>> -Matthias
>>
>> On 11/20/2015 12:35 PM, Till Rohrmann wrote:
>>> The optimizer bug (https://issues.apache.org/jira/browse/FLINK-3052)
>> should
>>> be fixed with https://github.com/apache/flink/pull/1388.
>>>
>>> On Fri, Nov 20, 2015 at 11:37 AM, Gyula Fóra <gyula.f...@gmail.com>
>> wrote:
>>>
>>>> Thanks guys,
>>>>
>>>> I understand your point and you are probably right; if this is a
>>>> lightweight process, then the earlier the better :)
>>>>
>>>> Gyula
>>>> On Fri, Nov 20, 2015 at 11:34 AM Ufuk Celebi <u...@apache.org> wrote:
>>>>
>>>>> Hey Gyula,
>>>>>
>>>>> I understand your point, but we already have some important fixes for
>>>>> 0.10.1. It's fair to assume that we will find more issues in the
>> future,
>>>>> but the bugfix releases have way less overhead than the major
>> releases. I
>>>>> would still keep the ASAP schedule and would not wait longer (except
>> for
>>>>> the case that we find an important issue that we want fixed). We
>> can
>>>>> always do a new bug fix release.
>>>>>
>>>>> – Ufuk
>>>>>
>>>>> On Fri, Nov 20, 2015 at 11:25 AM, Gyula Fóra <gyula.f...@gmail.com>
>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Wouldnt you think that it would make sense to wait a week or so to
>> find
>>>>> all
>>>>>> the hot issues with the current release?
>>>>>>
>>>>>> To me it feels a little bit like rushing this out and we will have
>>>> almost
>>>>>> the same situation afterwards.
>>>>>>
>>>>>> I might be wrong but I think people should get a chance to try this
>>>> out.
>>>>>>
>>>>>> In any case I would +1 for the quick release if everyone else thinks
>>>>> thats
>>>>>> the way, these are just my thoughts.
>>>>>>
>>>>>> Gyula
>>>>>> On Fri, Nov 20, 2015 at 11:13 AM Till Rohrmann <trohrm...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Actually, I still have another bug related to the optimizer which I
>>>>> would
>>>>>>> like to include if possible. The problem is that the optimizer is not
>>>>>> able
>>>>>>> to push properties properly out of a bulk iteration which in some
>>>> cases
>>>>>> can
>>>>>>> lead to rejected Flink jobs.
>>>>>>>
>>>>>>> On Fri, Nov 20, 2015 at 11:10 AM, Robert Metzger <
>>>> rmetz...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Great, thank you!
>>>>>>>>
>>>>>>>> Let me know if there is any issue, I'll address it asap. The PR is
>>>>> not
>>>>>>>> building anymore because you've pushed an update to the Kafka
>>>>>>>> documentation. I can rebase and merge the PR once you give me green
>>>>>> light
>>>>>>>> ;)
>>>>>>>>
>>>>>>>> Till has merged FLINK-3021, so we might be able to have a first RC
>>>>>> today.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Nov 20, 2015 at 11:05 AM, Stephan Ewen <se...@apache.org>
>>>>>> wrote:
>>>>>>>>
>>>

Re: Storm Compatibility

2015-11-20 Thread Matthias J. Sax
Multiple inputs per bolt is currently not supported. :(
FlinkTopologyBuilder has a bug. There is already a JIRA for it:
https://issues.apache.org/jira/browse/FLINK-2837

I know already how to fix it (hope I can get it into 0.10.1)

Removing FlinkTopologyBuilder does make sense (I did not do it because
the members we need to access are private). Your idea to get access via
reflection is good!

Btw: can you also have a look here:
https://github.com/apache/flink/pull/1387
I would like to merge this ASAP but need some feedback.

-Matthias

On 11/20/2015 07:30 PM, Maximilian Michels wrote:
> I thought about the API changes again. It probably does make sense to
> keep the LocalCluster and StormSubmitter equivalent classes. That way,
> we don't break the Storm API too much. Users can stick to the pattern
> of using either FlinkCluster to execute locally or FlinkSubmitter to
> submit remotely. Still, we can save some code by reusing Storm's
> TopologyBuilder.
> 
> I'll open a pull request with the changes. This also includes some
> more examples and features (e.g. multiple inputs per Bolt).
> 
> On Mon, Nov 16, 2015 at 4:33 PM, Maximilian Michels <m...@apache.org> wrote:
>> You are right in saying that both API approaches support executing
>> Storm jobs. However, I think the proposed changes make it much easier
>> to reuse Storm topologies. And here is why:
>>
>> 1. No existing classes need to be exchanged.
>>
>> A Storm topology stays like it is. If you already have it defined
>> somewhere, you simply pass it to the FlinkTopologyBuilder to create a
>> StreamExecutionEnvironment.
>>
>> 2. Storm and Flink have different runtime behavior.
>>
>> IMHO it makes more sense to make it transparent to the user that the
>> result of the translation is an actual Flink job executed by the Flink
>> runtime. Therefore, it makes sense to stick to the Flink way of
>> executing. Hiding this fact behind Storm dummy classes can create
>> problems for the user.
>>
>> 3. Code reuse
>>
>> As you can see in the proposed changes, it makes the implementation
>> much simpler while retaining the desired functionality. That also has an
>> impact on testability and maintainability.
>>
>> I can also understand your perspective. I wonder if we could get some
>> feedback from other people on the mailing list?
>>
>>
>> Let me also address your other comments and suggestions:
>>
>>> * You changed examples to use finite-spouts -- from a testing point of
>>> view this makes sense. However, the examples should show how to run an
>>> *unmodified* Storm topology in Flink.
>>
>> Good point. As far as I know we only test finite sources in the Flink
>> streaming tests. Using finite sources makes things much easier. I
>> would like to keep the tests simple like this. We can still have
>> separate tests to test the infinite attribute of the regular spouts.
>> The examples can be converted back to using the infinite spout. IMHO
>> the existing approach which involves waiting and killing of the
>> topology doesn't seem to be the cleanest solution.
>>
>>> * we should keep the local copy "unprocessedBolts" when creating a Flink
>>> program to allow to re-submit the same topology object twice (or alter
>>> it after submission). If you don't make the copy, submitting/translating
>>> the topology into a Flink job alters the object (which should not
>>> happen). And as it is not performance critical, the copying overhead
>>> does not matter.
>>
>> I didn't think about that but we can copy the spouts and bolts before
>> processing them. I've added that to my local branch. However, I didn't
>> see where this was done previously. Can you give me a hint?
>>
>>> * Why did you change the dop from 4 to 1 WordCountTopology ? We should
>>> test in parallel fashion...
>>
>> Absolutely. Already reverted this locally.
>>
>>> * Too many reformatting changes ;) You touched many classes without any
>>> actual code changes.
>>
>> Yes, I ran "Optimize Imports" in IntelliJ. Sorry for that but this
>> only affects the import statements.
>>
>> I would like to open a pull request soon to merge some of the changes.
>> It would be great if some other people commented on the API changes
>> and whether we should integrate direct support for spouts/bolts in
>> DataStream. Next, I would like to test and bundle some more of the
>> examples included in Storm.
>>
>> Cheers,
>> Max
>>
>> On Sat, Nov 14, 2015 at 5:13 PM, Matthias J. Sax <mj...@apache.org>

[jira] [Created] (FLINK-3048) DataSinkTaskTest.testCancelDataSinkTask

2015-11-19 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3048:
--

 Summary: DataSinkTaskTest.testCancelDataSinkTask
 Key: FLINK-3048
 URL: https://issues.apache.org/jira/browse/FLINK-3048
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias J. Sax
Priority: Critical


https://travis-ci.org/apache/flink/jobs/91941025

{noformat}
Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 16.483 sec <<< 
FAILURE! - in org.apache.flink.runtime.operators.DataSinkTaskTest
testCancelDataSinkTask(org.apache.flink.runtime.operators.DataSinkTaskTest)  
Time elapsed: 1.136 sec  <<< FAILURE!
java.lang.AssertionError: Temp output file has not been removed
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at 
org.apache.flink.runtime.operators.DataSinkTaskTest.testCancelDataSinkTask(DataSinkTaskTest.java:397)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Redis connector for Flink

2015-11-17 Thread Matthias J. Sax
Hi,

I was just wondering if we should put a Redis connector for Flink on the
agenda. Or do we have a JIRA for this already? If not, just having a
JIRA might trigger some external contribution for it :)


-Matthias





Re: Redis connector for Flink

2015-11-17 Thread Matthias J. Sax
I was thinking about source, sink.

But why not as a state backend, too.
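
To make the sink idea a bit more concrete, here is a rough sketch of what a
minimal Redis sink function could look like (this uses the Jedis client; the
class name and the key/value layout are made up for illustration, it is not an
existing connector):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import redis.clients.jedis.Jedis;

// Placeholder sketch, not an existing connector: writes each (key, value)
// record as a Redis string via SET.
public class RedisTupleSink extends RichSinkFunction<Tuple2<String, String>> {

    private final String host;
    private final int port;
    private transient Jedis jedis;

    public RedisTupleSink(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public void open(Configuration parameters) {
        // One client per parallel sink instance.
        jedis = new Jedis(host, port);
    }

    @Override
    public void invoke(Tuple2<String, String> record) {
        jedis.set(record.f0, record.f1);
    }

    @Override
    public void close() {
        if (jedis != null) {
            jedis.close();
        }
    }
}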


On 11/17/2015 12:08 PM, Stephan Ewen wrote:
> Nice idea!
> 
> Do you mean Redis as a state backend, a source, or a sink?
> 
> On Tue, Nov 17, 2015 at 11:41 AM, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Hi,
>>
>> I was just wondering if we should put a Redis connector for Flink on the
>> agenda. Or do we have a JIRA for this already? If not, just having a
>> JIRA might trigger some external contribution for it :)
>>
>>
>> -Matthias
>>
>>
> 





[jira] [Created] (FLINK-3034) Redis Sink Connector

2015-11-17 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3034:
--

 Summary: Redis Sink Connector
 Key: FLINK-3034
 URL: https://issues.apache.org/jira/browse/FLINK-3034
 Project: Flink
  Issue Type: New Feature
  Components: Streaming Connectors
Reporter: Matthias J. Sax
Priority: Minor


Flink does not provide a sink connector for Redis.

See FLINK-3033



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3031) Consistent Shutdown of Streaming Jobs

2015-11-17 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3031:
--

 Summary: Consistent Shutdown of Streaming Jobs
 Key: FLINK-3031
 URL: https://issues.apache.org/jira/browse/FLINK-3031
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Matthias J. Sax


Depends on FLINK-2111

When a streaming job is shut down cleanly via "stop", a last consistent 
snapshot should be collected. This snapshot could be used to resume a job later 
on.

See mail archive for more details of the discussion: 
https://mail-archives.apache.org/mod_mbox/flink-dev/201511.mbox/%3CCA%2Bfaj9xDFAUG_zi%3D%3DE2H8s-8R4cn8ZBDON_hf%2B1Rud5pJqvZ4A%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3033) Redis Source Connector

2015-11-17 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3033:
--

 Summary: Redis Source Connector
 Key: FLINK-3033
 URL: https://issues.apache.org/jira/browse/FLINK-3033
 Project: Flink
  Issue Type: New Feature
  Components: Streaming Connectors
Reporter: Matthias J. Sax
Priority: Minor


Flink does not provide a source connector for Redis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3035) Redis as State Backend

2015-11-17 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3035:
--

 Summary: Redis as State Backend
 Key: FLINK-3035
 URL: https://issues.apache.org/jira/browse/FLINK-3035
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Reporter: Matthias J. Sax
Priority: Minor


Add Redis as a state backend for distributed snapshots.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Fwd: Re: Storm Compatibility

2015-11-14 Thread Matthias J. Sax
About the DataStream extension and setting the storm dependency to provided:
if this works, a big +1 from my side.

-Matthias


On 11/14/2015 05:13 PM, Matthias J. Sax wrote:
> I just had a look at your proposal. It makes a lot of sense. I still
> believe that it is a matter of taste if one prefers your or my point of
> view. Both approaches allow to easily reuse and execute Storm
> topologies on Flink (which is the most important feature we need to have).
> 
> I hope to get some more feedback from the community on whether the
> Storm compatibility should be more "stormy" or more "flinky". Both
> approaches make sense to me.
> 
> 
> A few minor comments:
> 
> * FileSpout vs FiniteFileSpout
>   -> FileSpout was implemented in a Storm way -- to set the "finished"
> flag here does not make sense from a Storm point of view (there is no
> such thing as a finite spout)
>   Thus, this example shows how a regular Storm spout can be improved
> using the FiniteSpout interface -- I would keep it as is (even if it seems
> unnecessarily complicated -- imagine that you don't have the code of
> FileSpout)
> 
> * You changed examples to use finite-spouts -- from a testing point of
> view this makes sense. However, the examples should show how to run an
> *unmodified* Storm topology in Flink.
> 
> * we should keep the local copy "unprocessedBolts" when creating a Flink
> program to allow to re-submit the same topology object twice (or alter
> it after submission). If you don't make the copy, submitting/translating
> the topology into a Flink job alters the object (which should not
> happen). And as it is not performance critical, the copying overhead
> does not matter.
> 
> * Why did you change the dop from 4 to 1 WordCountTopology ? We should
> test in parallel fashion...
> 
> * Too many reformatting changes ;) You touched many classes without any
> actual code changes.
> 
> 
> 
> 
> 
> 
> ---- Forwarded Message 
> Subject: Re: Storm Compatibility
> Date: Fri, 13 Nov 2015 12:15:19 +0100
> From: Maximilian Michels <m...@apache.org>
> To: Matthias J. Sax <mj...@apache.org>
> CC: Stephan Ewen <se...@apache.org>, Robert Metzger <rmetz...@apache.org>
> 
> Hi Matthias,
> 
> Thank you for your remarks.
> 
> I believe the goal of the compatibility layer should not be to mimic
> Storm's API but to easily execute Storm topologies using Flink. I see
> that it is easy for users to use class names for execution they know
> from Storm but I think this makes the API verbose. I've refactored it
> a bit to make it more aligned with Flink's execution model. After all,
> the most important thing is that it makes it easy for people to reuse
> Storm topologies while getting all the advantages of Flink.
> 
> Let me explain what I have done so far:
> https://github.com/apache/flink/compare/master...mxm:storm-dev
> 
> API
> - remove FlinkClient, FlinkSubmitter, FlinkLocalCluster,
> FlinkTopology: They are not necessary in my opinion and are
> replicating functionality already included in Flink or Storm.
> 
> - Build the topology with the Storm TopologyBuilder (instead of
> FlinkTopology) which is then passed to the FlinkTopologyBuilder which
> generates the StreamExecutionEnvironment containing the StreamGraph.
> You can then simply call execute() like you would usually do in Flink.
> This lets you reuse your Storm topologies with the ease of Flink's
> context-based execution mechanism. Note that it works in local and
> remote execution mode without changing any code.
> 
> Tests
> - replaced StormTestBase.java with StreamingTestBase
> - use a Finite source for the tests and changed it a bit
> 
> Examples
> - Convert examples to new API
> - Remove duplicate examples (local and remote)
> 
> I hope these changes are not too invasive for you. I think it makes
> the compatibility layer much easier to use. Let me know what you think
> about it. Of course, we can iterate on it.
> 
> About the integration of the compatibility layer into DataStream:
> Wouldn't it be possible to set storm to provided and let the user
> include the jar if he/she wants to use the Storm compatibility? That's
> also what we do for other libraries like Gelly. You have to package
> them into the JAR if you want to run them on the cluster. We should
> give a good error message if classes cannot be found.
> 
> +1 for moving the discussion to the dev list.
> 
> Cheers,
> Max
> 
> On Fri, Nov 13, 2015 at 7:41 AM, Matthias J. Sax <mj...@apache.org> wrote:
>> One more thing that just came to my mind about (1): I have to correct my
>> last reply on it:
>>
>> We **cannot reuse** TopologyBuilder because the returned StormTopology
>> from .createTopology() does **not** contain the references to the
>> Spout/Bolt object. Internally, those are already serialized into an
>> internal Thrift representation (as preparation to get sent to Nimbus).
>> However, in order to create a Flink job, we need the references of course...

Fwd: Re: Storm Compatibility

2015-11-14 Thread Matthias J. Sax
I just had a look at your proposal. It makes a lot of sense. I still
believe that it is a matter of taste if one prefers your or my point of
view. Both approaches allow to easily reuse and execute Storm
topologies on Flink (which is the most important feature we need to have).

I hope to get some more feedback from the community on whether the
Storm compatibility should be more "stormy" or more "flinky". Both
approaches make sense to me.


A few minor comments:

* FileSpout vs FiniteFileSpout
  -> FileSpout was implemented in a Storm way -- to set the "finished"
flag here does not make sense from a Storm point of view (there is no
such thing as a finite spout)
  Thus, this example shows how a regular Storm spout can be improved
using the FiniteSpout interface -- I would keep it as is (even if it seems
unnecessarily complicated -- imagine that you don't have the code of
FileSpout)

* You changed examples to use finite-spouts -- from a testing point of
view this makes sense. However, the examples should show how to run an
*unmodified* Storm topology in Flink.

* we should keep the local copy "unprocessedBolts" when creating a Flink
program to allow to re-submit the same topology object twice (or alter
it after submission). If you don't make the copy, submitting/translating
the topology into a Flink job alters the object (which should not
happen). And as it is not performance critical, the copying overhead
does not matter.

* Why did you change the dop from 4 to 1 WordCountTopology ? We should
test in parallel fashion...

* Too many reformatting changes ;) You touched many classes without any
actual code changes.






 Forwarded Message 
Subject: Re: Storm Compatibility
Date: Fri, 13 Nov 2015 12:15:19 +0100
From: Maximilian Michels <m...@apache.org>
To: Matthias J. Sax <mj...@apache.org>
CC: Stephan Ewen <se...@apache.org>, Robert Metzger <rmetz...@apache.org>

Hi Matthias,

Thank you for your remarks.

I believe the goal of the compatibility layer should not be to mimic
Storm's API but to easily execute Storm topologies using Flink. I see
that it is easy for users to use class names for execution they know
from Storm but I think this makes the API verbose. I've refactored it
a bit to make it more aligned with Flink's execution model. After all,
the most important thing is that it makes it easy for people to reuse
Storm topologies while getting all the advantages of Flink.

Let me explain what I have done so far:
https://github.com/apache/flink/compare/master...mxm:storm-dev

API
- remove FlinkClient, FlinkSubmitter, FlinkLocalCluster,
FlinkTopology: They are not necessary in my opinion and are
replicating functionality already included in Flink or Storm.

- Build the topology with the Storm TopologyBuilder (instead of
FlinkTopology) which is then passed to the FlinkTopologyBuilder which
generates the StreamExecutionEnvironment containing the StreamGraph.
You can then simply call execute() like you would usually do in Flink.
This lets you reuse your Storm topologies with the ease of Flink's
context-based execution mechanism. Note that it works in local and
remote execution mode without changing any code.

Tests
- replaced StormTestBase.java with StreamingTestBase
- use a Finite source for the tests and changed it a bit

Examples
- Convert examples to new API
- Remove duplicate examples (local and remote)

I hope these changes are not too invasive for you. I think it makes
the compatibility layer much easier to use. Let me know what you think
about it. Of course, we can iterate on it.

About the integration of the compatibility layer into DataStream:
Wouldn't it be possible to set storm to provided and let the user
include the jar if he/she wants to use the Storm compatibility? That's
also what we do for other libraries like Gelly. You have to package
them into the JAR if you want to run them on the cluster. We should
give a good error message if classes cannot be found.

+1 for moving the discussion to the dev list.

Cheers,
Max

On Fri, Nov 13, 2015 at 7:41 AM, Matthias J. Sax <mj...@apache.org> wrote:
> One more thing that just came to my mind about (1): I have to correct my
> last reply on it:
>
> We **cannot reuse** TopologyBuilder because the returned StormTopology
> from .createTopology() does **not** contain the references to the
> Spout/Bolt object. Internally, those are already serialized into an
> internal Thrift representation (as preparation to get sent to Nimbus).
> However, in order to create a Flink job, we need the references of course...
>
> -Matthias
>
>
> On 11/11/2015 04:33 PM, Maximilian Michels wrote:
>> Hi Matthias,
>>
>> Sorry for getting back to you late. I'm very new to Storm but have
>> familiarized myself a bit the last days. While looking through the
>> Storm examples and the compatibility layer I discovered the following

Re: [DISCUSSION] Consistent shutdown of streaming jobs

2015-11-13 Thread Matthias J. Sax
I was thinking about this issue too and wanted to include it in my
current PR (I am just about to rebase it to the current master...
https://github.com/apache/flink/pull/750).

Or should we open a new JIRA for it and address it after the Stop signal is
available?


-Matthias

On 11/12/2015 11:47 AM, Maximilian Michels wrote:
> +1 for the proposed changes. But why not always create a snapshot on
> shutdown? Does that break any assumptions in the checkpointing
> interval? I see that if the user has checkpointing disabled, we can
> just create a fake snapshot.
> 
> On Thu, Nov 12, 2015 at 9:56 AM, Gyula Fóra  wrote:
>> Yes, I agree with you.
>>
>> Once we have the graceful shutdown we can make this happen fairly simply
>> with the mechanism you described :)
>>
>> Gyula
>>
>> Stephan Ewen  ezt írta (időpont: 2015. nov. 11., Sze,
>> 15:43):
>>
>>> I think you are touching on something important here.
>>>
>>> There is a discussion/PullRequest about graceful shutdown of streaming jobs
>>> (like stop
>>> the sources and let the remainder of the streams run out).
>>>
>>> With the work in progress to draw external checkpoints, it should be easy to do
>>> checkpoint-and-close.
>>> We may not even need the last ack in the "checkpoint -> ack -> notify ->
>>> ack" sequence, when the
>>> operators simply wait for the "notifyComplete" function to finish. Then,
>>> the operators finish naturally
>>> only successfully when the "notifyComplete()" method succeeds, otherwise
>>> they go to the state "failed".
>>> That is good, because we need no extra mechanism (extra message type).
>>>
>>> What we do need anyways is a way to detect when the checkpoint did not
>>> globally succeed, so that the
>>> functions where it succeeded do not wait forever for the "notifySuccessful"
>>> message.
>>>
>>> We have two things here now:
>>>
>>> 1) Graceful shutdown should trigger an "internal" checkpoint (which is
>>> immediately discarded), in order to commit
>>> pending data for cases where data is staged between checkpoints.
>>>
>>> 2) An option to shut down with external checkpoint would also be important,
>>> to stop and resume from exactly there.
>>>
>>>
>>> Stephan
>>>
>>>
>>> On Wed, Nov 11, 2015 at 3:19 PM, Gyula Fóra  wrote:
>>>
 Hey guys,

 With recent discussions around being able to shutdown and restart
>>> streaming
 jobs from specific checkpoints, there is another issue that I think needs
 tackling.

 As far as I understand when a streaming job finishes the tasks are not
 notified for the last checkpoints and also jobs don't take a final
 checkpoint before shutting down.

 In my opinion this might lead to situations when the user cannot tell
 whether the job finished properly (with consistent states/ outputs) etc.
>>> To
 give you a concrete example, let's say I am using the RollingSink to
 produce exactly once output files. If the job finishes I think there will
 be some files that remain in the pending state and are never completed.
>>> The
 user then sees some complete files, and some pending files for the
 completed job. The question is then, how do I tell whether the pending
 files were actually completed properly now that the job is finished.

 Another example would be that I want to manually shut down my job at
>>> 12:00
 and make sure that I produce every output up to that point. Is there any
 way to achieve this currently?

 I think we need to do 2 things to make this work:
 1. Job shutdowns (finish/manual) should trigger a final checkpoint
 2. These final checkpoints should actually be 2 phase checkpoints:
 checkpoint -> ack -> notify -> ack; then, when the checkpoint coordinator
 gets all the notification acks, it can tell the user that the system shut
 down cleanly.

 Unfortunately it can happen that for some reason the coordinator does not
 receive all the acks for a complete job, in that case it can warn the
>>> user
 that the checkpoint might be inconsistent.

 Let me know what you think!

 Cheers,
 Gyula

>>>
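
For illustration only -- a toy model of the acknowledgement bookkeeping Gyula
describes in step 2 of his numbered list above (none of these class or method
names exist in Flink): the coordinator first collects the checkpoint acks, then
sends the notification, and only reports a clean shutdown once all notification
acks have arrived as well.

import java.util.HashSet;
import java.util.Set;

public class FinalCheckpointTracker {

    private final Set<Integer> pendingCheckpointAcks = new HashSet<>();
    private final Set<Integer> pendingNotifyAcks = new HashSet<>();

    public FinalCheckpointTracker(Set<Integer> taskIds) {
        pendingCheckpointAcks.addAll(taskIds);
        pendingNotifyAcks.addAll(taskIds);
    }

    /** Called when a task acknowledges the final checkpoint.
     *  Returns true once all tasks have acked and "notify" can be sent. */
    public boolean ackCheckpoint(int taskId) {
        pendingCheckpointAcks.remove(taskId);
        return pendingCheckpointAcks.isEmpty();
    }

    /** Called when a task acknowledges the notification.
     *  Returns true once the shutdown can be reported as clean. */
    public boolean ackNotification(int taskId) {
        pendingNotifyAcks.remove(taskId);
        return pendingCheckpointAcks.isEmpty() && pendingNotifyAcks.isEmpty();
    }
}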





Python Examples

2015-11-10 Thread Matthias J. Sax
Hi,

Slim recently tweeted about Will McGinnis' Python examples for Flink.

https://github.com/wdm0006/flink-python-examples

I think it would be nice to add them to the third-party section on the web
page. Does anyone have any objections? If not, I will add them in
the next days.


-Matthias





Re: Flink deployment fabric script

2015-11-09 Thread Matthias J. Sax
Hi Do,

thanks for your interest in Flink. It is great that you want to
contribute to the system.

Right now, I am not sure how your script could be integrated into Flink.
As a reference, please read the following guidelines:

https://flink.apache.org/how-to-contribute.html
https://flink.apache.org/contribute-code.html

Let's drive a discussion about it first. Can you describe what
advantages (differences/new features) your script offers compared to the
already provided startup scripts?


-Matthias


On 11/08/2015 07:52 PM, Le Quoc Do wrote:
> Hi Flinkers,
> 
> I'm starting to work with Flink and I would like to contribute to it.
> However, I'm a very new Flinker, so the first thing I could contribute
> is a one-click
> style deployment script to deploy Flink, Spark and Hadoop Yarn on cluster
> and cloud computing environments (OpenStack based Cloud). It's available
> here: https://bitbucket.org/lequocdo/flink-setup.
> I added Spark to the script, since there are many people who want to compare
> performance between Spark and Flink.
> I have tested the script with our cluster and a OpenStack based cloud.
> Please let me know if this contribution makes sense or not. Your
> feedback and comments will be greatly appreciated.
> 
> Thank you,
> Do
> 




