Re: [Discuss] Beam mascot

2019-11-09 Thread Maximilian Michels
I like that sketch! The designer has also sent me some rough sketches, 
I'll share these here when I get consent from the designer.


-Max

On 09.11.19 19:22, Alex Van Boxel wrote:

+1 for a FireFly. Ok, I can't draw, but it's to make a point ;-)

Fire2.jpg



  _/
_/ Alex Van Boxel


On Sat, Nov 9, 2019 at 12:26 AM Kyle Weaver wrote:


Re fish: The authors of Streaming Systems went with trout, but
the book mentioned a missed opportunity to make their cover a "robot
dinosaur with a Scottish accent." Perhaps that idea is worth revisiting?

On Fri, Nov 8, 2019 at 3:20 PM Luke Cwik <lc...@google.com> wrote:

My top suggestion is a cuttlefish.

On Thu, Nov 7, 2019 at 10:28 PM Reza Rokni <r...@google.com> wrote:

Salmon... they love streams? :-)

On Fri, 8 Nov 2019 at 12:00, Kenneth Knowles <k...@apache.org> wrote:

Agree with Aizhamal that it doesn't matter if they are
taken if they are not too close in space to Beam: Apache
projects, big data, log processing, stream processing.
Not a legal opinion, but an aesthetic opinion. So I
would keep Lemur as a possibility. Definitely nginx is
far away from Beam so it seems OK as long as the art is
different.

Also FWIW there are many kinds of Lemurs, and also
related Tarsier, of the only uncontroversial and
non-extinct infraorder within suborder Strepsirrhini. I
think there's enough room for another mascot with big
glowing eyes :-). The difference in the designer's art
will be more significant than the taxonomy.

Kenn

On Tue, Nov 5, 2019 at 4:37 PM Aizhamal Nurmamat kyzy
<aizha...@apache.org> wrote:

Aww.. that Hoover beaver is cute. But then lemur is
also "taken" [1] and the owl too [2].

Personally, I don't think it matters much which
mascots are taken, as long as the project is not too
close in the same space as Beam. Also, it's good to
just get all ideas out. We should still consider
hedgehogs. I looked up fireflies; they don't look
nice, but I am not dismissing the idea :/

And thanks for reaching out to designers, Max. To
your point:
 >how do we arrive at a concrete design
 >once we have consensus on the type of mascot?
My thinking is that the designer will come up with
a few sketches, then we vote on one here on the dev@ list.

[1] https://www.nginx.com/blog/introducing-the-lemur-stack-and-an-official-nginx-mascot/
[2] https://blog.readme.io/why-every-startup-needs-a-mascot/

On Tue, Nov 5, 2019 at 5:31 AM Maximilian Michels <m...@apache.org> wrote:

Quick update: The mentioned designer has gotten back to me and
offered to sketch something by the end of the week. I've pointed
him to this thread and the existing logo material:
https://beam.apache.org/community/logos/

[I don't want to interrupt the discussion in any way, I just think
having something concrete will help us to eventually decide what
we want.]

On 05.11.19 12:49, Maximilian Michels wrote:
 > How about fireflies in the Beam light rays? ;)
 >
 >> Feels like "Beam" would go well with an animal that has glowing
 >> bright eyes such as a lemur
 >
 > I love the lemur idea because it has almost orange eyes.
 >
 > Thanks for starting this Aizhamal! I've recently talked to a
 > designer who is somewhat famous for creating logos. He was inclined
 > to work on a software project logo. Of course there is a little bit
 > of a price tag attached, though the quote sounded reasonable.
 >
 > It raises the general question, how do we arrive at a concrete
 > design once we have consensus on the type of 

Re: New Contributor

2019-11-09 Thread Luke Cwik
Welcome, I have added you as a contributor.

On Fri, Nov 8, 2019 at 3:14 PM Yang Zhang wrote:

> Hello Beam community,
>
> This is Yang from LinkedIn. I am working closely with Xinyu on adopting
> Beam SQL at LinkedIn. Can someone add me as a contributor for Beam's Jira
> issue tracker? I would like to create/assign tickets for my work. My Jira
> ID is *yangzhang*. Thanks!
>
> Best,
> Yang
>


Re: [VOTE] @RequiresTimeSortedInput stateful DoFn annotation

2019-11-09 Thread Jan Lukavský

Hi,

I'll try to summarize the mailing list threads to clarify why I think 
this addition is needed (and actually necessary):


 a) there are situations where the order of input events matters 
(obviously any finite state machine)

 b) in the streaming case, this can be handled by the current machinery 
(e.g. holding elements in state, sorting all elements with timestamp 
less than the input watermark, dropping latecomers)

 c) in the batch case, this can be handled the same way, but

  i) due to the nature of batch processing, that puts extreme 
requirements on the size of state needed to hold the elements (in the 
extreme, that might be the whole input, which might not be feasible)

  ii) although it is true that the watermark might (and will) fall 
behind in streaming processing as well, so that similar issues might 
arise there too, it is hardly imaginable that it will fall behind by as 
much as several years (but that is absolutely natural in the batch case) 
- I'm talking about regular streaming processing, not some kappa-like 
architectures, where this happens as well, but it causes trouble ([1])

  iii) given the fact that some runners already use sort-merge 
groupings, it is virtually free to also sort elements inside groups by 
timestamp; the runner just has to know that it should do so
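
To make point b) concrete, here is a minimal sketch in plain Python (not the Beam API) of the "hold in state, sort, release below the watermark, drop latecomers" approach. The class name `TimeSortedBuffer` and its methods are hypothetical and exist only for this illustration; a real stateful DoFn would use Beam state and timers instead of an in-memory heap.

```python
import heapq


class TimeSortedBuffer:
    """Hypothetical helper: buffers (timestamp, value) pairs and releases
    them in timestamp order once the watermark has passed them."""

    def __init__(self, on_element):
        self._heap = []                      # min-heap ordered by timestamp
        self._seq = 0                        # tie-breaker keeps arrival order
        self._watermark = float("-inf")
        self._on_element = on_element        # callback that needs sorted input
        self.dropped = []                    # latecomers end up here

    def add(self, timestamp, value):
        # An element older than the watermark is late: its slot in the
        # ordered output has already been passed, so it is dropped.
        if timestamp < self._watermark:
            self.dropped.append((timestamp, value))
            return
        heapq.heappush(self._heap, (timestamp, self._seq, value))
        self._seq += 1

    def advance_watermark(self, watermark):
        # Watermarks only move forward.
        self._watermark = max(self._watermark, watermark)
        # Release everything with timestamp strictly below the watermark.
        while self._heap and self._heap[0][0] < self._watermark:
            ts, _, value = heapq.heappop(self._heap)
            self._on_element(ts, value)


seen = []
buf = TimeSortedBuffer(lambda ts, v: seen.append((ts, v)))
buf.add(5, "c")
buf.add(1, "a")
buf.add(3, "b")
buf.advance_watermark(4)    # releases timestamps 1 and 3, in order
buf.add(2, "late")          # 2 < 4, so this one is dropped
buf.advance_watermark(10)   # releases timestamp 5
```

Note that the heap holds every element until the watermark passes it. In batch, where the watermark may jump from -inf to +inf in one step, that is the whole input for a key, which is the problem described in c) i).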


I don't want to go too far into details to keep this focused, but the 
fact that the runner would know that it should sort by timestamp before 
a stateful pardo brings additional features that are currently 
unavailable - e.g. actually shifting event time smoothly as elements flow 
through, not from -inf to +inf in one shot. That might have a positive 
effect on timers being fired smoothly and thus, for instance, make it 
possible to free some state that would otherwise have to be held until 
the end of the computation.


Therefore, I think it is essential for users to be able to tell the 
runner that a particular stateful pardo depends on the order of input 
events, so that the runner can use optimizations available in the batch 
case. The streaming case is mostly unaffected by that, because all the 
sorting can be handled the usual way.
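
As a rough illustration of the batch optimization from c) iii), the following plain Python sketch (again not the Beam API) sorts by (key, timestamp) in a single pass, the way a sort-merge grouping could, and then feeds each group to an order-dependent step function. The function name `process_time_sorted` and the sample records are made up for this example.

```python
from itertools import groupby
from operator import itemgetter


def process_time_sorted(records, step, initial_state):
    """Hypothetical batch driver.

    records: iterable of (key, timestamp, value) tuples.
    step:    order-dependent function (state, value) -> new state.
    Returns a dict mapping each key to its final state.
    """
    # One sort on the composite (key, timestamp) groups the records by key
    # and orders each group by timestamp at the same time.
    ordered = sorted(records, key=itemgetter(0, 1))
    result = {}
    for key, group in groupby(ordered, key=itemgetter(0)):
        state = initial_state
        for _, _, value in group:
            # Elements arrive already time-sorted, so no per-key buffer
            # is needed here.
            state = step(state, value)
        result[key] = state
    return result


records = [
    ("u1", 3, "c"),
    ("u2", 1, "x"),
    ("u1", 1, "a"),
    ("u1", 2, "b"),
    ("u2", 2, "y"),
]
final = process_time_sorted(records, lambda s, v: s + v, "")
```

With string concatenation as the step, the final state per key reflects event-time order regardless of the order of `records`. The point is only that the ordering comes out of the grouping sort itself rather than from state held inside the stateful operation.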


Hope this helps to clarify why it would be good to introduce (some way) 
to mark stateful pardos as "time sorted".


Cheers,

 Jan

[1] https://www.ververica.com/resources/flink-forward-san-francisco-2019/moving-from-lambda-and-kappa-architectures-to-kappa-at-uber


Hope these thoughts help

On 11/8/19 11:35 AM, Jan Lukavský wrote:

Hi Max,

thanks for the comment. I probably should have put links to the 
discussion threads here in the vote thread. Relevant would be


 - (a pretty lengthy) discussion about whether sorting by timestamp 
should be part of the model - [1]


 - part of the discussion related to the annotation - [2]

Regarding the open question in the design document - these are not 
meant to be open questions in regard to the design of the annotation 
and I'll remove that for now, as it is not (directly) related.


Now - the main reason for this vote is that there is actually not a 
clear consensus in the ML thread. There are plenty of words like 
"should", "could", "would" and "maybe", so I wanted to be sure there is 
consensus to include this. I have already been running this in 
production for several months, so it is definitely useful for me. :-) 
But that might not be sufficient.


I'd be very happy to answer any more questions.

Thanks,

 Jan

[1] https://lists.apache.org/thread.html/4609a1bb1662690d67950e76d2f1108b51327b8feaf9580de659552e@%3Cdev.beam.apache.org%3E

[2] https://lists.apache.org/thread.html/dd9bec903102d9fcb4f390dc01513c0921eac1fedd8bcfdac630aaee@%3Cdev.beam.apache.org%3E


On 11/8/19 11:08 AM, Maximilian Michels wrote:

Hi Jan,

Disclaimer: I haven't followed the discussion closely, so I do not 
want to comment on the technical details of the feature here.


From the outside, it looks like there may be open questions. Also, we 
may need more motivation for what we can build with this feature or 
how it will become useful to users.


There are many threads in Beam and I believe we need to carefully 
prioritize the Beam feature set in order to focus on the things that 
provide the most value to our users.


Cheers,
Max

On 07.11.19 15:55, Jan Lukavský wrote:

Hi,
is there anything I can do to make this more attractive? :-) Any 
feedback would be much appreciated.

Many thanks,
  Jan

On 5. 11. 2019 14:10, Jan Lukavský wrote:

    Hi,

    I'd like to open a vote on accepting design document [1] as a base
    for the implementation of the @RequiresTimeSortedInput annotation
    for stateful DoFns. The associated JIRA [2] and PR [3] contain only
    a subset of the whole functionality (allowed lateness is ignored and
    there is no possibility to specify a UDF for time - or a sequential
    number - to be extracted from data). The PR will be subject to an
    independent review process (please feel free to self-request review
    if you are interested in this) once the vote succeeds.