Re: CEP API: Question on FollowedBy

2016-04-05 Thread Anwar Rizal
Thanks Till.

The only way to change the behavior, then, would be to post-filter the
results.

Anwar.
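
For illustration, a minimal sketch of what such a post-filter could look like. This is not from the thread: the Match type, its timestamp fields, and the "drop overlapping matches" rule are illustrative assumptions, and in a parallel job the bookkeeping would have to be scoped per key via keyBy().

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

// Hypothetical post-filter: emit a match only if its first event has not already been
// consumed by a previously emitted match. State is a plain field here; use Flink managed
// state if it must survive failures.
public class DropOverlappingMatches
        implements FlatMapFunction<DropOverlappingMatches.Match, DropOverlappingMatches.Match> {

    // Hypothetical match type: just the timestamps of the two matched temperature events.
    public static class Match {
        public long firstEventTs;
        public long secondEventTs;
        public Match() {}
        public Match(long firstEventTs, long secondEventTs) {
            this.firstEventTs = firstEventTs;
            this.secondEventTs = secondEventTs;
        }
    }

    // timestamp of the last event that was part of an emitted match
    private long lastConsumedTs = Long.MIN_VALUE;

    @Override
    public void flatMap(Match match, Collector<Match> out) {
        if (match.firstEventTs > lastConsumedTs) {
            lastConsumedTs = match.secondEventTs;  // remember how far we have consumed
            out.collect(match);
        }
        // otherwise the match reuses an already consumed event and is dropped
    }
}

Whether this greedy, non-overlapping policy is the right one depends on the use case; it is only one of several possible post-filters.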

On Tue, Apr 5, 2016 at 11:41 AM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Anwar,
>
> yes, once we have published the introductory blog post about the CEP
> library, we will also publish a more in-depth description of the approach
> we have implemented. To spoil it a little bit: We have mainly followed the
> paper “Efficient Pattern Matching over Event Streams” for the
> implementation.
>
> Concerning your questions:
>
> 1.) followedBy means that there can be an arbitrary sequence of events
> between two matching events, as long as they occur in the specified time
> interval. Thus, TemperatureEvent(40) will match together with
> TemperatureEvent(50), TemperatureEvent(70), TemperatureEvent(65) and
> TemperatureEvent(60).
>
> 2.) The same applies here for the next operator. It says that the second
> matching event has to follow directly after the previous matched event.
> However, a matching event can be part of multiple matching sequences. Thus,
> you will get TemperatureEvent(70), TemperatureEvent(65) and
> TemperatureEvent(65), TemperatureEvent(60) as two distinct matching sequences.
>
> To make a long story short, there is currently no option to change the
> sequence semantics so that events are only part of one matching sequence.
> At the moment, events can participate in multiple matching sequences. I
> hope that answers your question Anwar.
>
> Cheers,
> Till
>
> On Mon, Apr 4, 2016 at 11:18 AM, Anwar Rizal <anriza...@gmail.com> wrote:
>
>> Hi All,
>>
>>
>> I saw Till's blog preparation. It will be a very helpful blog. I hope
>> that some other blogs that explain how it works will come soon :-)
>>
>> I have a question on followedBy pattern matching semantic.
>>
>>
>> From the documentation
>> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html
>> ,
>>
>> 
>>
>> Non-strict contiguity means that other events are allowed to occur
>> in-between two matching events. A non-strict contiguity pattern state can
>> be created via the followedBy method.
>>
>> Pattern<Event, ?> nonStrictNext = start.followedBy("middle");
>>
>>
>> 
>>
>> I tried to use Till's examples from the blog to understand the semantics
>> of followedBy.
>>
>> --
>> First question.
>> Say I have a sequence of temperatures in a time window that corresponds to
>> the within clause (say, 20 minutes).
>>
>> TemperatureEvent(40) , OtherEvent(...), TemperatureEvent(30),
>> TemperatureEvent(50), OtherEvent(...), TemperatureEvent(70),
>> TemperatureEvent(65), TemperatureEvent(60)
>>
>> Say I want to match two TemperatureEvents whose temperatures are > 35.
>>
>> What will the matches be in this case?
>>
>>
>>   - Will TemperatureEvent(40), TemperatureEvent(50) match? (We have
>>   TemperatureEvent(30) at time 3, which does not match the condition.)
>>   - Will TemperatureEvent(40), TemperatureEvent(70) match? (That pair also
>>   matches the pattern specification; the difference is that
>>   TemperatureEvent(50) happened before TemperatureEvent(70).) The same
>>   question applies to the pairs TemperatureEvent(40)-TemperatureEvent(65),
>>   TemperatureEvent(50)-TemperatureEvent(65), etc.
>>
>>
>>
>> --
>> Second question.
>> For next (and also for followedBy), I also have questions regarding the
>> example above:
>> Will TemperatureEvent(70), TemperatureEvent(65) and TemperatureEvent(65),
>> TemperatureEvent(60) both be returned, or is the second pair no longer
>> returned because TemperatureEvent(65) has already been used in the first pair?
>>
>>
>>
>> Is there a way to define different sequence semantics for the two
>> questions I asked above?
>>
>>
>> Thanks,
>> Anwar.
>>
>
>


CEP API: Question on FollowedBy

2016-04-04 Thread Anwar Rizal
Hi All,


I saw Till's blog preparation. It will be a very helpful blog. I hope that
some other blogs that explain how it works will come soon :-)

I have a question on followedBy pattern matching semantic.


From the documentation
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html
,



Non-strict contiguity means that other events are allowed to occur
in-between two matching events. A non-strict contiguity pattern state can
be created via the followedBy method.

Pattern<Event, ?> nonStrictNext = start.followedBy("middle");




I tried to use Till's examples from the blog to understand the semantics of
followedBy.

--
First question.
Say I have a sequence of temperatures in a time window that corresponds to
the within clause (say, 20 minutes).

TemperatureEvent(40) , OtherEvent(...), TemperatureEvent(30),
TemperatureEvent(50), OtherEvent(...), TemperatureEvent(70),
TemperatureEvent(65), TemperatureEvent(60)

Say I want to match two TemperatureEvents whose temperatures are > 35.

What will the matches be in this case?


   - Will TemperatureEvent(40), TemperatureEvent(50) match? (We have
   TemperatureEvent(30) at time 3, which does not match the condition.)
   - Will TemperatureEvent(40), TemperatureEvent(70) match? (That pair also
   matches the pattern specification; the difference is that
   TemperatureEvent(50) happened before TemperatureEvent(70).) The same
   question applies to the pairs TemperatureEvent(40)-TemperatureEvent(65),
   TemperatureEvent(50)-TemperatureEvent(65), etc.



--
Second question.
For next (and also for followedBy), I also have questions regarding the
example above:
Will TemperatureEvent(70), TemperatureEvent(65) and TemperatureEvent(65),
TemperatureEvent(60) both be returned, or is the second pair no longer
returned because TemperatureEvent(65) has already been used in the first pair?



Is there a way to define different sequence semantics for the two questions
I asked above?


Thanks,
Anwar.
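
For reference, a minimal sketch of the pattern discussed here, written against the early (1.0-era) CEP API where select() receives one event per pattern state (later versions changed these signatures). The TemperatureEvent POJO below is only a stand-in for the class used in Till's blog example.

import java.util.Map;

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class FollowedBySketch {

    // stand-in for the TemperatureEvent from the blog example
    public static class TemperatureEvent {
        public double temperature;
        public TemperatureEvent() {}
        public TemperatureEvent(double temperature) { this.temperature = temperature; }
        public double getTemperature() { return temperature; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<TemperatureEvent> temperatures = env.fromElements(
                new TemperatureEvent(40), new TemperatureEvent(30), new TemperatureEvent(50),
                new TemperatureEvent(70), new TemperatureEvent(65), new TemperatureEvent(60));

        // two readings > 35, arbitrary events allowed in between (followedBy), within 20 minutes
        Pattern<TemperatureEvent, ?> pattern = Pattern.<TemperatureEvent>begin("first")
                .where(evt -> evt.getTemperature() > 35)
                .followedBy("second")
                .where(evt -> evt.getTemperature() > 35)
                .within(Time.minutes(20));

        PatternStream<TemperatureEvent> matches = CEP.pattern(temperatures, pattern);

        matches.select(new PatternSelectFunction<TemperatureEvent, String>() {
            @Override
            public String select(Map<String, TemperatureEvent> match) {
                return match.get("first").getTemperature() + " -> "
                        + match.get("second").getTemperature();
            }
        }).print();

        env.execute("followedBy sketch");
    }
}

As Till describes above, this emits every qualifying combination (40 -> 50, 40 -> 70, 50 -> 70, ...), which is why a post-filter is needed if an event may only participate in one match.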


Re: Where does Flink store intermediate state and triggering/scheduling a delayed event?

2016-02-03 Thread Anwar Rizal
Allow me to jump to this very interesting discussion.

The 2nd point is actually an interesting question.

I understand that we can set the timestamp of an event in Flink. What if we
set the timestamp to somewhere in the future, for example 24 hours from now?
Can Flink handle this case?


Also, I'm still unclear on whether windowing can be backed by a backend like
RocksDB. Concretely, can we have a time window of 24 hours while the
throughput is 100 TPS?

Anwar.
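
For illustration, a minimal sketch of a 24-hour keyed window whose state lives in RocksDB rather than on the JVM heap. This assumes a Flink version in which the RocksDB backend from PR #1562 is available; the input format, host/port, and checkpoint path are made up.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class DailyWindowWithRocksDb {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // keep operator/window state in RocksDB (spills to disk) instead of the JVM heap
        env.setStateBackend(new RocksDBStateBackend("file:///tmp/flink-checkpoints"));
        env.enableCheckpointing(60 * 1000);

        env.socketTextStream("localhost", 9999)
           // assumed input format: "sensorId,value"
           .map(new MapFunction<String, Tuple2<String, Long>>() {
               @Override
               public Tuple2<String, Long> map(String line) {
                   String[] fields = line.split(",");
                   return new Tuple2<>(fields[0], Long.parseLong(fields[1]));
               }
           })
           .keyBy(0)
           .timeWindow(Time.hours(24))
           // incremental aggregation: one running value per key, not 24 hours of raw events
           .reduce(new ReduceFunction<Tuple2<String, Long>>() {
               @Override
               public Tuple2<String, Long> reduce(Tuple2<String, Long> a, Tuple2<String, Long> b) {
                   return new Tuple2<>(a.f0, a.f1 + b.f1);
               }
           })
           .print();

        env.execute("24h window with RocksDB state backend");
    }
}

At 100 events per second, the reduce keeps only one aggregate per key in state, so even a 24-hour window stays small; without incremental aggregation (e.g., a plain apply()), all raw events of the day would be buffered in the backend.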

On Wed, Feb 3, 2016 at 10:12 AM, Fabian Hueske  wrote:

> Hi,
>
> 1) At the moment, state is kept on the JVM heap in a regular HashMap.
>
> However, we added an interface for pluggable state backends. State
> backends store the operator state (Flink's built-in window operators are
> based on operator state as well). A pull request to add a RocksDB backend
> (going to disk) will be merged soon [1]. Another backend using Flink's
> managed memory is planned.
>
> 2) I am not sure what you mean by trigger / schedule a delayed event, but
> I have a few pointers that might be helpful:
> - Flink can handle late arriving events. Check the event-time feature [2].
> - Flink's window triggers can be used to schedule window computations [3]
> - You can implement a custom source function that emits / triggers events.
>
> Best, Fabian
>
> [1] https://github.com/apache/flink/pull/1562
> [2]
> http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
> [3] http://flink.apache.org/news/2015/12/04/Introducing-windows.html
>
> 2016-02-03 5:39 GMT+01:00 Soumya Simanta :
>
>> I'm getting started with Flink and had a very fundamental doubt.
>>
>> 1) Where does Flink capture/store intermediate state?
>>
>> For example, two streams of data have a common key. The streams can lag
>> in time (second, hours or even days). My understanding is that Flink
>> somehow needs to store the data from the first (faster) stream so that it
>> can match and join the data with the second(slower) stream.
>>
>> 2) Is there a mechanism to trigger/schedule a delayed event in Flink?
>>
>> Thanks
>> -Soumya
>>
>>
>>
>>
>


Re: Doubt about window and count trigger

2015-11-27 Thread Anwar Rizal
Thanks Fabian and Aljoscha,

I tried to implement the trigger as you described, as follows:
https://gist.github.com/anonymous/d0578a4d27768a75bea4
<https://gist.github.com/anonymous/d0578a4d27768a75bea4>

It works fine, indeed.

Thanks,
Anwar
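
For reference, a sketch of such a count-or-time trigger, written against the later Flink 1.x Trigger API (the 0.10 signatures differ slightly, and the anonymous gist above may look different). It fires when either maxCount elements have arrived or the event-time window ends, whichever comes first.

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class CountOrTimeTrigger extends Trigger<Object, TimeWindow> {

    private final long maxCount;

    private final ReducingStateDescriptor<Long> countDesc =
            new ReducingStateDescriptor<>("count", new Sum(), LongSerializer.INSTANCE);

    public CountOrTimeTrigger(long maxCount) {
        this.maxCount = maxCount;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window,
                                   TriggerContext ctx) throws Exception {
        // registering the end-of-window timer on every element is fine; it is overwritten
        ctx.registerEventTimeTimer(window.maxTimestamp());

        ReducingState<Long> count = ctx.getPartitionedState(countDesc);
        count.add(1L);
        if (count.get() >= maxCount) {
            count.clear();
            return TriggerResult.FIRE_AND_PURGE;  // early firing on count
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        // firing on the end-of-window timer
        return time == window.maxTimestamp() ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;  // not using processing-time timers here
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
        ctx.getPartitionedState(countDesc).clear();
    }

    private static class Sum implements ReduceFunction<Long> {
        @Override
        public Long reduce(Long a, Long b) {
            return a + b;
        }
    }
}

It would then replace the default trigger, e.g. keyedStream.window(TumblingEventTimeWindows.of(Time.seconds(5))).trigger(new CountOrTimeTrigger(100)) in the 1.x API.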


On Fri, Nov 27, 2015 at 11:49 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi Anwar,
> what Fabian wrote is completely right. I just want to give the reasoning
> for why the CountTrigger behaves as it does. The idea was to have Triggers
> that clearly focus on one thing and then at some point add combination
> triggers. For example, an OrTrigger that triggers if either of it’s sub
> triggers triggers, or an AndTrigger that triggers after both its sub
> triggers fire. (There is also more complex stuff that could be thought of
> here.)
>
> Cheers,
> Aljoscha
> > On 27 Nov 2015, at 09:59, fhue...@gmail.com wrote:
> >
> >
> > Hi,
> >
> > a regular tumbling time window of 5 seconds gets all elements within
> that period of time (semantics of time varies for processing, ingestion,
> and event time modes) and triggers the execution after 5 seconds.
> >
> > If you define a custom trigger, the assignment policy remains the same,
> but the trigger condition is overwritten (it is NOT additional but replaces
> the default condition), i.e., in your implementation, it will only trigger
> when 100 elements arrived. In order to trigger also when the window time
> expires, you have to register a timer (processing time or event time timer)
> via the trigger context.
> > NOTE: The window assigner will continue to assign elements to the
> window, even if the window was already evaluated. If you PURGE the window
> and an element arrives after that, a new window is created.
> >
> > To implement your trigger, you have to register a timer in the onEvent()
> method with:
> > ctx.registerEventTimeTimer(window.getEnd)
> > You can do that in every onEvent() call, because the timer is always
> overwritten.
> >
> > NOTE: you should use Flink’s keyed-state (access via triggerContext) if
> you want to keep state such as the current count.
> >
> > Hope this helps. Please let me know if you have further questions.
> > Fabian
> >
> >
> >
> >
> > From: Matthias J. Sax
> > Sent: Friday, November 27, 2015 08:44
> > To: user@flink.apache.org
> > Subject: Re: Doubt about window and count trigger
> >
> >
> > Hi,
> >
> > a Trigger is an *additional* condition for intermediate (early)
> > evaluation of the window. Thus, it is not "or-ed" to the basic window
> > definition.
> >
> > If you want to have an or-ed window condition, you can customize it by
> > specifying your own window definition.
> >
> > > dataStream.window(new MyOwnWindow()); // MyOwnWindow extends
> WindowAssigner; put your code there
> >
> > -Matthias
> >
> >
> > On 11/26/2015 11:40 PM, Anwar Rizal wrote:
> > > Hi all,
> > >
> > > From the documentation:
> > > "The |Trigger| specifies when the function that comes after the window
> > > clause (e.g., |sum|, |count|) is evaluated (“fires”) for each window."
> > >
> > > So, basically, if I specify:
> > >
> > > keyedStream
> > > .window(TumblingTimeWindows.of(Time.of(5, TimeUnit.SECONDS)))
> > > .trigger(CountTrigger.of(100))
> > >
> > > The execution of the window function is triggered when the count
> reaches 100 in the time window of 5 seconds. If you have a system that
> never reaches 100 in 5 seconds, basically you will never have the window
> fired.
> > >
> > > My question is, what would be the best option to have the following
> behavior:
> > >
> > > The execution of the window function is triggered when 5 seconds is
> reached or 100 events are received before 5 seconds.
> > >
> > >
> > > I am thinking of implementing my own trigger that looks like CountTrigger,
> but that also fires when the end of the time window is reached (at the
> moment, it just returns Continue instead of Fired). But maybe there's a
> better way?
> > >
> > > Is there a reason why CountTrigger is implemented the way it is today,
> and not as I described above (5 seconds or 100 events reached, whichever
> comes first)?
> > >
> > >
> > > Thanks,
> > >
> > > Anwar.
> > >
>
>


Re: Doubt about window and count trigger

2015-11-27 Thread Anwar Rizal
Thanks Fabian,

Just for completeness:
In that 1-minute window, is my modified count trigger still valid? Say, if in
that one-minute window I have 100 events after 30 s, will it still fire at
the 30th second?

Cheers,
anwar.



On Fri, Nov 27, 2015 at 3:31 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Anwar,
>
> Your trigger looks good!
>
> I just want to make sure you know what it is exactly happening after a
> window was evaluated and purged.
> Once a window is purged, the whole window is cleared and removed. If a new
> element arrives that would have fit into the purged window, a new window
> with exactly the same time boundaries is created. I.e., if you have a 5 min
> time window that is fired and purged in minute 4, and a new element arrives
> immediately after the purging, it is put into a window that will only
> "exist" for 1 more minute (and will not start a new 5 minute window).
>
> Cheers, Fabian
>
>
> 2015-11-27 14:59 GMT+01:00 Anwar Rizal <anriza...@gmail.com>:
>
>> Thanks Fabian and Aljoscha,
>>
>> I tried to implement the trigger as you described, as follows:
>>
>> https://gist.github.com/anonymous/d0578a4d27768a75bea4
>> <https://gist.github.com/anonymous/d0578a4d27768a75bea4>
>>
>> It works fine, indeed.
>>
>> Thanks,
>> Anwar
>>
>>
>> On Fri, Nov 27, 2015 at 11:49 AM, Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>
>>> Hi Anwar,
>>> what Fabian wrote is completely right. I just want to give the reasoning
>>> for why the CountTrigger behaves as it does. The idea was to have Triggers
>>> that clearly focus on one thing and then at some point add combination
>>> triggers. For example, an OrTrigger that triggers if either of it’s sub
>>> triggers triggers, or an AndTrigger that triggers after both its sub
>>> triggers fire. (There is also more complex stuff that could be thought of
>>> here.)
>>>
>>> Cheers,
>>> Aljoscha
>>> > On 27 Nov 2015, at 09:59, fhue...@gmail.com wrote:
>>> >
>>> >
>>> > Hi,
>>> >
>>> > a regular tumbling time window of 5 seconds gets all elements within
>>> that period of time (semantics of time varies for processing, ingestion,
>>> and event time modes) and triggers the execution after 5 seconds.
>>> >
>>> > If you define a custom trigger, the assignment policy remains the
>>> same, but the trigger condition is overwritten (it is NOT additional but
>>> replaces the default condition), i.e., in your implementation, it will only
>>> trigger when 100 elements arrived. In order to trigger also when the window
>>> time expires, you have to register a timer (processing time or event time
>>> timer) via the trigger context.
>>> > NOTE: The window assigner will continue to assign elements to the
>>> window, even if the window was already evaluated. If you PURGE the window
>>> and an element arrives after that, a new window is created.
>>> >
>>> > To implement your trigger, you have to register a timer in the
>>> onEvent() method with:
>>> > ctx.registerEventTimeTimer(window.getEnd)
>>> > You can do that in every onEvent() call, because the timer is always
>>> overwritten.
>>> >
>>> > NOTE: you should use Flink’s keyed-state (access via triggerContext)
>>> if you want to keep state such as the current count.
>>> >
>>> > Hope this helps. Please let me know if you have further questions.
>>> > Fabian
>>> >
>>> >
>>> >
>>> >
>>> > From: Matthias J. Sax
>>> > Sent: Friday, November 27, 2015 08:44
>>> > To: user@flink.apache.org
>>> > Subject: Re: Doubt about window and count trigger
>>> >
>>> >
>>> > Hi,
>>> >
>>> > a Trigger is an *additional* condition for intermediate (early)
>>> > evaluation of the window. Thus, it is not "or-ed" to the basic window
>>> > definition.
>>> >
>>> > If you want to have an or-ed window condition, you can customize it by
>>> > specifying your own window definition.
>>> >
>>> > > dataStream.window(new MyOwnWindow()); // MyOwnWindow extends
>>> WindowAssigner; put your code there
>>> >
>>> > -Matthias
>>> >
>>> >
>>> > On 11/26/2015 11:40 PM, Anwar Rizal wrote:
>>> > > Hi all,
>>> > >
>>> > > Fr

Re: Issue with sharing state in CoFlatMapFunction

2015-11-17 Thread Anwar Rizal
Broadcast is indeed what we do for the same type of problem as your initial one.

In another thread, Stephan mentioned a possibility of using OperatorState
in ConnectedStream. I think this approach using OperatorState does the
business as well.

In my understanding, the approach using broadcast will require you to
checkpoint somewhere upstream. I'm not sure whether OperatorState on a
ConnectedStream will be a solution for this, though.
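
For illustration, a sketch of the co-partitioning alternative Stephan mentions: key both inputs by the same field before connect(), so a model update and the events it should score meet in the same parallel instance. The String payloads and the key extraction below are made-up stand-ins for the thread's Avro types.

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class KeyedModelJoinSketch {

    // assumed record format "key,payload"; the key is the first field
    private static final KeySelector<String, String> BY_KEY =
            new KeySelector<String, String>() {
                @Override
                public String getKey(String record) {
                    return record.split(",")[0];
                }
            };

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> events = env.socketTextStream("localhost", 9000);
        DataStream<String> models = env.socketTextStream("localhost", 9001);

        events.keyBy(BY_KEY)
              .connect(models.keyBy(BY_KEY))   // same key selector => same partitioning
              .flatMap(new CoFlatMapFunction<String, String, String>() {
                  // latest model per parallel instance (i.e., per key group), not per key;
                  // use managed state or the Checkpointed interface for fault tolerance
                  private String latestModel = null;

                  @Override
                  public void flatMap1(String event, Collector<String> out) {
                      if (latestModel != null) {
                          out.collect(event + " scored with " + latestModel);
                      }
                      // else: no model received yet -- drop or buffer, depending on the use case
                  }

                  @Override
                  public void flatMap2(String model, Collector<String> out) {
                      latestModel = model;  // just remember the newest model, emit nothing
                  }
              })
              .print();

        env.execute("co-partitioned model join sketch");
    }
}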

On Tue, Nov 17, 2015 at 2:55 PM, Stephan Ewen  wrote:

> A global state that all can access read-only is doable via broadcast().
>
> A global state that is available to all for read and update is currently
> not available. Consistent operations on that would be quite costly, require
> some form of distributed communication/consensus.
>
> Instead, I would encourage you to go with the following:
>
> 1) If you can partition the state, use a keyBy().mapWithState() - That
> localizes state operations and makes it very fast.
>
> 2) If your state is not organized by key, your state is probably very
> small, and you may be able to use a non-parallel operation.
>
> 3) If some operation updates the state and another one accesses it, you
> can often implement that with iterations and a CoFlatMapFunction (one side
> is the original input, the other the feedback input).
>
> All approaches in the end localize state access and modifications, which
> is a good pattern to follow, if possible.
>
> Greetings,
> Stephan
>
>
>
> On Tue, Nov 17, 2015 at 2:44 PM, Vladimir Stoyak 
> wrote:
>
>> Not that I necessarily need that for this particular example, but is
>> there a Global State available?
>>
>> IE, how can I make a state available across all parallel instances of an
>> operator?
>>
>>
>>
>> On Tuesday, November 17, 2015 1:49 PM, Vladimir Stoyak 
>> wrote:
>>
>>
>> Perfect! It does explain my problem.
>>
>> Thanks a lot
>>
>>
>>
>> On Tuesday, November 17, 2015 1:43 PM, Stephan Ewen 
>> wrote:
>>
>>
>> Is the CoFlatMapFunction intended to be executed in parallel?
>>
>> If yes, you need some way to deterministically assign which record goes
>> to which parallel instance. In some way the CoFlatMapFunction does a
>> parallel (partitions) join between the model and the result of the session
>> windows, so you need some form of key that selects which partition the
>> elements go to. Does that make sense?
>>
>> If not, try to set it to parallelism 1 explicitly.
>>
>> Greetings,
>> Stephan
>>
>>
>> On Tue, Nov 17, 2015 at 1:11 PM, Vladimir Stoyak 
>> wrote:
>>
>> My model DataStream is not keyed and does not have any windows, only the
>> main stream has windows and apply function
>>
>> I have two Kafka Streams, one for events and one for model
>>
>> DataStream<Model> model_stream = env.addSource(new
>> FlinkKafkaConsumer082(model_topic, new
>> AvroDeserializationSchema(Model.class), properties));
>>
>> DataStream<Raw> main_stream = env.addSource(new
>> FlinkKafkaConsumer082(raw_topic, new
>> AvroDeserializationSchema(Raw.class), properties));
>>
>>
>> My topology looks like this:
>> main_stream
>> .assignTimestamps(new myTimeExtractor())
>> .keyBy("event_key")
>> .window(GlobalWindows.create())
>> .trigger(new sessionTrigger(session_timeout))
>> .apply(new AggFunction())
>> .connect(model_stream)
>> .flatMap(new applyModel())
>> .print();
>>
>>  AggFunction is a simple aggregate function:
>> Long start_ts=Long.MAX_VALUE;
>> Long end_ts=Long.MIN_VALUE;
>> Long dwell_time=0L,last_event_ts=0L;
>> int size = Lists.newArrayList(values).size();
>>
>> for (Raw value: values) {
>> if(value.getTs() > end_ts) end_ts = value.getTs();
>> if (value.getTs() < start_ts) start_ts = value.getTs();
>>
>> if(last_event_ts == 0L){
>> last_event_ts = value.getTs();
>> } else {
>> dwell_time += value.getTs() - last_event_ts;
>> last_event_ts = value.getTs();
>> }
>> }
>>
>> out.collect(new
>> Features(tuple.getField(0), tuple.getField(2), tuple.getField(1), start_ts, 
>> end_ts, size, dwell_time, Boolean.FALSE));
>>
>>
>>
>> On Tuesday, November 17, 2015 12:59 PM, Stephan Ewen 
>> wrote:
>>
>>
>> Hi!
>>
>> Can you give us a bit more context? For example share the structure of
>> the program (what stream get windowed and connected in what way)?
>>
>> I would guess that the following is the problem:
>>
>> When you connect one stream to another, then partition n of the first
>> stream connects with partition n of the other stream.
>> When you do a keyBy().window() then the system reshuffles the data, and
>> the records are in different partitions, meaning that they arrive in other
>> instances of the CoFlatMapFunction.
>>
>> You can also call keyBy() before both inputs to make sure that the
>> records are properly routed...
>>
>> Greetings,
>> Stephan
>>
>>
>>
>> On Tue, Nov 17, 

Re: Apache Flink Operator State as Query Cache

2015-11-16 Thread Anwar Rizal
Stephan,

Having a look at the brand new 0.10 release, I noticed that OperatorState
is not implemented for ConnectedStream, which is quite the opposite of what
you said below.

Or maybe I misunderstood your sentence here ?

Thanks,
Anwar.


On Wed, Nov 11, 2015 at 10:49 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> In general, if you can keep state in Flink, you get better
> throughput/latency/consistency and have one less system to worry about
> (external k/v store). State outside means that the Flink processes can be
> slimmer and need fewer resources and as such recover a bit faster. There
> are use cases for that as well.
>
> Storing the model in OperatorState is a good idea, if you can. On the
> roadmap is to migrate the operator state to managed memory as well, so that
> should take care of the GC issues.
>
> We are just adding functionality to make the Key/Value operator state
> usable in CoMap/CoFlatMap as well (currently it only works in windows and
> in Map/FlatMap/Filter functions over the KeyedStream).
> Until then, you should be able to use a simple Java HashMap and use the
> "Checkpointed" interface to get it persistent.
>
> Greetings,
> Stephan
>
>
> On Sun, Nov 8, 2015 at 10:11 AM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Thanks for the answer.
>>
>> Currently, the approach I'm using is creating a base/marker interface to
>> stream different types of messages to the same operator. I'm not sure
>> about the performance hit of this compared to the CoFlatMap function.
>>
>> Basically this one is providing a query cache, so I'm thinking that
>> instead of using an in-memory cache like Redis, Ignite, etc., I can just
>> use operator state for this one.
>>
>> I just want to gauge whether I need to use a memory cache or whether
>> operator state would be just fine.
>>
>> However, I'm concerned about Gen 2 garbage collection when caching our
>> own state without using operator state. Is there any clarification on
>> that one?
>>
>>
>>
>> On Sat, Nov 7, 2015 at 12:38 AM, Anwar Rizal <anriza...@gmail.com> wrote:
>>
>>>
>>> Let me understand your case better here. You have a stream of model and
>>> stream of data. To process the data, you will need a way to access your
>>> model from the subsequent stream operations (map, filter, flatmap, ..).
>>> I'm not sure in which case Operator State is a good choice, but I think
>>> you can also live without.
>>>
>>> val modelStream =  // get the model stream
>>> val dataStream   =
>>>
>>> modelStream.broadcast.connect(dataStream).coFlatMap(...)
>>>
>>> Then you can keep the latest model in a CoFlatMapRichFunction, not
>>> necessarily as Operator State, although maybe OperatorState is a good
>>> choice too.
>>>
>>> Does it make sense to you ?
>>>
>>> Anwar
>>>
>>> On Fri, Nov 6, 2015 at 10:21 AM, Welly Tambunan <if05...@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We have high-density data that requires downsampling. However, this
>>>> downsampling model is very flexible, based on the client device and user
>>>> interaction, so it would be wasteful to precompute and store it in a db.
>>>>
>>>> So we want to use Apache Flink to do downsampling and cache the result
>>>> for subsequent query.
>>>>
>>>> We are considering using Flink Operator state for that one.
>>>>
>>>> Is that the right approach to use for a memory cache? Or is it
>>>> preferable to use a memory cache like Redis, etc.?
>>>>
>>>> Any comments will be appreciated.
>>>>
>>>>
>>>> Cheers
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>
>>>
>>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


Re: Apache Flink Operator State as Query Cache

2015-11-06 Thread Anwar Rizal
Let me understand your case better here. You have a stream of model and
stream of data. To process the data, you will need a way to access your
model from the subsequent stream operations (map, filter, flatmap, ..).
I'm not sure in which case Operator State is a good choice, but I think you
can also live without.

val modelStream =  // get the model stream
val dataStream   =

modelStream.broadcast.connect(dataStream).coFlatMap(...)

Then you can keep the latest model in a CoFlatMapRichFunction, not
necessarily as Operator State, although maybe OperatorState is a good choice
too.

Does it make sense to you ?

Anwar
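
A self-contained sketch of this wiring (socket sources and String payloads stand in for the real model and data streams; the latest model is kept in a plain field, which would need the Checkpointed interface or managed state to survive failures):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class BroadcastModelSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // stand-ins for the real model and data sources
        DataStream<String> modelStream = env.socketTextStream("localhost", 9000);
        DataStream<String> dataStream  = env.socketTextStream("localhost", 9001);

        modelStream
            .broadcast()                 // every parallel instance sees every model update
            .connect(dataStream)
            .flatMap(new CoFlatMapFunction<String, String, String>() {
                private String latestModel = null;

                @Override
                public void flatMap1(String model, Collector<String> out) {
                    latestModel = model;  // model side: just remember the newest model
                }

                @Override
                public void flatMap2(String data, Collector<String> out) {
                    if (latestModel != null) {
                        out.collect(data + " downsampled with " + latestModel);
                    }
                }
            })
            .print();

        env.execute("broadcast model sketch");
    }
}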

On Fri, Nov 6, 2015 at 10:21 AM, Welly Tambunan  wrote:

> Hi All,
>
> We have high-density data that requires downsampling. However, this
> downsampling model is very flexible, based on the client device and user
> interaction, so it would be wasteful to precompute and store it in a db.
>
> So we want to use Apache Flink to do downsampling and cache the result for
> subsequent query.
>
> We are considering using Flink Operator state for that one.
>
> Is that the right approach to use for a memory cache? Or is it preferable
> to use a memory cache like Redis, etc.?
>
> Any comments will be appreciated.
>
>
> Cheers
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com 
>


Re: Scala Breeze Dependencies not resolving when adding flink-ml on build.sbt

2015-10-28 Thread Anwar Rizal
Yeah

I had similar problems with Kafka in Spark Streaming. I worked around the
problem by excluding Kafka from the connector and then adding the library back.

Maybe you can try something like:

libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" % "0.9.1",
"org.apache.flink" % "flink-clients" % "0.9.1" ,"org.apache.flink" %
"flink-ml" % "0.9.1"  exclude("org.scalanlp",
"breeze_${scala.binary.version}"))

libraryDependencies += "org.scalanlp" % "breeze_2.10" % "0.11.2"

Anwar.




On Wed, Oct 28, 2015 at 11:29 AM, Frederick Ayala 
wrote:

> I tried adding libraryDependencies += "org.scalanlp" % "breeze_2.10" %
> "0.11.2", but the problem persists.
>
> I also tried as explained in the Breeze documentation:
>
> libraryDependencies  ++= Seq(
>   "org.scalanlp" %% "breeze" % "0.11.2",
>   "org.scalanlp" %% "breeze-natives" % "0.11.2",
>   "org.scalanlp" %% "breeze-viz" % "0.11.2"
> )
>
> resolvers ++= Seq("Sonatype Releases" at
>   "https://oss.sonatype.org/content/repositories/releases/")
>
> But it doesn't work.
>
> The message is still "unresolved dependency:
> org.scalanlp#breeze_${scala.binary.version};0.11.2: not found"
>
> Could the problem be on flink-ml/pom.xml?
>
> <dependency>
>   <groupId>org.scalanlp</groupId>
>   <artifactId>breeze_${scala.binary.version}</artifactId>
>   <version>0.11.2</version>
> </dependency>
>
> The property scala.binary.version is not being replaced by the value 2.10
>
> Thanks,
>
> Frederick Ayala
>
> On Wed, Oct 28, 2015 at 10:59 AM, DEVAN M.S.  wrote:
>
>> Can you add libraryDependencies += "org.scalanlp" % "breeze_2.10" %
>> "0.11.2" also ?
>>
>>
>>
>> Devan M.S. | Technical Lead | Cyber Security | AMRITA VISHWA VIDYAPEETHAM
>> | Amritapuri | Cell +919946535290 |
>>
>>
>> On Wed, Oct 28, 2015 at 3:04 PM, Frederick Ayala <
>> frederickay...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am getting an error when adding flink-ml to the libraryDependencies on
>>> my build.sbt file:
>>>
>>> [error] (*:update) sbt.ResolveException: unresolved dependency:
>>> org.scalanlp#breeze_${scala.binary.version};0.11.2: not found
>>>
>>> My libraryDependencies is:
>>>
>>> libraryDependencies ++= Seq("org.apache.flink" % "flink-scala" %
>>> "0.9.1", "org.apache.flink" % "flink-streaming-scala" % "0.9.1",
>>> "org.apache.flink" % "flink-clients" % "0.9.1",
>>> "org.apache.flink" % "flink-ml" % "0.9.1")
>>>
>>> I am using scalaVersion := "2.10.6"
>>>
>>> If I remove flink-ml all the other dependencies are resolved.
>>>
>>> Could you help me to figure out a solution for this?
>>>
>>> Thanks!
>>>
>>> Frederick Ayala
>>>
>>
>>
>
>
> --
> Frederick Ayala
>


Re: Powered by Flink

2015-10-19 Thread Anwar Rizal
Nice indeed :-)


On Mon, Oct 19, 2015 at 3:08 PM, Suneel Marthi 
wrote:

> +1 to this.
>
> On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske  wrote:
>
>> Sounds good +1
>>
>> 2015-10-19 14:57 GMT+02:00 Márton Balassi :
>>
>> > Thanks for starting and big +1 for making it more prominent.
>> >
>> > On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske 
>> wrote:
>> >
>> >> Thanks for starting this Kostas.
>> >>
>> >> I think the list is quite hidden in the wiki. Should we link from
>> >> flink.apache.org to that page?
>> >>
>> >> Cheers, Fabian
>> >>
>> >> 2015-10-19 14:50 GMT+02:00 Kostas Tzoumas :
>> >>
>> >>> Hi everyone,
>> >>>
>> >>> I started a "Powered by Flink" wiki page, listing some of the
>> >>> organizations that are using Flink:
>> >>>
>> >>> https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink
>> >>>
>> >>> If you would like to be added to the list, just send me a short email
>> >>> with your organization's name and a description and I will add you to
>> the
>> >>> wiki page.
>> >>>
>> >>> Best,
>> >>> Kostas
>> >>>
>> >>
>> >>
>> >
>>
>
>


Re: Flink Data Stream Union

2015-10-19 Thread Anwar Rizal
Do you really need to iterate ?
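
For comparison, a minimal sketch of the union without iterate() (built-in socket sources stand in for the thread's SocketSource and MessageFilter):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnionWithoutIterate {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // built-in socket sources as stand-ins for the custom SocketSource/MessageFilter
        DataStream<String> stream1 = env.socketTextStream("host1", 9999);
        DataStream<String> stream2 = env.socketTextStream("host2", 9999);

        // union simply merges the two streams; no feedback loop, so no iterate() is needed
        stream1.union(stream2).print();

        env.execute("union sketch");
    }
}

iterate() is meant for feedback loops and expects the feedback edge to be defined via closeWith(); for simply merging two sources, union alone is enough.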

On Mon, Oct 19, 2015 at 5:42 PM, flinkuser  wrote:

>
> Here is my code snippet, but I find that the union operator does not work.
>
> DataStream msgDataStream1 = env.addSource((new
> SocketSource(hostName1,port,'\n',-1))).filter(new
> MessageFilter()).setParallelism(1);
> DataStream msgDataStream2 = env.addSource((new
> SocketSource(hostName2,port,'\n',-1))).filter(new
> MessageFilter()).setParallelism(1);
>
>
> DataStream stockStream =
> (msgDataStream1.union(msgDataStream2)).iterate();
> stockStream.print();
>
>
> The stockStream doesn’t print the consolidated stream data. Sometimes
> Stream1 is printed, sometimes none is printed.
>
> Can you please help me figure out what is wrong here?
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Data-Stream-Union-tp3169.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>


Re: Flink Kafka example in Scala

2015-07-20 Thread Anwar Rizal
I do the same trick as Wendong to avoid the sbt compilation error (excluding
kafka_${scala.binary.version}).

I still haven't managed to make sbt pass scala.binary.version to Maven.

Anwar.

On Mon, Jul 20, 2015 at 9:42 AM, Till Rohrmann trohrm...@apache.org wrote:

 Hi Wendong,

 why do you exclude the kafka dependency from the `flink-connector-kafka`?
 Do you want to use your own kafka version?

 I'd recommend you to build a fat jar instead of trying to put the right
 dependencies in `/lib`. Here [1] you can see how to build a fat jar with
 sbt.

 Cheers,
 Till

 [1]
 http://stackoverflow.com/questions/28459333/how-to-build-an-uber-jar-fat-jar-using-sbt-within-intellij-idea

 On Sat, Jul 18, 2015 at 12:40 AM, Wendong wendong@gmail.com wrote:

 Hi Till,

 Thanks for the information. I'm using sbt and I have the following line in
 build.sbt:

 libraryDependencies += "org.apache.flink" % "flink-connector-kafka" %
 "0.9.0" exclude("org.apache.kafka", "kafka_${scala.binary.version}")

 Also, I copied flink-connector-kafka-0.9.0.jar under
 flink_root_dir/lib/,
 but there is still ClassNotFoundException for KafkaSink.

 I appreciate it if you have any suggestion.

 Wendong



 --
 View this message in context:
 http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Kafka-example-in-Scala-tp2069p2144.html
 Sent from the Apache Flink User Mailing List archive. mailing list
 archive at Nabble.com.





Re: Flink Forward 2015

2015-04-07 Thread Anwar Rizal
Looks great. Are there any dates for the abstract deadline already?

On Tue, Apr 7, 2015 at 2:38 PM, Kostas Tzoumas ktzou...@apache.org wrote:

 Ah, thanks Sebastian! :-)

 On Tue, Apr 7, 2015 at 2:33 PM, Sebastian ssc.o...@googlemail.com wrote:

 There are still some Berlin Buzzwords snippets in your texts ;)

 http://flink-forward.org/?page_id=294


 On 07.04.2015 14:24, Kostas Tzoumas wrote:

 Hi everyone,

 The folks at data Artisans and the Berlin Big Data Center are organizing
 the first physical conference all about Apache Flink in Berlin the
 coming October:

 http://flink-forward.org

 The conference will be held in a beautiful spot, an old brewery turned
 event space (the same space where Berlin Buzzwords took place last year).
 We are soliciting technical talks on Flink, talks on how you are using
 Flink to solve real world problems, as well as talks on Big Data
 technology in general that relate to Apache Flink's general direction.
 And of course, there will be enough social and networking events to get
 the community together :-)

 The website and the call for abstracts are live, but the ticket
 registration is not yet open.

 At this point, I would like to ask the community to mark your calendars
 if you'd like to attend, submit an abstract, and forward the event to
 your friends and family. If you can help us market the event, help in
 any other way, or have any other inquiries, please get in touch with me!

 I will also announce this via our social media channels this week.

 I am looking forward to gathering the community in a great conference!

 Best,
 Kostas