Re: Spark support for Complex Event Processing (CEP)

2016-05-09 Thread Esa Heikkinen


Sorry for the delayed answer. Yes, this is not pure "CEP", but it is quite 
close to it, with many similar functionalities.


My case is not so simple, because I don't want to compare against the 
original timetable of the route.


I want to compare how closely the (ITS) system has estimated the arrival 
time at the bus stop.


That means I have to read more logs (LOG C) and do a little calculation 
to determine the estimated arrival time.


Then I check the difference (error) between the bus's real arrival time 
and the system's estimated arrival time.


In practice the situation can be a little more complex.
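In concrete terms, the error check could look like this; a minimal Python sketch with made-up timestamps (the function name and sample values are my own illustration, not the real log schema):

```python
from datetime import datetime, timedelta

def arrival_error(real_arrival: datetime, estimated_arrival: datetime) -> timedelta:
    """Error between the bus's real arrival time and the system's estimate."""
    return real_arrival - estimated_arrival

real = datetime(2016, 4, 29, 10, 32, 15)      # observed from LOG A (bus enters stop area)
estimate = datetime(2016, 4, 29, 10, 31, 30)  # last estimate from LOG C
err = arrival_error(real, estimate)
print(err.total_seconds())  # positive means the bus arrived later than estimated
```

A positive error means the system's estimate was optimistic; a negative one means the bus arrived earlier than estimated.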

---
Esa Heikkinen

Re: Spark support for Complex Event Processing (CEP)

2016-04-29 Thread Michael Segel
If you’re getting the logs, then it really isn’t CEP unless you consider the 
event to be the log from the bus. 
This doesn’t sound like there is a time constraint. 

Your bus schedule is fairly fixed and changes infrequently. 
Your bus stops are relatively fixed points. (Within a couple of meters) 

So then you’re taking bus A, which is scheduled to drive route 123, and you 
want to compare its nearest location to the bus stop at time T and see how 
close it is to the scheduled route.
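A rough sketch of that comparison in Python, using toy planar coordinates and a simple Euclidean distance (a real implementation would use a proper geodesic distance on GPS coordinates):

```python
# Find the bus position report closest in time to T, then its distance to the stop.
# positions: list of (timestamp_seconds, x, y) tuples from the bus GPS log.
def nearest_report(positions, t):
    return min(positions, key=lambda p: abs(p[0] - t))

def distance(report, stop):
    _, x, y = report
    sx, sy = stop
    return ((x - sx) ** 2 + (y - sy) ** 2) ** 0.5

positions = [(100, 0.0, 0.0), (130, 3.0, 4.0), (160, 6.0, 8.0)]
stop = (3.0, 4.0)
report = nearest_report(positions, 135)   # time T = 135
print(distance(report, stop))             # 0.0 -- bus was at the stop
```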


Or am I missing something? 

-Mike

Re: Spark support for Complex Event Processing (CEP)

2016-04-29 Thread Mich Talebzadeh
OK, like any work, you need to start from a simple model: take one bus
only (identified by its bus number, which is unique).

For any bus number N you have two logs, LOG A and LOG B, plus LOG C from
the central computer that sends the estimated time of arrival (ETA) to the
bus stops. Pretty simple.

What is the difference in timestamps between LOG A and LOG B for bus N? Are
they the same?

Your window for a given bus would run from the start time (departure from
the station, deterministic, already known) to the end time (a complete
round). The end time could be, say, the start time + 1 hour, or longer.

val ssc = new StreamingContext(sparkConf, Seconds(xx))

Then it is pretty simple. You need to work out:

val windowLength = xx
val slidingInterval = yy

For each bus you have two topics (LOG A and LOG B), plus LOG C, which you
need to check against the outcome derived from LOG A and LOG B.

Start with this simple heuristic model first.
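The windowLength/slidingInterval idea can be sketched without Spark; a minimal Python illustration (toy epoch-second timestamps, invented record shape) of cutting one bus's records into sliding windows, the way a DStream window operation would group them:

```python
def sliding_windows(records, window_length, sliding_interval, t0, t_end):
    """Yield (window_start, [records in window]) like a DStream window would."""
    start = t0
    while start < t_end:
        end = start + window_length
        yield start, [r for r in records if start <= r[0] < end]
        start += sliding_interval

# (timestamp, bus_no) records for one bus round
records = [(0, "N7"), (30, "N7"), (60, "N7"), (90, "N7")]
windows = list(sliding_windows(records, window_length=60, sliding_interval=30,
                               t0=0, t_end=120))
print([len(batch) for _, batch in windows])  # [2, 2, 2, 1]
```

With a 60-second window sliding every 30 seconds, each record lands in two windows, which is exactly the overlap Spark's windowed streams give you.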

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Spark support for Complex Event Processing (CEP)

2016-04-29 Thread Esa Heikkinen


Hi

I'll try to explain my case.

The situation with my logs and solution is not so simple. There are many 
types of logs, and they come from many sources.

They are in CSV format, and the header line includes the column names.

This is a simplified description of the input logs:

LOG A: bus coordinate logs (every bus has its own log):
- timestamp
- bus number
- coordinates

LOG B: bus login/logout (to/from line) message log:
- timestamp
- bus number
- line number

LOG C: log from the central computer:
- timestamp
- bus number
- bus stop number
- estimated arrival time at the bus stop

LOG A is updated every 30 seconds (I also have another system with a 
1-second interval). LOG B is updated when a bus starts from the terminal 
bus stop and when it arrives at the final bus stop of a line. LOG C is 
updated when the central computer sends a new arrival time estimate to a 
bus stop.
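Since the logs are CSV with a header line, reading them is straightforward with a dictionary-based CSV reader; a sketch in Python, where the column names and sample values are assumed from the descriptions above, not the real headers:

```python
import csv
import io

# Simulated LOG C content; a real file would be opened with open(path, newline="")
log_c = """timestamp,bus_number,bus_stop_number,estimated_arrival
2016-04-29T10:30:00,1234,17,2016-04-29T10:31:30
2016-04-29T10:30:30,1234,17,2016-04-29T10:31:45
"""

# DictReader takes the column names from the header line, as in the logs above
rows = list(csv.DictReader(io.StringIO(log_c)))
print(rows[-1]["estimated_arrival"])  # last (most recent) estimate for the stop
```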


I also need metadata for the logs (and the analyzer), for example the 
coordinates of bus stop areas.


The main purpose of the analysis is to check the accuracy (error) of the 
estimated arrival times at bus stops.


Because there are many buses and lines, it is too time-consuming to check 
all of them, so I check only specific lines with specific bus stops. There 
are many buses (logged in to lines) coming to one bus stop, and I am 
interested in only a certain bus.


To do that, I have to read the logs partly out of time order (upstream), 
in this sequence:

1. From LOG C, find the bus number
2. From LOG A, find when the bus left the terminal bus stop
3. From LOG B, find when the bus sent a login to the line
4. From LOG A, find when the bus entered the bus stop area
5. From LOG C, find the last estimated arrival time at the bus stop and 
calculate the error between the real and estimated values
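The five steps above can be sketched on toy data; everything here (the tuple shapes, epoch-second timestamps, and boolean stand-ins for the coordinate tests) is invented for illustration:

```python
# LOG C: (timestamp, bus_no, stop_no, estimated_arrival)
log_c = [(100, "N7", 17, 210), (150, "N7", 17, 220)]
# LOG A: (timestamp, bus_no, at_terminal, at_stop) -- booleans stand in for
# the real coordinate-vs-area checks
log_a = [(90, "N7", True, False), (120, "N7", False, False), (215, "N7", False, True)]
# LOG B: (timestamp, bus_no, line_no)
log_b = [(95, "N7", 7)]

bus = log_c[0][1]                                                 # 1. bus number from LOG C
left = next(t for t, b, term, _ in log_a if b == bus and term)    # 2. left terminal stop
login = next(t for t, b, _ in log_b if b == bus)                  # 3. login to the line
real = next(t for t, b, _, stop in log_a if b == bus and stop)    # 4. entered the bus stop
last_est = max((t, e) for t, b, s, e in log_c                     # 5. last estimate
               if b == bus and t < real)[1]                       #    before arrival
print(real - last_est)   # error between real and estimated arrival (seconds)
```

Note that steps 2-4 jump between logs rather than scanning one file forward in time, which is the "upstream" reading the text describes.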


In my understanding, (almost) all log file analyzers read all data (lines) 
from log files in time order. I need only a specific part of the logs 
(lines). To achieve that, my solution is to read the logs in an arbitrary 
order (within a given time window).


I know this solution is not suitable for all cases (for example, very fast 
analysis or very big data). It is suitable for very complex (targeted) 
analysis. It can be slow and memory-consuming, but well-done pre-processing 
of the log data can help a lot.


---
Esa Heikkinen


Re: Spark support for Complex Event Processing (CEP)

2016-04-28 Thread Michael Segel
Look, you said that you didn’t have continuous data, yet you do have continuous 
data. I just used an analog signal, which can be converted so that you end up 
with contiguous digital sampling.

The point is that micro-batches are still batches, and you’re adding latency. 
Even at 500 ms, if you’re dealing with a high-velocity, low-latency stream, that 
delay can kill you.

Time is relative, which is why Spark Streaming isn’t good enough for *all* 
streaming. It wasn’t designed for a contiguous stream, and by contiguous I mean 
that at any given time there will be an inbound message in the queue.
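The arithmetic behind that point, for the tick-data example discussed elsewhere in the thread, assuming 50,000 ticks per second and a 500 ms micro-batch interval:

```python
ticks_per_second = 50_000
batch_interval_s = 0.5

# Ticks that accumulate before the batch is even handed off for processing
ticks_per_batch = int(ticks_per_second * batch_interval_s)
print(ticks_per_batch)        # 25000

# A tick arriving just after a batch boundary waits out the whole interval
worst_case_wait_ms = batch_interval_s * 1000
print(worst_case_wait_ms)     # 500.0 ms of added latency, before any processing
```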



Re: Spark support for Complex Event Processing (CEP)

2016-04-28 Thread Mich Talebzadeh
Also the point about

"First there is this thing called analog signal processing…. Is that
continuous enough for you? "

I agree that analog signal processing, like a sine wave or an AM radio
signal, is truly continuous. However, here we are talking about digital
data, which will always be sent as bytes, typically grouped into messages.
In other words, when we send data it is never truly continuous; we are
sending discrete messages.


HTH,



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Spark support for Complex Event Processing (CEP)

2016-04-28 Thread Mich Talebzadeh
In a commercial (C)EP system like, say, StreamBase, or for example its
competitor Apama, the arrival of an input event **immediately** triggers
further downstream processing.

This is admittedly an asynchronous approach, not a synchronous clock-driven
micro-batch approach like Spark's.

I suppose if one wants to split hairs / be philosophical, the clock rate of
the microprocessor chip underlies everything. But I don't think that is
quite the point.

The point is that an asynchronous event-driven approach is as continuous /
immediate as **the underlying computer hardware will ever allow**. It is
not limited by an architectural software clock.

So it is asynchronous vs synchronous that is the key issue, not just the
exact speed of the software clock in the synchronous approach.

It is also indeed true that latencies down to the single-digit microsecond
level can sometimes matter in financial trading, but rarely.

HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 28 April 2016 at 12:44, Michael Segel  wrote:

> I don’t.
>
> I believe that there have been a  couple of hack-a-thons like one done in
> Chicago a few years back using public transportation data.
>
> The first question is what sort of data do you get from the city?
>
> I mean it could be as simple as time_stamp, bus_id, route and GPS (x,y).
> Or they could provide more information. Like last stop, distance to next
> stop, avg current velocity…
>
> Then there is the frequency of the updates. Every second? Every 3 seconds?
> 5 or 6 seconds…
>
> This will determine how much work you have to do.
>
> Maybe they provide the routes of the buses via a different API call since
> it's relatively static.
>
> This will drive your solution more than the underlying technology.
>
> Oh, and while I focused on buses, there are also rail and other modes of
> public transportation like light rail, trains, etc …
>
> HTH
>
> -Mike
>
>
> On Apr 28, 2016, at 4:10 AM, Esa Heikkinen 
> wrote:
>
>
> Do you know of any good examples of how to use Spark Streaming for
> tracking public transportation systems?
>
> Or an example with Storm or some other tool?
>
> Regards
> Esa Heikkinen
>
> 28.4.2016, 3:16, Michael Segel wrote:
>
> Uhm…
> I think you need to clarify a couple of things…
>
> First there is this thing called analog signal processing…. Is that
> continuous enough for you?
>
> But more to the point, Spark Streaming does micro-batching, so if you’re
> processing a continuous stream of tick data, you will have more than 50K
> ticks per second while the markets are open and trading. Even at 50K a
> second, that would mean one every 0.02 ms, or 50 ticks per ms.
>
> And you don’t want to wait until you have a batch to start processing, but
> you want to process when the data hits the queue and pull it from the queue
> as quickly as possible.
>
> Spark streaming will be able to pull batches in as little as 500ms. So if
> you pull a batch at t0 and immediately have a tick in your queue, you won’t
> process that data until t0+500ms. And said batch would contain 25,000
> entries.
>
> Depending on what you are doing… that 500ms delay can be enough to be
> fatal to your trading process.
>
> If you don’t like stock data, there are other examples mainly when pulling
> data from real time embedded systems.
>
>
> If you go back and read what I said, if your data flow is >> (much slower)
> than 500ms, and / or the time to process is >> 500ms ( much longer )  you
> could use spark streaming.  If not… and there are applications which
> require that type of speed…  then you shouldn’t use spark streaming.
>
> If you do have that constraint, then you can look at systems like
> storm/flink/samza / whatever where you have a continuous queue and listener
> and no micro batch delays.
> Then for each bolt (storm) you can have a spark context for processing the
> data. (Depending on what sort of processing you want to do.)
>
> To put this in perspective… if you’re using spark streaming / akka / storm
> /etc to handle real time requests from the web, 500ms added delay can be a
> long time.
>
> Choose the right tool.
>
> For the OP’s problem. Sure Tracking public transportation could be done
> using spark streaming. It could also be done using half a dozen other tools
> because the rate of data generation is much slower than 500ms.
>
> HTH
>
>
> On Apr 27, 2016, at 4:34 PM, Mich Talebzadeh 
> wrote:
>
> couple of things.
>
> There is no such thing as Continuous Data Streaming as there is no such
> thing as Continuous Availability.
>
> There is such thing as Discrete Data Streaming and  High Availability  but
> they reduce the finite unavailability to minimum. In terms of business
> needs a 5 SIGMA is good enough and acceptable. Even the 

Re: Spark support for Complex Event Processing (CEP)

2016-04-28 Thread Michael Segel
I don’t.

> I believe that there have been a couple of hack-a-thons, like one done in
> Chicago a few years back, using public transportation data.

The first question is what sort of data do you get from the city? 

I mean it could be as simple as time_stamp, bus_id, route and GPS (x,y).   Or 
they could provide more information. Like last stop, distance to next stop, avg 
current velocity… 

Then there is the frequency of the updates. Every second? Every 3 seconds? 5 or 
6 seconds…

This will determine how much work you have to do. 

> Maybe they provide the routes of the busses via a different API call since
> it's relatively static.

This will drive your solution more than the underlying technology. 

> Oh, and while I focused on buses, there are also other modes of public
> transportation like light rail, trains, etc.

HTH

-Mike


> On Apr 28, 2016, at 4:10 AM, Esa Heikkinen  
> wrote:
> 
> 
> Do you know any good examples how to use Spark streaming in tracking public 
> transportation systems ?
> 
> Or Storm or some other tool example ?
> 
> Regards
> Esa Heikkinen
> 
> 28.4.2016, 3:16, Michael Segel kirjoitti:
>> Uhm… 
>> I think you need to clarify a couple of things…
>> 
>> First there is this thing called analog signal processing…. Is that 
>> continuous enough for you? 
>> 
>> But more to the point, Spark Streaming does micro batching, so if you're 
>> processing a continuous stream of tick data, you will have more than 50K 
>> ticks per second while there are markets open and trading. Even at 50K a 
>> second, that would mean 1 every .02 ms, or 50 ticks a ms. 
>> 
>> And you don’t want to wait until you have a batch to start processing, but 
>> you want to process when the data hits the queue and pull it from the queue 
>> as quickly as possible. 
>> 
>> Spark streaming will be able to pull batches in as little as 500ms. So if 
>> you pull a batch at t0 and immediately have a tick in your queue, you won’t 
>> process that data until t0+500ms. And said batch would contain 25,000 
>> entries. 
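The figures in the two paragraphs above can be checked with a quick back-of-the-envelope script (a sketch in Python purely for illustration; the 50K ticks/second rate and 500ms batch interval are the numbers quoted above):

```python
# Back-of-the-envelope check of the tick-rate arithmetic quoted above.
TICKS_PER_SECOND = 50_000      # quoted market data rate
BATCH_INTERVAL_S = 0.5         # smallest Spark Streaming batch interval (500 ms)

ms_between_ticks = 1000 / TICKS_PER_SECOND                   # 0.02 ms per tick
ticks_per_ms = TICKS_PER_SECOND / 1000                       # 50 ticks per ms
ticks_per_batch = int(TICKS_PER_SECOND * BATCH_INTERVAL_S)   # 25,000 per batch

print(ms_between_ticks, ticks_per_ms, ticks_per_batch)
```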
>> 
>> Depending on what you are doing… that 500ms delay can be enough to be fatal 
>> to your trading process. 
>> 
>> If you don’t like stock data, there are other examples mainly when pulling 
>> data from real time embedded systems. 
>> 
>> 
>> If you go back and read what I said, if your data flow is >> (much slower) 
>> than 500ms, and / or the time to process is >> 500ms ( much longer )  you 
>> could use spark streaming.  If not… and there are applications which require 
>> that type of speed…  then you shouldn’t use spark streaming. 
>> 
>> If you do have that constraint, then you can look at systems like 
>> storm/flink/samza / whatever where you have a continuous queue and listener 
>> and no micro batch delays.
>> Then for each bolt (storm) you can have a spark context for processing the 
>> data. (Depending on what sort of processing you want to do.) 
>> 
>> To put this in perspective… if you’re using spark streaming / akka / storm 
>> /etc to handle real time requests from the web, 500ms added delay can be a 
>> long time. 
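The added delay argument above can be made concrete: with micro batching, an event is only processed at the next batch boundary, so in the worst case it waits almost a full batch interval. A small sketch (the 500ms interval is the one discussed; the function is illustrative, not Spark code):

```python
import math

def microbatch_latency(arrival_s, batch_interval_s=0.5):
    """Scheduling delay added when events are only processed at batch boundaries."""
    next_boundary = math.ceil(arrival_s / batch_interval_s) * batch_interval_s
    return next_boundary - arrival_s

# A tick arriving just after a batch was pulled waits almost a full interval,
# while a continuous (per-event) listener would add no such scheduling delay.
worst = microbatch_latency(0.001)   # just missed the batch at t=0
best = microbatch_latency(0.5)      # arrived exactly on a boundary
print(worst, best)
```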
>> 
>> Choose the right tool. 
>> 
>> For the OP’s problem. Sure Tracking public transportation could be done 
>> using spark streaming. It could also be done using half a dozen other tools 
>> because the rate of data generation is much slower than 500ms. 
>> 
>> HTH
>> 
>> 
>>> On Apr 27, 2016, at 4:34 PM, Mich Talebzadeh >> > wrote:
>>> 
>>> couple of things.
>>> 
>>> There is no such thing as Continuous Data Streaming as there is no such 
>>> thing as Continuous Availability.
>>> 
>>> There is such thing as Discrete Data Streaming and  High Availability  but 
>>> they reduce the finite unavailability to minimum. In terms of business 
>>> needs a 5 SIGMA is good enough and acceptable. Even the candles set to a 
>>> predefined time interval say 2, 4, 15 seconds overlap. No FX savvy trader 
>>> makes a sell or buy decision on the basis of 2 seconds candlestick
>>> 
>>> The calculation itself in measurements is subject to finite error as 
>>> defined by their Confidence Level (CL) using Standard Deviation function.
>>> 
>>> OK so far I have never noticed a tool that requires that details of 
>>> granularity. Those stuff from Flink etc is in practical term is of little 
>>> value and does not make commercial sense.
>>> 
>>> Now with regard to your needs, Spark micro batching is perfectly adequate.
>>> 
>>> HTH
>>> 
>>> Dr Mich Talebzadeh
>>>  
>>> LinkedIn   
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>  
>>> 
>>>  
>>> http://talebzadehmich.wordpress.com 
>>>  
>>> 
>>> On 27 April 2016 at 22:10, Esa Heikkinen < 
>>> 

Re: Spark support for Complex Event Processing (CEP)

2016-04-28 Thread Esa Heikkinen


Do you know any good examples of how to use Spark Streaming for tracking 
public transportation systems?

Or an example with Storm or some other tool?

Regards
Esa Heikkinen


Re: Spark support for Complex Event Processing (CEP)

2016-04-27 Thread Mich Talebzadeh
couple of things.

There is no such thing as Continuous Data Streaming, just as there is no such
thing as Continuous Availability.

There are such things as Discrete Data Streaming and High Availability, but
they only reduce the finite unavailability to a minimum. In terms of business
needs, 5 SIGMA is good enough and acceptable. Even candles set to a
predefined time interval, say 2, 4, or 15 seconds, overlap. No FX-savvy trader
makes a sell or buy decision on the basis of a 2-second candlestick.

The calculation itself is subject to finite measurement error, as defined by
its Confidence Level (CL) using the standard deviation.
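As a small illustration of that point, the error bound on a measured mean can be expressed as a confidence interval built from the standard deviation (a sketch; the candle close prices are invented sample numbers, not real data):

```python
import math
import statistics

# Hypothetical sample of 2-second candlestick closes (illustrative values only).
closes = [1.1012, 1.1015, 1.1011, 1.1018, 1.1014, 1.1016, 1.1013, 1.1017]

mean = statistics.mean(closes)
sd = statistics.stdev(closes)                 # sample standard deviation
z = 1.96                                      # z-score for ~95% confidence level
half_width = z * sd / math.sqrt(len(closes))  # half-width of the interval

print(f"mean={mean:.5f} +/- {half_width:.5f} at ~95% CL")
```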

So far I have never come across a tool that requires that level of
granularity. That stuff from Flink etc. is, in practical terms, of little
value and does not make commercial sense.

Now with regard to your needs, Spark micro batching is perfectly adequate.

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



>
> 27.4.2016, 15:51, Michael Segel kirjoitti:
>
> Spark and CEP? It depends…
>
> Ok, I know that’s not the answer you want to hear, but its a bit more
> complicated…
>
> If you consider Spark Streaming, you have some issues.
> Spark Streaming isn’t a Real Time solution because it is a micro batch
> solution. The smallest Window is 500ms.  This means that if your compute
> time is >> 500ms and/or  your event flow is >> 500ms this could work.
> (e.g. 'real time' image processing on a system that is capturing 60FPS
> because the processing time is >> 500ms. )
>
> So Spark Streaming wouldn’t be the best solution….
>
> However, you can combine spark with other technologies like Storm, Akka,
> etc .. where you have continuous streaming.
> So you could instantiate a spark context per worker in storm…
>
> I think if there are no class collisions between Akka and Spark, you could
> use Akka, which may have a better potential for communication between
> workers.
> So here you can handle CEP events.
>
> HTH
>

Re: Spark support for Complex Event Processing (CEP)

2016-04-27 Thread Esa Heikkinen


Hi

Thanks for the answer.

I have developed a log file analyzer for an RTPIS (Real Time Passenger 
Information System), where buses drive routes and the system tries to 
estimate the arrival times at the bus stops. There are many different 
log files (and events) and the analysis can be very complex. Spatial 
data can also be included in the log data.
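The core of that comparison, matching the system's estimated arrival times against the actual arrivals and measuring the error, can be sketched as a small offline job (a hypothetical sketch; the column names and sample rows are invented for illustration):

```python
import csv
import io

# Hypothetical "LOG C" rows: the central computer's estimated arrival times.
log_c = io.StringIO(
    "timestamp,bus,stop,estimated_arrival\n"
    "100,42,7,180\n"
    "150,42,7,175\n"
)
# Hypothetical actual arrival events, derived from bus coordinate logs.
actual_arrival = {(42, 7): 190}   # (bus, stop) -> observed arrival time

errors = []
for row in csv.DictReader(log_c):
    key = (int(row["bus"]), int(row["stop"]))
    if key in actual_arrival:
        # Positive error: the system estimated an earlier arrival than reality.
        errors.append(actual_arrival[key] - int(row["estimated_arrival"]))

print(errors)
```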


The analyzer also has a query (or analysis) language, which describes an 
expected behavior. This can be a requirement of the system. The analyzer 
can also be thought of as a test oracle.


I have published a paper (SPLST'15 conference) about my analyzer and its 
language. The paper is maybe too technical, but it can be found at:

http://ceur-ws.org/Vol-1525/paper-19.pdf

I do not know yet where it belongs. I think it can be seen as some kind of 
"CEP with delays". Or do you know a better term?
My analyzer can also do somewhat more complex and time-consuming analyses; 
there is no need for real time.

And is it possible to do "CEP with delays" reasonably with some existing 
tool (for example Spark)?
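The kind of rule such an analyzer evaluates, "event A followed by event B within some time", is easy to express over collected logs since no real-time constraint applies. A toy sketch (event names and timestamps are invented):

```python
# Minimal "A followed by B within T seconds" check over a collected event log,
# a toy illustration of the kind of rule a CEP-style analyzer evaluates.
events = [(0, "bus_login"), (30, "estimate_sent"), (200, "bus_arrived")]

def followed_by(events, first, second, within):
    """True if some `second` event occurs within `within` after a `first` event."""
    starts = [t for t, name in events if name == first]
    return any(
        t0 < t1 <= t0 + within
        for t0 in starts
        for t1, name in events if name == second
    )

print(followed_by(events, "bus_login", "estimate_sent", within=60))  # matches
print(followed_by(events, "bus_login", "bus_arrived", within=60))    # too late
```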


Regards
PhD student at Tampere University of Technology, Finland, www.tut.fi 


Esa Heikkinen


Re: Spark support for Complex Event Processing (CEP)

2016-04-27 Thread Mich Talebzadeh
please see my other reply

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com





Re: Spark support for Complex Event Processing (CEP)

2016-04-27 Thread Esa Heikkinen

Hi

I have followed with interest the discussion about CEP and Spark. It is 
quite close to my research, which is complex analysis of log files 
and "history" data (not actually real-time streams).

I have a few questions:

1) Is CEP only for (real-time) stream data and not for "history" data?

2) Is it possible to search "backward" (upstream) with CEP, given a time 
window whose start time is earlier than the current stream time?

3) Do you know any good tools or software for "CEP" on log data?

4) Do you know any good (scientific) papers I should read about CEP?
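On question 2, once events are stored as history rather than consumed as a live stream, a "backward" search is just a scan over an arbitrary window, including one starting before the current stream time. A sketch (the event tuples and window bounds are invented):

```python
# Each event is (timestamp, payload). Because the history is already
# collected, any window can be queried, regardless of the "current" time.
history = [(1, "login"), (5, "stop_7"), (9, "logout"), (12, "login")]

def events_in_window(events, start, end):
    """Return events with start <= timestamp < end."""
    return [e for e in events if start <= e[0] < end]

print(events_in_window(history, 4, 10))
```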


Regards
PhD student at Tampere University of Technology, Finland, www.tut.fi
Esa Heikkinen

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark support for Complex Event Processing (CEP)

2016-04-21 Thread Mich Talebzadeh
Hi Mario, I sorted that one out with Ted's help, thanks:

scalatest_2.11-2.2.6.jar


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Spark support for Complex Event Processing (CEP)

2016-04-21 Thread Mario Ds Briggs

Googling the Java error 'is not a member of package' and its related
searches suggested it is not a missing-jar problem, though I couldn't put a
finger on what exactly it is in your case.

Some hits are specifically about spark-shell as well:
http://spark-packages.org/package/databricks/spark-csv


thanks
Mario







On 20 April 2016 at 10:28, Mario Ds Briggs <mario.bri...@in.ibm.com> wrote:
  I did see your earlier post about Stratio decision. Will readup on it


  thanks
  Mario


  From: Alonso Isidoro Roman <alons...@gmail.com>
  To: Mich Talebzadeh <mich.talebza...@gmail.com>
  Cc: Mario Ds Briggs/India/IBM@IBMIN, Luciano Resende <
  luckbr1...@gmail.com>, "user @spark" <user@spark.apache.org>
  Date: 20/04/2016 02:24 pm
  Subject: Re: Spark support for Complex Event Processing (CEP)



  Stratio decision could do the job

  https://github.com/Stratio/Decision



  Alonso Isidoro Roman.

  Mis citas preferidas (de hoy) :
  "Si depurar es el proceso de quitar los errores de software, entonces
  programar debe ser el proceso de introducirlos..."
   -  Edsger Dijkstra

  My favorite quotes (today):
  "If debugging is the process of removing software bugs, then programming
  must be the process of putting ..."
    - Edsger Dijkstra

  "If you pay peanuts you get monkeys"


  2016-04-20 7:55 GMT+02:00 Mich Talebzadeh <mich.talebza...@gmail.com>:
Thanks a lot Mario. Will have a look.

Regards,


Dr Mich Talebzadeh

LinkedIn

https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw


http://talebzadehmich.wordpress.com




On 20 April 2016 at 06:53, Mario Ds Briggs <mario.bri...@in.ibm.com
> wrote:
Hi Mich,

Info is here - https://issues.apache.org/jira/browse/SPARK-14745

overview is in the pdf -

https://issues.apache.org/jira/secure/attachment/12799670/SparkStreamingCEP.pdf


Usage examples not in the best place for now (will make it better)
-

https://github.com/agsachin/spark/blob/CEP/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala#L532


Your feedback is appreciated.


thanks
Mario


From: Mich Talebzadeh <mich.talebza...@gmail.com>
To: Mario Ds Briggs/India/IBM@IBMIN
Cc: "user @spark" <user@spark.apache.org>, Luciano Resende <
luckbr1...@gmail.com>
Date: 19/04/2016 12:45 am
Subject: Re: Spark support for Complex Event Processing (CEP)




great stuff Mario. Much appreciated.

Mich

Dr Mich Talebzadeh

LinkedIn

https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw


http://talebzadehmich.wordpress.com




On 18 April 2016 at 20:08, Mario Ds Briggs <mario.bri...@in.ibm.com
> wrote:
Hey Mich, Luciano

Will provide links with docs by tomorrow

thanks
Mario

- Message from Mich Talebzadeh <
mich.talebza...@gmail.com> on Sun, 17 Ap

Re: Spark support for Complex Event Processing (CEP)

2016-04-21 Thread Mich Talebzadeh
Hi,

Following example in

https://github.com/agsachin/spark/blob/CEP/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala#L532

Does anyone know which jar file this belongs to?

I use scalatest_2.11-2.2.6.jar in my spark-shell

 spark-shell --master spark://50.140.197.217:7077 --jars
,/home/hduser/jars/junit-4.12.jar,/home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar,
/home/hduser/jars/scalatest_2.11-2.2.6.jar'

scala> import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll}
:28: error: object scalatest is not a member of package org
 import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll}
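For what it's worth, the --jars value above looks malformed: it starts with a stray comma, has a space after a comma, and ends with a quote character, so the scalatest jar likely never reached the classpath at all. Building the list programmatically avoids such slips (a sketch; the paths are the ones from the message):

```python
# Assemble the --jars argument with no leading comma, spaces, or stray quotes.
jars = [
    "/home/hduser/jars/junit-4.12.jar",
    "/home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar",
    "/home/hduser/jars/scalatest_2.11-2.2.6.jar",
]
jars_arg = ",".join(jars)
cmd = f"spark-shell --master spark://50.140.197.217:7077 --jars {jars_arg}"
print(cmd)
```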

Thanks


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 20 April 2016 at 10:28, Mario Ds Briggs <mario.bri...@in.ibm.com> wrote:

> I did see your earlier post about Stratio decision. Will readup on it
>
>
> thanks
> Mario
>
>
> From: Alonso Isidoro Roman <alons...@gmail.com>
> To: Mich Talebzadeh <mich.talebza...@gmail.com>
> Cc: Mario Ds Briggs/India/IBM@IBMIN, Luciano Resende <luckbr1...@gmail.com>,
> "user @spark" <user@spark.apache.org>
> Date: 20/04/2016 02:24 pm
> Subject: Re: Spark support for Complex Event Processing (CEP)
> --
>
>
>
> Stratio decision could do the job
>
> *https://github.com/Stratio/Decision*
> <https://github.com/Stratio/Decision>
>
>
>
> Alonso Isidoro Roman.
>
> Mis citas preferidas (de hoy) :
> "Si depurar es el proceso de quitar los errores de software, entonces
> programar debe ser el proceso de introducirlos..."
>  -  Edsger Dijkstra
>
> My favorite quotes (today):
> "If debugging is the process of removing software bugs, then programming
> must be the process of putting ..."
>   - Edsger Dijkstra
>
> "If you pay peanuts you get monkeys"
>
>
> 2016-04-20 7:55 GMT+02:00 Mich Talebzadeh <*mich.talebza...@gmail.com*
> <mich.talebza...@gmail.com>>:
>
>Thanks a lot Mario. Will have a look.
>
>Regards,
>
>
>Dr Mich Talebzadeh
>
>LinkedIn
>
> *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw*
>
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>
>*http://talebzadehmich.wordpress.com*
><http://talebzadehmich.wordpress.com/>
>
>
>
>On 20 April 2016 at 06:53, Mario Ds Briggs <*mario.bri...@in.ibm.com*
><mario.bri...@in.ibm.com>> wrote:
>Hi Mich,
>
>Info is here - *https://issues.apache.org/jira/browse/SPARK-14745*
><https://issues.apache.org/jira/browse/SPARK-14745>
>
>overview is in the pdf -
>
> *https://issues.apache.org/jira/secure/attachment/12799670/SparkStreamingCEP.pdf*
>
> <https://issues.apache.org/jira/secure/attachment/12799670/SparkStreamingCEP.pdf>
>
>Usage examples not in the best place for now (will make it better) -
>
> *https://github.com/agsachin/spark/blob/CEP/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala#L532*
>
> <https://github.com/agsachin/spark/blob/CEP/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala#L532>
>
>Your feedback is appreciated.
>
>
>thanks
>Mario
>
>[image: Inactive hide details for Mich Talebzadeh ---19/04/2016
>12:45:52 am---great stuff Mario. Much appreciated. Mich]Mich
>Talebzadeh ---19/04/2016 12:45:52 am---great stuff Mario. Much appreciated.
>Mich
>
>From: Mich Talebzadeh <*mich.talebza...@gmail.com*
><mich.talebza...@gmail.com>>
>To: Mario Ds Briggs/India/IBM@IBMIN
>Cc: "user @spark" <*user@spark.apache.org* <user@spark.apache.org>>,
>Luciano Resende <*luckbr1...@gmail.com* <luckbr1...@gmail.com>>
>Date: 19/04/2016 12:45 am
>Subject: Re: Spark support for Complex Event Processing (CEP)
>--
>
>
>
>
>great stuff Mario. Much appreciated.
>
>Mich
>
>Dr Mich Talebzadeh
>
>LinkedIn
>
> *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6Ac

Re: Spark support for Complex Event Processing (CEP)

2016-04-20 Thread Mario Ds Briggs

I did see your earlier post about Stratio Decision. Will read up on it.


thanks
Mario



From:   Alonso Isidoro Roman <alons...@gmail.com>
To: Mich Talebzadeh <mich.talebza...@gmail.com>
Cc: Mario Ds Briggs/India/IBM@IBMIN, Luciano Resende
<luckbr1...@gmail.com>, "user @spark" <user@spark.apache.org>
Date:   20/04/2016 02:24 pm
Subject:    Re: Spark support for Complex Event Processing (CEP)



Stratio Decision could do the job

https://github.com/Stratio/Decision



Alonso Isidoro Roman.

Mis citas preferidas (de hoy) :
"Si depurar es el proceso de quitar los errores de software, entonces
programar debe ser el proceso de introducirlos..."
 -  Edsger Dijkstra

My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming
must be the process of putting ..."
  - Edsger Dijkstra

"If you pay peanuts you get monkeys"


Re: Spark support for Complex Event Processing (CEP)

2016-04-19 Thread Mich Talebzadeh
Thanks a lot Mario. Will have a look.

Regards,


Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Spark support for Complex Event Processing (CEP)

2016-04-19 Thread Mario Ds Briggs

Hi Mich,

Info is here - https://issues.apache.org/jira/browse/SPARK-14745

overview is in the pdf -
https://issues.apache.org/jira/secure/attachment/12799670/SparkStreamingCEP.pdf

Usage examples not in the best place for now (will make it better) -
https://github.com/agsachin/spark/blob/CEP/external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala#L532

Your feedback is appreciated.


thanks
Mario




Re: Spark support for Complex Event Processing (CEP)

2016-04-18 Thread Mich Talebzadeh
great stuff Mario. Much appreciated.

Mich

Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Spark support for Complex Event Processing (CEP)

2016-04-18 Thread Mario Ds Briggs

Hey Mich, Luciano

 Will provide links with docs by tomorrow

thanks
Mario



Re: Spark support for Complex Event Processing (CEP)

2016-04-17 Thread Mich Talebzadeh
Thanks Luciano. Appreciated.

Regards

Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com





Re: Spark support for Complex Event Processing (CEP)

2016-04-17 Thread Luciano Resende
Hi Mitch,

I know some folks that were investigating/prototyping on this area, let me
see if I can get them to reply here with more details.

On Sun, Apr 17, 2016 at 1:54 AM, Mich Talebzadeh 
wrote:

> Hi,
>
> Has Spark got libraries for CEP using Spark Streaming with Kafka by any
> chance?
>
> I am looking at Flink, which is supposed to have these libraries for CEP,
> but I find Flink itself very much a work in progress.
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
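[Editorial note] The question that started the thread, CEP-style libraries on top of a stream, comes down to detecting temporal patterns such as "event A followed by event B within T time units". A framework-free sketch of that core operation (plain Python, not any Spark or Flink API; names are illustrative):

```python
from collections import deque

def match_followed_by(events, first, second, within):
    """Emit (t_first, t_second) pairs where an event named `first` is
    followed by one named `second` within `within` time units.
    `events` is an iterable of (timestamp, name), assumed time-ordered."""
    pending = deque()  # timestamps of `first` events still awaiting a `second`
    matches = []
    for ts, name in events:
        # Expire openings whose window can no longer close in time.
        while pending and ts - pending[0] > within:
            pending.popleft()
        if name == first:
            pending.append(ts)
        elif name == second and pending:
            matches.append((pending.popleft(), ts))
    return matches

events = [(0, "login"), (2, "error"), (10, "login"), (25, "error")]
print(match_followed_by(events, "login", "error", within=5))  # -> [(0, 2)]
```

CEP engines generalize this to richer pattern grammars (negation, repetition, sliding windows), but the bookkeeping is the same: keep partial matches alive only while their time window can still close.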