Re: Listening to timed-out patterns in Flink CEP

2016-10-07 Thread lgfmt
The following is a better link:

http://mail-archives.apache.org/mod_mbox/flink-user/201609.mbox/%3CCAC27z%3DOTtv7USYUm82bE43-DkoGfVC4UAWD6uQwwRgTsE5be8g%40mail.gmail.com%3E


- LF
 



  From: "lg...@yahoo.com" 
 To: "user@flink.apache.org"  
 Sent: Friday, October 7, 2016 3:36 PM
 Subject: Re: Listening to timed-out patterns in Flink CEP
   
Won't the upcoming CEP negation (absence of an event) feature solve this issue?
See this discussion thread:
http://mail-archives.apache.org/mod_mbox/flink-user/201609.mbox/%3CCAC27z%3DOD%2BTq8twBw_1YKni5sWAU3g1S9WDpJw0DUwgiG9YX9Fg%40mail.gmail.com%3E

 //  Atul

  From: Till Rohrmann 
 To: user@flink.apache.org 
 Sent: Friday, October 7, 2016 12:58 AM
 Subject: Re: Listening to timed-out patterns in Flink CEP
  
Hi David,
in case of event time, the timeout will be detected when the first watermark 
exceeding the timeout value is received. Thus, it depends a little bit on how you 
generate watermarks (e.g. periodically, or a watermark per event).
In case of processing time, the time is only updated whenever a new element 
arrives. Thus, if you have an element arriving 4 seconds after Event A, it 
should detect the timeout. If the next event arrives 20 seconds later, then you 
won't see the timeout until then.
In the case of processing time, we could think about registering timeout timers 
for processing time. However, I would highly recommend using event time, 
because with processing time, Flink cannot guarantee meaningful computations, 
since the events might arrive out of order.
Cheers,
Till
On Thu, Oct 6, 2016 at 3:08 PM, David Koch  wrote:

Hello,
With Flink CEP, is there a way to actively listen to pattern matches that time 
out? I am under the impression that this is not possible.

In my case I partition a stream containing user web navigation by "userId" to 
look for sequences of Event A, followed by B within 4 seconds for each user.
I registered a PatternTimeoutFunction which, assuming a non-match, only fires 
upon the first event after the specified timeout. For example, given user X: 
Event A, then 20 seconds later Event B (or any other type of event).
I'd rather have a notification fire directly upon the 4 second interval 
expiring since passive invalidation is not really applicable in my case.
How, if at all, can this be achieved with Flink CEP?
Thanks,
David




   

   

Re: more complex patterns for CEP (was: CEP two transitions to the same state)

2016-10-07 Thread lgfmt
Hi Till,
Thanks for the detailed response.
I'm looking forward to seeing these features implemented in Flink. Can anyone 
provide timelines for the 3 tickets that you mentioned in your response?

- LF




  From: Till Rohrmann 
 To: user@flink.apache.org 
 Sent: Tuesday, September 20, 2016 7:13 AM
 Subject: Re: more complex patterns for CEP (was: CEP two transitions to the 
same state)
   
Hi Frank,
thanks for sharing your analysis. It indeed pinpoints some of the current CEP 
library's shortcomings.
Let me address your points:
1. Lack of not operator
The functionality to express events which must not occur in a pattern is 
missing. We currently have a JIRA [1] which addresses exactly this. For the 
notFollowedBy operator, we should discard all patterns where we've seen a 
matching event for the not state. I think it could be implemented like a 
special terminal state where we prune the partial pattern.
For the notNext operator, we could think about keeping the event which has not 
matched the notNext state and return it as part of the fully matched pattern. 
Alternatively, we could simply forget about it once we've assured that it does 
not match.
2. Allow functions to access fields of previous events
This hasn't been implemented yet because it is a quite expensive operation. 
Before calling the filter function you always have to reconstruct the current 
partial pattern and then give it to the filter function. But I agree that the 
user should be allowed to use such a functionality (and then pay the price for 
it in terms of efficiency). Giving access to the partially matched fields via a 
Map would be a way to solve the problem on the API level.
I think that almost all functionality for this feature is already in place. We 
would simply have to check whether the filter conditions require access to 
previous events and then compute the partial pattern.
3. Support for recursive patterns
The underlying SharedBuffer implementation should allow recursive event 
patterns. Once we have support for branching CEP patterns [2], which allow 
connecting different states, this should also be possible with some minor changes.
However, a more interesting way to specify recursive CEP patterns is to use 
regular expression syntax (Kleene star, bounded occurrences) to express 
recursive parts of a pattern. I think this makes specifying such a pattern 
easier and more intuitive for the user. We also have a JIRA issue to track the 
progress there [3], and Ivan is already working on this.
If you want to get involved in Flink's CEP development, then feel free to take 
over any free JIRA issue or create one yourself :-)
[1] https://issues.apache.org/jira/browse/FLINK-3320
[2] https://issues.apache.org/jira/browse/FLINK-4641
[3] https://issues.apache.org/jira/browse/FLINK-3318
Cheers,
Till
On Fri, Sep 16, 2016 at 10:04 PM, Frank Dekervel  wrote:

Hello,
I did some more analysis w.r.t. the problem I'm facing and the Flink CEP API.
To solve the problem I'm facing using Flink CEP, I would need 3 additions to the 
API (I think). I tried to understand the NFA logic, and I think 2 of them should 
be doable without fundamental changes.
The first is to add a "negative" pattern (notFollowedBy / notNext):
The reason is the flow below: I have a start and a termination event, and an 
optional "failure" event in between. I want all successful termination events, 
so I want to express that there should not be a failure event between the start 
and the termination event. Note that there is no "success" event in this case on 
which I could match.


To implement this, upon checking whether a transition would be possible, one would 
first need to check that it was not already dead-ended by a notFollowedBy / 
notNext. This would add a bit of complexity to the logic (when checking whether a 
transition is valid for a state, first check whether a match was already made on 
this state to a notFollowedBy/notNext state; in that case one would reject the 
match).
The second is to allow the filter function to inspect the partial match made, so 
one would be able to filter based on the already-matched events. The reason is the 
following (hypothetical) example where we would match arrivals of trains in a 
station. We cannot keyBy train (because the "occupied" events of the station 
don't have train information), nor can we keyBy station (as the start of 
the sequence is outside the station), so we need to add an additional condition 
for the second event: the train number must equal the train number of the first 
one. And in the third event, the station number should equal the station number 
of the second one.
I think this could be accomplished by overloading the where function with a 
second filter-function variant that takes 2 parameters: the event considered + 
the partial match (as a Map with T the class of the event).


The third one is - I think - more difficult to accomplish, and that's the more 
complex graphs I asked about in my 

Re: Listening to timed-out patterns in Flink CEP

2016-10-07 Thread lgfmt
Won't the upcoming CEP negation (absence of an event) feature solve this issue?
See this discussion thread:
http://mail-archives.apache.org/mod_mbox/flink-user/201609.mbox/%3CCAC27z%3DOD%2BTq8twBw_1YKni5sWAU3g1S9WDpJw0DUwgiG9YX9Fg%40mail.gmail.com%3E

 //  Atul

  From: Till Rohrmann 
 To: user@flink.apache.org 
 Sent: Friday, October 7, 2016 12:58 AM
 Subject: Re: Listening to timed-out patterns in Flink CEP
   
Hi David,
in case of event time, the timeout will be detected when the first watermark 
exceeding the timeout value is received. Thus, it depends a little bit on how you 
generate watermarks (e.g. periodically, or a watermark per event).
In case of processing time, the time is only updated whenever a new element 
arrives. Thus, if you have an element arriving 4 seconds after Event A, it 
should detect the timeout. If the next event arrives 20 seconds later, then you 
won't see the timeout until then.
In the case of processing time, we could think about registering timeout timers 
for processing time. However, I would highly recommend using event time, 
because with processing time, Flink cannot guarantee meaningful computations, 
since the events might arrive out of order.
Cheers,
Till
On Thu, Oct 6, 2016 at 3:08 PM, David Koch  wrote:

Hello,
With Flink CEP, is there a way to actively listen to pattern matches that time 
out? I am under the impression that this is not possible.

In my case I partition a stream containing user web navigation by "userId" to 
look for sequences of Event A, followed by B within 4 seconds for each user.
I registered a PatternTimeoutFunction which, assuming a non-match, only fires 
upon the first event after the specified timeout. For example, given user X: 
Event A, then 20 seconds later Event B (or any other type of event).
I'd rather have a notification fire directly upon the 4 second interval 
expiring since passive invalidation is not really applicable in my case.
How, if at all, can this be achieved with Flink CEP?
Thanks,
David




   

Re: readCsvFile

2016-10-07 Thread Fabian Hueske
I would check that the field delimiter is correctly set.

With the correct delimiter your code would give

((a),1)
((aa),1)

because the single field is wrapped in a Tuple1.
You have to unwrap it in the map function: .map { t => (t._1, 1) }
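
For reference, a minimal end-to-end sketch of that combination (untested; the
file path is hypothetical, and as far as I can tell the includedFields indices
are 0-based, so Array(0) selects the first column):

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Read only the first CSV column as a Tuple1[String]; fieldDelimiter must match
// the file, otherwise the whole line ends up in a single field.
val firstColumn = env.readCsvFile[Tuple1[String]](
  "file:///path/to/data.csv",     // hypothetical path
  fieldDelimiter = ",",
  includedFields = Array(0))      // 0 selects the first column

// Unwrap the Tuple1 before counting.
val counts = firstColumn.map(t => (t._1, 1))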

2016-10-07 18:08 GMT+02:00 Alberto Ramón :

> Hmm,
>
> Your solution compiles without errors, but includedFields isn't working:
> [image: inline image 1]
>
> The output is incorrect:
> [image: inline image 2]
>
> The correct result must contain only the 1st column:
> (a,1)
> (aa,1)
>
> 2016-10-06 21:37 GMT+02:00 Fabian Hueske :
>
>> Hi Alberto,
>>
>> if you want to read a single column you have to wrap it in a Tuple1:
>>
>> val text4 = env.readCsvFile[Tuple1[String]]("file:data.csv" ,includedFields 
>> = Array(1))
>>
>> Best, Fabian
>>
>> 2016-10-06 20:59 GMT+02:00 Alberto Ramón :
>>
>>> I'm learning readCsvFile
>>> (I discovered that if the file ends with "\n", you get a null exception)
>>>
>>> *if I try to read only 1 column *
>>>
>>> val text4 = env.readCsvFile[String]("file:data.csv" ,includedFields = 
>>> Array(1))
>>>
>>> The error is: The type String has to be a tuple or pojo type. [null]
>>>
>>>
>>>
>>>
>>> *If I put > 1 column (1st and 2nd in this case)*
>>>
>>> val text4 = env.readCsvFile [(String,String)]("data.csv"
>>>   ,fieldDelimiter = ","
>>>   ,includedFields = Array(0,1))
>>>
>>> It reads all columns from the CSV (3 in my example)
>>>
>>>
>>>
>>>
>>
>


Re: jdbc.JDBCInputFormat

2016-10-07 Thread Fabian Hueske
As the exception says, the class
org.apache.flink.api.scala.io.jdbc.JDBCInputFormat does not exist.

You have to do:

import org.apache.flink.api.java.io.jdbc.JDBCInputFormat

There is no Scala implementation of this class but you can also use Java
classes in Scala.
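
For completeness, a short sketch of what that looks like from Scala (untested;
the driver class, URL, and query below are hypothetical placeholders):

// Note: the java package, not scala - this is the import that works.
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat

val jdbcInput = JDBCInputFormat.buildJDBCInputFormat()
  .setDrivername("org.postgresql.Driver")              // hypothetical driver
  .setDBUrl("jdbc:postgresql://localhost:5432/mydb")   // hypothetical URL
  .setQuery("SELECT id, name FROM users")              // hypothetical query
  .finish()

// The format is then passed to env.createInput(...), together with type
// information matching the arity and types of the selected columns.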

2016-10-07 21:38 GMT+02:00 Alberto Ramón :

>
> I want to use createInput + buildJDBCInputFormat to access a database from Scala
>
> PB1:
>
> import org.apache.flink.api.scala.io.jdbc.JDBCInputFormat
> Error:(25, 37) object jdbc is not a member of package 
> org.apache.flink.api.java.io
> import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
>
> Then, I can't use:
> [image: inline image 1]
>
> I also tried to download the code from git and recompile it.
>
>


jdbc.JDBCInputFormat

2016-10-07 Thread Alberto Ramón
I want to use createInput + buildJDBCInputFormat to access a database from Scala

PB1:

import org.apache.flink.api.scala.io.jdbc.JDBCInputFormat
Error:(25, 37) object jdbc is not a member of package
org.apache.flink.api.java.io
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat

Then, I can't use:
[image: inline image 1]

I also tried to download the code from git and recompile it.


Re: Data Transfer between TM should be encrypted

2016-10-07 Thread vinay patil
Hi Stephan,

https://github.com/apache/flink/pull/2518
Is this pull request going to be part of the 1.2 release? Just wanted to get
an idea of the timeline so that I can pass it on to the team.


Regards,
Vinay Patil

On Thu, Sep 15, 2016 at 11:45 AM, Vijay Srinivasaraghavan <
vijikar...@yahoo.com> wrote:

> Hi Vinay,
>
> There are some delays and we expect the PR to be created next week.
>
> Regards
> Vijay
>
> On Wednesday, September 14, 2016 5:41 PM, vinay patil <
> vinay18.pa...@gmail.com> wrote:
>
>
> Hi Vijay,
>
> Did you raise the PR for this task, I don't mind testing it out as well.
>
> Regards,
> Vinay Patil
>
> On Tue, Aug 30, 2016 at 6:28 PM, Vinay Patil <[hidden email]> wrote:
>
> Hi Vijay,
>
> That's a good news for me. Eagerly waiting for this change so that I can
> integrate and test it before going live.
>
> Regards,
> Vinay Patil
>
> On Tue, Aug 30, 2016 at 4:06 PM, Vijay Srinivasaraghavan [via Apache Flink
> User Mailing List archive.] <[hidden email]> wrote:
>
> Hi Stephan,
>
> The dev work is almost complete except the Yarn mode deployment stuff that
> needs to be patched. We are expecting to send a PR in a week or two.
>
> Regards
> Vijay
>
>
> On Tuesday, August 30, 2016 12:39 AM, Stephan Ewen <[hidden email]> wrote:
>
>
> Let me loop in Vijay, I think he is the one working on this and can
> probably give the best estimate when it can be expected.
>
> @vijay: For the SSL/TLS transport encryption - do you have an estimate for
> the timeline of that feature?
>
>
> On Mon, Aug 29, 2016 at 8:54 PM, vinay patil <[hidden email]> wrote:
>
> Hi Stephan,
>
> Thank you for your reply.
>
> Till when can I expect this feature to be integrated in master or release
> version ?
>
> We are going to get production data (financial data) in October end , so
> want to have this feature before that.
>
> Regards,
> Vinay Patil
>
> On Mon, Aug 29, 2016 at 11:15 AM, Stephan Ewen [via Apache Flink User
> Mailing List archive.] <[hidden email]> wrote:
>
> Hi!
>
> The way that the JIRA issue you linked will achieve this is by hooking
> into the network stream pipeline directly, and encrypting the raw network byte
> stream. We built the network stack on Netty, and will use Netty's SSL/TLS
> handlers for that.
>
> That should be much more efficient than manual encryption/decryption in
> each user function.
>
> Stephan
>
>
>
>
>
>
> On Mon, Aug 29, 2016 at 6:12 PM, vinay patil <[hidden email]> wrote:
>
> Hi Ufuk,
>
> This is regarding this issue
> https://issues.apache.org/jira/browse/FLINK-4404
>
>
> How can we achieve this, I am able to decrypt the data from Kafka coming
> in, but I want to make sure that the data is encrypted when flowing between
> TM's.
>
> One approach I can think of is to decrypt the data at the start of each
> operator and encrypt it at the end of each operator, but I feel this is not
> an efficient approach.
>
> I just want to check if there are alternatives to this and can this be
> achieved by doing some configurations.
>
> Regards,
> Vinay Patil
>

Re: readCsvFile

2016-10-07 Thread Alberto Ramón
Hmm,

Your solution compiles without errors, but includedFields isn't working:
[image: inline image 1]

The output is incorrect:
[image: inline image 2]

The correct result must contain only the 1st column:
(a,1)
(aa,1)

2016-10-06 21:37 GMT+02:00 Fabian Hueske :

> Hi Alberto,
>
> if you want to read a single column you have to wrap it in a Tuple1:
>
> val text4 = env.readCsvFile[Tuple1[String]]("file:data.csv" ,includedFields = 
> Array(1))
>
> Best, Fabian
>
> 2016-10-06 20:59 GMT+02:00 Alberto Ramón :
>
>> I'm learning readCsvFile
>> (I discovered that if the file ends with "\n", you get a null exception)
>>
>> *if I try to read only 1 column *
>>
>> val text4 = env.readCsvFile[String]("file:data.csv" ,includedFields = 
>> Array(1))
>>
>> The error is: The type String has to be a tuple or pojo type. [null]
>>
>>
>>
>>
>> *If I put > 1 column (1st and 2nd in this case)*
>>
>> val text4 = env.readCsvFile [(String,String)]("data.csv"
>>   ,fieldDelimiter = ","
>>   ,includedFields = Array(0,1))
>>
>> It reads all columns from the CSV (3 in my example)
>>
>>
>>
>>
>


Re: Compression for AvroOutputFormat

2016-10-07 Thread Kostas Kloudas
Hi Lars,

As far as I know there are no plans to do so in the near future, 
but every contribution is welcome. 

Looking forward to your Pull Request.

Regards,
Kostas

> On Oct 7, 2016, at 12:40 PM, lars.bachm...@posteo.de wrote:
> 
> Hi,
> 
> at the moment it is not possible to set a compression for the 
> AvroOutputFormat. There is a post in the mailing list from April this year 
> about the same topic, but it seems that nothing has happened so far. Are there 
> any plans to add this feature? Otherwise I could contribute this code.
> 
> Regards,
> 
> Lars
> 
> 



Tuple vs Row

2016-10-07 Thread Flavio Pompermaier
Hi to all,

is there any performance degradation using Row instead of Tuple objects in
Flink?

Best,
Flavio


Compression for AvroOutputFormat

2016-10-07 Thread lars . bachmann

Hi,

at the moment it is not possible to set a compression for the 
AvroOutputFormat. There is a post in the mailing list from April this 
year about the same topic, but it seems that nothing has happened so far. 
Are there any plans to add this feature? Otherwise I could contribute 
this code.


Regards,

Lars




Re: Listening to timed-out patterns in Flink CEP

2016-10-07 Thread Till Rohrmann
Hi David,

in case of event time, the timeout will be detected when the first
watermark exceeding the timeout value is received. Thus, it depends a
little bit on how you generate watermarks (e.g. periodically, or a watermark
per event).

In case of processing time, the time is only updated whenever a new element
arrives. Thus, if you have an element arriving 4 seconds after Event A, it
should detect the timeout. If the next event arrives 20 seconds later, then
you won't see the timeout until then.

In the case of processing time, we could think about registering timeout
timers for processing time. However, I would highly recommend using
event time, because with processing time, Flink cannot guarantee meaningful
computations, since the events might arrive out of order.

Cheers,
Till
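
As an illustration of the event-time point above, a minimal sketch (untested,
Scala; the Event case class, its ts field, and the source are hypothetical) of
how the watermark generator governs when a CEP timeout can surface: the
PatternTimeoutFunction for "A not followed by B within 4s" is only invoked once
a watermark passes A's timestamp + 4 seconds.

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark

case class Event(userId: String, kind: String, ts: Long)  // hypothetical event type

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.getConfig.setAutoWatermarkInterval(200)   // ask for a watermark every 200 ms

val events: DataStream[Event] = ???           // hypothetical source (e.g. Kafka)

val withTimestamps = events.assignTimestampsAndWatermarks(
  new AssignerWithPeriodicWatermarks[Event] {
    private var maxTs = Long.MinValue
    override def extractTimestamp(e: Event, previous: Long): Long = {
      maxTs = math.max(maxTs, e.ts)
      e.ts
    }
    // The watermark trails the highest timestamp seen so far; once it exceeds
    // (A.ts + 4s) for a pending partial match, the timeout side fires.
    override def getCurrentWatermark(): Watermark = new Watermark(maxTs)
  })

val byUser = withTimestamps.keyBy(_.userId)   // this keyed stream then feeds CEP.pattern(...)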

On Thu, Oct 6, 2016 at 3:08 PM, David Koch  wrote:

> Hello,
>
> With Flink CEP, is there a way to actively listen to pattern matches that
> time out? I am under the impression that this is not possible.
>
> In my case I partition a stream containing user web navigation by "userId"
> to look for sequences of Event A, followed by B within 4 seconds for each
> user.
>
> I registered a PatternTimeoutFunction which, assuming a non-match, only
> fires upon the first event after the specified timeout. For example, given
> user X: Event A, then 20 seconds later Event B (or any other type of event).
>
> I'd rather have a notification fire directly upon the 4 second interval
> expiring since passive invalidation is not really applicable in my case.
>
> How, if at all, can this be achieved with Flink CEP?
>
> Thanks,
>
> David
>
>


Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream

2016-10-07 Thread Fabian Hueske
If you are using time windows, you can access the TimeWindow parameter of
the WindowFunction.apply() method.
The TimeWindow contains the start and end timestamp of a window (as Long)
which can act as keys.

If you are using count windows, I think you have to use a counter as you
described.
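
To make the time-window variant concrete, a rough sketch (untested, Scala; the
Reading type, its fields, and the 10-second window size are hypothetical
placeholders):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

case class Reading(partitionKey: String, value: Long)   // hypothetical event type

val env = StreamExecutionEnvironment.getExecutionEnvironment
val parallelism = env.getParallelism
val readings: DataStream[Reading] = ???                  // the randomly partitioned input

// Per-partition window: emit (window start timestamp, local aggregate).
val perPartition = readings
  .keyBy(_.partitionKey)
  .timeWindow(Time.seconds(10))
  .apply { (key: String, window: TimeWindow, events: Iterable[Reading],
            out: Collector[(Long, Long)]) =>
    out.collect((window.getStart, events.map(_.value).sum))
  }

// Global merge: the window start acts as the "window id"; close the global
// (count) window once one result per parallel window instance has arrived.
val merged = perPartition
  .keyBy(_._1)
  .countWindow(parallelism)
  .sum(1)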


2016-10-07 1:06 GMT+02:00 AJ Heller :

> Thank you Fabian, I think that solves it. I'll need to rig up some tests
> to verify, but it looks good.
>
> I used a RichMapFunction to assign ids incrementally to windows (mapping
> STREAM_OBJECT to Tuple2 using a private long value in
> the mapper that increments on every map call). It works, but by any chance
> is there a more succinct way to do it?
>
> On Thu, Oct 6, 2016 at 1:50 PM, Fabian Hueske  wrote:
>
>> Maybe this can be done by assigning the same window id to each of the N
>> local windows, and do a
>>
>> .keyBy(windowId)
>> .countWindow(N)
>>
>> This should create a new global window for each window id and collect all
>> N windows.
>>
>> Best, Fabian
>>
>> 2016-10-06 22:39 GMT+02:00 AJ Heller :
>>
>>> The goal is:
>>>  * to split data, random-uniformly, across N nodes,
>>>  * window the data identically on each node,
>>>  * transform the windows locally on each node, and
>>>  * merge the N parallel windows into a global window stream, such that
>>> one window from each parallel process is merged into a "global window"
>>> aggregate
>>>
>>> I've achieved all but the last bullet point, merging one window from
>>> each partition into a globally-aggregated window output stream.
>>>
>>> To be clear, a rolling reduce won't work because it would aggregate over
>>> all previous windows in all partitioned streams, and I only need to
>>> aggregate over one window from each partition at a time.
>>>
>>> Similarly for a fold.
>>>
>>> The closest I have found is ParallelMerge for ConnectedStreams, but I
>>> have not found a way to apply it to this problem. Can Flink achieve this?
>>> If so, I'd greatly appreciate a point in the right direction.
>>>
>>> Cheers,
>>> -aj
>>>
>>
>>
>