Re: flink sql timed-window join throw "mismatched type" AssertionError on rowtime column

2018-03-07 Thread Yan Zhou [FDS Science]
Hi Xingcan,


Thanks for your help. Attached is sample code that can reproduce the problem.

While writing the sample code, I noticed that if I remove the `distinct` keyword 
from the select clause, the AssertionError doesn't occur.


String sql1 = "select distinct id, eventTs, count(*) over (partition by id 
order by eventTs rows between 100 preceding and current row) as cnt1 from 
myTable";
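For reference only (this is not necessarily a fix for the AssertionError): a time-windowed 
join between the two over-aggregated results can also be expressed directly in SQL with an 
explicit bound on the rowtime attributes. A sketch, assuming both results are registered as 
tables and keep their rowtime columns; table and column names are illustrative:

String joinSql =
    "SELECT l.id, l.eventTs, l.cnt1, r.cnt1 AS r_cnt1 " +
    "FROM leftTable AS l, rightTable AS r " +
    "WHERE l.id = r.id " +
    "AND l.eventTs BETWEEN r.eventTs - INTERVAL '10' SECOND AND r.eventTs";

Table joined = tableEnv.sqlQuery(joinSql);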

Best
Yan

From: xccui-foxmail 
Sent: Wednesday, March 7, 2018 8:10 PM
To: Yan Zhou [FDS Science]
Cc: user@flink.apache.org
Subject: Re: flink sql timed-window join throw "mismatched type" AssertionError 
on rowtime column

Hi Yan,

I’d like to look into this. Can you share more about your queries and the full 
stack trace?

Thanks,
Xingcan

On 8 Mar 2018, at 11:28 AM, Yan Zhou [FDS Science] wrote:

Hi experts,
I am using the Flink Table API to join two tables, which are DataStreams underneath. 
However, I got an assertion error of "java.lang.AssertionError: mismatched type 
$1 TIMESTAMP(3)" on the rowtime column. Below are more details:

There is only one Kafka data source, which is then converted to a Table and 
registered. One existing column is set as the rowtime (event time) attribute. Two 
over-window aggregation queries are run against the table and two tables are 
created as results. Everything works great so far.
However, when timed-window joining the two result tables with the inherited rowtime, 
Calcite throws the "java.lang.AssertionError: mismatched type $1 TIMESTAMP(3)" 
AssertionError. Can someone let me know what the possible cause is? FYI, I 
renamed the rowtime column for one of the result tables.


DataStream dataStream = env.addSource(kafkaConsumer);

Table table = tableEnv.fromDataStream(dataStream, "col1", "col2", ...);

tableEnv.registerTable(tableName, table);

Table left = tableEnv.sqlQuery("select id, eventTime,count (*) over ...  from 
...");

Table right = tableEnv.sqlQuery("select id as r_id, eventTime as r_event_time, 
count (*) over ...  from ...");

left.join(right).where("id = r_id && eventTime === r_event_time")

.addSink(...); // here Calcite throws the exception: java.lang.AssertionError: 
mismatched type $1 TIMESTAMP(3)
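For context, a sketch of how the rowtime attribute is typically declared when the DataStream 
is converted to a Table (assuming timestamps and watermarks are already assigned on the 
stream; field names are illustrative). The ".rowtime" suffix is what turns eventTime into 
the TimeIndicatorTypeInfo(rowtime) column shown below:

Table table = tableEnv.fromDataStream(dataStream, "id, eventTime.rowtime, col1, col2");
tableEnv.registerTable("myTable", table);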

source table
 |-- id: Long
 |-- eventTime: TimeIndicatorTypeInfo(rowtime)
 |-- ...
 |-- ...

result_1 table
 |-- id: Long
 |-- eventTime: TimeIndicatorTypeInfo(rowtime)
 |-- ...
 |-- ...

result_2 table
 |-- rid: Long
 |-- r_event_time: TimeIndicatorTypeInfo(rowtime)
 |-- ...



Best
Yan



(Attachment: Sample.java)


Re: Dynamic CEP https://issues.apache.org/jira/browse/FLINK-7129?subTaskView=all

2018-03-07 Thread Fabian Hueske
We hope to pick up FLIP-20 after Flink 1.5.0 has been released.

2018-03-07 22:05 GMT-08:00 Shailesh Jain :

> In addition to making the addition of patterns dynamic, any updates on
> FLIP 20 ?
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-
> 20%3A+Integration+of+SQL+and+CEP
>
> On Thu, Mar 8, 2018 at 12:23 AM, Vishal Santoshi <
> vishal.santo...@gmail.com> wrote:
>
>> I see https://github.com/dawidwys/flink/tree/cep-dynamic-nfa is almost
>> there.
>>
>> On Wed, Mar 7, 2018 at 1:34 PM, Vishal Santoshi <
>> vishal.santo...@gmail.com> wrote:
>>
>>> What is the state of this ticket ? Is CEP invested in supporting dynamic
>>> patterns that could potentially be where patterns can be added/disabled
>>> through a control stream ?
>>>
>>
>>
>


Re: Dynamic CEP https://issues.apache.org/jira/browse/FLINK-7129?subTaskView=all

2018-03-07 Thread Shailesh Jain
In addition to making the addition of patterns dynamic, any updates on FLIP
20 ?
https://cwiki.apache.org/confluence/display/FLINK/FLIP-20%3A+Integration+of+SQL+and+CEP

On Thu, Mar 8, 2018 at 12:23 AM, Vishal Santoshi 
wrote:

> I see https://github.com/dawidwys/flink/tree/cep-dynamic-nfa is almost
> there.
>
> On Wed, Mar 7, 2018 at 1:34 PM, Vishal Santoshi  > wrote:
>
>> What is the state of this ticket ? Is CEP invested in supporting dynamic
>> patterns that could potentially be where patterns can be added/disabled
>> through a control stream ?
>>
>
>


[ANNOUNCE] Apache Flink 1.4.2 released

2018-03-07 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache 
Flink 1.4.2, which is the second bugfix release for the Apache Flink 1.4 
series. 

Apache Flink® is an open-source stream processing framework for distributed, 
high-performing, always-available, and accurate data streaming applications. 

The release is available for download at: 
https://flink.apache.org/downloads.html 

Please check out the release blog post for an overview of the improvements for 
this bugfix release: 
https://flink.apache.org/news/2018/03/08/release-1.4.2.html

The full release notes are available in Jira: 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12342745

We would like to thank all contributors of the Apache Flink community who made 
this release possible! 

Cheers, 
Gordon 



Re: flink sql timed-window join throw "mismatched type" AssertionError on rowtime column

2018-03-07 Thread xccui-foxmail
Hi Yan,

I’d like to look into this. Can you share more about your queries and the full 
stack trace?

Thanks,
Xingcan

> On 8 Mar 2018, at 11:28 AM, Yan Zhou [FDS Science]  > wrote:
> 
> Hi experts, 
> I am using flink table api to join two tables, which are datastream 
> underneath. However, I got an assertion error of "java.lang.AssertionError: 
> mismatched type $1 TIMESTAMP(3)" on rowtime column. Below is more details:
> 
> There in only one kafka data source, which is then converted to Table and 
> registered. One existed column is set as rowtime(event time) attribute. Two 
> over-window aggregation queries are run against the table and two tables are 
> created as results. Everything works great so far.
> However when timed-window joining two result tables with inherented rowtime, 
> calcite throw the "java.lang.AssertionError: mismatched type $1 TIMESTAMP(3)" 
> AssertionError. Can someone let me know what is the possible cause? F.Y.I., I 
> rename the rowtime column for one of the result table.  
> 
> DataStream dataStream = env.addSource(kafkaConsumer);
> Table table = tableEnv.fromDataStream(dataStream, "col1", "col2", ...);
> tableEnv.registerTable(tableName, table);
> Table left = tableEnv.sqlQuery("select id, eventTime,count (*) over ...  from 
> ...");
> Table right = tableEnv.sqlQuery("select id as r_id, eventTime as 
> r_event_time, count (*) over ...  from ...");
> left.join(right).where("id = r_id && eventTime === r_event_time)
> .addSink(...); // here calcite throw exception: java.lang.AssertionError: 
> mismatched type $1 TIMESTAMP(3) 
> 
> source table
>  |-- id: Long
>  |-- eventTime: TimeIndicatorTypeInfo(rowtime)
>  |-- ...
>  |-- ...
>  
> result_1 table
>  |-- id: Long
>  |-- eventTime: TimeIndicatorTypeInfo(rowtime)
>  |-- ...
>  |-- ...
>  
> result_2 table
>  |-- rid: Long
>  |-- r_event_time: TimeIndicatorTypeInfo(rowtime)
>  |-- ...
> 
> 
> Best
> Yan



flink sql timed-window join throw "mismatched type" AssertionError on rowtime column

2018-03-07 Thread Yan Zhou [FDS Science]
Hi experts,

I am using the Flink Table API to join two tables, which are DataStreams underneath. 
However, I got an assertion error of "java.lang.AssertionError: mismatched type 
$1 TIMESTAMP(3)" on the rowtime column. Below are more details:


There is only one Kafka data source, which is then converted to a Table and 
registered. One existing column is set as the rowtime (event time) attribute. Two 
over-window aggregation queries are run against the table and two tables are 
created as results. Everything works great so far.

However, when timed-window joining the two result tables with the inherited rowtime, 
Calcite throws the "java.lang.AssertionError: mismatched type $1 TIMESTAMP(3)" 
AssertionError. Can someone let me know what the possible cause is? FYI, I 
renamed the rowtime column for one of the result tables.


DataStream dataStream = env.addSource(kafkaConsumer);

Table table = tableEnv.fromDataStream(dataStream, "col1", "col2", ...);

tableEnv.registerTable(tableName, table);

Table left = tableEnv.sqlQuery("select id, eventTime,count (*) over ...  from 
...");

Table right = tableEnv.sqlQuery("select id as r_id, eventTime as r_event_time, 
count (*) over ...  from ...");

left.join(right).where("id = r_id && eventTime === r_event_time")

.addSink(...); // here Calcite throws the exception: java.lang.AssertionError: 
mismatched type $1 TIMESTAMP(3)

source table
 |-- id: Long
 |-- eventTime: TimeIndicatorTypeInfo(rowtime)
 |-- ...
 |-- ...

result_1 table
 |-- id: Long
 |-- eventTime: TimeIndicatorTypeInfo(rowtime)
 |-- ...
 |-- ...

result_2 table
 |-- rid: Long
 |-- r_event_time: TimeIndicatorTypeInfo(rowtime)
 |-- ...



Best

Yan




Job is cancelled, but the stdout log still prints

2018-03-07 Thread sundy

Hi:

I am facing a problem: the TaskManagers on 3 nodes are still running, and I have made 
sure that all jobs are cancelled, but I can see that the stdout logs are still 
being printed all the time. The job's parallelism is 6.

I created a scheduled thread pool like this:

static {
  Executors.newScheduledThreadPool(1).scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
  try {
getLiveInfo();
  } catch (Exception e) {
e.printStackTrace();
  }
}
  }, 0, 60, TimeUnit.SECONDS);
}
Is it the case that static methods will keep running in the TaskManagers even if 
the job is cancelled? That's weird.
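One way to avoid this is to tie the scheduler to the operator lifecycle instead of a static 
initializer, so that it is shut down when the task is cancelled. A sketch only; the class 
name, the String type, and getLiveInfo() are stand-ins for the actual job code:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LiveInfoMapper extends RichMapFunction<String, String> {

  private transient ScheduledExecutorService scheduler;

  @Override
  public void open(Configuration parameters) {
    // created per task instance when the task starts
    scheduler = Executors.newScheduledThreadPool(1);
    scheduler.scheduleAtFixedRate(() -> {
      try {
        getLiveInfo();
      } catch (Exception e) {
        e.printStackTrace();
      }
    }, 0, 60, TimeUnit.SECONDS);
  }

  @Override
  public String map(String value) {
    return value;
  }

  @Override
  public void close() {
    if (scheduler != null) {
      scheduler.shutdownNow(); // stops the periodic task when the job is cancelled
    }
  }

  private void getLiveInfo() {
    // placeholder for the actual polling logic
  }
}

A static initializer, in contrast, runs once per class load in the TaskManager JVM and is not 
tied to any job; the thread pool it starts is never shut down, which matches what you are 
seeing.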

Re: Failing to recover once checkpoint fails

2018-03-07 Thread Stephan Ewen
The assumption in your previous mail is correct.

Just to double check:

  - The initially affected version you were running was 1.3.2, correct?

The issue should be fixed in all active branches (1.4, 1.5, 1.6) and
additionally in 1.3.

Currently released versions with this fix: 1.4.0, 1.4.1.
1.5.0 is in the making.

We are looking to create a dedicated 1.3.3 for this fix.
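For reference, the retry limit mentioned in this thread is governed by the job's restart 
strategy. A minimal configuration sketch (DataStream API; the values are illustrative, not a 
recommendation):

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.concurrent.TimeUnit;

public class RestartConfigSketch {
  public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // checkpoint every minute
    env.enableCheckpointing(60_000);

    // bound the recovery loop: retry a fixed number of times with a delay,
    // then fail the job for good once the limit is reached
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
        3,                               // restart attempts
        Time.of(30, TimeUnit.SECONDS))); // delay between attempts
  }
}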


On Thu, Jan 25, 2018 at 5:13 PM, Vishal Santoshi 
wrote:

> To add to this, we are assuming that the default configuration will fail a
> pipeline if  a checkpoint fails and will hit the recover loop only and only
> if the retry limit is not reached
>
>
>
>
> On Thu, Jan 25, 2018 at 7:00 AM, Vishal Santoshi <
> vishal.santo...@gmail.com> wrote:
>
>> Sorry.
>>
>> There are 2 scenerios
>>
>>   * Idempotent Sinks Use Case where we would want to restore from the
>> latest valid checkpoint.  If I understand the code correctly we try to
>> retrieve all completed checkpoints  for all handles in ZK and abort ( throw
>> an exception ) if there are handles but no corresponding complete
>> checkpoints in hdfs,  else we use the latest valid checkpoint state.  On
>> abort a restart  and thus restore of the  pipe  is issued repeating the
>> above execution. If the failure in hdfs was transient a retry will succeed
>> else when the  retry limit is reached the pipeline is aborted for good.
>>
>>
>> * Non Idempotent Sinks where we have no retries. We do not want to
>> recover from the last available checkpoint as the above code will do as the
>> more  into history we go the more duplicates will be delivered. The only
>> solution is use exactly once semantics of the source and sinks if possible.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Jan 24, 2018 at 7:20 AM, Aljoscha Krettek 
>> wrote:
>>
>>> Did you see my second mail?
>>>
>>>
>>> On 24. Jan 2018, at 12:50, Vishal Santoshi 
>>> wrote:
>>>
>>> As in, if there are chk handles in zk, there should no reason to start a
>>> new job ( bad handle, no hdfs connectivity etc ),
>>>  yes that sums it up.
>>>
>>> On Wed, Jan 24, 2018 at 5:35 AM, Aljoscha Krettek 
>>> wrote:
>>>
 Wait a sec, I just checked out the code again and it seems we already
 do that: https://github.com/apache/flink/blob/9071e3befb8c279f7
 3c3094c9f6bddc0e7cce9e5/flink-runtime/src/main/java/org/apac
 he/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L210

 If there were some checkpoints but none could be read we fail recovery.


 On 24. Jan 2018, at 11:32, Aljoscha Krettek 
 wrote:

 That sounds reasonable: We would keep the first fix, i.e. never delete
 checkpoints if they're "corrupt", only when they're subsumed. Additionally,
 we fail the job if there are some checkpoints in ZooKeeper but none of them
 can be restored to prevent the case where a job starts from scratch even
 though it shouldn't.

 Does that sum it up?

 On 24. Jan 2018, at 01:19, Vishal Santoshi 
 wrote:

 If we hit the retry limit, abort the job. In our case we will restart
 from the last SP ( we as any production pile do it is n time s a day )  and
 that I would think should be OK for most folks ?

 On Tue, Jan 23, 2018 at 11:38 AM, Vishal Santoshi <
 vishal.santo...@gmail.com> wrote:

> Thank you for considering this. If I understand you correctly.
>
> * CHK pointer on ZK for a CHK state on hdfs was done successfully.
> * Some issue restarted the pipeline.
> * The NN was down unfortunately and flink could not retrieve the  CHK
> state from the CHK pointer on ZK.
>
> Before
>
> * The CHK pointer was being removed and the job started from a brand
> new slate.
>
> After ( this fix on 1.4 +)
>
> * do not delete the CHK pointer ( It has to be subsumed to be deleted
> ).
> * Flink keeps using this CHK pointer ( if retry is > 0 and till we hit
> any retry limit ) to restore state
> * NN comes back
> * Flink restores state on the next retry.
>
> I would hope that is the sequence to follow.
>
> Regards.
>
>
>
>
>
>
>
>
> On Tue, Jan 23, 2018 at 7:25 AM, Aljoscha Krettek  > wrote:
>
>> Hi Vishal,
>>
>> I think you might be right. We fixed the problem that checkpoints
>> where dropped via https://issues.apache.org/jira/browse/FLINK-7783.
>> However, we still have the problem that if the DFS is not up at all then 
>> it
>> will look as if the job is starting from scratch. However, the 
>> alternative
>> is failing the job, in which case you will also never be able to restore
>> from a checkpoint. What do you think?
>>
>> Best,
>> Aljoscha
>>
>>
>> On 23. Jan 2018, at 10:15, Fabian Hueske 

Re: Dynamic CEP https://issues.apache.org/jira/browse/FLINK-7129?subTaskView=all

2018-03-07 Thread Vishal Santoshi
I see https://github.com/dawidwys/flink/tree/cep-dynamic-nfa is almost
there.

On Wed, Mar 7, 2018 at 1:34 PM, Vishal Santoshi 
wrote:

> What is the state of this ticket ? Is CEP invested in supporting dynamic
> patterns that could potentially be where patterns can be added/disabled
> through a control stream ?
>


Dynamic CEP https://issues.apache.org/jira/browse/FLINK-7129?subTaskView=all

2018-03-07 Thread Vishal Santoshi
What is the state of this ticket ? Is CEP invested in supporting dynamic
patterns that could potentially be where patterns can be added/disabled
through a control stream ?


Re: bin/start-cluster.sh won't start jobmanager on master machine

2018-03-07 Thread Yesheng Ma
Oh, I have figured out the problem; it has something to do with my
~/.profile. I cannot remember when I added a line to ~/.profile that
sources my .zshrc, which causes the login shell to always end up in zsh.

On Wed, Mar 7, 2018 at 2:13 AM, Yesheng Ma  wrote:

> Related source code: https://github.com/apache/flink/blob/master/
> flink-dist/src/main/flink-bin/bin/start-cluster.sh#L40
>
> On Wed, Mar 7, 2018 at 2:11 AM, Yesheng Ma  wrote:
>
>> Hi Nico,
>>
>> Thanks for your reply. My major concern is actually the `-l` argument.
>> The command I executed is: `nohup /bin/bash -x -l
>> "/state/partition1/ysma/flink-1.4.1/bin/jobmanager.sh" start cluster
>> dell-01.epcc 8091`, with and without the `-l` argument (the script in
>> Flink's bin directory uses the `-l` argument).
>>
>> 1) with the `-l` argument: the log is quite messy, but there are some
>> clue, the last executed command starts a zsh shell:
>> ```
>> + . /home/ysma/.bashrc
>> ++ case $- in
>> ++ return
>> + PATH=/home/ysma/bin:/home/ysma/.local/bin:/state/partition1/
>> ysma/redis-4.0.8/../bin:/home/ysma/env/jdk1.8.0_151/bin:/
>> home/ysma/env/maven/bin:/home/ysma/bin:/home/ysma/.local/
>> bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/
>> sbin:/bin:/usr/games:/usr/local/games:/snap/bin
>> + '[' -f /bin/zsh ']'
>> + exec /bin/zsh -l
>> ```
>> I guess the bash -l arguments detects the user's login shell and then
>> logs in a zsh shell (which I'm currently using) and never back.
>>
>> 2) without the `-l` argument, everything just goes fine.
>>
>> Therefore I suspect there might be something wrong with the `-l`
>> argument, or something wrong with my bash config?  Any ideas? Thanks!
>>
>>
>> On Wed, Mar 7, 2018 at 12:20 AM, Nico Kruber 
>> wrote:
>>
>>> Hi Yesheng,
>>> `nohup /bin/bash -l bin/jobmanager.sh start cluster ...` looks a bit
>>> strange since it should (imho) be an absolute path towards flink.
>>>
>>> What you could do to diagnose further, is to try to run the ssh command
>>> manually, i.e. figure out what is being executed by calling
>>> bash -x ./bin/start-cluster.sh
>>> and then run the ssh command without "-n" and not in background "&".
>>> Then you should also see the JobManager stdout to diagnose further.
>>>
>>> If that does not help yet, please log into the master manually and
>>> execute the "nohup /bin/bash..." command there to see what is going on.
>>>
>>> Depending on where the failure was, there may even be logs on the master
>>> machine.
>>>
>>>
>>> Nico
>>>
>>> On 04/03/18 15:52, Yesheng Ma wrote:
>>> > Hi all,
>>> >
>>> > ​​When I execute bin/start-cluster.sh on the master machine, actually
>>> > the command `nohup /bin/bash -l bin/jobmanager.sh start cluster ...` is
>>> > exexuted, which does not open the job manager properly.
>>> >
>>> > I think there might be something wrong with the `-l` argument, since
>>> > when I use the `bin/jobmanager.sh start` command, everything is fine.
>>> > Kindly point out if I've done any configuration wrong. Thanks!
>>> >
>>> > Best,
>>> > Yesheng
>>> >
>>> >
>>>
>>>
>>
>


Flink UI not responding on Yarn + Flink

2018-03-07 Thread samar kumar
Hi Users,
I am currently trying to run Flink with YARN. I have written a program
which launches Flink jobs on YARN. It works perfectly fine with a local
YARN cluster. When I run it on a remote YARN cluster it does not work as
expected. As I am reading from and writing data to Kafka, I do not see any
data on the destination Kafka topic. The YARN log does print the
transformations I want to apply. I am not able to debug the issue because
there is no error in the YARN logs. Also, the Flink application UI seems to
be blank and does not show the usual Flink job list. Please help me debug
and fix the issue.

Running Flink 1.3.2 with yarn 2.6
Thanks,
Samar


Re: Flink Kafka reads too many bytes .... Very rarely

2018-03-07 Thread Philip Doctor
Hi Stephan,
Sorry for the slow response.

I added some logging inside of my DeserializationSchema’s `deserialize(byte[] 
message)` method.

I see the extra bytes appearing in that method.

If there’s another place I should add logging, please let me know and I’m happy 
to do so.

Additionally (and this is weird), I write all my messages to the DB, so I was 
looking for which messages didn't make it (i.e. of input messages 1->10,000, which of 
those aren't in the DB).  Turns out all 10k are in the DB.  I'm not sure if that 
indicates the message is read and then retried, or what.  I would have guessed 
that somehow extra data got written to my topic, but the Kafka tooling tells me 
otherwise.  So from my application's perspective it just looks like I get extra 
garbage data every now and then.

This is actually a big relief, I toss out the garbage data and keep rolling.

I hope this helps, thank you.
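As a possible workaround sketch (an assumption on my part, not something confirmed in this 
thread): since Flink 1.3 the Kafka consumer documentation says that records for which the 
DeserializationSchema returns null are silently skipped, so a tolerant schema can log and 
drop the occasional garbage payload instead of failing. The class name and parsing logic are 
made up; the AbstractDeserializationSchema package differs slightly between releases:

import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class TolerantStringSchema extends AbstractDeserializationSchema<String> {

  @Override
  public String deserialize(byte[] message) throws IOException {
    try {
      // replace with the application's actual parsing (JSON, Avro, ...)
      return new String(message, StandardCharsets.UTF_8);
    } catch (RuntimeException e) {
      // log message.length and a hex prefix here to capture the "extra bytes" cases
      return null; // null records are skipped by the Kafka consumer
    }
  }
}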



From: Stephan Ewen 
Date: Thursday, March 1, 2018 at 9:26 AM
To: "user@flink.apache.org" 
Cc: Philip Doctor 
Subject: Re: Flink Kafka reads too many bytes  Very rarely

Can you specify exactly where you have that excess of data?

Flink uses basically Kafka's standard consumer and passes byte[] unmodified to 
the DeserializationSchema. Can you help us check whether the "too many bytes" 
happens already before or after the DeserializationSchema?

  - If the "too many bytes" already arrive at the DeserializationSchema, then 
we should dig into the way that Kafka's consumer is configured


  - If the "too many bytes" appears after the  DeserializationSchema, then we 
should look into the DeserializationSchema, for example whether it is stateful, 
accidentally shared across threads, etc.





On Thu, Mar 1, 2018 at 11:08 AM, Fabian Hueske 
> wrote:
Hi Phil,
I've created a JIRA ticket for the problem that you described and linked it to 
this thread: FLINK-8820.
Thank you, Fabian

[1] https://issues.apache.org/jira/browse/FLINK-8820

2018-02-28 5:13 GMT+01:00 Philip Doctor 
>:

  *   The fact that I seem to get all of my data is currently leading me to 
discard and ignore this error

Please ignore this statement, I typed this email as I was testing a theory, I 
meant to delete this line.  This is still a very real issue for me.  I was 
looking to try a work around tomorrow, I saw that the Kafka 11 consumer 
supported transactions for exactly once processing, I was going to read about 
that and see if I could somehow fail a read that I couldn’t deserialize and try 
again, and if that might make a difference (can I just retry this ?).  I’m not 
sure how that’ll go.  If you’ve got an idea for a work around, I’d be all ears 
too.


From: Philip Doctor >
Date: Tuesday, February 27, 2018 at 10:02 PM
To: "Tzu-Li (Gordon) Tai" >, 
Fabian Hueske >
Cc: "user@flink.apache.org" 
>
Subject: Re: Flink Kafka reads too many bytes  Very rarely

Honestly this has been a very frustrating issue to dig in to.  The fact that I 
seem to get all of my data is currently leading me to discard and ignore this 
error, it’s rare, flink still seems to work, but something is very hard to 
debug here and despite some confusing observations, most of my evidence 
suggests that this originates in the flink kafka consumer.




Re: Table Api and CSV builder

2018-03-07 Thread Stefan Richter
Hi,

I think you just need to specify a custom timestamp/watermark extractor that constructs 
the timestamp from the 3 fields, as described here: 
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/event_timestamp_extractors.html
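A sketch of such an extractor, assuming the three fields have already been parsed into the 
row type; the CsvRow type, its fields, and the out-of-orderness bound are made up for 
illustration:

import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.time.LocalDateTime;
import java.time.ZoneOffset;

class CsvRow {
  int year;
  int month;
  int day;
}

class ThreeFieldTimestampExtractor extends BoundedOutOfOrdernessTimestampExtractor<CsvRow> {

  ThreeFieldTimestampExtractor() {
    super(Time.seconds(10)); // allowed out-of-orderness, application-specific
  }

  @Override
  public long extractTimestamp(CsvRow row) {
    // combine the three CSV fields into a single epoch-millis timestamp
    return LocalDateTime.of(row.year, row.month, row.day, 0, 0)
        .toInstant(ZoneOffset.UTC)
        .toEpochMilli();
  }
}

The extractor would then be applied with stream.assignTimestampsAndWatermarks(new 
ThreeFieldTimestampExtractor()) before converting the stream to a table.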

Best,
Stefan

> Am 07.03.2018 um 00:52 schrieb Karim Amer :
> 
> Hi there 
> 
> I Have a CSV file with the timestamp deconstructed into 3 fields and I was 
> wondering what is the best way to  specify the those 3 fields are the event  
> time ? Should I make extend  CsvTableSource and do the preprocessing or can 
> CsvTableSource.builder() handle it. Or is there a better way in general to 
> tackle this obstacle.
> 
> Thanks
> 



Re: Simple CEP pattern

2018-03-07 Thread Kostas Kloudas
Hi,

You can adjust it to see if your pattern works. 
I thought that this is what you wanted to do.

If you want to run it on a cluster or run a full job, then 
the best way is actually to do that. Write the job using the quickstart,
launch a cluster locally using the start-cluster.sh script, submit the job,
and check its progress through the web monitor.

There you also have access to the individual TaskManager and JobManager
logs, in case something goes wrong.

Cheers,
Kostas

> On Mar 7, 2018, at 3:44 PM, Esa Heikkinen  
> wrote:
> 
> Hi
>  
> Yes I have access to the flink source code, but could you explain little bit 
> more what to do with it in this case ?
>  
> Best, Esa
>  
> From: Kostas Kloudas [mailto:k.klou...@data-artisans.com] 
> Sent: Wednesday, March 7, 2018 3:51 PM
> To: Esa Heikkinen 
> Cc: user@flink.apache.org
> Subject: Re: Simple CEP pattern
>  
> Hi Esa,
> 
> You can always test your pattern in isolation.
> For an example on how to do that, if you have access to the flink source 
> code, 
> you can check the UntilConditionITCase or any other ITCase in the same 
> package.
>  
> It can also be found here: 
> https://github.com/apache/flink/blob/master/flink-libraries/flink-cep/src/test/java/org/apache/flink/cep/nfa/UntilConditionITCase.java
>  
> 
> 
> Kostas
> 
> 
> On Mar 7, 2018, at 2:43 PM, Esa Heikkinen  > wrote:
> 
> Hi
>  
> I have tried this CEP example of Data-artisans and it works. It is also only 
> fully working example I have found, but it little bit too complex for my 
> purpose. I have also tried to read the documentation, but I have not found 
> about simple “Hello World” type example about Flink’s CEP.
>  
> It is surprisingly difficult to form correct Pattern, because there are many 
> operators and how to combine them by correct way..
> It looks like very simple, but in practice it has not been for me, but this 
> maybe because I am new with FlinkCEP.
>  
> Often I don’t know is it problem with “pattern” or “select”, because no 
> results.. Is there any way to debug CEP’s operations ? 
>  
> Best, Esa
>  
> From: Kostas Kloudas [mailto:k.klou...@data-artisans.com 
> ] 
> Sent: Wednesday, March 7, 2018 2:54 PM
> To: Esa Heikkinen  >
> Cc: user@flink.apache.org 
> Subject: Re: Simple CEP pattern
>  
> Hi Esa,
>  
> You could try the examples either from the documentation or from the training.
> http://training.data-artisans.com/exercises/CEP.html 
> 
>  
> Kostas
> 
> 
> On Mar 7, 2018, at 11:32 AM, Esa Heikkinen  > wrote:
>  
> What would be the simplest working CEP (Scala) pattern ?
> 
> I want to test if my CEP application works at all.
>  
> Best, Esa
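For what it is worth, a minimal self-contained pattern along the lines Esa asks about could 
look like the sketch below (Java; the Scala API is analogous, and the input and conditions 
are made up). It matches an "a" immediately followed by a "b" and prints the match:

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.List;
import java.util.Map;

public class MinimalCep {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<String> input = env.fromElements("a", "b", "c");

    Pattern<String, ?> pattern = Pattern.<String>begin("start")
        .where(new SimpleCondition<String>() {
          @Override
          public boolean filter(String value) {
            return value.equals("a");
          }
        })
        .next("end")
        .where(new SimpleCondition<String>() {
          @Override
          public boolean filter(String value) {
            return value.equals("b");
          }
        });

    CEP.pattern(input, pattern)
        .select(new PatternSelectFunction<String, String>() {
          @Override
          public String select(Map<String, List<String>> match) {
            return match.get("start").get(0) + " -> " + match.get("end").get(0);
          }
        })
        .print();

    env.execute("minimal-cep");
  }
}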



RE: Simple CEP pattern

2018-03-07 Thread Esa Heikkinen
Hi

Yes, I have access to the Flink source code, but could you explain a little bit 
more about what to do with it in this case?

Best, Esa

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: Wednesday, March 7, 2018 3:51 PM
To: Esa Heikkinen 
Cc: user@flink.apache.org
Subject: Re: Simple CEP pattern

Hi Esa,

You can always test your pattern in isolation.
For an example on how to do that, if you have access to the flink source code,
you can check the UntilConditionITCase or any other ITCase in the same package.

It can also be found here: 
https://github.com/apache/flink/blob/master/flink-libraries/flink-cep/src/test/java/org/apache/flink/cep/nfa/UntilConditionITCase.java

Kostas


On Mar 7, 2018, at 2:43 PM, Esa Heikkinen 
> wrote:

Hi

I have tried this CEP example of Data-artisans and it works. It is also only 
fully working example I have found, but it little bit too complex for my 
purpose. I have also tried to read the documentation, but I have not found 
about simple “Hello World” type example about Flink’s CEP.

It is surprisingly difficult to form correct Pattern, because there are many 
operators and how to combine them by correct way..
It looks like very simple, but in practice it has not been for me, but this 
maybe because I am new with FlinkCEP.

Often I don’t know is it problem with “pattern” or “select”, because no 
results.. Is there any way to debug CEP’s operations ?

Best, Esa

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: Wednesday, March 7, 2018 2:54 PM
To: Esa Heikkinen 
>
Cc: user@flink.apache.org
Subject: Re: Simple CEP pattern

Hi Esa,

You could try the examples either from the documentation or from the training.
http://training.data-artisans.com/exercises/CEP.html

Kostas


On Mar 7, 2018, at 11:32 AM, Esa Heikkinen 
> wrote:

What would be the simplest working CEP (Scala) pattern ?

I want to test if my CEP application works at all.

Best, Esa



Re: CEP issue

2018-03-07 Thread Vishal Santoshi
Will do.

On Wed, Mar 7, 2018 at 9:33 AM, Kostas Kloudas 
wrote:

> Why not opening a JIRA and working on adding some debug
> statements that you consider useful?
>
> This could help the next user that faces the same issues ;)
>
> Kostas
>
> On Mar 7, 2018, at 3:29 PM, Vishal Santoshi 
> wrote:
>
> Aah, yes we never had a sink state so never came across a case where it
> was ever exercised.  When the range expires, it is a prune rather than a
> stop state ( we were expecting it to be a stop state )  which is some what
> misleading if we hold stop state to " that invalidates a partial match "
> whatever the reason may be.
>
> Again I would also advise ( though not a biggy )  that strategic debug
> statements in the CEP core would help folks to see what actually happens.
> We instrumented the code to follow the construction of NFA that was very
> helpful.
>
> On Wed, Mar 7, 2018 at 9:23 AM, Kostas Kloudas <
> k.klou...@data-artisans.com> wrote:
>
>> Hi Vishal,
>>
>> A stopState is a state that invalidates a partial match, e.g.
>> a.NotFollowedBy(b).followedBy(c).
>> If you have an “a” and then you see a “b” then you invalidate the pattern.
>>
>> A finalState is the one where a match has been found.
>>
>> Kostas
>>
>>
>> On Mar 7, 2018, at 3:20 PM, Vishal Santoshi 
>> wrote:
>>
>> Absolutely.  For one a simple m out of n true conditions where n is
>> defined by range is a little under optimized as in just using time(m) will
>> not short circuit the partial patterns till the time range is achieved even
>> if there is no way m true conditions can be achieved ( we already have had
>> n-m false conditions ) . That makes sense as we have defined a within()
>> condition predicated on n.
>>
>> I think the way one would do it is to iterative condition and look at
>> all  events  ( including the ones with false but that can be expensive )
>> and stop a pattern. One question I had is that an NFA can be in a
>> FinalState or a StopState.
>>
>> What would constitute a StopState ?
>>
>> On Wed, Mar 7, 2018 at 8:47 AM, Kostas Kloudas <
>> k.klou...@data-artisans.com> wrote:
>>
>>> Hi Vishal,
>>>
>>> Thanks a lot for sharing your experience and the potential caveats to
>>> consider when
>>> specifying your pattern.
>>>
>>> I agree that there is room for improvement when it comes to the state
>>> checkpointed in Flink.
>>> We already have some ideas but still, as you also said, the bulk of the
>>> space consumption
>>> comes from the pattern definition, so it could be nice if more people
>>> did the same, i.e. sharing
>>> their experience, and why not, compiling a guide of things to avoid and
>>> put it along the rest
>>> of FlinkCEP documentation.
>>>
>>> What do you think?
>>>
>>> Kostas
>>>
>>>
>>>
>>> On Mar 7, 2018, at 2:34 PM, Vishal Santoshi 
>>> wrote:
>>>
>>> Hello all,  There were recent changes to the flink master that I pulled
>>> in and that *seems* to have solved our issue.
>>>
>>> Few points
>>>
>>> * CEP is heavy as the NFA  transition  matrix   as state which can be
>>> possibly  n^2 ( worst case )  can easily blow up space requirements.  The
>>> after match skip strategy is likely to play a crucial role in keeping the
>>> state lean https://ci.apache.org/projects/flink/flink-docs-master/
>>> dev/libs/cep.html#after-match-skip-strategy.  In our case we do not
>>> require partial matches within a match to contribute to another potential
>>> match ( noise for us )  and thus *SKIP_PAST_LAST_EVENT *was used which
>>> on match will prune the SharedBuffer ( almost reset it )
>>>
>>> * The argument that the pattern events should be lean holds much more in
>>> CEP due to the potential exponential increase in space requirements.
>>>
>>> * The nature of the pattern will require consideration if state does
>>> blow up for you.
>>>
>>> Apart from that, I am still not sure why toString() on SharedBuffer was
>>> called to get an OOM to begin with.
>>>
>>>
>>>
>>> On Mon, Feb 26, 2018 at 2:09 PM, Vishal Santoshi <
>>> vishal.santo...@gmail.com> wrote:
>>>
 We could not recreate in a controlled setup, but here are a few notes
 that we have gathered on a simple  "times(n),within(..)"

 In case where the Event does not create a Final or Stop state

 * As an NFA processes an Event, NFA mutates if there is a true Event.
 Each computation is a counter that keeps track of partial matches with each
 true Event already existent partial match for that computation unit.
 Essentially for n Events and if each Event is a true there will be roughly
 n-1 computations, each with representing an Event from 1 to n-1 ( so 1 or
 first will have n-1 events in the partial match, 2 has n-1 events and so on
 and n-1 has the last event as a partial match ).

 * If the WM progresses  beyond the ts of the 1st computation, that
 partial match is pruned.

 * It makes sure that 

Re: CEP issue

2018-03-07 Thread Kostas Kloudas
Why not opening a JIRA and working on adding some debug 
statements that you consider useful?

This could help the next user that faces the same issues ;)

Kostas

> On Mar 7, 2018, at 3:29 PM, Vishal Santoshi  wrote:
> 
> Aah, yes we never had a sink state so never came across a case where it was 
> ever exercised.  When the range expires, it is a prune rather than a stop 
> state ( we were expecting it to be a stop state )  which is some what 
> misleading if we hold stop state to " that invalidates a partial match " 
> whatever the reason may be.
> 
> Again I would also advise ( though not a biggy )  that strategic debug 
> statements in the CEP core would help folks to see what actually happens. We 
> instrumented the code to follow the construction of NFA that was very 
> helpful. 
> 
> On Wed, Mar 7, 2018 at 9:23 AM, Kostas Kloudas  > wrote:
> Hi Vishal,
> 
> A stopState is a state that invalidates a partial match, e.g. 
> a.NotFollowedBy(b).followedBy(c). 
> If you have an “a” and then you see a “b” then you invalidate the pattern.
> 
> A finalState is the one where a match has been found.
> 
> Kostas
> 
> 
>> On Mar 7, 2018, at 3:20 PM, Vishal Santoshi > > wrote:
>> 
>> Absolutely.  For one a simple m out of n true conditions where n is defined 
>> by range is a little under optimized as in just using time(m) will not short 
>> circuit the partial patterns till the time range is achieved even if there 
>> is no way m true conditions can be achieved ( we already have had n-m false 
>> conditions ) . That makes sense as we have defined a within() condition 
>> predicated on n. 
>> 
>> I think the way one would do it is to iterative condition and look at all  
>> events  ( including the ones with false but that can be expensive ) and stop 
>> a pattern. One question I had is that an NFA can be in a FinalState or a 
>> StopState. 
>> 
>> What would constitute a StopState ? 
>> 
>> On Wed, Mar 7, 2018 at 8:47 AM, Kostas Kloudas > > wrote:
>> Hi Vishal,
>> 
>> Thanks a lot for sharing your experience and the potential caveats to 
>> consider when 
>> specifying your pattern.
>> 
>> I agree that there is room for improvement when it comes to the state 
>> checkpointed in Flink.
>> We already have some ideas but still, as you also said, the bulk of the 
>> space consumption
>> comes from the pattern definition, so it could be nice if more people did 
>> the same, i.e. sharing 
>> their experience, and why not, compiling a guide of things to avoid and put 
>> it along the rest
>> of FlinkCEP documentation.
>> 
>> What do you think?
>> 
>> Kostas
>> 
>> 
>> 
>>> On Mar 7, 2018, at 2:34 PM, Vishal Santoshi >> > wrote:
>>> 
>>> Hello all,  There were recent changes to the flink master that I pulled in 
>>> and that seems to have solved our issue.  
>>> 
>>> Few points 
>>> 
>>> * CEP is heavy as the NFA  transition  matrix   as state which can be  
>>> possibly  n^2 ( worst case )  can easily blow up space requirements.  The 
>>> after match skip strategy is likely to play a crucial role in keeping the 
>>> state lean 
>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#after-match-skip-strategy
>>>  
>>> .
>>>   In our case we do not require partial matches within a match to 
>>> contribute to another potential match ( noise for us )  and thus 
>>> SKIP_PAST_LAST_EVENT was used which on match will prune the SharedBuffer ( 
>>> almost reset it ) 
>>> 
>>> * The argument that the pattern events should be lean holds much more in 
>>> CEP due to the potential exponential increase in space requirements. 
>>> 
>>> * The nature of the pattern will require consideration if state does blow 
>>> up for you.
>>> 
>>> Apart from that, I am still not sure why toString() on SharedBuffer was 
>>> called to get an OOM to begin with.
>>> 
>>> 
>>> 
>>> On Mon, Feb 26, 2018 at 2:09 PM, Vishal Santoshi >> > wrote:
>>> We could not recreate in a controlled setup, but here are a few notes that 
>>> we have gathered on a simple  "times(n),within(..)"
>>> 
>>> In case where the Event does not create a Final or Stop state
>>> 
>>> * As an NFA processes an Event, NFA mutates if there is a true Event. Each 
>>> computation is a counter that keeps track of partial matches with each true 
>>> Event already existent partial match for that computation unit. Essentially 
>>> for n Events and if each Event is a true there will be roughly n-1 
>>> computations, each with representing an Event from 1 to n-1 ( so 1 or first 
>>> will have n-1 events in the partial match, 2 has 

Re: CEP issue

2018-03-07 Thread Vishal Santoshi
Aah, yes, we never had a sink state, so we never came across a case where it was
ever exercised.  When the range expires, it is a prune rather than a stop
state (we were expecting it to be a stop state), which is somewhat
misleading if we hold a stop state to be one "that invalidates a partial match",
whatever the reason may be.

Again, I would also advise (though not a biggy) that strategic debug
statements in the CEP core would help folks see what actually happens.
We instrumented the code to follow the construction of the NFA, and that was very
helpful.

On Wed, Mar 7, 2018 at 9:23 AM, Kostas Kloudas 
wrote:

> Hi Vishal,
>
> A stopState is a state that invalidates a partial match, e.g.
> a.NotFollowedBy(b).followedBy(c).
> If you have an “a” and then you see a “b” then you invalidate the pattern.
>
> A finalState is the one where a match has been found.
>
> Kostas
>
>
> On Mar 7, 2018, at 3:20 PM, Vishal Santoshi 
> wrote:
>
> Absolutely.  For one a simple m out of n true conditions where n is
> defined by range is a little under optimized as in just using time(m) will
> not short circuit the partial patterns till the time range is achieved even
> if there is no way m true conditions can be achieved ( we already have had
> n-m false conditions ) . That makes sense as we have defined a within()
> condition predicated on n.
>
> I think the way one would do it is to iterative condition and look at all
> events  ( including the ones with false but that can be expensive ) and
> stop a pattern. One question I had is that an NFA can be in a FinalState or
> a StopState.
>
> What would constitute a StopState ?
>
> On Wed, Mar 7, 2018 at 8:47 AM, Kostas Kloudas <
> k.klou...@data-artisans.com> wrote:
>
>> Hi Vishal,
>>
>> Thanks a lot for sharing your experience and the potential caveats to
>> consider when
>> specifying your pattern.
>>
>> I agree that there is room for improvement when it comes to the state
>> checkpointed in Flink.
>> We already have some ideas but still, as you also said, the bulk of the
>> space consumption
>> comes from the pattern definition, so it could be nice if more people did
>> the same, i.e. sharing
>> their experience, and why not, compiling a guide of things to avoid and
>> put it along the rest
>> of FlinkCEP documentation.
>>
>> What do you think?
>>
>> Kostas
>>
>>
>>
>> On Mar 7, 2018, at 2:34 PM, Vishal Santoshi 
>> wrote:
>>
>> Hello all,  There were recent changes to the flink master that I pulled
>> in and that *seems* to have solved our issue.
>>
>> Few points
>>
>> * CEP is heavy as the NFA  transition  matrix   as state which can be
>> possibly  n^2 ( worst case )  can easily blow up space requirements.  The
>> after match skip strategy is likely to play a crucial role in keeping the
>> state lean https://ci.apache.org/projects/flink/flink-docs-master/
>> dev/libs/cep.html#after-match-skip-strategy.  In our case we do not
>> require partial matches within a match to contribute to another potential
>> match ( noise for us )  and thus *SKIP_PAST_LAST_EVENT *was used which
>> on match will prune the SharedBuffer ( almost reset it )
>>
>> * The argument that the pattern events should be lean holds much more in
>> CEP due to the potential exponential increase in space requirements.
>>
>> * The nature of the pattern will require consideration if state does blow
>> up for you.
>>
>> Apart from that, I am still not sure why toString() on SharedBuffer was
>> called to get an OOM to begin with.
>>
>>
>>
>> On Mon, Feb 26, 2018 at 2:09 PM, Vishal Santoshi <
>> vishal.santo...@gmail.com> wrote:
>>
>>> We could not recreate in a controlled setup, but here are a few notes
>>> that we have gathered on a simple  "times(n),within(..)"
>>>
>>> In case where the Event does not create a Final or Stop state
>>>
>>> * As an NFA processes an Event, NFA mutates if there is a true Event.
>>> Each computation is a counter that keeps track of partial matches with each
>>> true Event already existent partial match for that computation unit.
>>> Essentially for n Events and if each Event is a true there will be roughly
>>> n-1 computations, each with representing an Event from 1 to n-1 ( so 1 or
>>> first will have n-1 events in the partial match, 2 has n-1 events and so on
>>> and n-1 has the last event as a partial match ).
>>>
>>> * If the WM progresses  beyond the ts of the 1st computation, that
>>> partial match is pruned.
>>>
>>> * It makes sure that a SharedBufferEntry is pruned only if the count of
>>> Edges originating from it reduces to 0 ( the internalRemove() which uses a
>>> Stack) , which should happen as WM keeps progressing to the nth element for
>>> unfulfilled patterns. A "null" ( not a fan ) event is used to establish  a
>>> WM progression
>>>
>>>
>>> In case there is a FinalState  ( and we skipToFirstAfterLast )
>>>
>>> * The NFA by will prune ( release )  all partial matches and prune the
>>> shared buffer 

Re: CEP issue

2018-03-07 Thread Kostas Kloudas
Hi Vishal,

A stopState is a state that invalidates a partial match, e.g. 
a.notFollowedBy(b).followedBy(c). 
If you have an “a” and then you see a “b”, then you invalidate the pattern.

A finalState is the one where a match has been found.

Kostas
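A sketch of that shape in the Pattern API (Java; the event type and conditions are 
illustrative):

import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;

public class StopStateSketch {

  // minimal illustrative event type
  public static class Event {
    public String type;
  }

  // a.notFollowedBy(b).followedBy(c): seeing a "b" after an "a" invalidates the
  // partial match, i.e. drives it into a stop state as described above
  public static Pattern<Event, ?> pattern() {
    return Pattern.<Event>begin("a")
        .where(new SimpleCondition<Event>() {
          @Override
          public boolean filter(Event e) {
            return "a".equals(e.type);
          }
        })
        .notFollowedBy("b")
        .where(new SimpleCondition<Event>() {
          @Override
          public boolean filter(Event e) {
            return "b".equals(e.type);
          }
        })
        .followedBy("c")
        .where(new SimpleCondition<Event>() {
          @Override
          public boolean filter(Event e) {
            return "c".equals(e.type);
          }
        });
  }
}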

> On Mar 7, 2018, at 3:20 PM, Vishal Santoshi  wrote:
> 
> Absolutely.  For one a simple m out of n true conditions where n is defined 
> by range is a little under optimized as in just using time(m) will not short 
> circuit the partial patterns till the time range is achieved even if there is 
> no way m true conditions can be achieved ( we already have had n-m false 
> conditions ) . That makes sense as we have defined a within() condition 
> predicated on n. 
> 
> I think the way one would do it is to iterative condition and look at all  
> events  ( including the ones with false but that can be expensive ) and stop 
> a pattern. One question I had is that an NFA can be in a FinalState or a 
> StopState. 
> 
> What would constitute a StopState ? 
> 
> On Wed, Mar 7, 2018 at 8:47 AM, Kostas Kloudas  > wrote:
> Hi Vishal,
> 
> Thanks a lot for sharing your experience and the potential caveats to 
> consider when 
> specifying your pattern.
> 
> I agree that there is room for improvement when it comes to the state 
> checkpointed in Flink.
> We already have some ideas but still, as you also said, the bulk of the space 
> consumption
> comes from the pattern definition, so it could be nice if more people did the 
> same, i.e. sharing 
> their experience, and why not, compiling a guide of things to avoid and put 
> it along the rest
> of FlinkCEP documentation.
> 
> What do you think?
> 
> Kostas
> 
> 
> 
>> On Mar 7, 2018, at 2:34 PM, Vishal Santoshi > > wrote:
>> 
>> Hello all,  There were recent changes to the flink master that I pulled in 
>> and that seems to have solved our issue.  
>> 
>> Few points 
>> 
>> * CEP is heavy as the NFA  transition  matrix   as state which can be  
>> possibly  n^2 ( worst case )  can easily blow up space requirements.  The 
>> after match skip strategy is likely to play a crucial role in keeping the 
>> state lean 
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#after-match-skip-strategy
>>  
>> .
>>   In our case we do not require partial matches within a match to contribute 
>> to another potential match ( noise for us )  and thus SKIP_PAST_LAST_EVENT 
>> was used which on match will prune the SharedBuffer ( almost reset it ) 
>> 
>> * The argument that the pattern events should be lean holds much more in CEP 
>> due to the potential exponential increase in space requirements. 
>> 
>> * The nature of the pattern will require consideration if state does blow up 
>> for you.
>> 
>> Apart from that, I am still not sure why toString() on SharedBuffer was 
>> called to get an OOM to begin with.
>> 
>> 
>> 
>> On Mon, Feb 26, 2018 at 2:09 PM, Vishal Santoshi > > wrote:
>> We could not recreate in a controlled setup, but here are a few notes that 
>> we have gathered on a simple  "times(n),within(..)"
>> 
>> In case where the Event does not create a Final or Stop state
>> 
>> * As an NFA processes an Event, NFA mutates if there is a true Event. Each 
>> computation is a counter that keeps track of partial matches with each true 
>> Event already existent partial match for that computation unit. Essentially 
>> for n Events and if each Event is a true there will be roughly n-1 
>> computations, each with representing an Event from 1 to n-1 ( so 1 or first 
>> will have n-1 events in the partial match, 2 has n-1 events and so on and 
>> n-1 has the last event as a partial match ).
>> 
>> * If the WM progresses  beyond the ts of the 1st computation, that partial 
>> match is pruned.
>> 
>> * It makes sure that a SharedBufferEntry is pruned only if the count of 
>> Edges originating from it reduces to 0 ( the internalRemove() which uses a 
>> Stack) , which should happen as WM keeps progressing to the nth element for 
>> unfulfilled patterns. A "null" ( not a fan ) event is used to establish  a 
>> WM progression 
>> 
>> 
>> In case there is a FinalState  ( and we skipToFirstAfterLast ) 
>> 
>> * The NFA by will prune ( release )  all partial matches and prune the 
>> shared buffer and emit the current match. The computations now should be 
>> empty.
>> 
>> There is a lot to it, but is that roughly what is done in that code ?
>> 
>> 
>> 
>> Few questions. 
>> 
>> * What we have seen is that the call to toString method of SharedBuffer is 
>> where OOM occurs. Now in the code there is no call to a Log so we are not 
>> sure why the method or who calls that method. Surely that is not part of the 

Re: CEP issue

2018-03-07 Thread Vishal Santoshi
Absolutely.  For one, a simple "m out of n true conditions" pattern, where n is
defined by a time range, is a little under-optimized: just using times(m) will not
short-circuit the partial matches until the time range is reached, even if
there is no way m true conditions can still be achieved (we have already had n-m
false conditions). That makes sense, as we have defined a within()
condition predicated on n.

I think the way one would do it is to use an iterative condition and look at all
events (including the false ones, but that can be expensive) and
stop a pattern. One question I had is that an NFA can be in a FinalState or
a StopState.

What would constitute a StopState ?
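For reference, the "m trues within a time range" shape discussed here is typically written as 
in the sketch below. This is an assumption on my part, not the thread's actual code; the 
event type is made up, and the AfterMatchSkipStrategy parameter is the 1.5-era API referenced 
elsewhere in this thread (its package has moved between releases):

import org.apache.flink.cep.nfa.aftermatch.AfterMatchSkipStrategy; // older releases: org.apache.flink.cep.nfa.AfterMatchSkipStrategy
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

public class MOutOfNSketch {

  public static class Event {
    public boolean flag;
  }

  // m true conditions within the given range; SKIP_PAST_LAST_EVENT keeps events of a
  // completed match from feeding further partial matches
  public static Pattern<Event, ?> pattern(int m, Time range) {
    return Pattern.<Event>begin("hits", AfterMatchSkipStrategy.skipPastLastEvent())
        .where(new SimpleCondition<Event>() {
          @Override
          public boolean filter(Event e) {
            return e.flag;
          }
        })
        .times(m)
        .within(range);
  }
}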

On Wed, Mar 7, 2018 at 8:47 AM, Kostas Kloudas 
wrote:

> Hi Vishal,
>
> Thanks a lot for sharing your experience and the potential caveats to
> consider when
> specifying your pattern.
>
> I agree that there is room for improvement when it comes to the state
> checkpointed in Flink.
> We already have some ideas but still, as you also said, the bulk of the
> space consumption
> comes from the pattern definition, so it could be nice if more people did
> the same, i.e. sharing
> their experience, and why not, compiling a guide of things to avoid and
> put it along the rest
> of FlinkCEP documentation.
>
> What do you think?
>
> Kostas
>
>
>
> On Mar 7, 2018, at 2:34 PM, Vishal Santoshi 
> wrote:
>
> Hello all,  There were recent changes to the flink master that I pulled in
> and that *seems* to have solved our issue.
>
> Few points
>
> * CEP is heavy as the NFA  transition  matrix   as state which can be
> possibly  n^2 ( worst case )  can easily blow up space requirements.  The
> after match skip strategy is likely to play a crucial role in keeping the
> state lean https://ci.apache.org/projects/flink/flink-docs-master/
> dev/libs/cep.html#after-match-skip-strategy.  In our case we do not
> require partial matches within a match to contribute to another potential
> match ( noise for us )  and thus *SKIP_PAST_LAST_EVENT *was used which on
> match will prune the SharedBuffer ( almost reset it )
>
> * The argument that the pattern events should be lean holds much more in
> CEP due to the potential exponential increase in space requirements.
>
> * The nature of the pattern will require consideration if state does blow
> up for you.
>
> Apart from that, I am still not sure why toString() on SharedBuffer was
> called to get an OOM to begin with.
>
>
>
> On Mon, Feb 26, 2018 at 2:09 PM, Vishal Santoshi <
> vishal.santo...@gmail.com> wrote:
>
>> We could not recreate in a controlled setup, but here are a few notes
>> that we have gathered on a simple  "times(n),within(..)"
>>
>> In case where the Event does not create a Final or Stop state
>>
>> * As an NFA processes an Event, NFA mutates if there is a true Event.
>> Each computation is a counter that keeps track of partial matches with each
>> true Event already existent partial match for that computation unit.
>> Essentially for n Events and if each Event is a true there will be roughly
>> n-1 computations, each with representing an Event from 1 to n-1 ( so 1 or
>> first will have n-1 events in the partial match, 2 has n-1 events and so on
>> and n-1 has the last event as a partial match ).
>>
>> * If the WM progresses  beyond the ts of the 1st computation, that
>> partial match is pruned.
>>
>> * It makes sure that a SharedBufferEntry is pruned only if the count of
>> Edges originating from it reduces to 0 ( the internalRemove() which uses a
>> Stack) , which should happen as WM keeps progressing to the nth element for
>> unfulfilled patterns. A "null" ( not a fan ) event is used to establish  a
>> WM progression
>>
>>
>> In case there is a FinalState  ( and we skipToFirstAfterLast )
>>
>> * The NFA by will prune ( release )  all partial matches and prune the
>> shared buffer and emit the current match. The computations now should be
>> empty.
>>
>> There is a lot to it, but is that roughly what is done in that code ?
>>
>>
>>
>> Few questions.
>>
>> * What we have seen is that the call to toString method of SharedBuffer
>> is where OOM occurs. Now in the code there is no call to a Log so we are
>> not sure why the method or who calls that method. Surely that is not part
>> of the Seriazation/DeSer routine or is it ( very surprising if it is )
>> * There is no out of the box implementation of "m out of n"  pattern
>> match. We have to resort to n in range ( m * time series slot ) which we
>> do. This is fine but what it does not allow is an optimization where if n
>> false conditions are seen, one can prune.  Simply speaking if n-m  false
>> have been seen there is no way  that out of n there will be ever m trues
>> and thus SharedBuffer can be pruned to the last true seen ( very akin to 
>> skipToFirstAfterLast
>> ).
>>
>> We will keep instrumenting the code ( which apart from the null message
>> is easily understandable ) but would love to hear your 

Re: Simple CEP pattern

2018-03-07 Thread Kostas Kloudas
Hi Esa,

You can always test your pattern in isolation.
For an example on how to do that, if you have access to the flink source code, 
you can check the UntilConditionITCase or any other ITCase in the same package.

It can also be found here: 
https://github.com/apache/flink/blob/master/flink-libraries/flink-cep/src/test/java/org/apache/flink/cep/nfa/UntilConditionITCase.java
 


Kostas

> On Mar 7, 2018, at 2:43 PM, Esa Heikkinen  
> wrote:
> 
> Hi
>  
> I have tried this CEP example of Data-artisans and it works. It is also only 
> fully working example I have found, but it little bit too complex for my 
> purpose. I have also tried to read the documentation, but I have not found 
> about simple “Hello World” type example about Flink’s CEP.
>  
> It is surprisingly difficult to form correct Pattern, because there are many 
> operators and how to combine them by correct way..
> It looks like very simple, but in practice it has not been for me, but this 
> maybe because I am new with FlinkCEP.
>  
> Often I don’t know is it problem with “pattern” or “select”, because no 
> results.. Is there any way to debug CEP’s operations ? 
>  
> Best, Esa
>  
> From: Kostas Kloudas [mailto:k.klou...@data-artisans.com] 
> Sent: Wednesday, March 7, 2018 2:54 PM
> To: Esa Heikkinen 
> Cc: user@flink.apache.org
> Subject: Re: Simple CEP pattern
>  
> Hi Esa,
>  
> You could try the examples either from the documentation or from the training.
> http://training.data-artisans.com/exercises/CEP.html
>  
> Kostas
> 
> 
> On Mar 7, 2018, at 11:32 AM, Esa Heikkinen  
> wrote:
>  
> What would be the simplest working CEP (Scala) pattern ?
> 
> I want to test if my CEP application works at all.
>  
> Best, Esa



Re: CEP issue

2018-03-07 Thread Kostas Kloudas
Hi Vishal,

Thanks a lot for sharing your experience and the potential caveats to consider 
when 
specifying your pattern.

I agree that there is room for improvement when it comes to the state 
checkpointed in Flink.
We already have some ideas but still, as you also said, the bulk of the space 
consumption
comes from the pattern definition, so it would be nice if more people did the 
same, i.e. shared 
their experience and, why not, compiled a guide of things to avoid to put 
alongside the rest
of the FlinkCEP documentation.

What do you think?

Kostas



> On Mar 7, 2018, at 2:34 PM, Vishal Santoshi  wrote:
> 
> Hello all,  There were recent changes to the flink master that I pulled in 
> and that seems to have solved our issue.  
> 
> Few points 
> 
> * CEP is heavy as the NFA  transition  matrix   as state which can be  
> possibly  n^2 ( worst case )  can easily blow up space requirements.  The 
> after match skip strategy is likely to play a crucial role in keeping the 
> state lean 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#after-match-skip-strategy
>  
> .
>   In our case we do not require partial matches within a match to contribute 
> to another potential match ( noise for us )  and thus SKIP_PAST_LAST_EVENT 
> was used which on match will prune the SharedBuffer ( almost reset it ) 
> 
> * The argument that the pattern events should be lean holds much more in CEP 
> due to the potential exponential increase in space requirements. 
> 
> * The nature of the pattern will require consideration if state does blow up 
> for you.
> 
> Apart from that, I am still not sure why toString() on SharedBuffer was 
> called to get an OOM to begin with.
> 
> 
> 
> On Mon, Feb 26, 2018 at 2:09 PM, Vishal Santoshi  > wrote:
> We could not recreate in a controlled setup, but here are a few notes that we 
> have gathered on a simple  "times(n),within(..)"
> 
> In case where the Event does not create a Final or Stop state
> 
> * As an NFA processes an Event, NFA mutates if there is a true Event. Each 
> computation is a counter that keeps track of partial matches with each true 
> Event already existent partial match for that computation unit. Essentially 
> for n Events and if each Event is a true there will be roughly n-1 
> computations, each with representing an Event from 1 to n-1 ( so 1 or first 
> will have n-1 events in the partial match, 2 has n-1 events and so on and n-1 
> has the last event as a partial match ).
> 
> * If the WM progresses  beyond the ts of the 1st computation, that partial 
> match is pruned.
> 
> * It makes sure that a SharedBufferEntry is pruned only if the count of Edges 
> originating from it reduces to 0 ( the internalRemove() which uses a Stack) , 
> which should happen as WM keeps progressing to the nth element for 
> unfulfilled patterns. A "null" ( not a fan ) event is used to establish  a WM 
> progression 
> 
> 
> In case there is a FinalState  ( and we skipToFirstAfterLast ) 
> 
> * The NFA by will prune ( release )  all partial matches and prune the shared 
> buffer and emit the current match. The computations now should be empty.
> 
> There is a lot to it, but is that roughly what is done in that code ?
> 
> 
> 
> Few questions. 
> 
> * What we have seen is that the call to toString method of SharedBuffer is 
> where OOM occurs. Now in the code there is no call to a Log so we are not 
> sure why the method or who calls that method. Surely that is not part of the 
> Seriazation/DeSer routine or is it ( very surprising if it is ) 
> * There is no out of the box implementation of "m out of n"  pattern match. 
> We have to resort to n in range ( m * time series slot ) which we do. This is 
> fine but what it does not allow is an optimization where if n false 
> conditions are seen, one can prune.  Simply speaking if n-m  false have been 
> seen there is no way  that out of n there will be ever m trues and thus 
> SharedBuffer can be pruned to the last true seen ( very akin to 
> skipToFirstAfterLast ).  
> 
> We will keep instrumenting the code ( which apart from the null message is 
> easily understandable ) but would love to hear your feedback. 
> 
> 
> On Tue, Feb 6, 2018 at 12:00 PM, Kostas Kloudas  > wrote:
> Thanks a lot Vishal! 
> 
> We are looking forward to a test case that reproduces the failure.
> 
> Kostas
> 
> 
>> On Feb 2, 2018, at 4:05 PM, Vishal Santoshi > > wrote:
>> 
>> This is the pattern. Will create a test case. 
>> /**
>>  *
>>  * @param condition a single condition is applied as a  acceptance criteria
>>  * @param params defining the 

RE: Simple CEP pattern

2018-03-07 Thread Esa Heikkinen
Hi

I have tried this CEP example from data Artisans and it works. It is also the 
only fully working example I have found, but it is a little too complex for my 
purpose. I have also tried to read the documentation, but I have not found a 
simple "Hello World"-type example for Flink's CEP.

It is surprisingly difficult to form a correct Pattern, because there are many 
operators and many ways to combine them. It looks very simple, but in practice 
it has not been for me, though this may be because I am new to FlinkCEP.

Often I don't know whether the problem is in the "pattern" or in the "select", 
because there are no results. Is there any way to debug CEP's operations?

Best, Esa

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: Wednesday, March 7, 2018 2:54 PM
To: Esa Heikkinen 
Cc: user@flink.apache.org
Subject: Re: Simple CEP pattern

Hi Esa,

You could try the examples either from the documentation or from the training.
http://training.data-artisans.com/exercises/CEP.html

Kostas


On Mar 7, 2018, at 11:32 AM, Esa Heikkinen 
> wrote:

What would be the simplest working CEP (Scala) pattern ?

I want to test if my CEP application works at all.

Best, Esa



Re: Does Flink support stream-stream outer joins in the latest version?

2018-03-07 Thread Hequn Cheng
Hi kant,

You are right. Batch joins require the inputs to be bounded. To join two
unbounded streams without a window, all data will be stored in the join's state,
so a late right row will be joined with the previous left row when it arrives.
As for the state retention time: if the input tables of the join are both keyed
tables and the number of keys is limited, then you don't have to set a state
retention time; otherwise it is suggested to set one.
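
As a concrete illustration, here is a rough Scala sketch of setting that retention time
through the Table API query configuration. The "Orders"/"Shipments" tables are assumed to be
registered elsewhere, and it assumes a Flink version where both the non-windowed join and
StreamQueryConfig#withIdleStateRetentionTime(min, max) are available:

import org.apache.flink.api.common.time.Time
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.{StreamQueryConfig, TableEnvironment}
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

// Assumption: "Orders" and "Shipments" have been registered on tEnv elsewhere.

// Drop join state for keys idle for 12 hours (cleaned up at the latest after 24 hours).
val qConfig: StreamQueryConfig = tEnv.queryConfig
  .withIdleStateRetentionTime(Time.hours(12), Time.hours(24))

val joined = tEnv.sqlQuery(
  "SELECT * FROM Orders o, Shipments s WHERE o.id = s.orderId")

// The retention setting takes effect when the continuous query is emitted.
joined.toRetractStream[Row](qConfig).print()

env.execute("non-windowed-join-with-retention")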

Best, Hequn

On Wed, Mar 7, 2018 at 8:02 PM, kant kodali  wrote:

> Hi Cheng,
>
> The docs here
> 
>  states
> full outer joins are only available for batch (I am not sure if I am
> reading that correctly). I am trying to understand how two unbounded
> streams can be joined like a batch? If we have to do batch join then it *must
> be* bounded right? If so, how do we bound? I can think Time Window is one
> way to bound but other than that if I execute the below join query on the
> unbounded stream I am not even sure how that works? A row from one table
> can join with a row from another table and that row can come anytime in
> future right if it is unbounded. so I am sorry I am failing to understand.
>
>
> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id = s.orderId
>
> Thanks!
>
> On Wed, Mar 7, 2018 at 3:49 AM, Hequn Cheng  wrote:
>
>> Hi kant,
>>
>> It seems that you mean the Time-windowed Join. The Time-windowed Joins are 
>> supported
>> now. You can check more details with the docs given by Xingcan.
>> As for the non-window join, it is used to join two unbounded stream and
>> the semantic is very like batch join.
>>
>> Time-windowed Join:
>>
>>> SELECT *
>>> FROM Orders o, Shipments s
>>> WHERE o.id = s.orderId AND
>>>   o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
>>
>>
>> Non-windowed Join:
>>
>>> SELECT *
>>> FROM Orders o, Shipments s
>>> WHERE o.id = s.orderId
>>
>>
>> On Wed, Mar 7, 2018 at 7:02 PM, kant kodali  wrote:
>>
>>> Hi!
>>>
>>> Thanks for all this. and yes I was indeed talking about SQL/Table API so
>>> I will keep track of these tickets! BTW, What is non-windowed Join? I
>>> thought stream-stream-joins by default is a stateful operation so it has to
>>> be within some time window right? Also does the output of stream-stream
>>> joins emit every time so we can see the state of the join at any given time
>>> or only when the watermark elapses and join result fully materializes?
>>>
>>> On a side note, Full outer join seems to be the most useful for my use
>>> case. so the moment its available in master I can start playing and testing
>>> it!
>>>
>>> On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng 
>>> wrote:
>>>
 Hi Kant,

 The stream-stream outer joins are work in progress
 now(left/right/full), and will probably be ready before the end of this
 month. You can check the progress from[1].

 Best, Hequn

 [1] https://issues.apache.org/jira/browse/FLINK-5878

 On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui  wrote:

> Hi Kant,
>
> I suppose you refer to the stream join in SQL/Table API since the
> outer join for windowed-streams can always be achieved with the
> `JoinFunction` in DataStream API.
>
> There are two kinds of stream joins, namely, the time-windowed join
> and the non-windowed join in Flink SQL/Table API [1, 2]. The
> time-windowed outer join has been supported since version 1.5 and the
> non-windowed outer join is still work in progress.
>
> Hope that helps.
>
> Best,
> Xingcan
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/d
> ev/table/tableApi.html#joins
> [2] https://ci.apache.org/projects/flink/flink-docs-master/d
> ev/table/sql.html#joins
>
>
> On 7 Mar 2018, at 12:45 AM, kant kodali  wrote:
>
> Hi All,
>
> Does Flink support stream-stream outer joins in the latest version?
>
> Thanks!
>
>
>

>>>
>>
>


Re: CEP issue

2018-03-07 Thread Vishal Santoshi
Hello all. There were recent changes to the Flink master that I pulled in,
and they *seem* to have solved our issue.

Few points

* CEP is heavy: the NFA transition matrix kept as state, which can be
possibly n^2 in the worst case, can easily blow up space requirements. The
after-match skip strategy is likely to play a crucial role in keeping the
state lean: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#after-match-skip-strategy.
In our case we do not require partial matches within a match to contribute
to another potential match (noise for us), and thus *SKIP_PAST_LAST_EVENT*
was used, which on a match will prune the SharedBuffer (almost reset it).
(A rough sketch of wiring this up follows these points.)

* The argument that the pattern events should be lean holds much more in
CEP due to the potential exponential increase in space requirements.

* The nature of the pattern will require consideration if state does blow
up for you.
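
Regarding the skip strategy mentioned in the first point, here is a rough Scala sketch of
attaching it when the pattern is defined. The import path of AfterMatchSkipStrategy and the
begin(name, skipStrategy) overload have moved between releases, so treat those details as
assumptions rather than a definitive recipe:

import org.apache.flink.cep.nfa.AfterMatchSkipStrategy
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

case class Event(id: Long, ok: Boolean)

val env = StreamExecutionEnvironment.getExecutionEnvironment
val events = env.fromElements(Event(1, true), Event(1, true), Event(1, true))

// Partial matches overlapping a completed match are discarded, keeping the SharedBuffer small.
val skip = AfterMatchSkipStrategy.skipPastLastEvent()

val pattern = Pattern
  .begin[Event]("hits", skip)   // assumed overload: begin(name, afterMatchSkipStrategy)
  .where(_.ok)
  .times(3)
  .within(Time.minutes(10))

CEP.pattern(events, pattern)
  .select(m => m("hits").size)  // map of pattern state name -> matched events
  .print()

env.execute("skip-past-last-event")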

Apart from that, I am still not sure why toString() on SharedBuffer was
called to get an OOM to begin with.



On Mon, Feb 26, 2018 at 2:09 PM, Vishal Santoshi 
wrote:

> We could not recreate in a controlled setup, but here are a few notes that
> we have gathered on a simple  "times(n),within(..)"
>
> In case where the Event does not create a Final or Stop state
>
> * As an NFA processes an Event, NFA mutates if there is a true Event. Each
> computation is a counter that keeps track of partial matches with each true
> Event already existent partial match for that computation unit. Essentially
> for n Events and if each Event is a true there will be roughly n-1
> computations, each with representing an Event from 1 to n-1 ( so 1 or first
> will have n-1 events in the partial match, 2 has n-1 events and so on and
> n-1 has the last event as a partial match ).
>
> * If the WM progresses  beyond the ts of the 1st computation, that partial
> match is pruned.
>
> * It makes sure that a SharedBufferEntry is pruned only if the count of
> Edges originating from it reduces to 0 ( the internalRemove() which uses a
> Stack) , which should happen as WM keeps progressing to the nth element for
> unfulfilled patterns. A "null" ( not a fan ) event is used to establish  a
> WM progression
>
>
> In case there is a FinalState  ( and we skipToFirstAfterLast )
>
> * The NFA by will prune ( release )  all partial matches and prune the
> shared buffer and emit the current match. The computations now should be
> empty.
>
> There is a lot to it, but is that roughly what is done in that code ?
>
>
>
> Few questions.
>
> * What we have seen is that the call to toString method of SharedBuffer is
> where OOM occurs. Now in the code there is no call to a Log so we are not
> sure why the method or who calls that method. Surely that is not part of
> the Seriazation/DeSer routine or is it ( very surprising if it is )
> * There is no out of the box implementation of "m out of n"  pattern
> match. We have to resort to n in range ( m * time series slot ) which we
> do. This is fine but what it does not allow is an optimization where if n
> false conditions are seen, one can prune.  Simply speaking if n-m  false
> have been seen there is no way  that out of n there will be ever m trues
> and thus SharedBuffer can be pruned to the last true seen ( very akin to 
> skipToFirstAfterLast
> ).
>
> We will keep instrumenting the code ( which apart from the null message is
> easily understandable ) but would love to hear your feedback.
>
>
> On Tue, Feb 6, 2018 at 12:00 PM, Kostas Kloudas <
> k.klou...@data-artisans.com> wrote:
>
>> Thanks a lot Vishal!
>>
>> We are looking forward to a test case that reproduces the failure.
>>
>> Kostas
>>
>>
>> On Feb 2, 2018, at 4:05 PM, Vishal Santoshi 
>> wrote:
>>
>> This is the pattern. Will create a test case.
>>
>> /**
>>  * @param condition a single condition applied as an acceptance criterion
>>  * @param params defining the bounds of the pattern.
>>  * @param  the element in the stream
>>  * @return compiled pattern along with the params.
>>  */
>> public static  RelaxedContiguousPattern of(SimpleCondition condition,
>>                                            RelaxedContiguityWithinTime params,
>>                                            RichMapFunction, List> mapFunc,
>>                                            String patternId) {
>>     assert (params.seriesLength >= params.elementCount && params.elementCount > 0);
>>     Pattern pattern = Pattern.begin(START).where(condition);
>>     if (params.elementCount > 1) {
>>         pattern = pattern.followedBy(REST).where(condition).times(params.elementCount - 1);
>>     }
>>     return new RelaxedContiguousPattern(

Re: Does Flink support stream-stream outer joins in the latest version?

2018-03-07 Thread Xingcan Cui
Hi Kant,

the non-windowed stream-stream join is not equivalent to a full-history join, 
though they share the same SQL form. The retention time for records must be set 
to balance storage consumption against the completeness of the results.

Best,
Xingcan

> On 7 Mar 2018, at 8:02 PM, kant kodali  wrote:
> 
> Hi Cheng,
> 
> The docs here 
> 
>  states full outer joins are only available for batch (I am not sure if I am 
> reading that correctly). I am trying to understand how two unbounded streams 
> can be joined like a batch? If we have to do batch join then it must be 
> bounded right? If so, how do we bound? I can think Time Window is one way to 
> bound but other than that if I execute the below join query on the unbounded 
> stream I am not even sure how that works? A row from one table can join with 
> a row from another table and that row can come anytime in future right if it 
> is unbounded. so I am sorry I am failing to understand.
> 
> 
> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id  = s.orderId
> 
> Thanks!
> 
> On Wed, Mar 7, 2018 at 3:49 AM, Hequn Cheng  > wrote:
> Hi kant,
> 
> It seems that you mean the Time-windowed Join. The Time-windowed Joins are 
> supported now. You can check more details with the docs given by Xingcan.
> As for the non-window join, it is used to join two unbounded stream and the 
> semantic is very like batch join.
> 
> Time-windowed Join:
> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id  = s.orderId AND
>   o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
>  
> Non-windowed Join:
> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id  = s.orderId
> 
> On Wed, Mar 7, 2018 at 7:02 PM, kant kodali  > wrote:
> Hi! 
> 
> Thanks for all this. and yes I was indeed talking about SQL/Table API so I 
> will keep track of these tickets! BTW, What is non-windowed Join? I thought 
> stream-stream-joins by default is a stateful operation so it has to be within 
> some time window right? Also does the output of stream-stream joins emit 
> every time so we can see the state of the join at any given time or only when 
> the watermark elapses and join result fully materializes? 
> 
> On a side note, Full outer join seems to be the most useful for my use case. 
> so the moment its available in master I can start playing and testing it!
> 
> On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng  > wrote:
> Hi Kant,
> 
> The stream-stream outer joins are work in progress now(left/right/full), and 
> will probably be ready before the end of this month. You can check the 
> progress from[1]. 
> 
> Best, Hequn
> 
> [1] https://issues.apache.org/jira/browse/FLINK-5878 
> 
> 
> On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui  > wrote:
> Hi Kant,
> 
> I suppose you refer to the stream join in SQL/Table API since the outer join 
> for windowed-streams can always be achieved with the `JoinFunction` in 
> DataStream API.
> 
> There are two kinds of stream joins, namely, the time-windowed join and the 
> non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed outer join 
> has been supported since version 1.5 and the non-windowed outer join is still 
> work in progress.
> 
> Hope that helps.
> 
> Best,
> Xingcan
> 
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins
>  
> 
> [2] 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#joins
>  
> 
> 
> 
>> On 7 Mar 2018, at 12:45 AM, kant kodali > > wrote:
>> 
>> Hi All,
>> 
>> Does Flink support stream-stream outer joins in the latest version?
>> 
>> Thanks!
> 
> 
> 
> 
> 



RE: CsvTableSource Types.TIMESTAMP

2018-03-07 Thread Esa Heikkinen
Hi

It works now. It was because of the missing “import”. Thank you.
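
For completeness, here is a sketch of the imports that make that expression syntax compile
(the tEnv and the registered "customers" table are assumed from the earlier example):

import org.apache.flink.api.scala._
import org.apache.flink.table.api.scala._ // brings the 'field syntax, isNotNull, toTimestamp, etc.

// Assumption: tEnv is a StreamTableEnvironment with a "customers" table registered.
val table = tEnv
  .scan("customers")
  .filter('name.isNotNull && 'last_update > "2016-01-01 00:00:00".toTimestamp)
  .select('id, 'name.lowerCase(), 'prefs)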

Best, Esa

From: Hequn Cheng [mailto:chenghe...@gmail.com]
Sent: Wednesday, March 7, 2018 3:00 PM
To: Esa Heikkinen 
Cc: Timo Walther ; user@flink.apache.org
Subject: Re: CsvTableSource Types.TIMESTAMP

Hi Esa,

Have you ever imported org.apache.flink.table.api.scala._ ?  There are some 
examples here[1].

Best, Hequn

[1] 
https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/stream/table/TableSinkITCase.scala

On Tue, Mar 6, 2018 at 5:24 PM, Esa Heikkinen 
> wrote:
Hi

Thank you, it worked, but there was another problem now in same example.

How to use .filter():

val table = tEnv
.scan("customers")
.filter('name.isNotNull && 'last_update > "2016-01-01 00:00:00".toTimestamp)
.select('id, 'name.lowerCase(), 'prefs)

Error in compiling: “Value > is not member of Symbol”

Is that syntactically correct, may it be problem with “imports” or is it 
deprecated ?

Best, Esa

From: Timo Walther [mailto:twal...@apache.org]
Sent: Monday, March 5, 2018 3:15 PM
To: user@flink.apache.org
Subject: Re: CsvTableSource Types.TIMESTAMP

Hi,

SQL_TIMESTAMP is the same. A couple of months ago it was decided to rename this 
property such that it can be used for timestamps with timezone support in the 
future.

Regards,
Tiom


Am 3/5/18 um 2:10 PM schrieb Esa Heikkinen:
I have tried to following example to work, but no succeed yet.

https://flink.apache.org/news/2017/03/29/table-sql-api-update.html

Error .. value TIMESTAMP is not a member of object 
org.apache.glink.table.api.Types

What would be the problem ?

What the imports should I use ?

Or should I use SQL_TIMESTAMP instead of it ? is it same ?

Best, Esa





Re: CsvTableSource Types.TIMESTAMP

2018-03-07 Thread Hequn Cheng
Hi Esa,

Have you ever imported org.apache.flink.table.api.scala._ ?  There are some
examples here[1].

Best, Hequn

[1]
https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/stream/table/TableSinkITCase.scala

On Tue, Mar 6, 2018 at 5:24 PM, Esa Heikkinen 
wrote:

> Hi
>
>
>
> Thank you, it worked, but there was another problem now in same example.
>
>
>
> How to use .filter():
>
>
>
> val table = tEnv
>
> .scan("customers")
>
> .filter('name.isNotNull && 'last_update > "2016-01-01 00:00:00".
> toTimestamp)
>
> .select('id, 'name.lowerCase(), 'prefs)
>
>
>
> Error in compiling: “Value > is not member of Symbol”
>
>
>
> Is that syntactically correct, may it be problem with “imports” or is it
> deprecated ?
>
>
>
> Best, Esa
>
>
>
> *From:* Timo Walther [mailto:twal...@apache.org]
> *Sent:* Monday, March 5, 2018 3:15 PM
> *To:* user@flink.apache.org
> *Subject:* Re: CsvTableSource Types.TIMESTAMP
>
>
>
> Hi,
>
> SQL_TIMESTAMP is the same. A couple of months ago it was decided to rename
> this property such that it can be used for timestamps with timezone support
> in the future.
>
> Regards,
> Tiom
>
>
> Am 3/5/18 um 2:10 PM schrieb Esa Heikkinen:
>
> I have tried to following example to work, but no succeed yet.
>
>
>
> https://flink.apache.org/news/2017/03/29/table-sql-api-update.html
>
>
>
> Error .. value TIMESTAMP is not a member of object
> org.apache.glink.table.api.Types
>
>
>
> What would be the problem ?
>
>
>
> What the imports should I use ?
>
>
>
> Or should I use SQL_TIMESTAMP instead of it ? is it same ?
>
>
>
> Best, Esa
>
>
>


Re: Simple CEP pattern

2018-03-07 Thread Kostas Kloudas
Hi Esa,

You could try the examples either from the documentation or from the training.
http://training.data-artisans.com/exercises/CEP.html 
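
If you just want to verify that events flow through CEP at all, a minimal self-contained
Scala sketch along these lines (the Event type, values and threshold are invented for
illustration) is usually enough:

import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._

object SimpleCep {

  case class Event(id: Long, value: Double)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val events = env.fromElements(Event(1, 10.0), Event(1, 42.0), Event(2, 77.0))

    // Single-state pattern: match any event whose value exceeds 40.
    val pattern = Pattern.begin[Event]("start").where(_.value > 40.0)

    CEP.pattern(events, pattern)
      .select(matched => matched("start").head) // map: pattern state name -> matched events
      .print()

    env.execute("simple-cep")
  }
}

If nothing is printed, the usual suspects are a condition that never evaluates to true or,
for event-time patterns, watermarks that do not advance.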


Kostas

> On Mar 7, 2018, at 11:32 AM, Esa Heikkinen  
> wrote:
> 
> What would be the simplest working CEP (Scala) pattern ?
> 
> I want to test if my CEP application works at all.
>  
> Best, Esa



Re: Does Flink support stream-stream outer joins in the latest version?

2018-03-07 Thread kant kodali
Hi Cheng,

The docs here state that full outer joins are only available for batch (I am
not sure if I am reading that correctly). I am trying to understand how two
unbounded streams can be joined like a batch. If we have to do a batch join
then the input *must be* bounded, right? If so, how do we bound it? I can see
that a time window is one way to bound, but other than that, if I execute the
join query below on an unbounded stream I am not even sure how that works. A
row from one table can join with a row from another table, and that row can
arrive at any time in the future if the stream is unbounded, so I am sorry, I
am failing to understand.


SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId

Thanks!

On Wed, Mar 7, 2018 at 3:49 AM, Hequn Cheng  wrote:

> Hi kant,
>
> It seems that you mean the Time-windowed Join. The Time-windowed Joins are 
> supported
> now. You can check more details with the docs given by Xingcan.
> As for the non-window join, it is used to join two unbounded stream and
> the semantic is very like batch join.
>
> Time-windowed Join:
>
>> SELECT *
>> FROM Orders o, Shipments s
>> WHERE o.id = s.orderId AND
>>   o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
>
>
> Non-windowed Join:
>
>> SELECT *
>> FROM Orders o, Shipments s
>> WHERE o.id = s.orderId
>
>
> On Wed, Mar 7, 2018 at 7:02 PM, kant kodali  wrote:
>
>> Hi!
>>
>> Thanks for all this. and yes I was indeed talking about SQL/Table API so
>> I will keep track of these tickets! BTW, What is non-windowed Join? I
>> thought stream-stream-joins by default is a stateful operation so it has to
>> be within some time window right? Also does the output of stream-stream
>> joins emit every time so we can see the state of the join at any given time
>> or only when the watermark elapses and join result fully materializes?
>>
>> On a side note, Full outer join seems to be the most useful for my use
>> case. so the moment its available in master I can start playing and testing
>> it!
>>
>> On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng 
>> wrote:
>>
>>> Hi Kant,
>>>
>>> The stream-stream outer joins are work in progress now(left/right/full),
>>> and will probably be ready before the end of this month. You can check the
>>> progress from[1].
>>>
>>> Best, Hequn
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-5878
>>>
>>> On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui  wrote:
>>>
 Hi Kant,

 I suppose you refer to the stream join in SQL/Table API since the outer
 join for windowed-streams can always be achieved with the `JoinFunction` in
 DataStream API.

 There are two kinds of stream joins, namely, the time-windowed join and
 the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed
 outer join has been supported since version 1.5 and the non-windowed outer
 join is still work in progress.

 Hope that helps.

 Best,
 Xingcan

 [1] https://ci.apache.org/projects/flink/flink-docs-master/d
 ev/table/tableApi.html#joins
 [2] https://ci.apache.org/projects/flink/flink-docs-master/d
 ev/table/sql.html#joins


 On 7 Mar 2018, at 12:45 AM, kant kodali  wrote:

 Hi All,

 Does Flink support stream-stream outer joins in the latest version?

 Thanks!



>>>
>>
>


Re: Does Flink support stream-stream outer joins in the latest version?

2018-03-07 Thread Hequn Cheng
Hi kant,

It seems that you mean the Time-windowed Join. The Time-windowed Joins
are supported
now. You can check more details with the docs given by Xingcan.
As for the non-window join, it is used to join two unbounded stream
and the semantic
is very like batch join.

Time-windowed Join:

> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id = s.orderId AND
>   o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime


Non-windowed Join:

> SELECT *
> FROM Orders o, Shipments s
> WHERE o.id = s.orderId
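
To make the difference concrete, here is a rough Scala Table API sketch of issuing the
time-windowed variant end to end (the table names and schemas are invented, and it assumes
a Flink version in which time-windowed SQL joins are available):

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

case class Order(id: Long, ordertime: Long)
case class Shipment(orderId: Long, shiptime: Long)

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val tEnv = TableEnvironment.getTableEnvironment(env)

val orders = env.fromElements(Order(1L, 1000L), Order(2L, 5000L))
  .assignAscendingTimestamps(_.ordertime)
val shipments = env.fromElements(Shipment(1L, 4000L))
  .assignAscendingTimestamps(_.shiptime)

// '.rowtime exposes the event-time attribute to SQL; it is what bounds the join state.
tEnv.registerTable("Orders", orders.toTable(tEnv, 'id, 'ordertime.rowtime))
tEnv.registerTable("Shipments", shipments.toTable(tEnv, 'orderId, 'shiptime.rowtime))

val joined = tEnv.sqlQuery(
  """SELECT *
    |FROM Orders o, Shipments s
    |WHERE o.id = s.orderId AND
    |  o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime""".stripMargin)

joined.toAppendStream[Row].print()
env.execute("time-windowed-join")

The non-windowed form is the same query without the BETWEEN predicate; because nothing
bounds the state there, that is where the state retention configuration discussed earlier
in the thread becomes important.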


On Wed, Mar 7, 2018 at 7:02 PM, kant kodali  wrote:

> Hi!
>
> Thanks for all this. and yes I was indeed talking about SQL/Table API so I
> will keep track of these tickets! BTW, What is non-windowed Join? I
> thought stream-stream-joins by default is a stateful operation so it has to
> be within some time window right? Also does the output of stream-stream
> joins emit every time so we can see the state of the join at any given time
> or only when the watermark elapses and join result fully materializes?
>
> On a side note, Full outer join seems to be the most useful for my use
> case. so the moment its available in master I can start playing and testing
> it!
>
> On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng  wrote:
>
>> Hi Kant,
>>
>> The stream-stream outer joins are work in progress now(left/right/full),
>> and will probably be ready before the end of this month. You can check the
>> progress from[1].
>>
>> Best, Hequn
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-5878
>>
>> On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui  wrote:
>>
>>> Hi Kant,
>>>
>>> I suppose you refer to the stream join in SQL/Table API since the outer
>>> join for windowed-streams can always be achieved with the `JoinFunction` in
>>> DataStream API.
>>>
>>> There are two kinds of stream joins, namely, the time-windowed join and
>>> the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed
>>> outer join has been supported since version 1.5 and the non-windowed outer
>>> join is still work in progress.
>>>
>>> Hope that helps.
>>>
>>> Best,
>>> Xingcan
>>>
>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/d
>>> ev/table/tableApi.html#joins
>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/d
>>> ev/table/sql.html#joins
>>>
>>>
>>> On 7 Mar 2018, at 12:45 AM, kant kodali  wrote:
>>>
>>> Hi All,
>>>
>>> Does Flink support stream-stream outer joins in the latest version?
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>


Re: Does Flink support stream-stream outer joins in the latest version?

2018-03-07 Thread kant kodali
Hi!

Thanks for all this, and yes, I was indeed talking about the SQL/Table API, so
I will keep track of these tickets! BTW, what is a non-windowed join? I thought
stream-stream joins are by default a stateful operation, so they have to be
within some time window, right? Also, does the output of a stream-stream join
emit every time, so we can see the state of the join at any given time, or only
when the watermark elapses and the join result fully materializes?

On a side note, a full outer join seems to be the most useful for my use case,
so the moment it's available in master I can start playing with it and testing
it!

On Tue, Mar 6, 2018 at 10:39 PM, Hequn Cheng  wrote:

> Hi Kant,
>
> The stream-stream outer joins are work in progress now(left/right/full),
> and will probably be ready before the end of this month. You can check the
> progress from[1].
>
> Best, Hequn
>
> [1] https://issues.apache.org/jira/browse/FLINK-5878
>
> On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui  wrote:
>
>> Hi Kant,
>>
>> I suppose you refer to the stream join in SQL/Table API since the outer
>> join for windowed-streams can always be achieved with the `JoinFunction` in
>> DataStream API.
>>
>> There are two kinds of stream joins, namely, the time-windowed join and
>> the non-windowed join in Flink SQL/Table API [1, 2]. The time-windowed
>> outer join has been supported since version 1.5 and the non-windowed outer
>> join is still work in progress.
>>
>> Hope that helps.
>>
>> Best,
>> Xingcan
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-master/
>> dev/table/tableApi.html#joins
>> [2] https://ci.apache.org/projects/flink/flink-docs-master/
>> dev/table/sql.html#joins
>>
>>
>> On 7 Mar 2018, at 12:45 AM, kant kodali  wrote:
>>
>> Hi All,
>>
>> Does Flink support stream-stream outer joins in the latest version?
>>
>> Thanks!
>>
>>
>>
>


Simple CEP pattern

2018-03-07 Thread Esa Heikkinen
What would be the simplest working CEP (Scala) pattern ?

I want to test if my CEP application works at all.

Best, Esa


Re: Rest APIs

2018-03-07 Thread Chesnay Schepler
The jobmanager.web.upload.dir option only affects where jars 
submitted through the WebUI/REST API are stored.


As for uploading jars directly through the REST API, this 
may be useful to you.


On 07.03.2018 01:37, Daniele Foroni wrote:

Hi guys,

I am using flink 1.4.1 and I am working with the rest api.
If I run a flink job through the command line (./bin/flink run 
job.jar) is it uploaded to the folder set in variable 
jobmanager.web.upload.dir? It seems no.
So, through the rest api I can cancel the job creating a savepoint, 
but I don’t have the data to restart the same jar from the build 
savepoint (since I cannot retrieve the jar id). Am I right?

Another problem is that I built a java rest client for uploading a jar 
file, but somehow it doesn’t work.

Is there any working code example or available api that I can use?

Thank you all in advance,
---
Daniele





Re: Flink is looking for Kafka topic "n/a"

2018-03-07 Thread Mu Kong
Hi Gordon,

Thanks for your response.
I think I've misspoken about the failure after the "n/a" exception.
The behavior after this exception would be:

switched from RUNNING to CANCELING
switched from CANCELING to CANCELED
Try to restart or fail the job "X" () if no
longer possible.
switched from state FAILING to RESTARTING
Restarting the job "X" ()
Recovering checkpoints from ZooKeeper
Found 1 checkpoints in ZooKeeper
Trying to retrieve checkpoint 1091
Restoring from latest valid checkpoint: Checkpoint 1091 @
 for 
switched from CREATED to SCHEDULED
switched from SCHEDULED to DEPLOYING
switched from DEPLOYING to RUNNING
(several check pointings)
switched from RUNNING to FAILED
TimerException{java.io.EOFException:Premature EOF: no length prefix
available}
at
org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:219)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException: Premature EOF: no length prefix available
at
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1347)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

Since there were several successful checkpoints after the restart, I think the
later failure might be something else.
Also, could you please share more information about the MARKER in the code?
For example, which piece of code should I look at?

And thanks for the suggestion to upgrade Flink to 1.3.2.

Best regards,
Mu


On Wed, Mar 7, 2018 at 3:04 PM, Tzu-Li Tai  wrote:

> Hi Mu,
>
> You mentioned that the job stopped after the "n/a" topic error, but the job
> failed to recover.
> What exception did you encounter in the restart executions? Was it the same
> error?
> This would verify if we actually should be removing more than one of these
> special MARKER partition states.
>
> On the other hand, if I recall correctly, the Kafka consumer had a severe
> bug in 1.3.0 which could lead to potential duplicate data, which was fixed
> in 1.3.2. Though I don't think it is related to the error you encountered,
> I
> strongly recommend that you use 1.3.2 instead.
>
> Cheers,
> Gordon
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.
> n4.nabble.com/
>