[jira] [Commented] (FLINK-3109) Join two streams with two different buffer time

2016-01-11 Thread Wang Yangjun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091769#comment-15091769
 ] 

Wang Yangjun commented on FLINK-3109:
-

Over the last month, this implementation has worked well in my own work. I 
think the feature could be contributed. 

> Join two streams with two different buffer time
> ---
>
> Key: FLINK-3109
> URL: https://issues.apache.org/jira/browse/FLINK-3109
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 0.10.1
> Reporter: Wang Yangjun
> Labels: easyfix, patch
> Fix For: 0.10.2
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Flink streaming currently only supports joining two streams on the same 
> window. How can this be solved?
> For example, there are two streams. One is the advertisements shown to users; 
> its tuples can be described as (id, shown timestamp). The other is the click 
> stream, with tuples (id, clicked timestamp). We want to get a joined stream 
> that includes every advertisement clicked by a user within 20 minutes after 
> it was shown.
> It is possible that a user clicks an advertisement immediately after it is 
> shown. It is also possible that the "click" message arrives at the server 
> earlier than the "show" message because of Internet delay. We assume that the 
> maximum delay is one minute.
> So we need to always keep a buffer (20 minutes) of the "show" stream and 
> another buffer (1 minute) of the "click" stream.
> It would be great to have an API like this:
> showStream.join(clickStream)
>     .where(keySelector)
>     .buffer(Time.of(20, TimeUnit.MINUTES))
>     .equalTo(keySelector)
>     .buffer(Time.of(1, TimeUnit.MINUTES))
>     .apply(JoinFunction)
> http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-3109) Join two streams with two different buffer time

2015-12-04 Thread Wang Yangjun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042530#comment-15042530
 ] 

Wang Yangjun edited comment on FLINK-3109 at 12/5/15 1:28 AM:
--

After implementing this feature, no existing API changes. 
We just need to add one operator under the package 
org.apache.flink.streaming.api.operators and make some modifications to the class 
org.apache.flink.streaming.api.datastream.JoinedStreams.

https://github.com/apache/flink/compare/master...wangyangjun:master



was (Author: yangjun.wan...@gmail.com):
After implementing this feature, no existing API changes. 
We just need to add one operator under the package 
org.apache.flink.streaming.api.operators and make some modifications to the class 
org.apache.flink.streaming.api.datastream.JoinedStreams.



[jira] [Updated] (FLINK-3109) Join two streams with two different buffer time

2015-12-04 Thread Wang Yangjun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Yangjun updated FLINK-3109:

Summary: Join two streams with two different buffer time  (was: Join two 
streams with two different cache time)



[jira] [Commented] (FLINK-3109) Join two streams with two different cache time

2015-12-04 Thread Wang Yangjun (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042530#comment-15042530
 ] 

Wang Yangjun commented on FLINK-3109:
-

After implementing this feature, no existing API changes. 
We just need to add one operator under the package 
org.apache.flink.streaming.api.operators and make some modifications to the class 
org.apache.flink.streaming.api.datastream.JoinedStreams.



[jira] [Created] (FLINK-3109) Join two streams with two different cache time

2015-12-03 Thread Wang Yangjun (JIRA)
Wang Yangjun created FLINK-3109:
---

 Summary: Join two streams with two different cache time
 Key: FLINK-3109
 URL: https://issues.apache.org/jira/browse/FLINK-3109
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.10.1
Reporter: Wang Yangjun
 Fix For: 0.10.2


Flink streaming currently only supports joining two streams on the same window. 
How can this be solved?
For example, there are two streams. One is the advertisements shown to users; 
its tuples can be described as (id, shown timestamp). The other is the click 
stream, with tuples (id, clicked timestamp). We want to get a joined stream 
that includes every advertisement clicked by a user within 20 minutes after it 
was shown.
It is possible that a user clicks an advertisement immediately after it is 
shown. The "click" message could arrive at the server earlier than the "show" 
message because of Internet delay. We assume that the maximum delay is one 
minute.
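
To illustrate the limitation described above, here is a minimal plain-Java 
sketch (not Flink code). It assumes 20-minute tumbling windows aligned at time 
0; the class and method names are made up for this illustration. A show at 
minute 19 and a click at minute 21 are only two minutes apart, yet they fall 
into different windows, so a join restricted to one window would not pair them.

```java
/** Illustrative only: why a single aligned 20-minute window can miss a show/click pair. */
class WindowBoundaryExample {
    static final long WINDOW_MS = 20 * 60 * 1000L;

    /** Index of the tumbling window containing the timestamp (windows assumed aligned at time 0). */
    static long windowIndex(long timestampMs) {
        return Math.floorDiv(timestampMs, WINDOW_MS);
    }

    /** A same-window join can pair two elements only if both fall into the same window. */
    static boolean sameWindow(long showMs, long clickMs) {
        return windowIndex(showMs) == windowIndex(clickMs);
    }

    public static void main(String[] args) {
        long show = 19 * 60 * 1000L;  // shown at minute 19
        long click = 21 * 60 * 1000L; // clicked two minutes later, at minute 21
        // Prints "false": the pair straddles the boundary at minute 20.
        System.out.println(sameWindow(show, click));
    }
}
```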

So we need to always keep a buffer (20 minutes) of the "show" stream and 
another buffer (1 minute) of the "click" stream.

It would be great to have an API like this:
showStream.join(clickStream)
    .where(keySelector)
    .buffer(Time.of(20, TimeUnit.MINUTES))
    .equalTo(keySelector)
    .buffer(Time.of(1, TimeUnit.MINUTES))
    .apply(JoinFunction)

http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149
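
The semantics requested above could be sketched in plain Java as follows. This 
is only a simulation under stated assumptions, not the proposed Flink operator 
and not existing Flink API: TwoBufferJoin, onShow and onClick are hypothetical 
names, times are arrival times in milliseconds and are assumed to be 
non-decreasing across calls, and matching is by id only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical simulation of a join that buffers each input for a different time. */
class TwoBufferJoin {
    static final long SHOW_BUFFER_MS = 20 * 60 * 1000L; // keep "show" events for 20 minutes
    static final long CLICK_BUFFER_MS = 60 * 1000L;     // keep "click" events for 1 minute

    /** A buffered element: advertisement id plus arrival time in milliseconds. */
    private static final class Event {
        final String id;
        final long time;
        Event(String id, long time) { this.id = id; this.time = time; }
    }

    private final Deque<Event> shows = new ArrayDeque<>();
    private final Deque<Event> clicks = new ArrayDeque<>();
    private final List<String> results = new ArrayList<>();

    /** A "show" arrives: drop clicks older than 1 minute, then join with the remaining clicks. */
    void onShow(String id, long time) {
        evictOlderThan(clicks, time - CLICK_BUFFER_MS);
        for (Event click : clicks) {
            if (click.id.equals(id)) {
                results.add(id + "@" + time + "/" + click.time);
            }
        }
        shows.addLast(new Event(id, time));
    }

    /** A "click" arrives: drop shows older than 20 minutes, then join with the remaining shows. */
    void onClick(String id, long time) {
        evictOlderThan(shows, time - SHOW_BUFFER_MS);
        for (Event show : shows) {
            if (show.id.equals(id)) {
                results.add(id + "@" + show.time + "/" + time);
            }
        }
        clicks.addLast(new Event(id, time));
    }

    /** Joined pairs so far, formatted as id@showTime/clickTime. */
    List<String> results() {
        return results;
    }

    /** Buffers are in arrival order, so expired elements are always at the head. */
    private static void evictOlderThan(Deque<Event> buffer, long cutoff) {
        while (!buffer.isEmpty() && buffer.peekFirst().time < cutoff) {
            buffer.removeFirst();
        }
    }

    public static void main(String[] args) {
        TwoBufferJoin join = new TwoBufferJoin();
        join.onShow("ad1", 0L);
        join.onClick("ad1", 5 * 60_000L);           // within 20 minutes: joined
        join.onClick("ad2", 6 * 60_000L);           // click arrives before its show
        join.onShow("ad2", 6 * 60_000L + 30_000L);  // show arrives 30 s later: joined
        join.onClick("ad1", 25 * 60_000L);          // show is older than 20 minutes: not joined
        System.out.println(join.results());         // [ad1@0/300000, ad2@390000/360000]
    }
}
```

In this sketch a buffer is only cleaned up when an element arrives on the other 
stream; a real operator would presumably also need timer-based cleanup and 
keyed state, which are left out here.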





[jira] [Updated] (FLINK-3109) Join two streams with two different cache time

2015-12-03 Thread Wang Yangjun (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Yangjun updated FLINK-3109:

Description: 
Flink streaming currently only supports joining two streams on the same window. 
How can this be solved?
For example, there are two streams. One is the advertisements shown to users; 
its tuples can be described as (id, shown timestamp). The other is the click 
stream, with tuples (id, clicked timestamp). We want to get a joined stream 
that includes every advertisement clicked by a user within 20 minutes after it 
was shown.
It is possible that a user clicks an advertisement immediately after it is 
shown. It is also possible that the "click" message arrives at the server 
earlier than the "show" message because of Internet delay. We assume that the 
maximum delay is one minute.

So we need to always keep a buffer (20 minutes) of the "show" stream and 
another buffer (1 minute) of the "click" stream.

It would be great to have an API like this:
showStream.join(clickStream)
    .where(keySelector)
    .buffer(Time.of(20, TimeUnit.MINUTES))
    .equalTo(keySelector)
    .buffer(Time.of(1, TimeUnit.MINUTES))
    .apply(JoinFunction)

http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149

  was:
Flink streaming currently only supports joining two streams on the same window. 
How can this be solved?
For example, there are two streams. One is the advertisements shown to users; 
its tuples can be described as (id, shown timestamp). The other is the click 
stream, with tuples (id, clicked timestamp). We want to get a joined stream 
that includes every advertisement clicked by a user within 20 minutes after it 
was shown.
It is possible that a user clicks an advertisement immediately after it is 
shown. The "click" message could arrive at the server earlier than the "show" 
message because of Internet delay. We assume that the maximum delay is one 
minute.

So we need to always keep a buffer (20 minutes) of the "show" stream and 
another buffer (1 minute) of the "click" stream.

It would be great to have an API like this:
showStream.join(clickStream)
    .where(keySelector)
    .buffer(Time.of(20, TimeUnit.MINUTES))
    .equalTo(keySelector)
    .buffer(Time.of(1, TimeUnit.MINUTES))
    .apply(JoinFunction)

http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149

