[jira] [Commented] (SPARK-27424) Joining of one stream against the most recent update in another stream

2019-06-09 Thread Thilo Schneider (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859593#comment-16859593
 ] 

Thilo Schneider commented on SPARK-27424:
-

Dear Sir or Madam,

Thank you for your message. I am unavailable up to and including 15 June 2019.
Your message will not be forwarded and will not be processed until then.

Kind regards,
Thilo Schneider


Fraport AG Frankfurt Airport Services Worldwide, 60547 Frankfurt am Main, 
registered office: Frankfurt am Main, Frankfurt am Main District Court: 
HRB 7042, VAT identification number: DE 114150623, Chairman of the 
Supervisory Board: Karlheinz Weimar, former Finance Minister of Hesse; 
Executive Board: Dr. Stefan Schulte (Chairman), Anke Giesen, Michael Mueller, 
Dr. Matthias Zieschang


> Joining of one stream against the most recent update in another stream
> --
>
> Key: SPARK-27424
> URL: https://issues.apache.org/jira/browse/SPARK-27424
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.0.0
> Reporter: Thilo Schneider
> Priority: Major
> Attachments: join-last-update-design.pdf
>
>
> Currently, it is not possible to join each row of one stream with the most 
> recent update for the same key in another stream. This need arises whenever 
> one wants to use the current state of an object, for example when joining 
> room temperature readings with the current weather.
> This ticket covers the creation of a {{stream_lead}} function and the 
> modification of the streaming join logic (and state store) to additionally 
> allow joins of the form 
> {code:sql}
> SELECT *
> FROM A, B
> WHERE 
> A.key = B.key 
> AND A.time >= B.time 
> AND A.time < stream_lead(B.time)
> {code}
> The major aspect of this change is that a third watermark is needed to 
> bound how late updates may arrive. 
> A rough sketch may be found in the attached document.
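
For readers following the thread, the join condition quoted above can be
illustrated on bounded data. The following is only a minimal batch sketch in
plain Python, not Spark code and not the design from the attached document.
The function name join_latest and the (key, time, value) row layout are
assumptions made for illustration. It also assumes that stream_lead(B.time)
is the time of the next update for the same key, that the newest update of a
key has no upper bound, and it ignores watermarks and late data entirely.

```python
from bisect import bisect_right
from collections import defaultdict


def join_latest(a_rows, b_rows):
    """Batch emulation of
    A.key = B.key AND A.time >= B.time AND A.time < stream_lead(B.time).

    Rows are (key, time, value) tuples (a hypothetical layout).
    The newest update per key is treated as valid indefinitely.
    """
    # Collect B's updates per key, ordered by time.
    updates = defaultdict(list)
    for key, time, value in sorted(b_rows, key=lambda r: (r[0], r[1])):
        updates[key].append((time, value))

    joined = []
    for key, time, value in a_rows:
        history = updates.get(key, [])
        times = [t for t, _ in history]
        # Index of the last update with B.time <= A.time. The following
        # update (if any) has a strictly larger time, so A.time < lead holds.
        idx = bisect_right(times, time) - 1
        if idx >= 0:
            b_time, b_value = history[idx]
            joined.append((key, time, value, b_time, b_value))
    return joined


# Room temperature readings (A) and weather updates (B), keyed by city.
temperature = [("fra", 5, 20.5), ("fra", 12, 21.0), ("muc", 3, 19.0)]
weather = [("fra", 0, "sunny"), ("fra", 10, "rain"), ("muc", 4, "snow")]

result = join_latest(temperature, weather)
```

Each "fra" reading is paired with the weather update in effect at its
timestamp. The "muc" reading at time 3 is dropped because no update with
B.time <= 3 exists for that key. In a streaming setting the difficulty is
that a later update may still arrive and change which row is "most recent",
which is what the additional watermark in the proposal is meant to bound.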



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27424) Joining of one stream against the most recent update in another stream

2019-06-09 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859594#comment-16859594
 ] 

Dongjoon Hyun commented on SPARK-27424:
---

Thank you for filing the JIRA issue and the design document, 
[~thiloschneider]. I updated the affected version, since this is a proposal 
for a new feature in 3.0.0.







[jira] [Commented] (SPARK-27424) Joining of one stream against the most recent update in another stream

2019-04-10 Thread Thilo Schneider (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814120#comment-16814120
 ] 

Thilo Schneider commented on SPARK-27424:
-

Attached is a sketch of the improvement; it is not yet fully detailed. I would 
be willing to work on this further, but would like to get your feedback before 
going into more detail. Do we need a SPIP for this proposal?

> Joining of one stream against the most recent update in another stream
> --
>
> Key: SPARK-27424
> URL: https://issues.apache.org/jira/browse/SPARK-27424
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.4.1
> Reporter: Thilo Schneider
> Priority: Major
> Attachments: join-last-update-design.pdf
>


