Re: flatMapGroupsWithState not timing out (spark 2.2.1)

2018-01-12 Thread Tathagata Das
Aah okay! How are testing whether there is a timeout? The situation that would lead to the *EventTimeTimeout* would be the following. 1. Send bunch of data to group1, to set the timeout timestamp using event-time 2. Then send more data to group2 only, to advance the watermark (since it's based on

Spark preserve timestamp

2018-01-12 Thread sk skk
Do we have option to say to spark to preserve time stamp while creating struct. Regards, Sudhir

Re: flatMapGroupsWithState not timing out (spark 2.2.1)

2018-01-12 Thread Tathagata Das
Hello Dan, >From your code, it seems like you are setting the timeout timestamp based on the current processing-time / wall-clock-time, while the watermark is being calculated on the event-time ("when" column). The semantics of the EventTimeTimeout is that when the last set timeout timestamp of a

flatMapGroupsWithState not timing out (spark 2.2.1)

2018-01-12 Thread daniel williams
Hi, I’m attempting to leverage flatMapGroupsWithState to handle some arbitrary aggregations and am noticing a couple of things: - *ProcessingTimeTimeout* + *setTimeoutDuration* timeout not being honored - *EventTimeTimeout* + watermark value not being honored. - *EventTimeTimeout* + *

Using Logistic regression with SGD in Spark ML

2018-01-12 Thread NinjaYali
Hi All, I am working on a logistic regression model in spark.ml, and we utilize org.apache.spark.ml.classification.LogisticRegression. After looking into the code and doc, it seems like it does not contain SGD as its optimizer. https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/ml/class