[Spark Structured Streaming, Spark 2.3.0] Calling current_timestamp() function within a streaming dataframe results in dataType error

2018-03-19 Thread Artem Moskvin
Hi all,

There's probably a regression in Spark 2.3.0. Running the code below on
2.2.1 succeeds, but on 2.3.0 it fails with the error
`org.apache.spark.sql.streaming.StreamingQueryException: Invalid call to
dataType on unresolved object, tree: 'current_timestamp`.

```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger
import scala.concurrent.duration._

val values = spark.
  readStream.
  format("rate").
  load().
  withColumn("current_timestamp", current_timestamp())

values.
  writeStream.
  format("console").
  option("truncate", false).
  trigger(Trigger.ProcessingTime(10.seconds)).
  start().
  awaitTermination()
```
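In case it helps anyone hitting the same thing, one possible workaround is to compute the wall-clock time in a UDF instead of calling `current_timestamp()`, so the streaming plan never contains the unresolved expression. This is only a sketch I have not verified against 2.3.0, and `now`/`nowUdf` are my own names; note also that a UDF is evaluated per row at execution time, whereas `current_timestamp()` is a per-query constant, so the semantics are not identical:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.functions.udf

// Hypothetical workaround: produce the timestamp in plain Scala code
// wrapped in a UDF, avoiding the current_timestamp expression entirely.
def now(): Timestamp = new Timestamp(System.currentTimeMillis())
val nowUdf = udf(() => now())

val values = spark.
  readStream.
  format("rate").
  load().
  withColumn("current_timestamp", nowUdf())
```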

Can anyone confirm the same behavior?


Respectfully,
Artem Moskvin


Why does Spark Streaming keep all batches in memory after processing?

2015-11-19 Thread Artem Moskvin
Hello there!

I wonder why Spark Streaming keeps all processed batches in memory. This
leads to executors running out of memory, even though I no longer need
the batches once they have been processed. Can this be configured
somewhere so that batches are not kept in memory after processing?
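For what it's worth, Spark does expose a couple of settings that affect when generated batch RDDs are cleaned up. A sketch of how they might be set, assuming a Spark 1.x-era `StreamingContext` (the app name and batch interval here are placeholders of mine):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().
  setAppName("batch-cleanup-example").
  // Let Spark unpersist batch RDDs once they are no longer needed
  // (this is the default, so an OOM suggests something else retains them).
  set("spark.streaming.unpersist", "true").
  // Force periodic cleanup of old RDDs and metadata (duration in seconds).
  set("spark.cleaner.ttl", "3600")

val ssc = new StreamingContext(conf, Seconds(10))
```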

Respectfully,
Artem Moskvin