[Spark Structured Streaming, Spark 2.3.0] Calling current_timestamp() function within a streaming dataframe results in dataType error

2018-03-19 Thread Artem Moskvin
Hi all,

There's probably a regression in Spark 2.3.0. Running the code below on
2.2.1 succeeds, but on 2.3.0 it fails with the error
`org.apache.spark.sql.streaming.StreamingQueryException: Invalid call to
dataType on unresolved object, tree: 'current_timestamp`.

```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger
import scala.concurrent.duration._

val values = spark.
  readStream.
  format("rate").
  load().
  withColumn("current_timestamp", current_timestamp())

values.
  writeStream.
  format("console").
  option("truncate", false).
  trigger(Trigger.ProcessingTime(10.seconds)).
  start().
  awaitTermination()
```
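In case it helps anyone hitting the same thing, one possible workaround is to compute the wall-clock time in a UDF instead of calling `current_timestamp()`, so the streaming plan never contains the unresolved expression. This is only a sketch I have not verified against 2.3.0, and `now`/`nowUdf` are my own names; note also that a UDF is evaluated per row at execution time, whereas `current_timestamp()` is a per-query constant, so the semantics are not identical:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.functions.udf

// Hypothetical workaround: produce the timestamp in plain Scala code
// wrapped in a UDF, avoiding the current_timestamp expression entirely.
def now(): Timestamp = new Timestamp(System.currentTimeMillis())
val nowUdf = udf(() => now())

val values = spark.
  readStream.
  format("rate").
  load().
  withColumn("current_timestamp", nowUdf())
```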

Can anyone confirm the same behavior?


Respectfully,
Artem Moskvin


Why does Spark Streaming keep all batches in memory after processing?

2015-11-19 Thread Artem Moskvin
Hello there!

I wonder why Spark Streaming keeps all processed batches in memory. This
leads to executors running out of memory, even though I no longer need
the batches once they have been processed. Can this be configured
somewhere so that batches are not kept in memory after processing?
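For what it's worth, Spark does expose a couple of settings that affect when generated batch RDDs are cleaned up. A sketch of how they might be set, assuming a Spark 1.x-era `StreamingContext` (the app name and batch interval here are placeholders of mine):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().
  setAppName("batch-cleanup-example").
  // Let Spark unpersist batch RDDs once they are no longer needed
  // (this is the default, so an OOM suggests something else retains them).
  set("spark.streaming.unpersist", "true").
  // Force periodic cleanup of old RDDs and metadata (duration in seconds).
  set("spark.cleaner.ttl", "3600")

val ssc = new StreamingContext(conf, Seconds(10))
```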

Respectfully,
Artem Moskvin