Re: Excessive disk IO with Spark structured streaming

2020-11-05 Jungtaek Lim
FYI, SPARK-30294 has been merged and will be available in Spark 3.1.0. It halves the number of temp files created for the state store when you use streaming aggregation. 1. https://issues.apache.org/jira/browse/SPARK-30294 On Thu, Oct 8, 2020 at 11:55 AM Jungtaek Lim wrote: > I can't spend too

Need suggestions for Spark on K8S: RPC Encryption

2020-11-05 Xuan Gong
Hello, Spark experts: I am trying to figure out how to encrypt traffic when using Spark on K8s. From the Spark security doc, I learned how to enable RPC encryption between the Spark driver and the Spark executors. But I do not understand how to do it between the Spark driver and the K8s API server. (and/maybe

Re: Confuse on Spark to_date function

2020-11-05 Daniel Stojanov
On 5/11/20 2:48 pm, 杨仲鲍 wrote: Code ```scala object Suit{ case class Data(node:String,root:String) def apply[A](xs:A *):List[A] = xs.toList def main(args: Array[String]): Unit ={ val spark = SparkSession.builder() .master("local") .appName("MoneyBackTest") .getOrCreate() import

How does order work in Row objects when .toDF() is called?

2020-11-05 Daniel Stojanov
>>> row_1 = psq.Row(first=1, second=2) >>> row_2 = psq.Row(second=22, first=11) >>> spark.sparkContext.parallelize([row_1, row_2]).toDF().collect() [Row(first=1, second=2), Row(first=22, second=11)] (Spark 3.0.1) What is happening in the above? When .toDF() is called it appears that order is
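The output above is consistent with the column names being taken from the first row and the second row's values then being read by position. The sketch below is a plain-Python analogy, not PySpark code and not a description of Spark internals. `make_record`, `to_table_positional` and `to_table_by_name` are hypothetical helpers. It reproduces the swap and contrasts it with looking values up by field name.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical model of a row: (field names, values), both kept in the order
# the keyword arguments were written (Python 3.6+ preserves kwargs order).
Record = Tuple[Tuple[str, ...], Tuple[Any, ...]]


def make_record(**fields: Any) -> Record:
    return tuple(fields.keys()), tuple(fields.values())


def to_table_positional(records: List[Record]) -> List[Dict[str, Any]]:
    # Column names come from the first record; values are read by position.
    columns = records[0][0]
    return [dict(zip(columns, values)) for _, values in records]


def to_table_by_name(records: List[Record]) -> List[Dict[str, Any]]:
    # Column names come from the first record; values are looked up by name.
    columns = records[0][0]
    return [
        {col: dict(zip(names, values))[col] for col in columns}
        for names, values in records
    ]


rows = [make_record(first=1, second=2), make_record(second=22, first=11)]
print(to_table_positional(rows))  # [{'first': 1, 'second': 2}, {'first': 22, 'second': 11}]
print(to_table_by_name(rows))     # [{'first': 1, 'second': 2}, {'first': 11, 'second': 22}]
```

In this model, only the positional reader is sensitive to the order in which each row's fields were declared.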