Re: Spark 2.4.3 - Structured Streaming - high on Storage Memory

2019-06-15 Thread puneetloya
Just more info on the above post: I have been seeing a lot of these logs: 1) The state for version 15109 (other numbers too) doesn't exist in loadedMaps. Reading snapshot file and delta files if needed...Note that this is normal for the first batch of starting query. 2) KafkaConsumer cache hitting ma

Spark 2.4.3 - Structured Streaming - high on Storage Memory

2019-06-15 Thread puneetloya
Hi, I just upgraded Spark from 2.2.3 to 2.4.3 and ran a load test with a week's worth of messages in Kafka. I'm seeing odd behavior: why is the storage memory so high? I have run similar workloads with Spark 2.2.3, ha

unsubscribe

2019-06-15 Thread Humberto Marchezi
-- Humberto C Marchezi -

Creating Spark buckets that Presto / Athena / Hive can leverage

2019-06-15 Thread Daniel Mateus Pires
Hi there! I am trying to optimize joins on data created by Spark, so I'd like to bucket the data to avoid shuffling. I write to immutable partitions every day by writing data to a local HDFS and then copying it to S3. Is there a combination of bucketBy options and DDL that I can use s