Spark UNEVENLY distributing data

2018-05-19 Thread Alchemist
I am trying to parallelize a simple Spark program that processes HBase data.
    // Get HBase RDD
    JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
        jsc.newAPIHadoopRDD(conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
    long count = hBaseRDD.count();
Only two

Re: OOM: Structured Streaming aggregation state not cleaned up properly

2018-05-19 Thread Ted Yu
Hi, w.r.t. ElementTrackingStore: since it is backed by KVStore, there should be other classes that occupy significant memory. Can you pastebin the top 10 entries from the heap dump? Thanks

Spark is not evenly distributing data

2018-05-19 Thread SparkUser6

Re: OOM: Structured Streaming aggregation state not cleaned up properly

2018-05-19 Thread weand
Nobody has any idea...? Is filtering after aggregation in Structured Streaming supported but maybe buggy? See the following line in the example from the earlier mail... ... .where(F.expr("distinct_username >= 2")) ...