Seok-Joon,Yun created SPARK-24030:
-------------------------------------

Summary: SparkSQL percentile_approx function is too slow for over 1,060,000 records.
Key: SPARK-24030
URL: https://issues.apache.org/jira/browse/SPARK-24030
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.1
Environment: Zeppelin + Spark 2.2.1 on Amazon EMR and on a local laptop.
Reporter: Seok-Joon,Yun
I used the percentile_approx function on more than 1,060,000 records, and it is far too slow: it takes about 90 minutes. When I tried 1,040,000 records instead, it took about 10 seconds. I tested with data read both over JDBC and from Parquet, and both took the same amount of time. I wonder whether the function is not designed to run on multiple workers: in Ganglia and the Spark History Server, the work ran on only one worker.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)