[ https://issues.apache.org/jira/browse/SPARK-24030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Seok-Joon,Yun updated SPARK-24030:
----------------------------------
    Attachment: screenshot_2018-04-20 23.15.02.png

> SparkSQL percentile_approx function is too slow for over 1,060,000 records.
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-24030
>                 URL: https://issues.apache.org/jira/browse/SPARK-24030
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>         Environment: Zeppelin + Spark 2.2.1 on Amazon EMR and a local laptop.
>            Reporter: Seok-Joon,Yun
>            Priority: Major
>         Attachments: screenshot_2018-04-20 23.15.02.png
>
>
> I ran the percentile_approx function on over 1,060,000 records and it was
> too slow: it took about 90 minutes. When I tried 1,040,000 records, it took
> about 10 seconds.
> I tested reading the data from both JDBC and Parquet. The run times were the
> same for both.
> I wonder whether the function is not designed to run on multiple workers.
> I checked Ganglia and the Spark history. The job ran on only one worker.