[ https://issues.apache.org/jira/browse/SPARK-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557153#comment-15557153 ]
holdenk commented on SPARK-2032:
--------------------------------

I'm assuming that, since there hasn't been any activity for a while, [~prashant_] isn't working on this anymore? Is this something we still think would make sense to add to the RDD API?

> Add an RDD.samplePartitions method for partition-level sampling
> ---------------------------------------------------------------
>
>                 Key: SPARK-2032
>                 URL: https://issues.apache.org/jira/browse/SPARK-2032
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Assignee: Prashant Sharma
>            Priority: Minor
>
> This would allow us to sample a percent of the partitions and not have to
> materialize all of them. It's less uniform but much faster and may be useful
> for quickly exploring data.
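
For anyone picking this up, here is a rough sketch of what partition-level sampling means, written in plain Python rather than against Spark. Nothing in it is Spark API: `sample_partitions`, `make_partition`, and the `computed` log are hypothetical names made up for illustration, and each "partition" is modelled as a function that produces its rows only when called. The point is just that picking a fraction of the partitions up front means the unpicked ones are never computed.

```python
import random
from typing import Callable, Iterator, List, Optional

# A "partition" here is a zero-argument function that lazily yields its rows.
# This is an illustrative model, not how Spark represents partitions.
Partition = Callable[[], Iterator[int]]


def sample_partitions(
    partitions: List[Partition],
    fraction: float,
    seed: Optional[int] = None,
) -> List[Partition]:
    """Pick roughly `fraction` of the partitions without computing any of them."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    if not partitions or fraction == 0.0:
        return []
    rng = random.Random(seed)
    # Keep at least one partition for any non-zero fraction (an assumption of this sketch).
    k = max(1, round(fraction * len(partitions)))
    chosen = sorted(rng.sample(range(len(partitions)), k))
    return [partitions[i] for i in chosen]


# Log of which partitions were actually materialized.
computed: List[int] = []


def make_partition(index: int, size: int = 5) -> Partition:
    """Build a lazy partition that records its index when it is computed."""
    def compute() -> Iterator[int]:
        computed.append(index)
        return iter(range(index * size, (index + 1) * size))
    return compute


partitions = [make_partition(i) for i in range(10)]
sampled = sample_partitions(partitions, fraction=0.3, seed=42)

# Only the sampled partitions are computed when their rows are read.
rows = [row for part in sampled for row in part()]
```

With 10 partitions and a fraction of 0.3, three partitions are selected and only those three appear in `computed`. The sample is less uniform than row-level sampling because whole partitions are either kept or dropped, which is the trade-off the issue description mentions.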