Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/16668#discussion_r97473647
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3406,3 +3406,28 @@ setMethod("randomSplit",
}
sapply(sdfs, dataFrame)
})
+
+#' getNumPartitions
+#'
+#' Return the number of partitions
+#' Note: in order to compute the number of partition the SparkDataFrame has to be converted into a
+#' RDD temporarily internally.
+#'
+#' @param x A SparkDataFrame
+#' @family SparkDataFrame functions
+#' @aliases getNumPartitions,SparkDataFrame-method
+#' @rdname getNumPartitions
+#' @name getNumPartitions
+#' @export
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df <- createDataFrame(cars, numPartitions = 2)
+#' getNumPartitions(df)
+#' }
+#' @note getNumPartitions since 2.1.1
+setMethod("getNumPartitions",
+ signature(x = "SparkDataFrame"),
+ function(x) {
+ getNumPartitionsRDD(toRDD(x))
--- End diff --
Given that this is a bit of a hole, I think it would be worthwhile to consider
whether there is a reasonable workaround for the 2.1.1 release (say, a JVM
wrapper for `.rdd.getNumPartitions`). @shivaram, would you agree?
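To make the suggested workaround concrete, here is a minimal sketch of what a JVM wrapper could look like. It assumes SparkR's internal `callJMethod` helper and the `sdf` slot that holds the JVM-side reference; the function name `getNumPartitionsViaJVM` is hypothetical and not part of this PR.

```r
library(SparkR)

# Hypothetical helper name, used only for illustration.
getNumPartitionsViaJVM <- function(x) {
  stopifnot(is(x, "SparkDataFrame"))
  # x@sdf is the JVM-side Dataset reference held by the SparkDataFrame.
  # Calling "rdd" on it stays on the JVM, so no R-side RDD is created.
  jrdd <- SparkR:::callJMethod(x@sdf, "rdd")
  # RDD.getNumPartitions returns the partition count as an integer.
  SparkR:::callJMethod(jrdd, "getNumPartitions")
}

# Usage (requires a running Spark session):
# sparkR.session()
# df <- createDataFrame(cars, numPartitions = 2)
# getNumPartitionsViaJVM(df)
```

This avoids the `toRDD(x)` conversion on the R side, though the JVM still has to derive an RDD from the Dataset to count its partitions.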
As for the new Scala API, since it has broader implications, it might be
something to target for the 2.2 release? If so, that would be better served by
a different PR.
I don't mind taking a shot at that. I'm not very familiar with that code,
though, and from a quick scan it seems to be non-trivial (handling different
RDD subtypes and so on), so a few pointers would be appreciated, @cloud-fan.