Hi Everyone,

I'd like to start a discussion about the possibility of adding magrittr (https://magrittr.tidyverse.org/) as an explicit dependency for SparkR.

For those not familiar with the package, it provides a number of small utilities, the most important of which is the %>% function, similar to pipe-forward (|>) in F# or the thread-first macro (->) in Clojure. In other words, it allows us to replace:

    df <- createDataFrame(iris)
    df_filtered <- filter(df, df$Sepal_Width > df$Petal_Length)
    df_projected <- select(df_filtered, min(df$Sepal_Width - df$Petal_Length))

or

    df_projected <- select(
      filter(createDataFrame(iris), column("Sepal_Width") > column("Petal_Length")),
      min(column("Sepal_Width") - column("Petal_Length"))
    )

with

    df_projected <- createDataFrame(iris) %>%
      filter(.$Sepal_Width > .$Petal_Length) %>%
      select(min(.$Sepal_Width - .$Petal_Length))

It is widely used (see the reverse dependency section at https://cran.r-project.org/web/packages/magrittr/index.html), stable, and pretty much a core element of idiomatic R code these days.

Why we might want to add it:

* Improve readability of SparkR examples, which, subjectively speaking, can look a bit archaic.
* Reduce verbosity of the SparkR codebase.

Possible risks:

* It is an additional dependency for the CI pipeline.
  A: magrittr is already a transitive dependency for SparkR tests (it is required by testthat), its API is extremely stable, and it has no dependencies of its own.
* It is an additional dependency for SparkR installations.
  A: Given its widespread usage (over 1200 reverse imports, including some of the most popular packages), it is probably part of any but a minimal R installation. While it's just anecdotal evidence, most of the SparkR applications I've seen out there already use magrittr.

Non-goals:

* Supporting non-standard evaluation.

Thanks in advance for your input.

-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: A30CEF0C31A501EC