On Tue, May 9, 2017 at 9:47 AM, Hilmar Berger <ber...@mpiib-berlin.mpg.de> wrote:
> Hi,
>
> On 08/05/17 16:37, Ista Zahn wrote:
>
>> One of the key strengths of R is that packages are not akin to "fan
>> created mods". They are a central and necessary part of the R system.
>>
> I would tend to disagree here. R packages are in their majority not
> maintained by the core R developers. Concepts, features and lifetime depend
> mainly on the maintainers of the package (even though in theory the GPL will
> allow somebody to take over anytime). Several packages that are critical
> for processing big data and providing "modern" visualizations introduce
> concepts quite different from the legacy S/R language. I do feel that in a
> way, current core R strongly shows its origin in S, while modern concepts
> (e.g. data.table, dplyr, ggplot, ...) are often only available via
> extension packages. This is fine if one considers R to be a statistical
> toolkit; as a programming language, however, it introduces inconsistencies
> and uncertainties which could be avoided if some of the "modern" parts
> (including language concepts) were more integrated into core R.
>
> Best regards,
> Hilmar
>

And I would tend to disagree here. R is built upon the paradigm of a functional programming language, and falls in the same group as Clojure, Haskell and the like. It is a Turing-complete programming language in its own right. That's quite a bit more than "a statistical toolkit". You can say that about e.g. the macro language of SPSS, but not about R.

Second, there's little "modern" about the ideas behind the tidyverse. Piping is about as old as Unix itself. The grammar of graphics, on which ggplot2 is based, stems from the SYSTAT graphics system of the nineties. Hadley and colleagues did (and do) a great job implementing these ideas in R, but the ideas are of a respectable age.

Third, there's a lot of nonstandard evaluation going on in all these packages. Using them inside your own functions requires serious attention (e.g. the difference between aes() and aes_() in ggplot2).
Actually, even though I definitely see the merits of these packages in data analysis, the tidyverse feels like a (clean and powerful) macro language on top of R. And that's good, but it doesn't mean these parts are essential to turn R into a programming language. Rather the contrary: relying too heavily on these packages complicates things when you start to develop your own packages in R.

Fourth, the tidyverse masks quite a few native R functions. Obviously they took great care to keep the functionality as close as possible to what one would expect, but that's not always the case. The lag() function of dplyr, for example, masks an S3 generic from the stats package. So if you work with time series from the stats package, loading the tidyverse gives you trouble.

Fifth, many of the tidyverse packages are at version 0.x.y: they're still in beta development and their functionality might (and will) change. Functions disappear, arguments get renamed, tags change, ... Often the changes improve the packages, but they have broken my older code more than once. You can't expect the R core team to incorporate something that is bound to change.

Last but not least, the tidyverse sometimes actually works against new R users, at least those who go beyond the classic data workflow. I literally rewrote some code (from a consultant) that abused the _ply functions to create nested loops. Removing all that and rewriting the code using a simple list in combination with a simple for loop sped the code up by a factor of 150. That has nothing to do with dplyr, which is very fast. It has everything to do with that person having a hammer and thinking everything he sees is a nail. The tidyverse is no reason not to learn the concepts of the language it's built upon.
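To make the masking point concrete, here is a minimal base-R sketch of the mechanism. It does not load dplyr; instead it defines a hypothetical stand-in lag() in the global environment that pads with NA and returns a plain vector, the way a vector-oriented lag does. Because the global environment sits before package:stats on the search path (as an attached package would), the unqualified call silently picks up the stand-in, while stats::lag() keeps the time-series behaviour. This is an illustration of masking, not dplyr's code; check dplyr's own documentation for how its lag() handles ts objects.

```r
# A yearly time series starting in 2000
x <- ts(c(3, 5, 8, 13), start = 2000)

# stats::lag() is an S3 generic; for a ts it keeps the values and shifts
# the time base back by k periods, so the result starts in 1999
shifted <- stats::lag(x, k = 1)
print(tsp(shifted))   # start 1999, end 2002, frequency 1

# Hypothetical stand-in for a vector-style lag (not dplyr's implementation):
# it pads with NA and drops the ts attributes
lag <- function(x, n = 1L) {
  c(rep(NA, n), as.vector(x)[seq_len(length(x) - n)])
}

# The global definition is found before stats::lag on the search path,
# so this unqualified call silently uses the stand-in
masked <- lag(x)
print(masked)         # NA 3 5 8, a plain numeric vector
print(is.ts(masked))  # FALSE: the time-series information is gone

# Qualifying the call with the namespace avoids the ambiguity
print(is.ts(stats::lag(x, k = 1)))  # TRUE
```

Writing stats::lag() explicitly is the simplest way to be sure which function you get once another package masks it.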
The one thing I would like to see, though, is an adaptation of the statistical toolkit so that it can work with data.table and tibble objects directly, as opposed to having to convert to a data.frame once you start building the models. And I believe that eventually there will be a replacement for the data.frame that improves R's performance and reduces its memory burden.

So all in all, I do admire the tidyverse and how it speeds up data preparation for analysis. But the tidyverse is a powerful data toolkit, not a programming language. And it won't make R a programming language either, because R already is one.

Cheers
Joris

> --
> Dr. Hilmar Berger, MD
> Max Planck Institute for Infection Biology
> Charitéplatz 1
> D-10117 Berlin
> GERMANY
>
> Phone: + 49 30 28460 430
> Fax: + 49 30 28460 401
> E-Mail: ber...@mpiib-berlin.mpg.de
> Web : www.mpiib-berlin.mpg.de
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 (0)9 264 61 79
joris.m...@ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php