Re: [R] identify the distribution of the data
Hi,

Others gave you more fundamental answers. To check the possible distribution you could use the fitdistrplus package:
https://cran.r-project.org/web/packages/fitdistrplus/index.html

Cheers
Petr

> -----Original Message-----
> From: R-help On Behalf Of Bogdan Tanasa
> Sent: Wednesday, February 8, 2023 5:35 PM
> To: r-help
> Subject: [R] identify the distribution of the data
>
> Dear all,
>
> I do have dataframes with numerical values such as 1, 9, 20, 51, 100 etc.
>
> Which way do you recommend to use in order to identify the type of the
> distribution of the data (normal, poisson, bernoulli, exponential, log-normal etc.)?
>
> Thanks so much,
>
> Bogdan

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
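A minimal sketch of how the fitdistrplus suggestion might be tried out. The data here are simulated (log-normal by construction), and the three candidate families are an arbitrary choice for illustration, not a recommendation from the thread. The package-dependent part is skipped if fitdistrplus is not installed.

```r
# Simulated positive data; in practice x would be a column of your data frame.
set.seed(1)
x <- rlnorm(200, meanlog = 1, sdlog = 0.5)

if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  library(fitdistrplus)

  # Cullen and Frey graph: a rough visual guide to candidate families.
  descdist(x, boot = 100)

  # Fit a few candidate distributions by maximum likelihood.
  fit_ln <- fitdist(x, "lnorm")
  fit_ga <- fitdist(x, "gamma")
  fit_no <- fitdist(x, "norm")

  # Compare the fits by AIC (lower is better among the candidates tried).
  aic <- c(lnorm = fit_ln$aic, gamma = fit_ga$aic, norm = fit_no$aic)
  print(sort(aic))
}
```

A low AIC only ranks the candidates that were tried; it does not show that the data "follow" the best-ranked family.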
Re: [R] preserve class in apply function
Also try

  apply(Filter(is.numeric, mydf), 1, sum)

On Tue, Feb 7, 2023 at 8:42 AM PIKAL Petr wrote:
>
> Hi Naresh
>
> If you wanted to automate the function a bit you can use sapply to find
> numeric columns
> ind <- sapply(mydf, is.numeric)
>
> and use it in apply construct
> apply(mydf[,ind], 1, function(row) sum(row))
> [1] 2.13002569 0.63305300 1.48420429 0.13523859 1.17515873 -0.98531131
> [7] 0.47044467 0.23914494 0.26504430 0.02037657
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help On Behalf Of Naresh Gurbuxani
> > Sent: Tuesday, February 7, 2023 1:52 PM
> > To: r-help@r-project.org
> > Subject: [R] preserve class in apply function
> >
> > > Consider a data.frame whose different columns have numeric, character,
> > > and factor data. In apply function, R seems to pass all elements of a
> > > row as character. Is it possible to preserve numeric class?
> > >
> > >> mydf <- data.frame(x = rnorm(10), y = runif(10))
> > >> apply(mydf, 1, function(row) {row["x"] + row["y"]})
> > > [1] 0.60150197 -0.74201827 0.80476392 -0.59729280 -0.02980335 0.31351909
> > > [7] -0.63575990 0.22670658 0.55696314 0.39587314
> > >> mydf[, "z"] <- sample(letters[1:3], 10, replace = TRUE)
> > >> apply(mydf, 1, function(row) {row["x"] + row["y"]})
> > > Error in row["x"] + row["y"] (from #1) : non-numeric argument to binary
> > > operator
> > >> apply(mydf, 1, function(row) {as.numeric(row["x"]) + as.numeric(row["y"])})
> > > [1] 0.60150194 -0.74201826 0.80476394 -0.59729282 -0.02980338 0.31351912
> > > [7] -0.63575991 0.22670663 0.55696309 0.39587311
> > >> apply(mydf[,c("x", "y")], 1, function(row) {row["x"] + row["y"]})
> > > [1] 0.60150197 -0.74201827 0.80476392 -0.59729280 -0.02980335 0.31351909
> > > [7] -0.63575990 0.22670658 0.55696314 0.39587314

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
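The column-selection suggestions in this message can be compared side by side. This is a small sketch on simulated data (the seed and the extra rowSums() variant are additions for illustration); all three approaches drop the character column before any row-wise arithmetic happens.

```r
set.seed(1)
mydf <- data.frame(x = rnorm(10), y = runif(10),
                   z = sample(letters[1:3], 10, replace = TRUE))

# Option 1: logical index of numeric columns, then apply() over rows.
ind <- sapply(mydf, is.numeric)
s_index <- apply(mydf[, ind], 1, sum)

# Option 2: Filter() keeps only the columns for which is.numeric() is TRUE.
s_filter <- apply(Filter(is.numeric, mydf), 1, sum)

# Option 3 (an addition, not from the thread): rowSums() on the numeric columns.
s_rowsums <- rowSums(mydf[, ind])

print(cbind(s_index, s_filter, s_rowsums))
```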
[R] How to split a overflow slide content to slides automatically using revealjs in rmarkdown?
Hi,

How can I automatically split the overflowing content of a slide across several slides when using revealjs (or another HTML presentation format) in rmarkdown? I am looking for an effect like beamer's "allowframebreaks" option, where a subsection number is added to each continuation slide and the main section number (slide title) is not increased.

Thanks.
Re: [R] preserve class in apply function
Jorgen is correct that for many purposes, viewing a data.frame as a collection of vectors of the same length lets you code fairly complex logic using whichever vectors you want and get a vector answer, either as a separate object or as a new column.

Text columns used to make decisions in the function are also usable with vectorized functions like ifelse(cond, when_true, when_false).

And, although much can be done in base R, people often use the mutate() function from dplyr (tidyverse) to do such calculations in a slightly less wordy way.

You may be looking at apply() as a way to operate one row at a time, when an R paradigm is to operate on all rows more or less at once.

-----Original Message-----
From: R-help On Behalf Of Jorgen Harmse via R-help
Sent: Wednesday, February 8, 2023 11:10 AM
To: r-help@r-project.org; naresh_gurbux...@hotmail.com
Subject: Re: [R] preserve class in apply function

What are you trying to do? Why use apply when there is already a vector addition operation?

  df$x+df$y
  or as.numeric(df$x)+as.numeric(df$y)
  or rowSums(df[c('x','y')])

As noted in other answers, apply will coerce your data frame to a matrix, and all entries of a matrix must have the same type.

Regards,
Jorgen Harmse.
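A short sketch of the vectorized approach described in this message, with a text column deciding which calculation is used for each row. The data frame and the rule ("add for group a, multiply otherwise") are made up for illustration.

```r
mydf <- data.frame(x = c(1, 2, 3, 4),
                   y = c(10, 20, 30, 40),
                   z = c("a", "b", "a", "c"))

# ifelse() is evaluated on whole columns at once; no row-by-row apply() is
# needed, and x and y keep their numeric class.
mydf$result <- ifelse(mydf$z == "a", mydf$x + mydf$y, mydf$x * mydf$y)

print(mydf)
```

With dplyr loaded, the same line could be written inside mutate(); the base R form is shown here to avoid an extra dependency.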
Re: [R] preserve class in apply function
Naresh,

This is a common case where the answer to a question is to ask the right question.

Your question was how to make apply work. My question is how you can get the functionality you want in some version of R. Apply is a tool, only one of many, and it may be the wrong one for your task.

For a data.frame there are lots of tools you may investigate, both in base R and in add-on packages like dplyr in the tidyverse.

As has been pointed out, a side effect of apply is to make a matrix, and R automatically coerces all columns to the most general data type needed to hold every value (character, if any column is character). So solutions range from not including any columns that are not numeric, if that makes sense, to accepting that they are all going to be of type character and, in the function you apply, converting them individually back to what you want.

One straightforward solution is to make a loop indexed over the number of rows in your data.frame and process the variables in each row using [] notation. Not fast, but you see what you have.

Another is functions like pmap() in the purrr package. Yet another might be the rowwise() function in the dplyr package. It depends on what you want to do. Note that with multiple columns, sometimes your function may need to use a ... argument to receive them.

-----Original Message-----
From: R-help On Behalf Of Naresh Gurbuxani
Sent: Tuesday, February 7, 2023 3:29 PM
To: PIKAL Petr
Cc: r-help@r-project.org
Subject: Re: [R] preserve class in apply function

Thanks for all the responses. I need to use some text columns to determine method applied to numeric columns.

Split seems to be the way to go.

Sent from my iPhone

> On Feb 7, 2023, at 8:31 AM, PIKAL Petr wrote:
>
> Hi Naresh
>
> If you wanted to automate the function a bit you can use sapply to
> find numeric columns
> ind <- sapply(mydf, is.numeric)
>
> and use it in apply construct
> apply(mydf[,ind], 1, function(row) sum(row))
> [1] 2.13002569 0.63305300 1.48420429 0.13523859 1.17515873 -0.98531131
> [7] 0.47044467 0.23914494 0.26504430 0.02037657
>
> Cheers
> Petr
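Two of the options mentioned in this message, the explicit row loop with [] indexing and the split() approach, might look like the following in base R. The data, the group labels, and the per-group functions in `methods` are hypothetical; they only stand in for "a text column determines the method applied to the numeric columns".

```r
mydf <- data.frame(x = c(1, 2, 3, 4),
                   y = c(10, 20, 30, 40),
                   z = c("a", "b", "a", "b"))

# Option 1: loop over rows, indexing with []; each cell keeps its own type
# because no matrix conversion takes place.
res_loop <- numeric(nrow(mydf))
for (i in seq_len(nrow(mydf))) {
  res_loop[i] <- if (mydf[i, "z"] == "a") {
    mydf[i, "x"] + mydf[i, "y"]
  } else {
    mydf[i, "x"] * mydf[i, "y"]
  }
}

# Option 2: split by the text column, apply one method per group, and put the
# results back in the original row order with unsplit().
methods <- list(a = function(d) d$x + d$y,   # hypothetical method for group "a"
                b = function(d) d$x * d$y)   # hypothetical method for group "b"
pieces <- split(mydf, mydf$z)
per_group <- Map(function(f, d) f(d), methods[names(pieces)], pieces)
res_split <- unsplit(per_group, mydf$z)

print(data.frame(mydf, res_loop, res_split))
```

The split version scales better, since each method is called once per group on whole columns rather than once per row.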
Re: [R] identify the distribution of the data
On 2/8/23 12:06 PM, Ebert,Timothy Aaron wrote:
> IMO) The best approach is to develop a good understanding of the individual
> processes that resulted in the observed values. The blend of those processes
> then results in the distribution of the observed values. This is seldom done,
> and often not possible to do. The alternatives depend on why you are doing this.
>
> 0) Sometimes the nature of the data suggests a distribution. You list integer
> values. If all observations are integer (counts for example) then Poisson may
> be appropriate. With only two possible values, maybe the binomial distribution.
> Continuous data might be normally distributed (Gaussian distribution). If I
> roll one six-sided die many times I will have a uniform distribution (assuming
> a fair die). I could then try the same task but roll 2 dice and add the result.
> I still have discrete values, but the shape is closer to Gaussian. The
> distribution looks more and more Gaussian as I add more dice together in each
> roll.

I concur: The application will often suggest a distribution, e.g., Poisson, binomial or negative binomial for nonnegative integers, Weibull for lifetime data, etc.

I love normal probability plots -- the qqnorm function. This can identify outliers or multimodality or the need for a transformation. Continuous data that are always positive are often log-normal -- or a mixture of log-normals.

  x <- rnorm(100)
  X <- exp(x)
  qqnorm(X, datax=TRUE, log='x')

The central limit theorem says that the distribution of almost any sum of random variables will be more nearly normal than the distributions of individual summands. It also says that almost any product of positive random variables will be more nearly log-normal than the distributions of individual components of the product. This application to products is less well known and occasionally controversial.

https://en.wikipedia.org/wiki/Gibrat%27s_law

Spencer Graves
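The remark about products of positive random variables can be checked with a small simulation. This is an illustrative sketch only: the choice of 30 uniform factors on (0.5, 1.5), the seed, and the simple moment-based `skewness()` helper are assumptions made here, not part of the thread.

```r
set.seed(2023)
n_sim <- 5000      # number of simulated products
n_factors <- 30    # positive factors multiplied together in each product

# Each row holds the factors of one product.
factors <- matrix(runif(n_sim * n_factors, min = 0.5, max = 1.5), nrow = n_sim)
prod_vals <- apply(factors, 1, prod)

# Simple moment-based sample skewness (helper defined for this sketch).
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw <- skewness(prod_vals)        # strongly right-skewed
skew_log <- skewness(log(prod_vals))   # close to symmetric
print(c(raw = skew_raw, log = skew_log))

# Normal probability plots before and after taking logs.
op <- par(mfrow = c(1, 2))
qqnorm(prod_vals, main = "Product");           qqline(prod_vals)
qqnorm(log(prod_vals), main = "log(Product)"); qqline(log(prod_vals))
par(op)
```

The log of a product is a sum of logs, so the usual central limit argument applies on the log scale; that is why the second plot should look much straighter than the first.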
Re: [R] identify the distribution of the data
IMO) The best approach is to develop a good understanding of the individual processes that resulted in the observed values. The blend of those processes then results in the distribution of the observed values. This is seldom done, and often not possible to do. The alternatives depend on why you are doing this.

0) Sometimes the nature of the data suggests a distribution. You list integer values. If all observations are integer (counts for example) then Poisson may be appropriate. With only two possible values, maybe the binomial distribution. Continuous data might be normally distributed (Gaussian distribution). If I roll one six-sided die many times I will have a uniform distribution (assuming a fair die). I could then try the same task but roll 2 dice and add the result. I still have discrete values, but the shape is closer to Gaussian. The distribution looks more and more Gaussian as I add more dice together in each roll.

1) Try a simulation. Draw 5 values from a normal distribution and make a histogram. Then do it again. Is it easy to see that both samples are from the same distribution? Personally, the answer is no. So increase the sample size until you are happy with a decision that any two draws are from the same distribution. In my view, at a sample size of 1 million most people would not be able to detect any difference between the two histograms. This helps calibrate people's judgment. How does your sample size compare to your choice in this exercise?

2) Given that you have sufficient data (see above), can you see the distribution in your data? Is that good enough?

3) Are you doing this as part of checking the assumptions of statistical models? In such tests for normality, we tend to assume that a failure to reject the null hypothesis is sufficient proof that the null hypothesis is true. However, in most other cases we are told that a failure to reject the null hypothesis is not sufficient to prove the null hypothesis. You need to work this out, but the importance, consequences, and alternatives of testing model assumptions are the subject of a large body of literature with (sometimes) widely divergent viewpoints.

4) There are hundreds of distributions (https://cran.r-project.org/web/views/Distributions.html), but the common distributions are listed on sites like this one: https://www.stat.umn.edu/geyer/old/5101/rlook.html. Given so many choices, you can probably find one that will fit your data reasonably well. The number of data points you have will determine the reliability of that answer. Is that really informative for the problem you are trying to solve?

Answering "what distribution do these data follow?" is not usually the goal.

Regards,
Tim

-----Original Message-----
From: R-help On Behalf Of Bert Gunter
Sent: Wednesday, February 8, 2023 12:00 PM
To: Bogdan Tanasa
Cc: r-help
Subject: Re: [R] identify the distribution of the data

[External Email]

1. This is a statistical question, which usually is inappropriate here: this list is about R language (including packages) programming.

2. IMO (so others may disagree), your question indicates a profound misunderstanding of basic statistical issues. Maybe you phrased it poorly or I misunderstand, but "identify the type of distribution" is basically a meaningless query. Explaining why this is so and what may be more meaningful would require a deep dive into statistics. You might try referencing a basic statistical text and/or online tutorials. Try searching on "Goodness of fit", "statistical modeling" or the like.

Cheers,
Bert

On Wed, Feb 8, 2023 at 8:35 AM Bogdan Tanasa wrote:
> Dear all,
>
> I do have dataframes with numerical values such as 1, 9, 20, 51, 100 etc.
>
> Which way do you recommend to use in order to identify the type of the
> distribution of the data (normal, poisson, bernoulli, exponential,
> log-normal etc.)?
>
> Thanks so much,
>
> Bogdan
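Points 0) and 1) above can be tried directly. The sketch below assumes 10,000 rolls, sums of 1 and 10 dice, and samples of size 5 and 5,000; the helper `roll_sum()` and all of these numbers are illustrative choices, not values taken from the message.

```r
set.seed(123)

# Sum of n_dice fair six-sided dice, repeated n_rolls times (hypothetical helper).
roll_sum <- function(n_rolls, n_dice) {
  rolls <- matrix(sample(1:6, n_rolls * n_dice, replace = TRUE), ncol = n_dice)
  rowSums(rolls)
}

one_die  <- roll_sum(10000, 1)    # roughly uniform on 1..6
ten_dice <- roll_sum(10000, 10)   # bell-shaped around 35

op <- par(mfrow = c(3, 2))
barplot(table(one_die),  main = "1 die")
barplot(table(ten_dice), main = "Sum of 10 dice")

# Two draws of 5 values from the same normal distribution rarely look alike...
hist(rnorm(5), main = "n = 5, draw 1")
hist(rnorm(5), main = "n = 5, draw 2")

# ...whereas two larger draws are much harder to tell apart.
hist(rnorm(5000), main = "n = 5000, draw 1")
hist(rnorm(5000), main = "n = 5000, draw 2")
par(op)
```

Changing the sample sizes and re-running a few times gives a feel for how much data is needed before a histogram says anything reliable about shape.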
Re: [R] identify the distribution of the data
1. This is a statistical question, which usually is inappropriate here: this list is about R language (including packages) programming.

2. IMO (so others may disagree), your question indicates a profound misunderstanding of basic statistical issues. Maybe you phrased it poorly or I misunderstand, but "identify the type of distribution" is basically a meaningless query. Explaining why this is so and what may be more meaningful would require a deep dive into statistics. You might try referencing a basic statistical text and/or online tutorials. Try searching on "Goodness of fit", "statistical modeling" or the like.

Cheers,
Bert

On Wed, Feb 8, 2023 at 8:35 AM Bogdan Tanasa wrote:
> Dear all,
>
> I do have dataframes with numerical values such as 1, 9, 20, 51, 100 etc.
>
> Which way do you recommend to use in order to identify the type of the
> distribution of the data (normal, poisson, bernoulli, exponential,
> log-normal etc.)?
>
> Thanks so much,
>
> Bogdan
[R-es] Update a table in shiny
Hello,

I wanted to ask whether anyone knows how to use the result of modifying a table with the editable = TRUE argument of datatable to update a second table. I attach an example script and also copy it below.

What I would like, for example, is to put the values 1 and 2 in the 'replacements' column of tab1 (by clicking on the cells of that column and typing the value by hand, thanks to editable = TRUE) and have those changed values appear in tab2.

I found this link with the same idea
https://stackoverflow.com/questions/31744300/using-the-values-of-rendered-datatable-in-later-analysis
but I do not know how to carry out the procedures described in the answer.

Thanks in advance.

Regards,
Guillermo

  library(shiny)
  library(DT)
  library(dplyr)

  ui <- fluidPage(
    sidebarLayout(
      sidebarPanel(
        h4("App for replacing values. By replacing values in the top table, the goal is to update the bottom table.")
      ),
      mainPanel(
        dataTableOutput("tab1"),
        br(),
        dataTableOutput("tab2")
      )
    )
  )

  server <- function(input, output) {
    output$tab1 <- renderDataTable({
      df1 <- data.frame(values_to_replace = c("A+", "B-")) %>%
        mutate(replacements = NA)
      datatable(df1, rownames = FALSE, editable = TRUE)
    })

    output$tab2 <- renderDataTable({
      df1 <- data.frame(observations = c("A+", "B-", 1, 5, "B-", 7, "A+", "B-"))
      vals_orig <- c("A+", "B-")
      #vals_repl <- df1$replacements # THIS IS WHAT SHOULD COME FROM tab1.
      vals_repl <- c(10, 20)
      df2 <- df1 %>%
        mutate(observations_repl = plyr::mapvalues(observations, from = vals_orig, to = vals_repl))
      datatable(df2, rownames = FALSE)
    })
  }

  shinyApp(ui, server)

_______________________________________________
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es
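One possible way to approach this question, offered as a sketch rather than a confirmed answer from the list: DT reports each edit of an editable table through `input$<outputId>_cell_edit`, and `DT::editData()` applies that edit to a data frame. Keeping the lookup table in `reactiveValues()` lets the second table react to it. The helper `replace_values()` and the object name `rv` are hypothetical names introduced here; the app part runs only in an interactive session with shiny and DT installed.

```r
# Helper (hypothetical): replace each value found in `from` by the matching
# entry of `to`; values without a match, or with an NA replacement, are kept.
replace_values <- function(observations, from, to) {
  idx <- match(observations, from)
  out <- observations
  use <- !is.na(idx) & !is.na(to[idx])
  out[use] <- as.character(to[idx][use])
  out
}

if (interactive() &&
    requireNamespace("shiny", quietly = TRUE) &&
    requireNamespace("DT", quietly = TRUE)) {

  ui <- shiny::fluidPage(
    DT::DTOutput("tab1"),
    shiny::br(),
    DT::DTOutput("tab2")
  )

  server <- function(input, output, session) {
    # The editable lookup table lives in a reactive container.
    rv <- shiny::reactiveValues(
      lookup = data.frame(values_to_replace = c("A+", "B-"),
                          replacements = NA_character_)
    )

    output$tab1 <- DT::renderDT(
      DT::datatable(rv$lookup, rownames = FALSE, editable = TRUE)
    )

    # Each edit made in tab1 arrives as input$tab1_cell_edit.
    shiny::observeEvent(input$tab1_cell_edit, {
      rv$lookup <- DT::editData(rv$lookup, input$tab1_cell_edit,
                                rownames = FALSE)
    })

    # tab2 depends on rv$lookup, so it is redrawn after every edit.
    output$tab2 <- DT::renderDT({
      obs <- c("A+", "B-", "1", "5", "B-", "7", "A+", "B-")
      df2 <- data.frame(
        observations = obs,
        observations_repl = replace_values(obs,
                                           rv$lookup$values_to_replace,
                                           rv$lookup$replacements)
      )
      DT::datatable(df2, rownames = FALSE)
    })
  }

  shiny::runApp(shiny::shinyApp(ui, server))
}
```

In this simple form tab1 is also redrawn after each edit, which resets paging and sorting; for larger tables a table proxy would be the usual refinement.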
[R] identify the distribution of the data
Dear all,

I do have dataframes with numerical values such as 1, 9, 20, 51, 100 etc.

Which way do you recommend to use in order to identify the type of the distribution of the data (normal, poisson, bernoulli, exponential, log-normal etc.)?

Thanks so much,

Bogdan
Re: [R] preserve class in apply function
What are you trying to do? Why use apply when there is already a vector addition operation?

  df$x+df$y
  or as.numeric(df$x)+as.numeric(df$y)
  or rowSums(df[c('x','y')])

As noted in other answers, apply will coerce your data frame to a matrix, and all entries of a matrix must have the same type.

Regards,
Jorgen Harmse.

Message: 1
Date: Tue, 7 Feb 2023 07:51:50 -0500
From: Naresh Gurbuxani
To: "r-help@r-project.org"
Subject: [R] preserve class in apply function

> Consider a data.frame whose different columns have numeric, character,
> and factor data. In apply function, R seems to pass all elements of a
> row as character. Is it possible to preserve numeric class?
>
>> mydf <- data.frame(x = rnorm(10), y = runif(10))
>> apply(mydf, 1, function(row) {row["x"] + row["y"]})
> [1] 0.60150197 -0.74201827 0.80476392 -0.59729280 -0.02980335 0.31351909
> [7] -0.63575990 0.22670658 0.55696314 0.39587314
>> mydf[, "z"] <- sample(letters[1:3], 10, replace = TRUE)
>> apply(mydf, 1, function(row) {row["x"] + row["y"]})
> Error in row["x"] + row["y"] (from #1) : non-numeric argument to binary
> operator
>> apply(mydf, 1, function(row) {as.numeric(row["x"]) + as.numeric(row["y"])})
> [1] 0.60150194 -0.74201826 0.80476394 -0.59729282 -0.02980338 0.31351912
> [7] -0.63575991 0.22670663 0.55696309 0.39587311
>> apply(mydf[,c("x", "y")], 1, function(row) {row["x"] + row["y"]})
> [1] 0.60150197 -0.74201827 0.80476392 -0.59729280 -0.02980335 0.31351909
> [7] -0.63575990 0.22670658 0.55696314 0.39587314
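The coercion described here is easy to see by converting the data frame to a matrix by hand, which is what apply() does internally. A small sketch with made-up values:

```r
mydf <- data.frame(x = c(0.5, 1.5), y = c(2, 3), z = c("a", "b"))

# Numeric columns only: the matrix stays numeric.
m_num <- as.matrix(mydf[c("x", "y")])
print(typeof(m_num))

# With the character column included, every entry becomes a string.
m_all <- as.matrix(mydf)
print(typeof(m_all))

# Working on the columns directly avoids the matrix conversion altogether.
total    <- mydf$x + mydf$y
total_rs <- rowSums(mydf[c("x", "y")])
print(total)
```

The quoted session also shows the as.numeric() results differing from the direct sums in the last printed digits, which is consistent with the numbers having been formatted as text and parsed back; that is another reason to keep numeric columns out of a character matrix.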