Re: [R] function using values separated by a comma
Hi,

Thanks again for your help with this. I would like to use a variation of this function on a similar (numeric) dataset with elements separated by a comma, e.g.

    dat <- read.table(tc <- textConnection(
    '0,1 1,3 40,10 0,0
    20,5 4,2 10,40 10,0
    0,11 1,2 120,10 0,0'), sep="")

to simply calculate the frequency of the first number divided by the total, i.e. x[1]/sum(x), to produce:

         [,1] [,2] [,3] [,4]
    [1,] 0    0.25 0.8  NaN
    [2,] 0.8  0.33 0.2  1
    [3,] 0    0.33 0.92 NaN

My actual dataset is an enormous file (800,000 rows and 100 columns). Any advice on how I can do this, maybe using gsubfn?

Thank you very much!

--
View this message in context: http://r.789695.n4.nabble.com/function-using-values-separated-by-a-comma-tp2967870p2999723.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] function using values separated by a comma
Try this (I think your result in [2,2] is incorrect):

    dat <- read.table(tc <- textConnection(
    '0,1 1,3 40,10 0,0
    20,5 4,2 10,40 10,0
    0,11 1,2 120,10 0,0'), as.is = TRUE)
    closeAllConnections()
    # split the data and create new matrix
    newDat <- lapply(dat, function(.col){
        # split by comma, unlist, convert to numeric and divide
        x1 <- matrix(as.numeric(unlist(strsplit(.col, ','))), nrow = 2)
        x1[1, ] / colSums(x1)
    })
    do.call(cbind, newDat)

which gives:

          V1    V2       V3  V4
    [1,] 0.0 0.250     0.80 NaN
    [2,] 0.8 0.667     0.20   1
    [3,] 0.0 0.333 0.923077 NaN

On Mon, Oct 18, 2010 at 2:37 AM, burgundy saub...@yahoo.com wrote:
> I would like to use a variation of this function in a similar dataset
> (numeric) with elements separated by a comma [...] to simply calculate
> the frequency of the first number divided by the total number, i.e.
> x[1]/sum(x). [...]
> My actual dataset is an enormous file (800,000 rows and 100 columns).

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
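For a file of the size mentioned in the question, a fully vectorized variant of the same x[1]/sum(x) calculation may be worth trying. The following is only a sketch and not part of the reply above: it assumes every cell holds exactly two non-negative numbers separated by a single comma, and the helper name first_fraction is made up for illustration.

```r
# Hypothetical helper (not from the thread): x[1]/sum(x) for every cell at once.
# Assumes each cell has the form "<number>,<number>".
first_fraction <- function(dat) {
  m <- as.matrix(dat)                   # character matrix, dimensions preserved
  a <- as.numeric(sub(",.*$", "", m))   # text before the comma
  b <- as.numeric(sub("^.*,", "", m))   # text after the comma
  res <- a / (a + b)                    # 0/0 gives NaN, as in the output above
  dim(res) <- dim(m)                    # restore the row/column layout
  res
}

dat <- read.table(text = "0,1 1,3 40,10 0,0
20,5 4,2 10,40 10,0
0,11 1,2 120,10 0,0", as.is = TRUE)
first_fraction(dat)
```

Because sub() and the division work on the whole matrix in one call, no per-column or per-cell function is needed. Whether this is actually faster on 800,000 rows has not been measured here.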
Re: [R] function using values separated by a comma
Hi,

I just used this function on my real data - several enormous files (80 rows by 200 columns...) - and it worked perfectly! Thanks again for your help, it saved me a lot of time!

A last quick query: I have several other similar problems to deal with in my data. Do you know a useful book or online course that would be helpful for learning these sorts of data handling functions?

Thanks again!

--- On Fri, 8/10/10, Jeffrey Spies-2 [via R] ml-node+2968583-620301009-75...@n4.nabble.com wrote:
> Subject: Re: function using values separated by a comma
> To: burgundy saub...@yahoo.com
> Date: Friday, 8 October, 2010, 16:48
>
> Here's another method without using any external regular expression
> libraries: [...]

--
View this message in context: http://r.789695.n4.nabble.com/function-using-values-separated-by-a-comma-tp2967870p2990966.html
Sent from the R help mailing list archive at Nabble.com.
[R] function using values separated by a comma
Hello,

I have a dataframe (tab separated file) which looks like the example below - two values separated by a comma, and tab separation between each of these.

         [,1]  [,2]  [,3]    [,4]
    [1,] 0,1   1,3   40,10   0,0
    [2,] 20,5  4,2   10,40   10,0
    [3,] 0,11  1,2   120,10  0,0

I would like to calculate the percentage of the smallest number separated by the comma by:

1) summing the values, e.g. for [1,3] where 40,10, 40+10 = 50
2) taking the first value and dividing it by the total, e.g. for [1,3], 40/50 = 0.8
3) where the value generated by 2) is > 0.5, print 1-value, otherwise leave the value, e.g. for [1,3], where value is 0.8, print 1-0.8 = 0.2

I plan to generate a file like:

         [,1] [,2] [,3] [,4]
    [1,] 1    0.25 0.2  0
    [2,] 0.2  0.33 0.2  1
    [3,] 1    0.33 0.08 0

Apologies, I know this is very complex. Any help, even just some pointers on how to write a general function where values are separated by a comma, is really very much appreciated!

Thank you

--
View this message in context: http://r.789695.n4.nabble.com/function-using-values-separated-by-a-comma-tp2967870p2967870.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] function using values separated by a comma
Hi,

It is not the most elegant thing ever, but this does what you want. I am *fairly* certain it generalizes to different sized matrices, but I'd double check. When you divide by 0, it returns NaN, but this is pretty easy to fix if you really want 0s using is.nan().

My general process was: split data by commas, convert to numeric, define a function that does your calculations, apply this function, convert results back from a list to a matrix with the same number of columns as the original data, add any column/rownames from original matrix, return results.

    # Define a function
    my.fun <- function(dat) {
      # split data by commas, and convert to numeric
      # with commas, it would have been character
      # so something like this is necessary
      temp <- lapply(strsplit(dat, ","), as.numeric)
      # Define summary function
      my.summary <- function(x) {
        ## This combines your first and second steps
        value <- x[1]/sum(x)
        ## if value > .5, return 1 - value
        ## otherwise, just return the value
        if(isTRUE(value > 0.5)) {
          return(1 - value)
        } else {return(value)}
      }
      temp2 <- lapply(temp, my.summary)
      output <- matrix(unlist(temp2), ncol = ncol(dat),
                       dimnames = dimnames(dat))
      return(output)
    }

    # Create your data
    dat <- c("0,1", "1,3", "40,10", "0,0", "20,5", "4,2",
             "10,40", "10,0", "0,11", "1,2", "120,10", "0,0")
    dat <- matrix(dat, ncol = 4, byrow = TRUE)

    # Test it out
    my.fun(dat)

HTH,

Josh

On Thu, Oct 7, 2010 at 10:19 PM, burgundy saub...@yahoo.com wrote:
> I have a dataframe (tab separated file) which looks like the example
> below - two values separated by a comma, and tab separation between
> each of these. [...]
> Any help, even just some pointers on how to write a general function
> where values are separated by a comma, is realy very much appreciated!

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
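The is.nan() fix mentioned in this reply could look like the sketch below. It is an illustration only: the matrix res is a hand-built stand-in for the result of my.fun(dat), with values worked out from the example data, not output copied from the thread.

```r
# Stand-in for the result of my.fun(dat); values follow the example data
res <- matrix(c(0,   0.25, 0.2,    NaN,
                0.2, 1/3,  0.2,    0,
                0,   1/3,  10/130, NaN),
              ncol = 4, byrow = TRUE)

res[is.nan(res)] <- 0   # replace only the divide-by-zero cells
res
```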
Re: [R] function using values separated by a comma
On Fri, Oct 8, 2010 at 1:19 AM, burgundy saub...@yahoo.com wrote:
> I have a dataframe (tab separated file) which looks like the example
> below - two values separated by a comma, and tab separation between
> each of these. [...]
> I would like to calculate the percentage of the smallest number
> separated by the comma [...]

Try using gsubfn in the gsubfn package (http://gsubfn.googlecode.com). Using that, match a regular expression consisting of digits, a comma and digits, capturing the two strings of digits and passing them to function f, and replace each match with the output of f. Then read the resulting text into a data frame.

    library(gsubfn)
    L <- c("0,1 1,3 40,10 0,0",
           "20,5 4,2 10,40 10,0",
           "0,11 1,2 120,10 0,0")
    f <- function(a, b) { x <- as.numeric(c(a, b)); min(x)/sum(x) }
    L2 <- gsubfn("(\\d+),(\\d+)", f, L)
    DF <- read.table(textConnection(L2))

which gives:

    > DF
       V1    V2         V3  V4
    1 0.0 0.250     0.2000 NaN
    2 0.2 0.333     0.2000   0
    3 0.0 0.333 0.07692308 NaN

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] function using values separated by a comma
On Fri, Oct 8, 2010 at 10:18 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> Try using gsubfn in gsubfn (http://gsubfn.googlecode.com). [...]
>
> library(gsubfn)
> L <- c("0,1 1,3 40,10 0,0",
>        "20,5 4,2 10,40 10,0",
>        "0,11 1,2 120,10 0,0")
> f <- function(a, b) { x <- as.numeric(c(a, b)); min(x)/sum(x) }
> L2 <- gsubfn("(\\d+),(\\d+)", f, L)
> DF <- read.table(textConnection(L2))

A further simplification would be to use strapply from the same package. It eliminates the need for read.table at the end:

    strapply(L, "(\\d+),(\\d+)", f, simplify = rbind)

which gives:

         [,1]  [,2]       [,3] [,4]
    [1,]  0.0 0.250     0.2000  NaN
    [2,]  0.2 0.333     0.2000    0
    [3,]  0.0 0.333 0.07692308  NaN

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
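For readers without the gsubfn package, the same match-then-apply idea can be approximated in base R. This is a sketch under assumptions, not the method from the message above: it uses regmatches() and gregexpr() in place of strapply, keeps the L and f definitions from the thread, and assumes every line contains the same number of "digits,digits" pairs.

```r
L <- c("0,1 1,3 40,10 0,0",
       "20,5 4,2 10,40 10,0",
       "0,11 1,2 120,10 0,0")
f <- function(a, b) { x <- as.numeric(c(a, b)); min(x) / sum(x) }

# One character vector of "a,b" matches per input line
pairs <- regmatches(L, gregexpr("\\d+,\\d+", L))

# Apply f to each pair; one row per input line
res <- t(sapply(pairs, function(p) {
  parts <- strsplit(p, ",", fixed = TRUE)
  sapply(parts, function(ab) f(ab[1], ab[2]))
}))
res
```

The result is a plain numeric matrix with one row per element of L, so no read.table step is needed here either.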
Re: [R] function using values separated by a comma
Here's another method without using any external regular expression libraries:

    dat <- read.table(tc <- textConnection(
    '0,1 1,3 40,10 0,0
    20,5 4,2 10,40 10,0
    0,11 1,2 120,10 0,0'), sep="")

    mat <- apply(dat, c(1,2), function(x){
        temp <- as.numeric(unlist(strsplit(x, ',')))
        min(temp)/sum(temp)
    })

For mat[2,4], I get 0 (as did the other solutions), and you get 1, so check on that.

If you want the divide-by-0 NaNs to be 0, you can check that by replacing min(temp)/sum(temp) with:

    ifelse(is.nan(val<-min(temp)/sum(temp)), 0, val)

This has an advantage over:

    mat[is.na(mat)] <- 0

in that you might have true missingness in your data and is.na won't be able to distinguish it.

Cheers,

Jeff.

On Fri, Oct 8, 2010 at 1:19 AM, burgundy saub...@yahoo.com wrote:
> I have a dataframe (tab separated file) which looks like the example
> below - two values separated by a comma, and tab separation between
> each of these. [...]
> plan to generate file like:
>
>      [,1] [,2] [,3] [,4]
> [1,] 1    0.25 0.2  0
> [2,] 0.2  0.33 0.2  1
> [3,] 1    0.33 0.08 0
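The difference between is.na and is.nan described in this reply can be seen in a small example. This is a minimal sketch; the vector x is made up for illustration, with NaN standing for a 0/0 result and NA for a truly missing cell.

```r
x <- c(0.25, NaN, NA)   # NaN as from 0/0, NA as a truly missing value

is.na(x)    # TRUE for both the NaN and the NA
is.nan(x)   # TRUE only for the NaN

y <- x
y[is.nan(y)] <- 0       # only the divide-by-zero result is replaced
y                       # the NA is left in place
```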