Re: [R] function using values separated by a comma

2010-10-18 Thread burgundy

Hi,

Thanks again for your help with this. I would like to use a variation of
this function in a similar dataset (numeric) with elements separated by a
comma e.g. 

dat <- read.table(tc <- textConnection(
'0,1 1,3 40,10 0,0
20,5 4,2 10,40 10,0
0,11 1,2 120,10 0,0'), sep="")

to simply calculate the frequency of the first number divided by the total
number, i.e. x[1]/sum(x).

to produce:

     [,1]  [,2]  [,3]  [,4]
[1,] 0     0.25  0.8   NaN
[2,] 0.8   0.33  0.2   1
[3,] 0     0.33  0.92  NaN


My actual dataset is an enormous file (800,000 rows and 100 columns). Any
advice on how I can do this, maybe using gsubfn?

Thank you very much!


-- 
View this message in context: 
http://r.789695.n4.nabble.com/function-using-values-separated-by-a-comma-tp2967870p2999723.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function using values separated by a comma

2010-10-18 Thread jim holtman
Try this (I think your result in [2,2] is incorrect):

> dat <- read.table(tc <- textConnection(
+ '0,1 1,3 40,10 0,0
+ 20,5 4,2 10,40 10,0
+ 0,11 1,2 120,10 0,0'), as.is = TRUE)
> closeAllConnections()
> # split the data and create new matrix
> newDat <- lapply(dat, function(.col){
+ # split by comma, unlist, convert to numeric and divide
+ x1 <- matrix(as.numeric(unlist(strsplit(.col, ','))), nrow = 2)
+ x1[1, ] / colSums(x1)
+ })
> do.call(cbind, newDat)
      V1       V2       V3  V4
[1,] 0.0 0.250000 0.800000 NaN
[2,] 0.8 0.666667 0.200000   1
[3,] 0.0 0.333333 0.923077 NaN
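
For a file with hundreds of thousands of rows, one possible variation is to avoid strsplit altogether and pull the two numbers out of each column with sub(). This is only a sketch under assumptions: first_share() is a hypothetical helper name that does not appear in the thread, every cell is assumed to hold exactly two comma-separated numbers, and read.table(text = ...) needs a reasonably recent R.

```r
# Same example data as above; colClasses keeps every cell as text.
dat <- read.table(text = "0,1 1,3 40,10 0,0
20,5 4,2 10,40 10,0
0,11 1,2 120,10 0,0", colClasses = "character")

# first_share() is a hypothetical helper, not part of the thread.
first_share <- function(col) {
  first  <- as.numeric(sub(",.*$", "", col))  # text before the comma
  second <- as.numeric(sub("^.*,", "", col))  # text after the comma
  first / (first + second)                    # 0/0 gives NaN
}

res <- sapply(dat, first_share)  # one numeric column per input column
round(res, 3)
```

Entry [2,2] comes from the cell 4,2, so it is 4/(4+2), roughly 0.667, which agrees with the remark above about the expected 0.33.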



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] function using values separated by a comma

2010-10-11 Thread burgundy

Hi

Just used this function on my real data - several enormous files (80 rows 
by 200 columns...) and it worked perfectly! Thanks again for your help, saved 
me a lot of time!

A last quick query, I have several other similar problems to deal with in my 
data - do you know a useful book or online course that would be helpful for 
learning these sorts of data handling functions?

Thanks again!




[R] function using values separated by a comma

2010-10-08 Thread burgundy

Hello,

I have a dataframe (tab separated file) which looks like the example below -
two values separated by a comma, and tab separation between each of these.

     [,1]  [,2]  [,3]    [,4]
[1,] 0,1   1,3   40,10   0,0
[2,] 20,5  4,2   10,40   10,0
[3,] 0,11  1,2   120,10  0,0

I would like to calculate the percentage of the smallest number separated by
the comma by:
1) summing the values e.g. for [1,3] where 40,10, 40+10 = 50
2) taking the first value and dividing it by the total e.g. for [1,3], 40/50
= 0.8
3) where the value generated by 2) is > 0.5, print 1-value, otherwise, leave
value e.g. for [1,3], where value is 0.8, print 1-0.8 = 0.2
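
The three steps above, applied to the single cell 40,10, can be sketched as follows. This is only an illustration: it reads the condition in step 3 as "greater than 0.5", which is what the worked example (0.8 becomes 0.2) implies, and the variable names are made up for the sketch.

```r
cell  <- "40,10"
x     <- as.numeric(strsplit(cell, ",")[[1]])  # the two numbers: 40 and 10
total <- sum(x)                                # step 1: 40 + 10 = 50
value <- x[1] / total                          # step 2: 40 / 50 = 0.8
if (value > 0.5) value <- 1 - value            # step 3: 0.8 > 0.5, so 1 - 0.8
value                                          # about 0.2
```

For a pair of non-negative numbers with a non-zero sum, steps 2 and 3 together amount to dividing the smaller number by the total.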

plan to generate file like:
   
     [,1]  [,2]  [,3]  [,4]
[1,] 1     0.25  0.2   0
[2,] 0.2   0.33  0.2   1
[3,] 1     0.33  0.08  0

Apologies, I know this is very complex. Any help, even just some pointers on
how to write a general function where values are separated by a comma, is
really very much appreciated!

Thank you



Re: [R] function using values separated by a comma

2010-10-08 Thread Joshua Wiley
Hi,

It is not the most elegant thing ever, but this does what you want.  I
am *fairly* certain it generalizes to different sized matrices, but
I'd double check.  When you divide by 0, it returns NaN, but this is
pretty easy to fix if you really want 0s using is.nan().  My general
process was: split data by commas, convert to numeric, define a
function that does your calculations, apply this function, convert
results back from a list to a matrix with the same number of columns
as the original data, add any column/rownames from original matrix,
return results.


# Define a function
my.fun <- function(dat) {
  # split data by commas, and convert to numeric
  # with commas, it would have been character
  # so something like this is necessary
  temp <- lapply(strsplit(dat, ","), as.numeric)
  # Define summary function
  my.summary <- function(x) {
    ## This combines your first and second steps
    value <- x[1]/sum(x)
    ## if value > .5, return 1 - value
    ## otherwise, just return the value
    if(isTRUE(value > 0.5)) {
      return(1 - value)
    } else {return(value)}
  }
  temp2 <- lapply(temp, my.summary)
  output <- matrix(unlist(temp2), ncol = ncol(dat),
    dimnames = dimnames(dat))
  return(output)
}

# Create your data
dat <- c("0,1", "1,3", "40,10", "0,0", "20,5", "4,2",
 "10,40", "10,0", "0,11", "1,2", "120,10", "0,0")
dat <- matrix(dat, ncol = 4, byrow = TRUE)

# Test it out
my.fun(dat)

HTH,

Josh



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] function using values separated by a comma

2010-10-08 Thread Gabor Grothendieck
Try using the gsubfn function in the gsubfn package
(http://gsubfn.googlecode.com).  With it, match a regular expression
consisting of digits, a comma and digits, capturing the two strings of
digits and passing them to a function f, and replace each match with
the output of f.  Then read the resulting text into a data frame.

library(gsubfn)
L <- c("0,1  1,3   40,10  0,0", "20,5  4,2  10,40  10,0",
"0,11  1,2  120,10  0,0")

f <- function(a, b) { x <- as.numeric(c(a, b)); min(x)/sum(x) }
L2 <- gsubfn("(\\d+),(\\d+)", f, L)

DF <- read.table(textConnection(L2))

which gives:

> DF
   V1        V2         V3  V4
1 0.0 0.2500000 0.20000000 NaN
2 0.2 0.3333333 0.20000000   0
3 0.0 0.3333333 0.07692308 NaN
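
To see what the two capture groups hand to f, the same pattern can be tried on a single cell with base R only. This is a rough sketch, not the gsubfn mechanism itself: it uses regexec() and regmatches(), and the names cell, m and share are made up for the illustration.

```r
cell <- "40,10"
m <- regmatches(cell, regexec("(\\d+),(\\d+)", cell))[[1]]
# m[1] is the whole match; m[2] and m[3] are the two captured digit strings
x <- as.numeric(m[2:3])
share <- min(x) / sum(x)   # smaller number over the total, as f computes it
share
```

Here m holds "40,10", "40" and "10", and share is 10/50.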

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] function using values separated by a comma

2010-10-08 Thread Gabor Grothendieck
A further simplification would be to use strapply from the same
package (gsubfn). It eliminates the need for read.table at the end:

> strapply(L, "(\\d+),(\\d+)", f, simplify = rbind)
     [,1]      [,2]       [,3] [,4]
[1,]  0.0 0.2500000 0.20000000  NaN
[2,]  0.2 0.3333333 0.20000000    0
[3,]  0.0 0.3333333 0.07692308  NaN

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] function using values separated by a comma

2010-10-08 Thread Jeffrey Spies
Here's another method without using any external regular expression libraries:

dat <- read.table(tc <- textConnection(
'0,1 1,3 40,10 0,0
20,5 4,2 10,40 10,0
0,11 1,2 120,10 0,0'), sep="")

mat <- apply(dat, c(1,2), function(x){
	temp <- as.numeric(unlist(strsplit(x, ',')))
	min(temp)/sum(temp)
})

For mat[2,4], I get 0 (as did the other solutions), and you get 1, so
check on that. If you want the divide-by-0 NaNs to be 0, you can do
that by replacing

min(temp)/sum(temp)

with:

ifelse(is.nan(val <- min(temp)/sum(temp)), 0, val)

This has an advantage over:

mat[is.na(mat)] <- 0

in that you might have true missingness in your data and is.na won't
be able to distinguish it.
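
The difference between the two tests can be seen on a small made-up vector. This is only an illustrative sketch; v and v_nan_only are hypothetical names, with NaN standing in for a 0/0 cell and NA for a genuinely missing one.

```r
v <- c(0.25, NaN, NA)   # NaN from a 0/0 cell, NA from a truly missing cell
is.na(v)                # TRUE for both the NaN and the NA
is.nan(v)               # TRUE only for the NaN

v_nan_only <- v
v_nan_only[is.nan(v_nan_only)] <- 0   # only the 0/0 result becomes 0
v_nan_only                            # the NA is still there
```

Replacing via is.na() instead would also overwrite the third element, which is the loss of information described above.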

Cheers,

Jeff.
