Re: [R] Matrix oriented computing
On Fri, 26 Aug 2005 14:44:10 +0200 Sigbert Klinke wrote: Hi, I want to compute the quantiles of Chi^2 distributions with different degrees of freedom like x-cbind(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df-rbind(1:100) m-qchisq(x,df) and hoped to get back a length(df) times length(x) matrix with the quantiles. Since this does not work, I use x-c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df-c(1:100) m-qchisq(x,df[1]) for (i in 2:length(df)) { m-rbind(m,qchisq(x,df[i])) } dim(m)-c(length(df),length(x)) Is there a way to avoid the for loop ? You could use sapply(): m - sapply(x, function(x) qchisq(x, df)) hth, Z Thanks Sigbert __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Matrix oriented computing
On Fri, 2005-08-26 at 14:44 +0200, Sigbert Klinke wrote: Hi, I want to compute the quantiles of Chi^2 distributions with different degrees of freedom like x-cbind(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df-rbind(1:100) m-qchisq(x,df) and hoped to get back a length(df) times length(x) matrix with the quantiles. Since this does not work, I use x-c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df-c(1:100) m-qchisq(x,df[1]) for (i in 2:length(df)) { m-rbind(m,qchisq(x,df[i])) } dim(m)-c(length(df),length(x)) Is there a way to avoid the for loop ? Thanks Sigbert See ?sapply x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df - c(1:100) mat - sapply(x, qchisq, df) dim(mat) [1] 100 11 str(mat) num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ... HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Matrix oriented computing
I believe that the following is what you want: x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) dof - 1:100 ans - outer(x, dof, qchisq) dimnames(ans) - list(x, dof) Note that 'df' is not a very auspicious name for an object since it is the name of a function. Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) Sigbert Klinke wrote: Hi, I want to compute the quantiles of Chi^2 distributions with different degrees of freedom like x-cbind(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df-rbind(1:100) m-qchisq(x,df) and hoped to get back a length(df) times length(x) matrix with the quantiles. Since this does not work, I use x-c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df-c(1:100) m-qchisq(x,df[1]) for (i in 2:length(df)) { m-rbind(m,qchisq(x,df[i])) } dim(m)-c(length(df),length(x)) Is there a way to avoid the for loop ? Thanks Sigbert __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Matrix oriented computing
Marc Schwartz [EMAIL PROTECTED] writes: x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df - c(1:100) mat - sapply(x, qchisq, df) dim(mat) [1] 100 11 str(mat) num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ... outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x,df,qchisq)),2) should be close. You should likely add dimnames. -- O__ Peter Dalgaard Ă˜ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Matrix oriented computing
On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote: Marc Schwartz [EMAIL PROTECTED] writes: x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df - c(1:100) mat - sapply(x, qchisq, df) dim(mat) [1] 100 11 str(mat) num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ... outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x,df,qchisq)),2) should be close. You should likely add dimnames. What I find interesting, is that I would have intuitively expected outer() to be faster than sapply(). However: system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.01 0.00 0.01 0.00 0.00 system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE) [1] 0.01 0.00 0.01 0.00 0.00 # No round() or t() to test for overhead system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 0.01 0.00 0.02 0.00 0.00 # Bear in mind the round() on mat1 above all.equal(mat, mat1) [1] Mean relative difference: 4.905485e-05 all.equal(mat, t(mat2)) [1] TRUE Even when increasing the size of 'df' to 1:1000: system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.16 0.01 0.16 0.00 0.00 system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE) [1] 0.16 0.00 0.18 0.00 0.00 # No round() or t() to test for overhead system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 0.16 0.01 0.17 0.00 0.00 It also seems that, at least in this case, t() and round() do not add much overhead. Best regards, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Matrix oriented computing
Try profiling. Doing this many times to get an overview, e.g. for sapply with df=1:1000: % self% total self secondstotalsecondsname 98.26 6.78 98.26 6.78 FUN 0.58 0.04 0.58 0.04 unlist 0.29 0.02 0.87 0.06 as.vector 0.29 0.02 0.58 0.04 names- 0.29 0.02 0.29 0.02 names-.default 0.29 0.02 0.29 0.02 names so almost all the time is in qchisq. On Fri, 26 Aug 2005, Marc Schwartz (via MN) wrote: On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote: Marc Schwartz [EMAIL PROTECTED] writes: x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df - c(1:100) mat - sapply(x, qchisq, df) dim(mat) [1] 100 11 str(mat) num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ... outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x,df,qchisq)),2) should be close. You should likely add dimnames. What I find interesting, is that I would have intuitively expected outer() to be faster than sapply(). However: system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.01 0.00 0.01 0.00 0.00 system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE) [1] 0.01 0.00 0.01 0.00 0.00 # No round() or t() to test for overhead system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 0.01 0.00 0.02 0.00 0.00 # Bear in mind the round() on mat1 above all.equal(mat, mat1) [1] Mean relative difference: 4.905485e-05 all.equal(mat, t(mat2)) [1] TRUE Even when increasing the size of 'df' to 1:1000: system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.16 0.01 0.16 0.00 0.00 system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE) [1] 0.16 0.00 0.18 0.00 0.00 # No round() or t() to test for overhead system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 0.16 0.01 0.17 0.00 0.00 It also seems that, at least in this case, t() and round() do not add much overhead. Definitely not for such small matrices. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Matrix oriented computing
Dear Mark, For that matter, the loop isn't a whole a slower (on my 3GHz Win XP system): x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df - 1:1000 system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.08 0.00 0.08 NA NA mat - matrix(0, 1000, 11) system.time(for (i in 1:length(df)) mat[i,] - qchisq(x, df[i])) [1] 0.09 0.00 0.10 NA NA Regards, John John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz (via MN) Sent: Friday, August 26, 2005 10:26 AM To: Peter Dalgaard Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Matrix oriented computing On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote: Marc Schwartz [EMAIL PROTECTED] writes: x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df - c(1:100) mat - sapply(x, qchisq, df) dim(mat) [1] 100 11 str(mat) num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ... outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x,df,qchisq)),2) should be close. You should likely add dimnames. What I find interesting, is that I would have intuitively expected outer() to be faster than sapply(). However: system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.01 0.00 0.01 0.00 0.00 system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE) [1] 0.01 0.00 0.01 0.00 0.00 # No round() or t() to test for overhead system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 0.01 0.00 0.02 0.00 0.00 # Bear in mind the round() on mat1 above all.equal(mat, mat1) [1] Mean relative difference: 4.905485e-05 all.equal(mat, t(mat2)) [1] TRUE Even when increasing the size of 'df' to 1:1000: system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.16 0.01 0.16 0.00 0.00 system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE) [1] 0.16 0.00 0.18 0.00 0.00 # No round() or t() to test for overhead system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 0.16 0.01 0.17 0.00 0.00 It also seems that, at least in this case, t() and round() do not add much overhead. Best regards, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Matrix oriented computing
Prof. Ripley, Excellent point. Neither sapply() nor outer() are the elephant in the room in this situation. On Fri, 2005-08-26 at 16:55 +0100, Prof Brian Ripley wrote: Try profiling. Doing this many times to get an overview, e.g. for sapply with df=1:1000: % self% total self secondstotalsecondsname 98.26 6.78 98.26 6.78 FUN 0.58 0.04 0.58 0.04 unlist 0.29 0.02 0.87 0.06 as.vector 0.29 0.02 0.58 0.04 names- 0.29 0.02 0.29 0.02 names-.default 0.29 0.02 0.29 0.02 names so almost all the time is in qchisq. On Fri, 26 Aug 2005, Marc Schwartz (via MN) wrote: On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote: Marc Schwartz [EMAIL PROTECTED] writes: x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df - c(1:100) mat - sapply(x, qchisq, df) dim(mat) [1] 100 11 str(mat) num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ... outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x,df,qchisq)),2) should be close. You should likely add dimnames. What I find interesting, is that I would have intuitively expected outer() to be faster than sapply(). However: system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.01 0.00 0.01 0.00 0.00 system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE) [1] 0.01 0.00 0.01 0.00 0.00 # No round() or t() to test for overhead system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 0.01 0.00 0.02 0.00 0.00 # Bear in mind the round() on mat1 above all.equal(mat, mat1) [1] Mean relative difference: 4.905485e-05 all.equal(mat, t(mat2)) [1] TRUE Even when increasing the size of 'df' to 1:1000: system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.16 0.01 0.16 0.00 0.00 system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE) [1] 0.16 0.00 0.18 0.00 0.00 # No round() or t() to test for overhead system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 0.16 0.01 0.17 0.00 0.00 It also seems that, at least in this case, t() and round() do not add much overhead. Definitely not for such small matrices. True and both are C functions, which of course helps as well. Best regards, Marc __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Matrix oriented computing
On 8/26/05, Marc Schwartz (via MN) [EMAIL PROTECTED] wrote: On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote: Marc Schwartz [EMAIL PROTECTED] writes: x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995) df - c(1:100) mat - sapply(x, qchisq, df) dim(mat) [1] 100 11 str(mat) num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ... outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x,df,qchisq)),2) should be close. You should likely add dimnames. What I find interesting, is that I would have intuitively expected outer() to be faster than sapply(). However: system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.01 0.00 0.01 0.00 0.00 system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE) [1] 0.01 0.00 0.01 0.00 0.00 # No round() or t() to test for overhead system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 0.01 0.00 0.02 0.00 0.00 # Bear in mind the round() on mat1 above all.equal(mat, mat1) [1] Mean relative difference: 4.905485e-05 all.equal(mat, t(mat2)) [1] TRUE Even when increasing the size of 'df' to 1:1000: system.time(mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 0.16 0.01 0.16 0.00 0.00 system.time(mat1 - round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE) [1] 0.16 0.00 0.18 0.00 0.00 # No round() or t() to test for overhead system.time(mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 0.16 0.01 0.17 0.00 0.00 It also seems that, at least in this case, t() and round() do not add much overhead. You might need to do it repeatedly to get a more reliable reading. When I do it 1000 times outer is faster than sapply though not by much: n - 1000 system.time(for (i in 1:n) mat - sapply(x, qchisq, df), gcFirst = TRUE) [1] 14.05 0.00 14.43NANA system.time(for(i in 1:n) mat2 - outer(x, df, qchisq), gcFirst = TRUE) [1] 13.42 0.00 13.85NANA __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html