Re: [R] Parsing JSON records to a dataframe
Jeroen Ooms wrote:
> What is the most efficient method of parsing a dataframe-like structure
> that has been JSON-encoded in record-based rather than vector-based
> format? For example, a structure like this:
> [ {"name":"joe",  "gender":"male",   "age":41},
>   {"name":"anna", "gender":"female", "age":23} ]
> RJSONIO parses this as a list of lists, to which I would then have to
> apply as.data.frame and append the results to an existing dataframe,
> which is terribly slow.

unlist() is pretty fast. The solution below assumes that you know what
your structure is, so it is not very flexible, but it should show you
that the conversion to data.frame is not the bottleneck.

# json
library(RJSONIO)
# [ {"name":"joe",  "gender":"male",   "age":41},
#   {"name":"anna", "gender":"female", "age":23} ]
n <- 30
d <- data.frame(name   = rep(c("joe", "anna"), n),
                gender = rep(c("male", "female"), n),
                age    = rep(c(41, 23), n))
dj <- toJSON(d)

system.time(d1 <- fromJSON(dj))
#  user  system elapsed
#  4.06    0.26    4.32

system.time(
  dd <- data.frame(name   = unlist(d1$name),
                   gender = unlist(d1$gender),
                   age    = as.numeric(unlist(d1$age)))
)
#  user  system elapsed
#  1.13    0.05    1.18

--
View this message in context: http://r.789695.n4.nabble.com/Parsing-JSON-records-to-a-dataframe-tp3178646p3178753.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
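For readers hitting the same record-oriented JSON problem, here is a minimal sketch of the one-unlist/vapply-per-column idea in isolation. The record list is built by hand, standing in for RJSONIO's fromJSON() output, so it runs without the package; the variable names are my own.

```r
# Sketch: a list of named lists with identical fields, as fromJSON()
# would return for record-oriented JSON (built by hand here).
records <- list(list(name = "joe",  gender = "male",   age = 41),
                list(name = "anna", gender = "female", age = 23))

# One vapply() per column avoids a per-record as.data.frame() call:
df <- data.frame(
  name   = vapply(records, function(r) r$name,   character(1)),
  gender = vapply(records, function(r) r$gender, character(1)),
  age    = vapply(records, function(r) r$age,    numeric(1)),
  stringsAsFactors = FALSE)

stopifnot(nrow(df) == 2, df$age[1] == 41, df$name[2] == "anna")
```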
Re: [R] Accessing data via url
?read.table says 'file' "can also be a complete URL". This is
implemented by url(): see the section on URLs on its help page. You
haven't followed the posting guide and told us your OS, and what that
section says does depend on the OS.

On Thu, 6 Jan 2011, John Kane wrote:
> # Can anyone suggest why this works
> datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data"
> person.data <- read.table(datafilename, header=TRUE)
> # but this does not?
> dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
> treedata <- read.table(dd, header=TRUE)
> === Error in file(file, "rt") : cannot open the connection
> In addition: Warning message:
> In file(file, "rt") : unsupported URL scheme
> # I can access both through a hyperlink in OOO Calc.
> # Thanks

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK   Fax: +44 1865 272595
Re: [R] R packages for R 2.11.1
On Thu, 6 Jan 2011, Joshua Wiley wrote:
> I would try using the R 2.12.1 packages first, but if that does not

On 32-bit Windows this will not work if compiled code is involved: both
the compiler and the package layout changed at 2.12.0. However, binary
packages for 2.11.x are still being run through the autobuilder and are
available on CRAN if they were built successfully (increasingly many
are not). See the summary table at
http://cran.r-project.org/bin/windows/contrib/checkSummaryWin.html

Nevertheless (as the posting guide makes clear) the R developers do not
support obsolete versions of R, so you are advised to update to R 2.12.1.

> work, then you can go here:
> http://cran.r-project.org/src/contrib/Archive/
> to get older versions of the tar balls. I think you might have to build
> them yourself; I rather doubt anyone is keeping entire duplicates of
> old CRAN packages in all forms. This is not that difficult, though. If
> the packages you want to install have no compiled code, I believe you
> can install them without any additional software. If there is compiled
> code (e.g., C or C++), then I think you'll need Rtools as well as a
> compatible compiler. See the installation manual for details:
> http://cran.r-project.org/doc/manuals/R-admin.html
> Cheers, Josh
>
> On Thu, Jan 6, 2011 at 8:48 PM, Raji raji.sanka...@gmail.com wrote:
>> Hi, I am using R 2.11.1. I need to download a few packages for it for
>> Windows, but on CRAN I see the latest packages for R 2.12.1 only. Can
>> you help me out with the locations where I can find the packages for
>> R 2.11.1 as Windows zip files?
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK   Fax: +44 1865 272595
Re: [R] Accessing data via url
John Kane-2 wrote:
> # Can anyone suggest why this works
> datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data"
> person.data <- read.table(datafilename, header=TRUE)
> # but this does not?
> dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
> treedata <- read.table(dd, header=TRUE)
> === Error in file(file, "rt") : cannot open the connection

Your original file is no longer there, but when I try RCurl with a png
file that is present, I get a certificate error:

Dieter

library(RCurl)
dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/history.png"
x <- getBinaryURL(dd)
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  SSL certificate problem, verify that the CA cert is OK. Details:
  error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] RCurl_1.5-0.1 bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] tools_2.12.1

--
View this message in context: http://r.789695.n4.nabble.com/Accessing-data-via-url-tp3178094p3178773.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] problems with rJava
On Thu, 6 Jan 2011, karamoo wrote:
> Hi All, and Heberto,
> Did you ever resolve your installation problem with rJava? I have a new
> Windows 7 machine and can't seem to get it installed correctly. I do
> have Java installed. I download rJava without problem, then:

As you didn't follow the posting guide, we don't know if this is 32- or
64-bit R on 32- or 64-bit Windows. If you are running 64-bit R you need
64-bit Java, and similarly for 32-bit. The message you show indicates
that you do not have the appropriate Java installed.

As I have pointed out already this week, rJava has its own mailing list,
so please follow up there.

> install.packages("rJava")
> Installing package(s) into 'C:\Users\Patrick\Documents/R/win-library/2.12'
> (as 'lib' is unspecified)
> trying URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.12/rJava_0.8-8.zip'
> Content type 'application/zip' length 637125 bytes (622 Kb)
> opened URL
> downloaded 622 Kb
>
> The downloaded packages are in
> C:\Users\Patrick\AppData\Local\Temp\Rtmp9zcM7g\downloaded_packages
>
> Looks like it downloaded fine to my eyes. But in the next step I get an
> error:
>
> library(rJava)
> Error in utils::readRegistry(key, "HLM", 2) :
>   Registry key 'Software\JavaSoft\Java Runtime Environment' not found
> Error in utils::readRegistry(key, "HLM", 2) :
>   Registry key 'Software\JavaSoft\Java Development Kit' not found
> Error : .onLoad failed in loadNamespace() for 'rJava', details:
>   call: fun(...)
>   error: JAVA_HOME cannot be found from the Registry
> Error: package/namespace load failed for 'rJava'
>
> I'm not sure how to check where R is looking for rJava vs. where it
> lives; this seems to be an issue in other posts, although they have
> different errors. I'm still a novice at R. I see a lot of posts on this
> topic, but few are recent. Help would be much appreciated. Thanks!
>
> Kara Moore
> Evolution and Ecology
> University of California, Davis, USA

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK   Fax: +44 1865 272595
Re: [R] algorithm help
On 01/06/2011 11:57 PM, (Ted Harding) wrote:
> On 06-Jan-11 22:16:38, array chip wrote:
>> Hi, I am seeking help on designing an algorithm to identify the
>> locations of stretches of 1s in a vector of 0s and 1s. Below is a
>> simple example:
>>
>> dat <- as.data.frame(cbind(
>>   a = c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T),
>>   b = c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
>> dat
>>     a  b
>> 1   0  4
>> 2   0 12
>> 3   1 13
>> 4   1 16
>> 5   1 18
>> 6   1 20
>> 7   0 28
>> 8   0 30
>> 9   1 34
>> 10  1 46
>> 11  0 47
>> 12  1 49
>> 13  1 61
>> 14  1 73
>> 15  1 77
>> 16  0 84
>> 17  0 87
>> 18  0 90
>> 19  0 95
>> 20  1 97
>>
>> In this dataset, b is sorted and denotes the location of each number
>> in a. I would like to find the starting and ending locations of each
>> stretch of 1s within a, also counting the number of 1s in each
>> stretch. The hoped-for result from the algorithm would be:
>>
>> stretch start end No.of.1s
>>       1    13  20        4
>>       2    34  46        2
>>       3    49  77        4
>>       4    97  97        1
>>
>> I can imagine for loops could do the job, but that doesn't feel like
>> a clever way to do this. Is there an efficient algorithm that can do
>> this fast? Thanks for any suggestions. John
>
> The basic information you need can be got using rle() (run length
> encoding). See '?rle'. In your example:
>
> rle(dat$a)
> # Run Length Encoding
> #   lengths: int [1:8] 2 4 2 2 1 4 4 1
> #   values : num [1:8] 0 1 0 1 0 1 0 1    ## Note: F -> 0, T -> 1
>
> The following has a somewhat twisted logic at the end, and may be
> flawed, but you can probably adapt it!
>
> L <- rle(dat$a)$lengths
> V <- rle(dat$a)$values
> pos <- c(1, cumsum(L))
> V1 <- c(-1, V)
> 1 + pos[V1 == 0]
> # [1]  3  9 12 20
> ## Positions in the series dat$a where each run of T (i.e. 1) starts
>
> Hoping this helps, Ted.
> E-Mail: (Ted Harding) ted.hard...@wlandres.net
> Fax-to-email: +44 (0)870 094 0861
> Date: 06-Jan-11  Time: 22:57:44

A different approach would be to use the diff() function. Where

diff(dat$a)
 [1]  0  1  0  0  0 -1  0  1  0 -1  1  0  0  0 -1  0  0  0  1

is not equal to 0, the value is changing from 0 to 1 or from 1 to 0.
The indices of the first new value can be found by:

which(diff(dat$a) != 0) + 1
[1]  3  7  9 11 12 16 20

Where it changes from 0 to 1:

which(diff(dat$a) == 1) + 1
[1]  3  9 12 20

Where it changes from 1 to 0:

which(diff(dat$a) == -1) + 1
[1]  7 11 16

By taking into consideration whether the first and last values are 0 or
1, you can calculate the lengths.

Cheers, Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Natural Sciences Building, Office Suite 2039
Stellenbosch University, Main Campus, Merriman Avenue
Stellenbosch, South Africa
email: rai...@krugs.de  Skype: RMkrug
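Ted's rle() hints can be assembled into the full start/end/count table the original poster asked for. A sketch (the intermediate variable names are mine), reproducing the hoped-for result from the example data:

```r
# Example data from the thread (a as logical, b as the locations).
a <- c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
b <- c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)

r      <- rle(a)
ends   <- cumsum(r$lengths)           # index where each run ends
starts <- ends - r$lengths + 1        # index where each run starts
keep   <- r$values                    # keep only the runs of TRUE (1)

res <- data.frame(start    = b[starts[keep]],
                  end      = b[ends[keep]],
                  No.of.1s = r$lengths[keep])

stopifnot(identical(res$start, c(13, 34, 49, 97)),
          identical(res$end,   c(20, 46, 77, 97)),
          identical(res$No.of.1s, c(4L, 2L, 4L, 1L)))
```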
Re: [R] Creating a Matrix from a vector with some conditions
On Thu, Jan 06, 2011 at 01:34:31PM -0800, ADias wrote:
> Hi,
> Suppose we have an object with strings:
> A <- c("a","b","c","d")
> Now I do:
> B <- matrix(A, 4, 4, byrow=FALSE)
> and I get
> a a a a
> b b b b
> c c c c
> d d d d
> But what I really want is:
> a b c d
> b c d a
> c d a b
> d a b c
> How can I do this?

Try the following:

A <- c("a","b","c","d")
B <- matrix(A, 5, 4)[1:4, ]
#      [,1] [,2] [,3] [,4]
# [1,] "a"  "b"  "c"  "d"
# [2,] "b"  "c"  "d"  "a"
# [3,] "c"  "d"  "a"  "b"
# [4,] "d"  "a"  "b"  "c"

Petr Savicky.
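Petr's recycling trick generalizes to a vector of any length n: filling an (n+1) x n matrix shifts each column by one, and dropping the last row leaves the rotated square. A hedged sketch; the function name `circulant` is my own, not from the thread:

```r
# Generalized version of the matrix(A, n+1, n)[1:n, ] trick.
# (The name `circulant` is a hypothetical helper, not a base function.)
circulant <- function(x) {
  n <- length(x)
  matrix(x, n + 1, n)[1:n, ]
}

m <- circulant(c("a", "b", "c", "d"))
stopifnot(identical(m[1, ], c("a", "b", "c", "d")),
          identical(m[2, ], c("b", "c", "d", "a")),
          identical(m[4, ], c("d", "a", "b", "c")))
```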
Re: [R] Help with IF operator
On Thu, Jan 06, 2011 at 12:21:33PM -0800, ADias wrote:
> Hi, I have a problem with how to compare values. My script is as
> follows:
>
> repeat{
>   cat("How many teams to use? (to end write 0) ")
>   nro <- scan(n=1)
>   if (nro == 0) break
>   cat("write the", nro, "teams names\n")
>   teams <- readLines(n=nro)
>   if (teams[1] == teams[2]) next else print(teams)
> }
>
> In this example I only compare team 1's name with team 2's name, and if
> they are the same the script starts again. If I had 10 teams, how could
> I compare all nro team names to check whether the same name has been
> written more than once? The idea is: if the same name is written more
> than once, it should give an error and start again by asking for the
> team names. Two more things: with `next` the script starts from the
> top, i.e. by asking the number of teams to use. Can I make it go
> directly to asking the team names?

Consider also using readline(), which reads a single line, and the %in%
operator to compare the new name to the previous ones immediately:

nro <- as.numeric(readline("no of teams "))
teams <- rep(NA, times=nro)
for (i in seq(length=nro)) {
  repeat {
    current <- readline(paste("team", i, ""))
    if (current %in% teams) {
      cat("error - repeated name\n")
    } else {
      break
    }
  }
  teams[i] <- current
}

Petr Savicky.
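If the names are read all at once with readLines(), base R's anyDuplicated() can check the whole batch in one shot instead of testing each name as it is entered. A sketch with a hard-coded vector standing in for the interactive readLines(n = nro) input (the team names are made up):

```r
# Stand-in for teams <- readLines(n = nro); "Ajax" appears twice.
teams <- c("Ajax", "Porto", "Ajax")

# anyDuplicated() returns 0 when all names are unique, otherwise the
# index of the first repeat, so it works directly as an if() condition.
msg <- if (anyDuplicated(teams)) "error - repeated name" else "ok"

stopifnot(identical(msg, "error - repeated name"))
stopifnot(anyDuplicated(c("Ajax", "Porto", "Benfica")) == 0)
```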
[R] Converting Fortran or C++ etc to R
I will wind this thread up with some happy comments. I have indeed
succeeded in constructing an R program to do the same thing as my
Fortran program for an EM algorithm. I have not done timings yet, but it
seems to run acceptably fast for my purposes.

The key code to be replaced was the E and M steps of the algorithm. I
decided to try to replace all the loops with matrix operations such as
%*%, t(), crossprod(), tcrossprod(). Other operations that I used were
of the form A + v, where dim(A) = c(a, b) and length(v) = a. Here the
vector v operates term by term down the columns, recycling for each new
column. [*, - and / also work similarly.] I was relieved that matrices
were as far as I needed to go; I had visions of having to use tensor
products of higher-dimensioned arrays. Fortunately it did not come to
that.

I didn't actually translate from Fortran to R. The original is itself a
translation of my underlying maths, and it was easier to translate the
maths into R directly. I preserved the form of my Fortran input and
output files so that I will be able to run either version on the same
files. As I mentioned earlier, the main point of doing all this is so
that I may try out some variants of the program. I expect this will be
much easier to do in R!

Thanks to all who replied. Murray

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: m...@waikato.ac.nz  Fax 7 838 4155
Phone +64 7 838 4773 wk  Home +64 7 825 0441  Mobile 021 0200 8350
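The `A + v` recycling rule Murray describes, with dim(A) = c(a, b) and length(v) = a, can be verified in a few lines (example values are mine):

```r
# v is recycled down each column of A, i.e. v[i] is added to row i of
# every column.
A <- matrix(1:6, nrow = 2)   # 2 x 3: columns (1,2), (3,4), (5,6)
v <- c(10, 100)              # length(v) == nrow(A)
B <- A + v

stopifnot(identical(B[, 1], c(11, 102)),
          identical(B[, 2], c(13, 104)),
          identical(B[, 3], c(15, 106)))
```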
Re: [R] Stepwise SVM Variable selection
On 06/01/11 23:10:59, Noah Silverman wrote:
> I have a data set with about 30,000 training cases and 103 variables.
> I've trained an SVM (using the e1071 package) for a binary classifier
> {0,1}. The accuracy isn't great. I used a grid search over the C and
> gamma parameters with an RBF kernel to find the best settings.
[...]
> Can anyone suggest an approach to seek the ideal subset of variables
> for my SVM classifier?

The standard feature-selection stuff (backward/forward selection etc.)
is probably ruled out by the time it takes to compute all the sets and
subsets. What you could try is the following:

First, do a cross-validation setup: split your data set into a training
and a testing set (ratio 0.9 / 0.1 or so).

Second, train your SVM on the training set (try conservative parameters
first).

Third, have your trained SVM classify the test set and compute the
classification error.

Fourth, iterate over all variables and do the following:
  a) choose one variable and permute its values (only) in the test set
  b) have your trained SVM (from step 2) classify this test set and
     measure the classification error
  c) repeat a) and b) a (high) number of times to be significant
  d) go to the next variable

Fifth, you can get an impression of the importance of each variable by
comparing the errors on the permuted test sets with the non-permuted
test-set classification error. If permuting one variable drastically
increases the classification error, that variable is probably
important.

Sixth, repeat the cross-validation / random sampling a number of times
to be significant.

This is more of an ad-hoc approach and there are some pitfalls, but the
idea is easily explained and carries over to any other regression model
with cross-validation. The computational burden with SVMs is assumed to
be in the training step rather than the prediction step, and you only
need a relatively low number of training runs (sixth step) here.

Regards, Georg.

--
Research Assistant
Otto-von-Guericke-Universität Magdeburg
resea...@georgruss.de
http://research.georgruss.de
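Georg's permutation recipe can be sketched compactly. To keep the example self-contained (no e1071 dependency), a toy rule-based classifier stands in for the trained SVM: `predict_toy()` below is a hypothetical stand-in for calling predict() on a fitted svm object, and the data are synthetic.

```r
# Permutation importance, steps 3-5 above, on a toy problem where the
# class depends only on x1; x2 is pure noise.
set.seed(1)
n  <- 200
x1 <- rnorm(n)                  # informative variable
x2 <- rnorm(n)                  # noise variable
y  <- as.integer(x1 > 0)
test <- data.frame(x1, x2, y)

predict_toy <- function(d) as.integer(d$x1 > 0)   # stand-in for predict(svm_fit, d)
err <- function(d) mean(predict_toy(d) != d$y)

base_err <- err(test)           # step 3: error on the untouched test set

perm_err <- function(var) {     # steps 4a/4b: permute one variable, re-score
  d <- test
  d[[var]] <- sample(d[[var]])
  err(d)
}
e1 <- mean(replicate(50, perm_err("x1")))   # step 4c: many permutations
e2 <- mean(replicate(50, perm_err("x2")))

# Step 5: permuting the informative variable hurts; the noise one does not.
stopifnot(base_err == 0, e1 > 0.2, e2 == 0)
```

With a real SVM, only `predict_toy`/`err` change: fit once on the training set, then call predict() on each permuted copy of the test set.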
[R] Odp: Calculating Returns : (Extremely sorry for earlier incomplete mail)
Hi,

your code is quite complicated and I get an error:

spot_returns_table <- lapply(1:nrow(trans), function(z) with(trans[z, ],
    spot_trans(currency_trans=trans$currency_transacted)))
Error in if (currency_trans == "USD") { : argument is of length zero

It seems to me that you do not know what your code is doing. The
warnings come from the fact that the currency_trans value you feed to
the spot_trans function is longer than one, and `if` needs an input of
exactly one logical value. Maybe you could use debug() and see what the
values of your variables are during the computation, but I believe it is
better to use more convenient input objects together with *apply or
aggregate, or basic math could be the better solution:

rate1
      USD    GBP  EURO   CHF    AUD
1  112.05 171.52 42.71 41.50 109.55
2  112.90 168.27 42.68 41.47 102.52
3  110.85 169.03 41.86 42.84 114.91
4  109.63 169.64 44.71 43.44 122.48
5  108.08 169.29 44.14 43.69 122.12
6  111.23 169.47 44.58 42.30 123.96
7  112.49 170.90 41.07 42.05 100.36
8  108.87 168.69 42.23 41.23 110.19
9  109.33 170.90 44.55 42.76 121.58
10 111.88 169.96 41.12 43.79 103.46

log(rate1[-1, ] / rate1[-nrow(rate1), ])

Is this what you want?

Regards, Petr

r-help-boun...@r-project.org napsal dne 07.01.2011 07:36:28:

> Dear R forum helpers,
> I am extremely sorry for the receipt of my incomplete mail yesterday.
> There was a connectivity problem at my end, so I chose to send the mail
> through my cell, only to realize today how the mail had been
> transmitted. I am sending my complete mail again through the regular
> channel and sincerely apologize for the inconvenience caused.
>
> ## Here is my actual mail
>
> Dear R forum helpers,
> I have the following data:
>
> trans <- data.frame(
>   currency = c("EURO", "USD", "USD", "GBP", "USD", "AUD"),
>   position_amt = c(1, 25000, 2, 15000, 22000, 3))
> date <- c("12/31/2010", "12/30/2010", "12/29/2010", "12/28/2010",
>           "12/27/2010", "12/24/2010", "12/23/2010", "12/22/2010",
>           "12/21/2010", "12/20/2010")
> USD  <- c(112.05, 112.9, 110.85, 109.63, 108.08, 111.23, 112.49,
>           108.87, 109.33, 111.88)
> GBP  <- c(171.52, 168.27, 169.03, 169.64, 169.29, 169.47, 170.9,
>           168.69, 170.9, 169.96)
> EURO <- c(42.71, 42.68, 41.86, 44.71, 44.14, 44.58, 41.07, 42.23,
>           44.55, 41.12)
> CHF  <- c(41.5, 41.47, 42.84, 43.44, 43.69, 42.3, 42.05, 41.23,
>           42.76, 43.79)
> AUD  <- c(109.55, 102.52, 114.91, 122.48, 122.12, 123.96, 100.36,
>           110.19, 121.58, 103.46)
>
> These are the exchange rates and I am trying to calculate the returns.
> I am giving only a small portion of the actual data, as I can't send it
> as an attachment. I am using a function because I want to generalize
> the code for any portfolio.
>
> # My Code
> trans <- read.table('transactions.csv', header=TRUE, sep=",",
>                     na.strings="NA", dec=".", strip.white=TRUE)
> # reading as table
> #currency <- read.table('currency.csv')
> #date <- currency$date
> #USD  <- currency$USD
> #GBP  <- currency$GBP
> #EURO <- currency$EURO
> #CHF  <- currency$CHF
> #AUD  <- currency$AUD
>
> # CREATION of Function. I am using a function as the number of
> # transactions is not constant.
> spot_trans <- function(currency_trans) {
>   if (currency_trans == "USD")  {rate <- USD}
>   # So if I am dealing with a USD transaction, I select the USD rates.
>   if (currency_trans == "GBP")  {rate <- GBP}
>   if (currency_trans == "EURO") {rate <- EURO}
>   if (currency_trans == "CHF")  {rate <- CHF}
>   if (currency_trans == "AUD")  {rate <- AUD}
>
>   # CURRENCY Rate RETURNS, i.e. log(today's rate / yesterday's rate);
>   # the data is in descending date order
>   currency_rate_returns <- NULL
>   for (i in 1:(length(rate)-1)) {
>     # if there are 10 elements, the total number of returns is 9
>     currency_rate_returns[i] <- log(rate[i]/rate[i+1])
>   }
>   return(data.frame(returns = currency_rate_returns))
> }
>
> spot_returns_table <- lapply(1:nrow(trans), function(z)
>   with(trans[z, ], spot_trans(currency_trans=trans$currency_transacted)))
> spot_returns_table
>
> This generates the output given below, with 30 warnings. Also, as there
> are six transactions, 6 outputs are generated, but the output in each
> case pertains only to the first transaction, i.e. the returns are
> generated 6 times for the first transaction (EURO).
>
> warnings()
> Warning messages:
> 1: In if (currency_trans == "USD") { ... :
>   the condition has length > 1 and only the first element will be used
> 2: In if (currency_trans == "GBP") { ... :
>   the condition has length > 1 and only the first element will be used
> 3: In if (currency_trans == "EURO") { ... :
>   the condition has length > 1 and only the first element will be used
> and so on
>
> The output is as given below.
> spot_returns_table
> [[1]]
>   spot_returns
> 1 0.0007026584
> 2 0.0193997094
> 3
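Petr's one-liner, made runnable on a two-currency slice of the rate table. Note that it computes log(next row / current row); with the data in descending date order, today/yesterday returns as the original poster defines them would be `log(rate1[-nrow(rate1), ] / rate1[-1, ])` instead.

```r
# Column-wise log returns for every currency at once (two columns of the
# thread's table used here for brevity).
rate1 <- data.frame(
  USD = c(112.05, 112.90, 110.85),
  GBP = c(171.52, 168.27, 169.03))

ret <- log(rate1[-1, ] / rate1[-nrow(rate1), ])

stopifnot(nrow(ret) == 2,
          isTRUE(all.equal(ret$USD[1], log(112.90 / 112.05))),
          isTRUE(all.equal(ret$GBP[2], log(169.03 / 168.27))))
```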
Re: [R] R packages for R 2.11.1
On 07.01.2011 09:21, Prof Brian Ripley wrote:
> On Thu, 6 Jan 2011, Joshua Wiley wrote:
>> I would try using the R 2.12.1 packages first, but if that does not
>
> On 32-bit Windows this will not work if compiled code is involved: both
> the compiler and the package layout changed at 2.12.0. However, binary
> packages for 2.11.x are still being run through the autobuilder and
> are available on CRAN if they were built successfully (increasingly
> many are not). See the summary table at
> http://cran.r-project.org/bin/windows/contrib/checkSummaryWin.html

Right, and if they were not built successfully, then the latest version
that passed the checks is still available there (which is the one you
want to use anyway then). Let me add that just typing

install.packages("packagename")

should do the trick and will use the 2.11 package repository at
your-CRAN-mirror/bin/windows/contrib/2.11

Best, Uwe

> Nevertheless (as the posting guide makes clear) the R developers do
> not support obsolete versions of R, so you are advised to update to
> R 2.12.1.
Re: [R] Match numeric vector against rows in a matrix?
On Wed, Jan 05, 2011 at 07:16:47PM +, Kevin Ummel wrote: Two posts in one day is not a good day...and this question seems like it should have an obvious answer: I have a matrix where rows are unique combinations of 1's and 0's: combs=as.matrix(expand.grid(c(0,1),c(0,1))) combs Var1 Var2 [1,]00 [2,]10 [3,]01 [4,]11 I want a single function that will give the row index containing an exact match with vector x: x=c(0,1) The solution needs to be applied many times, so I need something quick -- I was hoping a base function would do it, but I'm drawing a blank. If the matrix can have different number of columns, then also the following can be used combs - as.matrix(expand.grid(c(0,1),c(0,1),c(0,1))) x - c(0,1,1) which(rowSums(combs != rep(x, each=nrow(combs))) == 0) # [1] 7 Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cross validation for Ordinary Kriging
Pearl,

the error suggests that there is something wrong with x2, and that there
is a difference between the row names of the coordinates and of the
data. If you call str(x2), see whether the first element of @coords is
different from NULL, as this can cause problems when cross-validating.
If it is, try to figure out why. You can also set the row names to NULL
directly:

row.names(x2@coords) = NULL

although I don't think such manipulation of the slots of an object is
usually recommended.

Cheers, Jon

BTW, you will usually get more response to questions about spatial data
handling on the list r-sig-geo
(https://stat.ethz.ch/mailman/listinfo/r-sig-geo).

On 1/6/2011 4:00 PM, pearl may dela cruz wrote:
> Dear ALL,
> The last part of my thesis analysis is the cross validation. Right now
> I am having difficulty using the cross validation of gstat. Below are
> my commands, with tsport_ace as the variable:
>
> nfold <- 3
> part <- sample(1:nfold, 69, replace = TRUE)
> sel <- (part != 1)
> m.model <- x2[sel, ]
> m.valid <- x2[-sel, ]
> t <- fit.variogram(v, vgm(0.0437, "Exp", 26, 0))
> cv69 <- krige.cv(tsport_ace ~ 1, x2, t, nfold = nrow(x2))
>
> The last line gives an error saying:
> Error in SpatialPointsDataFrame(coordinates(data),
>   data.frame(matrix(as.numeric(NA), :
>   row.names of data and coords do not match
>
> I don't know what is wrong. The x2 data is a SpatialPointsDataFrame,
> which is why I did not specify the location (it will be taken from the
> data). Here is the usage of the function krige.cv:
>
> krige.cv(formula, locations, data, model = NULL, beta = NULL,
>          nmax = Inf, nmin = 0, maxdist = Inf, nfold = nrow(data),
>          verbose = TRUE, ...)
>
> I hope you can help me on this. Thanks a lot.
> Best regards, Pearl
Re: [R] weighed mean of a data frame row-by-row
Thanks for the help, guys! For my purpose I think that rharlow2's
answer, i.e. the rowSums() function, is the most appropriate, since it
also takes care of the NAs.

Best, Vassilis

--
View this message in context: http://r.789695.n4.nabble.com/weighed-mean-of-a-data-frame-row-by-row-tp3177421p3178912.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Accessing data via url
Date: Fri, 7 Jan 2011 00:24:19 -0800 From: dieter.me...@menne-biomed.de To: r-help@r-project.org Subject: Re: [R] Accessing data via url John Kane-2 wrote: # Can anyone suggest why this works datafilename - http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data; person.data - read.table(datafilename,header=TRUE) # but this does not? dd - https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt; treedata - read.table(dd, header=TRUE) === Error in file(file, rt) : cannot open the connection Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter library(RCurl) sessionInfo() dd - https://sites.google.com/site/jrkrideau/home/general-stores/history.png; x = getBinaryURL(dd) - sessionInfo() R version 2.12.1 (2010-12-16) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C [5] LC_TIME=German_Germany.1252 attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] RCurl_1.5-0.1 bitops_1.0-4.1 loaded via a namespace (and not attached): [1] tools_2.12.1 dd - https://sites.google.com/site/jrkrideau/home/general-stores/history.png; x = getBinaryURL(dd) Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed I think I replied to OP only using wget but puresumably there is similar option for rcurl as -k on cmd line version. Network IO is unpredictable, you really can use a few external tools from time to time. 
$ wget -O xxx -S -v --no-check-certificate --user-agent=Mozilla5.0 http://si tes.google.com/site/jrkrideau/home/general-stores/trees.txt --2011-01-06 16:00:01-- http://sites.google.com/site/jrkrideau/home/general-sto res/trees.txt Resolving sites.google.com (sites.google.com)... 74.125.229.3, 74.125.229.5, 74. 125.229.13, ... Connecting to sites.google.com (sites.google.com)|74.125.229.3|:80... connected. HTTP request sent, awaiting response... HTTP/1.0 404 Not Found Content-Type: text/html; charset=utf-8 Date: Thu, 06 Jan 2011 22:00:05 GMT Expires: Thu, 06 Jan 2011 22:00:05 GMT Cache-Control: private, max-age=0 X-Content-Type-Options: nosniff X-XSS-Protection: 1; mode=block Server: GSE 2011-01-06 16:00:01 ERROR 404: Not Found. $ wget -O xxx -S -v --no-check-certificate --user-agent=Mozilla5.0 http://si tes.google.com/site/jrkrideau/home/general-stores/history.png --2011-01-07 05:43:00-- http://sites.google.com/site/jrkrideau/home/general-sto res/history.png Resolving sites.google.com (sites.google.com)... 74.125.229.11, 74.125.229.6, 74 .125.229.14, ... Connecting to sites.google.com (sites.google.com)|74.125.229.11|:80... connected . HTTP request sent, awaiting response... 
  HTTP/1.0 200 OK
  Content-Type: image/png
  X-Robots-Tag: noarchive
  Cache-Control: no-cache, no-store, max-age=0, must-revalidate
  Pragma: no-cache
  Expires: Fri, 01 Jan 1990 00:00:00 GMT
  Date: Fri, 07 Jan 2011 11:43:04 GMT
  Last-Modified: Wed, 28 Oct 2009 18:58:56 GMT
  ETag: 1256756336889
  Content-Length: 3817
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Server: GSE
  Connection: Keep-Alive
Length: 3817 (3.7K) [image/png]
Saving to: `xxx'

100%[======================================>] 3,817 --.-K/s in 0s

2011-01-07 05:43:00 (30.8 MB/s) - `xxx' saved [3817/3817]

$ curl -o xxx -k http://sites.google.com/site/jrkrideau/home/general-stores/history.png
  % Total % Received % Xferd Average Speed Time Time Time Current
                             Dload Upload Total Spent Left Speed
100 3817 100 3817 0 0 28916 0 --:--:-- --:--:-- --:--:-- 40606
$

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Accessing data via url
With the ssl.verifypeer = FALSE argument it works:

x = getBinaryURL(dd, ssl.verifypeer = FALSE)

On Fri, Jan 7, 2011 at 6:24 AM, Dieter Menne dieter.me...@menne-biomed.de wrote:

> John Kane-2 wrote:
> # Can anyone suggest why this works
> datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data"
> person.data <- read.table(datafilename, header=TRUE)
> # but this does not?
> dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
> treedata <- read.table(dd, header=TRUE)
> === Error in file(file, "rt") : cannot open the connection
>
> Your original file is no longer there, but when I try RCurl with a png file
> that is present, I get a certificate error:
>
> dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/history.png"
> x = getBinaryURL(dd)
> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
>   SSL certificate problem, verify that the CA cert is OK. Details:
>   error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
>
> Dieter

--
View this message in context: http://r.789695.n4.nabble.com/Accessing-data-via-url-tp3178094p3178773.html
Sent from the R help mailing list archive at Nabble.com.
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
[R] Extracting user specified variables from data frame to use as function arguments
Hello All;

I am writing an R program in the form of a function to find the summary statistics for response variable(s) in a data frame. I need to accept variable names from the data as arguments and use them inside the function for various tasks. The dataset name is also one of the arguments to the program. The tasks include merging two datasets, transposing a dataset, and finding the summary stats of response variable(s) using the input variables as the 'by' variables.

The variables specified as arguments to the function are taken to be character strings, and I have not been able to use these to extract the corresponding variables. As an example, suppose I have an argument 'merge_by' which is used to specify the variables used to merge two datasets. Typically the input is a quoted variable name such as "Period" (with the double quotes), but there could be more or fewer variables than that. How do I take in the user-specified variables and use them for tasks like merging, transposing etc.? A similar problem is with specifying the user-defined 'by' variables in summaryBy. I have played around with substitute, parse, eval etc. but without success. Any suggestions?

Thanks
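One common pattern, sketched below with hypothetical data frames and column names, is to keep the arguments as character strings throughout: merge() accepts a character vector for by= directly, and [[ ]] / single-bracket indexing select columns by name without any substitute()/eval() gymnastics.

```r
# Sketch: using character-string arguments to pick out columns (hypothetical names)
summarize_by <- function(data, response, by) {
  # 'response' is one column name; 'by' is a character vector of grouping columns.
  # data[by] returns a data frame (a list), which aggregate() accepts as 'by'.
  aggregate(data[[response]], by = data[by], FUN = mean)
}

merge_two <- function(d1, d2, merge_by) {
  merge(d1, d2, by = merge_by)  # merge() takes a character vector of key names
}

d1 <- data.frame(id = 1:4, x = c(10, 20, 30, 40))
d2 <- data.frame(id = 1:4, g = c("a", "b", "a", "b"))
d  <- merge_two(d1, d2, "id")
summarize_by(d, "x", "g")
```

The same idea works for summaryBy-style tasks: build the formula from strings with as.formula(paste(response, "~", paste(by, collapse = "+"))).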
Re: [R] How to join matrices of different row length from a list
Dear Emma,

there is a 'cbind.na', 'rbind.na' and 'data.frame.na' function in my qpcR package.

library(qpcR)
matLis <- list(matrix(1:4, 2, 2), matrix(1:6, 3, 2), matrix(2:1, 1, 2))
do.call(cbind.na, matLis)

They are essentially the generic functions extended with an internal fill. You might also want to try these examples:

## binding
cbind.na(1, 1:7)  # the '1' (= shorter vector) is NOT recycled but filled
cbind.na(1:8, 1:7, 1:5, 1:10)  # same with many vectors
rbind.na(1:8, 1:7, 1:5, 1:10)  # or in rows

a <- matrix(rnorm(20), ncol = 4)  # unequal size matrices
b <- matrix(rnorm(20), ncol = 5)
cbind.na(a, b)  # works, in contrast to original cbind
rbind.na(a, b)  # works, in contrast to original rbind

## data frame with unequal size vectors
data.frame.na(A = 1:7, B = 1:5, C = letters[1:3], D = factor(c(1, 1, 2, 2)))

## convert a list with unequal length list items
## to a data frame
z <- list(a = 1:5, b = letters[1:3], c = matrix(rnorm(20), ncol = 2))
do.call(data.frame.na, z)

--
View this message in context: http://r.789695.n4.nabble.com/How-to-join-matrices-of-different-row-length-from-a-list-tp3177212p3178991.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
Date: Thu, 6 Jan 2011 23:06:44 -0800
From: peter.langfel...@gmail.com
To: r-help@r-project.org
Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value, but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias.

You see this everywhere in one form or another, from medical to financial modelling. My solution here is simply to publish more raw data in a computer-readable form — in this case, of course, something easy to get with R — so disinterested or adversarial parties can run their own analysis. I think there was also a push to create a database for failed drug trials that may contain data of some value later. The value of R, with easily available data for a large cross section of users, could be to moderate problems like the one cited here.

I almost slammed a poster here earlier who wanted a simple rule for "when do I use this test" with something like "when your mom tells you to", since post hoc you do just about everything to assume you messed up and missed something, but a priori you hope you have designed a good hypothesis. And at the end of the day, a given p-value is one piece of evidence in the overall objective of learning about some system, not appeasing a sponsor. Personally I'm a big fan of post hoc analysis on biotech data in some cases, especially as more pathway or other theory is published, but it is easy to become deluded if you have a conclusion that you know JUST HAS TO BE RIGHT.
Also FWIW, in the few cases I've examined with FDA-sponsor rhetoric, the data I've been able to get tends to make me side with the FDA. I still hate the idea of any regulation or access restrictions, but it seems to be the only way to keep sponsors honest to any extent. Your mileage may vary, however; take a look at some rather loud disagreement with the FDA over earlier DNDN panel results, possibly involving threats against critics. LOL.

That being said, it would seem that biomedical sciences do make progress, so some of the published results are presumably correct :)

Peter

On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves wrote:

Part of the phenomenon can be explained by the natural censorship in what is accepted for publication: stronger results tend to have less difficulty getting published. Therefore, given that a result is published, it is evident that the estimated magnitude of the effect is on average larger than it is in reality, just by the fact that weaker results are less likely to be published. A study of the literature on this subject might yield an interesting and valuable estimate of the magnitude of this selection bias.

A more insidious problem, which may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html). For example, I've heard claims (which I cannot substantiate right now) that cell phone companies allegedly lobbied successfully to block funding for researchers they thought were likely to document health problems with their products. Related claims have been made by scientists in the US Food and Drug Administration that certain therapies were approved on political grounds in spite of substantive questions about the validity of the research backing the request for approval (e.g., www.naturalnews.com/025298_the_FDA_scientists.html).
Some of these accusations of political corruption may be groundless. However, as private funding replaces tax money for basic science, we must expect an increase in research results that match the needs of the funding agency while degrading the quality of published research. This produces more research that cannot be replicated — effects that get smaller upon replication. (My wife and I routinely avoid certain therapies recommended by physicians, because the physicians get much of their information on recent drugs from the pharmaceuticals, who have a vested interest in presenting their products in the most positive light.)
Re: [R] How to export/save an mrpp object?
Nikos: I finally ran mrpp tests. I think all is fine but one very important issue: I have no idea how to export/save an mrpp object. Tried anything I know and searched the archives but found nothing.

David W: And what happened when you tried what seems like the obvious:

save(mrpp_obj, file="mrpp_store.Rdata")
# rm(list=ls())  # Only uncomment if you are ready for your workspace to clear
load("mrpp_store.Rdata")

Right, clearing did the trick.

> Any ideas? Is really copy-pasting the mrpp results the only way?

Many of us have no idea what such an object is, since you have not described the packages and functions used to create it. If you want an ASCII version then dput or dump are also available.

Multiresponse Permutation Procedures (MRPP) is implemented in the vegan package. The function mrpp() returns (an object of class mrpp) something like:

--%---
# check class
> class(samples_bitemporal_modis.0001.mrpp)
[1] "mrpp"

# check structure
> str(samples_bitemporal_modis.0001.mrpp)
List of 12
 $ call        : language mrpp(dat = samples_bitemporal_modis.0001[, 1:5], grouping = samples_bitemporal_modis.0001[["Class"]])
 $ delta       : num 0.126
 $ E.delta     : num 0.202
 $ CS          : logi NA
 $ n           : Named int [1:5] 335 307 183 188 27
  ..- attr(*, "names")= chr [1:5] "Urban" "Vegetation" "Bare ground" "Burned" ...
 $ classdelta  : Named num [1:5] 0.1255 0.1045 0.1837 0.0981 0.1743
  ..- attr(*, "names")= chr [1:5] "Urban" "Vegetation" "Bare ground" "Burned" ...
 $ Pvalue      : num 0.001
 $ A           : num 0.378
 $ distance    : chr "euclidean"
 $ weight.type : num 1
 $ boot.deltas : num [1:999] 0.202 0.202 0.202 0.203 0.202 ...
 $ permutations: num 999
 - attr(*, "class")= chr "mrpp"
--%---

Now I've tried the following:

--%---
# 1. save(d) it
save(samples_bitemporal_modis.0001.mrpp, file="exported.mrpp.R")

# 2. loade(d) it in a new object...
loadedmrpp <- load("exported.mrpp.R")

# 3. (tried) to check it...
> str(loadedmrpp)
 chr "samples_bitemporal_modis.0001.mrpp"

# it did not cross my mind immediately to...
> get(loadedmrpp)

Call:
mrpp(dat = samples_bitemporal_modis.0001[, 1:5], grouping = samples_bitemporal_modis.0001[["Class"]])

Dissimilarity index: euclidean
Weights for groups: n

Class means and counts:
      Urban  Vegetation Bare ground Burned Water
delta 0.1255 0.1045     0.1837      0.0981 0.1743
n     335    307        183         188    27

Chance corrected within-group agreement A: 0.3778
Based on observed delta 0.1258 and expected delta 0.2022
Significance of delta: 0.001
Based on 999 permutations

# ...or to work on a clean workspace!
--%---

Thank you David.

Cheers, Nikos
[R] Currency return calculations
Dear sir,

I am extremely sorry for messing up the logic asking for help w.r.t. my earlier mails. I have tried to explain below what I am looking for.

I have a database (say, currency_rates) storing datewise currency exchange rates with some base currency XYZ.

currency_rates <- data.frame(
  date = c("12/31/2010", "12/30/2010", "12/29/2010", "12/28/2010", "12/27/2010",
           "12/24/2010", "12/23/2010", "12/22/2010", "12/21/2010", "12/20/2010"),
  USD  = c(112.05, 112.9, 110.85, 109.63, 108.08, 111.23, 112.49, 108.87, 109.33, 111.88),
  GBP  = c(171.52, 168.27, 169.03, 169.64, 169.29, 169.47, 170.9, 168.69, 170.9, 169.96),
  EURO = c(42.71, 42.68, 41.86, 44.71, 44.14, 44.58, 41.07, 42.23, 44.55, 41.12),
  CHF  = c(41.5, 41.47, 42.84, 43.44, 43.69, 42.3, 42.05, 41.23, 42.76, 43.79),
  AUD  = c(109.55, 102.52, 114.91, 122.48, 122.12, 123.96, 100.36, 110.19, 121.58, 103.46))

I have a portfolio consisting of some of these currencies. At this moment, suppose my portfolio has the following currency transactions, i.e. the following is my current portfolio and has 3 USD transactions, 2 EURO transactions and a CHF transaction.

portfolio_currency_names = c("USD", "USD", "EURO", "CHF", "EURO", "USD")

# My objective is AS PER THE PORTFOLIO, I need to generate a data.frame giving respective currency returns. Thus, I need to have an output like

    USD     USD    EURO     CHF    EURO     USD
-0.0076 -0.0076  0.0007  0.0007  0.0007 -0.0076
 0.0183  0.0183  0.0194 -0.0325  0.0194  0.0183
 0.0111  0.0111 -0.0659 -0.0139 -0.0659  0.0111
 0.0142  0.0142  0.0128 -0.0057  0.0128  0.0142
-0.0287 -0.0287 -0.0099  0.0323 -0.0099 -0.0287
-0.0113 -0.0113  0.0820  0.0059  0.0820 -0.0113
 0.0327  0.0327 -0.0279  0.0197 -0.0279  0.0327
-0.0042 -0.0042 -0.0535 -0.0364 -0.0535 -0.0042
-0.0231 -0.0231  0.0801 -0.0238  0.0801 -0.0231

Thus, my requirement is to have the dataframe as per the composition of my portfolio. Thus, if there are only 2 transactions, i.e. if my portfolio contains say only CHF and AUD, I need the return calculations done only for CHF and AUD.
    CHF     AUD
 0.0007  0.0663
-0.0325 -0.1141
-0.0139 -0.0638
-0.0057  0.0029
 0.0323 -0.0150
 0.0059  0.2112
 0.0197 -0.0934
-0.0364 -0.0984
-0.0238  0.1614

I once again apologize for not asking my requirement properly, thereby causing not only inconvenience to all of you, but also wasting your valuable time. It's not that I wasn't careful while asking for guidance for my requirement; I wasn't clear about it. I am sorry for the same once again.

I request you to please help me.

Amelia Vettori
Re: [R] Match numeric vector against rows in a matrix?
Try this:

which(colSums(t(combs) == x) == ncol(combs))

On Wed, Jan 5, 2011 at 5:16 PM, Kevin Ummel kevinum...@gmail.com wrote:

Two posts in one day is not a good day...and this question seems like it should have an obvious answer:

I have a matrix where rows are unique combinations of 1's and 0's:

> combs = as.matrix(expand.grid(c(0,1), c(0,1)))
> combs
     Var1 Var2
[1,]    0    0
[2,]    1    0
[3,]    0    1
[4,]    1    1

I want a single function that will give the row index containing an exact match with vector x:

> x = c(0,1)

The solution needs to be applied many times, so I need something quick -- I was hoping a base function would do it, but I'm drawing a blank.

Thanks!
Kevin

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
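A short sketch of why the one-liner works: each column of t(combs) is one row of combs, and `== x` recycles x down each column, so a column whose sum of matches equals ncol(combs) agrees with x in every position.

```r
combs <- as.matrix(expand.grid(c(0, 1), c(0, 1)))
x <- c(0, 1)
# columns of t(combs) are rows of combs; '== x' recycles x down each column
matches <- colSums(t(combs) == x) == ncol(combs)
which(matches)  # 3, i.e. the row c(0, 1)
```

For repeated lookups against the same matrix, an alternative is to precompute row keys once, e.g. match(paste(x, collapse = ","), apply(combs, 1, paste, collapse = ",")).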
[R] Odp: Currency return calculations
Hi

What is wrong with my suggestion then?

> rate1
      USD    GBP   EURO   CHF    AUD
1  112.05 171.52 42.71 41.50 109.55
2  112.90 168.27 42.68 41.47 102.52
3  110.85 169.03 41.86 42.84 114.91
4  109.63 169.64 44.71 43.44 122.48
5  108.08 169.29 44.14 43.69 122.12
6  111.23 169.47 44.58 42.30 123.96
7  112.49 170.90 41.07 42.05 100.36
8  108.87 168.69 42.23 41.23 110.19
9  109.33 170.90 44.55 42.76 121.58
10 111.88 169.96 41.12 43.79 103.46

portfolio <- c("USD", "USD", "CHF", "AUD", "USD")
log(rate1[-1, portfolio] / rate1[-nrow(rate1), portfolio])

            USD        USD.1          CHF          AUD        USD.2
2   0.007557271  0.007557271 -0.000723153 -0.066323165  0.007557271
3  -0.018324535 -0.018324535  0.032501971  0.114091312 -0.018324535
4  -0.011066876 -0.011066876  0.013908430  0.063798538 -0.011066876
5  -0.014239366 -0.014239366  0.005738567 -0.002943583 -0.014239366
6   0.028728436  0.028728436 -0.032332157  0.014954765  0.028728436
7   0.011264199  0.011264199 -0.005927700 -0.211195211  0.011264199
8  -0.032709819 -0.032709819 -0.019693240  0.093442427 -0.032709819
9   0.004216322  0.004216322  0.036436939  0.098366334  0.004216322
10  0.023056037  0.023056037  0.023802395 -0.161387418  0.023056037

As I said, instead of fiddling with several loop/if/function/variable attempts, it seems to me better to use powerful R indexing and the whole-object approach where it is possible.

Regards
Petr

Amelia Vettori amelia_vett...@yahoo.co.nz napsal dne 07.01.2011 13:46:53:

Dear sir, I am extremely sorry for messing up the logic asking for help w.r.t. my earlier mails. I have tried to explain below what I am looking for. I have a database (say, currency_rates) storing datewise currency exchange rates with some base currency XYZ.
currency_rates <- data.frame(
  date = c("12/31/2010", "12/30/2010", "12/29/2010", "12/28/2010", "12/27/2010",
           "12/24/2010", "12/23/2010", "12/22/2010", "12/21/2010", "12/20/2010"),
  USD  = c(112.05, 112.9, 110.85, 109.63, 108.08, 111.23, 112.49, 108.87, 109.33, 111.88),
  GBP  = c(171.52, 168.27, 169.03, 169.64, 169.29, 169.47, 170.9, 168.69, 170.9, 169.96),
  EURO = c(42.71, 42.68, 41.86, 44.71, 44.14, 44.58, 41.07, 42.23, 44.55, 41.12),
  CHF  = c(41.5, 41.47, 42.84, 43.44, 43.69, 42.3, 42.05, 41.23, 42.76, 43.79),
  AUD  = c(109.55, 102.52, 114.91, 122.48, 122.12, 123.96, 100.36, 110.19, 121.58, 103.46))

I have a portfolio consisting of some of these currencies. At this moment, suppose my portfolio has the following currency transactions:

portfolio_currency_names = c("USD", "USD", "EURO", "CHF", "EURO", "USD")

# My objective is AS PER THE PORTFOLIO, I need to generate a data.frame giving respective currency returns. Thus, I need to have an output like

    USD     USD    EURO     CHF    EURO     USD
-0.0076 -0.0076  0.0007  0.0007  0.0007 -0.0076
 0.0183  0.0183  0.0194 -0.0325  0.0194  0.0183
 0.0111  0.0111 -0.0659 -0.0139 -0.0659  0.0111
 0.0142  0.0142  0.0128 -0.0057  0.0128  0.0142
-0.0287 -0.0287 -0.0099  0.0323 -0.0099 -0.0287
-0.0113 -0.0113  0.0820  0.0059  0.0820 -0.0113
 0.0327  0.0327 -0.0279  0.0197 -0.0279  0.0327
-0.0042 -0.0042 -0.0535 -0.0364 -0.0535 -0.0042
-0.0231 -0.0231  0.0801 -0.0238  0.0801 -0.0231

Thus, my requirement is to have the dataframe as per the composition of my portfolio. Thus, if there are only 2 transactions, i.e. if my portfolio contains say only CHF and AUD, I need the return calculations done only for CHF and AUD.

    CHF     AUD
 0.0007  0.0663
-0.0325 -0.1141
-0.0139 -0.0638
-0.0057  0.0029
 0.0323 -0.0150
 0.0059  0.2112
 0.0197 -0.0934
-0.0364 -0.0984
-0.0238  0.1614

I once again apologize for not asking my requirement properly, thereby causing not only inconvenience to all of you, but also wasting your valuable time.
It's not that I wasn't careful while asking for guidance for my requirement; I wasn't clear about it. I am sorry for the same once again. I request you to please help me.

Amelia Vettori
Re: [R] How to make a Cluster of Clusters
Hi Michael,

I agree with you and I will make this ordination. But I also want to check for spatial correlation of the variables, so I thought that comparing the dendrogram of the environmental variables with the dendrogram of the geographical distances of the lakes would indicate whether similar lakes are next to each other. But I have just one geographical coordinate for each lake, and 12 measures of environmental variables. How can I analyse this?

Thank you very much for the attention

Diego PJ
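One standard way to test this kind of question — whether environmentally similar lakes are also geographically close — is a Mantel test on the two distance matrices rather than a comparison of dendrograms. A sketch with the vegan package, using made-up data (20 hypothetical lakes, 12 environmental variables as columns, one coordinate pair per lake):

```r
library(vegan)

set.seed(1)
n_lakes <- 20
env    <- matrix(rnorm(n_lakes * 12), ncol = 12)   # 12 environmental measures per lake
coords <- cbind(runif(n_lakes), runif(n_lakes))    # one x/y coordinate per lake

env_d <- vegdist(scale(env), method = "euclidean") # environmental dissimilarity
geo_d <- dist(coords)                              # geographic distance

mantel(env_d, geo_d)  # permutation test of correlation between the two matrices
```

If the 12 measures are repeated observations of the same variables over time, they can first be averaged per lake before building env_d; the Mantel statistic then directly answers "are similar lakes near each other?" with a permutation p-value.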
Re: [R] Cross validation for Ordinary Kriging
On 1/7/2011 12:40 PM, Jon Olav Skoien wrote:

Pearl,

The error suggests that there is something wrong with x2, and that there is a difference between the row names of the coordinates and the data. If you call str(x2), see if the first element of @coords is different from NULL, as this can cause some problems when cross-validating. If it is, try to figure out why. You can also set the row.names equal to NULL directly:

row.names(x2@coords) = NULL

although I don't think such manipulation of the slots of an object is usually recommended.

Pearl,

It seems the problem was caused by a recent change in sp without updating gstat; the maintainer has fixed it and submitted a new version of gstat to CRAN. So you should be able to use your original script after downloading the new version, probably available in a couple of days. In the meantime the suggestion above should still work.

Cheers,
Jon

BTW, you will usually get more response to questions about spatial data handling using the list r-sig-geo (https://stat.ethz.ch/mailman/listinfo/r-sig-geo)

On 1/6/2011 4:00 PM, pearl may dela cruz wrote:

Dear ALL,

The last part of my thesis analysis is the cross validation. Right now I am having difficulty using the cross validation of gstat. Below are my commands with the tsport_ace as the variable:

nfold <- 3
part <- sample(1:nfold, 69, replace = TRUE)
sel <- (part != 1)
m.model <- x2[sel, ]
m.valid <- x2[-sel, ]
t <- fit.variogram(v, vgm(0.0437, "Exp", 26, 0))
cv69 <- krige.cv(tsport_ace ~ 1, x2, t, nfold = nrow(x2))

The last line gives an error saying:

Error in SpatialPointsDataFrame(coordinates(data), data.frame(matrix(as.numeric(NA), :
  row.names of data and coords do not match

I don't know what is wrong. The x2 data is a SpatialPointsDataFrame, which is why I did not specify the location (as it will take it from the data).
Here is the usage of the function krige.cv:

krige.cv(formula, locations, data, model = NULL, beta = NULL, nmax = Inf,
         nmin = 0, maxdist = Inf, nfold = nrow(data), verbose = TRUE, ...)

I hope you can help me on this. Thanks a lot.

Best regards,
Pearl
Re: [R] Dont show zero values in line graph
On 07.01.2011 04:10, LCOG1 wrote:

Hey everyone,

I'm getting better at plotting my data but can't for the life of me figure out how to show a line graph with missing data that doesn't continue the line down to zero and then back up to the remaining values. Consider the following:

x <- c(1:5, 0, 0, 8:10)
y <- 1:10
plot(0, 0, xlim=c(0,10), ylim=c(0,10), type="n",
     main="Dont show the bloody 0 values!!")
lines(x ~ y, col="blue", lwd=2)

My data is missing the 6th and 7th values and they come in as NA's, so I change them to 0s, but then the plot has these ugly lines that dive toward the x axis and back up. I would do bar plots, but I need to show multiple sets of data on the same plot, and side-by-side bars don't do it for me. So I need a line graph that starts and stops where 0s or missing values exist. Thoughts?

If I understand correctly what you are going to do: just do not change the NAs to zero in advance. NAs are not plotted.

Uwe Ligges

JR
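Uwe's point can be seen directly with the poster's own data: leaving the missing values as NA makes lines() break the line at those positions instead of dipping to zero (a minimal sketch).

```r
x <- c(1:5, NA, NA, 8:10)  # keep the missing values as NA, not 0
y <- 1:10
plot(y, x, xlim = c(0, 10), ylim = c(0, 10), type = "n",
     main = "NAs leave a gap")
lines(y, x, col = "blue", lwd = 2)  # the line is interrupted at positions 6 and 7
```

If zeros are already in the data, convert them back first, e.g. x[x == 0] <- NA (assuming 0 is never a legitimate value in the series).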
Re: [R] Prediction error for Ordinary Kriging
Pearl,

You find the prediction error as the var1.var column in your result object, i.e., y in your script. For plotting:

spplot(y, 2)

or

spplot(y, "var1.var")

Jon

On 1/5/2011 9:28 PM, pearl may dela cruz wrote:

Hi ALL,

Can you please help me on how to determine the prediction error for ordinary kriging? Below are all the commands I used to generate the OK plot:

rsa2 <- readShapeSpatial("residentialsa", CRS("+proj=tmerc +lat_0=39.66 +lon_0=-8.1319062 +k=1 +x_0=0 +y_0=0 +ellps=intl +units=m +no_defs"))
x2 <- readShapeSpatial("ptna2", CRS("+proj=tmerc +lat_0=39.66 +lon_0=-8.1319062 +k=1 +x_0=0 +y_0=0 +ellps=intl +units=m +no_defs"))
bb <- bbox(rsa2)
cs <- c(1, 1)
cc <- bb[, 1] + (cs/2)
cd <- ceiling(diff(t(bb))/cs)
rsa2_grd <- GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd)
getClass("SpatialGrid")
p4s <- CRS(proj4string(rsa2))
x2_SG <- SpatialGrid(rsa2_grd, proj4string = p4s)
x2_SP <- SpatialPoints(cbind(x2$X, x2$Y))
v <- variogram(log1p(tsport_ace) ~ 1, x2, cutoff=100, width=9)
ve.fit <- fit.variogram(v, vgm(0.0437, "Exp", 26, 0))
y <- krige(tsport_ace ~ 1, x2, x2_SG, model = ve.fit)
spplot(y, 1, col.regions = bpy.colors(100),
       sp.layout = list("sp.lines", as(rsa2, "SpatialLines"), no.clip = TRUE))

I'm looking forward to your response. Thanks.

Best regards,
Pearl dela Cruz
Re: [R] Cairo pdf canvas size
Thanks!

On Thu, Jan 6, 2011 at 7:13 PM, Dennis Murphy djmu...@gmail.com wrote:

Hi:

On Thu, Jan 6, 2011 at 5:36 AM, Eduardo de Oliveira Horta eduardo.oliveiraho...@gmail.com wrote:

Peter, thank you, that's what I was looking for!

David, I forgot to tell you my OS. Sorry... it's Win7. I'm running an RKWard session. And this is strange:

Cairo("example.pdf", type="pdf", width=12, height=12, units="cm", dpi=300)
Error: could not find function "Cairo"

... maybe you're not using the Cairo package? http://cran.r-project.org/web/packages/Cairo/Cairo.pdf

And Dennis, thanks for the code. It worked, and I'm considering adopting data frames in the near future. By the way, I'm working with functional time series, so each observation is a function (or a vector representing that function evaluated on a grid) indexed by time. Any insights on how to implement data frames here?

I don't see a real issue. It would be easier to give you concrete information if there were an artificial example that mimics your situation, but it's not that hard. I'd suggest looking into the zoo package to create a series - it can handle both regular (zooreg()) and irregular (zoo()) series. Basically, a zoo object is a numeric vector with a time index. One can create multiple series with a single index, individual series with different indices that can be combined into data frames, etc. I've browsed through some of the code that accompanies Ramsey, Hooker and Graves' FDA book in R and Matlab, and occasionally they use the zoo package as well. Here's an example, but I expect that someone will show how to convert the zoo series to data frames much more efficiently for use in ggplot2...
library(zoo)
library(ggplot2)
library(lattice)

# Generate three daily series with different start times and lengths
a <- zoo(rnorm(450), as.Date("2005-01-01") + 0:449)
b <- zoo(rnorm(600, 1, 2), as.Date("2005-06-01") + 0:599)
d <- zoo(rnorm(300, 2, 1), as.Date("2004-09-01") + 0:299)

# Convert to data frame, make time index a variable and make sure it's a Date object
A <- as.data.frame(a)
B <- as.data.frame(b)
D <- as.data.frame(d)
A$Date <- as.Date(rownames(A))
B$Date <- as.Date(rownames(B))
D$Date <- as.Date(rownames(D))

# Give all three series the same name
names(A)[1] <- names(B)[1] <- names(D)[1] <- 'y'

# Stack the three data frames and create a series ID variable
comb <- rbind(A, B, D)
comb$Series <- rep(c('A', 'B', 'D'), c(nrow(A), nrow(B), nrow(D)))
str(comb)  # make sure that Date is a Date object

# ggplot of the three series
ggplot(comb, aes(x = Date, y = y, color = Series)) + geom_path()

# Stacked individual plots (faceted)
last_plot() + facet_grid(Series ~ .)

# lattice version
xyplot(y ~ Date, data = comb, groups = Series, type = 'l', col.line = 1:3)

# Stacked individual series
xyplot(y ~ Date | Series, data = comb, type = 'l', layout = c(1, 3))

If you need the grid coordinates, use expand.grid() - it can be used when creating a data frame, too. As Bert noted the other night in another thread, one can use xyplot directly on zoo objects, but I don't have any direct experience with that yet so will defer to others if they wish to contribute. ?xyplot.zoo provides some examples.
Hope this gives you some idea of what can be done,
Dennis

Best regards,
Eduardo

On Thu, Jan 6, 2011 at 1:47 AM, Peter Langfelder peter.langfel...@gmail.com wrote:

On Wed, Jan 5, 2011 at 7:35 PM, Eduardo de Oliveira Horta eduardo.oliveiraho...@gmail.com wrote:

Something like this:

u = seq(from=-pi, to=pi, length=1000)
f = sin(u)
Cairo("example.pdf", type="pdf", width=12, height=12, units="cm", dpi=300)
par(cex.axis=.6, col.axis="grey", ann=FALSE, lwd=.25, bty="n", las=1,
    tcl=-.2, mgp=c(3,.5,0))
xlim = c(-pi, pi)
ylim = round(c(min(f), max(f)))
plot(u, f, xlim, ylim, type="l", col="firebrick3", axes=FALSE)
axis(side=1, lwd=.25, col="darkgrey", at=seq(from=xlim[1], to=xlim[2], length=5))
axis(side=2, lwd=.25, col="darkgrey", at=seq(from=ylim[1], to=ylim[2], length=5))
abline(v=seq(from=xlim[1], to=xlim[2], length=5), lwd=.25, lty="dotted", col="grey")
abline(h=seq(from=ylim[1], to=ylim[2], length=5), lwd=.25, lty="dotted", col="grey")
dev.off()

Wow, you must like light colors :)

To the point, just set margins, for example

par(mar = c(2, 2, 0.5, 0.5))

(margins are bottom, left, top, right) after the Cairo command.

BTW, Cairo doesn't work for me either... but I tried your example by plotting to the screen.

Peter

Notice how the canvas' margins are relatively far from the plotting area.

Thanks, Eduardo

On Thu, Jan 6, 2011 at 1:00 AM, David Winsemius dwinsem...@comcast.net wrote:

On Jan 5, 2011, at 9:38 PM, Eduardo de Oliveira Horta wrote:

Hello,

I want to save a pdf plot using Cairo, but the canvas of the saved file seems too large when compared to the actual plotted area. Is there a way to control the relation between the canvas size and the size of the actual plotting area?
Re: [R] Parsing JSON records to a dataframe
On 01/07/2011 12:05 AM, Dieter Menne wrote: Jeroen Ooms wrote: What is the most efficient method of parsing a dataframe-like structure that has been JSON encoded in record-based format rather than vector-based? For example, a structure like this: [ {"name": "joe", "gender": "male", "age": 41}, {"name": "anna", "gender": "female", "age": 23} ] RJSONIO parses this as a list of lists, which I would then have to apply as.data.frame to and append to an existing dataframe, which is terribly slow. unlist is pretty fast. The solution below assumes that you know how your structure is, so it is not very flexible, but it should show you that the conversion to data.frame is not the bottleneck.

# json
library(RJSONIO)
# [ {"name": "joe", "gender": "male", "age": 41},
#   {"name": "anna", "gender": "female", "age": 23} ]
n <- 30
d <- data.frame(name = rep(c("joe", "anna"), n),
                gender = rep(c("male", "female"), n),
                age = rep(c(23, 41), n))
dj <- toJSON(d)

This doesn't create the required structure:

cat(dj)
{ "name": [ "joe", "anna", "joe", "anna" ],
  "gender": [ "male", "female", "male", "female" ],
  "age": [ 23, 41, 23, 41 ] }

instead

library(rjson)
n <- 1000
name <- apply(matrix(sample(letters, n * 5, TRUE), n), 1, paste, collapse = "")
gender <- sample(c("male", "female"), n, TRUE)
age <- ceiling(runif(n, 20, 60))
recs <- sprintf('{"name": "%s", "gender": "%s", "age": %d}', name, gender, age)
j <- sprintf("[%s]", paste(recs, collapse = ","))
lol <- fromJSON(j)

and then with

f <- function(lst) function(nm) unlist(lapply(lst, "[[", nm), use.names = FALSE)
oopt <- options(stringsAsFactors = FALSE)  # convenience for 'identical'
system.time({
+   df0 <- as.data.frame(Map(f(lol), names(lol[[1]])))
+ })
   user  system elapsed
  0.006   0.000   0.006

versus for instance

system.time({
+   df1 <- do.call(rbind, lapply(lol, data.frame))
+ })
   user  system elapsed
  1.497   0.000   1.500

identical(df0, df1)
[1] TRUE

Martin

system.time(d1 <- fromJSON(dj))
#  user  system elapsed
#  4.06    0.26    4.32
system.time(
  dd <- data.frame(name = unlist(d1$name),
                   gender = unlist(d1$gender),
                   age = as.numeric(unlist(d1$age)))
)
#  user  system elapsed
#  1.13    0.05    1.18
-- 
Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I wholeheartedly agree with the trend towards publishing datasets. One way to do that is as datasets in an R package contributed to CRAN. Beyond this, there seems to be an increasing trend towards journals requiring authors of scientific research to publish their data as well. The Public Library of Science (PLOS) has such a policy, but it is not enforced: Savage and Vickers (2010) were able to get the raw data behind only one of ten published articles they tried, and that one came only after reminding the author that s/he had agreed to making the data available as a condition of publishing in PLOS. (Four other authors refused to share their data in spite of their legal and moral commitment to do so as a condition of publishing in PLOS.) There are other venues for publishing data. For example, much astronomical data is now routinely web published so anyone interested can test their pet algorithm on real data (http://sites.google.com/site/vousergroup/presentations/publishing-astronomical-data). Regarding my earlier comment, I just found a Wikipedia article on scientific misconduct that mentioned the tendency to refuse to publish research that proves your new drug is positively harmful. This is an extreme version of both types of bias I previously mentioned: (1) only significant results get published. (2) private funding provides its own biases. Spencer # Savage and Vickers (2010) Empirical Study Of Data Sharing By Authors Publishing In PLoS Journals, Scientific Data Sharing, added Apr. 26, 2010 (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2 http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2/). 
On 1/7/2011 4:08 AM, Mike Marchywka wrote: Date: Thu, 6 Jan 2011 23:06:44 -0800 From: peter.langfel...@gmail.com To: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. You see this everywhere in one form or another from medical to financial modelling. My solution here is simply to publish more raw data in a computer readable form, in this case of course something easy to get with R, so disinterested or adversarial parties can run their own analysis. I think there was also a push to create a data base for failed drug trials that may contain data of some value later. The value of R with easily available data for a large cross section of users could be to moderate problems like the one cited here. I almost slammed a poster here earlier who wanted a simple rule for when do I use this test with something like when your mom tells you to since post hoc you do just about everything to assume you messed up and missed something but a priori you hope you have designed a good hypothesis. And at the end of the day, a given p-value is one piece of evidence in the overall objective of learning about some system, not appeasing a sponsor. Personally I'm a big fan of post hoc analysis on biotech data in some cases, especially as more pathway or other theory is published, but it is easy to become deluded if you have a conclusion that you know JUST HAS TO BE RIGHT. 
Also FWIW, in the few cases I've examined with FDA-sponsor rhetoric, the data I've been able to get tends to make me side with the FDA and I still hate the idea of any regulation or access restrictions but it seems to be the only way to keep sponsors honest to any extent. Your mileage may vary however, take a look at some rather loud disagreement with FDA over earlier DNDN panel results, possibly involving threats against critics. LOL. That being said, it would seem that biomedical sciences do make progress, so some of the published results are presumably correct :) Peter On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves wrote: Part of the phenomenon can be explained by the natural censorship in what is accepted for publication: Stronger results tend to have less difficulty getting published. Therefore, given that a result is published, it is evident that the estimated magnitude of the effect is in average larger than it is in reality, just by the fact that weaker results are less likely to be published. A study of the literature on this subject might yield an interesting and valuable estimate of the magnitude of this selection bias. A more insidious problem, that may not
Re: [R] Different LLRs on multinomial logit models in R and SPSS
On Thu, 6 Jan 2011, David Winsemius wrote: On Jan 6, 2011, at 11:23 AM, Sören Vogel wrote: Thanks for your replies. I am no mathematician or statistician by far; however, it appears to me that the actual value of either of the two LLs is indeed important when it comes to the calculation of pseudo-R-squareds. If Rnagel divides by (some transformation of) the actual value of llnull, then any calculation of Rnagel should differ. How come? Or is my function wrong? And if my function is right, how can I calculate an R-squared independent of the software used? You have two models in that function, the null one with . ~ 1 and the original one, and you are getting a ratio on the likelihood scale (which is a difference on the log-likelihood or deviance scale). If this is the case, calculating 'fit' indices for those models must end up in different fit indices depending on software:

n <- 143
ll1 <- 135.02
ll2 <- 129.8
# Rcs
(Rcs <- 1 - exp( (ll2 - ll1) / n ))
# Rnagel
Rcs / (1 - exp(-ll1/n))

ll3 <- 204.2904
ll4 <- 199.0659
# Rcs
(Rcs <- 1 - exp( (ll4 - ll3) / n ))
# Rnagel
Rcs / (1 - exp(-ll3/n))

The Rcs values are equal; however, the Rnagel values are not. Of course, this is no question, but I am rather confused. When publishing results I am required to use fit indices, and editors would complain that they differ. Sören
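The arithmetic behind Sören's observation can be sketched directly: Cox-Snell depends only on the *difference* of the two log-likelihoods, while Nagelkerke's rescaling also uses the absolute value of the null log-likelihood, which changes when different software drops a different additive constant. A minimal sketch, not code from this thread (pseudoR2 is a made-up helper; log-likelihoods are taken on their natural negative scale):

```r
# Sketch: Cox-Snell R^2 is invariant to an additive shift of both
# log-likelihoods; Nagelkerke's rescaled version is not.
pseudoR2 <- function(llnull, llmodel, n) {
  rcs  <- 1 - exp(2 * (llnull - llmodel) / n)  # Cox-Snell R^2
  rmax <- 1 - exp(2 * llnull / n)              # its maximum attainable value
  c(CoxSnell = rcs, Nagelkerke = rcs / rmax)
}

n <- 143
pseudoR2(-135.02, -129.8, n)            # constants as reported by one program
pseudoR2(-135.02 - 50, -129.8 - 50, n)  # same difference, shifted by a constant
```

Both calls return the same Cox-Snell value but different Nagelkerke values, which is exactly the discrepancy described above.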
[R] Problems with glht function for lme object
Dear all, I'm trying to make multiple comparisons for an lme object. The data is from an experiment on parental work load in birds, in which adults at different sites were induced to work at one of three levels ('treat'; H, M, L). The response is 'feedings', which is a quantitative measure of nest provisioning per parent per chick per hour. Site is included as a random effect (one male/female pair per site). My final model takes the following form:

feedings ~ treat + year + date^2, random = ~1|site, data = feed.df

For this model, I would like to do multiple comparisons on 'treat', using the multcomp package:

summary(glht(m4.feed, linfct = mcp(treat = "Tukey")))

However, this does not work, and I get the error message below.

Error in if (is.null(pkg) | pkg == "nlme") terms(formula(x)) else slot(x, :
  argument is of length zero
Error in factor_contrasts(model) : no 'model.matrix' method for 'model' found!

I suspect this might have quite a straightforward solution, but I'm stuck at this point. Any help would be most appreciated. Sample data below. Kind regards, Andreas Nord Sweden
==
feedings sex site treat year date^2
1.8877888 M 838 H 2009 81
1.9102787 M 247 H 2009 81
1.4647229 M 674 H 2010 121
1.4160590 M 7009 M 2009 144
1.3106749 M 863 M 2010 196
1.2718121 M 61 M 2009 225
1.2799263 M 729 L 2009 256
1.5829564 M 629 L 2009 256
1.4847251 M 299 L 2010 324
1.2463151 M 569 L 2010 324
2.1694169 F 838 H 2009 81
1.5966899 F 247 H 2009 81
2.4136983 F 674 H 2010 121
1.7784873 F 7009 M 2009 144
1.6681317 F 863 M 2010 196
2.3691275 F 61 M 2009 225
2.0672192 F 729 L 2009 256
1.6389902 F 629 L 2009 256
0.9307536 F 299 L 2010 324
1.6786767 F 569 L 2010 324
==
-- View this message in context: http://r.789695.n4.nabble.com/Problems-with-glht-function-for-lme-object-tp3179128p3179128.html Sent from the R help mailing list archive at Nabble.com. 
Re: [R] Accessing data via url
Henrique Dallazuanna wrote: With the ssl.verifypeer = FALSE argument it works:

x <- getBinaryURL(dd, ssl.verifypeer = FALSE)

Thanks, good to know. It's only in the examples of ..., but it looks like a parameter important enough to be included in the docs of getBinaryURL. Digging through curl docs can be daunting. Dieter -- View this message in context: http://r.789695.n4.nabble.com/Accessing-data-via-url-tp3178094p3179197.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] defining a formula method for a weighted lm()
Thanks, Martin. Now I understand the 'standard non-standard evaluation' magic, and the code in http://developer.r-project.org/model-fitting-functions.txt explains how this works. Still, I can't help but think of this as evil magic, for which some anti-magic would be extremely useful, so that a simple function like

my.lm <- function(formula, data, subset, weights, ...) {
  lm(formula, data, subset, weights, ...)
}

would work as expected. Oh dear, I think I need some syntactic sugar in my coffee! -Michael On 1/6/2011 2:16 PM, Martin Maechler wrote: Michael Friendly frien...@yorku.ca on Thu, 06 Jan 2011 09:33:25 -0500 writes: No one replied to this, so I'll try again, with a simple example. I calculate a set of log odds ratios and turn them into a data frame as follows:

library(vcdExtra)
(lor.CM <- loddsratio(CoalMiners))
log odds ratios for Wheeze and Breathlessness by Age
   25-29    30-34    35-39    40-44    45-49    50-54    55-59    60-64
3.695261 3.398339 3.140658 3.014687 2.782049 2.926395 2.440571 2.637954

(lor.CM.df <- as.data.frame(lor.CM))
  Wheeze Breathlessness   Age      LOR        ASE
1  W:NoW          B:NoB 25-29 3.695261 0.16471778
2  W:NoW          B:NoB 30-34 3.398339 0.07733658
3  W:NoW          B:NoB 35-39 3.140658 0.03341311
4  W:NoW          B:NoB 40-44 3.014687 0.02866111
5  W:NoW          B:NoB 45-49 2.782049 0.01875164
6  W:NoW          B:NoB 50-54 2.926395 0.01585918
7  W:NoW          B:NoB 55-59 2.440571 0.01452057
8  W:NoW          B:NoB 60-64 2.637954 0.02159903

Now I want to fit a linear model by WLS, LOR ~ Age, which I can do directly as

lm(LOR ~ as.numeric(Age), weights = 1/ASE, data = lor.CM.df)

Call:
lm(formula = LOR ~ as.numeric(Age), data = lor.CM.df, weights = 1/ASE)

Coefficients:
    (Intercept)  as.numeric(Age)
         3.5850          -0.1376

But I want to do the fitting in my own function; the simplest version is

my.lm <- function(formula, data, subset, weights) {
  lm(formula, data, subset, weights)
}

But there is obviously some magic about formula objects and evaluation environments, because I don't understand why this doesn't work. 
my.lm(LOR ~ as.numeric(Age), weights = 1/ASE, data = lor.CM.df)
Error in model.frame.default(formula = formula, data = data, subset = subset, :
  invalid type (closure) for variable '(weights)'

Yes, the magic has been called "standard non-standard evaluation" for a while (since August 2002, to be precise), and the http://developer.r-project.org/ web page has had two very relevant links since then, namely those mentioned in the following two lines there: # Description of the nonstandard evaluation rules in R 1.5.1 and some suggestions (updated). Also an R function and documentation for making model frames from multiple formulas. # Notes on model-fitting functions in R, and especially on how to enable all the safety features. For what you want, I think (but haven't tried) the second link, which is http://developer.r-project.org/model-fitting-functions.txt, is still very relevant. Many, many people (package authors) have had to use something like that, or just directly take the lm function as an example {{ but then probably failed on the more subtle points of how to program residuals(), predict(), etc., which you can also learn from model-fitting-functions.txt }} A second question: Age is a factor, and as.numeric(Age) gives me 1:8. What simple expression on lor.CM.df$Age would give me either the lower limits (here: seq(25, 60, by = 5)) or the midpoints of these Age intervals (here: seq(27, 62, by = 5))? 
With data(CoalMiners, package = "vcd") here are some variations:

(Astr <- dimnames(CoalMiners)[[3]])
[1] "25-29" "30-34" "35-39" "40-44" "45-49" "50-54" "55-59" "60-64"
sapply(lapply(strsplit(Astr, "-"), as.numeric), `[[`, 1)
[1] 25 30 35 40 45 50 55 60
sapply(lapply(strsplit(Astr, "-"), as.numeric), `[[`, 2)
[1] 29 34 39 44 49 54 59 64
sapply(lapply(strsplit(Astr, "-"), as.numeric), mean)
[1] 27 32 37 42 47 52 57 62

Or use the 2-row matrix and apply(*, 1) to that:

sapply(strsplit(Astr, "-"), as.numeric)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]   25   30   35   40   45   50   55   60
[2,]   29   34   39   44   49   54   59   64

Regards, Martin Maechler, ETH Zurich -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
Re: [R] Problem with timeSequence {timeDate} - wrong end date
Joshua Wiley jwiley.ps...@gmail.com on Thu, 6 Jan 2011 06:47:43 -0800 writes: On Thu, Jan 6, 2011 at 5:27 AM, Joshua Wiley jwiley.ps...@gmail.com wrote: timeSequence() ultimately is relying on seq.POSIXt(). If you look at My apologies, I spoke nonsense --- timeSequence() does NOT rely on seq.POSIXt(). The timeDate package has its own method defined for seq(), which is what gets dispatched. Still, the behavior is similar:

timeSequence(from = "2010-01-15", to = "2010-02-01", by = "1 month")
GMT
[1] [2010-01-15] [2010-02-15]

seq.POSIXt(from = as.POSIXct("2010-01-15"),
+          to = as.POSIXct("2010-02-01"), by = "1 month")
[1] "2010-01-15 PST" "2010-02-15 PST"

which is not surprising, because the code responsible for this behavior is similar:

## seq.timeDate ##
else if (valid == 6) {
    if (missing(to)) {
        mon <- seq.int(r1$mon, by = by, length.out = length.out)
    } else {
        to <- as.POSIXlt(to)
        mon <- seq.int(r1$mon, 12 * (to$year - r1$year) + to$mon, by)
    }
    r1$mon <- mon
    r1$isdst <- -1
    res <- as.POSIXct(r1)
}

## seq.POSIXt ##
else if (valid == 6L) {
    if (missing(to)) {
        mon <- seq.int(r1$mon, by = by, length.out = length.out)
    } else {
        to <- as.POSIXlt(to)
        mon <- seq.int(r1$mon, 12 * (to$year - r1$year) + to$mon, by)
    }
    r1$mon <- mon
    r1$isdst <- -1
    res <- as.POSIXct(r1)
}

and res is the object returned in both cases (I believe). Thank you, Joshua. Note that R 2.12.1 patched and R 2.13.0 unstable have this fixed (since yesterday) and hence will no longer overshoot. Also, the next release of the timeDate package will have it fixed. 
Martin Maechler, ETH Zurich

My system:
R version 2.12.1 (2010-12-16)
Platform: i486-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] timeDate_2130.91
[R] POSIXct issue
Hello, I have trouble getting my original datetime back; see below. I hope I am missing something. Please help.

tt <- as.POSIXct("2011-01-07 07:49:13", tz = "EST")
tt
[1] "2011-01-07 07:49:13 EST"
ttn <- as.numeric(tt)
ttn
[1] 1294404553
tt <- as.POSIXct(ttn, origin = '1970-01-01', tz = "EST")
tt
[1] "2011-01-07 12:49:13 EST"
[R] Contour plot with time on X-axis (day) and Y-axis (hours)
Hello all, I'd like to graphically represent an hourly temperature time series ( http://r.789695.n4.nabble.com/file/n3179098/data.csv data.csv , and see below for pre-processing of the data) with the R functions image + contour. To do that I wrote this script ( http://r.789695.n4.nabble.com/file/n3179098/Timetemp.r Timetemp.r ) which basically: - creates the x-axis, a vector which will contain the days, - creates the y-axis, a vector which will contain the hours of the day, - creates the z matrix, which has x columns and y rows. This works fine, except that I have some trouble with the time format: I can't set a date (day/month/year) on the x-axis of the resulting graph. Does anyone know where the problem is? Thanks! xav

Pre-processing of data.csv:

data <- na.omit(read.csv("data.csv", header = TRUE, sep = ",", dec = "."))
data <- data[, -1]
data[[2]] <- strptime(data[[2]], "%m/%d/%y %I:%M:%S %p", tz = "America/Santiago")

-- View this message in context: http://r.789695.n4.nabble.com/Contour-plot-with-time-on-X-axis-day-and-Y-axis-hours-tp3179098p3179098.html Sent from the R help mailing list archive at Nabble.com.
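No answer to this question appears in this digest, but one common approach is to draw the image with the x-axis suppressed and then add a date axis manually with axis.Date(), since image()/contour() treat x as plain numbers. A sketch under made-up data (the days/hours/z objects below are illustrative stand-ins for the poster's real matrix, not from the thread):

```r
# Sketch: calendar dates on the x-axis of image() + contour().
# 'z' is an hours-by-days temperature matrix; both it and 'days' are invented here.
days  <- seq(as.Date("2011-01-01"), by = "day", length.out = 30)
hours <- 0:23
z <- outer(hours, seq_along(days),
           function(h, d) 15 + 8 * sin((h - 6) * pi / 12) + 0.1 * d)

# image(x, y, z) wants dim(z) == c(length(x), length(y)), hence t(z)
image(as.numeric(days), hours, t(z), axes = FALSE,
      xlab = "Day", ylab = "Hour of day")
contour(as.numeric(days), hours, t(z), add = TRUE)
# x-coordinates are the numeric form of the Date vector, so axis.Date lines up
axis.Date(1, x = days, at = seq(min(days), max(days), by = "week"),
          format = "%d/%m/%y")
axis(2)
box()
```

The key point is that the numeric values used for the x-coordinates and the Date vector handed to axis.Date() must be the same numbers (days since 1970-01-01), so the labels land in the right places.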
Re: [R] vector of character with unequal width
Dear R users, thanks for your help. The recommendations were exactly what I was searching for. Below is the one I will use for my script. Thanks again, jose

nchar(xx)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 4 4 4 4 4 4 4 4
[75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

z1 <- rep("00000000", times = length(xx))
z2 <- substr(z1, 1, 9 - nchar(xx))
xx <- paste(z2, xx, sep = "")
xx <- substring(xx, 6, 9)
nchar(xx)
 [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[38] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

Date: Wed, 5 Jan 2011 19:50:28 +0100 From: savi...@cs.cas.cz To: r-help@r-project.org Subject: Re: [R] vector of character with unequal width On Wed, Jan 05, 2011, jose Bartolomei wrote: [...] I was thinking to create a character vector of 0's of length 9-nchar(xx), then paste it to xx.

9 - nchar(xx)
 [1] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[38] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 6 6 6 6 6 5 5 5 5 5 5 5 5
[75] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

Nevertheless, I have not been able to create this vector, nor do I know if this is the best option. Did you consider something like the following?

xx <- c("abc", "abcd", "abcde")
z1 <- rep("00000000", times = length(xx))
z2 <- substr(z1, 1, 9 - nchar(xx))
yy <- paste(z2, xx, sep = "")
cbind(yy)
#      yy
# [1,] "000000abc"
# [2,] "00000abcd"
# [3,] "0000abcde"

Petr Savicky. 
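The pad-then-substring idiom above can also be written more directly in base R. A sketch, not from the thread (pad0 is a hypothetical helper name; sprintf's zero flag works only when the values are genuinely numeric):

```r
# Zero-padding numbers: sprintf's "0" flag does it in one call
sprintf("%09d", c(7, 123, 45678))   # widths all become 9

# Zero-padding arbitrary strings: a small helper (pad0 is made up here)
pad0 <- function(x, width) {
  zeros <- vapply(pmax(width - nchar(x), 0),
                  function(k) paste(rep("0", k), collapse = ""),
                  character(1))
  paste0(zeros, x)
}

xx <- c("abc", "abcd", "abcde")
pad0(xx, 9)
substring(pad0(xx, 9), 6, 9)   # keep only the last four characters, as above
```

This avoids building the separate z1/z2 intermediates and works for any target width.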
[R] how to run linear regression models at once
hey, folks, I have two very simple questions. I am not quite sure if I can do this using R. So, I am analyzing a large data frame with thousands of variables. For example: dependent variables: d1, d2, d3 (i.e., there are three dependent variables); independent variables: s1, s2, s3, ..., s1000 (i.e., there are 1000 independent variables). Now I want to run simple linear regression analyses of the dependent variables on the independent variables. A laborious way to do this is running 1000 linear regression models for each dependent variable separately. This would take a lot of time. My questions are: 1) is there an easy way to run all 1000 linear regression analyses for each dependent variable at once? 2) after running all 1000 linear regression analyses at once, how can I write the 3000 regression weights (1000 regression weights for each dependent variable) and their significance levels to a file (e.g., csv, excel, etc.)? Many thanks in advance!! -- View this message in context: http://r.789695.n4.nabble.com/how-to-run-linear-regression-models-at-once-tp3179256p3179256.html Sent from the R help mailing list archive at Nabble.com.
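No reply to this question appears in this digest. A common pattern for both questions is to loop over predictors with lapply, pull the slope row out of each summary, and write the assembled table with write.csv. A sketch under invented stand-in data (dep/indep/results and the file name are all illustrative, not from the thread):

```r
set.seed(1)
# Toy stand-ins for the real data: 3 responses, 10 predictors, 50 rows
dep   <- data.frame(d1 = rnorm(50), d2 = rnorm(50), d3 = rnorm(50))
indep <- as.data.frame(matrix(rnorm(50 * 10), 50,
                              dimnames = list(NULL, paste0("s", 1:10))))

# One simple regression per (dependent, independent) pair;
# keep the slope estimate and its p-value from each fit
results <- do.call(rbind, lapply(names(dep), function(d) {
  do.call(rbind, lapply(names(indep), function(s) {
    fit <- lm(dep[[d]] ~ indep[[s]])
    co  <- summary(fit)$coefficients[2, ]   # row 2 = the slope
    data.frame(dependent = d, independent = s,
               estimate = co[["Estimate"]], p.value = co[["Pr(>|t|)"]])
  }))
}))

write.csv(results, "slopes.csv", row.names = FALSE)
```

With 1000 predictors this is the same code with a wider indep; it runs in seconds because each fit is a one-predictor lm.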
Re: [R] POSIXct issue
On Fri, 7 Jan 2011, Rustamali Manesiya wrote: Hello, I have trouble getting my original datetime back; see below. I hope I am missing something. Please help.

The numeric form (which you should never use) has an origin at midnight in UTC, not EST. Note too that 'EST' is not very plausible for your timezone (or are you in Canada?): see ?Sys.timezone.

tt <- as.POSIXct(ttn, origin = '1970-01-01', tz = "UTC")
attr(tt, "tzone") <- "EST"
tt
[1] "2011-01-07 07:49:13 EST"

tt <- as.POSIXct("2011-01-07 07:49:13", tz = "EST")
tt
[1] "2011-01-07 07:49:13 EST"
ttn <- as.numeric(tt)
ttn
[1] 1294404553
tt <- as.POSIXct(ttn, origin = '1970-01-01', tz = "EST")
tt
[1] "2011-01-07 12:49:13 EST"

-- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Problem with timeSequence {timeDate} - wrong end date
Thank you, Martin (and all of R core), for all your work making such wonderful software available. Not only is it practical; using R and going through the source code and the documentation have been a huge learning experience for me. I am very grateful to you and everyone else for being so generous with your time. Josh On Fri, Jan 7, 2011 at 6:59 AM, Martin Maechler maech...@stat.math.ethz.ch wrote: Thank you, Joshua. Note that R 2.12.1 patched and R 2.13.0 unstable have this fixed (since yesterday) and hence will no longer overshoot. Also, the next release of the timeDate package will have it fixed. Martin Maechler, ETH Zurich
[R] anova vs aov commands for anova with repeated measures
Dear all, I need to understand something about the behaviour of the two functions aov and anova in the following case involving an ANOVA with repeated measures. If I use the following command I don't get any problem:

aov1 <- aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data = scrd)
summary(aov1)

Instead, if I try to fit the same model for the regression I get an error:

fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data = scrd)
Error in eval(expr, envir, enclos) : could not find function "Error"

so I cannot run the command anova(fit1) afterwards. I want to use the lm() form because I want to analyze the residuals in order to check normality, and see if the ANOVA assumption of normality still holds. Could you please help me understand how to do this? Thanks in advance. Best regards
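No reply appears in this digest. The error arises because Error() is an aov()-specific construct, not a function, so lm() cannot evaluate it. One way to get inspectable residuals for a repeated-measures design — a different technique from the aov() call above, sketched here with invented data (scrd and all variable names are illustrative stand-ins) — is to refit with nlme::lme, whose residuals() method works directly:

```r
library(nlme)

# Toy repeated-measures data standing in for the poster's 'scrd'
set.seed(42)
scrd <- expand.grid(subject   = factor(1:10),
                    stimulus  = factor(c("a", "b")),
                    condition = factor(c("x", "y")))
scrd$response <- rnorm(nrow(scrd))

# Same fixed effects as the aov() call, subject as a random intercept
fit1 <- lme(response ~ stimulus * condition, random = ~ 1 | subject, data = scrd)
res  <- residuals(fit1, type = "normalized")

# Normality checks on the within-subject residuals
qqnorm(res); qqline(res)
shapiro.test(res)
```

Note this random-effects structure is simpler than Error(subject/(stimulus*condition)); nested random terms can be added to the random = argument if the design warrants it.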
Re: [R] Problems with glht function for lme object
anord andreas.nord at zooekol.lu.se writes: Dear all, I'm trying to make multiple comparisons for an lme object. The data is from an experiment on parental work load in birds, in which adults at different sites were induced to work at one of three levels ('treat'; H, M, L). The response is 'feedings', which is a quantitative measure of nest provisioning per parent per chick per hour. Site is included as a random effect (one male/female pair per site). (For more complicated mixed model questions you might want to try the r-sig-mixed-models list instead.) It works for me:

feed.df <- read.table(textConnection("
feedings sex site treat year date
1.8877888 M 838 H 2009 81
1.9102787 M 247 H 2009 81
1.4647229 M 674 H 2010 121
1.4160590 M 7009 M 2009 144
1.3106749 M 863 M 2010 196
1.2718121 M 61 M 2009 225
1.2799263 M 729 L 2009 256
1.5829564 M 629 L 2009 256
1.4847251 M 299 L 2010 324
1.2463151 M 569 L 2010 324
2.1694169 F 838 H 2009 81
1.5966899 F 247 H 2009 81
2.4136983 F 674 H 2010 121
1.7784873 F 7009 M 2009 144
1.6681317 F 863 M 2010 196
2.3691275 F 61 M 2009 225
2.0672192 F 729 L 2009 256
1.6389902 F 629 L 2009 256
0.9307536 F 299 L 2010 324
1.6786767 F 569 L 2010 324"), header = TRUE)

library(nlme)
m4.feed <- lme(feedings ~ treat + year + date, random = ~1|site, data = feed.df)
library(multcomp)
gg <- glht(m4.feed, linfct = mcp(treat = "Tukey"))
plot(gg)
summary(gg)

It works for me (although it doesn't look like there's anything going on there ...)

Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lme.formula(fixed = feedings ~ treat + year + date, data = feed.df, random = ~1 | site)

Linear Hypotheses:
           Estimate Std. Error z value Pr(>|z|)
L - H == 0  -0.6452     0.7667  -0.842    0.592
M - H == 0  -0.3996     0.4276  -0.934    0.529
M - L == 0   0.2456     0.4229   0.581    0.771
(Adjusted p values reported -- single-step method)

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i486-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] multcomp_1.2-4 survival_2.36-3 mvtnorm_0.9-95 nlme_3.1-97

loaded via a namespace (and not attached):
[1] grid_2.12.1 lattice_0.19-18
Re: [R] Different LLRs on multinomial logit models in R and SPSS
On Jan 7, 2011, at 8:26 AM, sovo0...@gmail.com wrote: On Thu, 6 Jan 2011, David Winsemius wrote: On Jan 6, 2011, at 11:23 AM, Sören Vogel wrote: Thanks for your replies. I am no mathematician or statistician by far; however, it appears to me that the actual value of either of the two LLs is indeed important when it comes to the calculation of pseudo-R-squareds. If Rnagel divides by (some transformation of) the actual value of llnull, then any calculation of Rnagel should differ. How come? Or is my function wrong? And if my function is right, how can I calculate an R-squared independent of the software used? You have two models in that function, the null one with . ~ 1 and the original one, and you are getting a ratio on the likelihood scale (which is a difference on the log-likelihood or deviance scale). If this is the case, calculating 'fit' indices for those models must end up in different fit indices depending on software:

n <- 143
ll1 <- 135.02
ll2 <- 129.8
# Rcs
(Rcs <- 1 - exp( (ll2 - ll1) / n ))
# Rnagel
Rcs / (1 - exp(-ll1/n))

ll3 <- 204.2904
ll4 <- 199.0659
# Rcs
(Rcs <- 1 - exp( (ll4 - ll3) / n ))
# Rnagel
Rcs / (1 - exp(-ll3/n))

The Rcs values are equal; however, the Rnagel values are not. Of course, this is no question, but I am rather confused. When publishing results I am required to use fit indices, and editors would complain that they differ. It is well known that editors are sometimes confused about statistics, and if an editor is insistent on publishing indices that are in fact arbitrary, then that is a problem. I would hope that the editor would be open to education. (And often there is a statistical associate editor who will be more likely to have a solid grounding and to whom one can appeal in situations of initial obstinacy.) 
Perhaps you will be doing the world of science a favor by suggesting that said editor first review a first-year calculus text regarding the fact that indefinite integrals are only determined up to an arbitrary constant, and that one can only use the results in a practical setting by specifying the limits of integration. So it is with likelihoods. They are only meaningful when comparing two nested models. Sometimes the software obscures this fact, but it remains a statistical _fact_.

Whether your code is correct (and whether the Nagelkerke R^2 remains invariant with respect to such transformations) I cannot say. (I suspect that it would be, but I have never liked the Nagelkerke R^2 as a measure, and didn't really like R^2 as a measure in linear regression for that matter, either.) I focus on fitting functions to trends, examining predictions, and assessing confidence intervals for parameter estimates. The notion that model fit is well-summarized in a single number blinds one to other critical issues, such as the linearity and monotonicity assumptions implicit in much of regression (mal-)practice.

So, if someone who is more enamored of (or even more knowledgeably scornful of) the Nagelkerke R^2 measure wants to take over here, I will read what they say with interest and appreciation.

Sören

David Winsemius, MD
West Hartford, CT
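The software dependence discussed above can be demonstrated numerically. Below is a minimal sketch, assuming (as the thread's numbers suggest) that the reported log-likelihood values are deviances, i.e. -2*logLik. Adding the same constant to both deviances leaves Cox-Snell unchanged, because it only uses their difference, while Nagelkerke changes, because its denominator uses the null deviance alone:

```r
# Pseudo-R^2 from two deviances (a sketch; d_null and d_model are -2*logLik)
pseudoR2 <- function(d_null, d_model, n) {
  Rcs <- 1 - exp((d_model - d_null) / n)            # Cox & Snell
  c(Rcs = Rcs, Rnagel = Rcs / (1 - exp(-d_null / n)))
}

n <- 143
pseudoR2(135.02, 129.8, n)                  # one software's deviances
pseudoR2(135.02 + 69.27, 129.8 + 69.27, n)  # same fit, both shifted by a constant
# 'Rcs' agrees between the two calls; 'Rnagel' does not.
```

This reproduces the behaviour Sören observed: the arbitrary additive constant cancels in the difference but not in the null-deviance term.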
Re: [R] Odp: Currency return calculations
My mistake sir. I was literally engrossed in my stupid logic, and while doing so, overlooked the simple and very effective solution you had offered. Sorry once again sir, and I will certainly try to be very careful in future. Thanks again and have a great weekend sir. Regards, Amelia

--- On Fri, 7/1/11, Petr PIKAL petr.pi...@precheza.cz wrote: From: Petr PIKAL petr.pi...@precheza.cz Subject: Odp: Currency return calculations To: Amelia Vettori amelia_vett...@yahoo.co.nz Cc: r-help@r-project.org Received: Friday, 7 January, 2011, 12:59 PM

Hi

What is wrong with my suggestion then?

> rate1
      USD    GBP  EURO   CHF    AUD
1  112.05 171.52 42.71 41.50 109.55
2  112.90 168.27 42.68 41.47 102.52
3  110.85 169.03 41.86 42.84 114.91
4  109.63 169.64 44.71 43.44 122.48
5  108.08 169.29 44.14 43.69 122.12
6  111.23 169.47 44.58 42.30 123.96
7  112.49 170.90 41.07 42.05 100.36
8  108.87 168.69 42.23 41.23 110.19
9  109.33 170.90 44.55 42.76 121.58
10 111.88 169.96 41.12 43.79 103.46

> portfolio <- c("USD", "USD", "CHF", "AUD", "USD")
> log(rate1[-1, portfolio] / rate1[-nrow(rate1), portfolio])
            USD        USD.1          CHF          AUD        USD.2
2   0.007557271  0.007557271 -0.000723153 -0.066323165  0.007557271
3  -0.018324535 -0.018324535  0.032501971  0.114091312 -0.018324535
4  -0.011066876 -0.011066876  0.013908430  0.063798538 -0.011066876
5  -0.014239366 -0.014239366  0.005738567 -0.002943583 -0.014239366
6   0.028728436  0.028728436 -0.032332157  0.014954765  0.028728436
7   0.011264199  0.011264199 -0.005927700 -0.211195211  0.011264199
8  -0.032709819 -0.032709819 -0.019693240  0.093442427 -0.032709819
9   0.004216322  0.004216322  0.036436939  0.098366334  0.004216322
10  0.023056037  0.023056037  0.023802395 -0.161387418  0.023056037

As I said, instead of fiddling with several loop/if/function/variable attempts, it seems to me better to use powerful R indexing and the whole-object approach where possible.

Regards Petr

Amelia Vettori amelia_vett...@yahoo.co.nz napsal dne 07.01.2011 13:46:53:

Dear sir, I am extremely sorry for messing up the logic asking for help w.r.t.
my earlier mails. I have tried to explain below what I am looking for.

I have a database (say, currency_rates) storing datewise currency exchange rates with some base currency XYZ.

currency_rates <- data.frame(
  date = c("12/31/2010", "12/30/2010", "12/29/2010", "12/28/2010", "12/27/2010",
           "12/24/2010", "12/23/2010", "12/22/2010", "12/21/2010", "12/20/2010"),
  USD  = c(112.05, 112.9, 110.85, 109.63, 108.08, 111.23, 112.49, 108.87, 109.33, 111.88),
  GBP  = c(171.52, 168.27, 169.03, 169.64, 169.29, 169.47, 170.9, 168.69, 170.9, 169.96),
  EURO = c(42.71, 42.68, 41.86, 44.71, 44.14, 44.58, 41.07, 42.23, 44.55, 41.12),
  CHF  = c(41.5, 41.47, 42.84, 43.44, 43.69, 42.3, 42.05, 41.23, 42.76, 43.79),
  AUD  = c(109.55, 102.52, 114.91, 122.48, 122.12, 123.96, 100.36, 110.19, 121.58, 103.46))

I have a portfolio consisting of some of these currencies. At this moment, suppose my portfolio has the following currency transactions, i.e. the following is my current portfolio, and it has 3 USD transactions, 2 EURO transactions and a CHF transaction.

portfolio_currency_names <- c("USD", "USD", "EURO", "CHF", "EURO", "USD")

# My objective is: AS PER THE PORTFOLIO, I need to generate a data.frame giving the respective currency returns. Thus, I need to have an output like

    USD     USD    EURO     CHF    EURO     USD
-0.0076 -0.0076  0.0007  0.0007  0.0007 -0.0076
 0.0183  0.0183  0.0194 -0.0325  0.0194  0.0183
 0.0111  0.0111 -0.0659 -0.0139 -0.0659  0.0111
 0.0142  0.0142  0.0128 -0.0057  0.0128  0.0142
-0.0287 -0.0287 -0.0099  0.0323 -0.0099 -0.0287
-0.0113 -0.0113  0.0820  0.0059  0.0820 -0.0113
 0.0327  0.0327 -0.0279  0.0197 -0.0279  0.0327
-0.0042 -0.0042 -0.0535 -0.0364 -0.0535 -0.0042
-0.0231 -0.0231  0.0801 -0.0238  0.0801 -0.0231

Thus, my requirement is to have the data.frame as per the composition of my portfolio. Thus, if there are only 2 transactions, i.e. if my portfolio contains say only CHF and AUD, I need the return calculations done only for CHF and AUD.
    CHF     AUD
 0.0007  0.0663
-0.0325 -0.1141
-0.0139 -0.0638
-0.0057  0.0029
 0.0323 -0.0150
 0.0059  0.2112
 0.0197 -0.0934
-0.0364 -0.0984
-0.0238  0.1614

I once again apologize for not asking my requirement properly, thereby causing
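For completeness, Petr's whole-object indexing can also be written with diff(), which differences each column of a matrix; this is a sketch (assuming rate1 and portfolio are as defined in his reply, with only numeric columns selected):

```r
# Column-wise log returns; equivalent to
# log(rate1[-1, portfolio] / rate1[-nrow(rate1), portfolio])
returns <- diff(log(as.matrix(rate1[, portfolio])))
```

Either form avoids any explicit loop over transactions, and selecting columns by the character vector `portfolio` automatically adapts the output to the portfolio's composition.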
[R] Summing over specific columns in a matrix
Hi, I would like to sum some specific columns in my matrix. For example, my matrix looks like this:

     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA   NA   NA   NA
[2,]    2    1   NA    1   NA
[3,]    3    2    1    2    1
[4,]    4    3    2    3    2
[5,]   NA   NA   NA    4    3
[6,]   NA   NA   NA    5   NA

I would like to find the sum of the first two columns, the second two columns and the last column: i.e. I am left with a vector of c(16, 18, 6). I know about colSums and sum overall - I just wondered if this type of grouping can be specified somehow in a vector such as c(2,2,1)? I don't really want to have to use a loop for this. Many thanks, Emma

-- View this message in context: http://r.789695.n4.nabble.com/Summing-over-specific-columns-in-a-matrix-tp3179400p3179400.html Sent from the R help mailing list archive at Nabble.com.
[R] how to calculate this natural logarithm
Hello, I want to calculate the natural logarithm of a sum of combinations, as follows (R code):

com_sum <- choose(200,482)*choose(100,118) + choose(200,483)*choose(100,117) + ... +
           choose(200,i)*choose(100,600-i) + ... + choose(200,600)*choose(100,0)  # calculate the sum
result <- log(com_sum)  # calculate the log of the sum

But every element of com_sum is Inf, so I can't get the result. Thank you in advance.

Yours sincerely, ZhaoXing
Department of Health Statistics, West China School of Public Health, Sichuan University
No.17 Section 3, South Renmin Road, Chengdu, Sichuan 610041, P.R.China
Re: [R] Problems with glht function for lme object
On Jan 7, 2011, at 8:28 AM, anord wrote:

Dear all, I'm trying to make multiple comparisons for an lme object. The data is for an experiment on parental work load in birds, in which adults at different sites were induced to work at one of three levels ('treat'; H, M, L). The response is 'feedings', which is a quantitative measure of nest provisioning per parent per chick per hour. Site is included as a random effect (one male/female pair per site). My final model takes the following form:

feedings ~ treat + year + data^2, random = ~1|site, data = feed.df

I am guessing that problems could easily arise as a result of your variable name containing ^. That is an invalid name in an un-back-quoted state. You have clearly not given us a copy of a console session, since your variable is date^2 below and data^2 above. -- David.

For this model, I would like to do multiple comparisons on 'treat', using the multcomp package:

summary(glht(m4.feed, linfct = mcp(treat = "Tukey")))

However, this does not work, and I get the error message below.

Error in if (is.null(pkg) | pkg == "nlme") terms(formula(x)) else slot(x, : argument is of length zero
Error in factor_contrasts(model) : no 'model.matrix' method for 'model' found!

I suspect this might have quite a straightforward solution, but I'm stuck at this point. Any help would be most appreciated. Sample data below.
Kind regards, Andreas Nord, Sweden

==
 feedings sex site treat year date^2
1.8877888   M  838     H 2009     81
1.9102787   M  247     H 2009     81
1.4647229   M  674     H 2010    121
1.4160590   M 7009     M 2009    144
1.3106749   M  863     M 2010    196
1.2718121   M   61     M 2009    225
1.2799263   M  729     L 2009    256
1.5829564   M  629     L 2009    256
1.4847251   M  299     L 2010    324
1.2463151   M  569     L 2010    324
2.1694169   F  838     H 2009     81
1.5966899   F  247     H 2009     81
2.4136983   F  674     H 2010    121
1.7784873   F 7009     M 2009    144
1.6681317   F  863     M 2010    196
2.3691275   F   61     M 2009    225
2.0672192   F  729     L 2009    256
1.6389902   F  629     L 2009    256
0.9307536   F  299     L 2010    324
1.6786767   F  569     L 2010    324
==

-- View this message in context: http://r.789695.n4.nabble.com/Problems-with-glht-function-for-lme-object-tp3179128p3179128.html Sent from the R help mailing list archive at Nabble.com.

David Winsemius, MD
West Hartford, CT
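Following David's diagnosis, one repair would be to give the squared-date column a syntactically valid name before fitting, so that glht() can rebuild the model matrix. The variable and data names below come from the post; the fix itself is an assumption, not something confirmed by the thread:

```r
library(nlme)
library(multcomp)

# rename the invalid column name `date^2` to a legal one
names(feed.df)[names(feed.df) == "date^2"] <- "date2"

m4.feed <- lme(feedings ~ treat + year + date2,
               random = ~ 1 | site, data = feed.df)
summary(glht(m4.feed, linfct = mcp(treat = "Tukey")))
```

Alternatively, keeping a plain `date` column and writing `I(date^2)` in the formula avoids storing a derived column under an invalid name at all.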
Re: [R] Summing over specific columns in a matrix
Try this:

rowSums(rowsum(t(m), rep(1:3, c(2, 2, 1)), na.rm = TRUE))

On Fri, Jan 7, 2011 at 2:29 PM, emj83 stp08...@shef.ac.uk wrote:

Hi, I would like to sum some specific columns in my matrix - for example, my matrix looks like this:

     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA   NA   NA   NA
[2,]    2    1   NA    1   NA
[3,]    3    2    1    2    1
[4,]    4    3    2    3    2
[5,]   NA   NA   NA    4    3
[6,]   NA   NA   NA    5   NA

I would like to find the sum of the first two columns, the second two columns and the last column: i.e. I am left with a vector of c(16, 18, 6). I know about colSums and sum overall - I just wondered if this type of grouping can be included somehow in a vector such as c(2,2,1)? I don't really want to have to use a loop for this. Many thanks Emma

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]
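Henrique's one-liner can be checked against the example matrix; this sketch also builds the group vector directly from the widths c(2, 2, 1), so it generalizes to other groupings:

```r
# the example matrix from the question
m <- rbind(c( 1, NA, NA, NA, NA),
           c( 2,  1, NA,  1, NA),
           c( 3,  2,  1,  2,  1),
           c( 4,  3,  2,  3,  2),
           c(NA, NA, NA,  4,  3),
           c(NA, NA, NA,  5, NA))

widths <- c(2, 2, 1)
grp <- rep(seq_along(widths), times = widths)   # 1 1 2 2 3

# rowsum() sums rows of t(m) (i.e. columns of m) within each group,
# then rowSums() collapses each group block to a single total
rowSums(rowsum(t(m), grp, na.rm = TRUE))        # 16 18 6
```

The same result can be had with tapply over the column sums, e.g. tapply(colSums(m, na.rm = TRUE), grp, sum), but the rowsum() form keeps everything in one vectorized call.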
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I have just recently written about this issue (i.e. open learning and data sharing) in a manuscript that is currently under review in a clinical journal. I have argued that data hoarding is unethical. Participants in research studies give their time, effort, saliva and blood in the altruistic hope that their sacrifice will benefit humankind. If they were to realize that the real (ulterior) motive of the study investigators is only to advance their careers, they would really think hard about participating in the studies. The study participants should only consent to participate if they can get a signed assurance from the investigators that the investigators will make their data available for scrutiny and for public use (under some reasonable conditions that are fair to the study investigators). As Vickers (Trials 2006) says, whose data is it anyway? I believe that we can achieve great progress in clinical research if and only if we make a concerted effort towards open learning. Stakeholders (i.e. patients, clinicians, policy-makers) should demand that all the data that is potentially relevant to addressing a critical clinical question should be made available in an open learning environment. Unless, we can achieve this we cannot solve the problems of publication bias and inefficient and sub-optimal use of data. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Spencer Graves Sent: Friday, January 07, 2011 8:26 AM To: Mike Marchywka Cc: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity I wholeheartedly agree with the trend towards publishing datasets. One way to do that is as datasets in an R package contributed to CRAN. 
Beyond this, there seems to be an increasing trend towards journals requiring authors of scientific research to publish their data as well. The Public Library of Science (PLOS) has such a policy, but it is not enforced: Savage and Vickers (2010) were able to get the raw data behind only one of ten published articles they tried, and that one came only after reminding the author that s/he had agreed to making the data available as a condition of publishing in PLOS. (Four other authors refused to share their data in spite of their legal and moral commitment to do so as a condition of publishing in PLOS.) There are other venues for publishing data. For example, much astronomical data is now routinely web published so anyone interested can test their pet algorithm on real data (http://sites.google.com/site/vousergroup/presentations/publishing-astronomical-data). Regarding my earlier comment, I just found a Wikipedia article on scientific misconduct that mentioned the tendency to refuse to publish research that proves your new drug is positively harmful. This is an extreme version of both types of bias I previously mentioned: (1) only significant results get published; (2) private funding provides its own biases. Spencer

# Savage and Vickers (2010) Empirical Study Of Data Sharing By Authors Publishing In PLoS Journals, Scientific Data Sharing, added Apr. 26, 2010 (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2/).
On 1/7/2011 4:08 AM, Mike Marchywka wrote: Date: Thu, 6 Jan 2011 23:06:44 -0800 From: peter.langfel...@gmail.com To: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. You see this everywhere in one form or another from medical to financial modelling. My solution here is simply to publish more raw data in a computer readable form, in this case of course something easy to get with R, so disinterested or adversarial parties can run their own analysis. I think there was also a push to create a data base for failed drug trials that may contain data of some value later. The value of R with easily available data for a large cross section of users could be to moderate problems like the one cited here. I almost slammed a poster here earlier
Re: [R] How to export/save an mrpp object?
On Fri, 2011-01-07 at 14:04 +0200, Nikos Alexandris wrote:

Nikos: I finally ran mrpp tests. I think all is fine but one very important issue: I have no idea how to export/save an mrpp object. Tried anything I know and searched the archives but found nothing.

David W: And what happened when you tried what seems like the obvious:

save(mrpp_obj, file=)
# rm(list=ls()) # Only uncomment if you are ready for your workspace to clear
load("mrpp_store.Rdata")

Right, clearing did the trick. Any ideas? Is really copy-pasting the mrpp results the only way?

Many of us have no idea what such an object is, since you have not described the packages and functions used to create it. If you want an ASCII version then dput or dump are also available.

Multiresponse Permutation Procedures (MRPP) is implemented in the vegan package. The function mrpp() returns (an object of class mrpp) something like:

--%---
# check class
> class(samples_bitemporal_modis.0001.mrpp)
[1] "mrpp"

# check structure
> str(samples_bitemporal_modis.0001.mrpp)
List of 12
 $ call        : language mrpp(dat = samples_bitemporal_modis.0001[, 1:5], grouping = samples_bitemporal_modis.0001[["Class"]])
 $ delta       : num 0.126
 $ E.delta     : num 0.202
 $ CS          : logi NA
 $ n           : Named int [1:5] 335 307 183 188 27
  ..- attr(*, "names")= chr [1:5] "Urban" "Vegetation" "Bare ground" "Burned" ...
 $ classdelta  : Named num [1:5] 0.1255 0.1045 0.1837 0.0981 0.1743
  ..- attr(*, "names")= chr [1:5] "Urban" "Vegetation" "Bare ground" "Burned" ...
 $ Pvalue      : num 0.001
 $ A           : num 0.378
 $ distance    : chr "euclidean"
 $ weight.type : num 1
 $ boot.deltas : num [1:999] 0.202 0.202 0.202 0.203 0.202 ...
 $ permutations: num 999
 - attr(*, "class")= chr "mrpp"
--%---

Now I've tried the following:

--%---
# 1. save(d) it
save(samples_bitemporal_modis.0001.mrpp, file = "exported.mrpp.R")

You would normally use the .R extension for R code, not saved R objects. Instead, use .rda or .RData.

# 2. loade(d) it in a new object...
loadedmrpp <- load("exported.mrpp.R")

If you don't assign the output of load(), then the samples_bitemporal_modis.0001.mrpp object will end up in the global workspace. If you assign the output, you store the names of the objects loaded, not the objects themselves, which are still loaded into the global workspace by default. This is stated in ?load:

Value:
     A character vector of the names of objects created, invisibly.

# 3. (tried) to check it...
str(exported.mrpp.R)
 chr "samples_bitemporal_modis.0001.mrpp"

Hence the above, but I don't believe this. Surely you mean `str(loadedmrpp)`, but as I said this will only give the character string of the names of the objects loaded...

# it did not cross my mind immediately to...
get(loadedmrpp)

...hence the above works, but what is wrong with simply typing `samples_bitemporal_modis.0001.mrpp` (apart from the obvious - you must enjoy typing to give objects such long names...). I think you should read ?load to rethink what these functions do and how they do it.

HTH
G

Call:
mrpp(dat = samples_bitemporal_modis.0001[, 1:5], grouping = samples_bitemporal_modis.0001[["Class"]])

Dissimilarity index: euclidean
Weights for groups: n

Class means and counts:
      Urban  Vegetation Bare ground Burned Water
delta 0.1255 0.1045     0.1837      0.0981 0.1743
n     335    307        183         188    27

Chance corrected within-group agreement A: 0.3778
Based on observed delta 0.1258 and expected delta 0.2022
Significance of delta: 0.001
Based on 999 permutations

# ...or to work on a clean workspace!
--%---

Thank you David. Cheers, Nikos

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr.
Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
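An alternative for this save/load confusion is to serialize a single object rather than a workspace image; reading it back returns the object as a value, so no name lookup is needed. A sketch (note: saveRDS()/readRDS() were exported to the user level only in later R versions; in older versions they exist as the internal .saveRDS()/.readRDS()):

```r
# save one object; readRDS() returns it, so you choose the name on reload
saveRDS(samples_bitemporal_modis.0001.mrpp, file = "mrpp_result.rds")
mrpp_copy <- readRDS("mrpp_result.rds")
str(mrpp_copy)   # same structure as the original mrpp object
```

This sidesteps the issue Gavin describes: load() restores objects under their original names and returns only those names, whereas readRDS() hands the object back directly.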
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
Bert, consider the short rebuttal offered by George Musser in Scientific American: http://www.scientificamerican.com/blog/post.cfm?id=in-praise-of-scientific-error-2010-12-20 Perhaps a more realistic assessment of the (acknowledged) problem. Regards, Alan Kelly, Trinity College Dublin

On 7 Jan 2011, at 11:00, r-help-requ...@r-project.org wrote:

Message: 54 Date: Thu, 6 Jan 2011 10:56:34 -0800 From: Bert Gunter gunter.ber...@gene.com To: r-help@r-project.org Subject: [R] Wyy off topic...Statistical methods, pub bias, scientific validity

Folks: The following has NOTHING (obvious) to do with R. But I believe that all on this list would find it relevant and, I hope, informative. It is LONG. I apologize in advance to those who feel I have wasted their time. http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer Best regards to all, Bert -- Bert Gunter Genentech Nonclinical Biostatistics

[[alternative HTML version deleted]]
Re: [R] Match numeric vector against rows in a matrix?
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Petr Savicky Sent: Friday, January 07, 2011 2:11 AM To: r-help@r-project.org Subject: Re: [R] Match numeric vector against rows in a matrix?

On Wed, Jan 05, 2011 at 07:16:47PM +0000, Kevin Ummel wrote:

Two posts in one day is not a good day... and this question seems like it should have an obvious answer: I have a matrix where rows are unique combinations of 1's and 0's:

combs = as.matrix(expand.grid(c(0,1), c(0,1)))
> combs
     Var1 Var2
[1,]    0    0
[2,]    1    0
[3,]    0    1
[4,]    1    1

I want a single function that will give the row index containing an exact match with vector x:

x = c(0,1)

The solution needs to be applied many times, so I need something quick -- I was hoping a base function would do it, but I'm drawing a blank.

If the matrix can have a different number of columns, then also the following can be used:

combs <- as.matrix(expand.grid(c(0,1), c(0,1), c(0,1)))
x <- c(0,1,1)
which(rowSums(combs != rep(x, each=nrow(combs))) == 0)
# [1] 7

If the combs matrix always has that form, why not dispense with it and treat x as the binary expansion of the row number (+1)?

> 1 + sum( 2^(seq_along(x)-1) * x )
[1] 7

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

Petr Savicky.
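When combs is not the full binary grid (so Bill's binary-expansion trick does not apply), one general-purpose sketch is to collapse each row to a key string and use match(); an unlikely separator character guards against ambiguity when values can have more than one digit:

```r
# collapse rows to key strings, then look the query up with match()
row_key   <- function(m) apply(m, 1, paste, collapse = "\r")
which_row <- function(x, combs) match(paste(x, collapse = "\r"), row_key(combs))

combs <- as.matrix(expand.grid(c(0, 1), c(0, 1), c(0, 1)))
which_row(c(0, 1, 1), combs)   # 7
```

For repeated queries against the same matrix, computing row_key(combs) once and reusing it keeps the per-query cost to a single match() on a character vector.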
Re: [R] how to calculate this natural logarithm
On 07-Jan-11 16:20:59, zhaoxing731 wrote:

Hello, I want to calculate the natural logarithm of a sum of combinations as follows (R code):

com_sum <- choose(200,482)*choose(100,118) +
           choose(200,483)*choose(100,117) + ... +
           choose(200,i)*choose(100,600-i) + ... +
           choose(200,600)*choose(100,0)  # calculate the sum
result <- log(com_sum)  # calculate the log of the sum

But every element of com_sum is Inf, so I can't get the result. Thank you in advance. Yours sincerely, ZhaoXing

You may find you can usefully adapt the technique I have described in: http://www.zen89632.zen.co.uk/R/FisherTest/extreme_fisher.pdf A somewhat different but closely related problem.

Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 07-Jan-11 Time: 17:42:48 -- XFMail --
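The generic device for this kind of problem is "log-sum-exp": compute every term on the log scale with lchoose(), subtract the maximum before exponentiating, and add it back after summing the now well-scaled terms. A sketch, demonstrated on a case with a known closed form (Vandermonde's identity) rather than the original poster's indices, since choose(200, i) is zero for i > 200 as written:

```r
# log( sum_i choose(200, i) * choose(100, 100 - i) ) without overflow
i <- 0:100
logterms <- lchoose(200, i) + lchoose(100, 100 - i)
m <- max(logterms)
logsum <- m + log(sum(exp(logterms - m)))

# Vandermonde's identity: the sum equals choose(300, 100),
# so logsum should match lchoose(300, 100)
all.equal(logsum, lchoose(300, 100))
```

Because every exp() argument is at most zero, no term overflows, and the largest term contributes exp(0) = 1, so precision is lost only in the comparatively tiny terms.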
Re: [R] Accessing data via url
This seems to get me something (the binary format), but what do I do with it? Thanks to everyone for the help. Results:

Original dd file on my hard drive:

mydd <- structure(list(AUA9363 = structure(c(1L, 1L, 2L, 3L, 3L, 4L, 4L,
4L, 6L, 5L, 5L), .Label = c("AUA9733", "AUA9734", "JAF104", "KLM686",
"KLM744", "KLM783"), class = "factor"), OE.LNQ = structure(c(5L, 5L,
5L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L), .Label = c("OO-TUC", "PH-BFF",
"PH-BFH", "PH-BQD", "OE-LVK"), class = "factor"), X51.30 = c(50.82,
50.82, 50.74, 50.74, 50.75, 50.73, 50.84, 50.86, 50.94, 50.66, 50.75),
X.3.01 = c(-3.12, -3.14, -3.17, -3.18, -3.05, -3.96, -3.37, -3.28, -4,
-3.92, -3.25), X3575 = c(3775L, 3625L, 5575L, 38000L, 38000L, 38975L,
39000L, 39000L, 39000L, 37000L, 36300L), X09.21.44 = structure(c(4L,
5L, 6L, 2L, 3L, 8L, 10L, 11L, 1L, 7L, 9L), .Label = c("05:44:12",
"07:50:08", "07:50:44", "08:02:34", "08:02:47", "08:57:12", "13:22:37",
"13:23:11", "13:25:42", "13:25:51", "13:26:15"), class = "factor")),
.Names = c("AUA9363", "OE.LNQ", "X51.30", "X.3.01", "X3575",
"X09.21.44"), class = "data.frame", row.names = c(NA, -11L))

Downloaded result:

> x
  [1] 3c 48 54 4d 4c 3e 0a 3c 48 45 41 44 3e 0a 3c 54 49 54 4c 45 3e 4d 6f 76 65 64 20 54 65 6d 70 6f 72
 [34] 61 72 69 6c 79 3c 2f 54 49 54 4c 45 3e 0a 3c 2f 48 45 41 44 3e 0a 3c 42 4f 44 59 20 42 47 43 4f 4c
 [67] 4f 52 3d 22 23 46 46 46 46 46 46 22 20 54 45 58 54 3d 22 23 30 30 30 30 30 30 22 3e 0a 3c 48 31 3e
[100] 4d 6f 76 65 64 20 54 65 6d 70 6f 72 61 72 69 6c 79 3c 2f 48 31 3e 0a 54 68 65 20 64 6f 63 75 6d 65
[133] 6e 74 20 68 61 73 20 6d 6f 76 65 64 20 3c 41 20 48 52 45 46 3d 22 68 74 74 70 73 3a 2f 2f 73 69 74
[166] 65 73 2e 67 6f 6f 67 6c 65 2e 63 6f 6d 2f 73 69 74 65 2f 6a 72 6b 72 69 64 65 61 75 2f 68 6f 6d 65
[199] 2f 67 65 6e 65 72 61 6c 2d 73 74 6f 72 65 73 2f 64 75 70 6c 69 63 61 74 65 73 2e 63 73 76 3f 61 74
[232] 74 72 65 64 69 72 65 63 74 73 3d 30 22 3e 68 65 72 65 3c 2f 41 3e 2e 0a 3c 2f 42 4f 44 59 3e 0a 3c
[265] 2f 48 54 4d 4c 3e 0a

--- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote:
From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: Dieter Menne dieter.me...@menne-biomed.de Cc: r-help@r-project.org Received: Friday, January 7, 2011, 7:07 AM

With the ssl.verifypeer = FALSE argument it works:

x = getBinaryURL(dd, ssl.verifypeer = FALSE)

On Fri, Jan 7, 2011 at 6:24 AM, Dieter Menne dieter.me...@menne-biomed.de wrote:

John Kane-2 wrote:

# Can anyone suggest why this works
datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data"
person.data <- read.table(datafilename, header=TRUE)
# but this does not?
dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
treedata <- read.table(dd, header=TRUE)
===
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : unsupported URL scheme

Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter

library(RCurl)
sessionInfo()
dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/history.png"
x = getBinaryURL(dd)

-----

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] RCurl_1.5-0.1 bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] tools_2.12.1

> dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/history.png"
> x = getBinaryURL(dd)
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  SSL certificate problem, verify that the CA cert is OK. Details:
  error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

-- View this message in context: http://r.789695.n4.nabble.com/Accessing-data-via-url-tp3178094p3178773.html Sent from the R help mailing list archive at Nabble.com.
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]
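The bytes John received decode to an HTML "Moved Temporarily" page, i.e. a redirect that was not followed. A sketch of one way forward (untested against the site; the option names follow RCurl/libcurl conventions and should be treated as an assumption): fetch the resource as text while following redirects, then parse it with read.table() via a text connection.

```r
library(RCurl)

dd  <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
txt <- getURL(dd, ssl.verifypeer = FALSE, followlocation = TRUE)
treedata <- read.table(textConnection(txt), header = TRUE)

# For bytes already in hand, rawToChar() decodes a raw vector like the
# one shown above, which makes the redirect page human-readable:
# rawToChar(x)
```

Inspecting rawToChar(x) here would have revealed the "document has moved" message and the redirect target immediately.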
[R] Merging zoo objects
Hi, I have n zoo objects M1, M2, M3, ..., Mn that I want to merge, where n is a number calculated at run-time. I am struggling to find the correct syntax to do this. Assuming n is calculated as 10 (for example), I have tried:

n <- 10
# First Effort
alldata <- merge(paste("M", rep(1:n), sep=""), all=TRUE)
# Second Effort
alldata <- merge(noquote(toString(paste("M", rep(1:nrow(counts1)), sep=""))), all=TRUE)

Any insights would be gratefully received. Kind regards, Pete

-- View this message in context: http://r.789695.n4.nabble.com/Merging-zoo-objects-tp3184696p3184696.html Sent from the R help mailing list archive at Nabble.com.
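The attempts above pass character strings to merge(), but merge() needs the objects themselves; merge.zoo accepts any number of series through its '...' argument. A sketch using get() to fetch each object by name and do.call() to make a single multi-way merge:

```r
library(zoo)

n <- 10
# fetch the objects M1, ..., Mn by name into a list
series <- lapply(paste("M", 1:n, sep = ""), get)

# one multi-way merge; merge.zoo takes the series via '...'
alldata <- do.call(merge, c(series, all = TRUE))
```

Collecting run-time-numbered objects into a list like this (or creating them in a list in the first place, rather than as M1, M2, ...) is generally the easier structure to work with in R.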
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I applaud your efforts, Ravi. Regarding Whose data is it?, I humbly suggest that referees and editorial boards push (demand?) for rules that require the raw data be made available to the referees and concurrent with publication. Spencer On 1/7/2011 8:43 AM, Ravi Varadhan wrote: I have just recently written about this issue (i.e. open learning and data sharing) in a manuscript that is currently under review in a clinical journal. I have argued that data hoarding is unethical. Participants in research studies give their time, effort, saliva and blood in the altruistic hope that their sacrifice will benefit humankind. If they were to realize that the real (ulterior) motive of the study investigators is only to advance their careers, they would really think hard about participating in the studies. The study participants should only consent to participate if they can get a signed assurance from the investigators that the investigators will make their data available for scrutiny and for public use (under some reasonable conditions that are fair to the study investigators). As Vickers (Trials 2006) says, whose data is it anyway? I believe that we can achieve great progress in clinical research if and only if we make a concerted effort towards open learning. Stakeholders (i.e. patients, clinicians, policy-makers) should demand that all the data that is potentially relevant to addressing a critical clinical question should be made available in an open learning environment. Unless, we can achieve this we cannot solve the problems of publication bias and inefficient and sub-optimal use of data. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. 
(410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Spencer Graves Sent: Friday, January 07, 2011 8:26 AM To: Mike Marchywka Cc: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity I wholeheartedly agree with the trend towards publishing datasets. One way to do that is as datasets in an R package contributed to CRAN. Beyond this, there seems to be an increasing trend towards journals requiring authors of scientific research to publish their data as well. The Public Library of Science (PLOS) has such a policy, but it is not enforced: Savage and Vickers (2010) were able to get the raw data behind only one of ten published articles they tried, and that one came only after reminding the author that s/he had agreed to making the data available as a condition of publishing in PLOS. (Four other authors refused to share their data in spite of their legal and moral commitment to do so as a condition of publishing in PLOS.) There are other venues for publishing data. For example, much astronomical data is now routinely web published so anyone interested can test their pet algorithm on real data (http://sites.google.com/site/vousergroup/presentations/publishing-astronomical-data). Regarding my earlier comment, I just found a Wikipedia article on scientific misconduct that mentioned the tendency to refuse to publish research that proves your new drug is positively harmful. This is an extreme version of both types of bias I previously mentioned: (1) only significant results get published, and (2) private funding provides its own biases. Spencer # Savage and Vickers (2010) Empirical Study Of Data Sharing By Authors Publishing In PLoS Journals, Scientific Data Sharing, added Apr.
26, 2010 (http://scientificdatasharing.com/medicine/empirical-study-of-data-sharing-by-authors-publishing-in-plos-journals-2/). On 1/7/2011 4:08 AM, Mike Marchywka wrote: Date: Thu, 6 Jan 2011 23:06:44 -0800 From: peter.langfel...@gmail.com To: r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. You see this everywhere in one form or another, from medical to financial modelling. My solution here is simply to publish more raw data in a computer-readable form, in this case of course something easy to get with R, so disinterested or adversarial parties can run their own analysis. I think there
Re: [R] Accessing data via url
--- On Fri, 1/7/11, Dieter Menne dieter.me...@menne-biomed.de wrote: From: Dieter Menne dieter.me...@menne-biomed.de Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter Since replaced with dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" library(RCurl) dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" x <- getBinaryURL(dd, ssl.verifypeer = FALSE) seems to be downloading a binary file. Thanks
Re: [R] Merging zoo objects
On Fri, Jan 7, 2011 at 1:01 PM, Pete B peter.breckn...@bp.com wrote: Hi, I have n zoo objects M1, M2, M3, ..., Mn that I want to merge, where n is a number calculated at run-time. I am struggling to find the correct syntax to do this. Assuming n is calculated as 10 (for example), I have tried: n <- 10 # First effort alldata <- merge(paste("M", rep(1:n), sep = ""), all = TRUE) # Second effort alldata <- merge(noquote(toString(paste("M", rep(1:nrow(counts1)), sep = ""))), all = TRUE) Try this, where the sapply creates a named list of the objects: library(zoo) M1 <- zoo(11:13); M2 <- zoo(21:24); M3 <- zoo(31:35) do.call(merge, sapply(ls(pattern = "^M"), get, simplify = FALSE)) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
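Gabor's ls()/get() idiom can be sketched end-to-end. The mget() call below is an equivalent base-R alternative (an assumption of this sketch, not Gabor's exact code), and note that paste0() requires a newer R than the 2.12 mentioned in this thread; paste(..., sep = "") is the contemporary spelling:

```r
library(zoo)  # assumed installed

# A few zoo objects with different (overlapping) index ranges
M1 <- zoo(11:13)
M2 <- zoo(21:24)
M3 <- zoo(31:35)

n <- 3  # in the original post, n is computed at run-time

# mget() returns a named list of the objects M1..Mn; do.call() then
# hands them to merge.zoo() as separate named arguments, so the
# merged columns are labelled M1, M2, M3
alldata <- do.call(merge, mget(paste0("M", 1:n)))
```

merge.zoo() takes the union of the indexes (here 1..5) and fills non-overlapping positions with NA, which is what the poster's all = TRUE was asking for.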
Re: [R] Accessing data via url
--- On Fri, 1/7/11, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: From: Prof Brian Ripley rip...@stats.ox.ac.uk Subject: Re: [R] Accessing data via url ?read.table says 'file' can also be a complete URL. This is implemented by url(): see the section on URLs on its help page. Thanks. You haven't followed the posting guide and told us your OS, and what the section says does depend on the OS. Sorry, I thought I had it in my sig file. Added now: R 2.12.0, Windows 7. On Thu, 6 Jan 2011, John Kane wrote: # Can anyone suggest why this works datafilename <- "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data" person.data <- read.table(datafilename, header=TRUE) # but this does not? dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt" treedata <- read.table(dd, header=TRUE) === Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : unsupported URL scheme # I can access both through a hyperlink in OOO Calc. # Thanks -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Merging zoo objects
Thanks Gabor. Much appreciated (as always). -- View this message in context: http://r.789695.n4.nabble.com/Merging-zoo-objects-tp3184696p3186113.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Accessing data via url
Take a look at the return of: rawToChar(x) On Fri, Jan 7, 2011 at 4:13 PM, John Kane jrkrid...@yahoo.ca wrote: --- On Fri, 1/7/11, Dieter Menne dieter.me...@menne-biomed.de wrote: From: Dieter Menne dieter.me...@menne-biomed.de Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter Since replaced with dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" library(RCurl) dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" x <- getBinaryURL(dd, ssl.verifypeer = FALSE) seems to be downloading a binary file. Thanks -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
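Henrique's suggestion works because getBinaryURL() returns a raw vector; rawToChar() turns those bytes back into text, which textConnection() can feed to read.csv(). A minimal offline sketch (the raw bytes here simulate what getBinaryURL() would return; the RCurl call is commented out since it needs network access and the thread's URL may no longer exist):

```r
# With network access, the RCurl route would be (assumption: URL still live):
# library(RCurl)
# x <- getBinaryURL("https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv",
#                   ssl.verifypeer = FALSE)

# Offline stand-in: raw bytes of a small CSV, as getBinaryURL() would return them
x <- charToRaw("name,age\njoe,41\nanna,23\n")

# rawToChar() recovers the text; read.csv() parses it via a text connection
dat <- read.csv(textConnection(rawToChar(x)))
dat
```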
Re: [R] Summing over specific columns in a matrix
On Fri, Jan 7, 2011 at 8:42 AM, Henrique Dallazuanna www...@gmail.com wrote: Try this: rowSums(rowsum(t(m), rep(1:3, c(2, 2, 1)), na.rm = TRUE)) On Fri, Jan 7, 2011 at 2:29 PM, emj83 stp08...@shef.ac.uk wrote: Hi, I would like to sum some specific columns in my matrix - for example, my matrix looks like this: [,1] [,2] [,3] [,4] [,5] [1,] 1 NA NA NA NA [2,] 2 1 NA 1 NA [3,] 3 2 1 2 1 [4,] 4 3 2 3 2 [5,] NA NA NA 4 3 [6,] NA NA NA 5 NA I would like to find the sum of the first two columns, the sum of the second two columns, and the sum of the last column, i.e. I am left with a vector of c(16, 18, 6). Can you help me extend this example? I'd like to get (PL_Pos - Costs) for each row in Data1, sum those results for each matching date in Result1, and put the result in a new column in Result1 called 'Daily'. Been messing with this for an hour now. Nothing comes close. Thanks, Mark Result1 = structure(list(TradeDates = structure(c(14249, 14250, 14251, 14252, 14253, 14256, 14257, 14258, 14259, 14260, 14263, 14264), class = "Date")), .Names = "TradeDates", row.names = c(NA, 12L), class = "data.frame") Data1 = structure(list(Trade = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22), PosType = c(1, 1, -1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, -1, -1, 1, 1, -1, 1, 1, 1, -1), EnDate = structure(c(14249, 14250, 14251, 14253, 14256, 14256, 14256, 14257, 14258, 14258, 14259, 14259, 14260, 14264, 14264, 14265, 14266, 14267, 14270, 14271, 14273, 14274), class = "Date"), EnTime = c(1406, 1318, 838, 846, 846, 1038, 1102, 918, 838, 950, 1134, 1254, 1110, 846, 1318, 854, 950, 838, 1246, 838, 854, 902), ExDate = structure(c(14249, 14250, 14251, 14253, 14256, 14256, 14256, 14257, 14258, 14258, 14259, 14259, 14260, 14264, 14264, 14265, 14266, 14267, 14270, 14271, 14273, 14274), class = "Date"), ExTime = c(1515, 1515, 1030, 942, 1030, 1046, 1110, 1515, 942, 1030, 1142, 1515, 1515, 1030, 1326, 1515, 1515, 1030, 1515, 1515, 1515, 1022), PL_Pos = c(133.5, -41.5, 171, 483.5, 333.5, -29, -54, -291.5, 596, -141.5, -54, 558.5, 533.5, 521, -41.5, 883.5, 358.5, -979, -191.5, 196, -791.5, 446), Costs = c(29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29), PL = c(104.5, -70.5, 142, 454.5, 304.5, -58, -83, -320.5, 567, -170.5, -83, 529.5, 504.5, 492, -70.5, 854.5, 329.5, -1008, -220.5, 167, -820.5, 417)), .Names = c("Trade", "PosType", "EnDate", "EnTime", "ExDate", "ExTime", "PL_Pos", "Costs", "PL"), row.names = c(NA, 22L), class = "data.frame") Result1 Data1
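Mark's follow-up question (summing PL_Pos - Costs per date into a Daily column of Result1) goes unanswered in the thread; one plausible sketch uses aggregate() on a formula and merge(). The toy data below is made up for illustration, with column names mirroring his Result1/Data1:

```r
# Toy stand-ins for Mark's Data1 and Result1 (values invented for illustration)
Data1 <- data.frame(
  ExDate = as.Date(c("2009-01-06", "2009-01-06", "2009-01-07")),
  PL_Pos = c(133.5, -41.5, 171),
  Costs  = c(29, 29, 29))
Result1 <- data.frame(TradeDates = as.Date(c("2009-01-06", "2009-01-07")))

# Sum (PL_Pos - Costs) within each exit date
daily <- aggregate(PL_Pos - Costs ~ ExDate, data = Data1, FUN = sum)
names(daily) <- c("TradeDates", "Daily")

# Attach the per-date totals to Result1, keeping all Result1 dates
Result1 <- merge(Result1, daily, all.x = TRUE)
Result1
```

merge(..., all.x = TRUE) leaves NA in Daily for dates with no trades, which seems the safer default for a trading calendar.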
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
I think that the strategy of Editors simply telling the authors to share or perish is a bit naïve. There are a number of practical challenges that need to be addressed in order to create a fair and effective open-learning environment. Eysenbach (BMJ 2001) and Vickers (2006) discuss these and some partial solutions. We need more creative thinking that uses both carrots and sticks. We also need more empirical experience with this. Perhaps we can learn from fields, if there are any, that do a good job of data sharing and open learning. Best, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu -Original Message- From: Spencer Graves [mailto:spencer.gra...@structuremonitoring.com] Sent: Friday, January 07, 2011 1:01 PM To: Ravi Varadhan Cc: 'Mike Marchywka'; r-help@r-project.org Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity I applaud your efforts, Ravi. Regarding Whose data is it?, I humbly suggest that referees and editorial boards push (demand?) for rules that require the raw data be made available to the referees and concurrent with publication. Spencer
Re: [R] Accessing data via url
Well, it seems to mean that the file has moved. It, to my untrained eye, seems to be in the same spot as it was yesterday. BTW I am now talking about dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" Would this indicate it's physically moving on Google servers? I had not realised there was something like rawToChar(). Amazing what R does. Thanks, --- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:21 PM Take a look at the return of: rawToChar(x)
Re: [R] Assumptions for ANOVA: the right way to check the normality
A lot of this depends on what question you are really trying to answer. For one-way anova, replacing y-values with their ranks essentially transforms the distribution to uniform (under the null), and the Central Limit Theorem kicks in for the uniform with samples larger than about 5, so the normal approximations are pretty good and the theory works. But what are you actually testing? The most meaningful null that is being tested is that all data come from the exact same distribution. So what does it mean when you reject that null? It means that the groups are not all representing the same distribution, but is that because the means differ? Or the variances? Or the shapes? It can be any of those.

Some point out that if you make certain assumptions, such as symmetry or shifts of the same distribution, then you can talk about differences in means or medians, but usually if I am using non-parametrics it is because I don't believe that things are symmetric, and the shift idea doesn't fit in my mind. Some alternatives include bootstrapping or permutation tests, or just transforming the data to get something closer to normal.

Now what does replacing by ranks do in two-way anova, where we want to test the difference in one factor without making assumptions about whether the other factor has an effect or not? I'm not sure on this one.

I have seen regression on ranks; it basically tests for some level of relationship, but regression is usually used for some type of prediction, and predicting from a rank-rank regression does not seem meaningful to me. Fitting the regression model does not require normality; it is the tests on the coefficients and the confidence and prediction intervals that assume normality (again, the CLT helps for large samples, but not for prediction intervals). Bootstrapping is an option for regression without assuming normality; transformations can also help. -- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Ward Sent: Thursday, January 06, 2011 2:00 PM To: r-help@r-project.org Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality On 06/01/2011 20:29, Greg Snow wrote: Some would argue to always use the Kruskal-Wallis test, since we never know for sure if we have normality. Personally I am not sure that I understand what exactly that test is really testing. Plus in your case you are doing a two-way anova and kruskal.test does one-way, so it will not work for your case. There are other non-parametric options. Just read this and had queries of my own and comments on this subject: Would one of these options be to rank the data before doing whatever model or test you want to do? As I understand it, this keeps the place of the data the same, but pulls extreme cases closer to the rest. Not an expert though. I've been doing lm() for my work, and I don't know if that makes an assumption of normality (my data is not normal). And I'm unsure of any other assumptions, as my texts don't really discuss them. Although I can comfortably evaluate a model, say using residual vs fitted plots, F values turned to P, resampling and confidence intervals, and looking at how sums of squares terms add to the explanation of the model. I've tried the plot() function to help graphically evaluate a model, and I want to make sure I understand what it's showing me. I think the first is showing me the model's fitted values vs the residuals, and ideally, I think, the closer the points are to the red line the better. The next plot is a Q-Q plot; the closer the points to the line, the more normal the model coefficients (or perhaps the data). I'm not sure what the next two plots are, but the third is titled Scale-Location, and it looks to have the square root of standardized residuals on y and fitted model values on x.
Might this be similar to the first plot? The final one is titled Residuals vs Leverage, which has standardized residuals on y and leverage on x, and something called Cook's Distance is plotted as well. Thanks, Ben W. Whether to use anova and other normality-based tests is really a matter of what assumptions you are willing to live with and what level of close enough you are comfortable with. Consulting with a local consultant with experience in these areas is useful if you don't have enough experience to decide what you are comfortable with. For your description, I would try the proportional odds logistic regression, but again, you should probably consult with someone who has experience rather than trying that on your own until you have more training and experience. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 From: Frodo Jedi [mailto:frodo.j...@yahoo.com] Sent: Thursday,
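Greg's point that ranking followed by a one-way anova behaves much like the Kruskal-Wallis test can be sketched with simulated data (the groups and shift below are invented for illustration):

```r
set.seed(1)
g <- gl(3, 15)                                # three groups of 15
y <- rnorm(45) + rep(c(0, 0, 1), each = 15)   # third group shifted up by 1

# Kruskal-Wallis test on the raw values
p_kw <- kruskal.test(y ~ g)$p.value

# One-way anova on the ranks: under the null the ranks are uniform,
# so the CLT makes the F-test a close analogue of Kruskal-Wallis
p_rank <- anova(lm(rank(y) ~ g))[["Pr(>F)"]][1]

c(kruskal = p_kw, rank_anova = p_rank)  # typically very close to each other
```

As Greg notes, rejecting either null only says the groups do not share one distribution; it does not by itself say whether means, variances, or shapes differ.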
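The four diagnostics Ben describes are exactly what the plot() method for lm objects draws. A minimal sketch on the built-in cars data (any lm fit would do):

```r
# Fit a simple linear model on a built-in dataset
fit <- lm(dist ~ speed, data = cars)

# plot.lm() draws, in order: Residuals vs Fitted, Normal Q-Q,
# Scale-Location, and Residuals vs Leverage (with Cook's distance contours).
# Note the Q-Q plot assesses normality of the *residuals*, not of the raw data.
par(mfrow = c(2, 2))
plot(fit)
```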
Re: [R] Accessing data via url
Great, but how did you do that? --- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:31 PM Google redirects to: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7crKufJCh6WJtrooAoCEKIYzT9u-TObiYElZaXj2XQhKRI9-oEMyKH3VN8YGuNtMHs2ec2qKeF0YBWrpSd9jTIs2JI4e6aR60v_AirgtSahjNNjoReLiE2XPZrhm4SF_NwCTu9rBwmirss8EP3UKkoAQ8Nm7JW_Sa6C3Jp-kcbSeo-kl4bCXrZ9yMfS_Ex9maq84-ON6GiGuuHbvtNxuF4HLY3UK7HIjupG3fQYzO9p3UkkTH1c%3Dattredirects=0 On Fri, Jan 7, 2011 at 4:29 PM, John Kane jrkrid...@yahoo.ca wrote: Well, it seems to mean that the file has moved. It, to my untrained eye, seems to be in the same spot as it was yesterday. BTW I am now talking about dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" Would this indicate it's physically moving on Google servers? I had not realised there was something like rawToChar(). Amazing what R does.
Thanks, --- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url Received: Friday, January 7, 2011, 1:21 PM Take a look at the return of: rawToChar(x) -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
--- On Fri, 1/7/11, Peter Langfelder peter.langfel...@gmail.com wrote: From: Peter Langfelder peter.langfel...@gmail.com Subject: Re: [R] Wyy off topic...Statistical methods, pub bias, scientific validity To: r-help@r-project.org r-help@r-project.org Received: Friday, January 7, 2011, 2:06 AM From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. That being said, it would seem that biomedical sciences do make progress, so some of the published results are presumably correct :) Totally a placebo effect :) Peter On Thu, Jan 6, 2011 at 9:13 PM, Spencer Graves spencer.gra...@structuremonitoring.com wrote: Part of the phenomenon can be explained by the natural censorship in what is accepted for publication: Stronger results tend to have less difficulty getting published. Therefore, given that a result is published, it is evident that the estimated magnitude of the effect is in average larger than it is in reality, just by the fact that weaker results are less likely to be published. A study of the literature on this subject might yield an interesting and valuable estimate of the magnitude of this selection bias. A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research (http://portal.unesco.org/education/en/ev.php-URL_ID=21052URL_DO=DO_TOPICURL_SECTION=201.html). 
For example, I've heard claims (which I cannot substantiate right now) that cell phone companies allegedly lobbied successfully to block funding for researchers they thought were likely to document health problems with their products. Related claims have been made by scientists in the US Food and Drug Administration that certain therapies were approved on political grounds in spite of substantive questions about the validity of the research backing the request for approval (e.g., www.naturalnews.com/025298_the_FDA_scientists.html). Some of these accusations of political corruption may be groundless. However, as private funding replaces tax money for basic science, we must expect an increase in research results that match the needs of the funding agency while degrading the quality of published research. This produces more research that can not be replicated -- effects that get smaller upon replication. (My wife and I routinely avoid certain therapies recommended by physicians, because the physicians get much of their information on recent drugs from the pharmaceuticals, who have a vested interest in presenting their products in the most positive light.) Spencer On 1/6/2011 2:39 PM, Carl Witthoft wrote: The next week's New Yorker has some decent rebuttal letters. The case is hardly as clear-cut as the author would like to believe. Carl __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
On 01/07/2011 06:13 AM, Spencer Graves wrote: A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research Maybe I'm too pessimistic, but the term _political_ corruption reminds me that I can just as easily imagine a funding bias* in public funding. And I'm not sure it is (or would be) less of a problem just because the interests of private funding are easier to spot. * I think of bias on both sides: the funding agency selecting the studies to support and the researcher subconsciously complying to the expectations of the funding agency. On 01/07/2011 08:06 AM, Peter Langfelder wrote: From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias. Even if the number of all the tests were known, I have the impression that the corrected p-value would be kind of the right answer to the wrong question. I'm not particularly interested in the probability of arriving at the presented findings if the null hypothesis were true. I'd rather know the probability that the conclusions are true. Switching to the language of clinical chemistry, this is: I'm presented with the sensitivity of a test, but I really want to know the positive predictive value. What is still missing with the corrected p-values is the prevalence of good ideas of the publishing scientist (not even known for all scientists). And I'm not sure this is not decreasing if the scientist generates and tests more and more ideas. 
I found my rather hazy thoughts about this much better expressed in the books of Beck-Bornholdt and Dubben (which I'm afraid are only available in German). Conclusion: try to be/become a good scientist: with a high prevalence of good ideas. At least with a high prevalence of good ideas among the tested hypotheses. Including thinking first which hypotheses are the ones to test, and not giving in to the temptation to try out more and more things as one gets more familiar with the experiment/data set/problem. The latter I find very difficult. Including the experience of giving a presentation where I explicitly talked about why I did not do any data-driven optimization of my models. Yet in the discussion I was very prominently told I need to try in addition these other pre-processing techniques and these other modeling techniques - even by people whom I know to be very much aware and concerned about optimistically biased validation results. Which were of course very valid questions (and easy to comply), but I conclude it is common/natural/human to have and want to try out more ideas. Also, after several years in the field and with the same kind of samples of course I run the risk of my ideas being overfit to our kind of samples - this is a cost that I have to pay for the gain due to experience/expertise. Some more thoughts: - reproducibility: I'm analytical chemist. We have huge amounts of work going into round robin trials in order to measure the natural variability of different labs on very defined systems. - we also have huge amounts of work going into calibration transfer, i.e. making quantitative predictive models work on a different instrument. This is always a whole lot of work, and for some fields of problems at the moment considered basically impossible even between two instruments of the same model and manufacturer. The quoted results on the mice are not very astonishing to me... 
;-)
- Talking about (not so) astonishing differences between replications of experiments: I find myself moving from reporting ± 1 standard deviation to reporting e.g. the 5th to 95th percentiles. Not only because my data distributions are often not symmetric, but also because I find I'm not able to directly perceive the real spread of the data from a standard deviation error bar. This is all about perception; of course I can reflect about the meaning. Such a reflection also tells me that one student having a really unlikely number of right guesses is unlikely but not impossible. There is no statistical law stating that unlikely events happen only with large sample sizes/numbers of tests. Yet the immediate perception is completely different.
- I happily agree with the ideas of publishing findings (conclusions) as well as the data and data analysis code I used to arrive there. But I'm aware that part of this agreement is due to the fact that I'm quite interested in the data analytical methods (I'd say as well as in the particular chemical-analytical problem at hand, but rather
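The sensitivity vs. positive-predictive-value point made above can be made concrete with Bayes' rule. The sketch below uses invented numbers (alpha, power, prevalence are all assumptions for illustration):

```r
# Illustrative only: how the "prevalence of good ideas" drives the
# positive predictive value of a significant result (numbers invented).
alpha <- 0.05        # false-positive rate (1 - specificity)
power <- 0.80        # sensitivity: P(significant | idea is true)
prevalence <- 0.10   # fraction of tested hypotheses that are actually true

# Bayes' rule: P(idea true | significant result)
ppv <- (power * prevalence) /
       (power * prevalence + alpha * (1 - prevalence))
ppv   # 0.64: about a third of "significant" findings would be false
```

With a lower prevalence of true hypotheses, the same alpha and power give a much lower PPV, which is the point of the clinical-chemistry analogy.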
Re: [R] Accessing data via url
I don't know how Henrique did it, but in Firefox one can go to the Downloads panel and right click on the downloaded file and choose Copy Download link (or something similar) and get: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7cpNemjCFz14tAP3IPYCsAnvo-JJbgPNnPEWN_evBHG2jEYaNFOIT6GZF4M3VuKzioPZwvX7QSvMDWfJ3pHac5JK5BHyflOGBLOo_v44C0oU2V6teTwnjeg4TFufeltT-i5T3ThkuyesCztr6g2yLl65YcckwlEGEDtS-L9yzVe1B6tFEu2n6sjAOV9EHokEFx8e-HDFyf-u5mVIGMPgCHvaQL8pupVz-1p1rEdPpS0f6pqApTc%3Dattredirects=0 -- David. On Jan 7, 2011, at 1:35 PM, John Kane wrote: Great, but how did you do that? --- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne- biomed.de Received: Friday, January 7, 2011, 1:31 PM Google redirect to: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7crKufJCh6WJtrooAoCEKIYzT9u-TObiYElZaXj2XQhKRI9-oEMyKH3VN8YGuNtMHs2ec2qKeF0YBWrpSd9jTIs2JI4e6aR60v_AirgtSahjNNjoReLiE2XPZrhm4SF_NwCTu9rBwmirss8EP3UKkoAQ8Nm7JW_Sa6C3Jp-kcbSeo-kl4bCXrZ9yMfS_Ex9maq84-ON6GiGuuHbvtNxuF4HLY3UK7HIjupG3fQYzO9p3UkkTH1c%3Dattredirects=0 On Fri, Jan 7, 2011 at 4:29 PM, John Kane jrkrid...@yahoo.ca wrote: Well it seems to mean that the file has moved. It , to my untrained eye, seems to be in the same sport as it was yesterday BTW I am now talking about dd - https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv Would this indicate it's physically moving on googe servers? I had not realised there was something like rawToChar(). Amazing what R does. 
Thanks, --- On Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:21 PM Take a look on the return of: rawToChar(x) On Fri, Jan 7, 2011 at 4:13 PM, John Kane jrkrid...@yahoo.ca wrote: --- On Fri, 1/7/11, Dieter Menne dieter.me...@menne-biomed.de wrote: From: Dieter Menne dieter.me...@menne-biomed.de Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter Since replaced with dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" library(RCurl) dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" x = getBinaryURL(dd, ssl.verifypeer = FALSE) seems to be downloading a binary file. Thanks -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O [[alternative HTML version deleted]] David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
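The pieces discussed in this thread (getBinaryURL plus rawToChar) can be combined into a complete read. A sketch, using the URL from the thread (the file may no longer exist, and disabling peer verification is insecure in general):

```r
library(RCurl)

# URL from the thread; ssl.verifypeer = FALSE skips certificate
# verification and should only be a last resort
dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv"
x <- getBinaryURL(dd, ssl.verifypeer = FALSE)

# convert the raw bytes to text and parse them as a CSV
con <- textConnection(rawToChar(x))
dat <- read.csv(con)
close(con)
str(dat)
```

For a whitespace-delimited file like trees.txt, `read.table(con, header = TRUE)` would replace `read.csv(con)`.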
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
The issue Spencer brings up is a problem whether the funding is private or public. Just as businesses fund studies that support their goals, government agencies fund studies that justify the need for their services and expansion of their powers and budgets. In fact, there's a whole field of study variously called public choice economics and the new institutional economics that studies these and related issues. On a related note, there is certainly a lot of self-selection bias in what fields of study people choose to enter. For just one example, it isn't too difficult to believe that of the pool of people talented and interested in statistics, those who choose to enter public health or epidemiology might be more likely to want research that justifies expansion of public health and environmental agencies' regulatory powers, and this might affect the research questions they ask, the ways they design and select their statistical models, and what results they choose to include and exclude from publications. AFAIK, there is substantial evidence that researchers, especially in non-experimental studies, tend to get results they expect or hope to find, even if they feel no conscious bias. This is likely one of the reasons observational studies are so frequently overturned by randomized controlled trials. RCTs provide less room for confirmation bias to rear its ugly head. Joel -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Spencer Graves Sent: Thursday, January 06, 2011 9:13 PM To: Carl Witthoft Cc: r-help@r-project.org Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research (http://portal.unesco.org/education/en/ev.php-URL_ID=21052&URL_DO=DO_TOPIC&URL_SECTION=201.html). 
...as private funding replaces tax money for basic science, we must expect an increase in research results that match the needs of the funding agency while degrading the quality of published research. This produces more research that cannot be replicated -- effects that get smaller upon replication. (My wife and I routinely avoid certain therapies recommended by physicians, because the physicians get much of their information on recent drugs from the pharmaceuticals, who have a vested interest in presenting their products in the most positive light.) Spencer On 1/6/2011 2:39 PM, Carl Witthoft wrote: The next week's New Yorker has some decent rebuttal letters. The case is hardly as clear-cut as the author would like to believe. Carl __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Assumptions for ANOVA: the right way to check the normality
I believe what I'm doing is an ANCOVA, because I have two categorical and one numerical explanatory variable, and a numerical response variable (this is the same experiment as before, the bacteria), and I'm just, at the minute (because I'm only half way through), doing some modelling and seeing what I get with what I currently have. And I'm paying attention to 95% CI for the different terms of a model, as well as the coefficient, the explanatory power of the term, and the likelihood that the same result could be obtained at random through the P values, derived from F. To be honest I haven't checked much what my data distributions are like and such because I'm not finished collecting it yet. I mainly mentioned the ranking because it was given considerable mention in the section on hypothesis testing on models in one of my texts. On 07/01/2011 18:34, Greg Snow wrote: A lot of this depends on what question you are really trying to answer. For one way anova replacing y-values with their ranks essentially transforms the distribution to uniform (under the null) and the Central Limit Theorem kicks in for the uniform with samples larger than about 5, so the normal approximations are pretty good and the theory works, but what are you actually testing? The most meaningful null that is being tested is that all data come from the exact same distribution. So what does it mean when you reject that null? It means that all the groups are not representing the same distribution, but is that because the means differ? Or the variances? Or the shapes? It can be any of those. Some point out that if you make certain assumptions such as symmetry or shifts of the same distributions, then you can talk about differences in means or medians, but usually if I am using non-parametrics it is because I don't believe that things are symmetric and the shift idea doesn't fit in my mind. 
Some alternatives include bootstrapping or permutation tests, or just transforming the data to get something closer to normal. Now what does replacing by ranks do in 2-way anova where we want to test the difference in one factor without making assumptions about whether the other factor has an effect or not? I'm not sure on this one. I have seen regression on ranks, it basically tests for some level of relationship, but regression is usually used for some type of prediction and predicting from a rank-rank regression does not seem meaningful to me. Fitting the regression model does not require normality, it is the tests on the coefficients and confidence and prediction intervals that assume normality (again the CLT helps for large samples (but not for prediction intervals)). Bootstrapping is an option for regression without assuming normality, transformations can also help. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Ben Ward Sent: Thursday, January 06, 2011 2:00 PM To: r-help@r-project.org Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality On 06/01/2011 20:29, Greg Snow wrote: Some would argue to always use the kruskal wallis test since we never know for sure if we have normality. Personally I am not sure that I understand what exactly that test is really testing. Plus in your case you are doing a two-way anova and kruskal.test does one-way, so it will not work for your case. There are other non-parametric options. Just read this and had queries of my own and comments on this subject: Would one of these options be to rank the data before doing whatever model or test you want to do? As I understand it makes the place of the data the same, but pulls extreme cases closer to the rest. Not an expert though. 
I've been doing lm() for my work, and I don't know if that makes an assumption of normality (my data is not normal). And I'm unsure of any other assumptions as my texts don't really discuss them. Although I can comfortably evaluate a model, say using residuals vs fitted values, F values turned to P, resampling and confidence intervals, and looking at how much the sums-of-squares terms add to the explanation of the model. I've tried the plot() function to help graphically evaluate a model, and I want to make sure I understand what it's showing me. I think the first plot is showing me the model's fitted values vs the residuals, and ideally, I think the closer the points are to the red line the better. The next plot is a Q-Q plot; the closer the points to the line, the more normal the model coefficients (or perhaps the data). I'm not sure what the next two plots are; the third is titled Scale-Location, and it looks to have the square root of standardized residuals on y, and fitted model values on x. Might this be similar to the first plot? The final one is titled Residuals vs Leverage, which has standardized residuals on y and leverage
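The four panels being described are the default diagnostics for an lm/aov fit. A minimal sketch on simulated data (all variable names here are invented for illustration); note the Q-Q plot assesses the *residuals*, not the coefficients or the raw response:

```r
# Simulated ANCOVA-like data: one numeric and one categorical predictor
set.seed(1)
growth <- data.frame(dose   = runif(40, 0, 10),
                     strain = gl(2, 20, labels = c("wt", "mut")))
growth$od <- 1 + 0.3 * growth$dose + rnorm(40, sd = 0.5)

fit <- lm(od ~ dose * strain, data = growth)
par(mfrow = c(2, 2))
plot(fit)  # 1: residuals vs fitted (look for non-random pattern)
           # 2: normal Q-Q of standardized residuals
           # 3: scale-location, sqrt(|std. resid.|) vs fitted
           #    (checks constant variance; similar axes to plot 1)
           # 4: residuals vs leverage, with Cook's distance contours
```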
Re: [R] plot without points overlap
On Fri, Jan 7, 2011 at 10:34 AM, JD joon...@gmail.com wrote: Hey, thanks for the suggestions but I'm still having some problem. I was able to modify the size of the pdf/ps file that I generate, This is a somewhat different question. The size of the finally created portable document or postscript file is determined by the width and height arguments when you start the respective device. I would just expand the size of the PDF/PS. If you have a particular aspect ratio you desire, then just be sure to expand height proportionally when you add width to accommodate the labels. but still labels of the right side of the heatmap are not plotted entirely. I tried par(mar=..) and par(pin=..) but still cannot do it, any ideas? Thanks! g 2011/1/4 Joshua Wiley jwiley.ps...@gmail.com Hi, You can set the device region in inches using the pin argument (see ?par maybe halfway down or so). You can also set the aspect ratio in plot(), but I am not sure that is really what you want (see ?plot.window for that). Two Examples ### par(pin = c(2, 4)) plot(1:10) dev.off() plot(1:10, asp = 2) ### Hope that helps, Josh On Tue, Jan 4, 2011 at 8:46 AM, joonR joon...@gmail.com wrote: Hi, I'm trying to plot a grid of points of different dimensions using the simple plot() function. I want to plot the points such that they DO NOT overlap, I guess there should be a way to set a maximum distance between the points, but I cannot find it. Can you help? Thanks a lot! g PS: Is it possible to produce device regions of different dimensions? (i.e. a rectangular one with height width) -- View this message in context: http://r.789695.n4.nabble.com/plot-without-points-overlap-tp3173894p3173894.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. 
Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
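For the specific heatmap-label problem in this thread, note that heatmap() lays out its own plot regions, so `par(mar=)` is typically overridden; its own `margins` argument is the operative one. A sketch (file name, sizes, and labels are arbitrary):

```r
# Enlarge the device itself (inches), then give heatmap() room for labels
pdf("heatmap.pdf", width = 10, height = 7)

x <- matrix(rnorm(100), nrow = 10,
            dimnames = list(paste("a_rather_long_row_label_", 1:10, sep = ""),
                            paste("c", 1:10, sep = "")))

# margins = c(column, row): widen the second value so the row labels
# on the right-hand side are not clipped
heatmap(x, margins = c(5, 12))

dev.off()
```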
Re: [R] plot without points overlap
Hey, thanks for the suggestions but I'm still having some problem. I was able to modify the size of the pdf/ps file that I generate, but still labels of the right side of the heatmap are not plotted entirely. I tried par(mar=..) and par(pin=..) but still cannot do it, any ideas? Thanks! g 2011/1/4 Joshua Wiley jwiley.ps...@gmail.com Hi, You can set the device region in inches using the pin argument (see ?par maybe halfway down or so). You can also set the aspect ratio in plot(), but I am not sure that is really what you want (see ?plot.window for that). Two Examples ### par(pin = c(2, 4)) plot(1:10) dev.off() plot(1:10, asp = 2) ### Hope that helps, Josh On Tue, Jan 4, 2011 at 8:46 AM, joonR joon...@gmail.com wrote: Hi, I'm trying to plot a grid of points of different dimensions using the simple plot() function. I want to plot the points such that they DO NOT overlap, I guess there should be a way to set a maximum distance between the points, but I cannot find it. Can you help? Thanks a lot! g PS: Is it possible to produce device regions of different dimensions? (i.e. a rectangular one with height width) -- View this message in context: http://r.789695.n4.nabble.com/plot-without-points-overlap-tp3173894p3173894.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to export/save an mrpp object?
Gavin: [...] I think you should read ?load to rethink what these functions do and how they do it. [...] Absolutely correct. I will. Thank you for your time Gavin, Nikos __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] rprofile.site
Hi all, I just installed R 2.12.1 and Tinn-R 2.3.7.1. My OS is Windows 7 32bit. Here is my question: Do I have to execute the rprofile.site manually every time I start an R/Tinn-R session? Because it seems that way. Here is the error code I receive after sending a selection (echo=TRUE): source(.trPaths[5], echo=TRUE, max.deparse.length=150) It doesn't do that after I configure (R > Configure > permanent (rprofile.site)) but after I closed R/Tinn-R I have to configure R again. Is that necessary? Thank you, Julian __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
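Background that answers the question: anything placed in `R_HOME/etc/Rprofile.site` is sourced at every R startup (see ?Startup), so once Tinn-R's `.trPaths` definition lives there it does not need to be re-configured each session. The exact lines ship with Tinn-R; the snippet below only illustrates the general shape and is an assumption, not the verbatim Tinn-R code:

```r
# Sketch of the kind of definition Tinn-R needs at startup: a vector of
# temp-file paths under %APPDATA%\Tinn-R\tmp (names are Tinn-R's, the
# exact list may differ between Tinn-R versions).
.trPaths <- paste(paste(Sys.getenv("APPDATA"), "\\Tinn-R\\tmp\\", sep = ""),
                  c("", "search.txt", "objects.txt", "file.r",
                    "selection.r", "block.r", "lines.r"), sep = "")
```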
[R] Compare the risk
I have a dataset which looks like this (columns: cbr = chronic bronchitis, dust, smoking, expo):

   cbr dust smoking expo
1    0 0.20       1    5
2    0 0.25       1    4
3    0 0.25       1    8
4    0 0.25       1    4
5    0 0.25       1    4
6    0 0.25       1    8
7    0 0.25       1    8
8    0 0.25       1    4
9    1 0.25       1    8
10   0 0.29       1    7
11   0 0.29       0    7
12   0 0.29       1    7
13   0 0.30       0   10
14   0 0.30       1   10
15
16
. . .

How can I compare the risk of getting bronchitis between smokers and non-smokers? -- View this message in context: http://r.789695.n4.nabble.com/Compare-the-risk-tp3189084p3189084.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
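One way to answer this question: cross-tabulate outcome by exposure for a crude relative risk, or model the outcome to adjust for the other columns. A sketch on an invented mini data frame using the column names from the post:

```r
# Invented mini data set with the thread's column names (illustration only)
dat <- data.frame(cbr     = c(0, 0, 0, 1, 1, 0, 1, 0),
                  smoking = c(1, 1, 0, 1, 0, 1, 1, 0),
                  dust    = c(0.20, 0.25, 0.25, 0.25, 0.29, 0.29, 0.30, 0.30),
                  expo    = c(5, 4, 8, 8, 7, 7, 10, 10))

# Crude comparison: P(cbr = 1) among smokers vs non-smokers
tab  <- table(smoking = dat$smoking, cbr = dat$cbr)
risk <- prop.table(tab, margin = 1)[, "1"]
rr   <- risk["1"] / risk["0"]   # relative risk, smokers vs non-smokers

# Adjusted comparison: logistic regression controlling for dust and expo;
# the smoking coefficient is then a log odds ratio
fit <- glm(cbr ~ smoking + dust + expo, family = binomial, data = dat)
summary(fit)
```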
Re: [R] vector of character with unequal width
On Fri, Jan 07, 2011 at 02:55:18PM +, jose Bartolomei wrote: Dear R users, Thanks for your help. The recommendations were exactly what I was searching for. Below the one I will use for my script. Thanks again, jose

nchar(xx)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 4 4 4 4 4 4 4 4
[75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

z1 <- rep(0, times=length(xx))
z2 <- substr(z1, 1, 9 - nchar(xx))
xx <- paste(z2, xx, sep="")
xx <- substring(xx, 6, 9)

nchar(xx)
 [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[38] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

Let me suggest a slightly simpler code, which produces the same output, if the input has length at most 9.

xx <- c("abc", "abcd", "abcde")
xx <- paste("000", xx, sep="")
xx <- substr(xx, nchar(xx) - 3, nchar(xx))
cbind(xx)
#     xx
#[1,] "0abc"
#[2,] "abcd"
#[3,] "bcde"

Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
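If the values being padded are really numeric codes rather than arbitrary strings, the pad-and-truncate idiom above can be replaced by a single formatting call:

```r
# Zero-pad numeric codes to a fixed width in one step
x <- c(7, 42, 583, 1234)
sprintf("%04d", x)                                # "0007" "0042" "0583" "1234"
formatC(x, width = 4, flag = "0", format = "d")   # same result
```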
[R] Random Effects Meta Regression
Hi All, I have run a series of random-effects meta-analyses on binomial outcomes using the metabin function in R. Now I would like to conduct some random-effects meta-regressions on the outcomes. Is there a command available which will allow me to test the impact of a certain variable on the relative treatment effect from my meta-analyses? Many Thanks, Steph -- View this message in context: http://r.789695.n4.nabble.com/Random-Effects-Meta-Regression-tp3180267p3180267.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
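If the meta package version in use provides metareg(), it fits a meta-regression directly on top of a metabin object (metafor::rma() is an alternative). A sketch with invented study data; the covariate `year` and all counts are placeholders, and the metareg() argument order may differ between meta versions:

```r
library(meta)

# Invented example: events/sample sizes per study plus a study-level covariate
dat <- data.frame(event.e = c(12, 15,  8, 20), n.e = c(50, 60, 40, 80),
                  event.c = c(18, 18, 14, 28), n.c = c(50, 62, 41, 79),
                  year    = c(1999, 2003, 2007, 2010))

m  <- metabin(event.e, n.e, event.c, n.c, data = dat, sm = "RR")
mr <- metareg(m, ~ year)   # does 'year' moderate the relative treatment effect?
mr
```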
Re: [R] how to calculate this natural logarithm
On Sat, Jan 08, 2011 at 12:20:59AM +0800, zhaoxing731 wrote: Hello I want to calculate the natural logarithm of a sum of combinations as follows: (R code) { com_sum=choose(200,482)*choose(100,118)+choose(200,483)*choose(100,117)+...+choose(200,i)*choose(100,600-i)+...+choose(200,600)*choose(100,0) #calculate the sum result=log(com_sum) #calculate the log of the sum } But every element of the com_sum is Inf, so I can't get the result Thank you in advance

The natural logarithm of choose() is lchoose(), so the natural logarithms of the products above are

i <- 482:600
logx <- lchoose(200,i) + lchoose(100,600-i)
maxlog <- max(logx) # [1] 5675.315

A sum of numbers of very different magnitudes may be approximated by its largest term, so max(logx) is an approximation of the required logarithm. A more accurate calculation can be done, for example, as follows

maxlog + log(sum(exp(logx - maxlog))) # [1] 5675.977

Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Accessing data via url
Google redirects to: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7crKufJCh6WJtrooAoCEKIYzT9u-TObiYElZaXj2XQhKRI9-oEMyKH3VN8YGuNtMHs2ec2qKeF0YBWrpSd9jTIs2JI4e6aR60v_AirgtSahjNNjoReLiE2XPZrhm4SF_NwCTu9rBwmirss8EP3UKkoAQ8Nm7JW_Sa6C3Jp-kcbSeo-kl4bCXrZ9yMfS_Ex9maq84-ON6GiGuuHbvtNxuF4HLY3UK7HIjupG3fQYzO9p3UkkTH1c%3D&attredirects=0 On Fri, Jan 7, 2011 at 4:29 PM, John Kane jrkrid...@yahoo.ca wrote: Well it seems to mean that the file has moved. It, to my untrained eye, seems to be in the same spot as it was yesterday. BTW I am now talking about dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" Would this indicate it's physically moving on google servers? I had not realised there was something like rawToChar(). Amazing what R does. Thanks, --- On *Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com* wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:21 PM Take a look on the return of: rawToChar(x) On Fri, Jan 7, 2011 at 4:13 PM, John Kane jrkrid...@yahoo.ca wrote: --- On Fri, 1/7/11, Dieter Menne dieter.me...@menne-biomed.de wrote: From: Dieter Menne dieter.me...@menne-biomed.de Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter Since replaced with dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" library(RCurl) dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" x = getBinaryURL(dd, ssl.verifypeer = FALSE) seems to be downloading a binary file. 
Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O [[alternative HTML version deleted]]
Re: [R] Accessing data via url
In Firefox, after the download, right-click on the download and copy the download link On Fri, Jan 7, 2011 at 4:35 PM, John Kane jrkrid...@yahoo.ca wrote: Great, but how did you do that? --- On *Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com* wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:31 PM Google redirects to: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7crKufJCh6WJtrooAoCEKIYzT9u-TObiYElZaXj2XQhKRI9-oEMyKH3VN8YGuNtMHs2ec2qKeF0YBWrpSd9jTIs2JI4e6aR60v_AirgtSahjNNjoReLiE2XPZrhm4SF_NwCTu9rBwmirss8EP3UKkoAQ8Nm7JW_Sa6C3Jp-kcbSeo-kl4bCXrZ9yMfS_Ex9maq84-ON6GiGuuHbvtNxuF4HLY3UK7HIjupG3fQYzO9p3UkkTH1c%3D&attredirects=0 On Fri, Jan 7, 2011 at 4:29 PM, John Kane jrkrid...@yahoo.ca wrote: Well it seems to mean that the file has moved. It, to my untrained eye, seems to be in the same spot as it was yesterday. BTW I am now talking about dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" Would this indicate it's physically moving on google servers? I had not realised there was something like rawToChar(). Amazing what R does. 
Thanks, --- On *Fri, 1/7/11, Henrique Dallazuanna www...@gmail.com* wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] Accessing data via url To: John Kane jrkrid...@yahoo.ca Cc: r-help@r-project.org, Dieter Menne dieter.me...@menne-biomed.de Received: Friday, January 7, 2011, 1:21 PM Take a look on the return of: rawToChar(x) On Fri, Jan 7, 2011 at 4:13 PM, John Kane jrkrid...@yahoo.ca wrote: --- On Fri, 1/7/11, Dieter Menne dieter.me...@menne-biomed.de wrote: From: Dieter Menne dieter.me...@menne-biomed.de Your original file is no longer there, but when I try RCurl with a png file that is present, I get a certificate error: Dieter Since replaced with dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" library(RCurl) dd <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv" x = getBinaryURL(dd, ssl.verifypeer = FALSE) seems to be downloading a binary file. Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with 2-ways ANOVA interactions
Maybe a simple concrete example would help:

> tmpdat <- data.frame( One = rep( c('A','B'), each=10 ),
+   Two = rep( c('C','D'), each=5, length.out=20 ),
+   mu1 = rep( c(10, 11, 12, 16), each=5 ) )
> tmpdat$e <- with(tmpdat, ave( rnorm(20), One, Two, FUN=scale ) )
> tmpdat$y <- with(tmpdat, mu1 + e)
> # check the means
> tapply( tmpdat$y, tmpdat[, c('One','Two')], mean )
   Two
One  C  D
  A 10 11
  B 12 16
> # now fit the data
> fit <- aov( y ~ One*Two, data=tmpdat )
> # look at what was measured
> coef(fit)
(Intercept)        OneB        TwoD   OneB:TwoD
         10           2           1           3
> # notice:
> (16-12) - (11-10)
[1] 3
> (16-11) - (12-10)
[1] 3
> # another way of thinking
> model.tables(fit)
Tables of effects

 One
One
    A     B
-1.75  1.75

 Two
Two
    C     D
-1.25  1.25

 One:Two
   Two
One     C     D
  A  0.75 -0.75
  B -0.75  0.75

> model.tables(fit, 'means')
Tables of means
Grand mean 12.25

 One
One
   A    B
10.5 14.0

 Two
Two
   C    D
11.0 13.5

 One:Two
   Two
One  C  D
  A 10 11
  B 12 16

> fit2 <- aov( y ~ One + Two, data=tmpdat )
> model.tables(fit2)
Tables of effects

 One
One
    A     B
-1.75  1.75

 Two
Two
    C     D
-1.25  1.25

> model.tables(fit2, 'means')
Tables of means
Grand mean 12.25

 One
One
   A    B
10.5 14.0

 Two
Two
   C    D
11.0 13.5

> tmpdat2 <- expand.grid( One=c('A','B'), Two=c('C','D') )
> cbind( tmpdat2, fit=predict(fit, tmpdat2), fit2=predict(fit2, tmpdat2) )
  One Two fit  fit2
1   A   C  10  9.25
2   B   C  12 12.75
3   A   D  11 11.75
4   B   D  16 15.25

Now go back and replace the 16 with 13 and see how things change. Also, how are you accounting for people in your model? Did you have multiple people? Did each person report more than one outcome? You probably need to include person as some form of random effect to properly account for them. This is really getting to the point where you need a consultant (or several more classes). -- Gregory (Greg) L. Snow Ph.D. 
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 From: Frodo Jedi [mailto:frodo.j...@yahoo.com] Sent: Thursday, January 06, 2011 2:39 PM To: Greg Snow; r-help@r-project.org Subject: Re: [R] Problem with 2-ways ANOVA interactions Dear Greg, many many thanks, I still have a doubt: Before, I wrongly thought that if I do the analysis anova(response ~ stimulus*condition) I would get the comparison between the same stimulus in condition A and in condition AH (e.g. stimulus_1_A, stimulus_1_AH). Instead, apparently, the interaction stimulus:condition means that I find the differences between the stimuli keeping the condition fixed!! If this is true then doing the anova with the interaction stimulus:condition is equivalent to doing the ONE WAY ANOVA first on the subset where all the conditions are A and then on the subset where all the conditions are AH? Right? I think you are closer, but not quite there. The test on the interaction tests if the difference between A and AH is the same across the different stimuli. The main effect for condition tests if there is a difference between A and AH. So you mean that the interaction compares, for example, stimulus 1 in condition A with stimulus 2 in condition AH, right? Could you please also answer the question I asked at the end? That is what, in the end, I want to know: So if all before is correct, my final question is: how by means of ANOVA can I track the significant differences between the stimuli presented in A and AH condition without passing through the t-test? Indeed my goal was to find on the one hand if globally the condition AH brings better results than condition A, and on the other hand I needed to know for which stimuli the condition AH brings better results than condition A. 
Thanks Best regards [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
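A hedged sketch of one way to answer the final question above without running many separate t-tests (the data frame and column names mydata, response, stimulus, condition are assumed here, not taken from the thread): after fitting the two-way aov(), TukeyHSD() produces simultaneous pairwise comparisons with family-wise error control, including the A-vs-AH contrast within each stimulus via the interaction term.

```r
## Assumed layout: one row per observation, factors stimulus and condition
fit <- aov(response ~ stimulus * condition, data = mydata)

TukeyHSD(fit, which = "condition")            # overall A vs AH difference
TukeyHSD(fit, which = "stimulus:condition")   # all cell-wise comparisons,
                                              # incl. A vs AH per stimulus
```

The interaction table contains every pair of cells, so the per-stimulus A-vs-AH rows have to be picked out of it by hand.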
[R] survival analysis microarray expression data
For any given pre-specified gene or short list of genes, yes, the Cox model works fine. Two important caveats:

1. Remember the rule of thumb for a Cox model of 20 events per variable (not n=20). Many microarray studies will have very marginal sample size.

2. If you are looking at many genes then a completely different strategy is required. There is a large and growing literature; I like Newton et al, Annals of Applied Statistics, 2007, 85-106 as an intro; but expect to read much more.

Terry Therneau

begin included message -

I want to test the expression of a subset of genes for correlation with patient survival. I found out that the coxph function is appropriate for doing this since it works with continuous variables such as Affy mRNA expression values. I applied the following code:

cp <- coxph(Surv(t.rfs, !e.rfs) ~ ex, pData(eset.n0))
# t.rfs: time to relapse, status (0=alive, 1=dead), ex: expression value (continuous)

The results I get look sensible but I would appreciate any advice on the correctness and also any suggestions for any (better) alternative methods. Best wishes
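To make the single-gene case concrete, here is a hedged, self-contained sketch on simulated data (the variable names echo the poster's, but the values and the event coding are invented; with 1 variable and roughly 70 events, the 20-events-per-variable rule of thumb is comfortably met):

```r
library(survival)

set.seed(1)
ex    <- rnorm(100)                       # expression values for one gene
t.rfs <- rexp(100, rate = exp(0.3 * ex))  # simulated time to relapse
e.rfs <- rbinom(100, 1, 0.7)              # event indicator: 1 = relapse

fit <- coxph(Surv(t.rfs, e.rfs) ~ ex)
summary(fit)   # hazard ratio per unit of expression, Wald/LR/score tests
```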
[R] Fitting an Inverse Gamma Distribution to Survey Data
Hello, I've been attempting to fit the data below with an inverse gamma distribution. The reason for this is that outside proprietary software (@Risk) kicked back a Pearson5 (inverse gamma) as the best-fitting distribution, with a Chi-Sqr goodness-of-fit roughly 40% better than with a log-normal fit. Looking up Inverse gamma on this forum led me to the following post: http://r.789695.n4.nabble.com/Inverse-gamma-td825481.html#a825482 But I think I'm misunderstanding the suggestion made in that post. Is there a way to estimate the shape and rate parameters for an inverse gamma and then plot the PDF, as I have done below using other more readily available pdfs in R?

Thanks, Eric

library(MASS)
iniSal_US_forHist <- c(2.368000,3.532614,3.064330,3.347069,3.066333,4.233636,3.465650,2.858553,
2.946731,2.945417,2.415000,2.873019,5.521000,5.788148,5.314630,5.509672,6.032840,6.009310,
4.110833,6.073182,5.652833,4.425733,6.481852,4.076857,3.289310,4.524000,3.985811,5.399714,
4.490606,6.956729,5.270933,8.099107,5.058250,6.394500,5.644000,5.202459,5.67,3.152680,
3.220952,2.777381,3.115467,3.642759,3.488333,3.022439,2.610290,2.618571,3.218000,3.417634,
10.327317,7.344270,6.886154,4.015800,3.063103,6.832292,4.600238,2.939000,5.999027,7.894878,
4.411538,2.384762,6.816154,2.782500,2.475333,2.799138,2.739063,2.619917,2.892545,2.468167,
2.577079,2.821875,2.502500,2.969032,2.046023,3.073077,4.408000,3.411774,3.50,4.283607,
4.284000,4.276714,3.228103,2.639875,3.453194,2.821200,3.838723,1.714253,2.273750,2.611882,
2.321781,2.567500,2.557045,1.288875,2.175211,1.736000,2.250781,7.433366,7.033553,5.47,
7.132727,8.505937,9.174545,6.554487,7.060286,6.617160,8.210986,4.404045,6.062381,5.149625,
2.972105,5.358889,3.910968,3.715873,1.728966,2.843667,4.413906,3.016346,7.168636,3.839394,
3.930141,7.019882,3.459429,5.050250,3.492714,3.226667,3.987667,2.770227,3.661167,1.553000,
2.867391,2.897193,2.611707,2.577167,2.904697,2.733077,2.507241,11.044865,6.425484,8.567222,
8.552344,7.493396,4.807381,9.697869,9.471333,6.783175,4.563571,8.059649,9.448679,5.803778, 4.769423,4.424634,7.586042,4.451556,3.622373,6.390152,4.424375,4.135806,5.025400,5.410635, 7.012292,2.961071,3.192188,2.989643,3.471429,2.867966,1.980541,3.172344,2.574783,2.958983, 1.708140,3.604853,3.479000,2.845000,2.742603,2.923968,3.620308,2.452500,2.721375,3.166333, 2.742162,2.793000,3.337000,5.192025,5.365875,3.079000,8.415970,6.612277,6.734706,4.856857, 5.164783,7.743667,6.894151,4.666538,9.227167,8.077581,6.109833,6.621724,18.098182,12.705600, 15.490784,17.394750,12.422364,14.832727,8.326000,11.352400,3.431429,2.658261,3.219773,3.605185, 4.030299,3.262241,3.503250,3.522763,2.847312,2.996618,3.075769,3.387731,3.066923,3.078200, 2.466957,3.214167,2.707778,3.384839,2.283556,2.912258,3.378000,2.726750,2.95,2.195000, 4.819063,3.604578,3.694906,5.068000,4.676582,3.028831,4.261042,3.593235,4.501224,2.880317, 5.750333,3.257833,3.967458,2.522292,2.725738,2.549231,2.591389,2.990488,2.681222,2.685854, 2.284750,2.585938,2.432824,3.108875,2.611340,3.916667,2.418095,2.476406,2.801235,3.278000, 2.434921,2.617826,3.133939,2.774321,4.196173,3.764286,3.555833,5.317361,3.970800,4.136400, 4.487013,3.746393,4.754000,3.854316,3.742353,3.044079,2.817821,3.995179,3.643134,3.642593, 3.604533,2.935902,4.088310,5.344407,3.076883,3.287105,3.720870,2.032258,2.872593,5.787313, 6.017838,5.425205,4.880600,3.582295,4.90,3.489016,4.603030,5.344407,6.184286,4.047083, 4.788304,4.661325,4.815938,4.056790,3.765595,5.348772,5.200222,4.906311,3.900147,3.782897, 3.767313,3.417732,3.725455,2.888750,2.552333,2.521613,2.531522,2.510833,2.710208,2.445273, 2.619750,2.094737,2.399355,2.758000,2.317077,2.247755,3.594333,4.607805,2.69,3.084706, 3.529000,2.326200,3.309851,2.647805,2.802250,2.778667,3.231235,2.418065,3.134545,3.807843, 2.988372,2.709792,3.084035,3.633279,2.958750,2.170081,2.67,2.640706,2.841600,3.399219, 2.561373,2.574824,3.046447,2.647500,3.554875,1.865000,2.858333,2.355000,2.508082,2.376833, 
2.728710,1.752833,1.571967,1.800333,2.265455,2.353226,2.568028,2.359500,2.472639,1.675385, 2.667386,2.49,2.482632,2.593452,2.695510,2.466941,2.624211,3.835645,3.519667,2.661940, 3.516167,3.146585,3.420462,4.809833,3.028500,3.192297,4.256333,2.516897,3.033846,2.793359, 6.700714,5.441250,6.872500,4.528333,7.490328,4.673788,6.376885,3.023167,4.429167,4.317625, 16.729231,8.372500,6.279828,10.805098,8.359452,8.277576,8.008846,8.742308,12.155942,5.975063, 3.317333,2.883021,3.310822,3.119219,3.921190,3.552986,3.647162,4.017667,3.895342,3.381096, 2.769412,3.225294,4.169333,3.733919,2.859492,3.674875,3.401017,3.160267,4.109545,4.347867, 3.867000,3.672763,4.364844,3.804250,2.613000,4.289909,4.212059,4.785797,4.149687,6.299444, 5.640135,5.713448,4.766438,7.032027,5.610533,3.126111,6.322029,4.417692,6.392532,2.753175, 2.549655,3.456533,3.199000,3.852338,3.387549,3.098033,3.271733,3.679180,2.655484,6.858167, 5.808033,7.55,8.388387,5.108732,7.895152,5.223580,5.741714,8.026250,4.928421,2.797292,
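One hedged possibility (an editor's sketch, not from the thread, and assuming the full vector has been assigned to iniSal_US_forHist): MASS::fitdistr() accepts a hand-written density function, so an inverse-gamma fit only needs the density spelled out; the starting values below are rough guesses based on the sample mean and variance.

```r
library(MASS)

## Inverse-gamma density with shape a and scale s:
## f(x) = s^a / Gamma(a) * x^(-a-1) * exp(-s/x)
dinvgamma <- function(x, shape, scale) {
  exp(shape * log(scale) - lgamma(shape) - (shape + 1) * log(x) - scale / x)
}

fit <- fitdistr(iniSal_US_forHist, dinvgamma,
                start = list(shape = 6, scale = 24),
                method = "L-BFGS-B", lower = c(0.01, 0.01))
fit$estimate

## Overlay the fitted PDF on a histogram of the data
hist(iniSal_US_forHist, freq = FALSE, breaks = 40)
curve(dinvgamma(x, fit$estimate["shape"], fit$estimate["scale"]),
      add = TRUE, col = "red")
```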
Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity
Conclusion: try to be/become a good scientist: with a high prevalence of good ideas. Or, I would say: try to publish only good and mature ideas. Gauss said it best: "pauca sed matura" — "few, but ripe".

Ravi.

---
Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Claudia Beleites
Sent: Friday, January 07, 2011 1:46 PM
To: r-help@r-project.org
Subject: Re: [R] Waaaayy off topic...Statistical methods, pub bias, scientific validity

On 01/07/2011 06:13 AM, Spencer Graves wrote: A more insidious problem, that may not affect the work of Jonah Lehrer, is political corruption in the way research is funded, with less public and more private funding of research

Maybe I'm too pessimistic, but the term _political_ corruption reminds me that I can just as easily imagine a funding bias* in public funding. And I'm not sure it is (or would be) less of a problem just because the interests of private funding are easier to spot.

* I think of bias on both sides: the funding agency selecting the studies to support and the researcher subconsciously complying with the expectations of the funding agency.

On 01/07/2011 08:06 AM, Peter Langfelder wrote: From a purely statistical and maybe somewhat naive point of view, published p-values should be corrected for the multiple testing that is effectively happening because of the large number of published studies. My experience is also that people will often try several statistical methods to get the most significant p-value but neglect to share that fact with the audience and/or at least attempt to correct the p-values for the selection bias.

Even if the number of all the tests were known, I have the impression that the corrected p-value would be kind of the right answer to the wrong question.
I'm not particularly interested in the probability of arriving at the presented findings if the null hypothesis were true. I'd rather know the probability that the conclusions are true. Switching to the language of clinical chemistry, this is: I'm presented with the sensitivity of a test, but I really want to know the positive predictive value. What is still missing with the corrected p-values is the prevalence of good ideas of the publishing scientist (not even known for all scientists). And I'm not sure this is not decreasing if the scientist generates and tests more and more ideas. I found my rather hazy thoughts about this much better expressed in the books of Beck-Bornholdt and Dubben (which I'm afraid are only available in German). Conclusion: try to be/become a good scientist: with a high prevalence of good ideas. At least with a high prevalence of good ideas among the tested hypotheses. Including thinking first which hypotheses are the ones to test, and not giving in to the temptation to try out more and more things as one gets more familiar with the experiment/data set/problem. The latter I find very difficult. Including the experience of giving a presentation where I explicitly talked about why I did not do any data-driven optimization of my models. Yet in the discussion I was very prominently told I need to try in addition these other pre-processing techniques and these other modeling techniques - even by people whom I know to be very much aware and concerned about optimistically biased validation results. Which were of course very valid questions (and easy to comply), but I conclude it is common/natural/human to have and want to try out more ideas. Also, after several years in the field and with the same kind of samples of course I run the risk of my ideas being overfit to our kind of samples - this is a cost that I have to pay for the gain due to experience/expertise. Some more thoughts: - reproducibility: I'm analytical chemist. 
We have huge amounts of work going into round robin trials in order to measure the natural variability of different labs on very defined systems. - we also have huge amounts of work going into calibration transfer, i.e. making quantitative predictive models work on a different instrument. This is always a whole lot of work, and for some fields of problems at the moment considered basically impossible even between two instruments of the same model and manufacturer. The quoted results on the mice are not very astonishing to me... ;-) - Talking about (not so) astonishing differences between replications of experiments: I find myself moving from reporting ± 1 standard deviation to reporting e.g. the 5th to 95th percentiles. Not only because my data distributions are often not symmetric, but also because I find I'm not able to directly perceive the real spread of the data from a standard deviation.
Re: [R] how to run linear regression models at once
Hi:

On Fri, Jan 7, 2011 at 7:04 AM, wangwallace talentt...@gmail.com wrote:

hey, folks, I have two very simple questions. I am not quite sure if I can do this using R. So, I am analyzing a large data frame with thousands of variables. For example:

Dependent variables: d1, d2, d3 (i.e., there are three dependent variables)
Independent variables: s1, s2, s3, ... s1000 (i.e., there are 1000 independent variables)

Now I want to run simple linear regression analyses of the dependent variables on the independent variables. A laborious way to do this is running 1000 linear regression models for each dependent variable separately. This would take a lot of time. My questions are:

1) is there an easy way to run all 1000 linear regression analyses for each dependent variable at once?

Yes.

2) after running all 1000 linear regression analyses at once, how can I write the 3000 regression weights (1000 regression weights for each dependent variable) and their significance levels to a file (e.g., csv, excel, etc.)?

Define 'weights'. Do you mean the parameter estimates? Here's a simplified example, but the output is a separate list object for each response. Each component of each list is an lm object, produced from the lm() function.
# Generate some fake data: three responses and eight covariates
df <- data.frame(y1 = rnorm(50), y2 = rnorm(50), y3 = rnorm(50),
                 x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50),
                 x4 = rpois(50, 30), x5 = rpois(50, 20), x6 = rpois(50, 10),
                 x7 = runif(50), x8 = runif(50))

# Create a vector of covariate names
xs <- paste('x', 1:8, sep = '')

# Initialize a list whose length is that of the vector xs
rl1 <- vector('list', 8)
rl2 <- vector('list', 8)
rl3 <- vector('list', 8)

# The following loop regresses all three responses individually on a single
# covariate x[i] and exports the results to separate lists for each response.
# The first statement creates a formula with the name of the i-th covariate;
# the second statement does the regression and assigns the output to the i-th
# component of the list corresponding to the j-th response (j = 1, 2, 3)
for(i in 1:8) {
  fm1 <- as.formula(paste('y1', xs[i], sep = '~'))
  fm2 <- as.formula(paste('y2', xs[i], sep = '~'))
  fm3 <- as.formula(paste('y3', xs[i], sep = '~'))
  rl1[[i]] <- lm(fm1, data = df)
  rl2[[i]] <- lm(fm2, data = df)
  rl3[[i]] <- lm(fm3, data = df)
}

# The print method of lm() applied to the first component is
rl1[[1]]

Each component of each list will resemble the output object you would get from running lm() on a single response with one explanatory variable at a time. In each list, there are as many components as there are covariates. You can extract all sorts of things from each component of the list; I prefer using the ldply() function from package plyr, but there are other ways with base functions. Here are some examples:

library(plyr)
# R^2 values:
ldply(rl1, function(x) summary(x)$r.squared)
# Model coefficients:
ldply(rl1, function(x) coef(x))
# p-values of significance tests for intercept and slope
ldply(rl1, function(x) summary(x)$coefficients[, 4])
# residuals from each model
res1 <- t(ldply(rl1, function(x) resid(x)))   # produces a matrix

Some comments:

1.
If you want to run multivariate regression on the response matrix [y1 y2 y3], you only need one list object for output. Substitute 'cbind(y1, y2, y3)' for yi when creating the formula. Only one call is needed for multivariate regression rather than three for the univariate regressions. However, you would then need to be more careful about output structures and pulling them together over list components. For instance, in a multivariate regression, the coefficients are output as a matrix rather than a vector, so to combine them all, you would need to use a 3D array rather than a data frame.

2. The 'x' in the anonymous functions in ldply() corresponds to a generic list component. In this context, x is an lm object, so anything you could do with the output of an lm object could be mapped componentwise with ldply. This approach is very convenient if you want to pick off individual output pieces (e.g., R^2 or the root MSE) for all the regressions at once.

3. To associate the covariates with rows of output in each generated data frame above, you could use, e.g.,

r2y1 <- ldply(rl1, function(x) summary(x)$r.squared)
rownames(r2y1) <- xs

In the residual example, the data frame is 8 x 50; using the t() function [for transpose], the result is a 50 x 8 matrix whose columns correspond to the individual covariates, so in this case

colnames(res1) <- xs

4. Make sure you organize your code and desired output in advance; with 1000 covariates, things could get messy if you're not careful.

HTH, Dennis

Many thanks in advance!!

-- View this message in
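A hedged sketch of point 1, reusing the same fake data frame df and covariate-name vector xs from the example above:

```r
## One multivariate fit per covariate: cbind() the responses on the
## left-hand side of the formula
rl <- vector('list', 8)
for (i in 1:8) {
  fm <- as.formula(paste('cbind(y1, y2, y3)', xs[i], sep = '~'))
  rl[[i]] <- lm(fm, data = df)
}

coef(rl[[1]])   # 2 x 3 matrix: rows (Intercept) and x1, one column per response
```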
[R] Error in x %*% coef(object) : non-conformable arguments
Hello, and thanks in advance! What does the error message "Error in x %*% coef(object) : non-conformable arguments" indicate when predicting values for newdata with a model from bigglm (in package biglm), and how can I debug it? I am attempting to do Monte Carlo simulations, which may explain the somewhat interesting loop which follows. After the code I have included the output, which shows that a) the simulations are changing the values, and b) there are not any atypical values for the factors in the seventh iteration. At the end of the output is the aforementioned error message.

Code:
===
iter <- nrow(nov.2010)
predict.nov.2011 <- vector(mode='numeric', length=iter)
for (i in 1:iter) {
  iter.df <- nov.2010
  ##-- Update values of dynamic variables --
  iter.df$age <- iter.df$age + 12
  iter.df$pct_utilize <- iter.df$pct_utilize + mc.util.delta[i]
  iter.df$updated_varname1 <- ceiling(iter.df$updated_varname1 + mc.varname1.delta[i])
  if(iter.df$state=="WI") iter.df$varname3 <- iter.df$varname3 + mc.wi.varname3.delta[i]
  if(iter.df$state=="MN") iter.df$varname3 <- iter.df$varname3 + mc.mn.varname3.delta[i]
  if(iter.df$state=="IL") iter.df$varname3 <- iter.df$varname3 + mc.il.varname3.delta[i]
  if(iter.df$state=="US") iter.df$varname3 <- iter.df$varname3 + mc.us.varname3.delta[i]
  ##--- Bin Variables --
  iter.df$bin_varname1 <- as.factor(recode(iter.df$updated_varname1,
    "300:499 = '300 - 499'; 500:549 = '500 - 549'; 550:599 = '550 - 599';
     600:649 = '600 - 649'; 650:699 = '650 - 699'; 700:749 = '700 - 749';
     750:799 = '750 - 799'; 800:849 = 'GE 800'; else = 'missing';"))
  iter.df$bin_age <- as.factor(recode(iter.df$age,
    "0:23 = '< 24mo.'; 24:72 = '24 - 72mo.'; 72:300 = '72 - 300mo'; else = 'missing';"))
  iter.df$bin_util <- as.factor(recode(iter.df$pct_utilize,
    "0.0:0.2 = ' 0 - 20%'; 0.2:0.4 = ' 20 - 40%'; 0.4:0.6 = ' 40 - 60%';
     0.6:0.8 = ' 60 - 80%'; 0.8:1.0 = ' 80 - 100%'; 1.0:1.2 = '100 - 120%';
     else = 'missing';"))
  iter.df$bin_varname2 <- as.factor(recode(iter.df$varname2_prop,
    "0:70 = '< 70%'; 70:85 = '70 - 85%'; 85:95 = '85 - 95%'; 95:110 = '95 - 110%';
     else = 'missing';"))
  iter.df$bin_varname1 <- relevel(iter.df$bin_varname1, 'missing')
  iter.df$bin_age <- relevel(iter.df$bin_age, 'missing')
  iter.df$bin_util <- relevel(iter.df$bin_util, 'missing')
  iter.df$bin_varname2 <- relevel(iter.df$bin_varname2, 'missing')
  #~ print(head(iter.df))
  if (i >= 6 & i <= 8){
    print('-')
    browser()
    print(i)
    print(table(iter.df$bin_varname1))
    print(table(iter.df$bin_age))
    print(table(iter.df$bin_util))
    print(table(iter.df$bin_varname2))
    #~ debug(predict.nov.2011[i] <-
    #~   sum(predict(logModel.1, newdata=iter.df, type='response')))
  }
  predict.nov.2011[i] <- sum(predict(logModel.1, newdata=iter.df, type='response'))
  print(predict.nov.2011[i])
}

Output
==
[1] 36.56073
[1] 561.4516
[1] 4.83483
[1] 5.01398
[1] 7.984146
[1] "-"
Called from: top level
Browse[1]
[1] 6

  missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799    GE 800
      842       283       690      1094      1695      3404      6659     18374     21562

   missing    < 24mo. 24 - 72mo. 72 - 300mo
        16       2997      19709      31881

   missing    0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100 - 120%
     17906       4832       4599       5154       7205      14865         42

 missing    < 70% 70 - 85% 85 - 95% 95 - 110%
   10423    19429    10568     8350      5833

[1] 11.04090
[1] "-"
Called from: top level
Browse[1]
[1] 7

  missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799
      847       909      1059      1586      3214      6304     16349     24335

   missing    < 24mo. 24 - 72mo. 72 - 300mo
        16       2997      19709      31881

   missing    0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100 - 120%
     17145       4972       4617       5020       6634      16139         76

 missing    < 70% 70 - 85% 85 - 95% 95 - 110%
   10423    19429    10568     8350      5833

Error in x %*% coef(object) : non-conformable arguments

Model
===
Large data regression model: bigglm(outcome ~ bin_varname1 + bin_varname2 + bin_age + bin_util + state + varname3 + varname3:state, family = binomial(link = logit), data = dev.data, maxit = 75, sandwich = FALSE)
Sample size = 1372250
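A plausible cause worth noting (an editor's hedged sketch, not an answer from the thread): the prediction multiplies the model matrix built from newdata by coef(object), so the two must conform column for column. Because as.factor() only creates levels for values actually present, a bin that exists in one iteration but not the next — such as the 'GE 800' bin present at iteration 6 but absent at iteration 7 above — shrinks the model matrix by one column. Pinning the factor levels to those of the fitting data avoids this:

```r
## Hypothetical fix: force every bin variable in newdata to carry the
## same level set as the fitting data (dev.data), even for empty bins.
for (v in c('bin_varname1', 'bin_varname2', 'bin_age', 'bin_util')) {
  iter.df[[v]] <- factor(iter.df[[v]], levels = levels(dev.data[[v]]))
}
```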
[R] Error: unexpected string constant
I want to analyze some point locations using ppp, but I have this problem:

Datos = read.table("puntos_texto.txt", dec=".", sep="\t", header=T)
summary(Datos)
       id               y                 x
 Min.   :   1.0   Min.   :1013581   Min.   :1177842
 1st Qu.: 821.2   1st Qu.:1014442   1st Qu.:1179658
 Median :1641.5   Median :1014455   Median :1179670
 Mean   :1641.8   Mean   :1014465   Mean   :1179652
 3rd Qu.:2462.8   3rd Qu.:1014473   3rd Qu.:1179680
 Max.   :3283.0   Max.   :1015575   Max.   :1180213

danta=ppp(Datos$x,Datos$y,c(min(Datos$x)1177842,max(Datos$x)1180213),c(min(Datos$y)1013581,max(Datos$y)1015575))
Error: unexpected numeric constant in "danta=ppp(Datos$x,Datos$y,c(min(Datos$x)1177842"

danta=ppp(Datos$x,Datos$y,c(min(Datos$x)1177842,max(Datos$x)1180213),c(min(Datos$y)1013581,max(Datos$y)1015575))
Error: unexpected string constant in "danta=ppp(Datos$x,Datos$y,c(min(Datos$x)1177842"

--
View this message in context: http://r.789695.n4.nabble.com/Error-unexpected-string-constant-tp3193987p3193987.html
Sent from the R help mailing list archive at Nabble.com.
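As a hedged editorial aside (not part of the original post): the parse error comes from expressions like min(Datos$x)1177842, where two values sit side by side with no operator or comma between them. A spatstat ppp() call of the form ppp(x, y, xrange, yrange) with two-element range vectors parses fine:

```r
library(spatstat)

## Either hard-code the observation window...
danta <- ppp(Datos$x, Datos$y, c(1177842, 1180213), c(1013581, 1015575))

## ...or derive it from the data:
danta <- ppp(Datos$x, Datos$y,
             c(min(Datos$x), max(Datos$x)),
             c(min(Datos$y), max(Datos$y)))
```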
Re: [R] Compare the risk
On Jan 7, 2011, at 1:58 PM, kiotoqq wrote:

I have a dataset which looks like this:

   cbr dust smoking expo
1    0 0.20       1    5
2    0 0.25       1    4
3    0 0.25       1    8
4    0 0.25       1    4
5    0 0.25       1    4
6    0 0.25       1    8
7    0 0.25       1    8
8    0 0.25       1    4
9    1 0.25       1    8
10   0 0.29       1    7
11   0 0.29       0    7
12   0 0.29       1    7
13   0 0.30       0   10
14   0 0.30       1   10
15   ...
16   ...
. . .

How can I compare the risk of getting bronchitis between smoker and no-smoker?

cbrsmk.tbl <- with(dfrm, table(cbr, smoking) )
cbrsmk.tbl
cbrsmk.tbl[2, ]/(cbrsmk.tbl[1, ]+cbrsmk.tbl[2, ])

(That homework set must be going awfully slowly.)

-- David Winsemius, MD West Hartford, CT
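A hedged, self-contained version of the same computation on made-up data (the data frame name, sizes, and probabilities below are invented), extended with the risk ratio:

```r
set.seed(42)
dfrm <- data.frame(cbr     = rbinom(100, 1, 0.2),   # bronchitis indicator
                   smoking = rbinom(100, 1, 0.4))   # smoking indicator

tbl  <- with(dfrm, table(cbr, smoking))
risk <- tbl[2, ] / colSums(tbl)   # P(cbr = 1) among non-smokers, smokers
risk["1"] / risk["0"]             # relative risk, smokers vs non-smokers
```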
Re: [R] Compare the risk
thank you.. but it says dfrm not found (sorry, I'm just very new here)

--
View this message in context: http://r.789695.n4.nabble.com/Compare-the-risk-tp3189084p3204440.html
Sent from the R help mailing list archive at Nabble.com.
[R] Trouble compiling R-2.10.0
Hi, I am having trouble compiling R-2.10.0 on the Solaris x86_64 platform using the native solaris-studio cc/F99 compilers. I am pretty sure that I have my environment set up for a 64-bit compile. However, when doing a make I get the following error:

/opt/solstudio12.2/bin/cc -m64 -I./api -I. -I../../../src/include -I../../../src/include -I/usr/local/include -DHAVE_CONFIG_H -KPIC -xO3 -c alone_decoder.c -o alone_decoder.o
"sysdefs.h", line 126: invalid type combination
"sysdefs.h", line 126: warning: useless declaration
"sysdefs.h", line 126: warning: typedef declares no type name
cc: acomp failed for alone_decoder.c
*** Error code 2
make: Fatal error: Command failed for target `alone_decoder.o'
Current working directory /gtools/src/R-2.10.0/src/extra/xz
*** Error code 1
The following command caused the error:
make liblzma.a
make: Fatal error: Command failed for target `R'
Current working directory /gtools/src/R-2.10.0/src/extra/xz
*** Error code 1
The following command caused the error:
(cd xz; make)
make: Fatal error: Command failed for target `make.xz'
Current working directory /gtools/src/R-2.10.0/src/extra
*** Error code 1
The following command caused the error:
for d in scripts include extra appl nmath unix main modules library; do \
  (cd ${d} && make R) || exit 1; \
done
make: Fatal error: Command failed for target `R'
Current working directory /gtools/src/R-2.10.0/src
*** Error code 1
The following command caused the error:
for d in m4 tools doc etc share src tests po; do \
  (cd ${d} && make R) || exit 1; \
done
make: Fatal error: Command failed for target `R'

I am perplexed as to what could be wrong. Any help would be much appreciated.

--
View this message in context: http://r.789695.n4.nabble.com/Trouble-compiling-R-2-10-0-tp3200883p3200883.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Compare the risk
On Jan 7, 2011, at 5:28 PM, kiotoqq wrote:

thank you.. but it says dfrm not found

Of course it says, dfrm not found. That's because it's on _my_ workspace. You did not say what you were calling your dataframe and I cannot read your mind. I had to make up some name, and the chances that I would call it the same name as you did are pretty small.

-- David Winsemius, MD West Hartford, CT
Re: [R] Compare the risk
thanks a lot! I got it!

--
View this message in context: http://r.789695.n4.nabble.com/Compare-the-risk-tp3189084p3204512.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Summing over specific columns in a matrix
Hi:

It's impolite on this list to hijack a thread to ask your own question. Please start a new thread in the future if you have a question of your own. Re your question, try this:

Data1$Daily <- with(Data1, PL_Pos - Costs)
Data1Sum <- aggregate(Daily ~ EnDate, data = Data1, FUN = sum)
# Merge the two data frames
merge(Result1, Data1Sum, by.x = 'TradeDates', by.y = 'EnDate', all.x = TRUE)

If this is the result you wanted, save it as Result1. If you don't want the NAs, remove the all.x = TRUE part in the merge() call.

HTH, Dennis

On Fri, Jan 7, 2011 at 10:21 AM, Mark Knecht markkne...@gmail.com wrote:

On Fri, Jan 7, 2011 at 8:42 AM, Henrique Dallazuanna www...@gmail.com wrote:

Try this:

rowSums(rowsum(t(m), rep(1:3, c(2, 2, 1)), na.rm = TRUE))

On Fri, Jan 7, 2011 at 2:29 PM, emj83 stp08...@shef.ac.uk wrote:

Hi, I would like to sum some specific columns in my matrix - for example, my matrix looks like this:

     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA   NA   NA   NA
[2,]    2    1   NA    1   NA
[3,]    3    2    1    2    1
[4,]    4    3    2    3    2
[5,]   NA   NA   NA    4    3
[6,]   NA   NA   NA    5   NA

I would like to find the sum of the first two columns, the second two columns and the last column: i.e. I am left with a vector of c(16, 18, 6).

Can you help me extend this example? I'd like to get (PL_Pos - Costs) for each row in Data1, sum those results for each matching date in Result1, and put the result in a new column in Result1 called 'Daily'. Been messing with this for an hour now. Nothing comes close.
Thanks, Mark Result1 = structure(list(TradeDates = structure(c(14249, 14250, 14251, 14252, 14253, 14256, 14257, 14258, 14259, 14260, 14263, 14264 ), class = Date)), .Names = TradeDates, row.names = c(NA, 12L), class = data.frame) Data1 = structure(list(Trade = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22), PosType = c(1, 1, -1, -1, -1, -1, -1, 1, -1, -1, -1, 1, 1, -1, -1, 1, 1, -1, 1, 1, 1, -1), EnDate = structure(c(14249, 14250, 14251, 14253, 14256, 14256, 14256, 14257, 14258, 14258, 14259, 14259, 14260, 14264, 14264, 14265, 14266, 14267, 14270, 14271, 14273, 14274), class = Date), EnTime = c(1406, 1318, 838, 846, 846, 1038, 1102, 918, 838, 950, 1134, 1254, 1110, 846, 1318, 854, 950, 838, 1246, 838, 854, 902), ExDate = structure(c(14249, 14250, 14251, 14253, 14256, 14256, 14256, 14257, 14258, 14258, 14259, 14259, 14260, 14264, 14264, 14265, 14266, 14267, 14270, 14271, 14273, 14274 ), class = Date), ExTime = c(1515, 1515, 1030, 942, 1030, 1046, 1110, 1515, 942, 1030, 1142, 1515, 1515, 1030, 1326, 1515, 1515, 1030, 1515, 1515, 1515, 1022), PL_Pos = c(133.5, -41.5, 171, 483.5, 333.5, -29, -54, -291.5, 596, -141.5, -54, 558.5, 533.5, 521, -41.5, 883.5, 358.5, -979, -191.5, 196, -791.5, 446), Costs = c(29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29 ), PL = c(104.5, -70.5, 142, 454.5, 304.5, -58, -83, -320.5, 567, -170.5, -83, 529.5, 504.5, 492, -70.5, 854.5, 329.5, -1008, -220.5, 167, -820.5, 417)), .Names = c(Trade, PosType, EnDate, EnTime, ExDate, ExTime, PL_Pos, Costs, PL ), row.names = c(NA, 22L), class = data.frame) Result1 Data1 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
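Henrique's one-liner from earlier in the thread can be checked directly against emj83's example matrix (re-created here from the posted values):

```r
m <- rbind(c( 1, NA, NA, NA, NA),
           c( 2,  1, NA,  1, NA),
           c( 3,  2,  1,  2,  1),
           c( 4,  3,  2,  3,  2),
           c(NA, NA, NA,  4,  3),
           c(NA, NA, NA,  5, NA))

## group columns 1-2, 3-4, and 5, then sum within each group
rowSums(rowsum(t(m), rep(1:3, c(2, 2, 1)), na.rm = TRUE))
##  1  2  3
## 16 18  6
```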
[R] Trouble with installing Rmpi package
Hi, I am having a problem with installing the Rmpi package on a Red Hat Linux machine. The R I am using is version 2.10.1. Here's what happens.

install.packages( 'Rmpi' )
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
trying URL 'http://cran.cnr.Berkeley.edu/src/contrib/Rmpi_0.5-9.tar.gz'
Content type 'application/x-gzip' length 87953 bytes (85 Kb)
opened URL
==
downloaded 85 Kb

* installing *source* package 'Rmpi' ...
checking for gcc... gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking mpi.h usability... no
checking mpi.h presence... no
checking for mpi.h... no
configure: error: Cannot find mpi.h header file
ERROR: configuration failed for package 'Rmpi'
* removing '/usr/local/lib64/R/library/Rmpi'

The downloaded packages are in '/tmp/Rtmp1J1kDj/downloaded_packages'
Updating HTML index of packages in '.Library'
Warning message:
In install.packages(Rmpi) :
  installation of package 'Rmpi' had non-zero exit status

library ('Rmpi' )
Error in library(Rmpi) : there is no package called 'Rmpi'

I think it is upset because the file mpi.h is missing. Am I right? If so, how would I cure this problem?
And if not, what must I do? Please help. Thank you.

Tena Sakai
tsa...@gallo.ucsf.edu
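A hedged editorial sketch of the usual remedy (paths and package names below are illustrative and depend on the system): install the distribution's MPI development package so that mpi.h exists, then point Rmpi's configure script at the MPI installation via configure.args.

```r
## The configure.args string is passed straight through to Rmpi's
## configure script; adjust the prefix to wherever your MPI lives.
install.packages("Rmpi",
                 configure.args = "--with-mpi=/usr/lib64/openmpi")
```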
[R] help with environments
Dear list,

Although I tried to make a simple example, this may be too lengthy/unclear of a question for anyone to want to provide an answer. I have been trying to write two functions. My goal is to have an outer function (foo) that will do some computations based on data created by the nested function (f). In particular I want to: 1) create a new data frame using the user's data, when specified; 2) replace the user's variable names with standard names (particularly the argument names); 3) when the user does not specify an argument, possibly use defaults. To do this, I have been trying to pass the results of match.call() (with some modifications if an argument was not specified) to eval(), using the user-specified data. Evidently, I do not have a very thorough understanding of environments in R and have met with a spectacular variety of failures. I am also open to suggestions of alternate ways to do this; from my rut I do not see any other elegant solution. As always, insight, suggestions, or recommended readings are greatly appreciated.

Josh

##
f <- function(x, y) {
  d <- as.vector(match.call()[-1L], "list")
  if (missing(y)) d$y <- 1
  output <- as.data.frame(d)
}

foo <- function(data, value) {eval(substitute(value), data)}

mydf <- data.frame(var1 = 1:10, var2 = letters[1:10])

## desired use
foo(data = mydf, value = f(var1))

## desired output
    x y
1   1 1
2   2 1
3   3 1
4   4 1
5   5 1
6   6 1
7   7 1
8   8 1
9   9 1
10 10 1

## Actual output
# Error in data.frame(x = var1, y = 1, check.names = TRUE,
#   stringsAsFactors = TRUE) :
#   object 'var1' not found
##

It has the right names. This works:

eval(substitute(data.frame(x = var1, y = 1)), mydf)

## it is like I am one environment too nested

--
Joshua Wiley
Ph.D.
Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
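[Editor's note: a sketch of one way around the "one environment too nested" problem, under the assumption that this matches the intended design. The failure happens because as.data.frame(d) evaluates var1 inside f's own frame, where it does not exist. Instead, f can assemble the data.frame() call and evaluate it in parent.frame() -- which, when f is invoked via eval(substitute(value), data) in foo, is the environment built from the user's data:]

```r
# Sketch: build the data.frame() call from match.call() and evaluate it
# in the caller's frame, where foo has made the user's data visible.
f <- function(x, y) {
  d <- as.list(match.call())[-1L]        # unevaluated arguments, named x/y
  if (missing(y)) d$y <- 1               # default when y is not supplied
  cl <- as.call(c(quote(data.frame), d)) # data.frame(x = var1, y = 1)
  eval(cl, parent.frame())               # evaluate where var1 is visible
}

foo <- function(data, value) eval(substitute(value), data)

mydf <- data.frame(var1 = 1:10, var2 = letters[1:10])
foo(data = mydf, value = f(var1))
# a data frame with columns x = 1:10 and y = 1 (recycled)
```

This works because parent.frame() inside a function called during eval(expr, envir) is envir itself, so the data columns resolve there first, falling back to the enclosing scope for anything else.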
Re: [R] Plotting Factors -- Sorting x-axis
That rather depends on what kind of plot you want to use. Here is one option that you can use without any changes:

##
con <- textConnection("Months Prec
1 Jan 102.1
2 Feb 69.7
3 Mar 44.7
4 Apr 32.1
5 May 24.0
6 Jun 18.7
7 Jul 14.0
8 Aug 20.0
9 Sep 32.4
10 Oct 58.9
11 Nov 94.5
12 Dec 108.2")
tab <- read.table(con)
closeAllConnections()
rm(con)

with(tab, barplot(Prec, names = as.character(Months)))
##

If you want to adjust the factor so that the levels remain in natural month order, then:

tab <- within(tab, Months <- factor(Months, levels = month.abb))
with(tab, barplot(Prec, names = levels(Months)))

________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Taylor, Eric HLS:EX [eric.tay...@gov.bc.ca]
Sent: 07 January 2011 09:13
To: 'r-h...@stat.math.ethz.ch'
Subject: [R] Plotting Factors -- Sorting x-axis

Hello; How do I plot these data in R without the Months being ordered alphabetically?

Months Prec
1 Jan 102.1
2 Feb 69.7
3 Mar 44.7
4 Apr 32.1
5 May 24.0
6 Jun 18.7
7 Jul 14.0
8 Aug 20.0
9 Sep 32.4
10 Oct 58.9
11 Nov 94.5
12 Dec 108.2

Eric

Eric Taylor, Air Quality Meteorologist, Health Protection, Ministry of Health Services
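[Editor's note: the releveling trick above is worth seeing in isolation -- a minimal sketch, assuming nothing beyond base R, showing why the months plot alphabetically by default and how explicit levels fix the ordering:]

```r
months <- c("Jan","Feb","Mar","Apr","May","Jun",
            "Jul","Aug","Sep","Oct","Nov","Dec")

# By default, factor() sorts its levels alphabetically:
f1 <- factor(months)
head(levels(f1), 3)   # "Apr" "Aug" "Dec"

# Supplying levels explicitly keeps the natural calendar order;
# month.abb is a built-in constant holding the abbreviated month names:
f2 <- factor(months, levels = month.abb)
levels(f2)            # "Jan" "Feb" "Mar" ... "Dec"
```

Any plot that draws categories in level order (barplot, boxplot, lattice, ggplot2) will then follow the calendar rather than the alphabet.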
Re: [R] anova vs aov commands for anova with repeated measures
lm() and aov() are not fully equivalent. They both fit linear models, but they use different algorithms, and this allows aov, for example, to handle some simple multistratum models. The algorithm used by lm does not allow this, but it has other advantages for simpler models. If you want to fit a multistratum model, such as a repeated-measures model, you need to use aov. When it comes to finding the residuals, you have to be explicit about which residuals you mean, too: you get residuals for each stratum in a multistratum model. Using plain old resid() will not work, as that function can only be used when there is only one kind of residuals vector defined. (It could be made to do something sensible, but that's another issue. Right now, it doesn't.) The function proj() can be used on a fitted model object to obtain the projections, at each stratum, onto the subspaces defined by the terms in the model, and includes the residuals (the projection onto the orthogonal complement of the model space in R^n) as the final column of the matrix of projections. These are easy to dig out and you can analyse away. See ?proj for an example of its use, but you will need to dig out the appropriate column yourself.
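[Editor's note: a minimal sketch of digging out stratum residuals with proj(), using the npk dataset from the ?proj example as a stand-in for the poster's data; the poster's own model would substitute response ~ stimulus*condition + Error(subject/(stimulus*condition)) and the appropriate stratum name:]

```r
# Extract per-stratum residuals from a multistratum aov() fit.
# npk (a built-in dataset) plays the role of the poster's repeated-measures data.
fit <- aov(yield ~ N * P * K + Error(block), data = npk)

pr  <- proj(fit)                      # one projection matrix per stratum
names(pr)                             # "(Intercept)" "block" "Within"
res <- pr[["Within"]][, "Residuals"]  # residuals of the Within stratum

# Normality checks on the within-stratum residuals:
qqnorm(res); qqline(res)
shapiro.test(res)
```

The same pattern applies to any Error() model: pick the stratum of interest from the list proj() returns and take its "Residuals" column.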
________________________________________
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Frodo Jedi [frodo.j...@yahoo.com]
Sent: 08 January 2011 01:51
To: r-help@r-project.org
Subject: [R] anova vs aov commands for anova with repeated measures

Dear all,

I need to understand something about the behaviour of the two functions aov and anova in the following case, involving an ANOVA with repeated measures. If I use the following command I don't get any problem:

aov1 <- aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data = scrd)
summary(aov1)

Instead, if I try to fit the same model for the regression I get an error:

fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data = scrd)
Error in eval(expr, envir, enclos) : could not find function "Error"

so I cannot run the command anova(fit1) afterwards. I want to use lm() because I want to analyse the residuals in order to check normality, and see if the ANOVA assumption of normality still holds. Could you please help me understand how to do this?

Thanks in advance
Best regards