[R] De-serialization vulnerability?
All,

There seems to be a hullabaloo about a vulnerability in R when deserializing untrusted data:

https://hiddenlayer.com/research/r-bitrary-code-execution
https://nvd.nist.gov/vuln/detail/CVE-2024-27322
https://www.kb.cert.org/vuls/id/238194

Apparently a fix was made for R 4.4.0, but I see no mention of it in the changes report: https://cloud.r-project.org/bin/windows/base/NEWS.R-4.4.0.html

Is this real? Were there changes in R 4.4.0 that aren't reported? Of course, we should *always* update to the most recent version, but I was confused why it wasn't mentioned in the release info.

Thanks,
Tim

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Missing shapes in legend with scale_shape_manual
I believe the missing shapes are because you had set alpha = 0 for the last geom_point. I expect there are better ways, but one way to handle it would be to avoid the filtering, adding columns with med and exercise status, like the following:

# setup with data provided
Date <- c('2023-10-17', '2023-10-16', '2023-10-15', '2023-10-14',
          '2023-10-13', '2023-10-12', '2023-10-11')
Time <- c('08:50', '06:58', '09:17', '09:04', '08:44', '08:55', '07:55')
bg <- c(128, 144, 137, 115, 136, 122, 150)
missed_meds <- c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
no_exercise <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
b2 <- data.frame(Date, Time, bg, missed_meds, no_exercise)
b2$Date <- as.Date(b2$Date)

# add "status" columns; these could also be defined as factors.
b2$medStat <- c("missed_meds", NA, NA, NA, NA, NA, "missed_meds")
b2$exercise <- c(NA, NA, "missed_exercise", NA, "missed_exercise",
                 "missed_exercise", "missed_exercise")

Then your ggplot call would be like this:

ggplot(data = b2, aes(x = Date, y = bg)) +
  geom_line() +
  geom_point(aes(shape = medStat), size = 3) +
  geom_point(aes(shape = exercise), size = 3) +
  scale_y_continuous(name = "Blood glucose (mg/dL)",
                     breaks = seq(100, 230, by = 20)) +
  geom_hline(yintercept = 130) +
  scale_shape_manual(name = "Conditions",
                     labels = c("Missed meds", "Missed exercise"),
                     values = c(20, 4))

Note that this method then gets very close without the scale_shape_manual, too. Hope that helps.

Tim

> Date: Mon, 30 Oct 2023 20:55:17 +
> From: Kevin Zembower
> To: r-help@r-project.org
> Subject: [R] Missing shapes in legend with scale_shape_manual
> Message-ID:
> <0100018b825e8f7f-646d2539-f8b5-4e1a-afc3-5d29f961967f-000...@email.amazonses.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
>
> I'm trying to plot a graph of blood glucose versus date. I also record
> conditions, such as missing the previous night's medications, and missing
> exercise on the previous day.
> My data looks like:
>
> > b2[68:74,]
> # A tibble: 7 × 5
>   Date       Time     bg missed_meds no_exercise
> 1 2023-10-17 08:50   128 TRUE        FALSE
> 2 2023-10-16 06:58   144 FALSE       FALSE
> 3 2023-10-15 09:17   137 FALSE       TRUE
> 4 2023-10-14 09:04   115 FALSE       FALSE
> 5 2023-10-13 08:44   136 FALSE       TRUE
> 6 2023-10-12 08:55   122 FALSE       TRUE
> 7 2023-10-11 07:55   150 TRUE        TRUE
>
> This gets me most of the way to what I want:
>
> ggplot(data = b2, aes(x = Date, y = bg)) +
>   geom_line() +
>   geom_point(data = filter(b2, missed_meds),
>              shape = 20,
>              size = 3) +
>   geom_point(data = filter(b2, no_exercise),
>              shape = 4,
>              size = 3) +
>   geom_point(aes(x = Date, y = bg, shape = missed_meds),
>              alpha = 0) + # Invisible point layer for shape mapping
>   scale_y_continuous(name = "Blood glucose (mg/dL)",
>                      breaks = seq(100, 230, by = 20)) +
>   geom_hline(yintercept = 130) +
>   scale_shape_manual(name = "Conditions",
>                      labels = c("Missed meds",
>                                 "Missed exercise"),
>                      values = c(20, 4),
>                      ## size = 3
>                      )
>
> However, the legend just prints an empty square in front of the labels.
> What I want is a filled circle (shape 20) in front of "Missed meds" and a
> cross (shape 4) in front of "Missed exercise."
>
> My questions are:
> 1. How can I fix my plot to show the shapes in the legend?
> 2. Can my overall plotting method be improved? Would you do it this way?
>
> Thanks so much for your advice and guidance.
>
> -Kevin
Re: [R] win110 + R 4.3.0 download only when method = "wininet"
Also note that with R version 4.2.1 and later, there's another change for computers within many corporate firewall systems (that monitor network traffic). The result is that with the default settings R can't connect out to https connections (and thus can't install or update packages); you just get an error that R can't connect, which may be because of the super-special security in your network. I don't know if this is your issue, but it seemed worth pointing out if so (and worth getting the word out there for others...).

Here's the discussion of the bug and the patch:

https://bugs.r-project.org/show_bug.cgi?id=18379

If you are in this situation, the fix is to add a new User Environment Variable:

R_LIBCURL_SSL_REVOKE_BEST_EFFORT

and set it to TRUE.

Tim

> Date: Thu, 11 May 2023 09:39:25 +0300
> From: Ivan Krylov
> To: Sebastián Kruk Gencarelli
> Cc: "r-help@r-project.org"
> Subject: Re: [R] win110 + R 4.3.0 download only when method = "wininet"
> Message-ID: <20230511090040.2015ff59@Tarkus>
> Content-Type: text/plain; charset="utf-8"
>
> On Wed, 10 May 2023 22:13:15 + Sebastián Kruk Gencarelli wrote:
>
> > I don't know what kind of proxy is used by my system.
>
> Do you at least know whether it's an HTTP proxy, a SOCKS proxy, or
> something completely different? Do you need to authenticate to the proxy?
>
> > How can I add my proxy and port to my R configuration?
> libcurl recognises the "http_proxy" environment variable, which can
> include the protocol, the address and the port of the proxy server:
> https://curl.se/libcurl/c/CURLOPT_PROXY.html
>
> For example, if you have a SOCKS5 hostname-resolving proxy on address
> 192.2.0.1 and port 1080, you can try calling
> Sys.setenv('http_proxy' = 'socks5h://192.2.0.1:1080'). For a different
> example, an HTTP proxy on 198.51.100.2, port 8080 can be set up as
> Sys.setenv('http_proxy' = 'http://198.51.100.2:8080/').
>
> (In both cases, the proxy will be used by curl for HTTP and HTTPS
> access, but not, say, FTP or other protocols. For HTTPS via HTTP proxy,
> curl will be using the CONNECT verb.)
>
> Detecting the right proxy for a given address (and yes, it's expected
> to do that per-address) is hard.
> It's been a "nice to have" feature in curl for years:
> https://curl.se/docs/todo.html#auto_detect_proxy, so I'm afraid it
> won't happen any time soon.
>
> It's unfortunate that the curl back-end will break something that used
> to work with wininet. I wonder whether it could make sense to call
> WinHttpGetIEProxyConfigForCurrentUser() [*] when setting up connections
> on Windows. There's no simple way to deal with the proxy
> auto-configuration file [**], but we could at least call
>
> curl_easy_setopt(handle, CURLOPT_PROXY,
>                  wchar_to_current_locale(proxy_config->lpszProxy));
>
> to make the transition from wininet to curl slightly less painful. Or
> would that break even more installations? This won't help in the more
> complicated cases [***], though.
>
> --
> Best regards,
> Ivan
>
> [*] https://learn.microsoft.com/en-us/windows/win32/api/winhttp/nf-winhttp-winhttpgetieproxyconfigforcurrentuser
> [**] https://developer.mozilla.org/en-US/docs/Web/HTTP/Proxy_servers_and_tunneli
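Ivan's two examples can be collected into a small runnable sketch (the proxy addresses are the hypothetical ones from his message; substitute your own):

```r
# Point libcurl-based downloads at a proxy via the http_proxy
# environment variable (used by curl for both HTTP and HTTPS access).

# SOCKS5 proxy that resolves host names itself ("socks5h" scheme):
Sys.setenv(http_proxy = "socks5h://192.2.0.1:1080")

# ...or a plain HTTP proxy:
Sys.setenv(http_proxy = "http://198.51.100.2:8080/")

# Confirm what the session will hand to libcurl
Sys.getenv("http_proxy")
```

Setting the variable with Sys.setenv() only affects the current session; for a permanent setting, put the line in a .Renviron or Renviron.site file instead.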
Re: [R] Regex Split?
If you only want the character strings, this seems a little simpler:

> strsplit("a bc,def, adef ,,gh", "[ ,]+", perl=T)
[[1]]
[1] "a"    "bc"   "def"  "adef" "gh"

If you need delimiters (the commas) you could then add them back in again afterwards.

Tim

--
Message: 2
Date: Thu, 4 May 2023 23:59:33 +0300
From: Leonard Mada
To: R-help Mailing List
Subject: [R] Regex Split?
Message-ID: <7b1cdbe7-0086-24b4-9da6-369296ead...@syonic.eu>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

Dear R-Users,

I tried the following 3 Regex expressions in R 4.3:

strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
# "a" "bc" "," "def" "," "" "adef" "," "," "gh"

strsplit("a bc,def, adef ,,gh", " |(?
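If the commas do need to be kept as separate tokens, one base-R possibility (a sketch, not from the thread) is to extract matches instead of splitting: match either a run of non-delimiter characters or a single comma, so spaces are dropped while each comma survives as its own element:

```r
x <- "a bc,def, adef ,,gh"

# "[^ ,]+" grabs maximal runs of non-delimiters; "," keeps each comma
tokens <- regmatches(x, gregexpr("[^ ,]+|,", x))[[1]]
tokens
# "a" "bc" "," "def" "," "adef" "," "," "gh"
```

This sidesteps the lookahead/lookbehind gymnastics entirely, at the cost of not producing the empty-string elements that splitting can yield.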
Re: [R] R version 4.2.1 install.packages does not work with IE proxy setting
Petr,

You also might want to check out the bug reported here:

https://bugs.r-project.org/show_bug.cgi?id=18379

It was fixed, and the last two comments discuss how to handle it in Windows: you add a new User Environment Variable:

R_LIBCURL_SSL_REVOKE_BEST_EFFORT

and set it to TRUE.

This fix is in R-4.2.1 Patched (I don't know if it has made it out to the full distribution) and works in my 'corporate' environment. Perhaps it also applies to your environment.

Tim

Date: Wed, 5 Oct 2022 10:34:02 +
From: PIKAL Petr
To: Ivan Krylov
Cc: r-help mailing list
Subject: Re: [R] R version 4.2.1 install.packages does not work with IE proxy setting
Message-ID: <9b38aacf51d746bb87a9cc3765a16...@srvexchcm1302.precheza.cz>
Content-Type: text/plain; charset="us-ascii"

Thanks, the workaround works, but we need to try the "permanent" solution with an Renviron.site file in future.

Cheers
Petr

> -----Original Message-----
> From: Ivan Krylov
> Sent: Tuesday, October 4, 2022 5:43 PM
> To: PIKAL Petr
> Cc: r-help mailing list
> Subject: Re: [R] R version 4.2.1 install.packages does not work with IE proxy setting
>
> On Tue, 4 Oct 2022 11:01:14 +
> PIKAL Petr wrote:
>
> > After we installed new R version R 4.2.1 installing packages through
> > IE proxy setting is compromised with warning that R could not connect
> > to server (tested in vanilla R).
>
> R 4.1 deprecated the use of download.file(method = 'wininet'). R 4.2 switched
> the default download method to 'libcurl' and started giving warnings for
> 'wininet' and http[s]:// URLs.
>
> A workaround to get it working right now would be to set
> options(download.file.method = 'wininet') and live with the resulting warnings
> while R downloads the files, but a next version of R may remove 'wininet'
> support altogether.
> In order to get it working with the 'libcurl' method, you'll need to provide some
> environment variables to curl:
> https://stat.ethz.ch/pipermail/r-help/2022-September/475917.html
>
> Not sure if libcurl would accept a patch to discover the Windows proxy settings
> automatically, but I don't think it does that now.
>
> --
> Best regards,
> Ivan
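For testing within a single session (before committing the setting to a user environment variable or an Renviron.site file), the variable discussed above can also be set from R directly; a minimal sketch:

```r
# Session-only version of the fix: let libcurl treat certificate-revocation
# checks as "best effort" rather than failing the connection outright.
Sys.setenv(R_LIBCURL_SSL_REVOKE_BEST_EFFORT = "TRUE")

# Confirm it is visible to the session
Sys.getenv("R_LIBCURL_SSL_REVOKE_BEST_EFFORT")
```

After setting it, a quick install.packages() call will show whether the corporate TLS inspection was indeed the culprit.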
[R] Formula compared to call within model call
Manipulating formulas within different models, I notice the following:

m1 <- lm(formula = hp ~ cyl, data = mtcars)
m2 <- update(m1, formula. = hp ~ cyl)

all.equal(m1, m2)
#> [1] TRUE
identical(m1, m2)
#> [1] FALSE
waldo::compare(m1, m2)
#> `old$call[[2]]` is a call
#> `new$call[[2]]` is an S3 object of class <formula>, a call

I'm aware formulas are a form of call, but what I'm unsure of is whether there is a meaningful difference between the two versions of the models. Any clarification, even just on the relation between formulas and calls, would be useful.
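The distinction waldo reports can be seen directly. A sketch of the usual explanation (not an official answer from the thread): a formula is a call to `~` that, once evaluated, additionally carries a class attribute and a captured environment:

```r
f  <- hp ~ cyl         # evaluated: an S3 object of class "formula"
cl <- quote(hp ~ cyl)  # unevaluated: a bare call to `~`

class(f)              # "formula"
class(cl)             # "call"
is.call(f)            # TRUE -- a formula is still a call underneath
attributes(cl)        # NULL: no class, no environment attached
names(attributes(f))  # "class" and ".Environment"
```

This is consistent with the waldo output above: lm() stored the bare call in one model and the evaluated formula in the other, which is why identical() fails while all.equal() (which is more forgiving) passes.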
[R] Potential Bug: file.show does not support double-byte characters in the file path
Hi,

I may have come across a bug regarding the file.show function in the base package.

### Bug Description

file.show does not support double-byte characters in the file path.

### Platform Info

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.1.0 tools_4.1.0

### Steps to Reproduce

1. Install R-4.1.0 for Windows on Windows 10 x64 21H1 (Chinese-Simplified, zh-CN)
2. Open RGui
3. Create a file named "test.txt" under "C:\Test", so that the full file path does not include double-byte characters
4. Create a file named "test.txt" under "C:\测试", so that the full file path includes double-byte characters
5. Run the following commands in the console

> t1 = "C:\\Test\\test.txt"
> t1
[1] "C:\\Test\\test.txt"
> file.show(t1)
>
> t2 = "C:\\测试\\test.txt"
> t2
[1] "C:\\测试\\test.txt"
> file.show(t2)
Warning message:
In file.show(t2) :
  file.show(): file 'C:\测试\test.txt' does not exist

Actual Result: file.show() can correctly open "C:\Test\test.txt", as the file path does not contain double-byte characters, but it fails to do so for "C:\测试\test.txt".

### Additional Information

For users with usernames that include double-byte characters, this bug even prevents demo() from functioning properly:

> demo()
Warning message:
In file.show(outFile, delete.file = TRUE, title = paste("R", tolower(x$title))) :
  file.show(): 不存在'C:\Users\娴嬭瘯涓瓇1\AppData\Local\Temp\RtmpGuWTun\RpackageIQR197068c75431'这个文件

(The Chinese message says this file does not exist; note that the double-byte user name has itself been garbled in the reported path.)

In contrast, file.show() can correctly display file *content* containing double-byte characters, as long as the file path does not.
For example, the content of "C:\Test\test.txt" can be "Testing 123 测试"; file.show("C:\\Test\\test.txt") would open this file and display its content correctly, without specifying the encoding parameter.

Thanks,
Tim
[R] [R-pkgs] weibulltools version 2.0.0 released on CRAN
Dear all,

We are happy to announce that a new version of weibulltools is available on CRAN: https://CRAN.R-project.org/package=weibulltools

weibulltools is a package that focuses on statistical methods and visualizations that are often used in reliability engineering. The core idea is to provide a compact and easily accessible set of methods and visualization tools that make the examination and adjustment, as well as the analysis and interpretation, of field and bench test data as simple as possible.

Compared to previous releases (<= 1.0.1), the new version (2.0.0) includes a data-based interface through which the obtained function outputs can be easily passed into subsequent methods and visualizations. Furthermore, all visualizations that were generated dynamically with 'plotly' can now also be created statically with the support of 'ggplot2'. The previously used vector-based approach has been largely preserved, but is no longer recommended.

Examples for the usage of the new interface, the documentation of all functions, as well as further changes can be found at https://tim-tu.github.io/weibulltools/

If you notice a bug or have suggestions for improvements, please submit an issue with a minimal reproducible example at https://github.com/Tim-TU/weibulltools/issues/

Best regards,
Tim-Gunnar Hensel and David Barkemeyer

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] understanding as.list(substitute(...()))
Cheers Denes,

That's useful to know. I'll stick with the match.call version instead. Interestingly, I initially tried to post to R-devel (as I thought it may involve something internal) but was asked to post here instead.

Best
Tim

On Tue, 6 Oct 2020 at 08:22, Dénes Tóth wrote:
>
> Hi Tim,
>
> I have also asked a similar question a couple of months ago, and someone
> else did the same recently, maybe on r-devel.
>
> We received no "official" response, but Deepayan Sarkar (R Core Team
> member) claimed that:
>
> "
> There is no documented reason for this to work (AFAIK), so again, I
> would guess this is a side-effect of the implementation, and not an API
> feature you should rely on. This is somewhat borne out by the
> following:
>
> > foo <- function(...) substitute({...()})
> > foo(abc$de, fg[h], i)
> {
>     pairlist(abc$de, fg[h], i)
> }
> > foo(abc$de, fg[h], , i) # add a missing argument for extra fun
> {
>     as.pairlist(alist(abc$de, fg[h], , i))
> }
>
> which is not something you would expect to see at the user level. So
> my recommendation: don't use ...() and pretend that you never
> discovered it in the first place. Use match.call() instead, as
> suggested by Serguei.
>
> [Disclaimer: I have no idea what is actually going on, so these are
> just guesses. There are some hints at
> https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Dot_002ddot_002ddot-arguments
> if you want to follow up.]
> "
>
> Cheers,
> Denes
>
> On 10/6/20 8:38 AM, Tim Taylor wrote:
> > I probably need to be more specific. What confuses me is not the use
> > of substitute, but the parenthesis after the dots. It clearly works
> > and I can make guesses as to why, but it is definitely not obvious.
> > The following function gives the same final result, but I can
> > understand what is happening:
> >
> > dots <- function(...) {
> >   exprs <- substitute(list(...))
> >   as.list(exprs[-1])
> > }
> >
> > In the original, dots <- function(...) as.list(substitute(...())),
> > does ...() get parsed in a special way?
> >
> > Tim
> >
> > On Tue, 6 Oct 2020 at 05:30, Bert Gunter wrote:
> >>
> >> You need to understand what substitute() does -- see ?substitute and/or a
> >> tutorial on "R computing on the language" or similar.
> >>
> >> Here is a simple example that may clarify:
> >>
> >> > dots <- function(...) as.list(substitute(...()))
> >> > dots(log(foo))
> >> [[1]]
> >> log(foo) ## a call, a language object
> >>
> >> > dots2 <- function(...) as.list(...)
> >> > dots2(log(foo))
> >> Error in as.list(...) : object 'foo' not found
> >> ## substitute() does not evaluate its argument; as.list() does
> >>
> >> Cheers,
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along and
> >> sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
> >>
> >> On Mon, Oct 5, 2020 at 1:37 PM Tim Taylor wrote:
> >>>
> >>> Could someone explain what is happening with the ...() of the
> >>> following function:
> >>>
> >>> dots <- function(...) as.list(substitute(...()))
> >>>
> >>> I understand what I'm getting as a result but not why. ?dots and
> >>> ?substitute leave me none the wiser.
> >>>
> >>> regards
> >>> Tim
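For reference, the match.call() alternative Tim mentions can be written as below (a sketch; with expand.dots = FALSE the `...` arguments come back as a single unevaluated list, matching the behaviour of the substitute(...()) trick):

```r
dots_mc <- function(...) {
  # expand.dots = FALSE leaves the ... arguments bundled, unevaluated,
  # under the name "..." in the matched call
  as.list(match.call(expand.dots = FALSE)$`...`)
}

dots_mc(log(foo), bar + 1)
# [[1]]
# log(foo)
#
# [[2]]
# bar + 1
```

Unlike ...(), this form is documented behaviour of match.call(), which is presumably why it was the recommended route in the thread.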
Re: [R] understanding as.list(substitute(...()))
I probably need to be more specific. What confuses me is not the use of substitute, but the parenthesis after the dots. It clearly works and I can make guesses as to why, but it is definitely not obvious. The following function gives the same final result, but I can understand what is happening:

dots <- function(...) {
  exprs <- substitute(list(...))
  as.list(exprs[-1])
}

In the original, dots <- function(...) as.list(substitute(...())), does ...() get parsed in a special way?

Tim

On Tue, 6 Oct 2020 at 05:30, Bert Gunter wrote:
>
> You need to understand what substitute() does -- see ?substitute and/or a
> tutorial on "R computing on the language" or similar.
>
> Here is a simple example that may clarify:
>
> > dots <- function(...) as.list(substitute(...()))
> > dots(log(foo))
> [[1]]
> log(foo) ## a call, a language object
>
> > dots2 <- function(...) as.list(...)
> > dots2(log(foo))
> Error in as.list(...) : object 'foo' not found
> ## substitute() does not evaluate its argument; as.list() does
>
> Cheers,
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
> On Mon, Oct 5, 2020 at 1:37 PM Tim Taylor wrote:
>>
>> Could someone explain what is happening with the ...() of the
>> following function:
>>
>> dots <- function(...) as.list(substitute(...()))
>>
>> I understand what I'm getting as a result but not why. ?dots and
>> ?substitute leave me none the wiser.
>>
>> regards
>> Tim
[R] understanding as.list(substitute(...()))
Could someone explain what is happening with the ...() of the following function:

dots <- function(...) as.list(substitute(...()))

I understand what I'm getting as a result but not why. ?dots and ?substitute leave me none the wiser.

regards
Tim
[R] Compositional Data Analysis
Hi All,

I am currently working on a project examining the health effects of physical activity using a compositional data (CoDa) approach. I am using a linear regression to measure the effect of physical activity. What I want to do is predict an outcome, e.g. body mass index, based on the mean physical activity composition. I then want to alter this composition by reallocating time from one behavior (such as sedentary) to light intensity physical activity, and see what effect this has on BMI. I have included the code below. My problem is that I cannot seem to generate this new composition correctly.

library(compositions)
attach(Data_Analysis)

# Replace zeros in raw data with 0.1
bedtimemins[bedtimemins == 0] <- 0.1
sedwakingmins[sedwakingmins == 0] <- 0.1
standingtimemins[standingtimemins == 0] <- 0.1
LIPAmins[LIPAmins == 0] <- 0.1
MVPAmins[MVPAmins == 0] <- 0.1

# make the composition by binding the components together
comp <- cbind(bedtimemins, sedwakingmins, standingtimemins, LIPAmins, MVPAmins)

# tell R that comp is a compositional variable
comp <- acomp(comp)

# Generate variation matrix
PAvar_matrix <- var.acomp(comp)

# make the ilr multiple linear regression model. BMI = body mass index.
lm <- lm(bmi ~ ilr(comp) + age)

# determine the mean composition, geometric mean of behaviors which sum to 1
comp.mean <- mean(comp)

# predict BMI for the mean composition from above, keeping age constant at its mean.
mean.pred <- predict(lm, newdata = list(comp = comp.mean, age = mean(age)))

# generate new composition of PA with 30 minutes of sleep reallocated to 30 mins of LIPA
new.comp <- acomp(comp.mean + c(-30/1440, 0/1440, 0/1440, 30/1440, 0/1440))

# predict BMI based on new composition
pred <- predict(lm, newdata = list(comp = new.comp, age = mean(age)))

# Estimated difference in BMI for the above reallocation
pred - mean.pred

When I run the command to generate the new composition, I receive the message below:

new.comp <- acomp(comp.mean + c(-30/1440, 0/1440, 0/1440, 30/1440, 0/1440))
Warning messages:
1: In acomp(gsi.mul(x, y)) :
  Negative values in composition are used as detection limits
2: In acomp(comp.mean + c(-30/1440, 0/1440, 0/1440, 30/1440, 0/1440)) :
  Negative values in composition are used as detection limits

When I print comp.mean I get a result that seems to make sense:

comp.mean
     bedtimemins    sedwakingmins standingtimemins         LIPAmins         MVPAmins
      0.36410089       0.32460230       0.22278324       0.06948202       0.01903155

I can't say the same for the new composition:

new.comp
     bedtimemins    sedwakingmins standingtimemins         LIPAmins         MVPAmins
       <5.240217              BDL              BDL             1.00              BDL

Any help or advice would be greatly appreciated!
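One likely culprit (an observation, not an official answer): comp.mean is an acomp object, so `+` performs Aitchison perturbation rather than ordinary addition, and the resulting negative entry is then flagged as a detection limit. A sketch of reallocating on the raw minutes scale instead, using hypothetical values close to the printed comp.mean (re-wrap the result in acomp() before predicting):

```r
# Hypothetical mean composition (proportions of a 1440-minute day)
comp.mean <- c(bedtimemins = 0.364, sedwakingmins = 0.325,
               standingtimemins = 0.223, LIPAmins = 0.069,
               MVPAmins = 0.019)

mins <- comp.mean * 1440            # back to minutes per day (plain numeric,
                                    # so "+" is ordinary addition)
mins <- mins + c(-30, 0, 0, 30, 0)  # move 30 min from bed time to LIPA

new.comp <- mins / sum(mins)        # re-close to proportions summing to 1
# wrap with compositions::acomp(new.comp) before calling predict()
```

Working in minutes keeps every component positive (as long as less time is moved than the donor behavior contains), so acomp() no longer sees negative "detection limit" values.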
Re: [R] project path in Rmd (Ivan Calandra)
You also should be able to reference locations from the 'root' of your project using the here package: https://here.r-lib.org/

Best,
Tim Howard

> Date: Thu, 2 Apr 2020 10:21:47 +0100
> From: Rui Barradas
> To: Ivan Calandra
> Cc: "r-help@r-project.org"
> Subject: Re: [R] project path in Rmd
> Message-ID: <09f30516-7af5-fee7-5efd-e6db69c75...@sapo.pt>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> Hello,
>
> This is not an answer to the original problem, it's about '..'. (And a
> bit more.)
>
> About '..', see if the following sequence of instructions can help.
> Subdirectories '~/tmp' and '~/snap' exist on my PC; change them to
> '~/analysis/scripts' or to whatever makes sense on yours.
>
> od <- getwd()          # save this for later
>
> setwd('~/tmp')
> list.files('../snap')  # goes up one level and
>                        # executes an OS/files related command
>
> curr <- getwd()
> basename(curr)  # these two instructions are meant to show that
> dirname(curr)   # you don't need to know how many levels you have
>                 # to go up; you can parse 'curr' if basename and
>                 # dirname are not enough
>
> setwd(od)  # back to where I was
>
> Hope this helps,
> Rui Barradas
>
> On 02/04/20 10:02, Ivan Calandra wrote:
> > I do not know this ".." command (could you please show me how to use it
> > in a relative path?), but it sounds like a good start.
> >
> > But it implies that I know in advance how many folders up the parent
> > directory is. I guess in most cases it will always be the same, but it
> > would be even better if it could be applied generically.
> >
> > As I said, ideally, I would like to get the project directory from a
> > script located in a subfolder.
> >
> > Thanks!
> > Ivan
> >
> > --
> > Dr. Ivan Calandra
> > TraCEr, laboratory for Traceology and Controlled Experiments
> > MONREPOS Archaeological Research Centre and
> > Museum for Human Behavioural Evolution
> > Schloss Monrepos
> > 56567 Neuwied, Germany
> > +49 (0) 2631 9772-243
> > https://www.researchgate.net/profile/Ivan_Calandra
> >
> > On 02/04/2020 10:54, Ivan Krylov wrote:
> >> On Thu, 2 Apr 2020 10:30:29 +0200
> >> Ivan Calandra wrote:
> >>
> >>> The problem I then have is to specify the path for 'raw_data' and
> >>> 'derived_data' <...> And these folders are not subfolders of
> >>> the working directory '~/analysis/scripts'.
> >>
> >>> I would like to avoid absolute paths of course
> >>
> >> Is there a reason to avoid relative paths built using '..' to access
> >> parent directories?
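Both suggestions in the thread reduce to the same idea; as a minimal sketch of the '..' route (the folder names follow the layout described in the thread, so they are assumptions):

```r
# With getwd() == "~/analysis/scripts", ".." climbs to ~/analysis,
# so sibling data folders are reachable with relative paths:
raw_dir     <- file.path("..", "raw_data")
derived_dir <- file.path("..", "derived_data")

raw_dir
# "../raw_data"

# e.g. read.csv(file.path(raw_dir, "mydata.csv"))
```

The here package does the equivalent from the project root regardless of where the script lives, e.g. here::here("raw_data", "mydata.csv"), which avoids depending on the current working directory at all.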
Re: [R] substr gives empty output
In

y <- substr(x, i, 1)

your third integer needs to be the stop position, not the number of characters, so change it to

y <- substr(x, i, i)

and you should get what you want.

Cheers,
Tim

> Date: Sun, 21 Jan 2018 10:50:31 -0500
> From: Ek Esawi <esaw...@gmail.com>
> To: Luigi Marongiu <marongiu.lu...@gmail.com>, r-help@r-project.org
> Subject: Re: [R] substr gives empty output
> Content-Type: text/plain; charset="UTF-8"
>
> The reason you get "" is, as stated in the previous response and in the
> documentation of the substr function: "When extracting, if start is
> larger than the string length then "" is returned." This is what happens
> in your function.
>
> HTH
>
> EK
>
> On Sun, Jan 21, 2018 at 3:59 AM, Luigi Marongiu <marongiu.lu...@gmail.com> wrote:
> > Dear all,
> > I have a string, let's say "testing", and I would like to extract in
> > sequence each letter (character) from it. But when I use substr() I
> > only properly get the first character, the rest is empty (""). What am
> > I getting wrong?
> > For example, I have this code:
> >
> > x <- "testing"
> > k <- nchar(x)
> > for (i in 1:k) {
> >   y <- substr(x, i, 1)
> >   print(y)
> > }
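Putting the correction into Luigi's loop, plus the usual one-line alternative with strsplit:

```r
x <- "testing"

# corrected loop: both start and stop point at position i
for (i in seq_len(nchar(x))) {
  print(substr(x, i, i))
}

# vectorised alternative: split into single characters in one go
strsplit(x, "")[[1]]
# "t" "e" "s" "t" "i" "n" "g"
```

The strsplit form is generally preferable when all the characters are needed at once, since it avoids the explicit loop.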
[R] Interesting behavior of lm() with small, problematic data sets
I've recently come across the following results reported by the lm() function when applied to a particular type of admittedly difficult data. When working with small data sets (for instance, 3 points) that have the same response value for different values of the predictor, the resulting slope estimate is a reasonable approximation of the expected 0.0, but the p-value of that slope estimate is surprising. A reproducible example is included below, along with the output of the summary of results.

# example code
x <- c(1,2,3)
y <- c(1,1,1)
# above results in the {(1,1), (2,1), (3,1)} data set to regress
new.rez <- lm(y ~ x)  # regress constant y on changing x
summary(new.rez)      # display results of regression
# end of example code

Results:

Call:
lm(formula = y ~ x)

Residuals:
         1          2          3 
 5.906e-17 -1.181e-16  5.906e-17 

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)    
(Intercept)  1.000e+00  2.210e-16  4.525e+15   <2e-16 ***
x           -1.772e-16  1.023e-16 -1.732e+00    0.333    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.447e-16 on 1 degrees of freedom
Multiple R-squared:  0.7794, Adjusted R-squared:  0.5589 
F-statistic: 3.534 on 1 and 1 DF,  p-value: 0.3112

Warning message:
In summary.lm(new.rez) : essentially perfect fit: summary may be unreliable

## There is a warning that the summary may be unreliable due to the essentially perfect fit, but a p-value of 0.3112 doesn't seem reasonable. As a side note, the various r^2 values seem odd too. Tim Glover Senior Scientist II (Geochemistry, Statistics), Americas - Environment & Infrastructure, Amec Foster Wheeler 271 Mill Road, Chelmsford, Massachusetts, USA 01824-4105 T +01 978 692 9090 D +01 978 392 5383 M +01 850 445 5039 tim.glo...@amecfw.com amecfw.com This message is the property of Amec Foster Wheeler plc and/or its subsidiaries and/or affiliates and is intended only for the named recipient(s).
__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
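To make the warning concrete: both the residuals and the slope here are at the scale of double-precision rounding error (about 2.2e-16), so the t statistic, and hence the p-value, is a ratio of two rounding errors and carries no information. A base-R sketch of checking this:

```r
# sketch: the "slope" for a constant response is numerical noise at
# machine precision, so its significance test is meaningless
x <- c(1, 2, 3)
y <- c(1, 1, 1)
fit <- lm(y ~ x)
abs(coef(fit)[["x"]]) < 1e-10         # slope is effectively zero
all(abs(residuals(fit)) < 1e-12)      # perfect fit, up to rounding
.Machine$double.eps                   # ~2.2e-16, the scale of the "error"
```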
Re: [R] CRAN I-MR / Xbar-R / Xbar-S control chart package ?
Interesting, thanks for that. I came across qcc, but my quick scan of the docs suggested it only did X-bar charts. Maybe I need to re-read the docs; I guess it does the individuals chart versions (I-MR) too. Tim On 8 July 2017 at 20:53, Rui Barradas <ruipbarra...@sapo.pt> wrote: > Hello, > > I have no experience with I-MR charts but a google search found package qcc. > Maybe it's what you're looking for. > > Hope this helps, > > Rui Barradas > > > Em 08-07-2017 09:07, Tim Smith escreveu: >> >> Hi, >> >> I've had a quick look through the package list, and unless I've missed >> something, I can't seem to find anything that will do I-MR / Xbar-R / >> Xbar-S control charts ? >> >> Assuming there is something out there, can anyone point me in the >> right direction ? >> >> Thanks ! >> >> TIm >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
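For reference, a minimal sketch assuming the qcc package is installed: type = "xbar.one" gives the individuals (I) chart from ungrouped data, while data arranged into subgroups feed the "xbar", "R" and "S" charts. The measurement values below are invented:

```r
library(qcc)  # assumes the qcc package is installed

# individuals (I) chart from ungrouped measurements
x <- c(33.75, 33.05, 34.00, 33.81, 33.46, 34.02, 33.68, 33.27, 33.49, 33.20)
q1 <- qcc(x, type = "xbar.one", plot = FALSE)

# Xbar and R charts from 10 subgroups of size 4 (one row per subgroup)
m <- matrix(rnorm(40, mean = 33.5, sd = 0.3), ncol = 4)
q2 <- qcc(m, type = "xbar", plot = FALSE)
q3 <- qcc(m, type = "R", plot = FALSE)
```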
[R] CRAN I-MR / Xbar-R / Xbar-S control chart package ?
Hi, I've had a quick look through the package list, and unless I've missed something, I can't seem to find anything that will do I-MR / Xbar-R / Xbar-S control charts. Assuming there is something out there, can anyone point me in the right direction? Thanks! Tim __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] CRAN updates: rpg and odeintr
rpg is a package for working with postgresql: https://github.com/thk686/rpg odeintr is a package for integrating differential equations: https://github.com/thk686/odeintr Cheers, THK http://www.keittlab.org/ [[alternative HTML version deleted]] ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem installing/loading packages from Africa
I'm very new to R and I live in Mali, west Africa. I'm on *OS X 10.7.5*. I downloaded and installed *R 3.2.1*. I downloaded and installed *RStudio 00.99.893*. I ran through the free Microsoft DataCamp intro to R course, then started another free course through edX, Data and Statistics for Life Sciences, using R. The first prompt in the course is to install.packages("swirl"). Copied below are the various error messages I get when trying to install or load any package. My best guess is that the problems I'm having are due to being in west Africa, with unreliable, weak connections and no CRAN mirror closer than Italy or Spain (as far as I know). I checked into common package errors on the RStudio page, but I'm not confident enough in my computing to get into internet proxies and some of the other suggested troubleshooting. Any insight would be very helpful. See error messages below. Thanks -- Tim - When attempting to install, the print-out I get in the Console in RStudio is any length of:

> install.packages("swirl")
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
[a dozen or so lines of curl progress elided; the 207k download completes after about 20 seconds]

The downloaded binary packages are in /var/folders/yb/7z339kn56mdbwx92ydmsqswcgn/T//RtmpBzQ16u/downloaded_packages For other packages, ggplot2, for example, a
similar printout can run for hundreds and hundreds of lines. So RStudio tells me that the packages are available to load, BUT: - when *loading* any package, an error comes up saying package 'name of package' was built under R version 3.2.5, but the info on swirl says it should work on any version of R above 3.0.2. I get similar errors for other packages (treemap, ggplot2, etc.). - Or sometimes I'll get this:

Error : .onAttach failed in attachNamespace() for 'swirl', details:
  call: stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE)
  error: object 'C_stri_join' not found
In addition: Warning message:
package ‘swirl’ was built under R version 3.2.5
Error: package or namespace load failed for ‘swirl’

Any suggestions? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
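One hedged thing to try (the mirror URL and timeout value are assumptions, not a confirmed fix for this particular error): point install.packages() at the CDN-backed cloud mirror and raise the download timeout before retrying, so a slow link is less likely to cut a download short:

```r
# sketch: set a geo-routed mirror and a generous download timeout,
# then retry the install; values here are assumptions
options(repos = c(CRAN = "https://cloud.r-project.org"), timeout = 300)
getOption("repos")
# then retry: install.packages("swirl")
```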
Re: [R] How to conduct a PERMANOVA using a dissimilarity matrix
Dear Michael, You need to create another file dividing the different categories/variables (make sure that they are in the same order for the variable you are testing), then attach it and run the adonis function. For example:

attach(metadata)
adonis(formula = vegdist(matrixfile, method = "bray") ~ Site, data = metadata)

It's also important to think about whether you have nestedness in your design, in which case you would need to add strata = factor:

adonis(formula = vegdist(matrixfile, method = "bray") ~ Species, data = metadata, strata = Site)

I hope this helps for now! Let me know if you need more detail. Hazel (BSc student). From: R-help <r-help-boun...@r-project.org> on behalf of michael.eisenr...@agroscope.admin.ch <michael.eisenr...@agroscope.admin.ch> Sent: 22 December 2015 16:18 To: r-help@r-project.org Subject: [R] How to conduct a PERMANOVA using a dissimilarity matrix Dear R-List members, I have to compare how similar two types of forest (old growth=O) and (young forest=Y) are in terms of moth communities. I sampled moths at 4 O and 4 Y sites. I need to analyse the data using a PERMANOVA approach. But I am having a really hard time doing this in R. I found out that I need to create a dissimilarity matrix and then read this matrix into R to conduct a one-way PERMANOVA with forest type (O or Y) as factor. The package vegan with the function "adonis" seems to be able to do a PERMANOVA. I created the matrix (based on Soerenson (dis)similarities) and imported it into R. Could anyone help me with the next step? How can I conduct a PERMANOVA on my dataset? In the end I would need an R value and significance level telling me if community compositions differ significantly between sites. Below is my code (not too much) and the data for the matrix.
#dput for matrix: structure(c("", "O", "Y", "Y", "Y", "O", "O", "Y", "O", "O", "0", "0.544", "0.519", "0.533", "0.481", "0.548", "0.518", "0.479", "Y", "0.544", "0", "0.383", "0.416", "0.383", "0.358", "0.434", "0.399", "Y", "0.519", "0.383", "0", "0.398", "0.359", "0.392", "0.401", "0.374", "Y", "0.533", "0.416", "0.398", "0", "0.398", "0.399", "0.358", "0.348", "O", "0.481", "0.383", "0.359", "0.398", "0", "0.37", "0.317", "0.354", "O", "0.548", "0.358", "0.392", "0.399", "0.37", "0", "0.39", "0.365", "Y", "0.518", "0.434", "0.401", "0.358", "0.317", "0.39", "0", "0.371", "O", "0.479", "0.399", "0.374", "0.348", "0.354", "0.365", "0.371", "0"), .Dim = c(9L, 9L), .Dimnames = list(NULL, c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9"))) #Code #load dissimilarity matrix (based on Soerenson similarity) moth_dta<-read.csv("Geo_sorenson_8.csv",header=T,sep=";")#Creates matrix from imported data moth_dta<-as.matrix(moth_dta) moth_dta library(vegan) Thank you very much, Michael Eisenring Michael, Msc. PhD Student Federal Department of Economic Affairs, Education and Research EAER Institute of Sustainability Sciences ISS Biosafety Reckenholzstrasse 191, CH-8046 Z rich Tel. +41 44 37 77181 Fax +41 44 37 77201 michael.eisenr...@agroscope.admin.ch<mailto:michael.eisenr...@agroscope.admin.ch> www.agroscope.ch<http://www.agroscope.ch/> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Tim Richter-Heitmann (M.Sc.) 
PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
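A minimal sketch of the workflow Hazel describes, assuming the vegan package; current vegan spells the function adonis2(), and a symmetric matrix read from a file must be converted with as.dist() before it can go into the formula. The dissimilarities and grouping below are invented stand-ins for the real data:

```r
library(vegan)  # assumes the vegan package is installed
set.seed(1)

# invented symmetric dissimilarity matrix for 8 sites (4 O, 4 Y)
m <- matrix(runif(64, 0.3, 0.6), 8, 8)
m[lower.tri(m)] <- t(m)[lower.tri(m)]  # force symmetry
diag(m) <- 0
d <- as.dist(m)                        # adonis2 needs a "dist" object

meta <- data.frame(forest = rep(c("O", "Y"), each = 4))
adonis2(d ~ forest, data = meta, permutations = 999)
```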
[R] Regular R freezes
Dear List, some days ago my R started to freeze regularly, especially during these operations:

mso(, permutations=999), package vegan
anova.cca(), package vegan
forward.sel(), package packfor

I am running R on a 64-bit Windows 7 installation, 4 cores, 8 GB RAM. I just today updated R to 3.2.3, all packages, and RStudio to their most recent versions. Still the error persists. Two things are strange: all these functions (and more) have been working without any problem for 2 years, and the freezes occur at different points in the algorithms. For forward.sel(), a stepwise variable selection process in species distribution modelling, the system fails at different steps (say after variable 8 or 18), but it inevitably fails. RStudio is still responsive meanwhile, but the "Stop" button does not work, and the program needs to be restarted. Is there any way of troubleshooting or finding out what the culprit is? Thanks. -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to find out if two cells in a dataframe belong to the same pre-specified factor-level
Thank you, that turned out to work very well. If you want to, you can answer it here: http://stackoverflow.com/questions/32809249/how-to-find-out-if-two-cells-in-a-dataframe-belong-to-the-same-pre-specified-fac/ The question wasn't properly answered, which is why I switched to the R list. On 28.09.2015 20:15, Adams, Jean wrote: Here's one approach that works. I made some changes to the code you provided. Full working example code given below.

library(reshape)
library(ggplot2)
library(dplyr)

dist1 <- matrix(runif(16), 4, 4)
dist2 <- matrix(runif(16), 4, 4)
rownames(dist1) <- colnames(dist1) <- paste0("A", 1:4)
rownames(dist2) <- colnames(dist2) <- paste0("A", 1:4)
m1 <- melt(dist1)
m2 <- melt(dist2)

# I changed the by= argument here
final <- full_join(m1, m2, by=c("X1", "X2"))

# I made some changes to keep spcs character and grps factor
species <- data.frame(spcs=paste0("A", 1:4),
                      grps=as.factor(c(rep("cat", 2), rep("dog", 2))),
                      stringsAsFactors=FALSE)

# define new variables for final indicating group membership
final$g1 <- species$grps[match(final$X1, species$spcs)]
final$g2 <- species$grps[match(final$X2, species$spcs)]
final$group <- as.factor(with(final, ifelse(g1==g2, as.character(g1), "dif")))

# plot just the rows with matching groups
ggplot(final[final$group!="dif", ], aes(value.x, value.y, col=group)) + geom_point()

# plot all the rows
ggplot(final, aes(value.x, value.y, col=group)) + geom_point()

Jean On Sun, Sep 27, 2015 at 4:22 PM, <trich...@uni-bremen.de <mailto:trich...@uni-bremen.de>> wrote: Dear list, I really couldn't find a better way to describe my question, so please bear with me. To illustrate my problem, I have a matrix with ecological distances (m1) and one with genetic distances (m2) for a number of biological species.
I have merged both matrices and want to plot both distances versus each other, as illustrated in this example:

library(reshape)
library(ggplot2)
library(dplyr)

dist1 <- matrix(runif(16),4,4)
dist2 <- matrix(runif(16),4,4)
rownames(dist1) <- colnames(dist1) <- paste0("A",1:4)
rownames(dist2) <- colnames(dist2) <- paste0("A",1:4)
m1 <- melt(dist1)
m2 <- melt(dist2)
final <- full_join(m1,m2, by=c("Var1","Var2"))
ggplot(final, aes(value.x,value.y)) + geom_point()

Here is the twist: The biological species belong to certain groups, which are given in the dataframe `species`, for example:

species <- data.frame(spcs=as.character(paste0("A",1:4)),
                      grps=as.factor(c(rep("cat",2), rep("dog",2))))

I want to check if an x,y pair in final (as in `final$Var1`, `final$Var2`) belongs to the same group of species (here "cat" or "dog"), and then want to color all groups specifically in the x,y-scatterplot. Thus, I need an R translation for:

final$group <- If (final$Var1 and final$Var2) belong to the same group as specified in species, then assign the species group here, else do nothing or assign NA

so I can proceed with

ggplot(final, aes(value.x,value.y, col=group)) + geom_point()

So, in the example, the pairs A1-A1, A1-A2, A2-A1, A2-A2 should be identified as "both cats", hence should get the factor "cat". Thank you very much! Tim __ R-help@r-project.org <mailto:R-help@r-project.org> mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Tim Richter-Heitmann (M.Sc.)
PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Download showing as exploit
Hi , I work for the NHS, and our IT service has been unable to download as its anti-virus software says it contains an exploit. Is this normal? Is there a way around this? Kind regards, Tim Kingston Sent from my HTC [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cross correlation between two time series over nested time periods?
Thanks! Here the period of my time series B is a proper subinterval of the period of A. Does ccf(A,B) require that A and B span the same period? If A and B don't span the same period, what does ccf do? When moving B along the period of A by a lag, does ccf(A,B) calculate the cross correlation between B and the part of A overlapping with B? Or does ccf(A,B) calculate the cross correlation between A and the extension of B to the period of A by zero padding? On Thu, 5/14/15, Franklin Bretschneider brets...@xs4all.nl wrote: Subject: Re: [R] Cross correlation between two time series over nested time periods? Date: Thursday, May 14, 2015, 6:14 AM On 2015-05-14, at 02:11, Tim via R-help r-help@r-project.org wrote: Hello Tim, Re: I have two time series Calculate and plot cross correlation between two time series over nested time periods. Each point in either time series is for a week (not exactly a calendar week, but the first week in a calendar year always starts from Jan 1, and the other weeks in the same year follow that, and the last week of the year may contain more than 7 days but no more than 13 days). The first time series A is stored in a compressed (.gz) text file, which looks like (each week and the corresponding time series value are separated by a comma in a line):

week,value
20060101-20060107,0
20060108-20060114,5
...
20061217-20061223,0
20061224-20061230,0
20070101-20070107,0
20070108-20070114,4
...
20150903-20150909,0
20150910-20150916,1

The second time series B is similarly stored in a compressed (.gz) text file, but over a subset of the period of A, which looks like:

week,value
20130122-20130128,509
20130129-20130204,204
...
20131217-20131223,150
20131224-20131231,148.0
20140101-20140107,365.0
20140108-20140114,45.0
...
20150305-20150311,0
20150312-20150318,364

I wonder how to calculate the cross correlation between the two time series A and B (up to a specified maximum lag), and plot A and B in a single plot?
The auto- and cross-correlation functions are in the stats package:

acf(x, lag.max = NULL, type = c("correlation", "covariance", "partial"), plot = TRUE, na.action = na.fail, demean = TRUE, ...)

ccf(x, y, lag.max = NULL, type = c("correlation", "covariance"), plot = TRUE, na.action = na.fail, ...)

See further: ?ccf Success and best wishes, Frank --- Franklin Bretschneider Dept of Biology Utrecht University brets...@xs4all.nl __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
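On the alignment question: ccf() does not reconcile different spans by itself; with plain numeric vectors it simply pairs observations by index. One way to deal with nested periods is to trim both series to their common window first, e.g. with ts.intersect(). A sketch with invented weekly data (frequency 52) standing in for the real files:

```r
# sketch: align two nested weekly series to their overlapping span,
# then cross-correlate; the data are random stand-ins
set.seed(42)
a <- ts(rnorm(500), start = c(2006, 1), frequency = 52)  # long series A
b <- ts(rnorm(50),  start = c(2013, 4), frequency = 52)  # nested series B
both <- ts.intersect(a, b)          # keeps only the 50 weeks common to both
r <- ccf(both[, "a"], both[, "b"], lag.max = 10, plot = FALSE)
r$acf[r$lag == 0]                   # cross correlation at lag 0
```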
[R] Cross correlation between two time series over nested time periods?
Dear R users, I have two time series, A and B, and I would like to calculate and plot the cross correlation between them over nested time periods. Each point in either time series is for a week (not exactly a calendar week, but the first week in a calendar year always starts from Jan 1, and the other weeks in the same year follow that, and the last week of the year may contain more than 7 days but no more than 13 days). The first time series A is stored in a compressed (.gz) text file, which looks like (each week and the corresponding time series value are separated by a comma in a line):

week,value
20060101-20060107,0
20060108-20060114,5
...
20061217-20061223,0
20061224-20061230,0
20070101-20070107,0
20070108-20070114,4
...
20150903-20150909,0
20150910-20150916,1

The second time series B is similarly stored in a compressed (.gz) text file, but over a subset of the period of A, which looks like:

week,value
20130122-20130128,509
20130129-20130204,204
...
20131217-20131223,150
20131224-20131231,148.0
20140101-20140107,365.0
20140108-20140114,45.0
...
20150305-20150311,0
20150312-20150318,364

I wonder how to calculate the cross correlation between the two time series A and B (up to a specified maximum lag), and plot A and B in a single plot? Thanks and regards, Tim __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Split a dataframe by rownames and/or colnames
Thank you very much for the line. It was doing the split as suggested. However, I want to release all the dataframes to the environment (later on, for each dataframe, some dozen lines of code will be carried out, and I don't know how to do it with lapply or for-looping, so I do it separately):

list2env(split(df, sub(".+_", "", rownames(df))), envir=.GlobalEnv)

Anyway, the dataframes now have numeric names in some cases, and cannot be easily accessed because of it. How would the line be altered to add a "df_" prefix to each of the dataframe names resulting from list2env? Thank you very much! Thanks, On 20.02.2015 20:36, David Winsemius wrote: On Feb 20, 2015, at 9:33 AM, Tim Richter-Heitmann wrote: Dear List, Consider this example

df <- data.frame(matrix(rnorm(9*9), ncol=9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)
indx <- gsub(".*_", "", names(df))

I can split the dataframe by the index that is given in the column names after the underscore "_".

list2env( setNames( lapply(split(colnames(df), indx), function(x) df[x]), paste('df', sort(unique(indx)), sep="_")), envir=.GlobalEnv)

However, I changed my mind and want to do it now by rownames. Exchanging colnames with rownames does not work; it gives the exact same output (9 rows x 3 columns). I could do as.data.frame(t(df_x)), but maybe that is not elegant. What would be the solution for splitting the dataframe by rows?
The split.data.frame method seems to work perfectly well with a rownames-derived index argument:

split(df, sub(".+_", "", rownames(df)))

$`1`
      c_1   d_1  e_1   a_p   b_p   c_p  1_o1 2_o1  3_o1
c_1 -0.11 -0.04 1.33 -0.87 -0.16 -0.25 -0.75 0.34  0.14
d_1 -0.62 -0.94 0.80 -0.78 -0.70  0.74  0.11 1.44 -0.33
e_1  0.98 -0.83 0.48  0.19 -0.32 -1.01  1.28 1.04 -2.16

$o1
       c_1   d_1   e_1   a_p   b_p   c_p  1_o1  2_o1  3_o1
1_o1 -0.93 -0.02  0.69 -0.67  1.04  1.04 -1.50 -0.36  0.50
2_o1  0.02 -0.16 -0.09 -1.50 -0.02 -1.04  1.07 -0.45  1.56
3_o1 -1.42  0.88 -0.05  0.85 -1.35  0.21  1.35  0.92 -0.76

$p
      c_1   d_1   e_1   a_p  b_p   c_p  1_o1  2_o1  3_o1
a_p -1.35  0.91 -0.58 -0.63 0.94 -1.13  0.71  0.25  0.82
b_p -0.25 -0.73 -0.41 -1.71 1.28  0.19 -0.35  1.74 -0.93
c_p -0.01 -1.11 -0.12  0.58 1.51  0.03 -0.99 -0.23 -0.03

Thank you very much! -- Tim Richter-Heitmann -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
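On the follow-up about numeric names: one option (a sketch built on the thread's own example) is to rename the list elements before releasing them, so every object gets the asked-for "df_" prefix and a syntactically valid name:

```r
# sketch: split rows by the suffix after "_", prefix the names, then release
df <- data.frame(matrix(rnorm(81), ncol = 9))
names(df) <- c("c_1","d_1","e_1","a_p","b_p","c_p","1_o1","2_o1","3_o1")
rownames(df) <- names(df)
parts <- split(df, sub(".+_", "", rownames(df)))   # elements "1", "o1", "p"
names(parts) <- paste0("df_", names(parts))        # now df_1, df_o1, df_p
list2env(parts, envir = .GlobalEnv)
ls(pattern = "^df_")
```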
[R] Split a dataframe by rownames and/or colnames
Dear List, Consider this example

df <- data.frame(matrix(rnorm(9*9), ncol=9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)
indx <- gsub(".*_", "", names(df))

I can split the dataframe by the index that is given in the column names after the underscore "_".

list2env( setNames( lapply(split(colnames(df), indx), function(x) df[x]), paste('df', sort(unique(indx)), sep="_")), envir=.GlobalEnv)

However, I changed my mind and want to do it now by rownames. Exchanging colnames with rownames does not work; it gives the exact same output (9 rows x 3 columns). I could do as.data.frame(t(df_x)), but maybe that is not elegant. What would be the solution for splitting the dataframe by rows? Thank you very much! -- Tim Richter-Heitmann __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] read.table with missing data and consecutive delimiters
All, Assume we have data in an ASCII file that looks like

Var1$Var2$Var3$Var4
1$2$3$4
2$$5
$$$6

When I execute read.table( 'test.dat', header=TRUE, sep='$' ) I, of course, receive the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 2 did not have 4 elements

When I set fill=TRUE, e.g., read.table( 'test.dat', header=TRUE, sep='$', fill=TRUE ) I get:

  Var1 Var2 Var3 Var4
1    1    2    3    4
2    2   NA    5   NA
3   NA   NA   NA    6

What I need is

  Var1 Var2 Var3 Var4
1    1    2    3    4
2    2   NA   NA    5
3   NA   NA   NA    6

What am I missing? Thanks, Tim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
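A hedged sketch of diagnosing this: count.fields() shows which rows are short, and with sep set, fill=TRUE pads short rows on the right. That is exactly why the lone trailing value lands in Var3 rather than Var4 — read.table cannot know which field was omitted from a 3-field row:

```r
# sketch: inspect field counts per line, then see how fill=TRUE pads
tf <- tempfile(fileext = ".dat")
writeLines(c("Var1$Var2$Var3$Var4", "1$2$3$4", "2$$5", "$$$6"), tf)
count.fields(tf, sep = "$")   # 4 4 3 4 -- line 3 is the ragged one
df <- read.table(tf, header = TRUE, sep = "$", fill = TRUE)
df$Var3[2]                    # 5: the row is padded at the end, not mid-row
```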
[R] Network graph google maps
Hello everybody. I am not sure if I am in the right forum for my question. If not, please let me know so I can go somewhere else! Thanks. Currently I am trying to familiarize myself with R. I am trying to plot some sort of network graph (think of migration maps). Basically, I'd like to have an interactive map (via the Google Maps packages for R, e.g.) of The Netherlands which has nodes and edges. However, I stumble upon some problems. I figured out how to create edges and nodes in R using the map function (this was quite the victory for me). R automatically creates a world map on which I can plot the data. But the world map is nowhere near as detailed as an interactive Google map (via, e.g., the gvisMap package). So I was looking for a way to get a Google map and plot data onto that, but I just can't seem to find a tutorial. I found this document, which shows several graphs, but none is a network graph. The goal is to eventually connect it to shiny and be able to plot different kinds of network graphs based on input. I really need a tutorial as I am still a big noob in R :) So is there someone who is willing to help me find some sort of tutorial? Any help is much appreciated! Tim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] rtmvnorm {tmvtnorm} seems broken
General linear constraints don't seem to work. I get an error message if I have more constraint equations than variables. E.g. executing the following code

print(R.version)
library('tmvtnorm')
cat('tmvtnorm version ')
print(packageVersion('tmvtnorm'))
## Let's constrain our sample to the dwarfed hypercube of dimension p.
p <- 3  # dimension
mean <- rep(0,p)
sigma <- diag(p)
## a <= Dx <= b
a <- c(rep(0,p), -Inf)
b <- c(rep(1,p), 2)
D <- rbind(diag(p), rep(1,p))
cat('mean is\n'); print(mean)
cat('a is\n'); print(a)
cat('b is\n'); print(b)
cat('D is\n'); print(D)
X <- rtmvnorm(n=1000, mean, sigma, D=D, lower=a, upper=b, algorithm="gibbsR")

produces the following output

platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          1.0
year           2014
month          04
day            10
svn rev        65387
language       R
version.string R version 3.1.0 (2014-04-10)
nickname       Spring Dance
tmvtnorm version [1] '1.4.9'
mean is
[1] 0 0 0
a is
[1]    0    0    0 -Inf
b is
[1] 1 1 1 2
D is
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
[4,]    1    1    1
Error in checkTmvArgs(mean, sigma, lower, upper) (from rtmvnorm-test.R#18) :
  mean, lower and upper must have the same length

That error message is not appropriate when a matrix of linear constraints is passed in. I emailed the package maintainer on the 3rd but received only an automatic out-of-office reply. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Creating submatrices from a dataframe, depending on factors in sample names
Hello there, this is a cross-post of a Stack Overflow question, which wasn't answered, but is very important for my work. Apologies for breaking any rules, but I do hope for some help from the list instead: I have a huge matrix of pairwise similarity percentages between different samples. The samples belong to groups. The groups are determined by the suffix _n in the row.names/header names. In the first step, I wanted to create submatrices consisting of all pairs within single groups (i.e. for all samples from _1). However, I realized that I need to know all pairwise submatrices between all combinations of groups. So, I want to create (a list of) vectors that are named _n1 vs _n2 (or similar) for all combinations of n, as illustrated by the colored rectangles: http://i.stack.imgur.com/XMkxj.png Reproducible code, as provided by helpful Stack Overflow members, dealing with identical _ns:

df <- structure(list(HQ673618_1 = c(NA, 90.8, 89.8, 89.6, 89.8, 88.9, 87.8, 88.2, 88.3),
  HQ674317_1 = c(90.8, NA, 98.6, 97.7, 98.4, 97.4, 94.9, 96.2, 95.1),
  EU686630_1 = c(89.8, 98.6, NA, 98.4, 98.9, 97.7, 95.4, 96.4, 95.8),
  EU686593_2 = c(89.6, 97.7, 98.4, NA, 98.1, 96.8, 94.4, 95.6, 94.8),
  JN166322_2 = c(89.8, 98.4, 98.9, 98.1, NA, 97.5, 95.3, 96.5, 95.9),
  EU491340_2 = c(88.9, 97.4, 97.7, 96.8, 97.5, NA, 96.5, 97.7, 96),
  AB694259_3 = c(87.8, 94.9, 95.4, 94.4, 95.3, 96.5, NA, 98.3, 95.9),
  AB694258_3 = c(88.2, 96.2, 96.4, 95.6, 96.5, 97.7, 98.3, NA, 95.8),
  AB694462_3 = c(88.3, 95.1, 95.8, 94.8, 95.9, 96, 95.9, 95.8, NA)),
  .Names = c("HQ673618_1", "HQ674317_1", "EU686630_1", "EU686593_2", "JN166322_2",
    "EU491340_2", "AB694259_3", "AB694258_3", "AB694462_3"),
  class = "data.frame",
  row.names = c("HQ673618_1", "HQ674317_1", "EU686630_1", "EU686593_2", "JN166322_2",
    "EU491340_2", "AB694259_3", "AB694258_3", "AB694462_3"))

indx <- gsub(".*_", "", names(df))
sub.matrices <- lapply(unique(indx), function(x) {
  temp <- which(indx %in% x)
  df[temp, temp]
})
unique_values <- lapply(sub.matrices, function(x) x[upper.tri(x)])
names(unique_values) <- unique(indx)

This code needs to be expanded to form sub.matrices for any combination of unique indices in temp. Thank you so much! -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
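The between-group case can be handled by iterating over all unordered pairs of distinct group suffixes with combn(); a sketch building on the df and indx objects from the post (the names cross.matrices and pairs are my own):

```r
# all unordered pairs of distinct group suffixes, e.g. "1" vs "2"
pairs <- combn(unique(indx), 2, simplify = FALSE)

# for each pair, take the rows of one group and the columns of the other;
# this yields the off-diagonal block of the similarity matrix
cross.matrices <- lapply(pairs, function(p) {
  df[which(indx == p[1]), which(indx == p[2])]
})

# label each block "n1_vs_n2" as requested
names(cross.matrices) <- sapply(pairs, paste, collapse = "_vs_")
```

Combined with the within-group sub.matrices already computed, this covers every combination of group indices.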
[R] big datasets for R
Another source of large datasets is the Public Data Sets on AWS http://aws.amazon.com/public-data-sets/ Tim Hoolihan @thoolihan http://linkedin.com/in/timhoolihan On Oct 28, 2014, at 7:00 AM, r-help-requ...@r-project.org wrote: -- Message: 2 Date: Mon, 27 Oct 2014 13:37:12 +0100 From: Qiong Cai qiong@gmail.com To: r-help@r-project.org Subject: [R] big datasets for R Message-ID: caorq85hxzxgsua-zdy5ee30m6qlxuvu8cqtkwbfewovbg1a...@mail.gmail.com Content-Type: text/plain; charset=UTF-8 Hi, Could anyone please tell me where I can find very big datasets for R? I'd like to do some benchmarking on R by stressing R a lot. Thanks Qiong [[alternative HTML version deleted]] -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How can I overwrite a method in R?
How can I create an improved version of a method in R, and have it be used? Short version: I think plot.histogram has a bug, and I'd like to try a version with a fix. But when I call hist(), my fixed version doesn't get used. Long version: hist() calls plot() which calls plot.histogram() which fails to pass ... when it calls plot.window(). As a result hist() ignores xaxs and yaxs arguments. I'd like to make my own copy of plot.histogram that passes ... to plot.window(). If I just make my own copy of plot.histogram, plot() ignores it, because my version is not part of the same graphics package that plot belongs to. If I copy hist, hist.default and plot, the copies inherit the same environments as the originals, and behave the same. If I also change the environment of each to .GlobalEnv, hist.default fails in a .Call because it cannot find C_BinCount. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [vegan] Error in as.vector(x, mode) : , cannot coerce type 'builtin' to vector of type 'any' when performing anova on CCA objects
Hi there, I have three data frames which have the same structure (data.frame, 358 observations of 340 variables, every column numeric) and are basically submatrices of a parental dataset. I can perform a CCA on all of them (with the explanatory dataset being abio):

cca1_ab <- cca(df1 ~ ., data = abio)

I can also perform an anova on all three CCA objects:

anova(cca1_ab)

However, if I do

sig1_ab <- anova(cca1_ab, by = "term", perm = 200)

which I believe gives me significance values for the fitted explanatory variables, only one of the three CCA objects gives an unexpected

Error in as.vector(x, mode) : cannot coerce type 'builtin' to vector of type 'any'

The other two work fine. It is hard to come up with an explanation, as the datasets look and behave the same otherwise. Any idea why this could happen? I am sorry for not providing data for reproduction, as the datasets are pretty large. -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
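For reference, the per-term permutation test runs cleanly on vegan's bundled example data; a minimal sketch (note that by must be a quoted string, and that a "builtin" coercion error can arise when a name in the formula or data accidentally resolves to a function, e.g. an object literally called df, so checking for such name clashes is worth a try):

```r
library(vegan)

# bundled example data: species matrix plus environmental variables
data(varespec)
data(varechem)

m <- cca(varespec ~ ., data = varechem)

anova(m)                                          # overall permutation test
sig <- anova(m, by = "term", permutations = 199)  # per-term test
sig
```

If the failing dataset works after being renamed (and its columns checked against function names with exists()), that would point to a name-clash rather than a data problem.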
Re: [R] How can I overwrite a method in R?
Thank you Duncan, Brian, Hadley, and Lin. In Lin's suggestion, I believe the latter two statements should be reversed, so that the environment is added before the function is placed into the graphics namespace.

source('plot.histogram.R')
environment(plot.histogram) <- asNamespace('graphics')
assignInNamespace('plot.histogram', plot.histogram, ns = 'graphics')

The middle statement could also be

environment(plot.histogram) <- environment(graphics:::plot.histogram)

The point is to ensure that the replacement version has the same environment as the original. Having tested this, I will now submit a bug report :-) On Thu, Oct 9, 2014 at 9:11 AM, C Lin bac...@hotmail.com wrote: I posted a similar question a while ago. Search for "modify function in a package". In your case, the following should work.

source('plot.histogram.R')
assignInNamespace('plot.histogram', plot.histogram, ns = 'graphics')
environment(plot.histogram) <- asNamespace('graphics')

Assuming you have your own plot.histogram function inside plot.histogram.R and the plot.histogram function you are trying to overwrite is in the graphics package. Lin From: h.wick...@gmail.com Date: Thu, 9 Oct 2014 07:00:31 -0500 To: timhesterb...@gmail.com CC: r-h...@stat.math.ethz.ch Subject: Re: [R] How can I overwrite a method in R? This is usually ill-advised, but I think it's the right solution for your problem:

assignInNamespace("plot.histogram", function(...) plot(1:10), "graphics")
hist(1:10)

Hadley On Thu, Oct 9, 2014 at 1:14 AM, Tim Hesterberg timhesterb...@gmail.com wrote: How can I create an improved version of a method in R, and have it be used? Short version: I think plot.histogram has a bug, and I'd like to try a version with a fix. But when I call hist(), my fixed version doesn't get used. Long version: hist() calls plot() which calls plot.histogram() which fails to pass ... when it calls plot.window(). As a result hist() ignores xaxs and yaxs arguments. I'd like to make my own copy of plot.histogram that passes ...
to plot.window(). If I just make my own copy of plot.histogram, plot() ignores it, because my version is not part of the same graphics package that plot belongs to. If I copy hist, hist.default and plot, the copies inherit the same environments as the originals, and behave the same. If I also change the environment of each to .GlobalEnv, hist.default fails in a .Call because it cannot find C_BinCount. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Tim Hesterberg http://www.timhesterberg.net (resampling, water bottle rockets, computers to Costa Rica, hot shower = 2650 light bulbs, ...) Help your students understand statistics: Mathematical Statistics with Resampling and R, Chihara Hesterberg http://www.timhesterberg.net/bootstrap/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] gplot heatmaps: clustering according to rowsidecolors + key.xtickfun
Hi there, I have two questions about heatmap.2 in gplots. My input is a simple square matrix with numeric values between 75 and 100 (it is a similarity matrix based on bacterial DNA sequences). 1. I can sort my input matrix into categories with RowSideColors (in this case, very conveniently by bacterial taxa). I do a clustering and reordering of my matrix with Rowv=TRUE (and Colv=Rowv). The question is now: can I combine the two features so that the clustering/reordering is done only within the submatrices defined by the vector given to RowSideColors (so that, in this case, the original ordering by bacterial taxa is preserved)? That would be very amazing. 2. I have set my own coloring rules with:

mypal <- c("grey", "blue", "green", "yellow", "orange", "red")
col_breaks <- c(seq(0, 74.9), seq(75.0, 78.4), seq(78.5, 81.9),
                seq(82.0, 86.4), seq(86.5, 94.5), seq(94.5, 100.0))

Is it possible to pass this sequential ordering to key.xtickfun? May I ask for example code? Thank you very much! -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
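On the coloring question: heatmap.2 takes an explicit breaks vector whose length is one more than the number of colors, so each color fills one interval, which avoids building long seq() runs entirely. A sketch with made-up data (the cutoffs are the bin edges implied in the post; the matrix is random):

```r
library(gplots)

# toy symmetric "similarity" matrix in the 75-100 range
set.seed(1)
m <- matrix(runif(100, 75, 100), 10, 10)
m[lower.tri(m)] <- t(m)[lower.tri(m)]

mypal <- c("grey", "blue", "green", "yellow", "orange", "red")
# exactly length(mypal) + 1 breaks: grey covers 0-75, blue 75-78.5, etc.
col_breaks <- c(0, 75, 78.5, 82, 86.5, 94.5, 100)

heatmap.2(m, col = mypal, breaks = col_breaks, trace = "none")
```

With breaks supplied this way, the color key's tick positions follow the bin edges without any key.xtickfun customization.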
[R] [vegan]Envfit, pvalues and ggplot2
Good morning, first let me thank you very much for answering my first two questions on this list. Currently, I apply vegan's envfit to simple PCA ordinations. When drawing the biplot, one can set a cutoff to fit only the parameters with significant p-values (via p.max=0.05 in the plot command). There is already sufficient coverage on the net for biplotting this kind of data with ggplot2 (with the problem being the arrow length): http://stackoverflow.com/questions/14711470/plotting-envfit-vectors-vegan-package-in-ggplot2 However, what the solution does not cover is the exclusion of non-significant environmental parameters, as the score extraction described in the link only works with display="vectors":

Envfit_scores <- as.data.frame(scores(list_from_envfit, display = "vectors"))

envfit creates output like this:

            PC1      PC2     r2   Pr(>r)
param1 -0.70882  0.70539 0.0994 0.000999 ***
param2 -0.60122  0.79908 0.0593 0.000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
P values based on 999 permutations.

The object contains a vector called $pval holding the p-values. So, I need to reduce the object created by envfit to rows meeting a criterion in $pval (via unlist and which, I suppose). However, I have difficulty working out the correct code. Any help is much appreciated! -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
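The filtering can be done by indexing the extracted score table with the stored p-values, which sit in the same row order; a sketch assuming an envfit result called fit (the object and cutoff names are mine):

```r
# arrow coordinates for all fitted environmental vectors
sc <- as.data.frame(scores(fit, display = "vectors"))

# envfit stores the permutation p-values in fit$vectors$pvals,
# aligned row-for-row with the scores table
sig.sc <- sc[fit$vectors$pvals <= 0.05, , drop = FALSE]
```

sig.sc can then be handed to the ggplot2 arrow-drawing code from the Stack Overflow answer in place of the full score table.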
[R] problem with labeling plots, possibly in font defaults
Hello, I am using R 3.1.1 on a (four year old) MacBook, running OSX 10.9.4. I just tried making and labeling a plot as follows:

x <- rnorm(10)
y <- rnorm(10)
plot(x, y)
title(main = "random points")

which produces a scatter plot of the random points, but without the title and without any numbers along the axes. If I then run

par(family = "sans")
plot(x, y, main = "plot title")

the plot has the title and the numbers on the axes (also, 'x' and 'y' appear as default labels for the axes). I do not know what is going on, but maybe there is some problem in the default font settings (I don't know if that could be an R issue or an issue specific to my Mac)? This is clearly not a big problem (at least for now) since I can put labels on plots by running par(), but if it is indicative of a larger underlying problem (or if there is a simple fix) I would like to know. Thank you! tb [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problems with read.table and data structure
Hi there! I have a huge datafile of 600 columns by 360 samples:

data <- read.table("small.txt", header = TRUE, sep = "\t", dec = ".", row.names = 1)

The txt file (compiled with Excel) shows me only numbers, but R gives me the structure of EVERY column as factor. When I try stringsAsFactors=FALSE in the read command, the structure of the dataset becomes character. When I try as.numeric(data), I get

Error: (list) object cannot be coerced to type 'double'

even if I try to subset columns with []. When I try as.numeric on single columns with $, I succeed, but the numbers don't make any sense, because the factors are not converted via their levels:

Factor w/ 358 levels "0","123111694",..: 11 14 50 12 38 44 13 76 31 30

becomes

num 11 14 50 12 38 44 13 76 31 30

whereas I need the levels! I suspect Excel of messing up the "save as tab-delimited text", but the text file looks fine to me on the surface (I don't know how the numbers are stored internally). I just see correct numbers, and the View command also shows the correct content. Does anyone know what's wrong? It's pretty annoying. Thank you! -- Tim Richter-Heitmann [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
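When a factor column actually holds numbers, the standard idiom is to convert through character rather than directly with as.numeric (which returns the internal level codes, exactly as observed above). A sketch, assuming the data object from the post:

```r
# convert one factor column via character, preserving the printed values
data[[1]] <- as.numeric(as.character(data[[1]]))

# or convert every column at once
data[] <- lapply(data, function(x) as.numeric(as.character(x)))

# any cell that fails to parse as a number becomes NA with a warning;
# locating those NAs usually reveals the stray text that made
# read.table treat the column as a factor in the first place
which(is.na(data), arr.ind = TRUE)
```

A single non-numeric cell (a stray space, a comma decimal, "NA " with trailing junk) is enough to turn a whole column into a factor, so the last line is worth running.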
[R] x,y scatterplots, with averaging by a column
Hello! I have this matrix:

SampleID, Day, Species1, Species2, Species3, ..., Speciesn
1, Monday, abundance values
2, Monday, abundance values
11, Tuesday, abundance values
12, Tuesday, abundance values
21, Wednesday, abundance values
22, Wednesday, abundance values

I would like to plot the days on the x-axis and the species abundance data on the y-axis. Each pair of values (I have just two measurements per day) should be represented by its mean with absolute error bars (so basically a vertical min-max range) and whiskers. The means for each species should be connected with lines. Also of interest would be annotating the whiskers with their sample IDs (because the whiskers basically represent the values for 1,2 (11,12; 21,22)). Any help is welcome! I am new to R, so please bear with me. Thank you! -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
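One way to get mean points, min-max error bars, and connecting lines is ggplot2's stat_summary; a sketch with invented abundance values matching the two-replicates-per-day layout described:

```r
library(ggplot2)

# toy data: two replicates per day for one species
b <- data.frame(
  SampleID = c(1, 2, 11, 12, 21, 22),
  Day      = rep(c("Monday", "Tuesday", "Wednesday"), each = 2),
  Species1 = c(5, 7, 9, 11, 4, 6)
)
# fix the weekday order on the x-axis
b$Day <- factor(b$Day, levels = c("Monday", "Tuesday", "Wednesday"))

ggplot(b, aes(x = Day, y = Species1, group = 1)) +
  stat_summary(fun = mean, geom = "point") +        # mean per day
  stat_summary(fun = mean, geom = "line") +         # connect the means
  stat_summary(fun.min = min, fun.max = max,        # min-max range bars
               geom = "errorbar", width = 0.2) +
  geom_text(aes(label = SampleID), hjust = -0.5, size = 3)  # annotate replicates
```

For several species at once, the data could be reshaped to long format first and `colour = Species` mapped in the aes() call.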
[R] Large matrices and pairs/ggpairs
Hi there, I recently started to learn R to deal with a huge data matrix of ecological data (about 40 observations in 360 samples, divided into 7 groups of similar samples). I want to do a simple pairs or ggpairs plot:

pairs(mydata, pch = 20, col = brewer.pal(7, "Dark2")[unclass(all$group)])
ggpairs(mydata, colour = "group")   # preferred

Of course, the plot margins are way too small to produce a 40x40 plot matrix (or is there a way to create poster-sized plots?), so I need to produce 39 x,y scatterplots for each single column and arrange them somehow. Is there a way to do this automatically (without creating 39 single plots one by one)? Another option would be to show only the x,y pairs with correlation values above a certain threshold (I would need to create the cor matrix first and identify those highly correlated pairs myself, correct, or is there a way to tell R to do that?). And finally, ggpairs is giving me fits because the font size of the upper panel does not change despite my trying params=list(corSize=10), params=list(Size=10), and upper=list(params=c(Size=10)). As I said, I am a beginner, and hopefully my questions are not too stupid. -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
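Poster-sized plots are just a matter of opening a large enough off-screen device, and the 39 individual scatterplots can be written to a multi-page PDF in a loop; a sketch assuming a numeric data frame called mydata:

```r
# a 40 x 40 inch PDF comfortably holds a 40x40 pairs matrix
pdf("all_pairs.pdf", width = 40, height = 40)
pairs(mydata, pch = 20)
dev.off()

# alternatively: one page per column, each plotted against the first column
pdf("single_plots.pdf")
for (j in 2:ncol(mydata)) {
  plot(mydata[[j]], mydata[[1]],
       xlab = names(mydata)[j], ylab = names(mydata)[1], pch = 20)
}
dev.off()

# correlation screening: indices of pairs exceeding a chosen threshold
cm <- cor(mydata, use = "pairwise.complete.obs")
high <- which(abs(cm) > 0.7 & upper.tri(cm), arr.ind = TRUE)
```

The `high` matrix of row/column indices can then drive the plotting loop so that only strongly correlated pairs are drawn.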
Re: [R] ggplot2: using coord_trans for logit - probability
I think all you have to do is add type = "response" to your call for the predictions. Does this work for you?

# get fitted values on the probability scale
pred <- data.frame(Arthritis, predict(arth.logistic, se.fit = TRUE, type = "response"))

library(ggplot2)
library(scales)

gg <- ggplot(pred, aes(x = Age, y = fit)) +
  geom_line(size = 2) + theme_bw() +
  geom_ribbon(aes(ymin = fit - 1.96 * se.fit, ymax = fit + 1.96 * se.fit),
              alpha = 0.2, color = "transparent") +
  labs(x = "Age", y = "Log odds (Better)")
gg

-Tim

On Wed, Apr 16, 2014 at 7:03 PM, Michael Friendly frien...@yorku.ca wrote: I'm trying to see how I can use coord_trans() with ggplot2 to transform the Y axis of a plot on the logit scale to the probability scale, as opposed to recalculating everything manually and constructing a new plot. Here is a simple example of the 'base' plot I'd like to transform:

data(Arthritis, package = "vcdExtra")
Arthritis$Better <- as.numeric(Arthritis$Improved > "None")
arth.logistic <- glm(Better ~ Age, data = Arthritis, family = binomial)

# get fitted values on the logit scale
pred <- data.frame(Arthritis, predict(arth.logistic, se.fit = TRUE))

library(ggplot2)
library(scales)

# plot on logit scale
gg <- ggplot(pred, aes(x = Age, y = fit)) +
  geom_line(size = 2) + theme_bw() +
  geom_ribbon(aes(ymin = fit - 1.96 * se.fit, ymax = fit + 1.96 * se.fit),
              alpha = 0.2, color = "transparent") +
  labs(x = "Age", y = "Log odds (Better)")
gg

Things I've tried that don't work:

gg + coord_trans(ytrans = "logis")
Error in get(as.character(FUN), mode = "function", envir = envir) :
  object 'logis_trans' of mode 'function' was not found

gg + coord_trans(ytrans = probability_trans("logis"))
Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In qfun(x, ...) : NaNs produced

Doing what I want manually:

# doing it manually
pred2 <- within(pred, {
  prob  <- plogis(fit)
  lower <- plogis(fit - 1.96 * se.fit)
  upper <- plogis(fit + 1.96 * se.fit)
})
gg2 <- ggplot(pred2, aes(x = Age, y = prob)) +
  geom_line(size = 2) + theme_bw() +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = "transparent") +
  labs(x = "Age", y = "Probability (Better)")
gg2

-- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Tim Marcella 508.498.2989 timmarce...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2: using coord_trans for logit - probability
Sorry, I jumped the gun. That does not provide you with the same plot as the gg2 that you are aiming for. -T

On Wed, Apr 16, 2014 at 7:37 PM, Tim Marcella timmarce...@gmail.com wrote: I think all you have to do is add type = "response" to your call for the predictions. Does this work for you?

# get fitted values on the probability scale
pred <- data.frame(Arthritis, predict(arth.logistic, se.fit = TRUE, type = "response"))

library(ggplot2)
library(scales)

gg <- ggplot(pred, aes(x = Age, y = fit)) +
  geom_line(size = 2) + theme_bw() +
  geom_ribbon(aes(ymin = fit - 1.96 * se.fit, ymax = fit + 1.96 * se.fit),
              alpha = 0.2, color = "transparent") +
  labs(x = "Age", y = "Log odds (Better)")
gg

-Tim

On Wed, Apr 16, 2014 at 7:03 PM, Michael Friendly frien...@yorku.ca wrote: I'm trying to see how I can use coord_trans() with ggplot2 to transform the Y axis of a plot on the logit scale to the probability scale, as opposed to recalculating everything manually and constructing a new plot. Here is a simple example of the 'base' plot I'd like to transform:

data(Arthritis, package = "vcdExtra")
Arthritis$Better <- as.numeric(Arthritis$Improved > "None")
arth.logistic <- glm(Better ~ Age, data = Arthritis, family = binomial)

# get fitted values on the logit scale
pred <- data.frame(Arthritis, predict(arth.logistic, se.fit = TRUE))

library(ggplot2)
library(scales)

# plot on logit scale
gg <- ggplot(pred, aes(x = Age, y = fit)) +
  geom_line(size = 2) + theme_bw() +
  geom_ribbon(aes(ymin = fit - 1.96 * se.fit, ymax = fit + 1.96 * se.fit),
              alpha = 0.2, color = "transparent") +
  labs(x = "Age", y = "Log odds (Better)")
gg

Things I've tried that don't work:

gg + coord_trans(ytrans = "logis")
Error in get(as.character(FUN), mode = "function", envir = envir) :
  object 'logis_trans' of mode 'function' was not found

gg + coord_trans(ytrans = probability_trans("logis"))
Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In qfun(x, ...) : NaNs produced

Doing what I want manually:

# doing it manually
pred2 <- within(pred, {
  prob  <- plogis(fit)
  lower <- plogis(fit - 1.96 * se.fit)
  upper <- plogis(fit + 1.96 * se.fit)
})
gg2 <- ggplot(pred2, aes(x = Age, y = prob)) +
  geom_line(size = 2) + theme_bw() +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, color = "transparent") +
  labs(x = "Age", y = "Probability (Better)")
gg2

-- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Tim Marcella 508.498.2989 timmarce...@gmail.com -- Tim Marcella 508.498.2989 timmarce...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Plot mlogit object
Hi, I cannot figure out how or if I even can plot the results from a nested multinomial logit model. I am using the mlogit package. Does anyone have a lead on any tutorials? Both of the vignettes are lacking plotting instructions. Thanks, Tim -- Tim Marcella [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Time interactions for coxph object
Hi All, I am working in the 'survival' library and need to know how to code a variable-by-time interaction for left-truncated data (not all subjects were followed from the start of the study). I therefore add the entry time and exit time variables into the formula. Here is my basic formula:

CSHR.shore.fly <- coxph(Surv(entry, exit, to == 1) ~ shore.cat, data = glba.mod)

My variable shore.cat violates the proportional hazards assumption, so I am trying to add an interaction with time. Do I interact with exit? With entry? Or with the range of the two? Thanks, Tim -- Tim Marcella 508.498.2989 timmarce...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
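For what it's worth, the survival package's own mechanism for time-varying effects is the tt() argument of coxph(), and the time handed to the transform is the event/censoring time (the exit side of a left-truncated record), not the entry time. A sketch assuming the variable names from the post, and assuming shore.cat is coded numerically (a factor's dummy columns would need the transform applied to them instead):

```r
library(survival)

# cox.zph on the original fit suggests which transform of time to use
fit <- coxph(Surv(entry, exit, to == 1) ~ shore.cat, data = glba.mod)
cox.zph(fit)

# time-varying coefficient: the effect of shore.cat changes with log(time)
fit.tt <- coxph(Surv(entry, exit, to == 1) ~ shore.cat + tt(shore.cat),
                data = glba.mod,
                tt = function(x, t, ...) x * log(t))
```

The names fit and fit.tt are mine; the tt() approach is documented in the survival package's time-dependence vignette.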
[R] Nested Logit Model
Hi All, I am using the package 'mlogit' and am having trouble constructing a nested logit model. I am headed down this path because I am violating the IIA assumption when running a multinomial logistic regression. I am analyzing the choice a seabird floating on the surface of the water makes when approached by a ship. I have three choices defined for the dependent variable 'act2': fly, dive, or remain on the water (none). Independent variables I hypothesize might affect the decision are perpendicular distance to the ship's path, group size, location, distance to shore, speed of the ship, number of ships in the area, and wind speed. All the variables are individual-specific. A non-nested multinomial logit model runs fine using the following code (data converted to long format already):

mod.1 <- mlogit(act2 ~ 0 | dist + as.factor(grp.bin) + as.factor(ship) +
                as.factor(sea) + avgknots + shore + as.factor(location),
                long_perp, reflevel = 'none')

After assessing that I am violating the IIA assumption, I attempted to run a nested logit model with the following code:

nested.1 <- mlogit(act2 ~ 0 | dist + as.factor(grp.bin) + as.factor(ship) +
                   as.factor(sea) + avgknots + shore + as.factor(location),
                   long_perp, reflevel = 'none',
                   nests = list(react = c("dive", "fly"), remain = "none"),
                   unscaled = TRUE)

I get the following error:

Error in solve.default(crossprod(attr(x, "gradi")[, !fixed])) :
  Lapack routine dgesv: system is exactly singular: U[6,6] = 0

Does this indicate that my variables are collinear? Overfitting? I have tried removing all but one variable and rerunning, and still get an error. Can anyone help me get over this hump or suggest a different modeling approach? I am mainly interested in the probability of choosing to react (fly or dive) or not, and then, once a reaction has been made, which one is chosen and how these decisions relate to perpendicular distance to the ship's path.
Thanks, Tim -- Tim Marcella 508.498.2989 timmarce...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] multicore - handling a list of return values
Hi, I was trying to gather/combine the results returned from the mclapply function. This is how I do it if only one data object is returned:

#= This works fine ===
library(multicore)
frandom1 <- function(iter, var1 = 3, var2 = 2) {
  mat <- matrix(rnorm(var1 * var2), nrow = var1, ncol = var2)
  return(mat)
}
N <- 3
temp1 <- mclapply(1:N, frandom1, mc.cores = 2)
result1 <- do.call(rbind, temp1)
print(result1)

Now, I want to return more than one object from my function and I am not sure how best to combine the values returned. My current code is:

#=== ??? =
frandom2 <- function(iter, var1 = 3, var2 = 2) {
  mat <- matrix(rnorm(var1 * var2), nrow = var1, ncol = var2)
  vect1 <- sample(1:1000, var1)
  return(list(mat, vect1))
}
temp2 <- mclapply(1:N, frandom2, mc.cores = 2)

# combining returned values
result2 <- result3 <- c()
for (k in 1:N) {
  thismat <- temp2[[k]][[1]]
  result2 <- rbind(result2, thismat)
  thisvect <- temp2[[k]][[2]]
  result3 <- c(result3, thisvect)
}
#==

Is there a more elegant way of combining these values (as in the top example)? Although this works, for a large value of N (N > 500,000) running a loop would be very expensive. Any help would be appreciated! thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
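The growing-by-rbind loop can be replaced by extracting each list slot across all results and combining once, which avoids the quadratic copying cost; a sketch using base R on the temp2 list above:

```r
# pull the first element (the matrix) out of every result, then bind once
result2 <- do.call(rbind, lapply(temp2, `[[`, 1))

# pull the second element (the vector) out of every result, then flatten
result3 <- unlist(lapply(temp2, `[[`, 2))
```

lapply(temp2, `[[`, 1) applies the extraction operator with index 1 to every list element, so both combinations are done in a single pass regardless of N.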
[R] Coding for segmented regression within a hurdle model
Hi, I am using a two-part hurdle model to account for zero inflation and overdispersion in my count data. I would like to account for a segmented (breakpoint) relationship in the binomial logistic hurdle model and pass these results on to the count model (negative binomial). Using the segmented package, I have determined that my data support one breakpoint at 3.85. The slope up to this point is significant and affects the presence of zeros in a linear fashion. The slope beyond 3.85 is non-significant and is estimated not to help predict the presence of zeros in the data (a threshold effect). Here are the results from this model:

Estimated Break-Point(s):
   Est. St.Err
  3.853  1.372

t value for the gap-variable(s) V: 0

Meaningful coefficients of the linear terms:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      -0.2750     0.3556  -0.774   0.4392
approach_km      -0.4520     0.2184  -2.069   0.0385 *
sea2              0.3627     0.2280   1.591   0.1117
U1.approach_km    0.4543     0.2188   2.076       NA

U1.approach_km is the estimate for the second slope. The actual estimated slope for the second section is the difference between this value and the approach_km value (0.0023). I think I have found a way to manually code this into the hurdle model as follows:

hurdle.fit <- hurdle(tot_f ~ x1 + x2 + x3 |
                     approach_km + I(pmax(approach_km - 3.849, 0)) + sea)

When I look at the estimated coefficients from the manual code, it gives the same values. However, the std. errors are estimated lower:

Zero hurdle model coefficients (binomial with logit link):
                                Estimate Std. Error z value Pr(>|z|)
(Intercept)                     -0.27441    0.29347  -0.935    0.350
approach_km                     -0.45261    0.09993  -4.529 5.92e-06 ***
I(pmax(approach_km - 3.849, 0))  0.45486    0.10723   4.242 2.22e-05 ***
sea2                             0.36271    0.22803   1.591    0.112

Question #1: Does the hurdle equation use the standard errors from the zero model when building the count predictions?
If no, then I guess I would not have to worry about this and can just report the original std. errors and associated p-values from the segmented object in the publication. Question #2: If the count model does use the std. errors, how can I reformulate this equation to reproduce the original std. errors? Thanks, Tim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Negative binomial models and censored observations
Hi, I am working with hurdle models in the pscl package to model zero inflated overdispersed count data and want to incorporate censored observations into the equation. 33% of the observed positive count data is right censored, i.e. subject lost to follow up during the duration of the study. Can this be accounted for in the hurdle() function? Thanks, Tim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Calculating RMSE in R from hurdle regression object
Hi, My data is characterized by many zeros (82%) and overdispersion. I have chosen to model with hurdle regression (pscl package) with a negative binomial distribution for the count data. In an effort to validate the model, I would like to calculate the RMSE of the predicted vs. the observed values. From my reading I understand that this is calculated on the raw residuals generated from the model output. This is the formula I used:

H1.RMSE <- sqrt(mean(H1$residuals^2))   # where H1 is my fitted hurdle model

I get 46.7 as the RMSE. This seems high to me based on the model results. Assuming my formula and my understanding of RMSE are correct (and please correct me if I am wrong), I question whether this is an appropriate validation for this particular model structure. The hurdle model correctly predicts all of my zeros. The predictions I get from the fitted model are all values greater than zero. From my readings I understand that the predictions from the fitted hurdle model are means generated for the particular covariate environment based on the model coefficients. If this is truly the case, it does not make sense to compare these means to the observations. This will generate large residuals (only 18% of the observations contain counts greater than 0, while the predicted counts all exceed 0). It seems like comparing apples to oranges. Other correlative tests (Pearson's r, Spearman's rho) would likewise compare the mean predicted value for a particular covariate to the observed values, which again are heavily dominated by zeros. Any tips on how best to validate hurdle models in R? Thanks [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
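For comparison, the same quantity can be computed explicitly from observed and predicted counts; a sketch assuming a fitted pscl hurdle model H1, its original data frame dat, and response column tot_f (the data and column names are taken from the earlier posts in this thread, not guaranteed):

```r
# expected counts from the combined zero-hurdle + count process
pred <- predict(H1, type = "response")

# response residuals: observed minus expected count
rmse <- sqrt(mean((dat$tot_f - pred)^2))
rmse
```

If this matches the sqrt(mean(H1$residuals^2)) value, then H1$residuals are indeed response residuals and the large RMSE simply reflects comparing conditional means against a zero-heavy outcome, as suspected above.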
[R] ifelse...
Hi, Sorry for the newbie question! My code: x <- 'a' ifelse(x == 'a', y <- 1, y <- 2) print(y) Shouldn't this assign a value of 1? When I execute this I get: x <- 'a' ifelse(x == 'a', y <- 1, y <- 2) [1] 1 print(y) [1] 2 Am I doing something really daft??? thanks! sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines parallel stats graphics grDevices utils datasets methods base other attached packages: [1] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6 ggplot2_0.9.3.1 sda_1.3.2 fdrtool_1.2.11 corpcor_1.6.6 [8] entropy_1.2.0 scatterplot3d_0.3-34 pdist_1.2 hash_2.2.6 DAAG_1.18 multicore_0.1-7 multtest_2.18.0 [15] XML_3.95-0.2 hgu133a.db_2.10.1 affy_1.40.0 genefilter_1.44.0 GOstats_2.28.0 graph_1.40.1 Category_2.28.0 [22] GO.db_2.10.1 venneuler_1.1-0 rJava_0.9-6 colorRamps_2.3 RColorBrewer_1.0-5 sparcl_1.0.3 gap_1.1-10 [29] plotrix_3.5-2 som_0.3-5 pvclust_1.2-2 lsr_0.3.1 compute.es_0.2-2 sm_2.2-5.3 imputation_2.0.1 [36] locfit_1.5-9.1 TimeProjection_0.2.0 Matrix_1.1-1.1 timeDate_3010.98 lubridate_1.3.3 gbm_2.1 lattice_0.20-24 [43] survival_2.37-4 RobustRankAggreg_1.1 impute_1.36.0 reshape_0.8.4 plyr_1.8 zoo_1.7-10 data.table_1.8.10 [50] foreach_1.4.1 foreign_0.8-57 languageR_1.4.1 preprocessCore_1.24.0 gtools_3.1.1 BiocInstaller_1.12.0 org.Hs.eg.db_2.10.1 [57] RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 biomaRt_2.18.0 loaded via a namespace (and not attached): [1] affyio_1.30.0 annotate_1.40.0 AnnotationForge_1.4.4 codetools_0.2-8 colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4 [8] grid_3.0.2 GSEABase_1.24.0 gtable_0.1.2 iterators_1.0.6 labeling_0.2 latticeExtra_0.6-26 MASS_7.3-29 [15] munsell_0.4.2 proto_0.3-10 RBGL_1.38.0 RCurl_1.95-4.1 reshape2_1.2.2 scales_0.2.3 stats4_3.0.2 [22] stringr_0.6.2 tools_3.0.2 xtable_1.7-1 zlibbioc_1.8.0 __ R-help@r-project.org
mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
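The puzzle above comes from ifelse() evaluating the expressions supplied as its yes and no arguments, so the y <- 2 side effect in the no branch runs even though the test is TRUE (the return value, 1, is simply discarded). A minimal base-R sketch of the usual idioms:

```r
x <- 'a'
# Assign the result of ifelse(); don't put assignments inside its arguments.
y <- ifelse(x == 'a', 1, 2)
y  # 1
# For a single scalar test, if/else is clearer and evaluates only one branch.
z <- if (x == 'a') 1 else 2
z  # 1
```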
[R] set new path for package installation, because In normalizePath(path.expand(path), winslash, mustWork) :
Hi there, I am in no way an expert user, so please bear with me, as I am basically looking for a step-by-step tutorial to change paths in R. This is my current problem: We are running R 3.0.2 with RStudio on a Win7 environment. In our case, the default working directory Users/Documents/R on the local drive is not available. Instead, the path is redirected to a different network drive (as all our profile data is stored on that H: drive). This used to work fine at some point, but the other day, some major changes were made to our server (which I have nothing to do with), and suddenly, a few users are having this problem: Whenever I start R, I am welcomed with: Warning message: In normalizePath(path.expand(path), winslash, mustWork) : path[1]=//x/home$/tim/Documents/R/win-library/3.0: Access is denied And I am not allowed to install packages or manipulate objects in my workspace. However, whenever I run R with admin rights, I may install packages, but they will be stored in a directory in profiles/admin/appdata/local. Curiously, when I perform .libPaths(), I get: [1] x/home$/tim/Documents/R/win-library/3.0 [2] C:/Program Files/R/R-3.0.2/library I can then use the packages with the normal user account, though. Still, I am not able to write into my workspace with normal user rights, and for us, it is impossible to grant every user admin rights to run RStudio. From what I have found on Google, there are workarounds for R, like starting R from batch files, but I do not know if they're fine when I want to start R from a GUI like RStudio, too. I also have found advice to change the path to the working directory and library location in the Rprofile. I do find an Rprofile.site in ../etc, but there is no path given to the Rprofile, which I think is needed. There was another tip to manipulate R_LIBS_USER. But, tbh, there is a limit to my technical skill. Do you have any idea how to fix that issue? Or is anyone willing to give me a little guidance?
Thank you very much! -- Tim Richter-Heitmann (M.Sc.) PhD Candidate International Max-Planck Research School for Marine Microbiology University of Bremen Microbial Ecophysiology Group (AG Friedrich) FB02 - Biologie/Chemie Leobener Straße (NW2 A2130) D-28359 Bremen Tel.: 0049(0)421 218-63062 Fax: 0049(0)421 218-63069 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
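One common workaround for this kind of lockdown (a sketch only, not tested against the poster's environment; the C:/Rlibs path is purely illustrative) is to point R at a writable library before anything else loads, either via the R_LIBS_USER environment variable or in etc/Rprofile.site:

```r
# In <R_HOME>/etc/Rprofile.site (or a user .Rprofile): prepend a writable
# library directory so install.packages() and library() look there first.
local({
  lib <- "C:/Rlibs"                  # illustrative path with write access
  dir.create(lib, showWarnings = FALSE, recursive = TRUE)
  .libPaths(c(lib, .libPaths()))     # new library is searched first
})
```

Setting the environment variable R_LIBS_USER to the same directory before R starts has the same effect and also survives R upgrades, since it lives outside the installation.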
[R] Append text to panels of lattice::barchart()
I could use a little help writing a panel function to append text to each panel of a lattice::barchart(). Below is a modified version of the barley dataset to illustrate. data(barley) # add a new variable called samp.size barley$samp.size <- round(runif(n=nrow(barley), min=0, max=50), 0) # Below is a barchart plot for this example barchart(yield ~ variety | site, data = barley, groups = year, layout = c(1,6), stack = TRUE, auto.key = list(space = "right"), ylab = "Barley Yield (bushels/acre)", scales = list(x = list(rot = 45))) # I need to add to each panel the sum of samp.size by the groups variable (year) for that panel. # So the text appended to the top panel (Waseca) might say 1931=15, 1932=20, and same for each subsequent panel. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Append text to panels of lattice::barchart()
Thanks Duncan, although that helps me learn something new about lattice plots (I had never played with the as.table argument before), I need something a bit more elegant. I need to generate about 10 plots, each with up to 20 panels. So the panel-by-panel approach is a bit kludgier than I am looking for. I'll look into trellis.focus to see if the light bulb comes on in my head. I'd still appreciate a more efficient way to do this given the number of plots and panels I need. Thanks, Tim On Tue, Nov 26, 2013 at 2:08 PM, Duncan Mackay dulca...@bigpond.com wrote: Hi Try barchart(yield ~ variety | site, data = barley, groups = year, layout = c(1,6), stack = TRUE, auto.key = list(space = "right"), ylab = "Barley Yield (bushels/acre)", scales = list(x = list(rot = 45)), panel = function(x,y,...){ pnl = panel.number() panel.barchart(x,y,...) if (pnl == 6) grid.text("1931=15, 1932=20", 0.5, 0.88, gp = gpar(cex = 0.8)) } ) if you had as.table = TRUE as an argument, pnl == 1 would apply to the top panel; you then have to repeat the line and amend the pnl == 6. On a panel basis is easier than on a plot basis, as the xy coordinates are easily managed. Another way is trellis.focus, but possibly a little fiddlier Duncan Duncan Mackay Department of Agronomy and Soil Science University of New England Armidale NSW 2351 Email: home: mac...@northnet.com.au -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tim Sippel Sent: Wednesday, 27 November 2013 07:08 To: r-help@r-project.org Subject: [R] Append text to panels of lattice::barchart() I could use a little help writing a panel function to append text to each panel of a lattice::barchart(). Below is a modified version of the barley dataset to illustrate.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
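A programmatic sketch of the per-panel labelling being asked for (untested against the poster's real data; the sums are computed from the data rather than hard-coded per panel, and the 0.5/0.9 label position is illustrative):

```r
library(lattice)
library(grid)

data(barley)
set.seed(1)
barley$samp.size <- round(runif(nrow(barley), 0, 50))

# Per-site, per-year sums of samp.size, computed once up front.
sums <- aggregate(samp.size ~ site + year, data = barley, sum)

p <- barchart(yield ~ variety | site, data = barley, groups = year,
              layout = c(1, 6), stack = TRUE,
              auto.key = list(space = "right"),
              ylab = "Barley Yield (bushels/acre)",
              scales = list(x = list(rot = 45)),
              panel = function(x, y, ...) {
                panel.barchart(x, y, ...)
                # Identify this panel's site, then paste its year sums
                # into a "1931=15, 1932=20"-style label.
                site.i <- levels(barley$site)[which.packet()]
                s <- sums[sums$site == site.i, ]
                lab <- paste(paste0(s$year, "=", s$samp.size), collapse = ", ")
                grid.text(lab, x = 0.5, y = 0.9, gp = gpar(cex = 0.8))
              })
p
```

Because nothing is hard-coded per panel, the same panel function scales to many plots with up to 20 panels each.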
[R] spsurvey analysis
All, I've used the excellent package, spsurvey, to create spatially balanced samples many times in the past. I'm now attempting to use the analysis portion of the package, which compares CDFs among sub-populations to test for differences in sub-population metrics. - My data (count data) have many zeros, following a negative binomial or even zero-inflated negative binomial distribution. - Samples are within polygons of varying sizes - I want to test whether a sample at time 1 is different from a sample at time 2. Essentially the same sample areas and number of samples. The problem: - cont.cdftest throws a warning and does not complete for most (but not all) species sampled. Warning message: The combined number of values in at least one class is less than five. Action: The user should consider using a smaller number of classes. - There are plenty of samples in my two time periods (the dummy set below: Yr1=27, Yr2=31 non-zero values). My Question: Why is it throwing this error and is there a way to get around it? 
Reproducible example (change path to spsurvey sample data); requires us to use spsurvey to generate sample points: ### R code tweaked from vignettes 'Area_Design' and 'Area_Analysis' library(spsurvey) ### Analysis set up setwd("C:/Program Files/R/R-3.0.2/library/spsurvey/doc") att <- read.dbf("UT_ecoregions") shp <- read.shape("UT_ecoregions") set.seed(4447864) # Create the design list Stratdsgn <- list("Central Basin and Range"=list(panel=c(PanelOne=25), seltype="Equal"), "Colorado Plateaus"=list(panel=c(PanelOne=25), seltype="Equal"), "Mojave Basin and Range"=list(panel=c(PanelOne=10), seltype="Equal"), "Northern Basin and Range"=list(panel=c(PanelOne=10), seltype="Equal"), "Southern Rockies"=list(panel=c(PanelOne=14), seltype="Equal"), "Wasatch and Uinta Mountains"=list(panel=c(PanelOne=10), seltype="Equal"), "Wyoming Basin"=list(panel=c(PanelOne=6), seltype="Equal")) # Select the sample design for each year Stratsites_Yr1 <- grts(design=Stratdsgn, DesignID="STRATIFIED", type.frame="area", src.frame="sp.object", sp.object=shp, att.frame=att, stratum="Level3_Nam", shapefile=FALSE) Stratsites_Yr2 <- grts(design=Stratdsgn, DesignID="STRATIFIED", type.frame="area", src.frame="sp.object", sp.object=shp, att.frame=att, stratum="Level3_Nam", shapefile=FALSE) #extract the core information, add year as a grouping variable, add a plot ID to link with dummy data Yr1 <- cbind(pltID = 1001:1100, Stratsites_Yr1@data[,c(1,2,3,5)], grp = "Yr1") Yr2 <- cbind(pltID = 2001:2100, Stratsites_Yr2@data[,c(1,2,3,5)], grp = "Yr2") sitedat <- rbind(Yr1, Yr2) # create dummy sampling data. Lots of zeros!
bn.a <- rnbinom(size = 0.06, mu = 19.87, n=100) bn.b <- rnbinom(size = 0.06, mu = 20.15, n=100) dat.a <- data.frame(pltID = 1001:1100, grp = "Yr1", count = bn.a) dat.b <- data.frame(pltID = 2001:2100, grp = "Yr2", count = bn.b) dat <- rbind(dat.a, dat.b) ## Analysis begins here data.cont <- data.frame(siteID = dat$pltID, Density=dat$count) sites <- data.frame(siteID = dat$pltID, Use=rep(TRUE, nrow(dat))) subpop <- data.frame(siteID = dat$pltID, All_years=rep("allYears", nrow(dat)), Year = dat$grp) design <- data.frame(siteID = sitedat$pltID, wgt = sitedat$wgt, xcoord = sitedat$xcoord, ycoord = sitedat$ycoord) framesize <- c(Yr1=888081202000, Yr2=888081202000) ## There seem to be pretty good estimates CDF_Estimates <- cont.analysis(sites, subpop, design, data.cont, popsize = list(All_years=sum(framesize), Year = as.list(framesize))) print(CDF_Estimates$Pct) ## this test fails CDF_Tests <- cont.cdftest(sites, subpop[,c(1,3)], design, data.cont, popsize=list(Year=as.list(framesize))) warnprnt() ## how many records have values greater than zero, by year? Probably irrelevant! notZero <- dat[dat$count > 0,] table(notZero$grp) ### end sessionInfo() R version 3.0.2 (2013-09-25) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 Thanks in advance. Tim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
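As a quick, unweighted sanity check that the two years' dummy count distributions are comparable at all, a base-R two-sample Kolmogorov-Smirnov test can be run on the raw counts. This ignores the survey design and weights entirely, so it is only a rough cross-check, not a substitute for the design-based cont.cdftest:

```r
set.seed(4447864)
# Dummy counts as in the example: zero-heavy negative binomial draws.
bn.a <- rnbinom(n = 100, size = 0.06, mu = 19.87)
bn.b <- rnbinom(n = 100, size = 0.06, mu = 20.15)

# With size = 0.06 roughly 70% of draws are zero, so ties abound and the
# KS p-value is approximate (hence the suppressWarnings()).
ks <- suppressWarnings(ks.test(bn.a, bn.b))
ks$statistic
ks$p.value
```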
Re: [R] spsurvey analysis
Jason, Thank you for your reply. Interesting ... so you think the 'classes' in the error message The combined number of values in at least one class... is referring to the CDF bins rather than the sub-population classes that I defined. That makes sense as I only defined two classes (!). I was worried it was detecting and treating polygons as classes, somehow (ecoregions in example below). I had already reached out to Kincaid and Olsen but had not received an answer yet so I moved on to R-help. I'll go back to them. Thanks again. Best, Tim Law, Jason jason@portlandoregon.gov 11/1/2013 2:47 PM I use the spsurvey package a decent amount. The cont.cdftest function bins the cdf in order to perform the test which I think is the root of the problem. Unfortunately, the default is 3 which is the minimum number of bins. I would contact Tom Kincaid or Tony Olsen at NHEERL WED directly to ask about this problem. Another option would be to take a different analytical approach (e.g., a mixed effects model) which would allow you a lot more flexibility. Jason Law Statistician City of Portland Bureau of Environmental Services Water Pollution Control Laboratory 6543 N Burlington Avenue Portland, OR 97203-5452 503-823-1038 jason@portlandoregon.gov -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tim Howard Sent: Friday, November 01, 2013 7:49 AM To: r-help@r-project.org Subject: [R] spsurvey analysis All, I've used the excellent package, spsurvey, to create spatially balanced samples many times in the past. I'm now attempting to use the analysis portion of the package, which compares CDFs among sub-populations to test for differences in sub-population metrics. - My data (count data) have many zeros, following a negative binomial or even zero-inflated negative binomial distribution. - Samples are within polygons of varying sizes - I want to test whether a sample at time 1 is different from a sample at time 2. 
[R] Selecting maximums between different variables
Hi there, another beginner's question, I'm afraid. Basically I want to select the maximum of values that correspond to different variables. I have a table of oil production that looks somewhat like this: oil <- data.frame( YEAR = c(2011, 2012), TX = c(2, 3), CA = c(4, 25000), AL = c(2, 21000), ND = c(21000, 6)) Now I want to find out which state produced most oil in a given year. I tried this: attach(oil) last_year = oil[ c(YEAR == 2012), ] max(last_year) Which works, but it doesn't give me the corresponding values (i.e. it just gives me the maximum output, not what state it's from). So I tried this: oil[c(oil == max(last_year)),] and this: oil[c(last_year == max(last_year)),] and this: oil[which.max(last_year),] and this: last_year[max(last_year),] None of them work, but they don't give error messages either; the output is just NA. The problem is, in my eyes, that I'm comparing the values of different variables with each other. Because if I change the structure of the data frame (which I can't do with the real data, at least not without doing it by hand with a huge dataset), it looks like this and works perfectly: oil2 <- data.frame( names = c('YEAR', 'TX', 'CA', 'AL', 'ND'), oil_2011 = c(2011, 2, 4, 2, 21000), oil_2012 = c(2012, 3, 25000, 21000, 6) ) attach(oil2) oil2[c(oil_2012 == max(oil_2012)),] Any help is much appreciated. Thanks, Tim Umbach __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
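One base-R way to get the state name rather than the value, keeping the original wide layout (a sketch using the example data; top_state is an illustrative helper, and no attach() is needed):

```r
oil <- data.frame(YEAR = c(2011, 2012),
                  TX = c(2, 3), CA = c(4, 25000),
                  AL = c(2, 21000), ND = c(21000, 6))

# Peel off the state columns for the requested year, then ask which.max
# for the *name* of the largest one rather than its value.
top_state <- function(dat, year) {
  row <- dat[dat$YEAR == year, -1]          # drop the YEAR column
  names(row)[which.max(unlist(row))]
}
top_state(oil, 2011)   # "ND"
top_state(oil, 2012)   # "CA"
```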
Re: [R] update.packages fails on pdf write - CSAgent
For the archives, the solution I've come up with is to complete the install in a location where write permissions are allowed for exe files and then simply copy the extracted files from this folder to R's library folder, using the components of update.packages as follows: #get a list of old packages x <- old.packages() #download the zips download.packages(pkgs = x, destdir = "C:/pathToLocationWhereWritesAreAllowed") #install them in the safe folder setwd("C:/pathToLocationWhereWritesAreAllowed") y <- list.files(pattern = "\\.zip$") install.packages(pkgs = y, repos = NULL, lib = "C:/pathToLocationWhereWritesAreAllowed") #now manually copy the folders to the R installation and overwrite the existing ones. Tim Date: Wed, 02 Oct 2013 11:17:29 -0400 From: Tim Howard tghow...@gw.dec.state.ny.us To: r-help@r-project.org Subject: [R] update.packages fails on pdf write - CSAgent Message-ID: 524c00c902d500b80...@gwsmtp.dec.state.ny.us Content-Type: text/plain; charset=UTF-8 All, I work in a building where our computers are running Windows XP with Cisco Security Agent (CSA) activated, locking many automated actions, especially file writes.
[R] update.packages fails on pdf write - CSAgent
All, I work in a building where our computers are running Windows XP with Cisco Security Agent (CSA) activated, locking many automated actions, especially file writes. We (lowly staff) are allowed to install software, so on upgrading to R 3.0.2, I tried the standard approach for updating my packages: Copying the non-base folders from my older installation into the library folder and running update.packages(checkBuilt=TRUE, ask=FALSE) R merrily downloads from the mirror and then continues installation -- until it gets to a package that has a pdf to write, as follows: update.packages(checkBuilt=TRUE, ask=FALSE) --- Please select a CRAN mirror for use in this session --- trying URL 'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/akima_0.5-11.zip' Content type 'application/zip' length 370759 bytes (362 Kb) opened URL downloaded 362 Kb trying URL 'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/bitops_1.0-6.zip' Content type 'application/zip' length 35878 bytes (35 Kb) opened URL downloaded 35 Kb trying URL 'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/chron_2.3-44.zip' Content type 'application/zip' length 105192 bytes (102 Kb) opened URL downloaded 102 Kb trying URL 'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/colorspace_1.2-4.zip' Content type 'application/zip' length 384585 bytes (375 Kb) opened URL downloaded 375 Kb .. many more packages ... package ‘akima’ successfully unpacked and MD5 sums checked package ‘bitops’ successfully unpacked and MD5 sums checked package ‘chron’ successfully unpacked and MD5 sums checked Error in unzip(zipname, exdir = dest) : cannot open file 'C:/Program Files/R/R-3.0.2/library/file8ec2cd16e8/colorspace/doc/hcl-colors.pdf': Permission denied Here's the message from the Application log in my Event Viewer: The process 'C:\Program Files\R\R-3.0.2\bin\i386\Rgui.exe' (as user myusername) attempted to access 'C:\Program Files\R\R-3.0.2\library\file8ec2cd16e8\colorspace\doc\hcl-colors.pdf'. 
The attempted access was a write (operation = OPEN/CREATE). The operation was denied. So it seems to me that CSA is blocking the installation of this package because of the PDF write attempt. Please correct me if my interpretation is wrong! I can re-install these packages manually, but would love to do it automatically. My Questions: It seems that the pdf is 'extra' since it isn't required for all packages. Is there a way for me to tell update.packages to not install these extras? I don't see that as an option at ?update.packages, but I may be missing it. Alternatively, is there an easy way to identify and download the zips needed in a folder of my choice and then finish the install manually? sessionInfo() R version 3.0.2 (2013-09-25) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_3.0.2 Thanks for any help Tim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] GGally installation problems
Hi, I was trying to install the GGally package, but was getting errors. Here is what I get: install.packages("GGally") Installing package(s) into '/Users/ts2w/Library/R/2.15/library' (as 'lib' is unspecified) Warning in install.packages : package 'GGally' is not available (for R version 2.15.2) install.packages("GGally", repos="http://cran-r.project.org") Installing package(s) into '/Users/ts2w/Library/R/2.15/library' (as 'lib' is unspecified) Warning in install.packages : cannot open: HTTP status was '404 Not Found' Warning in install.packages : cannot open: HTTP status was '404 Not Found' Warning in install.packages : unable to access index for repository http://cran-r.project.org/src/contrib Warning in install.packages : package 'GGally' is not available (for R version 2.15.2) Warning in install.packages : cannot open: HTTP status was '404 Not Found' Warning in install.packages : cannot open: HTTP status was '404 Not Found' Warning in install.packages : unable to access index for repository http://cran-r.project.org/bin/macosx/leopard/contrib/2.15 thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
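The 404s are consistent with how install.packages() derives its index URLs from the repos argument: the repository given above is cran-r.project.org, whereas the CRAN host is cran.r-project.org, so every derived index URL fails. A small base-R illustration of the derivation:

```r
# contrib.url() is what install.packages() uses to build the index path
# from a repos string; a mistyped host propagates into every lookup.
url <- contrib.url("http://cran-r.project.org", type = "source")
url  # "http://cran-r.project.org/src/contrib" -- the URL that 404'd above
```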
[R] #Keeping row names when using as.data.frame.matrix
#question I have the following data set: Date <- c("9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/8/2010") EstimatedQuantity <- c(3535, 2772, 3279, 3411, 3484, 3274, 3305) ScowNo <- c("4001", "3002", "4002", "BR 8", "4002", "BR 8", "4001") dataset <- data.frame(EstimatedQuantity, Date, ScowNo) #I'm trying to convert the data set into a contingency table and then back into a regular data frame: xtabdata <- as.data.frame.matrix(xtabs(EstimatedQuantity ~ Date + ScowNo, data=dataset), row.names=(dataset$Date), optional=F) #I'm trying to keep the row names (in xtabdata) as the dates. #But the row names keep coming up as integers. #How can I preserve the row names as dates when #the table is converted back to a data frame? -- View this message in context: http://r.789695.n4.nabble.com/Keeping-row-names-when-using-as-data-frame-matrix-tp4667344.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
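For reference, the dimnames of the xtabs result carry over as row names on their own when the row.names argument is left out (a sketch using the data above):

```r
Date <- c("9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/8/2010")
EstimatedQuantity <- c(3535, 2772, 3279, 3411, 3484, 3274, 3305)
ScowNo <- c("4001", "3002", "4002", "BR 8", "4002", "BR 8", "4001")
dataset <- data.frame(EstimatedQuantity, Date, ScowNo)

tab <- xtabs(EstimatedQuantity ~ Date + ScowNo, data = dataset)
# as.data.frame.matrix() keeps the table's dimnames as row/column names;
# it is the explicit row.names= argument that clobbers them.
xtabdata <- as.data.frame.matrix(tab)
rownames(xtabdata)  # "9/7/2010" "9/8/2010"
```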
Re: [R] read.csv quotes within fields
Yes! This does the trick. Thank You! Tim peter dalgaard pda...@gmail.com 1/26/2013 11:49 AM On Jan 26, 2013, at 16:32 , Tim Howard wrote: Duncan, Good point - I guess I am expecting too much. I'll work on a global replace before import or chopping with strsplit while importing. FYI everyone - these folks with the non-standard csv are the US Feds: https://oeaaa.faa.gov/oeaaa/external/public/publicAction.jsp?action=showCaseDownloadForm (see off-airport AEA 2005 for a csv with issues.) Does this do the trick? dd <- read.csv(text=gsub("\\\"", "\"\"", fixed=TRUE, readLines("~/Downloads/OffAirportAEA2005List.csv"))) Thank you for the help. I really do appreciate the suggested solutions. Also, thanks to John Kane for the link to csvEdit. Tim Duncan Murdoch murdoch.dun...@gmail.com 01/25/13 6:37 PM On 13-01-25 4:37 PM, Tim Howard wrote: David, Thank you again for the reply. I'll try to make readLines() and strsplit() work. What bugs me is that I think it would import fine if the folks who created the csv had used double quotes rather than an escaped quote \" for those pesky internal quotes. Since that's the case, I'd think there would be a solution within read.csv() ... or perhaps scan()?, I just can't figure it out. What you say doesn't make sense. Let me paraphrase: The folks who sent me the data created a weird, non-standard .csv file. You'd think read.csv() could handle it. The standard way to handle quotes in strings is to double them. They didn't, so you've got a weird file on your hands. You can probably fix it by doing a global substitution of all backslash doublequote pairs with doublequote doublequote. Unless they have some backslash backslash doublequote triples that they want you to interpret as backslash doublequote. Or some other weirdness. I'd suggest you try the global subst mentioned above, or ask them for data in a reasonably standardized format.
Duncan Murdoch best, Tim David Winsemius dwinsem...@comcast.net 1/25/2013 4:16 PM On Jan 25, 2013, at 11:35 AM, Tim Howard wrote: Great point, your fix (quote="") works for the example I gave. Unfortunately, these text strings have commas in them as well(!). Throw a few commas in any of the text strings and it breaks again. Sorry about not including those in the example. So, I need to incorporate commas *and* quotes with the escape character within a single string. Well you need to have _some_ delimiter. At the moment it sounds as though you might end up using readLines() and strsplit( . , split=\\'\\,\\s\\). __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
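A self-contained sketch of the global substitution Duncan describes, run on a two-line stand-in for the FAA file (the field names and values are made up for illustration):

```r
# A tiny stand-in for the non-standard quoting: internal quotes are
# escaped with \" instead of being doubled, as in the FAA export.
raw <- c('id,comment,value',
         '1,"a \\"quoted\\" word, with comma",10')

# Turn each \" into "" so the text is standard CSV, then parse it.
fixed <- gsub('\\"', '""', raw, fixed = TRUE)
dd <- read.csv(text = fixed, stringsAsFactors = FALSE)
dd$comment
dd$value
```

The same gsub() can be applied to readLines() output from the real file before handing it to read.csv(text=...), which is exactly the shape of the fix that resolved the thread.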
Re: [R] read.csv quotes within fields
Duncan, Good point - I guess I am expecting too much. I'll work on a global replace before import or chopping with strsplit() while importing. FYI everyone - these folks with the non-standard csv are the US Feds: https://oeaaa.faa.gov/oeaaa/external/public/publicAction.jsp?action=showCaseDownloadForm (see off-airport AEA 2005 for a csv with issues.) Thank you for the help. I really do appreciate the suggested solutions. Also, thanks to John Kane for the link to csvEdit. Tim

Duncan Murdoch murdoch.dun...@gmail.com 01/25/13 6:37 PM
On 13-01-25 4:37 PM, Tim Howard wrote:

David, Thank you again for the reply. I'll try to make readLines() and strsplit() work. What bugs me is that I think it would import fine if the folks who created the csv had used doubled quotes ("") rather than an escaped quote (\") for those pesky internal quotes. Since that's the case, I'd think there would be a solution within read.csv() ... or perhaps scan()? I just can't figure it out.

What you say doesn't make sense. Let me paraphrase: "The folks who sent me the data created a weird, non-standard .csv file. You'd think read.csv() could handle it." The standard way to handle quotes in strings is to double them. They didn't, so you've got a weird file on your hands. You can probably fix it by doing a global substitution of all backslash-doublequote pairs with doublequote-doublequote. Unless they have some backslash-backslash-doublequote triples that they want you to interpret as backslash-doublequote. Or some other weirdness. I'd suggest you try the global substitution mentioned above, or ask them for data in a reasonably standardized format.

Duncan Murdoch

best, Tim

David Winsemius dwinsem...@comcast.net 1/25/2013 4:16 PM
On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:

Great point, your fix (quote="") works for the example I gave. Unfortunately, these text strings have commas in them as well(!). Throw a few commas in any of the text strings and it breaks again. Sorry about not including those in the example.
So, I need to incorporate commas *and* quotes with the escape character within a single string.

Well, you need to have _some_ delimiter. At the moment it sounds as though you might end up using readLines() and strsplit( . , split='\\",\\s\\"').

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] read.csv quotes within fields
All, I have some csv files I am trying to import. I am finding that quotes inside strings are escaped in a way R doesn't expect for csv files. The problem only seems to rear its ugly head when there are an uneven number of internal quotes. I'll try to recreate the problem:

# set up a matrix, using escape-quote (\") as the internal double quote mark.
x <- data.frame(matrix(data = c(1, "string one", "another string",
                                2, "quotes escaped 10' 20\" 5' 30\" \"test string", "final string",
                                3, "third row", "last \" col"),
                       ncol = 3, byrow = TRUE))
write.csv(x, "test.csv")
# NOTE that write.csv correctly escaped the three internal quotes (") by doubling them ("").
# here's what got written
"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
"3","3","third row","last "" col"

# Importing test.csv works fine.
read.csv("test.csv")
  X X1                                           X2             X3
1 1  1                                   string one another string
2 2  2 quotes escaped 10' 20" 5' 30" "test string    final string
3 3  3                                    third row     last " col
# this looks good.

# now, please go and open test.csv with a text editor and replace all the doubled
# quotes ("") with the escaped quote (\") as is found in my data set. Like this:
"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
"3","3","third row","last \" col"

# this breaks read.csv:
read.csv("test.csv")
# we now have only two rows, with all the data from rows 2 and 3 captured in col2 row2

Any suggestions on how to fix this behavior? I've tried fiddling with quote="\"" to no avail, obviously. Interestingly, an even number of escaped quotes within a field is loaded correctly, which certainly threw me for a while!
Thank you in advance, Tim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
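The fix suggested elsewhere in this thread (Peter Dalgaard's gsub() idea) can be sketched end to end. The three lines of `bad` below are a hypothetical miniature of the problem file, not the FAA data itself:

```r
# Hypothetical miniature of the problem file: internal quotes escaped
# with backslashes (\") instead of being doubled ("").
bad <- c('"","X1","X2"',
         '"1","1","quotes escaped 10\' 20\\" 5\' 30\\""',
         '"2","2","last \\" col"')

# Rewrite each \" as the CSV-standard "" and then parse normally.
dd <- read.csv(text = gsub('\\"', '""', bad, fixed = TRUE))
dd$X2[2]   # last " col
```

Because fixed = TRUE the pattern is a literal backslash-quote pair, so no regex escaping surprises apply.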
Re: [R] read.csv quotes within fields
Drat, I forgot to tell you what system I am on:

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base

Tim Howard 1/25/2013 1:42 PM
All, I have some csv files I am trying to import. I am finding that quotes inside strings are escaped in a way R doesn't expect for csv files. The problem only seems to rear its ugly head when there are an uneven number of internal quotes. I'll try to recreate the problem:

# set up a matrix, using escape-quote (\") as the internal double quote mark.
x <- data.frame(matrix(data = c(1, "string one", "another string",
                                2, "quotes escaped 10' 20\" 5' 30\" \"test string", "final string",
                                3, "third row", "last \" col"),
                       ncol = 3, byrow = TRUE))
write.csv(x, "test.csv")
# NOTE that write.csv correctly escaped the three internal quotes (") by doubling them ("").

# Importing test.csv works fine. Now open test.csv with a text editor and replace
# all the doubled quotes ("") with the escaped quote (\") as is found in my data set:
"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
"3","3","third row","last \" col"

# this breaks read.csv:
read.csv("test.csv")
# we now have only two rows, with all the data from rows 2 and 3 captured in col2 row2

Any suggestions on how to fix this behavior?
I've tried fiddling with quote="\"" to no avail, obviously. Interestingly, an even number of escaped quotes within a field is loaded correctly, which certainly threw me for a while! Thank you in advance, Tim

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.csv quotes within fields
Great point, your fix (quote="") works for the example I gave. Unfortunately, these text strings have commas in them as well(!). Throw a few commas in any of the text strings and it breaks again. Sorry about not including those in the example. So, I need to incorporate commas *and* quotes with the escape character within a single string. Tim

David Winsemius dwinsem...@comcast.net 1/25/2013 2:27 PM
On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:

All, I have some csv files I am trying to import. I am finding that quotes inside strings are escaped in a way R doesn't expect for csv files. The problem only seems to rear its ugly head when there are an uneven number of internal quotes. (Reproducible example with escape-quoted internal quotes, as above; editing test.csv to use \" in place of "" makes read.csv() fail.)

Use quote="":

read.csv(text=',X1,X2,X3
+ 1,1,string one,another string
+ 2,2,quotes escaped 10\' 20" 5\' 30" "test string,final string
+ 3,3,third row,last " col', sep=",", quote="")
# not quote="\""
  X.. X.X1.                                        X.X2.          X.X3.
1   1     1                                   string one another string
2   2     2 quotes escaped 10' 20" 5' 30" "test string    final string
3   3     3                                    third row      last " col

You will then be depending entirely on commas to separate. (Needed to use escaped single quotes to illustrate from a command line.)

# this breaks read.csv:
read.csv("test.csv")
# we now have only two rows, with all the data from rows 2 and 3 captured in col2 row2

Any suggestions on how to fix this behavior? I've tried fiddling with quote="\"" to no avail, obviously. Interestingly, an even number of escaped quotes within a field is loaded correctly, which certainly threw me for a while! Thank you in advance, Tim

David Winsemius Alameda, CA, USA

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.csv quotes within fields
David, Thank you again for the reply. I'll try to make readLines() and strsplit() work. What bugs me is that I think it would import fine if the folks who created the csv had used doubled quotes ("") rather than an escaped quote (\") for those pesky internal quotes. Since that's the case, I'd think there would be a solution within read.csv() ... or perhaps scan()? I just can't figure it out. best, Tim

David Winsemius dwinsem...@comcast.net 1/25/2013 4:16 PM
On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:

Great point, your fix (quote="") works for the example I gave. Unfortunately, these text strings have commas in them as well(!). Throw a few commas in any of the text strings and it breaks again. Sorry about not including those in the example. So, I need to incorporate commas *and* quotes with the escape character within a single string.

Well, you need to have _some_ delimiter. At the moment it sounds as though you might end up using readLines() and strsplit( . , split='\\",\\s\\"').

-- david.

Tim

David Winsemius dwinsem...@comcast.net 1/25/2013 2:27 PM
On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:

All, I have some csv files I am trying to import. I am finding that quotes inside strings are escaped in a way R doesn't expect for csv files. The problem only seems to rear its ugly head when there are an uneven number of internal quotes. (Reproducible example with escape-quoted internal quotes, as above; editing test.csv to use \" in place of "" makes read.csv() fail, leaving only two rows with all the data from rows 2 and 3 captured in col2 row2.)

Use quote="":

read.csv(text=',X1,X2,X3
+ 1,1,string one,another string
+ 2,2,quotes escaped 10\' 20" 5\' 30" "test string,final string
+ 3,3,third row,last " col', sep=",", quote="")
# not quote="\""
  X.. X.X1.                                        X.X2.          X.X3.
1   1     1                                   string one another string
2   2     2 quotes escaped 10' 20" 5' 30" "test string    final string
3   3     3                                    third row      last " col

You will then be depending entirely on commas to separate. (Needed to use escaped single quotes to illustrate from a command line.)

Any suggestions on how to fix this behavior? I've tried fiddling with quote="\"" to no avail, obviously. Interestingly, an even number of escaped quotes within a field is loaded correctly, which certainly threw me for a while! Thank you in advance, Tim

David Winsemius Alameda, CA, USA

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Filling Lists or Arrays of variable dimensions
Jessica, In terms of initializing your list for populating it in a later step, I think this partially gets at your question -- it removes the upper loop assignment, but then does require a loop to get the second level. Perhaps there's something here you can work with ...

height <- c("high", "low")
width <- c("slim", "wide")
l <- vector("list", length(height))
names(l) <- height
for (i in 1:length(l)) {
  l[[i]] <- vector("list", length(width))
  names(l[[i]]) <- width
}

Best, Tim

Date: Fri, 21 Dec 2012 11:34:06 +0100 From: Jessica Streicher j.streic...@micromata.de To: David Winsemius dwinsem...@comcast.net Cc: R help r-help@r-project.org Subject: Re: [R] Filling Lists or Arrays of variable dimensions Message-ID: be0ef9be-a571-45f3-92e6-72e093bb5...@micromata.de Content-Type: text/plain; charset=us-ascii

@David: In my mind it was quite complete enough. @William: Thanks, didn't know you could do that with data frames; if I ever have to do something similar again I might try this.

On 20.12.2012, at 22:39, David Winsemius wrote:
On Dec 20, 2012, at 10:01 AM, Jessica Streicher wrote:
Really must have been unclear at some point, sorry.

Hasn't it become abundantly clear that this would have progressed farther had you posted a complete example? -- David.

William, that's interesting, but not really helping the main problem, which is: how to do

l[[ as.character(grid[1, ]) ]] <- 1

without having initialized the list in the loop before. Well, or how to initialize it without having to do the loop thing, because the loop stuff can only be done for a specific set of parameter vectors. But those change, and I don't want to have to write another loop construct every time for the new version. I want to say: hey, I have these vectors here with these values (my parameters), could you build me that nested list structure (tree - whatever) from it? And the function will give me that structure whatever I give it, without me needing to intervene in the form of changing the code.
-- Clarification - First: I am not computing statistics over the parameters. I'm computing stuff from other data, and the computation is affected by the parameters. I am computing classifiers for different sets of parameters for those classifiers. So the result of doSomething() isn't a simple value. It's usually a list of 6 lists (doing cross validation), which in turn have the classifier object, some statistics of the classifier (e.g. what was misclassified), and the subsets of data used in them. That doesn't really fit in a data.frame, hence the use of lists. I want the nested lists because it helps me find stuff in the object browser faster, and because all my other code is already geared towards it. If I had the time I might still go for a flat structure that everyone keeps telling me to use (got a few mails off the list), but I really haven't the time. If there's no good way, I'll just keep things as they are now.

On 20.12.2012, at 18:37, William Dunlap wrote:
Arranging data as a list of lists of lists of lists [...] of scalar values generally will lead to slow and hard-to-read R code, mainly because R is designed to work on long vectors of simple data. If you were to start over, consider constructing a data.frame with one column for each attribute. Then tools like aggregate and the plyr functions would be useful. However, your immediate problem may be solved by creating your 'grid' object as a data.frame of character, not factor, columns, because as.character works differently on lists of scalar factors and lists of scalar characters.
Usually as.mode(x), when x is a list of length-1 items, gives the same result as as.mode(unlist(x)), but not when x is a list of length-1 factors:

height <- c("high", "low")
width <- c("slim", "wide")
gridF <- expand.grid(height, width, stringsAsFactors=FALSE)
gridT <- expand.grid(height, width, stringsAsFactors=TRUE)
as.character(gridF[1,])
[1] "high" "slim"
as.character(gridT[1,])
[1] "1" "1"
as.character(unlist(gridT[1,])) # another workaround
[1] "high" "slim"

Your example was not self-contained, so I changed the call to doSomething() to paste(h, w, sep="/"):

height <- c("high", "low")
width <- c("slim", "wide")
l <- list()
for (h in height) {
  l[[h]] <- list()
  for (w in width) {
    l[[h]][[w]] <- paste(h, w, sep="/") # doSomething()
  }
}
grid <- expand.grid(height, width, stringsAsFactors=FALSE)
as.character(grid[1,]) # [1] "high" "slim", not the [1] "1" "1" you get with stringsAsFactors=TRUE
l[[ as.character(grid[1, ]) ]] # [1] "high/slim"
l[[ as.character(grid[1, ]) ]] <- 1
l[[ as.character(grid[1, ]) ]] # [1] 1

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jessica Streicher Sent: Thursday, December 20, 2012 8:43 AM To: Chris Campbell Cc: R help
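For the "build me that nested structure from whatever parameter vectors I have" wish above, a small recursive helper is one way to avoid rewriting the loops whenever the parameters change. This is a sketch, and makeNested() is a made-up name, not from the thread:

```r
# Sketch: build an empty nested-list skeleton from any number of
# parameter vectors; leaves are NULL until filled.
makeNested <- function(params) {
  l <- vector("list", length(params[[1]]))
  names(l) <- params[[1]]
  if (length(params) > 1)
    for (i in seq_along(l)) l[[i]] <- makeNested(params[-1])
  l
}

height <- c("high", "low")
width  <- c("slim", "wide")
l <- makeNested(list(height, width))
l[["high"]][["slim"]] <- "classifier result here"  # fill one leaf
```

Adding a third parameter vector just means passing a longer list to makeNested(); no new loop code is required.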
Re: [R] boot() with glm/gnm on a contingency table
One approach is to bootstrap the vector 1:n, where n is the number of individuals, with a function that does:

f <- function(vectorOfIndices, theTable) {
  # (1) create a new table with the same dimensions, but with the counts
  #     in the table based on vectorOfIndices.
  # (2) calculate the statistics of interest on the new table.
}

When f is called with 1:n, the table it creates should be the same as the original table. When called with a bootstrap sample of values from 1:n, it should create a table corresponding to the bootstrap sample.

Tim Hesterberg http://www.timhesterberg.net (resampling, water bottle rockets, computers to Costa Rica, shower = 2650 light bulbs, ...) NEW! Mathematical Statistics with Resampling and R, Chihara Hesterberg http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8

Hi everyone! In a package I'm developing, I have created a custom function to get jackknife standard errors for the parameters of a gnm model (which is essentially the same as a glm model for this issue). I'd like to add support for bootstrap using package boot, but I couldn't find how to proceed. The problem is, my data is a table object. Thus, I don't have one individual per line: when the object is converted to a data frame, one row represents one cell, or one combination of factor levels. I cannot pass this to boot() as the data argument and use indices from my custom statistic() function, since I would drop cells, not individual observations. A very inefficient solution would be to create a data frame with one row per observation, by replicating each cell using its frequencies. That's really a brute force solution, though. ;-) The other way would be to generate importance weights based on observed frequencies, and to multiply the original data by the weights at each iteration, but I'm not sure that's correct. Thoughts? Thanks for your help!
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] boot() with glm/gnm on a contingency table
On Wednesday, September 12, 2012, at 07:08 -0700, Tim Hesterberg wrote:

One approach is to bootstrap the vector 1:n, where n is the number of individuals, with a function that does:

f <- function(vectorOfIndices, theTable) {
  # (1) create a new table with the same dimensions, but with the counts
  #     in the table based on vectorOfIndices.
  # (2) calculate the statistics of interest on the new table.
}

When f is called with 1:n, the table it creates should be the same as the original table. When called with a bootstrap sample of values from 1:n, it should create a table corresponding to the bootstrap sample.

Indeed, that's another solution I considered, but I wanted to be sure nothing more reasonable exists. You're right that it's more efficient than replicating the whole data set. But still, with a typical table of less than 100 cells and several thousands of observations, this means creating a potentially long vector, much larger than the original data; nothing really hard with common machines, to be sure. If no other way exists, I'll use this. Thanks.

In your original posting you also suggested: The other way would be to generate importance weights based on observed frequencies, and to multiply the original data by the weights at each iteration, but I'm not sure that's correct. Thoughts?

You could do:

bootstrapTable <- x  # where x is the original table
for (i in 1:numberOfBootstrapSamples) {
  bootstrapTable[] <- rmultinom(1, size = sum(x), prob = x)
  replicate[i] <- myFunction(bootstrapTable)
}
# caveat - not tested

I can't tell from help(boot) whether you could do it correctly there. boot has a 'weights' argument that you could use for the sampling probabilities, but you also need a way to tell it to draw sum(x) observations. Or, you could also pass boot a parametric sampler.
But be careful if you use boot in either of these ways; you not only need to generate the bootstrap samples, you also need to make sure that it does all other calculations correctly, including calculating the statistic for the original data, calculating jackknife statistics if they are used for confidence intervals, etc. Wistful sigh - this would be pretty easy to do with S+Resample. Tim Hesterberg

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
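The rmultinom() loop sketched in this thread can be made runnable as follows. The 2x2 table and the placeholder statistic are made up for illustration; myFunction stands in for whatever statistic the model computes:

```r
set.seed(1)
x <- as.table(matrix(c(10, 20, 30, 40), nrow = 2))  # toy contingency table
myFunction <- function(tab) tab[1, 1] / sum(tab)    # placeholder statistic

B <- 200
replicates <- numeric(B)
bootstrapTable <- x
for (i in 1:B) {
  # draw sum(x) observations with cell probabilities proportional to x;
  # assigning into bootstrapTable[] keeps the original dimensions
  bootstrapTable[] <- rmultinom(1, size = sum(x), prob = x)
  replicates[i] <- myFunction(bootstrapTable)
}
sd(replicates)  # bootstrap standard error of the statistic
```

Every resampled table has the same total count as the original, which is the property the thread is after.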
Re: [R] Histogram to KDE
To bootstrap from a histogram, use

sample(bins, replace = TRUE, prob = counts)

Note that a kernel density estimate is biased, so some bootstrap confidence intervals have poor coverage properties. Furthermore, if the kernel bandwidth is data-driven then the estimate is not functional, so some bootstrap and jackknife methods won't work right.

Tim Hesterberg http://www.timhesterberg.net New: Mathematical Statistics with Resampling and R, Chihara Hesterberg

On Fri, Aug 31, 2012 at 12:15 PM, David L Carlson dcarl...@tamu.edu wrote:

Using a data.frame x with columns bins and counts:

x <- structure(list(bins = c(3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5),
                    counts = c(1, 1, 2, 3, 6, 18, 19, 23, 8, 10, 6, 2, 1)),
               .Names = c("bins", "counts"), row.names = 4:16, class = "data.frame")

This will give you a plot of the kde estimate:

xkde <- with(x, density(rep(bins, counts), bw = "SJ"))
plot(xkde)

Thanks.

As for the standard error or the confidence interval, you would probably need to use bootstrapping.

On a similar note - is there a way in R to directly resample / cross-validate from a histogram of a data-set without recreating the original data-set?

-Original Message-
Hello, I wanted to know if there was a way to convert a histogram of a data-set to a kernel density estimate directly in R? Specifically, I have a histogram [bins, counts] of samples {X1 ... XN} of a quantized variable X where there is one bin for each level of X, and I'd like to directly get a kde estimate of the pdf of X from the histogram. Therefore, there is no additional quantization of X in the histogram. Most KDE methods in R seem to require the original sample set - and I would like to avoid re-creating the samples from the histogram. Is there some quick way of doing this using one of the standard kde methods in R? Also, a general statistical question - is there some measure of the standard error or confidence interval or similar of a KDE of a data-set?
Thanks, -fj [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
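The sample(bins, replace = TRUE, prob = counts) suggestion in this thread can be written out as a short sketch; the bins and counts below are a small made-up histogram, not the data from the thread:

```r
# Resample directly from the histogram: draw bin centres with
# probability proportional to their counts.
bins   <- c(3.5, 4.5, 5.5, 6.5, 7.5)
counts <- c(1, 2, 6, 3, 1)

set.seed(42)
boot <- sample(bins, size = sum(counts), replace = TRUE, prob = counts)
boot_kde <- density(boot)   # kde of one bootstrap resample
```

Repeating the sample()/density() pair B times gives a collection of KDE curves from which pointwise intervals can be read off, subject to the bias caveats noted above.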
[R] need help reshaping table using aggregate
I am trying to learn how to reshape my data set. I am new to R, so please bear with me. Basically, I have the following data set:

site <- c("A", "A", "B", "B")
bug <- c("spider", "grasshopper", "ladybug", "stinkbug")
count <- c(2, 4, 6, 8)
myf <- data.frame(site, bug, count)
myf
  site         bug count
1    A      spider     2
2    A grasshopper     4
3    B     ladybug     6
4    B    stinkbug     8

This means that in site A, I found 2 spiders and 4 grasshoppers. In site B, I found 6 ladybugs and 8 stinkbugs. I would like to change the df to aggregate the site column and make the bugs columns, so it is arranged like this:

  site spider grasshopper ladybug stinkbug
1    A      2           4       0        0
2    B      0           0       6        8

-- View this message in context: http://r.789695.n4.nabble.com/need-help-reshaping-table-using-aggregate-tp4634014.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
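The digest does not include a reply here, but one base-R way to get the requested layout is xtabs(), which fills unobserved site/bug combinations with 0:

```r
site  <- c("A", "A", "B", "B")
bug   <- c("spider", "grasshopper", "ladybug", "stinkbug")
count <- c(2, 4, 6, 8)
myf   <- data.frame(site, bug, count)

# cross-tabulate: one row per site, one column per bug, 0 where absent
wide <- xtabs(count ~ site + bug, data = myf)
wide["A", "spider"]   # 2
wide["A", "ladybug"]  # 0
```

as.data.frame.matrix(wide) converts the result back to a data frame if the table class is inconvenient.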
Re: [R] ggplot2: legend for geom_rug() ..?
Hi, Here is the corrected code:

library(ggplot2)
library(reshape2)  # for melt()
ids <- paste('id_', 1:3, sep='')
before <- sample(9)
after <- sample(1:10, 9)
dat <- as.matrix(cbind(before, after))
rownames(dat) <- rep(ids, 3)
position <- c(rep(10, 3), rep(13, 3), rep(19, 3))
mdat <- cbind(melt(dat), position)
colnames(mdat) <- c('ID', 'Treatment', 'value', 'position')
ggplot(mdat, aes(position, value)) + geom_point(aes(colour = Treatment)) +
  geom_rug(subset = .(position > 14), aes(y = NULL), color = "orange") +
  geom_rug(subset = .(position < 14), aes(y = NULL), color = "black")

Alternatively, how do I add a second legend in ggplot2? thanks!

From: John Kane jrkrid...@inbox.com To: Brian Smith bsmith030...@gmail.com; r-help@r-project.org Sent: Wednesday, June 6, 2012 3:06 PM Subject: Re: [R] ggplot2: legend for geom_rug() ..?

What is X2? Code not running at the moment. John Kane Kingston ON Canada

-Original Message- From: bsmith030...@gmail.com Sent: Wed, 6 Jun 2012 11:52:25 -0400 To: r-help@r-project.org Subject: [R] ggplot2: legend for geom_rug() ..?

Hi, I was trying to make another legend for the rug plot. Sample code:

library(ggplot2)
ids <- paste('id_', 1:3, sep='')
before <- sample(9)
after <- sample(1:10, 9)
dat <- as.matrix(cbind(before, after))
rownames(dat) <- rep(ids, 3)
position <- c(rep(10, 3), rep(13, 3), rep(19, 3))
mdat <- cbind(melt(dat), position)
ggplot(mdat, aes(position, value)) + geom_point(aes(colour = X2)) +
  geom_rug(subset = .(position > 14), aes(y = NULL), color = "orange") +
  geom_rug(subset = .(position < 14), aes(y = NULL), color = "black")

This gives the plot correctly, but how can I add another legend that would give some more information on the rugplot (e.g. that 'orange' line = 'London', and 'black' line = 'NYC')? thanks!!

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] dataframe package
A new 'dataframe' package is on CRAN. This is a modified version of the data frame code in R, modified to make fewer copies of the inputs and run faster, e.g. as.data.frame(a vector) makes 1 copy of the vector rather than 3, data.frame(a vector) makes 3 copies rather than 6, and x[, "a"] <- newValue makes (2,2,1) copies of (old data frame, new column, integer vector of row names) rather than (6,4,2). Functionality should be identical to existing R code, with two exceptions: * bug fix: if x has two columns, then x[[4]] <- value will fail rather than producing an illegal data frame * round(a data frame with numeric and factor columns) rounds the numeric columns and leaves the factor columns unchanged, rather than failing.

Tim Hesterberg NEW! Mathematical Statistics with Resampling and R, Chihara Hesterberg http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8 http://home.comcast.net/~timhesterberg (resampling, water bottle rockets, computers to Costa Rica, shower = 2650 light bulbs, ...)

___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Possible artifacts in cross-correlation function (ccf)?
Dear R-users, I have been using R and its core packages with great satisfaction now for many years, and have recently started using the ccf function (part of the stats package), about which I have a question. The ccf algorithm for calculating the cross-correlation between two time series always calculates the mean and standard deviation per time series beforehand, thereby using a constant value for these irrespective of any time lag. Another piece of statistical software that I'm using, a toolbox in Matlab, does this in a fundamentally different way. It first chops off the parts of the time series that do not overlap once a time lag has been introduced, and then calculates a new mean and standard deviation to be used for further calculations. This latter method has the advantage of always theoretically still being able to obtain a cross-correlation of 1 (or -1), whereas the ccf method of R effectively introduces zeros at the non-overlapping parts of the time series, thereby preventing this possibility and producing very different results. Take for instance the two time series a = {1,3,2} and b = {3,2,1}. The query ccf(a,b) produces the output {-0.5, -0.5, 0.5}, but I would think that a time lag of -1 should produce a cross-correlation here of 1, since the two time series will overlap with identical parts {3,2}. I have attached clean implementations (removing all dependencies) of how the R algorithm seems to calculate cross-correlations with time lag (it produces identical results to ccf), and how this other method (in Matlab) calculates it (with newly calculated means and standard deviations for each time lag). Could someone be so kind as to explain to me why the ccf algorithm has this specific implementation that seems to, at least for specific situations, produce results with artifacts?
It is very likely that the R-implementation, as opposed to the alternative algorithm described above and in the attachment, has a very good statistical explanation, but one that unfortunately is not dawning on me. Sincerely, Tim Dorscheidt __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
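To make the difference concrete, here is a minimal sketch of the two conventions for the poster's example. This mirrors the verbal description above, not the attached implementations.

```r
# R's ccf(): a single global mean/sd per series, computed once
a <- c(1, 3, 2)
b <- c(3, 2, 1)
r_ccf <- ccf(a, b, plot = FALSE)   # stats package, base R

# "Chop and recompute": for lag -1, keep only the overlapping parts
# and correlate just those, using their own means and sds
overlap_a <- a[2:3]   # {3, 2}
overlap_b <- b[1:2]   # {3, 2}
cor(overlap_a, overlap_b)   # 1, as the poster expects for this lag
```

Note that cor() of a perfectly matching overlap is exactly 1 by construction, which is the property the Matlab-style method preserves and the fixed-mean convention gives up in exchange for better variance properties at large lags.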
[R] suggested method to transform list to a matrix
I have data in the following list format:

USER, VARIABLE, COUNT
user1, var1, 3
user1, var2, 4
user2, var1, 7
userN, var12, 5

And would like to have it formatted as a matrix:

      var1 var2 var12
user1    3    4
user2    7
userN             5

What is the suggested method to do this for 1 million rows, with 12 variables and ~ 4,000 unique users? Thank you, Tim Stutt t...@ischool.berkeley.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
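One base-R route is xtabs(), which cross-tabulates the long format in a single pass. A sketch on a tiny stand-in for the real data (column names taken from the post); the reshape package's cast() would be an alternative.

```r
d <- data.frame(
  USER     = c("user1", "user1", "user2", "userN"),
  VARIABLE = c("var1",  "var2",  "var1",  "var12"),
  COUNT    = c(3, 4, 7, 5)
)
# users as rows, variables as columns; absent combinations become 0
m <- xtabs(COUNT ~ USER + VARIABLE, data = d)
```

For ~1 million rows and a 4,000 x 12 result this stays fast, since xtabs() just accumulates counts into a preallocated table.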
Re: [R] [BioC] Read .idat Illumina files in R
Unfortunately, this won't help for expression arrays. Last time I checked, those IDATs appeared to be encrypted. crlmm, methylumi, and minfi can all read IDAT files... *if* they are version 3 or later (e.g. genotyping, methylation, etc.). Otherwise you are probably stuck with GenomeStudio if it is expression data you're dealing with. On Wed, Apr 11, 2012 at 7:14 AM, Moiz Bootwalla msbootwa...@gmail.com wrote: Hi Ekta, You can use the Bioconductor package methylumi to read in .idat files. The function methylumIDAT() will read in .idat files and provide you with beta values along with the M and U values as a MethyLumiSet object. You can then use the functions methylumi.bgcorr() and normalizeMethyLumiSet() to background correct and normalize your data. See the vignette for more details. Best, Moiz On Apr 11, 2012, at 1:46 AM, Ekta Jain wrote: Dear Bioc and R List Users, I am having trouble analysing Illumina data generated from BeadScan. I have .idat files and JPEG images. I realise that I need bead-level summary data to be able to begin quality control followed by normalization. Is there a way I can read .idat files for expression analysis, or do I need to go back to BeadScan and generate .txt/.tiff files? Appreciate any help here. Many Thanks, Ekta Jain 
[[alternative HTML version deleted]] ___ Bioconductor mailing list bioconduc...@r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- *A model is a lie that helps you see the truth.* Howard Skipper http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
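For completeness, the methylumi route Moiz describes looks roughly like the following. This is a hedged sketch based on the vignette of that era: the barcode and path are placeholders, the IDATs must be version 3 or later as noted above, and argument names may differ across Bioconductor releases.

```r
library(methylumi)                        # Bioconductor package
barcodes <- c("5723646052_R02C02")        # hypothetical chip barcode
mset <- methylumIDAT(barcodes, idatPath = "idat/")  # read version >= 3 IDATs
mset.bg   <- methylumi.bgcorr(mset)           # background correction
mset.norm <- normalizeMethyLumiSet(mset.bg)   # normalization
head(betas(mset.norm))                        # beta values
```

(This requires the matching Grn/Red IDAT files to exist under idat/, so it cannot run as-is.)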
Re: [R] detecting noise in data?
You might want to provide an example? It's a pretty vague problem at the moment. If the data can be easily picked out by human eyes, you might want to think about the criteria you're using to pick out a contaminated result. If you can express it in such a way that you don't need to scan each observation (e.g. if a snapper weighs >= 30 kg then somebody entered that data incorrectly) then you can create an indicator variable and continue with your analysis. Other than that - some sort of cluster analysis might be able to pick up on 2 distinct groups, provided within each group there's a reasonable level of homogeneity. Then from there, you can do a basic inference test for group means to detect whether there are significant differences between groups. Cheers, Tim -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Sent: Wednesday, 25 January 2012 9:31 AM To: r-help Subject: Re: [R] detecting noise in data? Hi all, I just wanted to add that I am looking for a solution that's in R ... to handle this... And also, in a given sample, the correct data are in the majority and the noise in the minority. Thank you! On Tue, Jan 24, 2012 at 4:09 PM, Michael comtech@gmail.com wrote: Hi all, I have data which are unfortunately contaminated by noise. We know that the noise is at a different level than the correct data, i.e. the noise data can be easily picked out by human eyes. It looks as if there are two people that generated two very different data sets with different mean levels, and they got mixed together. i.e. assuming the two data sets follow an unknown distribution DF, and the two mean levels are u1 and u2... (unknown) Then the correct data are generated by DF(u1) and the noise are generated by DF(u2), and they got mixed... Now, how do I flag those suspicious data? 
At least, is there a way I could answer the question: given a sample of mixed data, are these data generated from the above-mentioned two sources, or are the data indeed generated from one source only? i.e. are there two substantially distinct species in the given data? Thanks a lot! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
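When no hard cutoff rule exists, the cluster-then-test idea suggested above can be sketched in base R. Simulated data stand in for the poster's; for a proper mixture-model answer to the "one source or two?" question, packages such as mclust or mixtools fit the two-component model and give likelihood-based comparisons.

```r
set.seed(1)
y <- c(rnorm(90, mean = 0), rnorm(10, mean = 5))   # majority + "noise" level

km <- kmeans(y, centers = 2)                       # split into two groups
# flag the minority cluster as the suspicious data
suspect <- km$cluster == which.min(table(km$cluster))
t.test(y[suspect], y[!suspect])                    # are the group means distinct?
```

The t-test is only a rough check here (the groups were chosen to be separated), so treat a small p-value as descriptive rather than as a formal test of "two species"; that formal question is what the mixture-model likelihood ratio addresses.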
[R] bootstrap means by strata
All - I have an uneven set of replicates and would like to sample from this set X number of times to generate a mean for each grouping variable. I was thinking the boot package would be the thing to use, but now I'm not so sure ... given the discussion here: http://finzi.psych.upenn.edu/Rhelp10/2010-June/243828.html Given these data (a small subset):

dput(x)
structure(list(numSpp = c(7, 8, 6, 4, 5, 4, 10, 12, 7, 13, 4, 11, 4, 3, 3, 10, 8, 16, 11, 6, 3, 5, 6, 13, 15, 2, 6, 11, 11, 10, 6, 9, 11, 14, 10, 7), TrSeasYr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Names = c("numSpp", "TrSeasYr"), row.names = c(NA, -36L), class = "data.frame")
# get into R by pasting the above after x <- or dget("clipboard")
# my grouping var is col 2, data are in col 1
# my attempt using boot
library(boot)
mn <- function(d, f) { mean(d[,1] * f[d[,1]]) }
b1 <- boot(data = x, mn, R = 99, stype = "f", strata = x$TrSeasYr)
b1
STRATIFIED BOOTSTRAP
Call: boot(data = x, statistic = mn, R = 99, stype = "f", strata = x$TrSeasYr)
Bootstrap Statistics :
    original      bias    std. error
t1*     8.08   0.1352413    1.681901

What is the most efficient way to resample, with replacement, and generate means for each grouping variable? Thanks in advance, Tim

sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: i386-pc-mingw32/i386 (32-bit)
locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] RODBC_1.3-2 boot_1.3-2
loaded via a namespace (and not attached): [1] tools_2.12.2

Timothy G. Howard, Ph.D. 
Director of Science New York Natural Heritage Program 625 Broadway, 5th floor Albany, NY 12233-4757 (518) 402-8945 facsimile (518) 402-8925 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] bootstrap means by strata
Answering my own question. ?sample (!)

y <- by(x, x$TrSeasYr, function(x) mean(sample(x[,1], size = 999, replace = TRUE)))

Tim Howard

>>> Tim Howard 10/13/2011 9:42 AM >>> [original message, quoted in full above] 
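One caveat on the self-answer: sample(..., size = 999) draws 999 values per group, which understates the bootstrap variability - a bootstrap replicate should resample each group at its own size, with many replicates. A base-R sketch, reusing the dput() data from the post:

```r
x <- data.frame(
  numSpp = c(7, 8, 6, 4, 5, 4, 10, 12, 7, 13, 4, 11, 4, 3, 3,
             10, 8, 16, 11, 6,
             3, 5, 6, 13, 15, 2, 6, 11, 11, 10, 6, 9, 11, 14, 10, 7),
  TrSeasYr = rep(1:3, c(15, 5, 16))
)

# 999 bootstrap means per group, each from a resample of the group's own size
boot_means <- function(v, R = 999) {
  replicate(R, mean(sample(v, size = length(v), replace = TRUE)))
}
res <- tapply(x$numSpp, x$TrSeasYr, boot_means)
sapply(res, mean)  # bootstrap mean per group
sapply(res, sd)    # bootstrap standard error per group
```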
Re: [R] Permutation or Bootstrap to obtain p-value for one sample
I'll concur with Peter Dalgaard that
* a permutation test is the right thing to do - your problem is equivalent to a two-sample test,
* don't bootstrap, and
* don't bother with t-statistics,
but I'll elaborate a bit on why, including
* two approaches to the whole problem - and how your approach relates to the usual approach,
* an interesting tidbit about resampling t statistics.
First, I'm assuming that your x variable is irrelevant, only y matters, and that sample1 is a proper subset of df. I'll also assume that you want to look for differences in the mean, rather than arbitrary differences (in which case you might use e.g. a Kolmogorov-Smirnov test). There are two general approaches to this problem: (1) two-sample problem, sample1$y vs df$y[rows other than sample1] (2) the approach you outlined, thinking of sample1$y as part of df$y. Consider (1), and call the two data sets y1 and y2. The basic permutation test approach is:
* compute the test statistic theta(y1, y2), e.g. mean(y1) - mean(y2)
* repeat R times (e.g. 999 or 9999): draw a sample of size n1 from the pooled data, call that Y1, call the rest Y2, and compute theta(Y1, Y2)
* the P-value for a one-sided test is (1 + k) / (1 + R), where k is the number of permutation samples with theta(Y1, Y2) >= theta(y1, y2).
The test statistic could be:
mean(y1) - mean(y2)
mean(y1)
sum(y1)
t-statistic (pooled variance)
P-value for a t-test (pooled variance)
mean(y1) - mean(pooled data)
t-statistic (unpooled variance)
P-value for a t-test (unpooled variance)
median(y1) - median(y2)
...
The first six of those are equivalent - they give exactly the same P-value for the permutation test. The reason is that those test statistics are monotone transformations of each other, given the data. Hence, doing the pooled-variance t calculations gains nothing. Now consider your approach (2). That is equivalent to the permutation test using the test statistic mean(y1) - mean(pooled data). Why not a bootstrap? E.g. 
pool the data and draw samples of size n1 and n2 from the pooled data, independently and with replacement. That is similar to the permutation test, but less accurate. Probably the easiest way to see this is to suppose there is 1 outlier in the pooled data. In any permutation iteration there is exactly 1 outlier among the two samples. With bootstrapping, there could be 0, 1, 2, ... The permutation test answers the question: given that there is exactly 1 outlier in my combined data, what is the probability that random chance would give a difference as large as I observed? The bootstrap would answer some other question. Tim Hesterberg NEW! Mathematical Statistics with Resampling and R, Chihara Hesterberg http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8 http://home.comcast.net/~timhesterberg (resampling, water bottle rockets, computers to Guatemala, shower = 2650 light bulbs, ...) On Oct 8, 2011, at 16:04, francy wrote: Hi, I am having trouble understanding how to approach a simulation: I have a sample of n=250 from a population of N=2,000 individuals, and I would like to use either a permutation test or the bootstrap to test whether this particular sample is significantly different from the values of any other random samples of the same population. I thought I needed to take random samples (but I am not sure how many simulations I need to do) of n=250 from the N=2,000 population and maybe do a one-sample t-test to compare the mean score of all the simulated samples, plus the one sample I am trying to prove is different from any others, to the mean value of the population. But I don't know: (1) whether this one-sample t-test would be the right way to do it, and how to go about doing this in R (2) whether permutation test or bootstrap methods are more appropriate. This is the data frame that I have, which is to be sampled: df <- i.e.
 x y
 1 2
 3 4
 5 6
 7 8
 . .
 . .
 . . 
2,000 I have this sample from df, and would like to test whether it has extreme values of y:

sample1 <- i.e.
 x y
 3 4
 7 8
 . .
 . .
 . .
 250

For now I only have this:

R <- 999          # number of simulations, but I don't know how many...
t.values <- numeric(R)  # numeric vector with 999 elements, to hold the result of each simulation
for (i in 1:R) {
  sample1 <- df[sample(nrow(df), 250, replace=TRUE), ]

But I don't know how to continue the loop: do I calculate the mean for each simulation and compare it to the population mean? Any help you could give me would be very appreciated, Thank you.

The straightforward way would be a permutation test, something like this:

msamp <- mean(sample1$y)
mpop <- mean(df$y)
msim <- replicate(10000, mean(sample(df$y, 250)))
sum(abs(msim - mpop) >= abs(msamp - mpop)) / 10000

I don't really see a reason to do bootstrapping here. You say you want to test whether your sample could be a random sample from the population, so just simulate that sampling
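Pulling the thread together, the permutation test outlined above can be written as a self-contained sketch; simulated values stand in for df$y and sample1$y, and the +1 correction from the earlier message is applied.

```r
set.seed(42)
y1 <- rnorm(250, mean = 0.2)   # the sample of interest
y2 <- rnorm(1750)              # the remaining "population" values
pooled <- c(y1, y2)
n1 <- length(y1)
theta_obs <- mean(y1) - mean(y2)

R <- 9999
theta_perm <- replicate(R, {
  idx <- sample(length(pooled), n1)        # random relabelling of the pool
  mean(pooled[idx]) - mean(pooled[-idx])
})
# one-sided P-value with the +1 correction
p <- (1 + sum(theta_perm >= theta_obs)) / (1 + R)
p
```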
[R] Can R be installed in a Hyper-V environment?
We are contemplating the installation of R on a Windows server with Hyper-V but we can't find any information on whether it can be done. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot2 freqpoly() layers..?
Hi, I was trying to overlay/combine two freqpoly plots. The sample code below illustrates the problem. Essentially, what I want to do is: 1. Have the same colour for all the lines in 'Plot 1' (and 'Plot 2'). Currently, all the lines in Plot 1 have different colours and all the lines in Plot 2 have different colours. I'd like all the lines in Plot 1 to be 'red' and all the lines in Plot 2 to be 'black'. 2. Combine both plots ('Plot 1' and 'Plot 2') as one combined plot - which I attempt to do in 'Combined Plot'. However, I'm doing something wrong, because with the code for 'Combined Plot' I just get two lines.

### sample code ###
library(ggplot2)

## Plot 1 - normal distributions with mean = 0 ##
mat <- matrix(rnorm(10000, mean = 0), 1000, 10)
colnames(mat) <- paste('a', 1:ncol(mat), sep = '')
rownames(mat) <- 1:nrow(mat)
mat2 <- melt(mat)
ggplot(mat2) + geom_freqpoly(aes(x = value, y = ..density.., colour = X2))

## Plot 2 - normal distributions with mean = 1 ##
tab <- matrix(rnorm(10000, mean = 1), 1000, 10)
colnames(tab) <- paste('b', 1:ncol(tab), sep = '')
rownames(tab) <- 1:nrow(tab)
tab2 <- melt(tab)
ggplot(tab2) + geom_freqpoly(aes(x = value, y = ..density.., colour = X2))

## Combined plot ##
comb <- cbind(mat, tab)
comb2 <- melt(comb)
cols <- c(rep('red', ncol(mat) * nrow(mat)), rep('black', ncol(tab) * nrow(tab)))
ggplot(comb2) + geom_freqpoly(aes(x = value, y = ..density.., colour = cols))
### End code ###

Any help would be appreciated! thanks! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
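One way to get the combined plot: add a set label per matrix, keep one line per original column via the group aesthetic, and map colour to the set. A hedged sketch - it assumes reshape2's melt(), whose id columns are named Var1/Var2 (the older reshape package used X1/X2 as in the post).

```r
library(ggplot2)
library(reshape2)   # assumption: reshape2's melt() -> columns Var1/Var2

mat <- matrix(rnorm(10000, mean = 0), 1000, 10)
tab <- matrix(rnorm(10000, mean = 1), 1000, 10)
mat2 <- melt(mat); mat2$set <- "mean 0"
tab2 <- melt(tab); tab2$set <- "mean 1"
comb2 <- rbind(mat2, tab2)

# one freqpoly per original column, coloured by which matrix it came from
ggplot(comb2) +
  geom_freqpoly(aes(x = value, y = ..density..,
                    group = interaction(set, Var2), colour = set)) +
  scale_colour_manual(values = c("mean 0" = "red", "mean 1" = "black"))
```

The key point is that colour must be mapped to a column of the melted data (and fixed with scale_colour_manual), not to a free-floating vector like cols, which ggplot treats as a single continuous aesthetic.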
[R] hide row and column-names in cat()
Dear list, to prepare data as input for a stand-alone groundwater model I use the cat() function to print data. I want to print a table with print() or cat(), but without the row and column names. Is this possible? Here is a small example:

my.df <- rbind(c("Nr", "FraSand", "FraSilt", "FraClay", "pH"),
               c("", "(kg.kg-1)", "(kg.kg-1)", "(kg.kg-1)", "(-)"),
               c(1, 19, 60, 21, 7.1),
               c(2, 35, 53, 11, 7.7))
print(my.df, quote=FALSE)
     [,1] [,2]      [,3]      [,4]      [,5]
[1,] Nr   FraSand   FraSilt   FraClay   pH
[2,]      (kg.kg-1) (kg.kg-1) (kg.kg-1) (-)
[3,] 1    19        60        21        7.1
[4,] 2    35        53        11        7.7

What I want to have is this:

Nr FraSand   FraSilt   FraClay   pH
   (kg.kg-1) (kg.kg-1) (kg.kg-1) (-)
1  19        60        21        7.1
2  35        53        11        7.7

Thank you. TIM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
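Base R's write.table() can drop both dimnames directly. A sketch using the example data; note the fields are separated by a single space, so run format() on the matrix first if fixed-width column alignment matters for the groundwater model's parser.

```r
my.df <- rbind(c("Nr", "FraSand",   "FraSilt",   "FraClay",   "pH"),
               c("",   "(kg.kg-1)", "(kg.kg-1)", "(kg.kg-1)", "(-)"),
               c(1, 19, 60, 21, 7.1),
               c(2, 35, 53, 11, 7.7))
# file = "" (the default) prints to the console; pass a filename to write
write.table(my.df, quote = FALSE, row.names = FALSE, col.names = FALSE)
```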
Re: [R] Extracting components from a 'boot' class output in R
Sarath pointed out (privately) that the standard error is not actually stored as part of the bootstrap object. Instead, it is calculated on the fly when you print the bootstrap object; this is done by boot:::print.boot. Similarly for the bias. (The 'boot:::' is necessary because 'print.boot' is not an exported object from 'namespace:boot'.) Tim Hesterberg Do names(bootObj) to find out what the components are, and use $ or [[ to extract components. Do help(boot) for a description of the components of the object (look in the Value section). That is general advice in R, applying to all kinds of objects - boot, and many other functions such as lm(), return lists with a class added, and you can operate on the object as a list using names(), $, etc. Tim Hesterberg Dear R user, I used the following to do a bootstrap. bootObj <- boot(data=DAT, statistic=Lp.est, R=1000, x0=3) I have the following output from the above bootstrap. How can I extract components of the output? For example, how can I extract the std.error? bootObj ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = DAT, statistic = Lp.est, R = 1000, x0 = 3) Bootstrap Statistics : original bias std. error t1* 794.9745 -0.341 4.042099 Any help is greatly appreciated. Thank you Sarath Banneheka __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
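So to get the numbers themselves, compute them from the replicate matrix bootObj$t, which is what print.boot does internally. A self-contained sketch, with a built-in data set standing in for the poster's DAT/Lp.est:

```r
library(boot)   # shipped with R as a recommended package
bootObj <- boot(rivers, function(d, i) mean(d[i]), R = 200)

std.err <- apply(bootObj$t, 2, sd)            # what print.boot shows as "std. error"
bias    <- colMeans(bootObj$t) - bootObj$t0   # and as "bias"
```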
Re: [R] Extracting components from a 'boot' class output in R
Do names(bootObj) to find out what the components are, and use $ or [[ to extract components. Do help(boot) for a description of the components of the object (look in the Value section). That is general advice in R, applying to all kinds of objects - boot, and many other functions such as lm(), return lists with a class added, and you can operate on the object as a list using names(), $, etc. Tim Hesterberg Dear R user, I used the following to do a bootstrap. bootObj <- boot(data=DAT, statistic=Lp.est, R=1000, x0=3) I have the following output from the above bootstrap. How can I extract components of the output? For example, how can I extract the std.error? bootObj ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = DAT, statistic = Lp.est, R = 1000, x0 = 3) Bootstrap Statistics : original bias std. error t1* 794.9745 -0.341 4.042099 Any help is greatly appreciated. Thank you Sarath Banneheka __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pulling data from remote MySQL database
Agree - I started with RMySQL but eventually changed over to RODBC when I realised some of the additional functionality it contains and the fact that I needed to access different types of RDBMSs (e.g. acquire data from a Microsoft Access database file). Cheers, Tim -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David A. Johnston Sent: Wednesday, 22 June 2011 8:22 AM To: r-help@r-project.org Subject: Re: [R] Pulling data from remote MySQL database Hi Joseph, I know that the RMySQL and RODBC packages allow you to pull the result of a MySQL query into a data.frame within R. I like RODBC because you can access many different flavors of RDBMSs. -David Johnston -- View this message in context: http://r.789695.n4.nabble.com/Pulling-data-from-remote-MySQL-database-tp3615351p3615523.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
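For reference, the two routes discussed above look roughly like this. All connection details (DSN name, host, credentials, table) are placeholders, and the RODBC version assumes an ODBC DSN for MySQL has already been configured on the machine.

```r
## RODBC route (DSN "my_mysql_dsn" is hypothetical)
library(RODBC)
ch <- odbcConnect("my_mysql_dsn", uid = "user", pwd = "secret")
df <- sqlQuery(ch, "SELECT * FROM some_table LIMIT 100")  # query -> data.frame
odbcClose(ch)

## RMySQL route (host/credentials equally hypothetical)
library(RMySQL)
con <- dbConnect(MySQL(), host = "db.example.com", user = "user",
                 password = "secret", dbname = "mydb")
df2 <- dbGetQuery(con, "SELECT * FROM some_table LIMIT 100")
dbDisconnect(con)
```

The practical difference matches the thread: RODBC goes through whatever ODBC driver is installed (so the same code reaches Access, SQL Server, etc.), while RMySQL talks to MySQL directly.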
Re: [R] Sorting a data frame with values of different lengths
William, I think to convert to numeric, you might need to do something like: as.numeric(as.character()) ## and not just as.numeric() As it stands, it would appear that it is still being read as a character string. From: William Armstrong william.armstr...@noaa.gov To: r-help@r-project.org Sent: Tue, June 7, 2011 10:05:37 AM Subject: Re: [R] Sorting a data frame with values of different lengths Also, I tried changing a line to store W as numeric:

sample_info <- c(pds_gagehandles[i], p, as.numeric(sample_W))

But it is still sorting incorrectly:

W_table[order(W_table$as.numeric.W.),]
   pds_gagehandles.i.  p as.numeric.W.
8                mibe  8        1004.5
1                mibe  1           746
10               mibe 10           746
6                mibe  6         811.5
2                mibe  2         818.5
11               mibe 11         822.5
4                mibe  4           879
5                mibe  5           888
9                mibe  9           901
7                mibe  7         903.5
3                mibe  3         919.5

str(W_table)
'data.frame': 11 obs. of 3 variables:
 $ pds_gagehandles.i.: Factor w/ 1 level "mibe": 1 1 1 1 1 1 1 1 1 1 ...
 $ p                 : chr "1" "2" "3" "4" ...
 $ as.numeric.W.     : chr "746" "818.5" "919.5" "879" ...

-- View this message in context: http://r.789695.n4.nabble.com/Sorting-a-data-frame-with-values-of-different-lengths-tp3579653p3579691.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
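The distinction behind the as.numeric(as.character()) advice is easiest to see with a factor, where as.numeric() alone returns the internal level codes rather than the printed values:

```r
f <- factor(c("746", "818.5", "1004.5"))
as.numeric(f)                 # 2 3 1  -- the underlying level codes
as.numeric(as.character(f))   # 746.0 818.5 1004.5 -- the actual values
```

(For a plain character column, as shown in the str() output above, as.numeric() alone is enough; sorting then goes numerically instead of lexicographically, which is what put "1004.5" ahead of "746".)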
Re: [R] Question on approximations of full logistic regression model
My usual rule is that whatever gives the widest confidence intervals in a particular problem is most accurate for that problem :-) Bootstrap percentile intervals tend to be too narrow. Consider the case of the sample mean; the usual formula CI is

xbar +- t_alpha sqrt( (1/(n-1)) sum((x_i - xbar)^2) ) / sqrt(n)

The bootstrap percentile interval for symmetric data is roughly

xbar +- z_alpha sqrt( (1/n) sum((x_i - xbar)^2) ) / sqrt(n)

It is narrower than the formula CI because
* z quantiles are used rather than t quantiles
* the standard error uses a divisor of n rather than (n-1)
In stratified sampling, the narrowness factor depends on the stratum sizes, not the overall n. In regression, estimates for some quantities may be based on a small subset of the data (e.g. coefficients related to rare factor levels). This doesn't mean we should give up on the bootstrap. There are remedies for the bootstrap biases, see e.g. Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. Smoothing, Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930. http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf And other methods have their own biases, particularly in nonlinear applications such as logistic regression. Tim Hesterberg Thank you for your reply, Prof. Harrell. I agree with you. Dropping only one variable does not actually help a lot. I have one more question. During analysis of this model I found that the confidence intervals (CIs) of some coefficients provided by bootstrapping (the bootcov function in the rms package) were narrower than the CIs provided by the usual variance-covariance matrix, and the CIs of other coefficients wider. My data has no cluster structure. I am wondering which CIs are better. I guess the bootstrap ones, but is that right? I would appreciate your help in advance. -- KH (11/05/16 12:25), Frank Harrell wrote: I think you are doing this correctly except for one thing. 
The validation and other inferential calculations should be done on the full model. Use the approximate model to get a simpler nomogram but not to get standard errors. With only dropping one variable you might consider just running the nomogram on the entire model. Frank KH wrote: Hi, I am trying to construct a logistic regression model from my data (104 patients and 25 events). I built a full model consisting of five predictors, with penalization via the rms package (lrm, pentrace etc.) because of the events-per-variable issue. Then I tried to approximate the full model by a step-down technique, predicting L from all of the component variables using ordinary least squares (ols in the rms package), as follows. I would like to know whether I am doing this right or not.

library(rms)
plogit <- predict(full.model)
full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
fastbw(full.ols, aics=1e10)

 Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2
 stenosis       1.41  1    0.2354   1.41   1    0.2354  -0.59 0.991
 x2            16.78  1    0.0000  18.19   2    0.0001  14.19 0.882
 procedure     26.12  1    0.0000  44.31   3    0.0000  38.31 0.711
 ClinicalScore 25.75  1    0.0000  70.06   4    0.0000  62.06 0.544
 x1            83.42  1    0.0000 153.49   5    0.0000 143.49 0.000

Then I fitted an approximation to the full model using the most important variables (stopping before R^2 for predictions from the reduced model against the original Y drops below 0.95), that is, dropping stenosis.

full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure)
full.ols.approx$stats
        n  Model L.R.      d.f.        R2         g     Sigma
  104.000 487.9006640     4.000 0.9908257 1.3341718 0.1192622

This approximate model had an R^2 against the full model of 0.99. Therefore I updated the original full logistic model, dropping stenosis as a predictor.

full.approx.lrm <- update(full.model, ~ . -stenosis)

validate(full.model, bw=F, B=1000)
          index.orig training   test optimism index.corrected    n
Dxy           0.6425   0.7017 0.6131   0.0887          0.5539 1000
R2            0.3270   0.3716 0.3335   0.0382          0.2888 1000
Intercept     0.0000   0.0000 0.0821  -0.0821          0.0821 1000
Slope         1.0000   1.0000 1.0548  -0.0548          1.0548 1000
Emax          0.0000   0.0000 0.0263   0.0263          0.0263 1000

validate(full.approx.lrm, bw=F, B=1000)
          index.orig training   test optimism index.corrected    n
Dxy           0.6446   0.6891 0.6265   0.0626          0.5820 1000
R2            0.3245   0.3592 0.3428   0.0164          0.3081 1000
Intercept     0.0000   0.0000 0.1281  -0.1281          0.1281 1000
Slope         1.0000   1.0000 1.1104  -0.1104          1.1104 1000
Emax          0.0000   0.0000 0.0444   0.0444          0.0444 1000

Validation revealed this approximation was not bad. Then I made a nomogram. full.approx.lrm.nom <- nomogram
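Hesterberg's narrowness point above can be checked with a small simulation. A sketch under simple assumptions - skewed data, small n, a plain percentile interval; the narrowing comes from the z-vs-t quantiles and the n-vs-(n-1) divisor he describes.

```r
set.seed(1)
x <- rexp(20)                          # skewness 2, small sample
t_ci <- t.test(x)$conf.int             # usual formula (t) interval

bmeans <- replicate(5000, mean(sample(x, replace = TRUE)))
p_ci <- quantile(bmeans, c(0.025, 0.975))   # bootstrap percentile interval

c(t_width = unname(diff(t_ci)), pct_width = unname(diff(p_ci)))
# the percentile interval is typically the narrower of the two
```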
Re: [R] memory and bootstrapping
(Regarding bootstrapping logistic regression.) If the number of rows with Y=1 is small, it doesn't matter that n is huge. If both the number of successes and the number of failures are huge, then as Ripley notes you can use asymptotic CIs. The mean difference in predicted probabilities is a nonlinear function of the coefficients, so you can use the delta method to get standard errors. In general, if you're not sure about normality and bias, you can use the bootstrap to estimate how close to normal the sampling distribution is. The results may surprise you. For example, for a one-sample mean, if the population has skewness = 2 (like an exponential distribution), you need n = 5000 before the CLT is reasonably accurate (actual one-sided non-coverage probabilities within 10% of the nominal, for a 95% interval). Finally, you can speed up bootstrapping glms by using starting values based on the coefficients estimated from the original data. And you can compute the model matrix once and resample rows of that along with y, rather than computing a model matrix from scratch each time. Tim Hesterberg The only reason the boot package will take more memory for 2000 replications than 10 is that it needs to store the results. That is not to say that on a 32-bit OS the fragmentation will not get worse, but that is unlikely to be a significant factor. As for the methodology: 'boot' is support software for a book, so please consult it (and not secondary sources). From your brief description it looks to me as if you should be using studentized CIs. 130,000 cases is a lot, and running the experiment on a 1% sample may well show that asymptotic CIs are good enough. On Thu, 5 May 2011, E Hofstadler wrote: hello, the following questions will without doubt reveal some fundamental ignorance, but hopefully you can still help me out. 
I'd like to bootstrap a coefficient gained on the basis of the coefficients in a logistic regression model (the mean difference in the predicted probabilities between two groups, where each predict() operation uses as the newdata argument a data frame of equal size as the original data frame). I've got 130,000 rows and 7 columns in my data frame. The glm model uses all variables (as well as two 2-way interactions).

System:
- R version: 2.12.2
- OS: Windows XP Pro, 32-bit
- 3.16 GHz Intel dual core processor, 2.9 GB RAM

I'm using the boot package to arrive at the standard errors for this difference, but even with only 10 replications this takes quite a long time: 216 seconds (perhaps this is partly also due to my inefficiently programmed function underlying the boot call; I'm also looking into that). I wanted to try out calculating a BCa-bootstrapped confidence interval, which as I understand requires a lot more replications than normal-theory intervals. Drawing on John Fox's Appendix to his An R Companion to Applied Regression, I was thinking of trying out 2000 replications -- but this will take several hours to compute on my system (which isn't in itself a major issue, though).

My questions:
- let's say I try bootstrapping with 2000 replications. Can I be certain that the memory available to R will be sufficient for this operation?
- (this relates to statistics more generally): is it a good idea in your opinion to try BCa bootstrapping, or can it be assumed that a normal-theory confidence interval will be a sufficiently good approximation (letting me get away with, say, 500 replications)?

Best, Esther

-- Brian D.
Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
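Hesterberg's two speed-ups above (warm starts from the original fit, plus a model matrix computed once and resampled by row) can be sketched as follows. The data and model here are made up for illustration; only the technique is from the thread:

```r
# Sketch: fast glm bootstrap via glm.fit() with a precomputed model
# matrix and the original coefficients as starting values.
set.seed(1)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.5 * d$x1 - 0.3 * d$x2))

fit0 <- glm(y ~ x1 + x2, family = binomial, data = d)
X <- model.matrix(fit0)          # computed once, not once per replicate

B <- 200
boot.coefs <- matrix(NA_real_, B, length(coef(fit0)))
for (b in seq_len(B)) {
  idx <- sample.int(n, n, replace = TRUE)
  fitb <- glm.fit(X[idx, ], d$y[idx], family = binomial(),
                  start = coef(fit0))   # warm start
  boot.coefs[b, ] <- fitb$coefficients
}
apply(boot.coefs, 2, sd)   # bootstrap standard errors
```

The warm start helps because the IRLS iterations inside glm.fit() begin near the solution for each resample, so fewer iterations are needed than when starting from scratch.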
[R] Simple Missing cases Function
Dear all,

I have written a function to perform a very simple but useful task which I do regularly. It is designed to show how many values are missing from each variable in a data.frame. In its current form it works but is slow because I have used several loops to achieve this simple task. Can anyone see a more efficient way to get the same results? Or is there an existing function which does this?

Thanks for your help,
Tim

Function:

miss <- function(data) {
  miss.list <- list(NA)
  for (i in 1:length(data)) {
    miss.list[[i]] <- table(is.na(data[i]))
  }
  for (i in 1:length(miss.list)) {
    if (length(miss.list[[i]]) == 2) {
      miss.list[[i]] <- miss.list[[i]][2]
    }
  }
  for (i in 1:length(miss.list)) {
    if (names(miss.list[[i]]) == "FALSE") {
      miss.list[[i]] <- 0
    }
  }
  data.frame(names(data), as.numeric(miss.list))
}

Example:

data(ToothGrowth)
data.m <- ToothGrowth
data.m$supp[sample(1:nrow(data.m), size=25)] <- NA
miss(data.m)
Re: [R] Simple Missing cases Function
Dear Petr,

Thanks so much. That is a LOT more efficient.

Tim

-----Original Message-----
From: Petr PIKAL [mailto:petr.pi...@precheza.cz]
Sent: Tuesday, April 19, 2011 3:37 PM
To: tesutton
Cc: r-help@r-project.org
Subject: Odp: [R] Simple Missing cases Function

Hi, try

colSums(is.na(data.m))

The result is not a data frame, but you can easily transform it if you want.

Regards, Petr
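For reference, Petr's one-liner applied to the example data from the original post; a set.seed() is added here (not in the original) so the counts are reproducible:

```r
data(ToothGrowth)
data.m <- ToothGrowth
set.seed(42)  # added so the example is reproducible
data.m$supp[sample(1:nrow(data.m), size = 25)] <- NA

colSums(is.na(data.m))
#  len supp dose
#    0   25    0

# and, if a data frame like miss() produced is wanted:
data.frame(variable = names(data.m), n.missing = colSums(is.na(data.m)))
```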
Re: [R] Simple Missing cases Function
Thanks Tyler, this function has some useful features.

Tim

From: Tyler Rinker [mailto:tyler_rin...@hotmail.com]
Sent: Tuesday, April 19, 2011 3:52 PM
To: tesutton; r-help@r-project.org
Subject: RE: [R] Simple Missing cases Function

I use the following code/function which gives me some quick descriptives about each variable (i.e. n of missing values, % missing, case #'s missing, etc.). Fairly quick, maybe not pretty, but effective on either single variables or entire data sets.

NAhunter <- function(dataset) {
  find.NA <- function(variable) {
    if (is.numeric(variable)) {
      n <- length(variable)
      mean <- mean(variable, na.rm=TRUE)
      median <- median(variable, na.rm=TRUE)
      sd <- sd(variable, na.rm=TRUE)
      NAs <- is.na(variable)
      total.NA <- sum(NAs)
      percent.missing <- total.NA/n
      descriptives <- data.frame(n, mean, median, sd, total.NA, percent.missing)
      rownames(descriptives) <- c(" ")
      Case.Number <- 1:n
      Missing.Values <- ifelse(NAs > 0, "Missing Value", " ")
      missing.value <- data.frame(Case.Number, Missing.Values)
      missing.values <- missing.value[which(Missing.Values == "Missing Value"), ]
      list("NUMERIC DATA", DESCRIPTIVES = t(descriptives),
           "CASE # OF MISSING VALUES" = missing.values[, 1])
    } else {
      n <- length(variable)
      NAs <- is.na(variable)
      total.NA <- sum(NAs)
      percent.missing <- total.NA/n
      descriptives <- data.frame(n, total.NA, percent.missing)
      rownames(descriptives) <- c(" ")
      Case.Number <- 1:n
      Missing.Values <- ifelse(NAs > 0, "Missing Value", " ")
      missing.value <- data.frame(Case.Number, Missing.Values)
      missing.values <- missing.value[which(Missing.Values == "Missing Value"), ]
      list("CATEGORICAL DATA", DESCRIPTIVES = t(descriptives),
           "CASE # OF MISSING VALUES" = missing.values[, 1])
    }
  }
  dataset <- data.frame(dataset)
  options(scipen=100)
  options(digits=2)
  lapply(dataset, find.NA)
}
Re: [R] Bootstrap 95% confidence intervals for splines
You're mixing up two concepts here:
- splines
- bootstrap confidence intervals
Separating them may help cut the confusion.

First, to do a bootstrap confidence interval for a difference in predictions in the linear regression case, do:

repeat 10^4 times
    draw a bootstrap sample of the observations (subjects, keeping x and y together)
    fit the linear regression to the bootstrap sample
    record the difference in predictions at the two x values
end loop

The bootstrap confidence interval is the range of the middle 95% of the recorded differences.

For a spline, the procedure is the same except for fitting a spline regression:

repeat 10^4 times
    draw a bootstrap sample of the observations (subjects, keeping x and y together)
    fit the SPLINE regression to the bootstrap sample
    record the difference in predictions at the two x values
end loop

The bootstrap confidence interval is the range of the middle 95% of the recorded differences.

Tim Hesterberg

P.S. I think you're mixing up the response and explanatory variables. I'd think of eating hot dogs as the cause (explanatory variable), and waistline as the effect (response, or outcome).

P.P.S. I don't like the terms "independent" and "dependent" variables, as that conflicts with the concept of independence in probability. Independent variables are generally not independent, and the dependent variable may be independent of the others.

There appear to be reports in the literature that transform continuous independent variables by the use of splines. E.g., assume the dependent variable is hot dogs eaten per week (HD) and the independent variable is waistline (WL); a normal linear regression model would be:

nonconfusing_regression <- lm(HD ~ WL)

One might use a spline:

confusion_inducing_regression_with_spline <- lm(HD ~ ns(WL, df = 4))

Now is where the problem starts. From nonconfusing_regression, I get, say, 2 added hot dogs per week for each centimeter of waistline, along with an s.e. of 0.5 hot dogs per week, which I multiply by 1.96 to garner each side of the 95% c.i. If I want to show what the difference between the 75th percentile (say 100 cm) and 25th percentile (say 80 cm) waistlines is, I multiply 2 by 100-80=20 and get 40 hot dogs per week as the point estimate, with a similar bumping of the s.e. to 10 hot dogs per week. What do I do to get the point estimate and 95% confidence interval for the difference between 100 cm persons and 80 cm persons with confusion_inducing_regression_with_spline?

Best regards,
Mitchell S. Wachtel, MD
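Hesterberg's spline loop can be sketched in R like so, using the poster's hot-dog/waistline names with simulated data (the thread contains no real data). Re-estimating the ns() knots on each resample, as done here, is a common simplification:

```r
# Sketch: percentile bootstrap CI for the difference in spline
# predictions at WL = 100 cm vs WL = 80 cm. Data are simulated.
library(splines)
set.seed(1)
n  <- 200
WL <- runif(n, 70, 110)            # waistline in cm (made up)
HD <- 2 + 0.1 * WL + rnorm(n)      # hot dogs per week (made up)

newd <- data.frame(WL = c(100, 80))  # 75th vs 25th percentile

boot.diff <- replicate(1000, {
  idx <- sample.int(n, n, replace = TRUE)           # resample subjects
  fit <- lm(HD ~ ns(WL, df = 4),
            data = data.frame(HD = HD[idx], WL = WL[idx]))
  p <- predict(fit, newdata = newd)
  p[1] - p[2]                                       # pred(100) - pred(80)
})

quantile(boot.diff, c(0.025, 0.975))  # middle 95% = percentile CI
```

The point estimate is the same difference computed from the fit to the original data; the interval is the middle 95% of the recorded bootstrap differences, exactly as the loop above describes.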
Re: [R] assign factor levels based on list
David and Bill,

Thank you so much for your rapid and exceptional help. Bill, the reason I had gone with lists within the list was because I thought I might use the list for holding other information - and because it was easier to get the column name. Your simple-list suggestion is cleaner and I'm going with that for now. To better understand it, I tweaked David's fabulous lapply() statement to work with the simpler list and, for posterity, provide that below. The for-loop seems cleanest and I'll probably go with it on its own, outside of the function.

y <- data.frame(colOne = c(1,2,3), colTwo = c("apple","pear","orange"),
                colThree = c(4,5,6))
my.factor.defs <- list(colOne = c(1,2,3,4,5,6),
                       colTwo = c("apple", "pear", "orange", "fig", "banana"))
y[, names(my.factor.defs)] <- lapply(names(my.factor.defs), function(x) {
    y[[x]] <- factor(y[[x]], levels = my.factor.defs[[x]]) })
str(y)
'data.frame':   3 obs. of  3 variables:
 $ colOne  : Factor w/ 6 levels "1","2","3","4",..: 1 2 3
 $ colTwo  : Factor w/ 5 levels "apple","pear",..: 1 2 3
 $ colThree: num  4 5 6

Thank you again!
Best,
Tim Howard

William Dunlap wdun...@tibco.com 2/9/2011 4:41 PM wrote:

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tim Howard
Sent: Wednesday, February 09, 2011 12:44 PM
To: r-help@r-project.org
Subject: [R] assign factor levels based on list

All, Given a data frame and a list containing factor definitions for certain columns, how can I apply those definitions from the list, rather than doing it the standard way, as noted below? I'm lost in the world of do.call, assign, paste, and can't find my way through. For example:

#set up df
y <- data.frame(colOne = c(1,2,3), colTwo = c("apple","pear","orange"))
factor.defs <- list(colOne = list(name = "colOne", lvl = c(1,2,3,4,5,6)),
                    colTwo = list(name = "colTwo",
                                  lvl = c("apple","pear","orange","fig","banana")))

Why not the following format?
my.factor.defs <- list(colOne = c(1,2,3,4,5,6),
                       colTwo = c("apple", "pear", "orange", "fig", "banana"))

Do you really want to support a case like the following?

list(colOne = list(name = "anotherColumn", lvl = c(1,2,3,4,5,6)))

#A standard way to define levels
y$colTwo <- factor(y$colTwo, levels = c("apple","pear","orange","fig","banana"))

# I'd like to use the definitions locally but also pass them (but not the data) to a function,
# so, rather than defining each manually each time, I'd like to loop through the columns,
# call them by name, find the definitions in the list and use them from there. Before I try to loop
# or use some form of apply, I'd like to get a single factor definition working.

First write a function that takes a data.frame and a list of desired levels for each column and outputs a new data.frame. E.g., if you use the simpler form of the levelsList I gave above, the following might work well enough (it does no error checking):

assignNewLevelsToDataFrameColumns <- function(x, levelsList) {
    for (colName in names(levelsList)) {
        # note that x$name is equivalent to x[["name"]], so
        # if you want to use a variable as the name, use [[.
        x[[colName]] <- factor(x[[colName]], levels = levelsList[[colName]])
    }
    x
}

Test it:

fixedY <- assignNewLevelsToDataFrameColumns(y, my.factor.defs)
fixedY
  colOne colTwo
1      1  apple
2      2   pear
3      3 orange
str(fixedY)
'data.frame':   3 obs. of  2 variables:
 $ colOne: Factor w/ 6 levels "1","2","3","4",..: 1 2 3
 $ colTwo: Factor w/ 5 levels "apple","pear",..: 1 2 3

Do y <- assignNewLevelsToDataFrameColumns(y, my.factor.defs) if you want to overwrite the old y.

Now if you want a function that changes the data.frame you give it, use a replacement function. If you want to use the syntax func(y) <- newStuff then the function should be called `func<-` and the last argument must be called 'value' (newStuff will be passed via value=newStuff).
E.g.,

`func<-` <- function(x, value) {
    alteredX <- assignNewLevelsToDataFrameColumns(x, value)
    alteredX
}

and use it as

func(y) <- my.factor.defs
str(y)
'data.frame':   3 obs. of  2 variables:
 $ colOne: Factor w/ 6 levels "1","2","3","4",..: 1 2 3
 $ colTwo: Factor w/ 5 levels "apple","pear",..: 1 2 3

The first command gets translated into

y <- `func<-`(y, value = my.factor.defs)

If you write a replacement function, it is nice to create a matching extractor function called 'func'. E.g.,

func <- function(x) lapply(x, levels)
func(y)
$colOne
[1] "1" "2" "3" "4" "5" "6"

$colTwo
[1] "apple"  "pear"   "orange" "fig"    "banana"

Note that this avoids assign(), get(), eval(), etc., and thus makes it easy to follow the flow of data in the code: only things on the left side of the assignment arrow can get changed.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
[R] assign factor levels based on list
All,

Given a data frame and a list containing factor definitions for certain columns, how can I apply those definitions from the list, rather than doing it the standard way, as noted below? I'm lost in the world of do.call, assign, paste, and can't find my way through. For example:

#set up df
y <- data.frame(colOne = c(1,2,3), colTwo = c("apple","pear","orange"))
factor.defs <- list(colOne = list(name = "colOne", lvl = c(1,2,3,4,5,6)),
                    colTwo = list(name = "colTwo",
                                  lvl = c("apple","pear","orange","fig","banana")))

#A standard way to define levels
y$colTwo <- factor(y$colTwo, levels = c("apple","pear","orange","fig","banana"))

# I'd like to use the definitions locally but also pass them (but not the data) to a function,
# so, rather than defining each manually each time, I'd like to loop through the columns,
# call them by name, find the definitions in the list and use them from there. Before I try to loop
# or use some form of apply, I'd like to get a single factor definition working.

# this doesn't seem to see the dataframe properly
do.call(factor, list((paste("y$", factor.defs[2][[1]]$name, sep="")),
                     levels = factor.defs[2][[1]]$lvl))

#adding as.name doesn't help
do.call(factor, list(as.name(paste("y$", factor.defs[2][[1]]$name, sep="")),
                     levels = factor.defs[2][[1]]$lvl))

#Here's my attempt to mimic the standard way, using assign. Ha! what a joke.
assign(as.name(paste("y$", factor.defs[2][[1]]$name, sep="")),
       do.call(factor, list(as.name(paste("y$", factor.defs[2][[1]]$name, sep="")),
                            levels = factor.defs[2][[1]]$lvl)))
## Error in function (x = character(), levels, labels = levels, exclude = NA, :
##   object 'y$colTwo' not found

Any help or perspective (or a better way from the beginning!) would be greatly appreciated. Thanks in advance!

Tim
[R] Passing formula as an agrument
Hi,

I had a function that looked like:

diff <- lm(x ~ y + z)

How can I pass the argument to the 'lm' function on the fly? E.g., if I pass it in as a string (e.g. "x ~ y + z"), then the lm function treats it as a string and not a proper argument.

Many thanks
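This archive chunk contains no reply, but a common approach is to convert the string with as.formula(), or build the formula programmatically with reformulate(), before calling lm(). The data frame below is made up for illustration:

```r
# Sketch: passing a formula to lm() on the fly.
set.seed(1)
d <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))  # made-up data

f <- as.formula("x ~ y + z")   # string -> formula object
fit <- lm(f, data = d)
class(fit)                      # "lm"

# Equivalent, built from variable names:
fit2 <- lm(reformulate(c("y", "z"), response = "x"), data = d)
```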
[R] legend not appearing in Word document
I need help with using graphics in Word 2007 that will later be converted into a pdf document. I have tried several formats and found that I get the best quality of graphics using the .wmf and .eps formats, but when I convert to .pdf I get a bunch of lines across the figures. I also tried .tiff and .png but they give me much lower quality. The best quality that converts to pdf appears to be .eps. However, I have now come across a problem with my figure legends. For some reason the legend is visible in R but not in Word. Does anyone know why a legend in .eps format won't work in Word, or how I can get it to work? I have made an example of the legend below that you should be able to save as .eps and paste into Word as an example. I would appreciate any help you can offer on getting the legend in .eps format to work, or on other formats that may be better for Word and pdf files.

Thanks,
Tim

library(plotrix)
Satelite.Palette <- colorRampPalette(c("blue3","cyan","aquamarine","yellow","orange","red"))
mycol <- Satelite.Palette(ceiling(5000+1))  # Max relative angle multiplied by 100 to give larger range. Max is 3.1415, rounded up to 3.15, plus one.
col.labels <- round((seq(0, 5000, length.out=5)/1000), 1)
plot(0, 0, type="n", axes=FALSE, xlab="", ylab="")  # New plot for legend
color.legend(0, 0, 1, 1, col.labels, mycol, align="rb", gradient="y")  # Adds legend

Tim Clark
Marine Ecologist
National Park of American Samoa
Pago Pago, AS 96799
Re: [R] legend not appearing in Word document
Thanks, I will take it up with MS. I just downloaded their latest converter and that hasn't fixed the issue. Hopefully they will have additional advice.

Aloha,
Tim

Tim Clark
Marine Ecologist
National Park of American Samoa
Pago Pago, AS 96799

From: Schalk Heunis schalk.heu...@enerweb.co.za
Cc: r help r-help r-help@r-project.org; Tim Clark tim_cl...@nps.gov
Sent: Sun, December 12, 2010 6:05:23 AM
Subject: Re: [R] legend not appearing in Word document

Tim,

Works in Word 2002 on Windows XP with PDF-XChange 3.0 to convert to pdf. Saw the response from Duncan - agree it might be a problem with the Word 2007 PDF converter.

HTH,
Schalk
[R] apply over list of data.frames
R users,

This probably involves a simple incantation of one of the flavors of apply... that I can't yet figure out. Consider a list of data frames. I'd like to apply a function (mean) across the list and return a data frame of the same dimensions, where each cell represents the mean of that cell across all data frames.

# set up the list
x <- vector("list", 2)
names(x) <- c("one", "two")

# add data to the list
for (i in 1:2) {
    y <- i^2
    x[[i]] <- data.frame(a = c(y, 2*y, 3*y), b = c(y+1, y+2, y+3), c = c(2*y+1, 2*y+2, 2*y+3))
}

#show the list
x
$one
  a b c
1 1 2 3
2 2 3 4
3 3 4 5

$two
   a b  c
1  4 5  9
2  8 6 10
3 12 7 11

#the result should be
    a   b c
1 2.5 3.5 6
2 5.0 4.5 7
3 7.5 5.5 8

Can anyone direct me down the right path?

Thanks in advance,
Tim Howard
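One answer sketch (not from the thread): because the data frames in the list all have the same dimensions, elementwise `+` works on them, so Reduce() gives the cellwise mean directly without any apply() at all:

```r
# Cellwise mean across a list of same-shaped data frames,
# using the example data from the question.
x <- list(one = data.frame(a = c(1, 2, 3),  b = c(2, 3, 4), c = c(3, 4, 5)),
          two = data.frame(a = c(4, 8, 12), b = c(5, 6, 7), c = c(9, 10, 11)))

Reduce(`+`, x) / length(x)
#     a   b c
# 1 2.5 3.5 6
# 2 5.0 4.5 7
# 3 7.5 5.5 8
```

Reduce(`+`, x) folds `+` over the list (x[[1]] + x[[2]] + ...), and dividing by length(x) turns the cellwise sum into a cellwise mean.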