[R] De-serialization vulnerability?

2024-05-01 Thread Howard, Tim G (DEC) via R-help
All, 
There seems to be a hullabaloo about a vulnerability in R when deserializing 
untrusted data:

https://hiddenlayer.com/research/r-bitrary-code-execution

https://nvd.nist.gov/vuln/detail/CVE-2024-27322

https://www.kb.cert.org/vuls/id/238194


Apparently a fix was made for R 4.4.0, but I see no mention of it in the 
changes report:

https://cloud.r-project.org/bin/windows/base/NEWS.R-4.4.0.html

Is this real?  Were there changes in R 4.4.0 that aren't reported?

Of course, we should *always* update to the most recent version, but I was 
confused why it wasn't mentioned in the release info. 
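For context, here is a benign round trip through R's serialization machinery (this sketch is mine, not from the advisories): the reports above concern maliciously crafted RDS payloads, so the practical takeaway is to avoid readRDS()/unserialize() on untrusted files regardless of R version.

```r
# A benign serialize/unserialize round trip on trusted, locally created data.
# The advisories concern crafted RDS payloads, so the safe habit is:
# only deserialize data whose origin you trust, whatever your R version.
x <- list(a = 1:3, b = "hello")
bytes <- serialize(x, connection = NULL)  # raw vector in RDS format
y <- unserialize(bytes)
identical(x, y)  # TRUE
```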

Thanks,
Tim

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Missing shapes in legend with scale_shape_manual

2023-10-31 Thread Howard, Tim G (DEC) via R-help
I believe the missing shapes are because you set alpha = 0 for the last 
geom_point layer. 

I expect there are better ways, but one way to handle it would be to avoid the 
filtering by adding columns with med and exercise status, like the following:

# setup with data provided
Date <- c('2023-10-17', '2023-10-16', '2023-10-15', '2023-10-14',
 '2023-10-13', '2023-10-12', '2023-10-11')
Time <- c('08:50', '06:58', '09:17', '09:04', '08:44', '08:55', '07:55') 
bg <- c(128, 144, 137, 115, 136, 122, 150)
missed_meds <- c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
no_exercise <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)

b2 <- data.frame(Date, Time, bg, missed_meds, no_exercise)

b2$Date <- as.Date(b2$Date)
# add "status" columns, could also be defined as factor. 
b2$medStat <- c("missed_meds",NA, NA, NA, NA, NA, "missed_meds")
b2$exercise <- c(NA, NA, "missed_exercise",NA,"missed_exercise", 
"missed_exercise", "missed_exercise")

Then your ggplot call would be like this:

ggplot(data = b2, aes(x = Date, y = bg)) +
  geom_line() +
  geom_point(aes(shape = medStat), size = 3)+
  geom_point(aes(shape = exercise),size = 3)+
  scale_y_continuous(name = "Blood glucose (mg/dL)",
 breaks = seq(100, 230, by = 20)
  ) +
  geom_hline(yintercept = 130) +
  scale_shape_manual(name = "Conditions",
 labels = c("Missed meds",
"Missed exercise"),
     values = c(20, 4)
  )


Note that this method gets very close to the desired plot even without the 
scale_shape_manual() call. 
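For what it's worth, another route (a sketch of mine, not from the thread; the column and label names are illustrative) is to stack the two status columns into one long "condition" column, so a single shape mapping covers both layers:

```r
# Stack the two logical columns into one long "condition" column; then a
# single geom_point(aes(shape = condition)) draws both sets of markers.
b2 <- data.frame(
  Date = as.Date(c("2023-10-17", "2023-10-16", "2023-10-15", "2023-10-14",
                   "2023-10-13", "2023-10-12", "2023-10-11")),
  bg = c(128, 144, 137, 115, 136, 122, 150),
  missed_meds = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE),
  no_exercise = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)
long <- rbind(
  data.frame(Date = b2$Date[b2$missed_meds], bg = b2$bg[b2$missed_meds],
             condition = "Missed meds"),
  data.frame(Date = b2$Date[b2$no_exercise], bg = b2$bg[b2$no_exercise],
             condition = "Missed exercise")
)
# ggplot(b2, aes(Date, bg)) + geom_line() +
#   geom_point(data = long, aes(shape = condition), size = 3) +
#   scale_shape_manual(values = c("Missed meds" = 20, "Missed exercise" = 4))
nrow(long)  # 6: two missed-med days plus four no-exercise days
```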

Hope that helps. 
Tim


> Date: Mon, 30 Oct 2023 20:55:17 +
> From: Kevin Zembower 
> To: r-help@r-project.org 
> Subject: [R] Missing shapes in legend with scale_shape_manual
> Message-ID:
> <0100018b825e8f7f-646d2539-f8b5-4e1a-afc3-5d29f961967f-
> 000...@email.amazonses.com>
> 
> Content-Type: text/plain; charset="utf-8"
> 
> Hello,
> 
> I'm trying to plot a graph of blood glucose versus date. I also record
> conditions, such as missing the previous night's medications, and missing
> exercise on the previous day. My data looks like:
> 
> > b2[68:74,]
> # A tibble: 7 × 5
>   Date       Time     bg missed_meds no_exercise
> 1 2023-10-17 08:50    128 TRUE        FALSE
> 2 2023-10-16 06:58    144 FALSE       FALSE
> 3 2023-10-15 09:17    137 FALSE       TRUE
> 4 2023-10-14 09:04    115 FALSE       FALSE
> 5 2023-10-13 08:44    136 FALSE       TRUE
> 6 2023-10-12 08:55    122 FALSE       TRUE
> 7 2023-10-11 07:55    150 TRUE        TRUE
> >
> 
> This gets me most of the way to what I want:
> 
> ggplot(data = b2, aes(x = Date, y = bg)) +
> geom_line() +
> geom_point(data = filter(b2, missed_meds),
>shape = 20,
>size = 3) +
> geom_point(data = filter(b2, no_exercise),
>shape = 4,
>size = 3) +
> geom_point(aes(x = Date, y = bg, shape = missed_meds),
>alpha = 0) + #Invisible point layer for shape mapping
> scale_y_continuous(name = "Blood glucose (mg/dL)",
>breaks = seq(100, 230, by = 20)
>) +
> geom_hline(yintercept = 130) +
> scale_shape_manual(name = "Conditions",
>labels = c("Missed meds",
>   "Missed exercise"),
>values = c(20, 4),
>## size = 3
>)
> 
> However, the legend just prints an empty square in front of the labels.
> What I want is a filled circle (shape 20) in front of "Missed meds" and a
> cross (shape 4) in front of "Missed exercise."
> 
> My questions are:
>  1. How can I fix my plot to show the shapes in the legend?
>  2. Can my overall plotting method be improved? Would you do it this way?
> 
> Thanks so much for your advice and guidance.
> 
> -Kevin
> 
> 
> 
> 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] win110 + R 4.3.0 dowload only when method = "wininet"

2023-05-12 Thread Howard, Tim G (DEC) via R-help


Also note that with R version 4.2.1 and later, there's another change affecting 
computers behind many corporate firewall systems (those that monitor network 
traffic).  The result is that, with the default settings, R can't make outbound 
https connections (and thus can't install or update packages). You just get an 
error that R can't connect, which may be caused by the security layer in your 
network.


I don't know if this is your issue, but it seemed worth pointing out just in 
case (and worth getting the word out for others...).

Here's the discussion of the bug and the patch:

https://bugs.r-project.org/show_bug.cgi?id=18379

If you are in this situation, the fix is to add a new User Environment Variable:



R_LIBCURL_SSL_REVOKE_BEST_EFFORT and set it to TRUE
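For the current session you can also set it from R itself (a sketch; the persistent fix is the environment variable above):

```r
# Setting the variable for the current R session only; to make it permanent,
# add R_LIBCURL_SSL_REVOKE_BEST_EFFORT=TRUE as a user environment variable
# (as above) or to your .Renviron file.
Sys.setenv(R_LIBCURL_SSL_REVOKE_BEST_EFFORT = "TRUE")
Sys.getenv("R_LIBCURL_SSL_REVOKE_BEST_EFFORT")  # "TRUE"
```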



Tim


>>>>>>
Date: Thu, 11 May 2023 09:39:25 +0300
From: Ivan Krylov 
To: Sebastián Kruk Gencarelli
Cc: "r-help@r-project.org" 
Subject: Re: [R] win110 + R 4.3.0 dowload only when method = "wininet"
Message-ID: <20230511090040.2015ff59@Tarkus>
Content-Type: text/plain; charset="utf-8"

On Wed, 10 May 2023 22:13:15 +
Sebastián Kruk Gencarelli wrote:

> I don't know what kind of proxy is used by my system.

Do you at least know whether it's an HTTP proxy, a SOCKS proxy, or
something completely different? Do you need to authenticate to the
proxy?

> How I can add my proxy and port to my R configuration?

libcurl recognises the "http_proxy" environment variable, which can
include the protocol, the address and the port of the proxy server:
https://curl.se/libcurl/c/CURLOPT_PROXY.html

For example, if you have a SOCKS5 hostname-resolving proxy on address
192.2.0.1 and port 1080, you can try calling Sys.setenv('http_proxy' =
'socks5h://192.2.0.1:1080'). For a different example, an HTTP proxy on
198.51.100.2, port 8080 can be set up as Sys.setenv('http_proxy' =
'http://198.51.100.2:8080/'). (In both cases, the proxy will be used by
curl for HTTP and HTTPS access, but not, say, FTP or other protocols.
For HTTPS via HTTP proxy, curl will be using the CONNECT verb.)
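(A quick sketch of the session-level mechanics, using the documentation-style placeholder address from above:)

```r
# Set the proxy for the current session; libcurl-based downloads in this
# session will then route HTTP/HTTPS traffic through it.
Sys.setenv(http_proxy = "http://198.51.100.2:8080/")
Sys.getenv("http_proxy")  # "http://198.51.100.2:8080/"
# Unset it again if the proxy guess was wrong:
Sys.unsetenv("http_proxy")
Sys.getenv("http_proxy")  # ""
```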

Detecting the right proxy for a given address (and yes, it's expected
to do that per-address) is hard. It's been a "nice to have" feature in
curl for years: https://curl.se/docs/todo.html#auto_detect_proxy, so
I'm afraid it won't happen any time soon. It's unfortunate that the
curl back-end will break something that used to work with wininet.

I wonder whether it could make sense to call
WinHttpGetIEProxyConfigForCurrentUser() [*] when setting up connections
on Windows. There's no simple way to deal with the proxy
auto-configuration file [**], but we could at least call
curl_easy_setopt(handle, CURLOPT_PROXY,
wchar_to_current_locale(proxy_config->lpszProxy)); to make the
transition from wininet to curl slightly less painful. Or would that
break even more installations? This won't help in the more complicated
cases [***], though.

--
Best regards,
Ivan

[*]
https://learn.microsoft.com/en-us/windows/win32/api/winhttp/nf-winhttp-winhttpgetieproxyconfigforcurrentuser

[**]
https://developer.mozilla.org/en-US/docs/Web/HTTP/Proxy_servers_and_tunneli

Re: [R] Regex Split?

2023-05-05 Thread Howard, Tim G (DEC) via R-help
If you only want the character strings, this seems a little simpler:

> strsplit("a bc,def, adef ,,gh", "[ ,]+", perl=T)
[[1]]
[1] "a""bc"   "def"  "adef" "gh"  


If you need the delimiters (the commas) you could then add them back in again 
afterwards. 
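For instance (my sketch, not from the thread), regmatches() with a pattern that matches either a token or a comma keeps the delimiters as elements:

```r
x <- "a bc,def, adef ,,gh"
# Match runs of non-delimiter characters OR single commas, so the commas
# survive as their own tokens while spaces are dropped.
regmatches(x, gregexpr("[^ ,]+|,", x))[[1]]
# [1] "a"    "bc"   ","    "def"  ","    "adef" ","    ","    "gh"
```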
Tim

--

Message: 2
Date: Thu, 4 May 2023 23:59:33 +0300
From: Leonard Mada 
To: R-help Mailing List 
Subject: [R] Regex Split?
Message-ID: <7b1cdbe7-0086-24b4-9da6-369296ead...@syonic.eu>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

Dear R-Users,

I tried the following 3 Regex expressions in R 4.3:
strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
# "a"    "bc"   ","    "def"  ","    "" "adef" ","    "," "gh"

strsplit("a bc,def, adef ,,gh", " |(?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R version 4.2.1 install.packages does not work with IE proxy setting

2022-10-06 Thread Howard, Tim G (DEC) via R-help
Petr, 
You also might want to check out the bug reported here:

https://bugs.r-project.org/show_bug.cgi?id=18379

It was fixed, and the last two comments discuss how to handle it in Windows:

you add a new User Environment Variable:
R_LIBCURL_SSL_REVOKE_BEST_EFFORT and set it to TRUE

This fix is in R-4.2.1 Patched (I don't know if it has made it out to the full 
distribution) and works in my 'corporate' environment.  Perhaps it also applies 
to your environment. 
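To make the setting survive across sessions, the standard mechanism is the user .Renviron file (see ?Startup); this sketch writes to a temporary file for illustration rather than your real .Renviron:

```r
# For illustration we write to a temporary file; in practice the target is
# your user .Renviron (see ?Startup), typically "~/.Renviron".
renviron <- file.path(tempdir(), ".Renviron")
cat("R_LIBCURL_SSL_REVOKE_BEST_EFFORT=TRUE\n", file = renviron, append = TRUE)
readLines(renviron)  # confirms the line was written
```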

Tim


Date: Wed, 5 Oct 2022 10:34:02 +
From: PIKAL Petr 
To: Ivan Krylov 
Cc: r-help mailing list 
Subject: Re: [R]  R version 4.2.1 install.packages does not work with
    IE proxy setting
Message-ID:
    <9b38aacf51d746bb87a9cc3765a16...@srvexchcm1302.precheza.cz>
Content-Type: text/plain; charset="us-ascii"

Thanks,

the workaround works but we need try the "permanent" solution with
Renviron.site file in future.

Cheers
Petr

> -Original Message-
> From: Ivan Krylov 
> Sent: Tuesday, October 4, 2022 5:43 PM
> To: PIKAL Petr 
> Cc: r-help mailing list 
> Subject: Re: [R] R version 4.2.1 install.packages does not work with IE
proxy
> setting
>
> On Tue, 4 Oct 2022 11:01:14 +
> PIKAL Petr  wrote:
>
> > After we installed new R version R 4.2.1 installing packages through
> > IE proxy setting is compromised with warning that R could not connect
> > to server (tested in vanilla R).
>
> R 4.1 deprecated the use of download.file(method = 'wininet'). R 4.2
switched
> the default download method to 'libcurl' and started giving warnings for
> 'wininet' and http[s]:// URLs.
>
> A workaround to get it working right now would be to set
> options(download.file.method = 'wininet') and live with the resulting
warnings
> while R downloads the files, but a next version of R may remove 'wininet'
> support altogether.
>
> In order to get it working with the 'libcurl' method, you'll need to
provide some
> environment variables to curl:
> https://stat.ethz.ch/pipermail/r-help/2022-September/475917.html
>
> Not sure if libcurl would accept a patch to discover the Windows proxy
settings
> automatically, but I don't think it does that now.
>
> --
> Best regards,
> Ivan


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Formula compared to call within model call

2021-08-11 Thread Tim Taylor
Manipulating formulas within different models I notice the following:

m1 <- lm(formula = hp ~ cyl, data = mtcars)
m2 <- update(m1, formula. = hp ~ cyl)
all.equal(m1, m2)
#> [1] TRUE
identical(m1, m2)
#> [1] FALSE
waldo::compare(m1, m2)
#> `old$call[[2]]` is a call
#> `new$call[[2]]` is an S3 object of class , a call

I'm aware formulas are a form of call but what I'm unsure of is whether there 
is meaningful difference between the two versions of the models? Any 
clarification, even just on the relation between formulas and calls would be 
useful.
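For what it's worth, the relation can be inspected directly (my sketch): a formula is stored as a call to `~` that additionally carries an S3 class and an environment, which is exactly the kind of difference waldo surfaces.

```r
f <- hp ~ cyl
typeof(f)        # "language": a formula is stored as a call to `~`
is.call(f)       # TRUE
class(f)         # "formula" -- the extra S3 class waldo reports
environment(f)   # formulas also carry the environment they were created in
identical(f[[1]], as.name("~"))  # TRUE: the first element is the `~` operator
```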

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Potential Bug: file.show does not support double-byte characters in the file path

2021-06-17 Thread Tim via R-help
Hi,

I may have come across a bug regarding the file.show function in the base 
package.

### Bug Description

file.show does not support double-byte characters in the file path.

### Platform Info

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese 
(Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.1.0 tools_4.1.0

### Steps to Reproduce

1. Install R-4.1.0 for Windows on Windows 10 x64 21H1 (Chinese-Simplified, 
zh-CN)
2. Open RGui
3. Create a file named "test.txt" under "C:\Test", so that the full file path 
does not include double-byte characters
4. Create a file named "test.txt" under "C:\测试", so that the full file path 
includes double-byte characters
5. Run the following commands in the console
> t1 = "C:\\Test\\test.txt"
> t1
[1] "C:\\Test\\test.txt"
> file.show(t1)
>
> t2 = "C:\\测试\\test.txt"
> t2
[1] "C:\\测试\\test.txt"
> file.show(t2)
Warning message:
In file.show(t2) : file.show(): file 'C:\测试\test.txt' does not exist

Actual Result:

file.show() can correctly open the "C:\Test\test.txt" as the file path does not 
contain double-byte characters, but it fails to do so for "C:\测试\test.txt".

### Additional information

For users with usernames that include double-byte characters, this bug even 
prevents demo() from functioning properly.

> demo()
Warning message:
In file.show(outFile, delete.file = TRUE, title = paste("R", tolower(x$title))) 
:
file.show():不存在'C:\Users\娴嬭瘯涓瓇1\AppData\Local\Temp\RtmpGuWTun\RpackageIQR197068c75431'这个文件

file.show() can correctly read from files containing double-byte characters if 
the file paths do not. For example, the content for "C:\Test\test.txt" can be 
"Testing 123 测试". file.show("C:\\Test\\test.txt") would open this file and 
display its content correctly without specifying the encoding parameter.
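As a point of comparison (my sketch, not from the report), plain base file operations handle such paths on a UTF-8 locale; the failure above appears specific to file.show()'s path handling on Windows:

```r
# On a UTF-8 system, base file functions accept double-byte characters in
# paths; the report suggests file.show() mishandles them on Windows.
d <- file.path(tempdir(), "\u6d4b\u8bd5")   # "测试" written via escapes
dir.create(d, showWarnings = FALSE)
f <- file.path(d, "test.txt")
writeLines("Testing 123", f)
file.exists(f)   # TRUE on a UTF-8 locale
readLines(f)     # "Testing 123"
```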

Thanks,
Tim
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] weibulltools version 2.0.0 released on CRAN

2021-01-14 Thread Hensel, Tim-Gunnar
Dear all, 

we are happy to announce that a new version of weibulltools is available on 
CRAN. 
https://CRAN.R-project.org/package=weibulltools  

weibulltools is a package that focuses on statistical methods and 
visualizations that are often used in reliability engineering. 
The core idea is to provide a compact and easily accessible set of methods and 
visualization tools that make the examination and 
adjustment as well as the analysis and interpretation of field and bench test 
data as simple as possible. 

Compared to previous releases (<= 1.0.1), the new version (2.0.0) includes a 
data-based interface through which the obtained function outputs 
can be easily passed into subsequent methods and visualizations. 
Furthermore, all visualizations that were generated dynamically with 'plotly', 
can now also be created statically with the support of 'ggplot2'. 

The previously used vector-based approach has been largely preserved but is no 
longer recommended.

Examples for the usage of the new interface, the documentation of all 
functions, as well as further changes can be found at 
https://tim-tu.github.io/weibulltools/

If you notice a bug or have suggestions for improvements, please submit an 
issue with a minimal reproducible example at 
https://github.com/Tim-TU/weibulltools/issues/

Best regards, 
Tim-Gunnar Hensel and David Barkemeyer

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] understanding as.list(substitute(...()))

2020-10-06 Thread Tim Taylor
Cheers Denes,

That's useful to know.  I'll stick with the match.call version
instead.  Interestingly I initially tried to post to R-devel (as I
thought it may involve something internal) but was asked to post here
instead.
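For reference, the match.call() version mentioned here can be sketched as follows (the function name is mine):

```r
# Capture the unevaluated ... arguments via match.call(); with
# expand.dots = FALSE they arrive bundled under the name "...".
dots_mc <- function(...) as.list(match.call(expand.dots = FALSE)$`...`)
dots_mc(log(foo), bar)
# a list of unevaluated language objects: log(foo) and bar
```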

Best

Tim


On Tue, 6 Oct 2020 at 08:22, Dénes Tóth  wrote:
>
> Hi Tim,
>
> I have also asked a similar question a couple of months ago, and someone
> else did the same recently, maybe on r-devel.
>
> We received no "official" response, but Deepayan Sarkar (R Core Team
> member) claimed that:
>
> "
> There is no documented reason for this to work (AFAIK), so again, I
> would guess this is a side-effect of the implementation, and not a API
> feature you should rely on. This is somewhat borne out by the
> following:
>
>  > foo <- function(...) substitute({...()})
>  > foo(abc$de, fg[h], i)
> {
> pairlist(abc$de, fg[h], i)
> }
>  > foo(abc$de, fg[h], , i) # add a missing argument for extra fun
> {
> as.pairlist(alist(abc$de, fg[h], , i))
> }
>
> which is not something you would expect to see at the user level. So
> my recommendation: don't use ...() and pretend that you never
> discovered it in the first place. Use match.call() instead, as
> suggested by Serguei.
>
> [Disclaimer: I have no idea what is actually going on, so these are
> just guesses. There are some hints at
> https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Dot_002ddot_002ddot-arguments
> if you want to folllow up.]
> "
>
> Cheers,
> Denes
>
>
>
>
> On 10/6/20 8:38 AM, Tim Taylor wrote:
> > I probably need to be more specific.  What confuses me is not the use
> > of substitute, but the parenthesis after the dots.  It clearly works
> > and I can make guesses as to why but it is definitely not obvious.
> > The following function gives the same final result but I can
> > understand what is happening.
> >
> > dots <- function (...) {
> > exprs <- substitute(list(...))
> > as.list(exprs[-1])
> > }
> >
> > In the original, dots <- function(...) as.list(substitute(...())),
> > Does ...() get parsed in a special way?
> >
> > Tim
> >
> > On Tue, 6 Oct 2020 at 05:30, Bert Gunter  wrote:
> >>
> >> You need to understand what substitute() does -- see ?substitute and/or a 
> >> tutorial on "R computing on the language" or similar.
> >>
> >> Here is a simple example that may clarify:
> >>
> >>> dots <- function(...) as.list(substitute(...()))
> >>> dots(log(foo))
> >> [[1]]
> >> log(foo)  ## a call, a language object
> >>
> >>> dots2 <- function(...) as.list(...)
> >>> dots2(log(foo))
> >> Error in as.list(...) : object 'foo' not found
> >> ## substitute() does not evaluate its argument; as.list() does
> >>
> >> Cheers,
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along and 
> >> sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Mon, Oct 5, 2020 at 1:37 PM Tim Taylor 
> >>  wrote:
> >>>
> >>> Could someone explain what is happening with the ...() of the
> >>> following function:
> >>>
> >>> dots <- function(...) as.list(substitute(...()))
> >>>
> >>> I understand what I'm getting as a result but not why.   ?dots and
> >>> ?substitute leave me none the wiser.
> >>>
> >>> regards
> >>> Tim
> >>>
> >>> __
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide 
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] understanding as.list(substitute(...()))

2020-10-06 Thread Tim Taylor
I probably need to be more specific.  What confuses me is not the use
of substitute, but the parenthesis after the dots.  It clearly works
and I can make guesses as to why but it is definitely not obvious.
The following function gives the same final result but I can
understand what is happening.

dots <- function (...) {
   exprs <- substitute(list(...))
   as.list(exprs[-1])
}

In the original, dots <- function(...) as.list(substitute(...())),
does ...() get parsed in a special way?

Tim

On Tue, 6 Oct 2020 at 05:30, Bert Gunter  wrote:
>
> You need to understand what substitute() does -- see ?substitute and/or a 
> tutorial on "R computing on the language" or similar.
>
> Here is a simple example that may clarify:
>
> > dots <- function(...) as.list(substitute(...()))
> > dots(log(foo))
> [[1]]
> log(foo)  ## a call, a language object
>
> > dots2 <- function(...) as.list(...)
> > dots2(log(foo))
> Error in as.list(...) : object 'foo' not found
> ## substitute() does not evaluate its argument; as.list() does
>
> Cheers,
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Oct 5, 2020 at 1:37 PM Tim Taylor  
> wrote:
>>
>> Could someone explain what is happening with the ...() of the
>> following function:
>>
>> dots <- function(...) as.list(substitute(...()))
>>
>> I understand what I'm getting as a result but not why.   ?dots and
>> ?substitute leave me none the wiser.
>>
>> regards
>> Tim
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] understanding as.list(substitute(...()))

2020-10-05 Thread Tim Taylor
Could someone explain what is happening with the ...() of the
following function:

dots <- function(...) as.list(substitute(...()))

I understand what I'm getting as a result but not why.   ?dots and
?substitute leave me none the wiser.

regards
Tim

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Compositional Data Analysis

2020-07-22 Thread Tim Locke
Hi All,

I am currently working on a project examining the health effects of physical
activity using a compositional data (CoDa) approach. I am using a
linear regression to measure the effect of physical activity. What I want to
do is predict an outcome, e.g. body mass index, based on the mean physical
activity composition. I then want to alter this composition by reallocating
time from one behavior (such as sedentary) to light intensity physical
activity, and see what effect this has on BMI. I have included the code
below. My problem is that I cannot seem to generate this new composition
correctly.



library(compositions)
attach(Data_Analysis)

#Replace zeros in raw data with 0.1
bedtimemins[bedtimemins==0] <- 0.1
sedwakingmins[sedwakingmins==0] <- 0.1
standingtimemins[standingtimemins==0] <- 0.1
LIPAmins[LIPAmins==0] <- 0.1
MVPAmins[MVPAmins==0] <- 0.1

# make the composition by   binding the components together
comp <- cbind(bedtimemins, sedwakingmins, standingtimemins, LIPAmins,
MVPAmins)
# tell R that comp is a compositional   variable
comp <- acomp(comp)

#Generate variation matrix
PAvar_matrix <- var.acomp(comp)

# make the ilr multiple linear regression model. BMI = body mass index. .
lm <- lm(bmi~ ilr(comp) + age)

#determine the mean composition, geometric mean of behaviors which sum to 1
comp.mean <- mean(comp)

# predict BMI for the mean composition from above, keeping age constant at
its mean.
mean.pred   <- predict(lm,  newdata=list(comp=comp.mean, age = mean(age)))

#generate new compostion of PA with 30 minutes of sleep reallocated to 30
mins of LIPA
new.comp <- acomp(comp.mean + c(-30/1440, 0/1440, 0/1440, 30/1440, 0/1440))

#predict BMI based on new composition
pred<- predict(lm,  newdata=list(comp=new.comp, age=mean(age)))

#Estimated difference in BMI for the above reallocation
pred- mean.pred


When I run the command to generate the new composition, I receive the
message below

new.comp <- acomp(comp.mean + c(-30/1440, 0/1440, 0/1440, 30/1440, 0/1440))
Warning messages:
1: In acomp(gsi.mul(x, y)) :
  Negative values in composition are used as detection limits
2: In acomp(comp.mean + c(-30/1440, 0/1440, 0/1440, 30/1440, 0/1440)) :
  Negative values in composition are used as detection limits

When I print comp.mean I get a result that seems to make sense:
comp.mean
     bedtimemins    sedwakingmins standingtimemins         LIPAmins         MVPAmins
      0.36410089       0.32460230       0.22278324       0.06948202       0.01903155


I can't say the same for the new composition:
new.comp
     bedtimemins    sedwakingmins standingtimemins         LIPAmins         MVPAmins
       <5.240217              BDL              BDL             1.00              BDL

Any help or advice would be greatly appreciated!
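A guess at the root cause, with an illustrative work-around (this sketch is mine, not from the compositions package docs): for acomp objects, `+` is perturbation rather than raw addition (the `gsi.mul` in the warning's traceback hints at this), so adding ±30/1440 on the closed scale produces negative components. Reallocating on the minutes scale first, then re-closing, avoids that:

```r
# Hypothetical sketch: do the 30-minute reallocation on the raw time scale,
# then close (normalise) back to a composition summing to 1.
m <- c(bedtimemins = 0.36410089, sedwakingmins = 0.32460230,
       standingtimemins = 0.22278324, LIPAmins = 0.06948202,
       MVPAmins = 0.01903155)
mins <- m * 1440                       # back to minutes per day
mins["bedtimemins"] <- mins["bedtimemins"] - 30
mins["LIPAmins"]    <- mins["LIPAmins"] + 30
new_m <- mins / sum(mins)              # re-close to proportions
round(new_m, 4)                        # all components positive, sum to 1
```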



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] project path in Rmd (Ivan Calandra)

2020-04-02 Thread Howard, Tim G (DEC) via R-help
You also should be able to reference locations from the 'root' of your project 
using the here() package. 

https://here.r-lib.org/

Best, 
Tim Howard
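here() essentially walks up from the working directory until it finds a project marker; a minimal base-R sketch of that idea (the function and marker names are mine, not the package's API):

```r
# Walk upward from `start` until a directory containing `marker` is found;
# here::here() does something similar with .here/.Rproj-style markers.
find_root <- function(start = getwd(), marker = ".here") {
  p <- normalizePath(start)
  repeat {
    if (file.exists(file.path(p, marker))) return(p)
    parent <- dirname(p)
    if (identical(parent, p)) stop("no project root found")
    p <- parent
  }
}
```

With the root in hand, file.path(root, "raw_data", ...) works from any subfolder of the project.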


> Date: Thu, 2 Apr 2020 10:21:47 +0100
> From: Rui Barradas 
> To: Ivan Calandra 
> Cc: "r-help@r-project.org" 
> Subject: Re: [R] project path in Rmd
> Message-ID: <09f30516-7af5-fee7-5efd-e6db69c75...@sapo.pt>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
> 
> Hello,
> 
> This is not an answer to the original problem, it's about '..'. (And a
> bit more.)
> 
> About '..', see if the following sequence of instructions can help.
> Subdirectories '~/tmp' and '~/snap' exist on my PC, change to
> '~/analysis/scripts' or to what makes sense on yours.
> 
> od <- getwd()  # save this for later
> 
> setwd('~/tmp')
> list.files('../snap')  # goes up one level and
>     # executes a SO/files related command
> curr <- getwd()
> basename(curr) # these two instructions are meant to show that
> dirname(curr)  # you don't need to know how many levels you have
>     # to go up, you can parse 'curr' if basename and
>     # dirname are not enough
> 
> setwd(od)  # back to where I was
> 
> Hope this helps,
> Rui Barradas
> On 02/04/20 at 10:02, Ivan Calandra wrote:
> I do not know this ".." command (could you please show me how to use it
> in a relative path?), but it sounds like a good start.
>
> But it implies that I know in advance how many folders up the parent
> directory is. I guess in most cases it will always be the same, but it
> would be even better if it could be applied generically.
>
> As I said, ideally, I would like to get the project directory from a
> script located in a subfolder.
>
> Thanks!
> Ivan
>
> --
> Dr. Ivan Calandra
> TraCEr, laboratory for Traceology and Controlled Experiments
> MONREPOS Archaeological Research Centre and
> Museum for Human Behavioural Evolution
> Schloss Monrepos
> 56567 Neuwied, Germany
> +49 (0) 2631 9772-243
> https://www.researchgate.net/profile/Ivan_Calandra
>
> On 02/04/2020 10:54, Ivan Krylov wrote:
>> On Thu, 2 Apr 2020 10:30:29 +0200
>> Ivan Calandra  wrote:
>>
>>> The problem I then have is to specify the path for 'raw_data' and
>>> 'derived_data' <...> And these folders are not subfolders of
>>> the working directory '~/analysis/scripts'.
>>
>>> I would like to avoid absolute paths of course
>> Is there a reason to avoid relative paths built using '..' to access
>> parent directories?
>>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] substr gives empty output

2018-01-22 Thread Howard, Tim G (DEC)
In 

 y <- substr(x, i, 1)

your third integer needs to be the location not the number of digits, so change 
it to 

 y <- substr(x, i, i)

and you should get what you want. 
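An alternative that avoids the loop entirely (a common idiom; the sketch is mine, not from the thread): splitting on the empty string returns every character at once.

```r
x <- "testing"
strsplit(x, "")[[1]]
# [1] "t" "e" "s" "t" "i" "n" "g"
```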
Cheers, 
Tim

> Date: Sun, 21 Jan 2018 10:50:31 -0500
> From: Ek Esawi <esaw...@gmail.com>
> To: Luigi Marongiu <marongiu.lu...@gmail.com>, r-help@r-project.org
> Subject: Re: [R] substr gives empty output
> Message-ID:
> <CA+ZkTxubYDSZ3iqsg_=be9HBA2_3-TE95=mXbh4atvG-
> ri_...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> The reason you get "" is, as stated in the previous response and in the
> documentation of the substr function: "When extracting, if start is
> larger than the string length then "" is returned." This is what happens in
> your function.
> 
> HTH
> 
> EK
> 
> On Sun, Jan 21, 2018 at 3:59 AM, Luigi Marongiu <marongiu.lu...@gmail.com>
> wrote:
> > Dear all,
> > I have a string, let's say "testing", and I would like to extract in
> > sequence each letter (character) from it. But when I use substr() I
> > only properly get the first character, the rest is empty (""). What am
> > I getting wrong?
> > For example, I have this code:
> >
> >>>>
> > x <- "testing"
> > k <- nchar(x)
> > for (i in 1:k) {
> >   y <- substr(x, i, 1)
> >   print(y)
> > }
> >
> > [[alternative HTML version deleted]]
> >
> 
> 



[R] Interesting behavior of lm() with small, problematic data sets

2017-09-05 Thread Glover, Tim
I've recently come across the following results reported by the lm() function
when applied to a particular type of admittedly difficult data. When working
with small data sets (for instance, 3 points) that have the same response for
different values of the predicting variable, the resulting slope estimate is a
reasonable approximation of the expected 0.0, but the p-value of that slope
estimate is surprising. A reproducible example is included below, along with
the summary of the results.

# example code
x <- c(1,2,3)
y <- c(1,1,1)

# above results in the {(1,1), (2,1), (3,1)} data set to regress

new.rez <- lm(y ~ x)  # regress constant y on changing x
summary(new.rez)      # display results of regression

# end of example code

Results:

Call:
lm(formula = y ~ x)

Residuals:
         1          2          3
 5.906e-17 -1.181e-16  5.906e-17

Coefficients:
              Estimate Std. Error   t value Pr(>|t|)
(Intercept)  1.000e+00  2.210e-16 4.525e+15   <2e-16 ***
x           -1.772e-16  1.023e-16    -1.732    0.333
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.447e-16 on 1 degrees of freedom
Multiple R-squared:  0.7794,Adjusted R-squared:  0.5589
F-statistic: 3.534 on 1 and 1 DF,  p-value: 0.3112

Warning message:
In summary.lm(new.rez) : essentially perfect fit: summary may be unreliable


##

There is a warning that the summary may be unreliable due to the essentially
perfect fit, but a p-value of 0.3112 doesn't seem reasonable.
As a side note, the various R^2 values seem odd too.
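A hedged aside (my addition, not part of the original message): with a perfectly flat response, the residuals and the fitted slope are pure floating-point noise on the order of 1e-16, and summary() builds its t value, p-value, and R^2 from that noise, which is exactly what the warning is cautioning against.

```r
# Sketch: the slope is zero up to rounding error, so inference on it
# (the t value, p-value, and R^2 in the summary) is not meaningful.
x <- c(1, 2, 3)
y <- c(1, 1, 1)
fit <- lm(y ~ x)
abs(coef(fit)[["x"]]) < 1e-12       # slope is numerically zero
max(abs(residuals(fit))) < 1e-12    # an (essentially) perfect fit
```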







Tim Glover
Senior Scientist II (Geochemistry, Statistics), Americas - Environment & 
Infrastructure, Amec Foster Wheeler
271 Mill Road, Chelmsford, Massachusetts, USA 01824-4105
T +01 978 692 9090  D +01 978 392 5383  M +01 850 445 5039
tim.glo...@amecfw.com  amecfw.com




Re: [R] CRAN I-MR / Xbar-R / Xbar-S control chart package ?

2017-07-09 Thread Tim Smith
Interesting, thanks for that. I came across qcc, but my quick scan of
the docs suggested it only did Xbar charts. Maybe I need to re-read the
docs; I guess it does the individual plot versions (I-MR) too.

Tim

On 8 July 2017 at 20:53, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> Hello,
>
> I have no experience with I-MR charts but a google search found package qcc.
> Maybe it's what you're looking for.
>
> Hope this helps,
>
> Rui Barradas
>
>
> Em 08-07-2017 09:07, Tim Smith escreveu:
>>
>> Hi,
>>
>> I've had a quick look through the package list, and unless I've missed
>> something, I can't seem to find anything that will do I-MR / Xbar-R /
>> Xbar-S control charts ?
>>
>> Assuming there is something out there, can anyone point me in the
>> right direction ?
>>
>> Thanks !
>>
>> Tim
>>
>>
>



[R] CRAN I-MR / Xbar-R / Xbar-S control chart package ?

2017-07-08 Thread Tim Smith
Hi,

I've had a quick look through the package list, and unless I've missed
something, I can't seem to find anything that will do I-MR / Xbar-R /
Xbar-S control charts ?

Assuming there is something out there, can anyone point me in the
right direction ?

Thanks !

Tim



[R] [R-pkgs] CRAN updates: rpg and odeintr

2017-02-20 Thread Tim Keitt
rpg is a package for working with postgresql: https://github.com/thk686/rpg
odeintr is a package for integrating differential equations:
https://github.com/thk686/odeintr

Cheers,
THK

http://www.keittlab.org/


___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



[R] Problem installing/loading packages from Africa

2016-04-22 Thread Tim Werwie
I'm very new to R and I live in Mali, west Africa. I'm on *OS X 10.7.5*. I
downloaded and installed *R 3.2.1*. I downloaded and installed *RStudio
00.99.893*.

I ran through the free Microsoft data camp intro to R course, then started
another free course through 'edX', for Data and Statistics for Life
Sciences, using R. The first prompt in the course is to
install.packages("swirl"). Copied below are the various error messages I
get when trying to install or load any package.

My best guess is that the problems I'm having are due to being in west
Africa, with unreliable connections, weak connections and no CRAN Mirror
closer than Italy or Spain (as far as I know). I checked into common
package errors on the RStudio page, but I'm not confident enough in my
computing to get into internet proxies and some of the other suggested
troubleshooting.

Any insight would be very helpful. See error messages below. Thanks -- Tim

- When attempting to install, the print-out I get in the RStudio console
is some length of:

> install.packages("swirl")
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0
  [... many similar curl progress lines ...]
100  207k  100  207k    0     0  10363      0  0:00:20  0:00:20 --:--:-- 23922
The downloaded binary packages are in

/var/folders/yb/7z339kn56mdbwx92ydmsqswcgn/T//RtmpBzQ16u/downloaded_packages

For other packages, ggplot2 for example, a similar printout can run for
hundreds and hundreds of lines. So RStudio tells me that the packages are
available to load, BUT:

- when *loading* any package an error comes up saying package 'name of
package' was built under R version 3.2.5 but info on swirl says it should
work on any version of R above 3.0.2. I get similar errors for other
packages (treemap, ggplot2, etc).

- Or sometimes I'll get this: Error : .onAttach failed in attachNamespace()
for 'swirl', details: call: stri_c(..., sep = sep, collapse = collapse,
ignore_null = TRUE)
error: object 'C_stri_join' not found
In addition: Warning message:
package ‘swirl’ was built under R version 3.2.5
Error: package or namespace load failed for ‘swirl’

Any suggestions?



Re: [R] How to conduct a PERMANOVA using a dissimilarity matrix

2015-12-23 Thread Tim Richter-Heitmann

Dear Michael,

You need to create another file dividing the different categories/variables
(make sure that they are in the same order as the variable you are testing),
then attach it and run the adonis function. For example:

attach(metadata)

adonis(formula = vegdist(matrixfile, method = "bray") ~ Site, data = metadata)

It's also important to consider whether you have nestedness in your design, in
which case you would need to add strata = factor:

adonis(formula = vegdist(matrixfile, method = "bray") ~ Species, data = 
metadata,  strata = Site)

I hope this helps for now! Let me know if you need more detail.

Hazel (BSc student).




From: R-help <r-help-boun...@r-project.org> on behalf of 
michael.eisenr...@agroscope.admin.ch <michael.eisenr...@agroscope.admin.ch>
Sent: 22 December 2015 16:18
To: r-help@r-project.org
Subject: [R] How to conduct a PERMANOVA using a dissimilarity matrix

Dear R-List members,

I have to compare how similar two types of forest, old growth (O) and young
forest (Y), are in terms of moth communities.
I sampled moths at 4 O and 4 Y sites.
I need to analyse the data using a PERMANOVA approach, but I am having a really
hard time doing this in R.

I found out that I need to create a dissimilarity matrix and then read this
matrix into R to conduct a one-way PERMANOVA with forest type (O or Y) as a factor.
The package vegan, with the function "adonis", seems to be able to do a PERMANOVA.

I created the matrix (based on Soerenson (dis)similarities) and imported it 
into R.

Could anyone help me with the next step? How can I conduct a PERMANOVA on my
dataset? In the end I would need an R value and significance level telling me
whether community compositions differ significantly between sites.

Below is my code (not too much) and the data for the matrix.

#dput for matrix:

structure(c("", "O", "Y", "Y", "Y", "O", "O", "Y", "O", "O", "0", "0.544", "0.519", "0.533", "0.481", "0.548", "0.518", "0.479", "Y", "0.544", "0", "0.383", "0.416", "0.383", "0.358", "0.434", "0.399", "Y", "0.519", "0.383", "0", "0.398", "0.359", "0.392", "0.401", "0.374", "Y", "0.533", "0.416", "0.398", "0", "0.398", "0.399", "0.358", "0.348", "O", "0.481", "0.383", "0.359", "0.398", "0", "0.37", "0.317", "0.354", "O", 
"0.548", "0.358", "0.392", "0.399", "0.37", "0", "0.39", "0.365", "Y", "0.518", "0.434", "0.401", "0.358", "0.317", "0.39", "0", "0.371", "O", "0.479", "0.399", "0.374", "0.348", "0.354", "0.365", "0.371", "0"), .Dim = c(9L, 9L), .Dimnames = list(NULL, c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9")))


#Code
# load dissimilarity matrix (based on Soerenson similarity)
moth_dta <- read.csv("Geo_sorenson_8.csv", header=TRUE, sep=";")
# create a matrix from the imported data
moth_dta <- as.matrix(moth_dta)
moth_dta
library(vegan)
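As a side note (my addition, base R only): the dput above stores the O/Y labels and the distances together in one character matrix, with the site types in the first row and first column, so the numeric part has to be separated from the grouping factor before anything is handed to vegdist/adonis. A tiny analogue of that layout:

```r
# Tiny analogue of the posted matrix: first row holds forest type (O/Y),
# first column holds row labels; remaining cells are Sorensen distances.
raw <- rbind(c("",  "O",   "Y"),
             c("O", "0",   "0.5"),
             c("Y", "0.5", "0"))
grp  <- factor(raw[1, -1])                 # forest type per site
dmat <- apply(raw[-1, -1], 2, as.numeric)  # numeric distance matrix
d    <- as.dist(dmat)                      # a 'dist' object, as adonis expects
```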


Thank you very much,
Michael

Eisenring Michael, Msc.
PhD Student

Federal Department of Economic Affairs, Education and Research EAER Institute 
of Sustainability Sciences ISS Biosafety

Reckenholzstrasse 191, CH-8046 Zürich
Tel. +41 44 37 77181
Fax +41 44 37 77201
michael.eisenr...@agroscope.admin.ch
www.agroscope.ch






--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069


[R] Regular R freezes

2015-12-22 Thread Tim Richter-Heitmann

Dear List,

some days ago my R started to freeze regularly, especially during
these operations:


mso(,permutations=999), package vegan
anova.cca(), package vegan
forward.sel(), package packfor

I am running R on a 64-bit Windows 7 installation, 4 cores, 8 GB RAM.
I just today updated R to 3.2.3, all packages, and R-Studio to their 
most recent versions.

Still the error persists.
Two things are strange:

All these functions (and more) had been working without any problem for
2 years.
The freezes occur at different points in the algorithms. For
forward.sel(), a stepwise variable selection process
in species distribution modelling, the system fails at different steps
(say, after variable 8 or 18), but it inevitably fails.
RStudio is still responsive meanwhile, but the "Stop" button does not
work, and the program needs to be restarted.


Is there any way of troubleshooting or finding out who the culprit is?

Thanks.

--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069


Re: [R] How to find out if two cells in a dataframe belong to the same pre-specified factor-level

2015-09-29 Thread Tim Richter-Heitmann

Thank you, that turned out to work very well.

If you want to, you can answer it here:

http://stackoverflow.com/questions/32809249/how-to-find-out-if-two-cells-in-a-dataframe-belong-to-the-same-pre-specified-fac/

The question wasn't properly answered, which is why I switched to the R list.


On 28.09.2015 20:15, Adams, Jean wrote:
Here's one approach that works.  I made some changes to the code you 
provided.  Full working example code given below.


library(reshape)
library(ggplot2)
library(dplyr)

dist1 <- matrix(runif(16), 4, 4)
dist2 <- matrix(runif(16), 4, 4)
rownames(dist1) <- colnames(dist1) <- paste0("A", 1:4)
rownames(dist2) <- colnames(dist2) <- paste0("A", 1:4)
m1 <- melt(dist1)
m2 <- melt(dist2)
# I changed the by= argument here
final <- full_join(m1, m2, by=c("X1", "X2"))

# I made some changes to keep spcs character and grps factor
species <- data.frame(spcs=paste0("A", 1:4),
                      grps=as.factor(c(rep("cat", 2), rep("dog", 2))),
                      stringsAsFactors=FALSE)


# define new variables for final indicating group membership
final$g1 <- species$grps[match(final$X1, species$spcs)]
final$g2 <- species$grps[match(final$X2, species$spcs)]
final$group <- as.factor(with(final, ifelse(g1==g2, as.character(g1), 
"dif")))


# plot just the rows with matching groups
ggplot(final[final$group!="dif", ], aes(value.x, value.y, col=group)) +
  geom_point()
# plot all the rows
ggplot(final, aes(value.x, value.y, col=group)) + geom_point()

Jean


On Sun, Sep 27, 2015 at 4:22 PM, <trich...@uni-bremen.de> wrote:


Dear list,
I really couldn't find a better way to describe my question, so
please bear with me.

To illustrate my problem, I have a matrix with ecological
distances (m1) and one with genetic distances (m2) for a number of
biological species. I have merged both matrices and want to plot
both distances versus each other, as illustrated in this example:

library(reshape)
library(ggplot2)
library(dplyr)

dist1 <- matrix(runif(16),4,4)
dist2 <- matrix(runif(16),4,4)
rownames(dist1) <- colnames(dist1) <- paste0("A",1:4)
rownames(dist2) <- colnames(dist2) <- paste0("A",1:4)

m1 <- melt(dist1)
m2 <- melt(dist2)

final <- full_join(m1,m2, by=c("Var1","Var2"))
ggplot(final, aes(value.x,value.y)) + geom_point()

Here is the twist:
The biological species belong to certain groups, which are given
in the dataframe `species`, for example:

species <- data.frame(spcs=as.character(paste0("A",1:4)),
                      grps=as.factor(c(rep("cat",2), rep("dog",2))))

I want to check if an x,y pair in final (as in `final$Var1`,
`final$Var2`) belongs to the same group of species (here "cat" or
"dog"), and then want to color all groups specifically in the
x,y-scatterplot.
Thus, I need an R translation for:

final$group <- If (final$Var1 and final$Var2) belong to the same
group as specified
  in species, then assign the species group here, else do
nothing or assign NA

    so I can proceed with

ggplot(final, aes(value.x,value.y, col=group)) + geom_point()

So, in the example, the pairs A1-A1, A1-A2, A2-A1, A2-A2 should be
identified as "both cats", hence should get the factor "cat".

Thank you very much!


Tim

__
R-help@r-project.org <mailto:R-help@r-project.org> mailing list --
To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069


[R] Download showing as exploit

2015-09-21 Thread Tim Kingston

Hi,

I work for the NHS, and our IT service has been unable to download R, as its
anti-virus software says it contains an exploit.

Is this normal? Is there a way around this?

Kind regards,

Tim Kingston

Sent from my HTC






Re: [R] Cross correlation between two time series over nested time periods?

2015-05-14 Thread Tim via R-help
Thanks!

Here the period of my time series B is a proper subinterval of the period of A.

Does ccf(A,B) requires A and B span the same period?

If A and B don't span the same period, what does ccf do?
When moving B along the period of A by a lag, does ccf(A,B) calculate the cross 
correlation between B and the part of A overlapping with B?
Or does ccf(A,B) calculate the cross correlation between A and the extension of 
B to the period of A by zero padding?


On Thu, 5/14/15, Franklin Bretschneider <brets...@xs4all.nl> wrote:

 Subject: Re: [R] Cross correlation between two time series over nested time periods?
 Date: Thursday, May 14, 2015, 6:14 AM

 On 2015-05-14, at 02:11, Tim via R-help <r-help@r-project.org> wrote:

 Hello Tim,

 Re:

 > I have two time series.
 >
 > Calculate and plot cross correlation between two time series over
 > nested time periods. Each point in either time series is for a week
 > (not exactly a calendar week, but the first week in a calendar year
 > always starts from Jan 1, and the other weeks in the same year
 > follow that, and the last week of the year may contain more than 7
 > days but no more than 13 days).
 >
 > The first time series A is stored in a compressed (.gz) text file,
 > which looks like (each week and the corresponding time series value
 > are separated by a comma in a line):
 > week,value
 > 20060101-20060107,0
 > 20060108-20060114,5
 > ...
 > 20061217-20061223,0
 > 20061224-20061230,0
 > 20070101-20070107,0
 > 20070108-20070114,4
 > ...
 > 20150903-20150909,0
 > 20150910-20150916,1
 >
 > The second time series B is similarly stored in a compressed (.gz)
 > text file, but over a subset of the period of A, which looks like:
 > week,value
 > 20130122-20130128,509
 > 20130129-20130204,204
 > ...
 > 20131217-20131223,150
 > 20131224-20131231,148.0
 > 20140101-20140107,365.0
 > 20140108-20140114,45.0
 > ...
 > 20150305-20150311,0
 > 20150312-20150318,364
 >
 > I wonder how to calculate the cross correlation between the two time
 > series A and B (up to a specified maximum lag), and plot A and B in
 > a single plot?

 The auto- and crosscorrelation functions are in the stats package:

 acf(x, lag.max = NULL,
     type = c("correlation", "covariance", "partial"),
     plot = TRUE, na.action = na.fail, demean = TRUE, ...)

 ccf(x, y, lag.max = NULL, type = c("correlation", "covariance"),
     plot = TRUE, na.action = na.fail, ...)

 See further: ?ccf

 Success and best wishes,

 Frank
 ---



 Franklin Bretschneider
 Dept of
 Biology
 Utrecht University
 brets...@xs4all.nl
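On the question of differing spans (my addition, not part of the reply): for genuine ts objects ccf() intersects the two time spans itself (internally it calls ts.intersect), but plain vectors are paired purely by position, so it is safest to align the series to their common window explicitly. A base-R sketch, with the weeks faked as plain ts indices:

```r
set.seed(1)
A <- ts(rnorm(200), start = 1)    # long series
B <- ts(rnorm(100), start = 51)   # proper sub-period of A
common <- ts.intersect(A, B)      # rows where both series exist (here 100)
r <- ccf(common[, "A"], common[, "B"], lag.max = 10, plot = FALSE)
length(r$acf)  # [1] 21  -> correlations at lags -10 .. 10
```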



[R] Cross correlation between two time series over nested time periods?

2015-05-13 Thread Tim via R-help
Dear R users,

I have two time series


Calculate and plot cross correlation between two time series over nested time 
periods. Each point in either time series is for a week (not exactly a calendar 
week, but the first week in a calendar year always starts from Jan 1, and the 
other weeks in the same year follow that, and the last week of the year may 
contain more than 7 days but no more than 13 days).

The first time series A is stored in a compressed (.gz) text file, which looks 
like (each week and the corresponding time series value are separated by a 
comma in a line):
week,value
20060101-20060107,0
20060108-20060114,5
...
20061217-20061223,0
20061224-20061230,0
20070101-20070107,0
20070108-20070114,4
...
20150903-20150909,0
20150910-20150916,1

The second time series B is similarly stored in a compressed (.gz) text file, 
but over a subset of period of A, which looks like:
week,value
20130122-20130128,509
20130129-20130204,204
...
20131217-20131223,150
20131224-20131231,148.0
20140101-20140107,365.0
20140108-20140114,45.0
...
20150305-20150311,0
20150312-20150318,364

I wonder how to calculate the cross correlation between the two time series A 
and B (up to a specified maximum lag), and plot A and B in a single plot? 

Thanks and regards,
Tim



Re: [R] Split a dataframe by rownames and/or colnames

2015-02-23 Thread Tim Richter-Heitmann

Thank you very much for the line. It was doing the split as suggested.
However, I want to release all the dataframes to the environment (later
on, for each dataframe, some dozen lines of code will be carried out,
and I don't know how to do it with lapply or for-looping, so I do it
separately):

list2env(split(df, sub(".+_", "", rownames(df))), envir=.GlobalEnv)

Anyway, the dataframes now have numeric names in some cases, and cannot
be easily accessed because of it.
How would the line be altered to add a "df_" prefix to each of the
dataframe names resulting from list2env?


Thank you very much!
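One way to get the prefix (my own sketch, not an answer given in the thread): rename the list returned by split() before releasing it with list2env(), so every data frame lands in the environment with a syntactic "df_" name.

```r
# Build the example data, split by rowname suffix, prefix, then release.
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p",
               "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)

parts <- split(df, sub(".+_", "", rownames(df)))  # names "1", "o1", "p"
names(parts) <- paste0("df_", names(parts))       # names "df_1", "df_o1", "df_p"
list2env(parts, envir = .GlobalEnv)

exists("df_1")  # [1] TRUE: the numeric-suffix group is now easy to access
```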



Thanks!

On 20.02.2015 20:36, David Winsemius wrote:

On Feb 20, 2015, at 9:33 AM, Tim Richter-Heitmann wrote:


Dear List,

Consider this example

df <- data.frame(matrix(rnorm(9*9), ncol=9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)


indx <- gsub(".*_", "", names(df))

I can split the dataframe by the index that is given in the column names after
the underscore "_".

list2env(
  setNames(
    lapply(split(colnames(df), indx), function(x) df[x]),
    paste("df", sort(unique(indx)), sep="_")),
  envir=.GlobalEnv)

However, I changed my mind and want to do it now by rownames. Exchanging
colnames with rownames does not work; it gives the exact same output (9 rows x
3 columns). I could do
as.data.frame(t(df_x)),
but maybe that is not elegant.
What would be the solution for splitting the dataframe by rows?

The split.data.frame method seems to work perfectly well with a 
rownames-derived index argument:


split(df, sub(".+_", "", rownames(df)))

$`1`
   c_1   d_1  e_1   a_p   b_p   c_p  1_o1 2_o1  3_o1
c_1 -0.11 -0.04 1.33 -0.87 -0.16 -0.25 -0.75 0.34  0.14
d_1 -0.62 -0.94 0.80 -0.78 -0.70  0.74  0.11 1.44 -0.33
e_1  0.98 -0.83 0.48  0.19 -0.32 -1.01  1.28 1.04 -2.16

$o1
c_1   d_1   e_1   a_p   b_p   c_p  1_o1  2_o1  3_o1
1_o1 -0.93 -0.02  0.69 -0.67  1.04  1.04 -1.50 -0.36  0.50
2_o1  0.02 -0.16 -0.09 -1.50 -0.02 -1.04  1.07 -0.45  1.56
3_o1 -1.42  0.88 -0.05  0.85 -1.35  0.21  1.35  0.92 -0.76

$p
   c_1   d_1   e_1   a_p  b_p   c_p  1_o1  2_o1  3_o1
a_p -1.35  0.91 -0.58 -0.63 0.94 -1.13  0.71  0.25  0.82
b_p -0.25 -0.73 -0.41 -1.71 1.28  0.19 -0.35  1.74 -0.93
c_p -0.01 -1.11 -0.12  0.58 1.51  0.03 -0.99 -0.23 -0.03


Thank you very much!

--
Tim Richter-Heitmann




--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069



[R] Split a dataframe by rownames and/or colnames

2015-02-20 Thread Tim Richter-Heitmann

Dear List,

Consider this example

df <- data.frame(matrix(rnorm(9*9), ncol=9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1",
"3_o1")

row.names(df) <- names(df)


indx <- gsub(".*_", "", names(df))

I can split the dataframe by the index that is given in the column names
after the underscore "_".


list2env(
  setNames(
    lapply(split(colnames(df), indx), function(x) df[x]),
    paste("df", sort(unique(indx)), sep="_")),
  envir=.GlobalEnv)

However, I changed my mind and want to do it now by rownames. Exchanging
colnames with rownames does not work; it gives the exact same output (9
rows x 3 columns). I could do

as.data.frame(t(df_x)),
but maybe that is not elegant.
What would be the solution for splitting the dataframe by rows?

Thank you very much!

--
Tim Richter-Heitmann



[R] read.table with missing data and consecutive delimiters

2015-02-11 Thread Tim Victor
All,

Assume we have data in an ASCII file that looks like

Var1$Var2$Var3$Var4
1$2$3$4
2$$5
$$$6

When I execute

read.table( 'test.dat', header=TRUE, sep='$' )

I, of course, receive the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
 :
  line 2 did not have 4 elements

When I set fill=TRUE, e.g., read.table( 'test.dat', header=TRUE, sep='$',
fill=TRUE )

I get:

  Var1 Var2 Var3 Var4
1    1    2    3    4
2    2   NA    5   NA
3   NA   NA   NA    6



What I need is

  Var1 Var2 Var3 Var4
1    1    2    3    4
2    2   NA   NA    5
3   NA   NA   NA    6

What am I missing?

Thanks,

Tim
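A diagnostic sketch (my addition, not part of the original post): read.table()'s fill= pads short lines on the right, and count.fields() confirms that the line "2$$5" really has only three fields, so R has no way to know the 5 was meant for Var4; the line would have to be padded to four fields before parsing.

```r
tf <- tempfile(fileext = ".dat")
writeLines(c("Var1$Var2$Var3$Var4", "1$2$3$4", "2$$5", "$$$6"), tf)

# Per-line field counts: line 3 is genuinely short, hence fill=TRUE
# assigns its fields left to right (the 5 ends up in Var3).
cf <- count.fields(tf, sep = "$")
print(cf)  # [1] 4 4 3 4
```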




[R] Network graph google maps‏

2014-12-16 Thread Tim de Wolf
Hello everybody. I am not sure if I am in the right forum for my question. If 
so, please let me know so I can go somewhere else! Thanks. Currently I am 
trying to familiarize myself with R. I am trying to plot some sort of Network 
graph (think of migration maps). Basically, I'd like to have an interactive map 
(via the google maps packages for R f.e.) of The Netherlands which has nodes 
and edges. However, I stumble upon some problems. I figured out how to create 
edges and nodes in R using the map function (this was quite the victory for 
me). R automatically creates a world map on which I can plot the data. But the 
world map is nowhere near as detailed as an interactive google map (via, f.e., 
gvisMap package). So I was looking for a way to get a google map, and plot data 
onto that, but I just can't seem to find a tutorial. I found this document, 
which shows several graphs, but none is a network graph. The goal is to 
eventually connect it to shiny and be able to plot different!
  kinds of network graphs based on input. I really need a tutorial as I am 
still a big noob in R :) So is there someone who is willing to help me find 
some sort of tutorial? Any help is much appreciated!
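While waiting for pointers to a proper google-maps tutorial, the node/edge part can at least be prototyped in base graphics on plain longitude/latitude coordinates (no map tiles; the cities and coordinates below are illustrative):

```r
# Sketch: a network of nodes and edges drawn on lon/lat axes with base R.
nodes <- data.frame(name = c("Amsterdam", "Rotterdam", "Utrecht"),
                    lon  = c(4.90, 4.48, 5.12),
                    lat  = c(52.37, 51.92, 52.09))
edges <- data.frame(from = c(1, 1), to = c(2, 3))  # edges as node indices

plot(nodes$lon, nodes$lat, pch = 19,
     xlab = "Longitude", ylab = "Latitude")
segments(nodes$lon[edges$from], nodes$lat[edges$from],
         nodes$lon[edges$to],   nodes$lat[edges$to])
text(nodes$lon, nodes$lat, nodes$name, pos = 3)
```

The same nodes/edges data frames can later be handed to an interactive mapping package once one is chosen.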

Tim   
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] rtmvnorm {tmvtnorm} seems broken

2014-12-09 Thread Tim Benham
General linear constraints don't seem to work. I get an error message if I
have more constraint equations than variables. E.g. executing the following
code

print(R.version)
library('tmvtnorm')
cat('tmvtnorm version ')
print(packageVersion('tmvtnorm'))
## Let's constrain our sample to the dwarfed hypercube of dimension p.
p <- 3  # dimension
mean <- rep(0, p)
sigma <- diag(p)
## a <= Dx <= b
a <- c(rep(0, p), -Inf)
b <- c(rep(1, p), 2)
D <- rbind(diag(p), rep(1, p))
cat('mean is\n'); print(mean)
cat('a is\n'); print(a)
cat('b is\n'); print(b)
cat('D is\n'); print(D)
X <- rtmvnorm(n=1000, mean, sigma, D=D, lower=a, upper=b, algorithm="gibbsR")

produces the following output

platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          1.0
year           2014
month          04
day            10
svn rev        65387
language       R
version.string R version 3.1.0 (2014-04-10)
nickname       Spring Dance
tmvtnorm version [1] '1.4.9'
mean is
[1] 0 0 0
a is
[1]    0    0    0 -Inf
b is
[1] 1 1 1 2
D is
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
[4,]    1    1    1
Error in checkTmvArgs(mean, sigma, lower, upper) (from rtmvnorm-test.R#18) :
  mean, lower and upper must have the same length

That error message is not appropriate when a matrix of linear constraints is
passed in. I emailed the package maintainer on the 3rd but received only an
automatic out-of-office reply.
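As a stopgap while the package bug stands, the same truncated distribution can be drawn by plain rejection sampling in base R (a sketch for this specific region, not a fix for tmvtnorm):

```r
# Sketch: rejection sampler for N(0, I) truncated to 0 <= x <= 1, sum(x) <= 2.
set.seed(1)
rtruncated <- function(n, p = 3) {
  out <- matrix(numeric(0), ncol = p)
  while (nrow(out) < n) {
    cand <- matrix(rnorm(4 * n * p), ncol = p)   # batch of proposals
    ok <- apply(cand, 1, function(x)
      all(x >= 0) && all(x <= 1) && sum(x) <= 2) # keep feasible rows only
    out <- rbind(out, cand[ok, , drop = FALSE])
  }
  out[seq_len(n), , drop = FALSE]
}
X <- rtruncated(500)
```

Rejection is only practical while the acceptance rate stays reasonable (here roughly 4% per proposal); for thin regions a Gibbs sampler really is needed.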

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Creating submatrices from a dataframe, depending on factors in sample names

2014-12-01 Thread Tim Richter-Heitmann

Hello there,

this is a cross-post of a Stack Overflow question, which wasn't answered,
but is very important for my work. Apologies for breaking any rules, but
I do hope for some help from the list instead:


I have a huge matrix of pairwise similarity percentages between
different samples. The samples belong to groups. The groups are
determined by the suffix _n in the row.names/header names.
In the first step, I wanted to create submatrices consisting of all
pairs within single groups (i.e. for all samples from _1).
However, I realized that I need to know all pairwise submatrices
between all combinations of groups. So, I want to create (a list of)
vectors that are named _n1 vs _n2 (or similar) for all combinations of
n, as illustrated by the colored rectangles:


http://i.stack.imgur.com/XMkxj.png

Reproducible code, as provided by helpful Stack Overflow members, 
dealing with identical _ns.



df <- structure(list(HQ673618_1 = c(NA, 90.8, 89.8, 89.6, 89.8, 88.9,
87.8, 88.2, 88.3), HQ674317_1 = c(90.8, NA, 98.6, 97.7, 98.4,
97.4, 94.9, 96.2, 95.1), EU686630_1 = c(89.8, 98.6, NA, 98.4,
98.9, 97.7, 95.4, 96.4, 95.8), EU686593_2 = c(89.6, 97.7, 98.4,
NA, 98.1, 96.8, 94.4, 95.6, 94.8), JN166322_2 = c(89.8, 98.4,
98.9, 98.1, NA, 97.5, 95.3, 96.5, 95.9), EU491340_2 = c(88.9,
97.4, 97.7, 96.8, 97.5, NA, 96.5, 97.7, 96), AB694259_3 = c(87.8,
94.9, 95.4, 94.4, 95.3, 96.5, NA, 98.3, 95.9), AB694258_3 = c(88.2,
96.2, 96.4, 95.6, 96.5, 97.7, 98.3, NA, 95.8), AB694462_3 = c(88.3,
95.1, 95.8, 94.8, 95.9, 96, 95.9, 95.8, NA)), .Names = c("HQ673618_1",
"HQ674317_1", "EU686630_1", "EU686593_2", "JN166322_2", "EU491340_2",
"AB694259_3", "AB694258_3", "AB694462_3"), class = "data.frame",
row.names = c("HQ673618_1", "HQ674317_1", "EU686630_1", "EU686593_2",
"JN166322_2", "EU491340_2", "AB694259_3", "AB694258_3", "AB694462_3"))


indx <- gsub(".*_", "", names(df))
sub.matrices <- lapply(unique(indx), function(x) {
  temp <- which(indx %in% x)
  df[temp, temp]
})
unique_values <- lapply(sub.matrices, function(x) x[upper.tri(x)])
names(unique_values) <- unique(indx)

This code needs to be expanded to form sub.matrices for any combination 
of unique indices in temp.
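One way to expand this to every between-group combination is combn() over the unique group indices; a sketch on a small synthetic similarity matrix, where df and indx stand in for the data above:

```r
# Sketch: build a named list of submatrices for all pairs of groups.
set.seed(1)
nm <- c("A_1", "B_1", "C_2", "D_2", "E_3")
m  <- matrix(round(runif(25, 75, 100), 1), 5, 5, dimnames = list(nm, nm))
df <- as.data.frame(m)

indx  <- gsub(".*_", "", names(df))
combs <- combn(unique(indx), 2, simplify = FALSE)   # all pairs of group labels
cross.matrices <- lapply(combs, function(g)
  df[indx == g[1], indx == g[2], drop = FALSE])     # rows from g1, cols from g2
names(cross.matrices) <- sapply(combs, paste, collapse = "_vs_")
```

Together with the within-group list already computed, this covers every colored rectangle in the figure.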



Thank you so much!




--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] big datasets for R

2014-10-28 Thread Tim Hoolihan
Another source of large datasets is the Public Data Sets on AWS 
http://aws.amazon.com/public-data-sets/
Tim Hoolihan
@thoolihan
http://linkedin.com/in/timhoolihan


On Oct 28, 2014, at 7:00 AM, r-help-requ...@r-project.org wrote:

 --
 
 Message: 2
 Date: Mon, 27 Oct 2014 13:37:12 +0100
 From: Qiong Cai qiong@gmail.com
 To: r-help@r-project.org
 Subject: [R] big datasets for R
 Message-ID:
   caorq85hxzxgsua-zdy5ee30m6qlxuvu8cqtkwbfewovbg1a...@mail.gmail.com
 Content-Type: text/plain; charset=UTF-8
 
 Hi,
 
 Could anyone please tell me where I can find very big datasets for R?  I'd
 like to do some benchmarking on R by stressing R a lot.
 
 Thanks
 Qiong
 
   [[alternative HTML version deleted]]
 
 
 
 --


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How can I overwrite a method in R?

2014-10-09 Thread Tim Hesterberg
How can I create an improved version of a method in R, and have it be used?

Short version:
I think plot.histogram has a bug, and I'd like to try a version with a fix.
But when I call hist(), my fixed version doesn't get used.

Long version:
hist() calls plot() which calls plot.histogram() which fails to pass ...
when it calls plot.window().
As a result hist() ignores xaxs and yaxs arguments.
I'd like to make my own copy of plot.histogram that passes ... to
plot.window().

If I just make my own copy of plot.histogram, plot() ignores it, because my
version is not part of the same graphics package that plot belongs to.

If I copy hist, hist.default and plot, the copies inherit the same
environments as
the originals, and behave the same.

If I also change the environment of each to .GlobalEnv, hist.default fails
in
a .Call because it cannot find C_BinCount.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [vegan] Error in as.vector(x, mode) : , cannot coerce type 'builtin' to vector of type 'any' when performing anova on CCA objects

2014-10-09 Thread Tim Richter-Heitmann
Hi there,

I have three dataframes, which have the same structure (data.frame, 358
observations of 340 variables, every column is numeric) and are
basically submatrices of a parental dataset.

I can perform a CCA on all of them (with the explanatory dataset being
abio):

cca1_ab <- cca(df1 ~ ., data = abio)

I can also perform an anova on all three cca-derived values:

anova(cca1_ab)

However, if I do

sig1_ab <- anova(cca1_ab, by = "term", perm = 200)

which I believe gives me confidence values for the fitting of the
explanatory variables, only one of the cca-derived values gives an
unexpected

Error in as.vector(x, mode) : cannot coerce type 'builtin' to vector
of type 'any'

The other two are working fine. It's hard to come up with an explanation,
as the datasets look and behave the same otherwise.

Any idea why this could happen?

I am sorry for not providing data for reproduction, as the data sets are 
pretty large.

-- 
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How can I overwrite a method in R?

2014-10-09 Thread Tim Hesterberg
Thank you Duncan, Brian, Hadley, and Lin.

In Lin's suggestion, I believe the latter two statements should be
reversed, so that the environment is added before the function is placed
into the graphics namespace.

source('plot.histogram.R')
environment(plot.histogram) <- asNamespace('graphics')
assignInNamespace('plot.histogram', plot.histogram, ns='graphics')

The middle statement could also be
environment(plot.histogram) <- environment(graphics:::plot.histogram)
The point is to ensure that the replacement version has the same
environment as the original.

Having tested this, I will now submit a bug report :-)
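A minimal self-contained illustration of the same pattern (the replacement body below is a placeholder used only to show the mechanics, not a real fix for plot.histogram):

```r
# Sketch: install a replacement S3 method into the graphics namespace.
my.plot.histogram <- function(x, ...) invisible("patched")  # placeholder body
environment(my.plot.histogram) <- asNamespace("graphics")   # same env as original
assignInNamespace("plot.histogram", my.plot.histogram, ns = "graphics")
```

After this, method dispatch from plot()/hist() reaches the replacement, because it lives in the same namespace the generic consults.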

On Thu, Oct 9, 2014 at 9:11 AM, C Lin bac...@hotmail.com wrote:

 I posted similar question a while ago. Search for modify function in a
 package.
 In your case, the following should work.

  source('plot.histogram.R')
  assignInNamespace('plot.histogram', plot.histogram, ns='graphics')
  environment(plot.histogram) <- asNamespace('graphics');

 Assuming you have your own plot.histogram function inside
 plot.histogram.R and the plot.histogram function you are trying to
 overwrite is in graphics package.

 Lin

 
  From: h.wick...@gmail.com
  Date: Thu, 9 Oct 2014 07:00:31 -0500
  To: timhesterb...@gmail.com
  CC: r-h...@stat.math.ethz.ch
  Subject: Re: [R] How can I overwrite a method in R?
 
  This is usually ill-advised, but I think it's the right solution for
  your problem:
 
   assignInNamespace("plot.histogram", function(...) plot(1:10), "graphics")
  hist(1:10)
 
   Hadley
 
  On Thu, Oct 9, 2014 at 1:14 AM, Tim Hesterberg timhesterb...@gmail.com
 wrote:
  How can I create an improved version of a method in R, and have it be
 used?
 
  Short version:
  I think plot.histogram has a bug, and I'd like to try a version with a
 fix.
  But when I call hist(), my fixed version doesn't get used.
 
  Long version:
  hist() calls plot() which calls plot.histogram() which fails to pass ...
  when it calls plot.window().
  As a result hist() ignores xaxs and yaxs arguments.
  I'd like to make my own copy of plot.histogram that passes ... to
  plot.window().
 
  If I just make my own copy of plot.histogram, plot() ignores it,
 because my
  version is not part of the same graphics package that plot belongs to.
 
  If I copy hist, hist.default and plot, the copies inherit the same
  environments as
  the originals, and behave the same.
 
  If I also change the environment of each to .GlobalEnv, hist.default
 fails
  in
  a .Call because it cannot find C_BinCount.
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
  --
  http://had.co.nz/
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.





-- 
Tim Hesterberg
http://www.timhesterberg.net
 (resampling, water bottle rockets, computers to Costa Rica, hot shower =
2650 light bulbs, ...)

Help your students understand statistics:
Mathematical Statistics with Resampling and R, Chihara  Hesterberg
http://www.timhesterberg.net/bootstrap/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] gplot heatmaps: clustering according to rowsidecolors + key.xtickfun

2014-09-04 Thread Tim Richter-Heitmann

Hi there,

I have two questions about heatmap.2 in gplot.
My input is a simple square matrix with numeric values between 75 and 
100 (it is a similarity matrix based on bacterial DNA sequences).


1. I can sort my input matrix into categories with rowsidecolors (in
this case, very conveniently by bacterial taxa). I do a clustering and
reordering of my matrix by Rowv=TRUE (and Colv=Rowv).
The question is now: can I combine the two features so that the
clustering/reordering is done only for submatrices defined by the
vectors given in rowsidecolors (so, in this case, that the original
ordering by bacterial taxa is preserved)?



That would be very amazing.
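One possible route for this: cluster within each taxon block separately and pass the resulting order yourself (with Rowv = FALSE in heatmap.2). A sketch on synthetic data, where grp stands in for the taxa behind rowsidecolors:

```r
# Sketch: reorder rows only inside predefined blocks, preserving block order.
set.seed(1)
m   <- matrix(runif(36, 75, 100), 6, 6)
grp <- c(1, 1, 2, 2, 3, 3)                 # block membership, in display order
ord <- unlist(lapply(unique(grp), function(g) {
  idx <- which(grp == g)
  idx[hclust(dist(m[idx, ]))$order]        # cluster within this block only
}))
m.ord <- m[ord, ord]                       # hand this to heatmap.2 afterwards
```

Because each block is permuted only internally, the taxon ordering survives while rows still cluster within their taxon.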

2. I have set my own coloring rules with:

mypal <- c("grey", "blue", "green", "yellow", "orange", "red")
col_breaks = c(seq(0,74.9), seq(75.0,78.4), seq(78.5,81.9),
seq(82.0,86.4), seq(86.5, 94.5),  seq(94.5,100.0))


Is it possible to pass this sequential ordering to key.xtickfun? May I
ask for an example code?


Thank you very much!

--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [vegan]Envfit, pvalues and ggplot2

2014-08-11 Thread Tim Richter-Heitmann
Good Morning,

first let me thank you very much for answering my first two questions on 
this list.

Currently, I do vegan's envfit on simple PCA ordinations. When drawing
the biplot, one can set a cutoff to fit only the parameters with
significant p-values (via p.max=0.05 in the plot command).

There is already sufficient coverage on the net for biplotting this kind 
of data with ggplot2 (with the problem being the arrow length).
http://stackoverflow.com/questions/14711470/plotting-envfit-vectors-vegan-package-in-ggplot2

However, what the solution does not cover is the exclusion of
insignificant environmental parameters, as the score extraction process
described in the link only works with display = "vectors":

Envfit_scores <- as.data.frame(scores(list_from_envfit, display = "vectors"))

envfit creates lists like this:

          PC1      PC2     r2   Pr(>r)
param1 -0.70882  0.70539 0.0994 0.000999 ***
param2 -0.60122  0.79908 0.0593 0.000999 ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
P values based on 999 permutations.

The list contains a vector called $pval containing the pvalues.

So, I need to reduce the list created by envfit to rows meeting a
criterion in $pval (via unlist() and which(), I suppose). However, I
have difficulties working out the correct code.


Any help is much appreciated!
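The filtering step itself is a one-liner once the p-values are lined up with the arrow scores. A sketch, where the list below merely mocks the relevant slots of an envfit object with made-up values (in a real object the same slots are $vectors$arrows and $vectors$pvals):

```r
# Mocked-up envfit-like structure (hypothetical values).
ef <- list(vectors = list(
  arrows = rbind(param1 = c(PC1 = -0.709, PC2 = 0.705),
                 param2 = c(PC1 = -0.601, PC2 = 0.799),
                 param3 = c(PC1 =  0.200, PC2 =  0.100)),
  pvals  = c(0.001, 0.001, 0.450)))

# Keep only rows whose p-value passes the cutoff.
sig.scores <- as.data.frame(
  ef$vectors$arrows[ef$vectors$pvals <= 0.05, , drop = FALSE])
```

sig.scores can then go straight into the ggplot2 biplot code from the linked answer.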

-- 
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem with labeling plots, possibly in font defaults

2014-08-07 Thread Tim Blass
Hello,

I am using R 3.1.1 on a (four year old) MacBook, running OSX 10.9.4.

I just tried making and labeling a plot as follows:

> x <- rnorm(10)
> y <- rnorm(10)
> plot(x, y)
> title(main = "random points")

which produces a scatter plot of the random points, but without the title
and without any numbers along the axes. If I then run

> par(family = "sans")
> plot(x, y, main = "plot title")

the plot has the title and the numbers on the axes (also and 'x' and 'y'
appear as default labels for the axes).

I do not know what is going on, but maybe there is some problem in the
default font settings (I don't know if that could be an R issue or an issue
specific to my Mac)?

This is clearly not a big problem (at least for now) since I can put labels
on plots by running par(), but if it is indicative of a larger underlying
problem (or if there is a simple fix) I would like to know.

Thank you!

tb

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problems with read.table and data structure

2014-07-11 Thread Tim Richter-Heitmann
Hi there!

I have a huge datafile of 600 columns and 360 samples:

data <- read.table("small.txt", header = TRUE, sep = "\t", dec = ".",
row.names = 1)

The txt file (compiled with Excel) is showing me only numbers; however, R
gives me the structure of ANY column as factor.

When I try stringsAsFactors=FALSE in the read command, the structure
of the dataset becomes character.

When I try as.numeric(data), I get

Error: (list) object cannot be coerced to type 'double'


even if I try to subset columns with [].


When I try as.numeric on single columns with $, I am successful, but the
numbers don't make any sense at all, as the factors are not converted by
their levels:


Factor w/ 358 levels "0","123111694",..: 11 14 50 12 38 44 13 76 31 30


becomes


num  11 14 50 12 38 44 13 76 31 30


whereas I would need the levels, though!


I suspect Excel messes up the "save as tab-delimited text", but the text
file seems fine to me on the surface (I don't know how the numbers are
stored internally). I just see correct numbers; the View command also
yields the correct content.



Anyone know a fix? It's pretty annoying.
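Whatever upset the import, the classic remedy once a numeric column has become a factor is to convert via its levels (labels), never via as.numeric() directly, which returns the internal codes. A small sketch:

```r
# Sketch: recover numeric values from a factor of number-like labels.
f <- factor(c("0.5", "123111694", "42"))
bad <- as.numeric(f)            # internal codes, not the values
x   <- as.numeric(levels(f))[f] # converts the labels themselves
```

The usual root cause is a single non-numeric cell (a stray space, text, or a locale decimal comma) somewhere in the column, which makes read.table treat the whole column as character/factor.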



Thank you!

-- 
Tim Richter-Heitmann


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] x,y scatterplots, with averaging by a column

2014-06-30 Thread Tim Richter-Heitmann

Hello!

I have this matrix:

SampleID, Day, Species1, Species2,Species3,...,Speciesn
1,Monday,abundance values
2,Monday,abundance values
11,Tuesday,abundance values
12,Tuesday,abundance values
21,Wednesday,abundance values
22,Wednesday,abundance values

I would like to plot the Days on the x-axis and the species abundance
data on the y-axis. Each pair of values (I just have two measurements
for each day) should be represented by its mean with absolute error
bars (so we have basically a vertical min/max range) and whiskers. The
means for each species observation should be connected with lines.
Also of interest would be annotating the whiskers with their sample ID
(because the whiskers basically represent the values for y1,2 (11,12;
21,22)).

Any help is welcome! I am new to R, so please bear with me. Thank you!
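The summary step can be done in base R with aggregate(); a sketch with made-up abundances for one species (the error bars could then be drawn from the min/max columns with arrows() or segments()):

```r
# Sketch: mean plus min/max range per day for a single species column.
d <- data.frame(Day = rep(c("Monday", "Tuesday", "Wednesday"), each = 2),
                Species1 = c(3, 5, 2, 6, 4, 4))
agg <- aggregate(Species1 ~ Day, d,
                 function(v) c(mean = mean(v), min = min(v), max = max(v)))
agg <- do.call(data.frame, agg)   # flatten the matrix column into plain columns
```

agg then holds one row per day with Species1.mean, Species1.min, and Species1.max, ready for lines() plus error bars.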

--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Large matrices and pairs/ggpairs

2014-05-20 Thread Tim Richter-Heitmann
Hi there,

I recently started to learn R to deal with a huge data matrix of
ecological data (about 40 observations in 360 samples, divided in 7
groups of similar samples). I want to do a simple pairs or ggpairs:

pairs(mydata, pch=20, col=brewer.pal(7, "Dark2")[unclass(all$group)])

ggpairs(mydata, colour="group") (preferred)



Of course, the plot margins are way too small to produce a 40x40 plot
matrix (or is there a way to create poster-sized plots?), so I need to
produce 39 x,y scatterplots for each single column and arrange them
somehow. Is there a way to do it automatically (without creating 39
single plots one by one)?
Another option would be to only show plots with x,y pairs with
correlation values above a certain threshold (I would need to create the
cor-matrix first and identify those highly correlated pairs by myself,
correct, or is there a way to tell R to do that?)
And finally, ggpairs is giving me fits because the font size of the
upper part does not change despite me trying to tell R to do so
(params=list(corSize=10) or just params=list(Size=10) or
upper=list(params=c(Size=10))).


As I said, I am a beginner and hopefully my questions are not too stupid.
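On the threshold idea: R can indeed find the high-correlation pairs for you from the correlation matrix. A sketch on synthetic data with one deliberately planted correlated pair:

```r
# Sketch: list variable pairs whose absolute correlation exceeds a cutoff.
set.seed(1)
mydata <- as.data.frame(matrix(rnorm(300), ncol = 6))
mydata$V7 <- mydata$V1 + rnorm(50, sd = 0.1)   # planted high-correlation pair
cm <- cor(mydata)
cm[lower.tri(cm, diag = TRUE)] <- NA           # count each pair only once
hits <- which(abs(cm) > 0.8, arr.ind = TRUE)
high.pairs <- data.frame(x = rownames(cm)[hits[, 1]],
                         y = colnames(cm)[hits[, 2]])
```

high.pairs then drives a loop (or lapply) that draws only those scatterplots.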

-- 
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2: using coord_trans for logit - probability

2014-04-16 Thread Tim Marcella
I think all you have to do is add type="response" to your call for the
predictions.

Does this work for you?

# get fitted values on the logit scale
pred <- data.frame(Arthritis,
                   predict(arth.logistic, se.fit=TRUE, type="response"))

library(ggplot2)
library(scales)
# plot on logit scale
gg <- ggplot(pred, aes(x=Age, y=fit)) +
  geom_line(size = 2) + theme_bw() +
  geom_ribbon(aes(ymin = fit - 1.96 * se.fit,
                  ymax = fit + 1.96 * se.fit), alpha = 0.2,
              color = "transparent") +
  labs(x = "Age", y = "Log odds (Better)")
gg

-Tim


On Wed, Apr 16, 2014 at 7:03 PM, Michael Friendly frien...@yorku.ca wrote:

 I'm trying to see if  how I can use coord_trans() with ggplot2 to
 transform the
 Y axis of a plot on the logit scale to the probability scale, as opposed
 to  recalculating
 everything manually and constructing a new plot.
 Here is a simple example of the 'base' plot I'd like to transform:

 data(Arthritis, package="vcdExtra")
 Arthritis$Better <- as.numeric(Arthritis$Improved > "None")
 arth.logistic <- glm(Better ~ Age, data=Arthritis, family=binomial)

 # get fitted values on the logit scale
 pred <- data.frame(Arthritis,
                    predict(arth.logistic, se.fit=TRUE))
 library(ggplot2)
 library(scales)
 # plot on logit scale
 gg <- ggplot(pred, aes(x=Age, y=fit)) +
   geom_line(size = 2) + theme_bw() +
   geom_ribbon(aes(ymin = fit - 1.96 * se.fit,
                   ymax = fit + 1.96 * se.fit), alpha = 0.2,
               color = "transparent") +
   labs(x = "Age", y = "Log odds (Better)")
 gg

 Things I've tried that don't work:

 > gg + coord_trans(ytrans="logis")
 Error in get(as.character(FUN), mode = "function", envir = envir) :
   object 'logis_trans' of mode 'function' was not found

 > gg + coord_trans(ytrans=probability_trans("logis"))
 Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed
 In addition: Warning message:
 In qfun(x, ...) : NaNs produced
 

 Doing what I want manually:

 # doing it manually
 pred2 <- within(pred, {
   prob  <- plogis(fit)
   lower <- plogis(fit - 1.96 * se.fit)
   upper <- plogis(fit + 1.96 * se.fit)
 })

 gg2 <- ggplot(pred2, aes(x=Age, y=prob)) +
   geom_line(size = 2) + theme_bw() +
   geom_ribbon(aes(ymin = lower,
                   ymax = upper), alpha = 0.2, color = "transparent") +
   labs(x = "Age", y = "Probability (Better)")
 gg2



 --
 Michael Friendly Email: friendly AT yorku DOT ca
 Professor, Psychology Dept.  Chair, Quantitative Methods
 York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
 4700 Keele StreetWeb:   http://www.datavis.ca
 Toronto, ONT  M3J 1P3 CANADA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/
 posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Tim Marcella
508.498.2989
timmarce...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2: using coord_trans for logit - probability

2014-04-16 Thread Tim Marcella
Sorry I jumped the gun. That does not provide you with the same plot as gg2
that you are aiming for.

-T


On Wed, Apr 16, 2014 at 7:37 PM, Tim Marcella timmarce...@gmail.com wrote:

 I think all you have to do is add type=response to your call for the
 predictions.

 Does this work for you

  # get fitted values on the logit scale
  pred <- data.frame(Arthritis,
                     predict(arth.logistic, se.fit=TRUE, type="response"))

  library(ggplot2)
  library(scales)
  # plot on logit scale
  gg <- ggplot(pred, aes(x=Age, y=fit)) +
    geom_line(size = 2) + theme_bw() +
    geom_ribbon(aes(ymin = fit - 1.96 * se.fit,
                    ymax = fit + 1.96 * se.fit), alpha = 0.2,
                color = "transparent") +
    labs(x = "Age", y = "Log odds (Better)")
  gg

 -Tim


 On Wed, Apr 16, 2014 at 7:03 PM, Michael Friendly frien...@yorku.cawrote:

 I'm trying to see if  how I can use coord_trans() with ggplot2 to
 transform the
 Y axis of a plot on the logit scale to the probability scale, as opposed
 to  recalculating
 everything manually and constructing a new plot.
 Here is a simple example of the 'base' plot I'd like to transform:

  data(Arthritis, package="vcdExtra")
  Arthritis$Better <- as.numeric(Arthritis$Improved > "None")
  arth.logistic <- glm(Better ~ Age, data=Arthritis, family=binomial)

  # get fitted values on the logit scale
  pred <- data.frame(Arthritis,
                     predict(arth.logistic, se.fit=TRUE))
  library(ggplot2)
  library(scales)
  # plot on logit scale
  gg <- ggplot(pred, aes(x=Age, y=fit)) +
    geom_line(size = 2) + theme_bw() +
    geom_ribbon(aes(ymin = fit - 1.96 * se.fit,
                    ymax = fit + 1.96 * se.fit), alpha = 0.2,
                color = "transparent") +
    labs(x = "Age", y = "Log odds (Better)")
  gg

  Things I've tried that don't work:

  > gg + coord_trans(ytrans="logis")
  Error in get(as.character(FUN), mode = "function", envir = envir) :
    object 'logis_trans' of mode 'function' was not found

  > gg + coord_trans(ytrans=probability_trans("logis"))
  Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed
  In addition: Warning message:
  In qfun(x, ...) : NaNs produced

  Doing what I want manually:

  # doing it manually
  pred2 <- within(pred, {
    prob  <- plogis(fit)
    lower <- plogis(fit - 1.96 * se.fit)
    upper <- plogis(fit + 1.96 * se.fit)
  })

  gg2 <- ggplot(pred2, aes(x=Age, y=prob)) +
    geom_line(size = 2) + theme_bw() +
    geom_ribbon(aes(ymin = lower,
                    ymax = upper), alpha = 0.2, color = "transparent") +
    labs(x = "Age", y = "Probability (Better)")
  gg2



 --
 Michael Friendly Email: friendly AT yorku DOT ca
 Professor, Psychology Dept.  Chair, Quantitative Methods
 York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
 4700 Keele StreetWeb:   http://www.datavis.ca
 Toronto, ONT  M3J 1P3 CANADA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/
 posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Tim Marcella
 508.498.2989
 timmarce...@gmail.com




-- 
Tim Marcella
508.498.2989
timmarce...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plot mlogit object

2014-04-14 Thread Tim Marcella
Hi,

I cannot figure out how or if I even can plot the results from a nested
multinomial logit model. I am using the mlogit package.

Does anyone have a lead on any tutorials? Both of the vignettes are lacking
plotting instructions.

Thanks, Tim

-- 
Tim Marcella

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Time interactions for coxph object

2014-04-07 Thread Tim Marcella
Hi All,

I am working in the 'survival' library and need to know how to code a
variable-time interaction for left-truncated data (not all subjects
followed from the start of the study). I therefore add the entry time and
exit time variables into the formula.

Here is my basic formula

CSHR.shore.fly <- coxph(Surv(entry, exit, to == 1) ~ shore.cat,
                        data = glba.mod)

My variable shore.cat is violating the proportional hazards assumption so I
am trying to add in an interaction with time. Do I interact exit? entry? or
the range of the two?

Thanks, Tim

-- 
Tim Marcella
508.498.2989
timmarce...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Nested Logit Model

2014-04-03 Thread Tim Marcella
Hi All,

I am using the package 'mlogit'

I am having trouble constructing a nested logit model. I am headed down
this path as I am violating the IIA assumption when running a multinomial
logistic regression.

I am analyzing the choice a seabird floating on the surface of the water
makes when approached by a ship. I have three choices defined for the
dependent variable 'act2'; fly, dive, or remain on the water (none).
Independent variables I hypothesize might affect the decision are
perpendicular distance to the ship's path, group size, location, distance
to shore, speed of the ship, number of ships in the area and wind speed.
All the variables are individual specific.

A non-nested mutinomial logit model will run fine using the following code
(data converted to long format already).

  mod.1 <- mlogit(act2 ~ 0 | dist + as.factor(grp.bin) + as.factor(ship) +
as.factor(sea) + avgknots + shore + as.factor(location), long_perp,
reflevel = 'none')

After assessing that I am violating the IIA assumption I attempted to run a
nested logit model with the following code.

 nested.1 <- mlogit(act2 ~ 0 | dist + as.factor(grp.bin) +
as.factor(ship) +
as.factor(sea) + avgknots + shore + as.factor(location), long_perp,
reflevel = 'none',
nests = list(react = c("dive", "fly"), remain = "none"), unscaled = TRUE)

I get the following error code.

Error in solve.default(crossprod(attr(x, gradi)[, !fixed])) :
  Lapack routine dgesv: system is exactly singular: U[6,6] = 0

Does this indicate that my variables are collinear? Overfitting? I have
tried removing all but one variable and rerunning, and I still get an error.

Can anyone help me get over this hump or help suggest a different modeling
approach?

I am mainly interested in the probability of choosing to react (fly or
dive) or not, and then once a reaction has been made, which one is chosen
and how these decisions relate to perpendicular distance to the ship's path.

Thanks, Tim
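For the archives: no resolution is preserved in this thread, but a first diagnostic for an "exactly singular" error is to check whether the design matrix is rank deficient (collinear columns). A sketch on toy data — the thread's `long_perp` data is not available here, and the column names below are invented:

```r
# Toy data with a perfectly collinear pair of predictors:
df <- data.frame(dist  = 1:6,
                 grp   = factor(c("a", "a", "b", "b", "c", "c")),
                 shore = 2 * (1:6))   # exact linear function of dist

X <- model.matrix(~ dist + grp + shore, data = df)
qr(X)$rank < ncol(X)   # TRUE: the design matrix is rank deficient

# alias() on a fitted lm() names which terms are linearly dependent:
alias(lm(rnorm(6) ~ dist + grp + shore, data = df))$Complete
```

If the rank check fails on the real data, dropping or combining the aliased terms is the usual first step before re-fitting the nested model.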

-- 
Tim Marcella
508.498.2989
timmarce...@gmail.com




[R] multicore - handling a list of return values

2014-03-18 Thread Tim Smith
Hi,

I was trying to gather/combine the results returned from the mclapply function. 
This is how I do it if only one data object is returned:

#= This works fine ===

library(multicore)

frandom1 <- function(iter, var1 = 3, var2 = 2){ 
    mat <- matrix(rnorm(var1*var2), nrow = var1, ncol = var2)
    return(mat)    
}
N <- 3

### OK
temp1 <- mclapply(1:N, frandom1, mc.cores = 2)
result1 <- do.call(rbind, temp1)
print(result1)

#

Now, I want to return more than one object from my function and I am not sure 
how best to combine the values returned. My current code is:

#=== ??? =
frandom2 <- function(iter, var1 = 3, var2 = 2){ 
    mat <- matrix(rnorm(var1*var2), nrow = var1, ncol = var2)
    vect1 <- sample(1:1000, var1)
    return(list(mat, vect1))    
}

temp2 <- mclapply(1:N, frandom2, mc.cores = 2)


#### Combining returned values
result2 <- result3 <- c()
for(k in 1:N){
    thismat <- temp2[[k]][[1]]
    result2 <- rbind(result2, thismat)
    
    thisvect <- temp2[[k]][[2]]
    result3 <- c(result3, thisvect)
    
}

#==


Is there a more elegant way of combining these values (as in the top example)? 
Although this works, for a large value of N (N > 500,000) running a loop 
would be very expensive. 


Any help would be appreciated!

thanks!
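For the archives, one common pattern (a sketch, not a reply preserved in this thread) is to extract each component across the whole list with lapply() and combine once, avoiding the growing-object loop entirely. Note that in current R, mclapply() lives in the base `parallel` package, which absorbed `multicore`:

```r
library(parallel)  # successor to the multicore package

frandom2 <- function(iter, var1 = 3, var2 = 2) {
  list(mat  = matrix(rnorm(var1 * var2), nrow = var1, ncol = var2),
       vect = sample(1:1000, var1))
}

N <- 3
temp2 <- mclapply(1:N, frandom2, mc.cores = 1)  # mc.cores = 1 for portability

# Pull the same component out of every list element, then combine once:
result2 <- do.call(rbind, lapply(temp2, `[[`, "mat"))   # one rbind call
result3 <- unlist(lapply(temp2, `[[`, "vect"))          # one unlist call
```

For very large N the cost is then a single rbind/unlist rather than N incremental copies of the accumulating objects.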




[R] Coding for segmented regression within a hurdle model

2014-03-15 Thread Tim Marcella
Hi,

I am using a two-part hurdle model to account for zero inflation and
overdispersion in my count data. I would like to account for a segmented or
breakpoint relationship in the binomial logistic hurdle model and pass
these results on to the count model (negative binomial).

Using the segmented package I have determined that my data support one
breakpoint at 3.85. The slope up to this point is significant and will affect
the presence of zeros in a linear fashion. The slope > 3.85 is
non-significant and estimated not to help predict the presence of zeros in
the data (threshold effect). Here are the results from this model:

Estimated Break-Point(s):
   Est. St.Err
 3.853  1.372

t value for the gap-variable(s) V:  0

Meaningful coefficients of the linear terms:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     -0.2750     0.3556  -0.774   0.4392
approach_km     -0.4520     0.2184  -2.069   0.0385 *
sea2             0.3627     0.2280   1.591   0.1117
U1.approach_km   0.4543     0.2188   2.076       NA

U1.approach_km is the estimate for the second slope. The actual estimated
slope for the second section is the difference between this value and the
approach_km value (0.0023).

I think that I have found a way to manually code this into the hurdle
model as follows:

hurdle.fit <- hurdle(tot_f ~ x1 + x2 + x3 | approach_km +
I(pmax(approach_km - 3.849, 0)) + sea)

When I look at the estimated coefficients from the manual code it gives
the same values. However, the std.errors are estimated lower.

Zero hurdle model coefficients (binomial with logit link):
                                Estimate Std. Error z value Pr(>|z|)
(Intercept)                     -0.27441    0.29347  -0.935    0.350
approach_km                     -0.45261    0.09993  -4.529 5.92e-06 ***
I(pmax(approach_km - 3.849, 0))  0.45486    0.10723   4.242 2.22e-05 ***
sea2                             0.36271    0.22803   1.591    0.112

Question #1: Does the hurdle equation use the standard errors from the
zero model when building the count predictions? If not, then I guess I would
not have to worry about this and can just report the original std. errors
and associated p-values from the segmented object in the publication.
Question #2: If the count model uses the std. errors, how can I reformulate
this equation to generate the original std. errors?

Thanks, Tim
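For the archives, the pmax() coding above can be sanity-checked on simulated data (a sketch with invented numbers that mirror the slopes quoted above): the pmax(x - bp, 0) term estimates the *change* in slope after the breakpoint, which is what segmented reports as U1.approach_km.

```r
set.seed(42)
x <- runif(300, 0, 8)
# True broken stick: slope -0.45 up to 3.85, flat afterwards
y <- -0.45 * pmin(x, 3.85) + rnorm(300, sd = 0.1)

fit <- lm(y ~ x + pmax(x - 3.85, 0))
coef(fit)
# "x"                 estimates the first slope (about -0.45)
# "pmax(x - 3.85, 0)" estimates the change in slope (about +0.45),
# so the second-segment slope is the sum of the two coefficients.
```

This confirms the coefficients have the same interpretation as in segmented's output; it says nothing about the standard-error question, which depends on how the breakpoint's own uncertainty is propagated.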




[R] Negative binomial models and censored observations

2014-03-13 Thread Tim Marcella
Hi,

I am working with hurdle models in the pscl package to model zero-inflated,
overdispersed count data and want to incorporate censored observations into
the equation. 33% of the observed positive count data are right censored,
i.e., subjects lost to follow-up during the study. Can this
be accounted for in the hurdle() function?

Thanks, Tim




[R] Calculating RMSE in R from hurdle regression object

2014-03-12 Thread Tim Marcella
Hi,

My data is characterized by many zeros (82%) and overdispersion. I have
chosen to model with hurdle regression (pscl package) with a negative
binomial distribution for the count data. In an effort to validate the
model I would like to calculate the RMSE of the predicted vs. the observed
values. From my reading I understand that this is calculated on the raw
residuals generated from the model output. This is the formula I used:

H1.RMSE <- sqrt(mean(H1$residuals^2)) # Where H1 is my fitted hurdle
model

I get 46.7 as the RMSE. This seems high to me based on the model results.
Assuming my formula and my understanding of RMSE is correct (and please
correct me if I am wrong) I question whether this is an appropriate use of
validation for this particular structure of model. The hurdle model
correctly predicts all of my zeros. The predictions I get from the fitted
model are all values greater than zero. From my readings I understand that
the predictions from the fitted hurdle model are means generated for the
particular covariate environment based on the model coefficients. If this
is truly the case it does not make sense to compare these means to the
observations. This will generate large residuals (only 18% of the
observations contain counts greater than 0, while the predicted counts all
exceed 0). It seems like comparing apples to oranges. Other correlative
tests (Pearson's r, Spearman's p) would seem to be comparing the mean
predicted value for particular covariate to the observed which again is
heavily dominated by zeros.

Any tips on how best to validate hurdle models in R?

Thanks




[R] ifelse...

2014-01-16 Thread Tim Smith
Hi,

Sorry for the newbie question! My code:

x <- 'a'
ifelse(x == 'a', y <- 1, y <- 2)
print(y)

Shouldn't this assign a value of 1? When I execute this I get:

> x <- 'a'
> ifelse(x == 'a', y <- 1, y <- 2)
[1] 1
> print(y)
[1] 2


Am I doing something really daft???

thanks!
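For the archives (a sketch of the usual answer, not a reply preserved in this thread): ifelse() is a vectorized selector that *returns* a value; performing assignments inside its yes/no arguments relies on side effects, which is why the printed result and the stored y can disagree in the transcript above. Assign the returned value instead:

```r
x <- 'a'

# Assign the value that ifelse() returns, instead of assigning inside
# the yes/no arguments:
y <- ifelse(x == 'a', 1, 2)
y   # 1

# For a single (length-one) condition, if/else is the idiomatic form:
y2 <- if (x == 'a') 1 else 2
y2  # 1
```

ifelse() earns its keep only when the test is a vector, e.g. `ifelse(v == 'a', 1, 2)` over a whole column.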



> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   parallel  stats graphics  grDevices utils datasets  
methods   base 

other attached packages:
 [1] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6    
ggplot2_0.9.3.1   sda_1.3.2 fdrtool_1.2.11    corpcor_1.6.6 
   
 [8] entropy_1.2.0 scatterplot3d_0.3-34  pdist_1.2 
hash_2.2.6    DAAG_1.18 multicore_0.1-7   
multtest_2.18.0  
[15] XML_3.95-0.2  hgu133a.db_2.10.1 affy_1.40.0   
genefilter_1.44.0 GOstats_2.28.0    graph_1.40.1  
Category_2.28.0  
[22] GO.db_2.10.1  venneuler_1.1-0   rJava_0.9-6   
colorRamps_2.3    RColorBrewer_1.0-5    sparcl_1.0.3  gap_1.1-10
   
[29] plotrix_3.5-2 som_0.3-5 pvclust_1.2-2 
lsr_0.3.1 compute.es_0.2-2  sm_2.2-5.3    
imputation_2.0.1 
[36] locfit_1.5-9.1    TimeProjection_0.2.0  Matrix_1.1-1.1    
timeDate_3010.98  lubridate_1.3.3   gbm_2.1   
lattice_0.20-24  
[43] survival_2.37-4   RobustRankAggreg_1.1  impute_1.36.0 
reshape_0.8.4 plyr_1.8  zoo_1.7-10    
data.table_1.8.10    
[50] foreach_1.4.1 foreign_0.8-57    languageR_1.4.1   
preprocessCore_1.24.0 gtools_3.1.1  BiocInstaller_1.12.0  
org.Hs.eg.db_2.10.1  
[57] RSQLite_0.11.4    DBI_0.2-7 AnnotationDbi_1.24.0  
Biobase_2.22.0    BiocGenerics_0.8.0    biomaRt_2.18.0   

loaded via a namespace (and not attached):
 [1] affyio_1.30.0 annotate_1.40.0   AnnotationForge_1.4.4 
codetools_0.2-8   colorspace_1.2-4  dichromat_2.0-0   digest_0.6.4  
   
 [8] grid_3.0.2    GSEABase_1.24.0   gtable_0.1.2  
iterators_1.0.6   labeling_0.2  latticeExtra_0.6-26   MASS_7.3-29   
   
[15] munsell_0.4.2 proto_0.3-10  RBGL_1.38.0   
RCurl_1.95-4.1    reshape2_1.2.2    scales_0.2.3  stats4_3.0.2  
   
[22] stringr_0.6.2 tools_3.0.2   xtable_1.7-1  
zlibbioc_1.8.0




[R] set new path for package installation, because In normalizePath(path.expand(path), winslash, mustWork) :

2014-01-15 Thread Tim Richter-Heitmann

Hi there,

I am by no means an expert user, so please bear with me; I am basically 
looking for a step-by-step tutorial for changing paths in R. This is my 
current problem:
We are running R 3.0.2 with R-Studio on a Win7 environment. In our case, 
the default working directory Users/Documents/R on the local drive is 
not available. Instead, the path is redirected to a different network 
drive (as all our profile data is stored on that H: drive). This used to 
work fine at some point, but the other day, some major changes were made 
to our server (which i have nothing to do with), and suddenly, a few 
users having this problem:


Whenever I start R, I am welcomed with:
Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
path[1]=//x/home$/tim/Documents/R/win-library/3.0: Access is denied

And I am not allowed to install packages or manipulate objects in my 
workspace. However, whenever I run R with admin rights, I may 
install packages, but they will be stored in a directory in 
profiles/admin/appdata/local.


Curiously, when I run .libPaths(), I get:

[1] x/home$/tim/Documents/R/win-library/3.0
[2] C:/Program Files/R/R-3.0.2/library


I can then use the packages with the normal user account, though. Still, 
I am not able to write into my workspace with normal user rights, and 
for us it is impossible to grant every user admin rights to run 
R-Studio.
From what I have found on Google, there are workarounds for R, like 
starting R from batch files, but I do not know whether those are fine when I 
want to start R from a GUI like R-Studio, too. I have also found advice 
to change the path to the working directory and library location in the 
Rprofile. I do find an rprofile.site in ../etc, but there is no path 
given to the Rprofile, which I think is needed. There was another tip to 
manipulate R_LIBS_USER. But, to be honest, there is a limit to my technical 
skill.
Do you have any idea how to fix this issue? Or is anyone willing to 
give me a little guidance?


Thank you very much!
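For the archives, no resolution is preserved in this thread. A common workaround (a sketch; the folder names are hypothetical) is to point the user library at a local folder the locked-down account can write to, and make that permanent through R_LIBS_USER:

```r
# A writable folder for packages; tempdir() is used here only so the
# example is self-contained. On the poster's machine this would be a
# local path such as "C:/Rlibs".
local_lib <- file.path(tempdir(), "Rlibs")
dir.create(local_lib, showWarnings = FALSE)

# Put it first on the library search path for this session:
.libPaths(c(local_lib, .libPaths()))

# To make this permanent without admin rights, add one line to a file
# named ".Renviron" in the home directory, e.g.:
#   R_LIBS_USER=C:/Rlibs
```

install.packages() then defaults to the first writable element of .libPaths(), so neither admin rights nor the redirected network drive is needed.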


--
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Append text to panels of lattice::barchart()

2013-11-26 Thread Tim Sippel
I could use a little help writing a panel function to append text to each
panel of a lattice::barchart().  Below is a modified version of the barley
dataset to illustrate.

data(barley)
# add a new variable called samp.size
barley$samp.size <- round(runif(n=nrow(barley), min=0, max=50), 0)
# Below is a barchart plot for this example
barchart(yield ~ variety | site, data = barley,
 groups = year, layout = c(1,6), stack = TRUE,
 auto.key = list(space = "right"),
 ylab = "Barley Yield (bushels/acre)",
 scales = list(x = list(rot = 45)))
# I need to add to each panel the sum of samp.size by the groups variable
# (year) for that panel.
# So the text appended to the top panel (Waseca) might say "1931=15,
# 1932=20", and same for each subsequent panel.




Re: [R] Append text to panels of lattice::barchart()

2013-11-26 Thread Tim Sippel
Thanks Duncan. Although that helps me learn something new about lattice
plots (I had never played with the as.table argument before), I need
something a bit more elegant.  I need to generate about 10 plots, each with
up to 20 panels. So the panel-by-panel approach is a bit cludgier than I am
looking for.  I'll look into trellis.focus to see if the light bulb comes
on in my head.  I'd still appreciate a more efficient way to do this given
the number of plots and panels I need.
Thanks,
Tim
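For the archives, one scalable sketch (an assumption, not code from the thread): pre-compute one label per level of the conditioning variable and index it with packet.number() inside the panel function, so the same panel function serves any number of panels and plots:

```r
library(lattice)
library(grid)

data(barley)
set.seed(1)
barley$samp.size <- round(runif(nrow(barley), min = 0, max = 50))

# One label per site: the sum of samp.size by year, e.g. "1932=123, 1931=98"
sums <- with(barley, tapply(samp.size, list(site, year), sum))
labs <- apply(sums, 1, function(r)
  paste(paste0(colnames(sums), "=", r), collapse = ", "))

p <- barchart(yield ~ variety | site, data = barley,
              groups = year, layout = c(1, 6), stack = TRUE,
              auto.key = list(space = "right"),
              ylab = "Barley Yield (bushels/acre)",
              scales = list(x = list(rot = 45)),
              panel = function(x, y, ...) {
                panel.barchart(x, y, ...)
                # packet.number() indexes levels(barley$site), the same
                # order as the rows of sums/labs, regardless of as.table
                grid.text(labs[packet.number()], x = 0.5, y = 0.88,
                          gp = gpar(cex = 0.8))
              })
print(p)  # drawing happens here
```

Because the labels are looked up rather than hard-coded, the same function can be reused across all ten plots and any number of panels.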


On Tue, Nov 26, 2013 at 2:08 PM, Duncan Mackay dulca...@bigpond.com wrote:

 Hi

 Try

 barchart(yield ~ variety | site, data = barley,
   groups = year, layout = c(1,6), stack = TRUE,
   auto.key = list(space = "right"),
   ylab = "Barley Yield (bushels/acre)",
   scales = list(x = list(rot = 45)),
   panel = function(x, y, ...){

 pnl = panel.number()
 panel.barchart(x, y, ...)

 if (pnl == 6) grid.text("1931=15, 1932=20", 0.5, 0.88,
 gp = gpar(cex = 0.8))

   }
 )

 if you had as.table = TRUE as an argument pnl==1 would apply to the top
 panel

 you then have to repeat the line and amend the pnl == 6.
 On a panel basis is easier than on a plot basis as the xy coordinates are
 easily managed.

 Another way is trellis.focus but possibly a little fiddlier

 Duncan

 Duncan Mackay
 Department of Agronomy and Soil Science
 University of New England
 Armidale NSW 2351
 Email: home: mac...@northnet.com.au

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On
 Behalf Of Tim Sippel
 Sent: Wednesday, 27 November 2013 07:08
 To: r-help@r-project.org
 Subject: [R] Append text to panels of lattice::barchart()







[R] spsurvey analysis

2013-11-01 Thread Tim Howard
All,
I've used the excellent package, spsurvey, to create spatially balanced samples 
many times in the past. I'm now attempting to use the analysis portion of the 
package, which compares CDFs among sub-populations to test for differences in 
sub-population metrics. 
 
- My data (count data) have many zeros, following a negative binomial or even 
zero-inflated negative binomial distribution.
- Samples are within polygons of varying sizes
- I want to test whether a sample at time 1 is different from a sample at time 
2. Essentially the same sample areas and number of samples.

The problem:
- cont.cdftest throws a warning and does not complete for most (but not all) 
species sampled. Warning message: "The combined number of values in at least 
one class is less than five." Action: "The user should consider using a smaller 
number of classes."

- There are plenty of samples in my two time periods (the dummy set below: 
Yr1=27, Yr2=31 non-zero values). 
 
My Question:
Why is it throwing this error and is there a way to get around it?



Reproducible example (change the path to the spsurvey sample data); it requires 
us to use spsurvey to generate sample points:

### R code tweaked from vignettes 'Area_Design' and 'Area_Analysis'
library(spsurvey)
### Analysis set up
setwd("C:/Program Files/R/R-3.0.2/library/spsurvey/doc")
att <- read.dbf("UT_ecoregions")
shp <- read.shape("UT_ecoregions")

set.seed(4447864)

# Create the design list
Stratdsgn <- list("Central Basin and Range"=list(panel=c(PanelOne=25), 
seltype="Equal"),
  "Colorado Plateaus"=list(panel=c(PanelOne=25), 
seltype="Equal"),
  "Mojave Basin and Range"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Northern Basin and Range"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Southern Rockies"=list(panel=c(PanelOne=14), 
seltype="Equal"),
  "Wasatch and Uinta Mountains"=list(panel=c(PanelOne=10), 
seltype="Equal"),
  "Wyoming Basin"=list(panel=c(PanelOne=6), seltype="Equal"))

# Select the sample design for each year
Stratsites_Yr1 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
   type.frame="area", src.frame="sp.object",
   sp.object=shp, att.frame=att, stratum="Level3_Nam", 
shapefile=FALSE)

Stratsites_Yr2 <- grts(design=Stratdsgn, DesignID="STRATIFIED",
   type.frame="area", src.frame="sp.object",
   sp.object=shp, att.frame=att, stratum="Level3_Nam", 
shapefile=FALSE)

# extract the core information, add year as a grouping variable, add a plot ID
# to link with dummy data
Yr1 <- cbind(pltID = 1001:1100, Stratsites_Yr1@data[,c(1,2,3,5)], grp = "Yr1")
Yr2 <- cbind(pltID = 2001:2100, Stratsites_Yr2@data[,c(1,2,3,5)], grp = "Yr2")
sitedat <- rbind(Yr1, Yr2)

# create dummy sampling data. Lots of zeros!
bn.a <- rnbinom(size = 0.06, mu = 19.87, n=100)
bn.b <- rnbinom(size = 0.06, mu = 20.15, n=100)
dat.a <- data.frame(pltID = 1001:1100, grp = "Yr1", count = bn.a)
dat.b <- data.frame(pltID = 2001:2100, grp = "Yr2", count = bn.b)
dat <- rbind(dat.a, dat.b)


## Analysis begins here

data.cont <- data.frame(siteID = dat$pltID, Density = dat$count)
sites <- data.frame(siteID = dat$pltID, Use = rep(TRUE, nrow(dat)))
subpop <- data.frame(siteID = dat$pltID, 
All_years = rep("allYears", nrow(dat)),
Year = dat$grp)
design <- data.frame(siteID = sitedat$pltID,
wgt = sitedat$wgt,
xcoord = sitedat$xcoord,
ycoord = sitedat$ycoord)
framesize <- c(Yr1 = 888081202000, Yr2 = 888081202000)

## There seem to be pretty good estimates
CDF_Estimates <- cont.analysis(sites, subpop, design, data.cont, 
popsize = list(All_years = sum(framesize),
Year = as.list(framesize)))

print(CDF_Estimates$Pct)

## this test fails
CDF_Tests <- cont.cdftest(sites, subpop[,c(1,3)], design, data.cont,
   popsize = list(Year = as.list(framesize)))
warnprnt()

## how many records have values greater than zero, by year? Probably
## irrelevant!
notZero <- dat[dat$count > 0, ]
table(notZero$grp)

### end

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
 
locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  
[5] LC_TIME=English_United States.1252

Thanks in advance.
Tim



Re: [R] spsurvey analysis

2013-11-01 Thread Tim Howard
Jason,
Thank you for your reply. Interesting ... so you think the 'classes' in the 
error message ("The combined number of values in at least one class...") refer 
to the CDF bins rather than the sub-population classes that I 
defined. 
 
That makes sense as I only defined two classes (!). I was worried it was 
detecting and treating polygons as classes, somehow (ecoregions in example 
below).
 
I had already reached out to Kincaid and Olsen but had not received an answer 
yet so I moved on to R-help.   I'll go back to them. 
 
Thanks again.
Best, 
Tim

 Law, Jason jason@portlandoregon.gov 11/1/2013 2:47 PM 
I use the spsurvey package a decent amount.  The cont.cdftest function bins the 
cdf in order to perform the test which I think is the root of the problem.  
Unfortunately, the default is 3 which is the minimum number of bins.

I would contact Tom Kincaid or Tony Olsen at NHEERL WED directly to ask about 
this problem.

Another option would be to take a different analytical approach (e.g., a mixed 
effects model) which would allow you a lot more flexibility.

Jason Law
Statistician
City of Portland
Bureau of Environmental Services
Water Pollution Control Laboratory
6543 N Burlington Avenue
Portland, OR 97203-5452
503-823-1038
jason@portlandoregon.gov


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Tim Howard
Sent: Friday, November 01, 2013 7:49 AM
To: r-help@r-project.org
Subject: [R] spsurvey analysis

All,
I've used the excellent package, spsurvey, to create spatially balanced samples 
many times in the past. I'm now attempting to use the analysis portion of the 
package, which compares CDFs among sub-populations to test for differences in 
sub-population metrics. 

- My data (count data) have many zeros, following a negative binomial or even 
zero-inflated negative binomial distribution.
- Samples are within polygons of varying sizes
- I want to test whether a sample at time 1 is different from a sample at time 
2. Essentially the same sample areas and number of samples.

The problem:
- cont.cdftest  throws a warning and does not complete for most (but not all) 
species sampled. Warning message: The combined number of values in at least 
one class is less than five. Action: The user should consider using a smaller 
number of classes.

- There are plenty of samples in my two time periods (the dummy set below: 
Yr1=27, Yr2=31 non-zero values). 

My Question:
Why is it throwing this error and is there a way to get around it?




[R] Selecting maximums between different variables

2013-10-17 Thread Tim Umbach
Hi there,

another beginner's question, I'm afraid. Basically I want to select the
maximum of values that correspond to different variables. I have a table
of oil production that looks somewhat like this:

oil <- data.frame(YEAR = c(2011, 2012),
                  TX = c(2, 3),
                  CA = c(4, 25000),
                  AL = c(2, 21000),
                  ND = c(21000, 6))

Now I want to find out which state produced the most oil in a given year. I
tried this:

attach(oil)
last_year <- oil[c(YEAR == 2012), ]
max(last_year)

This works, but it doesn't give me the corresponding variable (i.e. it just
gives me the maximum output, not which state it is from).
So I tried this:

oil[c(oil == max(last_year)),]
and this:
oil[c(last_year == max(last_year)),]
and this:
oil[which.max(last_year),]
and this:
last_year[max(last_year),]

None of them work, but they don't give error messages either; the output is
just NA. The problem, in my eyes, is that I'm comparing the values of
different variables with each other. Because if I change the structure of
the data frame (which I can't do with the real data, at least not without
doing it by hand on a huge dataset), it looks like this and works
perfectly:

oil2 <- data.frame(
  names = c('YEAR', 'TX', 'CA', 'AL', 'ND'),
  oil_2011 = c(2011, 2, 4, 2, 21000),
  oil_2012 = c(2012, 3, 25000, 21000, 6)
  )
attach(oil2)
oil2[c(oil_2012 == max(oil_2012)),]

Any help is much appreciated.

Thanks, Tim Umbach
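For the archives, a sketch of one working approach using the data exactly as posted (the production figures look truncated in the archive, but the mechanics are the same): which.max() returns the *position* of the maximum, which can index the names as well as the values.

```r
oil <- data.frame(YEAR = c(2011, 2012),
                  TX = c(2, 3), CA = c(4, 25000),
                  AL = c(2, 21000), ND = c(21000, 6))

last_year <- oil[oil$YEAR == 2012, ]
# Drop the YEAR column and flatten the one-row data frame to a named vector:
states <- unlist(last_year[, names(last_year) != "YEAR"])

names(states)[which.max(states)]  # the top-producing state ("CA" here)
max(states)                       # its production (25000 here)
```

This also avoids attach(); indexing with oil$YEAR keeps the comparison unambiguous.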




Re: [R] update.packages fails on pdf write - CSAgent

2013-10-03 Thread Tim Howard
For the archives, the solution I've come up with is to complete the install in 
a location where write permissions are allowed for exe files and then simply 
copy the extracted files from this folder to R's library folder, using the 
components of update.packages as follows:


# get a list of old packages
x <- old.packages()

# download the zips
download.packages(pkgs = x, destdir = 
"C:/pathToLocationWhereWritesAreAllowed")

# install them in the safe folder
setwd("C:/pathToLocationWhereWritesAreAllowed")
y <- list.files(pattern = ".zip")
install.packages(pkgs = y, repos = NULL, lib = 
"C:/pathToLocationWhereWritesAreAllowed")

#now manually copy the folders to the R installation and overwrite the existing 
ones. 
 
Tim


Date: Wed, 02 Oct 2013 11:17:29 -0400
From: Tim Howard tghow...@gw.dec.state.ny.us
To: r-help@r-project.org
Subject: [R] update.packages fails on pdf write - CSAgent
Message-ID: 524c00c902d500b80...@gwsmtp.dec.state.ny.us
Content-Type: text/plain; charset=UTF-8

All,
I work in a building where our computers are running Windows XP with
Cisco Security Agent (CSA) activated, locking many automated actions,
especially file writes. We (lowly staff) are allowed to install
software, so on upgrading to R 3.0.2, I tried the standard approach for
updating my packages: Copying the non-base folders from my older
installation into the library folder and running

update.packages(checkBuilt=TRUE, ask=FALSE)

R merrily downloads from the mirror and then continues installation --
until it gets to a package that has a pdf to write, as follows:

> update.packages(checkBuilt=TRUE, ask=FALSE)
--- Please select a CRAN mirror for use in this session ---
trying URL
'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/akima_0.5-11.zip'
Content type 'application/zip' length 370759 bytes (362 Kb)
opened URL
downloaded 362 Kb

trying URL
'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/bitops_1.0-6.zip'
Content type 'application/zip' length 35878 bytes (35 Kb)
opened URL
downloaded 35 Kb

trying URL
'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/chron_2.3-44.zip'
Content type 'application/zip' length 105192 bytes (102 Kb)
opened URL
downloaded 102 Kb
trying URL
'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/colorspace_1.2-4.zip'
Content type 'application/zip' length 384585 bytes (375 Kb)
opened URL
downloaded 375 Kb
 .. many more packages ...

package 'akima' successfully unpacked and MD5 sums checked
package 'bitops' successfully unpacked and MD5 sums checked
package 'chron' successfully unpacked and MD5 sums checked
Error in unzip(zipname, exdir = dest) : 
  cannot open file 'C:/Program
Files/R/R-3.0.2/library/file8ec2cd16e8/colorspace/doc/hcl-colors.pdf':
Permission denied
 
Here's the message from the Application log in my Event Viewer:

The process 'C:\Program Files\R\R-3.0.2\bin\i386\Rgui.exe' (as user
myusername) attempted to access 'C:\Program
Files\R\R-3.0.2\library\file8ec2cd16e8\colorspace\doc\hcl-colors.pdf'.
The attempted access was a write (operation = OPEN/CREATE). The
operation was denied.

So it seems to me that CSA is blocking the installation of this package
because of the PDF write attempt. Please correct me if my interpretation
is wrong!

I can re-install these packages manually, but would love to do it
automatically.

My Questions:

It seems that the pdf is 'extra' since it isn't required for all
packages. Is there a way for me to tell update.packages to not install
these extras? I don't see that as an option at ?update.packages, but I
may be missing it.

Alternatively, is there an easy way to identify and download the zips
needed in a folder of my choice and but then finish the install
manually?

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
 
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base   


loaded via a namespace (and not attached):
[1] tools_3.0.2
 

Thanks for any help

Tim



[R] update.packages fails on pdf write - CSAgent

2013-10-02 Thread Tim Howard
All,
I work in a building where our computers are running Windows XP with
Cisco Security Agent (CSA) activated, locking many automated actions,
especially file writes. We (lowly staff) are allowed to install
software, so on upgrading to R 3.0.2, I tried the standard approach for
updating my packages: Copying the non-base folders from my older
installation into the library folder and running
 
update.packages(checkBuilt=TRUE, ask=FALSE)
 
R merrily downloads from the mirror and then continues installation --
until it gets to a package that has a pdf to write, as follows:
 
> update.packages(checkBuilt=TRUE, ask=FALSE)
--- Please select a CRAN mirror for use in this session ---
trying URL
'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/akima_0.5-11.zip'
Content type 'application/zip' length 370759 bytes (362 Kb)
opened URL
downloaded 362 Kb
 
trying URL
'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/bitops_1.0-6.zip'
Content type 'application/zip' length 35878 bytes (35 Kb)
opened URL
downloaded 35 Kb
 
trying URL
'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/chron_2.3-44.zip'
Content type 'application/zip' length 105192 bytes (102 Kb)
opened URL
downloaded 102 Kb
trying URL
'http://cran.mirrors.hoobly.com/bin/windows/contrib/3.0/colorspace_1.2-4.zip'
Content type 'application/zip' length 384585 bytes (375 Kb)
opened URL
downloaded 375 Kb
 .. many more packages ...
 
package ‘akima’ successfully unpacked and MD5 sums checked
package ‘bitops’ successfully unpacked and MD5 sums checked
package ‘chron’ successfully unpacked and MD5 sums checked
Error in unzip(zipname, exdir = dest) : 
  cannot open file 'C:/Program
Files/R/R-3.0.2/library/file8ec2cd16e8/colorspace/doc/hcl-colors.pdf':
Permission denied
 
Here's the message from the Application log in my Event Viewer:
 
The process 'C:\Program Files\R\R-3.0.2\bin\i386\Rgui.exe' (as user
myusername) attempted to access 'C:\Program
Files\R\R-3.0.2\library\file8ec2cd16e8\colorspace\doc\hcl-colors.pdf'.
The attempted access was a write (operation = OPEN/CREATE). The
operation was denied.
 
So it seems to me that CSA is blocking the installation of this package
because of the PDF write attempt. Please correct me if my interpretation
is wrong!
 
I can re-install these packages manually, but would love to do it
automatically.
 
My Questions:
 
It seems that the pdf is 'extra' since it isn't required for all
packages. Is there a way for me to tell update.packages to not install
these extras? I don't see that as an option at ?update.packages, but I
may be missing it.
 
Alternatively, is there an easy way to identify and download the zips
needed in a folder of my choice and but then finish the install
manually?
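A sketch of that two-step route using base R's own helpers (the destination folder C:/Rzips is a made-up example; repos = NULL tells install.packages to install from local files rather than go online):

```r
# sketch, not tested on a CSA-locked machine: download the Windows binary
# zips to a writable folder, then install from those local files later
pkgs <- old.packages(checkBuilt = TRUE)[, "Package"]   # packages needing update
download.packages(pkgs, destdir = "C:/Rzips", type = "win.binary")

# finish the install manually from the downloaded zips
zips <- list.files("C:/Rzips", pattern = "\\.zip$", full.names = TRUE)
install.packages(zips, repos = NULL, type = "win.binary")
```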
 
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
 
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base   
 

loaded via a namespace (and not attached):
[1] tools_3.0.2
 
 
Thanks for any help
 
 Tim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GGally installation problems

2013-06-12 Thread Tim Smith
Hi,

I was trying to install the GGally package, but was getting errors. Here is 
what I get:

> install.packages("GGally")
Installing package(s) into ‘/Users/ts2w/Library/R/2.15/library’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘GGally’ is not available (for R version 2.15.2)

> install.packages("GGally", repos="http://cran-r.project.org")
Installing package(s) into ‘/Users/ts2w/Library/R/2.15/library’
(as ‘lib’ is unspecified)
Warning in install.packages :
  cannot open: HTTP status was '404 Not Found'
Warning in install.packages :
  cannot open: HTTP status was '404 Not Found'
Warning in install.packages :
  unable to access index for repository http://cran-r.project.org/src/contrib
Warning in install.packages :
  package ‘GGally’ is not available (for R version 2.15.2)
Warning in install.packages :
  cannot open: HTTP status was '404 Not Found'
Warning in install.packages :
  cannot open: HTTP status was '404 Not Found'
Warning in install.packages :
  unable to access index for repository 
http://cran-r.project.org/bin/macosx/leopard/contrib/2.15

thanks.

[[alternative HTML version deleted]]
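For what it's worth, the 404s above are consistent with the mistyped repository host in the call (cran-r.project.org rather than cran.r-project.org). A sketch of the usual call with the hostname corrected:

```r
# assumption: the intended repository was the main CRAN site
install.packages("GGally", repos = "http://cran.r-project.org")
```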

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] #Keeping row names when using as.data.frame.matrix

2013-05-17 Thread Tim
#question I have the following data set:

Date <- c("9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/8/2010")

EstimatedQuantity <- c(3535,2772,3279,3411,3484,3274,3305)

ScowNo <- c("4001","3002","4002","BR 8","4002","BR 8","4001")

dataset <- data.frame(EstimatedQuantity,Date,ScowNo)

#I'm trying to convert the data set into a contingency table and then back
into a regular data frame:


xtabdata <- as.data.frame.matrix(xtabs(EstimatedQuantity~Date+ScowNo,data=dataset),
 row.names=(dataset$Date),optional=F)

#I'm trying to keep the row names (in xtabsdata) as the dates.
#But the row names keep coming up as integers.
#How can I preserve the row names as dates when
#the table is converted back to a data frame?
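One sketch of a fix: xtabs() already uses the Date levels as its row dimnames, and as.data.frame.matrix() keeps them when the row.names argument is left at its default (passing the full-length dataset$Date vector is what breaks the labels, since the table has only one row per unique date):

```r
# sketch: let the xtabs dimnames become the row names
tab <- xtabs(EstimatedQuantity ~ Date + ScowNo, data = dataset)
xtabdata <- as.data.frame.matrix(tab)
rownames(xtabdata)   # the Date values, one row per unique date
```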




--
View this message in context: 
http://r.789695.n4.nabble.com/Keeping-row-names-when-using-as-data-frame-matrix-tp4667344.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.csv quotes within fields

2013-01-28 Thread Tim Howard
Yes!  This does the trick. Thank you!
Tim

 peter dalgaard pda...@gmail.com 1/26/2013 11:49 AM 

On Jan 26, 2013, at 16:32 , Tim Howard wrote:

 Duncan, 
 Good point - I guess I am expecting too much. I'll work on a global replace 
 before import or chopping with strsplit while importing. 
 
 FYI everyone - these folks with the non-standard csv are the US Feds:
 https://oeaaa.faa.gov/oeaaa/external/public/publicAction.jsp?action=showCaseDownloadForm
 
 (see off-airport AEA 2005 for a csv with issues.)

Does this do the trick?

dd <- read.csv(text=gsub("\\\"", "\"\"", fixed=TRUE,
   readLines("~/Downloads/OffAirportAEA2005List.csv")))



 
 Thank you for the help. I really do appreciate the suggested solutions.
 Also, thanks to John Kane for the link to csvEdit. 
 
 Tim
 
 Duncan Murdoch murdoch.dun...@gmail.com 01/25/13 6:37 PM 
 On 13-01-25 4:37 PM, Tim Howard wrote:
 David,
 Thank you again for the reply. I'll try to make readLines() and strsplit() 
 work.  What bugs me is that I think it would import fine if the folks who 
 created the csv had used double quotes "" rather than an escaped quote \" 
 for those pesky internal quotes. Since that's the case, I'd think there 
 would be a solution within read.csv() ... or perhaps scan()?, I just can't 
 figure it out.
 
 What you say doesn't make sense.  Let me paraphrase:
 
 The folks who sent me the data created a weird, non-standard .csv file. 
  You'd think read.csv() could handle it.
 
 The standard way to handle quotes in strings is to double them.  They 
 didn't, so you've got a weird file on your hands.
 
 You can probably fix it by doing a global substitution of all backslash 
 doublequote pairs with doublequote doublequote.  Unless they have 
 some backslash backslash doublequote triples that they want you to 
 interpret as backslash doublequote.  Or some other weirdness.
 
 I'd suggest you try the global subst mentioned above. or ask them for 
 data in a reasonably standardized format.
 
 Duncan Murdoch
 
 best,
 Tim
 
 David Winsemius dwinsem...@comcast.net 1/25/2013 4:16 PM 
 
 On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:
 
 Great point, your fix (quote="") works for the example I gave. 
 Unfortunately, these text strings have commas in them as well(!).  Throw a 
 few commas in any of the text strings and it breaks again.  Sorry about not 
 including those in the example.
 
 So, I need to incorporate commas *and* quotes with the escape character 
 within a single string.
 
 Well you need to have _some_ delimiter. At the moment it sounds as though 
 you might end upusing readLines() and strsplit( . , split=\\'\\,\\s\\).
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com








__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.csv quotes within fields

2013-01-26 Thread Tim Howard
Duncan, 
Good point - I guess I am expecting too much. I'll work on a global replace 
before import or chopping with strsplit while importing. 

FYI everyone - these folks with the non-standard csv are the US Feds:
https://oeaaa.faa.gov/oeaaa/external/public/publicAction.jsp?action=showCaseDownloadForm

(see off-airport AEA 2005 for a csv with issues.)

Thank you for the help. I really do appreciate the suggested solutions.
Also, thanks to John Kane for the link to csvEdit. 

Tim

 Duncan Murdoch murdoch.dun...@gmail.com 01/25/13 6:37 PM 
On 13-01-25 4:37 PM, Tim Howard wrote:
 David,
 Thank you again for the reply. I'll try to make readLines() and strsplit() 
 work.  What bugs me is that I think it would import fine if the folks who 
 created the csv had used double quotes "" rather than an escaped quote \" for 
 those pesky internal quotes. Since that's the case, I'd think there would be 
 a solution within read.csv() ... or perhaps scan()?, I just can't figure it 
 out.

What you say doesn't make sense.  Let me paraphrase:

The folks who sent me the data created a weird, non-standard .csv file. 
  You'd think read.csv() could handle it.

The standard way to handle quotes in strings is to double them.  They 
didn't, so you've got a weird file on your hands.

You can probably fix it by doing a global substitution of all backslash 
doublequote pairs with doublequote doublequote.  Unless they have 
some backslash backslash doublequote triples that they want you to 
interpret as backslash doublequote.  Or some other weirdness.

I'd suggest you try the global subst mentioned above. or ask them for 
data in a reasonably standardized format.

Duncan Murdoch

 best,
 Tim

 David Winsemius dwinsem...@comcast.net 1/25/2013 4:16 PM 

 On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:

 Great point, your fix (quote="") works for the example I gave. 
 Unfortunately, these text strings have commas in them as well(!).  Throw a 
 few commas in any of the text strings and it breaks again.  Sorry about not 
 including those in the example.

 So, I need to incorporate commas *and* quotes with the escape character 
 within a single string.

 Well you need to have _some_ delimiter. At the moment it sounds as though you 
 might end up using readLines() and strsplit( . , split=\\'\\,\\s\\).



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] read.csv quotes within fields

2013-01-25 Thread Tim Howard
All,
 
I have some csv files I am trying to import. I am finding that quotes inside 
strings are escaped in a way R doesn't expect for csv files. The problem only 
seems to rear its ugly head when there are an uneven number of internal quotes. 
I'll try to recreate the problem:
 
# set up a matrix, using escape-quote as the internal double quote mark.
 
x <- data.frame(matrix(data=c("1", "string one", "another string", "2", "quotes 
escaped 10' 20\" 5' 30\" \"test string", "final string", "3", "third row", "last 
\" col"), ncol = 3, byrow=TRUE))
 
> write.csv(x, "test.csv")
 
# NOTE that write.csv correctly escaped the three internal quotes ' " ' by 
# using doubled quotes ' "" '. 
# here's what got written
 
"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
"3","3","third row","last "" col"
 
# Importing test.csv works fine.
 
> read.csv("test.csv")
  X X1                                           X2             X3
1 1  1                                   string one another string
2 2  2 quotes escaped 10' 20" 5' 30" "test string   final string
3 3  3                                    third row     last " col
# this looks good. 
# now, please go and open test.csv with a text editor and replace all the 
# doubled quotes ' "" ' with the escaped quote ' \" ' as is found in my data 
# set. Like this:

"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
"3","3","third row","last \" col"
 
# this breaks read.csv:
 
> read.csv("test.csv")
  X X1  
  X2 X3
1 1  1  
  string one another string
2 2  2 quotes escaped 10' 20\\ 5' 30\\ \\test string,final 
string\n3,3,third row,last \\ col  
 
# we now have only two rows, with all the data captured in col2 row2
 
Any suggestions on how to fix this behavior? I've tried fiddling with 
quote="\"" to no avail, obviously. Interestingly, an even number of escaped 
quotes within a field is loaded correctly, which certainly threw me for a while!
 
Thank you in advance, 
Tim
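For completeness, a sketch of the repair suggested elsewhere in this thread (rewrite the backslash-escaped quotes as CSV-standard doubled quotes before parsing; the filename is the example's test.csv):

```r
# sketch: convert \" escapes to "" so read.csv's standard quoting applies
raw <- readLines("test.csv")
dd  <- read.csv(text = gsub('\\"', '""', raw, fixed = TRUE))
```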
 
 
 
 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.csv quotes within fields

2013-01-25 Thread Tim Howard
Drat, I forgot to tell you what system I am on:
 
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)
 
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
  LC_MONETARY=English_United States.1252 LC_NUMERIC=C  
[5] LC_TIME=English_United States.1252
 
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 


 Tim Howard 1/25/2013 1:42 PM 
All,
 
I have some csv files I am trying to import. I am finding that quotes inside 
strings are escaped in a way R doesn't expect for csv files. The problem only 
seems to rear its ugly head when there are an uneven number of internal quotes. 
I'll try to recreate the problem:
 
# set up a matrix, using escape-quote as the internal double quote mark.
 
x <- data.frame(matrix(data=c("1", "string one", "another string", "2", "quotes 
escaped 10' 20\" 5' 30\" \"test string", "final string", "3", "third row", "last 
\" col"), ncol = 3, byrow=TRUE))
 
> write.csv(x, "test.csv")
 
# NOTE that write.csv correctly escaped the three internal quotes ' " ' by 
# using doubled quotes ' "" '. 
# here's what got written
 
"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
"3","3","third row","last "" col"
 
# Importing test.csv works fine.
 
> read.csv("test.csv")
  X X1                                           X2             X3
1 1  1                                   string one another string
2 2  2 quotes escaped 10' 20" 5' 30" "test string   final string
3 3  3                                    third row     last " col
# this looks good. 
# now, please go and open test.csv with a text editor and replace all the 
# doubled quotes ' "" ' with the escaped quote ' \" ' as is found in my data 
# set. Like this:

"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
"3","3","third row","last \" col"
 
# this breaks read.csv:
 
> read.csv("test.csv")
  X X1  
  X2 X3
1 1  1  
  string one another string
2 2  2 quotes escaped 10' 20\\ 5' 30\\ \\test string,final 
string\n3,3,third row,last \\ col  
 
# we now have only two rows, with all the data captured in col2 row2
 
Any suggestions on how to fix this behavior? I've tried fiddling with 
quote="\"" to no avail, obviously. Interestingly, an even number of escaped 
quotes within a field is loaded correctly, which certainly threw me for a while!
 
Thank you in advance, 
Tim
 
 
 
 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.csv quotes within fields

2013-01-25 Thread Tim Howard
Great point, your fix (quote="") works for the example I gave. Unfortunately, 
these text strings have commas in them as well(!).  Throw a few commas in any 
of the text strings and it breaks again.  Sorry about not including those in 
the example.
 
So, I need to incorporate commas *and* quotes with the escape character within 
a single string.
 
Tim
 

 David Winsemius dwinsem...@comcast.net 1/25/2013 2:27 PM 

On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:

 All,
 
 I have some csv files I am trying to import. I am finding that quotes inside 
 strings are escaped in a way R doesn't expect for csv files. The problem only 
 seems to rear its ugly head when there are an uneven number of internal 
 quotes. I'll try to recreate the problem:
 
 # set up a matrix, using escape-quote as the internal double quote mark.
 
 x <- data.frame(matrix(data=c("1", "string one", "another string", "2", 
 "quotes escaped 10' 20\" 5' 30\" \"test string", "final string", "3", "third 
 row", "last \" col"), ncol = 3, byrow=TRUE))
 
 write.csv(x, test.csv)
 
 # NOTE that write.csv correctly created the three internal quotes '  ' by 
 using double quotes '  '. 
 # here's what got written
 
 ,X1,X2,X3
 1,1,string one,another string
 2,2,quotes escaped 10' 20 5' 30 test string,final string
 3,3,third row,last  col
 
 # Importing test.csv works fine.
 
 read.csv(test.csv)
  X X1 X2 X3
 1 1  1 string one another string
 2 2  2 quotes escaped 10' 20 5' 30 test string   final string
 3 3  3  third row last  col
 # this looks good. 
 # now, please go and open test.csv with a text editor and replace all the 
 double quotes '' with the 
 # quote escaped ' \ ' as is found in my data set. Like this:
 
 ,X1,X2,X3
 1,1,string one,another string
 2,2,quotes escaped 10' 20\ 5' 30\ \test string,final string
 3,3,third row,last \ col

Use quote="":

 read.csv(text=',X1,X2,X3
+ 1,1,string one,another string
+ 2,2,quotes escaped 10\' 20 5\' 30 test string,final string
+ 3,3,third row,last  col', sep=",", quote="")

Not quote="\"".


  X.. X.X1.   X.X2.X.X3.
1 1   1string one another string
2 2   2 quotes escaped 10' 20 5' 30 test string   final string
3 3   3 third rowlast  col

You will then be depending entirely on commas to separate. 

(Needed to use escaped single quotes to illustrate from a command line.)

 
 # this breaks read.csv:
 
 read.csv(test.csv)
  X X1 
X2 X3
 1 1  1
 string one another string
 2 2  2 quotes escaped 10' 20\\ 5' 30\\ \\test string,final 
 string\n3,3,third row,last \\ col  
 
 # we now have only two rows, with all the data captured in col2 row2
 
 Any suggestions on how to fix this behavior? I've tried fiddling with 
 quote=\ to no avail, obviously. Interestingly, an even number of escaped 
 quotes within a field is loaded correctly, which certainly threw me for a 
 while!
 
 Thank you in advance, 
 Tim
 
 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.csv quotes within fields

2013-01-25 Thread Tim Howard
David, 
Thank you again for the reply. I'll try to make readLines() and strsplit() work. 
 What bugs me is that I think it would import fine if the folks who created the 
csv had used double quotes "" rather than an escaped quote \" for those pesky 
internal quotes. Since that's the case, I'd think there would be a solution 
within read.csv() ... or perhaps scan()?, I just can't figure it out. 
best, 
Tim

 David Winsemius dwinsem...@comcast.net 1/25/2013 4:16 PM 

On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:

 Great point, your fix (quote="") works for the example I gave. Unfortunately, 
 these text strings have commas in them as well(!).  Throw a few commas in any 
 of the text strings and it breaks again.  Sorry about not including those in 
 the example.
  
 So, I need to incorporate commas *and* quotes with the escape character 
 within a single string.

Well you need to have _some_ delimiter. At the moment it sounds as though you 
might end up using readLines() and strsplit( . , split=\\'\\,\\s\\).

-- 
david.

  
 Tim
  
 
  David Winsemius dwinsem...@comcast.net 1/25/2013 2:27 PM 
 
 On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:
 
  All,
  
  I have some csv files I am trying to import. I am finding that quotes 
  inside strings are escaped in a way R doesn't expect for csv files. The 
  problem only seems to rear its ugly head when there are an uneven number of 
  internal quotes. I'll try to recreate the problem:
  
  # set up a matrix, using escape-quote as the internal double quote mark.
  
  x <- data.frame(matrix(data=c("1", "string one", "another string", "2", 
  "quotes escaped 10' 20\" 5' 30\" \"test string", "final string", "3", "third 
  row", "last \" col"), ncol = 3, byrow=TRUE))
  
  write.csv(x, test.csv)
  
  # NOTE that write.csv correctly created the three internal quotes '  ' by 
  using double quotes '  '. 
  # here's what got written
  
  ,X1,X2,X3
  1,1,string one,another string
  2,2,quotes escaped 10' 20 5' 30 test string,final string
  3,3,third row,last  col
  
  # Importing test.csv works fine.
  
  read.csv(test.csv)
   X X1 X2 X3
  1 1  1 string one another string
  2 2  2 quotes escaped 10' 20 5' 30 test string   final string
  3 3  3  third row last  col
  # this looks good. 
  # now, please go and open test.csv with a text editor and replace all the 
  double quotes '' with the 
  # quote escaped ' \ ' as is found in my data set. Like this:
  
  ,X1,X2,X3
  1,1,string one,another string
  2,2,quotes escaped 10' 20\ 5' 30\ \test string,final string
  3,3,third row,last \ col
 
 Use quote="":
 
  read.csv(text=',X1,X2,X3
 + 1,1,string one,another string
 + 2,2,quotes escaped 10\' 20 5\' 30 test string,final string
 + 3,3,third row,last  col', sep=",", quote="")
 
 Not quote="\"".
 
 
   X.. X.X1.   X.X2.X.X3.
 1 1   1string one another string
 2 2   2 quotes escaped 10' 20 5' 30 test string   final string
 3 3   3 third rowlast  col
 
 You will then be depending entirely on commas to separate. 
 
 (Needed to use escaped single quotes to illustrate from a command line.)
 
  
  # this breaks read.csv:
  
  read.csv(test.csv)
   X X1   
   X2 X3
  1 1  1  
string one another string
  2 2  2 quotes escaped 10' 20\\ 5' 30\\ \\test string,final 
  string\n3,3,third row,last \\ col  
  
  # we now have only two rows, with all the data captured in col2 row2
  
  Any suggestions on how to fix this behavior? I've tried fiddling with 
  quote=\ to no avail, obviously. Interestingly, an even number of escaped 
  quotes within a field is loaded correctly, which certainly threw me for a 
  while!
  
  Thank you in advance, 
  Tim
  
  
 
 David Winsemius
 Alameda, CA, USA
 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Filling Lists or Arrays of variable dimensions

2012-12-21 Thread Tim Howard
Jessica,
In terms of initializing your list for populating it in a later step, I think 
this partially gets at your question -- it removes the upper loop assignment, 
but then does require a loop to get the second level. Perhaps there's something 
here you can work with ...
 
 
height <- c("high","low")
width <- c("slim","wide")
 
l <- vector("list", length(height))   
names(l) <- height
 
for(i in 1:length(l)){
l[[i]] <- vector("list", length(width))
names(l[[i]]) <- width
}
 
 
Best
Tim
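The same idea can be written recursively so it works for any number of parameter vectors, which is closer to what the original poster asked for. A sketch (the helper makeNestedList is hypothetical, not from the thread):

```r
# build an empty nested list from an arbitrary list of parameter vectors,
# one nesting level per vector, named by that vector's values
makeNestedList <- function(params) {
  l <- vector("list", length(params[[1]]))
  names(l) <- params[[1]]
  if (length(params) > 1)
    for (i in seq_along(l)) l[[i]] <- makeNestedList(params[-1])
  l
}

nested <- makeNestedList(list(c("high", "low"), c("slim", "wide")))
str(nested)  # high/low at the top level, slim/wide nested inside each
```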
 
Date: Fri, 21 Dec 2012 11:34:06 +0100
From: Jessica Streicher j.streic...@micromata.de
To: David Winsemius dwinsem...@comcast.net
Cc: R help r-help@r-project.org
Subject: Re: [R] Filling Lists or Arrays of variable dimensions
Message-ID: be0ef9be-a571-45f3-92e6-72e093bb5...@micromata.de
Content-Type: text/plain; charset=us-ascii

@David : In my mind it was quite complete enough.

@William: Thanks, didn't know you could do that with data frames, if i ever 
have to do something similar again i might try this.

On 20.12.2012, at 22:39, David Winsemius wrote:

 
 On Dec 20, 2012, at 10:01 AM, Jessica Streicher wrote:
 
 Really must have been unclear at some point, sorry.
 
 Hasn't it become abundantly clear that this would have progress farther had 
 you post a complete example?
 
 -- 
 David.
 
 William, thats interesting, but not really helping the main problem, which 
 is: how to do 
 
 l[[ as.character(grid[1, ]) ]] <- 1
 
 without having initialized the list in the loop before. 
 
 Well, or how to initialize it without having to do the loop thing, because 
 the loop stuff can only be done for a specific set of parameter vectors. But 
 those change, and i don't want to have to write another loop construct every 
 time for the new version.
 
 I want to say: hey, i have these vectors here with these values (my 
 parameters), could you build me that nested list structure (tree - whatever) 
 from it? And the function will give me that structure whatever i give it 
 without me needing to intervene in form of changing the code.
 
 -- Clarification -
 
 First: i am not computing statistics over the parameters. I'm computing 
 stuff from other data, and the computation is affected by the parameters. 
 
 I am computing classifiers for different sets of parameters for those 
 classifiers. So the result of doSomething() isn't a simple value. Its 
 usually a list of 6 lists (doing cross validation), which in turn have the 
 classifier object, some statistics of the classifier (e.g what was 
 missclassified), and the subsets of data used in them.
 That doesn't really fit in a data.frame, hence the use of lists. I want the 
 nested lists because it helps me find stuff in the object browser faster, 
 and because all my other code is already geared towards it. If i had the 
 time i might still go for a flat structure that everyone keeps telling me to 
 use (got a few mails off the list),
 but i really haven't the time.
 
 If theres no good way i'll just keep things as they are now.
 
 
 On 20.12.2012, at 18:37, William Dunlap wrote:
 
 Arranging data as a list of lists of lists of lists [...] of scalar values 
 generally
 will lead to slow and hard-to-read R code, mainly because R is designed to
 work on long vectors of simple data.  If you were to start over, consider 
 constructing
 a data.frame with one column for each attribute.  Then tools like aggregate 
 and
 the plyr functions would be useful.
 
 However, your immediate problem may be solved by creating your 'grid' object
 as a data.frame of character, not factor, columns because as.character 
 works differently
 on lists of scalar factors and lists of scalar characters.  Usually 
 as.mode(x), when
 x is a list of length-1 items, gives the same result as 
 as.mode(unlist(x)), but not when
 x is a list of length-1 factors:
 
  height <- c("high", "low")
  width <- c("slim", "wide")
  gridF <- expand.grid(height, width, stringsAsFactors=FALSE)
  gridT <- expand.grid(height, width, stringsAsFactors=TRUE)
  as.character(gridF[1,])
  [1] "high" "slim"
  as.character(gridT[1,])
  [1] "1" "1"
  as.character(unlist(gridT[1,])) # another workaround
  [1] "high" "slim"
 
 Your example was not self-contained so I changed the call to doSomething() 
 to paste(h,w,sep=/):
 
  height <- c("high", "low")
  width <- c("slim", "wide")
  
  l <- list()
  for(h in height){
      l[[h]] <- list()
      for(w in width){
          l[[h]][[w]] <- paste(h, w, sep="/") # doSomething()
      }
  }
  
  grid <- expand.grid(height, width, stringsAsFactors=FALSE)
  as.character(grid[1,])
  # [1] "high" "slim", not the [1] "1" "1" you get with stringsAsFactors=TRUE
  l[[ as.character(grid[1, ]) ]]
  # [1] "high/slim"
  l[[ as.character(grid[1, ]) ]] <- 1
  l[[ as.character(grid[1, ]) ]]
  # [1] 1
 
 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] 
 On Behalf
 Of Jessica Streicher
 Sent: Thursday, December 20, 2012 8:43 AM
 To: Chris Campbell
 Cc: R help

Re: [R] boot() with glm/gnm on a contingency table

2012-09-12 Thread Tim Hesterberg
One approach is to bootstrap the vector 1:n, where n is the number
of individuals, with a function that does:
f <- function(vectorOfIndices, theTable) {
  (1) create a new table with the same dimensions, but with the counts
  in the table based on vectorOfIndices.
  (2) Calculate the statistics of interest on the new table.
}

When f is called with 1:n, the table it creates should be the same
as the original table.  When called with a bootstrap sample of
values from 1:n, it should create a table corresponding to the
bootstrap sample.
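A concrete sketch of that outline (the data frame d, its factors, and the statistic are hypothetical stand-ins; boot() passes the resampled indices as the second argument of the statistic function):

```r
library(boot)

# hypothetical individual-level data behind the contingency table
d <- data.frame(f1 = factor(rep(c("a", "b"), times = 50)),
                f2 = factor(rep(c("x", "y"), each  = 50)))

stat <- function(data, idx) {
  tab <- xtabs(~ f1 + f2, data = data[idx, ])  # rebuild the table from the sample
  as.numeric(tab["a", "x"])                    # placeholder statistic on the table
}

b <- boot(d, stat, R = 199)   # stat(d, 1:nrow(d)) reproduces the original table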

Tim Hesterberg
http://www.timhesterberg.net
 (resampling, water bottle rockets, computers to Costa Rica, shower = 2650 
light bulbs, ...)

NEW!  Mathematical Statistics with Resampling and R, Chihara  Hesterberg
http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8

Hi everyone!

In a package I'm developing, I have created a custom function to get
jackknife standard errors for the parameters of a gnm model (which is
essentially the same as a glm model for this issue). I'd like to add
support for bootstrap using package boot, but I couldn't find how to
proceed.

The problem is, my data is a table object. Thus, I don't have one
individual per line: when the object is converted to a data frame, one
row represents one cell, or one combination of factor levels. I cannot
pass this to boot() as the data argument and use indices from my
custom statistic() function, since I would drop cells, not individual
observations.

A very inefficient solution would be to create a data frame with one row
per observation, by replicating each cell using its frequencies. That's
really a brute force solution, though. ;-)

The other way would be generate importance weights based on observed
frequencies, and to multiply the original data by the weights at each
iteration, but I'm not sure that's correct. Thoughts?


Thanks for your help!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boot() with glm/gnm on a contingency table

2012-09-12 Thread Tim Hesterberg
On Wednesday, 12 September 2012 at 07:08 -0700, Tim Hesterberg wrote:
 One approach is to bootstrap the vector 1:n, where n is the number
 of individuals, with a function that does:
 f <- function(vectorOfIndices, theTable) {
   (1) create a new table with the same dimensions, but with the counts
   in the table based on vectorOfIndices.
   (2) Calculate the statistics of interest on the new table.
 }

 When f is called with 1:n, the table it creates should be the same
 as the original table.  When called with a bootstrap sample of
 values from 1:n, it should create a table corresponding to the
 bootstrap sample.
Indeed, that's another solution I considered, but I wanted to be sure
nothing more reasonable exists. You're right that it's more efficient
than replicating the whole data set. But still, with a typical table of
less than 100 cells and several thousand observations, this means
creating a potentially long vector, much larger than the original data;
nothing really hard for common machines, to be sure.

If no other way exists, I'll use this. Thanks.
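To make the index-resampling idea above concrete, here is a minimal sketch; the example table and the statistic (proportion on the diagonal) are invented for illustration:

```r
# Expand the table's cells to one id per individual, then rebuild a
# table from any vector of (resampled) indices with tabulate().
x <- as.table(matrix(c(10, 20, 30, 40), 2, 2))   # made-up contingency table
n <- sum(x)                                       # number of individuals
cell_of <- rep(seq_along(x), as.vector(x))        # cell id of each individual

f <- function(vectorOfIndices, theTable) {
  newTable <- theTable
  newTable[] <- tabulate(cell_of[vectorOfIndices], nbins = length(theTable))
  sum(diag(newTable)) / sum(newTable)             # example statistic of interest
}

f(1:n, x)                        # identity indices reproduce the original statistic
f(sample(n, replace = TRUE), x)  # one bootstrap replicate
```

With boot(), the data argument would then be 1:n and f the statistic.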

In your original posting you also suggested:
The other way would be generate importance weights based on observed
frequencies, and to multiply the original data by the weights at each
iteration, but I'm not sure that's correct. Thoughts?

You could do:

bootstrapTable <- x  # where x is the original table
for(i in 1:numberOfBootstrapSamples) {
  bootstrapTable[] <- rmultinom(1, size = sum(x), prob = x)
  replicate[i] <- myFunction(bootstrapTable)
}
# caveat - not tested


I can't tell from help(boot) whether you could do it correctly there.
boot has a 'weights' argument that you could use for the sampling
probabilities, but you also need a way to tell it to draw sum(x)
observations.  Or, you could also pass boot a parametric sampler.  
But be careful if you use boot in either of these ways; you not only
need to generate the bootstrap samples, you also need to make sure
that it is does all other calculations correctly, including
calculating the statistic for the original data, calculating jackknife
statistics if they are used for confidence intervals, etc.

Wistful sigh - this would be pretty easy to do with S+Resample.

Tim Hesterberg



Re: [R] Histogram to KDE

2012-09-06 Thread Tim Hesterberg
To bootstrap from a histogram, use
  sample(bins, size = sum(counts), replace = TRUE, prob = counts)

Note that a kernel density estimate is biased, so some bootstrap
confidence intervals have poor coverage properties.
Furthermore, if the kernel bandwidth is data-driven then the estimate
is not functional, so some bootstrap and jackknife methods won't work right.
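Using the binned data from David Carlson's example below, resampling directly from the histogram looks like this (a sketch; the 999 replicates and mean statistic are my choices):

```r
set.seed(1)
bins   <- seq(3.5, 15.5, by = 1)                  # bin midpoints
counts <- c(1, 1, 2, 3, 6, 18, 19, 23, 8, 10, 6, 2, 1)
n <- sum(counts)                                  # 100 observations

# One bootstrap resample drawn directly from the histogram:
b <- sample(bins, size = n, replace = TRUE, prob = counts)

# Bootstrap SE of the mean, without reconstructing the raw data:
boot_means <- replicate(999, mean(sample(bins, n, replace = TRUE, prob = counts)))
sd(boot_means)
```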

Tim Hesterberg
http://www.timhesterberg.net
New:  Mathematical Statistics with Resampling and R, Chihara & Hesterberg

On Fri, Aug 31, 2012 at 12:15 PM, David L Carlson dcarl...@tamu.edu wrote:

 Using a data.frame x with columns bins and counts:

  x <- structure(list(bins = c(3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5,
  11.5, 12.5, 13.5, 14.5, 15.5), counts = c(1, 1, 2, 3, 6, 18,
  19, 23, 8, 10, 6, 2, 1)), .Names = c("bins", "counts"),
  row.names = 4:16, class = "data.frame")

 This will give you a plot of the kde estimate:


Thanks.


  xkde <- density(rep(x$bins, x$counts), bw = "SJ")
  plot(xkde)

 As for the standard error or the confidence interval, you would probably
 need to use bootstrapping.



 On a similar note - is there a way in R to directly resample /
cross-validate from a histogram of a data-set without recreating the
original data-set ?


   -Original Message-
 
  Hello,
  I wanted to know if there was way to convert a histogram of a data-set
  to a
  kernel density estimate directly in R ?
 
  Specifically, I have a histogram [bins, counts] of samples {X1 ...
  XN} of a quantized variable X where there is one bin for each level of
  X,
  and I'ld like to directly get a kde estimate of the pdf of X from the
  histogram. Therefore, there is no additional quantization of X in the
  histogram. Most KDE methods in R seem to require the original sample
  set   - and I would like to avoid re-creating the samples from the
  histogram. Is there some quick way of doing this using one of the
  standard
  kde methods in R ?
 
  Also, a general statistical question - is there some measure of the
  standard error or confidence interval or similar of a KDE of a data-set
  ?
 
  Thanks,
  -fj
 




[R] need help reshaping table using aggregate

2012-06-20 Thread Tim
I am trying to learn how to reshape my data set.  I am new to R, so please
bear with me.  Basically, I have the following data set:

site <- c("A", "A", "B", "B")
bug <- c("spider", "grasshopper", "ladybug", "stinkbug")
count <- c(2, 4, 6, 8)
myf <- data.frame(site, bug, count)
myf

  site         bug count
1    A      spider     2
2    A grasshopper     4
3    B     ladybug     6
4    B    stinkbug     8

This means that in site A, I found 2 spiders and 4 grasshopper.  In site B,
I found 6 ladybugs and 8 stinkbugs.

I would like to change the df to aggregate the site column and make the bugs
columns so it arranged like this:

  site spider grasshopper ladybug stinkbug
1    A      2           4       0        0
2    B      0           0       6        8
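One base-R way to get that layout is xtabs(); note the columns come out in factor (alphabetical) order rather than the order shown:

```r
site  <- c("A", "A", "B", "B")
bug   <- c("spider", "grasshopper", "ladybug", "stinkbug")
count <- c(2, 4, 6, 8)
myf <- data.frame(site, bug, count)

# Cross-tabulate counts; absent site/bug combinations become 0.
wide <- as.data.frame.matrix(xtabs(count ~ site + bug, data = myf))
wide
```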





Re: [R] ggplot2: legend for geom_rug() ..?

2012-06-07 Thread Tim Smith
Hi,

Here is the corrected code:

library(ggplot2)
ids <- paste('id_', 1:3, sep = '')
before <- sample(9)
after <- sample(1:10, 9)
dat <- as.matrix(cbind(before, after))
rownames(dat) <- rep(ids, 3)
position <- c(rep(10, 3), rep(13, 3), rep(19, 3))

mdat <- cbind(melt(dat), position)
colnames(mdat) <- c('ID', 'Treatment', 'value', 'position')

ggplot(mdat, aes(position, value)) + geom_point(aes(colour = Treatment)) +
        geom_rug(subset = .(position > 14), aes(y = NULL), color = "orange") + 
        geom_rug(subset = .(position <= 14), aes(y = NULL), color = "black") 



Alternatively, how do I add a second legend in ggplot2?

thanks!
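Not from the thread, but with current ggplot2 the usual way to get a legend for the rug is to map its colour inside aes() and name the colours via scale_colour_manual(); the data and city labels below are invented:

```r
library(ggplot2)

# Invented stand-in data; City is the grouping the rug legend should show.
mdat <- data.frame(position = rep(c(10, 13, 19), each = 3),
                   value = runif(9))
mdat$City <- ifelse(mdat$position < 14, "London", "NYC")

p <- ggplot(mdat, aes(position, value)) +
  geom_point() +
  geom_rug(aes(colour = City), sides = "b") +   # mapping inside aes() creates a legend
  scale_colour_manual(values = c(London = "orange", NYC = "black"))
p
```

Setting colour per layer (outside aes()) is what suppresses the legend in the original code.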




 From: John Kane jrkrid...@inbox.com
To: Brian Smith bsmith030...@gmail.com; r-help@r-project.org 
Sent: Wednesday, June 6, 2012 3:06 PM
Subject: Re: [R] ggplot2: legend for geom_rug() ..?

What is X2? 

code not running at the moment 

John Kane
Kingston ON Canada


 -Original Message-
 From: bsmith030...@gmail.com
 Sent: Wed, 6 Jun 2012 11:52:25 -0400
 To: r-help@r-project.org
 Subject: [R] ggplot2: legend for geom_rug() ..?
 
 Hi,
 
 I was trying to make another legend for the rug plot. Sample code:
 
 
  library(ggplot2)
  ids <- paste('id_',1:3,sep='')
  before <- sample(9)
  after <- sample(1:10,9)
  dat <- as.matrix(cbind(before,after))
  rownames(dat) <- rep(ids,3)
  position <- c(rep(10,3),rep(13,3),rep(19,3))
  
  mdat <- cbind(melt(dat),position)
 
  ggplot(mdat, aes(position, value)) + geom_point(aes(colour = X2)) +
          geom_rug(subset = .(position > 14), aes(y = NULL), color = "orange") +
          geom_rug(subset = .(position <= 14), aes(y = NULL), color = "black")
 
 
 
 
 This gives the plot correctly, but how can I add another legend that
 would
 give some more information on the rugplot (e.g. that 'orange' line =
 'London', and 'black' line = 'NYC')?
 
 thanks!!
 
 





[R] [R-pkgs] dataframe package

2012-05-23 Thread Tim Hesterberg
A new 'dataframe' package is on CRAN.  This is a modified version
of the data frame code in R, modified to make fewer copies of the inputs
and run faster, e.g.
  as.data.frame(<a vector>)
makes 1 copy of the vector rather than 3,
  data.frame(<a vector>)
makes 3 copies rather than 6, and
  x[, "a"] <- newValue
makes (2,2,1) copies of (old data frame, new column, integer vector
of row names) rather than (6,4,2).

Functionality should be identical to existing R code, with two exceptions:
* bug fix, if x has two columns, then x[[4]] <- value will fail rather than 
  producing an illegal data frame
* round(a data frame with numeric and factor columns)
  rounds the numeric columns and leaves the factor columns unchanged, rather
  than failing.

Tim Hesterberg

NEW!  Mathematical Statistics with Resampling and R, Chihara & Hesterberg
http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8
http://home.comcast.net/~timhesterberg
 (resampling, water bottle rockets, computers to Costa Rica, shower = 2650 
light bulbs, ...)

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



[R] Possible artifacts in cross-correlation function (ccf)?

2012-05-11 Thread Tim Dorscheidt
Dear R-users,

I have been using R and its core-packages with great satisfaction now for many 
years, and have recently started using the ccf function (part of the stats 
package version 2.16.0), about which I have a question.

The ccf-algorithm for calculating the cross-correlation between two time 
series always calculates the mean and standard deviation per time series 
beforehand, thereby using a constant value for these irrespective of any 
time-lag. Another piece of statistical software that I'm using, a toolbox in 
Matlab, does this in a fundamentally different way. It first chops off the 
parts of the time-series that do not overlap when a time-lag has been 
introduced, and then calculates a new mean and standard deviation to be used 
for further calculations. This latter method has the advantage of always 
theoretically still being able to obtain a cross-correlation of 1 (or -1), 
whereas the ccf-method of R seems to introduce zeros at the non-overlapping 
parts of the time-series, thereby preventing this possibility and producing 
very different results. Take for instance the two time series: a = {1,3,2} and 
b = {3,2,1}. The query ccf(a,b) produces the output {-0.5, -0.5, 0.5}, but I 
would think that a time-lag of -1 should produce a cross-correlation here of 1, 
since the two time series will overlap with identical parts {3,2}.

I have attached clean implementations (removing all dependencies) of how the R 
algorithm seems to calculate cross-correlations with time-lag (it produces 
identical results to ccf), and how this other method (in Matlab) calculates 
it (with newly calculated means and standard deviation for each time-lag).

Could someone be so kind as to explain to me why the ccf-algorithm has this 
specific implementation that seems to, at least for specific situations, 
produce results with artifacts? It is very likely that the R-implementation, as 
opposed to the alternative algorithm described above and in the attachment, has 
a very good statistical explanation, but one that unfortunately is not dawning 
on me.
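A sketch of the Matlab-style calculation described above (the function is my own, and the sign convention for the lag may be flipped relative to ccf()):

```r
# Correlation of the overlapping parts only, with mean/sd re-estimated
# on the overlap for each lag k (positive k shifts x forward relative to y).
lagged_cor <- function(x, y, k) {
  n <- length(x)                       # assumes length(x) == length(y)
  if (k < 0) cor(x[1:(n + k)], y[(1 - k):n])
  else       cor(x[(1 + k):n], y[1:(n - k)])
}

a <- c(1, 3, 2)
b <- c(3, 2, 1)
lagged_cor(a, b, 1)   # overlap {3,2} vs {3,2}: correlation 1
```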

Sincerely,
Tim Dorscheidt





[R] suggested method to transform list to a matrix

2012-04-19 Thread Tim Stutt
I have data in the following list format:

USER,VARIABLE,COUNT
user1, var1, 3
user1, var2, 4
user2, var1, 7
userN, var12, 5

And would like to have it formatted as a matrix: 

      var1 var2 var12
user1    3    4
user2    7
userN             5

What is the suggested method to do this for 1 million rows, with 12 variables 
and ~ 4,000 unique users?  
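One memory-friendly base-R approach for a million rows is a single vectorized assignment into a pre-allocated matrix (column and user names taken from the example above; only four rows shown):

```r
df <- data.frame(USER     = c("user1", "user1", "user2", "userN"),
                 VARIABLE = c("var1",  "var2",  "var1",  "var12"),
                 COUNT    = c(3, 4, 7, 5))

users <- unique(df$USER)
vars  <- unique(df$VARIABLE)
m <- matrix(NA_real_, length(users), length(vars),
            dimnames = list(users, vars))

# A two-column index matrix fills all cells in one vectorized step.
m[cbind(match(df$USER, users), match(df$VARIABLE, vars))] <- df$COUNT
m
```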

Thank you,

Tim Stutt
t...@ischool.berkeley.edu





Re: [R] [BioC] Read .idat Illumina files in R

2012-04-11 Thread Tim Triche, Jr.
Unfortunately, this won't help for expression arrays.  Last time I checked,
those IDATs appeared to be encrypted.

crlmm, methylumi, and minfi can all read IDAT files... *if* they are
version 3 or later (e.g. genotyping, methylation, etc).

Otherwise you are probably stuck with GenomeStudio if it is expression data
you're dealing with.

On Wed, Apr 11, 2012 at 7:14 AM, Moiz Bootwalla msbootwa...@gmail.com wrote:

 Hi Ekta,

 You can use the bioconductor package methylumi to read in .idat files. The
 function methylumIDAT() will read in .idat files and provide you with beta
 values along with the M and U values as a MethyLumiSet object. You can then
 use the functions methylumi.bgcorr() and normalizeMethyLumiSet() to
 background correct and normalize your data. See the vignette for more
 details.

 Best,
 Moiz


 On Apr 11, 2012, at 1:46 AM, Ekta Jain wrote:

  Dear Bioc and R List Users,
  I am having trouble analysing Illumina data generated from BeadScan. I
 have .idat files and JPEG images. I realise that i need bead-level summary
 data to be able to begin quality control followed by normalization. Is
 there a way i can read .idat files for expression analysis or do i need to
 go back to BeadScan and generate .txt files/tiff files ?
 
  Appreciate any help here.
 
  Many Thanks,
  Ekta Jain
 
  ___
  Bioconductor mailing list
  bioconduc...@r-project.org
  https://stat.ethz.ch/mailman/listinfo/bioconductor
  Search the archives:
 http://news.gmane.org/gmane.science.biology.informatics.conductor





-- 
*A model is a lie that helps you see the truth.*
*
*
Howard Skipperhttp://cancerres.aacrjournals.org/content/31/9/1173.full.pdf



Re: [R] detecting noise in data?

2012-01-24 Thread HARROLD, Tim
You might want to provide an example? It's a pretty vague problem at the moment.

If the data can be easily picked out by human eyes, you might want to think 
about the criteria you're using to pick out a contaminated result. If you can 
express it in such a way that you don't need to scan each observation (e.g. if 
a snapper weighs >= 30kg then somebody entered that data incorrectly) then 
you can create an indicator variable and continue with your analysis.

Other than that - some sort of cluster analysis might be able to pick up on 2 
distinct groups provided within each group there's a reasonable level of 
homogeneity. Then from there, you can do a basic inference test for group means 
to detect whether there are significant differences detected between groups.
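A minimal sketch of that cluster-then-test idea, on simulated data since none was posted:

```r
set.seed(42)
# Simulated mixture: majority data around 0, minority noise around 5.
y <- c(rnorm(180, mean = 0), rnorm(20, mean = 5))

km <- kmeans(y, centers = 2, nstart = 10)
noise_cluster <- as.integer(names(which.min(table(km$cluster))))
suspect <- km$cluster == noise_cluster   # flag the minority group

t.test(y[suspect], y[!suspect])          # inference on the group means
```

A formal alternative would be to fit 1- and 2-component mixture models (e.g. with package mclust) and compare them, which directly answers the "one source or two?" question.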

Cheers,
Tim



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Michael
Sent: Wednesday, 25 January 2012 9:31 AM
To: r-help
Subject: Re: [R] detecting noise in data?

Hi all,

I just wanted to add that I am looking for a solution that's in R ... to
handle this...

And also, in a given sample, the correct data are of the majority and the
noise are of the minority.

Thank you!

On Tue, Jan 24, 2012 at 4:09 PM, Michael comtech@gmail.com wrote:

 Hi all,

 I have data which are unfortuantely comtaminated by noise.

 We knew that the noise is at different level than the correct data, i.e.
 the noise data can be easily picked out by human eyes.

 It looks as if there are two people that generated the two very different
 data with different mean levels, and they got mixed together.

 i.e. assming the two data are following unknown distribution DF,

 and the two mean levels are u1 and u2... (unknown)

 Then the correct data are generated by DF(u1)

 and the noise are generated by DF(u2),

 and they got mixed...

 Now, how do I flag those suspicious data? At least is there a way I could
 answer the question:

 Given a sample of mixed data - are these data generated from the
 above-mentioned two sources, or the data are indeed generated from one
 source only.

 i.e. are there two substantially distinct species in the given data?

 Thanks a lot!





[R] bootstrap means by strata

2011-10-13 Thread Tim Howard
All - 
I have an uneven set of replicates and would like to sample from this set X 
number of times to generate a mean for each grouping variable. I was thinking 
the boot package would be the thing to use, but now I'm not so sure ... given 
the discussion here: 
 
http://finzi.psych.upenn.edu/Rhelp10/2010-June/243828.html
 
Given these data (a small subset): 
 
 dput(x)
structure(list(numSpp = c(7, 8, 6, 4, 5, 4, 10, 12, 7, 13, 4, 
11, 4, 3, 3, 10, 8, 16, 11, 6, 3, 5, 6, 13, 15, 2, 6, 11, 11, 
10, 6, 9, 11, 14, 10, 7), TrSeasYr = c("1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", 
"2", "2", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", 
"3", "3", "3", "3", "3")), .Names = c("numSpp", "TrSeasYr"), row.names = c(NA, 
-36L), class = "data.frame")
 
#get into R by pasting the above after x <- or using dget("clipboard")
# my grouping var is col 2, data are in col 1
# my attempt using boot
library(boot)
mn <- function(d, f) {
  mean(d[,1] * f[d[,1]])
  }
b1 <- boot(data = x, mn, R = 99, stype = "f", strata = x$TrSeasYr)
 
 b1
STRATIFIED BOOTSTRAP

Call:
boot(data = x, statistic = mn, R = 99, stype = "f", strata = x$TrSeasYr)

Bootstrap Statistics :
    original       bias    std. error
t1*     8.08  0.1352413      1.681901
 
What is the most efficient way to resample, with replacement, and generate 
means for each grouping variable?
 
Thanks in advance,
Tim
 
 sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: i386-pc-mingw32/i386 (32-bit)
 
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
 
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C 
 
[5] LC_TIME=English_United States.1252
 
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 
 
other attached packages:
[1] RODBC_1.3-2 boot_1.3-2 
 
loaded via a namespace (and not attached):
[1] tools_2.12.2
 
 
Timothy G. Howard, Ph.D.
Director of Science
New York Natural Heritage Program
625 Broadway, 5th floor
Albany, NY 12233-4757

(518) 402-8945
facsimile  (518) 402-8925



Re: [R] bootstrap means by strata

2011-10-13 Thread Tim Howard
Answering my own question. 
?sample  (!)
 
y <- by(x, x$TrSeasYr, function(x) mean(sample(x[,1], size = 999, replace = 
TRUE)))
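One caveat: size = 999 draws 999 values per group rather than resampling each group at its own size. A conventional stratified bootstrap, using the data from the original post, would look more like:

```r
x <- data.frame(
  numSpp = c(7, 8, 6, 4, 5, 4, 10, 12, 7, 13, 4, 11, 4, 3, 3, 10, 8, 16,
             11, 6, 3, 5, 6, 13, 15, 2, 6, 11, 11, 10, 6, 9, 11, 14, 10, 7),
  TrSeasYr = rep(c("1", "2", "3"), times = c(15, 5, 16)))

# 999 bootstrap replicates of the mean, per group, each at the group's own n.
boot_means <- by(x$numSpp, x$TrSeasYr, function(v)
  replicate(999, mean(sample(v, length(v), replace = TRUE))))

sapply(boot_means, mean)  # bootstrap estimate of each group mean
sapply(boot_means, sd)    # bootstrap SE per group
```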

 Tim Howard 10/13/2011 9:42 AM 


Re: [R] Permutation or Bootstrap to obtain p-value for one sample

2011-10-09 Thread Tim Hesterberg
I'll concur with Peter Dalgaard that 
* a permutation test is the right thing to do - your problem is equivalent
  to a two-sample test,
* don't bootstrap, and
* don't bother with t-statistics
but I'll elaborate a bit on on why, including 
* two approaches to the whole problem - and how your approach relates
  to the usual approach,
* an interesting tidbit about resampling t statistics.

First, I'm assuming that your x variable is irrelevant, only y matters,
and that sample1 is a proper subset of df.  I'll also assume that you
want to look for differences in the mean, rather than arbitrary differences
(in which case you might use e.g. a Kolmogorov-Smirnov test).

There are two general approaches to this problem:
(1) two-sample problem, sample1$y vs df$y[rows other than sample 1]
(2) the approach you outlined, thinking of sample1$y as part of df$y.

Consider (1), and call the two data sets y1 and y2
The basic permutation test approach is
* compute the test statistic theta(y1, y2), e.g. mean(y1)-mean(y2)
* repeat B times, e.g. B = 9999 (or 999):
  draw a sample of size n1 from the pooled data, call that Y1, call the rest Y2
  compute theta(Y1, Y2)
* P-value for a one-sided test is (1 + k) / (1 + B)
  where k is the number of permutation samples with theta(Y1,Y2) >= theta(y1,y2)

The test statistic could be
  mean(y1) - mean(y2)
  mean(y1)
  sum(y1)
  t-statistic (pooled variance)
  P-value for a t-test (pooled variance)
  mean(y1) - mean(pooled data)
  t-statistic (unpooled variance)
  P-value for a t-test (unpooled variance)
  median(y1) - median(y2)
  ...
The first six of those are equivalent - they give exactly the same P-value
for the permutation test.  The reason is that those test statistics
are monotone transformations of each other, given the data.
Hence, doing the pooled-variance t calculations gains nothing.

Now consider your approach (2).  That is equivalent to the permutation
test using the test statistic:  mean(y1) - mean(pooled data).

Why not a bootstrap?  E.g. pool the data and draw samples of size
n1 and n2 from the pooled data, independently and with replacement.
That is similar to the permutation test, but less accurate.  Probably
the easiest way to see this is to suppose there is 1 outlier in the pooled data.
In any permutation iteration there is exactly 1 outlier among the two samples.
With bootstrapping, there could be 0, 1, 2, ...
The permutation test answers the question - given that there is exactly
1 outlier in my combined data, what is the probability that random chance
would give a difference as large as I observed.  The bootstrap would
answer some other question.

Tim Hesterberg
NEW!  Mathematical Statistics with Resampling and R, Chihara & Hesterberg
http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8
http://home.comcast.net/~timhesterberg
 (resampling, water bottle rockets, computers to Guatemala, shower = 2650 light 
bulbs, ...)


On Oct 8, 2011, at 16:04 , francy wrote:

 Hi,

 I am having trouble understanding how to approach a simulation:

 I have a sample of n=250 from a population of N=2,000 individuals, and I
 would like to use either permutation test or bootstrap to test whether this
 particular sample is significantly different from the values of any other
 random samples of the same population. I thought I needed to take random
 samples (but I am not sure how many simulations I need to do) of n=250 from
 the N=2,000 population and maybe do a one-sample t-test to compare the mean
 score of all the simulated samples, + the one sample I am trying to prove
 that is different from any others, to the mean value of the population. But
 I don't know:
 (1) whether this one-sample t-test would be the right way to do it, and how
 to go about doing this in R
 (2) whether a permutation test or bootstrap methods are more appropriate

 This is the data frame that I have, which is to be sampled:
 df <-
 i.e.
 x y
 1 2
 3 4
 5 6
 7 8
 . .
 . .
 . .
 2,000

 I have this sample from df, and would like to test whether it is has extreme
 values of y.
 sample1 <-
 i.e.
 x y
 3 4
 7 8
 . .
 . .
 . .
 250

 For now I only have this:

 R=999 #Number of simulations, but I don't know how many...
 t.values =numeric(R)  #creates a numeric vector with 999 elements, which
 will hold the results of each simulation.
 for (i in 1:R) {
 sample1 <- df[sample(nrow(df), 250, replace=TRUE),]

 But I don't know how to continue the loop: do I calculate the mean for each
 simulation and compare it to the population mean?
 Any help you could give me would be very appreciated,
 Thank you.

The straightforward way would be a permutation test, something like this

msamp <- mean(sample1$y)
mpop <- mean(df$y)
msim <- replicate(10000, mean(sample(df$y, 250)))

sum(abs(msim - mpop) >= abs(msamp - mpop))/10000

I don't really see a reason to do bootstrapping here. You say you want to test 
whether your sample could be a random sample from the population, so just 
simulate that sampling
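Assembled into a self-contained sketch (the population is simulated here, since the real df wasn't posted; B = 999 and the +1 correction are my choices):

```r
set.seed(1)
df      <- data.frame(y = rnorm(2000))               # stand-in population
sample1 <- df[sample(nrow(df), 250), , drop = FALSE] # the sample under test

B     <- 999
msamp <- mean(sample1$y)
mpop  <- mean(df$y)
msim  <- replicate(B, mean(sample(df$y, 250)))

# Two-sided permutation p-value with the usual +1 correction:
pval <- (1 + sum(abs(msim - mpop) >= abs(msamp - mpop))) / (1 + B)
pval
```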

[R] Can R be installed in a Hyper-V environment?

2011-09-20 Thread McDonough, Tim
We are contemplating the installation of R on a Windows server with Hyper-V but 
we can't find any information on whether it can be done.












[R] ggplot2 freqpoly() layers..?

2011-09-08 Thread Tim Smith
Hi,

I was trying to overlay/combine two freqpoly plots. The sample code below 
illustrates the problem. Essentially, I want to do is:

1. Have the same colour for all the lines in 'Plot 1' (and 'Plot 2'). 
Currently, all the lines in Plot 1 have different colours and all the 
lines in Plot 2 have different colors. I'd like for all lines in Plot 1 
to be 'red' and all the lines in Plot 2 to be 'black'.
2. Combine both the plots ('Plot 1' and 'Plot 2' as one combined plot - 
which I attempt to do in 'Combined Plot'). However, I'm doing something 
wrong because with the code for 'Combined Plot' I just get two lines.

 sample code 
library(ggplot2)

## Plot 1 - normal distributions with mean = 0 ##
    mat <- matrix(rnorm(10000, mean = 0), 1000, 10)
    colnames(mat) <- paste('a', 1:ncol(mat), sep = '')
    rownames(mat) <- 1:nrow(mat)
    mat2 <- melt(mat)
    
    ggplot(mat2) + geom_freqpoly(aes(x = value,
                    y = ..density.., colour = X2)) 
    
## Plot 2- normal distributions with mean = 1
    tab <- matrix(rnorm(10000, mean = 1), 1000, 10)
    colnames(tab) <- paste('b', 1:ncol(tab), sep = '')
    rownames(tab) <- 1:nrow(tab)
    tab2 <- melt(tab)
    
    ggplot(tab2) + geom_freqpoly(aes(x = value,
                    y = ..density.., colour = X2)) 
    
    
## Combined plot 
    comb <- cbind(mat, tab)
    comb2 <- melt(comb)
    cols <- c(rep('red', ncol(mat)*nrow(mat)), rep('black', ncol(tab)*nrow(tab)))
    
    ggplot(comb2) + geom_freqpoly(aes(x = value,
                    y = ..density.., colour = cols)) 


### End code ###

Any help would be appreciated!

thanks!
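One way to get the combined plot with the requested colours (a sketch, not from this thread; assumes a recent ggplot2, where after_stat(density) replaces ..density.., and builds the long data frame by hand so reshape is not needed) is to map colour to which matrix a value came from, and group to the original column so each column still gets its own line:

```r
library(ggplot2)

set.seed(1)
mat <- matrix(rnorm(10000, mean = 0), 1000, 10)   # "Plot 1" data
tab <- matrix(rnorm(10000, mean = 1), 1000, 10)   # "Plot 2" data
colnames(mat) <- paste0("a", 1:10)
colnames(tab) <- paste0("b", 1:10)

# long format by hand; melt() from reshape/reshape2 would also work
to_long <- function(m, set) data.frame(value  = as.vector(m),
                                       series = rep(colnames(m), each = nrow(m)),
                                       set    = set)
comb2 <- rbind(to_long(mat, "mean 0"), to_long(tab, "mean 1"))

# one line per original column (group), one colour per source matrix (set)
p <- ggplot(comb2) +
  geom_freqpoly(aes(x = value, y = after_stat(density),
                    group = series, colour = set), bins = 30) +
  scale_colour_manual(values = c("mean 0" = "red", "mean 1" = "black"))
```

scale_colour_manual() pins the two sets to red and black; mapping colour to a per-observation vector of colour names, as in the 'Combined plot' attempt above, makes ggplot treat the names as data values rather than as colours.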


[R] hide row and column-names in cat()

2011-08-24 Thread Tim Haering
Dear list,

to prepare data as input for a stand-alone groundwater model I use the
cat() function to print data.
I want to print a table with print() or cat() but without the row- and
column-names. Is this possible?

Here is a small example:

my.df <- rbind(c("Nr", "FraSand", "FraSilt", "FraClay", "pH"),
               c("", "(kg.kg-1)", "(kg.kg-1)", "(kg.kg-1)", "(-)"),
               c(1, 19, 60, 21, 7.1),
               c(2, 35, 53, 11, 7.7))
print(my.df, quote=FALSE)
 [,1] [,2]  [,3]  [,4]  [,5]
[1,] Nr   FraSand   FraSilt   FraClay   pH
[2,]  (kg.kg-1) (kg.kg-1) (kg.kg-1) (-)
[3,] 1    19    60    21    7.1
[4,] 2    35    53    11    7.7

What I want to have is this:

 Nr   FraSand   FraSilt   FraClay   pH
  (kg.kg-1) (kg.kg-1) (kg.kg-1) (-)
 1    19    60    21    7.1
 2    35    53    11    7.7

Thank you.
TIM
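One way to produce exactly that layout (a sketch, not an answer taken from this thread):

```r
my.df <- rbind(c("Nr", "FraSand", "FraSilt", "FraClay", "pH"),
               c("", "(kg.kg-1)", "(kg.kg-1)", "(kg.kg-1)", "(-)"),
               c(1, 19, 60, 21, 7.1),
               c(2, 35, 53, 11, 7.7))
# format() pads all entries to a common width; paste each row together and
# let cat() print the lines with no row or column names at all
cat(apply(format(my.df), 1, paste, collapse = " "), sep = "\n")
```

write.table(my.df, quote = FALSE, row.names = FALSE, col.names = FALSE) also drops the dimnames, but does not keep the columns aligned.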



Re: [R] Extracting components from a 'boot' class output in R

2011-07-24 Thread Tim Hesterberg
Sarath pointed out (privately) that the standard error is not actually
stored as part of the bootstrap object.

Instead, it is calculated on the fly when you print the bootstrap object;
this is done by
  boot:::print.boot
Similarly for bias.
(The 'boot:::' is necessary because 
'print.boot' is not an exported object from 'namespace:boot').
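Both quantities can also be computed directly from the replicates stored in the object (a self-contained sketch; the statistic here is a stand-in, since Lp.est is not shown in the thread):

```r
library(boot)

set.seed(1)
DAT  <- data.frame(x = rnorm(100, mean = 50))
stat <- function(d, i) mean(d$x[i])           # stand-in for Lp.est
bootObj <- boot(data = DAT, statistic = stat, R = 1000)

se   <- sd(bootObj$t[, 1])                    # the printed "std. error"
bias <- mean(bootObj$t[, 1]) - bootObj$t0[1]  # the printed "bias"
```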

Tim Hesterberg

Do
  names(bootObj)
to find out what the components are, and use $ or [[ to extract
components.
Do
  help(boot)
for a description of components of the object (look in the Value section).

That is general advice in R, applying to all kinds of objects -
boot, and many other functions such as lm(), return lists with
a class added, and you can operate on the object as a list using
names(), $, etc.

Tim Hesterberg

Dear R user,

I used the following to do a bootstrap.


bootObj <- boot(data=DAT, statistic=Lp.est, R=1000, x0=3)

I have the following output from the above bootstrap. How
can I extract  components of the output.
For example, how can I extract the std.error?


 bootObj

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = DAT, statistic = Lp.est, R = 1000, x0 = 3)

Bootstrap Statistics :
  original    bias  std. error
t1*  794.9745 -0.341    4.042099

Any help is greatly appreciated.

Thank you


Sarath Banneheka



Re: [R] Extracting components from a 'boot' class output in R

2011-07-23 Thread Tim Hesterberg
Do
  names(bootObj)
to find out what the components are, and use $ or [[ to extract
components.
Do
  help(boot)
for a description of components of the object (look in the Value section).

That is general advice in R, applying to all kinds of objects -
boot, and many other functions such as lm(), return lists with
a class added, and you can operate on the object as a list using
names(), $, etc.

Tim Hesterberg

Dear R user,

I used the following to do a bootstrap.


bootObj <- boot(data=DAT, statistic=Lp.est, R=1000, x0=3)

I have the following output from the above bootstrap. How
can I extract  components of the output.
For example, how can I extract the std.error?


 bootObj

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = DAT, statistic = Lp.est, R = 1000, x0 = 3)

Bootstrap Statistics :
  original    bias  std. error
t1*  794.9745 -0.341    4.042099

Any help is greatly appreciated.

Thank you


Sarath Banneheka



Re: [R] Pulling data from remote MySQL database

2011-06-21 Thread HARROLD, Tim
Agree - I started with RMySQL but eventually changed over to RODBC when I 
realised some of the additional functionality it contains and the fact that I 
needed to access different types of RDBMSs (e.g. acquire data from a Microsoft 
access database file).

Cheers,
Tim

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of David A. Johnston
Sent: Wednesday, 22 June 2011 8:22 AM
To: r-help@r-project.org
Subject: Re: [R] Pulling data from remote MySQL database

Hi Joseph,

I know that the RMySQL and RODBC packages allow you to pull the result of a
MySQL query into a data.frame within R.  

I like RODBC because you can access many different flavors of RDBMS's. 

-David Johnston





Re: [R] Sorting a data frame with values of different lengths

2011-06-07 Thread Tim Smith
William,

I think to convert to numeric, you might need to do something like:

as.numeric(as.character())  ## and not just as.numeric()

As it stands, it would appear that it is still being read as a character string.
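A quick illustration of the difference, using values like those in the quoted output (a sketch):

```r
f <- factor(c("746", "818.5", "1004.5"))
as.numeric(f)                  # wrong: returns the underlying level codes
as.numeric(as.character(f))    # right: 746.0 818.5 1004.5
```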






From: William Armstrong william.armstr...@noaa.gov
To: r-help@r-project.org
Sent: Tue, June 7, 2011 10:05:37 AM
Subject: Re: [R] Sorting a data frame with values of different lengths

Also, I tried changing a line to store W as numeric:

sample_info <- c(pds_gagehandles[i], p, as.numeric(sample_W))

But it is still sorting incorrectly:

> W_table[order(W_table$as.numeric.W.),]
   pds_gagehandles.i.  p as.numeric.W.
8                mibe  8        1004.5
1                mibe  1           746
10               mibe 10           746
6                mibe  6         811.5
2                mibe  2         818.5
11               mibe 11         822.5
4                mibe  4           879
5                mibe  5           888
9                mibe  9           901
7                mibe  7         903.5
3                mibe  3         919.5
> str(W_table)
'data.frame':   11 obs. of  3 variables:
 $ pds_gagehandles.i.: Factor w/ 1 level "mibe": 1 1 1 1 1 1 1 1 1 1 ...
 $ p                 : chr  "1" "2" "3" "4" ...
 $ as.numeric.W.     : chr  "746" "818.5" "919.5" "879" ...




Re: [R] Question on approximations of full logistic regression model

2011-05-16 Thread Tim Hesterberg
My usual rule is that whatever gives the widest confidence intervals
in a particular problem is most accurate for that problem :-)

Bootstrap percentile intervals tend to be too narrow.
Consider the case of the sample mean; the usual formula CI is
xbar +- t_alpha sqrt( (1/(n-1)) sum((x_i - xbar)^2)) / sqrt(n)
The bootstrap percentile interval for symmetric data is roughly
xbar +- z_alpha sqrt( (1/(n  )) sum((x_i - xbar)^2)) / sqrt(n)
It is narrower than the formula CI because
  * z quantiles rather than t quantiles
  * standard error uses divisor of n rather than (n-1)
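The narrowness is easy to check numerically (a sketch, not from the original message):

```r
set.seed(1)
x <- rnorm(30)

ci.formula <- t.test(x)$conf.int            # t quantiles, divisor n-1
reps <- replicate(10^4, mean(sample(x, replace = TRUE)))
ci.pct <- quantile(reps, c(0.025, 0.975))   # bootstrap percentile interval

diff(ci.formula)   # width of the formula CI
diff(ci.pct)       # usually somewhat narrower
```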

In stratified sampling, the narrowness factor depends on the
stratum sizes, not the overall n.
In regression, estimates for some quantities may be based on a small
subset of the data (e.g. coefficients related to rare factor levels).

This doesn't mean we should give up on the bootstrap.
There are remedies for the bootstrap biases, see e.g.
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling
vs. Smoothing, Proceedings of the Section on Statistics and the
Environment, American Statistical Association, 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf

And other methods have their own biases, particularly in nonlinear
applications such as logistic regression.

Tim Hesterberg

Thank you for your reply, Prof. Harrell.

I agree with you. Dropping only one variable does not actually help a lot.

I have one more question.
During analysis of this model I found that the confidence
intervals (CIs) of some coefficients provided by bootstrapping (the bootcov
function in the rms package) were narrower than the CIs provided by the usual
variance-covariance matrix, and the CIs of other coefficients wider.  My data
has no cluster structure. I am wondering which CIs are better.
I guess the bootstrapping ones, but is that right?

I would appreciate your help in advance.
--
KH



(11/05/16 12:25), Frank Harrell wrote:
 I think you are doing this correctly except for one thing.  The validation
 and other inferential calculations should be done on the full model.  Use
 the approximate model to get a simpler nomogram but not to get standard
 errors.  With only dropping one variable you might consider just running the
 nomogram on the entire model.
 Frank


 KH wrote:

 Hi,
 I am trying to construct a logistic regression model from my data (104
 patients and 25 events). I build a full model consisting of five
 predictors with the use of penalization by rms package (lrm, pentrace
 etc) because of events per variable issue. Then, I tried to approximate
 the full model by step-down technique predicting L from all of the
 componet variables using ordinary least squares (ols in rms package) as
 the followings. I would like to know whether I am doing right or not.

 library(rms)
  plogit <- predict(full.model)
  full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
 fastbw(full.ols, aics=1e10)

   Deleted       Chi-Sq d.f. P      Residual d.f. P       AIC    R2
   stenosis       1.41   1  0.2354     1.41   1  0.2354   -0.59 0.991
   x2            16.78   1  0.0000    18.19   2  0.0001   14.19 0.882
   procedure     26.12   1  0.0000    44.31   3  0.0000   38.31 0.711
   ClinicalScore 25.75   1  0.0000    70.06   4  0.0000   62.06 0.544
   x1            83.42   1  0.0000   153.49   5  0.0000  143.49 0.000

 Then, fitted an approximation to the full model using most imprtant
 variable (R^2 for predictions from the reduced model against the
 original Y drops below 0.95), that is, dropping stenosis.

  full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure)
 full.ols.approx$stats
          n  Model L.R.       d.f.         R2          g      Sigma
    104.000 487.9006640      4.000  0.9908257  1.3341718  0.1192622

 This approximate model had R^2 against the full model of 0.99.
 Therefore, I updated the original full logistic model dropping
 stenosis as predictor.

  full.approx.lrm <- update(full.model, ~ . -stenosis)

 validate(full.model, bw=F, B=1000)
           index.orig training    test optimism index.corrected    n
 Dxy   0.6425   0.7017  0.6131   0.0887  0.5539 1000
 R20.3270   0.3716  0.3335   0.0382  0.2888 1000
 Intercept 0.   0.  0.0821  -0.0821  0.0821 1000
 Slope 1.   1.  1.0548  -0.0548  1.0548 1000
 Emax  0.   0.  0.0263   0.0263  0.0263 1000

 validate(full.approx.lrm, bw=F, B=1000)
           index.orig training    test optimism index.corrected    n
 Dxy   0.6446   0.6891  0.6265   0.0626  0.5820 1000
 R20.3245   0.3592  0.3428   0.0164  0.3081 1000
 Intercept 0.   0.  0.1281  -0.1281  0.1281 1000
 Slope 1.   1.  1.1104  -0.1104  1.1104 1000
 Emax  0.   0.  0.0444   0.0444  0.0444 1000

 Validatin revealed this approximation was not bad.
 Then, I made a nomogram.

  full.approx.lrm.nom <- nomogram

Re: [R] memory and bootstrapping

2011-05-05 Thread Tim Hesterberg
(Regarding bootstrapping logistic regression.)

If the number of rows with Y=1 is small, it doesn't matter that n is huge.

If both the number of successes and of failures is huge, then as Ripley notes
you can use asymptotic CIs.  The mean difference in predicted probabilities
is a nonlinear function of the coefficients, so you can use the delta method
to get standard errors.

In general, if you're not sure about normality and bias, you can use
the bootstrap to estimate how close to normal the sampling distribution
is.  The results may surprise you.  For example, for a one-sample mean,
if the population has skewness = 2 (like an exponential distn),
you need n=5000 before the CLT is reasonably accurate (actual 
one-sided non-coverage probabilities within 10% of the nominal, for a
95% interval).

Finally, you can speed up bootstrapping glms by using starting
values based on the coefficients estimated from the original data.
And, you can compute the model matrix once and resample rows of that
along with y, rather than computing a model matrix from scratch each time.
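A sketch of those two speed-ups together (made-up data; glm.fit() takes the precomputed model matrix and start= values directly):

```r
set.seed(1)
n <- 5000
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x))

fit0 <- glm(y ~ x, family = binomial, data = d)
X  <- model.matrix(fit0)   # computed once, then resampled by row
b0 <- coef(fit0)           # starting values for every refit

boot.coefs <- replicate(100, {
  i <- sample(n, replace = TRUE)
  coef(glm.fit(X[i, ], d$y[i], family = binomial(), start = b0))
})
apply(boot.coefs, 1, sd)   # bootstrap standard errors of the coefficients
```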

Tim Hesterberg

The only reason the boot package will take more memory for 2000
replications than 10 is that it needs to store the results.  That is
not to say that on a 32-bit OS the fragmentation will not get worse,
but that is unlikely to be a significant factor.

As for the methodology: 'boot' is support software for a book, so
please consult it (and not secondary sources).  From your brief
description it looks to me as if you should be using studentized CIs.

130,000 cases is a lot, and running the experiment on a 1% sample
may well show that asymptotic CIs are good enough.

On Thu, 5 May 2011, E Hofstadler wrote:

 hello,

 the following questions will without doubt reveal some fundamental
 ignorance, but hopefully you can still help me out.

 I'd like to bootstrap a coefficient gained on the basis of the
 coefficients in a logistic regression model (the mean differences in
 the predicted probabilities between two groups, where each predict()
 operation uses as the newdata-argument a dataframe of equal size as
the original dataframe). I've got 130,000 rows and 7 columns in my
 dataframe. The glm-model uses all variables (as well as two 2-way
 interactions).

 System:
 - R-version: 2.12.2
 - OS: Windows XP Pro, 32-bit
 - 3.16Ghz intel dual core processor, 2.9GB RAM

 I'm using the boot package to arrive at the standard errors for this
 difference, but even with only 10 replications, this takes quite a
 long time: 216 seconds (perhaps this is partly also due to my
 inefficiently programmed function underlying the boot-call, I'm also
 looking into that).

 I wanted to try out calculating a bca-bootstrapped confidence
 interval, which as I understand requires a lot more replications than
 normal-theory intervals. Drawing on John Fox' Appendix to his An R
 Companion to Applied Regression, I was thinking of trying out 2000
 replications -- but this will take several hours to compute on my
 system (which isn't in itself a major issue though).

 My Questions:
 - let's say I try bootstrapping with 2000 replications. Can I be
 certain that the memory available to R  will be sufficient for this
 operation?
 - (this relates to statistics more generally): is it a good idea in
 your opinion to try bca-bootstrapping, or can it be assumed that a
 normal theory confidence interval will be a sufficiently good
 approximation (letting me get away with, say, 500 replications)?


 Best,
 Esther

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595





[R] Simple Missing cases Function

2011-04-19 Thread Tim Elwell-Sutton
Dear all

 

I have written a function to perform a very simple but useful task which I
do regularly. It is designed to show how many values are missing from each
variable in a data.frame. In its current form it works but is slow because I
have used several loops to achieve this simple task. 

 

Can anyone see a more efficient way to get the same results? Or is there
existing function which does this?

 

Thanks for your help

Tim

 

Function:

miss <- function (data) 
{
    miss.list <- list(NA)
    for (i in 1:length(data)) {
        miss.list[[i]] <- table(is.na(data[i]))
    }
    for (i in 1:length(miss.list)) {
        if (length(miss.list[[i]]) == 2) {
            miss.list[[i]] <- miss.list[[i]][2]
        }
    }
    for (i in 1:length(miss.list)) {
        if (names(miss.list[[i]]) == "FALSE") {
            miss.list[[i]] <- 0
        }
    }
    data.frame(names(data), as.numeric(miss.list))
}

 

Example:

data(ToothGrowth)
data.m <- ToothGrowth
data.m$supp[sample(1:nrow(data.m), size=25)] <- NA
miss(data.m)




Re: [R] Simple Missing cases Function

2011-04-19 Thread Tim Elwell-Sutton
Dear Petr
Thanks so much. That is a LOT more efficient.
Tim

-Original Message-
From: Petr PIKAL [mailto:petr.pi...@precheza.cz] 
Sent: Tuesday, April 19, 2011 3:37 PM
To: tesutton
Cc: r-help@r-project.org
Subject: Odp: [R] Simple Missing cases Function

Hi

Hi
try

colSums(is.na(data.m))

It is not a data frame, but you can easily transform it if you want.
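For example, with the ToothGrowth data from the original post (a sketch; set.seed() added for reproducibility):

```r
data(ToothGrowth)
data.m <- ToothGrowth
set.seed(1)
data.m$supp[sample(1:nrow(data.m), size = 25)] <- NA

counts <- colSums(is.na(data.m))   # one NA count per variable
counts
# as a data frame, if that shape is preferred:
data.frame(variable = names(counts), n.missing = as.numeric(counts))
```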

Regards
Petr


r-help-boun...@r-project.org wrote on 19.04.2011 09:29:08:

 Dear all
 
 
 
 I have written a function to perform a very simple but useful task which 
I
 do regularly. It is designed to show how many values are missing from 
each
 variable in a data.frame. In its current form it works but is slow 
because I
 have used several loops to achieve this simple task. 
 
 
 
 Can anyone see a more efficient way to get the same results? Or is there
 existing function which does this?
 
 
 
 Thanks for your help
 
 Tim
 
 
 
 



Re: [R] Simple Missing cases Function

2011-04-19 Thread Tim Elwell-Sutton
Thanks Tyler

This function has some useful features

Tim

 

From: Tyler Rinker [mailto:tyler_rin...@hotmail.com] 
Sent: Tuesday, April 19, 2011 3:52 PM
To: tesutton; r-help@r-project.org
Subject: RE: [R] Simple Missing cases Function

 

I use the following code/function, which gives me some quick descriptives
about each variable (i.e., n of missing values, % missing, case #'s missing,
etc.). Fairly quick, maybe not pretty, but effective on either single
variables or entire data sets.
 
NAhunter <- function(dataset)
{
    find.NA <- function(variable)
    {
        if (is.numeric(variable)) {
            n <- length(variable)
            mean <- mean(variable, na.rm=T)
            median <- median(variable, na.rm=T)
            sd <- sd(variable, na.rm=T)
            NAs <- is.na(variable)
            total.NA <- sum(NAs)
            percent.missing <- total.NA/n
            descriptives <- data.frame(n, mean, median, sd, total.NA, percent.missing)
            rownames(descriptives) <- c(" ")
            Case.Number <- 1:n
            Missing.Values <- ifelse(NAs > 0, "Missing Value", " ")
            missing.value <- data.frame(Case.Number, Missing.Values)
            missing.values <- missing.value[which(Missing.Values == 'Missing Value'), ]
            list("NUMERIC DATA", DESCRIPTIVES=t(descriptives),
                 "CASE # OF MISSING VALUES"=missing.values[, 1])
        } else {
            n <- length(variable)
            NAs <- is.na(variable)
            total.NA <- sum(NAs)
            percent.missing <- total.NA/n
            descriptives <- data.frame(n, total.NA, percent.missing)
            rownames(descriptives) <- c(" ")
            Case.Number <- 1:n
            Missing.Values <- ifelse(NAs > 0, "Missing Value", " ")
            missing.value <- data.frame(Case.Number, Missing.Values)
            missing.values <- missing.value[which(Missing.Values == 'Missing Value'), ]
            list("CATEGORICAL DATA", DESCRIPTIVES=t(descriptives),
                 "CASE # OF MISSING VALUES"=missing.values[, 1])
        }
    }
    dataset <- data.frame(dataset)
    options(scipen=100)
    options(digits=2)
    lapply(dataset, find.NA)
}

 
 From: tesut...@hku.hk
 To: r-help@r-project.org
 Date: Tue, 19 Apr 2011 15:29:08 +0800
 Subject: [R] Simple Missing cases Function
 
 Dear all
 
 
 
 I have written a function to perform a very simple but useful task which I
 do regularly. It is designed to show how many values are missing from each
 variable in a data.frame. In its current form it works but is slow because
I
 have used several loops to achieve this simple task. 
 
 
 
 Can anyone see a more efficient way to get the same results? Or is there
 existing function which does this?
 
 
 
 Thanks for your help
 
 Tim
 
 
 




Re: [R] Bootstrap 95% confidence intervals for splines

2011-03-27 Thread Tim Hesterberg
You're mixing up two concepts here,
  - splines
  - bootstrap confidence intervals
Separating them may help cut the confusion.

First, to do a bootstrap confidence interval for a difference in predictions
in the linear regression case, do:

repeat 10^4 times
  draw a bootstrap sample of the observations (subjects, keeping x  y together)
  fit the linear regression to the bootstrap sample
  record the difference in predictions at the two x values
end loop
The bootstrap confidence interval is the range of the middle 95% of
the recorded differences.

For a spline, the procedure is the same except for fitting a spline regression:

repeat 10^4 times
  draw a bootstrap sample of the observations (subjects, keeping x  y together)
  fit the SPLINE regression to the bootstrap sample
  record the difference in predictions at the two x values
end loop
The bootstrap confidence interval is the range of the middle 95% of
the recorded differences.
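The spline loop, written out in R (a sketch with made-up data; ns() is in the splines package, and predict() rebuilds the spline basis for new x values automatically):

```r
library(splines)

set.seed(1)
d <- data.frame(WL = runif(200, 70, 110))            # waistline, cm
d$HD <- 2 + 0.05 * d$WL + rnorm(200)                 # hot dogs per week
newd <- data.frame(WL = c(100, 80))                  # 75th vs 25th percentile

diffs <- replicate(2000, {
  b <- d[sample(nrow(d), replace = TRUE), ]          # resample subjects
  fit <- lm(HD ~ ns(WL, df = 4), data = b)           # refit the spline
  p <- predict(fit, newdata = newd)
  p[1] - p[2]                                        # difference in predictions
})
quantile(diffs, c(0.025, 0.975))                     # 95% percentile interval
```

The linear-regression version is identical except that the fit line becomes lm(HD ~ WL, data = b).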

Tim Hesterberg

P.S. I think you're mixing up the response and explanatory variables.
I'd think of eating hot dogs as the cause (explanatory variable),
and waistline as the effect (response, or outcome).

P.P.S.  I don't like the terms independent and dependent variables,
as that conflicts with the concept of independence in probability.
Independent variables are generally not independent, and the dependent
variable may be independent of the others.

There appear to be reports in the literature that transform continuous
independent variablea by the use of splines, e.g.,  assume the dependent
variable is hot dogs eaten per week (HD) and the independent variable is
waistline (WL), a normal linear regression model would be:

nonconfusing_regression  - lm(HD ~ WL)

One might use a spline,

confusion_inducing_regression_with_spline - lm(HD ~ ns(WL, df = 4) )

Now is where the problem starts.

From nonconfusing_regression , I get, say 2 added hot dogs per week for each
centimeter of waistline along with a s.e. of 0.5 hot dogs per week, which I
multiply by 1.96 to garner each side of the 95% c.i.
If I want to show what the difference between the 75th percentile (say 100
cm) and 25th percentile (say 80 cm) waistlines are, I multiply 2 by
100-80=20 and get 40 hot dogs per week as the point estimate with a similar
bumping of the s.e. to 10 hot dogs per week.

What do I do to get the point estimate and 95% confidence interval for the
difference between 100 cm persons and 80 cm persons with
confusion_inducing_regression_with_spline ?

Best regards.

Mitchell S. Wachtel, MD



Re: [R] assign factor levels based on list

2011-02-10 Thread Tim Howard
David and Bill,
Thank you so much for your rapid and exceptional help. Bill, the reason I had 
gone with lists within the list was because I thought I might use the list for 
holding other information - and because it was easier to get the column name. 
Your simple-list suggestion is cleaner and I'm going with that for now. To 
better understand it, I tweaked David's fabulous lapply() statement to work 
with the simpler list and, for posterity, provide that below. The for-loop 
seems cleanest and I'll probably go with it on its own, outside of the function.

 
> y <- data.frame(colOne = c(1,2,3), colTwo = c("apple","pear","orange"),
+                 colThree = c(4,5,6))
> my.factor.defs <- list(colOne = c(1,2,3,4,5,6),
+                        colTwo = c("apple", "pear", "orange", "fig", "banana"))
> y[ , names(my.factor.defs)] <- lapply(names(my.factor.defs), function(x) {
+     y[[x]] <- factor(y[[x]], levels = my.factor.defs[[x]])})
> str(y)
'data.frame':   3 obs. of  3 variables:
 $ colOne  : Factor w/ 6 levels "1","2","3","4",..: 1 2 3
 $ colTwo  : Factor w/ 5 levels "apple","pear",..: 1 2 3
 $ colThree: num  4 5 6
 
Thank you again!
Best, 
Tim Howard
 
 William Dunlap wdun...@tibco.com 2/9/2011 4:41 PM 
 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Tim Howard
 Sent: Wednesday, February 09, 2011 12:44 PM
 To: r-help@r-project.org
 Subject: [R] assign factor levels based on list
 
 All,
  
 Given a data frame and a list containing factor definitions 
 for certain columns, how can I apply those definitions from 
 the list, rather than doing it the standard way, as noted 
 below. I'm lost in the world of do.call, assign, paste, and 
 can't find my way through. For example:
  
 #set up df
 y - data.frame(colOne = c(1,2,3), colTwo = 
 c(apple,pear,orange))
  
 factor.defs - list(colOne = list(name = colOne,
  lvl = c(1,2,3,4,5,6)),
  colTwo = list(name = colTwo,
  lvl = c(apple,pear,orange,fig,banana)))

Why not the following format?
   my.factor.defs <- list(colOne = c(1,2,3,4,5,6),
                          colTwo = c("apple", "pear", "orange", "fig",
                                     "banana"))
Do you really want to support a case like the following?
   list(colOne = list(name = "anotherColumn", lvl = c(1,2,3,4,5,6)))
  
 #A standard way to define levels
 y$colTwo - factor(y$colTwo , levels = 
 c(apple,pear,orange,fig,banana))
  
 # I'd like to use the definitions locally but also pass them 
 (but not the data) to a function, 
 # so, rather than defining each manually each time, I'd like 
 to loop through the columns,
 # call them by name, find the definitions in the list and use 
 them from there. Before I try to loop
 # or use some form of apply, I'd like to get a single factor 
 definition working.

First write a function that takes a data.frame and list
of desired levels for each column and outputs a new data.frame.
E.g., if you use the simpler form of the levelsList I gave
above, the following might work well enough (it does no
error checking):
   assignNewLevelsToDataFrameColumns <- function(x, levelsList) {
       for(colName in names(levelsList)) {
           # note that x$name is equivalent to x[["name"]], so
           # if you want to use a variable as the name, use [[.
           x[[colName]] <- factor(x[[colName]],
                                  levels = levelsList[[colName]])
       }
       x
   }
Test it:
fixedY <- assignNewLevelsToDataFrameColumns(y, my.factor.defs)
fixedY
      colOne colTwo
    1      1  apple
    2      2   pear
    3      3 orange
str(fixedY)
    'data.frame':   3 obs. of  2 variables:
     $ colOne: Factor w/ 6 levels "1","2","3","4",..: 1 2 3
     $ colTwo: Factor w/ 5 levels "apple","pear",..: 1 2 3
Do
y <- assignNewLevelsToDataFrameColumns(y, my.factor.defs)
if you want to overwrite the old y.

Now if you want a function that changes the data.frame you give
it, use a replacement function.  If you want to use the syntax
func(y) <- newStuff
then the function should be called `func<-` and the last argument
must be called 'value' (newStuff will be passed via value=newStuff).
E.g.,
   `func<-` <- function(x, value) {
       alteredX <- assignNewLevelsToDataFrameColumns(x, value)
       alteredX
   }
and use it as
func(y) <- my.factor.defs
str(y)
   'data.frame':   3 obs. of  2 variables:
   $ colOne: Factor w/ 6 levels "1","2","3","4",..: 1 2 3
   $ colTwo: Factor w/ 5 levels "apple","pear",..: 1 2 3
The first command gets translated into
   y <- `func<-`(y, value=my.factor.defs)

If you write a replacement function, it is nice to create a matching
extractor function called 'func'.  E.g.,
func <- function(x) lapply(x, levels)
func(y)
   $colOne
   [1] "1" "2" "3" "4" "5" "6"
   
   $colTwo
   [1] "apple"  "pear"   "orange" "fig"    "banana"

Note that this avoids assign(), get(), eval(), etc., and
thus makes it easy to follow the flow of data in the code: only
things on the left side of the assignment arrow can get
changed.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

  
 # this doesn't seem to see the dataframe properly

[R] assign factor levels based on list

2011-02-09 Thread Tim Howard
All,
 
Given a data frame and a list containing factor definitions for certain 
columns, how can I apply those definitions from the list, rather than doing it 
the standard way, as noted below. I'm lost in the world of do.call, assign, 
paste, and can't find my way through. For example:
 
#set up df
y <- data.frame(colOne = c(1,2,3), colTwo = c("apple","pear","orange"))
 
factor.defs <- list(colOne = list(name = "colOne",
                                  lvl = c(1,2,3,4,5,6)),
                    colTwo = list(name = "colTwo",
                                  lvl = c("apple","pear","orange","fig","banana")))
 
#A standard way to define levels
y$colTwo <- factor(y$colTwo, levels =
                   c("apple","pear","orange","fig","banana"))
 
# I'd like to use the definitions locally but also pass them (but not the data) 
to a function, 
# so, rather than defining each manually each time, I'd like to loop through 
the columns,
# call them by name, find the definitions in the list and use them from there. 
Before I try to loop
# or use some form of apply, I'd like to get a single factor definition working.
 
# this doesn't seem to see the dataframe properly
do.call(factor, list((paste("y$", factor.defs[2][[1]]$name, sep="")), levels=factor.defs[2][[1]]$lvl))
 
#adding as.name doesn't help
do.call(factor, list(as.name(paste("y$", factor.defs[2][[1]]$name, sep="")), levels=factor.defs[2][[1]]$lvl))
 
#Here's my attempt to mimic the standard way, using assign. Ha! what a joke.
assign(as.name(paste("y$", factor.defs[2][[1]]$name, sep="")),
       do.call(factor, list(as.name(paste("y$", factor.defs[2][[1]]$name, sep="")),
                            levels = factor.defs[2][[1]]$lvl)))
##Error in function (x = character(), levels, labels = levels, exclude = NA,  : 
##  object 'y$colTwo' not found
Any help or perspective (or better way from the beginning!) would be greatly 
appreciated. 
Thanks in advance!
Tim
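
(A minimal working sketch of the looping approach asked about, assuming the `y` and `factor.defs` objects defined in the message: indexing columns with `[[` sidesteps the `paste`/`assign` machinery entirely.)

```r
# Sketch, assuming the y and factor.defs objects from the message above;
# [[ indexing avoids building "y$..." strings with paste()/assign().
y <- data.frame(colOne = c(1,2,3),
                colTwo = c("apple","pear","orange"))
factor.defs <- list(colOne = list(name = "colOne",
                                  lvl = c(1,2,3,4,5,6)),
                    colTwo = list(name = "colTwo",
                                  lvl = c("apple","pear","orange","fig","banana")))
for (nm in names(factor.defs)) {
    y[[nm]] <- factor(y[[nm]], levels = factor.defs[[nm]]$lvl)
}
str(y)  # both columns are now factors carrying the full level sets
```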
 
 
 
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Passing formula as an agrument

2011-01-23 Thread Tim Smith
Hi,

I had a function that looked like:

diff <- lm(x ~ y + z)

How can I pass the argument to the 'lm' function on the fly? E.g., if I pass it 
in as a string (e.g. "x ~ y + z"), then the lm function treats it as a string 
and not a proper argument.

many thanks
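
(The usual answer, sketched with an assumed toy data frame `d`: `as.formula()` converts the string into the formula object that `lm()` expects.)

```r
# Sketch: build the formula from a string with as.formula().
d <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
fstr <- "x ~ y + z"
fit <- lm(as.formula(fstr), data = d)  # same as lm(x ~ y + z, data = d)
```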


  



[R] legend not appearing in Word document

2010-12-12 Thread Tim Clark
I need help with using graphics in Word 2007 that will later be converted into 
a 
pdf document.  I have tried several formats and found that I get the best 
quality of graphics using .wmf, .eps format, but when I convert it to .pdf I 
get 
a bunch of lines across the figures.  I also tried .tiff and .png but they give 
me much lower quality.  The best quality that converts to pdf appears to be 
.eps.  However, I have now come across a problem with my figure legends.  For 
some reason the legend is visible in R but not in Word.  Does anyone know why a 
legend in .eps format won't work in Word, or how I can get it to work?  I have 
made an example of the legend below that you should be able to save as .eps and 
paste into Word as an example.  I would appreciate any help you can offer on 
getting the legend in .eps format to work, or on other formats that may be 
better for Word and pdf files.

Thanks,

Tim

  library(plotrix)
  Satelite.Palette <- colorRampPalette(c("blue3","cyan","aquamarine",
                                         "yellow","orange","red"))
  # Max relative angle multiplied by 100 to give larger range.
  # Max is 3.1415, rounded up to 3.15 plus one.
  mycol <- Satelite.Palette(ceiling(5000+1))
  col.labels <- round((seq(0,5000,length.out=5)/1000),1)
  plot(0, 0, type="n", axes=FALSE, xlab="", ylab="")                 # New plot for legend
  color.legend(0,0,1,1, col.labels, mycol, align="rb", gradient="y") # Adds legend




 Tim Clark
Marine Ecologist
National Park of American Samoa
Pago Pago, AS 96799 






Re: [R] legend not appearing in Word document

2010-12-12 Thread Tim Clark
Thanks, I will take it up with MS.  I just downloaded their latest converter 
and that hasn't fixed the issue.  Hopefully they will have additional advice.

Aloha,

Tim

Tim Clark
Marine Ecologist
National Park of American Samoa
Pago Pago, AS 96799 





From: Schalk Heunis schalk.heu...@enerweb.co.za

Cc: r help r-help r-help@r-project.org; Tim Clark tim_cl...@nps.gov
Sent: Sun, December 12, 2010 6:05:23 AM
Subject: Re: [R] legend not appearing in Word document

Tim

Works in Word 2002 on Windows XP with PDF-xchange 3.0 to convert to pdf.

Saw reponse from Duncan - agree might be problem with Word 2007 PDF converter.

HTH
Schalk 









  



[R] apply over list of data.frames

2010-11-24 Thread Tim Howard
R users,
This probably involves a simple incantation of one of the flavors of apply... 
that I can't yet figure out. Consider a list of data frames. I'd like to apply 
a function (mean) across the list and return a dataframe of the same dimensions 
where each cell represents the mean of that cell across all dataframes. 
 
# set up the list
x <- vector("list", 2)
names(x) <- c("one", "two")
# add data to the list
for(i in 1:2){
    y <- i^2
    x[[i]] <- data.frame(a = c(y, 2*y, 3*y),
                         b = c(y+1, y+2, y+3),
                         c = c(2*y+1, 2*y+2, 2*y+3))
}
#show the list
 x
$one
  a b c
1 1 2 3
2 2 3 4
3 3 4 5
 
$two
   a b  c
1  4 5  9
2  8 6 10
3 12 7 11
#the result should be
     a   b c
1  2.5 3.5 6
2  5.0 4.5 7
3  7.5 5.5 8
 
Can anyone direct me down the right path?
 
Thanks in advance
Tim Howard
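
(One possible answer, sketched with the list rebuilt from the message: `+` is defined elementwise for numeric data frames, so `Reduce` can sum the list and dividing by its length gives the cell-wise means.)

```r
# Sketch: cell-wise mean across a list of same-shaped numeric data frames.
x <- list(one = data.frame(a = c(1,2,3),  b = c(2,3,4), c = c(3,4,5)),
          two = data.frame(a = c(4,8,12), b = c(5,6,7), c = c(9,10,11)))
cellMeans <- Reduce(`+`, x) / length(x)  # sum the frames, then divide
cellMeans
#     a   b c
# 1 2.5 3.5 6
# 2 5.0 4.5 7
# 3 7.5 5.5 8
```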
 
 
 



