[R] Restructuring Star Wars data from rwars package
I'm having trouble restructuring data from the rwars package into a dataframe. Can someone help me? Here's what I have...

  library("rwars")
  library("tidyverse")

  # These data are json, so they load into R as a list
  people <- get_all_people(parse_result = T)
  people <- get_all_people(getElement(people, "next"), parse_result = T)

  # Look at Anakin Skywalker's data
  people$results[[1]]
  people$results[[1]][1] # print his name

  # To use them in R, I need to restructure them to a dataframe like they are in dplyr
  data("starwars")
  glimpse(starwars)

Thanks for the help.

Cheers,
MVS
=
Matthew Van Scoyoc
=
Think SNOW!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
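A minimal sketch of one way to do the restructuring asked about above, in base R. The `results` list is a hypothetical stand-in for `people$results` (field names and values are illustrative, not fetched from the API), so it runs without rwars or a network connection. Scalar fields become ordinary columns; the multi-valued `films` field becomes a list-column.

```r
# Hypothetical stand-in for people$results: a list of records,
# each record being a named list as produced by parsing JSON.
results <- list(
  list(name = "Anakin Skywalker", height = "188", films = list("film_5", "film_6")),
  list(name = "Wilhuff Tarkin",   height = "180", films = list("film_1"))
)

# Pull each scalar field out with vapply(), which checks type and length.
people_df <- data.frame(
  name   = vapply(results, function(r) r$name, character(1)),
  height = as.integer(vapply(results, function(r) r$height, character(1))),
  stringsAsFactors = FALSE
)

# Fields with several values per record are stored as a list-column.
people_df$films <- lapply(results, function(r) unlist(r$films))

str(people_df)
```

The same pattern extends to further fields; a record with a missing field would need an explicit default inside the `vapply()` call.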
Re: [R] fread transforms numbers
Thanks Bill for cc.

Santosh, I'm almost certain you don't have package bit64 installed. When you do, it works fine:

  > remove.packages("bit64")
  > data.table::fread("9876543210\n")
                V1
  1: 4.879661e-314
  > install.packages("bit64")
  > data.table::fread("9876543210\n")
             V1
  1: 9876543210

News for data.table v1.10.2 on CRAN 31 Jan 2017 contained:

  * When fread() or print() see integer64 columns are present, bit64's namespace is now automatically loaded for convenience.

However, when data.table loads the namespace there is a bug in this function:

  > data.table:::require_bit64
  function ()
  {
      tt = try(requireNamespace("bit64", quietly = TRUE))
      if (inherits(tt, "try-error"))
          warning("Some columns are type 'integer64' but package bit64 is not installed. Those columns will print as strange looking floating point data. There is no need to reload the data. Simply install.packages('bit64') to obtain the integer64 print method and print the data again.")
  }

The intent was to display that nice helpful message to you. Due to this report, I can see now that I shouldn't have wrapped requireNamespace() with try() because requireNamespace() returns TRUE or FALSE anyway. Even though requireNamespace() prints 'Failed with error' it doesn't actually throw an error. I'll change data.table's function to the following:

  if (!requireNamespace("bit64", quietly = TRUE)) warning("Some columns ...")

bit64 is correctly Suggests not Depends. It's just unfortunate the intended message wasn't displayed.

Santosh, in future please follow the data.table support guide here: https://github.com/Rdatatable/data.table/wiki/Support. r-help is not supposed to be used for package support. The main thing though is thanks for helping me find this bug.
Thanks,
Matt

On Wed, Mar 22, 2017 at 10:22 AM, William Dunlap <wdun...@tibco.com> wrote:

> Here is a way to reproduce the problem:
>   > data.table::fread("9876543210\n") # number bigger than 2^31-1
>                 V1
>   1: 4.879661e-314
> and your work-around does fix things up
>   > data.table::fread("9876543210\n", colClasses="numeric")
>              V1
>   1: 9876543210
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Mar 22, 2017 at 9:58 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
> > You failed to provide a reproducible example, and you posted HTML so the
> > quality of any answer will be limited by the quality of your question.
> >
> > My stab at your problem is that you should read ?fread, and in
> > particular should try using the colClasses argument.
> > --
> > Sent from my phone. Please excuse my brevity.
> >
> > On March 22, 2017 8:52:55 AM PDT, Santosh <santosh2...@gmail.com> wrote:
> >>Hi
> >>
> >>I have been using the "fread" utility of the "data.table" package on a
> >>dataset of about 20 million rows. It's a fantastic package to read
> >>datasets. Thank you, Matt D.
> >>
> >>However, I am faced with a peculiar instance of certain numbers in a
> >>column being transformed.
> >>
> >>In the dataset, a column has values ranging from 1 to 9##
> >>(nchar(x)=11, e.g. 98765432109). After using "fread" to read the
> >>dataset, values in all the columns are displayed correctly up to the
> >>first 1000 rows. If "fread" is applied for reading >1000 rows of the
> >>total of 20 million rows, the values in only this column (having a wide
> >>range of values) are displayed as x.xxxe-3yy (e.g. 3.5639877e-324).
> >>
> >>I tried reading all the columns as "character" and it didn't help.
> >>
> >>Would highly appreciate your assistance!
> >>
> >>Thanks so much in advance.
> >>
> >>Best regards,
> >>Santosh
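The point made in this thread about try() and requireNamespace() can be checked directly. A small sketch, assuming the package name below is not installed (the name is made up for illustration): requireNamespace() reports a missing package by returning FALSE, so the try() wrapper never sees an error and the "try-error" branch is never taken.

```r
# Hypothetical package name, chosen so that it is (almost certainly) not installed.
pkg <- "aPackageThatIsNotInstalled12345"

# The pattern from the buggy function: wrap requireNamespace() in try().
tt <- try(requireNamespace(pkg, quietly = TRUE), silent = TRUE)

# No error was thrown, so this test is FALSE and a warning placed
# behind it would never be issued.
is_try_error <- inherits(tt, "try-error")

# The return value itself carries the information.
missing_pkg <- !requireNamespace(pkg, quietly = TRUE)

print(is_try_error)
print(missing_pkg)
```

Testing the logical return value directly, as in the proposed fix, is what lets the warning fire.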
[R] Validating Minitab's "Expanded Gage R&R Study" using R and lme4
I'm trying to validate the results of an "Expanded Gage R&R Study" in Minitab using R and lme4, but I can't get the numbers to match up in certain situations. I can't tell whether my model is wrong, my data is bad, or something else is going on.

For instance, here's some data for which the results don't match:

https://i.stack.imgur.com/5PCgm.png

After running the gage study, these are the results according to Minitab:

                                  Study Var  %Study Var  %Tolerance
  Source             StdDev (SD)   (6 * SD)       (%SV)  (SV/Toler)
  Total Gage R&R         1.76277    10.5766      100.00       14.36
    Repeatability        0.00000     0.0000        0.00        0.00
    Reproducibility      1.76277    10.5766      100.00       14.36
      B                  0.00000     0.0000        0.00        0.00
      A*B                1.76277    10.5766      100.00       14.36
  Part-To-Part           0.00000     0.0000        0.00        0.00
    A                    0.00000     0.0000        0.00        0.00
  Total Variation        1.76277    10.5766      100.00       14.36

But when I mimic Minitab's results by parsing the output from lmer() and doing the arithmetic in Excel, this is what I see:

https://i.stack.imgur.com/EGg9F.png

The raw output from lmer() was:

  Linear mixed model fit by REML ['lmerMod']
  Formula: y ~ 1 + (1 | A) + (1 | B) + (1 | A:B)
     Data: d

  REML criterion at convergence: -100.1

  Scaled residuals:
         Min         1Q     Median         3Q        Max
  -1.308e-07 -1.308e-07 -1.308e-07 -6.541e-08  1.308e-07

  Random effects:
   Groups   Name        Variance  Std.Dev.
   A:B      (Intercept) 1.333e+00 1.154e+00
   B        (Intercept) 7.066e-04 2.658e-02
   A        (Intercept) 2.260e-03 4.754e-02
   Residual             2.655e-14 1.629e-07
  Number of obs: 8, groups:  A:B, 4; B, 2; A, 2

  Fixed effects:
              Estimate Std. Error t value
  (Intercept)    52.17       0.57   91.53
  convergence code: 0
  Model failed to converge with max|grad| = 0.422755 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
   - Rescale variables?

And the R code that produced that output is:

  library(lme4)
  A <- factor(c(1, 1, 2, 2, 2, 1, 2, 1))
  B <- factor(c(1, 2, 1, 2, 1, 2, 2, 1))
  y <- c(51.356124843620798, 51.356124843620798, 54.8816618912481,
         51.356124843620798, 54.8816618912481, 51.356124843620798,
         51.356124843620798, 51.356124843620798)
  d <- data.frame(y, A, B)
  fm <- lmer(y ~ 1 + (1|A) + (1|B) + (1|A:B), d)
  summary(fm)

For a different measurement with a different response, it's a completely different situation! Given the following data:

https://i.stack.imgur.com/cH0bO.png

The resulting table from Minitab is:

                                  Study Var  %Study Var  %Tolerance
  Source             StdDev (SD)   (6 * SD)       (%SV)  (SV/Toler)
  Total Gage R&R        0.193649    1.16190       55.90        1.00
    Repeatability       0.093541    0.56125       27.00        0.48
    Reproducibility     0.169558    1.01735       48.95        0.88
      B                 0.132288    0.79373       38.19        0.68
      A*B               0.106066    0.63640       30.62        0.55
  Part-To-Part          0.287228    1.72337       82.92        1.49
    A                   0.287228    1.72337       82.92        1.49
  Total Variation       0.346410    2.07846      100.00        1.79

And after plugging my R results into Excel, I get exactly the same thing:

https://i.stack.imgur.com/jUEAP.png

Which was produced by this R code:

  library(lme4)
  A <- factor(c(1, 1, 2, 2, 2, 1, 2, 1))
  B <- factor(c(1, 2, 1, 2, 1, 2, 2, 1))
  y <- c(-49.4, -49.8, -50.1, -50.1, -50.0, -49.9, -50.2, -49.6)
  d <- data.frame(y, A, B)
  fm <- lmer(y ~ 1 + (1|A) + (1|B) + (1|A:B), d)
  summary(fm)

That generated the following lmer() summary:

  Linear mixed model fit by REML ['lmerMod']
  Formula: y ~ 1 + (1 | A) + (1 | B) + (1 | A:B)
     Data: d

  REML criterion at convergence: -3.8

  Scaled residuals:
      Min      1Q  Median      3Q     Max
  -0.7705 -0.6853 -0.1039  0.4379  1.4151

  Random effects:
   Groups   Name        Variance Std.Dev.
   A:B      (Intercept) 0.01125  0.10607
   B        (Intercept) 0.01750  0.13229
   A        (Intercept) 0.08250  0.28723
   Residual             0.00875  0.09354
  Number of obs: 8, groups:  A:B, 4; B, 2; A, 2

  Fixed effects:
              Estimate Std. Error t value
  (Intercept) -49.8875     0.2322  -214.9

Is the difference attributable to the warnings produced by lmer() about the model failing to converge and being nearly unidentifiable? What could Minitab be doing differently when the measurement data contains only two distinct values?

Matt

This question is cross-posted to http://stats.stackexchange.com/questions/262170/how-can-i-validate-minitabs-expanded-gage-rr-study-using-open-source-tools
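The spreadsheet arithmetic mentioned in this message can also be done in R. A sketch, using only the four variance components printed in the second lmer() summary of the post. The grouping of components into Repeatability, Reproducibility and Part-To-Part is inferred from the Minitab table in the post (A as part, B and A*B as reproducibility); the tolerance is not stated there, so the %Tolerance column is left out.

```r
# Variance components copied from the second lmer() summary in the post.
vc <- c("A:B" = 0.01125, B = 0.01750, A = 0.08250, Residual = 0.00875)

# Combine the components into the rows of the gage table.
var_tab <- c(
  "Total Gage R&R"  = vc[["Residual"]] + vc[["B"]] + vc[["A:B"]],
  "Repeatability"   = vc[["Residual"]],
  "Reproducibility" = vc[["B"]] + vc[["A:B"]],
  "Part-To-Part"    = vc[["A"]],
  "Total Variation" = sum(vc)
)

# Standard deviation, study variation (6 * SD) and percent study variation.
sd_tab <- sqrt(var_tab)
gage <- data.frame(
  StdDev      = round(sd_tab, 6),
  StudyVar    = round(6 * sd_tab, 5),
  PctStudyVar = round(100 * sd_tab / sd_tab[["Total Variation"]], 2)
)
print(gage)
```

With a fitted model at hand, the components could presumably be taken from `as.data.frame(VarCorr(fm))` instead of being typed in.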
[R] class-specific Gini metrics for Random Forest?
  library(randomForest)
  data(iris)
  fit <- randomForest(Species ~ ., data=iris, importance=TRUE)
  fit.imp <- importance(fit)
  fit.imp

Columns 1-3 of fit.imp show the class-specific variable importance for the Mean Decrease Accuracy measure (MDA). Is there a way to calculate class-specific Gini metrics rather than the default class-specific MDA? Simply setting "importance(fit, type=2)" doesn't do it.

I really want to calculate these metrics. I was about to start trying to code a way to do it, but thought I would ask here first.

Many thanks for any help or pointers--I hope I missed something simple.
Re: [R] reshape: melt and cast
"1152", "1153", "1154", "1156", "1158", "1161", "1164", "1179", "1182", "1183", "1191", "1196", "1197", "1198", "1199", "1200", "1201", "1203", "1205", "1207", "1208", "1209", "1214", "1216", "1219", "1220", "1222", "1223", "1224", "1225", "1226", "1229", "1236", "1237", "1238", "1240", "1241", "1243", "1245", "1246", "1248", "1254", "1255", "1256", "1257", "1260", "1262", "1264", "1268", "1270", "1272", "1278", "1279", "1280", "1282", "1283", "1287", "1288", "1292", "1293", "1297", "1310", "1311", "1315", "1329", "1332", "1333", "1343", "1346", "1347", "1352", "1354", "1355", "1356", "1360", "1368", "1369", "1370", "1378", "1398", "1400", "1403", "1404", "1411", "1412", "1420", "1421", "1423", "1424", "1426", "1428", "1432", "1433", "1435", "1436", "1438", "1439", "1440", "1441", "1443", "1444", "1446", "1447", "1448", "1449", "1450", "1453", "1454", "1456", "1459", "1460", "1461", "1462", "1463", "1468", "1471", "1475", "1478", "1481", "1482", "1487", "1488", "1490", "1493", "1495", "1497", "1503", "1504", "1508", "1509", "1511", "1513", "1514", "1515", "1522", "1524", "1525", "1526", "1527", "1528", "1529", "1532", "1534", "1536", "1538", "1539", "1540", "1543", "1550", "1551", "1552", "1554", "1555", "1556", "1558", "1559"), class = "factor"), RaterName = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("cwormhoudt", "zspeidel"), class = "factor"), SI1 = c(1L, 1L, 1L, 1L, 1L, 1L), SI2 = c(3L, 2L, 2L, 3L, 3L, 2L), SI3 = c(3L, 2L, 3L, 3L, 3L, 2L), SI4 = c(1L, 1L, 1L, 1L, 1L, 1L), SI5 = c(1L, 1L, 1L, 1L, 1L, 1L), SI6 = c(1L, 1L, 1L, 1L, 1L, 1L), SI7 = c(1L, 1L, 1L, 2L, 2L, 1L), SI8 = c(1L, 1L, 1L, 1L, 1L, 1L), SI9 = c(1L, 1L, 1L, 1L, 1L, 1L), SI10 = c(1L, 1L, 1L, 2L, 2L, 1L), SI11 = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("QCode", "PID", "RaterName", "SI1", "SI2", "SI3", "SI4", "SI5", "SI6", "SI7", "SI8", "SI9", "SI10", "SI11"), row.names = 2456:2461, class = "data.frame") I am trying to use the melt and cast functions to re-arrange to have column names QCode, PID, 
sItem, cwormhoudt, zspeidel. Under each of the last two columns I want the values that correspond to each of the RaterNames.

So, I melt the data like this:

  mratings = melt(ratings, variable_name="sItem")

Then cast the data like this:

  > outData = cast(mratings, QCode + PID + sItem ~ RaterName)
  Aggregation requires fun.aggregate: length used as default

But the value columns appear to be displaying counts and not the original values.

  > dput(head(outData))
  structure(list(QCode = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("APPEAR", "FEAR", "FUN", "GRAT", "GUILT", "Joy", "LOVE", "UNGRAT"), class = "factor"), PID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1123", "1136", "1137", "1142", "1146", "1147", "1148", "1149", "1152", "1153", "1154", "1156", "1158", "1161", "1164", "1179", "1182", "1183", "1191", "11
[R] reshape: melt and cast
Hi,

I have data that looks like this:

  > head(ratings)
    QCode  PID  RaterName SI1 SI2 SI3 SI4 SI5 SI6 SI7 SI8 SI9 SI10 SI11
  1 GUILT 1123 cwormhoudt   2   2   3   1   1   1   3   3   3    2    1
  2  LOVE 1123 cwormhoudt   1   2   3   2   1   1   1   1   1    1    3
  3 GUILT 1136 cwormhoudt   1   2   3   1   1   1   2   3   2    2    1
  4  LOVE 1136 cwormhoudt   1   2   3   1   1   1   1   1   1    1    2
  5 GUILT 1137 cwormhoudt   2   2   2   1   1   1   2   3   1    2    1
  6  LOVE 1137 cwormhoudt   1   3   4   1   1   1   1   1   1    1    4

  > tail(ratings)
        QCode  PID RaterName SI1 SI2 SI3 SI4 SI5 SI6 SI7 SI8 SI9 SI10 SI11
  2456    FUN 1555  zspeidel   1   3   3   1   1   1   1   1   1    1    1
  2457    FUN 1556  zspeidel   1   2   2   1   1   1   1   1   1    1    1
  2458    FUN 1558  zspeidel   1   2   3   1   1   1   1   1   1    1    1
  2459 APPEAR 1558  zspeidel   1   3   3   1   1   1   2   1   1    2    1
  2460 APPEAR 1559  zspeidel   1   3   3   1   1   1   2   1   1    2    1
  2461    FUN 1559  zspeidel   1   2   2   1   1   1   1   1   1    1    1

I am trying to use the melt and cast functions to re-arrange it to look like this:

     QCode  PID sItem cwormhoudt zspeidel
  1 APPEAR 1123   SI1          1        1
  2 APPEAR 1123   SI2          4        1
  3 APPEAR 1123   SI3          1        2
  4 APPEAR 1123   SI4          3        1
  5 APPEAR 1123   SI5          1        1
  6 APPEAR 1123   SI6          1        3

So, I melt the data like this:

  mratings = melt(ratings, variable_name="sItem")

Then cast the data like this:

  > outData = cast(mratings, QCode + PID + sItem ~ RaterName)
  Aggregation requires fun.aggregate: length used as default

But the value columns appear to be displaying counts and not the original values:

  > head(outData)
     QCode  PID sItem cwormhoudt zspeidel
  1 APPEAR 1123   SI1          1        1
  2 APPEAR 1123   SI2          1        1
  3 APPEAR 1123   SI3          1        1
  4 APPEAR 1123   SI4          1        1
  5 APPEAR 1123   SI5          1        1
  6 APPEAR 1123   SI6          1        1
  > which(outData$zpeidel==3)
  integer(0)

How do I prevent cast from aggregating the data according to counts? Am I doing something wrong?

Thanks in advance.

MP
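The "length used as default" message in this post appears when the variables in the cast formula do not identify every row uniquely, so several values compete for one cell. A base-R sketch of how such rows could be located before casting; the small `mratings` data frame below is hypothetical and only mimics the shape of the melted data.

```r
# Hypothetical long-format data, shaped like the output of melt().
# GUILT / 1123 / SI1 was rated twice by zspeidel, so that cell is ambiguous.
mratings <- data.frame(
  QCode     = c("GUILT", "GUILT", "GUILT", "LOVE", "LOVE"),
  PID       = c("1123", "1123", "1123", "1123", "1123"),
  RaterName = c("cwormhoudt", "zspeidel", "zspeidel", "cwormhoudt", "zspeidel"),
  sItem     = "SI1",
  value     = c(2L, 1L, 3L, 1L, 1L),
  stringsAsFactors = FALSE
)

# The cast formula QCode + PID + sItem ~ RaterName uses these four
# variables to address a cell.
key <- mratings[c("QCode", "PID", "sItem", "RaterName")]

# Flag every row whose key occurs more than once.
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
print(mratings[is_dup, ])

# Number of values per cell; any count above 1 forces an aggregation.
cell_counts <- aggregate(value ~ QCode + PID + sItem + RaterName,
                         data = mratings, FUN = length)
print(cell_counts)
```

If duplicates turn up, either another identifying variable has to enter the formula or an explicit aggregation function (a mean, say) has to be chosen.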
[R] hurdle control and optim
I was hoping someone may be able to help with the following. I fit the model below using the pscl package. I am modelling catch data (about 17,000 entry points), so there are lots of zeros.

  fit.hurdle.bin = hurdle(Catch ~ Beach + Region + Year + Decade + Month + Season + Whale + Sex + Size + meantemp + meanviz + offset(log(Length.nets..km.)),
                          dist = "poisson", zero.dist = "binomial", link = "logit", trace = T)

The model output tells me that:

  Warning message:
  In sqrt(diag(object$vcov)) : NaNs produced

(against year)

I then use hurdle control with L-BFGS-B to set some parameter controls to solve this issue, but get the warning message:

  L-BFGS-B needs finite values of 'fn'
  In addition: Warning message:
  In optim(fn = countDist, gr = countGrad, par = c(start$count, if (dist == :
    method L-BFGS-B uses 'factr' (and 'pgtol') instead of 'reltol' and 'abstol'

How do I write the script for hurdle control to solve these issues? Any help would be really appreciated.

All the best
Matt

Dr. Matt Dicken
Senior Scientist
Telephone: 0315660400 | Fax: 0315660493 | Email: m...@shark.co.za
Physical Address: 1a Herrwood Drive, Umhlanga Rocks, 4320 | www.shark.co.za
Re: [R] hurdle control and optim
Dear Achim,

Apologies for the cross posting and confusion. I really appreciate the help.

All the best
Matt

-----Original Message-----
From: Achim Zeileis [mailto:achim.zeil...@r-project.org]
Sent: 17 August, 2015 10:06 PM
To: Matt Dicken m...@shark.co.za
Cc: r-help@r-project.org
Subject: Re: [R] hurdle control and optim

Please refrain from cross-posting. The same request was sent to the author of hurdle(), R-help, and StackOverflow (where it was already answered). Also, do provide self-contained and reproducible code as would be appropriate in any of the three cases.
[R] Reproducing a 3d yield curve plot from New York Times
, 1325203200, 1327968000, 1330473600, 1333065600, 1335744000, 1338422400, 1340928000, 1343692800, 1346371200, 1348790400, 1351641600, 1354233600, 1356912000, 1359590400, 1362009600, 1364428800, 136728, 1369958400, 1372377600, 1375228800, 1377820800, 1380499200, 1383177600, 1385683200, 1388448000, 1391126400, 1393545600, 1396224000, 1398816000, 1401408000, 1404086400, 1406764800, 1409270400, 1412035200, 1414713600, 1417132800, 1419984000, 1422403200 ), tzone = "UTC", tclass = "Date"), .Dim = c(301L, 11L), .Dimnames = list( NULL, c("1M", "3M", "6M", "1Y", "2Y", "3Y", "5Y", "7Y", "10Y", "20Y", "30Y")))

  chartSeries3d0(term.structure, r=1, col=c("lightblue","darkblue"), border=NA, theta=45, ltheta=0, shade=0.15, smoother=1, phi=15, scale=FALSE, expand=0.75)

Can anyone suggest a different package to work with to get closer to the above-mentioned output? I'm interested in figuring out how to smooth the color transitions, add the grid/gridlines and use a different set of color gradients when the values are negative. Also, I realize that my colors transition along the wrong axis (1M, 3M, etc.) rather than along y.

Thanks in advance. I've tried to find a reference to this in the archives and have come up empty. As well, I've tried to make this reproducible.

Matt
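A sketch of one base-graphics route to two of the wishes in this message: colouring by the plotted value rather than by column, and using a separate gradient below zero. The matrix `z` is a made-up stand-in for the term-structure data (rows as dates, columns as maturities), and the palette endpoints are arbitrary choices. persp() colours facets rather than grid points, so each facet is coloured by the mean of its four corners.

```r
# Hypothetical surface: 30 dates by 11 maturities, with some negative values.
n_dates <- 30
n_mat   <- 11
z <- outer(seq(-1, 3, length.out = n_dates), seq(0, 2, length.out = n_mat), "+")

# Mean height of each facet, computed from its four corner points.
facet_mean <- (z[-1, -1] + z[-1, -n_mat] + z[-n_dates, -1] + z[-n_dates, -n_mat]) / 4

# Separate gradients for positive and negative values.
n_col   <- 50
pos_pal <- colorRampPalette(c("lightblue", "darkblue"))(n_col)
neg_pal <- colorRampPalette(c("mistyrose", "darkred"))(n_col)

# Map a non-negative magnitude onto a palette index between 1 and n_col.
pal_index <- function(v, vmax) pmin(n_col, pmax(1, ceiling(n_col * v / vmax)))

facet_col <- matrix(NA_character_, nrow(facet_mean), ncol(facet_mean))
pos <- facet_mean >= 0
facet_col[pos]  <- pos_pal[pal_index(facet_mean[pos], max(facet_mean[pos]))]
facet_col[!pos] <- neg_pal[pal_index(-facet_mean[!pos], max(-facet_mean[!pos]))]

persp(x = seq_len(n_dates), y = seq_len(n_mat), z = z,
      col = facet_col, border = NA, theta = 45, phi = 15, shade = 0.15,
      expand = 0.75, ticktype = "detailed",
      xlab = "date index", ylab = "maturity index", zlab = "yield")
```

Increasing `n_col` and the grid resolution gives smoother transitions; grid lines would need to be drawn separately, for example by projecting points with `trans3d()`.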
[R] Processing key_column, begin_date, end_date in R
Hi,

I am trying to process a large dataset in R. The dataset contains the following three columns:

  key_column - a unique key identifier
  begin_date - the start date of the active period
  end_date   - the end date of the active period

Example data is here:

  key_column,begin_date,end_date
  123456,2013-01-01,2014-01-01
  123456,2013-07-01,2014-07-01
  789102,2012-03-01,2014-03-01
  789102,2015-02-01,2016-02-01
  789102,2015-02-06,2016-02-06

I want to build a condensed table of key_column and its begin_dates and end_dates. As you can see in the example data above, some begin and end date periods overlap with other begin_date and end_date pairs for the same key_column. In situations where overlap exists I want to have one record for the key_column with the min(begin_date) and the max(end_date).

Can anyone help me build the commands to process this data in R?

Thanks,
Matt

--
Matt Gross
gro...@gmail.com
503.329.4545
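One way to condense the periods described in this message, sketched in base R on the example data from the post. The helper name `merge_periods` is made up for illustration. Within each key the rows are sorted by begin_date, and a new group starts whenever a period begins after every earlier period has ended. Periods that merely touch (begin equal to a previous end) are treated as overlapping here, which is an assumption.

```r
# Example data from the post.
periods <- read.csv(text = "key_column,begin_date,end_date
123456,2013-01-01,2014-01-01
123456,2013-07-01,2014-07-01
789102,2012-03-01,2014-03-01
789102,2015-02-01,2016-02-01
789102,2015-02-06,2016-02-06", colClasses = c("character", "Date", "Date"))

to_date <- function(x) as.Date(x, origin = "1970-01-01")

# Merge the overlapping periods of a single key (hypothetical helper).
merge_periods <- function(df) {
  df    <- df[order(df$begin_date), ]
  begin <- as.numeric(df$begin_date)
  end   <- as.numeric(df$end_date)
  # Latest end date seen so far, row by row.
  running_end <- cummax(end)
  # A period opens a new group if it starts after all earlier ones ended.
  new_group <- c(TRUE, begin[-1] > running_end[-length(end)])
  grp <- cumsum(new_group)
  data.frame(
    key_column = df$key_column[1],
    begin_date = to_date(unname(vapply(split(begin, grp), min, numeric(1)))),
    end_date   = to_date(unname(vapply(split(end, grp), max, numeric(1)))),
    stringsAsFactors = FALSE
  )
}

condensed <- do.call(rbind, lapply(split(periods, periods$key_column), merge_periods))
rownames(condensed) <- NULL
print(condensed)
```

For a very large dataset the same grouping logic could be carried over to a grouped operation in data.table or dplyr instead of `split()` and `lapply()`.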
Re: [R] global environment
Rewrite it with spaces between your assigns and numbers. This line is unclear to me:

  if(rst[i]-3 rst[i]=-3)

Is it supposed to be rst[i] - 3, or rst[i] -3? R might be misinterpreting what you're trying to get it to do.

On Mon, Jan 12, 2015 at 1:18 AM, Methekar, Pushpa (GE Transportation, Non-GE) pushpa.methe...@ge.com wrote:

> Hi
>
> I am trying to make some changes in a data frame and return it from a function. This is my function:
>
> rm.outliers = function(model,xsys) {
>   rst = rstudent(model)
>   outliers-vector(numeric,10)
>   xsys-xsys
>   for(i in 1:length(rst)) {
>     if(rst[i]-3 rst[i]=-3) {
>       #print(this is not outlier)
>       print(i)
>     } else {
>       print(this is an outlier)
>       print(i)
>       outliers[i]-c(i)
>       print(outliers)
>     }
>     i-i+1
>   }
>   xsys-xsys[-outliers,]
>   print( printing rows)
>   nrow(xsys)
>   return(xsys)
> }
>
> After returning the xsys data frame it's not making changes in my global environment data frame. I tried assign and - but no use.
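On the underlying question in this thread: a function works on its own copy of a data frame, so the caller sees a change only when the returned value is assigned. A sketch under stated assumptions: the cutoff of 3 on the absolute studentized residual is a guess at what the garbled condition meant, the name `rm_outliers` is made up, and the built-in `cars` data stands in for the poster's data.

```r
# Return the rows of `data` whose studentized residual is within the cutoff.
# Assumes `model` was fitted on all rows of `data`, in the same order.
rm_outliers <- function(model, data, cutoff = 3) {
  rst  <- rstudent(model)
  keep <- !is.na(rst) & abs(rst) <= cutoff
  data[keep, , drop = FALSE]
}

fit <- lm(dist ~ speed, data = cars)

# Calling the function alone leaves `cars` untouched ...
invisible(rm_outliers(fit, cars))
n_before <- nrow(cars)

# ... the result only persists when it is assigned to a name.
cars_clean <- rm_outliers(fit, cars)
n_after <- nrow(cars_clean)

print(c(before = n_before, after = n_after))
```

The logical index also avoids the loop, and with it the problem that `xsys[-outliers, ]` drops every row when no outlier is found and the index is all zeros.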
[R] Parsing Google Finance page data?
Hi,

I'm wondering if anyone can point me to code to parse data on Google Finance pages, i.e. parse the results of a URL request such as this:

http://www.google.com/finance?q=apple

I know how to return the contents of the page; it's figuring out the best tools to parse it that I'm interested in, and hopefully someone has already done this. (For what it is worth, the only info I am looking for is the ticker, exchange, currency and Mkt Cap datapoint.)

Thanks in advance for any help - scraping is not my strong suit.

Matt
Re: [R] Parsing Google Finance page data?
FWIW, this is the kludge I came up with. The idea is that I only know the name of the company and not the ticker/exchange. So the following admittedly doesn't work in all cases (e.g. Time Warner). So if anyone alternatively knows how to return a list of tickers/exchanges of companies matching a name, that would be helpful. (Though that question should probably go to the finance list.)

In any case, thanks in advance for any thoughts put towards this.

Matt

  library(RCurl)
  library(xts)
  library(XML)

  # want to return results of this
  # http://www.google.com/finance?q=ibm
  coname <- "ibm"
  baseurl <- paste("http://www.google.com/finance?q=", coname, sep="")

  # Read and parse HTML file
  doc.html = htmlTreeParse(baseurl, useInternalNodes=TRUE)
  tables <- readHTMLTable(doc.html, which=2, as.data.frame=T, stringsAsFactors = FALSE)
  mktcap <- tables[4,2]

  doc.text = unlist(xpathApply(doc.html, '//script', xmlValue))
  block <- doc.text[11]
  exchangeticker <- unlist(strsplit(block,'\n'))[11]

  doc.text = unlist(xpathApply(doc.html, '//div', xmlValue))
  currency <- doc.text[60]

  print(mktcap)
  print(exchangeticker)
  print(currency)
[R] Problem Invoking System Commands from R
Hello,

First please keep in mind I am not a programmer and know very little about R. I am running the 64bit version of R on a Windows 8.1 machine. I am trying to run a script (which I have successfully run in the past) to download some weather data from a NOAA ftp site.

When I attempt to run the following command:

  system("wget -P data/raw ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2013/724620-23061-2013.gz")

it returns status 127, which as I understand simply means the command will not run.

If I go directly to my command prompt in Windows, navigate to my working directory, and run

  wget -P data/raw ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2013/724620-23061-2013.gz

the command runs and the file downloads without a problem.

Playing around, it seems I can't invoke any system commands from R. Even a simple system("dir") returns status 127. I have moved to a new computer since I last successfully ran this script...I'm wondering if this might be a permissions issue or other security setting preventing me from invoking system commands.

Any ideas?

-Matt
Re: [R] Problem Invoking System Commands from R
I appreciate the feedback.

1) The paths are properly set...I only wonder if the spaces in the path to wget.exe are problematic for R. The full path (C:\\Program Files (x86)\\GnuWin32\\bin) is properly included in the return list for Sys.getenv("PATH"). Sys.which("wget") returns:

  C:\\PROGRA~2\\GnuWin32\\bin\\wget.exe

Note that in this return, the folder 'Program Files (x86)' was truncated. Not sure if that is a problem in this case. Also as mentioned, wget works fine directly from the Windows CMD line, so it strikes me as an issue calling a system command from R as opposed to a problem with the command itself.

2) 'dir' is a recognized command at the Windows command line...but it is somewhat irrelevant as I was only using it to determine whether any calls to the Windows command line from R were working...it is not essential to the script.

One further point: I booted up my old machine last night and reinstalled R and wget...and was successfully able to run the script. The old machine is Windows XP versus Windows 8.1 on my new machine. Perhaps this confirms it is a Windows permission issue and not an R problem?

-Matt

On Saturday, October 11, 2014 3:00 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

> Please do follow the posting guide and do not send HTML: it gets mangled.
>
> There are two issues here:
>
> 1) Paths. Use Sys.which("wget") to see if the command is on your path. I
> suspect it is not, and you need to set the path when running R in the same
> way as is done for your shell. Compare the setting of PATH in your shell
> with Sys.getenv("PATH") in R, and use Sys.setenv() to set it (or do so on
> the shortcut used to start R: see the rw-FAQ).
>
> 2) AFAIR 'dir' is not a system command. See ?system (on Windows) and note
> that shell() is required for some commands: this is one.
>
> These are not R issues, and you may need to seek local Windows help.
>
> --
> Brian D. Ripley, rip...@stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
> 1 South Parks Road, Oxford OX1 3TG, UK
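The two checks suggested in this thread can be put into a few lines. A sketch, with `run_command` a made-up helper name: Sys.which() shows whether R can see an executable on its PATH, and on Windows commands built into the command interpreter (such as dir or echo) go through shell() rather than system().

```r
# Is wget visible on the PATH that this R session uses?
wget_path <- Sys.which("wget")
if (nzchar(wget_path)) {
  message("wget found at ", wget_path)
} else {
  message("wget is not on the PATH seen by R")
}

# Run a command through the command interpreter and capture its output.
# shell() exists only on Windows; elsewhere system() already uses a shell.
run_command <- function(cmd) {
  if (.Platform$OS.type == "windows") {
    shell(cmd, intern = TRUE)
  } else {
    system(cmd, intern = TRUE)
  }
}

out <- run_command("echo hello")
print(out)
```

If Sys.which() returns an empty string while the command works in a terminal, comparing `Sys.getenv("PATH")` with the terminal's PATH is the next step.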
[R] Help with continuous color plot
Hi,

I have a matrix of data, with the rows representing observations and the columns representing various values that the observation can take on. In other words, each row can be thought of as a sampling of the density function/histogram associated with the range of values for that observation.

I'd like to graph these with a shaded color, rather than as lines. So a given observation would have the darkest shade at the mean and the shading would lighten for values that approached the tails. In a sense this is like a ribbon chart, but where there are many confidence bands.

I think the example near the bottom of this page

http://bconnelly.net/2013/10/creating-colorblind-friendly-figures/

starts to get at what I want. But when I tried to get a ribbon, I got an error message saying:

  Error: Aesthetics can not vary with a ribbon

Can anyone point me to an example that accomplishes my task, or give me some ideas as to how to code this? Below is a reproducible dataset and the code I ran that generated the above error. And apologies in advance if I have overlooked some obvious source - I'm not exactly sure what keywords to search for.
Regards, Matt testdataset - structure(c(0.703482475602795, 0.708141442616021, 0.696373713631662, 0.670284015871304, 0.675183812793659, 0.690440437259122, 0.717483375152826, 0.775328205198994, 0.848374059782512, 0.869939471712489, 0.86329313061477, 0.842138830353923, 0.819853961383293, 0.808038546509378, 0.826626282345039, 0.855428819162732, 0.873943618483253, 0.906412218904192, 0.95345525957727, 0.941481792397259, 0.923791753474186, 0.909206164221341, 0.847283523824235, 0.774333551860785, 0.723440114819687, 0.653247411286407, 0.585889004137383, 0.516531935718585, 0.458855598305008, 0.422596378188962, 0.385800210249005, 0.363663809831211, 0.703482475602795, 0.708055808109959, 0.696379276680681, 0.686643131558789, 0.702628930558265, 0.736010723583024, 0.790795207667811, 0.843997296035071, 0.872447231982615, 0.876357159885425, 0.852095141662599, 0.815122741092172, 0.759163100114952, 0.737079598996168, 0.755626127703219, 0.76375495269533, 0.757290640161052, 0.754301244147121, 0.738872719902144, 0.712590028244082, 0.707690675037336, 0.707234385372842, 0.708720518303698, 0.723271948541464, 0.738173079905318, 0.772161522113349, 0.776237486574842, 0.775666977944939, 0.764229462885737, 0.758916671383124, 0.742887393474484, 0.741362343479079, 0.703482475602795, 0.70722192044612, 0.694934601341247, 0.675623005679584, 0.67355293987199, 0.67514195581405, 0.701338223542176, 0.770084545123592, 0.82661391194, 0.815331595124185, 0.801265437257298, 0.768736104243487, 0.698903427959817, 0.654393072393584, 0.646507677289504, 0.606308031283892, 0.574521529688064, 0.550931914275617, 0.518538683619987, 0.495773346159491, 0.482784058725618, 0.473031502762785, 0.462940836756943, 0.455472910452526, 0.457374752189383, 0.468449683385787, 0.469177346159405, 0.47981744053419, 0.500517935694715, 0.521161553352487, 0.538278248678118, 0.545834896270532, 0.703482475602795, 0.707475643569319, 0.695699528962731, 0.695460540915422, 0.705063229294573, 0.694190083263775, 0.676451221936696, 
0.66113162065, 0.627150885842318, 0.592467979293877, 0.556197511727567, 0.524883713023224, 0.484571801496662, 0.427784904562, 0.370137413134906, 0.331233866457343, 0.292181528642806, 0.265504971226103, 0.239129968439056, 0.21258454640671, 0.184521419432522, 0.160633576032345, 0.135729972994914, 0.115111431576686, 0.0933784744252792, 0.0672765522562478, 0.0397992726679255, 0.0118179662548541, NA, NA, NA, NA, 0.703482475602795, 0.70791132542366, 0.696508162877812, 0.672357035115831, 0.679831378223931, 0.702075998432084, 0.736057349706643, 0.759252979404642, 0.739391321260192, 0.706608353324493, 0.65348693474, 0.607986236497692, 0.600942686427268, 0.602450590412635, 0.594096281507138, 0.598414292518021, 0.570859444977738, 0.50462737404968, 0.441225469913529, 0.37010584373766, 0.299554326292306
Re: [R] Help with continuous color plot
No, I don't think so. And I've wondered if I described the problem clearly, so I put together the following hack, which seems to be what I want:

    # create a matrix to hold the values corresponding to various percentiles
    vals <- matrix(0, 32, 21)
    # for each row in the data, collect info on the distribution
    for (i in 1:32) {
      obs <- testdataset[i, ]
      vals[i, ] <- quantile(obs, probs = seq(0, 1, 0.05))
    }
    # pick the last observation to get a distribution of colors
    cols <- sort(densCols(vals[32, ]))
    # set up a blank plot
    matplot(vals, type = "n", xlab = 'yrs', ylab = 'Ratio', main = 'Projected ratios')
    # plot confidence bands as polygons, ideally overlaying light to dark
    for (i in 1:10) {
      lines(vals[, i], col = cols[22 - i])
      lines(vals[, 22 - i], col = cols[22 - i])
      polygon(c(seq(1:32), rev(seq(1:32))), c(vals[, 22 - i], rev(vals[, i])), col = cols[22 - i], border = NA)
    }
    # plot a line for the average case
    lines(vals[, 11], col = "black")

If anyone can suggest a more efficient or effective way of doing this, I'd be grateful. In a nutshell, I am trying to find a visually clean way of showing the output of a Monte Carlo analysis. Thanks again for everyone's attention. Matt

On 2014-09-24 14:14, Federico Lasa wrote:
Does this resemble what you're after?

    library(reshape2)
    tst <- melt(testdataset)
    library(ggplot2)
    ggplot(tst, aes(x = Var1, y = Var2, fill = value)) + geom_tile() + scale_fill_gradient2(low = "white", high = "white", mid = scales::muted("blue"), midpoint = 0.6148377)
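The quantile-band idea in the reply above can be written more compactly. This is a minimal base-graphics sketch under stated assumptions: sims is simulated data standing in for testdataset, gray() shades replace densCols(), and na.rm = TRUE is added because the posted data contain NA values. Note that column 11 of the quantile matrix is the median, not the mean.

```r
set.seed(1)
# Hypothetical stand-in for testdataset: 32 time steps (rows) x 200 simulated paths.
sims <- matrix(rnorm(32 * 200, mean = 0.7, sd = 0.1), nrow = 32)

probs <- seq(0, 1, 0.05)
# apply() returns 21 x 32 here, so transpose to get one row per time step.
vals <- t(apply(sims, 1, quantile, probs = probs, na.rm = TRUE))

n_bands <- (length(probs) - 1) / 2                   # 10 nested bands
cols <- gray(seq(0.9, 0.3, length.out = n_bands))    # light (outer) to dark (inner)
x <- seq_len(nrow(vals))

matplot(x, vals, type = "n", xlab = "yrs", ylab = "Ratio", main = "Projected ratios")
for (i in seq_len(n_bands)) {
  lower <- vals[, i]
  upper <- vals[, length(probs) + 1 - i]
  # Widest band is drawn first, so darker inner bands overlay it.
  polygon(c(x, rev(x)), c(upper, rev(lower)), col = cols[i], border = NA)
}
lines(x, vals[, n_bands + 1], lwd = 2)               # the 50% quantile (median)
```

Drawing from the outermost band inwards gives the "darkest near the centre" effect without needing a separate ribbon per aesthetic.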
Re: [R] plotly
Hey Shane, Sorry you're having trouble. The quick start is here and walks through installation: https://plot.ly/r/. A note: if you're on Windows, you'll need Rtools to install devtools: http://cran.rstudio.com/bin/windows/Rtools/. As Sarah noted, Plotly isn't on CRAN. If you're having trouble, please let us know and we're happy to try and help, or open an issue on GitHub: https://github.com/ropensci/plotly/issues. M

On Mon, Jul 21, 2014 at 4:05 AM, Sarah Goslee sarah.gos...@gmail.com wrote:
Hi, On Monday, July 21, 2014, Shane Carey careys...@gmail.com wrote:
Hey, What version of R is required to use the plotly library? I have R version 3.0.1 and it will not allow me to install the devtools package or the plotly package. I have googled and searched to see what version of R I should be running but could not find anything.

It's always a good idea to upgrade to the current release of R (3.1.1) before asking questions like that. But in general, packages on CRAN clearly state what version of R is needed, as in http://cran.r-project.org/web/packages/devtools/index.html

    devtools: Tools to make developing R code easier
    Collection of package development tools
    Version: 1.5
    Depends: R (≥ 3.0.2)

For packages not on CRAN, like plotly, you may need to download the package and check the DESCRIPTION file. Sarah -- Sarah Goslee http://www.stringpage.com http://www.sarahgoslee.com http://www.functionaldiversity.org
[R] Plotly and rOpenSci: R and ggplot2 interactive, online, collaborative plotting
Hello R help, My name is Matt, and I'm a co-founder at Plotly (http://plot.ly), an online graphing and analytics project. We're building an R library (http://plot.ly/r) as part of the rOpenSci (http://ropensci.org) project. You can use it to make interactive, web-based R and ggplot2 plots. The plots are shareable, embeddable, and drawn with D3 (a JS graphing library). You can also make R and ggplot2 plots into collaborative, web-based plots. The project is still definitely in beta, so we'd appreciate hearing your suggestions and issues. Here is how the ggplot2 sharing works: ropensci.org/blog/2014/04/17/plotly/

Another fun aspect of it is that you can collaboratively plot in R, Python, MATLAB, and from our web app. That means you could work from R with a team working from Excel and work on the same plots and data. And your data and plots always stay together in your files. Here's how that looks in an IPython Notebook: http://nbviewer.ipython.org/gist/msund/61cdbd5b22c103fffb84

We'd love to hear your thoughts, feedback, and suggestions. Our goal is to be a GitHub for sharing and collaborating on data and plots. We're on GitHub (http://github.com/ropensci/plotly), and eager to hear from you. Thanks so much for any and all help and advice. All the best, Matt
[R] Plotly Beta: Online Plotting with R
Hi R Users, My name is Matt, and I'm a part of Plotly (http://plot.ly). We recently released an R plotting library (http://plot.ly/api/r) for making publication-quality graphs online. We wanted to let the folks on this list know. A basic summary:

- Make publication-quality, online plots with a GUI and R.
- Fits, error bars, stats, and functions.
- Embed interactive graphs in an iframe (Washington Post example: http://washingtonpost.com/blogs/wonkblog/wp/2013/06/14/do-low-taxes-on-the-rich-leave-the-middle-class-with-lower-wages/).
- Collaborative, so you can edit with others, comment on your graphs, and save revision history.
- Free for public use, and you own your data (like GitHub).

If you're interested, here are a few examples you can check out: multiple axes scales with old faithful data (http://blog.plot.ly/post/69647810381/multiple-axes-scales-with-old-faithful-data), an IPython notebook (http://nbviewer.ipython.org/gist/fonnesbeck/8495259) that has an Rmagic example, and a post (http://www.r-bloggers.com/plotly-beta-collaborative-plotting-with-r/) on r-bloggers.

We'd love your feedback, advice, and thoughts. As a new project, expert insights go a long way for us, so please let us know what you think. Happy plotting, Matt
[R] How to assign names to global data frames created in a function
I have several data frames containing similar data. I'd like to pass these data frames to a function for processing. The function would create newly named global data frames containing the processed data. I cannot figure out how to assign names to the data frames in Step 1 or Step 2 in the following example:

    # sample function in pseudo code
    processdf <- function(df, prefix) {
      # df - data frame containing data for processing
      # prefix - string to become the first part of the names of the resulting data frames

      # Step 1 - process df into several subsets
      df1 <- subset(df, df$cond1 & df$cond2 ...)
      df2 <- subset(df, df$cond3 & df$cond4 ...)
      df3 <- subset(df, df$cond5 & df$cond6 ...)
      # and so on for many more steps with resulting data frames

      # Step 2 - rename the resulting global data frames
      rename df1 to prefix + cond1cond2
      rename df2 to prefix + cond3cond4
      rename df3 to prefix + cond5cond6
      # and so on for the remaining data frames
    }

Example using data frames frame1 and frame2:

    processdf(frame1, "frame1") # produces these data frames: frame1cond1cond2 frame1cond3cond4 frame1cond5cond6
    processdf(frame2, "frame2") # produces these data frames: frame2cond1cond2 frame2cond3cond4 frame2cond5cond6

Thank you for your thoughts, Matt
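One possible way to approach the pseudocode above is to return a named list instead of creating globals inside the function, and to unpack the list only if separate objects are really needed. This is a minimal sketch, not a definitive answer: process_df and the logical columns cond1 to cond4 are hypothetical stand-ins for the real function and conditions.

```r
# Sketch only: cond1..cond4 are hypothetical logical columns standing in for the real conditions.
process_df <- function(df, prefix) {
  pieces <- list(
    cond1cond2 = subset(df, cond1 & cond2),
    cond3cond4 = subset(df, cond3 & cond4)
  )
  # Build the final names from the prefix, e.g. "frame1" + "cond1cond2".
  names(pieces) <- paste0(prefix, names(pieces))
  pieces
}

frame1 <- data.frame(
  value = 1:4,
  cond1 = c(TRUE, TRUE, FALSE, FALSE),
  cond2 = c(TRUE, FALSE, TRUE, FALSE),
  cond3 = c(FALSE, FALSE, TRUE, TRUE),
  cond4 = c(FALSE, TRUE, TRUE, TRUE)
)

result <- process_df(frame1, "frame1")
names(result)
result$frame1cond1cond2

# If separate global objects are really wanted, the list can be unpacked:
# list2env(result, envir = globalenv())
```

Keeping the pieces in a list makes it easy to loop over them later with lapply(), which separate global objects do not allow.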
[R] Computing Median for Subset of Data
From my larger data set I created a subset of it by using:

    subset_1 <- subset(timeuse, IndepTrans = 1, Physical = 1)

where my larger data set is timeuse and the smaller subset is subset_1. The subset was conditioned on IndepTrans equaling 1 in the data and Physical equaling 1 as well. I want to be able to compute the median of a variable first for the larger data set timeuse then for the subset file subset_1. How do I identify to R which data set I'm wanting the median computed for? I've tried many possibilities but for some reason can't figure it out. Thanks, Matt.
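A hedged sketch of one answer: the data set is chosen by prefixing the column with the data frame name and `$`. Separately, the call subset(timeuse, IndepTrans = 1, Physical = 1) probably does not filter anything, because `=` names an argument rather than testing equality; the condition would need `==` and `&`. The toy timeuse data and the minutes column below are made up for illustration.

```r
# Toy data: the column name `minutes` and all values are hypothetical.
timeuse <- data.frame(
  IndepTrans = c(1, 0, 1, 1),
  Physical   = c(1, 1, 0, 1),
  minutes    = c(10, 20, 30, 50)
)

# `==` tests equality and `&` combines the two conditions into one logical filter.
subset_1 <- subset(timeuse, IndepTrans == 1 & Physical == 1)

# The data frame before `$` tells R which data set the variable comes from.
median_all    <- median(timeuse$minutes, na.rm = TRUE)
median_subset <- median(subset_1$minutes, na.rm = TRUE)
```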
[R] Help searching a matrix for only certain records
Let me start by saying I am rather new to R and generally consider myself to be a novice programmer... so don't assume I know what I'm doing :) I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year dataset of 15-minute data. However, I only need the rows where the column I've named REC.TYPE contains the string SAO or FL-15. My horribly inefficient solution was to search the matrix row by row, test the REC.TYPE column and essentially delete the row if it did not match my criteria. Essentially...

    j <- 1
    for (i in 1:nrow(dataset)) {
      if (dataset$REC.TYPE[j] != "SAO" && dataset$REC.TYPE[j] != "FL-15") {
        dataset <- dataset[-j, ]
      } else {
        j <- j + 1
      }
    }

After watching my code get through only about 10% of the matrix in an hour and slowing with every row... I figure there must be a more efficient way of pulling out only the records I need... especially when I need to repeat this for another 8 datasets. Can anyone point me in the right direction? Thanks! Matt
Re: [R] Help searching a matrix for only certain records
Thank you for your response Jim! I will give this one a try! But a couple of follow-up questions... In my search for a solution, I had seen something stating match() is much more efficient than subset() and will cut down significantly on computing time. Is there any truth to that? Also, I found the following solution which works for matching a single condition, but I couldn't quite figure out how to modify it to search for both my acceptable conditions...

    testdata <- testdata[testdata$REC.TYPE == "SAO", , drop = FALSE]

-Matt

--- On Sun, 3/3/13 (8:00 AM), jim holtman jholt...@gmail.com wrote:
Try this:

    dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
Re: [R] Help searching a matrix for only certain records
I appreciate all the feedback on this. I ended up using this line to solve my problem, just because I stumbled upon it first...

    alldata <- alldata[alldata$REC.TYPE == "SAO" | alldata$REC.TYPE == "FM-15", , drop = FALSE]

But I think Jim's solution would work equally well. I was a bit confused by the relative complexity of the data frames solution, as it seems like more steps than necessary. Thanks again for the input! -Matt

--- On Sun, 3/3/13 (1:29 PM), arun smartpink...@yahoo.com wrote:
HI, You could also use ?data.table()

    n <- 300000
    set.seed(51)
    mat1 <- as.matrix(data.frame(REC.TYPE = sample(c("SAO", "FAO", "FL-1", "FL-2", "FL-15"), n, replace = TRUE), Col2 = rnorm(n), Col3 = runif(n), stringsAsFactors = FALSE))
    dat1 <- as.data.frame(mat1, stringsAsFactors = FALSE)
    table(mat1[, 1])
    #
    #   FAO  FL-1 FL-15  FL-2   SAO
    # 60046 60272 59669 59878 60135
    system.time(x1 <- subset(mat1, grepl("(SAO|FL-15)", mat1[, "REC.TYPE"])))
    #  user  system elapsed
    # 0.076   0.004   0.082
    system.time(x2 <- subset(mat1, mat1[, "REC.TYPE"] %in% c("SAO", "FL-15")))
    #  user  system elapsed
    # 0.028   0.000   0.030
    system.time(x3 <- mat1[match(mat1[, "REC.TYPE"], c("SAO", "FL-15"), nomatch = 0) != 0, , drop = FALSE])
    #  user  system elapsed
    # 0.028   0.000   0.028
    table(x3[, 1])
    #
    # FL-15   SAO
    # 59669 60135
    library(data.table)
    dat2 <- data.table(dat1)
    system.time(x4 <- dat2[match(REC.TYPE, c("SAO", "FL-15"), nomatch = 0) != 0, , drop = FALSE])
    #  user  system elapsed
    # 0.024   0.000   0.025
    table(x4$REC.TYPE)
    # FL-15   SAO
    # 59669 60135

A.K.

- Original Message - From: jim holtman jholt...@gmail.com To: Matt Borkowski mathias1...@yahoo.com Cc: r-help@r-project.org Sent: Sunday, March 3, 2013 11:52 AM Subject: Re: [R] Help searching a matrix for only certain records

If you are using matrices, then here are several ways of doing it for size 300,000. You can determine if the difference of 0.1 seconds is important in terms of the performance you are after. It is taking you more time to type in the statements than it is taking them to execute:

    > n <- 300000
    > testdata <- matrix(
    +     sample(c("SAO ", "FL-15", "Other"), n, TRUE, prob = c(1, 2, 1000))
    +     , nrow = n
    +     , dimnames = list(NULL, "REC.TYPE")
    +     )
    > table(testdata[, "REC.TYPE"])

     FL-15  Other   SAO
       562 299151    287
    > system.time(x1 <- subset(testdata, grepl("(SAO |FL-15)", testdata[, "REC.TYPE"])))
       user  system elapsed
       0.17    0.00    0.17
    > system.time(x2 <- subset(testdata, testdata[, "REC.TYPE"] %in% c("SAO ", "FL-15")))
       user  system elapsed
       0.05    0.00    0.05
    > system.time(x3 <- testdata[match(testdata[, "REC.TYPE"]
    +     , c("SAO ", "FL-15")
    +     , nomatch = 0) != 0
    +     ,, drop = FALSE]
    +     )
       user  system elapsed
       0.03    0.00    0.03
    > identical(x1, x2)
    [1] TRUE
    > identical(x2, x3)
    [1] TRUE

On Sun, Mar 3, 2013 at 11:22 AM, Jim Holtman jholt...@gmail.com wrote:
There are way more efficient ways of doing many of the operations, but you probably won't see any differences unless you have very large objects (several hundred thousand entries), or have to do it a lot of times. My background is in computer performance and for the most part I have found that the easiest/most straightforward ways are fine most of the time. A more efficient way might be:

    testdata <- testdata[match(c('SAO ', 'FL-15'), testdata$REC.TYPE), ]

You can always use 'system.time' to determine how long actions take. For multiple comparisons use %in%. Sent from my iPad
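As a compact recap of the thread, here is a minimal sketch of the %in% filter on a data frame, using made-up toy data. It also illustrates, as far as I can tell, why argument order matters for match(): with the wanted values first, match() returns only the first position of each value, whereas the forms benchmarked above put the column first and test the result against 0.

```r
# Toy data frame; the REC.TYPE values are made up to mirror the thread.
dataset <- data.frame(
  REC.TYPE = c("SAO", "FL-15", "FAO", "SAO", "FL-1"),
  value    = c(1.1, 2.2, 3.3, 4.4, 5.5),
  stringsAsFactors = FALSE
)
wanted <- c("SAO", "FL-15")

# One logical per row, computed in a single vectorized step (no row-by-row deletion).
keep <- dataset$REC.TYPE %in% wanted
filtered <- dataset[keep, , drop = FALSE]

# Wanted values first: only the first matching position of each value comes back.
first_hits <- match(wanted, dataset$REC.TYPE)

# Column first: one entry per row, 0 where the row is not wanted.
per_row <- match(dataset$REC.TYPE, wanted, nomatch = 0) != 0
```

The same filter can be repeated over the other eight datasets by wrapping the two `keep`/`filtered` lines in a small function and calling it with lapply().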
Re: [R] Amelia algorithm
Hi Martin, I helped to develop Amelia, so I can try to take a shot. In a non-mathematical way, Amelia works by filling in missing values with imputed values that are consistent with the observed relationships in the data, plus some random noise. Thus, Amelia creates multiple imputed datasets that have no missingness (the original observed cells remain the same across each imputation, but the filled-in values vary from imputed dataset to dataset) and have the same relationships between and within variables as the original observed data. The difficult part of the problem is estimating the relationships of the observed data since it has all of that missing data in it (the dataset looks like Swiss cheese). We use an EM algorithm to estimate these relationships, but those details are (somewhat) less important. You can find more resources, including a number of papers describing the methods, at our webpage: http://gking.harvard.edu/amelia Hope that helps! Cheers, matt.

~~~ Matthew Blackwell Assistant Professor of Political Science University of Rochester url: http://www.mattblackwell.org

On Mon, Jan 7, 2013 at 6:27 PM, zGreenfelder zgreenfel...@gmail.com wrote:
On Mon, Jan 7, 2013 at 4:29 PM, Martin lh...@gmx.net wrote:
Dear all. First of all, my English isn't very good, but I hope I can convey my concern. I have a general question about the Amelia algorithm. I'm no mathematician or statistician, but I had to use R to impute and analyse some data, and Amelia showed results that fitted my expectations. I'll have to defend my choice soon, but I haven't totally grasped what Amelia does. I'm particularly interested in as simple an explanation as possible of how Amelia imputation works. I've read that it uses a bootstrapping-based algorithm, but how does it choose the values? The data had mainly value 0 (chemical concentrations, water temperature and pH-value). Regards Martin

I'm pretty new here, but a quick Google search suggests that perhaps http://gking.harvard.edu/amelia (and maybe Google Translate) might have some decent pointers for you. I poked at the documentation from that site (a PDF file), and it's quite intense on the mathematics; you may get more from it than I could. There's also a link there to a separate, for this specific package/algorithm (it seems to be called Amelia II; I'm assuming this is the same that you used, if I'm off... sorry about that). HTH -- Even the Magic 8 ball has an opinion on email clients: Outlook not so good.
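To make the non-mathematical description above concrete, here is a toy base-R illustration of the general idea only: missing cells are filled with values implied by the observed relationship plus random noise, and the observed cells stay fixed across imputations. This is explicitly not Amelia's algorithm (it uses a plain regression rather than the bootstrap and EM steps described), and all data and names are simulated for illustration.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 2 * x + rnorm(n)

# Punch holes in y to mimic missing data.
y_obs <- y
y_obs[sample(n, 40)] <- NA
miss <- is.na(y_obs)

# Estimate the observed relationship; lm() drops the incomplete rows by default.
fit <- lm(y_obs ~ x)
sigma_hat <- summary(fit)$sigma

# One imputation: predicted value plus noise for the missing cells only.
impute_once <- function() {
  out <- y_obs
  out[miss] <- predict(fit, newdata = data.frame(x = x[miss])) +
    rnorm(sum(miss), sd = sigma_hat)
  out
}

# Five imputed versions of y, one per column.
imputations <- replicate(5, impute_once())
```

Comparing columns of imputations shows the property described above: rows that were observed are identical in every column, while the filled-in rows differ from one imputation to the next.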
[R] Oracle Approximating Shrinkage in R?
Hi, Can anyone point me to an implementation in R of the oracle approximating shrinkage technique for covariance matrices? Rseek, Google, etc. aren't turning anything up for me. Thanks in advance, Matt Considine
Re: [R] Performing gage R&R study in R w/more than 2 factors
On Mon, Nov 19, 2012, at 16:31, Bert Gunter wrote:
> I believe that you need to consult a local statistician, as there are likely way too many statistical issues here that you do not fully understand. Alternatively, try posting to a statistical list like stats.stackexchange.com, as I think most of your issues are primarily statistical, not R related.

Yes, you are correct. I've actually been working with a statistician within my organization, but the dilemma is that he's a stats guy who knows Minitab, and I'm a software guy who's trying to deploy some tools that are dependent on R. I've basically been trying to match up the output of R with the output of Minitab to check my work. Matt
Re: [R] Performing gage R&R study in R w/more than 2 factors
On Mon, Nov 19, 2012, at 18:26, David Winsemius wrote:
> My guess is that you do not understand the meaning of a random factor. I certainly did not when I first encountered it. All my training had been with ordinary regression and analysis of variance. These are methods for what in mixed models are fixed effects. My opinion is that these terms are completely confusing to the new student of this sort of analysis.

You're absolutely right: the distinction of fixed vs. random factors is confusing. However, I was under the impression that all factors in a gage R&R study were random, since we're trying to determine the sources of variability on the system.

> My guess is that you may just want the output of: lm(vals ~ f1 * f2 * f3, data = yourdat)

I'm trying to get the variance component estimates, and from there, I can calculate the percent tolerance and other interesting statistics. It doesn't look like lm gives me that information, though. FWIW, your formula is the same as what I'm feeding into aov, and the ANOVA table output *does* match up with what Minitab is producing. Matt
[R] Performing gage R&R study in R w/more than 2 factors
Hi everyone, I'm fairly new to R, and I don't have a background in statistics, so please bear with me. ;-) I'm dealing with 2^k factorial designs, and I was just wondering if there's any way to analyze more than two factors of a gage R&R study in R. For example, Minitab has an expanded gage R&R function that lets you include up to eight additional factors besides the usual two that are present in gage studies (parts and operators). If I wanted to include n additional random factors, is there a package or built-in functionality that will allow me to do that? I've been experimenting with the SixSigma package, and that has an ss.rr function which works great, as long as your experiment only contains two factors. I've also been using lmer from lme4 to fit a linear model of my experiment, but the standard deviations generated by lmer don't match what I'm seeing in Minitab. Since all my factors are random, the formula I'm using looks like this:

    vals ~ 1 + (1|f1) + (1|f2) + (1|f3) + (1|f1:f2) + (1|f1:f3) + (1|f2:f3)

What am I doing wrong, and how can I fix it? Thanks, Matt
Re: [R] Opening SAS file using read.sas7bdat() function in sas7bdat library.
Thanks for the helpful comments from others. The KNOWNHOST variable lists the types of file that are known to work with the read.sas7bdat function. It's likely that most files written on Windows platforms will work, even if not listed in KNOWNHOST. If you're feeling experimental, you might just comment out the lines that test against the KNOWNHOST list. Unfortunately, it appears that the file formatting depends on the system where it was originally written. The hypothesis is that sas7bdat files were originally no more than a memory dump of a C structure, or similar. Because C structures may be laid out differently by different compilers (i.e., on different platforms), this may have led to the difficulty apparent here. Regards, Matt
Re: [R] Any good R server-with connection examples
> I want to connect R with HTML/PHP pages to take input from the user, do some statistical processing on it, and show the results on the HTML page again. I searched on the net and found the Rserve package, but the examples are mainly for the Java language, not for PHP. I am wondering how to connect it to PHP-Apache-MySQL. Is there any good tutorial/video which will tell me how to do that? At least tell me a logical way to use it.

Check out http://rapache.net/ rApache connects R and the Apache 2 web server, such that R can act as a server-side scripting language, like PHP. This may be the easiest way, using R, to take user input from the web browser. The site has some decent documentation and links to examples. --Matt
[R] unable to run spatial lag and error models on large data
Hi: First, my apologies for cross-posting. A few days back I posted my queries at R-sig-geo but did not get any response. Hence this post.

I am working on two parcel-level housing datasets to estimate the impact of various variables on home sale prices. I created the spatial weights matrices in ArcGIS 10 using the sale year of the four nearest houses to assign weights. Next, I ran LM tests and then ran the spatial lag and error models using the spdep package. I run into five issues.

Issue 1: When I weight the 10,000-observation first dataset, I get the following message: Non-symmetric neighbours list. Is this going to pose problems while running the regression models? If yes, what can I do? The code and the results are:

    test1.csv <- read.csv("C:/Article/Housing1/NHspwt.csv")
    class(test1.csv) <- c("spatial.neighbour", class(test1.csv))
    of <- ordered(test1.csv$OID)
    attr(test1.csv, "region.id") <- levels(of)
    test1.csv$OID <- as.integer(of)
    test1.csv$NID <- as.integer(ordered(test1.csv$NID))
    attr(test1.csv, "n") <- length(unique(test1.csv$OID))
    lw_test1.csv <- sn2listw(test1.csv)
    lw_test1.csv$style <- "W"
    lw_test1.csv

    Characteristics of weights list object:
    Neighbour list object:
    Number of regions: 10740
    Number of nonzero links: 42960
    Percentage nonzero weights: 0.03724395
    Average number of links: 4
    Non-symmetric neighbours list

    Weights style: W
    Weights constants summary:
          n        nn    S0       S1       S2
    W 10740 115347600 10740 3129.831 44853.33

Issue 2: The spatial lag and error models do not run. I get the following message (the models run on half the data, approx. 5,000 observations; however, I would like to use the entire sample).

    Error: cannot allocate vector of size 880.0 Mb
    In addition: Warning messages:
    1: In t.default(object) : Reached total allocation of 3004Mb: see help(memory.size)
    2: In t.default(object) : Reached total allocation of 3004Mb: see help(memory.size)
    3: In t.default(object) : Reached total allocation of 3004Mb: see help(memory.size)
    4: In t.default(object) : Reached total allocation of 3004Mb: see help(memory.size)

The code for the lag model is:

    fmtypecurrentcombinedlag <- lagsarlm(fmtypecurrentcombined, data = spssnew, lw_test1.csv,
        na.action = na.fail, type = "lag", method = "eigen", quiet = TRUE,
        zero.policy = TRUE, interval = NULL, tol.solve = 1.0e-20)

I am able to read the data file using the filehash package. However, I still get the following error message when I run the models:

    Error in matrix(0, nrow = n, ncol = n) : too many elements specified

Issue 3: For the second dataset, which contains approx. 100,000 observations, I get the following error message when I try to run spatial lag or error models.

    Error in matrix(0, nrow = n, ncol = n) : too many elements specified

The code is:

    fecurrentcombinedlag <- lagsarlm(fecurrentcombined, data = spssall, lw_test2.csv,
        na.action = na.fail, type = "lag", method = "eigen", quiet = NULL,
        zero.policy = TRUE, interval = NULL, tol.solve = 1.0e-20)

Issue 5: When I run LM tests I get the test results but with the following message: Spatial weights matrix not row standardized. Should I be worried about this, considering that I am using the 4-nearest-neighbor rule? The code is:

    lm.LMtests(fmtypecurrent, lw_test1.csv, test = c("LMerr", "LMlag", "RLMerr", "RLMlag", "SARMA"))

Thanks, Shishm
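A rough check of the two error messages in this post, offered as a sketch rather than a diagnosis. It assumes that method = "eigen" ends up building a dense n x n matrix of 8-byte doubles, which is what the matrix(0, nrow = n, ncol = n) call in the error suggests; the helper name dense_mib is made up for this illustration.

```r
# Size of one dense n x n matrix of doubles, in MiB (R prints these as "Mb").
dense_mib <- function(n) n^2 * 8 / 2^20

n1 <- 10740     # number of regions reported in the weights summary
n2 <- 100000    # approximate size of the second dataset

size1 <- dense_mib(n1)
round(size1, 1) # about 880, in line with "cannot allocate vector of size 880.0 Mb"

# Assumption: the R version in use refuses matrix() calls with more than
# .Machine$integer.max elements, which would explain "too many elements specified".
too_many <- n2^2 > .Machine$integer.max
too_many
```

If that reading is right, the first dataset needs several such 880 Mb blocks inside a 3004 Mb allocation limit, and the second cannot be stored densely at all. The spdep documentation for lagsarlm describes sparse alternatives to method = "eigen"; which of them accept a non-symmetric neighbour list is worth checking there.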
[R] Data from Stock and Watson or DAgostino papers?
Hello, I am interested in looking at the dataset used by Stock and Watson in their "Macroeconomic Forecasting Using Diffusion Indexes" (J. of Business and Econ. Statistics, April 2002, pp158-161) or the set used by D'Agostino and Giannone in "Comparing Alternative Predictors [...]" (October 2006) in R. Does anyone know if R code to retrieve these series from FRED (as opposed to McGraw-Hill) is out in the wild anywhere? Before doing the mapping from the papers to the St. Louis database and then doing the coding, I thought I would ask if anyone has already gone down that road or would know where else I could search for this answer (and, yes, I have tried Google ...)

Thanks in advance, Matt
[R] Does aov produce one-sided or two-sided p-values?
Hi - Hopefully this is an easy question. In SPSS, when I'm testing a directional hypothesis using an ANOVA (GLM), I can divide the p-value by 2 because SPSS reports two-sided p-values. Is this approach still legit when I'm using aov in R?

Thanks, Matt
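One way to see what kind of p-value aov reports, sketched with made-up data (the vectors y and g are illustrative and not from the post). For a factor with two levels, the F statistic from aov is the square of the equal-variance t statistic, so the two p-values should coincide:

```r
# Illustrative data: two groups of four observations each.
y <- c(5.1, 4.9, 6.2, 5.8, 7.0, 6.5, 7.4, 6.9)
g <- factor(rep(c("a", "b"), each = 4))

# p-value of the F test reported by aov
p_aov <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]

# two-sided p-value of the pooled-variance t test
p_t <- t.test(y ~ g, var.equal = TRUE)$p.value

all.equal(p_aov, p_t)
```

Under that equivalence, halving the p-value for a directional hypothesis is only meaningful for a single-degree-of-freedom effect like this one, and only when the observed difference lies in the hypothesized direction. With three or more levels the F test has no single direction to halve.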
Re: [R] Random forests prediction
But shouldn't it be resolved when I set mtry to the maximum number of variables? Then the model explores all the variables for the next step, so it will still be able to find the better ones? And then in the later steps it could use the (less important) variables.

Matthijs
[R] Random forests prediction
Hi all, I have a strange problem when applying RF in R. I have a set of variables with which I obtain an AUC of 0.67. I have a second set of variables that gives an AUC of 0.57. When I merge the first and second set of variables, the AUC becomes 0.64. I would expect the prediction to become better as I add variables that do have some predictive power. This is even more strange as the AUC on the training set increased when I added more variables (while the AUC on the validation set decreased). Is there anyone who has experienced the same and/or who knows what the reason could be?

Thanks, Matthijs
[R] reception of (Vegan) envfit analysis by manuscript reviewers
I'm getting lots of grief from reviewers about figures generated with the envfit function in the Vegan package. Has anyone else struggled to effectively explain this analysis? If so, can you share any helpful tips? The most recent comment I've gotten back:

"What this shows is which NMDS axis separates the communities, not the relationship between the edaphic factor and the Bray-Curtis distance."

Thanks for any suggestions! Matt
[R] Gwet's AC1
R has functions for computing kappa, Fleiss's kappa, etc., but can it compute Gwet's AC1?

Thanks, Matt.
Re: [R] Nested brew call yields Error in .brew.cat(26, 28) : unused argument(s) (26, 28)
On Wed, 2012-03-28 at 11:40 +0100, Chris Beeley wrote:
> I am writing several webpages using the brew package and R2HTML. I would like to work off one script so I am using nested brew calls. The documentation for brew states that:
>
> NOTE: brew calls can be nested and rely on placing a function named '.brew.cat' in the environment in which it is passed. Each time brew is called, a check for the existence of this function is made. If it exists, then it is replaced with a new copy that is lexically scoped to the current brew frame. Once the brew call is done, the function is replaced with the previous function. The function is finally removed from the environment once all brew calls return.
>
> I'm afraid I can't quite figure out what it is I'm supposed to do here. I've tried loading the brew library within the script which I pass to brew, and I've tried defining brew cat like this:

The paragraph above describes what brew is doing behind the scenes. It's not necessary to modify or set the .brew.cat function. A nested (or recursive) brew call occurs when brew() is called from a document currently being processed by brew(). To illustrate further, suppose there are two brew documents, example-1.brew and example-2.brew, where example-1.brew contains the following text (delimited by '''):

    '''
    This text is in example-1.brew.
    <%= brew::brew("example-2.brew") %>
    '''

and example-2.brew contains

    '''
    This text is in example-2.brew.
    <%= date() -%>
    '''

Then from the R prompt we have:

    R> brew::brew("example-1.brew")
    This text is in example-1.brew.
    This text is in example-2.brew.
    Thu Mar 29 20:24:52 2012

> .brew.cat=function(){}
>
> This generates the following error message:
>
> Error in .brew.cat(26, 28) : unused argument(s) (26, 28)
>
> I think perhaps it is more likely that I need to insert into the script the actual content of .brew.cat, but I can't seem to get R to tell me what it is, and Googling throws up a lot of stuff about beer and not much else (drew a blank also from RSiteSearch("Nested brew")). Any help gratefully received.
>
> Chris Beeley
> Institute of Mental Health, UK

--
Matthew S. Shotwell
Assistant Professor, Department of Biostatistics
School of Medicine, Vanderbilt University
1161 21st Ave. S2323 MCN Office CC2102L
Nashville, TN 37232-2158
[R] Help with Matrix code optimization
The chol and solve methods for dpoMatrix (Matrix package) are much faster than the default methods. But the time required to coerce a regular matrix to dpoMatrix swamps the advantage. Hence, I have the following problem, where use of dpoMatrix is worse than a regular matrix.

    library(Matrix)
    x <- diag(10)
    system.time(
      for (r in seq(0.1, 0.9, length.out = 1000)) {
        m <- r^abs(row(x) - col(x))
        chol(m)
        solve(m)
      })
    system.time(
      for (r in seq(0.1, 0.9, length.out = 1000)) {
        M <- as(r^abs(row(x) - col(x)), 'dpoMatrix')
        chol(M)
        solve(M)
      })

Any ideas?
[R] repeating or looping within an apply statement to handle multiple variables
Dear R experts, I would like to please ask for your help with repeating steps in an apply statement. I have a dataframe that lists multiple variables for a given id and visit, as well as drug treatment.

    head(exp)
      id visit variable1 variable2 variable3 variable4 drug
    1  3     1        13        10         7        11    0
    2  3     5        10        15         9         9    0
    3  3    12         9        10         8         8    0
    4  7     1        12         8         9         8    1
    5  7     5        16         9         3        10    1
    6  7    12         5        11         9        14    1

I would like to process these variables to find the difference between visit 5 and visit 1 for each id, then summarize this data in terms of means and errors. Thus far, with your brilliant advice to employ do.call and lapply, I have been able to process one variable at a time, but I would much prefer to loop or repeat the process for each variable in order to create an efficiently stored set of data. I would like to get a data set such as:

    exp1
         id  variable drug d5.3
    3     3 variable1    0   -3
    7     7 variable1    1    4
    13   13 variable1    0   -5
    56   56 variable1    0    4
    78   78 variable1    0    7
    109 109 variable1    0   -3
    145 145 variable1    0   -2
    173 173 variable1    0    9
    212 212 variable1    1   -7
    3     3 variable2    ?    ?
    7     7 variable2    ?    ?
    13   13 variable2    ?    ?
    56   56 variable2    ?    ?
    78   78 variable2    ?    ?
    109 109 variable2    ?    ?
    145 145 variable2    ?    ?
    173 173 variable2    ?    ?
    212 212 variable2    ?    ?
    3     3 variable3    ?    ?
    etc...

    exp2
       variable difference gel mean       sd n       se     X95ci    mean.sd
    0 variable1       d5.1   0  1.0 5.567764 7 2.104417  5.149323  0.1796053
    1 variable1       d5.1   1 -1.5 7.778175 2     5.50 69.884126 -0.1928473
          se.sd  X95ci.sd
    0 0.3779645 0.9248457
    1 0.7071068 8.9846435

But I have only been able to get the data for the first variable, despite having attempted loop statements, i.e. (for i in c('variable1','variable2','variable3','variable4')), for the variable names. Would you please have any thoughts about how to repeat lapply across many column variables? I greatly appreciate your thoughts.

I have supplied the code for the example and my work thus far below:

    exp <- data.frame(id = rep(c(3,7,13,56,78,109,145,173,212), each = 3),
      visit = rep(c(1,5,12), times = 9),
      variable1 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
      variable2 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
      variable3 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
      variable4 = round(rnorm(mean = 10, sd = 3, n = 27), 0),
      drug = rep(round(rnorm(mean = 0.5, sd = 0.1, n = 9), 0), each = 3))
    exp[exp[,'visit'] == 1 & exp[,'id'] == 3, ]$variable <- NA
    exp[exp[,'visit'] == 5 & exp[,'id'] == 56, ]$variable <- NA

    exp1 <- do.call(rbind, lapply(split(exp, exp$id), function(.grp) {
      data.frame('id' = .grp$id[1L], 'variable' = 'variable1', 'drug' = .grp$drug[1L],
        'd5-3' = .grp[.grp[['visit']] == 5, ]$variable1 - .grp[.grp[['visit']] == 1, ]$variable1)
    }))

    exp2 <- do.call(rbind, lapply(split(exp1, exp1$drug), function(.grp) {
      a <- na.omit(.grp$d5.3)
      data.frame('variable' = 'variable1', 'difference' = 'd5.1', 'gel' = .grp$drug[1L],
        'mean' = mean(a), 'sd' = sd(a), 'n' = length(a),
        'se' = sd(a)/sqrt(length(a)),
        '95ci' = qt(0.975, (length(a)-1)) * sd(a)/sqrt(length(a)),
        'mean/sd' = mean(a)/sd(a),
        'se/sd' = (sd(a)/sqrt(length(a)))/sd(a),
        '95ci/sd' = (qt(0.975, (length(a)-1)) * sd(a)/sqrt(length(a)))/sd(a))
    }))

Thanks again for your help, Matt
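A minimal sketch of one way to repeat the per-id step across several columns, assuming each id has exactly one row for visit 1 and one for visit 5. The data frame dat and its values are made up for illustration; the idea is simply to wrap the existing lapply over ids in an outer lapply over column names and to pick the column with [[v]] instead of $variable1.

```r
# Illustrative data (not the poster's): three ids, visits 1, 5 and 12.
dat <- data.frame(
  id        = rep(c(3, 7, 13), each = 3),
  visit     = rep(c(1, 5, 12), times = 3),
  variable1 = c(10, 13, 9, 12, 16, 5, 8, 8, 10),
  variable2 = c(7, 10, 9, 8, 9, 11, 9, 6, 7),
  drug      = rep(c(0, 1, 0), each = 3)
)

vars <- c("variable1", "variable2")

# Outer lapply walks over column names, inner lapply walks over ids.
diffs <- do.call(rbind, lapply(vars, function(v) {
  do.call(rbind, lapply(split(dat, dat$id), function(g) {
    data.frame(
      id       = g$id[1],
      variable = v,
      drug     = g$drug[1],
      d5.1     = g[[v]][g$visit == 5] - g[[v]][g$visit == 1]
    )
  }))
}))
rownames(diffs) <- NULL
diffs
```

The result is in long format, one row per id and variable, so the summary step can then split on both variable and drug, for example with split(diffs, list(diffs$variable, diffs$drug)).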
[R] how to rbind matrices from different loops
Dear R experts, I am having difficulty using loops productively and would like to please ask for advice. I have a dataframe of ids and groups. I would like to break down the dataframe into groups, find the unique sets of ids, then reassemble. My thought was to use a loop, but I have been unable to finish this loop in a logical way. I would like to find the unique ids for group 1, group 2, etc., and rbind these back together. However, I am unclear how to do this because in other attempts my final product is always just a part of the last loop run. My way of working around this has been to write.csv and then re-read at the end, which is so clumsy. Previously, I have used a store matrix for individual cells.

1. Is there a better way to approach this?
2. How can I combine parts of matrices with other parts created in prior loops?

I have created a primitive example below. Each of the groups varies in number, so my repetitive example below is not accurate. In my real data, ids repeat often within groups.

Thank you so much, Matt

    example <- data.frame(id = rep((abs(round(rnorm(50, mean = 500, sd = 250), digits = 0))), 3),
                          group = rep(1:15, 10))
    example <- example[with(example, order(id, group)), ]
    for (i in 1:15) {
      ai <- example[example[,2] == i, ][!duplicated(example[example[,2] == i, ][,1]), ]
      write.csv(ai, paste('a', i, '.csv', sep = ''))
    }
    b1 <- read.csv('a1.csv')
    b2 <- read.csv('a2.csv')
    b3 <- read.csv('a3.csv')
    b4 <- read.csv('a4.csv')
    b5 <- read.csv('a5.csv')
    b6 <- read.csv('a6.csv')
    b7 <- read.csv('a7.csv')
    b8 <- read.csv('a8.csv')
    b9 <- read.csv('a9.csv')
    b10 <- read.csv('a10.csv')
    b11 <- read.csv('a11.csv')
    b12 <- read.csv('a12.csv')
    b13 <- read.csv('a13.csv')
    unis2 <- rbind(rbind(rbind(rbind(rbind(rbind(rbind(rbind(rbind(rbind(rbind(rbind(b1,b2),b3),b4),b5),b6),b7),b8),b9),b10),b11),b12),b13)
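A hedged sketch of how the write-then-reread step might be avoided: keep each group's result in a list and bind the list once at the end. The small data frame below is invented for illustration and is not the poster's data.

```r
# Illustrative data: ids repeat within groups.
example <- data.frame(
  id    = c(101, 101, 205, 205, 205, 330),
  group = c(1, 1, 1, 2, 2, 2)
)

# One data frame per group, each with duplicated ids dropped.
pieces <- lapply(split(example, example$group), function(g) {
  g[!duplicated(g$id), ]
})

# Bind all pieces together in a single call instead of nested rbind().
unis <- do.call(rbind, pieces)
rownames(unis) <- NULL
unis

# The same rows in one step, without an explicit loop over groups.
unis_direct <- example[!duplicated(example[c("id", "group")]), ]
```

The same list idea works inside a for loop: assign each result to pieces[[i]] rather than to a single variable that is overwritten on every pass, which is why only the last group survived in the earlier attempts.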
Re: [R] function restrictedparts
That's because the number of partitions of 281 items of order 10 is quite large:

    R> library('partitions')
    R> R(10,281)
    [1] 1218681472

Without thinking about this too hard, the result of restrictedparts(281,10) should require around

    R> 1218681472 * 10 * 4 / 10^9
    [1] 48.74726

gigabytes of storage space (because the result is a 1218681472 x 10 array of 4-byte integers). Because the number of partitions grows 'explosively' with the number of items, this is a serious obstacle for statistical partitioning and clustering methods. For more discouragement, see the 'Bell number'. You can enumerate these restricted partitions one by one; see

    R> ?partitions::nextpart

Matt

On Wed, 2012-01-25 at 15:11 +, yan jiao wrote:
> I am using function restrictedparts, but got error:
>
>     restrictedparts(281,10)
>     Error in integer(len) : vector size specified is too large
>     Calls: restrictedparts -> integer
>     In addition: Warning message:
>     In restrictedparts(281, 10) : NAs introduced by coercion
>     Error in integer(len) : vector size specified is too large
>     Calls: restrictedparts -> integer
>
> is there a similar function that can deal with long vectors? I'm using R version 2.14.1 (2011-12-22), x86_64, linux-gnu
>
> many thanks
> yan
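The size estimate above can be reproduced without enumerating anything. The sketch below counts partitions of n into at most k parts with the standard recurrence and then applies the same bytes-per-integer arithmetic as in the reply. The function names count_parts and storage_gb are made up for this illustration, and whether "at most k parts" matches the package's own counting convention is an assumption to verify against its documentation.

```r
# Number of partitions of n into at most k parts. By conjugation this equals
# the number of partitions of n into parts of size at most k, which the loop
# below counts one part size at a time.
count_parts <- function(n, k) {
  p <- c(1, rep(0, n))                 # p[j + 1] is the count for total j
  for (size in seq_len(min(k, n))) {   # allow parts of this size
    for (j in size:n) {
      p[j + 1] <- p[j + 1] + p[j - size + 1]
    }
  }
  p[n + 1]
}

# Storage for a count x k array of 4-byte integers, in gigabytes.
storage_gb <- function(count, k) count * k * 4 / 1e9

small <- count_parts(10, 3)            # small case that can be checked by hand
gb <- storage_gb(1218681472, 10)       # count taken from the reply above
```

Running count_parts(281, 10) gives a figure that can be compared with the package's result before deciding whether one-at-a-time enumeration with nextpart is the only practical route.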
Re: [R] Bayesian data analysis recommendations
On Thu, 2012-01-19 at 19:23 -0500, C W wrote:
> Thanks, Rich, I will look at the book. I agree, there are many nice packages, but what if the package changes in a few years? I would have no idea what is going on! I've heard from predecessors in the industry who emphasize the learning, not just plug and chug. I really want to learn the material and understand it; above all, it is interesting. I am looking more towards Bayesian statistics or Bayesian inference. I am in statistics graduate school; though it is not my field, the biology application could help in the understanding, I suppose?

This list (r-help) may not be the best place to look for advice on this. But here is some anyway :)

For a well-rounded introduction, I recommend Robert's 'The Bayesian Choice'. This is a great foundation for Bayesians who intend to defend their positions on statistical inference. For a more practical approach, Gelman, Carlin, Stern, and Rubin's book 'Bayesian Data Analysis' has been very popular (THE most popular, according to some).

Regarding the software tools for Bayesian data analysis, the most mature _and_ active _and_ best integrated with the R project is Martyn Plummer's JAGS (see also the R package rjags, by the same author). Another tool that I'm planning to check out is PyMC: http://code.google.com/p/pymc/

Best, Matt

> On Thu, Jan 19, 2012 at 7:07 PM, Rich Shepard rshep...@appl-ecosys.com wrote:
>> On Thu, 19 Jan 2012, C W wrote:
>>> I am trying to learn Bayesian inference and Bayesian data analysis; I am new in the field. Would any experts on the list recommend any good sites or materials for beginners? My approach is to learn and understand the theory first, then program on my own using R, though I see there are already packages.
>>
>> I'm far from an expert, but why not avoid re-inventing the wheel while you learn? Buy and read Jim Albert's Bayesian Computation with R.
>>
>> If you're a population ecologist (or willing to extend presented examples and ideas to communities and ecosystems), Ben Bolker's Ecological Models and Data in R explains when Bayesian and frequentist approaches each have advantages over the other.
>>
>> Rich
[R] Reading MINE output into a matrix
I've benefited from this list with input on how to build up a symmetrical matrix. The purpose of that query was to work with the output from the MINE routine posted at www.exploredata.net. To the extent it helps others, here is the script that I was working on, which turns a given MINE output column (in the case below, the third column, corresponding to MIC) into a matrix.

Hope it helps, Matt

    #needed for MINE routine
    require(rJava)
    #load market data
    require(PortfolioAnalytics)
    data(indexes)
    #write CSV file of data to current working directory
    datafilename <- "indexes.csv"
    write.table(indexes, datafilename, sep=",", col.names=TRUE, row.names=FALSE, quote=FALSE, na="NA")
    #read MINE R code
    source.with.encoding('MINE.r', encoding='UTF-8')
    pairs_method <- "all.pairs"
    max_num_boxes_exponent <- 0.6
    num_clumps_factor <- 15
    #run MINE routine on data
    MINE(datafilename, style=pairs_method,
         max.num.boxes.exponent=max_num_boxes_exponent,
         num.clumps.factor=num_clumps_factor)
    #read output of MINE routine
    #data is sorted in descending order of MIC variable
    #output is half of a square symmetric matrix, excluding diagonal
    #there are 9 columns, 7 of which are various stats
    #calc of outputfilename could be better handled ...
    # kludge included to deal with filename generated on Windows
    outputfilename <- sprintf("%s,%s,cv=0.0,B=n^%g,Results.csv", datafilename,
                              sub(".", "", pairs_method, fixed=TRUE),
                              max_num_boxes_exponent)
    x <- read.csv(outputfilename, header=TRUE)
    #isolate row/col frequencies as a matrix. we need to look at
    # both to get the complete list of pairs and their respective frequencies
    xtable <- table(x$X.var)
    ytable <- table(x$Y.var)
    #map frequencies of X Y vars to rows
    xmap <- xtable[x$X.var]
    ymap <- ytable[x$Y.var]
    finalmap <- order(xmap, -ymap, decreasing=TRUE)
    #fill in matrix - we want the third column for MIC
    z <- diag(length(levels(x$X.var))+1)
    z[row(z) > col(z)] <- x[finalmap,3]
    z <- z + t(z)
    diag(z) <- 1
    #determine and set row/column names
    varnames <- c(names(sort(xtable,decreasing=TRUE)), names(sort(ytable,decreasing=TRUE))[1])
    rownames(z) <- varnames
    colnames(z) <- varnames
    z
Re: [R] Help creating a symmetric matrix?
Thank you all for your help and best wishes for the holiday season.

Matt Considine

On 12/24/2011 8:38 AM, William Revelle wrote:
> Dear Matt, Sarah and Rui,
> To answer the original question for creating a symmetric matrix:
>
>     v <- c(0.33740, 0.26657, 0.23388, 0.23122, 0.21476, 0.20829, 0.20486, 0.19439, 0.19237, 0.18633, 0.17298, 0.17174, 0.16822, 0.16480, 0.15027)
>     z <- diag(6)
>     z[row(z) > col(z)] <- v
>     z <- z + t(z)
>     diag(z) <- 0
>     z
>             [,1]    [,2]    [,3]    [,4]    [,5]    [,6]
>     [1,] 0.00000 0.33740 0.26657 0.23388 0.23122 0.21476
>     [2,] 0.33740 0.00000 0.20829 0.20486 0.19439 0.19237
>     [3,] 0.26657 0.20829 0.00000 0.18633 0.17298 0.17174
>     [4,] 0.23388 0.20486 0.18633 0.00000 0.16822 0.16480
>     [5,] 0.23122 0.19439 0.17298 0.16822 0.00000 0.15027
>     [6,] 0.21476 0.19237 0.17174 0.16480 0.15027 0.00000
>
> Bill
>
> On Dec 24, 2011, at 6:04 AM, Sarah Goslee wrote:
>> Or the slightly shorter:
>>
>>     z <- diag(6)
>>     z[row(z) > col(z)] <- v
>>
>> which is what lower.tri() does, and
>>
>>     z <- diag(6)
>>     z[lower.tri(z)] <- v
>>
>> also works.
>>
>> Sarah
>>
>> On Fri, Dec 23, 2011 at 9:31 PM, Rui Barradas ruipbarra...@sapo.pt wrote:
>>> Matt Considine wrote
>>>> Hi, I am trying to work with the output of the MINE analysis routine found at http://www.exploredata.net Specifically, I am trying to read the results into a matrix (ideally an n x n x 6 matrix, but I'll settle right now for getting one column into a matrix.) The problem I have is not knowing how to take what amounts to being one half of a symmetric matrix - excluding the diagonal - and getting it into a matrix. I have tried using lower.tri as found here https://stat.ethz.ch/pipermail/r-help/2008-September/174516.html but it appears to only partially fill in the matrix. My code and an example of the output is below. Can anyone point me to an example that shows how to create a matrix with this sort of input?
>>>>
>>>> Thank you in advance, Matt
>>>>
>>>>     #v <- newx[,3]
>>>>     #or, for the sake of this example
>>>>     v <- c(0.33740, 0.26657, 0.23388, 0.23122, 0.21476, 0.20829, 0.20486, 0.19439, 0.19237, 0.18633, 0.17298, 0.17174, 0.16822, 0.16480, 0.15027)
>>>>     z <- diag(6)
>>>>     ind <- lower.tri(z)
>>>>     z[ind] <- t(v)[ind]
>>>>     z
>>>>             [,1]    [,2] [,3] [,4] [,5] [,6]
>>>>     [1,] 1.00000 0.00000    0    0    0    0
>>>>     [2,] 0.26657 1.00000    0    0    0    0
>>>>     [3,] 0.23388 0.19237    1    0    0    0
>>>>     [4,] 0.23122 0.18633   NA    1    0    0
>>>>     [5,] 0.21476 0.17298   NA   NA    1    0
>>>>     [6,] 0.20829 0.17174   NA   NA   NA    1
>>>
>>> Hello, Aren't you complicating? In the last line of your code, why use 'v[ind]' if 'ind' indexes the matrix, not the vector?
>>>
>>>     z <- diag(6)
>>>     ind <- lower.tri(z)
>>>     z[ind] <- v   #This works
>>>     z
>>>
>>> Rui Barradas
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
>
> William Revelle http://personality-project.org/revelle.html
> Professor http://personality-project.org
> Department of Psychology http://www.wcas.northwestern.edu/psych/
> Northwestern University http://www.northwestern.edu/
> Use R for psychology http://personality-project.org/r
> It is 6 minutes to midnight http://www.thebulletin.org
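The approach suggested in this thread, shown on a 3 x 3 case small enough to check by eye. The values in v are illustrative. One point worth keeping in mind is that R fills the lower triangle column by column, so v has to be ordered that way.

```r
# Three off-diagonal values, ordered column-wise: (2,1), (3,1), (3,2).
v <- c(0.5, 0.3, 0.2)

z <- matrix(0, 3, 3)
z[lower.tri(z)] <- v   # fill below the diagonal, column by column
z <- z + t(z)          # mirror into the upper triangle
diag(z) <- 1           # set the diagonal last so the sum does not double it
z
```

If the values come sorted by something else, such as strength of association, they need to be reordered to match the column-wise positions before this assignment.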
[R] Help creating a symmetric matrix?
Hi, I am trying to work with the output of the MINE analysis routine found at http://www.exploredata.net Specifically, I am trying to read the results into a matrix (ideally an n x n x 6 matrix, but I'll settle right now for getting one column into a matrix.) The problem I have is not knowing how to take what amounts to being one half of a symmetric matrix - excluding the diagonal - and getting it into a matrix. I have tried using lower.tri as found here https://stat.ethz.ch/pipermail/r-help/2008-September/174516.html but it appears to only partially fill in the matrix. My code and an example of the output are below. Can anyone point me to an example that shows how to create a matrix with this sort of input?

Thank you in advance, Matt

    require(PortfolioAnalytics)
    #load market index data
    data(indexes)
    #save data as a CSV
    write.table(indexes, "C:/Rwork/indexes.csv", sep=",", col.names=TRUE, row.names=FALSE, quote=FALSE, na="NA")
    #assumes rJava is installed, MINE.r and MINE.jar are in the working directory
    #read in MINE.r
    source.with.encoding('C:/Rwork/MINE.r', encoding='UTF-8')
    #run MINE on indexes
    MINE("C:/Rwork/indexes.csv", "all.pairs")
    #read the output file of MINE analysis
    x = read.csv("C:/Rwork/indexes.csv,B=n^0.6,k=15,Results.csv", header=TRUE)
    #isolate one half of matrix
    newx <- x[,1:3]
    newx
                X.var          Y.var MIC..strength.
    1     US.Equities Int.l.Equities        0.33740
    2        US.Bonds       US.Tbill        0.26657
    3        US.Tbill      Inflation        0.23388
    4     Commodities      Inflation        0.23122
    5     Commodities       US.Tbill        0.21476
    6     US.Equities       US.Tbill        0.20829
    7        US.Bonds      Inflation        0.20486
    8  Int.l.Equities    Commodities        0.19439
    9        US.Bonds    Commodities        0.19237
    10    US.Equities    Commodities        0.18633
    11       US.Bonds    US.Equities        0.17298
    12    US.Equities      Inflation        0.17174
    13 Int.l.Equities       US.Tbill        0.16822
    14       US.Bonds Int.l.Equities        0.16480
    15 Int.l.Equities      Inflation        0.15027

    #v <- newx[,3]
    #or, for the sake of this example
    v <- c(0.33740, 0.26657, 0.23388, 0.23122, 0.21476, 0.20829, 0.20486, 0.19439, 0.19237, 0.18633, 0.17298, 0.17174, 0.16822, 0.16480, 0.15027)
    z <- diag(6)
    ind <- lower.tri(z)
    z[ind] <- t(v)[ind]
    z
            [,1]    [,2] [,3] [,4] [,5] [,6]
    [1,] 1.00000 0.00000    0    0    0    0
    [2,] 0.26657 1.00000    0    0    0    0
    [3,] 0.23388 0.19237    1    0    0    0
    [4,] 0.23122 0.18633   NA    1    0    0
    [5,] 0.21476 0.17298   NA   NA    1    0
    [6,] 0.20829 0.17174   NA   NA   NA    1
[R] question about spaces in r
Hello, I would like to please ask if someone would explain how R reads characters and numbers differently. Using read.csv, I had a matrix that resembled the following, only with many more ids and data:

    ID Visit variable
     2     1        5
     2     1        3
     2     3        4
     2    41        1
     2    42       34
     2     5       54
     2     9        1
     2    10        3
     2    12        5
     5     1       54
     5     2        9
     5     3        3
     5    41       54
     5    41        2
     5     5      235
     5     9        4
     5    10        2
     5    12        2

I then tried to subset for Visit==3. However, subset == was not working properly. This gave me zero rows. I printed the matrix/dataframe and found that this was because R viewed the 3 as " 3" (space three). So, I had to type subset == " 3" to select the data instead. I think this has to do with character, number and string properties, but I am quite a novice. Would anyone be able to instruct me how one tells a dataframe/matrix to convert numbers such as " 3" to 3 so that I do not get confused in the future? I guess another problem I have is that I am still learning the differences between matrices and dataframes.

Thanks so much, Matt
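A small sketch of the situation described above, assuming the Visit column was read as text with a leading space (the values below are illustrative). Comparing text to "3" fails because " 3" and "3" are different strings; trimming the whitespace and converting to numeric makes the comparison behave as expected.

```r
# Illustrative column as it might look after import: text, not numbers.
visit <- c(" 1", " 3", "41", " 3")

n_text_matches <- sum(visit == "3")      # " 3" is not the same string as "3"

# Strip surrounding whitespace, then convert to numeric.
visit_num <- as.numeric(trimws(visit))
n_num_matches <- sum(visit_num == 3)
```

Calling str() on the data frame shows which columns were read as text rather than numbers. At import time, read.csv accepts strip.white = TRUE, which removes leading and trailing whitespace from unquoted fields and may avoid the problem at the source.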
Re: [R] R logo in eps formt
See this earlier post for SVG logos: http://tolstoy.newcastle.edu.au/R/e12/devel/10/10/0112.html

Using ImageMagick, do something like

    convert logo.svg logo.eps

On Thu, 2011-12-01 at 10:56 +0700, Ben Madin wrote:
> G'day all, Sorry if this message has been posted before, but searching for R is always difficult... I was hoping for a copy of the logo in eps format? Can I do this from R, or is one available for download?
>
> cheers
> Ben
[R] R endnote entry
I know citation() gives the R citation to be used in publications. Has anyone put this into EndNote nicely? I'm not very experienced with EndNote, and the way I have it at the moment, the 'R Development Core Team' becomes R. D. C. T. etc.

Cheers.
[R] googleVis motionchart - slow with Date class
Hi,

I am trying to create a googleVis motion chart with monthly data. When formatting the date column as a Date class variable, the plot as presented in the browser becomes considerably slower and very prone to crashing the browser. To illustrate this issue I have modified the WorldBank demo.

    ### objects from demo(WorldBank, package = "googleVis")
    M <- gvisMotionChart(subData, idvar="country.name", timevar="year",
                         options=list(width=700, height=600))
    plot(M)

This works fine and I can smoothly move back and forth between the scatter plots and the line plots.

    ## here I express the date as a Date class object,
    ## arbitrarily assigning each year to June 1st.
    subData$year2 <- as.Date(ISOdate(subData$year, 6, 1))
    M2 <- gvisMotionChart(subData, idvar="country.name", timevar="year2",
                          options=list(width=700, height=600))
    plot(M2)

Using Chrome, this plot is very slow to load, and when pressing play it appears that the date field fills in each day of the year. Trying to go back and forth between the line plot and the scatter plot will crash the browser.

Is there a better way to express monthly data? I have tried converting it to a numeric in the form of mmdd or mm, but this didn't work.

I am also wondering: is this the best place to post a question about googleVis? I notice threads on stackoverflow and other places.

Thanks,
Matt
[R] Matching
I have a spatial weight file in csv that I want as a listw object in R. The file has the following 3 variables (left to right in the file): OID_, NID and WEIGHTS. OID_ identifies the origins and NID their neighbors. There are 217 origins with 4 neighbors each.

I have been able to read the csv file as a data frame (test.csv). Then I tried to check whether the OID_ variable is in the right place in the data frame. I used match for that:

    o <- match(OID_, OID_)

I am not sure whether this is the right way to match. Please advise.

Anyway, next I created a matrix object (m) using:

    m <- as.matrix(test.csv[, -1])

Then I created object m1, using:

    m1 <- m[o, o]

Finally, I tried creating the listw object using:

    mat2listw(m1)

Here I get an error that x is not a square matrix. Not sure what to do now. Any help appreciated!

Thanks,
Shishm
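A hedged sketch of one way to turn a three-column neighbor list like the one described above into a square weights matrix in base R. The tiny `edges` data frame is invented for illustration (3 origins with 2 neighbors each, not the poster's 217 by 4), and whether the result suits spdep's mat2listw() for the real file is an assumption to check against that package's documentation.

```r
# Hypothetical long-format weights: one row per (origin, neighbor) pair.
edges <- data.frame(
  OID_    = c(1, 1, 2, 2, 3, 3),
  NID     = c(2, 3, 1, 3, 1, 2),
  WEIGHTS = rep(0.5, 6)
)

# Every id that appears as origin or neighbor gets one row and one column,
# so the matrix is square by construction.
ids <- sort(unique(c(edges$OID_, edges$NID)))
W <- matrix(0, nrow = length(ids), ncol = length(ids),
            dimnames = list(as.character(ids), as.character(ids)))

# Fill cell (origin, neighbor) with its weight via matrix indexing.
W[cbind(match(edges$OID_, ids), match(edges$NID, ids))] <- edges$WEIGHTS

W
# With spdep installed, the square matrix could then be passed on:
# lw <- spdep::mat2listw(W)
```

The original attempt dropped only the first column, which leaves a 2-column matrix of NID and WEIGHTS rather than an n-by-n matrix; that would be consistent with the "not a square matrix" error.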
Re: [R] contact person for UseR 2012, please?
The contact person is:

Stephania McNeal-Goddard
email: stephania.mcneal-godd...@vanderbilt.edu
phone: (615)322-2768
Vanderbilt University School of Medicine
Department of Biostatistics
S-2323 Medical Center North
Nashville, TN 37232-2158

On Tue, 2011-10-18 at 12:41 -0400, David Winsemius wrote:
> On Oct 18, 2011, at 12:25 PM, Erin Hodgess wrote:
>
>> Dear R People:
>>
>> Do you know who the contact person is for UseR 2012, please?
>>
>> I'm trying to get together some numbers for funding (sorry for the
>
> Funny, it was the first hit on a Google search with term useR2012
>
> http://biostat.mc.vanderbilt.edu/wiki/Main/UseR-2012
>
> David Winsemius, MD
> West Hartford, CT

--
Matthew S. Shotwell
Assistant Professor, Department of Biostatistics
School of Medicine, Vanderbilt University
1161 21st Ave. S2323 MCN Office CC2102L
Nashville, TN 37232-2158
Re: [R] [Related Topic] need help on read.spss
Would it be worthwhile to update the read.spss implementation using the more recent discoveries from the PSPP group? I don't mean to copy their code, but to use the ideas in their code. Is anyone working on this? I wouldn't want the effort to be duplicated.

On Thu, 2011-10-13 at 16:22 +0200, Uwe Ligges wrote:
> On 11.10.2011 12:07, Smart Guy wrote:
>> Hi,
>> I have one doubt about one of the parameters of 'read.spss()' from
>> the 'foreign' package. Here is the syntax:
>>
>>     read.spss(file, use.value.labels = TRUE, to.data.frame = FALSE,
>>               max.value.labels = Inf, trim.factor.names = FALSE,
>>               trim_values = TRUE, reencode = NA,
>>               use.missings = to.data.frame)
>>
>> In the above syntax, when I pass 'to.data.frame = FALSE' it gives me
>> the missing values from the SPSS file (that I try to read using
>> read.spss()). But when I pass 'to.data.frame = TRUE' it does not give
>> me the missing values, and I need to get the missing values.
>>
>> According to the read.spss() documentation:
>>     to.data.frame : return a data frame?
>>
>> I am curious to know: if we pass 'to.data.frame = TRUE', is it going
>> to cause some issue or affect something? I didn't understand the
>> read.spss() documentation correctly. Please explain.
>>
>> Thanks in Advance
>
> An R data.frame cannot represent different kinds of missing values,
> since R just has NA. Therefore, there are two ways to import data:
>
> to.data.frame=FALSE will read all the information, but into a format
> you will likely have to postprocess to make it conveniently usable.
>
> to.data.frame=TRUE will import into a data.frame, but that cannot
> represent all the nuances known from the SPSS representation.
>
> Uwe Ligges
Re: [R] Rweb and setting up R on a server
Erin,

I haven't used Rweb recently. The URL is http://www.math.montana.edu/Rweb/ . If you have a server, you could set up the server version of RStudio: http://rstudio.org/download/server . It worked well when I tried it.

Best,
Matt

On Tue, 2011-09-06 at 17:07 -0500, Erin Hodgess wrote:
> Dear R People:
>
> At one time, Rweb existed, which had R on a server. I looked for it,
> but can't find it.
>
> Has anyone used that recently, or is there a new equivalent, please?
>
> Thanks,
> Erin
Re: [R] readBin fails to read large files
On Thu, 2011-09-01 at 17:36 +0100, Prof Brian Ripley wrote:
> readBin is intended to read a few items at a time, not 10^9. You are
> probably getting 32-bit integer overflow inside your OS, since the
> number of bytes you are trying to read in one go exceeds 2GB.
>
> Don't do that: read say a million at a time.
>
> And BTW, if these really are unsigned ints you will get wraparound.

To elaborate, ?readBin says that the 'signed' argument is only used for integers of size 1 and 2 bytes. These are ultimately converted to signed 4 byte integers, because that's how R stores integers. To be exact, if your file contains integers larger than 2^31-1 = 2147483647, overflow would occur. In actuality, R returns NA for those values. I'm bringing this up because R normally issues a warning:

    R> 2147483647L + 1L
    [1] NA
    Warning message:
    In 2147483647L + 1L : NAs produced by integer overflow

But a similar warning isn't issued by readBin when NA results from signed integer overflow:

    # The raw vector below represents 2147483647L and 2147483647L + 1L
    # in little endian, unsigned, 4 byte integers
    R> dat <- as.raw(c(0xff,0xff,0xff,0x7f,0x00,0x00,0x00,0x80))
    R> writeBin(dat, 'test.bin')
    R> readBin('test.bin', n=2, integer(), signed=FALSE)
    [1] 2147483647         NA

> On Thu, 1 Sep 2011, Benton, Paul wrote:
>
>> Posting for a friend
>>
>> Begin forwarded message:
>>
>> From: Geier, Florian <florian.geie...@imperial.ac.uk>
>> Subject: Fwd: readBin fails to read large files
>> Date: September 1, 2011 4:10:53 PM GMT+01:00
>>
>> Begin forwarded message:
>>
>> Date: 1 September 2011 16:01:45 GMT+01:00
>> Subject: readBin fails to read large files
>>
>> Dear all,
>>
>> I am trying to read a large file (~2GB) of unsigned ints into R
>> using the command:
>>
>>     raw <- readBin(file, n=10^8, integer(), endian="little", signed=FALSE)
>>
>> It works fine for n=10^8, but fails for n=10^9 (or even at
>> n=6*10^8). My .Machine$sizeof.long is 8. I am running R 2.13.1 on a
>> x86_64-apple-darwin9.8.0/x86_64 (64-bit) architecture.
>>
>> Thanks for your help
>>
>> Florian
>>
>> --
>> AXA doctoral fellow
>> Bundy lab - Biomolecular Medicine
>> Imperial College London
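The two pieces of advice above (read in modest chunks; 4-byte values above 2^31-1 do not fit R's integer type) can be combined in a small sketch. It assumes a little-endian file of unsigned 4-byte integers; `read_uint32_chunked` and its `chunk` argument are hypothetical names, and the trick of reading each value as two unsigned 2-byte halves and recombining them as doubles is one possible workaround, not something proposed in the thread.

```r
# Read little-endian unsigned 32-bit integers in chunks, returning doubles.
read_uint32_chunked <- function(path, chunk = 1e6) {
  con <- file(path, "rb")
  on.exit(close(con))
  pieces <- list()
  repeat {
    # signed = FALSE is honoured for size = 2, so read 16-bit halves.
    halves <- readBin(con, integer(), n = 2 * chunk, size = 2,
                      signed = FALSE, endian = "little")
    if (length(halves) == 0) break
    lo <- halves[c(TRUE, FALSE)]   # low 16 bits come first (little endian)
    hi <- halves[c(FALSE, TRUE)]   # high 16 bits
    pieces[[length(pieces) + 1]] <- lo + 65536 * hi  # double arithmetic
  }
  unlist(pieces)
}

# Same two values as in the example above: 2^31 - 1 and 2^31.
tmp <- tempfile(fileext = ".bin")
writeBin(as.raw(c(0xff, 0xff, 0xff, 0x7f, 0x00, 0x00, 0x00, 0x80)), tmp)
vals <- read_uint32_chunked(tmp, chunk = 1)  # tiny chunk to exercise the loop
print(vals, digits = 12)
```

Doubles represent every integer up to 2^53 exactly, so no precision is lost for 32-bit values; the price is 8 bytes of memory per value instead of 4.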
[R] Unusual separators
Hi all,

I have a list that I got from a web page that I would like to crunch. Unfortunately, the list has some unusual separators in it. I believe the columns are separated by 1 space and 1 tab. I tried to insert this into read.table( ..., sep=" \t", ...) but got an error that said something like 'only one byte separators can be used'.

I have thought about using gsub to 'swap out' the space + tab and replace it with commas, etc., but thought there might be another way.

Any suggestions?
M

--
Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com
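A small sketch of the gsub idea mentioned above, on made-up lines in which every column is followed by a space and a tab. Collapsing the two-character separator to a single tab first lets read.delim() handle the rest. If no field contains embedded blanks, read.table() with its default sep = "" (any run of white space) may be enough on its own; that is an assumption about the data.

```r
# Hypothetical input: columns separated by space + tab.
raw_lines <- c("a \tb \tc",
               "1 \t2 \t3",
               "4 \t5 \t6")

# Replace the literal two-character separator with a single tab.
clean_lines <- gsub(" \t", "\t", raw_lines, fixed = TRUE)

# read.delim() uses sep = "\t" and header = TRUE by default.
dat <- read.delim(text = clean_lines)
dat
```

For a real file, the lines would come from readLines() on the file or URL.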
[R] reshape::rename package unable to install !?!
Greetings all,

I have been working with RStudio and R only for a little while. I came across a package called 'reshape' that helped me 'rename' columns. Unfortunately, my computer got hosed (too much playing with Linux too late at night) and I had to re-install everything, BUT when I tried to reinstall 'reshape' or 'reshape2' I COULDN'T.

Is there a way to get over this hurdle with reshape, or is there another command I can use? I am stuck because my programs up to this point used 'rename' and now I have to redo some work.

M

--
Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com
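Until the package installs again, columns can be renamed with base R alone. A minimal sketch follows; the data frame and the column names are invented for illustration.

```r
# Hypothetical data frame with a column to rename.
df <- data.frame(old_name = 1:3, other = c("x", "y", "z"))

# Rename one column by matching its current name.
names(df)[names(df) == "old_name"] <- "new_name"

names(df)
```

Matching on the name rather than on the position keeps the code working if the column order changes.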
[R] Which is more efficient?
Greetings all,

I am curious to know if either of these two sets of code is more efficient?

Example1:

    ## t-test ##
    colA <- temp[ , j]
    colB <- temp[ , k]
    ttr <- t.test(colA, colB, var.equal=TRUE)
    tt_pvalue[i] <- ttr$p.value

or Example2:

    tt_pvalue[i] <- t.test(temp[ , j], temp[ , k], var.equal=TRUE)

I have three loops: i, j, k. One to test all of the i files in a directory. One to tease out column j and compare it by means of a t-test to column k in each of the files.

    for (i in 1:num_files) {
        temp <- read.table(files_to_test[i], header=TRUE, sep="\t")
        num_cols <- ncol(temp)
        ## Define Columns To Compare ##
        for (j in 2:num_cols) {
            for (k in 3:num_cols) {
                ## t-test ##
                colA <- temp[ , j]
                colB <- temp[ , k]
                ttr <- t.test(colA, colB, var.equal=TRUE)
                tt_pvalue[i] <- ttr$p.value
            }
        }
    }

I am a novice writer of code and am interested to hear if there are any (dis)advantages to one way or the other.

M

Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com
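A sketch on simulated data (not the poster's files) comparing the two styles. It assumes Example2 was meant to end in $p.value, since a whole t.test result cannot be stored in one element of a numeric vector. Both forms then give the same number; the intermediate variables cost essentially nothing. The combn() part is an optional variation that visits each pair of columns once instead of looping j and k independently.

```r
set.seed(42)
# Simulated stand-in for one input file: an id column plus three data columns.
temp <- data.frame(id = 1:10, a = rnorm(10), b = rnorm(10, 1), c = rnorm(10, 2))
j <- 2
k <- 3

# Style 1: intermediate variables.
colA <- temp[, j]
colB <- temp[, k]
ttr  <- t.test(colA, colB, var.equal = TRUE)
p1   <- ttr$p.value

# Style 2: one expression, extracting the p-value directly.
p2 <- t.test(temp[, j], temp[, k], var.equal = TRUE)$p.value

identical(p1, p2)

# Variation: every unordered pair of data columns, once each.
pairs <- combn(2:ncol(temp), 2)          # one pair of column indices per column
pvals <- apply(pairs, 2, function(idx) {
  t.test(temp[, idx[1]], temp[, idx[2]], var.equal = TRUE)$p.value
})
```

Note that the nested loop in the post stores every result in tt_pvalue[i], so only the last (j, k) pair per file survives; a matrix or list indexed by the pair would keep all of them.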
[R] Error message for MCC
Greetings all,

I am getting an error message that is stifling me. Any ideas?

    ## Define Directories ##
    load_from <- "/home/mcc/Dropbox/abrodsky/kegg_combine_data/"
    save_to <- "/home/mcc/Dropbox/abrodsky/ttest_results/"

    ## Define Columns To Compare ##
    compareA <- "log_b_rich"
    compareB <- "Fc_cdt_rich_tot"

    ## Collect Files To Compare ##
    setwd(load_from)
    files_to_test <- list.files(pattern = "combine.kegg")

    ## Initialize Variables ##
    vl <- length(files_to_test)
    temp <- vector(mode="numeric", length = vl)
    colA <- vector(mode="numeric", length = vl)
    colB <- vector(mode="numeric", length = vl)
    tt <- vector(mode="numeric", length = vl)

    ## Calculate P-values ##
    > for (i in 1:3){
    +     temp1 <- read.table(files_to_test[i], header=TRUE, sep=" ")
    +     numrows <- nrow(temp1)
    +     tt_pvalue <- matrix(data=temp, nrow=numrows, ncol=vl)
    +     colA <- temp[,compareA]
    +     colB <- temp[,compareB]
    +     tt <- t.test(colA, colB, var.equal=TRUE)
    +     tt_pvalue <- tt$p.value
    + }
    Error in temp[, compareA] : incorrect number of dimensions

--
Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com
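A hedged reading of the error above: `temp` was created as a plain numeric vector, so two-index subsetting with `temp[, compareA]` fails, while the data actually read from the file sits in `temp1`. The sketch below reproduces the message with a stand-in data frame (the values are invented) and shows the indexing that works.

```r
compareA <- "log_b_rich"
compareB <- "Fc_cdt_rich_tot"

# As in the post: a numeric vector, which has no columns.
temp <- vector(mode = "numeric", length = 3)

# Hypothetical stand-in for what read.table() returned into temp1.
temp1 <- data.frame(log_b_rich      = c(1.2, 0.8, 1.5, 1.1),
                    Fc_cdt_rich_tot = c(0.9, 1.4, 1.0, 1.3))

# Two-index subsetting on a vector raises the reported error.
msg <- tryCatch(temp[, compareA], error = function(e) conditionMessage(e))
msg

# Indexing the data frame by column name works.
colA <- temp1[, compareA]
colB <- temp1[, compareB]
p <- t.test(colA, colB, var.equal = TRUE)$p.value
```

Separately, `tt_pvalue <- tt$p.value` inside the loop overwrites the matrix on every pass; assigning to `tt_pvalue[i]` of a preallocated vector would keep one value per file.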
Re: [R] Use dump or write? or what?
Greetings all,

Thanks for all your help so far. Let me give a better idea of what I am doing. I have hundreds of files that I need to plow through with a t-test and a correlation test. BTW, 'tempA' and 'tempB' are simply columns of numbers from a gene-chip experiment that spits out DNA 'amounts'.

So I have set up a loop to read the files and carry out the tests, but need to save the results for later inspection (and Jim H, you are probably right, for later inspection). By inspection I mean I don't know what I want to do with it yet. Remember: that's why they call it Research.

So it seems that 'save/load' might be a good alternative for my work.

Any suggestions,
M

On Sun, Jul 31, 2011 at 11:41 PM, Matt Curcio matt.curcio...@gmail.com wrote:
> Greetings all,
> I am calculating two t-test values for each of many files, then save
> it to file, calculate another set and append, repeat.
> But I can't figure out how to write it to file and then append
> subsequent t-tests.
> (maybe too tired ;} )
> I have tried to use dump and file.append to no avail.
>
>     ttest_results = tempfile()
>     two_sample_ttest <- t.test(tempA, tempB, var.equal = TRUE)
>     welch_ttest <- t.test(tempA, tempB, var.equal = FALSE)
>     dump(two_sample_ttest, file = "dumpdata.txt", append=TRUE)
>     ttest_results <- file.append(ttest_results, two_sample_ttest)
>
> Any suggestions,
> M
> --
> Matt Curcio
> M: 401-316-5358
> E: matt.curcio...@gmail.com

--
Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com
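A minimal sketch of the save/load route mentioned above, using simulated numbers in place of the gene-chip columns. The complete t.test objects are collected in a named list and written once, so nothing has to be decided yet about which parts will be inspected later. The list layout and file name are assumptions for illustration.

```r
set.seed(2)
# Simulated stand-ins for the two columns of one file.
tempA <- rnorm(10)
tempB <- rnorm(10, mean = 0.5)

# Keep whole test objects; more files can be added as further list entries.
results <- list(
  two_sample = t.test(tempA, tempB, var.equal = TRUE),
  welch      = t.test(tempA, tempB, var.equal = FALSE)
)

path <- tempfile(fileext = ".RData")
save(results, file = path)

# Later session: load() restores the object under its original name.
restored <- new.env()
load(path, envir = restored)
restored$results$welch$p.value
```

Loading into a separate environment, as here, avoids silently overwriting an object called `results` that already exists in the workspace.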
[R] Errors, driving me nuts
Greetings all,

I am getting this error that is driving me nuts... (not a long trip, haha)

I have a set of files, and in these files I want to calculate t-tests on columns 'compareA' and 'compareB' (these will change over time, therefore I want a variable here). Also, these files are in many different directories, so I want a way to filter out the junk... Anyway, I don't believe that this is related to my errors, but I mention it nonetheless.

    > files_to_test <- list.files(pattern = "kegg.combine")
    > for (i in 1:length(files_to_test)) {
    +     raw_data <- read.table(files_to_test[i], header=TRUE, sep=" ")
    +     tmpA <- raw_data[,compareA]
    +     tmpB <- raw_data[,compareB]
    +     tt <- t.test(tmpA, tmpB, var.equal=TRUE)
    +     tt_pvalue[i] <- tt$p.value
    + }
    Error in tt_pvalue[i] <- tt$p.value : object 'tt_pvalue' not found

    # I tried setting up a vector...
    # as.vector(tt_pvalue, mode="any")
    ### but NO GO

    > file.name = paste("ttest.results.", compareA, compareB, "")
    > setwd(save_to)
    > write.table(tt_pvalue, file=file.name, sep="\t")
    Error in inherits(x, "data.frame") : object 'tt_pvalue' not found

    # No idea?? What is going wrong??

M

Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com
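The "object 'tt_pvalue' not found" message arises because an element can only be assigned into an object that already exists. A minimal sketch of the usual fix, preallocating the result vector before the loop, is below; a list of simulated data frames stands in for the files, and the column names are placeholders.

```r
set.seed(1)
compareA <- "x"
compareB <- "y"

# Stand-in for the files: three small simulated tables.
datasets <- replicate(3,
                      data.frame(x = rnorm(8), y = rnorm(8, mean = 1)),
                      simplify = FALSE)

# Create the result vector first, one slot per input.
tt_pvalue <- numeric(length(datasets))

for (i in seq_along(datasets)) {
  raw_data <- datasets[[i]]      # with files: read.table(files_to_test[i], ...)
  tt <- t.test(raw_data[, compareA], raw_data[, compareB], var.equal = TRUE)
  tt_pvalue[i] <- tt$p.value
}

tt_pvalue
```

seq_along() is also safer than 1:length(x): when the file list is empty it yields no iterations, whereas 1:0 would run the loop twice.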
[R] Appending 4 Digits On A File Name
Greetings all,

I would like to append a 4 digit number suffix to the names of my files for later use. What I am using now only produces 1 or 2 or 3 or 4 digits.

    for (i in 1:1000) {
        temp <- (kegg[i,])
        temp <- merge(temp, subrichcdt, by="gene")
        file.name <- paste("kegg.subrichcdt.", i, ".txt", sep="")
        write.table(temp, file=file.name)
    }

    ### But I want:
    kegg.subrichcdt.0001.txt
    kegg.subrichcdt.0002.txt, ...

Any suggestions?
M

--
Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com
Re: [R] Appending 4 Digits On A File Name
Hmmm... Got this error:

    Error in formatC(i, width = 4, format = "d", flat = 0) :
      unused argument(s) (flat = 0)

Any ideas?
M

On Sun, Jul 31, 2011 at 1:30 PM, Matt Curcio matt.curcio...@gmail.com wrote:
> Greetings all,
> I would like to append a 4 digit number suffix to the names of my
> files for later use. What I am using now only produces 1 or 2 or 3 or
> 4 digits.
>
>     for (i in 1:1000) {
>         temp <- (kegg[i,])
>         temp <- merge(temp, subrichcdt, by="gene")
>         file.name <- paste("kegg.subrichcdt.", i, ".txt", sep="")
>         write.table(temp, file=file.name)
>     }
>
>     ### But I want:
>     kegg.subrichcdt.0001.txt
>     kegg.subrichcdt.0002.txt, ...
>
> Any suggestions?
> M
> --
> Matt Curcio
> M: 401-316-5358
> E: matt.curcio...@gmail.com

--
Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com
Re: [R] Appending 4 Digits On A File Name
Michael,

Got it, thanks. Looking over the man page, I realized it is FLAG, not flat.

Cheers,
M

On Sun, Jul 31, 2011 at 2:26 PM, Matt Curcio matt.curcio...@gmail.com wrote:
> Hmmm... Got this error:
>
>     Error in formatC(i, width = 4, format = "d", flat = 0) :
>       unused argument(s) (flat = 0)
>
> Any ideas?
> M
>
> On Sun, Jul 31, 2011 at 1:30 PM, Matt Curcio matt.curcio...@gmail.com wrote:
>> Greetings all,
>> I would like to append a 4 digit number suffix to the names of my
>> files for later use. What I am using now only produces 1 or 2 or 3
>> or 4 digits.
>>
>>     for (i in 1:1000) {
>>         temp <- (kegg[i,])
>>         temp <- merge(temp, subrichcdt, by="gene")
>>         file.name <- paste("kegg.subrichcdt.", i, ".txt", sep="")
>>         write.table(temp, file=file.name)
>>     }
>>
>>     ### But I want:
>>     kegg.subrichcdt.0001.txt
>>     kegg.subrichcdt.0002.txt, ...
>>
>> Any suggestions?
>> M
>> --
>> Matt Curcio
>> M: 401-316-5358
>> E: matt.curcio...@gmail.com

--
Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com
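For reference, a short sketch of the corrected call resolved in this thread, using the `flag` argument of formatC() to zero-pad the counter to four digits. The sprintf() line is an equivalent alternative that was not part of the thread.

```r
i <- c(1L, 2L, 10L, 1000L)

# flag = "0" pads with zeros up to the requested width.
suffix <- formatC(i, width = 4, format = "d", flag = "0")
file.names <- paste("kegg.subrichcdt.", suffix, ".txt", sep = "")
file.names

# Equivalent with sprintf(): %04d means integer, width 4, zero padded.
file.names2 <- sprintf("kegg.subrichcdt.%04d.txt", i)
```

Zero-padded names also sort correctly as text, which helps when the files are listed again later.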
[R] Use dump or write? or what?
Greetings all,

I am calculating two t-test values for each of many files, then save it to file, calculate another set and append, repeat. But I can't figure out how to write it to file and then append subsequent t-tests. (maybe too tired ;} )

I have tried to use dump and file.append to no avail.

    ttest_results = tempfile()
    two_sample_ttest <- t.test(tempA, tempB, var.equal = TRUE)
    welch_ttest <- t.test(tempA, tempB, var.equal = FALSE)
    dump(two_sample_ttest, file = "dumpdata.txt", append=TRUE)
    ttest_results <- file.append(ttest_results, two_sample_ttest)

Any suggestions,
M

--
Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com
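One possible approach to the append problem above, sketched on simulated data: reduce each pair of tests to a one-row data frame and append it to a tab-separated text file with write.table(). The column names and the two-run loop are illustrative assumptions. As background, dump() expects a character vector of object names (for example "two_sample_ttest") and file.append() joins files on disk, so neither accepts a test object directly.

```r
set.seed(3)
out_file <- tempfile(fileext = ".txt")

for (run in 1:2) {
  # Simulated stand-ins for the two columns of one input file.
  tempA <- rnorm(10)
  tempB <- rnorm(10, mean = 0.5)

  row <- data.frame(
    run          = run,
    two_sample_p = t.test(tempA, tempB, var.equal = TRUE)$p.value,
    welch_p      = t.test(tempA, tempB, var.equal = FALSE)$p.value
  )

  # Write the header only once, then append rows without it.
  write.table(row, file = out_file, sep = "\t", row.names = FALSE,
              col.names = (run == 1), append = (run > 1))
}

all_results <- read.delim(out_file)
all_results
```

The resulting file is plain text, so it can be opened in a spreadsheet or read back with read.delim() at any time.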
Re: [R] Bootstrap
In order to apply the bootstrap, you must resample, uniformly at random, from the independent units of measurement in your data. Assuming that these are the rows of 'data', consider the following:

    est <- function(y, x, obeta = c(1,1), verbose=FALSE) {
        n <- length(x)
        X <- cbind(rep(1, n), x)
        nbeta <- c(0,0)
        iter <- 0
        while(crossprod(obeta-nbeta) > 10^(-12)) {
            nbeta <- obeta
            eta <- X%*%nbeta
            mu <- eta
            mu1 <- 1/eta
            W <- diag(as.vector(mu1))
            Z <- X%*%nbeta+(y-mu)
            XWX <- t(X)%*%W%*%X
            XWZ <- t(X)%*%W%*%Z
            Cov <- solve(XWX)
            obeta <- Cov%*%XWZ
            iter <- iter+1
            if(verbose)
                cat("Iteration # and beta1= ", iter, nbeta, "\n")
        }
        return(nbeta[1,1])
    }

    boot <- function(data, reps) {
        n <- nrow(data)
        Nt <- vector('numeric', length=reps)
        for(Ncount in 1:reps) {
            # resample the rows of data
            bdata <- data[sample(1:n,n,replace=TRUE),]
            # recompute and store estimate
            Nt[Ncount] <- est(bdata[,1], bdata[,2])
        }
        return(Nt)
    }

    > stem(boot(data,1000),width=60)

      The decimal point is at the |

      -3 | 4
      -2 |
      -1 | 2
      -0 | 88866555444333222111
       0 | 0022+400
       1 | 0001+203
       2 | 2224+23
       3 | 112223344455
       4 | 113344555789
       5 | 02334446677899
       6 | 1112334455778
       7 | 11235568
       8 | 001799
       9 | 0259
      10 | 1446
      11 | 19
      12 | 48
      13 | 8
      14 | 024
      15 |
      16 |
      17 | 0788
      18 |
      19 | 1

On Wed, 2011-07-20 at 18:09 -0400, Val wrote:
> Hi all,
>
> I am facing difficulty with how to use bootstrap sampling; below is
> my example function.
>
> Read the data, use some functions, and iterate to find the solution
> (i.e., until convergence is reached). I want to use the bootstrap
> approach to repeat this whole process several times (200 or 300
> times) and see the distribution of the parameter of interest.
>
> Below is a small example that resembles my problem. However, I found
> out all samples are the same. So I would appreciate your help on this
> case.
>
>     #**
>     rm(list=ls())
>     xx <- read.table(textConnection("y x
>     11 5.16
>     11 4.04
>     14 3.85
>     19 5.68
>     4 1.26
>     23 7.89
>     15 4.25
>     17 3.94
>     7 2.35
>     17 4.74
>     14 5.49
>     11 4.12
>     17 5.92"), header=TRUE)
>     data <- as.matrix(xx)
>     closeAllConnections()
>     Nt <- NULL
>     for (Ncount in 1:100) {
>         y <- data[,1]
>         x <- data[,2]
>         n <- length(x)
>         X <- cbind(rep(1,n),x)  # covariate/design matrix
>         obeta <- c(1,1)         # previous/starting values of beta
>         nbeta <- c(0,0)         # new beta
>         iter=0
>         while(crossprod(obeta-nbeta) > 10^(-12)) {
>             nbeta <- obeta
>             eta <- X%*%nbeta
>             mu <- eta
>             mu1 <- 1/eta
>             W <- diag(as.vector(mu1))
>             Z <- X%*%nbeta+(y-mu)
>             XWX <- t(X)%*%W%*%X
>             XWZ <- t(X)%*%W%*%Z
>             Cov <- solve(XWX)
>             obeta <- Cov%*%XWZ
>             iter <- iter+1
>             cat("Iteration # and beta1= ", iter, nbeta, "\n")
>         }
>         Nt[Ncount] <- nbeta[1,1]
>     }
>     Nt
>     summary(Nt)
>     #**
[R] hold position of vertices constant in network {statnet}?
I am a novice with network functions! I have been exploring the network function in the statnet package, but haven't been able to figure out how to hold vertices in position while varying edge features. Can anyone advise on whether this is possible, and if so, how to do it?

Thanks!

--
Matthew Bakker, Ph.D.
Department of Plant Pathology
University of Minnesota
495 Borlaug Hall
1991 Upper Buford Circle
Saint Paul, MN 55108 USA
Re: [R] How to capture console output in a numeric format
Ravi,

Consider using an environment (i.e. a 'reference' object) to store the results, avoiding string manipulation and the potential for loss of precision:

    fr <- function(x, env) { ## Rosenbrock Banana function
        x1 <- x[1]
        x2 <- x[2]
        f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
        if(exists('fout', env))
            fout <- rbind(get('fout', env), c(x1, x2, f))
        else
            fout <- c(x1=x1, x2=x2, f=f)
        assign('fout', fout, env)
        f
    }

    out <- new.env()
    ans <- optim(c(-1.2, 1), fr, env=out)
    out$fout

Best,
Matt

On Fri, 2011-06-24 at 15:10 +0000, Ravi Varadhan wrote:
> Thank you very much, Jim. That works!
>
> I did know that I could process the character strings using regex,
> but was also wondering if there was a direct way to get this.
>
> Suppose, in the current example, I would like to obtain a 3-column
> matrix that contains the parameters and the function value:
>
>     fr <- function(x) { ## Rosenbrock Banana function
>         on.exit(print(cbind(x1, x2, f)))
>         x1 <- x[1]
>         x2 <- x[2]
>         f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
>         f
>     }
>
>     fvals <- capture.output(ans <- optim(c(-1.2,1), fr))
>
> Now, I need to tweak your solution to get the 3-column matrix. It
> would be nice if there was a more direct way to get the numerical
> output, perhaps a numeric option in capture.output().
>
> Best,
> Ravi.
>
> ---
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology
> School of Medicine
> Johns Hopkins University
> Ph. (410) 502-2619
> email: rvarad...@jhmi.edu
>
> -----Original Message-----
> From: jim holtman [mailto:jholt...@gmail.com]
> Sent: Friday, June 24, 2011 10:48 AM
> To: Ravi Varadhan
> Cc: r-help@r-project.org
> Subject: Re: [R] How to capture console output in a numeric format
>
> try this:
>
>     > fr <- function(x) { ## Rosenbrock Banana function
>     +     on.exit(print(f))
>     +     x1 <- x[1]
>     +     x2 <- x[2]
>     +     f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
>     +     f
>     + }
>     > fvals <- capture.output(ans <- optim(c(-1.2,1), fr))
>     > # convert to numeric
>     > fvals <- as.numeric(sub("^.* ", "", fvals))
>     > fvals
>      [1] 24.20 7.095296 15.08 4.541696
>      [5] 6.029216 4.456256 8.879936 7.777856
>      [9] 4.728125 5.167901 4.21 4.437670
>     [13] 4.178989 4.326023 4.070813 4.221489
>     [17] 4.039810 4.896359 4.009379 4.077130
>     [21] 4.020798 3.993600 4.024586 4.117625
>     [25] 3.993115 3.976081 3.971089 4.023905
>     [29] 3.980807 3.952577 3.932179 3.935345
>
> On Fri, Jun 24, 2011 at 10:39 AM, Ravi Varadhan rvarad...@jhmi.edu wrote:
>> Hi,
>>
>> I would like to know how to capture the console output from running
>> an algorithm for further analysis. I can capture this using
>> capture.output(), but that yields a character vector. I would like
>> to extract the actual numeric values. Here is an example of what I
>> am trying to do.
>>
>>     fr <- function(x) { ## Rosenbrock Banana function
>>         on.exit(print(f))
>>         x1 <- x[1]
>>         x2 <- x[2]
>>         f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
>>         f
>>     }
>>
>>     fvals <- capture.output(ans <- optim(c(-1.2,1), fr))
>>
>> Now, `fvals' contains character elements, but I would like to obtain
>> the actual numerical values. How can I do this?
>>
>> Thanks very much for any suggestions.
>>
>> Best,
>> Ravi.
>>
>> ---
>> Ravi Varadhan, Ph.D.
>> Assistant Professor,
>> Division of Geriatric Medicine and Gerontology
>> School of Medicine
>> Johns Hopkins University
>> Ph. (410) 502-2619
>> email: rvarad...@jhmi.edu

--
Matthew S. Shotwell
Assistant Professor, Department of Biostatistics
School of Medicine, Vanderbilt University
1161 21st Ave. S2323 MCN Office CC2102L
Nashville, TN 37232-2158
Re: [R] How to capture console output in a numeric format
On Fri, 2011-06-24 at 12:09 -0400, David Winsemius wrote:
> On Jun 24, 2011, at 11:27 AM, Matt Shotwell wrote:
>
>> Ravi,
>>
>> Consider using an environment (i.e. a 'reference' object) to store
>> the results, avoiding string manipulation and the potential for loss
>> of precision:
>>
>>     fr <- function(x, env) { ## Rosenbrock Banana function
>>         x1 <- x[1]
>>         x2 <- x[2]
>>         f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
>>         if(exists('fout', env))
>>             fout <- rbind(get('fout', env), c(x1, x2, f))
>
> So _that's_ what a reference object is?

Well, environments have 'pass-by-reference' behavior. That is, when they are passed to a function, modifications to the environment persist outside the function call. This is distinct from the Reference class (?methods::ReferenceClass), but there are similar concepts. The methods of a reference class can modify the class fields in a 'by-reference' fashion. However, the fields need not be passed to a method.

> This seems to give the same results in this example. Am I committing
> any sins by sneaking around the get()?
>
>     if(exists('fout', env))
>         fout <- rbind(env[['fout']], c(x1, x2, f))  # seems more direct

'env$fout' works here too.

> Thinking I also might be able to avoid the later assign(), I tried
> these without success.
>
>     fr <- function(x, env) { ## Rosenbrock Banana function
>         x1 <- x[1]
>         x2 <- x[2]
>         f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
>         if(exists('fout', env))
>             env[['fout']] <- rbind(env[['fout']], c(x1, x2, f))
>         else
>             fout <- c(x1=x1, x2=x2, f=f)
>         f
>     }

This would work with 'env$fout <- c(x1=x1, x2=x2, f=f)' following the 'else'. Hence, David's version might look like this:

    fr <- function(x, env) { ## Rosenbrock Banana function
        x1 <- x[1]
        x2 <- x[2]
        f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
        if(exists('fout', env))
            env$fout <- rbind(env$fout, c(x1, x2, f))
        else
            env$fout <- c(x1=x1, x2=x2, f=f)
        f
    }

    out <- new.env()
    ans <- optim(c(-1.2, 1), fr, env=out)
    out$fout

-Matt

>     out <- new.env()
>     ans <- optim(c(-1.2, 1), fr, env=out)
>     out$fout
>     # NULL
>
> Is there no '[[<-' for environments? (Also tried '<<-' but I know
> that is sinful/ )
>
> --
> David.
>
>>         else
>>             fout <- c(x1=x1, x2=x2, f=f)
>>         assign('fout', fout, env)
>>         f
>>     }
>>
>>     out <- new.env()
>>     ans <- optim(c(-1.2, 1), fr, env=out)
>>     out$fout
>>
>> Best,
>> Matt
>>
>> On Fri, 2011-06-24 at 15:10 +0000, Ravi Varadhan wrote:
>>> Thank you very much, Jim. That works!
>>>
>>> I did know that I could process the character strings using regex,
>>> but was also wondering if there was a direct way to get this.
>>>
>>> Suppose, in the current example, I would like to obtain a 3-column
>>> matrix that contains the parameters and the function value:
>>>
>>>     fr <- function(x) { ## Rosenbrock Banana function
>>>         on.exit(print(cbind(x1, x2, f)))
>>>         x1 <- x[1]
>>>         x2 <- x[2]
>>>         f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
>>>         f
>>>     }
>>>
>>>     fvals <- capture.output(ans <- optim(c(-1.2,1), fr))
>>>
>>> Now, I need to tweak your solution to get the 3-column matrix. It
>>> would be nice if there was a more direct way to get the numerical
>>> output, perhaps a numeric option in capture.output().
>>>
>>> Best,
>>> Ravi.
>>>
>>> ---
>>> Ravi Varadhan, Ph.D.
>>> Assistant Professor,
>>> Division of Geriatric Medicine and Gerontology
>>> School of Medicine
>>> Johns Hopkins University
>>> Ph. (410) 502-2619
>>> email: rvarad...@jhmi.edu
>>>
>>> -----Original Message-----
>>> From: jim holtman [mailto:jholt...@gmail.com]
>>> Sent: Friday, June 24, 2011 10:48 AM
>>> To: Ravi Varadhan
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] How to capture console output in a numeric format
>>>
>>> try this:
>>>
>>>     > fr <- function(x) { ## Rosenbrock Banana function
>>>     +     on.exit(print(f))
>>>     +     x1 <- x[1]
>>>     +     x2 <- x[2]
>>>     +     f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
>>>     +     f
>>>     + }
>>>     > fvals <- capture.output(ans <- optim(c(-1.2,1), fr))
>>>     > # convert to numeric
>>>     > fvals <- as.numeric(sub("^.* ", "", fvals))
>>>     > fvals
>>>      [1] 24.20 7.095296 15.08 4.541696
>>>      [5] 6.029216 4.456256 8.879936 7.777856
>>>      [9] 4.728125 5.167901 4.21 4.437670
>>>     [13] 4.178989 4.326023 4.070813 4.221489
>>>     [17] 4.039810 4.896359 4.009379 4.077130
>>>     [21] 4.020798 3.993600 4.024586 4.117625
>>>     [25] 3.993115 3.976081 3.971089 4.023905
>>>     [29] 3.980807 3.952577 3.932179 3.935345
>>>
>>> On Fri, Jun 24, 2011 at 10:39 AM, Ravi Varadhan rvarad...@jhmi.edu wrote:
>>>> Hi,
>>>>
>>>> I would like to know how to capture the console output from
>>>> running an algorithm for further analysis. I can capture this
>>>> using capture.output() but that yields a character vector. I would
>>>> like
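The 'pass-by-reference' point made in this thread can be seen in a few lines. The sketch below contrasts an environment with an ordinary list passed to otherwise identical functions; `bump_env` and `bump_list` are made-up names for illustration.

```r
# Modifies the environment it was given; the change is visible to the caller.
bump_env <- function(env) {
  env$count <- env$count + 1
  invisible(NULL)
}

# Modifies only a local copy of the list; the caller's list is untouched.
bump_list <- function(lst) {
  lst$count <- lst$count + 1
  invisible(NULL)
}

e <- new.env()
e$count <- 0
l <- list(count = 0)

bump_env(e)
bump_list(l)

e$count   # changed by the call
l$count   # unchanged
```

This is why the objective function in the thread can accumulate rows in `out` across the many calls made by optim() without returning anything extra.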
[R] analysing a three level reponse
Hello,

I am struggling to figure out how to analyse a dataset I have inherited (please note this was conducted some time ago, so the data is as it is, and I know it isn't perfect!). A brief description of the experiment follows:

Pots of grass were grown in 1 L pots of standard potting medium for 1 month with a regular light and watering regime. At this point they were randomly given 1 L of one of 4 different pesticides at one of 4 different concentrations (100%, 75%, 50% or 25% in water). There were 20 pots of grass for each pesticide/concentration combination, giving 320 pots. There were no control (untreated) pots.

The response was measured after 1 week and recorded as one of:

B1 - grass dead
B2 - grass affected but not dead
B3 - no visible effect

I could analyse this with a binomial model, either as lethal effect vs non-lethal effect (B1 vs B2+B3) or as some effect vs no effect (B1+B2 vs B3), but I can't see how to do it with three levels.

Any pointing in the right direction greatly appreciated!

Thanks
Matt
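For the archive: one common way to handle an ordered three-level response like B3 < B2 < B1 is a proportional-odds (ordinal logistic) model. The sketch below is only an illustration of that idea, not an analysis of the real experiment. It assumes the MASS package (shipped with R) and uses simulated data; the column names `pesticide`, `conc` and `response`, and the effect sizes used in the simulation, are invented.

```r
library(MASS)  # recommended package shipped with R; provides polr()

set.seed(1)
# Simulated stand-in for the real data: 4 pesticides x 4 concentrations x 20 pots
pots <- expand.grid(rep = 1:20,
                    conc = c(25, 50, 75, 100),
                    pesticide = factor(paste0("P", 1:4)))

# Hypothetical latent "damage" score; larger means more damage
damage <- 0.04 * pots$conc + 0.5 * as.integer(pots$pesticide) +
  rlogis(nrow(pots))

# Ordered response from least to most severe: B3 < B2 < B1
pots$response <- cut(damage, breaks = c(-Inf, 2.5, 4.5, Inf),
                     labels = c("B3", "B2", "B1"), ordered_result = TRUE)

# Proportional-odds model with pesticide and concentration as predictors
fit <- polr(response ~ pesticide + conc, data = pots, Hess = TRUE)
print(summary(fit))
```

Whether the proportional-odds assumption is reasonable for the real data would need checking; a multinomial model is the usual fallback when it is not.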
Re: [R] Elbow criterion
On Mon, 2011-06-20 at 13:38 +0200, Dominik P.H. Kalisch wrote:
> Hi,
> I would like to cluster a dataset with the ward algorithm.

I'm assuming that this refers to the agglomerative partitioning method [1]. That is, the number of clusters is selected according to the data partition that is sequentially optimal with respect to an `objective function'.

In order to apply the elbow criterion, it should be possible to optimize over subsets of all possible data partitions where the number of clusters is fixed. Although the Ward method yields a sequence of data partitions with a decreasing number of clusters, there is no guarantee that _any_ of these partitions are optimal (except sequentially, of course). To apply the elbow method post hoc seems dubious, but maybe no more so than the Ward method itself.

There are clustering methods that optimize the data partition (w.r.t. a likelihood/posterior) with a fixed number of clusters, for instance, those based on finite mixture models. The elbow principle and method seem more valid in this context. See the R package 'mclust', and the CRAN task view for cluster analysis: http://cran.r-project.org/web/views/Cluster.html

> That works fine. But I can't find a method to plot the structure chart
> to estimate the elbow criterion for the number of clusters. Can someone
> tell me how I can do it?
> Thanks for your help.
> Dominik

[1] Ward, J. H. (1963), “Hierarchical Grouping to Optimize an Objective Function,” Journal of the American Statistical Association, 58, 236–244.

--
Matthew S. Shotwell
Assistant Professor, Department of Biostatistics
School of Medicine, Vanderbilt University
1161 21st Ave. S2323 MCN Office CC2102L
Nashville, TN 37232-2158
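For readers who still want the plot Dominik asked about, here is a minimal sketch, with the caveats above still applying. It assumes a current R where Ward's criterion is available as method = "ward.D2" in hclust() (in 2011 the method was called "ward"), uses the USArrests data shipped with R as a stand-in, and the helper `wss` is a name made up for this example.

```r
X <- scale(USArrests)                      # example data shipped with R
hc <- hclust(dist(X), method = "ward.D2")  # Ward's criterion on Euclidean distances

# Total within-cluster sum of squares for a given cluster assignment
wss <- function(cl) {
  sum(sapply(split(as.data.frame(X), cl),
             function(g) sum(scale(g, scale = FALSE)^2)))
}

# Cut the tree at k = 1, ..., 10 clusters and record the criterion
k <- 1:10
wss_k <- sapply(k, function(i) wss(cutree(hc, k = i)))

# The "elbow" is read off this curve by eye
plot(k, wss_k, type = "b", xlab = "Number of clusters",
     ylab = "Total within-cluster sum of squares")
```

Because the partitions from cutree() are nested, the curve can only go down as k grows; where it flattens is a judgment call.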
[R] Factor Analysis/Inputting Correlation Matrix
Can someone please direct me to how to run a factor analysis in R by first inputting a correlation matrix? Does the function factanal allow one to read a correlation matrix instead of data vectors? Thanks, Matt. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
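For the archive: factanal() in the stats package takes a covariance or correlation matrix through its covmat argument, so no raw data are needed. A minimal sketch, assuming the ability.cov example data shipped with R stand in for your own matrix; supplying n.obs is optional, but without it the goodness-of-fit test is not computed.

```r
# ability.cov ships with R: a covariance matrix plus the sample size
R <- cov2cor(ability.cov$cov)   # turn it into a correlation matrix

# Fit a two-factor model directly from the correlation matrix
fit <- factanal(covmat = R, factors = 2, n.obs = ability.cov$n.obs)
print(fit$loadings)
```

Scores cannot be computed this way, since they require the original observations.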
Re: [R] Can we prepare a questionaire in R
As Mike had written, there are frameworks for web-development with R. RApache http://www.rapache.net is one. Also, see the R package Rook: http://cran.r-project.org/web/packages/Rook/index.html . On Wed, 2011-06-08 at 17:26 +0530, amrita gs wrote: How can we create HTML forms in R Wouldn't you rather create HTML forms in HTML? See the links above to use R for server-side scripting, for example, to receive form data from a web browser. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about curve function
On Tue, 2011-06-07 at 16:17 +0200, Uwe Ligges wrote:

On 07.06.2011 11:57, peter dalgaard wrote:

On Jun 6, 2011, at 11:22 , Prof Brian Ripley wrote:

As a further example of the trickiness, the function method of plot() relies on curve(x, ...) being a request to plot the function x(x) against x. I've added a comment to that effect to the help page.

Ouch. This springs to mind:

    fortune(106)
    If the answer is parse() you should usually rethink the question.
       -- Thomas Lumley
          R-help (February 2005)

but curve() predates that insight by half a decade or more. It could probably do with a redesign, if anyone is up to it.

By the way, it really does work if the 2nd arg is an expression object (as opposed to an expression evaluating to an expression object):

    do.call(curve, list(expression(x)))

or

    cl <- quote(curve(x))
    cl[[2]] <- expression(x)
    eval(cl)

(The trouble with nonstandard evaluation is that it doesn't follow standard evaluation rules...)

If this is not already a fortune, I will add it.

And one more for Uwe's principle: when discontent, circumvent! :)

Which is why I usually circumvent curve(). It is typically faster to just evaluate a function at positions x and plot it rather than thinking minutes about how curve() expects its arguments.

Uwe
Re: [R] Question about curve function
I think there is trouble because expr in curve(expr) may be the name of a function, and it's ambiguous whether 'x' should be interpreted as a mathematical expression involving x, or the name of a function. Here are some examples that work: curve(I(x)) curve(1*x) On Sun, 2011-06-05 at 12:07 -0500, Abhilash Balakrishnan wrote: Dear Sirs, I am a new user of the R package. When I try to use the curve function it confuses me. curve(x^2) Works fine. curve(x) Makes a complaint I don't understand. Why is x^2 valid and x is not? I check the documentation of curve, and it says the first argument must be an expression containing x. expression(x) Is an expression containing x. curve(expression(x)) Makes a different complaint and mentions different lengths of x and y (but I use no y here). I understand that plotting the function y(x) = x is rather silly, but I want to know what I am doing wrong, for the sake of my understanding of how R works. Thank you for support. Abhilash B. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
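To make the workaround above concrete, here is a small sketch. It shows the curve(1*x) form that works, next to the plain alternative of evaluating on a grid and calling plot(), which sidesteps curve()'s special argument rules altogether. The grid size of 101 points is an arbitrary choice for the example.

```r
# curve() accepts an expression in x; 1 * x is unambiguously such an expression
res <- curve(1 * x, from = 0, to = 1, n = 101)   # draws the identity line

# Alternative: evaluate on a grid yourself and use plot()
x <- seq(0, 1, length.out = 101)
plot(x, x, type = "l", ylab = "y")
```

curve() returns the evaluated points invisibly, which is why `res` can be inspected afterwards.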
Re: [R] creating a vector from a file
On Tue, 2011-05-31 at 15:36 +0200, heimat los wrote:
> Hello all,
> I am new to R and my question should be trivial. I need to create a word
> cloud from a txt file containing the words and their occurrence number. For
> that purpose I am using the snippets package [1].
> As can be seen at the bottom of the link, first I have to create a vector
> (is it right that words is a vector?) like below.
>
>     words <- c(apple=10, pie=14, orange=5, fruit=4)
>
> My problem is to do the same thing but create the vector from a file which
> would contain words and their occurrence number. I would be very happy if you
> could give me some hints.

How is the file formatted? Can you provide a small example?

> Moreover, to understand the format of the file to be inserted I write the
> vector words to a file.
>
>     write(words, file="words.txt")
>
> However, the file words.txt contains only the values but not the
> names (apple, pie etc.).
>
>     $ cat words.txt
>     10 14 5 4
>
> It seems that I have to understand more about the data types in R.
>
> Thanks.
> PH
>
> [1] http://www.rforge.net/doc/packages/snippets/cloud.html
Re: [R] creating a vector from a file
On Tue, 2011-05-31 at 16:19 +0200, heimat los wrote:
> On Tue, May 31, 2011 at 4:12 PM, Matt Shotwell m...@biostatmatt.com wrote:
> > On Tue, 2011-05-31 at 15:36 +0200, heimat los wrote:
> > > Hello all,
> > > I am new to R and my question should be trivial. I need to create a word
> > > cloud from a txt file containing the words and their occurrence number. For
> > > that purpose I am using the snippets package [1].
> > > As can be seen at the bottom of the link, first I have to create a vector
> > > (is it right that words is a vector?) like below.
> > >
> > >     words <- c(apple=10, pie=14, orange=5, fruit=4)
> > >
> > > My problem is to do the same thing but create the vector from a file which
> > > would contain words and their occurrence number. I would be very happy if you
> > > could give me some hints.
> >
> > How is the file formatted? Can you provide a small example?
>
> The file format is
>
>     video tape=8
>     object recognition=45
>     object detection=23
>     vhs tape=2
>
> But I can change it if needed with bash scripting.

A CSV might be more universal, but this will do.

> Regards

OK. Save the above as 'words.txt', then from the R prompt:

    words.df <- read.table("words.txt", sep="=")
    words.vec <- words.df$V2
    names(words.vec) <- words.df$V1

Then use words.vec with the snippets::cloud function. I wasn't able to install the snippets package and test the cloud function, because I am still using R 2.13.0-alpha.

read.table returns what R calls a 'data frame'; basically a collection of records over some number of fields. It's like a matrix but different, since fields may take values of different types. In the example above, the data frame returned by read.table has two fields named 'V1' and 'V2', respectively. The R expression 'words.df$V2' references the 'V2' field of words.df, which is a vector. The last expression sets names for words.vec, by referencing the 'V1' field of words.df.

> > > Moreover, to understand the format of the file to be inserted I write the
> > > vector words to a file.
> > >
> > >     write(words, file="words.txt")
> > >
> > > However, the file words.txt contains only the values but not the
> > > names (apple, pie etc.).
> > >
> > >     $ cat words.txt
> > >     10 14 5 4
> > >
> > > It seems that I have to understand more about the data types in R.
> > >
> > > Thanks.
> > > PH
> > >
> > > [1] http://www.rforge.net/doc/packages/snippets/cloud.html
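A self-contained version of the steps above, for anyone who wants to try them without creating words.txt by hand. This is a sketch under a few assumptions: the file is written to a temporary location, the column names `word` and `count` are made up for readability, and stringsAsFactors = FALSE is set explicitly so that the words stay character strings on older R versions.

```r
# Write a small example file in the "word=count" format described above
path <- tempfile(fileext = ".txt")
writeLines(c("video tape=8", "object recognition=45",
             "object detection=23", "vhs tape=2"), path)

# Split each line at "="; keep words as character strings, not factors
words.df <- read.table(path, sep = "=", col.names = c("word", "count"),
                       stringsAsFactors = FALSE)

# Named numeric vector, the same shape as c(apple=10, pie=14, ...)
words.vec <- setNames(words.df$count, words.df$word)
print(words.vec)
```

As for the write() puzzle: write() only emits the values, so the names are lost. Something like write.table() on a two-column data frame would keep them.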
Re: [R] blank space escape sequence in R?
You can embed hex escapes in strings (except \x00). The value(s) that you embed will depend on the character encoding used on your platform. If this is UTF-8, or some other ASCII-compatible encoding, \x20 will work:

    "foo\x20bar"
    [1] "foo bar"

For other locales, you might try charToRaw(" ") to see the binary (hex) representation for the space character on your platform, and substitute this sequence instead.

On Mon, 2011-04-25 at 15:01 +0200, Mark Heckmann wrote:
> Is there a blank space escape sequence in R, i.e. something like \sp
> etc. to produce a blank space?
>
> TIA
> Mark
> ---
> Mark Heckmann
> Blog: www.markheckmann.de
> R-Blog: http://ryouready.wordpress.com
Re: [R] blank space escape sequence in R?
I may have misread your original email. Whether you use a hex escape or a space character, the resulting string in memory is identical:

    identical("a\x20b", "a b")
    [1] TRUE

But, if you were to read a file containing the six characters a\x20b (say with readLines), then the six characters would be read into memory, and printed like this:

    "a\\x20b"

That is, not with a space character substituted for \x20. So, now I'm not sure this is a solution.

On Mon, 2011-04-25 at 12:24 -0500, Matt Shotwell wrote:
> You can embed hex escapes in strings (except \x00). The value(s) that
> you embed will depend on the character encoding used on your platform. If
> this is UTF-8, or some other ASCII-compatible encoding, \x20 will work:
>
>     "foo\x20bar"
>     [1] "foo bar"
>
> For other locales, you might try charToRaw(" ") to see the binary (hex)
> representation for the space character on your platform, and substitute
> this sequence instead.
>
> On Mon, 2011-04-25 at 15:01 +0200, Mark Heckmann wrote:
> > Is there a blank space escape sequence in R, i.e. something like \sp
> > etc. to produce a blank space?
> >
> > TIA
> > Mark
> > ---
> > Mark Heckmann
> > Blog: www.markheckmann.de
> > R-Blog: http://ryouready.wordpress.com
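The distinction drawn above can be checked directly. The sketch below compares a hex escape typed in source code, which the parser converts before the string exists, with the same six characters arriving from a file, where nothing interprets them. The temporary file and the name `from_file` are just for this example.

```r
# In source code the parser turns \x20 into a space
stopifnot(identical("a\x20b", "a b"))
print(charToRaw(" "))          # byte value of a space in this encoding

# In a file, the same six characters are just data
path <- tempfile()
writeLines("a\\x20b", path)    # writes a, backslash, x, 2, 0, b
from_file <- readLines(path)
print(nchar(from_file))        # 6
print(from_file)
```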
Re: [R] Converting 16-bit to 8-bit encoding?
On 04/21/2011 10:36 AM, Brian Buma wrote: Hello all- I have a question related to encoding. I'm using a seperate program which takes either 16 bit or 8 bit (flat binary files) as inputs (they are raster satellite imagery and the associated quality files), but can't handle both at the same time. Problem is the quality and the image come in different formats (quality- 8bit, image- 16bit). I need to switch the encoding on the I think some more detail about these files is necessary. What do these 16/8 bit quantities represent? Are these files just a sequence of such quantities, or is there meta information (i.e. image dimension)? quality files to 16 bit, without altering anything else (they are img files right now). I imagine this is a fairly simply process, but I haven't been Does 'img files' indicate that these files are formatted according to a standard?. Finally, are you using some R code to manipulate these files? Have an example, including data? able to find a package or anything which can tell me how to do it- perhaps I'm searching the wrong terms, but I did look. Is there any methods to do this quickly? Ideally, the solution would involve reading in a list of files and replacing the original with the new, 16 bit version, as I have over 300 files to convert. I hope that's clear. Thanks in advance! -- Matthew S Shotwell Assistant Professor School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Converting 16-bit to 8-bit encoding?
OK. I'm going to copy this back to R-help too. With R, we can convert a file of 8-bit integers to 16-bit integers like so:

    # Create a test file of 8-bit integers:
    con <- file("test.8", "wb")
    writeBin(sample(-1L:4L, 1024, TRUE), con, size=1)
    close(con)

    # Convert test.8 to test.16
    icon <- file("test.8", "rb")
    ocon <- file("test.16", "wb")
    while(length(dat <- readBin(icon, "integer", 1024, size=1)) > 0)
        writeBin(dat, ocon, size=2)
    close(icon)
    close(ocon)

This assumes (without considering a more formal description of the format) that the file and your computing platform agree on how multi-byte signed integers are represented. Hope that will get you going.

On 04/21/2011 11:02 AM, Brian Buma wrote:
> Apologies. The 8-bit file (the one that needs to be converted) is just a
> series of integers, -1 to 4, which is no doubt why they are encoded in 8
> bit. They don't need to be changed numerically, just put in a 16-bit
> encoding. No meta info, headerless. All the data is MODIS satellite
> imagery.
>
> I have been using the raster program to visualize things, and
> processing (when I get that far) will be done in that program mainly.
> I've used that program on a different project, and it seemed to work
> well. The actual program that can't handle two different inputs is
> Timesat, a phenology-program (not R). I was thinking that R could
> probably do this conversion quick and easy (fairly), but haven't figured
> out how to yet.
>
> As an example, I have an NDVI file (flat binary, 16bit encoding)- so a
> string of numbers, 4450, 4650, etc... The associated quality file is
> another string, 1,1,2,1,0, etc. It's encoded as an 8bit file.
> Conceptually, all it needs (I think) is to be read in and resaved in the
> less memory-efficient 16-bit format.
>
> Thanks! Sorry if the explanation isn't clear.
>
> On Thu, Apr 21, 2011 at 9:50 AM, Matt Shotwell
> matt.shotw...@vanderbilt.edu wrote:
> > On 04/21/2011 10:36 AM, Brian Buma wrote:
> > > Hello all-
> > > I have a question related to encoding. I'm using a separate program which
> > > takes either 16 bit or 8 bit (flat binary files) as inputs (they are raster
> > > satellite imagery and the associated quality files), but can't handle both
> > > at the same time. Problem is the quality and the image come in different
> > > formats (quality- 8bit, image- 16bit). I need to switch the encoding on the
> >
> > I think some more detail about these files is necessary. What do these
> > 16/8 bit quantities represent? Are these files just a sequence of such
> > quantities, or is there meta information (i.e. image dimension)?
> >
> > > quality files to 16 bit, without altering anything else (they are img files
> > > right now). I imagine this is a fairly simple process, but I haven't been
> >
> > Does 'img files' indicate that these files are formatted according to a
> > standard? Finally, are you using some R code to manipulate these files?
> > Have an example, including data?
> >
> > > able to find a package or anything which can tell me how to do it- perhaps
> > > I'm searching the wrong terms, but I did look. Are there any methods to do
> > > this quickly? Ideally, the solution would involve reading in a list of
> > > files and replacing the original with the new, 16 bit version, as I have
> > > over 300 files to convert.
> > >
> > > I hope that's clear. Thanks in advance!
> >
> > --
> > Matthew S Shotwell
> > Assistant Professor
> > School of Medicine
> > Department of Biostatistics
> > Vanderbilt University
>
> --
> Brian Buma
> PhD Candidate
> Ecology and Evolutionary Biology / CIRES
> University of Colorado, Boulder
> brian.b...@colorado.edu

--
Matthew S Shotwell
Assistant Professor
School of Medicine
Department of Biostatistics
Vanderbilt University
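Since the original request was for a whole directory of files, here is a hedged sketch of how the conversion above might be wrapped in a function and looped over files. The function name `convert_8_to_16`, the temporary directory, the file name tile1.img and the ".16" suffix are all invented for the demo; the same assumptions about headerless files and native byte order apply.

```r
# Convert one headerless file of signed 8-bit integers to signed 16-bit
convert_8_to_16 <- function(infile, outfile, chunk = 1024L) {
  icon <- file(infile, "rb")
  on.exit(close(icon))
  ocon <- file(outfile, "wb")
  on.exit(close(ocon), add = TRUE)
  # Read a chunk of 1-byte integers, write them back as 2-byte integers
  while (length(dat <- readBin(icon, "integer", n = chunk, size = 1)) > 0)
    writeBin(dat, ocon, size = 2)
  invisible(outfile)
}

# Demo on made-up data in a temporary directory
dir8 <- tempfile("quality8_")
dir.create(dir8)
vals <- sample(-1L:4L, 5000, TRUE)
writeBin(vals, file.path(dir8, "tile1.img"), size = 1)

# The file list is built once, so the new .16 files are not picked up again
for (f in list.files(dir8, full.names = TRUE))
  convert_8_to_16(f, paste0(f, ".16"))

# Read the converted file back to confirm the values survived
back <- readBin(file.path(dir8, "tile1.img.16"), "integer", n = 5000, size = 2)
```

Writing to a new name and only renaming over the original once the result has been checked is the cautious route with 300 files.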
Re: [R] print.raw - but convert ASCII?
On Tue, 2011-04-19 at 03:14 -0400, Duncan Murdoch wrote:
> On 11-04-18 9:51 PM, Matt Shotwell wrote:
> > Does anyone know if there is a simple way to print raw vectors, such
> > that ASCII characters are printed for bytes in the ASCII range, and
> > their hex representation otherwise? rawToChar doesn't work when we have
> > something like c(0x00, 0x00, 0x44, 0x00).
>
> Do you really need hex? rawToChar(x, multiple=TRUE) comes close, but
> displays using octal or symbolic escapes, e.g.

No, but I've almost learned to count efficiently in hex. :)

> [1] \001 \002 \003 \004 \005 \006 \a \b \t \n
> [12] \v \f \r \016 \017 \020 \021 \022 \023 \024 \025
> [23] \026 \027 \030 \031 \032 \033 \034 \035 \036 \037
> [34] !\ #$%'() *+
>
> If you really do want hex, then you'll need something like
>
>     ifelse(x < 32 | x >= 127, as.character(x), rawToChar(x, multiple=TRUE))

That does it. Thanks.

-Matt

> Duncan Murdoch
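A runnable variant of the idea above, for the archive. It is a sketch rather than Duncan's exact expression: it converts only the printable bytes with rawToChar(), so zero bytes are never passed to it, and the helper name `show_raw` is made up for this example.

```r
x <- as.raw(c(0x00, 0x00, 0x44, 0x00))

# Printable ASCII bytes are shown as characters, everything else as hex
show_raw <- function(x) {
  code <- as.integer(x)
  out <- as.character(x)                 # two-digit hex for every byte
  printable <- code >= 32 & code < 127   # printable ASCII range
  out[printable] <- rawToChar(x[printable], multiple = TRUE)
  out
}

print(show_raw(x))
```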
[R] print.raw - but convert ASCII?
Does anyone know if there is a simple way to print raw vectors, such that ASCII characters are printed for bytes in the ASCII range, and their hex representation otherwise? rawToChar doesn't work when we have something like c(0x00, 0x00, 0x44, 0x00). -Matt __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] integer and floating-point storage
Hi Mike, There are some facilities for storing and manipulating small (2 bit) integers. See here: http://cran.r-project.org/web/packages/ff/index.html -Matt On 04/14/2011 01:20 PM, Mike Miller wrote: I note that current implementations of R use 32-bit integers for integer vectors, but I am working with large arrays that contain integers from 0 to 3, so they could be stored as unsigned 8-bit integers. Can R do this? (FYI -- This is for storing minor-allele counts for genetic studies. There are 0, 1 or 2 minor alleles and 3 would represent missing.) It is theoretically possible to store such data with four integers per byte. This is what PLINK (GPL license) does in its binary (.bed) pedigree format: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped That might be too much to hope for. ;-) I think that the R system uses double-precision floating point numbers by default. When I impute minor-allele counts, I get posterior expected values ranging from 0 to 2 (called dosages). The imputation isn't very precise, so it would be fine to store such data using one or two bytes. (The values are used as regressors and small changes would have minimal impact on results.) I could use unsigned 8-bit integers (0 to 255), probably using only 0 to 254 so that 1 and 2 could be represented with perfect precision as 127/127 and 254/127 (but I would do regression on the integer values). Or I could use 16 bits, doubling memory load and improving precision. It would be convenient if R could work with half-precision floating-point numbers (binary16): http://en.wikipedia.org/wiki/Half_precision_floating-point_format Can R do that? If not, is anyone interested in working on developing some of these features in R? We have GPL code from PLINK and Octave that might help a lot. http://www.gnu.org/software/octave/doc/interpreter/Integer-Data-Types.html Best, Mike -- Michael B. Miller, Ph.D. 
Bioinformatics Specialist Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Matthew S Shotwell Assistant Professor School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
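On the unsigned 8-bit part of the question: base R's raw type stores one byte per element (values 0 to 255), which may be enough for minor-allele counts when packing four values per byte is not required. The sketch below only illustrates the memory difference. The variable names are invented, raw vectors have no NA and no arithmetic, so the code for "missing" has to be handled by hand as shown.

```r
n <- 1e6
# 0, 1, 2 = minor-allele count; 3 = missing (coding taken from the post above)
counts_int <- sample(0:3, n, replace = TRUE)
counts_raw <- as.raw(counts_int)     # one byte per value instead of four

print(object.size(counts_int))
print(object.size(counts_raw))

# Convert back to integer before modelling, and recode missing values
g <- as.integer(counts_raw)
g[g == 3L] <- NA
```

The conversion back to integer allocates a full-size vector, so this helps with storage more than with computation.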
[R] understanding dump.frames; typo;
When a function I have stop()s, I'd like it to return its evaluation frame, but not halt execution of the script. In experimenting with this, I became confused with dump.frames. From ?dump.frames:

    If ‘dump.frames’ is installed as the error handler, execution will
    continue even in non-interactive sessions. See the examples for how
    to dump and then quit.

Suppose I save the following script to dump-test.R:

    options(error=dump.frames)
    cat("interactive:", interactive(), "\n")
    f <- function() {
        stop("dump-test-error")
        cat("execution continues within f\n")
    }
    f()
    cat("execution continues outside of f\n")
    if(exists("last.dump"))
        cat("last.dump is available\n")

From an interactive R prompt, execution is halted at 'stop':

    R> source('dump-test.R')
    interactive: TRUE
    Error in f() : dump-test-error

Using Rscript, execution continues depending on whether you source() the file with the -e flag, or pass the file as an argument.

    matt@pal ~$ Rscript dump-test.R
    interactive: FALSE
    Error in f() : dump-test-error
    execution continues outside of f
    last.dump is available

    matt@pal ~$ Rscript -e "source('dump-test.R')"
    interactive: FALSE
    Error in f() : dump-test-error
    Calls: source -> eval.with.vis -> eval.with.vis -> f

It seems that interactiveness (as tested by interactive()) doesn't come into play, yet execution does *not* always continue. What am I missing? Alternative solutions are also welcome.

-Matt

P.S. There is a typo in the help file: "The dumped object contain the call stack..." should read "The dumped object contains the call stack".

    sessionInfo()
    R version 2.13.0 alpha (2011-03-18 r54865)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    loaded via a namespace (and not attached):
    [1] tools_2.13.0

--
Matthew S Shotwell
Assistant Professor
School of Medicine
Department of Biostatistics
Vanderbilt University
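On the request for alternative solutions: one pattern that does not depend on options(error=...) is to dump the frames from a calling handler and then catch the error, so the script carries on whichever way it was started. This is a sketch under assumptions, not a statement of how dump.frames was meant to be used; the names `f.dump`, `res` and `local_value` are made up for the example.

```r
f <- function() {
  local_value <- 42            # something worth inspecting afterwards
  stop("dump-test-error")
}

res <- tryCatch(
  withCallingHandlers(
    f(),
    # Runs while f's frame is still on the stack, so the dump includes it
    error = function(e) dump.frames(dumpto = "f.dump")
  ),
  # The error is then caught here and returned as a condition object
  error = function(e) e
)

cat("execution continues outside of f\n")
print(names(f.dump))   # one entry per call on the stack when the error occurred
```

The dumped object is assigned in the workspace and can be examined later with debugger(f.dump) in an interactive session.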
Re: [R] Examples of web-based Sweave use?
That's an interesting idea. I had written a long email describing a proof-of-concept, but decided to post it to the website below instead.

http://biostatmatt.com/archives/1184

Matt

On 04/04/2011 07:31 AM, carslaw wrote:
> I appreciate that this is OT, but I'd be grateful for pointers to examples
> of where Sweave has been used for web-based applications. In particular,
> examples of where reports/analyses are produced automatically through
> submission of data to a web server. I am mostly interested in situations
> where pdf reports have been produced rather than, say, a plot/table etc
> shown on a web page.
>
> I've had limited success finding examples on this.
>
> Many thanks.
>
> David Carslaw
> Environmental Research Group
> MRC-HPA Centre for Environment and Health
> King's College London
> Franklin Wilkins Building
> Stamford Street
> London SE1 9NH
> david.cars...@kcl.ac.uk

--
Matthew S Shotwell
Assistant Professor
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] library(foreign) read.spss warning
There is some information about this subtype in the PSPP source code, and for other subtypes not yet implemented by read.spss. The PSPP source code indicates that this subtype consists of "Value labels for long strings", which isn't very illuminating to me (probably because I don't use PSPP, or SPSS, though I increasingly have need to import SPSS data files). Copied below are the relevant bits.

-Matt

From (the PSPP source file) src/data/sys-file-reader.c:

    enum
      {
        /* subtypes 0-2 unknown */
        EXT_INTEGER       = 3,      /* Machine integer info. */
        EXT_FLOAT         = 4,      /* Machine floating-point info. */
        EXT_VAR_SETS      = 5,      /* Variable sets. */
        EXT_DATE          = 6,      /* DATE. */
        EXT_MRSETS        = 7,      /* Multiple response sets. */
        EXT_DATA_ENTRY    = 8,      /* SPSS Data Entry. */
        /* subtypes 9-10 unknown */
        EXT_DISPLAY       = 11,     /* Variable display parameters. */
        /* subtype 12 unknown */
        EXT_LONG_NAMES    = 13,     /* Long variable names. */
        EXT_LONG_STRINGS  = 14,     /* Long strings. */
        /* subtype 15 unknown */
        EXT_NCASES        = 16,     /* Extended number of cases. */
        EXT_FILE_ATTRS    = 17,     /* Data file attributes. */
        EXT_VAR_ATTRS     = 18,     /* Variable attributes. */
        EXT_MRSETS2       = 19,     /* Multiple response sets (extended). */
        EXT_ENCODING      = 20,     /* Character encoding. */
        EXT_LONG_LABELS   = 21      /* Value labels for long strings. */
      };

and

    static const struct extension_record_type types[] =
      {
        /* Implemented record types. */
        { EXT_INTEGER,      4, 8 },
        { EXT_FLOAT,        8, 3 },
        { EXT_MRSETS,       1, 0 },
        { EXT_DISPLAY,      4, 0 },
        { EXT_LONG_NAMES,   1, 0 },
        { EXT_LONG_STRINGS, 1, 0 },
        { EXT_NCASES,       8, 2 },
        { EXT_FILE_ATTRS,   1, 0 },
        { EXT_VAR_ATTRS,    1, 0 },
        { EXT_MRSETS2,      1, 0 },
        { EXT_ENCODING,     1, 0 },
        { EXT_LONG_LABELS,  1, 0 },

        /* Ignored record types. */
        { EXT_VAR_SETS,     0, 0 },
        { EXT_DATE,         0, 0 },
        { EXT_DATA_ENTRY,   0, 0 },
      };

On Fri, 2011-03-25 at 18:39 -0500, Robert Baer wrote:
> I got the following:
>
>     library(foreign)
>     swal = read.spss("swallowing.sav", to.data.frame = TRUE)
>     Warning message:
>     In read.spss("swallowing.sav", to.data.frame = TRUE) :
>       swallowing.sav: Unrecognized record type 7, subtype 21 encountered in system file
>
> The bulk of the data seems to read in a usable form, but I'm curious about
> what might be getting lost because I don't know how to translate type 7,
> subtype 21. I did not generate the SPSS data so I'm not certain of the
> version, but I'm assuming version 18 or 19. I did a quick Find on the PSPP
> manual for Type 7 and subtype 21 and came up dry.
>
> Any insights or clues how I might learn more?
>
> Thanks,
> Rob
>
>     R.Version()
>     $platform
>     [1] "i386-pc-mingw32"
>     $arch
>     [1] "i386"
>     $os
>     [1] "mingw32"
>     $system
>     [1] "i386, mingw32"
>     $status
>     [1] ""
>     $major
>     [1] "2"
>     $minor
>     [1] "12.2"
>     $year
>     [1] "2011"
>     $month
>     [1] "02"
>     $day
>     [1] "25"
>     $`svn rev`
>     [1] "54585"
>     $language
>     [1] "R"
>     $version.string
>     [1] "R version 2.12.2 (2011-02-25)"
>
> --
> Robert W. Baer, Ph.D.
> Professor of Physiology
> Kirksville College of Osteopathic Medicine
> A. T. Still University of Health Sciences
> Kirksville, MO 63501
> 660-626-232 FAX 660-626-2965
Re: [R] Venn Diagram corresponding to size in R
Try here: https://stat.ethz.ch/pipermail/r-help/2003-February/029393.html

On Tue, 2011-03-08 at 20:25 -0500, Shira Rockowitz wrote:
> I was wondering if anyone could help me figure out how to make a Venn diagram in R where the circles are scaled to the size of each dataset. I have looked at the information for venn (in gplots) and vennDiagram (in limma) and I cannot figure out which parameter to change. I have looked this up online and have not found anyone who has posted this question, or the answer to it, before. I have seen graphs scaled like this that are purported to be made in R, so I think it must be possible, although I do not know whether they were made with a custom function. If I have just not been searching for this question correctly, and it has already been asked, please direct me to the earlier question.
>
> I would like to thank you all in advance for your help!
>
> ~Shira
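For the two-set case, a scaled diagram can also be put together in base R without relying on a plotting package. The following is only a sketch under stated assumptions: `lens_area`, `scaled_venn` and `draw_scaled_venn` are hypothetical helper names, the circle areas are set equal to the set sizes, and the distance between centres is found numerically with `uniroot` so that the overlap area equals the size of the intersection. It is not the method from the linked post.

```r
# Area of the lens-shaped overlap of two circles with radii r1 and r2
# whose centres are a distance d apart.
lens_area <- function(d, r1, r2) {
  if (d >= r1 + r2) return(0)                         # circles do not overlap
  if (d <= abs(r1 - r2)) return(pi * min(r1, r2)^2)   # one circle inside the other
  a1 <- acos((d^2 + r1^2 - r2^2) / (2 * d * r1))
  a2 <- acos((d^2 + r2^2 - r1^2) / (2 * d * r2))
  r1^2 * a1 + r2^2 * a2 -
    0.5 * sqrt((-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
}

# Radii and centre distance for sets of size n1 and n2 sharing n12 elements.
scaled_venn <- function(n1, n2, n12) {
  stopifnot(n12 >= 0, n12 <= min(n1, n2))
  r1 <- sqrt(n1 / pi)                                 # circle area equals set size
  r2 <- sqrt(n2 / pi)
  d <- uniroot(function(d) lens_area(d, r1, r2) - n12,
               lower = abs(r1 - r2), upper = r1 + r2, tol = 1e-9)$root
  list(r1 = r1, r2 = r2, d = d)
}

# Draw the two circles; inches = FALSE makes the radii use the axis units.
draw_scaled_venn <- function(v) {
  rmax <- max(v$r1, v$r2)
  symbols(c(0, v$d), c(0, 0), circles = c(v$r1, v$r2), inches = FALSE,
          asp = 1, xlim = c(-v$r1, v$d + v$r2), ylim = c(-rmax, rmax),
          xlab = "", ylab = "")
}

v <- scaled_venn(n1 = 100, n2 = 50, n12 = 20)
if (interactive()) draw_scaled_venn(v)
```

With three or more sets an exact area-proportional layout with circles is generally not possible, so this approach is best kept to pairs of sets.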
Re: [R] assignment by value or reference
On 03/08/2011 07:20 AM, Xiaobo Gu wrote:
> On Wed, Sep 15, 2010 at 5:05 PM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:
>> See the R Language Definition manual. Since R knows about lazy evaluation, it is sometimes neither by reference nor by value. If you want to think binary, then by value fits better than by reference.
>
> Hi,
> Can we think it's eventually by value?

Not always (see in-line below).

> For simple functions such as:
>     is(df[[1]], "logical")
> used to test whether the first column of data frame df is of type logical, will a new vector be created and used inside the is function?

No, df[[1]] isn't copied in this case. However, if you subset an atomic vector (subset+assignment is different!), there is copying. For example:

    df <- data.frame(x=c(FALSE,TRUE))
    tracemem(df[[1]])
    [1] "<0x217afa8>"
    is(df[[1]], "logical")
    [1] TRUE
    is(df[[1]][], "logical")
    tracemem[0x217afa8 -> 0xf9d198]: ...cut...
    [1] TRUE
    is(df[[1]][1], "logical")
    [1] TRUE

Note that tracemem doesn't catch the copying that occurs during evaluation of the last expression. As a strategy, R avoids copying when it's clearly not necessary from the perspective of the R interpreter. There are some notable cases where copying is obviously not necessary from the user perspective (e.g. contiguous subsetting), but avoiding a copy in these cases might be difficult to implement in R's parser/evaluator framework. Here's another simple exception:

    x <- 1
    tracemem(x)
    [1] "<0x18984b8>"
    x <- x + 1
    tracemem[0x18984b8 -> 0x207e568]: ...cut...

> Another example, dbWriteTable(con, "tablename", df) will write the content of data frame df into a database table, will a new data frame object be created and used inside the dbWriteTable function?

No, but if dbWriteTable modifies its local variable that was assigned df, then df may be copied.

> Thanks.
>
>> Uwe Ligges
>>
>> On 05.09.2010 17:19, Xiaobo Gu wrote:
>>> Hi Team,
>>> Can you please tell me the rules of assignment in R, by value or by reference.
>>> From my about 3 months of part-time experience with R, it seems most times it is by value, especially in function parameter and return values assignment; and it is by reference when referencing container sub-objects of

This is a function call convention (i.e. passing by value), as distinguished from an assignment convention (I'm not certain they're equivalent in R). In general R functions pass by value. There are exceptions here also, notably R environments. For example:

    f <- function(e) assign("a", 1, e)
    e <- new.env()
    f(e)
    objects(e)
    [1] "a"

Under strict pass-by-value convention, e would remain unchanged.

In general, assignments are by value. However, R environments are an exception; assignment is by reference:

    r <- e
    objects(r)
    [1] "a"
    assign("b", 2, r)
    objects(r)
    [1] "a" "b"
    objects(e)
    [1] "a" "b"

In this sense, the calling/assignment convention is a property of the objects being passed/assigned. I think that is consistent with Uwe's comment above.

Best,
Matt

R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

>>> container objects, such as elements of List objects and row/column objects of DataFrame objects; but it is by value when referencing the smallest unit of element of a container object, such as a cell of a data frame object.
>>>
>>> Xiaobo.Gu
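The contrast between ordinary vectors and environments discussed in this thread can be checked directly. This is a minimal sketch; `add_one` and `set_flag` are hypothetical function names chosen for illustration.

```r
# Ordinary vectors behave as if passed by value: modifying the argument
# inside the function leaves the caller's object untouched.
add_one <- function(v) {
  v[1] <- v[1] + 1   # modifies the function's local copy only
  v
}
x <- c(1, 2, 3)
y <- add_one(x)
x   # still 1 2 3
y   # 2 2 3

# Environments are the notable exception: the function and the caller
# refer to the same environment, so the change is visible outside.
set_flag <- function(env) assign("flag", TRUE, envir = env)
e <- new.env()
set_flag(e)
exists("flag", envir = e, inherits = FALSE)

# Plain assignment of an environment creates a second name, not a copy.
r <- e
assign("other", 2, envir = r)
sort(ls(e))
```

Whether and when R physically copies a vector is an implementation detail (see tracemem above); the visible semantics for vectors are those of pass-by-value.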
Re: [R] Rapache ( was Developing a web crawler )
On Sun, 2011-03-06 at 08:06 -0500, Mike Marchywka wrote:
> Date: Thu, 3 Mar 2011 13:04:11 -0600
> From: matt.shotw...@vanderbilt.edu
> To: r-help@r-project.org
> Subject: Re: [R] Developing a web crawler / R webkit or something similar? [off topic]
>
>> On 03/03/2011 08:07 AM, Mike Marchywka wrote:
>>> Date: Thu, 3 Mar 2011 01:22:44 -0800
>>> From: antuj...@gmail.com
>>> To: r-help@r-project.org
>>> Subject: [R] Developing a web crawler
>>>
>>>> Hi, I wish to develop a web crawler in R. I have been using the functionalities available under the RCurl package. I am able to extract the html content of the site but i don't know how to go
>>>
>>> In general this can be a big effort but there may be things in text processing packages you could adapt to execute html and javascript. However, I guess what I'd be looking for is something like a webkit package or other open source browser with or without an R interface. This actually may be an ideal solution for a lot of things as you get all the content handlers of at least some browser.
>>>
>>> Now that you mention it, I wonder if there are browser plugins to handle R content ( I'd have to give this some thought, put a script up as a web page with mime type test/R and have it execute it in R. )
>>
>> There are server-side solutions for this sort of thing. See http://rapache.net/ . Also, there was a string of messages on R-devel some years ago addressing the mime type issue; beginning here: http://tolstoy.newcastle.edu.au/R/devel/05/11/3054.html . Though I don't know whether there was a resolution. Some suggestions were text/x-R, text/x-Rd, application/x-RData.
>
> The rapache demo looks like something I could use right away but I haven't looked into the handlers yet. I have installed rapache now on my debian system ( still have config issues but I did get apache2 to restart LOL)
>
> Before I plow into this too far, how would this compare/compete with something like a PHP library for Rserve? That is the approach I had been pursuing.
>
> Thanks.

Hi Mike,

If you've built and configured RApache, then the difficult plowing is over :). RApache operates at the top (HTTP) layer of the OSI stack, whereas Rserve works at the lower transport/network layer. Hence, the scope of Rserve applications is far more general. Extending Rserve to operate at the HTTP layer (via PHP) will mean more work.

RApache offers high level functionality, for example, to replace PHP with R in web pages. No interface code is necessary. Here's a simple "What's The Time?" webpage using RApache and yarr [1] to handle the code:

    setContentType("text/html\n\n")
    <html>
    <head><title>What's The Time?</title></head>
    <body><pre>/= cat(format(Sys.time(), usetz=TRUE)) </pre></body>
    </html>

Here's a live version: [2].

Interfacing PHP with Rserve in this context would be useful if installation of R and/or RApache on the web host were prohibited. A PHP/Rserve framework might also be useful in other contexts, for example, to extend PHP applications (e.g. WordPress, MediaWiki).

Best,
Matt

[1] http://biostatmatt.com/archives/1000
[2] http://biostatmatt.com/yarr/time.yarr
Re: [R] Developing a web crawler / R webkit or something similar? [off topic]
On 03/03/2011 08:07 AM, Mike Marchywka wrote:
> Date: Thu, 3 Mar 2011 01:22:44 -0800
> From: antuj...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Developing a web crawler
>
>> Hi, I wish to develop a web crawler in R. I have been using the functionalities available under the RCurl package. I am able to extract the html content of the site but i don't know how to go
>
> In general this can be a big effort but there may be things in text processing packages you could adapt to execute html and javascript. However, I guess what I'd be looking for is something like a webkit package or other open source browser with or without an R interface. This actually may be an ideal solution for a lot of things as you get all the content handlers of at least some browser.
>
> Now that you mention it, I wonder if there are browser plugins to handle R content ( I'd have to give this some thought, put a script up as a web page with mime type test/R and have it execute it in R. )

There are server-side solutions for this sort of thing. See http://rapache.net/ . Also, there was a string of messages on R-devel some years ago addressing the mime type issue; beginning here: http://tolstoy.newcastle.edu.au/R/devel/05/11/3054.html . Though I don't know whether there was a resolution. Some suggestions were text/x-R, text/x-Rd, application/x-RData.

-Matt

>> about analyzing the html formatted document. I wish to know the frequency of a word in the document. I am only acquainted with analyzing data sets. So how should i go about analyzing data that is not available in table format.
>>
>> Few chunks of code that i wrote:
>>
>>     w <- getURL("http://www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B003DZ1Y8Q/ref=dp_reviewsanchor#FullQuotes")
>>     write.table(w, "test.txt")
>>     t <- readLines(w)
>>
>> readLines also didn't prove to be of any help.
>>
>> Any help would be highly appreciated. Thanks in advance.
>>
>> -- View this message in context: http://r.789695.n4.nabble.com/Developing-a-web-crawler-tp3332993p3332993.html Sent from the R help mailing list archive at Nabble.com.

--
Matthew S Shotwell
Assistant Professor
School of Medicine
Department of Biostatistics
Vanderbilt University
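On the original question of counting how often a word occurs in a downloaded page, a rough first pass is possible in base R once the HTML is held in a character string (for instance the value returned by getURL). The sketch below is an assumption-laden illustration, not a robust crawler: `word_freq` is a hypothetical helper, the sample page is made up, and stripping tags with regular expressions is crude compared with a real HTML parser.

```r
# A made-up page standing in for the string returned by getURL().
html <- paste0(
  "<html><head><title>Review</title></head><body>",
  "<p>Great reader. The reader is light &amp; the screen is great.</p>",
  "</body></html>"
)

# Hypothetical helper: crude word-frequency table for an HTML string.
word_freq <- function(html) {
  text  <- gsub("<[^>]+>", " ", html)                 # drop tags
  text  <- gsub("&[a-z]+;", " ", text)                # drop named entities such as &amp;
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]    # split on anything that is not a letter
  words <- words[nzchar(words)]                       # remove empty strings
  sort(table(words), decreasing = TRUE)
}

freq <- word_freq(html)
freq
as.integer(freq["reader"])   # frequency of one particular word
```

Note that readLines expects a file name or connection, not the page text itself, which is one reason the `readLines(w)` call in the quoted message does not help.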
Re: [R] Robust variance estimation with rq (failure of the bootstrap?)
Jim,

Thanks for pointing me to this article. The authors argue that the bootstrap intervals for a robust estimator may not be as robust as the estimator. In this context, robustness is measured by the breakdown point, which is supposed to measure robustness to outliers. Even so, the authors found that the upper bound of a quantile bootstrap interval for the sample median was nearly as robust as the sample median. That brings some comfort in using quantile bootstrap intervals in quantile regression.

Does the sandwich estimator assume that errors are independent? And a related question: does the rq function allow the user to specify clusters/grouping among the observations?

Best,
Matt

On Tue, 2011-03-01 at 05:35 -0600, James Shaw wrote:
> Matt:
>
> Thanks for your prompt reply.
>
> The disparity between the bootstrap and sandwich variance estimates derived when modeling the highly skewed outcome suggests that either (A) the empirical robust variance estimator is underestimating the variance or (B) the bootstrap is breaking down. The bootstrap variance estimate of a robust location estimate is not necessarily robust, see Statistics & Probability Letters 50 (2000) 49-53. Since submitting my earlier post, I have noticed that the robust kernel variance estimate is similar to the bootstrap estimate. Under what conditions would one expect Koenker and Machado's sandwich variance estimator, which uses a local estimate of the sparsity, to fail?
>
> --
> Jim
>
> On Mon, Feb 28, 2011 at 8:59 PM, Matt Shotwell m...@biostatmatt.com wrote:
>> Jim,
>>
>> If repeated measurements on patients are correlated, then resampling all measurements independently induces an incorrect sampling distribution (=> incorrect variance) on a statistic of these data. One solution, as you mention, is the block or cluster bootstrap, which preserves the correlation among repeated observations in resamples. I don't immediately see why the cluster bootstrap is unsuitable.
>>
>> Beyond this, I would be concerned about *any* variance estimates that are blind to correlated observations.
>>
>> The bootstrap variance estimate may be larger than the asymptotic variance estimate, but that alone isn't evidence to favor one over the other.
>>
>> Also, I can't justify (to myself) why skew would hamper the quality of bootstrap variance estimates. I wonder how it affects the sandwich variance estimate...
>>
>> Best,
>> Matt
>>
>> On Mon, 2011-02-28 at 17:50 -0600, James Shaw wrote:
>>> I am fitting quantile regression models using data collected from a sample of 124 patients. When modeling cross-sectional associations, I have noticed that nonparametric bootstrap estimates of the variances of parameter estimates are much greater in magnitude than the empirical Huber estimates derived using summary.rq's nid option. The outcome variable is severely skewed, and I am afraid that this may be affecting the consistency of the bootstrap variance estimates. I have read that the m out of n bootstrap can be used to overcome this problem. However, this procedure requires both the original sample (n) and the subsample (m) sizes to be large. The version implemented in rq.boot does not appear to provide any improvement over the naive bootstrap.
>>>
>>> Ultimately, I am interested in using median regression to model changes in the outcome variable over time. summary.rq's robust variance estimator is not applicable to repeated-measures data. I question whether the block (cluster) bootstrap variance estimator, which can accommodate intraclass correlation, would perform well. Can anyone suggest alternatives for variance estimation in this situation?
>>>
>>> Regards,
>>> Jim
>>>
>>> James W. Shaw, Ph.D., Pharm.D., M.P.H.
>>> Assistant Professor
>>> Department of Pharmacy Administration
>>> College of Pharmacy
>>> University of Illinois at Chicago
>>> 833 South Wood Street, M/C 871, Room 266
>>> Chicago, IL 60612
>>> Tel.: 312-355-5666
>>> Fax: 312-996-0868
>>> Mobile Tel.: 215-852-3045
Re: [R] Robust variance estimation with rq (failure of the bootstrap?)
Jim,

If repeated measurements on patients are correlated, then resampling all measurements independently induces an incorrect sampling distribution (=> incorrect variance) on a statistic of these data. One solution, as you mention, is the block or cluster bootstrap, which preserves the correlation among repeated observations in resamples. I don't immediately see why the cluster bootstrap is unsuitable.

Beyond this, I would be concerned about *any* variance estimates that are blind to correlated observations.

The bootstrap variance estimate may be larger than the asymptotic variance estimate, but that alone isn't evidence to favor one over the other.

Also, I can't justify (to myself) why skew would hamper the quality of bootstrap variance estimates. I wonder how it affects the sandwich variance estimate...

Best,
Matt

On Mon, 2011-02-28 at 17:50 -0600, James Shaw wrote:
> I am fitting quantile regression models using data collected from a sample of 124 patients. When modeling cross-sectional associations, I have noticed that nonparametric bootstrap estimates of the variances of parameter estimates are much greater in magnitude than the empirical Huber estimates derived using summary.rq's nid option. The outcome variable is severely skewed, and I am afraid that this may be affecting the consistency of the bootstrap variance estimates. I have read that the m out of n bootstrap can be used to overcome this problem. However, this procedure requires both the original sample (n) and the subsample (m) sizes to be large. The version implemented in rq.boot does not appear to provide any improvement over the naive bootstrap.
>
> Ultimately, I am interested in using median regression to model changes in the outcome variable over time. summary.rq's robust variance estimator is not applicable to repeated-measures data. I question whether the block (cluster) bootstrap variance estimator, which can accommodate intraclass correlation, would perform well. Can anyone suggest alternatives for variance estimation in this situation?
>
> Regards,
> Jim
>
> James W. Shaw, Ph.D., Pharm.D., M.P.H.
> Assistant Professor
> Department of Pharmacy Administration
> College of Pharmacy
> University of Illinois at Chicago
> 833 South Wood Street, M/C 871, Room 266
> Chicago, IL 60612
> Tel.: 312-355-5666
> Fax: 312-996-0868
> Mobile Tel.: 215-852-3045
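To make the distinction between the naive and the cluster bootstrap concrete, here is a small base-R sketch on simulated data. Everything in it is an assumption for illustration: the data are simulated (30 patients, 4 skewed measurements each), the statistic is the plain sample median instead of a quantile regression coefficient, and `cluster_boot` and `naive_boot` are hypothetical helper names. For a regression, the same patient-level resampling would be applied to rows of the data and the model refitted in each replicate.

```r
set.seed(1)

# Simulated repeated measures: a shared patient effect induces
# within-patient correlation; exp() makes the outcome right-skewed.
n_pat <- 30
n_obs <- 4
id <- rep(seq_len(n_pat), each = n_obs)
y  <- exp(rnorm(n_pat)[id] + rnorm(n_pat * n_obs, sd = 0.5))

# Cluster bootstrap: resample whole patients with replacement, keeping
# each sampled patient's measurements together.
cluster_boot <- function(y, id, stat = median, B = 500) {
  groups <- split(y, id)
  replicate(B, {
    picked <- sample(length(groups), replace = TRUE)
    stat(unlist(groups[picked], use.names = FALSE))
  })
}

# Naive bootstrap: resample individual measurements, ignoring patients.
naive_boot <- function(y, stat = median, B = 500) {
  replicate(B, stat(sample(y, replace = TRUE)))
}

se_cluster <- sd(cluster_boot(y, id))
se_naive   <- sd(naive_boot(y))
c(cluster = se_cluster, naive = se_naive)
```

With positively correlated measurements the naive standard error will typically come out smaller than the cluster one, which is the "blind to correlated observations" concern raised above; the exact numbers depend on the simulated data.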
Re: [R] Visualizing Points on a Sphere
That's interesting. You might also like: http://en.wikipedia.org/wiki/Von_Mises%E2%80%93Fisher_distribution

I'm not sure how to plot the wireframe sphere, but you can visualize the points by transforming to Cartesian coordinates like so:

    u <- runif(1000, 0, 1)
    v <- runif(1000, 0, 1)
    theta <- 2 * pi * u         # azimuthal angle
    phi <- acos(2 * v - 1)      # polar angle
    x <- sin(phi) * cos(theta)
    y <- sin(phi) * sin(theta)
    z <- cos(phi)
    library(lattice)
    cloud(z ~ x + y)

-Matt

On Fri, 2011-02-25 at 14:21 +0100, Lorenzo Isella wrote:
> Dear All,
>
> I need to plot some points on the surface of a sphere, but I am not sure about how to proceed to achieve this in R (or if it is suitable for this at all). In any case, I am not looking for really fancy visualizations; for instance you can consider the images between formulae 5 and 6 at http://bit.ly/hOgK9h
>
> Any suggestion is appreciated.
>
> Cheers
>
> Lorenzo
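A quick numerical check can confirm that points generated this way lie on the unit sphere and are spread evenly. The sketch below assumes the usual convention in which theta is the azimuth and phi the polar angle, so that z = cos(phi) = 2v - 1 is uniform on [-1, 1]; the variable names are illustrative.

```r
set.seed(42)
n <- 1000
theta <- 2 * pi * runif(n)        # azimuth, uniform on [0, 2*pi)
phi   <- acos(2 * runif(n) - 1)   # polar angle; makes cos(phi) uniform on [-1, 1]

x <- sin(phi) * cos(theta)
y <- sin(phi) * sin(theta)
z <- cos(phi)

# Every point should sit at distance 1 from the origin.
radius <- sqrt(x^2 + y^2 + z^2)
range(radius)

# For a uniform distribution on the sphere each coordinate has mean 0
# and z is uniform on [-1, 1]; a histogram of z should look flat.
c(mean(x), mean(y), mean(z))
if (interactive()) hist(z, breaks = 20)
```

Sampling phi uniformly on [0, pi] instead would crowd points near the poles, which is why the acos transform is used.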
[R] Transfer function observed vs predicted values graph problem
Hi,

I am trying to make a palaeoenvironmental transfer function using the R package rioja that predicts the water table (measured as depth to the water table) of an area given the testate amoebae that are found there. I've carried out weighted averaging of the data and am trying to produce a graph that shows the observed water table versus the model's predicted values.

Following the instructions in the rioja help booklet (see below), I end up with a graph where the origin is not at the bottom left of the diagram, i.e. the graph is showing some values that suggest that the water table is, say, 1m above ground. I've tried entering the water tables as negative values but the same thing happens. Does anybody know if there is something I'm missing? Or is there a way that, if the values returned are less than 0, they can automatically be set to 0?

Any help would be most appreciated,

Thank you,

Matthew

My environmental matrix (x) is:

       SampleId WTD Moisture       pH        EC
    1         1  20 91.72700 3.496674  85.02688
    2         2   2 93.88913 3.550794  85.69465
    3         3  26 90.30269 3.948559 113.19206
    4         4   5 94.14427 3.697213  48.56375
    5         5  30 90.04269 3.745020 108.57278
    90   GAL_15  70 94.07849 3.777932  66.77673

The species matrix (y) contains the abundance of 32 species over 90 sites, set out like this:

      F1    AmpFlav AmpWri ArcCat    ArcDis
    1  1 22.2929936  0.000  0.000     0.000
    2  2 30.9677419  0.000  0.000 3.2258065

    fit <- WA(y, x, tolDW = FALSE, use.N2=TRUE, check.data=TRUE, lean=FALSE)
    # plot predicted vs. observed
    plot(fit)
    plot(fit, resid=TRUE)
    # Water-table reconstruction
    pred <- predict(fit, y)
    # plot the reconstruction
    plot(sites, pred$fit[, 1], type="b")
    # cross-validation model using bootstrapping
    fit.xv <- crossval(fit, cv.method="boot", nboot=1000)
    par(mfrow=c(1,2))
    plot(fit)
    plot(fit, resid=TRUE)
    plot(fit.xv, xval=TRUE)
    plot(fit.xv, xval=TRUE, resid=TRUE)
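On the narrower question of forcing negative predictions to zero and anchoring the plot at the origin, base R is enough once the predictions are held in a numeric vector. The sketch below uses made-up numbers: `obs_wtd` and `pred_wtd` are hypothetical stand-ins for the observed depths and for something like `pred$fit[, 1]`; it does not call rioja at all.

```r
# Hypothetical observed and predicted water-table depths.
obs_wtd  <- c(20, 2, 26, 5, 30)
pred_wtd <- c(17.4, -3.2, 27.1, -0.4, 24.9)

# pmax() replaces every value below 0 with 0 and leaves the rest alone.
pred_clamped <- pmax(pred_wtd, 0)
pred_clamped

# Use the same limits, starting at 0, on both axes so that the origin
# sits at the bottom left and the 1:1 line is the diagonal.
lims <- c(0, max(obs_wtd, pred_clamped))
if (interactive()) {
  plot(obs_wtd, pred_clamped, xlim = lims, ylim = lims,
       xlab = "Observed WTD", ylab = "Predicted WTD (clamped at 0)")
  abline(0, 1, lty = 2)
}
```

Clamping changes only how the predictions are displayed; negative predictions still indicate where the model extrapolates beyond the observed range, so it may be worth reporting how many values were truncated.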
Re: [R] Problem with writing a file in UTF-8
Thomas,

I wasn't able to reproduce your finding. The last two characters in my 'out.txt' file were just as expected. But, I'm in a UTF-8 locale. Your locale affects the encoding of characters on your platform. If you're not in a UTF-8 locale, then characters are converted from your native encoding to UTF-8 (when you specify encoding="UTF-8"). In the process of conversion, it's possible to lose information. You can test whether there is a loss (or a change rather) when R writes these characters like so:

    # what does "űŁ" look like in binary (hex)?
    raw_before <- charToRaw("űŁ")

    # write 'out.txt' as before
    out <- file(description="out.txt", open="w", encoding="UTF-8")
    write(x="űŁ", file=out)
    close(con=out)

    # read in the two characters
    out <- file(description="out.txt", open="r", encoding="UTF-8")
    raw_after <- charToRaw(readChar(con=out, nchars=2))
    close(con=out)

    # compare the raw representations
    identical(raw_before, raw_after)

This test passes on my machine. But, there's also the question of whether these characters made it onto the R-help list unaltered. Also, please include the result of sessionInfo() in your subsequent messages.

Best,
Matt

    sessionInfo()
    R version 2.11.1 (2010-05-31)
    i686-pc-linux-gnu

    locale:
     [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
     [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
     [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
     [7] LC_PAPER=en_US.utf8       LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

On Thu, 2011-02-17 at 13:54 -0800, tpklein wrote:
> Hello,
>
> I am working with a data frame containing character strings with many special symbols from various European languages. When writing such character strings to a file using the UTF-8 encoding, some of them are converted in a strange way. See the following example, run in R 2.12.1 on Windows 7:
>
>     out <- file( description="out.txt", open="w", encoding="UTF-8")
>     write( x="äöüßæűŁ", file=out )
>     close( con=out )
>
> The last two symbols in the character string are converted to "uL" while all other characters are not changed (which is what I want).
>
> How to explain this? Does it have something to do with my locale? And is there a way to work around this problem?
>
> --
> Any help would be greatly appreciated.
> Thomas
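One possible workaround, offered here as a sketch and not as a confirmed fix for the Windows case in this thread, is to keep the string in UTF-8 inside R and write its bytes unchanged, so that no conversion through the native encoding takes place. The example uses \u escapes so that the source itself is plain ASCII, and a temporary file in place of 'out.txt'.

```r
# The two problem characters, written as Unicode escapes.
x <- "\u0171\u0141"

tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")                   # binary mode: the connection does no re-encoding
writeLines(enc2utf8(x), con, useBytes = TRUE)   # write the UTF-8 bytes as they are
close(con)

# Inspect what actually landed in the file.
bytes <- readBin(tmp, what = "raw", n = 16)
bytes

# Reading back while declaring the encoding recovers the original string.
back <- readLines(tmp, encoding = "UTF-8")
identical(charToRaw(back), charToRaw(x))
```

The file should contain the bytes c5 b1 c5 81 followed by a newline, which are the UTF-8 encodings of the two characters; if a literal "uL" appears instead, the conversion happened before the write.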
[R] non-ascii characters in R output
All,

I'd like to automatically output text from R to HTML. In doing this I've run into trouble with non-ascii characters, as my browser (and presumably others) does not render such characters correctly. For example, the 'fancy' single quotes associated with summary.lm are multi-byte characters on my platform. This particular problem is solved by options(useFancyQuotes=FALSE). But now I'm concerned about other non-ascii characters. As an overkill maybe, my current solution involves capture.output and iconv(..., to="ASCII//TRANSLIT").

Are there other sources of non-ascii characters? Is there a better or general solution?

Best,
Matt

    sessionInfo()
    R version 2.12.1 (2010-12-16)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    loaded via a namespace (and not attached):
    [1] tools_2.12.1

--
Matthew S Shotwell
Assistant Professor
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] non-ascii characters in R output
OK, looks like my web browser does render non-ascii characters output by R when it's given the encoding explicitly. This works for me:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

So that's another solution, but not a general one.

-Matt

On Fri, 2011-02-18 at 12:47 -0600, Matt Shotwell wrote:
> All,
>
> I'd like to automatically output text from R to HTML. In doing this I've run into trouble with non-ascii characters, as my browser (and presumably others) does not render such characters correctly. For example, the 'fancy' single quotes associated with summary.lm are multi-byte characters on my platform. This particular problem is solved by options(useFancyQuotes=FALSE). But now I'm concerned about other non-ascii characters. As an overkill maybe, my current solution involves capture.output and iconv(..., to="ASCII//TRANSLIT").
>
> Are there other sources of non-ascii characters? Is there a better or general solution?
>
> Best,
> Matt
>
>     sessionInfo()
>     R version 2.12.1 (2010-12-16)
>     Platform: x86_64-pc-linux-gnu (64-bit)
>
>     locale:
>      [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>      [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>      [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>      [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>      [9] LC_ADDRESS=C               LC_TELEPHONE=C
>     [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>     attached base packages:
>     [1] stats graphics grDevices utils datasets methods base
>
>     loaded via a namespace (and not attached):
>     [1] tools_2.12.1
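A more general option than declaring the page encoding is to replace every non-ASCII character with an HTML numeric character reference, so that the output is pure ASCII and renders the same whatever charset the browser assumes. The following is a sketch in base R; `to_html_entities` is a hypothetical helper name, and it deliberately does not escape the ASCII markup characters (&, <, >), which would need separate handling.

```r
# Hypothetical helper: turn non-ASCII characters into numeric character
# references such as &#8216; and leave ASCII characters as they are.
to_html_entities <- function(x) {
  vapply(x, function(s) {
    cp <- utf8ToInt(enc2utf8(s))                  # Unicode code points of the string
    paste0(ifelse(cp > 127,
                  sprintf("&#%d;", cp),           # non-ASCII: numeric reference
                  intToUtf8(cp, multiple = TRUE)),# ASCII: keep the character
           collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Fancy single quotes (U+2018, U+2019) of the kind printed with useFancyQuotes.
to_html_entities("\u2018a\u2019")
to_html_entities(c("plain text", "caf\u00e9"))
```

Unlike iconv(..., to="ASCII//TRANSLIT"), this keeps the original characters recoverable by the browser instead of approximating them.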