[R] Restructuring Star Wars data from rwars package

2017-08-03 Thread Matt Van Scoyoc
I'm having trouble restructuring data from the rwars package into a
dataframe. Can someone help me?

Here's what I have...

library("rwars")
library("tidyverse")

# These data are JSON, so they load into R as a nested list
people <- get_all_people(parse_result = TRUE)
# Follow the "next" link to fetch the following page of results
people <- get_all_people(getElement(people, "next"), parse_result = TRUE)

# Look at Anakin Skywalker's data
people$results[[1]]
people$results[[1]][1] # print his name

# To use them in R, I need to restructure them into a data frame like the
# starwars data in dplyr
data("starwars")
glimpse(starwars)

Thanks for the help.

Cheers,
MVS
=
Matthew Van Scoyoc
=
Think SNOW!
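
A minimal sketch of one way to flatten such a nested list into a data frame, using base R only. The `results` list below is a hypothetical stand-in for `people$results`: its field names and values are assumptions made for illustration, not output from rwars. Scalar fields are assumed to arrive as strings, and `films` as a list of URLs.

```r
# Hypothetical stand-in for people$results (structure assumed, values made up)
results <- list(
  list(name = "Anakin Skywalker", height = "188", mass = "84",
       films = list("films/4", "films/5")),
  list(name = "Wilhuff Tarkin", height = "180", mass = "unknown",
       films = list("films/1"))
)

# Fields that hold exactly one value per person
scalar_fields <- c("name", "height", "mass")

# One single-row data frame per person, then stack the rows
rows <- lapply(results, function(p) {
  as.data.frame(p[scalar_fields], stringsAsFactors = FALSE)
})
people_df <- do.call(rbind, rows)

# Convert the numeric fields; non-numeric entries such as "unknown" become NA
people_df$height <- as.numeric(people_df$height)
people_df$mass   <- suppressWarnings(as.numeric(people_df$mass))

# Multi-valued fields are kept as a list-column, one character vector per row
people_df$films <- lapply(results, function(p) unlist(p$films))

str(people_df)
```

Keeping the multi-valued fields in a list-column mirrors the layout of `dplyr::starwars`, where `films`, `vehicles` and `starships` are list-columns.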

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fread transforms numbers

2017-03-22 Thread Matt Dowle
Thanks Bill for cc.

Santosh,

I'm almost certain you don't have package bit64 installed.  When you do it
works fine :

> remove.packages("bit64")
> data.table::fread("9876543210\n")
  V1
1: 4.879661e-314
> install.packages("bit64")
> data.table::fread("9876543210\n")
   V1
1: 9876543210

News for data.table v1.10.2 on CRAN 31 Jan 2017 contained :

* When fread() or print() see integer64 columns are present, bit64's
namespace is now automatically loaded for convenience.

However, when data.table loads the namespace there is a bug in this
function :

> data.table:::require_bit64
function ()
{
tt = try(requireNamespace("bit64", quietly = TRUE))
if (inherits(tt, "try-error"))
warning("Some columns are type 'integer64' but package bit64 is not
installed. Those columns will print as strange looking floating point data.
There is no need to reload the data. Simply install.packages('bit64') to
obtain the integer64 print method and print the data again.")
}

The intent was to display that nice helpful message to you.   Due to this
report, I can see now that I shouldn't have wrapped requireNamespace() with
try() because  requireNamespace() returns TRUE or FALSE anyway. Even though
requireNamespace() prints 'Failed with error' it doesn't actually throw an
error.  I'll change data.table's function to the following :

if (!requireNamespace("bit64", quietly = TRUE))
warning("Some columns ...")

bit64 is correctly Suggests not Depends.   It's just unfortunate the
intended message wasn't displayed.

Santosh, in future please follow the data.table support guide here:
https://github.com/Rdatatable/data.table/wiki/Support.  r-help is not
supposed to be used for package support.  The main thing though is thanks
for helping me find this bug.

Thanks,
Matt
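
A small sketch of the behaviour described above, in base R. The package name is hypothetical and assumed not to be installed; the point is only that requireNamespace() signals failure through its return value, so a try() wrapper never sees an error.

```r
pkg <- "notARealPackage12345"   # hypothetical name, assumed not installed

# requireNamespace() returns FALSE on failure instead of throwing an error,
# so try() never produces a "try-error" object here
tt <- try(requireNamespace(pkg, quietly = TRUE), silent = TRUE)
inherits(tt, "try-error")
identical(tt, FALSE)

# Testing the logical result directly lets the warning fire
w <- tryCatch({
  if (!requireNamespace(pkg, quietly = TRUE))
    warning("Some columns ...")
  NULL
}, warning = function(w) w)
inherits(w, "warning")

# The value from the thread does not fit into a 32-bit integer
9876543210 > .Machine$integer.max
```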


On Wed, Mar 22, 2017 at 10:22 AM, William Dunlap <wdun...@tibco.com> wrote:

> Here is a way to reproduce the problem:
>   > data.table::fread("9876543210\n") # number bigger than 2^31-1
> V1
>   1: 4.879661e-314
> and your work-around does fix things up
>   > data.table::fread("9876543210\n", colClasses="numeric")
>  V1
>   1: 9876543210
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Wed, Mar 22, 2017 at 9:58 AM, Jeff Newmiller
> <jdnew...@dcn.davis.ca.us> wrote:
> > You failed to provide a reproducible example, and you posted HTML so the
> quality of any answer will be limited by the quality of your question.
> >
> > My stab at your problem is that you should read ?fread, and in
> particular should try using the colClasses argument.
> > --
> > Sent from my phone. Please excuse my brevity.
> >
> > On March 22, 2017 8:52:55 AM PDT, Santosh <santosh2...@gmail.com> wrote:
> >>Hi
> >>
> >>I have been using the "fread" utility of the "data.table" package on a
> >>dataset of
> >>about 20 million rows. It's a fantastic package to read datasets. Thank
> >>you, Matt D.
> >>
> >>However, I am faced with a peculiar instance of certain numbers in a
> >>column being transformed.
> >>
> >>In the dataset, a column has values ranging from 1 to 9##
> >>(nchar(x)=11, e.g. 98765432109). After using "fread" to read the
> >>dataset,
> >>values in all the columns are displayed correctly up to the first 1000
> >>rows.
> >>If "fread" is applied for reading >1000 rows of the total of 20 million
> >>rows, the values in only this column (having a wide range of values) are
> >>displayed as x.xxxe-3yy (e.g. 3.5639877e-324).
> >>
> >>I tried reading all the columns as "character" and it didn't help.
> >>
> >>Would highly appreciate your assistance!
> >>
> >>Thanks so much in advance.
> >>
> >>Best regards,
> >>Santosh


[R] Validating Minitab's "Expanded Gage R&R Study" using R and lme4

2017-02-15 Thread Matt Jacob
I'm trying to validate the results of an "Expanded Gage R&R Study" in
Minitab using R and lme4, but I can't get the numbers to match up in
certain situations. I can't tell whether my model is wrong, my data is
bad, or something else is going on.

For instance, here's some data for which the results don't match:

https://i.stack.imgur.com/5PCgm.png

After running the gage study, these are the results according to
Minitab:

                                 Study Var  %Study Var  %Tolerance
Source             StdDev (SD)    (6 * SD)       (%SV)  (SV/Toler)
Total Gage R&R         1.76277     10.5766      100.00       14.36
  Repeatability        0.00000      0.0000        0.00        0.00
  Reproducibility      1.76277     10.5766      100.00       14.36
    B                  0.00000      0.0000        0.00        0.00
    A*B                1.76277     10.5766      100.00       14.36
Part-To-Part           0.00000      0.0000        0.00        0.00
  A                    0.00000      0.0000        0.00        0.00
Total Variation        1.76277     10.5766      100.00       14.36
But when I mimic Minitab's results by parsing the output from lmer() and
doing the arithmetic in Excel, this is what I see:

https://i.stack.imgur.com/EGg9F.png

The raw output from lmer() was:

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ 1 + (1 | A) + (1 | B) + (1 | A:B)
   Data: d

REML criterion at convergence: -100.1

Scaled residuals: 
       Min         1Q     Median         3Q        Max 
-1.308e-07 -1.308e-07 -1.308e-07 -6.541e-08  1.308e-07 

Random effects:
 Groups   Name        Variance  Std.Dev. 
 A:B      (Intercept) 1.333e+00 1.154e+00
 B        (Intercept) 7.066e-04 2.658e-02
 A        (Intercept) 2.260e-03 4.754e-02
 Residual             2.655e-14 1.629e-07
Number of obs: 8, groups:  A:B, 4; B, 2; A, 2

Fixed effects:
            Estimate Std. Error t value
(Intercept)    52.17       0.57   91.53
convergence code: 0
Model failed to converge with max|grad| = 0.422755 (tol = 0.002,
component 1)
Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?

And the R code that produced that output is:

library(lme4)
A <- factor(c(1, 1, 2, 2, 2, 1, 2, 1))
B <- factor(c(1, 2, 1, 2, 1, 2, 2, 1))
y <- c(51.356124843620798, 51.356124843620798, 54.8816618912481,
51.356124843620798, 54.8816618912481, 51.356124843620798,
51.356124843620798, 51.356124843620798)
d <- data.frame(y, A, B)
fm <- lmer(y ~ 1 + (1|A) + (1|B) + (1|A:B), d)
summary(fm)

For a different measurement with a different response, it's a completely
different situation! Given the following data:

https://i.stack.imgur.com/cH0bO.png

The resulting table from Minitab is:

                                 Study Var  %Study Var  %Tolerance
Source             StdDev (SD)    (6 * SD)       (%SV)  (SV/Toler)
Total Gage R&R        0.193649     1.16190       55.90        1.00
  Repeatability       0.093541     0.56125       27.00        0.48
  Reproducibility     0.169558     1.01735       48.95        0.88
    B                 0.132288     0.79373       38.19        0.68
    A*B               0.106066     0.63640       30.62        0.55
Part-To-Part          0.287228     1.72337       82.92        1.49
  A                   0.287228     1.72337       82.92        1.49
Total Variation       0.346410     2.07846      100.00        1.79
And after plugging my R results into Excel, I get exactly the same
thing:

https://i.stack.imgur.com/jUEAP.png

Which was produced by this R code:

library(lme4)
A <- factor(c(1, 1, 2, 2, 2, 1, 2, 1))
B <- factor(c(1, 2, 1, 2, 1, 2, 2, 1))
y <- c(-49.4, -49.8, -50.1, -50.1, -50.0, -49.9, -50.2, -49.6)
d <- data.frame(y, A, B)
fm <- lmer(y ~ 1 + (1|A) + (1|B) + (1|A:B), d)
summary(fm)

That generated the following lmer() summary:

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ 1 + (1 | A) + (1 | B) + (1 | A:B)
   Data: d

REML criterion at convergence: -3.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.7705 -0.6853 -0.1039  0.4379  1.4151 

Random effects:
 Groups   Name        Variance Std.Dev.
 A:B      (Intercept) 0.01125  0.10607 
 B        (Intercept) 0.01750  0.13229 
 A        (Intercept) 0.08250  0.28723 
 Residual             0.00875  0.09354 
Number of obs: 8, groups:  A:B, 4; B, 2; A, 2

Fixed effects:
            Estimate Std. Error t value
(Intercept) -49.8875     0.2322  -214.9
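
The arithmetic that turns these variance components into the Minitab-style table can also be done in R rather than Excel. The sketch below assumes the usual gage decomposition (repeatability = residual, reproducibility = B plus A:B, part-to-part = A); the variance values are copied from the summary above, and the column names are illustrative. The %Tolerance column is left out because the tolerance is not stated in the post.

```r
# Variance components copied from the lmer() summary
vc <- c("A:B" = 0.01125, B = 0.01750, A = 0.08250, Residual = 0.00875)

# Combine the components into the rows of the gage table
var_tab <- c(
  "Total Gage R&R"  = vc[["Residual"]] + vc[["B"]] + vc[["A:B"]],
  "Repeatability"   = vc[["Residual"]],
  "Reproducibility" = vc[["B"]] + vc[["A:B"]],
  "B"               = vc[["B"]],
  "A*B"             = vc[["A:B"]],
  "Part-To-Part"    = vc[["A"]],
  "Total Variation" = sum(vc)
)
sd_tab <- sqrt(var_tab)

gage <- data.frame(
  StdDev      = round(sd_tab, 6),
  StudyVar    = round(6 * sd_tab, 5),                                 # 6 * SD
  PctStudyVar = round(100 * sd_tab / sd_tab[["Total Variation"]], 2), # %SV
  row.names   = names(sd_tab)
)
print(gage)
```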

Is the difference attributable to the warnings produced by lmer() about
the model failing to converge and being nearly unidentifiable? What
could Minitab be doing differently when the measurement data contains
only two distinct values?

Matt
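
One hedged observation: Minitab's numbers for both data sets are consistent with classical ANOVA (method-of-moments) variance components in which negative estimates are set to zero. Whether Minitab actually computes them this way is an assumption here, not something the post confirms. The sketch below uses base R only and assumes a balanced, fully crossed design; `anova_vc` is a name made up for this example.

```r
anova_vc <- function(y, A, B) {
  # Mean squares for A, B, A:B and the residual, in that order
  ms <- suppressWarnings(anova(lm(y ~ A * B)))[["Mean Sq"]]
  names(ms) <- c("A", "B", "AB", "E")
  a <- nlevels(A)
  b <- nlevels(B)
  n <- length(y) / (a * b)            # replicates per cell (balanced design)
  # Method-of-moments estimates; negative values are truncated at zero
  c(A        = max(0, (ms[["A"]]  - ms[["AB"]]) / (b * n)),
    B        = max(0, (ms[["B"]]  - ms[["AB"]]) / (a * n)),
    "A:B"    = max(0, (ms[["AB"]] - ms[["E"]])  / n),
    Residual = ms[["E"]])
}

A <- factor(c(1, 1, 2, 2, 2, 1, 2, 1))
B <- factor(c(1, 2, 1, 2, 1, 2, 2, 1))

# First data set (only two distinct response values)
y1 <- c(51.356124843620798, 51.356124843620798, 54.8816618912481,
        51.356124843620798, 54.8816618912481, 51.356124843620798,
        51.356124843620798, 51.356124843620798)
sd1 <- sqrt(anova_vc(y1, A, B))
round(sd1, 5)

# Second data set
y2 <- c(-49.4, -49.8, -50.1, -50.1, -50.0, -49.9, -50.2, -49.6)
sd2 <- sqrt(anova_vc(y2, A, B))
round(sd2, 6)
```

For the first data set all three mean squares for A, B and A:B coincide, so only the A:B component is non-zero. REML has no such closed form and has to search for the estimates numerically, which may be why lmer() reports convergence problems there.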

This question is cross-posted to
http://stats.stackexchange.com/questions/262170/how-can-i-validate-minitabs-expanded-gage-rr-study-using-open-source-tools


[R] class-specific Gini metrics for Random Forest?

2015-10-13 Thread Matt Fagan
library(randomForest)
data(iris)
fit <- randomForest(Species ~ ., data=iris, importance=TRUE);
fit.imp<-importance(fit)
fit.imp

Columns 1-3 of fit.imp show the class-specific variable importance for the
Mean Decrease Accuracy measure (MDA). Is there a way to calculate
class-specific Gini metrics rather than the default class-specific MDA?
Simply setting "importance(fit, type=2)" doesn't do it.

I really want to calculate these metrics. I was about to start trying to
code a way to do it, but thought I would ask here first.  Many thanks for
any help or pointers--I hope I missed something simple.



Re: [R] reshape: melt and cast

2015-09-01 Thread Matt Pickard
"1152",
"1153", "1154", "1156", "1158", "1161", "1164", "1179", "1182",
"1183", "1191", "1196", "1197", "1198", "1199", "1200", "1201",
"1203", "1205", "1207", "1208", "1209", "1214", "1216", "1219",
"1220", "1222", "1223", "1224", "1225", "1226", "1229", "1236",
"1237", "1238", "1240", "1241", "1243", "1245", "1246", "1248",
"1254", "1255", "1256", "1257", "1260", "1262", "1264", "1268",
"1270", "1272", "1278", "1279", "1280", "1282", "1283", "1287",
"1288", "1292", "1293", "1297", "1310", "1311", "1315", "1329",
"1332", "1333", "1343", "1346", "1347", "1352", "1354", "1355",
"1356", "1360", "1368", "1369", "1370", "1378", "1398", "1400",
"1403", "1404", "1411", "1412", "1420", "1421", "1423", "1424",
"1426", "1428", "1432", "1433", "1435", "1436", "1438", "1439",
"1440", "1441", "1443", "1444", "1446", "1447", "1448", "1449",
"1450", "1453", "1454", "1456", "1459", "1460", "1461", "1462",
"1463", "1468", "1471", "1475", "1478", "1481", "1482", "1487",
"1488", "1490", "1493", "1495", "1497", "1503", "1504", "1508",
"1509", "1511", "1513", "1514", "1515", "1522", "1524", "1525",
"1526", "1527", "1528", "1529", "1532", "1534", "1536", "1538",
"1539", "1540", "1543", "1550", "1551", "1552", "1554", "1555",
    "1556", "1558", "1559"), class = "factor"), RaterName = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("cwormhoudt", "zspeidel"), class =
"factor"),
SI1 = c(1L, 1L, 1L, 1L, 1L, 1L), SI2 = c(3L, 2L, 2L, 3L,
3L, 2L), SI3 = c(3L, 2L, 3L, 3L, 3L, 2L), SI4 = c(1L, 1L,
1L, 1L, 1L, 1L), SI5 = c(1L, 1L, 1L, 1L, 1L, 1L), SI6 = c(1L,
1L, 1L, 1L, 1L, 1L), SI7 = c(1L, 1L, 1L, 2L, 2L, 1L), SI8 = c(1L,
1L, 1L, 1L, 1L, 1L), SI9 = c(1L, 1L, 1L, 1L, 1L, 1L), SI10 = c(1L,
1L, 1L, 2L, 2L, 1L), SI11 = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names =
c("QCode",
"PID", "RaterName", "SI1", "SI2", "SI3", "SI4", "SI5", "SI6",
"SI7", "SI8", "SI9", "SI10", "SI11"), row.names = 2456:2461, class =
"data.frame")


I am trying to use the melt and cast functions to re-arrange to have column
names QCode, PID, sItem, cwormhoudt, zspeidel.  Under each of the last two
columns I want the values that correspond to each RaterName.

So, I melt the data like this:

mratings = melt(ratings, variable_name="sItem")

Then cast the data like this:

> outData = cast(mratings, QCode + PID + sItem ~ RaterName)
Aggregation requires fun.aggregate: length used as default

But the value columns appear to be displaying counts and not the original
values.

> dput(head(outData))
structure(list(QCode = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("APPEAR",
"FEAR", "FUN", "GRAT", "GUILT", "Joy", "LOVE", "UNGRAT"), class =
"factor"),
PID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1123",
"1136", "1137", "1142", "1146", "1147", "1148", "1149", "1152",
"1153", "1154", "1156", "1158", "1161", "1164", "1179", "1182",
"1183", "1191", "11

[R] reshape: melt and cast

2015-08-31 Thread Matt Pickard
Hi,

I have data that looks like this:

> head(ratings)
  QCode  PID  RaterName SI1 SI2 SI3 SI4 SI5 SI6 SI7 SI8 SI9 SI10 SI11
1 GUILT 1123 cwormhoudt   2   2   3   1   1   1   3   3   3    2    1
2  LOVE 1123 cwormhoudt   1   2   3   2   1   1   1   1   1    1    3
3 GUILT 1136 cwormhoudt   1   2   3   1   1   1   2   3   2    2    1
4  LOVE 1136 cwormhoudt   1   2   3   1   1   1   1   1   1    1    2
5 GUILT 1137 cwormhoudt   2   2   2   1   1   1   2   3   1    2    1
6  LOVE 1137 cwormhoudt   1   3   4   1   1   1   1   1   1    1    4

> tail(ratings)
      QCode  PID RaterName SI1 SI2 SI3 SI4 SI5 SI6 SI7 SI8 SI9 SI10 SI11
2456    FUN 1555  zspeidel   1   3   3   1   1   1   1   1   1    1    1
2457    FUN 1556  zspeidel   1   2   2   1   1   1   1   1   1    1    1
2458    FUN 1558  zspeidel   1   2   3   1   1   1   1   1   1    1    1
2459 APPEAR 1558  zspeidel   1   3   3   1   1   1   2   1   1    2    1
2460 APPEAR 1559  zspeidel   1   3   3   1   1   1   2   1   1    2    1
2461    FUN 1559  zspeidel   1   2   2   1   1   1   1   1   1    1    1

I am trying to use the melt and cast functions to re-arrange it to look
like this:

   QCode  PID sItem cwormhoudt zspeidel
1 APPEAR 1123   SI1          1        1
2 APPEAR 1123   SI2          4        1
3 APPEAR 1123   SI3          1        2
4 APPEAR 1123   SI4          3        1
5 APPEAR 1123   SI5          1        1
6 APPEAR 1123   SI6          1        3

So, I melt the data like this:

mratings = melt(ratings, variable_name="sItem")

Then cast the data like this:

> outData = cast(mratings, QCode + PID + sItem ~ RaterName)
Aggregation requires fun.aggregate: length used as default

But the value columns appear to be displaying counts and not the original
values:

> head(outData)
   QCode  PID sItem cwormhoudt zspeidel
1 APPEAR 1123   SI1          1        1
2 APPEAR 1123   SI2          1        1
3 APPEAR 1123   SI3          1        1
4 APPEAR 1123   SI4          1        1
5 APPEAR 1123   SI5          1        1
6 APPEAR 1123   SI6          1        1
> which(outData$zpeidel==3)
integer(0)

How do I prevent cast from aggregating the data according to counts?  Am I
doing something wrong?

Thanks in advance.

MP
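
A base-R sketch of the same long-then-wide reshaping, with an explicit check for duplicated identifier combinations. A reshaping function has to aggregate whenever the identifying columns do not pick out exactly one value per cell, so checking for duplicates first shows whether that is the cause. The toy `ratings` data below is made up for illustration and only mimics the column layout of the real data.

```r
# Toy data with the same column layout (values are illustrative only)
ratings <- data.frame(
  QCode     = c("GUILT", "LOVE", "GUILT", "LOVE"),
  PID       = c("1123", "1123", "1123", "1123"),
  RaterName = c("cwormhoudt", "cwormhoudt", "zspeidel", "zspeidel"),
  SI1       = c(2L, 1L, 1L, 1L),
  SI2       = c(2L, 2L, 3L, 2L),
  stringsAsFactors = FALSE
)
item_cols <- c("SI1", "SI2")

# Long format: one row per QCode / PID / RaterName / item
long <- do.call(rbind, lapply(item_cols, function(item) {
  data.frame(ratings[c("QCode", "PID", "RaterName")],
             sItem = item, value = ratings[[item]],
             stringsAsFactors = FALSE)
}))

# Any duplicated key here would force an aggregation step when going wide
key <- long[c("QCode", "PID", "sItem", "RaterName")]
any(duplicated(key))

# Wide format: one column of values per rater
wide <- reshape(long, direction = "wide",
                idvar = c("QCode", "PID", "sItem"),
                timevar = "RaterName", v.names = "value")
rownames(wide) <- NULL
wide
```

If `any(duplicated(key))` is TRUE on the real data, the duplicated rows need to be resolved (or an explicit aggregation function chosen) before the wide table can hold the original values.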



[R] hurdle control and optim

2015-08-17 Thread Matt Dicken

I was hoping someone may be able to help with the following.

I fit the model below using the pscl package. I am modelling catch data (about 
17,000 entry points), so there are lots of zeros.

fit.hurdle.bin = hurdle(Catch ~ Beach + Region + Year +
  Decade + Month + Season + Whale + Sex + Size + meantemp +
  meanviz + offset(log(Length.nets..km.)),
  dist = "poisson", zero.dist = "binomial", link = "logit", trace = TRUE)

The model output tells me that:
Warning message: In sqrt(diag(object$vcov)) : NaNs produced (against year)

I then use hurdle control with L-BFGS-B to set some parameter controls to 
solve this issue, but get the warning message:
L-BFGS-B needs finite values of 'fn'
In addition: Warning message:
In optim(fn = countDist, gr = countGrad, par = c(start$count, if (dist ==  :
  method L-BFGS-B uses 'factr' (and 'pgtol') instead of 'reltol' and 'abstol'

How do I write the script for Hurdle control to solve these issues?
Any help would be really appreciated
All the best
Matt
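
The two messages quoted above come from optim() itself and can be reproduced in isolation with base R. The sketch below uses a made-up one-parameter objective, not the hurdle likelihood, and does not show how to pass settings through hurdle.control(); it only illustrates what each message means.

```r
# Toy objective that is only defined for p > 0 (minimum at p = 1)
fn <- function(p) p - log(p)

# A start value where fn is not finite is rejected by L-BFGS-B
err <- tryCatch(
  suppressWarnings(optim(par = -1, fn = fn, method = "L-BFGS-B")),
  error = function(e) e
)
conditionMessage(err)

# Box constraints keep the search inside the region where fn is finite
fit <- optim(par = 0.5, fn = fn, method = "L-BFGS-B", lower = 1e-8)
fit$par
fit$convergence

# Supplying 'reltol' together with L-BFGS-B triggers the second message
w <- tryCatch(
  optim(par = 0.5, fn = fn, method = "L-BFGS-B", lower = 1e-8,
        control = list(reltol = 1e-10)),
  warning = function(w) w
)
conditionMessage(w)
```

In a real model the non-finite objective usually points to start values or bounds that put the likelihood outside its valid range, so bounds and starting values are the first things to check.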





Dr. Matt Dicken
Senior Scientist
Telephone: 0315660400 | Fax: 0315660493 | Email: m...@shark.co.za
Physical Address: 1a Herrwood Drive, Umhlanga Rocks, 4320 | www.shark.co.za



Re: [R] hurdle control and optim

2015-08-17 Thread Matt Dicken
Dear Achim,

Apologies for the cross posting and confusion. I really appreciate the help

All the best

Matt

-Original Message-
From: Achim Zeileis [mailto:achim.zeil...@r-project.org] 
Sent: 17 August, 2015 10:06 PM
To: Matt Dicken m...@shark.co.za
Cc: r-help@r-project.org
Subject: Re: [R] hurdle control and optim

Please refrain from cross-posting. The same request was sent to the author of 
hurdle(), R-help, and StackOverflow (where it was already answered). 
Also, do provide self-contained and reproducible code as would be appropriate 
in any of the three cases.



[R] Reproducing a 3d yield curve plot from New York Times

2015-06-04 Thread matt
, 1325203200, 1327968000,
1330473600, 1333065600, 1335744000, 1338422400, 1340928000, 1343692800,
1346371200, 1348790400, 1351641600, 1354233600, 1356912000, 1359590400,
1362009600, 1364428800, 1367280000, 1369958400, 1372377600, 1375228800,
1377820800, 1380499200, 1383177600, 1385683200, 1388448000, 1391126400,
1393545600, 1396224000, 1398816000, 1401408000, 1404086400, 1406764800,
1409270400, 1412035200, 1414713600, 1417132800, 1419984000, 1422403200
), tzone = "UTC", tclass = "Date"), .Dim = c(301L, 11L), .Dimnames = 
list(
NULL, c("1M", "3M", "6M", "1Y", "2Y", "3Y", "5Y", "7Y", "10Y",
"20Y", "30Y")))

chartSeries3d0(term.structure, r = 1, col = c("lightblue", "darkblue"),
   border = NA, theta = 45, ltheta = 0, shade = 0.15, smoother = 1, phi = 15,
   scale = FALSE, expand = 0.75)


Can anyone suggest a different package to work with to get closer to the 
above-mentioned output?  I'm interested in figuring out how to smooth 
the color transitions, add the grid/gridlines and use a different set of 
color gradients when the values are negative.  Also, I realize that my 
colors transition along the wrong axis (1m, 3m, etc) rather than along 
y.


Thanks in advance.  I've tried to find a reference to this in the 
archives and have come up empty.  As well, I've tried to make this 
reproducible.

Matt
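
A sketch of one base-graphics approach with persp(), which accepts one colour per surface facet. Each facet is coloured by its average height, with a separate gradient for negative values. The yield surface below is synthetic (the formula and the palette colours are assumptions made for illustration), so the real `term.structure` matrix would replace `z`.

```r
# Synthetic surface: rows are dates, columns are maturities (values made up)
maturities <- c(1/12, 0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30)
dates <- 1:60
z <- outer(dates, maturities, function(d, m) 2 + log1p(m) - d / 20)

# Height of each facet = mean of its four corner points
nr <- nrow(z)
nc <- ncol(z)
zf <- (z[-1, -1] + z[-1, -nc] + z[-nr, -1] + z[-nr, -nc]) / 4

# Separate gradients for positive and negative yields
n_col   <- 50
pos_pal <- colorRampPalette(c("lightblue", "darkblue"))(n_col)
neg_pal <- colorRampPalette(c("mistyrose", "darkred"))(n_col)

# Map non-negative magnitudes onto palette positions 1..n
to_index <- function(v, n) {
  top <- max(v, 0)
  if (top == 0) return(rep(1L, length(v)))
  pmax(1L, as.integer(ceiling(n * v / top)))
}

facet_col <- matrix(NA_character_, nr - 1, nc - 1)
pos <- zf >= 0
facet_col[pos]  <- pos_pal[to_index(zf[pos], n_col)]
facet_col[!pos] <- neg_pal[to_index(-zf[!pos], n_col)]

persp(x = dates, y = seq_along(maturities), z = z, col = facet_col,
      border = NA, theta = 45, phi = 15, ltheta = 0, shade = 0.15,
      expand = 0.75, ticktype = "detailed",
      xlab = "Date index", ylab = "Maturity index", zlab = "Yield")
```

Setting `border` to a light colour instead of `NA` draws the mesh lines on the surface, which may be close to the grid effect asked about.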



[R] Processing key_column, begin_date, end_date in R

2015-02-25 Thread Matt Gross
Hi,

I am trying to process a large dataset in R.  The dataset contains the
following three columns:

key_column - a unique key identifier
begin_date - the start date of the active period
end_date - the end date of the active period


Example data is here:

key_column,begin_date,end_date
123456,2013-01-01,2014-01-01
123456,2013-07-01,2014-07-01
789102,2012-03-01,2014-03-01
789102,2015-02-01,2016-02-01
789102,2015-02-06,2016-02-06

I want to build a condensed table of key_column and begin_date's and
end_date's.  As you can see in the example data above, some begin and end
date periods overlap with begin_date and end_date pairs for the same
key_column.  In situations where overlap exists I want to have one record
for the key_column with the min(begin_date) and the max(end_date).

Can anyone help me build the commands to process this data in R?

Thanks,
Matt
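
A minimal base-R sketch of the condensing step, using the example data from the post. Within each key, rows are sorted by begin_date and a new group starts whenever a period begins after every earlier period has ended. The helper name `merge_periods` is made up for this example, and periods that merely touch (begin equal to a previous end) are treated as overlapping, which is an assumption.

```r
periods <- read.csv(text = "key_column,begin_date,end_date
123456,2013-01-01,2014-01-01
123456,2013-07-01,2014-07-01
789102,2012-03-01,2014-03-01
789102,2015-02-01,2016-02-01
789102,2015-02-06,2016-02-06",
  colClasses = c("character", "Date", "Date"))

merge_periods <- function(df) {
  df <- df[order(df$begin_date), ]
  begin <- as.numeric(df$begin_date)
  end   <- as.numeric(df$end_date)
  # Latest end date seen among all earlier rows
  prev_end <- c(-Inf, head(cummax(end), -1))
  # A new group starts when a period begins after everything before it ended
  group <- cumsum(begin > prev_end)
  data.frame(
    key_column = df$key_column[1],
    begin_date = as.Date(as.vector(tapply(begin, group, min)),
                         origin = "1970-01-01"),
    end_date   = as.Date(as.vector(tapply(end, group, max)),
                         origin = "1970-01-01"),
    stringsAsFactors = FALSE
  )
}

condensed <- do.call(rbind,
                     lapply(split(periods, periods$key_column), merge_periods))
rownames(condensed) <- NULL
condensed
```

For a very large table the same idea (sort, running maximum of end dates, cumulative group id) carries over to grouped operations in data.table or dplyr.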

-- 
Matt Gross
gro...@gmail.com
503.329.4545



Re: [R] global environment

2015-01-12 Thread Matt Warpinski
Rewrite it with spaces between your assigns and numbers. This line is
unclear to me: if(rst[i]-3   rst[i]=-3)

Is it supposed to be rst[i] - 3, or rst[i]  -3? R might be
misinterpreting what you're trying to get it to do.
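
On the underlying question: a function works on a copy of the data frame it receives, so the caller has to assign the returned value for the change to be visible outside. A hedged sketch with made-up data follows; the function name, the cutoff of 3 and the simulated outlier are assumptions for illustration.

```r
remove_outliers <- function(model, data, cutoff = 3) {
  rst  <- rstudent(model)
  keep <- abs(rst) <= cutoff          # TRUE for rows that are not outliers
  data[keep, , drop = FALSE]          # return a filtered copy
}

set.seed(42)
d <- data.frame(x = 1:30)
d$y <- 2 * d$x + rnorm(30)
d$y[15] <- 200                        # plant one obvious outlier

fit <- lm(y ~ x, data = d)
cleaned <- remove_outliers(fit, d)

nrow(d)         # the original is unchanged by the call
nrow(cleaned)   # the returned copy has the outlier removed

d <- cleaned    # assign the result to update the data frame in the caller
```

Logical indexing also avoids a pitfall of negative indexing: when no outliers are found, an index vector that contains only zeros selects no rows at all.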

On Mon, Jan 12, 2015 at 1:18 AM, Methekar, Pushpa (GE Transportation,
Non-GE) pushpa.methe...@ge.com wrote:

 Hi
 I am trying to make some changes in a data frame and return it from a
 function. This is my function:
 rm.outliers = function(model,xsys)
 {

   rst = rstudent(model)
   outliers <- vector("numeric", 10)
   xsys <- xsys
 for(i in 1:length(rst))
   {
 if(rst[i]-3   rst[i]=-3)
   {
   #print("this is not outlier")
   print(i)
 }
 else
   {
   print("this is an outlier")
   print(i)
   outliers[i] <- c(i)
   print(outliers)

 }

 i <- i+1
   }

 xsys <- xsys[-outliers,]
 print("printing rows")
 nrow(xsys)
 return(xsys)
 }
 After returning xsys dataframe its not making changes in my global
 environment data frame.
 I tried assign and - but no use.



[R] Parsing Google Finance page data?

2014-11-20 Thread Matt Considine

Hi,
I'm wondering if anyone can point me to code to parse data on Google 
Finance pages, i.e. parse the results of a URL request such as this

  http://www.google.com/finance?q=apple

I know how to return the contents of the page; it's figuring out the 
best tools to parse it that I'm interested in and hopefully someone has 
already done this.


(For what it is worth, the only info I am looking for are the ticker, 
exchange, currency and Mkt Cap datapoint)


Thanks in advance for any help - scraping is not my strong suit.
Matt




Re: [R] Parsing Google Finance page data?

2014-11-20 Thread Matt Considine
FWIW, this is the kludge I came up with.  The idea is that I only know 
the name of the company and not the ticker/exchange.  So the following 
admittedly doesn't work in all cases (e.g. Time Warner).  So if anyone 
alternatively knows how to return a list of tickers/exchanges of 
companies matching a name, that would be helpful.  (Though that question 
should probably go to the finance list).  In any case, thanks in advance 
for any thoughts put towards this.
Matt

library(RCurl)
library(xts)
library(XML)

#want to return results of this
# http://www.google.com/finance?q=ibm

coname <- "ibm"

baseurl <- paste("http://www.google.com/finance?q=", coname, sep = "")

# Read and parse HTML file
doc.html = htmlTreeParse(baseurl, useInternalNodes=TRUE)

# Pull the second table on the page as a data frame
tables <- readHTMLTable(doc.html, which = 2, as.data.frame = TRUE,
                        stringsAsFactors = FALSE)
mktcap <- tables[4,2]

doc.text = unlist(xpathApply(doc.html, '//script', xmlValue))

block <- doc.text[11]
exchangeticker <- unlist(strsplit(block,'\n'))[11]

doc.text = unlist(xpathApply(doc.html, '//div', xmlValue))
currency <- doc.text[60]

print(mktcap)
print(exchangeticker)
print(currency)




[R] Problem Invoking System Commands from R

2014-10-11 Thread Matt Borkowski
Hello,
First, please keep in mind I am not a programmer and know very little about R. I 
am running the 64-bit version of R on a Windows 8.1 machine. I am trying to run 
a script (which I have successfully run in the past) to download some weather 
data from a NOAA ftp site.
When I attempt to run the following command:

    system("wget -P data/raw ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2013/724620-23061-2013.gz")

it returns status 127, which as I understand simply means the command will not 
run.
If I go directly to my command prompt in Windows, navigate to my working 
directory, and run

    wget -P data/raw ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2013/724620-23061-2013.gz

the command runs and the file downloads without a problem. 
Playing around, it seems I can't invoke any system commands from R. Even a 
simple

    system("dir")

returns status 127.
I have moved to a new computer since I last successfully ran this script...I'm 
wondering if this might be a permissions issue or other security setting 
preventing me from invoking system commands.
Any ideas?
-Matt


Re: [R] Problem Invoking System Commands from R

2014-10-11 Thread Matt Borkowski
I appreciate the feedback.

1) The paths are properly set...I only wonder if the spaces in the path to 
wget.exe are problematic for R. The full path
("C:\\Program Files (x86)\\GnuWin32\\bin") is properly included in the return
list for Sys.getenv("PATH"). Sys.which("wget") returns:

"C:\\PROGRA~2\\GnuWin32\\bin\\wget.exe"

Note that in this return, the folder 'Program Files (x86)' was truncated. Not 
sure if that is a problem in this. Also as mentioned, wget works fine directly 
from the Windows CMD line, so it strikes me as an issue calling a system 
command from R as opposed to a problem with the command itself.

2) 'dir' is a recognized command at the Windows command line...but it is 
somewhat irrelevant as I was only using it to determine whether any calls to 
the Windows command line from R were working...it is not essential to the 
script.

One further point, I booted up my old machine last night and reinstalled R and 
wget...and was successfully able to run the script. Old machine is Windows XP 
versus Windows 8.1 on my new machine. Perhaps this confirms it is a Windows 
permission issue and not an R problem?

-Matt
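
A small sketch of the checks discussed in this thread, using base R only. `run_command` is a helper name made up for this example; it looks the command up on the PATH that R sees and runs it through system2() only when it is found. The download.file() call at the end is shown commented out because it needs network access, and its destination path is an assumption.

```r
run_command <- function(cmd, args = character()) {
  exe <- Sys.which(cmd)                 # "" when cmd is not on R's PATH
  if (!nzchar(exe)) {
    message("'", cmd, "' was not found on the PATH seen by R")
    return(invisible(NA_integer_))
  }
  system2(exe, args)                    # returns the exit status
}

# A command that does not exist is reported instead of failing silently
status <- run_command("no-such-command-xyz")   # hypothetical command name
is.na(status)

# Shell built-ins such as 'dir' need a shell; shell() exists only on Windows
if (.Platform$OS.type == "windows") shell("dir")

# Base R can also fetch the file itself (not run here):
# download.file(
#   "ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2013/724620-23061-2013.gz",
#   destfile = "data/raw/724620-23061-2013.gz", mode = "wb")
```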


On Saturday, October 11, 2014 3:00 AM, Prof Brian Ripley 
rip...@stats.ox.ac.uk wrote:



Please do follow the posting guide and do not send HTML: it gets mangled.

There are two issues here:

1) Paths.  Use Sys.which("wget") to see if the command is on your path. 
  I suspect it is not, and you need to set the path when running R in 
the same way as is done for your shell.  Compare the setting of PATH in 
your shell with Sys.getenv("PATH") in R, and use Sys.setenv() to set it 
(or do so on the shortcut used to start R: see the rw-FAQ).

2) AFAIR 'dir' is not a system command.  See ?system (on Windows) and 
note that shell() is required for some commands: this is one.

These are not R issues, and you may need to seek local Windows help.




-- 
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK



[R] Help with continuous color plot

2014-09-24 Thread matt

Hi,
I have a matrix of data, with the rows representing observations and the 
columns representing various values that the observation can take on.  
In other words, each row can be thought of as a sampling of the density 
function/histogram associated with the range of values for that 
observation.


I'd like to graph these with a shaded color, rather than as lines.  So a 
given observation would have the darkest shade at the mean and the 
shading would lighten for values that approached the tails.  In a sense 
this is like a ribbon chart, but where there are many confidence bands.


I think the example near the bottom of this page
  http://bconnelly.net/2013/10/creating-colorblind-friendly-figures/
starts to get at what I want.  But when I tried to get a ribbon, I get 
an error message saying that "Error: Aesthetics can not vary with a 
ribbon"


Can anyone point me to an example that accomplishes my task, or give me 
some ideas as to how to code this?


Below is a reproducible dataset and the code I ran that generated the 
above error.  And apologies in advance if I have overlooked some obvious 
source - I'm not exactly sure what keywords to search for.


Regards,
Matt

testdataset <- structure(c(0.703482475602795, 0.708141442616021, 
0.696373713631662,
   0.670284015871304, 0.675183812793659, 
0.690440437259122, 0.717483375152826,
   0.775328205198994, 0.848374059782512, 
0.869939471712489, 0.86329313061477,
   0.842138830353923, 0.819853961383293, 
0.808038546509378, 0.826626282345039,
   0.855428819162732, 0.873943618483253, 
0.906412218904192, 0.95345525957727,
   0.941481792397259, 0.923791753474186, 
0.909206164221341, 0.847283523824235,
   0.774333551860785, 0.723440114819687, 
0.653247411286407, 0.585889004137383,
   0.516531935718585, 0.458855598305008, 
0.422596378188962, 0.385800210249005,
   0.363663809831211, 0.703482475602795, 
0.708055808109959, 0.696379276680681,
   0.686643131558789, 0.702628930558265, 
0.736010723583024, 0.790795207667811,
   0.843997296035071, 0.872447231982615, 
0.876357159885425, 0.852095141662599,
   0.815122741092172, 0.759163100114952, 
0.737079598996168, 0.755626127703219,
   0.76375495269533, 0.757290640161052, 
0.754301244147121, 0.738872719902144,
   0.712590028244082, 0.707690675037336, 
0.707234385372842, 0.708720518303698,
   0.723271948541464, 0.738173079905318, 
0.772161522113349, 0.776237486574842,
   0.775666977944939, 0.764229462885737, 
0.758916671383124, 0.742887393474484,
   0.741362343479079, 0.703482475602795, 
0.70722192044612, 0.694934601341247,
   0.675623005679584, 0.67355293987199, 
0.67514195581405, 0.701338223542176,
   0.770084545123592, 0.82661391194, 
0.815331595124185, 0.801265437257298,
   0.768736104243487, 0.698903427959817, 
0.654393072393584, 0.646507677289504,
   0.606308031283892, 0.574521529688064, 
0.550931914275617, 0.518538683619987,
   0.495773346159491, 0.482784058725618, 
0.473031502762785, 0.462940836756943,
   0.455472910452526, 0.457374752189383, 
0.468449683385787, 0.469177346159405,
   0.47981744053419, 0.500517935694715, 
0.521161553352487, 0.538278248678118,
   0.545834896270532, 0.703482475602795, 
0.707475643569319, 0.695699528962731,
   0.695460540915422, 0.705063229294573, 
0.694190083263775, 0.676451221936696,
   0.66113162065, 0.627150885842318, 
0.592467979293877, 0.556197511727567,
   0.524883713023224, 0.484571801496662, 
0.427784904562, 0.370137413134906,
   0.331233866457343, 0.292181528642806, 
0.265504971226103, 0.239129968439056,
   0.21258454640671, 0.184521419432522, 
0.160633576032345, 0.135729972994914,
   0.115111431576686, 0.0933784744252792, 
0.0672765522562478, 0.0397992726679255,
   0.0118179662548541, NA, NA, NA, NA, 
0.703482475602795, 0.70791132542366,
   0.696508162877812, 0.672357035115831, 
0.679831378223931, 0.702075998432084,
   0.736057349706643, 0.759252979404642, 
0.739391321260192, 0.706608353324493,
   0.65348693474, 0.607986236497692, 
0.600942686427268, 0.602450590412635,
   0.594096281507138, 0.598414292518021, 
0.570859444977738, 0.50462737404968,
   0.441225469913529, 0.37010584373766, 
0.299554326292306

Re: [R] Help with continuous color plot

2014-09-24 Thread matt
No, I don't think so.  And I've wondered if I described the problem 
clearly, so I put together the following hack, which seems to be what I 
want :


#create a matrix to hold the values corresponding to various percentiles
vals <- matrix(0,32,21)
#for each row in the data, collect info on the distribution
for(i in 1:32){
  obs <- testdataset[i,]
  vals[i,] <- quantile(obs, probs=seq(0,1,0.05))
}

#pick the last observation to get a distribution of colors
cols <- sort(densCols(vals[32,]))

#set up a blank plot
matplot(vals, type="n", xlab = 'yrs', ylab = 'Ratio',
main = 'Projected ratios')

#plot confidence bands as polygons, ideally overlaying light to dark
for (i in 1:10){
  lines(vals[,i],col=cols[22-i])
  lines(vals[,22-i],col=cols[22-i])
  polygon(c(seq(1:32),rev(seq(1:32))),c(vals[,22-i],rev(vals[,i])),col=cols[22-i],border=NA)
}

#plot a line for the average case
lines(vals[,11],col="black")

If anyone can suggest a more efficient/effective/better/etc/etc way of 
doing this, I'd be grateful.  In a nutshell, I am trying to find a 
visually clean way of showing the output of a Monte Carlo analysis.
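
One possible tidier variant of the same idea, offered only as a sketch: it uses simulated data in place of testdataset, and the names sims, shades and n_bands are made up for illustration. It computes all percentiles in one apply() call and takes the shades from colorRampPalette() so the bands run from light at the tails to dark near the median.

```r
set.seed(1)

# Simulated stand-in for Monte Carlo output: 32 periods (rows) x 200 paths
sims <- apply(matrix(rnorm(32 * 200, sd = 0.03), nrow = 32), 2, cumsum) + 0.7

probs <- seq(0, 1, by = 0.05)
# One row per period, one column per percentile (32 x 21)
vals <- t(apply(sims, 1, quantile, probs = probs, na.rm = TRUE))

n_bands <- (length(probs) - 1) / 2            # 10 symmetric bands
shades  <- colorRampPalette(c("lightblue", "darkblue"))(n_bands)
x <- seq_len(nrow(vals))

# Blank plot, then bands drawn widest (lightest) first, narrowest last
matplot(x, vals, type = "n", xlab = "yrs", ylab = "Ratio",
        main = "Projected ratios")
for (i in seq_len(n_bands)) {
  lower <- vals[, i]
  upper <- vals[, length(probs) + 1 - i]
  polygon(c(x, rev(x)), c(upper, rev(lower)), col = shades[i], border = NA)
}

# Median path on top
lines(x, vals[, n_bands + 1], col = "black")
```

The na.rm = TRUE in quantile() is there because real simulation output may contain NA values; whether dropping them is appropriate depends on why they are missing.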


Thanks again for everyone's attention.
Matt


On 2014-09-24 14:14, Federico Lasa wrote:

Does this resemble what you're after?

library(reshape2)
tst <- melt(testdataset)
library(ggplot2)

ggplot(tst, aes(x=Var1, y=Var2, fill=value)) +
  geom_tile() +
  scale_fill_gradient2(low="white",
                       high="white",
                       mid=scales::muted("blue"),
                       midpoint=0.6148377)


Re: [R] plotly

2014-07-22 Thread Matt Sundquist
Hey Shane,

Sorry you're having trouble.

The quick start is here and walks through installation: https://plot.ly/r/.

A note. If you're on Windows, you'll need Rtools to install devtools:
http://cran.rstudio.com/bin/windows/Rtools/. As Sarah noted, Plotly isn't
on CRAN.

If you're having trouble, please let us know and we're happy to try and
help, or open an issue on GitHub: https://github.com/ropensci/plotly/issues.

M


On Mon, Jul 21, 2014 at 4:05 AM, Sarah Goslee sarah.gos...@gmail.com
wrote:

 Hi,

 On Monday, July 21, 2014, Shane Carey careys...@gmail.com wrote:

  Hey,
 
  What version of R is required to use the plotly library?
 
  I have R version 3.0.1 and it will not allow me to install the devtools
  package or the plotly package.
 
  I have googled and searched to see what version of R I should be running
  but could not find anything.


 It's always a good idea to upgrade to the current release of R before
 asking questions like that (3.1.1). But in general, packages on CRAN
 clearly state what version of R is needed, as in

 http://cran.r-project.org/web/packages/devtools/index.html
 devtools: Tools to make developing R code easier

 Collection of package development tools

 Version: 1.5
 Depends: R (≥ 3.0.2)

 For packages not on CRAN, like plotly, you may need to download the package
 and check the DESCRIPTION file.
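
 As a rough sketch of that check: a DESCRIPTION file can be read with read.dcf(). The file written below is a made-up stand-in (package name "examplepkg" is hypothetical); in practice desc_file would point at the DESCRIPTION inside the downloaded package.

```r
# Stand-in DESCRIPTION file, written to a temp file for illustration only
desc_file <- tempfile()
writeLines(c("Package: examplepkg",
             "Version: 0.1",
             "Depends: R (>= 3.0.2)"),
           desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file, fields = c("Package", "Depends"))
desc[1, "Depends"]

# Compare the stated requirement with the running version of R
meets_requirement <- getRversion() >= "3.0.2"
```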

 Sarah


 --
 Sarah Goslee
 http://www.stringpage.com
 http://www.sarahgoslee.com
 http://www.functionaldiversity.org






[R] Plotly and rOpenSci: R and ggplot2 interactive, online, collaborative plotting

2014-05-12 Thread Matt Sundquist
Hello R help,

My name is Matt, and I'm a co-founder at Plotly (http://plot.ly), an online
graphing and analytics project.

We're building an R library (http://plot.ly/r) as part of the rOpenSci
(http://ropensci.org) project. You can use it to make
interactive, web-based R and ggplot2 plots.
The plots are shareable, embeddable, and drawn with D3 (a JS graphing
library). You can also make R and ggplot2 plots into collaborative,
web-based plots. The project is still definitely in beta, so we'd
appreciate hearing your suggestions and issues.

Here is how the ggplot2 sharing works:

ropensci.org/blog/2014/04/17/plotly/

Another fun aspect of it is that you can collaboratively plot in R, Python,
MATLAB, and from our web app. That means you could work from R with a team
working from Excel and work on the same plots and data. And your data and
plots always stay together in your files. Here's how that looks in an
IPython Notebook:

http://nbviewer.ipython.org/gist/msund/61cdbd5b22c103fffb84

We'd love to hear your thoughts, feedback, and suggestions. Our goal is to
be a GitHub for sharing and collaborating on data and plots. We're on
GitHub (http://github.com/ropensci/plotly), and
eager to hear from you. Thanks so much for any and all help and advice.

All the best,
Matt



[R] Plotly Beta: Online Plotting with R

2014-01-24 Thread Matt Sundquist
Hi R Users,

My name is Matt, and I'm a part of Plotly (http://plot.ly). We recently
released an R plotting library (http://plot.ly/api/r) for making
publication-quality graphs online. We wanted to let the folks on this list
know.

A basic summary:

- Make publication-quality, online plots with a GUI and R.
- Fits, error bars, stats, and functions.
- Embed interactive graphs in an iframe (Washington Post example:
  http://washingtonpost.com/blogs/wonkblog/wp/2013/06/14/do-low-taxes-on-the-rich-leave-the-middle-class-with-lower-wages/).
- Collaborative, so you can edit with others, comment on your graphs, and
  save revision history.
- Free for public use, and you own your data (like GitHub).

If you're interested, here are a few examples you can check out:

- multiple axes scales with Old Faithful data:
  http://blog.plot.ly/post/69647810381/multiple-axes-scales-with-old-faithful-data
- an IPython notebook that has an Rmagic example:
  http://nbviewer.ipython.org/gist/fonnesbeck/8495259
- a post on r-bloggers:
  http://www.r-bloggers.com/plotly-beta-collaborative-plotting-with-r/

We'd love your feedback, advice, and thoughts. As a new project, expert
insights go a long way for us, so please let us know what you think.

Happy plotting,
Matt



[R] How to assign names to global data frames created in a function

2013-09-03 Thread Matt Strauser
I have several data frames containing similar data. I'd like to pass these
data frames to a function for processing. The function would create newly
named global data frames containing the processed data. I cannot figure
out how to assign names to the data frames in Step 1 or Step 2 in the
following example:

# sample function in pseudo code
processdf <- function(df, prefix) {
# df - data frame containing data for processing
# prefix - string to become the first part of the names of the resulting
#          data frames
# Step 1 - process df into several subsets
  df1 <- subset(df, df$cond1 & df$cond2 & ...)
  df2 <- subset(df, df$cond3 & df$cond4 & ...)
  df3 <- subset(df, df$cond5 & df$cond6 & ...)
# and so on for many more steps with resulting data frames

# Step 2 - rename the resulting global data frames
#   rename df1 to prefix + "cond1cond2"
#   rename df2 to prefix + "cond3cond4"
#   rename df3 to prefix + "cond5cond6"
# and so on for the remaining data frames
}

Example using data frames: frame1 and frame2:

processdf(frame1, "frame1")
# produces these data frames:
frame1cond1cond2
frame1cond3cond4
frame1cond5cond6

processdf(frame2, "frame2")
# produces these data frames:
frame2cond1cond2
frame2cond3cond4
frame2cond5cond6
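
One way this could be done, as a sketch rather than a definitive answer: build the subsets in a named list, then use assign() with paste0() to create the global objects. The logical columns cond1 to cond4 and the toy frame1 below are assumptions for illustration. The function also returns the list, since working with a named list is often easier than with many global objects.

```r
processdf <- function(df, prefix, envir = globalenv()) {
  # Step 1 - process df into several subsets, keyed by the name suffix
  subsets <- list(
    cond1cond2 = subset(df, cond1 & cond2),
    cond3cond4 = subset(df, cond3 & cond4)
  )
  # Step 2 - create one object per subset, named prefix + suffix
  for (suffix in names(subsets)) {
    assign(paste0(prefix, suffix), subsets[[suffix]], envir = envir)
  }
  invisible(subsets)
}

# Toy data frame (hypothetical columns)
frame1 <- data.frame(cond1 = c(TRUE, TRUE, FALSE),
                     cond2 = c(TRUE, FALSE, FALSE),
                     cond3 = c(FALSE, TRUE, TRUE),
                     cond4 = c(FALSE, TRUE, FALSE),
                     x = 1:3)

res <- processdf(frame1, "frame1")
# frame1cond1cond2 and frame1cond3cond4 now exist in the global environment
```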

Thank you for your thoughts,
Matt



[R] Computing Median for Subset of Data

2013-06-02 Thread Matt Stati
From my larger data set I created a subset of it by using: 

subset_1 <- subset(timeuse, IndepTrans = 1, Physical = 1)

where my larger data set is "timeuse" and the smaller subset is "subset_1". The 
subset was conditioned on IndepTrans equaling 1 in the data and Physical 
equaling 1 as well. I want to be able to compute the median of a variable 
first for the larger data set "timeuse" then for the subset file "subset_1". 
How do I identify to R which data set I'm wanting the median computed for? I've 
tried many possibilities but for some reason can't figure it out. 
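
A minimal sketch of one way to do this, using a toy stand-in for timeuse (the column "minutes" is a made-up name for the variable of interest). Note that it uses == joined with & inside subset(); with a single = as in the call above, the conditions may be taken as named arguments rather than comparisons, so the subset may not be filtered at all.

```r
# Toy stand-in for the real data (values are invented)
timeuse <- data.frame(IndepTrans = c(1, 1, 0, 1),
                      Physical   = c(1, 0, 1, 1),
                      minutes    = c(10, 20, 30, 60))

# Both conditions must hold: comparisons with ==, combined with &
subset_1 <- subset(timeuse, IndepTrans == 1 & Physical == 1)

# The data set is identified by naming it before the $
median_all <- median(timeuse$minutes, na.rm = TRUE)
median_sub <- median(subset_1$minutes, na.rm = TRUE)
```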

Thanks, 
Matt. 


[R] Help searching a matrix for only certain records

2013-03-03 Thread Matt Borkowski
Let me start by saying I am rather new to R and generally consider myself to be 
a novice programmer...so don't assume I know what I'm doing :)

I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year 
dataset of 15-minute data. However, I only need the rows where the column I've 
named REC.TYPE contains the string "SAO  " or "FL-15". 

My horribly inefficient solution was to search the matrix row by row, test the 
REC.TYPE column and essentially delete the row if it did not match my criteria. 
Essentially...

 j <- 1
 for (i in 1:nrow(dataset)) {
   if(dataset$REC.TYPE[j] != "SAO  " & dataset$REC.TYPE[j] != "FL-15") {
     dataset <- dataset[-j,]  }
   else {
     j <- j+1  }
 }

After watching my code get through only about 10% of the matrix in an hour and 
slowing with every row...I figure there must be a more efficient way of pulling 
out only the records I need...especially when I need to repeat this for another 
8 datasets. 

Can anyone point me in the right direction?

Thanks!

Matt



Re: [R] Help searching a matrix for only certain records

2013-03-03 Thread Matt Borkowski
Thank you for your response Jim! I will give this one a try! But a couple 
followup questions...

In my search for a solution, I had seen something stating match() is much more 
efficient than subset() and will cut down significantly on computing time. Is 
there any truth to that?

Also, I found the following solution which works for matching a single 
condition, but I couldn't quite figure out how to modify it to search for 
both my acceptable conditions...

 testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]
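
For the two-condition case, one option is %in%, sketched here on a made-up four-row data frame (the "FM-12" code and the value column are invented for illustration):

```r
# Toy data; "SAO  " carries two trailing spaces as in the real column
testdata <- data.frame(REC.TYPE = c("SAO  ", "FL-15", "FM-12", "SAO  "),
                       value = 1:4,
                       stringsAsFactors = FALSE)

# TRUE for every row whose REC.TYPE is one of the accepted codes
keep <- testdata$REC.TYPE %in% c("SAO  ", "FL-15")
filtered <- testdata[keep, , drop = FALSE]
```

The comparison is exact, so trailing spaces in the codes have to match those in the data; trimws() on the column is one way around that if the padding is inconsistent.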

-Matt




--- On Sun, 3/3/13, jim holtman jholt...@gmail.com wrote:

From: jim holtman jholt...@gmail.com
Subject: Re: [R] Help searching a matrix for only certain records
To: Matt Borkowski mathias1...@yahoo.com
Cc: r-help@r-project.org
Date: Sunday, March 3, 2013, 8:00 AM

Try this:

dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.




Re: [R] Help searching a matrix for only certain records

2013-03-03 Thread Matt Borkowski
I appreciate all the feedback on this. I ended up using this line to solve my 
problem, just because I stumbled upon it first...

 alldata <- alldata[alldata$REC.TYPE == "SAO  " | alldata$REC.TYPE == "FM-15",,drop=FALSE]

But I think Jim's solution would work equally well. I was a bit confused by 
the relative complexity of the data frames solution, as it seems like more 
steps than necessary.

Thanks again for the input!

-Matt




Again, thanks for the feedback!

--- On Sun, 3/3/13, arun smartpink...@yahoo.com wrote:

 From: arun smartpink...@yahoo.com
 Subject: Re: [R] Help searching a matrix for only certain records
 To: Matt Borkowski mathias1...@yahoo.com
 Cc: R help r-help@r-project.org, jim holtman jholt...@gmail.com
 Date: Sunday, March 3, 2013, 1:29 PM
 HI,
 You could also use ?data.table()

 n <- 300000
 set.seed(51)
 mat1 <- as.matrix(data.frame(REC.TYPE=
 sample(c("SAO","FAO","FL-1","FL-2","FL-15"),n,replace=TRUE),Col2=rnorm(n),Col3=runif(n),stringsAsFactors=FALSE))
 dat1 <- as.data.frame(mat1,stringsAsFactors=FALSE)
 table(mat1[,1])
 #
 #  FAO  FL-1 FL-15  FL-2   SAO
 #60046 60272 59669 59878 60135
 system.time(x1 <- subset(mat1, grepl("(SAO|FL-15)", mat1[, "REC.TYPE"])))
 #   user  system elapsed
 #  0.076   0.004   0.082
 system.time(x2 <- subset(mat1, mat1[, "REC.TYPE"] %in% c("SAO", "FL-15")))
 #   user  system elapsed
 #  0.028   0.000   0.030

 system.time(x3 <- mat1[match(mat1[, "REC.TYPE"], c("SAO", "FL-15"),
                              nomatch = 0) != 0, , drop = FALSE])
 #   user  system elapsed
 #  0.028   0.000   0.028
 table(x3[,1])
 #
 #FL-15   SAO
 #59669 60135


 library(data.table)

 dat2 <- data.table(dat1)
 system.time(x4 <- dat2[match(REC.TYPE,c("SAO", "FL-15"),nomatch=0)!=0,,drop=FALSE])
 #   user  system elapsed
 #  0.024   0.000   0.025
 table(x4$REC.TYPE)

 #FL-15   SAO
 #59669 60135
 A.K.
 
 
 
 
 
 
 
 
 - Original Message -
 From: jim holtman jholt...@gmail.com
 To: Matt Borkowski mathias1...@yahoo.com
 Cc: r-help@r-project.org
 r-help@r-project.org
 Sent: Sunday, March 3, 2013 11:52 AM
 Subject: Re: [R] Help searching a matrix for only certain
 records
 
 If you are using matrices, then here are several ways of doing it for
 size 300,000.  You can determine if the difference of 0.1 seconds is
 important in terms of the performance you are after.  It is taking you
 more time to type in the statements than it is taking them to execute:

 > n <- 300000
 > testdata <- matrix(
 +     sample(c("SAO ", "FL-15", "Other"), n, TRUE, prob = c(1,2,1000))
 +     , nrow = n
 +     , dimnames = list(NULL, "REC.TYPE")
 +     )
 > table(testdata[, "REC.TYPE"])

 FL-15  Other   SAO
    562 299151    287
 > system.time(x1 <- subset(testdata, grepl("(SAO |FL-15)", testdata[, "REC.TYPE"])))
    user  system elapsed
    0.17    0.00    0.17
 > system.time(x2 <- subset(testdata, testdata[, "REC.TYPE"] %in% c("SAO ", "FL-15")))
    user  system elapsed
    0.05    0.00    0.05
 > system.time(x3 <- testdata[match(testdata[, "REC.TYPE"]
 +                             , c("SAO ", "FL-15")
 +                             , nomatch = 0) != 0
 +                             ,, drop = FALSE]
 +             )
    user  system elapsed
    0.03    0.00    0.03
 > identical(x1, x2)
 [1] TRUE
 > identical(x2, x3)
 [1] TRUE
 
 
 
 On Sun, Mar 3, 2013 at 11:22 AM, Jim Holtman jholt...@gmail.com
 wrote:
  there are way more efficient ways of doing many of the operations, but
  you probably won't see any differences unless you have very large
  objects (several hundred thousand entries), or have to do it a lot of
  times.  My background is in computer performance and for the most part
  I have found that the easiest/most straightforward ways are fine most
  of the time.

  a more efficient way might be:

  testdata <- testdata[match(c('SAO ', 'FL-15'), testdata$REC.TYPE), ]

  you can always use 'system.time' to determine how long actions take.

  for multiple comparisons use %in%

  Sent from my iPad
 

Re: [R] Amelia algorithm

2013-01-12 Thread Matt Blackwell
Hi Martin,

I helped to develop Amelia, so I can try to take a shot. In a
non-mathematical way, Amelia works by filling in missing values with
imputed values that are consistent with the observed relationships in the
data, plus some random noise. Thus, Amelia creates multiple imputed
datasets that have no missingness (the original observed cells remain the
same across each imputation, but the filled-in values vary from imputed
dataset to dataset) and have the same relationships between and within
variables as the original observed data. The difficult part of the problem
is estimating the relationships of the observed data since it has all of
that missing data in it (the dataset looks like Swiss cheese). We use an EM
algorithm to estimate these relationships, but those details are (somewhat)
less important.
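
To make the general idea concrete, here is a toy sketch in base R. It is NOT Amelia's algorithm (no EM, no bootstrap): it only illustrates "fill in values consistent with the observed relationships, plus random noise" by using a simple regression on simulated data. All names and numbers are invented for illustration.

```r
set.seed(42)
n <- 100
x <- rnorm(n)
y <- 2 * x + rnorm(n, sd = 0.5)

# Punch holes in y
y_obs <- y
y_obs[sample(n, 20)] <- NA
miss <- is.na(y_obs)

# Estimate the observed relationship (lm drops the NA rows by default)
fit <- lm(y_obs ~ x)

# m imputed copies: observed cells unchanged, missing cells filled with
# the predicted value plus noise scaled by the residual standard error
m <- 5
imputations <- lapply(seq_len(m), function(k) {
  filled <- y_obs
  filled[miss] <- predict(fit, newdata = data.frame(x = x[miss])) +
    rnorm(sum(miss), sd = summary(fit)$sigma)
  filled
})
```

Each element of imputations is a complete version of y. The observed values are identical across copies and only the filled-in cells differ, which is the property described above.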

You can find more resources, including a number of papers describing the
methods at our webpage:

http://gking.harvard.edu/amelia

Hope that helps!

Cheers,
matt.

~~~
Matthew Blackwell
Assistant Professor of Political Science
University of Rochester
url: http://www.mattblackwell.org


On Mon, Jan 7, 2013 at 6:27 PM, zGreenfelder zgreenfel...@gmail.com wrote:

 On Mon, Jan 7, 2013 at 4:29 PM, Martin lh...@gmx.net wrote:
  Dear all.
 
  First of all, my English isn't very good, but I hope I can convey my
  concern.
  I've a general question about the Amelia algorithm. I'm no mathematician
  or statistician, but I had to use R and impute and analyse some data, and
  Amelia showed results that fitted my expectations. I'll have to defend my
  choice soon, but I haven't totally grasped what Amelia does. I'm
  particularly interested in as simple an explanation as possible of how
  Amelia imputation works. I've read that it uses a bootstrapping-based
  algorithm, but how does it choose the values?
  The data had mainly value 0 (chemical concentrations, water temperature
  and pH-value).
 
  Regards Martin

 I'm pretty new here, but a quick google search suggests that perhaps
 http://gking.harvard.edu/amelia (and maybe google translate)
 might have some decent pointers for you.

 I poked at the documentation from that site (a pdf file), and it's
 quite intense on the mathematics; you may get more from it than I could.
 there's also a link there to a separate, for this specific
 package/algorithm  (it seems to be called Amelia II, I'm assuming this
 is the same that you used, if I'm off .. sorry about that)
 HTH
 --
 Even the Magic 8 ball has an opinion on email clients: Outlook not so good.





[R] Oracle Approximating Shrinkage in R?

2012-12-08 Thread Matt Considine

Hi,
Can anyone point me to an implementation in R of the oracle 
approximating shrinkage technique for covariance matrices?  Rseek, 
Google, etc. aren't turning anything up for me.


Thanks in advance,
Matt Considine



Re: [R] Performing gage R&R study in R w/more than 2 factors

2012-11-20 Thread Matt Jacob
On Mon, Nov 19, 2012, at 16:31, Bert Gunter wrote:
 I believe that you need to consult a local statistician, as there are
 likely way too many statistical issues here that you do not fully
 understand. Alternatively, try posting to a statistical list like
 stats.stackexchange.com, as I think most of your issues are primarily
 statistical, not R related.

Yes, you are correct. I've actually been working with a statistician
within my organization, but the dilemma is that he's a stats guy who
knows Minitab, and I'm a software guy who's trying to deploy some tools
that are dependent on R. I've basically been trying to match up the
output of R with the output of Minitab to check my work.

Matt



Re: [R] Performing gage R&R study in R w/more than 2 factors

2012-11-20 Thread Matt Jacob
On Mon, Nov 19, 2012, at 18:26, David Winsemius wrote:
 My guess is that you do not understand the meaning of a "random
 factor". I certainly did not when I first encountered it. All my
 training had been with ordinary regression and analysis of variance.
 These are methods for what in mixed models are "fixed effects". My
 opinion is that these terms are completely confusing to the new
 student of this sort of analysis.

You're absolutely right---the distinction of fixed vs. random factors is
confusing. However, I was under the impression that all factors in a
gage R&R study were random, since we're trying to determine the sources
of variability on the system. 

 My guess is that you may just want the output of:
 
 lm( vals ~ f1 * f2 * f3, data = yourdat)

I'm trying to get the variance component estimates, and from there, I
can calculate the percent tolerance and other interesting statistics. It
doesn't look like lm gives me that information, though. FWIW, your
formula is the same as what I'm feeding into aov, and the ANOVA table
output *does* match up with what Minitab is producing.
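
For what it's worth, here is a rough sketch of how variance components can be backed out of an aov table by the method of moments. It covers only a balanced two-factor crossed design with both factors random (parts and operators), on simulated data; the names and numbers are assumptions for illustration, and the expected-mean-square formulas would need extending for three or more factors. I have not checked it against Minitab.

```r
set.seed(7)
parts <- 10; operators <- 3; reps <- 2

# Balanced crossed design: every operator measures every part 'reps' times
d <- expand.grid(part = factor(1:parts),
                 operator = factor(1:operators),
                 rep = 1:reps)
part_eff <- rnorm(parts, sd = 2)
oper_eff <- rnorm(operators, sd = 0.5)
d$vals <- 10 + part_eff[d$part] + oper_eff[d$operator] +
  rnorm(nrow(d), sd = 0.3)

# Mean squares from the ordinary ANOVA table
tab <- summary(aov(vals ~ part * operator, data = d))[[1]]
ms <- setNames(tab[["Mean Sq"]], trimws(rownames(tab)))

# Method-of-moments estimates from the expected mean squares
vc <- c(
  part            = (ms[["part"]] - ms[["part:operator"]]) / (operators * reps),
  operator        = (ms[["operator"]] - ms[["part:operator"]]) / (parts * reps),
  "part:operator" = (ms[["part:operator"]] - ms[["Residuals"]]) / reps,
  repeatability   = ms[["Residuals"]]
)
vc <- pmax(vc, 0)  # negative estimates are truncated at zero
```

The truncation at zero is a common convention for negative moment estimates. Differences from REML estimates (e.g. from lmer) are to be expected, particularly when a component is estimated as zero.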

Matt



[R] Performing gage R&R study in R w/more than 2 factors

2012-11-19 Thread Matt Jacob
Hi everyone,

I'm fairly new to R, and I don't have a background in statistics, so
please bear with me. ;-)

I'm dealing with 2^k factorial designs, and I was just wondering if
there's any way to analyze more than two factors of a gage R&R study in
R. For example, Minitab has an expanded gage R&R function that lets
you include up to eight additional factors besides the usual two that
are present in gage studies (parts and operators). If I wanted to
include n additional random factors, is there a package or built-in
functionality that will allow me to do that?

I've been experimenting with the SixSigma package, and that has a ss.rr
method which works great---as long as your experiment only contains two
factors. I've also been using lmer from lme4 to fit a linear model of my
experiment, but the standard deviations generated by lmer don't match
what I'm seeing in Minitab. Since all my factors are random, the formula
I'm using looks like this:

vals ~ 1 + (1|f1) + (1|f2) + (1|f3) + (1|f1:f2) + (1|f1:f3) + (1|f2:f3)

What am I doing wrong, and how can I fix it?

Thanks,

Matt



Re: [R] Opening SAS file using read.sas7bdat() function in sas7bdat library.

2012-10-30 Thread Matt Shotwell
Thanks for the helpful comments from others.

The KNOWNHOST variable lists the types of file that are known to work
with the read.sas7bdat function. It's likely that most files written on
Windows platforms will work, even if not listed in KNOWNHOST. If you're
feeling experimental, you might just comment out the lines that test against
the KNOWNHOST list.

Unfortunately, it appears that the file formatting depends on the system
where it was originally written. The hypothesis is that sas7bdat files
were originally no more than a memory dump of a C structure, or similar.
Because C structures may be laid out differently by different compilers
(i.e., on different platforms), this may have led to the difficulty
apparent here.

Regards,
Matt



Re: [R] Any good R server-with connection examples

2012-10-23 Thread Matt Shotwell
> I want to connect R with HTML/PHP pages to take input from the user, do
> some statistical processing on it, and show the results on the HTML page
> again. I searched the net and found the Rserve package, but the examples
> are mainly for the Java language, not for PHP.
> I am wondering how to connect it to PHP-Apache-MySQL.
> Is there any good tutorial/video which will tell me how to do that?
> At least tell me a logical way to use it?

Check out http://rapache.net/

rApache connects R and the Apache 2 web server, such that R can act as a
server-side scripting language, like PHP. This may be the easiest way,
using R, to take user input from the web browser.

The site has some decent documentation and links to examples.

--Matt



[R] unable to run spatial lag and error models on large data

2012-07-24 Thread shish matt
Hi:
First, my apologies for cross-posting. A few days back I posted my queries at
R-sig-geo but did not get any response. Hence this post.

I am working on two parcel-level housing datasets to estimate the impact of
various variables on home sale prices.

I created the spatial weights matrices in ArcGIS 10, using the sale
year of the four nearest houses to assign weights. Next, I ran LM tests and
then ran the spatial lag and error models using the spdep package.


I run into five issues. 


Issue 1: When I weight the 10,000-observation first dataset, I get the 
following message: Non-symmetric neighbors list.   

Is this going to pose problems while running the regression models? If yes, 
what can I do? 


The code and the results are: 
test1.csv <- read.csv("C:/Article/Housing1/NHspwt.csv")

class(test1.csv) <- c("spatial.neighbour", class(test1.csv))
of <- ordered(test1.csv$OID)
attr(test1.csv, "region.id") <- levels(of)
test1.csv$OID <- as.integer(of)
test1.csv$NID <- as.integer(ordered(test1.csv$NID))
attr(test1.csv, "n") <- length(unique(test1.csv$OID))

lw_test1.csv <- sn2listw(test1.csv)
lw_test1.csv$style <- "W"
lw_test1.csv

Characteristics of weights list object: 
Neighbour list object: 
Number of regions: 10740 
Number of nonzero links: 42960 
Percentage nonzero weights: 0.03724395 
Average number of links: 4 
Non-symmetric neighbours list 

Weights style: W 
Weights constants summary: 
      n        nn    S0       S1       S2 
W 10740 115347600 10740 3129.831 44853.33 


Issue 2: The spatial lag and error models do not run. I get
the following message (the models run on half the data, approx. 5,000
observations; however, I would like to use the entire sample).

Error: cannot allocate vector of size 880.0 Mb 
In addition: Warning messages: 
1: In t.default(object) : 
  Reached total allocation of 3004Mb: see help(memory.size) 
2: In t.default(object) : 
  Reached total allocation of 3004Mb: see help(memory.size) 
3: In t.default(object) : 
  Reached total allocation of 3004Mb: see help(memory.size) 
4: In t.default(object) : 
  Reached total allocation of 3004Mb: see help(memory.size) 

The code for the lag model is: 
fmtypecurrentcombinedlag <- lagsarlm(fmtypecurrentcombined,
data = spssnew, lw_test1.csv, na.action=na.fail, type="lag",
method="eigen", quiet=TRUE, zero.policy=TRUE, interval = NULL,
tol.solve=1.0e-20)

I am able to read the data file using the filehash package.
However, I still get the following error message when I run the models:
Error in matrix(0, nrow = n, ncol = n) : too many elements specified


Issue 3: For the second dataset that contains approx. 
100,000 observations, I get the following error message when I try to 
run spatial lag or error models. 
Error in matrix(0, nrow = n, ncol = n) : too many elements specified 

The code is: 
fecurrentcombinedlag <- lagsarlm(fecurrentcombined, data =
spssall, lw_test2.csv, na.action=na.fail, type="lag", method="eigen",
quiet=NULL, zero.policy=TRUE, interval = NULL, tol.solve=1.0e-20)
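
For scale, here is a rough sketch of the arithmetic behind both error messages. It assumes (the post does not confirm this) that the failing step allocates a dense n x n matrix of 8-byte doubles, as the matrix(0, nrow = n, ncol = n) call in the error text suggests. The helper name `dense_mib` is made up for illustration.

```r
# Assumption: a dense n x n matrix of 8-byte doubles is being allocated.
dense_mib <- function(n) n^2 * 8 / 2^20   # size in MiB (R labels these "Mb")

n1 <- 10740    # number of regions printed for the first weights list
n2 <- 100000   # approximate size of the second dataset

dense_mib(n1)                 # about 880, in line with the 880.0 Mb in the error
n2^2 > .Machine$integer.max   # TRUE: more than 2^31 - 1 elements
```

Under that assumption, the first dataset needs one 880 Mb block on top of everything else already held within the 3004 Mb limit, and the second asks for more elements than a single R matrix could index at the time.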


Issue 5: When I run LM tests I get the test results but with
the following message: "Spatial weights matrix not row standardized."
Should I be worried about this considering that I am using the
4-nearest neighbor rule?

The code is:
lm.LMtests(fmtypecurrent, lw_test1.csv, test=c("LMerr", "LMlag", "RLMerr",
"RLMlag", "SARMA"))

Thanks 
Shishm 


[R] Data from Stock and Watson or DAgostino papers?

2012-07-16 Thread Matt Considine

Hello,
I am interested in looking at the dataset used by Stock and Watson in
their "Macroeconomic Forecasting Using Diffusion Indexes" (J. of Business
and Econ. Statistics, April 2002, pp158-161) or the set used by
D'Agostino and Giannone, "Comparing Alternative Predictors [...]" (October
2006), in R.


Does anyone know if the R-code to retrieve these series from FRED (as 
opposed to McGraw-Hill) is out in the wild anywhere?  Before doing the 
mapping from the papers to the St. Louis database and then doing the 
coding, I thought I would ask if anyone has already gone down that road 
or would know where else I could search for this answer (and - yes - I 
have tried Google ...)


Thanks in advance,
Matt



[R] Does aov produce one-sided or two-sided p-values?

2012-05-21 Thread Matt Pickard
Hi -

Hopefully this is an easy question.  In SPSS, when I'm testing a
directional hypothesis using an ANOVA (GLM), I can divide the p-value by 2
because SPSS reports two-sided p-values.  Is this approach still legit
when I'm using aov in R?

Thanks,

Matt



Re: [R] Random forests prediction

2012-05-14 Thread matt
But shouldn't it be resolved when I set mtry to the maximum number of
variables? 
Then the model explores all the variables for the next step, so it will
still be able to find the better ones? And then in the later steps it could
use the (less important) variables.

Matthijs

--
View this message in context: 
http://r.789695.n4.nabble.com/Random-forests-prediction-tp4627409p4629944.html
Sent from the R help mailing list archive at Nabble.com.



[R] Random forests prediction

2012-05-11 Thread matt
Hi all,

I have a strange problem when applying RF in R. 
I have a set of variables with which I obtain an AUC of 0.67.

I do have a second set of variables that have an AUC of 0.57. 

When I merge the first and second set of variables, the AUC becomes 0.64. 

I would expect the prediction to become better as I add variables that do
have some predictive power?
This is even more strange as the AUC on the training set increased when I
added more variables (while the AUC of the validation set thus decreased).

Is there anyone who has experienced the same and/or who knows what could be
the reason?

Thanks,

Matthijs

--
View this message in context: 
http://r.789695.n4.nabble.com/Random-forests-prediction-tp4627409.html
Sent from the R help mailing list archive at Nabble.com.



[R] reception of (Vegan) envfit analysis by manuscript reviewers

2012-05-09 Thread Matt Bakker
I'm getting lots of grief from reviewers about figures generated with
the envfit function in the Vegan package. Has anyone else struggled to
effectively explain this analysis? If so, can you share any helpful
tips?

The most recent comment I've gotten back: "What this shows is which
NMDS axis separates the communities, not the relationship between the
edaphic factor and the Bray-Curtis distance."

Thanks for any suggestions!


Matt



[R] Gwet's AC1

2012-05-07 Thread Matt Stati
R has functions for computing kappa, Fleiss's kappa, etc., but can it compute
Gwet's AC1? 


Thanks, 

Matt. 


Re: [R] Nested brew call yields Error in .brew.cat(26, 28) : unused argument(s) (26, 28)

2012-03-29 Thread Matt Shotwell
On Wed, 2012-03-28 at 11:40 +0100, Chris Beeley wrote:
> I am writing several webpages using the brew package and R2HTML. I would
> like to work off one script so I am using nested brew calls. The
> documentation for brew states that:
>
> NOTE: brew calls can be nested and rely on placing a function named
> ’.brew.cat’ in the environment in which it is passed. Each time brew is
> called, a check for the existence of this function is made. If it
> exists, then it is replaced with a new copy that is lexically scoped to
> the current brew frame. Once the brew call is done, the function is
> replaced with the previous function. The function is finally removed from
> the environment once all brew calls return.
>
> I'm afraid I can't quite figure out what it is I'm supposed to do here.
> I've tried loading the brew library within the script which I pass to
> brew, and I've tried defining brew cat like this:

The paragraph above describes what brew is doing behind the scenes. It's
not necessary to modify or set the .brew.cat function.

A nested (or recursive) brew call occurs when brew() is called from a
document currently being processed by brew().

To illustrate further, suppose there are two brew documents,
example-1.brew and example-2.brew, where example-1.brew contains the
following text (delimited by '''):

'''
This text is in example-1.brew.
<%= brew::brew("example-2.brew") %>
'''

and the example-2.brew contains

'''
This text is in example-2.brew.
<%= date() -%>
'''

Then from the R prompt we have:

R> brew::brew("example-1.brew")
This text is in example-1.brew.
This text is in example-2.brew.
Thu Mar 29 20:24:52 2012

> .brew.cat=function(){}
>
> This generates the following error message:
>
> Error in .brew.cat(26, 28) : unused argument(s) (26, 28)
>
> I think perhaps it is more likely that I need to insert into the script
> the actual content of .brew.cat, but I can't seem to get R to tell me
> what it is and Googling throws up a lot of stuff about beer and not much
> else (drew a blank also from RSiteSearch("Nested brew"))
>
> Any help gratefully received.
>
> Chris Beeley
> Institute of Mental Health, UK
 

-- 
Matthew S. Shotwell
Assistant Professor, Department of Biostatistics
School of Medicine, Vanderbilt University
1161 21st Ave. S2323 MCN Office CC2102L
Nashville, TN 37232-2158



[R] Help with Matrix code optimization

2012-02-23 Thread Matt Shotwell
The chol and solve methods for dpoMatrix (Matrix package) are much
faster than the default methods. But, the time required to coerce a
regular matrix to dpoMatrix swamps the advantage.

Hence, I have the following problem, where use of dpoMatrix is worse
than a regular matrix.

library(Matrix)

x <- diag(10)

system.time(
  for(r in seq(0.1, 0.9, length.out=1000)) {
    m <- r^abs(row(x)-col(x));
    chol(m); solve(m);
  })

system.time(
  for(r in seq(0.1, 0.9, length.out=1000)) {
    M <- as(r^abs(row(x)-col(x)), 'dpoMatrix')
    chol(M); solve(M);
  })

Any ideas?



[R] repeating or looping within an apply statement to handle multiple variables

2012-02-19 Thread Matt Spitzer
Dear R experts,
I would like to please ask for your help with repeating steps in an apply
statement.
I have a dataframe that lists multiple variables for a given id and visit,
as well as drug treatment.

> head(exp)
  id visit variable1 variable2 variable3 variable4 drug
1  3     1        13        10         7        11    0
2  3     5        10        15         9         9    0
3  3    12         9        10         8         8    0
4  7     1        12         8         9         8    1
5  7     5        16         9         3        10    1
6  7    12         5        11         9        14    1
I would like to process these variables to find the difference between visit 5
and 1 for each id, then summarize this data in terms of means and errors.
 Thus far, with your brilliant advice to employ do.call and lapply, I have
been able to process one variable at a time, but I would much prefer to
loop or repeat the process for each variable in order to create an
efficiently stored set of data.  I would like to get a data set such as:

> exp1
     id  variable drug d5.3
3     3 variable1    0   -3
7     7 variable1    1    4
13   13 variable1    0   -5
56   56 variable1    0    4
78   78 variable1    0    7
109 109 variable1    0   -3
145 145 variable1    0   -2
173 173 variable1    0    9
212 212 variable1    1   -7
3     3 variable2    ?    ?
7     7 variable2    ?    ?
13   13 variable2    ?    ?
56   56 variable2    ?    ?
78   78 variable2    ?    ?
109 109 variable2    ?    ?
145 145 variable2    ?    ?
173 173 variable2    ?    ?
212 212 variable2    ?    ?
3     3 variable3    ?    ?
etc...

> exp2
   variable difference gel mean       sd n       se     X95ci    mean.sd
0 variable1       d5.1   0  1.0 5.567764 7 2.104417  5.149323  0.1796053
1 variable1       d5.1   1 -1.5 7.778175 2 5.500000 69.884126 -0.1928473
      se.sd  X95ci.sd
0 0.3779645 0.9248457
1 0.7071068 8.9846435

But, I have only been able to get the data for the first variable, despite
having attempted loop statements, ie (for i in
c('variable1','variable2','variable3','variable4')), for the variable
names.  Would you please have any thoughts about how to repeat lapply
across many column variables?  I greatly appreciate your thoughts.  I have
supplied the code for the example and my work thus far below:

exp <- data.frame(id = rep(c(3,7,13,56,78,109,145,173,212), each = 3)
, visit = rep(c(1,5,12), times = 9)
, variable1 = round(rnorm(mean = 10, sd = 3, n = 27), 0)
, variable2 = round(rnorm(mean = 10, sd = 3, n = 27), 0)
, variable3 = round(rnorm(mean = 10, sd = 3, n = 27), 0)
, variable4 = round(rnorm(mean = 10, sd = 3, n = 27), 0)
, drug = rep(round(rnorm(mean = 0.5, sd = 0.1, n = 9), 0), each = 3))

# set two values to missing (the post has $variable, which matches no
# column; variable1 is assumed here)
exp[exp[,'visit'] == 1 & exp[,'id'] == 3, ]$variable1 <- NA
exp[exp[,'visit'] == 5 & exp[,'id'] == 56, ]$variable1 <- NA

# difference between visit 5 and visit 1 for each id
exp1 <- do.call(rbind
, lapply(split(exp, exp$id), function(.grp) {
data.frame('id' = .grp$id[1L], 'variable' = 'variable1', 'drug' = .grp$drug[1L],
'd5-3' = .grp[.grp[['visit']] == 5, ]$variable1 - .grp[.grp[['visit']] == 1, ]$variable1)
}))

# summary statistics of the differences within each drug group
exp2 <- do.call(rbind
, lapply(split(exp1, exp1$drug), function(.grp) {
a <- na.omit(.grp$d5.3)
data.frame('variable' = 'variable1',
'difference' = 'd5.1',
'gel' = .grp$drug[1L],
'mean' = mean(a),
'sd' = sd(a),
'n' = length(a),
'se' = sd(a)/sqrt(length(a)),
'95ci' = qt(0.975, (length(a)-1)) * sd(a)/sqrt(length(a)),
'mean/sd' = mean(a)/sd(a),
'se/sd' = (sd(a)/sqrt(length(a)))/sd(a),
'95ci/sd' = (qt(0.975, (length(a)-1)) * sd(a)/sqrt(length(a)))/sd(a)
)}
)
)
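
One way to repeat the per-variable step, sketched under assumptions: `dat` is a cut-down toy version of the data above, `diff_one` and `long` are made-up names, and the difference column is called d5.1 here. The idea is to wrap the single-variable split/lapply in a function of the column name, then lapply over the names and bind the pieces once.

```r
# Toy data in the same shape as the post (values are simulated)
set.seed(42)
dat <- data.frame(
  id        = rep(c(3, 7, 13), each = 3),
  visit     = rep(c(1, 5, 12), times = 3),
  variable1 = round(rnorm(9, mean = 10, sd = 3)),
  variable2 = round(rnorm(9, mean = 10, sd = 3)),
  drug      = rep(c(0, 1, 0), each = 3)
)
vars <- c("variable1", "variable2")

# Visit 5 minus visit 1 for every id, for the column named in v
diff_one <- function(v) {
  do.call(rbind, lapply(split(dat, dat$id), function(.grp) {
    data.frame(
      id       = .grp$id[1L],
      variable = v,
      drug     = .grp$drug[1L],
      d5.1     = .grp[[v]][.grp$visit == 5] - .grp[[v]][.grp$visit == 1],
      stringsAsFactors = FALSE
    )
  }))
}

# One stacked data frame: one block of rows per variable
long <- do.call(rbind, lapply(vars, diff_one))
long
```

The same pattern should carry over to the summary step by splitting the stacked result on both `variable` and `drug`.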


Thanks again for your help, Matt



[R] how to rbind matrices from different loops

2012-02-16 Thread Matt Spitzer
Dear R experts,
I am having difficulty using loops productively and would like to please
ask for advice.  I have a dataframe of ids and groups.  I would like to
break down the dataframe into groups, find the unique sets of ids, then
reassemble.  My thought was to use a loop, but I have been unable to finish
this loop in a logical way.  I would like to find the unique ids for group
1, group 2, etc., and rbind these back together.  However, I am unclear how
to do this because in other attempts my final product is always a part of
the last run loop.  My way of working around this has been to write.csv and
then re read at the end, which is so clumsy.  Previously, I have used a
store matrix for individual cells. 1. Is there a better way to approach
this?  2. How can I combine parts of matrices to other parts created in
prior loops?
I have created a primitive example below.  Each of the groups varies in
number, so my repetitive example below is not accurate.  In my real data,
ids repeat often within groups.
Thank you so much, Matt

example <- data.frame(id=rep(
( abs(round(rnorm(50,mean=500,sd=250),digits=0)))
,3), group=rep(1:15,10))
example <- example[with(example,order(id,group)),]

for (i in 1:15) {
ai <- example[example[,2]==i,][!duplicated ( example[example[,2]==i,][,1]
),]
write.csv(ai, paste('a',i,'.csv',sep=''))
}
b1 <- read.csv('a1.csv')
b2 <- read.csv('a2.csv')
b3 <- read.csv('a3.csv')
b4 <- read.csv('a4.csv')
b5 <- read.csv('a5.csv')
b6 <- read.csv('a6.csv')
b7 <- read.csv('a7.csv')
b8 <- read.csv('a8.csv')
b9 <- read.csv('a9.csv')
b10 <- read.csv('a10.csv')
b11 <- read.csv('a11.csv')
b12 <- read.csv('a12.csv')
b13 <- read.csv('a13.csv')

unis2 <-
rbind(rbind(rbind(rbind(rbind(rbind(rbind(rbind(rbind(rbind(rbind(rbind
(b1,b2),b3),b4),b5),b6),b7),b8),b9),b10),b11),b12),b13)
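
The write.csv/read.csv round trip can usually be avoided by keeping each group's piece in a list and binding once at the end. Below is a minimal sketch on toy data; the object names `pieces`, `unis`, and `unis_direct` are made up for illustration.

```r
# Toy data: ids repeat within groups
set.seed(1)
example <- data.frame(id    = sample(1:20, 60, replace = TRUE),
                      group = rep(1:3, 20))

# One piece per group, each with duplicated ids dropped
pieces <- lapply(split(example, example$group),
                 function(g) g[!duplicated(g$id), ])

# Bind all pieces in one call instead of nesting rbind()
unis <- do.call(rbind, pieces)

# The same set of rows without any loop: drop duplicated (id, group) pairs
unis_direct <- example[!duplicated(example[c("id", "group")]), ]
```

Inside a for loop, the equivalent is to assign each result to an element of a list created before the loop, so that later iterations do not overwrite earlier ones.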



Re: [R] function restrictedparts

2012-01-25 Thread Matt Shotwell
That's because the number of partitions of 281 items of order 10 is
quite large:

R> library('partitions')
R> R(10,281)
[1] 1218681472

Without thinking about this too hard, the result of
restrictedparts(281,10) should require around

R> 1218681472 * 10 * 4 / 10^9
[1] 48.74726

gigabytes of storage space (because the result is a 1218681472 x 10
array of 4 byte integers).
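
A count like this can be cross-checked in base R before attempting the enumeration. The following is only a sketch: `count_restricted` is a hypothetical helper, not part of the partitions package. It relies on the fact that partitions of n into at most k parts are equinumerous with partitions of n into parts of size at most k.

```r
# Number of partitions of n into at most k parts, via the standard
# "parts of size <= k" dynamic programme. Counts are held as doubles,
# which are exact for integers up to 2^53.
count_restricted <- function(n, k) {
  ways <- c(1, rep(0, n))            # ways[s + 1] = partitions of s so far
  for (part in seq_len(min(k, n))) {
    for (s in part:n) {
      ways[s + 1] <- ways[s + 1] + ways[s - part + 1]
    }
  }
  ways[n + 1]
}

count_restricted(5, 2)   # 3: 5, 4+1, 3+2
count_restricted(6, 3)   # 7

# Rough storage estimate in gigabytes for an integer result with k rows
count_restricted(281, 10) * 10 * 4 / 10^9
```

The last line should be comparable with the figure computed above from R(10,281).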

Because the number of partitions grows 'explosively' with the number of
items, this is a serious obstacle for statistical partitioning and
clustering methods. For more discouragement, see the 'Bell number'.

You can enumerate these restricted partitions one by one; see

R> ?partitions::nextpart

Matt

On Wed, 2012-01-25 at 15:11 +, yan jiao wrote:
> I am using function restrictedparts, but got error:
>
>
> restrictedparts(281,10)
> Error in integer(len) : vector size specified is too large
> Calls: restrictedparts -> integer
> In addition: Warning message:
> In restrictedparts(281, 10) : NAs introduced by coercion
> Error in integer(len) : vector size specified is too large
> Calls: restrictedparts -> integer
>
>
> is there a similar function can deal with long vector?
>
> I'm using R version 2.14.1 (2011-12-22),x86_64, linux-gnu
>
> many thanks
>
> yan
 



Re: [R] Bayesian data analysis recommendations

2012-01-20 Thread Matt Shotwell
On Thu, 2012-01-19 at 19:23 -0500, C W wrote:
> Thanks, Rich, I will look at the book.
>
> I agree, there are many nice packages, but what if the package changes in a
> few years?  I would have no idea what is going on!  I've heard
> from predecessor in the industry who emphasize the learning, not just plug
> and chug.
>
> I really want to learn the material and understand it, above all, it is
> interesting.
>
> I am looking more towards Bayesian statistics or Bayesian inference.  I am
> in statistics graduate school, though not my field, the biology application
> could help in the understand I suppose?

This list (r-help) may not be the best place to look for advice on this.
But here is some anyway :)

For a well-rounded introduction, I recommend Robert's 'The Bayesian
Choice'. This is a great foundation for Bayesians who intend to defend
their positions on statistical inference. For a more practical approach,
Gelman, Carlin, Stern, and Rubin's book 'Bayesian Data Analysis' has
been very popular (THE most popular, according to some). Regarding the
software tools for Bayesian data analysis, the most mature _and_ active
_and_ best integrated with the R project is Martyn Plummer's JAGS (See
also the R package rjags, by the same author). Another tool that I'm
planning to check out is PyMC: http://code.google.com/p/pymc/

Best,
Matt

 On Thu, Jan 19, 2012 at 7:07 PM, Rich Shepard rshep...@appl-ecosys.com
 wrote:
  On Thu, 19 Jan 2012, C W wrote:
 
  I am trying to learn Bayesian inference and Bayesian data analysis, I am
  new in the field.  Would any experts on the list recommend any good sites
  or materials for beginners?
 
  My approach is to learn and understand the theory first, then program
  on my own using R, though I see there are already packages.
 
 
   I'm far from an expert, but why not avoid re-inventing the wheel while
 you
  learn? Buy and read Jim Albert's Bayesian Computation with R.
 
   If you're a population ecologist (or willing to extend presented examples
  and ideas to communities and ecosystems), Ben Bolker's Ecological Models
  and Data in R explains when Bayesian and frequentist approaches each have
  advantages over the other.
 
  Rich
 
 



[R] Reading MINE output into a matrix

2012-01-15 Thread Matt Considine
I've benefited from this list with input on how to build up a 
symmetrical matrix.  The purpose of that query was to work with the 
output from the MINE routine posted at www.exploredata.net


To the extent it helps others, here is the script that I was working on
and which turns a given MINE output column (in the case below, the third
column corresponding to MIC) into a matrix.


Hope it helps,
Matt

#needed for MINE routine
require(rJava)
#load market data
require(PortfolioAnalytics)
data(indexes)
#write CSV file of data to current working directory
datafilename <- "indexes.csv"
write.table(indexes, datafilename, sep=",", col.names=TRUE,
row.names=FALSE, quote=FALSE, na="NA")
#read MINE R code
source.with.encoding('MINE.r', encoding='UTF-8')
pairs_method <- "all.pairs"
max_num_boxes_exponent <- 0.6
num_clumps_factor <- 15
#run MINE routine on data
MINE(datafilename,style=pairs_method,
 max.num.boxes.exponent=max_num_boxes_exponent,
 num.clumps.factor=num_clumps_factor)
#read output of MINE routine
#data is sorted in descending order of MIC variable
#output is half of a square symmetric matrix, excluding diagonal
#there are 9 columns, 7 of which are various stats
#calc of outputfilename could be better handled ...
#  kludge included to deal with filename generated on Windows
outputfilename <- sprintf("%s,%s,cv=0.0,B=n^%g,Results.csv",datafilename,
  sub(".","",pairs_method,fixed=TRUE),
  max_num_boxes_exponent)
x <- read.csv(outputfilename,header=TRUE)
#isolate row/col frequencies as a matrix.  we need to look at
# both to get the complete list of pairs and their respective frequencies
xtable <- table(x$X.var)
ytable <- table(x$Y.var)
#map frequencies of X & Y vars to rows
xmap <- xtable[x$X.var]
ymap <- ytable[x$Y.var]
finalmap <- order(xmap,-ymap,decreasing=TRUE)
#fill in matrix - we want the third column for MIC
z <- diag(length(levels(x$X.var))+1)
z[row(z) > col(z)] <- x[finalmap,3]
z <- z+t(z)
diag(z) <- 1
#determine and set row/column names
varnames <- c(names(sort(xtable,decreasing=TRUE)),names(sort(ytable,decreasing=TRUE))[1])
rownames(z) <- varnames
colnames(z) <- varnames
z



Re: [R] Help creating a symmetric matrix?

2011-12-24 Thread Matt Considine

Thank you all for your help and best wishes for the holiday season.
Matt Considine

On 12/24/2011 8:38 AM, William Revelle wrote:

Dear Matt, Sarah and Rui,

To answer the original question for creating a symmetric matrix



v <- c(0.33740, 0.26657, 0.23388, 0.23122, 0.21476, 0.20829, 0.20486,
0.19439, 0.19237,
0.18633, 0.17298, 0.17174, 0.16822, 0.16480, 0.15027)


z <- diag(6)
z[row(z) > col(z)] <- v
z <- z + t(z)
diag(z) <- 0


z

        [,1]    [,2]    [,3]    [,4]    [,5]    [,6]
[1,] 0.00000 0.33740 0.26657 0.23388 0.23122 0.21476
[2,] 0.33740 0.00000 0.20829 0.20486 0.19439 0.19237
[3,] 0.26657 0.20829 0.00000 0.18633 0.17298 0.17174
[4,] 0.23388 0.20486 0.18633 0.00000 0.16822 0.16480
[5,] 0.23122 0.19439 0.17298 0.16822 0.00000 0.15027
[6,] 0.21476 0.19237 0.17174 0.16480 0.15027 0.00000


Bill


On Dec 24, 2011, at 6:04 AM, Sarah Goslee wrote:


Or the slightly shorter:

z <- diag(6)
z[row(z) > col(z)] <- v

which is what lower.tri() does,

and
z <- diag(6)
z[lower.tri(z)] <- v

also works.

Sarah

On Fri, Dec 23, 2011 at 9:31 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

Matt Considine wrote

Hi,
I am trying to work with the output of the MINE analysis routine found at
http://www.exploredata.net

Specifically, I am trying to read the results into a matrix (ideally an
n x n x 6 matrix, but I'll settle right now for getting one column into
a matrix.)

The problem I have is not knowing how to take what amounts to being one
half of a symmetric matrix - excluding the diagonal - and getting it
into a matrix.  I have tried using lower.tri as found here
https://stat.ethz.ch/pipermail/r-help/2008-September/174516.html
but it appears to only partially fill in the matrix.  My code and an
example of the output is below.  Can anyone point me to an example that
shows how to create a matrix with this sort of input?

Thank you in advance,
Matt

#v <- newx[,3]
#or, for the sake of this example
v <- c(0.33740, 0.26657, 0.23388, 0.23122, 0.21476, 0.20829, 0.20486,
0.19439, 0.19237,
0.18633, 0.17298, 0.17174, 0.16822, 0.16480, 0.15027)
z <- diag(6)
ind <- lower.tri(z)
z[ind] <- t(v)[ind]

z
        [,1]    [,2] [,3] [,4] [,5] [,6]
[1,] 1.00000 0.00000    0    0    0    0
[2,] 0.26657 1.00000    0    0    0    0
[3,] 0.23388 0.19237    1    0    0    0
[4,] 0.23122 0.18633   NA    1    0    0
[5,] 0.21476 0.17298   NA   NA    1    0
[6,] 0.20829 0.17174   NA   NA   NA    1



Hello,

Aren't you complicating?

In the last line of your code, why use 'v[ind]' if 'ind' indexes the matrix,
not the vector?

z <- diag(6)
ind <- lower.tri(z)
z[ind] <- v   # This works
z

Rui Barradas


--
Sarah Goslee
http://www.functionaldiversity.org



William Revelle                http://personality-project.org/revelle.html
Professor                      http://personality-project.org
Department of Psychology       http://www.wcas.northwestern.edu/psych/
Northwestern University        http://www.northwestern.edu/
Use R for psychology           http://personality-project.org/r
It is 6 minutes to midnight    http://www.thebulletin.org








[R] Help creating a symmetric matrix?

2011-12-22 Thread Matt Considine
Hi,
I am trying to work with the output of the MINE analysis routine found at
   http://www.exploredata.net

Specifically, I am trying to read the results into a matrix (ideally an 
n x n x 6 matrix, but I'll settle right now for getting one column into 
a matrix.)

The problem I have is not knowing how to take what amounts to being one 
half of a symmetric matrix - excluding the diagonal - and getting it 
into a matrix.  I have tried using lower.tri as found here
   https://stat.ethz.ch/pipermail/r-help/2008-September/174516.html
but it appears to only partially fill in the matrix.  My code and an 
example of the output is below.  Can anyone point me to an example that 
shows how to create a matrix with this sort of input?

Thank you in advance,
Matt

require(PortfolioAnalytics)
#load market index data
data(indexes)
#save data as a CSV
write.table(indexes, "C:/Rwork/indexes.csv", sep=",", col.names=TRUE,
   row.names=FALSE, quote=FALSE, na="NA")
#assumes rJava is installed, MINE.r and MINE.jar are in the working
#directory
#read in MINE.r
source.with.encoding('C:/Rwork/MINE.r', encoding='UTF-8')
#run MINE on indexes
MINE("C:/Rwork/indexes.csv","all.pairs")
#read the output file of MINE analysis
x=read.csv("C:/Rwork/indexes.csv,B=n^0.6,k=15,Results.csv",header=TRUE)
#isolate one half of matrix
newx <- x[,1:3]
newx
            X.var          Y.var MIC..strength.
1     US.Equities Int.l.Equities        0.33740
2        US.Bonds       US.Tbill        0.26657
3        US.Tbill      Inflation        0.23388
4     Commodities      Inflation        0.23122
5     Commodities       US.Tbill        0.21476
6     US.Equities       US.Tbill        0.20829
7        US.Bonds      Inflation        0.20486
8  Int.l.Equities    Commodities        0.19439
9        US.Bonds    Commodities        0.19237
10    US.Equities    Commodities        0.18633
11       US.Bonds    US.Equities        0.17298
12    US.Equities      Inflation        0.17174
13 Int.l.Equities       US.Tbill        0.16822
14       US.Bonds Int.l.Equities        0.16480
15 Int.l.Equities      Inflation        0.15027

#v <- newx[,3]
#or, for the sake of this example
v <- c(0.33740, 0.26657, 0.23388, 0.23122, 0.21476, 0.20829, 0.20486,
0.19439, 0.19237,
0.18633, 0.17298, 0.17174, 0.16822, 0.16480, 0.15027)
z <- diag(6)
ind <- lower.tri(z)
z[ind] <- t(v)[ind]

z
        [,1]    [,2] [,3] [,4] [,5] [,6]
[1,] 1.00000 0.00000    0    0    0    0
[2,] 0.26657 1.00000    0    0    0    0
[3,] 0.23388 0.19237    1    0    0    0
[4,] 0.23122 0.18633   NA    1    0    0
[5,] 0.21476 0.17298   NA   NA    1    0
[6,] 0.20829 0.17174   NA   NA   NA    1
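
Because the MINE output lists pairs sorted by strength rather than in matrix order, one possible approach is to fill the matrix by variable name instead of by position. The sketch below retypes the 15 rows printed above into a data frame called pairs (a name chosen here for illustration); with the real output, the first three columns of x would take its place.

```r
# The 15 pairs shown above, retyped for a self-contained example
pairs <- data.frame(
  X.var = c("US.Equities", "US.Bonds", "US.Tbill", "Commodities",
            "Commodities", "US.Equities", "US.Bonds", "Int.l.Equities",
            "US.Bonds", "US.Equities", "US.Bonds", "US.Equities",
            "Int.l.Equities", "US.Bonds", "Int.l.Equities"),
  Y.var = c("Int.l.Equities", "US.Tbill", "Inflation", "Inflation",
            "US.Tbill", "US.Tbill", "Inflation", "Commodities",
            "Commodities", "Commodities", "US.Equities", "Inflation",
            "US.Tbill", "Int.l.Equities", "Inflation"),
  MIC = c(0.33740, 0.26657, 0.23388, 0.23122, 0.21476, 0.20829, 0.20486,
          0.19439, 0.19237, 0.18633, 0.17298, 0.17174, 0.16822, 0.16480,
          0.15027),
  stringsAsFactors = FALSE
)

vars <- sort(unique(c(pairs$X.var, pairs$Y.var)))
m <- diag(length(vars))                  # ones on the diagonal
dimnames(m) <- list(vars, vars)

# a two-column character matrix indexes cells by (row name, column name)
m[cbind(pairs$X.var, pairs$Y.var)] <- pairs$MIC
m[cbind(pairs$Y.var, pairs$X.var)] <- pairs$MIC   # mirrored entry
m
```

Repeating this for each of the six result columns and binding the matrices into an array would give the n x n x 6 structure mentioned in the post.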




[R] question about spaces in r

2011-12-09 Thread Matt Spitzer
Hello,
I would like to please ask if someone would explain how r reads characters
and numbers differently.  Using read.csv, I had a matrix that resembled the
following, only with many more ids and data:

ID  Visit  variable
 2      1         5
 2      1         3
 2      3         4
 2     41         1
 2     42        34
 2      5        54
 2      9         1
 2     10         3
 2     12         5
 5      1        54
 5      2         9
 5      3         3
 5     41        54
 5     41         2
 5      5       235
 5      9         4
 5     10         2
 5     12         2

I then tried to subset for Visit==3.  However, subset == was not working
properly.  This gave me zero rows.  I printed the matrix/dataframe and
found that this was because R viewed the 3 as " 3" (space three).  So, I
had to type subset == " 3" to select for the data instead.  I think this
has to do with character, number and string properties, but I am quite a
novice.  Would anyone be able to instruct me how one tells a
dataframe/matrix to convert numbers such as " 3" to 3 so that I do not
get confused in the future?  I guess another problem I have is that I am
still learning the differences between matrices and dataframes.
Thanks so much, Matt
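
A hedged sketch of two ways this is commonly handled, assuming the Visit column ended up as character (or factor) with a leading space. The CSV text below is made up for illustration, and colClasses = "character" is used only to reproduce the symptom.

```r
# Hypothetical data: Visit has a leading space, as described in the post
csv_text <- "ID,Visit,variable\n2, 1,5\n2, 3,4\n5, 3,3\n5, 41,54\n"

# Force character columns to reproduce the problem
raw <- read.csv(text = csv_text, colClasses = "character")
nrow(subset(raw, Visit == "3"))    # no match: the stored value is " 3"

# Option 1: strip white space around fields while reading
clean <- read.csv(text = csv_text, strip.white = TRUE)

# Option 2: repair a column that is already in memory
raw$Visit <- as.numeric(trimws(raw$Visit))

subset(clean, Visit == 3)
subset(raw, Visit == 3)
```

str() on the data frame shows each column's type, which is usually the quickest way to see whether a column was read as numbers or as text.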



Re: [R] R logo in eps formt

2011-12-01 Thread Matt Shotwell
See this earlier post for SVG logos:

http://tolstoy.newcastle.edu.au/R/e12/devel/10/10/0112.html

Using Image Magick, do something like 

convert logo.svg logo.eps


On Thu, 2011-12-01 at 10:56 +0700, Ben Madin wrote:
 G'day all,
 
 Sorry if this message has been posted before, but searching for R is always 
 difficult...
 
 I was hoping for a copy of the logo in eps format? Can I do this from R, or 
 is one available for download?
 
 cheers
 
 Ben


[R] R endnote entry

2011-11-30 Thread Matt Cooper
I know citation() gives the R citation to be used in publications. Has
anyone put this into EndNote nicely? I'm not very experienced with
EndNote, and the way I have it at the moment the 'R Development Core
Team' becomes R. D. C. T. etc.

Cheers.



[R] googleVis motionchart - slow with Date class

2011-10-31 Thread Matt Pocernich

Hi,

I am trying to create a googleVis motion chart with monthly data.  When 
formatting the date column as a Date class variable, the plot as presented in 
the browser becomes considerably slower and very prone to crashing the browser. 
 To illustrate this issue I have modified the WorldBank demo.

### objects from demo(WorldBank, package = "googleVis")
M <- gvisMotionChart(subData, idvar="country.name", timevar="year",
options=list(width=700, height=600))
plot(M)

This works fine and I can smoothly move back and forth between the scatter
plots and the line plots.


## here I express the date as a Date class object - arbitrarily assigning each
## year to June 1st.

subData$year2 <- as.Date(ISOdate(subData$year, 6, 1))

M2 <- gvisMotionChart(subData, idvar="country.name", timevar="year2",
options=list(width=700, height=600))
plot(M2)

Using Chrome, this plot is very slow to load, and when pressing play it appears
that the date field fills in each day of the year.
Trying to go back and forth between the line plot and the scatter plot will
crash the browser.

Is there a better way to express monthly data?  I have tried converting it to a
numeric in the form of mmdd or mm, but this didn't work.
I am also wondering: is this the best place to post a question about googleVis?  I
notice threads on stackoverflow and other places.

Thanks,

Matt
  


[R] Matching

2011-10-29 Thread shish matt
I have a spatial weight file in csv that I want as listw object in R.
The file has the following 3 variables (left to right in the file) -- OID_, NID 
and WEIGHTS. NID stands for the neighbors and OID_ as the origins. There are 
217 origins with 4 neighbors each.


I have been able to read the csv file as a data frame (test.csv). Then I tried
to check whether the OID_ variable is in the right place in the data frame. I
used match for that using:
o <- match(OID_, OID_)
I am not sure whether this is the right way to match. Please advise.

Anyway, next I created a matrix object (m) using:

m <- as.matrix(test.csv[, -1])

Then I created object m1, using:

m1 <- m[o, o]

Finally, I tried creating listw object using:
mat2listw(m1)
Here I get an error that x is not a square matrix.

Not sure what to do now. Any help appreciated!

Thanks,
Shishm
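
One way to read the error: a three-column file with one row per origin/neighbour pair is an edge list, so as.matrix() on it cannot be square. A minimal base R sketch of expanding such a list into a square weights matrix follows; the six rows are invented for illustration, and the final mat2listw() call is left commented out on the assumption that spdep is installed.

```r
# Hypothetical edge list with the column names from the post
edges <- data.frame(
  OID_    = c(1, 1, 2, 2, 3, 3),
  NID     = c(2, 3, 1, 3, 1, 2),
  WEIGHTS = c(0.5, 0.5, 0.7, 0.3, 0.4, 0.6)
)

ids <- sort(unique(c(edges$OID_, edges$NID)))
w <- matrix(0, nrow = length(ids), ncol = length(ids),
            dimnames = list(ids, ids))

# row = origin, column = neighbour, looked up by position in ids
w[cbind(match(edges$OID_, ids), match(edges$NID, ids))] <- edges$WEIGHTS
w

# lw <- spdep::mat2listw(w)   # assumption: spdep is available
```

With 217 origins the same steps would give a 217 x 217 matrix, mostly zeros, with four non-zero weights per row.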



Re: [R] contact person for UseR 2012, please?

2011-10-18 Thread Matt Shotwell
The contact person is:

Stephania McNeal-Goddard
email: stephania.mcneal-godd...@vanderbilt.edu
phone: (615)322-2768

Vanderbilt University School of Medicine
Department of Biostatistics
S-2323 Medical Center North
Nashville, TN 37232-2158

On Tue, 2011-10-18 at 12:41 -0400, David Winsemius wrote:
 On Oct 18, 2011, at 12:25 PM, Erin Hodgess wrote:
 
  Dear R People:
 
  Do you know who the contact person is for UseR 2012, please?
 
  I'm trying to get together some numbers for funding (sorry for the
 
 Funny, it was the first hit on a Google search with term useR2012
 
 http://biostat.mc.vanderbilt.edu/wiki/Main/UseR-2012
 
 
 
 David Winsemius, MD
 West Hartford, CT
 

-- 
Matthew S. Shotwell
Assistant Professor, Department of Biostatistics
School of Medicine, Vanderbilt University
1161 21st Ave. S2323 MCN Office CC2102L
Nashville, TN 37232-2158



Re: [R] [Related Topic] need help on read.spss

2011-10-13 Thread Matt Shotwell
Would it be worthwhile to update the read.spss implementation using the
more recent discoveries from the PSPP group? I don't mean to copy their
code; but to use the ideas in their code. Is anyone working on this? I
wouldn't want the effort to be duplicated.

On Thu, 2011-10-13 at 16:22 +0200, Uwe Ligges wrote:
 
 On 11.10.2011 12:07, Smart Guy wrote:
  Hi,
 I have one doubt about one of the parameter of 'read.spss()' from
  'foreign' package.
  Here is the syntax :-
 
  read.spss ( file,
   use.value.labels = TRUE,
   to.data.frame = FALSE,
   max.value.labels = Inf,
   trim.factor.names = FALSE,
   trim_values = TRUE,
   reencode = NA,
   use.missings = to.data.frame )
 
 
  In above syntax when I pass *'to.data.frame= FALSE*' it gives me missing
  values from SPSS file (that I try to read using read.spss() ). But when I
  pass '*to.data.frame = TRUE*' then its not giving me missing values. And
  need to get missing values.
 
  According to read.spss() documentation
 
  *to.data.frame :  return a data frame?*
 
  I am curious to know, if we pass *'to.data.frame = TRUE*' , is it going to
  cause some issue or effect something? I didn't understand the read.spss()
  documentation correctly.
  Please explain.
 
  Thanks in Advance
 
 
 An R data.frame cannot represent different kinds of missing values, 
 since R just has NA. Therefore, there are two way to import data:
 
 to.data.frame=FALSE  will read all the information, but into a format 
 you will likely have to postprocess to make it conveniently usable.
 
 to.data.frame=TRUE   will import into a data.frame, but that cannot 
 represent all the nuances known from the SPSS representation.
 
 Uwe Ligges
 


Re: [R] Rweb and setting up R on a server

2011-09-08 Thread Matt Shotwell
Erin, 

I haven't used Rweb recently. The URL is
http://www.math.montana.edu/Rweb/ . If you have a server, you could set
up the server version of RStudio: http://rstudio.org/download/server .
It worked well when I tried it. 

Best,
Matt

On Tue, 2011-09-06 at 17:07 -0500, Erin Hodgess wrote: 
 Dear R People:
 
 At one time, Rweb existed, which had R on a server.
 
 I looked for it, but can't find it.
 
 Has anyone used that recently, or is there a new equivalent, please?
 
 Thanks,
 Erin
 




Re: [R] readBin fails to read large files

2011-09-01 Thread Matt Shotwell
On Thu, 2011-09-01 at 17:36 +0100, Prof Brian Ripley wrote:
 readBin is intended to read a few items at a time, not 10^9.  You are 
 probably getting 32-bit integer overflow inside your OS, since the 
 number of bytes you are trying to read in one go exceeds 2GB.
 
 Don't do that: read say a million at time.
 
 And BTW, if these really are unsigned ints you will get wraparound.

To elaborate, ?readBin states that the 'signed' argument is only used for
integers of size 1 and 2 bytes. These are ultimately converted to signed
4 byte integers, because that's how R stores integers. To be exact, if
your file contains integers larger than 2^31-1 = 2147483647, signed integer
overflow would occur. In actuality, R returns NA for those values.

I'm bringing this up because R normally issues a warning:

R> 2147483647L + 1L
[1] NA
Warning message:
In 2147483647L + 1L : NAs produced by integer overflow

But, a similar warning isn't issued by readBin when NA results from
signed integer overflow:

#The raw vector below represents 2147483647L and 2147483647L + 1L
#in little endian, unsigned, 4 byte integers
R> dat <- as.raw(c(0xff,0xff,0xff,0x7f,0x00,0x00,0x00,0x80))
R> writeBin(dat, 'test.bin')
R> readBin('test.bin', n=2, integer(), signed=FALSE)
[1] 2147483647 NA
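
A possible workaround, sketched under the assumption that the file really holds little-endian unsigned 4-byte integers: read each value as two unsigned 2-byte halves (where 'signed=FALSE' is honoured) and combine them as doubles, in chunks as suggested above. The function name read_uint32 and the chunk size are illustrative choices, not part of any package.

```r
# Two unsigned 32-bit values, little endian: 2147483647 and 2147483648
dat <- as.raw(c(0xff, 0xff, 0xff, 0x7f, 0x00, 0x00, 0x00, 0x80))
path <- tempfile(fileext = ".bin")
writeBin(dat, path)

read_uint32 <- function(path, chunk = 1e6) {
  con <- file(path, "rb")
  on.exit(close(con))
  out <- list()
  repeat {
    # each 32-bit value is read as two unsigned 16-bit halves
    halves <- readBin(con, integer(), n = 2 * chunk, size = 2,
                      signed = FALSE, endian = "little")
    if (length(halves) == 0) break
    lo <- halves[c(TRUE, FALSE)]   # low half comes first in little endian
    hi <- halves[c(FALSE, TRUE)]
    out[[length(out) + 1]] <- lo + hi * 65536   # double, so no overflow
  }
  unlist(out)
}

vals <- read_uint32(path)
vals
```

The result is a double vector, which costs 8 bytes per value but represents every unsigned 32-bit integer exactly.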

 On Thu, 1 Sep 2011, Benton, Paul wrote:
 
  Posting for a friend
 
  Begin forwarded message:
 
  From: Geier, Florian 
  florian.geie...@imperial.ac.ukmailto:florian.geie...@imperial.ac.uk
  Subject: Fwd: readBin fails to read large files
  Date: September 1, 2011 4:10:53 PM GMT+01:00
  To:
 
 
 
  Begin forwarded message:
 
  Date: 1 September 2011 16:01:45 GMT+01:00
  Subject: readBin fails to read large files
 
  Dear all,
 
  I am trying to read a large file (~2GB) of unsigned ints into R. Using the 
  command:
 
  raw <- readBin(file, n=10^8, integer(), endian="little", signed=FALSE)
 
  It works fine for n=10^8, but fails for n=10^9 (or even at n=6*10^8). My 
  machine$sizeof.long is 8 bit.
  I am running R 2.13.1 on a x86_64-apple-darwin9.8.0/x86_64 (64-bit) 
  architecture.
 
  Thanks for your help
 
  Florian
 
  --
  AXA doctoral fellow
  Bundy lab - Biomolecular Medicine
  Imperial College London
 
 
 
 
 
  --
  AXA doctoral fellow
  Bundy lab - Biomolecular Medicine
  Imperial College London
 
 
 
 
 
 
 




[R] Unusual separators

2011-08-16 Thread Matt Curcio
Hi all,
I have a list that I got from a web page that I would like to crunch.
Unfortunately, the list has some unusual separators in it.  I believe
the columns are separated by 1 space and 1 tab.  I tried to insert
this into the read.table( ..., sep=" \t", ...) but got an error that
said something like 'only one byte separators can be used'.
I have thought about using a gsub to 'swap out' the space + tab and
replace it with commas, etc. but thought there might be another way.
Any suggestions?
M
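
A short sketch of two options, using made-up text in place of the web page list. The first relies on read.table's default sep = "", which treats any run of white space as a separator; the second is the gsub idea from the post, reducing space + tab to a single tab.

```r
# Hypothetical sample: fields separated by one space followed by one tab
txt <- "name \tcount \tscore\nalpha \t1 \t2.5\nbeta \t3 \t4.5\n"

# Option 1: default sep = "" splits on any run of spaces and tabs
d1 <- read.table(text = txt, header = TRUE)

# Option 2: if fields may contain spaces, collapse " \t" to "\t" first
# (with a real file, the lines would come from readLines())
lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
lines <- gsub(" \t", "\t", lines, fixed = TRUE)
d2 <- read.table(text = lines, header = TRUE, sep = "\t")

d1
```

Option 1 is the least work, but it would split a field such as "New York" in two, which is where option 2 becomes useful.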
-- 


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com



[R] reshape::rename package unable to install !?!

2011-08-07 Thread Matt Curcio
Greetings all,
I have been working with RStudio and R only for a little while.  I
came across a package called 'reshape' that helped me 'rename'
columns.  Unfortunately, my computer got hosed (too much playing with
linux too late at nite) and I had to re-install everything, BUT when I
tried to reinstall 'reshape' or 'reshape2' I COULDN't.  Is there a way
to get over this hurdle with reshape or is there another command I can
use.  I am stuck because my programs up to this point used 'rename'
and now I have to redo some work.
M
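
Until the installation problem is sorted out, columns can be renamed in base R without any add-on package. A minimal sketch follows; the data frame and the column names are hypothetical.

```r
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))   # hypothetical data

# rename column "a" to "gene_id"
names(df)[names(df) == "a"] <- "gene_id"

# several columns at once, via a lookup vector of the form old = "new"
lookup <- c(gene_id = "id", b = "value")
hit <- names(df) %in% names(lookup)
names(df)[hit] <- unname(lookup[names(df)[hit]])
names(df)
```

The lookup vector plays roughly the same role as the old = "new" argument that rename() takes, so existing scripts may need only small changes.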
-- 


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com



[R] Which is more efficient?

2011-08-04 Thread Matt Curcio
Greetings all,
I am curious to know if either of these two sets of code is more efficient?

Example1:
 ## t-test ##
colA <- temp [ , j ]
colB <- temp [ , k ]
ttr <- t.test ( colA, colB, var.equal=TRUE)
tt_pvalue [ i ] <- ttr$p.value

or
Example2:
tt_pvalue [ i ] <- t.test ( temp[ , j ], temp[ , k ], var.equal=TRUE)$p.value
-
I have three loops, i, j, k.
One loops over all of the i files in a directory.  The other two tease out
column j and compare it by means of a t-test to column k in each of
the files.
---
for ( i in 1:num_files ) {
   temp <- read.table ( files_to_test [ i ], header=TRUE, sep="\t")
   num_cols <- ncol ( temp )
   ## Define Columns To Compare ##
   for ( j in 2 : num_cols ) {
      for ( k in 3 : num_cols ) {
         ## t-test ##
         colA <- temp [ , j ]
         colB <- temp [ , k ]
         ttr <- t.test ( colA, colB, var.equal=TRUE)
         tt_pvalue [ i ] <- ttr$p.value
      }
   }
}

I am a novice writer of code and am interested to hear if there are
any (dis)advantages to one way or the other.
M
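
On the efficiency question, the two forms are likely to be indistinguishable in practice, since the cost of t.test() itself dominates the extra assignments. What may matter more is that the nested loops store every result in tt_pvalue[i], so only the last (j, k) pair per file survives. A hedged sketch of keeping one p-value per column pair is below; pairwise_pvalues is a name invented here, and the data frame stands in for one file.

```r
# p-values for every pair of columns in one data frame
pairwise_pvalues <- function(dat, cols = 2:ncol(dat)) {
  pairs <- combn(cols, 2)                 # one column pair per matrix column
  p <- apply(pairs, 2, function(jk) {
    t.test(dat[, jk[1]], dat[, jk[2]], var.equal = TRUE)$p.value
  })
  data.frame(colA = names(dat)[pairs[1, ]],
             colB = names(dat)[pairs[2, ]],
             p.value = p)
}

# hypothetical stand-in for one of the files
set.seed(1)
temp <- data.frame(gene = 1:10, s1 = rnorm(10), s2 = rnorm(10),
                   s3 = rnorm(10))
res <- pairwise_pvalues(temp)
res
```

Using combn() also avoids testing a column against itself and testing each pair twice, which the 2:num_cols by 3:num_cols loops would do.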


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com



[R] Error message for MCC

2011-08-03 Thread Matt Curcio
Greetings all,
I am getting an error message that is stifling me.
Any ideas?

> ## Define Directories ##
> load_from <- "/home/mcc/Dropbox/abrodsky/kegg_combine_data/"
> save_to <- "/home/mcc/Dropbox/abrodsky/ttest_results/"

> ###
> ## Define Columns To Compare ##
> compareA <- "log_b_rich"
> compareB <- "Fc_cdt_rich_tot"

> ## Collect Files To Compare ##
> setwd(load_from)
> files_to_test <- list.files(pattern = "combine.kegg")

> ##
> ## Initialize Variables ##
> vl <- length(files_to_test)
> temp <- vector(mode="numeric", length = vl)
> colA <- vector(mode="numeric", length = vl)
> colB <- vector(mode="numeric", length = vl)
> tt <- vector(mode="numeric", length = vl)


> ## Calculate P-values ##
> for (i in 1:3){
+    temp1 <- read.table(files_to_test[i], header=TRUE, sep=" ")
+    numrows <- nrow(temp1)
+    tt_pvalue <- matrix(data=temp, nrow=numrows, ncol=vl)
+    colA <- temp[,compareA]
+    colB <- temp[,compareB]
+    tt <- t.test(colA, colB, var.equal=TRUE)
+    tt_pvalue <- tt$p.value
+ }
Error in temp[, compareA] : incorrect number of dimensions
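
The message is consistent with 'temp' being the plain numeric vector created in the initialization step, while the table read from each file is stored in 'temp1'; a vector has no columns, so temp[, compareA] fails. A hedged sketch of the loop with that one change is below. The list of small data frames stands in for the files, and keeping one p-value per file is an assumption about the intent.

```r
compareA <- "log_b_rich"
compareB <- "Fc_cdt_rich_tot"

# hypothetical stand-ins for the tables returned by read.table()
set.seed(42)
tables <- lapply(1:3, function(i) {
  data.frame(log_b_rich = rnorm(8), Fc_cdt_rich_tot = rnorm(8, mean = 1))
})

tt_pvalue <- numeric(length(tables))    # one slot per file
for (i in seq_along(tables)) {
  temp1 <- tables[[i]]                  # in the post: read.table(files_to_test[i], ...)
  colA <- temp1[, compareA]             # index the data frame, not the vector 'temp'
  colB <- temp1[, compareB]
  tt_pvalue[i] <- t.test(colA, colB, var.equal = TRUE)$p.value
}
tt_pvalue
```

Assigning into tt_pvalue[i] rather than overwriting tt_pvalue keeps the result for every file instead of only the last one.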

-- 


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com



Re: [R] Use dump or write? or what?

2011-08-01 Thread Matt Curcio
Greetings all,
Thanks for all your help so far.
Let me give a better idea of what I am doing.  I have hundreds of
files that I need to plow thru with a t-test and correlation test.
BTW, 'tempA' and 'tempB' are simply columns of numbers from a gene-chip
experiment that spits out dna 'amounts'. So I have set up a loop to
read the files and carry out the tests but need to save it for later
inspection (and Jim H-you are probably right, for later inspection).
By inspection I mean I don't know what I want to do with it yet,
Remember: That's why they call it Research.

So it seems that 'save/load' might be a good alternative for my work.
Any suggestions,
M
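
One way to keep everything for later inspection, sketched with random numbers in place of the gene-chip columns: collect the complete test objects in a named list and write the list to a single file. saveRDS()/readRDS() are used here as a close relative of save()/load(); the file name and list names are placeholders.

```r
set.seed(7)
results <- list()
for (i in 1:3) {                           # in practice: loop over the files
  tempA <- rnorm(10)                       # hypothetical columns
  tempB <- rnorm(10)
  results[[paste0("file", i)]] <- list(
    two_sample = t.test(tempA, tempB, var.equal = TRUE),
    welch      = t.test(tempA, tempB, var.equal = FALSE),
    cor        = cor.test(tempA, tempB)
  )
}

out_file <- tempfile(fileext = ".rds")     # hypothetical file name
saveRDS(results, out_file)

# later, possibly in a new session
restored <- readRDS(out_file)
restored$file2$welch$p.value
```

Because the full objects are stored, statistics, confidence intervals and estimates all remain available without rerunning the tests.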

On Sun, Jul 31, 2011 at 11:41 PM, Matt Curcio matt.curcio...@gmail.com wrote:
 Greetings all,
 I am calculating two t-test values for each of many files then save it
 to file calculate another set and append, repeat.
 But I can't figure out how to write it to file and then append
 subsequent t-tests.
 (maybe too tired ;} )
 I have tried to use dump and file.append to no avail.

 ttest_results = tempfile()

 two_sample_ttest <- t.test (tempA, tempB, var.equal = TRUE)
 welch_ttest <- t.test (tempA, tempB, var.equal = FALSE)

 dump (two_sample_ttest, file = "dumpdata.txt", append=TRUE)
 ttest_results <- file.append (ttest_results, two_sample_ttest)

 Any suggestions,
 M
 --



 Matt Curcio
 M: 401-316-5358
 E: matt.curcio...@gmail.com




-- 


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com



[R] Errors, driving me nuts

2011-08-01 Thread Matt Curcio
Greetings all,
I am getting this error that is driving me nuts... (not a long trip, haha)

I have a set of files and in these files I want to calculate t-tests on
columns 'compareA' and 'compareB' (these will change over time, so I
want a variable here). Also these files are in many different
directories so I want a way to filter out the junk...  Anyway I don't
believe that this is related to my errors but I mention it
nonetheless.

> files_to_test <- list.files (pattern = "kegg.combine")
> for (i in 1:length (files_to_test)) {
+    raw_data <- read.table (files_to_test[i], header=TRUE, sep=" ")
+    tmpA <- raw_data[,compareA]
+    tmpB <- raw_data[,compareB]
+    tt <- t.test (tmpA, tmpB, var.equal=TRUE)
+    tt_pvalue[i] <- tt$p.value
+ }
Error in tt_pvalue[i] <- tt$p.value : object 'tt_pvalue' not found
# I tried setting up a vector...
# as.vector(tt_pvalue, mode="any") ### but NO GO
> file.name = paste("ttest.results.", compareA, compareB, "")
> setwd(save_to)
> write.table(tt_pvalue, file=file.name, sep="\t" )
Error in inherits(x, "data.frame") : object 'tt_pvalue' not found
# No idea??

What is going wrong??
M
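
Both errors point at the same thing: tt_pvalue[i] <- ... modifies an existing object, and none exists yet; as.vector() converts an object but does not create one. A minimal sketch of creating the vector before the loop is below, with random numbers in place of the file columns and a temporary directory in place of save_to.

```r
n_files <- 3                              # stands in for length(files_to_test)
tt_pvalue <- numeric(n_files)             # create the vector before the loop

set.seed(3)
for (i in seq_len(n_files)) {
  tmpA <- rnorm(12)                       # hypothetical columns
  tmpB <- rnorm(12)
  tt_pvalue[i] <- t.test(tmpA, tmpB, var.equal = TRUE)$p.value
}

compareA <- "log_b_rich"
compareB <- "Fc_cdt_rich_tot"
file.name <- paste("ttest.results", compareA, compareB, "txt", sep = ".")
write.table(data.frame(p.value = tt_pvalue),
            file = file.path(tempdir(), file.name), sep = "\t")
```

Passing sep = "." to paste() keeps spaces out of the file name, which the default separator would otherwise insert.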


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com



[R] Appending 4 Digits On A File Name

2011-07-31 Thread Matt Curcio
Greetings all,
I would like to append a 4 digit number suffix to the names of my
files for later use.  What I am using now only produces 1 or 2 or 3 or
4 digits.


for (i in 1:1000) {
   temp <- (kegg [i,])
   temp <- merge (temp, subrichcdt, by="gene")
   file.name <- paste ("kegg.subrichcdt.", i, ".txt", sep="")
   write.table(temp, file=file.name)
}
###
But I want:
kegg.subrichcdt.0001.txt
kegg.subrichcdt.0002.txt, ...


Any suggestions
M
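
Two base R ways to zero-pad the counter, shown as a small sketch on a few sample values of i:

```r
i <- c(1, 2, 37, 1000)

# sprintf: %04d pads an integer with zeros to a width of 4
f1 <- sprintf("kegg.subrichcdt.%04d.txt", as.integer(i))

# formatC does the same through flag = "0"
f2 <- paste("kegg.subrichcdt.",
            formatC(i, width = 4, format = "d", flag = "0"),
            ".txt", sep = "")
f1
```

Either expression can replace the paste() call inside the loop; padded names also sort in numeric order when listed.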
-- 


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com



Re: [R] Appending 4 Digits On A File Name

2011-07-31 Thread Matt Curcio
Hmmm...
Got this error

Error in formatC(i, width = 4, format = "d", flat = "0") :
  unused argument(s) (flat = "0")

Any ideas,
M





-- 


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com



Re: [R] Appending 4 Digits On A File Name

2011-07-31 Thread Matt Curcio
Michael,
Got it, thanks.  Looking over the man file realized it is FLAG not flat.
Cheers,
M





-- 


Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com



[R] Use dump or write? or what?

2011-07-31 Thread Matt Curcio
Greetings all,
I am calculating two t-test values for each of many files then save it
to file calculate another set and append, repeat.
But I can't figure out how to write it to file and then append
subsequent t-tests.
(maybe too tired ;} )
I have tried to use dump and file.append to no avail.

ttest_results = tempfile()

two_sample_ttest <- t.test (tempA, tempB, var.equal = TRUE)
welch_ttest <- t.test (tempA, tempB, var.equal = FALSE)

dump (two_sample_ttest, file = "dumpdata.txt", append=TRUE)
ttest_results <- file.append (ttest_results, two_sample_ttest)

Any suggestions,
M
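
If a growing text file is what is wanted, one option is to reduce each pair of tests to a one-row data frame and append it with write.table(). This is only a sketch: append_ttests, the column names and the output file are invented for illustration, and random numbers stand in for tempA and tempB.

```r
out_file <- tempfile(fileext = ".txt")    # hypothetical output file

append_ttests <- function(tempA, tempB, label, file) {
  two_sample <- t.test(tempA, tempB, var.equal = TRUE)
  welch      <- t.test(tempA, tempB, var.equal = FALSE)
  row <- data.frame(label = label,
                    two_sample_p = two_sample$p.value,
                    welch_p = welch$p.value)
  first <- !file.exists(file)             # write the header only once
  write.table(row, file = file, append = !first, col.names = first,
              row.names = FALSE, sep = "\t", quote = FALSE)
}

set.seed(11)
for (k in 1:3) append_ttests(rnorm(10), rnorm(10), paste0("set", k), out_file)
res <- read.delim(out_file)
res
```

For reference, dump() expects the names of objects as character strings, and file.append() joins files on disk rather than R objects, which may explain why neither behaved as hoped.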
-- 



Matt Curcio
M: 401-316-5358
E: matt.curcio...@gmail.com



Re: [R] Bootstrap

2011-07-21 Thread Matt Shotwell
In order to apply the bootstrap, you must resample, uniformly at random
from the independent units of measurement in your data. Assuming that
these represent the rows of 'data', consider the following:

est <- function(y, x, obeta = c(1,1), verbose=FALSE) {
    n <- length(x)
    X <- cbind(rep(1, n), x)
    nbeta <- c(0,0)
    iter <- 0
    while(crossprod(obeta-nbeta)>10^(-12)) {
        nbeta <- obeta
        eta   <- X%*%nbeta
        mu    <- eta
        mu1   <- 1/eta
        W     <- diag(as.vector(mu1))
        Z     <- X%*%nbeta+(y-mu)
        XWX   <- t(X)%*%W%*%X
        XWZ   <- t(X)%*%W%*%Z
        Cov   <- solve(XWX)
        obeta <- Cov%*%XWZ
        iter  <- iter+1
        if(verbose)
            cat("Iteration #  and beta1= ",iter, nbeta, "\n")
    }
    return(nbeta[1,1])
}

boot <- function(data, reps) {
    n <- nrow(data)
    Nt <- vector('numeric', length=reps)
    for(Ncount in 1:reps) {
        #resample the rows of data
        bdata <- data[sample(1:n,n,replace=TRUE),]
        #recompute and store estimate
        Nt[Ncount] <- est(bdata[,1], bdata[,2])
    }
    return(Nt)
}

stem(boot(data,1000),width=60)

  The decimal point is at the |

  -3 | 4
  -2 | 
  -1 | 2
  -0 | 88866555444333222111
   0 | 0022+400
   1 | 0001+203
   2 | 2224+23
   3 | 112223344455
   4 | 113344555789
   5 | 02334446677899
   6 | 1112334455778
   7 | 11235568
   8 | 001799
   9 | 0259
  10 | 1446
  11 | 19
  12 | 48
  13 | 8
  14 | 024
  15 | 
  16 | 
  17 | 0788
  18 | 
  19 | 1
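
Once the replicates exist, they are usually summarised rather than just plotted. The sketch below shows the same row-resampling idea in a self-contained form with a standard error and a percentile interval. The ordinary least squares intercept is used as a stand-in estimator (an assumption, not the iterative est() above), and 200 replicates is an arbitrary choice.

```r
# Data from the question
d <- data.frame(
  y = c(11, 11, 14, 19, 4, 23, 15, 17, 7, 17, 14, 11, 17),
  x = c(5.16, 4.04, 3.85, 5.68, 1.26, 7.89, 4.25, 3.94, 2.35, 4.74,
        5.49, 4.12, 5.92)
)

# Stand-in estimator (an assumption): the OLS intercept
intercept <- function(dat) unname(coef(lm(y ~ x, data = dat))[1])

set.seed(123)
reps <- 200
bt <- replicate(reps, {
  idx <- sample(nrow(d), replace = TRUE)   # resample rows with replacement
  intercept(d[idx, ])
})

sd(bt)                                # bootstrap standard error
quantile(bt, c(0.025, 0.975))         # percentile interval
```

In the original loop every replicate used the full data set unchanged, which is why all 100 values came out identical; the resampling step is what introduces variation.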

On Wed, 2011-07-20 at 18:09 -0400, Val wrote:
 Hi all,
 
 I am facing difficulty on  how to use bootstrap sampling and
 below is my example of function.
 
 Read a data , use some functions and  use iteration to find the solution(
 ie, convergence is reached).  I want to use bootstrap approach to do it
 several times (200 or 300 times) this whole process  and see the
 distribution of parameter of interest.
 
 Below is a small example that resembles my problem. However,  I  found out
 all samples are the same. So I would appreciate your help on this case.
 
 #**
 rm(list=ls())
  xx <- read.table(textConnection(" y x
 11 5.16
 11 4.04
 14 3.85
 19 5.68
 4 1.26
 23  7.89
 15 4.25
 17 3.94
 7 2.35
 17 4.74
 14 5.49
 11 4.12
 17 5.92"), header=TRUE)
 data <- as.matrix(xx)
 closeAllConnections()

 Nt <- NULL
 for (Ncount in 1:100)
  {
 y <- data[,1]
 x <- data[,2]
 n <- length(x)

 X <- cbind(rep(1,n),x) #covariate/design matrix
 obeta <- c(1,1) #previous/starting values of beta

 nbeta <- c(0,0)#new beta
 iter=0

   while(crossprod(obeta-nbeta)>10^(-12))
 {
 nbeta <- obeta
 eta   <- X%*%nbeta
 mu    <- eta
 mu1   <- 1/eta
 W     <- diag(as.vector(mu1))
 Z     <- X%*%nbeta+(y-mu)
 XWX   <- t(X)%*%W%*%X
 XWZ   <- t(X)%*%W%*%Z
 Cov   <- solve(XWX)
 obeta <- Cov%*%XWZ
 iter  <- iter+1

 cat("Iteration #  and beta1= ",iter, nbeta, "\n")
 }

   Nt[Ncount] <- nbeta[1,1]
 }
 Nt
 summary(Nt)
 #**e*
 


[R] hold position of vertices constant in network {statnet}?

2011-07-19 Thread Matt Bakker
I am a novice with network functions! I have been exploring the network
function in the statnet package, but haven't been able to figure out
how to hold vertices in position while varying edge features. Can
anyone advise on whether this is possible, and if so, how to do it?
Thanks!
-- 
Matthew Bakker, Ph.D.
Department of Plant Pathology
University of Minnesota
495 Borlaug Hall
1991 Upper Buford Circle
Saint Paul, MN  55108 USA



Re: [R] How to capture console output in a numeric format

2011-06-24 Thread Matt Shotwell
Ravi, 

Consider using an environment (i.e. a 'reference' object) to store the
results, avoiding string manipulation, and the potential for loss of
precision:

fr <- function(x, env) {   ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
    if(exists('fout', env))
        fout <- rbind(get('fout', env), c(x1, x2, f))
    else
        fout <- c(x1=x1, x2=x2, f=f)
    assign('fout', fout, env)
    f
}

out <- new.env()
ans <- optim(c(-1.2, 1), fr, env=out)
out$fout

Best,
Matt
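
A related sketch, offered as one alternative rather than a replacement: the same bookkeeping can be kept inside a closure, so the objective function itself stays unchanged. make_recorder is a name invented here, and rows are collected in a list and bound once at the end instead of calling rbind() on every evaluation.

```r
rosenbrock <- function(x) {
  100 * (x[2] - x[1] * x[1])^2 + (1 - x[1])^2
}

# wrap any two-parameter objective so every evaluation is recorded
make_recorder <- function(fn) {
  rows <- list()
  wrapped <- function(x) {
    f <- fn(x)
    rows[[length(rows) + 1]] <<- c(x1 = x[1], x2 = x[2], f = f)
    f
  }
  list(fn = wrapped, trace = function() do.call(rbind, rows))
}

rec <- make_recorder(rosenbrock)
ans <- optim(c(-1.2, 1), rec$fn)
trace <- rec$trace()        # numeric matrix with columns x1, x2, f
head(trace)
```

As with the environment version, the values are stored as numbers, so nothing is lost to printing and re-parsing.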

 
On Fri, 2011-06-24 at 15:10 +, Ravi Varadhan wrote:
 Thank you very much, Jim.  That works!  
 
 I did know that I could process the character strings using regex, but was 
 also wondering if there was a direct way to get this.  
 
 Suppose, in the current example I would like to obtain a 3-column matrix that 
 contains the parameters and the function value:
 
 fr <- function(x) {   ## Rosenbrock Banana function
     on.exit(print(cbind(x1, x2, f)))
     x1 <- x[1]
     x2 <- x[2]
     f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
     f
 }

 fvals <- capture.output(ans <- optim(c(-1.2,1), fr))
 
 Now, I need to tweak your solution to get the 3-column matrix.  It would be 
 nice, if there was a more direct way to get the numerical output, perhaps a 
 numeric option in capture.output().
 
 Best,
 Ravi.
 
 ---
 Ravi Varadhan, Ph.D.
 Assistant Professor,
 Division of Geriatric Medicine and Gerontology School of Medicine Johns 
 Hopkins University
 
 Ph. (410) 502-2619
 email: rvarad...@jhmi.edu
 
 -Original Message-
 From: jim holtman [mailto:jholt...@gmail.com] 
 Sent: Friday, June 24, 2011 10:48 AM
 To: Ravi Varadhan
 Cc: r-help@r-project.org
 Subject: Re: [R] How to capture console output in a numeric format
 
 try this:
 
 > fr <- function(x) {   ## Rosenbrock Banana function
 +     on.exit(print(f))
 +     x1 <- x[1]
 +     x2 <- x[2]
 +     f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
 +     f
 + }
 >
 > fvals <- capture.output(ans <- optim(c(-1.2,1), fr))
 > # convert to numeric
 > fvals <- as.numeric(sub("^.* ", "", fvals))
 >
 > fvals
   [1] 24.20  7.095296 15.08  4.541696
   [5]  6.029216  4.456256  8.879936  7.777856
   [9]  4.728125  5.167901  4.21  4.437670
  [13]  4.178989  4.326023  4.070813  4.221489
  [17]  4.039810  4.896359  4.009379  4.077130
  [21]  4.020798  3.993600  4.024586  4.117625
  [25]  3.993115  3.976081  3.971089  4.023905
  [29]  3.980807  3.952577  3.932179  3.935345
 
 
 On Fri, Jun 24, 2011 at 10:39 AM, Ravi Varadhan rvarad...@jhmi.edu wrote:
  Hi,
 
  I would like to know how to capture the console output from running an 
  algorithm for further analysis.  I can capture this using capture.output() 
  but that yields a character vector.  I would like to extract the actual 
  numeric values.  Here is an example of what I am trying to do.
 
  fr <- function(x) {   ## Rosenbrock Banana function
      on.exit(print(f))
      x1 <- x[1]
      x2 <- x[2]
      f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
      f
  }
 
  fvals <- capture.output(ans <- optim(c(-1.2,1), fr))
 
  Now, `fvals' contains character elements, but I would like to obtain the 
  actual numerical values.  How can I do this?
 
  Thanks very much for any suggestions.
 
  Best,
  Ravi.
 
  ---
  Ravi Varadhan, Ph.D.
  Assistant Professor,
  Division of Geriatric Medicine and Gerontology School of Medicine Johns 
  Hopkins University
 
  Ph. (410) 502-2619
  email: rvarad...@jhmi.edu
 
 
 
 
 
 

-- 
Matthew S. Shotwell
Assistant Professor, Department of Biostatistics
School of Medicine, Vanderbilt University
1161 21st Ave. S2323 MCN Office CC2102L
Nashville, TN 37232-2158



Re: [R] How to capture console output in a numeric format

2011-06-24 Thread Matt Shotwell
On Fri, 2011-06-24 at 12:09 -0400, David Winsemius wrote:
 On Jun 24, 2011, at 11:27 AM, Matt Shotwell wrote:
 
  Ravi,
 
  Consider using an environment (i.e. a 'reference' object) to store the
  results, avoiding string manipulation, and the potential for loss of
  precision:
 
  fr <- function(x, env) {   ## Rosenbrock Banana function
      x1 <- x[1]
      x2 <- x[2]
      f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
      if(exists('fout', env))
          fout <- rbind(get('fout', env), c(x1, x2, f))
 
 So _that's_ what a reference object is?

Well, environments have 'pass-by-reference' behavior. That is, when they
are passed to a function, modifications to the environment persist
outside the function call.

This is distinct from the Reference class (?methods::ReferenceClass).
But there are similar concepts. The methods of a reference class can
modify the class fields in a 'by-reference' fashion. However, the fields
need not be passed to a method.
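
A small sketch may make the 'pass-by-reference' behavior concrete. The
function and object names here (add_to_list, add_to_env, my_list, my_env) are
made up for illustration. A list modified inside a function is a local copy,
whereas an environment modified inside a function is the caller's own object.

```r
# A list is copied on modification: the caller's object is unchanged.
add_to_list <- function(lst) {
    lst$count <- 1
    invisible(NULL)
}

# An environment is not copied: the change is visible to the caller.
add_to_env <- function(env) {
    env$count <- 1
    invisible(NULL)
}

my_list <- list()
my_env  <- new.env()

add_to_list(my_list)
add_to_env(my_env)

is.null(my_list$count)                              # the list was not modified
exists("count", envir = my_env, inherits = FALSE)   # the environment was
my_env$count
```

Note that exists() searches enclosing environments by default (inherits =
TRUE), so passing inherits = FALSE is the safer test when an object of the
same name might exist in the workspace.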

 This seems to give the same results in this example. Am I committing  
 any sins by sneaking around the get()?
 
      if(exists('fout', env))
          fout <- rbind(env[['fout']], c(x1, x2, f))  # seems more direct
 

'env$fout' works here too.

 Thinking I also might be able to avoid the later assign(), I tried  
 these without success.
 
 fr <- function(x, env) {   ## Rosenbrock Banana function
     x1 <- x[1]
     x2 <- x[2]
     f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
     if(exists('fout', env))
         env[['fout']] <- rbind(env[['fout']], c(x1, x2, f))
     else
         fout <- c(x1=x1, x2=x2, f=f)
 
     f
 }

this would work with 'env$fout <- c(x1=x1, x2=x2, f=f)' following the
'else'. Hence, David's version might look like this:

fr <- function(x, env) {   ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    f <- 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
    if(exists('fout', env))
        env$fout <- rbind(env$fout, c(x1, x2, f))
    else
        env$fout <- c(x1=x1, x2=x2, f=f)
    f
}

out <- new.env()
ans <- optim(c(-1.2, 1), fr, env=out)
out$fout

-Matt

 out <- new.env()
 ans <- optim(c(-1.2, 1), fr, env=out)
 out$fout
 # NULL
 
 Is there no '[[<-' for environments? (Also tried '<<-' but I know
 that is sinful/ )
 
 -- 
 David.
      else
          fout <- c(x1=x1, x2=x2, f=f)
      assign('fout', fout, env)
      f
  }
 
  out <- new.env()
  ans <- optim(c(-1.2, 1), fr, env=out)
  out$fout
 

[R] analysing a three level reponse

2011-06-22 Thread Matt Ellis (Research)
Hello,
I am struggling to figure out how to analyse a dataset I have inherited
(please note this was conducted some time ago, so the data is as it is,
and I know it isn't perfect!).
 
A brief description of the experiment follows:
Pots of grass were grown in 1l pots of standard potting medium for 1
month with a regular light and watering regime. At this point they were
randomly given 1l of one of 4 different pesticides at one of 4 different
concentrations (100%, 75%, 50% or 25% in water). There were 20 pots of
grass for each pesticide/concentration giving 320 pots. There were no
control (untreated) pots. The response was measured after 1 week and
recorded as either:
B1 - grass dead
B2 - grass affected but not dead
B3 - no visible effect
 
I could analyse this as lethal effect vs non-lethal effect (B1 vs B2+B3)
or some effect vs no effect (B1+B2 vs B3) binomial model, but I can't
see how to do it with three levels.
 
Any pointing in the right direction greatly appreciated!
Thanks
Matt



Re: [R] Elbow criterion

2011-06-20 Thread Matt Shotwell
On Mon, 2011-06-20 at 13:38 +0200, Dominik P.H. Kalisch wrote:
 Hi,
 
 I would like to cluster a dataset with the ward algorithm.

I'm assuming that this refers to the agglomerative partitioning method
[1]. That is, the number of clusters is selected according to the data
partition that is sequentially optimal with respect to an `objective
function'. In order to apply the elbow criterion, it should be possible
to optimize over subsets of all possible data partitions where the
number of clusters is fixed.

Although the Ward method yields a sequence of data partitions with
decreasing cluster sizes, there is no guarantee that _any_ of these
partitions are optimal (except sequentially, of course). To apply the
elbow method post hoc seems dubious, but maybe no more so than the Ward
method itself.

There are clustering methods that optimize the data partition (w.r.t a
likelihood/posterior) with a fixed number of clusters, for instance,
those based on finite mixture models. The elbow principle and method
seem more valid in this context. See the R package 'mclust', and the
CRAN task view for cluster analysis:

http://cran.r-project.org/web/views/Cluster.html
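
For the plot that was asked about, the following is a rough sketch using only
base R, with the caveats above still applying. It assumes the built-in
USArrests data as a stand-in for the real dataset, and wss_for_k is a
hypothetical helper name. It cuts the Ward tree at k = 1, ..., 10 clusters and
plots the total within-cluster sum of squares against k.

```r
# Standardise the variables so no single one dominates the distances
x <- scale(USArrests)

# "ward.D2" is the method name in current versions of R; in 2011 the
# only Ward option was method = "ward" (since renamed "ward.D")
hc <- hclust(dist(x), method = "ward.D2")

# Total within-cluster sum of squares for the partition with k clusters
wss_for_k <- function(k) {
    cl <- cutree(hc, k = k)
    sum(sapply(split(as.data.frame(x), cl),
               function(g) sum(scale(g, scale = FALSE)^2)))
}

ks  <- 1:10
wss <- sapply(ks, wss_for_k)

plot(ks, wss, type = "b",
     xlab = "Number of clusters", ylab = "Within-cluster sum of squares")
```

Because the partitions are nested, the curve can only go down as k grows; the
'elbow' is wherever the decrease visibly levels off, which remains a judgment
call.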

 That works fine. But I can't find a method to plot the structure chart 
 to estimate the elbow criterion for the number of clusters.
 Can someone tell me how I can do it?
 
 Thanks for your help.
 Dominik
 

[1] Ward, J. H. (1963), “Hierarchical Grouping to Optimize an Objective
Function,” Journal of the American Statistical Association, 58, 236–244.

-- 
Matthew S. Shotwell
Assistant Professor, Department of Biostatistics
School of Medicine, Vanderbilt University
1161 21st Ave. S2323 MCN Office CC2102L
Nashville, TN 37232-2158



[R] Factor Analysis/Inputting Correlation Matrix

2011-06-11 Thread Matt Stati
Can someone please direct me to how to run a factor analysis in R by first 
inputting a correlation matrix? Does the function factanal allow one to read 
a correlation matrix instead of data vectors? 

Thanks, 
Matt. 
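
For reference, factanal() in the stats package does accept a covariance or
correlation matrix through its 'covmat' argument. A minimal sketch, assuming
the ability.cov example data that ships with R stands in for the poster's own
matrix:

```r
# ability.cov ships with R (datasets package): a covariance matrix of six
# ability tests plus the number of observations.
R <- cov2cor(ability.cov$cov)          # turn it into a correlation matrix

# Pass the matrix through 'covmat'; 'n.obs' is needed for the test statistic
fit <- factanal(covmat = R, factors = 2, n.obs = ability.cov$n.obs)
print(fit$loadings)
```

Without the raw data, factor scores cannot be computed; loadings and
uniquenesses are still available.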



Re: [R] Can we prepare a questionaire in R

2011-06-08 Thread Matt Shotwell
As Mike had written, there are frameworks for web-development with R.
RApache http://www.rapache.net is one. Also, see the R package Rook:
http://cran.r-project.org/web/packages/Rook/index.html .

On Wed, 2011-06-08 at 17:26 +0530, amrita gs wrote:
 How can we create HTML forms in R

Wouldn't you rather create HTML forms in HTML? See the links above to
use R for server-side scripting, for example, to receive form data from
a web browser.

 
   [[alternative HTML version deleted]]
 



Re: [R] Question about curve function

2011-06-07 Thread Matt Shotwell
On Tue, 2011-06-07 at 16:17 +0200, Uwe Ligges wrote:
 
 On 07.06.2011 11:57, peter dalgaard wrote:
 
  On Jun 6, 2011, at 11:22 , Prof Brian Ripley wrote:
 
  As a further example of the trickiness, the function method of plot() 
  relies on curve(x, ...) being a request to plot the function x(x) against 
  x.  I've added a comment to that effect to the help page.
 
  Ouch. This springs to mind:
 
  fortune(106)
 
  If the answer is parse() you should usually rethink the question.
  -- Thomas Lumley
 R-help (February 2005)
 
 
  but curve() predates that insight by half a decade or more. It could 
  probably do with a redesign, if anyone is up to it.
 
  By the way, it really does work if the 2nd arg is an expression object (as 
  opposed to an expression evaluating to an expression object):
 
  do.call(curve,list(expression(x)))
 
  or
 
  cl <- quote(curve(x))
  cl[[2]] <- expression(x)
  eval(cl)
 
  (The trouble with nonstandard evaluation is that it doesn't follow standard 
  evaluation rules...)
 
 If this is not already a fortune, I will add it.

And one more for Uwe's principle: when discontent, circumvent!  :)

 Which is why I usually circumvent curve(). It is typically faster to 
 just evaluate a function at positions x and plot it rather than thinking 
 minutes about how curve() expects its arguments.
 
 Uwe
 



Re: [R] Question about curve function

2011-06-05 Thread Matt Shotwell
I think there is trouble because expr in curve(expr) may be the name of
a function, and it's ambiguous whether 'x' should be interpreted as a
mathematical expression involving x, or the name of a function. Here are
some examples that work:

curve(I(x))
curve(1*x)
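
To put these side by side, here is a short sketch of the ways to draw
y(x) = x. identity_fn is a hypothetical name introduced for illustration; the
last form sidesteps curve() entirely by evaluating the function on a grid.

```r
# Expressions that visibly involve x are fine
curve(x^2, from = 0, to = 1)

# The identity needs a nudge so that it is treated as an expression in x
res <- curve(I(x), from = 0, to = 1)

# Alternatively, pass the name of a function of one argument ...
identity_fn <- function(x) x
curve(identity_fn, from = 0, to = 1)

# ... or sidestep curve(): evaluate on a grid and plot the result
xs <- seq(0, 1, length.out = 101)
plot(xs, identity_fn(xs), type = "l", xlab = "x", ylab = "y")
```

curve() invisibly returns the x and y values it plotted, which is handy for
checking what was actually evaluated.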

On Sun, 2011-06-05 at 12:07 -0500, Abhilash Balakrishnan wrote:
 Dear Sirs,
 
 I am a new user of the R package.  When I try to use the curve function it
 confuses me.
 
  curve(x^2)
 Works fine.
 
  curve(x)
 Makes a complaint I don't understand.  Why is x^2 valid and x is not?
 
 I check the documentation of curve, and it says the first argument must be
 an expression containing x.
 
  expression(x)
 Is an expression containing x.
 
  curve(expression(x))
 Makes a different complaint and mentions different lengths of x and y (but I
 use no y here).
 
 I understand that plotting the function y(x) = x is rather silly, but I want
 to know what I am doing wrong, for the sake of my understanding of how R
 works.
 
 Thank you for support.
 Abhilash B.
 
   [[alternative HTML version deleted]]
 



Re: [R] creating a vector from a file

2011-05-31 Thread Matt Shotwell
On Tue, 2011-05-31 at 15:36 +0200, heimat los wrote:
 Hello all,
 I am new to R and my question should be trivial. I need to create a word
 cloud from a txt file containing the words and their occurrence number. For
 that purposes I am using the snippets package [1].
 As it can be seen at the bottom of the link, first I have to create a vector
 (is that right that words is a vector?) like below.
 
  words <- c(apple=10, pie=14, orange=5, fruit=4)
 
 My problem is to do the same thing but create the vector from a file which
 would contain words and their occurrence number. I would be very happy if you
 could give me some hints.

How is the file formatted? Can you provide a small example?

 Moreover, to understand the format of the file to be inserted I write the
 vector words to a file.
 
  write(words, file="words.txt")
 
 However, the file words.txt contains only the values but not the
 names(apple, pie etc.).
 
 $ cat words.txt
 10 14 5 4
 
 It seems that I have to understand more about the data types in R.
 
 Thanks.
 PH
 
 http://www.rforge.net/doc/packages/snippets/cloud.html
 
   [[alternative HTML version deleted]]
 



Re: [R] creating a vector from a file

2011-05-31 Thread Matt Shotwell
On Tue, 2011-05-31 at 16:19 +0200, heimat los wrote:
 On Tue, May 31, 2011 at 4:12 PM, Matt Shotwell m...@biostatmatt.com
 wrote:
 On Tue, 2011-05-31 at 15:36 +0200, heimat los wrote:
  Hello all,
  I am new to R and my question should be trivial. I need to
 create a word
  cloud from a txt file containing the words and their
 occurrence number. For
  that purposes I am using the snippets package [1].
  As it can be seen at the bottom of the link, first I have to
 create a vector
  (is that right that words is a vector?) like below.
 
   words <- c(apple=10, pie=14, orange=5, fruit=4)
 
  My problem is to do the same thing but create the vector
 from a file which
  would contain words and their occurrence number. I would be
 very happy if you
  could give me some hints.
 
 
 How is the file formatted? Can you provide a small example?
 
 
 
 The file format is
 
 video tape=8
 object recognition=45
 object detection=23
 vhs tape=2
 
 But I can change it if needed with bash scripting.

A CSV might be more universal, but this will do.

 Regards
 

OK. Save the above as 'words.txt', then from the R prompt:

words.df <- read.table("words.txt", sep="=")
words.vec <- words.df$V2
names(words.vec) <- words.df$V1

Then use words.vec with the snippets::cloud function. I wasn't able to
install the snippets package and test the cloud function, because I am
still using R 2.13.0-alpha.

read.table returns what R calls a 'data frame'; basically a collection
of records over some number of fields. It's like a matrix but different,
since fields may take values of different types. In the example above,
the data frame returned by read.table has two fields named 'V1' and
'V2', respectively. The R expression 'words.df$V2' references the 'V2'
field of words.df, which is a vector. The last expression sets names for
words.vec, by referencing the 'V1' field of words.df. 
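
A self-contained version of the steps above, for anyone who wants to try them
without creating the file by hand. This is a sketch: it writes the four sample
lines to a temporary file (the 'path' variable is an assumption of the
example) and uses setNames() as a shortcut for the last two assignments.

```r
# Write the sample lines from the thread to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("video tape=8",
             "object recognition=45",
             "object detection=23",
             "vhs tape=2"), path)

# sep = "=" splits each line into the phrase (V1) and its count (V2)
words.df <- read.table(path, sep = "=", stringsAsFactors = FALSE)

# setNames() builds the named vector in one step
words.vec <- setNames(words.df$V2, words.df$V1)
words.vec

# To save a named vector together with its names, write both columns
write.table(data.frame(names(words.vec), words.vec), path, sep = "=",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
readLines(path)
```

The write.table() call also addresses the earlier observation that write()
stores only the values: the names have to be written out as a column of their
own.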

  
 
 




Re: [R] blank space escape sequence in R?

2011-04-25 Thread Matt Shotwell
You can embed hex escapes in strings (except \x00). The value(s) that
you embed will depend on the character encoding used on your platform. If
this is UTF-8, or some other ASCII compatible encoding, \x20 will work:

> "foo\x20bar"
[1] "foo bar"


For other locales, you might try charToRaw( ) to see the binary (hex)
representation for the space character on your platform, and substitute
this sequence instead.
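
A quick sketch of both checks, assuming an ASCII-compatible encoding such as
UTF-8:

```r
# "\x20" is resolved by the parser, so both literals are the same string
identical("foo\x20bar", "foo bar")

# Inspect the byte(s) used for a space on this platform
charToRaw(" ")

# Go the other way: build the string from a raw byte
rawToChar(as.raw(0x20))
```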

On Mon, 2011-04-25 at 15:01 +0200, Mark Heckmann wrote:
 Is there a blank space escape sequence in R, i.e. something like \sp etc. to 
 produce a blank space?
 
 TIA
 Mark
 –––
 Mark Heckmann
 Blog: www.markheckmann.de
 R-Blog: http://ryouready.wordpress.com
 



Re: [R] blank space escape sequence in R?

2011-04-25 Thread Matt Shotwell
I may have misread your original email. Whether you use a hex escape or
a space character, the resulting string in memory is identical:

> identical("a\x20b", "a b")
[1] TRUE

But, if you were to read a file containing the six characters "a\x20b"
(say with readLines), then the six characters would be read into
memory, and printed like this:

"a\\x20b"

That is, not with a space character substituted for \x20. So, now I'm
not sure this is a solution.
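
The difference can be reproduced with a temporary file. This is only a sketch;
the final gsub() line is one possible way, not necessarily the intended one,
to translate the literal text \x20 into a space after reading.

```r
path <- tempfile()
# The doubled backslash writes a literal backslash: the file holds a\x20b
writeLines("a\\x20b", path)

s <- readLines(path)
nchar(s)   # six characters; no space was substituted
print(s)   # the backslash is shown escaped when printed

# Replace the literal four-character sequence \x20 with a space
gsub("\\x20", " ", s, fixed = TRUE)
```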



Re: [R] Converting 16-bit to 8-bit encoding?

2011-04-21 Thread Matt Shotwell

On 04/21/2011 10:36 AM, Brian Buma wrote:

Hello all-

I have a question related to encoding.  I'm using a separate program which
takes either 16 bit or 8 bit (flat binary files) as inputs (they are raster
satellite imagery and the associated quality files), but can't handle both
at the same time.  Problem is the quality and the image come in different
formats (quality- 8bit, image- 16bit).  I need to switch the encoding on the


I think some more detail about these files is necessary. What do these 
16/8 bit quantities represent? Are these files just a sequence of such 
quantities, or is there meta information (i.e. image dimension)?



quality files to 16 bit, without altering anything else (they are img files
right now).  I imagine this is a fairly simple process, but I haven't been


Does 'img files' indicate that these files are formatted according to a 
standard?. Finally, are you using some R code to manipulate these files? 
Have an example, including data?



able to find a package or anything which can tell me how to do it- perhaps
I'm searching the wrong terms, but I did look.  Are there any methods to do
this quickly?  Ideally, the solution would involve reading in a list of
files and replacing the original with the new, 16 bit version, as I have
over 300 files to convert.  I hope that's clear.  Thanks in advance!




--
Matthew S Shotwell   Assistant Professor   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Converting 16-bit to 8-bit encoding?

2011-04-21 Thread Matt Shotwell

OK. I'm going to copy this back to R-help too.

With R, we can convert a file of 8-bit integers to 16-bit integers like so:

# Create a test file of 8-bit integers:
con <- file("test.8", "wb")
writeBin(sample(-1L:4L, 1024, TRUE), con, size=1)
close(con)

# Convert test.8 to test.16
icon <- file("test.8", "rb")
ocon <- file("test.16", "wb")
while(length(dat <- readBin(icon, "integer", 1024, size=1)) > 0)
    writeBin(dat, ocon, size=2)
close(icon)
close(ocon)

This assumes (without considering a more formal description of the 
format) that the file and your computing platform agree on how 
multi-byte signed integers are represented.
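
For a batch of files, the loop above could be wrapped in a function. The
sketch below makes several assumptions that should be checked against the
real data: convert_8_to_16 is a hypothetical helper name, the byte order is
stated explicitly (little-endian here, which may not be what the downstream
program expects), and the round trip uses temporary files rather than real
imagery.

```r
# Hypothetical helper: copy a headerless file of signed 8-bit integers
# into a file of signed 16-bit integers, one chunk at a time.
convert_8_to_16 <- function(infile, outfile, endian = "little",
                            chunk = 65536L) {
    icon <- file(infile, "rb")
    on.exit(close(icon), add = TRUE)
    ocon <- file(outfile, "wb")
    on.exit(close(ocon), add = TRUE)
    while (length(dat <- readBin(icon, "integer", chunk, size = 1)) > 0)
        writeBin(dat, ocon, size = 2, endian = endian)
    invisible(outfile)
}

# Round trip on a small test file
f8  <- tempfile(fileext = ".8")
f16 <- tempfile(fileext = ".16")
vals <- rep(-1L:4L, length.out = 1000)
writeBin(vals, f8, size = 1)
convert_8_to_16(f8, f16)
back <- readBin(f16, "integer", n = length(vals), size = 2, endian = "little")
identical(back, vals)
file.size(f16) == 2 * file.size(f8)
```

The function writes to a separate output file on purpose; to replace the
originals, one could loop over list.files() and rename each converted file
once the round-trip check passes.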


Hope that will get you going.

On 04/21/2011 11:02 AM, Brian Buma wrote:

Apologies.  The 8-bit file (the one that needs to be converted) is just
a series of integers, -1 to 4, which is no doubt why they are encoded in
8 bit.  They don't need to be changed numerically, just put in a 16-bit
encoding.  No meta info, headerless.  All the data is MODIS satellite
imagery.

I have been using the raster program to visualize things, and
processing (when I get that far) will be done in that program mainly.
I've used that program on a different project, and it seemed to work
well.  The actual program that can't handle two different inputs is
Timesat, a phenology-program (not R).  I was thinking that R could
probably do this conversion quick and easy (fairly), but haven't figured
out how to yet.

As an example, I have an NDVI file (flat binary, 16bit encoding)- so a
string of numbers, 4450, 4650, etc...  The associated quality file is
another string, 1,1,2,1,0, etc.  It's encoded as an 8bit file.
Conceptually, all it needs (I think) is to be read in and resaved in the
less memory-efficient 16-bit format.

Thanks!  Sorry if the explanation isn't clear.






--


Brian Buma
PhD Candidate
Ecology and Evolutionary Biology / CIRES
University of Colorado, Boulder

brian.b...@colorado.edu




--
Matthew S Shotwell   Assistant Professor   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] print.raw - but convert ASCII?

2011-04-19 Thread Matt Shotwell
On Tue, 2011-04-19 at 03:14 -0400, Duncan Murdoch wrote:
 On 11-04-18 9:51 PM, Matt Shotwell wrote:
  Does anyone know if there is a simple way to print raw vectors, such
  that ASCII characters are printed for bytes in the ASCII range, and
  their hex representation otherwise? rawToChar doesn't work when we have
  something like c(0x00, 0x00, 0x44, 0x00).
 
 Do you really need hex?  rawToChar(x, multiple=TRUE) comes close, but 
 displays using octal or symbolic escapes, e.g.

No, but I've almost learned to count efficiently in hex. :)

    [1] ""     "\001" "\002" "\003" "\004" "\005" "\006" "\a"   "\b"   "\t"   "\n"
   [12] "\v"   "\f"   "\r"   "\016" "\017" "\020" "\021" "\022" "\023" "\024" "\025"
   [23] "\026" "\027" "\030" "\031" "\032" "\033" "\034" "\035" "\036" "\037" " "
   [34] "!"    "\""   "#"    "$"    "%"    "&"    "'"    "("    ")"    "*"    "+"
 
 If you really do want hex, then you'll need something like
 
 ifelse( x < 32 | x >= 127, as.character(x), rawToChar(x, multiple=TRUE))

That does it. Thanks. -Matt
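
The suggestion can be wrapped in a small function. This is a sketch of the
idea rather than a print method: raw_to_mixed is a hypothetical name, and the
printable range is assumed to be bytes 32 to 126.

```r
# Hypothetical helper: printable ASCII bytes as characters, all other
# bytes as two-digit hex.
raw_to_mixed <- function(x) {
    stopifnot(is.raw(x))
    code <- as.integer(x)
    printable <- code >= 32L & code < 127L
    out <- sprintf("%02x", code)          # start with hex for every byte
    out[printable] <- rawToChar(x[printable], multiple = TRUE)
    out
}

raw_to_mixed(as.raw(c(0x00, 0x00, 0x44, 0x00)))
```

Converting only the printable bytes keeps embedded 0x00 bytes away from
rawToChar(), which is where the original attempt ran into trouble.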

 Duncan Murdoch



[R] print.raw - but convert ASCII?

2011-04-18 Thread Matt Shotwell
Does anyone know if there is a simple way to print raw vectors, such
that ASCII characters are printed for bytes in the ASCII range, and
their hex representation otherwise? rawToChar doesn't work when we have
something like c(0x00, 0x00, 0x44, 0x00).

-Matt



Re: [R] integer and floating-point storage

2011-04-14 Thread Matt Shotwell

Hi Mike,

There are some facilities for storing and manipulating small (2 bit) 
integers. See here:


http://cran.r-project.org/web/packages/ff/index.html

-Matt
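
If an extra package is not an option, the four-values-per-byte idea from the question can also be sketched in base R. This is an illustration only, not how ff or PLINK store data; pack2 and unpack2 are hypothetical names, and the sketch assumes every value lies in 0 to 3.

```r
# Pack integers in 0..3 into a raw vector, four values per byte.
pack2 <- function(g) {
    stopifnot(all(g %in% 0:3))
    n <- length(g)
    # pad with zeros so the length is a multiple of four
    g <- c(as.integer(g), integer((4L - n %% 4L) %% 4L))
    m <- matrix(g, nrow = 4L)
    bytes <- m[1, ] + 4L * m[2, ] + 16L * m[3, ] + 64L * m[4, ]
    structure(as.raw(bytes), n = n)     # remember the unpadded length
}

# Recover the original integers from a packed raw vector.
unpack2 <- function(p) {
    b <- as.integer(p)
    m <- rbind(b %% 4L, (b %/% 4L) %% 4L, (b %/% 16L) %% 4L, b %/% 64L)
    as.vector(m)[seq_len(attr(p, "n"))]
}

counts <- c(0, 1, 2, 3, 2, 1)           # 3 stands for missing, as in the question
packed <- pack2(counts)
unpack2(packed)
```

The packed vector has to be unpacked before it can be used as a regressor, so this trades speed for memory.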

On 04/14/2011 01:20 PM, Mike Miller wrote:

I note that current implementations of R use 32-bit integers for
integer vectors, but I am working with large arrays that contain
integers from 0 to 3, so they could be stored as unsigned 8-bit
integers. Can R do this? (FYI -- This is for storing minor-allele counts
for genetic studies. There are 0, 1 or 2 minor alleles and 3 would
represent missing.)

It is theoretically possible to store such data with four integers per
byte. This is what PLINK (GPL license) does in its binary (.bed)
pedigree format:

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped

That might be too much to hope for. ;-)

I think that the R system uses double-precision floating point numbers
by default. When I impute minor-allele counts, I get posterior expected
values ranging from 0 to 2 (called dosages). The imputation isn't very
precise, so it would be fine to store such data using one or two bytes.
(The values are used as regressors and small changes would have minimal
impact on results.) I could use unsigned 8-bit integers (0 to 255),
probably using only 0 to 254 so that 1 and 2 could be represented with
perfect precision as 127/127 and 254/127 (but I would do regression on
the integer values). Or I could use 16 bits, doubling memory load and
improving precision. It would be convenient if R could work with
half-precision floating-point numbers (binary16):

http://en.wikipedia.org/wiki/Half_precision_floating-point_format

Can R do that?

If not, is anyone interested in working on developing some of these
features in R? We have GPL code from PLINK and Octave that might help a
lot.

http://www.gnu.org/software/octave/doc/interpreter/Integer-Data-Types.html

Best,

Mike

--
Michael B. Miller, Ph.D.
Bioinformatics Specialist
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota





--
Matthew S Shotwell   Assistant Professor   School of Medicine
 Department of Biostatistics   Vanderbilt University



[R] understanding dump.frames; typo;

2011-04-12 Thread Matt Shotwell
When a function I have stop()s, I'd like it to return its evaluation 
frame, but not halt execution of the script. In experimenting with this, 
I became confused with dump.frames. From ?dump.frames:


 If ‘dump.frames’ is installed as the error handler, execution will
 continue even in non-interactive sessions.  See the examples for
 how to dump and then quit.

Suppose I save the following script to dump-test.R:

options(error=dump.frames)
cat("interactive:", interactive(), "\n")
f <- function() {
    stop("dump-test-error")
    cat("execution continues within f\n")
}
f()
cat("execution continues outside of f\n")
if(exists("last.dump"))
    cat("last.dump is available\n")

From an interactive R prompt, execution is halted at 'stop':

R> source('dump-test.R')
interactive: TRUE
Error in f() : dump-test-error

Using Rscript, execution continues depending on whether you source() the 
file with the -e flag, or pass the file as an argument.


matt@pal ~$ Rscript dump-test.R
interactive: FALSE
Error in f() : dump-test-error
execution continues outside of f
last.dump is available

matt@pal ~$ Rscript -e "source('dump-test.R')"
interactive: FALSE
Error in f() : dump-test-error
Calls: source -> eval.with.vis -> eval.with.vis -> f

It seems that interactiveness (as tested by interactive()) doesn't come 
into play, yet execution does *not* always continue. What am I missing? 
Alternative solutions are also welcome.


-Matt

P.S. There is a typo in the help file: "The dumped object contain the 
call stack..." should read "The dumped object contains the call stack..."
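
One possible alternative, sketched here without any claim that it explains the Rscript behaviour above: call dump.frames() from a calling handler and let tryCatch() absorb the error, so the script keeps running in every mode. The names local_value and f.dump are made up for this illustration.

```r
f <- function() {
    local_value <- 42          # something worth inspecting after the failure
    stop("dump-test-error")
}

# The calling handler runs while f's frame is still on the stack, so
# dump.frames() can record it; tryCatch() then absorbs the error.
msg <- tryCatch(
    withCallingHandlers(f(), error = function(e) dump.frames("f.dump")),
    error = function(e) conditionMessage(e)
)

# f.dump is assigned in the global environment. Find the dumped frame
# that holds local_value and read the value back.
hit <- vapply(seq_along(f.dump), function(i)
    exists("local_value", envir = f.dump[[i]], inherits = FALSE), logical(1))
frame <- f.dump[[which(hit)[1]]]
get("local_value", envir = frame)
cat("execution continues outside of f\n")
```

Because the error is handled by tryCatch(), the options(error=) setting is never consulted.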


> sessionInfo()
R version 2.13.0 alpha (2011-03-18 r54865)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.13.0

--
Matthew S Shotwell   Assistant Professor   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Examples of web-based Sweave use?

2011-04-06 Thread Matt Shotwell
That's an interesting idea. I had written a long email describing a 
proof-of-concept, but decided to post it to the website below instead.


http://biostatmatt.com/archives/1184

Matt

On 04/04/2011 07:31 AM, carslaw wrote:

I appreciate that this is OT, but I'd be grateful for pointers to examples of
where
Sweave has been used for web-based applications.  In particular, examples of
where reports/analyses are produced automatically through submission of data
to a web-sever.  I am mostly interested in situations where pdf reports have
been produced rather than, say, a plot/table etc shown on a web page.

I've had limited success finding examples on this.

Many thanks.

David Carslaw


Environmental Research Group
MRC-HPA Centre for Environment and Health
King's College London
Franklin Wilkins Building
Stamford Street
London SE1 9NH

david.cars...@kcl.ac.uk







--
Matthew S Shotwell   Assistant Professor   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] library(foreign) read.spss warning

2011-03-26 Thread Matt Shotwell
There is some information about this subtype in the PSPP source code,
and for other subtypes not yet implemented by read.spss. The PSPP source
code indicates that this subtype consists of "Value labels for long
strings", which isn't very illuminating to me (probably because I don't
use PSPP, or SPSS, though I increasingly have need to import SPSS data
files). Copied below are the relevant bits.

-Matt

From (the PSPP source file) src/data/sys-file-reader.c:

enum
  {
/* subtypes 0-2 unknown */
EXT_INTEGER   = 3,  /* Machine integer info. */
EXT_FLOAT = 4,  /* Machine floating-point info. */
EXT_VAR_SETS  = 5,  /* Variable sets. */
EXT_DATE  = 6,  /* DATE. */
EXT_MRSETS= 7,  /* Multiple response sets. */
EXT_DATA_ENTRY= 8,  /* SPSS Data Entry. */
/* subtypes 9-10 unknown */
EXT_DISPLAY   = 11, /* Variable display parameters. */
/* subtype 12 unknown */
EXT_LONG_NAMES= 13, /* Long variable names. */
EXT_LONG_STRINGS  = 14, /* Long strings. */
/* subtype 15 unknown */
EXT_NCASES= 16, /* Extended number of cases. */
EXT_FILE_ATTRS= 17, /* Data file attributes. */
EXT_VAR_ATTRS = 18, /* Variable attributes. */
EXT_MRSETS2   = 19, /* Multiple response sets (extended). */
EXT_ENCODING  = 20, /* Character encoding. */
EXT_LONG_LABELS   = 21  /* Value labels for long strings. */
  };

and

  static const struct extension_record_type types[] =
{   
  /* Implemented record types. */
  { EXT_INTEGER,  4, 8 },
  { EXT_FLOAT,8, 3 },
  { EXT_MRSETS,   1, 0 }, 
  { EXT_DISPLAY,  4, 0 },
  { EXT_LONG_NAMES,   1, 0 },
  { EXT_LONG_STRINGS, 1, 0 },
  { EXT_NCASES,   8, 2 },
  { EXT_FILE_ATTRS,   1, 0 },
  { EXT_VAR_ATTRS,1, 0 },
  { EXT_MRSETS2,  1, 0 },
  { EXT_ENCODING, 1, 0 },
  { EXT_LONG_LABELS,  1, 0 },

  /* Ignored record types. */
  { EXT_VAR_SETS, 0, 0 },
  { EXT_DATE, 0, 0 },
  { EXT_DATA_ENTRY,   0, 0 },
};


On Fri, 2011-03-25 at 18:39 -0500, Robert Baer wrote:
 I got the following:
 > library(foreign)
 > swal = read.spss("swallowing.sav", to.data.frame =TRUE)
 Warning message:
 In read.spss("swallowing.sav", to.data.frame = TRUE) :
   swallowing.sav: Unrecognized record type 7, subtype 21 encountered in system file
  
 
 The bulk of the data seems to read in  a usable form, but I'm curious about 
 what might be getting lost because I don't know how to translate type 7, 
 subtype 21.  I did not generate the SPSS data so I'm not certain of the 
 version, but I'm assuming version 18 or 19.  I did a quick Find on the PSPP 
 manual for Type 7 and subtype 21 and came up dry.
 
 Any insights or clues how I might learn more?  
 
 Thanks,
 Rob
 
 
  R.Version()
 $platform
 [1] i386-pc-mingw32
 
 $arch
 [1] i386
 
 $os
 [1] mingw32
 
 $system
 [1] i386, mingw32
 
 $status
 [1] 
 
 $major
 [1] 2
 
 $minor
 [1] 12.2
 
 $year
 [1] 2011
 
 $month
 [1] 02
 
 $day
 [1] 25
 
 $`svn rev`
 [1] 54585
 
 $language
 [1] R
 
 $version.string
 [1] R version 2.12.2 (2011-02-25)
 
 
 
 --
 Robert W. Baer, Ph.D.
 Professor of Physiology
 Kirksville College of Osteopathic Medicine
 A. T. Still University of Health Sciences
 Kirksville, MO 63501
 660-626-232
 FAX 660-626-2965
 
 



Re: [R] Venn Diagram corresponding to size in R

2011-03-09 Thread Matt Shotwell
Try here:
https://stat.ethz.ch/pipermail/r-help/2003-February/029393.html


On Tue, 2011-03-08 at 20:25 -0500, Shira Rockowitz wrote:
 I was wondering if anyone could help me figure out how to make a Venn
 diagram in R where the circles are scaled to the size of each dataset.  I
 have looked at the information for venn (in gplots) and vennDiagram (in
 limma) and I cannot seem to figure out what parameter to change.  I have
 looked this up online and do not seem to be seeing anyone else who has
 posted this question or the answer to it before.  I see graphs though that
 are purported to be made in R that are scaled like this, so I think it must
 be possible, although I do not know if they were made with a custom
 function.  If I have just not been searching for this question correctly,
 and it has already been asked, please direct me to the earlier question.  I
 would like to thank you all in advance for your help!
 ~Shira
 
 



Re: [R] assignment by value or reference

2011-03-08 Thread Matt Shotwell
On 03/08/2011 07:20 AM, Xiaobo Gu wrote:
 On Wed, Sep 15, 2010 at 5:05 PM, Uwe Ligges
 lig...@statistik.tu-dortmund.de  wrote:
 See the R Language Definition manual. Since R knows about lazy
evaluation,
 it is sometimes neither by reference nor by value.
 If you want to think binary, then by value fits better than by
 reference.
 Hi,
 Can we think it's eventually by value?

Not always (see in-line below).


 For simple functions such as:
 is(df[[1]], "logical")
 used to test whether the first column of data frame df is of type
 logical, will a new vector be created and used inside the is function?

No, df[[1]] isn't copied in this case. However, if you subset an atomic
vector (subset+assignment is different!), there is copying. For example:

> df <- data.frame(x=c(FALSE,TRUE))
> tracemem(df[[1]])
[1] "<0x217afa8>"
> is(df[[1]],"logical")
[1] TRUE
> is(df[[1]][],  "logical")
tracemem[0x217afa8 -> 0xf9d198]: ...cut...
[1] TRUE
> is(df[[1]][1], "logical")
[1] TRUE

Note that tracemem doesn't catch the copying that occurs during 
evaluation of the last expression. As a strategy, R avoids copying when 
it's clearly not necessary from the perspective of the R interpreter. 
There are some notable cases where copying is obviously not necessary 
from the user perspective (e.g. contiguous subsetting), but avoiding a 
copy in these cases might be difficult to implement in R's 
parser/evaluator framework. Here's another simple exception:

> x <- 1
> tracemem(x)
[1] "<0x18984b8>"
> x <- x + 1
tracemem[0x18984b8 -> 0x207e568]: ...cut...


 Another example,

 dbWriteTable(con, tablename, df) will write the content of data
 frame df into a database table, will a new data frame object created
 and used inside the dbWriteTable function?

No, but if dbWriteTable modifies its local variable that was assigned
df, then df may be copied.
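
A small illustration of that last point, using a made-up function (modify_copy is hypothetical and has nothing to do with DBI): modifying the argument inside the function leaves the caller's data frame untouched, which is what by-value semantics looks like from the outside.

```r
# Hypothetical function that modifies its local copy of the argument.
modify_copy <- function(d) {
    d$x[1] <- TRUE     # changes the local variable only
    d
}

df  <- data.frame(x = c(FALSE, TRUE))
out <- modify_copy(df)

df$x     # the caller's object is unchanged
out$x    # the returned object carries the modification
```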


 Thanks.



 Uwe Ligges



 On 05.09.2010 17:19, Xiaobo Gu wrote:

 Hi Team,

   Can you please tell me the rules of assignment in R, by
value or
 by reference.

  From my about 3 months of experience of part time job of R, it
seems most
 times it is by value, especially in function parameter and return
values
 assignment; and it is by reference when referencing container
sub-objects of

This is a function call convention (i.e. passing by value), as 
distinguished from an assignment convention (I'm not certain they're
equivalent in R). In general R functions pass by value. There are
exceptions here also, notably R environments. For 
example:

> f <- function(e) assign("a", 1, e)
> e <- new.env()
> f(e)
> objects(e)
[1] "a"

Under strict pass-by-value convention, e would remain unchanged. In 
general, assignments are by value. However, R environments are an 
exception; assignment is by reference:

> r <- e
> objects(r)
[1] "a"
> assign("b", 2, r)
> objects(r)
[1] "a" "b"
> objects(e)
[1] "a" "b"

In this sense, the calling/assignment convention is a property of the 
objects being passed/assigned. I think that is consistent with Uwe's 
comment above.

Best,
Matt
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

 container objects, such as elements of List objects and row/column
objects
 of DataFrame objectes; but it is by value when referencing the
smallest unit
 of element of a container object, such as cell of data frame
objects.





 Xiaobo.Gu











Re: [R] Rapache ( was Developing a web crawler )

2011-03-06 Thread Matt Shotwell
On Sun, 2011-03-06 at 08:06 -0500, Mike Marchywka wrote: 
 
 
 
 
 
 
  Date: Thu, 3 Mar 2011 13:04:11 -0600
  From: matt.shotw...@vanderbilt.edu
  To: r-help@r-project.org
  Subject: Re: [R] Developing a web crawler / R webkit or something 
  similar? [off topic]
 
  On 03/03/2011 08:07 AM, Mike Marchywka wrote:
  
  
  
  
  
  
  
   Date: Thu, 3 Mar 2011 01:22:44 -0800
   From: antuj...@gmail.com
   To: r-help@r-project.org
   Subject: [R] Developing a web crawler
  
   Hi,
  
   I wish to develop a web crawler in R. I have been using the 
   functionalities
   available under the RCurl package.
   I am able to extract the html content of the site but i don't know how 
   to go
  
   In general this can be a big effort but there may be things in
   text processing packages you could adapt to execute html and javascript.
   However, I guess what I'd be looking for is something like a webkit
   package or other open source browser with or without an R interface.
   This actually may be an ideal solution for a lot of things as you get
   all the content handlers of at least some browser.
  
  
   Now that you mention it, I wonder if there are browser plugins to handle
   R content ( I'd have to give this some thought, put a script up as
   a web page with mime type test/R and have it execute it in R. )
 
  There are server-side solutions for this sort of thing. See
  http://rapache.net/ . Also, there was a string of messages on R-devel
  some years ago addressing the mime type issue; beginning here:
  http://tolstoy.newcastle.edu.au/R/devel/05/11/3054.html . Though I don't
  know whether there was a resolution. Some suggestions were text/x-R,
  text/x-Rd, application/x-RData.
 
 The rapache demo looks like something I could use right away
 but I haven't looked into the handlers yet. I have installed rapache now
 on my debian system ( still have config issues but I did get apach2 to 
 restart LOL)
 Before I plow into this too far, how would this compare/compete with something
 like a PHP library for Rserve? That is the approach I had been pursuing.
 
 Thanks. 

Hi Mike, 

If you've built and configured RApache, then the difficult plowing is
over :). RApache operates at the top (HTTP) layer of the OSI stack,
whereas Rserve works at the lower transport/network layer. Hence, the
scope of Rserve applications is far more general. Extending Rserve to
operate at the HTTP layer (via PHP) will mean more work.

RApache offers high level functionality, for example, to replace PHP
with R in web pages. No interface code is necessary. Here's a simple
"What's The Time?" webpage using RApache and yarr [1] to handle the
code:

 setContentType("text/html\n\n") 
<html>
<head><title>What's The Time?</title></head>
<body><pre>/= cat(format(Sys.time(), usetz=TRUE)) </pre></body>
</html>

Here's a live version: [2]. Interfacing PHP with Rserve in this context
would be useful if installation of R and/or RApache on the web host were
prohibited. A PHP/Rserve framework might also be useful in other
contexts, for example, to extend PHP applications (e.g. WordPress,
MediaWiki).

Best,
Matt

[1] http://biostatmatt.com/archives/1000
[2] http://biostatmatt.com/yarr/time.yarr

 
  -Matt
 
  
 
 



Re: [R] Developing a web crawler / R webkit or something similar? [off topic]

2011-03-03 Thread Matt Shotwell

On 03/03/2011 08:07 AM, Mike Marchywka wrote:









Date: Thu, 3 Mar 2011 01:22:44 -0800
From: antuj...@gmail.com
To: r-help@r-project.org
Subject: [R] Developing a web crawler

Hi,

I wish to develop a web crawler in R. I have been using the functionalities
available under the RCurl package.
I am able to extract the html content of the site but i don't know how to go


In general this can be a big effort but there may be things in
text processing packages you could adapt to execute html and javascript.
However, I guess what I'd be looking for is something like a webkit
package or other open source browser with or without an R interface.
This actually may be an ideal solution for a lot of things as you get
all the content handlers of at least some browser.


Now that you mention it, I wonder if there are browser plugins to handle
R content ( I'd have to give this some thought, put a script up as
a web page with mime type test/R and have it execute it in R. )


There are server-side solutions for this sort of thing. See 
http://rapache.net/ . Also, there was a string of messages on R-devel 
some years ago addressing the mime type issue; beginning here: 
http://tolstoy.newcastle.edu.au/R/devel/05/11/3054.html . Though I don't 
know whether there was a resolution. Some suggestions were text/x-R, 
text/x-Rd, application/x-RData.


-Matt






about analyzing the html formatted document.
I wish to know the frequency of a word in the document. I am only acquainted
with analyzing data sets.
So how should i go about analyzing data that is not available in table
format.

Few chunks of code that i wrote:
w<-
getURL("http://www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B003DZ1Y8Q/ref=dp_reviewsanchor#FullQuotes")
write.table(w,"test.txt")
t<- readLines(w)

readLines also didnt prove out to be of any help.

Any help would be highly appreciated. Thanks in advance.
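
For the word-frequency part of the question, a rough base-R sketch follows. It assumes the page source is already held in a character string (for example the result of getURL), and it strips tags with a crude regular expression rather than a real HTML parser, so treat it as a starting point only. The sample string is made up.

```r
# Made-up stand-in for the downloaded page source.
html <- "<html><body><p>The Kindle is light. The screen is sharp.</p></body></html>"

text  <- gsub("<[^>]+>", " ", html)                        # drop anything that looks like a tag
words <- tolower(unlist(strsplit(text, "[^[:alpha:]]+")))  # split on non-letters
words <- words[nzchar(words)]                              # discard empty strings

freq <- sort(table(words), decreasing = TRUE)
freq
```

Pages that rely on scripts or embedded styles will need a proper parser before counting words.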









--
Matthew S Shotwell   Assistant Professor   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Robust variance estimation with rq (failure of the bootstrap?)

2011-03-01 Thread Matt Shotwell
Jim,

Thanks for pointing me to this article. The authors argue that the 
bootstrap intervals for a robust estimator may not be as robust as the 
estimator. In this context, robustness is measured by the breakdown 
point, which is supposed to measure robustness to outliers. Even so, the
authors found that the upper bound of a quantile bootstrap interval for 
the sample median was nearly as robust as the sample median. That brings
some comfort in using quantile bootstrap intervals in quantile
regression.

Does the sandwich estimator assume that errors are independent? And a 
related question: Does the rq function allow the user to specify 
clusters/grouping among the observations?

Best,
Matt

On Tue, 2011-03-01 at 05:35 -0600, James Shaw wrote:
 Matt:
 
 Thanks for your prompt reply.
 
 The disparity between the bootstrap and sandwich variance estimates
 derived when modeling the highly skewed outcome suggest that either
 (A) the empirical robust variance estimator is underestimating the
 variance or (B) the bootstrap is breaking down.  The bootstrap
 variance estimate of a robust location estimate is not necessarily
 robust, see Statistics & Probability Letters 50 (2000) 49-53.  Since
 submitting my earlier post, I have noticed that the robust kernel
 variance estimate is similar to the bootstrap estimate.  Under what
 conditions would one expect Koenker and Machado's sandwich variance
 estimator, which uses a local estimate of the sparsity, to fail?
 
 --
 Jim
 
 
 
 On Mon, Feb 28, 2011 at 8:59 PM, Matt Shotwell m...@biostatmatt.com wrote:
  Jim,
 
  If repeated measurements on patients are correlated, then resampling all
  measurements independently induces an incorrect sampling distribution
  (= incorrect variance) on a statistic of these data. One solution, as
  you mention, is the block or cluster bootstrap, which preserves the
  correlation among repeated observations in resamples. I don't
  immediately see why the cluster bootstrap is unsuitable.
 
  Beyond this, I would be concerned about *any* variance estimates that
  are blind to correlated observations.
 
  The bootstrap variance estimate may be larger than the asymptotic
  variance estimate, but that alone isn't evidence to favor one over the
  other.
 
  Also, I can't justify (to myself) why skew would hamper the quality of
  bootstrap variance estimates. I wonder how it affects the sandwich
  variance estimate...
 
  Best,
  Matt
 
  On Mon, 2011-02-28 at 17:50 -0600, James Shaw wrote:
  I am fitting quantile regression models using data collected from a
  sample of 124 patients.  When modeling cross-sectional associations, I
  have noticed that nonparametric bootstrap estimates of the variances
  of parameter estimates are much greater in magnitude than the
  empirical Huber estimates derived using summary.rq's nid option.
  The outcome variable is severely skewed, and I am afraid that this may
  be affecting the consistency of the bootstrap variance estimates.  I
  have read that the m out of n bootstrap can be used to overcome this
  problem.  However, this procedure requires both the original sample
  (n) and the subsample (m) sizes to be large.  The version implemented
  in rq.boot does not appear to provide any improvement over the naive
  bootstrap.  Ultimately, I am interested in using median regression to
  model changes in the outcome variable over time.  Summary.rq's robust
  variance estimator is not applicable to repeated-measures data.  I
  question whether the block (cluster) bootstrap variance estimator,
  which can accommodate intraclass correlation, would perform well.  Can
  anyone suggest alternatives for variance estimation in this situation?
  Regards,
 
  Jim
 
 
  James W. Shaw, Ph.D., Pharm.D., M.P.H.
  Assistant Professor
  Department of Pharmacy Administration
  College of Pharmacy
  University of Illinois at Chicago
  833 South Wood Street, M/C 871, Room 266
  Chicago, IL 60612
  Tel.: 312-355-5666
  Fax: 312-996-0868
  Mobile Tel.: 215-852-3045
 
 
 
 
 
 
 
 -- 
 James W. Shaw, Ph.D., Pharm.D., M.P.H.
 Assistant Professor
 Department of Pharmacy Administration
 College of Pharmacy
 University of Illinois at Chicago
 833 South Wood Street, M/C 871, Room 266
 Chicago, IL 60612
 Tel.: 312-355-5666
 Fax: 312-996-0868
 Mobile Tel.: 215-852-3045
 


Re: [R] Robust variance estimation with rq (failure of the bootstrap?)

2011-02-28 Thread Matt Shotwell
Jim, 

If repeated measurements on patients are correlated, then resampling all
measurements independently induces an incorrect sampling distribution
(= incorrect variance) on a statistic of these data. One solution, as
you mention, is the block or cluster bootstrap, which preserves the
correlation among repeated observations in resamples. I don't
immediately see why the cluster bootstrap is unsuitable.
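
To make the idea concrete, here is a minimal sketch of a cluster bootstrap in base R. It resamples whole patients rather than single measurements. The function name cluster_boot, the column names, and the simulated data are all made up for illustration, and the statistic is a plain median rather than an rq fit; with quantreg installed, one would presumably pass a function that refits the model to each resample and returns a coefficient.

```r
# Resample clusters (e.g. patients) with replacement and recompute a statistic.
cluster_boot <- function(data, id, stat, R = 200) {
    groups <- split(seq_len(nrow(data)), data[[id]])   # row indices per cluster
    replicate(R, {
        picked <- sample(length(groups), replace = TRUE)
        rows <- unlist(groups[picked], use.names = FALSE)
        stat(data[rows, , drop = FALSE])
    })
}

# Simulated repeated measures: 20 patients, 3 correlated observations each.
set.seed(1)
d <- data.frame(patient = rep(1:20, each = 3),
                y = rexp(60) + rep(rnorm(20), each = 3))

reps <- cluster_boot(d, "patient", function(s) median(s$y))
var(reps)      # cluster bootstrap variance estimate of the median
```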

Beyond this, I would be concerned about *any* variance estimates that
are blind to correlated observations.

The bootstrap variance estimate may be larger than the asymptotic
variance estimate, but that alone isn't evidence to favor one over the
other.

Also, I can't justify (to myself) why skew would hamper the quality of
bootstrap variance estimates. I wonder how it affects the sandwich
variance estimate...

Best,
Matt

On Mon, 2011-02-28 at 17:50 -0600, James Shaw wrote:
 I am fitting quantile regression models using data collected from a
 sample of 124 patients.  When modeling cross-sectional associations, I
 have noticed that nonparametric bootstrap estimates of the variances
 of parameter estimates are much greater in magnitude than the
 empirical Huber estimates derived using summary.rq's nid option.
 The outcome variable is severely skewed, and I am afraid that this may
 be affecting the consistency of the bootstrap variance estimates.  I
 have read that the m out of n bootstrap can be used to overcome this
 problem.  However, this procedure requires both the original sample
 (n) and the subsample (m) sizes to be large.  The version implemented
 in rq.boot does not appear to provide any improvement over the naive
 bootstrap.  Ultimately, I am interested in using median regression to
 model changes in the outcome variable over time.  Summary.rq's robust
 variance estimator is not applicable to repeated-measures data.  I
 question whether the block (cluster) bootstrap variance estimator,
 which can accommodate intraclass correlation, would perform well.  Can
 anyone suggest alternatives for variance estimation in this situation?
 Regards,
 
 Jim
 
 
 James W. Shaw, Ph.D., Pharm.D., M.P.H.
 Assistant Professor
 Department of Pharmacy Administration
 College of Pharmacy
 University of Illinois at Chicago
 833 South Wood Street, M/C 871, Room 266
 Chicago, IL 60612
 Tel.: 312-355-5666
 Fax: 312-996-0868
 Mobile Tel.: 215-852-3045
 



Re: [R] Visualizing Points on a Sphere

2011-02-25 Thread Matt Shotwell
That's interesting. You might also like:
http://en.wikipedia.org/wiki/Von_Mises%E2%80%93Fisher_distribution

I'm not sure how to plot the wireframe sphere, but you can visualize the
points by transforming to Cartesian coordinates like so:

u <- runif(1000,0,1)
v <- runif(1000,0,1)
theta <- 2 * pi * u
phi   <- acos(2 * v - 1)
x <- sin(theta) * cos(phi)
y <- sin(theta) * sin(phi)
z <- cos(theta)
library(lattice)
cloud(z ~ x + y)
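
A note on the transformation: every point produced by the code above lies on the unit sphere, but the points are not uniformly spread, because z = cos(theta) with theta uniform on [0, 2*pi] piles points up near the poles. If uniformly distributed points are wanted, a variant that treats theta as the azimuth and phi as the polar angle can be sketched as follows (the plotting call is guarded so the sketch also runs without lattice installed):

```r
set.seed(1)
n <- 1000
theta <- 2 * pi * runif(n)          # azimuth, uniform on [0, 2*pi)
phi   <- acos(2 * runif(n) - 1)     # polar angle, so that cos(phi) is uniform on [-1, 1]

x <- sin(phi) * cos(theta)
y <- sin(phi) * sin(theta)
z <- cos(phi)

if (interactive() && requireNamespace("lattice", quietly = TRUE))
    print(lattice::cloud(z ~ x + y))
```

With this parameterisation the z coordinate is uniform on [-1, 1], which is the property that makes the points uniform on the sphere.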

-Matt

On Fri, 2011-02-25 at 14:21 +0100, Lorenzo Isella wrote:
 Dear All,
 I need to plot some points on the surface of a sphere, but I am not sure 
 about how to proceed to achieve this in R (or if it is suitable for this 
 at all).
 In any case, I am not looking for really fancy visualizations; for 
 instance you can consider the images between formulae 5 and 6 at
 
 http://bit.ly/hOgK9h
 
 Any suggestion is appreciated.
 Cheers
 
 Lorenzo
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Transfer function observed vs predicted values graph problem

2011-02-22 Thread Matt Coe
Hi,
I am trying to make a palaeoenvironmental transfer function using the R
package rioja that predicts the water-table (measured as depth to the water
table) of an area given the testate amoebae that are found there. I've
carried out weighted averaging of the data and am trying to produce a graph
that shows the observed water-table versus the model's predicted values.
Following the instructions in the rioja help booklet (see below), I end up
with a graph where the origin is not at the bottom left of the diagram, i.e.
the graph is showing some values that suggest that the water table is, say,
1m above ground.
I've tried entering the water-tables as negative values but the same thing
happens.
Does anybody know if there is something I'm missing out? Or is there a way
that, if the values returned are less than 0, then they can automatically be
put just as 0?
Any help would be most appreciated,
Thank you,
Matthew


My environmental matrix (x) is:
   SampleId WTD Moisture       pH        EC
1         1  20 91.72700 3.496674  85.02688
2         2   2 93.88913 3.550794  85.69465
3         3  26 90.30269 3.948559 113.19206
4         4   5 94.14427 3.697213  48.56375
5         5  30 90.04269 3.745020 108.57278

90   GAL_15  70 94.07849 3.777932  66.77673

The species matrix (y) contains the abundance of 32 species over 90 sites,
set out like this
  F1    AmpFlav AmpWri ArcCat    ArcDis
1  1 22.2929936  0.000  0.000     0.000
2  2 30.9677419  0.000  0.000 3.2258065


fit <- WA(y, x, tolDW = FALSE, use.N2=TRUE, check.data=TRUE, lean=FALSE)
# plot predicted vs. observed
plot(fit)
plot(fit, resid=TRUE)
# Water-table reconstruction
pred <- predict(fit, y)
#plot the reconstruction
plot(sites, pred$fit[, 1], type="b")
# cross-validation model using bootstrapping
fit.xv <- crossval(fit, cv.method="boot", nboot=1000)
par(mfrow=c(1,2))
plot(fit)
plot(fit, resid=TRUE)
plot(fit.xv, xval=TRUE)
plot(fit.xv, xval=TRUE, resid=TRUE)
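
On the question of setting values below 0 to 0: a minimal sketch using base R's pmax(), which takes the element-wise maximum of its arguments. The numbers are made up for illustration and are not output from the model above.

```r
# Hypothetical predicted water-table depths; NOT output from the WA model.
pred_wtd <- c(12.4, -3.1, 27.8, -0.6, 5.0)

# Replace every value below 0 with 0, leaving the others unchanged.
pred_clamped <- pmax(pred_wtd, 0)
pred_clamped
# 12.4  0.0 27.8  0.0  5.0
```

The same call could presumably be applied to pred$fit[, 1] before plotting. Note that this only changes the values that are displayed; it does not change the fitted model, so the clamped values should be labelled as such.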



Re: [R] Problem with writing a file in UTF-8

2011-02-21 Thread Matt Shotwell
Thomas, 

I wasn't able to reproduce your finding. The last two characters in my
'out.txt' file were just as expected. But, I'm in a UTF-8 locale. Your
locale affects the encoding of characters on your platform. If you're
not in a UTF-8 locale, then characters are converted from your native
encoding to UTF-8 (when you specify encoding="UTF-8"). In the process of
conversion, it's possible to lose information. You can test whether
there is a loss (or a change rather) when R writes these characters like
so:

# what does űŁ look like in binary (hex)?
raw_before <- charToRaw("űŁ")

# write 'out.txt' as before
out <- file(description="out.txt", open="w", encoding="UTF-8")
write(x="űŁ", file=out)
close(con=out)

# read in the two characters
out <- file(description="out.txt", open="r", encoding="UTF-8")
raw_after <- charToRaw(readChar(con=out, nchars=2))
close(con=out)

# compare the raw representations
identical(raw_before, raw_after)

This test passes on my machine. But, there's also the question of
whether these characters made it onto the R-help list unaltered. Also,
please include the result of sessionInfo() in your subsequent messages.
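
A variant of this check that tries not to depend on the locale, offered as a sketch only and not tested on Windows. It assumes the string can be written with \u escapes, converts it with enc2utf8(), and writes the bytes with useBytes = TRUE to a binary-mode connection so that R does not re-encode through the native encoding. The temporary file path is an illustration.

```r
# The two characters from the thread, written as Unicode escapes
# (U+0171 and U+0141) so the source itself is plain ASCII.
x <- "\u0171\u0141"

path <- tempfile(fileext = ".txt")

# Write the UTF-8 bytes as they are; sep = "" avoids a trailing newline.
con <- file(path, open = "wb")
writeLines(enc2utf8(x), con, sep = "", useBytes = TRUE)
close(con)

# Read the file back as raw bytes and compare with the expected UTF-8 bytes.
bytes_written  <- readBin(path, what = "raw", n = file.size(path))
bytes_expected <- charToRaw(enc2utf8(x))
identical(bytes_written, bytes_expected)
```

If the comparison is TRUE, the file holds the four bytes c5 b1 c5 81, which is the UTF-8 encoding of the two characters.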

Best,
Matt

> sessionInfo()
R version 2.11.1 (2010-05-31)
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

On Thu, 2011-02-17 at 13:54 -0800, tpklein wrote:

> Hello,
>
> I am working with a data frame containing character strings with many special
> symbols from various European languages.  When writing such character
> strings to a file using the UTF-8 encoding, some of them are converted in a
> strange way.  See the following example, run in R 2.12.1 on Windows 7:
>
> out <- file( description="out.txt", open="w", encoding="UTF-8")
> write( x="äöüßæűŁ", file=out )
> close( con=out )
>
> The last two symbols in the character string are converted to uL while all
> other characters are not changed (which is what I want).  How to explain
> this?  Does it have something to do with my locale?  And is there a way to
> work around this problem? -- Any help would be greatly appreciated.
>
> Thomas



[R] non-ascii characters in R output

2011-02-18 Thread Matt Shotwell

All,

I'd like to automatically output text from R to HTML. In doing this I've
run into trouble with non-ascii characters, as my browser (and
presumably others) does not render such characters correctly. For
example, the 'fancy' single quotes associated with summary.lm are
multi-byte characters on my platform. This particular problem is solved
by options(useFancyQuotes=FALSE). But now I'm concerned about other
non-ascii characters. As an overkill maybe, my current solution involves
capture.output and iconv(..., to="ASCII//TRANSLIT"). Are there other
sources of non-ascii characters? Is there a better or general solution?
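
One option that does not depend on the locale or on iconv's transliteration tables, given here as a sketch and not as something proposed in this thread: replace every non-ASCII code point with an HTML numeric character reference. The helper name html_escape_non_ascii is hypothetical; utf8ToInt(), intToUtf8() and enc2utf8() are base R.

```r
# Hypothetical helper: turn each non-ASCII character into "&#NNNN;".
html_escape_non_ascii <- function(x) {
  x <- enc2utf8(x)
  vapply(x, function(s) {
    cp <- utf8ToInt(s)                       # Unicode code points of s
    chars <- intToUtf8(cp, multiple = TRUE)  # one string per code point
    paste(ifelse(cp > 127, sprintf("&#%d;", cp), chars), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Fancy single quotes (U+2018, U+2019) around an ASCII letter.
html_escape_non_ascii("\u2018x\u2019")
# "&#8216;x&#8217;"
```

It could be applied to the lines returned by capture.output() before they are written to the HTML file. It does not escape the HTML special characters <, > and &, which would need separate handling.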


Best,
Matt

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.12.1

--
Matthew S Shotwell   Assistant Professor   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] non-ascii characters in R output

2011-02-18 Thread Matt Shotwell
OK, looks like my web browser does render non-ascii characters output by
R when it's given the encoding explicitly. This works for me: <meta
http-equiv="Content-Type" content="text/html; charset=UTF-8"/>. So
that's another solution, but not a general one.

-Matt

On Fri, 2011-02-18 at 12:47 -0600, Matt Shotwell wrote:
> All,
>
> I'd like to automatically output text from R to HTML. In doing this I've
> run into trouble with non-ascii characters, as my browser (and
> presumably others) does not render such characters correctly. For
> example, the 'fancy' single quotes associated with summary.lm are
> multi-byte characters on my platform. This particular problem is solved
> by options(useFancyQuotes=FALSE). But now I'm concerned about other
> non-ascii characters. As an overkill maybe, my current solution involves
> capture.output and iconv(..., to="ASCII//TRANSLIT"). Are there other
> sources of non-ascii character? Is there a better or general solution?
>
> Best,
> Matt
>
> > sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_2.12.1



