date:20120210

Actually, is there any way to get at additional information beyond the
classProbs?  In particular, is there any way to find out the
associated weights, or otherwise the row indices into the original
model matrix corresponding to the tested instances?

On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang yanghates...@gmail.com wrote:
 Oops, found trainControl's classProbs right after I sent!

 On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 I'm dealing with classification problems, and I'm trying to specify a
 custom scoring metric (recall@p, ROC, etc.) that depends on not just
 the class output but the probability estimates, so that caret::train
 can choose the optimal tuning parameters based on this metric.

 However, when I supply a trainControl summaryFunction, the data given
 to it contains only class predictions, so the only metrics possible
 are things like accuracy, kappa, etc.

 Is there any way to do this that I'm looking?  If not, could I put
 this in as a feature request?  Thanks!

 --
 Yang Zhang
 http://yz.mit.edu/



 --
 Yang Zhang
 http://yz.mit.edu/



-- 
Yang Zhang
http://yz.mit.edu/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Finding all the coefficients for a logit model

2012-02-10 Thread peter dalgaard


On Feb 10, 2012, at 06:08 , R. Michael Weylandt wrote:

 Add -1 to your glm to remove the intercept term -- that will force
 Friday to have its own coefficient.


That's one answer. 

Another one is that the coefficient (in the model with an intercept term) for 
Friday is zero with an s.e. of zero. Namely, by definition, the log-odds for 
Friday minus the log-odds for Friday.

-pd

 
 E.g.,
 
 dg - data.frame(value = rnorm(28), day = letters[1:7])
 lm(value ~ day, data = dg)
 lm(value ~ day - 1, data = dg)
 
 Michael
 
 On Thu, Feb 9, 2012 at 6:58 PM, Abraham Mathew abmathe...@gmail.com wrote:
 Let's say I have a variable, day, which is saved as a factor with 7 levels,
 and I use it in a
 logistic regression model. I ran the model using the car package in R and
 printed out the
 results.
 
 mod1 = glm(factor(status1) ~ factor(day), data=mydat,
 family=binomial(link=logit))
 print(summary(mod1))
 
 The result I get is:
 
 Coefficients:
 Estimate Std. Error z value Pr(|z|)
 (Intercept)   -0.4350 0.0379  -11.48   2e-16 ***
 factor(day)Monday -0.6072 0.0479  -12.69   2e-16 ***
 factor(day)Saturday0.5964 0.0559   10.67   2e-16 ***
 factor(day)Sunday  1.1140 0.0627   17.78   2e-16 ***
 factor(day)Thursday   -0.4492 0.0516   -8.71   2e-16 ***
 factor(day)Tuesday-0.9331 0.0496  -18.82   2e-16 ***
 factor(day)Wednesday  -0.8575 0.0486  -17.63   2e-16 ***
 
 
 It seems that Friday is being used as the baseline, but I want to know
 
 how I can acquire the coefficient for the baseline (friday)?
 
 
 I ran mod1$coefficients, but that didn't do the trick.
 
 
 Can anyone help.
 
 
 Thank you,
 
 Abraham M.
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Recall@p plot using ROCR?

Is it possible to use ROCR to plot a simple recall@p plot?  I.e., a
plot where the x-axis is the position into the ranked test set, and
the y-axis is the recall, so you can see what's the recall in the top
10% of the ranked results.

I searched through the performance() manual but found nothing.

(Not a cutoff-vs-recall graph, since the cutoff is the probability
estimate returned by the model.)

Thanks.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ggplot using scale_x_date gives Error in seq.int(r1$year, to$year, by)

Hi Aidan,

str is your friend:

 str(g)
'data.frame':   9 obs. of  3 variables:
 $ Date: chr  2011-12-23 2011-12-30 2012-01-06 2011-12-23 ...
 $ variable: Factor w/ 3 levels Price,Yield,..: 1 1 1 2 2 2 3 3 3
 $ value   : num  86.78 86.04 86.44 9.74 9.54 ...

You haven't turned the Date variable into an actual date yet.  Try:

g$Date - as.Date(g$Date)

Hadley

On Fri, Jan 6, 2012 at 9:12 AM, Aidan Corcoran
aidan.corcora...@gmail.com wrote:
 Dear all,

 ggplot gives me an error when trying to plot time series data using a
 date variable as the x axis.

 g-structure(list(Date = c(2011-12-23, 2011-12-30, 2012-01-06,
 2011-12-23, 2011-12-30, 2012-01-06, 2011-12-23, 2011-12-30,
 2012-01-06), variable = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
 3L, 3L, 3L), .Label = c(Price, Yield, CDS Spread), class = factor),
    value = c(86.777, 86.037, 86.437, 9.737, 9.542, 9.683, 580.132,
    576.866, 573.564)), .Names = c(Date, variable, value
 ), row.names = c(100L, 101L, 102L, 202L, 203L, 204L, 304L, 305L,
 306L), class = data.frame)

 g
          Date   variable value
 100 2011-12-23      Price  86.8
 101 2011-12-30      Price  86.0
 102 2012-01-06      Price  86.4
 202 2011-12-23      Yield   9.7
 203 2011-12-30      Yield   9.5
 204 2012-01-06      Yield   9.7
 304 2011-12-23 CDS Spread 580.1
 305 2011-12-30 CDS Spread 576.9
 306 2012-01-06 CDS Spread 573.6
 gp-ggplot(g,aes(x=Date,y=value,group=variable,lty=variable)) +geom_line()
 gp
 #THIS WORKS FINE (BUT AXIS LABELLING NOT VISIBLE WITH MORE DATA, HENCE I 
 WOULD LIKE TO USE SCALE_X_DATE)
 gp-gp+   scale_x_date()
 gp
 Error in seq.int(r1$year, to$year, by) : 'from' must be finite
 In addition: Warning messages:
 1: In get(x, envir = this, inherits = inh)(this, ...) :
  NAs introduced by coercion
 2: In min(x) : no non-missing arguments to min; returning Inf
 3: In max(x) : no non-missing arguments to max; returning -Inf
 4: In min(x) : no non-missing arguments to min; returning Inf
 5: In max(x) : no non-missing arguments to max; returning -Inf


 In previous help requests, the workaround of specifying the unit was 
 suggested:
 gp-gp+   scale_x_date(major=years)
 but this doesn't work for me (same error).

 Any help is very much appreciated! Thanks in advance.

 Aidan

 R version 2.14.1 (2011-12-22)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=English_Ireland.1252  LC_CTYPE=English_Ireland.1252
 LC_MONETARY=English_Ireland.1252
 [4] LC_NUMERIC=C                     LC_TIME=English_Ireland.1252

 attached base packages:
  [1] datasets  tools     grDevices grid      splines   graphics  stats
    tcltk     utils
 [10] methods   base

 other attached packages:
  [1] micEcon_0.6-6      miscTools_0.6-12   np_0.40-11
 cubature_1.0       boot_1.3-3
  [6] RODBC_1.3-3        sqldf_0.4-6.1      chron_2.3-41
 gsubfn_0.5-7       DBI_0.2-5
 [11] Haver_1.0          xtable_1.5-6       plm_1.2-7
 sandwich_2.2-7     MASS_7.3-16
 [16] Formula_1.0-1      nlme_3.1-102       bdsmatrix_1.0
 RBloomberg_0.4-150 rJava_0.9-1
 [21] gtools_2.6.2       gdata_2.8.2        ggplot2_0.8.9
 proto_0.3-9.2      zoo_1.7-4
 [26] reshape_0.8.4      plyr_1.6           svSocket_0.9-52
 TinnR_1.0.3        R2HTML_2.2
 [31] Hmisc_3.8-3        survival_2.36-10

 loaded via a namespace (and not attached):
 [1] cluster_1.14.1        digest_0.5.0          lattice_0.20-0
 RSQLite_0.10.0
 [5] RSQLite.extfuns_0.0.1 svMisc_0.9-63


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] GGplot controlling point size across range

On Thu, Jan 12, 2012 at 5:27 PM, Darran King darran.k...@csiro.au wrote:
 Hi all

 New to R and GGplot2 but loving the potential. I am trying to plot four
 separate point plots by looping over the data and plotting a different
 subset each time.

 When I plot the data as a point plot, the size of the points is determined
 by the data values used as below

 qplot(accum_rain, accum_g_radn, data = clim_sub[[i]], size = avgyld, colour
 = avgyld)

 The problem is that i want all four plots to be comparable, so a point size
 representing avgyld = 2000 should be the same on all four plots. However as
 the data for some plots has a smaller range than others and the plots are
 automatically scalling to the range of data in each plot, and the largest
 point is always assigned to the largest value a plot with a top value of say
 5000 with be represented with the same size point as a plot with a top value
 of 7000.

 Any tips on how to scale the point sizes to a defined range of classes and
 still plot the actual data to those classes?

Specify limits to scale_area:  + scale_area(limits = c(0, 1000))

If you're still having problems, you might want to try the ggplot2
mailing list, as Jeff suggested.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] pretty(range(data$z),10) Error

Hi Mario,

If you're still having problems, I'd suggest sending a small
reproducible example
(https://github.com/hadley/devtools/wiki/Reproducibility) to the
ggplot2 mailing list.

Hadley

On Tue, Jan 17, 2012 at 12:32 PM, Mario Giesel rr.gie...@yahoo.de wrote:
 Hello, R-List,
 I'm getting error messages when adding geom_density2d() [package ggplot2]:
 
 Fehler in pretty(range(data$z), 10) :
   NA/NaN/Inf in externem Funktionsaufruf (arg 1)
 Zusätzlich: Warnmeldungen:
 1: Removed 1 rows containing missing values (stat_contour).
 2: In min(x) : kein nicht-fehlendes Argument für min; gebe Inf zurück
 3: In max(x) : kein nicht-fehlendes Argument für max; gebe -Inf zurück
 4: In min(x) : kein nicht-fehlendes Argument für min; gebe Inf zurück
 5: In max(x) : kein nicht-fehlendes Argument für max; gebe -Inf zurück
 

 I installed 2.11.1 Patched in the hope to overcome this obstacle but in vain.
 I tried reading spss and csv format - no effect.

 While this one works:
 ggplot(sp2, aes(x=Qq04, y=Qq01)) + geom_point() + facet_wrap(~ segment,ncol=3)
 This one fails:
 ggplot(sp2, aes(x=Qq04, y=Qq01)) + geom_point() + facet_wrap(~ 
 segment,ncol=3) + geom_density2d()

 But it only fails for 2 out of 10 x-variables, although they do not differ in 
 structure.
 Data structure looks like this:

 segment Qq01 Qq02 Qq02a Qq04 Qq05 Qq07 Qq08 Qq10 Qq11 Qq15 Qq17a
 A 7 5 5 5 5 4 5 5 3 7 7
 A 5 4 4 3 4 3 4 4 4 5 6
 B 3 3 3 3 2 3 4 2 4 3 2
 B 6 4 5 3 3 3 4 4 3 6 6
 C 3 3 3 3 3 4 2 3 3 4 2
 C 2 1 4 3 1 4 1 1 3 4 2
 D 5 3 4 3 3 3 4 3 3 6 3
 D 5 3 4 2 2 4 4 3 3 4 4
 E 3 3 3 2 3 4 4 3 2 3 4
 E 7 5 5 5 5 4 4 5 1 7 7
 F 6 5 4 3 3 3 4 3 3 6 6
 F 5 3 4 3 3 4 4 3 3 4 4
 G 4 3 3 3 2 1 3 3 3 4 4
 G 6 5 5 5 5 4 5 5 3 7 7
 H 6 5 5 4 4 4 5 4 3 5 6
 H 7 5 5 5 4 4 5 5 3 7 7
 I 6 5 5 4 4 4 5 4 3 6 6
 I 6 3 4 2 3 4 4 3 4 6 4
 J 5 3 4 1 3 4 4 3 3 5 5
 J 4 2 4 3 2 2 3 2 3 4 4
 K 4 4 4 2 3 4 4 4 3 5 5
 K 6 4 3 3 2 3 4 3 3 6 5
 L 6 4 5 3 3 4 5 4 3 6 6
 L 3 1 2 2 1 3 3 1 3 4 4
 ...
 About 1.800 cases (more than 100 per segment).
 Did anybody experience similar problems?

 Thanks for all comments,
  Mario
        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plotting bar graph over a geographical map

Hi Simon,

You might want to try sending a small reproducible example
(https://github.com/hadley/devtools/wiki/Reproducibility) to the
ggplot2 mailing list.

Hadley

On Tue, Jan 31, 2012 at 10:53 PM, sjlabrie sjlab...@mit.edu wrote:
 Hi,

 I am looking for a way to plot bar on a map instead of the standard points.
 I have been using ggplot2 and maps libraries.
 The points are added with the function geom_point. I know that there is a
 function
 geom_bar but I can't figure out how to use it.

 Thank you for your help,

 Simon

 ### R-code
 library(ggplot2)
 library(maps)

 measurements - read.csv(all_podo.count.csv, header=T)
 allworld - map_data(world)

 pdf(map.pdf)
 ggplot(measurements, aes(long, lat)) +
  geom_polygon(data = allworld, aes(x = long, y = lat, group = group),
  colour = grey70, fill = grey70) +
  geom_point(aes(size = ref)) +
  opts(axis.title.x = theme_blank(),
  axis.title.y = theme_blank()) +
  geom_bar(aes(y = normcount))
 dev.off()
 ###




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Plotting-bar-graph-over-a-geographical-map-tp4346925p4346925.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ggplot2 geom_polygon fill

Hi Raimund,

To increase your chances of getting help, I'd recommend using the
ggplot2 mailing list, and reducing your example down to the essence of
the problem.  For example, the theme components don't affect the
problem, but make the code longer, and so harder to understand.

Hadley

On Mon, Feb 6, 2012 at 10:50 AM, raimund rai...@gmail.com wrote:
 Hi everyone,

 i've been trying to make a special plot with ggplot2, but I can't get it to
 fill the polygon I'd like to see filled so very very much.

 I want to display the difference or change in the distribution of the
 modified Rankin Scale between two groups. mRS is a scale for disability or
 daily activities competence.

 It looks like this.

 http://r.789695.n4.nabble.com/file/n4361919/rankinplot.png

 I just wish the polygons in between the bars would fill in the same colors
 as the bar segments do.
 Interestingly, in the example provided by the geom_polygon help page, there
 is a fill, which leads me to suspect that something with my data frame might
 be wrong.

 If I supply a colour argument, I get borders, but not always in the color
 I defined. The attached image has such borders, to make clear what polygons
 I am talking about. In the final plot I don't desire such borders, only the
 as of yet unattainable fill.


 Here's my code:

 library(ggplot2)
 library(plyr)


 # define good looks

 no_margins - opts(
  axis.line =         theme_blank(),
  axis.text.x =       theme_blank(),
  axis.ticks =        theme_blank(),
  axis.title.x = theme_text(size = 12, vjust = 0.15),
  axis.title.y = theme_text(angle = 90, size = 12, vjust = 0.2),
  axis.ticks.length = unit(0, cm),
  axis.ticks.margin = unit(0, cm),
  panel.background =  theme_blank(),
  panel.grid.major =  theme_blank(),
  panel.grid.minor =  theme_blank(),
  plot.background =   theme_blank(),
  plot.title =        theme_blank(),
  plot.margin =       unit(c(1, 1, 1, 1.5), lines)
  )

 sfm = scale_fill_manual(mRS, c(0=darkgreen,
                                 1=forestgreen,
                                 2=mediumseagreen,
                                 3=coral,
                                 4=red,
                                 5=darkred,
                                 6=black))

 barwidth = 0.6

 # good looks defined

 smalldummy = data.frame(
  mRS = as.factor(rep(0:6,2)),
  rsfreq = sample(0:6,14,replace=T),
  treatment = factor(rep(c(one,two),each=7))
  )

 smalldummy = ddply(smalldummy, .(treatment), transform,
                   textpos = cumsum(rsfreq/sum(rsfreq)) -
 rsfreq/sum(rsfreq)/2, # center within segment
                   lineposx = cumsum(rsfreq/sum(rsfreq)))
 # segment borders without 0



 # make the xs of the polygon
 polylo = 1 + barwidth/2
 polyhi = 2 - barwidth/2
 xs = rep(c(polylo,polyhi,polyhi,polylo), 7)

 # make the ys of the polygon
 tmp1 = c(0, smalldummy$lineposx[1:7])
 tmp2 = c(0, smalldummy$lineposx[8:14])
 ys = c()
 for(i in 1:7) {
  nu = c(tmp1[i], tmp2[i], tmp2[i+1], tmp1[i+1])
  ys = c(ys, nu)
 }
 m = as.factor(rep(0:6, each=4))
 tmpdf = data.frame(xs, ys, mRS = m)


 bigdummy = merge(smalldummy, tmpdf, by = mRS)

 ggplot(data = bigdummy, aes(x = treatment, y = rsfreq, fill = mRS)) +
  geom_bar(aes(width = barwidth),stat=identity,position=fill) +
  geom_text(aes(label=as.character(mRS),
    y = ifelse(rsfreq != 0, textpos, NA)),
    size = 8, colour = white) +
  geom_polygon(aes(x = xs, y = ys, group = mRS, fill = mRS)) +
  ylab(Modified Rankin Scale) + xlab(Treatment) +
  coord_flip() + theme_bw() + opts(legend.position = none) + no_margins +
 sfm




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/ggplot2-geom-polygon-fill-tp4361919p4361919.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Multiple comparisons of lme model - interactions?

2012-02-10 Thread Fredrik Karlsson

Dear list,

Please excuse my ignorance, but I'm trying to model some data using the lme
package. vot is a numeric response, and condition, location and obs are all
categories.
This works:

 anova(vot.lme - lme(vot ~ condition * location *
obs,data=mergedCodesL,random= ~1 |patient))
   numDF denDF  F-value p-value
(Intercept)1  1898 462.7519  .0001
condition  2  1898   8.4126  0.0002
location   112   0.0272  0.8718
obs2  1898 472.5526  .0001
condition:location 2  1898   1.0467  0.3513
condition:obs  4  1898   1.0683  0.3706
location:obs   2  1898   9.7067  0.0001
condition:location:obs 4  1898   4.6143  0.0010


If I then would like to do post-hoc testing, I found in the email archives
that I could use the glht function of multcomp - something like

summary(glht(vot.lme, linfct=mcp(obs = Tukey)))

However, if I would like to investigate the condition:location:obs -
interaction. What do I do then?


Best!

/Fredrik
-- 
Life is like a trumpet - if you don't put anything into it, you don't get
anything out of it.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Custom caret metric based on prob-predictions/rankings

2012-02-10 Thread Max Kuhn

I think you need to read the man pages and the four vignettes. A lot
of your questions have answers there.

If you don't specify the resampling indices, they ones generated for
you are saved in the train object:

 data(iris)
 TrainData - iris[,1:4]
 TrainClasses - iris[,5]

 knnFit1 - train(TrainData, TrainClasses,
+  method = knn,
+  preProcess = c(center, scale),
+  tuneLength = 10,
+  trControl = trainControl(method = cv))
Loading required package: class

Attaching package: ‘class’

The following object(s) are masked from ‘package:reshape’:

condense

Warning message:
executing %dopar% sequentially: no parallel backend registered
 str(knnFit1$control$index)
List of 10
 $ Fold01: int [1:135] 1 2 3 4 5 6 7 9 10 11 ...
 $ Fold02: int [1:135] 1 2 3 4 5 6 8 9 10 12 ...
 $ Fold03: int [1:135] 1 3 4 5 6 7 8 9 10 11 ...
 $ Fold04: int [1:135] 1 2 3 5 6 7 8 9 10 11 ...
 $ Fold05: int [1:135] 1 2 3 4 6 7 8 9 11 12 ...
 $ Fold06: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
 $ Fold07: int [1:135] 1 2 3 4 5 7 8 9 10 11 ...
 $ Fold08: int [1:135] 2 3 4 5 6 7 8 9 10 11 ...
 $ Fold09: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
 $ Fold10: int [1:135] 1 2 4 5 6 7 8 10 11 12 ...

There is also a savePredictions argument that gives you the hold-out results.

I'm not sure which weights you are referring to.

On Fri, Feb 10, 2012 at 4:38 AM, Yang Zhang yanghates...@gmail.com wrote:
 Actually, is there any way to get at additional information beyond the
 classProbs?  In particular, is there any way to find out the
 associated weights, or otherwise the row indices into the original
 model matrix corresponding to the tested instances?

 On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang yanghates...@gmail.com wrote:
 Oops, found trainControl's classProbs right after I sent!

 On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 I'm dealing with classification problems, and I'm trying to specify a
 custom scoring metric (recall@p, ROC, etc.) that depends on not just
 the class output but the probability estimates, so that caret::train
 can choose the optimal tuning parameters based on this metric.

 However, when I supply a trainControl summaryFunction, the data given
 to it contains only class predictions, so the only metrics possible
 are things like accuracy, kappa, etc.

 Is there any way to do this that I'm looking?  If not, could I put
 this in as a feature request?  Thanks!

 --
 Yang Zhang
 http://yz.mit.edu/



 --
 Yang Zhang
 http://yz.mit.edu/



 --
 Yang Zhang
 http://yz.mit.edu/

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Max

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R-question about html writer: hwriter; how to create scroll for matrices?

2012-02-10 Thread Bornhauser Hanspeter

 
Dear All

This is the first time I use this help forum. My apologies in case I did
not follow a predefined format.

My question: 

I use the package hwriter.

How can I produce a matrix (html) that allows scrolling (up and down,
left and right) or, alternatively, prduces a fixed row for the headers
and a fixed column for the row names (so that the names do not disappear
if I go to the right or downward on that webpage)?

I did try quite few things (also via CSS) but I seem to have problems
referring to the entire matrix rather than a single item of it.

For example: assume we have a correlation matrix of some dimension
(assume 100x100). What would be an efficient way to do the above?

Thank you to anyone who has a go at this. 

Kind Regards

Hanspeter


This message may contain confidential information and is...{{dropped:10}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] making multiple lines using qqplot

2012-02-10 Thread ilai

Melissa,
par(new=T) works as many times as you use it. You don't provide data,
but (assuming it is not NULL) more likely your n=500 qqplot was just
obscuring the points of the n=50 plot.

Reverse the order (i.e. qqplot 500 first, 50, 5 last) and see if all
three are there (as there are more 500 you should still see green on
the extremes).

Second, you say you want lines but used pch=20. replace with
type='l' to get lines (may also help with your main problem). If you
want to stick with points and your device supports it, you can
consider semi-transparent colors.

Cheers

On Thu, Feb 9, 2012 at 9:00 PM, Melrose2012
melissa.patric...@stonybrook.edu wrote:
 Hi Everyone,

 I want to make 3 lines on the same graph (not as subplots, all in the same
 window, one on top of each other) and I want them to be quantile-quantile
 plots (qqplot).  Essentially, I am looking for the equivalent of Matlab's
 hold on command to use with qqplot.  I know I can use 'points' or 'lines',
 but these do not give me a qqplot (only appear to work as scatter plots).  I
 found the syntax 'par(new=TRUE)' but that only seems to work for two lines,
 not for three.

 My script currently looks like:

 qqplot(nq.n5,tq.n5,col=red,xlab=Normal Distribution Quantiles,ylab=t
 Distribution Quantiles,main=Quantile-Quantile Plot of Normal vs
 t-Distribution for Various Sample Sizes,pch=20)
 par(new=TRUE)
 qqplot(nq.n50,tq.n50,col=blue,xlab=,ylab=,pch=20).
 par(new=TRUE)
 qqplot(nq.n500,tq.n500,col=green,xlab=,ylab=,pch=20)
 legend(topleft,c(n=5,n=50,n=500),fill=c(red,blue,green))

 I realize that this only plots the first and the third qqplot because by
 doing par(new=TRUE) again, it gets rid of the middle one.  I don't know how
 to get around this and get all 3 lines on the same plot.

 Can anyone please help me with this syntax?

 Thank you very much for your time and advice!

 Cheers,
 Melissa

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/making-multiple-lines-using-qqplot-tp4375273p4375273.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ggplot using scale_x_date gives Error in seq.int(r1$year, to$year, by)

2012-02-10 Thread Aidan Corcoran

Thanks Hadley, of course I should have spotted that.

On Fri, Feb 10, 2012 at 12:46 PM, Hadley Wickham had...@rice.edu wrote:
 Hi Aidan,

 str is your friend:

 str(g)
 'data.frame':   9 obs. of  3 variables:
  $ Date    : chr  2011-12-23 2011-12-30 2012-01-06 2011-12-23 ...
  $ variable: Factor w/ 3 levels Price,Yield,..: 1 1 1 2 2 2 3 3 3
  $ value   : num  86.78 86.04 86.44 9.74 9.54 ...

 You haven't turned the Date variable into an actual date yet.  Try:

 g$Date - as.Date(g$Date)

 Hadley

 On Fri, Jan 6, 2012 at 9:12 AM, Aidan Corcoran
 aidan.corcora...@gmail.com wrote:
 Dear all,

 ggplot gives me an error when trying to plot time series data using a
 date variable as the x axis.

 g-structure(list(Date = c(2011-12-23, 2011-12-30, 2012-01-06,
 2011-12-23, 2011-12-30, 2012-01-06, 2011-12-23, 2011-12-30,
 2012-01-06), variable = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
 3L, 3L, 3L), .Label = c(Price, Yield, CDS Spread), class = factor),
    value = c(86.777, 86.037, 86.437, 9.737, 9.542, 9.683, 580.132,
    576.866, 573.564)), .Names = c(Date, variable, value
 ), row.names = c(100L, 101L, 102L, 202L, 203L, 204L, 304L, 305L,
 306L), class = data.frame)

 g
          Date   variable value
 100 2011-12-23      Price  86.8
 101 2011-12-30      Price  86.0
 102 2012-01-06      Price  86.4
 202 2011-12-23      Yield   9.7
 203 2011-12-30      Yield   9.5
 204 2012-01-06      Yield   9.7
 304 2011-12-23 CDS Spread 580.1
 305 2011-12-30 CDS Spread 576.9
 306 2012-01-06 CDS Spread 573.6
 gp-ggplot(g,aes(x=Date,y=value,group=variable,lty=variable)) +geom_line()
 gp
 #THIS WORKS FINE (BUT AXIS LABELLING NOT VISIBLE WITH MORE DATA, HENCE I 
 WOULD LIKE TO USE SCALE_X_DATE)
 gp-gp+   scale_x_date()
 gp
 Error in seq.int(r1$year, to$year, by) : 'from' must be finite
 In addition: Warning messages:
 1: In get(x, envir = this, inherits = inh)(this, ...) :
  NAs introduced by coercion
 2: In min(x) : no non-missing arguments to min; returning Inf
 3: In max(x) : no non-missing arguments to max; returning -Inf
 4: In min(x) : no non-missing arguments to min; returning Inf
 5: In max(x) : no non-missing arguments to max; returning -Inf


 In previous help requests, the workaround of specifying the unit was 
 suggested:
 gp-gp+   scale_x_date(major=years)
 but this doesn't work for me (same error).

 Any help is very much appreciated! Thanks in advance.

 Aidan

 R version 2.14.1 (2011-12-22)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=English_Ireland.1252  LC_CTYPE=English_Ireland.1252
 LC_MONETARY=English_Ireland.1252
 [4] LC_NUMERIC=C                     LC_TIME=English_Ireland.1252

 attached base packages:
  [1] datasets  tools     grDevices grid      splines   graphics  stats
    tcltk     utils
 [10] methods   base

 other attached packages:
  [1] micEcon_0.6-6      miscTools_0.6-12   np_0.40-11
 cubature_1.0       boot_1.3-3
  [6] RODBC_1.3-3        sqldf_0.4-6.1      chron_2.3-41
 gsubfn_0.5-7       DBI_0.2-5
 [11] Haver_1.0          xtable_1.5-6       plm_1.2-7
 sandwich_2.2-7     MASS_7.3-16
 [16] Formula_1.0-1      nlme_3.1-102       bdsmatrix_1.0
 RBloomberg_0.4-150 rJava_0.9-1
 [21] gtools_2.6.2       gdata_2.8.2        ggplot2_0.8.9
 proto_0.3-9.2      zoo_1.7-4
 [26] reshape_0.8.4      plyr_1.6           svSocket_0.9-52
 TinnR_1.0.3        R2HTML_2.2
 [31] Hmisc_3.8-3        survival_2.36-10

 loaded via a namespace (and not attached):
 [1] cluster_1.14.1        digest_0.5.0          lattice_0.20-0
 RSQLite_0.10.0
 [5] RSQLite.extfuns_0.0.1 svMisc_0.9-63


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] coxme with frailty

2012-02-10 Thread Terry Therneau

 A couple of clarifications for you.

1. I write mixed effects Cox models as exp(X beta + Z b), beta = fixed
effects coefficients and b = random effects coefficients.  I'm using
notation that is common in linear mixed effects models (on purpose).
  About 2/3 of the papers use exp(X beta)* c, i.e., pull the random
effects out of the exponent.  Does it make a difference?  Not much: b
will be Gaussian and c will be log-normal. However, it's much easier to
write down complicated (crossed, nested, ect) random effects in the
notation I chose.  The description you give from HL is using the c
style.

 2. If var(b) is known, solution of beta and b is straightforward.  It's
a small modification of the ususal coxph internals for a fixed effects
model.  That is, a bit of c code that has been optimized and exercised
over a decade, it makes sense to re-use it.  So all the routines I know
of solve mixed effects in a nested fashion: an outer maximizer looks for
the parameters of var(b), and for each trial value calls the known
routine (which itself iterates) for the corresponding (beta, b) pair.
The HL formula you quote only holds in a specific case.

Terry T

 begin included message --
Thank you very much for taking the time to answer. This pointed me in
the right direction.
For those interested, I found a useful explanation  of the derivation
of the estimated variance of the random effect in Hosmer  Lemeshow's
Applied Survival Analysis (1999), pp.321-26 (I love your book, Dr.
Therneay, but I needed something easier...). They proceed in 4 steps:

1. Obtain the cumulative hazard function for each subject.
2. Choose an arbitrary value for the variance parameter of the
frailties (call it theta).
3. Compute for each subject an estimate of the value of their
frailties, USING this variance parameter theta:
frailty_i= \frac(1+\theta \times c_i}{1+\theta \times H_i} (formula on
p. 321), where H is the cumulative hazard for the subject. So if theta
is 0 (no variance), then frailty=1 (i.e., no excess frailty). As theta
goes to infinity, the estimated frailty is simply the ratio
1/(cumulative risk so far) or 1/(cumulative risk so far), depending on
whether the subject is still alive or not.
4. fit the proportional hazard model with the same covariates as
before, but this time including the frailties in the hazard function.
5. calculate a log-likelihood for the model.

Repeat this for all possible values of the frailties (in practice,
sweep through them according to some algorithm), and pick the one with
the highest log-likelihood.

SO IF I understand correctly, the frailties are calculated GIVEN a
variance parameter of the frailties, and not the reverse. I.e., theta
is chosen first, and the random effects are estimated accordingly.
Which is why the variance of the estimated random effects is NOT the
same as theta. Did I get this right?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Find interval between numbers in list or vector

2012-02-10 Thread alend

hi,

  I have a vector as follow:

  myVec = c(1,4,10,30)

How can I get a vector that show the interval of the numbers.

 I need to get this result:

  distance
3,6,20

--
View this message in context: 
http://r.789695.n4.nabble.com/Find-interval-between-numbers-in-list-or-vector-tp4376115p4376115.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Help needed please

2012-02-10 Thread Jaymin Shah

I have coded a time series from simulated data:

simtimeseries - arima.sim(n=1024,list(order=c(4,0,0),ar=c(2.7607, -3.8106, 
2.6535,  -0.9258),sd=sqrt(1)))
#show roots are outside unit circle
plot.ts(simtimeseries, xlab=, ylab=, main=Time Series of Simulated Data)

# Yule 


q1 - cbind(simtimeseries[1:1024])
q2 - t(q1)%*%q1
s0 - q2/1204
r1 - cbind(simtimeseries[1:1023])
r2 - cbind(simtimeseries[2:1024])
r3 - t(r1)%*%r2
s1 - r3/1204
t1 - cbind(simtimeseries[1:1022])
t2 - cbind(simtimeseries[3:1024])
t3 - t(t1)%*%t2
s2 - t3/1204
u1 - cbind(simtimeseries[1:1021])
u2 - cbind(simtimeseries[4:1024])
u3 - t(u1)%*%u2
s3 - u3/1204
v1 - cbind(simtimeseries[1:1020])
v2 - cbind(simtimeseries[5:1024])
v3 - t(v1)%*%v2
s4 - v3/1204

i0 - c(s0,s1,s2,s3)
i1 - c(s1,s0,s1,s2)
i2 - c(s2,s1,s0,s1)
i3 - c(s3,s2,s1,s0)

gamma - cbind(i0,i1,i2,i3)
eta -c(s1,s2,s3,s4)
inversegamma - solve(gamma)
phihat - inversegamma%*%eta
phihat

Phihat - cbind(phihat)
s - c(s1,s2,s3,s4)
S - cbind(s)
sigmasquaredyule - s0 - (t(Phihat)%*%S)
sigmasquaredyule



I did a yule walker estimate on the simulated data and wanted to work out phi 
hat which is a vector of 4 values and sigmasquaredyule which is one value. 
However,  I want to run the simulated data 100 times i.e. in a for loop and 
then take the averages of the phi hat values and sigmasquaredyule value.

How would i repeat this simulated time series lots of times (e.g. a 100 times) 
and store the average value of phi hat and sigmasquaredyule.

Thank you


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Find interval between numbers in list or vector

2012-02-10 Thread Dimitris Rizopoulos


Have a look at function diff(), e.g.,

diff(c(1,4,10,30))


I hope it helps.

Best,
Dimitris


On 2/10/2012 1:33 PM, alend wrote:

hi,

   I have a vector as follow:

   myVec = c(1,4,10,30)

How can I get a vector that show the interval of the numbers.

  I need to get this result:

   distance

3,6,20


--
View this message in context: 
http://r.789695.n4.nabble.com/Find-interval-between-numbers-in-list-or-vector-tp4376115p4376115.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Q - scatterplots

2012-02-10 Thread Jhope

I was able to make a scatterplot but ...

1) what does the 86 mean? The 86 shows up on the graph as well. 

 scatterplot (Shells/TotalEggs ~ Sector, data = data.to.analyze)
[1] 86

2) Also how do you change the Y axis title? I don't want it to read
Shells/TotalEggs, instead I would like it to read Average Hatching Rate (%).

3) What does this error mean? Rayos if composed of section 1, 2, 3, 4 and 5

 scatterplot (Shells/TotalEggs ~ Rayos, data = data.to.analyze)
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

J

--
View this message in context: 
http://r.789695.n4.nabble.com/Q-scatterplots-tp4375632p4375632.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Split matrix into square submatices

2012-02-10 Thread xiddw

Hi everybody, 

I'm looking for an optimal way to split  a big matrix  (e.g. ncol = 8,
nrow=8)  into small square submatrices  (e.g. ncol=2, nrow=2)

For example

If I have 

  h
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]19   17   25   33   41   49   57
[2,]2   10   18   26   34   42   50   58
[3,]3   11   19   27   35   43   51   59
[4,]4   12   20   28   36   44   52   60
[5,]5   13   21   29   37   45   53   61
[6,]6   14   22   30   38   46   54   62
[7,]7   15   23   31   39   47   55   63
[8,]8   16   24   32   40   48   56   64

and I want to split matriz h into 16 small submatrices:

 g[1]
 [,1] [,2]
[1,]19
[2,]2   10

 g[2]
 [,1] [,2]
[1,]   17   25
[2,]   18   26

...

 g[4]
 [,1] [,2]
[1,]   49   57
[2,]   50   58

...

 g[16]
 [,1] [,2]
[1,]   55   63
[2,]   56   64

Always the big matrix would be a square matrix and also would have a nrow
and ncol in order of  a power of two, so it always could  be splitted in at
least halves. 

Until now, I'm able to split a matrix but keeping the number of original
matrix columns.

I mean, if I want to split into 16 submatrices, using the following command
what I get its 4 lists: 
But I haven't found a way to split resultant lists in order to get 4
matrices for each list.

 g - split(as.data.frame(h), (1:4))
 g
$`1`
  V1 V2 V3 V4 V5 V6 V7 V8
1  1  9 17 25 33 41 49 57
5  5 13 21 29 37 45 53 61

$`2`
  V1 V2 V3 V4 V5 V6 V7 V8
2  2 10 18 26 34 42 50 58
6  6 14 22 30 38 46 54 62

$`3`
  V1 V2 V3 V4 V5 V6 V7 V8
3  3 11 19 27 35 43 51 59
7  7 15 23 31 39 47 55 63

$`4`
  V1 V2 V3 V4 V5 V6 V7 V8
4  4 12 20 28 36 44 52 60
8  8 16 24 32 40 48 56 64


I would appreciate any hint. Thanks.

--
View this message in context: 
http://r.789695.n4.nabble.com/Split-matrix-into-square-submatices-tp4375563p4375563.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Rpart and splitting criteria

2012-02-10 Thread MYRIAM TABASSO

Dear All,
I have questions about the function rpart  to construct a regression tree
in
R code.
My problem is how to change the splitting criteria.

In the rpart we have : parms=list(split=..) , I ask you if in this
command is it possible to use an another splitting criterion to substitute
the
default criteria( gini or information)?

Does someone can help me ?
Thank you,
Myriam Tabasso

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to made .exe of any gui project in R

2012-02-10 Thread sagarnikam123

i made jDialog through JGR package(Java gui 4 R), 
i want to export it with imported external package e.g.bio3d,RCOR , into
.exe 

or if not possible convert code to executable jar file 
how should i go, give me package name,example codes (if possible) 

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-made-exe-of-any-gui-project-in-R-tp4375938p4375938.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] access results of graph.maxflow and program max-flow algorithm

2012-02-10 Thread Wendy

Hi all, 

I want to use graph.maxflow function for my work. Given a network, I could
get a result, but I could not figure out how the function get the result.
Did anyone have the same problem before? How did you resolve it? Does anyone
know any other functions/programs for max-flow analysis.

Thank you in advance.
Wendy

--
View this message in context: 
http://r.789695.n4.nabble.com/access-results-of-graph-maxflow-and-program-max-flow-algorithm-tp4375744p4375744.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] clustering and the region of integration

2012-02-10 Thread Barbara Uszczynska

Dear R users,

I'm having trouble with calculating pvalues for my 2d dataset. First I
performed clustering and I would like to get some info about the strength
of cluster membership for each point. I've calculated (thanks to nice
people help) the multivariate normal densities (mnd) using dmvnorm function:

pd11=mvtnorm::dmvnorm(dataset1,mean=dataset1MC$parameters$mean[,1],sigma=dataset1MC$parameters$variance$sigma[,,1])

I've obtained a vector of mnds for each cluster:

NA12043  NA12249  NA12264  NA12707  NA12716  NA12717
   NA12751  NA12762  NA12864  NA12873  NA07034  NA07048
 NA07055  NA07345  NA07348  NA07357  NA10830
 NA10835
8.627681e+00 8.465797e+00 1.522724e+01 2.047262e+01 1.780368e+01
2.443946e+01 8.687642e+00 5.024366e+00 2.163811e+01 6.093326e-01
1.503374e+00 2.263341e+00 2.177880e+01 2.851877e+01 1.240402e+01
7.498245e+00 1.186389e+01 1.229760e+01
 NA12154  NA12234  NA12236  NA12763  NA12801
 NA12812  NA12813  NA12878  NA10851  NA10854  NA10857
   NA10859  NA10861  NA10863  NA11839  NA11840  NA11881
 NA11882
8.293616e+00 4.019101e-19 2.733848e+01 2.623284e+01 2.320810e+01
5.112927e-01 1.432336e+01 1.000314e+01 1.675454e+01 8.239816e+00
2.449679e+01 2.655419e+01 2.294064e+01 2.218329e-17 8.844933e+00
2.911991e+00 2.170381e+01 3.089883e+00
 NA11994  NA12044  NA12056  NA12057  NA12891
 NA12892
1.668749e+01 1.588963e+01 5.913443e+00 2.924297e+01 1.765777e+01
7.935129e+00

Next, what I would like to do is to calculate the pvalue for each point,
which was assigned to particular cluster. In order to do this i'm using
pmvnorm function, but I found it difficult to set the region of
integration. As I understand to get the probability of cluster membership I
should define how 'far' from the cluster mean is my point. However, I've
got 2d dataset and my mean is also 2d:

   [1,] [2,]
 1.348992  1.269590

but I've got only one density value for each point.

Using:

pmvnorm(mean=dataset1MC$parameters$mean[,1],sigma=dataset1MC$parameters$variance$sigma[,,1],
lower=2.218329e-17, upper=as.vector(dataset1MC$parameters$mean[,1]))

gives strange results, since for 2.218329e-17 the output is:

 [1] 0.348126
attr(,error)
[1] 1e-15
attr(,msg)
[1] Normal Completion

and

pmvnorm(mean=dataset1MC$parameters$mean[,1],sigma=dataset1MC$parameters$variance$sigma[,,1],
lower= as.vector(dataset1MC$parameters$mean[,1]) , upper=2.924297e+01)

gives:

[1] 0.348126
attr(,error)
[1] 1e-15
attr(,msg)
[1] Normal Completion

If it is possible I would like to get some info about:

Is my idea of calculating  the probability of cluster membership is
correct? How I can set properly the region of integration?

I would be grateful for any help.

Best Wishes,

Bas.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Trust in a glm.nb model results with an itereation limit reached

2012-02-10 Thread Lucas

Hello to everyone.
I'm fitting a glm.nb model to a count data. I'm using about 8 predictive
variables.
Once a run the script I do get a result but it tells me that the iteration
limit has been reached.
So, can i trust the results given by the model? 
Could it be a multicollinearity problem?

Thank you for taking the time to help me.
Greetings

Lucas.


--
View this message in context: 
http://r.789695.n4.nabble.com/Trust-in-a-glm-nb-model-results-with-an-itereation-limit-reached-tp4376073p4376073.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Need to aggregate large dataset by week...

2012-02-10 Thread danielrev

Hi all, 

I have a large dataset with ~8600 observations that I want to compress to
weekly means. There are 9 variables (columns), and I have already added a
week column with 51 weeks. I have been looking at the functions:
aggregate, tapply, apply, etc. and I am just not savvy enough with R to
figure this out on my own, though I'm sure it's fairly easy. I also have the
Dates (month/day/year) for all of the observations, but I figured just
having a week column may be easier. If someone wanted to show me how to
organize this data using a date function and aggregating by month that would
be useful too!  

Here's an example of the data set, with only 5 of the variables and 10 of
8600 obs.:

 weekrainfall windspeed   winddir  temp   oakdepth
1   1  0.2000   0.89000  245.9200  1.15   4.40
2   1  0.   0.84000  292.8800  1.19   5.30
3   1  0.2000   0.74000  258.5400  1.36   6.00
4   1  0.   0.930003.7000  1.43   4.40
5   1  0.2000   0.69000   37.8200  1.56   5.20
6   1  0.   0.8   17.2900  1.69   4.40
7   1  0.2000   0.7   28.7300  1.88   5.00
8   1  0.2000   1.12000  294.3700  1.93   6.00
9   1  0.   1.21000  274.9700  1.80   4.40
10  1  0.   1.31000  279.2400  1.86   5.80

...so after about 170 observations it changes to week 2, and so on.

I've tried something like this, but its only one variable's mean, and I
would rather have the rows=weeks and columns= the different variables. 

 tapply(metdata$rainfall,metdata$week,FUN=mean)
  1   2   3   4   5   6 
0.080952381 0.101190476 0.379761905 0.179761905 0.0 0.295238095 
  7   8   9  10  11  12 
0.146428571 0.015476190 0.16389 0.098809524 0.065476190 0.215476190 

Hope this is enough information and that I'm not just re-asking an old
question. Thanks so much in advance for any help. 



--
View this message in context: 
http://r.789695.n4.nabble.com/Need-to-aggregate-large-dataset-by-week-tp4376154p4376154.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] best option for big 3D arrays?

2012-02-10 Thread Djordje Bajic

Hi all,

I am trying to fill a 904x904x904 array, but at some point of the loop R
states that the 5.5Gb sized vector is too big to allocate. I have looked at
packages such as bigmemory, but I need help to decide which is the best
way to store such an object. It would be perfect to store it in this cube
form (for indexing and computation purpouses). If not possible, maybe the
best is to store the 904 matrices separately and read them individually
when needed?

Never dealed with such a big dataset, so any help will be appreciated

(R+ESS, Debian 64bit, 4Gb RAM, 4core)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] e1071/svm: never finishes

2012-02-10 Thread Sam Steingold

When I tried to run svm on the same data frame, memory usage as reported
by top(1) doubled to 4GB almost right away and the function never
returned (has been running for ~15 hours now). ^C does not stop it.
This is most unusual, libsvm has always seemed very fast.

This is R version 2.13.1 (2011-07-08) (as distributed with ubuntu).

 * Sam Steingold f...@tah.bet [2012-02-09 21:43:30 -0500]:

 I did this:
 nb - naiveBayes(users, platform)
 pl - predict(nb,users)
 nrow(users) == 314781
 ncol(users) == 109

 1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
 (tens of minutes).  why?

 2. the predict results were completely off the mark (quite the opposite
 of the expected overfitting).  suffice it to show the tables:

 pl:

android blackberry   ipad iphone lg  linuxmac 
  3  5 11 14 312723  5 11 
 mobile  nokiasamsungsymbianunknownwindows 
   1864 17 16112  0  0 

 platform:
android blackberry   ipad iphone lg  linuxmac 
  18013   1221   2647   1328  4   2936  34336 
 mobile  nokiasamsungsymbianunknownwindows 
 18 88 39103   2660 251388 

 i.e., nb classified nearly everything as lg while in the actual data
 lg is virtually nonexistent.

 3. when I print nb, I see A-priori probabilities (which are what I
 expected) and Conditional probabilities which are confusing because
 there are only two of them, e.g.:

  android0.048464998 0.43946764
  blackberry 0.001638002 0.04045564
  ipad   0.322251606 1.84940588
  iphone 0.030873494 0.23250250
  lg 0.0 0.
  linux  0.023501362 0.34698919
  mac0.082653774 1.22535027
  mobile 0.0 0.
  nokia  0.0 0.
  samsung0.0 0.
  symbian0.0 0.
  unknown0.003759398 0.08219078
  windows0.021158528 0.32916970

 the predictors are integers.
 is the first column for the 0 predictors and the second for all non-0?
 Is there a way to ask naiveBayes to differenciate between non-0 values?

 thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://www.childpsy.net/ http://openvotingconsortium.org http://iris.org.il
http://jihadwatch.org http://camera.org http://www.memritv.org
Don't ascribe to malice what can be adequately explained by stupidity.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to combine two matrix or vectors

2012-02-10 Thread summer

such as a=[1,2,3 ],b =[2,4] I want to get a new one [1 2 3 4 5]. Thank you.

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-combine-two-matrix-or-vectors-tp4376467p4376467.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to combine two matrix or vectors

2012-02-10 Thread R. Michael Weylandt

How could you combine a and b as given to get your new one? It
doesn't have the same elements...

?union
?intersect
?rbind
?c

Michael

On Fri, Feb 10, 2012 at 9:55 AM, summer summerwind...@hotmail.com wrote:
 such as a=[1,2,3 ],b =[2,4] I want to get a new one [1 2 3 4 5]. Thank you.

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-combine-two-matrix-or-vectors-tp4376467p4376467.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Split matrix into square submatices

2012-02-10 Thread Petr Savicky

On Fri, Feb 10, 2012 at 12:02:25AM -0800, xiddw wrote:
 Hi everybody, 
 
 I'm looking for an optimal way to split  a big matrix  (e.g. ncol = 8,
 nrow=8)  into small square submatrices  (e.g. ncol=2, nrow=2)
 
 For example
 
 If I have 
 
   h
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
 [1,]19   17   25   33   41   49   57
 [2,]2   10   18   26   34   42   50   58
 [3,]3   11   19   27   35   43   51   59
 [4,]4   12   20   28   36   44   52   60
 [5,]5   13   21   29   37   45   53   61
 [6,]6   14   22   30   38   46   54   62
 [7,]7   15   23   31   39   47   55   63
 [8,]8   16   24   32   40   48   56   64
 
 and I want to split matriz h into 16 small submatrices:
 
  g[1]
  [,1] [,2]
 [1,]19
 [2,]2   10
 
  g[2]
  [,1] [,2]
 [1,]   17   25
 [2,]   18   26
 
 ...
 
  g[4]
  [,1] [,2]
 [1,]   49   57
 [2,]   50   58
 
 ...
 
  g[16]
  [,1] [,2]
 [1,]   55   63
 [2,]   56   64
 
 Always the big matrix would be a square matrix and also would have a nrow
 and ncol in order of  a power of two, so it always could  be splitted in at
 least halves. 
 
 Until now, I'm able to split a matrix but keeping the number of original
 matrix columns.
 
 I mean, if I want to split into 16 submatrices, using the following command
 what I get its 4 lists: 
 But I haven't found a way to split resultant lists in order to get 4
 matrices for each list.

Hi.

Try the following.

  k - 3
  n - 2^k
  m - 2^(k - 2)
  a - matrix(1:n^2, nrow=n, ncol=n)
  g - vector(list, length=16)
  
  k - 1
  for (j in 0:3) {
  for (i in 0:3) {
  g[[k]] - a[m*i + 1:m, m*j + 1:m]
  k - k + 1
  }
  }
  
  g[1:5]
  
  [[1]]
   [,1] [,2]
  [1,]19
  [2,]2   10
  
  [[2]]
   [,1] [,2]
  [1,]3   11
  [2,]4   12
  
  [[3]]
   [,1] [,2]
  [1,]5   13
  [2,]6   14
  
  [[4]]
   [,1] [,2]
  [1,]7   15
  [2,]8   16
  
  [[5]]
   [,1] [,2]
  [1,]   17   25
  [2,]   18   26

  ...

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] nice report generator?

2012-02-10 Thread Aleksandar Blagotić

Hi guys,

AFAICT, this conversation now discusses some domain-specific issues.
However, in reply to OP and OQ, I'll shamelessly advertise the package that
Gergely Daróczi and I have released recently. It's named rapport, and you
can grab it from CRAN or GitHub (https://github.com/aL3xa/rapport/). Check
it out, as its purpose is to deliver reports in bosses-friendly format.
It relies on pandoc, so you can easily convert it to external formats like
ODT, DOCX, PDF, LaTeX, etc. Check out the project homepage for more
details: http://rapport-package.info/

Cheers,
aL3xa

On Tue, Feb 7, 2012 at 16:45, Martin Studer martin.remo.stu...@gmail.comwrote:

Hi Michael, Hi all,

an alternative for reading/writing tables from/to Excel is the package
XLConnect. It is platform-independent and therefore runs under Windows,
UNIX/Linux Mac. It supports reading/writing worksheets as well as named
ranges. In addition, for reporting purposes, there is support for cell
styles. You can find an example here:

http://miraisolutions.wordpress.com/2011/08/31/xlconnect-a-platform-independent-interface-to-excel/
XLConnect A platform-independent interface to Excel . With respect to
cell
styles: you either completely code up the styling according to your needs
using /setCellStyle/ (which can be cumbersome if you need to apply a lot of
styling) or you can use so-called style actions controlled via
/setStyleAction/. See the XLConnect reference manual for more information.

Best regards,
Martin

--
View this message in context:
http://r.789695.n4.nabble.com/nice-report-generator-tp4169939p4365144.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Split matrix into square submatices

2012-02-10 Thread Gabor Grothendieck

On Fri, Feb 10, 2012 at 3:02 AM, xiddw xid...@gmail.com wrote:
 Hi everybody,

 I'm looking for an optimal way to split  a big matrix  (e.g. ncol = 8,
 nrow=8)  into small square submatrices  (e.g. ncol=2, nrow=2)

 For example

 If I have

  h
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
 [1,]    1    9   17   25   33   41   49   57
 [2,]    2   10   18   26   34   42   50   58
 [3,]    3   11   19   27   35   43   51   59
 [4,]    4   12   20   28   36   44   52   60
 [5,]    5   13   21   29   37   45   53   61
 [6,]    6   14   22   30   38   46   54   62
 [7,]    7   15   23   31   39   47   55   63
 [8,]    8   16   24   32   40   48   56   64

 and I want to split matriz h into 16 small submatrices:

 g[1]
     [,1] [,2]
 [1,]    1    9
 [2,]    2   10

 g[2]
     [,1] [,2]
 [1,]   17   25
 [2,]   18   26

 ...

 g[4]
     [,1] [,2]
 [1,]   49   57
 [2,]   50   58

 ...

 g[16]
     [,1] [,2]
 [1,]   55   63
 [2,]   56   64

 Always the big matrix would be a square matrix and also would have a nrow
 and ncol in order of  a power of two, so it always could  be splitted in at
 least halves.

 Until now, I'm able to split a matrix but keeping the number of original
 matrix columns.

 I mean, if I want to split into 16 submatrices, using the following command
 what I get its 4 lists:
 But I haven't found a way to split resultant lists in order to get 4
 matrices for each list.

 g - split(as.data.frame(h), (1:4))
 g
 $`1`
  V1 V2 V3 V4 V5 V6 V7 V8
 1  1  9 17 25 33 41 49 57
 5  5 13 21 29 37 45 53 61

 $`2`
  V1 V2 V3 V4 V5 V6 V7 V8
 2  2 10 18 26 34 42 50 58
 6  6 14 22 30 38 46 54 62

 $`3`
  V1 V2 V3 V4 V5 V6 V7 V8
 3  3 11 19 27 35 43 51 59
 7  7 15 23 31 39 47 55 63

 $`4`
  V1 V2 V3 V4 V5 V6 V7 V8
 4  4 12 20 28 36 44 52 60
 8  8 16 24 32 40 48 56 64



Try this:

h - matrix(1:64, 8) # input

k - kronecker(matrix(1:16, 4, byrow = TRUE), matrix(1, 2, 2))
g - lapply(split(h, k), matrix, nr = 2)

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] nice report generator?

 To be strictly correct, shouldn't that be:

 formula- eval(substitute( value*v*LEFT ~ RIGHT, list(LEFT=LEFT,
 RIGHT=RIGHT)))

 ?


 I think it probably doesn't matter.  The difference is that mine gives a
 pure language object, whereas yours gives a formula object.  The formula
 object has a class which means some methods will work differently, and it
 also has an environment attached, which defines where the variables in it
 should be resolved.  I suspect the variables shouldn't be resolved in the
 environment where formula was being created, so it's probably better not
 to attach an environment at all, but the tabular function ignores the
 environment of the formula (it uses its data argument for that), so it
 doesn't make a big difference.

Yes, sorry, I was probably being excessively picky :/  Formula
semantics are tricky though!

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Trust in a glm.nb model results with an itereation limit reached

2012-02-10 Thread David Winsemius



On Feb 10, 2012, at 7:14 AM, Lucas wrote:


Hello to everyone.
I'm fitting a glm.nb model to a count data. I'm using about 8  
predictive

variables.
Once a run the script I do get a result but it tells me that the  
iteration

limit has been reached.


Most regression functions have a control parameter that can increase  
the maximum number of iterations, but you provided no code so all is  
guesswork from here.



So, can i trust the results given by the model?


Probably not. Some functions will not even record estimates if they  
have failed the convergence tests.



Could it be a multicollinearity problem?


Could be.



Thank you for taking the time to help me.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Q - scatterplots

2012-02-10 Thread John Fox

Dear J.,

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Jhope
 Sent: February-10-12 3:44 AM
 To: r-help@r-project.org
 Subject: [R] Q - scatterplots

 I was able to make a scatterplot but ...

 1) what does the 86 mean? The 86 shows up on the graph as well.

  scatterplot (Shells/TotalEggs ~ Sector, data = data.to.analyze)
 [1] 86

You don't provide enough information to answer your questions. I assume, for
example, that this is the scatterplot() function in the car package. If so,
the function returns the names (in your case, numbers) of outlying
observations, which are also labelled on the graph, but that shouldn't have
happened unless you set the id.n or id.method arguments appropriately.
Beyond the command, you haven't provided any information about the version
of the car package that you're using, or even whether that's what you're
using.

 2) Also how do you change the Y axis title? I don't want it to read
 Shells/TotalEggs, instead I would like it to read Average Hatching
 Rate (%).

See ?scatterplot, in particular, the ylab argument, assuming again that
you're using scatterplot() in the car package.

 3) What does this error mean? Rayos if composed of section 1, 2, 3, 4
 and 5

  scatterplot (Shells/TotalEggs ~ Rayos, data = data.to.analyze)
 Error in plot.window(...) : need finite 'xlim' values In addition:
 Warning messages:
 1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by
 coercion
 2: In min(x) : no non-missing arguments to min; returning Inf
 3: In max(x) : no non-missing arguments to max; returning -Inf

I suspect that Rayos isn't numeric but without the data it's impossible to
tell.

Please see the posting guide for r-help
http://www.R-project.org/posting-guide.html for how to formulate an
answerable question.

Best,
 John

John Fox
Senator William McMaster
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox

 J

 --
 View this message in context: http://r.789695.n4.nabble.com/Q-
 scatterplots-tp4375632p4375632.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] e1071/svm: never finishes

2012-02-10 Thread Sam Steingold

 * Sam Steingold f...@tah.bet [2012-02-10 10:01:54 -0500]:

 When I tried to run svm on the same data frame, memory usage as reported
 by top(1) doubled to 4GB almost right away and the function never
 returned (has been running for ~15 hours now). ^C does not stop it.
 This is most unusual, libsvm has always seemed very fast.

looks like it _is_ libsvm:

#0  0x72aedc64 in Solver::select_working_set (this=0x7fff97f0, 
out_i=@0x7fff95a0, out_j=@0x7fff95b0) at svm.cpp:852
#1  0x72aef91d in Solver::Solve (this=0x7fff97f0, l=285724, Q=..., 
p_=optimized out, y_=optimized out, alpha_=0x6023fb60, Cp=1, 
Cn=1, eps=optimized out, si=0x7fff9980, shrinking=1) at svm.cpp:573
#2  0x72af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fff9980, 
alpha=0x6023fb60, param=optimized out, prob=0x7fff9c30) at svm.cpp:1444
#3  svm_train_one (prob=0x7fff9c30, param=optimized out, Cp=1, Cn=1) at 
svm.cpp:1641
#4  0x72af4a8e in svm_train (prob=optimized out, 
param=0x7fff9d40) at svm.cpp:2179
#5  0x72aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0, 
c=optimized out, y=optimized out, rowindex=optimized out, 
colindex=optimized out, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0, 
degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0, 
nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330, 
cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50, 
shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0, 
probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10, nr=0x1524dc40, 
index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420, rho=0x170083e8, 
coefs=0x391dbb48, sigma=0x10358a88, probA=0xdf94678, probB=0xcbb7eb8, 
cresults=0x0, ctotal1=0x10358ac0, ctotal2=0x10358af8, error=0x10358b30) at 
Rsvm.c:275
#6  0x7792cefc in ?? () from /usr/lib/R/lib/libR.so
#7  0x7795da1d in Rf_eval () from /usr/lib/R/lib/libR.so
#8  0x7795f540 in ?? () from /usr/lib/R/lib/libR.so
#9  0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#10 0x7795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#11 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#12 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#13 0x779ad784 in Rf_usemethod () from /usr/lib/R/lib/libR.so
#14 0x779ada47 in ?? () from /usr/lib/R/lib/libR.so
#15 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#16 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#17 0x7795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#18 0x7795f540 in ?? () from /usr/lib/R/lib/libR.so
#19 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#20 0x7795db9b in ?? () from /usr/lib/R/lib/libR.so
#21 0x7795dad9 in Rf_eval () from /usr/lib/R/lib/libR.so
#22 0x7795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#23 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#24 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#25 0x7795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#26 0x77998055 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
#27 0x779982e0 in ?? () from /usr/lib/R/lib/libR.so
#28 0x77998370 in run_Rmainloop () from /usr/lib/R/lib/libR.so
#29 0x0040078b in main ()
#30 0x772d930d in __libc_start_main () from 
/lib/x86_64-linux-gnu/libc.so.6
#31 0x004007bd in _start ()


#0  0x72aeeb67 in Kernel::dot (px=0x48eeb220, py=0x4b21890) at 
svm.cpp:295
#1  0x72af7a25 in Kernel::kernel_rbf (this=optimized out, 
i=optimized out, j=optimized out) at svm.cpp:239
#2  0x72af782c in SVC_Q::get_Q (this=0x7fff9870, i=187701, 
len=208039) at svm.cpp:1271
#3  0x72aef9ab in Solver::Solve (this=0x7fff97f0, l=285724, Q=..., 
p_=optimized out, y_=optimized out, alpha_=0x6023fb60, Cp=1,
Cn=1, eps=optimized out, si=0x7fff9980, shrinking=1) at svm.cpp:591
#4  0x72af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fff9980, 
alpha=0x6023fb60, param=optimized out, prob=0x7fff9c30) at svm.cpp:1444
#5  svm_train_one (prob=0x7fff9c30, param=optimized out, Cp=1, Cn=1) at 
svm.cpp:1641
#6  0x72af4a8e in svm_train (prob=optimized out, 
param=0x7fff9d40) at svm.cpp:2179
#7  0x72aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0, 
c=optimized out, y=optimized out, rowindex=optimized out,
colindex=optimized out, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0, 
degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0,
nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330, 
cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50,
shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0, 
probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10, nr=0x1524dc40,
index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420, rho=0x170083e8, 
coefs=0x391dbb48, sigma=0x10358a88,

Re: [R] Importing a CSV file

2012-02-10 Thread David L Carlson

First, make sure the file name is correct.

Second, try 

list.files(Users:/sezginozcan/Downloads/)

to see if the file is listed and you are looking in the correct folder.

Third, you may be able to locate and open the file using

A - read.table(file.choose(), sep=\t, row.names=1, header=T)

I believe that clipboard works only on Windows computers.

--
David L Carlson
Associate Professor of Anthropology
Texas AM University
College Station, TX 77843-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of sezgin ozcan
Sent: Thursday, February 09, 2012 9:21 PM
To: r-h...@stat.math.ethz.ch
Subject: [R] Importing a CSV file

I have been trying to import a csv file to r. but I get the same message
everytime. the message is

Error in file(file, rt) : cannot open the connection
In addition: Warning message:
In file(file, rt) :
  cannot open file 'Users:/sezginozcan/Downloads/beer.data.csv': No such
file or directory

I use mac.
I tried this command also
a-read.table(clipboard,sep=\t,row.names=1,header=T)
Error: unexpected input in a-read.table(clipboard,sep=,

I will appreciate if you help me before I get crazy.
thank you

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fwd: Importing a CSV file

2012-02-10 Thread Gyanendra Pokharel

-- Forwarded message --
From: Gyanendra Pokharel gyanendra.pokha...@gmail.com
Date: Fri, Feb 10, 2012 at 11:13 AM
Subject: Re: [R] Importing a CSV file
To: sezgin ozcan szgoz...@gmail.com

You save your file in your own folder either in desktop or in document
where ever you want and serach the file using the command

mydatafile - read.csv(file.choose())

then open the file
Cheers

On Thu, Feb 9, 2012 at 10:20 PM, sezgin ozcan szgoz...@gmail.com wrote:

 I have been trying to import a csv file to r. but I get the same message
 everytime. the message is

 Error in file(file, rt) : cannot open the connection
 In addition: Warning message:
 In file(file, rt) :
  cannot open file 'Users:/sezginozcan/Downloads/beer.data.csv': No such
 file or directory

 I use mac.
 I tried this command also
 a-read.table(clipboard,sep=\t,row.names=1,header=T)
 Error: unexpected input in a-read.table(clipboard,sep=

 I will appreciate if you help me before I get crazy.
 thank you

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to have columns lined up?

A little more on topic is that there are tools for R to nicely format output.

The simplest is probably the R2wd package which will transfer objects
and text to word, you can send a matrix or data frame to word as a
nicely formatted table or wdVerbatim will send output to word but make
sure that the font is the correct type to make things line up.

If you are going to be doing a lot of this then you should investigate
the variations on Sweave (Sword, knitr, odfWeave, etc.) which allow
you to create a template file with R code and report text, then
process it to replace the R code with its output.

On Tue, Feb 7, 2012 at 9:07 PM, hithit168 ccchri...@live.com wrote:
  Hi there,

 Everytime when I paste My R output in WORD, the columns couldn't line up
 nicely like they appear in  R console. Is there a way to fix this problem?
 Thanks for any help!


 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    6     21       3    0.857  0.0764        0.707        1.000
    7     17       1    0.807  0.0869        0.636        0.977
   10     15       1    0.753  0.0963        0.564        0.942
   13     12       1    0.690  0.1068        0.481        0.900
   16     11       1    0.627  0.1141        0.404        0.851
   22      7       1    0.538  0.1282        0.286        0.789
   23      6       1    0.448  0.1346        0.184        0.712

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-have-columns-lined-up-tp4367928p4367928.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] problem subsetting data frame with variable instead of constant

2012-02-10 Thread vaneet

Hello,

I've encountered a very weird issue with the method subset(), or maybe this
is something I don't know about said method that when you're subsetting
based on the columns of a data frame you can only use constants (0.1, 2.3,
2.2) instead of variables?

Here's a look at my data frame called 'ea.cad.pwr':
*ea.ca.pwr[1:5,]
   MAF   OR  POWER
1 0.02 0.01 0.
2 0.02 0.02 0.9998
3 0.02 0.03 0.9997
4 0.02 0.04 0.9995
5 0.02 0.05 0.9993*

Here's my subset lines which finds no rows:

*power1 = subset(ea.cad.pwr, MAF == maf1  OR == odds)
power2 = subset(ea.cad.pwr, MAF == maf2  OR == odds)
*
Now when maf1 = 0.2 and odds = 1.2 it finds nothing.  I know for a fact that
there's a row with these values:
* ea.cad.pwr[1430:1432,]
 MAF   OR  POWER
1430 0.2 0.58 0.9996
1431 0.2 1.20 0.3092
1432 0.2 1.22 0.3914*

I have code working in a loop and each previous iteration the subset()
function is working fine, but in this iteration some different lines are
executed which are relevant to these variables, here they are:
*
maf1 = maf.adj - 0.01
maf2 = maf.adj + 0.01*

Basically maf.adj is always a 2 decimal number (in this case = 0.21), and
I'm computing the numbers around it by a difference of 0.01 (0.2,0.22) in
case maf.adj isn't in the table.  maf.adj is read from another dataframe,
when I use it to subset it always works fine but when I do this innocent
subtraction for some reason it doesn't work.  If I rewrite statements like
this it works:

*power1 = subset(ea.cad.pwr, MAF == 0.2  OR == odds)
power2 = subset(ea.cad.pwr, MAF == 0.22  OR == odds)
*

Even if I write this first:

maf1 = 0.2

Then:

power1 = subset(ea.cad.pwr, MAF == maf1  OR == odds)

It works as well! That's what's really confusing, how can this subtraction
mess everything up?  Please help if you can..thank you!

Vaneet

--
View this message in context: 
http://r.789695.n4.nabble.com/problem-subsetting-data-frame-with-variable-instead-of-constant-tp4376759p4376759.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem subsetting data frame with variable instead of constant

2012-02-10 Thread Petr Savicky

On Fri, Feb 10, 2012 at 08:15:39AM -0800, vaneet wrote:
 Hello,
 
 I've encountered a very weird issue with the method subset(), or maybe this
 is something I don't know about said method that when you're subsetting
 based on the columns of a data frame you can only use constants (0.1, 2.3,
 2.2) instead of variables?
 
 Here's a look at my data frame called 'ea.cad.pwr':
 *ea.ca.pwr[1:5,]
MAF   OR  POWER
 1 0.02 0.01 0.
 2 0.02 0.02 0.9998
 3 0.02 0.03 0.9997
 4 0.02 0.04 0.9995
 5 0.02 0.05 0.9993*
 
 Here's my subset lines which finds no rows:
 
 *power1 = subset(ea.cad.pwr, MAF == maf1  OR == odds)
 power2 = subset(ea.cad.pwr, MAF == maf2  OR == odds)
 *
 Now when maf1 = 0.2 and odds = 1.2 it finds nothing.  I know for a fact that
 there's a row with these values:
 * ea.cad.pwr[1430:1432,]
  MAF   OR  POWER
 1430 0.2 0.58 0.9996
 1431 0.2 1.20 0.3092
 1432 0.2 1.22 0.3914*
 
 I have code working in a loop and each previous iteration the subset()
 function is working fine, but in this iteration some different lines are
 executed which are relevant to these variables, here they are:
 *
 maf1 = maf.adj - 0.01
 maf2 = maf.adj + 0.01*
 
 Basically maf.adj is always a 2 decimal number (in this case = 0.21), and
 I'm computing the numbers around it by a difference of 0.01 (0.2,0.22) in
 case maf.adj isn't in the table.  maf.adj is read from another dataframe,
 when I use it to subset it always works fine but when I do this innocent
 subtraction for some reason it doesn't work.  If I rewrite statements like
 this it works:
 
 *power1 = subset(ea.cad.pwr, MAF == 0.2  OR == odds)
 power2 = subset(ea.cad.pwr, MAF == 0.22  OR == odds)
 *
 
 Even if I write this first:
 
 maf1 = 0.2
 
 Then:
 
 power1 = subset(ea.cad.pwr, MAF == maf1  OR == odds)
 
 It works as well! That's what's really confusing, how can this subtraction
 mess everything up?  Please help if you can..thank you!

Hi.

This may be a rounding problem. Try

  0.3 - 0.1 == 0.2

  [1] FALSE

Explicit rounding to a not too large number of decimal
digits can help.

  round(0.3 - 0.1, digits=7) == 0.2

  [1] TRUE


See also FAQ 7.31 or http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem subsetting data frame with variable instead of constant

This is likely a representation issue, as in R FAQ 7.31.

?== suggests that using identical and all.equal is a better strategy:

 x1 - 0.5 - 0.3
 x2 - 0.3 - 0.1
 x1 == x2   # FALSE on most machines
 identical(all.equal(x1, x2), TRUE) # TRUE everywhere

Sarah

On Fri, Feb 10, 2012 at 11:15 AM, vaneet vaneet.lo...@mountsinai.org wrote:
 Hello,

 I've encountered a very weird issue with the method subset(), or maybe this
 is something I don't know about said method that when you're subsetting
 based on the columns of a data frame you can only use constants (0.1, 2.3,
 2.2) instead of variables?

 Here's a look at my data frame called 'ea.cad.pwr':
 *ea.ca.pwr[1:5,]
   MAF   OR  POWER
 1 0.02 0.01 0.
 2 0.02 0.02 0.9998
 3 0.02 0.03 0.9997
 4 0.02 0.04 0.9995
 5 0.02 0.05 0.9993*

 Here's my subset lines which finds no rows:

 *power1 = subset(ea.cad.pwr, MAF == maf1  OR == odds)
 power2 = subset(ea.cad.pwr, MAF == maf2  OR == odds)
 *
 Now when maf1 = 0.2 and odds = 1.2 it finds nothing.  I know for a fact that
 there's a row with these values:
 * ea.cad.pwr[1430:1432,]
     MAF   OR  POWER
 1430 0.2 0.58 0.9996
 1431 0.2 1.20 0.3092
 1432 0.2 1.22 0.3914*

 I have code working in a loop and each previous iteration the subset()
 function is working fine, but in this iteration some different lines are
 executed which are relevant to these variables, here they are:
 *
 maf1 = maf.adj - 0.01
 maf2 = maf.adj + 0.01*

 Basically maf.adj is always a 2 decimal number (in this case = 0.21), and
 I'm computing the numbers around it by a difference of 0.01 (0.2,0.22) in
 case maf.adj isn't in the table.  maf.adj is read from another dataframe,
 when I use it to subset it always works fine but when I do this innocent
 subtraction for some reason it doesn't work.  If I rewrite statements like
 this it works:

 *power1 = subset(ea.cad.pwr, MAF == 0.2  OR == odds)
 power2 = subset(ea.cad.pwr, MAF == 0.22  OR == odds)
 *

 Even if I write this first:

 maf1 = 0.2

 Then:

 power1 = subset(ea.cad.pwr, MAF == maf1  OR == odds)

 It works as well! That's what's really confusing, how can this subtraction
 mess everything up?  Please help if you can..thank you!

 Vaneet



-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to stop a loop for?

If you want to pause for the person to look at a plot before going on
to the next plot then just do:

 par(ask=TRUE)

This will actually allow your loop to continue with calculations while
the user looks at the plot but will pause before drawing the next plot
(hitting enter in the command window or clicking on the plot window
will allow the code to continue).

If you want to pause for something other than a plot then use readline
like Michael suggests.

On Wed, Feb 8, 2012 at 12:45 PM, Juan Andres Hernandez
jhernandezcabr...@gmail.com wrote:
 Hi all, I have some time trying to find a way to stop a loop for( ) until the
 user presses the enter key or any other one and the loop can continue.
 This could
 be an example:

  library(MASS)
  data - data.frame(mvrnorm(1000,rep(0,5),Sigma=diag(1,5)))
  for(i in 1:dim(data)[2]){
  plot(density(data[,i]), main=paste('histogram',i))
  #here something like waituntil command
  }

 Thank's in advance
 Juan A. Hernandez
 Spain.

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem subsetting data frame with variable instead of constant

2012-02-10 Thread vaneet

Thanks guys, both those solutions work.  I really appreciate the help!

--
View this message in context: 
http://r.789695.n4.nabble.com/problem-subsetting-data-frame-with-variable-instead-of-constant-tp4376759p4376826.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How indices calculated in package boot

In your second example the boot function will still generate its
random indicies before your internal function calls sample, so the
seed will be different when you call sample from what you originally
set it to.  If you really want to know what the boot function does,
look at its code (it does some things to try and be more efficient
which might give different results than a simpler version).

On Wed, Feb 8, 2012 at 1:45 PM, Deng Nina nnd...@gmail.com wrote:
 Hi,there,

 I am using R package boot to bootstrap. I have one question here: does
 anybody possibly know how the boot package generates the indices which is
 used in the statistic function?

 I thought indices = sample(data, replace=TRUE), but when I replaced
 indices with this command and used boot, I got different results.

 Specifically, below are the codes for illustration.

 (1) The typical way by generating indices in the package:
 boot1 - function (data, indices)
 {
  d - data[indices]
  return(d)
 }
 AA - c(1:10)
 require(boot)
 set.seed(123)
 results1 - boot(data= AA, statistic=boot1, R=100)

 (2) The alternative way by calculating indices myself:
 boot2 - function (data,indices)
 {
  indices - sample(data, replace=TRUE)
  d - data[indices]
  return(d)
  }
 AA - c(1:10)
 set.seed(123)
 results2 - boot(data= AA, statistic=boot2, R=100)

 When I looked up using results1$t and results2$t, I had totoally different
 bootstrap samples. I found this even had great impacts on the results in my
 study. Does the second approach have any problem? Anyone could provide any
 inputs on this? Thank you very much in advance!
 Regards
 Nina

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] function arrows.circular not working

2012-02-10 Thread Chosid, David (MISC)

I have started using the circular package but it is not recognizing the 
function arrows.circular.  I attempted to use the example provided in the 
circular manual.  Here is the example code using the circular package:

  plot(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

My error is:  Error: could not find function arrows.circular

Any help would be greatly appreciated.  Thanks.

David Chosid
Conservation Engineering
MA Division of Marine Fisheries
1213 Purchase St., 3rd Floor
New Bedford, MA 02740
508-990-2860
david.cho...@state.ma.usmailto:david.cho...@state.ma.us


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] function arrows.circular not working

Your sessionInfo() would be helpful. Also, just to check: you did do
library(circular)
right?

Sarah

On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC)
david.cho...@state.ma.us wrote:
 I have started using the circular package but it is not recognizing the 
 function arrows.circular.  I attempted to use the example provided in the 
 circular manual.  Here is the example code using the circular package:

  plot(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
    x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

 My error is:  Error: could not find function arrows.circular

 Any help would be greatly appreciated.  Thanks.

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] function arrows.circular not working

2012-02-10 Thread Jorge I Velez

Hi David,

You need to load the package before running your code:

# install.packages('circular')
require(circular)
 plot(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
   x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

HTH,
Jorge.-


On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC)  wrote:

 I have started using the circular package but it is not recognizing the
 function arrows.circular.  I attempted to use the example provided in the
 circular manual.  Here is the example code using the circular package:

  plot(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

 My error is:  Error: could not find function arrows.circular

 Any help would be greatly appreciated.  Thanks.

 David Chosid
 Conservation Engineering
 MA Division of Marine Fisheries
 1213 Purchase St., 3rd Floor
 New Bedford, MA 02740
 508-990-2860
 david.cho...@state.ma.usmailto:david.cho...@state.ma.us


[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help needed please

2012-02-10 Thread ilai

Your script is rather inefficient with spurious cbind calls. Any
particular reason not to use
?ar directly ?

Call:
ar.yw.default(x = simtimeseries, order.max = 4)

Coefficients:
  1234
 1.9440  -1.9529   0.8450  -0.2154

Order selected 4  sigma^2 estimated as  15.29

To repeat the sim, you could use a for() loop but ?sapply is better:

out- sapply(1:100,function(...){
  simtimeseries - arima.sim(n=1024,list(order=c(4,0,0),
ar=c(2.7607, -3.8106, 2.6535,
-0.9258),sd=sqrt(1)))
  aryule - ar.yw(simtimeseries,order.max=4)
  c( c(aryule$ar,NA)[1:4] , aryule$var.pred )
  }
)
rowMeans(out[1:4,])   # mean phi(1),...,4 see ?rowMeans for dealing with NA's
mean(out[5,])# mean sig^2

Cheers




On Fri, Feb 10, 2012 at 6:42 AM, Jaymin Shah jayminsh...@hotmail.com wrote:
 I have coded a time series from simulated data:

 simtimeseries - arima.sim(n=1024,list(order=c(4,0,0),ar=c(2.7607, -3.8106, 
 2.6535,  -0.9258),sd=sqrt(1)))
 #show roots are outside unit circle
 plot.ts(simtimeseries, xlab=, ylab=, main=Time Series of Simulated Data)

 # Yule 
 

 q1 - cbind(simtimeseries[1:1024])
 q2 - t(q1)%*%q1
 s0 - q2/1204
 r1 - cbind(simtimeseries[1:1023])
 r2 - cbind(simtimeseries[2:1024])
 r3 - t(r1)%*%r2
 s1 - r3/1204
 t1 - cbind(simtimeseries[1:1022])
 t2 - cbind(simtimeseries[3:1024])
 t3 - t(t1)%*%t2
 s2 - t3/1204
 u1 - cbind(simtimeseries[1:1021])
 u2 - cbind(simtimeseries[4:1024])
 u3 - t(u1)%*%u2
 s3 - u3/1204
 v1 - cbind(simtimeseries[1:1020])
 v2 - cbind(simtimeseries[5:1024])
 v3 - t(v1)%*%v2
 s4 - v3/1204

 i0 - c(s0,s1,s2,s3)
 i1 - c(s1,s0,s1,s2)
 i2 - c(s2,s1,s0,s1)
 i3 - c(s3,s2,s1,s0)

 gamma - cbind(i0,i1,i2,i3)
 eta -c(s1,s2,s3,s4)
 inversegamma - solve(gamma)
 phihat - inversegamma%*%eta
 phihat

 Phihat - cbind(phihat)
 s - c(s1,s2,s3,s4)
 S - cbind(s)
 sigmasquaredyule - s0 - (t(Phihat)%*%S)
 sigmasquaredyule



 I did a yule walker estimate on the simulated data and wanted to work out phi 
 hat which is a vector of 4 values and sigmasquaredyule which is one value. 
 However,  I want to run the simulated data 100 times i.e. in a for loop and 
 then take the averages of the phi hat values and sigmasquaredyule value.

 How would i repeat this simulated time series lots of times (e.g. a 100 
 times) and store the average value of phi hat and sigmasquaredyule.

 Thank you


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] function arrows.circular not working

2012-02-10 Thread Chosid, David (MISC)

Yes, sorry to not be clear.  library(circular) was run.

From: Jorge I Velez [mailto:jorgeivanve...@gmail.com]
Sent: Friday, February 10, 2012 12:03 PM
To: Chosid, David (FWE)
Cc: r-help@r-project.org
Subject: Re: [R] function arrows.circular not working

Hi David,

You need to load the package before running your code:

# install.packages('circular')
require(circular)
 plot(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
   x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

HTH,
Jorge.-

On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC)  wrote:
I have started using the circular package but it is not recognizing the 
function arrows.circular.  I attempted to use the example provided in the 
circular manual.  Here is the example code using the circular package:

 plot(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1))
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2)
 arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
   x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

My error is:  Error: could not find function arrows.circular

Any help would be greatly appreciated.  Thanks.

David Chosid
Conservation Engineering
MA Division of Marine Fisheries
1213 Purchase St., 3rd Floor
New Bedford, MA 02740
508-990-2860tel:508-990-2860
david.cho...@state.ma.usmailto:david.cho...@state.ma.usmailto:david.cho...@state.ma.usmailto:david.cho...@state.ma.us

   [[alternative HTML version deleted]]

__
R-help@r-project.orgmailto:R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Outlier removal techniques

2012-02-10 Thread S Ellison

 

 -Original Message-
 I wonder why it is still standard practice in some circles to 
 search for outliers as opposed to using robust/resistent methods.  

At the risk of extending an old debate and driving us off list topic, here are 
three possible reasons:
i) Identifying outliers is important when you want to find possible mistakes in 
measurement or data entry - so irrespective of whether you use robust methods, 
you probably want to ask questions like 'why has that result been entered as 
almost exactly 1000 times the value I expected?' [typically a unit error, btw). 
And although graphical outlier checking is the obvious way to do that, eyeballs 
see oddity in chance; an outlier test can help you distinguish oddity from 
chance and save some (arguably) unnecessary follow-up. 

ii) because supervised outlier rejection at around the 99% level performs - for 
simple problems - about as well as Huber's with c set to 1.5 and is a lot 
easier to explain to, er, people who don't understand iterative numerical 
methods.

iii) Because it's written into some international Standards for statistical 
processing of data (ie, it's standard practice because it's Standard practice).

iv) because you can't do robust analysis in Excel* 

Not that all these are necesarily _good_ reasons ... ;-)

However, I do NOT understand why schools in the UK teach physics students that 
outliers should automatically and always be thrown out; that's a much larger 
leap.

*You can actually; with R or several add-ins. But that is off topic.
***
This email and any attachments are confidential. Any use...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Split dataframe into new dataframes

I think one of your problems (the others have been addressed by
others) is that you want the expression x$y to represent a column of x
whose name is stored in y (not the name y itself).  The problem here
is that the $ notation is a magical shortcut and like any other magic
if used incorrectly is likely to do the programmatic equivalent of
turning yourself into a toad.  It is better to use '[[' instead of the
shortcut, try replacing x$y with x[[y]] and see if that fixes that
problem.

On Wed, Feb 8, 2012 at 2:11 PM, Johannes Radinger jradin...@gmx.at wrote:
 Hi,

 I want to split a dataframe based on a grouping variable (in one column). The 
 resulting new
 dataframes should be stored in a new variable. I tried to split the dataframe 
 using split() and
 to store it using a FOR loop, but thats not working so far:

 df - data.frame(A=c(A1,A1,A2,A2),B=seq(1:4))

 Fsplit - function(x,y){
        ls - split(x,f=x$y)
        for (i in names(ls)){
                i - ls$i
        }
 }

 Fsplit(df,A) #1st argument is dataframe to split, 2nd argument grouping 
 variable


 Any suggestions how to get that done?

 Best regards
 Johannes
        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] function arrows.circular not working

Hi,

On Fri, Feb 10, 2012 at 12:15 PM, Chosid, David (MISC)
david.cho...@state.ma.us wrote:
 Yes, sorry for not being more clear.  Here is the sessionInfo():

 sessionInfo()
 R version 2.9.1 (2009-06-26)
 i386-pc-mingw32

That is definitely the place to start. Your version of R is 2.5 years old,
and circular is up to version 0.4-3 on CRAN. I just checked, and circular
0.3-8 doesn't *have* arrows.circular. So you must be using some
documentation from the internet for a newer version, rather than using
the docs that go with what you have installed on your computer.

You need to update R and your packages.

Sarah


 locale:
 LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
 States.1252;LC_MONETARY=English_United 
 States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

 attached base packages:
 [1] grid      tcltk     stats     graphics  grDevices utils     datasets
 [8] methods   base

 other attached packages:
 [1] relimp_1.0-1    Rcmdr_1.4-10    car_1.2-14      circular_0.3-8
 [5] boot_1.2-37     lattice_0.17-25 RODBC_1.3-0




 -Original Message-
 From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
 Sent: Friday, February 10, 2012 12:02 PM
 To: Chosid, David (FWE)
 Cc: r-help@r-project.org
 Subject: Re: [R] function arrows.circular not working

 Your sessionInfo() would be helpful. Also, just to check: you did do
 library(circular)
 right?

 Sarah

 On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) 
 david.cho...@state.ma.us wrote:
 I have started using the circular package but it is not recognizing the 
 function arrows.circular.  I attempted to use the example provided in the 
 circular manual.  Here is the example code using the circular package:

  plot(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
 col=2)
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
    x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

 My error is:  Error: could not find function arrows.circular

 Any help would be greatly appreciated.  Thanks.


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] debug in a loop

2012-02-10 Thread ikuzar

Hi, 

I'd like to debug in a loop (using debug() and browser() etc but not print()
). I'am looking for the first occurence of NA.
For instance:

tab = c(1:300)
tab[250] = NA
len = length(tab)
for (i in 1:len){
   if(i != len){
 tab[i] = tab[i]+tab[i+1]
   }
}

I do not want to do Browse[2] n for each step ... I'd like to declare a
browser() in the loop with a condition. But how to write stop running
when you encounter NA ?

Thanks for your help

--
View this message in context: 
http://r.789695.n4.nabble.com/debug-in-a-loop-tp4376563p4376563.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Formatting Y axis.

2012-02-10 Thread sock.o

I've looked around and I just can't find anything that will work for my
needs. This is a bit of a 2 part question but pertaining to the same topic
so bare with me. 

The first is with my qq plot. On the Y axis of my qq plot it'll have my
sample quantities but because my data is log-normal it'll show numbers
between 0 - 5 (depending on the data). I'd like to know how to get it,
instead of displaying 0-5 to display actual values so that way when reading
it I know where my potential problems are. Would just save a lot of time and
be an easier more understandable read. Here's sample code to plug in and
see. 

x - rlnorm(400, 2.3, .8)
y - rlnorm(400, 3, 1.6)

plot.new()
q1 - qqnorm(log1p(y))
q2 - qqnorm(log1p(x))
points( q1, col = red)
points( q2, col = blue)
abline( qqline(log1p(y), col = red) )
abline( qqline(log1p(x), col = blue) )
legend( bottomright, inset = 0.02, title = Loc_ID, 
c(Sayincom,
Sayout), fill = c( red, blue), horiz = TRUE)

So what I want is for the y axis. or even secondary y axis if it's lined up
properly to read the actual values rather than the logged values. 

The second part of this is in regards to my boxplots. In order to fit all of
the data in a readable manner I've been making log = y. Which is fine, but
when trying to estimate the difference via notches it's difficult to
estimate how far it really is based on the axis being logged. You can use
the same x as above, or y, whichever floats your boat. 

boxplot ( x,  notch = TRUE, log = y, boxwex = .50)

Versus 

boxplot ( x,  notch = TRUE, boxwex = .50)

What I need from this is for the ability to zoom in on a particular
location. Rather than be forced to look at things in a log format on my y
axis to just be able to say, ok I have my mean at 25 and my standard
deviation is 10 so I want to concentrate this boxplot between 5 and 45 to
view 2 standard deviations. (That whole mean thing and standard deviations
are arbitrary numbers but the concept remains)

Any help would be much appreciated. If there are particular packages
required for any suggestions please include their name in your response. 

If you need actual imaging of what the actual graphs I'm working with look
like as opposed to the randomly generated values that I'm supplying then
just say so and I'll post the actual graphs I'm trying to work with. 

Sincerely, 

Sock


--
View this message in context: 
http://r.789695.n4.nabble.com/Formatting-Y-axis-tp4376843p4376843.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Find exit status inside on.exit

2012-02-10 Thread McGehee, Robert

Hello,
Does anyone know if it is possible to find out whether a calling function 
exited with an error or not inside of an on.exit() call? I'd like to use the 
same error cleanup function inside of several functions, and for aesthetic 
reasons would prefer not to surround the body of all of these function with a 
tryCatch.

Something like this:

isError - function() {
parentError - ???
if (!parentError) cat(No errors here!\n)
}

X - function() {
on.exit(isError())
1+1
}

 X()
[1] 2
No errors here!

Thanks,
Robert

Robert McGehee, CFA
Geode Capital Management, LLC
One Post Office Square, 28th Floor | Boston, MA | 02109
Direct: (617)392-8396

This e-mail, and any attachments hereto, are intended fo...{{dropped:10}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Split matrix into square submatices

2012-02-10 Thread xiddw

Thanks to Peter and Gabriel for their comments,

The first way which Peter commented was my first approach, however as I use
it for a 'large' matrix (in my case nrow = ncol = 512) and I want to split
it into very small matrices (e.g. nrow = ncol = {4, 8, 16}) both nested
loops I think are a quite expensive and that's why I was looking for a
better way.

The Gabor's approach works perfect for my needs, thanks :)

--
View this message in context: 
http://r.789695.n4.nabble.com/Split-matrix-into-square-submatices-tp4375563p4376952.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] debug in a loop

2012-02-10 Thread Justin Haynes

You can add

if(is.na(tab[i])) browser()

or

if(is.na(tab[i])) break

see inline

On Fri, Feb 10, 2012 at 7:22 AM, ikuzar raz...@hotmail.fr wrote:

 Hi,

 I'd like to debug in a loop (using debug() and browser() etc but not
 print()
 ). I'am looking for the first occurence of NA.
 For instance:

 tab = c(1:300)
 tab[250] = NA
 len = length(tab)
 for (i in 1:len){
   if(i != len){

   if(is.na(tab[i])) browser()

 tab[i] = tab[i]+tab[i+1]
   }
 }

 I do not want to do Browse[2] n for each step ... I'd like to declare a
 browser() in the loop with a condition. But how to write stop running
 when you encounter NA ?

 Thanks for your help

 --
 View this message in context:
 http://r.789695.n4.nabble.com/debug-in-a-loop-tp4376563p4376563.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] function arrows.circular not working

It's a good idea to acknowledge that you've found a
solution on the R-help list, rather than just to me.

That way the answer appears in the list archives, and
other people will know you no longer need help with
this problem.

Sarah

On Fri, Feb 10, 2012 at 12:57 PM, Chosid, David (MISC)
david.cho...@state.ma.us wrote:
 Ah.  Thanks.  I didn't realize that I needed an updated R... Just the 
 library.  Works now.

 -Original Message-
 From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
 Sent: Friday, February 10, 2012 12:32 PM
 To: Chosid, David (FWE); r-help
 Subject: Re: [R] function arrows.circular not working

 Hi,

 On Fri, Feb 10, 2012 at 12:15 PM, Chosid, David (MISC) 
 david.cho...@state.ma.us wrote:
 Yes, sorry for not being more clear.  Here is the sessionInfo():

 sessionInfo()
 R version 2.9.1 (2009-06-26)
 i386-pc-mingw32

 That is definitely the place to start. Your version of R is 2.5 years old, 
 and circular is up to version 0.4-3 on CRAN. I just checked, and circular
 0.3-8 doesn't *have* arrows.circular. So you must be using some documentation 
 from the internet for a newer version, rather than using the docs that go 
 with what you have installed on your computer.

 You need to update R and your packages.

 Sarah


 locale:
 LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
 States.1252;LC_MONETARY=English_United
 States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

 attached base packages:
 [1] grid      tcltk     stats     graphics  grDevices utils
 datasets [8] methods   base

 other attached packages:
 [1] relimp_1.0-1    Rcmdr_1.4-10    car_1.2-14      circular_0.3-8 [5]
 boot_1.2-37     lattice_0.17-25 RODBC_1.3-0




 -Original Message-
 From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
 Sent: Friday, February 10, 2012 12:02 PM
 To: Chosid, David (FWE)
 Cc: r-help@r-project.org
 Subject: Re: [R] function arrows.circular not working

 Your sessionInfo() would be helpful. Also, just to check: you did do
 library(circular)
 right?

 Sarah

 On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) 
 david.cho...@state.ma.us wrote:
 I have started using the circular package but it is not recognizing the 
 function arrows.circular.  I attempted to use the example provided in the 
 circular manual.  Here is the example code using the circular package:

  plot(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1))
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
 col=2)
  arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10),
    x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3)

 My error is:  Error: could not find function arrows.circular

 Any help would be greatly appreciated.  Thanks.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] a) t-tests on loess splines; b) linear models, type II SS for unbalanced ANOVA

2012-02-10 Thread Peter Davenport

Dear all,

I have some questions regarding the validity an implementation of
statistical tests based on linear models and loess. I've searched the
R-help arhives and found several informative threads that related to my
questions, but there are still a few issues I'm not clear about. I'd be
grateful for guidance.

Background and data set:
I wish to compare the growth and metabolism of 2 strains of bacteria. I
have grown the 2 strains in 3 replicates on 2 different occasions and on
each occasion taken hourly measurements of the population density of
bacteria (represented by od (optical density)) and the concentrations of
several chemicals, such as glycerol, in their growth medium (environment).

The data looks like this:

 head(df.yx.t)
   uid  cid date  g tod   glycerolalanine
66 Ma,1   25 Ma 1 0.1428571 3025750049 7843841013
77 Ma,1   25 Ma 2 0.4542857 3026668036 7902016686
88 Ma,1   25 Ma 3 1.4542857 3086406597 8017589237
99 Ma,1   25 Ma 4 2.587 2821935918 6595302338
10  10 Ma,1   25 Ma 5 3.093 2674142017 6144141491
11  11 Ma,1   25 Ma 6 3.600 3026824144 7009788499

 summary(df.yx.t)
  uid   cid dategt
odglycerolalanine
 6  :  1   Ma,1   :10   25:60   Ma   :60   1  :12   Min.
:0.1171   Min.   :7.305e+08   Min.   :9.858e+07
 7  :  1   Ma,2   :10   26:60   RILI2:60   2  :12   1st
Qu.:1.3921   1st Qu.:1.757e+09   1st Qu.:3.199e+09
 8  :  1   Ma,3   :10  3  :12   Median
:3.8042   Median :2.541e+09   Median :5.664e+09
 9  :  1   Ma,4   :10  4  :12   Mean
:3.7822   Mean   :2.354e+09   Mean   :5.264e+09
 10 :  1   Ma,5   :10  5  :12   3rd
Qu.:5.9495   3rd Qu.:3.026e+09   3rd Qu.:7.699e+09
 11 :  1   Ma,6   :10  6  :12   Max.
:8.7375   Max.   :3.296e+09   Max.   :8.501e+09
 (Other):114   (Other):60
(Other):48

uid = unique id for an observation
cid = culture id (a within subjects factor identifying each bacterial
culture)
date = date of culture; data was collected in 2 batches, with a
balanced split of observations (3 replicates of each genotype in each batch)
g = genotype (the name of the bactrial strain)
t = time (1-10 hours)
od = optical density (a proxy for poulation density of the bacterial)
glycerol = glycerol concentration
alanine = alanine concentration

I have tested whether Genotype affects the growth (od) of the bacteria
using lm() and anova().

#define the model
m1-lm(log10(od)~date*g*t,data=df.yx.t)
#perform anova
 anova(m1)
Analysis of Variance Table

Response: log10(od)
  Df Sum Sq Mean Sq   F valuePr(F)
date   1  0.006  0.00565.3763   0.02297 *
g  1  0.107  0.1066  103.2951 4.646e-16 ***
t  9 33.646  3.7385 3620.9069  2.2e-16 ***
date:g 1  0.003  0.00302.8731   0.09396 .
date:t 9  0.015  0.00171.6542   0.11417
g:t9  0.021  0.00232.2329   0.02796 *
date:g:t   9  0.005  0.00050.4950   0.87383
Residuals 80  0.083  0.0010
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

So far so good. (Although I do wonder whether I should be using a within
subjects predictor variable (e.g. via car::Anova).) However, I also want to
test whether genotype affects the concentrations of metabolites, e.g.
glycerol, in the growth medium at a given population density. I envisage
testing this by one of 2 methods, neither of which I am confident is
correct, despite digging aound in various fora and statistics texts.

Method 1. Loess splines
I could fit a loess curve to glycerol~od, (the date effect disappears
when plotting against od rather than t).

m1-loess(glycerol~od, data=df.yx.t[df.yx.t$g==Ma,])
m2-loess(glycerol~od, data=df.yx.t[df.yx.t$g==RILI2,])

However, I'm not confident that I can perform valid statistic tests
based on this.

I could try to test the whether there is a significant difference in
glycerol concentration at any given od as follows:

#test for a difference at od=7
v.pred-7
pred1-predict(m1,newdata=data.frame(od=v.pred),se=T)
pred2-predict(m2,newdata=data.frame(od=v.pred),se=T)


 
dfit-data.frame(g=c(Ma,RILI2),fit=c(pred1$fit,pred2$fit),se=c(pred1$fit,pred2$fit))
alpha-0.05
dfit$lo-with(dfit,fit-se*abs(qnorm(alpha/2)))
dfit$hi-with(dfit,fit+se*abs(qnorm(alpha/2)))

Is this legitimate?

I might also try to test whether the curves are significantly different
using anova(m1,m2), though again I'm not sure of the validity of this
approach since the 2 models are based on different (and different numbers
of) observations.



Method 2. Linear models

[R] Schwefel Function Optimization

2012-02-10 Thread Vartanian, Ara

All,

I am looking for an optimization library that does well on something as chaotic 
as the Schwefel function:

schwefel - function(x) sum(-x * sin(sqrt(abs(x

With these guys, not much luck:

 optim(c(1,1), schwefel)$value
[1] -7.890603
 optim(c(1,1), schwefel, method=SANN, control=list(maxit=1))$value
[1] -28.02825
 optim(c(1,1), schwefel, lower=c(-500,-500), upper=c(500,500), 
 method=L-BFGS-B)$value
[1] -7.890603
 optim(c(1,1), schwefel, method=BFGS)$value
[1] -7.890603
 optim(c(1,1), schwefel, method=CG)$value
[1] -7.890603

All trapped in local minima. I get the right answer when I pick a starting 
point that's close:

 optim(c(400,400), schwefel, lower=c(-500,-500), upper=c(500,500), 
 method=L-BFGS-B)$value
[1] -837.9658

Of course I can always roll my own:

r - vector()
for(i in 1:1000) {
  x - runif(2, -500,500)
  m - optim(x, schwefel, lower=c(-500,-500), upper=c(500,500), 
method=L-BFGS-B)
  r - rbind(r, c(m$par, m$value))
}

And this does fine. I'm just wondering if this is the right approach, or if 
there is some other package that wraps this kind of multi-start up so that the 
user doesn't have to think about it.

Best,

Ara


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] range and anisotropy with RandomFields

2012-02-10 Thread jusin

Hello, 

I am presently trying to get a feel for the various packages out there that
allow me to both analyze and simulate random fields. The package
RandomFields is nice, but there are still a few aspects of its
implementation that are confusing to me and I was hoping someone could help
clarify things for me. It could also be that my questions reflect a lack of
knowledge pertaining to random fields in general and not the RandomFields
package specifically, in which case someone can set me straight. 

I like the versatility of the extended definition form that can be used in
RandomFields. For example, specifying an isotropic exponential model with a
nugget of 1 as: 

modelA=list(+, 
  list($, var=2, scale=2, list(exp)),
  list($, var=1, list(nugget))
)

Ultimately I would like to work with anisotropic models (and data), and this
is where I get confused. I can specify a model with (geographic) anisotropy
by defining an anisotropy matrix, m and incorporating that into the
definition: 

m = matrix( c(1,0,0,0.1), nrow=2)
modelA=list(+, 
  list($, var=2, anis=m, list(exp)),
  list($, var=1, list(nugget))
)

First, nowhere in the RandomFields documentation does it say how  the
anisotropy matrix is actually defined, so I am assuming it is the usual  

A = [  1  0 ]  [  cos(phi)  -sin(phi)]
   0  R   sin(phi)cos(phi)

Where R is the ratio of the major range/minor range, and phi is the angle of
rotation.  Can anyone confirm this? 

Also, there now  seems to be no way to explicitly specify the range of the
major axis (it states this in the documentation and an error will be
returned if I do try to set e.g. scale=2 in the definition) .  Thus when I
use e.g. GaussRF() to simulate a field using the above anisotropic model
definition I have no idea how the range (scale) is being determined.  Does
anyone now how RandomFields works in this context? 

Thanks. 


--
View this message in context: 
http://r.789695.n4.nabble.com/range-and-anisotropy-with-RandomFields-tp4377104p4377104.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Discrete Event Simulation problem

2012-02-10 Thread jism7690

Hi All,

I am attempting to simulation an inventory model on R and I am having some
problems. I believe the problem is when I define my demand rate is stays
constant throughout so when I do need to reorder the model does not
recognise it as I have the initial supply arrival time set to infinity at
the start as no order has been played but I have an if statement saying, if
the level falls below a certain point then an order should be placed, I have
the expected time to order arrival set to be before the the sale but this is
not working.

Any Help would be appreciated. Thanks

maxStock = 100
minStock = 20
t.max=1100.0

LAST = t.max
START = 0

GetDemand-function() START + runif(1,min=0,max=2)

main - function(t.max,maxStock,minStock)
{
index = 0
  t.current = START    Starting Conditions
t.demand = START
t.supply = START
inventory = 50
order_costs = 0
storage_costs = 0
sales = 0
sum = list(inventory = 50,order_costs = 0,storage_costs = 0,sales =0)

while(index  LAST){
index = index + 1
t.demand = GetDemand()  ### expected time to next sale
t.supply = Inf  ### expected time to arrival of order, Infinity 
as order
has not been placed
t.next =min(t.demand,t.supply)###next event either sale or supply is
the one with imminent arrival
k = maxStock - inventory

  if(inventory  minStock)
  {
t.supply = Inf  ### expected time to arrival of order, Infinity 
as
order has not been placed
  }
  if(inventory  0)
  storage_costs = (t.next-t.current)*0.10*inventory
  t.current = t.next
  
 if (inventory  minStock)
   {###Need to Order
 k = maxStock - inventory
 order_costs = 50 + 0.02*k
 sum$order_costs = sum$order_costs + order_costs
 t.supply =  t.current + 1.0
}

if(t.current ==t.demand)
  {
  sum$inventory = sum$inventory - 1.0   
   
Sale made
sum$sales = sum$sales + 1.0
t.demand = runif(1,min=0,max=2)
}

if(t.current == t.supply)
  { 
Order Arrives
sum$inventory = sum$inventory + k
k = 0
t.supply = Inf
}

  if(inventory  maxStock)
  {
  k = maxStock - inventory
  sum$storage_costs = sum$storage_costs + storage_costs
  sum$order_costs = sum$order_costs + order_costs
  sum$inventory = sum$invneotry + k - sum$sales
  sum$sales = sum$sales + sales
  sum$sales = sum$sales + sales
  }

}
sis = list(Time = index,StorageCosts =sum$storage_costs,OrderCosts=
sum$order_costs, FinalInventoryLevel=sum$inventory,sales=sum$sales ,
AverageCosts =((sum$order_costs + sum$storage_costs)/index))
return(sis)
}
main()

--
View this message in context: 
http://r.789695.n4.nabble.com/Discrete-Event-Simulation-problem-tp4377276p4377276.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Importing a CSV file

2012-02-10 Thread MacQueen, Don

I think you must have given the path to the file wrong.

Assuming you are using R.app, try

  infile - file.choose()

And then give infile to the read.csv() function.

Then print the value of infile to find out what the path really is.

Probably, you wanted
  '/Users/sezginozcan/Downloads/beer.data.csv'

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 2/9/12 7:20 PM, sezgin ozcan szgoz...@gmail.com wrote:

I have been trying to import a csv file to r. but I get the same message
everytime. the message is

Error in file(file, rt) : cannot open the connection
In addition: Warning message:
In file(file, rt) :
  cannot open file 'Users:/sezginozcan/Downloads/beer.data.csv': No such
file or directory

I use mac.
I tried this command also
a-read.table(clipboard,sep=”\t”,row.names=1,header=T)
Error: unexpected input in a-read.table(clipboard,sep=‚

I will appreciate if you help me before I get crazy.
thank you

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Custom caret metric based on prob-predictions/rankings

Sorry for not being more clear - I'm interested in accessing these
indices from within the trainControl summaryFunction, not afterward
(from the train object).

As for the weights, I'm referring to the weights argument passed into
train.

On Fri, Feb 10, 2012 at 5:50 AM, Max Kuhn mxk...@gmail.com wrote:
 I think you need to read the man pages and the four vignettes. A lot
 of your questions have answers there.

 If you don't specify the resampling indices, they ones generated for
 you are saved in the train object:

 data(iris)
 TrainData - iris[,1:4]
 TrainClasses - iris[,5]

 knnFit1 - train(TrainData, TrainClasses,
 +                  method = knn,
 +                  preProcess = c(center, scale),
 +                  tuneLength = 10,
 +                  trControl = trainControl(method = cv))
 Loading required package: class

 Attaching package: ‘class’

 The following object(s) are masked from ‘package:reshape’:

    condense

 Warning message:
 executing %dopar% sequentially: no parallel backend registered
 str(knnFit1$control$index)
 List of 10
  $ Fold01: int [1:135] 1 2 3 4 5 6 7 9 10 11 ...
  $ Fold02: int [1:135] 1 2 3 4 5 6 8 9 10 12 ...
  $ Fold03: int [1:135] 1 3 4 5 6 7 8 9 10 11 ...
  $ Fold04: int [1:135] 1 2 3 5 6 7 8 9 10 11 ...
  $ Fold05: int [1:135] 1 2 3 4 6 7 8 9 11 12 ...
  $ Fold06: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
  $ Fold07: int [1:135] 1 2 3 4 5 7 8 9 10 11 ...
  $ Fold08: int [1:135] 2 3 4 5 6 7 8 9 10 11 ...
  $ Fold09: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
  $ Fold10: int [1:135] 1 2 4 5 6 7 8 10 11 12 ...

 There is also a savePredictions argument that gives you the hold-out results.

 I'm not sure which weights you are referring to.

 On Fri, Feb 10, 2012 at 4:38 AM, Yang Zhang yanghates...@gmail.com wrote:
 Actually, is there any way to get at additional information beyond the
 classProbs?  In particular, is there any way to find out the
 associated weights, or otherwise the row indices into the original
 model matrix corresponding to the tested instances?

 On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang yanghates...@gmail.com wrote:
 Oops, found trainControl's classProbs right after I sent!

 On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 I'm dealing with classification problems, and I'm trying to specify a
 custom scoring metric (recall@p, ROC, etc.) that depends on not just
 the class output but the probability estimates, so that caret::train
 can choose the optimal tuning parameters based on this metric.

 However, when I supply a trainControl summaryFunction, the data given
 to it contains only class predictions, so the only metrics possible
 are things like accuracy, kappa, etc.

 Is there any way to do this that I'm looking?  If not, could I put
 this in as a feature request?  Thanks!

 --
 Yang Zhang
 http://yz.mit.edu/



 --
 Yang Zhang
 http://yz.mit.edu/



 --
 Yang Zhang
 http://yz.mit.edu/

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 --

 Max



-- 
Yang Zhang
http://yz.mit.edu/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Custom caret metric based on prob-predictions/rankings

(I couldn't find answers to this question in the documentation)

On Fri, Feb 10, 2012 at 11:59 AM, Yang Zhang yanghates...@gmail.com wrote:
 Sorry for not being more clear - I'm interested in accessing these
 indices from within the trainControl summaryFunction, not afterward
 (from the train object).

 As for the weights, I'm referring to the weights argument passed into
 train.

 On Fri, Feb 10, 2012 at 5:50 AM, Max Kuhn mxk...@gmail.com wrote:
 I think you need to read the man pages and the four vignettes. A lot
 of your questions have answers there.

 If you don't specify the resampling indices, they ones generated for
 you are saved in the train object:

 data(iris)
 TrainData - iris[,1:4]
 TrainClasses - iris[,5]

 knnFit1 - train(TrainData, TrainClasses,
 +                  method = knn,
 +                  preProcess = c(center, scale),
 +                  tuneLength = 10,
 +                  trControl = trainControl(method = cv))
 Loading required package: class

 Attaching package: ‘class’

 The following object(s) are masked from ‘package:reshape’:

    condense

 Warning message:
 executing %dopar% sequentially: no parallel backend registered
 str(knnFit1$control$index)
 List of 10
  $ Fold01: int [1:135] 1 2 3 4 5 6 7 9 10 11 ...
  $ Fold02: int [1:135] 1 2 3 4 5 6 8 9 10 12 ...
  $ Fold03: int [1:135] 1 3 4 5 6 7 8 9 10 11 ...
  $ Fold04: int [1:135] 1 2 3 5 6 7 8 9 10 11 ...
  $ Fold05: int [1:135] 1 2 3 4 6 7 8 9 11 12 ...
  $ Fold06: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
  $ Fold07: int [1:135] 1 2 3 4 5 7 8 9 10 11 ...
  $ Fold08: int [1:135] 2 3 4 5 6 7 8 9 10 11 ...
  $ Fold09: int [1:135] 1 2 3 4 5 6 7 8 9 10 ...
  $ Fold10: int [1:135] 1 2 4 5 6 7 8 10 11 12 ...

 There is also a savePredictions argument that gives you the hold-out results.

 I'm not sure which weights you are referring to.

 On Fri, Feb 10, 2012 at 4:38 AM, Yang Zhang yanghates...@gmail.com wrote:
 Actually, is there any way to get at additional information beyond the
 classProbs?  In particular, is there any way to find out the
 associated weights, or otherwise the row indices into the original
 model matrix corresponding to the tested instances?

 On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang yanghates...@gmail.com wrote:
 Oops, found trainControl's classProbs right after I sent!

 On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 I'm dealing with classification problems, and I'm trying to specify a
 custom scoring metric (recall@p, ROC, etc.) that depends on not just
 the class output but the probability estimates, so that caret::train
 can choose the optimal tuning parameters based on this metric.

 However, when I supply a trainControl summaryFunction, the data given
 to it contains only class predictions, so the only metrics possible
 are things like accuracy, kappa, etc.

 Is there any way to do this that I'm looking?  If not, could I put
 this in as a feature request?  Thanks!

 --
 Yang Zhang
 http://yz.mit.edu/



 --
 Yang Zhang
 http://yz.mit.edu/



 --
 Yang Zhang
 http://yz.mit.edu/

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 --

 Max



 --
 Yang Zhang
 http://yz.mit.edu/



-- 
Yang Zhang
http://yz.mit.edu/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] passing an extra argument to an S3 generic

2012-02-10 Thread Michael Friendly


On 2/9/2012 6:24 PM, ilai wrote:

You do not provide mlm.influence() so your code can't be reproduced.

Or did you mean to put lm.influence() in the formals to your hatvalues.mlm ?

If yes, then 1) you have a typo 2) lm.influence doesn't allow you to
pass on arguments, maybe try influence.lm instead.


No, I've written an S3 method, influence.mlm and a computational method,
mlm.influence, both of which take an m= argument.  I didn't post all the 
code because I thought the question might have an easy answer based on 
what I provided.


I include all the code and a test case in the attached .txt file that
can be sourced.

-Michael



Elai

On Thu, Feb 9, 2012 at 1:42 PM, Michael Friendlyfrien...@yorku.ca  wrote:

I'm trying to write some functions extending influence measures to
multivariate linear models and also
allow subsets of size m=1 to be considered for deletion diagnostics.  I'd
like these to work roughly parallel
to those functions for the univariate lm where only single case deletion
(m=1) diagnostics are considered.

Corresponding to stats::hatvalues.lm, the S3 method for class lm objects,


hatvalues-function (model, ...)

UseMethod(hatvalues)


hatvalues.lm-

function (model, infl = lm.influence(model, do.coef = FALSE),...)
{
hat- infl$hat
names(hat)- names(infl$wt.res)
hat
}

I have, for class mlm objects

hatvalues.mlm- function(model, m=1, infl=mlm.influence(model, m=m, do.coef
= FALSE), ...)
{
hat- infl$H
m- infl$m
names(hat)- if(m==1) infl$subsets else apply(infl$subsets,1, paste,
collapse=',')
hat
}

where mlm.influence() does the calculations, but also allows the m= argument
to specify subset size.
Yet when I test this I can't seem to pass the m= argument directly, so that
it gets stuffed in to the infl=
call to mlm.influence.

# fit an mlm
library(heplots)
Rohwer2- subset(Rohwer, subset=group==2)
rownames(Rohwer2)- 1:nrow(Rohwer2)
Rohwer.mod- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2)


class(Rohwer.mod)

[1] mlm lm


## this doesn't work, as I would like it to, calling the hatvalues.mlm
method, but passing m=2:

hatvalues(Rohwer.mod, m=2)

Error in UseMethod(hatvalues) :
  no applicable method for 'hatvalues' applied to an object of class
c('double', 'numeric')

I don't understand why this doesn't just call hatvalues.mlm, since
Rohwer.mod is of class mlm.

# These work -- calling hatvalues.mlm explicitly, or passing the infl=
argument with the call to
# mlm.influence

hatvalues.mlm(Rohwer.mod, m=2)
hatvalues(Rohwer.mod, infl=mlm.influence(Rohwer.mod,m=2))


Can someone help me understand what is wrong and how to make the .mlm method
allow m= to be passed
directly to the infl= computation?

thx,
-Michael

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
# S3 method for class mlm
influence.mlm - function(model, m=1, do.coef=TRUE, ...)
mlm.influence(model, m=m, do.coef = do.coef, ...)


# main computation a la lm.influence
mlm.influence -
function (model, m=1, do.coef = TRUE, ...) 
{

# helper functions
  vec - function(M) {
R - matrix(M, ncol=1)
if (is.vector(M)) return(R)
nn-expand.grid(dimnames(M))[,2:1]
rownames(R) - apply(as.matrix(nn), 1, paste, collapse=:)
R
}

X - model.matrix(model)
data - model.frame(model)
Y - as.matrix(model.response(data))
r - ncol(Y)
n - nrow(X)
p - ncol(X)
labels - rownames(X)
call - model$call

B - coef(model)
E - residuals(model)
XPXI - solve(crossprod(X))
EPEI - solve(crossprod(E))
vB - vec(t(B))
S - crossprod(E)/(n-p);
V - solve(p*S) %x% crossprod(X)

subsets - t(combn(n, m))
nsub - nrow(subsets) 


Beta - as.list(rep(0, nsub))
R - L - H - Q - as.list(rep(0, nsub))
CookD - as.vector(rep(0, nsub))

# FIXME: naive, direct computations can use qr() or update()
for (i in seq(nsub)) {
I - c(subsets[i,])
rows - which(!(1:n) %in% I)
XI - X[rows,]
YI - Y[rows,]
BI - solve(crossprod(XI)) %*% t(XI) %*% YI
EI - (Y - X %*% BI)[I, , drop=FALSE]
CookD[i] - t(vec(B - BI)) %*% V

Re: [R] Formatting Y axis.

2012-02-10 Thread ilai

yaxt='n' in ?par and ?axis are your friends.

# A plot on log scale labeled with original:
plot(x,log(y),yaxt='n')
axis(2,at=pretty(log(y)),labels=round(exp(pretty(log(y)

Works for qqnorm and boxplots, as well as other top level fun.

By the way this is a FAQ.


On Fri, Feb 10, 2012 at 9:43 AM, sock.o mrsock...@gmail.com wrote:
 I've looked around and I just can't find anything that will work for my
 needs. This is a bit of a 2 part question but pertaining to the same topic
 so bare with me.

 The first is with my qq plot. On the Y axis of my qq plot it'll have my
 sample quantities but because my data is log-normal it'll show numbers
 between 0 - 5 (depending on the data). I'd like to know how to get it,
 instead of displaying 0-5 to display actual values so that way when reading
 it I know where my potential problems are. Would just save a lot of time and
 be an easier more understandable read. Here's sample code to plug in and
 see.

 x - rlnorm(400, 2.3, .8)
 y - rlnorm(400, 3, 1.6)

 plot.new()
        q1 - qqnorm(log1p(y))
        q2 - qqnorm(log1p(x))
                points( q1, col = red)
                points( q2, col = blue)
                abline( qqline(log1p(y), col = red) )
                abline( qqline(log1p(x), col = blue) )
                legend( bottomright, inset = 0.02, title = Loc_ID, 
 c(Sayincom,
 Sayout), fill = c( red, blue), horiz = TRUE)

 So what I want is for the y axis. or even secondary y axis if it's lined up
 properly to read the actual values rather than the logged values.

 The second part of this is in regards to my boxplots. In order to fit all of
 the data in a readable manner I've been making log = y. Which is fine, but
 when trying to estimate the difference via notches it's difficult to
 estimate how far it really is based on the axis being logged. You can use
 the same x as above, or y, whichever floats your boat.

 boxplot ( x,  notch = TRUE, log = y, boxwex = .50)

 Versus

 boxplot ( x,  notch = TRUE, boxwex = .50)

 What I need from this is for the ability to zoom in on a particular
 location. Rather than be forced to look at things in a log format on my y
 axis to just be able to say, ok I have my mean at 25 and my standard
 deviation is 10 so I want to concentrate this boxplot between 5 and 45 to
 view 2 standard deviations. (That whole mean thing and standard deviations
 are arbitrary numbers but the concept remains)

 Any help would be much appreciated. If there are particular packages
 required for any suggestions please include their name in your response.

 If you need actual imaging of what the actual graphs I'm working with look
 like as opposed to the randomly generated values that I'm supplying then
 just say so and I'll post the actual graphs I'm trying to work with.

 Sincerely,

 Sock


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Formatting-Y-axis-tp4376843p4376843.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] passing an extra argument to an S3 generic

2012-02-10 Thread Henrik Bengtsson

For these type of setups, I typically turn to default values, e.g.

hatvalues.mlm - function(model, m=1, infl=NULL, ...)
{
   if (is.null(infl)) {
 infl - mlm.influence(model, m=m, do.coef=FALSE);
   }

   hat - infl$H
   m - infl$m
   names(hat) - if(m==1) infl$subsets else apply(infl$subsets,1,
paste, collapse=',')
   hat
}

So people may prefer to do the following:

hatvalues.mlm - function(model, m=1, infl, ...)
{
   if (missing(infl)) {
 infl - mlm.influence(model, m=m, do.coef=FALSE);
   }

   hat - infl$H
   m - infl$m
   names(hat) - if(m==1) infl$subsets else apply(infl$subsets,1,
paste, collapse=',')
   hat
}

The downside with this approach is that args(fcn) doesn't reveal the
default behavior; instead you need to document it clearly in the
help(fcn).

My $.02

/H

On Fri, Feb 10, 2012 at 12:13 PM, Michael Friendly frien...@yorku.ca wrote:
 On 2/9/2012 6:24 PM, ilai wrote:

 You do not provide mlm.influence() so your code can't be reproduced.

 Or did you mean to put lm.influence() in the formals to your hatvalues.mlm
 ?

 If yes, then 1) you have a typo 2) lm.influence doesn't allow you to
 pass on arguments, maybe try influence.lm instead.


 No, I've written an S3 method, influence.mlm and a computational method,
 mlm.influence, both of which take an m= argument.  I didn't post all the
 code because I thought the question might have an easy answer based on what
 I provided.

 I include all the code and a test case in the attached .txt file that
 can be sourced.

 -Michael



 Elai

 On Thu, Feb 9, 2012 at 1:42 PM, Michael Friendlyfrien...@yorku.ca
  wrote:

 I'm trying to write some functions extending influence measures to
 multivariate linear models and also
 allow subsets of size m=1 to be considered for deletion diagnostics.
  I'd
 like these to work roughly parallel
 to those functions for the univariate lm where only single case deletion
 (m=1) diagnostics are considered.

 Corresponding to stats::hatvalues.lm, the S3 method for class lm
 objects,

 hatvalues-function (model, ...)

 UseMethod(hatvalues)

 hatvalues.lm-

 function (model, infl = lm.influence(model, do.coef = FALSE),    ...)
 {
    hat- infl$hat
    names(hat)- names(infl$wt.res)
    hat
 }

 I have, for class mlm objects

 hatvalues.mlm- function(model, m=1, infl=mlm.influence(model, m=m,
 do.coef
 = FALSE), ...)
 {
    hat- infl$H
    m- infl$m
    names(hat)- if(m==1) infl$subsets else apply(infl$subsets,1, paste,
 collapse=',')
    hat
 }

 where mlm.influence() does the calculations, but also allows the m=
 argument
 to specify subset size.
 Yet when I test this I can't seem to pass the m= argument directly, so
 that
 it gets stuffed in to the infl=
 call to mlm.influence.

 # fit an mlm
 library(heplots)
 Rohwer2- subset(Rohwer, subset=group==2)
 rownames(Rohwer2)- 1:nrow(Rohwer2)
 Rohwer.mod- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2)

 class(Rohwer.mod)

 [1] mlm lm


 ## this doesn't work, as I would like it to, calling the hatvalues.mlm
 method, but passing m=2:

 hatvalues(Rohwer.mod, m=2)

 Error in UseMethod(hatvalues) :
  no applicable method for 'hatvalues' applied to an object of class
 c('double', 'numeric')

 I don't understand why this doesn't just call hatvalues.mlm, since
 Rohwer.mod is of class mlm.

 # These work -- calling hatvalues.mlm explicitly, or passing the infl=
 argument with the call to
 # mlm.influence

 hatvalues.mlm(Rohwer.mod, m=2)
 hatvalues(Rohwer.mod, infl=mlm.influence(Rohwer.mod,m=2))


 Can someone help me understand what is wrong and how to make the .mlm
 method
 allow m= to be passed
 directly to the infl= computation?

 thx,
 -Michael

 --
 Michael Friendly     Email: friendly AT yorku DOT ca
 Professor, Psychology Dept.
 York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
 4700 Keele Street    Web:   http://www.datavis.ca
 Toronto, ONT  M3J 1P3 CANADA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Michael Friendly     Email: friendly AT yorku DOT ca
 Professor, Psychology Dept.
 York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
 4700 Keele Street    Web:   http://www.datavis.ca
 Toronto, ONT  M3J 1P3 CANADA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] apply pairs function to multiple columns in a data frame

2012-02-10 Thread jarvisma

I am very new to R and programming and thank you in advance for your patience
and help with a complete novice!

I am working with a large multivariate data set that has 10 explanatory
environmental variables (e.g. temp, depth) and over 60 response variables
(each is a separate species). My data frame is set up like the simplified
version below:

JulianDay Temperature Salinity Depth Copepod Barnacle Gastropod
Bivalve
222 12.1330.3 500

756 0 178
222 12.333.2 1.1145
111 0 0
223 11.133.1 7752

234 12 0

Where JulianDay, Temperature, Salinity, Depth, Copepod, Barnacle, Gastropod,
and Bivalve are the column headers.

I am using the pairs function in R to explore my data. My data frame is
named Sunset.
Using this code:

Z=cbind(Sunset$Copepod,Sunset[,c(1:10)]) #the first 10 columns of my data
frame are explanatory variables such as temp
Pairs.Copepod=pairs(Z,main=Copepods vs. explanatory variables,
panel=panel.smooth)

I get a great pair plot of Copepods vs. all of the 10 explanatory
environmental variables. I would like to do this for each of my 60+
species. I can't just make one big pair plot of all of my explanatory
variables vs. all of my species because I have too many. So instead I would
like a separate pair plot for each species (each column of data after the
first 10 columns of explanatory variables)

I would like to be able to write a loop that creates all of these plots at
once, but haven't been able to do so. Ideally, each pair plot would have
the main title be the column header (e.g. Copepod) for that plot. I would
love to include a way to save each pair plot as separate jpeg to a folder
on my desktop with a file name that includes the species name (e.g.
Copepod).

Again, I am very new to R and to programing so I would GREATLY appreciate
anyone patient and kind enough to respond with lots of detail so I can
hopefully follow your answer and suggestions. I have searched but haven't
been able to find a way to write a loop that uses each column of data, so I
would love some help!

Thank you!
Marley

--
View this message in context:
http://r.789695.n4.nabble.com/apply-pairs-function-to-multiple-columns-in-a-data-frame-tp4377425p4377425.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] passing an extra argument to an S3 generic

2012-02-10 Thread Michael Friendly


On 2/10/2012 4:09 PM, Henrik Bengtsson wrote:

So people may prefer to do the following:

hatvalues.mlm- function(model, m=1, infl, ...)
{
if (missing(infl)) {
  infl- mlm.influence(model, m=m, do.coef=FALSE);
}

hat- infl$H
m- infl$m
names(hat)- if(m==1) infl$subsets else apply(infl$subsets,1,
paste, collapse=',')
hat
}
Thanks;  I tried exactly that, but I still can't pass m=2 to the mlm 
method through the generic


 hatvalues(Rohwer.mod)
 1  2  3  4  5  
6  7  8
0.16700926 0.21845327 0.14173469 0.07314341 0.56821462 0.15432157 
0.04530969 0.17661104
 9 10 11 12 13 
14 15 16
0.05131298 0.45161152 0.14542776 0.17050399 0.10374592 0.12649927 
0.33246744 0.33183461
17 18 19 20 21 
22 23 24
0.17320579 0.26353864 0.29835817 0.07880597 0.14023750 0.19380286 
0.04455330 0.20641708
25 26 27 28 29 
30 31 32
0.15712604 0.15333879 0.36726467 0.11189754 0.30426999 0.08655434 
0.08921878 0.07320950

 hatvalues(Rohwer.mod, m=2)
Error in UseMethod(hatvalues) :
  no applicable method for 'hatvalues' applied to an object of class 
c('double', 'numeric')


## This works:
 hatvalues.mlm(Rohwer.mod, m=2)
   ... output snipped

 hatvalues
function (model, ...)
UseMethod(hatvalues)
bytecode: 0x021339e4
environment: namespace:stats


-Michael


--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] apply pairs function to multiple columns in a data frame

2012-02-10 Thread Phil Spector


A good way to solve problems like this is to write a function that
will work with one variable, and then use one of the apply family
of functions (in this case sapply) to do it for each of your 
variables.  In this case, such a function would be something like

this:

make1plot = function(var){
jpeg(paste(var,'jpg',sep='.'))
pairs(Sunset[,c(var,names(Sunset)[1:10])],main=paste(var,'vs. explanatory 
variables'),
  panel=panel.smooth)
dev.off()
}

(Notice that indexing precludes the need for cbind when you're creating a 
sub-matrix.)


Now suppose the names of your dependent variables are stored in a vector 
called deps.  Then


sapply(deps,make1plot)

should do what you want.

I couldn't test this, since you didn't provide a reproducible example.  If you
use this list often, you'll find that providing a reproducible example goes a 
long way towards getting a good answer.

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu

On Fri, 10 Feb 2012, jarvisma wrote:


I am very new to R and programming and thank you in advance for your patience
and help with a complete novice!

I am working with a large multivariate data set that has 10 explanatory
environmental variables (e.g. temp, depth) and over 60 response variables
(each is a separate species).  My data frame is set up like the simplified
version below:

JulianDay  Temperature  Salinity  Depth  Copepod  Barnacle  Gastropod
Bivalve
222  12.1330.3 500
756 0 178
222  12.333.2 1.1145
111 0 0
223  11.133.1 7752
234 12   0

Where JulianDay, Temperature, Salinity, Depth, Copepod, Barnacle, Gastropod,
and Bivalve are the column headers.

I am using the pairs function in R to explore my data.  My data frame is
named Sunset.
Using this code:

Z=cbind(Sunset$Copepod,Sunset[,c(1:10)])   #the first 10 columns of my data
frame are explanatory variables such as temp
Pairs.Copepod=pairs(Z,main=Copepods vs. explanatory variables,
panel=panel.smooth)

I get a great pair plot of Copepods vs. all of the 10 explanatory
environmental variables.  I would like to do this for each of my 60+
species.  I can't just make one big pair plot of all of my explanatory
variables vs. all of my species because I have too many.  So instead I would
like a separate pair plot for each species (each column of data after the
first 10 columns of explanatory variables)

I would like to be able to write a loop that creates all of these plots at
once, but haven't been able to do so.  Ideally, each pair plot would have
the main title be the column header (e.g. Copepod) for that plot.  I would
love to include a way to save each pair plot as  separate jpeg to a folder
on my desktop with a file name that includes the species name (e.g.
Copepod).

Again, I am very new to R and to programing so I would GREATLY appreciate
anyone patient and kind enough to respond with lots of detail so I can
hopefully follow your answer and suggestions.  I have searched but haven't
been able to find a way to write a loop that uses each column of data, so I
would love some help!

Thank you!
Marley

--
View this message in context: 
http://r.789695.n4.nabble.com/apply-pairs-function-to-multiple-columns-in-a-data-frame-tp4377425p4377425.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Table rearranging

2012-02-10 Thread Jeffrey Joh


Hi David,
Totally forgot line 3!  So 513 red should be ok but 420 red and 917 
yellow aren't.

Jeffrey


 CC: r-help@r-project.org
 From: dwinsem...@comcast.net
 To: johjeff...@hotmail.com
 Subject: Re: [R] Table rearranging
 Date: Tue, 7 Feb 2012 21:51:55 -0500


 On Feb 7, 2012, at 8:43 PM, Jeffrey Joh wrote:

 
  Hi David, I am not sure how ddply/summarize solves my issue. I have
  the following table:
 
  ID measurement date door color
  1 0.93529385 513 open red
  2 0.97419293 420 open red
  3 0.962053514 513 closed red
  4 0.963909937 1230 open blue
  5 0.97652034 1230 open green
  6 0.989310795 1230 closed blue
  7 0.9941022 917 closed yellow
  8 0.8945757 1230 open blue
 
  I only want to keep the lines that have corresponding open/closed
  measurements. For example, I want to keep lines 4,6,8 because for
  the 1230 blue condition, there exists both open and closed
  measurements.
 
  However, the 513 red condition has an open measurement, but no
  closed measurement.

 Huh? what about line 3?

 --
 David,
  Therefore, line 1 should be deleted.
 
  Jeffrey
 
 
  
  CC: r-help@r-project.org; wdun...@tibco.com
  From: dwinsem...@comcast.net
  To: johjeff...@hotmail.com
  Subject: Re: [R] Table rearranging
  Date: Tue, 7 Feb 2012 09:08:00 -0500
 
 
  On Feb 7, 2012, at 4:21 AM, Jeffrey Joh wrote:
 
 
  Thank you for your help, Bill.
 
  From the original table (not the plyr output), I would like to
  remove all the lines that do not have a corresponding open/closed
  measurement. For example, if there is a Closed yellow measurement
  on 0917, but not an Open yellow 0917 measurement, then the Closed
  yellow should be deleted.
 
  How can I make this change?
 
 
  In R you need to assign the results of a function to an object name
  so
  you code would look like:
 
  modified_data - ddply(d, .(date, color), summarize,
  meanClosed=mean(measurement[door==closed]),
  nClosed=sum(door==closed))
 
  --
  David
  Jeffrey
 
 
  
  From: wdun...@tibco.com
 
  To: johjeff...@hotmail.com; r-help@r-project.org
 
  Subject: RE: [R] Table rearranging
 
  Date: Tue, 7 Feb 2012 00:43:25 +
 
 
 
  Install and load the plyr package and try something like:
 
 
 
  ddply(d, .(date, color), summarize,
 
  + ddply(d, .(date, color), summarize
 
  + meanClosed=mean(measurement[door==closed]),
  nClosed=sum(door==closed))
 
  date color meanOpen nOpen meanClosed nClosed
 
  1 420 red 0.9741929 1 NaN 0
 
  2 513 red 0.9352938 1 0.9620535 1
 
  3 917 yellow NaN 0 0.9941022 1
 
  4 1230 blue 0.9639099 1 0.9893108 1
 
  5 1230 green 0.9765203 1 NaN 0
 
 
 
  Bill Dunlap
 
  Spotfire, TIBCO Software
 
  wdunlap tibco.com
 
 
 
  -Original Message-
 
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org
  ] On Behalf Of Jeffrey Joh
 
  Sent: Monday, February 06, 2012 4:28 PM
 
  To: r-help@r-project.org
 
  Subject: [R] Table rearranging
 
 
 
 
 
  I have a table that looks like this:
 
 
 
  measurement date door color
 
  0.93529385 513 open red
 
  0.97419293 420 open red
 
  0.962053514 513 closed red
 
  0.963909937 1230 open blue
 
  0.97652034 1230 open green
 
  0.989310795 1230 closed blue
 
  0.9941022 917 closed yellow
 
 
 
  I would like to create a table that has: Open measurement, Closed
  measurement, date, color. For every
 
  date/color combination, there should be two columns to represent
  the door open/closed measurement.
 
 
 
  If there are multiple datapoints with a given door/date/color
  combination, then they should be
 
  averaged.
 
  I would also like to make two columns to represent the number of
 
  datapoints that were averaged in determining the open/closed
 
  measurements.
 
 
 
  Jeffrey
 
 
 
  __
 
  R-help@r-project.org mailing list
 
  https://stat.ethz.ch/mailman/listinfo/r-help
 
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
 
  and provide commented, minimal, self-contained, reproducible code.
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  David Winsemius, MD
  West Hartford, CT
 
 

 David Winsemius, MD
 West Hartford, CT

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] stepwise variable selection with multiple dependent variables

2012-02-10 Thread Fugate, Michael L

Good Day,

I fit a multivariate linear regression model with 3 dependent variables and 
several predictors using the lm function.  I would like to use stepwise 
variable selection to produce a set of candidate models.  However, when I pass 
the fitted lm object to step() I get the following error:

Error from R:
Error in drop1.mlm(fit, scope$drop, scale = scale, trace = trace, k = k,  : 
  no 'drop1' method for mlm models

My dependent data is in the matrix ymat where ymat is 35 rows by 3 columns.  
The predictors are in X where X is 35 by 6

The steps I used were:
m.fit - lm(ymat ~ ., data=X)
m.step - step(m.fit)

If variable selection is not possible with step() is there another package that 
will perform variable selection in a multivariate setting?

System information:
platform   x86_64-apple-darwin9.8.0 
arch   x86_64   
os darwin9.8.0  
system x86_64, darwin9.8.0  
status  
major  2
minor  13.1 
year   2011 
month  07   
day08   
svn rev56322
language   R
version.string R version 2.13.1 (2011-07-08)

Thanks in advance.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Table rearranging

2012-02-10 Thread David Winsemius



On Feb 10, 2012, at 4:47 PM, Jeffrey Joh wrote:



Hi David,
Totally forgot line 3!  So 513 red should be ok but 420 red and  
917 yellow aren't.




I don't have a plyr solution but this is a base solution:

dat[ as.logical( ave(dat$door, dat$date,
   FUN=function(x) {open %in% x  closed %in% x} )) , ]
  ID measurement date   door color
1  1   0.9352938  513   open   red
3  3   0.9620535  513 closed   red
4  4   0.9639099 1230   open  blue
5  5   0.9765203 1230   open green
6  6   0.9893108 1230 closed  blue
8  8   0.8945757 1230   open  blue

(You could have used Dunlap's resutl and tested for both the meanOpen  
and meanClosed values being not-is.na.)


--
David


Jeffrey



CC: r-help@r-project.org
From: dwinsem...@comcast.net
To: johjeff...@hotmail.com
Subject: Re: [R] Table rearranging
Date: Tue, 7 Feb 2012 21:51:55 -0500


On Feb 7, 2012, at 8:43 PM, Jeffrey Joh wrote:



Hi David, I am not sure how ddply/summarize solves my issue. I have
the following table:

ID measurement date door color
1 0.93529385 513 open red
2 0.97419293 420 open red
3 0.962053514 513 closed red
4 0.963909937 1230 open blue
5 0.97652034 1230 open green
6 0.989310795 1230 closed blue
7 0.9941022 917 closed yellow
8 0.8945757 1230 open blue

I only want to keep the lines that have corresponding open/closed
measurements. For example, I want to keep lines 4,6,8 because for
the 1230 blue condition, there exists both open and closed
measurements.

However, the 513 red condition has an open measurement, but no
closed measurement.


Huh? what about line 3?

--
David,

Therefore, line 1 should be deleted.

Jeffrey




CC: r-help@r-project.org; wdun...@tibco.com
From: dwinsem...@comcast.net
To: johjeff...@hotmail.com
Subject: Re: [R] Table rearranging
Date: Tue, 7 Feb 2012 09:08:00 -0500


On Feb 7, 2012, at 4:21 AM, Jeffrey Joh wrote:



Thank you for your help, Bill.


From the original table (not the plyr output), I would like to
remove all the lines that do not have a corresponding open/closed
measurement. For example, if there is a Closed yellow measurement
on 0917, but not an Open yellow 0917 measurement, then the Closed
yellow should be deleted.


How can I make this change?



In R you need to assign the results of a function to an object name
so
you code would look like:

modified_data - ddply(d, .(date, color), summarize,
meanClosed=mean(measurement[door==closed]),
nClosed=sum(door==closed))

--
David

Jeffrey




From: wdun...@tibco.com



To: johjeff...@hotmail.com; r-help@r-project.org



Subject: RE: [R] Table rearranging



Date: Tue, 7 Feb 2012 00:43:25 +







Install and load the plyr package and try something like:







ddply(d, .(date, color), summarize,



+ ddply(d, .(date, color), summarize



+ meanClosed=mean(measurement[door==closed]),
nClosed=sum(door==closed))



date color meanOpen nOpen meanClosed nClosed



1 420 red 0.9741929 1 NaN 0



2 513 red 0.9352938 1 0.9620535 1



3 917 yellow NaN 0 0.9941022 1



4 1230 blue 0.9639099 1 0.9893108 1



5 1230 green 0.9765203 1 NaN 0







Bill Dunlap



Spotfire, TIBCO Software



wdunlap tibco.com







-Original Message-



From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org
] On Behalf Of Jeffrey Joh



Sent: Monday, February 06, 2012 4:28 PM



To: r-help@r-project.org



Subject: [R] Table rearranging











I have a table that looks like this:







measurement date door color



0.93529385 513 open red



0.97419293 420 open red



0.962053514 513 closed red



0.963909937 1230 open blue



0.97652034 1230 open green



0.989310795 1230 closed blue



0.9941022 917 closed yellow






I would like to create a table that has: Open measurement,  
Closed

measurement, date, color. For every



date/color combination, there should be two columns to represent
the door open/closed measurement.







If there are multiple datapoints with a given door/date/color
combination, then they should be



averaged.



I would also like to make two columns to represent the number of



datapoints that were averaged in determining the open/closed



measurements.







Jeffrey







__



R-help@r-project.org mailing list



https://stat.ethz.ch/mailman/listinfo/r-help



PLEASE do read the posting guide http://www.R-project.org/posting-guide.html


and provide commented, minimal, self-contained, reproducible  
code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT





David Winsemius, MD
West Hartford, CT





David Winsemius, MD
West Hartford, CT

Re: [R] Choosing glmnet lambda values via caret

Yes, I know how to statically specify lambda choices.  I was asking
about whether there's a way to let glmnet guide that, instead of
specifying it in advance.  Again, sorry if I could've been more clear.

On Thu, Feb 9, 2012 at 7:15 PM, Max Kuhn mxk...@gmail.com wrote:
 You can adjust the candidate set of tuning parameters via the tuneGrid
 argument in trian() and the process by which the optimal choice is
 made (via the 'selectionFunction' argument in trainControl()). Check
 out the package vignettes.

 The latest version also has an update.train() function that lets the
 user manually specify the tuning parameters after the call to train().

 On Thu, Feb 9, 2012 at 7:00 PM, Yang Zhang yanghates...@gmail.com wrote:
 Usually when using raw glmnet I let the implementation choose the
 lambdas.  However when training via caret::train the lambda values are
 predetermined.  Is there any way to have caret defer the lambda
 choices to caret::train and thus choose the optimal lambda
 dynamically?

 --
 Yang Zhang
 http://yz.mit.edu/

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 --

 Max



-- 
Yang Zhang
http://yz.mit.edu/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] stepwise variable selection with multiple dependent variables

2012-02-10 Thread Mitchell Maltenfort

Anova.mlm would be one way to do model selection.



On Fri, Feb 10, 2012 at 4:29 PM, Fugate, Michael L fug...@lanl.gov wrote:
 Good Day,

 I fit a multivariate linear regression model with 3 dependent variables and 
 several predictors using the lm function.  I would like to use stepwise 
 variable selection to produce a set of candidate models.  However, when I 
 pass the fitted lm object to step() I get the following error:

 Error from R:
 Error in drop1.mlm(fit, scope$drop, scale = scale, trace = trace, k = k,  :
  no 'drop1' method for mlm models

 My dependent data is in the matrix ymat where ymat is 35 rows by 3 columns.  
 The predictors are in X where X is 35 by 6

 The steps I used were:
 m.fit - lm(ymat ~ ., data=X)
 m.step - step(m.fit)

 If variable selection is not possible with step() is there another package 
 that will perform variable selection in a multivariate setting?

 System information:
 platform       x86_64-apple-darwin9.8.0
 arch           x86_64
 os             darwin9.8.0
 system         x86_64, darwin9.8.0
 status
 major          2
 minor          13.1
 year           2011
 month          07
 day            08
 svn rev        56322
 language       R
 version.string R version 2.13.1 (2011-07-08)

 Thanks in advance.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Scriptable Integration

2012-02-10 Thread Hasan Diwan

My data:
 dput(mydata)
structure(list(V1 = c(1328565067, 1328565067.05, 1328565067.1,
1328565067.15, 1328565067.2, 1328565067.25), V2 = c(0.0963890795246276,
0.227296347215609, 0.240972698811569, 0.221208948983498, 0.230898231782485,
0.203282153087549), V3 = c(0.0245045248243853, 0.0835679411703579,
0.0612613120609633, 0.058568910563872, 0.0511868450318788, 0.0557714205674231
)), .Names = c(V1, V2, V3), row.names = c(NA, 6L), class = data.frame)

I'd like to apply an arbitrary number of integrations of the
splinefunc from mydata$V2 and mydata$V3. Therefore, it should return a
function. Anyone have any idea as to a package that allows this? Many
thanks! -- H
-- 
Sent from my mobile device
Envoyait de mon portable

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Apply pmax to dataframe with different args based on dataframe factor

2012-02-10 Thread Idris Raja

damn, that's pretty clever. +1

thanks

On Thu, Feb 9, 2012 at 8:11 PM, ilai ke...@math.montana.edu wrote:

 Your attempt was just overly complicated. All you needed was

 threshold - c( .2 , .4 , .5 )[ df$track ]
 df$value - pmax(threshold, df$value)
 df # desired outcome

 Cheers

 On Thu, Feb 9, 2012 at 3:56 PM, Idris Raja idris.r...@gmail.com wrote:
  # I have a dataframe in the following form:
 
  track - c(rep('A', 3), rep('B', 4), rep('C', 4))
  value - c(0.15, 0.25, 0.35, 0.05, 0.99, 0.32, 0.13, 0.80, 0.75, 0.60,
 0.44)
  df - data.frame(track=factor(track), value=value)
 
  # print(df)
#track value
  #1  A  0.15
  #2  A  0.25
  #3  A  0.35
  #4  B  0.05
  #5  B  0.99
  #6  B  0.32
  #7  B  0.13
  #8  C  0.80
  #9  C  0.75
  #10 C  0.60
  #11 C  0.44
 
 
  # If any of the values are below a threshold value, I want to replace it
  with the
  # threshold value. The twist is that there is a different threshold value
  for
  # every track.
 
  # I tried something like this, but it's not working
 
  threshold - list()
  threshold['A'] - 0.2
  threshold['B'] - 0.4
  threshold['C'] - 0.5
 
 
  for (track in levels(df$track)){
 df[df$track==track,]$outcome - pmax(df[df$track==track,]$outcome,
  threshold[track])
  }
  # Warning messages:
  # 1: In is.na(mmm) : is.na() applied to non-(list or vector) of type
 'NULL'
  # 2: In is.na(mmm) : is.na() applied to non-(list or vector) of type
 'NULL'
  # 3: In is.na(mmm) : is.na() applied to non-(list or vector) of type
 'NULL'
 
 
  #**
  # Desired Results:
 
  # print(df)
#track value
  #1  A  0.20 # value changed
  #2  A  0.25
  #3  A  0.35
  #4  B  0.40 # value changed
  #5  B  0.99
  #6  B  0.40 # value changed
  #7  B  0.40 # value changed
  #8  C  0.80
  #9  C  0.75
  #10 C  0.60
  #11 C  0.50 # value changed
 
 
  # Any ideas? Thanks for reading.
 
 [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] multiple histograms from a dataframe

2012-02-10 Thread Adel ESSAFI

Hi list


I  need some help for drawing some histograms

I have a dataframe , say,
X Y Z T

I want to draw a histogram Z-T for each value of the couple (X-Y).
When I use thus syntax

 library(lattice)
 histogram(law[,3] ~ law[,66] | law[,1] )

it draws multiple histograms but by selecting distinct values of  law[,1]
The deal is to make the same thing but for a couple of columns

Thanks in advance for help

Adel



-- 
PhD candidate in Computer Science
Address
3 avenue lamine, cité ezzahra, Sousse 4000
Tunisia
tel: +216 97 246 706 (+33640302046 jusqu'au 15/6)
fax: +216 71 391 166

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] multiple histograms from a dataframe

2012-02-10 Thread David Winsemius



On Feb 10, 2012, at 7:05 PM, Adel ESSAFI wrote:


Hi list


I  need some help for drawing some histograms

I have a dataframe , say,
X Y Z T

I want to draw a histogram Z-T for each value of the couple (X-Y).
When I use thus syntax


library(lattice)
histogram(law[,3] ~ law[,66] | law[,1] )


Perhaps (but untested in the absence of data);

 histogram( Z ~ T | interaction(X, Y)  , data=dfrmname )



it draws multiple histograms but by selecting distinct values of   
law[,1]

The deal is to make the same thing but for a couple of columns

Thanks in advance for help

Adel



--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] garchSim with starting error and variance given?

2012-02-10 Thread jfca283

Hello
i need to simulate data for and arch and garch process, n=1000.
I've been generating arch and garch data but i can't find the way to set an
starting error, epsilon(t=0) and sigma(t=0) to run the command garchSim(). I
generated the process on Excel and i know that for the Arch process has a
mean of 10 aprox, and for Garch a mean of 25 aprox. The errors comes from a
random walk
y_t=epsilon_t
Thanks for your time and interest.

--
View this message in context: 
http://r.789695.n4.nabble.com/garchSim-with-starting-error-and-variance-given-tp432p432.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Scriptable Integration

2012-02-10 Thread R. Michael Weylandt

I'm not quite sure what you mean, but perhaps this will help:

library(splines)
mydata - structure(list(V1 = c(1328565067, 1328565067.05, 1328565067.1,
1328565067.15, 1328565067.2, 1328565067.25), V2 = c(0.0963890795246276,
0.227296347215609, 0.240972698811569, 0.221208948983498, 0.230898231782485,
0.203282153087549), V3 = c(0.0245045248243853, 0.0835679411703579,
0.0612613120609633, 0.058568910563872, 0.0511868450318788, 0.0557714205674231
)), .Names = c(V1, V2, V3), row.names = c(NA, 6L), class = data.frame)

spf - splinefun(mydata$V2)
splInt - function(low, up) integrate(spf, low, up)$value

splInt(0, 1)

splInt(0, 2)

Michael

On Fri, Feb 10, 2012 at 6:34 PM, Hasan Diwan hasan.di...@gmail.com wrote:
 My data:
 dput(mydata)
 structure(list(V1 = c(1328565067, 1328565067.05, 1328565067.1,
 1328565067.15, 1328565067.2, 1328565067.25), V2 = c(0.0963890795246276,
 0.227296347215609, 0.240972698811569, 0.221208948983498, 0.230898231782485,
 0.203282153087549), V3 = c(0.0245045248243853, 0.0835679411703579,
 0.0612613120609633, 0.058568910563872, 0.0511868450318788, 0.0557714205674231
 )), .Names = c(V1, V2, V3), row.names = c(NA, 6L), class = data.frame)

 I'd like to apply an arbitrary number of integrations of the
 splinefunc from mydata$V2 and mydata$V3. Therefore, it should return a
 function. Anyone have any idea as to a package that allows this? Many
 thanks! -- H
 --
 Sent from my mobile device
 Envoyait de mon portable

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] issues with read.spss

2012-02-10 Thread Jeroen Ooms

Someone supplied me with an SPSS datafile that caused a buffer
overflow and then a crash when reading it in R. Unfortunately I can't
supply the dataset at hand and I have a hard time reproducing it with
a toy example. But I found at least 2 issues that might be related. I
would like to know which of these are expected behavior, and which are
bugs. I reproduced it on R 2.14.1 both on Ubuntu Linux and Windows
7...

Below some code. The files that are referenced in the code are
available for download on http://www.stat.ucla.edu/~jeroen/spss/

#load library
library(foreign)

#problem one: long string variable is converted to multiple variables.
x - read.spss(longstring.sav);
summary(x); #4 variables??

#problem two: use.labels does not deal correctly with duplicate labels
and generates a bad factor.
x - read.spss(duplicate_labels.sav, use.value.labels=T);

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Scriptable Integration

2012-02-10 Thread Hasan Diwan

Michael,
On 10 February 2012 18:11, R. Michael Weylandt
michael.weyla...@gmail.com wrote:
 I'm not quite sure what you mean, but perhaps this will help:
 spf - splinefun(mydata$V2)
 splInt - function(low, up) integrate(spf, low, up)$value

Sort of, I'm looking to get the nth order integral, where the only
thing I know about n is that it is greater than 1, but in practise
will be under 4. Something like:
spf - rep(splinefun(myData$V2, times=(n-1))
splInt - integrate(spf, min(myData$V2), max(myData$V3))

However, I can't seem to figure out how to get splinefun to evaluate
itself. Hope that's clearer and thanks for the help!
-- 
Sent from my mobile device
Envoyait de mon portable

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] apply pairs function to multiple columns in a data frame