date:20110928

[R] Trouble with performing post hoc analysis Tukey for lme model using ghlt

2011-09-28 Thread l.m.t

Hi, 

I am new to R and I am trying to perform a post hoc tukey test using the
multcomp package's ghlt for a lme model. My first attempt at doing so gave
me an output, HOWEVER I have tried to do this again, it keeps coming up with
the error:
 
(Error in contrMat(table(mf[[nm]]), type = types[pm]) : 
  less than two groups)

My model is looking at effect of incubation temperature (3 groups) on PHA
with hen as a my random factor. Temperature treatment has been made
factorial (Tempfact).

my code is as follows:

library(multcomp)
 PHA - lme(DiffPHA48 ~ Tempfact, random= ~1|Hen, na.action=na.omit)
 summary(glht(PHA, linfct=mcp(Tempfact = Tukey))) 
Error in contrMat(table(mf[[nm]]), type = types[pm]) : 
  less than two groups

I am confused why it would work the first time and now everyother time it
comes up with this error. Can anyone please help me??

Thanks!

--
View this message in context: 
http://r.789695.n4.nabble.com/Trouble-with-performing-post-hoc-analysis-Tukey-for-lme-model-using-ghlt-tp3849982p3849982.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Adding axis to an ellipse: ellipse package

2011-09-28 Thread Rolf Turner


On 28/09/11 04:53, Antoine wrote:

Dear list members,

This might be a silly question but I just can't figure it out. I am using
the ellipse package on covariance matrices. I would simply like to plot my
ellipses WITH its two axis ploted as well. These axis represents the 2 eigen
vectors of my matrix and it is important that I can graphically show them.
Is there an easy way to do so?


Pretty easy.

* Calculate your eigenvalues and vectors using eigen().

* Rescale the vectors by multiplying them by the square
root of the corresponding eigenvalues, and by the constant
corresponding to the confidence level --- e.g. sqrt(qchisq(0.95,2))
for the usual 95% confidence level.

* Add these vectors to the ellipse plot, using segments().
E.g. if your rescaled eigenvectors are v1 and v2, just do:

segments(-v1[1],-v1[2],v1[1],v1[2])

and

segments(-v2[1],-v2[2],v2[1],v2[2])

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Negative Quartile

2011-09-28 Thread Damian Abalo

Hello, I have a doubt, but it is more statistic than just about R: How
the people deal usually with negative percentile and quartile? In my
concrete case, I want to know the distribution of an error, so the
nearest it is to 0 the better (I think it would be optimun to have the
nearest values to 0 in the lower percentiles). The best result would
be making the percentile of the absolute value (without sign), but the
sing is also important, and I need to keep it.
This results in the big negative values to be in the lower
percentiles, which are usually understanded as the best cases, since
we are working with an error. My question is, what is the usual thing
done in this cases? keep the sign, and read the lower percentiles as
bad cases also (being the optimal ones the center ones) or try to
somehow get the signless distribution keeping the signs somehow? And
if this is the case, how can this be done?

Thanks for your time, Best regards

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data transformation cleaning

2011-09-28 Thread Weidong Gu

Seems your questions belong to rule mining for frequent item sets.
check arules package

Weidong Gu

On Tue, Sep 27, 2011 at 11:13 PM, pip56789 pd...@virginia.edu wrote:
 Hi,

 I have a few methodological and implementation questions for ya'll. Thank
 you in advance for your help. I have a dataset that reflects people's
 preference choices. I want to see if there's any kind of clustering effect
 among certain preference choices (e.g. do people who pick choice A also pick
 choice D).

 I have a data set that has one record per user ID, per preference choice.
 It's a long form of a data set that looks like this:

 ID | Page
 123 | Choice A
 123 | Choice B
 456 | Choice A
 456 | Choice B
 ...

 I thought that I should do the following

 1. Make the data set wide, counting the observations so the data looks
 like this:
 ID | Count of Preference A | Count of Preference B
 123 | 1 | 1
 ...

 Using
 table1 - dcast(data,ID ~ Page,fun.aggregate=length,value_var='Page' )

 2. Create a correlation matrix of preferences
 cor(table2[,-1])

 How would I restrict my correlation to show preferences that met a minimum
 sample threshold? Can you confirm if the two following commands do the same
 thing? What would I do from here (or am I taking the wrong approach)
 table1 - dcast(data,Page ~ Page,fun.aggregate=length,value_var='Page' )
 table2 - with(data, table(Page,Page))


 many thanks,
 Peter

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Data-transformation-cleaning-tp3849889p3849889.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Plotting Lines Through multiple groups

2011-09-28 Thread clara_eco

Hi I have data in the following format
Cort Day Animal
230  1
273  1
240  2
271  2
342  2
303  2
244  2
200  3
241  3
282  3
344  3
etc.
It is measured across time(day) however no every individual is measured the
same number of times.  All I want to do is plot the Raw data and then run a
line connecting the data points of each individual animal, for the example
above there would be three lines as there are three animals.
I've tried, gplots, lattice and car but I'm not really figuring it out, any
help would be greatly appreciated!

Thanks 
Clara


--
View this message in context: 
http://r.789695.n4.nabble.com/Plotting-Lines-Through-multiple-groups-tp3850099p3850099.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Adding axis to an ellipse: ellipse package

2011-09-28 Thread Antoine

Thanks a lot Rolf, I knew it would be possible to do it that way but I  
was just curious to know if the ellipse package was offering such a  
feature. Thanks again for you help!

Antoine
Le 28 sept. 11 Ã  09:30, Rolf Turner-3 [via R] a Ã©crit :

 On 28/09/11 04:53, Antoine wrote:
  Dear list members,
 
  This might be a silly question but I just can't figure it out. I  
 am using
  the ellipse package on covariance matrices. I would simply like  
 to plot my
  ellipses WITH its two axis ploted as well. These axis represents  
 the 2 eigen
  vectors of my matrix and it is important that I can graphically  
 show them.
  Is there an easy way to do so?

 Pretty easy.

 * Calculate your eigenvalues and vectors using eigen().

 * Rescale the vectors by multiplying them by the square
 root of the corresponding eigenvalues, and by the constant
 corresponding to the confidence level --- e.g. sqrt(qchisq(0.95,2))
 for the usual 95% confidence level.

 * Add these vectors to the ellipse plot, using segments().
 E.g. if your rescaled eigenvectors are v1 and v2, just do:

  segments(-v1[1],-v1[2],v1[1],v1[2])

 and

  segments(-v2[1],-v2[2],v2[1],v2[2])

  cheers,

  Rolf Turner

 __
 [hidden email] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 If you reply to this email, your message will be added to the  
 discussion below:
 http://r.789695.n4.nabble.com/Adding-axis-to-an-ellipse-ellipse-package-tp3847954p3850233.html
 To unsubscribe from Adding axis to an ellipse: ellipse package,  
 click here.



--
View this message in context: 
http://r.789695.n4.nabble.com/Adding-axis-to-an-ellipse-ellipse-package-tp3847954p3850237.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Matrix and list indices

2011-09-28 Thread fernando.cabrera

Thanks Michael it works!

Have to say it is amazing what you can do in R with a few lines (a line in this 
case) of code.

Fernando

From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com]
Sent: 27. september 2011 15:43
To: Cabrera, Fernando Álvarez
Subject: Re: [R] Matrix and list indices

Untested, I believe this should work, though you might need to modify for 
floating point funny business in testing the equalities:

my_list - list( earth=array(c(0,0,45,0,0,45,0,45),dim=c(2,2,2)), 
mars=array(c(8:1),dim=c(2,2,2)))
my_list$earth[my_list$earth==0] - my_list$mars[my_list$earth==0]

Hope this helps,

Michael Weylandt
On Tue, Sep 27, 2011 at 8:49 AM, 
fernando.cabr...@nordea.commailto:fernando.cabr...@nordea.com wrote:
Hi guys,

I am trying to replace all elements of earth that are equal to zero with their 
corresponding elements in mars. I can do the replace with a bunch of for-loops, 
but I don't think this is the R way of doing things.

my_list - list( earth=array(c(0,0,45,0,0,45,0,45),dim=c(2,2,2)), 
mars=array(c(8:1),dim=c(2,2,2)))
my_list
for (i in c(1:2)) {
   for (j in c(1:2)) {
   for (k in c(1:2)) {
   if (my_list$earth[i,j,k] == 0) {

my_list$earth[i,j,k] - my_list$mars[i,j,k]
   }
   }
   }
}
my_list

Do you guys have any suggestions for getting rid of the ugly for-loops?

Many thanks,

Fernando Álvarez

Nordea e-Markets
Strandgade 3
PO Box 850
DK-0900 Copenhagen C
Denmark
Tel.: +45 33 33 32 67tel:%2B45%2033%2033%2032%2067
Mobile: +45 61 55 27 54tel:%2B45%2061%2055%2027%2054

This transmission is intended solely for the person or entity to whom it is 
addressed. It may contain privileged and confidential information. If you are 
not the intended recipient, please be notified that any dissemination, 
distribution or copying is strictly prohibited. If you have received this 
transmission by mistake, please let us know and then delete it from your system.

P Please consider the impact on the environment before printing this e-mail.

   [[alternative HTML version deleted]]

__
R-help@r-project.orgmailto:R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Envfit, inconsistant result?

2011-09-28 Thread Jari Oksanen

rodrock rodrigo.vargasgaete at gmail.com writes:


 I am using the envfit function over an ordination of floristic data.
 
 The problem is that every time that I run it changes the results. Sometimes
 dramatically, selecting variables that the first time were not significant.
 I do not get what could be the problem or if is normal given the
 permutations are different.
 
 # the NMDS ordination
 
 gap_flor_NMDS_chord - metaMDS(gaps_flor, distance = euclid, k = 2, trymax
 = 20, autotransform =TRUE,
 noshare = 0.1, wascores = TRUE, expand = TRUE, trace = 1,
 plot = FALSE, old.wa = FALSE, zerodist = add)
 

 # the  Envfit calculation
 exp_flor1 - envfit(gap_flor_NMDS_chord, explain1, permu = 999, na.rm=T)  
  
 

Rodrigo,

Do you re-run your NMDS when the envfit results change? The results of NMDS may
change between two runs. You should first check that the NMDS results are nearly
identical. This you can do with procrustes() function in vegan:

plot(procrustes(gap_flor_NMDS1, gap_flor_NMDS2))

where the arguments gap_flor_NMDS1 and gap_flor_NMDS2 are two NMDS results that
gave you different envfit results. Use of plot() above helps you to avoid
looking at tiny differences in a numeric results: if you cannot see a difference
in the procrustes plot, there is no difference you need to worry about. Only
worry about envfit differences if the NMDS results are practically identical.

I'm surprised if metaMDS didn't issue any warning on the combination of
arguments you used in metaMDS: having no.share  0 and distance = euclid
sounds dubious, and I have tried to catch those cases and issue an informative
warning. You could also try higher trymax to get more consistent NMDS results.

Cheers, Jari Oksanen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plotting Lines Through multiple groups

2011-09-28 Thread Timothy Bates

if it was me, i'd do
library(ggplot2)
df - read.table(pipe(pbpaste), header=T, sep='\t')
df$Animus=factor(df$Animal, labels=c(Tom, Dick, Harry))
qplot(Day, Cort, data = df, geom=line, colour=Animus)

On Sep 28, 2011, at 8:03 AM, clara_eco wrote:

 Hi I have data in the following format
 Cort Day Animal
 230  1
 273  1
 240  2
 271  2
 342  2
 303  2
 244  2
 200  3
 241  3
 282  3
 344  3
 etc.
 It is measured across time(day) however no every individual is measured the
 same number of times.  All I want to do is plot the Raw data and then run a
 line connecting the data points of each individual animal, for the example
 above there would be three lines as there are three animals.
 I've tried, gplots, lattice and car but I'm not really figuring it out, any
 help would be greatly appreciated!
 
 Thanks 
 Clara
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Plotting-Lines-Through-multiple-groups-tp3850099p3850099.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Fitting a GLM: Problems with ns date functions

2011-09-28 Thread Cal

Hello,
I am attempting to use R as part of a time-series analysis investigating the
influence of meteorological factors on health outcomes. The test csv dta
file that i am working with contains a complete daily set of  the variables
‘DATE’, ‘ADMINS’, ‘NOX’, ‘TEMP’ across a 5 year period (1827 days).

Within attempting to fit a GLM, I believe that I am having difficulty with
the ns function, although problems could also be arising through date
functions in R. Attached is my code and following errors, as a new starter
within the world of R any help would be much appreciated.


## [1] Original Analysis: FAILS

## packages loaded for use within my wider data analysis
library(splines)
library(xtable)
library(tsModel)
library(lattice)
library(mda)
library(gam)



## splines: For date there are 4 seasons across 5 years
## splines: For temp cold and warm seasons are 6 months long

fit - glm(J00_99 ~ NOX_LIN + ns(DATE_B, 4 * 5) + ns(TEMP_LIN, 6), 
   data = data, family = poisson)
### Error in (1 - h) * qs[i] : non-numeric argument to binary operator

pr - predict(fit, type = terms)
### Error in predict(fit, type = terms) : object 'fit' not found





## [2] Attempt 2: FAILS

# So I have also been informed that when working with date i should use
the chron
# package to convert the dates within my data file to a usable format.
This could be causing such problems

library(chron)
dates-as.POSIXct(strptime(data[, DATE], format = %d/%m/%Y, GMT))


fit - glm(ADMINS ~ NOX + ns(date, 4 * 5) + ns(TEMP, 6), 
   data = data, family = poisson)


## Error in as.vector(x, mode) : cannot coerce type 'closure' to vector of
type 'any'



--
View this message in context: 
http://r.789695.n4.nabble.com/Fitting-a-GLM-Problems-with-ns-date-functions-tp3850379p3850379.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] OT: (quasi-?) separation in a logistic GLM

2011-09-28 Thread lincoln

I know that this is a quite old post but I am dealing with a similar warning
message and, also after reading all the posts here, I am still in doubt with
what I should do with my analysis.

I have a dataset where the binary response variable is sex, and the
predictors are several variables (they are percentage). I know that one of
this variables, hcp, predicts quite well the sex of the individuals given
that the ca. 85% of females have hcp0 while ca. 85% of females have hcp=0.
I don't believe that it does matter, but all the values are very low (the
maximum value of a predictor doesn't exceed 0.5) and most of predictors show
a very left-skewed distribution (log-normal). 
In any case, I believe that it could be said that quasi-separation is
occurring.

When I run the saturated model I get the warning message:

*
glm.sat1-glm(sex~hwp+hcp+hnp+twp+d1lp*d2dp+hwp:hcp+hwp:hnp+hwp:twp+hcp:hnp+hcp:twp+hnp:twp,data=dati,family=binomial)
Mensajes de aviso perdidos
glm.fit: fitted probabilities numerically 0 or 1 occurred 
 summary(glm.sat1)

Call:
glm(formula = sex ~ hwp + hcp + hnp + twp + d1lp * d2dp + hwp:hcp + 
hwp:hnp + hwp:twp + hcp:hnp + hcp:twp + hnp:twp, family = binomial, 
data = dati)

Deviance Residuals: 
Min   1Q   Median   3Q  Max  
-2.6838  -0.1439   0.2699   0.5694   3.8421  

Coefficients:
 Estimate Std. Error z value Pr(|z|)
(Intercept) 8.940  4.046   2.210 0.027125 *  
hwp-4.897  4.550  -1.076 0.281793
hcp  -414.796113.780  -3.646 0.000267 ***
hnp-3.689  6.692  -0.551 0.581487
twp 5.450  2.566   2.124 0.033657 *  
d1lp  -22.436 13.837  -1.621 0.104926
d2dp  -21.076  9.266  -2.274 0.022940 *  
d1lp:d2dp  65.102 35.394   1.839 0.065864 .  
hwp:hcp   313.514823.782   0.381 0.703516
hwp:hnp   -46.572 79.467  -0.586 0.557834
hwp:twp 8.252 21.503   0.384 0.701156
hcp:hnp -1721.320   1925.766  -0.894 0.371409
hcp:twp   -94.153459.710  -0.205 0.837721
hnp:twp16.598 38.548   0.431 0.666769
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 581.51  on 425  degrees of freedom
Residual deviance: 281.98  on 412  degrees of freedom
  (41 observations deleted due to missingness)
AIC: 309.98

Number of Fisher Scoring iterations: 8

**  

The estimates of hcp are very high and its Std.Error are also quite high. If
I go further with the analyses and use the step procedure I finally get a
candidate minimal adequate model:

 step.1-step(glm.sat1)
Start:  AIC=309.98
sex ~ hwp + hcp + hnp + twp + d1lp * d2dp + hwp:hcp + hwp:hnp + 
hwp:twp + hcp:hnp + hcp:twp + hnp:twp

Df DevianceAIC
- hcp:twp1   282.02 308.02
- hwp:hcp1   282.11 308.11
- hwp:twp1   282.13 308.13
- hnp:twp1   282.17 308.17
- hwp:hnp1   282.32 308.32
- hcp:hnp1   282.90 308.90
none   281.98 309.98
- d1lp:d2dp  1   285.43 311.43

Step:  AIC=308.02
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hcp + 
hwp:hnp + hwp:twp + hcp:hnp + hnp:twp

Df DevianceAIC
- hwp:hcp1   282.13 306.13
- hwp:twp1   282.14 306.14
- hnp:twp1   282.18 306.18
- hwp:hnp1   282.43 306.43
- hcp:hnp1   283.07 307.07
none   282.02 308.02
- d1lp:d2dp  1   285.43 309.43

Step:  AIC=306.13
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp + 
hwp:twp + hcp:hnp + hnp:twp

Df DevianceAIC
- hwp:twp1   282.24 304.24
- hnp:twp1   282.27 304.27
- hwp:hnp1   282.53 304.53
- hcp:hnp1   283.25 305.25
none   282.13 306.13
- d1lp:d2dp  1   285.46 307.46

Step:  AIC=304.24
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp + 
hcp:hnp + hnp:twp

Df DevianceAIC
- hnp:twp1   282.35 302.35
- hwp:hnp1   282.83 302.83
- hcp:hnp1   283.36 303.36
none   282.24 304.24
- d1lp:d2dp  1   285.56 305.56

Step:  AIC=302.35
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp + 
hcp:hnp

Df DevianceAIC
- hwp:hnp1   283.06 301.06
- hcp:hnp1   283.37 301.37
none   282.35 302.35
- d1lp:d2dp  1   285.71 303.71
- twp1   306.17 324.17

Step:  AIC=301.06
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hcp:hnp

Df DevianceAIC
- hcp:hnp1   284.36 300.36
none   283.06 301.06
- d1lp:d2dp  1   286.26 302.26
- hwp1   286.96 302.96
- twp1   307.98 323.98

Step:  AIC=300.36
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp

Df DevianceAIC
none   284.36 300.36
- hnp1   287.07 301.07
- d1lp:d2dp  1   287.80 301.80
- hwp1   288.08 302.08
- twp1   308.39 322.39
- hcp1   484.12 498.12
Hubo 40

[R] apply lm function to dataset split by two variables

2011-09-28 Thread Elena Guijarro


Dear all,

I am not fluent in R and am struggling to 1) apply a lm to a weight-size 
dataset, thus the model has to run separately for each species, each 
year; 2) extract coefs, r-squared, n, etc. The data look like this:

yearsps cm  w
200950  16  22
200950  17  42
200950  18  45
200951  15  45
200951  16  53
200951  17  73
201050  15  22
201050  16  41
201050  16  21
201050  17  36
201051  15  43
201051  16  67
201051  17  79



The following script works for data from a single year, but I don't find 
a way to subset the data by sps AND year and get the function running:

f - function(data) lm(log(w) ~ log(cm+0.5), data = data)
v - lapply(split(data, data$sps), f)

and then I can extract the data with this script from Peter Solymos 
(although I do not get the number of points used in the analysis):

myFun -
function(lm)
{
out - c(lm$coefficients[1],
 lm$coefficients[2],
 length(lm$run1$model$y),
 summary(lm)$coefficients[2,2],
 pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
summary(lm)$fstatistic[3], lower.tail = FALSE),
 summary(lm)$r.squared)
names(out) - c(intercept,slope,n,slope.SE,p.value,r.squared)
return(out)}

results - list()
for (i in 1:length(v)) results[[names(v)[i]]] - myFun(v[[i]])
as.data.frame(results)

I have checked the plyr package, but the example that fits my data best 
uses a for loop and I would like to avoid these. I have also tried the 
following (among many other options) without results:

v-tapply(data$w,list(data$cm,data$year),f)

Error in is.function(FUN) : 'FUN' is missing

Any ideas?

Thanks for your help,

Elena


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] metaMDS

2011-09-28 Thread Gavin Simpson

On Fri, 2011-09-23 at 12:43 -0500, Jean V Adams wrote:
 Lineth Contreras wrote on 09/23/2011 11:35:10 AM:
  
  Hello R-user community,
  
  I am applying the function metaMDS. However, I would like to know if 
 there
  is any option to export the data I got from the axis as a data frame.
  
  I have tried as.data.frame.list but is not working.  Any suggestion?
  
  Thank you in advance for your help,
  
  Lineth
 
 
 When you say the data I got from the axis do you mean the coordinates 
 contained in the $points of the resulting object?  If so, something like 
 this should work (using the example provide in ?metaMDS):

You would be better off with the scores() method for metaMDS objects:

data(dune)
sol - metaMDS(dune)
scrs - scores(sol)

`scrs` is a matrix:

 class(scrs)
[1] matrix

which can be exported via say `write.csv()`:

write.csv(scrs, filenames.csv)

If you want a data frame in R, then

SCRS - as.data.frame(scrs)

will work, but there is little reason to convert to a data frame just to
export it out of R.

HTH

G

 data(dune)
 library(MASS)
 sol - metaMDS(dune)
 df - as.data.frame(sol$points)
 
 Jean
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plotting Lines Through multiple groups

2011-09-28 Thread Eik Vettorazzi

Hi Clara,
are there repeated measures of an animal on the same day?
Otherwise I did not get the point of the level of aggregation.

Using lattice you can do sth like
xyplot(Cort~Day,groups=Animal,type=c(p,a),data=df)

but with your given data this just connects the raw points

hth.

Am 28.09.2011 08:03, schrieb clara_eco:
Hi I have data in the following format
Cort Day Animal
230 1
273 1
240 2
271 2
342 2
303 2
244 2
200 3
241 3
282 3
344 3
etc.
It is measured across time(day) however no every individual is measured the
same number of times. All I want to do is plot the Raw data and then run a
line connecting the data points of each individual animal, for the example
above there would be three lines as there are three animals.
I've tried, gplots, lattice and car but I'm not really figuring it out, any
help would be greatly appreciated!

Thanks
Clara

--
View this message in context:
http://r.789695.n4.nabble.com/Plotting-Lines-Through-multiple-groups-tp3850099p3850099.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790

--
Pflichtangaben gemäß Gesetz über elektronische Handelsregister und
Genossenschaftsregister sowie das Unternehmensregister (EHUG):

Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts;
Gerichtsstand: Hamburg

Vorstandsmitglieder: Prof. Dr. Jörg F. Debatin (Vorsitzender), Dr. Alexander
Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus

[R] Password protected R Repository

2011-09-28 Thread stefan . petersson

  

Hi,

I've set up a very simple R repository. Just a single source
library. Everything works fine. I can install the package on my client
using:

install.packages(repos='http://www.myServer.se/myRepo/',
pkgs='myLib', dep=TRUE)

However, I want to protect the repo, so I use a
.htaccess, placed directly under 'myRepo' on the server. I use
'Authentication Basic' and 'require valid-user'.

I've tried a few
things. From the
obvious:

install.packages(repos=getURL('http://www.myServer.se/myRepo',
userpwd='user:password'), pkgs='myLib', dep=TRUE)

To the more
elaborate:

h 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Password protected R Repository

2011-09-28 Thread Stefan Petersson

 stefan.petersson at inizio.se writes:

 
 
 Hi,
 
 I've set up a very simple R repository. Just a single source
 library. Everything works fine. I can install the package on my client
 using:
 
 install.packages(repos='http://www.myServer.se/myRepo/',
 pkgs='myLib', dep=TRUE)
 
 However, I want to protect the repo, so I use a
 .htaccess, placed directly under 'myRepo' on the server. I use
 'Authentication Basic' and 'require valid-user'.
 
 I've tried a few
 things. From the
 obvious:
 
 install.packages(repos=getURL('http://www.myServer.se/myRepo',
 userpwd='user:password'), pkgs='myLib', dep=TRUE)
 
 To the more
 elaborate:
 
 h 
   [[alternative HTML version deleted]]
 
 
I add this myself, since some strange 'alternative HTML version deleted' thingy 
cut my post short. Here is the rest:

To the more elaborate:

h - getCurlHandle(header = TRUE,
userpwd = user:password,
netrc = TRUE,
followlocation = TRUE
)

install.packages(getURL(http://www.myServer.se/myRepo/;,
verbose = TRUE,
curl = h
),
pkgs='myLib',
dep=TRUE
)

But it's not working. The last call is complaining of a missing index.html. And 
if I put one under myRepo, I get connected to that page, but install.packages 
can't go further to the src directory on the server. This is what I get:

Installing package(s) into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository HTTP/1.1 301 Moved Permanently
Date: Wed, 28 Sep 2011 09:20:25 GMT
Server: Apache
Location: http://www.myServer.se/myRepo/
Vary: Accept-Encoding
Content-Length: 235
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 403 Forbidden
Date: Wed, 28 Sep 2011 09:20:25 GMT
Server: Apache
Vary: Accept-Encoding
Content-Length: 208
Content-Type: text/html; charset=iso-8859-1

!DOCTYPE HTML PUBLIC -//IETF//DTD HTML 2.0//EN
htmlhead
title403 Forbidden/title
/headbody
h1Forbidden/h1
pYou don't have permission to access /smisc/
on this server./p
/body/html
/src/contrib
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package ‘smisc’ is not available (for R version 2.13.1)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Password protected R Repository

2011-09-28 Thread Prof Brian Ripley


Who said that RCurl::getURL worked with install.packages?
(At least, I assume this is from RCurl: you did not mention it.)

install.packages() first calls available.packages(), and that uses 
download.file to get the PACKAGES[.gz] file.  It then calls 
download.file to get the packages.


So please read the help for download.file (as the help pages say), and 
try the solutions described there.


On Wed, 28 Sep 2011, Stefan Petersson wrote:


stefan.petersson at inizio.se writes:




Hi,

I've set up a very simple R repository. Just a single source
library. Everything works fine. I can install the package on my client
using:

install.packages(repos='http://www.myServer.se/myRepo/',
pkgs='myLib', dep=TRUE)

However, I want to protect the repo, so I use a
.htaccess, placed directly under 'myRepo' on the server. I use
'Authentication Basic' and 'require valid-user'.

I've tried a few
things. From the
obvious:


*None* of this is 'obvious', and none of it is reproducible.


install.packages(repos=getURL('http://www.myServer.se/myRepo',
userpwd='user:password'), pkgs='myLib', dep=TRUE)

To the more
elaborate:

h
[[alternative HTML version deleted]]



I add this myself, since some strange 'alternative HTML version deleted' thingy
cut my post short. Here is the rest:



To the more elaborate:

h - getCurlHandle(header = TRUE,
userpwd = user:password,
netrc = TRUE,
followlocation = TRUE
)

install.packages(getURL(http://www.myServer.se/myRepo/;,
verbose = TRUE,
curl = h
),
pkgs='myLib',
dep=TRUE
)

But it's not working. The last call is complaining of a missing index.html. And
if I put one under myRepo, I get connected to that page, but install.packages
can't go further to the src directory on the server. This is what I get:

Installing package(s) into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository HTTP/1.1 301 Moved Permanently
Date: Wed, 28 Sep 2011 09:20:25 GMT
Server: Apache
Location: http://www.myServer.se/myRepo/
Vary: Accept-Encoding
Content-Length: 235
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 403 Forbidden
Date: Wed, 28 Sep 2011 09:20:25 GMT
Server: Apache
Vary: Accept-Encoding
Content-Length: 208
Content-Type: text/html; charset=iso-8859-1

!DOCTYPE HTML PUBLIC -//IETF//DTD HTML 2.0//EN
htmlhead
title403 Forbidden/title
/headbody
h1Forbidden/h1
pYou don't have permission to access /smisc/
on this server./p
/body/html
/src/contrib
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
 package ‘smisc’ is not available (for R version 2.13.1)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Printing an xtable with type = html

2011-09-28 Thread David Scott



I have been playing around with producing tables using xtable and the 
type = html argument when printing. For example, if xtbl is the output 
of a dataframe which has been run through xtable, using the command:


print(xtbl, type = html,
  html.table.attributes = border = '1', align = 'center')

I would be interested to see other examples of the use of xtable to 
produce html. There is a whole vignette on using xtable to produce all 
sorts of tables for incorporation into a TeX document but I have found 
no examples of producing html with any table attributes.


Ideally xtable should be able to access a css file but I don't see any 
mechanism for doing that. Perhaps someone can enlighten me.


David Scott

--
_
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142,NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data transformation cleaning

2011-09-28 Thread Jim Lemon


On 09/28/2011 01:13 PM, pip56789 wrote:

Hi,

I have a few methodological and implementation questions for ya'll. Thank
you in advance for your help. I have a dataset that reflects people's
preference choices. I want to see if there's any kind of clustering effect
among certain preference choices (e.g. do people who pick choice A also pick
choice D).

I have a data set that has one record per user ID, per preference choice.
It's a long form of a data set that looks like this:

ID | Page
123 | Choice A
123 | Choice B
456 | Choice A
456 | Choice B
...

I thought that I should do the following

1. Make the data set wide, counting the observations so the data looks
like this:
ID | Count of Preference A | Count of Preference B
123 | 1 | 1
...

Using
table1- dcast(data,ID ~ Page,fun.aggregate=length,value_var='Page' )

2. Create a correlation matrix of preferences
cor(table2[,-1])

How would I restrict my correlation to show preferences that met a minimum
sample threshold? Can you confirm if the two following commands do the same
thing? What would I do from here (or am I taking the wrong approach)
table1- dcast(data,Page ~ Page,fun.aggregate=length,value_var='Page' )
table2- with(data, table(Page,Page))



Hi Peter,
An easy way to visualize set intersections is the intersectDiagram 
function in the plotrix package. This will display the counts or 
percentages of each type of intersection. Your data could be passed like 
this:


choices-data.frame(IDs=sample(1:20,50,TRUE),
 sample(LETTERS[1:4],50,TRUE))
library(plotrix)
intersectDiagram(choices)

This example is a bit messy, as it will generate quite a few repeated 
choices that will be ignored by intersectDiagram, but it should give you 
the idea.


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] apply lm function to dataset split by two variables

2011-09-28 Thread Dennis Murphy

Hi:

Here's one way to do it with the plyr package:

dd - read.table(textConnection(
yearsps cm  w
200950  16  22
200950  17  42
200950  18  45
200951  15  45
200951  16  53
200951  17  73
201050  15  22
201050  16  41
201050  16  21
201050  17  36
201051  15  43
201051  16  67
201051  17  79), header = TRUE)
closeAllConnections()

library('plyr')

# Input a data frame, output a list of lm objects
modlist - dlply(dd, .(sps, year), function(d) lm(w ~ cm, data = d))

# For use in plyr's ldply() function, the utility function should
# return a data frame. We save some effort in simple linear regression
# by noting that the two-sided p-value of the t-test of zero slope is the
# same as that of the overall F test:
extractfun - function(m) {
   cf - coef(m)
   tinfo - summary(m)$coefficients[2, c(2, 4)]
   r2 - summary(m)$r.squared
   data.frame(intercept = cf[1], slope = cf[2], n = length(resid(m)),
  slope.se = tinfo[1], pval = tinfo[2], Rsq = r2)
   }

# Take a list (of models) as input and output a data frame:
ldply(modlist, extractfun)

  sps year intercept slope n slope.se  pval   Rsq
1  50 2009 -159.1667  11.5 3 4.907477 0.2567749 0.8459488
2  50 2010  -82.   7.0 4 7.141428 0.4303481 0.3245033
3  51 2009 -167.  14.0 3 3.464102 0.1544210 0.9423077
4  51 2010 -225.  18.0 3 3.464102 0.1210377 0.9642857

HTH,
Dennis

On Wed, Sep 28, 2011 at 1:41 AM, Elena Guijarro
elena.guija...@vi.ieo.es wrote:

 Dear all,

 I am not fluent in R and am struggling to 1) apply a lm to a weight-size
 dataset, thus the model has to run separately for each species, each
 year; 2) extract coefs, r-squared, n, etc. The data look like this:

 year    sps     cm      w
 2009    50      16      22
 2009    50      17      42
 2009    50      18      45
 2009    51      15      45
 2009    51      16      53
 2009    51      17      73
 2010    50      15      22
 2010    50      16      41
 2010    50      16      21
 2010    50      17      36
 2010    51      15      43
 2010    51      16      67
 2010    51      17      79



 The following script works for data from a single year, but I don't find
 a way to subset the data by sps AND year and get the function running:

 f - function(data) lm(log(w) ~ log(cm+0.5), data = data)
 v - lapply(split(data, data$sps), f)

 and then I can extract the data with this script from Peter Solymos
 (although I do not get the number of points used in the analysis):

 myFun -
 function(lm)
 {
 out - c(lm$coefficients[1],
     lm$coefficients[2],
     length(lm$run1$model$y),
     summary(lm)$coefficients[2,2],
     pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
 summary(lm)$fstatistic[3], lower.tail = FALSE),
     summary(lm)$r.squared)
 names(out) - c(intercept,slope,n,slope.SE,p.value,r.squared)
 return(out)}

 results - list()
 for (i in 1:length(v)) results[[names(v)[i]]] - myFun(v[[i]])
 as.data.frame(results)

 I have checked the plyr package, but the example that fits my data best
 uses a for loop and I would like to avoid these. I have also tried the
 following (among many other options) without results:

 v-tapply(data$w,list(data$cm,data$year),f)

 Error in is.function(FUN) : 'FUN' is missing

 Any ideas?

 Thanks for your help,

 Elena


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] apply lm function to dataset split by two variables

2011-09-28 Thread Michael Dewey


At 09:41 28/09/2011, Elena Guijarro wrote:


Dear all,

I am not fluent in R and am struggling to 1) apply a lm to a weight-size
dataset, thus the model has to run separately for each species, each
year; 2) extract coefs, r-squared, n, etc. The data look like this:

yearsps cm  w
200950  16  22
200950  17  42
200950  18  45
200951  15  45
200951  16  53
200951  17  73
201050  15  22
201050  16  41
201050  16  21
201050  17  36
201051  15  43
201051  16  67
201051  17  79



The following script works for data from a single year, but I don't find
a way to subset the data by sps AND year and get the function running:


I think lmList from the nlme package does this for you. It comes with 
some other helpful extractors or you can write your own as you have 
done. Personally I would return a list rather than a vector but that 
is a matter of taste.




f - function(data) lm(log(w) ~ log(cm+0.5), data = data)
v - lapply(split(data, data$sps), f)

and then I can extract the data with this script from Peter Solymos
(although I do not get the number of points used in the analysis):

myFun -
function(lm)
{
out - c(lm$coefficients[1],
 lm$coefficients[2],
 length(lm$run1$model$y),
 summary(lm)$coefficients[2,2],
 pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
summary(lm)$fstatistic[3], lower.tail = FALSE),
 summary(lm)$r.squared)
names(out) - c(intercept,slope,n,slope.SE,p.value,r.squared)
return(out)}

results - list()
for (i in 1:length(v)) results[[names(v)[i]]] - myFun(v[[i]])
as.data.frame(results)

I have checked the plyr package, but the example that fits my data best
uses a for loop and I would like to avoid these. I have also tried the
following (among many other options) without results:

v-tapply(data$w,list(data$cm,data$year),f)

Error in is.function(FUN) : 'FUN' is missing

Any ideas?

Thanks for your help,

Elena


[[alternative HTML version deleted]]


Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Problems using the 'HPloglik' function in the SDE package

2011-09-28 Thread Steven Craig

Hello all,
I am trying to produce the closed-form Ait-Sahalia approximation to the
log-likelihood function of a diffusion process but the function arguments I
am passing to the function result in NaN being returned. I've checked the
'Transform','S' and 'M' functions and the problem doesn't seem to be with
these. Does anyone have any ideas why this isn't working? My code is
reproduced below for info.

Thanks in advance.

Regards,
Steven

## simulate sample of diffusion process to be estimated
# define drift and diffusion coefficients for SDE
d - expression(x^(-1) - 1 + x - x^2)
s - expression(x^1.3)
s.x - expression(1.3*(x^0.3))
# simulate process using the Milstein approximation
X -
sde.sim(t0=0,T=100,X0=1,N=1000,drift=d,sigma=s,sigma.x=s.x,method=milstein)

## define the Lamperti transform function
Transform - function(t,x,theta)
(1/(theta[5]*(theta[6]-1)))*(x^(1-theta[6]))

## define the diffusion function
S - function(t,x,theta) theta[5]*(x^theta[6])

## define the transformed drift function and it's first six derivatives
m0 - function(t,x,theta)
{
b - theta[5]*(theta[6]-1)
p1 - (theta[6]+1)/(theta[6]-1)
p2 - theta[6]/(theta[6]-1)
p3 - (2-theta[6])/(1-theta[6])
(-1/theta[5])*(theta[1]*((b*x)^p1) - theta[2]*((b*x)^p2) + theta[3]*b*x -
theta[4]*((b*x)^p3) - 0.5*theta[6]*(theta[5]^2)*((b*x)^(-1)))
}
m1 - function(t,x,theta)
{
b - theta[5]*(theta[6]-1)
p1 - (theta[6]+1)/(theta[6]-1)
p2 - theta[6]/(theta[6]-1)
p3 - (2-theta[6])/(1-theta[6])
(-b/theta[5])*(theta[1]*(p1*(b*x)^(p1-1)) - p2*theta[2]*((b*x)^(p2-1)) +
theta[3] - p3*theta[4]*((b*x)^(p3-1)) +
0.5*theta[6]*(theta[5]^2)*((b*x)^(-2)))
}
m2 - function(t,x,theta)
{
b - theta[5]*(theta[6]-1)
p1 - (theta[6]+1)/(theta[6]-1)
p2 - theta[6]/(theta[6]-1)
p3 - (2-theta[6])/(1-theta[6])
(-(b^2)/theta[5])*(theta[1]*(p1*(p1-1)*(b*x)^(p1-2)) -
p2*(p2-1)*theta[2]*((b*x)^(p2-2)) - p3*(p3-1)*theta[4]*((b*x)^(p3-2)) -
theta[6]*(theta[5]^2)*((b*x)^(-3)))
}
m3 - function(t,x,theta)
{
b - theta[5]*(theta[6]-1)
p1 - (theta[6]+1)/(theta[6]-1)
p2 - theta[6]/(theta[6]-1)
p3 - (2-theta[6])/(1-theta[6])
(-(b^3)/theta[5])*(theta[1]*(p1*(p1-1)*(p1-2)*(b*x)^(p1-3)) -
p2*(p2-1)*(p2-2)*theta[2]*((b*x)^(p2-3)) -
p3*(p3-1)*(p3-2)*theta[4]*((b*x)^(p3-3)) +
3*theta[6]*(theta[5]^2)*((b*x)^(-4)))
}
m4 - function(t,x,theta)
{
b - theta[5]*(theta[6]-1)
p1 - (theta[6]+1)/(theta[6]-1)
p2 - theta[6]/(theta[6]-1)
p3 - (2-theta[6])/(1-theta[6])
(-(b^4)/theta[5])*(theta[1]*(p1*(p1-1)*(p1-2)*(p1-3)*(b*x)^(p1-4)) -
p2*(p2-1)*(p2-2)*(p2-3)*theta[2]*((b*x)^(p2-4)) -
p3*(p3-1)*(p3-2)*(p3-3)*theta[4]*((b*x)^(p3-4)) -
12*theta[6]*(theta[5]^2)*((b*x)^(-5)))
}
m5 - function(t,x,theta)
{
b - theta[5]*(theta[6]-1)
p1 - (theta[6]+1)/(theta[6]-1)
p2 - theta[6]/(theta[6]-1)
p3 - (2-theta[6])/(1-theta[6])
(-(b^5)/theta[5])*(theta[1]*(p1*(p1-1)*(p1-2)*(p1-3)*(p1-4)*(b*x)^(p1-5)) -
p2*(p2-1)*(p2-2)*(p2-3)*(p2-4)*theta[2]*((b*x)^(p2-5)) -
p3*(p3-1)*(p3-2)*(p3-3)*(p3-4)*theta[4]*((b*x)^(p3-5)) +
60*theta[6]*(theta[5]^2)*((b*x)^(-6)))
}
m6 - function(t,x,theta)
{
b - theta[5]*(theta[6]-1)
p1 - (theta[6]+1)/(theta[6]-1)
p2 - theta[6]/(theta[6]-1)
p3 - (2-theta[6])/(1-theta[6])
(-(b^6)/theta[5])*(theta[1]*(p1*(p1-1)*(p1-2)*(p1-3)*(p1-4)*(p1-5)*(b*x)^(p1-6))
- p2*(p2-1)*(p2-2)*(p2-3)*(p2-4)*(p2-5)*theta[2]*((b*x)^(p2-6)) -
p3*(p3-1)*(p3-2)*(p3-3)*(p3-4)*(p3-5)*theta[4]*((b*x)^(p3-6)) -
360*theta[6]*(theta[5]^2)*((b*x)^(-7)))
}
# now create list consisting of m0..m6
M - list(m0,m1,m2,m3,m4,m5,m6)

## use HPloglik function to calculate log-likelihood function at the true
parameter values
HPloglik(X,c(1,1,1,1,1,1.3),Moments,Transform,S)
-- 
S

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to solve a simple discrete Bayesian Belief Network?

2011-09-28 Thread Marcio Pupin Mello


Can somebody save-me? Thanks in advance!


#R script:
#trying to find out how solve a discrete Bayesian Belief Network.
#option: using 'catnet' package

#BEGIN
library(catnet)
cnet - cnNew(nodes = c(a, b, c), cats = list(c(1, 2),
c(1, 2), c(1, 2)), parents = list(NULL, c(1), c(1,
2)), probs = list(c(0.2, 0.8), list(c(0.6, 0.4), c(0.4, 0.6)),
list(list(c(0.3, 0.7), c(0.7, 0.3)), list(c(0.9, 0.1), c(0.1,
0.9)

#what is the probability of a=1?
cnNodeMarginalProb(cnet,node=1)[1]
#0.2

#what is the probability of b=2?
cnNodeMarginalProb(cnet,node=2)[2]
#0.56

#what is the probability of c=1?
cnNodeMarginalProb(cnet,node=3)[1]
#0.428


#but how can I answer questions like:

#what is the probability of a=1 given that c=1? i.e. P(a=1|c=1)?

#or what is the probability of a=1 given that b=2 and c=1? i.e. 
P(a=1|b=2,c=1)?

#END

--
Marcio Pupin Mello

Survey Engineer
PhD candidate in Remote Sensing
National Institute for Space Research (INPE) - Brazil
Laboratory of Remote Sensing in Agriculture and Forestry (LAF)
www.dsr.inpe.br/~mello

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Not saving plot

2011-09-28 Thread Joel

Hi

I got a strange problem.

I got this code:
poc-function(USER=nana,REGISTRY=nana,...){
require(RMySQL)
require(R2HTML)
require(lattice)

... (removed the dataprocessing)


toReturn=capture.output(HTML(as.title(En liten
rapport!),file=,align=center))
fileDir-paste(/home/joel/tmpR/rikssvikt/,USER,sep=);
fileName-tempfile(pattern=UTT1, tmpdir=);
try(png(paste(fileDir,fileName,.png,sep=)));
barchart(toPlot,xlab=Antal körningar);
dev.off();
gfn -paste(filemanager?USERID=,USER,file=,fileName,.png,sep=);
tmp-capture.output(HTMLInsertGraph(file=,GraphFileName=gfn));
toReturn-paste(toReturn,substr(tmp,1,nchar(tmp)-8));

return(paste(div,toReturn[2],/div,sep=))
}

When I just run the code without its being in the function in works and
saves the plot at the designated location. But when I instead try to run the
code when its inside the function it dossent save the plot at all.
Anyone know why that is?

//joel

--
View this message in context: 
http://r.789695.n4.nabble.com/Not-saving-plot-tp3850981p3850981.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Negative Quartile

2011-09-28 Thread Jeff Newmiller

This mailing list is about R help, not basic statistics help. Your message is 
full of what appear to be misconceptions, though it could be just something 
basic that affects the whole thing. You should consult with a statistician 
locally for most effective communication.
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Damian Abalo wardamo.z...@gmail.com wrote:

Hello, I have a doubt, but it is more statistic than just about R: How
the people deal usually with negative percentile and quartile? In my
concrete case, I want to know the distribution of an error, so the
nearest it is to 0 the better (I think it would be optimun to have the
nearest values to 0 in the lower percentiles). The best result would
be making the percentile of the absolute value (without sign), but the
sing is also important, and I need to keep it.
This results in the big negative values to be in the lower
percentiles, which are usually understanded as the best cases, since
we are working with an error. My question is, what is the usual thing
done in this cases? keep the sign, and read the lower percentiles as
bad cases also (being the optimal ones the center ones) or try to
somehow get the signless distribution keeping the signs somehow? And
if this is the case, how can this be done?

Thanks for your time, Best regards

_

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Asking an old housekeeping question...

2011-09-28 Thread Brian Lunergan

My bad. I know I've asked this some time back but I'm afraid I've lost the post
along the way and am not having much luck searching the net.

Don't want to pad my R setup with orphan packages. Is there a small snippet of
code or even a single command that would allow me to check the packages I do use
for their dependencies and then I can chuck those that don't turn up as wanted
on the journey. Thanks in advance.

Later days...
-- 
Brian Lunergan
Nepean, Ontario
Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Asking an old housekeeping question...

2011-09-28 Thread S Ellison

??dependencies shows up (on my system)

tools::package.dependencies
tools::dependsOnPkgs

The first looks for dependencies; the second for reverse dependencies.


 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Brian Lunergan
 Sent: 28 September 2011 13:55
 To: R_Help List
 Subject: [R] Asking an old housekeeping question...
 
 My bad. I know I've asked this some time back but I'm afraid 
 I've lost the post along the way and am not having much luck 
 searching the net.
 
 Don't want to pad my R setup with orphan packages. Is there a 
 small snippet of code or even a single command that would 
 allow me to check the packages I do use for their 
 dependencies and then I can chuck those that don't turn up as 
 wanted on the journey. Thanks in advance.
 
 Later days...
 --
 Brian Lunergan
 Nepean, Ontario
 Canada
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 ***
This email and any attachments are confidential. Any use...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Not saving plot

2011-09-28 Thread Joel

Seems to have something todo with barchart.
Cuz if I just change it to plot it works.

But I want to use barchart due to the graphical advantages so do anyone have
an ide.

--
View this message in context: 
http://r.789695.n4.nabble.com/Not-saving-plot-tp3850981p3851255.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Not saving plot

2011-09-28 Thread Ben Tupper


On Sep 28, 2011, at 8:19 AM, Joel wrote:

 Hi
 
 I got a strange problem.
 
 I got this code:
 poc-function(USER=nana,REGISTRY=nana,...){
 require(RMySQL)
 require(R2HTML)
 require(lattice)
 
 ... (removed the dataprocessing)
 
 
 toReturn=capture.output(HTML(as.title(En liten
 rapport!),file=,align=center))
 fileDir-paste(/home/joel/tmpR/rikssvikt/,USER,sep=);
 fileName-tempfile(pattern=UTT1, tmpdir=);
 try(png(paste(fileDir,fileName,.png,sep=)));
 barchart(toPlot,xlab=Antal körningar);
 dev.off();
 gfn -paste(filemanager?USERID=,USER,file=,fileName,.png,sep=);
 tmp-capture.output(HTMLInsertGraph(file=,GraphFileName=gfn));
 toReturn-paste(toReturn,substr(tmp,1,nchar(tmp)-8));
 
 return(paste(div,toReturn[2],/div,sep=))
 }
 
 When I just run the code without its being in the function in works and
 saves the plot at the designated location. But when I instead try to run the
 code when its inside the function it dossent save the plot at all.
 Anyone know why that is?
 

Hi,

Might this be related to R FAQ 7.22?  

http://cran.r-project.org/doc/FAQ/R-FAQ.html

Cheers,
Ben



 //joel
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Not-saving-plot-tp3850981p3850981.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd. P.O. Box 475
West Boothbay Harbor, Maine   04575-0475 
http://www.bigelow.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Password protected R Repository

2011-09-28 Thread Stefan Petersson

Prof Brian Ripley ripley at stats.ox.ac.uk writes:

 
 Who said that RCurl::getURL worked with install.packages?
 (At least, I assume this is from RCurl: you did not mention it.)
 
 install.packages() first calls available.packages(), and that uses 
 download.file to get the PACKAGES[.gz] file.  It then calls 
 download.file to get the packages.
 
 So please read the help for download.file (as the help pages say), and 
 try the solutions described there.
 
 On Wed, 28 Sep 2011, Stefan Petersson wrote:
 

The helpfiles for 'download.file' was not that helpful. But maybe it's just me 
not being able to read them correctly. 

I tried to call install.packages with the 'method=wget', and hoped for a 
username and password dialog. But no luck. Other than that, I see no arguments 
that relates to my problem under ?download.file. Which btw is 'installing an R 
library from a password protected URL (Apache Basic Authentication)'.

Actually, nobody said that RCurl::getURL would work with install.packages, but 
from what was written in a post somewhere I jumped to the (false) conclusion 
that it would work. That's why I tried it.

Any hints would be greatly appreciated!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Not saving plot

2011-09-28 Thread Rich Shepard


On Wed, 28 Sep 2011, Joel wrote:


Seems to have something todo with barchart. Cuz if I just change it to
plot it works.

But I want to use barchart due to the graphical advantages so do anyone
have an ide.


Joel,

  Perhaps. Do you want to save your figure to disk as a .pdf, .jpg, or .png?

  If so, you need to tell R that's what you want. For example,

jpg(mybarchart.jpg)
now enter your barchart creation command
dev.off()

  This sequence directs the output of the figure from the screen to the
named file. Afterwards you need to return the display to the screen or any
other plotting command will be saved in that one file, overwriting whatever
it contains.

Rich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How does the survfit.coxph calculate the survival value?

2011-09-28 Thread Terry Therneau

The answers to your questions consume an entire chapter of my book, so
it is difficult to write a reasonable email response.  When there are
time dependent covariates, the creation and interpretation of a survival
curve is particularly subtle.
  The survfit.coxph routine calculates the usual estimates: Breslow,
Fleming-Harrington, or Kalbfleisch-Prentice; these differ in how they
handle ties but the numerical results are usually very close.  A
treatise on the formulas, theory, and derivation of these is beyond the
scope of this message.

Terry Therneau

--- begin included message --- 

I am in an urgent using R to fit the cox proportional model. My data is
a
data frame including 100 individuals which are software stress tests. 
There are time-to-failure and six covariates meaning the system resource
(every 4 minutes).
Is it possible to use R to fit the Cox model?

I used *coxph* to estimated the coefficients for both time-independent
and
time-dependent covariates.
And next I need to calculated the survival and hazard rate for cox model
with time-independent and that with time-dependent covariates.

I see many people use *survfit.coxph *to get the survival.
Could anybody please tell me how is the survival value calculated? For
example, what kind of procedure.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Not saving plot

2011-09-28 Thread Joel

You sir are a hero!

It works now thanks to your fast and accurate answer.

Thanks for the help.

//joel

--
View this message in context: 
http://r.789695.n4.nabble.com/Not-saving-plot-tp3850981p3851331.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] dump.frames, debugger and the ... argument

2011-09-28 Thread Jannis

Dear List, 

i run R non interactively and use option(error = dump.frames) to reconstruct 
the cause of errors for debugging. One of the functions in the call stack 
however uses the ... (dot dot dot) argument. When I run debugger() and try to 
browse to this functions' environment I get the following error message:

Error in get(.obj, envir = dump[[.selection]]) : 
  argument ... is missing, with no default


Some googleing showed that this seems to be a more common problem but I was not 
able to find any solution.

Any ideas?


Jannis

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] survival analysis: interval censored data

2011-09-28 Thread Terry Therneau

You have still not given me enough information to reproduce your
problem.  Why doesn't it include all years?  I have no way of knowing,
since we have no data.

--- begin included message --
halo david

when I use type= 'interval'

Call: survfit(formula = Surv(ingreso, fecha, estado, type = interval)
~ 
categoria)

and when I use just 

Call: survfit(formula = Surv(ingreso, fecha, estado) ~ categoria)

I don t know why when I use type = interval it does not survival
calculed for very year


regards

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Download statistics for a package

2011-09-28 Thread Ravi Varadhan

Hi,

How can I get information on how many times a particular package has been 
downloaded from CRAN?

Thanks,
Ravi.
---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins 
University

Ph. (410) 502-2619
email: rvarad...@jhmi.edumailto:rvarad...@jhmi.edu


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] GLMER Syntax Question

2011-09-28 Thread Ubuntu Diego

I was wondering is someone can explain me the differences between these (random 
slopes and intercept) models

model1 - glmer(output~A+B+C+(A+B+C | F) )

model2 - glmer(output~A+B+C+(A | F)+(B | F)+(C | F)


Thanks.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] GLMER Syntax Question

2011-09-28 Thread Ben Bolker

Ubuntu Diego ubuntu.diego at gmail.com writes:

 I was wondering is someone can explain me the differences
  between these (random slopes and intercept) models
 
 model1 - glmer(output~A+B+C+(A+B+C | F) )
 model2 - glmer(output~A+B+C+(A | F)+(B | F)+(C | F)
 

  You should ask this question on the r-sig-mixed-models mailing list.

  sincerely
   Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] survival analysis: interval censored data

2011-09-28 Thread Ruth Arias



hallo terry:

I attached araceae data set, 

when I use this:

surara-survfit(Surv(time,time2,event)~categoria)

Call: survfit(formula = Surv(time, time2, event) ~ categoria)

    records n.max n.start events median 0.95LCL 0.95UCL
categoria=C  94    63   0 21 NA  NA  NA
categoria=E 111    77   0 26 NA  NA  NA
 summary(surara)
Call: survfit(formula = Surv(time, time2, event) ~ categoria)

    categoria=C 
 time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI
 2006 63   5   0   23    0.921  0.0341    0.856    0.990
 2007 35   2  30    1    0.868  0.0483    0.778    0.968
 2008 62   5   1    3    0.798  0.0536    0.700    0.910
 2009 55   4   0    5    0.740  0.0570    0.636    0.861
 2010 46   5   0   41    0.660  0.0611    0.550    0.791

    categoria=E 
 time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI
 2005 71   7   3    0    0.901  0.0354    0.835    0.973
 2006 67   2   0   22    0.875  0.0391    0.801    0.955
 2007 43   1  36    1    0.854  0.0432    0.774    0.943
 2008 77   5   0    8    0.799  0.0469    0.712    0.896
 2009 64   4   1    3    0.749  0.0502    0.657    0.854
 2010 58   7   0   51    0.658  0.0545    0.560    0.774

but whe I included type=interval, 

 suraraint-survfit(Surv(time,time2,event,type='interval')~categoria) # falta 
 arreglar lo del intervalo!!!
 summary(suraraint)
Call: survfit(formula = Surv(time, time2, event, type = interval) ~ 
    categoria)

    categoria=C 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
 2004  95.00   13.14    0.862  0.0354    0.795    0.934
 2007  31.86    7.19    0.667  0.0695    0.544    0.818
 2008   1.67    1.67    0.000 NaN   NA   NA

    categoria=E 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
 2004  112.0   18.47    0.835  0.0351    0.769    0.907
 2005   40.5    1.06    0.813  0.0401    0.738    0.896
 2007   37.5    7.46    0.651  0.0620    0.540    0.785

it does not survival calculed for very year

I have a one-year interval between each census




De: Terry Therneau thern...@mayo.edu
Para: Ruth Arias rueu...@yahoo.es
CC: r-help@r-project.org
Enviado: miércoles 28 de septiembre de 2011 16:00
Asunto: Re:  survival analysis: interval censored data

You have still not given me enough information to reproduce your
problem.  Why doesn't it include all years?  I have no way of knowing,
since we have no data.

--- begin included message --
halo david

when I use type= 'interval'

Call: survfit(formula = Surv(ingreso, fecha, estado, type = interval)
~ 
    categoria)

and when I use just 

Call: survfit(formula = Surv(ingreso, fecha, estado) ~ categoria)

I don t know why when I use type = interval it does not survival
calculed for very year


regardstimefamilia genero  categoria   time2   event
2004Araceae Anthurium   C   20100
2004Araceae PhilodendronC   20061
2004Araceae Anthurium   C   20100
2004Araceae Anthurium   C   20100
2004Araceae Anthurium   C   20100
2004Araceae PhilodendronC   20060
2004Araceae PhilodendronC   20060
2004Araceae Anthurium   C   20100
2004Araceae Anthurium   C   20090
2004Araceae Anthurium   C   20060
2004Araceae Anthurium   C   20060
2004Araceae PhilodendronC   20060
2004Araceae PhilodendronC   20060
2004Araceae Anthurium   C   20100
2004Araceae PhilodendronC   20101
2004Araceae PhilodendronC   20060
2004Araceae PhilodendronC   20060
2004Araceae PhilodendronC   20080
2004Araceae Araceae C   20060
2004Araceae Anthurium   C   20100
2004Araceae Anthurium   C   20081
2004Araceae PhilodendronC   20070
2004Araceae Anthurium   C   20080
2004Araceae Anthurium   C   20100
2004Araceae Anthurium   C   20101
2004Araceae PhilodendronE   20080
2004Araceae Anthurium   E   20060
2004Araceae Anthurium   E   20091
2004Araceae Anthurium   E   20060
2004Araceae Anthurium   E   20081
2004Araceae Anthurium   E   20100
2004Araceae Anthurium   E   20100
2004Araceae Anthurium   E   20100
2004Araceae Anthurium   E   20100
2004Araceae Anthurium

Re: [R] Password protected R Repository

2011-09-28 Thread Uwe Ligges




On 28.09.2011 15:51, Stefan Petersson wrote:

Prof Brian Ripleyripleyat  stats.ox.ac.uk  writes:



Who said that RCurl::getURL worked with install.packages?
(At least, I assume this is from RCurl: you did not mention it.)

install.packages() first calls available.packages(), and that uses
download.file to get the PACKAGES[.gz] file.  It then calls
download.file to get the packages.

So please read the help for download.file (as the help pages say), and
try the solutions described there.

On Wed, 28 Sep 2011, Stefan Petersson wrote:



The helpfiles for 'download.file' was not that helpful. But maybe it's just me
not being able to read them correctly.


Yes, looks like this is the case.



I tried to call install.packages with the 'method=wget', and hoped for a
username and password dialog. But no luck.



The help page says if proper values are stored in the configuration 
file for wget, so why do you expect a dialog?


Best,
Uwe Ligges



Other than that, I see no arguments
that relates to my problem under ?download.file. Which btw is 'installing an R
library from a password protected URL (Apache Basic Authentication)'.

Actually, nobody said that RCurl::getURL would work with install.packages, but
from what was written in a post somewhere I jumped to the (false) conclusion
that it would work. That's why I tried it.

Any hints would be greatly appreciated!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Wilcox test and data collection

2011-09-28 Thread Francesca Pancotto

Dear Contributors
I have a problem with the collection of data from the results of a test.
I need to perform a comparative test over groups of data , recall the value
of the pvalue and create a table.
My problem is in the way to replicate the analysis over and over again over
subsets of data according to a condition.
I have this database, called y:
gg   t1 t2d
 40  1  1 2
 50 2   2 1
 45 1   3 1
 49 2   1 1
 5   2   1 3
 40 1   1 2

where gg takes values from 1 to 100, t1 and t2 have values in (1,2,3) and d
in (0,1,2,3)
I want to perform tests on the values of gg according to the conditions that

d==0 , compare values of gg when t1==1 with values of gg when t1==3
d==1 , compare values of gg when t1==1 with values of gg when t1==3
d==2 , compare values of gg when t1==1 with values of gg when t1==3
..
then
d==0 , compare values of gg when t2==1 with values of gg when t2==3
d==1...


then collect the data of a statistics and create a table.
The procedure i followed is to create sub datasets called m0,m1,m2,m3
corresponding
to the values of d, i.e.

m0- y[y$d==0,c(7,17,18,19)]
m1- y[y$d==1,c(7,17,18,19)]
m2- y[y$d==2,c(7,17,18,19)]
m3- y[y$d==3,c(7,17,18,19)]

then perform the test as follows:

x1-wilcox.test(m0[m0$t1==1,1],m0[m0$t1==3,1],correct=FALSE, exact=FALSE,
conf.int=TRUE,alternative = c(g))   #ABC   ID
x2-  wilcox.test(m1[m1$t1==1,1],m1[m1$t1==3,1],correct=FALSE, exact=FALSE,
conf.int=TRUE,alternative = c(g))
 x3-  wilcox.test(m2[m2$t1==1,1],m2[m2$t1==3,1],correct=FALSE, exact=FALSE,
conf.int=TRUE,alternative = c(g))
x4- wilcox.test(m3[m3$t1==1,1],m3[m3$t1==3,1],correct=FALSE, exact=FALSE,
conf.int=TRUE,alternative = c(g))

each of these tests will create an object, say x and then I extract the
value statistics using
x$statistics.

How to automatize this?
Thank you for any help you can provide.
Francesca

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Download statistics for a package

2011-09-28 Thread Uwe Ligges




On 28.09.2011 16:10, Ravi Varadhan wrote:

Hi,

How can I get information on how many times a particular package has been 
downloaded from CRAN?


You cannot since I guess not even all mirrors will have logs and there 
is no aggregation of mirror logs happening nor is it clear if we can use 
mirror logs at all legally in all countries.


Uwe





Thanks,
Ravi.
---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins 
University

Ph. (410) 502-2619
email: rvarad...@jhmi.edumailto:rvarad...@jhmi.edu


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to get old packages to work on R 2.12.1

2011-09-28 Thread Joseph Boyer

Many thanks for all the suggestions.

For me, the update.packages command will not work behind the firewall at my 
workplace.

But the respondents have given me more than enough suggestions to work with.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] dump.frames, debugger and the ... argument

2011-09-28 Thread Duncan Murdoch


On 28/09/2011 9:59 AM, Jannis wrote:

Dear List,

i run R non interactively and use option(error = dump.frames) to reconstruct the cause of 
errors for debugging. One of the functions in the call stack however uses the 
... (dot dot dot) argument. When I run debugger() and try to browse to this 
functions' environment I get the following error message:

Error in get(.obj, envir = dump[[.selection]]) :
   argument ... is missing, with no default


Some googleing showed that this seems to be a more common problem but I was not 
able to find any solution.

Any ideas?



Well, we're just over a month from a new release.  Post a reproducible 
example and this will likely get fixed.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Question concerning Box.test

2011-09-28 Thread R. Michael Weylandt

Well, you could to the following to throw out the NAs on a column basis
before being passed to Box.test

apply(P, 2, function(x) Box.test(na.omit(x))$p.value)

and that probably will work for you. This will only throw out the NAs in
each column and won't cancel across columns. However, all the standard
points about throwing away data apply: specifically, you should ask yourself
why you have NA returns on a certain day -- perhaps that is itself deeply
significant -- and make sure the pattern is random (enough).

Hope this helps,

Michael Weylandt



On Wed, Sep 28, 2011 at 5:11 AM, Samir Benzerfa benze...@gmx.ch wrote:

 Many many thanks! My code looked like this: *apply(P,2,
 FUN=Box.test(P)$p.value)*

 So the problem was that I didnt include function(x) before
 Box.test(x)$p.value and that I used my object P in the brackets instead of
 x. It works fine now.

 ** **

 I have one last question regarding this problem: The help file for Box.test
 states that Missing values are not handled. Any idea what that technically
 means? Does it mean that NAs are included in the calculations or does R
 just skip missing values? Since I have many NA Values in my data I would
 like R to exclude these values temporarily for the calculations.
 Specifically Id like R to consider each column without any NAs before
 performing the Box.test for that column and then pass to the next one, such
 that I get the test results for each column as if there where no NAs in it.
 I tried to do this with *na.rm=T or na.action= *as an argument for the
 function apply, but R refuses (error message: *error in FUN(); unused
 argument na.rm=T*). I know that I could clean my data from NAs by using
 na.omit or na.exclude, but this cancels all rows which include an NA and
 since I have about 3000 columns (stocks) and each one includes several
 different NAs I would end up with very few data. Any hints for this issue?
 

 ** **

 Best, S.B.

 ** **

 ** **

 *Von:* R. Michael Weylandt [mailto:michael.weyla...@gmail.com]
 *Gesendet:* Mittwoch, 28. September 2011 00:39

 *An:* Samir Benzerfa; r-help
 *Betreff:* Re: [R] Question concerning Box.test

 ** **

 Plaintext data looks like this:

 P - structure(list(X77.BANK = c(0, 0, 0, 0.003181659, -0.006386799,
 0.028028724, -0.015347692, -0.015910002, 0.00322897, -0.013062473,
 0, -0.03809005, 0.021189299, -0.003460532, -0.010550182, 0.01744389,
 0.010139631, -0.01709, 0.010299957, 0), A...A.MATERIAL = c(0,
 0, 0, -0.008194479, -0.008352074, 0.008352074, 0.004116566, 0.01608682,
 0.019305155, -0.011479818, 0, -0.011791525, -0.00804272, -0.008194479,
 -0.012589127, 0.016705694, 0, 0.012120633, 0.023271342, -0.007619397
 )), .Names = c(X77.BANK, A...A.MATERIAL), row.names = c(NA,
 20L), class = data.frame)

 Can you provide your test code exactly? The following both work for me:

 R apply(P,2,Box.test)
 $X77.BANK

 Box-Pierce test

 data:  newX[, i]
 X-squared = 3.1825, df = 1, p-value = 0.07443


 $A...A.MATERIAL

 Box-Pierce test

 data:  newX[, i]
 X-squared = 0.3258, df = 1, p-value = 0.5682

 R apply(P,2,function(x) Box.test(x)$p.value)
 X77.BANK A...A.MATERIAL
 0.07443097 0.56815070

 Michael

 

 On Tue, Sep 27, 2011 at 5:42 PM, Samir Benzerfa benze...@gmx.ch wrote:**
 **

 Please find the plain text output in the attachment.

  

 Yes, Im sorry! I apologize for the confusion of rows and columns.
 Actually, I want to perform the test for each column and not row.

  

 *Von:* R. Michael Weylandt [mailto:michael.weyla...@gmail.com]
 *Gesendet:* Dienstag, 27. September 2011 17:51
 *An:* Samir Benzerfa; r-help


 *Betreff:* Re: [R] Question concerning Box.test

  

 Send this again using dput() to give a plain text output and I'll look at
 it.

 Also, I think you should probably look into the difference between a row
 and a column.

 Michael

 On Tue, Sep 27, 2011 at 11:48 AM, Samir Benzerfa benze...@gmx.ch wrote:*
 ***

 Many thanks for your hint. I tried regular apply now. However, it still
 doesnt work. Function apply works fine with other regular functions like
 sum or mean. But for the function Box.test(x,) it gives me the following
 error message: 

  

 *Error in Box.test() : *

 *  x is not a vector or univariate time series*

 * *

 For simplicity, I tried to do the test with a simple 2x20 Matrix for 2
 stocks (see below), but it still does not work. It works well if I do the
 test individually for each row à Box.test(x[,1],) and Box.test(x[,2],)**
 **

  

BANK.ABC   ABC.MATERIAL

 1  0.00.0

 2  0.00.0

 3  0.00.0

 4  0.003181659   -0.008194479

 5 -0.006386799   -0.008352074

 6  0.0280287240.008352074

 7 -0.0153476920.004116566

 8 -0.0159100020.016086820

 9  0.0032289700.019305155

 10 -0.013062473   -0.011479818

 11  0.0

Re: [R] inset one map on top of another map

2011-09-28 Thread Jean V Adams

Thank you, Ray!  That worked.

Thank you also, Yihui.  I tried your subplot() suggestion, but couldn't 
get it to work with a map() on a map()

Jean


Ray Brownrigg ray.brownr...@ecs.vuw.ac.nz wrote on 09/27/2011 07:26:34 
PM:
 
 On Wed, 28 Sep 2011, Jean V Adams wrote:
  I want to overlay a small inset map on top of another map, but I can't
  figure out how to do it.
  For example, here are two different maps:
  
  # map 1 - Ohio
  map(state, region= ohio)
  
  # map 2 - US with Ohio darkened
  map(state)
  map(state, region=ohio, fill=T, add=T)
  
  I would like to add map 2 as a small inset in the corner of map 1.
  I have tried:
  
  map(state, region= ohio)
  par(new=TRUE, mar=c(3, 3, 15, 15))
  map(state)
  map(state, region=ohio, fill=T, add=T)
  
  but this seems to erase map 1 and replace it with a full size version 
of
  map 2.
  
  I can successfully overlay an unrelated plot using similar code:
  
  map(state, region= ohio)
  par(new=TRUE, mar=c(3, 3, 15, 15))
  plot(1:10, 1:10)
  
  So, there must be something about the maps() function that I'm 
tripping
  over.
 
 I think the problem is that map() 'insists' (in some sense) on a 
 clean frame so it can get 
 the aspect ratio right.
 
  
  Any suggestions?
  
 How about something like:
 map(state, region= ohio, xlim=c(-85, -80), ylim=c(38, 42))
 par(usr=c(-216, -66, 24, 144))  # you should be able to 
 'automate' this calculation
 map(state, add=T)
 map(state, region=ohio, fill=T, add=T)
 
 HTH
 Ray Brownrigg
 
  I am using R for Windows 2.13.0
  and the maps package version 2.1-5.
  
  Jean
  
  
  `ï¿½.,,  (((ï¿½   `ï¿½.,,  (((ï¿½   `ï¿½.,,  (((ï¿½
  
  Jean V. Adams
  Statistician
  U.S. Geological Survey
  Great Lakes Science Center
  223 East Steinfest Road
  Antigo, WI 54409  USA
 
 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] dump.frames, debugger and the ... argument

2011-09-28 Thread Jannis

Well, its 2.10, a slightly old version. I know that I am advised to upgrade to 
the current version. Unfortunately the installed version on the machine I am 
using is out of my reach. The reproducible example would be:

dummyFunct = function(a, ...) {
  b = 2* a
  c = d   # should cause error
  return(b)
}

options(error = quote(dump.frames(dumpto = last.dump, to.file = TRUE)))

dummyFunct(1)
load(last.dump.rda)
debugger(last.dump)
# selection of call 1 causes:

# Error in get(.obj, envir = dump[[.selection]]) : 
#  argument ... is missing, with no default




R version 2.10.1 (2009-12-14) 
x86_64-unknown-linux-gnu

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_2.10.1



--- Prof Brian Ripley rip...@stats.ox.ac.uk schrieb am Mi, 28.9.2011:

 Von: Prof Brian Ripley rip...@stats.ox.ac.uk
 Betreff: Re: [R] dump.frames, debugger and the ... argument
 An: Duncan Murdoch murdoch.dun...@gmail.com
 CC: Jannis bt_jan...@yahoo.de
 Datum: Mittwoch, 28. September, 2011 15:40 Uhr
 On Wed, 28 Sep 2011, Duncan Murdoch
 wrote:
 
  On 28/09/2011 9:59 AM, Jannis wrote:
  Dear List,
  
  i run R non interactively and use option(error =
 dump.frames) to reconstruct the cause of errors for
 debugging. One of the functions in the call stack however
 uses the ... (dot dot dot) argument. When I run debugger()
 and try to browse to this functions' environment I get the
 following error message:
  
  Error in get(.obj, envir = dump[[.selection]]) :
     argument ... is missing, with no
 default
  
  
  Some googleing showed that this seems to be a more
 common problem but I was not able to find any solution.
  
  Any ideas?
  
  
  Well, we're just over a month from a new
 release.  Post a reproducible example and this will
 likely get fixed.
 
 Looks like this is exactly problem fixed in
 
 r54480 | ripley | 2011-02-18 13:42:03 + (Fri, 18 Feb
 2011) | 1 line
 
 use tryCatch to ensure that what is there can be examined
 
 Just what version of R is this?
 
 
  
  Duncan Murdoch
  
  __
  R-help@r-project.org
 mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained,
 reproducible code.
  
 
 -- Brian D. Ripley,         
         rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford,         
    Tel:  +44 1865 272861 (self)
 1 South Parks Road,         
            +44 1865
 272866 (PA)
 Oxford OX1 3TG, UK           
     Fax:  +44 1865 272595


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] removing outliers in non-normal distributions

2011-09-28 Thread Ben qant

Hello,

I'm seeking ideas on how to remove outliers from a non-normal distribution
predictor variable. We wish to reset points deemed outliers to a truncated
value that is less extreme. (I've seen many posts requesting outlier removal
systems. It seems like most of the replies center around why do you want to
remove them, you shouldn't remove them, it depends, etc. so I've tried
to add a lot of notes below in an attempt to answer these questions in
advance.)

Currently we Winsorize using the quantile function to get the new high and
low values to set the outliers to on the high end and low end (this is
summarized legacy code that I am revisiting):

#Get the truncated values for resetting:
lowq = quantile(dat,probs=perc_low,na.rm=TRUE)
hiq = quantile(dat,probs=perc_hi,na.rm=TRUE)

#resetting the highest and lowest values with the truncated values:
dat[lowqdat] = lowq
dat[hiqdat] = hiq

What I don't like about this is that it always truncates values (whether
they truly are outliers or not) and the perc_low and perc_hi settings are
arbitrary. I'd like to be more intelligent about it.

Notes:
1) Ranking has already been explored and is not an option at this time.
2) Reminder: these factors are almost always distributed non-normally.
3) For reason I won't get into here, I have to do this pragmatically. I
can't manually inspect the data each time I remove outliers.
4) I will be removing outliers from candidate predictor variables.
Predictors variable distributions all look very different from each other,
so I can't make any generalizations about them.
5) As #4 above indicates, I am building and testing predictor variables for
use in a regression model.
6) The predictor variable outliers are usually somewhat informative, but
their extremeness is a result of the predictor variable calculation. I
think extremeness takes away from the information that would otherwise be
available (outlier effect). So I want to remove some, but not all, of their
extremeness. For example, percent change of a small number: from say 0.001
to 500. Yes, we want to know that it changed a lot, but 49,999,900% is not
helpful and masks otherwise useful information.

I'd like to hear your ideas. Thanks in advance!

Regards,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] fGarch - Fitting and APARCH-Modell with fixed delta

2011-09-28 Thread Bogey

Hi there,

I'm trying to fit a GJR-GARCH Model using fGarch. I wanted to try that by
fitting an APARCH model with a fixed delta of 2 and a non-fixed gamma. So I
was simply trying to use:

spec - garchFit(~aparch(1,1),data=garchSim(),delta=2)
coef(spec)

And sometimes, it's working like a charm and delta is indeed exactly 2 in
the resulting coefficient vector.
Frequently, though, the resulting delta coefficient has some other value,
usually somewhere between 0 and 2.

Any ideas why? Am I doing something wrong or could this be a bug in fGarch?

--
View this message in context: 
http://r.789695.n4.nabble.com/fGarch-Fitting-and-APARCH-Modell-with-fixed-delta-tp3850702p3850702.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] autocorrelation problem with cointegration

2011-09-28 Thread upani1982

Dear All,

I am looking for a cointegration relationship between Spot and Future Price
of commodites. The problem i am facing follows:

1. After estimating by Engle-Grranger Method, i found that the residuals are
stationary at their level I (o), which is required to fulfill the
cointegration test. But the autocorrelation problem arises, as DW statistics
is signficantly low 0.50-0.88 for various commodities. My question is shall
i go ahead with the results or not. 

2. When i use Johansens Method i found at least one cointegrtion relation.
But i am confused with lag selection criteria. I use VAR to select the
lagselection criteria. But there is autocorrelation problem with the lags it
is providing for AIC. Whether i should take first difference of the price
level to estimate the VAR, then how to use the same lag selection criteria,
when i am using price series in levels to estimate the cointegration by
johansen method. 

Looking forward for your help

With sincere regards,
Upananda

--
View this message in context: 
http://r.789695.n4.nabble.com/autocorrelation-problem-with-cointegration-tp3851336p3851336.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] GAMs in R : How to put the new data into the model?

2011-09-28 Thread pigpigmeow

For example: 
GAMs and after stepwise regression:
cod-gam(newCO~RH+s(solar,bs=cr)+windspeed+s(transport,bs=cr),family=gaussian
(link=log),groupD,methods=REML)

I used 10 year meterorology data (2000-2010) to form equation of
concentration of carbon monoxide.
NOW, I have 2011 meteorology data, I want to use the above GAMs to get the
predict value of concentration of carbon monoxide.

How can I put the 2011 data into this gam and get the expected value of
concentration of Carbon monoxide??


--
View this message in context: 
http://r.789695.n4.nabble.com/GAMs-in-R-How-to-put-the-new-data-into-the-model-tp3849851p3850473.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] cointegration test

2011-09-28 Thread upananda pani

Dear All,

I am looking for a cointegration relationship between Spot and Future Price
of commodites. The problem i am facing follows:

1. After estimating by Engle-Grranger Method, i found that the residuals are
stationary at their level I (o), which is required to fulfill the
cointegration test. But the autocorrelation problem arises, as DW statistics
is signficantly low 0.50-0.88 for various commodities. My question is shall
i go ahead with the results or not.

2. When i use Johansens Method i found at least one cointegrtion relation.
But i am confused with lag selection criteria. I use VAR to select the
lagselection criteria. But there is autocorrelation problem with the lags it
is providing for AIC. Whether i should take first difference of the price
level to estimate the VAR, then how to use the same lag selection criteria,
when i am using price series in levels to estimate the cointegration by
johansen method.

Looking forward for your help

With sincere regards,
Upananda

-- 


You may delay, but time will not.


Research Scholar
alternative mail id: up...@iitkgp.ac.in
Department of HSS, IIT KGP
KGP

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] read.csv behaviour

2011-09-28 Thread Jim Lemon


On 09/28/2011 09:23 AM, Mehmet Suzen wrote:


This might be obvious but I was wondering if anyone knows quick and easy
way of writing out a CSV file with varying row lengths, ideally an
initial data read from a CSV file which has the same format. See example
below.


I found it quite strange that R cannot write it in one go, so one must
append blocks or post-process the file, is this true? (even Ruby can do
it!!)

Otherwise it puts ,, or similar for missing column values in the
shorter length rows and fill=FALSE option do not work!

I don't want to post-process if possible.

See this post:
http://r.789695.n4.nabble.com/Re-read-csv-trap-td3301924.html

Example that generated Error!

writeLines(c(A,B,C,D,
  1,a,b,c,
  2,f,g,c,
  3,a,i,j,
  4,a,b,c,
  5,d,e,f,
  6,g,h,i,j,k,l,m,n),
con=file(test.csv))

read.csv(test.csv)
try(read.csv(test.csv,fill=FALSE))


Hi Mehmet,
The example doesn't need to call file, writeLines does it for you. It 
worked for me:


 writeLines(c(A,B,C,D,
  1,a,b,c,
  2,f,g,c,
  3,a,i,j,
  4,a,b,c,
  5,d,e,f,
  6,g,h,i,j,k,l,m,n),
con=test.csv)

and to get the original object back, use:

readLines(test.csv)

The reason you can't use read.csv is that it returns a data frame, and 
that object can't have elements of unequal length. If you want an object 
with elements of unequal length, try:


as.list(readLines(test.csv))

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] generic argument passing to plot function

2011-09-28 Thread statquant2

Hello I am trying to write a function that would plot timeseries easily...
I aim at plotting two time-series on 2 different y axis sharing the same
x-axis

I succeded in doing so: 
plotTimeSerie(x,y,y2,[a lot of other args]){
...
plot()
axis.POSIXct(side=1) #here I build the x-axis 
points() #here I plot the first time serie related to y-axis
...
axis(side=2,[some args])
text(side=2,text=,[some args]) #here I build the y-left axis and annotate
with text
...
points() #here I plot the first time serie related to y-right axis
axis(side=4,[some args])
text(side=2,text=,[some args]) #here I build the y-right axis and annotate
with text

}

My problem is that I would like the user to be able to specify any
parameters he wishes for the axis/points functions.
How can I make plotTimeSerie handling any parameters and pass them into the
specified functions (points,axis...)?

--
View this message in context: 
http://r.789695.n4.nabble.com/generic-argument-passing-to-plot-function-tp3851599p3851599.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] number of items to replace is not a multiple of replacement length

2011-09-28 Thread dunner

Please help with this error message 

drugbook is an 885 x 32 dataframe

 names(drugbook)

[1] DRUG1  DRUG2  DRUG3  DRUG4  DRUG5 
 [6] DRUG6  DRUG7  DRUG8  DRUG9  DRUG10
[11] DRUG11 DRUG12 DRUG13 DRUG14 DRUG15
[16] DRUG16 DrugDose1  DrugDose2  DrugDose3  DrugDose4 
[21] DrugDose5  DrugDose6  DrugDose7  DrugDose8  DrugDose9 
[26] DrugDose10 DrugDose11 DrugDose12 DrugDose13 DrugDose14
[31] DrugDose15 DrugDose16

Equivs is a dataframe holding information for a certain subclass of drugs

 head(Equivs)
  Name  Equiv
1   Alprazolam   0.5
2   Bromazepam   5.5
3 Chlordiazepoxide  25.0
4 Clobazam  20.0
5   Clonazepam   0.5
6  Clorazepate  15.0

What I'm trying to is for each element of drugbook, get the value in
Equivs$Equiv, and multiply it by the corresponding drug dose, which will be
in the n+16 th  column on in the same row of drugbook.

Not all of the grabbed names from Drugbook will be in the dataframe Equivs,
though. So sometimes the == will return a null.

i.e. DrugDose16 is the dose of Drug 16

this I think should work: ( I did try apply and got knotted, so stopped)

mydosequivalents=list()
for( c in 1:16) {
  for (r in 1:885){
name-as.character(drugbook[r,c]);
equivalent-Equivs$Equiv[Equivs$Name==name];
dosequivalent-drugbook[r,c+16] * equivalent;
if (length(dosequivalent)0) mydosequivalents[r][c]-dosequivalent
else (mydosequivalents[r][c]-0)
  }
}

But it is throwing an error, and I can't figure out why. I remedied a
previous error by checking the length of the equivalent.

warnings() 
50: In mydosequivalents[c][r] - dosequivalent :
  number of items to replace is not a multiple of replacement length

mydosequivalents ends up being full of NULL and zeros


Thanks in advance for any help

Ross

--
View this message in context: 
http://r.789695.n4.nabble.com/number-of-items-to-replace-is-not-a-multiple-of-replacement-length-tp3851265p3851265.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Rle function to expand for many samples

2011-09-28 Thread sujitha

Dear R experts,
code:
m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','rep('numeric',150))
 s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]]))
 names(s)=c(Values,Probes)

#Suppose the test file looks like with ofcourse more samples with values:
Chr Start   End Sample1 Sample2 Sample3
chr29896633 9896683 0   0   0
chr29896639 9896690 0   0   0 
chr214314039143140980   -0.35   0 
chr214404467144045020   -0.35   0.32 
chr21442171814421777-0.43   -0.35   0.32 
chr21603171016031769-0.43   -0.35   0.32 
chr21603617816036237-0.43   -0.35   0.45 
chr21604866516048724-0.43   -0.35   0.45 
chr237491676374917350   0   0.45 
chr237702947377030090   0   0 
s
Values Probes
10.00  4
2   -0.43  4
30.00  2
40.00  2
5   -0.35  6
60.00  2
70.00  3
80.32  3
90.45  3
10   0.00  1
Here s is nothing but sumarising similiar values with Probes giving the
count of similiar values.
How to I expand rle function to consider 150 samples here I have just shown
for 3 samples other than manually expanding for the rest 147 samples?
I have the rest of the code for my purpose just stuck with this step only.

waiting for your reply,
Thanks,
Suji


--
View this message in context: 
http://r.789695.n4.nabble.com/Rle-function-to-expand-for-many-samples-tp3851676p3851676.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to Store the executed values in a dataframe rle function

2011-09-28 Thread jim holtman

Here one approach:

 x - read.table(textConnection(Chr start end sample1 sample2
+ chr2 9896633 9896683 0 0
+ chr2 9896639 9896690 0 0
+ chr2 14314039 14314098 0 -0.35
+ chr2 14404467 14404502 0 -0.35
+ chr2 14421718 14421777 -0.43 -0.35
+ chr2 16031710 16031769 -0.43 -0.35
+ chr2 16036178 16036237 -0.43 -0.35
+ chr2 16048665 16048724 -0.43 -0.35
+ chr2 37491676 37491735 0 0
+ chr2 37702947 37703009 0 0), header = TRUE, as.is = TRUE)
 closeAllConnections()

 result - lapply(c('sample1', 'sample2'), function(.samp){
+ # split by breaks in the values
+ .grps - split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
+
+ # combine the list of dataframes
+ .range - do.call(rbind, lapply(.grps, function(.set){
+ # create a dataframe of the results
+ data.frame(Sample = .samp
+, Chr = .set$Chr[1L]
+, Start = min(.set$start)
+, End = max(.set$end)
+, Values = .set[[.samp]][1L]
+, Probes = nrow(.set)
+)
+ }))
+ })
 # put the list of dataframes together
 result - do.call(rbind, result)
 result
Sample  ChrStart  End Values Probes
0  sample1 chr2  9896633 14404502   0.00  4
1  sample1 chr2 14421718 16048724  -0.43  4
2  sample1 chr2 37491676 37703009   0.00  2
01 sample2 chr2  9896633  9896690   0.00  2
11 sample2 chr2 14314039 16048724  -0.35  6
21 sample2 chr2 37491676 37703009   0.00  2



On Mon, Sep 26, 2011 at 10:30 AM, sujitha virith...@gmail.com wrote:
 Hi group,

 This is how my test file looks like:
 Chr start end sample1 sample2
 chr2 9896633 9896683 0 0
 chr2 9896639 9896690 0 0
 chr2 14314039 14314098 0 -0.35
 chr2 14404467 14404502 0 -0.35
 chr2 14421718 14421777 -0.43 -0.35
 chr2 16031710 16031769 -0.43 -0.35
 chr2 16036178 16036237 -0.43 -0.35
 chr2 16048665 16048724 -0.43 -0.35
 chr2 37491676 37491735 0 0
 chr2 37702947 37703009 0 0

 This is the output that I am expecting:
 Sample Chr Start End Values Probes
 sample1 chr2 9896633 14404502 0 4
 sample1 chr2 14421718 16048724 -0.43 4
 sample1 chr2 37491676 37703001 0 2
 sample2 chr2 9896633 9896690 0 2
 sample2  chr2 14314039 16048724 -0.35 6
 sample2 chr2 37491676 37703009 0 2

 Here the Chr value is same but can be any other value aswell so unique among
 the similar values. The Start for the first line would be the least value
 until values are similiar (4) then the end would be highest value. The
 values is the unique value among the common values and probes is number of
 similar values.

 Code:
m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric'))
 #reading the test file
s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]]))
 # to get the last 2 columns
 names(s)=c(Values,Probes)
G=1
 for(i in 1:length(s$Probes)){
 + if(G==1){first-unique(m$Chr[G:s$Probes[i]])
 + second-min(m$Start[G:s$Probes[i]])
 + third-max(m$End[G:s$Probes[i]])
 + c-cbind(first,second,third,s$Values[i],s$Probes[i])
 + print (c)
 + G=(G+s$Probes[i])}
 + else if((G-1)  length(m$Sample1)) {
 + first-unique(m$Chr[G:(G+s$Probes[i]-1)])
 + second-min(m$Start[G:(G+s$Probes[i]-1)])
 + third-max(m$End[G:(G+s$Probes[i]-1)])
 + c-cbind(first,second,third,s$Values[i],s$Probes[i])
 + print (c)
 + G=(G+s$Probes[i])}
 + else {
 + G=1
 + first-unique(m$Chr[G:s$Probes[i]])
 + second-min(m$Start[G:s$Probes[i]])
 + third-max(m$End[G:s$Probes[i]])
 + c-cbind(first,second,third,s$Values[i],s$Probes[i])
 + print (c)
 + G=(G+s$Probes[i])}
 + }
 so the output is:
     first  second    third
 [1,] chr2 9896633 14404502 0 4
     first  second     third
 [1,] chr2 14421718 16048724 -0.43 4
     first  second     third
 [1,] chr2 37491676 37703009 0 2
     first  second    third
 [1,] chr2 9896633 9896690 0 2
     first  second     third
 [1,] chr2 14314039 16048724 -0.35 6
     first  second     third
 [1,] chr2 37491676 37703009 0 2

 I get almost the required output but just need 3 modifications to this code:
 1) Since this is just a small part of the file (with 2 samples), but my
 actual file has 150 samples, so how do I write rle function for that?
 2) How do I store all the executed c values as a dataframe (here I am just
 printing the values)?
 3) How do I include sample name in execution?
 Waiting for your reply ,
 Thanks,
 Suji


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-Store-the-executed-values-in-a-dataframe-rle-function-tp3843944p3843944.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

[R] Handling missing values

2011-09-28 Thread Samir Benzerfa

Hi everyone,

 

I've got a question concerning missing values in R.

 

I'm looking for an R code which removes all NA values in my data.frame. I
found many answers about how to remove rows or columns which contain NA's.
My problem however is a bit different. I have a list of vectors. Each vector
contains some NA Values that I want to remove. So, at the end all vectors
will have different dimensions. My final goal is to apply a function(x) to
all of the vectors. Any hints how to do this? Should I change the
class=data.frame of my list of vectors? Below I made a simple example how my
table looks like and how my result should look like.

 

 

My table:

V1  V2

1 1.2  2.1

2 1.3  2.2

3 NA 4.5

4 NA 1.5

5 1.9  NA

 

My intended result:

V1  V2

1.2  2.1

1.3  2.2

1.9  4.5

1.5

 

 

Many thanks in advance  best regards

S.B.

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] using the system command

2011-09-28 Thread Data Analytics Corp.


Hi,

I started playing around with a function for using StatTransfer (version 
10) for importing data.  This started as a simple task but it's not 
working and so now I'm very frustrated.  I'm using R version 2.13 on 
Windows 7.


The function, called fn.importData,  is:

function(file = NULL, type = NULL){
##
##  create statTransfer command file - stcmd
##
ext - switch(type, sas = sas7bdat, excel = xls)
tmp - paste(copy C:\\Temp\\, file, ., ext,  r 
c:\\temp\\, file, .rdata -T  , file,  \n\nquit, sep = )

cat(tmp, file = c:/temp/transfer.stcmd, append = FALSE)
##
##  transfer using StatTransfer
##
system(paste('c:\\program files\\statTransfer10\\st.exe', 
'c:\\temp\\transfer.stcmd'), wait = FALSE)

##
##  load
##
inpt - paste(c:\\temp\\, file, .RData, sep = )
##return(inpt)
load(inpt, .GlobalEnv)
   }

My call using a sas file (vicks.sasa7bdat) is:

fn.importData(file = vicks, type = sas)

The StatTransfer command file is created and StatTransfer is called 
correctly.  The translation from vicks.sas7bdat  to vicks.RData occurs 
as it should - I can look at them in the temp directory.  But I get an 
error message for the load function:


Error in readChar(con, 5L, useBytes = TRUE) : cannot open the 
connection

In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file 'c:\temp\vicks.RData', probable 
reason 'No such file or directory'


If I uncomment the return(inpt) statement in the above function and use 
the following at the command line:


load(fn.importData(file = vicks, type = sas))

then everything works fine.  But of course I don't want to do this.  Any 
suggestions as to what I'm doing wrong?  Why the error message?


Also, the StatTransfer program sometimes fails to close so if I 
repeatedly call my function I'll have multiple occurrences of 
StatTransfer running.  How can I force it to close after the transfer?


Thanks,

Walt



Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536

(V) 609-936-8999
(F) 609-936-3733
w...@dataanalyticscorp.com
www.dataanalyticscorp.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fGarch - Fitting and APARCH-Modell with fixed delta

2011-09-28 Thread Jean-Christophe BOUËTTÉ

Hi there,
an answer from someone who's not an expert in the field but used to
play around with time series:
The action of simulating a process using with input parameters then
estimating the parameters is not an invariant, especially when the
process involves nonlinearities.

Did you try increasing simulation length and the burn-in to see if you
get more stable results ?

2011/9/28 Bogey a5462...@nepwk.com:
Hi there,

I'm trying to fit a GJR-GARCH Model using fGarch. I wanted to try that by
fitting an APARCH model with a fixed delta of 2 and a non-fixed gamma. So I
was simply trying to use:

spec - garchFit(~aparch(1,1),data=garchSim(),delta=2)
coef(spec)

And sometimes, it's working like a charm and delta is indeed exactly 2 in
the resulting coefficient vector.
Frequently, though, the resulting delta coefficient has some other value,
usually somewhere between 0 and 2.

Any ideas why? Am I doing something wrong or could this be a bug in fGarch?

--
View this message in context:
http://r.789695.n4.nabble.com/fGarch-Fitting-and-APARCH-Modell-with-fixed-delta-tp3850702p3850702.html
Sent from the R help mailing list archive at Nabble.com.

Re: [R] removing outliers in non-normal distributions

2011-09-28 Thread Christian Hennig


Dear Ben,

not specifically an R-related response, but the best philosophy of 
defining outliers, as far as I'm concerned, you'll find  in Davies 
and Gather, The identification of multiple outliers, JASA 1993.


The idea is that you can only properly define what an outlier is relative 
to a reference distributional shape. Note that the reference 
distributional shape is *not* what you believe the underlying distribution 
is, but rather a device to define outliers as points that clearly exceed 
the extremes that are normally to be expected under the reference shape.


If your reference distributional shape is the normal, you need to set up a
robust outlier identification rule that has a low probability to find 
*any* outlier if the data rally come from a normal. Basically declaring 
everything outside median +/- c*MAD will work (Hampel identifier), but c 
needs to depend on 
the size of the dataset in order to calibrate it so that for example under 
the normal the probability  not to find any outlier is 0.05 (or whatever 
you want; note that it is always left to the user and to some extent 
arbitrary to specify a borderline). Some values for c are given in 
Davies and Gather.
There are some slightly more efficient alternatives but the Hampel 
identifier is simple and still good.


The same principle can be applied (although this is not done in the cited 
paper) if your reference distribution is different. This may for example 
make sense if you have skew data and you want a skew outlier 
identification rule. It should be more or less straightforward to adapt 
the ideas to  a lognormal reference distribution. For others, I'm not sure 
if literature exists but maybe. A lot has happened since 1993.


Hope this helps,
Christian

On Wed, 28 Sep 2011, Ben qant wrote:


Hello,

I'm seeking ideas on how to remove outliers from a non-normal distribution
predictor variable. We wish to reset points deemed outliers to a truncated
value that is less extreme. (I've seen many posts requesting outlier removal
systems. It seems like most of the replies center around why do you want to
remove them, you shouldn't remove them, it depends, etc. so I've tried
to add a lot of notes below in an attempt to answer these questions in
advance.)

Currently we Winsorize using the quantile function to get the new high and
low values to set the outliers to on the high end and low end (this is
summarized legacy code that I am revisiting):

#Get the truncated values for resetting:
lowq = quantile(dat,probs=perc_low,na.rm=TRUE)
hiq = quantile(dat,probs=perc_hi,na.rm=TRUE)

#resetting the highest and lowest values with the truncated values:
dat[lowqdat] = lowq
dat[hiqdat] = hiq

What I don't like about this is that it always truncates values (whether
they truly are outliers or not) and the perc_low and perc_hi settings are
arbitrary. I'd like to be more intelligent about it.

Notes:
1) Ranking has already been explored and is not an option at this time.
2) Reminder: these factors are almost always distributed non-normally.
3) For reason I won't get into here, I have to do this pragmatically. I
can't manually inspect the data each time I remove outliers.
4) I will be removing outliers from candidate predictor variables.
Predictors variable distributions all look very different from each other,
so I can't make any generalizations about them.
5) As #4 above indicates, I am building and testing predictor variables for
use in a regression model.
6) The predictor variable outliers are usually somewhat informative, but
their extremeness is a result of the predictor variable calculation. I
think extremeness takes away from the information that would otherwise be
available (outlier effect). So I want to remove some, but not all, of their
extremeness. For example, percent change of a small number: from say 0.001
to 500. Yes, we want to know that it changed a lot, but 49,999,900% is not
helpful and masks otherwise useful information.

I'd like to hear your ideas. Thanks in advance!

Regards,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to get old packages to work on R 2.12.1

2011-09-28 Thread Richard M. Heiberger

Joe,

Most firewall issues can be resolved by entering


setInternet2(use = TRUE)


prior to the update.packges call.

Rich


On Wed, Sep 28, 2011 at 11:23 AM, Joseph Boyer joseph.g.bo...@gsk.comwrote:

 Many thanks for all the suggestions.

 For me, the update.packages command will not work behind the firewall at my
 workplace.

 But the respondents have given me more than enough suggestions to work
 with.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] using the system command

2011-09-28 Thread William Dunlap

system(wait=FALSE, command) means to return to R without
waiting for the command to finish.  E.g., on Linux with
the following function

  f - function (seconds, wait) {
  tf - tempfile()
  on.exit(unlink(tf))
  system(sprintf((sleep %d ; date)  '%s', seconds, tf),
  wait = wait)
  readLines(tf)
  }

we get

   f(3, wait=FALSE)
  character(0)
   f(3, wait=TRUE)
  [1] Wed Sep 28 10:23:33 PDT 2011

In 1 calls to f(seconds=0, wait=FALSE), 9915 resulted in character(0),
meaning that the shell's processing of  file had created the file
but date had not yet put any text into it, 81 resuled in an error
from readLines that the file did not not exist yet, and 4 succeeded
in reading the date from the file.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 -Original Message-
 From: Data Analytics Corp. [mailto:w...@dataanalyticscorp.com]
 Sent: Wednesday, September 28, 2011 10:19 AM
 To: William Dunlap
 Subject: Re: [R] using the system command
 
 Hi William,
 
 When I use wait = TRUE I get a lot of intermediate commands from
 StatTransfer.  For example, if the files it's transferring to already
 exist, it shows repeated messages asking if I want to overwrite.  wait =
 FALSE seems to avoid this.
 
 Thanks,
 
 Walt
 
 
 
 Walter R. Paczkowski, Ph.D.
 Data Analytics Corp.
 44 Hamilton Lane
 Plainsboro, NJ 08536
 
 (V) 609-936-8999
 (F) 609-936-3733
 w...@dataanalyticscorp.com
 www.dataanalyticscorp.com
 _
 
 On 9/28/2011 1:09 PM, William Dunlap wrote:
  Why do you use wait=FALSE in the call to system()?
  Do things work differently without it?
 
  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] 
  On Behalf Of Data
 Analytics
  Corp.
  Sent: Wednesday, September 28, 2011 9:05 AM
  To: r-help@r-project.org
  Subject: [R] using the system command
 
  Hi,
 
  I started playing around with a function for using StatTransfer (version
  10) for importing data.  This started as a simple task but it's not
  working and so now I'm very frustrated.  I'm using R version 2.13 on
  Windows 7.
 
  The function, called fn.importData,  is:
 
function(file = NULL, type = NULL){
##
##  create statTransfer command file - stcmd
##
ext- switch(type, sas = sas7bdat, excel = xls)
tmp- paste(copy C:\\Temp\\, file, ., ext,  r
  c:\\temp\\, file, .rdata -T  , file,  \n\nquit, sep = )
cat(tmp, file = c:/temp/transfer.stcmd, append = FALSE)
##
##  transfer using StatTransfer
##
system(paste('c:\\program files\\statTransfer10\\st.exe',
  'c:\\temp\\transfer.stcmd'), wait = FALSE)
##
##  load
##
inpt- paste(c:\\temp\\, file, .RData, sep = )
##return(inpt)
load(inpt, .GlobalEnv)
   }
 
  My call using a sas file (vicks.sasa7bdat) is:
 
fn.importData(file = vicks, type = sas)
 
  The StatTransfer command file is created and StatTransfer is called
  correctly.  The translation from vicks.sas7bdat  to vicks.RData occurs
  as it should - I can look at them in the temp directory.  But I get an
  error message for the load function:
 
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the
  connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file 'c:\temp\vicks.RData', probable
  reason 'No such file or directory'
 
  If I uncomment the return(inpt) statement in the above function and use
  the following at the command line:
 
load(fn.importData(file = vicks, type = sas))
 
  then everything works fine.  But of course I don't want to do this.  Any
  suggestions as to what I'm doing wrong?  Why the error message?
 
  Also, the StatTransfer program sometimes fails to close so if I
  repeatedly call my function I'll have multiple occurrences of
  StatTransfer running.  How can I force it to close after the transfer?
 
  Thanks,
 
  Walt
 
  
 
  Walter R. Paczkowski, Ph.D.
  Data Analytics Corp.
  44 Hamilton Lane
  Plainsboro, NJ 08536
  
  (V) 609-936-8999
  (F) 609-936-3733
  w...@dataanalyticscorp.com
  www.dataanalyticscorp.com
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the

Re: [R] Handling missing values

2011-09-28 Thread David Winsemius



On Sep 28, 2011, at 11:49 AM, Samir Benzerfa wrote:


Hi everyone,



I've got a question concerning missing values in R.



I'm looking for an R code which removes all NA values in my  
data.frame. I
found many answers about how to remove rows or columns which contain  
NA's.
My problem however is a bit different. I have a list of vectors.  
Each vector
contains some NA Values that I want to remove. So, at the end all  
vectors

will have different dimensions.


Then it will no longer be a data.frame.


My final goal is to apply a function(x) to
all of the vectors.


If the function has an na.rm= argument tha tyou cna set to TRUE, it  
may not be necessary to do all of this.



Any hints how to do this? Should I change the
class=data.frame of my list of vectors? Below I made a simple  
example how my

table looks like and how my result should look like.



lis.tbl - lapply(tbl, function(x) x[!is,na(x)] )

You can then use lapply to get your results from your function:

yourres - lapply(lis.tbl, yourfunc)





My table:

   V1  V2

1 1.2  2.1

2 1.3  2.2

3 NA 4.5

4 NA 1.5

5 1.9  NA



My intended result:

   V1  V2

   1.2  2.1

   1.3  2.2

   1.9  4.5

   1.5





Many thanks in advance  best regards

S.B.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Password protected R Repository

2011-09-28 Thread Stefan Petersson

 
 
  The helpfiles for 'download.file' was not that helpful. But maybe it's just 
me
  not being able to read them correctly.
 
 Yes, looks like this is the case.
 
  I tried to call install.packages with the 'method=wget', and hoped for a
  username and password dialog. But no luck.
 
 The help page says if proper values are stored in the configuration 
 file for wget, so why do you expect a dialog?
 
 Best,
 Uwe Ligges
 

Well, I expect a dialog because when I use (for example) ncftp without a conf 
file, I get a dialog asking me for site, usr and pwd. For me, not being a black 
belt R user, it's not so strange expecting a dialog of some kind when a usr/pwd 
is required. But maybe that's just me... 

And I misunderstood the helpfile. I thought it referred to proxy servers alone, 
and that it didn't concern Apache Authentication. My error. Thanks for clearing 
that up.

My misunderstanding - Method ‘wget’ can be used with proxy firewalls which 
require user/password authentication if proper values are stored in the 
configuration file for ‘wget’.

So, I added 'http_user=usr' and 'http_passwd=pwd' to my /etc/wgetrc and run:

install.packages(repos='http://www.myServer.se/myRepo/', method='wget', 
pkgs='myLib', dep=TRUE)

Bob is my uncle!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] compute probabilities on a Bayesian Network (SOLVED)

2011-09-28 Thread Marcio Pupin Mello

I studied a little (I mean a lot) and found out a solution. Then I 
decided to post here aiming to help people who are interested...
The solution uses the gRain package... but because one dependence (the 
graph package) has been removed from the CRAN repository you have to 
use the follow script to install gRain package (I tested it on Windows 7):


#run to install gRain and graph packages
chooseBioCmirror() #then choose the nearest one
setRepositories() #then use Ctrl key to select all
install.packages(c(graph,gRain),dependencies=TRUE))

Then you can run the code to build the Bayesian Network structure of the 
figure in http://www.dsr.inpe.br/~mello/1727/BNgrapmodel.png


#run to build the Bayesian Network shown in Figure
library(gRain)
v.L-cptable(~L,values=c(0.35,(1-0.35)),levels=c(T,F),normalize=T,smooth=0)
v.R-cptable(~R,values=c(0.3,(1-0.3)),levels=c(T,F),normalize=T,smooth=0)
v.H-cptable(~H|D,values=c(0.75,(1-0.75),0.2,(1-0.2)),levels=c(T,F),normalize=T,smooth=0)
v.D-cptable(~D,values=c(0.05,(1-0.05)),levels=c(T,F),normalize=T,smooth=0)
v.C-cptable(~C|S,values=c(0.9,(1-0.9),0.12,(1-0.12)),levels=c(T,F),normalize=T,smooth=0)
v.S-cptable(~S|D:H:R:L,values=c(0.60,(1-0.60),0.65,(1-0.65),0.62,(1-0.62),0.67,(1-0.67),0.61,(1-0.61),0.64,(1-0.64),0.61,(1-0.61),0.70,(1-0.70),0.05,(1-0.05),0.09,(1-0.09),0.07,(1-0.07),0.10,(1-0.10),0.06,(1-0.06),0.08,(1-0.08),0.07,(1-0.07),0.12,(1-0.12)),levels=c(T,F),normalize=T,smooth=0)
CPT-compileCPT(list(v.L,v.R,v.H,v.D,v.C,v.S))
BN-grain(CPT,smooth=0)

#answering the 1st question:
# what is the probability of S=T given C=T, L=T, R=F, H=F and D=F?
scenaria.q1-list(nodes=c(C,L,R,H,D),states=c(T,T,F,F,F))
querygrain(setFinding(BN,nodes=scenaria.q1$nodes,states=scenaria.q1$states),nodes=S)

#answering the 2nd question:
# what is the probability of S=T given C=F, L=F, R=T, H=T, and D=T?
scenaria.q2-list(nodes=c(C,L,R,H,D),states=c(F,F,T,T,T))
querygrain(setFinding(BN,nodes=scenaria.q2$nodes,states=scenaria.q2$states),nodes=S)

#answering the 3rd question:
# what is the probability of what is the probability of S=T when my only 
one evidence is that C=T?

scenaria.q3-list(nodes=c(C),states=c(T))
querygrain(setFinding(BN,nodes=scenaria.q3$nodes,states=scenaria.q3$states),nodes=S)



On 26/09/2011 18:42, Marcio Pupin Mello wrote:

Deal R Users,
I'm trying to find out how can I compute probabilities on a Bayesian
Network using R. The Bayesian Network I modelled is shown at
http://www.dsr.inpe.br/~mello/1727/BNgrapmodel.png and I'd like to know
how can I proceed (commands on R) to answer questions like: (1) what is
the probability of S=T given C=T, L=T, R=F, H=F and D=F; or (2) what is
the probability of S=T given C=F, L=F, R=T, H=T, and D=T; or even (3)
what is the probability of S=T when my only one evidence is that C=T
(i.e I don't know the other variables values). I used Microsoft Belief
Networks to compute the results but I don't know how to do it with R.
Looking forward to receiving replies!
Thanks in advance,

Marcio Pupin Mello



--
Marcio Pupin Mello

Survey Engineer
Ph.D candidate in Remote Sensing
National Institute for Space Research (INPE) - Brazil
Laboratory of Remote Sensing in Agriculture and Forestry (LAF)
www.dsr.inpe.br/~mello

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to solve a simple discrete Bayesian Belief Network?

2011-09-28 Thread Marcio Pupin Mello

I found a solution using the gRain package... see (but using another 
example)..


I studied a little (I mean a lot) and found out a solution. Then I 
decided to post here aiming to help people who are interested...
The solution uses the gRain package... but because one dependence (the 
graph package) has been removed from the CRAN repository you have to 
use the follow script to install gRain package (I tested it on Windows 7):


#run to install gRain and graph packages
chooseBioCmirror() #then choose the nearest one
setRepositories() #then use Ctrl key to select all
install.packages(c(graph,gRain),dependencies=TRUE))

Then you can run the code to build the Bayesian Network structure of the 
figure in http://www.dsr.inpe.br/~mello/1727/BNgrapmodel.png


#run to build the Bayesian Network shown in Figure
library(gRain)
v.L-cptable(~L,values=c(0.35,(1-0.35)),levels=c(T,F),normalize=T,smooth=0)
v.R-cptable(~R,values=c(0.3,(1-0.3)),levels=c(T,F),normalize=T,smooth=0)
v.H-cptable(~H|D,values=c(0.75,(1-0.75),0.2,(1-0.2)),levels=c(T,F),normalize=T,smooth=0)
v.D-cptable(~D,values=c(0.05,(1-0.05)),levels=c(T,F),normalize=T,smooth=0)
v.C-cptable(~C|S,values=c(0.9,(1-0.9),0.12,(1-0.12)),levels=c(T,F),normalize=T,smooth=0)
v.S-cptable(~S|D:H:R:L,values=c(0.60,(1-0.60),0.65,(1-0.65),0.62,(1-0.62),0.67,(1-0.67),0.61,(1-0.61),0.64,(1-0.64),0.61,(1-0.61),0.70,(1-0.70),0.05,(1-0.05),0.09,(1-0.09),0.07,(1-0.07),0.10,(1-0.10),0.06,(1-0.06),0.08,(1-0.08),0.07,(1-0.07),0.12,(1-0.12)),levels=c(T,F),normalize=T,smooth=0)
CPT-compileCPT(list(v.L,v.R,v.H,v.D,v.C,v.S))
BN-grain(CPT,smooth=0)

#answering the 1st question:
# what is the probability of S=T given C=T, L=T, R=F, H=F and D=F?
scenaria.q1-list(nodes=c(C,L,R,H,D),states=c(T,T,F,F,F))
querygrain(setFinding(BN,nodes=scenaria.q1$nodes,states=scenaria.q1$states),nodes=S)

#answering the 2nd question:
# what is the probability of S=T given C=F, L=F, R=T, H=T, and D=T?
scenaria.q2-list(nodes=c(C,L,R,H,D),states=c(F,F,T,T,T))
querygrain(setFinding(BN,nodes=scenaria.q2$nodes,states=scenaria.q2$states),nodes=S)

#answering the 3rd question:
# what is the probability of what is the probability of S=T when my only 
one evidence is that C=T?

scenaria.q3-list(nodes=c(C),states=c(T))
querygrain(setFinding(BN,nodes=scenaria.q3$nodes,states=scenaria.q3$states),nodes=S)



On 28/09/2011 09:03, Marcio Pupin Mello wrote:

Can somebody save-me? Thanks in advance!


#R script:
#trying to find out how solve a discrete Bayesian Belief Network.
#option: using 'catnet' package

#BEGIN
library(catnet)
cnet - cnNew(nodes = c(a, b, c), cats = list(c(1, 2),
c(1, 2), c(1, 2)), parents = list(NULL, c(1), c(1,
2)), probs = list(c(0.2, 0.8), list(c(0.6, 0.4), c(0.4, 0.6)),
list(list(c(0.3, 0.7), c(0.7, 0.3)), list(c(0.9, 0.1), c(0.1,
0.9)

#what is the probability of a=1?
cnNodeMarginalProb(cnet,node=1)[1]
#0.2

#what is the probability of b=2?
cnNodeMarginalProb(cnet,node=2)[2]
#0.56

#what is the probability of c=1?
cnNodeMarginalProb(cnet,node=3)[1]
#0.428


#but how can I answer questions like:

#what is the probability of a=1 given that c=1? i.e. P(a=1|c=1)?

#or what is the probability of a=1 given that b=2 and c=1? i.e.
P(a=1|b=2,c=1)?
#END



--
Marcio Pupin Mello

Survey Engineer
Ph.D candidate in Remote Sensing
National Institute for Space Research (INPE) - Brazil
Laboratory of Remote Sensing in Agriculture and Forestry (LAF)
www.dsr.inpe.br/~mello

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] number of items to replace is not a multiple of replacement length

2011-09-28 Thread Jean V Adams

dunner wrote on 09/28/2011 08:43:32 AM:
 
 Please help with this error message 
 
 drugbook is an 885 x 32 dataframe
 
  names(drugbook)
 
 [1] DRUG1  DRUG2  DRUG3  DRUG4  DRUG5 
  [6] DRUG6  DRUG7  DRUG8  DRUG9  DRUG10 
 [11] DRUG11 DRUG12 DRUG13 DRUG14 DRUG15 
 [16] DRUG16 DrugDose1  DrugDose2  DrugDose3  DrugDose4 
 [21] DrugDose5  DrugDose6  DrugDose7  DrugDose8  DrugDose9 
 [26] DrugDose10 DrugDose11 DrugDose12 DrugDose13 DrugDose14
 [31] DrugDose15 DrugDose16
 
 Equivs is a dataframe holding information for a certain subclass of 
drugs
 
  head(Equivs)
   Name  Equiv
 1   Alprazolam   0.5
 2   Bromazepam   5.5
 3 Chlordiazepoxide  25.0
 4 Clobazam  20.0
 5   Clonazepam   0.5
 6  Clorazepate  15.0
 
 What I'm trying to is for each element of drugbook, get the value in
 Equivs$Equiv, and multiply it by the corresponding drug dose, which will 
be
 in the n+16 th  column on in the same row of drugbook.
 
 Not all of the grabbed names from Drugbook will be in the dataframe 
Equivs,
 though. So sometimes the == will return a null.
 
 i.e. DrugDose16 is the dose of Drug 16
 
 this I think should work: ( I did try apply and got knotted, so stopped)
 
 mydosequivalents=list()
 for( c in 1:16) {
   for (r in 1:885){
 name-as.character(drugbook[r,c]);
 equivalent-Equivs$Equiv[Equivs$Name==name];
 dosequivalent-drugbook[r,c+16] * equivalent;
 if (length(dosequivalent)0) mydosequivalents[r][c]-dosequivalent
 else (mydosequivalents[r][c]-0)
   }
 }
 
 But it is throwing an error, and I can't figure out why. I remedied a
 previous error by checking the length of the equivalent.


I think your error is coming from the way that you are trying to populate 
the list using [r][c].  Are you trying to have mydosequivalents be a list 
of r vectors each with c elements?


 warnings() 
 50: In mydosequivalents[c][r] - dosequivalent :
   number of items to replace is not a multiple of replacement length
 
 mydosequivalents ends up being full of NULL and zeros
 
 
 Thanks in advance for any help
 
 Ross


Another way:

# using my example data
drugbook - data.frame(DRUG1=letters[1:5], DRUG2=letters[6:10], 
DrugDose1=1:5, DrugDose2=6:10)
Equivs - data.frame(Name=letters[-c(2, 8)], Equiv=100*(1:24))

mydosequivalents - apply(drugbook[, 1:2], 2, 
function(x) Equivs$Equiv[match(x, Equivs$Name)])*drugbook[, 3:4]
mydosequivalents[is.na(mydosequivalents)] - 0


# this should work for your real data
mydosequivalents - apply(drugbook[, 1:16], 2, 
function(x) Equivs$Equiv[match(x, Equivs$Name)])*drugbook[, 17:32]
mydosequivalents[is.na(mydosequivalents)] - 0

Jean
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to Store the executed values in a dataframe rle function

2011-09-28 Thread jim holtman

The solution that I sent will handle the 150 different samples; just
list the column names in the argument to the top 'lapply'.  You don't
need the 'rle' in my approach.

On Wed, Sep 28, 2011 at 2:13 PM, viritha k virith...@gmail.com wrote:
 Hi,
 This is the code that I wrote for 3 samples:
 code:
m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric'))

 s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]]))

 names(s)=c(Values,Probes)

 c-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE)
 G=1
 n=4

 for(i in 1:length(s$Probes)){

 + if(G==1){c[i,1]-names(m[n])
 + c[i,2]-unique(m$Chr[G:s$Probes[i]])
 + c[i,3]-min(m$Start[G:s$Probes[i]])
 + c[i,4]-max(m$End[G:s$Probes[i]])
 + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])

 + G=(G+s$Probes[i])}
 + else if((G-1)  length(m$Sample1)) {

 + c[i,1]-names(m[n])
 + c[i,2]-unique(m$Chr[G:(G+s$Probes[i]-1)])
 + c[i,3]-min(m$Start[G:(G+s$Probes[i]-1)])
 + c[i,4]-max(m$End[G:(G+s$Probes[i]-1)])
 + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])

 + G=(G+s$Probes[i])}
 + else {
 + G=1

 + n=n+1
 +  c[i,1]-names(m[n])
 + c[i,2]-unique(m$Chr[G:s$Probes[i]])
 + c[i,3]-min(m$Start[G:s$Probes[i]])
 + c[i,4]-max(m$End[G:s$Probes[i]])
 + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])

 + G=(G+s$Probes[i])}}

 c

     Sample  Chr    Start  End Values Probes

 1  Sample1 chr2  9896633 14404502  0  4
 2  Sample1 chr2 14421718 16048724  -0.43  4
 3  Sample1 chr2 37491676 37703009  0  2
 4  Sample2 chr2  9896633  9896690  0  2
 5  Sample2 chr2 14314039 16048724  -0.35  6
 6  Sample2 chr2 37491676 37703009  0  2
 7  Sample3 chr2  9896633 14314098  0  3
 8  Sample3 chr2 14404467 16031769   0.32  3
 9  Sample3 chr2 16036178 37491735   0.45  3
 10 Sample3 chr2 37702947 37703009  0  1


 The problem that I am facing is for expanding rle function for values and
 probes.
 Defintely your code looks simpler, but I would like to read the file by just
 giving the name of the file as written in my code because my original file
 contains 150 samples,but how to use lapply or rle function for 150 such
 samples, if my file contain 150 samples similiar to sample1 and sample2.

 waiting for your reply,
 Thanks,
 Suji

 On Wed, Sep 28, 2011 at 11:37 AM, jim holtman jholt...@gmail.com wrote:

 Here one approach:

  x - read.table(textConnection(Chr start end sample1 sample2
 + chr2 9896633 9896683 0 0
 + chr2 9896639 9896690 0 0
 + chr2 14314039 14314098 0 -0.35
 + chr2 14404467 14404502 0 -0.35
 + chr2 14421718 14421777 -0.43 -0.35
 + chr2 16031710 16031769 -0.43 -0.35
 + chr2 16036178 16036237 -0.43 -0.35
 + chr2 16048665 16048724 -0.43 -0.35
 + chr2 37491676 37491735 0 0
 + chr2 37702947 37703009 0 0), header = TRUE, as.is = TRUE)
  closeAllConnections()
 
  result - lapply(c('sample1', 'sample2'), function(.samp){
 +     # split by breaks in the values
 +     .grps - split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
 +
 +     # combine the list of dataframes
 +     .range - do.call(rbind, lapply(.grps, function(.set){
 +         # create a dataframe of the results
 +         data.frame(Sample = .samp
 +                    , Chr = .set$Chr[1L]
 +                    , Start = min(.set$start)
 +                    , End = max(.set$end)
 +                    , Values = .set[[.samp]][1L]
 +                    , Probes = nrow(.set)
 +                    )
 +         }))
 +     })
  # put the list of dataframes together
  result - do.call(rbind, result)
  result
    Sample  Chr    Start      End Values Probes
 0  sample1 chr2  9896633 14404502   0.00      4
 1  sample1 chr2 14421718 16048724  -0.43      4
 2  sample1 chr2 37491676 37703009   0.00      2
 01 sample2 chr2  9896633  9896690   0.00      2
 11 sample2 chr2 14314039 16048724  -0.35      6
 21 sample2 chr2 37491676 37703009   0.00      2
 


 On Mon, Sep 26, 2011 at 10:30 AM, sujitha virith...@gmail.com wrote:
  Hi group,
 
  This is how my test file looks like:
  Chr start end sample1 sample2
  chr2 9896633 9896683 0 0
  chr2 9896639 9896690 0 0
  chr2 14314039 14314098 0 -0.35
  chr2 14404467 14404502 0 -0.35
  chr2 14421718 14421777 -0.43 -0.35
  chr2 16031710 16031769 -0.43 -0.35
  chr2 16036178 16036237 -0.43 -0.35
  chr2 16048665 16048724 -0.43 -0.35
  chr2 37491676 37491735 0 0
  chr2 37702947 37703009 0 0
 
  This is the output that I am expecting:
  Sample Chr Start End Values Probes
  sample1 chr2 9896633 14404502 0 4
  sample1 chr2 14421718 16048724 -0.43 4
  sample1 chr2 37491676 37703001 0 2
  sample2 chr2 9896633 9896690 0 2
  sample2  chr2 14314039 16048724 -0.35 6
  sample2 chr2 37491676 37703009 0 2
 
  Here the Chr value is same but

[R] survexp with large dataframes

2011-09-28 Thread Mike Harwood

Hello, and thank you in advance.

I would like to capture the expected survival from a coxph model for
subjects in an observational study with recurrent events, but the
survexp statement is failing due to memory.  I am using R version
2.13.1 (2011-07-08) on Windows XP.

My objective is to plot the fitted survival with the Kaplan-Meier
plot.  Below is the code with output and [unfortunately] errors. Is
there something wrong in my use of cluster in generating the
proportional hazards model, or is there some syntax to pass it into
survexp?

Mike

 dim(dev)
[1] 899876 25

 mod1 - coxph(Surv(begin.cp, end.cp, event)
+ ~ age.sex
+   + plan_type
+   + uw_load
+   + cluster(mbr_key)
+   ,data=dev
+   )

 summary(mod1)
Call:
coxph(formula = Surv(begin.cp, end.cp, event) ~ age.sex + plan_type +
uw_load + cluster(mbr_key), data = dev)

  n= 899876, number of events= 753324

 coef exp(coef)  se(coef) robust se   z
Pr(|z|)
age.sex19-34_MALE   -0.821944  0.439576  0.005529  0.023298 -35.280  
2e-16 ***
age.sex35-49_FEMALE  0.058776  1.060537  0.004201  0.018477   3.181
0.00147 **
age.sex35-49_MALE   -0.515590  0.597148  0.004634  0.019986 -25.798  
2e-16 ***
age.sex50-64_FEMALE  0.190940  1.210386  0.004350  0.020415   9.353  
2e-16 ***
age.sex50-64_MALE   -0.127514  0.880281  0.004487  0.021431  -5.950
2.68e-09 ***
age.sexCHILD_CHILD  -0.327522  0.720707  0.004238  0.017066 -19.192  
2e-16 ***
plan_typeLOW-0.165735  0.847270  0.002443  0.011080 -14.958  
2e-16 ***
uw_load1-50  0.215122  1.240014  0.006437  0.029189   7.370
1.71e-13 ***
uw_load101-250   0.551042  1.735060  0.003993  0.018779  29.344  
2e-16 ***
uw_load251+  0.981660  2.668884  0.003172  0.017490  56.126  
2e-16 ***
uw_load51-1000.413464  1.512046  0.006216  0.027877  14.832  
2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95
age.sex19-34_MALE  0.4396 2.27490.42000.4601
age.sex35-49_FEMALE1.0605 0.94291.02281.0996
age.sex35-49_MALE  0.5971 1.67460.57420.6210
age.sex50-64_FEMALE1.2104 0.82621.16291.2598
age.sex50-64_MALE  0.8803 1.13600.84410.9180
age.sexCHILD_CHILD 0.7207 1.38750.69700.7452
plan_typeLOW   0.8473 1.18030.82910.8659
uw_load1-501.2400 0.80641.17111.3130
uw_load101-250 1.7351 0.57631.67241.8001
uw_load251+2.6689 0.37472.57892.7620
uw_load51-100  1.5120 0.66141.43161.5970

Concordance= 0.643  (se = 0 )
Rsquare= 0.205   (max possible= 1 )
Likelihood ratio test= 206724  on 11 df,   p=0
Wald test= 9207  on 11 df,   p=0
Score (logrank) test = 246358  on 11 df,   p=0,   Robust = 4574  p=0

  (Note: the likelihood ratio and score tests assume independence of
 observations within a cluster, the Wald and robust score tests do
not).

 dev.fit - survexp( ~ 1, ratetable=mod1, data=dev)
Error in survexp.cfit(cbind(as.numeric(X), R), Y, conditional,
FALSE,  :
  cannot allocate memory block of size 15.2 Gb

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] svd (via MFA): All coordinates fall on straight lines?

2011-09-28 Thread Pundurs, Mark

Applying the svd function to my data by way of the FactoMineR package's MFA 
function:

dfmfa - MFA(df, group=c(2,96), type=c(n,c))

the result is that all my data points fall on one of 8 straight parallel lines 
when projected onto any two axes, e.g.,

points(dfmfa$ind$coord[, c(1, 2)])

Furthermore, spot-checks indicate that for any pair of axes other than (1, 2) 
the lines are either horizontal or vertical.

Is all this telling me something about my data, and if so, what? Or is it 
telling me I should be setting some parameters to nondefault values, and if so, 
which parameters and what values?

Thanks,
Mark Pundurs
Data Analyst - Traffic
Nokia Location  Commerce, Chicago


The information contained in this communication may be C...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Robust covariance matrix with NeweyWest()

2011-09-28 Thread Andreas Klein

Dear R-users,

I would like to compute a robust covariance matrix of two series of 
realizations of random variables:

###Begin Example###

data - cbind(rnorm(100), rnorm(100))
model - lm(data ~ 1)
vcov(model)

library(sandwich)
NeweyWest(model) #produces an error


###End Example###

NeweyWest() produces an error but sandwich(), vcovHAC(), kernHAC, weave(),... 
do not produce any errors. It seems that the model object does not fit in that 
special case.


Nevertheless, the problem is that I need the robust version of the covariance 
matrix according to Newey and West (1987, 1994).

Any ideas or suggestions to solve the problem?

Kind regards,
Andy


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to Store the executed values in a dataframe rle function

2011-09-28 Thread viritha k

Hi,
This is the code that I wrote for 3 samples:
code:
m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric'))

s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]]))

 names(s)=c(Values,Probes)

c-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE)
 G=1
 n=4

 for(i in 1:length(s$Probes)){

+ if(G==1){c[i,1]-names(m[n])
+ c[i,2]-unique(m$Chr[G:s$Probes[i]])
+ c[i,3]-min(m$Start[G:s$Probes[i]])
+ c[i,4]-max(m$End[G:s$Probes[i]])
+ c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])

+ G=(G+s$Probes[i])}
+ else if((G-1)  length(m$Sample1)) {

+ c[i,1]-names(m[n])
+ c[i,2]-unique(m$Chr[G:(G+s$Probes[i]-1)])
+ c[i,3]-min(m$Start[G:(G+s$Probes[i]-1)])
+ c[i,4]-max(m$End[G:(G+s$Probes[i]-1)])
+ c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])

+ G=(G+s$Probes[i])}
+ else {
+ G=1

+ n=n+1
+  c[i,1]-names(m[n])
+ c[i,2]-unique(m$Chr[G:s$Probes[i]])
+ c[i,3]-min(m$Start[G:s$Probes[i]])
+ c[i,4]-max(m$End[G:s$Probes[i]])
+ c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])

+ G=(G+s$Probes[i])}}

 c
Sample  ChrStart  End Values Probes

1  Sample1 chr2  9896633 14404502  0  4
2  Sample1 chr2 14421718 16048724  -0.43  4
3  Sample1 chr2 37491676 37703009  0  2
4  Sample2 chr2  9896633  9896690  0  2
5  Sample2 chr2 14314039 16048724  -0.35  6
6  Sample2 chr2 37491676 37703009  0  2
7  Sample3 chr2  9896633 14314098  0  3
8  Sample3 chr2 14404467 16031769   0.32  3
9  Sample3 chr2 16036178 37491735   0.45  3
10 Sample3 chr2 37702947 37703009  0  1


The problem that I am facing is for expanding rle function for values and
probes.
Defintely your code looks simpler, but I would like to read the file by just
giving the name of the file as written in my code because my original file
contains 150 samples,but how to use lapply or rle function for 150 such
samples, if my file contain 150 samples similiar to sample1 and sample2.
waiting for your reply,
Thanks,
Suji

On Wed, Sep 28, 2011 at 11:37 AM, jim holtman jholt...@gmail.com wrote:

 Here one approach:

  x - read.table(textConnection(Chr start end sample1 sample2
 + chr2 9896633 9896683 0 0
 + chr2 9896639 9896690 0 0
 + chr2 14314039 14314098 0 -0.35
 + chr2 14404467 14404502 0 -0.35
 + chr2 14421718 14421777 -0.43 -0.35
 + chr2 16031710 16031769 -0.43 -0.35
 + chr2 16036178 16036237 -0.43 -0.35
 + chr2 16048665 16048724 -0.43 -0.35
 + chr2 37491676 37491735 0 0
 + chr2 37702947 37703009 0 0), header = TRUE, as.is = TRUE)
  closeAllConnections()
 
  result - lapply(c('sample1', 'sample2'), function(.samp){
 + # split by breaks in the values
 + .grps - split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
 +
 + # combine the list of dataframes
 + .range - do.call(rbind, lapply(.grps, function(.set){
 + # create a dataframe of the results
 + data.frame(Sample = .samp
 +, Chr = .set$Chr[1L]
 +, Start = min(.set$start)
 +, End = max(.set$end)
 +, Values = .set[[.samp]][1L]
 +, Probes = nrow(.set)
 +)
 + }))
 + })
  # put the list of dataframes together
  result - do.call(rbind, result)
  result
Sample  ChrStart  End Values Probes
 0  sample1 chr2  9896633 14404502   0.00  4
 1  sample1 chr2 14421718 16048724  -0.43  4
 2  sample1 chr2 37491676 37703009   0.00  2
 01 sample2 chr2  9896633  9896690   0.00  2
 11 sample2 chr2 14314039 16048724  -0.35  6
 21 sample2 chr2 37491676 37703009   0.00  2
 


 On Mon, Sep 26, 2011 at 10:30 AM, sujitha virith...@gmail.com wrote:
  Hi group,
 
  This is how my test file looks like:
  Chr start end sample1 sample2
  chr2 9896633 9896683 0 0
  chr2 9896639 9896690 0 0
  chr2 14314039 14314098 0 -0.35
  chr2 14404467 14404502 0 -0.35
  chr2 14421718 14421777 -0.43 -0.35
  chr2 16031710 16031769 -0.43 -0.35
  chr2 16036178 16036237 -0.43 -0.35
  chr2 16048665 16048724 -0.43 -0.35
  chr2 37491676 37491735 0 0
  chr2 37702947 37703009 0 0
 
  This is the output that I am expecting:
  Sample Chr Start End Values Probes
  sample1 chr2 9896633 14404502 0 4
  sample1 chr2 14421718 16048724 -0.43 4
  sample1 chr2 37491676 37703001 0 2
  sample2 chr2 9896633 9896690 0 2
  sample2  chr2 14314039 16048724 -0.35 6
  sample2 chr2 37491676 37703009 0 2
 
  Here the Chr value is same but can be any other value aswell so unique
 among
  the similar values. The Start for the first line would be the least value
  until values are similiar (4) then the end would be highest value. The
  values is the unique value among the common values and probes is number
 of
  similar values.

Re: [R] How to Store the executed values in a dataframe rle function

2011-09-28 Thread viritha k

Hi Jim,
 Thanks for the reply, ok but I dont want to use textConnection and paste
each line but want the input to be read from a file like
m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric').
So how do I incorporate that in your code.
Thanks,
Suji
On Wed, Sep 28, 2011 at 2:40 PM, jim holtman jholt...@gmail.com wrote:

 The solution that I sent will handle the 150 different samples; just
 list the column names in the argument to the top 'lapply'.  You don't
 need the 'rle' in my approach.

 On Wed, Sep 28, 2011 at 2:13 PM, viritha k virith...@gmail.com wrote:
  Hi,
  This is the code that I wrote for 3 samples:
  code:

 m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric'))
 
 
 s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]]))
 
  names(s)=c(Values,Probes)
 
 
 c-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE)
  G=1
  n=4
 
  for(i in 1:length(s$Probes)){
 
  + if(G==1){c[i,1]-names(m[n])
  + c[i,2]-unique(m$Chr[G:s$Probes[i]])
  + c[i,3]-min(m$Start[G:s$Probes[i]])
  + c[i,4]-max(m$End[G:s$Probes[i]])
  + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
 
  + G=(G+s$Probes[i])}
  + else if((G-1)  length(m$Sample1)) {
 
  + c[i,1]-names(m[n])
  + c[i,2]-unique(m$Chr[G:(G+s$Probes[i]-1)])
  + c[i,3]-min(m$Start[G:(G+s$Probes[i]-1)])
  + c[i,4]-max(m$End[G:(G+s$Probes[i]-1)])
  + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
 
  + G=(G+s$Probes[i])}
  + else {
  + G=1
 
  + n=n+1
  +  c[i,1]-names(m[n])
  + c[i,2]-unique(m$Chr[G:s$Probes[i]])
  + c[i,3]-min(m$Start[G:s$Probes[i]])
  + c[i,4]-max(m$End[G:s$Probes[i]])
  + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
 
  + G=(G+s$Probes[i])}}
 
  c
 
  Sample  ChrStart  End Values Probes
 
  1  Sample1 chr2  9896633 14404502  0  4
  2  Sample1 chr2 14421718 16048724  -0.43  4
  3  Sample1 chr2 37491676 37703009  0  2
  4  Sample2 chr2  9896633  9896690  0  2
  5  Sample2 chr2 14314039 16048724  -0.35  6
  6  Sample2 chr2 37491676 37703009  0  2
  7  Sample3 chr2  9896633 14314098  0  3
  8  Sample3 chr2 14404467 16031769   0.32  3
  9  Sample3 chr2 16036178 37491735   0.45  3
  10 Sample3 chr2 37702947 37703009  0  1
 
 
  The problem that I am facing is for expanding rle function for values and
  probes.
  Defintely your code looks simpler, but I would like to read the file by
 just
  giving the name of the file as written in my code because my original
 file
  contains 150 samples,but how to use lapply or rle function for 150 such
  samples, if my file contain 150 samples similiar to sample1 and sample2.
 
  waiting for your reply,
  Thanks,
  Suji
 
  On Wed, Sep 28, 2011 at 11:37 AM, jim holtman jholt...@gmail.com
 wrote:
 
  Here one approach:
 
   x - read.table(textConnection(Chr start end sample1 sample2
  + chr2 9896633 9896683 0 0
  + chr2 9896639 9896690 0 0
  + chr2 14314039 14314098 0 -0.35
  + chr2 14404467 14404502 0 -0.35
  + chr2 14421718 14421777 -0.43 -0.35
  + chr2 16031710 16031769 -0.43 -0.35
  + chr2 16036178 16036237 -0.43 -0.35
  + chr2 16048665 16048724 -0.43 -0.35
  + chr2 37491676 37491735 0 0
  + chr2 37702947 37703009 0 0), header = TRUE, as.is = TRUE)
   closeAllConnections()
  
   result - lapply(c('sample1', 'sample2'), function(.samp){
  + # split by breaks in the values
  + .grps - split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
  +
  + # combine the list of dataframes
  + .range - do.call(rbind, lapply(.grps, function(.set){
  + # create a dataframe of the results
  + data.frame(Sample = .samp
  +, Chr = .set$Chr[1L]
  +, Start = min(.set$start)
  +, End = max(.set$end)
  +, Values = .set[[.samp]][1L]
  +, Probes = nrow(.set)
  +)
  + }))
  + })
   # put the list of dataframes together
   result - do.call(rbind, result)
   result
 Sample  ChrStart  End Values Probes
  0  sample1 chr2  9896633 14404502   0.00  4
  1  sample1 chr2 14421718 16048724  -0.43  4
  2  sample1 chr2 37491676 37703009   0.00  2
  01 sample2 chr2  9896633  9896690   0.00  2
  11 sample2 chr2 14314039 16048724  -0.35  6
  21 sample2 chr2 37491676 37703009   0.00  2
  
 
 
  On Mon, Sep 26, 2011 at 10:30 AM, sujitha virith...@gmail.com wrote:
   Hi group,
  
   This is how my test file looks like:
   Chr start end sample1 sample2
   chr2 9896633 9896683 0 0
   chr2 9896639 9896690 0 0
   chr2 14314039 14314098 0 -0.35
   chr2 14404467 14404502 0 -0.35
   chr2 14421718 14421777 -0.43 -0.35

[R] Error: could not find function

2011-09-28 Thread Franckx Laurent

Dear all

I have written a function, calibrateCES, in which a function is called, 
BPGC, which was written in the enclosure of calibrateCES. Until the most 
recent version of my program, calibrateCES always used BPGC as planned. 
Following a modification which does not affect BPGC directly (although it 
does affect one of the arguments of BPGC), I now get the following error 
message: 
Error: could not find function BPGC
When I defined BPGC in calibrateCES, the problem disappears, but this is 
definitely not the solution I want, because I want BPGC to be available for 
other functions as well. 


Laurent Franckx



Uniting expertise from different fields of technology enhances the development 
of innovative methods for sustainable production.
Join the third edition of the international congress 'Innovation for 
Sustainable Production 2012'
May, 6-9, 2012 - Bruges (Belgium)
http://www.i-sup2012.org
---
This e-mail, any attachments and the information it contains are confidential 
and meant only for the use of the addressee(s) only.  Access to this e-mail by 
anyone other than the addressee(s) is unauthorized.  If you are not the 
intended addressee (or responsible for delivery of the message to such person), 
you may not use, copy, distribute or deliver to anyone this message (or any 
part of its contents) or take any action in reliance on it.  In such case, you 
should destroy this message and notify the sender immediately.  If you have 
received this e-mail in error, please notify us immediately by e-mail or 
telephone and delete the e-mail from any computer.
All reasonable precautions have been taken to ensure no viruses are present in 
this e-mail and its attachments.  As our company cannot accept responsibility 
for any loss or damage arising from the use of this e-mail or attachments we 
recommend that you subject these to your virus checking procedures prior to use.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error: could not find function

2011-09-28 Thread Jeff Newmiller

Then don't define BPGC inside another function.
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Franckx Laurent laurent.fran...@vito.be wrote:

Dear all

I have written a function, calibrateCES, in which a function is called, 
BPGC, which was written in the enclosure of calibrateCES. Until the most 
recent version of my program, calibrateCES always used BPGC as planned. 
Following a modification which does not affect BPGC directly (although it 
does affect one of the arguments of BPGC), I now get the following error 
message: 
Error: could not find function BPGC
When I defined BPGC in calibrateCES, the problem disappears, but this is 
definitely not the solution I want, because I want BPGC to be available for 
other functions as well. 


Laurent Franckx



Uniting expertise from different fields of technology enhances the development 
of innovative methods for sustainable production.
Join the third edition of the international congress 'Innovation for 
Sustainable Production 2012'
May, 6-9, 2012 - Bruges (Belgium)
http://www.i-sup2012.org
---
This e-mail, any attachments and the information it contains are confidential 
and meant only for the use of the addressee(s) only. Access to this e-mail by 
anyone other than the addressee(s) is unauthorized. If you are not the intended 
addressee (or responsible for delivery of the message to such person), you may 
not use, copy, distribute or deliver to anyone this message (or any part of its 
contents) or take any action in reliance on it. In such case, you should 
destroy this message and notify the sender immediately. If you have received 
this e-mail in error, please notify us immediately by e-mail or telephone and 
delete the e-mail from any computer.
All reasonable precautions have been taken to ensure no viruses are present in 
this e-mail and its attachments. As our company cannot accept responsibility 
for any loss or damage arising from the use of this e-mail or attachments we 
recommend that you subject these to your virus checking procedures prior to use.

_

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Robust covariance matrix with NeweyWest()

2011-09-28 Thread Achim Zeileis


On Wed, 28 Sep 2011, Andreas Klein wrote:


Dear R-users,

I would like to compute a robust covariance matrix of two series of 
realizations of random variables:


###Begin Example###

data - cbind(rnorm(100), rnorm(100))
model - lm(data ~ 1)
vcov(model)

library(sandwich)
NeweyWest(model) #produces an error


###End Example###

NeweyWest() produces an error but sandwich(), vcovHAC(), kernHAC, 
weave(),... do not produce any errors. It seems that the model object 
does not fit in that special case.


Short version: This is/was a bug. A fixed version has just been released 
on the CRAN master site.


Medium version: Until binary packages are available (if needed), you can 
work around the bug by computing the truncation lag by hand.

  NeweyWest(model, lag = floor(bwNeweyWest(model, weights = 1)))

Long version: In the bandwidth selection -- function bwNeweyWest() -- 
estimating/score functions pertaining to intercepts are omitted by default 
(following Newey  West's advice). This is, of course, unless there is 
only an intercept in which case its estimating/score function is used. The 
code handled this correctly in models with one intercept but not in mlm 
objects which can have two intercept scores (and no other regressors), as 
in your case. Fixed now.


Best,
Z



Nevertheless, the problem is that I need the robust version of the covariance 
matrix according to Newey and West (1987, 1994).

Any ideas or suggestions to solve the problem?

Kind regards,
Andy


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Wilcox test and data collection

2011-09-28 Thread Jean V Adams

Francesca Pancotto wrote on 09/28/2011 10:23:17 AM:
 
 Dear Contributors
 I have a problem with the collection of data from the results of a test.
 I need to perform a comparative test over groups of data , recall the 
value
 of the pvalue and create a table.
 My problem is in the way to replicate the analysis over and over again 
over
 subsets of data according to a condition.
 I have this database, called y:
 gg   t1 t2d
  40  1  1 2
  50 2   2 1
  45 1   3 1
  49 2   1 1
  5   2   1 3
  40 1   1 2
 
 where gg takes values from 1 to 100, t1 and t2 have values in (1,2,3) 
and d
 in (0,1,2,3)
 I want to perform tests on the values of gg according to the conditions 
that
 
 d==0 , compare values of gg when t1==1 with values of gg when t1==3
 d==1 , compare values of gg when t1==1 with values of gg when t1==3
 d==2 , compare values of gg when t1==1 with values of gg when t1==3
 ..
 then
 d==0 , compare values of gg when t2==1 with values of gg when t2==3
 d==1...
 
 
 then collect the data of a statistics and create a table.
 The procedure i followed is to create sub datasets called m0,m1,m2,m3
 corresponding
 to the values of d, i.e.
 
 m0- y[y$d==0,c(7,17,18,19)]
 m1- y[y$d==1,c(7,17,18,19)]
 m2- y[y$d==2,c(7,17,18,19)]
 m3- y[y$d==3,c(7,17,18,19)]
 
 then perform the test as follows:
 
 x1-wilcox.test(m0[m0$t1==1,1],m0[m0$t1==3,1],correct=FALSE, 
exact=FALSE,
 conf.int=TRUE,alternative = c(g))   #ABC   ID
 x2-  wilcox.test(m1[m1$t1==1,1],m1[m1$t1==3,1],correct=FALSE, 
exact=FALSE,
 conf.int=TRUE,alternative = c(g))
  x3-  wilcox.test(m2[m2$t1==1,1],m2[m2$t1==3,1],correct=FALSE, 
exact=FALSE,
 conf.int=TRUE,alternative = c(g))
 x4- wilcox.test(m3[m3$t1==1,1],m3[m3$t1==3,1],correct=FALSE, 
exact=FALSE,
 conf.int=TRUE,alternative = c(g))
 
 each of these tests will create an object, say x and then I extract the
 value statistics using
 x$statistics.
 
 How to automatize this?
 Thank you for any help you can provide.
 Francesca


y - data.frame(gg=sample(100), t1=sample(1:3, 100, TRUE), 
t2=sample(1:3, 100, TRUE), d=sample(0:3, 100, TRUE))

yd - split(y, y$d)

sapply(yd, function(df)
wilcox.test(df[df$t1==1, 1], df[df$t1==3, 1], correct=FALSE, 
exact=FALSE, conf.int=TRUE, alternative=g)[c(estimate, 
p.value)])

sapply(yd, function(df)
wilcox.test(df[df$t2==1, 1], df[df$t2==3, 1], correct=FALSE, 
exact=FALSE, conf.int=TRUE, alternative=g)[c(estimate, 
p.value)])

Jean
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] compare proportions

2011-09-28 Thread array chip

Hi all, I asked this yesterday, but hadn't got any response yet. Understand 
this is not pure R technical question, but more of statistical. With many 
statistical experts in the list, I would appreciate any suggestions...

Many thanks

John

--


Hi, I have a seemingly simple proportional test.  here is the question I am 
trying to answer:
 
There is a test running each day in the lab, the test comes out as either 
positive or negative. So at the end of each month, we can calculate a positive 
rate in that month as the proportion of positive test results. The data look 
like:
 
Month  # positive   # total tests    positive rate
January 24    205 11.7%
February    31    234 13.2%
March   26    227 11.5%
:
:
:
August  42    241  17.4%
 
The total # of positive before August is 182, and the total # of tests before 
August is 1526. It appears that from January to July, the positive monthly rate 
is between 11% to 13%, the rate in August is up around 17%. So the question is 
whether is up in August is statistically significant?
 
I can think of 3 ways to do this test:
 
1. Use binom.test(), set “p” as the average positive
rate between January and July (=182/1526):
 
 binom.test(42,241,182/1526)
 
    Exact binomial test
 
data:  42 and 241
number of successes = 42, number
of trials = 241, p-value = 0.0125
alternative hypothesis: true
probability of success is not equal to 0.1192661
95 percent confidence interval:
 0.1285821 0.2281769
sample estimates:
probability of success
 0.1742739
 
2. Use prop.test(), where I compare the average
positive rate between January  July with the positive rate in August:
 
 prop.test(c(182,42),c(1526,241))
 
    2-sample test for equality of
proportions with continuity correction
 
data:  c(182, 42) out of c(1526, 241)
X-squared = 5.203, df = 1,
p-value = 0.02255
alternative hypothesis:
two.sided
95 percent confidence interval:
 -0.107988625 -0.002026982
sample estimates:
   prop 1    prop 2
0.1192661 0.1742739

3. Use prop.test(), where I compare the average
MONTHLY positive rate between January  July with the positive rate in
August. The average monthly # of positives is 182/7=26, the average monthly $
of total tests is 1526/7=218:
 
 prop.test(c(26,42),c(218,241))
 
    2-sample test for equality of
proportions with continuity correction
 
data:  c(26, 42) out of c(218, 241)
X-squared = 2.3258, df = 1,
p-value = 0.1272
alternative hypothesis:
two.sided
95 percent confidence interval:
 -0.12375569  0.01374008
sample estimates:
   prop 1    prop 2
0.1192661 0.1742739
 
As you can see that the method 3 gave insignificant p value compared to method 
1  2. While I understand each method is testing different hypothesis, but for 
the question I am trying to answer (does August have higher positive rate 
compare to earlier months?), which method is more relevant? Or I should 
consider some regression techniques, then what type of regression is 
appropriate?
 
Thanks for any suggestions,
 
John


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] survival analysis: interval censored data

2011-09-28 Thread David Winsemius



On Sep 28, 2011, at 10:56 AM, Ruth Arias wrote:




hallo terry:

I attached araceae data set,


The usual survival analysis via the Kaplan-Meier method only make  
estimates at the time of events. When you tabulate your data, you see  
that there were no events for the missing (starting) time rows in  
those categories during the intervals that you are questioning as  
missing:


xtabs( ~ time+time2+categoria+event, data=araceae)
, , categoria = C, event = 0

  time2
time   2005 2006 2007 2008 2009 2010
  20040   23131   22
  2005000000
  200700004   19
  2008000000
  2009000000

, , categoria = E, event = 0

  time2
time   2005 2006 2007 2008 2009 2010
  20040   22073   21
  2005001100
  200700000   29
  2008000000
  2009000001


, , categoria = C, event = 1

  time2
time   2005 2006 2007 2008 2009 2010
  2004052303
  2005000000
  2007000232
  2008000010
  2009000000

, , categoria = E, event = 1

  time2
time   2005 2006 2007 2008 2009 2010
  2004721134
  2005000100
  2007000313
  2008000000
  2009000000



when I use this:

surara-survfit(Surv(time,time2,event)~categoria)

Call: survfit(formula = Surv(time, time2, event) ~ categoria)

records n.max n.start events median 0.95LCL 0.95UCL
categoria=C  9463   0 21 NA  NA  NA
categoria=E 11177   0 26 NA  NA  NA

summary(surara)

Call: survfit(formula = Surv(time, time2, event) ~ categoria)

categoria=C
 time n.risk n.event entered censored survival std.err lower 95% CI  
upper 95% CI
 2006 63   5   0   230.921  0.0341 
0.8560.990
 2007 35   2  3010.868  0.0483 
0.7780.968
 2008 62   5   130.798  0.0536 
0.7000.910
 2009 55   4   050.740  0.0570 
0.6360.861
 2010 46   5   0   410.660  0.0611 
0.5500.791


categoria=E
 time n.risk n.event entered censored survival std.err lower 95% CI  
upper 95% CI
 2005 71   7   300.901  0.0354 
0.8350.973
 2006 67   2   0   220.875  0.0391 
0.8010.955
 2007 43   1  3610.854  0.0432 
0.7740.943
 2008 77   5   080.799  0.0469 
0.7120.896
 2009 64   4   130.749  0.0502 
0.6570.854
 2010 58   7   0   510.658  0.0545 
0.5600.774


You see that your first survfit object is offering a simple sum of  
'time2' columns of that tabulation as its 'n.event' values. It's  
'n.risk' tabulation is not taking note of whether a case started in  
any particular prior interval. The n.risk sum appears to be the sum of  
persons surviving from the prior year less any decedents plus any  
entrants as reflected in future events on that rowYou notice  
that there are missing years even in that report: 2004,2005 for  
category C and 2004 for category E since there are no events in  
columns for those 'time2' values.




but whe I included type=interval,

suraraint- 
survfit(Surv(time,time2,event,type='interval')~categoria) # falta  
arreglar lo del intervalo!!!

summary(suraraint)

Call: survfit(formula = Surv(time, time2, event, type = interval) ~
categoria)

categoria=C
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
 2004  95.00   13.140.862  0.03540.7950.934
 2007  31.867.190.667  0.06950.5440.818
 2008   1.671.670.000 NaN   NA   NA

categoria=E
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
 2004  112.0   18.470.835  0.03510.7690.907
 2005   40.51.060.813  0.04010.7380.896
 2007   37.57.460.651  0.06200.5400.785



The second object's n.event, when Surv() was constructed with  
type=interval, has values based on the starting 'time' rows,  but I  
am unable to deduce the estimating algorithm. I remember Therneau  
saying it wasn't a simple algorithm. The 2008 row in category C has  
one  entry of 1 in the next year and there were no censoring for C- 
entrants in that year. Why the n.event is 1.67 I cannot say, but at  
least the n.event does not exceed the n.risk.  The code or a copy of  
Therneau and Grambsch would be sensible places to look for answer by

Re: [R] dump.frames, debugger and the ... argument

2011-09-28 Thread Duncan Murdoch


On 28/09/2011 12:25 PM, Jannis wrote:

dummyFunct = function(a, ...) {
   b = 2* a
   c = d   # should cause error
   return(b)
}

options(error = quote(dump.frames(dumpto = last.dump, to.file = TRUE)))

dummyFunct(1)
load(last.dump.rda)
debugger(last.dump)
# selection of call 1 causes:

# Error in get(.obj, envir = dump[[.selection]]) :
#  argument ... is missing, with no defa


Looks like Brian was right, it has been fixed.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Adding axis to an ellipse: ellipse package

2011-09-28 Thread Rolf Turner


On 28/09/11 20:34, Antoine wrote:

Thanks a lot Rolf, I knew it would be possible to do it that way but I
was just curious to know if the ellipse package was offering such a
feature. Thanks again for you help!


Well, since the help for ellipse makes no mention of such a
facility it is *highly* unlikely that there would be one.  And in
general in R, if there is no obvious built-in way of accomplishing
a task, it is probably faster to roll your own rather than expending
time on fruitless searching.  (I have a vague recollection of there
being a fortune along similar lines, but I can't find it.)

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] survival analysis: interval censored data

2011-09-28 Thread Terry Therneau

David,
  Thanks for taking time to answer this.

   When there are interval censored data, computation of the MLE first
finds a minimal number of time points such that every interval with a 
death contains one of those points.  Those are the points at which the
final curve can have a jump.  The algorithm, more or less is:
1. On the real line, put a pair of parenthesis down for each
interval with an event ( at the left end ) at the right.
2. Now scan down the line for () pairs (pairs with nothing else in
the enclosed interval): the center of each will be one of the jump
points.

  With slight variations for left truncation.

 Terry T.

On Wed, 2011-09-28 at 16:33 -0400, David Winsemius wrote:
 On Sep 28, 2011, at 10:56 AM, Ruth Arias wrote:
 
 
 
  hallo terry:
 
  I attached araceae data set,
 
 The usual survival analysis via the Kaplan-Meier method only make  
 estimates at the time of events. When you tabulate your data, you see  
 that there were no events for the missing (starting) time rows in  
 those categories during the intervals that you are questioning as  
 missing:
 
 xtabs( ~ time+time2+categoria+event, data=araceae)
 , , categoria = C, event = 0
 
time2
 time   2005 2006 2007 2008 2009 2010
20040   23131   22
2005000000
200700004   19
2008000000
2009000000
 
 , , categoria = E, event = 0
 
time2
 time   2005 2006 2007 2008 2009 2010
20040   22073   21
2005001100
200700000   29
2008000000
2009000001
 
 
 , , categoria = C, event = 1
 
time2
 time   2005 2006 2007 2008 2009 2010
2004052303
2005000000
2007000232
2008000010
2009000000
 
 , , categoria = E, event = 1
 
time2
 time   2005 2006 2007 2008 2009 2010
2004721134
2005000100
2007000313
2008000000
2009000000
 
 
  when I use this:
 
  surara-survfit(Surv(time,time2,event)~categoria)
 
  Call: survfit(formula = Surv(time, time2, event) ~ categoria)
 
  records n.max n.start events median 0.95LCL 0.95UCL
  categoria=C  9463   0 21 NA  NA  NA
  categoria=E 11177   0 26 NA  NA  NA
  summary(surara)
  Call: survfit(formula = Surv(time, time2, event) ~ categoria)
 
  categoria=C
   time n.risk n.event entered censored survival std.err lower 95% CI  
  upper 95% CI
   2006 63   5   0   230.921  0.0341 
  0.8560.990
   2007 35   2  3010.868  0.0483 
  0.7780.968
   2008 62   5   130.798  0.0536 
  0.7000.910
   2009 55   4   050.740  0.0570 
  0.6360.861
   2010 46   5   0   410.660  0.0611 
  0.5500.791
 
  categoria=E
   time n.risk n.event entered censored survival std.err lower 95% CI  
  upper 95% CI
   2005 71   7   300.901  0.0354 
  0.8350.973
   2006 67   2   0   220.875  0.0391 
  0.8010.955
   2007 43   1  3610.854  0.0432 
  0.7740.943
   2008 77   5   080.799  0.0469 
  0.7120.896
   2009 64   4   130.749  0.0502 
  0.6570.854
   2010 58   7   0   510.658  0.0545 
  0.5600.774
 
 You see that your first survfit object is offering a simple sum of  
 'time2' columns of that tabulation as its 'n.event' values. It's  
 'n.risk' tabulation is not taking note of whether a case started in  
 any particular prior interval. The n.risk sum appears to be the sum of  
 persons surviving from the prior year less any decedents plus any  
 entrants as reflected in future events on that rowYou notice  
 that there are missing years even in that report: 2004,2005 for  
 category C and 2004 for category E since there are no events in  
 columns for those 'time2' values.
 
 
  but whe I included type=interval,
 
  suraraint- 
  survfit(Surv(time,time2,event,type='interval')~categoria) # falta  
  arreglar lo del intervalo!!!
  summary(suraraint)
  Call: survfit(formula = Surv(time, time2, event, type = interval) ~
  categoria)
 
  categoria=C
   time n.risk n.event survival std.err lower 95% CI upper 95% CI
   2004  95.00   13.140.862  0.03540.7950.934
   2007  31.867.190.667  0.06950.5440.818
   2008

[R] Multiplying a list of matrices with a vector

2011-09-28 Thread Mendolia, Franco

Hi all!

I have a list of matrices and I want to multiply the ith element of the list 
with the ith element of a another vector. That is,

 LL - list(A=diag(3),B=diag(3),C=diag(3))
 vec - 1:3
 for(i in 1:3)
+ {
+   LL[[i]] - LL[[i]]*vec[i]
+ }
 LL
$A
 [,1] [,2] [,3]
[1,]100
[2,]010
[3,]001

$B
 [,1] [,2] [,3]
[1,]200
[2,]020
[3,]002

$C
 [,1] [,2] [,3]
[1,]300
[2,]030
[3,]003

Is there another (more efficient) way of doing this without using loops? I 
found an older entry in this list with a similar problem, where the list 
elements were always multiplied with the same number, e.g., lapply(LL, 
function(x) x*3) and I was looking for something similar.

Best,
Franco
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiplying a list of matrices with a vector

2011-09-28 Thread R. Michael Weylandt michael.weyla...@gmail.com

Untested:

mapply('*', LL, vec)

Those should be backticks but I can't type them on my phone

Michael

On Sep 28, 2011, at 6:04 PM, Mendolia, Franco fmendo...@mcw.edu wrote:

 Hi all!
 
 I have a list of matrices and I want to multiply the ith element of the list 
 with the ith element of a another vector. That is,
 
 LL - list(A=diag(3),B=diag(3),C=diag(3))
 vec - 1:3
 for(i in 1:3)
 + {
 +   LL[[i]] - LL[[i]]*vec[i]
 + }
 LL
 $A
 [,1] [,2] [,3]
 [1,]100
 [2,]010
 [3,]001
 
 $B
 [,1] [,2] [,3]
 [1,]200
 [2,]020
 [3,]002
 
 $C
 [,1] [,2] [,3]
 [1,]300
 [2,]030
 [3,]003
 
 Is there another (more efficient) way of doing this without using loops? I 
 found an older entry in this list with a similar problem, where the list 
 elements were always multiplied with the same number, e.g., lapply(LL, 
 function(x) x*3) and I was looking for something similar.
 
 Best,
 Franco
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Testing for arguments in a function

2011-09-28 Thread Paul Johnson

On Mon, Sep 26, 2011 at 2:39 PM, Gene Leynes gley...@gmail.com wrote:
 I don't understand how this function can subset by i when i is missing

 ## My function:
 myfun = function(vec, i){
    ret = vec[i]
    ret
 }

 ## My data:
 i = 10
 vec = 1:100

 ## Expected input and behavior:
 myfun(vec, i)

 ## Missing an argument, but error is not caught!
 ## How is subsetting even possible here???
 myfun(vec)


Hello, Gene:

It seems to me the discussion of your question launches off into a
more abstract direction than we need here.  I've found it is wise to
name arguments to functions differently than variables in the
environment, so you don't have the function looking for i outside
itself.  And you can put each variable to a ridiculous default to
force an error when that option is not given.  NAN is nothing here,
just a string that has no meaning, so we get an error as you said you
wanted.


myfun - function(ivec, ii=NAN){
ivec[ii]
}
myfun(1:40,10)

works

myfun(1:40)

Produces

Error in myfun(1:40) : object 'NAN' not found

If you are happy enough to just plop out an error, there's no need to worry.

Note the function can be written more succinctly than you originally
had it, and you are generally advised to use - rather than = by
the R developers.




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] PCA: prcomp rotations

2011-09-28 Thread Colin Wahl

Hi all,
I think I may be confused by different people/programs using the word
rotation differently.

Does prcomp not perform rotations by default?
If I understand it correctly retx=TRUE returns ordinated data, that I can
plot for individual samples (prcomp()$x: which is the scaled and centered
(rotated?) data multiplied by loadings).

What does it mean that the data is rotated from the ?prcomp description?
Is this referring to the data matrix orientation (i.e. looking at
differences among samples (columns) based on variables (rows) vs.
differences among variables (columns) based on samples(rows))?

Thank you,
Colin Wahl
Graduate student,
Western Washington University


code  background:
I am looking at the ordination of abiotic stream variables between different
sampling locations.

   abiot.pca=prcomp(all24[, c(10, 13:18)], retx=TRUE, center=TRUE, scale=
TRUE)


   summary(abiot.pca)

Importance of components:

  PC1PC2PC3PC4 PC5 PC6 PC7

Standard deviation 1.5925 1.0814 1.0697 0.9536 0.76624 0.68444 0.43037

Proportion of Variance 0.3623 0.1671 0.1635 0.1299 0.08387 0.06692 0.02646

Cumulative Proportion  0.3623 0.5294 0.6928 0.8227 0.90662 0.97354 1.0


loadings[,1:3]

 PC1 PC2 PC3

avg.dmax  -0.1879223  0.55792480 -0.04962935

scond.med -0.4327223  0.04779998 -0.43369560

docon.med  0.1976094 -0.30384127  0.67222550

cb.per 0.4134302 -0.17550281 -0.40171318

gc.per 0.4136933 -0.26997129 -0.39960398

gf.per 0.3419349  0.63917223  0.16174661

fine.per  -0.5285840 -0.28616200  0.10168155

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Accessing data via url

2011-09-28 Thread moldovean

hi
I think this already a closed topic since you figure it out yourself.
but you can always try to fetch data from Google docs (first make the
spreadsheet public) and then writing this snippet:


library(RCurl)
u=https://docs.google.com/spreadsheet/pub?hl=en_UShl=en_USkey=0As6HUAxhy0Q7dDB0bjh4T2RyS3pIQkdXdGc2U0ZZc3csingle=truegid=0output=csv;

content = getBinaryURL(u, ssl.verifypeer = FALSE)
tmp = tempfile()
write(rawToChar(content), file = tmp)
mydata- read.csv(gzfile(tmp))
mydata

it works for me and hopefully it'll work nor newbies like me too :)

--
View this message in context: 
http://r.789695.n4.nabble.com/Accessing-data-via-url-tp3178094p3853451.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plotting Lines Through multiple groups

2011-09-28 Thread clara_eco

Thank you Eik Vettorazzi-2
That was exactly what I wanted!
Regards
Clara


--
View this message in context: 
http://r.789695.n4.nabble.com/Plotting-Lines-Through-multiple-groups-tp3850099p3853753.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to Store the executed values in a dataframe rle function

2011-09-28 Thread jim holtman

I only used textConnection for the sample data.  Just put your file
name in the read.table; e.g.,

x-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric'))

as you have in your email.  I used 'x' in my code, so I replaced your
'm' with 'x'.

Try it and see if it works; no reason it shouldn't.



On Wed, Sep 28, 2011 at 3:03 PM, viritha k virith...@gmail.com wrote:
 Hi Jim,
  Thanks for the reply, ok but I dont want to use textConnection and paste
 each line but want the input to be read from a file like
 m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric').
 So how do I incorporate that in your code.
 Thanks,
 Suji
 On Wed, Sep 28, 2011 at 2:40 PM, jim holtman jholt...@gmail.com wrote:

 The solution that I sent will handle the 150 different samples; just
 list the column names in the argument to the top 'lapply'.  You don't
 need the 'rle' in my approach.

 On Wed, Sep 28, 2011 at 2:13 PM, viritha k virith...@gmail.com wrote:
  Hi,
  This is the code that I wrote for 3 samples:
  code:

  m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric'))
 
 
  s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]]))
 
  names(s)=c(Values,Probes)
 
 
  c-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE)
  G=1
  n=4
 
  for(i in 1:length(s$Probes)){
 
  + if(G==1){c[i,1]-names(m[n])
  + c[i,2]-unique(m$Chr[G:s$Probes[i]])
  + c[i,3]-min(m$Start[G:s$Probes[i]])
  + c[i,4]-max(m$End[G:s$Probes[i]])
  + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
 
  + G=(G+s$Probes[i])}
  + else if((G-1)  length(m$Sample1)) {
 
  + c[i,1]-names(m[n])
  + c[i,2]-unique(m$Chr[G:(G+s$Probes[i]-1)])
  + c[i,3]-min(m$Start[G:(G+s$Probes[i]-1)])
  + c[i,4]-max(m$End[G:(G+s$Probes[i]-1)])
  + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
 
  + G=(G+s$Probes[i])}
  + else {
  + G=1
 
  + n=n+1
  +  c[i,1]-names(m[n])
  + c[i,2]-unique(m$Chr[G:s$Probes[i]])
  + c[i,3]-min(m$Start[G:s$Probes[i]])
  + c[i,4]-max(m$End[G:s$Probes[i]])
  + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i])
 
  + G=(G+s$Probes[i])}}
 
  c
 
      Sample  Chr    Start  End Values Probes
 
  1  Sample1 chr2  9896633 14404502  0  4
  2  Sample1 chr2 14421718 16048724  -0.43  4
  3  Sample1 chr2 37491676 37703009  0  2
  4  Sample2 chr2  9896633  9896690  0  2
  5  Sample2 chr2 14314039 16048724  -0.35  6
  6  Sample2 chr2 37491676 37703009  0  2
  7  Sample3 chr2  9896633 14314098  0  3
  8  Sample3 chr2 14404467 16031769   0.32  3
  9  Sample3 chr2 16036178 37491735   0.45  3
  10 Sample3 chr2 37702947 37703009  0  1
 
 
  The problem that I am facing is for expanding rle function for values
  and
  probes.
  Defintely your code looks simpler, but I would like to read the file by
  just
  giving the name of the file as written in my code because my original
  file
  contains 150 samples,but how to use lapply or rle function for 150 such
  samples, if my file contain 150 samples similiar to sample1 and sample2.
 
  waiting for your reply,
  Thanks,
  Suji
 
  On Wed, Sep 28, 2011 at 11:37 AM, jim holtman jholt...@gmail.com
  wrote:
 
  Here one approach:
 
   x - read.table(textConnection(Chr start end sample1 sample2
  + chr2 9896633 9896683 0 0
  + chr2 9896639 9896690 0 0
  + chr2 14314039 14314098 0 -0.35
  + chr2 14404467 14404502 0 -0.35
  + chr2 14421718 14421777 -0.43 -0.35
  + chr2 16031710 16031769 -0.43 -0.35
  + chr2 16036178 16036237 -0.43 -0.35
  + chr2 16048665 16048724 -0.43 -0.35
  + chr2 37491676 37491735 0 0
  + chr2 37702947 37703009 0 0), header = TRUE, as.is = TRUE)
   closeAllConnections()
  
   result - lapply(c('sample1', 'sample2'), function(.samp){
  +     # split by breaks in the values
  +     .grps - split(x, cumsum(c(0, diff(x[[.samp]]) != 0)))
  +
  +     # combine the list of dataframes
  +     .range - do.call(rbind, lapply(.grps, function(.set){
  +         # create a dataframe of the results
  +         data.frame(Sample = .samp
  +                    , Chr = .set$Chr[1L]
  +                    , Start = min(.set$start)
  +                    , End = max(.set$end)
  +                    , Values = .set[[.samp]][1L]
  +                    , Probes = nrow(.set)
  +                    )
  +         }))
  +     })
   # put the list of dataframes together
   result - do.call(rbind, result)
   result
     Sample  Chr    Start      End Values Probes
  0  sample1 chr2  9896633 14404502   0.00      4
  1  sample1 chr2 14421718 16048724  -0.43      4
  2  sample1 chr2 37491676 37703009   0.00      2
  01 sample2 chr2  9896633  9896690   0.00

Re: [R] htmlParse hangs or crashes

2011-09-28 Thread Jorge Cornejo

Hi, I been having the same problem all the afternoon, and I just realize that
only happens when I use the R64 and is not crashing using the 32 bits
version.

This must be a bug in this R version.

I hope this could helps.

--
View this message in context: 
http://r.789695.n4.nabble.com/htmlParse-hangs-or-crashes-tp3792285p3853858.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] survival analysis: interval censored data

2011-09-28 Thread Ruth Arias



Thanks for taking time to answer this







De: Terry Therneau thern...@mayo.edu
Para: David Winsemius dwinsem...@comcast.net

.org
Enviado: miércoles 28 de septiembre de 2011 23:03
Asunto: Re: [R] survival analysis: interval censored data

David,
  Thanks for taking time to answer this.

   When there are interval censored data, computation of the MLE first
finds a minimal number of time points such that every interval with a 
death contains one of those points.  Those are the points at which the
final curve can have a jump.  The algorithm, more or less is:
    1. On the real line, put a pair of parenthesis down for each
interval with an event ( at the left end ) at the right.
    2. Now scan down the line for () pairs (pairs with nothing else in
the enclosed interval): the center of each will be one of the jump
points.

  With slight variations for left truncation.

Terry T.

On Wed, 2011-09-28 at 16:33 -0400, David Winsemius wrote:
 On Sep 28, 2011, at 10:56 AM, Ruth Arias wrote:
 
 
 
  hallo terry:
 
  I attached araceae data set,
 
 The usual survival analysis via the Kaplan-Meier method only make  
 estimates at the time of events. When you tabulate your data, you see 
 that there were no events for the missing (starting) time rows in  
 those categories during the intervals that you are questioning as  
 missing:
 
 xtabs( ~ time+time2+categoria+event, data=araceae)
 , , categoria = C, event = 0
 
        time2
 time   2005 2006 2007 2008 2009 2010
    2004    0   23    1    3    1   22
    2005    0    0    0    0    0    0
    2007    0    0    0    0    4   19
    2008    0    0    0    0    0    0
    2009    0    0    0    0    0    0
 
 , , categoria = E, event = 0
 
        time2
 time   2005 2006 2007 2008 2009 2010
    2004    0   22    0    7    3   21
    2005    0    0    1    1    0    0
    2007    0    0    0    0    0   29
    2008    0    0    0    0    0    0
    2009    0    0    0    0    0    1
 
 
 , , categoria = C, event = 1
 
        time2
 time   2005 2006 2007 2008 2009 2010
    2004    0    5    2    3    0    3
    2005    0    0    0    0    0    0
    2007    0    0    0    2    3    2
    2008    0    0    0    0    1    0
    2009    0    0    0    0    0    0
 
 , , categoria = E, event = 1
 
        time2
 time   2005 2006 2007 2008 2009 2010
    2004    7    2    1    1    3    4
    2005    0    0    0    1    0    0
    2007    0    0    0    3    1    3
    2008    0    0    0    0    0    0
    2009    0    0    0    0    0    0
 
 
  when I use this:
 
  surara-survfit(Surv(time,time2,event)~categoria)
 
  Call: survfit(formula = Surv(time, time2, event) ~ categoria)
 
              records n.max n.start events median 0.95LCL 0.95UCL
  categoria=C      94    63       0     21     NA      NA      NA
  categoria=E     111    77       0     26     NA      NA      NA
  summary(surara)
  Call: survfit(formula = Surv(time, time2, event) ~ categoria)
 
                  categoria=C
   time n.risk n.event entered censored survival std.err lower 95% CI  
  upper 95% CI
   2006     63       5       0       23    0.921  0.0341        
  0.856        0.990
   2007     35       2      30        1    0.868  0.0483        
  0.778        0.968
   2008     62       5       1        3    0.798  0.0536        
  0.700        0.910
   2009     55       4       0        5    0.740  0.0570        
  0.636        0.861
   2010     46       5       0       41    0.660  0.0611        
  0.550        0.791
 
                  categoria=E
   time n.risk n.event entered censored survival std.err lower 95% CI  
  upper 95% CI
   2005     71       7       3        0    0.901  0.0354        
  0.835        0.973
   2006     67       2       0       22    0.875  0.0391        
  0.801        0.955
   2007     43       1      36        1    0.854  0.0432        
  0.774        0.943
   2008     77       5       0        8    0.799  0.0469        
  0.712        0.896
   2009     64       4       1        3    0.749  0.0502        
  0.657        0.854
   2010     58       7       0       51    0.658  0.0545        
  0.560        0.774
 
 You see that your first survfit object is offering a simple sum of  
 'time2' columns of that tabulation as its 'n.event' values. It's  
 'n.risk' tabulation is not taking note of whether a case started in  
 any particular prior interval. The n.risk sum appears to be the sum of 
 persons surviving from the prior year less any decedents plus any  
 entrants as reflected in future events on that row    You notice 
 that there are missing years even in that report: 2004,2005 for  
 category C and 2004 for category E since there are no events in  
 columns for those 'time2' values.
 
 
  but whe I included type=interval,
 
  suraraint- 
  survfit(Surv(time,time2,event,type='interval')~categoria) # falta 
[[elided Yahoo spam]]
  summary(suraraint)
  Call: survfit(formula = Surv(time, time2, event, type = interval) ~

Re: [R] How does the survfit.coxph calculate the survival value?

2011-09-28 Thread koshihaku

Dear Professor Terry Therneau,
Sorry for another naive question about R.

Some one suggested me to use mathematica to analyze Cox proportional hazard
model. For example estimate the coefficients, and calculate the survival and
hazard function.
But I think that may be not possible, especially that all the six covariates
in my model are time-dependent.

Could you please give me any opinion on this?

Koshihaku

--
View this message in context: 
http://r.789695.n4.nabble.com/How-does-the-survfit-coxph-calculate-the-survival-value-tp3847179p3853994.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How does the survfit.coxph calculate the survival value?

2011-09-28 Thread koshihaku

Thank you very much!

I am analyzing software stress test data using R, and fiting data by coxph.
I think the survfit.coxph can give the estimated baseline survival S_0(t_i),
and we can calculated the survival S(t_i) by using S_0(t_i) and the x(t_i). 

Then can I use the function lambda={S(t_(i-1))-S(t_i)}/S(t_(i-1)) to
calculate the hazard at time t_i ?

Koshihaku

--
View this message in context: 
http://r.789695.n4.nabble.com/How-does-the-survfit-coxph-calculate-the-survival-value-tp3847179p3853923.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

94 matches

Mail list logo