[R] Strange behavior with poisson and glm

2010-03-02 Thread Noah Silverman

Hi,

I'm just learning about Poisson links for the glm function.

One of the data sets I'm playing with has several of the variables as 
factors (e.g. month, group, etc.)


When I call the glm function with a formula that has a factor variable, 
R automatically converts the variable to a series of variables with 
unique names and binary values.


For example, with this pseudo data:

y    v1   month
2    1    january
3    1.4  february
1.5  6.3  february
1.2  4.5  january
5.5  4.0  march

I use this call:

m <- glm(y ~ v1 + month, family=poisson)

R gives me back a model with variables of
Intercept
v1
monthJanuary
monthFebruary
monthMarch

I'm concerned that this might be doing some strange things to my model.  
Can anyone offer some enlightenment?


Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] two questions for R beginners

2010-03-02 Thread Karl Ove Hufthammer
On Tue, 2 Mar 2010 08:58:25 +1300 Peter Alspach 
peter.alsp...@plantandfood.co.nz wrote:
 This brings up another confusion for new users.  Simply typing the
 object name at the command line gives just one view of the object (that
 provided by print()).

Good point. Any good introduction to R should include a brief discussion 
of 'str'. But sometimes even 'str' can keep you from discovering the 
real underlying structure of an object, e.g. for data frames. The 
solution is to use 'unclass' first.
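A small sketch of the difference between these views, using a made-up data frame (the object names are illustrative only): print() shows the table, str() summarises it as a data frame, and str(unclass(...)) exposes the named list and attributes underneath.

```r
# A small data frame with a factor column (made-up example data)
df <- data.frame(x = 1:3, g = factor(c("a", "b", "a")))

print(df)          # the tabular view you get by typing the name at the prompt
str(df)            # compact summary: 'data.frame': 3 obs. of 2 variables
str(unclass(df))   # the underlying named list, plus its row.names attribute
str(unclass(df$g)) # a factor is an integer vector of codes plus a levels attribute
```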

-- 
Karl Ove Hufthammer



Re: [R] file reading /problems with encoding

2010-03-02 Thread T . Wunder

Quoting Uwe Ligges lig...@statistik.tu-dortmund.de:


R is not able to re-encode the file to the native encoding. But if you
keep it in UTF-8, what is the problem with grepping for the specific
characters (as grep and friends support the argument useBytes these
days)?



The problem with UTF-8 is that I'm not able to cat() a valid XML file.
Using the encoding="UTF-8" option in either the file() or the
readLines() command will cause an error. If I leave out both,
it's not possible for me to run a gsub command on the string, because
of special characters, even with the useBytes option turned on:

grep("über 40%", xml, useBytes=TRUE)
will return integer(0). And the problem is obvious:
when the file was read in, the "ü" was garbled (mis-encoded).
However, perhaps I did not use the useBytes option in the right
way?
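A minimal sketch of the byte-wise matching Uwe mentions, assuming a current R version and a file whose bytes really are UTF-8 (the temporary file and its contents are made up here). readLines(encoding = "UTF-8") only marks the strings as UTF-8; it does not re-encode them, so a pattern written in UTF-8 can then match byte for byte with useBytes = TRUE.

```r
# Write a small file whose bytes are UTF-8 (the "ü" is given as a Unicode escape)
path <- tempfile(fileext = ".xml")
txt <- enc2utf8("<item>\u00fcber 40%</item>")
writeLines(txt, path, useBytes = TRUE)   # write the UTF-8 bytes unchanged

# Read it back, declaring that the strings are UTF-8 (this marks them,
# it does not re-encode them)
xml <- readLines(path, encoding = "UTF-8")

# Pattern and text are both UTF-8 bytes, so a byte-wise fixed match can succeed
hits <- grep("\u00fcber 40%", xml, fixed = TRUE, useBytes = TRUE)
hits
```

If the pattern and the text end up in different encodings (for example, a Latin-1 pattern typed in a non-UTF-8 session), a byte-wise match will fail even though both print the same.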


Thanks a lot for your help!

Best regards, Tom



[R] embedded nuls in 2.10 versus 2.11

2010-03-02 Thread Brandon Whitcher
I have been reading binary files, and parsing the output, for some
time now.  I have tried to develop a technique that is as robust as
possible to all the strange things that appear in text fields, not to
mention different global/regional encodings.  I have no control over
the data generated by users, so I would like to be as flexible and
accommodating as possible.  The following code is straightforward, but
will fail with embedded nuls in R >= 2.10:

fid <- file(filename, "rb")
readChar(fid, nchars=10)
close(fid)

Previous suggestions from the R-help list led me to consider

fid <- file(filename, "rb")
rawToChar(readBin(fid, "raw", 10))
close(fid)

or even

fid <- file(filename, "rb")
iconv(rawToChar(readBin(fid, "raw", 10)), to="UTF-8")
close(fid)

to ensure that my output is well behaved.  With the new error
handling in rawToChar() in R >= 2.11, embedded nuls are no longer
allowed except at the end of the string.  I run across these all the
time in my user data.  How can I recover as much of the text as
possible when reading from a binary file with embedded nuls in R >=
2.11, and keep the code backwards compatible with R < 2.11?
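One common workaround is to handle the nul bytes at the raw level, before rawToChar() ever sees them. The sketch below is an assumption-laden illustration, not an established API: readTextField() is a hypothetical helper name, and the 10-byte test field is made up. It relies only on readBin() and rawToChar().

```r
# Hypothetical helper: read n bytes from a binary connection and return them
# as a string, handling embedded nul bytes before calling rawToChar()
readTextField <- function(con, n, replacement = NULL) {
  bytes <- readBin(con, what = "raw", n = n)
  nul <- bytes == as.raw(0)
  if (is.null(replacement)) {
    bytes <- bytes[!nul]                      # drop every nul byte
  } else {
    bytes[nul] <- charToRaw(replacement)[1]   # replace each nul with one byte
  }
  rawToChar(bytes)
}

# Usage: a 10-byte field "abc<nul>def<nul><nul><nul>" written to a temp file
path <- tempfile()
writeBin(c(charToRaw("abc"), as.raw(0), charToRaw("def"), as.raw(c(0, 0, 0))), path)

con <- file(path, "rb")
dropped <- readTextField(con, 10)        # "abcdef"
close(con)

con <- file(path, "rb")
spaced <- readTextField(con, 10, " ")    # "abc def   "
close(con)
```

Dropping nuls keeps the text compact; replacing them preserves field positions, which may matter if offsets are parsed later. The result can still be passed through iconv() as in the code above.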

thanks...

Brandon



Re: [R] Strange behavior with poisson and glm

2010-03-02 Thread Ted Harding
On 02-Mar-10 08:02:27, Noah Silverman wrote:
 Hi,
 I'm just learning about Poisson links for the glm function.
 
 One of the data sets I'm playing with has several of the
 variables as factors (e.g. month, group, etc.)
 
 When I call the glm function with a formula that has a factor
 variable, R automatically converts the variable to a series of
 variables with unique names and binary values.
 
 For example, with this pseudo data:
 
 y    v1   month
 2    1    january
 3    1.4  february
 1.5  6.3  february
 1.2  4.5  january
 5.5  4.0  march
 
 I use this call:
 
 m <- glm(y ~ v1 + month, family=poisson)
 
 R gives me back a model with variables of
 Intercept
 v1
 monthJanuary
 monthFebruary
 monthMarch
 
 I'm concerned that this might be doing some strange things
 to my model.
 Can anyone offer some enlightenment?
 Thanks!

The creation of auxiliary variables is the way to incorporate
a factor variable into a model. These are usually called
dummy variables, and are essentially indicator variables.

Your data above would correspond to variables I (for Intercept),
J (for January), F (for February) and M (for March) in addition
to the other variables y and v1 as below:

  y    v1   I   J   F   M   #   month
  2    1    1   1   0   0   #  january
  3    1.4  1   0   1   0   #  february
  1.5  6.3  1   0   1   0   #  february
  1.2  4.5  1   1   0   0   #  january
  5.5  4.0  1   0   0   1   #  march

The linear predictor L in the model for y would then be

  L = a*I + b*v1 + c1*J + c2*F + c3*M

evaluated arithmetically; e.g. for row 2 of the data it is

  a + b*1.4 + c2

However, as given, J + F + M = I, so the variables are redundant:
the four columns I, J, F and M carry only three independent values
(not so if you exclude the Intercept using the model formula
y ~ v1 + month - 1). R will therefore provide estimates which
are computed in terms of some pattern of differences between
these four variables, called contrasts. Different patterns of
difference give different representations of the three
independent aspects.

There are many different kinds of contrasts available.
One of these will be chosen as default by R (depending in
particular on whether the factor variable is being used
as an ordered factor or an unordered factor). See ?contrasts
for an outline of what is there, ?contrast for more detail,
and look at the help for particular contrasts such as
?contr.helmert, ?contr.poly, ?contr.sum, ?contr.treatment.
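The dummy coding R builds can be inspected directly with model.matrix(). The sketch below re-enters the pseudo data from the original post (treating "februrary" as "february") and assumes R's default contrasts, i.e. contr.treatment for unordered factors, under which the alphabetically first level becomes the baseline and gets no column of its own.

```r
# Pseudo data from the original post ("februrary" typo treated as "february")
d <- data.frame(
  y     = c(2, 3, 1.5, 1.2, 5.5),
  v1    = c(1, 1.4, 6.3, 4.5, 4.0),
  month = factor(c("january", "february", "february", "january", "march"))
)

# With an intercept: treatment contrasts drop the baseline level
# ("february", first alphabetically), leaving two month dummies
X <- model.matrix(y ~ v1 + month, data = d)
colnames(X)
# "(Intercept)" "v1" "monthjanuary" "monthmarch"

# Without an intercept: one indicator column per month (F, J, M in the table)
X0 <- model.matrix(y ~ v1 + month - 1, data = d)
colnames(X0)
# "v1" "monthfebruary" "monthjanuary" "monthmarch"

# The three indicators always sum to 1, i.e. J + F + M = I
rowSums(X0[, c("monthfebruary", "monthjanuary", "monthmarch")])
```

Setting a different contrast (e.g. via contrasts(d$month) <- contr.sum(3)) changes the columns, but not the fitted values of the model.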

After all that: No, R is not doing strange things to your model!

ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 02-Mar-10   Time: 08:47:11
-- XFMail --



Re: [R] Simple Linear Autoregressive Model with R Language

2010-03-02 Thread Paul Hiemstra

Emil Davtyan wrote:

Hello -

I need to fit a simple linear autoregressive model with R for my
thesis. I looked through all your documentation and was not able to find
anything very helpful. Can someone help me with the code?

Thanks
Emil


  
  

Hi,

Google "R ar model"; the first hit is:

http://stat.ethz.ch/R-manual/R-patched/library/stats/html/ar.html
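A minimal sketch of ar() from the stats package, assuming simulated data in place of the thesis data: 200 values from an AR(1) process with coefficient 0.6, fitted with the order fixed at 1 rather than chosen by AIC.

```r
set.seed(1)
# Simulated data (an assumption for illustration): 200 values from an AR(1)
# process with coefficient 0.6
x <- arima.sim(model = list(ar = 0.6), n = 200)

# Fit a model of fixed order 1; ar() uses Yule-Walker estimation by default
fit <- ar(x, aic = FALSE, order.max = 1)
fit$ar           # estimated AR coefficient
fit$x.mean       # estimated mean that ar() removed before fitting
```

Leaving aic = TRUE (the default) lets ar() pick the order itself, up to order.max.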

cheers,
Paul

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul



Re: [R] Simple Linear Autoregressive Model with R Language

2010-03-02 Thread Achim Zeileis



On Mon, 1 Mar 2010, Emil Davtyan wrote:


Hello -

I need to fit a simple linear autoregressive model with R for my
thesis. I looked through all your documentation and was not able to find
anything very helpful. Can someone help me with the code?


By "all documentation", do you mean that you have also looked at the time 
series and econometrics task views, which contain information on that 
topic? See


  http://CRAN.R-project.org/view=TimeSeries
  http://CRAN.R-project.org/view=Econometrics

In particular ar() (or maybe arima()) in the basic stats package seems to 
be what you are looking for. Packages FitAR or dynlm might also be 
useful.
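For the arima() route, a sketch on simulated data (an assumption, not the poster's series): order = c(1, 0, 0) specifies an AR(1) with a mean term, and predict() on the fit gives forecasts with standard errors.

```r
set.seed(42)
# Simulated AR(1) series with coefficient 0.5 (assumed example data)
x <- arima.sim(model = list(ar = 0.5), n = 300)

# order = c(p, d, q) = c(1, 0, 0) is an AR(1) with a mean ("intercept") term;
# the default estimation method is "CSS-ML"
fit <- arima(x, order = c(1, 0, 0))
coef(fit)                         # named "ar1" and "intercept"
fc <- predict(fit, n.ahead = 5)   # list with $pred (forecasts) and $se
```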


Best,
Z


Thanks
Emil





Re: [R] the predict.lda function

2010-03-02 Thread Gavin Simpson
On Mon, 2010-03-01 at 16:55 -0500, Diana Connett wrote:
 Hello.
 I just downloaded R onto a new computer, and after entering library(MASS), I
 still get the message Error: could not find function "predict.lda" when I
 try to use the predict.lda function (even just predict.lda())
 
 Can anyone help me out?

Stop calling it directly, use the generic predict() instead. The reason
predict.lda can't be found is that it is hidden in a package NAMESPACE:

> require(MASS)
Loading required package: MASS
> predict.lda()
Error: could not find function "predict.lda"
> methods(predict)
 [1] predict.ar*                predict.Arima*            
 [3] predict.arima0*            predict.glm               
 [5] predict.glmmPQL*           predict.HoltWinters*      
 [7] predict.lda*               predict.lm                
 [9] predict.loess*             predict.lqs*              
[11] predict.mca*               predict.mlm               
[13] predict.nls*               predict.polr*             
[15] predict.poly               predict.ppr*              
[17] predict.prcomp*            predict.princomp*         
[19] predict.qda*               predict.rlm*              
[21] predict.smooth.spline*     predict.smooth.spline.fit*
[23] predict.StructTS*         

   Non-visible functions are asterisked

By calling things directly you aren't really using R the way the
developers want you to. You should not need to know that there are all
those predict methods and what their names are etc. You should just need
to check that there is a method for the object/code you are using and
then call the generic function whilst R takes care of everything else.
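A sketch of the generic-function route, assuming the built-in iris data as a stand-in for the poster's data: fit with lda() from MASS, then call plain predict(), which dispatches to the hidden method because the fit has class "lda". getS3method() from utils is another way to look at the hidden method without `:::`.

```r
library(MASS)   # provides lda()

# Fit a linear discriminant analysis on the built-in iris data
fit <- lda(Species ~ ., data = iris)

# Call the generic: R dispatches to the lda method because class(fit) is "lda"
pred <- predict(fit, newdata = iris[c(1, 51, 101), ])
pred$class      # predicted species for the three chosen rows

# Inspect the hidden method itself, if needed
f <- getS3method("predict", "lda")
```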

If you *must* call it directly:

MASS:::predict.lda()

See ?`:::`

HTH

G

 Thank you!
 
 Diana Connett
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Changepoints estimation in a data series

2010-03-02 Thread Achim Zeileis

On Mon, 1 Mar 2010, FMH wrote:


Dear All,

I'm trying to find changepoints in a data series which consists of only 11 
measurements of altitude (x) and temperature (y); the data are as 
follows:

y = 16.3, 16.2, 16.1, 15.6, 14.2, 10, 8.2,  8.0, 7.5, 7.3, 7.2
x = 1,  2,  5, 10, 15, 20, 25, 30, 40, 50, 60

From the above series, I reckon there is more than one changepoint, and I 
presume there is a package in R which would enable the estimation of 
such changepoints.


It depends on what exactly you mean by changepoint, especially because the 
curve looks more like a sigmoid than a clear-cut change. Maybe the values 
have been averaged already.


In any case, some useful methods might include:
  o maxstat_test() in package coin for changepoint estimation via
maximally selected statistics
  o breakpoints() in package strucchange for OLS estimation of two
separate constant means
  o segmented() in package segmented for OLS estimation of a broken
line trend

If you are looking for two separate constant means, I would probably 
employ the maximally selected statistics in this situation:


## data
x <- c(1, 2, 5, 10, 15, 20, 25, 30, 40, 50, 60)
y <- c(16.3, 16.2, 16.1, 15.6, 14.2, 10, 8.2, 8.0, 7.5, 7.3, 7.2)
plot(y ~ x, type = "b")

## test
library(coin)
maxstat_test(y ~ x)

## add estimated changepoint
abline(v = 15, lty = 2)
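As a package-free cross-check of the "two separate constant means" idea, a least-squares search over a single split point can be written in base R. This is a simplified sketch of the principle behind breakpoints(), not strucchange's implementation; the minimum segment length of 2 is an arbitrary assumption. On these data it puts the last point of the first segment at x = 15, in line with the line drawn above.

```r
x <- c(1, 2, 5, 10, 15, 20, 25, 30, 40, 50, 60)
y <- c(16.3, 16.2, 16.1, 15.6, 14.2, 10, 8.2, 8.0, 7.5, 7.3, 7.2)

# Try every split that leaves at least 2 points per segment; for each,
# compute the residual sum of squares of fitting one mean per segment
min_seg <- 2
splits <- min_seg:(length(y) - min_seg)
rss <- sapply(splits, function(k) {
  left  <- y[1:k]
  right <- y[(k + 1):length(y)]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
})

k_hat <- splits[which.min(rss)]   # index of the last point in the first segment
x[k_hat]                          # 15
```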

hth,
Z


Could someone please advise me on this matter using R?

Cheers,
FMH






Re: [R] capturing errors in Sweave

2010-03-02 Thread Sundar Dorai-Raj
Thanks, Berwin. That works just great!

--sundar

On Tue, Mar 2, 2010 at 12:57 AM, Berwin A Turlach
ber...@maths.uwa.edu.auwrote:



Re: [R] repeated measures anova, car package

2010-03-02 Thread Kay Cichini

Hello John,

As you said, I could also fit a means model and test linear hypotheses for
the desired effects. Would this also work for the repeated-measures model I
fitted in the first place?
I copied the model from the car package, where you first call:

> modx <- lm(cbind(div_h, div_l) ~ site, divrep)

(Could I test linear hypotheses here, instead of continuing as I did
below?)

> idat
  cover
1  high
2   low

> (av.ok1 <- Anova(modx, idata=idat, idesign=~cover))

Type II Repeated Measures MANOVA Tests: Pillai test statistic
           Df test stat approx F num Df den Df   Pr(>F)   
site        1   0.49908   9.9631      1     10 0.010220 * 
cover       1   0.28145   3.9169      1     10 0.075984 . 
site:cover  1   0.53963  11.7216      1     10 0.006507 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> divrep
   repl.  site div_h div_l
1  1 Scrub  4.18  5.23
2  2 Scrub  5.47  7.18
3  3 Scrub  3.74  4.97
4  4 Scrub  2.62  5.17
5  5 Scrub  3.33  6.43
6  6 Scrub  1.62  8.96
7  1 Tall_Forb  4.70  3.88
8  2 Tall_Forb  3.65  1.97
9  3 Tall_Forb  2.50  1.19
10 4 Tall_Forb  1.87  2.37
11 5 Tall_Forb  5.33  3.56
12 6 Tall_Forb  3.06  3.60
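Because cover has only two levels, the site:cover term can, as far as I can tell, also be tested as the effect of site in an ordinary linear model for the within-replicate difference div_h - div_l. The base-R sketch below re-enters the divrep data shown above (without the repl. column); on these data it should reproduce the F of about 11.72 reported for site:cover.

```r
divrep <- data.frame(
  site  = factor(rep(c("Scrub", "Tall_Forb"), each = 6)),
  div_h = c(4.18, 5.47, 3.74, 2.62, 3.33, 1.62, 4.70, 3.65, 2.50, 1.87, 5.33, 3.06),
  div_l = c(5.23, 7.18, 4.97, 5.17, 6.43, 8.96, 3.88, 1.97, 1.19, 2.37, 3.56, 3.60)
)

# With two cover levels, the within-subject contrast is just the difference
fit_diff <- lm(I(div_h - div_l) ~ site, data = divrep)
tab <- anova(fit_diff)
tab["site", "F value"]   # about 11.72, matching site:cover in the Anova() output
```

This shortcut only works for a two-level within-subject factor; with more levels the multivariate test is needed.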

Your answers helped a lot -
Thank you very much for the quick reply!

Best wishes,
Kay
-- 
View this message in context: 
http://n4.nabble.com/repeated-measures-anova-car-package-tp1573721p1574747.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Thought I understood factors but??

2010-03-02 Thread Karl Ove Hufthammer
On Mon, 1 Mar 2010 14:23:04 -0500 Liaw, Andy andy_l...@merck.com 
wrote:
 Indeed this is one of the (few, I believe) traps of R,

Oh, no; there are many more:
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf :-)

-- 
Karl Ove Hufthammer



Re: [R] why a text editor?

2010-03-02 Thread Karl Ove Hufthammer
On Mon, 01 Mar 2010 16:26:37 - (GMT) ted.hard...@manchester.ac.uk 
ted.hard...@manchester.ac.uk wrote:
 In vim (to which I'm wedded
 for life) it will pick up matching (), {} and [].

You can also easily move between matching delimiters by typing '%'. A 
similar feature should be available in all *good* text editors.

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-02 Thread Karl Ove Hufthammer
On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca 
wrote:
 Suppose X is a dataframe or a matrix.  What would you expect to get from 
 X[1]?  What about as.vector(X), or as.numeric(X)?

All this of course depends on the type of object one is speaking of. There 
are plenty of surprises available, and it's best to use the most logical 
way of extracting. E.g., to extract the top-left element of a 2D 
structure (data frame or matrix), use 'X[1,1]'.

Luckily, R provides some shortcuts. For example, you can write 'X[2,3]' 
on a data frame, just as if it were a matrix, even though the underlying 
structure is completely different. (This doesn't work on a normal list; 
there you have to type the whole 'X[[3]][2]'.)

The behaviour of the 'as.' functions may sometimes be surprising, at 
least for me. For example, 'as.data.frame' on a named vector gives a 
single-column data frame, instead of a single-row data frame.

(I'm not sure what the recommended way of converting a named vector to a 
one-row data frame is, but 'as.data.frame(t(X))' works, even though both 'X' 
and 't(X)' look like a row of numbers.)
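A short sketch of both points, with made-up values: matrix-style indexing on a data frame versus the equivalent list access, and the shape as.data.frame() produces from a named vector with and without t().

```r
# A small data frame (made-up values)
X <- data.frame(p = 1:3, q = 4:6, r = 7:9)
X[2, 3]              # matrix-style: row 2, column 3 -> 8
X[[3]][2]            # the same element via the list: column 3, element 2

# A named vector becomes a single column by default ...
v <- c(a = 1, b = 2, c = 3)
dim(as.data.frame(v))      # 3 1
# ... while t() first turns it into a 1 x 3 matrix, giving a single row
dim(as.data.frame(t(v)))   # 1 3
```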

 The point is that a dataframe is a list, and a matrix isn't.  If users 
 don't understand that, then they'll be confused somewhere.  Making 
 matrices more list-like in one respect will just move the confusion 
 elsewhere.  The solution is to understand the difference.

My main problem is not understanding the difference, which is easy, but 
knowing which type of object I have when I get the output of a function in a 
package. If I know the object is a named vector or a matrix with column 
names, it's easy enough to type 'X[, "colname"]', and if it's a data 
frame one may use the shortcut 'X$colname'.

Usually, it *is* documented what the return value of a function is, but 
just looking at the output is much faster, and *usually* gives the 
correct answer.

For example, 'mean' applied to a data frame gives a named vector, not a 
data frame, which is somewhat surprising (given that the columns of a 
data frame may be of different types, while the elements of a vector may 
not). (And yes, I know that it's *documented* that it returns a named 
vector.) On the other hand, perhaps it is surprising that 'mean' works 
on data frames at all. :-)

-- 
Karl Ove Hufthammer



[R] lm.influence on glm objects

2010-03-02 Thread Cipollini Fabrizio
Dear R users,
Today I discovered that the function lm.influence() stops when applied to
glm objects, with the following error message:

Error in if (NROW(e) != n) stop("non-NA residual length does not match
cases used in fitting") :
  argument is of length zero

After inspecting lm.influence.R (in both R-2.10.1.tar.gz and
R-patched.tar.gz) I found (line 53) that n is computed as
n <- as.integer(nrow(model$qr$qr))

However, glm objects (unlike lm objects) do not have a $qr
component. Is this intentional, i.e. does it mean that we have to use
lm.influence only with lm objects?
It could be, but I note that the lm.influence{stats} help says:

"The influence.measures() and other functions listed in See Also
provide a more user oriented way of computing a variety of regression
diagnostics. These all build on lm.influence. Note that for GLMs
(other than the Gaussian family with identity link) these are based on
one-step approximations which may be inadequate if a case has high
influence."

Moreover, if we have to use such a function only with lm objects, I
would suggest implementing a more explicit check.
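One way to narrow this down is to check whether the fitted object actually carries the qr component that lm.influence() reads, before calling it. The sketch uses the built-in InsectSprays data as an assumed example, not the poster's model; for an ordinary glm() fit the component is present, so if it is missing in your own fit, the way that model was produced is worth inspecting.

```r
# Example fit on a built-in data set (not the poster's model)
fit <- glm(count ~ spray, family = poisson, data = InsectSprays)

# Check the component that lm.influence() reads before calling it
has_qr <- !is.null(fit$qr)
has_qr

if (has_qr) {
  infl <- lm.influence(fit)
  head(infl$hat)          # leverages (diagonal of the hat matrix)
}
```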

Thanks in advance for any help

Fabrizio Cipollini



Re: [R] Strange behavior with poisson and glm

2010-03-02 Thread Noah Silverman

Ted,

Brilliant explanation (as usual)

I'm back in school, just starting on a post-graduate degree in stats so 
the help is really appreciated.


Now, I have a slightly trickier question about the same model.

I've seen more than one way to get values out of the glm model.

e.g., if we're looking at the 10th item in the dataset
(note: m is the model):

fitted(m)[10]
predict(m, dataset[10,])

give me different results.  From my data, I get the following real results:
> predict(m, data[100,])
     100 
7.727999 
> fitted(m)[100]
     179 
3956.637 

From my understanding, the exp of the prediction should be equal to the 
fitted value.  Here it is not.  I don't understand why.  Any insight?
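Two things seem worth checking here. First, predict() on a glm returns the linear predictor (type = "link", the log scale for Poisson) unless type = "response" is given. Second, the names in the output above differ (100 versus 179), and exp(7.727999) is roughly 2270, not 3956.637, which suggests the two calls refer to different observations. A sketch on simulated data (an assumption; two values of v1 are set to NA so glm() drops those rows) shows that fitted() is indexed by position among the rows used in the fit, so selecting by row name keeps the two in step.

```r
set.seed(10)
# Simulated Poisson data (assumed example, not the poster's data)
n <- 200
dat <- data.frame(v1 = runif(n, 0, 5))
dat$y <- rpois(n, lambda = exp(0.5 + 0.3 * dat$v1))
dat$v1[c(3, 7)] <- NA            # glm() drops these rows (na.action = na.omit)

m <- glm(y ~ v1, family = poisson, data = dat)

# predict() defaults to type = "link": the linear predictor on the log scale
eta <- predict(m, newdata = dat["10", ])
mu  <- predict(m, newdata = dat["10", ], type = "response")   # equals exp(eta)

# fitted() is indexed by position among the rows actually used in the fit
fitted(m)[10]     # position 10 is original row "12", since rows 3 and 7 were dropped
fitted(m)["10"]   # the fitted value for original row "10", equal to mu
```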


-N



On 3/2/10 12:47 AM, (Ted Harding) wrote:



Re: [R] Reading sas7bdat files directly

2010-03-02 Thread Chris Long
The dsread output is little-endian, as that's the native format for 
floats on the Wintel platform.  The byte order should stay the same if 
converting directly to a float, using a data structure like (C/C++):

union {
 char bytes[8];
 double value;
} u;

If reading the values with a SAS HEX informat, the bytes will need to be 
reversed.  It's trivial for me to add an endianness option; 
I'll do that later.
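The byte-order question can be tried out from the R side with writeBin() and readBin(), which take an endian argument. A sketch with made-up values: swap8() is a hypothetical helper that reverses each 8-byte group, which converts little-endian doubles to big-endian and back.

```r
x <- c(1.5, -2.25, 1e10)   # made-up values
le <- writeBin(x, raw(), size = 8, endian = "little")   # little-endian bytes
be <- writeBin(x, raw(), size = 8, endian = "big")      # big-endian bytes

# Hypothetical helper: reverse the byte order within each 8-byte group
swap8 <- function(b) {
  stopifnot(length(b) %% 8 == 0)
  idx <- as.vector(outer(8:1, seq(0, length(b) - 8, by = 8), "+"))
  b[idx]
}

identical(swap8(le), be)                                        # TRUE
readBin(le, what = "double", n = 3, size = 8, endian = "little") # recovers x
```

In practice, telling readBin() the correct endian value is simpler than swapping bytes by hand.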

Chris.

On 02/03/2010 02:06, Roger DeAngelis(xlr82sas) wrote:
 Hi,

It looks like we may need to swap bytes (little endian to big endian). I
 will look into it tonight.

As a side note, SAS reserves 28 floats for missing values. It should be
 easy to convert these to NaN on input to R.

 You can test this in SAS by converting the 16 char floats to ieee8. in SAS
 and doing a put. The result will be A, B...Z, . and _.

 SAS code that produced the listing is below.

 Here are the floats that map to the 28 missing values in SAS

 A  FD00
 B  FC00
 C  FB00
 D  FA00
 E  F900
 F  F800
 G  F700
 H  F600
 I  F500
 J  F400
 K  F300
 L  F200
 M  F100
 N  F000
 O  EF00
 P  EE00
 Q  ED00
 R  EC00
 S  EB00
 T  EA00
 U  E900
 V  E800
 W  E700
 X  E600
 Y  E500
 Z  E400
 _  FF00
 .  FE00

 data mis;
 retain A .A B .B C .C D .D E .E F .F G .G H .H I .I J .J K .K L .L M .M
 N .N O .O P .P Q .Q R .R S .S T .T U .U V .V W .W X .X Y .Y Z .Z
_ ._ DOT .;
 array mis[28] A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ DOT;
do idx=1 to 28;
   hex=put(mis[idx],ieee8.);
   xeh=put(hex,hex16.);
   put @1 mis[idx] @6 xeh;
end;






Re: [R] capturing errors in Sweave

2010-03-02 Thread Berwin A Turlach
G'day Sundar,

On Mon, 1 Mar 2010 23:46:55 -0800
Sundar Dorai-Raj sdorai...@gmail.com wrote:

 Thanks for the input, but I don't want try in the Sweave output. I
 want the output to look just like it does in the console, as if an
 uncaptured error really did occur.

I don't think that you will get around using try; and you will have
to work moderately hard to make the output appear as it does on the
console.  Probably somewhere along the lines:

+++ Sweave code start +++
<<Function-4a>>=
MySqrt <- function(x) {
  if (missing(x)) {
    stop("'x' is missing with no default")
  }
  if (!is.numeric(x)) {
    stop("'x' should only be numeric")
  }
  if (x < 0) {
    stop("'x' should be non-negative")
  }
  return(sqrt(x))
}
@

<<echo=FALSE>>=
tmp <- try(MySqrt())
@ 
<<eval=FALSE>>=
MySqrt()
@ 
<<echo=FALSE>>=
  cat(tmp[1])
@ 

<<echo=FALSE>>=
tmp <- try(MySqrt("a"))
@ 
<<eval=FALSE>>=
MySqrt("a")
@ 
<<echo=FALSE>>=
  cat(tmp[1])
@ 

<<echo=FALSE>>=
tmp <- try(MySqrt(-2))
@ 
<<eval=FALSE>>=
MySqrt(-2)
@ 
<<echo=FALSE>>=
  cat(tmp[1])
@ 

<<>>=
MySqrt(4)
@ 
+++ Sweave code end +++
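The trick relies on try() returning the error message as a character string of class "try-error" instead of stopping. A sketch outside Sweave, using a cut-down MySqrt() for illustration and silent = TRUE so nothing is printed when the error is caught:

```r
# Cut-down version of MySqrt() for illustration
MySqrt <- function(x) {
  if (x < 0) stop("'x' should be non-negative")
  sqrt(x)
}

tmp <- try(MySqrt(-2), silent = TRUE)
class(tmp)     # "try-error"
cat(tmp[1])    # Error in MySqrt(-2) : 'x' should be non-negative
```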

Now what I would like to know is how to easily include warning messages
in my Sweave output without having to check whether Jean Lobry's [1] hack
still works. :)
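For warnings, one possible approach (a sketch, not Jean Lobry's hack) is to collect them with withCallingHandlers(), muffle them, and print them after the value, roughly as the console would. withWarningsPrinted() is a hypothetical helper name, and the printed format is simplified compared with R's own "Warning message:" output, which also shows the call.

```r
# Hypothetical helper: evaluate an expression, collect any warnings,
# and print them after the value, roughly as the console would
withWarningsPrinted <- function(expr) {
  warns <- character(0)
  value <- withCallingHandlers(expr, warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")   # stop the warning propagating further
  })
  print(value)
  if (length(warns) > 0) cat("Warning message:", warns, sep = "\n")
  invisible(list(value = value, warnings = warns))
}

res <- withWarningsPrinted(sqrt(-1))
```

Inside a Sweave chunk, the text written by print() and cat() would then appear in the output like any other printed result.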

HTH.

Cheers,

Berwin

[1] https://www.stat.math.ethz.ch/pipermail/r-help/2006-December/121975.html

== Full address 
Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway   
Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
Australiahttp://www.maths.uwa.edu.au/~berwin



Re: [R] Install R 2.10.1 on Windows XP Errors

2010-03-02 Thread mkborregaard

From CRAN: 
2.8 What's the best way to upgrade?

That's a matter of taste. For most people the best thing to do is to
uninstall R (see the previous Q), install the new version, copy any
installed packages to the library folder in the new installation, run
update.packages(checkBuilt=TRUE, ask=FALSE) in the new R and then delete
anything left of the old installation. 

Is there now a new procedure for updating the old packages?

Michael Borregaard, PhD student
Department of Biology
University of Copenhagen
-- 
View this message in context: 
http://n4.nabble.com/Install-R-2-10-1-on-Windows-XP-Errors-tp1310942p1574794.html
Sent from the R help mailing list archive at Nabble.com.



[R] Problem with package fpc

2010-03-02 Thread Sarah Paul

I am trying to load package fpc in order to use the 'plotcluster' function;
however, every time I attempt to do so I get the following error message:

library(fpc)
Loading required package: MASS
Error: package 'MASS' could not be loaded
In addition: Warning messages:
1: package 'fpc' was built under R version 2.9.2
2: In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) :
  there is no package called 'MASS'

I thought that MASS was one of the basic packages supplied with R, but when I
look in the library I cannot find it.

However, in the Windows program folder for R there is a folder called
MASS.

Could somebody tell me where I am going wrong?

Thanks in advance,

Sarah Paul,
Cardiff University
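MASS is a recommended package that normally ships with R, so a first check is whether R itself can see it in the library trees it searches. The sketch below uses only base functions (`installed.packages()`, `.libPaths()`); `check_package()` is a hypothetical helper written for this illustration. If it returns `FALSE` for MASS, reinstalling the package is one likely fix, though that is an assumption since the installation details are not shown.

```r
# Hypothetical helper: is 'pkg' installed in any library tree R searches?
check_package <- function(pkg) {
  libs <- .libPaths()
  message("Library trees searched: ", paste(libs, collapse = "; "))
  pkg %in% rownames(installed.packages(lib.loc = libs))
}

check_package("MASS")
# If this is FALSE, reinstalling may help:
# install.packages("MASS")
```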


Re: [R] two questions for R beginners

2010-03-02 Thread Liviu Andronic
On Mon, Mar 1, 2010 at 11:49 PM, Liviu Andronic landronim...@gmail.com wrote:
 On 3/1/10, Keo Ormsby keo.orms...@gmail.com wrote:
  Perhaps my biggest problem was that I couldn't (and still haven't) seen
 *absolute beginners* documents.

 there was once a link posted on r-sig-teaching that would probably fit
 your needs, but I cannot find it now.


OK, I found it. Below is an excerpt of that r-sig-teaching e-mail.
Liviu

On Thu, Jul 2, 2009 at 2:19 PM, Robert W. Hayden hay...@mv.mv.com wrote:
 I think such a website would be a real asset.  It would be most useful
 if it either were restricted to intro. stats. OR organized so that
 materials for real beginners were easy to extract from all the
 materials for programmers and Ph.D. statisticians.  As a relative
 beginner myself, I find the usual resources useless.  In self defense,
 I created materials for my own beginning students:

  http://courses.statistics.com/software/R/Rhome.htm



Re: [R] bwplot with pch = "|"

2010-03-02 Thread Deepayan Sarkar
On Mon, Mar 1, 2010 at 10:49 AM, Duncan Mackay mac...@northnet.com.au wrote:
 Dear All

 Below is a toy example of a modified standard bwplot.

 require(lattice)
 DF <-
 data.frame(site = rep(1:5, each = 20),
           height = rnorm(100))

 bwplot(site ~ height,DF,
 pch = "|",
 par.settings = list(strip.background = list(col = "transparent"),
  box.rectangle = list(col = "grey70",lty = 1),
  box.umbrella = list(col = "grey70",lty = 1),
  plot.symbol = list(alpha = 1,col = "grey70",cex = 1,pch = 20),
  superpose.symbol = list(cex = rep(0.7, 7),col = "black", pch = rep(20,7)))
 )

 The help guide shows that pch = "|" is a special case.
 This gives me a line across the box, which is what I want, but how do I make it
 thicker and red?

The part of panel.bwplot() responsible for this is

if (all(pch == "|"))
{
mult <- if (notch) 1 - notch.frac else 1
panel.segments(blist.stats[, 3],
   levels.fos - mult * blist.height / 2,
   blist.stats[, 3],
   levels.fos + mult * blist.height / 2,
   lwd = box.rectangle$lwd,
   lty = box.rectangle$lty,
   col = box.rectangle$col,
   alpha = alpha)
}

which shows that you are stuck with the same color as the rest of the
box. However, you can add your own thick red lines in a custom panel
function:

bwplot(site ~ height,DF,
   pch = "|",
   panel = function(x, y, ...) {
   panel.bwplot(x, y, ...)
   meds <- tapply(x, y, median)
   ylocs <- seq_along(meds)
   panel.segments(meds, ylocs - 1/4,
  meds, ylocs + 1/4,
  lwd = 2, col = "red")
   })

-Deepayan



[R] question to define a matrix with some vectors with different lengths

2010-03-02 Thread khazaei
Hi,
I have some vectors v1, v2, ..., vk with different lengths. I want to
consider these vectors as a matrix with k rows.
Can you please guide me how I can do it?

Regards
khazaei



Re: [R] two questions for R beginner

2010-03-02 Thread Brandon Zicha

What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?


Easy.  In terms of materials I have been unable to find good books that  
introduce users to R from the perspective of someone familiar only  
with packages like SPSS or STATA, or not familiar with statistics  
packages at all.  Even introductory texts use jargon without  
introducing it.


I think that R-help files should be more thorough than they are, and  
contain more examples.  I thought that STATA help files were sparse!   
The notion that 'R is a user community and thus they do this in their  
spare time' is no excuse for those creating new tools for R not  
developing complete help files.  It doesn't take that much time  
relative to actually creating the new function.


In terms of actual R use - creating, using, and manipulating data are  
the biggest frustration for those of the 'spreadsheet generation'.  I  
get the impression that one needs to not merely understand, but be  
fully fluent in the jargon of matrix mathematics to even know what is  
going on half the time.  I find myself - even now - using 'rules of  
thumb' that 'seemed to work' rather than fully understanding what I am  
doing.  It is particularly discouraging when many of those 'intro  
books' suggest using something besides R for data manipulation - how  
clumsy is that!?


I find the actual programming syntax itself is the easiest part to  
master.  It is certainly more flexible - but without a particularly  
significant increase in complexity - than trying to write scripts in  
SPSS and STATA.


Brandon Zicha



Re: [R] two questions for R beginners

2010-03-02 Thread John Sorkin
Please take what follows not as an ad hominem statement, but rather as an 
attempt to improve what is already an excellent program, that has been built as 
a result of many, many hours of dedicated work by many, many unpaid, unsung 
volunteers.

It troubles me a bit that when a confusing aspect of R is pointed out the 
response is not to try to improve the language so as to avoid the confusion, 
but rather to state that the confusion is inherent in the language. I 
understand that to make changes that would avoid the confusing aspect of the 
language that has been discussed in this thread would take time and effort by 
an R wizard (which I am not), time and effort that would not be compensated in 
the traditional sense. This does not mean that we should not acknowledge the 
confusion. If we want R to be the de facto lingua franca of statistical 
analysis, doesn't it make sense to strive for syntax that is as straightforward 
and consistent as possible? 

Again, please understand that my comment is made with deepest respect for the 
many people who have unselfishly contributed to the R project. Many thanks to 
each and every one of you.

John


 Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM 
On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca 
wrote:
 Suppose X is a dataframe or a matrix.  What would you expect to get from 
 X[1]?  What about as.vector(X), or as.numeric(X)?

All this of course depends on type of object one is speaking of. There 
are plenty of surprises available, and it's best to use the most logical 
way of extracting. E.g., to extract the top-left element of a 2D 
structure (data frame or matrix), use 'X[1,1]'.

Luckily, R provides some shortcuts. For example, you can write 'X[2,3]' 
on a data frame, just as if it was a matrix, even though the underlying 
structure is completely different. (This doesn't work on a normal list; 
there you have to type the whole 'X[[3]][2]'.)
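To make the indexing point concrete, here is a small sketch with made-up data: on a data frame, `X[2, 3]` picks row 2 of column 3; on a plain list of columns, the equivalent is to take the third element and then its second value.

```r
# Made-up data: three columns p, q, r
X <- data.frame(p = 1:3, q = 4:6, r = 7:9)
L <- list(p = 1:3, q = 4:6, r = 7:9)   # the same columns as a plain list

X[2, 3]      # 8: matrix-style indexing works on the data frame
L[[3]][2]    # 8: on a list, take column 3, then its 2nd element
```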

The behaviour of the 'as.' functions may sometimes be surprising, at 
least for me. For example, 'as.data.frame' on a named vector gives a 
single-column data frame, instead of a single-row data frame.

(I'm not sure what's the recommended way of converting a named vector to 
a single-row data frame, but 'as.data.frame(t(X))' works, even though both 'X' 
and 't(X)' look like a row of numbers.)
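The difference is easy to see by checking dimensions; a short sketch with a made-up named vector. The `t()` step turns the vector into a 1 x 3 matrix first, which is why the result has one row.

```r
x <- c(a = 1, b = 2, c = 3)    # a made-up named vector

dim(as.data.frame(x))          # 3 1: one column, the names become row names
dim(as.data.frame(t(x)))       # 1 3: t() makes a 1 x 3 matrix first
names(as.data.frame(t(x)))     # "a" "b" "c"
```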

 The point is that a dataframe is a list, and a matrix isn't.  If users 
 don't understand that, then they'll be confused somewhere.  Making 
 matrices more list-like in one respect will just move the confusion 
 elsewhere.  The solution is to understand the difference.

My main problem is not understanding the difference, which is easy, but 
knowing which type of object I have when I get the output of a function in a 
package. If I know the object is a named vector or a matrix with column 
names, it's easy enough to type 'X[,"colname"]', and if it's a data 
frame one may use the shortcut 'X$colname'.

Usually, it *is* documented what the return value of a function is, but 
just looking at the output is much faster, and *usually* gives the 
correct answer.

For example, 'mean' applied on a data frame gives a named vector, not a 
data frame, which is somewhat surprising (given that the columns of a 
data frame may be of different types, while the elements of a vector may 
not). (And yes, I know that it's *documented* that it returns a named 
vector.) On the other hand, perhaps it is surprising that 'mean' works 
on data frames at all. :-)
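A note for readers on current R: `mean()` no longer works on a data frame this way (it now returns `NA` with a warning, because a data frame is not a numeric vector). The sketch below, with made-up data, shows two common replacements that still return a named vector, one element per column.

```r
df <- data.frame(p = c(1, 2, 3), q = c(4, 5, 6))   # made-up numeric data

colMeans(df)          # named numeric vector: p = 2, q = 5
sapply(df, mean)      # the same result, computed one column at a time
```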

-- 
Karl Ove Hufthammer




[R] code for empirical copula

2010-03-02 Thread Roslina Zakaria
Hi,
 
I hope somebody can give me an idea where I can find code for the empirical
copula.
I have bivariate data.
 
Thank you so much for your help.
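A minimal base-R sketch of the empirical copula, assuming the usual rank-based pseudo-observations (ranks divided by n + 1, which is one common convention): C_n(u, v) is the proportion of observations whose pseudo-observations lie at or below (u, v) in both coordinates. `empirical_copula()` is a hypothetical name, not a function from any package, and the sample data are made up.

```r
# Empirical copula of bivariate data (x, y), evaluated at points (u, v).
empirical_copula <- function(x, y, u, v) {
  n  <- length(x)
  pu <- rank(x) / (n + 1)   # pseudo-observations for x
  pv <- rank(y) / (n + 1)   # pseudo-observations for y
  # C_n(u, v) = (1/n) * #{ i : pu_i <= u and pv_i <= v }, for each (u, v) pair
  mapply(function(a, b) mean(pu <= a & pv <= b), u, v)
}

set.seed(42)                # made-up, positively dependent sample
x <- rnorm(200)
y <- x + rnorm(200)
empirical_copula(x, y, u = c(0.25, 0.5, 0.75), v = c(0.25, 0.5, 0.75))
```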




  



Re: [R] two questions for R beginner

2010-03-02 Thread Paul Hiemstra

Brandon Zicha wrote:

What were your biggest misconceptions or
stumbling blocks to getting up and running
with R?


Easy.  In terms of materials I have been unable to find good books that 
introduce users to R from the perspective of someone familiar only 
with packages like SPSS or STATA, or not familiar with statistics 
packages at all.  Even introductory texts use jargon without 
introducing it.


I think that R-help files should be more thorough than they are, and 
contain more examples.  I thought that STATA help files were sparse!  
The notion that 'R is a user community and thus they do this in their 
spare time' is no excuse for those creating new tools for R not 
developing complete help files.  It doesn't take that much time 
relative to actually creating the new function.

Hi Brandon,

I would disagree with your point that documentation doesn't take much 
time. Writing documentation that is suitable for both the advanced user 
(being a reference, and thus preferably short) and the beginning user 
(being sort of a tutorial, and thus preferably longer) is quite a 
challenge, comparable to writing a good paper. Apart from the fact that 
it takes quite a while, it is also not much fun. Often people develop 
packages for their own research and put the software online so others 
can benefit; they don't need the documentation themselves and don't get 
paid to write it.


So saying 'it's no excuse' really goes too far in my view. R is free; 
you did not pay several thousand euros giving you the right to good 
support. Even the support is free through the mailing list. You can get 
a paid version of R from REvolution Computing; then you can call them if 
there are problems. I'm not meaning to offend anybody, but I didn't 
agree with 'is no excuse for those creating new tools for R not 
developing complete help files'.  Partly the strength of R is in the 
open source, but sometimes, as with documentation, this can bite you. 
But I think the R docs aren't that bad; I've seen proprietary software 
that does a worse job than R.


my 2euro on the subject :),

Cheers,
Paul





--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul



[R] Random real numbers

2010-03-02 Thread frederik vanhaelst
Hi,

How could I generate random real numbers between 0 and 2*pi?

Thanks,

Frederik



[R] Problem With Pasting (Mac OS X)

2010-03-02 Thread Pete Marchetto

Greetings, fellow travelers.
When I paste a series of commands into R, they execute serially, and when I
go back through commands by hitting the up key, they show up as a block,
rather than as individual lines. Is there any way to change this behavior?
I'm running the 32-bit build of R 2.10.1 on Mac OS X 10.6.2.
-- 
View this message in context: 
http://n4.nabble.com/Problem-With-Pasting-Mac-OS-X-tp1574871p1574871.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] two questions for R beginner

2010-03-02 Thread Karl Ove Hufthammer
On Tue, 2 Mar 2010 12:31:45 +0100 Brandon Zicha brandon.zi...@ua.ac.be 
wrote:
 Easy.  I terms of materials I have been unable to find good books that  
 introduce users to R from the perspective of someone familiar only  
 with packages like SPSS or STATA,

Have you read these books:

R for SAS and SPSS Users
http://www.springer.com/statistics/computanional+statistics/book/978-0-387-09417-5

R for Stata Users
http://www.springer.com/statistics/computanional+statistics/book/978-1-4419-1317-3

(I have not, so I don't know how good they are.)

-- 
Karl Ove Hufthammer



Re: [R] question to define a matrix with some vectors with different lengths

2010-03-02 Thread Karl Ove Hufthammer
On Tue, 2 Mar 2010 12:29:28 +0100 (CET) khaz...@ceremade.dauphine.fr 
khaz...@ceremade.dauphine.fr wrote:
 I have some vector v1,v2,...,vk, with different  lengths. I want to
 consider these vectors as a matrix with k rows.
 Can you please guide me how I can do it?

What do you want to do with the missing elements?

-- 
Karl Ove Hufthammer



Re: [R] two questions for R beginners

2010-03-02 Thread Duncan Murdoch

John Sorkin wrote:

Please take what follows not as an ad hominem statement, but rather as an 
attempt to improve what is already an excellent program, that has been built as 
a result of many, many hours of dedicated work by many, many unpaid, unsung 
volunteers.

It troubles me a bit that when a confusing aspect of R is pointed out, the response is not to try to improve the language so as to avoid the confusion, but rather to state that the confusion is inherent in the language. I understand that to make changes that would avoid the confusing aspect of the language that has been discussed in this thread would take time and effort by an R wizard (which I am not), time and effort that would not be compensated in the traditional sense. This does not mean that we should not acknowledge the confusion. If we want R to be the de facto lingua franca of statistical analysis, doesn't it make sense to strive for syntax that is as straightforward and consistent as possible? 
  


I think you've misunderstood the argument.  It would not be hard to make 
the suggested change.  I don't object to it because it would be too much 
work, I object to it because I think it is not an improvement.  
Dataframes and matrices are different, and there is no way to avoid that 
fact. 


The arguments in favour of the change seem to be these:

- Dataframes and matrices are similar in some respects, so they should 
be similar in more.


In fact, I believe that the source of confusion is the fact that they 
are similar, so this would not improve things.  People would still be 
confused by the differences, which are unavoidable.


- Using $ to extract a column of a matrix would be convenient.

I agree, it saves 4 keystrokes to type X$column instead of 
X[,"column"].  But I think it increases confusion, so the savings are 
not worthwhile.  For example, the col2rgb function returns a matrix with 
rows named "red", "green" and "blue".  But under your proposal, I'd still need 
to use X["red",] to extract the red component, because columns are 
components, but rows are not.   You are complaining that the lack of $ 
for matrices is an unnecessary asymmetry, and unnecessary asymmetries 
are confusing.  But your proposal introduces a new one!
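The col2rgb example can be checked directly; a small sketch in which the colours are arbitrary choices. The components sit in the rows, so they are extracted with row indexing, and `$` only becomes available after converting to a data frame.

```r
X <- col2rgb(c("red", "navy", "white"))   # 3 x 3 integer matrix

rownames(X)                 # "red" "green" "blue"
X["red", ]                  # 255 0 255: red component of each colour
# X$red would fail: `$` is not defined for an atomic matrix.
as.data.frame(t(X))$red     # 255 0 255: after transposing to a data frame
```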


 - Some functions return matrices when I expect a dataframe, or vice versa.

That will continue to be true regardless of whether the proposed change 
is made.  You need to read the documentation.  If it is unclear, it 
should be improved; the language shouldn't be changed so that sloppy 
documentation is accurate.


 - You suggested this so anyone who disagrees must be lazy.

Which really is an ad hominem argument, despite your disclaimer.  I 
think you should respect the fact that there are people who disagree 
with the value of your suggestion.   (Which is also an ad hominem 
attack, but isn't central to my argument.)


Duncan Murdoch



Re: [R] two questions for R beginner

2010-03-02 Thread Albert-Jan Roskam
Hi Brandon,
 
I just read this book, which I am sure you will be interested in:
http://www.amazon.com/SAS-SPSS-Users-Statistics-Computing/dp/0387094172

Cheers!!
Albert-Jan

~~
In the face of ambiguity, refuse the temptation to guess.
~~




[R] simple data transformation question

2010-03-02 Thread Albert-Jan Roskam
Hi all, 
I have a (hopefully) simple newbie-level question. 
 
# I have data like this:
dtf <- data.frame(read.table(textConnection("var  value
  company  9887.1
  company  91117.0
  blaah  91.1
  etc  11
  etc  97111"), header=TRUE))

# I would like to have output like this (the index number may vary):
var      value.1  value.2
company  9887.1   91117.0
blaah    91.1     NA
etc      11       97111

# I tried the following.
library(reshape)
cast(dtf, var~value, mean) # 'mean' because some function needs to be specified.
... this does not do what I want, nor does t(dtf).
 
Can somebody help me with the correct transformation, or at least with which 
function to use best? Thank you in advance!
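One base-R way to get that layout, offered as a sketch rather than the canonical answer: number the repeated values within each `var` with `ave()`, then spread them into columns with `reshape()`. The helper column name `idx` is an arbitrary choice for this illustration.

```r
dtf <- read.table(text = "var  value
  company  9887.1
  company  91117.0
  blaah  91.1
  etc  11
  etc  97111", header = TRUE)

# Number the occurrences of each 'var' value: 1, 2, ... within its group
dtf$idx <- ave(seq_along(dtf$var), dtf$var, FUN = seq_along)

# Long -> wide: one row per 'var', columns value.1, value.2, ...
wide <- reshape(dtf, idvar = "var", timevar = "idx", direction = "wide")
wide
```

Groups with fewer values (here `blaah`) get `NA` in the extra columns.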

Cheers!!
Albert-Jan

~~
In the face of ambiguity, refuse the temptation to guess.
~~


  


Re: [R] two questions for R beginners

2010-03-02 Thread Gabor Grothendieck
On Tue, Mar 2, 2010 at 7:27 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 John Sorkin wrote:

 Please take what follows not as an ad hominem statement, but rather as an
 attempt to improve what is already an excellent program, that has been built
 as a result of many, many hours of dedicated work by many, many unpaid,
 unsung volunteers.

 It troubles me a bit that when a confusing aspect of R is pointed out the
 response is not to try to improve the language so as to avoid the confusion,
 but rather to state that the confusion is inherent in the language. I
 understand that to make changes that would avoid the confusing aspect of the
 language that has been discussed in this thread would take time and effort
 by an R wizard (which I am not), time and effort that would not be
 compensated in the traditional sense. This does not mean that we should not
 acknowledge the confusion. If we want R to be the de facto lingua franca of
 statistical analysis, doesn't it make sense to strive for syntax that is as
 straightforward and consistent as possible?

 I think you've misunderstood the argument.  It would not be hard to make the
 suggested change.  I don't object to it because it would be too much work, I
 object to it because I think it is not an improvement.  Dataframes and
 matrices are different, and there is no way to avoid that fact.
 The arguments in favour of the change seem to be these:

Users of zoo have some experience with this since zoo uses matrices to
represent 2d time series and originally did not support $ as a column
extractor but now does.  I was originally opposed to adding it for the
reasons you state but it was eventually added and having used it for
some time now since it got into the package I must say that it is very
convenient and I now regard it as a definite improvement in user
experience.  Certainly I use the feature all the time.



Re: [R] Random real numbers

2010-03-02 Thread Karl Ove Hufthammer
On Tue, 2 Mar 2010 11:51:39 +0100 frederik vanhaelst 
frederik.vanhae...@gmail.com wrote:
 How could I generate random real numbers between 0 and 2*pi?

Ten such numbers from the uniform distribution:
  2*pi*runif(10)
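A reproducible variant, assuming a fixed seed is wanted (the seed value is arbitrary). `runif()` takes the interval ends as `min` and `max`, so scaling draws from (0, 1) and passing `max = 2 * pi` are interchangeable; its documentation also notes that the end points themselves are not generated except in degenerate cases.

```r
set.seed(1)                        # arbitrary seed, for reproducibility
a <- runif(10, min = 0, max = 2 * pi)

set.seed(1)
b <- 2 * pi * runif(10)            # the same draws, rescaled from (0, 1)

all.equal(a, b)                    # TRUE
range(a)                           # both ends lie strictly inside (0, 2*pi)
```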

-- 
Karl Ove Hufthammer



Re: [R] Random real numbers

2010-03-02 Thread jim holtman
 runif(20,0,2*pi)
 [1] 1.29417642 1.10933879 4.31669186 2.41339484 4.83705630 3.12713657
4.50893007 6.23232980 2.38783146 4.88483239 5.87292617
[12] 1.33293077 4.09458703 0.7593 1.67899698 2.42602639 0.08413394
2.40261439 5.46442874 2.13847582



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] Row-wisely converting a data frame into a list

2010-03-02 Thread Sebastian Bauer

Hello,

is there an elegant way to convert each row of a data frame into a
distinct element of a list?


In essence, what I'm looking for is something like

rows.to.lists <- function( df ) {
  ll <- NULL
  for( i in 1:nrow(df) )
    ll <- append( ll, list(df[i,]) )
  return (ll)
}

but done more efficiently (the data frame may contain tens of thousands
of rows). I thought about using apply() but this function always returns
a matrix.
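A sketch of one vectorised alternative (not necessarily the only or fastest one): `split()` by row number returns a list of one-row data frames and keeps each column's type, which `apply()` cannot do because it converts the data frame to a matrix first. The loop above grows the list with `append()` on every iteration, which tends to slow down sharply as the number of rows grows. `rows_to_list()` is a hypothetical name.

```r
# Hypothetical replacement for rows.to.lists(): one list element per row
rows_to_list <- function(df) {
  # split() by row index; each element is a one-row data frame
  split(df, seq_len(nrow(df)))
}

df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
rows <- rows_to_list(df)
length(rows)        # 3
rows[[2]]$name      # "b" (still character, not coerced)
```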


Thanks in advance!

Bye,
Sebastian



Re: [R] question to define a matrix with some vectors with different lengths

2010-03-02 Thread jim holtman
This is what you would use a 'list' for.
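A sketch of both options with made-up vectors: keep them in a list as they are, or, if a k-row matrix is really needed, pad the shorter vectors. Padding with `NA` is an assumption here; it is one possible answer to Karl's question about what to do with the missing elements.

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5)
v3 <- c(6, 7, 8, 9)

vs <- list(v1, v2, v3)              # a list holds vectors of any length

# Pad every vector with NA up to the longest length, then bind them as rows
max_len <- max(sapply(vs, length))
m <- do.call(rbind, lapply(vs, function(v) { length(v) <- max_len; v }))
m                                   # 3 x 4 matrix, NA where values are missing
```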

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Random real numbers

2010-03-02 Thread Rubén Roa
From what distribution?
If the uniform,
runif(100,0,2*pi)
If another, install package Runuran, and do this
?Runuran
vignette("Runuran")

HTH

 

Dr. Rubén Roa-Ureta
AZTI - Tecnalia / Marine Research Unit
Txatxarramendi Ugartea z/g
48395 Sukarrieta (Bizkaia)
SPAIN


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
behalf of frederik vanhaelst
Sent: Tuesday, March 02, 2010 11:52
To: r-h...@stat.math.ethz.ch
Subject: [R] Random real numbers

Hi,

How could I generate random real numbers between 0 and 2*pi?

Thanks,

Frederik



[R] Double Colors in Main

2010-03-02 Thread Lorenzo Isella

Dear All,
Consider the following trivial code snippet

rm(list=ls())
name_vec <- c("color1", "color2")

pdf("test_color.pdf")
plot(seq(5), seq(5), main=paste(name_vec[1], " and ", name_vec[2], sep=""))

dev.off()


What I would like to achieve is rather simple to explain, but it is 
giving me a headache: how can I have two colors in main? Let us say that 
I would like 'color1' to be blue and 'color2' to be black.

Many thanks

Lorenzo



Re: [R] Row-wisely converting a data frame into a list

2010-03-02 Thread Nutter, Benjamin
 as.data.frame(t(df)) 

For example

 x <- as.data.frame(t(mtcars))
 typeof(x)
[1] "list"
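This works well for an all-numeric data frame such as mtcars, but t() first converts the data frame to a matrix, so mixed column types are coerced to a common type (usually character). A minimal sketch with a hypothetical mixed-type data frame:

```r
# Hypothetical data frame with mixed column types
df <- data.frame(id = 1:3, name = c("a", "b", "c"), score = c(2.5, 3.0, 4.5))

by_row <- as.data.frame(t(df))   # one column per original row
sapply(by_row, class)            # no numeric columns left: t() went through a character matrix
by_row[[2]]                      # second row of df, now as character strings
```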

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Sebastian Bauer
Sent: Tuesday, March 02, 2010 8:12 AM
To: r-help@r-project.org
Subject: [R] Row-wisely converting a data frame into a list

Hello,

is there an elegant way to convert each row of a data frame into
distinct elements of a list?

In essence, what I'm looking for is something like

rows.to.lists <- function(df) {
  ll <- NULL
  for (i in 1:nrow(df))
    ll <- append(ll, list(df[i, ]))
  return(ll)
}

but done more efficiently (the data frame may contain tens of thousands
of rows). I thought about using apply(), but this function always returns
a matrix.

Thanks in advance!

Bye,
Sebastian





Re: [R] Double Colors in Main

2010-03-02 Thread Romain Francois
See this thread : 
http://finzi.psych.upenn.edu/Rhelp10/2009-January/185693.html
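One common approach (not necessarily the exact one in the linked thread) is to draw the title twice with title(), each time with a different col.main, and to hide the part that should not appear in that colour with plotmath's phantom(), which reserves space without drawing anything. Because both expressions have the same total width, the two passes line up when the title is centred. A minimal sketch, writing to a temporary PDF instead of test_color.pdf:

```r
name_vec <- c("color1", "color2")
out_file <- file.path(tempdir(), "test_color.pdf")

pdf(out_file)
plot(seq(5), seq(5), main = "")   # leave the main title empty at first

# Pass 1: draw only the first word in blue; the rest is invisible but keeps its width
title(main = bquote(.(name_vec[1]) * phantom(" and " * .(name_vec[2]))),
      col.main = "blue")
# Pass 2: hide the first word and draw the remainder in black
title(main = bquote(phantom(.(name_vec[1])) * " and " * .(name_vec[2])),
      col.main = "black")
invisible(dev.off())
```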


On 03/02/2010 02:18 PM, Lorenzo Isella wrote:


Dear All,
Consider the following trivial code snippet

rm(list=ls())
name_vec <- c("color1", "color2")

pdf("test_color.pdf")
plot(seq(5), seq(5), main=paste(name_vec[1], " and ", name_vec[2], sep=""))

dev.off()


What I would like to achieve is rather simple to explain, but it is
giving me a headache: how can I have two colors in main? Let us say that
I would like 'color1' to be blue and 'color2' to be black.
Many thanks

Lorenzo


--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/OIXN : raster images and RImageJ
|- http://tr.im/OcQe : Rcpp 0.7.7
`- http://tr.im/O1wO : highlight 0.1-5



Re: [R] Row-wisely converting a data frame into a list

2010-03-02 Thread Sebastian Bauer

Hi!

On 03/02/2010 02:22 PM, Nutter, Benjamin wrote:

as.data.frame(t(df))


For example


x <- as.data.frame(t(mtcars))
typeof(x)

[1] "list"



Thanks for the quick reply!

I would never have guessed that as.data.frame() works that way!

BTW, this one also seems to do the trick:

rows.to.list <- function(df) {
  ll <- apply(df, 1, list)
  ll <- lapply(ll, unlist)
  ll
}

It's even a bit faster here.

Bye,
Sebastian



Re: [R] ANOVA Types and Regression models: the same?

2010-03-02 Thread Liaw, Andy
If memory serves, Bill Venables said in the paper cited several times
here, that there's only one type of sums of squares.  So there's only
one type of ANOVA (if I understand what you mean by ANOVA).

Just forget about the different types of tests, and simply ask yourself
this (hopefully simple and straightforward) question: Which pair of
models, when compared, will answer the question you have at hand?  It's
not sufficient to just ask "Is factor X significant?"; the answer depends
on what else is in the model you're entertaining.

I think it's high time to retire the archaic concept of the different
types of sums of squares.  IMHO they are the biggest red herrings in
Statistics.
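A minimal sketch of that idea with simulated data (the factors A and B and their effects are hypothetical): two different questions about factor B correspond to two different pairs of nested linear models, each compared with anova().

```r
set.seed(2)
n <- 120
A <- factor(sample(c("a1", "a2"), n, replace = TRUE))
B <- factor(sample(c("b1", "b2", "b3"), n, replace = TRUE))
y <- 10 + 2 * (A == "a2") + (B == "b3") + rnorm(n)

# Question 1: does B explain variation in y, ignoring A?
q1 <- anova(lm(y ~ 1), lm(y ~ B))
# Question 2: does B add anything once A is already in the model?
q2 <- anova(lm(y ~ A), lm(y ~ A + B))
q1
q2
```

Both comparisons test 2 degrees of freedom for B, but against different baselines, so they answer different questions.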

Best,
Andy

From: Ravi Kulkarni
 
 Hello,
   I think I am beginning to understand what is involved in the so-called
 Type-I, II, ... ANOVAs (thanks to all the replies I got for yesterday's
 post). I have a question that will help me (and others?) understand it
 better (or remove a misunderstanding):
   I know that ANOVA is really a special case of regression where the
 predictor variable is categorical. I know that there can be various types of
 regression models commonly called stepwise, add, remove..., where one
 controls which predictors are added to the regression model and in what
 order.
   Is this what the various Types of ANOVA correspond to? I mean that I
 think of my ANOVA as a regression model (a General Linear Model) and the
 various ways of entering predictors as the various ANOVA Types.
   Hope that makes sense...
 
   Ravi Kulkarni
 -- 
 View this message in context: 
 http://n4.nabble.com/ANOVA-Types-and-Regression-models-the-same-tp1574654p1574654.html
 Sent from the R help mailing list archive at Nabble.com.
 


Re: [R] Stack type

2010-03-02 Thread Gabor Grothendieck
Here is an example using proto based on converting Duncan's example:

library(proto)
Stack <- proto(new = function(.) proto(Stack,
    stack = NULL,
    push = function(., el) .$stack <- c(list(el), .$stack),
    pop = function(.) {
        stopifnot(length(.$stack) > 0)
        out <- .$stack[[1]]
        .$stack[[1]] <- NULL
        out
    }))

mystack <- Stack$new()
mystack$push( 1 )
mystack$push( letters )
mystack$pop()
mystack$pop()
mystack$pop() # gives an error


On Mon, Mar 1, 2010 at 8:14 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 01/03/2010 7:56 PM, Worik R wrote:

 How can I implement a stack in R?

 I want to push and pop.  Every thing I push and pop will be the same
 type, but not necessarily an atomic type.

 Use lexical scoping:

 stack <- function() {
   store <- list()
   push <- function(item) {
     store <<- c(list(item), store)
     invisible(length(store))
   }
   pop <- function() {
     if (!length(store)) stop("Nothing to pop!")
     result <- store[[1]]
     store[[1]] <<- NULL
     result
   }
   list(push=push, pop=pop)
 }

 mystack <- stack()
 mystack$push( 1 )
 mystack$push( letters )
 mystack$pop()
 mystack$pop()
 mystack$pop() # gives an error

 Duncan Murdoch
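Duncan's push() and pop() update store from inside nested functions, which relies on R's superassignment operator <<-. Inside a nested function, <- creates a local variable that disappears when the function returns, while <<- updates the variable in the enclosing environment. A minimal sketch with a hypothetical counter closure:

```r
make_counter <- function() {
  count <- 0
  list(
    bump_local = function() { count <- count + 1; count },   # creates a local 'count' only
    bump_super = function() { count <<- count + 1; count },  # updates the enclosing 'count'
    get        = function() count
  )
}

ctr <- make_counter()
ctr$bump_local()   # returns 1, but the shared count is untouched
ctr$get()          # still 0
ctr$bump_super()   # returns 1 and updates the shared count
ctr$get()          # now 1
```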



Re: [R] simple data transformation question

2010-03-02 Thread Henrique Dallazuanna
Try this:

reshape(cbind(id = as.numeric(factor(dtf$var)), dtf, time = with(dtf,
ave(value, var, FUN = seq_along))), timevar = "time", direction = "wide")

Or:

 xtabs(value ~ var + ave(value, var, FUN = seq_along), data = dtf)
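A self-contained sketch of the same reshape() idea, assuming the example data from the question below (read here with read.table(text = ...), available in current versions of R). It numbers the observations within each var group with ave(..., FUN = seq_along) and uses var itself as the id variable, so it does not depend on whether var is read as a factor or as character.

```r
dtf <- read.table(text = "var value
company 9887.1
company 91117.0
blaah 91.1
etc 11
etc 97111", header = TRUE)

# Running index of each observation within its 'var' group: 1, 2, ...
dtf$time <- ave(seq_along(dtf$value), dtf$var, FUN = seq_along)

wide <- reshape(dtf, idvar = "var", timevar = "time", direction = "wide")
wide   # columns var, value.1, value.2; 'blaah' has NA in value.2
```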

On Tue, Mar 2, 2010 at 9:40 AM, Albert-Jan Roskam fo...@yahoo.com wrote:
 Hi all,
 I have a (hopefully) simple newbie-level question.

 # I have data like this:
 dtf <- data.frame(read.table(textConnection("var  value
   company  9887.1
   company  91117.0
   blaah  91.1
   etc  11
   etc  97111"), header=TRUE))

 # I would like to have output like this (the index number may vary):
 var  value.1 value.2
 company 9887.1 91117.0
 blaah  91.1 NA
 etc 11 97111

 # I tried the following.
 library(reshape)
 cast(dtf, var~value, mean) # 'mean' because some function needs to be specified.
 ... this does not do what I want, nor does t(dtf).

 Can somebody help me with the correct transformation, or at least with which 
 function to use best? Thank you in advance!

 Cheers!!
 Albert-Jan

 ~~
 In the face of ambiguity, refuse the temptation to guess.
 ~~








-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Row-wisely converting a data frame into a list

2010-03-02 Thread David Winsemius


On Mar 2, 2010, at 8:11 AM, Sebastian Bauer wrote:


Hello,

is there an elegant way to convert each row of a data frame  
into distinct elements of a list?


split(dfrm, rownames(dfrm))
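One caveat, shown in a minimal sketch with a hypothetical data frame: split() orders its groups by the levels of the grouping factor, and the levels built from row names are sorted as character strings, so "10" comes before "2". Splitting on seq_len(nrow(dfrm)) keeps the original row order.

```r
dfrm <- data.frame(x = 1:12, y = letters[1:12])

by_name <- split(dfrm, rownames(dfrm))
names(by_name)[1:4]        # "1" "10" "11" "12": character sort order, not row order

by_pos <- split(dfrm, seq_len(nrow(dfrm)))
names(by_pos)[1:4]         # "1" "2" "3" "4"
by_pos[[10]]               # the tenth row, as a one-row data frame
```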



In essence, what I'm looking for is something like

rows.to.lists <- function(df) {
  ll <- NULL
  for (i in 1:nrow(df))
    ll <- append(ll, list(df[i, ]))
  return(ll)
}

but done more efficiently (the data frame may contain tens of
thousands of rows). I thought about using apply(), but this function
always returns a matrix.


Thanks in advance!

Bye,
Sebastian



Re: [R] ANOVA Types and Regression models: the same?

2010-03-02 Thread Frank E Harrell Jr

Ravi Kulkarni wrote:

Hello,
  I think I am beginning to understand what is involved in the so-called
Type-I, II, ... ANOVAs (thanks to all the replies I got for yesterday's
post). I have a question that will help me (and others?) understand it
better (or remove a misunderstanding):
  I know that ANOVA is really a special case of regression where the
predictor variable is categorical. I know that there can be various types of
regression models commonly called stepwise, add, remove..., where one
controls which predictors are added to the regression model and in what
order.
  Is this what the various Types of ANOVA correspond to? I mean that I
think of my ANOVA as a regression model (a General Linear Model) and the
various ways of entering predictors as the various ANOVA Types.
  Hope that makes sense...

  Ravi Kulkarni


Ravi,

John Fox's posting provided a lot of information.  Briefly, the Types 
refer to whether effects are adjusted for all other effects in the model 
(Types II, III, IV) or not (Type I or sequential tests only adjust for 
EARLIER terms in the model).  See John's posting for definitions of 
Types II and III and for reasons to almost always use II over III. 
Stepwise procedures are a whole different kettle of fish, and they yield 
invalid statistical tests in almost all cases involving more than two 
candidate variables and more than one fitted model.
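A minimal sketch of the sequential versus adjusted distinction with two correlated numeric predictors (simulated, hypothetical data). In base R, anova() on a single lm fit gives sequential (Type I) tests, so the test for a term depends on its position in the formula, while drop1(..., test = "F") tests each term adjusted for all the others. For a main-effects model like this one, that adjusted test coincides with what is usually called Type II.

```r
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)            # deliberately correlated with x1
y  <- 1 + x1 + 0.5 * x2 + rnorm(n)

fit_12 <- lm(y ~ x1 + x2)
fit_21 <- lm(y ~ x2 + x1)

anova(fit_12)              # sequential: x1 unadjusted, x2 adjusted for x1
anova(fit_21)              # changing the order changes the test for x1
drop1(fit_12, test = "F")  # each term adjusted for all other terms
```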


The world might be a better place if these types were not invented and 
we stated statistical tests more intuitively.  The contrast function in 
the rms package is one of many ways to do that.  Here are the examples 
from the help file.  See especially the examples using type='joint'.


library(rms)   # contrast(), lrm(), ols(), datadist() and bootcov() come from rms

set.seed(1)
age <- rnorm(200,40,12)
sex <- factor(sample(c('female','male'),200,TRUE))
logit <- (sex=='male') + (age-40)/5
y <- ifelse(runif(200) <= plogis(logit), 1, 0)
f <- lrm(y ~ pol(age,2)*sex)
# Compare a 30 year old female to a 40 year old male
# (with or without age x sex interaction in the model)
contrast(f, list(sex='female', age=30), list(sex='male', age=40))


# For a model containing two treatments, centers, and treatment
# x center interaction, get 0.95 confidence intervals separately
# by center
center <- factor(sample(letters[1:8],500,TRUE))
treat  <- factor(sample(c('a','b'),  500,TRUE))
y  <- 8*(treat=='b') + rnorm(500,100,20)
f <- ols(y ~ treat*center)


lc <- levels(center)
contrast(f, list(treat='b', center=lc),
            list(treat='a', center=lc))


# Get 'Type III' contrast: average b - a treatment effect over
# centers, weighting centers equally (which is almost always
# an unreasonable thing to do)
contrast(f, list(treat='b', center=lc),
            list(treat='a', center=lc),
         type='average')


# Get 'Type II' contrast, weighting centers by the number of
# subjects per center.  Print the design contrast matrix used.
k <- contrast(f, list(treat='b', center=lc),
                 list(treat='a', center=lc),
              type='average', weights=table(center))
print(k, X=TRUE)
# Note: If other variables had interacted with either treat
# or center, we may want to list settings for these variables
# inside the list()'s, so as to not use default settings


# For a 4-treatment study, get all comparisons with treatment 'a'
treat  <- factor(sample(c('a','b','c','d'),  500,TRUE))
y  <- 8*(treat=='b') + rnorm(500,100,20)
dd <- datadist(treat,center); options(datadist='dd')
f <- ols(y ~ treat*center)
lt <- levels(treat)
contrast(f, list(treat=lt[-1]),
            list(treat=lt[ 1]),
         cnames=paste(lt[-1],lt[1],sep=':'), conf.int=1-.05/3)


# Compare each treatment with average of all others
for(i in 1:length(lt)) {
  cat('Comparing with',lt[i],'\n\n')
  print(contrast(f, list(treat=lt[-i]),
                    list(treat=lt[ i]), type='average'))
}
options(datadist=NULL)

# Six ways to get the same thing, for a variable that
# appears linearly in a model and does not interact with
# any other variables.  We estimate the change in y per
# unit change in a predictor x1.  Methods 4, 5 also
# provide confidence limits.  Method 6 computes nonparametric
# bootstrap confidence limits.  Methods 2-6 can work
# for models that are nonlinear or non-additive in x1.
# For that case more care is needed in choice of settings
# for x1 and the variables that interact with x1.

# Not run: 'fit' stands for a previously fitted rms model containing x1
coef(fit)['x1']                              # method 1
diff(predict(fit, gendata(x1=c(0,1))))       # method 2
g <- Function(fit)                           # method 3
g(x1=1) - g(x1=0)
summary(fit, x1=c(0,1))                      # method 4
k <- contrast(fit, list(x1=1), list(x1=0))   # method 5
print(k, X=TRUE)
fit <- update(fit, x=TRUE, y=TRUE)           # method 6
b <- bootcov(fit, B=500, coef.reps=TRUE)
bootplot(b, X=k$X)                           # bootstrap distribution and CL


# In a model containing age, race, and sex,
# compute an estimate of the mean response for a
# 50 year old male, averaged over the races using
# observed frequencies for the races as weights


f <- ols(y ~ age + race + sex)

Re: [R] two questions for R beginner

2010-03-02 Thread Paul Hiemstra

Brandon Zicha wrote:

Hey Paul,

Hey Brandon, (adding R-help in the cc)

I agree with you that the documentation of R could be better, especially 
with more examples in code showing not only the common cases, but also 
more esoteric cases. It would be great if everyone invested a lot of 
time to write awesome documentation, but this is not the case. I just 
objected to the tone (I thought :)) I spotted. Some more comments are inline:


Accepting the main point of my post - that the often VERY incomplete 
help files appended to packages can be a major stumbling block for 
getting up and running in R - I take your point.  I probably went a 
bit too far with my language there.


I would point out though that a great many parts of research (like 
writing a bibliography - or searching for citations of any kind 
usually) aren't much fun, but are an important part of research 
related work.  Likewise, complete documentation (by which I hardly 
mean a paper - looking at STATA help files as a minimum would be a 
good start) is part of programming.  I agree that one needs to employ 
some level of judgement, otherwise you will get a help file that says 
"First turn on the computer... then click the 'R' icon."  But I 
have myself created one or two STATA functions that I have put up for 
public use - so I know how un-fun, but necessary, complete 
documentation is.  Further, I didn't say that writing documentation 
doesn't take time.  Everything takes time. My point was that relative 
to actually creating the application - writing more complete 
documentation takes very little time. If one invests the time to do 
the 'fun' stuff of writing a new package for R, it seems reasonable 
that taking the (proportionately) little time to write a nicer help 
file would be the most 'professional' thing to do.  But, this could be 
my illusion that all researchers see themselves as professionals - 
rather than an anarchic egoistic enclave of independent 
self-interested paper producers.
This is what scientists get judged upon, not on how much software they 
publish and how good their documentation is. Furthermore, it is quite 
hard for a hardcore R programmer to judge what people find hard about 
their software.
I am notorious for assuming greater standards as an acceptable 'norm' 
than my community at large :-)  Furthermore, you are absolutely right 
that my standards are apparently even too high for many commercial 
applications!  R help is sometimes downright good!


So, if I accept that I am a demanding S.O.B. and tone down my thoughts 
of proper documentation and professionalism and adopt the (probably 
more) reasonable perspective you take at the end ("well, this is the 
world we live in..." and "come on, it's free"), I totally agree that I 
probably went too far!  But, better yet, I think that this observation 
you make suggests a solution: Perhaps R could use a more integrated 
and organized open source help system. I can think of a few 
possibilities - the easiest being a wiki version of R help.  This way 
users could add useful information to help files - such as more 
examples, tricks, tips, and known problems.  This would take advantage 
of the open source, free, user-community centered aspects of R, and 
permit those with an interest in helping beginners to post notes for 
beginners - on the help files.  I know that if such a wiki existed I 
would have posted the example of constrained optimization I worked out 
recently.  It wouldn't be too difficult to add a function 
wikihelp(X) that would open the wiki help page rather than the 
standard help documentation.  Currently, help on any given command is 
scattered across help fora all over the web.  A central, indexed, 
and easily referenced help system might be a solution.  Heck, such a 
system could go a step further and link R-help listserv archives by 
command thus centralizing and integrating the open-source user-built 
information resource of the listserv into help().  How many e-mails to 
this listserv begin with 'I just spent a few hours cruising the help 
forums related to "X" and couldn't find an answer'?
Sounds like a good addition, allowing people to add to the documentation 
as they see fit. There is of course the R wiki, but this is not widely 
used and not firmly embedded into R itself. But how would we keep the 
system you propose manageable and prevent it from becoming an enormous 
mess? Maybe some kind of moderation?


I note that STATA has all their help files for the latest version of 
stata available on the web (http://www.stata.com/help.cgi?contents).  
How difficult would a similar system - only with R, editable and with 
links to supplementary information - be to set up?  I can't imagine it 
would be horribly expensive in terms of set up costs.
A problem is that there is no company that markets R that could set this 
up, the community is much looser, much more open source. Probably the R 
core team would be the closest thing we have.


What do you 

Re: [R] ANOVA Types and Regression models: the same?

2010-03-02 Thread Frank E Harrell Jr

Sorry, there were 2 typos in my note:


John Fox's posting provided a lot of information.  Briefly, the Types 
refer to whether effects are adjusted for all other effects in the model 
(Types II, III, IV) or no (Type I or sequential tests only adjust for 


no - not

EARLIER terms in the model).  See John's posting for definitions of 
Types II and III and for reasons to almost always use II or III. 


II or III - II over III

Sorry about that
Frank



Re: [R] simple data transformation question

2010-03-02 Thread Albert-Jan Roskam
Hi Henrique,
 
*Thank you!* The reshape code does precisely what I want. 

Cheers!!
Albert-Jan

~~
In the face of ambiguity, refuse the temptation to guess.
~~

--- On Tue, 3/2/10, Henrique Dallazuanna www...@gmail.com wrote:


From: Henrique Dallazuanna www...@gmail.com
Subject: Re: [R] simple data transformation question
To: Albert-Jan Roskam fo...@yahoo.com
Cc: r-help@r-project.org
Date: Tuesday, March 2, 2010, 2:45 PM


Try this:

reshape(cbind(id = as.numeric(factor(dtf$var)), dtf, time = with(dtf,
ave(value, var, FUN = seq_along))), timevar = "time", direction = "wide")

Or:

xtabs(value ~ var + ave(value, var, FUN = seq_along), data = dtf)

On Tue, Mar 2, 2010 at 9:40 AM, Albert-Jan Roskam fo...@yahoo.com wrote:
 Hi all,
 I have a (hopefully) simple newbie-level question.

 # I have data like this:
 dtf <- data.frame(read.table(textConnection("var  value
   company  9887.1
   company  91117.0
   blaah  91.1
   etc  11
   etc  97111"), header=TRUE))

 # I would like to have output like this (the index number may vary):
 var  value.1 value.2
 company 9887.1 91117.0
 blaah  91.1 NA
 etc 11 97111

 # I tried the following.
 library(reshape)
 cast(dtf, var~value, mean) # 'mean' because some function needs to be specified.
 ... this does not do what I want, nor does t(dtf).

 Can somebody help me with the correct transformation, or at least with which 
 function to use best? Thank you in advance!

 Cheers!!
 Albert-Jan

 ~~
 In the face of ambiguity, refuse the temptation to guess.
 ~~








-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



  


[R] nu-SVM crashes in e1071

2010-03-02 Thread LWF
Hello !

I'm using SVMs for multi-class classification problems, so I'm using the 
svm() function in the package e1071.
If I use svm(..., type="C-classification") everything works fine. But if I want 
to use nu-SVM with svm(..., type="nu-classification", nu=0.5), R crashes 
immediately. No error message - just a crash.

Has anybody had the same problem, and maybe found a solution? 
I'm using R 2.10.0 and the latest version of e1071.

Thanks
TIM 

BTW: The same happens with the LibSVM wrapper in Weka. Maybe there is a 
problem in the LibSVM code...

---
 
Tim Häring
Bavarian State Institute of Forest Research 
Department of Forest Ecology
Hans-Carl-von-Carlowitz-Platz 1
D-85354 Freising

E-Mail: tim.haer...@lwf.bayern.de
http://www.lwf.bayern.de



Re: [R] Strange behavior with poisosn and glm

2010-03-02 Thread Gavin Simpson
On Tue, 2010-03-02 at 00:58 -0800, Noah Silverman wrote:
 Ted,
 
 Brilliant explanation (as usual)
 
 I'm back in school, just starting on a post-graduate degree in stats so 
 the help is really appreciated.
 
 Now, I have a slightly trickier question about the same model.
 
 I've seen more than one way to get values out of the glm model.
 
 i.e.  If we're looking at the 10th item in the dataset:
 note: m is the model
 
 fitted(m)[10]
 predict(m,dataset[10,])
 
 give me different results.  From my data, I get the following real results:
   predict(m,data[100,])
   100
 7.727999
   fitted(m)[100]
   179
 3956.637

I find that unlikely - why is one labelled 100 and the other 179, so
perhaps something is wrong here? 

However, that said, those two calls *will* give you different results
because with predict, we can have several types of predictions.
see ?predict.glm and note that the default is for type = link, i.e.
top produce predictions on the scale of the linear predictor/link
function, which then need the inverse of the link function applying to
them.

What does

predict(m, data, type = "response")[100]

and 

fitted(m)[100]

yield?

Do you have missing values etc in your data?
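A minimal sketch of both points with simulated data (the data frame dat and its columns are hypothetical). For a Poisson glm with the default log link, exp() of the type = "link" prediction equals the type = "response" prediction. When rows with missing values are dropped from the fit, fitted() is shorter than the data, so position 10 of fitted() is not row 10 of the data; comparing by row name avoids that mismatch.

```r
set.seed(10)
dat <- data.frame(v1 = runif(50, 0, 5),
                  month = factor(sample(c("january", "february", "march"), 50,
                                        replace = TRUE)))
dat$y <- rpois(50, lambda = exp(0.2 + 0.3 * dat$v1))
dat$v1[c(3, 7)] <- NA                  # two rows with a missing predictor

m <- glm(y ~ v1 + month, family = poisson, data = dat)

eta <- predict(m, newdata = dat, type = "link")      # linear predictor scale
mu  <- predict(m, newdata = dat, type = "response")  # inverse link (exp) applied
all.equal(exp(eta), mu)                # rows 3 and 7 are NA in both

length(fitted(m))                      # 48: fitted() only covers rows used in the fit
names(fitted(m))[10]                   # "12": position 10 is data row 12
all.equal(fitted(m)[["10"]], mu[["10"]])   # compare by row name instead
```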

G

 
  From my understanding, the exp of the prediction should be equal to the 
 fitted value.  Here it is not.  I don't understand why.  Any insight?
 
 -N
 
 
 
 On 3/2/10 12:47 AM, (Ted Harding) wrote:
  On 02-Mar-10 08:02:27, Noah Silverman wrote:
 
  Hi,
  I'm just learning about Poisson links for the glm function.
 
  One of the data sets I'm playing with has several of the
  variables as factors (i.e. month, group, etc.)
 
  When I call the glm function with a formula that has a factor
  variable, R automatically converts the variable to a series of
  variables with unique names and binary values.
 
  For example, with this pseudo data:
 
  y    v1   month
  2    1    january
  3    1.4  february
  1.5  6.3  february
  1.2  4.5  january
  5.5  4.0  march
 
  I use this call:
 
  m <- glm(y ~ v1 + month, family=poisson)
 
  R gives me back a model with variables of
  Intercept
  v1
  monthJanuary
  monthFebruary
  monthMarch
 
  I'm concerned that this might be doing some strange things
  to my model.
  Can anyone offer some enlightenment?
  Thanks!
   
  The creation of auxiliary variables is the way to incorporate
  a factor variable into a model. These are usually called
  dummy variables, and are essentially indicator variables.
 
  Your data above would correspond to variables I (for Intercept),
  J (for January), F (for February) and M (for March) in addition
  to the other variables y and v1 as below:
 
     y    v1   I   J   F   M   #  month
     2    1    1   1   0   0   #  january
     3    1.4  1   0   1   0   #  february
     1.5  6.3  1   0   1   0   #  february
     1.2  4.5  1   1   0   0   #  january
     5.5  4.0  1   0   0   1   #  march
 
  The linear predictor L in the model for y would then be
 
     L = a*I + b*v1 + c1*J + c2*F + c3*M
 
  evaluated arithmetically; e.g. for row 2 of the data it is
 
 a + b*1.4 + c2
 
  However, as given, J + F + M = I, so there is redundancy in
  the variables, since there are only three independent values
  there  (not so if you exclude the Intercept using a model
  formula y ~ v1 + month - 1), so R will provide estimates
  which are computed in terms of some pattern of differences
  between these four variables called contrasts. Different
  patterns of difference present different representations
  of the three independent aspects.
 
  There are many different kinds of contrasts available.
  One of these will be chosen as default by R (depending in
  particular on whether the factor variable is being used
  as an ordered factor or an unordered factor). See ?contrasts
  for an outline of what is there, ?contrast for more detail,
  and look at the help for particular contrasts such as
  ?contr.helmert, ?contr.poly, ?contr.sum, ?contr.treatment.
 
  After all that: No, R is not doing strange things to your model!
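A minimal sketch of how to inspect those indicator columns directly with model.matrix(), using the pseudo data from the question (with the month levels ordered january, february, march by assumption). Under R's default treatment contrasts for unordered factors, the first level is absorbed into the intercept, so only two month columns appear; dropping the intercept with - 1 gives one indicator per month instead.

```r
d <- data.frame(y  = c(2, 3, 1.5, 1.2, 5.5),
                v1 = c(1, 1.4, 6.3, 4.5, 4.0),
                month = factor(c("january", "february", "february", "january", "march"),
                               levels = c("january", "february", "march")))

X <- model.matrix(y ~ v1 + month, data = d)
colnames(X)   # "(Intercept)" "v1" "monthfebruary" "monthmarch"

# Without an intercept, every month gets its own indicator column
colnames(model.matrix(y ~ v1 + month - 1, data = d))

# The full set I, J, F, M plus v1 is rank-deficient because J + F + M = I
X_full <- cbind(I = 1, v1 = d$v1, model.matrix(~ month - 1, data = d))
c(columns = ncol(X_full), rank = qr(X_full)$rank)   # 5 columns, rank 4
```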
 
  ted.
 
  
  E-Mail: (Ted Harding)ted.hard...@manchester.ac.uk
  Fax-to-email: +44 (0)870 094 0861
  Date: 02-Mar-10   Time: 08:47:11
  -- XFMail --
 
 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] 

Re: [R] nu-SVM crashes in e1071

2010-03-02 Thread Uwe Ligges



On 02.03.2010 15:41, Häring, Tim (LWF) wrote:

Hello !

I'm using SVMs for multi-class classification problems, so I'm using the svm() 
function in the package e1071.
If I use svm(..., type="C-classification") everything works fine. But if I want to use 
nu-SVM with svm(..., type="nu-classification", nu=0.5), R crashes immediately. No error 
message - just a crash.

Has anybody had the same problem, and maybe found a solution?
I'm using R 2.10.0 and the latest version of e1071.



Maybe for your unstated OS with unstated version of e1071 on an outdated 
version of R without a reproducible example given.


For my WinXP, R-2.10.1, e1071 1.5-22 I get:

library(e1071)
data(iris)
model <- svm(Species ~ ., data = iris, type = "nu-classification")
model

#Call:
#svm(formula = Species ~ ., data = iris, type = "nu-classification")
#
#
#Parameters:
#   SVM-Type:  nu-classification
# SVM-Kernel:  radial
#  gamma:  0.25
# nu:  0.5
#
#Number of Support Vectors:  103

Uwe Ligges
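Uwe's reply points out that the OS, the R version and the e1071 version were not stated. A minimal sketch of collecting that information in the session where the crash happens (the e1071 line only runs if the package is installed):

```r
# Platform and version details worth including in a report
R.version.string                      # R version and release date
R.version$platform                    # platform R was built for
if (requireNamespace("e1071", quietly = TRUE)) {
  print(packageVersion("e1071"))      # installed e1071 version, if present
}
sessionInfo()                         # full picture: R, OS, attached packages
```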






Thanks
TIM

BTW: The same happens with the LibSVM wrapper in Weka. Maybe there is a 
problem in the LibSVM code...

---
Tim Häring
Bavarian State Institute of Forest Research
Department of Forest Ecology
Hans-Carl-von-Carlowitz-Platz 1
D-85354 Freising

E-Mail: tim.haer...@lwf.bayern.de
http://www.lwf.bayern.de



[R] Fwd: Building R packages in Windows 7

2010-03-02 Thread Eric Ferreira
-- Forwarded message --
From: Duncan Murdoch murd...@stats.uwo.ca
Date: 25 February 2010 22:35
Subject: Re: [R] Building R packages in Windows 7
To: RHelp r-help@r-project.org, Eric Ferreira ericbferre...@gmail.com


On 25/02/2010 7:56 PM, Eric Ferreira wrote:

 Thank you, Sir, but how can I get it to create HTML files?


The tools::Rd2HTML function will do the translation, but it takes a bit
of work to prepare input for it.  The idea is that you don't need to
save the files, R just produces them when the browser asks for them.
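For readers curious what "a bit of work to prepare input" looks like, here is a minimal sketch, assuming a self-contained toy help page: the Rd text below is hypothetical and written to a temporary file, parsed with tools::parse_Rd(), and rendered with tools::Rd2HTML(). Real package Rd files may also need the package name and link settings that R normally supplies when it serves help pages on demand.

```r
# A toy Rd file written to a temporary location (hypothetical content).
rd_file <- tempfile(fileext = ".Rd")
writeLines(c("\\name{foo}",
             "\\alias{foo}",
             "\\title{Foo: a toy help page}",
             "\\description{Shows an Rd-to-HTML conversion.}"),
           rd_file)

# Parse the Rd source into an Rd object, then render it as HTML into a file.
rd <- tools::parse_Rd(rd_file)
html_file <- tempfile(fileext = ".html")
tools::Rd2HTML(rd, out = html_file)

# Read the generated page back, e.g. to inspect it.
html <- readLines(html_file)
```

Links to other help pages in such a standalone file will generally not resolve, which matches the caveat about prebuilt HTML below.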

There's an option --enable-prebuilt-html that can be used when
installing a package, but I don't recommend using it.  You'll get the
help page as it exists at install time, not as it is intended to be
displayed at run time.  Links to other packages likely won't work properly.

Duncan Murdoch

P.S. Please copy your responses to the group, so that others can see the
questions and answers.





 On 25 February 2010 14:41, Duncan Murdoch murd...@stats.uwo.ca wrote:

  On 25/02/2010 11:49 AM, Eric Ferreira wrote:

  Ok,

 I'm working under:
 Windows 7 Professional 32bits, 4 GB RAM, 320 GB HD, Intel Core 2 Duo
 processor
 R 2.10.1

 I've installed:
 Rtools211
 MikteX 2.8
 HTML Help Workshop

 Setting my PATH to:
 c:\Rtools\bin;c:\Rtools\perl\bin;c:\Rtools\MinGW\bin;c:\Arquivos de
 Programas\R\R-2.10.1pat\bin;c:\Arquivos de Programas\MikTeX
 2.8\miktex\bin;c:\Program Files\HTML Help Workshop

 ...creating the package called ExpDes and asking (at the prompt) :

 Rcmd build --binary ExpDes

 Among others, a warning message is printed: WARNING: some HTML links may
 not be found, and no html files are produced.


  Right, HTML help files are produced on demand; they aren't stored in the
 binary package zip file.  HTML Help Workshop is not being used at all.

 Duncan Murdoch

  Thank you again.






 On 25 February 2010 13:02, Duncan Murdoch murd...@stats.uwo.ca wrote:

  On 25/02/2010 10:56 AM, Eric Ferreira wrote:

  This is my first package. I'm just getting started doing that, following
 the steps described on your website... I really don't know how I am asking
 for CHMs to be produced, sorry.


  All I can suggest is that you need to be less stingy with information.
  Tell us what you did.  Tell us what symptoms you saw.  Do both of those
 by cut and paste from your console, don't paraphrase, or refer to vague
 instructions like your website.

 Duncan Murdoch


  On 25 February 2010 12:52, Duncan Murdoch murd...@stats.uwo.ca wrote:

 On 25/02/2010 10:40 AM, Eric Ferreira wrote:

  Dear Duncan

 Thanks so much for your reply.
 Actually, I'm using the latest version of R and the problem persists.
 What do you use instead of HTML Help Workshop for newer R versions?


  We just produce text and HTML help pages on demand, and LaTeX ones for
 the pdf manuals.  How are you asking for CHMs to be produced?

 Duncan Murdoch

  Best regards

 Eric.

 On 25 February 2010 11:43, Duncan Murdoch murd...@stats.uwo.ca wrote:

 On 25/02/2010 9:06 AM, Eric Ferreira wrote:

  Dear useRs,

 I'm having trouble building R packages in Windows 7 regarding HTML help
 Workshop.
 Pointing PATH to c:\Program Files\HTML help Workshop works in other
 versions of Windows (e.g. Vista), but not in Windows 7.

 Some tips??


  We haven't used the HTML Help Workshop since R 2.10.0, so you could
 upgrade to the current R, and the problem will go away.

  Otherwise, I think you'll have to ask Microsoft for help.  But they
 aren't likely to be helpful:  Win XP is the most recent OS listed as
 supported.

 Duncan Murdoch





















-- 
Dr Eric B Ferreira
Exact Sciences Department
Federal University of Alfenas
Brazil



Re: [R] nu-SVM crashes in e1071

2010-03-02 Thread LWF

  I'm using SVMs for multi-class classification problems, so I'm
 using the svm() function in the package e1071.
  If I use svm(..., type = "C-classification") everything works fine. But
 if I want to use nu-SVM with svm(..., type = "nu-classification", nu = 0.5),
 R crashes immediately. No error message, just a crash.
 
  Has anybody had the same problem, and maybe found a solution?
  I'm using R 2.10.0 and the latest version of e1071.
 
 
 Maybe for your unstated OS with unstated version of e1071 on an
 outdated
 version of R without a reproducible example given.
 
 For my WinXP, R-2.10.1, e1071 1.5-22 I get:
 
 library(e1071)
 data(iris)
 model <- svm(Species ~ ., data = iris, type = "nu-classification")
 model
 

OK, sorry for the sparse information.
I just made an update to R-2.10.1 and e1071 version 1.5-22 on WinXP.
I can reproduce the example with the iris dataset. However R crashes when I 
call svm() with my dataset

model <- svm(soil_unit ~ ., data = traindat, type = "nu-classification")

My dataset consists of 9259 observations of 14 variables. My target variable is a 
factor with 22 levels (multi-class classification). The predictors are 12 
numeric variables and 1 factor.

I hope this information is enough.

TIM
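With 22 classes, one thing that may be worth ruling out before filing a report is whether nu = 0.5 is even feasible for every pair of classes. As far as I know, libsvm's parameter check rejects nu-classification when, for some pair of classes with sizes n_i and n_j, nu * (n_i + n_j) / 2 > min(n_i, n_j); libsvm normally reports this as an error ("specified nu is infeasible") rather than crashing, so this is a diagnostic sketch, not an explanation of the crash. The helper name max_feasible_nu() is hypothetical.

```r
# Largest nu that passes the pairwise feasibility bound for nu-classification,
# given the class labels 'y' (a factor). Unused factor levels are dropped first.
max_feasible_nu <- function(y) {
  n <- as.vector(table(droplevels(y)))
  stopifnot(length(n) >= 2)
  # For each pair of classes (i, j) the bound is 2 * min(n_i, n_j) / (n_i + n_j)
  bound <- outer(n, n, function(a, b) 2 * pmin(a, b) / (a + b))
  min(bound[upper.tri(bound)])
}

max_feasible_nu(iris$Species)   # three balanced classes of 50 each
```

With the poster's data this would be max_feasible_nu(traindat$soil_unit); a result below 0.5 would mean the nu used above is infeasible for at least one pair of soil units.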



Re: [R] Random real numbers

2010-03-02 Thread Paul Hiemstra

frederik vanhaelst wrote:

Hi,

How can I generate random real numbers between 0 and 2*pi?

Thanks,

Frederik

  
Googling for "R generate random number" gave this as the second hit (on 
my machine):


http://blog.revolution-computing.com/2009/02/how-to-choose-a-random-number-in-r.html
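For this particular question, assuming uniformly distributed values are wanted, base R's runif() takes the interval bounds directly; a minimal sketch:

```r
set.seed(123)                               # only to make the example reproducible
theta <- runif(10, min = 0, max = 2 * pi)   # 10 uniform draws from (0, 2*pi)
range(theta)
```

runif() does not return the endpoints themselves (unless max equals min), so the values lie strictly between 0 and 2*pi.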

cheers,
Paul

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul



[R] Reading data file with both fixed and tab-delimited fields

2010-03-02 Thread Marshall Feldman
Hello R wizards,

What is the best way to read a data file containing both fixed-width and 
tab-delimited fields? (More detail follows.)

Details:
The U.S. Bureau of Labor Statistics provides local area unemployment 
statistics at ftp://ftp.bls.gov/pub/time.series/la/, and the data are 
documented in the file la.txt 
(ftp://ftp.bls.gov/pub/time.series/la/la.txt). Each data file has five 
tab-delimited fields:

* series_id
* year
* period (codes for things like quarter or month of year)
* value
* footnote_codes

The series_id consists of five fixed-width subfields (length in 
parentheses):

* survey abbreviation (2)
* seasonal code (1)
* area type code (2)
* area code (6)
* measure code (2)

So an example record might be:

LASPS36040003   1990   M01   8.8   L

I want to read in the data in one pass and convert them to a data frame with 
the following columns (actual name, class in parentheses):

Survey abbreviation (survey, character)
Seasonal (seasonal, logical seasonal=T)
Area type (area_type_code, factor)
Area (area_code, factor)
Measure (measure_code, factor)
Year (year, Date)
Period (period, factor)
Value (value, numeric)
Footnote (footnote_codes, character but see note)

(Regarding the Footnote, I have to look at the data more. If there's 
just one code per record, this will be a factor; if there are multiple, 
it will either be character or a list. For now I'm keeping it as 
character.)

Currently I can read the data just fine using read.table, but this makes 
series_id the first variable. I want to break out the subfields as 
separate columns.

Any suggestions?

Thanks.
 Marsh Feldman
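One base-R approach, sketched under assumptions: read the five tab-delimited fields as character with read.table(), then cut series_id into its subfields by character position with substr(), using the widths listed above (2, 1, 2, 6, 2). The two records are made up to match that layout; real BLS files may use different widths, so check la.txt. Year is kept as an integer because a bare year is not a Date; the text = argument of read.table() needs a newer R than the 2.10 series (older versions can use textConnection() instead).

```r
# Two sample records in the layout described above (hypothetical values).
# series_id is often padded with trailing blanks, so the first one is padded.
lines <- c("LASPS36040003   \t1990\tM01\t8.8\tL",
           "LAUPS36040003\t1990\tM02\t9.1\tP")

# Step 1: read the five tab-delimited fields as character, trimming blanks.
raw <- read.table(text = lines, sep = "\t", colClasses = "character",
                  strip.white = TRUE,
                  col.names = c("series_id", "year", "period", "value",
                                "footnote_codes"))

# Step 2: split series_id into its fixed-width subfields by position.
id <- raw$series_id
la <- data.frame(
  survey         = substr(id, 1, 2),
  seasonal       = substr(id, 3, 3) == "S",  # assumes "S" marks seasonally adjusted
  area_type_code = factor(substr(id, 4, 5)),
  area_code      = factor(substr(id, 6, 11)),
  measure_code   = factor(substr(id, 12, 13)),
  year           = as.integer(raw$year),     # bare year, so integer not Date
  period         = factor(raw$period),
  value          = as.numeric(raw$value),
  footnote_codes = raw$footnote_codes,
  stringsAsFactors = FALSE
)
str(la)
```

For a real file, replace text = lines with the file name; everything after the read stays the same.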






Re: [R] plotting a subset of a time series

2010-03-02 Thread davidr
The indexing in xts is very nice; it may do what you want.

library(xts)
x.xts <- as.xts(x)
plot(x.xts)
plot(x.xts['2005::2006-10'])

HTH,
David Reiner

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Erin Hodgess
Sent: Tuesday, March 02, 2010 1:20 AM
To: R help
Subject: [R] plotting a subset of a time series

Dear R People:

I have the following time series and plot:
 x <- ts(rnorm(50),start=2005,freq=12)
 plot(x)


which works fine.

I would like to plot a subset of that time series, which I did with:
 plot(window(x,2005,2006.83))

Is there a better way to do this, please?

Thanks,
Erin



-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com





Re: [R] nu-SVM crashes in e1071

2010-03-02 Thread Uwe Ligges



On 02.03.2010 17:05, Häring, Tim (LWF) wrote:



I'm using SVMs for multi-class classification problems, so I'm
using the svm() function in the package e1071.
If I use svm(..., type = "C-classification") everything works fine. But
if I want to use nu-SVM with svm(..., type = "nu-classification", nu = 0.5),
R crashes immediately. No error message, just a crash.

Has anybody had the same problem, and maybe found a solution?
I'm using R 2.10.0 and the latest version of e1071.



Maybe for your unstated OS with unstated version of e1071 on an
outdated
version of R without a reproducible example given.

For my WinXP, R-2.10.1, e1071 1.5-22 I get:

library(e1071)
data(iris)
model <- svm(Species ~ ., data = iris, type = "nu-classification")
model



OK, sorry for the sparse information.
I just made an update to R-2.10.1 and e1071 version 1.5-22 on WinXP.
I can reproduce the example with the iris dataset. However R crashes when I 
call svm() with my dataset

model <- svm(soil_unit ~ ., data = traindat, type = "nu-classification")

My dataset consists of 9259 observations of 14 variables. My target variable is a 
factor with 22 levels (multi-class classification). The predictors are 12 
numeric variables and 1 factor.

I hope this information is enough.



Well, you might want to send the *reproducible* example (i.e. including 
data that reproduces a crash) to the e1071 maintainer (CCing David). 
He may be unable to help if the problem lies in the underlying libsvm 
code, in which case it might be better to contact the libsvm 
maintainers.
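One common way to ship data along with such a report is dput(), which writes R code that recreates an object. A minimal sketch follows; the traindat built here is a hypothetical stand-in with made-up columns, since the real data are not shown, and in practice one would check that the reduced subset still triggers the crash before sending it.

```r
# Stand-in for the poster's 'traindat' (hypothetical: the real data are not shown).
set.seed(1)
traindat <- data.frame(
  soil_unit = factor(sample(c("A", "B", "C"), 300, replace = TRUE)),
  elev      = rnorm(300, mean = 500, sd = 50),
  slope     = runif(300, 0, 30)
)

# Take a random subset to keep the report small.
small <- traindat[sample(nrow(traindat), 100), ]

# dput() writes R code that recreates the object; this text can go in an e-mail.
txt <- capture.output(dput(small))

# The recipient rebuilds the data frame by evaluating that text.
restored <- eval(parse(text = txt))
```

By default dput() writes doubles with about 15 significant digits, so the restored numbers match to within floating-point tolerance rather than bit for bit.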


Uwe Ligges




TIM




Re: [R] plotting a subset of a time series

2010-03-02 Thread Gabor Grothendieck
Try this:

plot(window(x, end = c(2006, 10)))
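A small self-contained check of this suggestion: giving end as c(year, period) avoids working out the decimal time of October 2006 by hand. The seed and series are only illustrative.

```r
set.seed(42)                                    # reproducible example data
x <- ts(rnorm(50), start = 2005, frequency = 12)

# End the window at October 2006, given as (year, period) instead of a decimal
y <- window(x, end = c(2006, 10))
start(y)    # 2005 1
end(y)      # 2006 10
length(y)   # 22 months: all of 2005 plus January to October 2006
plot(y)
```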

On Tue, Mar 2, 2010 at 2:19 AM, Erin Hodgess erinm.hodg...@gmail.com wrote:
 Dear R People:

 I have the following time series and plot:
 x <- ts(rnorm(50),start=2005,freq=12)
 plot(x)


 which works fine.

 I would like to plot a subset of that time series, which I did with:
 plot(window(x,2005,2006.83))

 Is there a better way to do this, please?

 Thanks,
 Erin



 --
 Erin Hodgess
 Associate Professor
 Department of Computer and Mathematical Sciences
 University of Houston - Downtown
 mailto: erinm.hodg...@gmail.com



Re: [R] Random real numbers

2010-03-02 Thread Barry Rowlingson
On Tue, Mar 2, 2010 at 4:05 PM, Paul Hiemstra p.hiems...@geo.uu.nl wrote:
 frederik vanhaelst wrote:

 Hi,

 How can I generate random real numbers between 0 and 2*pi?

 Thanks,


 Googling for "R generate random number" gave this as the second hit (on my
 machine):

 http://blog.revolution-computing.com/2009/02/how-to-choose-a-random-number-in-r.html


If the original poster wanted real random numbers instead of random
real numbers:

http://finzi.psych.upenn.edu/R/library/random/html/random.html

but I'm not sure how best to convert those real random integers into
real random reals (between 0 and 2pi).
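One simple possibility, sketched under the assumption that the integers are uniform on 0, 1, ..., M - 1 (how they are obtained is not shown here): scale them onto [0, 2*pi). The results lie on a grid with spacing 2*pi/M, so resolution is limited by M; combining two independent draws as k1 * M + k2 gives a grid of M^2 points. The helper name ints_to_angles() is hypothetical.

```r
# Hypothetical helper: map integers assumed uniform on 0, ..., M - 1
# onto the interval [0, 2*pi).
ints_to_angles <- function(k, M) {
  stopifnot(all(k == floor(k)), all(k >= 0), all(k < M))
  2 * pi * k / M
}

ints_to_angles(c(0, 250, 500, 750), M = 1000)   # 0, pi/2, pi, 3*pi/2
```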

Barry



Re: [R] kriging with geoR package

2010-03-02 Thread patrocle

hi all.
If anyone has the same problem, this is the answer:

- vertical legend:
 legend.krige(x.leg=c(X,X),y.leg=c(X,X),kr$pred,vert=TRUE,col=gray(seq(.7,0,l=10)))

- sample positions on the map: pass coords.dat=table$coords, as in

image(kr,col=gray(seq(.7,0,l=10)),xlim=c(-1,55),ylim=c(0,53),coords.dat=fau1$coords)

- for the grey scale, the help provided with the software is useful




Re: [R] capturing errors in Sweave

2010-03-02 Thread Berwin A Turlach
G'day Sundar,

On Tue, 2 Mar 2010 01:03:54 -0800
Sundar Dorai-Raj sdorai...@gmail.com wrote:

 Thanks, Berwin. That works just great!

You are welcome.

I have since noticed that cat(tmp) is sufficient; the [1] in 
cat(tmp[1]) was left over from earlier attempts to get the output
to look correct.
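The earlier part of this thread is not included here, so the following is only a sketch assuming tmp holds the result of try(..., silent = TRUE). In that case tmp is a character string of class "try-error", and cat() prints the error message as plain text, without the [1] prefix or quotes that print() would add. The failing function f() is hypothetical.

```r
f <- function() stop("something went wrong")   # hypothetical failing code
tmp <- try(f(), silent = TRUE)

class(tmp)   # "try-error"
cat(tmp)     # prints the error message without the [1] prefix or quotes
```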

Cheers,

Berwin



Re: [R] nu-SVM crashes in e1071

2010-03-02 Thread Steve Lianoglou
Hi,

On Tue, Mar 2, 2010 at 11:05 AM, Häring, Tim (LWF)
tim.haer...@lwf.bayern.de wrote:

  I'm using SVMs for multi-class classification problems, so I'm
 using the svm() function in the package e1071.
  If I use svm(..., type = "C-classification") everything works fine. But
 if I want to use nu-SVM with svm(..., type = "nu-classification", nu = 0.5),
 R crashes immediately. No error message, just a crash.
 
  Has anybody had the same problem, and maybe found a solution?
  I'm using R 2.10.0 and the latest version of e1071.


 Maybe for your unstated OS with unstated version of e1071 on an
 outdated
 version of R without a reproducible example given.

 For my WinXP, R-2.10.1, e1071 1.5-22 I get:

 library(e1071)
 data(iris)
 model <- svm(Species ~ ., data = iris, type = "nu-classification")
 model


 OK, sorry for the sparse information.
 I just made an update to R-2.10.1 and e1071 version 1.5-22 on WinXP.
 I can reproduce the example with the iris dataset. However R crashes when I 
 call svm() with my dataset

 model <- svm(soil_unit ~ ., data = traindat, type = "nu-classification")

 My dataset consists of 9259 observations of 14 variables. My target variable is a 
 factor with 22 levels (multi-class classification). The predictors are 
 12 numeric variables and 1 factor.

 I hope this information is enough.

While you're sending your bug report to David, perhaps you can try the
SVM from kernlab.

It relies on code from libsvm, too, but ... you never know. It can't
hurt to try.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



[R] problem with choose.files

2010-03-02 Thread Caleb Rounds
I have recently upgraded to R 2.10.1 on Windows XP and am using
scripts that I've used in previous versions successfully. I'm having a
problem with choose.files. The lines read:

fura_scan_file <- choose.files(caption = "Select log file (*.log) for fura-2 scans")
PI_scan_file <- choose.files(caption = "Select log file (*.log) for PI scans")


The problem is that the directory chosen in the first choose.files call
is not remembered. This is an issue because my files are nested inside
several directories and it takes a lot of clicking to get to where I
need to be. Is there a problem with these lines, or is the cause likely
elsewhere in the script?

I apologize for my ignorance and for wasting your time, but the
documentation for choose.files suggests this should happen
automatically.
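A possible workaround, sketched under an assumption: pass the folder of the first selection to the second dialog through choose.files()'s default argument. That a value such as "C:/dir/*.log" makes the dialog open in that folder is my reading of ?choose.files, not something confirmed in this thread. The helper next_default() is hypothetical, and the dialogs are only opened in an interactive Windows session, since choose.files() is Windows-only.

```r
# Hypothetical helper: build the 'default' for the next dialog from the folder
# of a previously chosen file, showing only *.log files there.
next_default <- function(prev, pattern = "*.log") {
  if (length(prev) == 0) return("")   # the first dialog was cancelled
  file.path(dirname(prev[1]), pattern)
}

# choose.files() exists only on Windows, so call it interactively there only.
if (interactive() && .Platform$OS.type == "windows") {
  fura_scan_file <- choose.files(caption = "Select log file (*.log) for fura-2 scans")
  PI_scan_file   <- choose.files(default = next_default(fura_scan_file),
                                 caption = "Select log file (*.log) for PI scans")
}
```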

Caleb Rounds



Re: [R] Reading data file with both fixed and tab-delimited fields

2010-03-02 Thread Chidambaram Annamalai
I tried to shoehorn the read.* functions into matching both the fixed-width
and the variable-width fields in the data, but no way of doing it seems
evident to me. (read.fwf reads fixed-width data properly, but the rest of the
fields must be processed separately; maybe insert NULL stubs in the remaining
fields and fill them in later?)

One way is to sidestep the entire issue and convert the structured data you
have into a csv file using sed (usually available on most *nix systems) with
something like so:

cat data | sed -r 's/^(..)(.)(..)(.{6})(..)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)/\1,\2,\3,\4,\5,\6,\7,\8,\9/' | less

and see if the output is alright, then use the resulting .csv file directly in
R with read.csv.

If that does not satisfy you, maybe the R wizards on the list can point you to
a native R way of doing this, possibly using scan? I'm not sure though.
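The same capture-group idea can be expressed in base R, which keeps everything inside R: sub() applies an equivalent regular expression to each line to produce CSV text, and read.csv() parses it. This is a sketch assuming the field widths stated in the original question; the two sample records are made up. The text = argument of read.csv() needs a newer R than the 2.10 series; with older versions, textConnection(csv) can be passed instead.

```r
# Two sample records (hypothetical values); the first series_id is blank-padded.
lines <- c("LASPS36040003   \t1990\tM01\t8.8\tL",
           "LAUPS36040003\t1990\tM02\t9.1\tP")

# Same capture groups as the sed command: five fixed-width pieces of series_id,
# then year, period, value and footnote separated by blanks/tabs.
pattern <- paste0("^(..)(.)(..)(.{6})(..)[ \t]+",
                  "([^ \t]+)[ \t]+([^ \t]+)[ \t]+([^ \t]+)[ \t]*([^ \t]*)$")
csv <- sub(pattern, "\\1,\\2,\\3,\\4,\\5,\\6,\\7,\\8,\\9", lines)

# Read everything as character first; convert column classes afterwards.
la <- read.csv(text = csv, header = FALSE, colClasses = "character",
               col.names = c("survey", "seasonal", "area_type_code",
                             "area_code", "measure_code", "year", "period",
                             "value", "footnote_codes"))
```

Reading as character keeps codes such as "03" intact; the numeric and factor conversions can then be applied column by column.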

Hope this helps,
Chillu

On Tue, Mar 2, 2010 at 9:42 PM, Marshall Feldman ma...@uri.edu wrote:

 Hello R wizards,

 What is the best way to read a data file containing both fixed-width and
 tab-delimited fields? (More detail follows.)

 Details:
 The U.S. Bureau of Labor Statistics provides local area unemployment
 statistics at ftp://ftp.bls.gov/pub/time.series/la/, and the data are
 documented in the file la.txt
 (ftp://ftp.bls.gov/pub/time.series/la/la.txt). Each data file has five
 tab-delimited fields:

* series_id
* year
* period (codes for things like quarter or month of year)
* value
* footnote_codes

 The series_id consists of five fixed-width subfields (length in
 parentheses):

* survey abbreviation (2)
* seasonal code (1)
* area type code (2)
* area code (6)
* measure code (2)

 So an example record might be:

 LASPS36040003   1990   M01   8.8   L

 I want to read in the data in one pass and convert them to a data frame
 with the following columns (actual name, class in parentheses):

Survey abbreviation (survey, character)
Seasonal (seasonal, logical seasonal=T)
Area type (area_type_code, factor)
Area (area_code, factor)
Measure (measure_code, factor)
Year (year, Date)
Period (period, factor)
Value (value, numeric)
Footnote (footnote_codes, character but see note)

 (Regarding the Footnote, I have to look at the data more. If there's
 just one code per record, this will be a factor; if there are multiple,
 it will either be character or a list. For now I'm keeping it as
 character.)

 Currently I can read the data just fine using read.table, but this makes
 series_id the first variable. I want to break out the subfields as
 separate columns.

 Any suggestions?

 Thanks.
 Marsh Feldman






[R] add a header to a forest plot (metafor)

2010-03-02 Thread Sebastian Stegmann
Dear R-community,

 

I'm currently trying to assemble a forest plot using the forest function
from package metafor.

It works well. Even the regular main argument works for adding a title to the
graph. 

 

However, I would like to add one top row which explains the nature of the
columns, very much like the usual header in spreadsheet programs.

For example: Study   Sample   Sample Size   Estimated Effect Size   95% CI.

 

I tried to add axis(3), but apparently the forest plot isn't that kind of
graphic.

 

Does anyone have any idea? 

 

Cheerio

Sebastian

 




Re: [R] Reading data file with both fixed and tab-delimited fields

2010-03-02 Thread Marshall Feldman
Ah, I should have mentioned this. Personally I work on Macs (Leopard) 
and PCs (XP Pro and XP Pro x64). Even though the PCs do have Cygwin, 
I'm trying to make this code portable, so I want to avoid tools such as 
sed, perl, etc.

I want to do this in R, even if processing is a bit slower. Eventually, 
I'll hide the code in a class, so the code can be a bit complex.

 Marsh Feldman

On 3/2/2010 12:29 PM, Chidambaram Annamalai wrote:
 I tried to shoehorn the read.* functions into matching both the fixed-width
 and the variable-width fields in the data, but no way of doing it seems
 evident to me. (read.fwf reads fixed-width data properly, but the rest of
 the fields must be processed separately; maybe insert NULL stubs in the
 remaining fields and fill them in later?)

 One way is to sidestep the entire issue and convert the structured
 data you have into a csv file using sed (usually available on most
 *nix systems) with something like so:

 cat data | sed -r 's/^(..)(.)(..)(.{6})(..)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)/\1,\2,\3,\4,\5,\6,\7,\8,\9/' | less

 and see if the output is alright, then use the resulting .csv file
 directly in R with read.csv.

 If that does not satisfy you, maybe the R wizards on the list can
 point you to a native R way of doing this, possibly using scan? I'm
 not sure though.

 Hope this helps,
 Chillu

 On Tue, Mar 2, 2010 at 9:42 PM, Marshall Feldman ma...@uri.edu wrote:

 Hello R wizards,

 What is the best way to read a data file containing both
 fixed-width and tab-delimited fields? (More detail follows.)

 Details:
 The U.S. Bureau of Labor Statistics provides local area unemployment
 statistics at ftp://ftp.bls.gov/pub/time.series/la/, and the data are
 documented in the file la.txt
 (ftp://ftp.bls.gov/pub/time.series/la/la.txt). Each data file has five
 tab-delimited fields:

* series_id
* year
* period (codes for things like quarter or month of year)
* value
* footnote_codes

 The series_id consists of five fixed-width subfields (length in
 parentheses):

* survey abbreviation (2)
* seasonal code (1)
* area type code (2)
* area code (6)
* measure code (2)

 So an example record might be:

 LASPS36040003   1990   M01   8.8   L

 I want to read in the data in one pass and convert them to a data
 frame with the following columns (actual name, class in parentheses):

Survey abbreviation (survey, character)
Seasonal (seasonal, logical seasonal=T)
Area type (area_type_code, factor)
Area (area_code, factor)
Measure (measure_code, factor)
Year (year, Date)
Period (period, factor)
Value (value, numeric)
Footnote (footnote_codes, character but see note)

 (Regarding the Footnote, I have to look at the data more. If there's
 just one code per record, this will be a factor; if there are
 multiple, it will either be character or a list. For now I'm keeping
 it as character.)

 Currently I can read the data just fine using read.table, but this
 makes series_id the first variable. I want to break out the subfields as
 separate columns.

 Any suggestions?

 Thanks.
 Marsh Feldman







-- 
Dr. Marshall Feldman, PhD
Director of Research and Academic Affairs
Center for Urban Studies and Research
The University of Rhode Island
email: marsh @ uri .edu (remove spaces)


  Contact Information:


Kingston:

202 Hart House
Charles T. Schmidt Labor Research Center
The University of Rhode Island
36 Upper College Road
Kingston, RI 02881-0815
tel. (401) 874-5953
fax: (401) 874-5511


Providence:

206E Shepard Building
URI Feinstein Providence Campus
80 Washington Street
Providence, RI 02903-1819
tel. (401) 277-5218
fax: (401) 277-5464



[R] adding row ID numbers by group

2010-03-02 Thread Alexander Schwall
Hello R community,

I am hoping for some help with the following problem.

I have a data frame containing various groups. These groups are identified
by a grouping variable. I would like to add a sequential ID number to each
group to later sort these individuals within each group by this ID number.

Here is what the final result should look like:

ID  group  var2
 1      1     1
 2      1     2
 3      1     3
 4      1     4
 1      2     5
 2      2     6
 3      2     7
 4      2     8
 5      2     9
 1      3    10
 2      3    11
 3      3    12
 4      3    13
 5      3    14


I have written the following code, which loops through the data and compares
each row's grouping variable with that of the following row. If a row differs
from the next one, the ID number is reset and counting starts again. The
problem I am encountering is that, at the bottom of the data frame, the if
statement has no following row against which to compare the last row.

Here is what I did:

group <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
var2 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)

data <- data.frame(group, var2)
data

#IDN is the desired ID number by group
IDN <- numeric(length(data$var2))
IDN


for (i in 1:(length(data$group))) {
  if(data[i,1] < (length(data$group))){
   if(data[i,1] == data[i+1,1]){
  IDN[i] <- sum(IDN[i-1],1)}
   else{
  IDN[i] <- -55} #for now an arbitrary value
  }
  if(data[i,1] == (length(data$group))) {
  IDN[i] <- 99 #for now an arbitrary value
  }
  }

IDN



Is there an easier way to do this? Any thoughts would be much
appreciated, since I am running out of ideas.
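A loop-free sketch using base R's ave(), which splits its first argument by the grouping variable, applies a function to each piece and returns the results in the original row order; with FUN = seq_along each group is numbered 1, 2, ... This does not require the rows to be sorted by group. The data frame is named dat here only to avoid masking utils::data().

```r
group <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
var2  <- 1:15
dat   <- data.frame(group, var2)

# Number the rows within each group: 1, 2, ... restarting at every new group
dat$ID <- ave(dat$var2, dat$group, FUN = seq_along)

# If each group's rows are contiguous (as here), rle() + sequence() also works
dat$ID2 <- sequence(rle(dat$group)$lengths)

dat
```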

Thanks
Alexander



Re: [R] two questions for R beginners

2010-03-02 Thread Duncan Murdoch

On 02/03/2010 11:53 AM, William Dunlap wrote:

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin

 Sent: Tuesday, March 02, 2010 3:46 AM
 To: Karl Ove Hufthammer; r-h...@stat.math.ethz.ch
 Subject: Re: [R] two questions for R beginners
 
 Please take what follows not as an ad hominem statement, but 
 rather as an attempt to improve what is already an excellent 
 program, that has been built as a result of many, many hours 
 of dedicated work by many, many unpaid, unsung volunteers.
 
 It troubles me a bit that when a confusing aspect of R is 
 pointed out the response is not to try to improve the 
 language so as to avoid the confusion, but rather to state 
 that the confusion is inherent in the language. I understand 
 that to make changes that would avoid the confusing aspect of 
 the language that has been discussed in this thread would 
 take time and effort by an R wizard (which I am not), time 
 and effort that would not be compensated in the traditional 
 sense. This does not mean that we should not acknowledge the 
 confusion. If we want R to be the de facto lingua franca of 
 statistical analysis, doesn't it make sense to strive for 
 syntax that is as straightforward and consistent as possible? 


Whenever one changes the language that way old code
will break. 
I think in this case not much code would break.  Mostly when people have 
a matrix M and ask for M$column they'll get an error; the proposal is 
that they'll get the requested column.  (It is possible to have a list 
with names that is also a matrix with dimnames, but I think that is a 
pretty unusual construction.)  But I haven't been convinced that the 
proposal is a net improvement to the language. 


Duncan Murdoch


 The developers can, with a lot of effort,
fix their own code, and perhaps even user-written code
on CRAN, but code that thousands of users have written
will break.  There is a lot of code out there that was
written by trial and error and by folks who no longer
work at an institution: the code works but no one knows
exactly why it works.  Telling folks they need to change
that code because we have a cleaner but different syntax
now is not good.  Why would one spend time writing a
package that might stop working when R is upgraded?

I think the solution is not to change current semantics
but to write functions that behave better and encourage
users to use them, gradually abandoning the old constructs.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 
 Again, please understand that my comment is made with deepest 
 respect for the many people who have unselfishly contributed 
 to the R project. Many thanks to each and every one of you.
 
 John
 
 
  Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM 
 On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca wrote:
  Suppose X is a dataframe or a matrix.  What would you expect to get from
  X[1]?  What about as.vector(X), or as.numeric(X)?
 
 All this of course depends on the type of object one is speaking of. There
 are plenty of surprises available, and it's best to use the most logical
 way of extracting. E.g., to extract the top-left element of a 2D
 structure (data frame or matrix), use 'X[1,1]'.
 
 Luckily, R provides some shortcuts. For example, you can write 'X[2,3]'
 on a data frame, just as if it were a matrix, even though the underlying
 structure is completely different. (This doesn't work on a normal list;
 there you have to type the whole 'X[[3]][2]'.)
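To see the index order concretely: 'X[2,3]' is row 2, column 3, and because a data frame stores its columns as list components, the plain-list spelling picks the column first. A sketch with a made-up data frame X:

```r
# Hypothetical three-column data frame and the same columns as a plain list
X <- data.frame(x = 1:3, y = c(10, 20, 30), z = c("p", "q", "r"))
L <- as.list(X)

X[2, 3]      # row 2, column 3 of the data frame
L[[3]][2]    # plain-list equivalent: take the 3rd component, then its 2nd element

# Matrix-style indexing is not defined for a plain list, so this is an error
bad <- inherits(try(L[2, 3], silent = TRUE), "try-error")
```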
 
 The behaviour of the 'as.' functions may sometimes be surprising, at 
 least for me. For example, 'as.data.frame' on a named vector gives a 
 single-column data frame, instead of a single-row data frame.
 
 (I'm not sure what the recommended way of converting a named vector to a
 one-row data frame is, but 'as.data.frame(t(X))' works, even though both 'X'
 and 't(X)' look like a row of numbers.)
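A sketch of the two conversions with a made-up named vector x; 'as.data.frame(as.list(x))' is another way to get the single-row form:

```r
x <- c(a = 1, b = 2, c = 3)      # hypothetical named vector

as.data.frame(x)                 # 3 rows x 1 column; the names become row names
as.data.frame(t(x))              # t() gives a 1 x 3 matrix, hence 1 row with columns a, b, c
as.data.frame(as.list(x))        # a list of length-one columns is also a one-row data frame
```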
 
  The point is that a dataframe is a list, and a matrix isn't.  If users
  don't understand that, then they'll be confused somewhere.  Making
  matrices more list-like in one respect will just move the confusion
  elsewhere.  The solution is to understand the difference.
 
 My main problem is not understanding the difference, which is easy, but
 knowing which type of object I have when I get the output of a function in a
 package. If I know the object is a named vector or a matrix with column
 names, it's easy enough to type 'X[,colname]', and if it's a data
 frame one may use the shortcut 'X$colname'.
 
 Usually, it *is* documented what the return value of a function is, but
 just looking at the output is much faster, and *usually* gives the
 correct answer.
 
 For example, 'mean' applied on a data frame gives a named vector, not a
 data frame, which is somewhat surprising (given that the columns of a
 data frame may be of different types, while the elements of a vector may
 not).
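A hedged note for readers on current R: the data-frame method of mean() described here was later removed from R, so mean() on a data frame now returns NA with a warning. A sketch of the column-wise equivalents that still return a named vector, using a made-up data frame:

```r
df <- data.frame(u = c(1, 2, 3), v = c(10, 20, 30))   # hypothetical data

colMeans(df)        # named numeric vector: u = 2, v = 20
sapply(df, mean)    # same values, applying mean() to each column in turn
```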

Re: [R] adding row ID numbers by group

2010-03-02 Thread Henrique Dallazuanna
Try this:

data$ID <- with(data, ave(group, group, FUN = seq))
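A runnable sketch of this one-liner on the example data from the question quoted below. It uses seq_along() in place of seq(): seq() treats a single number as an upper bound (seq(5) is 1:5), so a one-row group can trigger a recycling warning inside ave(), whereas seq_along() always just numbers the elements it is given:

```r
# Example data from the question: 4 rows in group 1, 5 in group 2, 6 in group 3
group <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
var2  <- 1:15
data  <- data.frame(group, var2)

# ave() applies FUN to each group's slice of the first argument and puts the
# results back in the original row positions; seq_along() numbers each slice 1..n
data$ID <- with(data, ave(group, group, FUN = seq_along))

print(data)
```

Because ave() writes results back by position, this also works when the groups are not sorted.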

On Tue, Mar 2, 2010 at 2:53 PM, Alexander Schwall
alexander.schw...@gmail.com wrote:
 Hello R community,

 I am hoping for some help with the following problem.

 I have a data frame containing various groups. These groups are identified
 by a grouping variable. I would like to add a sequential ID number to each
 group to later sort these individuals within each group by this ID number.

 Here is what the final result should look like:

 ID   group var2
 1      1    1
 2      1    2
 3      1    3
 4      1    4
 1      2    5
 2      2    6
 3      2    7
 4      2    8
 5      2    9
 1      3   10
 2      3   11
 3      3   12
 4      3   13
 5      3   14


 I have created the following code to loop through this and compare a given
 row with the following row for the grouping variable. If a given row is
 different from the following row, the ID number is reset and I
 start counting up again. The problem I am encountering is that at
 the bottom of the data frame the if statement has no following row
 against which to compare the last row.

 Here is what I did:

 group <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
 var2 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)

 data <- data.frame(group, var2)
 data

 #IDN is the desired ID number by group
 IDN <- numeric(length(data$var2))
 IDN


 for (i in 1:(length(data$group))) {
      if(data[i,1] < (length(data$group))){
           if(data[i,1] == data[i+1,1]){
              IDN[i] <- sum(IDN[i-1],1)}
           else{
              IDN[i] <- -55} #for now an arbitrary value
      }
      if(data[i,1] == (length(data$group))) {
          IDN[i] <- 99 #for now an arbitrary value
      }
  }

 IDN
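The loop above stops at the last row because data[i+1,1] is NA there, and if() cannot test a missing value. A minimal sketch of the same loop idea that compares each row with the *previous* row instead, so it never reads past the end; like the original, it assumes the rows are already sorted by group:

```r
group <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
var2  <- 1:15
data  <- data.frame(group, var2)

# Compare each row with the previous one instead of the next one,
# so the loop never indexes beyond the last row of the data frame.
IDN <- numeric(nrow(data))
for (i in seq_len(nrow(data))) {
  if (i == 1 || data$group[i] != data$group[i - 1]) {
    IDN[i] <- 1                # first row of a new group: restart the count
  } else {
    IDN[i] <- IDN[i - 1] + 1   # same group as the previous row: keep counting
  }
}
data$ID <- IDN
```

The start of each group is detected by a change of value, so no placeholder values such as -55 or 99 are needed; '||' short-circuits, so data$group[0] is never evaluated for the first row.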



 Is there maybe an easier way to do this? Any thoughts would be very
 appreciated since I am running out of ideas.

 Thanks
 Alexander





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] two questions for R beginners

2010-03-02 Thread John Sorkin
William,
I agree that changing syntax can lead to problems. I don't, however, think 
extending the language will break existing code. Providing a common syntax for 
accessing matrices and dataframes will not change the way things have been done 
to date, but rather how things will be done in the future.
John  
John Sorkin
jsor...@grecc.umaryland.edu 
-Original Message-
From: William Dunlap wdun...@tibco.com
To: John Sorkin jsor...@grecc.umaryland.edu
To: Karl Ove Hufthammer k...@huftis.org
To:  r-h...@stat.math.ethz.ch

Sent: 3/2/2010 11:53:45 AM
Subject: RE: [R] two questions for R beginners


Whenever one changes the language that way old code
will break.  The developers can, with a lot of effort,
fix their own code, and perhaps even user-written code
on CRAN, but code that thousands of users have written
will break.  There is a lot of code out there that was
written by trial and error and by folks who no longer
work at an institution: the code works but no one knows
exactly why it works.  Telling folks they need to change
that code because we have a cleaner but different syntax
now is not good.  Why would one spend time writing a
package that might stop working when R is upgraded?

I think the solution is not to change current semantics
but to write functions that behave better and encourage
users to use them, gradually abandoning the old constructs.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 

Re: [R] adding row ID numbers by group

2010-03-02 Thread Felipe Carrillo
Like this?

library(plyr)   # ddply() comes from the plyr package
group <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
var2 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
data <- data.frame(group, var2)
data
ddply(data, "group", transform, ID=1:length(group))
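If plyr is not installed, a base-R sketch of the same numbering is possible. It assumes the rows are already sorted so that each group forms one contiguous run, because rle() counts runs of equal values rather than groups:

```r
group <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
var2  <- 1:15
data  <- data.frame(group, var2)

# rle() gives the length of each run of identical group values (4, 5, 6 here);
# sequence() then concatenates 1:4, 1:5 and 1:6.
data$ID <- sequence(rle(data$group)$lengths)
```

If the groups may be interleaved, sorting first (data[order(data$group), ]) or the ave() approach above avoids that assumption.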
 
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA



- Original Message 
 From: Alexander Schwall alexander.schw...@gmail.com
 To: r-help@r-project.org
 Sent: Tue, March 2, 2010 9:53:19 AM
 Subject: [R] adding row ID numbers by group
 


[R] how to import map data (maptools?) from a html set of 'coords'

2010-03-02 Thread sylvain willart
Dear R users,

I would like to draw a map and import it into the maptools/spatstat packages.

The 'raw data' I have come from a web page (<map>...</map>) and are
basically a list of coordinates of a polygon.

I would like to know how to import them into R; I checked the maptools
package, but all the examples use existing .dbf files.

I just have a (series of) text file(s) looking like this:

For example, for the French Region Burgundy:

<area href="region.asp?reg=26" shape="poly" title="Bourgogne" alt="Bourgogne"
coords="208,121,211,115,221,113,224,115,225,120,229,122,232,128,251,125,255,
130,256,136,266,138,268,148,267,154,263,160,267,168,267,180,262,
175,256,178,254,184,248,184,243,187,237,187,232,185,234,181,227,
171,216,171,212,166,211,155,208,149,208,135,211,132,213,125,208,
121">
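A base-R sketch of reading one such coords string, assuming it has already been extracted from the HTML (here it is simply pasted in). It only parses and draws the polygon; turning the resulting matrix into spatial objects for maptools/spatstat (for example with sp's Polygon(), Polygons() and SpatialPolygons(), or spatstat's owin()) is not shown:

```r
# The coords attribute from the <area> tag above, pasted as one string
coords <- "208,121,211,115,221,113,224,115,225,120,229,122,232,128,251,125,255,
130,256,136,266,138,268,148,267,154,263,160,267,168,267,180,262,
175,256,178,254,184,248,184,243,187,237,187,232,185,234,181,227,
171,216,171,212,166,211,155,208,149,208,135,211,132,213,125,208,
121"

# Drop the line breaks, split on commas, and pair the numbers as (x, y) rows
vals <- as.numeric(strsplit(gsub("[[:space:]]", "", coords), ",")[[1]])
xy   <- matrix(vals, ncol = 2, byrow = TRUE, dimnames = list(NULL, c("x", "y")))

# Image-map y coordinates grow downwards, so flip the sign before plotting
plot(xy[, "x"], -xy[, "y"], type = "n", asp = 1, xlab = "x", ylab = "-y")
polygon(xy[, "x"], -xy[, "y"], col = "grey80")
```

The first and last vertices are identical here, so the ring is already closed.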

Any ideas welcome,

sylvain

(If anyone is interested in that type of data, it is available on
the INSEE website, along with loads of information on the population and
economy of each region.)



[R] Fwd: Building R packages in Windows 7

2010-03-02 Thread Eric Ferreira
-- Forwarded message --
From: Duncan Murdoch murd...@stats.uwo.ca
Date: 25 February 2010 22:30
Subject: Re: [R] Building R packages in Windows 7
To: Eric Ferreira ericbferre...@gmail.com


On 25/02/2010 7:56 PM, Eric Ferreira wrote:

 Thank you, Sir, but how can I get it to create HTML files?


The tools::Rd2HTML function will do the translation, but it takes a bit of
work to prepare input for it.  The idea is that you don't need to save the
files, R just produces them when the browser asks for them.
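A minimal sketch of calling tools::Rd2HTML() by hand on a single page. The Rd content and the name foo are placeholders, and, as noted below, a page rendered this way in isolation may contain links that do not resolve:

```r
# Write a tiny, hypothetical Rd file to a temporary location
rd_file   <- tempfile(fileext = ".Rd")
html_file <- tempfile(fileext = ".html")
writeLines(c("\\name{foo}",
             "\\alias{foo}",
             "\\title{Foo: a placeholder help page}",
             "\\description{A minimal example page.}"),
           rd_file)

# Parse the Rd source and translate it to a standalone HTML file
rd <- tools::parse_Rd(rd_file)
tools::Rd2HTML(rd, out = html_file)
```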

There's an option --enable-prebuilt-html that can be used when installing a
package, but I don't recommend using it.  You'll get the help page as it
exists at install time, not as it is intended to be displayed at run time.
 Links to other packages likely won't work properly.


Duncan Murdoch




 On 25 February 2010 14:41, Duncan Murdoch murd...@stats.uwo.ca wrote:

  On 25/02/2010 11:49 AM, Eric Ferreira wrote:

  Ok,

 I'm working under:
 Windows 7 Professional 32bits, 4 GB RAM, 320 GB HD, Intel Core 2 Duo
 processor
 R 2.10.1

 I've installed:
 Rtools211
 MikteX 2.8
 HTML Help Workshop

 Setting my PATH to:
 c:\Rtools\bin;c:\Rtools\perl\bin;c:\Rtools\MinGW\bin;c:\Arquivos de
 Programas\R\R-2.10.1pat\bin;c:\Arquivos de Programas\MikTeX
 2.8\miktex\bin;c:\Program Files\HTML Help Workshop

 ...creating the package called ExpDes and asking (at the prompt) :

 Rcmd build --binary ExpDes

 Among others, a warning message is printed: WARNING: some HTML links may
 not be found, and no html files are produced.


  Right, HTML help files are produced on demand; they aren't stored in the
 binary package zip file.  HTML Help Workshop is not being used at all.

 Duncan Murdoch

  Thank you again.






 On 25 February 2010 13:02, Duncan Murdoch murd...@stats.uwo.ca wrote:

  On 25/02/2010 10:56 AM, Eric Ferreira wrote:

  This is my first package. I'm just getting started doing that, following
 the steps described on your website... I really don't know how I am asking
 for CHMs to be produced, sorry.


  All I can suggest is that you need to be less stingy with information.
  Tell us what you did.  Tell us what symptoms you saw.  Do both of those by
 cut and paste from your console, don't paraphrase, or refer to vague
 instructions like your website.

 Duncan Murdoch


  On 25 February 2010 12:52, Duncan Murdoch murd...@stats.uwo.ca wrote:

 On 25/02/2010 10:40 AM, Eric Ferreira wrote:

  Dear Duncan

 Thank you so much for your reply.
 Actually, I'm using the latest version of R and the problem persists.
 What do you use instead of HTML Help Workshop for newer R versions?


  We just produce text and HTML help pages on demand, and LaTeX ones for
 the pdf manuals.  How are you asking for CHMs to be produced?

 Duncan Murdoch

  Best regards

 Eric.

 On 25 February 2010 11:43, Duncan Murdoch murd...@stats.uwo.ca wrote:

 On 25/02/2010 9:06 AM, Eric Ferreira wrote:

  Dear useRs,

 I'm having trouble building R packages in Windows 7 regarding HTML
 Help Workshop.
 Pointing PATH to c:\Program Files\HTML help Workshop does work in other
 versions of Windows (e.g. Vista) but not in Windows 7.

 Some tips??


  We don't use the HTML Help Workshop any more since R 2.10.0, so you
 could upgrade to the current R, and the problem will go away.

  Otherwise, I think you'll have to ask Microsoft for help.  But they
 aren't likely to be helpful:  Win XP is the most recent OS listed as
 supported.

 Duncan Murdoch

-- 
Dr Eric B Ferreira
Exact Sciences Department
Federal University of Alfenas
Brazil



[R] Creating matrix from long table in database (pivoting)

2010-03-02 Thread Jan Hornych
Hi all,

I have a table in a database that is very long; when simplified, it has only
two columns (id, text). id is the row and text is the column.
Technically, text is a term and id is the document.
Simplifying, and assuming there is only one occurrence of each term
per document, I should be able to convert this into a binary matrix.
The table looks like this...

ID Text
1  this
1  is
1  the
1  first
1  row
2  this
2  is
2  the
2  second
2  row
...


In R I would like to have it as

id  this  is  the  first  second  row
1      1   1    1      1       0    1
2      1   1    1      0       1    1

It would be simpler for me to do this transformation in R, as I guess the
language is handier than SQL for this. The table in R would have a few dozen
thousand columns and rows. I know how to read the data from the database, but
I am unsure whether there is a suitable transformation available.

Thank you
Jan



Re: [R] two questions for R beginners

2010-03-02 Thread Keo Ormsby

Liviu Andronic wrote:

On Mon, Mar 1, 2010 at 11:49 PM, Liviu Andronic landronim...@gmail.com wrote:
  

On 3/1/10, Keo Ormsby keo.orms...@gmail.com wrote:


 Perhaps my biggest problem was that I couldn't find (and still haven't seen)
documents for *absolute beginners*.

  

there was once a link posted on r-sig-teaching that would probably fit
your needs, but I cannot find it now.




OK, I found it. Below is an excerpt of that r-sig-teaching e-mail.
Liviu

On Thu, Jul 2, 2009 at 2:19 PM, Robert W. Hayden hay...@mv.mv.com wrote:
  

I think such a website would be a real asset.  It would be most useful
if it either were restricted to intro. stats. OR organized so that
materials for real beginners were easy to extract from all the
materials for programmers and Ph.D. statisticians.  As a relative
beginner myself, I find the usual resources useless.  In self defense,
I created materials for my own beginning students:

 http://courses.statistics.com/software/R/Rhome.htm


Hi Liviu,
This is indeed the best introductory site I have seen, although it 
still assumes some things that at first might seem unintuitive to the 
absolute beginner I am talking about. For instance, on the first page it 
shows that you can do sqrt(x), where x can be a vector, and get back a 
vector of the square roots of each number. Although this is high school 
matrix algebra, most users expect the input to a square root function 
to be a single number, not a matrix, as in Excel or a calculator. Other 
concepts that are not explicitly introduced are the R workspace, the use 
of arguments in functions (with or without the =), etc. Others are 
things like diff(range(rainfall)), where the output of one 
function is used as the input to another, all in the same command line. All 
these things seem very basic, but can be difficult if you are trying to 
learn on your own with no prior experience in programming.
I hope I am not sounding too difficult and contrarian; I am just trying 
to share my experience of starting with R, and of trying to convey 
this learning to my colleagues and students. In the end, I did find 
everything I needed to learn, and now I feel at ease with R, and I 
believe that almost anybody who can use Excel or something like it 
could learn R.
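The two points raised here can be seen in a few lines; rainfall is a made-up vector for illustration, not the data set used on that site:

```r
rainfall <- c(12.1, 3.4, 0, 7.8)   # hypothetical rainfall values

sqrt(c(1, 4, 9))        # vectorised: returns 1 2 3, one root per element
range(rainfall)         # smallest and largest value: 0 12.1
diff(range(rainfall))   # the output of range() becomes the input of diff(): 12.1
```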


Thank you for the information,
Best wishes,
Keo.



Re: [R] adding row ID numbers by group

2010-03-02 Thread Alexander Schwall
Thank you gentlemen,

all three solutions are working and very insightful.

Your help and time are very much appreciated.

Alexander

On Tue, Mar 2, 2010 at 1:08 PM, Felipe Carrillo mazatlanmex...@yahoo.com wrote:

 Like this?

 group <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
 var2 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
 data <- data.frame(group, var2)
 data
 ddply(data, "group", transform, ID=1:length(group))

 Felipe D. Carrillo
 Supervisory Fishery Biologist
 Department of the Interior
 US Fish & Wildlife Service
 California, USA



 - Original Message 
  From: Alexander Schwall alexander.schw...@gmail.com
  To: r-help@r-project.org
  Sent: Tue, March 2, 2010 9:53:19 AM
  Subject: [R] adding row ID numbers by group
 


[R] plotting fitted lme values as a smooth line

2010-03-02 Thread Alexandra R Contosta
I am trying to plot fitted lme values as a smooth line of a graph  
showing the exponential relationship between temperature and soil  
respiration.


In the plot, the x-axis has temperature, and the y-axis has soil  
respiration.  When I try to add a line showing temperature versus the  
fitted values, it is jagged and not smooth.



Here is the code I used:

library(nlme)   # lme() comes from the nlme package
lme.1 <- lme(fixed=LnFlux~Temp, random=~1|Plt, data=resp)
fit.1 <- exp(fitted(lme.1))

plot(Flux$Temp, Flux$Flux, xlab="Temperature",
     ylab=expression(CO[2]*Flux), xlim=c(-10, 25), ylim=c(0,250), pch=16)

ord1 <- order(CFlux$Temp)
lines(CFlux$TempC[ord1], fit.1[ord1], lty=1, lwd=2)

This does not produce a smooth curve, but a jagged line that moves up
and down between points.


If I use fitted values from a simple linear model (lm), I don't have  
this problem, and the line is smooth:


lm.1 <- lm(LnFlux~Temp, data=resp)
fit.2 <- exp(fitted(lm.1))
plot(Flux$Temp, Flux$Flux, xlab="Temperature",
     ylab=expression(CO[2]*Flux), xlim=c(-10, 25), ylim=c(0,250), pch=16)

ord2 <- order(Flux$Temp)
lines(Flux$Temp[ord2], fit.2[ord2], lty=1, lwd=2)

The only difference I can find between the two is the structure of the  
fitted objects.  The fit.1 object from lme is atomic and lacks
individual data labels.  Instead, its only label is: attr(*, "label")=
chr "Fitted values".
In contrast, the fit.2 object from lm is a named num, with:
attr(*, "names")= chr [1:460] "1" "2" "3" "4" ...


Is this difference causing my problem with adding a smooth line to the  
graph?  If so, is there any way I can change the structure of the lme  
fitted object to make it more amenable to adding a smooth line to a  
plot?  Or is something else at work?
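One likely explanation, offered as a hedged sketch on simulated data (the values and the resp columns below are made up to mirror the names in the question): fitted() on an lme object defaults to the innermost grouping level, so each fitted value includes its plot's random intercept, and a line through the sorted values jumps between plots. Asking for level = 0 gives the fixed-effects-only fit, which is a single smooth exponential curve in Temp, much like the lm fit:

```r
library(nlme)

# Simulated data mimicking the question's names (assumption: 5 plots, 20 readings each)
set.seed(1)
resp <- data.frame(Plt  = factor(rep(1:5, each = 20)),
                   Temp = runif(100, min = -10, max = 25))
plot_shift  <- rnorm(5, sd = 0.5)                 # each plot's own intercept shift
resp$LnFlux <- 1 + 0.1 * resp$Temp + plot_shift[resp$Plt] + rnorm(100, sd = 0.2)

lme.1 <- lme(fixed = LnFlux ~ Temp, random = ~ 1 | Plt, data = resp)

fit.pop  <- exp(fitted(lme.1, level = 0))   # fixed effects only: one curve in Temp
fit.plot <- exp(fitted(lme.1, level = 1))   # adds each plot's random intercept (the default)

ord <- order(resp$Temp)
plot(resp$Temp, exp(resp$LnFlux), pch = 16, xlab = "Temperature",
     ylab = expression(CO[2] * Flux))
lines(resp$Temp[ord], fit.pop[ord], lwd = 2)    # smooth
lines(resp$Temp[ord], fit.plot[ord], lty = 3)   # jagged: jumps between plots
```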




[R] Compare parameter estimates of a nlsList object

2010-03-02 Thread Jens Currie
Hello all,

Is there a tool to test the statistical differences between parameter estimates
of an nlsList fit with more than two groups? I am able to fit the model with the
nlme function for two groups after getting starting parameters from nlsList, as
shown below.

library(nlme)   # nlme() and nlsList() come from the nlme package
fit.nlme <- nlme(rate ~ SSmicmen(conc, Vm, K), fixed=Vm+K~state, groups=~state,
                 start=c(212, -52, 0.06, -0.01), data=Puromycin)
summary(fit.nlme)

However, I am unable to test the differences between more than two sets of
parameter estimates. My data set has 5 different groups, and therefore 5
different estimates of each parameter, and I am not sure how to fill in
start=c() for more than two groups.
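With R's default treatment contrasts, a formula such as fixed = Vm + K ~ state appears to give each parameter an intercept (the first group) followed by one difference per remaining group, ordered Vm first, then K; the two-group start vector above fits that pattern (Vm, Vm difference, K, K difference). Assuming the same layout for five groups, a sketch that builds the ten starting values from per-group estimates; the numbers and the name cf are hypothetical, and in practice they might come from coef() on the nlsList fit:

```r
# Hypothetical per-group estimates (rows = the 5 groups), e.g. from coef(nlsList(...))
cf <- data.frame(Vm = c(210, 160, 190, 230, 175),
                 K  = c(0.060, 0.050, 0.070, 0.040, 0.055))

# Treatment-contrast layout: the first group's value, then differences from it
start_Vm   <- c(cf$Vm[1], cf$Vm[-1] - cf$Vm[1])
start_K    <- c(cf$K[1],  cf$K[-1]  - cf$K[1])
start_vals <- c(start_Vm, start_K)   # length 10: 5 values for Vm, then 5 for K
start_vals
```

These values would then be passed as start = start_vals in the nlme() call. The fixed-effects table from summary() reports a t-test for each difference from the first group, and anova() on the fitted model (with its Terms argument) may be used for a joint test across groups.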

Thanks 

Jens



Re: [R] Creating matrix from long table in database (pivoting)

2010-03-02 Thread Henrique Dallazuanna
Try this:

DF <- read.table(textConnection("1 this
1 is
1 the
1 first
1 row
2 this
2 is
2 the
2 second
2 row"))
reshape(DF, v.names = 'V2', idvar = 'V1', timevar = 'V2', direction = 'wide')





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] Creating matrix from long table in database (pivoting)

2010-03-02 Thread Phil Spector

Jan -
Here's one way:


tbl = data.frame(id=c(1,1,1,1,1,2,2,2,2,2),
                 text=c('this','is','the','first','row','this','is','the','second','row'))


xtabs(~id+text,tbl)

   text
id  first is row second the this
  1     1  1   1      0   1    1
  2     0  1   1      1   1    1

It's a bit tricky to automatically get the column headings to 
be in the order you want.  This comes close:



tbl$text = factor(tbl$text,levels=tbl$text[!duplicated(tbl$text)])
xtabs(~id+text,tbl)

   text
id  this is the first row second
  1    1  1   1     1   1      0
  2    1  1   1     0   1      1
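Jan's binary matrix assumes each term appears at most once per document; if a term can repeat, xtabs() counts it. A sketch that clamps the counts to 0/1 on the same toy data. For tables with tens of thousands of rows and columns a dense matrix may not fit in memory; xtabs() also has a sparse = TRUE argument (it needs the Matrix package), which may be worth trying but is not shown here:

```r
tbl <- data.frame(id   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                  text = c("this", "is", "the", "first", "row",
                           "this", "is", "the", "second", "row"))
# Keep the terms in order of first appearance, as with the factor() call above
tbl$text <- factor(tbl$text, levels = unique(tbl$text))

counts <- xtabs(~ id + text, data = tbl)           # occurrences of each term per id
binary <- matrix(as.integer(counts > 0),           # 1 if the term occurs at all, else 0
                 nrow = nrow(counts), dimnames = dimnames(counts))
binary
```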

Hope this helps.
- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



On Tue, 2 Mar 2010, Jan Hornych wrote:


Hi all,

I have a table in a database that is very long and, when simplified, has only
two columns in it (id, text). id is the row, and text is the column.
Technically, the text is a term and the id is the document.
Simplifying, and assuming each term occurs at most once per document, I
should be able to convert this into a binary matrix.
The table looks like this...

ID  Text

1 this
1 is
1 the
1 first
1 row
2 this
2 is
2 the
2 second
2 row
...


In R I would like to have it as

id  this is the first second row

 1     1  1   1     1      0   1
 2     1  1   1     0      1   1

It would be simpler for me to do this transformation in R, as I guess the
language is handier than SQL. The table in R would have a few dozen thousand
columns and rows. I know how to read the data from the database, but I am
just unsure whether there is a suitable transformation available.

Thank you
Jan





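With a few dozen thousand ids and terms, the dense table that xtabs() returns can get very large (a 30,000 x 30,000 table of integers is about 3.6 GB). A minimal sketch of a sparse alternative, assuming the recommended Matrix package is installed; the toy data mirror Jan's example, and xtabs() also accepts sparse = TRUE for two-factor tables, which likewise relies on Matrix.

```r
library(Matrix)  # recommended package providing sparse matrix classes

# Toy version of the long (id, term) table from the question
tbl <- data.frame(id   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                  text = c("this", "is", "the", "first", "row",
                           "this", "is", "the", "second", "row"),
                  stringsAsFactors = FALSE)

# Drop repeated (id, term) pairs so every stored cell is exactly 1
tbl <- unique(tbl)

# Rows = documents; columns = terms in order of first appearance
ids   <- factor(tbl$id)
terms <- factor(tbl$text, levels = unique(tbl$text))

# Only the non-zero cells are stored; x = 1 is recycled to every pair
dtm <- sparseMatrix(i = as.integer(ids),
                    j = as.integer(terms),
                    x = 1,
                    dims = c(nlevels(ids), nlevels(terms)),
                    dimnames = list(levels(ids), levels(terms)))
dtm
```

When the result is small enough, as.matrix(dtm) turns it back into an ordinary 0/1 matrix.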

Re: [R] Creating matrix from long table in database (pivoting)

2010-03-02 Thread Henrique Dallazuanna
Or better:

reshape(cbind(DF, value = 1), v.names = 'value', idvar = 'V1',
        timevar = 'V2', direction = 'wide')

On Tue, Mar 2, 2010 at 3:49 PM, Henrique Dallazuanna www...@gmail.com wrote:
 Try this:

 DF <- read.table(textConnection("1 this
 1 is
 1 the
 1 first
 1 row
 2 this
 2 is
 2 the
 2 second
 2 row"))
 reshape(DF, v.names = 'V2', idvar = 'V1', timevar = 'V2', direction = 'wide')





-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O


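Henrique's reshape() call leaves NA wherever a term does not occur for an id. A sketch of the full round trip, assuming the same two-column layout (read.table() names the columns V1 and V2) and a version of R whose read.table() accepts text =; the NA cells are then set to 0 to get the binary matrix Jan asked for.

```r
DF <- read.table(text = "1 this
1 is
1 the
1 first
1 row
2 this
2 is
2 the
2 second
2 row")

# One row per id (V1), one value.<term> column per distinct term (V2)
wide <- reshape(cbind(DF, value = 1), v.names = "value", idvar = "V1",
                timevar = "V2", direction = "wide")

# Absent (id, term) combinations come back as NA; treat them as 0
wide[is.na(wide)] <- 0

# Optional: drop the "value." prefix from the term columns
names(wide) <- sub("^value\\.", "", names(wide))
wide
```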

Re: [R] capturing errors in Sweave

2010-03-02 Thread Sundar Dorai-Raj
What I ended up using was:

cat(unclass(tmp))

--sundar

On Tue, Mar 2, 2010 at 8:58 AM, Berwin A Turlach ber...@maths.uwa.edu.au wrote:

 G'day Sundar,

 On Tue, 2 Mar 2010 01:03:54 -0800
 Sundar Dorai-Raj sdorai...@gmail.com wrote:

  Thanks, Berwin. That works just great!

 You are welcome.

 I noticed by now that cat(tmp) is sufficient; the tmp[1] in
 cat(tmp[1]) was a left over from earlier attempts to get the output
 to look correct.

 Cheers,

Berwin



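The thread does not show where tmp comes from. Assuming it is the value returned by try(..., silent = TRUE), a small sketch of what it holds: a character string with class "try-error" (and, in current versions of R, a "condition" attribute), so cat() can print the captured error text inside a Sweave chunk instead of the chunk stopping. The failing call log("a") is only a stand-in.

```r
# Hypothetical failing chunk: log() of a character string raises an error
tmp <- try(log("a"), silent = TRUE)

inherits(tmp, "try-error")   # TRUE: the error was caught instead of stopping
cat(unclass(tmp))            # prints the full error text, "Error in ..." prefix included

# Just the message, without the "Error in ..." prefix
conditionMessage(attr(tmp, "condition"))
```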

Re: [R] Gradient Boosting Trees with correlated predictors in gbm

2010-03-02 Thread Patrick Connolly
On Mon, 01-Mar-2010 at 12:01PM -0500, Max Kuhn wrote:

| In theory, the choice between two perfectly correlated predictors is
| random. Therefore, the importance should be diluted by half.
| However, this is implementation dependent.
| 
| For example, run this:
| 
|   set.seed(1)
|   n <- 100
|   p <- 10
| 
|   data <- as.data.frame(matrix(rnorm(n*(p-1)), nrow = n))
|   data$dup <- data[, p-1]
| 
|   data$y <- 2 + 4 * data$dup - 2 * data$dup^2 + rnorm(n)
| 
|   data <- data[, sample(1:ncol(data))]
| 
|   str(data)
| 
|   library(gbm)
|   fit <- gbm(y~., data = data,
|              distribution = "gaussian",
|              interaction.depth = 10,
|              n.trees = 100,
|              verbose = FALSE)
|   summary(fit)

What happens if there's a third?


> data$DUP <- data$dup
> fit <- gbm(y~., data = data,
+            distribution = "gaussian",
+            interaction.depth = 10,
+            n.trees = 100,
+            verbose = FALSE)
> summary(fit)
   var     rel.inf
1  DUP 55.98653321
2  dup 42.99934344
3   V2  0.30763599
4   V1  0.17108839
5   V4  0.14272470
6   V3  0.13069450
7   V6  0.07839121
8   V7  0.07109805
9   V5  0.06080096
10  V8  0.05168955
11  V9  0.
 

So V9, which was identical to dup, has now gone off the radar altogether.

At first I thought that might be because 100 trees wasn't nearly
enough, so I increased it to 6000 and added in some cross-validation.
Doing a summary at the optimal number of trees still gives a similar
result.

I have to admit to being somewhat puzzled.


-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___Patrick Connolly   
 {~._.~}   Great minds discuss ideas
 _( Y )_ Average minds discuss events 
(:_~*~_:)  Small minds discuss people  
 (_)-(_)  . Eleanor Roosevelt
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.



Re: [R] Gradient Boosting Trees with correlated predictors in gbm

2010-03-02 Thread Liaw, Andy
In most implementations of boosting, and for that matter single trees,
the first variable wins when there are ties.  In randomForest the
variables are sampled, and thus not tested in the same order from one
node to the next, so the variables are more likely to share the
glory.

Best,
Andy 


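Andy's point about ties can be checked with a single tree. A minimal sketch, using rpart (a recommended package) as a stand-in for the trees inside gbm and assuming it also breaks ties deterministically by variable order; x2 is an exact copy of x1, and the check is only that the two copies never both appear among the primary splits.

```r
library(rpart)  # recommended package; used here as a single-tree stand-in

set.seed(1)
n  <- 200
x1 <- rnorm(n)
d  <- data.frame(x1 = x1,
                 x2 = x1,          # exact copy of x1
                 x3 = rnorm(n),    # pure noise
                 y  = 2 + 4 * x1 - 2 * x1^2 + rnorm(n))

fit <- rpart(y ~ x1 + x2 + x3, data = d)

# Variables chosen for primary splits (terminal nodes are labelled "<leaf>")
split_vars <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
split_vars

# Which of the two identical copies was actually used
intersect(split_vars, c("x1", "x2"))
```

Note that rpart's own variable.importance also credits surrogate splits, so it may rank the copy alongside the original even though only one of them is ever used to split.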

Re: [R] Strange behavior with poisosn and glm

2010-03-02 Thread Rolf Turner

On 2/03/2010, at 9:02 PM, Noah Silverman wrote:

 Hi,
 
 I'm just learning about Poisson links for the glm function.
 
 One of the data sets I'm playing with has several of the variables as 
 factors (i.e. month, group, etc.)
 
 When I call the glm function with a formula that has a factor variable, 
 R automatically converts the variable to a series of variables with 
 unique names and binary values.
 
 For example, with this pseudo data:
 
 y    v1   month
 2    1    january
 3    1.4  februrary
 1.5  6.3  february
 1.2  4.5  january
 5.5  4.0  march
 
 I use this call:
 
 m <- glm(y ~ v1 + month, family=poisson)
 
 R gives me back a model with variables of
 Intercept
 v1
 monthJanuary
 monthFebruary
 monthMarch

No it didn't!!!  You are kidding the troops/being economical with the 
truth.

If you had used the data that you show, it would've ``given you a model with
variables'':

Intercept
v1
monthfebrurary
monthjanuary
monthmarch

No caps in the month names, and note the misspelling of ``february''.

You actually have ***four*** levels for the month factor:

january februrary february march

If you had spelt ``februrary'' correctly you would have got variables

Intercept
v1
monthjanuary
monthmarch

The first level, february, would have been omitted under the default contrasts
(contr.treatment).  You need k-1 dummy variables to specify a factor with k levels.

 I'm concerned that this might be doing some strange things to my model.

No, you are doing strange things.

Notice also that the Poisson distribution is a distribution of ***counts***.
Non-negative integers.  Whole numbers.  Values like 1.5 and 1.2 make no immediate
sense in terms of the Poisson distribution.  The Poisson likelihood can be evaluated
with non-integer responses, but the glm() function will quite rightly worry about
non-integer values and give you a warning.  (Which you didn't mention.)

If you really have non-integer valued responses you shouldn't be using the Poisson
family; the quasi family *might* be appropriate --- if you know what you're doing.

 Can anyone offer some enlightenment?

I hope you feel enlightened.

cheers,

Rolf Turner

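Rolf's points can be checked directly. A sketch using the pseudo data from the question (the "februrary" typo is kept on purpose): model.matrix() shows the dummy columns glm() builds under the default treatment contrasts, and the Poisson fit warns about the non-integer responses. The warning-collecting handler is only there to make those warnings inspectable; month2 is a hypothetical corrected copy of month.

```r
d <- data.frame(
  y     = c(2, 3, 1.5, 1.2, 5.5),
  v1    = c(1, 1.4, 6.3, 4.5, 4.0),
  month = factor(c("january", "februrary", "february", "january", "march"))
)

levels(d$month)                                   # four levels, sorted alphabetically
colnames(model.matrix(y ~ v1 + month, data = d))  # first level is the baseline

# With the spelling fixed there are three levels, hence two dummy columns
d$month2 <- factor(sub("februrary", "february", d$month))
colnames(model.matrix(y ~ v1 + month2, data = d))

# Collect the warnings glm() raises for non-integer Poisson responses
warns <- character(0)
fit <- withCallingHandlers(
  glm(y ~ v1 + month2, family = poisson, data = d),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
warns
```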

[R] distribution for random effects?

2010-03-02 Thread Maureen Ryan
Hi R users,


I am using the following model to analyze data from a factorial experiment
(randomized complete block design with no replication within blocks):


model <- glm(survival ~ density * vegetation + (1|block), data=sal2005,
family=binomial)

Does R use a binomial distribution in this formulation to model random
effects or a normal distribution (in which case the analysis is not binomial
at the scale of the experiment)?  If the latter, is there a way to specify
the distribution for random effects?

Thanks, Maureen



[R] Three most useful R package

2010-03-02 Thread Ralf B
Hi R-fans,

I would like to put a question to all R users on this list and hope
it will create some feedback and discussion.

1) What are your three most useful R packages?

2) What R package do you still miss, and why do you think it would make
a useful addition?

Pulling answers together for these questions will serve as a guide for
new users and help people who just want to get a hint where to look
first. Happy replying!

Best,
Ralf



Re: [R] Gradient Boosting Trees with correlated predictors in gbm

2010-03-02 Thread Max Kuhn
On Tue, Mar 2, 2010 at 2:43 PM, Liaw, Andy andy_l...@merck.com wrote:
 In most implementations of boosting, and for that matter, single tree,
 the first variable wins when there are ties.

They must be in a union :-)

 What happens if there's a third?

If there were P perfectly correlated predictors, the importance would
be 100% for the first one encountered by gbm. In reality, where
the correlation is strong but not perfect, the other variables would
show up with small importances. In the case of RF, the dilution
factor is 1/P for perfect correlations and gets fuzzier as the
correlation decreases (for reasons that Andy articulated).

-- 

Max



Re: [R] add a header to a forest plot (metafor)

2010-03-02 Thread Viechtbauer Wolfgang (STAT)
Hi Sebastian,

Here is an example showing a forest plot with some column headings:



library(metafor)

data(dat.bcg)
dat <- dat.bcg

res <- rma(ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat, measure="RR")

windows(width=6.5, height=4.0, pointsize=10) # Windows-only device; dev.new() is the cross-platform equivalent
par(mar=c(4,0,4,0))
forest(res, slab=paste(dat$author, ", ", dat$year, sep=""),
   xlim=c(-16,6), at=log(c(.05,.25,1,4)), atransf=exp,
   ilab=cbind(dat$tpos, dat$tneg, dat$cpos, dat$cneg),
   ilab.xpos=c(-9.5,-8,-6,-4.5), cex=.8, ylim=c(-1.5,16), efac=1.8)
text(c(-9.5,-8,-6,-4.5), 14.7, c("TB+", "TB-", "TB+", "TB-"), font=2, cex=.8)
text(c(-8.75,-5.25), 15.7, c("Vaccinated", "Control"), font=2, cex=.8)
text(-16, 14.7, "Author(s) and Year", pos=4, font=2, cex=.8)
text(6, 14.7, "Relative Risk [95% CI]", pos=2, font=2, cex=.8)
title("Figure 1: Forest Plot of the BCG Vaccine Data")



So, just use the text() function to add those column headings. With the ilab 
and ilab.xpos arguments, you can add the information for those columns to the 
plot.

I hope the example helps!

Best,

--
Wolfgang Viechtbauer                         http://www.wvbauer.com/
Department of Methodology and Statistics     Tel: +31 (0)43 388-2277
School for Public Health and Primary Care    Office Location:
Maastricht University, P.O. Box 616          Room B2.01 (second floor)
6200 MD Maastricht, The Netherlands          Debyeplein 1 (Randwyck)


Original Message
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Sebastian Stegmann
Sent: Tuesday, March 02, 2010 18:36
To: r-help@r-project.org
Subject: [R] add a header to a forest plot (metafor)

 Dear R-community,

 I'm currently trying to assemble a forest plot using the forest
 function from package metafor.

 Works well. Even the regular main-argument works for adding a title to
 the graph.

 However, I would like to add one top row which explains the nature of the
 columns. Very much like the usual header in spreadsheet programs.

 For example: Study   Sample   Sample Size   Estimated Effect Size
 CI 95%.

 I tried to add axis(3), but apparently the forest plot isn't that kind
 of graphic.

 Does anyone have any idea?

 Cheerio

 Sebastian




Re: [R] problem with choose.files

2010-03-02 Thread jim holtman
Try this; specify where you want the second one to start:

files.a <- choose.files()
# now change to the directory of the first file name to continue search
files.b <- choose.files(paste(dirname(files.a[1L]), "*", sep='/'))

On Tue, Mar 2, 2010 at 12:17 PM, Caleb Rounds caleb.rou...@gmail.com wrote:
 I have recently upgraded to R 2.10.1 on Windows XP and am using
 scripts that I've used in previous versions successfully. I'm having a
 problem with choose.files. The lines read:

 fura_scan_file <- choose.files(caption="Select log file (*.log) for fura-2 scans")
 PI_scan_file <- choose.files(caption="Select log file (*.log) for PI scans")


 The problem is that the directory chosen after the first choose.files
 is not remembered. This is an issue b/c my files are nested inside of
 several directories and it takes a lot of clicking to get to where I
 need to be. Is there a problem with these lines? Is it likely
 elsewhere in the script?

 I apologize for my ignorance and wasting time, but in the
 documentation for choose.files it suggests this should happen
 automatically.

 Caleb Rounds





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


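A sketch of Jim's idea applied to Caleb's two calls, assuming Windows (choose.files() is only available there, hence the guard). start_in() is a hypothetical helper, and file.path() stands in for paste(..., sep = "/").

```r
# Hypothetical helper: the 'default' path that makes choose.files() open in
# the folder of a previously selected file
start_in <- function(previous) file.path(dirname(previous[1L]), "*")

if (interactive() && .Platform$OS.type == "windows") {
  fura_scan_file <- choose.files(caption = "Select log file (*.log) for fura-2 scans")
  PI_scan_file   <- choose.files(default = start_in(fura_scan_file),
                                 caption = "Select log file (*.log) for PI scans")
}

start_in("C:/data/exp1/scan.log")
```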

Re: [R] variable substitution in for loops

2010-03-02 Thread Jon Erik Ween
Friends

Seems I've run into another snag. More of the nitty-gritty r-details I don't 
understand.

So, as I mentioned below, dataset[[var_sub]] seems to be understood well by the 
functions I previously used and I was able to run my loop successfully with the 
[[var_sub]] as a variable-substitution method. However, now I want to do the 
same with TukeyHSD, and this function does not play nice with this kind of 
syntax. So if I do


fac<-as.factor(dataset$factor)
res<-aov(dataset$var~dataset$factor)
tuk<-TukeyHSD(res,fac)

things work fine. But if I try (similar to the script below which worked for 
ROCR functions):

fac<-as.factor(dataset$factor)
var_sub<-noquotes(var)
res<-aov(dataset[[var_sub]]~dataset$factor)
tuk<-TukeyHSD(res,fac)

TukeyHSD craps out with an error, even though res is identical in both cases, 
apart from the formula syntax.

So, TukeyHSD seems to be picky about syntax. Is there any other way I can do 
variable substitution (so I can read variable names from my list) and get this 
loop to work for TukeyHSD?

Thanks

Jon


Friends

First, thanks to all for great feed-back. Open-source rocks! I have a workable 
solution to my question, attached below in case it might be of any use to 
anyone. I'm sure there are more elegant ways of doing this, so any further 
feedback is welcome!

Things I've learned (for other noobs like me to learn from):

1) dataset[[j]] seems equivalent to dataset$var if j<-"var", though quotes can
mess you up, hence j<-noquote(varlist[i]) in the script (it also makes a
difference that variables in varlist be stored as a space-separated string;
tab- or line-break-separated lists don't seem to work, though a different
method might handle that)

dataset[[var]] is equivalent to dataset$var given var does not contain any
special characters. Otherwise j == "var" has to be TRUE.

2) Loops will abort if they encounter an error (like ROCR encountering a
prediction that is singular). Error handling can be built in, but is a little
tricky. I reduplicated the method with a function to test and advance the loop
on failure. You can suppress error messages if you like.
Not tricky, just use try().


3) Some stats methods don't have NA handling built into them (e.g. prediction
in ROCR chokes if there are empty cells in the variables), hence it seems a good
idea to strip these out before starting. The subsetting with na.omit does this.

... given you know what you are doing (and omitting).


4) You reference pieces (slots) of results (S3/S4 objects) by using obj...@slot.
The @ operator is defined for slots of *S4* classes.


Best,
Uwe Ligges

 Hence, you pull out the auc value in ROCR-performance by p...@y.value
 in the script. You can see what slots are in an object by simply typing the
 object's name at the command line.
Thanks again for all the help!

Jon

Soli Deo Gloria

Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
 University of Toronto Faculty of Medicine

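Uwe's "just use try()" amounts to the loop pattern below. A minimal sketch with made-up inputs (sqrt() of a string stands in for a failing ROCR call); inherits(out, "try-error") is used instead of comparing class() with ==, which is more robust when an object carries more than one class.

```r
inputs  <- list(a = 4, b = "oops", c = 9)   # hypothetical inputs; "b" will fail
results <- numeric(0)

for (nm in names(inputs)) {
  out <- try(sqrt(inputs[[nm]]), silent = TRUE)  # error is caught, not fatal
  if (inherits(out, "try-error")) next           # skip this input, keep looping
  results[nm] <- out
}
results
```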

...code





## R script for automating stats crunching in large datasets ##
## Needs space-separated list of variable names matching dataset column names ##
## You have to tinker with the code to customize for your application ##
##
## Jon Erik Ween MD, MSc, 26 Feb 2010 ##


library(ROCR) # Load stats package to use if not standard
varslist<-scan("/Users/jween/Desktop/INCASvars.txt","list") # Read variable list
results<-as.data.frame(array(,c(3,length(varslist)))) # Initialize results array, one type of stat at a time for now

for (i in 1:length(varslist)){ # Loop through the variables you want to process. Determined by varslist
j<-noquote(varslist[i])
vars<-c(varslist[i],"Issue_class") # Variables to be analyzed
temp<-na.omit(incas[vars]) # Have to subset to get rid of NA values causing ROCR to choke
n<-nrow(temp) # Record how many cases the analysis is based on. Need to figure out how to calc cases/controls
#.table<-table(temp$SubjClass) # Maybe for later figure out cases/controls
results[1,i]<-j # Name particular results column
results[2,i]<-n # Number of subjects in analysis
test<-try(aucval(i,j),silent=TRUE) # Error handling in case procedure craps out so loop can continue. Suppress annoying error messages
if(class(test)=="try-error") next else # Run procedure only if OK, otherwise skip
pred<-prediction(incas[[j]],incas$Issue_class); # Procedure
perf<-performance(pred,"auc");
results[3,i]<-as.numeric(p...@y.values) # Enter result into

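For the TukeyHSD() problem in Jon's latest message, a common approach is to build the formula from the variable name and pass the data frame via data =, so that the model terms are plain column names and the grouping variable is a factor. Also, TukeyHSD()'s second argument (which) expects term names as a character vector, not a factor. A sketch using the built-in iris data as a stand-in for dataset, with Species playing the role of factor; the names vars and tukey are illustrative only.

```r
vars  <- c("Sepal.Length", "Sepal.Width")   # stand-in for the names in varslist
tukey <- list()

for (v in vars) {
  fml <- reformulate("Species", response = v)   # e.g. Sepal.Length ~ Species
  res <- aov(fml, data = iris)                  # Species is already a factor
  tukey[[v]] <- TukeyHSD(res, which = "Species")
}

tukey[["Sepal.Length"]]
```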
[R] R / R+ Webminar *** R-PLUS Rocks: Interactive, Comprehensible and Highly Visual. March 12th @ 12PM ET (USA Time)

2010-03-02 Thread s...@xlsolutions-corp.com
Welcome to R/ R-PLUS Webminar Series. R-PLUS 3.3 Rocks: Interactive,
Comprehensible and Highly Visual. 
http://www.xlsolutions-corp.com/webminar.asp. 

March 12th @ 12PM ET (USA Time)

Increase your productivity with R-PLUS 3.3 by attending the webminar and
learning how to:

1. Interactively click your way through your favorite statistical
models without the need of programming.
2. Use state-of-the-art R-PLUS tools to produce Publication Quality
Graphics and Reports at a click
3. Edit your Graphics
4. Take advantage of the new R-PLUS 64-bit on windows for larger data
sets
5. For SAS users, our new app R+SAS2R lets you see at a click exactly
which R function (syntax included) is equivalent to a given SAS Proc!

Come learn about R-PLUS 3.3 new cool features and suggest improvements.

Space is limited. Reserve your webminar seat now at:
http://www.xlsolutions-corp.com/webminar.asp. 
You can also email Ms Jennifer McDonald ( jen at  xlsolutions-corp.com)
to register or request the free webminar video.

Our March-April R training courses are available at:  
www.xlsolutions-corp.com/rcourses

 Regards -

 Sue Turner
 Senior Account Manager
 XLSolutions Corporation
 North American Division
 1700 7th Ave
 Suite 2100
 Seattle, WA 98101
 Phone: 206-686-1578
 Email: sue at xlsolutions-corp.com
 web: www.xlsolutions-corp.com/rcourses




[R] Creating a timeSeries Data Frame

2010-03-02 Thread Luis Felipe Parra
Hello, I have 2000 univariate timeSeries of about 20 observations each, like
the one below. I would like to store all of them in one object, sort of a
data frame, and to be able to recall each by its column name, which by the
way is the same as the first date. Do you know how I can do this? Thank you

Felipe Parra

GMT
 2009-10-12
2009-10-12  0.002346171
2009-10-14  0.002346171
2009-10-21  0.002346171
2009-10-28  0.002650307
2009-11-16  0.002391950
2009-11-16  0.003848032
2010-03-16  0.003848032
2010-06-17  0.008644137
2010-09-16  0.010690464
2010-12-15  0.016356718
2011-03-15  0.018496109
2011-06-16  0.023354671
2011-09-15  0.025211351
2011-12-21  0.029029900
2012-03-21  0.031173566
2012-06-21  0.033641977
2012-10-15  0.023078052
2013-04-15 -0.118415755
2013-10-15 -0.010497527
2014-04-14  0.010497527
2014-10-14 -0.010497527



[R] selection simulation

2010-03-02 Thread vasquez

Hi, I was having trouble writing a looping function for a selection
simulation that I'm trying to develop, and I was hoping someone could
help. Basically, I have a matrix of randomly generated numbers
representing scores on the variables. The rows represent applicants and
columns represent the variables.

# Number of variables
p<-5

# Number of applicants
n_ap<-100

# Create random scores for the applicants across five variables
z<-rnorm(n_ap*p,0,1)

#Put z into a matrix
dim(z)<-c(n_ap,p)

#Rank applicant scores on variable X1
 d1 <- rev(order(z[,1]))

#Rank applicant scores on variable X2
 d2 <- rev(order(z[,2]))

#Rank applicant scores on variable X3
 d3 <- rev(order(z[,3]))

#Rank applicant scores on variable X4
 d4 <- rev(order(z[,4]))

#Rank applicant scores on variable X5
 d5 <- rev(order(z[,5]))

pool <- cbind(d1,d2,d3,d4,d5)

Is there a way to specify a vector of selection ratios (e.g.
sr<-c(.10,.20,.30)) and use this in a loop so that the function will
produce a matrix of applicants who will be selected when the selection ratio
is .10, .20, and .30?

Thank you and please leave a post if more info is needed 
-- 
View this message in context: 
http://n4.nabble.com/selection-simulation-tp1575587p1575587.html
Sent from the R help mailing list archive at Nabble.com.


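One possible reading of the question, assuming "selected" means the top fraction sr of applicants on each variable separately: loop over the ratios and keep the first round(sr * n_ap) rows of the ranked pool. In this sketch, apply() with order(decreasing = TRUE) rebuilds the same pool as the five rev(order(...)) lines (they differ only in how exact ties would be ordered); round() avoids floating-point surprises in sr * n_ap.

```r
set.seed(1)
p    <- 5     # number of variables
n_ap <- 100   # number of applicants
z    <- matrix(rnorm(n_ap * p), nrow = n_ap, ncol = p)

# Column k holds applicant indices ranked from best to worst on variable k
pool <- apply(z, 2, order, decreasing = TRUE)

sr <- c(.10, .20, .30)   # selection ratios

# For each ratio, the indices of the applicants selected on each variable
selected <- lapply(sr, function(r) pool[seq_len(round(r * n_ap)), , drop = FALSE])
names(selected) <- paste0("sr_", sr)

sapply(selected, dim)
```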

[R] Binding a matrix to a matrix

2010-03-02 Thread Luis Felipe Parra
Hello, I have a 2x10x200 matrix and I would like to bind to it another 2x10
matrix in order to end up with a 2x10x201 matrix. Which command should I
use in order to do this? Thank you

Felipe Parra



Re: [R] Binding a matrix to a matrix

2010-03-02 Thread Greg Snow
If it is 3 dimensional then it is an array, not a matrix.  The abind function 
in the abind package is probably what you want.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



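If installing abind is not an option, base R can do the same for this particular case. A sketch with random stand-in data: because R stores arrays column-major (first index varies fastest, last index slowest), appending the 2x10 matrix's values after the array's values fills exactly the new 201st slice. abind::abind(a, m, along = 3) should give the same result; any dimnames on a are not carried over by this base version.

```r
set.seed(1)
a <- array(runif(2 * 10 * 200), dim = c(2, 10, 200))  # existing 2 x 10 x 200 array
m <- matrix(runif(2 * 10), nrow = 2, ncol = 10)        # new 2 x 10 slice

# Concatenate the values and reshape: m lands in slice 201
b <- array(c(a, m), dim = c(2, 10, 201))

dim(b)
```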
