Re: [R] R Crashes when using large matrices (Ubuntu 11.04)

2011-06-04 Thread Prof Brian Ripley

On Fri, 3 Jun 2011, Matias Salibian-Barrera wrote:


Hello,

This simple SVD calculation (commands are copied immediately below) 
crashes on my Ubuntu machine (R 2.13.0). However it works fine on my 
Windows 7 machine, so I suspect there's a problem with (my?) Ubuntu 
and / or R. Can anybody else reproduce it (with Ubuntu 11.04)? 
Thanks in advance.



From the traceback, the error appears to be in LAPACK or BLAS.
There is no evidence here that 'R crashes' rather than one of those 
crashed R.


You don't tell us whether you compiled R yourself or used someone 
else's pre-compiled distribution -- if the latter, ask on r-sig-debian 
as this is most likely a problem with the distribution, since 
Debian/Ubuntu builds normally replace R's LAPACK/BLAS with that from 
the OS.


It works correctly on a vanilla R build on i686 Fedora 14.
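A build that does not crash should also return a correct answer. The sketch below (not from the thread) re-runs the posted commands and checks the SVD in two ways. It reconstructs the matrix from the SVD, and it compares the singular values with eigen(), which for a symmetric positive semi-definite matrix gives the same values. The tolerance used in such checks is a judgment call.

```r
## Re-run the reported computation, then check the result as well as the fact that it ran.
p <- 500
n <- 300
set.seed(1234)
x <- matrix(rnorm(n * p), n, p)
sih <- var(x)          # 500 x 500 covariance matrix of rank n - 1 = 299
b <- svd(sih)          # goes through La.svd() to whichever LAPACK R is linked against

## U D V' should reproduce sih up to rounding error
## (b$d * t(b$v) scales row i of t(V) by d[i], i.e. computes D V').
recon_err <- max(abs(sih - b$u %*% (b$d * t(b$v))))

## Independent cross-check: singular values of a symmetric PSD matrix = its eigenvalues.
ev <- eigen(sih, symmetric = TRUE, only.values = TRUE)$values
c(reconstruction = recon_err, sv_vs_eigen = max(abs(b$d - ev)))
```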



p <- 500
n <- 300
set.seed(1234)
x <- matrix(rnorm(n*p), n, p)
sih <- var(x)
b <- svd(sih)

produces:

 *** caught illegal operation ***
address 0x42b8c9, cause 'illegal operand'

Traceback:
 1: .Call("La_svd", jobu, jobv, x, double(min(n, p)), u, v, "dgsedd", PACKAGE = "base")
 2: La.svd(x, nu, nv)
 3: svd(sih)

I'm using Ubuntu 11.04 and

> version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          13.0
year           2011
month          04
day            13
svn rev        55427
language       R
version.string R version 2.13.0 (2011-04-13)

Thanks,

Matias

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


[R] outlining data points

2011-06-04 Thread lana
Hi,
I have tried numerous methods and packages, but thus far cannot seem to
find a solution. I am looking to essentially draw a filled colored shape
around subsets of my data points on a scatter plot where none of the shapes
overlap but instead bend around each other if necessary. I finally came up
with the following steps, which approximate what I am looking for, but I am
completely lost as to how to implement several of the steps. The steps are
listed below, along with a pictorial representation of what I am hoping they
will achieve. Any suggestions would be greatly appreciated.

1. Draw a circle of a given radius around each data point. The radius can be
   the same for each point, or can be determined by a vector of values that is
   either the same length as the number of points or is repeated until all
   points are assigned.
2. Begin with the area of greatest overlap between two circles, draw a line
   segment between the two intersection points and assign either side of that
   line to its respective shape.
3. Repeat with the largest remaining area of overlap. If a previous division
   has left an intersection point within the new area of overlap, such that
   there are now two possible points to attach the line segment to, use the one
   from which a division has already been drawn (so that three shapes now come
   together in a point).
4. Repeat with successively smaller areas of overlap until none remain.
5. Fill each resulting shape with a color determined by an outside vector
   associated with the points.
6. (If possible) calculate the area of each resultant shape.

http://r.789695.n4.nabble.com/file/n3572306/diagram_circles_coloring_3.png 

http://r.789695.n4.nabble.com/file/n3572306/circle_interactions_3.png 
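Steps 1 and 2 depend on two pieces of circle geometry. The first is the area where two circles overlap, and the second is the pair of points where their boundaries cross, which are the ends of the dividing segment. The helper below is a hypothetical base-R sketch of only that part. The name circle_overlap, the random points and the radius are illustrative, and it does not attempt the full partitioning or the filling.

```r
## Hypothetical helper: overlap ("lens") area of two circles and the two points
## where their boundaries cross, i.e. the ends of the dividing segment in step 2.
circle_overlap <- function(c1, r1, c2, r2) {
  d <- sqrt(sum((c2 - c1)^2))
  if (d >= r1 + r2) return(list(area = 0, points = NULL))   # circles are disjoint
  if (d <= abs(r1 - r2))                                    # one circle inside the other
    return(list(area = pi * min(r1, r2)^2, points = NULL))
  a <- (d^2 + r1^2 - r2^2) / (2 * d)   # signed distance from c1 to the chord
  h <- sqrt(r1^2 - a^2)                # half the chord length
  u <- (c2 - c1) / d                   # unit vector from c1 towards c2
  mid <- c1 + a * u                    # midpoint of the chord
  perp <- c(-u[2], u[1])               # unit vector along the chord
  pts <- rbind(mid + h * perp, mid - h * perp)
  ## lens = two circular sectors minus the kite formed by the two centres and pts
  area <- r1^2 * acos(a / r1) + r2^2 * acos((d - a) / r2) - d * h
  list(area = area, points = pts)
}

## Usage: find the pair of points whose circles overlap most (where step 2 starts).
set.seed(1)
xy <- cbind(runif(6), runif(6))        # illustrative data points
r  <- 0.25                             # common radius for step 1
prs <- combn(nrow(xy), 2)              # all pairs of points, one pair per column
ov  <- apply(prs, 2, function(ij) circle_overlap(xy[ij[1], ], r, xy[ij[2], ], r)$area)
prs[, which.max(ov)]
## Step 1 itself can be drawn with symbols(xy[, 1], xy[, 2], circles = rep(r, 6), inches = FALSE)
```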

--
View this message in context: 
http://r.789695.n4.nabble.com/outlining-data-points-tp3572306p3572306.html
Sent from the R help mailing list archive at Nabble.com.



[R] movie3d in rgl object 'movie' not found

2011-06-04 Thread harryxgordon
Hello,

I am trying to save a .gif movie using movie3d from the package {rgl}.  I am
using the following code combined with the globe example on the ?movie3d
page.  I've installed ImageMagick and the directory seems to be working
properly, i.e. when I do Sys.getenv("PATH"), C:\\Program Files (x86)\\ImageMagick-6.7.0-Q16
shows up.



library(rgl)
open3d()

lat <- matrix(seq(90, -90, len=50)*pi/180, 50, 50, byrow=TRUE)
long <- matrix(seq(-180, 180, len=50)*pi/180, 50, 50)

r <- 6378.1 # radius of Earth in km
x <- r*cos(lat)*cos(long)
y <- r*cos(lat)*sin(long)
z <- r*sin(lat)
persp3d(x, y, z, col="white",
   texture=system.file("textures/world.png", package="rgl"),
   specular="black", axes=FALSE, box=FALSE, xlab="", ylab="", zlab="",
   normal_x=x, normal_y=y, normal_z=z)

# I run the above, note the device ID and then enter the following with
# rgl.cur(1) if my device ID is 1.

movie3d(par3dinterp(par3dsave(params = c("userMatrix", "scale", "zoom",
"FOV"), times = FALSE, dev = rgl.cur(1))), duration = 5, fps = 10, movie =
"movie", frames = movie, dir=tempdir(), type = "gif")

# The par3d window pops up, I move the globe around a bit and press record
# a few times.  Then when I press quit, I get the following error:

Error in sprintf("%s%03d.png", frames, i) : object 'movie' not found



Sorry if I've made a silly mistake; I'm kind of a newb.  I haven't found any
record of this same issue on the web.


Many Thanks!
Michelle



--
View this message in context: 
http://r.789695.n4.nabble.com/movie3d-in-rgl-object-movie-not-found-tp3572316p3572316.html
Sent from the R help mailing list archive at Nabble.com.



[R] logistic growth model

2011-06-04 Thread magushi
Hi, I want to fit a logistic growth model for this data set, where
y = k*exp(b0 + b1*age) / (1 + exp(b0 + b1*age)),
start = list(b0 = 3, b1 = 3.5),
   trace = TRUE)

I need to find the initial values for b0 and b1. k = 3. When I run using b0 = 3
and b1 = 3.5, or any other numbers, I get the following error:


397448.4 :  3.0 3.5
Error in numericDeriv(form[[3L]], names(ind), env) :
  Missing value or an infinity produced when evaluating the model

Can anyone help me with how to get the initial values? What does this error
message imply?
 


--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-growth-model-tp3572734p3572734.html
Sent from the R help mailing list archive at Nabble.com.



[R] logistic growth model

2011-06-04 Thread Renalda

I want to fit a logistic growth model
y = k*exp(b0 + b1*age) / (1 + exp(b0 + b1*age)). Can someone help with how to
get the initial coefficients b0 and b1? I need to estimate them in order to do
the regression analysis. When I run using b0 = 0.5 and b1 = 3.4818, I get the
following error:


397443.8 :  0.5 3.4818
Error in nls(Height ~ k * exp(b1 + b2 * Age)/(1 + exp(b1 + b2 * Age)),  :
  singular gradient
Please tell me what is wrong with my initial values, and how to get
the initial values.
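To see why "singular gradient" can appear, the sketch below evaluates the model's derivatives with respect to b0 and b1 at the posted start values. It uses a few hypothetical ages, because the data were not posted, and it borrows k = 3 from the earlier post, which is an assumption. With b1 = 3.4818 the logistic term is already saturated at k for every age, so both derivative columns are essentially zero and nls() has no information about how to change b0 or b1.

```r
## Hypothetical ages (the post did not include the data); start values from the post,
## and k = 3 borrowed from the earlier post in this thread (an assumption).
age <- c(5, 10, 20, 40)
k <- 3; b0 <- 0.5; b1 <- 3.4818
z <- b0 + b1 * age
p <- plogis(z)                       # exp(z) / (1 + exp(z))
q <- plogis(z, lower.tail = FALSE)   # 1 - p, computed without cancellation

## Partial derivatives of k * p: d/db0 = k*p*(1-p), d/db1 = k*p*(1-p)*age
grad <- cbind(d_b0 = k * p * q, d_b1 = k * p * q * age)
grad              # every entry is essentially 0
k * p             # every fitted value is essentially k
```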




Re: [R] movie3d in rgl object 'movie' not found

2011-06-04 Thread Prof Brian Ripley

On Fri, 3 Jun 2011, someone ashamed of his/her real name wrote:


Hello,

I am trying to save a .gif movie using movie3d from the package {rgl}.  I am
using the following code combined with the globe example on the ?movie3d
page.  I've installed ImageMagick and the directory seems to be working
properly, i.e. when I do Sys.getenv("PATH"), C:\\Program Files (x86)\\ImageMagick-6.7.0-Q16
shows up.



library(rgl)
open3d()

lat <- matrix(seq(90, -90, len=50)*pi/180, 50, 50, byrow=TRUE)
long <- matrix(seq(-180, 180, len=50)*pi/180, 50, 50)

r <- 6378.1 # radius of Earth in km
x <- r*cos(lat)*cos(long)
y <- r*cos(lat)*sin(long)
z <- r*sin(lat)
persp3d(x, y, z, col="white",
  texture=system.file("textures/world.png", package="rgl"),
  specular="black", axes=FALSE, box=FALSE, xlab="", ylab="", zlab="",
  normal_x=x, normal_y=y, normal_z=z)

# I run the above, note the device ID and then enter the following with
# rgl.cur(1) if my device ID is 1.


But rgl.cur() is the current device, and it does not take an argument 
in the version of rgl I have.



movie3d(par3dinterp(par3dsave(params = c("userMatrix", "scale", "zoom",
"FOV"), times = FALSE, dev = rgl.cur(1))), duration = 5, fps = 10, movie =
"movie", frames = movie, dir=tempdir(), type = "gif")


I think you meant to set dev= in movie3d, not par3dsave (which appears 
to be part of package tkrgl which you failed to even mention).



# The par3d window pops up, I move the globe around a bit and press record
# a few times.  Then when I press quit, I get the following error:

Error in sprintf("%s%03d.png", frames, i) : object 'movie' not found



Sorry if I've made a silly mistake; I'm kind of a newb.  I haven't found any
record of this same issue on the web.


Don't give the values of arguments that you want to take default 
values.  Specifying 'frames = movie' is not the same thing as using 
the default value (the scoping rules differ).  None of


fps = 10, movie = "movie", frames = movie, dir=tempdir(), type = "gif")

is needed (nor would dev = rgl.cur() be).
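Ripley's point about defaults can be seen without rgl. The toy function below is hypothetical and only mimics the relevant part of movie3d's signature, in which frames defaults to the value of the movie argument. R evaluates a default inside the function. An argument you type is evaluated in your own workspace, where no object called movie exists.

```r
## Hypothetical stand-in for movie3d(): only the two arguments that matter here.
toy_movie <- function(movie = "movie", frames = movie) {
  sprintf("%s%03d.png", frames, 1)     # the same sprintf() call as in the error message
}

toy_movie()                   # default 'frames' is looked up inside the function
toy_movie(movie = "globe")    # 'frames' follows 'movie'

## Typing frames = movie makes R look for an object 'movie' in the caller's
## workspace instead, which reproduces the reported error.
msg <- tryCatch(toy_movie(movie = "movie", frames = movie),
                error = function(e) conditionMessage(e))
msg
```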




Many Thanks!
Michelle









Re: [R] generating random covariance matrices (with a uniform distribution of correlations)

2011-06-04 Thread Petr Savicky
On Fri, Jun 03, 2011 at 01:54:33PM -0700, Ned Dochtermann wrote:
 Petr,
 This is the code I used for your suggestion:
 
   k <- 6; kk <- (k*(k-1))/2
   x <- matrix(0, 5000, kk)
   for(i in 1:5000){
   A.1 <- matrix(0, k, k)
   rs <- runif(kk, min=-1, max=1)
   A.1[lower.tri(A.1)] <- rs
   A.1[upper.tri(A.1)] <- t(A.1)[upper.tri(A.1)]
   cors.i <- diag(k)
   t <- .001 - min(Re(eigen(A.1)$values))
   new.cor <- cov2cor(A.1 + (t*cors.i))
   x[i,] <- new.cor[lower.tri(new.cor)]}
   hist(c(x)); max(c(x)); median(c(x))
 
 This, unfortunately, does not maintain the desired distribution of
 correlations.

Hello.

Contrary to what I thought originally, there are solutions
also for the case of the correlation matrix. The first solution
creates a singular correlation matrix (of rank 3), but the non-diagonal
entries have exactly the uniform distribution on [-1, 1], since
the scalar product of two independent uniformly distributed unit
vectors in R^3 has the uniform distribution on [-1, 1].

  x <- matrix(rnorm(18), nrow=6, ncol=3)
  x <- x/sqrt(rowSums(x^2))
  a <- x %*% t(x)
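A quick empirical check of the rank-3 construction is sketched below, with Petr's code wrapped in an illustrative function. The entries of one matrix are not independent of each other. For that reason the sketch takes a single entry, [2, 1], from each of many independent matrices before running ks.test() against the uniform distribution on [-1, 1].

```r
## Petr's rank-3 construction, wrapped in a function (the name is illustrative).
rank3_cor <- function(k = 6) {
  x <- matrix(rnorm(3 * k), nrow = k, ncol = 3)
  x <- x / sqrt(rowSums(x^2))   # each row: a uniformly distributed unit vector in R^3
  x %*% t(x)                    # unit diagonal; off-diagonals are dot products of rows
}

set.seed(1)
a <- rank3_cor()
round(eigen(a, symmetric = TRUE, only.values = TRUE)$values, 10)  # 3 non-zero eigenvalues

## One entry per independent matrix, so the KS test's independence assumption holds.
r21 <- replicate(5000, rank3_cor()[2, 1])
ks.test(r21, "punif", min = -1, max = 1)
```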

The next solution produces a correlation matrix of full rank, whose
non-diagonal entries have a distribution very close to the uniform on
[-1, 1]. The KS test detects a difference only with sample sizes of more
than 50,000.

  w <- c(0.01459422, 0.01830718, 0.04066405, 0.50148488, 0.60330865, 0.61832829)
  x <- matrix(rnorm(36), nrow=6, ncol=6) %*% diag(w)
  x <- x/sqrt(rowSums(x^2))
  a <- x %*% t(x)

Hope this helps.

Petr Savicky.



Re: [R] Value of 'pi'

2011-06-04 Thread Ted Harding
I asked a radiographer friend of mine to examine this
suggestion, but he said it wouldn't scan.

Ted.
 
On 04-Jun-11 04:04:43, John wrote:
 Last line, try
 
 but one can't, for Pi is transcendental.
 
 
 On Friday, June 03, 2011 04:12:07 AM Jim Lemon wrote:
 On 06/01/2011 10:14 AM, baptiste auguie wrote:
  I propose a Pi Haiku (PIQ),
  
  Pi is of certain value,
  In statistics, invaluable, yet
  Transcending numerics.
 
 How about a pi limerick?
 
 Pi, the great circumferential,
 nearly sent the geometers mental.
 For they tried to extract
 a solution exact
 but one can't, for the thing's transcendental.
 
 Jim
 


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 04-Jun-11   Time: 11:38:09
-- XFMail --



[R] Bootstrap or Wilks ?

2011-06-04 Thread thibault.charles
Dear R-users,

The following question is more a statistics question than an R issue.

I would like to know the difference between the two following nonparametric
techniques: the bootstrap and the Wilks formula.

I understand the theory of the two techniques well but cannot find
anything about their advantages and drawbacks and how to choose one
rather than the other...
The Wilks formula seems to be the simplest, but I don't know more.

Could anybody help?

Thanks in advance.
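This sketch assumes "Wilks formula" means the order-statistic result used for nonparametric tolerance limits, for example in uncertainty analysis of computer codes. That formula answers a different question from the bootstrap. It asks how many runs n are needed so that the largest observation (one-sided), or the smallest and largest together (two-sided), bound at least a fraction gamma of the population with confidence beta, whatever the distribution. The function below computes the first-order sample sizes.

```r
## Smallest n such that the sample maximum (one-sided) or the interval
## [min, max] (two-sided) covers at least a fraction 'gamma' of the
## population with probability at least 'beta' (first-order Wilks formula).
wilks_n <- function(gamma = 0.95, beta = 0.95, two_sided = FALSE) {
  n <- 2
  repeat {
    ## coverage of [min, max] ~ Beta(n - 1, 2); coverage of the max ~ Beta(n, 1)
    conf <- if (two_sided) pbeta(gamma, n - 1, 2, lower.tail = FALSE) else 1 - gamma^n
    if (conf >= beta) return(n)
    n <- n + 1
  }
}

wilks_n()                  # one-sided 95%/95%: 59 runs
wilks_n(two_sided = TRUE)  # two-sided 95%/95%: 93 runs
```

The bootstrap, by contrast, resamples data already in hand to approximate the sampling distribution of a statistic, rather than prescribing how many runs to make.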



Re: [R] Superscripts in strip labels of lattice plot

2011-06-04 Thread Bert Gunter
Peter et al.:

As a minor note ...

On Fri, Jun 3, 2011 at 8:00 PM, Peter Ehlers ehl...@ucalgary.ca wrote:

 David has given you the answer. I'll just add that you might
 want to widen the strips a bit if you use superscripted factor levels:

  xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris,
      strip = strip.custom(factor.levels = expression(
          'A'^2,'A'^3,'A'^4)),
      par.settings = list(layout.heights = list(strip = 1.5)))

The quotes surrounding A^2, A^3, A^4  can be omitted.
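Here is Peter's call with the quotes dropped, as Bert suggests. This is a sketch using the lattice package. In plotmath, the symbol A^2 is drawn the same way as the string 'A'^2.

```r
library(lattice)

labs <- expression(A^2, A^3, A^4)   # no quotes needed around A
p <- xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris,
            strip = strip.custom(factor.levels = labs),
            par.settings = list(layout.heights = list(strip = 1.5)))  # taller strips
print(p)   # draws the plot on the current graphics device
```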

-- Bert


 Peter Ehlers


 Here is an example that comes up on a search with terms "expression
 strip.default" (which I thought was the correct argument to the strip
 parameter, but it turns out I was not remembering my documentation
 correctly):

 http://finzi.psych.upenn.edu/R/Rhelp02/archive/57933.html

 --

 David Winsemius, MD
 West Hartford, CT





-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics



Re: [R] R Crashes when using large matrices (Ubuntu 11.04)

2011-06-04 Thread Prof. John C Nash
On Ubuntu 10.04 it ran fine, albeit on a machine with lots of memory.
Here's the output:
> rm(list=ls())
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 131881  7.1 35 18.7   35 18.7
Vcells 128838  1.0 786432  6.0   559631  4.3
> p <- 500
> n <- 300
> set.seed(1234)
> x <- matrix(rnorm(n*p), n, p)
> sih <- var(x)
> b <- svd(sih)

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  133536  7.2 35 18.7   35 18.7
Vcells 1030006  7.9    2644909 20.2  2536523 19.4


> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base



Maybe another 11.04 glitch.

JN


On 06/04/2011 06:00 AM, r-help-requ...@r-project.org wrote:
 Message: 93
 Date: Fri, 3 Jun 2011 18:55:06 -0700 (PDT)
 From: Matias Salibian-Barrera msalib...@yahoo.ca
 To: R-help@r-project.org R-help@r-project.org
 Subject: [R] R Crashes when using large matrices (Ubuntu 11.04)
 Message-ID: 75655.88533...@web161614.mail.bf1.yahoo.com
 Content-Type: text/plain; charset=iso-8859-1
 
 
 This simple SVD calculation (commands are copied 
 immediately below) crashes on my Ubuntu machine (R 2.13.0). However it 
 works fine on my Windows 7 machine, so I suspect there's a problem with 
 (my?) Ubuntu and / or R. Can anybody else reproduce it (with Ubuntu 
 11.04)? Thanks in advance.
 
 p <- 500
 n <- 300
 set.seed(1234)
 x <- matrix(rnorm(n*p), n, p)
 sih <- var(x)
 b <- svd(sih)




[R] library(SensoMineR) - Triangle Test Query

2011-06-04 Thread Vijayan Padmanabhan
Dear R Group
I was trying to use the triangle.test function in SensoMineR and, strangely, I
encounter an error in the preference matrix output by the analysis.
To illustrate, please see the following data frame of a design with the response
and preference collected as shown below:

design <- structure(list(`Product X` = c(3, 1, 4, 2, 4, 2, 1, 3, 4, 2,
4, 2, 1, 3, 4, 2, 4, 2, 3, 1), `Product Y` = c(1, 1, 4, 4, 4,
3, 1, 1, 4, 4, 4, 3, 1, 1, 4, 4, 4, 3, 1, 1), `Product Z` = c(3,
2, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 3, 2), Response =
structure(c(1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 1L, 2L), .Label = c("X", "Z"), class = "factor"), Preference =
structure(c(1L, 3L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L,
1L, 1L, 2L, 1L, 2L), .Label = c("X", "Y", "Z"), class = "factor")),
.Names = c("Product X", "Product Y", "Product Z", "Response", "Preference"),
class = "data.frame", row.names = c("Panelist1.Test1", "Panelist1.Test2",
"Panelist2.Test1", "Panelist2.Test2", "Panelist3.Test1", "Panelist3.Test2",
"Panelist4.Test1", "Panelist4.Test2", "Panelist5.Test1", "Panelist5.Test2",
"Panelist6.Test1", "Panelist6.Test2", "Panelist7.Test1", "Panelist7.Test2",
"Panelist8.Test1", "Panelist8.Test2", "Panelist9.Test1", "Panelist9.Test2",
"Panelist10.Test1", "Panelist10.Test2"))

If you investigate the above data frame, you will find that for the
comparison of product 2 vs. product 3, the preference indicates product 3 is
preferred over product 2 every time.

## Read output from the following script to see what i mean above:
subset(design, `Product X` == 2 & `Product Y` == 3 & `Product Z` == 3)

##Output of above would be:
                Product X Product Y Product Z Response Preference
Panelist3.Test2         2         3         3        X          Y
Panelist6.Test2         2         3         3        X          Y
Panelist9.Test2         2         3         3        X          Y

However, when I analyse the design with the answers and preferences using the
following script, I get the $pref output, which shows that product 2 is
preferred over 3 every time. Can somebody explain what is wrong in my
script?

library(SensoMineR)
answer <- as.vector(design$Response)
preference <- as.vector(design$Preference)
triangle.test(design[, 1:3], answer, preference)

##$pref output from the triangle.test function shows as follows:

$pref
  1 2 3 4
1 0 0 0 0
2 4 0 3 0
3 0 0 0 0
4 0 0 0 0
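One way to check the $pref table independently of triangle.test is to tabulate directly which product code sat at the preferred position in each tasting. The helper below is hypothetical and is not a SensoMineR function. As in the post, it assumes that the first three columns hold the product codes for positions X, Y and Z, and that Preference names the preferred position.

```r
## Hypothetical helper (not part of SensoMineR): for each tasting, the pair of
## products compared and the product code at the position marked as preferred.
preferred_product <- function(design, preference) {
  codes <- as.matrix(design[, 1:3])                           # product codes at X, Y, Z
  pos   <- match(as.character(preference), c("X", "Y", "Z"))  # column of the preferred position
  data.frame(
    pair      = apply(codes, 1, function(r) paste(sort(unique(r)), collapse = " vs ")),
    preferred = codes[cbind(seq_len(nrow(codes)), pos)],
    stringsAsFactors = FALSE
  )
}

## With the 'design' data frame above, a cross-tabulation to compare with $pref:
## with(preferred_product(design, design$Preference), table(pair, preferred))
```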


Any help identifying what is going wrong here would be highly
appreciated.
Regards
Vijayan Padmanabhan




Re: [R] Checking and building package

2011-06-04 Thread Uwe Ligges



On 04.06.2011 09:37, Petar Milin wrote:

Dear Uwe,
Please, can you help me with this?
I simplified the situation:
I am using Debian testing i386 (I also have VirtualBox with Win XP on it).
myPackage is the name of the package,
and foo.c is in src/.
In NAMESPACE I put:
useDynLib(foo)
export(f1,
f2)
Everything else is in its appropriate place: data/, man/, R/, DESCRIPTION.

And as you asked, R CMD INSTALL (or build --binary) generates
myPackage.so, but not foo.so.
How should I proceed? Where did I go wrong?


Everything is all right and behaves as it should. The dll/so should be
named after the package.
Well, foo is just an example and should be replaced by your package's
name. So my guess was right, and you missed the point that foo has to be
replaced by the package name in


useDynLib(foo)
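Concretely, for the package called myPackage in Petar's message, the NAMESPACE would read as sketched below (names taken from the thread; f1 and f2 are Petar's placeholders). Only the sources (foo.c) go in src/. R CMD INSTALL then builds myPackage.so (or myPackage.dll on Windows) itself.

```r
# NAMESPACE for the package 'myPackage' (names from the thread)
useDynLib(myPackage)
export(f1, f2)
```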

Best,
Uwe Ligges






Best,
Petar

On 03/06/11 22:32, Uwe Ligges wrote:



On 03.06.2011 21:46, Petar Milin wrote:

Hello!
I am trying to compile an R package that has C code. I put foo.c in the src/
folder and useDynLib(foo)



Where foo is the name of your package, I hope.
Does R CMD INSTALL yourpackage generate a packagename.so (or .dll)? If
so, it is just the useDynLib() entry that fails. Or does R CMD INSTALL
give any error message?

Uwe Ligges


 in the NAMESPACE file. When trying R CMD check,

I got an error message that shared object 'foo' is not found. Then I ran
R CMD SHLIB foo.c first. However, after that, I got warnings from R CMD
check that there is an object file in the src/ folder. It is even worse if I
run R CMD SHLIB for Windows and for Linux and put both foo.so and foo.dll
in src/.
What am I doing wrong? I thought that only *.c is needed in src/, but then
I read in someone's advice that both the source and the shared library must
be in src/. What should be done if one wants to prepare a package for CRAN?

Thanks!

Best,
Petar









Re: [R] [R-SIG-Finance] Measure quality of fit for MA(q), ARMA(p, q) and GARCH(p, q)

2011-06-04 Thread Robert A'gata
Thank you so much all for your invaluable inputs.

On Sat, Jun 4, 2011 at 3:36 AM, Patrick Burns patr...@burns-stat.com wrote:
 A common thing to do is the Ljung-Box
 test on the residuals.  For garch it
 would be the residuals squared.

 Actually for garch it should be the
 rank of the squared residuals -- see
 http://www.burns-stat.com/pages/Working/ljungbox.pdf

 However, this is an in-sample test.  Much
 better is to do out-of-sample tests.
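Below is a sketch of the in-sample check Patrick describes, using Box.test() from the stats package on a simulated ARMA(1,1) series. The model and its coefficients are illustrative, not from the thread.

```r
## Simulated ARMA(1,1) series (illustrative data).
set.seed(123)
x <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 500)
fit <- arima(x, order = c(1, 0, 1))

## Ljung-Box on the residuals; fitdf = p + q = 2 removes the degrees of freedom
## used by the fitted ARMA coefficients (df = lag - fitdf = 8).
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
lb

## For comparison, the raw series is far from white noise.
Box.test(x, lag = 10, type = "Ljung-Box")

## For a GARCH model, following Patrick, test the ranks of the squared residuals,
## e.g. Box.test(rank(res^2), lag = 10, type = "Ljung-Box"), where 'res' holds
## the residuals from whatever GARCH fitting function is used.
```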

 On 04/06/2011 04:46, Robert A'gata wrote:

 Hi,

 I would like to ask for a guideline on how to assess quality of fit
 for MA, ARMA and GARCH processes. For AR, it still looks like a
 regression to me, so I can still rely on R-square as long as the time
 series itself is stationary. However, for MA, ARMA or GARCH, I do not
 know what measure I should use to assess fit quality. Any suggestions
 would be appreciated. Thank you.

 Robert

 ___
 r-sig-fina...@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-sig-finance
 -- Subscriber-posting only. If you want to post, subscribe first.
 -- Also note that this is not the r-help list where general R questions
 should go.


 --
 Patrick Burns
 patr...@burns-stat.com
 http://www.burns-stat.com
 http://www.portfolioprobe.com/blog
 twitter: @portfolioprobe




Re: [R] outlining data points

2011-06-04 Thread David Winsemius


On Jun 3, 2011, at 7:20 PM, lana wrote:


Hi,
I have tried numerous methods and packages, but thus far cannot seem to
find a solution. I am looking to essentially draw a filled colored shape
around subsets of my data points on a scatter plot where none of the shapes
overlap but instead bend around each other if necessary. I finally came up
with the following steps, which approximate what I am looking for, but I am
completely lost as to how to implement several of the steps. The steps are
listed below, along with a pictorial representation of what I am hoping they
will achieve. Any suggestions would be greatly appreciated.


Are you trying to reinvent the wheel? A typical test for clustering  
algorithms that attempt to find non-convex clusters (ones that bend  
around) is the rFace function. Searching with the engine used to  
support the RSiteSearch function finds 17 R functions in various  
packages that use it in their examples:


http://search.r-project.org/cgi-bin/namazu.cgi?query=rFace&max=100&result=normal&sort=score&idxname=functions

--
David.




1. Draw a circle of a given radius around each data point. The radius can be
   the same for each point, or can be determined by a vector of values that is
   either the same length as the number of points or is repeated until all
   points are assigned.
2. Begin with the area of greatest overlap between two circles, draw a line
   segment between the two intersection points and assign either side of that
   line to its respective shape.
3. Repeat with the largest remaining area of overlap. If a previous division
   has left an intersection point within the new area of overlap, such that
   there are now two possible points to attach the line segment to, use the one
   from which a division has already been drawn (so that three shapes now come
   together in a point).
4. Repeat with successively smaller areas of overlap until none remain.
5. Fill each resulting shape with a color determined by an outside vector
   associated with the points.
6. (If possible) calculate the area of each resultant shape.

http://r.789695.n4.nabble.com/file/n3572306/diagram_circles_coloring_3.png

http://r.789695.n4.nabble.com/file/n3572306/circle_interactions_3.png



David Winsemius, MD
West Hartford, CT



[R] nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)

2011-06-04 Thread zhu yao
Dear UseRs:

Recently, I read an article regarding the association between age and
lymph node metastases.
http://jco.ascopubs.org/content/27/18/2931.long
In the statistical analysis, the authors stated: "Because a nonlinear
relationship between age and lymph node involvement was expected based on
existing literature, lymph node involvement was also regressed on age using
nonparametric logistic regression based on locally weighted scatterplot
smoothing (lowess)."
http://jco.ascopubs.org/content/27/18/2931.long#ref-11
Could someone explain "nonparametric logistic regression based on locally
weighted scatterplot smoothing (lowess)"?
Or is it "nonparametric regression based on locally weighted scatterplot
smoothing (lowess)"?

Thanks

Yao Zhu
Department of Urology
Fudan University Shanghai Cancer Center
Shanghai, China




Re: [R] nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)

2011-06-04 Thread David Winsemius


On Jun 4, 2011, at 11:41 AM, zhu yao wrote:


Dear UseRs:

Recently, I read an article regarding the association between age and
lymph node metastases.
http://jco.ascopubs.org/content/27/18/2931.long
In the statistical analysis, the authors stated: "Because a nonlinear
relationship between age and lymph node involvement was expected based on
existing literature, lymph node involvement was also regressed on age using
nonparametric logistic regression based on locally weighted scatterplot
smoothing (lowess)."
http://jco.ascopubs.org/content/27/18/2931.long#ref-11
Could someone explain "nonparametric logistic regression based on locally
weighted scatterplot smoothing (lowess)"?
Or is it "nonparametric regression based on locally weighted scatterplot
smoothing (lowess)"?



One can use a logistic link and a local likelihood. Loader describes
the advantages of such a strategy and shows a worked example in pages
60-65 of her text "Local Regression and Likelihood". But there is no
apparent R content in this question (and the authors of the above
paper said they used SAS), so this is very much off-topic for this list.
You really should start such requests for explication by addressing
the authors of the paper. Two other web-based statistical sites for
general or medical statistics questions can be found at the
GoogleGroups MedStats group and http://stats.stackexchange.com/ .
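Below is a minimal base-R sketch of the "logistic link plus local likelihood" idea. At each target age, a logistic regression is fitted with tricube weights (the weighting used by lowess/loess), so that only nearby observations count. The data, the function name local_logistic and span = 0.5 are illustrative assumptions. This is not necessarily the procedure the paper's authors ran in SAS. The final line applies lowess() directly to the 0/1 outcome, which is the plain nonparametric (non-logistic) alternative raised in the question.

```r
## Hypothetical local-likelihood logistic smoother (illustrative, base R only).
local_logistic <- function(x, y, at, span = 0.5) {
  k <- ceiling(span * length(x))            # number of neighbours in each window
  sapply(at, function(x0) {
    d  <- abs(x - x0)
    h  <- sort(d)[k]                        # bandwidth: distance to the k-th nearest point
    w  <- pmax(1 - (d / h)^3, 0)^3          # tricube weights, as in lowess/loess
    xc <- x - x0                            # centre so the intercept is the fit at x0
    fit <- suppressWarnings(glm(y ~ xc, family = binomial, weights = w))
    plogis(coef(fit)[[1]])                  # fitted probability at x0
  })
}

## Simulated data with a U-shaped risk in age.
set.seed(2011)
age <- runif(400, 20, 80)
y   <- rbinom(400, 1, plogis(-1 + 0.003 * (age - 50)^2))

grid <- c(25, 50, 75)
cbind(age = grid,
      local_logistic = local_logistic(age, y, grid),
      truth = plogis(-1 + 0.003 * (grid - 50)^2))

## lowess() on the 0/1 outcome: a plain smooth of the proportions
## (plot(age, y); lines(sm) would draw it).
sm <- lowess(age, y, f = 0.5)
```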


--
David Winsemius, MD
West Hartford, CT



[R] LM/two way analysis/classic parametrisation

2011-06-04 Thread kfl
I would be pleased to know how to get the classic parametrisation in a
two-way analysis of variance:

Classic parametrisation:
Observed = intercept + row-effect + col-effect + error, where sum of
row-effect = 0 and sum of col-effect = 0



--
View this message in context: 
http://r.789695.n4.nabble.com/LM-two-way-analysis-classic-parametrisation-tp3573453p3573453.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] LM/two way analysis/classic parametrisation

2011-06-04 Thread David Winsemius


On Jun 4, 2011, at 10:33 AM, kfl wrote:

I would be pleased to know how to get the classic parametrisation in a
two-way analysis of variance:

Classic parametrisation:
Observed = intercept + row effect + column effect + error, where the sum of
the row effects = 0 and the sum of the column effects = 0


?contrasts # which has a link to contr.sum

--
David Winsemius, MD
West Hartford, CT
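Following the pointer to contr.sum, a minimal sketch of the sum-to-zero parametrisation with lm(). The simulated, balanced data are an assumption for illustration. lm() reports only the first (levels - 1) effects of each factor; the last effect is recovered as minus their sum.

```r
# Classic parametrisation of an additive two-way ANOVA:
#   y = mu + row_i + col_j + error, with sum(row_i) = 0 and sum(col_j) = 0
set.seed(42)
d <- expand.grid(row = factor(1:3), col = factor(1:4))
d$y <- 10 + c(-1, 0, 1)[d$row] + c(2, -2, 1, -1)[d$col] +
  rnorm(nrow(d), sd = 0.1)

# contr.sum codes each factor so that its effects sum to zero
fit <- lm(y ~ row + col, data = d,
          contrasts = list(row = "contr.sum", col = "contr.sum"))
cf <- coef(fit)

# Recover the omitted last level of each factor as minus the sum of the others
row_eff <- c(cf[c("row1", "row2")], row3 = -sum(cf[c("row1", "row2")]))
col_eff <- c(cf[c("col1", "col2", "col3")],
             col4 = -sum(cf[c("col1", "col2", "col3")]))
cf["(Intercept)"]   # estimate of mu
row_eff
col_eff
```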



Re: [R] R Crashes when using large matrices (Ubuntu 11.04)

2011-06-04 Thread Douglas Bates
On Fri, Jun 3, 2011 at 7:03 PM, Matias Salibian-Barrera
msalib...@yahoo.ca wrote:
 Hello,

 This simple SVD calculation (commands are copied immediately below) crashes 
 on my Ubuntu machine (R 2.13.0). However it works fine on my Windows 7 
 machine, so I suspect there's a problem with (my?) Ubuntu and / or R. Can 
 anybody else reproduce it (with Ubuntu 11.04)? Thanks in advance.

Works fine for me with Ubuntu 11.04 (amd_64) and the pre-compiled R-2.13.0
$ wajig list r-base-core
ii  r-base-core  2.13.0-2natty0  GNU R core of statistical computation and graphics system

> p <- 500
> n <- 300
> set.seed(1234)
> x <- matrix(rnorm(n*p), n, p)
> sih <- var(x)
> b <- svd(sih)
> str(b)
List of 3
 $ d: num [1:500] 5.04 4.94 4.92 4.83 4.82 ...
 $ u: num [1:500, 1:500] -0.03663 0.05414 0.00182 -0.02847 -0.00117 ...
 $ v: num [1:500, 1:500] -0.03663 0.05414 0.00182 -0.02847 -0.00117 ...
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base



[R] OAuthFactory error

2011-06-04 Thread hawkhandler
I'm trying to create a new OAuthFactory object for Twitter, but when I run
the handshake I get the following error:

Error in FUN(c(key, secret,  : 
  unused argument(s) (post.amp = TRUE)

Has anyone seen this before, or does anyone have any suggestions?

Thanks in advance



Re: [R] modify a data frame by values in the columns

2011-06-04 Thread Peter Ehlers

On 2011-06-03 13:34, Jason024 wrote:

I have a data frame like this:

    col1 col2
r1     2    1
r2     4    3
r3     6    5
r4     8    7
r5    10    9
r6    12   11
r7    14   13
r8    16   15
r9    18   17
r10   20   19

I want to modify this data frame: for example, set every value in col1 to
-1 if it is less than 12, and every value in col2 to -1 if it is greater
than 10. The result should look like this:
    col1 col2
r1    -1    1
r2    -1    3
r3    -1    5
r4    -1    7
r5    -1    9
r6    12   -1
r7    14   -1
r8    16   -1
r9    18   -1
r10   20   -1

I have been struggling to make it work. Any help is appreciated!


This seems made for within(); calling your data.frame 'd':

d.new <- within(d, {
  col1 <- ifelse(col1 < 12, -1, col1)
  col2 <- ifelse(col2 > 10, -1, col2)
})


Peter Ehlers



Jason




Re: [R] modify a data frame by values in the columns

2011-06-04 Thread Peter Ehlers

On 2011-06-04 11:11, Peter Ehlers wrote:

This seems made for within(); calling your data.frame 'd':

d.new <- within(d, {
  col1 <- ifelse(col1 < 12, -1, col1)
  col2 <- ifelse(col2 > 10, -1, col2)
})


And, of course, the ifelse() isn't necessary:

d.new <- within(d, {
  col1[ col1 < 12 ] <- -1
  col2[ col2 > 10 ] <- -1
})


Peter Ehlers
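A self-contained check of both suggestions, with Jason's data frame rebuilt from his printout (the reconstruction is an assumption; his actual object may differ, for example in storage mode). Both versions should give the same result.

```r
# Jason's example data, reconstructed from the printed table
d <- data.frame(col1 = seq(2, 20, by = 2),
                col2 = seq(1, 19, by = 2),
                row.names = paste0("r", 1:10))

# Version 1: ifelse()
d.ifelse <- within(d, {
  col1 <- ifelse(col1 < 12, -1, col1)
  col2 <- ifelse(col2 > 10, -1, col2)
})

# Version 2: replacement by logical index
d.index <- within(d, {
  col1[col1 < 12] <- -1
  col2[col2 > 10] <- -1
})

identical(d.ifelse, d.index)
d.index
```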





Re: [R] OAuthFactory error

2011-06-04 Thread Uwe Ligges
What have you actually tried? Please read the posting guide. It also
tells you to specify version numbers and your OS. New versions are on their way
to CRAN.


As it is, we can neither reproduce the problem (without your code) nor know
which version you are using.


Uwe Ligges


On 04.06.2011 19:46, hawkhandler wrote:

I'm trying to create a new OAuthFactory object for Twitter, but when I run
the handshake I get the following error:

Error in FUN(c(key, secret,  :
   unused argument(s) (post.amp = TRUE)

Has anyone seen this before, or does anyone have any suggestions?

Thanks in advance



[R] Nonlinear model fitting of numerical integral function

2011-06-04 Thread arzensekd
Dear R List-Members,

I wish to find the nonlinear least-squares fit of a function defined by an
integral that must be evaluated numerically.

Is it possible to implement this in R?
If so, which problems do I need to consider first?

Many Thanks,

Dejan 
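A minimal sketch of one way to do this in base R, assuming a hypothetical model whose mean is the integral of exp(-k t) from 0 to x (chosen only because its closed form lets the fit be checked). The integral is computed with integrate() inside the model function, which is then passed to nls(). One problem worth considering first: nls() differentiates the model numerically, so the integration error must be much smaller than the finite-difference step, hence the tightened rel.tol. The function name f_int and the simulated data are illustrative.

```r
# Model mean defined by a numerically evaluated integral:
#   f(x; k) = integral from 0 to x of exp(-k * t) dt
f_int <- function(x, k) {
  vapply(x, function(xi)
    integrate(function(t) exp(-k * t), lower = 0, upper = xi,
              rel.tol = 1e-10)$value,   # tight tolerance for nls()'s numeric derivatives
    numeric(1))
}

# Simulated data from the closed form (1 - exp(-k x)) / k with k = 0.8
set.seed(1)
x <- seq(0.5, 5, length.out = 20)
y <- (1 - exp(-0.8 * x)) / 0.8 + rnorm(length(x), sd = 0.01)

fit <- nls(y ~ f_int(x, k), start = list(k = 0.5))
coef(fit)
```

Each model evaluation calls integrate() once per observation, so with many observations or parameters the fit can become slow; a closed form or a vectorised quadrature rule would be the first thing to look for.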



Re: [R] nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)

2011-06-04 Thread Bert Gunter
Take a look at packages mgcv or gam (and probably others). Different
smoothers are used, but it's nonlinear, nonparametric logistic
regression, which is usually the important part. It also penalizes,
which can be even more important than which smoother is used.

-- Bert

On Sat, Jun 4, 2011 at 9:02 AM, David Winsemius dwinsem...@comcast.net wrote:

 On Jun 4, 2011, at 11:41 AM, zhu yao wrote:

 Dear UseRs:

 Recently, I have read an article regarding the association between age and
 lymph node metastases.
 http://jco.ascopubs.org/content/27/18/2931.long
 In the statistical analysis, the authors stated: "Because a nonlinear
 relationship between age and lymph node involvement was expected based on
 existing literature, lymph node involvement was also regressed on age using
 nonparametric logistic regression based on locally weighted scatterplot
 smoothing (lowess)."
 http://jco.ascopubs.org/content/27/18/2931.long#ref-11
 Could someone explain nonparametric logistic regression based on locally
 weighted scatterplot smoothing (lowess)?
 Or is it nonparametric regression based on locally weighted scatterplot
 smoothing (lowess)?


 One can use a logistic link and a local likelihood. Loader describes the
 advantages of such a strategy and shows a worked example on pages 60-65 of
 her text Local Regression and Likelihood.  But there is no apparent R
 content in this question (and the authors of the above paper said they used
 SAS), so this is very much off-topic for this list. You really should start such
 requests for explication by addressing the authors of the paper. Two other
 web-based statistical sites for general or medical statistics questions can
 be found at the GoogleGroups MedStats group and
 http://stats.stackexchange.com/ .

 --
 David Winsemius, MD
 West Hartford, CT





-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics



Re: [R] Problem with Snowball RWeka

2011-06-04 Thread A N
I too have this problem. Everything worked fine last year, but after
updating R and packages I can no longer do word stemming.
Unfortunately, I didn't save the old binaries; otherwise I would just
revert.

Hoping someone finds a solution for R on Windows. Thanks!
There is a potential solution for R on Mac OS from Kurt Hornik, copied
below, but I cannot get it to work on Windows.

Here's the code I'm running:
#1) Using package Snowball
library(Snowball)
source <- readLines(system.file("words", "porter", "voc.txt",
                                package = "Snowball"))
result <- SnowballStemmer(source)
#2) Using package tm
library(tm)
data(crude)
stemDocument(crude[[1]])

In both instances I got the Java error "Could not initialize the
GenericPropertiesCreator. This exception was produced:
java.lang.NullPointerException". After receiving this error once in
the session, no further error messages are generated. However,
SnowballStemmer() and stemDocument() return the original unstemmed
text.

Possible Solution:
For those on Mac OS, Kurt Hornik wrote...
 These issues seem to be specific to Mac OS X.  Recent versions of Weka
 have added a package management system not unlike R's, to the effect
 that now when external packages (or the Snowball jar) is loaded their
 KnowledgeFlow GUI is started, which in turn requires AWT---and from what
 I understand, this does not work on Mac OS X.

 Short term, you should be able to Sys.setenv("NOAWT", "true").

 More long term, the Weka maintainers have patched their upstream code so
 that it is possible to turn off the dynamic class discovery altogether,
 but I have not found the time to test this ...

I realize this solution was for Mac OS, but, not knowing anything about
rJava, I tried it on Windows anyway, resulting in: Error in
Sys.setenv("NOAWT", "true") : all arguments must be named

Here's my session info.
  R version 2.13.0 Patched (2011-04-21 r55576)
  Platform: i386-pc-mingw32/i386 (32-bit) (Windows Vista)

  locale:
  [1] LC_COLLATE=English_United States.1252
  [2] LC_CTYPE=English_United States.1252
  [3] LC_MONETARY=English_United States.1252
  [4] LC_NUMERIC=C
  [5] LC_TIME=English_United States.1252

  attached base packages:
  [1] stats     graphics  grDevices datasets  utils     methods   base

  other attached packages:
  [1] Snowball_0.0-7 tm_0.5-6   rcom_2.2-3.1   rscproxy_1.3-1

  loaded via a namespace (and not attached):
  [1] grid_2.13.0   rJava_0.9-0 (same error with multiple older versions)   RWeka_0.4-7   RWekajars_3.7.3-1
  [5] slam_0.1-22   tools_2.13.0
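On the Sys.setenv() error specifically: the function expects name = value pairs, so the variable has to be passed as a named argument. A minimal sketch follows. Whether setting NOAWT helps on Windows is untested here, and the idea that it must be set before Snowball/RWeka start the Java VM is an assumption, not something stated in Kurt's message.

```r
# Sys.setenv() takes name = value pairs; unnamed arguments such as
# Sys.setenv("NOAWT", "true") fail with "all arguments must be named".
Sys.setenv(NOAWT = "true")
Sys.getenv("NOAWT")
# Assumption: set this before library(Snowball) / library(RWeka),
# i.e. before the Java VM is started.
```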



[R] Predicted values based on fixed effects do not correspond with actual data in cross-classified generalized linear mixed model (lmer)

2011-06-04 Thread Gert
Dear R-Users,

I have fitted a cross-classified generalized linear mixed model using the
lmer() function from the lme4 package, with the following code:

mod <- lmer(y ~ x + (1|a) + (1|b) + (1|c), family = binomial)

In this case, I include only a covariate (x) as a fixed effect.

The fitted values, using fitted(mod), correspond nicely to the raw data, and
the mean of the fitted values is equal to the mean of the raw data. In
addition, the parameter estimate for the fixed effect (x) corresponds to the
data as well (the slope ‘seems’ right). So far so good.

The problem arises when I calculate the predicted values based on the
intercept and the parameter estimate of the fixed effect, using the formula
exp(X)/(1+exp(X)), where X = intercept + par. est. * x.

When I calculate the mean of these predicted values, this mean is much
lower than the mean of the actual data. The shape of the predicted curve
fits the data nicely, but the predicted line is always ‘below’ the
actual data. Apparently, the intercept of the curve is not predicted
correctly.

Does anyone know why this is?

I guess it has something to do with the fact that the intercept for the
fixed effects is estimated for a certain value of the random effects?
According to the R documentation on fitted values: ‘the fitted values at
level i are obtained by adding together the contributions from the estimated
fixed effects and the estimated random effects…’. But is there an ‘average
contribution’ of the random effects?

Is there a way to evaluate the fixed effects at the ‘average level’ of the
random effects? Do I need to adjust the formula for the predictions to take
the random effects into account?

Many thanks,
Gert Stulp
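One likely explanation, sketched below: predictions built from the fixed effects alone set every random effect to zero, so they describe a "typical" level of a, b and c rather than the average over levels. Because the inverse logit is nonlinear, averaging plogis() over the random-effect distribution gives a different, population-averaged curve, and wherever the linear predictor is negative (probabilities below 0.5) the zero-random-effect curve lies below it. The function names, the Monte Carlo approach and the parameter values are hypothetical; in current versions of lme4, binomial models are fitted with glmer(), and the estimates would come from fixef() and VarCorr().

```r
# Prediction from the fixed effects alone: all random effects set to 0
conditional_prob <- function(x, beta0, beta1) plogis(beta0 + beta1 * x)

# Population-averaged prediction: average the inverse logit over the
# random intercepts by Monte Carlo. Assumes three independent normal
# random intercepts with standard deviations `sds`, so their sum is
# normal with variance sum(sds^2).
marginal_prob <- function(x, beta0, beta1, sds, n_sim = 1e5) {
  u <- rnorm(n_sim, mean = 0, sd = sqrt(sum(sds^2)))
  vapply(x, function(xi) mean(plogis(beta0 + beta1 * xi + u)), numeric(1))
}

# Hypothetical estimates standing in for fixef() and the random-effect SDs
set.seed(123)
b0 <- -2; b1 <- 0.5; sds <- c(1, 0.8, 0.5)
xs <- c(-2, 0, 2)
rbind(conditional = conditional_prob(xs, b0, b1),
      marginal    = marginal_prob(xs, b0, b1, sds))
```

At every x the marginal row is higher than the conditional row, which is the same direction as the gap described above.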




[R] using Tps function

2011-06-04 Thread Tania S Pena-Baca
Hi,
I'm using the Tps function of the fields package to plot 2D surfaces.  My
problem is that some arrays have a lot of zeroes, and this function fits the
data in such a way that contour lines for zero appear all over the place.
Sometimes there are lines where there shouldn't be any, extending from the part
of the array where there are values greater than zero.  How can I get rid of
them (besides using Illustrator or Photoshop)?  I have tried changing
lambda, but it doesn't help.  Is there any other function that would work
better to smooth the data?
thank you in advance,
Tania



[R] packages for power law distribution

2011-06-04 Thread fernando del bon







Dear All,

I would appreciate some suggestions of R packages for ESTIMATION OF THE
EXPONENT OF POWER-LAW FREQUENCY DISTRIBUTIONS.  I have searched the R-help
list for several keywords on this subject and did not find a very specific
package, except the useful normalp package.  I believe there are others, but
I was not able to identify them.  I am interested in the exponent of the
power-law distribution of some events (frequencies only), not in bivariate
relationships between two variables.  Specifically, I am looking for packages
with functions for the Pareto, truncated Pareto, discrete Pareto and power-law
distributions with maximum likelihood estimation.  What would you suggest?

Thanks a lot for your attention.
Sincerely,
Fernando





Re: [R] packages for power law distribution

2011-06-04 Thread Thomas Lumley
There's code at http://tuvalu.santafe.edu/~aaronc/powerlaws/

   -thomas

On Sun, Jun 5, 2011 at 9:01 AM, fernando del bon
fernandodel...@yahoo.com.br wrote:







 Dear All,

 I would appreciate some suggestions of R packages for ESTIMATION OF THE
 EXPONENT OF POWER-LAW FREQUENCY DISTRIBUTIONS.  I have searched the R-help
 list for several keywords on this subject and did not find a very specific
 package, except the useful normalp package.  I believe there are others, but
 I was not able to identify them.  I am interested in the exponent of the
 power-law distribution of some events (frequencies only), not in bivariate
 relationships between two variables.  Specifically, I am looking for packages
 with functions for the Pareto, truncated Pareto, discrete Pareto and power-law
 distributions with maximum likelihood estimation.  What would you suggest?

 Thanks a lot for your attention.
 Sincerely,
 Fernando







-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland
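For the continuous case, the maximum likelihood estimate of the exponent has a closed form once the lower cutoff xmin is fixed, alpha_hat = 1 + n / sum(log(x_i / xmin)); this is the estimator used in the power-law work behind the link above. A minimal sketch follows. It assumes xmin is known and does not cover the discrete or truncated Pareto variants or the choice of xmin; the function name and the simulated data are illustrative.

```r
# Closed-form MLE for the exponent of a continuous power law,
#   p(x) proportional to x^(-alpha) for x >= xmin (xmin assumed known):
#   alpha_hat = 1 + n / sum(log(x_i / xmin)),  SE ~ (alpha_hat - 1) / sqrt(n)
powerlaw_mle <- function(x, xmin) {
  x <- x[x >= xmin]                 # the power law is assumed to hold above xmin only
  n <- length(x)
  alpha <- 1 + n / sum(log(x / xmin))
  c(alpha = alpha, se = (alpha - 1) / sqrt(n))
}

# Check on simulated data: inverse-transform sampling with alpha = 2.5, xmin = 1
set.seed(7)
xmin_true <- 1
u <- runif(10000)
x <- xmin_true * (1 - u)^(-1 / (2.5 - 1))
powerlaw_mle(x, xmin = 1)
```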



Re: [R] nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)

2011-06-04 Thread Thomas Lumley
The Stanford gam() [gam package] has choices of spline or
local-polynomial (defaulting to local-linear) smoothers.  That's
probably the best match for the description.  It shouldn't be
necessary to guess -- the paper should have cited the package -- but
we know that is often missed.

-thomas

On Sun, Jun 5, 2011 at 7:43 AM, Bert Gunter gunter.ber...@gene.com wrote:
 Take a look at packages mgcv or gam (and probably others). Different
 smoothers are used, but it's nonlinear, nonparametric logistic
 regression, which is usually the important part. It also penalizes,
 which can be even more important than which smoother is used.

 -- Bert





-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)

2011-06-04 Thread Thomas Lumley
Actually they say they used SAS, and Googling for "SAS local linear
logistic" suggests they used PROC GAM with the LOESS() smoother.
Probably quite similar to gam::gam().

   -thomas

On Sun, Jun 5, 2011 at 9:12 AM, Thomas Lumley tlum...@uw.edu wrote:
 The Stanford gam() [gam package] has choices of spline or
 local-polynomial (defaulting to local-linear) smoothers.  That's
 probably the best match for the description.  It shouldn't be
 necessary to guess -- the paper should have cited the package -- but
 we know that is often missed.

    -thomas

 --
 Thomas Lumley
 Professor of Biostatistics
 University of Auckland




-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [R] cbind 3 or more matrices

2011-06-04 Thread Jim Silverton
How can I cbind three or more matrices like A,B and C. This does not work:

cbind(A,B,C)


-- 
Thanks,
Jim.



Re: [R] cbind 3 or more matrices

2011-06-04 Thread Sarah Goslee
do.call(cbind, list(A, B, C))


On Sat, Jun 4, 2011 at 7:14 PM, Jim Silverton jim.silver...@gmail.com wrote:
 How can I cbind three or more matrices like A,B and C. This does not work:

 cbind(A,B,C)


 --
 Thanks,
 Jim.


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] cbind 3 or more matrices

2011-06-04 Thread baptiste auguie
A, B, C should have the same number of rows.

mlist = replicate(3, matrix(rnorm(6), 2), simplify=FALSE)
names(mlist) = LETTERS[seq_along(mlist)]
with(mlist, cbind(A,B,C))

or,

do.call(cbind, mlist)

HTH,

baptiste
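Since cbind() requires matching row counts, a hypothetical helper (cbind_checked is not a base R function) can check that up front and give a more specific message before delegating to do.call(cbind, ...). A minimal sketch, using matrices shaped like those in Phil Spector's reply:

```r
# Hypothetical helper: cbind any number of matrices after checking
# that they all have the same number of rows
cbind_checked <- function(...) {
  mats <- list(...)
  nr <- vapply(mats, NROW, integer(1))
  if (length(unique(nr)) != 1L)
    stop("all matrices must have the same number of rows; got ",
         paste(nr, collapse = ", "))
  do.call(cbind, mats)
}

A <- matrix(rnorm(10), 5, 2)
B <- matrix(rnorm(15), 5, 3)
C <- matrix(rnorm(20), 5, 4)
dim(cbind_checked(A, B, C))   # 5 rows, 2 + 3 + 4 = 9 columns
```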

On 5 June 2011 11:14, Jim Silverton jim.silver...@gmail.com wrote:
 How can I cbind three or more matrices like A,B and C. This does not work:

 cbind(A,B,C)


 --
 Thanks,
 Jim.



Re: [R] cbind 3 or more matrices

2011-06-04 Thread Phil Spector

Jim -
   In what sense does cbind(A,B,C) not work?


A = matrix(rnorm(10),5,2)
B = matrix(rnorm(15),5,3)
C = matrix(rnorm(20),5,4)
cbind(A,B,C)

            [,1]       [,2]      [,3]         [,4]       [,5]        [,6]
[1,] -0.54194873 -1.1105170 -0.479010  0.619911163  0.1610162  0.49028633
[2,] -0.39289246  0.0752089  1.427386 -0.921868090 -0.7637016 -0.34905125
[3,] -0.07082828 -0.1060497 -1.007713 -0.003673573 -0.8384406 -0.88816295
[4,]  0.22733701 -1.6134894 -1.993654  2.277865363 -2.3599239 -0.21704046
[5,] -0.13809337  0.3443488 -1.384425  0.132130433  0.1345938 -0.04170648
           [,7]       [,8]        [,9]
[1,] -1.7481451  0.4467964 -0.41358420
[2,] -0.2882922  1.0243662 -0.48263684
[3,]  0.9402479  0.5467952 -0.01922035
[4,]  0.6795783  1.4560765 -0.23013826
[5,]  0.9800312 -1.3462175 -0.77064872

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Sat, 4 Jun 2011, Jim Silverton wrote:


How can I cbind three or more matrices like A,B and C. This does not work:

cbind(A,B,C)


--
Thanks,
Jim.



Re: [R] Interpreting Quantile Regression

2011-06-04 Thread Frank Harrell
This is not really an R question, and it indicates that you have a good deal
of studying to do about quantile regression before you rely on it.
Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University


[R] Interpreting Quantile Regression

2011-06-04 Thread cachai
Hey there!

In ordinary regression, if p >= alpha, the result is not significant.
If I get this in quantile regression (for every tau), can I conclude that
there is no relationship between x and y?



[R] Partial Matching

2011-06-04 Thread Abraham Mathew
Let's say that I have a string and I want to know if a single word
is present in the string. I've written the following function to see if
the word "Geico" is mentioned in the string "Cheap Geico car insurance".
However, it doesn't work, and I assume it has something to do with the any()
function. Do I need to use regular expressions? (I hope not.)

main <- function(keyword){
  for( i in keyword ){
    n = strsplit(as.character(keyword), " ")
    print( n )
    if( any( n == "Geico" )){
      print( "Yes" )
    }
  }
}

main("Cheap Geico car insurance")


I'm running R 2.13 on Ubuntu 10.10


Thanks,
Abraham




[R] Binary response GLM Question

2011-06-04 Thread casperyc
Hi all,

I have a problem with binary response data in GLM fitting.
The problem is that y takes only the values 1 or 0, and if I use the logit
link, it is the log of the odds, p/(1-p). In my situation, I think of y as
p, so sometimes the odds are 0 and sometimes they are 1/0, which is (or
should be) undefined. I wonder how R fits the GLM?

The FULL detail of this exercise is as follows:
--
The data here are concerned with whether people default on a loan taken from
a particular bank, at identical interest rates and for a fixed period.
The information on each individual is their sex (male or female); their
income (in pounds), whether the person is a home owner or not, their age (in
years), and the amount of the loan (in pounds).

The information recorded is whether the individual defaulted on the loan or
not. Study the data and try to understand the relation between a person's
characteristics and defaulting. Specifically, what is your estimated
probability that a female aged 42, who is not a home owner, has an income of
23,500, and took a loan of 12,000, defaults on the loan?

The table holding the data has headings as follows:

m/f: male=1, female=0
age: age in years
home: home=1 is a home owner, home=0 is not a home owner
inc: income
loan: amount of loan
def: default=1, non-default=0.

--

My R code:

Q3=read.table("tabl3.dat")
colnames(Q3)=c("Sex","Age","Home","Inc","Loan","Def")
Q3$Sex=as.factor(Q3$Sex)
Q3$Home=as.factor(Q3$Home)
Q3$Def=as.factor(Q3$Def)

Q3.mod=glm(Def~Sex+Age+Home+Inc+Loan,data=Q3,family=binomial(logit))

I don't really understand HOW R actually fits the model if there is a 1/0
that it has to calculate.
This does give me some results, but I don't quite feel right about it.

Now,

if I use the empirical logit link, which has a 0.5 correction,
log((y+0.5)/(1+0.5-y)), as the response, then regress it on the explanatory
variables, I get some estimated probabilities of 0.49* (after transforming
the log odds back to p), whereas the previous model gives 0.

Am I wrong in the first place to think that the response is y=default?
How should I approach this?

Thanks!


DATA is attached.

http://r.789695.n4.nabble.com/file/n3574478/tabl3.dat tabl3.dat 

--
View this message in context: 
http://r.789695.n4.nabble.com/Binary-response-GLM-Question-tp3574478p3574478.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Partial Matching

2011-06-04 Thread Gabor Grothendieck
On Sat, Jun 4, 2011 at 6:44 PM, Abraham Mathew <abmathe...@gmail.com> wrote:
> Let's say that I have a string and I want to know if a single word
> is present in the string. I've written the following function to see if
> the word "Geico" is mentioned in the string "Cheap Geico car insurance".
> However, it doesn't work, and I assume it has something to do with the any()
> function. Do I need to use regular expressions? (I hope not)
>
> main <- function(keyword){
>       for( i in keyword ){
>            n = strsplit(as.character(keyword), " ")
>            print( n )
>            if( any( n == "Geico" )){
>                 print( "Yes" )
>          }
>     }
> }
>
> main("Cheap Geico car insurance")


strsplit returns a one-component list containing the vector of words,
so you want to replace the relevant statement with:

n = strsplit(as.character(keyword), " ")[[1]]

However, a regular expression is shorter:

> x <- c("Cheap Geico car insurance", "Cheap Gorilla car insurance", "A Geicor car")
> regexpr("\\bGeico\\b", x) > 0
[1]  TRUE FALSE FALSE

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

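Putting Gabor's two suggestions together, a minimal sketch might look like the following. The function names has_word() and has_word_re() are hypothetical, and the regular-expression version assumes the search word contains no regex metacharacters (it is pasted into the pattern unescaped).

```r
# Loop-free fix of the strsplit() approach: strsplit() returns a list,
# so take its first component with [[1]] before comparing words.
has_word <- function(text, word = "Geico") {
  words <- strsplit(as.character(text), " ")[[1]]
  any(words == word)
}

# Vectorised alternative: a regular expression with word boundaries (\\b),
# so "Geicor" does not count as a match for "Geico".
has_word_re <- function(text, word = "Geico") {
  grepl(paste0("\\b", word, "\\b"), text, perl = TRUE)
}

x <- c("Cheap Geico car insurance", "Cheap Gorilla car insurance", "A Geicor car")
sapply(x, has_word, USE.NAMES = FALSE)  # TRUE FALSE FALSE
has_word_re(x)                          # TRUE FALSE FALSE
```

has_word() handles one string at a time, hence the sapply(); grepl() is already vectorised over its text argument.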


Re: [R] Binary response GLM Question

2011-06-04 Thread Joshua Wiley
Hi,

Y is not the same as P.  P is the conditional probability given the
data matrix.  So theoretically, P can take on any value in [0, 1],
which means the odds can be anywhere from 0 to +infinity, not just 0 or
undefined.  In logistic regression, the logit link is pretty standard,
so I do not think you would need to use the empirical logit link.

I am not sure how much detail you want when you ask how R fits the
GLM.  It uses an iterative algorithm.  If you are willing to spend the
time to work through the code, you can learn a lot.  Just type
binomial at the console (no quotes, no () after it); the source for the
binomial family will print to the console and you can look through the
logit link code.  That gets passed to glm() to use in fitting the
model.  For a more general explanation of the process, I would
get a book or look online for information on logistic regression or
maximum likelihood estimation.

Cheers,

Josh





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

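The "iterative algorithm" Josh mentions can be made concrete. Per ?glm, the default fitting method, glm.fit, uses iteratively reweighted least squares (IWLS/IRLS). The sketch below uses simulated data (not tabl3.dat) and a hand-written helper, irls_logit(), which is hypothetical and not part of R. The point relevant to the question is that IRLS never evaluates log(y/(1-y)) on the 0/1 responses. It works with fitted probabilities mu, which stay strictly inside (0, 1), so no 1/0 ever has to be calculated.

```r
# Simulated binary data (hypothetical; not the tabl3.dat loan data)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))  # y is 0 or 1 only

# Minimal IRLS for logistic regression (hypothetical helper, for illustration)
irls_logit <- function(X, y, tol = 1e-10, maxit = 25) {
  beta <- rep(0, ncol(X))
  for (it in seq_len(maxit)) {
    eta <- drop(X %*% beta)     # linear predictor
    mu  <- plogis(eta)          # fitted probabilities, strictly inside (0, 1)
    w   <- mu * (1 - mu)        # working weights (binomial variance)
    z   <- eta + (y - mu) / w   # working response; no log(y / (1 - y)) needed
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

X <- cbind(1, x)                          # intercept column plus predictor
b_irls <- irls_logit(X, y)
fit <- glm(y ~ x, family = binomial(link = "logit"))

all.equal(unname(coef(fit)), unname(b_irls), tolerance = 1e-6)  # should be TRUE
range(fitted(fit))                        # never exactly 0 or 1

# Predicted probability for a new observation (x = 0.5 is an arbitrary choice)
predict(fit, newdata = data.frame(x = 0.5), type = "response")
```

For the exercise itself, the analogous last step would presumably be predict(Q3.mod, newdata = ..., type = "response") on a one-row data frame whose Sex and Home columns are factors with the same levels as in Q3.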


[R] How to convert a factor column into a numeric one?

2011-06-04 Thread Robert A. LaBudde

I have a data frame:

> head(df)
  Time Temp Conc Repl    Log10
1    0  -20    H    1 6.406547
2    2  -20    H    1 5.738683
3    7  -20    H    1 5.796394
4   14  -20    H    1 4.413691
5    0    4    H    1 6.406547
7    7    4    H    1 5.705433
> str(df)
'data.frame':   177 obs. of  5 variables:
 $ Time : Factor w/ 4 levels "0","2","7","14": 1 2 3 4 1 3 4 1 3 4 ...
 $ Temp : Factor w/ 4 levels "-20","4","25",..: 1 1 1 1 2 2 2 3 3 3 ...
 $ Conc : Factor w/ 3 levels "H","L","M": 1 1 1 1 1 1 1 1 1 1 ...
 $ Repl : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Log10: num  6.41 5.74 5.8 4.41 6.41 ...
> levels(df$Temp)
[1] "-20" "4"   "25"  "45"
> levels(df$Time)
[1] "0"  "2"  "7"  "14"

As you can see, Time and Temp are currently factors, not numeric.

I would like to change these columns into numerical ones.

df$Time <- as.numeric(df$Time)

doesn't work, as it changes to the factor level indices (1,2,3,4) 
instead of the values (0,2,7,14).


There must be a direct way of doing this in R.

I tried recode() in 'car':

> df$Temp <- recode(df$Temp, '1=-20;2=25;3=4;4=45', as.factor.result=FALSE)
> head(df)
  Time Temp Conc Repl     Freq
1    0  -20    H    1 6.406547
2    2  -20    H    1 5.738683
3    7  -20    H    1 5.796394
4   14  -20    H    1 4.413691
5    0   45    H    1 6.406547
7    7   45    H    1 5.705433

but note that the values for 'Temp' in rows 5 and 7 are 45 and not 4, 
as expected, although the result is numeric. The same happens if I 
use the order given by levels(df$Temp) instead of the sort order in 
the recode() 2nd argument.


Any hints?

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

Vere scire est per causas scire



Re: [R] How to convert a factor column into a numeric one?

2011-06-04 Thread Dennis Murphy
Hi:

Try this:

> dd <- data.frame(a = factor(rep(1:5, each = 4)),
+                  b = factor(rep(rep(1:2, each = 2), 5)),
+                  y = rnorm(20))
> str(dd)
'data.frame':   20 obs. of  3 variables:
 $ a: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ b: Factor w/ 2 levels "1","2": 1 1 2 2 1 1 2 2 1 1 ...
 $ y: num  0.6396 1.467 1.8403 -0.0915 0.2711 ...
> de <- within(dd, {
+          a <- as.numeric(as.character(a))
+          b <- as.numeric(as.character(b))
+        } )
> str(de)
'data.frame':   20 obs. of  3 variables:
 $ a: num  1 1 1 1 2 2 2 2 3 3 ...
 $ b: num  1 1 2 2 1 1 2 2 1 1 ...
 $ y: num  0.6396 1.467 1.8403 -0.0915 0.2711 ...


HTH,
Dennis



Re: [R] How to convert a factor column into a numeric one?

2011-06-04 Thread Jorge Ivan Velez
Dr. LaBudde,

Perhaps

as.numeric(as.character(x))

is what you are looking for.

HTH,
Jorge


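Why as.numeric() returns 1, 2, 3, 4 rather than 0, 2, 7, 14 becomes clearer once you look at how a factor is stored: an integer code for each element plus a character vector of level labels. The sketch below uses a small made-up Time factor (not Robert's data) to compare the three conversions discussed in this thread.

```r
# A small made-up factor standing in for df$Time (not the original data)
Time <- factor(c(0, 2, 7, 14, 0, 7))

as.integer(Time)    # the stored integer codes: 1 2 3 4 1 3
levels(Time)        # the level labels: "0" "2" "7" "14"

# 1. Returns the codes, which is why df$Time <- as.numeric(df$Time) misbehaves
as.numeric(Time)                  # 1 2 3 4 1 3

# 2. Go through the labels first (Jorge's and Dennis's suggestion)
as.numeric(as.character(Time))    # 0 2 7 14 0 7

# 3. Convert each level once, then index by the codes (the idiom in ?factor)
as.numeric(levels(Time))[Time]    # 0 2 7 14 0 7
```

Methods 2 and 3 give the same result; ?factor notes that method 3 is slightly more efficient because only the distinct levels are converted. The same storage model may also explain the recode() result in the original post: car's recode() appears to match a specification such as '4=45' against the factor's values (the label "4"), not against level positions, so temperature 4 became 45 while the other specifications matched nothing.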


Re: [R] How to convert a factor column into a numeric one?

2011-06-04 Thread Joshua Wiley
Hi Robert,

Try this:

## Example data converting mtcars to factors
testdf <- as.data.frame(lapply(mtcars, factor))
str(testdf)

## taking advantage of assignment methods to avoid an explicit call to
## as.data.frame
## convert factor to numeric using the technique recommended in ?factor
testdf[] <- lapply(testdf, function(x)
  as.numeric(levels(x))[x])
str(testdf)


If you do not want to convert all columns, just use a subset.  Here is one way:

testdf[, c("mpg", "cyl", "disp")] <-
  lapply(testdf[, c("mpg", "cyl", "disp")],
  function(x) as.numeric(levels(x))[x])

I would also look into *why* those numeric columns are being stored as
factors in the first place.  If you are reading the data in with
read.table() or one of its wrapper functions (like read.csv), then it
would be better to preempt the storage as a factor altogether rather
than converting back to numeric.  For example, perhaps something is
being used to indicate missing data that R does not recognize (e.g.,
SAS uses ".").  Specifying na.strings = "." would fix this.  See
?read.table for some of the options available.

Hope this helps,

Josh





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

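Josh's suggestion to fix the problem at read time can be tried on a small inline example. The data below are hypothetical; a single SAS-style "." standing in for a missing value is enough to stop read.table() from treating the column as numeric.

```r
# Hypothetical whitespace-separated data with "." marking a missing value
txt <- "Time Temp Log10
0 -20 6.41
2 -20 .
7 4 5.80"

bad  <- read.table(text = txt, header = TRUE)
good <- read.table(text = txt, header = TRUE, na.strings = ".")

class(bad$Log10)   # not numeric: character in R >= 4.0.0, factor in older versions
class(good$Log10)  # "numeric", with NA in the second row
good$Log10
```

The same na.strings argument is passed through by read.csv() and the other read.table() wrappers.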


Re: [R] How to convert a factor column into a numeric one?

2011-06-04 Thread Robert A LaBudde

Exactly! Thanks.

At 12:49 AM 6/5/2011, Jorge Ivan Velez wrote:

Dr. LaBudde,

Perhaps

as.numeric(as.character(x))

is what you are looking for.

HTH,
Jorge






Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

Vere scire est per causas scire



Re: [R] How to convert a factor column into a numeric one?

2011-06-04 Thread Robert A LaBudde

Thanks for your help.

As far as your question below is concerned, the data frame arose as a 
result of some data cleaning on an original data frame, which was 
changed into a table, modified, and changed back to a data frame:


ttcrmean <- as.table(by(ngbe[,'Log10'],
  list(Time=ngbe$Time, Temp=ngbe$Temp, Conc=ngbe$Conc, Repl=ngbe$Replicate),
  mean))
for (k in 1:3) {       #fix-up time zeroes
  for (l in 1:5) {     #replicates
    t0val <- ttcrmean[1,3,k,l]
    for (j in 1:4) {   #temps
      ttcrmean[1,j,k,l] <- t0val
    } #j
  } #l
} #k
df <- na.omit(as.data.frame(ttcrmean))
colnames(df)[5] <- 'Log10'


At 12:51 AM 6/5/2011, Joshua Wiley wrote:

Hi Robert,
<snip>
I would also look into *why* those numeric columns are being stored as
factors in the first place.  If you are reading the data in with
read.table() or one of its wrapper functions (like read.csv), then it
would be better to preempt the storage as a factor altogether rather
than converting back to numeric.  For example, perhaps something is
being used to indicate missing data that R does not recognize (e.g.,
SAS uses ".").  Specifying na.strings = "." would fix this.  See
?read.table for some of the options available.
<snip>




Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

Vere scire est per causas scire

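Robert's explanation points at the likely source of the factors: as.data.frame() on a table turns the dimnames into factor columns and names the value column "Freq" (which would also explain the Freq heading in the recode() output). A minimal sketch on a small hypothetical table, assuming the stringsAsFactors and responseName arguments of the table method of as.data.frame(), which keep the dimnames as character so that a plain as.numeric() is then safe:

```r
# Hypothetical 2 x 2 table of means with numeric-looking dimnames
tab <- as.table(matrix(c(6.4, 5.7, 6.1, 5.2), nrow = 2,
                       dimnames = list(Time = c("0", "7"),
                                       Temp = c("-20", "4"))))

df1 <- as.data.frame(tab)   # default: dimnames become factors, value column "Freq"
str(df1)

# Keep the dimnames as character and name the value column directly
df2 <- as.data.frame(tab, responseName = "Log10", stringsAsFactors = FALSE)
df2$Time <- as.numeric(df2$Time)   # character -> numeric converts the values
df2$Temp <- as.numeric(df2$Temp)
str(df2)
```

With this variant the later colnames(df)[5] <- 'Log10' step and the factor conversion would both become unnecessary, though the na.omit() step would still apply.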


Re: [R] How to convert a factor column into a numeric one?

2011-06-04 Thread Robert A LaBudde

Thanks! Exactly what I wanted, and the same as what Jorge also suggested.

At 12:49 AM 6/5/2011, Dennis Murphy wrote:

Hi:

Try this:

> dd <- data.frame(a = factor(rep(1:5, each = 4)),
+                  b = factor(rep(rep(1:2, each = 2), 5)),
+                  y = rnorm(20))
> str(dd)
'data.frame':   20 obs. of  3 variables:
 $ a: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ b: Factor w/ 2 levels "1","2": 1 1 2 2 1 1 2 2 1 1 ...
 $ y: num  0.6396 1.467 1.8403 -0.0915 0.2711 ...
> de <- within(dd, {
+          a <- as.numeric(as.character(a))
+          b <- as.numeric(as.character(b))
+        } )
> str(de)
'data.frame':   20 obs. of  3 variables:
 $ a: num  1 1 1 1 2 2 2 2 3 3 ...
 $ b: num  1 1 2 2 1 1 2 2 1 1 ...
 $ y: num  0.6396 1.467 1.8403 -0.0915 0.2711 ...


HTH,
Dennis





Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

Vere scire est per causas scire
