date:20130813

[R] [optim/bbmle] function returns NA at ... distance from x

2013-08-13 Thread Carlos Nasher

Dear R helpers,

I am trying to estimate model parameters using mle2 (bbmle package). When I
try to optimize the likelihood function, the following error message occurs:

Error in grad.default(objectivefunction, coef) :
  function returns NA at
1e-040.001013016911639890.0003166929388711890.000935163594829395 distance
from x.
In addition: Warning message:
In optimx(par = c(0.5, 10, 0.7, 10), fn = function (p)  :
  Gradient not computable after method Nelder-Mead

I can't figure out what that means exactly and how to fix it. I understand
that mle2 uses optim (or in my case optimx) to optimize the likelihood
function. As I use the Nelder-Mead method it should not be a problem if the
function returns NA at any iteration (as long as the initial values don't
return NA). Can anyone help me with that?

Here is a small example of my code that reproduces the problem:

library(plyr)
library(optimx)

### Sample data ###
x <- c(1,1,4,2,3,0,1,6,0,0)
tx <- c(30.14, 5.14, 24.43, 10.57, 25.71, 0.00, 14.14, 32.86, 0.00, 0.00)
T <- c(32.57, 29.14, 33.57, 34.71, 27.71, 38.14, 36.57, 37.71, 35.86, 30.57)
data <- data.frame(x=x, tx=tx, T=T)

### Likelihood function ###
Likelihood <- function(data, r, alpha, s, beta) {
  with(data, {
    # parameters must be strictly positive
    if (r<=0 | alpha<=0 | s<=0 | beta<=0) return (NaN)
    f <- function(x, tx, T)
    {
      g <- function(y)
        (y + alpha)^(-(r + x))*(y + beta)^(-(s + 1))
      integrate(g, tx, T)$value
    }
    integral <- mdply(data, f)
    L <-
      exp(lgamma(r+x)-lgamma(r)+r*(log(alpha)-log(alpha+T))-x*log(alpha+T)+s*(log(beta)-log(beta+T))) +
      exp(lgamma(r+x)-lgamma(r)+r*log(alpha)+log(s)+s*log(beta)+log(integral$V1))
    f <- sum(log(L))
    return (f)
  })
}

### ML estimation function ###
Estimate_parameters_MLE <- function(data, initValues) {
  llhd <- function(r, alpha, s, beta) {
    return (Likelihood(data, r, alpha, s, beta))
  }
  library(bbmle)
  fit <- mle2(llhd, initValues, skip.hessian=TRUE, optimizer="optimx",
              method="Nelder-Mead", control=list(maxit=1e8))
  return (fit)
}

### Parameter estimation ###
### check initial parameters --> -72.75183 --> initial parameters do return a value
Likelihood(data=data, r=0.5, alpha=10, s=0.7, beta=10)
MLE_estimation <- Estimate_parameters_MLE(data=data, list(r=0.5, alpha=10,
                                          s=0.7, beta=10))

'Error in grad.default(objectivefunction, coef) :
  function returns NA at
1e-040.001013016911639890.0003166929388711890.000935163594829395 distance
from x.
In addition: Warning message:
  In optimx(par = c(0.5, 10, 0.7, 10), fn = function (p)  :
  Gradient not computable after method Nelder-Mead'


Best regards,
Carlos

-
Carlos Nasher
Buchenstr. 12
22299 Hamburg

tel:+49 (0)40 67952962
mobil:+49 (0)175 9386725
mail:  carlos.nas...@gmail.com


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How-to add to LDA ggplot axes the Percentage of variance explained

2013-08-13 Thread Lluis

Hi,

How can I add the percentage of variance explained to the axes of an LDA
ggplot?

Script:
require(MASS)
require(ggplot2)
iris.lda <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length +
                Petal.Width, data = iris)
datPred <- data.frame(Species=predict(iris.lda)$class, predict(iris.lda)$x)

ggplot(datPred, aes(x=LD1, y=LD2, col=Species)) + geom_point(size = 4,
       aes(color = Species))

Thanks 



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-add-to-LDA-ggplot-axes-the-Percentage-of-variance-explained-tp4673603.html
Sent from the R help mailing list archive at Nabble.com.
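
One possible way to do what the question asks, offered as a sketch rather than as an answer from the thread: the `svd` component of an `lda` fit holds the singular values whose squares give the "proportion of trace" that `print()` reports for the fit, so the percentages can be computed from it and pasted into the axis labels. The names `prop`, `pct` and `p` are introduced here only for illustration.

```r
library(MASS)
library(ggplot2)

iris.lda <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length +
                Petal.Width, data = iris)

# Share of the between-group variance carried by each discriminant
# (the "proportion of trace" shown when the lda object is printed)
prop <- iris.lda$svd^2 / sum(iris.lda$svd^2)
pct  <- round(100 * prop, 1)

pred    <- predict(iris.lda)
datPred <- data.frame(Species = pred$class, pred$x)

# Same scatter plot as in the question, with the percentages in the labels
p <- ggplot(datPred, aes(x = LD1, y = LD2, colour = Species)) +
  geom_point(size = 4) +
  labs(x = paste0("LD1 (", pct[1], "%)"),
       y = paste0("LD2 (", pct[2], "%)"))
```

Printing `p` draws the plot. Note that the points are coloured by the predicted class, as in the original script, not by the true species.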


[R] Rmpi installs before R-3.0.0 but not since

2013-08-13 Thread Patrick Connolly


With R-3.0.1

Loading required package: Rmpi
Failed with error:  ‘package ‘Rmpi’ was built before R 3.0.0: please re-install it’


And when I try to reinstall Rmpi, I get this

after a whole bunch of 'yes's

checking mpi.h usability... no
checking mpi.h presence... no
checking for mpi.h... no
configure: error: Cannot find mpi.h header file
ERROR: configuration failed for package ‘Rmpi’


And attempting to go back

With R-2.15.3, I get these warnings:

Warning messages:
1: package ‘gbm’ was built under R version 3.0.1 
2: package ‘survival’ was built under R version 3.0.1 
[1] "Error in socketConnection(\"localhost\", port = port, server = TRUE, blocking = TRUE,  : \n  cannot open the connection\n"

So there's no going back.  

Where do I look for reasons why Rmpi can't find mpi.h header even
though it was findable before 3.0.1?  I know not to take that message
too literally since there is a file mpi.h that bash can find.
Something else is being hinted at but what?



> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_NZ.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_NZ.UTF-8
 [5] LC_MONETARY=en_NZ.UTF-8    LC_MESSAGES=en_NZ.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  splines   grDevices utils     stats     graphics  methods
[8] base

other attached packages:
[1] gbm_2.1          survival_2.37-4  cairoDevice_2.19 lattice_0.20-15

loaded via a namespace (and not attached):
[1] grid_3.0.1      multicore_0.1-7
 


TIA

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___Patrick Connolly   
 {~._.~}   Great minds discuss ideas
 _( Y )_ Average minds discuss events 
(:_~*~_:)  Small minds discuss people  
 (_)-(_)  . Eleanor Roosevelt
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.


Re: [R] Rmpi installs before R-3.0.0 but not since

2013-08-13 Thread Pascal Oettli

Hello,

Maybe this link might help:
http://www.stats.uwo.ca/faculty/yu/Rmpi/install.htm

Regards,
Pascal
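
A "Cannot find mpi.h" failure from Rmpi's configure script usually means that configure was not told where the MPI headers live, even if the file exists somewhere on disk. A minimal sketch, assuming an Open MPI installation: the two directory paths below are placeholders (hypothetical) and must be replaced with the directories that actually contain mpi.h and the MPI libraries on your system. The helper name `rmpi_configure_args` is made up for this example; the `--with-Rmpi-*` flags are the ones Rmpi's configure script accepts.

```r
# Build the configure arguments that tell Rmpi where MPI is installed.
rmpi_configure_args <- function(include_dir, lib_dir, type = "OPENMPI") {
  c(paste0("--with-Rmpi-include=", include_dir),
    paste0("--with-Rmpi-libpath=", lib_dir),
    paste0("--with-Rmpi-type=", type))
}

# Placeholder paths: adjust to the directory holding mpi.h and the MPI libs
args <- rmpi_configure_args("/usr/lib/openmpi/include", "/usr/lib/openmpi/lib")
print(args)

# Uncomment to run the actual installation from source:
# install.packages("Rmpi", configure.args = args)
```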




[R] Lme4 and syntax of random factors

2013-08-13 Thread Robert U

Dear R-users,

I've been looking at the lmer function (lme4 package) in order to set up a
linear mixed model, and something about the syntax of the random effects
eludes me. I'd like a hand with understanding a specific point, if someone
has mastered this function...

Let's say that I have 2 random effects, A (e.g. species, k=2) and B (e.g.
individuals, n=100). I did some research on model syntax, and my
understanding is that everything on the left side of the random "parameter"
is about SLOPE and everything on the right side is about intercept:

... + (1 |B)
would give me an intercept per individual.
... + (1 |A)
would give me an intercept per species.
... + (1 |A:B)
would give me an intercept per individual with a nested effect (individual
inside species).

I would like to have random slopes per species. So I thought I could do
something like this:

... + (A |B)
so as to have an intercept per individual and a slope value per species.
Graphically, I would therefore obtain 100 lines with 100 different
intercepts and 2 possible slopes (1 per species). However, when I extract
the random parameter values (ranef()), I have:

· First column is the intercept: varying values per line (individuals),
  so OK.

· 2nd and 3rd columns are Species 1 and 2: I have different values across
  individuals (without an obvious pattern: I do not have similar values for
  individuals of the same species), which is not what I was expecting
  (1 value per species: the slope parameter).

Is the mistake I'm making (or the flaw in my understanding of lme4) obvious
to somebody?

With regards

Re: [R] Hungarian R User's Group (Gergely Daróczi)

2013-08-13 Thread Pancho Mulongeni

Hi Gergely!
This sounds so exciting - so R is turning 20 years old? How did you set up your
R users' group? What are the best practices for setting one up?
I would be keen on establishing one here in Windhoek, Namibia.


Pancho Mulongeni
Research Assistant
PharmAccess Foundation
1 Fouché Street
Windhoek West
Windhoek
Namibia
 
Tel:   +264 61 419 000
Fax:  +264 61 419 001/2
Mob: +264 81 4456 286


[R] F-test question

2013-08-13 Thread Ingo Wardinski


G'day,
I am trying to compute some F-statistics for a singular spectrum analysis
of a time series sv.

I run:
> require(Rssa)
> s <- ssa(sv)
> summary(sv)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -4.238   2.761   6.594   6.324  10.410  15.180
> r1 <- reconstruct(s, groups = list(1:5))
> r2 <- reconstruct(s, groups = list(1:6))
> SSE_M1 <- sum(residuals(r1)^2)
> SSE_M2 <- sum(residuals(r2)^2)
> df.num <- r1$df - r2$df
> df.den <- r2$df
> F <- ((SSE_M2 - SSE_M1) / df.num) / (SSE_M1 / df.den)
and eventually
> p.value <- 1 - pf(F, df.num, df.den)
Error in pf(F, df.num, df.den) :
  Non-numeric argument to mathematical function

> summary(df.num)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

> summary(df.den)
Length  Class   Mode
     0   NULL   NULL
> summary(F)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.

I need to compute the p-value, but something is going wrong, and I can't
see what.

Any help would be very much appreciated.
ingo


Re: [R] Hungarian R User's Group (Gergely Daróczi)

2013-08-13 Thread Gergely Daróczi

Hi Pancho,

there are already a bunch of R User Groups around the world:
http://rwiki.sciviews.org/doku.php?id=rugs:r_user_groups

The Revolution Analytics guys have posted some tips on how to found one (
http://www.revolutionanalytics.com/news-events/r-user-group/how-to-start-r-user-group.php)
and they also offer some sponsorship.

What I did here was fairly simple: I wrote a few mails to some of my
contacts who speak R, and also posted some messages on mailing lists and
forums suggesting that we should come together. I already got some great
feedback, so hopefully there will be an active Hungarian RUG soon, where we
can have talks, workshops and tutorials on R-related topics, or simply get
to know some other useRs and their fields of interest. So it does sound
exciting indeed, and we will see the results soon.

And yes: AFAIK R was announced exactly 20 years ago, so that's a great time
to found RUG(s) :)

Best,
Gergely




Re: [R] F-test question

2013-08-13 Thread Pascal Oettli

Hello,

r1$df and r2$df don't exist.

Regards,
Pascal
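
The failure mode described above can be reproduced without Rssa. The sketch below uses a plain list as a stand-in for the object returned by reconstruct() (an assumption made only for illustration): extracting a non-existent element with `$` silently returns NULL, arithmetic on NULL gives a zero-length vector, and pf() then rejects the NULL argument.

```r
# Stand-in for a reconstruction object: a list with no element named 'df'
r <- list(F1 = c(1, 2, 3))

df.den <- r$df          # no such element, so this is NULL (no error raised)
df.num <- r$df - r$df   # NULL - NULL gives a zero-length numeric vector

print(is.null(df.den))
print(length(df.num))

# pf() refuses a NULL degrees-of-freedom argument; capture the error text
res <- tryCatch(pf(1, df.num, df.den),
                error = function(e) conditionMessage(e))
print(res)
```

Checking `names(r1)` or `str(r1)` before using `r1$df` would have shown that the element is missing, so the degrees of freedom have to be supplied some other way.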




Re: [R] [optim/bbmle] function returns NA at

2013-08-13 Thread Prof J C Nash (U30A)

1) Why use Nelder-Mead via optimx when it is an optim() method? You
are going from New York to Philadelphia via Beijing because of the extra
overhead. The NM method is included in optimx for convenience in comparisons.


2) NM cannot work with NA when it wants to compute the centroid of
points and the search direction. So you've got to find a way to make sure
your likelihood is properly defined. This seems to be the issue for
about 90% of failures with optim(x) or other ML methods in my recent
experience. Note that returning a large value (make it a good deal
smaller than .Machine$double.xmax, say that number * 1e-6, to avoid
computation troubles) often works, but it is a quick and dirty fix.


JN
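
The "large finite value" workaround from point 2 can be sketched as follows. This is an illustration under stated assumptions, not the poster's model: the gamma likelihood, the simulated data `y`, and the names `big` and `negloglik` are all made up for the example, and optim() is called directly, as suggested in point 1.

```r
# Simulated data; the gamma model is only a stand-in for the real likelihood
set.seed(1)
y <- rgamma(200, shape = 2, rate = 0.5)

# Large but finite penalty, well below .Machine$double.xmax
big <- .Machine$double.xmax * 1e-6

negloglik <- function(p) {
  # Outside the admissible region: return the penalty instead of NA/NaN
  if (any(p <= 0)) return(big)
  val <- -sum(dgamma(y, shape = p[1], rate = p[2], log = TRUE))
  # Also guard against non-finite values caused by numerical trouble
  if (is.finite(val)) val else big
}

fit <- optim(c(1, 1), negloglik, method = "Nelder-Mead")
print(fit$par)
print(fit$convergence)
```

As the post says, this is a quick and dirty fix: a flat penalty gives the optimizer no information about which direction leads back into the feasible region, so reparameterizing (for example optimizing over log-parameters) is often the cleaner route.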



[R] Understanding S4 method dispatch

2013-08-13 Thread Hadley Wickham

Hi all,

Any insight into the code below would be appreciated - I don't
understand why two methods which I think should have equal distance
from the call don't.

Thanks!

Hadley

# Create simple class hierarchy
setClass("A", "NULL")
setClass("B", "A")

a <- new("A")
b <- new("B")

setGeneric("f", function(x, y) standardGeneric("f"))
setMethod("f", signature("A", "A"), function(x, y) "A-A")
setMethod("f", signature("B", "B"), function(x, y) "B-B")

# These work as I expect
f(a, a)
f(b, b)

setClass("AB", contains = c("A", "B"))
ab <- new("AB")

# Why does this return "B-B"? Shouldn't both methods be an equal distance?
f(ab, ab)

# These both return distance 1, as I expected
extends("AB", "A", fullInfo=TRUE)@distance
extends("AB", "B", fullInfo=TRUE)@distance
# So why is signature("B", "B") closer than signature("A", "A")?

-- 
Chief Scientist, RStudio
http://had.co.nz/


Re: [R] coxph diagnostics

2013-08-13 Thread Terry Therneau


That's the primary reason for the plot: so that you can look and think.

The test statistic is based on whether a LS line fit to the plot has zero slope.  For
larger data sets you can sometimes have a significant p-value but good agreement with
proportional hazards.  It's much like an example from Lincoln Moses' beginning statistics
book (now out of print, so rephrasing from memory).

   Suppose that you flip a coin 10,000 times and get 5101 heads.  What can you say?
   a. The coin is not perfectly fair (p<.05).
   b. But it is darn close to perfect!
As a referee I would be comfortable using that coin to start a football game.

The Cox model gives an average hazard ratio, averaged over time.  When proportional
hazards holds that value is a complete summary -- nothing else is needed.  When it does
not hold, the average may still be useful, or not, depending on the degree of change over
time.
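
The coin example can be checked directly with an exact binomial test. This is only a sketch of the arithmetic behind the two statements; the variable names are introduced here for illustration.

```r
heads <- 5101
flips <- 10000

# Exact two-sided test of H0: P(heads) = 0.5
test <- binom.test(heads, flips, p = 0.5)

print(test$p.value)    # statistical significance
print(test$estimate)   # estimated P(heads): practical size of the effect
print(test$conf.int)   # 95% confidence interval for P(heads)
```

The p-value falls below 0.05 while the estimated probability differs from 0.5 by only about one percentage point, which is the same situation as a significant cox.zph test alongside a nearly flat residual plot.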


Terry Therneau



On 08/13/2013 05:00 AM, r-help-requ...@r-project.org wrote:

> Thanks to Bert and Göran for your responses.
>
> To answer Göran's comment, yes I did plot the Schoenfeld residuals using
> plot.cox.zph and the lines look horizontal (slope = 0) to me, which makes
> me think that it contradicts the results of cox.zph.
>
> What alternatives do I have if I assume the proportional hazards assumption
> of coxph does not hold?

Thanks!



[R] Faster R algorithms than AlgDesign?

2013-08-13 Thread Dimitri Liakhovitski

Hello!

I have a very large experimental design space (all possible combinations of
all possible levels of several factors). For example, 'allcand' below has
1,875,000 possible combinations of 9 factors.

allcand <- expand.grid(a1=as.factor(1:5), a2=as.factor(1:5), a3=as.factor(1:5),
                       a4=as.factor(1:5), a5=as.factor(1:5), a6=as.factor(1:3),
                       a7=as.factor(1:5), a8=as.factor(1:5), a9=as.factor(1:8))
dim(allcand)

My ultimate goal is to grab a subset of 10,000 out of those 1.875 million
candidates, such that the resulting 10,000 are as orthogonal as possible.

Usually, I use package AlgDesign for such tasks. However, my design space
is so large that it takes too long even to grab 100 out of 1,875,000,
like this:

library(AlgDesign)
system.time(mydes <- optFederov(~., data=allcand, nTrials=100))

# It took 38 min on my machine.

Is there a package that could do something like this faster?

Thank you very much!


-- 
Dimitri Liakhovitski
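
One possible workaround, which is an assumption of this note and not something proposed in the thread, is to hand the exchange algorithm a random subsample of the candidate set instead of all 1,875,000 rows. The sketch below draws such a subsample without ever building the full grid; `n_sub` and the seed are arbitrary choices, and the resulting design is only optimal relative to the subsample.

```r
# Number of levels of each factor, as in the post
levels_per_factor <- c(a1 = 5, a2 = 5, a3 = 5, a4 = 5, a5 = 5,
                       a6 = 3, a7 = 5, a8 = 5, a9 = 8)
n_total <- prod(levels_per_factor)   # size of the full candidate set

set.seed(42)
n_sub <- 20000                       # arbitrary subsample size

# Sample each factor independently; every full-grid row is equally likely
sub <- as.data.frame(lapply(levels_per_factor, function(k) {
  factor(sample.int(k, n_sub, replace = TRUE), levels = seq_len(k))
}))
sub <- unique(sub)                   # drop candidates drawn more than once

print(n_total)
print(dim(sub))

# The reduced candidate set can then be passed to AlgDesign, e.g.:
# mydes <- AlgDesign::optFederov(~ ., data = sub, nTrials = 100)
```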


Re: [R] Understanding S4 method dispatch

2013-08-13 Thread Simon Zehnder

Hadley,

The class AB inherits from A and from B, but B already inherits from class A.
So actually you only have an object of class B in your object of class AB. When
you call the function f, R looks for a method f for AB objects. It does not find
such a method, so it looks for a method for the class inherited from, B. Such a
method is present and is then executed.

The inheritance structure has to be changed. This behavior is actually desired:
without it, diamond class inheritance would be fatal.


Best

Simon




Re: [R] Lme4 and syntax of random factors

2013-08-13 Thread Ben Bolker

Robert U <tacsunday at yahoo.fr> writes:

> Dear
> R-users,

[snip]

  This question probably belongs on r-sig-mixed-mod...@r-project.org .
Followups there, please.

> Let's say that I have 2 random effects, A (e.g. species, k=2) and B
> (e.g. individuals, n=100). I made some research about model syntax,
> and I have the understanding that everything at the left side of the
> random parameter is about SLOPE and everything at the right
> side about intercept :

  You really can't practically fit a random effect to 2 species (see
http://glmm.wikidot.com/faq#fixed_vs_random ).

> ... + (1 |B)
> would give me an intercept per individual.
>
> ... + (1 |A)
> would give me an intercept per species.

  yes

> ... + (1 |A:B)
> would give me an intercept per individuals with nested effect (individual
> inside species)

  This would be the same as (1|B) if the individuals are uniquely
identified.  Otherwise you probably want (1|A/B) [except that you
can't really fit a random effect for k=2, as discussed above]

> I would like to have random slopes per species. So I thought I
> could do something like that :

  Probably not feasible.

> ... + (A |B) so to have an intercept per individual and a slope value
> per species. Graphically, I would therefore obtain 100 lines with
> 100 different intercepts and 2 possible slopes (1 per
> species). However, when I extract random parameter values (ranef()),
> I have :

  What variable is your slope with respect to?  Suppose it's time.
Then I would recommend

 ~ A*time + (1|A:B)

which will fit a (FIXED effect) interaction between species and time
(different slopes and intercepts for each species), and a random
intercept per individual.
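
The recommended formula can be tried on simulated data. This is a minimal sketch: the response name `y`, the covariate `time`, the sample sizes and all effect sizes are assumptions made up for the example, with 2 species and 40 uniquely labelled individuals.

```r
library(lme4)

set.seed(1)
n_ind <- 40   # individuals (B), half in each species
n_obs <- 6    # repeated measurements per individual

d <- data.frame(
  B    = factor(rep(seq_len(n_ind), each = n_obs)),
  time = rep(seq_len(n_obs) - 1, times = n_ind)
)
d$A <- factor(ifelse(as.integer(d$B) <= n_ind / 2, "sp1", "sp2"))

# True model: species-specific slope, individual-specific intercept
ind_int <- rnorm(n_ind, sd = 1)
slope   <- c(sp1 = 0.5, sp2 = 1.5)
d$y <- 2 + ind_int[as.integer(d$B)] +
  unname(slope[as.character(d$A)]) * d$time +
  rnorm(nrow(d), sd = 0.5)

# Fixed species-by-time interaction, random intercept per individual
fit <- lmer(y ~ A * time + (1 | A:B), data = d)

print(fixef(fit))                 # two intercepts and two slopes (as contrasts)
print(nrow(ranef(fit)[["A:B"]]))  # one random intercept per individual
```

The species slopes appear among the fixed effects (`time` for the first species, `time` plus the interaction term for the second), which matches the "100 intercepts, 2 slopes" picture in the question without treating a two-level factor as random.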


Re: [R] Making Sure your matrices are even

2013-08-13 Thread David Carlson

Try this

S2 <- data.frame(Group=rep("S", length(S)), Cat=factor(S))
B2 <- data.frame(Group=rep("B", length(B)), Cat=factor(B))
V2 <- data.frame(Group=rep("V", length(V)), Cat=factor(V))
table(rbind(S2, B2, V2))
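
To make explicit what the code above produces: stacking the three groups in long format gives a single Group-by-Cat contingency table, so the groups no longer need to have the same length. A sketch using the vectors from the original question; the names `tab` and `res` are introduced here for illustration.

```r
S <- 1:86
B <- 1:15
V <- 1:45

S2 <- data.frame(Group = rep("S", length(S)), Cat = factor(S))
B2 <- data.frame(Group = rep("B", length(B)), Cat = factor(B))
V2 <- data.frame(Group = rep("V", length(V)), Cat = factor(V))

# One row per observation; table() then cross-tabulates Group against Cat
tab <- table(rbind(S2, B2, V2))
print(dim(tab))       # groups x categories
print(rowSums(tab))   # group sizes are preserved

# The expected counts are tiny here, so R warns that the chi-squared
# approximation may be unreliable; the warning is suppressed in this sketch
res <- suppressWarnings(chisq.test(tab))
print(res$parameter)  # degrees of freedom
```

With cells this sparse, `chisq.test(tab, simulate.p.value = TRUE)` is one way to avoid relying on the large-sample approximation.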

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Docbanks84
Sent: Monday, August 12, 2013 5:31 PM
To: r-help@r-project.org
Subject: [R] Making Sure your matrices are even

Hi,

I am trying to do a chi-square test on a set of values, and my different
groups are not the same size. Is there a way to add arbitrary symbols or
numbers to make the matrices even? Or do I need to do a different type of
p-value analysis?

> S <- 1:86
> B <- 1:15
> V <- 1:45
> table(S)
S
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
85 86 
 1  1 
> chisq.test(table(S,B,V))
Error in table(S, B, V) : all arguments must have the same length



--
View this message in context:
http://r.789695.n4.nabble.com/Making-Sure-your-matrices-are-even
-tp4673598.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Understanding S4 method dispatch

2013-08-13 Thread Hadley Wickham

> The class AB inherits from A and from B, but B already inherits from class A.
> So actually you only have an object of class B in your object of class AB.
> When you call the function f R looks for a method f for AB objects. It does
> not find such a method and looks for a method of the object inherited from,
> B. Such a method is present and is then executed.
>
> The inheritance structure has to be changed. The behavior is actually
> desired, as if this behavior weren't given a diamond class inheritance would
> be fatal.

Are you sure? That behaviour doesn't agree with the description of
method dispatch given in ?Methods, nor with getClass("AB"), which shows
that AB inherits from both A and B. (I totally agree that this is a
bad idea, and unlikely to be useful in real life, but I'm trying to
understand the details of S4 dispatch.)

> getClass("AB")
Class "AB" [in ".GlobalEnv"]

Slots:

Name:  .xData
Class:   NULL

Extends:
Class "B", directly
Class "A", directly
Class ".NULL", by class "A", distance 2
Class "NULL", by class "A", distance 3, with explicit coerce
Class "OptionalFunction", by class "A", distance 4, with explicit coerce
Class "optionalMethod", by class "A", distance 4, with explicit coerce
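
One way to inspect what dispatch actually does, rather than reasoning from distances alone, is to ask the methods package which method it selects. The sketch below uses a stand-in hierarchy with the same shape as the one in this thread (B extends A, AB lists both as direct superclasses) but with an ordinary numeric slot instead of extending "NULL"; that substitution is an assumption made to keep the example simple, and it makes no claim about which method should win.

```r
library(methods)

# Stand-in hierarchy: B extends A; AB names both A and B directly
setClass("A", representation(x = "numeric"))
setClass("B", contains = "A")
setClass("AB", contains = c("A", "B"))

setGeneric("f", function(x, y) standardGeneric("f"))
setMethod("f", signature("A", "A"), function(x, y) "A-A")
setMethod("f", signature("B", "B"), function(x, y) "B-B")

ab <- new("AB")
result <- f(ab, ab)
print(result)

# selectMethod() does the same inherited lookup as the call itself; the
# 'defined' slot records the signature of the method that was picked
chosen <- selectMethod("f", signature("AB", "AB"))
print(chosen@defined)

# Superclasses of AB in the order in which the class definition lists them
print(names(getClass("AB")@contains))
```

`showMethods("f", inherited = TRUE)` additionally lists the inherited methods that have been cached for the generic after the call.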

-- 
Chief Scientist, RStudio
http://had.co.nz/


Re: [R] coxph diagnostics

2013-08-13 Thread Soumitro Dey

Thank you for your response, Terry.

To put the discussion into perspective, my data set is quite large with
over 160,000 samples and 38 variables. The event is true for all samples in
this dataset. The distribution is zero-inflated (i.e. most events occur at
time = 0).

The result of the cox.zph looks like this:

> cox.zph(coxph1)
                 rho    chisq        p
agency1    -1.05e-02 9.06e+00 2.62e-03
agency2    -5.48e-03 2.47e+00 1.16e-01
agency3    -6.47e-03 3.45e+00 6.34e-02
agency4    -6.86e-03 3.87e+00 4.90e-02
agency5    -5.56e-03 2.54e+00 1.11e-01
agency6    -6.79e-03 3.79e+00 5.16e-02
agency7    -4.78e-03 1.88e+00 1.71e-01
agency8    -1.34e-02 1.48e+01 1.22e-04
agency9    -2.78e-03 6.34e-01 4.26e-01
agency10   -6.15e-03 3.11e+00 7.78e-02
agency11    4.82e-04 1.91e-02 8.90e-01
agency12   -4.38e-03 1.58e+00 2.09e-01
agency13   -1.02e-03 8.54e-02 7.70e-01
agency14   -5.44e-03 2.43e+00 1.19e-01
agency15    1.01e-02 8.41e+00 3.73e-03
agency16   -1.81e-03 2.70e-01 6.04e-01
agency17   -3.14e-03 8.12e-01 3.67e-01
agency18   -6.59e-03 3.57e+00 5.88e-02
agency19    1.60e-03 2.12e-01 6.46e-01
agency20   -1.24e-02 1.27e+01 3.74e-04
agency21   -9.02e-03 6.69e+00 9.68e-03
agency22   -5.84e-03 2.81e+00 9.38e-02
agency23    3.99e-03 1.31e+00 2.52e-01
agency24   -9.18e-03 6.93e+00 8.50e-03
agency25   -4.75e-03 1.86e+00 1.73e-01
category1  -1.31e-02 1.43e+01 1.60e-04
category2   1.34e-04 1.47e-03 9.69e-01
category3   7.61e-03 4.75e+00 2.92e-02
category4  -6.65e-03 3.69e+00 5.48e-02
category5  -7.78e-03 4.97e+00 2.58e-02
category6  -8.64e-03 6.12e+00 1.34e-02
fav_count   1.32e-02 1.46e+01 1.32e-04
fow_count  -1.83e-02 2.50e+01 5.70e-07
fri_count   9.20e-03 6.89e+00 8.67e-03
stat_count  1.01e-02 9.08e+00 2.58e-03
ht          1.37e-02 1.53e+01 9.08e-05
ul          1.36e-02 1.52e+01 9.67e-05
um         -1.12e-02 1.04e+01 1.24e-03
pos        -5.92e-04 2.90e-02 8.65e-01
neg         6.44e-03 3.39e+00 6.56e-02
acti        2.24e-03 4.12e-01 5.21e-01
anat        3.48e-03 9.96e-01 3.18e-01
chemi      -7.82e-03 5.04e+00 2.47e-02
conc        7.04e-05 4.08e-04 9.84e-01
devi       -1.34e-03 1.48e-01 7.01e-01
diso       -3.60e-03 1.06e+00 3.04e-01
gene        1.31e-03 1.41e-01 7.07e-01
geog        4.64e-03 1.78e+00 1.82e-01
livb       -1.19e-02 1.17e+01 6.24e-04
objc        3.87e-03 1.23e+00 2.67e-01
occu        6.06e-04 3.04e-02 8.62e-01
orga       -8.24e-04 5.63e-02 8.12e-01
phen        3.87e-03 1.23e+00 2.68e-01
phys       -1.94e-03 3.12e-01 5.77e-01
proc        2.23e-03 4.11e-01 5.22e-01
GLOBAL            NA 4.20e+02 0.00e+00


The slope of the plot.cox.zph is perfectly 0 for all variables with narrow
confidence bands.

I probably should have put these details in the first post, but it would have
been too long. Sorry about that.

Based on the plot of Schoenfeld residuals and Terry's explanation, is it
safe to say that the proportional hazards assumption holds despite the
significant global p-values?

Thanks!


On Tue, Aug 13, 2013 at 9:16 AM, Terry Therneau <thern...@mayo.edu> wrote:

> That's the primary reason for the plot: so that you can look and think.
>
> The test statistic is based on whether a LS line fit to the plot has zero
> slope.  For larger data sets you can sometimes have a significant p-value
> but good agreement with proportional hazards.  It's much like an example
> from Lincoln Moses' beginning statistics book (now out of print, so
> rephrasing from memory).
>    Suppose that you flip a coin 10,000 times and get 5101 heads.  What
>    can you say?
>    a. The coin is not perfectly fair (p < .05).  b. But it is darn close
>    to perfect!
> As a referee I would be comfortable using that coin to start a football
> game.
>
> The Cox model gives an average hazard ratio, averaged over time.  When
> proportional hazards holds that value is a complete summary-- nothing else
> is needed.  When it does not hold, the average may still be useful, or
> not, depending on the degree of change over time.
>
> Terry Therneau
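
The coin example quoted above can be checked directly. This is only a minimal sketch using base R's binom.test; the variable names are illustrative and do not come from the thread.

```r
# Terry's coin example: 10,000 flips, 5101 heads.
flips <- 10000
heads <- 5101

# Exact two-sided binomial test of the null hypothesis P(heads) = 0.5.
coin_test <- binom.test(heads, flips, p = 0.5)

coin_test$p.value    # below 0.05, so exact fairness is rejected
coin_test$estimate   # estimated P(heads), i.e. 5101 / 10000
coin_test$conf.int   # 95% interval, which stays very close to 0.5
```

The same pattern is plausible for the cox.zph output in this thread: with over 160,000 rows, slopes with rho around 0.01 can be statistically distinguishable from zero while being practically negligible.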



> On 08/13/2013 05:00 AM, r-help-requ...@r-project.org wrote:
>
>> Thanks to Bert and Göran for your responses.
>>
>> To answer Göran's comment, yes I did plot the Schoenfeld residuals using
>> plot.cox.zph and the lines look horizontal (slope = 0) to me, which makes
>> me think that it contradicts the results of cox.zph.
>>
>> What alternatives do I have if I assume proportional assumption of coxph
>> does not hold?
>>
>> Thanks!




Re: [R] pulling out pairs from data frame

2013-08-13 Thread Kripa R

Oops! Ok So I have this file:
 
SampleName Individual Age Gender
1 4 80 M
2 15 56 F
3 1 75 F
4 15 56 F
5 2 58 F
6 4 80 M

And I want to pull out paired samples, so the resulting file would look 
something like this:
SampleName Individual Age Gender
1 4 80 M
2 15 56 F
4 15 56 F
6 4 80 M  

.kripa
 
> Date: Mon, 12 Aug 2013 18:36:08 -0700
> From: smartpink...@yahoo.com
> Subject: Re: [R] pulling out pairs from data frame
> To: kripa...@hotmail.com
> CC: r-help@r-project.org
>
> Hi,
> The question is not clear so not sure this is what you wanted.
>
> dat1 <- read.table(text = "
> SameName Individual Age Gender
> 1 4 80 M
> 2 15 56 F
> 3 1 75 F
> 4 15 56 F
> 5 2 58 F
> 6 4 80 M
> ", sep = "", header = TRUE, stringsAsFactors = FALSE)
> reps <- c(4, 15)
>
> dat1$Newcol <- as.numeric(dat1$Individual %in% reps)
> dat1
> #  SameName Individual Age Gender Newcol
> #1        1          4  80      M      1
> #2        2         15  56      F      1
> #3        3          1  75      F      0
> #4        4         15  56      F      1
> #5        5          2  58      F      0
> #6        6          4  80      M      1
> A.K.
>
> ----- Original Message -----
> From: Kripa R <kripa...@hotmail.com>
> To: r-help@r-project.org
> Sent: Monday, August 12, 2013 6:59 PM
> Subject: [R] pulling out pairs from data frame
>
> Hello everyone,
> I'm having trouble pulling out paired samples from a data set... I have the
> following:
>
> reps <- c(4, 15)  # the variable reps is a list of all paired samples
> data
>
> SameName Individual Age Gender
> 1        4          80  M
> 2        15         56  F
> 3        1          75  F
> 4        15         56  F
> 5        2          58  F
> 6        4          80  M
>
> I'd like to make a new variable with only the samples that have pairs. Any
> suggestions would be greatly appreciated
>
> Thanks!
>
> .kripa

[R] Create rows for columns in dataframe

2013-08-13 Thread Dark

Hi experts,

I have a data frame with 100k+ records. It has a key/id column and 25 code
columns. I would like to restructure it so that it has a row for each code.

I have a structure like this (used dput):
structure(list(DSYSRTKY = structure(c(1L, 2L, 3L, 3L, 4L, 4L), .Names = c("1",
"2", "3", "4", "5", "6"), .Label = c("10005", "10203",
"10315", "10327"), class = "factor"), C1 = structure(c(6L,
3L, 2L, 5L, 1L, 4L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("41401",
"42831", "45341", "486", "5990", "71535"), class = "factor"),
    C2 = structure(c(5L, 1L, 3L, 6L, 4L, 2L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("4019", "51881", "5990",
    "6826", "78900", "V4986"), class = "factor"), C3 = structure(c(6L,
    3L, 5L, 2L, 4L, 1L), .Names = c("1", "2", "3", "4", "5",
    "6"), .Label = c("5119", "5939", "72400", "7850", "8052",
    "V1251"), class = "factor"), C4 = structure(c(6L, 5L, 3L,
    1L, 2L, 4L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("3109",
    "4019", "4241", "42789", "V1011", "V454"), class = "factor"),
    C5 = structure(c(1L, 1L, 3L, 1L, 2L, 4L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "2720", "4019",
    "7823"), class = "factor"), C6 = structure(c(1L, 1L, 2L,
    1L, 4L, 3L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("",
    "311", "41400", "49390"), class = "factor"), C7 = structure(c(1L,
    1L, 2L, 1L, 3L, 4L), .Names = c("1", "2", "3", "4", "5",
    "6"), .Label = c("", "2724", "2859", "V4581"), class = "factor"),
    C8 = structure(c(1L, 1L, 3L, 1L, 4L, 2L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "40390", "71680",
    "79029"), class = "factor"), C9 = structure(c(1L, 1L, 2L,
    1L, 4L, 3L), .Names = c("1", "2", "3", "4", "5", "6"), .Label = c("",
    "4168", "5859", "V1582"), class = "factor"), C10 = structure(c(1L,
    1L, 3L, 1L, 1L, 2L), .Names = c("1", "2", "3", "4", "5",
    "6"), .Label = c("", "49390", "7804"), class = "factor"),
    C11 = structure(c(1L, 1L, 3L, 1L, 1L, 2L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "2724", "V066"), class = "factor"),
    C12 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "6930"), class = "factor"),
    C13 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "41400"), class = "factor"),
    C14 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "V4581"), class = "factor"),
    C15 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "40291"), class = "factor"),
    C16 = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = c("", "4280"), class = "factor"),
    C17 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C18 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C19 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C20 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C21 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C22 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C23 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C24 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor"),
    C25 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Names = c("1",
    "2", "3", "4", "5", "6"), .Label = "", class = "factor")), .Names = c("DSYSRTKY",
"C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "C10",
"C11", "C12", "C13", "C14", "C15", "C16", "C17", "C18", "C19",
"C20", "C21", "C22", "C23", "C24", "C25"), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")

Now I want to restructure this data frame so that, instead of 25 code fields,
it has one row per code, but only if the code has a value!

The new structure should look something like:
NewDataFrame <- data.frame(ID=integer(), DSYSRTKY=integer(),
CODE=character(),  PRIMAIRY=logical())

The ID column should just be an increment. PRIMAIRY is a boolean which
should be true if the code was originally the first code (C1).

It has to be efficient since my real data has many more rows than my example
structure of only 6 rows.
I tried a looping mechanism and it worked, but it performed very poorly.

Hopefully I provided enough information using dput.

Regards Derk





[R] Lattice: bwplot - changing box colors in legend and plot when using panel.groups = function... and panel = panel.superpose

2013-08-13 Thread Anna Zakrisson Braeunlich

Hi,

Yes, I have searched stack overflow.

My issue is to simply change coloring in boxes and legend in my bwplot. I have 
done this many times in lattice, but now I have been tweaking the plot somewhat 
and I can no longer apply the color changes.
I would really appreciate some help.
A. Zakrisson

Here is some dummy data and my script:

library(lattice)

mydata <- data.frame(factor1 = factor(rep(LETTERS[1:6], each = 80)),
                     factor2 = factor(rep(c(1:2), each = 16)),
                     var1 = rnorm(120, mean = rep(c(0, 3, 5), each = 40),
                                  sd = rep(c(1, 2, 3), each = 20)))

font.settings <- list(font = 1, cex = 1, fontfamily = "serif")
my.theme <- list(
  box.umbrella  = list(col = "black"),
  box.rectangle = list(fill = rep(c("black", "black"), 2)),
  box.dot       = list(col = "black", pch = 3, cex = 2),
  plot.symbol   = list(cex = 1, col = 1, pch = 0),  # outlier size and color
  par.xlab.text = font.settings,
  par.ylab.text = font.settings,
  axis.text     = font.settings,
  par.sub       = font.settings)

bwplot(var1 ~ factor1, data = mydata, groups = factor2,
       box.width = 1/3,  # width of the boxes
       auto.key = list(points = FALSE,
                       rectangles = TRUE, space = "right",
                       title = "Year", cex.title = 1),
       panel = panel.superpose,
       ylab = "var1",
       xlab = "factor1",
       par.settings = my.theme,
       panel.groups = function(x, y, ..., group.number) {
         panel.bwplot(x + (group.number - 1.8) / 3, y, ...)
       })


Anna Zakrisson Braeunlich
PhD student

Department of Ecology, Environment and Plant Sciences
Stockholm University
Svante Arrheniusv. 21A
SE-106 91 Stockholm
Sweden/Sverige

Lives in Berlin.
For paper mail:
Katzbachstr. 21
D-10965, Berlin - Kreuzberg
Germany/Deutschland

E-mail: anna.zakris...@su.se
Tel work: +49-(0)3091541281
Mobile: +49-(0)15777374888
LinkedIn: http://se.linkedin.com/pub/anna-zakrisson-braeunlich/33/5a2/51b


[R] Basic problem in R

2013-08-13 Thread Sinne Smed

I am teaching a summer course this week and next, in which the students are
using R.

We have downloaded and are using the new version, R 3.0. It worked perfectly
until today, when some of the basic functions stopped working.

Examples are sd() and lm().

The message we get is: Error: could not find function "lm" or Error: could not
find function "sd"

Have you ever encountered that? If yes, what can I do about it? Is it a basic
error in the new R version? I have used R for teaching for 5 years now and have
never encountered a problem like that.

Thanks, Sinne



Re: [R] Making Sure your matrices are even

2013-08-13 Thread Docbanks84

Thank you David.

I had to sort the data afterwards for it to work:

S <- 1:86
B <- 1:15
V <- 1:45
S2 <- data.frame(Group=rep("S", length(S)), Cat=factor(S))
B2 <- data.frame(Group=rep("B", length(B)), Cat=factor(B))
V2 <- data.frame(Group=rep("V", length(V)), Cat=factor(V))
table(rbind(S2, B2, V2))
y <- table(rbind(S2, B2, V2))
sort.list(y)
chisq.test(table(y))

This resulted in a chi-square test of S2 v. B2 v. V2.




Re: [R] Understanding S4 method dispatch

2013-08-13 Thread Simon Zehnder

If you take an example which works with slots,

setClass("A", representation(a = "numeric"))
setClass("B", contains = c("A"), representation(b = "numeric"))
a <- new("A", a = 2)
b <- new("B", a = 3, b = 2)

setClass("AB", contains = c("A", "B"))
new("AB", a = 2, b = 3)

You see that there is only one @a slot: the one that B inherits from A. If
this were not the case, which slot should be taken when we call @a? To avoid
this kind of ambiguity, only one A class is inherited by AB: the one B already
inherits from A.

You could create a class that contains another A object in a slot:

setClass("AandB", contains = c("B"), representation(A = "A"))
new("AandB", a = 2, b = 3, A = new("A", a = 3))

Now back to your example: as there is only one A object inside the B object
which is contained by the AB object, method dispatch works the way it
should. It looks for a method f for an AB object. It does not find one. Then it
looks for a method f for the contained B object (as this is the only one
contained in AB) and it finds a method. Then it calls this method on the B part
of the object AB and the result is B-B
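
The point about the shared slot and the inherited method can be inspected with a small sketch. The generic f and its two methods below are hypothetical stand-ins (the original methods from the thread are not shown here), and the superclasses are listed as c("B", "A") as an assumption, chosen to match the order reported by getClass("AB") earlier in the thread.

```r
library(methods)

# Illustrative classes mirroring the A / B / AB layout discussed in the thread.
setClass("A", representation(a = "numeric"))
setClass("B", contains = "A", representation(b = "numeric"))
setClass("AB", contains = c("B", "A"))  # assumed order: B first, then A

# Hypothetical generic with one method per parent class.
setGeneric("f", function(x) standardGeneric("f"))
setMethod("f", "A", function(x) "method for A")
setMethod("f", "B", function(x) "method for B")

ab <- new("AB", a = 2, b = 3)

slotNames(ab)            # the shared slot 'a' appears only once
is(ab, "A")              # AB extends A ...
is(ab, "B")              # ... and B
extends("AB")            # superclasses considered during dispatch
selectMethod("f", "AB")  # the inherited method that dispatch selects
f(ab)
```

selectMethod() reports which inherited method R actually picks, so the claim about dispatch can be verified on a given R version rather than taken on trust.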

Best

Simon




Re: [R] pulling out pairs from data frame

2013-08-13 Thread arun



Hi,
The conditions are still not clear.  


dat2 <- dat1[dat1$Individual %in% reps, ]
dat2
#  SameName Individual Age Gender
#1        1          4  80      M
#2        2         15  56      F
#4        4         15  56      F
#6        6          4  80      M
A.K.
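
If the vector reps is not known in advance, the paired individuals can also be derived from the data. The following is a minimal sketch in base R, assuming that "paired" means an Individual ID that occurs more than once; the names is_paired and paired are illustrative only.

```r
# Same six samples as in the thread (column names as in the original post).
dat1 <- read.table(text = "
SampleName Individual Age Gender
1 4 80 M
2 15 56 F
3 1 75 F
4 15 56 F
5 2 58 F
6 4 80 M
", header = TRUE, stringsAsFactors = FALSE)

# duplicated() flags the second and later occurrences of an ID;
# fromLast = TRUE flags the earlier ones, so the union marks every
# sample whose individual appears more than once.
is_paired <- duplicated(dat1$Individual) |
  duplicated(dat1$Individual, fromLast = TRUE)

paired <- dat1[is_paired, ]
paired
```

For this data the selection keeps the same four samples as the %in% approach above, without having to type the IDs by hand.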



Re: [R] pulling out pairs from data frame

2013-08-13 Thread Kripa R

I manipulated the code you sent and it works perfectly, thanks! 

.kripa
 

[R] Outliers and overdispersion

2013-08-13 Thread Marta Lomas




Hi again,

I have a question about some outliers in my response variable (which
are bird counts). At the beginning I did not drop them because they are part
of the normal counts and I considered them ecologically correct.

However, I tried some of the same models without the outliers and the AICs are
better. I also get nice significances this way...

So would you say that, even though the outliers are valid observations, and
taking into consideration that the negative binomial distribution I am using
already accounts for some of the overdispersion due to the outliers, it is
better to drop them because the models fit better this way?

Thanks for your patience!

:)





  

Re: [R] Create rows for columns in dataframe

2013-08-13 Thread arun

HI,

Your desired output is not clear.  Maybe this helps:
# dat1 is the dataset
dat1$ID <- 1:nrow(dat1)
library(reshape2)
res1 <- melt(dat1, id.vars = c("ID", "DSYSRTKY"))
res1$value <- res1$value != ""
res1[, 2] <- as.integer(as.character(res1[, 2]))
res1[, 3] <- as.character(res1[, 3])
colnames(res1)[3:4] <- c("CODE", "PRIMARY")
head(res1)
#  ID DSYSRTKY CODE PRIMARY
#1  1    10005   C1    TRUE
#2  2    10203   C1    TRUE
#3  3    10315   C1    TRUE
#4  4    10315   C1    TRUE
#5  5    10327   C1    TRUE
#6  6    10327   C1    TRUE


A.K.
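
A different reading of the request is that CODE should hold the code value itself, empty codes should be dropped, and PRIMAIRY should mark codes that came from column C1. The sketch below follows that reading in base R. The small data frame is made-up stand-in data (not the poster's), and the names code_cols and long are illustrative.

```r
# Made-up stand-in for the poster's data: one key column, three code columns.
dat <- data.frame(
  DSYSRTKY = c(10005, 10203, 10315),
  C1 = c("71535", "45341", "42831"),
  C2 = c("78900", "4019", ""),
  C3 = c("V1251", "", ""),
  stringsAsFactors = FALSE
)

code_cols <- setdiff(names(dat), "DSYSRTKY")

# Stack the code columns one after another (all of C1, then C2, then C3),
# repeating the key once per code column. as.character() also handles factors.
long <- data.frame(
  DSYSRTKY = rep(dat$DSYSRTKY, times = length(code_cols)),
  CODE     = unlist(lapply(dat[code_cols], as.character), use.names = FALSE),
  PRIMAIRY = rep(code_cols == "C1", each = nrow(dat)),
  stringsAsFactors = FALSE
)

# Keep only rows that actually carry a code, then add the running ID.
long <- long[!is.na(long$CODE) & nzchar(long$CODE), ]
rownames(long) <- NULL
long <- data.frame(ID = seq_len(nrow(long)), long, stringsAsFactors = FALSE)
long
```

Everything here is vectorised, so it avoids the row-by-row loop that the original post describes as too slow.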




Re: [R] Outliers and overdispersion

2013-08-13 Thread Marta Lomas

Thanks for your interest and prompt answer!

What I am trying to estimate is the correlation of one bird species' counts
with a set of environmental parameters. The count data are zero-inflated and
overdispersed. I am modeling with a hurdle negative binomial mixed-effects
model. The results are very difficult to interpret, and they get easier when I
drop 3 outliers. But I do not know if I should do this.
Thanks!
Marta


> Subject: Re: [R] Outliers and overdispersion
> From: szehn...@uni-bonn.de
> Date: Tue, 13 Aug 2013 17:41:10 +0200
> CC: r-help@r-project.org
> To: lomasv...@hotmail.com
>
> I do not know what you are exactly estimating, but if it is about count
> models and the model fit gets better when you drop the outliers, it does not
> say, that the model is now more correct. It just says, if the data were
> without the outliers, this model would fit good.
>
> Overdispersion in count data is sometimes a cue, that you have a mixture
> distribution as the generating process - for example instead of one, K
> different (sub)species of birds which were aggregated in the count data. In
> this case a mixture (negative binomial)- distribution with K components could
> fit the data better.
>
>
> Best
>
> Simon
  
 On Aug 13, 2013, at 5:28 PM, Marta Lomas lomasv...@hotmail.com wrote:
 
  
  
  
  Hi  again,  
  
  I have a question on some outliers that I have in my response variable 
  (wich are bird counts). At the beginning I did not drop them
  out because they are part of the normal counts and I considered them 
  ecologically correct. 
  
  However, I 
  tried some of the same models without ouliers and the AICs are thus better. 
  I
  also have nice significances this way...
  
  
  So would you say that, even though the outliers are right 
  observations and taking into consideration that already the negative 
  binomial 
  distribution that I am using is accounting for the some of the 
  overdispersion due to the outliers, it is
  better to drop them out as the models fit better this way? 
  
  
  Thanks for your patience!
  
  :)
  
  
  
  
  

  [[alternative HTML version deleted]]
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Outliers and overdispersion

2013-08-13 Thread Simon Zehnder

I do not know exactly what you are estimating, but if it is about count models 
and the model fit gets better when you drop the outliers, that does not mean
that the model is now more correct. It only says that, if the data did not
contain the outliers, this model would fit well. 

Overdispersion in count data is sometimes a cue that the generating process is
a mixture distribution: for example, instead of one species, K different
(sub)species of birds were aggregated in the count data. In this case a
(negative binomial) mixture distribution with K components could fit the data
better. 
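
To illustrate the point about mixtures with numbers: a single Poisson count has
variance equal to its mean, while a mixture of Poisson components has variance
larger than its mean. The sketch below computes both moments analytically; the
component means and mixing weights are made-up values for K = 2, not estimates
from any bird data.

```r
# Hypothetical K = 2 Poisson components (e.g. two aggregated subspecies)
lambda <- c(2, 20)    # assumed mean count of each component
w      <- c(0.7, 0.3) # assumed mixing weights, summing to 1

# Mean of the mixture: weighted average of the component means
mix_mean <- sum(w * lambda)

# Variance of the mixture: the Poisson part (equal to the mean) plus the
# spread of the component means around the overall mean
mix_var <- mix_mean + sum(w * (lambda - mix_mean)^2)

# Ratio above 1 indicates overdispersion relative to a single Poisson
dispersion_ratio <- mix_var / mix_mean
```

The further apart the component means are, the larger the ratio becomes, which
is why aggregated counts can look strongly overdispersed.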


Best

Simon
 
On Aug 13, 2013, at 5:28 PM, Marta Lomas lomasv...@hotmail.com wrote:

 
 
 
 Hi again,

 I have a question on some outliers that I have in my response variable (which
 are bird counts). At the beginning I did not drop them because they are part
 of the normal counts and I considered them ecologically correct.

 However, I tried some of the same models without outliers and the AICs are
 better. I also get nice significances this way...

 So would you say that, even though the outliers are valid observations, and
 taking into consideration that the negative binomial distribution I am using
 already accounts for some of the overdispersion due to the outliers, it is
 better to drop them because the models fit better this way?

 Thanks for your patience!

 :)


Re: [R] pulling out pairs from data frame

2013-08-13 Thread Bert Gunter

?duplicated

yourframe[!duplicated(yourframe)$Individual,]

-- Bert

On Tue, Aug 13, 2013 at 8:12 AM, Kripa R kripa...@hotmail.com wrote:
 I manipulated the code you sent and it works perfectly, thanks!

 .kripa

 Date: Tue, 13 Aug 2013 08:10:53 -0700
 From: smartpink...@yahoo.com
 Subject: Re: [R] pulling out pairs from data frame
 To: kripa...@hotmail.com
 CC: r-help@r-project.org



 Hi,
 The conditions are still not clear.


 dat2 <- dat1[dat1$Individual %in% reps, ]
 dat2
 #  SameName Individual Age Gender
 #1        1          4  80      M
 #2        2         15  56      F
 #4        4         15  56      F
 #6        6          4  80      M
 A.K.

 
 From: Kripa R kripa...@hotmail.com
 To: arun smartpink...@yahoo.com
 Cc: R help r-help@r-project.org
 Sent: Tuesday, August 13, 2013 10:56 AM
 Subject: RE: [R] pulling out pairs from data frame




 Oops! Ok So I have this file:

 SampleName Individual Age Gender
 1 4 80 M
 2 15 56 F
 3 1 75 F
 4 15 56 F
 5 2 58 F
 6 4 80 M

 And I want to pull out paired samples, so the resulting file would look 
 something like this:
 SampleName Individual Age Gender
 1 4 80 M
 2 15 56 F
 4 15 56 F
 6 4 80 M

 .kripa


  Date: Mon, 12 Aug 2013 18:36:08 -0700
  From: smartpink...@yahoo.com
  Subject: Re: [R] pulling out pairs from data frame
  To: kripa...@hotmail.com
  CC: r-help@r-project.org
 
  Hi,
  The question is not clear so not sure this is what you wanted.
 
  dat1 <- read.table(text="
  SameName Individual Age Gender
  1 4 80 M
  2 15 56 F
  3 1 75 F
  4 15 56 F
  5 2 58 F
  6 4 80 M
  ", sep="", header=TRUE, stringsAsFactors=FALSE)
  reps <- c(4, 15)

  dat1$Newcol <- as.numeric(dat1$Individual %in% reps)
  dat1
  #  SameName Individual Age Gender Newcol
  #1        1          4  80      M      1
  #2        2         15  56      F      1
  #3        3          1  75      F      0
  #4        4         15  56      F      1
  #5        5          2  58      F      0
  #6        6          4  80      M      1
  A.K.
 
 
 
 
  - Original Message -
  From: Kripa R kripa...@hotmail.com
  To: r-help@r-project.org r-help@r-project.org
  Cc:
  Sent: Monday, August 12, 2013 6:59 PM
  Subject: [R] pulling out pairs from data frame
 
  Hello everyone,
  I'm having trouble pulling out paired samples from a data set... I have 
  the following:
 
  reps <- c(4, 15) # the variable reps is a list of all paired samples
  data

  SameName Individual Age Gender
  1        4          80  M
  2        15         56  F
  3        1          75  F
  4        15         56  F
  5        2          58  F
  6        4          80  M

 
  I'd like to make a new variable with only the samples that have pairs. Any 
  suggestions would be greatly appreciated
 
  Thanks!
 
 
 
 
 
  .kripa




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Outliers and overdispersion

2013-08-13 Thread Bert Gunter

The central question is: What caused the 3 unusual values? What is
their scientific relevance? Only you can answer that, not us.

-- Bert


Re: [R] pulling out pairs from data frame

2013-08-13 Thread Bert Gunter

Sorry. Typo. Corrected version  is:

 yourframe[!duplicated(yourframe$Individual),]

-- Bert


[R] getting rid of .Rhistory and .RData

2013-08-13 Thread Jannis


Dear R users,


occasionally I find .Rhistory and/or .RData files cluttered around in my 
file structure. Is there a way to tell R not to save such files? Or to 
use one central location where to save them (if they are of any use)? I 
have looked through options() to no avail.



Cheers
Jannis


Re: [R] Lattice: bwplot - changing box colors in legend and plot when using panel.groups = function... and panel = panel.superpose

2013-08-13 Thread Richard M. Heiberger

I don't see a question in what you wrote.  Your graph has some similarities
to some of my examples.
Please look at the demo in the HH package

## install.packages("HH")  ## if necessary
library(HH)
demo("bwplot.examples", package = "HH")

Rich


On Tue, Aug 13, 2013 at 10:00 AM, Anna Zakrisson Braeunlich 
anna.zakris...@su.se wrote:

 Hi,

 Yes, I have searched stack overflow.

 My issue is to simply change coloring in boxes and legend in my bwplot. I
 have done this many times in lattice, but now I have been tweaking the plot
 somewhat and I can no longer apply the color changes.
 I would really appreciate some help.
 A. Zakrisson

 Here is some dummy data and my script:

 mydata <- data.frame(factor1 = factor(rep(LETTERS[1:6], each = 80)),
                      factor2 = factor(rep(c(1:2), each = 16)),
                      var1 = rnorm(120, mean = rep(c(0, 3, 5), each = 40),
                                   sd = rep(c(1, 2, 3), each = 20)))

 font.settings <- list(font = 1, cex = 1, fontfamily = "serif")
 my.theme <- list(
   box.umbrella = list(col = "black"),
   box.rectangle = list(fill = rep(c("black", "black"), 2)),
   box.dot = list(col = "black", pch = 3, cex = 2),
   plot.symbol = list(cex = 1, col = 1, pch = 0), # outlier size and color
   par.xlab.text = font.settings,
   par.ylab.text = font.settings,
   axis.text = font.settings,
   par.sub = font.settings)

 bwplot(var1 ~ factor1, data = mydata, groups = factor2,
        box.width = 1/3, # width of the boxes
        auto.key = list(points = FALSE,
                        rectangles = TRUE, space = "right",
                        title = "Year", cex.title = 1),
        panel = panel.superpose,
        ylab = "var1",
        xlab = "factor1",
        par.settings = my.theme,
        panel.groups = function(x, y, ..., group.number) {
          panel.bwplot(x + (group.number - 1.8)/3, y, ...)
        })


 Anna Zakrisson Braeunlich
 PhD student

 Department of Ecology, Environment and Plant Sciences
 Stockholm University
 Svante Arrheniusv. 21A
 SE-106 91 Stockholm
 Sweden/Sverige

 Lives in Berlin.
 For paper mail:
 Katzbachstr. 21
 D-10965, Berlin - Kreuzberg
 Germany/Deutschland

 E-mail: anna.zakris...@su.se
 Tel work: +49-(0)3091541281
 Mobile: +49-(0)15777374888
 LinkedIn: http://se.linkedin.com/pub/anna-zakrisson-braeunlich/33/5a2/51b


[R] How to store and manipulate survey data like this?

2013-08-13 Thread Walter Anderson

I have to process a set of survey data with questions that are formatted 
like this;


1) Pick your top three breeds (pick 3)
 1  Rottweiler
 2  Pit Bull
 3  German Shepard
 4  Poodle
 5  Border Collie
 6  Dalmation
 7  Mixed Breed

and the answers are formatted like this:

Respondent, Question1
1, 1,4,7
2, 2,7,5
3, 6,3,5
4, 
...

Any suggestions on how to preprocess the file to be able to do things 
like frequency analysis for breeds?



Re: [R] Lattice: bwplot - changing box colors in legend and plot when using panel.groups = function... and panel = panel.superpose

2013-08-13 Thread Kevin Wright

I think I understand your question.  You need to make sure that you are
setting the right parameters in your theme.  Use trellis.par.get() to  have
a look at the MANY possible settings.  For example, in your case, to have
the boxplots and rectangles be the same color:

my.theme <- list(
  box.umbrella = list(col = "black"),
  box.rectangle = list(fill = rep(c("black", "black"), 2)),
  box.dot = list(col = "black", pch = 3, cex = 2),
  plot.symbol = list(cex = 1, col = 1, pch = 0), # outlier size and color
  par.xlab.text = font.settings,
  par.ylab.text = font.settings,
  axis.text = font.settings,
  #strip.shingle = list(col = c("red", "blue")),
  superpose.symbol = list(fill = c("red", "blue")), # boxplots
  #superpose.fill = list(col = c("red", "blue")),
  superpose.polygon = list(col = c("red", "blue")), # legend
  par.sub = font.settings)
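
For browsing the available settings, a small sketch along the lines of the
trellis.par.get() suggestion follows. It assumes the lattice package is
installed and uses standard.theme(), which returns the default settings list
without opening a graphics device; the names `defaults`, `my.colors` and
`merged` are made up for this illustration.

```r
library(lattice)

# Default settings for a pdf device; trellis.par.get() reports the same kind
# of list for whichever device is currently active
defaults <- standard.theme("pdf")

# Names of all top-level setting groups that a theme may override
setting_names <- names(defaults)

# Inspect the groups that the theme above touches
str(defaults[c("box.rectangle", "superpose.symbol", "superpose.polygon")])

# Overriding only two groups, using the colours from the reply
my.colors <- list(
  superpose.symbol  = list(fill = c("red", "blue")),
  superpose.polygon = list(col  = c("red", "blue"))
)

# modifyList() shows what the merged settings would look like
merged <- modifyList(defaults, my.colors)
```

A list such as `my.colors` can be passed as par.settings in the bwplot() call,
in the same way as my.theme.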

Kevin Wright




Re: [R] Outliers and overdispersion

2013-08-13 Thread Marta Lomas

Thanks Bert!
I think they are relatively important. What I am doing is comparing the 2003
and 2013 distribution and use of this species in a specific sampled area. The
current numbers are far lower than in 2003; however, in both years the data are
zero-inflated. Most of the outliers are in 2003, when there were considerably
more birds.

On the other hand, the species (ruffs) is very social, so where there are 5
birds there could be 300 in the next 10 minutes. So outliers reflecting this
are maybe not that important to take into account, and I should therefore
focus more on the binomial part of the glmmadmb model that I chose (where just
zeros vs. non-zeros are modeled).

Thanks for your reflections, they are very helpful to me!


Re: [R] Basic problem in R

2013-08-13 Thread Richard M. Heiberger

R-3.0.1 (use all digits when describing an R version) is not the problem.
Most likely you are masking a function or something like that. When you
started the R session, did you get a message about restoring a previous
session? If so, close R, find the directory in which you were working,
delete (or rename) the file .RData, and start R again.
Or perhaps you inadvertently detached the stats package. Type search()
to check on that.
The repair is the same as above.

Rich
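
The checks suggested above can be run in the affected session roughly as
follows. This is a sketch, not a guaranteed fix; the variable names are made up
for the illustration, and it only covers the two causes mentioned (a detached
stats package and a restored workspace).

```r
# Packages and environments currently on the search path
attached <- search()

# sd() and lm() live in the stats package, which is attached by default
stats_attached <- "package:stats" %in% attached

# Where (if anywhere) R currently finds the two functions;
# character(0) means the function is not visible at all
where_sd <- find("sd")
where_lm <- find("lm")

# A saved workspace in the working directory is restored at startup
has_rdata <- file.exists(".RData")

if (!stats_attached) {
  library(stats)  # re-attach for the current session only
}
```

If `has_rdata` is TRUE, renaming that file and restarting R, as described
above, gives a clean session to compare against.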

On Tue, Aug 13, 2013 at 10:08 AM, Sinne Smed s...@ifro.ku.dk wrote:

 I am teaching a summer course this week and next, where the students are
 using R.

 We have downloaded and use the new version, R 3.0. It has worked perfectly
 until today, when some of the basic functions have stopped working.

 Examples are sd() and lm().

 The message we get is Error: could not find function "lm" or Error: could
 not find function "sd".

 Have you ever encountered that? If yes, what can I do about it? Is it a
 basic error in the new R version? I have used R for teaching for 5 years now
 and have never encountered a problem like that.

 Thanks, Sinne



[R] latin1 encoding in WriteXLS

2013-08-13 Thread Hugo Varet

Dear R users,

I've just updated the WriteXLS package (on R 3.0.1) and I now get an error
when exporting a data.frame with the argument Encoding="latin1". For
example, these two lines work:
   library(WriteXLS)
   WriteXLS("iris", "iris.xls")
whereas these do not:
  library(WriteXLS)
  WriteXLS("iris", "irislatin1.xls", Encoding="latin1")
I get this message:
Argument "Sepal.Length" isn't numeric in subroutine entry at
C:/Perl64/lib/Encode.pm line 217, <CSVFILE> line 1.
Modification of a read-only value attempted at C:/Perl64/lib/Encode.pm line
218, <CSVFILE> line 1.
The Perl script 'WriteXLS.pl' failed to run successfully.
Warning message:
running command 'perl
-IC:/Users/varet/Documents/R/win-library/3.0/WriteXLS/Perl
C:/Users/varet/Documents/R/win-library/3.0/WriteXLS/Perl/WriteXLS.pl
--CSVPath C:\Users\varet\AppData\Local\Temp\RtmpEzqFNz/WriteXLS --verbose
FALSE --AdjWidth FALSE --AutoFilter FALSE --BoldHeaderRow FALSE --FreezeRow
0 --FreezeCol 0 --Encoding latin1 C:\Users\varet\Desktop\irislatin1.xls'
had status 255

Does anyone know why it failed? May it be a problem with Perl?

Thanks for your help,

Hugo Varet


Re: [R] pulling out pairs from data frame

2013-08-13 Thread arun

Bert,

dat1 <- structure(list(SameName = 1:6, Individual = c(4L, 15L, 1L, 15L, 
2L, 4L), Age = c(80L, 56L, 75L, 56L, 58L, 80L), Gender = c("M", 
"F", "F", "F", "F", "M")), .Names = c("SameName", "Individual", 
"Age", "Gender"), class = "data.frame", row.names = c(NA, -6L
))
Your solution gives:

dat1[!duplicated(dat1$Individual),]
#  SameName Individual Age Gender
#1        1          4  80      M
#2        2         15  56      F
#3        3          1  75      F
#5        5          2  58      F

The OP asked for:
And I want to pull out paired samples, so the resulting file would look 
something like this:
 SampleName Individual Age Gender
 1 4 80 M
 2 15 56 F
 4 15 56 F
 6 4 80 M

Anyway, the question was not clear as I mentioned in the earlier mail.
Regards,
A.K.
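
If "paired" means that the same Individual occurs more than once, the rows the
OP listed can also be selected without a hand-written `reps` vector by calling
duplicated() in both directions. This is a sketch under that assumption about
what the OP meant; `paired` and `pairs_only` are names made up for the example.

```r
# The example data from the thread
dat1 <- data.frame(
  SampleName = 1:6,
  Individual = c(4L, 15L, 1L, 15L, 2L, 4L),
  Age        = c(80L, 56L, 75L, 56L, 58L, 80L),
  Gender     = c("M", "F", "F", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# duplicated() flags only the second and later occurrences; scanning from the
# end as well flags the first occurrences, so together every member of a
# repeated Individual is marked
paired <- duplicated(dat1$Individual) |
  duplicated(dat1$Individual, fromLast = TRUE)

# Keep only the rows that belong to a pair
pairs_only <- dat1[paired, ]
```

Negating the same logical vector (dat1[!paired, ]) gives the unpaired samples
instead.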




- Original Message -
From: Bert Gunter gunter.ber...@gene.com
To: Kripa R kripa...@hotmail.com
Cc: arun smartpink...@yahoo.com; R help r-help@r-project.org
Sent: Tuesday, August 13, 2013 12:09 PM
Subject: Re: [R] pulling out pairs from data frame

Sorry. Typo. Corrected version  is:

yourframe[!duplicated(yourframe$Individual),]

-- Bert

On Tue, Aug 13, 2013 at 9:05 AM, Bert Gunter bgun...@gene.com wrote:
 ?duplicated

 yourframe[!duplicated(yourframe)$Individual,]

 -- Bert

 On Tue, Aug 13, 2013 at 8:12 AM, Kripa R kripa...@hotmail.com wrote:
 I manipulated the code you sent and it works perfectly, thanks!

 .kripa

 Date: Tue, 13 Aug 2013 08:10:53 -0700
 From: smartpink...@yahoo.com
 Subject: Re: [R] pulling out pairs from data frame
 To: kripa...@hotmail.com
 CC: r-help@r-project.org



 Hi,
 The conditions are still not clear.


 dat2- dat1[dat1$Individual%in% reps,]
 dat2
 #  SameName Individual Age Gender
 #1        1          4  80      M
 #2        2         15  56      F
 #4        4         15  56      F
 #6        6          4  80      M
 A.K.

 
 From: Kripa R kripa...@hotmail.com
 To: arun smartpink...@yahoo.com
 Cc: R help r-help@r-project.org
 Sent: Tuesday, August 13, 2013 10:56 AM
 Subject: RE: [R] pulling out pairs from data frame




 Oops! Ok So I have this file:

 SampleName Individual Age Gender
 1 4 80 M
 2 15 56 F
 3 1 75 F
 4 15 56 F
 5 2 58 F
 6 4 80 M

 And I want to pull out paired samples, so the resulting file would look 
 something like this:
 SampleName Individual Age Gender
 1 4 80 M
 2 15 56 F
 4 15 56 F
 6 4 80 M

 .kripa


  Date: Mon, 12 Aug 2013 18:36:08 -0700
  From: smartpink...@yahoo.com
  Subject: Re: [R] pulling out pairs from data frame
  To: kripa...@hotmail.com
  CC: r-help@r-project.org
 
  Hi,
  The question is not clear so not sure this is what you wanted.
 
  dat1 <- read.table(text="
  SameName Individual Age Gender
  1 4 80 M
  2 15 56 F
  3 1 75 F
  4 15 56 F
  5 2 58 F
  6 4 80 M
  ",sep="",header=TRUE,stringsAsFactors=FALSE)
  reps <- c(4,15)
 
  dat1$Newcol <- as.numeric(dat1$Individual %in% reps)
  dat1
  #  SameName Individual Age Gender Newcol
  #1        1          4  80      M      1
  #2        2         15  56      F      1
  #3        3          1  75      F      0
  #4        4         15  56      F      1
  #5        5          2  58      F      0
  #6        6          4  80      M      1
  A.K.
 
 
 
 
  - Original Message -
  From: Kripa R kripa...@hotmail.com
  To: r-help@r-project.org r-help@r-project.org
  Cc:
  Sent: Monday, August 12, 2013 6:59 PM
  Subject: [R] pulling out pairs from data frame
 
  Hello everyone,
  I'm having trouble pulling out paired samples from a data set... I have 
  the following:
 
  reps <- c(4,15) #the variable reps is a list of all paired samples
  data
 
 
 
 
 
  SameName Individual Age Gender
  1        4          80  M
  2        15         56  F
  3        1          75  F
  4        15         56  F
  5        2          58  F
  6        4          80  M
 
 
 
 
  I'd like to make a new variable with only the samples that have pairs. 
  Any suggestions would be greatly appreciated
 
  Thanks!
 
 
 
 
 
  .kripa




 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact

[R] internal error -3 in R_decompress1

2013-08-13 Thread Jannis


Dear r users,


what could cause such an error:

internal error -3 in R_decompress1


Unfortunately the error kills all my usual error-catching mechanisms and 
appears on a remote cluster, so I cannot really tell you which command 
etc. is causing it.



Thanks for any hints on where to dig for the solution (or even just the 
cause).


Jannis



R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices datasets  utils     methods
[8] base

other attached packages:
 [1] snowfall_1.84-4         snow_0.3-12
 [3] DistributionUtils_0.5-1 RUnit_0.4.26
 [5] RColorBrewer_1.0-5      plotrix_3.5
 [7] doMC_1.3.0              iterators_1.0.6
 [9] multicore_0.1-7         plyr_1.8
[11] raster_2.1-49           sp_1.0-11
[13] abind_1.4-0             foreach_1.4.1
[15] RNetCDF_1.6.1-2         Rssa_0.9.10
[17] forecast_4.06           svd_0.3.2-1

loaded via a namespace (and not attached):
 [1] codetools_0.2-8  colorspace_1.2-2 compiler_3.0.0   fracdiff_1.4-2
 [5] grid_3.0.0       lattice_0.20-15  nnet_7.3-7       quadprog_1.5-5
 [9] tools_3.0.0      tseries_0.10-32  zoo_1.7-10


[R] ave function

2013-08-13 Thread Robert Lynch

I've written the following function
CoursePrep <- function (Source, SaveName) {


  Clean$TERM <- as.factor(Clean$TERM)

  Clean$INST_NUM <- as.factor(Clean$INST_NUM)
  Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
scale))
  write.csv(Clean,paste(SaveName, "csv", sep ="."), row.names = FALSE)
  return(Clean)
}

which is all well and good, but I want to throw a shapiro.test in before I
normalize. I don't really understand how the line (which I got help with)
 Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
does what I wanted. For the whole of Clean, that code finds all sets of
GRADE. values that have the same INST_NUM and TERM, computes their mean,
subtracts off the mean and divides by the standard deviation. For each one
of those sets of grades I would like to call shapiro.test() on the set, to
see if it is normal *before* I assume it is.

I know the naive
with(Clean, shapiro.test( list(TERM, INST_NUM)))
doesn't work. I also tried
with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
function(x)shapiro.test(x)))

which returns
Error in shapiro.test(x) : sample size must be between 3 and 5000
even though I have checked that the sets selected are all of length between
3 and 5000, using the following on my full data

ClassSize <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
function(x)length(x)))
 summary(ClassSize)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   22.0   198.0   241.0   244.4   279.0   466.0

here is some sample data
GRADE TERM INST_NUM
1,  9,   1
2,  9,   1
3,  9,   1
1.5,   8,   2
1.75, 8,   2
2,  8,  2
0.5,   9,   2
2,  9,  2
3.5,   9,  2
3.5,8, 1
3.75,  8, 1
4,   8,  1

and hopefully the code would test the following set of grades
(1,2,3)(1.5,1.75,2)(0.5,2,3.5)(3.5,3.75,4)

Thanks Robert


Re: [R] internal error -3 in R_decompress1

2013-08-13 Thread Prof Brian Ripley


On 13/08/2013 18:47, Jannis wrote:

Dear r users,


what could cause such an error:

internal error -3 in R_decompress1


Unfortunately the error kills all my usual error-catching mechanisms and
appears on a remote cluster, so I cannot really tell you which command
etc. is causing it.


It is a corrupt package installation, so re-install.




Thanks for any hints on where to dig for the solution (or even just the
cause).

Jannis



R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices datasets  utils     methods
[8] base

other attached packages:
 [1] snowfall_1.84-4         snow_0.3-12
 [3] DistributionUtils_0.5-1 RUnit_0.4.26
 [5] RColorBrewer_1.0-5      plotrix_3.5
 [7] doMC_1.3.0              iterators_1.0.6
 [9] multicore_0.1-7         plyr_1.8
[11] raster_2.1-49           sp_1.0-11
[13] abind_1.4-0             foreach_1.4.1
[15] RNetCDF_1.6.1-2         Rssa_0.9.10
[17] forecast_4.06           svd_0.3.2-1

loaded via a namespace (and not attached):
 [1] codetools_0.2-8  colorspace_1.2-2 compiler_3.0.0   fracdiff_1.4-2
 [5] grid_3.0.0       lattice_0.20-15  nnet_7.3-7       quadprog_1.5-5
 [9] tools_3.0.0      tseries_0.10-32  zoo_1.7-10




--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] ave function

2013-08-13 Thread arun

Hi,
You could try:
 lapply(split(Clean,list(Clean$TERM,Clean$INST_NUM)),function(x) 
shapiro.test(x$GRADE))
A.K.
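
A self-contained sketch of the split()/lapply() suggestion above, run on the
sample data from the question. The drop = TRUE argument and the p_values
vector are additions for illustration, not part of the original reply. One
plausible (unconfirmed) reason the ave() attempt failed is that ave() also
calls FUN on TERM/INST_NUM combinations that contain no rows; in any case
ave() expects FUN to return a numeric vector, not a test object.

```r
# Sample data from the original post (column names as given there).
Clean <- data.frame(
  GRADE.   = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5, 3.5, 3.75, 4),
  TERM     = c(9, 9, 9, 8, 8, 8, 9, 9, 9, 8, 8, 8),
  INST_NUM = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1)
)

# One vector of grades per TERM/INST_NUM combination.
# drop = TRUE skips combinations that never occur, so shapiro.test()
# is never handed an empty vector.
groups <- split(Clean$GRADE., list(Clean$TERM, Clean$INST_NUM), drop = TRUE)

# Run the test on each group; each element is an object of class "htest".
tests <- lapply(groups, shapiro.test)

# Collect the p-values into a named vector, one entry per group.
p_values <- sapply(tests, function(t) t$p.value)
```

The names of p_values have the form "TERM.INST_NUM", so individual classes
can be looked up directly, for example p_values["9.1"].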




- Original Message -
From: Robert Lynch robert.b.ly...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Tuesday, August 13, 2013 1:46 PM
Subject: [R] ave function

I've written the following function
CoursePrep <- function (Source, SaveName) {


  Clean$TERM <- as.factor(Clean$TERM)

  Clean$INST_NUM <- as.factor(Clean$INST_NUM)
  Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
scale))
  write.csv(Clean,paste(SaveName, "csv", sep ="."), row.names = FALSE)
  return(Clean)
}

which is all well and good, but I want to throw a shapiro.test in before I
normalize. I don't really understand how the line (which I got help with)
Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
does what I wanted. For the whole of Clean, that code finds all sets of
GRADE. values that have the same INST_NUM and TERM, computes their mean,
subtracts off the mean and divides by the standard deviation. For each one
of those sets of grades I would like to call shapiro.test() on the set, to
see if it is normal *before* I assume it is.

I know the naive
with(Clean, shapiro.test( list(TERM, INST_NUM)))
doesn't work. I also tried
with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
function(x)shapiro.test(x)))

which returns
Error in shapiro.test(x) : sample size must be between 3 and 5000
even though I have checked that the sets selected are all of length between
3 and 5000, using the following on my full data

ClassSize <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN =
function(x)length(x)))
 summary(ClassSize)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   22.0   198.0   241.0   244.4   279.0   466.0

here is some sample data
GRADE     TERM     INST_NUM
1,              9,           1
2,              9,           1
3,              9,           1
1.5,           8,           2
1.75,         8,           2
2,              8,          2
0.5,           9,           2
2,              9,          2
3.5,           9,          2
3.5,            8,         1
3.75,          8,         1
4,               8,          1

and hopefully the code would test the following set of grades
(1,2,3)(1.5,1.75,2)(0.5,2,3.5)(3.5,3.75,4)

Thanks Robert


Re: [R] pulling out pairs from data frame

2013-08-13 Thread Bert Gunter

Yes, you're right.

So I guess you should match on duplicated values, something like (untested)

with(dat1, dat1[Individual %in% Individual[duplicated(Individual)], ])

which is presumably essentially what you gave.

-- Bert
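
A runnable sketch of the duplicated-values idea above, using the six-row
example from this thread. The names dup_ids and paired are made up for
illustration; the point is only that matching on the duplicated values keeps
every row of a repeated Individual, whereas !duplicated() keeps one row per
Individual.

```r
# Example data as posted earlier in the thread.
dat1 <- data.frame(
  SameName   = 1:6,
  Individual = c(4L, 15L, 1L, 15L, 2L, 4L),
  Age        = c(80L, 56L, 75L, 56L, 58L, 80L),
  Gender     = c("M", "F", "F", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Individuals that occur more than once.
dup_ids <- unique(dat1$Individual[duplicated(dat1$Individual)])

# Keep all rows belonging to those individuals (first and later occurrences).
paired <- dat1[dat1$Individual %in% dup_ids, ]
```

This avoids having to maintain the reps vector by hand, assuming "paired"
simply means "appears at least twice".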

On Tue, Aug 13, 2013 at 10:41 AM, arun smartpink...@yahoo.com wrote:
 Bert,

 dat1 <- structure(list(SameName = 1:6, Individual = c(4L, 15L, 1L, 15L,
 2L, 4L), Age = c(80L, 56L, 75L, 56L, 58L, 80L), Gender = c("M",
 "F", "F", "F", "F", "M")), .Names = c("SameName", "Individual",
 "Age", "Gender"), class = "data.frame", row.names = c(NA, -6L
 ))
 Your solution gives:

  dat1[!duplicated(dat1$Individual),]
 #  SameName Individual Age Gender
 #1        1          4  80      M
 #2        2         15  56      F
 #3        3          1  75      F
 #5        5          2  58      F

 The OP asked for:
 And I want to pull out paired samples, so the resulting file would look 
 something like this:
 SampleName Individual Age Gender
 1 4 80 M
 2 15 56 F
 4 15 56 F
 6 4 80 M

 Anyway, the question was not clear as I mentioned in the earlier mail.
 Regards,
 A.K.




 - Original Message -
 From: Bert Gunter gunter.ber...@gene.com
 To: Kripa R kripa...@hotmail.com
 Cc: arun smartpink...@yahoo.com; R help r-help@r-project.org
 Sent: Tuesday, August 13, 2013 12:09 PM
 Subject: Re: [R] pulling out pairs from data frame

 Sorry. Typo. Corrected version  is:

 yourframe[!duplicated(yourframe$Individual),]

 -- Bert

 On Tue, Aug 13, 2013 at 9:05 AM, Bert Gunter bgun...@gene.com wrote:
 ?duplicated

 yourframe[!duplicated(yourframe)$Individual,]

 -- Bert

 On Tue, Aug 13, 2013 at 8:12 AM, Kripa R kripa...@hotmail.com wrote:
 I manipulated the code you sent and it works perfectly, thanks!

 .kripa

 Date: Tue, 13 Aug 2013 08:10:53 -0700
 From: smartpink...@yahoo.com
 Subject: Re: [R] pulling out pairs from data frame
 To: kripa...@hotmail.com
 CC: r-help@r-project.org



 Hi,
 The conditions are still not clear.


 dat2 <- dat1[dat1$Individual %in% reps,]
 dat2
 #  SameName Individual Age Gender
 #1        1          4  80      M
 #2        2         15  56      F
 #4        4         15  56      F
 #6        6          4  80      M
 A.K.

 
 From: Kripa R kripa...@hotmail.com
 To: arun smartpink...@yahoo.com
 Cc: R help r-help@r-project.org
 Sent: Tuesday, August 13, 2013 10:56 AM
 Subject: RE: [R] pulling out pairs from data frame




 Oops! Ok So I have this file:

 SampleName Individual Age Gender
 1 4 80 M
 2 15 56 F
 3 1 75 F
 4 15 56 F
 5 2 58 F
 6 4 80 M

 And I want to pull out paired samples, so the resulting file would look 
 something like this:
 SampleName Individual Age Gender
 1 4 80 M
 2 15 56 F
 4 15 56 F
 6 4 80 M

 .kripa


  Date: Mon, 12 Aug 2013 18:36:08 -0700
  From: smartpink...@yahoo.com
  Subject: Re: [R] pulling out pairs from data frame
  To: kripa...@hotmail.com
  CC: r-help@r-project.org
 
  Hi,
  The question is not clear so not sure this is what you wanted.
 
  dat1 <- read.table(text="
  SameName Individual Age Gender
  1 4 80 M
  2 15 56 F
  3 1 75 F
  4 15 56 F
  5 2 58 F
  6 4 80 M
  ",sep="",header=TRUE,stringsAsFactors=FALSE)
  reps <- c(4,15)
 
  dat1$Newcol <- as.numeric(dat1$Individual %in% reps)
  dat1
  #  SameName Individual Age Gender Newcol
  #1        1          4  80      M      1
  #2        2         15  56      F      1
  #3        3          1  75      F      0
  #4        4         15  56      F      1
  #5        5          2  58      F      0
  #6        6          4  80      M      1
  A.K.
 
 
 
 
  - Original Message -
  From: Kripa R kripa...@hotmail.com
  To: r-help@r-project.org r-help@r-project.org
  Cc:
  Sent: Monday, August 12, 2013 6:59 PM
  Subject: [R] pulling out pairs from data frame
 
  Hello everyone,
  I'm having trouble pulling out paired samples from a data set... I have 
  the following:
 
  reps <- c(4,15) #the variable reps is a list of all paired samples
  data
 
 
 
 
 
  SameName Individual Age Gender
  1        4          80  M
  2        15         56  F
  3        1          75  F
  4        15         56  F
  5        2          58  F
  6        4          80  M
 
 
 
 
  I'd like to make a new variable with only the samples that have pairs. 
  Any suggestions would be greatly appreciated
 
  Thanks!
 
 
 
 
 
  .kripa

Re: [R] Create rows for columns in dataframe

2013-08-13 Thread Dark

Hi,

My desired output for my sample!! using dput():
structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 
42, 43, 44, 45, 46, 47, 48), DSYSRTKY = c(10005, 
10005, 10005, 10005, 10203, 10203, 
10203, 10203, 10315, 10315, 10315, 
10315, 10315, 10315, 10315, 10315, 
10315, 10315, 10315, 10315, 10315, 
10315, 10315, 10315, 10315, 10315, 
10315, 10315, 10327, 10327, 10327, 
10327, 10327, 10327, 10327, 10327, 
10327, 10327, 10327, 10327, 10327, 
10327, 10327, 10327, 10327, 10327, 
10327, 10327), CODE = c(71535, 78900, V1251, 
V454, 45341, 4019, 72400, V1011, 42831, 5990, 8052, 
4241, 4019, 311, 2724, 71680, 4168, 7804, V066, 
6930, 41400, V4581, 40291, 4280, 5990, V4986, 5939, 
3109, 41401, 6826, 7850, 4019, 2720, 49390, 2859, 
79029, V1582, 486, 51881, 5119, 42789, 7823, 41400, 
V4581, 40390, 5859, 49390, 2724), PRIMAIRY = c(TRUE, 
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE)), .Names = c(ID, 
DSYSRTKY, CODE, PRIMAIRY), row.names = c(NA, 48L), class =
data.frame)

So the 'DSYSRTKY' (10005) has 4 code fields filled so you get 4 rows.
The next one also 4, the third one 16. Anyway, just take a look at the
sample.

I think this will help trying to make clear what my desired result is!

Regards Derk





--
View this message in context: 
http://r.789695.n4.nabble.com/Create-rows-for-columns-in-dataframe-tp4673607p4673646.html
Sent from the R help mailing list archive at Nabble.com.


[R] Problem with zero-inflated negative binomial model in sediment river dynamics

2013-08-13 Thread Lauria, Valentina

Dear All,

I am running a negative binomial model in R using the package pscl in order to 
estimate bed sediment movements versus river discharge. We have deployed 
4 different plates to test whether a combination of more than one plate 
describes the sediment movements better when the river discharge changes 
over time.

My data are positively skewed and zero-inflated. I ran both zero-inflated 
Poisson and zero-inflated negative binomial regressions and compared them using 
the Vuong test, which showed that the negative binomial works better than a 
simple zero-inflated Poisson.

My models look like:


1) plate1 ~ river discharge
2) (plate 1 + plate 2) ~ river discharge
3) (plate 1 + plate 2 +plate 3) ~ river discharge
4) (plate 1 + plate 2 + plate 3 + plate 4) ~ river discharge


My main problem, as I am new to this type of model, is that the coefficient 
of discharge has a different sign in the count part and in the zero-inflation 
part of the zero-inflated negative binomial output (please see below). What 
does this mean? Also, how could I compare the different models (1-4), i.e. 
what tells me which is performing best? Thank you very much in advance for 
any comments and suggestions!

Kind Regards,
Valentina


Call:
zeroinfl(formula = plate1 ~ discharge, data = datafit_plates, dist = "negbin", 
    EM = TRUE)

Pearson residuals:
    Min      1Q  Median      3Q     Max
-0.6770 -0.3564 -0.2101 -0.0814 12.3421

Count model coefficients (negbin with log link):
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.557066   0.036593   69.88   <2e-16 ***
discharge    0.064698   0.001983   32.63   <2e-16 ***
Log(theta)  -0.775736   0.012451  -62.30   <2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 13.01011    0.22602   57.56   <2e-16 ***
discharge   -1.64293    0.03092  -53.14   <2e-16 ***

Theta = 0.4604
Number of iterations in BFGS optimization: 1
Log-likelihood: -6.933e+04 on 5 Df
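
One way to read the two discharge coefficients in the output above is to
plug them into the two link functions by hand. This is only a sketch: the
discharge values 5, 8 and 10 are hypothetical, chosen for illustration, and
the coefficients are copied from the printed summary. The zero-inflation
part models the probability of an excess zero on the logit scale, so a
negative coefficient there means fewer excess zeros as discharge rises,
which points in the same direction as the positive coefficient in the
count part.

```r
# Coefficients copied from the summary output above.
count_coef <- c(intercept = 2.557066, discharge = 0.064698)
zero_coef  <- c(intercept = 13.01011, discharge = -1.64293)

# Hypothetical discharge values, for illustration only.
discharge <- c(5, 8, 10)

# Zero-inflation part: probability of an excess ("structural") zero,
# inverse logit of the linear predictor.
p_excess_zero <- plogis(zero_coef[["intercept"]] +
                        zero_coef[["discharge"]] * discharge)

# Count part: mean of the negative binomial component, inverse log link.
mean_count <- exp(count_coef[["intercept"]] +
                  count_coef[["discharge"]] * discharge)

# Overall expected response combines both parts.
expected <- (1 - p_excess_zero) * mean_count
```

With these numbers the probability of an excess zero falls and the expected
count rises as discharge increases, so the opposite signs are not
contradictory.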







[R] Regular repeats

2013-08-13 Thread jsf1982

Hi,
Many apologies for the simplicity (hopefully!) of this request - I can't
find it on the forum, but it may have been asked in the past.

I have a data frame consisting of ~2000 rows. I simply want to take the
average of the first 6, then the next 6, then the next 6 until the end of
the table. 
The command 

mean(mole[1:6,c("PercentPI")])

gets me the first 6 rows (column is PercentPI), but I don't know how to
increase the rows incrementally.

Thanks in advance.
J





--
View this message in context: 
http://r.789695.n4.nabble.com/Regular-repeats-tp4673653.html
Sent from the R help mailing list archive at Nabble.com.


[R] DIALLEL ANALYSIS

2013-08-13 Thread Waqas Shafqat

Sir, I have installed the plant breeding library. But when I import the
file in R and give the command

data(fulldial)
Warning message:
In data(fulldial) : data set 'fulldial' not found

the above warning message appears. Please guide me. My data are as follows:

 MALE FEMALE YIELD
 1 1 53.333
 1 2 52.333
 1 3 54.333
 1 4 56.333
 1 5 52.667
 2 1 52.667
 2 2 51.333
 2 3 55.333
 2 4 52.333
 2 5 54
 3 1 53.667
 3 2 51.667
 3 3 52.667
 3 4 55.333
 3 5 54.667
 4 1 57.333
 4 2 53.333
 4 3 56
 4 4 54.667
 4 5 51.667
 5 1 56.667
 5 2 54.333
 5 3 51.333
 5 4 54
 5 5 55.333

Is there any mistake in the data entry? Please tell me.
Please send me code that solves the problem. I shall be thankful to you.


Re: [R] getting rid of .Rhistory and .RData

2013-08-13 Thread MacQueen, Don

The following should help:

What does R ask you each time you quit R?  Answer no.

Start R with
  R --no-save

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 8/13/13 9:15 AM, Jannis bt_jan...@yahoo.de wrote:

Dear R users,


occasionally I find .Rhistory and/or .RData files cluttered around in my
file structure. Is there a way to tell R not to save such files? Or to
use one central location where to save them (if they are of any use)? I
have looked through options() to no avail.


Cheers
Jannis


Re: [R] How to store and manipulate survey data like this?

2013-08-13 Thread Siraaj Khandkar


On 08/13/2013 12:17 PM, Walter Anderson wrote:

I have to process a set of survey data with questions that are formatted
like this;

1) Pick your top three breeds (pick 3)
  1  Rottweiler
  2  Pit Bull
  3  German Shepard
  4  Poodle
  5  Border Collie
  6  Dalmation
  7  Mixed Breed

and the answers are formatted like this:

Respondent, Question1
1, 1,4,7
2, 2,7,5
3, 6,3,5
4, 
...

Any suggestions on how to preprocess the file to be able to do things
like frequency analysis for breeds?



Here's how I would get started:


 survey <- read.csv("survey.csv", as.is=TRUE)
 survey
  Respondent Question1
1          1     1,4,7
2          2     2,7,5
3          3     6,3,5
4          4

 TripleOrNAs <- function(x) {if (length(x) == 3) x else c(NA, NA, NA)}
 options <- lapply(strsplit(survey$Question1, ","), TripleOrNAs)
 options <- matrix(unlist(options), ncol=3, byrow=TRUE)
 survey2 <- cbind(survey, options)
 names(survey2) <- c(names(survey), paste("Q1.Opt", 1:3, sep="."))
 survey2
  Respondent Question1 Q1.Opt.1 Q1.Opt.2 Q1.Opt.3
1          1     1,4,7        1        4        7
2          2     2,7,5        2        7        5
3          3     6,3,5        6        3        5
4          4                 NA       NA       NA
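
For the frequency analysis asked about in the question, a possible next step
is to pool all picks and tabulate them. This is a sketch only: the survey
data frame is rebuilt inline from the four example rows instead of being
read from survey.csv, and the names breeds, picks and freq are made up for
illustration. The breed labels are spelled as in the questionnaire.

```r
# Example answers from the question; respondent 4 gave no answer.
survey <- data.frame(
  Respondent = 1:4,
  Question1  = c("1,4,7", "2,7,5", "6,3,5", ""),
  stringsAsFactors = FALSE
)

# Answer options in questionnaire order (spelling as in the original).
breeds <- c("Rottweiler", "Pit Bull", "German Shepard", "Poodle",
            "Border Collie", "Dalmation", "Mixed Breed")

# Split every answer on commas and pool the codes; an empty answer
# contributes nothing.
picks <- as.integer(unlist(strsplit(survey$Question1, ",")))

# Tabulate, keeping breeds that nobody picked as zero counts.
freq <- table(factor(picks, levels = seq_along(breeds), labels = breeds))
```

Using a factor with all seven levels means unpicked breeds still show up in
the table, which is usually what a frequency report needs.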


Re: [R] Create rows for columns in dataframe

2013-08-13 Thread arun

According to your first post,


NewDataFrame <- data.frame(ID=integer(), DSYSRTKY=integer(), CODE=character(),  
PRIMAIRY=logical())


The new output dataset: Out1
str(Out1)
'data.frame':    48 obs. of  4 variables:
 $ ID  : chr  1 2 3 4 ...
 $ DSYSRTKY: chr  10005 10005 10005 10005 ...
 $ CODE    : chr  71535 78900 V1251 V454 ...
 $ PRIMAIRY: chr  TRUE FALSE FALSE FALSE ...


I guess you wanted DSYSRTKY to be numeric and PRIMAIRY to be logical
res1 <- do.call(rbind, lapply(seq_len(nrow(dat1)), function(i) {
  x1 <- as.character(unlist(dat1[i, -1]))
  CODE <- x1[x1 != ""]
  PRIMAIRY <- x1[x1 != ""] == head(x1, 1)
  DSYSRTKY <- as.numeric(as.character(dat1[i, 1]))
  data.frame(DSYSRTKY, CODE, PRIMAIRY, stringsAsFactors = FALSE)
}))
res1$ID <- row.names(res1)
res2 <- res1[, c(4, 1:3)]

str(res2)
#'data.frame':    48 obs. of  4 variables:
# $ ID  : chr  1 2 3 4 ...
# $ DSYSRTKY: num  1e+08 1e+08 1e+08 1e+08 1e+08 ...
# $ CODE    : chr  71535 78900 V1251 V454 ...
# $ PRIMAIRY: logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
 head(res2)
#  ID  DSYSRTKY  CODE PRIMAIRY
#1  1 10005 71535 TRUE
#2  2 10005 78900    FALSE
#3  3 10005 V1251    FALSE
#4  4 10005  V454    FALSE
#5  5 10203 45341 TRUE
#6  6 10203  4019    FALSE
 head(Out1)
#  ID  DSYSRTKY  CODE PRIMAIRY
#1  1 10005 71535 TRUE
#2  2 10005 78900    FALSE
#3  3 10005 V1251    FALSE
#4  4 10005  V454    FALSE
#5  5 10203 45341 TRUE
#6  6 10203  4019    FALSE
A.K.







- Original Message -
From: Dark i...@software-solutions.nl
To: r-help@r-project.org
Cc: 
Sent: Tuesday, August 13, 2013 12:16 PM
Subject: Re: [R] Create rows for columns in dataframe

Hi,

My desired output for my sample!! using dput():
structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 
42, 43, 44, 45, 46, 47, 48), DSYSRTKY = c(10005, 
10005, 10005, 10005, 10203, 10203, 
10203, 10203, 10315, 10315, 10315, 
10315, 10315, 10315, 10315, 10315, 
10315, 10315, 10315, 10315, 10315, 
10315, 10315, 10315, 10315, 10315, 
10315, 10315, 10327, 10327, 10327, 
10327, 10327, 10327, 10327, 10327, 
10327, 10327, 10327, 10327, 10327, 
10327, 10327, 10327, 10327, 10327, 
10327, 10327), CODE = c(71535, 78900, V1251, 
V454, 45341, 4019, 72400, V1011, 42831, 5990, 8052, 
4241, 4019, 311, 2724, 71680, 4168, 7804, V066, 
6930, 41400, V4581, 40291, 4280, 5990, V4986, 5939, 
3109, 41401, 6826, 7850, 4019, 2720, 49390, 2859, 
79029, V1582, 486, 51881, 5119, 42789, 7823, 41400, 
V4581, 40390, 5859, 49390, 2724), PRIMAIRY = c(TRUE, 
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, 
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE)), .Names = c(ID, 
DSYSRTKY, CODE, PRIMAIRY), row.names = c(NA, 48L), class =
data.frame)

So the 'DSYSRTKY' (10005) has 4 code fields filled so you get 4 rows.
The next one also 4, the third one 16. Anyway, just take a look at the
sample.

I think this will help trying to make clear what my desired result is!

Regards Derk





--
View this message in context: 
http://r.789695.n4.nabble.com/Create-rows-for-columns-in-dataframe-tp4673607p4673646.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Regular repeats

2013-08-13 Thread Doran, Harold

What about something like this:

tmp <- data.frame(var1 = rnorm(36), ind = gl(6,6))

with(tmp, tapply(var1, ind, mean))

You can see that your version of

mean(tmp[1:6,c("var1")])

gives the same as mine for the first 6 rows.
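
When the number of rows is not known in advance (or is not a multiple of 6),
the grouping index can be computed from the row numbers instead of with
gl(). A minimal sketch, assuming a data frame mole with a numeric column
PercentPI as in the question; the 20 random rows and the names block and
block_means are made up for illustration:

```r
set.seed(1)

# Stand-in for the real data: 20 rows, so the last block has only 2 rows.
mole <- data.frame(PercentPI = runif(20))

# Rows 1-6 get index 0, rows 7-12 index 1, and so on.
block <- (seq_len(nrow(mole)) - 1) %/% 6

# Mean of PercentPI within each block of (up to) six consecutive rows.
block_means <- tapply(mole$PercentPI, block, mean)
```

The last block is simply averaged over however many rows remain.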


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of jsf1982
Sent: Tuesday, August 13, 2013 12:46 PM
To: r-help@r-project.org
Subject: [R] Regular repeats

Hi,
Many apologies for the simplicity (hopefully!) of this request - I can't find 
it on the forum, but it may have been asked in the past.

I have a data frame consisting of ~2000 rows. I simply want to take the average 
of the first 6, then the next 6, then the next 6 until the end of the table. 
The command 

mean(mole[1:6,c("PercentPI")])

gets me the first 6 rows (column is PercentPI), but I don't know how to 
increase the rows incrementally.

Thanks in advance.
J





--
View this message in context: 
http://r.789695.n4.nabble.com/Regular-repeats-tp4673653.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Memory limit on Linux?

2013-08-13 Thread Stackpole, Chris

 From: Kevin E. Thorpe [mailto:kevin.tho...@utoronto.ca] 
 Sent: Monday, August 12, 2013 11:00 AM
 Subject: Re: [R] Memory limit on Linux?

 What does ulimit -a report on both of these machines?

Greetings,
Sorry for the delay. Other fires demanded more attention...

For the system in which memory seems to allocate as needed:
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 386251
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 386251
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

For the system in which memory seems to hang around 5-7GB:
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2066497
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I can also confirm the same behavior on a Scientific Linux system though the 
difference besides CentOS/RHEL is that the Scientific is at an earlier 
version of 6 (6.2 to be exact). The Scientific system has the same ulimit 
configuration as the problem box.

I could be mistaken, but here are the differences I see in the ulimits:
pending signals: shouldn't matter
max locked memory: The Scientific/CentOS system is higher so I don't think this 
is it.
stack size: Again, higher on Scientific/CentOS.
max user processes: Seems high to me, but I don't see how this is capping a 
memory limit.

Am I missing something? Any help is greatly appreciated. 
Thank you!

Chris Stackpole


[R] Convert list with missing values to dataFrame

2013-08-13 Thread Steven Ranney

I have a dataFrame

sID <- c("a", "1,2,3", "b", "4,5,6")
rID <- c("shr1125", "bwr331", "bwr330", "vjhr1022")

tmp <- data.frame(cbind(sID,rID))

but I need to split tmp$sID into three different columns, filling locations
where tmp$sID has only one value with NA.

I can split tmp$sID by the comma

tmp.1 <- strsplit(tmp$sID, ",")

but I can't figure out how to convert the resulting list into a dataFrame.

Ideally, tmp will become four columns wide, something like

sID.a  sID.b  sID.c  rID
NA     NA     a      shr1125
1      2      3      bwr331
NA     NA     b      bwr330
4      5      6      vjhr1022

Thoughts or suggestions?

I tried

havecomma <- grep(',', tmp$sID)

for( i in 1:nrow(tmp)){
  if (!(tmp[i,] %in% havecomma)){
    tmp$sID[i] <- paste(', ,', tmp$sID[i], sep="")
  }
}

and thought that I might be able to force the list into a dataframe once
each component had three items, but it just seemed to apply the paste()
function to everything which gave me a list with varying numbers of items.

I'm stuck.

Thanks for your help -

SR






Steven H. Ranney


Re: [R] Memory limit on Linux?

2013-08-13 Thread Kevin E. Thorpe

It appears that at the shell level, the differences are not to blame. 
It has been a long time, but years ago in HP-UX, we needed to change an 
actual kernel parameter (this was for S-Plus 5 rather than R back then). 
 Despite the ulimits being acceptable, there was a hard limit in the 
kernel.  I don't know whether such things have been (or can be) built in 
to your problem machine.  If it is a multiuser box, it could be that 
limits have been set to prevent a user from gobbling up all the memory.

The other thing to check is whether R has been, or can be, compiled with memory limits.

Sorry I can't be of more help.

--
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016


Re: [R] Problem with zero-inflated negative binomial model in sediment river dynamics

2013-08-13 Thread Cade, Brian

Lauria:  For historical reasons, the logistic regression (binomial with
logit link) portion of a zero-inflated count model is usually structured
to predict the probability of the 0 counts rather than the nonzero (>=1)
counts, so its coefficients will have the opposite sign to what you
expect based on the count model portion (as in your output).  To
interpret the logistic regression portion as the probability of the
nonzero counts, just take the negative of the coefficient estimates
provided for the probability of the zero counts.
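
As a rough numerical check of the sign flip described above, the sketch below plugs the zero-inflation coefficients from the posted output (13.01011 and -1.64293) into the logistic function. The discharge values are made up for illustration only. Note also that this portion of the model describes excess zeros, so the complementary probability is the chance that an observation comes from the count process, which can itself still produce zeros.

```r
# Zero-inflation coefficients copied from the posted zeroinfl() output
b0 <- 13.01011   # (Intercept)
b1 <- -1.64293   # discharge

# Hypothetical discharge values, chosen only for illustration
discharge <- c(5, 8, 10)

eta <- b0 + b1 * discharge      # linear predictor on the logit scale
p_excess_zero <- plogis(eta)    # probability modelled by the zero-inflation part
p_count_part  <- plogis(-eta)   # same model with every coefficient negated

# Negating the coefficients gives the complementary probability
round(cbind(discharge, p_excess_zero, p_count_part), 4)
```

With these coefficients the probability of an excess zero falls as discharge rises, which agrees in direction with the positive discharge coefficient in the count part.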

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  ca...@usgs.gov brian_c...@usgs.gov
tel:  970 226-9326



On Tue, Aug 13, 2013 at 9:06 AM, Lauria, Valentina 
valentina.lau...@nuigalway.ie wrote:

 Dear All,

 I am running a negative binomial model in R using the package pscl in order
 to estimate bed sediment movements versus river discharge. Currently we
 have deployed 4 different plates to test if a combination of more than one
 plate would better describe the sediment movements when the river discharge
 changes over time.

 My data are positively skewed and zero-inflated. I did run both
 zero-inflated Poisson and zero-inflated negative binomial regression and
 compared them using the VUONG test which showed that the negative binomial
 works better than a simple zero-inflated Poisson.

 My models look like:


 1) plate1 ~ river discharge
 2) (plate 1 + plate 2) ~ river discharge
 3) (plate 1 + plate 2 +plate 3) ~ river discharge
 4) (plate 1 + plate 2 + plate 3 + plate 4) ~ river discharge


 My main problem, as I am new to this type of model, is that I get a
 different sign for the coefficient of discharge in the output of the
 zero-inflated negative binomial model (please see below). What does this
 mean? Also how could I compare the different models (1-4) i.e. what tells
 me which is performing best? Thank you very much in advance for any
 comments and suggestions!!

 Kind Regards,
 Valentina


 Call:
 zeroinfl(formula = plate1 ~ discharge, data = datafit_plates, dist = "negbin", EM = TRUE)

 Pearson residuals:
     Min      1Q  Median      3Q     Max
 -0.6770 -0.3564 -0.2101 -0.0814 12.3421

 Count model coefficients (negbin with log link):
              Estimate Std. Error z value Pr(>|z|)
 (Intercept)  2.557066   0.036593   69.88   <2e-16 ***
 discharge    0.064698   0.001983   32.63   <2e-16 ***
 Log(theta)  -0.775736   0.012451  -62.30   <2e-16 ***

 Zero-inflation model coefficients (binomial with logit link):
             Estimate Std. Error z value Pr(>|z|)
 (Intercept) 13.01011    0.22602   57.56   <2e-16 ***
 discharge   -1.64293    0.03092  -53.14   <2e-16 ***

 Theta = 0.4604
 Number of iterations in BFGS optimization: 1
 Log-likelihood: -6.933e+04 on 5 Df







Re: [R] Create rows for columns in dataframe

2013-08-13 Thread arun

You could also try:
## Out1 is the output dataset
Out1$PRIMAIRY <- as.logical(Out1$PRIMAIRY) # changing the class
## dat1 is the input dataset

vec1 <- paste(dat1[,1], dat1[,2], colnames(dat1)[2], sep=".")
res2 <- reshape(dat1, idvar="newCol", varying=list(2:26), direction="long")
res3 <- res2[order(res2[,4]),]
res4 <- res3[res3[,3]!="", -4]
vec2 <- paste(res4[,1], res4[,3], paste0("C", res4[,2]), sep=".")
res4$PRIMAIRY <- vec2 %in% vec1
row.names(res4) <- 1:nrow(res4)
res4$ID <- row.names(res4)
res4[,c(1,3)] <- lapply(res4[,c(1,3)], as.character)
res5 <- res4[,c(5,1,3,4)]
colnames(res5)[3] <- "CODE"
identical(res5,Out1)
#[1] TRUE
A.K.



- Original Message -
From: arun smartpink...@yahoo.com
To: R help r-help@r-project.org
Cc: 
Sent: Tuesday, August 13, 2013 2:45 PM
Subject: Re: [R] Create rows for columns in dataframe

According to your first post,


NewDataFrame <- data.frame(ID=integer(), DSYSRTKY=integer(), CODE=character(),
                           PRIMAIRY=logical())


The new output dataset: Out1
str(Out1)
'data.frame':    48 obs. of  4 variables:
 $ ID      : chr  "1" "2" "3" "4" ...
 $ DSYSRTKY: chr  "10005" "10005" "10005" "10005" ...
 $ CODE    : chr  "71535" "78900" "V1251" "V454" ...
 $ PRIMAIRY: chr  "TRUE" "FALSE" "FALSE" "FALSE" ...


I guess you wanted DSYSRTKY to be numeric and PRIMAIRY to be logical
res1 <- do.call(rbind, lapply(seq_len(nrow(dat1)), function(i) {
  x1 <- as.character(unlist(dat1[i,-1]))
  CODE <- x1[x1!=""]
  PRIMAIRY <- x1[x1!=""]==head(x1,1)
  DSYSRTKY <- as.numeric(as.character(dat1[i,1]))
  data.frame(DSYSRTKY, CODE, PRIMAIRY, stringsAsFactors=FALSE)
}))
res1$ID <- row.names(res1)
res2 <- res1[,c(4,1:3)]

str(res2)
#'data.frame':    48 obs. of  4 variables:
# $ ID      : chr  "1" "2" "3" "4" ...
# $ DSYSRTKY: num  1e+08 1e+08 1e+08 1e+08 1e+08 ...
# $ CODE    : chr  "71535" "78900" "V1251" "V454" ...
# $ PRIMAIRY: logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
 head(res2)
#  ID  DSYSRTKY  CODE PRIMAIRY
#1  1 10005 71535 TRUE
#2  2 10005 78900    FALSE
#3  3 10005 V1251    FALSE
#4  4 10005  V454    FALSE
#5  5 10203 45341 TRUE
#6  6 10203  4019    FALSE
 head(Out1)
#  ID  DSYSRTKY  CODE PRIMAIRY
#1  1 10005 71535 TRUE
#2  2 10005 78900    FALSE
#3  3 10005 V1251    FALSE
#4  4 10005  V454    FALSE
#5  5 10203 45341 TRUE
#6  6 10203  4019    FALSE
A.K.







- Original Message -
From: Dark i...@software-solutions.nl
To: r-help@r-project.org
Cc: 
Sent: Tuesday, August 13, 2013 12:16 PM
Subject: Re: [R] Create rows for columns in dataframe

Hi,

My desired output for my sample!! using dput():
structure(list(ID = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", 
"42", "43", "44", "45", "46", "47", "48"), DSYSRTKY = c("10005", 
"10005", "10005", "10005", "10203", "10203", 
"10203", "10203", "10315", "10315", "10315", 
"10315", "10315", "10315", "10315", "10315", 
"10315", "10315", "10315", "10315", "10315", 
"10315", "10315", "10315", "10315", "10315", 
"10315", "10315", "10327", "10327", "10327", 
"10327", "10327", "10327", "10327", "10327", 
"10327", "10327", "10327", "10327", "10327", 
"10327", "10327", "10327", "10327", "10327", 
"10327", "10327"), CODE = c("71535", "78900", "V1251", 
"V454", "45341", "4019", "72400", "V1011", "42831", "5990", "8052", 
"4241", "4019", "311", "2724", "71680", "4168", "7804", "V066", 
"6930", "41400", "V4581", "40291", "4280", "5990", "V4986", "5939", 
"3109", "41401", "6826", "7850", "4019", "2720", "49390", "2859", 
"79029", "V1582", "486", "51881", "5119", "42789", "7823", "41400", 
"V4581", "40390", "5859", "49390", "2724"), PRIMAIRY = c("TRUE", 
"FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", 
"TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "FALSE", 
"FALSE", "FALSE", "FALSE", "FALSE", "FALSE")), .Names = c("ID", 
"DSYSRTKY", "CODE", "PRIMAIRY"), row.names = c(NA, 48L), class =
"data.frame")

So the 'DSYSRTKY' (10005) has 4 code fields filled so you get 4 rows.
The next one also 4, the third one 16. Anyway, just take a look at the
sample.

I think this will help trying to make clear what my desired result is!

Regards Derk






Re: [R] Convert list with missing values to dataFrame

2013-08-13 Thread MacQueen, Don

Try,

sID <- c("a", "1,2,3", "b", "4,5,6")

tmp1 <- strsplit(sID, ',')

tmp2 <- lapply(tmp1,
   function(x) if (length(x)==1) c('','',x) else x )

tmp3 <- matrix(unlist(tmp2), ncol=3, byrow=TRUE)


rID <- c("shr1125", "bwr331", "bwr330", "vjhr1022")

newdf <- data.frame(cbind(tmp3,rID))

You'll need to name the first three columns.

As an aside, note that you don't need the cbind in your
   data.frame(cbind(sID,rID))
because
   data.frame(sID,rID)
does just as well.
But cbind is needed in my example, because tmp3 is a matrix.

-Don
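
The approach above pads short entries with empty strings and leaves the first three columns unnamed. A minimal sketch of a variant that pads with NA (as the original question asked) and names the columns is given below. It assumes every entry has at most three comma-separated items; the names parts, padded and m are only illustrative.

```r
sID <- c("a", "1,2,3", "b", "4,5,6")
rID <- c("shr1125", "bwr331", "bwr330", "vjhr1022")

# Split each entry on the comma
parts <- strsplit(sID, ",", fixed = TRUE)

# Left-pad every piece with NA so that each has exactly three items
padded <- lapply(parts, function(x) c(rep(NA_character_, 3 - length(x)), x))

# Stack the pieces row by row and name the columns
m <- do.call(rbind, padded)
colnames(m) <- paste("sID", letters[1:3], sep = ".")

newdf <- data.frame(m, rID = rID, stringsAsFactors = FALSE)
newdf
```

The split columns are character here; they could be converted with as.numeric() where that makes sense, but rows such as "a" and "b" would then become NA.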


-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062






[R] post-hoc test for aovp() function

2013-08-13 Thread j.cr...@neurocognition.org

Hello.
I am using the aovp() function from the library lmPerm with one factor
(group: 3 levels) controlling for 2 covariates. I now want to conduct a
post-hoc test using the same model. Unfortunately, I did not find an
appropriate test which works with 2 covariates. I would be grateful for any
suggestions. Thank you.
Julia




Re: [R] Regular repeats

2013-08-13 Thread Berend Hasselman


On 13-08-2013, at 18:46, jsf1982 jamie.free...@ucl.ac.uk wrote:

 Hi,
 Many apologies for the simplicity (hopefully!) of this request - I can't
 find it on the forum, but it may have been asked in the past.
 
 I have a data frame consisting of ~2000 rows. I simply want to take the
 average of the first 6, then the next 6, then the next 6 until the end of
 the table. 
 The command 
 
 mean(mole[1:6,c("PercentPI")])
 
 gets me the first 6 rows (column is PercentPI), but I don't know how to
 increase the rows incrementally.


Something like this

N <- 27
dd <- data.frame(A=rnorm(1:N),index=gl(6,6,N))
aggregate(dd$A,by=list(dd$index),mean)

Berend
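
One caveat with the snippet above: gl(6,6,N) only creates six distinct levels, so with ~2000 rows the index would recycle and unrelated blocks of rows would be pooled together. A minimal sketch of one way around that is to ask gl() for as many levels as there are blocks. The data frame name mole and the column PercentPI are taken from the original question; the values here are simulated.

```r
set.seed(42)
N <- 2000
mole <- data.frame(PercentPI = runif(N))   # simulated stand-in for the real data

# One level per block of 6 consecutive rows; the last block may be shorter
block <- gl(ceiling(N / 6), 6, N)

block_means <- aggregate(mole$PercentPI, by = list(block = block), FUN = mean)
head(block_means)
```

With 2000 rows this gives 334 blocks, the last of which averages only the two leftover rows.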


Re: [R] Regular repeats

2013-08-13 Thread arun

Hi,
You could try:
set.seed(24)
dat1 <- as.data.frame(matrix(sample(1:50,29*6,replace=TRUE),ncol=6))

((seq_len(nrow(dat1))-1)%/%6)+1
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5


#For a particular column:
aggregate(dat1[,5],list(((seq_len(nrow(dat1))-1)%/%6)+1),FUN=mean)
#  Group.1    x
#1   1 38.16667
#2   2 29.5
#3   3 23.16667
#4   4 21.16667
#5   5 20.6
#or for the whole columns

aggregate(dat1,list(((seq_len(nrow(dat1))-1)%/%6)+1),FUN=mean)
#  Group.1   V1   V2   V3   V4   V5   V6
#1   1 28.3 17.5 12.7 35.0 38.16667 30.16667
#2   2 26.16667 31.3 35.3 19.7 29.5 24.8
#3   3 24.0 11.8 20.0 25.5 23.16667 20.8
#4   4 18.3 23.3 23.7 20.3 21.16667 21.16667
#5   5 22.6 30.4 17.4 21.8 20.6 24.4

#or

library(plyr)
res1 <- ddply(dat1,.(((seq_len(nrow(dat1))-1)%/%6)+1),summarize,MeanV1=mean(V1))
colnames(res1)[1] <- "Group"
res1
#  Group   MeanV1
#1 1 28.3
#2 2 26.16667
#3 3 24.0
#4 4 18.3
#5 5 22.6
A.K.






Re: [R] Convert list with missing values to dataFrame

2013-08-13 Thread arun

Hi,

You could try:

tmp[,1] <- as.character(tmp[,1])
tmp[,1][-grep(",",tmp[,1])] <- paste0(",,",tmp[,1][-grep(",",tmp[,1])])
tmp2 <- data.frame(read.table(text=tmp[,1],sep=",",header=FALSE,stringsAsFactors=FALSE),
                   rID=tmp[,2],stringsAsFactors=FALSE)
colnames(tmp2)[1:3] <- paste("sID",letters[1:3],sep=".")
tmp2
#  sID.a sID.b sID.c  rID
#1    NA    NA a  shr1125
#2 1 2 3   bwr331
#3    NA    NA b   bwr330
#4 4 5 6 vjhr1022

BTW,
data.frame(sID,rID,stringsAsFactors=FALSE) # cbind is not needed.  In this case, it is okay,
#    sID  rID
#1 a  shr1125
#2 1,2,3   bwr331
#3 b   bwr330
#4 4,5,6 vjhr1022
#But if they were of different class:
str(data.frame(cbind(sID,Col2=1:4),stringsAsFactors=FALSE))
#'data.frame':    4 obs. of  2 variables:
# $ sID : chr  "a" "1,2,3" "b" "4,5,6"
# $ Col2: chr  "1" "2" "3" "4"
str(data.frame(sID,Col2=1:4,stringsAsFactors=FALSE))
#'data.frame':    4 obs. of  2 variables:
# $ sID : chr  "a" "1,2,3" "b" "4,5,6"
# $ Col2: int  1 2 3 4



A.K.





[R] Runtime error in R

2013-08-13 Thread Camilo Mora


Hi everyone:

I am running code in R and I get the following message after using
large files (files larger than 2GB):


Runtime error!
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Another person posted a similar situation years back involving a
large data.table, but no solution was proposed. Has anyone come
across this problem? Is there a fix?


In 64-bit R, the message appears immediately after running the code.
In 32-bit R, the code starts but then reaches the RAM limit. This
makes me suspect that the error in 64-bit R is related to a default
limit on how big the files can be.


Anyway, any help will be highly appreciated.

Cheers,

Camilo



Camilo Mora, Ph.D.
Department of Geography, University of Hawaii
Currently available in Colombia
Phone:   Country code: 57
 Provider code: 313
 Phone 776 2282
 From the USA or Canada you have to dial 011 57 313 776 2282
http://www.soc.hawaii.edu/mora/


[R] Regression of categorical data

2013-08-13 Thread Walter Anderson

I have a set of survey data where I have answers to identify preference 
of three categories using three questions


1) a or b?
2) b or c?
3) a or c?


and want to obtain weights for each of the preferences

something like X(a) + Y(b) + Z(c) = 100%

I am at a loss as to how to calculate this from the data.  Any help would 
be appreciated!



Re: [R] Runtime error in R

2013-08-13 Thread Jeff Newmiller

It would seem that in going to a 64 bit architecture you have not escaped your 
memory problems. Such problems are highly varied in details, so you would need 
to be much more specific about how you are encountering this problem before 
anyone could help. Read the Posting Guide and make a reproducible example and 
provide the output of sessionInfo just before the problem occurs.

Note that operating-system-specific solutions are sometimes necessary, but 
algorithmic solutions are usually the most powerful at scaling to larger 
sizes... that is, change your code or use a different tool for part or all of 
the work. It can become crucial to understand every step of the processing you 
are doing in such cases.
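
One example of the kind of algorithmic change mentioned above is to process a large file in pieces rather than loading it whole. The sketch below is only an illustration under assumed conditions: a small temporary one-column CSV stands in for the real file, the chunk size of 250 lines is arbitrary, and the statistic computed (a mean) is one that can be accumulated chunk by chunk.

```r
# Create a small stand-in file; the real file would already exist on disk
path <- tempfile(fileext = ".csv")
write.csv(data.frame(value = 1:1000), path, row.names = FALSE)

con <- file(path, open = "r")
header <- readLines(con, n = 1)     # skip the header line

total  <- 0
n_rows <- 0
repeat {
  chunk <- readLines(con, n = 250)  # read at most 250 lines at a time
  if (length(chunk) == 0) break     # end of file reached
  values <- as.numeric(chunk)
  total  <- total + sum(values)
  n_rows <- n_rows + length(values)
}
close(con)

chunked_mean <- total / n_rows
chunked_mean
```

Only one chunk is held in memory at any time, so peak memory use depends on the chunk size rather than on the size of the file.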
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.


Re: [R] How to store and manipulate survey data like this?

2013-08-13 Thread Walter Anderson


On 08/13/2013 11:41 AM, Siraaj Khandkar wrote:

On 08/13/2013 12:17 PM, Walter Anderson wrote:

I have to process a set of survey data with questions that are formatted
like this;

1) Pick your top three breeds (pick 3)
  1  Rottweiler
  2  Pit Bull
  3  German Shepard
  4  Poodle
  5  Border Collie
  6  Dalmation
  7  Mixed Breed

and the answers are formatted like this:

Respondent, Question1
1, 1,4,7
2, 2,7,5
3, 6,3,5
4, 
...

Any suggestions on how to preprocess the file to be able to do things
like frequency analysis for breeds?



Here's how I would get started:


> survey <- read.csv("survey.csv", as.is=TRUE)
> survey
  Respondent Question1
1          1     1,4,7
2          2     2,7,5
3          3     6,3,5
4          4

> TripleOrNAs <- function(x) {if (length(x) == 3) x else c(NA, NA, NA)}
> options <- lapply(strsplit(survey$Question1, ","), TripleOrNAs)
> options <- matrix(unlist(options), ncol=3, byrow=TRUE)
> survey2 <- cbind(survey, options)
> names(survey2) <- c(names(survey), paste("Q1.Opt", 1:3, sep="."))
> survey2
  Respondent Question1 Q1.Opt.1 Q1.Opt.2 Q1.Opt.3
1          1     1,4,7        1        4        7
2          2     2,7,5        2        7        5
3          3     6,3,5        6        3        5
4          4                 NA       NA       NA





Thank you!


Re: [R] How to store and manipulate survey data like this?

2013-08-13 Thread arun

Hi,
You could try:
dat2 <- read.table(text='
Respondent, Question1
1, "1,4,7"
2, "2,7,5"
3, "6,3,5"
4, 
',sep=",",header=TRUE,stringsAsFactors=FALSE)
library(stringr)
dat2New <- cbind(dat2,do.call(rbind,lapply(
  str_split(str_trim(dat2[,2]),","),as.numeric)))
colnames(dat2New)[3:5] <- paste("Q1",colnames(dat2New)[3:5],sep=".")
dat2New
#  Respondent Question1 Q1.1 Q1.2 Q1.3
#1          1     1,4,7    1    4    7
#2          2     2,7,5    2    7    5
#3          3     6,3,5    6    3    5
#4          4             NA   NA   NA
A.K.
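
For the frequency analysis mentioned in the question, one possible sketch is to pool all picks and tabulate them against the breed list. The breed labels are copied from the survey as posted; the vector answers simply restates the Question1 column, and the names picks and freq are illustrative.

```r
# Breed labels as listed in the survey, in code order 1..7
breeds <- c("Rottweiler", "Pit Bull", "German Shepard", "Poodle",
            "Border Collie", "Dalmation", "Mixed Breed")

# The Question1 answers from the example; respondent 4 gave no answer
answers <- c("1,4,7", "2,7,5", "6,3,5", "")

# Pool every pick into one integer vector (the empty answer contributes nothing)
picks <- as.integer(unlist(strsplit(answers, ",")))

# Tabulate against the full list so that unpicked breeds would show up as 0
freq <- table(factor(picks, levels = seq_along(breeds), labels = breeds))
freq
```

Using factor() with explicit levels keeps a row for every breed even when nobody picked it, which tends to be convenient for later plotting or comparison across questions.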




Re: [R] Understanding S4 method dispatch

2013-08-13 Thread Hervé Pagès


Hi Hadley,

I suspect that the dispatch algorithm doesn't realize that selection
is ambiguous in your example. For 2 reasons:

  (1) When it does realize it, it notifies the user:

      setClass("A", NULL)
      setGeneric("f", function(x, y) standardGeneric("f"))
      setMethod("f", signature("A", "ANY"), function(x, y) "A-ANY")
      setMethod("f", signature("ANY", "A"), function(x, y) "ANY-A")
      a <- new("A")

  Then:

      > f(a, a)
      Note: method with signature ‘A#ANY’ chosen for function ‘f’,
       target signature ‘A#A’.
       "ANY#A" would also be valid
      [1] "A-ANY"

  (2) When dispatch is ambiguous, the first method lexicographically in
  the ordering should be selected (according to ?Methods). So it
  should be A#A, not B#B.

So it looks like a bug to me...

Cheers,
H.


On 08/13/2013 06:08 AM, Hadley Wickham wrote:

Hi all,

Any insight into the code below would be appreciated - I don't
understand why two methods which I think should have equal distance
from the call don't.

Thanks!

Hadley

# Create simple class hierarchy
setClass("A", NULL)
setClass("B", "A")

a <- new("A")
b <- new("B")

setGeneric("f", function(x, y) standardGeneric("f"))
setMethod("f", signature("A", "A"), function(x, y) "A-A")
setMethod("f", signature("B", "B"), function(x, y) "B-B")

# These work as I expect
f(a, a)
f(b, b)

setClass("AB", contains = c("A", "B"))
ab <- new("AB")

# Why does this return "B-B"? Shouldn't both methods be an equal distance?
f(ab, ab)

# These both return distance 1, as I expected
extends("AB", "A", fullInfo=TRUE)@distance
extends("AB", "B", fullInfo=TRUE)@distance
# So why is signature("B", "B") closer than signature("A", "A")



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Lattice: bwplot - changing box colors in legend and plot when using panel.groups = function... and panel = panel.superpose

2013-08-13 Thread Duncan Mackay

I had a similar problem and found, when looking
inside one of the lattice functions, that the
legend colours are controlled by the superpose
settings (e.g. superpose.line, superpose.polygon),
which can be changed with trellis.par.set() or
passed in through par.settings.

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au
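
The point about the superpose settings can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, the colours and the names `d`, `cols` and `p` are made up for the example, and the offset used to separate the two groups is arbitrary.

```r
library(lattice)

set.seed(1)
# Simulated data, for illustration only: 3 factor levels x 2 groups.
d <- data.frame(
  f1 = factor(rep(LETTERS[1:3], each = 40)),
  f2 = factor(rep(1:2, times = 60)),
  y  = rnorm(120)
)

cols <- c("red", "blue")

# superpose.polygon drives the rectangles drawn in the key; superpose.symbol
# is set to the same colours so the boxes drawn per group can match the key.
my.theme <- list(
  superpose.polygon = list(col = cols),
  superpose.symbol  = list(col = cols, fill = cols),
  box.rectangle     = list(col = "black"),
  box.umbrella      = list(col = "black")
)

p <- bwplot(y ~ f1, data = d, groups = f2,
            box.width = 1/3,
            par.settings = my.theme,
            auto.key = list(points = FALSE, rectangles = TRUE,
                            space = "right"),
            panel = panel.superpose,
            panel.groups = function(x, y, ..., group.number) {
              # Shift each group sideways so the boxes do not overlap.
              panel.bwplot(x + (group.number - 1.5) / 3, y, ...)
            })
print(p)
```

trellis.par.get() lists the settings that are available, which helps when a colour in the key does not follow the one in the panel.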



At 02:28 14/08/2013, you wrote:

I think I understand your question.  You need to make sure that you are
setting the right parameters in your theme.  Use trellis.par.get() to  have
a look at the MANY possible settings.  For example, in your case, to have
the boxplots and rectangles be the same color:

my.theme <- list(
  box.umbrella = list(col = "black"),
  box.rectangle = list(fill = rep(c("black", "black"), 2)),
  box.dot = list(col = "black", pch = 3, cex = 2),
  plot.symbol   = list(cex = 1, col = 1, pch = 0), # outlier size and color
  par.xlab.text = font.settings,
  par.ylab.text = font.settings,
  axis.text = font.settings,
  #strip.shingle=list(col=c("red","blue")),
  superpose.symbol = list(fill = c("red", "blue")), # boxplots
  #superpose.fill=list(col=c("red","blue")),
  superpose.polygon = list(col = c("red", "blue")), # legend
  par.sub = font.settings)

Kevin Wright



On Tue, Aug 13, 2013 at 9:00 AM, Anna Zakrisson Braeunlich 
anna.zakris...@su.se wrote:

 Hi,

 Yes, I have searched stack overflow.

 My issue is to simply change coloring in boxes and legend in my bwplot. I
 have done this many times in lattice, but now I have been tweaking the plot
 somewhat and I can no longer apply the color changes.
 I would really appreciate some help.
 A. Zakrisson

 Here is some dummy data and my script:

 mydata <- data.frame(factor1 = factor(rep(LETTERS[1:6], each = 80)),
                      factor2 = factor(rep(c(1:2), each = 16)),
                      var1 = rnorm(120, mean = rep(c(0, 3, 5), each = 40),
                                   sd = rep(c(1, 2, 3), each = 20)))

 font.settings <- list(font = 1, cex = 1, fontfamily = "serif")
 my.theme <- list(
   box.umbrella = list(col = "black"),
   box.rectangle = list(fill = rep(c("black", "black"), 2)),
   box.dot = list(col = "black", pch = 3, cex = 2),
   plot.symbol   = list(cex = 1, col = 1, pch = 0), # outlier size and color
   par.xlab.text = font.settings,
   par.ylab.text = font.settings,
   axis.text = font.settings,
   par.sub = font.settings)

 bwplot(var1 ~ factor1, data = mydata, groups = factor2,
        box.width = 1/3, # width of the boxes
        auto.key = list(points = FALSE,
                        rectangles = TRUE, space = "right",
                        title = "Year", cex.title = 1),
        panel = panel.superpose,
        ylab = "var1",
        xlab = "factor1",
        par.settings = my.theme,
        panel.groups = function(x, y, ..., group.number) {
          panel.bwplot(x + (group.number-1.8)/3, y, ...)
        })


 Anna Zakrisson Braeunlich
 PhD student

 Department of Ecology, Environment and Plant Sciences
 Stockholm University
 Svante Arrheniusv. 21A
 SE-106 91 Stockholm
 Sweden/Sverige

 Lives in Berlin.
 For paper mail:
 Katzbachstr. 21
 D-10965, Berlin - Kreuzberg
 Germany/Deutschland

 E-mail: anna.zakris...@su.se
 Tel work: +49-(0)3091541281
 Mobile: +49-(0)15777374888
 LinkedIn: http://se.linkedin.com/pub/anna-zakrisson-braeunlich/33/5a2/51b


--
Kevin Wright


Re: [R] help with adding SD to graph

2013-08-13 Thread Marc Girondot


Among the many solutions, here is one using the phenology package:

library(phenology)

plot_errbar(1:100, rnorm(100, 1, 2),
            xlab="axe x", ylab="axe y", bty="n", xlim=c(1,100),
            errbar.x=2, errbar.y=rnorm(100, 1, 0.1))


or

x <- 1:100
plot_errbar(x=1:100, rnorm(100, 1, 2),
            xlab="axe x", ylab="axe y", bty="n", xlim=c(1,100),
            x.minus=x-2, x.plus=x+2)



Sincerely

Marc Girondot
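
For readers who prefer to stay with base graphics, the arrows() idea from the original question can be made to work by drawing from mean - SD to mean + SD with flat caps. This is only a sketch under stated assumptions: the data are simulated, base dotchart() is used in place of dotchart2, and the names `groups`, `values`, `m` and `s` are made up for the example.

```r
set.seed(42)
# Simulated data, for illustration only: 16 groups with some NAs.
groups <- factor(rep(paste0("G", 1:16), each = 10))
values <- rnorm(160, mean = 5, sd = 1)
values[sample(160, 8)] <- NA

m <- tapply(values, groups, mean, na.rm = TRUE)  # group means
s <- tapply(values, groups, sd,   na.rm = TRUE)  # group standard deviations

# Base dotchart() places group i at y = i; widen xlim to fit the error bars.
dotchart(as.numeric(m), labels = names(m),
         xlim = range(m - s, m + s), xlab = "mean +/- SD")

# Horizontal bars from mean - SD to mean + SD, with flat caps at both ends.
arrows(x0 = m - s, y0 = 1:16, x1 = m + s, y1 = 1:16,
       angle = 90, code = 3, length = 0.03)
```

Here angle = 90 and code = 3 turn the arrow heads into caps at both ends, and y0 equals y1 so that each bar stays on its own row.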


On 12/08/13 15:41, Hedera wrote:
 Hello,

 I really need help. I am completely new to R; I could figure out many
 things, but not my latest problem.
 I created a dotchart with the dotchart2 command. My 16 groups are on the
 y axis, and the mean of the data from each group is plotted. Now I want
 to add the SD for every mean data point. I can calculate the SD (removing
 the NAs that are present in my data with na.rm=TRUE), but I don't know
 how to combine the graph with the SD. I found the arrows function, but it
 looks weird when I try to plot:
 arrows(x0=m, y0=c(1:16), y1=c(1:16)-1, x1=m+1, length=0)
 where m is the calculated mean of my data with NAs removed, and 1:16
 is used because of the 16 treatments.
 Does anybody have an (easy) suggestion for how to do it?

 I hope to get some help.
 Thank you very much for reading this.




-- 
__
Marc Girondot, Pr

Laboratoire Ecologie, Systématique et Evolution
Equipe de Conservation des Populations et des Communautés
CNRS, AgroParisTech et Université Paris-Sud 11 , UMR 8079
Bâtiment 362
91405 Orsay Cedex, France

Tel:  33 1 (0)1.69.15.72.30   Fax: 33 1 (0)1.69.15.73.53
e-mail: marc.giron...@u-psud.fr
Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html
Skype: girondot



Re: [R] grImport/ghostscript problems

2013-08-13 Thread Andrew Halford

Hi Listers

I have been trying to import a .ps graphic file into R using the grImport
package but I keep getting the following error message

Error in PostScriptTrace("fish.ps") :
  status 127 in running command 'gswin32c.exe -q -dBATCH -dNOPAUSE
-sDEVICE=pswrite
-sOutputFile=C:\Users\ahalford\AppData\Local\Temp\Rtmp6BOVDe\fileffc30613d6
-sstdout=fish.ps.xml capturefish.ps'

Any advice appreciated.

Andy


-- 
Andrew Halford Ph.D
Adjunct Research Scientist
University of Guam & Curtin University
Ph: +61 (0) 468 419 473


Re: [R] grImport/ghostscript problems

2013-08-13 Thread Pascal Oettli

Hello,

What is the result of sessionInfo()?

Regards,
Pascal
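
In addition to sessionInfo(), it may help to check whether R can find Ghostscript at all: a status of 127 from a system call usually means the command could not be run. The sketch below assumes that grImport honours the R_GSCMD environment variable (worth confirming in ?PostScriptTrace); the executable names in `candidates` and the commented-out install path are examples, not taken from the original post.

```r
# Candidate Ghostscript executables; names differ by platform and install.
candidates <- c("gswin64c.exe", "gswin32c.exe", "gs")
found <- Sys.which(candidates)
found <- found[nzchar(found)]   # keep only those actually on the PATH

if (length(found) > 0) {
  # Point R at the first executable found on the PATH.
  Sys.setenv(R_GSCMD = unname(found[1]))
} else {
  message("No Ghostscript executable on the PATH. Install Ghostscript, then ",
          "either add its bin directory to PATH or set R_GSCMD to the full ",
          "path of the executable.")
  # Hypothetical install path, for illustration only:
  # Sys.setenv(R_GSCMD = "C:/Program Files/gs/gs9.07/bin/gswin64c.exe")
}

Sys.getenv("R_GSCMD")
```

If nothing is found, Ghostscript is probably either not installed or its bin directory is missing from the PATH.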



2013/8/14 Andrew Halford andrew.half...@gmail.com

 Hi Listers

 I have been trying to import a .ps graphic file into R using the grImport
 package but I keep getting the following error message

 Error in PostScriptTrace("fish.ps") :
   status 127 in running command 'gswin32c.exe -q -dBATCH -dNOPAUSE
 -sDEVICE=pswrite
 -sOutputFile=C:\Users\ahalford\AppData\Local\Temp\Rtmp6BOVDe\fileffc30613d6
 -sstdout=fish.ps.xml capturefish.ps'

 Any advice appreciated.

 Andy


 --
 Andrew Halford Ph.D
 Adjunct Research Scientist
 University of Guam & Curtin University
 Ph: +61 (0) 468 419 473


[R] Grab Element from Web Page

2013-08-13 Thread Sparks, John James

Dear R Helpers,

I would like to pull the CIK number from the web page

http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany

If you put this web page into your browser you will see the CIK number in
red on the left side of the page near the top.

When I try the basic

require(scrapeR)
require(XML)
require(RCurl)
doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
str(doc)

I get a large number of items in the data frame that I don't know how to
interpret.  Both

tables <- readHTMLTable(doc)

and

list <- xmlToList(doc)

result in errors.

Any (positive) guidance would be much appreciated.

--John J. Sparks, Ph.D.
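
One low-tech possibility is to skip the parsed tree and pull the number out of the raw page text with a regular expression. The sketch below is only an illustration under assumptions that are not in the original post: the HTML fragment is made up, the CIK value is a placeholder, and the real page may mark up the number differently. It uses base R only, deliberately avoiding the XML package.

```r
# Hypothetical fragment standing in for the downloaded page source; the real
# markup and the CIK value (a placeholder here) may differ.
html <- paste0(
  '<span class="companyName">EXAMPLE CORP CIK#: ',
  '<a href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001234567">',
  '0001234567 (see all company filings)</a></span>'
)

# Find the first "CIK=" followed by ten digits, then strip the prefix.
hit <- regmatches(html, regexpr("CIK=[0-9]{10}", html))
cik <- sub("^CIK=", "", hit)
cik
```

For the live page, the text could presumably be read with something like paste(readLines(url, warn = FALSE), collapse = "\n") before applying the same pattern, provided the site allows the request.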

