Re: [R] Question on output of vectors from a for loop into a matrix

2007-08-16 Thread Dylan Beaudette
On Thursday 16 August 2007 15:35, Ryan Briscoe Runquist wrote:
> Hello R-help,
>
> I am a recent convert to R from SAS and I am having some difficulty with
> output of a for loop into a matrix.  I have searched the help manuals and
> the archives, but I can't get my code to work.  It is probably a syntax
> error that I am not spotting.  I am trying to make a distance matrix by
> extracting the distances from each point to all others in a vector (the for
> loop).  I can get them all to print and can save each individually to a
> file, but I cannot get them to be bound into a matrix.
>
> Here is the code that I have tried:
>
> m<-length(Day.1.flower.coords$x) #31 grid points
>
> output.matix<-matrix(0.0,m,m)
> for(i in 1:m){
>   dist<-spDistsN1(Day.1.coords.matrix, Day.1.coords.matrix[i,])
>   dist<-as.vector(dist)
>   output.matrix[i,]<-dist
>   print(dist)}
>
> The error message that I get is:
>
> Error in output.matrix[i,] <- dist : incorrect number of subscripts on
> matrix
>
> Thanks for your help.
>
> Ryan
>
> ~~
> Ryan D. Briscoe Runquist
> Population Biology Graduate Group
> University of California, Davis
> [EMAIL PROTECTED]
>


Hi Ryan. As a fellow UCD student, you have the luxury of not using a loop! :) 

Would something like this work for the creation of a 'distance matrix'?

# example dataset: 2 x 2 grid
d <- expand.grid(x=1:2, y=1:2)

# an instructive plot
plot(d, type='n')
text(d, rownames(d))

# create distance object and convert to matrix:
m <- as.matrix(dist(d))
m

         1        2        3        4
1 0.000000 1.000000 1.000000 1.414214
2 1.000000 0.000000 1.414214 1.000000
3 1.000000 1.414214 0.000000 1.000000
4 1.414214 1.000000 1.000000 0.000000

# is that the kind of distance matrix you are looking for?

?dist

Distance Matrix Computation

Description:

 This function computes and returns the distance matrix computed by
 using the specified distance measure to compute the distances
 between the rows of a data matrix.
[...]
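
As an aside: the loop version also works once the object name is consistent
('output.matix' is created above, but 'output.matrix' is assigned to). A
minimal sketch, assuming Day.1.coords.matrix is the n x 2 coordinate matrix
from the original code:

library(sp)
n <- nrow(Day.1.coords.matrix)
output.matrix <- matrix(0, n, n)
for (i in 1:n) {
  # distances from point i to all points
  output.matrix[i, ] <- spDistsN1(Day.1.coords.matrix, Day.1.coords.matrix[i, ])
}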

cheers,

Dylan

> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ROC curve in R

2007-08-03 Thread Dylan Beaudette
On Thursday 26 July 2007 10:45, Frank E Harrell Jr wrote:
> Dylan Beaudette wrote:
> > On Thursday 26 July 2007 06:01, Frank E Harrell Jr wrote:
> >> Note that even though the ROC curve as a whole is an interesting
> >> 'statistic' (its area is a linear translation of the
> >> Wilcoxon-Mann-Whitney-Somers-Goodman-Kruskal rank correlation
> >> statistics), each individual point on it is an improper scoring rule,
> >> i.e., a rule that is optimized by fitting an inappropriate model.  Using
> >> curves to select cutoffs is a low-precision and arbitrary operation, and
> >> the cutoffs do not replicate from study to study.  Probably the worst
> >> problem with drawing an ROC curve is that it tempts analysts to try to
> >> find cutoffs where none really exist, and it makes analysts ignore the
> >> whole field of decision theory.
> >>
> >> Frank Harrell
> >
> > Frank,
> >
> > This thread has caught my attention for a couple of reasons, possibly
> > related to my novice-level experience.
> >
> > 1. in a logistic regression study, where I am predicting the probability
> > of the response being 1 (for example) - there exists a continuum of
> > probability values - and a finite number of {1,0} realities when I either
> > look within the original data set, or with a new 'verification' data set.
> > I understand that drawing a line through the probabilities returned from
> > the logistic regression is a loss of information, but there are times
> > when a 'hard' decision requiring prediction of {1,0} is required. I have
> > found that the ROCR package (not necessarily the ROC Curve) can be useful
> > in identifying the probability cutoff where accuracy is maximized. Is
> > this an unreasonable way of using logistic regression as a predictor?

Thanks for the detailed response Frank. My follow-up questions are below:

> Logistic regression (with suitable attention to not assuming linearity
> and to avoiding overfitting) is a great way to estimate P[Y=1].  Given
> good predicted P[Y=1] and utilities (losses, costs) for incorrect
> positive and negative decisions, an optimal decision is one that
> optimizes expected utility.  The ROC curve does not play a direct role
> in this regard.  

Ok.

> If per-subject utilities are not available, the analyst 
> may make various assumptions about utilities (including the unreasonable
> but often used assumption that utilities do not vary over subjects) to
> find a cutoff on P[Y=1]. 

Can you elaborate on what exactly a "per-subject utility" is? In my case, I am 
trying to predict the occurrence of specific soil features based on two 
predictor variables: one continuous, the other categorical. Thus far my 
evaluation of how well this method works is based on how often I can 
correctly predict (a categorical) quality.


> A very nice feature of P[Y=1] is that error 
> probabilities are self-contained.  For example if P[Y=1] = .02 for a
> single subject and you predict Y=0, the probability of an error is .02
> by definition.  One doesn't need to compute an overall error probability
> over the whole distribution of subjects' risks.  If the cost of a false
> negative is C, the expected cost is .02*C in this example.

Interesting. The hang-up that I am having is that I need to predict from 
{0,1}, as the direct users of this information are not currently interested 
in raw probabilities. As far as I know, in order to predict a class from a 
probability I need to use a cutoff... How else can I accomplish this without 
imposing a cutoff on the entire dataset? One thought: identify a cutoff for 
each level of the categorical predictor term in the model... (?)

> > 2. The ROC curve can be a helpful way of communicating false positives /
> > false negatives to other users who are less familiar with the output and
> > interpretation of logistic regression.
>
> What is more useful than that is a rigorous calibration curve estimate
> to demonstrate the faithfulness of predicted P[Y=1] and a histogram
> showing the distribution of predicted P[Y=1]

Ok. I can make that histogram - how would one go about making the 'rigorous 
calibration curve'? Note that I have a training set, from which the model is 
built, and a smaller testing set for evaluation.
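
One candidate for such a curve - an assumption on my part, not Frank's 
prescription - would be val.prob() from the Design package, applied to the 
testing set ('fit', 'test', and 'y' are hypothetical names here):

library(Design)   # home of lrm(); val.prob() plots a calibration curve
p <- predict(fit, newdata=test, type='fitted')   # predicted P[Y=1]
val.prob(p, test$y)   # calibration curve plus summary indices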


> .  Models that put a lot of 
> predictions near 0 or 1 are the most discriminating.  Calibration curves
> and risk distributions are easier to explain than ROC curves.

By 'risk distributions' do you mean said histogram?

> Too often 
> a statistician will solve for a cutoff on P[Y=1], imposing her own
> utility function without querying any subjects.

in this case I have 

Re: [R] ROC curve in R

2007-07-26 Thread Dylan Beaudette
On Thursday 26 July 2007 06:01, Frank E Harrell Jr wrote:
> Note that even though the ROC curve as a whole is an interesting
> 'statistic' (its area is a linear translation of the
> Wilcoxon-Mann-Whitney-Somers-Goodman-Kruskal rank correlation
> statistics), each individual point on it is an improper scoring rule,
> i.e., a rule that is optimized by fitting an inappropriate model.  Using
> curves to select cutoffs is a low-precision and arbitrary operation, and
> the cutoffs do not replicate from study to study.  Probably the worst
> problem with drawing an ROC curve is that it tempts analysts to try to
> find cutoffs where none really exist, and it makes analysts ignore the
> whole field of decision theory.
>
> Frank Harrell

Frank,

This thread has caught my attention for a couple of reasons, possibly related 
to my novice-level experience.

1. in a logistic regression study, where I am predicting the probability of 
the response being 1 (for example) - there exists a continuum of probability 
values - and a finite number of {1,0} realities when I either look within the 
original data set, or with a new 'verification' data set. I understand that 
drawing a line through the probabilities returned from the logistic 
regression is a loss of information, but there are times when a 'hard' 
decision requiring prediction of {1,0} is required. I have found that the 
ROCR package (not necessarily the ROC Curve) can be useful in identifying the 
probability cutoff where accuracy is maximized (a sketch of this idiom follows 
the list below). Is this an unreasonable way of using logistic regression as a 
predictor?

2. The ROC curve can be a helpful way of communicating false positives / false 
negatives to other users who are less familiar with the output and 
interpretation of logistic regression. 


3. I have been using the area under the ROC curve, Kendall's tau, and Cohen's 
kappa to evaluate the accuracy of a logistic regression based prediction, the 
last two statistics based on some probability cutoff identified beforehand.
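
For reference, the ROCR idiom mentioned in (1) looks roughly like this sketch, 
where 'prob' and 'labels' are hypothetical vectors of predicted probabilities 
and observed {0,1} outcomes:

library(ROCR)
pred <- prediction(prob, labels)
perf <- performance(pred, 'acc')   # accuracy at every candidate cutoff
cutoffs <- perf@x.values[[1]]
best.cut <- cutoffs[which.max(perf@y.values[[1]])]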


How does the topic of decision theory relate to some of the circumstances 
described above? Is there a better way to do some of these things?

Cheers,

Dylan



>
> [EMAIL PROTECTED] wrote:
> > http://search.r-project.org/cgi-bin/namazu.cgi?query=ROC&max=20&result=normal&sort=score&idxname=Rhelp02a&idxname=functions&idxname=docs
> >
> > there is a lot of help; try help.search("ROC curve"), which gave:
> > Help files with alias or concept or title matching 'ROC curve' using
> > fuzzy matching:
> >
> >
> >
> > granulo(ade4) Granulometric Curves
> > plot.roc(analogue)Plot ROC curves and associated
> > diagnostics
> > roc(analogue) ROC curve analysis
> > colAUC(caTools)   Column-wise Area Under ROC
> > Curve (AUC)
> > DProc(DPpackage)  Semiparametric Bayesian ROC
> > curve analysis
> > cv.enet(elasticnet)   Computes K-fold cross-validated
> > error curve for elastic net
> > ROC(Epi)  Function to compute and draw
> > ROC-curves.
> > lroc(epicalc) ROC curve
> > cv.lars(lars) Computes K-fold cross-validated
> > error curve for lars
> > roc.demo(TeachingDemos)   Demonstrate ROC curves by
> > interactively building one
> >
> > HTH
> > see the help and examples those will suffice
> >
> > Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'.
> >
> >
> >
> > Regards,
> >
> > Gaurav Yadav
> > +++
> > Assistant Manager, CCIL, Mumbai (India)
> > Mob: +919821286118 Email: [EMAIL PROTECTED]
> > Bhagavad Gita:  Man is made by his Belief, as He believes, so He is
> >
> >
> >
> > "Rithesh M. Mohan" <[EMAIL PROTECTED]>
> > Sent by: [EMAIL PROTECTED]
> > 07/26/2007 11:26 AM
> >
> > To
> > 
> > cc
> >
> > Subject
> > [R] ROC curve in R
> >
> >
> >
> >
> >
> >
> > Hi,
> >
> >
> >
> > I need to build ROC curve in R, can you please provide data steps / code
> > or guide me through it.
> >
> >
> >
> > Thanks and Regards
> >
> > Rithesh M Mohan
> >
> >
> >  [[alternative HTML version deleted]]
>
> -
> Frank E Harrell Jr   Professor and Chair   School of Medicine
>   Department of Biostatistics   Vanderbilt University
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] DF and intercept term meaning for mixed (lme) models

2007-07-25 Thread Dylan Beaudette
Hi,

I am using the nlme package to fit mixed effects models to a set of data.

I am having a difficult time understanding the *meaning* of the numDF (degrees 
of freedom in the numerator), denDF (degrees of freedom in the denominator), 
as well as the Intercept term in the output.

For example:

I have a groupedData object called 'Soil', and am fitting an lme model as 
follows:

## fit a simple model
# errors partitioned among replicates
fit1 <- lme(
   log(ksat) ~ log(conc) + ordered(sar) + soil_id ,
   random = ~ 1 | rep,
   data=Soil
)

## check significance of model terms
anova(fit1)

              numDF denDF  F-value p-value
(Intercept)      1  1253 64313.21  <.0001
log(conc)        1   597   173.34  <.0001
ordered(sar)     2   597    13.87  <.0001
soil_id         29   597    54.92  <.0001


I am pretty sure that I am interpreting the p-values for the predictor terms 
correctly: they mean that these terms contribute significantly to the 
variation in the response variable (?). I am not sure what the significance of 
the Intercept term really means. Does it have something to do with the 
significance of the random effects in the model?

Also, from a practical standpoint, how can I best describe / interpret the 
numDF and denDF terms to others... or do they even matter to a person who is 
looking to see if the 'treatment' predictor terms had any effect on the 
response term?

Thanks in advance. 

Dylan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] reversing the x-axis terms on a boxplot

2007-07-25 Thread Dylan Beaudette
Hi,

I am able to reverse the order of plotting on regular plots (i.e. with the 
plot() function) by manually setting the xlim variable. 

Is there some trick like this which will work for a boxplot?

* for example:

l <- sample(letters, 500, replace=TRUE)
n <- runif(500)
boxplot(n ~ l)


This will produce a plot with the x-axis ranging from a to z... I know that 
these are labels associated with an integer index -- but is there some way to 
reverse the plotting order? (A sketch of one possibility follows below.)
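
One possibility (a sketch, not something from the thread): reverse the factor 
levels before calling boxplot():

l.rev <- factor(l, levels=rev(sort(unique(l))))
boxplot(n ~ l.rev)   # x-axis now runs z -> a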

Thanks in advance,

Dylan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inter-rater agreement index kappa

2007-06-26 Thread Dylan Beaudette
On Tuesday 26 June 2007 10:14, Nair, Murlidharan T wrote:
> Is there a function that calculates the inter-rater agreement index
> (kappa) in R?
>
> Thanks ../Murli

I have found a couple useful approaches:

# PCC, kappa, rand index (classAgreement() expects a contingency table)
require(e1071)
classAgreement(tab)   # 'tab' = a 2x2 table, e.g. table(rater1, rater2)

# kendall's tau
cor(x,y, method='kendall')

cheers,

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] off-topic: affine transformation matrix

2007-05-29 Thread Dylan Beaudette
Thanks for the prompt and clear reply! The simplicity of the solution may have 
been why I initially overlooked this approach...


The results look convincing (http://169.237.35.250/~dylan/temp/affine.png), 
now I just need to verify that the output from coef() is in the format that I 
need it in.


l <- lm(cbind(nx,ny) ~ x + y, data=g)
coef(l)
                     nx           ny
(Intercept)  6.87938629  5.515261158
x            1.01158806 -0.005449152
y           -0.04481893  0.996895878


## convert to the format needed for the affine() function in PostGIS?
t(coef(l))

   (Intercept)            x           y
nx    6.879386  1.011588063 -0.04481893
ny    5.515261 -0.005449152  0.99689588


note that the format that I am looking for looks something like the matrix 
defined on this page:
http://www.geom.uiuc.edu/docs/reference/CRC-formulas/node15.html
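
Applying the fitted transform back to the source coordinates is then just 
matrix multiplication (a sketch, reusing 'l' and 'g' from above):

# new coordinates = [1, x, y] %*% coef(l); equivalently, fitted(l)
xy.new <- cbind(1, g$x, g$y) %*% coef(l)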

cheers,

dylan



On Monday 28 May 2007 15:18, Prof Brian Ripley wrote:
> Isn't this just a regression (hopefully with a near-zero error).
>
> coef(lm(cbind(xnew, ynew) ~ xold + yold))
>
> should do what I think you are asking for.  (I am not clear which
> direction you want the transformation, so choose 'old' and 'new'
> accordingly.)
>
> On Mon, 28 May 2007, Dylan Beaudette wrote:
> > This may sound like a very naive question, but...
> >
> > given two lists of coordinate pairs (x,y - Cartesian space), is there any
> > simple way to compute the affine transformation matrix in R?
> >
> > I have a set of data which is offset from where I know it should be. I
> > have coordinates of the current data, and matching coordinates of where
> > the data should be. I need to compute the composition of the affine
> > transformation matrix, so that I can apply an affine transform to the
> > entire dataset.
> >
> > any ideas?
> >
> > thanks in advance!

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] off-topic: affine transformation matrix

2007-05-28 Thread Dylan Beaudette
This may sound like a very naive question, but...

given two lists of coordinate pairs (x,y - Cartesian space), is there any 
simple way to compute the affine transformation matrix in R?

I have a set of data which is offset from where I know it should be. I have 
coordinates of the current data, and matching coordinates of where the data 
should be. I need to compute the composition of the affine transformation 
matrix, so that I can apply an affine transform to the entire dataset.

any ideas?

thanks in advance!
-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] cross-validation / sensitivity anaylsis for logistic regression model

2007-05-14 Thread Dylan Beaudette
Hi,

I have developed a logistic regression model of the form (factor_1 ~ numeric 
+ factor_2) and would like to perform a cross-validation or some similar 
form of sensitivity analysis on this model.

using cv.glm() from the boot package:

# dataframe from which model was built in 'z'
# model is called 'm_geo.lrm'

# as suggested in the man page for a binomial model:
# (cost = average misclassification rate at a 0.5 probability cutoff)
library(boot)
cost <- function(r, pi=0) mean(abs(r-pi) > 0.5)
cv.10.err <- cv.glm(z, m_geo.lrm, cost, K=10)$delta

I get the following:
cv.10.err
    1     1 
0.275 0.281

Am I correct in interpreting that this is the mean estimated error rate for 
this specified model, from the 10-fold cross-validation?

any tips on understanding the output from cv.glm() would be greatly 
appreciated. I am mostly looking to perform a sensitivity analysis with this 
model and dataset - perhaps there are other methods?
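
(For the archive: ?cv.glm in the boot package notes that the first component of 
delta is the raw cross-validation estimate of prediction error and the second 
is the adjusted estimate; with the 0/1 cost function above, both are 
misclassification rates.)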

thanks

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate similar to SPSS

2007-04-25 Thread Dylan Beaudette
?table
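
For instance, with the data from the question below (a sketch):

x <- c(1,1,1,2,2,1,1,1,2,2,2,2,2,2)   # 1 = smoker, 2 = non-smoker
table(x)                      # counts per category
prop.table(table(x)) * 100    # percentages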

On Wednesday 25 April 2007 14:32, Natalie O'Toole wrote:
> Hi,
>
> Does anyone know if, with R, you can take a set of numbers and aggregate
> them like you can in SPSS? For example, could you calculate the percentage
> of people who smoke based on a dataset like the following:
>
> smoke = 1
> non-smoke = 2
>
> variable
> 1
> 1
> 1
> 2
> 2
> 1
> 1
> 1
> 2
> 2
> 2
> 2
> 2
> 2
>
>
> When aggregated, SPSS can tell you what percentage of persons are smokers
> based on the frequency of 1's and 2's. Can R statistical package do a
> similar thing?
>
> Thanks,
>
> Nat
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.

-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] locate nearest value in lookup table

2007-04-07 Thread Dylan Beaudette
Hi everyone,

I have a pile of data derived from an analytical device, which reports
values as a continuous distribution. I need to associate classes
(based on the Munsell color system) using a standard look-up table -
the problem is that I would like to find the *closest* matching entry
in the lookup table.

I have attempted to do this by first creating a difference vector
between the color space coordinates associated with a single analytical
value and the entire lookup table. Then I found the smallest
difference vector, and used that entry in the lookup table. However,
this does not always produce logical results...

here are the steps:

# read in the data
soil <- read.table("http://169.237.35.250/~dylan/temp/munsell-all.dat",
header=TRUE)

x <- read.csv('http://169.237.35.250/~dylan/temp/data_dump_4-7-2007.csv',
header=FALSE)

# extract important variables X,Y,Z
c <- data.frame(id=x$V1, X=x$V2, Y=x$V3, Z=x$V4)

# convert munsell to XYZ
soil$X <- soil$x * (soil$Y/soil$y)
soil$Y <- soil$Y
soil$Z <- (1 - soil$x - soil$y) * (soil$Y / soil$y)

## init some vars and try to find the closest entry in the table
res <- list()
for( i in as.numeric(rownames(c)) )
{

# compute a difference for each color component
d <- cbind( abs(soil$X - c$X[i]), abs(soil$Y - c$Y[i]), abs(soil$Z - c$Z[i]) )

# find the magnitude of the difference vector
# 3D (Euclidean) distance formula
b <- sqrt( d[,1]^2 + d[,2]^2 + d[,3]^2 )

# get the smallest difference vector
i.closest <- head( soil[order(b), ], 1)

res$input_row[i] <- i
res$H[i] <-  as.vector(unlist(i.closest[1]))
res$V[i] <- as.vector(unlist(i.closest[2]))
res$C[i] <- as.vector(unlist(i.closest[3]))
res$diff_vect[i] <- min(b)

}

# summarize the conversions
paste(res$input_row, res$H, res$V, res$C)
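
A more compact variant of the same search (a sketch, reusing the 'soil' and 'c' 
objects defined above):

xyz <- as.matrix(soil[, c('X','Y','Z')])
nearest <- sapply(1:nrow(c), function(i) {
  # squared Euclidean distance from input row i to every Munsell entry
  d2 <- rowSums(sweep(xyz, 2, unlist(c[i, c('X','Y','Z')]))^2)
  which.min(d2)
})
res2 <- soil[nearest, 1:3]   # H, V, C of the closest match per input row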


Does this approach even make sense?


thanks,

dylan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prop.test or chisq.test ..?

2007-02-28 Thread Dylan Beaudette
On Wednesday 28 February 2007 01:07, Christoph Buser wrote:
> Hi
>
> Some comments are inside.
>
> Dylan Beaudette writes:
>  > Hi everyone,
>  >
>  > Suppose I have a count of the occurrences of positive results, and the
>  > total number of occurrences:
>  >
>  >
>  > pos <- 14
>  > total <- 15
>  >
>  > testing that the proportion of positive occurrences is greater than 0.5
>  > gives a p-value and confidence interval:
>  >
>  > prop.test( pos, total, p=0.5, alternative='greater')
>  >
>  > 1-sample proportions test with continuity correction
>  >
>  > data:  14 out of 15, null probability 0.5
>  > X-squared = 9.6, df = 1, p-value = 0.0009729
>  > alternative hypothesis: true p is greater than 0.5
>  > 95 percent confidence interval:
>  >  0.706632 1.00
>  > sample estimates:
>  > p
>  > 0.933
>
> First of all by default there is a continuity correction in
> prop.test(). If you use
>
> prop.test(pos, total, p=0.5, alternative="greater", correct = FALSE)
>
>   1-sample proportions test without continuity correction
>
> data:  pos out of total, null probability 0.5
> X-squared = 11.2667, df = 1, p-value = 0.0003946
> alternative hypothesis: true p is greater than 0.5
> 95 percent confidence interval:
>  0.7492494 1.000
> sample estimates:
> p
> 0.933
>
> Remark that now the X-squared is identical to your result from
> chisq.test(), but the p-value is 0.0007891/2
>
> The reason is that you are testing here against the
> alternative "greater".
>
> If you use a two sided test
>
> prop.test(pos, total, p=0.5, alternative="two.sided", correct = FALSE)
>
> then you reproduce the results from chisq.test().
>
>  > My question is how does the use of chisq.test() differ from the above
>  > operation. For example:
>  >
>  > chisq.test(table( c(rep('pos', 14), rep('neg', 1)) ))
>  >
>  > Chi-squared test for given probabilities
>  >
>  > data:  table(c(rep("pos", 14), rep("neg", 1)))
>  > X-squared = 11.2667, df = 1, p-value = 0.0007891
>  >
>  > ... gives slightly different results. Am I correct in interpreting that
>  > the chisq.test() function in this case is giving me a p-value associated
>  > with the test that the probabilities of pos are *different* than the
>  > probabilities of neg -- and thus a larger p-value than the prop.test(...
>  > , p=0.5, alternative='greater') ?
>
> Yes. In your example chisq.test() tests the null hypothesis that
> all population probabilities are equal.
>
> ?chisq.test says:
> "In this case, the hypothesis tested is whether the population
> probabilities equal those in 'p', or are all equal if 'p' is not
> given."
>
> which means p1 = p2 = 0.5 in your two population case against
> the alternative p1 != p2.
>
> This is similar to the test in prop.test() p=0.5 against p!=0.5
> and therefore you get identical results if you choose
> alternative="two.sided" in prop.test().
>
>  > I realize that this is a rather elementary question, and references to a
>  > text would be just as helpful. Ideally, I would like a measure of how
>  > much I can 'trust' that a larger proportion is also statistically
>  > meaningful. Thus far the results from prop.test() match my intuition,
>  > but
>  > affirmation would be
>
> Your intuition was correct. Nevertheless in your original
> results (with the continuity correction), the p-value of
> prop.test()  (0.0009729) was larger than the p-value of
> chisq.test() (0.0007891) and therefore against your intuition.
>
>  > great.
>  >
>  > Cheers,
>  >
>  >
>  > --
>  > Dylan Beaudette
>  > Soils and Biogeochemistry Graduate Group
>  > University of California at Davis
>  > 530.754.7341
>  >
>  > __
>  > R-help@stat.math.ethz.ch mailing list
>  > https://stat.ethz.ch/mailman/listinfo/r-help
>  > PLEASE do read the posting guide
>  > http://www.R-project.org/posting-guide.html and provide commented,
>  > minimal, self-contained, reproducible code.
>
> Hope this helps
>
> Christoph Buser
>

Thanks for the tips Christoph, this was the help that I was looking for.

cheers,

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] prop.test or chisq.test ..?

2007-02-26 Thread Dylan Beaudette
Hi everyone,

Suppose I have a count of the occurrences of positive results, and the total 
number of occurrences:


pos <- 14
total <- 15

testing that the proportion of positive occurrences is greater than 0.5 gives 
a p-value and confidence interval:

prop.test( pos, total, p=0.5, alternative='greater')

1-sample proportions test with continuity correction

data:  14 out of 15, null probability 0.5 
X-squared = 9.6, df = 1, p-value = 0.0009729
alternative hypothesis: true p is greater than 0.5 
95 percent confidence interval:
 0.706632 1.00 
sample estimates:
p 
0.933


My question is how does the use of chisq.test() differ from the above 
operation. For example:

chisq.test(table( c(rep('pos', 14), rep('neg', 1)) ))

Chi-squared test for given probabilities

data:  table(c(rep("pos", 14), rep("neg", 1))) 
X-squared = 11.2667, df = 1, p-value = 0.0007891

... gives slightly different results. Am I correct in interpreting that the 
chisq.test() function in this case is giving me a p-value associated with the 
test that the probabilities of pos are *different* than the probabilities of 
neg -- and thus a larger p-value than prop.test(..., p=0.5, 
alternative='greater')?

I realize that this is a rather elementary question, and references to a text 
would be just as helpful. Ideally, I would like a measure of how much I 
can 'trust' that a larger proportion is also statistically meaningful. Thus 
far the results from prop.test() match my intuition, but affirmation would be 
great.

Cheers,


-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] model diagnostics for logistic regression

2007-02-13 Thread Dylan Beaudette
Greetings,

I am using both lrm() {Design} and glm( , family=binomial()) to perform a 
logistic regression in R. Apart from the typical summary() methods, what 
other methods of diagnosing logistic regression models does R provide? e.g. 
plotting an 'lm' object, etc.

Secondly, is there any facility to calculate the R^2_L statistic suggested by 
Menard in "Applied Logistic Regression Analysis" (2002)?

Any thoughts would be greatly appreciated.

Cheers,


-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sample size calculations

2007-02-05 Thread Dylan Beaudette
Greetings,

I have experimented with the MBESS and pwr packages for the estimation of 
sample size for a given CV, precision, and confidence interval. 

Thus far I have found the ss.aipe.cv {MBESS} (Sample size planning for the 
coefficient of variation given the goal of Accuracy in Parameter Estimation 
approach to sample size planning.) function to be best suited for my needs.

However, the data from which I am calculating my CV are approximately 
log-normally distributed, and thus have a large CV (1.4). Using this CV, 
precision (20% within the population mean), and confidence interval (95%) 
parameters, I obviously get a suggested sample size that is very large (n = 
1182). By reducing my precision and confidence interval requirements to 
something like:
ss.aipe.cv(C.of.V=1.4, width=0.5, conf.level=0.9)
... the function still suggests about 230 samples, which is near the upper 
limit of feasibility.

I would like to deduce an optimal number of samples; however, the log-normal 
distribution of this data suggests that the above approach is not well suited 
to this task.

Are there any better approaches or references which might send me in the right 
direction?

Thanks in advance,


-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] extra panel arguments to plot.nmGroupedData {nlme}

2007-01-28 Thread Dylan Beaudette
Greetings,


I have a groupedData (nmGroupedData) object created with the following syntax:

Soil <- groupedData(
  ksat ~ conc | soil_id/sar/rep,
  data=soil.data,
  labels=list(x='Solution Concentration', y='Saturated Hydraulic Conductivity'),
  units=list(x='(cmol_c)', y='(cm/s)')
)


the original data represents longitudinal observations in the form of:
'data.frame':   1197 obs. of  5 variables:
 $ soil_id: Factor w/ 19 levels "Arbuckle","Campbell",..: 16 16 16 16
16 16 16 16 16 16 ...
 $ sar: int  12 12 12 12 12 12 12 6 6 6 ...
 $ conc   : num  500 100 50 10 5 1 0.003 500 100 50 ...
 $ rep: Factor w/ 3 levels "C1","C2","C3": 1 1 1 1 1 1 1 1 1 1 ...
 $ ksat   : num  0.000214 0.000207 0.000198 0.000160 0.000108 ...

the default plotting behaviour of this groupedData object works as expected:
plot(
  Soil,
  collapse=1, inner=~sar, aspect='fill',
  scales=list( x=list(log=TRUE), y=list(log=TRUE))
)

... however, there is no way for me to alter the panels...

for example, attempting to add a horizontal line with a call to
panel.abline= ... :
plot(
  Soil,
  collapse=1, inner=~sar, aspect='fill',
  scales=list( x=list(log=TRUE), y=list(log=TRUE)),
  FUN=mean,
  panel= function(x,y, subscripts, groups) {
panel.xyplot(x, y, type='o')
panel.abline(h=0.005, col='black', lty=2)
}
)

... results in an identical plot. Trying to re-create the results of
plot.nmGroupedData with a direct call to xyplot() has thus far been
unsuccessful, as I cannot figure out how to specify the original
formula in a meaningful way to xyplot: ksat ~ conc | soil_id/sar/rep.
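
For reference, a direct lattice sketch of what I was attempting (an 
approximation on my part, not a recreation of plot.nmGroupedData - it panels 
by soil_id only):

library(lattice)
xyplot(log(ksat) ~ log(conc) | soil_id, groups=sar, data=soil.data,
  type='o',
  panel=function(x, y, ...) {
    panel.xyplot(x, y, ...)             # dispatches to panel.superpose
    panel.abline(h=log(0.005), lty=2)   # horizontal reference line
  })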

Any tips on passing panel arguments to plot.nmGroupedData() would be
greatly appreciated.

Cheers,

Dylan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problem [cluster]

2006-12-02 Thread Dylan Beaudette
Hi Stephano,

Looks like you used my example verbatim 
(http://casoilresource.lawr.ucdavis.edu/drupal/node/221)

:)

While my approach has not *yet* been published, the original source [4] by 
Roger Bivand certainly has. Just a reminder.

That said, I would highly recommend reading up on the background literature 
associated with both the cluster package [1] and terrain classification, i.e.
[2] and [3]. Note that although the clara() function was created to work on 
massive datasets, it is still possible to overwhelm the available memory with 
multiple gridded objects - recall that all R objects are held in memory.

I have asked the maintainer of the cluster package, Martin Maechler, about 
integrating a known medoid option into the clara() function- which would be 
extremely useful in adding some 'supervision' to landscape classification 
with clara(). Hopefully there will be enough requests for the feature, that 
Martin will kindly add it :) .

1. Kaufman, L. & Rousseeuw, P.J. Finding Groups in Data An Introduction to 
Cluster Analysis Wiley-Interscience, 2005

2. Blaszczynski, J. Landform characterization with geographical information 
systems Photogrammetric Engineering and Remote Sensing, 1997, 63, 183-191

3. Wood, W.F. & Snell, J.B. A Quantitative System for Classifying Landforms. 
U.S. Quartermaster Research & Engineering Center, 1960

4. Bivand, R. Integrating GRASS 5.0 and R: GIS and modern statistics Computers 
& Geosciences, 2000, 26, 1043–1052


On Friday 01 December 2006 14:04, Massimo Di Stefano wrote:
> hi to all,
> frustrated by this error, today I bought a 1 GB memory
> module for my laptop;
> now it has 1.28 GB instead of the old 512 MB, but I get the
> same error :-(
> damn! damn! what can I do?
> repeating it for a little area (about 20x20 km and res=20m)
> works fine!
> have you any suggestion?
> is there a way to check whether this error depends on my
> RAM or something else?
> thanks for any suggestion!
> I need your help.
> thanks.
> Massimo
>
>
> Il giorno 01/dic/06, alle ore 16:05, massimodisasha ha
> scritto:
> hi,
> i'm trying to perform a clustering on a big dataframe
> the code is this:
>
>
> print("load required R packages")
>
> require(spgrass6)
>
> require(cluster)
>
> gmeta6 <- gmeta6()
>
> print("read in our 7 raster files from GRASS")
>
> x <-
> readFLOAT6sp(c("er","crosc","longc","slope","profc","minic","maxic"))
>
> print("assemble a matrix of our terrain variables")
>
> morph <- data.frame(cbind(x$er, x$crosc, x$longc,
> x$slope, x$profc, x$minic, x$maxic))
>
> print("normailize slope by dividing my max(slope)")
>
> morph <- data.frame(cbind(x$er, x$crosc, x$longc,
> x$slope/max(x$slope), x$profc, x$minic, x$maxic))
>
> names(morph) <-
> c("er","crosc","longc","slope_n","profc","minic","maxic")
>
> print("perform the clustering")
>
> morph.clara <- clara(morph, k=5, stand=F)
>
> x$morph_class <- morph.clara$clustering
>
> print("send result back to GRASS")
>
> rast.put6(x,"morph", zcol="morph_class")
>
>
>
> during the step : perform the clustering
> after a lot of time,
> i've this error:
>
>
>
>
> Error in sprintf(fmt, ...) : string length exceeds the
> 8192-byte buffer size
> In addition: Warning messages:
> 1: perl = TRUE is implemented only in UTF-8 locales
> 2: perl = TRUE is implemented only in UTF-8 locales
> 3: perl = TRUE is implemented only in UTF-8 locales
> 4: perl = TRUE is implemented only in UTF-8 locales
> 5: perl = TRUE is implemented only in UTF-8 locales
> 6: perl = TRUE is implemented only in UTF-8 locales
> 7: perl = TRUE is implemented only in UTF-8 locales
> 8: the character string will probably be
> truncated
> Execution halted
>
>
>
> if I try the same code on a subregion of my data, it
> works fine!
> but for a large region I get this error :-(
>
> obviously I think it is a memory problem, right?
> (I'm working with a PPC notebook: 1.33 GHz, 512 MB RAM)
> my data are: 7 raster maps of a region of about 50x40
> km at a resolution of 20 m.
> is there some workaround for the memory problems?
>
> another question is:
> what is this:
> Warning messages:
> 1: perl = TRUE is implemented only in UTF-8 locales
> 2: perl = TRUE is implemented only in UTF-8 locales
> 3: perl = TRUE is implemented only in UTF-8 locales
> 4: perl = TRUE is implemented only in UTF-8 locales
> 5: perl = TRUE is implemented only in UTF-8 locales
> 6: perl = TR

[R] evaluation of 2 matrices: categorical comparison

2006-11-07 Thread Dylan Beaudette
Greetings,

I have two matrices, read in from raster data stored in GRASS. Each matrix 
represents the output of an aggregation process on categorical data (nominal / 
ordinal), derived from imagery.

I have compared these two datasets by the following methods:

* approaches that pretend the data are continuous (flawed, I am sure...)
1. computing a difference map by taking the absolute difference 
(cell-by-cell) between the two outputs

2. simple linear regression / scatter plot of the two outputs

3. 2D histogram via hist2d() in the gplots package


*** categorical comparisons
4. conversion of each matrix to a vector of factors, and then:

5. frequency tabulation for each level

6. mosaicplot

7. computation of Cohen's Kappa (using the cohen.kappa function from 
the 'concord' package) :
# a and b are vectors with the 'factor-ized' matrices, stored as vectors
cohen.kappa(table(a,b), type='count')


A lengthy summary of this small adventure has been recorded here:
http://casoilresource.lawr.ucdavis.edu/drupal/node/275/

Please note that I am looking for a *better?* way to compare these two 
matrices. It could be that Cohen's kappa and difference maps are the 
best that I am going to get.

Thanks in advance,


-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] any good way to convert sp class objects to splancs object

2006-10-25 Thread Dylan Beaudette
On Wednesday 25 October 2006 12:02, Roger Bivand wrote:
> On Wed, 25 Oct 2006, Dylan Beaudette wrote:
> > Greetings:
> >
> > are there any simple ways to convert an sp class object to a splancs ppp
> > class object, outside of reading the coordinates and computing a
> > bounding box?
>
> Aren't ppp class objects from the spatstat package? Have you tried the
> spspatstat wrapper package on the R-spatial sourceforge repository? The
> as() coerce method should get you there.
>
> If you really mean splancs, there are not wrappers yet, but are on their
> way.
>
> (Consider posting on R-sig-geo for specific questions like this).
>
> Roger
>
> > cheers,


Thanks for the reply Roger.

I made a slight mistake. I meant to say:

convert an sp class object (point / polygon) to a *spatstat* ppp class object.

In the meantime I had come up with the following hack:

# extract data from sp coords slot
# extract data from sp bbox slot

#manually build ppp class object with the ppp() function.

# note that in spatstat the ordering of vertices in a polygon must be in
# counter-clockwise direction, so I had to reverse my vertex vectors.
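
In code, the hack looks roughly like this (a sketch; 'pts' stands in for a 
hypothetical SpatialPoints object):

library(sp)
library(spatstat)
xy <- coordinates(pts)   # data from the sp coords slot
bb <- bbox(pts)          # data from the sp bbox slot
p  <- ppp(xy[, 1], xy[, 2], window=owin(xrange=bb[1, ], yrange=bb[2, ]))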

I will post the worked-up examples with data in a bit.

thanks!


-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] any good way to convert sp class objects to splancs object

2006-10-25 Thread Dylan Beaudette
Greetings:

are there any simple ways to convert an sp class object to a splancs ppp class 
object, outside of reading the coordinates and computing a bounding box?

cheers,
-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] compiling rgdal package on windows / macos

2006-10-11 Thread Dylan Beaudette
On Thursday 05 October 2006 02:06, Roger Bivand wrote:
> On Wed, 4 Oct 2006, Dylan Beaudette wrote:
> > Greetings:
> >
> > As I am not a windows user, I cannot try this: is it possible to install
> > rgdal on windows without having to compile it from source ?
>
> Andy Jaworski already replied that rgdal Windows binaries are available
> from CRAN mirrors, thanks to Uwe Ligges.
>
> > Compilation on MacOS is within my abilities, however each time i try and
> > install the rgdal package it dies complaining that it cannot find
> > gdal-config --- which was recently installed with GRASS. I have updated
> > my PATH environment variable, logged out, but R still cannot find the
> > gdal-config program.
>
> For Mac OSX, please consult the detailed instructions on:
>
> http://www.r-project.org/Rgeo -> Maps (in the left navigation bar).
>
> Hint: see the configure.args= argument of install.packages() and use
> --with-gdal-config= to get the address right.
>
> No Mac OSX binaries will be provided from CRAN, there are simply too many
> Mac OSX varieties out there to be sure that the installed external
> software harmonises with the rgdal binary build. However, one way of
> getting there is referenced in Rgeo, using William Kyngesburye's
> frameworks (rgdal binaries are provided in harmony with PROJ.4 and GDAL).
>
> By the way:
>
> RSiteSearch("rgdal macosx")
>
> first hit takes you to a recent reply on this topic on the R-sig-geo list.
>
> > any tips on getting the rgdal package up and running on MacOS or Windows
> > would be greatly appreciated.
> >
> > Cheers,

Thanks Roger. I had been reading some documentation regarding the rgdal 
package, and thought I would put out a question as soon as possible. Thanks 
for all of the hard work!

PS: I asked this question as I am currently leading a seminar on open source 
software tools for soil scientists, and wanted to highlight the GRASS-R 
coupling. Since many of the students are using Macs, I wanted to make sure 
that I would be able to give them an example that they could implement 
themselves. 
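
For the archive, Roger's configure.args hint translates to something like the 
following (the gdal-config path is hypothetical; adjust it to wherever GRASS 
installed GDAL):

install.packages('rgdal', type='source',
  configure.args='--with-gdal-config=/usr/local/bin/gdal-config')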

Cheers,

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] compiling rgdal package on windows / macos

2006-10-04 Thread Dylan Beaudette
Greetings:

As I am not a Windows user, I cannot try this: is it possible to install rgdal 
on Windows without having to compile it from source?

Compilation on MacOS is within my abilities; however, each time I try to 
install the rgdal package it dies complaining that it cannot find 
gdal-config, which was recently installed with GRASS. I have updated my 
PATH environment variable and logged out, but R still cannot find the 
gdal-config program.

Any tips on getting the rgdal package up and running on MacOS or Windows 
would be greatly appreciated.

Cheers,


-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mysterious error on compile R 2.3.1

2006-09-20 Thread Dylan Beaudette
On Wednesday 20 September 2006 12:38, Peter Dalgaard wrote:
> Dylan Beaudette <[EMAIL PROTECTED]> writes:
> > Getting a very strange error with a new install of R from source on x86;
> >
> > make[3]: Leaving directory `/tmp/R.INSTALL.r20887/cluster/src'
> > ** R
> > ** data
> > **  moving datasets to lazyload DB
> > Error in factor(c(1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1), 
> > : invalid labels; length 2 should be 1 or 1
> > Execution halted
> > ERROR: lazydata failed for package 'cluster'
> > ** Removing '/home/dylan/src/R-2.3.1/library/cluster'
> > make[2]: *** [cluster.ts] Error 1
> > make[2]: Leaving directory
> > `/home/dylan/src/R-2.3.1/src/library/Recommended' make[1]: ***
> > [recommended-packages] Error 2
> > make[1]: Leaving directory
> > `/home/dylan/src/R-2.3.1/src/library/Recommended' make: ***
> > [stamp-recommended] Error 2
> >
> > note that i am using the GCC flags:
> > CFLAGS=-march=opteron -ffast-math
> > CXXFLAGS=-march=opteron -ffast-math
> >
> > any ideas?
>
> Don't do that
>
> Seriously!
>
> -ffast-math will allow the compiler to break IEEE math specifications,
>  which in turn will break R all over the place.

Yikes! I won't do that anymore.

This fixed the problem. Thanks!

cheers,

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mysterious error on compile R 2.3.1

2006-09-20 Thread Dylan Beaudette
Getting a very strange error with a new install of R from source on x86:

make[3]: Leaving directory `/tmp/R.INSTALL.r20887/cluster/src'
** R
** data
**  moving datasets to lazyload DB
Error in factor(c(1, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1),  : 
invalid labels; length 2 should be 1 or 1
Execution halted
ERROR: lazydata failed for package 'cluster'
** Removing '/home/dylan/src/R-2.3.1/library/cluster'
make[2]: *** [cluster.ts] Error 1
make[2]: Leaving directory `/home/dylan/src/R-2.3.1/src/library/Recommended'
make[1]: *** [recommended-packages] Error 2
make[1]: Leaving directory `/home/dylan/src/R-2.3.1/src/library/Recommended'
make: *** [stamp-recommended] Error 2

note that i am using the GCC flags:
CFLAGS=-march=opteron -ffast-math
CXXFLAGS=-march=opteron -ffast-math

any ideas?

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] questions regarding spline functions

2006-08-01 Thread Dylan Beaudette
 examples
lines(spline(y.2 ~ x.2), col="blue", lty=2)
lines(new_x.2, new_y.2, col='red', lty=2)
lines(ispline_y.2, col='green', lty=2)

legend(8.4,24, legend=c('soil data', 'splines()', 'interpSpline()', 'bs()'), 
col=c('black', 'blue', 'red', 'green'), lty=c(1,1,1,1), lwd=c(2,1,1,1), 
cex=0.7)


Cheers,

Dylan


> Brent
>
> D. Brenton Myers
> Graduate Fellow
> University of Missouri
> Soil Environmental and Atmospheric Sciences
> 269 Agriculture Engineering Building
> Columbia, MO 65211
> Work: (573) 882-1146
> Home: (573) 446-6856
> Cell: (573) 881-6855
> email: [EMAIL PROTECTED]
>
>
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Dylan Beaudette
> Sent: Monday, July 31, 2006 6:43 PM
> To: RHELP
> Subject: [R] questions regarding spline functions
>
> Greetings,
>
> A couple general questions regarding the use of splines to interpolate
> depth
> profile data.
>
> Here is an example of a set of depths, with associated attributes for a
> given
> soil profile, along  with a function for calculating midpoints from a
> set of
> soil horizon boundaries:
>
> # calculate midpoints (note: 'a' must be initialized before indexing)
> mid <- function(x) {
>   a <- rep(NA, length(x))
>   for (i in 1:length(x)) {
>     if (i > 1) {
>       a[i] <- (x[i] - x[i-1]) / 2 + x[i-1]
>     }
>   }
>   # return the results
>   a[which(!is.na(a))]
> }
>
> #horizon depth bounds
> z <- c(0,2,18,24,68,160,170,192,200)
>
> #horizon midpoints, associated with horizon attribute
> x <- mid(z)
>
> #clay pct
> y <- c(0,1,2,2,4,7,6,1)
>
> #plot them
> plot(y ~ x, xlab="Depth", ylab="Percent Clay", type="s")
> points(y ~ x, cex=0.5, pch=16)
>
> These point pairs usually represent a trend with depth, which I would
> like to
> model with splines - or some similar approach, as they have been found
> to
> work better than other methods such as a fitted polynomial.
>
> Using the B Spline function from the 'splines' package, it is possible
> to fit
> a model of some property with depth based on the bs() function:
>
> # natural B-splines
> library(splines)
>
> #fit a b-spline model:
> fm <- lm(y ~ bs(x, df=5) )
>
> I am able to predict a soil property with depth, at unsampled locations
> with
> this model with:
>
> new_x <-  seq(range(x)[1], range(x)[2], len = 200)
>
> #predict attribute at unsampled depths:
> new_y <- predict(fm, data.frame(x=new_x) )
>
> #plot the predicted attribute at the unsampled depths
> lines(new_x, new_y, col='red')
>
> This tends to work fairly well (see attached), but I am wondering if I
> can use
> the 'knots' parameter in the bs() function for incorporation of horizon
> boundary information into the creation of the spline. Moreover, it would
> be
> nice if the spline-based model 'fm' would predict a set of values with
> similar mean and range as the original data points: i.e
>
> #summary of clay values from original data:
> summary(y)
>Min. 1st Qu.  MedianMean 3rd Qu.Max.
>   0.000   1.000   2.000   2.875   4.500   7.00
>
> #see above
> summary(new_y)
> Min.  1st Qu.   Median Mean  3rd Qu. Max.
> -0.05786  2.09500  3.13200  3.62800  5.17100  7.08700
>
>
> This is based on an article I read :
> http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V67-3WWRDYY-3&_user=4421&_handle=V-WA-A-W-AU-MsSAYWW-UUA-U-AACZEDZWBC-AACVCCDUBC-BZAYUEWB-AU-U&_fmt=summary&_coverDate=08%2F31%2F1999&_rdoc=3&_orig=browse&_srch=%23toc%235807%231999%23999089998%23108393!&_cdi=5807&view=c&_acct=C000059598&_version=1&_urlVersion=0&_userid=4421&md5=488f1e114d8d64265ff65506e9587e71
>
> where the author talks about a so-called 'equal-area quadratic smoothing
> spline' approach to describing a soil property depth function.
> Unfortunately, the author did not provide sample code.
>
> Any thoughts / input would be greatly appreciated!
>
> Cheers,

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341


spline2.png
Description: PNG image
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] questions regarding spline functions

2006-07-31 Thread Dylan Beaudette
Greetings,

A couple general questions regarding the use of splines to interpolate depth 
profile data.

Here is an example of a set of depths, with associated attributes for a given 
soil profile, along  with a function for calculating midpoints from a set of 
soil horizon boundaries:

# calculate midpoints (note: 'a' must be initialized before indexing)
mid <- function(x) {
  a <- rep(NA, length(x))
  for (i in 1:length(x)) {
    if (i > 1) {
      a[i] <- (x[i] - x[i-1]) / 2 + x[i-1]
    }
  }
  # return the results
  a[which(!is.na(a))]
}

#horizon depth bounds
z <- c(0,2,18,24,68,160,170,192,200)

#horizon midpoints, associated with horizon attribute
x <- mid(z)

#clay pct
y <- c(0,1,2,2,4,7,6,1)

#plot them
plot(y ~ x, xlab="Depth", ylab="Percent Clay", type="s")
points(y ~ x, cex=0.5, pch=16)

These point pairs usually represent a trend with depth, which I would like to 
model with splines - or some similar approach, as they have been found to 
work better than other methods such as a fitted polynomial.

Using the B Spline function from the 'splines' package, it is possible to fit 
a model of some property with depth based on the bs() function:

# natural B-splines
library(splines)

#fit a b-spline model:
fm <- lm(y ~ bs(x, df=5) )

I am able to predict a soil property with depth, at unsampled locations with 
this model with:

new_x <-  seq(range(x)[1], range(x)[2], len = 200)

#predict attribute at unsampled depths:
new_y <- predict(fm, data.frame(x=new_x) )

#plot the predicted attribute at the unsampled depths
lines(new_x, new_y, col='red')

This tends to work fairly well (see attached), but I am wondering if I can use 
the 'knots' parameter in the bs() function for incorporation of horizon 
boundary information into the creation of the spline. Moreover, it would be 
nice if the spline-based model 'fm' would predict a set of values with 
similar mean and range as the original data points: i.e

#summary of clay values from original data:
summary(y)
   Min. 1st Qu.  MedianMean 3rd Qu.Max. 
  0.000   1.000   2.000   2.875   4.500   7.00

#see above
summary(new_y)
Min.  1st Qu.   Median Mean  3rd Qu. Max. 
-0.05786  2.09500  3.13200  3.62800  5.17100  7.08700
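
As a sketch of the 'knots' idea (my own guess, not something from the article; 
with only 8 observations the number of interior knots must stay small), 
selected horizon boundaries could be passed as interior knots:

fm2 <- lm(y ~ bs(x, knots=c(24, 68, 160)))
new_y2 <- predict(fm2, data.frame(x=new_x))
lines(new_x, new_y2, col='blue')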


This is based on an article I read :
http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V67-3WWRDYY-3&_user=4421&_handle=V-WA-A-W-AU-MsSAYWW-UUA-U-AACZEDZWBC-AACVCCDUBC-BZAYUEWB-AU-U&_fmt=summary&_coverDate=08%2F31%2F1999&_rdoc=3&_orig=browse&_srch=%23toc%235807%231999%23999089998%23108393!&_cdi=5807&view=c&_acct=C59598&_version=1&_urlVersion=0&_userid=4421&md5=488f1e114d8d64265ff65506e9587e71

where the author talks about a so-called 'equal-area quadratic smoothing 
spline' approach to describing a soil property depth function. Unfortunately, 
the author did not provide sample code.

Any thoughts / input would be greatly appreciated!

Cheers,

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341


spline1.png
Description: PNG image
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to use large data set ?

2006-07-19 Thread Dylan Beaudette
>
> >
> >
> > Yohan C.
> >
> >
> >
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coloring leaves in a hclust or dendrogram plot [solved]

2006-06-06 Thread Dylan Beaudette
On Saturday 11 March 2006 07:37, Romain Francois wrote:
> On 11.03.2006 15:56, Gabor Grothendieck wrote:
> > Where does one find A2Rplot that is called in that example?
> >
> > A2R does not appear to be on CRAN.  I did find this:
> >
> >   http://addictedtor.free.fr/Download/A2R.zip
> >
> > which is a zip file containing some R code but it is not a package,
> > which the line library(A2R) seems to need, and it does not include
> > A2Rplot in any case.
>
> (...)
>
> Hi Gabor and all,
>
> the A2R package is really highly experimental right now, and I don't have
> much time to improve it to the point where it could be committed to CRAN.
> For now, the source of the package is here:
> http://addictedtor.free.fr/packages/A2R/
> For people who can't build it, it is possible to source the files from
> here:
> http://addictedtor.free.fr/packages/A2R/lastVersion/R/
>
> Also, I am (slowly) preparing an rgg package, so the content of A2R will
> probably move there.
>
> Romain

Thanks for the update Romain,

I have successfully installed the A2R package, and I nearly have a working 
graph from A2Rplot. Sample output is here:

http://169.237.35.250/~dylan/temp/a2rplot.pdf

However, the labels at the bottom of the dendrogram are truncated. Is there 
any way to prevent this - some parameter that I can pass to A2Rplot()?

Also, is it possible to add more factors to the bottom of the plot?

Thanks!

Dylan




-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] standardization of values before call to pam() or clara()

2006-05-22 Thread Dylan Beaudette
Greetings,

I am experimenting with the cluster package, and am starting to scratch my 
head with regard to the *best* way to standardize my data. Both functions can 
pre-standardize columns in a data frame; according to the manual:

Measurements are standardized for each variable (column), by subtracting the 
variable's mean value and dividing by the variable's mean absolute deviation. 

This works well when the input variables are all in the same units. When I 
include new variables with a different intrinsic range, the ones with the 
largest relative values tend to be _weighted_ more heavily. This is certainly 
not surprising, but it complicates things. 

Does there exist a robust technique to re-scale each of the variables, 
regardless of their intrinsic range, to some set range, say [0,1]?

I have tried dividing a variable by the maximum value of that variable, but I 
am not sure if this is statistically correct. 
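
In case it helps, a minimal sketch of a min-max rescaling to [0,1], applied 
column-wise to a data frame 'd' (the helper name 'rescale01' is made up):

# rescale a numeric vector to the range [0,1]
rescale01 <- function(v) (v - min(v, na.rm=TRUE)) / diff(range(v, na.rm=TRUE))

# apply to every column of a data frame 'd' of numeric variables
d_scaled <- as.data.frame(lapply(d, rescale01))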

Any ideas, thoughts would be greatly appreciated.

Cheers,

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] passing known medoids to clara() in the cluster package

2006-05-17 Thread Dylan Beaudette
Martin,

Just wanted to check on the status of including known medoids into calls to 
the clara() function within the cluster package.

Cheers,

Dylan

On Monday 10 April 2006 14:25, Dylan Beaudette wrote:
> Thanks for the reply.
>
> On Sunday 09 April 2006 11:46 pm, Martin Maechler wrote:
> > >>>>> "DylanB" == Dylan Beaudette <[EMAIL PROTECTED]>
> > >>>>> on Sun, 9 Apr 2006 19:28:44 -0700 writes:
> >
> > DylanB> Greetings, I have had good success using the clara()
> > DylanB> function to perform a simple cluster analysis on a
> > DylanB> large dataset (1 million+ records with 9 variables).
> >
> > DylanB> Since the clara function is a wrapper to pam(),
> > DylanB> which will accept known medoid data - I am wondering
> > DylanB> if this too is possible with clara() ... The
> > DylanB> documentation does not suggest that this is
> > DylanB> possible.
> >
> > indeed, it doesn't --  because it's not yet possible.
> > I (as maintainer of "cluster") had added the ``known medoid''
> > option to pam() a while ago last June (for  cluster version 1.10.0),
> > and had left a note in my TODO file to do the same for clara().
>
> Ah, that would explain things! :)  I will check back periodically to see
> when this feature is completed.
>
> > Unfortunately it's not true that clara() was a wrapper to pam()
> > as you state above.
>
> I must have misread the manual pages...
>
> > Given your wish and clear "use case" situation, I'm more
> > motivated to approach this particular 'TODO' item!
> >
> > Martin Maechler, ETH Zurich
> >
> > DylanB> Essentially I am trying to implement a "supervised
> > DylanB> classification" of numerous geographic data
> > DylanB> layers. The "unsupervised" approach using clara()
> > DylanB> works well, but I feel the output classes would be
> > DylanB> more meaningful if I were able to let clara() know
> > DylanB> about the classes that I have in mind.
> >
> > DylanB> Is this at all feasible, or am I trying to
> > DylanB> accomplish something that is not possible?
>
> Thanks Martin!
>
> I will give pam() a try, and see if it can handle the large dataset that I
> am currently using clara() for -- usually only about 5 seconds are required
> for clara() to complete.

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] passing known medoids to clara() in the cluster package

2006-04-10 Thread Dylan Beaudette
Thanks for the reply.

On Sunday 09 April 2006 11:46 pm, Martin Maechler wrote:
> >>>>> "DylanB" == Dylan Beaudette <[EMAIL PROTECTED]>
> >>>>> on Sun, 9 Apr 2006 19:28:44 -0700 writes:
>
> DylanB> Greetings, I have had good success using the clara()
> DylanB> function to perform a simple cluster analysis on a
> DylanB> large dataset (1 million+ records with 9 variables).
>
> DylanB> Since the clara function is a wrapper to pam(),
> DylanB> which will accept known medoid data - I am wondering
> DylanB> if this too is possible with clara() ... The
> DylanB> documentation does not suggest that this is
> DylanB> possible.
>
> indeed, it doesn't --  because it's not yet possible.
> I (as maintainer of "cluster") had added the ``known medoid''
> option to pam() a while ago last June (for  cluster version 1.10.0),
> and had left a note in my TODO file to do the same for clara().

Ah, that would explain things! :)  I will check back periodically to see 
when this feature is completed.

> Unfortunately it's not true that clara() was a wrapper to pam()
> as you state above.

I must have misread the manual pages...

> Given your wish and clear "use case" situation, I'm more
> motivated to approach this particular 'TODO' item!
>
> Martin Maechler, ETH Zurich
>
> DylanB> Essentially I am trying to implement a "supervised
> DylanB> classification" of numerous geographic data
> DylanB> layers. The "unsupervised" approach using clara()
> DylanB> works well, but I feel the output classes would be
> DylanB> more meaningful if I were able to let clara() know
> DylanB> about the classes that I have in mind.
>
> DylanB> Is this at all feasible, or am I trying to
>     DylanB> accomplish something that is not possible?

Thanks Martin! 

I will give pam() a try, and see if it can handle the large dataset that I am 
currently using clara() for -- usually only about 5 seconds are required for 
clara() to complete.
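
For reference, a hedged sketch of the known-medoid interface to pam() that 
Martin mentions (cluster >= 1.10.0); the data object 'dat' and the row 
indices are placeholders:

library(cluster)

# indices of observations believed to typify each class - illustrative only
known <- c(10, 5000, 20000)
fit <- pam(dat, k=length(known), medoids=known)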

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] passing known medoids to clara() in the cluster package

2006-04-09 Thread Dylan Beaudette
Greetings,

I have had good success using the clara() function to perform a simple cluster 
analysis on a large dataset (1 million+ records with 9 variables). 

Since the clara() function is a wrapper to pam(), which will accept known 
medoid data, I am wondering if this is also possible with clara(). The 
documentation does not suggest that it is. 

Essentially I am trying to implement a "supervised classification" of numerous 
geographic data layers. The "unsupervised" approach using clara() works well, 
but I feel the output classes would be more meaningful if I were able to let 
clara() know about the classes that I have in mind.

Is this at all feasible, or am I trying to accomplish something that is not 
possible?

Cheers,


-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] transform shapefiles in UTM to lat/long

2006-03-21 Thread Dylan Beaudette
On Tuesday 21 March 2006 08:13 am, Fredy O. Mejia wrote:
> Dear all:
>
> I have a shapefile in the UTM coordinate system and I would like to
> transform it to Lat/Long coordinates (WGS84).  The package PBSmapping
> contains the function convUL to transform between the two coordinate
> systems when the data is in the form of a data frame with attributes
> specifying the coordinate system. However, when shapefiles are imported
> using the function read.shape (package maptools), a list instead of a
> data frame is generated.  I could extract all the polygons in the list
> and transform them individually, but I hope there is an easier way of
> doing that.  Besides, the shapefile I'm trying to convert spans two
> different UTM zones (it is from Guatemala, and it covers zones 15 and
> 16).  Any suggestions would be greatly appreciated.
>
> Thanks,
>
> Fredy

Fredy,

I would recommend performing the inverse projection with the GDAL/OGR library 
first. 
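
For example, something along these lines could be run from the shell (here 
wrapped in system() from R); the file names and zone number are illustrative, 
and each UTM zone would need its own pass:

# reproject a UTM zone 15 shapefile to geographic WGS84 coordinates
# with the ogr2ogr command-line tool (file names are hypothetical)
system(paste(
  "ogr2ogr",
  "-s_srs '+proj=utm +zone=15 +datum=WGS84'",
  "-t_srs '+proj=longlat +datum=WGS84'",
  "guatemala_wgs84.shp guatemala_utm15.shp"
))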


-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] coloring leaves in a hclust or dendrogram plot [solved]

2006-03-10 Thread Dylan Beaudette
On Thursday 09 March 2006 06:12 pm, Dylan Beaudette wrote:
> Greetings,
>
> I have perused the r-help mailing list archives for an answer to this
> question, to no avail.
>
> I would like to color the "leaves" of a dendrogram plot based on a cutoff
> in one of the variables involved in the initial clustering.
>
> My input data is in the form of:
>                   B         K
> Alameda   0.2475770 0.7524230
> Alpine    0.4546784 0.5453216
> Amador    0.6278610 0.3721390
>
> essentially rows labeled by county name, with two variables: percent voted
> for B and percent voted for K. While it is obvious that this is somewhat of
> a contrived example, I intend to use this as a learning device.
>
> Here is the code used to create and plot the dendrogram:
> hc <- hclust(dist(y), "ave")
> dend <- as.dendrogram(hc)
> plot(dend, main="CA 2004 Election Results by County")
>
> An example of the output can be found here:
> http://casoilresource.lawr.ucdavis.edu/drupal/node/206?size=_original
>
>
> I have experimented with the edgePar and nodePar parameters for the
> plot.dendrogram() method, but have not been able to make sense of the
> output.
>
> The basis for setting the colors of the leaves in the dendrogram is a
> simple majority calculation:
>
> reds <- y[y$B > 0.5, ]
> blues <- y[y$K > 0.5, ]
>
> The idea is that leaves in the tree will be colored based on membership
> in either of the two groups above.
>
> Is there a resource documenting how this might be accomplished?
>
> Any thoughts or ideas would be greatly appreciated.
>
> Cheers,
>
> Dylan

Replying to my own post...

I discovered the dendrapply() function:

reds <<- as.factor(row.names(y[y$B > 0.5, ]))
blues <<- as.factor(row.names(y[y$K > 0.5, ]))

#define a function for coloring and sizing node elements:
colLab <- function(n) {
  if (is.leaf(n)) {
    a <- attributes(n)
    if (length(which(blues == a$label)) == 1) {
      attr(n, "nodePar") <- c(a$nodePar,
        list(lab.col="blue", lab.cex=0.7, col="blue", cex=pop[n], pch=16))
    } else {
      attr(n, "nodePar") <- c(a$nodePar,
        list(lab.col="red", lab.cex=0.7, col="red", cex=pop[n], pch=16))
    }
  }
  n
}

# modify dendrogram nodes and re-plot
dend_colored <- dendrapply(dend, colLab)
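# the re-plot call itself was presumably omitted from the post:
plot(dend_colored, main="CA 2004 Election Results by County")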

...which did the trick

http://casoilresource.lawr.ucdavis.edu/drupal/node/210



-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] coloring leaves in a hclust or dendrogram plot

2006-03-09 Thread Dylan Beaudette
Greetings,

I have perused the r-help mailing list archives for an answer to this 
question, to no avail. 

I would like to color the "leaves" of a dendrogram plot based on a cutoff in 
one of the variables involved in the initial clustering.

My input data is in the form of:
                  B         K
Alameda   0.2475770 0.7524230
Alpine    0.4546784 0.5453216
Amador    0.6278610 0.3721390

essentially rows labeled by county name, with two variables: percent voted for 
B and percent voted for K. While it is obvious that this is somewhat of a 
contrived example, I intend to use this as a learning device.

Here is the code used to create and plot the dendrogram:
hc <- hclust(dist(y), "ave")
dend <- as.dendrogram(hc)
plot(dend, main="CA 2004 Election Results by County")

An example of the output can be found here:
http://casoilresource.lawr.ucdavis.edu/drupal/node/206?size=_original


I have experimented with the edgePar and nodePar parameters for the 
plot.dendrogram() method, but have not been able to make sense of the output. 

The basis for setting the colors of the leaves in the dendrogram is a simple 
majority calculation:

reds <- y[y$B > 0.5, ]
blues <- y[y$K > 0.5, ]

The idea is that leaves in the tree will be colored based on membership in 
either of the two groups above.

Is there a resource documenting how this might be accomplished? 

Any thoughts or ideas would be greatly appreciated.

Cheers,

Dylan






-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] extracting RGB values from a colorspace class object

2006-03-02 Thread Dylan Beaudette
Greetings,

After poring over the documentation for the 'colorspace' package, I have not 
been able to figure out how the plot() method converts colorspace coordinates 
to RGB values for display on the screen. I can convert between colorspaces 
with the various as() methods, but cannot seem to find a way to extract RGB 
triplets (i.e. for display on a computer screen) from color space 
coordinates.

Here is a sample of the data that I am working with:
    H    V  C       x       y      Y
 2.5Y  0.2  2    1.43    0.97  0.237
 2.5Y  0.4  2   0.729   0.588  0.467
 2.5Y  0.6  2   0.563   0.491  0.699
 2.5Y  0.6  4   0.995   0.734  0.699
 2.5Y  0.8  2   0.479   0.439  0.943
 2.5Y  0.8  4    0.82    0.64  0.943
 2.5Y    1  2  0.4362  0.4177   1.21

I have converted xyY coordinates to XYZ coordinates, based on the equations 
presented here:
http://www.brucelindbloom.com/index.html?ColorCalcHelp.html

# read in the soil colors: Munsell + xyY
soil <- read.table("soil_colors", header=FALSE,
                   col.names=c("H","V","C","x","y","Y"))

#convert to XYZ
X <- (soil$x * soil$Y ) / soil$y
Y <- soil$Y
Z <- ( (1- soil$x - soil$y) * soil$Y ) / soil$y

#make an XYZ colorspace object:
require(colorspace)
soil_XYZ <- XYZ(X,Y,Z)

#visualize the new XYZ colorspace object in the L*U*V colorspace:
plot(as(soil_XYZ, "LUV"), cex=2)

here is an example of the output:
http://169.237.35.250/~dylan/temp/soil_colors-LUV_space.png

Somehow the plot() method in the colorspace package is converting color space 
coordinates to their RGB values... Does anyone have an idea as to how to 
access these RGB triplets?
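
One hedged guess, assuming the coords() and hex() helpers in 'colorspace' 
apply here:

# convert to RGB coordinates, then extract the numeric triplets
soil_RGB <- as(soil_XYZ, "RGB")
head(coords(soil_RGB))        # R, G, B values, nominally in [0,1]

# or hex strings, convenient as col= values in plotting functions
hex(soil_RGB, fixup=TRUE)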

Thanks in advance!




-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] understanding patterns in categorical vs. continuous data

2006-01-27 Thread Dylan Beaudette
Thanks to all for the helpful suggestions; I was able to get a good start 
from there.

Cheers,

Dylan

On Thursday 26 January 2006 12:03 pm, Gabor Grothendieck wrote:
> Would this do?
>
> boxplot(Sepal.Length ~ Species, iris, horizontal = TRUE)
> library(Hmisc)
> summary(Sepal.Length ~ Species, iris, fun = summary)
>
> On 1/26/06, Dylan Beaudette <[EMAIL PROTECTED]> wrote:
> > Greetings,
> >
> > I have a set of bivariate data: one variable (vegetation type) which is
> > categorical, and one (computed annual insolation) which is continuous.
> > Plotting veg_type ~ insolation produces a nice overview of the patterns
> > that I can see in the source data. However, due to the large number of
> > samples (1,000) and the apparent "spread" in the distribution of a
> > single vegetation type over a range of insolation values, I am having a
> > hard time quantitatively describing the relationship between the two
> > variables.
> >
> > Here is a link to a sample graph:
> > http://casoilresource.lawr.ucdavis.edu/drupal/node/162
> >
> > Since the data along each vegetation type "line" is not a distribution in
> > the traditional sense, I am having problems applying descriptive
> > statistical methods. Conceptually, I would like to somehow describe the
> > variation with insolation, along each vegetation type "line".
> >
> > Any guidance, or suggested reading material would be greatly appreciated.
> >
> >
> > --
> > Dylan Beaudette
> > Soils and Biogeochemistry Graduate Group
> > University of California at Davis
> > 530.754.7341
> >
> > ______
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] understanding patterns in categorical vs. continuous data

2006-01-26 Thread Dylan Beaudette
Greetings,

I have a set of bivariate data: one variable (vegetation type) which is 
categorical, and one (computed annual insolation) which is continuous. 
Plotting veg_type ~ insolation produces a nice overview of the patterns that 
I can see in the source data. However, due to the large number of samples 
(1,000) and the apparent "spread" in the distribution of a single vegetation 
type over a range of insolation values, I am having a hard time quantitatively 
describing the relationship between the two variables. 

Here is a link to a sample graph:
http://casoilresource.lawr.ucdavis.edu/drupal/node/162

Since the data along each vegetation type "line" is not a distribution in the 
traditional sense, I am having problems applying descriptive statistical 
methods. Conceptually, I would like to somehow describe the variation with 
insolation along each vegetation type "line".
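
Something as simple as per-class summaries may be a starting point - a 
sketch, assuming a data frame 'd' with (placeholder) columns 'veg_type' and 
'insolation':

# distribution of insolation within each vegetation type
with(d, tapply(insolation, veg_type, summary))

# and the within-type spread
with(d, tapply(insolation, veg_type, sd))

but I suspect there are more appropriate descriptive tools.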

Any guidance, or suggested reading material would be greatly appreciated.


-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] overlay additional axes

2005-11-28 Thread Dylan Beaudette
Greetings,

I am trying to add an extra labeled axis in position 3 (the top x-axis), with 
numbers that do not match up with the existing axes.

Surely this must be possible, and I am just doing it incorrectly.

So far I have tried the following:
#make a plot
plot(TIK, type="l", cex=.25, xlim=c(2,32), ylim=c(0,1600))

#try and add a new axis with different numbers in position 3
axis(3,0.154/(2*sin(TIK[,1]/2*pi/180)))

...obviously the nature of the numbers on the two axes is quite different. 

Is it possible to have the bottom axis (degrees 2-theta) line up with a 
corresponding top axis of 0.154/(2*sin(TIK[,1]/2*pi/180)) ?
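
Perhaps the trick is to place the top-axis tick marks at positions expressed 
in the bottom-axis units, but label them with the transformed values - a 
sketch, with illustrative tick positions:

# tick positions in the bottom-axis units (degrees 2-theta)
two_theta <- seq(2, 32, by=5)

# corresponding d-spacings via the transformation above
d_spacing <- 0.154/(2*sin(two_theta/2*pi/180))

# draw the top axis at those positions, labeled with the rounded d-spacings
axis(3, at=two_theta, labels=round(d_spacing, 3))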

thanks in advance!

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] finding peaks in a simple dataset with R

2005-11-26 Thread Dylan Beaudette
On Nov 25, 2005, at 10:27 AM, Martin Maechler wrote:

> Let me try to summarize my view on this:
>
> - I still think it would make sense to have a *simple* peaks() function
>   in R which provides the same (or more) functionality as the
>   corresponding S-plus one.
>   For a proper data analysis situation, I think one would have to
>   do something more sophisticated, based on a model (with a random
>   component), such as nonparametric regression or time-series methods.
>   Hence peaks() should be kept as simple as reasonable.
>
> - Of course I know that  which() or %in% can be used to deal
>   with logicals containing NAs {As a matter of fact, I've had
>   my fingers in both implementations for R!}.
>   Still, the main use of logical vectors in S often is for
>   situations where NAs only appear because of missing data:
>
>   Indexing ([]), all(), any(), sum()  are all very nice and
>   useful for logical vectors particularly when there are no NAs.
>
> - I agree that a different more flexible function returning
>   values from {-1,0,1} would be desirable, "for symmetry reasons".
>   ===> added a peaksign() function
>
> Here's code that implements the above {and other concerns
> mentioned in this thread}, including some ``consistency
> checking'' :
>
> peaks <- function(series, span = 3, do.pad = TRUE) {
> if((span <- as.integer(span)) %% 2 != 1) stop("'span' must be odd")
> s1 <- 1:1 + (s <- span %/% 2)
> if(span == 1) return(rep.int(TRUE, length(series)))
> z <- embed(series, span)
> v <- apply(z[,s1] > z[, -s1, drop=FALSE], 1, all)
> if(do.pad) {
> pad <- rep.int(FALSE, s)
> c(pad, v, pad)
> } else v
> }
>
> peaksign <- function(series, span = 3, do.pad = TRUE)
> {
> ## Purpose: return (-1 / 0 / 1) if series[i] is (trough / "normal" / peak)
> ## ------------------------------------------------------------------------
> ## Author: Martin Maechler, Date: 25 Nov 2005
>
> if((span <- as.integer(span)) %% 2 != 1 || span == 1)
> stop("'span' must be odd and >= 3")
> s1 <- 1:1 + (s <- span %/% 2)
> z <- embed(series, span)
> d <- z[,s1] - z[, -s1, drop=FALSE]
> ans <- rep.int(0:0, nrow(d))
> ans[apply(d > 0, 1, all)] <- as.integer(1)
> ans[apply(d < 0, 1, all)] <- as.integer(-1)
> if(do.pad) {
> pad <- rep.int(0:0, s)
> c(pad, ans, pad)
> } else ans
> }
>
>
> check.pks <- function(y, span = 3)
> stopifnot(identical(peaks( y, span), peaksign(y, span) ==  1),
>   identical(peaks(-y, span), peaksign(y, span) == -1))
>
> for(y in list(1:10, rep(1,10), c(11,2,2,3,4,4,6,6,6))) {
> for(sp in c(3,5,7))
> check.pks(y, span = sp)
> stopifnot(peaksign(y) == 0)
> }
>
> y <- c(1,4,1,1,6,1,5,1,1) ; (ii <- which(peaks(y))); y[ii]
> ##- [1] 2 5 7
> ##- [1] 4 6 5
> check.pks(y)
>
> set.seed(7)
> y <- rpois(100, lambda = 7)
> check.pks(y)
> py <- peaks(y)
> plot(y, type="o", cex = 1/4, main = "y and peaks(y,3)")
> points(seq(y)[py], y[py], col = 2, cex = 1.5)
>
> p7 <- peaks(y,7)
> points(seq(y)[p7], y[p7], col = 3, cex = 2)
> mtext("peaks(y,7)", col=3)
>
> set.seed(2)
> x <- round(rnorm(500), 2)
> y <- cumsum(x)
> check.pks(y)
>
> plot(y, type="o", cex = 1/4)
> p15 <- peaks(y,15)
> points(seq(y)[p15], y[p15], col = 3, cex = 2)
> mtext("peaks(y,15)", col=3)
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>
>

Wow! This is exactly the sort of simple peak-finding algorithm I was 
looking for. Would it be OK for me to post this to our dept. webpage so 
that others may use it?

Cheers,

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] finding peaks in a simple dataset with R

2005-11-23 Thread Dylan Beaudette
On Wednesday 23 November 2005 10:15 am, Tuszynski, Jaroslaw W. wrote:
> >> I am looking for some way to locate peaks in a simple x,y data set.
>
> See my 'msc.peaks.find' function in 'caMassClass', it has a simple peak
> finding algorithm.
>
> Jarek Tuszynski
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Jarek,

Thanks for the tip. I was able to install the caMassClass package and all of 
its dependencies. In addition, I was able to run the examples from the manual 
pages.

However, the format of the input data to the 'msc.peaks.find' function is not 
apparent to me. In its simplest form, my data looks something like this:

2.00 233
2.04 220
...
11.60 540
12.00 600   <-- a peak!
12.04 450
...

Here is an example R session, trying out the function you suggested:

#importing my data like this:
X <- read.table("input.dat", header=TRUE)

#from the example:
Peaks = msc.peaks.find(X)

#errors with:
Error in sort(x, partial = unique(c(lo, hi))) :
'x' must be atomic


Also, I have tried one of the functions ('getPeaks') listed on the 
'msc.peaks.find' manual page; however, I am still having a problem with the 
format of my data vs. what the function is expecting.

#importing my data like this:
X <- read.table("input.dat", header=TRUE)

#setup an output file for peak information
peakfile <- paste("peakinfo.csv", sep="/")

#run the analysis:
getPeaks(X,peakfile)

#errors with:
Error in area/max(area) : non-numeric argument to binary operator
In addition: Warning message:
no finite arguments to max; returning -Inf

any ideas would be greatly appreciated!

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] finding peaks in a simple dataset with R

2005-11-23 Thread Dylan Beaudette
On Nov 23, 2005, at 8:10 AM, Martin Maechler wrote:

>>>>>> "Marc" == Marc Kirchner <[EMAIL PROTECTED]>
>>>>>> on Wed, 23 Nov 2005 14:33:28 + writes:
>
>>>
>>> I wonder if we shouldn't polish that a bit and add to R's
>>> standard 'utils' package.
>>>
>
> Marc> Hm, I figured out there are (at least) two versions out 
> there, one being
> Marc> the "original" idea and a modification.
>
> Marc> === Petr Pikal in 2001 (based on Brian Ripley's idea)==
> Marc> peaks <- function(series, span=3) {
> Marc> z <- embed(series, span)
> Marc> result <- max.col(z) == 1 + span %/% 2
> Marc> result
> Marc> }
>
> Marc> versus
>
> Marc> === Petr Pikal in 2004 ==
> Marc> peaks2<-function(series,span=3) {
> Marc> z <- embed(series, span)
> Marc> s <- span%/%2
> Marc> v<- max.col(z) == 1 + s
> Marc> result <- c(rep(FALSE,s),v)
> Marc> result <- result[1:(length(result)-s)]
> Marc> result
> Marc> }
>
> Thank you, Marc,
>
> Marc> Comparison shows
>>> peaks(c(1,4,1,1,6,1,5,1,1),3)
> Marc> [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE
> Marc> which is a logical vector for elements 2:N-1 and
>
>>> peaks2(c(1,4,1,1,6,1,5,1,1),3)
> Marc> [1] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE
> Marc> which is a logical vector for elements 1:N-2.
>
> Marc> As I would expect to "lose" (span-1)/2 elements on each side
> Marc> of the vector, to me the 2001 version feels more natural.
>
> I think for the function to be more useful it the result should
> have the original vector length and hence I'd propose to also
> pad with FALSE at the upper end.
>

Hi, I have been lurking for a while, and thought that I would chime in.

I have a similar need: x,y data with many "peaks" - finding them with R 
would be a nice feature. However, as mentioned above, the ability to 
find more than one peak would be especially helpful. In addition, 
preserving the length of the input vector would help with linking peak 
locations to their real indices.

In my case the data is generated by an X-ray diffraction machine: 
x-axis in degrees, y-axis in relative intensity. The data looks like 
this:
http://casoilresource.lawr.ucdavis.edu/drupal/node/71


> Marc> Also, both "suffer" from being non-deterministic in the
> Marc> multiple-maxima-case (the two 4s here)
>
> yes, of course, because of max.col().
> Note that Venables & Ripley would consider this to be rather a feature.
>
> Marc> This could (should?) be fixed by modifying the call to 
> max.col()
> Marc> result <- max.col(z, "first") == 1 + span %/% 2;
>
> Actually I think it should become an option, but I'd use "first"
> as default.
>

I would second that.

> Marc> Just my two cents,
>
> Thank you.
>
> Here is my current proposal which also demonstrates why it's
> useful to pad with FALSE :
>
> peaks <-function(series, span = 3, ties.method = "first") {
> if((span <- as.integer(span)) %% 2 != 1) stop("'span' must be odd")
> z <- embed(series, span)
> s <- span %/% 2
> v <- max.col(z, ties.method = ties.method) == s + 1:1
> pad <- rep(FALSE, s)
> c(pad, v, pad)
> }
>
> y <- c(1,4,1,1,6,1,5,1,1) ; (ii <- which(peaks(y))); y[ii]
> ##- [1] 2 5 7
> ##- [1] 4 6 5
>
> set.seed(7)
> y <- rpois(100, lambda = 7)
> py <- peaks(y)
> plot(y, type="o", cex = 1/4, main = "y and peaks(y,3)")
> points(seq(y)[py], y[py], col = 2, cex = 1.5)
>
> p7 <- peaks(y,7)
> points(seq(y)[p7], y[p7], col = 3, cex = 2)
> mtext("peaks(y,7)", col=3)

Thanks for working on this, as I would imagine there are other lurkers 
out there who are waiting for a solution to this problem.

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html