Re: [R] Random Forest

2007-04-23 Thread Arne.Muller
Ruben, Maybe your binary response is a numeric vector - try converting it into a factor with two levels. You probably want classification rather than regression (for regression the dependent variable should be numeric and continuous)! Arne
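
A minimal sketch of the suggestion above, on made-up data. randomForest() fits a regression when the response is numeric and a classification when it is a factor, so a 0/1 response is converted with factor() first. The data frame `d` and its columns are hypothetical, and the randomForest call is shown only as a comment because it needs the add-on package.

```r
# Hypothetical data: 'y' is a binary response stored as numbers
d <- data.frame(y  = c(0, 1, 1, 0, 1, 0),
                x1 = c(2.1, 3.4, 3.9, 1.8, 4.2, 2.0))

# Convert to a two-level factor so the response is treated as classes
d$y <- factor(d$y, levels = c(0, 1), labels = c("neg", "pos"))

str(d$y)

# With the randomForest package installed, the fit would then look like:
# library(randomForest); rf <- randomForest(y ~ ., data = d)
```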

Re: [R] splitting very long character string

2006-11-02 Thread Arne.Muller
Hello, thanks a lot for your help on splitting the string to get a numeric vector. I'm now writing the string to a tempfile and reading it in via scan - this is fast enough for me: library(XML); ... tmp = xmlElementsByTagName(root, 'tofDataSample', recursive=T); tmp = xmlValue(tmp[[1]]);
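
The post used a temp file plus scan(). As a hedged alternative for current R versions, the sketch below splits on a fixed delimiter or lets scan() parse the string directly through its `text` argument. The string `s` is a made-up stand-in for the XML value.

```r
# Hypothetical stand-in for the long string: 60,000 numbers separated by '\n'
s <- paste(seq_len(60000), collapse = "\n")

# Option 1: split on a fixed (non-regex) delimiter, then convert
v1 <- as.numeric(strsplit(s, "\n", fixed = TRUE)[[1]])

# Option 2: let scan() parse the string directly, no temp file needed
v2 <- scan(text = s, quiet = TRUE)
```

Using fixed = TRUE avoids the regular-expression engine, which is usually the slow part of strsplit() on very long strings.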

[R] splitting very long character string

2006-11-01 Thread Arne.Muller
Hello, I've a very long character array (500k characters) that I need to split by '\n', resulting in an array of about 60k numbers. The help on strsplit says to use perl=TRUE to get better performance, but still it takes several minutes to split this string. The massive string is the return value

[R] graphics and 'layout' question

2006-09-15 Thread Arne.Muller
Hello, I got stuck with a graphics question: I've 3 figures that I present on a single page (window) via 'layout'. The layout is layout(matrix(c(1,1,2,3), 2, 2, byrow=TRUE)); so that the first plot spans both columns in row one. Now I'd like to magnify the first figure so that it takes 20%
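
One possible way to give the first row more room is the `heights` argument of layout(). This is a sketch only: the 3:2 proportions are an assumption (the post is cut off before it states the target size), and the plots are placeholders.

```r
# Open a null PDF device so the sketch runs without a display
pdf(file = NULL)

# Same layout matrix as in the post: plot 1 spans both columns of row one
m <- matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE)

# Assumed proportions: row one gets 60% of the height, row two 40%
nf <- layout(m, heights = c(3, 2))

plot(1:10, main = "figure 1 (spans both columns)")
plot(rnorm(20), main = "figure 2")
hist(rnorm(100), main = "figure 3")

invisible(dev.off())
```

layout() returns the number of figures it defined, and layout.show(nf) can be used on an open device to preview the regions.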

[R] randomForest question

2006-07-26 Thread Arne.Muller
Hello, I've a question regarding randomForest (from the package with the same name). I've 16 features (nominal), 159 positive and 318 negative cases that I'd like to classify (binary classification). Using the tuning from the e1071 package it turns out that the best performance is reached when

[R] data import problem

2006-03-08 Thread Arne.Muller
Dear All, I'm trying to read a text data file that contains several records separated by a blank line. Each record starts with a row that contains its ID and the number of rows for the record (two columns), then the data table itself, e.g. 123 5 89.17911.1024 90.57351.1024 92.5666

Re: [R] data import problem

2006-03-08 Thread Arne.Muller
Well, the data is generated by a perl script, and I could just configure the perl script so that there is one file per data table, but I thought it'd probably be much more efficient to have all records in a single file rather than reading thousands of small files ... . kind regards,

[R] calculating IC50

2006-02-02 Thread Arne.Muller
Hello, I was wondering if there is an R-package to automatically calculate the IC50 value (concentration of a substance that inhibits cell growth to 50%) for some measurements. kind regards, Arne

Re: [R] Dynamic Programming in R

2006-01-20 Thread Arne.Muller
Hello, I've implemented dynamic programming for aligning spectral data (usually 100 to 200 peaks in one spectrum, but some spectra contain 5k peaks) entirely in R. As François Pinard pointed out, the memory usage should be proportional to the n x n dynamic programming matrix, and I've not yet

[R] RMySQL/DBI

2006-01-06 Thread Arne.Muller
Hello, does anybody run RMySQL/DBI successfully on SunOS5.8 and MySQL 3.23.53 ? I get a segmentation fault when trying to call dbConnect. We'll soon switch to MySQL 4, however, I was wondering whether the very ancient mysql version really is the problem ... RMySQL 0.5-5 DBI 0.1-9 R 2.2.0

[R] trellis: style of axis labels

2005-12-12 Thread Arne.Muller
Hello, is it possible to get xyplot of package lattice to acknowledge par(las=2)? In my trellis plot the x-axis labels are overlapping (they're factors with rather long level names), and I'd like to have them vertical. The trellis plot doesn't seem to read the 'par' settings, and
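
lattice draws with grid and does not consult par(), so axis label rotation is normally requested through the `scales` argument instead. A small sketch with made-up data (the data frame `d` and its level names are hypothetical):

```r
library(lattice)

# Hypothetical data with long factor level names on the x-axis
d <- data.frame(
  grp = factor(c("a rather long level name",
                 "another long level name",
                 "yet another long name")),
  y   = c(1.2, 3.4, 2.5)
)

# rot = 90 turns the x-axis tick labels vertical
p <- xyplot(y ~ grp, data = d, scales = list(x = list(rot = 90)))
```

Printing `p` (or calling print(p) inside a script) draws the plot on the active device.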

[R] data frames and factors

2005-11-24 Thread Arne.Muller
Hello, I have prepared an svm on some training data and would like to use the svm model for predicting binary outcome from new data. The input data frame contains several numeric and factor variables. Usually I construct the input matrix of the entities to be predicted with a perl script that

[R] basic anova and t-test question

2005-08-26 Thread Arne.Muller
Hello, I'm posting this to receive some comments/hints about a rather statistical than R-technical question ... . In an anova of a lme, factor SSPos11 shows up non-significant, but in the t-test of the summary 2 of the 4 levels (one for contrast) are significant. See below for some truncated

[R] RandomForest question

2005-07-21 Thread Arne.Muller
Hello, I'm trying to find out the optimal number of splits (mtry parameter) for a randomForest classification. The classification is binary and there are 32 explanatory variables (mostly factors with each up to 4 levels but also some numeric variables) and 575 cases. I've seen that although

[R] p-values for classification

2005-07-01 Thread Arne.Muller
Dear All, I'm classifying some data with various methods (binary classification). I'm interpreting the results via a confusion matrix from which I calculate the sensitivity and the fdr. The classifiers are trained on 575 data points and my test set has 50 data points. I'd like to calculate

[R] randomForest error

2005-06-30 Thread Arne.Muller
Hello, I'm using the random forest package. One of my factors in the data set contains 41 levels (I can't code this as a numeric value - in terms of linear models this would be a random factor). The randomForest call comes back with an error telling me that the limit is 32 categories. Is

[R] svm and scaling input

2005-06-28 Thread Arne.Muller
Dear All, I've a question about scaling the input variables for an analysis with svm (package e1071). Most of my variables are factors with 4 to 6 levels but there are also some numeric variables. I'm not familiar with the math behind svms, so my assumptions may be completely wrong ... or

[R] bug in predict.lme?

2005-06-08 Thread Arne.Muller
Dear All, I've come across a problem in predict.lme. Assigning a model formula to a variable and then using this variable in lme (instead of typing the formula into the formula part of lme) works as expected. However, when performing a predict on the fitted model I get an error message -

[R] lm/lme cross-validation

2005-05-31 Thread Arne.Muller
Hello, is there a special package/method to cross-validate linear fixed effects and mixed effects models (from lme)? I've tried cv.glm on an lme (hoping that it may deal with any kind of linear model ...), but it raises an error: Error in eval(expr, envir, enclos) : couldn't find function

[R] error in plot.lmList

2005-05-13 Thread Arne.Muller
Hello, in R-2.1.0 I'm trying to produce trellis plots from an lmList object as described in the help for plot.lmList. I can generate the plots from the help, but on my own data plotting fails with an error message that I cannot interpret (please see below). Any hints are greatly appreciated.

[R] casting lm.fit output to an lm object

2005-01-06 Thread Arne.Muller
Hello, Is it possible to cast the output of lm.fit to an lm object? I've 10,000 linear models for a gene expression experiment, all of which have the same model matrix. Maybe calling lm.fit on a model matrix and a data vector is faster than lm. I'd like to use each fit for an anova as well as
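
This does not answer the casting question itself, but as a hedged sketch of the shared-model-matrix idea: both lm.fit() and lm() accept a matrix response, so all columns can be fitted against one design in a single call. The design, the gene count and the data below are made up.

```r
set.seed(1)

# Hypothetical design: 12 samples, intercept plus one two-level factor
group <- gl(2, 6)
X <- model.matrix(~ group)

# Hypothetical expression data: one column per gene (5 genes here)
Y <- matrix(rnorm(12 * 5), nrow = 12, ncol = 5)

# One call fits all columns of Y against the same model matrix
fit <- lm.fit(X, Y)
dim(fit$coefficients)   # coefficients x genes

# lm() also accepts a matrix response and returns a full model object
mfit <- lm(Y ~ group)
```

The lm() result has class "mlm", so methods such as coef() and summary() work on it, whereas the lm.fit() result is a plain list.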

RE: [R] Re: The hidden costs of GPL software?

2004-11-18 Thread Arne.Muller
[...] I am a biologist coming to R via Bioconductor. I have no computer background in computer sciences and only basic undergraduate training level in statistics. I have used R with great pleasure and great pains. The most difficult thing is to know what functions to use - sometimes I

[R] printing to stderr

2004-11-10 Thread Arne.Muller
Hello, is it possible to configure the print function to print to stderr? kind regards, Arne
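
A short sketch of some standard ways to write to stderr in R. print() itself has no `file` argument, so its output is captured and redirected here; the vector `x` is just an example value.

```r
x <- c(a = 1, b = 2)

# cat() and message() can write to stderr directly
cat("value of a:", x["a"], "\n", file = stderr())
message("this also goes to stderr")

# print() has no 'file' argument; capture its output and redirect it
out <- capture.output(print(x))
writeLines(out, con = stderr())
```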

[R] Boxplot, space to axis

2004-09-30 Thread Arne.Muller
Hello, I've created a boxplot with 84 boxes. So far everything is as I expect, but there is quite some space between the 1st box and axis 2 and the last box and axis 4. Since 84 boxes get very slim anyway I'd like to distribute as much of the horizontal space over the x-axis. Maybe I've

RE: [R] Boxplot, space to axis

2004-09-30 Thread Arne.Muller
Hello Deepayan, thanks for your suggestion, xaxs='i' works, but it leaves no space at all. I thought this may be configurable by a real value. kind regards, Arne -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Deepayan Sarkar Sent: 30

[R] strange tickmarks placing in image

2004-08-03 Thread Arne.Muller
Hello, I've a problem aligning tickmarks to an image. I've created a correlation matrix for 84 datasets. I'm visualizing the matrix as an image with colour coding according to the correlation coefficient. The 84 datasets are distributed over three factors, but the design is unbalanced, so

[R] binning a vector

2004-07-26 Thread Arne.Muller
Hello, I was wondering whether there's a function in R that takes two vectors (of same length) as input and computes mean values for bins (intervals) or even a sliding window over these vectors. I've several x/y data sets (input/response) that I'd like to plot together. Say the x-data for one data
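
One base-R way to do both, sketched on made-up data: cut() assigns each x to an interval and tapply() averages y per interval, while stats::filter() gives a sliding-window mean. The break points and the window width of 3 are arbitrary choices, and the sliding window assumes y is already ordered by x.

```r
# Hypothetical x/y data
x <- 1:10
y <- 2 * x

# Mean of y within intervals of x: cut() assigns bins, tapply() averages
bins <- cut(x, breaks = c(0, 5, 10))
bin_means <- tapply(y, bins, mean)

# Sliding-window mean over 3 consecutive y values (both ends are NA)
win_means <- stats::filter(y, rep(1 / 3, 3), sides = 2)
```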

[R] unbalanced design for anova with low number of replicates

2004-06-28 Thread Arne.Muller
Hello, I'm wondering what's the best way to analyse an unbalanced design with a low number of replicates. I'm not a statistician, and I'm looking for some direction for this problem. I've a 2 factor design: Factor batch with 3 levels, and factor dose within each batch with 5 levels. Dose

RE: [R] Perl--R interface

2004-06-23 Thread Arne.Muller
Hi, look at http://www.omegahat.org/RSPerl/index.html. regards, Arne -- Arne Muller, Ph.D. Toxicogenomics, Aventis Pharma arne dot muller domain=aventis com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of XIAO LIU Sent: 23

[R] help with memory greedy storage

2004-05-14 Thread Arne.Muller
Hello, I've a problem with a self written routine taking a lot of memory (1.2Gb). Maybe you can suggest some enhancements, I'm pretty sure that my implementation is not optimal ... I'm creating many linear models and store coefficients, anova p-values ... all I need in different lists which

[R] storage of lm objects in a database

2004-05-13 Thread Arne.Muller
Hello, I'd like to use DBI to store lm objects in a database. I've to analyze many linear models and I cannot store them in a single R-session (not enough memory). Also it'd be nice to have them persistent. Maybe it's possible to create a compact binary representation of the object (the

[R] R versus SAS: lm performance

2004-05-11 Thread Arne.Muller
Hello, A colleague of mine has compared the runtime of a linear model + anova in SAS and S+. He got the same results, but SAS took a bit more than a minute whereas S+ took 17 minutes. I've tried it in R (1.9.0) and it took 15 min. Neither machine ran out of memory, and I assume that all

RE: [R] R versus SAS: lm performance

2004-05-11 Thread Arne.Muller
Hello, thanks for your reply. I've now done the profiling, and I interpret that most of the time is spent in the fortran routine(s): Each sample represents 0.02 seconds. Total run time: 920.21999453 seconds. Total seconds: time spent in function and callees. Self seconds: time spent in

RE: [R] R versus SAS: lm performance

2004-05-11 Thread Arne.Muller
Thanks All, for your help. There seems to be a lot I can try to speed up the fits. However, I'd like to go for a much simpler model which I think is justified by the experiment itself, e.g. I may think about removing the nesting (Ba:Ti:Do)/Ar. The model matrix has 1344 rows and 2970 columns,

[R] strange result with contrasts

2004-04-20 Thread Arne.Muller
Hello, I'm trying to reproduce some SAS result with R (after I got suspicious with the result in R). I struggle with the contrasts in a linear model. I've got three factors
  d$dose <- as.factor(d$dose) # 5 levels
  d$time <- as.factor(d$time) # 2 levels
  d$batch <- as.factor(d$batch) # 3 levels

RE: [R] Storing p-values from a glm

2004-04-06 Thread Arne.Muller
Hi, for example one could do it this way: v <- summary(fit)$coefficients[,4] the coefficient attribute is a matrix, and with the 4 you refer to the p-value (at least in lm - don't know if summary(glm) produces slightly different output). to skip the intercept (1st row): v <-
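
The approach above, sketched on the built-in `cars` data as a stand-in for the poster's model. The glm part uses the default gaussian family, where the p-value column is named "Pr(>|t|)"; for families such as binomial or poisson the column is named "Pr(>|z|)", which is why selecting by name can be safer than selecting by position.

```r
# Built-in 'cars' data as a stand-in for the poster's model
fit <- lm(dist ~ speed, data = cars)

coefs   <- summary(fit)$coefficients   # matrix: one row per coefficient
p_all   <- coefs[, 4]                  # 4th column holds the p-values
p_noint <- coefs[-1, 4]                # drop the intercept row

# Same extraction for a (gaussian) glm, selecting the column by name
gfit  <- glm(dist ~ speed, data = cars)
p_glm <- summary(gfit)$coefficients[, "Pr(>|t|)"]
```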

[R] number point under-flow

2004-02-04 Thread Arne.Muller
Hello, I've come across the following situation in R-1.8.1 (compiled and running under RedHat 7.1):
  > phyper(24, 514, 5961-514, 53, lower.tail=T)
  [1] 1
  > phyper(24, 514, 5961-514, 53, lower.tail=F)
  [1] -1.037310e-11
I'd expect the latter to be 0 or some very small positive number. Is this a number

RE: [R] number point under-flow

2004-02-04 Thread Arne.Muller
Hi, yes, I did compile it with gcc 2.96 ... . Do you have an estimate on how bad this error is, e.g. how much it affects the calculations in R? kind regards, Arne -Original Message- From: Roger D. Peng [mailto:[EMAIL PROTECTED] Sent: 04 February 2004 14:49 To:

[R] Cochran-Mantel-Haenszel problem

2003-12-11 Thread Arne.Muller
Hello, I've tried to analyze some data with a CMH test. My 3 dimensional contingency tables are 2x2xN where N is usually between 10 and 100. The problem is that there may be 2 strata with opposite counts (the 2x2 contingency tables for these are reversed), producing opposite odds ratios that

[R] multidimensional Fisher or Chi square test

2003-12-03 Thread Arne.Muller
Hello, Is there a test for independence available based on a multidimensional contingency table? I've about 300 processes, and for each of them I get numbers for failures and successes. I've two or more conditions under which I test these processes. If I had just one process to test I could

RE: [R] significance in difference of proportions

2003-12-01 Thread Arne.Muller
Hello, thanks for the replies to this subject. I'm using a fisher.test to test if the proportions of my 2 samples are different (see Ted's example below). The assumption was that the two samples are from the same population and that they may contain a different number of positives (due to

[R] significance in difference of proportions

2003-11-27 Thread Arne.Muller
Hello, I'm looking for some guidance with the following problem: I've 2 samples A (111 items) and B (10 items) drawn from the same unknown population. Within A I find 9 positives and in B 0 positives. I'd like to know if the 2 samples A and B are different, ie is there a way to find out whether
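
One common way to compare two such proportions is Fisher's exact test on the 2x2 table of counts. The sketch below uses the counts given in the post; the dimnames are only labels chosen for readability.

```r
# Counts from the post: A has 9 positives of 111, B has 0 of 10
tab <- matrix(c(9, 111 - 9,
                0, 10 - 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(sample = c("A", "B"),
                              result = c("positive", "negative")))

ft <- fisher.test(tab)
ft$p.value
```

With only 10 items in B, observing 0 positives is quite likely even if both samples share the same rate, so the test has little power here.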

[R] FDR in p.adjust

2003-11-03 Thread Arne.Muller
Hello, I've a question about the fdr method in p.adjust: What is the threshold of the FDR, and is it possible to change this threshold? As I understand the FDR (please correct) it adjusts the p-values so that for less than N% (say the cutoff is 25%) of the alternative hypothesis the Null is in
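
A small sketch of how p.adjust() is typically used: the function takes no threshold and only returns adjusted p-values, and the FDR level is then applied by the user when selecting results. The p-values and the 25% cutoff below are made up for illustration.

```r
p <- c(0.01, 0.02, 0.03, 0.04, 0.05)

# Benjamini-Hochberg adjusted p-values ("fdr" is an alias for "BH")
p_fdr <- p.adjust(p, method = "fdr")

# The FDR level is chosen afterwards by the user, e.g. 25% (assumed value)
selected <- p_fdr <= 0.25
```

For these inputs each raw p-value times n/rank equals 0.05, so all five adjusted values coincide.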

[R] why does data frame subset return vector

2003-10-18 Thread Arne.Muller
Hello, I've a weird problem with a data frame. Basically it should be just one column with specific names coming from a data file (the file contains 2 rows, one should be for the rownames of the data frame, the other contains numeric values). df.rr <- read.table("RR_anova.txt", header=T,
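
The post is cut off before the failing call, but the behaviour named in the subject line usually comes from the `drop` argument: selecting rows of a one-column data frame returns a plain vector unless drop = FALSE is given. A sketch on a hypothetical data frame:

```r
# Hypothetical one-column data frame with meaningful row names
df <- data.frame(value = c(0.1, 0.5, 0.9),
                 row.names = c("geneA", "geneB", "geneC"))

v  <- df[df$value > 0.2, ]                 # drops to a plain numeric vector
d2 <- df[df$value > 0.2, , drop = FALSE]   # stays a data frame, keeps row names
```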

[R] sub data frame by expression

2003-10-17 Thread Arne.Muller
Hi All, I've the following data frame with 54 rows and 4 columns:

  x
                     Ratio Dose  Time Batch
  R.010mM.04h.NEW     0.02 010mM 04h  NEW
  R.010mM.04h.NEW.1   0.07 010mM 04h  NEW
  ...
  R.010mM.24h.NEW.2   0.06 010mM 24h  NEW
  R.010mM.04h.OLD     0.19 010mM 04h  OLD

RE: [R] sub data frame by expression

2003-10-17 Thread Arne.Muller
Sorry, I just figured it out: x[x$Batch == 'OLD',] instead of x[x$Batch == 'OLD']. I didn't know this has to be in the same format as x[1:20,] where I already used the comma. sorry for posting the previous message ... Arne
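
The fix above, sketched on a hypothetical miniature of the data frame from the thread. The comma matters because the expression before it selects rows and the empty slot after it means "all columns".

```r
# Hypothetical miniature of the data frame from the thread
x <- data.frame(Ratio = c(0.02, 0.07, 0.06, 0.19),
                Dose  = "010mM",
                Time  = c("04h", "04h", "24h", "04h"),
                Batch = c("NEW", "NEW", "NEW", "OLD"))

old_rows <- x[x$Batch == "OLD", ]       # logical row index, all columns
old_too  <- subset(x, Batch == "OLD")   # equivalent convenience form
```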

RE: [R] sub data frame by expression

2003-10-17 Thread Arne.Muller
Hi, thanks for your replies regarding the problem of selecting a sub data frame by expression. I'm starting to understand how indexing works in R. thanks for your replies, Arne -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: 17

[R] A data frame of data frames

2003-10-16 Thread Arne.Muller
Hello, I'm trying to set up the following data structure in R: A data frame with 7,000 rows and 4 columns. The rownames have some special meaning (they are names of genes). The 1st column per row is itself a data frame, and columns 2 to 4 will keep numeric values. The data frame contained in the

[R] New R - recompiling all packages

2003-10-08 Thread Arne.Muller
Hi All, I'm running R 1.7.1, and I've installed some additional packages such as Bioconductor. Do I have to re-install all the additional packages when upgrading to R 1.8.0 (i.e. are there compiled-in dependencies)? thanks for your help, Arne

[R] updating via CRAN and http

2003-10-08 Thread Arne.Muller
Hello, thanks for the tips on updating packages for 1.8.0. The updating is a real problem for me, since I've to do it sort of manually using my web-browser or wget. I'm behind a firewall that requires http/ftp authentication (username and passwd) for every request it sends to a server outside

RE: [R] updating via CRAN and http

2003-10-08 Thread Arne.Muller
Sorry, I didn't mean it the nasty way. I wouldn't have been surprised if the R-team had told me the authentication with the firewall is my problem (i.e. a special case that cannot be dealt with by the R-team). Yes, and of course I should have had a much closer look into the documentation. Thanks again

[R] Jonckheere-Terpstra test

2003-10-05 Thread Arne.Muller
Hello, can anybody here explain what a Jonckheere-Terpstra test is and whether it is implemented in R? I just know it's a non-parametric test, otherwise I've no clue about it ;-( . Are there alternatives to this test? thanks for help, Arne

[R] multi-dimensional hash

2003-10-02 Thread Arne.Muller
Hello, I was wondering what's the best data structure in R for a multi-dimensional lookup table, and how to implement it. I've several categories say A, B, C ... and within each of these categories there are other categories such as a, b, c, ... . There can be up to 5 dimensions. The actual value
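
Two possible structures, sketched with made-up category names: an array with dimnames (dense, indexed by name in each dimension) and an environment keyed by pasted category names (sparse, hash-like). The "/" separator and the three dimensions are arbitrary choices for the example.

```r
# Hypothetical 3-dimensional lookup table with named levels
lut <- array(NA_real_, dim = c(2, 3, 2),
             dimnames = list(c("A", "B"), c("a", "b", "c"), c("x", "y")))

lut["A", "b", "y"] <- 1.5          # store a value by category names
lut["A", "b", "y"]                 # look it up again

# Sparse alternative: an environment keyed by the pasted category names
h <- new.env()
assign(paste("A", "b", "y", sep = "/"), 1.5, envir = h)
get("A/b/y", envir = h)
```

The array costs memory for every combination of levels, so the environment tends to be the better fit when most combinations are never used.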

[R] R book

2003-09-11 Thread Arne.Muller
Hi All, I'd be interested in your opinions of the book "Introductory Statistics with R" by Peter Dalgaard. Does it describe the R object concept, the language itself and statistical aspects well (I am not a statistician)? thanks for your opinion, Arne

RE: [R] No joy installing R with shared libs

2003-09-09 Thread Arne.Muller
Hi, I've experienced similar failures with the RSperl installation. So I'd be interested if someone sorts out the library misery ... ;-) Arne -Original Message- From: Laurent Faisnel [mailto:[EMAIL PROTECTED] Sent: 09 September 2003 12:48 To: [EMAIL PROTECTED] Subject: Re:

[R] all values from a data frame

2003-09-05 Thread Arne.Muller
Hello, I've a data frame with 15 columns and 6000 rows, and I need the data in a single vector of size 90000 for a t-test. Is there such a conversion function in R, or would I have to write my own loop over the columns? thanks for your help + kind regards Arne
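
No loop is needed for this in base R. A sketch on a small stand-in data frame (3 columns by 3 rows instead of 15 by 6000):

```r
# Small stand-in for the 15-column, 6000-row data frame
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

v1 <- unlist(df, use.names = FALSE)   # column by column
v2 <- as.vector(as.matrix(df))        # same order via a matrix
```

Both forms concatenate the columns in order, so a 15 x 6000 data frame yields a vector of length 90000.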