Ruben,
Maybe your binary response is a numeric vector - try converting it into
a factor with two levels. You probably want classification rather than
regression (for which the dependent variable should be numeric and continuous)!
Arne
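A minimal sketch of the conversion suggested above, in base R only. The vector `y` and the labels "neg"/"pos" are made up for illustration; the assumption is that the modelling function in question switches to classification when the response is a factor.

```r
# Hypothetical binary response stored as numbers (0 = negative, 1 = positive).
y <- c(0, 1, 1, 0, 1, 0, 0)

# Convert to a two-level factor; the labels are illustrative.
y_factor <- factor(y, levels = c(0, 1), labels = c("neg", "pos"))

str(y_factor)    # shows a factor with 2 levels
table(y_factor)  # counts per class
```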
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL
Hello,
thanks a lot for your help on splitting the string to get a numeric vector. I'm
now writing the string to a tempfile and reading it in via scan - this is fast
enough for me:
library(XML);
...
tmp = xmlElementsByTagName(root, 'tofDataSample', recursive=T);
tmp = xmlValue(tmp[[1]]);
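The rest of the snippet is cut off, so the following is only a sketch of the tempfile-plus-scan idea described in the message, not the original code. `big_string` is a hypothetical stand-in for the value returned by xmlValue; the second variant, reading from the string directly with scan(text = ...), is an alternative available in current R versions.

```r
# Hypothetical stand-in for the long newline-separated string from the XML node.
big_string <- paste(c(1.5, 2, 3.25), collapse = "\n")

# Approach described above: write to a temporary file, then read with scan().
tmp_file <- tempfile()
writeLines(big_string, tmp_file)
values_from_file <- scan(tmp_file, quiet = TRUE)
unlink(tmp_file)

# Alternative without a file: scan() can read from a character string directly.
values_from_text <- scan(text = big_string, quiet = TRUE)
```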
Hello,
I've a very long character array (500k characters) that I need to split by '\n',
resulting in an array of about 60k numbers. The help on strsplit says to use
perl=TRUE to get better performance, but it still takes several minutes to split
this string.
The massive string is the return value
Hello,
I got stuck with a graphics question: I've 3 figures that I present on a single
page (window) via 'layout'. The layout is
layout(matrix(c(1,1,2,3), 2, 2, byrow=TRUE));
so that the first plot spans both columns in row one. Now I'd like to
magnify the first figure so that it takes 20%
Hello,
I've a question regarding randomForest (from the package with the same name). I've
16 features (nominal), 159 positive and 318 negative cases that I'd like to
classify (binary classification).
Using the tuning from the e1071 package it turns out that the best performance
is reached when
Dear All,
I'm trying to read a text data file that contains several records separated by
a blank line. Each record starts with a row that contains its ID and the
number of rows for the records (two columns), then the data table itself, e.g.
123 5
89.1791 1.1024
90.5735 1.1024
92.5666
Well, the data is generated by a perl script, and I could just configure the
perl script so that there is one file per data table, but I thought it'd probably
be much more efficient to have all records in a single file rather than reading
thousands of small files ... .
kind regards,
Hello,
I was wondering if there is an R-package to automatically calculate the IC50
value (concentration of a substance that inhibits cell growth to 50%) for some
measurements.
kind regards,
Arne
Hello,
I've implemented dynamic programming for aligning spectral data (usually 100 to
200 peaks in one spectrum, but some spectra contain 5k peaks) entirely in R.
As François Pinard pointed out, the memory usage should be proportional to the
n x n dynamic programming matrix, and I've not yet
Hello,
does anybody run RMySQL/DBI successfully on SunOS5.8 and MySQL 3.23.53 ? I
get a segmentation fault when trying to call dbConnect. We'll soon switch to
MySQL 4, however, I was wondering whether the very ancient mysql version really
is the problem ...
RMySQL 0.5-5
DBI 0.1-9
R 2.2.0
Hello,
is it possible to get xyplot of package lattice to acknowledge par(las=2)? In
my trellis plot the x-axis labels are overlapping (they're factors with rather
long level names), and I'd like to have them vertical. The trellis plot doesn't
seem to read the 'par' settings, and
Hello,
I have prepared an svm on some training data and would like to use the svm
model for predicting binary outcome from new data.
The input data frame contains several numeric and factor variables. Usually I
construct the input matrix of the entities to be predicted with a perl script
that
Hello,
I'm posting this to receive some comments/hints about a statistical rather than
R-technical question ... .
In an anova of an lme, factor SSPos11 shows up non-significant, but in the t-tests
of the summary 2 of the 4 levels (one for contrast) are significant. See below
for some truncated
Hello,
I'm trying to find out the optimal number of splits (mtry parameter) for a
randomForest classification. The classification is binary and there are 32
explanatory variables (mostly factors with each up to 4 levels but also some
numeric variables) and 575 cases.
I've seen that although
Dear All,
I'm classifying some data with various methods (binary classification). I'm
interpreting the results via a confusion matrix from which I calculate the
sensitivity and the FDR. The classifiers are trained on 575 data points and my
test set has 50 data points.
I'd like to calculate
Hello,
I'm using the random forest package. One of my factors in the data set contains
41 levels (I can't code this as a numeric value - in terms of linear models
this would be a random factor). The randomForest call comes back with an error
telling me that the limit is 32 categories.
Is
Dear All,
I've a question about scaling the input variables for an analysis with svm
(package e1071). Most of my variables are factors with 4 to 6 levels but there
are also some numeric variables.
I'm not familiar with the math behind svms, so my assumptions may be completely
wrong ... or
Dear All,
I've come across a problem in predict.lme. Assigning a model formula to a
variable and then using this variable in lme (instead of typing the formula
into the formula part of lme) works as expected. However, when performing a
predict on the fitted model I get an error message -
Hello,
is there a special package/method to cross-validate linear fixed effects and
mixed effects models (from lme)? I've tried cv.glm on an lme (hoping that it
may deal with any kind of linear model ...), but it raises an error:
Error in eval(expr, envir, enclos) : couldn't find function
Hello,
in R-2.1.0 I'm trying to produce trellis plots from an lmList object as
described in the help for plot.lmList. I can generate the plots from the help,
but on my own data plotting fails with an error message that I cannot interpret
(please see below). Any hints are greatly appreciated.
Hello,
Is it possible to cast the output of lm.fit to an lm object? I've 10,000 linear
models for a gene expression experiment, all of which have the same model
matrix. Maybe calling lm.fit on a model matrix and a data vector is faster than
lm. I'd like to use each fit for an anova as well as
[...]
I am a biologist coming to R via Bioconductor. I have no
background in computer science and only basic undergraduate-level
training in statistics.
I have used R with great pleasure and great pains. The most difficult
thing is to know what functions to use - sometimes I
Hello,
is it possible to configure the print function to print to stderr?
kind regards,
Arne
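A minimal sketch of one way to do this in base R: capture what print() would show and write it to the stderr() connection. The helper name `print_to_stderr` is hypothetical, not an existing R function.

```r
# Print any R object to stderr by capturing what print() would show.
print_to_stderr <- function(x) {
  lines <- capture.output(print(x))
  writeLines(lines, con = stderr())
  invisible(lines)
}

printed <- print_to_stderr(summary(c(1, 2, 3, 10)))

# For plain text, cat() and message() can also write to stderr.
cat("a plain message\n", file = stderr())
message("message() writes to stderr by default")
```

This leaves print() itself untouched, so output from other code still goes to stdout.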
Hello,
I've created a boxplot with 84 boxes. So far everything is as I expect, but there is
quite some space between the 1st box and axis 2 and the last box and axis 4. Since 84
boxes get very slim anyway I'd like to distribute the boxes over as much of the
horizontal space along the x-axis as possible.
Maybe I've
Hello Deepayan,
thanks for your suggestion, xaxs='i' works, but it leaves no space at all. I thought
this might be configurable by a real value.
kind regards,
Arne
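One possible way to get an adjustable margin, sketched under the assumption that base graphics boxplot() is being used: keep xaxs='i' and set xlim explicitly, so the padding becomes a real value. The data and the `pad` value are illustrative, and pdf(NULL) is used only so the sketch runs without opening a window.

```r
# Illustrative data: 84 groups of 10 values each.
set.seed(1)
n_boxes <- 84
values <- rnorm(n_boxes * 10)
groups <- factor(rep(seq_len(n_boxes), each = 10))

pad <- 0.2  # extra space (in box positions) outside the first and last box

pdf(NULL)                   # null device so the sketch runs without a display
old_par <- par(xaxs = "i")  # use xlim exactly, without the default 4% extension
boxplot(values ~ groups, xlim = c(0.5 - pad, n_boxes + 0.5 + pad))
usr <- par("usr")           # actual plot region: x1, x2, y1, y2
par(old_par)
invisible(dev.off())
```

Setting `pad` to 0 should reproduce the "no space at all" behaviour; larger values widen the gap on both sides.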
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Deepayan Sarkar
Sent: 30
Hello,
I've a problem aligning tickmarks to an image. I've created a correlation matrix for
84 datasets. I'm visualizing the matrix as an image with colour coding according to
the correlation coefficient.
The 84 datasets are distributed over three factors, but the design is unbalanced, so
Hello,
I was wondering whether there's a function in R that takes two vectors (of same length)
as input and computes mean values for bins (intervals) or even a sliding window over
these vectors.
I've several x/y data sets (input/response) that I'd like to plot together. Say the x-data
for one data
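A short sketch of both ideas from the question, using base R only: cut() plus tapply() for bin means, and stats::filter() for a moving average. The data, the break points and the window width of 3 are all made up for illustration.

```r
# Illustrative x/y data.
x <- c(0.1, 0.2, 1.5, 1.7, 2.2, 2.9)
y <- c(1, 3, 5, 7, 9, 11)

# Mean of y within intervals of x; the break points are arbitrary here.
bins <- cut(x, breaks = c(0, 1, 2, 3))
binned_means <- tapply(y, bins, mean)

# Centred moving average over 3 consecutive points (after sorting by x).
ord <- order(x)
sliding_means <- stats::filter(y[ord], rep(1 / 3, 3), sides = 2)
```

The moving average is NA at both ends, where the window does not fit entirely inside the data.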
Hello,
I'm wondering what's the best way to analyse an unbalanced design with a low number of
replicates. I'm not a statistician, and I'm looking for some direction for this
problem.
I've a 2 factor design:
Factor batch with 3 levels, and factor dose within each batch with 5 levels. Dose
Hi,
look at http://www.omegahat.org/RSPerl/index.html.
regards,
Arne
--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of XIAO LIU
Sent: 23
Hello,
I've a problem with a self written routine taking a lot of memory (1.2Gb). Maybe you
can suggest some enhancements, I'm pretty sure that my implementation is not optimal
...
I'm creating many linear models and store coefficients, anova p-values ... all I need
in different lists which
Hello,
I'd like to use DBI to store lm objects in a database. I've to analyze many linear
models and I cannot store them in a single R-session (not enough memory). Also it'd be
nice to have them persistent.
Maybe it's possible to create a compact binary representation of the object (the
Hello,
A colleague of mine has compared the runtime of a linear model + anova in SAS and S+.
He got the same results, but SAS took a bit more than a minute whereas S+ took 17
minutes. I've tried it in R (1.9.0) and it took 15 min. Neither machine ran out of
memory, and I assume that all
Hello,
thanks for your reply. I've now done the profiling, and my interpretation is that most
of the time is spent in the fortran routine(s):
Each sample represents 0.02 seconds.
Total run time: 920.21999453 seconds.
Total seconds: time spent in function and callees.
Self seconds: time spent in
Thanks All, for your help. There seems to be a lot I can try to speed up the fits.
However, I'd like to go for a much simpler model which I think is justified by the
experiment itself, e.g. I may think about removing the nesting (Ba:Ti:Do)/Ar.
The model matrix has 1344 rows and 2970 columns,
Hello,
I'm trying to reproduce some SAS results with R (after I got suspicious of the result
in R). I struggle with the contrasts in a linear model.
I've got three factors
d$dose <- as.factor(d$dose) # 5 levels
d$time <- as.factor(d$time) # 2 levels
d$batch <- as.factor(d$batch) # 3 levels
Hi,
for example one could do it this way:
v <- summary(fit)$coefficients[,4]
the coefficients component is a matrix, and with the 4 you refer to the
p-value (at least in lm - don't know if summary(glm) produces slightly
different output).
to skip the intercept (1st row): v <-
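The last line of the snippet is cut off. A self-contained sketch of the same idea follows, using the built-in mtcars data; the model is only illustrative, and dropping the intercept by negative row indexing is an assumption about what the original went on to show.

```r
# Example fit on a built-in data set; the model itself is only illustrative.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- summary(fit)$coefficients

p_all <- coef_table[, 4]             # p-values including the intercept
p_no_intercept <- coef_table[-1, 4]  # drop the first row (the intercept)
```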
Hello,
I've come across the following situation in R-1.8.1 (compiled and running under
RedHat 7.1):
phyper(24, 514, 5961-514, 53, lower.tail=T)
[1] 1
phyper(24, 514, 5961-514, 53, lower.tail=F)
[1] -1.037310e-11
I'd expect the latter to be 0 or some very small positive number. Is this a
number
Hi,
yes, I did compile it with gcc 2.96 ... . Do you have an estimate of how bad
this error is, e.g. how much it affects the calculations in R?
kind regards,
Arne
-Original Message-
From: Roger D. Peng [mailto:[EMAIL PROTECTED]
Sent: 04 February 2004 14:49
To:
Hello,
I've tried to analyze some data with a CMH test. My 3 dimensional contingency
tables are 2x2xN where N is usually between 10 and 100.
The problem is that there may be 2 strata with opposite counts (the 2x2
contingency table for these are reversed), producing opposite odds ratios that
Hello,
Is there a test for independence available based on a multidimensional
contingency table?
I've about 300 processes, and for each of them I get numbers for failures and
successes. I've two or more conditions under which I test these processes.
If I had just one process to test I could
Hello,
thanks for the replies to this subject. I'm using a fisher.test to test if
the proportions of my 2 samples are different (see Ted's example below).
The assumption was that the two samples are from the same population and that
they may contain a different number of positives (due to
Hello,
I'm looking for some guidance with the following problem:
I've 2 samples A (111 items) and B (10 items) drawn from the same unknown
population. Within A I find 9 positives and in B 0 positives. I'd like to
know if the 2 samples A and B are different, i.e. is there a way to find out
whether
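One possible approach, mentioned in the follow-up message listed earlier, is Fisher's exact test on the 2x2 table of counts. The sketch below builds that table from the numbers given (A: 9 of 111 positive, B: 0 of 10 positive); whether the test suits the actual question is left open.

```r
# 2x2 table from the counts given above: rows = sample, columns = outcome.
counts <- matrix(c(9, 102,
                   0, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sample = c("A", "B"),
                                 outcome = c("positive", "negative")))

result <- fisher.test(counts)
result$p.value
```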
Hello,
I've a question about the fdr method in p.adjust: What is the threshold of
the FDR, and is it possible to change this threshold?
As I understand the FDR (please correct me if I'm wrong) it adjusts the p-values so that for
less than N% (say the cutoff is 25%) of the alternative hypotheses the Null
is in
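A small sketch of how p.adjust is typically used: the function only returns adjusted p-values and takes no threshold, so any cutoff is applied afterwards by the analyst. The raw p-values are made up, and the 25% cutoff is taken from the example in the question.

```r
# Illustrative raw p-values.
p_raw <- c(0.001, 0.01, 0.03, 0.04, 0.5)

# Benjamini-Hochberg adjustment; method = "fdr" is an alias for "BH".
p_fdr <- p.adjust(p_raw, method = "fdr")

# The cutoff is applied afterwards and is the analyst's choice.
fdr_cutoff <- 0.25
selected <- which(p_fdr <= fdr_cutoff)
```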
Hello,
I've a weird problem with a data frame. Basically it should be just one column with
specific names coming from a data file (the file contains 2 rows, one should be
for the rownames of the data frame, the other contains numeric values).
df.rr <- read.table("RR_anova.txt", header=T,
Hi All,
I've the following data frame with 54 rows and 4 columns:
x
                  Ratio  Dose Time Batch
R.010mM.04h.NEW    0.02 010mM  04h   NEW
R.010mM.04h.NEW.1  0.07 010mM  04h   NEW
...
R.010mM.24h.NEW.2  0.06 010mM  24h   NEW
R.010mM.04h.OLD    0.19 010mM  04h   OLD
Sorry, I just figured it out: x[x$Batch == 'OLD',] instead of x[x$Batch ==
'OLD']. I didn't know this has to be in the same format as x[1:20,] where I
already used the comma.
sorry for posting the previous message ...
Arne
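A small sketch of the indexing rule behind this fix: in x[i, j] the part before the comma selects rows and the part after it selects columns, while a single index without a comma is applied to columns. The data frame below is a made-up four-row stand-in with the same column names.

```r
# Small illustrative data frame with the same column names as above.
x <- data.frame(
  Ratio = c(0.02, 0.07, 0.06, 0.19),
  Dose  = c("010mM", "010mM", "010mM", "010mM"),
  Time  = c("04h", "04h", "24h", "04h"),
  Batch = c("NEW", "NEW", "NEW", "OLD")
)

old_rows  <- x[x$Batch == "OLD", ]         # rows where Batch is OLD, all columns
old_ratio <- x[x$Batch == "OLD", "Ratio"]  # same rows, one column
same_rows <- subset(x, Batch == "OLD")     # equivalent, using subset()
```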
-Original Message-
From: [EMAIL PROTECTED]
Hi,
thanks for your replies regarding the problem of selecting a sub data frame by
expression. I'm starting to understand how indexing works in R.
thanks for your replies,
Arne
-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: 17
Hello,
I'm trying to set up the following data structure in R:
A data frame with 7,000 rows and 4 columns. The rownames have some special
meaning (they are names of genes). The 1st column per row is itself a data
frame, and columns 2 to 4 will keep numeric values.
The data frame contained in the
Hi All,
I'm running R 1.7.1, and I've installed some additional packages such as
Bioconductor. Do I have to re-install all the additional packages when upgrading
to R 1.8.0 (i.e. are there compiled-in dependencies)?
thanks for your help,
Arne
Hello,
thanks for the tips on updating packages for 1.8.0. The updating is a real
problem for me, since I've to do it sort of manually using my web-browser or
wget. I'm behind a firewall that requires http/ftp authentication (username
and passwd) for every request it sends to a server outside
Sorry, I didn't mean it the nasty way. I wouldn't have been surprised if the
R-team had told me the authentication with the firewall is my problem (i.e.
a special case that cannot be dealt with by the R-team).
Yes, and of course I should have had a much closer look into the documentation.
Thanks again
Hello,
can anybody here explain what a Jonckheere-Terpstra test is and whether it is
implemented in R? I just know it's a non-parametric test, otherwise I've no
clue about it ;-( . Are there alternatives to this test?
thanks for help,
Arne
Hello,
I was wondering what's the best data structure in R for a multi-dimensional
lookup table, and how to implement it. I've several categories say A, B,
C ... and within each of these categories there are other categories such
as a, b, c, ... . There can be up to 5 dimensions. The actual value
Hi All,
I'd be interested in your opinions of the book
Introductory Statistics with R by Peter Dalgaard
Does it describe the R object concept, the language itself and
statistical aspects well (I am not a statistician)?
thanks for your opinion,
Arne
Hi,
I've experienced similar failures with the RSPerl installation. So I'd be
interested if someone sorts out the library misery ... ;-)
Arne
-Original Message-
From: Laurent Faisnel [mailto:[EMAIL PROTECTED]
Sent: 09 September 2003 12:48
To: [EMAIL PROTECTED]
Subject: Re:
Hello,
I've a data frame with 15 columns and 6000 rows, and I need the data in a
single vector of size 90000 for a t-test. Is there such a conversion function in
R, or would I have to write my own loop over the columns?
thanks for your help + kind regards
Arne
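A minimal sketch of flattening a data frame into a single vector without a loop, assuming all columns are numeric. The 3x3 data frame is a small stand-in for the real one.

```r
# Small stand-in for the real data frame.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Concatenate all columns, column by column, into one numeric vector.
v <- unlist(df, use.names = FALSE)

# Equivalent for an all-numeric data frame.
v2 <- as.vector(as.matrix(df))
```

Both variants return the values in column-major order, so the first column comes first.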