from:"Philipp Pagel"

Re: [R] control the conversion of factor to numeric

2011-10-18 Thread Philipp Pagel

On Tue, Oct 18, 2011 at 03:40:27PM +0200, Martin Batholdy wrote:
 Ok, I think that would work – thanks!
 
 However, in my case I read a data.frame via read.table().
 So some of the columns get transformed to factors automatically –  I don't
 generate the factor-variables as in the example, so I can't control how the
 levels are ordered (or can I?).


You can't while reading the data but nothing can stop you from
re-ordering the levels once you have your data.frame. An example with
the iris data:

> data(iris)
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> iris$Species <- factor(iris$Species, ordered=TRUE,
+                        levels=c('versicolor', 'virginica', 'setosa'))
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Ord.factor w/ 3 levels "versicolor"<"virginica"<..: 3 3 3 3 3 3 3 3 3 3 ...
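
Since the thread is about the numeric conversion, here is a minimal sketch of how the level order determines the integer codes. The names 'species' and 'codes' are made up for illustration; as.integer() returns the position of each value's level in levels().

```r
# Illustrative sketch; 'species' and 'codes' are made-up names.
data(iris)

# Re-level without touching the original data frame.
species <- factor(iris$Species,
                  levels = c("versicolor", "virginica", "setosa"))

# The integer code of each observation is the index of its level.
codes <- as.integer(species)

# Cross-tabulate to see which level maps to which code.
print(table(species, codes))
```

With this level order, versicolor maps to 1 and setosa to 3, so the numeric conversion can be controlled after reading the data.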

cu
Philipp

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Histogram for each ID value

2011-10-17 Thread Philipp Pagel



 where the first column is the chromosome location and the second column is
 some value. What I'd like to do is have a histogram created for each chr
 location (i.e. a separate histogram for chr1, chr2, chr3, chr7, chr9, and
 chr22). I am just having a hard time getting everything to work out and am
 hoping for some suggestions.

ggplot and looping combined with traditional graphics have already been
mentioned, so I'll add the lattice solution for completeness:

library(lattice)
histogram(~foo | chromosome, dat)

This assumes that your dataframe is called dat and contains two
columns called foo (your numeric value) and chromosome (your
chromosome identifier).
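
A self-contained sketch of the lattice call, in case it helps to try it without the real data. The data frame below is simulated; only the column names foo and chromosome follow the assumption stated above.

```r
library(lattice)

# Simulated stand-in for the real data (values are random, not from the post).
set.seed(1)
dat <- data.frame(
  chromosome = factor(rep(c("chr1", "chr2", "chr3"), each = 50)),
  foo = rnorm(150)
)

# One histogram panel per chromosome.
p <- histogram(~ foo | chromosome, data = dat)
print(p)
```

At the interactive prompt the print() happens automatically; inside a script, loop or function it has to be explicit.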

cu
Philipp


Re: [R] Sweave doesn't work

2011-08-21 Thread Philipp Pagel

On Sun, Aug 21, 2011 at 09:18:25AM -0700, danielepippo wrote:

 Sweave("example.Rtex") in R it seems working

[...]

 * ...le/Desktop/dati/LaTeX1.Rtex*

Sounds like you are first running Sweave on the file 'example.Rtex'
and later LaTeX on 'LaTeX1.Rtex'.

Two points:

1) Why do these files have totally different names? If the Sweave file
was called 'example.Rtex' I'd expect the corresponding LaTeX file to
be 'example.tex'.

2) Why do both files have the same extension? Commonly, the Sweave
files are called 'something.Snw' or 'something.Rnw' and the resulting
LaTeX files would be 'something.tex'.
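
To illustrate the naming convention, here is a minimal sketch that writes a tiny Sweave file and runs Sweave() on it. The file name example.Rnw and its content are made up; the point is only that the output ends up as example.tex in the working directory, and that this .tex file is the one to hand to LaTeX.

```r
# Hypothetical minimal Sweave document written to a temporary directory.
rnw <- file.path(tempdir(), "example.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "1 + 1",
  "@",
  "\\end{document}"
), rnw)

# Sweave() writes its output into the current working directory.
old <- setwd(tempdir())
Sweave(rnw)
setwd(old)

# The LaTeX file carries the same base name with a .tex extension.
tex <- file.path(tempdir(), "example.tex")
file.exists(tex)
```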

cu
Philipp


Re: [R] Is this a bug for my fault?

2011-08-18 Thread Philipp Pagel

On Thu, Aug 18, 2011 at 04:52:58PM +0700, Rut S wrote:
 I tried to recode some complex multiple variables and run into a problem that
 r can change only some column that I want to change.
 
 I can reproduce the problem with this
 
 idfortest <- c(6,23,46,63,200,238,297,321,336,364,386,392,414,434,441)
 id <- seq(1:500)
 id[id==idfortest]
 
 the result showed 
 Warning in id == idfortest :
   longer object length is not a multiple of shorter object length
 [1] 200 386 434
 
 can you enlighten me for this, thank you in advance.

Others have already pointed out what the problem is. I'd like to add
that you are probably looking for the %in% operator.
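
A short sketch with the vectors from the question. The == comparison recycles the shorter vector element by element, which is where the warning comes from, whereas %in% tests membership. The name 'selected' is made up.

```r
idfortest <- c(6, 23, 46, 63, 200, 238, 297, 321, 336, 364, 386, 392, 414, 434, 441)
id <- 1:500

# Membership test: TRUE for every element of id that occurs in idfortest.
selected <- id[id %in% idfortest]
print(selected)
```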

cu
Philipp


Re: [R] write merged data frame to a file

2011-07-18 Thread Philipp Pagel

On Mon, Jul 18, 2011 at 04:00:29PM +0200, Andrea Franceschini wrote:
 
 I use version 13 of R in OSX (downloaded and installed less than 1 year ago).

Probably 2.13 ...

[...] code omitted

 The first lines are OK (i.e. 14 columns, like the dataframe), while at
 a certain point I get lines with only 3 columns !!!
 The bad lines that contain only 3 columns have the name and the
 description of the gene (i.e. the content of the file that I merged
 with).
 Besides, these strange lines also get repeated (see the bottom).

I haven't carefully analyzed your code so I may be wrong, but my guess
for all weird behaviour of gene-related data.frames is this:

Gene descriptions love to contain things like "Foo 5' obfuscation
factor".  Note the ' in the description, which read.table will
happily interpret as a quotation mark and eat lots of rows until it
happens to encounter a closing counterpart. This leads to all kinds of
funny results. So I bet your problem is not in write.table but in
reading the data. Have a closer look at your data frame: are you
really getting the expected number of observations in the merged
data.frame? Are the rows in question really ok in the data frame? If
my guess is correct you should be able to fix your problem by
including quote="" in both your read.table commands.

If that doesn't help, also try comment.char="" - another popular
source of problems.
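
A minimal sketch of the suggested fix, using a made-up tab-delimited snippet in place of the real gene file. It assumes a reasonably recent R where read.table accepts the text= argument; with a real file you would pass the file name instead.

```r
# Made-up example lines; two descriptions contain an apostrophe.
txt <- c("gene\tdescription",
         "g1\tFoo 5' obfuscation factor",
         "g2\tbar kinase",
         "g3\tbaz 3' binding protein")

# Disable quote and comment handling so ' and # are read literally.
dat <- read.table(text = txt, header = TRUE, sep = "\t",
                  quote = "", comment.char = "",
                  stringsAsFactors = FALSE)

# Sanity check: one row per data line.
nrow(dat) == length(txt) - 1
```

Re-running the call without quote="" on your own data and comparing nrow() is a quick way to check whether this is the cause.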

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] problem in reading a sequence file

2011-07-05 Thread Philipp Pagel

On Tue, Jul 05, 2011 at 02:06:02PM +0200, albert coster wrote:
  seqfile
 V1
 1 NNATTAAAGGGC
 
 I want only  NNATTAAAGGGC .

If I understand correctly, your file simply contains one string
(sequence) per line. In that case you may want to use scan() instead
of read.table but without more information it's hard to know.

Can you provide a very short example file (maybe 5 lines) and also
the output of str(foo) where foo is the variable you read the file
into?

Also: do you want a data frame with a single column? Or rather a
vector of strings? Something else? Does your file ONLY contain
sequences - or are there also identifiers, annotations etc.?

cu
Philipp


Re: [R] problem in reading a sequence file

2011-07-05 Thread Philipp Pagel


On Tue, Jul 05, 2011 at 04:53:32PM +0200, albert coster wrote:

I'm taking this back to the list so others can follow up.

 Yes, the file is consists of one string (sequence) per line.
 
 The files format is following: 
 
 Sequence
 NNATTAAAGGGC

OK - in that case (and as you want a vector anyway) you can use
scan('seq.txt', what=character())

 
  seqfile <- read.table("seq.txt")
 Warning message:
 In read.table("seq.txt") :
   incomplete final line found by readTableHeader on 'seq.txt'

OK - that means you don't have a newline ('\n') at the end of your
sequence file and read.table is warning you about that.

  str(seqfile)
 'data.frame': 2 obs. of  1 variable:
  $ V1: Factor w/ 2 levels "NNATTAAAGGGC",..: 2 1

This indicates that there are at least two lines in the file (so you
got two levels in the factor). So I would guess there is an empty line
before your sequence or you really have the word 'Sequence' on line 1.

For sequence data it probably does not make much sense to let R
convert to factor and a character column would be preferred. This can
be accomplished by using one of the options 'as.is',
'stringsAsFactors' or 'colClasses'.

If you use scan you'll need to get rid of the extra line first. If
you stick with read.table you can specify the first line as your
header line using the header=TRUE option. Now you can address column
'Sequence' as such. Example:

> dat <- read.table('seq.txt', as.is=TRUE, header=TRUE)
> dat$Sequence
[1] "NNATTAAAGGGC"
> dat[, 'Sequence']
[1] "NNATTAAAGGGC"
> str(dat)
'data.frame':   1 obs. of  1 variable:
 $ Sequence: chr "NNATTAAAGGGC"
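
Both routes can be tried on a throwaway copy of the file. The sketch below writes the two lines from the post to a temporary file; using skip=1 in scan() is just one possible way to get rid of the header line, and the variable names are made up.

```r
# Recreate the two-line file from the post in a temporary location.
seq_file <- tempfile(fileext = ".txt")
writeLines(c("Sequence", "NNATTAAAGGGC"), seq_file)

# Route 1: scan() into a character vector, skipping the header line.
seqs <- scan(seq_file, what = character(), skip = 1, quiet = TRUE)

# Route 2: read.table() with a header, keeping strings as character.
dat <- read.table(seq_file, header = TRUE, as.is = TRUE)

print(seqs)
str(dat)
```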

cu
Philipp


Re: [R] (no subject)

2011-06-27 Thread Philipp Pagel



On Sun, Jun 26, 2011 at 06:34:28PM -0700, Ungku Akashah wrote:
 hello.  I need some help about this R software. I've been searching
 for volcano plot(statistic) script for long, but still not found.
 May i request the script for volcano plot. If able, pls include any
 tips about volcano plot.


http://lmgtfy.com/?q=r+volcano+plot

cu
Philipp




Re: [R] Shrink file size of pdf graphics

2011-05-20 Thread Philipp Pagel

On Thu, May 19, 2011 at 01:35:51PM -0700, Layman123 wrote:

 I tried both, the plot devices in R and pdftk. First I tried the png-device,
 but as I wanted to increase the number of pixels with 'width' and 'height',
 the labels are getting smaller

When I really need a png, I usually produce a pdf or eps first and
then convert to png of the desired resolution with the convert command
of ImageMagick (but of course any other software, e.g. Photoshop,
should work fine, too). That way I don't have to figure out the
correct parameters to make the png the way I want it and I have the
additional benefit of a vector graphics master file that I can
easily use to produce additional versions in different resolutions
etc.
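
A rough sketch of that workflow from within R. The file names and the 300 dpi setting are arbitrary choices for illustration, and the conversion itself is left commented out because it needs ImageMagick's convert on the PATH.

```r
# Hypothetical file names in a temporary directory.
pdf_file <- file.path(tempdir(), "figure.pdf")
png_file <- file.path(tempdir(), "figure.png")

# Step 1: produce the vector master copy.
pdf(pdf_file, width = 6, height = 4)
plot(1:10, main = "vector master copy")
dev.off()

# Step 2: build the ImageMagick command; -density sets the raster resolution.
cmd <- paste("convert -density 300", shQuote(pdf_file), shQuote(png_file))
print(cmd)
# system(cmd)   # uncomment if ImageMagick is installed
```

Changing the -density value gives further versions at other resolutions from the same pdf.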

 Is there a way to do this with the gs-command so that it would be even more
 compressed?

Possibly, but of course there is a limit to how much you can compress
a file without resorting to lossy compression. You may have hit that
limit.

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] TR: Simulate keyboard

2011-05-16 Thread Philipp Pagel

On Mon, May 16, 2011 at 11:27:05AM +0200, Thibault Charles wrote:
 I cannot find a way to simulate a keyboard event as pressing the "enter"
 key.
 for (i in 1:nombre_fichiers_monteCarlo){
 system(paste('C:/Trnsys17/Exe/TRNExe.exe',liste_dck_monteCarlo[i]),wait=TRUE)
 }
 
 My problem is that at each step, trnsys ask the user to press "enter" from
 the keyboard and I would like not have to press myself on "enter".
 
 Does exist a function to simulate this kind of keyboard event ?

I don't think R can handle that (but I may be wrong). On a UNIX
platform, this kind of problem could be tackled with the expect
command. 

Your code above suggests you are on a Windows platform. I did a quick
google search and it seems that expect is available for windows as
part of the CYGWIN suite. And there also seems to be an expect for
windows from activestate:

http://docs.activestate.com/activetcl/8.5/expect4win/

cu
Philipp


Re: [R] TR: Simulate keyboard

2011-05-16 Thread Philipp Pagel

On Mon, May 16, 2011 at 01:50:15PM +0200, Thibault Charles wrote:
 Thank you for your response.
 
 In this case, do you think it is possible to write a little program in java
 which would execute my script and simulate a press on my keyboard ?

I think you need to do it the other way round: let R call an external
script that handles the press-enter business (expect/Java/whatever).

I still think expect would be easiest, but if you feel more
comfortable with Java, that's going to work as well.

Come to think of it: your external software is purely text based
(i.e. running in a DOS-box), isn't it? If it's not, and you are
actually getting message windows, 'expect' won't be much help but
there are several tools out there that will happily record your actions
(keyboard and mouse events alike) and play them back later: google for
'windows macro recorder' and you'll get more varieties than you will
care for.

cu
Philipp


Re: [R] RV: R question

2011-05-06 Thread Philipp Pagel

 which is the maximum large of digits that R has?, because SQL work
 with 50 digits I think. and I need a software that work  with a lot
 of digits.

The .Machine() command will provide some insight into these matters.

cu
Philipp



Re: [R] RV: R question

2011-05-06 Thread Philipp Pagel

On Fri, May 06, 2011 at 09:17:11AM -0400, David Winsemius wrote:
 
 On May 6, 2011, at 4:03 AM, Philipp Pagel wrote:
 The .Machine() command will provide some insight into these matters.
 
 On my device (and I suspect on all versions of R) .Machine is a
 built-in list and there is no .Machine() function.

Oops - my fault. You are right, of course.
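
For the record, a small sketch of how the list is used. The three components picked here are just examples; see ?.Machine for the full set.

```r
# .Machine is a list, so it is indexed, not called.
str(.Machine[c("integer.max", "double.eps", "double.digits")])

# Largest representable integer and the machine epsilon for doubles.
big <- .Machine$integer.max
eps <- .Machine$double.eps
print(c(big = big, eps = eps))
```

Doubles carry roughly 15 to 16 significant decimal digits, so 50-digit arithmetic is beyond what base R numerics offer.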

cu
Philipp


Re: [R] plot several histograms with same y-axes scaling using hist()

2011-04-29 Thread Philipp Pagel


On Fri, Apr 29, 2011 at 03:35:41AM -0700, hck wrote:
 Problem: hist()-function, scale = “percent”
[...]
 =Hist(na.exclude(AA3), breaks=50, col="seashell3",
 scale="percent", xlim=c(-1, 1), xlab="Bewertungsfehler",
 ylab="Haeufigkeit (in %)", main="KBV", border="white")

Before anyone can really help you'll need to let us know where your
Hist() function came from. 

hist() from package graphics does not have a scale parameter and
honours ylim without a problem.
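
A minimal sketch with hist() from package graphics and a shared y axis. The data are simulated and the names a, b, breaks and ymax are made up; the idea is to compute the counts first, without plotting, and then pass a common ylim.

```r
# Simulated stand-ins for the real variables.
set.seed(42)
a <- rnorm(200, sd = 0.3)
b <- rnorm(200, sd = 0.2)

# Common breaks so the bars are comparable across plots.
breaks <- pretty(range(c(a, b)), n = 20)

# Compute counts without plotting to find the common maximum.
ha <- hist(a, breaks = breaks, plot = FALSE)
hb <- hist(b, breaks = breaks, plot = FALSE)
ymax <- max(ha$counts, hb$counts)

# Plot both histograms with the same y-axis scaling.
op <- par(mfrow = c(1, 2))
hist(a, breaks = breaks, ylim = c(0, ymax), main = "a")
hist(b, breaks = breaks, ylim = c(0, ymax), main = "b")
par(op)
```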

cu
Philipp


Re: [R] read.csv fails to read a CSV file from google docs

2011-04-29 Thread Philipp Pagel

On Fri, Apr 29, 2011 at 06:19:24PM +0300, Tal Galili wrote:

 data_url <- 
 "http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"
 
 read.csv(data_url)
 Error in file(file, "rt") : cannot open the connection

I get the same error (R 2.11.1, Debian LINUX) and don't have a
solution. But I did some tests and found the origin of the problem.

I can download the file from google with wget but get some interesting
information in the process:


$ wget -v 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
--2011-04-29 20:07:40--  http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Resolving spreadsheets0.google.com... 209.85.148.139, 209.85.148.113, 209.85.148.138, ...
Connecting to spreadsheets0.google.com|209.85.148.139|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv [following]
--2011-04-29 20:07:41--  https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Connecting to spreadsheets0.google.com|209.85.148.139|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: “pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv.1”

    [ <=>                                   ] 41          --.-K/s   in 0s

2011-04-29 20:07:42 (342 KB/s) - “pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv.1” saved [41]


The message that caught my attention was the http redirection: 302 Moved
Temporarily.

If you try again with the new url you get this:

> read.csv(url("https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=trueg;"))
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") : unsupported URL scheme

?url told me: "Note that ‘https://’ connections are not supported."
Case closed, problem unsolved...

Dirty workaround: use system() and wget or whatever command is available on
Windows for this.
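
A rough sketch of that workaround, assuming wget is installed and on the PATH. The helper name fetch_csv and its arguments are made up for illustration; -q and -O are the usual wget options for quiet mode and for naming the output file.

```r
# Hypothetical helper: download with wget, then read the local copy.
fetch_csv <- function(url, dest = tempfile(fileext = ".csv")) {
  # system2() returns the exit status of the external command.
  status <- system2("wget", c("-q", "-O", shQuote(dest), shQuote(url)))
  if (status != 0) {
    stop("wget failed with status ", status)
  }
  read.csv(dest)
}

# Usage (not run here, needs network access and wget):
# dat <- fetch_csv(data_url)
```

Letting wget follow the redirect to https sidesteps the limitation of url() described above.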

cu
Philipp


Re: [R] bwlpot problems: printing, and tick labels

2011-04-28 Thread Philipp Pagel

 A. It produces empty JPEGs. When the 'bwplot' line alone is submitted, the
 plot duly shows up.

See FAQ:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-do-lattice_002ftrellis-graphics-not-work_003f

BTW: don't use jpg for plots if you can avoid it - they routinely look ugly.

 B. When the 'bwplot' line alone is submitted, y labels are values 1 to 6,
 not actual distinct values of y$maxthreads.

That's because maxthreads is not a factor - you can convert it to one.
See below.

 (C. I would, of course, prefer to produce plots for all distinct values of
 x$maxthreads in a single swoop, on a single figure). 

That's what I was about to suggest. Don't loop over the tasks - use
the power of lattice. I think this should be close to what you want:

bwplot(factor(maxthreads) ~ time | factor(tasks), x)
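
Putting the FAQ point and the one-liner together, here is a self-contained sketch that writes the plot to a file. The data frame is simulated (only the column names follow your post) and pdf() is my choice of device; the explicit print() is what the FAQ entry is about.

```r
library(lattice)

# Simulated stand-in for the real timing data.
set.seed(1)
x <- data.frame(
  tasks = rep(c(10, 20), each = 30),
  maxthreads = rep(rep(c(1, 2, 4), each = 10), times = 2),
  time = runif(60)
)

# Open a file device, print the trellis object explicitly, close the device.
out_file <- file.path(tempdir(), "bwplot.pdf")
pdf(out_file)
print(bwplot(factor(maxthreads) ~ time | factor(tasks), data = x))
dev.off()
```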

cu
Philipp



[R] read.table: fill=T for header?

2011-04-27 Thread Philipp Pagel


Dear ExpeRts,

I am trying to read tab-delimited data produced by somewhat brain-dead
software that seems to think it's a good idea to have an extra tab
character after the last column - except for the header line. As
explained in the help page, read.delim now assumes that the first
column contains the row.names (which is not even wrong) but now all
col.names get shifted by one column. Example:

infile <- 'sample\tx1\n1\tA\t\n2\tB\t\n3\tA\t'
read.delim(textConnection(infile))
  sample x1
1      A NA
2      B NA
3      A NA

So I set row.names to NULL because the man page said "Using
‘row.names = NULL’ forces row numbering.". Now the row.names really
are numbered automatically but I get a bonus column:

read.delim(textConnection(infile), row.names=NULL)
  row.names sample x1
1         1      A NA
2         2      B NA
3         3      A NA

Hm - not what I want. I am also a bit puzzled why the extra column is
introduced instead of just using the first col.name. At the moment I
deal with it by fixing the col.names and dumping the extra column:

dat <- read.delim(textConnection(infile), row.names=NULL)
colnames(dat) <- colnames(dat)[-1]
dat <- dat[-ncol(dat)]
dat
  sample x1
1      1  A
2      2  B
3      3  A

I worked my way through ?read.delim but could not find an option to
deal with these (flawed) files directly. As the opposite situation
(i.e. more col.names than data) can be fixed with fill=T I was hoping
something like fill.header=T or fill='header' may exist.  Did I just
not find it or does it not exist?  And if it doesn't - does anyone
else think it would be a nice item for the wishlist?
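
For comparison, a sketch of another way around the problem: read the header line separately and the data with header=FALSE, then keep only as many columns as there are header fields. This is not an existing read.delim option, just a workaround; it assumes an R version where read.delim accepts text=, and the variable names are made up.

```r
infile <- "sample\tx1\n1\tA\t\n2\tB\t\n3\tA\t"

# Stand-in for readLines() on the real file.
lines <- strsplit(infile, "\n", fixed = TRUE)[[1]]

# Column names come from the first line only.
header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]

# Read the data lines without a header; the trailing tab yields an extra column.
dat <- read.delim(text = lines[-1], header = FALSE, stringsAsFactors = FALSE)

# Keep as many columns as there are header fields and name them.
dat <- dat[seq_along(header)]
names(dat) <- header
print(dat)
```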

cu
Philipp



Re: [R] Simple Missing cases Function

2011-04-19 Thread Philipp Pagel

On Tue, Apr 19, 2011 at 03:29:08PM +0800, Tim Elwell-Sutton wrote:
 Dear all
 
  
 
 I have written a function to perform a very simple but useful task which I
 do regularly. It is designed to show how many values are missing from each
 variable in a data.frame. In its current form it works but is slow because I
 have used several loops to achieve this simple task. 

Why not use summary?

> foo <- data.frame(a=c(1,3,4,NA), b=c(NA,4,NA,8),
+                   c=factor(c('A', NA, 'A', 'B')))
> summary(foo)
       a               b         c
 Min.   :1.000   Min.   :4   A   :2
 1st Qu.:2.000   1st Qu.:5   B   :1
 Median :3.000   Median :6   NA's:1
 Mean   :2.667   Mean   :6
 3rd Qu.:3.500   3rd Qu.:7
 Max.   :4.000   Max.   :8
 NA's   :1.000   NA's   :2

cu
Philipp


Re: [R] Simple Missing cases Function

2011-04-19 Thread Philipp Pagel

On Tue, Apr 19, 2011 at 03:29:08PM +0800, Tim Elwell-Sutton wrote:
 Dear all
 
  
 
 I have written a function to perform a very simple but useful task which I
 do regularly. It is designed to show how many values are missing from each
 variable in a data.frame. In its current form it works but is slow because I
 have used several loops to achieve this simple task. 

Oh - and in case you ONLY want the number of NAs in each column this
should be pretty efficient:

lapply(foo, function(x){sum(is.na(x))})
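
A runnable sketch using the foo data frame from my previous mail. sapply() and colSums() are just two equivalent spellings that return a named vector instead of a list; the names na_counts and na_counts2 are made up.

```r
foo <- data.frame(a = c(1, 3, 4, NA),
                  b = c(NA, 4, NA, 8),
                  c = factor(c("A", NA, "A", "B")))

# Same idea as the lapply() call, simplified to a named vector.
na_counts <- sapply(foo, function(x) sum(is.na(x)))

# Alternative: is.na() on a data frame gives a logical matrix.
na_counts2 <- colSums(is.na(foo))

print(na_counts)
```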

cu
Philipp


Re: [R] (no subject)

2011-04-18 Thread Philipp Pagel

On Mon, Apr 18, 2011 at 04:11:57PM +0530, Ramnath R wrote:
 Hai
 
  From which CRAN mirror can get the package "LPP2005REC"?

As the first hit of a google search for "LPP2005REC" told me, it
is not a package but a dataset in package timeSeries.

cu
Philipp


Re: [R] Clearing Console; of weeks of codes!

2011-04-14 Thread Philipp Pagel

 I do see I have weeks of codes in my console when I check with my arrow up
 keys.  I have been clearing them with Control L but it seems to clear it
 clear the screen temporally.

CTRL-L simply clears the screen and not the history.

 I do see the previous codes again when I open R
 the next day, after quitting the session! 
 
 Q:
 How do I clear this?

What you are seeing is the R history which is stored in the file
.Rhistory in the current working directory when the session is closed
or savehistory() is used. Deleting that file before starting R will
clear the history. I am not sure you can clear the history of a
running R session. Deleting the file will not work while the session
is open because the history is in memory at that time and I am not
aware of a command to manipulate the current history.

The environment variable R_HISTSIZE can be used to control the size
of the history.

see ?history for details.
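
If you want to script the deletion, something along these lines could work. The function name clear_history_file is made up, and it only removes the file on disk; as said above, it does not touch the in-memory history of a running session.

```r
# Hypothetical helper: remove the .Rhistory file in a given directory.
clear_history_file <- function(dir = getwd()) {
  hist_file <- file.path(dir, ".Rhistory")
  if (file.exists(hist_file)) {
    file.remove(hist_file)   # TRUE on success
  } else {
    FALSE                    # nothing to delete
  }
}

# Usage: clear_history_file("path/to/project")
```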

cu
Philipp


Re: [R] Hash table...

2011-04-14 Thread Philipp Pagel

On Thu, Apr 14, 2011 at 06:44:53PM +1200, Worik R wrote:
 To improve the efficiency of a process I am writing I would like to cache
 results.  So I would like a data structure like a hash table.
 
 So if I call Z <- f(Y)  I can cache Z associated with Y: CACHE[Y] <- Z
 
 I am stumped.  I expected to be able to use a list for this but I cannot
 figure how

If y is an integer, factor or string you could try something along these
lines:

cache <- list()
y <- 12
cache[[as.character(y)]] <- sqrt(y)
y <- 98
cache[[as.character(y)]] <- sqrt(y)
cache

$`12`
[1] 3.464102

$`98`
[1] 9.899495

Of course this can get you in trouble if y is a floating point
number because of the issues with identity of such numbers, as
discussed in ?all.equal and FAQ 7.31 "Why doesn't R think these
numbers are equal?".
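
An environment is another option that behaves more like a hash table. The sketch below is one possible way to wrap the idea; cached_sqrt is a made-up name and sqrt() stands in for the expensive function f.

```r
# Environments can be hashed, so lookups by name are fast.
cache <- new.env(hash = TRUE)

# Hypothetical wrapper: compute once, then reuse the stored result.
cached_sqrt <- function(y) {
  key <- as.character(y)
  if (!exists(key, envir = cache, inherits = FALSE)) {
    assign(key, sqrt(y), envir = cache)
  }
  get(key, envir = cache, inherits = FALSE)
}

cached_sqrt(12)
cached_sqrt(98)
cached_sqrt(12)   # served from the cache this time
ls(cache)
```

The same caveat about floating point keys applies, since the key is still built with as.character().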

cu
Philipp


Re: [R] for loop performance

2011-04-14 Thread Philipp Pagel

 I am running some simulations in R involving reading in several
 hundred datasets, performing some statistics and outputting those
 statistics to file. I have noticed that it seems that the time it
 takes to process of a dataset (or, say, a set of 100 datasets) seems
 to take longer as the simulation progresses.

Reading data, e.g. with read.table can be slow because it does a fair
bit of checking content, guessing data types etc. So I guess the
question is: how is your data stored (files, in what format,
database) and how do you read it into R? 

Once we know this there may be tricks to speed up the data import.

 I am curious to know if this has to do with how R processes
 code in loops or if it might be due to memory usage issues (e.g.,
 repeatedly reading data into the same matrix).

Probably not - I would guess it's the parsing of the input data that
is slow.
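
One commonly used trick, sketched under the assumption that the data are plain tab-delimited text files: tell read.table the column types and the number of rows up front so it does not have to guess. The file and its two columns are simulated here.

```r
# Write a small simulated data file.
f <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:1000, x = runif(1000)), f,
            sep = "\t", row.names = FALSE, quote = FALSE)

# Declaring colClasses and nrows avoids type guessing and re-allocation.
dat <- read.table(f, header = TRUE, sep = "\t",
                  colClasses = c("integer", "numeric"),
                  nrows = 1000, comment.char = "")
str(dat)
```

Wrapping the read step and the analysis step in system.time() separately would show where the time actually goes.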

cu
Philipp


Re: [R] Clearing Console; of weeks of codes!

2011-04-14 Thread Philipp Pagel


Please reply to the list, so the OP and others following the thread
can see your contributions. I'm taking this back to r-help.

On Thu, Apr 14, 2011 at 01:43:31AM -0700, Mohammad Tanvir Ahamed wrote:
 you can try it ...
 
 rm(list=ls())

No - this has been suggested before and saying it again does not make
it less wrong: rm() will delete objects from the workspace but has
absolutely no effect on the history.

cu
Philipp


 /...Tanvir Ahamed
 
 
 
 ━━━
 From: Philipp Pagel p.pa...@wzw.tum.de
 To: r-help@r-project.org
 Sent: Thursday, April 14, 2011 10:23 AM
 Subject: Re: [R] Clearing Console; of weeks of codes!
 
  I do see I have weeks of codes in my console when I check with my arrow
 up
  keys.  I have been clearing them with Control L but it seems to clear it
  clear the screen temporally.
 
 CTRL-L simply clears the screen and not the history.
 
  I do see the previous codes again when I open R
  the next day, after quitting the session!
 
  Q:
  How do I clear this?
 
 What you are seeing is the R history which is stored in the file
 .Rhistory in the current working directory when the session is closed
 or savehistory() is used. Deleting that file before starting R will
 clear the history. I am not sure you can clear the history of a
 running R session. Deleting the file will not work while the session
 is open because the history is in memory at that time and I am not
 aware of a command to manipulate the current history.
 
 The environment variable R_HISTSIZE can be used to control the size
 of the history.
 
 see ?history for details.
 
 cu
 Philipp
 
 
 
 


Re: [R] for loop performance

2011-04-14 Thread Philipp Pagel

On Thu, Apr 14, 2011 at 06:50:56AM -0500, Barth B. Riley wrote:
 
 Thank you Phillip for your post. I am reading in:
 
 1. a 3 x 100 item parameter file (floating point and integer data)
 2. a 100 x 1000 item response file (integer data)
 3. a 6 x 1000 person parameter file (contains simulation condition
 information, person measures)
 
 4. I am then computing several statistics used in subsequent ROC
 analyses, the AUCs being stored in a 6000 x 15 matrix of floating
 point numbers
 
 I am using read.table for #1-#3 and write.table for #4. The process
 of reading files (#1-#3) and writing to file is done over 6,000
 iterations.

A few ideas:

1) try to use the colClasses argument to read.table. That way R will
not have to guess the data type of columns.

2) When you say 6000 iterations - do you mean you are reading/writing the SAME
files over and over again? Or do you have 6000 sets of files? In the
former case the obvious advice would be to only read them once.

3) If the input files were generated in R, another option would be to
save()/load() them rather than using write.table()/read.table(). 

4) If they came from some other application, possibly storing
everything in a database may speed up things.

5) Is your data on a file server? If yes: try moving it to the local
disc temporarily to see if network i/o is limiting your speed.

6) Whatever you try to improve performance - measure the effects
rather than rely on your impression (system.time, Rprof, ...) in order
to find out what part of the program is actually eating up the most
time.
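
Points 1, 3 and 6 can be tried out together in a small sketch. The data
below is a made-up stand-in for one 100 x 1000 response file, and the
file names are temporary files, not the OP's actual files:

```r
# Hypothetical stand-in for one 100 x 1000 item response file
resp <- as.data.frame(matrix(sample(0:1, 100 * 1000, replace = TRUE), nrow = 100))
txt_file <- tempfile(fileext = ".txt")
rda_file <- tempfile(fileext = ".RData")
write.table(resp, txt_file, row.names = FALSE)
save(resp, file = rda_file)

# 1) let read.table guess the column types vs. telling it up front
t_guess <- system.time(a <- read.table(txt_file, header = TRUE))
t_typed <- system.time(b <- read.table(txt_file, header = TRUE,
                                       colClasses = "integer"))

# 3) binary format written by save(), restored into a separate environment
e <- new.env()
t_load <- system.time(load(rda_file, envir = e))

# 6) compare measured elapsed times instead of relying on impressions
timings <- c(guess = t_guess[["elapsed"]],
             typed = t_typed[["elapsed"]],
             load  = t_load[["elapsed"]])
print(timings)
```

The absolute numbers depend on the machine and on the file size, so it is
worth repeating the measurement with files of realistic dimensions.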

cu
Philipp


Re: [R] Compatibility with Work Load/Resource Managers

2011-04-13 Thread Philipp Pagel

 I was wondering if anyone knew whether R is capable of integrating
 with the following work load/resource managers TORQUE, OpenPBS, PBS
 Pro, LSF, and SGE? 

I am running R scripts in our cluster under SGE on a regular basis and
have also done that under Platform LSF in the past but I am not sure
what you mean by integrating with these systems.

cu
Philipp


Re: [R] xyplot, groups and colors

2011-04-11 Thread Philipp Pagel

On Fri, Apr 08, 2011 at 08:14:21AM -0700, Dennis Murphy wrote:

Thanks to everyone who replied! Especially this and the ggplot advice
did what I wanted.

 xyplot(circumference~age, dat, groups=Tree, type='l',
   col.line = c('red', 'blue', 'blue', 'red', 'red'))

This is essentially what I had been doing after somehow creating the
correct color vector.

 After a little more fiddling around, this also works, and seems a bit less
 kludgy:
 
 dat$group2 <- factor(dat$group, labels = c('red', 'blue'))
 xyplot(circumference~age, dat, groups=Tree, type='l',
   col.line = levels(dat$group2))

Perfect! Using the levels directly had not occurred to me. Thanks!
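
For reference, one way to build the color vector without typing it by
hand is sketched below. It is an illustration, not the code from the
thread: the grouping of trees 1, 4 and 5 into 'A' and the names
group_cols, tree_group and tree_cols are assumptions. The point it relies
on is that lattice assigns colors to the levels of the groups variable in
level order.

```r
library(lattice)

dat <- Orange
dat$group <- ifelse(dat$Tree %in% c('1', '4', '5'), 'A', 'B')

# one color per group (assumed mapping)
group_cols <- c(A = 'red', B = 'blue')

# group of each tree, in the order of levels(dat$Tree)
tree_group <- tapply(dat$group, dat$Tree, function(g) g[1])
tree_cols  <- unname(group_cols[as.character(tree_group)])

# one line per tree, colored by the group the tree belongs to
p <- xyplot(circumference ~ age, dat, groups = Tree, type = 'l',
            col.line = tree_cols)
print(p)
```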

cu
Philipp


Re: [R] list to data frame

2011-04-10 Thread Philipp Pagel

On Sun, Apr 10, 2011 at 06:01:39PM +, Franklin Tamborello II wrote:
 I need to make a data frame out of the data that I currently have in
 a list. This works, but is ugly:
 ineffData-rbind(ineffFilesList[[1]], ineffFilesList[[2]],
 ineffFilesList[[3]], ineffFilesList[[4]], ineffFilesList[[5]],
 ineffFilesList[[6]], ineffFilesList[[7]], ineffFilesList[[8]],
 ineffFilesList[[9]], ineffFilesList[[10]], ineffFilesList[[11]],
 ineffFilesList[[12]], ineffFilesList[[13]], ineffFilesList[[14]],
 ineffFilesList[[15]], ineffFilesList[[16]], ineffFilesList[[17]],
 ineffFilesList[[18]], ineffFilesList[[19]], ineffFilesList[[20]],
 ineffFilesList[[21]], ineffFilesList[[22]], ineffFilesList[[23]],
 ineffFilesList[[24]], ineffFilesList[[25]], ineffFilesList[[26]],
 ineffFilesList[[27]])
 
 
 What's an efficient way of doing this such that the computer will do
 the work of recurring through the list of elements of
 ineffFilesList?

as.data.frame(ineffFilesList)
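
If the list elements are themselves data frames with identical columns
(which the rbind() call above suggests, but which is an assumption here),
stacking them row-wise can be sketched with do.call(). The small list
below is a made-up stand-in for the 27 elements:

```r
# hypothetical stand-in for ineffFilesList: data frames with the same columns
ineffFilesList <- list(
  data.frame(id = 1:2, rt = c(0.51, 0.72)),
  data.frame(id = 3:4, rt = c(0.64, 0.58)),
  data.frame(id = 5:6, rt = c(0.49, 0.91))
)

# equivalent to rbind(ineffFilesList[[1]], ineffFilesList[[2]], ...)
ineffData <- do.call(rbind, ineffFilesList)
```

do.call() passes every list element as a separate argument to rbind(), so
the length of the list does not need to be known in advance.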


cu
Philipp


[R] xyplot, groups and colors

2011-04-08 Thread Philipp Pagel


Dear ExpeRts,

I am trying to plot a bunch of growth curves and would like to get
some more control over groups and line colors than I seem to have.

Example:

# make some data (xyplot comes from the lattice package)
library(lattice)
dat <- Orange
dat$group <- ifelse(dat$Tree %in% c('1','4','5'), 'A', 'B')

# plot
xyplot(circumference~age, dat, groups=group)

# now use lines to make the growth curve more visible
xyplot(circumference~age, dat, groups=group, type='l')
# ugly, because of the 'return' lines

# to fix this set groups to Tree
xyplot(circumference~age, dat, groups=Tree, type='l')
# better, but now each Tree has its own color

Of course I can now use the col argument to manually assign the colors
by group but is there a more elegant way that I missed?

cu
Philipp


Re: [R] force output dimension of table function

2011-04-07 Thread Philipp Pagel

On Thu, Apr 07, 2011 at 05:37:08AM +0200, fisken wrote:
 When I use the 'table' function on a simple vector it counts the
 number of occurences.
 So depending on the values of my input vector the function returns a
 class of type table with different lengths.
 
 Is there an easy way to tell the table function, the values to expect?
 
 And what I wanted was
 
 0 1 2 3 4 5
 0 1 1 1 0 2

The solution using factors has already been posted. If you are really
interested in integers only you could also use tabulate():

 tabulate(s)
[1] 1 1 1 0 2

Note that this excludes zero, though.
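
Both variants side by side, as a small sketch. The vector s is an
assumption reconstructed from the desired output quoted above; the
original post's data is not shown here.

```r
s <- c(1, 2, 3, 5, 5)   # assumed input that matches the desired counts

# factor with explicit levels: table() reports every level, including zeros
counts <- table(factor(s, levels = 0:5))

# tabulate() counts the positive integers 1..nbins only, so 0 is not covered
tab <- tabulate(s, nbins = 5)
```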

cu
Philipp


Re: [R] function order

2011-04-06 Thread Philipp Pagel

On Wed, Apr 06, 2011 at 11:35:32AM +0100, Yan Jiao wrote:
 abc <- cbind(c(1,6,2),c(2,5,3),c(3,2,1))  ## matrix I want to sort
 
 if I do
 abc[ order(abc[,3]), increasing = TRUE]

Jim already pointed out that the argument needs to go inside the
parentheses of the order function. In addition, order has an argument
called 'decreasing', but none called 'increasing'. Finally, you are
lacking a comma in your subsetting of the matrix:

 abc[ order(abc[,3], decreasing=F)]
[1] 2 6 1

But you probably mean:

 abc[ order(abc[,3], decreasing=F), ]
     [,1] [,2] [,3]
[1,]    2    3    1
[2,]    6    5    2
[3,]    1    2    3
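
The same matrix can be used to sketch the reverse direction; the names
asc and desc are only for illustration:

```r
abc <- cbind(c(1, 6, 2), c(2, 5, 3), c(3, 2, 1))

# rows sorted by the third column, smallest first (the default)
asc <- abc[order(abc[, 3]), ]

# largest first: 'decreasing' belongs inside order(), the comma selects rows
desc <- abc[order(abc[, 3], decreasing = TRUE), ]
```

Because the third column of abc is already in decreasing order, desc is
identical to abc itself and asc is abc with its rows reversed.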

cu
Philipp



Re: [R] Saving console and graph output to same file

2011-04-05 Thread Philipp Pagel

On Tue, Apr 05, 2011 at 10:53:03AM +0530, Nikhil Abhyankar wrote:
 Hello All,
 
 How do I save the output of the R console and the graphic output to the same
 PDF file and append these to each other?
 
 I need to have a frequency table and a corresponding graph, one below the
 other in a file. I have tried with sending the cross table to the graph
 window using 'textplot' and then saving the graphic output. However, the
 table does not look nice in the graph output.
 
 Is there any way the output from the console can be saved in a file and then
 the output from the graph window be appended to the same file?

Sweave and odfWeave are very nice methods for generating reports that
combine text, R code, results from R and graphics.

cu
Philipp


Re: [R] one question about bioconductor

2011-03-31 Thread Philipp Pagel

On Thu, Mar 31, 2011 at 09:32:06AM -0700, wang peter wrote:
 dear lady and gentalmen:
   i am gaoshan from kansas university.
 i used such coding to deal with gel data
 
 data <- ReadAffy()
  Warning messages:
 1: In file(out, wt) :
   cannot open file
 'C:\Users\gaoshan\AppData\Local\Temp\RtmpvsyXOV\Rhttpd3f0b2e85': No such
 file or directory

As the message says: there is something wrong with the path. 

In order to get more helpful replies, you should show the actual code
you used and also give a hint about the specific packages you were
using. E.g. ReadAffy most certainly requires at least a filename which
seems to be missing from your command above. 

In addition, I recommend to post your question on the bioconductor
mailing list.

cu
Philipp



Re: [R] read password-protected files

2011-03-31 Thread Philipp Pagel

On Thu, Mar 31, 2011 at 11:00:48AM -0700, Shi, Tao wrote:
 Hi list,
 
 I have a bunch of .csv files that are password-protected.  I wonder if there 
 is 
 a way to read them in in R without manually removing the password protection 
 for 
 each file?

I doubt that there is such a thing as a password protected csv file.
They are just text files, after all. So I guess you have something
else. How, or with what, was the presumed password protection applied?

cu
Philipp


Re: [R] Not all rows are being read-in

2011-03-30 Thread Philipp Pagel

On Tue, Mar 29, 2011 at 06:58:59PM -0400, Dimitri Liakhovitski wrote:
 I have a tab-delimited .txt file (size 800MB) with about 3.4 million
 rows and 41 columns. About 15 columns contain strings.
 Tried to read it in in R 2.12.2 on a laptop that has Windows XP:
 mydata <- read.delim(file="FileName.TXT", sep="\t")
 R did not complain (!) and I got: dim(mydata) 1692063 41.

My guess would be that there are (unexpected) quotes and/or double quotes in
your file and so R thinks that rather large blocks of your file are
actually very long strings. This routinely happens in situations like
this:

ID  x   description
1 0.4   my first measurement
2 1.6   Normal 5" object
3 0.4   Some measurement
4 0.7   A 4" long sample

R thinks that the description in row 2 ends in row 4 and you lose
data.

Try read.delim(..., quote="").
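
The effect can be reproduced with a tiny in-memory file. This is a
sketch with made-up data in which the inch marks play the role of the
unexpected double quotes; the names txt, bad and good are only for
illustration:

```r
# four tab-separated data rows, two of which contain a stray double quote
txt <- paste(
  "ID\tx\tdescription",
  "1\t0.4\tmy first measurement",
  "2\t1.6\tNormal 5\" object",
  "3\t0.4\tSome measurement",
  "4\t0.7\tA 4\" long sample",
  sep = "\n"
)

# default quote = "\"": everything between the two quote characters is
# read as a single string, so rows are swallowed
bad <- read.delim(text = txt)

# quoting disabled: every line is one row
good <- read.delim(text = txt, quote = "")

c(default = nrow(bad), no_quoting = nrow(good))
```

Comparing the row count of the result with the number of lines in the
file is a quick check for this kind of silent data loss.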

cu
Philipp


Re: [R] Using graphics straight from R into published articles

2011-03-30 Thread Philipp Pagel

On Wed, Mar 30, 2011 at 08:48:55AM +, ONKELINX, Thierry wrote:
 Large snip.
  
  Absolutely vector - no jpeg, png, ... although it takes 
 
 That depends on the kind of graph. I agree that you should try
 vector at first. But when it generates very large files (e.g.
 scatterplots with thousands of points) then you better switch to
 bitmaps like tiff or png. Jpeg can create artefacts, so is not very
 good for graphics.

True. Sometimes one can get away with switching from a normal
scatterplot to hexbin or something like this but if that is not
an option, a high resolution tiff or png is the way out.

And of course, I agree that jpeg should never be used for graphs.

cu
Philipp


Re: [R] Using graphics straight from R into published articles

2011-03-30 Thread Philipp Pagel

On Wed, Mar 30, 2011 at 09:56:09AM -0700, blanco wrote:
 Wow - thanks all for your helpful replies.  Awesome forum.
 
 Am I right to assume that you use the postscript function to create .ps  and
 .pdf files from R?

almost:

postscript(..., onefile=FALSE) # for eps
pdf() # for PDF

And don't forget to close the device with dev.off() after the plot.
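
A complete round trip for both devices might look like the sketch below.
The file names are temporary files and the figure size is an arbitrary
choice. For EPS output the postscript() help page additionally asks for
horizontal=FALSE and paper="special".

```r
eps_file <- tempfile(fileext = ".eps")
pdf_file <- tempfile(fileext = ".pdf")

# EPS: one plot per file, portrait orientation, paper size taken from width/height
postscript(eps_file, onefile = FALSE, horizontal = FALSE,
           paper = "special", width = 5, height = 4)
plot(1:10, main = "EPS output")
dev.off()   # the file is only complete once the device is closed

# PDF
pdf(pdf_file, width = 5, height = 4)
plot(1:10, main = "PDF output")
dev.off()
```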

cu
Philipp


Re: [R] Reversing order of vector

2011-03-29 Thread Philipp Pagel

On Tue, Mar 29, 2011 at 12:20:50AM -0700, Vincy Pyne wrote:
 
 vect1 = as.character(c("ABC", "XYZ", "LMN", "DEF"))

as.character is unnecessary here.

  vect1
 [1] "ABC" "XYZ" "LMN" "DEF"
 
 I want to reverse the order of this vector as
 
 vect2 = c("DEF", "LMN", "XYZ", "ABC")

vect2 <- rev(vect1)

cu
Philipp


Re: [R] producing histogram-like plot

2011-03-29 Thread Philipp Pagel

On Tue, Mar 29, 2011 at 11:05:08AM +0200, Karin Lagesen wrote:
 Hi!
 
 I have a dataset that looks like this:
 
 0.0   14
 0.0   3
 0.9   12
 ...and so on.
 
 I would like to plot this in a histogram-like manner.

One way would be to re-create the original data and then simply use
hist:

dat <- data.frame(x=c(0,0,0.9,0.73,0.78,1,0.3,0.32),
                  freq=c(14,3,12,15,2,15,2,8))
hist(with(dat, rep(x, times=freq)))

My example did not take special binning wishes into account but you
can easily customize that with the breaks argument to hist.
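
As a sketch of the breaks argument, using the same example data and an
arbitrary bin width of 0.25 (plot=FALSE only returns the counts, which
makes the binning easy to inspect):

```r
dat <- data.frame(x    = c(0, 0, 0.9, 0.73, 0.78, 1, 0.3, 0.32),
                  freq = c(14, 3, 12, 15, 2, 15, 2, 8))

# expand the frequency table back into raw observations
values <- with(dat, rep(x, times = freq))

# explicit bin borders; drop plot = FALSE to draw the histogram
h <- hist(values, breaks = seq(0, 1, by = 0.25), plot = FALSE)
h$counts
```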

cu
Philipp


Re: [R] Using graphics straight from R into published articles

2011-03-29 Thread Philipp Pagel

On Tue, Mar 29, 2011 at 09:31:18AM -0700, blanco wrote:
 I was just wondering if people use graphics from R straight into articles or
 are they always edited  in some way; fonts, headers, axis, color etc?  Using
 photoshop or some other programs?
 
 I would like to think it is possible, better and more profession to do it
 all in R.
 I tried google and the search option but found nothing on the topic.  
 
 What are the experiences for all the professionals out there that use R?
 Are there any articles on this specific subject?

I'm not aware of any articles on the topic but I can share what I do:

95% of the time I tweak various graphics parameters in R and see no
necessity for postprocessing in other applications.

In about 5% I do some manual editing for a camera ready figure.
These are usually the result of exotic requests from referees. But
under no circumstances would I use Photoshop or any other pixel
graphics software for this. My R graphics are always created as eps or
pdf vector graphics and any editing is done with a proper vector
graphics software (Illustrator or Inkscape).

I share your feeling that it is better to do as much as possible in R
because it means that I won't have to do it again if I need to produce
another revision of the figure - all it takes is another run of my
script. And I can re-use good solutions in the future. Any manual
touch-ups have to be done manually every single time = not my idea of
efficiency.

cu
Philipp


Re: [R] read.xls - rotate data.frame

2011-03-25 Thread Philipp Pagel

On Fri, Mar 25, 2011 at 11:43:31AM +0100, Knut Krueger wrote:
 Hi to all,
 how could I  to rotate automatically a data sheet which was imported
 by read.xls?
 
  x1 x2 x3  xn
 y1 1   4  7   ...  xn/y1
 y2 2   5  8    xn/y2
 y3 3   6  9xn/y2
 yn ... ... ... Xn/Yn
 
 
 to
 
   y1 y2  y3     yn
 x1  1  23 . Yn/x1
 x2  4  56   Yn/x2
 x3  7  8   9    Yn/x2
 xn  ...   ...  ...   .  Yn/xn

If all the columns (x) are of the same type (e.g. all numeric) you can
use t(). Example:

dat <- data.frame(x1=1:10, x2=(1:10)*2, x3=10:1)
dat2 <- as.data.frame(t(dat))

If the columns are of different types (e.g. some numeric, some
factors) I don't think you can do this at all, because columns of a
data.frame represent vectors, i.e. all values in a column need to be
of the same type.
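
What happens in practice can be sketched with two small made-up data
frames. t() converts its argument with as.matrix() first, so mixed
columns are not rejected but silently coerced to a common type:

```r
num   <- data.frame(x1 = 1:3, x2 = c(2, 4, 6))
mixed <- data.frame(x1 = 1:3, x2 = c("a", "b", "c"))

t_num   <- t(num)     # numeric matrix, safe to turn back into a data.frame
t_mixed <- t(mixed)   # character matrix: the numbers have become strings

dim(t_mixed)          # 2 rows (old columns), 3 columns (old rows)
```

So the transposition itself runs, but the numeric information in a mixed
data.frame is lost as numbers, which is usually not what is wanted.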

cu
Philipp


Re: [R] read.xls - rotate data.frame

2011-03-25 Thread Philipp Pagel

 Unfortunately we have mixed types f.e text , dates times , and numbers

OK - in that case you can't fit the data into a data.frame. Possibly
you could get what you need using some kind of list structure but I
think it's better to ask why you need to transpose the data. Maybe
someone can suggest an alternative solution that doesn't require the
transposition.

cu
Philipp


Re: [R] read.xls - rotate data.frame

2011-03-25 Thread Philipp Pagel

 we have (imported from excel)
 
 frame <- 
 data.frame(x0=c("y1","y2","y3","y4"),x1=c(1,2,3,4),x2=c(5,6,7,8),x1=c(9,10,11,12))
 where y1..yn are the names of the rows
 we need  frame$x1   .. .  frame$xn
 and  frame[1,] .. frame[n,]  but the first column is no the rownames.
 
 if it is possible to rotate the whole dataset we could use
 
 frame$y1 ..frame$y2

I am not 100% sure I understood what you intend to do but I think what
you are saying is that you would like to address certain rows by name
rather than by index. Is that correct?

If so you could solve it like this:

# assign the desired row names
rownames(frame) = frame[,1]
# remove the old name column
frame <- frame[,2:ncol(frame)]

cu
Philipp


Re: [R] read.xls - rotate data.frame

2011-03-25 Thread Philipp Pagel

 I am not 100% sure I understood what you intend to do but I think what
 you are saying is that you would like to address certain rows by name
 rather than by index. Is that correct?
 
 If so you could solve it like this:
 
 # assign the desired row names
 rownames(frame) = frame[,1]
 # remove the old name column
 frame <- frame[,2:ncol(frame)]

And adding to my own posting:

removing the column can be done more elegantly:

frame <- frame[,-1]

And I forgot to mention that now you can say things like

frame['y2',]

The frame$y2 notation still only works for columns, of course.
Maybe, if you tell us some more about your actual analysis, 
more help can be provided.
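
Put together as a self-contained sketch. The data frame is a made-up
stand-in for the imported sheet; the last column is called x3 here, which
is an assumption:

```r
frame <- data.frame(x0 = c("y1", "y2", "y3", "y4"),
                    x1 = 1:4, x2 = 5:8, x3 = 9:12)

rownames(frame) <- frame[, 1]   # first column becomes the row names
frame <- frame[, -1]            # drop the now redundant name column

row_y2  <- frame["y2", ]        # a whole row, addressed by name
cell    <- frame["y2", "x2"]    # a single value by row and column name
col_x2  <- frame$x2             # $ still refers to columns only
```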

cu
Philipp


Re: [R] Magic Number Error Message

2011-03-25 Thread Philipp Pagel

On Fri, Mar 25, 2011 at 06:42:49AM -0700, armstrwa wrote:

 When I attempt to run a script, I keep getting the error message shown

  load("H:\\Restoration Center\\Climate Change and
  Restoration\\MidAtlFloodRisk\\discharge data\\R files\\ALRT.txt")
 Error: bad restore file magic number (file may be corrupted) -- no data
 loaded
 In addition: Warning message:
 file 'ALRT.txt' has magic number '# Coh'
Use of save versions prior to 2 is deprecated 

The load() function reads stored DATA into the workspace. As you say
you want to run a SCRIPT you are probably looking for source().
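
The difference can be sketched with two temporary files; the script
content and the object name answer are made up for the illustration:

```r
# a plain-text script, comparable to the .txt file in the question
script_file <- tempfile(fileext = ".txt")
writeLines(c("# a comment line", "answer <- 6 * 7"), script_file)

# source() parses and runs the code (here inside a separate environment)
env <- new.env()
source(script_file, local = env)
env$answer

# load() expects a binary file written by save(); on a script it fails
# with the "magic number" error from the question
res <- tryCatch(suppressWarnings(load(script_file)),
                error = function(e) conditionMessage(e))

# the matching pair for data is save() / load()
rda_file <- tempfile(fileext = ".RData")
answer <- env$answer
save(answer, file = rda_file)
restored <- new.env()
load(rda_file, envir = restored)
```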

cu
Philipp


Re: [R] Bar Chart

2011-03-23 Thread Philipp Pagel

 How do you do a bar chart of 2 vectors?
 I have one vector which has 10 numbers, and another which has 10 names.
 The numbers are the frequency of the corresponding name, but when I do a bar
 chart it says that there is no height. Thanks. 

The first thing we'd need to know is HOW you tried to create the bar
chart. R usually offers quite a lot of different ways to tackle a problem
so knowing what exactly you did helps a lot in helping. 

That said, I'll assume you tried the barplot() command which would
work e.g. like this:

v1 <- 1:3
v2 <- c('A', 'B', 'B')
barplot(v1, names.arg=v2)

If v1 is a named vector things are even easier:

names(v1) <- v2
barplot(v1)

As I said, there are a bunch of other ways - e.g. using the lattice
function barchart() which works a bit differently.

cu
Philipp


Re: [R] graph lines don;t appear

2011-03-15 Thread Philipp Pagel

On Tue, Mar 15, 2011 at 12:01:45PM +0100, Sara Szeremeta wrote:
 Hi
 
  I am trying to plot two simple graphs with a grid in background. The axis
 and grid appears in correct position, but the actual data are not there
  Can somebody provide me a hint what is missing?
 
  The code is:
 
 pln <- read.table(file="PLN.txt", header=TRUE, dec=",")
 par(mfrow=c(1,2))
 plot(pln[,1], type="l", lwd=2, ylab="EUR/PLN", xlab=NULL, xlim = c(1993,
 2011), ylim = c(2, 5),  panel.first = grid(nx=NULL, ny=NULL))
 plot(log(pln[,1]), type="l", ylab="EUR/PLN", xlab=NULL, xlim = c(1993,
 2011), panel.first = grid(equilogs = FALSE))

pln[,1] is just one column, so you are not plotting the values vs the
year but vs. their index. As xlim is set to the interval 1993-2011 you
simply don't see your data...
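
A sketch of the fix with made-up data: the column names year and rate and
the values are assumptions, since the content of PLN.txt is not shown.
The point is to pass the year explicitly as x so that it matches xlim.

```r
# hypothetical stand-in for the content of PLN.txt
pln <- data.frame(year = 1993:2011,
                  rate = seq(2.2, 4.4, length.out = 19))

# plotting only the values puts them at x = 1..19, far outside xlim
index_range <- range(seq_along(pln$rate))

plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
# x = year, y = rate: now the data fall inside xlim = c(1993, 2011)
plot(pln$year, pln$rate, type = "l", lwd = 2,
     xlab = "", ylab = "EUR/PLN",
     xlim = c(1993, 2011), ylim = c(2, 5),
     panel.first = grid())
dev.off()
```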

cu
Philipp


Re: [R] beamer overlays with Sweave?

2011-03-08 Thread Philipp Pagel

   This may be asking too much, but I'm wondering if anyone has a
 solution (even a hack) for creating multiple (overlay) plots in an
 Sweave file and post-processing the overlays in beamer appropriately.


Although I have not done this with beamer and overlays before, I once
had to resort to generating the includegraphics commands from within a
loop in order to save substantial amounts of typing. You could do
something similar along these lines (untested):


<<echo=F>>=
slidenum <- 1
plotbasename <- "something"
plotfilename <- paste(plotbasename, slidenum, ".pdf", sep="")
pdf(file=plotfilename)
plot(stuff)
dev.off()
cat("\\only<", slidenum, ">{\\includegraphics{", plotfilename, "}}\n", sep="")
@

<<echo=F>>=
slidenum <- slidenum + 1
plotfilename <- paste(plotbasename, slidenum, ".pdf", sep="")
pdf(file=plotfilename)
plot(stuff)
dev.off()
cat("\\only<", slidenum, ">{\\includegraphics{", plotfilename, "}}\n", sep="")
@

If you want to get really fancy, you could wrap most of this in a
convenience function...
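
Such a wrapper might look like the sketch below. It is as untested in a
real beamer document as the chunks above; the function name overlay_plot
and its arguments are made up for the illustration:

```r
# Writes one plot to <basename><slidenum>.pdf and prints the matching
# beamer overlay command. plot_fun is any function that draws the plot.
overlay_plot <- function(plot_fun, basename, slidenum, dir = ".") {
  plotfilename <- file.path(dir, paste(basename, slidenum, ".pdf", sep = ""))
  pdf(file = plotfilename)
  plot_fun()
  invisible(dev.off())
  cat("\\only<", slidenum, ">{\\includegraphics{", plotfilename, "}}\n",
      sep = "")
  invisible(plotfilename)
}

# two overlays of the same slide, written to a temporary directory
for (i in 1:2) {
  overlay_plot(function() plot(1:10, col = i), "demo", i, dir = tempdir())
}
```

Inside an Sweave chunk the calls only have the intended effect if the
printed LaTeX is passed through verbatim, i.e. with results=tex in the
chunk options.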

cu
Philipp


Re: [R] beamer overlays with Sweave?

2011-03-08 Thread Philipp Pagel


Oops - have to comment my own answer:

 <<echo=F>>=

For this to work it needs to be 

<<echo=F, results=tex>>=


cu
Philipp



Re: [R] creating additional column

2011-03-08 Thread Philipp Pagel



Hi!

max.col does what you want. Example:

> dat <- data.frame(a=rnorm(20),b=rnorm(20),c=rnorm(20))
> dat
             a           b          c
1   1.17910304 -0.56951219 -0.2243664
2  -1.43840866 -0.99013855 -0.1613536
3   1.08515152 -0.77975274  0.3734530
4  -0.92154605 -0.20318367  0.1384842
[...]
> dat$maxcol <- colnames(dat)[max.col(dat)]
> dat
             a           b          c maxcol
1   1.17910304 -0.56951219 -0.2243664      a
2  -1.43840866 -0.99013855 -0.1613536      c
3   1.08515152 -0.77975274  0.3734530      a
4  -0.92154605 -0.20318367  0.1384842      c
[...]
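
Applied to the numbers from the question, this might look as follows. It is
only a sketch: the column names highest and where are invented here, and
ties.method="first" is chosen so that ties are not broken at random (the
default of max.col).

```r
# Data typed in from the question below
d <- data.frame(d1 = c(51.398426, 4.057801, 7.279341),
                d2 = c(39.111721, 7.728407, 7.360509),
                d3 = c(11.6086220, 0.1234711, 18.2964676))

# Restrict to the d columns so that derived columns are not searched
cols <- c("d1", "d2", "d3")
d$highest <- do.call(pmax, d[cols])
d$where   <- cols[max.col(as.matrix(d[cols]), ties.method = "first")]
d
```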

cu
Philipp

On Tue, Mar 08, 2011 at 01:25:10PM +0100, Bodnar Laszlo EB_HU wrote:
 Hello everybody,
 
 I have a little problem in good old R. It is basically the following.
 
 I have this small database with 3 rows and the following columns:
 d1,
 d2,
 d3 and
 Highest d value - which selects the highest value from d1, d2, d3 in each row.
 
           d1         d2          d3  Highest d value
 1  51.398426  39.111721  11.6086220        51.398426
 2   4.057801   7.728407   0.1234711         7.728407
 3   7.279341   7.360509  18.2964676        18.296468
 
 I'd like to make an additional column which shows the label of the relevant 
 column where we've found the maximum d value. Something like this:
 
           d1         d2          d3  Highest d value  Where is the maximum?
 1  51.398426  39.111721  11.6086220        51.398426  d1
 2   4.057801   7.728407   0.1234711         7.728407  d2
 3   7.279341   7.360509  18.2964676        18.296468  d3
 
 Is there an easy way to do this?
 Thank you very much and have a pleasant day!
 
 Laszlo
 
 
Re: [R] How two compare two matrixes

2011-03-04 Thread Philipp Pagel

 Dear all I have two 10*10 matrixes and I would like to compare
 theirs contents. By the word content I mean to check visually (not
 with any mathematical formulation) how similar are the contents.

If they are really only 10x10 you can simply print them both to the
screen and look at them. I'm not sure what else you could do if you
are not interested in a specific distance measure etc.

cu
Philipp

Re: [R] How two compare two matrixes

2011-03-04 Thread Philipp Pagel

On Fri, Mar 04, 2011 at 01:49:29AM -0800, Alaios wrote:
 That's the problem
 Even a 10*10 matrix does not fit to the screen (10 columns do not
 fit in one screen's row) and thus I do not get a well aligned matrix
 printed.
 
 This is that makes comparisons not that easy to the eye.  From the
 other hand  with edit(mymatrix) I get scrolls so I can scroll to one
 row and see only the area  I want to focus in. Problem with edit is
 that it blocks cli and thus I can not have two edits running at the
 same time.

Hm - it does fit on my screen but if you're on a laptop... Maybe you
could write both matrices to files and compare them in an external
viewer (Excel, less, ...).
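
A rough sketch of the file route, with made-up matrices m1 and m2 and
temporary file names as placeholders. Widening the console with
options(width=...) is an additional assumption that may help if the rows
merely wrap:

```r
set.seed(42)
m1 <- matrix(round(rnorm(100), 2), nrow = 10)
m2 <- matrix(round(rnorm(100), 2), nrow = 10)

# Write both matrices out so that they can be opened side by side
f1 <- file.path(tempdir(), "m1.csv")
f2 <- file.path(tempdir(), "m2.csv")
write.csv(m1, f1, row.names = FALSE)
write.csv(m2, f2, row.names = FALSE)

# Inside R: a wider console keeps all 10 columns on one row
old <- options(width = 200)
print(m1)
print(m2)
options(old)
```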

If I remember correctly, the object browser/data viewer of JGR allows 
editing several objects at once.

cu
Philipp

Re: [R] overleap an iteration within a for-loop when error message produced

2011-03-04 Thread Philipp Pagel

 Assume that the 5th iteration (subject=5) leads to the error
 message. How can I tell R to continue with the 6th iteration?

try or tryCatch are probably what you want.
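
A small sketch of the tryCatch variant. The failure for subject 5 is
simulated with stop(), and sqrt() merely stands in for the real per-subject
computation:

```r
results <- rep(NA_real_, 10)

for (subject in 1:10) {
  results[subject] <- tryCatch({
    # stand-in for the real work; subject 5 fails on purpose
    if (subject == 5) stop("simulated failure")
    sqrt(subject)
  }, error = function(e) {
    # report the problem and carry on with the next iteration
    message("subject ", subject, " skipped: ", conditionMessage(e))
    NA_real_
  })
}

results
```

The loop runs through all ten iterations; the failed one simply leaves an
NA in the result vector.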

cu
Philipp

Re: [R] read.table

2011-02-22 Thread Philipp Pagel

 I am using read.table to read a plan 3 column CSV file . the file is getting
 read .
 
 But the first column has datetime in the csv file in the following format:
 20110221.114041
 
 But this is being read as 20110221 only . the time portion (decimal is
 missing) in the data frame

My guess is that it does get read correctly but you only see part of
the actual number because R will usually not print all available
digits. Example:

> a <- 20110221.114041
> a
[1] 20110221
> options("digits")
$digits
[1] 7

> format(a, digits=7)
[1] "20110221"
> format(a, digits=20)
[1] "20110221.114041"
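
If the column is really a timestamp, one option is to read it as text and
parse it afterwards. The following is only a sketch: the column names and
the inline data are invented, and the format YYYYMMDD.HHMMSS is an
assumption based on the single value shown in the question.

```r
# Invented stand-in for the 3 column CSV file
csv_text <- "stamp,x,y\n20110221.114041,1,2\n20110222.093015,3,4"

# colClasses keeps the first column as character, so no digits are lost
dat <- read.table(text = csv_text, header = TRUE, sep = ",",
                  colClasses = c("character", "numeric", "numeric"))

# Parse the text into a proper date-time (format is an assumption)
dat$time <- as.POSIXct(dat$stamp, format = "%Y%m%d.%H%M%S", tz = "UTC")
dat
```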

cu
Philipp

Re: [R] qbeta

2011-02-22 Thread Philipp Pagel

On Tue, Feb 22, 2011 at 10:09:51AM +, Dr. Alireza Zolfaghari wrote:
 Hi List,
 Does any body know how I can see the code behind qbeta function?

As the code seems to be internal, you'll need to download the r-source
code and find it in there. In my copy of R it is here:
R-2.11.1/src/nmath/qbeta.c

cu
Philipp

Re: [R] fitting logit to data

2011-02-21 Thread Philipp Pagel

On Mon, Feb 21, 2011 at 12:13:09PM +0100, Sylvia Tippmann wrote:
 Hello,
 
 I'd like to fit a logit function to my data.
 The data is distributed like a logit (like in this plot on wikipedia 
 http://en.wikipedia.org/wiki/File:Logit.png)
 but the values on the x-axis are not between 0 and 1.
 I don't think using a glm is the solution because I simply want to  
 infer the parameters of the logit function
 (offset, compression, slope...), so I can apply it to all my values on  
 x and get my value y.

Two ideas:

1) scale your data so it does fit in [0,1] before fitting a glm

2) Use nls() to fit whatever function you find suitable
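
Both ideas can be combined in a short sketch. Everything here is made up
for illustration: the data are simulated, the x range (lo, hi) is assumed
to be known, and offset a and slope b are the only parameters estimated.

```r
set.seed(1)
lo <- 10; hi <- 50                       # assumed known range of x
x <- seq(11, 49, length.out = 50)
y <- 2 + 1.5 * qlogis((x - lo) / (hi - lo)) + rnorm(50, sd = 0.1)
dat <- data.frame(x = x, y = y)

# qlogis() is the logit; x is rescaled to (0, 1) inside the formula
fit <- nls(y ~ a + b * qlogis((x - lo) / (hi - lo)), data = dat,
           start = list(a = 0, b = 1))
coef(fit)

# apply the fitted curve to new x values
predict(fit, newdata = data.frame(x = c(15, 30, 45)))
```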

cu
Philipp

Re: [R] Matrix in R

2011-02-18 Thread Philipp Pagel

On Fri, Feb 18, 2011 at 06:32:01AM -0800, danielepippo wrote:
 
 but if in my function 
 pp_ris2[i,j]=myfunction}
 must be the indexes 0-0,0-1,0-2,0-3, ?

You'll have to take care of that yourself with a bit of index
arithmetic. It's the same problem you encounter in C when you are
modelling something that would like to be indexed starting with 1 -
just the other way round.
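
A sketch of that index arithmetic. The size n and the body of myfunction
are placeholders; the character dimnames are an optional extra that lets
you address cells by their 0-based labels:

```r
n <- 3
pp_ris2 <- matrix(NA_real_, nrow = n + 1, ncol = n + 1,
                  dimnames = list(as.character(0:n), as.character(0:n)))

# hypothetical stand-in for the real function of i and j
myfunction <- function(i, j) i * 10 + j

for (i in 0:n) {
  for (j in 0:n) {
    # logical index (i, j) starts at 0, storage index starts at 1
    pp_ris2[i + 1, j + 1] <- myfunction(i, j)
  }
}

pp_ris2["0", "3"]   # cell for i = 0, j = 3, addressed by label
```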

cu
Philipp

Re: [R] saving plots

2011-02-14 Thread Philipp Pagel

On Mon, Feb 14, 2011 at 12:59:13PM +0530, km wrote:
 Hi all,
 
 Is there a way to save the currently displayed plot to  an image format just
 after we view it?
 I think this would be more intuitive as a user if I wish to save it just
 after I visualize the plot.
 
 I am aware that we need to do some thing like this
 jpeg('somefilename.jpg')
 ... plot... commands...
 dev.off()

In addition to savePlot, which has already been recommended, you may
also want to look at dev.copy2eps and dev.copy2pdf. 
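
A sketch of the dev.copy2pdf route. In an interactive session the plot is
already on a screen device; the pdf(NULL) branch is only an assumption to
make the example run without a display, and the file name is a placeholder.

```r
if (!interactive()) {
  pdf(NULL)                             # stand-in for a screen device
  dev.control(displaylist = "enable")   # copying needs a display list
}

plot(1:10, main = "example plot")       # look at it first ...

# ... then decide to keep it
out_file <- file.path(tempdir(), "current-plot.pdf")
dev.copy2pdf(file = out_file)

if (!interactive()) dev.off()
```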

cu
Philipp

Re: [R] as.Date

2011-02-09 Thread Philipp Pagel

On Wed, Feb 09, 2011 at 08:44:30AM +0100, Valeri Fabio wrote:
 Hello,
 
 I find out which package disturbs as.Date(). It is the package Epi:
 
 > as.Date(36525, origin="1900-01-01")
 [1] "2000-01-02"
 > library(Epi)
 > as.Date(36525, origin="1900-01-01")
 [1] "2070-01-01"
 > detach(package:Epi)
 > as.Date(36525, origin="1900-01-01")
 [1] "2000-01-02"

OK - that makes sense. Epi has its own as.Date.numeric function and
upon loading the package you get a warning:


> library(Epi)
Attaching package: 'Epi'
The following object(s) are masked from 'package:base':
as.Date.numeric, merge.data.frame


A quick look at the manual page confirms that Epi's version does not
have an origin option. 

cu
Philipp

Re: [R] as.Date

2011-02-08 Thread Philipp Pagel

 I have a strange behavior of the as.Date() function. For example:
 as.Date(36525, origin="1900-01-01'")
 
 I would expect to get 2000-01-01. But R gives me

That's almost exactly what I get with R 2.11.1, LINUX (minus the
one-day difference which is probably correct, too lazy to count leap
years...):

> as.Date(36525, origin="1900-01-01'")
[1] "2000-01-02"

At first I thought the excess single quote might be causing your
problem, but it doesn't for me.
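
For the record, the leap years can be counted by R itself. This is just a
worked check of the one-day difference, with variable names chosen here:

```r
origin <- as.Date("1900-01-01")

# days between the origin and the expected date
days_to_2000 <- as.numeric(as.Date("2000-01-01") - origin)

# leap years from 1900 to 1999 (1900 itself is not a leap year)
years  <- 1900:1999
n_leap <- sum((years %% 4 == 0 & years %% 100 != 0) | years %% 400 == 0)

days_to_2000                        # 100 * 365 + n_leap
as.Date(36525, origin = origin)     # one day past 2000-01-01
```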

Maybe you need to upgrade R? Possibly it's an already fixed issue?

cu
Philipp


Re: [R] Average of several line plots

2011-02-03 Thread Philipp Pagel

On Thu, Feb 03, 2011 at 01:36:57AM -0800, mattnixon wrote:
 
 The data doesn't represent functions. Basically the X values represent the
 distance across a sample and the Y values are a measure of the colour
 intensity at that point across the sample (i.e. a line plot across the
 sample). Each data set represents a measurement across a different section
 of the sample. All data sets show alternating 'light' and 'dark' sections,
 though the sample isn't perfect so the widths of each section do not
 entirely match up from one data set to another. 
 
 The problem comes from the fact that some data sets contain as many as 400
 measurements across the sample whereas others contain as few as 150
 measurements. This means that measurements do not necessarily occur at the
 same value of X on different data sets. Therefore I think I need some way to
 average the lines ('of best fit') that each data set creates on the graph,
 rather than averaging the data ponits themselfs as I can't see how I can
 take averages/weighted averages of the data points when they occur at
 different values of X (and at different intervals) across the sample. Is my
 description any better this time?

I am not 100% sure, but if I understand your problem correctly,
loess() may be applicable.
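
A sketch of how that could look, on simulated scans with 400 and 150
points. Pooling the scans before smoothing is an assumption about what
"average" should mean here; note that it gives the denser scan more weight.
The span value is also just a guess.

```r
set.seed(1)
# two made-up scans across the same sample, measured at different x values
scan1 <- data.frame(x = seq(0, 10, length.out = 400))
scan1$y <- sin(scan1$x) + rnorm(400, sd = 0.2)
scan2 <- data.frame(x = seq(0, 10, length.out = 150))
scan2$y <- sin(scan2$x) + rnorm(150, sd = 0.2)

# pool all points and fit one smooth curve through them
pooled <- rbind(scan1, scan2)
fit <- loess(y ~ x, data = pooled, span = 0.25)

# evaluate the averaged curve on a common grid
grid <- data.frame(x = seq(0, 10, length.out = 200))
grid$y <- predict(fit, newdata = grid)

plot(pooled$x, pooled$y, col = "grey", pch = 20, xlab = "x", ylab = "y")
lines(grid$x, grid$y, lwd = 2)
```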

cu
Philipp

Re: [R] CSV value not being read as it appears

2011-01-14 Thread Philipp Pagel

On Fri, Jan 14, 2011 at 07:58:07PM +1000, bgr...@dyson.brisnet.org.au wrote:
 
 Thanks for your e-mail. The data was a report derived from a statewide
 database, saved in EXCEL format, so the usual issue of the vagaries of
 human data entry variation wasn't the issue as the data was an automated
 report, which is run every three months.

If this problem occurs with computer generated data, it may also be
worthwhile to talk to whoever is in charge of that reporting system
and hope to get the bug fixed.

And just to add one of my favorite initial checks: I always double
check if the number of levels of each factor in my data.frame seems to
make sense.
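
A sketch of such a check on a made-up data.frame in which one value
carries a trailing blank:

```r
# invented data: "male " (with a trailing blank) sneaks in as a third level
dat <- data.frame(sex   = factor(c("male", "female", "male ", "female")),
                  group = factor(c("a", "b", "a", "b")))

# number of levels per factor column
n_levels <- sapply(Filter(is.factor, dat), nlevels)
n_levels

# a suspicious count is easy to inspect
levels(dat$sex)
```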

cu
Philipp

Re: [R] Panel title: mfrow() or ?

2011-01-10 Thread Philipp Pagel

 par(mfrow=c(3,2))
 
 The 6 graphs are coming out quite all right, but now I would like to
 put a title on top of the page - i.e. something that is common for
 all 6 graphs - how can I do that?

title(main="My title", outer=TRUE)
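
In context this might look like the sketch below. The oma setting is an
addition not mentioned above: without some outer margin there is no room
for the outer title. Random data and the output file are placeholders.

```r
out_file <- file.path(tempdir(), "six-panels.pdf")
pdf(out_file)

# 3 x 2 panels plus two lines of outer margin at the top
old_par <- par(mfrow = c(3, 2), oma = c(0, 0, 2, 0))
for (i in 1:6) plot(rnorm(20), main = paste("Panel", i))

# one common title for the whole page
title(main = "My title", outer = TRUE)

par(old_par)
dev.off()
```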

cu
Philipp

Re: [R] 300 dpi and eps:

2010-12-16 Thread Philipp Pagel


 Can someone recommend some paper that makes clear the relation and
 distinctions between vector and raster graphics, but especially with
 some practical examples in regard to what is the relation between
 page (height and width) and dpi.

I'm not aware of a paper, but it's really not rocket science, as long
as you stay away from color calibration at which point it IS rocket
science ;-)

You should not be concerned at all with the relation between page
dimensions and dpi - that's the publisher's business. All you need to
ensure is that the figures you provide are of high quality and in an
accepted format.

 In A. photoshop for example I can define for a graph width in
 inches, height in inches and resolution in pixels/inch color model
 CMYK and 8 bit. How one works in R?

Don't. Just stick with a vector format. All journals I have ever dealt
with accepted either EPS or PDF as a vector format.

 Or one saves the graph from postscript function as eps or tiff and
 you tell to the editor of the journal do whatever you want because I
 am done; I provided you already a vector graph that has infinite
 pixels?:-)

Exactly that, except TIFF is not a vector format and you should not
use it with R.

Some rules of thumb:

Use a pixel format if and only if

 1) your image is a picture from a digital camera, scanner,
microscope, screenshot or similar. I.e. the original graphics wants
to be a pixel graphics by nature.

 2) You are forced to by higher powers. In this case stick with vector
format until your figure is 100% ready and only then convert to a
high-resolution TIFF/PNG/whatever.

Use a vector format in all other cases - especially if we are talking
about things you create in the computer yourself: R graphs,
flow-charts, technical illustrations, ...
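
As a sketch, producing such vector figures from R could look like this.
The figure itself, its size and the file names are arbitrary; the
postscript() settings are the ones ?postscript lists for EPS output.

```r
# the figure is wrapped in a function so both devices get the same plot
draw_figure <- function() {
  plot(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "x squared")
}

eps_file <- file.path(tempdir(), "figure.eps")
pdf_file <- file.path(tempdir(), "figure.pdf")

# EPS: single page, no rotation, paper size taken from width/height
postscript(eps_file, width = 5, height = 4,
           horizontal = FALSE, onefile = FALSE, paper = "special")
draw_figure()
dev.off()

# PDF: width and height are in inches
pdf(pdf_file, width = 5, height = 4)
draw_figure()
dev.off()
```

If a pixel format is demanded after all, png() with units="in" and
res=600 would be the analogous call.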

If you want to add annotation to an image (i.e. a pixel graphic) never
use Photoshop or similar software - instead import the pixel graphic
into a vector graphics software (e.g. Illustrator/Corel Draw/...) and
add your arrows, text etc. While this will not magically give the
image a better or even infinite resolution, it will make sure the rest
has one.

If you have images (scans, photos, ...), avoid lossy compression
formats (e.g. jpg), use TIFF (or maybe PNG) instead. Lossy compression
will

 a) ruin edges - e.g. lines of a graph (that's why screenshots in jpg
always look crappy)

 b) degrade in quality with every decompress - edit - compress cycle

The only occasion I ever convert my vector graphics to a pixel format
is when a colleague who has to use PowerPoint needs it for a
presentation. As PowerPoint does not support any vector formats except
that flaky wmf/emf format there is no other choice. So I convert to
e.g. a 600dpi png. (Has this changed in recent versions of PowerPoint?)
But mind you: I don't do that in R, so I always have a vector format
master figure.

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

Re: [R] How to save play back an entire R session?

2010-12-16 Thread Philipp Pagel

 Saving the session history is indeed easy (savehistory).
 The problem is the playback.  I didn't find a reliable method.

Well, you could simply source() the .Rhistory file (or the file you
saved under some other name). But as you already pointed out - it's
better to go with scripts to begin with.
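
A sketch of the script route. The file name and its two lines are
invented; echo=TRUE makes source() print each command as it is replayed:

```r
# a tiny stand-in for a saved script or history file
script <- file.path(tempdir(), "analysis.R")
writeLines(c("x <- 1:10",
             "x_total <- sum(x)"),
           script)

# play it back; commands are echoed as they run
source(script, echo = TRUE)
x_total
```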

cu
Philipp

Re: [R] 300 dpi and eps:

2010-12-15 Thread Philipp Pagel


 Everything works fine to place them in a pdf file , or eps file, but
 when it comes to have a high quality of 300 dpi these graphs are not
 good. For example I open the eps file with Adobe Illustrator (AI)
 and it shows that it is a 72dpi graph.

This is simply not true: it's an eps and thus of essentially infinite
resolution for all practical purposes. So your problem is not with 
the R-generated eps but somewhere downstream from that. Any
postprocessing, conversion or editing?

cu
Philipp

Re: [R] evaluating NAs in a dataframe

2010-12-08 Thread Philipp Pagel


Hi!

 How can one evaluate NAs in a numeric dataframe column?  For example, I have
 a dataframe (demo) with a column of numbers and several NAs. If I write
 demo.df >= 10, numerals will return TRUE or FALSE, but if the value is
 NA, NA is returned. But if I write demo.df == NA, it returns as NA

Sounds like you are looking for is.na :

> is.na(c(1,NA,3))
[1] FALSE  TRUE FALSE


 As an example, I want to assign rows to classes based on values in
 demo$Area. Some of the values in demo$Area are NA
 
 for (i in 1:nrow(demo)) {
   if (demo$Area[i] > 0 & demo$Area[i] < 10) {Class[i] <- "S01"} ## 1-10 cm2
   if (demo$Area[i] >= 10 & demo$Area[i] < 25) {Class[i] <- "S02"} ## 10-25cm2

[...]

   if (demo$Area[i] >= 3200) {Class[i] <- "S10"} ## 3200 cm2
   }
 
 What happens is that I get the message "Error in if (demo$Area[i] > 0 &
 demo$Area[i] < 10) { : missing value where TRUE/FALSE needed"

First of all, you don't need a loop here. Example:

# make up some data
foo <- data.frame(a=sample(1:20, 20, replace=TRUE))
# assign to classes
foo$class <- cut(foo$a, breaks=c(-1, 7, 13, 20), labels=c('small', 'medium', 'large'))

This also works in the presence of NAs - but of course the class will
be NA in those cases which, at least in my opinion, is the correct
value.
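
Closer to the original question, a sketch with an NA in the data. Only the
breaks 0, 10, 25 and 3200 come from the post; the label S03_S09 is a
placeholder for the classes that were cut from the quote.

```r
demo <- data.frame(Area = c(5, 12, NA, 4000, 10))

# right=FALSE makes the intervals [0,10), [10,25), [25,3200), [3200,Inf)
demo$Class <- cut(demo$Area,
                  breaks = c(0, 10, 25, 3200, Inf),
                  labels = c("S01", "S02", "S03_S09", "S10"),
                  right  = FALSE)
demo

# the row with the missing area is easy to find afterwards
which(is.na(demo$Class))
```

Unlike the loop, this puts an area of exactly 0 into S01.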

cu
Philipp

Re: [R] How to save a data set as .txt on fly?

2010-11-26 Thread Philipp Pagel

On Thu, Nov 25, 2010 at 09:23:17PM -0800, Stephen Liu wrote:
 Hi David,
 
 
  But you didn't try:
 
  DNase# which was after all the name of the object you saved.
 
 Sorry I don't follow.

He is telling you that it is not surprising that 'aaa' does not exist,
if the object you saved was called DNase

 
 I can't do it with following steps:
 > DNase
 > save(DNase, file="C:/Users/satimis/Documents/dnase.txt")
 > load(file="C:/Users/satimis/Documents/dnase.txt")
 > dnase
 Error: object 'dnase' not found
 > dnase.txt
 Error: object 'dnase.txt' not found

Again - you need to use the name of the object which happens to be
'DNase' - not 'dnase', 'dnase.txt' or 'aaa'

 I'm curious to know why the .txt file created in this way can't be read with 
 Notpad and WordPad?

It can be read with them - only it does not look the way you expected.
If you want to export data for use in other software, functions like
write.table may be of interest to you. Load and save are meant for use
in R, only.
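
A sketch of both routes, using temporary files with deliberately unrelated
names. The local copy of DNase and the tab-separated layout are choices
made for this example:

```r
# work on a plain data.frame copy of the example data
DNase <- as.data.frame(datasets::DNase)

# save()/load(): the file name is arbitrary, the object name is what counts
rdata_file <- file.path(tempdir(), "some-other-name.txt")
save(DNase, file = rdata_file)
rm(DNase)
restored <- load(rdata_file)
restored              # names of the objects that were restored

# write.table(): a plain text table that Notepad or Excel can read
txt_file <- file.path(tempdir(), "dnase-table.txt")
write.table(DNase, file = txt_file, sep = "\t", quote = FALSE,
            row.names = FALSE)
```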

cu
Philipp

Re: [R] How to save a data set as .txt on fly?

2010-11-26 Thread Philipp Pagel

  He is telling you that it is not surprising that 'aaa' does not exist,
  if the object you saved was called DNase
 
 No, such a file.  Just rechecked it with Win7 search command. (Win7 has been 
 rebooted)

You are confusing two things here:

a) Files living in the filesystem 
b) Objects in your R workspace

When you saved your dataset you had to assign a filename, obviously.
But this has nothing to do with the name of the object(s) contained in
these files. So no matter what your file is called this has no effect
on the names of the objects you get upon loading the file.

I recommend reading ?save, ?load and ?save.image.

  Again - you need to use the name of the object which happens to be
  'DNase' - not 'dnase', 'dnase.txt' or 'aaa'
 
 Thanks.
 
 I have this idea at the beginning.  A further thought changed my mind.
 
 On R console
  DNase
 displays the content of the data set.
 
 If I save the file in the same name.  It may confuse me on running DNase 
 whether 
 the output is the content of the data set OR from the file created.

R does not care about the file unless you load it and you can pick any
filename you like without affecting the name of the object(s). Once
loaded, there is no magical link between the two. Of course, when you
load objects from a file this will overwrite any objects of the same
names (object names, not file names!) that happen to live in your
workspace before the load command.

cu
Philipp

Re: [R] shifting down ylab in a plot

2010-11-24 Thread Philipp Pagel

 I am trying to shift down the ylab of my plot but can't find how to do it.
 I tried to tune mar but it enable more room for the labels to be displayed
 but it does not move to ylab as I would like.
 
 Is there a way with par  to shift down my ylab ??

There may be a simpler/more elegant way to do it, but this does what
you asked for:

# plot data omitting the ylab
plot(1:10,1:10, ylab='')
# add the ylab myself using flushleft (adj=0.0)
mtext('foo', side=2, line=3, adj=0.0)

cu
Philipp

Re: [R] Change column of numbers in data frame to days

2010-10-23 Thread Philipp Pagel

 I have a vector of numbers ranging form 20 to 500.  The numbers represent
 days since a starting point. The list is not consecutive, some numbers
 skipped and some numbers duplicated.  I know day 1 was a Monday.  I want to
 use this vector in a lm but I need to factor by day.  I'm wondering how to
 assign Monday to 22,29,36,..., Tuesday to 23,30,37,... etc...

Here is one way to do it:

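# make some sample data
foo <- c(22,29,23,37)
# convert to factor of weekdays (day 1 is a Monday; shifting by one before
# the modulo keeps multiples of 7 as Sundays instead of turning them into NA)
foo <- factor((foo - 1) %% 7 + 1, levels=1:7, labels=c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'))
foo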

cu
Philipp

Re: [R] Help with R

2010-10-15 Thread Philipp Pagel

On Fri, Oct 15, 2010 at 09:57:21AM +0200, Muteba Mwamba, John wrote:

 FATAL ERROR: unable to restore saved data in .RDATA

Without more information it's hard to know what exactly went wrong.

Anyway, the message most likely means that the .RData file got
corrupted. Deleting it should solve the problem. Note that this means
that you will get an empty workspace and have to recreate whatever
data was in it before.

 I decided to uninstall the copy (a R2.11.0) and installed a new
 version (2.11.1) but I'm still receiving the same message. When I
 click OK the closes.

Re-installation of R will most likely not fix this (unless a change in
the format of the .RData files had occurred - but to my knowledge no
such thing has happened, recently.)

cu
Philipp

Re: [R] Scripting help

2010-09-15 Thread Philipp Pagel

On Wed, Sep 15, 2010 at 12:22:15PM -0400, Ayyappa Chaturvedula wrote:
 Dear all, I am new to R and this group. I have good experience in S
 scripts.  I need some orientation on data imports, general plotting
 functions. Can you please direct me?

Welcome to R. Coming from an S background you should have no problems
to adjust quickly. For data import have a look at the R Data
Import/Export manual:

http://cran.r-project.org/doc/manuals/R-data.html

Plotting is not too different from S-Plus. As far as my knowledge goes
there are 3 different plotting frameworks in R:

1) Basic plot functions like plot or hist, some of which are covered in
"An Introduction to R"

2) lattice (the R equivalent of trellis graphics)

covered in many manual pages, many tutorials and talks google will
quickly find and, last but not least, the book
"Lattice: Multivariate Data Visualization with R" by Deepayan
Sarkar, who implemented lattice

3) ggplot2

See http://had.co.nz/ggplot2/ for documentation
and consider the book "ggplot2: Elegant Graphics for Data Analysis" by
Hadley Wickham (the author of ggplot2)


I use all three frameworks on a regular basis - choosing the
respective functions depending on the complexity of the task.
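
To give a feel for the three frameworks, here is a minimal sketch that
draws the same kind of histogram with each of them. It uses the built-in
iris data, writes to a temporary PDF so it also runs non-interactively,
and only touches ggplot2 if that package happens to be installed. The
variable names are made up for the illustration.

```r
# Same histogram in the three frameworks, using the built-in iris data.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)

# 1) base graphics
hist(iris$Sepal.Length, main = "base", xlab = "Sepal length")

# 2) lattice: one panel per species
library(lattice)
p_lattice <- histogram(~ Sepal.Length | Species, data = iris)
print(p_lattice)

# 3) ggplot2 (skipped if the package is not installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p_gg <- ggplot2::ggplot(iris, ggplot2::aes(x = Sepal.Length)) +
    ggplot2::geom_histogram(bins = 15) +
    ggplot2::facet_wrap(~ Species)
  print(p_gg)
}

invisible(dev.off())
```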

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


[R] lattice: layout and number of pages

2010-09-08 Thread Philipp Pagel


Dear expeRts,

?xyplot says: "In general, giving a high value of ‘layout[3]’ is not
wasteful because blank pages are never created."

But the following example does generate blank pages - well, except for
the ylab:

require(lattice)
data(barley)
stripplot(yield~year|site, barley, layout=c(2,1,5))

Did I misinterpret the sentence from the help page or is this a bug?
Yes - I know that this works fine:

stripplot(yield~year|site, barley, layout=c(2,1))

Just curious...

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] How to Adaptively Set Up the Coordinate Range of Multiple Graphs in One Figure

2010-08-31 Thread Philipp Pagel

On Tue, Aug 31, 2010 at 03:26:12AM -0700, Wonsang You wrote:
 In the above codes, I had to  arbitrarily set up the coordinate range of the
 figure in advance before calculating the values y. (seexlim and ylim)
 In results, the figure did not contain all data since most of data were
 outside the predefined range.
 I am wondering about how to control xlim and ylim adaptive to the real range
 of data, in order to include all data in the figure.

You do that by not specifying xlim and ylim - in that case R will
calculate them based on your data. Maybe I did not understand what
exactly you want to get, but if you explicitly set the limits, that's
what R is going to use.
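
If several series go into the same figure, one way to get limits that
cover all of them is to compute them with range() before plotting. This
is only a sketch with simulated data; x, y1, y2 and ylim_all are names
invented for the example, not taken from the original code.

```r
set.seed(1)
x  <- 1:50
y1 <- cumsum(rnorm(50))        # first series (simulated)
y2 <- 5 + cumsum(rnorm(50))    # second series (simulated)

# limits that cover both series
ylim_all <- range(y1, y2)

pdf(tempfile(fileext = ".pdf"))
plot(x, y1, type = "l", ylim = ylim_all, ylab = "value")
lines(x, y2, col = "red")
invisible(dev.off())
```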

cu
Philipp


-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] predict.loess and NA/NaN values

2010-08-30 Thread Philipp Pagel


 What you can do is patch the code to add the NAs back after the 
 Prediction step (which many predict() methods do).

Thanks Andy for your hints and especially for digging into the problem
like this! I have, in the meantime, written a simple wrapper around
predict.loess that fills in the NAs, where I would like to have them.
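
Such a wrapper could look roughly like the sketch below. It is not the
actual wrapper mentioned above, just one possible version: the function
name predict_loess_na and the xvar argument are invented here, and it
assumes a single predictor and treats every non-finite input (NA, NaN,
Inf, -Inf) as unpredictable.

```r
# Predict from a loess model, returning NA for non-finite predictor values
# so that the result always has one element per row of newdata.
predict_loess_na <- function(model, newdata, xvar = "x") {
  x   <- newdata[[xvar]]
  ok  <- is.finite(x)                      # FALSE for NA, NaN, Inf, -Inf
  out <- rep(NA_real_, length(x))
  if (any(ok)) {
    out[ok] <- predict(model, newdata[ok, , drop = FALSE])
  }
  out
}

# usage with toy data
set.seed(42)
x <- rnorm(15)
y <- x + rnorm(15)
model.loess <- loess(y ~ x)
res <- predict_loess_na(model.loess,
                        data.frame(x = c(median(x), Inf, -Inf, NA, NaN)))
```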

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] predict.loess and NA/NaN values

2010-08-30 Thread Philipp Pagel

On Mon, Aug 30, 2010 at 01:50:03PM +0100, Prof Brian Ripley wrote:
 The underlying problem is your expectations.
 
 R (unlike S) was set up many years ago to use na.omit as the
 default, and when fitting both lm() and loess() silently omit cases
 with missing values.  So why should prediction from 'newdata' be
 different unless documented to be so (which it is nowadays for
 predict.lm, even though you are adding to the evidence that was a
 mistake)?

Thanks for your insights into the underlying philosophy. I agree that
na.omit is a sensible default for model fitting. But I am not so sure
that quietly omitting unpredictable values is such a good idea -
especially if predict methods for different types of model implement
inconsistent approaches. I see no disadvantage in returning NA where
no prediction/computation is possible -- the value is 'Not Available',
after all. (And the length of the result vector would match
nrow(newdata) which would be handy for most practical purposes)

 loess() is somewhat different from lm() in that it does not in
 general allow extrapolation, and the prediction for Inf and NaN is
 simply undefined.

Of course this is correct but I still think that predict.loess not
only acts in a way that will most likely be surprising to most users
but also inconsistent with itself (Inf vs. NA/NaN). If extrapolation
is the problem Inf should not yield anything but it does (and the same
applies to values outside of the original x-range):

x <- rnorm(15)
y <- rnorm(15)
model.loess <- loess(y~x)
predict(model.loess, data.frame(x=c(0.5, Inf)))
# [1] -0.02508801  NA
predict(model.loess, data.frame(x=min(x)-10))
# [1] NA


Actually, while tracking down my problem I did consider that
extrapolation could be the problem and, according to the last example
in ?loess, tried to set control = loess.control(surface = "direct").
To my surprise, now even Inf fails - although I am much happier with
getting an error message than with silent omission.
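
For finite values outside the original x-range, the difference between
the default surface and surface = "direct" can be sketched with the cars
data used in ?loess (speed ranges from 4 to 25 there). The object names
below are invented for the example.

```r
# default (interpolating) surface vs. direct computation
fit_interp <- loess(dist ~ speed, data = cars)
fit_direct <- loess(dist ~ speed, data = cars,
                    control = loess.control(surface = "direct"))

# speed = 10 lies inside the observed range, speed = 30 outside of it
new_speed <- data.frame(speed = c(10, 30))

p_interp <- predict(fit_interp, new_speed)  # NA for the extrapolated point
p_direct <- predict(fit_direct, new_speed)  # extrapolates
```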

Anyway, writing a little wrapper that puts NAs back into results, is
not a big deal and in that respect my problem is solved. 

 Nevertheless, take a look at the version in R-devel (pre-2.12.0)
 which give you more options.

Thanks for that information - I will definitely have a look at that.

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


[R] predict.loess and NA/NaN values

2010-08-27 Thread Philipp Pagel


Hi!

In a current project, I am fitting loess models to subsets of data in
order to use the loess predictions for normalization (similar to what
is done in many microarray analyses). While working on this I ran into
a problem when I tried to predict from the loess models and the data
contained NAs or NaNs. I tracked down the problem to the fact that
predict.loess will not return a value at all when fed with such
values. A toy example:

x <- rnorm(15)
y <- x + rnorm(15)
model.lm <- lm(y~x)
model.loess <- loess(y~x)
predict(model.lm, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))
predict(model.loess, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))

The behaviour of predict.lm meets my expectation: I get a vector of
length 5 where the unpredictable ones are NA or NaN. predict.loess on the
other hand returns only 3 values quietly skipping the last two.

I was unable to find anything in the manual page that explains this
behaviour or says how to change it. So I'm asking the community: Is
there a way to fix this or do I have to code around it?

This is in R 2.11.1 (Linux), by the way.

Thanks in advance

Philipp


-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] Fwd: basic hist() question

2010-08-22 Thread Philipp Pagel

 It works fine.
 
 Could you explain to me why it did not worked for read.table?

Because of what Gavin already explained in his reply: read.table
returns a data.frame and hist needs a vector.
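
A small sketch of the difference, with made-up numbers read from an
inline string instead of a file (df and values are names chosen for the
example):

```r
# read.table always returns a data.frame, even for a single column
df <- read.table(text = "12\n15\n11\n19\n14", header = FALSE)

pdf(tempfile(fileext = ".pdf"))

# passing the whole data.frame fails because it is not a numeric vector
err <- try(hist(df), silent = TRUE)

# extracting the column gives a plain numeric vector that hist accepts
values <- df$V1              # df[[1]] or df[, 1] would work as well
h <- hist(values)

invisible(dev.off())
```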

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] which one give clear picture-pdf, jpg or tiff?

2010-08-20 Thread Philipp Pagel

On Fri, Aug 20, 2010 at 09:30:18AM -0500, Stuart Luppescu wrote:
 On Fri, 2010-08-20 at 01:30 -0700, Joshua Wiley wrote:
  I usually save them
  from R as a PDF or postscript file, rasterize them in GIMP (free
  answer to Photoshop) at the desired resolution, and finally choose the
  desired format/compression (jpeg, png, bitmap, tiff, etc.) to save it
  as from there.  
 
 Woah. That's really involved. I use this little shell function to
 convert from ps to png:
 
 function ps2png {
   ps_file=$1
   png_file=`echo $ps_file | sed -e 's/\.ps$/.png/'`
   gs -dQUIET -dNOPAUSE -dBATCH -sDEVICE=png16m -sOutputFile=$png_file 
 -r200x200 $ps_file
 }


I'd like to add yet another tool that I use on Linux systems for this
purpose: ImageMagick

# turn EPS into PNG at 600dpi
convert -density 600 foo.eps foo.png

Very convenient, especially if there are lots of figures to be
converted:

for file in *.eps; do convert -density 600 "$file" "$(basename "$file" .eps).png"; done

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] reading a text file, one line at a time

2010-08-16 Thread Philipp Pagel

On Sun, Aug 15, 2010 at 10:58:51AM -0400, Data Analytics Corp. wrote:
 I have an upcoming project that will involve a large text file.  I want to
 
   1. read the file into R one line at a time
   2. do some string manipulations on the line
   3. write the line to another text file.

You already got some good advice about how to solve this in R. I would
just like to add that many people, including myself, prefer to do all
text file scrubbing and especially string manipulations in scripting
languages like Python or Perl followed by statistical analysis in R.
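
For completeness, a minimal sketch of the three steps in R itself, using
connections and readLines(n = 1). The file names are temporary files
created for the example and toupper() merely stands in for whatever
string manipulation is actually needed.

```r
# set up a small input file for the demonstration
in_file  <- tempfile(fileext = ".txt")
out_file <- tempfile(fileext = ".txt")
writeLines(c("alpha beta", "gamma delta"), in_file)

con_in  <- file(in_file,  open = "r")
con_out <- file(out_file, open = "w")

# 1. read one line at a time until the file is exhausted
while (length(line <- readLines(con_in, n = 1)) > 0) {
  # 2. manipulate the line (placeholder operation)
  line <- toupper(line)
  # 3. write it to the other file
  writeLines(line, con_out)
}

close(con_in)
close(con_out)
```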

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] Where the data file is stored?

2010-08-12 Thread Philipp Pagel

 On R I create a datafile named data.  I can evoke it on R with;
  data
 
 
 On R Commander
 Data - Active data set - Select active data set - (data) OK
 
 only one data set there data
 
 - View data set
 I can read it
 
 - Edit data set
 showing 25 rows of data.  Clicking the box shows a thick border around it.  
 But 
 I couldn't edit the data inside the box.
 
 I wonder where this datafile is stored on the OS
 
 On Ubuntu terminal;
 $ locate data.rda
 $ locate data.image
 $ locate data.images
 $ locate data.csv

You don't tell us what you did to create a datafile - to me it
sounds like you created an object (probably a data frame) in your R
workspace. If that's the case, it is stored in a file called .RData in
your current working directory (together with the other variables in
your workspace) once the workspace has been saved, e.g. when quitting
R. If that is not what you did, please give us more information.

BTW: R has a function called data and it is not a very good idea to
use function names as variable names.
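
If the goal is to have the data in a file of your own choosing, the
object can be written out explicitly. A small sketch, assuming a data
frame called my_data (a made-up name and made-up values) and files in
the session's temporary directory:

```r
my_data <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))

# binary R format: restores the object under its original name
rda_file <- file.path(tempdir(), "my_data.RData")
save(my_data, file = rda_file)

# plain text format that other programs can read
csv_file <- file.path(tempdir(), "my_data.csv")
write.csv(my_data, csv_file, row.names = FALSE)

# remove the object and bring it back from the file
rm(my_data)
load(rda_file)
```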

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] basic question about t-test with adjusted p value

2010-08-07 Thread Philipp Pagel

On Sat, Aug 07, 2010 at 04:08:40PM -0400, josef.kar...@phila.gov wrote:
 I have read the R manual and help archives, sorry but I'm still stuck. 
 
 How would I do a t-test with an adjusted p-value?
 
 Suppose that I use t.test ( ) , with the function argument alternative = 
 "two.sided",  and data such that degrees of freedom = 20.  The function 
 calculates a t-statistic of 2.086, and p-value =0.05
 
 How do I then adjust the p-value?  My thought is to do
 p.adjust (pt(2.086, df=20),"BH") 
 but that doesn't change anything (returns 0.975)
 
 what is the procedure?  I'm sorry if there is a basic concept that I am 
 missing here...

I'm confused - what result were you expecting? p.adjust will need to
know the number of tests you are trying to adjust for - either by
explicitly giving the number or by handing a vector of p-values
to the function.
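
A sketch of both variants. The t-statistic and degrees of freedom are
the ones from the question; the other p-values and the number of tests
(10) are invented for the illustration. Note that pt(2.086, df = 20) is
the lower tail probability (hence the 0.975), not the two-sided p-value.

```r
t_stat <- 2.086
df     <- 20

# two-sided p-value from the t-statistic
p_two_sided <- 2 * pt(-abs(t_stat), df)

# a single p-value on its own is left unchanged by BH
p_single <- p.adjust(p_two_sided, method = "BH")

# variant 1: hand over all p-values from the family of tests
p_values <- c(p_two_sided, 0.001, 0.03, 0.20)   # hypothetical other tests
p_bh <- p.adjust(p_values, method = "BH")

# variant 2: adjust one p-value, stating the number of tests explicitly
p_bonf <- p.adjust(p_two_sided, method = "bonferroni", n = 10)
```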

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] 64-bit R on 64-bit Windows box... Still not enough memory?!

2010-08-06 Thread Philipp Pagel

On Thu, Aug 05, 2010 at 04:40:48PM -0700, noclue_ wrote:
 
 I have a 64-bit windows box -
 Intel Xeon CPU E7340 @ 2.4GHz 31.9GB of RAM
 I have R 2.11.1 (64bit) running on it.
 
 My csv data is 3.6 GB (with about 15 million obs, 120 variables.)

Here is my guess: Your variables are mostly numeric but only given with
two significant digits in the csv file:

  A B ...
0.0  12.0
1.3   0.4
2.3   1.1

So that would make

15e6 * 120 * 3 / 1024^3 = 5.0 Gb

You have 3.6Gb - but that's close enough. If you read that into R,
each number is represented as a double - i.e. 8 bytes. Thus the entire
data frame takes

15e6 * 120 * 8 / 1024^3 = 13.4Gb

With almost half of your memory taken things can get problematic. Once
you start actually working with the data you'll have to allow for a
lot more space because copies will probably be made in the process.

So you may have to put your data into a database and process it in
pieces. Or use sqldf or bigmemory or something like that.
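
The back-of-the-envelope numbers above can be reproduced in R, and
object.size() shows the 8 bytes per double on a small test vector. The
variable names are just for the example.

```r
n_rows <- 15e6
n_cols <- 120

# roughly 3 bytes per value in the text file vs. 8 bytes per double in R
csv_gib <- n_rows * n_cols * 3 / 1024^3
ram_gib <- n_rows * n_cols * 8 / 1024^3

# check the per-element cost on a vector of one million doubles
x <- numeric(1e6)
bytes_per_value <- as.numeric(object.size(x)) / length(x)
```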

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] 64-bit R on 64-bit Windows box... Still not enough memory?!

2010-08-06 Thread Philipp Pagel

On Fri, Aug 06, 2010 at 09:03:09AM -0700, noclue_ wrote:
 
  .Machine$sizeof.pointer 
 [1] 4

So it appears you are not on 64bit. Excerpt from the help page:

[...]
sizeof.pointer: the number of bytes in a C ‘SEXP’ type.  Will be ‘4’ on
  32-bit builds and ‘8’ on 64-bit builds of R.
[...]
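
A quick way to check this in a running session (pointer_bytes and
is_64bit are names made up for the sketch):

```r
pointer_bytes <- .Machine$sizeof.pointer
is_64bit <- pointer_bytes == 8

cat("pointer size:", pointer_bytes, "bytes;",
    if (is_64bit) "64-bit build" else "32-bit build", "\n")

# the architecture R was built for gives the same information
cat("architecture:", R.version$arch, "\n")
```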

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] How to apply apply?!

2010-08-06 Thread Philipp Pagel


 How do I multiply only the close of every row using the 'apply' function?
 And once multiplied how do I obtain a new table that also contains the new
 2*CLOSE column (without cbind?).

You don't use apply in this case - a simple multiplication and
variable assignment will do:

> require(tseries)
> foo <- get.hist.quote('^GDAXI')
> foo[1:10, ]
             Open   High    Low  Close
1991-01-02 1375.4 1375.4 1359.1 1366.1
1991-01-03 1371.7 1374.7 1365.2 1366.7
1991-01-04 1375.4 1398.0 1375.4 1396.1
1991-01-07 1373.6 1373.6 1352.5 1358.2
1991-01-08 1350.4 1357.1 1345.5 1354.0
1991-01-09 1358.6 1380.8 1358.0 1375.2
1991-01-10 1367.9 1383.7 1363.1 1383.4
1991-01-11 1401.3 1406.4 1376.9 1382.3
1991-01-14 1354.5 1354.5 1327.8 1327.8
1991-01-15 1327.0 1330.3 1312.4 1325.6
> foo$Close <- foo$Close * 2
> foo[1:10, ]
             Open   High    Low  Close
1991-01-02 1375.4 1375.4 1359.1 2732.2
1991-01-03 1371.7 1374.7 1365.2 2733.4
1991-01-04 1375.4 1398.0 1375.4 2792.2
1991-01-07 1373.6 1373.6 1352.5 2716.4
1991-01-08 1350.4 1357.1 1345.5 2708.0
1991-01-09 1358.6 1380.8 1358.0 2750.4
1991-01-10 1367.9 1383.7 1363.1 2766.8
1991-01-11 1401.3 1406.4 1376.9 2764.6
1991-01-14 1354.5 1354.5 1327.8 2655.6
1991-01-15 1327.0 1330.3 1312.4 2651.2

 2) Also, how do I run a generic function per row. Say for example I want to
 calculate the Implied Volatility for each row of this data frame ( using the
 RMterics package). How do I do that please using the apply function? I am
 focusing on apply because I like the vectorisation concept in R and I do not
 want to use a for loop etc.

You can get the manual page of any R command by either preceding it
with a question mark or giving the command as an argument to the help
function - specifically:

?apply
help(apply)

Especially the example section is useful for a jumpstart.
Here is an example of computing row means:

apply(foo, 1, mean)

Instead of 'mean' you can insert whatever function you'd like to apply.
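
As a sketch of a self-written function applied per row: the example
below uses a small plain data frame (three rows copied from the output
above) rather than the object returned by get.hist.quote, and the names
quotes, Close2 and day_range are invented. It also shows how to add the
doubled close as a new column instead of overwriting the old one.

```r
quotes <- data.frame(
  Open  = c(1375.4, 1371.7, 1375.4),
  High  = c(1375.4, 1374.7, 1398.0),
  Low   = c(1359.1, 1365.2, 1375.4),
  Close = c(1366.1, 1366.7, 1396.1),
  row.names = c("1991-01-02", "1991-01-03", "1991-01-04")
)

# new column next to the existing ones, no cbind needed
quotes$Close2 <- quotes$Close * 2

# custom function applied to every row: daily high minus daily low
day_range <- apply(quotes[, c("High", "Low")], 1,
                   function(r) r[["High"]] - r[["Low"]])
```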

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] Error: cannot allocate vector of size xxx Mb

2010-08-05 Thread Philipp Pagel

On Thu, Aug 05, 2010 at 03:53:21AM -0400, Ralf B wrote:
  a <- rnorm(500)
 Error: cannot allocate vector of size 38.1 Mb

 
 When running memory.limit() I am getting this:
 
 memory.limit()
 [1] 2047
 
 Which shows me that I have 2 GB of memory available. What is wrong?
 Shouldn't 38 MB be very feasible?

From what I gather from ?memory.limit, it does not tell you how much
memory is currently available. So my guess is that you have some rather
large objects in your workspace already and thus there is not enough
space left for your vectors.
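
To see which objects take up the space, something along these lines may
help. It is only a sketch: big and small are stand-in objects created
for the example, and in a real session you would loop over ls() instead
of a fixed vector of names.

```r
big   <- numeric(1e6)   # stand-in for a large object
small <- 1:10           # stand-in for a small one

# size of each object in bytes, largest first
obj_names <- c("big", "small")          # in practice: ls()
sizes <- sapply(obj_names, function(nm) as.numeric(object.size(get(nm))))
sizes <- sort(sizes, decreasing = TRUE)

# remove what is no longer needed and let R release the memory
rm(big)
invisible(gc())
```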

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] Converting dataframe to matrix

2009-10-16 Thread Philipp Pagel

On Fri, Oct 16, 2009 at 01:33:14AM -0700, Noah Silverman wrote:
 Hi,
 
 I'm experimenting with a few learners that require a matrix as their
 input.  (Currently svmpath, vbmp, etc.)
 
 I currently have a dataframe with 50 columns and 20,000 rows.
 
 I tried using:
 
 x <- as.matrix(my_data.frame)
 
 If I then as, is.matrix(x), I get TRUE.
 
 However everywhere I've tried to use the matrix returns errors.

Without more information I can't even start to guess what is going
wrong. Please give a short, reproducible example of what you did and
what errors you encountered.

as.matrix() should suffice for creating a matrix from a data.frame :

> foo <- data.frame(1:4, 4:1, sqrt(1:4), log(4:1))
> foo
  X1.4 X4.1 sqrt.1.4.  log.4.1.
1    1    4  1.000000 1.3862944
2    2    3  1.414214 1.0986123
3    3    2  1.732051 0.6931472
4    4    1  2.000000 0.0000000
> det(foo)
Error in UseMethod("determinant") :
  no applicable method for "determinant"
> det(as.matrix(foo))
[1] -0.1092489

So probably your problem is somewhere else.
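
One thing that may be worth checking, although it is only a guess about
the cause: if the data frame contains any non-numeric column,
as.matrix() returns a character matrix, which is.matrix() still accepts
but numerical code will not. A sketch with made-up data:

```r
df <- data.frame(a   = 1:3,
                 b   = c(0.5, 1.5, 2.5),
                 grp = c("x", "y", "z"))

# one non-numeric column turns the whole matrix into character
m_all <- as.matrix(df)

# keep only the numeric columns before converting
m_num <- as.matrix(df[sapply(df, is.numeric)])
```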

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] Converting dataframe to matrix

2009-10-16 Thread Philipp Pagel

On Fri, Oct 16, 2009 at 01:55:03AM -0700, Noah Silverman wrote:
 I think  you may be correct.
 
 I've manage to get the data into a format that the function accepts.
 
 The error appears to be because I have negative values in my data:
 
 Error in apply(safeNormCDF(s), 1, prod) :
   dim(X) must have a positive length

Sounds like safeNormCDF() does not return a matrix but a vector.
What does dim(safeNormCDF(s)) say? 

> apply(1:9, 1, sum)
Error in apply(1:9, 1, sum) : dim(X) must have a positive length
> apply(matrix(1:9, nrow=3), 1, sum)
[1] 12 15 18
> apply(matrix(1:9, nrow=1), 1, sum)
[1] 45
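
A common way to end up with a vector where a matrix was expected is
subsetting a single row or column. Whether that is what happens inside
safeNormCDF() is not known here; the sketch below only illustrates the
mechanism and how drop = FALSE avoids it.

```r
m <- matrix(1:9, nrow = 3)

one_row <- m[1, ]                 # dimensions are dropped: plain vector
kept    <- m[1, , drop = FALSE]   # still a 1 x 3 matrix

# apply works on the matrix version
row_prod <- apply(kept, 1, prod)
```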


cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] two graphs 1 x-axis

2009-10-16 Thread Philipp Pagel

On Fri, Oct 16, 2009 at 12:22:06PM +0200, Duijvesteijn, Naomi wrote:
 I have a question concerning plotting graphs.
 Here an example dataset
 
 
 a <- c(1,2,3,4,5,6)
 b <- c(3,5,4,6,1,1)
 c <- c(1,1,1,1,1,1)
 d <- as.data.frame(cbind(a,b,c))
 plot.new()
 plot(d$a, d$b, col="red")
 par(new=TRUE)
 plot(d$a, d$c, col="red", pch="|")
 
 What I would want is to plot de second plot under the first plot. So
 not in the the first plot. There is a way to divide your graph in 2
 or 3 parts and use the same x-axis but I do not seem to get it
 right. Could somebody help me out?

Yes, use something along these lines:

par(mfrow=c(2,1))
plot(d$a, d$b, col="red")
plot(d$a, d$c, col="red", pch="|")

As both plots use the same data for X you are set. If you need to
force two datasets with different x-ranges into the same range, you
can use the xlim parameter to define the desired range.
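
A sketch of the second case, with two made-up datasets whose x-ranges
differ (x1/y1 and x2/y2 are invented names and values):

```r
x1 <- 1:6
y1 <- c(3, 5, 4, 6, 1, 1)
x2 <- 4:10
y2 <- rep(1, 7)

# one x-range that covers both datasets
xlim_common <- range(x1, x2)

pdf(tempfile(fileext = ".pdf"))
old_par <- par(mfrow = c(2, 1))   # two plots stacked on one page
plot(x1, y1, col = "red", xlim = xlim_common)
plot(x2, y2, col = "red", pch = "|", xlim = xlim_common)
par(old_par)
invisible(dev.off())
```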

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] Stretch the x-axis for better alignment comparison

2009-09-24 Thread Philipp Pagel

On Wed, Sep 23, 2009 at 11:25:23AM -0700, Maggie wrote:
 I have the following code that aligns the two graphs.
 Problem is that in .pdf it gives me it x-axis (0-100) is broken down
 into 0-20, 20-40..and so on.
 I wonder if there is for it to display the x-axis (and y-axis) in more
 detail than that.

Without the necessary data I cannot directly reproduce your example, but
have a look at this for a start:

plot(0:10)
axis(1, seq(0,10,0.2), labels=FALSE)

You may also want to use xaxt='n' in the plot command and then
use axis() to build the axis the way you want it. If reading
data off the graph is a concern, you may also want to look at the
grid() command.
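
Put together, that could look like the following sketch for an x-axis
running from 0 to 100. The data are invented; major and minor are just
names for the two sets of tick positions.

```r
x <- 0:100
y <- sqrt(x)

major <- seq(0, 100, by = 10)   # labelled ticks
minor <- seq(0, 100, by = 2)    # short unlabelled ticks

pdf(tempfile(fileext = ".pdf"))
plot(x, y, type = "l", xaxt = "n")            # suppress the default x-axis
axis(1, at = major)                           # labelled ticks every 10 units
axis(1, at = minor, labels = FALSE, tcl = -0.25)
grid()                                        # reference lines for reading off values
invisible(dev.off())
```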

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] strange split behavior?

2009-09-23 Thread Philipp Pagel

On Wed, Sep 23, 2009 at 07:29:30AM -0500, Peng Yu wrote:
 On Wed, Sep 23, 2009 at 1:24 AM, Peter Dalgaard
 p.dalga...@biostat.ku.dk wrote:
  Peng Yu wrote:
 
 Is there an operation on a factor to get a subset and keep only the
 corresponding levels (see commented line below)?

Yes, there is: call factor() on your subset:

> a <- factor(rep(letters[1:5], 5))
> a
 [1] a b c d e a b c d e a b c d e a b c d e a b c d e
Levels: a b c d e
> b <- a[a!='b']
> b
 [1] a c d e a c d e a c d e a c d e a c d e
Levels: a b c d e
> factor(b)
 [1] a c d e a c d e a c d e a c d e a c d e
Levels: a c d e
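
Two alternatives that give the same levels in this example are sketched
below: the drop argument of factor subsetting, and droplevels(), which
is only available in R versions newer than the one current at the time
of this thread.

```r
a <- factor(rep(letters[1:5], 5))
b <- a[a != "b"]

lev_factor  <- levels(factor(b))                 # as shown above
lev_drop    <- levels(a[a != "b", drop = TRUE])  # drop unused levels while subsetting
lev_droplev <- levels(droplevels(b))             # newer R versions only
```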


cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] Suppressing script commands in R console when executing long program

2009-09-18 Thread Philipp Pagel

On Fri, Sep 18, 2009 at 03:46:27PM +1000, Steven Kang wrote:
 *Q1. Are there any way of suppressing the commands in the R console?*

I think this has been answered already.

 *Q2. Is R capable of reading numbers that are represented with 1,000
 separator commas?*

I am not aware of an option to read.table and friends that does this
but you can recover easily:

> foo <- read.delim('foo.tbl')
> foo
  A          B
1 1     12,300
2 2 256,001.01
3 3      900.1
4 4         80
> str(foo)
'data.frame':   4 obs. of  2 variables:
 $ A: int  1 2 3 4
 $ B: Factor w/ 4 levels "12,300","256,001.01",..: 1 2 4 3
> foo$B <- as.numeric(sub(',', '', as.character(foo$B)))
> foo
  A        B
1 1  12300.0
2 2 256001.0
3 3    900.1
4 4     80.0
> str(foo)
'data.frame':   4 obs. of  2 variables:
 $ A: int  1 2 3 4
 $ B: num  12300 256001 900 80


cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] Datetime conversion

2009-09-18 Thread Philipp Pagel

 The same what you have worked out is my need but i'm getting the following
 error  
 Error in `$<-.data.frame`(`*tmp*`, "date", value = list(sec = c(0, 0,  : 
   replacement has 9 rows, data has 14

Please give more detail about what you did. This error is certainly
not from the example used in previous postings, as the data frame used
there has 9 rows, not 14. Without the details (code) of what you did
it's all guesswork. Perhaps you are mixing two data.frames of different
shape or ...

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] Suppressing script commands in R console when executing long program

2009-09-18 Thread Philipp Pagel

On Fri, Sep 18, 2009 at 12:59:16PM +0200, Philipp Pagel wrote:

  foo$B <- as.numeric(sub(',', '', as.character(foo$B)))

Thinking about it some more, you should use gsub instead of sub here.
Otherwise only the first occurrence of the thousands separator will be
removed.
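
A self-contained sketch of the difference, reading a small made-up table
from a string instead of foo.tbl. The value 1,256,001.01 was added here
to have a number with two separators.

```r
raw <- "A\tB\n1\t12,300\n2\t1,256,001.01\n3\t900.1\n4\t80\n"
foo <- read.delim(text = raw, stringsAsFactors = FALSE)

# sub() removes only the first comma, so the conversion fails for row 2
with_sub <- suppressWarnings(as.numeric(sub(",", "", foo$B, fixed = TRUE)))

# gsub() removes all of them
foo$B <- as.numeric(gsub(",", "", foo$B, fixed = TRUE))
```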

cu
Philipp


-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] Datetime conversion

2009-09-18 Thread Philipp Pagel

On Fri, Sep 18, 2009 at 04:32:27AM -0700, premmad wrote:
 
 Sorry for confusing you all with my inexperienced posting .
 I tried as u said if you have 9 rows in the data it is working fine but
 please try out the same example as you have suggested earlier with morethan
 9 rows.
 
 I tried it as following
 datetime <- c(
 + "01OCT1987:00:00:00.000",
 + "12APR2004:00:00:00.000",
 + "01DEC1987:00:00:00.000",
 + "01OCT1975:00:00:00.000",
 + "01AUG1979:00:00:00.000",
 + "26JUN2003:00:00:00.000",
 + "01JAN1900:00:00:00.000",
 + "13MAY1998:00:00:00.000",
 + "30SEP1998:00:00:00.000",
 + "30SEP1998:00:00:00.000",
 + "30SEP1998:00:00:00.000",
 + "30SEP1998:00:00:00.000")
 dt <- as.data.frame(datetime)
 
 dt$date <- strptime(as.character(dt$datetime), "%d%b%Y")
 
 and got the following error :
 
 Error in `$<-.data.frame`(`*tmp*`, "date", value = list(sec = c(0, 0,  : 
   replacement has 9 rows, data has 12.

Oops - sorry, you are right. There is a problem with inserting the
object. Try this instead:

dt$date <- as.Date(dt$datetime, "%d%b%Y")
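
The "9 rows" in the error message presumably come from the fact that
strptime returns a POSIXlt object, which is internally a list of time
components (sec, min, hour, ...) rather than a plain vector. Converting
to Date or POSIXct before storing the result avoids the problem. A
sketch with four of the values from above; the time locale is set to
"C" here so that the English month abbreviations are recognised
regardless of the system language, and parsed and stamp are invented
names.

```r
invisible(Sys.setlocale("LC_TIME", "C"))

datetime <- c("01OCT1987:00:00:00.000", "12APR2004:00:00:00.000",
              "01DEC1987:00:00:00.000", "30SEP1998:00:00:00.000")
dt <- data.frame(datetime, stringsAsFactors = FALSE)

# strptime gives a POSIXlt object (list-like)
parsed <- strptime(dt$datetime, "%d%b%Y", tz = "UTC")

# Date and POSIXct are plain vectors and fit into a data.frame column
dt$date  <- as.Date(parsed)
dt$stamp <- as.POSIXct(parsed)
```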

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] latex code in R - convert to pdf

2009-09-17 Thread Philipp Pagel

 
 is it possible to convert latex code to pdf in R (like a
 latex-program would do it)?
 Is there a package that comes with this capabilities?
 
 
 My problem is that I want to generate tables automatically -
 and I can't use a latex editor at that computer ...
 
 
 Besides latex ... are there good ways to generate tables in R?

Have a look at Sweave and xtable - I think that's what you want.

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/


Re: [R] latex code in R - convert to pdf

2009-09-17 Thread Philipp Pagel

On Thu, Sep 17, 2009 at 10:08:57AM +0200, Philipp Pagel wrote:
  
  is it possible to convert latex code to pdf in R (like a
  latex-program would do it)?
  Is there a package that comes with this capabilities?
  
  
  My problem is that I want to generate tables automatically -
  and I can't use a latex editor at that computer ...
  
  
  Besides latex ... are there good ways to generate tables in R?
 
 Have a look at Sweave and xtable - I think that's what you want.

Charlie's post made me aware that by "latex editor" you may mean that
there is no LaTeX installation on your machine. In that case Sweave
and xtable will obviously be of little use. If you have OpenOffice on
that computer, package odfWeave may be the solution. If OpenOffice is
not available either, package HTMLUtils might be another option
(I haven't used it so far, so I may be wrong here).

cu
Philipp

Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Philipp Pagel

On Tue, Sep 08, 2009 at 02:53:11PM +0300, Lauri Nikkinen wrote:
 I have a text file similar to this (separated by spaces):
 
 x <- "DF12 This is an example 1 This
 DF12 This is an 1232 This is
 DF14 This is 12334 This is an
 DF15 This 23 This is an example"
 
 
 and I know the field lengths of each variable (there is 5 variables in
 this data set), which are:
 
 varlength <- c(2, 2, 18, 5, 18)
 
 How can I import this kind of data into R, using the varlength
 variable as an field separator indicator?

I am not totally sure what exactly the expected result is. From your
description I got the impression that your data file uses a mixture of
separation characters and fixed-width formatting. Maybe I
misinterpreted your example. Have a look at read.fwf(), and if that does
not solve your problem, maybe explain the structure and expected result
a little further.
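For illustration, a small sketch of read.fwf() with the field widths from
your varlength vector. The records below are made up: I padded them to the
stated widths with sprintf(), because the spacing of the original example
did not survive the mail, so treat the layout as an assumption.

```r
# Field widths as given in the question
varlength <- c(2, 2, 18, 5, 18)

# Hypothetical records, padded to exactly those widths
fmt <- "%-2s%-2s%-18s%-5s%-18s"
records <- c(
  sprintf(fmt, "DF", "12", "This is an example", "1",     "This"),
  sprintf(fmt, "DF", "12", "This is an",         "1232",  "This is"),
  sprintf(fmt, "DF", "14", "This is",            "12334", "This is an"),
  sprintf(fmt, "DF", "15", "This",               "23",    "This is an example")
)

# Write the records to a temporary file and read them back by width
tmp <- tempfile(fileext = ".txt")
writeLines(records, tmp)

dat <- read.fwf(tmp, widths = varlength,
                strip.white = TRUE,        # drop the padding blanks
                stringsAsFactors = FALSE)
str(dat)
```

The columns are named V1 to V5 by default; col.names can be passed along
to set proper names.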

cu
Philipp

Re: [R] Data separated by spaces, getting data into R using field lengths

2009-09-08 Thread Philipp Pagel

On Tue, Sep 08, 2009 at 03:21:53PM +0300, Lauri Nikkinen wrote:
 This data is from database and the maximum length of a field is
 defined. I mean that every column has a maximum length and I want to
 use this maximum length as a separator. So if one cell in that
 column is shorter than the maximum, cell should be padded with white
 spaces or something like that. This seems to be hard to explain.

OK - now I got it. RODBC has already been suggested. If for some reason
that is impossible, you could try to dump the data using a proper
delimiter (e.g. tab). Without a real delimiter it is certainly hard to
parse the data - and it may even be impossible, depending on what
characters are allowed in your free-text fields.
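Reading such a tab-delimited dump could look roughly like this. The file
content and the column names are invented for the example; only the use of
read.delim() is the point.

```r
# Hypothetical tab-delimited dump with a header line
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tcomment\tvalue",
             "DF12\tThis is an example\t1",
             "DF14\tIt's free text\t12334"), tmp)

# quote = "" switches off quote handling, so quote characters
# inside free-text fields are read literally
dump <- read.delim(tmp, quote = "", stringsAsFactors = FALSE)
str(dump)
```

This only works as long as the free-text fields themselves never contain a
tab.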

cu
Philipp
