Re: [R] control the conversion of factor to numeric
On Tue, Oct 18, 2011 at 03:40:27PM +0200, Martin Batholdy wrote:

> Ok, I think that would work – thanks!
> However, in my case I read a data.frame via read.table(). So some of the columns get transformed to factors automatically – I don't generate the factor-variables as in the example, so I can't control how the levels are ordered (or can I?).

You can't while reading the data but nothing can stop you from re-ordering the levels once you have your data.frame. An example with the iris data:

    data(iris)
    str(iris)
    'data.frame':   150 obs. of  5 variables:
     $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

    iris$Species <- factor(iris$Species, ordered=TRUE, levels=c('versicolor', 'virginica', 'setosa'))
    str(iris)
    'data.frame':   150 obs. of  5 variables:
     $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species     : Ord.factor w/ 3 levels "versicolor"<"virginica"<..: 3 3 3 3 3 3 3 3 3 3 ...

cu Philipp

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
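Since the thread is about controlling the factor-to-numeric conversion, here is a minimal sketch of how the level order determines the integer codes. It uses the built-in iris data; the variable names `species` and `codes` are only illustrative.

```r
# Reorder the levels of an existing factor and inspect the resulting codes.
data(iris)

# The level order is arbitrary here; choose whatever order you need.
species <- factor(iris$Species,
                  levels = c("versicolor", "virginica", "setosa"))

levels(species)                # the new level order
codes <- as.numeric(species)   # integer codes follow that order

# Cross-tabulate labels against codes to verify the mapping
table(species, codes)
```

Note that as.numeric() on a factor returns the internal codes, not the labels, so the level order is exactly what controls the outcome.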
Re: [R] Histogram for each ID value
> where the first column is the chromosome location and the second column is some value. What I'd like to do is have a histogram created for each chr location (i.e. a separate histogram for chr1, chr2, chr3, chr7, chr9, and chr22). I am just having a hard time getting everything to work out and am hoping for some suggestions.

ggplot and looping combined with traditional graphics have already been mentioned, so I'll add the lattice solution for completeness:

    library(lattice)
    histogram(~foo | chromosome, dat)

This assumes that your data frame is called dat and contains two columns called foo (your numeric value) and chromosome (your chromosome identifier).

cu Philipp
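A self-contained sketch of the lattice call, assuming simulated data: the names dat, foo and chromosome follow the post, but the values below are made up purely for illustration.

```r
library(lattice)

set.seed(1)
# Hypothetical data: 50 values for each of three chromosomes
dat <- data.frame(
  chromosome = factor(rep(c("chr1", "chr2", "chr3"), each = 50)),
  foo        = rnorm(150)
)

# One histogram panel per level of 'chromosome'
p <- histogram(~ foo | chromosome, data = dat)
print(p)   # lattice plots must be printed explicitly inside scripts and loops
```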
Re: [R] Sweave doesn't work
On Sun, Aug 21, 2011 at 09:18:25AM -0700, danielepippo wrote:

> Sweave("example.Rtex") in R it seems working
> [...]
> * ...le/Desktop/dati/LaTeX1.Rtex*

Sounds like you are first running Sweave on the file 'example.Rtex' and later LaTeX on 'LaTeX1.Rtex'. Two points:

1) Why do these files have totally different names? If the Sweave file was called 'example.Rtex' I'd expect the corresponding LaTeX file to be 'example.tex'.

2) Why do both files have the same extension? Commonly, the Sweave files are called 'something.Snw' or 'something.Rnw' and the resulting LaTeX files would be 'something.tex'.

cu Philipp
Re: [R] Is this a bug for my fault?
On Thu, Aug 18, 2011 at 04:52:58PM +0700, Rut S wrote:

> I tried to recode some complex multiple variables and run into a problem that r can change only some column that I want to change. I can reproduce the problem with this
>
>     idfortest <- c(6,23,46,63,200,238,297,321,336,364,386,392,414,434,441)
>     id <- seq(1:500)
>     id[id==idfortest]
>
> the result showed
>
>     Warning in id == idfortest : longer object length is not a multiple of shorter object length
>     [1] 200 386 434
>
> can you enlighten me for this, thank you in advance.

Others have already pointed out what the problem is. I'd like to add that you are probably looking for the %in% operator.

cu Philipp
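A short sketch of the %in% approach, reusing the vectors from the question. The name `hits` is only illustrative.

```r
idfortest <- c(6, 23, 46, 63, 200, 238, 297, 321, 336, 364, 386, 392, 414, 434, 441)
id <- 1:500

# '==' compares element by element and recycles the shorter vector,
# which is what triggers the warning in the original code.
# '%in%' instead asks, for each element of id, whether it occurs
# anywhere in idfortest.
hits <- id[id %in% idfortest]
hits
length(hits)
```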
Re: [R] write merged data frame to a file
On Mon, Jul 18, 2011 at 04:00:29PM +0200, Andrea Franceschini wrote:

> I use version 13 of R in OSX (downloaded and installed less than 1 year ago).

Probably 2.13 ...

[...] code omitted

> The first lines are OK (i.e. 14 columns, like the dataframe), while at a certain point I get lines with only 3 columns !!! The bad lines that contain only 3 columns have the name and the description of the gene (i.e. the content of the file that I merged with). Besides, these strange lines also get repeated (see the bottom).

I haven't carefully analyzed your code so I may be wrong, but my guess for all weird behaviour of gene-related data frames is this: gene descriptions love to contain things like "Foo 5' obfuscation factor". Note the ' in the description, which read.table will happily interpret as a quotation mark and eat lots of rows until it happens to encounter a closing counterpart. This leads to all kinds of funny results.

So I bet your problem is not in write.table but in reading the data. Have a closer look at your data frame: are you really getting the expected number of observations in the merged data.frame? Are the rows in question really ok in the data frame?

If my guess is correct you should be able to fix your problem by including quote="" in both your read.table commands. If it doesn't, also try comment.char="" - another popular source of problems.

cu Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/
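A minimal sketch of the suggested fix, assuming a made-up tab-separated gene table (the ids and descriptions are hypothetical). It reads the text with quoting and comment handling switched off, so apostrophes and '#' characters are kept as ordinary data.

```r
# Hypothetical input: descriptions contain unpaired apostrophes and a '#'
txt <- paste(
  "id\tdescription",
  "g1\tFoo 5' obfuscation factor",
  "g2\tbar kinase",
  "g3\tbaz 3' binding protein",
  "g4\tqux #2 homolog",
  sep = "\n"
)

# quote = ""        : no character is treated as a quotation mark
# comment.char = "" : '#' does not start a comment
good <- read.table(text = txt, sep = "\t", header = TRUE,
                   quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

nrow(good)          # should equal the number of data lines in the input
good$description
```

Comparing nrow() against the number of lines in the input file is a quick way to check whether rows were swallowed during import.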
Re: [R] problem in reading a sequence file
On Tue, Jul 05, 2011 at 02:06:02PM +0200, albert coster wrote:

> seqfile
>             V1
> 1 NNATTAAAGGGC
>
> I want only NNATTAAAGGGC .

If I understand correctly, your file simply contains one string (sequence) per line. In that case you may want to use scan() instead of read.table, but without more information it's hard to know. Can you provide a very short example file (maybe 5 lines) and also the output of str(foo), where foo is the variable you read the file into?

Also: do you want a data frame with a single column? Or rather a vector of strings? Something else? Does your file ONLY contain sequences - or are there also identifiers, annotations etc.?

cu Philipp
Re: [R] problem in reading a sequence file
On Tue, Jul 05, 2011 at 04:53:32PM +0200, albert coster wrote:

I'm taking this back to the list so others can follow up.

> Yes, the file is consists of one string (sequence) per line. The files format is following:
>
> Sequence
> NNATTAAAGGGC

OK - in that case (and as you want a vector anyway) you can use scan('seq.txt', what=character()).

> seqfile <- read.table("seq.txt")
> Warning message:
> In read.table("seq.txt") :
>   incomplete final line found by readTableHeader on 'seq.txt'

OK - that means you don't have a newline ('\n') at the end of your sequence file and read.table is warning you about that.

> str(seqfile)
> 'data.frame': 2 obs. of 1 variable:
>  $ V1: Factor w/ 2 levels "NNATTAAAGGGC",..: 2 1

This indicates that there are at least two lines in the file (so you got two levels in the factor). So I would guess there is an empty line before your sequence or you really have the word 'Sequence' on line 1.

For sequence data it probably does not make much sense to let R convert to factor, and a character column would be preferred. This can be accomplished by using one of the options 'as.is', 'stringsAsFactors' or 'colClasses'.

If you use scan you'll need to get rid of the extra line first. If you stick with read.table you can specify the first line as your header line using the header=TRUE option. Now you can address column 'Sequence' as such. Example:

    dat <- read.table('seq.txt', as.is=TRUE, header=TRUE)
    dat$Sequence
    [1] "NNATTAAAGGGC"
    dat[, 'Sequence']
    [1] "NNATTAAAGGGC"
    str(dat)
    'data.frame': 1 obs. of 1 variable:
     $ Sequence: chr "NNATTAAAGGGC"

cu Philipp
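Both routes can be sketched side by side. This assumes a file with a 'Sequence' header line followed by one sequence per line; the temporary file and the second sequence are made up for illustration, and skip=1 is just one way to get rid of the header line when using scan().

```r
# Write a small hypothetical sequence file
seq_path <- tempfile(fileext = ".txt")
writeLines(c("Sequence", "NNATTAAAGGGC", "ACGTNNACGTAC"), seq_path)

# Route 1: a plain character vector, skipping the header line
seqs <- scan(seq_path, what = character(), skip = 1, quiet = TRUE)

# Route 2: a one-column data frame with a character column
dat <- read.table(seq_path, header = TRUE, as.is = TRUE)

seqs
str(dat)
```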
Re: [R] (no subject)
On Sun, Jun 26, 2011 at 06:34:28PM -0700, Ungku Akashah wrote:

> hello. I need some help about this R software. I've been searching for volcano plot(statistic) script for long, but still not found. May i request the script for volcano plot. If able, pls include any tips about volcano plot.

http://lmgtfy.com/?q=r+volcano+plot

cu Philipp
Re: [R] Shrink file size of pdf graphics
On Thu, May 19, 2011 at 01:35:51PM -0700, Layman123 wrote:

> I tried both, the plot devices in R and pdftk. First I tried the png-device, but as I wanted to increase the number of pixels with 'width' and 'height', the labels are getting smaller

When I really need a png, I usually produce a pdf or eps first and then convert to png of the desired resolution with the convert command of ImageMagick (but of course any other software, like e.g. Photoshop, should work fine, too). That way I don't have to figure out the correct parameters to make the png the way I want it, and I have the additional benefit of a vector graphics master file that I can easily use to produce additional versions in different resolutions etc.

> Is there a way to do this with the gs-command so that it would be even more compressed?

Possibly, but of course there is a limit to how much you can compress a file without resorting to lossy compression. You may have hit that limit.

cu Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/
Re: [R] TR: Simulate keyboard
On Mon, May 16, 2011 at 11:27:05AM +0200, Thibault Charles wrote:

> I cannot find a way to simulate a keyboard event as pressing the "enter" key.
>
>     for (i in 1:nombre_fichiers_monteCarlo){
>       system(paste('C:/Trnsys17/Exe/TRNExe.exe',liste_dck_monteCarlo[i]),wait=TRUE)
>     }
>
> My problem is that at each step, trnsys ask the user to press "enter" from the keyboard and I would like not have to press myself on "enter". Does exist a function to simulate this kind of keyboard event ?

I don't think R can handle that (but I may be wrong). On a UNIX platform, this kind of problem could be tackled with the expect command. Your code above suggests you are on a Windows platform. I did a quick Google search and it seems that expect is available for Windows as part of the Cygwin suite. And there also seems to be an expect for Windows from ActiveState:

http://docs.activestate.com/activetcl/8.5/expect4win/

cu Philipp
Re: [R] TR: Simulate keyboard
On Mon, May 16, 2011 at 01:50:15PM +0200, Thibault Charles wrote:

> Thank you for your response. In this case, do you think it is possible to write a little program in java which would execute my script and simulate a press on my keyboard ?

I think you need to do it the other way round: let R call an external script that handles the press-enter business (expect/Java/whatever). I still think expect would be easiest, but if you feel more comfortable with Java, that's going to work as well.

Come to think of it: your external software is purely text based (i.e. running in a DOS box), isn't it? If it's not, and you are actually getting message windows, 'expect' won't be much help, but there are several tools out there that will happily record your actions (keyboard and mouse events alike) and play them back later: google for 'windows macro recorder' and you'll get more varieties than you will care for.

cu Philipp
Re: [R] RV: R question
> which is the maximum large of digits that R has?, because SQL work with 50 digits I think. and I need a software that work with a lot of digits.

The .Machine() command will provide some insight into these matters.

cu Philipp
Re: [R] RV: R question
On Fri, May 06, 2011 at 09:17:11AM -0400, David Winsemius wrote:

> On May 6, 2011, at 4:03 AM, Philipp Pagel wrote:
>
>> The .Machine() command will provide some insight into these matters.
>
> On my device (and I suspect on all versions of R) .Machine is a built-in list and there is no .Machine() function.

Oops - my fault. You are right, of course.

cu Philipp
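As the correction says, .Machine is a list, so its entries are accessed with $. A small sketch of the entries most relevant to the "how many digits" question; the values depend on the platform, so none are shown here.

```r
# .Machine is a built-in list describing the numeric characteristics
# of the machine R is running on.
is.list(.Machine)

.Machine$integer.max      # largest representable integer
.Machine$double.eps       # smallest positive x with 1 + x != 1
.Machine$double.digits    # number of bits in the mantissa of a double

# Doubles carry roughly 15 to 16 significant decimal digits;
# printing more digits than that mostly shows representation error.
print(1/3, digits = 17)
```

For genuinely larger precision an arbitrary-precision add-on package would be needed; base R doubles cannot hold 50 significant digits.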
Re: [R] plot several histograms with same y-axes scaling using hist()
On Fri, Apr 29, 2011 at 03:35:41AM -0700, hck wrote:

> Problem: hist()-function, scale = "percent"
> [...]
>     Hist(na.exclude(AA3), breaks=50, col="seashell3", scale="percent", xlim=c(-1, 1), xlab="Bewertungsfehler", ylab="Haeufigkeit (in %)", main="KBV", border="white")

Before anyone can really help you'll need to let us know where your Hist() function came from. hist() from package graphics does not have a scale parameter and honours ylim without a problem.

cu Philipp
Re: [R] read.csv fails to read a CSV file from google docs
On Fri, Apr 29, 2011 at 06:19:24PM +0300, Tal Galili wrote:

>     data_url <- "http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"
>     read.csv(data_url)
>     Error in file(file, "rt") : cannot open the connection

I get the same error (R 2.11.1, Debian Linux) and don't have a solution. But I did some tests and found the origin of the problem.

I can download the file from Google with wget but get some interesting information in the process:

    $ wget -v 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
    --2011-04-29 20:07:40--  http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
    Resolving spreadsheets0.google.com... 209.85.148.139, 209.85.148.113, 209.85.148.138, ...
    Connecting to spreadsheets0.google.com|209.85.148.139|:80... connected.
    HTTP request sent, awaiting response... 302 Moved Temporarily
    Location: https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv [following]
    --2011-04-29 20:07:41--  https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
    Connecting to spreadsheets0.google.com|209.85.148.139|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/plain]
    Saving to: “pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv.1”

        [ <=> ] 41          --.-K/s   in 0s

    2011-04-29 20:07:42 (342 KB/s) - “pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv.1” saved [41]

The message that caught my attention was the HTTP redirection: 302 Moved Temporarily.

If you try again with the new URL you get this:

    read.csv(url("https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&g[...]"))
    Error in open.connection(file, "rt") : cannot open the connection
    In addition: Warning message:
    In open.connection(file, "rt") : unsupported URL scheme

?url told me:

    Note that 'https://' connections are not supported.

Case closed, problem unsolved... Dirty workaround: use system() and wget or whatever command is available on Windows for this.

cu Philipp
Re: [R] bwlpot problems: printing, and tick labels
> A. It produces empty JPEGs. When the 'bwplot' line alone is submitted, the plot duly shows up.

See FAQ: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-do-lattice_002ftrellis-graphics-not-work_003f

BTW: don't use jpg for plotting if you can - they routinely look ugly.

> B. When the 'bwplot' line alone is submitted, y labels are values 1 to 6, not actual distinct values of y$maxthreads.

That's because maxthreads is not a factor - you can convert it to one. See below.

> (C. I would, of course, prefer to produce plots for all distinct values of x$maxthreads in a single swoop, on a single figure).

That's what I was about to suggest. Don't loop over the tasks - use the power of lattice. I think this should be close to what you want:

    bwplot(factor(maxthreads) ~ time | factor(tasks), x)

cu Philipp
[R] read.table: fill=T for header?
Dear ExpeRts,

I am trying to read tab delimited data produced by somewhat brain dead software that seems to think it's a good idea to have an extra tab character after the last column - except for the header line. As explained in the help page, read.delim now assumes that the first column contains the row.names (which is not even wrong), but now all col.names get shifted by one column. Example:

    infile <- 'sample\tx1\n1\tA\t\n2\tB\t\n3\tA\t'
    read.delim(textConnection(infile))
      sample x1
    1      A NA
    2      B NA
    3      A NA

So I set row.names to NULL because the man page said "Using 'row.names = NULL' forces row numbering.". Now the row.names really are numbered automatically but I get a bonus column:

    read.delim(textConnection(infile), row.names=NULL)
      row.names sample x1
    1         1      A NA
    2         2      B NA
    3         3      A NA

Hm - not what I want. I am also a bit puzzled why the extra column is introduced instead of just using the first col.name.

At the moment I deal with it by fixing the col.names and dumping the extra column:

    dat <- read.delim(textConnection(infile), row.names=NULL)
    colnames(dat) <- colnames(dat)[-1]
    dat <- dat[-ncol(dat)]
    dat
      sample x1
    1      1  A
    2      2  B
    3      3  A

I worked my way through ?read.delim but could not find an option to deal with these (flawed) files directly. As the opposite situation (i.e. more col.names than data) can be fixed with fill=T, I was hoping something like fill.header=T or fill='header' may exist. Did I just not find it or does it not exist? And if it doesn't - does anyone else think it would be a nice item for the wishlist?

cu Philipp
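One possible alternative to the workaround in the post, sketched under the assumption that the header line is always the well-formed one: read the header and the body separately, then keep only as many columns as the header names. This is not a built-in read.delim option, just a manual two-step import; `header` and `body` are illustrative names.

```r
infile <- 'sample\tx1\n1\tA\t\n2\tB\t\n3\tA\t'

# Split the text into lines; with a real file, readLines(path) would do this
lines <- strsplit(infile, "\n", fixed = TRUE)[[1]]

# Column names come from the first line, which has no trailing tab
header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]

# Read the remaining lines without a header; the trailing tab
# produces one extra, empty column
body <- read.delim(text = lines[-1], header = FALSE,
                   stringsAsFactors = FALSE)

# Keep only the named columns and attach the names
body <- body[seq_along(header)]
names(body) <- header
body
```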
Re: [R] Simple Missing cases Function
On Tue, Apr 19, 2011 at 03:29:08PM +0800, Tim Elwell-Sutton wrote:

> Dear all
> I have written a function to perform a very simple but useful task which I do regularly. It is designed to show how many values are missing from each variable in a data.frame. In its current form it works but is slow because I have used several loops to achieve this simple task.

Why not use summary?

    foo <- data.frame(a=c(1,3,4,NA), b=c(NA,4,NA,8), c=factor(c('A', NA, 'A', 'B')))
    summary(foo)
           a               b          c
     Min.   :1.000   Min.   :4   A   :2
     1st Qu.:2.000   1st Qu.:5   B   :1
     Median :3.000   Median :6   NA's:1
     Mean   :2.667   Mean   :6
     3rd Qu.:3.500   3rd Qu.:7
     Max.   :4.000   Max.   :8
     NA's   :1.000   NA's   :2

cu Philipp
Re: [R] Simple Missing cases Function
On Tue, Apr 19, 2011 at 03:29:08PM +0800, Tim Elwell-Sutton wrote:

> Dear all
> I have written a function to perform a very simple but useful task which I do regularly. It is designed to show how many values are missing from each variable in a data.frame. In its current form it works but is slow because I have used several loops to achieve this simple task.

Oh - and in case you ONLY want the number of NAs in each column this should be pretty efficient:

    lapply(foo, function(x){sum(is.na(x))})

cu Philipp
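Two compact variants of the same idea, using the foo data frame from the earlier reply in this thread. Both return a named vector rather than a list; the names `na_counts` and `na_counts2` are illustrative.

```r
foo <- data.frame(a = c(1, 3, 4, NA),
                  b = c(NA, 4, NA, 8),
                  c = factor(c("A", NA, "A", "B")))

# is.na() on a data frame gives a logical matrix; colSums counts the TRUEs
na_counts <- colSums(is.na(foo))

# Same result column by column, simplified to a vector by sapply
na_counts2 <- sapply(foo, function(x) sum(is.na(x)))

na_counts
```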
Re: [R] (no subject)
On Mon, Apr 18, 2011 at 04:11:57PM +0530, Ramnath R wrote:

> Hai
> From which CRAN mirror can get the package "LPP2005REC"?

As the first hit of a Google search for LPP2005REC told me, it is not a package but a dataset in package timeSeries.

cu Philipp
Re: [R] Clearing Console; of weeks of codes!
> I do see I have weeks of codes in my console when I check with my arrow up keys. I have been clearing them with Control L but it seems to clear it clear the screen temporally.

CTRL-L simply clears the screen and not the history.

> I do see the previous codes again when I open R the next day, after quitting the session!
> Q: How do I clear this?

What you are seeing is the R history, which is stored in the file .Rhistory in the current working directory when the session is closed or savehistory() is used. Deleting that file before starting R will clear the history.

I am not sure you can clear the history of a running R session. Deleting the file will not work while the session is open because the history is in memory at that time, and I am not aware of a command to manipulate the current history.

The environment variable R_HISTSIZE can be used to control the size of the history. See ?history for details.

cu Philipp
Re: [R] Hash table...
On Thu, Apr 14, 2011 at 06:44:53PM +1200, Worik R wrote:

> To improve the efficiency of a process I am writing I would like to cache results. So I would like a data structure like a hash table. So if I call Z <- f(Y) I can cache Z associated with Y: CACHE[Y] <- Z
>
> I am stumped. I expected to be able to use a list for this but I cannot figure how

If y is an integer, factor or string you could try something along these lines:

    cache <- list()
    y <- 12
    cache[[as.character(y)]] <- sqrt(y)
    y <- 98
    cache[[as.character(y)]] <- sqrt(y)
    cache
    $`12`
    [1] 3.464102

    $`98`
    [1] 9.899495

Of course this can get you in trouble if y is a floating point number because of the issues with identity of such numbers, as discussed in ?all.equal and FAQ 7.31 "Why doesn't R think these numbers are equal?".

cu Philipp
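A variation on the same idea using an environment instead of a list, which is a common way to get hash-table-like lookup in R. This is only a sketch: cached_sqrt is a hypothetical stand-in for the poster's f(Y), and the same caveat about floating point keys applies.

```r
# An environment with hashing enabled serves as the cache
cache <- new.env(hash = TRUE)

# Hypothetical stand-in for an expensive function f(Y)
cached_sqrt <- function(y) {
  key <- as.character(y)
  if (!exists(key, envir = cache, inherits = FALSE)) {
    # Not cached yet: compute and store under the string key
    assign(key, sqrt(y), envir = cache)
  }
  get(key, envir = cache, inherits = FALSE)
}

cached_sqrt(12)
cached_sqrt(98)
cached_sqrt(12)     # second call is served from the cache
sort(ls(cache))     # keys currently stored
```

Unlike a list, an environment is modified in place, so the function can update the cache without returning it.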
Re: [R] for loop performance
> I am running some simulations in R involving reading in several hundred datasets, performing some statistics and outputting those statistics to file. I have noticed that it seems that the time it takes to process of a dataset (or, say, a set of 100 datasets) seems to take longer as the simulation progresses.

Reading data, e.g. with read.table, can be slow because it does a fair bit of checking content, guessing data types etc. So I guess the question is: how is your data stored (files, in what format, database) and how do you read it into R? Once we know this there may be tricks to speed up the data import.

> I am curious to know if this has to do with how R processes code in loops or if it might be due to memory usage issues (e.g., repeatedly reading data into the same matrix).

Probably not - I would guess it's the parsing of the input data that is slow.

cu Philipp
Re: [R] Clearing Console; of weeks of codes!
Please reply to the list, so the OP and others following the thread can see your contributions. I'm taking this back to r-help.

On Thu, Apr 14, 2011 at 01:43:31AM -0700, Mohammad Tanvir Ahamed wrote:

> you can try it ...
> rm(list=ls())

No - this has been suggested before and saying it again does not make it less wrong: rm() will delete objects from the workspace but has absolutely no effect on the history.

cu Philipp
Re: [R] for loop performance
On Thu, Apr 14, 2011 at 06:50:56AM -0500, Barth B. Riley wrote:

> Thank you Phillip for your post. I am reading in:
>
> 1. a 3 x 100 item parameter file (floating point and integer data)
> 2. a 100 x 1000 item response file (integer data)
> 3. a 6 x 1000 person parameter file (contains simulation condition information, person measures)
> 4. I am then computing several statistics used in subsequent ROC analyses, the AUCs being stored in a 6000 x 15 matrix of floating point numbers
>
> I am using read.table for #1-#3 and write.table for #4. The process of reading files (#1-#3) and writing to file is done over 6,000 iterations.

A few ideas:

1) Try to use the colClasses argument to read.table. That way R will not have to guess the data type of the columns.

2) When you say 6000 iterations - do you mean you are reading/writing the SAME files over and over again? Or do you have 6000 sets of files? In the former case the obvious advice would be to read them only once.

3) If the input files were generated in R, another option would be to save()/load() them rather than using write.table()/read.table().

4) If they came from some other application, storing everything in a database may speed things up.

5) Is your data on a file server? If yes: try moving it to the local disc temporarily to see if network i/o is limiting your speed.

6) Whatever you try to improve performance - measure the effects rather than rely on your impression (system.time, Rprof, ...) in order to find out what part of the program is actually eating up the most time.

cu Philipp
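To make points 1), 3) and 6) concrete, here is a minimal sketch. The file is a made-up stand-in for the 100 x 1000 item response file (the names tmp, resp and rda are assumptions, not the poster's data), and the timings will of course differ from machine to machine:

```r
# Hypothetical stand-in for the item response file: 1000 rows, 100 integer columns.
tmp <- tempfile(fileext = ".txt")
resp <- matrix(sample(0:1, 1000 * 100, replace = TRUE), nrow = 1000)
write.table(resp, tmp, row.names = FALSE, col.names = FALSE)

# Let read.table guess the column types ...
t_guess <- system.time(d1 <- read.table(tmp))
# ... versus declaring them up front (colClasses is recycled across columns).
t_decl <- system.time(d2 <- read.table(tmp, colClasses = "integer"))

# If the data were produced in R anyway, a binary copy avoids parsing altogether.
rda <- tempfile(fileext = ".RData")
save(d2, file = rda)
t_load <- system.time(load(rda))

# Compare elapsed times instead of relying on impressions.
print(c(guess    = t_guess[["elapsed"]],
        declared = t_decl[["elapsed"]],
        load     = t_load[["elapsed"]]))
```

For anything beyond such a quick comparison, Rprof() gives a per-function breakdown of where the time goes.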
Re: [R] Compatibility with Work Load/Resource Managers
> I was wondering if anyone knew whether R is capable of integrating with the following work load/resource managers: TORQUE, OpenPBS, PBS Pro, LSF, and SGE?

I am running R scripts in our cluster under SGE on a regular basis and have also done that under Platform LSF in the past, but I am not sure what you mean by integrating with these systems.

cu Philipp
Re: [R] xyplot, groups and colors
On Fri, Apr 08, 2011 at 08:14:21AM -0700, Dennis Murphy wrote:

Thanks to everyone who replied! Especially this and the ggplot advice did what I wanted.

> xyplot(circumference~age, dat, groups=Tree, type='l', col.line = c('red', 'blue', 'blue', 'red', 'red'))

This is essentially what I had been doing after somehow creating the correct color vector.

> After a little more fiddling around, this also works, and seems a bit less kludgy:
>
> dat$group2 <- factor(dat$group, labels = c('red', 'blue'))
> xyplot(circumference~age, dat, groups=Tree, type='l', col.line = levels(dat$group2))

Perfect! Using the levels directly had not occurred to me. Thanks!

cu Philipp
Re: [R] list to data frame
On Sun, Apr 10, 2011 at 06:01:39PM +, Franklin Tamborello II wrote:

> I need to make a data frame out of the data that I currently have in a list. This works, but is ugly:
>
> ineffData <- rbind(ineffFilesList[[1]], ineffFilesList[[2]], ineffFilesList[[3]], ineffFilesList[[4]], ineffFilesList[[5]], ineffFilesList[[6]], ineffFilesList[[7]], ineffFilesList[[8]], ineffFilesList[[9]], ineffFilesList[[10]], ineffFilesList[[11]], ineffFilesList[[12]], ineffFilesList[[13]], ineffFilesList[[14]], ineffFilesList[[15]], ineffFilesList[[16]], ineffFilesList[[17]], ineffFilesList[[18]], ineffFilesList[[19]], ineffFilesList[[20]], ineffFilesList[[21]], ineffFilesList[[22]], ineffFilesList[[23]], ineffFilesList[[24]], ineffFilesList[[25]], ineffFilesList[[26]], ineffFilesList[[27]])
>
> What's an efficient way of doing this such that the computer will do the work of recurring through the list of elements of ineffFilesList?

  as.data.frame(ineffFilesList)

cu Philipp
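A note on the suggestion above: as.data.frame() on a list combines the elements as columns, so it only helps if that is the layout you want. If the list elements are data frames with identical columns that should be stacked row-wise, as the rbind() call in the question suggests, do.call() is a common idiom. The sketch below uses a made-up three-element list in place of the poster's ineffFilesList; the column names id and rt are assumptions:

```r
# Hypothetical stand-in for ineffFilesList: small data frames with the same columns.
ineffFilesList <- list(
  data.frame(id = 1:2, rt = c(0.51, 0.62)),
  data.frame(id = 3:4, rt = c(0.48, 0.70)),
  data.frame(id = 5:6, rt = c(0.55, 0.59))
)

# do.call() passes every list element as a separate argument to rbind(),
# i.e. it builds rbind(l[[1]], l[[2]], l[[3]]) for a list of any length.
ineffData <- do.call(rbind, ineffFilesList)
str(ineffData)
```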
[R] xyplot, groups and colors
Dear ExpeRts,

I am trying to plot a bunch of growth curves and would like to get some more control over groups and line colors than I seem to have. Example:

  library(lattice)

  # make some data
  dat <- Orange
  dat$group <- ifelse(dat$Tree %in% c('1','4','5'), 'A', 'B')

  # plot
  xyplot(circumference~age, dat, groups=group)

  # now use lines to make the growth curve more visible
  xyplot(circumference~age, dat, groups=group, type='l')
  # ugly, because of the 'return' lines

  # to fix this set groups to Tree
  xyplot(circumference~age, dat, groups=Tree, type='l')
  # better, but now each Tree has its own color

Of course I can now use the col argument to manually assign the colors by group, but is there a more elegant way that I missed?

cu Philipp
Re: [R] force output dimension of table function
On Thu, Apr 07, 2011 at 05:37:08AM +0200, fisken wrote:

> When I use the 'table' function on a simple vector it counts the number of occurrences. So depending on the values of my input vector the function returns a class of type table with different lengths. Is there an easy way to tell the table function the values to expect? And what I wanted was
>
> 0 1 2 3 4 5
> 0 1 1 1 0 2

The solution using factors has already been posted. If you are really interested in integers only, you could also use tabulate():

  tabulate(s)
  [1] 1 1 1 0 2

Note that this excludes zero, though.

cu Philipp
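For completeness, a small sketch of both variants side by side. The input vector s is not shown in the thread; the one below is an assumption chosen so that it reproduces the counts quoted above:

```r
s <- c(1, 2, 3, 5, 5)   # assumed input, chosen to match the quoted counts

# Factor solution: declaring the levels fixes the length of the result,
# including values (0 and 4) that never occur in s.
counts <- table(factor(s, levels = 0:5))
print(counts)

# tabulate() counts the positive integers 1..nbins only, so zero is not covered.
tab <- tabulate(s, nbins = 5)
print(tab)
```

Setting nbins explicitly keeps the length of the tabulate() result fixed even if the largest value happens to be missing from the input.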
Re: [R] function order
On Wed, Apr 06, 2011 at 11:35:32AM +0100, Yan Jiao wrote:

> abc <- cbind(c(1,6,2),c(2,5,3),c(3,2,1))  ## matrix I want to sort
>
> if I do abc[ order(abc[,3]), increasing = TRUE]

Jim already pointed out that the argument needs to go inside the parentheses of the order function. In addition, order has an argument called 'decreasing', but none called 'increasing'. Finally, you are lacking a comma in your subsetting of the matrix:

  abc[ order(abc[,3], decreasing=F)]
  [1] 2 6 1

But you probably mean:

  abc[ order(abc[,3], decreasing=F), ]
       [,1] [,2] [,3]
  [1,]    2    3    1
  [2,]    6    5    2
  [3,]    1    2    3

cu Philipp
Re: [R] Saving console and graph output to same file
On Tue, Apr 05, 2011 at 10:53:03AM +0530, Nikhil Abhyankar wrote:

> Hello All, How do I save the output of the R console and the graphic output to the same PDF file and append these to each other? I need to have a frequency table and a corresponding graph, one below the other in a file. I have tried with sending the cross table to the graph window using 'textplot' and then saving the graphic output. However, the table does not look nice in the graph output. Is there any way the output from the console can be saved in a file and then the output from the graph window be appended to the same file?

Sweave and odfWeave are very nice methods for generating reports that combine text, R code, results from R and graphics.

cu Philipp
Re: [R] one question about bioconductor
On Thu, Mar 31, 2011 at 09:32:06AM -0700, wang peter wrote:

> dear lady and gentalmen: i am gaoshan from kansas university. i used such coding to deal with gel data
>
> data <- ReadAffy()
> Warning messages: 1: In file(out, wt) : cannot open file 'C:\Users\gaoshan\AppData\Local\Temp\RtmpvsyXOV\Rhttpd3f0b2e85': No such file or directory

As the message says: there is something wrong with the path. In order to get more helpful replies, you should show the actual code you used and also give a hint about the specific packages you were using. E.g. ReadAffy most certainly requires at least a filename, which seems to be missing from your command above.

In addition, I recommend posting your question on the bioconductor mailing list.

cu Philipp
Re: [R] read password-protected files
On Thu, Mar 31, 2011 at 11:00:48AM -0700, Shi, Tao wrote:

> Hi list, I have a bunch of .csv files that are password-protected. I wonder if there is a way to read them in in R without manually removing the password protection for each file?

I doubt that there is such a thing as a password protected csv file. They are just text files, after all. So I guess you have something else. How, or with what, was the presumed password protection done?

cu Philipp
Re: [R] Not all rows are being read-in
On Tue, Mar 29, 2011 at 06:58:59PM -0400, Dimitri Liakhovitski wrote:

> I have a tab-delimited .txt file (size 800MB) with about 3.4 million rows and 41 columns. About 15 columns contain strings. Tried to read it in in R 2.12.2 on a laptop that has Windows XP:
>
> mydata <- read.delim(file="FileName.TXT", sep="\t")
>
> R did not complain (!) and I got: dim(mydata) 1692063 41.

My guess would be that there are (unexpected) quotes and/or double quotes in your file and so R thinks that rather large blocks of your file are actually very long strings. This routinely happens in situations like this:

  ID  x    description
  1   0.4  my first measurement
  2   1.6  Normal 5" object
  3   0.4  Some measurement
  4   0.7  A 4" long sample

R thinks that the description in row 2 ends in row 4 and you lose data. Try

  read.delim(..., quote="")

cu Philipp
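The effect is easy to try out on a few lines of text. The sketch below feeds the small example table to read.delim() through its text argument; the variable names txt and no_quotes are made up for the illustration:

```r
# The example table from above as tab-delimited text.
# The inch marks (") in rows 2 and 4 act as unbalanced quote characters.
txt <- paste(
  "ID\tx\tdescription",
  "1\t0.4\tmy first measurement",
  "2\t1.6\tNormal 5\" object",
  "3\t0.4\tSome measurement",
  "4\t0.7\tA 4\" long sample",
  sep = "\n"
)

# quote = "" switches quoting off, so every line stays a separate record.
no_quotes <- read.delim(text = txt, quote = "")
print(no_quotes)
nrow(no_quotes)
```

On a real file, comparing nrow() of the result with the number of lines in the file (for example via length(readLines(...)) on a machine with enough memory) is a quick check that nothing was swallowed.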
Re: [R] Using graphics straight from R into published articles
On Wed, Mar 30, 2011 at 08:48:55AM +, ONKELINX, Thierry wrote:

> Large snip.
>
> > Absolutely vector - no jpeg, png, ... although it takes
>
> That depends on the kind of graph. I agree that you should try vector at first. But when it generates very large files (e.g. scatterplots with thousands of points) then you better switch to bitmaps like tiff or png. Jpeg can create artefacts, so is not very good for graphics.

True. Sometimes one can get away with switching from a normal scatterplot to hexbin or something like this, but if that is not an option a high resolution tiff or png is the way out. And of course, I agree that jpeg should never be used for graphs.

cu Philipp
Re: [R] Using graphics straight from R into published articles
On Wed, Mar 30, 2011 at 09:56:09AM -0700, blanco wrote:

> Wow - thanks all for your helpful replies. Awesome forum. Am I right to assume that you use the postscript function to create .ps and .pdf files from R?

Almost:

  postscript(..., onefile=FALSE)  # for eps
  pdf()                           # for PDF

And don't forget to close the device with dev.off() after the plot.

cu Philipp
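A complete round trip might look like the sketch below. The file names and figure size are assumptions for illustration, and the Orange data set merely stands in for real data; the extra postscript() settings are the ones ?postscript lists for producing EPS:

```r
# PDF: open the device, plot, close the device.
out_pdf <- file.path(tempdir(), "figure1.pdf")   # hypothetical output file
pdf(file = out_pdf, width = 5, height = 4)       # size in inches
plot(circumference ~ age, data = Orange, main = "Orange tree growth")
dev.off()                                        # the file is only complete after this call

# EPS: same pattern with the postscript device.
out_eps <- file.path(tempdir(), "figure1.eps")   # hypothetical output file
postscript(file = out_eps, onefile = FALSE, horizontal = FALSE,
           paper = "special", width = 5, height = 4)
plot(circumference ~ age, data = Orange, main = "Orange tree growth")
dev.off()
```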
Re: [R] Reversing order of vector
On Tue, Mar 29, 2011 at 12:20:50AM -0700, Vincy Pyne wrote:

> vect1 = as.character(c("ABC", "XYZ", "LMN", "DEF"))

as.character is unnecessary here.

> vect1
> [1] "ABC" "XYZ" "LMN" "DEF"
>
> I want to reverse the order of this vector as vect2 = c("DEF", "LMN", "XYZ", "ABC")

  vect2 <- rev(vect1)

cu Philipp
Re: [R] producing histogram-like plot
On Tue, Mar 29, 2011 at 11:05:08AM +0200, Karin Lagesen wrote:

> Hi! I have a dataset that looks like this:
>
> 0.0 14
> 0.0 3
> 0.9 12
> ...and so on.
>
> I would like to plot this in a histogram-like manner.

One way would be to re-create the original data and then simply use hist:

  dat <- data.frame(x=c(0,0,0.9,0.73,0.78,1,0.3,0.32), freq=c(14,3,12,15,2,15,2,8))
  hist(with(dat, rep(x, times=freq)))

My example did not take special binning wishes into account, but you can easily customize that with the breaks argument to hist.

cu Philipp
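As an illustration of the breaks argument, the sketch below bins the same example data into ten classes of width 0.1. The bin width is an arbitrary choice for the example, not something requested in the thread:

```r
dat <- data.frame(x    = c(0, 0, 0.9, 0.73, 0.78, 1, 0.3, 0.32),
                  freq = c(14, 3, 12, 15, 2, 15, 2, 8))

# Re-create the raw observations: each x repeated freq times.
expanded <- with(dat, rep(x, times = freq))

# Ten bins of width 0.1 on [0, 1]. plot = FALSE only computes the histogram
# object, so the counts can be inspected before drawing anything.
h <- hist(expanded, breaks = seq(0, 1, by = 0.1), plot = FALSE)
print(h$counts)

# Draw the precomputed histogram.
plot(h, main = "Distribution of x", xlab = "x")
```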
Re: [R] Using graphics straight from R into published articles
On Tue, Mar 29, 2011 at 09:31:18AM -0700, blanco wrote:

> I was just wondering if people use graphics from R straight into articles or are they always edited in some way; fonts, headers, axis, color etc? Using photoshop or some other programs? I would like to think it is possible, better and more professional to do it all in R. I tried google and the search option but found nothing on the topic. What are the experiences for all the professionals out there that use R? Are there any articles on this specific subject?

I'm not aware of any articles on the topic but I can share what I do: 95% of the time I tweak various graphics parameters in R and see no necessity for postprocessing in other applications. In about 5% I do some manual editing for a camera-ready figure. These are usually the result of exotic requests from referees. But under no circumstances would I use Photoshop or any other pixel graphics software for this. My R graphics are always created as eps or pdf vector graphics and any editing is done with proper vector graphics software (Illustrator or Inkscape).

I share your feeling that it is better to do as much as possible in R because it means that I won't have to do it again if I need to produce another revision of the figure - all it takes is another run of my script. And I can re-use good solutions in the future. Any manual touch-ups have to be done manually every single time = not my idea of efficiency.

cu Philipp
Re: [R] read.xls - rotate data.frame
On Fri, Mar 25, 2011 at 11:43:31AM +0100, Knut Krueger wrote:

> Hi to all, how could I rotate automatically a data sheet which was imported by read.xls?
>
>      x1   x2   x3   ...  xn
> y1   1    4    7    ...  xn/y1
> y2   2    5    8    ...  xn/y2
> y3   3    6    9    ...  xn/y2
> yn   ...  ...  ...  ...  Xn/Yn
>
> to
>
>      y1   y2   y3   ...  yn
> x1   1    2    3    ...  Yn/x1
> x2   4    5    6    ...  Yn/x2
> x3   7    8    9    ...  Yn/x2
> xn   ...  ...  ...  ...  Yn/xn

If all the columns (x) are of the same type (e.g. all numeric) you can use t(). Example:

  dat <- data.frame(x1=1:10, x2=(1:10)*2, x3=10:1)
  dat2 <- as.data.frame(t(dat))

If the columns are of different types (e.g. some numeric, some factors) I don't think you can do this at all, because columns of a data.frame represent vectors, i.e. all values in a column need to be of the same type.

cu Philipp
Re: [R] read.xls - rotate data.frame
> Unfortunately we have mixed types, e.g. text, dates, times, and numbers

OK - in that case you can't fit the data into a data.frame. Possibly you could get what you need using some kind of list structure, but I think it's better to ask why you need to transpose the data. Maybe someone can suggest an alternative solution that doesn't require the transposition.

cu Philipp
Re: [R] read.xls - rotate data.frame
> we have (imported from excel)
>
> frame <- data.frame(x0=c("y1","y2","y3","y4"),x1=c(1,2,3,4),x2=c(5,6,7,8),x1=c(9,10,11,12))
>
> where y1..yn are the names of the rows. We need frame$x1 ... frame$xn and frame[1,] ... frame[n,], but the first column is not the rownames. If it is possible to rotate the whole dataset we could use frame$y1 ... frame$y2

I am not 100% sure I understood what you intend to do, but I think what you are saying is that you would like to address certain rows by name rather than by index. Is that correct? If so you could solve it like this:

  # assign the desired row names
  rownames(frame) = frame[,1]
  # remove the old name column
  frame <- frame[,2:ncol(frame)]

cu Philipp
Re: [R] read.xls - rotate data.frame
> I am not 100% sure I understood what you intend to do, but I think what you are saying is that you would like to address certain rows by name rather than by index. Is that correct? If so you could solve it like this:
>
>   # assign the desired row names
>   rownames(frame) = frame[,1]
>   # remove the old name column
>   frame <- frame[,2:ncol(frame)]

And adding to my own posting: removing the column can be done more elegantly:

  frame <- frame[,-1]

And I forgot to mention that now you can say things like

  frame['y2',]

The frame$y2 notation still only works for columns, of course. Maybe, if you tell us some more about your actual analysis, more help can be provided.

cu Philipp
Re: [R] Magic Number Error Message
On Fri, Mar 25, 2011 at 06:42:49AM -0700, armstrwa wrote:

> When I attempt to run a script, I keep getting the error message shown
>
> load("H:\\Restoration Center\\Climate Change and Restoration\\MidAtlFloodRisk\\discharge data\\R files\\ALRT.txt")
> Error: bad restore file magic number (file may be corrupted) -- no data loaded
> In addition: Warning message: file 'ALRT.txt' has magic number '# Coh' Use of save versions prior to 2 is deprecated

The load() function reads stored DATA into the workspace. As you say you want to run a SCRIPT, you are probably looking for source().

cu Philipp
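The difference between the two functions in a few lines; the temporary files below are made up for the illustration and have nothing to do with the poster's ALRT.txt:

```r
# A script is plain text containing R code: run it with source().
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "total <- sum(x)"), script)
source(script)
print(total)

# A file written by save() is a binary image of objects: restore it with load().
datafile <- tempfile(fileext = ".RData")
save(x, file = datafile)
rm(x)
load(datafile)   # x is back in the workspace
print(x)
```

Calling load() on the plain-text script would fail with the same kind of "magic number" error as in the question, because the file does not start with the header that save() writes.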
Re: [R] Bar Chart
> How do you do a bar chart of 2 vectors? I have one vector which has 10 numbers, and another which has 10 names. The numbers are the frequency of the corresponding name, but when I do a bar chart it says that there is no height. Thanks.

The first thing we'd need to know is HOW you tried to create the bar chart. R usually offers quite a lot of different ways to tackle a problem, so knowing what exactly you did helps a lot in helping. That said, I'll assume you tried the barplot() command, which would work e.g. like this:

  v1 <- 1:3
  v2 <- c('A', 'B', 'B')
  barplot(v1, names.arg=v2)

If v1 is a named vector things are even easier:

  names(v1) <- v2
  barplot(v1)

As I said, there are a bunch of other ways - e.g. using the lattice function barchart(), which works a bit differently.

cu Philipp
Re: [R] graph lines don;t appear
On Tue, Mar 15, 2011 at 12:01:45PM +0100, Sara Szeremeta wrote:

> Hi, I am trying to plot two simple graphs with a grid in the background. The axes and grid appear in the correct position, but the actual data are not there. Can somebody provide me a hint what is missing? The code is:
>
> pln <- read.table(file="PLN.txt", header=TRUE, dec=",")
> par(mfrow=c(1,2))
> plot(pln[,1], type="l", lwd=2, ylab="EUR/PLN", xlab=NULL, xlim = c(1993, 2011), ylim = c(2, 5), panel.first = grid(nx=NULL, ny=NULL))
> plot(log(pln[,1]), type="l", ylab="EUR/PLN", xlab=NULL, xlim = c(1993, 2011), panel.first = grid(equilogs = FALSE))

pln[,1] is just one column, so you are not plotting the values vs. the year but vs. their index. As xlim is set to the interval 1993-2011 you simply don't see your data...

cu Philipp
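One way to get the data back into view is to hand plot() an explicit x vector. The sketch below uses simulated numbers as a stand-in for PLN.txt and assumes monthly observations from January 1993 to December 2010; neither the values nor the frequency are known from the thread:

```r
# Hypothetical stand-in for the EUR/PLN column: 216 monthly values.
set.seed(1)
rate <- 3.5 + cumsum(rnorm(216, sd = 0.03))

# Decimal year for each observation, so x matches the xlim interval.
year <- 1993 + (seq_along(rate) - 1) / 12

plot(year, rate, type = "l", lwd = 2, xlab = "", ylab = "EUR/PLN",
     xlim = c(1993, 2011), panel.first = grid())

# Alternatively, a ts object carries its own time axis.
rate_ts <- ts(rate, start = c(1993, 1), frequency = 12)
plot(rate_ts, ylab = "EUR/PLN")
```

If the file already contains a date or year column, that column can be used directly as the first argument to plot() instead of constructing one.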
Re: [R] beamer overlays with Sweave?
> This may be asking too much, but I'm wondering if anyone has a solution (even a hack) for creating multiple (overlay) plots in an Sweave file and post-processing the overlays in beamer appropriately.

Although I have not done this with beamer and overlays before, I once had to resort to generating the includegraphics commands from within a loop in order to save substantial amounts of typing. You could do something similar along these lines (untested):

  <<echo=F>>=
  slidenum <- 1
  plotbasename <- "something"
  plotfilename <- paste(plotbasename, slidenum, ".pdf", sep="")
  pdf(file=plotfilename)
  plot(stuff)
  dev.off()
  cat("\\only<", slidenum, ">{\\includegraphics{", plotfilename, "}}\n", sep="")
  @

  <<echo=F>>=
  slidenum <- slidenum + 1
  plotfilename <- paste(plotbasename, slidenum, ".pdf", sep="")
  pdf(file=plotfilename)
  plot(stuff)
  dev.off()
  cat("\\only<", slidenum, ">{\\includegraphics{", plotfilename, "}}\n", sep="")
  @

If you want to get really fancy, you could wrap most of this in a convenience function...

cu Philipp
Re: [R] beamer overlays with Sweave?
Oops - have to comment my own answer:

>   <<echo=F>>=

For this to work it needs to be

  <<echo=F, results=tex>>=

cu Philipp
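The convenience function mentioned in this thread could look roughly like the sketch below. It is just as untested in a real beamer document as the original suggestion; the function name overlay_plot and its arguments are made up for the illustration:

```r
# Hypothetical helper: draw one overlay into its own PDF file and return
# the beamer line that shows this file on overlay number `slidenum`.
overlay_plot <- function(plotfun, slidenum, basename = "overlay", dir = ".") {
  plotfilename <- file.path(dir, paste(basename, slidenum, ".pdf", sep = ""))
  pdf(file = plotfilename)
  on.exit(dev.off())   # close the device even if plotfun() fails
  plotfun()
  paste("\\only<", slidenum, ">{\\includegraphics{", plotfilename, "}}\n", sep = "")
}

# Inside a chunk with results=tex one would cat() the returned lines:
for (i in 1:3) {
  cat(overlay_plot(function() plot(1:10, (1:10)^i, type = "b"), i, dir = tempdir()))
}
```

Passing the plotting code as a function keeps the helper independent of what is being drawn; tempdir() is used here only so that the example does not write into the working directory.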
Re: [R] creating additional column
Hi!

max.col does what you want. Example:

  dat <- data.frame(a=rnorm(20),b=rnorm(20),c=rnorm(20))
  dat
              a           b          c
  1  1.17910304 -0.56951219 -0.2243664
  2 -1.43840866 -0.99013855 -0.1613536
  3  1.08515152 -0.77975274  0.3734530
  4 -0.92154605 -0.20318367  0.1384842
  [...]

  dat$maxcol <- colnames(dat)[max.col(dat)]
  dat
              a           b          c maxcol
  1  1.17910304 -0.56951219 -0.2243664      a
  2 -1.43840866 -0.99013855 -0.1613536      c
  3  1.08515152 -0.77975274  0.3734530      a
  4 -0.92154605 -0.20318367  0.1384842      c
  [...]

cu Philipp

On Tue, Mar 08, 2011 at 01:25:10PM +0100, Bodnar Laszlo EB_HU wrote:

> Hello everybody, I have a little problem in good old R. It is basically the following. I have this small database with 3 rows and the following columns: d1, d2, d3 and Highest d value - which selects the highest value from d1, d2, d3 in each row.
>
>   d1        d2        d3         Highest d value
> 1 51.398426 39.111721 11.6086220 51.398426
> 2  4.057801  7.728407  0.1234711  7.728407
> 3  7.279341  7.360509 18.2964676 18.296468
>
> I'd like to make an additional column which shows the label of the relevant column where we've found the maximum d value. Something like this:
>
>   d1        d2        d3         Highest d value  Where is the maximum?
> 1 51.398426 39.111721 11.6086220 51.398426        d1
> 2  4.057801  7.728407  0.1234711  7.728407        d2
> 3  7.279341  7.360509 18.2964676 18.296468        d3
>
> Is there an easy way to do this? Thank you very much and have a pleasant day! Laszlo
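Applied to the numbers from the question, the approach could be sketched as follows. One detail worth knowing: max.col() breaks ties at random by default, so the sketch sets ties.method = "first" to make the result reproducible. The column names highest and where are made up for the illustration:

```r
# The three rows from the question.
d <- data.frame(d1 = c(51.398426, 4.057801, 7.279341),
                d2 = c(39.111721, 7.728407, 7.360509),
                d3 = c(11.6086220, 0.1234711, 18.2964676))

# Work on a matrix of the numeric columns only, so that columns added
# afterwards do not take part in the search for the maximum.
m <- as.matrix(d)

d$highest <- apply(m, 1, max)
d$where   <- colnames(m)[max.col(m, ties.method = "first")]
print(d)
```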
Re: [R] How two compare two matrixes
> Dear all, I have two 10*10 matrixes and I would like to compare their contents. By the word content I mean to check visually (not with any mathematical formulation) how similar the contents are.

If they are really only 10x10 you can simply print them both to the screen and look at them. I'm not sure what else you could do if you are not interested in a specific distance measure etc.

cu Philipp
Re: [R] How two compare two matrixes
On Fri, Mar 04, 2011 at 01:49:29AM -0800, Alaios wrote:
> That's the problem. Even a 10*10 matrix does not fit on the screen (10 columns do not fit in one screen row) and thus I do not get a well aligned matrix printed. This is what makes comparisons not that easy on the eye. On the other hand, with edit(mymatrix) I get scroll bars, so I can scroll to one row and see only the area I want to focus on. The problem with edit is that it blocks the command line, and thus I cannot have two edits running at the same time.

Hm - it does fit on my screen but if you're on a laptop... Maybe you could write both matrices to files and compare them in an external viewer (Excel, less, ...). If I remember correctly, the object browser/data viewer of JGR allows editing several objects at once.

cu
Philipp
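The suggestion of writing both matrices to files could look roughly like the sketch below. The matrices `m1` and `m2`, the file names and the console width are made up for illustration; the `which()` call at the end goes slightly beyond a purely visual check and is only there as a convenience.

```r
# Two hypothetical 10x10 matrices standing in for the poster's data
set.seed(42)
m1 <- matrix(round(rnorm(100), 2), nrow = 10)
m2 <- m1
m2[3, 4] <- 99  # introduce one difference so there is something to find

# Write each matrix to a tab-separated text file (temporary files here);
# col.names = NA adds an empty header cell above the row names
f1 <- file.path(tempdir(), "m1.txt")
f2 <- file.path(tempdir(), "m2.txt")
write.table(m1, file = f1, sep = "\t", quote = FALSE, col.names = NA)
write.table(m2, file = f2, sep = "\t", quote = FALSE, col.names = NA)

# A wider console line often lets all 10 columns print in one block
options(width = 200)
print(m1)

# Row/column positions where the two matrices differ
diff_pos <- which(m1 != m2, arr.ind = TRUE)
diff_pos
```

The two files can then be opened side by side in a spreadsheet or any diff viewer.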
Re: [R] overleap an iteration within a for-loop when error message produced
> Assume that the 5th iteration (subject=5) leads to the error message. How can I tell R to continue with the 6th iteration?

try or tryCatch are probably what you want.

cu
Philipp
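As a minimal sketch of the tryCatch approach: the function `analyse_subject` below is hypothetical and simply fails for subject 5, standing in for whatever computation raises the error in the original loop.

```r
# Hypothetical per-subject computation that fails for subject 5
analyse_subject <- function(subject) {
  if (subject == 5) stop("model did not converge for subject ", subject)
  subject^2
}

results <- rep(NA_real_, 8)
for (subject in 1:8) {
  results[subject] <- tryCatch(
    analyse_subject(subject),
    error = function(e) {
      # Report the problem and carry on with the next iteration
      message("Skipping subject ", subject, ": ", conditionMessage(e))
      NA_real_  # placeholder stored for the failed iteration
    }
  )
}
results
```

The loop runs through all eight subjects; the failed one is left as NA so it can be identified afterwards.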
Re: [R] read.table
> I am using read.table to read a plain 3 column CSV file. The file is getting read, but the first column has datetime in the CSV file in the following format: 20110221.114041. This is being read as 20110221 only; the time portion (the decimal part) is missing in the data frame.

My guess is that it does get read correctly but you only see part of the actual number because R will usually not print all available digits. Example:

    > a <- 20110221.114041
    > a
    [1] 20110221
    > options("digits")
    $digits
    [1] 7
    > format(a, digits=7)
    [1] "20110221"
    > format(a, digits=20)
    [1] "20110221.114041"

cu
Philipp
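If the value is really a timestamp rather than a number, one option is to read the column as text and parse it. The sketch below assumes that 20110221.114041 means 2011-02-21 11:40:41 (year, month, day, then hours, minutes, seconds), which the original post does not state explicitly; the column names and the inline CSV text are invented.

```r
# Inline stand-in for the poster's CSV file (hypothetical column names)
csv_text <- "stamp,a,b\n20110221.114041,1.5,2.5\n20110222.093000,3.5,4.5"

# Read the first column as character so that no digits are lost or rounded
dat <- read.table(text = csv_text, header = TRUE, sep = ",",
                  colClasses = c("character", "numeric", "numeric"))

# Parse the assumed YYYYMMDD.HHMMSS layout into a date-time column
dat$time <- as.POSIXct(strptime(dat$stamp, format = "%Y%m%d.%H%M%S", tz = "UTC"))
dat
```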
Re: [R] qbeta
On Tue, Feb 22, 2011 at 10:09:51AM +, Dr. Alireza Zolfaghari wrote:
> Hi List,
> Does anybody know how I can see the code behind the qbeta function?

As the code seems to be internal, you'll need to download the R source code and find it in there. In my copy of R it is here:

    R-2.11.1/src/nmath/qbeta.c

cu
Philipp
Re: [R] fitting logit to data
On Mon, Feb 21, 2011 at 12:13:09PM +0100, Sylvia Tippmann wrote:
> Hello,
>
> I'd like to fit a logit function to my data. The data is distributed like a logit (like in this plot on wikipedia http://en.wikipedia.org/wiki/File:Logit.png) but the values on the x-axis are not between 0 and 1. I don't think using a glm is the solution because I simply want to infer the parameters of the logit function (offset, compression, slope...), so I can apply it to all my values on x and get my value y.

Two ideas:

1) scale your data so it does fit in [0,1] before fitting a glm
2) Use nls() to fit whatever function you find suitable

cu
Philipp
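A rough sketch of the nls() idea follows. It assumes the x-values lie strictly between two known bounds `lo` and `hi` (invented here), and that the curve has the form off + slope * log((x - lo) / (hi - x)); the simulated data and all parameter names are hypothetical, not taken from the original question.

```r
set.seed(1)

# Assumed known bounds of the x-axis (hypothetical values)
lo <- 10
hi <- 50

# Simulated data following a shifted and scaled logit curve plus noise
x <- seq(12, 48, length.out = 40)
y <- 2 + 1.5 * log((x - lo) / (hi - x)) + rnorm(40, sd = 0.1)

# Fit offset and slope with nls(); lo and hi are taken from the workspace
fit <- nls(y ~ off + slope * log((x - lo) / (hi - x)),
           start = list(off = 0, slope = 1))
est <- coef(fit)
est

# Apply the fitted curve to new x-values inside (lo, hi)
predict(fit, newdata = data.frame(x = c(15, 30, 45)))
```

If the bounds are not known, they could be added to `start` as further parameters, although the fit then becomes harder and good starting values matter more.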
Re: [R] Matrix in R
On Fri, Feb 18, 2011 at 06:32:01AM -0800, danielepippo wrote:
> but if in my function pp_ris2[i,j]=myfunction} must be the indexes 0-0,0-1,0-2,0-3, ?

You'll have to take care of that yourself with a bit of index arithmetic. It's the same thing you encounter in C if you are modelling something that would like to be indexed starting with 1 - just the other way round.

cu
Philipp
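The index arithmetic amounts to adding 1 whenever the matrix is accessed. In the sketch below `myfunction` and the accessor `get0` are placeholders invented for illustration; only the name `pp_ris2` comes from the question.

```r
# Hypothetical function of two zero-based indices
myfunction <- function(i, j) 10 * i + j

n <- 4
pp_ris2 <- matrix(NA_real_, nrow = n, ncol = n)

# Loop over the zero-based indices 0..3 and shift by one when storing
for (i in 0:(n - 1)) {
  for (j in 0:(n - 1)) {
    pp_ris2[i + 1, j + 1] <- myfunction(i, j)
  }
}

# Small accessor so the rest of the code can keep using zero-based indices
get0 <- function(m, i, j) m[i + 1, j + 1]
get0(pp_ris2, 0, 3)
```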
Re: [R] saving plots
On Mon, Feb 14, 2011 at 12:59:13PM +0530, km wrote:
> Hi all,
>
> Is there a way to save the currently displayed plot to an image format just after we view it? I think this would be more intuitive as a user if I wish to save it just after I visualize the plot. I am aware that we need to do something like this:
>
>     jpeg('somefilename.jpg')
>     ... plot... commands...
>     dev.off()

In addition to savePlot, which has already been recommended, you may also want to look at dev.copy2eps and dev.copy2pdf.

cu
Philipp
Re: [R] as.Date
On Wed, Feb 09, 2011 at 08:44:30AM +0100, Valeri Fabio wrote:
> Hello,
>
> I found out which package disturbs as.Date(). It is the package Epi:
>
>     > as.Date(36525, origin="1900-01-01")
>     [1] "2000-01-02"
>     > library(Epi)
>     > as.Date(36525, origin="1900-01-01")
>     [1] "2070-01-01"
>     > detach(package:Epi)
>     > as.Date(36525, origin="1900-01-01")
>     [1] "2000-01-02"

OK - that makes sense. Epi has its own as.Date.numeric function and upon loading the package you get a warning:

    > library(Epi)

    Attaching package: 'Epi'

    The following object(s) are masked from 'package:base':

        as.Date.numeric, merge.data.frame

A quick look at the manual page confirms that Epi's version does not have an origin option.

cu
Philipp
Re: [R] as.Date
> I have a strange behavior of the as.Date() function. For example:
>
>     as.Date(36525, origin="1900-01-01'")
>
> I would expect to get 2000-01-01. But R gives me

That's almost exactly what I get with R 2.11.1, Linux (minus the one-day difference which is probably correct, too lazy to count leap years...):

    > as.Date(36525, origin="1900-01-01'")
    [1] "2000-01-02"

At first I thought the excess single quote might be causing your problem, but it doesn't for me. Maybe you need to upgrade R? Possibly it's an already fixed issue?

cu
Philipp
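The one-day difference can be checked by letting R count the days. The snippet below is only a verification aid; it uses nothing beyond base R.

```r
# Number of days between the origin and the expected date
days <- as.numeric(as.Date("2000-01-01") - as.Date("1900-01-01"))
days

# Leap years in 1900..1999 (1900 itself is not a leap year)
yrs <- 1900:1999
leap_years <- sum((yrs %% 4 == 0 & yrs %% 100 != 0) | yrs %% 400 == 0)
leap_years

# 100 * 365 + 24 = 36524, so day number 36525 is one day past 2000-01-01
result <- as.Date(36525, origin = "1900-01-01")
result
```

So 2000-01-02 is the correct answer for 36525, and 36524 would give 2000-01-01.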
Re: [R] Average of several line plots
On Thu, Feb 03, 2011 at 01:36:57AM -0800, mattnixon wrote:
> The data doesn't represent functions. Basically the X values represent the distance across a sample and the Y values are a measure of the colour intensity at that point across the sample (i.e. a line plot across the sample). Each data set represents a measurement across a different section of the sample. All data sets show alternating 'light' and 'dark' sections, though the sample isn't perfect so the widths of each section do not entirely match up from one data set to another.
>
> The problem comes from the fact that some data sets contain as many as 400 measurements across the sample whereas others contain as few as 150 measurements. This means that measurements do not necessarily occur at the same value of X on different data sets. Therefore I think I need some way to average the lines ('of best fit') that each data set creates on the graph, rather than averaging the data points themselves, as I can't see how I can take averages/weighted averages of the data points when they occur at different values of X (and at different intervals) across the sample.
>
> Is my description any better this time?

I am not 100% sure, but if I understand your problem correctly, loess() may be applicable.

cu
Philipp
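One way to read the loess() suggestion is: smooth each data set separately, evaluate every smooth on a common grid of x-values, and average there. The sketch below does that on simulated scans; the data, the span of 0.3 and the grid size are arbitrary choices for illustration, and it does nothing about sections whose widths differ between scans.

```r
set.seed(1)

# Two hypothetical scans of the same sample with different numbers of points
scan1 <- data.frame(x = sort(runif(400, 0, 10)))
scan1$y <- sin(scan1$x) + rnorm(400, sd = 0.1)
scan2 <- data.frame(x = sort(runif(150, 0, 10)))
scan2$y <- sin(scan2$x) + rnorm(150, sd = 0.1)
scans <- list(scan1, scan2)

# Common grid restricted to the x-range covered by every scan,
# because loess does not extrapolate by default
lo <- max(sapply(scans, function(d) min(d$x)))
hi <- min(sapply(scans, function(d) max(d$x)))
grid <- seq(lo, hi, length.out = 200)

# Smooth each scan and evaluate the smooth on the common grid
smoothed <- sapply(scans, function(d) {
  fit <- loess(y ~ x, data = d, span = 0.3)
  predict(fit, newdata = data.frame(x = grid))
})

# Average of the smoothed curves, one value per grid point
avg <- rowMeans(smoothed)
plot(grid, avg, type = "l", xlab = "distance", ylab = "mean smoothed intensity")
```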
Re: [R] CSV value not being read as it appears
On Fri, Jan 14, 2011 at 07:58:07PM +1000, bgr...@dyson.brisnet.org.au wrote:
> Thanks for your e-mail. The data was a report derived from a statewide database, saved in EXCEL format, so the usual issue of the vagaries of human data entry variation wasn't the issue as the data was an automated report, which is run every three months.

If this problem occurs with computer generated data, it may also be worthwhile to talk to whoever is in charge of that reporting system and hope to get the bug fixed.

And just to add one of my favorite initial checks: I always double check if the number of levels of each factor in my data.frame seems to make sense.

cu
Philipp
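The level check mentioned above takes only a couple of lines. The data frame below is invented, with a stray lower-case "f" playing the role of the unexpected value.

```r
# Hypothetical data with a stray spelling variant in one column
dat <- data.frame(
  sex    = c("M", "F", "F", "M", "f"),
  region = c("North", "South", "North", "South", "North"),
  stringsAsFactors = TRUE
)

# Number of levels per factor column: 3 for 'sex' is one more than expected
is_fac <- sapply(dat, is.factor)
level_counts <- sapply(dat[is_fac], nlevels)
level_counts

# Listing the levels shows the offending value directly
lapply(dat[is_fac], levels)
```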
Re: [R] Panel title: mfrow() or ?
>     par(mfrow=c(3,2))
>
> The 6 graphs are coming out quite all right, but now I would like to put a title on top of the page - i.e. something that is common for all 6 graphs - how can I do that?

    title(main="My title", outer=TRUE)

cu
Philipp
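One detail worth adding: with the default outer margin of zero lines there is no room for an outer title, so it usually helps to reserve some space with oma first. A small self-contained sketch with made-up data (the margin of 3 lines is an arbitrary choice):

```r
# Reserve 3 lines of outer margin at the top; keep the old settings in 'op'
op <- par(mfrow = c(3, 2), oma = c(0, 0, 3, 0))

for (i in 1:6) {
  plot(rnorm(20), main = paste("Panel", i))
}

# Common title drawn in the outer margin above all six panels
title(main = "My title", outer = TRUE)

# Restore the previous graphics settings
par(op)
```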
Re: [R] 300 dpi and eps:
> Can someone recommend some paper that makes clear the relation and distinctions between vector and raster graphics, but especially with some practical examples in regard to what is the relation between page (height and width) and dpi.

I'm not aware of a paper, but it's really not rocket science, as long as you stay away from color calibration at which point it IS rocket science ;-)

You should not be concerned at all with the relation between page dimensions and dpi - that's the publisher's business. All you need to ensure is that the figures you provide are of high quality and in an accepted format.

> In A. photoshop for example I can define for a graph width in inches, height in inches and resolution in pixels/inch color model CMYK and 8 bit. How one works in R?

Don't. Just stick with a vector format. All journals I have ever dealt with accepted either EPS or PDF as a vector format.

> Or one saves the graph from postscript function as eps or tiff and you tell to the editor of the journal do whatever you want because I am done; I provided you already a vector graph that has infinite pixels?:-)

Exactly that, except TIFF is not a vector format and you should not use it with R.

Some rules of thumb:

Use a pixel format if and only if

1) your image is a picture from a digital camera, scanner, microscope, screenshot or similar. I.e. the original graphic wants to be a pixel graphic by nature.
2) you are forced to by higher powers. In this case stick with a vector format until your figure is 100% ready and only then convert to a high-resolution TIFF/PNG/whatever.

Use a vector format in all other cases - especially if we are talking about things you create in the computer yourself: R graphs, flow-charts, technical illustrations, ...

If you want to add annotation to an image (i.e. a pixel graphic) never use Photoshop or similar software - instead import the pixel graphic into a vector graphics program (e.g. Illustrator/Corel Draw/...) and add your arrows, text etc. While this will not magically give the image a better or even infinite resolution, it will make sure the rest has it.

If you have images (scans, photos, ...), avoid lossy compression formats (e.g. jpg), use TIFF (or maybe PNG) instead. Lossy compression will

a) ruin edges - e.g. lines of a graph (that's why screenshots in jpg always look crappy)
b) degrade in quality with every decompress - edit - compress cycle

The only occasion I ever convert my vector graphics to a pixel format is when a colleague who has to use PowerPoint needs it for a presentation. As PowerPoint does not support any vector formats except that flaky WMF/EMF format there is no other choice. So I convert to e.g. a 600dpi PNG. (Has this changed in recent versions of PowerPoint?) But mind you: I don't do that in R, so I always have a vector format master figure.

cu
Philipp
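For reference, writing a figure straight to a vector format from R might look like the sketch below. The file names, the 5 x 4 inch size and the plotted data are arbitrary; the postscript() arguments shown are the usual combination for producing an EPS file.

```r
# Small helper so both devices draw exactly the same (made-up) figure
draw_figure <- function() {
  plot(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "x squared")
}

# PDF: size is given in inches, no resolution needed for a vector format
pdf_file <- file.path(tempdir(), "figure.pdf")
pdf(pdf_file, width = 5, height = 4)
draw_figure()
dev.off()

# EPS: a single-page, non-rotated PostScript file with its own page size
eps_file <- file.path(tempdir(), "figure.eps")
postscript(eps_file, width = 5, height = 4,
           horizontal = FALSE, onefile = FALSE, paper = "special")
draw_figure()
dev.off()
```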
Re: [R] How to save play back an entire R session?
> Saving the session history is indeed easy (savehistory). The problem is the playback. I didn't find a reliable method.

Well, you could simply source() the .Rhistory file (or the file you saved under some other name). But as you already pointed out - it's better to go with scripts to begin with.

cu
Philipp
Re: [R] 300 dpi and eps:
> Everything works fine to place them in a pdf file, or eps file, but when it comes to have a high quality of 300 dpi these graphs are not good. For example I open the eps file with Adobe Illustrator (AI) and it shows that it is a 72dpi graph.

This is simply not true: it's an eps and thus of essentially infinite resolution for all practical purposes. So your problem is not with the R-generated eps but somewhere downstream from that. Any postprocessing, conversion or editing?

cu
Philipp
Re: [R] evaluating NAs in a dataframe
> Hi! How can one evaluate NAs in a numeric dataframe column? For example, I have a dataframe (demo) with a column of numbers and several NAs. If I write demo.df = 10, numerals will return TRUE or FALSE, but if the value is NA, NA is returned. But if I write demo.df == NA, it returns as NA

Sounds like you are looking for is.na:

    > is.na(c(1,NA,3))
    [1] FALSE  TRUE FALSE

> As an example, I want to assign rows to classes based on values in demo$Area. Some of the values in demo$Area are NA
>
>     for (i in 1:nrow(demo)) {
>       if (demo$Area[i] > 0 & demo$Area[i] < 10) {Class[i] <- "S01"}    ## 1-10 cm2
>       if (demo$Area[i] >= 10 & demo$Area[i] < 25) {Class[i] <- "S02"}  ## 10-25cm2
>       [...]
>       if (demo$Area[i] >= 3200) {Class[i] <- "S10"}                    ## 3200 cm2
>     }
>
> What happens is that I get the message
>
>     Error in if (demo$Area[i] > 0 & demo$Area[i] < 10) { :
>       missing value where TRUE/FALSE needed

First of all, you don't need a loop here. Example:

    # make up some data
    foo <- data.frame(a=sample(1:20, 20, replace=TRUE))
    # assign to classes
    foo$class <- cut(foo$a, breaks=c(-1, 7, 13, 20), labels=c('small', 'medium', 'large'))

This also works in the presence of NAs - but of course the class will be NA in those cases which, at least in my opinion, is the correct value.

cu
Philipp
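Applied to an Area column with missing values, the cut() approach might look as follows. Only the first two class boundaries (10 and 25) come from the question; the remaining classes are elided there, so this sketch collapses everything from 25 upwards into a single made-up class "S03".

```r
# Hypothetical areas in cm2, including a missing value
demo <- data.frame(Area = c(5, 12, NA, 4000))

# is.na() flags the missing entries explicitly
is.na(demo$Area)

# right = FALSE makes the intervals [0,10), [10,25), [25,Inf),
# so a value of exactly 10 falls into the second class
demo$Class <- cut(demo$Area,
                  breaks = c(0, 10, 25, Inf),
                  labels = c("S01", "S02", "S03"),
                  right = FALSE)
demo
```

The row with the missing area simply gets NA as its class and no error is raised.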
Re: [R] How to save a data set as .txt on fly?
On Thu, Nov 25, 2010 at 09:23:17PM -0800, Stephen Liu wrote:
> Hi David,
>
>> But you didn't try:
>>
>>     DNase    # which was after all the name of the object you saved.
>
> Sorry I don't follow.

He is telling you that it is not surprising that 'aaa' does not exist, if the object you saved was called DNase.

> I can't do it with following steps:
>
>     > DNase
>     > save(DNase, file="C:/Users/satimis/Documents/dnase.txt")
>     > load(file="C:/Users/satimis/Documents/dnase.txt")
>     > dnase
>     Error: object 'dnase' not found
>     > dnase.txt
>     Error: object 'dnase.txt' not found

Again - you need to use the name of the object which happens to be 'DNase' - not 'dnase', 'dnase.txt' or 'aaa'.

> I'm curious to know why the .txt file created in this way can't be read with Notepad and WordPad?

It can be read with them - only it does not look the way you expected. If you want to export data for use in other software, functions like write.table may be of interest to you. Load and save are meant for use in R, only.

cu
Philipp
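The difference between the two routes can be seen in a short session. The sketch uses the built-in DNase data set and temporary files; the file names are arbitrary and deliberately unrelated to the object name.

```r
data(DNase)  # built-in data set, loaded into the workspace

# save()/load(): R's own format; the object keeps its name, whatever the file is called
rda_file <- file.path(tempdir(), "anything.RData")
save(DNase, file = rda_file)
rm(DNase)
loaded <- load(rda_file)  # returns the names of the restored objects
loaded

# write.table(): plain text that Notepad, Excel etc. can display sensibly
txt_file <- file.path(tempdir(), "dnase.txt")
write.table(DNase, file = txt_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Reading the text file back requires choosing an object name yourself
back <- read.table(txt_file, header = TRUE, sep = "\t")
head(back)
```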
Re: [R] How to save a data set as .txt on fly?
>> He is telling you that it is not surprising that 'aaa' does not exist, if the object you saved was called DNase
>
> No such file. Just rechecked it with Win7 search command. (Win7 has been rebooted)

You are confusing two things here:

a) Files living in the filesystem
b) Objects in your R workspace

When you saved your dataset you had to assign a filename, obviously. But this has nothing to do with the name of the object(s) contained in these files. So no matter what your file is called, this has no effect on the names of the objects you get upon loading the file. I recommend reading ?save, ?load and ?save.image.

>> Again - you need to use the name of the object which happens to be 'DNase' - not 'dnase', 'dnase.txt' or 'aaa'
>
> Thanks. I had this idea at the beginning. A further thought changed my mind. On the R console DNase displays the content of the data set. If I save the file under the same name, it may confuse me on running DNase whether the output is the content of the data set OR from the file created.

R does not care about the file unless you load it and you can pick any filename you like without affecting the name of the object(s). Once loaded, there is no magical link between the two.

Of course, when you load objects from a file this will overwrite any objects of the same names (object names, not file names!) that happen to live in your workspace before the load command.

cu
Philipp
Re: [R] shifting down ylab in a plot
> I am trying to shift down the ylab of my plot but can't find how to do it. I tried to tune mar, but that only gives the labels more room to be displayed; it does not move the ylab as I would like. Is there a way with par to shift down my ylab?

There may be a simpler/more elegant way to do it, but this does what you asked for:

    # plot data omitting the ylab
    plot(1:10, 1:10, ylab='')
    # add the ylab myself using flushleft (adj=0.0)
    mtext('foo', side=2, line=3, adj=0.0)

cu
Philipp
Re: [R] Change column of numbers in data frame to days
> I have a vector of numbers ranging from 20 to 500. The numbers represent days since a starting point. The list is not consecutive, some numbers skipped and some numbers duplicated. I know day 1 was a Monday. I want to use this vector in a lm but I need to factor by day. I'm wondering how to assign Monday to 22,29,36,..., Tuesday to 23,30,37,... etc...

Here is one way to do it:

    # make some sample data
    foo <- c(22,29,23,37)
    # convert to factor of weekdays; (foo - 1) %% 7 + 1 maps day 1 to 1 (Mon)
    # and days 7, 14, ... to 7 (Sun), whereas a plain foo %% 7 gives 0 for
    # Sundays, which is not among the levels 1:7 and would turn into NA
    foo <- factor((foo - 1) %% 7 + 1, levels=1:7, labels=c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'))
    foo

cu
Philipp
Re: [R] Help with R
On Fri, Oct 15, 2010 at 09:57:21AM +0200, Muteba Mwamba, John wrote:
> FATAL ERROR: unable to restore saved data in .RDATA

Without more information it's hard to know what exactly went wrong. Anyway, the message most likely means that the .RData file got corrupted. Deleting it should solve the problem. Note that this means that you will get an empty workspace and have to recreate whatever data was in it before.

> I decided to uninstall the copy (a R2.11.0) and installed a new version (2.11.1) but I'm still receiving the same message. When I click OK the program closes.

Re-installation of R will most likely not fix this (unless a change in the format of the .RData files had occurred - but to my knowledge no such thing has happened recently).

cu
Philipp
Re: [R] Scripting help
On Wed, Sep 15, 2010 at 12:22:15PM -0400, Ayyappa Chaturvedula wrote:
> Dear all,
> I am new to R and this group. I have good experience in S scripts. I need some orientation on data imports, general plotting functions. Can you please direct me?

Welcome to R. Coming from an S background you should have no problems adjusting quickly.

For data import have a look at the R Data Import/Export manual:

http://cran.r-project.org/doc/manuals/R-data.html

Plotting is not too different from S-Plus. As far as my knowledge goes there are 3 different plotting frameworks in R:

1) Basic plot functions like plot or hist, some of which are covered in the Introduction to R.
2) lattice (the R equivalent of trellis graphics), covered in many manual pages, many tutorials and talks that google will quickly find and, last but not least, the book "Lattice: Multivariate Data Visualization with R" by Deepayan Sarkar, who implemented lattice.
3) ggplot2. See http://had.co.nz/ggplot2/ for documentation and consider the book "ggplot2: Elegant Graphics for Data Analysis" by Hadley Wickham (the author of ggplot2).

I use all three frameworks on a regular basis - choosing the respective functions depending on the complexity of the task.

cu
Philipp
[R] lattice: layout and number of pages
Dear expeRts,

?xyplot says:

    In general, giving a high value of ‘layout[3]’ is not wasteful because blank pages are never created.

But the following example does generate blank pages - well, except for the ylab:

    require(lattice)
    data(barley)
    stripplot(yield~year|site, barley, layout=c(2,1,5))

Did I misinterpret the sentence from the help page or is this a bug? Yes - I know that this works fine:

    stripplot(yield~year|site, barley, layout=c(2,1))

Just curious...

cu
Philipp
Re: [R] How to Adaptively Set Up the Coordinate Range of Multiple Graphs in One Figure
On Tue, Aug 31, 2010 at 03:26:12AM -0700, Wonsang You wrote:
> In the above codes, I had to arbitrarily set up the coordinate range of the figure in advance before calculating the values y (see xlim and ylim). As a result, the figure did not contain all data since most of the data were outside the predefined range. I am wondering about how to control xlim and ylim adaptively to the real range of data, in order to include all data in the figure.

You do that by not specifying xlim and ylim - in that case R will calculate them based on your data. Maybe I did not understand what exactly you want to get but if you explicitly set the limits that's what R is going to use.

cu
Philipp
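When several curves go into one figure, the automatic limits only reflect the data of the first plot() call, so one option is to compute the limits from all series beforehand with range(). The original code is not shown in the thread, so the series below are invented purely for illustration.

```r
# Three hypothetical series sharing the same x-values
x <- seq(0, 10, length.out = 50)
ys <- list(sin(x), 3 * cos(x), 0.5 * x)

# Limits derived from the data instead of being fixed in advance
ylim <- range(unlist(ys))
xlim <- range(x)

# First series sets up the axes, the others are added with lines()
plot(x, ys[[1]], type = "l", xlim = xlim, ylim = ylim, ylab = "y")
for (i in 2:length(ys)) {
  lines(x, ys[[i]], col = i)
}
```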
Re: [R] predict.loess and NA/NaN values
> What you can do is patch the code to add the NAs back after the Prediction step (which many predict() methods do).

Thanks Andy for your hints and especially for digging into the problem like this!

I have, in the meantime, written a simple wrapper around predict.loess that fills in the NAs where I would like to have them.

cu
Philipp
Re: [R] predict.loess and NA/NaN values
On Mon, Aug 30, 2010 at 01:50:03PM +0100, Prof Brian Ripley wrote:

> The underlying problem is your expectations. R (unlike S) was set up many years ago to use na.omit as the default, and when fitting both lm() and loess() silently omit cases with missing values. So why should prediction from 'newdata' be different unless documented to be so (which it is nowadays for predict.lm, even though you are adding to the evidence that was a mistake)?

Thanks for your insights into the underlying philosophy. I agree that na.omit is a sensible default for model fitting. But I am not so sure that quietly omitting unpredictable values is such a good idea - especially if predict methods for different types of model implement inconsistent approaches. I see no disadvantage in returning NA where no prediction/computation is possible -- the value is 'Not Available', after all. (And the length of the result vector would match nrow(newdata), which would be handy for most practical purposes.)

> loess() is somewhat different from lm() in that it does not in general allow extrapolation, and the prediction for Inf and NaN is simply undefined.

Of course this is correct, but I still think that predict.loess not only acts in a way that will most likely be surprising to most users but is also inconsistent with itself (Inf vs. NA/NaN). If extrapolation is the problem, Inf should not yield anything, but it does (and the same applies to values outside of the original x-range):

    x <- rnorm(15)
    y <- rnorm(15)
    model.loess <- loess(y~x)
    predict(model.loess, data.frame(x=c(0.5, Inf)))
    # [1] -0.02508801 NA
    predict(model.loess, data.frame(x=min(x)-10))
    # [1] NA

Actually, while tracking down my problem I did consider that extrapolation could be the problem and, according to the last example in ?loess, tried to set control = loess.control(surface = "direct"). To my surprise, now even Inf fails - although I am much happier with getting an error message than with silent omission.

Anyway, writing a little wrapper that puts NAs back into the results is not a big deal, and in that respect my problem is solved.

> Nevertheless, take a look at the version in R-devel (pre-2.12.0) which give you more options.

Thanks for that information - I will definitely have a look at that.

cu
Philipp
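The wrapper itself is not shown in the thread. A minimal sketch of what such a wrapper could look like, assuming a model with a single predictor column; the function name predict_loess_na and its argument xvar are made up for illustration and are not the code from the original post:

```r
# Hypothetical helper (not the wrapper from the original post):
# predict only for finite predictor values and fill in NA elsewhere.
predict_loess_na <- function(model, newdata, xvar = "x") {
  ok <- is.finite(newdata[[xvar]])      # FALSE for NA, NaN, Inf and -Inf
  out <- rep(NA_real_, nrow(newdata))   # one result per row of newdata
  if (any(ok)) {
    out[ok] <- predict(model, newdata[ok, , drop = FALSE])
  }
  out
}

set.seed(1)
x <- rnorm(15)
y <- x + rnorm(15)
model.loess <- loess(y ~ x)
p <- predict_loess_na(model.loess,
                      data.frame(x = c(median(x), Inf, -Inf, NA, NaN)))
length(p)   # matches nrow(newdata)
```

Filtering with is.finite() treats Inf the same way as NA/NaN, which sidesteps the inconsistency discussed above; whether that is the desired behaviour depends on the application.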
[R] predict.loess and NA/NaN values
Hi!

In a current project, I am fitting loess models to subsets of data in order to use the loess predictions for normalization (similar to what is done in many microarray analyses). While working on this I ran into a problem when I tried to predict from the loess models and the data contained NAs or NaNs. I tracked down the problem to the fact that predict.loess will not return a value at all when fed with such values. A toy example:

    x <- rnorm(15)
    y <- x + rnorm(15)
    model.lm <- lm(y~x)
    model.loess <- loess(y~x)
    predict(model.lm, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))
    predict(model.loess, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))

The behaviour of predict.lm meets my expectation: I get a vector of length 5 where the unpredictable ones are NA or NaN. predict.loess, on the other hand, returns only 3 values, quietly skipping the last two. I was unable to find anything in the manual page that explains this behaviour or says how to change it. So I'm asking the community: Is there a way to fix this or do I have to code around it?

This is in R 2.11.1 (Linux), by the way.

Thanks in advance
Philipp
Re: [R] Fwd: basic hist() question
> It works fine. Could you explain to me why it did not work for read.table?

Because of what Gavin already explained in his reply: read.table returns a data.frame and hist needs a vector.

cu
Philipp
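To illustrate the point about data.frame versus vector, here is a small sketch. The single-column input is invented and merely stands in for the poster's file, which is not shown in the thread:

```r
# Hypothetical single-column input standing in for the poster's file
df <- read.table(text = "1.2\n3.4\n2.2\n5.1\n4.0")

is.data.frame(df)    # read.table() returns a data.frame, even for one column

# Extract the column to obtain the numeric vector that hist() expects;
# plot = FALSE only computes the bins, drop it to draw the histogram
h <- hist(df$V1, plot = FALSE)

# df[[1]] is an equivalent way to get the column if its name is unknown
identical(df$V1, df[[1]])
```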
Re: [R] which one give clear picture-pdf, jpg or tiff?
On Fri, Aug 20, 2010 at 09:30:18AM -0500, Stuart Luppescu wrote:

> On Fri, 2010-08-20 at 01:30 -0700, Joshua Wiley wrote:
> > I usually save them from R as a PDF or postscript file, rasterize them in GIMP (free answer to Photoshop) at the desired resolution, and finally choose the desired format/compression (jpeg, png, bitmap, tiff, etc.) to save it as from there.
>
> Woah. That's really involved. I use this little shell function to convert from ps to png:
>
>     function ps2png {
>         ps_file=$1
>         png_file=`echo $ps_file | sed -e 's/\.ps$/.png/'`
>         gs -dQUIET -dNOPAUSE -dBATCH -sDEVICE=png16m -sOutputFile=$png_file -r200x200 $ps_file
>     }

I'd like to add yet another tool that I use on Linux systems for this purpose: imagemagick

    # turn EPS into PNG at 600dpi
    convert -density 600 foo.eps foo.png

Very convenient, especially if there are lots of figures to be converted:

    for file in *.eps; do convert -density 600 "$file" "`basename "$file" .eps`.png"; done

cu
Philipp
Re: [R] reading a text file, one line at a time
On Sun, Aug 15, 2010 at 10:58:51AM -0400, Data Analytics Corp. wrote:

> I have an upcoming project that will involve a large text file. I want to
>
> 1. read the file into R one line at a time
> 2. do some string manipulations on the line
> 3. write the line to another text file.

You already got some good advice about how to solve this in R. I would just like to add that many people, including myself, prefer to do all text file scrubbing and especially string manipulations in scripting languages like Python or Perl, followed by statistical analysis in R.

cu
Philipp
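For completeness, the three steps from the question could be sketched in R itself roughly as follows. The file names and the toupper() manipulation are placeholders, and the input file is created on the fly so that the example is self-contained:

```r
# Hypothetical input file, created here so the sketch runs on its own
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), infile)

con_in  <- file(infile, open = "r")
con_out <- file(outfile, open = "w")

# readLines(n = 1) returns character(0) once the end of the file is reached
while (length(line <- readLines(con_in, n = 1)) > 0) {
  line <- toupper(line)        # placeholder for the real string manipulation
  writeLines(line, con_out)
}

close(con_in)
close(con_out)

result <- readLines(outfile)
```

Reading through an open connection keeps only one line in memory at a time, which is the point for a large file; for small files a single readLines() call on the whole file is simpler.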
Re: [R] Where the data file is stored?
> On R I create a datafile named data. I can invoke it on R with:
>
>     data
>
> On R Commander:
>
> Data -> Active data set -> Select active data set -> (data) OK (only one data set there)
> data -> View data set: I can read it
> -> Edit data set: showing 25 rows of data. Clicking the box shows a thick border around it. But I couldn't edit the data inside the box.
>
> I wonder where this datafile is stored on the OS. On Ubuntu terminal:
>
>     $ locate data.rda
>     $ locate data.image
>     $ locate data.images
>     $ locate data.csv

You don't tell us what you did to create a datafile - to me it sounds like you created an object (probably a data frame) in your R workspace. If that's the case, it is stored in a file called .RData in your current working directory (together with the other variables in your workspace) once the workspace is saved. If that is not what you did, please give us more information.

BTW: R has a function called data, and it is not a very good idea to use function names as variable names.

cu
Philipp
Re: [R] basic question about t-test with adjusted p value
On Sat, Aug 07, 2010 at 04:08:40PM -0400, josef.kar...@phila.gov wrote:

> I have read the R manual and help archives, sorry but I'm still stuck. How would I do a t-test with an adjusted p-value? Suppose that I use t.test(), with the function argument alternative = "two.sided", and data such that degrees of freedom = 20. The function calculates a t-statistic of 2.086, and p-value = 0.05. How do I then adjust the p-value? My thought is to do
>
>     p.adjust(pt(2.086, df=20), "BH")
>
> but that doesn't change anything (returns 0.975). What is the procedure? I'm sorry if there is a basic concept that I am missing here...

I'm confused - what result were you expecting? p.adjust needs to know the number of tests you are trying to adjust for - either by explicitly giving the number or by handing a vector of p-values to the function.

cu
Philipp
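A short sketch of both points. Note that pt(2.086, df=20) is the cumulative probability (about 0.975), not the two-sided p-value, which is why the call in the question returned 0.975. The three p-values below are invented purely for illustration:

```r
# Two-sided p-value for the t statistic quoted in the question
p_single <- 2 * pt(-abs(2.086), df = 20)   # approximately 0.05

# Hypothetical p-values from three separate tests
p_values <- c(0.05, 0.01, 0.2)
p_bh <- p.adjust(p_values, method = "BH")
p_bh
# [1] 0.075 0.030 0.200

# Alternatively, give the number of comparisons explicitly via n
p_bonf <- p.adjust(p_single, method = "bonferroni", n = 3)
```

With only a single p-value and the default n = 1, every method in p.adjust returns the input unchanged, which matches what the poster observed.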
Re: [R] 64-bit R on 64-bit Windows box... Still not enough memory?!
On Thu, Aug 05, 2010 at 04:40:48PM -0700, noclue_ wrote:

> I have a 64-bit windows box - Intel Xeon CPU E7340 @ 2.4GHz, 31.9GB of RAM. I have R 2.11.1 (64bit) running on it. My csv data is 3.6 GB (with about 15 million obs, 120 variables.)

Here is my guess: Your variables are mostly numeric but only given with two significant digits in the csv file:

    A    B    ...
    0.0  12.0
    1.3  0.4
    2.3  1.1

At roughly 3 bytes per value, that would make

    15e6 * 120 * 3 / 1024^3 = 5.0 GB

You have 3.6 GB - but that's close enough. If you read that into R, each number is represented as a double, i.e. 8 bytes. Thus the entire data frame takes

    15e6 * 120 * 8 / 1024^3 = 13.4 GB

With almost half of your memory taken, things can get problematic. Once you start actually working with the data you'll have to allow for a lot more space because copies will probably be made in the process. So you may have to put your data into a database and process it in pieces. Or use sqldf or bigmemory or something like that.

cu
Philipp
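The back-of-the-envelope estimate above can be reproduced in R, and cross-checked against object.size() on a small sample. The 1000-row sample is made up for this sketch; the real data would of course come from the csv file:

```r
n_rows <- 15e6
n_cols <- 120
bytes_per_double <- 8

# Estimated in-memory size of the full data frame in GB
est_gb <- n_rows * n_cols * bytes_per_double / 1024^3

# Cross-check: measure a small all-numeric sample and look at bytes per row
sample_df <- as.data.frame(matrix(rnorm(1000 * n_cols), ncol = n_cols))
bytes_per_row <- as.numeric(object.size(sample_df)) / nrow(sample_df)

round(est_gb, 1)
bytes_per_row    # a little above 120 * 8 = 960 because of per-column overhead
```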
Re: [R] 64-bit R on 64-bit Windows box... Still not enough memory?!
On Fri, Aug 06, 2010 at 09:03:09AM -0700, noclue_ wrote:

>     .Machine$sizeof.pointer
>     [1] 4

So it appears you are not on 64bit. Excerpt from the help page:

    [...]
    sizeof.pointer: the number of bytes in a C ‘SEXP’ type. Will be ‘4’
    on 32-bit builds and ‘8’ on 64-bit builds of R.
    [...]

cu
Philipp
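The same check can be written as a one-liner that reports the word size directly; a small sketch using only base R:

```r
# 4 bytes per pointer means a 32-bit build, 8 bytes a 64-bit build
bits <- 8 * .Machine$sizeof.pointer
bits

# The architecture string of the running build gives a second hint
R.version$arch
```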
Re: [R] How to apply apply?!
> How do I multiply only the close of every row using the 'apply' function? And once multiplied how do I obtain a new table that also contains the new 2*CLOSE column (without cbind?).

You don't use apply in this case - a simple multiplication and variable assignment will do:

    require(tseries)
    foo <- get.hist.quote('^GDAXI')
    foo[1:10, ]
                 Open   High    Low  Close
    1991-01-02 1375.4 1375.4 1359.1 1366.1
    1991-01-03 1371.7 1374.7 1365.2 1366.7
    1991-01-04 1375.4 1398.0 1375.4 1396.1
    1991-01-07 1373.6 1373.6 1352.5 1358.2
    1991-01-08 1350.4 1357.1 1345.5 1354.0
    1991-01-09 1358.6 1380.8 1358.0 1375.2
    1991-01-10 1367.9 1383.7 1363.1 1383.4
    1991-01-11 1401.3 1406.4 1376.9 1382.3
    1991-01-14 1354.5 1354.5 1327.8 1327.8
    1991-01-15 1327.0 1330.3 1312.4 1325.6

    foo$Close <- foo$Close * 2
    foo[1:10, ]
                 Open   High    Low  Close
    1991-01-02 1375.4 1375.4 1359.1 2732.2
    1991-01-03 1371.7 1374.7 1365.2 2733.4
    1991-01-04 1375.4 1398.0 1375.4 2792.2
    1991-01-07 1373.6 1373.6 1352.5 2716.4
    1991-01-08 1350.4 1357.1 1345.5 2708.0
    1991-01-09 1358.6 1380.8 1358.0 2750.4
    1991-01-10 1367.9 1383.7 1363.1 2766.8
    1991-01-11 1401.3 1406.4 1376.9 2764.6
    1991-01-14 1354.5 1354.5 1327.8 2655.6
    1991-01-15 1327.0 1330.3 1312.4 2651.2

> 2) Also, how do I run a generic function per row. Say for example I want to calculate the Implied Volatility for each row of this data frame (using the Rmetrics package). How do I do that please using the apply function? I am focusing on apply because I like the vectorisation concept in R and I do not want to use a for loop etc.

You can get the manual page of any R command by either preceding it with a question mark or giving the command as an argument to the help function - specifically:

    ?apply
    help(apply)

The example section is especially useful for a jumpstart. Here is an example of computing row means:

    apply(foo, 1, mean)

Instead of 'mean' you can insert whatever function you'd like to apply.

cu
Philipp
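Both answers can be tried without downloading any quotes. The sketch below uses a small invented price table in place of the get.hist.quote() result and adds the doubled values as a new column (rather than overwriting Close); the column name Close2 and the daily-range function are assumptions made for the example:

```r
# Made-up price table standing in for the output of get.hist.quote()
prices <- data.frame(
  Open  = c(1375.4, 1371.7, 1375.4),
  High  = c(1375.4, 1374.7, 1398.0),
  Low   = c(1359.1, 1365.2, 1375.4),
  Close = c(1366.1, 1366.7, 1396.1)
)

# New column by plain assignment: vectorised, no apply() or cbind() needed
prices$Close2 <- prices$Close * 2

# Arbitrary function per row: here the daily range High - Low
day_range <- apply(prices[, c("High", "Low")], 1,
                   function(r) r["High"] - r["Low"])
```

For something as simple as a difference of two columns, prices$High - prices$Low is the more direct route; apply() over rows becomes useful when the function genuinely needs the whole row at once.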
Re: [R] Error: cannot allocate vector of size xxx Mb
On Thu, Aug 05, 2010 at 03:53:21AM -0400, Ralf B wrote:

>     a <- rnorm(500)
>     Error: cannot allocate vector of size 38.1 Mb
>
> When running memory.limit() I am getting this:
>
>     memory.limit()
>     [1] 2047
>
> Which shows me that I have 2 GB of memory available. What is wrong? Shouldn't 38 MB be very feasible?

From what I gather from ?memory.limit, it does not tell you how much memory is currently available. So my guess is that you have some rather large objects in your workspace already and thus there is not enough space left for your vectors.

cu
Philipp
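One way to check that guess is to list the objects in the workspace by size. A minimal sketch; the two objects big and small are created here only so that there is something to measure:

```r
# Hypothetical objects, created only to have something to measure
big   <- rnorm(1e6)
small <- 1:10

# Size in bytes of every object in the current environment
env <- environment()
sizes <- sapply(ls(env), function(nm) as.numeric(object.size(get(nm, envir = env))))
sort(sizes, decreasing = TRUE)

# Remove what is no longer needed and trigger a garbage collection
rm(big)
invisible(gc())
```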
Re: [R] Converting dataframe to matrix
On Fri, Oct 16, 2009 at 01:33:14AM -0700, Noah Silverman wrote:

> Hi, I'm experimenting with a few learners that require a matrix as their input. (Currently svmpath, vbmp, etc.) I currently have a dataframe with 50 columns and 20,000 rows. I tried using:
>
>     x <- as.matrix(my_data.frame)
>
> If I then ask is.matrix(x), I get TRUE. However everywhere I've tried to use the matrix returns errors.

Without more information I can't even start to guess what is going wrong. Please give a short, reproducible example of what you did and what errors you encountered.

as.matrix() should suffice for creating a matrix from a data.frame:

    foo <- data.frame(1:4, 4:1, sqrt(1:4), log(4:1))
    foo
      X1.4 X4.1 sqrt.1.4.  log.4.1.
    1    1    4  1.000000 1.3862944
    2    2    3  1.414214 1.0986123
    3    3    2  1.732051 0.6931472
    4    4    1  2.000000 0.0000000

    det(foo)
    Error in UseMethod("determinant") : no applicable method for "determinant"

    det(as.matrix(foo))
    [1] -0.1092489

So probably your problem is somewhere else.

cu
Philipp
Re: [R] Converting dataframe to matrix
On Fri, Oct 16, 2009 at 01:55:03AM -0700, Noah Silverman wrote:

> I think you may be correct. I've managed to get the data into a format that the function accepts. The error appears to be because I have negative values in my data:
>
>     Error in apply(safeNormCDF(s), 1, prod) : dim(X) must have a positive length

Sounds like safeNormCDF() does not return a matrix but a vector. What does dim(safeNormCDF(s)) say?

    apply(1:9, 1, sum)
    Error in apply(1:9, 1, sum) : dim(X) must have a positive length

    apply(matrix(1:9, nrow=3), 1, sum)
    [1] 12 15 18

    apply(matrix(1:9, nrow=1), 1, sum)
    [1] 45

cu
Philipp
Re: [R] two graphs 1 x-axis
On Fri, Oct 16, 2009 at 12:22:06PM +0200, Duijvesteijn, Naomi wrote:

> I have a question concerning plotting graphs. Here an example dataset:
>
>     a <- c(1,2,3,4,5,6)
>     b <- c(3,5,4,6,1,1)
>     c <- c(1,1,1,1,1,1)
>     d <- as.data.frame(cbind(a,b,c))
>     plot.new()
>     plot(d$a, d$b, col="red")
>     par(new=TRUE)
>     plot(d$a, d$c, col="red", pch="|")
>
> What I would want is to plot the second plot under the first plot. So not in the first plot. There is a way to divide your graph in 2 or 3 parts and use the same x-axis but I do not seem to get it right. Could somebody help me out?

Yes, use something along these lines:

    par(mfrow=c(2,1))
    plot(d$a, d$b, col="red")
    plot(d$a, d$c, col="red", pch="|")

As both plots use the same data for X, you are set. If you need to force two datasets with different x-ranges into the same range, you can use the xlim parameter to define the desired range.

cu
Philipp
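The xlim remark could look like the following in practice. This is only a sketch: the two data sets d1 and d2 are invented so that their x-ranges differ, and pdf(NULL) is used so the code also runs without a display:

```r
# Two made-up series with different x-ranges
d1 <- data.frame(x = 1:6,  y = c(3, 5, 4, 6, 1, 1))
d2 <- data.frame(x = 3:10, y = rep(1, 8))

# Common x-range covering both data sets
xr <- range(d1$x, d2$x)

pdf(NULL)                      # null device; omit for on-screen plotting
old <- par(mfrow = c(2, 1))    # two rows, one column
plot(d1$x, d1$y, col = "red", xlim = xr)
plot(d2$x, d2$y, col = "red", pch = "|", xlim = xr)
par(old)                       # restore the previous layout
invisible(dev.off())
```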
Re: [R] Stretch the x-axis for better alignment comparison
On Wed, Sep 23, 2009 at 11:25:23AM -0700, Maggie wrote:

> I have the following code that aligns the two graphs. Problem is that in the .pdf it gives me, the x-axis (0-100) is broken down into 0-20, 20-40 ... and so on. I wonder if there is a way for it to display the x-axis (and y-axis) in more detail than that.

Without the necessary data I cannot directly reproduce your example, but have a look at this for a start:

    plot(0:10)
    axis(1, seq(0,10,0.2), labels=FALSE)

You may also want to use xaxt='n' in the plot command and then use axis() to build the axis the way you want it. If reading out data from the graph is a concern, you may also want to look at the grid() command.

cu
Philipp
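Putting the three suggestions together for a 0-100 axis might look like this. The tick spacing (labels every 10, minor ticks every 2) is an arbitrary choice for the sketch, and pdf(NULL) only serves to make it runnable without a display:

```r
pdf(NULL)                                   # null device; omit for on-screen plotting

plot(0:100, xaxt = "n")                     # suppress the default x-axis
major <- seq(0, 100, by = 10)
minor <- seq(0, 100, by = 2)
axis(1, at = major)                         # labelled ticks every 10 units
axis(1, at = minor, labels = FALSE, tcl = -0.25)   # short unlabelled ticks in between
grid()                                      # reference lines at the default tick positions

invisible(dev.off())
```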
Re: [R] strange split behavior?
On Wed, Sep 23, 2009 at 07:29:30AM -0500, Peng Yu wrote:

> On Wed, Sep 23, 2009 at 1:24 AM, Peter Dalgaard p.dalga...@biostat.ku.dk wrote:
> > Peng Yu wrote:
>
> Is there an operation on a factor to get a subset and keep only the corresponding levels (see commented line below)?

Yes, there is: call factor() on your subset:

    a <- factor(rep(letters[1:5], 5))
    a
     [1] a b c d e a b c d e a b c d e a b c d e a b c d e
    Levels: a b c d e

    b <- a[a!='b']
    b
     [1] a c d e a c d e a c d e a c d e a c d e
    Levels: a b c d e

    factor(b)
     [1] a c d e a c d e a c d e a c d e a c d e
    Levels: a c d e

cu
Philipp
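Two further ways to get the same result, shown as a sketch: droplevels() (available in more recent versions of R than the one discussed in this thread) and the drop argument of factor subsetting:

```r
a <- factor(rep(letters[1:5], 5))
b <- a[a != "b"]

# droplevels() removes levels that no longer occur in the data
lev_dropped <- levels(droplevels(b))

# drop = TRUE discards unused levels directly while subsetting
lev_subset <- levels(a[a != "b", drop = TRUE])
```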
Re: [R] Suppressing script commands in R console when executing long program
On Fri, Sep 18, 2009 at 03:46:27PM +1000, Steven Kang wrote:

> Q1. Is there any way of suppressing the commands in the R console?

I think this has been answered already.

> Q2. Is R capable of reading numbers that are represented with 1,000 separator commas?

I am not aware of an option to read.table and friends that does this, but you can recover easily:

    foo <- read.delim('foo.tbl')
    foo
      A          B
    1 1     12,300
    2 2 256,001.01
    3 3      900.1
    4 4         80

    str(foo)
    'data.frame': 4 obs. of 2 variables:
     $ A: int 1 2 3 4
     $ B: Factor w/ 4 levels "12,300","256,001.01",..: 1 2 4 3

    foo$B <- as.numeric(sub(',', '', as.character(foo$B)))
    foo
      A        B
    1 1  12300.0
    2 2 256001.0
    3 3    900.1
    4 4     80.0

    str(foo)
    'data.frame': 4 obs. of 2 variables:
     $ A: int 1 2 3 4
     $ B: num 12300 256001 900 80

cu
Philipp
Re: [R] Datetime conversion
> What you have worked out is just what I need, but I'm getting the following error:
>
>     Error in `$<-.data.frame`(`*tmp*`, date, value = list(sec = c(0, 0, : replacement has 9 rows, data has 14

Please give more detail about what you did. This error is certainly not from the example used in previous postings, as the data frame used there has 9 rows, not 14. Without the details (code) on what you did it's all guesswork. Perhaps you are mixing two data.frames of different shape or ...

cu
Philipp
Re: [R] Suppressing script commands in R console when executing long program
On Fri, Sep 18, 2009 at 12:59:16PM +0200, Philipp Pagel wrote:

>     foo$B <- as.numeric(sub(',', '', as.character(foo$B)))

Thinking about it some more, you should use gsub instead of sub here. Otherwise only the first occurrence of the thousands separator will be removed.

cu
Philipp
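The difference only shows up for values with more than one separator. A small sketch, using the values from the earlier example plus one invented value with two commas:

```r
# The last value is made up to show a number with two separators
b <- c("12,300", "256,001.01", "900.1", "80", "1,234,567")

# sub() removes only the first comma, so "1,234,567" becomes "1234,567" -> NA
with_sub <- suppressWarnings(as.numeric(sub(",", "", b, fixed = TRUE)))

# gsub() removes every comma
with_gsub <- as.numeric(gsub(",", "", b, fixed = TRUE))
```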
Re: [R] Datetime conversion
On Fri, Sep 18, 2009 at 04:32:27AM -0700, premmad wrote:

> Sorry for confusing you all with my inexperienced posting. I tried as you said: if you have 9 rows in the data it is working fine, but please try out the same example as you have suggested earlier with more than 9 rows. I tried it as following:
>
>     datetime <- c(
>     "01OCT1987:00:00:00.000",
>     "12APR2004:00:00:00.000",
>     "01DEC1987:00:00:00.000",
>     "01OCT1975:00:00:00.000",
>     "01AUG1979:00:00:00.000",
>     "26JUN2003:00:00:00.000",
>     "01JAN1900:00:00:00.000",
>     "13MAY1998:00:00:00.000",
>     "30SEP1998:00:00:00.000",
>     "30SEP1998:00:00:00.000",
>     "30SEP1998:00:00:00.000",
>     "30SEP1998:00:00:00.000")
>     dt <- as.data.frame(datetime)
>     dt$date <- strptime(as.character(dt$datetime), "%d%b%Y")
>
> and got the following error:
>
>     Error in `$<-.data.frame`(`*tmp*`, date, value = list(sec = c(0, 0, : replacement has 9 rows, data has 12.

Oops - sorry, you are right. There is a problem with inserting the object. Try this instead:

    dt$date <- as.Date(dt$datetime, "%d%b%Y")

cu
Philipp
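The "9 rows" in the error message fit the fact that strptime() returns a POSIXlt object, which in the R version of that time was stored as a list of nine components (sec, min, hour, ...), so the assignment saw a list of length 9 instead of one value per row. A sketch of two conversions that yield ordinary vectors; it assumes an English locale so that %b matches month abbreviations such as OCT, and the column name stamp is made up:

```r
datetime <- c("01OCT1987:00:00:00.000", "12APR2004:00:00:00.000",
              "01DEC1987:00:00:00.000", "30SEP1998:00:00:00.000")
dt <- data.frame(datetime = datetime, stringsAsFactors = FALSE)

# Date only: as.Date() ignores the trailing time part
# (assumes an English locale for %b)
dt$date <- as.Date(dt$datetime, format = "%d%b%Y")

# Date and time: convert the POSIXlt result to POSIXct, which is a plain
# numeric vector underneath and fits into a data.frame column
dt$stamp <- as.POSIXct(strptime(dt$datetime,
                                format = "%d%b%Y:%H:%M:%OS", tz = "UTC"))
```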
Re: [R] latex code in R - convert to pdf
> Is it possible to convert latex code to pdf in R (like a latex program would do it)? Is there a package that comes with this capability? My problem is that I want to generate tables automatically - and I can't use a latex editor at that computer ... Besides latex ... are there good ways to generate tables in R?

Have a look at Sweave and xtable - I think that's what you want.

cu
Philipp
Re: [R] latex code in R - convert to pdf
On Thu, Sep 17, 2009 at 10:08:57AM +0200, Philipp Pagel wrote:

> > is it possible to convert latex code to pdf in R (like a latex-program would do it)? Is there a package that comes with this capabilities? My problem is that I want to generate tables automatically - and I can't use a latex editor at that computer ... Besides latex ... are there good ways to generate tables in R?
>
> Have a look at Sweave and xtable - I think that's what you want.

Charlie's post made me aware that by "latex editor" you may mean that there is no LaTeX installation on your machine. In that case Sweave and xtable will obviously be of little use. If you have OpenOffice on that computer, package odfWeave may be the solution. If OpenOffice is not available either, maybe package HTMLUtils would be another option (I haven't used it so far, so I may be wrong here).

cu
Philipp
Re: [R] Data separated by spaces, getting data into R using field lengths
On Tue, Sep 08, 2009 at 02:53:11PM +0300, Lauri Nikkinen wrote:

> I have a text file similar to this (separated by spaces):
>
>     x <-
>     DF12 This is an example 1 This
>     DF12 This is an 1232 This is
>     DF14 This is 12334 This is an
>     DF15 This 23 This is an example
>
> and I know the field lengths of each variable (there are 5 variables in this data set), which are:
>
>     varlength <- c(2, 2, 18, 5, 18)
>
> How can I import this kind of data into R, using the varlength variable as a field separator indicator?

I am not totally sure what exactly the expected result is. From your description I got the impression that your data file uses a mixture of separation characters and fixed-width formatting. Maybe I misinterpreted your example. Have a look at read.fwf(), and if that does not solve your problem, maybe explain the structure and expected result a little further.

cu
Philipp
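A minimal read.fwf() sketch on invented data. The original padding of the poster's file is lost in the archive, so the records below are made up and use four fields of widths 2, 2, 10 and 5 rather than the five fields from the question:

```r
# Made-up fixed-width records: widths 2, 2, 10 and 5 characters, no delimiter
records <- sprintf("%-2s%-2s%-10s%5s",
                   c("DF", "DF"),
                   c("12", "14"),
                   c("This is an", "This is"),
                   c("1", "12334"))
fwf_file <- tempfile(fileext = ".txt")
writeLines(records, fwf_file)

# widths gives the number of characters per field;
# strip.white removes the padding from character fields
dat <- read.fwf(fwf_file, widths = c(2, 2, 10, 5),
                strip.white = TRUE, stringsAsFactors = FALSE)
str(dat)
```

For the file in the question, widths = varlength would be the natural starting point, provided every line really is padded to the full field widths.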
Re: [R] Data separated by spaces, getting data into R using field lengths
On Tue, Sep 08, 2009 at 03:21:53PM +0300, Lauri Nikkinen wrote:

> This data is from a database and the maximum length of a field is defined. I mean that every column has a maximum length and I want to use this maximum length as a separator. So if one cell in that column is shorter than the maximum, the cell should be padded with white spaces or something like that. This seems to be hard to explain.

OK - now I got it. RODBC has already been suggested. If for some reason that is impossible, you could try to dump the data using a proper delimiter (e.g. tab). Without a real delimiter it is certainly hard to parse the data - and it may even be impossible, depending on what characters are allowed in your free-text fields.

cu
Philipp