[R] possible spam alert
The last two times I have originated message threads on R or Bioconductor, I have received the message included below from someone named Patrick Connolly. Both times I was the originator of the message thread and used what I thought was a unique subject line that explained, as best I could, what my question was. Patrick seems to be implying that I am abusing the R and BioC help newsgroups in this fashion. When I emailed him to ask for a specific example, he did not reply.

The most recent thread that he seems concerned about was to the R list and was entitled "regexpr and parsing question". I believe the previous post of mine that he had problems with was to the BioC list, but I can't remember its subject.

Is this spam? If I am doing this correctly, you should see "possible spam alert" in the subject header of THIS message. Would the moderators of the lists please check whether I am doing something wrong and, if not, inform Mr. Connolly that I am not? If others have received this message in error, it is possible it is spam and users should be alerted.

Thanks,
Mark

Mark W. Kimpel MD
Official Business Address:
Department of Psychiatry
Indiana University School of Medicine
PR M116
Institute of Psychiatric Research
791 Union Drive
Indianapolis, IN 46202

This is a request to anyone who starts a new subject to begin with a new message and NOT reply to an existing one. If your mail client is any good, it's very simple to set up an alias (mine is simply 'r') so that the tedious task of typing 'r-help@stat.math.ethz.ch' is unnecessary, and it's quicker than scrolling through an address book. It's also quicker than deleting the previous subject.

Most mornings, I have over a screenful of messages, mostly from R-help, and it's very useful to have them threaded. However, the usefulness of threading is lost when posters reply to a message and then change the subject instead of creating a new message.

People who don't have a mail client that can display email in threads are probably unaware that this sort of thing can happen in ones that do:

    37 N  25 Jan  Luis Silva               ( 34)  [R] plot/screen
    38 N  25 Jan  Uwe Ligges               ( 55)  `-
    39 N  25 Jan  Fernando Henrique Ferra  ( 20)  [R] Plotting coloured histograms -
    40 N  26 Jan  Mohamed A. Kerasha       ( 12)  |-[R] Distributions.
    41 N  26 Jan  [EMAIL PROTECTED]        ( 26)  | |-
    42    26 Jan  Qin Xin                  (  9)  | `-[R] how could I add legends
    43    27 Jan  Ko-Kang Kevin Wang       ( 31)  |   `-
    44 N  26 Jan  Remigijus Lapinskas      ( 32)  |-Re: [R] Plotting coloured his
    45 N  26 Jan  Damon Wischik            (125)  `-
    46 N  25 Jan  [EMAIL PROTECTED]        ( 10)  [R] plotting primatives, ellipse
    47 N  25 Jan  Uwe Ligges               ( 19)  `-

As Martin Maechler explained some time ago, it also screws up the archives for a similar reason.

Your cooperation will be greatly appreciated.

best

--
Patrick Connolly
Great minds discuss ideas
Middle minds discuss events
Small minds discuss people
(Anon)

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] possible spam alert
Peter,

Thank you for your explanation. I had taken Mr. Connolly's message to me to imply that I was not changing the subject line. I use MS Outlook 2007 and, unless I am just not seeing it, Outlook does not normally display the In-Reply-To header; I was under the mistaken impression that that was what the Subject line was for. See, for example, the header to your message to me below. Outlook will, however, sort messages by Subject, and that is what I thought was meant by threading.

Well, I learned something today and apologize for any inconvenience my posts may have caused.

BTW, I use Outlook because it is supported by my university server and will synch my appointments and contacts with my PDA, which runs Windows CE. If anyone has a suggestion for a better email program that will provide proper threading AND work with an MS email server and synch with Windows CE, I'd love to hear it.

Thanks again,
Mark

Mark W. Kimpel MD
(317) 490-5129 Work, Mobile
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX

-----Original Message-----
From: Peter Dalgaard [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 31, 2007 6:25 PM
To: Kimpel, Mark William
Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Subject: Re: [R] possible spam alert

Kimpel, Mark William wrote:
> The last two times I have originated message threads on R or Bioconductor I have received the message included below from someone named Patrick Connolly. Both times I was the originator of the message thread and used what I thought was a unique subject line that explained as best I could what my question was. Patrick seems to be implying that I am abusing the R and BioC help newsgroups in this fashion. When I emailed him to give me a specific example, he did not reply.
>
> The most recent thread that he seems concerned about was to the R list and was entitled "regexpr and parsing question". I believe the previous post of mine that he had problems with was to the BioC list but I can't remember its subject.
>
> Is this spam?

No. Breach of netiquette, yes.

The message in question starts a new thread, yet contains an In-Reply-To: header line, which presumably means that you started writing the message as a reply to something completely unrelated, specifically: "Re: [R] change plotting symbol for groups in trellis graph".

You should not do that, unless you know how to remove the In-Reply-To line (and this is not obvious in many mail clients); changing the subject is not sufficient.

> If I am doing this correctly, you should see the subject "possible spam alert" in the subject header of THIS message. Would the moderators of the lists please check and see if I am doing something wrong and, if not, inform Mr. Connolly that I am not. If others have received this message in error, it is possible it is spam and users should be alerted.
>
> Thanks,
> Mark
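The difference between grouping by Subject and threading by the In-Reply-To header, as described in the reply above, can be illustrated with a small R sketch. The message IDs and the four-message mailbox below are hypothetical and are not taken from the actual list headers; the sketch only assumes that a threading client follows In-Reply-To links back to a root message.

```r
# Hypothetical mailbox: ids and reply links are invented for illustration.
msgs <- data.frame(
  id          = c("<m1@example>", "<m2@example>", "<m3@example>", "<m4@example>"),
  in_reply_to = c(NA, "<m1@example>", "<m2@example>", NA),
  subject     = c("[R] change plotting symbol for groups in trellis graph",
                  "Re: [R] change plotting symbol for groups in trellis graph",
                  "[R] regexpr and parsing question",   # reply with an edited subject
                  "[R] Outlook does threading"),        # genuinely new message
  stringsAsFactors = FALSE
)

# Follow In-Reply-To links until a message with no parent is reached.
thread_root <- function(id, msgs) {
  parent <- msgs$in_reply_to[match(id, msgs$id)]
  while (!is.na(parent)) {
    id <- parent
    parent <- msgs$in_reply_to[match(id, msgs$id)]
  }
  id
}

# Grouping the way a threading client does (by header links) ...
msgs$thread <- vapply(msgs$id, thread_root, character(1),
                      msgs = msgs, USE.NAMES = FALSE)
# ... versus grouping the way "sort by subject" does.
msgs$subject_key <- sub("^Re: ", "", msgs$subject)

print(msgs[, c("subject", "thread", "subject_key")])
```

In this sketch the third message gets its own group under `subject_key` but shares a `thread` with the first two, which is why changing the subject of a reply is not enough to start a new thread.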
[R] Outlook does threading
See below for Bert Gunter's off-list reply to me (which I do appreciate). I'm putting it back on the list because it seems there is still confusion regarding the difference between threading and sorting by subject. I thought the example below might be instructive for other Outlook users who may be as confused as I was (am?).

Per Bert's instructions, I set up my inbox to sort by subject. I sent one email to myself with the subject "test1" and then replied to it without changing the subject. The reply correctly went to "test1" in the inbox sorter. I then changed the subject heading in the "test1" reply to "test2" and sent it to myself. This time Outlook re-categorized it and put it in a separate compartment in the view, called "test2".

If Outlook can do threading the way the R mail server does, I don't think this is the way to do it. Unless someone has an idea of how to correctly set up Outlook to do threading in the manner that the R mail server does, I think the message for us Outlook users is to just create, from scratch, a new message when initiating a new subject.

Thanks for all your help.
Mark

-----Original Message-----
From: Bert Gunter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 31, 2007 7:03 PM
To: Kimpel, Mark William
Subject: Outlook does threading

Mark:

No need to bother the R list with this. Outlook does threading. Just sort on Subject in the viewer.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374
Re: [R] Outlook does threading
Tony,

I went to the MS link that you suggested (see below) and it indeed says that "The Arrange by Conversation arrangement shows your e-mail items grouped by message subject or 'thread.'" Instead of arranging by subject, I arranged my view by conversation and got exactly the same result that I had gotten when viewing by subject, i.e. MS Outlook looks only at the subject line when deciding on threads, conversations, subjects, or whatever you want to call it. I am, BTW, using Outlook 2007 on Windows XP SP2 and cannot vouch for Outlook's behavior in other versions or configurations.

So, no matter what I do, it seems impossible for me to duplicate in Outlook what Gabor pointed out to me when he said, "You can see how it looks to most readers by viewing it on gmane: http://thread.gmane.org/gmane.comp.lang.r.general/78065 Note that even though the subject has been changed it's still listed as a child of another message rather than the start of a new thread." I did check and of course Gabor is correct.

This subject does need to be put to bed. I have reread the posting guide for R-help at http://www.r-project.org/ and it does indeed say "Do please create a new email message when posting to the list rather than replying to a previous message and simply changing the subject line! This allows sensible threading in the mailing list archives (and many users' e-mail readers)." To be honest, I probably read this 3 years ago when I subscribed to the list but, because my email reader doesn't behave this way, I just forgot about it. I email so many people during the day that I frequently hit reply to a previous message and then change the subject if appropriate.

So, not to justify my behavior, but would it be possible for the R mail server to somehow check whether the subject heading on a thread has been changed and then return-to-sender with a standard message explaining everything we have been through tonight? If Patrick Connolly sees this often enough to have a standard message he sends out, and Martin Maechler commented on it in the past, perhaps other Windows users of Outlook are doing the same thing I did. Rest assured that I have learned my lesson and won't repeat the same mistake, but if such a filter were put in place at the R mail server level, perhaps it would save the non-Outlook users a lot of aggravation.

These exchanges have been edifying and I thank all for their patience and explanations.

Mark

Mark W. Kimpel MD
(317) 490-5129 Work, Mobile
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX

-----Original Message-----
From: Tony Plate [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 31, 2007 8:08 PM
To: Kimpel, Mark William
Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] Outlook does threading

Your final paragraph has the take-home message for everyone (not just MS Outlook users): "just create, from scratch, a new message when initiating a new subject".

Viewing threads can be completely different from sorting based on the subject line. Your initial post with the subject "regexpr and parsing question" was in fact a reply to the message from Gabor Grothendieck in the thread "Re: [R] change plotting symbol for groups in trellis graph". (I can see this by looking at the header information: I see an "In-reply-to:" header item.) When I view threads in the Thunderbird mail reader, your post and replies with the subject "regexpr and parsing question" do in fact show up under the thread in which Gabor's message appeared, not in their own thread.

According to http://office.microsoft.com/en-us/outlook/HA011356671033.aspx, one can view threads in Outlook by selecting View-Arrange By-Conversation.

Hope this helps (in case the horse was not thoroughly dead already.)

-- Tony Plate
[R] regexpr and parsing question
The main problem I am trying to solve is this: I am importing a tab-delimited file whose first line contains only one column, which is a descriptor of the form "col_1 col_2 col_3", i.e. the colnames are not tab-delimited but are separated by whitespace. I would like to parse this first line so that it becomes the colnames of the rest of the file, which I am reading into R using read.delim(). The file is so huge that I must do this in R.

My first question is this: what is the best way to accomplish what I want to do?

My other questions revolve around some failed attempts on my part to solve the problem on my own using regular expressions. I thought that perhaps I could change the first line to c("col_1", "col_2", "col_3") using gsub. I was having trouble figuring out how R uses the backslash character, because I know that sometimes the backslash one would use in Perl needs to be a double backslash in R.

Here is a sample of what I tried and what I got:

    a <- "col_1 col_2 col_3"
    gsub("\\s", " ", a)
    [1] "col_1 col_2 col_3"
    gsub("\\s", "\\s", a)
    [1] "col_1scol_2scol_3"

As you can see, it looks like R is taking a regular expression for pattern, but not taking it for replacement. Why is this?

Assuming that I did want to solve my original problem with gsub and then turn the string into an R object, how would I get gsub to return c("col_1", "col_2", "col_3") using my original string?

Finally, is there a way to declare a string as a regular expression so that R sees it the same way other languages, such as Perl, do, i.e. make the backslash be interpreted the same way? For someone who is just learning regular expressions, as I am, it is very frustrating to read about them in references and then have to translate what I've learned into R syntax. I was thinking that instead of enclosing the string in "", one could use THIS.IS.A.REGULAR.EXPRESSION(), similar to the way we use I() in formulae.

These are a bunch of questions, but obviously I have a lot to learn!

Thanks,
Mark

Mark W. Kimpel MD
(317) 490-5129 Work, Mobile
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX
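One possible way to approach the first question is sketched below in R. It is only an illustration under stated assumptions: the temporary file and its three data columns are invented stand-ins for the real (much larger) file, and the names `path`, `header`, `col_names`, and `dat` are hypothetical. The idea is to read the descriptor line on its own, split it on whitespace, and pass the result to read.delim() while skipping that line. The last two statements relate to the gsub questions: in the replacement argument only backreferences such as "\\1" are special, so "\\s" is not treated as a whitespace class there.

```r
# Hypothetical small file standing in for the large tab-delimited input.
path <- tempfile(fileext = ".txt")
writeLines(c("col_1 col_2 col_3",   # descriptor line, whitespace separated
             "1\t2.5\ta",
             "2\t3.5\tb"), path)

# Read only the first line and split it on runs of whitespace.
header <- readLines(path, n = 1)
col_names <- strsplit(header, "[[:space:]]+")[[1]]

# Read the remaining lines, skipping the descriptor, and attach the names.
dat <- read.delim(path, header = FALSE, skip = 1,
                  col.names = col_names, stringsAsFactors = FALSE)

# Building the literal text c("col_1", "col_2", "col_3") with gsub:
# the pattern is a regular expression, the replacement is ordinary text.
a <- "col_1 col_2 col_3"
as_call_text <- paste0('c("', gsub("\\s+", '", "', a), '")')

# Parsing and evaluating that text gives back a character vector,
# although strsplit() above reaches the same result more directly.
from_text <- eval(parse(text = as_call_text))
```

Reading only one line with readLines() keeps the extra work small even when the file itself is very large.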
Re: [R] [BioC] problem with biomaRt getHomolog function
Steffen,

When the new biomaRt tries to load, it errors out because I do not have RMySQL installed. There is no Windows binary for RMySQL, and it contains C code that I do not know how to build. I do not use the MySQL option in biomaRt. Does RMySQL need to be a required dependency?

Below is my screen output and sessionInfo.

    require(biomaRt)
    Loading required package: biomaRt
    Loading required package: RMySQL
    Error: package 'RMySQL' could not be loaded
    In addition: Warning message:
    there is no package called 'RMySQL' in: library(pkg, character.only = TRUE, logical = TRUE, lib.loc = lib.loc)

    sessionInfo()
    R version 2.5.0 Under development (unstable) (2007-01-19 r40528)
    i386-pc-mingw32
    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
    attached base packages:
    [1] stats graphics grDevices datasets utils tools methods base
    other attached packages:
        DBI   limma    affy  affyio Biobase
     0.1-12   2.9.8 1.13.14   1.3.3 1.13.34

Mark W. Kimpel MD
(317) 490-5129 Work, Mobile
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX

-----Original Message-----
From: Steffen Durinck [mailto:[EMAIL PROTECTED]
Sent: Friday, January 26, 2007 9:24 AM
To: Kimpel, Mark William
Cc: [EMAIL PROTECTED]
Subject: Re: [BioC] problem with biomaRt getHomolog function

Hi Mark,

I think the rat entrezgene id 613226 is a recently added entrezgene id and is not yet available in Ensembl. Ensembl updates every two months, and the last update of entrezgene id 613226 appears to be December 26 of 2006. So this might be the reason.

Also, I would suggest you use the developmental version of biomaRt (biomaRt_1.9.15) to do getHomolog queries. A recent change in the BioMart suite enables the biomaRt package to retrieve both the id you use to query and the ids in the result of the query. Here's an example:

    rat.entrezgene.ID = c(24842, 83502, 24205)
    mouse.mart <- useMart("ensembl", "mmusculus_gene_ensembl")
    rat.mart <- useMart("ensembl", "rnorvegicus_gene_ensembl")
    mouse.homolog <- getHomolog(id = rat.entrezgene.ID, from.mart = rat.mart, from.type = "entrezgene", to.type = "entrezgene", to.mart = mouse.mart)

    mouse.homolog
         V1    V2
    1 24842 22059
    2 24205 11789
    3 24205    NA
    4 83502 12550

best,
Steffen

Kimpel, Mark William wrote:
> I am trying to use the getHomolog function of package biomaRt to map rat entrezgene IDs to mouse entrezgene IDs. For every ID I try, I get NULL as return, even when I know that a mouse mapping exists. For example, rat ID 613226 corresponds to mouse 229706. See my code and sessionInfo below. Anyone know what I am doing wrong?
>
> Thanks,
> Mark
>
>     require(DBI)
>     [1] TRUE
>     require(biomaRt)
>     [1] TRUE
>     mouse.mart <- useMart("ensembl", "mmusculus_gene_ensembl")
>     Checking attributes and filters ... ok
>     rat.mart <- useMart("ensembl", "rnorvegicus_gene_ensembl")
>     Checking attributes and filters ... ok
>     rat.entrezgene.ID <- 613226
>     mouse.homolog <- getHomolog(id = rat.entrezgene.ID, from.mart = rat.mart, from.type = "entrezgene",
>     + to.type = "entrezgene", to.mart = mouse.mart)
>     Warning message:
>     getBM returns NULL. in: getHomolog(id = rat.entrezgene.ID, from.mart = rat.mart, from.type = "entrezgene",
>
>     sessionInfo()
>     R version 2.4.1 (2006-12-18)
>     i386-pc-mingw32
>     locale:
>     LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>     attached base packages:
>     [1] stats graphics grDevices datasets utils tools methods base
>     other attached packages:
>     biomaRt   RCurl     XML     DBI RWinEdt   limma    affy  affyio Biobase
>       1.8.1   0.8-0   1.2-0  0.1-12   1.7-5   2.9.1  1.12.2   1.2.0  1.12.2
>
> Mark W. Kimpel MD
> Official Business Address:
> Department of Psychiatry
> Indiana University School of Medicine
> PR M116
> Institute of Psychiatric Research
> 791 Union Drive
> Indianapolis, IN 46202
> Preferred Mailing Address:
> 15032 Hunter Court
> Westfield, IN 46074
> (317) 490-5129 Work, Mobile
> (317) 663-0513 Home (no voice mail please)
> 1-(317)-536-2730 FAX
>
> _______________________________________________
> Bioconductor mailing list
> [EMAIL PROTECTED]
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Steffen Durinck, Ph.D.
Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address: Advanced Technology Center, 8717 Grovemont Circle
Gaithersburg, MD 20877
[R] help with regexpr in gsub
I have a very long vector of character strings of the format "GO:0008104.ISS" and need to strip off the dot and anything that follows it. There are always 10 characters before the dot. The actual characters after the dot, and the number of them, are variable. So, I would like to return strings in the format "GO:0008104".

I could do this with substr and loop over the entire vector, but I thought there might be a more elegant (and faster) way to do this. I have tried gsub using regular expressions without success. The code

    gsub(pattern="\.*?", replacement="", x=character.vector)

correctly locates the positions in the vector that contain the dot, but replaces all of the strings with "". Obviously not what I want. Is there a regular expression for replacement that would accomplish what I want? Or does R have a better way to do this?

Thanks,
Mark

Mark W. Kimpel MD
(317) 490-5129 Work, Mobile
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX
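As a minimal sketch of the two approaches this question contrasts, the R code below strips the suffix once by position and once by pattern. The three-element vector `go_ids` is an invented example in the format described in the post, not the poster's data. Both substr() and sub() are vectorized, so no explicit loop is needed in either case.

```r
# Hypothetical example vector in the "GO:nnnnnnn.CODE" format from the post.
go_ids <- c("GO:0008104.ISS", "GO:0006091.NA", "GO:0006091.NAS")

# Fixed-width approach: keep the first 10 characters of every element.
by_position <- substr(go_ids, 1, 10)

# Pattern approach: delete the first literal dot and everything after it.
# "\\." in R source is the regex \. (a literal dot); ".*" is the rest.
by_pattern <- sub("\\..*$", "", go_ids)

# An equivalent pattern that avoids backslashes by using a character class.
by_class <- sub("[.].*", "", go_ids)

print(by_position)
```

The pattern versions would still work if the number of characters before the dot varied, whereas the substr() version relies on the fixed width of 10.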
Re: [R] help with regexpr in gsub
Thanks for 6 ways to skin this cat! I am just beginning to learn about the power of regular expressions and appreciate the many examples of how they can be used in this context. This knowledge will come in handy the next time the number of characters is variable both before and after the dot. On my machine and for my particular example, however, Seth is correct in that substr is by far the fastest. I had forgotten that substr is vectorized.

Below is the output of my speed trials and sessionInfo in case anyone is curious. I artificially made the go.ids vector 10X its normal length to magnify differences. I did also check to verify that each solution worked as predicted, which they all did.

Thanks again for your generous help,
Mark

    length(go.ids)
    [1] 79750
    go.ids[1:5]
    [1] GO:0006091.NA GO:0008104.ISS GO:0008104.ISS GO:0006091.NA GO:0006091.NAS

    system.time(z <- gsub("[.].*", "", go.ids))
    [1] 0.47 0.00 0.47 NA NA
    system.time(z <- gsub('\\..+$', '', go.ids))
    [1] 0.56 0.00 0.56 NA NA
    system.time(z <- gsub('([^.]+)\\..*', '\\1', go.ids))
    [1] 1.08 0.00 1.09 NA NA
    system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
    [1] 1.03 0.00 1.03 NA NA
    system.time(z <- sub("\\..+", "", go.ids))
    [1] 0.49 0.00 0.48 NA NA
    system.time(z <- substr(go.ids, 0, 10))
    [1] 0.02 0.00 0.01 NA NA

    sessionInfo()
    R version 2.4.1 (2006-12-18)
    i386-pc-mingw32
    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
    attached base packages:
    [1] splines stats graphics grDevices datasets utils tools methods base
    other attached packages:
    rat2302 xlsReadWritePro qvalue affycoretools biomaRt  RCurl    XML GOstats Category
     1.14.0           1.0.6  1.8.0         1.6.0   1.8.1  0.8-0  1.2-0   2.0.4    2.0.3
    genefilter survival   KEGG   RBGL annotate     GO  graph RWinEdt limma
        1.12.0     2.30 1.14.1 1.10.0   1.12.1 1.14.1 1.12.0   1.7-5 2.9.1
      affy affyio Biobase
    1.12.2  1.2.0  1.12.2

Mark W. Kimpel MD
(317) 490-5129 Work, Mobile
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX

-----Original Message-----
From: Marc Schwartz [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 17, 2007 8:11 PM
To: Seth Falcon
Cc: Kimpel, Mark William; r-help@stat.math.ethz.ch
Subject: Re: [R] help with regexpr in gsub

On Wed, 2007-01-17 at 16:46 -0800, Seth Falcon wrote:
> Kimpel, Mark William [EMAIL PROTECTED] writes:
> > I have a very long vector of character strings of the format "GO:0008104.ISS" and need to strip off the dot and anything that follows it. There are always 10 characters before the dot. The actual characters and the number of them after the dot is variable. So, I would like to return in the format "GO:0008104". I could do this with substr and loop over the entire vector, but I thought there might be a more elegant (and faster) way to do this. I have tried gsub using regular expressions without success. The code
> >
> >     gsub(pattern="\.*?", replacement="", x=character.vector)
>
> I guess you want:
>
>     sub("([GO:0-9]+)\\..*$", "\\1", goids)
>
> [You don't need gsub here]
>
> But I don't understand why you wouldn't want to use substr. At least for me substr looks to be about 20x faster than sub for this problem...
>
>     library(GO)
>     goids = ls(GOTERM)
>     gids = paste(goids, "ISS", sep=".")
>     gids[1:10]
>     [1] GO:001.ISS GO:002.ISS GO:003.ISS GO:004.ISS
>     [5] GO:006.ISS GO:007.ISS GO:009.ISS GO:010.ISS
>     [9] GO:011.ISS GO:012.ISS
>
>     system.time(z <- substr(gids, 0, 10))
>        user  system elapsed
>       0.008   0.000   0.007
>     system.time(z2 <- sub("([GO:0-9]+)\\..*$", "\\1", gids))
>        user  system elapsed
>       0.136   0.000   0.134

I think that some of the overhead here in using sub() is due to the effective partitioning of the source vector, a more complex regex and then just returning the first element.

This can be shortened to:

    # Note that I have 12 elements here
    gids
    [1] GO:001.ISS GO:002.ISS GO:003.ISS GO:004.ISS
    [5] GO:005.ISS GO:006.ISS GO:007.ISS GO:008.ISS
    [9] GO:009.ISS GO:010.ISS GO:011.ISS GO:012.ISS

    system.time(z2 <- sub("\\..+", "", gids))
    [1] 0 0 0 0 0

    z2
    [1] GO:001 GO:002 GO:003 GO:004 GO:005
    [6] GO:006 GO:007 GO:008 GO:009 GO:010
    [11] GO:011 GO:012

Which would appear to be quicker than using substr().

HTH,
Marc Schwartz
Re: [R] help with regexpr in gsub
Thanks Brian, that advice may help speed up my regexp operations in the future. The computer science advice offered by those of you who are more expert is appreciated by we biologists who are primarily working more at the level of bioinformatics. Mark Mark W. Kimpel MD (317) 490-5129 Work, Mobile (317) 663-0513 Home (no voice mail please) 1-(317)-536-2730 FAX -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 17, 2007 11:49 PM To: Kimpel, Mark William Cc: [EMAIL PROTECTED]; Seth Falcon; r-help@stat.math.ethz.ch Subject: Re: [R] help with regexpr in gsub One thing to watch with experiments like this is that the locale will matter. Character operations will be faster in a single-byte locale (as used here) than in a variable-byte locale (and I suspect Seth and Marc used UTF-8), and the relative speeds may alter. Also, the PCRE regexps are often much faster, and 'useBytes' can be much faster with ASCII data in UTF-8. For example: # R-devel, x86_64 Linux library(GO) goids - ls(GOTERM) gids - paste(goids, ISS, sep=.) 
go.ids - rep(gids, 10) length(go.ids) [1] 205950 # In en_GB (single byte) system.time(z - gsub([.].*, , go.ids)) user system elapsed 1.709 0.004 1.716 system.time(z - gsub([.].*, , go.ids, perl=TRUE)) user system elapsed 0.241 0.004 0.246 system.time(z - gsub('\\..+$','', go.ids)) user system elapsed 2.254 0.018 2.286 system.time(z - gsub('([^.]+)\\..*','\\1',go.ids)) user system elapsed 2.890 0.002 2.895 system.time(z - sub(([GO:0-9]+)\\..*$, \\1, go.ids)) user system elapsed 2.716 0.002 2.721 system.time(z - sub(\\..+, , go.ids)) user system elapsed 1.724 0.001 1.725 system.time(z - substr(go.ids, 0, 10)) user system elapsed 0.084 0.000 0.084 # in en_GB.utf8 system.time(z - gsub([.].*, , go.ids)) user system elapsed 1.689 0.020 1.712 system.time(z - gsub([.].*, , go.ids, perl=TRUE)) user system elapsed 0.718 0.017 0.736 system.time(z - gsub([.].*, , go.ids, perl=TRUE, useByte=TRUE)) user system elapsed 0.243 0.001 0.244 system.time(z - gsub('\\..+$','', go.ids)) user system elapsed 2.509 0.024 2.537 system.time(z - gsub('([^.]+)\\..*','\\1',go.ids)) user system elapsed 3.772 0.004 3.779 system.time(z - sub(([GO:0-9]+)\\..*$, \\1, go.ids)) user system elapsed 4.088 0.007 4.099 system.time(z - sub(\\..+, , go.ids)) user system elapsed 1.920 0.004 1.927 system.time(z - substr(go.ids, 0, 10)) user system elapsed 0.096 0.002 0.098 substr still wins, but by a much smaller margin. On Wed, 17 Jan 2007, Kimpel, Mark William wrote: Thanks for 6 ways to skin this cat! I am just beginning to learn about the power of regular expressions and appreciate the many examples of how they can be used in this context. This knowledge will come in handy the next time the number of characters is variable both before and after the dot. On my machine and for my particular example, however, Seth is correct in that substr is by far the fastest. I had forgotten that substr is vectorized. Below is the output of my speed trials and sessionInfo in case anyone is curious. 
I artificially made the go.id vector 10X its normal length to magnify differences. I did also check to verify that each solution worked as predicted, which they all did.

Thanks again for your generous help,
Mark

length(go.ids)
[1] 79750
go.ids[1:5]
[1] "GO:0006091.NA"  "GO:0008104.ISS" "GO:0008104.ISS" "GO:0006091.NA"  "GO:0006091.NAS"

system.time(z <- gsub("[.].*", "", go.ids))
[1] 0.47 0.00 0.47   NA   NA
system.time(z <- gsub('\\..+$','', go.ids))
[1] 0.56 0.00 0.56   NA   NA
system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
[1] 1.08 0.00 1.09   NA   NA
system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
[1] 1.03 0.00 1.03   NA   NA
system.time(z <- sub("\\..+", "", go.ids))
[1] 0.49 0.00 0.48   NA   NA
system.time(z <- substr(go.ids, 0, 10))
[1] 0.02 0.00 0.01   NA   NA

sessionInfo()
R version 2.4.1 (2006-12-18)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines stats graphics grDevices datasets utils tools methods base

other attached packages:
        rat2302 xlsReadWritePro          qvalue   affycoretools         biomaRt           RCurl             XML         GOstats        Category
       "1.14.0"         "1.0.6"         "1.8.0"         "1.6.0"         "1.8.1"         "0.8-0"         "1.2-0"         "2.0.4"         "2.0.3"
     genefilter        survival            KEGG            RBGL        annotate              GO           graph         RWinEdt           limma
       "1.12.0"          "2.30"        "1.14.1"        "1.10.0"        "1.12.1"        "1.14.1"        "1.12.0"         "1.7-5"         "2.9.1"
           affy          affyio
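The speed trials above only mean something if every variant returns the same result, which Mark says he checked. A minimal sketch of that check follows. It builds a synthetic `go.ids` vector (an assumption, so the GO package is not needed) and uses `substr(go.ids, 1, 10)`, which relies on the ten-character "GO:" prefix seen in the thread. The timings it prints will depend on machine and locale.

```r
# Synthetic stand-in for the real ids (assumption: "GO:" + 7 digits + "." + evidence code)
set.seed(1)
n <- 5000
codes <- c("NA", "ISS", "NAS")
go.ids <- sprintf("GO:%07d.%s", sample(1e6, n), sample(codes, n, replace = TRUE))

# Four of the approaches discussed in the thread
z_posix  <- gsub("[.].*", "", go.ids)               # default regex engine
z_pcre   <- gsub("[.].*", "", go.ids, perl = TRUE)  # PCRE engine
z_sub    <- sub("\\..+", "", go.ids)                # single substitution
z_substr <- substr(go.ids, 1, 10)                   # fixed-width, no regex at all

# All four should agree before their timings are compared
stopifnot(identical(z_posix, z_pcre),
          identical(z_posix, z_sub),
          identical(z_posix, z_substr))

# Timings depend on the locale reported by Sys.getlocale("LC_CTYPE")
print(system.time(gsub("[.].*", "", go.ids)))
print(system.time(gsub("[.].*", "", go.ids, perl = TRUE)))
print(system.time(substr(go.ids, 1, 10)))
```

The `substr` variant is only valid while the part before the dot has a fixed width. The regex variants are the ones that generalize to variable-length prefixes.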
[R] help with plot of prcomp object
I need to plot a prcomp object from package stats with custom symbols suitable for black-and-white publication. My boss specifically wants filled and unfilled squares, triangles, circles, inverted triangles, and diamonds to represent 5 brain regions in 2 types of rat. Can I specify these as a parameter?

Thanks,
Mark

Mark W. Kimpel MD
Department of Psychiatry
Indiana University School of Medicine
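One possible approach, sketched under assumptions: base graphics symbols 21 to 25 (circle, square, diamond, triangle, inverted triangle) accept a separate fill colour through `bg`, so the shape can encode the region and a black or white fill can encode the rat type. The data, the region names, and the rat-line labels below are hypothetical placeholders, not taken from the post.

```r
# Hypothetical design: 5 brain regions x 2 rat lines x 4 animals each
set.seed(42)
regions   <- paste0("region", 1:5)   # placeholder names
rat_lines <- c("lineA", "lineB")     # placeholder names
design <- expand.grid(rep = 1:4,
                      region = seq_along(regions),
                      line = seq_along(rat_lines))

# Hypothetical expression matrix: one row per sample, 20 variables
expr <- matrix(rnorm(nrow(design) * 20), nrow = nrow(design))
pc <- prcomp(expr, scale. = TRUE)

# pch 22 = square, 24 = triangle, 21 = circle, 25 = inverted triangle, 23 = diamond
region_pch <- c(22, 24, 21, 25, 23)
shape <- region_pch[design$region]
fill  <- c("black", "white")[design$line]   # filled vs unfilled

# Plot the first two principal components straight from the score matrix
plot(pc$x[, 1:2], pch = shape, col = "black", bg = fill,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = regions, pch = region_pch,
       pt.bg = "white", bty = "n")
legend("bottomright", legend = rat_lines, pch = 22,
       pt.bg = c("black", "white"), bty = "n")
```

Plotting `pc$x` directly, rather than calling `biplot()` on the object, is a design choice here: it leaves `pch` and `bg` fully under the caller's control.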
[R] problem with postscript output of R-devel on Windows
I have developed a problem with the postscript output of plot on Windows. My code still works properly with R 2.3 but, with R 2.4, the white text on red background does not show up. It does, however, show up when output is sent to the screen. Below is my code and sessionInfo.

R version 2.4.0 Under development (unstable) (2006-08-29 r39012)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines tools methods stats graphics grDevices utils datasets
[9] base

other attached packages:
  Rgraphviz geneplotter         XML     GOstats    Category    hgu95av2        KEGG    multtest      xtable
   "1.11.9"    "1.11.8"    "0.99-8"     "1.6.0"     "1.4.1"    "1.12.0"     "1.8.1"    "1.11.2"     "1.3-2"
       RBGL    annotate          GO       graph       Ruuid       limma  genefilter    survival     rat2302
    "1.8.1"    "1.11.5"     "1.6.5"   "1.11.13"    "1.11.2"     "2.7.9"    "1.11.8"      "2.28"    "1.12.0"
       affy      affyio     Biobase
   "1.11.6"     "1.1.8"   "1.11.29"

fileName <- paste(experiment, contrast, "FDR", FDR, "Graph", "ps", sep=".")
postscript(file=fileName, paper="special", width=width, height=height) #set up graphics device
plot(result.gN, layout.param, nodeAttrs = nAttrs, edgeAttrs = eAttrs,
     main=paste(paste("Experiment:", experiment, "; Contrast:", contrast, "; FDR:", FDR, sep=""),
                paste("Min. connections ==", min.edges, "Min. citations per connection ==", min.cites,
                      "Additional search criteria:", termAdditional, sep=" "),
                sep=""))

Thanks,
Mark

Mark W. Kimpel MD
Re: [R] problem with postscript output of R-devel on Windows
I apologize for my previous confusing example. Below is some sample code, taken directly from the image help file, that reproduces a postscript problem. This now happens with both R 2.3.1 and R 2.4.

What I get appears to be output of only certain postscript objects, to use an Adobe term. When I use the R GUI menu to "save as", jpeg and pdf files save correctly, but the postscript file does not. I am not getting any axis labels or topo labels. This is true whether I import the PS file into Photoshop or Illustrator.

Thanks,
Mark

x <- 10*(1:nrow(volcano))
y <- 10*(1:ncol(volcano))
image(x, y, volcano, col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano, levels = seq(90, 200, by = 5), add = TRUE, col = "peru")
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title(main = "Maunga Whau Volcano", font.main = 4)

sessionInfo()
Version 2.3.1 (2006-06-01)
i386-pc-mingw32

attached base packages:
[1] methods stats graphics grDevices utils datasets
[7] base

Mark W. Kimpel MD

-----Original Message-----
From: Duncan Murdoch [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 31, 2006 12:52 PM
To: Kimpel, Mark William
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] problem with postscript output of R-devel on Windows

On 8/31/2006 11:27 AM, Kimpel, Mark William wrote:

I have developed a problem with the postscript output of plot on Windows. My code still works properly with R 2.3 but, with R 2.4, the white text on red background does not show up. It does, however, show up when output is sent to the screen. Below is my code and sessionInfo.
R version 2.4.0 Under development (unstable) (2006-08-29 r39012)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines tools methods stats graphics grDevices utils datasets
[9] base

other attached packages:
  Rgraphviz geneplotter         XML     GOstats    Category    hgu95av2        KEGG    multtest      xtable
   "1.11.9"    "1.11.8"    "0.99-8"     "1.6.0"     "1.4.1"    "1.12.0"     "1.8.1"    "1.11.2"     "1.3-2"
       RBGL    annotate          GO       graph       Ruuid       limma  genefilter    survival     rat2302
    "1.8.1"    "1.11.5"     "1.6.5"   "1.11.13"    "1.11.2"     "2.7.9"    "1.11.8"      "2.28"    "1.12.0"
       affy      affyio     Biobase
   "1.11.6"     "1.1.8"   "1.11.29"

fileName <- paste(experiment, contrast, "FDR", FDR, "Graph", "ps", sep=".")
postscript(file=fileName, paper="special", width=width, height=height) #set up graphics device
plot(result.gN, layout.param, nodeAttrs = nAttrs, edgeAttrs = eAttrs,
     main=paste(paste("Experiment:", experiment, "; Contrast:", contrast, "; FDR:", FDR, sep=""),
                paste("Min. connections ==", min.edges, "Min. citations per connection ==", min.cites,
                      "Additional search criteria:", termAdditional, sep=" "),
                sep=""))

Could you put together a reproducible example to illustrate the problem? We don't have all the variables used in that example. I think you should be able to do it with just base packages attached; if not, it's likely a problem with one of the contributed packages, rather than with R.

Duncan Murdoch
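A self-contained version of the kind of example Duncan asks for might look like the sketch below. It uses only base packages and the volcano code from the image help file, but it writes through an explicit `postscript()` device closed with `dev.off()` instead of the GUI "save as" menu. That switch is an assumption made so the example can be scripted, and it is not claimed to reproduce or fix the reported problem. The output path and the 7 x 7 inch size are hypothetical.

```r
# Hypothetical output location; any writable path would do
ps_file <- file.path(tempdir(), "volcano.ps")

# Open the file-based PostScript device (base package grDevices)
postscript(file = ps_file, paper = "special", width = 7, height = 7,
           horizontal = FALSE)

x <- 10 * (1:nrow(volcano))
y <- 10 * (1:ncol(volcano))
image(x, y, volcano, col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano, levels = seq(90, 200, by = 5),
        add = TRUE, col = "peru")
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title(main = "Maunga Whau Volcano", font.main = 4)

# The file is only complete once the device has been closed
dev.off()

# Quick sanity checks on what was written
ps_header <- readLines(ps_file, n = 1)
ps_size <- file.info(ps_file)$size
print(ps_header)
print(ps_size)
```

If the resulting file shows the axis labels and title in a PostScript viewer, the difference would point toward the GUI save path or the importing application rather than the device itself.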
[R] R on a supercomputer
I am using R with Bioconductor to perform analyses on large datasets using bootstrap methods. In an attempt to speed up my work, I have inquired about using our local supercomputer and asked the administrator if he thought R would run faster on our parallel network. I received the following reply:

The second benefit is that the processors have large caches. Briefly, everything is loaded into cache before going into the processor. With large caches, there is less movement of data between memory and cache, and this can save quite a bit of time. Indeed, when programmers optimize code they usually think about how to do things to keep data in cache as long as possible. Whether you would receive any benefit from larger cache depends on how R is written. If it's written such that data remain in cache, the speed-up could be considerable, but I have no way to predict it.

My question is, is R written such that data remain in cache?

Thanks,
Mark W. Kimpel MD
Indiana University School of Medicine
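The administrator says the benefit cannot be predicted, so one empirical option is to time a small, representative bootstrap on each machine and compare. The sketch below is only an illustration under assumptions: the data, the statistic (a mean), and the number of resamples are hypothetical stand-ins for the real Bioconductor analysis, and the result says nothing about cache behaviour as such, only about overall elapsed time.

```r
# Hypothetical workload: bootstrap the mean of a numeric vector
set.seed(123)
x <- rnorm(1e4)   # stand-in for the real data
B <- 200          # stand-in for the real number of resamples

timing <- system.time(
  boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
)

# Compare this figure between the desktop and the supercomputer
elapsed <- timing[["elapsed"]]
print(elapsed)

# Basic bootstrap summary, to confirm both machines compute the same thing
boot_se <- sd(boot_means)
print(boot_se)
```

Running the same script with the same seed on both systems keeps the comparison fair, since the resamples drawn are then identical.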