[R] possible spam alert

2007-01-31 Thread Kimpel, Mark William
The last two times I have originated message threads on R or
Bioconductor I have received the message included below from someone
named Patrick Connolly. Both times I was the originator of the message
thread and used what I thought was a unique subject line that explained
as best I could what my question was. Patrick seems to be implying that
I am abusing the R and BioC help newsgroups in this fashion. 

When I emailed him to give me a specific example, he did not reply. The
most recent thread that he seems concerned about was to the R list and
was entitled "regexpr and parsing question". I believe the previous
post of mine that he had problems with was to the BioC list but I can't
remember its subject.

Is this spam?

If I am doing this correctly, you should see the subject "possible spam
alert" in the subject header of THIS message.

Would the moderators of the lists please check and see if I am doing
something wrong and, if not, inform Mr. Connolly that I am not. If others
have received this message in error, it is possible it is spam and users
should be alerted.

Thanks,

Mark

Mark W. Kimpel MD

Official Business Address:
Department of Psychiatry
Indiana University School of Medicine
PR M116
Institute of Psychiatric Research
791 Union Drive
Indianapolis, IN 46202

This is a request to anyone who starts a new subject to begin with a new
message and NOT reply to an existing one.  If your mail client is any
good, it's very simple to set up an alias (mine is simply 'r') so that
the tedious task of typing 'r-help@stat.math.ethz.ch' is unnecessary and
it's quicker than scrolling through an address book.
It's also quicker than deleting the previous subject.

Most mornings, I have over a screenful of messages mostly from R-help
and it's very useful to have them threaded.  However, the usefulness of
threading is lost when posters reply to a message and then change the
subject instead of creating a new message.

People who don't have a mail client that can display email in threads
are probably unaware that this sort of thing can happen in ones that do:


   37 N   25 Jan Luis Silva              ( 34) [R] plot/screen
   38 N   25 Jan Uwe Ligges              ( 55) `-
   39 N   25 Jan Fernando Henrique Ferra ( 20) [R] Plotting coloured histograms
-  40 N   26 Jan Mohamed A. Kerasha      ( 12) |-[R] Distributions.
   41 N   26 Jan [EMAIL PROTECTED]       ( 26) | |-
   42     26 Jan Qin Xin                 (  9) | `-[R] how could I add legends
   43     27 Jan Ko-Kang Kevin Wang      ( 31) |   `-
   44 N   26 Jan Remigijus Lapinskas     ( 32) |-Re: [R] Plotting coloured his
   45 N   26 Jan Damon Wischik           (125) `-
   46 N   25 Jan [EMAIL PROTECTED]       ( 10) [R] plotting primatives, ellipse
   47 N   25 Jan Uwe Ligges              ( 19) `-


As Martin Maechler explained some time ago, it also screws up the
archives for a similar reason.

Your cooperation will be greatly appreciated.

best

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

   ___Patrick Connolly   
 {~._.~} Great minds discuss ideas
 _( Y )_Middle minds discuss events 
(:_~*~_:)Small minds discuss people  
 (_)-(_)   . Anon
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] possible spam alert

2007-01-31 Thread Kimpel, Mark William
Peter,

Thank you for your explanation. I had taken Mr. Connolly's message to
me to imply that I was not changing the subject line. I use MS Outlook
2007 and, unless I am just not seeing it, Outlook does not normally
display the "In-Reply-To" header, so I was under the mistaken impression
that that was what the Subject line was for. See, for example, the
header to your message to me below. Outlook will, however, sort messages
by Subject, and that is what I thought was meant by threading.

Well, I learned something today and apologize for any inconvenience my
posts may have caused.

BTW, I use Outlook because it is supported by my university server and
will synch my appointments and contacts with my PDA, which runs Windows
CE. If anyone has a suggestion for me of a better email program that
will provide proper threading AND work with a MS email server and synch
with Windows CE, I'd love to hear it.

Thanks again,

Mark

Mark W. Kimpel MD 

 

(317) 490-5129 Work,  Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX


-----Original Message-----
From: Peter Dalgaard [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 31, 2007 6:25 PM
To: Kimpel, Mark William
Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Subject: Re: [R] possible spam alert

Kimpel, Mark William wrote:
> The last two times I have originated message threads on R or
> Bioconductor I have received the message included below from someone
> named Patrick Connolly. Both times I was the originator of the message
> thread and used what I thought was a unique subject line that explained
> as best I could what my question was. Patrick seems to be implying that
> I am abusing the R and BioC help newsgroups in this fashion.
>
> When I emailed him to give me a specific example, he did not reply. The
> most recent thread that he seems concerned about was to the R list and
> was entitled "regexpr and parsing question". I believe the previous
> post of mine that he had problems with was to the BioC list but I can't
> remember its subject.
>
> Is this spam?

No. Breach of netiquette, yes.

The message in question starts a new thread, yet contains an 
In-Reply-To: header line, which presumably means that you started 
writing the message as a reply to something completely unrelated, 
specifically: Re: [R] change plotting symbol for groups in trellis 
graph. You should not do that, unless you know how to remove the 
In-Reply-To line (and this is not obvious in many mail clients); 
changing the subject is not sufficient.
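
The point about In-Reply-To can be illustrated with a minimal sketch. It assumes a threading mail reader groups messages by following each message's In-Reply-To link to its parent and ignores the subject text; the message ids, the column names and the thread_root() helper are hypothetical, and real mail clients use more elaborate rules.

```r
# Hypothetical message table: ids and subjects are illustrative only.
msgs <- data.frame(
  id          = c("m1", "m2", "m3"),
  in_reply_to = c(NA, "m1", "m2"),
  subject     = c("[R] change plotting symbol for groups in trellis graph",
                  "Re: [R] change plotting symbol for groups in trellis graph",
                  "[R] regexpr and parsing question"),
  stringsAsFactors = FALSE
)

# Follow In-Reply-To links upward until a message with no parent is reached.
thread_root <- function(id, msgs) {
  parent <- msgs$in_reply_to[match(id, msgs$id)]
  while (!is.na(parent)) {
    id <- parent
    parent <- msgs$in_reply_to[match(id, msgs$id)]
  }
  id
}

# The subject never enters the calculation, so m3 lands in m1's thread
# even though its subject line is completely different.
msgs$root <- vapply(msgs$id, thread_root, character(1), msgs = msgs)
msgs[, c("subject", "root")]
```

Under these assumptions, a message composed as a reply keeps its parent link after the subject is edited, which is the behaviour described above.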

[R] Outlook does threading

2007-01-31 Thread Kimpel, Mark William
See below for Bert Gunter's off-list reply to me (which I do
appreciate). I'm putting it back on the list because it seems there is
still confusion regarding the difference between threading and sorting
by subject. I thought the example below would be instructive for other
Outlook users who may be as confused as I was (am?).

Per Bert's instructions, I just set up my inbox to sort by subject. I
sent one email to myself with the subject "test1" and then replied to it
without changing the subject. The reply correctly went to "test1" in the
inbox sorter. I then changed the subject heading in the "test1" reply to
"test2" and sent it to myself. This time Outlook re-categorized it and
put it in a separate compartment in the view called "test2".

If Outlook can do threading the way the R mail server does, I don't
think this is the way to do it.

Unless someone has an idea of how to correctly set up Outlook to do
threading in the manner that the R mail server does, I think the message
for us Outlook users is to just create, from scratch, a new message when
initiating a new subject.

Thanks for all your help. 

Mark

-----Original Message-----
From: Bert Gunter [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 31, 2007 7:03 PM
To: Kimpel, Mark William
Subject: Outlook does threading

Mark:

No need to bother the R list with this. Outlook does threading. Just
sort on Subject in the viewer.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
650-467-7374


Re: [R] Outlook does threading

2007-01-31 Thread Kimpel, Mark William
Tony,

I went to the MS link that you suggested (see below) and it indeed says
that "The Arrange by Conversation arrangement shows your e-mail items
grouped by message subject or 'thread.'" Instead of arranging by
subject, I arranged my view by conversation and got exactly the same
result that I had gotten when viewing by subject, i.e. MS Outlook looks
only at the subject line when deciding on threads, conversations,
subjects, or whatever you want to call it. I am, BTW, using Outlook 2007
on Windows XP SP2 and cannot vouch for Outlook's behavior in other
versions or configurations.

So, no matter what I do, it seems impossible for me to duplicate in
Outlook what Gabor pointed out to me when he said,

> You can see how it looks to most readers by viewing it on gmane:
>
>   http://thread.gmane.org/gmane.comp.lang.r.general/78065

Note that even though the subject has been changed, it's still listed as a
child of another message rather than the start of a new thread. I did
check and of course Gabor is correct.

This subject does need to be put to bed. I have reread the posting guide
for R-help at http://www.r-project.org/
and it does indeed say "Do please create a new email message when
posting to the list rather than replying to a previous message and
simply changing the subject line! This allows sensible threading in the
mailing list archives (and many users' e-mail readers)."

To be honest, I probably read this 3 years ago when I subscribed to the
list but, because my email reader doesn't behave this way, I just forgot
about it. I email so many people during the day that I frequently hit
reply to a previous message and then change the subject if appropriate.

So, not to justify my behavior, but would it be possible for the R mail
server to somehow check and see if the subject heading on a thread has
been changed and then return-to-sender with a standard message
explaining everything we have been through tonight? If Patrick Connolly
sees this enough to have a standard message he sends out and Martin
Maechler commented on it in the past, perhaps other Windows users of
Outlook are doing the same thing I did. Rest assured that I have learned
my lesson and won't repeat the same mistake, but if such a filter was
put in place at the R mail server level, perhaps it would save the
non-Outlook users a lot of aggravation.
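
As a rough illustration of the kind of check suggested here, the sketch below flags messages that carry an In-Reply-To link but whose subject no longer matches the parent's subject once reply prefixes are stripped. It is not a description of anything the R mail server does; the message table, the column names and both helper functions are hypothetical.

```r
# Hypothetical message table: ids and subjects are illustrative only.
msgs <- data.frame(
  id          = c("m1", "m2", "m3"),
  in_reply_to = c(NA, "m1", "m2"),
  subject     = c("[R] change plotting symbol",
                  "Re: [R] change plotting symbol",
                  "[R] regexpr and parsing question"),
  stringsAsFactors = FALSE
)

# Drop leading "Re:" / "Fwd:" prefixes and normalise case and whitespace.
normalize_subject <- function(s) {
  s <- gsub("^\\s*((re|fwd?):\\s*)+", "", s, ignore.case = TRUE)
  tolower(trimws(s))
}

# TRUE where a message has a parent but a different normalised subject.
flag_subject_change <- function(msgs) {
  parent_subject <- msgs$subject[match(msgs$in_reply_to, msgs$id)]
  !is.na(parent_subject) &
    normalize_subject(msgs$subject) != normalize_subject(parent_subject)
}

msgs$flagged <- flag_subject_change(msgs)
msgs[, c("subject", "flagged")]
```

A real filter would also have to allow for deliberate subject changes within a thread, so a flag like this could only ever prompt a reminder.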

These exchanges have been edifying and I thank all for their patience
and explanations.

Mark

Mark W. Kimpel MD 

 

(317) 490-5129 Work,  Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX


-----Original Message-----
From: Tony Plate [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 31, 2007 8:08 PM
To: Kimpel, Mark William
Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] Outlook does threading

Your final paragraph has the take-home message for everyone (not just MS
Outlook users): just create, from scratch, a new message when
initiating a new subject.

Viewing threads can be completely different to sorting based on the
subject line.  Your initial post with the subject "regexpr and parsing
question" was in fact a reply to the message from Gabor Grothendieck in
the thread "Re: [R] change plotting symbol for groups in trellis graph".

   (I can see this by looking at the header information: I see an
"In-reply-to:" header item.)

When I view threads in the Thunderbird mail reader, your post and
replies with the subject "regexpr and parsing question" do in fact show
up under the thread in which Gabor's message appeared, not in their own
thread.

According to
http://office.microsoft.com/en-us/outlook/HA011356671033.aspx, one can
view threads in Outlook by selecting View->Arrange By->Conversation.

Hope this helps (in case the horse was not thoroughly dead already.)

-- Tony Plate


[R] regexpr and parsing question

2007-01-30 Thread Kimpel, Mark William
The main problem I am trying to solve is this:

I am importing a tab-delimited file whose first line contains only one
column, which is a descriptor of the form "col_1 col_2 col_3", i.e. the
colnames are not tab delimited but are separated by whitespace. I would
like to parse this first line so that it becomes the colnames
of the rest of the file, which I am reading into R using read.delim().
The file is so huge that I must do this in R.
The file is so huge that I must do this in R.

My first question is this: What is the best way to accomplish what I
want to do?
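
One possible approach, sketched under the assumption that the file looks exactly as described (one whitespace-separated descriptor line followed by tab-delimited rows): read the first line on its own, split it on whitespace, and pass the result to read.delim() as col.names while skipping that line. The temporary file and its contents are made up for illustration.

```r
# Build a small stand-in file: the first line holds space-separated names,
# the remaining lines are tab-delimited data.
path <- tempfile(fileext = ".txt")
writeLines(c("col_1 col_2 col_3",
             "1\t2\t3",
             "4\t5\t6"), path)

# Read only the first line and split it on runs of whitespace.
header <- readLines(path, n = 1)
col_names <- strsplit(trimws(header), "[[:space:]]+")[[1]]

# Skip the descriptor line and supply the parsed names.
dat <- read.delim(path, header = FALSE, skip = 1, col.names = col_names)
str(dat)
```

Because only one line is read before the main import, the size of the file should not matter for the header step.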

My other questions revolve around some failed attempts on my part to
solve the problem on my own using regular expressions. I thought that
perhaps I could change the first line to c("col_1", "col_2", "col_3")
using gsub. I was having trouble figuring out how R uses the backslash
character because I know that sometimes the backslash one would use in
Perl needs to be a double backslash in R.

Here is a sample of what I tried and what I got:

> a <- "col_1 col_2 col_3"
> gsub("\\s", " ", a)
[1] "col_1 col_2 col_3"
> gsub("\\s", "\\s", a)
[1] "col_1scol_2scol_3"

As you can see, it looks like R is taking a regular expression for
pattern, but not taking it for replacement. Why is this?

Assuming that I did want to solve my original problem with gsub and then
turn the string into an R object, how would I get gsub to return
c("col_1", "col_2", "col_3") using my original string?
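
A sketch of one way to do this, assuming the string is exactly as shown. In gsub() only the pattern is a regular expression; the replacement is literal text in which backreferences such as \\1 are expanded, which is presumably why "\\s" in the replacement produced a plain "s". The variable names are illustrative.

```r
a <- "col_1 col_2 col_3"

# Wrap each run of non-whitespace characters in double quotes.
# "\\1" in the replacement refers to the parenthesised group in the pattern.
quoted <- gsub("(\\S+)", "\"\\1\"", a)

# Turn the separating whitespace into ", " and add the c( ... ) wrapper.
as_call <- paste0("c(", gsub("\\s+", ", ", quoted), ")")
as_call

# The string can be evaluated, but splitting gives the same character
# vector directly, without building and parsing code.
from_parse <- eval(parse(text = as_call))
from_split <- strsplit(a, "\\s+")[[1]]
identical(from_parse, from_split)
```

Splitting with strsplit() is usually the simpler route when the goal is a character vector rather than a piece of R code.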

Finally, is there a way to declare a string as a regular expression so
that R sees it the same way other languages, such as Perl, do, i.e. make
the backslash be interpreted the same way? For someone who is just
learning regular expressions as I am, it is very frustrating to read
about them in references and then have to translate what I've learned
into R syntax. I was thinking that instead of enclosing the string in
"", one could use THIS.IS.A.REGULAR.EXPRESSION(), similar to the way we
use I() in formulae.
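
For readers on a much newer R than the one used in this thread: R 4.0.0 and later accept raw character constants of the form r"(...)", inside which a backslash is taken literally. The sketch below assumes such a version is available; it does not work on the R 2.x releases discussed here.

```r
a <- "col_1 col_2 col_3"

# Inside r"( ... )" the backslash is not an escape character, so the
# pattern can be typed as it appears in a regular-expression reference.
raw_pattern <- r"(\s+)"
escaped_pattern <- "\\s+"

# Both spellings produce the same string and therefore the same result.
identical(raw_pattern, escaped_pattern)
gsub(raw_pattern, ", ", a)
```

The doubled backslash in ordinary strings comes from R's string parser rather than from the regular-expression engine, which is why a different string syntax removes it.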

These are a bunch of questions, but obviously I have a lot to learn!

Thanks,

Mark

Mark W. Kimpel MD 

 

(317) 490-5129 Work,  Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX



Re: [R] [BioC] problem with biomaRt getHomolog function

2007-01-26 Thread Kimpel, Mark William
Steffen,

When the new biomaRt tries to load, it fails with an error because I do
not have RMySQL installed. There is no Windows binary for RMySQL, and it
contains C code that I do not know how to build.

I do not use the MySQL option in biomaRt. Does RMySQL need to be a
required dependency? Below is my screen output and sessionInfo.

> require(biomaRt)
Loading required package: biomaRt
Loading required package: RMySQL
Error: package 'RMySQL' could not be loaded
In addition: Warning message:
there is no package called 'RMySQL' in: library(pkg, character.only = TRUE, logical = TRUE, lib.loc = lib.loc)
> sessionInfo()
R version 2.5.0 Under development (unstable) (2007-01-19 r40528)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices datasets  utils     tools     methods   base

other attached packages:
     DBI    limma     affy   affyio  Biobase
  0.1-12    2.9.8  1.13.14    1.3.3  1.13.34

Mark W. Kimpel MD 

 

(317) 490-5129 Work,  Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX


-----Original Message-----
From: Steffen Durinck [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 26, 2007 9:24 AM
To: Kimpel, Mark William
Cc: [EMAIL PROTECTED]
Subject: Re: [BioC] problem with biomaRt getHomolog function

Hi Mark,

I think the rat entrezgene id 613226 is a recently added entrezgene id
and is not yet available in Ensembl.  Ensembl updates every two months
and the last update of entrezgene id 613226 appears to be December 26 of
2006.  So this might be the reason.
Also I would suggest you use the developmental version of biomaRt
(biomaRt_1.9.15) to do getHomolog queries.  A recent change in the
BioMart suite enables the biomaRt package to retrieve both the id you
use to query and the ids in the result of the query.

Here's an example:

rat.entrezgene.ID = c(24842, 83502, 24205)
mouse.mart <- useMart("ensembl", "mmusculus_gene_ensembl")
rat.mart <- useMart("ensembl", "rnorvegicus_gene_ensembl")
mouse.homolog <- getHomolog(id = rat.entrezgene.ID, from.mart = rat.mart,
                            from.type = "entrezgene", to.type = "entrezgene",
                            to.mart = mouse.mart)

> mouse.homolog
     V1    V2
1 24842 22059
2 24205 11789
3 24205    NA
4 83502 12550

best,
Steffen

Kimpel, Mark William wrote:
> I am trying to use the getHomolog function of package biomaRt to map
> rat entrezgene IDs to mouse entrezgene IDs. For every ID I try, I get
> NULL as return, even when I know that a mouse mapping exists.
>
> For example, ratID 613226 corresponds to mouse 229706.
>
> See my code and sessionInfo below. Anyone know what I am doing wrong?
>
> Thanks, Mark
>
> > require(DBI)
> [1] TRUE
> > require(biomaRt)
> [1] TRUE
> > mouse.mart <- useMart("ensembl", "mmusculus_gene_ensembl")
> Checking attributes and filters ... ok
> > rat.mart <- useMart("ensembl", "rnorvegicus_gene_ensembl")
> Checking attributes and filters ... ok
> > rat.entrezgene.ID <- 613226
> > mouse.homolog <- getHomolog(id = rat.entrezgene.ID, from.mart = rat.mart,
> + from.type = "entrezgene",
> + to.type = "entrezgene", to.mart = mouse.mart)
> Warning message:
> getBM returns NULL. in: getHomolog(id = rat.entrezgene.ID, from.mart = rat.mart, from.type = "entrezgene",
> > sessionInfo()
> R version 2.4.1 (2006-12-18)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     tools     methods   base
>
> other attached packages:
>  biomaRt    RCurl      XML      DBI  RWinEdt    limma     affy   affyio  Biobase
>    1.8.1    0.8-0    1.2-0   0.1-12    1.7-5    2.9.1   1.12.2    1.2.0   1.12.2
>
> Mark W. Kimpel MD
>
> Official Business Address:
> Department of Psychiatry
> Indiana University School of Medicine
> PR M116
> Institute of Psychiatric Research
> 791 Union Drive
> Indianapolis, IN 46202
>
> Preferred Mailing Address:
> 15032 Hunter Court
> Westfield, IN  46074
>
> (317) 490-5129 Work,  Mobile
> (317) 663-0513 Home (no voice mail please)
> 1-(317)-536-2730 FAX
>
>    [[alternative HTML version deleted]]
>
> ___
> Bioconductor mailing list
> [EMAIL PROTECTED]
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Steffen Durinck, Ph.D.

Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/

Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877


[R] help with regexpr in gsub

2007-01-17 Thread Kimpel, Mark William
I have a very long vector of character strings of the format
"GO:0008104.ISS" and need to strip off the dot and anything that follows
it. There are always 10 characters before the dot. The actual characters
and the number of them after the dot is variable.

So, I would like to return in the format "GO:0008104". I could do this
with substr and loop over the entire vector, but I thought there might
be a more elegant (and faster) way to do this.

I have tried gsub using regular expressions without success. The code

gsub(pattern= "\.*?" , replacement="", x=character.vector)

correctly locates the positions in the vector that contain the dot, but
replaces all of the strings with "". Obviously not what I want. Is there
a regular expression for replacement that would accomplish what I want?

Or, does R have a better way to do this?
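
A minimal sketch of two ways to get the desired result, assuming every element has the "GO:" prefix, seven digits, a dot and a suffix as described. An unescaped "." in a pattern matches any character, so the dot has to be written as "[.]" or "\\." to be taken literally; the sample vector is made up.

```r
x <- c("GO:0008104.ISS", "GO:0006091.NA", "GO:0006091.NAS")

# Remove the first literal dot and everything after it.
# "[.]" matches a dot itself; a bare "." would match any character.
by_regex <- sub("[.].*$", "", x)

# substr() is vectorized, so a fixed-width prefix needs no loop.
by_position <- substr(x, 1, 10)

by_regex
identical(by_regex, by_position)
```

The regular-expression version also works when the number of characters before the dot varies, while substr() relies on the prefix always being 10 characters long.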

Thanks,

Mark

Mark W. Kimpel MD 

 

(317) 490-5129 Work,  Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX



Re: [R] help with regexpr in gsub

2007-01-17 Thread Kimpel, Mark William
Thanks for 6 ways to skin this cat! I am just beginning to learn about
the power of regular expressions and appreciate the many examples of how
they can be used in this context. This knowledge will come in handy the
next time the number of characters is variable both before and after the
dot. On my machine and for my particular example, however, Seth is
correct in that substr is by far the fastest. I had forgotten that
substr is vectorized.
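
A comparison of this kind can be sketched as follows. The vector below is a synthetic stand-in for go.ids, not the data used in this thread, and timings will differ by machine, R version and locale, so no numbers are shown.

```r
# Synthetic stand-in for the go.ids vector (values are illustrative).
go.ids <- rep(c("GO:0006091.NA", "GO:0008104.ISS", "GO:0006091.NAS"), 25000)

# Time one regex-based and one position-based solution on the same input.
t_regex  <- system.time(z1 <- sub("[.].*", "", go.ids))
t_substr <- system.time(z2 <- substr(go.ids, 1, 10))

# Check that both approaches agree before comparing their timings.
stopifnot(identical(z1, z2))

# Elapsed seconds for each approach.
c(regex = t_regex[["elapsed"]], substr = t_substr[["elapsed"]])
```

Checking that the results are identical first guards against comparing the speed of two computations that do different things.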

Below is the output of my speed trials and sessionInfo in case anyone is
curious. I artificially made the go.id vector 10X its normal length to
magnify differences. I did also check to verify that each solution
worked as predicted, which they all did.

Thanks again for your generous help, Mark

> length(go.ids)
[1] 79750
> go.ids[1:5]
[1] "GO:0006091.NA"  "GO:0008104.ISS" "GO:0008104.ISS" "GO:0006091.NA"  "GO:0006091.NAS"
> system.time(z <- gsub("[.].*", "", go.ids))
[1] 0.47 0.00 0.47   NA   NA
> system.time(z <- gsub('\\..+$','', go.ids))
[1] 0.56 0.00 0.56   NA   NA
> system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
[1] 1.08 0.00 1.09   NA   NA
> system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
[1] 1.03 0.00 1.03   NA   NA
> system.time(z <- sub("\\..+", "", go.ids))
[1] 0.49 0.00 0.48   NA   NA
> system.time(z <- substr(go.ids, 0, 10))
[1] 0.02 0.00 0.01   NA   NA
> sessionInfo()
R version 2.4.1 (2006-12-18)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats     graphics  grDevices datasets  utils     tools     methods   base

other attached packages:
        rat2302 xlsReadWritePro          qvalue   affycoretools         biomaRt
         1.14.0           1.0.6           1.8.0           1.6.0           1.8.1
          RCurl             XML         GOstats        Category      genefilter
          0.8-0           1.2-0           2.0.4           2.0.3          1.12.0
       survival            KEGG            RBGL        annotate              GO
           2.30          1.14.1          1.10.0          1.12.1          1.14.1
          graph         RWinEdt           limma            affy          affyio
         1.12.0           1.7-5           2.9.1          1.12.2           1.2.0
        Biobase
         1.12.2

Mark W. Kimpel MD 

 

(317) 490-5129 Work,  Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX


-----Original Message-----
From: Marc Schwartz [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 17, 2007 8:11 PM
To: Seth Falcon
Cc: Kimpel, Mark William; r-help@stat.math.ethz.ch
Subject: Re: [R] help with regexpr in gsub

On Wed, 2007-01-17 at 16:46 -0800, Seth Falcon wrote:
> Kimpel, Mark William [EMAIL PROTECTED] writes:
>
> > I have a very long vector of character strings of the format
> > "GO:0008104.ISS" and need to strip off the dot and anything that follows
> > it. There are always 10 characters before the dot. The actual characters
> > and the number of them after the dot is variable.
> >
> > So, I would like to return in the format "GO:0008104". I could do this
> > with substr and loop over the entire vector, but I thought there might
> > be a more elegant (and faster) way to do this.
> >
> > I have tried gsub using regular expressions without success. The code
> >
> > gsub(pattern= "\.*?" , replacement="", x=character.vector)
>
> I guess you want:
>
>     sub("([GO:0-9]+)\\..*$", "\\1", goids)
>
> [You don't need gsub here]
>
> But I don't understand why you wouldn't want to use substr.  At least
> for me substr looks to be about 20x faster than sub for this
> problem...
>
>     > library(GO)
>     > goids = ls(GOTERM)
>     > gids = paste(goids, "ISS", sep=".")
>     > gids[1:10]
>     [1] "GO:001.ISS" "GO:002.ISS" "GO:003.ISS" "GO:004.ISS"
>     [5] "GO:006.ISS" "GO:007.ISS" "GO:009.ISS" "GO:010.ISS"
>     [9] "GO:011.ISS" "GO:012.ISS"
>
>     > system.time(z <- substr(gids, 0, 10))
>        user  system elapsed
>       0.008   0.000   0.007
>     > system.time(z2 <- sub("([GO:0-9]+)\\..*$", "\\1", gids))
>        user  system elapsed
>       0.136   0.000   0.134

I think that some of the overhead here in using sub() is due to the
effective partitioning of the source vector, a more complex regex and
then just returning the first element.

This can be shortened to:

# Note that I have 12 elements here
> gids
 [1] "GO:001.ISS" "GO:002.ISS" "GO:003.ISS" "GO:004.ISS"
 [5] "GO:005.ISS" "GO:006.ISS" "GO:007.ISS" "GO:008.ISS"
 [9] "GO:009.ISS" "GO:010.ISS" "GO:011.ISS" "GO:012.ISS"

> system.time(z2 <- sub("\\..+", "", gids))
[1] 0 0 0 0 0

> z2
 [1] "GO:001" "GO:002" "GO:003" "GO:004" "GO:005"
 [6] "GO:006" "GO:007" "GO:008" "GO:009" "GO:010"
[11] "GO:011" "GO:012"


Which would appear to be quicker than using substr().

HTH,

Marc Schwartz


Re: [R] help with regexpr in gsub

2007-01-17 Thread Kimpel, Mark William
Thanks, Brian. That advice may help speed up my regexp operations in the
future. The computer science advice offered by those of you who are more
expert is appreciated by us biologists, who are primarily working at
the level of bioinformatics. Mark

Mark W. Kimpel MD 

 

(317) 490-5129 Work,  Mobile

 

(317) 663-0513 Home (no voice mail please)

1-(317)-536-2730 FAX


-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 17, 2007 11:49 PM
To: Kimpel, Mark William
Cc: [EMAIL PROTECTED]; Seth Falcon; r-help@stat.math.ethz.ch
Subject: Re: [R] help with regexpr in gsub

One thing to watch with experiments like this is that the locale will
matter.  Character operations will be faster in a single-byte locale (as
used here) than in a variable-byte locale (and I suspect Seth and Marc
used UTF-8), and the relative speeds may alter.  Also, the PCRE regexps
are often much faster, and 'useBytes' can be much faster with ASCII data
in UTF-8.

For example:

# R-devel, x86_64 Linux
library(GO)
goids <- ls(GOTERM)
gids <- paste(goids, "ISS", sep=".")
go.ids <- rep(gids, 10)
> length(go.ids)
[1] 205950

# In en_GB (single byte)

> system.time(z <- gsub("[.].*", "", go.ids))
   user  system elapsed
  1.709   0.004   1.716
> system.time(z <- gsub("[.].*", "", go.ids, perl=TRUE))
   user  system elapsed
  0.241   0.004   0.246

> system.time(z <- gsub('\\..+$','', go.ids))
   user  system elapsed
  2.254   0.018   2.286
> system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
   user  system elapsed
  2.890   0.002   2.895
> system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
   user  system elapsed
  2.716   0.002   2.721
> system.time(z <- sub("\\..+", "", go.ids))
   user  system elapsed
  1.724   0.001   1.725
> system.time(z <- substr(go.ids, 0, 10))
   user  system elapsed
  0.084   0.000   0.084

# in en_GB.utf8

> system.time(z <- gsub("[.].*", "", go.ids))
   user  system elapsed
  1.689   0.020   1.712
> system.time(z <- gsub("[.].*", "", go.ids, perl=TRUE))
   user  system elapsed
  0.718   0.017   0.736
> system.time(z <- gsub("[.].*", "", go.ids, perl=TRUE, useByte=TRUE))
   user  system elapsed
  0.243   0.001   0.244

> system.time(z <- gsub('\\..+$','', go.ids))
   user  system elapsed
  2.509   0.024   2.537
> system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
   user  system elapsed
  3.772   0.004   3.779
> system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
   user  system elapsed
  4.088   0.007   4.099
> system.time(z <- sub("\\..+", "", go.ids))
   user  system elapsed
  1.920   0.004   1.927
> system.time(z <- substr(go.ids, 0, 10))
   user  system elapsed
  0.096   0.002   0.098

substr still wins, but by a much smaller margin.
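
For anyone who wants to repeat this comparison without the GO package
installed, something along the following lines may serve as a starting
point. It is a sketch only: the IDs are synthetic, the vector length and the
names (ids, candidates, elapsed) are arbitrary choices, and the timings will
depend on the machine, the R version and the locale.

```r
# Synthetic IDs of the form GO:0000001.ISS (made up, not from GOTERM)
n <- 20000
ids <- sprintf("GO:%07d.%s", seq_len(n),
               rep(c("ISS", "NAS", "TAS"), length.out = n))

# The variants discussed in this thread, wrapped as functions
candidates <- list(
  gsub_default = function(x) gsub("[.].*", "", x),
  gsub_perl    = function(x) gsub("[.].*", "", x, perl = TRUE),
  gsub_bytes   = function(x) gsub("[.].*", "", x, perl = TRUE, useBytes = TRUE),
  sub_short    = function(x) sub("\\..+", "", x),
  substr_fixed = function(x) substr(x, 1, 10)
)

# Check that every variant returns the same strings before timing anything
results <- lapply(candidates, function(f) f(ids))
same <- vapply(results, function(r) all(r == results[[1]]), logical(1))
stopifnot(all(same))

# Elapsed seconds per variant; report the locale alongside the numbers
elapsed <- sapply(candidates, function(f) system.time(f(ids))[["elapsed"]])
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()$MBCS)   # TRUE in a multi-byte locale such as UTF-8
print(elapsed)
```

Running it once in a single-byte locale and once in a UTF-8 locale should
show whether the relative ordering changes on a given system.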


On Wed, 17 Jan 2007, Kimpel, Mark William wrote:

 Thanks for 6 ways to skin this cat! I am just beginning to learn about
 the power of regular expressions and appreciate the many examples of how
 they can be used in this context. This knowledge will come in handy the
 next time the number of characters is variable both before and after the
 dot. On my machine and for my particular example, however, Seth is
 correct in that substr is by far the fastest. I had forgotten that
 substr is vectorized.

 Below is the output of my speed trials and sessionInfo in case anyone is
 curious. I artificially made the go.id vector 10X its normal length to
 magnify differences. I did also check to verify that each solution
 worked as predicted, which they all did.

 Thanks again for your generous help, Mark

 > length(go.ids)
 [1] 79750
 > go.ids[1:5]
 [1] "GO:0006091.NA"  "GO:0008104.ISS" "GO:0008104.ISS" "GO:0006091.NA"
 [5] "GO:0006091.NAS"
 > system.time(z <- gsub("[.].*", "", go.ids))
 [1] 0.47 0.00 0.47   NA   NA
 > system.time(z <- gsub('\\..+$','', go.ids))
 [1] 0.56 0.00 0.56   NA   NA
 > system.time(z <- gsub('([^.]+)\\..*','\\1',go.ids))
 [1] 1.08 0.00 1.09   NA   NA
 > system.time(z <- sub("([GO:0-9]+)\\..*$", "\\1", go.ids))
 [1] 1.03 0.00 1.03   NA   NA
 > system.time(z <- sub("\\..+", "", go.ids))
 [1] 0.49 0.00 0.48   NA   NA
 > system.time(z <- substr(go.ids, 0, 10))
 [1] 0.02 0.00 0.01   NA   NA
 > sessionInfo()
 R version 2.4.1 (2006-12-18)
 i386-pc-mingw32

 locale:
 LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

 attached base packages:
 [1] splines   stats     graphics  grDevices datasets  utils     tools     methods   base

 other attached packages:
          rat2302 xlsReadWritePro          qvalue   affycoretools         biomaRt
           1.14.0           1.0.6           1.8.0           1.6.0           1.8.1
            RCurl             XML         GOstats        Category      genefilter
            0.8-0           1.2-0           2.0.4           2.0.3          1.12.0
         survival            KEGG            RBGL        annotate              GO
             2.30          1.14.1          1.10.0          1.12.1          1.14.1
            graph         RWinEdt           limma
           1.12.0           1.7-5           2.9.1

    affy  affyio

[R] help with plot of prcomp object

2006-09-18 Thread Kimpel, Mark William

I need to plot a prcomp object from package stats with custom symbols suitable 
for black-and-white publication. My boss specifically wants filled and unfilled 
squares, triangles, circles, inverted triangles, and diamonds to represent 5 
brain regions in 2 types of rat.

Can I specify these symbols as a plotting parameter?
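
One possible approach, sketched here under several assumptions, is to plot
the scores stored in the prcomp object directly and pass a vector of symbol
codes through pch. The data below are simulated, and the names region, strain,
shapes, sym and fill are purely illustrative. Symbols 21 to 25 in base
graphics are outlines whose interior is set by bg, so the same shape can be
drawn filled or unfilled:

```r
set.seed(42)

# Simulated design: 5 regions x 2 strains x 3 replicates (hypothetical data)
region <- factor(rep(paste("region", 1:5), each = 3, times = 2))
strain <- factor(rep(c("strain A", "strain B"), each = 15))
x <- matrix(rnorm(30 * 8), nrow = 30)

pc <- prcomp(x, scale. = TRUE)

# 22 = square, 24 = triangle, 21 = circle, 25 = inverted triangle, 23 = diamond
shapes <- c(22, 24, 21, 25, 23)
sym  <- shapes[as.integer(region)]               # one shape per region
fill <- c("white", "black")[as.integer(strain)]  # unfilled vs filled per strain

plot(pc$x[, 1], pc$x[, 2], pch = sym, bg = fill, col = "black",
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(region), pch = shapes,
       pt.bg = "white", bty = "n")
legend("bottomright", legend = levels(strain), pch = 22,
       pt.bg = c("white", "black"), bty = "n")
```

The two legends are kept separate so that shape (region) and fill (strain)
can each be read off independently.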

Thanks,

Mark

Mark W. Kimpel MD


[R] problem with postscript output of R-devel on Windows

2006-08-31 Thread Kimpel, Mark William
I have developed a problem with the postscript output of plot on Windows. My 
code still works properly with R 2.3 but, with R 2.4, the white text on red 
background does not show up. It does, however, show up when output is sent to 
the screen. Below is my code and sessionInfo.

R version 2.4.0 Under development (unstable) (2006-08-29 r39012) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   tools     methods   stats     graphics  grDevices utils     datasets 
[9] base 

other attached packages:
   Rgraphviz geneplotter         XML     GOstats    Category
      1.11.9      1.11.8      0.99-8       1.6.0       1.4.1
    hgu95av2        KEGG    multtest      xtable        RBGL
      1.12.0       1.8.1      1.11.2       1.3-2       1.8.1
    annotate          GO       graph       Ruuid       limma
      1.11.5       1.6.5     1.11.13      1.11.2       2.7.9
  genefilter    survival     rat2302        affy      affyio
      1.11.8        2.28      1.12.0      1.11.6       1.1.8
     Biobase
     1.11.29


fileName <- paste(experiment, contrast, "FDR", FDR, "Graph", "ps", sep=".")
postscript(file=fileName, paper="special", width=width, height=height) # set up graphics device
plot(result.gN, layout.param, nodeAttrs = nAttrs, edgeAttrs = eAttrs,
     main=paste(paste("Experiment:", experiment, ";  Contrast:", contrast,
                      ";  FDR:", FDR, sep=""),
                paste("Min. connections ==", min.edges,
                      "Min. citations per connection ==", min.cites,
                      "Additional search criteria:", termAdditional, sep=" "),
                sep=""))

Thanks,

Mark

Mark W. Kimpel MD


Re: [R] problem with postscript output of R-devel on Windows

2006-08-31 Thread Kimpel, Mark William
I apologize for my previous confusing example. Below is some sample
code, taken directly from the image help file, that reproduces a
postscript problem. This now happens with both R 2.3.1 and R 2.4

What I get appears to be output of only certain postscript objects, to
use an Adobe term. When I use the R GUI menu to save as, jpeg and pdf
files save correctly, but the postscript file does not. I am not getting
any axis labels or topo labels. This is true whether I import the PS
file into either Photoshop or Illustrator.

Thanks, Mark

x <- 10*(1:nrow(volcano))
y <- 10*(1:ncol(volcano))
image(x, y, volcano, col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano, levels = seq(90, 200, by = 5),
        add = TRUE, col = "peru")
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title(main = "Maunga Whau Volcano", font.main = 4)

> sessionInfo()
Version 2.3.1 (2006-06-01) 
i386-pc-mingw32 

attached base packages:
[1] methods   stats     graphics  grDevices utils     datasets 
[7] base  
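
For comparison with the GUI "save as" route, the same plot can also be
written straight to a postscript device from code. The following is a sketch
only: the output path, the page size and the variable name out are
assumptions made for illustration, and it does not by itself show where the
missing labels come from. It does give a file produced without the GUI that
can be opened in the same applications:

```r
# Hypothetical output location; adjust as needed
out <- file.path(tempdir(), "volcano.ps")

# Open the device, draw, and close it so the file is written completely
postscript(file = out, paper = "special", width = 6, height = 5,
           horizontal = FALSE, onefile = FALSE)
x <- 10 * (1:nrow(volcano))
y <- 10 * (1:ncol(volcano))
image(x, y, volcano, col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano, levels = seq(90, 200, by = 5),
        add = TRUE, col = "peru")
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
box()
title(main = "Maunga Whau Volcano", font.main = 4)
dev.off()

# Basic sanity checks on the written file
first_line <- readLines(out, n = 1)
print(first_line)
print(file.info(out)$size)
```

If the axis and title text show up in this file but not in the one saved
from the GUI, that would point towards the save step or the importing
application rather than the plotting code.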

Mark W. Kimpel MD


-Original Message-
From: Duncan Murdoch [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 31, 2006 12:52 PM
To: Kimpel, Mark William
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] problem with postscript output of R-devel on Windows

Could you put together a reproducible example to illustrate the problem?
We don't have all the variables used in that example.  I think you
should be able to do it with just base packages attached; if not, it's
likely a problem with one of the contributed packages, rather than with
R.

Duncan Murdoch



[R] R on a supercomputer

2005-10-10 Thread Kimpel, Mark William
I am using R with Bioconductor to perform analyses on large datasets
using bootstrap methods. In an attempt to speed up my work, I have
inquired about using our local supercomputer and asked the administrator
if he thought R would run faster on our parallel network. I received the
following reply:

 

 

The second benefit is that the processors have large caches.
Briefly, everything is loaded into cache before going into the
processor.  With large caches, there is less movement of data between
memory and cache, and this can save quite a bit of time.  Indeed, when
programmers optimize code they usually think about how to do things to
keep data in cache as long as possible.

Whether you would receive any benefit from larger cache depends on how
R is written. If it's written such that data remain in cache, the
speed-up could be considerable, but I have no way to predict it.

 

My question is, is R written such that data remain in cache? 

 

Thanks,

Mark W. Kimpel MD
Indiana University School of Medicine