date:20101010

Re: [R] same random numbers in different sessions

2010-10-10 Thread Liviu Andronic

Dear all
Thanks for all the pointers.

On Sat, Oct 9, 2010 at 11:39 PM, Daniel Nordlund
djnordl...@frontier.com wrote:
 Could you be reloading a workspace at start-up that is setting the seed?  
 What happens if you start R using the --vanilla option?

It seems that this is the culprit. For some reason, my $HOME session
already has
> ls(all=T)
[1] ".Random.seed"

If I open a session in /tmp, or using --vanilla then I get actual
pseudo-random numbers. It seems that require(IPSUR) is not at fault,
either.
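
The effect can be reproduced inside a single session. This is a minimal sketch using base R only; the names saved, first_draw and second_draw are illustrative and do not come from the thread. Putting an old .Random.seed back into the global environment, which is what restoring a saved workspace does, makes the generator repeat itself:

```r
set.seed(42)                      # any seed; this creates .Random.seed in the global env
saved <- .Random.seed             # the state a saved workspace would carry
first_draw <- rnorm(3)

# Restoring the old state mimics loading .RData at start-up
assign(".Random.seed", saved, envir = globalenv())
second_draw <- rnorm(3)

identical(first_draw, second_draw)  # the two draws coincide
```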

Regards
Liviu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] same random numbers in different sessions

2010-10-10 Thread Liviu Andronic

Hello

On Sun, Oct 10, 2010 at 1:16 AM, jim holtman jholt...@gmail.com wrote:
 You need to set the set.seed yourself.  There are some simulations
 where I do want the same numbers generated and can use the set.seed to
 set it to a known value.  If you want something random each time, then
 use the time of day in the call to set.seed.

I try to do this, but I get funny results. I put
try(rm(.Random.seed))
set.seed(Sys.time())

in
/usr/lib/R/etc/Rprofile.site

but I get the following error
Error in rm(.Random.seed) : cannot remove variables from base namespace
[Previously saved workspace restored]

and it is as if the set.seed() call didn't work, since I get the same
random values.
> rnorm(1:10)
 [1] -1.3618103  0.4241701  1.0720076  0.2208145 -0.5375314 -0.4846588
 [7]  0.7576768  0.6527407 -0.6868786  0.8718527

If I do
> set.seed(Sys.time())
> rnorm(1:10)
 [1] -0.6165650  0.6305187 -0.9316815  0.6034638 -0.8593514 -1.0243644
 [7] -0.1050344  0.4408562 -0.3466161  0.4058430

manually, within the session, the seed seems to be changed as
requested. Am I doing something wrong?
Liviu


Re: [R] Help needed for getYahooData in TTR package writing the Yahoo data to excel

2010-10-10 Thread vanilla fantasy

Thanks David.

You mentioned that I need to insert an extra cell and move the header over
one position. I tried a few times but still couldn't figure out how. Can
you please advise? Many thanks.

On Sun, Oct 10, 2010 at 11:55 AM, David Winsemius dwinsem...@comcast.net wrote:


 On Oct 9, 2010, at 10:54 PM, missvanilla wrote:


 Dear all,

 I'm totally new to R. Recently I've been trying to use getYahooData in TTR
 package in order to download stock index daily open/high/low/close. The
 downloaded data is in the format of

                Open     High      Low    Close Volume
 2000-01-04 18937.45 19187.61 18937.45 19002.86      0
 2000-01-05 19003.51 19003.51 18221.82 18542.55      0
 2000-01-06 18574.01 18582.74 18168.27 18168.27      0
 2000-01-07 18194.05 18285.73 18068.10 18193.41      0
 2000-01-11 18246.10 18887.56 18246.10 18850.92      0
 2000-01-12 18780.17 18811.87 18626.92 18677.42      0
 2000-01-13 18667.18 18845.03 18667.18 18833.29      0
 2000-01-14 18882.99 19058.02 18733.83 18956.55      0
 2000-01-17 19025.62 19442.58 19025.62 19437.23      0
 2000-01-18 19412.47 19412.47 19145.17 19196.57      0

 However, when I attempted to write the data to Excel using write.table, the
 dates in the first column became 1,2,3,4 in the Excel file. The same problem
 happened if write.csv was used.

 If you run these two lines of code you'll see what I mean. Before running
 the code, package TTR needs to be loaded.

 N225 <- getYahooData("^N225", 2101, )
 write.table(N225, "Nikkei.xls", sep='\t', row.name = TRUE, col.name = NA)


 There is a well-described problem with write.table files going into Excel.
 There is no leading item or tab on the first row. You need to insert an
 extra cell and move the header over one position. Then you won't be
 misinterpreting your row.names as dates.

 --
 David


 Appreciate your kind assistance! Thanks a lot in advance.

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Help-needed-for-getYahooData-in-TTR-package-writing-the-Yahoo-data-to-excel-tp2970017p2970017.html
 Sent from the R help mailing list archive at Nabble.com.


Re: [R] Help needed for getYahooData in TTR package writing the Yahoo data to excel

2010-10-10 Thread vanilla fantasy

Just trying to be clearer on the problem faced: the dates in the 1st column
become 1,2,3,4 in Excel as below. A screenshot of how the data appears in
Excel is attached.

       Open     High      Low    Close Volume
1  18937.45 19187.61 18937.45 19002.86      0
2  19003.51 19003.51 18221.82 18542.55      0
3  18574.01 18582.74 18168.27 18168.27      0
4  18194.05 18285.73 18068.10 18193.41      0
5  18246.10 18887.56 18246.10 18850.92      0
.
.


Re: [R] same random numbers in different sessions

2010-10-10 Thread Duncan Murdoch


On 10/10/2010 4:30 AM, Liviu Andronic wrote:

Hello

On Sun, Oct 10, 2010 at 1:16 AM, jim holtman jholt...@gmail.com wrote:

You need to set the set.seed yourself.  There are some simulations
where I do want the same numbers generated and can use the set.seed to
set it to a known value.  If you want something random each time, then
use the time of day in the call to set.seed.


I try to do this, but I get funny results. I put
try(rm(.Random.seed))
set.seed(Sys.time())

in
/usr/lib/R/etc/Rprofile.site

but I get the following error
Error in rm(.Random.seed) : cannot remove variables from base namespace
[Previously saved workspace restored]

and it is as if the set.seed() call didn't work, since I get the same
random values.

rnorm(1:10)

 [1] -1.3618103  0.4241701  1.0720076  0.2208145 -0.5375314 -0.4846588
 [7]  0.7576768  0.6527407 -0.6868786  0.8718527

If I do

set.seed(Sys.time())
rnorm(1:10)

 [1] -0.6165650  0.6305187 -0.9316815  0.6034638 -0.8593514 -1.0243644
 [7] -0.1050344  0.4408562 -0.3466161  0.4058430

manually, within the session, the seed seems to be changed as
requested. Am I doing something wrong?


The Rprofile.site is being executed before the saved workspace is 
restored.  See ?Startup for the sequence of events on startup. You could 
put the rm() in .First() in the saved workspace and it would do what you 
want.
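
A minimal sketch of such a .First(), assuming it is defined once and then saved with the workspace. The error message quoted above indicates that code in Rprofile.site does not run in the global environment, so the sketch names globalenv() explicitly. The body is an illustration, not code from this thread:

```r
# Sketch of a .First() meant to be saved in the workspace.
# It runs after .RData has been restored, so a stale seed is visible to it.
.First <- function() {
  # Name the global environment explicitly instead of relying on the caller
  if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    rm(".Random.seed", envir = globalenv())
  }
  invisible(NULL)
}
```

Once .Random.seed is gone, R creates a fresh seed the next time random numbers are requested, so no explicit set.seed() call should be needed.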


But more generally, I would say the thing you are doing wrong is saving 
.RData sometimes, but not consistently saving it.  In my opinion it's 
safest to never save it; then you won't recover unexpected things from 
your history.  But it's also safe to always save it.  Then you'll get a 
new copy of .Random.seed saved each time.


I think the q() function makes it a little too easy to do what you did:
if you intend to never save it, but answer "Yes" just once, you get
into your situation.  I don't know what the alternative should be.


One possibility would be for R to record whether the workspace was 
restored at the start of the session, and use that to determine the 
default when ending it.  But that would mess up people who are trying to 
reproduce things from identical conditions, e.g. when tracking down a bug.


Duncan Murdoch


Re: [R] GC verbose=false still showing report

2010-10-10 Thread Duncan Murdoch


On 09/10/2010 9:59 PM, Robin Jeffries wrote:

invisible(gc())

worked perfectly. Thanks Jeff.

@ Josh: I know how to toggle showing/hiding command echos, but I
haven't figured out how to toggle on/off any printed output.


Use results=hide as an Sweave option, e.g.

<<echo=FALSE, results=hide>>=
gc()
@
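
Outside Sweave, the same behaviour can be checked directly. The sketch below uses base R only and assumes that the table comes from auto-printing of the value returned by gc(), which is why verbose = FALSE alone does not hide it; the names shown, hidden and mem are illustrative:

```r
# The summary table is the printed return value of gc()
shown  <- capture.output(gc(verbose = FALSE))
hidden <- capture.output(invisible(gc(verbose = FALSE)))

length(shown) > 0    # the Ncells/Vcells matrix was printed
length(hidden) == 0  # invisible() suppressed the printing

mem <- gc(verbose = FALSE)  # assigning the result also prints nothing
```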

Duncan Murdoch






On Sat, Oct 9, 2010 at 5:10 PM, Robin Jeffries rjeffr...@ucla.edu wrote:

I must be reading the help file for gc() wrong. I thought it said that
gc(verbose=FALSE) will run the garbage collection without printing the
Ncells/Vcells summary. However, this is what I get:

> gc(verbose = FALSE)
         used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 267097 14.3     531268  28.4   531268  28.4
Vcells 429302  3.3   20829406 159.0 55923977 426.7

I'm embedding this in an Sweave/TeX file, so I *really* can't have
this printing out. Suggestions other than manually editing the TeX
file?

Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428




Re: [R] Help needed for getYahooData in TTR package writing the Yahoo data to excel

2010-10-10 Thread David Winsemius



On Oct 10, 2010, at 5:27 AM, vanilla fantasy wrote:

Just trying to be clearer on the problem faced, the dates in the 1st
column become 1,2,3,4 in excel as below, the screenshot of how data
appears in excel is attached.


I can affirm that there was at one time a jpg, since you copied me and
my mail client is much less suspicious about attachments than is the
mailing list server. Nobody else got a copy, though.




       Open     High      Low    Close Volume
1  18937.45 19187.61 18937.45 19002.86      0
2  19003.51 19003.51 18221.82 18542.55      0
3  18574.01 18582.74 18168.27 18168.27      0
4  18194.05 18285.73 18068.10 18193.41      0
5  18246.10 18887.56 18246.10 18850.92      0


That was really not a data.frame in R, but rather an xts object, and
when write.table converted it to a data.frame the dates (which were an
attribute) got stripped off:


> str(N225)
An ‘xts’ object from 2000-01-04 to 2010-10-08 containing:
  Data: num [1:2644, 1:5] 18937 19004 18574 18194 18246 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "Open" "High" "Low" "Close" ...
  Indexed by objects of class: [POSIXt,POSIXct] TZ:
  xts Attributes:
 NULL

> str(as.data.frame(N225))
'data.frame':   2644 obs. of  5 variables:
 $ Open  : num  18937 19004 18574 18194 18246 ...
 $ High  : num  19188 19004 18583 18286 1 ...
 $ Low   : num  18937 18222 18168 18068 18246 ...
 $ Close : num  19003 18543 18168 18193 18851 ...
 $ Volume: num  0 0 0 0 0 0 0 0 0 0 ...

You probably first need to extract the dates from that xts object:
> names(attributes(N225))
[1] "index"       "dim"         "dimnames"    "class"       ".indexCLASS"
[6] ".indexTZ"
> str(attr(N225, "index"))
 num [1:2644] 9.47e+08 9.47e+08 9.47e+08 9.47e+08 9.48e+08 ...

So the dates are not in a DateTime format, but experimentation shows
them to be a series of 05:00:00 times if treated as POSIXct.

> dts <- as.POSIXct(attributes(N225)$index, origin="1970-01-01")
> str(dts)
 POSIXct[1:2644], format: "2000-01-04 05:00:00" "2000-01-05 05:00:00" ...


You will first need to use as.data.frame to re-class the data matrix
in N225 and then add a column of dates. Sorry for the initial off-base
reply. I should have looked at the tab-separated file you created
rather than assuming I knew what would happen. Maybe all in one stroke:


> N225df <- cbind(dts, as.data.frame(N225))
> str(N225df)
'data.frame':   2644 obs. of  6 variables:
 $ dts   : POSIXct, format: "2000-01-04 05:00:00" "2000-01-05 05:00:00" ...
 $ Open  : num  18937 19004 18574 18194 18246 ...
 $ High  : num  19188 19004 18583 18286 1 ...
 $ Low   : num  18937 18222 18168 18068 18246 ...
 $ Close : num  19003 18543 18168 18193 18851 ...
 $ Volume: num  0 0 0 0 0 0 0 0 0 0 ...
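
The same steps can be tried without downloading anything. The sketch below uses base R only; idx and mat are made-up stand-ins for the index attribute and data matrix of N225, the time zone is assumed to be UTC, and the output file is a temporary file rather than Nikkei.xls:

```r
# Made-up stand-ins for the xts internals described above
idx <- c(946962000, 947048400, 947134800)   # seconds since 1970-01-01 (UTC assumed)
mat <- matrix(c(18937.45, 19003.51, 18574.01,
                19187.61, 19003.51, 18582.74),
              ncol = 2, dimnames = list(NULL, c("Open", "High")))

# Convert the numeric index to dates and put them in an ordinary column
dts <- as.POSIXct(idx, origin = "1970-01-01", tz = "UTC")
df  <- data.frame(Date = format(dts, "%Y-%m-%d"), as.data.frame(mat))

# With the dates as a real column, row names are no longer needed
out <- tempfile(fileext = ".txt")
write.table(df, out, sep = "\t", row.names = FALSE, quote = FALSE)
readLines(out)
```

Writing the dates as a formatted text column with row.names = FALSE also avoids the shifted-header issue, since every column then has its own header cell.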


--
David.



[R] trycatch examples

2010-10-10 Thread Santosh Srinivas

Dear R-group,

I am looking for some good examples of tryCatch. Any pointers?
The help page seems quite limited.

Thanks.


[R] Package prabclus not available?

2010-10-10 Thread Christian Hennig


Hi there,

I just tried to install the package prabclus on a computer running Ubuntu 
Linux 9.04 using install.packages from within R.

This gave me a message:
Warning message:
In install.packages("prabclus") : package ‘prabclus’ is not available

I tried to do this selecting two different CRAN mirrors (same result) and 
with other packages (installing them works fine).


Looking up the CRAN mirror website I used (UK, London), there doesn't seem 
to be anything wrong with prabclus. (iMac checking apparently gives an 
error which is due to an error with package spdep on that platform in 
tests, but that shouldn't affect using it on Linux, or should it?)


Any explanation?

Thanks and best wishes,
Christian

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche

[R] Parallel processing

2010-10-10 Thread Partha Sinha

1. What should I install to speed up processing on a multicore
processor in a Windows environment?
2. How do I measure the time taken to execute a particular piece of code?


Re: [R] Parallel processing

2010-10-10 Thread Tal Galili

Hello Partha,

Both questions are answered here:
http://www.r-statistics.com/2010/04/parallel-multicore-processing-with-r-on-windows/

I would also recommend you to have a look here:
http://www.r-statistics.com/2010/09/using-the-plyr-1-2-package-parallel-processing-backend-with-windows/

There are claims for other packages to achieve this, but I wasn't able to
make them work (I'd be glad to hear of better results by others)
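
For the second question, base R already provides system.time(). A minimal sketch follows; slow_sum is a made-up workload, and the timings depend entirely on the machine:

```r
# Made-up workload: a deliberately slow loop
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

# system.time() evaluates the expression and reports user, system and elapsed time
timing <- system.time(result <- slow_sum(1e5))
timing["elapsed"]   # wall-clock seconds
```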

Best,
Tal
Contact Details:
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)





Re: [R] Help needed for getYahooData in TTR package writing the Yahoo data to excel

2010-10-10 Thread Gabor Grothendieck

On Sun, Oct 10, 2010 at 8:09 AM, David Winsemius dwinsem...@comcast.net wrote:

 On Oct 10, 2010, at 5:27 AM, vanilla fantasy wrote:

 Just trying to be clearer on the problem faced, the dates in the 1st
 column become 1,2,3,4 in excel as below, the screenshot of how data appears
 in excel is attached.

 I can affirm that there was at one time a jpg, since you copied me and my mail
 client is much less suspicious about attachments than is the mailing list
 server. Nobody else got a copy, though.


               Open      High         Low       Close      Volume
 1          18937.45 19187.61 18937.45 19002.86      0
 2          19003.51 19003.51 18221.82 18542.55      0
 3          18574.01 18582.74 18168.27 18168.27      0
 4          18194.05 18285.73 18068.10 18193.41      0
 5          18246.10 18887.56 18246.10 18850.92      0

 That was really not a data.frame in R, but rather an xts object, and when
 write.table converted it to a data.frame the dates (which were an attribute)
 got stripped off:


write.zoo in the zoo package can write xts objects. See ?write.zoo:

   write.zoo(N225, file = "myfile.dat", ...possibly other arguments...)
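
A small self-contained sketch of that call, assuming the zoo package is installed (the code is skipped otherwise). The two-day series z is a made-up stand-in for N225 and the output goes to a temporary file:

```r
out <- tempfile(fileext = ".dat")

if (requireNamespace("zoo", quietly = TRUE)) {
  # Made-up two-day series standing in for N225
  z <- zoo::zoo(
    matrix(c(18937.45, 19003.51, 19187.61, 19003.51), ncol = 2,
           dimnames = list(NULL, c("Open", "High"))),
    order.by = as.Date(c("2000-01-04", "2000-01-05"))
  )
  # write.zoo writes the index as its own column, so the dates survive
  zoo::write.zoo(z, file = out, sep = "\t")
  print(readLines(out))
}
```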


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] Memory management in R

2010-10-10 Thread Lorenzo Isella




I already offered the Biostrings package. It provides more robust
methods for string matching than does grepl. Is there a reason that you
choose not to?



Indeed that is the way I should go, and I have installed the package
after some struggling. Since Biostrings is a fairly complex package and I
need only a way to check if a certain string A is a subset of string B,
do you know the Biostrings functions to achieve this?
I see a lot of methods for biological (DNA, RNA) sequences, and they may
not apply to my series (which are definitely not from biology).

Cheers

Lorenzo


Re: [R] Hausman test for endogeneity

2010-10-10 Thread Holger Steinmetz


Dear Liviu,

thank you very much. After inspecting the options, I *guess* that systemfit
is what I need.
However, I absolutely don't understand how it works. I searched long for a
detailed documentation (beyond the rather cryptic standard documentation)
but found none. 

Does anybody have references or advice on how to conduct the test?

Best,
Holger
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Hausman-test-for-endogeneity-tp2969522p2970261.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] R: rulefit error on Linux

2010-10-10 Thread Uwe Ligges




On 07.10.2010 10:55, noclue_ wrote:


R version 2.8.1 (2008-12-22) on Linux 64-bit

I am trying to run 'rulefit' function (Rule based Learning Ensembles). but I
got the following error -


rulefit(x,y)

Warning: This program is an suid-root program or is being run by the root
user.
The full text of the error or warning message cannot be safely formatted
in this environment. You may get a more descriptive message by running the
program as a non-root user or by removing the suid bit on the executable.
xterm Xt error: Can't open display: %s
xterm:  DISPLAY is not set
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :
   cannot open file '/root/_rulefit/rfstatus': No such file or directory
--

On windows R 2.10, I got this run successfully. So I am wondering whether it
is due to my R older version on Linux.


Well, maybe; both R versions are old and the release candidate for
R-2.12.0 is out, hence please use that one for now and switch to the
release version next week.


For your original question: Looks like you had some terminal without X
forwarding, i.e. R was not able to open an x11 window in order to
present a plot.


Uwe Ligges





Thanks!



[R] Matching long strings ... was Re: Memory management in R

2010-10-10 Thread David Winsemius



On Oct 10, 2010, at 9:27 AM, Lorenzo Isella wrote:




I already offered the Biostrings package. It provides more robust
methods for string matching than does grepl. Is there a reason that  
you

choose not to?



Indeed that is the way I should go for and I have installed the  
package after some struggling.


For me it was a matter of waiting. The only struggle was coming from
my inner timer saying it was taking too long.


Since biostring is a fairly complex package and I need only a way to  
check if a certain string A is a subset of string B, do you know the  
biostring functions to achieve this?
I see a lot of methods for biological (DNA, RNA) sequences, and they  
may not apply to my series (which are definitely not from biology).

Cheers


It appeared to me that the function matchPattern should replace your
grepl invocation that was failing. It returns a more complex
structure, so you would need to determine what would be an exact
replacement for grepl(...) != 1. Looks like a no-match event results
in the start and width items being of length 0.


> str(  matchPattern("A", BString("BBB")) )
Formal class 'XStringViews' [package "Biostrings"] with 7 slots
  ..@ subject        :Formal class 'BString' [package "Biostrings"] with 6 slots
  .. .. ..@ shared         :Formal class 'SharedRaw' [package "IRanges"] with 2 slots
  .. .. .. .. ..@ xp                    :<externalptr>
  .. .. .. .. ..@ .link_to_cached_object:<environment: 0x11e0e59f8>
  .. .. ..@ offset         : int 0
  .. .. ..@ length         : int 3
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ metadata       : list()
  ..@ start          : int(0)
  ..@ width          : int(0)
  ..@ NAMES          : NULL
  ..@ elementMetadata: NULL
  ..@ elementType    : chr "integer"
  ..@ metadata       : list()

Perhaps:

length(matchPattern(fut_string, past_string)@start ) == 0

You do need to use BString() on at least the past_string argument and
maybe the fut_string as well. The BioConductor Mailing List would have
a larger audience with experience using this package, so they should
probably be your next avenue for advice. I am just reading the help
pages as you should be able to do. The help page help("lowlevel-matching")
should probably be reviewed since there may be efficiency
issues to consider as mentioned below.


When dropped into your function with the BString coercion, it
replicated your small example results and did not crash after a long
period with your larger example, so I then terminated it and inserted a
reporter line to monitor progress. With that reporter I got up into
the 200's for count_len without error. My laptop CPU was warming up
the case and I was getting sleepy so I terminated the process. (I had
no way of checking for accuracy, even if I had let it proceed, since
you did not offer a correct answer.)


By the way, the construct ... grepl(. , .) != 1 ... is perhaps
inefficient. It could more compactly be expressed as ... !grepl(. , .) ...,
which would not be doing coercion of logicals to integers.
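
A quick base-R check that the two forms agree. The strings are made up, and fixed = TRUE is an assumption about the intent (literal substring matching rather than a regular expression):

```r
past_string <- "abcabcabd"   # made-up example strings
fut_string  <- "abd"

# fixed = TRUE treats the pattern as a literal substring
old_form <- grepl(fut_string, past_string, fixed = TRUE) != 1
new_form <- !grepl(fut_string, past_string, fixed = TRUE)

identical(old_form, new_form)   # both flag "no match" in the same way
```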


--
David.



Lorenzo



Re: [R] Hausman test for endogeneity

2010-10-10 Thread Arne Henningsen

Hi Holger

On 10 October 2010 15:36, Holger Steinmetz holger.steinm...@web.de wrote:
 After inspecting the options, I *guess* that systemfit
 is what I need.
 However, I absolutely don't understand how it works. I searched long for a
 detailed documentation (beyond the rather cryptic standard documentation)
 but found none.

 Has anybody references/advises how to conduct the test?

A paper describing the systemfit package has been published in the
journal of statistical software:

http://www.jstatsoft.org/v23/i04/paper

It describes the Hausman test for testing the consistency of the 3SLS
estimates against the 2SLS estimates (see sections 2.8 and 4.6).

I guess (but I am not sure -- maybe others can comment on this) that
you test for the endogeneity of regressors, e.g., by

fitSur <- systemfit( myFormula, data = myData, method = "SUR" )

fit3sls <- systemfit( myFormula, data = myData, method = "3SLS",
   inst = myInst )

hausman.systemfit( fit3sls, fitSur )

If some regressors are endogenous, the SUR estimates are inconsistent
but the 3SLS estimates are consistent given that the instrumental
variables are exogenous. However, if all regressors are exogenous,
both estimates should be consistent but the SUR estimates should be
more efficient.
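
For orientation, the general textbook form of a Hausman statistic for this comparison is sketched below. The notation is an assumption, not taken from the systemfit paper, and whether hausman.systemfit computes exactly this should be checked against sections 2.8 and 4.6 of the paper:

```latex
H = \left(\hat\beta_{3SLS} - \hat\beta_{SUR}\right)^{\top}
    \left[\widehat{V}\!\left(\hat\beta_{3SLS}\right)
        - \widehat{V}\!\left(\hat\beta_{SUR}\right)\right]^{-1}
    \left(\hat\beta_{3SLS} - \hat\beta_{SUR}\right)
```

Under the null hypothesis that all regressors are exogenous, H is asymptotically chi-squared with degrees of freedom equal to the number of coefficients compared, assuming the variance difference is nonsingular. A large H speaks against the SUR estimates.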

Best wishes,
Arne

-- 
Arne Henningsen
http://www.arne-henningsen.name


Re: [R] Package prabclus not available?

2010-10-10 Thread Uwe Ligges


Works for me for both the London mirror as well as CRAN master.
May I guess that you are under R < 2.10.x on that machine?

Uwe




Re: [R] Package prabclus not available?

2010-10-10 Thread Christian Hennig


Works for me for both the London mirror as well as CRAN master.
May I guess that you are under R < 2.10.x on that machine?


Oh yes, need to update this first. Thanks. Solved.

The error message could be more informative, though...

Christian
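
A minimal sketch of how the "not available" warning could be investigated. It assumes an index shaped like the matrix returned by available.packages() (columns "Package" and "Depends"); the helper name explain_unavailable and the toy index entries are hypothetical, not part of R or of this thread. One plausible reason for the warning is that the index for an older R omits packages that need a newer R.

```r
# Hypothetical helper: report whether a package appears in a repository
# index and, if so, which dependencies it declares.
# `avail` is assumed to look like the result of available.packages().
explain_unavailable <- function(pkg, avail, r_version = getRversion()) {
  if (!pkg %in% avail[, "Package"]) {
    return(sprintf("'%s' is not listed in this repository index for R %s",
                   pkg, as.character(r_version)))
  }
  depends <- avail[avail[, "Package"] == pkg, "Depends"][1]
  sprintf("'%s' is listed; it declares Depends: %s", pkg, depends)
}

# Toy index standing in for available.packages(); the entries are made up.
toy_index <- cbind(Package = c("cluster", "MASS"),
                   Depends = c("R (>= 2.9.0)", "R (>= 2.10.1)"))

explain_unavailable("prabclus", toy_index)
explain_unavailable("MASS", toy_index)
```

In a live session the second argument would be available.packages() itself, and comparing getRversion() with the "Depends" entry shown on the package's CRAN page would show whether an R update is needed.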




Re: [R] Matching long strings ... was Re: Memory management in R

2010-10-10 Thread Lorenzo Isella


On 10/10/2010 04:11 PM, David Winsemius wrote:

length(matchPattern(fut_string, past_string)@start ) == 0


Wow, thanks a lot!
I am still testing this, but it looks like this is a good replacement 
for grepl. Definitely, since I am not a life scientist even from afar by 
training, this solution/analogy with sequencing in biology would have 
never come to my mind.

Cheers

Lorenzo


[R] segfault caused by `icfit` in `interval` package

2010-10-10 Thread Yuliya Matveyeva

Dear R community,
I am using the R package `interval` in order to perform some modelling
tests of the NPMLE convergence in the case of censoring. So all I am doing
is drawing a sample from an exponential distribution, making it a censored
sample and computing the NPMLE of its distribution function. But when run on
Linux Calculate 10.4 the program keeps crashing and reporting a segmentation
fault after the call to the `icfit` function when the sample size gets to 70.
When run on Windows 7 it seems to be fine.
That is why I am totally confused and have decided to ask for help.

I have attached the code I am running, which results in a segmentation fault
if run on Linux Calculate.
It has the seed set to the value which leads to this error. But it is
important to note that if the parameters used in the program and the seed
are changed, it doesn't necessarily crash.

Here is the description of my R version and OS:

> sessionInfo()
R version 2.10.1 (2009-12-14)
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

After calling `icfit` the program quits with the following output
(I have replaced the output concerning the arguments passed to
initcomputeMLE by <arguments passed to initcomputeMLE> so that the
description of the output wouldn't be too long):

 *** caught segfault ***
address 0xc, cause 'memory not mapped'

Traceback:
 1: .Call(ComputeMLEForR, R, B, max.inner, max.outer, tol)
 2: computeMLE(R, B, max.inner = max.inner, max.outer = max.outer, tol = tol)
 3: initcomputeMLE(<arguments passed to initcomputeMLE>)
 4: do.call(initfit, args = list(L = L, R = R, Lin = Lin, Rin = Rin, A = A))
 5: doTryCatch(return(expr), name, parentenv, handler)
 6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 7: tryCatchList(expr, classes, parentenv, handlers)
 8: tryCatch(expr, error = function(e) {    call <- conditionCall(e)    if
(!is.null(call)) {        if (identical(call[[1L]],
quote(doTryCatch))) call <- sys.call(-4L)        dcall <-
deparse(call)[1L]        prefix <- paste("Error in", dcall, ": ")
LONG <- 75L        msg <- conditionMessage(e)        sm <- strsplit(msg,
"\n")[[1L]]        w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type
= "w")        if (is.na(w)) w <- 14L + nchar(dcall, type = "b")
+ nchar(sm[1L], type = "b")        if (w > LONG)
prefix <- paste(prefix, "\n  ", sep = "")    }    else prefix <- "Error : "
msg <- paste(prefix, conditionMessage(e), "\n", sep = "")
.Internal(seterrmessage(msg[1L]))    if (!silent &&
identical(getOption("show.error.messages"), TRUE)) {        cat(msg,
file = stderr())        .Internal(printDeferredWarnings())    }
invisible(structure(msg, class = "try-error"))})
 9: try(do.call(initfit, args = list(L = L, R = R, Lin = Lin, Rin = Rin,
A = A)))
10: icfit.default(L = left, R = right)
11: icfit(L = left, R = right)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

I would greatly appreciate any help provided.
Sincerely yours,
Yuliya Matveyeva.
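
The attached script is not reproduced in this archive, so the following is only a sketch of what a self-contained example of the described setup might look like. It assumes the censoring comes from inspecting on a regular grid; the function name make_interval_censored and the parameters rate, width, and seed are illustrative assumptions, not the poster's code. Only the final commented call mirrors the traceback above.

```r
# Assumption: each exact exponential time is only known to lie between two
# neighbouring inspection times on a grid of spacing `width`.
make_interval_censored <- function(n, rate = 1, width = 0.5, seed = 1) {
  set.seed(seed)                           # fixed seed so a crash can be replayed
  t_exact <- rexp(n, rate = rate)          # unobserved exact event times
  left <- floor(t_exact / width) * width   # last inspection before the event
  right <- left + width                    # first inspection after the event
  data.frame(left = left, right = right)
}

sample70 <- make_interval_censored(70)
head(sample70)

# With the `interval` package installed, the call seen in the traceback is:
# fit <- interval::icfit(L = sample70$left, R = sample70$right)
```

Posting a short script of this kind together with the seed lets others replay the exact sample that triggers the crash.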

Re: [R] segfault caused by `icfit` in `interval` package

2010-10-10 Thread Uwe Ligges

Please try with the R-2.12.0 release candidate and the most recent
version of the package (please report its version number!).


If it still fails, it may be a good idea to contact the package 
maintainer of interval first.


Uwe Ligges
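
A small sketch of how the requested version details could be collected in one step. The helper name report_versions is hypothetical, and "stats" is used only as a stand-in because it ships with every R installation; the package in question would be substituted.

```r
# Hypothetical helper: gather the R version string and the installed
# version of each named package, as read from its DESCRIPTION file.
report_versions <- function(pkgs) {
  versions <- vapply(pkgs,
                     function(p) utils::packageDescription(p)$Version,
                     character(1))
  c(R = R.version.string, versions)
}

# "stats" is a stand-in; replace it with the package of interest.
report_versions("stats")
```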




Re: [R] Matching long strings ... was Re: Memory management in R

2010-10-10 Thread Martin Morgan

On 10/10/2010 07:11 AM, David Winsemius wrote:
 
 On Oct 10, 2010, at 9:27 AM, Lorenzo Isella wrote:
 

 I already offered the Biostrings package. It provides more robust
 methods for string matching than does grepl. Is there a reason that you
 choose not to?


 Indeed that is the way I should go for and I have installed the
 package after some struggling.
 
 For me it was a matter of waiting. The only struggle was coming from my
 inner timer saying it was taking too long.
 
 Since biostring is a fairly complex package and I need only a way to
 check if a certain string A is a subset of string B, do you know the
 biostring functions to achieve this?
 I see a lot of methods for biological (DNA, RNA) sequences, and they
 may not apply to my series (which are definitely not from biology).
 Cheers
 
 It appeared to me that the function matchPattern should replace your
 grepl invocation that was failing. It returns a more complex structure,
 so you would need to determine what would be an exact replacement for
 grepl(...) != 1. Looks like a no-match event results in the start and
 end items being of length 0.
 
 str(matchPattern("A", BString("BBB")))

A couple of things from this thread.

To install a Bioconductor package follow directions here

  http://bioconductor.org/install/index.html#install-bioconductor-packages

which leads to

   source("http://bioconductor.org/biocLite.R")
   biocLite("Biostrings")

biocLite is just a wrapper around install.packages with appropriate
repositories defined.

Some Bioconductor packages are relatively mature and make relatively
advanced use of S4 classes, so looking at str() is not that helpful --
the way the user is meant to interact with the object is different from
the way the object is implemented. So the best bet is to look at the
relevant help pages

  result = matchPattern("A", BString("BBB"))
  class(result)
  class?XStringViews

and the help pages referenced there, or from which XStringViews inherits

   class(XStringViews)

and in particular

   class?Ranges

Rather than accessing the 'start' slot, use start(result). Vignettes are
used heavily in Bioconductor packages, and in particular

   browseVignettes("Biostrings")

pops up a page with several relevant vignettes, e.g., 'A short
presentation of the basic classes...' and perhaps 'Pairwise Sequence
Alignment'. These are also accessible on the Bioconductor web site,
e.g., on the pages linked from

  http://bioconductor.org/help/bioc-views/release/bioc/

The rule of thumb hinted at below -- that an operation seems to be
taking longer than it should -- probably indicates that the function is
being invoked in an inefficient way. If the documentation is opaque then
definitely the place to seek additional help is on the Bioconductor
mailing list

  http://bioconductor.org/help/mailing-list/

Hope this helps.

Martin


 Formal class 'XStringViews' [package "Biostrings"] with 7 slots
   ..@ subject        :Formal class 'BString' [package "Biostrings"] with 6 slots
   .. .. ..@ shared         :Formal class 'SharedRaw' [package "IRanges"] with 2 slots
   .. .. .. .. ..@ xp                    :<externalptr>
   .. .. .. .. ..@ .link_to_cached_object:<environment: 0x11e0e59f8>
   .. .. ..@ offset         : int 0
   .. .. ..@ length         : int 3
   .. .. ..@ elementMetadata: NULL
   .. .. ..@ elementType    : chr "ANY"
   .. .. ..@ metadata       : list()
   ..@ start          : int(0)
   ..@ width          : int(0)
   ..@ NAMES          : NULL
   ..@ elementMetadata: NULL
   ..@ elementType    : chr "integer"
   ..@ metadata       : list()
 
 Perhaps:
 
 length(matchPattern(fut_string, past_string)@start ) == 0
 
 You do need to use BString() on at least the past_string argument and
 maybe the fut_string as well. The BioConductor Mailing List would have a
 larger audience with experience using this package, so they should
 probably be your next avenue for advice. I am just reading the help
 pages as you should be able to do. The help page
 help("lowlevel-matching") should probably be reviewed since there may be
 efficiency issues to consider as mentioned below.
 
 When dropped into your function with the BString coercion, it replicated
 your small example results and did not crash after a long period with
 your larger example, so I then terminated it and inserted a reporter
 line to monitor progress. With that reporter I got up into the 200's for
 count_len without error. My laptop CPU was warming up the case and I was
 getting sleepy so I terminated the process. (I had no way of checking
 for accuracy, even if I had let it proceed, since you did not offer a
 correct answer.)
 
 By the way, the construct ... grepl(. , .) != 1 ... is perhaps
 inefficient. It could more compactly be expressed as ...   !grepl(. ,
 .)  which would not be doing coercion of logicals to integers.
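
The substring test discussed in this thread can be sketched as follows. The wrapper name contains_substring is hypothetical. When Biostrings is installed it uses countPattern() on a BString; otherwise it falls back to base R's grepl() with fixed = TRUE, which treats the pattern as literal text rather than a regular expression. The fallback is an assumption added here for illustration and was not proposed in the thread.

```r
# Hypothetical wrapper: TRUE when `pattern` occurs somewhere in `subject`.
contains_substring <- function(pattern, subject) {
  if (requireNamespace("Biostrings", quietly = TRUE)) {
    # Exact matching (no mismatches allowed) on a general-purpose BString.
    Biostrings::countPattern(pattern, Biostrings::BString(subject)) > 0
  } else {
    # Base R fallback: literal matching, no regular expression is compiled.
    grepl(pattern, subject, fixed = TRUE)
  }
}

contains_substring("bcd", "abcdef")
contains_substring("xyz", "abcdef")

# A logical can be negated directly; comparing it with 1 is not needed.
!contains_substring("xyz", "abcdef")
```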
 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

[R] Help reading table rows into lists

2010-10-10 Thread Alison Waller


Hi all,

I have a large table mapping thousands of COGs(groups of genes) to  
pathways.

# Ex
COG0001 patha   pathb   pathc
COG0002 pathd   pathe
COG0003 pathe   pathf   pathg   pathh
##

I would like to combine this information into a big list such as below
COG2PATHWAY <- list(COG0001 = c("patha", "pathb", "pathc"),
                    COG0002 = c("pathd", "pathe"),
                    COG0003 = c("pathf", "pathg", "pathh"))


I am stuck and have tried various methods involving (probably mangled)
versions of lapply and loops.


Any suggestions on the most efficient way to do this would be great.

Thanks,

Alison

Here is my latest attempt.

#

line_num <- length(scan(file="/g/bork8/waller/test_COGtoPath.txt", what="character", sep="\n"))

COG2Path <- vector("list", line_num)
COG2Path <- lapply(1:(line_num-1), function(x) scan(file="/g/bork8/waller/test_COGtopath.txt", skip=x, nlines=1, quiet=T, what='character', sep="\t"))


#

I am getting an error

#

COG2Path <- lapply(1:(line_num-1), function(x) scan(file="/g/bork8/waller/test_COGtopath.txt", skip=x, nlines=1, quiet=T, what='character', sep="\t"))

Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :

But if I do scan alone I don't get an error

# then I suppose it looks like the easiest way to name the list
variables is using unix to cut the first column out and then read that
in.
names(COG2Path) <- scan(file="/g/bork8/waller/test_col_names.txt", sep="\t", what="character")
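
One possible approach, sketched under the assumption that the file is tab-separated with the COG identifier in the first field: read all lines once, split each on tabs, and use the first field as the list name. The function name read_cog_pathways and the temporary demo file are illustrative only. Note also that the two scan() calls in the post spell the file name differently (test_COGtoPath.txt and test_COGtopath.txt), which on a case-sensitive file system might be enough to explain the "cannot open the connection" error.

```r
# Hypothetical helper: turn a ragged, tab-separated file into a named list.
read_cog_pathways <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]                  # drop empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)  # one character vector per row
  ids <- vapply(fields, function(x) x[1], character(1))
  pathways <- lapply(fields, function(x) x[-1])  # everything after the ID
  setNames(pathways, ids)
}

# Demo on a temporary file that mimics the example rows in the post.
demo <- tempfile(fileext = ".txt")
writeLines(c("COG0001\tpatha\tpathb\tpathc",
             "COG0002\tpathd\tpathe",
             "COG0003\tpathe\tpathf\tpathg\tpathh"), demo)
COG2PATHWAY <- read_cog_pathways(demo)
str(COG2PATHWAY)
```

Reading the file once avoids reopening it for every row, and the names come from the same pass, so no separate cut step is needed.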



Re: [R] Memory management in R

2010-10-10 Thread Mike Marchywka

 Date: Sun, 10 Oct 2010 15:27:11 +0200
 From: lorenzo.ise...@gmail.com
 To: dwinsem...@comcast.net
 CC: r-help@r-project.org
 Subject: Re: [R] Memory management in R

  I already offered the Biostrings package. It provides more robust
  methods for string matching than does grepl. Is there a reason that you
  choose not to?

 Indeed that is the way I should go for and I have installed the package
 after some struggling. Since biostring is a fairly complex package and I
 need only a way to check if a certain string A is a subset of string B,
 do you know the biostring functions to achieve this?
 I see a lot of methods for biological (DNA, RNA) sequences, and they may
 not apply to my series (which are definitely not from biology).

Generally the differences relate to the alphabet and the things you may want
to know about it. Unless you are looking for reverse-complement
text strings, there will be a lot of stuff you don't need. Offhand,
I'd be looking for things like computational linguistics packages,
as you are looking to find patterns or predictability in human-readable
character sequences. Now, humans can probably write hairpin text (look
at what RNA can do, LOL), but this is probably not what you care about.

However, as I mentioned earlier, I had to write my own regex compiler
(coincidentally, for bio apps) to get the required performance. Your
application and understanding may benefit from things like building
dictionaries, which aren't really part of regex and which can easily be
done in a few lines of C++ code using STL containers. To get statistically
meaningful samples, you will almost certainly need faster code.
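
As a rough illustration of the dictionary idea, here is a sketch in R rather than C++, so it does not address the speed concern. It tabulates every substring of a fixed length k in a string; the function name count_substrings and the choice of fixed-length substrings are assumptions made for this example.

```r
# Hypothetical helper: count how often each length-k substring occurs in x.
count_substrings <- function(x, k) {
  if (nchar(x) < k) return(table(character(0)))  # nothing to count
  starts <- seq_len(nchar(x) - k + 1)
  pieces <- substring(x, starts, starts + k - 1) # all windows of width k
  table(pieces)
}

counts <- count_substrings("abababc", 2)
counts
as.integer(counts["ab"])
```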

 Cheers

 Lorenzo


Re: [R] Create single vector after looping through multiple data frames with GREP

2010-10-10 Thread Simon Kiss

Hello all, 

I changed the subject line of the e-mail, because the question I'm posing now
is different from the first one. I hope that this is proper etiquette.
However, the original chain is included below.

I've incorporated bits of both Ethan's and Brian's code into the script below,
but there's one aspect I can't get my head around. I'm totally new to
programming with control structures. The reproducible code below creates a list
containing 19 data frames, one for each year of the Most Important Problem
survey data for Canada.

What I'd like at this stage is a loop where I can search through all the data
frames for rows containing the search term and then bind the rows together in a
plottable format.

At the bottom of the code below, you'll find my first attempt to make use of a
search string and to put it into a plottable format. It only partially works.
I can only get the numbers for one year, where I'd like to be able to get a
string of numbers for several years. But, on the upside, grep appears to do the
trick in terms of selecting rows.

Can anyone suggest a solution?
Yours truly,
Simon Kiss

#This is the reproducible code to set-up all the data frames
require(XML)
library(XML)
#This gets the data from the web and lists them
mylist <- paste("http://www.queensu.ca/cora/_trends/mip_",
c(1987:2001,2003:2006), ".htm", sep="")
alltables <- lapply(mylist, readHTMLTable)

#convert to dataframes
r <- lapply(alltables, function(x) {as.data.frame(x)} )

#This is just some house-cleaning; structuring all the tables so they are
#uniform
r[[1]][3] <- r[[1]][2]
r[[1]][2] <- c(" ")
r[[2]][4] <- r[[2]][2]
r[[2]][5] <- r[[2]][3]
r[[2]][2:3] <- c(" ")
r[[3]][4:5] <- r[[3]][3:4]
r[[3]][3] <- c(" ")

#This loop deletes some superfluous columns and rows, turns the first column
#into character strings and the data into numeric
for (i in 1:19) {
n.rows <- dim(r[[i]])[1]
r[[i]] <- r[[i]][15:n.rows-3, 1:5]
n.rows <- dim(r[[i]])[1]
row.names(r[[i]]) <- NULL
names(r[[i]]) <- c("Response", "Q1", "Q2", "Q3", "Q4")

r[[i]][, 1] <- as.character(r[[i]][,1])
#r[[i]][,2:5] <- as.numeric(as.character(r[[i]][,2:5]))
r[[i]][, 2:5] <- lapply(r[[i]][, 2:5], function(x) {as.numeric(as.character(x))})
#n.rows <- dim(r[[i]])[1]
#r[[i]] <- r[[i]][9
}
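
One detail in the loop above may be worth checking: in R the colon operator binds more tightly than subtraction, so an index written as 15:n.rows-3 is evaluated as (15:n.rows) - 3. The short sketch below uses a made-up n.rows of 20 to show the difference; which of the two ranges was intended is not stated in the post.

```r
n.rows <- 20                 # made-up value, for illustration only

a <- 15:n.rows - 3           # parsed as (15:n.rows) - 3
b <- 15:(n.rows - 3)         # parentheses force the subtraction first

a
b
```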

#This code is my first attempt at introducing a search string, getting the
#rows, binding and plotting;
economy <- r[[10]][grep('Economy', r[[10]][,1]),]
economy_2 <- r[[11]][grep('Economy', r[[11]][,1]),]
test <- cbind(economy, economy_2)
plot(as.numeric(test), type='l')

#here's another attempt I'm trying
economy <- data.frame
for (i in 15:19) {
economy[i,] <- r[[i]][grep('Economy', r[[i]][,1]), ]
}
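
A possible way to collect the matching rows from every data frame, sketched on toy data rather than the scraped tables. The function name collect_rows, the labels argument, and the toy values are all assumptions for illustration. The idea is to extract the matching rows from each frame with lapply(), tag them with a label such as the year, and stack them with rbind().

```r
# Hypothetical helper: pull rows whose first column matches `term` from each
# data frame in `frames`, tag them with a label, and stack the results.
collect_rows <- function(frames, term, labels = seq_along(frames)) {
  hits <- lapply(seq_along(frames), function(i) {
    df <- frames[[i]]
    rows <- df[grep(term, df[, 1]), , drop = FALSE]
    if (nrow(rows) > 0) cbind(source = labels[i], rows) else NULL
  })
  hits <- Filter(Negate(is.null), hits)   # skip frames without a match
  do.call(rbind, hits)
}

# Toy stand-ins for two yearly tables; the numbers are made up.
toy <- list(
  data.frame(Response = c("Economy", "Health"), Q1 = c(10, 20), Q2 = c(11, 21),
             stringsAsFactors = FALSE),
  data.frame(Response = c("Health", "Economy"), Q1 = c(22, 12), Q2 = c(23, 13),
             stringsAsFactors = FALSE)
)

economy <- collect_rows(toy, "Economy", labels = c(1996, 1997))
economy

# Flatten the quarterly columns row by row into one series for plotting.
series <- as.vector(t(as.matrix(economy[, c("Q1", "Q2")])))
series
```

With the real list, a call along the lines of plot(series, type = "l") would then draw the stacked quarters in chronological order, provided the frames are passed in year order.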

Begin forwarded message:

 From: Simon Kiss sjk...@gmail.com
 Date: October 7, 2010 4:59:46 PM EDT
 To: Simon Kiss simonjk...@yahoo.ca
 Subject: Fwd: [R] Converting scraped data
 
 
 
 Begin forwarded message:
 
 From: Ethan Brown ethancbr...@gmail.com
 Date: October 6, 2010 4:22:41 PM GMT-04:00
 To: Simon Kiss sjk...@gmail.com
 Cc: r-help@r-project.org
 Subject: Re: [R] Converting scraped data
 
 Hi Simon,
 
 You'll notice the test data.frame has a whole mix of characters in
 the columns you're interested, including a - for missing values, and
 that the columns you're interested in are in fact factors.
 
 as.numeric(factor) returns the level of the factor, not the value of
 the level. (See ?levels and ?factor)--that's why it's giving you those
 irrelevant integers. I always end up using something like this handy
 code snippet to deal with the situation:
 
 unfactor <- function(factors)
 # From http://psychlab2.ucr.edu/rwiki/index.php/R_Code_Snippets#unfactor
 # Transform a factor back into its factor names
 {
  return(levels(factors)[factors])
 }
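
The pitfall described above can be seen on a small made-up factor. The values here are invented for illustration; the "-" entry stands in for the missing-value marker mentioned in the message.

```r
f <- factor(c("12", "7", "-", "30"))   # made-up column as it might be scraped

codes <- as.numeric(f)                 # internal level codes, not the values

# Going through the level labels recovers the numbers; "-" becomes NA,
# so the coercion warning is silenced here on purpose.
vals <- suppressWarnings(as.numeric(levels(f)[f]))

codes
vals
```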
 
 Then, to get your data to where you want it, I'd do this:
 
 require(XML)
 theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
 tables <- readHTMLTable(theurl)
 n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
 class(tables)
 test <- data.frame(tables, stringsAsFactors=FALSE)
 
 
 result <- test[11:42, 1:5] #Extract the actual data we want
 names(result) <- c("Response", "Q1", "Q2","Q3","Q4")
 for(i in 2:5) {
 # Convert factor columns to numeric
 result[,i] <- as.numeric(unfactor(result[,i]))
 }
 result
 
 From here you should be able to plot or do whatever else you want.
 
 Hope this helps,
 Ethan Brown
 
 
 On Wed, Oct 6, 2010 at 9:52 AM, Simon Kiss sjk...@gmail.com wrote:
 Dear Colleagues,
 I used this code to scrape data from the URL contained within.  This code
 should be reproducible.
 
 require(XML)
 library(XML)
 theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
 tables <- readHTMLTable(theurl)
 n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
 class(tables)
 test <- data.frame(tables, stringsAsFactors=FALSE)
 test[16,c(2:5)]
 as.numeric(test[16,c(2:5)])
 quartz()
 plot(c(1:4), test[15, c(2:5)])
 
 calling the values from the row of interest using test[16, c(2:5)] can bring
 them up as represented on the screen, plotting them or coercing them to
 numeric changes the values and in a way that doesn't make

Re: [R] segfault caused by `icfit` in `interval` package

2010-10-10 Thread Yuliya Matveyeva

On the main site I have only found
R-2.11.1.tar.gz <http://cran.gis-lab.info/src/base/R-2/R-2.11.1.tar.gz> to
be the latest release (the latest stable release, as far as I understand
it). But unfortunately it doesn't pass `make check` on my system (that
is probably the reason why the `emerge` command keeps telling me that my R
version is up-to-date). Maybe I should post a separate message about this
fact, but I am guessing I shouldn't, because making a new release suitable
for all OSes is probably just a matter of time.
But before I write to the package maintainer directly, could you please tell
me if there might be a non-package-specific reason for a segfault in my case?

Sincerely yours,
Yuliya Matveyeva.

2010/10/10 Uwe Ligges lig...@statistik.tu-dortmund.de

 Please try with the R-2.12.0 ReleaseCandidate and the most recent version
 of the package (please report its version number!).

 If it still fails, it may be a good idea to contact the package maintainer
 of interval first.

 Uwe Ligges





Re: [R] segfault caused by `icfit` in `interval` package

2010-10-10 Thread Yuliya Matveyeva

The package version is 1.0-1.0, as reported by
packageDescription("interval").

2010/10/10 Uwe Ligges lig...@statistik.tu-dortmund.de

 Please try with the R-2.12.0 ReleaseCandidate and the most recent version
 of the package (please report its version number!).

 If it still fails, it may be a good idea to contact the package maintainer
 of interval first.

 Uwe Ligges




 On 10.10.2010 17:33, Yuliya Matveyeva wrote:

 Dear R community,
  I am using the R package `interval` in order to perform some modelling
 tests of the
 NPMLE convergence in the case of censoring. So all I am doing is drawing a
 sample
 from exponential distribution, making it a censored sample and computing
 the
 NPMLE of
 its distribution function. But when run on Linux Calculate 10.4 the
 program
 keeps
 crashing and reporting a segmentation fault
 after the call to the `icfit` function when the sample size gets to 70.
 When run on Windows 7 it seems to be fine.
 That is why I am totally confused and have decided to ask for help.

 I have attached the code I am running which results in a segmentation
 fault
 if run on Linux Calculate.
 It has the seed set to the value which leads to this error. But it is
 important to note
 that if the parameters used in the program and the seed are changed it
 doesn't
 necessarily crash.

 Here is the description of my R version and OS:

  sessionInfo()

 R version 2.10.1 (2009-12-14)
 i686-pc-linux-gnu

 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 After calling `icfit` the program quits with the following output
 (I have replaced the output concerning the arguments passed to
 initcomputeMLE by
   arguments passed to initcomputeMLE  so that the description of
 the

 output wouldn't be too long):

  *** caught segfault ***
 address 0xc, cause 'memory not mapped'

 Traceback:
  1: .Call(ComputeMLEForR, R, B, max.inner, max.outer, tol)
  2: computeMLE(R, B, max.inner = max.inner, max.outer = max.outer, tol
 =
 tol)
  3: initcomputeMLE(  arguments passed to initcomputeMLE)
  4: do.call(initfit, args = list(L = L, R = R, Lin = Lin, Rin = Rin, A
 =
 A))
  5: doTryCatch(return(expr), name, parentenv, handler)
  6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
  7: tryCatchList(expr, classes, parentenv, handlers)
  8: tryCatch(expr, error = function(e) {call- conditionCall(e)if
 (!is.null(call)) {if (identical(call[[1L]],
 quote(doTryCatch))) call- sys.call(-4L)dcall-
 deparse(call)[1L]prefix- paste(Error in, dcall, : )
 LONG- 75Lmsg- conditionMessage(e)sm- strsplit(msg,
 \n)[[1L]]w- 14L + nchar(dcall, type = w) + nchar(sm[1L], type
 = w)if (is.na(w)) w- 14L + nchar(dcall, type =
 b)
 + nchar(sm[1L], type = b)if (w  LONG)
 prefix- paste(prefix, \n  , sep = )}else prefix- Error :
 msg- paste(prefix, conditionMessage(e), \n, sep = )
 .Internal(seterrmessage(msg[1L]))if (!silent
 identical(getOption(show.error.messages), TRUE)) {
  cat(msg,
 file = stderr()).Internal(printDeferredWarnings())}
 invisible(structure(msg, class = try-error))})
  9: try(do.call(initfit, args = list(L = L, R = R, Lin = Lin, Rin = Rin,
 A = A)))
 10: icfit.default(L = left, R = right)
 11: icfit(L = left, R = right)

 Possible actions:
 1: abort (with core dump, if enabled)
 2: normal R exit
 3: exit R without saving workspace
 4: exit R saving workspace

 I would greatly appreciate any help provided.
 Sincerely yours,
 Yuliya Matveyeva.



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] segfault caused by `icfit` in `interval` package

2010-10-10 Thread Uwe Ligges




On 10.10.2010 19:10, Yuliya Matveyeva wrote:

On the main site, the latest release I could find is R-2.11.1.tar.gz
(http://cran.gis-lab.info/src/base/R-2/R-2.11.1.tar.gz), which is the latest
stable release as far as I understand. Unfortunately it doesn't pass
`make check` on my system (that is probably why the `emerge` command keeps
telling me that my R version is up-to-date). Maybe I should post a separate
message about this, but I am guessing I shouldn't, because making a new
release suitable for all OSes is probably just a matter of time.
Before I write to the package maintainer directly, could you please tell me
whether there might be a non-package-specific reason for a segfault in my case?



Probably it is the package, but we would need a reproducible example to 
check it.
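
A reproducible example for this kind of report might look like the sketch below. It is only a guess at what the attached script does: the seed, the rate and the inspection grid are made up for illustration, and the names `t_true`, `grid`, `left` and `right` are hypothetical. The fit is attempted only if the interval package is installed.

```r
# Sketch of a self-contained example (seed, rate and grid are hypothetical,
# not the values from the attached script).
set.seed(20101010)
n <- 70
t_true <- rexp(n, rate = 1)          # unobserved event times

# Interval-censor each time on a fixed inspection grid: the event is only
# known to lie in (left, right]; right = Inf means right-censored.
grid <- seq(0, 3, by = 0.5)
left <- vapply(t_true, function(t) max(grid[grid < t]), numeric(1))
right <- vapply(t_true, function(t) {
  later <- grid[grid >= t]
  if (length(later) > 0) min(later) else Inf
}, numeric(1))

# Only attempt the fit when the package is available, and report its version.
if (requireNamespace("interval", quietly = TRUE)) {
  print(packageVersion("interval"))
  fit <- interval::icfit(L = left, R = right)
  print(fit)
}
print(sessionInfo())
```

Posting something of this shape, together with the package version, lets others run exactly the same call on their own systems.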


Uwe Ligges



Sincerely yours,
Yuliya Matveyeva.

2010/10/10 Uwe Ligges lig...@statistik.tu-dortmund.de


Please try with the R-2.12.0 release candidate and the most recent version
of the package (please report its version number!).

If it still fails, it may be a good idea to contact the package maintainer
of interval first.

Uwe Ligges




On 10.10.2010 17:33, Yuliya Matveyeva wrote:


Dear R community,
I am using the R package `interval` to run some modelling tests of NPMLE
convergence in the case of censoring. All I am doing is drawing a sample
from an exponential distribution, censoring it, and computing the NPMLE of
its distribution function. But when run on Linux Calculate 10.4, the program
keeps crashing and reporting a segmentation fault after the call to the
`icfit` function once the sample size reaches 70.
When run on Windows 7 it seems to be fine.
That is why I am totally confused and have decided to ask for help.

I have attached the code I am running, which results in a segmentation
fault if run on Linux Calculate. It has the seed set to the value that
leads to this error. But it is important to note that if the parameters
used in the program and the seed are changed, it doesn't necessarily crash.

Here is the description of my R version and OS:

  sessionInfo()



R version 2.10.1 (2009-12-14)
i686-pc-linux-gnu

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

After calling `icfit` the program quits with the following output
(I have replaced the arguments passed to initcomputeMLE with
<arguments passed to initcomputeMLE> so that the output wouldn't be
too long):

  *** caught segfault ***
address 0xc, cause 'memory not mapped'

Traceback:
  1: .Call(ComputeMLEForR, R, B, max.inner, max.outer, tol)
  2: computeMLE(R, B, max.inner = max.inner, max.outer = max.outer, tol = tol)
  3: initcomputeMLE(<arguments passed to initcomputeMLE>)
  4: do.call(initfit, args = list(L = L, R = R, Lin = Lin, Rin = Rin, A = A))
  5: doTryCatch(return(expr), name, parentenv, handler)
  6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
  7: tryCatchList(expr, classes, parentenv, handlers)
  8: tryCatch(expr, error = function(e) {
         call <- conditionCall(e)
         if (!is.null(call)) {
             if (identical(call[[1L]], quote(doTryCatch)))
                 call <- sys.call(-4L)
             dcall <- deparse(call)[1L]
             prefix <- paste("Error in", dcall, ": ")
             LONG <- 75L
             msg <- conditionMessage(e)
             sm <- strsplit(msg, "\n")[[1L]]
             w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")
             if (is.na(w))
                 w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b")
             if (w > LONG)
                 prefix <- paste(prefix, "\n  ", sep = "")
         }
         else prefix <- "Error : "
         msg <- paste(prefix, conditionMessage(e), "\n", sep = "")
         .Internal(seterrmessage(msg[1L]))
         if (!silent && identical(getOption("show.error.messages"), TRUE)) {
             cat(msg, file = stderr())
             .Internal(printDeferredWarnings())
         }
         invisible(structure(msg, class = "try-error"))
     })
  9: try(do.call(initfit, args = list(L = L, R = R, Lin = Lin, Rin = Rin,
     A = A)))
 10: icfit.default(L = left, R = right)
 11: icfit(L = left, R = right)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

I would greatly appreciate any help provided.
Sincerely yours,
Yuliya Matveyeva.




Re: [R] Help reading table rows into lists

2010-10-10 Thread Gabor Grothendieck

On Sun, Oct 10, 2010 at 11:40 AM, Alison Waller alison.wal...@embl.de wrote:
 Hi all,

 I have a large table mapping thousands of COGs(groups of genes) to pathways.
 # Ex
 COG0001 patha   pathb   pathc
 COG0002 pathd   pathe
 COG0003 pathe   pathf   pathg   pathh
 ##

 I would like to combine this information into a big list such as below
 COG2PATHWAY-list(COG0001=c(patha,pathb,pathc),COG0002=c(pathd,pathe),COG0003=c(pathf,pathg,pathh))

 I am stuck and have tried various methods involving (probably mangled)
 versions of lappy and loops.

 Any suggestions on the most efficient way to do this would be great.


Try this:


Lines <- "COG0001 patha   pathb   pathc
COG0002 pathd   pathe
COG0003 pathe   pathf   pathg   pathh"
DF <- read.table(textConnection(Lines), header = FALSE,
 fill = TRUE, as.is = TRUE, na.strings = "")

library(reshape2)
m <- na.omit(melt(DF, 1))
result <- unstack(m, value ~ V1)

giving

 result
$COG0001
[1] patha pathb pathc

$COG0002
[1] pathd pathe

$COG0003
[1] pathe pathf pathg pathh


or

 acast(m, value ~ V1)
      COG0001 COG0002 COG0003
patha patha   NA      NA
pathb pathb   NA      NA
pathc pathc   NA      NA
pathd NA      pathd   NA
pathe NA      pathe   pathe
pathf NA      NA      pathf
pathg NA      NA      pathg
pathh NA      NA      pathh
Levels: patha pathb pathc pathd pathe pathf pathg pathh

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] Matching long strings ... was Re: Memory management in R

2010-10-10 Thread David Winsemius



On Oct 10, 2010, at 11:35 AM, Martin Morgan wrote:


On 10/10/2010 07:11 AM, David Winsemius wrote:


On Oct 10, 2010, at 9:27 AM, Lorenzo Isella wrote:




I already offered the Biostrings package. It provides more robust
methods for string matching than does grepl. Is there a reason that you
choose not to?



Indeed that is the way I should go for and I have installed the
package after some struggling.


For me it was a matter of waiting. The only struggle was coming from my
inner timer saying it was taking too long.


Since biostring is a fairly complex package and I need only a way to
check if a certain string A is a subset of string B, do you know the
biostring functions to achieve this?
I see a lot of methods for biological (DNA, RNA) sequences, and they
may not apply to my series (which are definitely not from biology).
Cheers


It appeared to me that the function matchPattern should replace your
grepl invocation that was failing. It returns a more complex structure,
so you would need to determine what would be an exact replacement for
grepl(...) != 1. Looks like a no-match event results in the start and
end items being of length 0.


str(matchPattern("A", BString("BBB")))


A couple of things from this thread.

To install a Bioconductor package follow directions here

 http://bioconductor.org/install/index.html#install-bioconductor-packages

which leads to

  source("http://bioconductor.org/biocLite.R")
  biocLite("Biostrings")

biocLite is just a wrapper around install.packages with appropriate
repositories defined.

Some Bioconductor packages are relatively mature and make relatively
advanced use of S4 classes, so looking at str() is not that helpful --
the way the user is meant to interact with the object is different from
the way the object is implemented. So the best bet is to look at the
relevant help pages

 result = matchPattern("A", BString("BBB"))
 class(result)
 class?XStringViews


The above was the most surprising example for me (not being  
particularly S4-savvy). Looks like it parses as:

`?`(class, XStringViews)

Is that an S4 sort of extension for accessing documentation or have I  
just missed a more general method? I tried looking at the help Index  
for the methods package.




and the help pages referenced there, or from which XStringViews inherits

  class(XStringViews)

and in particular

  class?Ranges

Rather than accessing the 'start' slot, use start(result). Vignettes are
used heavily in Bioconductor packages, and in particular

  browseVignettes("Biostrings")

pops up a page with several relevant vignettes, e.g., 'A short
presentation of the basic classes...' and perhaps 'Pairwise Sequence
Alignment'. These are also accessible on the Bioconductor web site,
e.g., on the pages linked from

 http://bioconductor.org/help/bioc-views/release/bioc/

The rule of thumb hinted at below -- that an operation seems to be
taking longer than it should -- probably indicates that the function is
being invoked in an inefficient way. If the documentation is opaque then
definitely the place to seek additional help is on the Bioconductor
mailing list

 http://bioconductor.org/help/mailing-list/

Hope this helps.

Martin



Formal class 'XStringViews' [package "Biostrings"] with 7 slots
 ..@ subject:Formal class 'BString' [package "Biostrings"] with 6 slots
 .. .. ..@ shared :Formal class 'SharedRaw' [package "IRanges"] with 2 slots
 .. .. .. .. ..@ xp:<externalptr>
 .. .. .. .. ..@ .link_to_cached_object:<environment: 0x11e0e59f8>
 .. .. ..@ offset : int 0
 .. .. ..@ length : int 3
 .. .. ..@ elementMetadata: NULL
 .. .. ..@ elementType: chr "ANY"
 .. .. ..@ metadata   : list()
 ..@ start  : int(0)
 ..@ width  : int(0)
 ..@ NAMES  : NULL
 ..@ elementMetadata: NULL
 ..@ elementType: chr "integer"
 ..@ metadata   : list()

Perhaps:

length(matchPattern(fut_string, past_string)@start ) == 0

You do need to use BString() on at least the past_string argument and
maybe the fut_string as well. The Bioconductor mailing list would have a
larger audience with experience using this package, so they should
probably be your next avenue for advice. I am just reading the help
pages as you should be able to do. The help page
help("lowlevel-matching") should probably be reviewed since there may be
efficiency issues to consider as mentioned below.
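
A minimal sketch of the substring check discussed here, assuming plain character strings as input. The helper name `contains_pattern` is hypothetical; it counts the views returned by matchPattern rather than reading the 'start' slot, and falls back to a fixed-string grepl when Biostrings is not installed.

```r
# TRUE if `pattern` occurs somewhere in `subject` (both plain strings).
# `contains_pattern` is an illustrative name, not a Biostrings function.
contains_pattern <- function(pattern, subject) {
  if (requireNamespace("Biostrings", quietly = TRUE)) {
    hits <- Biostrings::matchPattern(pattern, Biostrings::BString(subject))
    length(hits) > 0                        # zero views means no match
  } else {
    grepl(pattern, subject, fixed = TRUE)   # base-R fallback, literal match
  }
}

contains_pattern("A", "BBB")
contains_pattern("BB", "ABBA")
```

Whether this is fast enough for very long strings is a separate question; the lowlevel-matching help page mentioned above is the place to check.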

When dropped into your function with the BString coercion, it replicated
your small example results and did not crash after a long period with
your larger example, so I then terminated it and inserted a reporter
line to monitor progress. With that reporter I got up into the 200's for
count_len without error. My laptop CPU was warming up the case and I was
getting sleepy so I terminated the process. (I had no way of checking
for accuracy, even if I had let it proceed, since you did not offer a
correct answer.)

By the way, the construct ... grepl(. , .) != 1

Re: [R] Matching long strings ... was Re: Memory management in R

2010-10-10 Thread Martin Morgan

On 10/10/2010 11:00 AM, David Winsemius wrote:
 
 On Oct 10, 2010, at 11:35 AM, Martin Morgan wrote:
 
 On 10/10/2010 07:11 AM, David Winsemius wrote:

 On Oct 10, 2010, at 9:27 AM, Lorenzo Isella wrote:


 I already offered the Biostrings package. It provides more robust
 methods for string matching than does grepl. Is there a reason that
 you
 choose not to?


 Indeed that is the way I should go for and I have installed the
 package after some struggling.

 For me is was a matter of waiting. The only struggle was coming from my
 inner timer saying it was taking too long.

 Since biostring is a fairly complex package and I need only a way to
 check if a certain string A is a subset of string B, do you know the
 biostring functions to achieve this?
 I see a lot of methods for biological (DNA, RNA) sequences, and they
 may not apply to my series (which are definitely not from biology).
 Cheers

 It appeared to me that the function matchPattern should replace your
 grepl invocation that was failing. It returns a more complex structure,
 so you would need to determine what would be an exact replacement for
 grepl(...) != 1. Looks like a no-match event results in the start and
 end items being of length 0.

 str(matchPattern("A", BString("BBB")))

 A couple of things from this thread.

 To install a Bioconductor package follow directions here

  http://bioconductor.org/install/index.html#install-bioconductor-packages

 which leads to

   source("http://bioconductor.org/biocLite.R")
   biocLite("Biostrings")

 biocLite is just a wrapper around install.packages with appropriate
 repositories defined.

 Some Bioconductor packages are relatively mature and make relatively
 advanced use of S4 classes, so looking at str() is not that helpful --
 the way the user is meant to interact with the object is different from
 the way the object is implemented. So the best bet is to look at the
 relevant help pages

  result = matchPattern("A", BString("BBB"))
  class(result)
  class?XStringViews
 
 The above was the most surprising example for me (not being particularly
 S4-savvy). Looks like it parses as:
 `?`(class, XStringViews)

similarly ?"XStringViews-class"

 Is that an S4 sort of extension for accessing documentation or have I
 just missed a more general method? I tried looking at the help Index for
 the methods package.

?"?" documents type?topic. It is more general, in that package?stats
takes one to the 'stats' topic amongst the 'package' doc-type help
pages. It relies on package authors choosing appropriate docTypes for
their man pages.
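
The parsing question can be checked directly in base R, without Biostrings installed. The sketch below only inspects the unevaluated expression, so no help page is opened; the variable name `e` is arbitrary.

```r
# `?` is an ordinary operator that can be used with one or two operands.
# quote() keeps the expression unevaluated so it can be inspected.
e <- quote(class?XStringViews)

as.list(e)   # the function `?`, then the symbols class and XStringViews

# The infix form and the explicit call are the same language object.
identical(e, quote(`?`(class, XStringViews)))

# Related forms (not run here, since they open help pages):
#   package?stats
#   ?"XStringViews-class"
```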

One S4 paradigm that can be useful is the analog of methods(class="lm"),
which is showMethods(class="XStringViews", where="package:Biostrings").

Martin

 

 and the help pages referenced there, or from which XStringViews inherits

   class(XStringViews)

 and in particular

   class?Ranges

 Rather than accessing the 'start' slot, use start(result). Vignettes are
 used heavily in Bioconductor packages, and in particular

   browseVignettes(Biostrings)

 pops up a page with several relevant vignettes, e.g., 'A short
 presentation of the basic classes...' and perhaps 'Pairwise Sequence
 Alignment'. These are also accessible on the Bioconductor web site,
 e.g., on the pages linked from

  http://bioconductor.org/help/bioc-views/release/bioc/

 The rule of thumb hinted at below -- that an operation seems to be
 taking longer than it should -- probably indicates that the function is
 being invoked in an inefficient way. If the documentation is opaque then
 definitely the place to seek additional help is on the Bioconductor
 mailing list

  http://bioconductor.org/help/mailing-list/

 Hope this helps.

 Martin


 Formal class 'XStringViews' [package Biostrings] with 7 slots
  ..@ subject:Formal class 'BString' [package Biostrings] with
 6 slots
  .. .. ..@ shared :Formal class 'SharedRaw' [package IRanges]
 with 2 slots
  .. .. .. .. ..@ xp:externalptr
  .. .. .. .. ..@ .link_to_cached_object:environment: 0x11e0e59f8
  .. .. ..@ offset : int 0
  .. .. ..@ length : int 3
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ elementType: chr ANY
  .. .. ..@ metadata   : list()
  ..@ start  : int(0)
  ..@ width  : int(0)
  ..@ NAMES  : NULL
  ..@ elementMetadata: NULL
  ..@ elementType: chr integer
  ..@ metadata   : list()

 Perhaps:

 length(matchPattern(fut_string, past_string)@start ) == 0

 You do need to use BString() on at least the past_string argument and
 maybe the fut_string as well. The BioConductor Mailing List would have a
 larger audience with experience using this package, so they should
 probably be your next avenue for advice. I am just reading the help
 pages as you should be able to do. The help page
 help(lowlevel-matching) should probably be reviewed since there may be
 efficiency issues to consider as mentioned below.

 When dropped into your function with the BString coercion, it replicated
 your

Re: [R] Help reading table rows into lists

2010-10-10 Thread Jeffrey Spies

To get just the list you wanted, Gabor's solution is more elegant, but
here's another using the apply family.  First, your data:

dat <- scan(file="/g/bork8/waller/test_COGtoPath.txt", what="character", sep="\n")

I expect dat to be a vector of strings where each string is a line of
values separated by tabs, which I think, by looking at your other
code, is what you get.

sapply(dat, function(x){
    tmp <- unlist(strsplit(x, '\t', fixed=TRUE))
    out <- list(tmp[seq_along(tmp)[-1]])
    names(out) <- tmp[1]
    out
}, USE.NAMES=FALSE)

The one difference between the two is that if you have a COG with no
pathways (might not be realistic or that big of a deal), this solution
will have the COG name in the list with a value of character(0) where
Gabor's will omit the COG completely. Again, probably not a big deal.
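
The character(0) behaviour can be seen on a small in-memory example. This is a sketch in base R only: the four strings below stand in for the lines of the real file, and COG0004 is a made-up entry with no pathways.

```r
# Stand-in for the lines read by scan(); COG0004 is a hypothetical
# COG with no pathways.
dat <- c("COG0001\tpatha\tpathb\tpathc",
         "COG0002\tpathd\tpathe",
         "COG0003\tpathe\tpathf\tpathg\tpathh",
         "COG0004")

# Split each line on tabs: the first field is the COG name, the rest
# are its pathways.
fields <- strsplit(dat, "\t", fixed = TRUE)
COG2PATHWAY <- lapply(fields, function(x) x[-1])
names(COG2PATHWAY) <- vapply(fields, function(x) x[1], character(1))

str(COG2PATHWAY)
```

Dropping the first element with `x[-1]` is what leaves an empty character vector for a COG that has a name but no pathways.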

Cheers,

Jeff.

On Sun, Oct 10, 2010 at 11:40 AM, Alison Waller alison.wal...@embl.de wrote:
 Hi all,

 I have a large table mapping thousands of COGs(groups of genes) to pathways.
 # Ex
 COG0001 patha   pathb   pathc
 COG0002 pathd   pathe
 COG0003 pathe   pathf   pathg   pathh
 ##

 I would like to combine this information into a big list such as below
 COG2PATHWAY-list(COG0001=c(patha,pathb,pathc),COG0002=c(pathd,pathe),COG0003=c(pathf,pathg,pathh))

 I am stuck and have tried various methods involving (probably mangled)
 versions of lappy and loops.

 Any suggestions on the most efficient way to do this would be great.

 Thanks,

 Alison

 Here is my latest attempt.

 #

 line_num-length(scan(file=/g/bork8/waller/test_COGtoPath.txt,what=character,sep=\n))
 COG2Path-vector(list,line_num)
 COG2Path-lapply(1:(line_num-1),function(x)
 scan(file=/g/bork8/waller/test_COGtopath.txt,skip=x,nlines=1,quiet=T,what='character',sep=\t))

 #

 I am getting an error

 #

COG2Path-lapply(1:(line_num-1),function(x)
 scan(file=/g/bork8/waller/test_COGtopath.txt,skip=x,nlines=1,quiet=T,what='character',sep=\t))
 Error in file(file, r) : cannot open the connection
 In addition: Warning message:
 In file(file, r) :

 But if I do scan alone I don't get an error

 # then I suppose it looks like the easiest wasy to name the list variables
 is using unix to cut the first column out and then read that in.
 names(COG2Path)-scan(file=/g/bork8/waller/test_col_names.txt,sep=\t,what=character)


Re: [R] Hausman test for endogeneity

2010-10-10 Thread Holger Steinmetz


Dear Arne,

this looks promising! Thank you very much.

Best,
Holger

Re: [R] Help reading table rows into lists

2010-10-10 Thread Gabor Grothendieck

On Sun, Oct 10, 2010 at 2:59 PM, Jeffrey Spies jsp...@virginia.edu wrote:
 To get just the list you wanted, Gabor's solution is more elegant, but
 here's another using the apply family.  First, your data:

 dat - 
 scan(file=/g/bork8/waller/test_COGtoPath.txt,what=character,sep=\n)

 I expect dat to be a vector of strings where each string is a line of
 values separated by tabs, which I think, by looking at your other
 code, is what you get.

 sapply(dat, function(x){
    tmp-unlist(strsplit(x, '\t', fixed=T))
    out - list(tmp[seq_along(tmp)[-1]])
    names(out) - tmp[1]
    out
 }, USE.NAMES=F)

 The one difference between the two is that if you have a COG with no
 pathways (might not be realistic or that big of a deal), this solution
 will have the COG name in the list with a value of character(0) where
 Gabor's will omit the COG completely. Again, probably not a big deal.

If that is important then do it this way:

Lines <- "COG0001 patha   pathb   pathc
COG0002 pathd   pathe
COG0003 pathe   pathf   pathg   pathh
COG0004"
DF <- read.table(textConnection(Lines), header = FALSE,
fill = TRUE, as.is = TRUE, na.strings = "")

library(reshape2)
m <- melt(DF, 1)
lapply(unstack(m, value ~ V1), complete.cases)

acast(m, value ~ V1)
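
For comparison, here is a sketch that keeps the pathway-less COG without reshape2, working row by row on the data frame. It is not the approach in the post above, just an alternative in base R; the name `paths` is arbitrary, and `read.table(text = ...)` assumes a reasonably recent R.

```r
Lines <- "COG0001 patha   pathb   pathc
COG0002 pathd   pathe
COG0003 pathe   pathf   pathg   pathh
COG0004"
DF <- read.table(text = Lines, header = FALSE, fill = TRUE,
                 as.is = TRUE, na.strings = "")

# For each row, take every column except the COG name and drop the
# padding that fill = TRUE added (NA, or "" as a safeguard).
paths <- lapply(seq_len(nrow(DF)), function(i) {
  x <- unlist(DF[i, -1], use.names = FALSE)
  x[!is.na(x) & nzchar(x)]
})
names(paths) <- DF$V1

str(paths)
```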


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[R] venneuler (java?) color palette 0 - 1

2010-10-10 Thread Karl Brand


Dear UseRs and DevelopeRs

It would be helpful to see the color palette available in the 
venneuler() function.


The relevant part of ?venneuler states:

colors: colors of the circles as values between 0 and 1

-which explains color specification, but from what palette? Short of 
trial and error, I'd really appreciate it if someone could help me locate 
a 0 - 1 palette for this function to aid with color selection.


FWIW, i tried the below code and received the displayed error. I failed 
to turn up any solutions to this error...


Any suggestions appreciated,

Karl


library(venneuler)

ve <- venneuler(c(A=1, B=2, C=3, "A&C"=0.5, "A&B&C"=0.1))

class(ve)
[1] "VennDiagram"

ve$colors <- c("red", "green", "blue")

plot(ve)

Error in col * 360 : non-numeric argument to binary operator

--
Karl Brand
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 704 3457 |F +31 (0)10 704 4743 |M +31 (0)642 777 268


Re: [R] Help reading table rows into lists

2010-10-10 Thread Alison Waller


Thanks Gabor and Jeffrey,

and thanks for explaining the differences.  I think I'll go with
Jeffrey's as I think I want entries for COGs with no pathway.


Alison
On 10-Oct-10, at 8:59 PM, Jeffrey Spies wrote:


sapply(dat, function(x){
   tmp <- unlist(strsplit(x, '\t', fixed=TRUE))
   out <- list(tmp[seq_along(tmp)[-1]])
   names(out) <- tmp[1]
   out
}, USE.NAMES=FALSE)



Re: [R] Help reading table rows into lists

2010-10-10 Thread Gabor Grothendieck

On Sun, Oct 10, 2010 at 3:29 PM, Alison Waller alison.wal...@embl.de wrote:
 Thanks Gabor and Jeffrey,

 and thanks for explaining the differences.  I think I'll go with Jeffery's
 as I think I want entries for COGs with no pathway.


My second post does handle that case.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] venneuler (java?) color palette 0 - 1

2010-10-10 Thread Paul Murrell


Hi

On 11/10/2010 9:01 a.m., Karl Brand wrote:

Dear UseRs and DevelopeRs

It would be helpful to see the color palette available in the
venneuler() function.

The relevant par of ?venneuler states:

colors: colors of the circles as values between 0 and 1

-which explains color specification, but from what pallette? Short of
trial and error, i'd really appreciate if some one could help me locate
a 0 - 1 pallette for this function to aid with color selection.


The color spec stored in the VennDiagram object is multiplied by 360 to 
give the hue component of an hcl() colour specification.  For example, 
0.5 would mean the colour hcl(0.5*360, 130, 60)
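
To see what the 0 - 1 range looks like, the mapping described above can be previewed with base graphics alone. This is a sketch that assumes the chroma and luminance values quoted above (130 and 60); the names `hue_fraction` and `cols` are arbitrary.

```r
# Values between 0 and 1 are scaled to a hue angle in degrees.
hue_fraction <- seq(0, 1, by = 0.1)
cols <- hcl(h = hue_fraction * 360, c = 130, l = 60)

# Show the swatches when running interactively; each bar is labelled
# with the 0 - 1 value that would produce it.
if (interactive()) {
  barplot(rep(1, length(cols)), col = cols, names.arg = hue_fraction,
          axes = FALSE, main = "hcl(x * 360, 130, 60)")
}

data.frame(value = hue_fraction, colour = cols)
```

Since hue is an angle, values near 0 and near 1 give nearly the same colour.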


Alternatively, you can control the colours when you call plot, for 
example, ...


plot(ve, col=c("red", "green", "blue"))

... should work.

Paul


FWIW, i tried the below code and received the displayed error. I failed
to turn up any solutions to this error...

Any suggestions appreciated,

Karl


library(venneuler)

ve- venneuler(c(A=1, B=2, C=3, AC=0.5, ABC=0.1))

class(ve)
[1] VennDiagram

ve$colors- c(red, green, blue)

plot(ve)

Error in col * 360 : non-numeric argument to binary operator



--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/


[R] Line Type Specification: lty=onoff but lty=offon?

2010-10-10 Thread Henrik Bengtsson

Hi,

Section 'Line Type Specification' in help(par) explains how you can do
custom line types.  For example:

plot(NA, xlim=c(0,1), ylim=c(0,1));
abline(h=1/2, col="blue", lwd=2, lty="88");

will draw a dashed line segment where the line is composed of 8 units
of "on" (blue color) and 8 units of "off" (transparent), then
repeated.

Now I'd like to draw a second red line overlapping this one, but where
the "gaps" are now red.  Technically, I think the following would
define that:

abline(h=1/2, col="red", lwd=2, lty="0880");

that is 0 "on", 8 "off", 8 "on" (red color) and 0 "off", then repeated.

However, zeros are not allowed (actually, why not?)
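
For reference, help(par) describes these strings as up to 8 non-zero hexadecimal digits giving the lengths of segments that are alternately drawn and skipped. A small sketch that decodes such a string; the helper name `decode_lty` is hypothetical and not part of R.

```r
# Decode a custom lty string into its alternating on/off lengths.
# `decode_lty` is an illustrative helper, not a base R function.
decode_lty <- function(lty) {
  len <- strtoi(strsplit(lty, "")[[1]], base = 16L)  # one hex digit each
  data.frame(state  = rep(c("on", "off"), length.out = length(len)),
             length = len,
             stringsAsFactors = FALSE)
}

decode_lty("88")
decode_lty("1FF1")
```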

Any suggestions on how to draw one red and one blue dashed line such
that, where they overlap, the overlapping segments will be blue, red,
blue, red, ...?

/Henrik


Re: [R] Line Type Specification: lty=onoff but lty=offon?

2010-10-10 Thread David Winsemius



On Oct 10, 2010, at 5:50 PM, Henrik Bengtsson wrote:


Hi,

Section 'Line Type Specification' in help(par) explains how you can do
custom line types.  For example:

plot(NA, xlim=c(0,1), ylim=c(0,1));
abline(h=1/2, col=blue, lwd=2, lty=88);

will draw a dashed line segment where the line is composed of 8 units
of on (blue color) and 8 units of off (transparent), then
repeated.

Now I'd like to draw a second red line overlapping this one, but where
the gaps are now red.  Technically, I think the following would
define that:

abline(h=1/2, col=red, lwd=2, lty=0880);

that is 0 on, 8 off, 8 on (red color) and 0 off, then  
repeated.


However, zeros are not allowed (actually, why not?)

Any suggestions to draw one red and one blue dashed lines that, if
overlapping, the the overlapping segments will be blue, red, blue,
red, ...?


You might look at the code for color.scale.lines in package plotrix.
It's not exactly what you asked for but Jim Lemon has figured out
how to change colors of connected segments.


--
David.


/Henrik



Re: [R] Line Type Specification: lty=onoff but lty=offon?

2010-10-10 Thread Peter Alspach

Tena koe Henrik

Not exactly what you are requesting but you could draw a solid line and then 
the dashed line over the top:

plot(1:10, panel.first=abline(0, 1, col='red', lwd=2), panel.last=abline(0, 1, 
col='blue', lty='88', lwd=2))

or

plot(1:10)
abline(0, 1, col='red', lwd=2)
abline(0, 1, col='blue', lty='88', lwd=2)

This may be system or graphics device dependent (I'm using Windows).

HTH 

Peter Alspach

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Henrik Bengtsson
 Sent: Monday, 11 October 2010 10:50 a.m.
 To: r-help
 Subject: [R] Line Type Specification: lty=onoff but
 lty=offon?
 
 Hi,
 
 Section 'Line Type Specification' in help(par) explains how you can do
 custom line types.  For example:
 
 plot(NA, xlim=c(0,1), ylim=c(0,1));
 abline(h=1/2, col=blue, lwd=2, lty=88);
 
 will draw a dashed line segment where the line is composed of 8 units
 of on (blue color) and 8 units of off (transparent), then
 repeated.
 
 Now I'd like to draw a second red line overlapping this one, but where
 the gaps are now red.  Technically, I think the following would
 define that:
 
 abline(h=1/2, col=red, lwd=2, lty=0880);
 
 that is 0 on, 8 off, 8 on (red color) and 0 off, then repeated.
 
 However, zeros are not allowed (actually, why not?)
 
 Any suggestions to draw one red and one blue dashed lines that, if
 overlapping, the the overlapping segments will be blue, red, blue,
 red, ...?
 
 /Henrik
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


[R] CRAN (and crantastic) updates this week

2010-10-10 Thread Crantastic

CRAN (and crantastic) updates this week

New packages



Updated packages


BradleyTerry2 (0.9-3), COUNT (1.1.1), DeducerPlugInScaling (0.0-6)



This email provided as a service for the R community by
http://crantastic.org.

Like it?  Hate it?  Please let us know: crana...@gmail.com.


Re: [R] Line Type Specification: lty=onoff but lty=offon?

2010-10-10 Thread Henrik Bengtsson

Thanks both, but unfortunately not.

Here is a better illustration of what I want to achieve:

xs <- c(0,1,2,3); ys <- c(-1,0,0,1);
lty <- c("FF11", "1FF1");
plot(NA, xlim=c(0,3), ylim=c(-1,1));
lines(xs, ys, col="red", lwd=2, lty=lty[1]);
lines(xs, -ys, col="blue", lwd=2, lty=lty[2]);

except that I don't want those short pieces of length 1.  Ideally I'd like to use:

lty <- c("FF00", "0FF0");

and dashes of any length, e.g. lty <- c("2200", "0220");

/Henrik

On Sun, Oct 10, 2010 at 3:42 PM, Peter Alspach
peter.alsp...@plantandfood.co.nz wrote:
 Tena koe Henrik

 Not exactly what you are requesting but you could draw a solid line and then 
 the dashed line over the top:

 plot(1:10, panel.first=abline(0, 1, col='red', lwd=2), panel.last=abline(0, 
 1, col='blue', lty='88', lwd=2))

 or

 plot(1:10)
 abline(0, 1, col='red', lwd=2)
 abline(0, 1, col='blue', lty='88', lwd=2)

 This may be system or graphics device dependent (I'm using Windows).

 HTH 

 Peter Alspach


Re: [R] Line Type Specification: lty=onoff but lty=offon?

2010-10-10 Thread Steve Taylor

You might have to use segments() to place the line segments precisely.

?segments
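
A minimal sketch of the segments() idea, not taken from the thread. The
helper alternating_segments() and its argument dash are hypothetical names.
It cuts the polyline into pieces of equal arc length and labels them
alternately, so that each piece can be drawn in its own colour. Note that
dash is measured in user coordinates, whereas lty dash lengths are in device
units, so the pieces look equally long only when both axes share a scale.

```r
# Cut a polyline into pieces of arc length `dash` (user coordinates) and
# return the end points of every piece plus an alternating 0/1 label.
# Assumes consecutive vertices are distinct.
alternating_segments <- function(xs, ys, dash) {
  # cumulative arc length at each vertex
  d <- c(0, cumsum(sqrt(diff(xs)^2 + diff(ys)^2)))
  # break points: every `dash` units, plus the original vertices
  s <- sort(unique(c(d, seq(0, max(d), by = dash))))
  # coordinates at the break points, by linear interpolation
  x <- approx(d, xs, xout = s)$y
  y <- approx(d, ys, xout = s)$y
  n <- length(s)
  mid <- (s[-1] + s[-n]) / 2   # midpoint decides which dash a piece is in
  data.frame(x0 = x[-n], y0 = y[-n], x1 = x[-1], y1 = y[-1],
             piece = floor(mid / dash) %% 2)
}

xs <- c(0, 1, 2, 3); ys <- c(-1, 0, 0, 1)
seg <- alternating_segments(xs, ys, dash = 0.25)

# Draw only in an interactive session, so the sketch also runs under Rscript.
if (interactive()) {
  plot(NA, xlim = c(0, 3), ylim = c(-1, 1), xlab = "x", ylab = "y")
  with(seg, segments(x0, y0, x1, y1,
                     col = c("blue", "red")[piece + 1], lwd = 2))
}
```

Because every piece is drawn explicitly, no zero-length "on" or "off" entry
is needed, and the dash length can be any positive number.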


[R] rearrange command in quantreg package

2010-10-10 Thread Dimitris.Kapetanakis



Dear all,

I want to use the rearrange command, which is based on the Chernozhukov et al.
paper and is included in the quantreg package. I ran a quantile
regression in which I included dummy variables for state and year in order
to estimate the corresponding fixed-effects quantile regression. The problems
are as follows:

1. In the example given in the help page, I don't understand what
income=quantile(income, 0.3) stands for in the newdata argument of predict.
Why use a specific quantile, since we estimate the response's quantile
prediction as a function of the quantile index (if I understand correctly)?
If I use the following code, the predict command seems to work fine:

dyear <- dummy(ekc$year)[,-1]
dstate <- dummy(ekc$state)[,-1]

dekc <- cbind(ekc, dyear, dstate)

z.nox <- rq(nox~dyear+dstate+pcinc+I(pcinc^2)+I(pcinc^3), tau=-1, data=dekc)

zp.nox <- predict(z.nox, newdata=list(pcinc=ekc$pcinc,
dyear=dummy(ekc$year)[,-1], dstate=dummy(ekc$state)[,-1]), type="stepfun")

but when I try to plot it

plot(zp.nox, do.points = FALSE, xlab = expression(tau), ylab = expression(Q (
tau )), main="Quantile Something")

the following error appears

Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' is a list, but does not have components 'x' and 'y'

On the other hand, when I use

zp.nox <- predict(z.nox, newdata=list(pcinc=quantile(ekc$pcinc, 0.3),
dyear=dummy(ekc$year)[,-1], dstate=dummy(ekc$state)[,-1]), type="stepfun")

the following error appears:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
object$xlevels) : 
  variable lengths differ (found for 'pcinc')

and again it does not work if I put quantile in the dummies.

Perhaps I am making a stupid mistake, since I haven't understood why
this 0.3 quantile of income is used:

data(engel)
z <- rq(foodexp ~ income, tau = -1, data = engel)
zp <- predict(z, newdata=list(income=quantile(engel$income,.03)), type="stepfun")
plot(zp, do.points = FALSE, xlab = expression(tau),
ylab = expression(Q ( tau )), main="Engel Food Expenditure Quantiles")
plot(rearrange(zp), do.points = FALSE, add=TRUE, col.h="red", col.v="red")
legend(.6, 300, c("Before Rearrangement","After Rearrangement"), lty=1,
col=c("black","red"))

2. My initial goal was to re-estimate the fitted curves
(nox=b1_hat*pcinc+b2_hat*pcinc^2+b3_hat*pcinc^3) without quantile
crossing. Obviously the rearrange command does not do this. Does anybody
know how I can re-estimate (if possible) the quantile regressions so that
the curves for different quantiles do not cross?

Thanks a lot
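
For readers unsure what rearrangement does to a predicted quantile function,
here is a small base-R sketch. It does not use quantreg: the values in q_hat
are made up for illustration, and the sketch assumes an equally spaced grid
of tau values, in which case monotone rearrangement amounts to sorting the
predicted quantiles at one fixed covariate point. It changes the predictions,
not the fitted coefficients, which is why it does not answer question 2.

```r
# Hypothetical predicted conditional quantiles at ONE covariate value,
# on an equally spaced tau grid; the dips at tau = 0.4 and 0.7 are crossings.
tau   <- seq(0.1, 0.9, by = 0.1)
q_hat <- c(310, 335, 360, 352, 381, 400, 397, 430, 455)

# On an equally spaced grid, monotone rearrangement is just sorting.
q_rearranged <- sort(q_hat)

# A quantile function must be non-decreasing in tau, so crossings
# show up as negative first differences.
crosses_before <- any(diff(q_hat) < 0)
crosses_after  <- any(diff(q_rearranged) < 0)

print(data.frame(tau, q_hat, q_rearranged))
```

This also suggests why newdata in the help example is a single covariate
value: the step function being rearranged is a function of tau at one point
in covariate space.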


[R] Number of occurences of a character in a string

2010-10-10 Thread Santosh Srinivas

New to R ... which function most effectively counts the number of
occurrences of a character in a string?

b <- c("jkhrikujhj345hi5hiklfjsdkljfksdio324j';;'lfd;g'lkfit34'5;435l;43'5k")

I want the number of semicolons ";" in b.

Thanks.


Re: [R] Number of occurences of a character in a string

2010-10-10 Thread Christian Raschke


> length(gregexpr(";", b)[[1]])
[1] 5


This works as long as the substrings you are searching for don't overlap.

Christian






--
Christian Raschke
Department of Economics
and
ISDS Research Lab (HSRG)
Louisiana State University
Patrick Taylor Hall, Rm 2128
Baton Rouge, LA 70803
cras...@lsu.edu


Re: [R] Number of occurences of a character in a string

2010-10-10 Thread Michael Sumner

Literally:

length(gregexpr(";", b)[[1]])

But more generally, in case b has more than one element:

sapply(gregexpr(";", b), length)

?gregexpr
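
One caveat with the length() approach: when there is no match, gregexpr()
returns -1 for that element, so length() reports 1 rather than 0. A small
sketch that counts only real matches follows; count_matches() is a
hypothetical helper name, and fixed = TRUE is used so that the pattern is
taken literally rather than as a regular expression.

```r
b <- "jkhrikujhj345hi5hiklfjsdkljfksdio324j';;'lfd;g'lkfit34'5;435l;43'5k"

# Count literal occurrences of `pattern` in each element of `x`.
# gregexpr() gives match positions, or -1 when nothing matched,
# so only positive positions are counted.
count_matches <- function(x, pattern) {
  vapply(gregexpr(pattern, x, fixed = TRUE),
         function(m) sum(m > 0), integer(1))
}

n_in_b   <- count_matches(b, ";")
n_vector <- count_matches(c(b, "no semicolons here"), ";")
print(n_in_b)
print(n_vector)
```

As noted earlier in the thread, overlapping matches are not counted
separately.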







-- 
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsum...@gmail.com


Re: [R] Create single vector after looping through multiple data frames with GREP

2010-10-10 Thread Michael Bedward

Hi Simon,

The function below should do it or at least get you started...

getPlotData <- function (datalist, response, times)
{
  qdata <- sapply(datalist[times],
    function(df) {
      irow <- grepl(response, df$Response)
      df[irow, 2:5]
    }
  )

  # qdata is a matrix with rows Q1:Q4 and cols for times;
  # we turn it into a two col matrix with col 1 = time index
  # and col 2 = value
  time.index <- seq(4 * ncol(qdata))
  out <- cbind(time.index, as.numeric(qdata))
  rownames(out) <- paste(time.index, rownames(qdata), sep=".")
  colnames(out) <- c("time", response)
  out
}

# Example: get data for times 10:15 where Response contains "Economy"
x <- getPlotData(r, "Economy", 10:15)


Michael


On 11 October 2010 03:35, Simon Kiss sjk...@gmail.com wrote:
 Hello all,

 I changed the subject line of the e-mail, because the question I'm posing 
 now is different from the first one. I hope that this is proper etiquette.  
 However, the original chain is included below.

 I've incorporated bits of  both Ethan and Brian's code into the script below, 
 but there's one aspect I can't get my head around. I'm totally new to 
 programming with control structures. The reproducible code below creates a 
 list containing 19 data frames, one each for the Most Important Problem  
 survey data for Canada.

 What I'd like at this stage is a loop where I can search through all the data 
 frames for rows containing the search term and then bind the rows together in 
 a plotable (sp?) format.

 At the bottom of the code below, you'll find my first attempt to make use of 
 a search string and to put it into a plotable format.  It only partially 
 works.  I can only get the numbers for one year, whereas I'd like to be able to 
 get a string of numbers for several years. But, on the upside, grep appears to 
 do the trick in terms of selecting rows.

 Can any one suggest a solution?
 Yours truly,
 Simon Kiss

 #This is the reproducible code to set up all the data frames
 require(XML)
 library(XML)
 #This gets the data from the web and lists them
 mylist <- paste("http://www.queensu.ca/cora/_trends/mip_",
 c(1987:2001,2003:2006), ".htm", sep="")
 alltables <- lapply(mylist, readHTMLTable)

 #convert to dataframes
 r <- lapply(alltables, function(x) {as.data.frame(x)} )

 #This is just some house-cleaning; structuring all the tables so they are uniform
 r[[1]][3] <- r[[1]][2]
 r[[1]][2] <- c( )
 r[[2]][4] <- r[[2]][2]
 r[[2]][5] <- r[[2]][3]
 r[[2]][2:3] <- c( )
 r[[3]][4:5] <- r[[3]][3:4]
 r[[3]][3] <- c( )

 #This loop deletes some superfluous columns and rows, turns the first column
 #into character strings and the data into numeric
 for (i in 1:19) {
 n.rows <- dim(r[[i]])[1]
 r[[i]] <- r[[i]][15:n.rows-3, 1:5]
 n.rows <- dim(r[[i]])[1]
 row.names(r[[i]]) <- NULL
 names(r[[i]]) <- c("Response", "Q1", "Q2", "Q3", "Q4")

 r[[i]][, 1] <- as.character(r[[i]][,1])
 #r[[i]][,2:5]<-as.numeric(as.character(r[[i]][,2:5]))
 r[[i]][, 2:5] <- lapply(r[[i]][, 2:5], function(x) 
 {as.numeric(as.character(x))})
 #n.rows<-dim(r[[i]])[1]
 #r[[i]]<-r[[i]][9
 }

 #This code is my first attempt at introducing a search string, getting the
 #rows, binding and plotting;
 economy <- r[[10]][grep('Economy', r[[10]][,1]),]
 economy_2 <- r[[11]][grep('Economy', r[[11]][,1]),]
 test <- cbind(economy, economy_2)
 plot(as.numeric(test), type='l')

 #here's another attempt I'm trying
 economy <- data.frame
 for (i in 15:19) {
 economy[i,] <- r[[i]][grep('Economy', r[[i]][,1]), ]
 }

 Begin forwarded message:

 From: Simon Kiss sjk...@gmail.com
 Date: October 7, 2010 4:59:46 PM EDT
 To: Simon Kiss simonjk...@yahoo.ca
 Subject: Fwd: [R] Converting scraped data



 Begin forwarded message:

 From: Ethan Brown ethancbr...@gmail.com
 Date: October 6, 2010 4:22:41 PM GMT-04:00
 To: Simon Kiss sjk...@gmail.com
 Cc: r-help@r-project.org
 Subject: Re: [R] Converting scraped data

 Hi Simon,

 You'll notice the "test" data.frame has a whole mix of characters in
 the columns you're interested in, including a "-" for missing values, and
 that the columns you're interested in are in fact factors.

 as.numeric(factor) returns the level of the factor, not the value of
 the level. (See ?levels and ?factor)--that's why it's giving you those
 irrelevant integers. I always end up using something like this handy
 code snippet to deal with the situation:

 unfactor <- function(factors)
 # From http://psychlab2.ucr.edu/rwiki/index.php/R_Code_Snippets#unfactor
 # Transform a factor back into its factor names
 {
  return(levels(factors)[factors])
 }
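
To make the point about factor codes concrete, here is a small sketch with
made-up values (the vector f is hypothetical, not taken from the scraped
tables). It contrasts as.numeric() applied directly to a factor with the
levels(f)[f] idiom used in unfactor() above.

```r
# A factor holding numbers as text, with "-" standing in for a missing value.
f <- factor(c("12", "7", "-", "30"))

# Applied directly, as.numeric() returns the internal level codes,
# i.e. positions in levels(f), not the numbers shown when printing f.
codes <- as.numeric(f)

# Going through the level labels recovers the values; "-" cannot be
# parsed and becomes NA (the coercion warning is suppressed here).
values <- suppressWarnings(as.numeric(levels(f)[f]))

print(codes)
print(values)
```

as.numeric(as.character(f)) gives the same result as the levels(f)[f] form.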

 Then, to get your data to where you want it, I'd do this:

 require(XML)
 theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
 tables <- readHTMLTable(theurl)
 n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
 class(tables)
 test <- data.frame(tables, stringsAsFactors=FALSE)


 result <- test[11:42, 1:5] #Extract the actual data we want
 names(result) <- c("Response", "Q1", "Q2", "Q3", "Q4")
 for(i in 2:5) {
 # Convert factor columns to numeric
 result[,i] <- as.numeric(unfactor(result[,i]))
 }
 result

 From here you should be

[R] textConnection on List

2010-10-10 Thread Santosh Srinivas

I'm trying to optimize some code that I have.

I have delimited text stored in a list (see below). I want to do a
read.table via a text connection so that I can get the delimited values into
a table ...
Something like ... tmp_MF_Data_F <- read.table(textConnection(tmpTxtList),
sep=';', quote = '') ... but this fails ...

Any idea how to go about it?

Thanks for the help.


head(tmpTxtList)
[[1]]
[1] "\"106270;BIRLA SUN LIFE CAPITAL PROTECTION ORIENTED FUND-3 YRS-
DIVIDEND;10.3287;10.3287;0.;01-Apr-2008\""

[[2]]
[1] "\"106269;BIRLA SUN LIFE CAPITAL PROTECTION ORIENTED FUND-3 YRS-
GROWTH;10.3287;10.3287;0.;01-Apr-2008\""

[[3]]
[1] "\"102767;Birla Sun Life Dynamic Bond Fund-Retail
Plan-Growth;12.6832;12.6832;12.6832;01-Apr-2008\""

[[4]]
[1] "\"102766;Birla Sun Life Dynamic Bond Fund-Retail Plan-Quarterly
Dividend;10.5396;10.5396;10.5396;01-Apr-2008\""

[[5]]
[1] "\"102855;Birla Sun Life Fixed Maturity Plan - Annual  Series
3-Dividend;9.9830;9.7833;9.9830;01-Apr-2008\""

[[6]]
[1] "\"102856;Birla Sun Life Fixed Maturity Plan - Annual  Series
3-Growth;12.3964;12.1485;12.3964;01-Apr-2008\""


[R] MATLAB vrs. R

2010-10-10 Thread Craig O'Connell


I need to find the area under a trapezoid for a research-related project.  I 
was able to find the area under the trapezoid in MATLAB using the code:
function [int] = myquadrature(f,a,b) 
% user-defined quadrature function
% integrate data f from x=a to x=b assuming f is equally spaced over the 
interval 
% use type 
% determine number of data points
npts = prod(size(f));
nint = npts -1; %number of intervals
if(npts <=1)
  error('need at least two points to integrate')
end;
% set the grid spacing
if(b <=a)
  error('something wrong with the interval, b should be greater than a')
else
  dx = b/real(nint);
end;
npts = prod(size(f));

% trapezoidal rule
% can code in line, hint:  sum of f is sum(f) 
% last value of f is f(end), first value is f(1)
% code below
int=0;
for i=1:(nint)
%F(i)=dx*((f(i)+f(i+1))/2);
int=int+dx*((f(i)+f(i+1))/2);
end
%int=sum(F);

Then to call myquadrature I did:
% example function call test the user-defined myquadrature function
% setup some data

% velocity profile across a channel
% remember to use ? for help, e.g. ?seq 
x = 0:10:2000; 
% you can access one element of a list of values using brackets
% x(1) is the first x value, x(2), the 2nd, etc.
% if you want the last value, a trick is x(end)  
% the function cos is cosin and mean gives the mean value
% pi is 3.1415, or pi
% another hint, if you want to multiple two series of numbers together
% for example c = a*b where c(1) = a(1)*b(1), c(2) = a(2)*b(2), etc.
% you must tell Matlab you want element by element multiplication
% e.g.:c = a.*b
% note the .
%
h = 10.*(cos(((2*pi)/2000)*(x-mean(x)))+1); %bathymetry
u = 1.*(cos(((2*pi)/2000)*(x-mean(x)))+1);  %vertically-averaged cross-transect 
velocity
plot(x,-h)
% set begin and end points for the integration 
a = x(1);
b = x(end);
% call your quadrature function.  Hint, the answer should be 3.
f=u.*h;
val =  myquadrature(f,a,b);
fprintf('the solution is %f\n',val);

This is great, I got the expected answer of 3.
 
NOW THE ISSUE IS, I HAVE NO IDEA HOW THIS CODE TRANSLATES TO R.  Here is what I 
attempted to do, and from the error messages I can tell I'm doing something wrong:
  myquadrature<-function(f,a,b){
npts=length(f)
nint=npts-1
if(npts<=1)
error('need at least two points to integrate')
end;
if(b<=a)
error('something wrong with the interval, b should be greater than a')
else
dx=b/real(nint)
end;
npts=length(f)
_(below this line, I cannot code)
int=0
for(i in 1:(npts-1))
sum(f)=((b-a)/(2*length(f)))*(0.5*f[i]+f[i+1]+f[length(f)])}
%F(i)=dx*((f(i)+f(i+1))/2);
int=int+dx*((f(i)+f(i+1))/2);
end
%int=sum(F);
 
Thank you and any potential suggestions would be greatly appreciated.
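
For reference, one possible R translation of the MATLAB function above,
offered as a sketch rather than the definitive answer. R uses stop() where
MATLAB uses error(), braces instead of end, and # for comments. One
deliberate change, which is an assumption about the intent: the grid spacing
is computed as (b - a)/nint rather than b/nint, which gives the same value
here because a = 0. The loop is replaced by a vectorised sum.

```r
# Trapezoidal rule for equally spaced data f on the interval [a, b].
myquadrature <- function(f, a, b) {
  npts <- length(f)      # number of data points
  nint <- npts - 1       # number of intervals
  if (npts <= 1) {
    stop("need at least two points to integrate")
  }
  if (b <= a) {
    stop("something wrong with the interval, b should be greater than a")
  }
  dx <- (b - a) / nint   # grid spacing (assumed intent; original used b/nint)
  # sum over intervals of dx * (f[i] + f[i+1]) / 2
  dx * sum((f[-npts] + f[-1]) / 2)
}

# Same set-up as the MATLAB driver script.
x <- seq(0, 2000, by = 10)
h <- 10 * (cos((2 * pi / 2000) * (x - mean(x))) + 1)  # bathymetry
u <-  1 * (cos((2 * pi / 2000) * (x - mean(x))) + 1)  # velocity
f <- u * h                # element-wise product; no ".*" needed in R
val <- myquadrature(f, x[1], x[length(x)])
cat(sprintf("the solution is %f\n", val))
```

In R, arithmetic on vectors is element-wise by default, x[length(x)] plays
the role of x(end), and indexing uses square brackets.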
 
Dr. Argese.   

Re: [R] textConnection on List

2010-10-10 Thread Santosh Srinivas

I think unlist did the trick ... I used

tmpData <- unlist(tmpTxtList)
tmp_MF_Data_F <- read.table(textConnection(tmpData), sep=';', quote = '')
write.table(tmp_MF_Data_F, file="MF_Data_F.txt", append=T, sep="|",
col.names=F, row.names=F, quote=F)
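
A self-contained sketch of the same approach on made-up data (the two
records below are hypothetical stand-ins for the real list). textConnection()
expects a character vector, which is why the list has to be flattened with
unlist() first. The connection it returns is already open, so it is closed
explicitly afterwards.

```r
# Hypothetical records in the same layout: six ";"-separated fields each.
tmpTxtList <- list(
  "106270;FUND A - DIVIDEND;10.3287;10.3287;0;01-Apr-2008",
  "106269;FUND A - GROWTH;10.3287;10.3287;0;01-Apr-2008"
)

txt <- unlist(tmpTxtList)   # list -> character vector, one record per element

con <- textConnection(txt)
tmp <- read.table(con, sep = ";", quote = "", stringsAsFactors = FALSE)
close(con)                  # read.table() does not close an open connection

str(tmp)
```

Without a header line, read.table() names the columns V1 to V6.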


