Re: [R] read.xport

2005-07-14 Thread bogdan romocea
How about avoiding SAS XPORT altogether and exporting everything in
the simple, clean, non-proprietary, extremely reliable,
platform-independent (etc.) text format (CSV, tab-delimited, etc.)?
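For the R side, here is a minimal sketch. It assumes the SAS data sets have already been written out from SAS as one CSV file per data set in a single directory. The directory and file names below are made up, and the files are generated on the fly only so that the example runs. Reading them into a named list makes it easy to check that all 24 arrived:

```r
# Hypothetical stand-in for a directory of CSV files exported from SAS,
# one file per data set.
export_dir <- file.path(tempdir(), "sas_export")
dir.create(export_dir, showWarnings = FALSE)
for (nm in c("catch", "effort", "length")) {
  write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.7)),
            file.path(export_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# Read every CSV in the directory into a named list of data frames,
# so it is easy to check that none of the data sets went missing.
files <- list.files(export_dir, pattern = "\\.csv$", full.names = TRUE)
datasets <- lapply(files, read.csv)
names(datasets) <- sub("\\.csv$", "", basename(files))
length(datasets)
```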


 -Original Message-
 From: Nelson, Gary (FWE) [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, July 14, 2005 10:31 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] read.xport
 
 
 I am trying to import data from a SAS XPORT file that
 contains 24 SAS files. When I use the read.xport procedure, only
 about 16 data frames (components) are created. Any suggestions?
 
  
 
  
 
 
 Gary A. Nelson, Ph.D
 
 Massachusetts Division of Marine Fisheries
 
 30 Emerson Avenue
 
 Gloucester, MA 01930
 
 Phone: (978) 282-0308 x114
 
 Fax: (617) 727-3337
 
 Email: [EMAIL PROTECTED]
 
  
 
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html




Re: [R] Is it possible to create highly customized report in *.xls format by using R/S+?

2005-07-21 Thread bogdan romocea
So your conclusion is that the only choice is to make mistakes and get
in trouble. (That's what Excel excels at.)

Two options I haven't seen mentioned are:
1. Create your deliverables in HTML format, and change the extension
from .htm to .xls; Excel will import them automatically. The way the
file looks in Excel is determined by CSS settings (I've seen this
happen) and, I presume, by HTML tags.
2. For the real spreadsheet thing, switch to OpenOffice.org. Their
format is XML compressed with ZIP which you can easily work with since
the format specifications are not proprietary. See
http://xml.openoffice.org/ for details.
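Here is a rough sketch of option 1 in R, using a small made-up data frame. html_table() is a hypothetical helper, not a function from any package. The result is an HTML file with an .xls extension, not a native Excel file, so how Excel renders it may vary by version:

```r
# Made-up data standing in for a real deliverable.
df <- data.frame(region = c("North", "South"), sales = c(120.5, 98.25))

# Hypothetical helper: turn a data frame into the lines of an HTML page
# holding one table, with a little CSS for the header row.
html_table <- function(df) {
  header <- paste0("<tr>", paste0("<th>", names(df), "</th>", collapse = ""), "</tr>")
  # paste0 over the list of columns concatenates the cells row by row
  cells <- do.call(paste0, unname(lapply(df, function(col) paste0("<td>", col, "</td>"))))
  rows <- paste0("<tr>", cells, "</tr>")
  c("<html><head><style>th { background: #dddddd; }</style></head><body>",
    "<table border=\"1\">", header, rows, "</table>", "</body></html>")
}

# Save with an .xls extension so that Excel picks the file up.
out_file <- file.path(tempdir(), "report.xls")
writeLines(html_table(df), out_file)
```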



 -Original Message-
 From: Wensui Liu [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, July 20, 2005 10:56 AM
 To: Greg Snow
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Is it possible to create highly customized 
 report in *.xls format by using R/S+?
 
 
 I appreciate your reply and understand your point completely. But at
 times we can't change the rule, the only choice is to follow the rule.
 Most deliverables in my work are in excel format.
 
 On 7/20/05, Greg Snow [EMAIL PROTECTED] wrote:
  See:
  
  http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html
  and
  http://www.stat.uiowa.edu/~jcryer/JSMTalk2001.pdf
  
  Greg Snow, Ph.D.
  Statistical Data Center, LDS Hospital
  Intermountain Health Care
  [EMAIL PROTECTED]
  (801) 408-8111
  
   Wensui Liu [EMAIL PROTECTED] 07/19/05 03:22PM 
  I remember in one slide of Prof. Ripley's presentation overhead, he
  said the most popular data analysis software is excel.
  
  So is there any resource or tutorial on this topic?
  
  Thank you so much!
  
  
  
 
 
 -- 
 WenSui Liu, MS MA
 Senior Decision Support Analyst
 Division of Health Policy and Clinical Effectiveness
 Cincinnati Children's Hospital Medical Center
 




Re: [R] Rprof fails in combination with RMySQL

2005-07-21 Thread bogdan romocea
I think you're barking up the wrong tree. Optimize the MySQL code
separately from optimizing the R code. A very nice reference about the
former is http://highperformancemysql.com/. Also, if possible, do
everything in MySQL.
hth,
b.


 -Original Message-
 From: Thieme, Lutz [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, July 21, 2005 10:11 AM
 To: Rhelp (E-mail)
 Subject: [R] Rprof fails in combination with RMySQL
 
 
 Dear R community,
 
 I tried to optimize my R code by using Rprof. In my R code
 I'm using MySQL database connections intensively. After a bunch
 of queries, R fails with the following error message:
 Error in .Call(RS_MySQL_newConnection, drvId, con.params, 
 groups, PACKAGE = .MySQLPkgName) : 
 RS-DBI driver: (could not connect [EMAIL PROTECTED] 
 on dbname myDB
 
 Without the R profiler, this code has run very stably for weeks.
 
 Do you have any ideas or suggestions?
 
 I tried the following R versions:
 ___
 platform i386-pc-solaris2.8
 arch     i386
 os       solaris2.8
 system   i386, solaris2.8
 status
 major    1
 minor    9.1
 year     2004
 month    06
 day      21
 language R
 ___
 platform sparc-sun-solaris2.8
 arch     sparc
 os       solaris2.8
 system   sparc, solaris2.8
 status
 major    2
 minor    1.1
 year     2005
 month    06
 day      20
 language R
 ___
 platform sparc-sun-solaris2.8
 arch     sparc
 os       solaris2.8
 system   sparc, solaris2.8
 status
 major    1
 minor    9.1
 year     2004
 month    06
 day      21
 language R
 
 
 Thank you in advance and kind regards,
 
 Lutz Thieme
 AMD Saxony/ Product Engineering AMD Saxony Limited 
 Liability Company & Co. KG
 phone: + 49-351-277-4269 M/S E22-PE, 
 Wilschdorfer Landstr. 101
 fax: + 49-351-277-9-4269 D-01109 Dresden, Germany
 
 
 




Re: [R] Rprof fails in combination with RMySQL

2005-07-22 Thread bogdan romocea
I think the opposite is applicable too - optimize R outside of MySQL.
Exclude the MySQL queries completely and use instead the same data
frames (prepared beforehand) with Rprof. Then, if you really want to
run the full code with Rprof, wrap the queries in try():
data <- try(fetch(dbSendQuery(connection, query), n = -1))
if (inherits(data, "try-error")) for (i in 1:100) {
  # retry the query up to 100 times; stop at the first success
  data <- try(fetch(dbSendQuery(connection, query), n = -1))
  if (!inherits(data, "try-error")) break
}
Also, why do you close the connection after each query? Open one
connection and use it for the whole R session. (I never close the
connection after a query.)
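Here is the same retry idea as a small reusable helper. This is only a sketch: retry() and flaky_query() are made-up names, and the stand-in query replaces fetch(dbSendQuery(...)) so that it runs without a MySQL server:

```r
# Call f() until it succeeds or max_tries (>= 1) attempts have failed.
# f is any function of no arguments, e.g.
#   function() fetch(dbSendQuery(connection, query), n = -1)
retry <- function(f, max_tries = 100) {
  for (attempt in seq_len(max_tries)) {
    result <- try(f(), silent = TRUE)
    if (!inherits(result, "try-error")) return(result)
  }
  stop("all ", max_tries, " attempts failed; last error: ",
       conditionMessage(attr(result, "condition")))
}

# Stand-in for a flaky query: fails twice, then returns a data frame.
calls <- 0
flaky_query <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("could not connect")
  data.frame(x = 1:3)
}
data <- retry(flaky_query)
```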
hth,
b.


 -Original Message-
 From: Thieme, Lutz [mailto:[EMAIL PROTECTED] 
 Sent: Friday, July 22, 2005 2:04 AM
 To: bogdan romocea
 Cc: R-help@stat.math.ethz.ch
 Subject: Re: [R] Rprof fails in combination with RMySQL
 
 
 Hello Bogdan, 
 
 thanks for your reply. My MySQL is always optimized outside
 of R (but many thanks for the interesting link!).
 I'm very sure that I have to optimize the R code which uses
 the data from my queries for calculations. To get more
 information about which R function is the main speed limiter, I tried
 Rprof.
 Because I always open and close the connection for every
 query, I have never opened more than one connection.
 And again: the same R code has run stably without Rprof for weeks,
 multiple times a day. I am 99% sure that the error does not come
 from the database. Maybe it comes from the large number of
 opening/closing cycles?...
 
 Regards,
 
 Lutz
 
 
 
 
 




Re: [R] choose between dates and times

2005-07-26 Thread bogdan romocea
If happenat is not a datetime value, convert it with strptime(). Then,
one solution is to transform it in the following way:
num.time <- as.numeric(format(happenat, "%Y%m%d%H%M%S"))
This way, 07/22/05 00:05:14 becomes 20050722000514, and you can subset
your data frame with
dfr[which(num.time >= 20050725153000 & num.time <= 20050726123000),]
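An alternative sketch skips the numeric conversion and compares date-time values directly. The data frame below is made up. It assumes happenat holds strings in the mm/dd/yy hh:mm:ss layout shown in the question. If the values really include the parentheses, they can be written literally into the format string:

```r
# Made-up data in the layout of the question (parentheses left out).
dfr <- data.frame(
  y = 5185:5189,
  happenat = c("07/22/05 00:05:14", "07/25/05 15:20:00", "07/25/05 15:30:00",
               "07/26/05 12:00:00", "07/26/05 12:40:00"),
  x = 14
)

# Parse once into POSIXct; a fixed time zone avoids daylight-saving surprises.
fmt  <- "%m/%d/%y %H:%M:%S"
when <- as.POSIXct(dfr$happenat, format = fmt, tz = "UTC")
from <- as.POSIXct("07/25/05 15:30:00", format = fmt, tz = "UTC")
to   <- as.POSIXct("07/26/05 12:30:00", format = fmt, tz = "UTC")

# Keep the rows inside the window (both ends inclusive).
sel <- dfr[when >= from & when <= to, ]
sel
```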
hth,
b.


 -Original Message-
 From: Kerry Bush [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, July 26, 2005 3:54 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] choose between dates and times
 
 
 Dear R-helpers,
   I have the following data:
 
  yhappenat x
 5185 (07/22/05 00:05:14)   14
 5186 (07/22/05 00:15:14)   14
 5187 (07/22/05 00:25:14)   14
 5188 (07/22/05 00:35:14)   14
 ..
 
 I want to choose between 07/25/05 15:30:00 and
 07/26/05 12:30:00. Anybody had experience in handling
 this kind of data? Is there a simple way to subset by
 the variable 'happenat'? Thanks.
 




Re: [R] How to hiding code for a package

2005-08-01 Thread bogdan romocea
There's something else you could try - since you can't hide the code,
obfuscate it. Hide the real thing in a large pile of useless,
complicated, awfully formatted code that would stop anyone except the
most desperate (including yourself, after a couple of weeks/months)
from trying to understand it. The best solution would be to compile
the code, but R is not there yet.
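Prof. Ripley's reply (quoted below) explains why real hiding is impossible. The effect he describes, hidden from casual inspection only, can be seen without building a package by using a closure instead of a namespace. This is a sketch of the idea, not the namespace mechanism itself; foo and foo_internal are placeholder names:

```r
# The real work lives in foo_internal, which is visible only inside the
# environment created by local(); printing foo shows just the wrapper
# (plus a reference to its environment).
foo <- local({
  foo_internal <- function(x) {
    # stand-in for the code one would like to keep out of sight
    sqrt(abs(x)) * 10
  }
  function(x) foo_internal(x)
})

foo(4)   # 20
foo      # shows only the one-line wrapper
# ...but anyone can still retrieve and print the internal function:
get("foo_internal", envir = environment(foo))
```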


 -Original Message-
 From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, July 30, 2005 5:35 AM
 To: Gary Wong
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] How to hiding code for a package
 
 
 What you ask is impossible.  For a function to be callable it 
 has to be 
 locatable and hence can be printed.
 
 One possibility is to have a namespace, and something like
 
 foo <- function(...) foo_internal(...)
 
 where foo is exported but foo_internal is not.  Then foo_internal is
 hidden from casual inspection, but it can be listed by cognoscenti.
 
 Why do you want to do this?  Anyone can read the source code of your
 package, and any function which can be called can be deparsed,
 possibly after jumping through a few hoops.
 
 On Sat, 30 Jul 2005, Gary Wong wrote:
 
  Hey everyone,
 
  I have made a package and wish to release it but
  before then I have a problem. I have a few functions
  in this package written in R that I wish to hide such
  that after installation, someone can use say the
  function foo(parameters = ) but cannot do foo.
  Typing foo should not show the source code or at least
  not all of it. Is there a way to do this ? I have
  searched the mailing list and used google, and have
  found something like [R] Hiding internal package
  functions for the doc. pkg-internal.Rd but this seems
  different since it seems that the keyword internal
  just hides the function from showing in the index and
  hides documentation, not the function itself. Can
  someone help? Thanks
 
 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595
 




Re: [R] date format

2005-08-10 Thread bogdan romocea
You need the day to convert to a date format. Assuming day=15:
x.date <- as.Date(paste(as.character(x), "-15", sep=""), format="%Y-%m-%d")
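Here is a slightly fuller sketch with made-up values. It assumes x is a factor of yyyy-mm strings, as described in the question below. The as.character() step matters, because as.numeric() on a factor returns the level codes rather than the text:

```r
# Made-up monthly labels stored as a factor, as in the question.
x <- factor(c("2004-11", "2004-12", "2005-01"))

# Turn the labels into text, add a day (the 15th), and parse as Date.
x.date <- as.Date(paste(as.character(x), "-15", sep = ""), format = "%Y-%m-%d")
x.date

# Dates sort and plot on a proper time axis, e.g. with made-up means:
y <- c(3.2, 2.8, 3.9)
# plot(x.date, y, type = "b")
```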


 -Original Message-
 From: alessandro carletti [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, August 10, 2005 9:37 AM
 To: rHELP
 Subject: [R] date format
 
 
 Hi,
 I have a problem with a vector (x) containing dates in
 format yyyy-mm (I'm working with monthly means): how
 can I convert it to date format, so that I can plot it
 recognising trends for my variables?
 class(x) says: factor
 Thanks
 Alex
 




Re: [R] Concerning reading of SAS-files

2005-08-12 Thread bogdan romocea
The first one is an index, not a data set. Anyway, just use SAS to
export the data sets in text format (CSV, tab-delimited etc). You can
then easily read those in R. (By the way, the help for read.xport says
that 'The file must be in SAS XPORT format.' Is .sas7bdat an XPORT
file? Hint: no.)


 -Original Message-
 From: Fredrik Thuring [mailto:[EMAIL PROTECTED] 
 Sent: Friday, August 12, 2005 4:52 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Concerning reading of SAS-files
 
 
 
 Hi!
 
 I'm trying to start a credibility estimation study with a
 couple of data sets that were created in SAS. The data sets are
 saved as .sas7bndx and .sas7bdat.
 I've tried reading them into R with the function 'read.xport', but this
 returns the error message 'Error in lookup.xport(file) : unable to open
 file'.
 Are there any other functions that one could use instead?
 
 Thanks a lot to whoever can solve my problem!
 
 Fredrik Thuring
 Codan Insurance, Copenhagen
 
 Best regards
 Fredrik Thuring
 
 
 




Re: [R] retrieving large columns using RODBC

2005-08-15 Thread bogdan romocea
This appears to be an SQL issue. Look for a way to speed up your
queries in Postgresql. I presume you haven't created an index on
'index', which means that every time you run your SELECT, Postgresql
is forced to do a full table scan (not good). If the index doesn't
solve the problem, look for some SQL help.


 -Original Message-
 From: Tamas K Papp [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, August 13, 2005 4:03 AM
 To: R-help mailing list
 Subject: [R] retrieving large columns using RODBC
 
 
 Hi,
 
 I have a large table in Postgresql (result of an MCMC simulation,
 with 1 million rows) and I would like to retrieve columns (corresponding
 to variables) using RODBC.  I have a column called index which is used
 to order rows.
 
 Unfortunately, sqlQuery can't return all the values from a column at
 once (RODBC complains about lack of memory).  So I am using the
 following code:
 
 getcolumns <- function(channel, tablename, colnames, totalrows,
                        ordered=TRUE, chunksize=1e5) {
   r <- matrix(NA_real_, totalrows, length(colnames))
   for (i in 1:ceiling(totalrows/chunksize)) {
     cat(".")
     # rows covered by this chunk (the last chunk may be shorter)
     rows <- ((i-1)*chunksize+1):min(i*chunksize, totalrows)
     r[rows, ] <- as.matrix(
       sqlQuery(channel, paste("SELECT", paste(colnames, collapse=", "),
                               "FROM", tablename,
                               "WHERE index <=", i*chunksize,
                               "AND index >", (i-1)*chunksize,
                               if (ordered) "ORDER BY index;" else ";")))
   }
   cat("\n")
   drop(r)   # convert to vector if needed
 }
 
 to retrieve it in chunks.  However, this is very slow; it takes about
 15 minutes on my machine.  Is there a way to speed it up?
 
 I am running Linux on a powerbook, RODBC version 1.1-4, R 2.1.1.  The
 machine has only 512 MB of RAM.
 
 Thanks,
 
 Tamas
 




Re: [R] Regular expressions sub

2005-08-18 Thread bogdan romocea
One solution is
test <- c("1.11","10.11","11.11","113.31","114.2","114.3")
id <- unlist(lapply(strsplit(test, "[.]"), function(x) {x[2]}))
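Since the question asks about regular expressions, sub() gives the same result by deleting everything up to and including the first dot. Here is a sketch using the values from the question:

```r
test <- c("1.11", "10.11", "11.11", "113.31", "114.2", "114.3", "114.8")

# "^[^.]*" matches the leading run of non-dot characters,
# "\\." the first dot itself; both are replaced by "".
id <- sub("^[^.]*\\.", "", test)
id
as.numeric(id)   # if numbers rather than strings are wanted
```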


 -Original Message-
 From: Bernd Weiss [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, August 18, 2005 12:10 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Regular expressions  sub
 
 
 Dear all,
 
 I am struggling with the use of regular expressions. I got
 
 > as.character(test$sample.id)
  [1] 1.11   10.11  11.11  113.31 114.2  114.3  114.8  
 
 and need
 
 [1] 11   11  11  31 2  3  8
 
 I.e. remove everything before the . .
 
 TIA,
 
 Bernd
 




Re: [R] Linux Standalone Server Suggestions for R

2005-09-01 Thread bogdan romocea
Most powerful in what way? Quite a lot depends on the jobs you're going to run.
- To run CPU-bound jobs, more CPUs are better. (Even though R doesn't
do threading, you can manually split some CPU-bound jobs into several
parts and run them simultaneously.) Apart from multiple CPUs and
hyperthreading, check out the new dual-core CPUs.
- To run very large jobs, more memory is better. You can easily spend
most of your money on memory. Get the fastest memory you can.
- You should get 64-bit CPUs; otherwise you won't be able to run very
large jobs (search the list for details).

I would suggest that you buy a configuration that can handle more CPUs
and memory than you think you need now (say, at least 4 max CPUs and
16 GB max memory), then keep on adding more memory and CPUs as your
needs change.
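Here is a sketch of splitting a CPU-bound job into parts and running them simultaneously. It uses the parallel package, which has shipped with R since version 2.14.0 and therefore post-dates this thread. slow_task() is a made-up stand-in for one part of a real job. mclapply() forks processes, which is not available on Windows, so the sketch falls back to one core there:

```r
library(parallel)

# Stand-in for one part of a CPU-bound job; seeding inside the task
# keeps the result the same whichever process runs it.
slow_task <- function(seed) {
  set.seed(seed)
  mean(replicate(200, median(rnorm(1000))))
}

cores <- if (.Platform$OS.type == "windows") 1L else 2L
seeds <- 1:8

res_parallel   <- unlist(mclapply(seeds, slow_task, mc.cores = cores))
res_sequential <- vapply(seeds, slow_task, numeric(1))

# Same answers; on a multi-core machine the parallel run should
# take less wall time.
all.equal(res_parallel, res_sequential)
```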
hth,
b.


 -Original Message-
 From: Jia-Shing So [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, August 31, 2005 10:03 PM
 To: r-help@stat.math.ethz.ch
 Cc: Phuoc Hong
 Subject: [R] Linux Standalone Server Suggestions for R
 
 
 Hi All,
 
 My group is looking for any suggestions on what to purchase to
 achieve the most powerful number-crunching system that $50k can buy.
 The main application that will be used is R, so input on what hardware
 benefits R most will be appreciated.  The requirements are that it be
 a single standalone server (i.e. not a cluster solution), and that it
 must be able to run unix/linux.  If anyone has any experience/
 suggestions regarding the following questions, that would also be
 greatly appreciated.
 
 AMD vs Intel chips, especially 64-bit versions of the two?
 Using Itanium/Opterons and if so how much of a performance boost did
 you achieve vs other 64-bit chip sets?
 Also, does anyone know if there is an upper threshold on how much
 memory R can use?
 
 Thanks in advance for any help and suggestions,
 
 Jia-Shing So
 Programmer Analyst
 Biostatistics and Bioinformatics Lab
 University of California, San Diego
 




[R] RMySQL installation problem on FC4 x86_64

2005-09-07 Thread bogdan romocea
Dear useRs,

I'm having a hard time installing RMySQL on a FC4 x86_64 box (R 2.1.0
and MySQL 4.1.11-2 installed through yum). After an initial
configuration error (could not find the MySQL installation include
and/or library directories) I managed to install RMySQL with
   # export PKG_LIBS="-L/usr/lib64/mysql -lmysqlclient"
   # R CMD INSTALL RMySQL_0.5-5.tar.gz

However, when I load the package, I get this error:
> require(RMySQL)
Loading required package: RMySQL
Loading required package: DBI
Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library
'/usr/lib64/R/library/RMySQL/libs/RMySQL.so':
  /usr/lib64/R/library/RMySQL/libs/RMySQL.so: undefined symbol:
mysql_field_count
[1] FALSE

Can anyone offer a suggestion, or perhaps email me a precompiled binary?
Thank you,
b.

platform x86_64-redhat-linux-gnu
arch     x86_64
os       linux-gnu
system   x86_64, linux-gnu
status
major    2
minor    1.0
year     2005
month    04
day      18
language R

# yum list installed mysql
Installed Packages
mysql.i386     4.1.11-2   installed
mysql.x86_64   4.1.11-2   installed



Re: [R] boxplot statistics

2005-10-06 Thread bogdan romocea
A related comment - don't rely (too much) on boxplots. They show only
a few things, which may be limiting in many cases and completely
misleading in others. Here are a couple of suggestions for plots which
you may find more useful than the standard box plots:
- figure 3.27 from 
http://www.stat.auckland.ac.nz/~paul/RGraphics/chapter3.html
- violin plots (see package vioplot)
- density plots
- histograms
- box-percentile plots (bpplot from Hmisc)
- quantile plots
- if comparing 2 distributions, qq plots, quantile-difference plots,
mean-difference plots etc.
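The question quoted below asks how the box, the middle line and the whiskers are determined. Here is a small worked example with made-up data. With the default coef = 1.5, boxplot.stats() computes the statistics that boxplot() draws:

```r
# boxplot() draws:
#   the box      from the lower to the upper hinge (as given by fivenum()),
#   the line     at the median (not the mean),
#   the whiskers out to the most extreme points within 1.5 box lengths
#                of the box; points beyond that are drawn individually.
x <- c(1:9, 30)
fivenum(x)          # 1.0 3.0 5.5 8.0 30.0
b <- boxplot.stats(x)
b$stats             # 1.0 3.0 5.5 8.0 9.0 (upper whisker stops at 9)
b$out               # 30 lies beyond 8 + 1.5 * (8 - 3) = 15.5
```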


 -Original Message-
 From: Karin Lagesen [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, October 04, 2005 5:24 AM
 To: [EMAIL PROTECTED]
 Subject: [R] boxplot statistics



 I have read and reread the boxplot and the boxplot stats page, and I
 still cannot understand how and what boxplot shows. I realize that
 this might be due to me not knowing enough statistics, but anyway...

 First, how does boxplot determine the size of the box? And is the line
 inside the box the mean or the median (or something completely
 different?) And how does it determine how long out the whiskers should
 go?

 Also, the boxplot.stats page talks about hinges, what are those?
 The two hinges are versions of the first and third quartile, i.e.,
 close to 'quantile(x, c(1,3)/4)'.

 Thankyou very much.

 Karin
 --
 Karin Lagesen, PhD student
 [EMAIL PROTECTED]
 http://www.cmbn.no/rognes/





[R] add leading 0s to %d from png() {was Automatic creation of file names}

2005-10-08 Thread bogdan romocea
Dear useRs,

Is there a way to 'properly' format %d when plotting more than one
page on png()? 'Properly' means to me with leading 0s, so that the
PNGs become easy to navigate in a file/image browser. Lacking a better
solution I ended up using the code below, but would much prefer
something like
   png("test_%d.png", bg="white", width=1000, height=700)
where %d could be formatted like
   formatC(%d, digits=0, wid=3, flag="0", mode="integer")

Thank you,
b.

#---works, but is rather complicated---
pngno <- 0 ; i <- 1
for (w in 1:53) {
  # open a new PNG file before plots 1, 5, 9, ...
  if (i %in% c(4*0:100+1)) {
    pngno <- pngno + 1
    png(paste("test_", formatC(pngno, digits=0, wid=4, flag="0", mode="integer"),
              ".png", sep=""), bg="white", width=1000, height=750)
    par(mfrow=c(2,2), mai=c(4,5,3,2)/10, omi=c(0.2,0,0,0),
        cex.axis=1, cex.main=1.2)
  }
  plot(1:10, main=w)
  # close the file after every 4th plot
  if (i %in% c(4*1:100)) dev.off()
  i <- i+1
}
dev.off()
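There may be a simpler route. png() substitutes the page number itself when the file name contains a C integer format, so a zero-padded width can be requested directly with %03d (or %04d for more than 999 pages). Here is a sketch that writes the 53 plots from the code above into a temporary directory:

```r
out_dir <- file.path(tempdir(), "png_pages")
dir.create(out_dir, showWarnings = FALSE)

# %03d is replaced by the zero-padded page number:
# test_001.png, test_002.png, ...
png(file.path(out_dir, "test_%03d.png"), bg = "white", width = 1000, height = 750)
par(mfrow = c(2, 2), mai = c(4, 5, 3, 2) / 10, omi = c(0.2, 0, 0, 0),
    cex.axis = 1, cex.main = 1.2)   # 4 plots per page
for (w in 1:53) plot(1:10, main = w)
invisible(dev.off())

list.files(out_dir)   # 53 plots at 4 per page gives 14 files
```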


From: Mike Prager Mike.Prager at noaa.gov
Subject: Re: [R] Automatic creation of file names
Newsgroups: gmane.comp.lang.r.general
Date: 2005-09-22 14:51:54 GMT

Walter --

P.S.  The advantage of using formatC over pasting the digits (1:1000)
directly is that when one uses leading zeroes, as in the formatC example
shown, the resulting filenames will sort into proper order.

...MHP

You can use paste() with something like

 formatC(number, digits=0, wid=3, flag="0")

(where number is your loop index) to generate the filenames.

on 9/22/2005 10:21 AM Leite,Walter said the following:

I have a question about how to save to the hard drive the one thousand
datasets I generated in a simulation.



[R] decreasing performance of for() loop

2005-10-10 Thread bogdan romocea
Dear useRs,

I'm wondering why the for() loop below runs slower as it progresses.
On a Win XP box, the iterations at the beginning run much faster than
those at the end:
1%, iteration 2000, 10:10:16
2%, iteration 4000, 10:10:17
3%, iteration 6000, 10:10:17
98%, iteration 196000, 10:24:04
99%, iteration 198000, 10:24:24
100%, iteration 200000, 10:24:38

Is there something that can be done about this?  Would such a loop run
faster in C/C++/Fortran?

Thank you,
b.

#---sample code
loop.progress <- function(loop, iterations, steps, toprint=NULL)
{
  # iterations at which progress is reported
  marks <- c(1, floor(iterations/steps)*(1:steps))
  if (loop %in% marks) {
    if (is.null(toprint)) prt <- loop else prt <- toprint
    cat(paste(round((which(marks == loop)-1)*(100/steps), 0), "%, iteration ",
              prt, ", ", format(Sys.time(), "%H:%M:%S"), sep=""), "\n")
  }
}
#---loop that runs slower and slower
test <- runif(200000)
out <- vector(mode="numeric")   # zero length: grows on every assignment
lg <- 30
for (i in (lg+1):length(test))
{
  loop.progress(i, length(test), 100)
  out[i] <- sum(test[(i-lg):i])
}



Re: [R] decreasing performance of for() loop

2005-10-10 Thread bogdan romocea
Never mind, I found the fix. Declaring the length of out eliminates
the performance decrease:
   out <- vector(mode="numeric", length=length(test))
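Here is a hedged explanation of why this helps. Assigning past the end of out makes R extend the vector, which can mean allocating a longer copy on each iteration, so each step costs more as the vector grows. For this particular loop, a vectorized sketch using cumulative sums avoids the loop altogether. It uses windows of lg + 1 values, as in the loop; cs is just a helper name:

```r
test <- runif(200000)
lg <- 30

# Loop version with the preallocated result vector.
out <- vector(mode = "numeric", length = length(test))
for (i in (lg + 1):length(test)) out[i] <- sum(test[(i - lg):i])

# Vectorized version: with cs = c(0, cumsum(test)),
# sum(test[a:b]) equals cs[b + 1] - cs[a].
n <- length(test)
cs <- c(0, cumsum(test))
out2 <- numeric(n)
out2[(lg + 1):n] <- cs[(lg + 2):(n + 1)] - cs[1:(n - lg)]

all.equal(out, out2)   # equal up to floating-point rounding
```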






Re: [R] adding 1 month to a date

2005-10-12 Thread bogdan romocea
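Simple addition and subtraction of days also works, but note that it adds
a fixed number of days rather than a calendar month (the first line below
gives 1995-12-31, not 1996-01-01):
  as.Date("1995/12/01", format="%Y/%m/%d") + 30
If you have datetime values you can use
  strptime("1995-12-01 08:00:00", format="%Y-%m-%d %H:%M:%S") + 30*24*3600
where 30*24*3600 = 30 days expressed in seconds.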
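For calendar months, here is a sketch of a helper built on the seq() approach from Marc's reply (quoted below). add_months() is a made-up name, not a base R function. Note that seq() counts an invalid day forward into the next month, so 31 January plus one month lands in early March:

```r
# Add n calendar months (n a whole number, positive or negative) to each
# date in d. seq() takes a single start date, so each element is handled
# separately.
add_months <- function(d, n) {
  d <- as.Date(d)
  if (n == 0) return(d)
  step <- paste(n, "month")   # e.g. "1 month" or "-1 month"
  days <- vapply(seq_along(d),
                 function(i) as.numeric(seq(d[i], by = step, length.out = 2)[2]),
                 numeric(1))
  as.Date(days, origin = "1970-01-01")
}

add_months(c("1995-12-01", "1996-03-15"), 1)   # "1996-01-01" "1996-04-15"
add_months("1995-12-01", -1)                   # "1995-11-01"
```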


 -Original Message-
 From: Marc Schwartz [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, October 11, 2005 10:16 PM
 To: t c
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] adding 1 month to a date


 On Tue, 2005-10-11 at 16:26 -0700, t c wrote:
  Within an R dataset, I have a date field called date_. (The dates are
  in the format YYYY-MM-DD, e.g. 1995-12-01.)

  How can I add or subtract 1 month from this date, to get 1996-01-01
  or 1995-11-01?

 There might be an easier way to do this, but using seq.Date(), you can
 increment or decrement from a Time 0 by months:

 Add 1 month:

 This takes your Time 0, generates a 2 element sequence (which begins
 with Time 0) and then takes the second element:

 > seq(as.Date("1995-12-01"), by = "month", length = 2)[2]
 [1] "1996-01-01"



 Subtract 1 month:

 Same as above, but we use 'by = -1 month' and take the
 second element:

 > seq(as.Date("1995-12-01"), by = "-1 month", length = 2)[2]
 [1] "1995-11-01"


 See ?as.Date and ?seq.Date for more information. The former function
 is used to convert from a character vector to a Date class object.
 Note that in your case, the date format is consistent with the
 default. Pay attention to the 'format' argument in as.Date() if your
 dates should be in other formats.

 HTH,

 Marc Schwartz





Re: [R] how to use large data set ?

2006-07-20 Thread bogdan romocea
By far, the cheapest and easiest solution (and the very first to try)
is to add more memory. The cost depends on what kind you need, but
here's for example 2 GB you can buy for only $150:
http://www.newegg.com/Product/Product.asp?Item=N82E16820144157

Project constraints?! If they don't want to spend a couple hundred USD
for memory, you're working on the wrong project (and/or for the wrong
organization). Buying more memory (say up to a few GB) is orders of
magnitude cheaper than the licenses for some proprietary software that
can get around memory constraints, and probably (much) cheaper than
the loss of productivity caused by the extra training and setup time
needed to try to implement an alternative solution (such as a
connection to a DBMS). And even if the extra memory needed for R were
as expensive as a license for proprietary software, which choice
would be more reasonable?


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of mahesh r
 Sent: Wednesday, July 19, 2006 4:23 PM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] how to use large data set ?

 Hi,
 I would like to extend the query posted earlier on using large
 databases. I am trying to use Rgdal to mine remote sensing imagery. I
 don't have problems bringing the images into the R environment, but
 when I try to convert the images to a data.frame I receive a warning
 message from R saying "1: Reached total allocation of 510Mb: see
 help(memory.size)" and the process terminates. Due to project
 constraints I am given a very old 2.4 GHz computer with only 512 MB
 RAM. I think R is trying to store the results in RAM, and since the
 image is very big (some 9 million pixels), it runs out of memory.

 My questions are:
 1. Is there any possibility to dump the temporary variables in a temp
 folder on the hard disk (as much other software does) instead of
 letting R store them in RAM?
 2. Could this be done without creating a connection to a back-end
 database like Oracle?

 Thanks,

 Mahesh


 On 7/19/06, Greg Snow [EMAIL PROTECTED] wrote:
 
  You did not say what analysis you want to do, but many common analyses
  can be done as special cases of regression models, and you can use the
  biglm package to fit regression models.

  Here is an example that worked for me to get the mean and standard
  deviation by day from an Oracle database with over 23 million rows (I
  had previously set up 'edw' as an ODBC connection to the database under
  Windows; any of the database connection packages should work for you
  though):
 
  library(RODBC)
  library(biglm)
 
  con <- odbcConnect('edw', uid='glsnow', pwd=pass)

  odbcQuery(con, "select ADMSN_WEEKDAY_CD, LOS_DYS from CM.CASEMIX_SMRY")

  t1 <- Sys.time()

  # the first chunk of rows initialises the model
  tmp <- sqlGetResults(con, max=10)

  names(tmp) <- c("Day", "LoS")
  tmp$Day <- factor(tmp$Day, levels=as.character(1:7))
  tmp <- na.omit(tmp)
  tmp <- subset(tmp, LoS > 0)

  ff <- log(LoS) ~ Day

  fit <- biglm(ff, tmp)

  i <- nrow(tmp)
  # keep fetching chunks while sqlGetResults() returns a data frame
  while( !is.null(nrow( tmp <- sqlGetResults(con, max=10) ) ) ){
    names(tmp) <- c("Day", "LoS")
    tmp$Day <- factor(tmp$Day, levels=as.character(1:7))
    tmp <- na.omit(tmp)
    tmp <- subset(tmp, LoS > 0)

    fit <- update(fit, tmp)

    i <- i + nrow(tmp)
    cat(format(i, big.mark=','), "rows processed\n")
  }

  summary(fit)

  t2 <- Sys.time()
 
  t2-t1
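The same chunk-at-a-time idea also works for a flat file instead of a database. Below is a minimal base-R sketch, assuming a hypothetical text file with one numeric value per line and no header; the helper `chunked_mean_sd()` and the chunk size are illustrative, not part of any package. Only the running count, mean and sum of squared deviations stay in memory, and each chunk's summary is merged into the running totals with the standard pairwise update for means and variances.

```r
chunked_mean_sd <- function(path, chunk_size = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0; m <- 0; M2 <- 0                   # running count, mean, sum of squared deviations
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break           # end of file
    v   <- as.numeric(lines)
    nb  <- length(v)
    mb  <- mean(v)
    M2b <- sum((v - mb)^2)
    nt    <- n + nb
    delta <- mb - m
    m  <- m + delta * nb / nt               # merge the chunk mean into the running mean
    M2 <- M2 + M2b + delta^2 * n * nb / nt  # merge the squared deviations
    n  <- nt
  }
  c(n = n, mean = m, sd = sqrt(M2 / (n - 1)))
}

# Usage with a hypothetical file of 10,500 values
set.seed(42)
x <- rnorm(10500, mean = 50, sd = 5)
tmp <- tempfile(fileext = ".txt")
writeLines(as.character(x), tmp)
res <- chunked_mean_sd(tmp, chunk_size = 1000)
print(res)
```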
 
  Hope this helps,
 
  --
  Gregory (Greg) L. Snow Ph.D.
  Statistical Data Center
  Intermountain Healthcare
  [EMAIL PROTECTED]
  (801) 408-8111
 
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Yohan CHOUKROUN
  Sent: Wednesday, July 19, 2006 9:42 AM
  To: 'r-help@stat.math.ethz.ch'
  Subject: [R] how to use large data set ?
 
  Hello R users,

  Sorry for my English, I'm French.

  I want to use a large dataset (3 million rows and 70 variables), but I
  don't know how to handle it because my computer crashes quickly (P4
  2.8 GHz, 1 GB RAM).

  I also have a dual Xeon with 2 GB, so I want to do the computation on
  that computer and show the results on mine. Both of them run Windows
  XP...

  In short, I have:

  1 server with a MySQL database
  1 computer

  and I want to use them with a large dataset.

  I'm trying to use RDCOM to connect to the database, and I'm installing
  Rpad (but it's hard for me...).

  Are there other solutions?

  Thanks in advance

  Yohan C.

[R] scatter plot with axes drawn on the same scale

2006-07-28 Thread bogdan romocea
Dear useRs,

I'd like to produce some scatter plots where N units on the X axis are
equal to N units on the Y axis (as measured with a ruler, on screen or
paper). This approach
  x <- sample(10:200,40) ; y <- sample(20:100,40)
  windows(width=max(x),height=max(y))
  plot(x,y)
is better than plot(x,y) but doesn't solve the problem because of the
other parameters (margins etc). Is there an easy, official way of
sizing the axes to the same scale, one that would also work with
multiple scatter plots being sent to the same pdf() - plus perhaps
layout() or par(mfrow())?

Thank you,
b.
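One approach, offered as a sketch: the `asp` argument of `plot()` (passed on to `plot.window()`) fixes the y/x aspect ratio in physical units, so `asp = 1` should make N units on the x axis as long as N units on the y axis whatever the device size or margins. Since it acts per plot, it should also combine with `pdf()`, `layout()` or `par(mfrow=...)`. The check below compares data units per inch on each axis using `par("usr")` and `par("pin")`; the random data follow the example above.

```r
set.seed(1)
x <- sample(10:200, 40); y <- sample(20:100, 40)

pdf(tempfile(fileext = ".pdf"), width = 10, height = 5)
par(mfrow = c(1, 2))
plot(x, y, asp = 1, main = "asp = 1")
usr <- par("usr")   # c(x1, x2, y1, y2) in data units
pin <- par("pin")   # plot region width and height in inches
x_per_inch <- (usr[2] - usr[1]) / pin[1]
y_per_inch <- (usr[4] - usr[3]) / pin[2]
plot(x, y, main = "default")   # for comparison, no fixed aspect ratio
dev.off()

print(c(x_per_inch = x_per_inch, y_per_inch = y_per_inch))
```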



Re: [R] prefixing list names in print

2006-08-08 Thread bogdan romocea
A simple function will do what you want, customize this as needed:
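lprint <- function(lst, prefix)
{
  # print each element under a "prefix$name" header
  # (seq_along() also copes with an empty list, unlike 1:length(lst))
  for (i in seq_along(lst)) {
    cat(prefix, "$", names(lst)[i], "\n", sep="")
    print(lst[[i]])
    cat("\n")
  }
}
P <- list(A="a", B="b")
lprint(P, "Prefix")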


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Laurent Deniau
 Sent: Tuesday, August 08, 2006 12:25 PM
 To: R-help
 Subject: [R] prefixing list names in print

 With

 print(list(A="a",B="b"))

 it displays

 $A
 [1] "a"

 $B
 [1] "b"

 I would like to add a common prefix, placed before the $, to all the
 list tags. Pasting the prefix to the names does not work (it appears
 after the $). For example, if the prefix were "P", it should display:

 P$A
 [1] "a"

 P$B
 [1] "b"

 I tried to add a name attribute to the list or to add prefix="P" to
 print, but nothing works. Any hint?

 Thanks,

   Laurent.





Re: [R] screen resolution effects on graphics

2006-08-28 Thread bogdan romocea
You forgot to mention your OS. This was asked before and, if I recall
correctly, the answer for Windows was no. An acceptable solution (imho)
is to edit the Rprofile.site file on each machine and add something like
  pngplotwidth <- 990 ; pngplotheight <- 700
  pdfplotwidth <- 14 ; pdfplotheight <- 10
Then, use these values in your functions. It's manual, but you only
need to do this once for each machine.
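A variation on the same idea, sketched under assumptions: store the per-machine values with `options()` in Rprofile.site and read them back with `getOption()`, whose second argument is a fallback used when the option is not set. The option names (`myplot.png.width` and so on) and the helper `plot_size()` are hypothetical; keeping the values in options rather than global variables simply keeps them out of the workspace.

```r
# In each machine's Rprofile.site (hypothetical option names):
#   options(myplot.png.width = 990, myplot.png.height = 700,
#           myplot.pdf.width = 14,  myplot.pdf.height = 10)

plot_size <- function(device = c("png", "pdf")) {
  device <- match.arg(device)
  # fallbacks used when Rprofile.site does not set the options
  defaults <- list(png = c(width = 990, height = 700),
                   pdf = c(width = 14,  height = 10))
  c(width  = getOption(paste0("myplot.", device, ".width"),
                       defaults[[device]][["width"]]),
    height = getOption(paste0("myplot.", device, ".height"),
                       defaults[[device]][["height"]]))
}

# Usage: size the device from the machine-specific settings
sz <- plot_size("pdf")
pdf(tempfile(fileext = ".pdf"), width = sz[["width"]], height = sz[["height"]])
plot(1:10)
dev.off()
```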


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Charles Annis, P.E.
 Sent: Monday, August 28, 2006 8:50 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] screen resolution effects on graphics

 Greetings, R-Citizens:

 I have the good fortune of working with a 19-inch 1280 x 1024 pixel
 monitor. My R code produces nice-looking graphics on this machine, but
 the same code results in crowded plots on an older machine with
 800 x 600 resolution. In hindsight this seems obvious, but I didn't
 anticipate it.

 My code will be used on machines with varying graphics (and memory)
 capacity. Is there a way I can check the native resolution of the
 machine so that I can make adjustments to my code for the possible
 limitations of the machine running it?

 Thanks.


 Charles Annis, P.E.

 [EMAIL PROTECTED]
 phone: 561-352-9699
 eFax:  614-455-3265
 http://www.StatisticalEngineering.com






Re: [R] Alternatives to merge for large data sets?

2006-09-07 Thread bogdan romocea
One obvious alternative is an SQL join, which you could do directly in
a DBMS, or from R via RMySQL / RSQLite /... Keep in mind that creating
indexes on user/userid before the join may save a lot of time.
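Another base-R alternative, sketched under an assumption taken from the quoted message below (each row of pubbounds matches at most one row of prof, i.e. prof's userid values are unique): look up each key with `match()` and bind the matching columns. This is a left join that keeps every row of pubbounds; unlike `merge(..., all=TRUE)`, prof rows with no match are dropped. The helper `left_join_unique()` and the toy data are hypothetical, and whether it fits in memory for the real data is untested.

```r
# Hypothetical small stand-ins for the poster's data frames
pubbounds <- data.frame(user = c(1, 2, 2, 3, 5), score = c(10, 20, 21, 30, 50))
prof      <- data.frame(userid = c(1, 2, 3, 4), age = c(31, 42, 27, 55))

left_join_unique <- function(x, y, by.x, by.y) {
  # keys in y must be unique; otherwise matches after the first are ignored
  stopifnot(!anyDuplicated(y[[by.y]]))
  idx <- match(x[[by.x]], y[[by.y]])      # row of y for each row of x (NA if none)
  extra <- y[idx, setdiff(names(y), by.y), drop = FALSE]
  rownames(extra) <- NULL
  cbind(x, extra)
}

res <- left_join_unique(pubbounds, prof, by.x = "user", by.y = "userid")
print(res)
```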


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Adam D. I. Kramer
 Sent: Thursday, September 07, 2006 2:46 PM
 To: Prof Brian Ripley
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Alternatives to merge for large data sets?


 On Thu, 7 Sep 2006, Prof Brian Ripley wrote:

  Which version of R?

 Previously, 2.3.1.

  Please try 2.4.0 alpha, as it has a different and more efficient
  algorithm for the case of 1-1 matches.

 I downloaded and installed R-latest, but got the same error message:

 Error: cannot allocate vector of size 7301 Kb

 ...though at least the too-big size was larger this time.

 My data set is not exactly 1-1; every item in prof may have one or more
 matches in pubbounds, though every item in pubbounds corresponds to
 only one prof.

 --Adam

 
  On Wed, 6 Sep 2006, Adam D. I. Kramer wrote:
 
  Hello,
 
  I am trying to merge two very large data sets, via
 
  pubbounds.prof <- merge(x=pubbounds, y=prof, by.x="user", by.y="userid",
                          all=TRUE, sort=FALSE)
 
  which gives me an error of
 
  Error: cannot allocate vector of size 2962 Kb
 
  I am reasonably sure that this is correct syntax.
 
  The trouble is that pubbounds and prof are large; they are data frames
  which take up 70M and 11M respectively when saved as .Rdata files.

  I understand from various archive searches that merge can't handle
  that, because merge takes n^2 memory, which I do not have.

  Not really true (it has been changed since those days). Of course, if
  you have multiple matches it must do so.

  My question is whether there is an alternative to merge which would
  carry out the process in a slower, iterative manner... or if I should
  just bite the bullet, write.table, and use a perl script to do the job.
 
  Thankful as always,
  Adam D. I. Kramer
 
  --
  Brian D. Ripley,  [EMAIL PROTECTED]
  Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
  University of Oxford, Tel:  +44 1865 272861 (self)
  1 South Parks Road, +44 1865 272866 (PA)
  Oxford OX1 3TG, UKFax:  +44 1865 272595
 





[R] unexpected behavior of boxplot(x, notch=TRUE, log=y)

2006-10-05 Thread bogdan romocea
A function I've been using for a while returned a surprising [to me,
given the data] error recently:
   Error in plot.window(xlim, ylim, log, asp, ...) :
   Logarithmic axis must have positive limits

After some digging I realized what was going on:
x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1,
   42512.1, 1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96,
   1, 7603.63, 1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1,
   1, 46492.05, 22334.88, 1, 1)
summary(x)
boxplot(x,notch=TRUE,log="y")  #unexpected
boxplot(x)  #ok
boxplot(x,log="y")  #ok
boxplot(x,notch=TRUE)  #aha

I can get around this, but thought that maybe boxplot() should be
adjusted to deal with something like this on its own.

Thank you,
b.

platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          4.0
year           2006
month          10
day            03
svn rev        39566
language       R
version.string R version 2.4.0 (2006-10-03)
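A sketch of what seems to be happening, using the data above: with notch=TRUE, the notch limits come from `boxplot.stats()` as median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n). Here the hinge spread is large relative to the median, so the lower limit falls below zero, and a log axis cannot show it. The workaround shown (a boxplot of `log10(x)` with hand-made axis labels) is only one option, and its notches are computed on the log scale, not the original one.

```r
x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1,
       42512.1, 1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96,
       1, 7603.63, 1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1,
       1, 46492.05, 22334.88, 1, 1)

st <- boxplot.stats(x)
# Notch limits recomputed by hand: median +/- 1.58 * hinge spread / sqrt(n)
notch <- st$stats[3] + c(-1.58, 1.58) * diff(st$stats[c(2, 4)]) / sqrt(length(x))
print(st$conf)   # the lower limit is negative although all x > 0

# Workaround sketch: draw the notched boxplot on the log10 scale instead
pdf(tempfile(fileext = ".pdf"))
boxplot(log10(x), notch = TRUE, yaxt = "n", ylab = "x")
axis(2, at = 0:5, labels = 10^(0:5))
dev.off()
```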



[R] read 4-jan-02 as date

2004-10-11 Thread bogdan romocea
Dear R users,

I have a column with dates (character) in a data frame:
12-Jan-01 11-Jan-01 10-Jan-01 9-Jan-01  8-Jan-01  5-Jan-01
and I need to convert them to (Julian) dates so that I can
sort the whole data frame by date. I thought it would be
very simple, but after checking the documentation and the
list I still don't have something that works.

1. as.Date returns the error below. What am I doing wrong?
As far as I can see the character strings are in standard
format.
d$Date <- as.Date(d$Date, format="%d-%b-%y")
Error in fromchar(x) : character string is not in a
standard unambiguous format

2. as.date {Survival} produces this error,
d$Date <- as.date(d$Date, order = "dmy")
Error in as.date(d$Date, order = "dmy") : Cannot coerce to
date format

3. Assuming all else fails, is there a text function
similar to SCAN in SAS? Given a string like 9-Jan-01 and
"-" as separator, I'd like a function that can read the
first, second and third values (9, Jan, 01), so that I can
get Julian dates with mdy.date {survival}.

Thanks in advance,
b.
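For the third question, a base-R sketch: `strsplit()` splits each string at a separator, which plays roughly the role of SAS SCAN here. The example strings come from the column above. If the column was read with read.csv it is probably a factor, and `strsplit()` needs `as.character()` first, as in the replies that follow.

```r
s <- factor(c("12-Jan-01", "9-Jan-01", "5-Jan-01"))   # e.g. a column from read.csv

# Split each string at "-" into day, month abbreviation and two-digit year
parts <- strsplit(as.character(s), "-", fixed = TRUE)
day   <- as.integer(sapply(parts, `[`, 1))
month <- sapply(parts, `[`, 2)
year  <- as.integer(sapply(parts, `[`, 3))

print(data.frame(day, month, year))
```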



Re: [R] read 4-jan-02 as date

2004-10-12 Thread bogdan romocea
Thank you everyone. Indeed, I had read the data via
read.csv and the date column was a factor. Everything works
fine if I convert to character first.

Regards,
b.


--- Sundar Dorai-Raj [EMAIL PROTECTED] wrote:

 
 
 
 If you're reading this from a file (via read.table, for example), then
 your date column is probably a factor. Convert to character first.

 > x
 [1] 12-Jan-01 11-Jan-01 10-Jan-01 9-Jan-01  8-Jan-01  5-Jan-01
 Levels: 10-Jan-01 11-Jan-01 12-Jan-01 5-Jan-01 8-Jan-01 9-Jan-01
 >
 > as.Date(x, format="%d-%b-%y")
 Error in fromchar(x) : character string is not in a standard unambiguous
 format
 >
 > sort(as.Date(as.character(x), format="%d-%b-%y"))
 [1] "2001-01-05" "2001-01-08" "2001-01-09" "2001-01-10" "2001-01-11"
 [6] "2001-01-12"
 
 
 --sundar
 




[R] incomplete function output

2004-10-13 Thread bogdan romocea
Dear R users,

I have a function (below) which encompasses several tests.
However, when I run it, only the output of the last test is
displayed. How can I ensure that the function root(var)
will run and display the output from all tests, and not
just the last one?

Thank you,
b.

library(tseries)  # adf.test(), kpss.test()

root <- function(var)
{
  #---Phillips-Perron
  PP.test(var, lshort = TRUE)
  PP.test(var, lshort = FALSE)

  #---Augmented Dickey-Fuller
  adf.test(var, alternative = "stationary", k = trunc((length(var)-1)^(1/3)))

  #---KPSS
  kpss.test(var, null = "Level", lshort = TRUE)
  kpss.test(var, null = "Trend", lshort = FALSE)
}
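Only the value of the last expression in a function body is returned, and at the prompt only that value is auto-printed, which would explain why only the last KPSS test appears. A minimal sketch of one fix: collect the results in a list, `print()` each one explicitly, and return the list invisibly. To stay self-contained it uses only `PP.test()` from the stats package; the tseries tests could be added to the list the same way. The name `root2()` and the simulated series are hypothetical.

```r
root2 <- function(var) {
  results <- list(
    pp_short = PP.test(var, lshort = TRUE),
    pp_long  = PP.test(var, lshort = FALSE)
    # adf.test() / kpss.test() results from tseries could be added here
  )
  # print every test explicitly instead of relying on auto-printing
  for (r in results) print(r)
  invisible(results)
}

set.seed(123)
v <- cumsum(rnorm(200))   # hypothetical random-walk series
out <- root2(v)
```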



[R] output processing / ARMA order identification

2004-10-25 Thread bogdan romocea
Dear R users,

I need to fit an ARMA model. As far as I've seen, EACF (extended ACF)
is not available in R. 

1. Let's say I fit a series of ARMA models in a loop. Given the
code/output included below, how do I pull 'Model' and 'Fit' (AIC)
from each summary() so that I can combine them into an array/data
frame to be sorted by AIC?

2. Apart from EACF, are you aware perhaps of another function in R
that can help solve the issue of ARMA order identification?

Thank you,
b.


> arma <- arma(var, order=c(1,1), lag=NULL, coef=NULL, 
+ include.intercept = TRUE, series = NULL)
> summary(arma)

Call:
arma(x = var, order = c(1, 1), lag = NULL, coef = NULL,
include.intercept = TRUE, series = NULL)

Model:
ARMA(1,1)

Residuals:
     Min       1Q   Median       3Q      Max 
-686.092  -68.499    4.024   65.531  509.171 

Coefficient(s):
           Estimate  Std. Error  t value Pr(>|t|)    
ar1        0.990653    0.003724  265.987   <2e-16 ***
ma1       -0.019562    0.030110   -0.650   0.5159    
intercept 90.940774   36.914682    2.464   0.0138 *  
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Fit:
sigma^2 estimated as 14193,  Conditional Sum-of-Squares = 17116373, 
AIC = 14983.22
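For the first question, a sketch that loops over candidate orders and stores each fit's AIC in a data frame sorted by AIC. It swaps in `stats::arima()` for tseries' `arma()`, because the fitted `arima` object has a documented `aic` component; its AIC values come from a different fitting method and need not match the `arma()` output above. The simulated series and the 0..2 order grid are assumptions.

```r
set.seed(1)
y <- arima.sim(model = list(ar = 0.7, ma = 0.3), n = 300)   # hypothetical series

orders <- expand.grid(p = 0:2, q = 0:2)
orders$aic <- NA_real_
for (k in seq_len(nrow(orders))) {
  fit <- tryCatch(arima(y, order = c(orders$p[k], 0, orders$q[k])),
                  error = function(e) NULL)   # leave AIC as NA if a fit fails
  if (!is.null(fit)) orders$aic[k] <- fit$aic
}
orders$model <- sprintf("ARMA(%d,%d)", orders$p, orders$q)
orders <- orders[order(orders$aic), ]   # smallest AIC first, NAs last
print(orders)
```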



[R] plot time series / dates (basic)

2004-11-01 Thread bogdan romocea
Dear R users,

I'm having a hard time with some very simple things. I have a time
series where the dates are in the format 7-Oct-04. I imported the
file with read.csv so the date column is a factor. The series is
rather long and I want to plot it piece by piece. The function below
works fine, except that the labels for date are meaningless (ie
9.47e+08 or 109800 - apparently the number of seconds since
whatever). I don't want to convert the data frame to a ts object
because there are missing days and I don't want any interpolation.

1. How do I replace the date labels with something like 'Mar04',
instead of 9.47e+08 / 109800?

2. In the PDF file, the space between the two graphs printed pair by
pair is fairly large. Can I remove/reduce the area that seems
reserved for Title and X label so that, on a page, the space between
the graph at the top and the one at the bottom is minimized?

3. Given the function below, I haven't discovered a way to have
vara appear as the Title or Y label in graphs.
main=as.character(vara) lists all the values of vara (which is a
column from the data frame d). So, how can I use the name of a vector
as title or label in a plot?

Thank you,
b.


d <- read.csv('data.csv', header = T, sep = ",", quote="", dec=".",
fill = T, skip=0)
attach(d)
#function to plot a long time series piece by piece
pl <- function(vara, varb, points)
{
  date <- as.POSIXct(strptime(as.character(Date), "%d-%b-%y"), tz = "GMT")
  pr1 <- vector(mode="numeric")
  pr2 <- vector(mode="numeric")
  dat <- vector()
  for (j in 1:(round(length(Vol)/points)+1)) #number of plots
  {
    for (i in ((j-1)*points+1):(j*points))
    {
      pr1[i-points*(j-1)] <- vara[i]
      pr2[i-points*(j-1)] <- varb[i]
      dat[i-points*(j-1)] <- date[i]
    }
    par(mfrow=c(2,1))
    plot(dat, pr1, type="b")
    plot(dat, pr2, type="b")
  }
}

pdf("Rplots.pdf")
pl(Vol, atr, 50)
dev.off()








Re: [R] plot time series / dates (basic)

2004-11-02 Thread bogdan romocea
Thank you for the suggestions. I managed to fix everything except the
first part. 
dat <- date[(j-1)*points+1):(j*points)]
causes a syntax error. If I do 
dat <- vector() 
I end up with numbers (which is fine by me - just like SAS dates).
However, after checking a couple of sources I still have no idea how
to format numbers as dates (for plotting/printing). Does anyone have
an example for formatting 12710 (# of days since 1 Jan 1970) as
19-Oct-04 (in the x axis of a plot)?

Regards,
b.


#function to plot a long time series piece by piece
pl <- function(vara, varb, points)
{
  date <- as.Date(as.character(Date), "%d-%b-%y")
  pr1 <- vector(mode="numeric")
  pr2 <- vector(mode="numeric")
  #dat <- vector()
  dat <- date[(j-1)*points+1):(j*points)]
  for (j in 1:(round(length(Vol)/points)+1)) #number of plots
  {
    for (i in ((j-1)*points+1):(j*points))
    {
      pr1[i-points*(j-1)] <- vara[i]
      pr2[i-points*(j-1)] <- varb[i]
      #dat[i-points*(j-1)] <- date[i]
      #dat <- date[i]
    }
    par(mfrow=c(2,1), mai=c(0.4, 0.5, 0.3, 0.1), omi=c(0.2, 0, 0, 0),
        cex.axis=0.7, cex=1.2, cex.main=0.7, pch="*")
    plot(dat, pr1, main=deparse(substitute(vara)), type="o")
    #axis.Date(1,dat,format="%b%y")
    plot(dat, pr2, main=deparse(substitute(varb)), type="o")
  }
}
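A sketch of two points raised above, on hypothetical data. First, a bare day count such as 12710 becomes a Date again once the class is restored with `as.Date(..., origin = "1970-01-01")`. Second, the slicing line has one unmatched parenthesis, and it also runs before the loop defines `j`; with balanced parentheses inside the loop, subsetting keeps the Date class, and `axis.Date()` can format the labels. The series, chunk size and label format are illustrative.

```r
set.seed(2)
# Hypothetical daily series with missing days (no interpolation)
all_days <- seq(as.Date("2004-01-01"), by = "day", length.out = 150)
date <- sort(sample(all_days, 120))
vara <- cumsum(rnorm(length(date)))

# A bare day count becomes a Date again once the class is restored
d12710 <- as.Date(12710, origin = "1970-01-01")   # 2004-10-19
print(format(d12710, "%d-%b-%y"))

points <- 50
pdf(tempfile(fileext = ".pdf"))
par(mfrow = c(2, 1))
for (j in seq_len(ceiling(length(date) / points))) {
  # balanced parentheses, computed inside the loop where j exists
  idx <- ((j - 1) * points + 1):min(j * points, length(date))
  dat <- date[idx]                       # subsetting keeps class "Date"
  plot(dat, vara[idx], type = "o", xaxt = "n", xlab = "",
       main = paste("chunk", j))
  axis.Date(1, x = dat, format = "%d-%b")
}
dev.off()
```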





--- Prof Brian Ripley [EMAIL PROTECTED] wrote:

 On Mon, 1 Nov 2004, bogdan romocea wrote:
 
  Dear R users,
  
  I'm having a hard time with some very simple things. I have a time
  series where the dates are in the format 7-Oct-04. 
 
 So why use as.POSIXct for a date, rather than as.Date?
 
  I imported the file with read.csv so the date column is a factor. The
  series is rather long and I want to plot it piece by piece. The
  function below works fine, except that the labels for date are
  meaningless (ie 9.47e+08 or 109800 - apparently the number of seconds
  since whatever). I don't want to convert the data frame to a ts object
  because there are missing days and I don't want any interpolation.
  
  1. How do I replace the date labels with something like 'Mar04',
  instead of 9.47e+08 / 109800?
 
 Just don't convert them to that format.  You set up
 
    dat <- vector()
 
 which is not a dates object.  If you use standard R indexing, it will
 work. If you throw the class away, it will not.  Try
 
    dat <- date[(j-1)*points+1):(j*points)]
 
 etc (no for loop required).
 
 If you want a different format, see ?axis.Date
 
  2. In the PDF file, the space between the two graphs printed pair by
  pair is fairly large. Can I remove/reduce the area that seems reserved
  for Title and X label so that, on a page, the space between the graph
  at the top and the one at the bottom is minimized?
 
 There's a whole chapter on this in `An Introduction to R': have you
 read it?
 
  3. Given the function below, I haven't discovered a way to have vara
  appear as the Title or Y label in graphs. main=as.character(vara)
  lists all the values of vara (which is a column from the data frame
  d). So, how can I use the name of a vector as title or label in a
  plot?
 
 That's almost an FAQ.  Use deparse(substitute(vara))
 
 
 
 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics, 
 http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595
 
 






[R] misleading output after ordering data frame

2004-11-08 Thread bogdan romocea
Dear R users,

I have a data frame which I create with read.csv and then order by
date:
d <- na.omit(read.csv(...))
d <- d[order(as.Date(as.character(d$Date), format="%d-%b-%y"),
decreasing=F, na.last=F),]

My problem is that even though the data frame is ordered as
requested, the old row numbers are preserved. For example:

* Before sorting:
> d[1:3,]
      Date   Amt
1 5-Nov-04 87.07
2 4-Nov-04 85.80
3 3-Nov-04 82.90

* After sorting:
> d[1:3,]
         Date   Amt
500 12-Nov-02 84.23
499 13-Nov-02 85.05
498 14-Nov-02 84.95

Is there a way to update the row numbers as well? It's not that
important, but I find it a bit confusing.

Thank you,
b.
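After reordering, the old row names travel with their rows; assigning NULL to the row names resets them to 1, 2, 3, .... A minimal sketch on a hypothetical three-row version of the data, ordered by Amt here so that it does not depend on locale-specific month parsing:

```r
d <- data.frame(Date = c("5-Nov-04", "4-Nov-04", "3-Nov-04"),
                Amt  = c(87.07, 85.80, 82.90))
d <- d[order(d$Amt), ]     # rows reordered; the old row names are kept
old_names <- rownames(d)
rownames(d) <- NULL        # reset row names to 1, 2, 3, ...
print(d)
```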



RE: [R] an off-topic question - model validation

2004-11-12 Thread bogdan romocea
Assuming you have enough data, usually 1/4 to 1/2 is used for
validation. 

One reference would be
Picard, R.R. and Berk, K.N. (1990). Data splitting.
The American Statistician, 44, 140-147.

hth,
b.
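A minimal sketch of a random split in base R, holding out one third of the rows for validation (inside the 1/4 to 1/2 range above); the data frame, seed and fraction are hypothetical.

```r
set.seed(10)
df <- data.frame(x = rnorm(1000), y = rnorm(1000))   # hypothetical data

valid_frac <- 1/3
valid_idx  <- sample(nrow(df), size = round(valid_frac * nrow(df)))
valid <- df[valid_idx, ]    # held out for validation
train <- df[-valid_idx, ]   # used to build the models
```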

-Original Message-
From: Wensui Liu [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 11, 2004 10:20 PM
To: [EMAIL PROTECTED]
Subject: [R] an off-topic question - model validation


Currently, I am working on a data mining project and plan to divide
the data table into 2 parts, one for modeling and the other for
validation to compare several models.

But I am not sure about the percentage of data I should use to build
the model and the one I should keep to validate the model.

Is there any literature reference about this topic? 

Thank you so much!



[R] density estimation: compute sum(value * probability) for given distribution

2004-11-12 Thread bogdan romocea
Dear R users,

This is a KDE beginner's question. 
I have this distribution:
> length(cap)
[1] 200
> summary(cap)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  459.9   802.3   991.6  1066.0  1242.0  2382.0 
I need to compute the sum of the values times their probability of
occurrence.

The graph is fine,
den <- density(cap, from=min(cap), 
   to=max(cap), give.Rkern=F)
plot(den)

However, how do I compute sum(values*probabilities)? The
probabilities produced by the density function sum to only 26%: 
> sum(den$y)
[1] 0.2611142

Would it perhaps be ok to simply do
> sum(den$x*den$y) * (1/sum(den$y))
[1] 1073.22
?

Thank you,
b.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] density estimation: compute sum(value * probability) for given distribution

2004-11-13 Thread bogdan romocea
Andy,

Thanks a lot for the clarifications. I was running a simulation a
number of times and trying to come up with a number to summarize the
results. And, I failed to realize from the beginning that what I was
trying to compute was just the mean.

Regards,
b.


--- Liaw, Andy [EMAIL PROTECTED] wrote:

 First thing you probably should realize is that density is _not_
 probability. A probability density function _integrates_ to one, it
 does not _sum_ to one. If X is an absolutely continuous RV with density
 f, then Pr(X=x)=0 for all x, and Pr(a < X < b) = \int_a^b f(x) dx.
 
 sum x*Pr(X=x) (over all possible values of x) for a discrete
 distribution is just the expectation, or mean, of the distribution. The
 expectation for a continuous distribution is \int x f(x) dx, where the
 integral is over the support of f. This is all elementary math stat
 that you can find in any textbook.
 
 Could you tell us exactly what you are trying to compute, or why you're
 computing it?
 
 HTH,
 Andy
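A sketch tying this to the question: `density()` returns values of f on an equally spaced grid, so multiplying by the grid step `dx` turns sums into approximate integrals. Dividing `sum(x*y)` by `sum(y)`, as in the question, is the same normalized Riemann sum (the `dx` cancels). For a Gaussian kernel estimate the exact mean equals the sample mean, so the result should be close to `mean(cap)`. The simulated `cap` is hypothetical, and the grid uses density()'s default range rather than from=min(cap), to=max(cap), so almost all of the mass lies on the grid.

```r
set.seed(3)
cap <- rlnorm(200, meanlog = log(1000), sdlog = 0.4)   # hypothetical sample

den <- density(cap)            # default grid extends 3 bandwidths past the data
dx  <- den$x[2] - den$x[1]     # grid spacing (the grid is equally spaced)

total    <- sum(den$y) * dx            # Riemann sum of f(x): close to 1
kde_mean <- sum(den$x * den$y) * dx    # approximates \int x f(x) dx

# The ratio from the question is the same quantity normalized by 'total'
ratio <- sum(den$x * den$y) / sum(den$y)
print(c(total = total, kde_mean = kde_mean, ratio = ratio, mean = mean(cap)))
```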
 


Re: [R] Running R from CD?

2004-11-21 Thread bogdan romocea
You'd be better off installing and running R from a USB flash drive.
This will save you the trouble of re-writing the CD as you upgrade and
install new packages. Also, you can simply copy the R installation to
your work computer (no install rights needed); R will run.

HTH,
b.


From: Hans van Walen hans_at_vanwalen.com
Date: Fri 27 Aug 2004 - 23:54:53 EST


At work I have no permission to install R. So, would anyone know
whether it is possible to create a CD with a running R-installation
for a windows(XP) pc? And of course, how to?

Thank you for your help,
Hans van Walen



RE: [R] SAS or R software

2004-11-24 Thread bogdan romocea
neela v writes:
 Hi all there

 Can someone clarify this issue for me: feature-wise, which is better,
 R or SAS, leaving aside the commercial aspect associated with it? I
 suppose there are a few people who have worked with both R and SAS, and
 I hope they will be able to help me decide.

 Thank you for the help
 


I very much doubt you can make an informed decision if you leave the
commercial aspect (license) aside. A single Base SAS installation
(server) can cost tens of thousands of [[your currency here; may need
to multiply by 10 or 100 or more]] in the first year, then a
percentage of that in the following years. (SAS software is not
purchased, but licensed on a yearly basis.) Want more than Base SAS?
Prepare your wallet: thousands upon thousands (per year) for
regression, anova, clustering (SAS/Stat), graphics (SAS/Graph), time
series (SAS/ETS), optimizations (SAS/OR) etc. Then, if you want
decision trees and neural networks (Enterprise Miner), I warmly
recommend that you quickly find a chair and sit down before you hear
the price tag. 

Will you always work for an organization that licenses SAS software?
Will the organization license all the modules you'll need? Will those
modules do everything you want? As others have said, R is a lot more
flexible, and the GPL ensures that whatever you can do today will
continue to be expanded and improved (much faster than SAS Institute
would want or be able to expand/improve SAS). 

All in all, if you're primarily interested in data analysis (and
don't want, for example, to get a job as a SAS programmer) and still
choose SAS, you will regret it one day. The benefits are few (such as
robust manipulation of massive data sets - I mean in excess of
hundreds of millions of rows) and the risks are high (whatever you do
is dependent on proprietary, very expensive software). With R, almost
the opposite is true: lots of benefits and no risks (nothing can take
R away from you).

HTH,
b.



RE: [R] [BASIC] Solution of creating a sequence of object names

2004-11-29 Thread bogdan romocea
You may be missing something. After you create all those objects,
you'll want to use them. Use get():
for (i in 1:10) ... get(paste("object",i,sep="")) ... 
It took me about a week to find out how to do this. I waited for a
few days, but before I got to ask this basic/rtfm question, someone
else - fortunately :-) - did.
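A short sketch of the assign()/get() round trip (the object names and values are arbitrary). mget() fetches several such objects at once and returns them as a named list, which is often the more convenient form to work with anyway:

```r
# Create ten objects named object1 ... object10 (values are arbitrary)
for (i in 1:10) assign(paste("object", i, sep = ""), i^2)

# Later, retrieve each one by its constructed name with get()
total <- 0
for (i in 1:10) total <- total + get(paste("object", i, sep = ""))
print(total)

# mget() returns several objects at once as a named list
objs <- mget(paste("object", 1:10, sep = ""), envir = environment())
print(objs$object3)
```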

HTH,
b.


-Original Message-
From: John [mailto:[EMAIL PROTECTED]
Sent: Monday, November 29, 2004 4:03 PM
To: [EMAIL PROTECTED]
Subject: [R] [BASIC] Solution of creating a sequence of object names


Dear R-users,

I state that this is for beginners, so you may ignore
this in order not to be irritated.

By the way, patience is another important thing,
together with kindness, we should keep in mind when
we teach students and our own children as Jim Lemon
pointed out well in the context of the Socratic
method. You may know that being kind does not mean 
giving spoonfed answers to questioners.

-

I was asked for the solution of my problem, and a
couple of answers were given to me in private emails.
I am not sure if it was a mere accident. I post them
now, without their permission, for those who are
interested in learning them. So if you're happy to
know the solution, thanks should go to the person
concerned. I thank all the three people named below.

(1) my solution after reading the R-FAQ 7.21 as Uwe
Ligges pointed out

> for ( i in 1:10 ) {
+ assign(paste("my.file.", i, sep=""), NULL)
+ }


(2) Adai Ramasamy's solution

> for(obj in paste("my.ftn", 1:10, sep=""))
+ assign(obj, NULL)
 
### or 
 
> for(i in 1:10) assign(paste("my.ftn", i, sep=""), NULL)


(3) James Holtman's solution

# For example, if you want to generate 10 groups 
# of 5 random numbers and store them 
# under the names GRPn where n is 1 - 10, 
# the following can be used:
#
> Result <- list()  # create the list
> for (i in 1:10) Result[[paste("GRP", i, sep='')]] <- runif(5)   # store each result
> Result    # print out the data
$GRP1
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819

$GRP2
[1] 0.89838968 0.94467527 0.66079779 0.62911404 0.06178627

$GRP3
[1] 0.2059746 0.1765568 0.6870228 0.3841037 0.7698414

$GRP4
[1] 0.4976992 0.7176185 0.9919061 0.3800352 0.7774452

$GRP5
[1] 0.9347052 0.2121425 0.6516738 0.121 0.2672207

$GRP6
[1] 0.38611409 0.01339033 0.38238796 0.86969085 0.34034900

$GRP7
[1] 0.4820801 0.5995658 0.4935413 0.1862176 0.8273733

$GRP8
[1] 0.6684667 0.7942399 0.1079436 0.7237109 0.4112744

$GRP9
[1] 0.8209463 0.6470602 0.7829328 0.5530363 0.5297196

$GRP10
[1] 0.78935623 0.02333120 0.47723007 0.73231374 0.69273156



Regards,

John



RE: [R] Protocol for answering basic questions

2004-12-01 Thread bogdan romocea
I'm also an R beginner. I have asked stupid questions, and received
RTFM replies. I believe such replies are _GREAT_, as long as they
include a brief reference to what to read, and where. (In some cases
searches don't work unless you happen to use the 'right' keywords,
and in other cases it may be relatively easy to miss a paragraph in a
manual - or even FAQ.)

I believe that rudeness (perceived or real) doesn't matter. It is
only solving the problem that matters. In this respect, it seems to
me that most (if not all) users who ask a question on R-help figure
out what to do.  

With regard to politeness, I think that the solution - and the problem
- lies almost completely in the other camp: those who ask (and not
those who reply). I would recommend that R beginners not be easily
offended, and not be afraid to ask stupid questions. So what if you
risk being perceived as a lazy idiot? (As I occasionally am, and
certainly will be again.) Do go ahead and ask, if you must. Do you
need to solve your problem or not?

Many many many thanks to all those who bother to answer questions on
R-help. (I still find it hard to believe that experts such as Brian
Ripley and Peter Dalgaard, to quote just two names, take the trouble
to answer so many questions, including basic ones.) And, of course,
thank heavens and the R Core Team that R exists.
b.


-Original Message-
From: Robert Brown FM CEFAS [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 01, 2004 11:46 AM
To: [EMAIL PROTECTED]
Subject: [R] Protocol for answering basic questions


I have been following the discussions on 'Reasons not to answer very
basic questions in a straightforward way' with interest as someone
who is also new to R and has had similar experiences.  As such, it is
with sadness that I note that most seem to agree with the present
approach to the responses to basic questions.  I must thank those
respondents to my own questions who have been helpful, but there are
some whose replies are in my opinion not only unhelpful but actually
rude.



RE: [R] finding the most frequent row

2004-12-10 Thread bogdan romocea
Here's something that works. I'm sure there are better solutions (in
particular the paste part - I couldn't figure out how to avoid typing
a[i,1], ..., a[i,10]).

a <- matrix(nrow=1000,ncol=10)
for (i in 1:1000)
for (j in 1:10)
a[i,j] <- sample(1:0,1)

b <- vector(mode="character")
for (i in 1:1000)
b[i] <- paste(a[i,1],a[i,2],a[i,3],a[i,4],a[i,5],
a[i,6],a[i,7],a[i,8],a[i,9],a[i,10],sep="")

#the most frequent row
table(b)[table(b) == max(table(b))]
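The paste() step can be written without naming every column by letting apply() run paste(..., collapse = "") over the rows. A sketch with simulated 0/1 data (set.seed() only makes it reproducible). With real-valued bootstrap statistics the rows are compared as text, so two rows count as equal only if they agree to the roughly 15 significant digits that paste() prints:

```r
set.seed(1)  # reproducible example data
a <- matrix(sample(0:1, 1000 * 10, replace = TRUE), nrow = 1000, ncol = 10)

# Collapse each row into one string, e.g. "0110100011"
row_keys <- apply(a, 1, paste, collapse = "")

# Count identical rows and keep the most frequent one(s); ties are all kept
counts <- table(row_keys)
most_frequent <- counts[counts == max(counts)]
print(most_frequent)

# First row of the data matching the (first) most frequent pattern
first_match <- match(names(most_frequent)[1], row_keys)
print(a[first_match, ])
```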

HTH,
b.


-Original Message-
From: Lisa Pappas [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 09, 2004 5:15 PM
To: [EMAIL PROTECTED]
Subject: [R] finding the most frequent row


I am bootstrapping using a function that I have defined.  The
Statistic of the function is an array of 10 numbers.  Therefore if
I use 1000 replications,  the t matrix will have 1000 rows each of
which is a bootstrap replicate of this 10 number array (10 columns). 
Is there any easy way in R to determine which row appears the most
frequently? 

Thanks,
Lisa Pappas

Huntsman Cancer Institute wishes to promote open communication while
protecting confidential and/or privileged information.  If you have
received this message in error, please inform the sender and delete
all copies.



[R] errors when trying to rename data frame columns

2004-12-12 Thread bogdan romocea
Dear R users,

I need to rename the columns of a series of data frames. The names of
the data frames and those of the columns need to be pulled from some
vectors. I tried a couple of things but only got errors. What am I
missing?

#---create data frame
dframes <- c("a","b","c")
assign(dframes[2],data.frame(11:20,21:30))

#---rename the columns
cols <- c("one","two")

> names(get(dframes[2])) <- cols
Error: couldn't find function "get<-"
> assign(dframes[2],data.frame(cols[1]=11:20,cols[2]=21:30))
Error: syntax error
> labels(get(dframes[2]))[[2]] <- cols
Error: couldn't find function "labels<-"

I'm using R 2.0.0 on Windows XP.

Thank you,
b.
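One workaround, sketched with the names from the question: names(get(x)) <- value would need a replacement function called "get<-", which does not exist, so fetch the data frame, rename it, and assign it back under the same name. The setNames() line does the same thing in one step.

```r
dframes <- c("a", "b", "c")
assign(dframes[2], data.frame(11:20, 21:30))
cols <- c("one", "two")

# Fetch a copy under a temporary name, rename it, and store it back
tmp <- get(dframes[2])
names(tmp) <- cols
assign(dframes[2], tmp)
rm(tmp)

# The same in one line with setNames() from the stats package
assign(dframes[2], setNames(get(dframes[2]), cols))

print(names(get(dframes[2])))
```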



RE: [R] switching to Linux, suggestions?

2004-12-13 Thread bogdan romocea
Before choosing a GNU/Linux distribution look into the package
management issue. 
http://distrowatch.com/
I would suggest that you avoid all RPM-based distributions (Mandrake,
Fedora, SuSE), and consider Debian (and those based on it) and the
source-based distributions (such as Gentoo). I've been using Mandrake
for a couple of years but got tired of RPM. 

HTH,
b.


-Original Message-
From: Thomas W Volscho [mailto:[EMAIL PROTECTED]
Sent: Sunday, December 12, 2004 3:24 PM
To: [EMAIL PROTECTED]
Subject: [R] switching to Linux, suggestions?


Dear List,
I have acquired a new desktop and wanted to put a free OS on it.  I
am trying Fedora Core 1, but not sure what the best Linux OS is for
using R 2.0.1?

Thank you in advance for your input,
Tom Volscho


Thomas W. Volscho
Graduate Student
Dept. of Sociology U-2068
University of Connecticut
Storrs, CT 06269
Phone: (860) 486-3882
http://vm.uconn.edu/~twv1



RE: [R] Moving standard deviation?

2004-12-13 Thread bogdan romocea
A simple for loop does the job. Why not write your own function?

movsd <- function(series,lag)
{
movingsd <- vector(mode="numeric")
for (i in lag:length(series))
{
#sd of the last 'lag' observations, ending at observation i
movingsd[i] <- sd(series[(i-lag+1):i])
}
assign("movingsd",movingsd,.GlobalEnv)
}

This is very efficient: it takes (much) less time to write from
scratch than to look for an existing function.
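A loop-free variant, as a sketch: embed() turns the series into a matrix with one window per row, and apply() computes sd() on each row. The moving average and the +/- 2 sd bands below are simple (unsmoothed) illustrations, not the kernel smoother mentioned in the question, and the price series is simulated.

```r
# Moving standard deviation over the last `lag` observations.
# embed(series, lag) has one row per complete window (newest value first,
# which does not matter for sd()); the first lag - 1 positions get NA.
roll_sd <- function(series, lag) {
  c(rep(NA, lag - 1), apply(embed(series, lag), 1, sd))
}

set.seed(1)
prices <- 100 + cumsum(rnorm(50))   # simulated price series
mov_sd <- roll_sd(prices, 10)

# Simple moving average and +/- 2 sd bands (Bollinger-style)
mov_avg <- c(rep(NA, 9), rowMeans(embed(prices, 10)))
upper <- mov_avg + 2 * mov_sd
lower <- mov_avg - 2 * mov_sd
```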

HTH,
b.


-Original Message-
From: doktora v 
Sent: Monday, December 13, 2004 1:46 PM
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Moving standard deviation?


I have tried there but didn't find anything useful. Most of the
matches are for functions which take a std dev input, and the "moving"
part of the query relates to something else (like moving average in
the qcc package).

Anyway, it's not too difficult to create the function, but I was
wondering if anyone had done it before. Efficiency is a consideration,
naturally.

I'll post what i come up with...

cheers
dok


On Mon, 13 Dec 2004 10:04:59 -0800, Spencer Graves
[EMAIL PROTECTED] wrote:
   A search for "moving standard deviation" at www.r-project.org ->
 search -> R site search just produced 7 matches.  Please look at those
 and let us know if none of those help you (and what you tried that
 didn't work).
 
   spencer graves
 
 doktora v wrote:
 
 Is there a simple function in R to get a moving standard deviation
 (i.e. for the last x samples)?
 
 My goal is to plot Bollinger bands around a moving average for price
 data. I use kernel smoothing for the moving average.
 
 cheers and thanks!
 over and out
 -- doktora
 
 
 
 
 --
 Spencer Graves, PhD, Senior Development Engineer
 O:  (408)938-4420;  mobile:  (408)655-4567
 




RE: [R] sort() leaves row names unaffected

2004-12-13 Thread bogdan romocea
I asked the same question a few weeks ago. See
http://tolstoy.newcastle.edu.au/R/help/04/11/6775.html
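For the example in the question, a sketch of two ways to keep values and names together (df1.y is rebuilt here from the printed numbers as a one-row matrix): reorder the columns with order(), or sort the named vector taken from the row.

```r
# df1.y rebuilt from the numbers printed in the question (a one-row matrix)
df1.y <- matrix(c(50.74627, 8.955224, 17.91045, 19.40299, 2.985075), nrow = 1,
                dimnames = list(NULL, c("a", "b", "c", "d", "e")))

# Reorder the columns by the values in row 1; drop = FALSE keeps a matrix
sorted_matrix <- df1.y[, order(df1.y[1, ]), drop = FALSE]
print(sorted_matrix)

# Alternatively, take the row as a named vector: sort() keeps vector names
sorted_vector <- sort(df1.y[1, ])
print(sorted_vector)
```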


-Original Message-
From: Martin Wegmann
Sent: Tuesday, December 14, 2004 6:23 AM
To: [EMAIL PROTECTED]
Subject: [R] sort() leaves row names unaffected


Hello, 

I wonder if I ran into a bug. If I do 

summary(df1$X1) -> df1.y

df1.y
a  b   c   d  e
[1,] 50.74627 8.955224 17.91045 19.40299 2.985075

sort(df1.y) 
   a  b   c   d  e
[1,] 2.985075 8.955224 17.91045 19.40299 50.74627

my numbers are sorted but no longer correspond to the names.
For me it is counterintuitive that only the numbers are sorted and not
the names. Is there a way to sort names + numbers, or is this behaviour
of sort() unintended?

Martin

R 2.0.1-1 debian reposit.



RE: [R] Re : Save result in a For Loop

2004-12-14 Thread bogdan romocea
Not sure if it's the best way, but you could do it this way:
all.results <- vector(mode="numeric")
for (i in 1:100)
{
...
this.run <- ...
all.results <- c(all.results,this.run)
}
At this point all.results contains the values of this.run from the
whole loop. If this.run is not a vector/number but a data frame, look
at rbind()/cbind().

Or, create a vector/matrix first and then populate it from the for
loop:
all.results <- vector()/matrix()/data.frame()
for (i in 1:100)
  for(j ...)
{
...
all.results[i] <- this.run  , OR
all.results[i,] <- this.run  , OR
all.results[i,j] <- this.run
}
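Applied to the loop in the question quoted below, a sketch: M is not defined in the question, so M <- 30 is an assumption. The first version preallocates a matrix and fills one row per iteration; the second builds the same table without a loop, because exp() works on whole vectors.

```r
M <- 30   # assumption: M is not defined in the question

dds <- (M - 10):M

# Version 1: preallocate a matrix and fill one row per iteration
map <- matrix(NA_real_, nrow = length(dds), ncol = 2,
              dimnames = list(NULL, c("dd", "r")))
for (k in seq_along(dds)) {
  dist <- 32 - dds[k]
  r <- 1/2 * (1 - exp(-2 * dist / 100))
  map[k, ] <- c(dds[k], round(r, 4))
}

# Version 2: the same table without a loop (exp() is vectorized)
map2 <- cbind(dd = dds, r = round(1/2 * (1 - exp(-2 * (32 - dds) / 100)), 4))

# A data frame is convenient for merge() with another dataset
map_df <- as.data.frame(map)
print(map_df)
```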

HTH,
b.


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 2:44 PM
To: [EMAIL PROTECTED]
Subject: [R] Re : Save result in a For Loop


Hiya,

I have been struggling to save the result from the FOR loop. What is the
best way to do it, as I need the result to merge with another dataset for
further analysis?

> for (dd in ((M-10):M)){
+ dist<-(32-dd)
+ r<-1/2*(1-exp(-2*dist/100))
+ map<-c(dd,round(r,4))
+ print(map)
+ next
+ }

Thanks. Stella


[R] faster row by row data frame processing

2004-12-20 Thread bogdan romocea
Dear R users,

I have a data frame with a few thousand rows and several hundred
numeric columns (plus a date column). For each row (day), I want to
assign +/- 1 to the highest X absolute values, 0 to the other values,
and save all that in a separate data frame. 

I have a working solution (below), however I find it rather slow. Is
there something I could do to increase the speed? (The code is
CPU-bound; Pentium 4 @ 2.4 GHz, 512 MB RAM, Win XP, R 2.0.0.)

Thank you,
b.


#all is the original data frame (date + a number of columns)
#set up the output data frame
DailyTopN <- data.frame(all[1,1],matrix(ncol=ncol(all)-1))
names(DailyTopN) <- names(all)
top <- 20
for (i in 1:1000)   #the rows to be processed
{
#data frame row as vector
onerow <- na.omit(as.matrix(all[i,][2:ncol(all)])[1, ])
#select the 'top' highest absolute values (rank 1 = largest)
r <- rank(-abs(onerow),ties.method="random")
selected <- names(r[which(r <= top)])
#set +/-1 for the highest absolute values, 0 for the others
DailyTopN[i,selected] <- 1 * sign(all[i,selected])
DailyTopN[i,1] <- all[i,1]  #add the date
}
DailyTopN[is.na(DailyTopN)] <- 0
rownames(DailyTopN) <- 1:nrow(DailyTopN)

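A possible speed-up, sketched under the assumption that the numeric columns can be taken out of the data frame as one matrix: work on the matrix instead of on data frame rows, rank each row once with apply(), and build the +/-1/0 result with a single ifelse(). The helper name daily_top_n() is hypothetical; ties are broken at random as in the loop above.

```r
# Hypothetical helper: for each row of a numeric matrix X, mark the `top`
# largest absolute values with their sign (+1/-1) and everything else with 0.
# NA values are never selected and come out as 0, as in the original loop.
daily_top_n <- function(X, top) {
  # Rank each row on -abs(x), so rank 1 is the largest absolute value;
  # NAs keep rank NA, ties are broken at random
  r <- t(apply(-abs(X), 1, rank, ties.method = "random", na.last = "keep"))
  out <- ifelse(!is.na(r) & r <= top, sign(X), 0)
  dimnames(out) <- dimnames(X)
  out
}

# Small made-up example: two days, four series
X <- rbind(c(1, -5, 3, NA),
           c(-2, 0.5, 4, -1))
colnames(X) <- c("s1", "s2", "s3", "s4")
print(daily_top_n(X, top = 2))

# With the data frame from the question (date in column 1), something like:
# DailyTopN <- data.frame(date = all[, 1], daily_top_n(as.matrix(all[, -1]), 20))
```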


RE: [R] scheduling R tasks under windows

2004-12-21 Thread bogdan romocea
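Save the command(s) in a batch (.bat) file, and then run the .bat
file from the Task Scheduler. Redirections such as <test.R and
>test.out are handled by the command shell (cmd.exe), not by Rterm; as
far as I know the Task Scheduler starts Rterm directly, so they reach
Rterm as ordinary arguments and are ignored. A batch file is run by
cmd.exe, which does the redirection.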


-Original Message-
From: Mikkel Grum [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 21, 2004 3:18 PM
To: RHelp
Subject: [R] scheduling R tasks under windows


I'm trying to schedule R tasks in Windows Server 2003.
I can run the following from the DOS prompt without
any difficulty:

c:\Reports>c:\r\rw2001\bin\rterm.exe --no-restore --no-save <test.R >test.out

where test.R has two lines: library(tools);
Sweave("rlr.Rnw").

When I try to run the same from the Task Scheduler, I
fill in the dialog box as follows:

Run:        c:\r\rw2001\bin\rterm.exe --no-restore --no-save <test.R >test.out

Start in:   c:\Reports

Which opens Rterm, but is preceded by ARGUMENT
'test.R' __ignored__ and ARGUMENT 'test.out'
__ignored__

Anyone know what I'm doing wrong?

Mikkel



RE: [R] how to fit in R

2004-12-22 Thread bogdan romocea
See
http://www.statsoft.com/textbook/stdisfit.html
There are several approaches you can use - Chi-square, Q-Q plots, P-P
plots, various tests (Kolmogorov-Smirnov, Shapiro-Wilk W) etc. 

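A sketch for the example in the question, using only functions from the stats package: for the normal distribution the maximum-likelihood estimates have closed forms (the MASS package's fitdistr() covers other distributions, e.g. Poisson). Keep in mind that the Kolmogorov-Smirnov p-value is not strictly valid when the parameters were estimated from the same data (it tends to be too large).

```r
set.seed(1)
x <- rnorm(1000, 10)   # example data as in the question (mean 10, sd 1)

# Fit: maximum-likelihood estimates of the normal parameters
fit <- c(mean = mean(x), sd = sqrt(mean((x - mean(x))^2)))
print(fit)

# Graphical check: points close to the reference line suggest normality
qqnorm(x)
qqline(x)

# Goodness-of-fit tests from the stats package
sw <- shapiro.test(x)   # Shapiro-Wilk W (sample size must be 3..5000)
ks <- ks.test(x, "pnorm", mean = fit[["mean"]], sd = fit[["sd"]])  # Kolmogorov-Smirnov
print(sw)
print(ks)
```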
HTH,
b.


-Original Message-
From: Angela Re
Sent: Wednesday, December 22, 2004 9:13 AM
To: [EMAIL PROTECTED]
Subject: [R] how to fit in R


Good morning,
in my work I need to study data distributions, so I need to fit the
experimental distribution with theoretical curves such as the normal,
Poisson, binomial and so on. I'd like to know, given a vector of data,
for example

x <- rnorm(1000, 10)

whether they follow a normal distribution. I'd like to do a fit (to
estimate the parameters of the theoretical distribution) and then a
goodness-of-fit test.
Can you suggest any R package or manuals about this issue? The
documentation in the R guide isn't sufficient for me.
Thank you for your help, Angela



[R] combination of scatterplot and image graph

2004-12-22 Thread bogdan romocea
Dear R users,

I'm interested in a combination of a scatterplot and an image graph.
I have two large vectors. Because in the scatterplot some areas are
sparsely and others densely populated, I want to see the points, and
I also want their color to be changed based on their density (similar
to a heat map). Is there a function that can do that?

Thank you,
b.
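One base-R way to do this, as a sketch: bin the points on a grid, count how many points fall into each point's cell, and map those counts to a color ramp. The helper density_colors(), the grid size and the colors are arbitrary choices; a kernel density estimate would give smoother colors.

```r
# Hypothetical helper: color each point by how many points share its cell
# on an nbins x nbins grid (a crude 2-D density estimate).
density_colors <- function(x, y, nbins = 50, ncolors = 64,
                           ramp = colorRampPalette(c("navy", "yellow", "red"))) {
  xb <- cut(x, breaks = nbins, labels = FALSE)
  yb <- cut(y, breaks = nbins, labels = FALSE)
  cell <- paste(xb, yb)                       # grid cell of each point
  counts <- as.vector(table(cell)[cell])      # number of points in that cell
  ramp(ncolors)[cut(counts, breaks = ncolors, labels = FALSE)]
}

set.seed(42)
x <- c(rnorm(2000), 8)   # made-up data with one isolated point
y <- c(rnorm(2000), 8)
cols <- density_colors(x, y)
plot(x, y, col = cols, pch = 20)
```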



[R] coplot with png: disappearing grid lines

2004-12-29 Thread bogdan romocea
Dear useRs,

When I use coplot() and output to png/jpeg/bmp, the grid lines from
the scatter plots disappear. If I output to pdf() the grid lines are
there, however I can't use it - I have many points, and the resulting
PDF file is large and very slow to open and scroll through. (By the
way, if I click File->Save As->png/jpeg/bmp from Rgui.exe, the grid
lines are preserved - but I need to use code.)

With coplot(), is there a way to:
1. Keep the grid lines in the scatter plots when exporting output to
png(), and perhaps change their color?
2. Specify the number of grid lines to be drawn on the x and y axes?

I'm running R 2.0.0 on Win XP.

Thank you,
b.

a <- rnorm(5)
b <- rnorm(5)
c <- rnorm(5)
#pdf("test.pdf",height=9,width=12)
png("test.png",height=900,width=1200)
coplot(a ~ b | c,pch=20,col="navy",
bar.bg=c(num=gray(0.8),fac=grey(0.95)))
dev.off()
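coplot() accepts a panel function, which gives some control over what is drawn in each panel. A sketch, assuming a panel that draws its own grid() (number of lines and color adjustable) before the points; coplot() appears to draw a light solid grid of its own first, and whether a darker grid drawn on top cures the disappearing lines with png() on a given system is untested here. The PDF fallback is only there for machines without a PNG device.

```r
# Panel function for coplot(): draw a grid with a chosen number of lines
# and color, then the points. coplot() passes col and pch to the panel.
grid_panel <- function(x, y, col = par("col"), pch = par("pch"), ...) {
  grid(nx = 4, ny = 4, col = "gray50", lty = "dotted")
  points(x, y, col = col, pch = pch, ...)
}

set.seed(1)
a <- rnorm(500)
b <- rnorm(500)
cc <- rnorm(500)   # conditioning variable ('c' in the question)

out_file <- file.path(tempdir(), "test.png")
if (capabilities("png")) {
  png(out_file, height = 900, width = 1200)
} else {
  # fall back to PDF where no PNG device is available
  out_file <- file.path(tempdir(), "test.pdf")
  pdf(out_file, height = 9, width = 12)
}
coplot(a ~ b | cc, panel = grid_panel, pch = 20, col = "navy",
       bar.bg = c(num = gray(0.8), fac = gray(0.95)))
invisible(dev.off())
```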



RE: [R] Tuning string matching

2005-01-05 Thread bogdan romocea
This is a rather complex problem. I'm not aware of an R function /
package that can do something like this, but in case you need to build
it from scratch read
http://support.sas.com/documentation/periodicals/obs/obswww15/index.html
If you're familiar with SAS you could translate the code to R.
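For what it's worth, base R has approximate string matching as well: agrep() and, in more recent versions, adist() (generalized Levenshtein distance). A sketch of a similarity-based join along those lines, where similarity is defined (as an assumption) as 1 - edit distance / length of the longer name, and names are normalized first (lower case, punctuation and initials dropped, words sorted) so that "Harrington Harry" can match "Harry Harington". The helper names are hypothetical.

```r
# Lower case, drop punctuation and one-letter initials, sort the words
normalize_name <- function(x) {
  x <- tolower(gsub("[[:punct:]]", " ", x))
  vapply(strsplit(x, "[[:space:]]+"), function(tok) {
    tok <- tok[nchar(tok) > 1]
    paste(sort(tok), collapse = " ")
  }, character(1))
}

# Matrix of similarities: 1 - edit distance / length of the longer string
similarity <- function(a, b) {
  1 - adist(a, b) / outer(nchar(a), nchar(b), pmax)
}

# For each database name, keep the best tested name if it is similar enough
fuzzy_join <- function(db, tested, threshold = 0.9) {
  s <- similarity(normalize_name(db$Name), normalize_name(tested$Name))
  best <- apply(s, 1, which.max)
  ok <- s[cbind(seq_len(nrow(s)), best)] >= threshold
  data.frame(No = db$No, Name1 = db$Name,
             Name2 = ifelse(ok, tested$Name[best], NA),
             Score = ifelse(ok, tested$Score[best], NA),
             stringsAsFactors = FALSE)
}

db <- data.frame(No = 1:3,
                 Name = c("Byron C. Andrew", "Friedman Bob", "Harrington Harry"),
                 stringsAsFactors = FALSE)
tested <- data.frame(No = 1:2, Name = c("Harry Harington", "Byron Andrew"),
                     Score = c(13, 28), stringsAsFactors = FALSE)
res <- fuzzy_join(db, tested, threshold = 0.9)
print(res)
```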

HTH,
b.


-Original Message-
From: [EMAIL PROTECTED]
Sent: Wednesday, January 05, 2005 12:36 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Tuning string matching


Dear list,

I spent about two hours searching the message archive, to no avail.
I have a list of people who have to pass an online test, but only a
fraction of them do it. Moreover, as they input their names, the
resulting strings do not always match the names I have in my database.

I would like to do two things:

1. Match any strings that are 90% the same
Example:
name1 <- "Harry Harrington"
name2 <- "Harry Harington"
I need a function that would declare those strings a match (ideally
with an argument that would allow using 80% instead of 90%)

2. Arrange a final table that would take me from:

Table1 (the complete list of people from my database)
No Name
1  Byron C. Andrew
2  Friedman Bob
3  Harrington Harry

Table2 (the people having been tested)
No Name   Score
1  Harry Harington13
2  Byron Andrew   28

to:

No Name1  Name2  Score
1  Byron C. AndrewByron Andrew   28
2  Friedman Bob
3  Harrington Harry   Harry Harington13

Thank you in advance, any help is highly appreciated.
Adrian



[R] global objects not overwritten within function

2005-01-11 Thread bogdan romocea
Dear useRs,

I have a function that creates several global objects with
assign(obj,obj,.GlobalEnv), and which I need to run iteratively in
another function. The code is similar to

f <- function(...) {
assign("obj",obj,.GlobalEnv)
}
fct <- function(...) {
for (i in 1:1000)
{
...
f(...)  
...obj...
rm(obj) #code fails without this line
}
}

I don't understand why f(), when run in a for() loop inside fct(), does
not overwrite the global object 'obj'. If I don't delete 'obj' after I
use it, the code fails - the same objects created by the first
iteration are used by subsequent iterations. 

I checked ?assign and the Evaluation chapter in 'R Language Definition'
but still don't understand why the above happens. Can someone briefly
explain or suggest something I should read? By the way, I don't want to
use 'better' techniques (lists, functions that return values instead of
creating global objects etc) - I want to create global objects with f()
and overwrite them again and again within fct().

Thank you,
b.



Re: [R] global objects not overwritten within function

2005-01-12 Thread bogdan romocea
Apparently the message below wasn't posted on R-help, so I'm sending it
again. Sorry if you received it twice.

--- bogdan romocea [EMAIL PROTECTED] wrote:

 Date: Tue, 11 Jan 2005 17:31:42 -0800 (PST)
 From: bogdan romocea [EMAIL PROTECTED]
 Subject: Re: [R] global objects not overwritten within function

Thank you to everyone who replied. I had no idea that ... means
something in R, I only wanted to make the code look simpler. I'm
pasting below the functional equivalent of what took me yesterday a
couple of hours to debug. Function f() takes several arguments (that's
why I want to have the code as a function) and creates several objects.
I then need to use those objects in another function fct(), and I want
to overwrite them to save memory (they're pretty large).

It appears that Robert's guess (dynamic/lexical scoping) explains
what's going on. I've noticed, though, another strange (to me) issue:
without indexing (such as obj1 <- obj1[obj1 > 0], which I need to use
though), fct() prints the expected values even without removing the
objects after each iteration. However, after indexing is introduced,
rm() must be used to make fct() return the intended output. How would
that be explained?

Kind regards,
b.

f <- function(read,position){
obj1 <- 5 * read[position]:(read[position]+5)
obj2 <- 7 * read[position]:(read[position]+5)
assign("obj1",obj1,.GlobalEnv)
assign("obj2",obj2,.GlobalEnv)
}
fct <- function(input){
for (i in 1:5)
{
f(input,i)
obj1 <- obj1[obj1 > 0]
obj2 <- obj2[obj2 > 0]
print(obj1)
print(obj2)
#   rm(obj1,obj2)   #get intended results with this line
}
}
a <- 1:10
fct(a)
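A sketch of what seems to be going on, reduced to obj1: the assignment obj1 <- obj1[obj1 > 0] inside fct() creates a local variable, and from the second iteration on, obj1 refers to that local copy rather than to the global one refreshed by f(). Without the indexing step no local copy is ever created, which would explain why plain printing works. Reading the global object explicitly with get() and giving the filtered result a different local name avoids the problem (fct_buggy and fct_fixed are illustrative names).

```r
# f() as in the message above, reduced to obj1
f <- function(read, position) {
  obj1 <- 5 * read[position]:(read[position] + 5)
  assign("obj1", obj1, envir = .GlobalEnv)
}

# Original pattern: after the first iteration, 'obj1' inside fct_buggy()
# is a local variable, so later iterations keep reusing that stale copy.
fct_buggy <- function(input) {
  out <- list()
  for (i in 1:3) {
    f(input, i)
    obj1 <- obj1[obj1 > 0]
    out[[i]] <- obj1
  }
  out
}

# Fix: read the global copy explicitly and use a different local name
fct_fixed <- function(input) {
  out <- list()
  for (i in 1:3) {
    f(input, i)
    current <- get("obj1", envir = .GlobalEnv)
    current <- current[current > 0]
    out[[i]] <- current
  }
  out
}

buggy <- fct_buggy(1:10)
fixed <- fct_fixed(1:10)
print(buggy[[2]])   # same as buggy[[1]]: the stale local copy
print(fixed[[2]])   # 10 15 20 25 30 35
```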



RE: [R] help wanted using R in a classroom

2005-01-18 Thread bogdan romocea
It appears you wouldn't get much improvement at all even if the 2nd CPU
were used at 100%. Five R sessions can easily overwhelm one CPU. I
think you need (a lot) more CPUs than 2 to solve your problem.

Possible solutions:
1. Install R on each eMac. Since you have 40 of them, you might want to
put together a script to do this.
2. Get some boxes that can run Windows. On Windows, you can run R from
a CD/zip drive/USB drive. (So you could burn 40 CDs and have everyone
run their R session on their box.) As far as I know the same is not
true for GNU/Linux and Mac OS.

HTH,
b.


-Original Message-
From: Sam Parvaneh
Sent: Monday, January 17, 2005 6:11 AM
To: r-help@stat.math.ethz.ch
Subject: [R] help wanted using R in a classroom


Hi everyone!

I'm using R 2.0.1 for Mac OS X in a classroom with 40 eMacs running Mac
OS X version 10.3.6. These Macs are network based, meaning that the
students log in to an XServe G4 where their user accounts and home
directories are stored.

The problem I'm having each time a group of students (usually 7 to 10)
uses R is that the whole system gets incredibly slow. The response time
for opening an application while the students are running R is around
5 minutes. If a student wants to log into the system while others are
running R, it can take up to 10 minutes for the student to get logged
in. Everything gets so slow that it's almost impossible to work. When I
look at the server graphs, the usage of the first CPU is always 100%
when these students are using R. The second CPU stays at 15%.

When these students quit R, everything is back to normal again. The
usage of both CPUs goes back down to between 5-10%.
Is there anyone out there using R in a university like this?
Does anyone have an idea what this might depend on, or maybe a
solution? I can provide more information if you think you can help me.

Thanks in advance
/Sam



RE: [R] animation without intermediate files?

2005-01-26 Thread bogdan romocea
Here's a different suggestion. Create a bunch of image files, and then
use an image browser (GQview is one of the best; if you're on Windows look
at ACDSee) to view them as a slide show. Good image browsers read
images in advance and should not produce flickering. I haven't
experimented though with delays under 5 seconds.
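Creating the image files can be done in one go: with a %03d pattern in the file name, png() starts a new numbered file for every page. A sketch using Paul's plot-per-frame example quoted below; the folder and sizes are arbitrary, and the PDF fallback (onefile = FALSE also gives one file per page) is only for machines without a PNG device.

```r
# One image file per frame: %03d in the file name makes the device start
# a new numbered file (frame001, frame002, ...) for every new plot page.
frame_dir <- file.path(tempdir(), "frames")
dir.create(frame_dir, showWarnings = FALSE)

set.seed(1)
x <- rnorm(20)
if (capabilities("png")) {
  png(file.path(frame_dir, "frame%03d.png"), width = 600, height = 400)
} else {
  pdf(file.path(frame_dir, "frame%03d.pdf"), onefile = FALSE)
}
for (i in seq_along(x)) {
  plot(seq_len(i), x[seq_len(i)], xlim = c(0, 20), ylim = c(-4, 4),
       pch = 16, cex = 2)
}
invisible(dev.off())

# The numbered files can then be shown as a slide show in an image viewer
frames <- list.files(frame_dir, pattern = "^frame[0-9]{3}\\.")
print(length(frames))
```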

HTH,
b.


-Original Message-
From: Paul Murrell
Sent: Wednesday, January 26, 2005 2:46 PM
To: Martin Maechler
Cc: Cari G Kaufman; r-help@stat.math.ethz.ch
Subject: Re: [R] animation without intermediate files?


Hi


Martin Maechler wrote:
MM == Martin Maechler [EMAIL PROTECTED]
on Tue, 25 Jan 2005 09:59:03 +0100 writes:

 
Paul == Paul Murrell [EMAIL PROTECTED]
on Tue, 25 Jan 2005 13:40:15 +1300 writes:

 
 Paul Hi
 Paul Cari G Kaufman wrote:
  Hello, 
  
  Does anyone know how to make movies in R by making a
  sequence of plots?  I'd like to animate a long
  trajectory for exploratory purposes only, without
  creating a bunch of image files and then using another
  program to string them together.  In Splus I would do
  this using double.buffer() to eliminate the flickering
  caused by replotting. For instance, with a 2-D
  trajectory in vectors x and y I would use the following:
  
  motif()
  double.buffer("back")
  for (i in 1:length(x)) {
    plot(x[i], y[i], xlim=range(x), ylim=range(y))
    double.buffer("copy")
  }
  double.buffer("front")
  
  I haven't found an equivalent function to double.buffer in R.
 I tried
  playing around with dev.set() and dev.copy() but so far with
no success
  (still flickers).
 
  Paul Double buffering is only currently an option on the
Windows graphics 
  Paul device (and there it is on by default).  So something
like ...
 
  Paul x <- rnorm(100)
  Paul for (i in 1:100)
  Paul plot(1:i, x[1:i], xlim=c(0, 100), ylim=c(-4, 4),
pch=16, cex=2)
 
  Paul is already smooth
 
 MM well, sorry Paul, but not for my definition of smooth!
 
 MM Instead, 
 
 MM n <- 100
 MM plot(1,1, xlim=c(0,n), ylim=c(-4,4), type="n")
 MM x <- rnorm(n)
 MM for (i in 1:n) { points(i, x[i], pch=16, cex=2); Sys.sleep(0.02) }
 
 MM comes much closer to my version of smooth  ;-)
 
 I apologize to Paul, since  what I said seems to be quite
 platform dependent.  Here's my current knowledge on the matter:
 
 o  Paul's   for(..) plot(..) 
 - flickers quite a bit for me {on Linux X11 with no
   particularly fast graphics card}.
 - seems quite smooth for at least two Windows users who have
   relatively fast graphics cards.
 
 o  My solution of
   for(..) { points(..) ; Sys.sleep(..) } 
doesn't redraw the coordinate system and so doesn't flicker 
(afaik, independently of platform)
 
HOWEVER on windows; the graphics are somehow buffered and
points are not drawn one by one, but rather in batches -- not
smooth


Thanks Martin; I wasn't very clear in my original message.  Double
buffering has only been implemented on the Windows graphics device at
this stage (thanks to Brian) and this implementation basically always
writes to a buffer and updates the screen at fixed time intervals
(quoting the source: 100ms after last plotting call or 500ms after last
update) so there is no user control of when the off-screen buffer is
swapped to the screen.

For animating a plot where only new output is added (i.e., no existing
output is modified or removed), your suggestion should produce the
smoothest result.

Paul
-- 
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
[EMAIL PROTECTED]
http://www.stat.auckland.ac.nz/~paul/
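A minimal sketch of the approach discussed above (draw the coordinate system once, then only add new points), assuming a later R version (2.14.0 or newer) in which dev.hold() and dev.flush() are available; on devices without buffering support these two calls should simply have no effect. animate_points is a hypothetical name, not an existing function.

```r
# Draw the coordinate system once, then add one point at a time,
# so nothing already on the device has to be redrawn.
animate_points <- function(x, delay = 0.02) {
  n <- length(x)
  plot(1, 1, xlim = c(0, n), ylim = range(x), type = "n",
       xlab = "index", ylab = "value")
  for (i in seq_len(n)) {
    dev.hold()                          # start buffering output (if supported)
    points(i, x[i], pch = 16, cex = 2)  # add only the new point
    dev.flush()                         # release the buffered output
    if (delay > 0) Sys.sleep(delay)
  }
  invisible(n)
}

# Usage (opens the default graphics device):
# set.seed(1); animate_points(rnorm(100))
```

The design choice is the one Paul describes: since existing output is never modified, there is nothing to erase, and buffering only controls when each new point becomes visible.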



[R] have R informed of MySQL table updates

2005-02-08 Thread bogdan romocea
Dear useRs,

I have a script (Python) that every once in a while appends data to a
MySQL table. Meanwhile, I have a running R session, and I want it to be
aware of such table updates. I could write a loop in R to periodically
check whether new data has become available; however, are you aware of
a way to make MySQL/Python talk directly to R? I'm interested in both
GNU/Linux and Windows approaches (if any).

Thank you,
b.
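A minimal sketch of the polling loop mentioned above. The database query is passed in as a function so the loop does not depend on any particular driver; get_row_count is a hypothetical argument, and with DBI it might be something like function() dbGetQuery(con, "SELECT COUNT(*) FROM mytable")[[1]], where con and mytable are placeholders. Polling is not a push notification, so new rows are noticed with a delay of up to interval seconds.

```r
# Poll a data source until its row count grows, or give up after max_checks.
# get_row_count: a function returning the current number of rows (hypothetical).
wait_for_new_rows <- function(get_row_count, interval = 5, max_checks = 100) {
  baseline <- get_row_count()
  for (i in seq_len(max_checks)) {
    Sys.sleep(interval)
    current <- get_row_count()
    if (current > baseline) {
      return(list(new_rows = current - baseline, checks = i))
    }
  }
  NULL  # nothing new arrived within max_checks polls
}
```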



[R] question about sorting POSIXt vector

2005-02-09 Thread bogdan romocea
Dear useRs,

How come the first attempt to sort a POSIXt vector fails (Error:
non-atomic type in greater), while the second succeeds? (Code inserted
below.) The documentation says that POSIXt is used to allow operations
such as subtraction, so I'd expect sorting to work. Is this perhaps an
OS issue? (I run R 2.0.1 on Win xp.)

Thank you,
b.

#code
test <- c("2005-02-08 18:49:15","2005-02-07 18:36:54",
"2005-02-04 18:37:03","2005-02-06 18:29:04")
test <- strptime(test,format="%Y-%m-%d %H:%M:%S")
order(test,decreasing=F)#doesn't work - why?
tst <- test + 0
order(tst,decreasing=F) #works - how come?
print(tst)
#run
> test <- c("2005-02-08 18:49:15","2005-02-07 18:36:54",
+ "2005-02-04 18:37:03","2005-02-06 18:29:04")
> test <- strptime(test,format="%Y-%m-%d %H:%M:%S")
> order(test,decreasing=F)#doesn't work - why?
Error in order(test, decreasing = F) : non-atomic type in greater
> tst <- test + 0
> order(tst,decreasing=F)#works - how come?
[1] 3 4 2 1
> print(tst)
[1] "2005-02-08 18:49:15 Eastern Standard Time" "2005-02-07 18:36:54 Eastern Standard Time"
[3] "2005-02-04 18:37:03 Eastern Standard Time" "2005-02-06 18:29:04 Eastern Standard Time"
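As far as I know, strptime() returns a POSIXlt object, which is stored as a list of components (seconds, minutes, hours, ...), hence the "non-atomic" error from order() in this R version. Adding 0 goes through the arithmetic method for POSIXt, which returns a POSIXct (a plain number of seconds), and that sorts fine. A more explicit way to get the same result is as.POSIXct(); the sketch below assumes UTC to keep time-zone effects out of the example.

```r
test <- c("2005-02-08 18:49:15", "2005-02-07 18:36:54",
          "2005-02-04 18:37:03", "2005-02-06 18:29:04")
# strptime() gives POSIXlt (list-based); convert to POSIXct (numeric seconds)
test_ct <- as.POSIXct(strptime(test, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
idx <- order(test_ct)   # ascending order of the timestamps
sorted <- test_ct[idx]
print(idx)              # [1] 3 4 2 1
```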




[R] download files through secure http (HTTPS)

2005-02-27 Thread bogdan romocea
Dear useRs,

I'm trying to download some data through the HTTPS protocol. However,
download.file() does not support HTTPS (R 2.0.1 on WinXP):
Error in download.file(https.url, destfile = "test.txt") : 
unsupported URL scheme

1. Is there any other function/package in R that can work with HTTPS?
2. If not, what would need to happen to make download.file() support
HTTPS? 

Thank you,
b.



[R] draw random samples from empirical distribution

2005-02-28 Thread bogdan romocea
Dear useRs,

I have an empirical distribution (not normal etc) and I want to draw
random samples from it. One solution I can think of is to compute, say,
100 quantiles, then use runif() to draw a random integer Q between 1 and
100, and finally run runif() again to pull a random value from within the
Q-th quantile interval. Is there perhaps a better/more elegant way of doing this?

Thank you,
b.
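The quantile-bin idea described above can be sketched in vectorised form as follows (r_quantile_bins is a hypothetical name; it uses sample.int() rather than runif() to pick the bin). A simpler alternative is sample(x, n, replace = TRUE), which resamples the observed values directly but never produces values that were not observed.

```r
# Draw n values from the empirical distribution of x by (1) picking one of
# n_bins equal-probability quantile bins at random and (2) drawing uniformly
# inside that bin.
r_quantile_bins <- function(n, x, n_bins = 100) {
  q <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE)
  bin <- sample.int(n_bins, n, replace = TRUE)   # step 1: random bin Q
  runif(n, min = q[bin], max = q[bin + 1])       # step 2: value within bin Q
}

# Usage:
# set.seed(42); x <- rexp(500); y <- r_quantile_bins(1000, x)
```

Within each bin the values are uniform, so the result is a piecewise-linear approximation of the empirical CDF; more bins follow the data more closely.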



RE: [R] Temporal Analysis of variable x; How to select the outlier threshold in R?

2005-03-01 Thread bogdan romocea
I'm not sure I understand. 
You have financial data and want to throw away some outliers?? 
Why would you ever do this?

First of all, I'd suggest you pay close attention to what the data is
trying to say. Maybe your distribution is not normal after all (see
tests for normality etc). Maybe you shouldn't force your normality
assumption upon the data. 
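A minimal sketch of the kind of check suggested above, to run before settling on any outlier threshold: a Shapiro-Wilk test plus a normal Q-Q plot. check_normality is a hypothetical helper. Note that shapiro.test() accepts between 3 and 5000 values, and that qqplot(x2001, x2002) compares two samples with each other rather than either one with a normal distribution.

```r
# Informal normality check: Shapiro-Wilk test plus a normal Q-Q plot.
check_normality <- function(x, plot = TRUE) {
  x <- x[is.finite(x)]
  if (plot) {
    qqnorm(x)   # points should lie close to a straight line under normality
    qqline(x)
  }
  shapiro.test(x)$p.value   # small p-value: evidence against normality
}
```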



-Original Message-
From: Melanie Vida [mailto:[EMAIL PROTECTED]
Sent: Friday, February 25, 2005 1:30 PM
To: r-help
Subject: [R] Temporal Analysis of variable x; How to select the outlier
threshold in R?


For a financial data set with large variance, I'm trying to find the
outlier threshold of one variable x over a two year period. I ran
qqplot(x2001, x2002) and found a normal distribution. The latter part of
the normal distribution did not look linear though. Is there a suitable
method in R to find the outlier threshold of this variable for 2001 and
2002?



[R] XML to data frame or list

2005-03-10 Thread bogdan romocea
Dear useRs,

I have a simple/RTFM question about XML parsing. Given an XML file,
such as (fragment)
<A>100</A>
<B>23</B>
<C>true</C>
how do I import it in a data frame or list, so that the values (100,
23, true) can be accessed through the names A, B and C?

I installed the XML package and looked over the documentation...
however after 20 minutes and a couple of tests I still don't know what
I should start with. 

Can someone provide an example or point me to the appropriate
function(s)?

Thank you,
b.



Re: [R] XML to data frame or list

2005-03-13 Thread bogdan romocea
I managed to parse more complex XML files as well. The trick was to
manually determine the position of the child nodes of interest, after
which they can be parsed in a loop. For example:

require(XML)
doc <- xmlTreeParse("file.xml", getDTD=TRUE, addAttributeNamespaces=TRUE)
r <- xmlRoot(doc)

#find the nodes of interest by exploring interactively, e.g.
#r[[1]], r[[2]], r[[2]][[1]], ... (r[[i]][[j]] in general)

#then read them, one data frame per child node
xmldata <- list()
for (i in 1:xmlSize(r[[2]][[1]])) {
  xmldata[[i]] <- as.data.frame(xmlSApply(r[[2]][[1]][[i]], xmlValue))
  }


--- Barry Rowlingson [EMAIL PROTECTED] wrote:
 Gabor Grothendieck wrote:
 
 
  You could check out the ctv package that was recently announced.
  It uses XML so its source would provide an example.
 
  If it's a one-time operation, Excel reads XML and you could then
  use one of the many Excel to R possibilities.
 
   For an xml file like this:
 
 <?xml version="1.0"?>
 <variables>
 <a>100</a>
 <b>23</b>
 <z>666</z>
 </variables>
 
 it's a one-liner with the XML package (library(XML)):
 
 xmlReadSimple <-
 function(xmlFile){
    as.list(xmlSApply(xmlRoot(xmlTreeParse(xmlFile)),xmlValue))
 }
 
 add an lapply(...,as.numeric) for conversion to numbers.
 
   sweet.
 
 Baz
 




RE: [R] Mandrake 10.1

2005-03-15 Thread bogdan romocea
I managed to install R 2.0.1 on Mandrake 10.1 a couple of weeks ago. It
wasn't that easy: first I had to manually track down, download and install
3-4 dependencies.

I would suggest that you consider another GNU/Linux distribution,
Mepis. Mepis combines the best features of several distributions:
- You can run it from CD, like Knoppix/Quantian.
- If you like it, you can easily install it on your hard drive (unlike
Knoppix/Quantian). Just double click an icon and a graphical wizard
will guide you through the installation steps. It's as easy to install
as Mandrake, perhaps a bit easier (automatic hardware detection and
configuration etc).
- Package management is done automatically (no more annoying
notifications from Mdk's urpmi, like "sorry, can't do this, go ahead
and figure it out by yourself"). You can use Synaptic or apt-get, and
you can install packages from the Debian testing and unstable
repositories (which is great). Unlike Debian though, Mepis is much
easier to install (imho).

As someone who went through several failed installation attempts
(Gentoo, Debian, Quantian), primarily due to hardware issues which I
didn't have the patience to try to fix, I appreciate a lot what Mepis
has to offer. You can have a complete system (R + packages etc) up and
running in 30 minutes starting from scratch (assuming you have
broadband) -- which is about what it would take you to fix the
dependencies for just one binary (such as R-2.0.0-1mdk.i586.rpm) on
Mandrake.

hth,
b.


-Original Message-
From: Christian [mailto:[EMAIL PROTECTED]
Sent: Monday, March 14, 2005 1:21 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Mandrake 10.1


Dear all,

I am trying to install the R-2.0.0-1mdk.i586.rpm file
(http://cran.planetmirror.com/bin/linux/mandrake/10.0/R-2.0.0-1mdk.i586.rpm)
on Mandrake 10.1. Since the file was originally meant for Mandrake 10.0,
it does not surprise me that the installation does not work.

The error message that I get can be translated as something like:
"impossible to install since the info is not satisfied".
Could you please help me install R on my Mandrake 10.1?

PS If you feel like answering me, consider that I am almost an absolute
beginner at Linux :)

Thanks a lot

Christian



RE: [R] Mandrake 10.1

2005-03-16 Thread bogdan romocea
--- Rau, Roland [EMAIL PROTECTED] wrote:

  -Original Message-
  From: r-help
  On Behalf Of bogdan romocea
  Sent: Tuesday, March 15, 2005 2:49 PM
  
  I would suggest that you consider another GNU/Linux distribution,

 I don't think it is necessary. Mandrake 10.1 is fine for 
 running R.[1] I have Mandrake 10.1 (Community) at home 
 running on my notebook and I was able to compile R without 
 any problems - just using the software that was shipped with 
 this distribution.


It is certainly not necessary; even Windows is fine for running R.
However, assuming R is not the only package to be installed and then
upgraded, switching from something like Mandrake to something like
Mepis may result in significant time savings, which is what I care
about most. (Your mileage may vary.) I used Mdk for a couple of years
and prefer to not remember how many hours I wasted on something as
trivial as installing and upgrading packages. (Compilation will not
always save you from having to manually upgrade other libraries,
especially as your Mdk installation gets older.)



RE: [R] Basic questions about RMySQL

2005-03-18 Thread bogdan romocea
 1. No way. You must have MySQL installed on your computer.

In fact this is not true. You can use a MySQL server installed
somewhere else on the network.



--- bogdan romocea [EMAIL PROTECTED] wrote:
 1. No way. You must have MySQL installed on your computer.
 
 2. You must install the server. For details, see
 http://dev.mysql.com/doc/mysql/en/index.html . 
 For portability, I would suggest that you run MySQL in the shell
 (ignore the GUIs) and save the syntax for adding users, creating tables
 etc. This will likely take more time when you first do it, but if you
 have to move to another computer later on, you can set up the new MySQL
 installation very quickly and easily.
 
 hth,
 b.
 
 
 -Original Message-
 From: De la Vega Góngora Jorge [mailto:[EMAIL PROTECTED]
 Sent: Friday, March 18, 2005 11:58 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Basic questions about RMySQL
 
 
 Hello,
 
 Please forgive me if I am asking something that is well documented. I
 have read the documentation, but there are points that are not clear to
 me. I am not an expert in R or databases, but if someone can direct me
 to a tutorial, I will appreciate it.
 
  1. In my understanding, I can install and use RMySQL without having
 to install MySQL on my PC, to access and create new tables.
 Is this right? 
 
  2. I have created a c:\my.cnf file to access a database I have, but
 without installing the server, where can I define the user, password
 and host to establish a connection?
 
 Thanks in advance
 
 
 ---
 Jorge de la Vega Gongora | Telefono: (525) 5268 8379
 Investigador | Fax:  (525) 5268 8481
 Banco de Mexico  | email:  [EMAIL PROTECTED]
 Planeación y Programación de Emisión | web:   
 http://www.stat.umn.edu/~jvega
 Calzada Legaria 691 Módulo IV|
 Col. Irrigación 11500|
 


Re: [R] Basic questions about RMySQL

2005-03-18 Thread bogdan romocea
I certainly can't; I initially misunderstood the question.

If connecting to MySQL is the problem, then you need to know the user
ID, the host name and the password. Ask your DB administrator for help.

Here's an example that works for me (local MySQL installation):
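require(DBI)
require(RMySQL)
MySQL(max.con = 16, fetch.default.rec = 5000, force.reload = FALSE)
drv <- dbDriver("MySQL")
#for a server elsewhere on the network, also pass host="..."
con <- dbConnect(drv, username="userid", password="pswd", dbname="db")
dbListTables(con)
dbDisconnect(con)   #close the connection when done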



--- Uwe Ligges [EMAIL PROTECTED] wrote:
 bogdan romocea wrote:
 
  1. No way. You must have MySQL installed on your computer.
  
  2. You must install the server. For details, see
  http://dev.mysql.com/doc/mysql/en/index.html . 
  For portability, I would suggest that you run MySQL in the shell
  (ignore the GUIs) and save the syntax for adding users, creating tables
  etc. This will likely take more time when you first do it, but if you
  have to move to another computer later on, you can set up the new MySQL
  installation very quickly and easily.
 
 Can you tell us any reason why the server should run on the same machine
 R is running on?
 
 Uwe Ligges
 
 
  hth,
  b.
  
  


RE: [R] Graphics (for goodness of fit) Question

2005-03-21 Thread bogdan romocea
Regarding your plot question, you could use points() or lines():
a <- sample(1:50,10)
b <- sample(20:40,10)
plot(1:10,a,pch=20,col="red",ylim=range(a,b))   #ylim so that both series fit
points(1:10,b,pch=20,col="blue")
#or
#lines(1:10,b,pch=20,col="blue",type="o")
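For the other two parts of the question, a hedged sketch: a legend for the overlaid series can be added with something like legend("topright", legend = c("a", "b"), pch = 20, col = c("red", "blue")), and the p-value of a Pearson chi-squared statistic is the upper tail of pchisq(). Pearson's test is defined on counts rather than proportions, so the hypothetical helper gof_pvalue below takes observed counts and expected probabilities; chisq.test(x, p = prob) should give the same result directly.

```r
# Pearson chi-squared goodness-of-fit test computed by hand.
# observed: counts per category; prob: expected probabilities (sum to 1);
# n_estimated: number of model parameters estimated from the data (if any).
gof_pvalue <- function(observed, prob, n_estimated = 0) {
  expected <- sum(observed) * prob
  stat <- sum((observed - expected)^2 / expected)
  df <- length(observed) - 1 - n_estimated
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

obs <- c(20, 30, 50)
p <- c(0.25, 0.25, 0.5)
gof_pvalue(obs, p)
```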



-Original Message-
From: Mohammad Ehsanul Karim [mailto:[EMAIL PROTECTED]
Sent: Sunday, March 20, 2005 10:46 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Graphics (for goodness of fit) Question


Dear List,

Suppose I have some observed and expected
frequencies, such as the following.
I need to draw a graph where plots of observed and
expected frequencies are merged into one.

m <- c(1,2,3,4,5,6,7,8,9,10,12,13,17)
k <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 19)
ExpWW <- c(0.309330628803245, 0.213645190887434,
0.147558189649435, 0.101913922060107,
0.0703888244654489, 0.0486154051328303,
0.0335771712935674, 0.0231907237838939,
0.0160171226134196, 0.0110625360037919,
0.00764055478558038, 0.00527709716935116,
0.000395627498345897)
ExpDD <- c(0.420249653259362, 0.243639882194748,
0.141250306182253, 0.0818899139863827,
0.0474757060281664, 0.0275240570315860,
0.0159570816077711, 0.00925112359507395,
0.00536334211198462, 0.00310939944911175,
0.00104510169329968, 0.00060589806906972,
6.84484529305126e-05)
ObjDD <- c(0.468646864686469, 0.198019801980198,
0.151815181518152, 0.0759075907590759,
0.0396039603960396, 0.0198019801980198,
0.0165016501650165, 0.0099009900990099,
0.0033003300330033, 0.0033003300330033,
0.0033003300330033, 0.0066006600660066,
0.0033003300330033)
ObjWW <- c(0.373770491803279, 0.150819672131148,
0.127868852459016, 0.0721311475409836,
0.0885245901639344, 0.0622950819672131,
0.039344262295082, 0.0327868852459016,
0.0360655737704918, 0.00327868852459016,
0.00655737704918033, 0.00327868852459016,
0.00327868852459016)

par(mfrow=c(2,2))
plot(k,ObjWW, type="l") # Plot 1
plot(k,ExpWW, type="l") # Plot 2
plot(m,ObjDD, type="l") # Plot 3
plot(m,ExpDD, type="l") # Plot 4

# I need to see plots 1 and 2 on the same axes, and plots 3
# and 4 on another set of axes
# (i.e., 3 and 4 on the same axes too, but not with 1 and 2).
# How can I use different types of legends in the same graph?

sum(((ObjWW-ExpWW)^2)/ExpWW) # Chi-Squared Goodness of Fit Test
sum(((ObjDD-ExpDD)^2)/ExpDD) # Chi-Squared Goodness of Fit Test

# Also, is there any other convenient way of doing a
# chi-squared goodness of fit test (any function or
# package maybe, to do this directly)?
# And how can I find the p-values of the respective
# chi-squared tests in R?


Any suggestion, direction, references, help, replies
will be highly appreciated.

Thank you for your time.


Mohammad Ehsanul Karim

Web: http://snipurl.com/ehsan
Institute of Statistical Research and Training
University of Dhaka, Dhaka - 1000, Bangladesh



RE: [R] Gmail invitation

2005-03-25 Thread bogdan romocea
You can also buy these things on Ebay. I noticed the supply about 2
months ago when I guess you would have made about $1-2 per invitation.
The profit opportunity is much diminished now that the supply has
greatly increased (it appears every gmail account was allocated 50
invitations instead of 5 a few weeks ago). By the way, how much do you
charge? :-)



-Original Message-
From: Gorjanc Gregor [mailto:[EMAIL PROTECTED]
Sent: Friday, March 25, 2005 1:32 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Gmail invitation


Hello R users!

I just found out that I have 49 invitations for Gmail (gmail.google.com).
I have been using it for a while now and it is really nice. Don't forget:
1 GB for free.

I will invite those who respond to this mail by FIFO.

--
Lep pozdrav / With regards,
Gregor Gorjanc


University of Ljubljana
Biotechnical Faculty       URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department    email: gregor.gorjanc at bfro.uni-lj.si
Groblje 3                  tel: +386 (0)1 72 17 861
SI-1230 Domzale            fax: +386 (0)1 72 17 888
Slovenia



[R] how to simulate a time series

2005-03-31 Thread bogdan romocea
Dear useRs,

I want to simulate a time series (stationary; the distribution of
values is skewed to the right; quite a few ARMA absolute standardized
residuals above 2 - about 8% of them). Is this the right way to do it?
#
load("rdtb")#the time series
> summary(rdtb)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-1.11800 -0.65010 -0.09091  0.30390  1.12500  2.67600 

farma <- arima(rdtb,order=c(1,0,1),include.mean=T)
> farma[["coef"]]
       ar1        ma1  intercept 
0.58091575 0.02313803 0.30417062 

sim <- list(NULL)   #simulated
for (i in 1:5) {
sim[[i]] <- as.vector(arima.sim(list(ar=c(farma[["coef"]][1]),
ma=c(farma[["coef"]][2])),n=length(rdtb),innov=rdtb))
}
allsim <- as.data.frame(sim)
colnames(allsim) <- paste("sim",1:5,sep="")
all <- cbind(rdtb,allsim)
#

I don't understand why the simulation runs generate virtually identical
values:
> all[100:105,]
  rdtb sim1 sim2 sim3 sim4 sim5
100  2.3863636 1.065661 1.065661 1.065661 1.065661 1.065661
101  1.9318182 2.606093 2.606093 2.606093 2.606093 2.606093
102  2.2954545 3.854074 3.854074 3.854074 3.854074 3.854074
103  2.5882353 4.880240 4.880240 4.880240 4.880240 4.880240
104  2.0227273 4.917622 4.917622 4.917622 4.917622 4.917622
105 -0.1521739 2.751352 2.751352 2.751352 2.751352 2.751352

It appears I may be missing something (very) basic, but don't know
what.

Thank you,
b.
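One likely reason for the identical runs, as far as I can tell from arima.sim(): when innov is supplied it is reused as-is on every call (the original code passes the series itself, rdtb, rather than residuals), and only the short random burn-in (start.innov, drawn from rnorm by default) changes between runs; for a stationary model its effect dies out after a few observations. A hedged sketch of one alternative is a residual bootstrap: resample the fitted model's residuals afresh for each run, which keeps their skewness. The data below are a simulated stand-in for rdtb, and simulate_arma11 is a hypothetical name.

```r
# Residual-bootstrap simulation from a fitted ARMA(1,1) model with mean.
simulate_arma11 <- function(y, n_sims = 5) {
  fit <- arima(y, order = c(1, 0, 1), include.mean = TRUE)
  cf <- coef(fit)
  res <- as.vector(residuals(fit))
  n <- length(y)
  sims <- sapply(seq_len(n_sims), function(i) {
    # fresh innovations for every run: resample the residuals with replacement
    innov <- sample(res, n, replace = TRUE)
    as.vector(arima.sim(list(ar = cf[["ar1"]], ma = cf[["ma1"]]),
                        n = n, innov = innov)) + cf[["intercept"]]
  })
  colnames(sims) <- paste("sim", seq_len(n_sims), sep = "")
  as.data.frame(sims)
}

set.seed(123)
y <- arima.sim(list(ar = 0.6), n = 300) + 0.3   # stand-in for rdtb
allsim <- simulate_arma11(y)
```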



RE: [R] a R function for sort a data frame.

2005-04-01 Thread bogdan romocea
dfr <- data.frame(sample(1:50,10),sample(1:50,10))
colnames(dfr) <- c("a","b")
dfr <- dfr[order(dfr$a),]    #ascending by column a
dfr <- dfr[order(-dfr$a),]   #descending by column a
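order() also accepts several sort keys, which takes care of ties; a small sketch with made-up data. The minus trick only works for numeric columns; for a character column, -xtfrm(x) is one option.

```r
dfr <- data.frame(a = c(2, 1, 2, 1), b = c(10, 40, 30, 20))
# sort by a ascending, then by b descending within ties of a
sorted <- dfr[order(dfr$a, -dfr$b), ]
print(sorted)
```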



-Original Message-
From: Mario Morales [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 31, 2005 10:23 PM
To: r-help@stat.math.ethz.ch
Subject: [R] a R function for sort a data frame.


Is there an R function to sort a data frame by a variable?

I know how to sort a vector, but I don't know how to sort a data frame
by a column. Can you help me?

The sort() function doesn't work with data frames.



RE: [R] Amount of memory under different OS

2005-04-04 Thread bogdan romocea
You need another OS. Standard/32-bit Windows (XP, 2000 etc) can't use
more than 4 GB of RAM. Anyway, if you try to buy a box with 16 GB of
RAM, the seller will probably warn you about Windows and recommend a
suitable OS.


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Saturday, April 02, 2005 12:48 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Amount of memory under different OS


Hi,
I have a problem: I need to perform a very demanding analysis, so I would
like to buy a new computer with about 16 GB of RAM. Is it possible to use
all this memory under Windows, or do I have to install another OS?
Thanks,


  Marco



[R] looking for a plot function

2005-04-06 Thread bogdan romocea
Dear useRs,

I have a data frame and I want to plot all rows. Each row is
represented as a line that links the values in each column. The plot
looks like this:

dfr <- data.frame(A=sample(1:50,10),B=sample(1:50,10),
C=sample(1:50,10),D=sample(1:50,10))
xa <- 10*1:4
plot(c(10,40),c(0,50),type="n",xlab="",ylab="")   #empty frame with the right ranges
for (i in 1:nrow(dfr)) {
lines(xa,unlist(dfr[i,]),pch=20,type="o")   #one line per row
}

Things get more complicated because I want the columns to be rescaled
so as to fit nicely on a graph (for example if A has values between 0
and 100 but B has values between 100 and 1000, then rescale A or B),
labels etc. Is there a function that can do plots like this? 

Thank you,
b.
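One way to handle the rescaling is to map each column to [0, 1] before drawing one line per row; as far as I know this is essentially what parcoord() in the MASS package does. A base-graphics sketch (rescale01 and plot_rows are hypothetical names):

```r
# Rescale each column to the range [0, 1] so that columns on very
# different scales fit on one plot.
rescale01 <- function(dfr) {
  as.data.frame(lapply(dfr, function(v) {
    rng <- range(v, na.rm = TRUE)
    if (rng[1] == rng[2]) return(rep(0.5, length(v)))  # constant column
    (v - rng[1]) / (rng[2] - rng[1])
  }))
}

# Draw one line per row of dfr, one x position per column.
plot_rows <- function(dfr) {
  s <- rescale01(dfr)
  matplot(seq_along(s), t(as.matrix(s)), type = "o", pch = 20, lty = 1,
          col = "black", xaxt = "n", xlab = "", ylab = "rescaled value")
  axis(1, at = seq_along(s), labels = names(dfr))
}

# Usage:
# dfr <- data.frame(A = sample(1:100, 10), B = sample(100:1000, 10))
# plot_rows(dfr)
```

matplot() draws one line per column of its matrix argument, which is why the rescaled data frame is transposed.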



Re: [R] Considering port of SAS application to R

2006-04-21 Thread bogdan romocea
Forget about R for now and port the application to MySQL/PostgreSQL
etc; it is possible and worthwhile. If you happen to use (and really
need) some SAS DATA step looping features, you might be forced to look
into SQL cursors; otherwise the port should be (very) straightforward.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Werner
 Wernersen
 Sent: Friday, April 21, 2006 7:09 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Considering port of SAS application to R

 Hi there!

 I am considering porting a SAS application to R and I would
 like to hear your opinion on whether this is possible and
 worthwhile. SAS is mainly used to do data management and then
 to do some aggregations and simple computations on the data
 and to output a modified data set. The main problem I see is
 the size of the data file. As I have no access to SAS yet, I
 cannot give real details, but the SAS data file is about 7
 gigabytes in size. (It's only the basic SAS system without any
 additional modules.)

 What do you think, would a port to R be possible with
 reasonable effort? Is R able to handle that size of data? Or
 is R prepared to work together with some database system?

 Thanks for your thoughts!

 Best regards,
   Werner

   


Re: [R] Need R code

2006-04-21 Thread bogdan romocea
Here's an example.

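lst <- list()
for (i in 1:5) {
   lst[[i]] <- data.frame(v=sample(1:20,10),sample(1:5,10,replace=TRUE))
   colnames(lst[[i]])[2] <- paste("x",i,sep="")
   }
dfr <- lst[[1]]
#all=TRUE keeps every value of v (union); use the default all=FALSE
#to keep only the values common to all data frames
for (i in 2:length(lst)) dfr <- merge(dfr,lst[[i]],all=TRUE)
dfr <- dfr[order(dfr[,1]),]
print(dfr)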
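Since the question asks for the dates common to all data sets, here is a sketch that keeps only the shared keys (merge()'s default all = FALSE), suffixes the x/y columns with the data set number, and restores the date order of the first data set. merge_common is a hypothetical name, and the dates are treated as plain strings, not parsed.

```r
merge_common <- function(dfs, by = "Date") {
  # Suffix non-key columns with the data set number (x.1, y.1, x.2, ...)
  renamed <- lapply(seq_along(dfs), function(i) {
    d <- dfs[[i]]
    other <- setdiff(names(d), by)
    names(d)[match(other, names(d))] <- paste(other, i, sep = ".")
    d
  })
  # Inner join on the key: only dates present in every data set survive
  out <- Reduce(function(a, b) merge(a, b, by = by), renamed)
  # merge() sorts by the key; put rows back in the order of the first data set
  out <- out[order(match(out[[by]], dfs[[1]][[by]])), , drop = FALSE]
  rownames(out) <- NULL
  out
}
```

Because it uses Reduce(), the same call works for any number of data frames, e.g. merge_common(list(d1, d2, d3)).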


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of stat stat
 Sent: Thursday, April 20, 2006 1:15 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Need R code

 Dear r-users,

 Suppose I have three datasets:
   Dataset-1:
   Date  x y
   Jan-1,2005120   230
 Jan-2,2005123   -125
 Jan-3,2005-110  300
 Jan-4,2005114   -21
 Jan-7,200511299
 Mar-5,2005200   311

   Dataset-2:
   Date  x  y
   Jan-2,2005123   -125
 Jan-3,2005-110  300
 Jan-4,2005114   -21
 Jan-5,200511299
 Jan-6,2005-23   12
 Mar-5,2005200   311

   Dataset-3:
   Date  x  y
   Jan-3,2005-110  300
 Jan-4,2005114   -21
 Jan-5,200511299
 Mar-5,2005200   311
 Apl-23,2005   123   200
   Now I want to get the common dates along with x and y from
 the above three datasets, keeping the same order
 in the date variable as it is.
   For example, I want to get:

   Date          x     y        x     y        x     y
                 (dataset-1)    (dataset-2)    (dataset-3)
   ------------------------------------------------------
   Jan-3,2005    -110  300      -110  300      -110  300
   Jan-4,2005    114   -21      114   -21      114   -21
   Mar-5,2005    200   311      200   311      200   311

   Can anyone give me any R code to implement this for any
 number of datasets?
   Thanks and regards



 thanks in advance
   


Re: [R] regression modeling

2006-04-25 Thread bogdan romocea
There is an aspect, worthy of careful consideration, you don't seem to
be aware of. I'll ask the question for you: How does the
explanatory/predictive potential of a dataset vary as the dataset gets
larger and larger?


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Weiwei Shi
 Sent: Monday, April 24, 2006 12:45 PM
 To: r-help
 Subject: [R] regression modeling

 Hi, there:
 I am looking for a regression modeling approach (like regression
 trees) for a large-scale industry dataset. Any suggestion on a package
 from R or from other sources which has decent accuracy and
 scalability? Any recommendation from experience is highly appreciated.

 Thanks,

 Weiwei

 --
 Weiwei Shi, Ph.D

 Did you always know?
 No, I did not. But I believed...
 ---Matrix III



Re: [R] www.r-project.org

2006-04-25 Thread bogdan romocea
I agree it would be worthwhile to make some cosmetic changes to
r-project.org (nothing fancy though - no javascript, Flash etc). The
general public may not be fully aware of how R compares to other
statistical software, and I doubt that a web site which looks like it
was put together 10 years ago helps bend the perceptions in the right
direction. (Also, can someone finally change the graph on the first
page??)


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of roger bos
 Sent: Tuesday, April 25, 2006 1:09 PM
 To: Romain Francois
 Cc: RHELP
 Subject: Re: [R] www.r-project.org

 While there is nothing about the r-project site that I would
 consider fancy, it is pretty functional.  I would be interested to
 hear more about what you hope to accomplish by re-doing the web site.
 Fancy graphics may just slow down the experience for those not on
 broadband.  After all, the r-help list doesn't even like HTML in
 email, so it may not like too much fancy stuff on its website either.




 On 4/25/06, Romain Francois [EMAIL PROTECTED] wrote:
 
  Dear R users and developers,
 
  My question is addressed to both of you, so I chose R-help to post it.
 
  Are there any plans to jazz up the main R website
  (http://www.r-project.org)?
  The look it has now has been the same for a long time and is kind of
  sad compared to other statistical packages' websites. Of course, the
  comparison is not fair, since companies are paying web designers to
  draw lollipop websites ...
 
  My first idea was to organize some kind of web design contest.
  But I had a small talk with Friedrich Leisch about that, who said that
  I shouldn't expect too many competitors.
  So, what about creating a small team, creating a home page project and
  then proposing it to the core team?
  It goes without saying: the core team has the final word.
 
  What do you think? Who would like to play?
 
  Romain
 
  --
  visit the R Graph Gallery : http://addictedtor.free.fr/graphiques
  mixmod 1.7 is released :
 http://www-math.univ-fcomte.fr/mixmod/index.php
  +---+
  | Romain FRANCOIS - http://francoisromain.free.fr   |
  | Doctorant INRIA Futurs / EDF  |
  +---+
 


Re: [R] efficiency in merging two data frames

2006-05-01 Thread bogdan romocea
Another good option is SQL, often the fastest and most scalable solution
for joins of this size. If you decide to give it a try, pay close
attention to indexes.
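Staying in R, the per-row loops in the question quoted below can also be replaced by vectorized lookups. A minimal sketch, assuming toy data frames with the column names from the question (the values, and the name ret used instead of return, are made up). The fiscal-quarter line is a standard months-since-fiscal-year-end calculation; it agrees with the first branch of the quoted loop, but that it matches the intended convention in every case is an assumption:

```r
# Toy data standing in for the poster's tables; column names follow the
# question, the values (and 'ret' instead of 'return') are made up.
account <- data.frame(GVKEY = c(10, 20), FYR = c(12, 6), fyy = c(2005, 2005),
                      fyq = c(1, 2), sales = c(100, 200))
id  <- data.frame(PERMNO = c(1, 2), GVKEY = c(10, 20))
ret <- data.frame(PERMNO = c(1, 2, 2), year = c(2005, 2004, 2004),
                  month = c(2, 7, 12), r = c(0.01, 0.02, 0.03))

# match() returns the position of the first matching key, which is what
# the [[1]] indexing inside the original loops picks out one row at a time.
ret$GVKEY <- id$GVKEY[match(ret$PERMNO, id$PERMNO)]
fyr <- account$FYR[match(ret$GVKEY, account$GVKEY)]

# Months after the fiscal year-end month belong to the next fiscal year;
# months elapsed since the year end (0..11) give the fiscal quarter.
ret$fyy <- ret$year + (ret$month > fyr)
ret$fyq <- ((ret$month - fyr - 1) %% 12) %/% 3 + 1

merged <- merge(ret, account, by = c("GVKEY", "fyy", "fyq"))
print(merged)
```

On the real data this does one hashed lookup per table instead of scanning id and account once for every row of return, which is where most of the 15 minutes likely goes.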


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Steve Miller
 Sent: Monday, May 01, 2006 8:55 AM
 To: 'Guojun Zhu'; r-help@stat.math.ethz.ch
 Subject: Re: [R] efficiency in merging two data frames

 I'm sure you'll get ingenious responses to help you optimize your R
 code. I deal with similar investment data in even larger numbers (e.g.
 10 years of daily return data for each stock in the Russell 3000), and
 prefer reading and consolidating the data in Python using dictionaries
 and lists, then either piping the data to R in a read statement
 (read.table(pipe("python ..."))) or using Rpy to write R data frames
 directly from Python. Python is more facile with these basic data
 manipulations for hundreds of thousands or even millions of records,
 and performance is generally considerably better.

 Steve Miller

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Guojun Zhu
 Sent: Monday, May 01, 2006 2:35 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] efficiency in merging two data frames

 I have two data sets with stock and fiscal data for many companies.
 One is monthly data with about 144,000 lines, and the other is
 quarterly with about 56,000. Each data set uses a different company
 code, and I need to merge the two. I read both as CSV, along with a
 third file that maps one firm code to the other. So I now have three
 data sets: return (keyed by return$PERMNO), account (keyed by
 account$GVKEY), and id, which holds the correspondence and has both
 id$PERMNO and id$GVKEY. I also need to convert the months in return
 into fiscal quarters and finally merge the two data frames (return and
 account). I ended up writing a short program for this, but it runs
 very slowly (15+ minutes). Is there a quicker way to do it? Here is my
 original code.



 id$fy=rep(0,length(id$PERMNO))
 for (i in 1:length(id$PERMNO))
   id$fy[[i]] <- account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]]

 return$GVKEY=rep(0,length(return$PERMNO))
 return$fyy=rep(0,length(return$PERMNO))
 return$fyq=rep(0,length(return$PERMNO))
 for (i in 1:length(return$PERMNO)) {
   temp <- id$PERMNO==return$PERMNO[[i]];
   tempmon <- id$fy[temp][[1]];
   if (return$month[[i]] <= tempmon) {
     return$fyy[[i]] <- return$year[[i]];
     return$fyq[[i]] <- 4-(tempmon-return$month[[i]])%/%3;
   }
   else{
     return$fyy[[i]] <- return$year[[i]]+1;
     return$fyq[[i]] <- (return$month[[i]]-tempmon-1)%/%3;
   }
   return$GVKEY[[i]] <- id$GVKEY[temp][[1]];
 }

 returnnew=merge(return,account,by.x=c("GVKEY","fyy","fyq"),
                 by.y=c("GVKEY","fyy","fyq"))



Re: [R] Axis labels

2006-05-02 Thread bogdan romocea
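plot(1:10,axes=FALSE)           #draw the points without any axes
axis(1,at=1:10,labels=10:1)     #x axis: ticks at 1:10, labelled 10,9,...,1
axis(2,at=1:10,labels=5*10:1)   #y axis: ticks at 1:10, labelled 50,45,...,5
box()                           #restore the frame around the plot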


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 Christopher Brown
 Sent: Tuesday, May 02, 2006 12:13 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Axis labels

 I cannot find a way to apply custom axis tick label text. Is there a way?



Re: [R] Listing Variables

2006-05-03 Thread bogdan romocea
Here's an example.
dfr <- data.frame(A1=1:10,A2=21:30,B1=31:40,B2=41:50)
vars <- colnames(dfr)
for (v in vars[grep("B",vars)]) print(mean(dfr[,v]))
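Applied to the naming scheme in the question below, grep() with value = TRUE returns the matching names directly, ready for a for loop. A sketch with a made-up data frame named Data:

```r
# Made-up data frame following the question's naming scheme.
Data <- data.frame(PersonalID = 1:3, FirstName = c("Ann", "Bob", "Cy"),
                   Height.1 = c(70, 72, 75), Height.5 = c(105, 110, 108),
                   Height.10 = c(135, 140, 138))

# value = TRUE returns the matching names instead of their positions;
# "^Height\\." requires the name to start with "Height." (dot escaped).
height.vars <- grep("^Height\\.", names(Data), value = TRUE)

# The character vector drives a for loop, one variable at a time.
height.means <- numeric(0)
for (v in height.vars) height.means[v] <- mean(Data[[v]])
print(height.means)
```

Data[[v]] (or Data[, v]) looks a column up by the name held in v, so no string manipulation of "Data$..." is needed.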


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Farrel
 Buchinsky
 Sent: Wednesday, May 03, 2006 10:46 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Listing Variables

 How does one create a vector whose contents are the names of the
 variables in a data frame that match a particular pattern? This is so
 simple, but I cannot find a straightforward answer. I want to be able
 to pass the contents of that vector to a for loop.

 So let us assume that one has a data frame named Data, and that one
 had the height of a group of people measured at various ages.

 It could be made up of vectors Data$PersonalID, Data$FirstName,
 Data$LastName, Data$Height.1, Data$Height.5, Data$Height.9,
 Data$Height.10, Data$Height.12, Data$Height.20 and many more variables.

 How would one create a vector of all the Height variable names?

 The simple workaround is not to bother creating the vector of names
 (Data$Height.1, Data$Height.5, Data$Height.9, Data$Height.10,
 Data$Height.12, Data$Height.20, ...) but rather just to use the sapply
 function. However, with some functions sapply will not work, and it is
 necessary to supply each variable name to a function (see the thread
 "Repeating tdt function on thousands of variables").


 This is such a core capability. I would like to see it in the R-Wiki
 but could not find it there.

 --
 Farrel Buchinsky, MD
 Pediatric Otolaryngologist
 Allegheny General Hospital
 Pittsburgh, PA



Re: [R] SQL like manipulations on data frames

2006-05-05 Thread bogdan romocea
This goes the other way - all SQL manipulations are a subset of what
can be done with R. Read up on indexing and see ?merge, ?aggregate,
?by, ?tapply, among others. (For the R equivalent to your query, check
?grep and ?order, and search the list if needed.) Also, this example
might be a good start:

gby <- function(var,BY,byname="BY")
{
  if (!exists("summarize")) library(Hmisc)  #you need to install Hmisc
  grouped <- summarize(var,BY,function(x) {c(count=length(x),min=min(x),
                       max=max(x),mean=mean(x))})
  colnames(grouped) <- c(byname,"count","min","max","mean")
  grouped
}
#---
x <- rnorm(1000)
state <- sample(c("A","B","C","D"),1000,replace=TRUE)
city <- sample(1:5,1000,replace=TRUE)
gby(x,paste(state,city,sep="-"),"State-City")
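For the last query in the question below, a base-R sketch along the lines of ?grep and ?order. The table foo and its values are made up; grepl(fixed = TRUE) plays the role of LIKE "%xyz%", and negating baz gives the descending sort, which works because baz is numeric:

```r
# Made-up table standing in for foo.
foo <- data.frame(bar = c("abxyz1", "q", "xyzz", "xyz"),
                  bat = c(2, 1, 1, 1), baz = c(0.1, 0.5, 0.3, 0.7),
                  stringsAsFactors = FALSE)

# WHERE bar LIKE "%xyz%": keep rows whose bar contains "xyz" anywhere;
# SELECT bar, bat, baz: keep only those columns.
sel <- foo[grepl("xyz", foo$bar, fixed = TRUE), c("bar", "bat", "baz")]

# baz*100 AS pctbaz: add a computed column.
sel$pctbaz <- sel$baz * 100

# ORDER BY bat, baz DESC: ascending bat, then descending baz.
sel <- sel[order(sel$bat, -sel$baz), ]
print(sel)
```

For a descending sort on a character column, -xtfrm(column) can stand in for the minus sign.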


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Robert Citek
 Sent: Thursday, May 04, 2006 6:56 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] SQL like manipulations on data frames


 Is there a cheat sheet anywhere that describes how to do SQL-like
 manipulations on a data frame?

 My knowledge of R is rather limited. But from my experience it seems as
 though one can think of data frames as being similar to tables in a
 database: there are rows, columns, and values. Also, one can perform
 similar manipulations on a data frame as one can on a table.
 For example:

   select * from foo where bar  10 ;

 is similar to

   foo[foo[bar]  10,]

 I'm just wondering how many other SQL-like manipulations can be done
 on a data frame?  As an extreme example, is it reasonable to assume
 there is an R equivalent to:

   select bar, bat, baz, baz*100 as 'pctbaz' from foo
   where bar like "%xyz%" order by bat, baz desc ;

 Regards,
 - Robert
 http://www.cwelug.org/downloads
 Help others get OpenSource software.  Distribute FLOSS
 for Windows, Linux, *BSD, and MacOS X with BitTorrent



Re: [R] Using DBI and RMySQL

2006-05-11 Thread bogdan romocea
 I'll see if I can reproduce the steps under Knoppix[1].  Then you can
 run Knoppix with a Persistent Disk Image (PDI)[2] that contains R,
 the DBI, and RMySQL on just about any machine that runs Knoppix.

Don't bother, it's been done already. See
http://dirk.eddelbuettel.com/quantian.html


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Robert Citek
 Sent: Thursday, May 11, 2006 11:08 AM
 To: R-help@stat.math.ethz.ch
 Subject: Re: [R] Using DBI and RMySQL


 On May 11, 2006, at 3:09 AM, Indrajit Sengupta wrote:
  Did you create RMySQL windows binary in the process?

 Sorry, but no.  This was done on Mac OS X.  And was done a while ago.

  Can you share it with me?

 Wish I could, but I can't.  I don't have a Windows machine.

 I'll see if I can reproduce the steps under Knoppix[1]. Then you can
 run Knoppix with a Persistent Disk Image (PDI)[2] that contains R,
 the DBI, and RMySQL on just about any machine that runs Knoppix.

 [1] http://knoppix.net/
 [2] http://knoppix.net/wiki/Customizing_environment_using_4.0.2CD#Persistent_Disk_Image_.28PDI.29

 Regards,
 - Robert
 http://www.cwelug.org/downloads
 Help others get OpenSource software.  Distribute FLOSS
 for Windows, Linux, *BSD, and MacOS X with BitTorrent



Re: [R] Fast update of a lot of records in a database?

2006-05-19 Thread bogdan romocea
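Your approach seems very inefficient - it looks like you're executing
thousands of update statements. Try something like this instead:
-- build a table 'updates' (id and value)
...
-- do all updates via a single join (MySQL syntax)
UPDATE bigtable a INNER JOIN updates b
ON a.id = b.id
SET a.col1 = b.value;
Use an inner join here: with a LEFT JOIN, rows of bigtable that have no
match in 'updates' would get col1 set to NULL. You may need to adjust
the syntax; in PostgreSQL the equivalent is
UPDATE bigtable SET col1 = b.value FROM updates b WHERE bigtable.id = b.id;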


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Duncan Murdoch
 Sent: Friday, May 19, 2006 11:17 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Fast update of a lot of records in a database?

 We have a PostgreSQL table with about 40 records in it. Using either
 RODBC or RdbiPgSQL, what is the fastest way to update one (or a few)
 column(s) in a large collection of records? Currently we're sending
 SQL like

 BEGIN
 UPDATE table SET col1=value WHERE id=id
 (repeated thousands of times for different ids)
 COMMIT

 and this takes hours to complete.  Surely there must be a quicker way?

 Duncan Murdoch



Re: [R] win2k memory problem with merge()'ing repeatedly (long email)

2006-05-22 Thread bogdan romocea
Repeated merge()-ing does not always increase the space requirements
linearly. Keep in mind that a join between two tables where the same
value appears M and N times will produce M*N rows for that particular
value. My guess is that the number of rows in atot explodes because
you have some duplicate values in your files (having the same
duplicate date in each data frame would cause atot to contain 4, then
8, 16, 32, 64... rows for that date).
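A small made-up example of the M*N effect described above, with one date duplicated in every table:

```r
# Three made-up tables in which the date "2006-05-01" appears twice each.
dates <- c("2006-05-01", "2006-05-01", "2006-05-02")
a1 <- data.frame(mdate = dates, v1 = 1:3)
a2 <- data.frame(mdate = dates, v2 = 4:6)
a3 <- data.frame(mdate = dates, v3 = 7:9)

atot <- merge(a1, a2, all = TRUE)    # 2*2 rows for the duplicated date, plus 1
n.after.2 <- nrow(atot)
atot <- merge(atot, a3, all = TRUE)  # 4*2 rows for the duplicated date, plus 1
print(table(atot$mdate))

# Counting repeated keys in each table before merging shows the culprit.
print(sapply(list(a1 = a1, a2 = a2, a3 = a3),
             function(d) sum(duplicated(d$mdate))))
```

If the duplicates turn out to be legitimate, collapsing each table to one row per date (e.g. with unique() or aggregate()) before merging keeps the result linear in size.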


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Sean O'Riordain
 Sent: Monday, May 22, 2006 10:12 AM
 To: r-help
 Subject: [R] win2k memory problem with merge()'ing repeatedly
 (long email)

 Good afternoon,

 I have 63 small .csv files which I process daily. Until two weeks ago
 they processed just fine, took only a matter of moments, and had no
 noticeable memory problems. Two weeks ago they reached 318 lines and my
 script broke. There are some missing values in some of the files. I have
 tried hard many times over the last two weeks to create a small
 repeatable example to give you, but I've failed - unless I use my data,
 it works fine... :-(

 Am I missing something obvious? (again)

 A typical file has lines which look like:
 01/06/2005,1372

 Though there are three files which have two values (files 3,32,33) and
 these have lines which look like...
 01/06/2005,1766,
 or
 15/05/2006,289,114

 a1 <- read.csv("file1.csv",header=F)
 etc...
 a63 <- read.csv("file63.csv",header=F)
 names(a1) <- c("mdate","file1.column.description")

 atot <- merge(a1,a2,all=T)

 followed by repeatedly doing...
 atot <- merge(atot, a3,all=T)
 atot <- merge(atot, a4,all=T)
 etc...

 I normally start R with --vanilla.

 What appears to happen is that atot doubles in size each iteration and
 just falls over due to lack of memory at about i=17, even though the
 total memory required for all of the individual a1...a63 is only
 1001384 bytes (measured with object.size() on a1..a63). At this point
 I've been trying to pin down this problem for two weeks, and I've just
 given up...

 The following works fine as I'd expect with minimal memory usage...

 for (i in 3:67) {
   datelist <- as.Date(start.date)+0:(count-1)
   #remove a couple of elements...
   datelist <- datelist[-(floor(runif(nacount)*count))]
   a2 <- as.data.frame(datelist)
   names(a2) <- "mdate"
   vname <- paste("value", i, sep="")
   a2[vname] <- runif(length(datelist))
   #a2[floor(runif(nacount)*count), vname] <- NA

   # atot <- merge(atot,a2,all=T)
   i <- 2
   a.eval.text <- paste("merge(atot, a", i, ", all=T)", sep="")
   cat("a.eval.text is: -", a.eval.text, "-\n", sep="")
   atot <- eval(parse(text=a.eval.text))

   cat("i:", i, " ", gc(), "\n")
 }

 This works fine... but on my files (as per the attached 'lastsave.txt'
 file) it just gobbles memory.
 Am I doing something wrong? I (wrongly?) expected that repeatedly doing
 merge(atot,aN) would only increase the memory requirement linearly
 (with jumps perhaps as we go through a 2^n boundary)... which is what
 happens when merging simulated data frames as above - no problem at
 all, and it's really fast...

 The attached text file shows a (slightly edited) session where the
 memory required by the merge() operation just doubles with each use...
 and I can only allow it to run until i=17!!!

 I've even run it with gctorture() set on... with similar, but
 excruciatingly slow results...

 Is there any relevant info that I'm missing?  Unfortunately I am not
 able to post the contents of the files to a public list like this...

 As per a previous thread, I know that I can use a list to handle these
 dataframes - but I had difficulty with the syntax of a list of
 dataframes...

 I'd like to know why the memory requirements for this merge just
 explode...

 cheers, (and thanks in advance!)
 Sean O'Riordain

 ==
 > version
                _
 platform       i386-pc-mingw32
 arch           i386
 os             mingw32
 system         i386, mingw32
 status         Patched
 major          2
 minor          3.0
 year           2006
 month          05
 day            09
 svn rev        38014
 language       R
 version.string Version 2.3.0 Patched (2006-05-09 r38014)
 
 Running on Win2k with 1Gb ram.

 I also tried it (with the same results) on 2.2.1 and 2.3.0.

 


Re: [R] Manipulating code?

2006-05-23 Thread bogdan romocea
Macro stuff à la SAS is something that should be avoided whenever
possible - it's messy, limited, and limiting. (I've done it
occasionally and it works, but I think it's best not to go there.) Read
the documentation on lists (in particular named lists), and keep
everything in one or more lists. For example:
lst <- list()
for (v in c("var1","var2","var3")) lst[[v]] <- runif(sample(c(50,100),1))
for (v in c("var1","var2","var3")) print(sd(lst[[v]]))
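Applied to the question below, the same idea covers both passing a variable name to a function and building a model formula from names, without eval(parse()). A minimal sketch: the data d, the helpers dichotomise() and fit_with(), the ".dich" suffix and the use of lm() are all hypothetical, and the cut-point search itself is not shown:

```r
# Hypothetical data: outcome y and two stimulus variables.
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Dichotomise a variable given by name (a character string);
# data[[var]] looks the column up, no as.symbol()/eval() needed.
dichotomise <- function(data, var, cut) {
  data[[paste0(var, ".dich")]] <- as.integer(data[[var]] > cut)
  data
}

# reformulate() builds y ~ term1 + term2 + ... from a character vector.
fit_with <- function(data, vars) {
  lm(reformulate(vars, response = "y"), data = data)
}

d <- dichotomise(d, "x1", cut = 0)
fit <- fit_with(d, c("x1.dich", "x2"))
print(coef(fit))
```

Because the dichotomised column always gets the same derived name, the model can be refitted in a loop with only the cut value changing.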


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Johannes Hüsing
 Sent: Tuesday, May 23, 2006 12:26 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Manipulating code?

 Dear expeRts,
 I am currently struggling with the problem of finding
 cut points for a set of stimulus variables. I would like
 to obtain cut points iteratively for each variable by
 re-applying a dichotomised variable in the model and then
 recalculate it. I planned to have fixed names for the
 dichotomised variables so I could use the same syntax
 for every recalculation of the whole model. I furthermore
 want to reiterate the process until no cut point changes
 any more.

 My problem is in accomplishing this syntactically. How can
 I pass a variable name to a function without getting lost
 in as.symbol and eval and parse mayhem? I am feeling
 I am thinking too much in macro expansion à la SAS when
 trying to tackle this.



Re: [R] progressive slowdown during script execution?

2006-06-01 Thread bogdan romocea
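Compare
  system.time({
  v <- vector()
  for (i in 1:10^5) v <- c(v,1)
  })
with
  system.time({
  v <- numeric(10^5)
  for (i in 1:10^5) v[i] <- 1
  })
Each v <- c(v,1) copies the whole vector built so far, so the total work
grows with the square of the final length, which is why the loop keeps
getting slower; filling a preallocated vector avoids those copies. If
you don't know exactly how long v will be, use a value that's large
enough, then throw away what's extra.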
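For the pairwise row correlations in the question below, the loop can be dropped altogether: cor() correlates columns, so calling it on the transposed matrix computes every pair at once. A sketch on a small made-up matrix standing in for sage (for 1641 rows the full correlation matrix takes roughly 21 MB):

```r
# Made-up stand-in for one of the poster's matrices (rows = items).
set.seed(42)
sage <- matrix(rnorm(6 * 20), nrow = 6)

# One call computes all pairwise Spearman correlations between rows.
cm <- cor(t(sage), method = "spearman")

# Keep each pair once, in the same order as the nested loop
# (i = 1..n-1, j = i+1..n): the lower triangle of t(cm), read column-wise.
x <- t(cm)[lower.tri(cm)]

# Spot-check the first two pairs against the loop's calculation.
x.loop <- c(cor(sage[1, ], sage[2, ], method = "spearman"),
            cor(sage[1, ], sage[3, ], method = "spearman"))
print(head(x))
```

The same two lines applied to frie give y.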


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Tim Alcon
 Sent: Thursday, June 01, 2006 2:04 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] progressive slowdown during script execution?

 I'm an R novice, so I hope my question is a valid one.  I'm trying to
 run the following script in the current version of R.

 for (i in 1:1640){for (j in (i+1):1641){
 if (i == 1 && j == 2){x <- cor(sage[i,],sage[j,],method="spearman");
   y <- cor(frie[i,],frie[j,],method="spearman")}
 if (i != 1 || j != 2){x <- c(x,cor(sage[i,],sage[j,],method="spearman"));
   y <- c(y,cor(frie[i,],frie[j,],method="spearman"))}}}

 It basically just finds all pairwise correlations of the rows in a
 matrix for each of two matrices and stores the results for each matrix
 in a vector. The problem I seem to be running into is that it seems to
 slow way down during execution. When I first tried running it, I
 stopped execution to see how fast it was running, before trying to
 compute the whole job (the two matrices each have 1641 rows). Based on
 what I saw, I figured it would easily finish overnight. Instead, it was
 still running almost 24 hours later. To quantify this a little better, I
 checked it after running for 5 minutes, at which point it had added
 79120 correlations to each of the x and y vectors. Since there should
 be a total of (1641*1640)/2 = 1345620 pairwise correlations in each
 vector when it finishes running, I worked out that it should take
 (1345620/79120)*5 = 85 minutes to run the whole job. However, when I
 checked it after running for 2 hours, it had added only 341870
 correlations to each vector.

 Any ideas what I'm doing wrong, or why it would run more slowly the
 longer it runs? Thanks for any help or advice.

 Tim



Re: [R] R usage for log analysis

2006-06-12 Thread bogdan romocea
I wouldn't use a DBMS at all -- it is not necessary and I don't see
what you would get in return. Instead I would split very large log
files into a number of pieces so that each piece fits in memory (see
below for an example), then process them in a loop. See the list and
the documentation if you have questions about how to read text files,
count strings etc.

#---split big files in two---
for F in *log
do
  fn=`echo "$F" | awk -F. '{print $1}'`  #file name up to the first dot
  ln=`wc -l < "$F"`                      #number of lines in the file
  forsplit=`expr $ln / 2 + 50`           #no. of lines in each chunk, tweak as needed
  echo "Splitting $F into pieces of $forsplit lines each"
  split -l $forsplit "$F" "$fn"
done
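Instead of splitting the files on disk, R can also read a large file through a connection a fixed number of lines at a time, so only one chunk is in memory at once. A minimal sketch of that swapped-in approach; count_matches() and the sample log lines are made up:

```r
# Count lines matching a regular expression, reading 'chunk' lines at a time.
count_matches <- function(path, pattern, chunk = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0L
  repeat {
    lines <- readLines(con, n = chunk)  # continues where the last read stopped
    if (length(lines) == 0L) break      # end of file
    total <- total + sum(grepl(pattern, lines))
  }
  total
}

# Usage with a small temporary file standing in for a real log.
tmp <- tempfile(fileext = ".log")
writeLines(c("GET /a 200", "GET /b 404", "POST /a 200", "GET /c 404"), tmp)
n404 <- count_matches(tmp, " 404$", chunk = 2L)
print(n404)
```

The connection must be opened explicitly (open = "r"); otherwise each readLines() call would start again from the top of the file.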


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Gabriel Diaz
 Sent: Monday, June 12, 2006 9:52 AM
 To: Jean-Luc Fontaine
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] R usage for log analysis

 Hello

 Thanks all for the answers.

 I've been looking over the project documentation, and it seems a
 database is the way to go to handle log files on the order of GBs
 (normally between 2 and 4 GB per 15-day dump).

 This document, http://cran.r-project.org/doc/manuals/R-data.html,
 says R will load all data into memory to process it when using
 read.table and such. Will using a database do the same? Currently I
 have no machine with more than 2 GB of memory.

 The moodss thing looks nice, thanks for the link. But what I have to
 do now is an offline analysis of big log files :-). I will try to go
 the MySQL - R way.

 gabi



 On 6/12/06, Jean-Luc Fontaine [EMAIL PROTECTED] wrote:
 
  Allen S. Rout wrote:
  
  
   Don't expect a warm welcome. This community is like all open-source
   communities, sharply focused on its own concerns and expertise. And,
   in an unusual experience for computer types, our core competencies
   hold little or no sway here; they don't even give us much of a leg up.
   Just wait 'till you want to do something nutso like produce a business
   graphic. :)

   I'm working on understanding enough of R packaging and documentation
   to begin a 'task view' focused on systems administration, for humble
   submission. That might end up being mostly log analysis; the term
   can describe much of what we do, if it's stretched a bit. I'm hoping
   the task view will attract the teeming masses of sysadmins trapped in
   the mire of Gnuplot and friends.
  Although not specifically solving the problem at hand, you might want
  to take a look at moodss and moomps (http://moodss.sourceforge.net/),
  modular monitoring applications which use R
  (http://jfontain.free.fr/statistics.htm), and their log module
  (http://jfontain.free.fr/log/log.htm).
 
  --
  Jean-Luc Fontaine  http://jfontain.free.fr/
 


Re: [R] bubbleplot for matrix

2006-06-14 Thread bogdan romocea
Here's an example. By the way, I find that it's more convenient (where
applicable) to keep the data in 3 vectors/factors rather than one
matrix/data frame.

a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10))
x <- y <- z <- vector()
for (i in 1:nrow(a)) {
  x <- c(x,rep(rownames(a)[i],ncol(a)))
  y <- c(y,colnames(a))
  z <- c(z,a[i,])
}
symbols(as.numeric(x),as.numeric(y),z,inches=0.2,bg="khaki")
text(as.numeric(x),as.numeric(y),labels=z)
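The loop above grows x, y and z one row at a time. row() and col() give every cell's row and column index directly, so the three vectors can also be built without a loop. A sketch using the same made-up matrix (the cells come out column by column, which does not matter for plotting):

```r
a <- matrix(sample(1:5, 100, replace = TRUE), nrow = 10,
            dimnames = list(1:10, 5 * 1:10))

# row(a) and col(a) are matrices of indices; using them to subscript the
# dimnames gives one label per cell, in the same order as as.vector(a).
x <- rownames(a)[row(a)]
y <- colnames(a)[col(a)]
z <- as.vector(a)
```

The resulting x, y and z can be passed to the same symbols() and text() calls, and the approach works unchanged when the dimnames are strings.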


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Albert Vilella
 Sent: Tuesday, June 13, 2006 7:11 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] bubbleplot for matrix

 Hi all,

 I would like to ask if it is possible to use bubbleplot for a 20x20
 matrix, instead of a dataframe with factors in columns.

 The idea would be to get a tabular representation with bubbles like in
 Rnews_2006_2 article, which look very nice.

 Thanks in advance,

 Albert.



Re: [R] bubbleplot for matrix

2006-06-15 Thread bogdan romocea
This works, though I'm not sure why symbols() complains about
axes=FALSE while fulfilling the request.

a <- matrix(sample(1:5,100,replace=TRUE),nrow=10)
rownames(a) <- c("aed","fde","fda","yxj","ijk","ddd","gcd","sbe","adc","asd")
colnames(a) <- c("aed","fde","fda","yxj","ijk","ddd","gcd","sbe","adc","asd")
x <- y <- z <- vector()
for (i in 1:nrow(a)) {
  x <- c(x,rep(rownames(a)[i],ncol(a)))
  y <- c(y,colnames(a))
  z <- c(z,a[i,])
}

xp <- as.numeric(as.factor(x))
yp <- as.numeric(as.factor(y))
symbols(xp,yp,z,inches=0.2,bg="khaki",axes=FALSE)
axis(1,at=1:length(unique(x)),labels=sort(unique(x)))
axis(2,at=1:length(unique(y)),labels=sort(unique(y)))
box()
text(xp,yp,labels=z)



On 6/15/06, Albert Vilella [EMAIL PROTECTED] wrote:
 Thanks Bogdan for the reply,

 I almost got it working, but in my case, the rownames and colnames are
 strings, not numbers, and I guess that this is a problem when using your
 snippet:

 a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10))
 rownames(a) = c("aed","fde","fda","yxj","ijk","ddd","gcd","sbe","adc","asd")
 colnames(a) = c("aed","fde","fda","yxj","ijk","ddd","gcd","sbe","adc","asd")
 x <- y <- z <- vector()
 for (i in 1:nrow(a)) {
  x <- c(x,rep(rownames(a)[i],ncol(a)))
  y <- c(y,colnames(a))
  z <- c(z,a[i,])
 }
 symbols(as.numeric(x),as.numeric(y),z,inches=0.2,bg="khaki")
 text(as.numeric(x),as.numeric(y),labels=z)

 > symbols(as.numeric(x),as.numeric(y),z,inches=0.2,bg="khaki")
 Error in plot.window(xlim, ylim, log, asp, ...) :
need finite 'xlim' values
 In addition: Warning messages:
 1: NAs introduced by coercion
 2: NAs introduced by coercion
 3: no finite arguments to min; returning Inf
 4: no finite arguments to max; returning -Inf
 5: no finite arguments to min; returning Inf
 6: no finite arguments to max; returning -Inf

 Any guess?

 Thanks in advance,

Albert.



Re: [R] modeling logit(y/n) using lrm

2006-06-16 Thread bogdan romocea
Not sure about your data set, but if you have some kind of
(weighted/stratified) sample of hospitals, you need to pay special
attention. Survey data violate the assumptions of the classical linear
models (infinite population, independent, identically distributed
errors, etc.) and need to be analyzed differently. In SAS, it's wrong
to throw such data into PROC LOGISTIC / REG; PROC SURVEYLOGISTIC /
SURVEYREG should be used instead. In R, take a look at the survey
package. For details, see
http://www2.sas.com/proceedings/sugi31/193-31.pdf
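On the y/n part of the question quoted below, and only as a point of comparison (this is not a statement about what lrm accepts): base R's glm() takes events/trials data directly through a two-column response, much like the SAS "model y/n = predictors;" syntax, so no restructuring is needed. A sketch with made-up hospital data; it ignores any survey design, which is the concern raised above:

```r
# Made-up hospital-level data: y events out of n eligible patients.
hosp <- data.frame(y = c(3, 8, 15, 6), n = c(50, 60, 80, 40),
                   size = c(1.2, 2.5, 4.0, 1.8))

# cbind(successes, failures) as the response models logit(y/n).
fit <- glm(cbind(y, n - y) ~ size, family = binomial, data = hosp)
print(summary(fit)$coefficients)

# Equivalent form: the proportion as response, with n as prior weights.
fit2 <- glm(y / n ~ size, family = binomial, weights = n, data = hosp)
```

Both forms give the same estimates; the weighted-proportion form is what glm() builds internally from the two-column response.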



 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Hamilton, Cody
 Sent: Friday, June 16, 2006 1:32 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] modeling logit(y/n) using lrm


 I have a dataset at the hospital level (as opposed to the patient
 level) that contains the number of patients experiencing events (call
 this number y) and the number of patients eligible for such events
 (call this number n). I am trying to model logit(y/n) = XBeta. In SAS
 this can be done in PROC LOGISTIC or GENMOD with a model statement such
 as: model y/n = predictors;. Can this be done using lrm from the Design
 library without restructuring the dataset so that for each hospital
 there is one row with y = 1 and one row with y = 0, and then using the
 weight option in lrm to weight these two responses by the number of
 'successes' and 'failures' for that hospital, respectively? I would
 like to avoid the restructuring, and I understand that the use of the
 weight function is not compatible with a lot of the validation
 functions available in Design (validate, bootcov, etc.).



 Cody Hamilton, Ph.D

 Institute for Health Care Research and Improvement

 Baylor Health Care System

 (214) 265-3618






 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html




Re: [R] print color

2006-07-10 Thread bogdan romocea
One option is
library(R2HTML)
?HTML.cormat
The thing you're after is traffic highlighting (via CSS or HTML tags).
If HTML.cormat() doesn't do exactly what you want, modify the source
code. (By the way, I haven't used R2HTML so far, so there may be a
more appropriate function.)
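If R2HTML turns out not to fit, the same idea can be sketched in base R: build the HTML yourself and wrap the largest entry of each row in a red span. The helper name highlight_row_max is hypothetical, and ties go to the first maximum, as with which.max().

```r
# highlight_row_max() is a hypothetical helper: it returns an HTML table
# (as one string) with the largest entry of each row wrapped in a red <span>.
highlight_row_max <- function(m) {
  rows <- apply(m, 1, function(r) {
    cells <- format(r)
    top <- which.max(r)                  # first position of the row maximum
    cells[top] <- sprintf('<span style="color:red">%s</span>', cells[top])
    paste0("<tr>", paste0("<td>", cells, "</td>", collapse = ""), "</tr>")
  })
  paste(c("<table>", rows, "</table>"), collapse = "\n")
}

X <- matrix(c(1, 2, 4, 7,
              8, 4, 3, 1), nrow = 2, byrow = TRUE)
html <- highlight_row_max(X)
out <- file.path(tempdir(), "rowmax.html")
cat(html, file = out)    # open this file in a web browser
```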


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Robert Mcfadden
 Sent: Monday, July 10, 2006 4:00 PM
 To: R-help@stat.math.ethz.ch
 Subject: [R] print color

 Dear R Users,

 Is it possible to make R print the largest item in each row
 of a matrix X
 with red font? Example:

 1 2 4 7
 8 4 3 1

 ...

 Therefore 7 and 8 should be in red color.

 I would appreciate any suggestion

 Robert McFadden






Re: [R] Is it possible to only read a subset by read.table ?

2006-07-12 Thread bogdan romocea
It's possible and straightforward (just don't use R). IMHO the GNU
Core Utilities
http://www.gnu.org/software/coreutils/
plus a few other tools such as sed, awk, grep, etc. are much more
appropriate than R for processing massive text files. (Get a good book
about UNIX shell scripting. On Windows you can use Services for UNIX
or Cygwin.)

Also, here's an example that you could adapt to print the males from
your data set to a separate file, which you could then import in R.
As written, it copies every line containing a January 2006 date from
each file whose name contains "data" to a new file with the suffix _JAN06.
#---print specific lines to another file---
suffix=_JAN06
for F in *data*
do
  echo "$F"
  sed -n -e '/2006-01-[0-9][0-9]/p' "$F" > "${F}${suffix}"
done
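If you would rather stay in R, a rough equivalent is to read the file through a connection in chunks and keep only the matching rows of each chunk, so the whole file is never in memory at once. This is a sketch: read_subset is a hypothetical helper, it assumes a header line and a plain delimiter with no quoted or embedded separators, and read.table() infers column types separately for each chunk.

```r
# read_subset() is a hypothetical helper, not part of base R.
read_subset <- function(file, keep, chunk_size = 10000L, sep = ",") {
  con <- file(file, open = "r")
  on.exit(close(con))
  # column names come from the first line
  header <- strsplit(readLines(con, n = 1L), sep, fixed = TRUE)[[1]]
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)   # next block of raw lines
    if (length(lines) == 0L) break            # end of file
    chunk <- read.table(text = lines, sep = sep, col.names = header,
                        stringsAsFactors = FALSE)
    # 'keep' returns a logical vector saying which rows to retain
    pieces[[length(pieces) + 1L]] <- chunk[keep(chunk), , drop = FALSE]
  }
  do.call(rbind, pieces)
}

# Usage with a small throw-away file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age", "1,male,30", "2,female,25", "3,male,41",
             "4,female,37", "5,male,19"), tmp)
males <- read_subset(tmp, function(d) d$sex == "male", chunk_size = 2L)
males
```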


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of David Vonka
 Sent: Wednesday, July 12, 2006 8:37 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Is it possible to only read a subset by read.table ?

 Hello,

 is it possible to do something like

 DATA <- read.table(file="blabla.dat", subset=(sex=="male")),

 i.e. make R read only a subset of a csv file ?
 I think it would be useful in case of very big datasets,
 but I can't find such a feature.

 Thanks for an answer,
 David Vonka



Re: [R] 15-min mean values

2006-02-02 Thread bogdan romocea
Here's another approach, which can easily be implemented in SQL.
1. Start with the dates as character vectors (YYYY-MM-DD HH:MM:SS),
   dt <- as.character(Sys.time())
2. Extract the minutes and round them down to 0, 15, 30 or 45:
   minutes <- floor(as.numeric(substr(dt,15,16))/15)*15
   final.mins <- as.character(minutes)
   final.mins[final.mins == "0"] <- "00"
3. Build the time stamps you need for aggregating:
   final.dt <- paste(substr(dt,1,14),final.mins,":00",sep="")
(If you had wanted 10-minute intervals, it would have been enough to
transform MM:SS to M0:00.)
4. Use aggregate(), SQL GROUP BY, etc.
5. Finally, convert final.dt from character to date-time.
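The five steps put together on a few made-up one-minute readings might look like this. The time stamps use the YYYY-MM-DD HH:MM:SS layout that the substr() positions assume, and sprintf("%02d", ...) is used here as a shortcut for the "0" to "00" fix in step 2.

```r
# Hypothetical 1-minute wind speeds
dt <- c("1998-10-22 02:11:00", "1998-10-22 02:12:00", "1998-10-22 02:13:00",
        "1998-10-22 02:14:00", "1998-10-22 02:15:00")
WS <- c(12.5, 10.1, 11.2, 10.5, 11.5)

# Step 2: round the minutes down to 0, 15, 30 or 45, with a leading zero
minutes <- floor(as.numeric(substr(dt, 15, 16)) / 15) * 15
final.mins <- sprintf("%02d", as.integer(minutes))

# Step 3: rebuild a time stamp for the start of each 15-minute block
final.dt <- paste(substr(dt, 1, 14), final.mins, ":00", sep = "")

# Step 4: aggregate by block (in SQL: GROUP BY the block column)
res <- aggregate(WS, by = list(block = final.dt), FUN = mean)

# Step 5: convert the block labels back to date-times
res$block <- as.POSIXct(res$block, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
res
```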


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Gabor
 Grothendieck
 Sent: Thursday, February 02, 2006 1:44 AM
 To: [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] 15-min mean values

 Assume VDATE is a character vector.  If it's a
 factor, first convert it using

 VDATE <- as.character(VDATE)

 Let's assume we only need the times portion
 and later we handle the full case which may or
 may not be needed.

 We create a times object from the times portion of
 vdate and then in the aggregate statement we use
 trunc.times -- note that trunc.times is a recent
 addition to the chron package so make sure you have
 the latest chron and R 2.2.1. See ?trunc.times

 # test data
 library(chron)
 library(zoo)
 VDATE <- c("1998-10-22:02:11", "1998-10-22:02:12",
 "1998-10-22:02:13", "1998-10-22:02:14", "1998-10-22:02:15")
 WS <- c(12.5, 10.1, 11.2, 10.5, 11.5)

 # convert VDATES to times class and aggregate
 vtimes <- times(sub(".*:(..:..)", "\\1:00", VDATE))
 aggregate(zoo(WS), trunc(vtimes, "00:15:00"), mean)

 If we need the day part too then it's only a little
 harder.

 Represent VDATE as a chron object, vdate.  We do this
 by extracting out the date and time portions
 and converting each separately.  We use regular
 expressions to do that conversion but show in a
 comment how to do it without regular expressions.
 See R News 4/1 Help Desk for more info on this and
 the table at the end of the article in particular.

 # alternative way to convert to vdate would be:
 # vdate <- chron(dates = as.numeric(as.Date(substring(VDATE, 1, 10))),
 #    times = paste(substring(VDATE, 12), 0, sep = ":"))


 vdate <- chron(dates = sub("(....)-(..)-(..).*",
 "\\2/\\3/\\1", VDATE),
 times = sub(".*:(..:..)", "\\1:00", VDATE))
 aggregate(zoo(WS), chron(trunc(times(vdate), "00:15:00")), mean)

 On 2/2/06, [EMAIL PROTECTED]
 [EMAIL PROTECTED] wrote:
 
  Good day everyone,
 
  I want to use zoo(aggregate) to calculate
  15-min mean values from a wind dataset which
  has 1-min values. The data I have looks like this:
 
  vector VDATE   vector WS
  1   1998-10-22:02:11  12.5
  2   1998-10-22:02:12  10.1
  3   1998-10-22:02:13  11.2
  4   1998-10-22:02:14  10.5
  5   1998-10-22:02:15  11.5
   .
   .
   .
  n   2005-06-30:23:59   9.1
 
 
  I want to use:
 
  aggregate(zoo(WS),'in 15-min intervals',mean)
 
  How do you specify 'in 15-min intervals' using
  vector VDATE? The length of VDATE cannot be
  changed, otherwise it would be a trivial problem
  because I can generate a 15-min spaced vector
  using 'seq'.
 
  Am I missing something?
 
  Thanks a lot,
 
  Augusto
 
 
  
  Augusto Sanabria. MSc, PhD.
  Mathematical Modeller
  Risk Research Group
  Geospatial & Earth Monitoring Division
  Geoscience Australia (www.ga.gov.au)
  Cnr. Jerrabomberra Av. & Hindmarsh Dr.
  Symonston ACT 2609
  Ph. (02) 6249-9155
 


Re: [R] matching tables

2006-02-07 Thread bogdan romocea
t1 <- as.data.frame(table(1:10)) ; colnames(t1)[2] <- "A"
t2 <- as.data.frame(table(5:20)) ; colnames(t2)[2] <- "B"
t3 <- merge(t1,t2,all=TRUE)
t3[is.na(t3)] <- 0   # values missing from one table come back as NA; show them as 0
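If you only have the two table objects, as in the quoted question, a small helper can align them on the union of their names and fill the gaps with zeros. align_tables is a hypothetical name; it sorts the names numerically when all of them are numbers and alphabetically otherwise.

```r
# align_tables() is a hypothetical helper: one row per table, one column per value
align_tables <- function(...) {
  tabs <- list(...)
  keys <- unique(unlist(lapply(tabs, names)))
  num <- suppressWarnings(as.numeric(keys))
  keys <- if (!anyNA(num)) keys[order(num)] else sort(keys)
  t(sapply(tabs, function(tb) {
    counts <- setNames(integer(length(keys)), keys)   # start with zeros
    counts[names(tb)] <- as.integer(tb)                # fill in observed counts
    counts
  }))
}

# The two tables from the quoted question, rebuilt from raw values
tab1 <- table(rep(2:4, c(3, 5, 7)))
tab2 <- table(rep(1:3, c(6, 4, 5)))
m <- align_tables(table1 = tab1, table2 = tab2)
m
```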


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Eric Pante
 Sent: Tuesday, February 07, 2006 4:22 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] matching tables

 Dear Listers,

 I am trying to match tables that DO NOT have the same length. The
 tables result from the function table() so they look like this:

 table 1
 2 3 4
 3 5 7

 table 2
 1 2 3
 6 4 5

 I need the following output: (NOTICE THE ZEROS)
  1 2 3 4
 table1 0 3 5 7
 table2 6 4 5 0

 Unfortunately, I was not successful using match(). Previous
 postings
 explain how to do similar matching, but for tables for same length,
 specifically. Any thoughts ?

 Thanks !
 eric

 Eric Pante
 
 College of Charleston, Grice Marine Laboratory
 205 Fort Johnson Road, Charleston SC 29412
 Phone: 843-953-9190 (lab)  -9200 (main office)
 

   You don't force curiosity, you awaken it ...
   Daniel Pennac



Re: [R] dataframe subset

2006-02-08 Thread bogdan romocea
Here's one way:
  x <- data.frame(V=c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10))
  y <- data.frame(V=c(2,9,10))
  xy <- merge(x,y,all=FALSE)
Pay close attention to what happens if you have duplicate values in y, say
  y <- data.frame(V=c(2,9,10,10))
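A common alternative is to index with %in%, which keeps the rows of x whose V appears anywhere in the lookup vector and, unlike merge(), does not repeat rows when the lookup values contain duplicates. The sketch reuses the example values above.

```r
x <- data.frame(V = c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10))
keep <- c(2, 9, 10, 10)   # note the duplicated 10

# %in% subset: each row of x appears at most once
sub_in <- x[x$V %in% keep, , drop = FALSE]
nrow(sub_in)      # 8

# merge(): each duplicate in the lookup table duplicates the matching rows
sub_merge <- merge(x, data.frame(V = keep))
nrow(sub_merge)   # 13
```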


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 Bernhard Baumgartner
 Sent: Wednesday, February 08, 2006 9:22 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] dataframe subset

 I have a dataframe with a column, say x consisting of values, each
 value appearing different times, e.g.
 x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
 and a vector, including e.g.:
 y: 2,9,10,...
 I need a subset of the dataframe: all rows where x is equal to one of
 the values in y. Currently I use a loop for this, but because x and y
 are large this is very slow.
 Is there any idea how to solve this problem faster?
 Thank you,
 Bernhard



Re: [R] Interleaving elements of two vectors?

2006-03-07 Thread bogdan romocea
For a general solution without warnings, try
interleave <- function(v1,v2)
{
ord1 <- 2*seq_along(v1)-1
ord2 <- 2*seq_along(v2)
c(v1,v2)[order(c(ord1,ord2))]
}
interleave(rep(1,5),rep(3,8))


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Gabor
 Grothendieck
 Sent: Monday, March 06, 2006 12:12 AM
 To: Ajay Narottam Shah
 Cc: R-help
 Subject: Re: [R] Interleaving elements of two vectors?

 Try this (note that  your x and y do not have the same length
 and in this case the expression will recycle the shorter one
 and give a warning):

 z <- c(rbind(x, y))


 On 3/5/06, Ajay Narottam Shah [EMAIL PROTECTED] wrote:
  Suppose one has
 
 x <- c(1,  2,  7,  9,  14)
 y <- c(71, 72, 77)
 
  How would one write an R function which alternates between
 elements of
  one vector and the next? In other words, one wants
 
 z <- c(x[1], y[1], x[2], y[2], x[3], y[3], x[4], y[4], x[5], y[5])
 
  I couldn't think of a clever and general way to write this.
 I am aware
  of gdata::interleave() but it deals with interleaving rows of a data
  frame, not elems of vectors.
 
  --
  Ajay Shah
 http://www.mayin.org/ajayshah
  [EMAIL PROTECTED]
 http://ajayshahblog.blogspot.com
  *(:-? - wizard who doesn't know the answer.
 


Re: [R] \r with RSQLite

2006-03-15 Thread bogdan romocea
\r is the carriage-return character; Windows, for example, ends text
lines with \r\n. My guess is that RSQLite writes your data frame to a
temporary file with such line endings and then has SQLite import it;
if the import splits lines on \n only, the \r stays attached to the
last column, which matches your output (only b is affected). I have no
idea if that's really the case, though. Check the documentation or ask
the maintainer.
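Whatever the cause, a workaround on the R side is to strip a trailing carriage return from the character columns after reading. strip_cr is a hypothetical helper, and it turns factor columns into character.

```r
# strip_cr() is a hypothetical helper: it removes one trailing "\r"
# from every character or factor column (factors come back as character).
strip_cr <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col)) col <- as.character(col)
    if (is.character(col)) df[[nm]] <- sub("\r$", "", col)
  }
  df
}

# Simulated result of the query in the quoted message
df2 <- data.frame(a = 1:3, b = c("A\r", "B\r", "C\r"), stringsAsFactors = FALSE)
clean <- strip_cr(df2)
clean$b
```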


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Mikkel Grum
 Sent: Wednesday, March 15, 2006 1:46 PM
 To: r-help@stat.math.ethz.ch
 Cc: [EMAIL PROTECTED]
 Subject: [R] \r with RSQLite

 What am I doing wrong, or is the \r that I'm getting
 in the example below a bug?

  a <- (1:10)
  b <- (LETTERS[1:10])
  df <- as.data.frame(cbind(a, b))
 
  df
 a b
 1   1 A
 2   2 B
 3   3 C
 4   4 D
 5   5 E
 6   6 F
 7   7 G
 8   8 H
 9   9 I
 10 10 J
  library(RSQLite)
  drv <- dbDriver("SQLite")
  con <- dbConnect(drv, dbname = "Test")
  dbWriteTable(con, "DF", df, row.names = FALSE,
 overwrite = TRUE)
 [1] TRUE
  df2 <- dbGetQuery(con, "SELECT DISTINCT * FROM
 DF")
  dbDisconnect(con)
 [1] TRUE
  df2
 a   b
 1   1 A\r
 2   2 B\r
 3   3 C\r
 4   4 D\r
 5   5 E\r
 6   6 F\r
 7   7 G\r
 8   8 H\r
 9   9 I\r
 10 10 J\r

  sessionInfo()
 R version 2.2.1, 2005-12-20, i386-pc-mingw32

 attached base packages:
 [1] methods   stats graphics  grDevices
 utils datasets
 [7] base

 other attached packages:
  RSQLite  DBI
  0.4-1 0.1-10


 Mikkel Grum
 Genetic Diversity
 International Plant Genetic Resources Institute



Re: [R] renaming dataframe1 using column names from dataframe2?

2006-03-17 Thread bogdan romocea
?assign, but _don't_ use it; lists are better.
dfr <- list()
for(j in 1:9) {
  dfr[[as.character(j)]] <- ...
  }
Don't try to imitate the limited macro approach of other software
(e.g. SAS). You can do all that in R, but it's much simpler and much
safer to rely on list indexing and functions that return values
(rather than create objects).
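A small sketch of the list approach applied to both questions. The contents of the data frames and the descriptions data frame (with a column called name) are hypothetical stand-ins.

```r
# Hypothetical stand-in for the 'descriptions' data frame: one name per result
descriptions <- data.frame(name = paste0("d", 1:9), stringsAsFactors = FALSE)

# Build all nine data frames in one list instead of nine separate objects
dfr <- lapply(1:9, function(j) data.frame(iteration = j, value = j^2))
names(dfr) <- descriptions$name

dfr[["d3"]]          # one element, looked up by name
dfr$d3$value         # the same element via $
sapply(dfr, nrow)    # functions apply to every element at once

# The first question: store 'temp' under a name taken from 'descriptions'
temp <- data.frame(a = 1:2)          # hypothetical
results <- list()
results[[descriptions$name[1]]] <- temp
```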


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of r user
 Sent: Friday, March 17, 2006 10:26 AM
 To: rhelp
 Subject: [R] renaming dataframe1 using column names from
 dataframe2?

 I have a dataframe named temp, and another dataframe
 named descriptions.

 I wish to rename temp, and to call it the names of
 a certain column in the dataframe descriptions.

 Is there a good way to do this?

 A similar question:

 I am using a for loop to create several new
 dataframes.
 e.g.
 for(j in 1:9){…..

 I'd like each dataframe to be named d1, d2, d3, with
 the number being tied to the j (the iteration).

 Is this possible



Re: [R] create a gui with a button to change graphic?

2006-03-20 Thread bogdan romocea
Adapt the function below to suit your needs. If you really want to
plot 5 minutes at a time, round the time stamps down to the nearest
5-minute mark (MM:00 with MM in 5*0:11) and have idx below loop over
the resulting groups.

splitplot <- function(x,points)
{
boundaries <- unique(c(1,points*seq_len(floor(length(x)/points)),length(x)))
for (i in 2:length(boundaries)) {
idx <- boundaries[i-1]:boundaries[i]
plot(idx,x[idx],type="o") # here you may prefer time.of.x[idx] to idx
}
}
#examples
par(ask=TRUE) ; splitplot(rnorm(1000),350)
par(mfrow=c(3,1),ask=FALSE) ; splitplot(rnorm(1000),350)
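For the 5-minute version, cut() on a POSIXct vector with breaks = "5 mins" does the rounding, and split() gives one index vector per window. A sketch under assumptions: the 1 Hz sampling rate, the start time and the signal are all made up; a real EEG has a much higher rate, which changes nothing in the code.

```r
# Hypothetical 20-minute recording sampled once per second
t0 <- as.POSIXct("2006-03-20 00:00:00", tz = "UTC")
times <- t0 + 0:(20 * 60 - 1)          # POSIXct plus seconds
signal <- rnorm(length(times))

# cut() labels each time stamp with its 5-minute block;
# split() collects the sample indices belonging to each block
windows <- split(seq_along(times), cut(times, breaks = "5 mins"))
sapply(windows, length)                 # 300 samples in each of 4 windows

# Step through the windows one plot at a time (press Enter for the next one)
if (interactive()) {
  op <- par(ask = TRUE)
  for (idx in windows) plot(times[idx], signal[idx], type = "l")
  par(op)
}
```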



 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Gael de Lannoy
 Sent: Monday, March 20, 2006 7:40 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] create a gui with a button to change graphic?

 Hello everybody,

 I am wondering if it is possible to create a gui to plot a
 time series
 that is very big, it's an EEG signal of 20mins. What I would
 like to do
 is plot the first 5mins, then have a button on the gui that plots the
 next 5mins when pushed.

 Is it possible?

 Thanks in advance !

 Gael.



Re: [R] Multivariate linear regression

2006-04-06 Thread bogdan romocea
Apparently you do not understand the point, and seem to (want to) see
patterns all over the place. A good start for the treatment of this
interesting disease is 'Fooled by Randomness' by Nassim Nicholas
Taleb. The main point of the book is that many things may be a lot
more random than one might care to imagine or believe. (Ramsey theory
is misleading and of no help here, given its biased premise that
complete disorder is impossible (T. S. Motzkin, Wikipedia).)
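The point about random regressors made by Bert Gunter in the quoted thread can be checked with a small simulation: with as many coefficients as observations, pure noise predicts pure noise "perfectly" in sample and not at all on new data. The sizes and seed are arbitrary choices for illustration.

```r
set.seed(1)
n <- 30
y <- rnorm(n)                               # response: pure noise
X <- matrix(rnorm(n * (n - 1)), nrow = n)   # 29 predictors unrelated to y

# In-sample R^2 computed directly from the residuals
r2 <- function(fit, y) 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)

fit_small <- lm(y ~ X[, 1:5])   # 5 random predictors
fit_full  <- lm(y ~ X)          # 29 predictors + intercept = 30 coefficients
r2(fit_small, y)
r2(fit_full, y)                 # 1 up to rounding: the data at hand are reproduced exactly

# New data from the same (null) process: the "perfect" fit does not carry over
y_new <- rnorm(n)
X_new <- matrix(rnorm(n * (n - 1)), nrow = n)
pred  <- drop(cbind(1, X_new) %*% coef(fit_full))
cor(pred, y_new)
```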


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Nagu
 Sent: Wednesday, April 05, 2006 8:09 PM
 To: Berton Gunter
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Multivariate linear regression

 Hi Bert,

 Thank you for your prompt reply.

 I understand your point.

 But randomness is just a matter of scale of the object (Ramsey
 Theory) . The X matrix does not explain the complete variation in Y
 due to a large noise in X or simply the mapping f: X-Y is many valued
 (or due to other finite number of reasons). Theoretically inverse does
 not exist for many valued functions. In regression type problems, we
 are evaluating the pseudoinverse of data space.

 To estimate the inverses of many valued functions, theoretically, we
 may have to use branch cuts method or something called Riemann
 surfaces, they are partition of the domain of connected sheets.

 As I am not a qualified statistician or have a good experience in
 building statistical models for highly noisy data, I am wondering how
 did you deal with such situations, if any exist, in your working
 experience?

 I will try your idea of feeding some random variables as
 predictors in X.

 Thank you again,
 Nagu

 P.S. Why is that pattern recognition is all about finding patterns
 that can not be seen easily, huh?

 On 4/5/06, Berton Gunter [EMAIL PROTECTED] wrote:
  Ummm...
 
  If y is unrelated to x, then why would one expect any
 reasonable method to
  show a greater or lesser relationship than any other? It's
 all random. Of
  course, put enough random regressors into/tune the
 parameters enough of
  any regression methodology and you'll be able to precisely
 predict the data
  at hand -- but **only** the data at hand. I should note
 that such work
  apparently frequently appears in various sorts of
 informatics/data
  mining/omics/etc. journals these days, as various papers
 demonstrating
  the irreproducibility of numerous purported discoveries
 have infamously
  demonstrated. Let us not forget Occam!
 
  Just being cranky ...
 
  -- Bert Gunter
 
 
   -Original Message-
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED] On Behalf Of Nagu
   Sent: Wednesday, April 05, 2006 3:52 PM
   To: r-help@stat.math.ethz.ch
   Subject: [R] Multivariate linear regression
  
   Hi,
  
   I am working on a multivariate linear regression of the
 form y = Ax.
  
   I am seeing a great dispersion of y w.r.t x. For example, the
   correlations between y and x are very small, even after using some
   typical transformations like log, power.
  
   I tried with simple linear regression, robust regression
 and ace and
   avas package in R (or splus). I didn't see an improvement
 in the fit
   and predictions over simple linear regression. (I also
 tried this with
   transformed variables)
  
   I am sure that some of you came across such data. How did you
   deal with it?
  
   Linear regressions are good for the data like y = x +
   0.01Normal(mu,sigma2) i.e. a small noise (data observed
 in a lab). But
   linear regressions are bad for large noise, like typical
 market (or
   survey) data.
  
   Thank you,
   Nagu
  


Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)

2006-04-06 Thread bogdan romocea
There are several kinds of standardization, and 'normalization' is
only one of them. For some details you could check
http://support.sas.com/91doc/getDoc/statug.hlp/stdize_index.htm
(see Details for standardization methods).

Standardization is required prior to clustering to control for the
impact of scale. (Variables with large variances tend to have more
effect on the resulting clusters than those with small variances.) I
don't know how valuable standardization may be in other areas.
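In R the common variants are one-liners. The sketch shows three of them on made-up numbers: the classical z-score (what scale() does), range scaling onto [0, 1], and a robust median/MAD version. Their names correspond loosely to the METHOD= options on the SAS page above; note that R's mad() includes the 1.4826 consistency constant by default.

```r
x <- c(2, 4, 4, 5, 7, 9, 40)   # hypothetical variable with one large value

z_std   <- (x - mean(x)) / sd(x)             # mean 0, sd 1; same as scale(x)
z_range <- (x - min(x)) / (max(x) - min(x))  # rescaled onto [0, 1]
z_mad   <- (x - median(x)) / mad(x)          # robust: median 0, MAD 1

# For clustering, scale() standardizes every column of a matrix at once
m <- cbind(income = c(30000, 45000, 52000, 61000), age = c(25, 38, 41, 57))
m_std <- scale(m)
```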


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of roger bos
 Sent: Thursday, April 06, 2006 1:15 PM
 To: Berton Gunter; Liaw, Andy
 Cc: rhelp
 Subject: Re: [R] pros and cons of robust regression? (i.e.
 rlm vs lm)

 I'm asking this question purely for my own benefit, not to
 try to correct
 anyone.  The procedure you refer to as normalization I have
 always heard
 referred to as standardization.  Is the former the proper
 term?  Also, you
 say it's not necessary given today's hardware, but isn't it
 beneficial to get
 all the variables in a similar range?  Is there any other
 transformation that
 you would suggest?  I use rlm (and normalization) in my
 models I use every
 day, so I was happy to read the above comments.

 Thanks,

 Roger



 On 4/6/06, Berton Gunter [EMAIL PROTECTED] wrote:
 
  Thanks, Andy. Well said. Excellent points. The final
 weights from rlm
  serve
  this diagnostic purpose, of course.
 
  -- Bert
 
 
   -Original Message-
   From: Liaw, Andy [mailto:[EMAIL PROTECTED]
   Sent: Thursday, April 06, 2006 9:56 AM
   To: 'Berton Gunter'; 'r user'; 'rhelp'
   Subject: RE: [R] pros and cons of robust regression? (i.e.
   rlm vs lm)
  
   To add to Bert's comments:
  
   -  Normalizing data (e.g., subtracting mean and dividing by
   SD) can help
   numerical stability of the computation, but that's mostly
   unnecessary with
   modern hardware.  As Bert said, that has nothing to do with
   robustness.
  
   -  Instead of _replacing_ lm() with rlm() or other robust
   procedure, I'd do
   both of them.  Some scientists view robust procedures that
   omit some data
   points (e.g., by assigning basically 0 weight to them) in
   automatic fashion
   and just trust the result as bad science, and I think they
   have a point.
   Use of robust procedure does not free one from examining the
   data carefully
   and looking at diagnostics.  Careful treatment of outliers is
    especially
   important, I think, for data coming from a confirmatory
   experiment.  If the
   conclusion you draw depends on downweighting or omitting
 certain data
   points, you ought to have very good reason for doing so.  I
   think it can not
   be over-emphasized how important it is not to take outlier
   deletion lightly.
   I've seen many cases that what seems like outlier originally
   turned out to
    be legitimate data, and omission of them just led to overly
   optimistic
   assessment of variability.
  
   Andy
  
   From: Berton Gunter
   
There is a **Huge** literature on robust regression,
including many books that you can search on at e.g. Amazon. I
think it fair to say that we have known since at least the
1970's that practically any robust downweighting procedure
(see, e.g M-estimation) is preferable (more efficient,
better continuity properties, better estimates) to trimming
outliers defined by arbitrary thresholds. An excellent but
now probably dated introductory discussion can be found in
UNDERSTANDING ROBUST AND EXPLORATORY DATA ANALYSIS edited
by Hoaglin, Tukey, Mosteller, et. al.
   
The rub in all this is that nice small sample inference
results go out the window, though bootstrapping can help with
this. Nevertheless, for a variety of reasons, my
recommendation is simply to **never** use lm and **always**
use rlm (with maybe a few minor caveats). Many would disagree
with this, however.
   
I don't think normalizing data as it's conventionally used
has anything to do with robust regression, btw.
   
-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
   
The business of the statistician is to catalyze the
scientific learning process.  - George E. P. Box
   
   
   
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of r user
 Sent: Thursday, April 06, 2006 8:51 AM
 To: rhelp
 Subject: [R] pros and cons of robust regression? (i.e.
   rlm vs lm)

 Can anyone comment or point me to a discussion of the
 pros and cons of robust regressions, vs. a more
 manual approach to trimming outliers and/or
 normalizing data used in regression analysis?
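A small illustration of the points made in the replies quoted above: rlm() downweights an outlier instead of deleting it, and its final weights (component w) show which points were downweighted. The data are simulated, with one outlier planted deliberately.

```r
library(MASS)   # rlm() is in MASS, a recommended package that ships with R

set.seed(42)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.5)   # hypothetical data, true slope 0.5
y[20] <- 40                             # one gross outlier at a high-leverage point

fit_ls  <- lm(y ~ x)
fit_rob <- rlm(y ~ x)                   # Huber M-estimation (the default psi)

coef(fit_ls)["x"]       # pulled upwards by the outlier
coef(fit_rob)["x"]      # much closer to 0.5
which.min(fit_rob$w)    # 20: the final IRLS weights flag the outlier
```

As the replies stress, the weights are a diagnostic to look at, not a licence to stop looking at the data.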


Re: [R] I am surprised (and a little irritated)

2006-04-19 Thread bogdan romocea
Installing R on SuSE 10.0 may be less than trivial for a beginner (I
ended up compiling GCC plus 3-4 other things). In case you lose your
patience I'd suggest trying Mepis Linux: it's very easy to install and
the package management GUI (Synaptic) is great. Installing R together
with a bunch of R packages, courtesy of the Debian folks, is a breeze.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Tom
 Backer Johnsen
 Sent: Wednesday, April 19, 2006 3:05 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] I am surprised (and a little irritated)

 I have started with using R on Windows, and I am really happy about
 the system.  Now, one of my other ambitions is to learn how to use
 Linux, so yesterday I downloaded OpenSuse and installed that.  The
 next problem was to try to use R with Linux.  And there I met the
 wall.  I've understood that RPM's are somewhat like installing
 programs on Windows, so that was downloaded and started with YAST.

 And got some error messages about missing stuff.  The first reactions
 is surprise -- there must be an error in the installation procedure.
 I have never (well, almost) met an installation procedure on Windows
 that did not include everything needed.  And the installation of R on
 Windows was very smooth.  Then I discover to my big surprise that the
 readme file says that I need to have eight installed packages.  Then
 it says Most of them are included in a standard install.  Sigh.
 Then the problem next is to find out which of the eight I
 already have
 and which ones I need to locate somewhere.  Where can I find them I
 wonder.  Somewhere on the net?  And that is how far I got today.

 So, one of the complaints I have is that the instructions for
 installing R on Linux are very cryptic, and to a large extent assume
 that you already know Linux.  Which I do not.  And I expect
 instructions on installing should be simple and clear.  But I am a
 very experienced computer user, so I really expect to be able to
 understand instructions.  I cannot expect my students to
 manage what I
 cannot manage myself, so Linux is out, or at least Suse Linux.  And
 that is a pity, for a number of reasons.

 The second is just as much surprise at the installation procedure.
 Under Windows there are any number of installers which make it easy
 for a programmer to put together all the files needed and place them
 in the right place.

 And someone should get the OpenSuse people to include R in the
 installation.

 Tom

 ++
 | Tom Backer Johnsen, Psychometrics Unit,  Faculty of Psychology |
 | University of Bergen, Christies gt. 12, N-5015 Bergen,  NORWAY |
 | Tel : +47-5558-9185Fax : +47-5558-9879 |
 | Email : [EMAIL PROTECTED]URL : http://www.galton.uib.no/ |
 ++



RE: [R] Aggregating data (with more than one function)

2005-04-21 Thread bogdan romocea
I am looking for an answer to a similar question - a generalized
solution that would be able to apply
   (1) any number of functions
   (2) to any number of vectors
   (3) by any number of factors 
(just like SQL's group by). 
The output data frame must contain the values of the by factors, to be
used for joins.

Aggregate() does (2) and (3). The solutions posted to this thread
(split+sapply, by, tapply) do (1) and (3) (or so it seems to me). What
would be the best way to get to (1)+(2)+(3)?

I am inclined to use aggregate() in a loop with
eval(parse(text="aggregate expression here")).
Running
groupby <- do.call(rbind, by(var_i, list(a,b,c,d,e,f),
function(x) c(fct1(x),fct2(x),fct3(x),fct4(x))))
in a loop (var_1, var_2, etc.) would be very nice, but I don't know how
to add a-f as columns in the output data frame.

Thank you,
b.
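One way to get (1)+(2)+(3) without eval(parse()) is to let aggregate() call a function that returns a named vector; aggregate() then stores each variable's results as a matrix column, and do.call(data.frame, ...) flattens them, keeping the grouping columns for joins. group_summary is a hypothetical helper, and the salaries are a subset of the quoted example.

```r
# group_summary() is a hypothetical helper; 'funs' must be a named list.
group_summary <- function(data, vars, by, funs) {
  res <- aggregate(data[vars], by = data[by],
                   FUN = function(v) sapply(funs, function(f) f(v)))
  # flatten matrix columns into Salary.sum, Salary.mean, ...
  do.call(data.frame, res)
}

staff <- data.frame(
  Department = c("IT", "HR", "Finance", "IT", "Sales", "Sales",
                 "HR", "HR", "Sales", "Finance", "IT"),
  Salary = c(56000, 54223, 82000, 67000, 76000, 65778,
             56778, 78999, 45667, 89777, 56786),
  stringsAsFactors = FALSE
)
out <- group_summary(staff, vars = "Salary", by = "Department",
                     funs = list(sum = sum, mean = mean))
out
```

More variables or grouping columns are just longer vars and by vectors.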



On Mon, 2005-03-28 at 19:15 -0600, Sivakumaran Raman wrote:
 I have the data similar to the following in a data frame:
      LastName   Department  Salary
 1    Johnson    IT          56000
 2    James      HR          54223
 3    Howe       Finance     8
 4    Jones      Finance     82000
 5    Norwood    IT          67000
 6    Benson     Sales       76000
 7    Smith      Sales       65778
 8    Baker      HR          56778
 9    Dempsey    HR          78999
 10   Nolan      Sales       45667
 11   Garth      Finance     89777
 12   Jameson    IT          56786
 12  JamesonIT  56786
 
 I want to calculate both the mean salary broken down by Department and 
 also the
 total amount paid out per department i.e. I want both sum(Salary) and
 mean(Salary) for each Department. Right now, I am using aggregate.data.frame
 twice, creating two data frames, and then combining them using data.frame.
 However, this seems to be very memory and processor intensive and is 
 taking a
 very long time on my data set. Is there a quicker way to do this?
 
 Thanks in advance,
 Siv Raman
 

