Re: [R] [R-SIG-Finance] Method dispatch in functions?
Look at the UseMethod function. The help for print, a heavily overloaded function, can also help.

Regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com

On Thu, 2007-06-28 at 09:05 -0700, John McHenry wrote:

Hi,

Could someone point me in the right direction for documentation on the following question?

Let's say I have two objects a and b of classes A and B, respectively. Now let's say I write a function foo that does something similar to objects of type A and B. Basically I want to overload the function, in C++ talk: if I give foo an object of type A, something (and this is my question) dispatches the call to, say, foo.A, and if I give foo an object of type B, something dispatches the call to, say, foo.B.

I want to write foo.A and foo.B. How do I perform the method dispatch? From what I understand there are two ways in R: S3 and S4. What is the simple S3 way?

Thanks!
Jack.

___
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only.
-- If you want to post, subscribe first.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
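For readers following the thread, here is a minimal sketch of the S3 route that UseMethod provides. The classes A and B and the function foo come from the question; the `value` field, the printed strings and the default method are assumptions made only for illustration.

```r
# Generic: UseMethod() inspects class(x) and calls foo.<class>.
foo <- function(x, ...) UseMethod("foo")

# Methods for the classes A and B from the question.
foo.A <- function(x, ...) paste("foo.A called with value", x$value)
foo.B <- function(x, ...) paste("foo.B called with value", x$value)

# Fallback used when no class-specific method exists (illustrative only).
foo.default <- function(x, ...) stop("no foo method for class ", class(x)[1])

# S3 objects are ordinary R objects carrying a class attribute.
a <- structure(list(value = 1), class = "A")
b <- structure(list(value = 2), class = "B")

res_a <- foo(a)   # dispatched to foo.A
res_b <- foo(b)   # dispatched to foo.B
```

Nothing has to be registered by hand in an interactive session: naming the functions foo.A and foo.B is what makes the dispatch work.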
Re: [R] sobre tutorial
Hello,

There are a number of resources (in Spanish) at the bottom of the page

http://es.wikipedia.org/wiki/Lenguaje_de_programación_R

Regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com

On Tue, 2007-05-15 at 18:49 +0200, Danilo Ceschin wrote:

Dear Sir, I would be grateful if you could send me the tutorial in Spanish for R. I am taking my first steps with this application. Many thanks in advance.

Danilo Ceschin Ph.D
IGBMC
1 rue Laurent Fries
67404 ILLKIRCH CEDEX - FRANCE
tel 33 3 88 65 3457
email [EMAIL PROTECTED]
Re: [R] R and SAS proc format
On 3/7/07, Peter Dalgaard [EMAIL PROTECTED] wrote:

Jason Barnhart wrote:

- Original Message -
From: John Kane [EMAIL PROTECTED]
To: lamack lamack [EMAIL PROTECTED]; R-help@stat.math.ethz.ch
Sent: Tuesday, March 06, 2007 2:13 PM
Subject: Re: [R] R and SAS proc format

--- lamack lamack [EMAIL PROTECTED] wrote:

Dear all, Is there an R equivalent to SAS's proc format?

What does the SAS PROC FORMAT do?

It formats or reformats data in the SAS system.

Slightly more precisely: It creates user-defined formats, which are subsequently associated with variables and used for reading, printing, tabulating, and analyzing data. It is akin to R's factor() constructions, but not quite. For one thing, SAS's formats are separate entities - the same format can be used for many variables, whereas R's factors have the formatting coded as a part of the object. For related reasons, a variable in SAS can have more distinct values than there are value labels for, etc.

It looks like this:

  proc format;
    value kanefmt 1='A' 2='B' 3='C' 4='X' 5='Throw me out';
  data temp;
    do i=1 to 10;
      kanevar=put(i,kanefmt.);
      output;
    end;
  proc print;
  run;

And produces this:

  Obs    i    kanevar
    1    1    A
    2    2    B
    3    3    C
    4    4    X
    5    5    Throw me out
    6    6    6
    7    7    7
    8    8    8
    9    9    9
   10   10    10

But it is more robust than what is shown here.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907

Also, SAS formats are used as a (somewhat cumbersome) replacement for dictionary data structures. Starting from SAS 9.1 (I believe), hash tables can be used within data steps for the same purpose (albeit still cumbersome).

In this regard, not only factors but also lists could be a replacement for them in R. They can be used as a way to get key-value mappings. These key-value mappings (I mean, this kind of data structure) are very handy tools. I have used both factors and lists as ad hoc replacements for these data structures.

Has nobody considered the possibility of having these data structures implemented in R with a more Python-like or Java-like look and feel?

Regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com
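As a rough illustration of the key-value idea discussed in this message, the sketch below mimics the kanefmt example with a named character vector, and shows an environment used as a small hash table. The object names (kanefmt, temp, dict) are borrowed from the SAS example or made up here; this is one possible idiom, not a full equivalent of PROC FORMAT.

```r
# Lookup table playing the role of the SAS format 'kanefmt':
# names are the keys, elements are the labels.
kanefmt <- c("1" = "A", "2" = "B", "3" = "C", "4" = "X", "5" = "Throw me out")

i <- 1:10
key <- as.character(i)

# Keys without a label fall back to the raw value, as in the SAS output.
kanevar <- ifelse(key %in% names(kanefmt), kanefmt[key], key)
temp <- data.frame(i = i, kanevar = kanevar, stringsAsFactors = FALSE)

# An environment can serve as a mutable hash table for larger dictionaries.
dict <- new.env(hash = TRUE)
assign("4", "X", envir = dict)
label_4 <- get("4", envir = dict)
has_6 <- exists("6", envir = dict, inherits = FALSE)
```

Because the lookup table is a separate object, the same kanefmt can be applied to any number of variables, which is the property of SAS formats that factors lack.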
[R] Error loading a dependency in a package: missing namespace?
Dear r-helpers,

I am building a package that depends on some others. I recently added a new dependency: package outliers. But it does not work any more. Let me show some information below:

  [EMAIL PROTECTED]:pcrAnalysis$ cat DESCRIPTION
  Package: pcrAnalysis
  Type: Package
  Title: pcrAnalysis
  Version: 0.7.2
  Date: 2007-02-27
  Depends: Biobase, methods, outliers
  Author: Carlos J. Gil Bellosta [EMAIL PROTECTED]
  Maintainer: Carlos J. Gil Bellosta [EMAIL PROTECTED]
  Description: Package for the analysis of Taqman experiments
  License: TBA

  [EMAIL PROTECTED]:pcrAnalysis$ cat NAMESPACE
  import(methods, Biobase, outliers)
  exportPattern("^tqmn")
  exportClasses(pcrExprSet)
  exportMethods(task, "task<-", phenoData.sort)

But now, loading the package fails. If I try to run

  [EMAIL PROTECTED]:tmp$ R CMD check pcrAnalysis

I get the following log:

  * checking for working latex ... OK
  * using log directory '/tmp/pcrAnalysis.Rcheck'
  * using R version 2.4.1 (2006-12-18)
  * checking for file 'pcrAnalysis/DESCRIPTION' ... OK
  * checking extension type ... Package
  * this is package 'pcrAnalysis' version '0.7.2'
  * checking package dependencies ... OK
  * checking if this is a source package ... OK
  * checking whether package 'pcrAnalysis' can be installed ... OK
  * checking package directory ... OK
  * checking for portable file names ... OK
  * checking for sufficient/correct file permissions ... OK
  * checking DESCRIPTION meta-information ... OK
  * checking top-level files ... OK
  * checking index information ... OK
  * checking package subdirectories ... OK
  * checking whether the package can be loaded ... ERROR
  Loading required package: Biobase
  Loading required package: tools
  Welcome to Bioconductor
  Vignettes contain introductory material. To view, type 'openVignette()' or start with 'help(Biobase)'. For details on reading vignettes, see the openVignette help page.
  Loading required package: outliers
  Error in loadNamespace(package, c(which.lib.loc, lib.loc), keep.source = keep.source) :
    in 'pcrAnalysis' classes for export not defined: pcrExprSet
  In addition: Warning message:
  package 'pcrAnalysis' contains no R code in: loadNamespace(package, c(which.lib.loc, lib.loc), keep.source = keep.source)
  Error: package/namespace load failed for 'pcrAnalysis'
  Execution halted

It seems that the error has something to do with namespaces. The thing is that package outliers does not have a NAMESPACE file. Could this be an issue?

I have contacted the author of the package and he says that outliers has been used in another package, quantchem (also on CRAN). However, quantchem does not have a NAMESPACE file either.

I have been looking for information on how the loadNamespace function works, and have even looked at its code. But can anybody give me a clue? Would the outliers package require a NAMESPACE file?

By the way, the author of the package has been quite helpful, but he feels that the lack of this file may not be causing the problem.

I am using R version 2.4.1 (2006-12-18) on an Ubuntu Edgy (6.10) box.

Regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com
Re: [R] read.csv issue
Dear Harold,

One thing you can do is to read the file plainly, even if the \ is lost, and then, inside R, change the string value with gsub.

Sincerely,
Carlos J. Gil Bellosta
http://www.datanalytics.com
http://www.data-mining-blog.com

On Wed, 16-08-2006 at 14:43 -0400, Doran, Harold wrote:

I'm trying to read in some data from a .csv format and have come across the following issue. Here is a simple example for replication:

  # A sample .csv format
  schid,sch_name
  331-802-7081,School One
  464-551-7357,School Two
  388-517-7627,School Three \& Four
  388-517-4394,School Five

Note the third line includes the \ character. However, when I read the data in I get

  read.csv(file.choose())
           schid            sch_name
  1 331-802-7081          School One
  2 464-551-7357          School Two
  3 388-517-7627 School Three & Four
  4 388-517-4394         School Five

It turns out to be very important to read in this character, as I have a program that loops through a data set and Sweaves about 30,000 files. The variable sch_name gets dropped into the tex file using \Sexpr{tmp$sch_name}. However, if there is an &, the LaTeX file won't compile properly. So, what I need is for the data to be read in as

           schid             sch_name
  1 331-802-7081           School One
  2 464-551-7357           School Two
  3 388-517-7627 School Three \& Four
  4 388-517-4394          School Five

I am obligated by a client to include the & in the school name, so eliminating that isn't an option. I thought maybe comment.char or quote would be what I needed, but they didn't resolve the issue. I'm certain I'm missing something simple, I just can't see it. Any thoughts?

Harold
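A minimal sketch of the gsub suggestion follows. It assumes the character that has to be escaped for LaTeX is an ampersand and that the data arrive in R with the backslash already lost; the vector sch_name stands in for the column read from the file.

```r
# Values as read.csv() might return them, with the backslash lost
# (assumption: the special character is an ampersand).
sch_name <- c("School One", "School Two", "School Three & Four", "School Five")

# fixed = TRUE treats pattern and replacement literally, so "\\&"
# (one backslash followed by &) is inserted as-is.
sch_name_tex <- gsub("&", "\\&", sch_name, fixed = TRUE)

# cat() shows the actual characters rather than R's escaped representation.
cat(sch_name_tex, sep = "\n")
```

Printing the vector with print() would show two backslashes, but that is only R's display of a single backslash character; the string handed to \Sexpr contains one.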
Re: [R] the first and last case
Dear Jacques,

I believe you need dat ordered by ind and y before you apply your solution, right?

Sincerely,
Carlos J. Gil Bellosta
http://www.datanalytics.com
http://www.data-mining-blog.com

Quoting Jacques VESLOT [EMAIL PROTECTED]:

  do.call(rbind, lapply(split(dat, dat$ind), function(x) x[c(1,nrow(x)),]))

---
Jacques VESLOT
CNRS UMR 8090
I.B.L (2ème étage)
1 rue du Professeur Calmette
B.P. 245
59019 Lille Cedex
Tel : 33 (0)3.20.87.10.44
Fax : 33 (0)3.20.87.10.31
http://www-good.ibl.fr
---

Mauricio Cardeal wrote:

Hi all

Some time ago I asked for a solution about how to aggregate data and the help was wonderful. Now, I'd like to know how to extract, for each individual case below, the first and the last observation to obtain this:

  ind  y
    1  8
    1  9
    2  7
    2 11
    3  9
    3 10
    4  8
    4  5

  # Below the example:
  ind <- c(1,1,1,2,2,3,3,3,4,4,4,4)
  y <- c(8,10,9,7,11,9,9,10,8,7,6,5)
  dat <- as.data.frame(cbind(ind,y))
  dat
  attach(dat)
  mean.ind <- aggregate(dat$y, by=list(dat$ind), mean)
  mean.ind

Thanks
Mauricio
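To make the exchange concrete, here is the quoted one-liner applied to the example data from the question. Only the names res and the use of data.frame() instead of cbind() are introduced here.

```r
# Example data from the question.
ind <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4)
y   <- c(8, 10, 9, 7, 11, 9, 9, 10, 8, 7, 6, 5)
dat <- data.frame(ind = ind, y = y)

# Split by individual, keep the first and last row of each piece, stack again.
res <- do.call(rbind,
               lapply(split(dat, dat$ind), function(x) x[c(1, nrow(x)), ]))
```

The result depends on the order of the rows within each individual, so sorting dat by y beforehand would return the smallest and largest y instead of the first and last recorded ones. A group with a single row is returned twice by this expression.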
Re: [R] random section of samples based on group membership
Dear Wade,

Say that your groups are

  groups <- sort(sample(1:10, 100, replace = TRUE))

Create a dummy

  rows <- 1:length(groups)

Then

  tapply(rows, groups, function(x) sample(x, 1))

does the trick to select the row numbers you need for your sampling.

Sincerely,
Carlos J. Gil Bellosta
http://www.datanalytics.com
http://www.data-mining-blog.com

Quoting Wade Wall [EMAIL PROTECTED]:

Hi all,

I have a matrix of 474 rows (samples) with 565 columns (variables). Each of the 474 samples belongs to one of 120 groups, with the groupings as a column in the above matrix. For example, the group column would be:

  1 1 1 2 2 2 . . . 120 120

I want to randomly select one sample from each group. Not all the groups have the same number of samples; some have 4, some 3, etc. Is there a function to do this, or would I need to write a looping statement to look at each successive group? I basically want to combine the randomly selected samples from the 120 groups into a new matrix in order to perform a cluster analysis.

Thanks,
Wade
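The sketch below carries the tapply idea through to the final matrix on a small made-up data set (4 groups, 3 variables; the names dat, pick and selected are illustrative). It indexes with sample.int() rather than calling sample(x, 1) directly, because sample() treats a single positive number x as the range 1:x, which would matter for a group containing only one row.

```r
set.seed(1)

# Toy stand-in for the 474 x 565 matrix: a group column plus three variables.
groups <- c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4)   # group 3 has a single row
dat <- cbind(group = groups, matrix(rnorm(10 * 3), nrow = 10))

rows <- seq_along(groups)

# One random row number per group; safe even when a group has one row.
pick <- tapply(rows, groups, function(x) x[sample.int(length(x), 1)])

# New matrix with one randomly chosen sample per group.
selected <- dat[as.vector(pick), , drop = FALSE]
```

The matrix selected can then be passed on to the clustering step.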
Re: [R] grouping by consecutive integers
Dear Kevin,

Try something like

  groups <- cut(seq_along(tmp), c(-Inf, which(diff(tmp) > 1) + 0.5, Inf))

which cuts the positions of tmp wherever two neighbouring values differ by more than 1.

Sincerely,
Carlos J. Gil Bellosta
http://www.datanalytics.com
http://www.data-mining-blog.com

Quoting Kevin J Emerson [EMAIL PROTECTED]:

Hello R-helpers!

I have a question concerning extracting sequence information from a vector. I have a vector (representing the bins of a time series where the frequency of occurrences is greater than some threshold) where I would like to extract the min, median and max of each group of consecutive numbers.

For example:

  tmp <- c(24,25,29,35,36,37,38,39,40,41,42,43,44,45,46,47,68,69,70,71)

I would like to have the max, min and median of the following groups:

  24,25
  29
  35,36,37,38,39,40,41,42,43,44,45,46,47
  68,69,70,71

I would like to be able to perform this for many time series, so an automated process would be nice. I am hoping to use this as a peak detection protocol.

Any advice would be greatly appreciated,
Kevin

--
Kevin J Emerson
Center for Ecology and Evolutionary Biology
1210 University of Oregon
Eugene, OR 97403 USA
[EMAIL PROTECTED]
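One way to go from the grouping to the requested summaries is sketched below. It labels the runs with cumsum() over the jump indicator, which is a swapped-in alternative to cut(), and then summarises each run; the names run_id and stats are made up for the example.

```r
tmp <- c(24, 25, 29, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
         68, 69, 70, 71)

# A new run starts wherever the step to the next value exceeds 1.
run_id <- cumsum(c(1, diff(tmp) > 1))

# min, median and max for each run of consecutive integers.
stats <- t(sapply(split(tmp, run_id),
                  function(x) c(min = min(x), median = median(x), max = max(x))))
```

Wrapping these two steps in a function of tmp would allow the same summary to be produced for many time series.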
Re: [R] ifelse command
Dear Xin,

Although I have no idea what your function does, I believe it would be formally correct in the following way:

  foo <- function (parameters,y,x1,x2) {
    p <- parameters[1]
    alpha1 <- parameters[2]
    beta1 <- parameters[3]
    delta1 <- parameters[4]
    alpha2 <- parameters[5]
    mu <- alpha1*((x1)^beta1)*exp(-delta1*(x1^alpha2))
    if(y>0 & x1>0 & x2==1) return( lgamma(y+p)+p*(log(p)-log(mu+p))+y*(log(mu)-log(mu+p))-lfactorial(y)-lgamma(p) )
    if(y>0 & x1>0 & x2==2) return( lgamma(y+p)+p*(log(p)-log(mu+p))+y*(log(mu)-log(mu+p))-lfactorial(y)-lgamma(p) )
    if(y>0 & x1>0 & x2==3) return( lgamma(y+p)+p*(log(p)-log(mu+p))+y*(log(mu)-log(mu+p))-lfactorial(y)-lgamma(p) )
    return( what then? )
  }

Please compare your code to mine. You will discover more than one error in your code. See also where the if() parenthesis closes. I do not believe your else's are necessary within a function. Note also that the code after your ifs is always the same.

Sincerely,
Carlos J. Gil Bellosta
http://www.datanalytics.com
http://www.data-mining-blog.com

On Sat, 22-07-2006 at 13:25 +0100, Xin wrote:

Dear all:

I am trying to revise the maximum likelihood function below to include some constraints. But something seems to be wrong with it, because R will not allow me to edit the function like this. I would very much appreciate your help.

  function (parameters,y,x1,x2)
  {
  p<-parameters[1]
  alpha1<-parameters[2]
  beta1<-parameters[3)]
  delta1<-parameters[4]
  alpha2<-parameters[5]
  mu<-alpha1*((x1)^beta1)*exp(-delta1*(x1^alpha2))
  if(y>0 & x1>0 & x2==1, L<-lgamma(y+p)+p*(log(p)-log(mu+p))+y*(log(mu)-log(mu+p))-lfactorial(y)-lgamma(p) )
  else if(y>0 & x1>0 & x2==2, L<-lgamma(y+p)+p*(log(p)-log(mu+p))+y*(log(mu)-log(mu+p))-lfactorial(y)-lgamma(p) )
  else if(y>0 & x1>0 & x2==3, L<-lgamma(y+p)+p*(log(p)-log(mu+p))+y*(log(mu)-log(mu+p))-lfactorial(y)-lgamma(p) )
  else L
  }

Thanks a lot!

Xin Shi
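Since the three branches compute the same expression, the function can be collapsed into one vectorised form with ifelse(), which is what the subject line asks about. The sketch below is only one possible reading: the name loglik_term is made up, and returning NA when the conditions fail is an assumption, because the thread leaves that case open ("what then?"). The expression itself equals the log of a negative binomial probability with size p and mean mu, which gives a convenient check against dnbinom().

```r
loglik_term <- function(parameters, y, x1, x2) {
  p      <- parameters[1]
  alpha1 <- parameters[2]
  beta1  <- parameters[3]
  delta1 <- parameters[4]
  alpha2 <- parameters[5]

  mu <- alpha1 * x1^beta1 * exp(-delta1 * x1^alpha2)

  # Common expression shared by the three branches in the thread.
  ll <- lgamma(y + p) + p * (log(p) - log(mu + p)) +
        y * (log(mu) - log(mu + p)) - lfactorial(y) - lgamma(p)

  # Assumption: observations failing the conditions contribute NA.
  ok <- y > 0 & x1 > 0 & x2 %in% 1:3
  ifelse(ok, ll, NA_real_)
}

# Hypothetical parameter values, chosen only to exercise the function.
pars <- c(2, 1.5, 0.8, 0.1, 1.2)
val  <- loglik_term(pars, y = 3, x1 = 2, x2 = 1)
mu0  <- 1.5 * 2^0.8 * exp(-0.1 * 2^1.2)
```

Because every operation is vectorised, y, x1 and x2 can be whole columns, and the likelihood is the sum of the returned terms.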
Re: [R] Automating package building packages and repository uploading
Dear Rusers,

Well, then it seems that the problem is that I am building Linux binary packages. Since they contain no compiled code, just R code, their contents should be (and in fact are) directly installable on Windows platforms, which is what I intend to do.

If I understand things right, I could just rebundle the packages using zip instead of tar | gzip and get rid of the arch string in the file name. Then they would work on Windows. And they actually do when I do this by hand.

But is there a less involved way to generate these binary Windows packages, with proper file names and compression method, directly from my Linux box?

Thank you very much,
Carlos J. Gil Bellosta
http://www.datanalytics.com
http://www.data-mining-blog.com

Quoting Prof Brian Ripley [EMAIL PROTECTED]:

On Thu, 20 Jul 2006, Carlos J. Gil Bellosta wrote:

  Dear Rusers, I have developed two packages for a client of mine. After new features are added or bugs corrected, I upload them to my own web repository. I create both source and binary versions.

binary Linux packages, it seems. The latter are .tar.gz with the arch as part of the name. .zip is used for Windows packages only. update.packages for Linux is designed to look for source packages only: see the 'type' argument. You can use the distro's packaging facilities for binary packages, and Dirk does for the Debian R distribution. I think those misconceptions explain your confusion.

  In fact, I made a script that checks, builds, and uploads them via ftp. However, I am facing two nuisances that make it difficult to automate:

  1) Even if I build the binary version with the command R CMD build --use-zip --binary $package within my script, the output package still gets tarballed and gzipped instead of simply zipped. I work around this by automatically extracting and recompressing the files but, am I missing some other option that would make all this simpler?

  2) I expect my packages to be named something like mypackage_1.3.12.tar.gz or mypackage_1.3.12.zip. However, sometimes --I haven't looked at the code that decides the name to give to the packages, so it looks quite random to me-- they get renamed into something like mypackage_1.3.12_R_i486-pc-linux-gnu.tar.gz or mypackage_1.3.12_R_i486-pc-linux-gnu.zip. The problem is that, then, the update.packages() function cannot find them. Is there a way to prevent this trailing string from appearing in the file name? Or else, is there a way to have update.packages() find the package regardless of it?

  I am running

    platform       i486-pc-linux-gnu
    arch           i486
    os             linux-gnu
    system         i486, linux-gnu
    status
    major          2
    minor          3.1
    year           2006
    month          06
    day            01
    svn rev        38247
    language       R
    version.string Version 2.3.1 (2006-06-01)

  on Debian Etch with kernel 2.6.15-1-k7.

  Thank you very much.
  Carlos J. Gil Bellosta
  http://www.datanalytics.com
  http://www.data-mining-blog.com

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] Automating package building packages and repository uploading
Dear Rusers,

I have developed two packages for a client of mine. After new features are added or bugs corrected, I upload them to my own web repository. I create both source and binary versions. In fact, I made a script that checks, builds, and uploads them via ftp. However, I am facing two nuisances that make it difficult to automate:

1) Even if I build the binary version with the command R CMD build --use-zip --binary $package within my script, the output package still gets tarballed and gzipped instead of simply zipped. I work around this by automatically extracting and recompressing the files but, am I missing some other option that would make all this simpler?

2) I expect my packages to be named something like mypackage_1.3.12.tar.gz or mypackage_1.3.12.zip. However, sometimes --I haven't looked at the code that decides the name to give to the packages, so it looks quite random to me-- they get renamed into something like mypackage_1.3.12_R_i486-pc-linux-gnu.tar.gz or mypackage_1.3.12_R_i486-pc-linux-gnu.zip. The problem is that, then, the update.packages() function cannot find them. Is there a way to prevent this trailing string from appearing in the file name? Or else, is there a way to have update.packages() find the package regardless of it?

I am running

  platform       i486-pc-linux-gnu
  arch           i486
  os             linux-gnu
  system         i486, linux-gnu
  status
  major          2
  minor          3.1
  year           2006
  month          06
  day            01
  svn rev        38247
  language       R
  version.string Version 2.3.1 (2006-06-01)

on Debian Etch with kernel 2.6.15-1-k7.

Thank you very much.
Carlos J. Gil Bellosta
http://www.datanalytics.com
http://www.data-mining-blog.com
[R] Distributing packages online
Dear R users,

A customer of mine has asked me to develop an R package. The situation is as follows:

1) Being a special purpose package, it does not merit being uploaded to any CRAN server.
2) It is subject to many changes, bug fixes, and the like.
3) It then has to be distributed to a relatively large and sparse base of users.

I thought that it would be convenient to have it distributed in the following way:

1) I upload new versions to a server.
2) I provide a function that points to that server and installs the package.

Therefore, users need only invoke this function in order to get the latest package version.

Suppose I upload the package to

  http://www.myserver.com/packages/mypackage_X.Y.Z.tar.gz

Typically, several versions would be stored there. The updating function should be simple to invoke. For instance,

  update.mypackage()

Can anybody give me a hint as to which would be the key command in this function, that is, how the updating function could be defined so that the latest version is installed or updated?

Thank you very much for your help.

Carlos J. Gil Bellosta
http://www.datanalytics.com
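One possible shape for such a function, offered as a sketch rather than an answer from the thread: serve the tarballs from a CRAN-like repository and let install.packages() do the work. The server URL and the function name come from the post; the directory layout is the one install.packages() expects for source packages, and nothing here is run against a real server.

```r
# Repository root taken from the example URL in the post (hypothetical server).
# install.packages() looks for source tarballs under <repo>/src/contrib,
# next to a PACKAGES index file.
repo <- "http://www.myserver.com/packages"

# Server side, after each upload, in the directory holding the tarballs:
#   tools::write_PACKAGES("src/contrib", type = "source")

# Client side: the function users would call.
update.mypackage <- function(repos = repo) {
  install.packages("mypackage", repos = repos, type = "source")
}

# Where R will look for the index and the tarballs.
contrib_dir <- contrib.url(repo, type = "source")
```

With this layout, update.packages(repos = repo) would also work for users who already have the package installed.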
Re: [R] trace of matrix product
Try

  sum( rowSums( A * t(B) ) )

Then you do not make any calculation you do not really need.

Carlos J. Gil Bellosta
http://www.datanalytics.com

Quoting Robin Hankin [EMAIL PROTECTED]:

Hi

What is the best way to calculate the trace of a product of two matrices? If

  A <- matrix(rnorm(48),nrow=6)
  B <- matrix(rnorm(48),nrow=8)

is there a better (faster, more elegant) way to compute the trace of A%*%B than sum(diag(A %*% B))?

I would call the above solution inelegant because all the elements of A %*% B are calculated, when one really only needs the elements on the diagonal. It also uses %*% instead of crossprod() or tcrossprod().

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
Re: [R] trace of matrix product
It is still better to do

  sum( A * t(B) )

Sorry!!

Carlos J. Gil Bellosta
http://www.datanalytics.com

Quoting Robin Hankin [EMAIL PROTECTED]:

Hi

What is the best way to calculate the trace of a product of two matrices? If

  A <- matrix(rnorm(48),nrow=6)
  B <- matrix(rnorm(48),nrow=8)

is there a better (faster, more elegant) way to compute the trace of A%*%B than sum(diag(A %*% B))?

I would call the above solution inelegant because all the elements of A %*% B are calculated, when one really only needs the elements on the diagonal. It also uses %*% instead of crossprod() or tcrossprod().

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
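The identity behind this suggestion is that the trace of A %*% B is the sum over i and j of A[i, j] * B[j, i], which is exactly the sum of the elementwise product of A and t(B). A quick numerical check on matrices of the sizes used in the question (the seed and the names tr_full and tr_fast are arbitrary):

```r
set.seed(42)
A <- matrix(rnorm(48), nrow = 6)   # 6 x 8
B <- matrix(rnorm(48), nrow = 8)   # 8 x 6

tr_full <- sum(diag(A %*% B))      # forms all 36 entries of A %*% B first
tr_fast <- sum(A * t(B))           # only the 48 products that enter the trace
```

The two values agree up to floating-point rounding, so they should be compared with all.equal() rather than ==.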
[R] Patient Rule Induction Method implemented in R?
Dear R users,

I have not been able to find any implementation of Friedman's Patient Rule Induction Method, PRIM, available in R. Is there any? In any case, is there any reasonable alternative?

So far, I have been using trees and just keeping the extreme leaves. But a project of mine requires a method that would just hunt the bumps in a high dimensional data set and that would produce a relatively understandable/graphical set of rules.

Sincerely,
Carlos J. Gil Bellosta
http://www.datanalytics.com
[R] Are minbucket and minsplit rpart options working as expected?
Dear r-list:

I am using rpart to build a tree on a dataset. First I obtain a perhaps too large tree:

  arbol.bsvg.02 <- rpart(formula, data = bsvg, subset=grp.entr, control=rpart.control(cp=0.001))
  arbol.bsvg.02

  n= 100000

  node), split, n, loss, yval, (yprob)
        * denotes terminal node

  1) root 100000 6657 0 (0.93343000 0.06657000)
    2) meses_antiguedad_svg>=10.5 73899 3658 0 (0.95050001 0.04949999)
      4) eor_n1_gns< 1.5 63968 2807 0 (0.95611868 0.04388132)
        8) tarifa_gas=31,32,33,34 63842 2771 0 (0.95659597 0.04340403) *
        9) tarifa_gas=NO 126 36 0 (0.71428571 0.28571429)
          18) tipo_mercado=ESP,N/A 90 10 0 (0.88888889 0.11111111) *
          19) tipo_mercado=NE ,SAH,SAV 36 10 1 (0.27777778 0.72222222) *
      5) eor_n1_gns>=1.5 9931 851 0 (0.91430873 0.08569127)
        10) sn_calef>=0.5 8390 546 0 (0.93492253 0.06507747) *
        11) sn_calef< 0.5 1541 305 0 (0.80207657 0.19792343)
          22) tarifa_gas=31,NO 1134 141 0 (0.87566138 0.12433862) *
          23) tarifa_gas=32 407 164 0 (0.59705160 0.40294840)
            46) cons_gas_delta_1< 6997 196 51 0 (0.73979592 0.26020408) *
            47) cons_gas_delta_1>=6997 211 98 1 (0.46445498 0.53554502)
              94) meses_antiguedad_svg>=23.5 134 54 0 (0.59701493 0.40298507)
                188) altitud< 312 61 16 0 (0.73770492 0.26229508) *
                189) altitud>=312 73 35 1 (0.47945205 0.52054795)
                  378) back_office>=1.5 39 12 0 (0.69230769 0.30769231) *
                  379) back_office< 1.5 34 8 1 (0.23529412 0.76470588) *
              95) meses_antiguedad_svg< 23.5 77 18 1 (0.23376623 0.76623377) *
    3) meses_antiguedad_svg< 10.5 26101 2999 0 (0.88510019 0.11489981)
      6) sn_calef>=0.5 20129 1853 0 (0.90794376 0.09205624) *
      7) sn_calef< 0.5 5972 1146 0 (0.80810449 0.19189551)
        14) tarifa_gas=31 4406 664 0 (0.84929641 0.15070359) *
        15) tarifa_gas=32,NO 1566 482 0 (0.69220945 0.30779055)
          30) eor_n1_gns< 0.5 1168 306 0 (0.73801370 0.26198630) *
          31) eor_n1_gns>=0.5 398 176 0 (0.55778894 0.44221106)
            62) back_office>=0.5 148 35 0 (0.76351351 0.23648649) *
            63) back_office< 0.5 250 109 1 (0.43600000 0.56400000) *

So I decide not to consider branches with fewer than 1000 observations, 1% of the original number of observations. Therefore, according to the rpart.control help pages, I set minbucket=1000. However,

  arbol.bsvg.02

  n= 100000

  node), split, n, loss, yval, (yprob)
        * denotes terminal node

  1) root 100000 6657 0 (0.9334300 0.0665700) *

And I get an empty tree. But there were branches in the original tree with more than 1000 observations.

Something similar happens if I set minsplit (or both minbucket and minsplit) to a similar value: I end up with the same root-only, branch-less tree.

Am I misreading something? Can anybody shed some light on the correct usage of minbucket (and/or minsplit) for me?

Sincerely,
Carlos J. Gil Bellosta
http://www.datanalytics.com
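The following sketch does not diagnose the tree above, since the second call is not shown; it only illustrates, on the kyphosis data that ships with rpart, how minbucket alone can leave a root-only tree and that the control parameters interact. The object names are made up for the example.

```r
library(rpart)

# kyphosis ships with rpart (81 rows); it stands in for the data above.
fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# minbucket is the minimum size of any terminal node. No split of 81 rows
# can leave 50 rows on both sides, so only the root remains. When minsplit
# is not given, rpart.control() also sets it to three times minbucket.
fit_big_bucket <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                        control = rpart.control(minbucket = 50))

# Parameters not named in rpart.control() take their defaults
# (for example cp = 0.01), so they must be repeated in every call.
n_nodes_default <- nrow(fit_default$frame)
n_nodes_big     <- nrow(fit_big_bucket$frame)
```

A split has to satisfy minsplit, minbucket and cp at the same time, so it may be worth checking whether cp=0.001 was still passed in the call that set minbucket=1000.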
Re: [R] hamming distance
True, but if

  x <- c(1, 0, 0)
  y <- c(1, 0, 1)

then you can just define

  hamming.distance <- function(x,y){ sum(x != y) }

The problem is just a bit harder when x and y are, for instance, strings.

Carlos J. Gil Bellosta
http://www.datanalytics.com

Quoting JeeBee [EMAIL PROTECTED]:

You could install.packages("e1071") and see help(hamming.distance)

JeeBee.

  hamming.distance            package:e1071            R Documentation

  Hamming Distances of Vectors

  Description:
    If both 'x' and 'y' are vectors, 'hamming.distance' returns the Hamming distance (number of different bytes) between this two vectors. If 'x' is a matrix, the Hamming distances between the rows of 'x' are computed and 'y' is ignored.

  Usage:
    hamming.distance(x, y)

  Arguments:
    x: a vector or matrix.
    y: an optional vector.

  Examples:
    x <- c(1, 0, 0)
    y <- c(1, 0, 1)
    hamming.distance(x, y)
    z <- rbind(x,y)
    rownames(z) <- c("Fred", "Tom")
    hamming.distance(z)

On Thu, 24 Nov 2005 11:33:45 +0100, Ana Conesa wrote:

Hi,

Does anyone know an R function to compute Hamming distance?

Thanks
Ana

Centro de Genómica
Instituto Valenciano de Investigaciones Agrarias (IVIA)
Carretera Moncada - Naquera, Km. 4,5
46113 Moncada (Valencia) SPAIN
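For the string case mentioned in the reply, a minimal sketch is to split both strings into characters and reuse the same comparison. The helper name hamming_str and the test words are made up for the example, and the strings are assumed to have equal length.

```r
# Vector version, as defined in the reply.
hamming.distance <- function(x, y) {
  sum(x != y)
}

# String version: compare character by character (equal lengths assumed).
hamming_str <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

d_vec <- hamming.distance(c(1, 0, 0), c(1, 0, 1))
d_str <- hamming_str("karolin", "kathrin")
```

Defining hamming.distance this way masks the e1071 function of the same name if that package is attached.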
Re: [R] Applying a function to a vector
You should then try apply. See ?apply. For instance:

a <- matrix(c(1:10), 5, 2)
apply(a, 1, function(x) x[1] + 100 * x[2])

This way you can disaggregate the components of your input (which is not a singleton) and use them separately inside your function.

Carlos J. Gil Bellosta
http://www.datanalytics.com

Quoting Florent Bresson [EMAIL PROTECTED]:

I just had a try with sapply. The problem is that my function pbeta2 has two parameters, z and p (which is a vector of two parameters). If I use sapply, R returns a message indicating that parameter p is missing. This is a problem since both z and p vary along my data.frame.

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
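To connect the apply suggestion with the two-parameter situation in the quoted question, here is a small sketch. The function f is only a stand-in for something like pbeta2(z, p), whose definition is not given in the thread, and the column names z, p1 and p2 are assumptions made for illustration.

```r
# f is a hypothetical stand-in for a function such as pbeta2(z, p),
# where p is a vector of two parameters.
f <- function(z, p) z + p[1] + 100 * p[2]

# Assumed layout: one row per case, with z and the two components of p.
d <- data.frame(z = 1:3, p1 = c(0.1, 0.2, 0.3), p2 = c(1, 2, 3))

# Option 1: apply over rows; each row arrives as a named numeric vector x.
res_apply <- apply(d, 1, function(x) f(x["z"], x[c("p1", "p2")]))

# Option 2: mapply walks the three columns in parallel.
res_mapply <- mapply(function(z, p1, p2) f(z, c(p1, p2)), d$z, d$p1, d$p2)
```

Both calls evaluate f once per row. mapply avoids converting the data frame to a matrix, which matters when the columns are not all numeric.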
Re: [R] Problem with RCMD build ...
Dear Justin,

I also had similar problems recently. I have just created a package using package.skeleton, and if I try to build it without editing the DESCRIPTION file (you did not mention that you did, right?) R complains. The file appears to be malformed (I am running version 2.1.0 of R on a Debian (testing) Linux box); it reads [sic]:

Package: fooType: Package
Title: What the package does (short line)
Version: 1.0
Date: 2005-08-23
Author: Who wrote it
Maintainer: Who to complain to [EMAIL PROTECTED]
Description: More about what it does (maybe more than one line)
License: What license is it under?

As you can see, the first two lines are pasted together. Could this perhaps be the problem?

Sincerely,
Carlos J. Gil Bellosta
http://www.datanalytics.com

justin bem wrote:

Hi, I have written a function to draw an age pyramid. I have two functions, draw.pyramide(h,f,l) and pyramide(h,f,l), and a data frame with data. I first used package.skeleton("pyra") and got the package structure. Then in my shell I ran RCMD build pyra and got this:

check for description ... OK
removing junk files
but cannot open pyra/description

and Install failed. I use Windows XP, with an Intel PIV at 2.66 GHz and 512 MB of memory. Can you help me?

Justin BEM
Elève Ingénieur Statisticien Economiste
BP 294 Yaoundé. Tél (00237)9597295.

- [[alternative HTML version deleted]]

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
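A quick way to spot the kind of pasted-together fields shown in the reply above is to parse the DESCRIPTION file with read.dcf() and look for fields that should be there but are not. This is only a sketch: missing_fields is a hypothetical helper, and the list of required fields is an assumption chosen for the example, not the set that R CMD build actually checks.

```r
# Return the required fields that read.dcf() does not find in a DESCRIPTION file.
# missing_fields and its default field list are assumptions for illustration.
missing_fields <- function(path,
                           required = c("Package", "Type", "Title", "Version")) {
  fields <- colnames(read.dcf(path))
  setdiff(required, fields)
}

# A well-formed file: every field is on its own line.
good <- tempfile()
writeLines(c("Package: foo",
             "Type: Package",
             "Title: What the package does (short line)",
             "Version: 1.0"), good)

# The malformed variant from the message: the first two lines are fused.
bad <- tempfile()
writeLines(c("Package: fooType: Package",
             "Title: What the package does (short line)",
             "Version: 1.0"), bad)

missing_good <- missing_fields(good)
missing_bad <- missing_fields(bad)
```

In the fused case the Type field is absorbed into the value of Package, so it no longer appears as a field of its own.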
[R] write.table() performance.
Dear r-helpers,

I know that there have already been enough questions on I/O performance these last days, but I came across the following situation today.

I was comparing the performance of R with that of SAS's Risk Dimensions at generating random scenarios. My dataset (all numeric entries) would nicely fit into RAM and R would outperform SAS until... I wanted to export the results to a .csv file using the write.table() function. For reference, this output file was about 30 MB. Moreover, the memory needed by R would increase sharply during the writing process.

I had a look at the code for the write.table() function and found that, basically, it creates a very long text string from the data using paste() and then prints it using writeLines(). Rprof() showed that writeLines() used a mere 3% of the computing time, the rest being taken almost entirely by paste().

There are two directions in which performance could potentially be improved:

1.- Writing speed.
2.- Memory usage.

Regarding memory usage, I thought that perhaps a little rewriting of the write.table() function could be considered: instead of building a single long text string in RAM, the data frame to be printed could, with a little overhead, be split into shorter chunks, each of them paste()-d into a shorter, recyclable buffer string and printed sequentially to the output file. (Note: I am completely ignorant of R's memory recycling rules, and this might not work as intended because of them.)

Regarding speed, I see little hope as long as the paste() function is implicitly called by write.table(). Most likely, its execution time scales linearly with the number of lines in the data frame, so splitting it would bring no benefit.

Are there any hints on how a performance improvement (other than linking external, ad hoc C code) could be achieved? Do we really need to go through paste()?

Would it perhaps be beneficial to include in R some specialized functions that achieve high output performance for writing out, say, only numeric values (this happens to be the case for me most of the time)?

Sorry for the long posting.

Carlos J. Gil Bellosta

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
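The chunking idea from the message above can be tried without touching write.table() itself, by calling it repeatedly on slices of the data frame through one open connection. The following is a sketch under assumptions: write_csv_chunked and chunk_size are hypothetical names, and whether this actually lowers peak memory depends on R's memory management, which the message itself flags as uncertain. It is not expected to reduce the time spent converting numbers to text.

```r
# Write a data frame to a CSV file in row chunks through a single connection,
# so that only one chunk at a time is converted to text.
# write_csv_chunked and chunk_size are hypothetical names for this sketch.
write_csv_chunked <- function(df, path, chunk_size = 10000) {
  stopifnot(nrow(df) > 0, chunk_size >= 1)
  con <- file(path, open = "wt")
  on.exit(close(con))
  n <- nrow(df)
  starts <- seq(1, n, by = chunk_size)
  for (i in seq_along(starts)) {
    rows <- starts[i]:min(starts[i] + chunk_size - 1, n)
    # Only the first chunk writes the header line.
    write.table(df[rows, , drop = FALSE], file = con, sep = ",",
                row.names = FALSE, col.names = (i == 1))
  }
  invisible(path)
}

# Small usage example with an all-numeric data frame.
scenarios <- data.frame(x = 1:25, y = (1:25) / 4)
out_file <- tempfile(fileext = ".csv")
write_csv_chunked(scenarios, out_file, chunk_size = 10)
read_back <- read.csv(out_file)
```

Keeping the connection open across chunks avoids reopening the file for every slice. A suitable chunk size would have to be found by profiling, for instance with Rprof() as in the message.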
[R] BoxPlots, 1 Way ANOVA and Non-Statisticians.
Dear R-Helpers,

I am working on a project in which I have a number of observations belonging to several classes. Using a one-way ANOVA, I have rejected the equality-of-means hypothesis with a very small p-value. However, the people I have to present my results to are not statisticians, and they are not likely to be much impressed by a number like 1.32434e-12. Therefore I have decided to make two boxplots, one for the actual data and a second one for simulated data in which the equality of the means holds, so that the difference in the distributions can be appreciated visually.

The problem is that the simulated values, being more regular, have a smaller range of variation and, therefore, the height of the window in which their boxplot is drawn is also smaller. As a result, the scales of the two boxplots are not the same and part of the appeal of the visual approach is lost along the way.

My question is: is there a way to make two different boxplots within a common window? (Or rather, a window of common size or, more concretely, one that spans the same range on the vertical axis.)

Sincerely,
Carlos J. Gil Bellosta
Sigma Consultores Estadísticos
http://www.consultoresestadisticos.com

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
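One way to get the common vertical range asked about above is to compute the range over both datasets and pass it to each boxplot() call through ylim. The sketch below uses made-up data: the three classes, their means and the variable names are all assumptions for illustration.

```r
set.seed(1)

# Hypothetical data: three classes whose true means differ.
g <- factor(rep(c("A", "B", "C"), each = 30))
actual <- rnorm(90, mean = c(0, 2, 4)[g])

# Simulated counterpart in which every class shares the same mean.
simulated <- rnorm(90, mean = mean(actual), sd = 1)

# A single vertical range that covers both datasets.
ylim <- range(actual, simulated)

# Draw the two boxplots side by side on the same vertical scale.
op <- par(mfrow = c(1, 2))
boxplot(actual ~ g, ylim = ylim, main = "Actual data")
boxplot(simulated ~ g, ylim = ylim, main = "Simulated, equal means")
par(op)
```

With par(mfrow = c(1, 2)) both panels share one window, and the identical ylim makes the boxes directly comparable by eye.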
[R] Partial least squares.
Dear R-helpers,

I am looking, quite unsuccessfully, for a number of functions/packages.

Firstly, I am interested in a package for partial least squares. I have found that there seemed to exist a package called pls, but it seems not to run any more with modern versions of R. Nor have I been able to find a certain chemometrics package that some people were discussing on this list some time ago and that, it seems, included this kind of function.

Secondly, I have not been able to find a function equivalent to the SAS procedure STEPDISC, which performs a stepwise process (only available for lm and glm in R) on linear discriminant analysis (lda in R).

Does anybody know of an off-the-shelf answer to these questions?

Carlos J. Gil Bellosta
Sigma Consultores Estadísticos
http://www.consultoresestadisticos.com

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] R icon for RedHat 9.
Dear R-helpers,

I just installed R on a RedHat 9 machine and, when I was trying to add a launcher for it to the panel, I could not find any R icon in png format. Is there one available? Do I already have it somewhere in my file hierarchy without having found it?

Thank you for your help.

Carlos J. Gil Bellosta
Sigma Consultores Estadísticos
http://www.consultoresestadisticos.com

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] R-icon for RH Linux 9.
Dear R helpers, It is the first time I have to install R under Linux (Red Hat 9) and as I was setting up the panels, I could not find any png icon to launch the program. Does it exist and I have overlooked it? Can I get it somewhere? Carlos J. Gil Bellosta Sigma Consultores Estadísticos http://www.consultoresestadisticos.com __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] test for equal distributions with small numbers of observations
Probably, to check homogeneity you only need to consider a vector of dimension 8 consisting of the number of times you get each of the configurations (0,7), (1,6), (2,5), ..., (7,0). This is most likely a sufficient statistic. Under homogeneity, it should be distributed as a multinomial random variable whose probability vector equals the density of a binomial variable with unknown parameter p. Then you can use standard tests to see whether the maximum likelihood estimate (MLE) fits the data or not.

Carlos J. Gil Bellosta
Sigma Consultores Estadísticos
http://www.consultoresestadisticos.com

Christoph Lehmann wrote:

Dear R-pros

I have a problem for which I would usually apply a chisq.test or a fisher.test: 40 objects, each given either a 0 or a 1 depending on whether the object is later remembered by a subject or not. 7 subjects were investigated, which means we have a 2x40 matrix, each cell being the number of subjects for whom object i has been given either 0 or 1, e.g.

objects: 1 1 3 39 40
--
0  1 2 2 .. 7 7
1  4 4 5 .. 0 0

Over all 40 objects, we have 67% of 1 and 33% of 0.

I want to know whether, for the 40 objects, the ratio of 0/1 differs or not, i.e. whether they have the same distribution. I cannot use a chisq.test since the expected frequencies are < 5 for the 0 cells. fisher.test seems to run for 12h on a PIV 1.8GHz... What do you recommend I do?

Many thanks
Christoph

--
recognition
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]    1    2    2    2    2    3    3    3    3     3     3     3     4     4
[2,]    4    4    5    4    4    3    4    4    4     4     3     4     3     3
     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
[1,]     4     4     4     4     4     4     5     5     5     5     5     5
[2,]     3     2     3     3     2     3     2     2     2     2     2     2
     [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
[1,]     5     5     5     5     5     6     6     6     6     6     6     6
[2,]     2     2     2     2     2     1     1     1     1     1     1     1
     [,39] [,40]
[1,]     7     7
[2,]     0     0

fisher.test(recognition)

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
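The approach outlined in the reply above can be sketched as follows, under stated assumptions. The data are simulated rather than taken from the thread, every object is assumed to have been rated by all 7 subjects, and chisq.test() with a Monte Carlo p-value is just one possible choice of "standard test". The simulated p-value also ignores the fact that p was estimated from the same data, so it should be read as approximate.

```r
set.seed(42)
n_subjects <- 7

# Hypothetical data: for each of 40 objects, how many of the 7 subjects
# remembered it (not the data from the quoted message).
remembered <- rbinom(40, size = n_subjects, prob = 0.67)

# The 8-cell summary: number of objects with 0, 1, ..., 7 subjects remembering.
observed <- as.vector(table(factor(remembered, levels = 0:n_subjects)))

# Estimate of the common success probability under homogeneity.
p_hat <- sum(remembered) / (n_subjects * length(remembered))

# Cell probabilities implied by a binomial(7, p_hat) distribution.
expected_prob <- dbinom(0:n_subjects, size = n_subjects, prob = p_hat)

# Goodness-of-fit test with a simulated p-value, since several expected
# counts are small.
fit <- chisq.test(observed, p = expected_prob,
                  simulate.p.value = TRUE, B = 2000)
p_value <- fit$p.value
```

Working with the 8 cell counts instead of the full 2x40 table keeps the computation small, which sidesteps the long running time reported for fisher.test.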
[R] Statistical analysis of huge datasets.
Dear R-users,

I am faced with the problem of analyzing a huge dataset (2+ million records, 150+ variables) which does not fit into memory. I would like to know if there are pre-packaged tools (in the spirit of Insightful I-Miner, for instance) aimed at subsampling or splitting the dataset into subdatasets that fit into data frames, applying functions record-wise, etc.

Thank you very much for your help.

Carlos J. Gil Bellosta
Sigma Consultores Estadísticos
http://www.consultoresestadisticos.com

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
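In the absence of a pre-packaged tool, the splitting described in the message above can be approximated in base R by reading a text file through an open connection a fixed number of rows at a time. This is a rough sketch only: process_in_chunks is a hypothetical helper, the file is assumed to be a plain comma-separated file with a simple header line (no commas inside column names), and column types are guessed separately for every chunk.

```r
# Apply fun to successive chunks of a CSV file without loading it whole.
# process_in_chunks is a hypothetical helper written for this sketch.
process_in_chunks <- function(path, chunk_size, fun) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header line once; assumes a simple comma-separated header.
  header <- strsplit(gsub('"', "", readLines(con, n = 1)), ",")[[1]]
  results <- list()
  repeat {
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_size, col.names = header),
      error = function(e) NULL)  # read.csv signals an error once no lines remain
    if (is.null(chunk) || nrow(chunk) == 0) break
    results[[length(results) + 1]] <- fun(chunk)
    if (nrow(chunk) < chunk_size) break  # a short chunk means end of file
  }
  results
}

# Usage on a small temporary file: count rows and sum a column per chunk.
big <- data.frame(id = 1:25, value = (1:25) * 2)
big_file <- tempfile(fileext = ".csv")
write.csv(big, big_file, row.names = FALSE)
rows_per_chunk <- unlist(process_in_chunks(big_file, 10, nrow))
sum_per_chunk <- unlist(process_in_chunks(big_file, 10, function(d) sum(d$value)))
```

Because the connection stays open between calls, each read.csv() continues where the previous one stopped. The per-chunk results can then be combined, or a random subset of rows kept from each chunk to build a subsample.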
Re: [R] C compiler to R
gcc (I just need to write that; I tried Borland's and I only got trouble).

Sincerely,
Carlos J. Gil Bellosta
Sigma Consultores Estadísticos
http://www.consultoresestadisticos.com

Cezar Augusto de Freitas Anselmo wrote:

Hi, all. I'd like to know which is the most appropriate C compiler to use with R programs.

Thanks,
C. Cezar Freitas (ICQ 109128967)
IMECC - UNICAMP
Campinas, SP - Brasil

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] univariate normal mixtures
Well,

If k is known, you can use maximum likelihood to fit the weights, means, and sd's. The EM algorithm can help solve the optimization problem. You would have to implement it yourself for your particular case, but I do not think it is much trouble.

Then you could estimate k using the Bayesian formalism: starting from a reasonable a priori distribution on k = 1, 2, ..., compute the posterior distribution using the densities obtained above, etc.

Carlos J. Gil Bellosta
Sigma Consultores Estadísticos
http://www.consultoresestadisticos.com

Joke Allemeersch wrote:

Hello,

I have a concrete statistical question: I have a sample from a univariate mixture of an unknown number (k) of normal distributions, each with an unknown mean `m_i' and a standard deviation `k * m_i', where k is a known factor, constant for all the normal distributions. (The `i' is a subscript.)

Is there a function in R that can estimate the number of normal distributions k and the means `m_i' of the different normal distributions from a sample? Or possibly a function that can estimate the `m_i' when the number of distributions `k' is known?

So far I have only found a package called `normix'. But at first sight it only provides methods to sample from such distributions and to estimate the densities, not to fit such a distribution.

Can someone indicate where I can find an elegant solution?

Thank you in advance
Joke Allemeersch
Katholieke universiteit Leuven. Belgium.

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
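As a starting point for the "implement it yourself" suggestion in the reply above, here is a minimal EM sketch for a univariate normal mixture with a known number of components. It is deliberately generic: the standard deviations are free parameters, so the constraint from the question (sd proportional to the mean) is NOT imposed and would require a modified M-step. The function name, the starting values and the fixed iteration count are all assumptions made for illustration.

```r
# EM for a univariate normal mixture with k components (k known).
# em_normal_mixture is a hypothetical name; the sds are unconstrained here.
em_normal_mixture <- function(x, k, iterations = 200) {
  n <- length(x)
  stopifnot(n > k, k >= 1)
  # n x k matrix of weighted component densities for given parameters.
  weighted_dens <- function(w, mu, sigma) {
    vapply(seq_len(k), function(j) w[j] * dnorm(x, mu[j], sigma[j]),
           numeric(n))
  }
  # Crude starting values: means at spread-out quantiles, a common sd,
  # and equal weights.
  mu <- as.numeric(quantile(x, probs = (seq_len(k) - 0.5) / k))
  sigma <- rep(sd(x), k)
  w <- rep(1 / k, k)
  for (it in seq_len(iterations)) {
    # E-step: posterior probability of each component for each observation.
    dens <- weighted_dens(w, mu, sigma)
    resp <- dens / rowSums(dens)
    # M-step: weighted updates of the weights, means and standard deviations.
    nk <- colSums(resp)
    w <- nk / n
    mu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nk)
  }
  # Log-likelihood evaluated at the final parameter values.
  loglik <- sum(log(rowSums(weighted_dens(w, mu, sigma))))
  list(weights = w, means = mu, sds = sigma, loglik = loglik)
}

# Usage on simulated data with two well-separated components.
set.seed(1)
sample_x <- c(rnorm(200, mean = 0, sd = 1), rnorm(200, mean = 10, sd = 1))
fit_em <- em_normal_mixture(sample_x, k = 2)
```

The loglik element gives the density of the data at the fitted parameters for a given k, which is the kind of quantity the reply proposes to combine with a prior on k. A fixed number of iterations is used for simplicity; a convergence check on the log-likelihood would be the usual refinement.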