Re: [R] RWeka cross-validation and Weka_control Parametrization

2007-08-14 Thread Kurt Hornik
 On Wed, 01 Aug 2007 10:52:02 +0200, Bjoern wrote:

 Hello,

  I have two questions concerning the RWeka package:

  1.) First question:
  How can one perform a cross-validation (say, 10-fold) for a given
 data set and a given model?

  2.) Second question
  What is the correct syntax for parametrizing, e.g., the kernel
 classifier interface?
m1 <- SMO(Species ~ ., data = iris,
          control = Weka_control(K = "weka.classifiers.functions.supportVector.RBFKernel", G = 0.1))
m2 <- SMO(Species ~ ., data = iris,
          control = Weka_control(K = "weka.classifiers.functions.supportVector.RBFKernel", G = 1.0))

 m1
  SMO

  Kernel used:
  RBF kernel: K(x,y) = e^-(0.01* <x-y,x-y>^2)

  ## should be: RBF kernel: K(x,y) = e^-(0.1* <x-y,x-y>^2)

 etc.

The answer for question 2 is surprisingly simple, but nevertheless took
me about half an hour to find:

  m2 <- SMO(Species ~ ., data = iris,
            control = Weka_control(K =
                "weka.classifiers.functions.supportVector.RBFKernel -G 2"))

gives

R> m2
SMO

Kernel used:
  RBF kernel: K(x,y) = e^-(2.0* <x-y,x-y>^2)
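To see what the G option changes, here is a small base-R sketch. It assumes the standard Gaussian RBF kernel exp(-gamma * ||x - y||^2), with gamma being the value printed above (0.01 by default, 2.0 after passing -G 2); rbf_kernel is a hypothetical helper, not a Weka or RWeka function.

```r
# Hypothetical helper: Gaussian RBF kernel, assuming the standard form
# K(x, y) = exp(-gamma * ||x - y||^2); gamma plays the role of Weka's -G.
rbf_kernel <- function(x, y, gamma) {
  exp(-gamma * sum((x - y)^2))
}

# Two iris observations (a setosa and a versicolor), numeric features only
x <- unlist(iris[1, 1:4])
y <- unlist(iris[51, 1:4])

# Larger gamma makes the similarity decay faster with distance
sapply(c(0.01, 0.1, 2), function(g) rbf_kernel(x, y, gamma = g))
```

A larger gamma therefore makes the SVM's decision function more local, which is why the value actually reaching RBFKernel matters.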

[Using Weka_control(K = ..., G = ...) passes the G option to SMO but not
RBFKernel.  The docs for SMO() say

 -K classname and parameters
  The Kernel to use.
  (default: weka.classifiers.functions.supportVector.PolyKernel)

and one needs to remember Weka's command line style interface to realize
that this deparses into putting everything into a string for the K
option.]
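Because the K option takes the class name and its parameters as a single string, that string can also be assembled from R arguments. The helper below is hypothetical (weka_class_spec is not part of RWeka) and assumes every kernel parameter is a simple "-Name value" flag:

```r
# Hypothetical helper (not part of RWeka): build the single string Weka
# expects for an option that takes "classname and parameters", e.g. SMO's -K.
weka_class_spec <- function(class, ...) {
  opts <- list(...)
  if (length(opts) == 0) return(class)
  # Turn list(G = 2) into "-G 2"; assumes named, single-valued parameters
  flags <- paste0("-", names(opts), " ", unlist(opts), collapse = " ")
  paste(class, flags)
}

weka_class_spec("weka.classifiers.functions.supportVector.RBFKernel", G = 2)
```

With RWeka attached, the result would then be used as, e.g., Weka_control(K = weka_class_spec("weka.classifiers.functions.supportVector.RBFKernel", G = 0.1)), which yields the same form of string as the explicit call above.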

This is of course not quite what R users would expect, and we'll try to
improve the Weka control mechanism so that specifying (Weka class)
options which require additional parameters becomes more convenient.
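For the first question, a generic 10-fold cross-validation can be sketched in base R, given a function that fits a model and one that predicts classes. All names here (cv_error, fit_centroid, predict_centroid) are hypothetical, and the nearest-centroid classifier merely stands in for an RWeka model such as m1. Later RWeka releases also provide evaluate_Weka_classifier() with a numFolds argument; check the documentation of the installed version.

```r
# Hypothetical generic k-fold cross-validation in base R (not RWeka code).
# fit_fun(train) returns a model; predict_fun(model, test) returns class labels.
cv_error <- function(data, response, fit_fun, predict_fun, k = 10, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of (nearly) equal size
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  errors <- numeric(k)
  for (i in seq_len(k)) {
    test <- folds == i
    model <- fit_fun(data[!test, , drop = FALSE])
    pred <- predict_fun(model, data[test, , drop = FALSE])
    # Misclassification rate on the held-out fold
    errors[i] <- mean(as.character(pred) != as.character(data[test, response]))
  }
  list(fold_errors = errors, mean_error = mean(errors))
}

# Stand-in classifier for illustration: nearest class centroid on iris
fit_centroid <- function(d) {
  aggregate(d[, 1:4], by = list(Species = d$Species), FUN = mean)
}
predict_centroid <- function(model, d) {
  X <- as.matrix(d[, 1:4])
  C <- as.matrix(model[, -1])
  # Squared Euclidean distance from every row of X to every centroid
  dist2 <- matrix(vapply(seq_len(nrow(C)),
                         function(j) rowSums(sweep(X, 2, C[j, ])^2),
                         numeric(nrow(X))),
                  nrow = nrow(X))
  as.character(model$Species)[max.col(-dist2, ties.method = "first")]
}

res <- cv_error(iris, "Species", fit_centroid, predict_centroid, k = 10)
res$mean_error
```

Passing the same seed keeps the fold assignment reproducible, so different models can be compared on identical folds.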

Best
-k

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] get.hist.quote problem yahoo

2007-02-13 Thread Kurt Hornik
 Rene Braeckman writes:

 I had the same problem some time ago. Below is a function that I
 picked up on the web somewhere (can't remember where; may have been a
 newsletter).  It's based on the tseries function but the difference is
 that this function produces a data frame with a column containing the
 dates of the quotes, instead of a time series object. I had to replace
 "%d-%b-%y" by "%Y-%m-%d" to make it work, probably, as you stated,
 because Yahoo changed the format.

This issue should be taken care of now by a new release of tseries I put
out two days ago.

-k

 Hope this helps.

 Rene

 # --
 # df.get.hist.quote() function
 #
 # Based on code by A. Trapletti (package tseries)
 #
 # The main difference is that this function produces a data frame with
 # a column containing the dates of the quotes, instead of a time series
 # object.
 df.get.hist.quote <- function(instrument = "ibm",
                               start, end,
                               quote = c("Open", "High", "Low",
                                         "Close", "Volume"),
                               provider = "yahoo", method = "auto")
 {
     if (missing(start))
         start <- "1970-01-02"
     if (missing(end))
         end <- format(Sys.time() - 86400, "%Y-%m-%d")
     provider <- match.arg(provider)
     start <- as.POSIXct(start, tz = "GMT")
     end <- as.POSIXct(end, tz = "GMT")
     if (provider == "yahoo") {
         url <- paste("http://chart.yahoo.com/table.csv?s=", instrument,
                      format(start, "&a=%m&b=%d&c=%Y"),
                      format(end, "&d=%m&e=%d&f=%Y"),
                      "&g=d&q=q&y=0&z=", instrument, "&x=.csv", sep = "")
         destfile <- tempfile()
         status <- download.file(url, destfile, method = method)
         if (status != 0) {
             unlink(destfile)
             stop(paste("download error, status", status))
         }
         # If the first line starts with "No", no data were returned
         status <- scan(destfile, "", n = 1, sep = "\n", quiet = TRUE)
         if (substring(status, 1, 2) == "No") {
             unlink(destfile)
             stop(paste("No data available for", instrument))
         }
         x <- read.table(destfile, header = TRUE, sep = ",")
         unlink(destfile)
         nser <- pmatch(quote, names(x))
         if (any(is.na(nser)))
             stop("This quote is not available")
         n <- nrow(x)
         # Parse dates in the C locale, restoring the user's locale on exit
         lct <- Sys.getlocale("LC_TIME")
         Sys.setlocale("LC_TIME", "C")
         on.exit(Sys.setlocale("LC_TIME", lct))
         dat <- gsub(" ", "0", as.character(x[, 1]))
         dat <- as.POSIXct(strptime(dat, "%Y-%m-%d"), tz = "GMT")
         if (dat[n] != start)
             cat(format(dat[n], "time series starts %Y-%m-%d\n"))
         if (dat[1] != end)
             cat(format(dat[1], "time series ends   %Y-%m-%d\n"))
         # Newest quote comes first in the file; reverse to chronological order
         return(data.frame(cbind(Date = I(format(dat[n:1], "%Y-%m-%d")),
                                 x[n:1, nser]),
                           row.names = 1:n))
     }
     else stop("Provider not implemented")
 }

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Daniele Amberti
 Sent: Friday, February 09, 2007 5:22 AM
 To: r-help
 Subject: [R] get.hist.quote problem yahoo

 I have functions using get.hist.quote() from library tseries.

 It seems that something changed at Yahoo and the function got broken.

 Try a simple

 get.hist.quote('IBM')

 and let me know whether it still works for anyone.

 I get this error:
 Error in if (!quiet && dat[n] != start) cat(format(dat[n], "time series
 starts %Y-%m-%d\n")) : 
 missing value where TRUE/FALSE needed

 Looking at the code, it seems that the dates in Yahoo's CSV file used
 to be in a non-ISO format; now they are in ISO standard year-month-day.

 Anyone get the same problem?
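The error can be reproduced offline. Assuming, as described above, that Yahoo switched to ISO dates, parsing them with the old "%d-%b-%y" format yields NA, and an NA comparison inside if () raises "missing value where TRUE/FALSE needed". A minimal base-R sketch (the dates are illustrative):

```r
# Start date as in get.hist.quote(), and one ISO-formatted date from the file
start <- as.POSIXct("1991-01-02", tz = "GMT")
iso <- "2007-02-09"

old <- as.POSIXct(strptime(iso, "%d-%b-%y", tz = "GMT"))  # old pattern
new <- as.POSIXct(strptime(iso, "%Y-%m-%d", tz = "GMT"))  # ISO pattern

is.na(old)      # TRUE: the old pattern does not match ISO dates
old != start    # NA, so if (old != start) stops with the error above
new != start    # TRUE: the comparison works with the ISO pattern
```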




Re: [R] R CMD check fails at package dependencies check on Fedora Core 5, works on other systems

2006-09-20 Thread Kurt Hornik
 Marc Schwartz (via MN) writes:

 On Tue, 2006-09-19 at 22:16 +1000, Robert King wrote:
 Here is another thing that might help work out what is happening.  If I 
 use --no-install, ade4 actually fails as well, in the same way as zipfR.
 
 [Desktop]$ R CMD check --no-install ade4
 * checking for working latex ... OK
 * using log directory '/home/rak776/Desktop/ade4.Rcheck'
 * using Version 2.3.1 (2006-06-01)
 * checking for file 'ade4/DESCRIPTION' ... OK
 * this is package 'ade4' version '1.4-1'
 * checking if this is a source package ... OK
 * checking package directory ... OK
 * checking for portable file names ... OK
 * checking for sufficient/correct file permissions ... OK
 * checking DESCRIPTION meta-information ... ERROR
 
 [Desktop]$ R CMD check --no-install zipfR
 * checking for working latex ... OK
 * using log directory '/home/rak776/Desktop/zipfR.Rcheck'
 * using Version 2.3.1 (2006-06-01)
 * checking for file 'zipfR/DESCRIPTION' ... OK
 * checking extension type ... Package
 * this is package 'zipfR' version '0.6-0'
 * checking if this is a source package ... OK
 * checking package directory ... OK
 * checking for portable file names ... OK
 * checking for sufficient/correct file permissions ... OK
 * checking DESCRIPTION meta-information ... ERROR

 <snip>

 Robert,

 I tried the process last night (my time) using the initial instructions
 on my FC5 system with:

 $ R --version
 R version 2.3.1 Patched (2006-08-06 r38829)
 Copyright (C) 2006 R Development Core Team


 I could not replicate the problem.

 However, this morning, with your additional communication:

 $ R CMD check --no-install zipfR_0.6-0.tar.gz
 * checking for working latex ... OK
 * using log directory '/home/marcs/Downloads/zipfR.Rcheck'
 * using Version 2.3.1 Patched (2006-08-06 r38829)
 * checking for file 'zipfR/DESCRIPTION' ... OK
 * checking extension type ... Package
 * this is package 'zipfR' version '0.6-0'
 * checking if this is a source package ... OK
 * checking package directory ... OK
 * checking for portable file names ... OK
 * checking for sufficient/correct file permissions ... OK
 * checking DESCRIPTION meta-information ... OK
 * checking top-level files ... OK
 * checking index information ... OK
 * checking package subdirectories ... OK
 * checking R files for syntax errors ... OK
 * checking R files for library.dynam ... OK
 * checking S3 generic/method consistency ... OK
 * checking replacement functions ... OK
 * checking foreign function calls ... OK
 * checking Rd files ... OK
 * checking Rd cross-references ... WARNING
 Warning in grep(pattern, x, ignore.case, extended, value, fixed,
 useBytes) :
  input string 70 is invalid in this locale
 * checking for missing documentation entries ... WARNING
 Warning in grep(pattern, x, ignore.case, extended, value, fixed,
 useBytes) :
  input string 70 is invalid in this locale
 All user-level objects in a package should have documentation entries.
 See chapter 'Writing R documentation files' in manual 'Writing R
 Extensions'.
 * checking for code/documentation mismatches ... OK
 * checking Rd \usage sections ... OK
 * checking DVI version of manual ... OK

 WARNING: There were 2 warnings, see
   /home/marcs/Downloads/zipfR.Rcheck/00check.log
 for details



 So I am wondering if this raises the possibility of a locale issue on
 your FC5 system resulting in a problem reading DESCRIPTION files?  It
 may be totally unrelated, but one never knows I suppose. Mine is:

 $ locale
 LANG=en_US.UTF-8
 LC_CTYPE=en_US.UTF-8
 LC_NUMERIC=en_US.UTF-8
 LC_TIME=en_US.UTF-8
 LC_COLLATE=en_US.UTF-8
 LC_MONETARY=en_US.UTF-8
 LC_MESSAGES=en_US.UTF-8
 LC_PAPER=en_US.UTF-8
 LC_NAME=en_US.UTF-8
 LC_ADDRESS=en_US.UTF-8
 LC_TELEPHONE=en_US.UTF-8
 LC_MEASUREMENT=en_US.UTF-8
 LC_IDENTIFICATION=en_US.UTF-8
 LC_ALL=


 HTH,

 Marc Schwartz

That's a bug in tools:::Rd_aliases (it needs to preprocess the Rd lines,
which re-encodes if necessary and possible).
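As a rough illustration of what re-encoding means here (this is not the actual fix in tools), suppose an Rd line contains latin1 bytes, a common cause of "input string ... is invalid in this locale" under a UTF-8 locale. Base R's iconv() converts such a line to valid UTF-8; validUTF8() requires R 3.3.0 or later:

```r
# A latin1-encoded string, as it might appear in an Rd file that is read
# byte-for-byte under a UTF-8 locale ("\xfc" is the latin1 byte for u-umlaut)
x <- "Z\xfcrich"
validUTF8(x)                                   # FALSE: invalid UTF-8 bytes

# Re-encode from latin1 to UTF-8 before any pattern matching
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)                                   # TRUE
grepl("rich", y)
```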

I'll commit a fix later today.

Thanks for spotting this.

Best
-k



[R] Job Openings at WU Wien

2005-06-21 Thread Kurt Hornik

The Department of Statistics and Mathematics at the Vienna University of
Economics and Business Administration invites applications for two new
faculty positions in computational statistics and quantitative research
methodology, to begin in fall 2005.  The positions will be at the
Assistant level.  Candidates should have a strong potential for
statistical computing or intramural research support and statistical
consulting, be interested in involving graduate and undergraduate
students in their research, have completed their Ph.D. or a comparable
degree by June 2005, and be citizens of the European Union.

We seek candidates who can teach a graduate course in advanced applied
statistics and quantitative research methodology and other courses at
the graduate and undergraduate level, and who can mentor students in
undergraduate and graduate research projects and in Master's and Ph.D.
theses.  Candidates should have a strong background in one of the
following areas: psychometrics, computational management or social
sciences, or information systems.  Desirable knowledge and skills include
topics such as statistical software development, quantitative research
methodology, and advanced applied statistical techniques.

The Department of Statistics and Mathematics at the Vienna University of
Economics and Business Administration (WU Wien) has a strong research
focus and currently has 14 full-time faculty members with substantial
graduate and undergraduate teaching responsibilities.  It maintains a leading
position in the development of R, a comprehensive open source
environment for statistical computing.  WU Wien
(http://www.wu-wien.ac.at/english/about) is one of the leading Central
European institutions for international business education with about
20,000 students and more than 1,000 full-time and adjunct faculty and
staff members.

Applicants should submit a letter of interest (with reference numbers
43448 [6-yr position] or 42948 [4-yr position]), current vitae, recent
papers, etc., by July 18, 2005 to

  PERSONALABTEILUNG
  Wirtschaftsuniversitaet Wien
  Augasse 2-6
  1090 Vienna
  Austria


Kurt Hornik, Chair
Department of Statistics and Mathematics
Wirtschaftsuniversitaet Wien



Re: [R] algorithms for matching and Hungarian method

2005-02-14 Thread Kurt Hornik
 Martin Olivier writes:

 Hi all,

 I would like to match two partitions. That is, even if exactly the
 same objects are grouped together in both partitions, the labels may
 be arbitrarily permuted, so I would like to know the correspondence
 between the groups of the two clusterings.

 In the e1071 package, it is possible to use the function
 matchClasses() for this problem. The problem is that for k greater
 than 10 (k being the number of classes), I have a memory problem. So I
 would like to know whether this function explicitly examines all k!
 possible matches, or whether it uses the Hungarian method (or another
 optimal algorithm).  If not, do you know where I can find one?

e1071::matchClasses() does not use the Hungarian method.

However, clue::solve_LSAP() provides an implementation of the Hungarian
method for solving the linear sum assignment problem (LSAP).

Of course, you can also use the Simplex algorithm for solving the LSAP,
and in fact lpSolve::lp.assign() does that for you.
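For illustration, the following is a compact base-R sketch of the Hungarian (Kuhn-Munkres) algorithm with dual potentials, applied to the cross-tabulation of two partitions. hungarian_min is a hypothetical name and this is not clue's code; in practice clue::solve_LSAP() on the same agreement table (with maximum = TRUE) is the tested route. Maximizing agreement is turned into a minimum-cost problem via max(tab) - tab:

```r
# Hypothetical base-R sketch of the Hungarian algorithm with dual potentials,
# O(n^2 m). Minimizes sum(a[i, perm[i]]); requires nrow(a) <= ncol(a).
hungarian_min <- function(a) {
  a <- as.matrix(a)
  n <- nrow(a); m <- ncol(a)
  stopifnot(n <= m)
  # Slot k + 1 holds index k; index 0 is a dummy row/column
  u <- numeric(n + 1); v <- numeric(m + 1)
  p <- integer(m + 1)    # p[j + 1]: row currently matched to column j
  way <- integer(m + 1)  # predecessor column on the augmenting path
  for (i in seq_len(n)) {
    p[1] <- i
    j0 <- 0L
    minv <- rep(Inf, m + 1)
    used <- rep(FALSE, m + 1)
    repeat {                             # grow the tree until a free column
      used[j0 + 1] <- TRUE
      i0 <- p[j0 + 1]
      delta <- Inf
      j1 <- 0L
      for (j in seq_len(m)) {
        if (!used[j + 1]) {
          cur <- a[i0, j] - u[i0 + 1] - v[j + 1]   # reduced cost
          if (cur < minv[j + 1]) { minv[j + 1] <- cur; way[j + 1] <- j0 }
          if (minv[j + 1] < delta) { delta <- minv[j + 1]; j1 <- j }
        }
      }
      for (j in 0:m) {                   # shift potentials by delta
        if (used[j + 1]) {
          u[p[j + 1] + 1] <- u[p[j + 1] + 1] + delta
          v[j + 1] <- v[j + 1] - delta
        } else {
          minv[j + 1] <- minv[j + 1] - delta
        }
      }
      j0 <- j1
      if (p[j0 + 1] == 0L) break
    }
    repeat {                             # flip the matching along the path
      j1 <- way[j0 + 1]
      p[j0 + 1] <- p[j1 + 1]
      j0 <- j1
      if (j0 == 0L) break
    }
  }
  perm <- integer(n)
  for (j in seq_len(m)) if (p[j + 1] != 0L) perm[p[j + 1]] <- j
  perm
}

# Two partitions of the same 8 objects with permuted labels
p1 <- c(1, 1, 1, 2, 2, 3, 3, 3)
p2 <- c(2, 2, 2, 3, 3, 1, 1, 1)
tab <- table(p1, p2)                   # agreement counts
perm <- hungarian_min(max(tab) - tab)  # maximize agreement
perm                                   # 2 3 1: class i of p1 ~ class perm[i] of p2
```

Unlike enumerating all k! permutations, this runs in polynomial time, so k well beyond 10 is unproblematic.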

Hth
-k
