Re: [R] Classification

2007-07-19 Thread Michal Kneifl
For all who sent help on topic Classification:

Thank you very much, folks.
I now have some ideas on how to solve this task.

Michael

----- Original Message -----
From: "Marc Schwartz" <[EMAIL PROTECTED]>
To: "Ing. Michal Kneifl, Ph.D." <[EMAIL PROTECTED]>
Cc: 
Sent: Wednesday, July 18, 2007 7:53 PM
Subject: Re: [R] Classification


> On Wed, 2007-07-18 at 19:36 +0200, Ing. Michal Kneifl, Ph.D. wrote:
>> Hi,
>> I am also a quite new user of R and would like to ask you for help:
>> I have a data frame where all columns are numeric variables. My aim is  
>> to convert one column into a factor.
>> Example:
>> MD
>> 0.2
>> 0.1
>> 0.8
>> 0.3
>> 0.7
>> 0.6
>> 0.01
>> 0.2
>> 0.5
>> 1
>> 1
>> 
>> 
>> I want to make classes:
>> 0-0.2 A
>> 0.21-0.4 B
>> 0.41-0.6 C
>> . and so on
>> 
>> So after classification I will get:
>> MD
>> A
>> A
>> D
>> B
>> .
>> .
>> .
>> and so on
>> 
>> Please could you give some advice to a newbie?
>> Thanks a lot in advance..
>> 
>> Michael
> 
> See ?cut
> 
> You can then do something like:
> 
>> DF
> MD
> 1  0.20
> 2  0.10
> 3  0.80
> 4  0.30
> 5  0.70
> 6  0.60
> 7  0.01
> 8  0.20
> 9  0.50
> 10 1.00
> 11 1.00
> 
> 
>> cut(DF$MD, breaks = c(seq(0, 1, .2)), labels = LETTERS[1:5])
> [1] A A D B D C A A C E E
> Levels: A B C D E
> 
> 
> HTH,
> 
> Marc Schwartz
> 
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Classification

2007-07-18 Thread Marc Schwartz
On Wed, 2007-07-18 at 12:53 -0500, Marc Schwartz wrote:
> On Wed, 2007-07-18 at 19:36 +0200, Ing. Michal Kneifl, Ph.D. wrote:
> > Hi,
> > I am also a quite new user of R and would like to ask you for help:
> > I have a data frame where all columns are numeric variables. My aim is  
> > to convert one column into a factor.
> > Example:
> > MD
> > 0.2
> > 0.1
> > 0.8
> > 0.3
> > 0.7
> > 0.6
> > 0.01
> > 0.2
> > 0.5
> > 1
> > 1
> > 
> > 
> > I want to make classes:
> > 0-0.2 A
> > 0.21-0.4 B
> > 0.41-0.6 C
> > . and so on
> > 
> > So after classification I will get:
> > MD
> > A
> > A
> > D
> > B
> > .
> > .
> > .
> > and so on
> > 
> > Please could you give some advice to a newbie?
> > Thanks a lot in advance..
> > 
> > Michael
> 
> See ?cut
> 
> You can then do something like:
> 
> > DF
>  MD
> 1  0.20
> 2  0.10
> 3  0.80
> 4  0.30
> 5  0.70
> 6  0.60
> 7  0.01
> 8  0.20
> 9  0.50
> 10 1.00
> 11 1.00
> 
> 
> > cut(DF$MD, breaks = c(seq(0, 1, .2)), labels = LETTERS[1:5])
>  [1] A A D B D C A A C E E
> Levels: A B C D E

For precision, let's clean that up as I just realized that I left the
remnants of c() in there from an alternative solution, which is not
needed here:

  cut(DF$MD, breaks = seq(0, 1, .2), labels = LETTERS[1:5])

Marc
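
A minimal sketch of the cut() approach above, with two assumptions that
are not part of the original post: the result is stored in a new column
named `class`, and one extra observation of exactly 0 is appended for
illustration. By default cut() builds intervals that are open on the
left, so a value equal to the lowest break is not classified unless
include.lowest = TRUE is given.

```r
# Example data from the thread, plus one illustrative value of exactly 0
DF <- data.frame(MD = c(0.2, 0.1, 0.8, 0.3, 0.7, 0.6, 0.01, 0.2, 0.5, 1, 1, 0))

brks <- seq(0, 1, 0.2)

# Default: intervals are (0,0.2], (0.2,0.4], ...; an exact 0 falls outside
default_class <- cut(DF$MD, breaks = brks, labels = LETTERS[1:5])

# include.lowest = TRUE closes the first interval: [0,0.2]
DF$class <- cut(DF$MD, breaks = brks, labels = LETTERS[1:5],
                include.lowest = TRUE)

is.na(default_class[12])   # TRUE: the exact 0 was not classified
table(DF$class)            # counts per class, now including the 0
```

Whether the first interval should be closed depends on whether exact
zeros can occur in the real data.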



Re: [R] Classification

2007-07-18 Thread jim holtman
You can use 'cut':

> x
 MD
1  0.20
2  0.10
3  0.80
4  0.30
5  0.70
6  0.60
7  0.01
8  0.20
9  0.50
10 1.00
11 1.00
> cut(x$MD, breaks=seq(0,1,.2), include.lowest=TRUE, labels=LETTERS[1:5])
 [1] A A D B D C A A C E E
Levels: A B C D E
>


On 7/18/07, Ing. Michal Kneifl, Ph.D. <[EMAIL PROTECTED]> wrote:
> Hi,
> I am also a quite new user of R and would like to ask you for help:
> I have a data frame where all columns are numeric variables. My aim is
> to convert one column into a factor.
> Example:
> MD
> 0.2
> 0.1
> 0.8
> 0.3
> 0.7
> 0.6
> 0.01
> 0.2
> 0.5
> 1
> 1
>
>
> I want to make classes:
> 0-0.2 A
> 0.21-0.4 B
> 0.41-0.6 C
> . and so on
>
> So after classification I will get:
> MD
> A
> A
> D
> B
> .
> .
> .
> and so on
>
> Please could you give some advice to a newbie?
> Thanks a lot in advance..
>
> Michael
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] Classification

2007-07-18 Thread Marc Schwartz
On Wed, 2007-07-18 at 19:36 +0200, Ing. Michal Kneifl, Ph.D. wrote:
> Hi,
> I am also a quite new user of R and would like to ask you for help:
> I have a data frame where all columns are numeric variables. My aim is  
> to convert one column into a factor.
> Example:
> MD
> 0.2
> 0.1
> 0.8
> 0.3
> 0.7
> 0.6
> 0.01
> 0.2
> 0.5
> 1
> 1
> 
> 
> I want to make classes:
> 0-0.2 A
> 0.21-0.4 B
> 0.41-0.6 C
> . and so on
> 
> So after classification I will get:
> MD
> A
> A
> D
> B
> .
> .
> .
> and so on
> 
> Please could you give some advice to a newbie?
> Thanks a lot in advance..
> 
> Michael

See ?cut

You can then do something like:

> DF
 MD
1  0.20
2  0.10
3  0.80
4  0.30
5  0.70
6  0.60
7  0.01
8  0.20
9  0.50
10 1.00
11 1.00


> cut(DF$MD, breaks = c(seq(0, 1, .2)), labels = LETTERS[1:5])
 [1] A A D B D C A A C E E
Levels: A B C D E


HTH,

Marc Schwartz



Re: [R] Classification

2007-07-18 Thread Benilton Carvalho
maybe:

x = c(.2, .1, .8, .3, .7, .6, .01, .2, .5, 1, 1)
breaks = seq(0, 1, .2)
LETTERS[1:(length(breaks)-1)][cut(x, breaks)]

b
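
A short sketch of why the one-liner above works (the names `f`, `labs`
and `cls` are introduced here only for illustration): when a factor is
used as an index, R uses its integer codes, so the first interval maps
to "A", the second to "B", and so on. The result is a character vector
rather than a factor.

```r
x <- c(.2, .1, .8, .3, .7, .6, .01, .2, .5, 1, 1)
breaks <- seq(0, 1, .2)

f <- cut(x, breaks)                          # factor with interval labels
labs <- LETTERS[seq_len(length(breaks) - 1)] # one letter per interval

# Indexing with a factor uses its integer codes:
# interval 1 -> "A", interval 2 -> "B", ...
cls <- labs[f]

class(cls)   # "character"
cls
```

If a factor is wanted at the end, passing labels = labs to cut()
directly is probably the simpler route.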

On Jul 18, 2007, at 1:50 PM, Doran, Harold wrote:

> Michael
>
> Assume your data frame is called "data" and your variable is called
> "V1". Converting this to a factor is:
>
> data$V1 <- factor(data$V1)
>
> Creating the classes can be done using ifelse(). Something like
>
> data$class <- ifelse(data$V1 < .21, "A", ifelse(data$V1 < .41, "B", "C"))
>
> Harold
>
>
>> -Original Message-
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Ing.
>> Michal Kneifl, Ph.D.
>> Sent: Wednesday, July 18, 2007 1:37 PM
>> To: r-help@stat.math.ethz.ch
>> Subject: [R] Classification
>>
>> Hi,
>> I am also a quite new user of R and would like to ask you for help:
>> I have a data frame where all columns are numeric variables.
>> My aim is to convert one column into a factor.
>> Example:
>> MD
>> 0.2
>> 0.1
>> 0.8
>> 0.3
>> 0.7
>> 0.6
>> 0.01
>> 0.2
>> 0.5
>> 1
>> 1
>>
>>
>> I want to make classes:
>> 0-0.2 A
>> 0.21-0.4 B
>> 0.41-0.6 C
>> . and so on
>>
>> So after classification I will get:
>> MD
>> A
>> A
>> D
>> B
>> .
>> .
>> .
>> and so on
>>
>> Please could you give some advice to a newbie?
>> Thanks a lot in advance..
>>
>> Michael
>>


Re: [R] Classification

2007-07-18 Thread John Kane
Have a look at the recode function in the car package

library(car)
?recode
 should give you what you need.


--- "Ing. Michal Kneifl, Ph.D." <[EMAIL PROTECTED]>
wrote:

> Hi,
> I am also a quite new user of R and would like to
> ask you for help:
> I have a data frame where all columns are numeric
> variables. My aim is  
> to convert one column into a factor.
> Example:
> MD
> 0.2
> 0.1
> 0.8
> 0.3
> 0.7
> 0.6
> 0.01
> 0.2
> 0.5
> 1
> 1
> 
> 
> I want to make classes:
> 0-0.2 A
> 0.21-0.4 B
> 0.41-0.6 C
> . and so on
> 
> So after classification I will get:
> MD
> A
> A
> D
> B
> .
> .
> .
> and so on
> 
> Please could you give some advice to a newbie?
> Thanks a lot in advance..
> 
> Michael
> 


Re: [R] Classification

2007-07-18 Thread Doran, Harold
Michael

Assume your data frame is called "data" and your variable is called
"V1". Converting this to a factor is:

data$V1 <- factor(data$V1) 

Creating the classes can be done using ifelse(). Something like

data$class <- ifelse(data$V1 < .21, "A", ifelse(data$V1 < .41, "B", "C"))

Harold
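
A runnable sketch of the nested ifelse() idea, under some assumptions:
the data frame below is hypothetical and reuses the names "data" and
"V1" from the message, and the cut-offs .61 and .81 are an extrapolation
of the "and so on" in the question. The comparison is done while V1 is
still numeric, since "<" is not meaningful for a factor; only the
resulting labels are turned into a factor.

```r
# Hypothetical data frame mirroring the names used above
data <- data.frame(V1 = c(0.2, 0.1, 0.8, 0.3, 0.7, 0.6, 0.01, 0.2, 0.5, 1, 1))

# Compare while V1 is still numeric; the labels must be quoted strings
data$class <- ifelse(data$V1 < 0.21, "A",
              ifelse(data$V1 < 0.41, "B",
              ifelse(data$V1 < 0.61, "C",
              ifelse(data$V1 < 0.81, "D", "E"))))

# Convert the result (not the numeric input) to a factor
data$class <- factor(data$class, levels = LETTERS[1:5])

data
```

With more than a handful of classes the nesting gets hard to read, which
is one reason cut() is usually preferred for this task.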


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Ing. 
> Michal Kneifl, Ph.D.
> Sent: Wednesday, July 18, 2007 1:37 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Classification
> 
> Hi,
> I am also a quite new user of R and would like to ask you for help:
> I have a data frame where all columns are numeric variables. 
> My aim is to convert one column into a factor.
> Example:
> MD
> 0.2
> 0.1
> 0.8
> 0.3
> 0.7
> 0.6
> 0.01
> 0.2
> 0.5
> 1
> 1
> 
> 
> I want to make classes:
> 0-0.2 A
> 0.21-0.4 B
> 0.41-0.6 C
> . and so on
> 
> So after classification I will get:
> MD
> A
> A
> D
> B
> .
> .
> .
> and so on
> 
> Please could you give some advice to a newbie?
> Thanks a lot in advance..
> 
> Michael
> 


Re: [R] classification tables

2006-08-07 Thread Gabor Grothendieck
Also check out CrossTable in the gmodels package.

Regarding your other question, assuming we have
tab<-table(x,y) as in Philippe's post, the fraction of
pairs in x and y that match can be calculated via
any of these:

  sum(x==y) / length(x)

  sum(diag(tab)) / sum(tab)

  library(e1071)
  classAgreement(tab) # tab from above

  sum(diag(prop.table(tab)))
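
As a base-R sketch of the first two calculations, using the x and y from
the original question: the names `lev`, `acc_direct` and `acc_table` are
illustrative only. Building the table from factors with shared levels is
an added precaution, not something the posts above require; it keeps the
table square when a class happens to be absent from one vector, so that
diag() still lines up the matching classes.

```r
x <- c(1, 2, 3, 4, 2, 3, 3, 1, 2, 3)   # true classes
y <- c(2, 1, 3, 4, 1, 3, 3, 2, 2, 3)   # estimated classes

# Shared levels keep the table square even if a class is missing in x or y
lev <- sort(union(x, y))
tab <- table(factor(x, levels = lev), factor(y, levels = lev))

acc_direct <- mean(x == y)               # same as sum(x == y) / length(x)
acc_table  <- sum(diag(tab)) / sum(tab)  # matches on the diagonal

acc_direct   # 0.6
acc_table    # 0.6
```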


On 8/7/06, Philippe Grosjean <[EMAIL PROTECTED]> wrote:
>
>  > x <- c(1,2,3,4,2,3,3,1,2,3)
>  > y <- c(2,1,3,4,1,3,3,2,2,3)
>  > table(x, y)
>     y
> x   1 2 3 4
>   1 0 2 0 0
>   2 2 1 0 0
>   3 0 0 4 0
>   4 0 0 0 1
>  > ?table
>
> Best,
>
> Philippe Grosjean
>
> ..<°}))><
>  ) ) ) ) )
> ( ( ( ( (Prof. Philippe Grosjean
>  ) ) ) ) )
> ( ( ( ( (Numerical Ecology of Aquatic Systems
>  ) ) ) ) )   Mons-Hainaut University, Belgium
> ( ( ( ( (
> ..
>
> Taka Matzmoto wrote:
> > Dear R-users
> >
> > I have two vectors. One vector includes true values and the other vector has
> > estimated values. Values are all integers from 1 to 4.
> >
> > For example,
> >
> > x <- c(1,2,3,4,2,3,3,1,2,3)
> > y <- c(2,1,3,4,1,3,3,2,2,3)
> >
> > I would like to create a classification table of x by y. With the table, I would like
> > to calculate the percentage of correct classifications.
> >
> > Which R function do I need to use for creating a 4 * 4 classification table?
> >
> > Thank you.
> >
> > Taka,
> >


Re: [R] classification tables

2006-08-06 Thread Philippe Grosjean

 > x <- c(1,2,3,4,2,3,3,1,2,3)
 > y <- c(2,1,3,4,1,3,3,2,2,3)
 > table(x, y)
   y
x   1 2 3 4
  1 0 2 0 0
  2 2 1 0 0
  3 0 0 4 0
  4 0 0 0 1
 > ?table

Best,

Philippe Grosjean

..<°}))><
  ) ) ) ) )
( ( ( ( (Prof. Philippe Grosjean
  ) ) ) ) )
( ( ( ( (Numerical Ecology of Aquatic Systems
  ) ) ) ) )   Mons-Hainaut University, Belgium
( ( ( ( (
..

Taka Matzmoto wrote:
> Dear R-users
> 
> I have two vectors. One vector includes true values and the other vector has 
> estimated values. Values are all integers from 1 to 4.
> 
> For example,
> 
> x <- c(1,2,3,4,2,3,3,1,2,3)
> y <- c(2,1,3,4,1,3,3,2,2,3)
> 
> I would like to create a classification table of x by y. With the table, I would like
> to calculate the percentage of correct classifications.
> 
> Which R function do I need to use for creating a 4 * 4 classification table?
> 
> Thank you.
> 
> Taka,
> 


Re: [R] Classification trees and written conditions

2006-05-18 Thread Paul Smith
On 5/18/06, Carlos Ortega <[EMAIL PROTECTED]> wrote:
> Yes, that is right.
> The conditions on top of the branches refer to the left-hand side.

Thanks, Carlos. Then it should be stated explicitly in ?text.rpart.

Paul
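
One way to check this for oneself, sketched here with the kyphosis
example data that comes with rpart (the choice of data set and formula
is an assumption, not from the thread): print() lists every node
together with the condition that leads to it, and node 2 is the left
child of the root, so its condition can be compared with the label drawn
at the top split.

```r
library(rpart)   # recommended package shipped with R

# kyphosis is an example data set included with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# The text listing names each child node by the condition leading to it;
# node 2 is the left child of the root, node 3 the right child
print(fit)

plot(fit, margin = 0.1)
text(fit, use.n = TRUE)   # compare the label at the root with node 2 above
```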



Re: [R] Classification trees and written conditions

2006-05-18 Thread Paul Smith
On 5/18/06, Carlos Ortega <[EMAIL PROTECTED]> wrote:
> Are you referring to ?:
>  - library(tree)
> -  library(rpart)
>
> On 5/18/06, Paul Smith <[EMAIL PROTECTED]> wrote:
> >
> Dear All
>
> When drawing a classification tree with
>
> plot(mytree)
> text(mytree)
>
> the conditions are written just before the nodes branch. My question
> is: can one be certain that those conditions refer to the left-side
> branches? (The R documentation surprisingly lacks the information that
> I am asking for.)

Thanks, Carlos. I am referring to

library(rpart)

Paul



Re: [R] Classification of Imbalanced Data

2006-02-06 Thread Liaw, Andy
The implementation of weighted RF is still on the to-do list for the
package.  Use Breiman & Cutler's Fortran code for now.

Andy

From: [EMAIL PROTECTED]
> 
> Hi,
> I'm looking to perform  a classification analysis on an 
> imbalanced data 
> set  using random Forest and I'd like to reproduce the 
> weighted random 
> forest analysis proposed in the Chen, Liaw & Breiman paper 
> "Using Random 
> Forest to Learn Imbalanced Data"; can I use the R package 
> randomForest 
> to perform such analysis? What is the easiest way to 
> accomplish this task?
> Thanks,
> Paolo Sonego
> 


Re: [R] Classification tree data structure

2005-10-18 Thread Hades

   
That's most helpful.  Thank you very much for your time.

Best regards,
Maria

On Tue Oct 18 17:18, Prof Brian Ripley <[EMAIL PROTECTED]> sent:

> On Tue, 18 Oct 2005, Hades wrote:
>
> > Hi there,
> >
> > I am growing classification trees using the 'tree' package add-on to R.
> >
> > I would like to convert the 'R' output to the SAS format used by Salford
> > Systems' commercial CART software in order to interface with some
> > other software.
> >
> > My question is:
> > How can I parse the R tree data structure in order to infer the tree
> > structure?  The 'tree' class has a member '$frame' which gives the
> > splits at each node, but as far as I can see does not specify the
> > daughter nodes.  Is this information accessible through the interface to
> > class 'tree' or do I need to dive into the C code?
>
> The daughter nodes of n are 2n and 2n+1.  The print method, print.tree, is
> written entirely in R and shows you how to parse the tree (and you can see
> the pattern of the numbers from its result).
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595



Re: [R] Classification tree data structure

2005-10-18 Thread Prof Brian Ripley
On Tue, 18 Oct 2005, Hades wrote:

> Hi there,
>
> I am growing classification trees using the 'tree' package add-on to R.
>
> I would like to convert the 'R' output to the SAS format used by Salford 
> Systems' commercial CART software in order to interface with some
> other software.
>
> My question is:

> How can I parse the R tree data structure in order to infer the tree 
> structure?  The 'tree' class has a member '$frame' which gives the 
> splits at each node, but as far as I can see does not specify the 
> daughter nodes.  Is this information accessible through the interface to 
> class 'tree' or do I need to dive into the C code?

The daughter nodes of n are 2n and 2n+1.  The print method, print.tree, is 
written entirely in R and shows you how to parse the tree (and you can see 
the pattern of the numbers from its result).
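
The numbering rule can be sketched in a few lines of base R. The node
numbers below are hypothetical, standing in for what one would read from
rownames() of the fitted tree's $frame; the helper names `children` and
`parent` are likewise made up for this illustration.

```r
# Hypothetical node numbers, as they might appear in rownames(fit$frame)
nodes <- c(1, 2, 4, 5, 3, 6, 7, 14, 15)

# Daughters of node n are 2n and 2n+1, if they exist in the tree
children <- function(n, nodes) intersect(c(2 * n, 2 * n + 1), nodes)

# Conversely, the parent of node n is n %/% 2 (the root has none)
parent <- function(n) if (n == 1) NA else n %/% 2

children(1, nodes)   # 2 3
children(7, nodes)   # 14 15
children(4, nodes)   # numeric(0): node 4 is a leaf
parent(15)           # 7
```

Walking the tree from node 1 with children() gives the full structure
needed for an export to another format.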

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



RE: [R] Classification of an image

2005-04-01 Thread Berton Gunter
Search CRAN!

-- Bert Gunter
 
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Poizot Emmanuel
> Sent: Thursday, March 31, 2005 11:59 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Classification of an image
> 
> Dear all,
> 
> I need to do an automatic classification of a raster file 
> (image) using 
> training samples. I would like to know if there is a library 
> able to do 
> such work.
> 
> Thanks
> 
> 
> Emmanuel Poizot
> 


Re: [R] classification using logistic regression

2004-12-27 Thread Kevin Wang
Hi,

On Mon, 27 Dec 2004, Rajdeep Das wrote:

> I would like to do classification using logistic regression. Which R package 
> can I use?

Have you tried the glm() function?

> Also is there any package for feature selection for logistic regression based 
> method?

Do you mean model selection methods like forward selection?  If so, try
step()
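
A minimal sketch of both suggestions, using the mtcars data that ships
with R. The choice of response (am) and predictors (wt, hp), the 0.5
cut-off and the object names are assumptions made for illustration only.

```r
# am (0 = automatic, 1 = manual) is the binary response
full <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# AIC-based stepwise selection; trace = 0 suppresses the step-by-step log
sel <- step(full, trace = 0)

# Predicted probabilities, then a class label using a 0.5 cut-off
prob <- predict(sel, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Cross-tabulate observed against predicted classes
table(observed = mtcars$am, predicted = pred)
```

Note that this evaluates the classifier on the same data it was fitted
to, so the table is an optimistic picture of its performance.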

HTH,

Kevin


Ko-Kang Kevin Wang
PhD Student
Centre for Mathematics and its Applications
Building 27, Room 1004
Mathematical Sciences Institute (MSI)
Australian National University
Canberra, ACT 0200
Australia

Homepage: http://wwwmaths.anu.edu.au/~wangk/
Ph (W): +61-2-6125-2431
Ph (H): +61-2-6125-7407
Ph (M): +61-40-451-8301



Re: [R] classification for huge datasets: SVM yields memory troubles

2004-12-14 Thread John Maindonald
While it is true that the large number of variables relative to
the number of observations restricts what can be inferred,
the situation is not as hopeless as Bert seems to suggest.
If it were, attempts at the analysis of expression array data
would be a waste of time.  Methods developed for that
general area may well be relevant to other data where the
number of variables is similarly far larger than the number
of observations.

See Ambroise, C. and McLachlan, G.J. 2002.  Selection bias
in gene extraction on the basis of microarray gene-expression
data.  PNAS 99: 6562--6566.
This discusses some of the literature on the use of SVMs.

The selection bias that these authors discuss also affects
plots, even principal components and other ordination-based
plots where features have been selected on the basis of their
ability to separate into known groups.  I have draft versions
of code that addresses this selection bias as it affects the
plotting of graphs, which (along with a paper that has been
submitted for inclusion in a conference proceedings) I am
happy to make available to anyone who wants to experiment.

Another good place to look, as a starting point, may be
Gordon Smyth's LIMMA User's Guide.  This can be a bit
hard to find. With limma installed, type help.start().
After some time a browser window should open. Click on
Packages | limma | Overview | LIMMA User's Guide (pdf)

John Maindonald email: [EMAIL PROTECTED]
phone: +61 2 (6125)3473   fax: +61 2 (6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 14 Dec 2004, at 10:09 PM, [EMAIL PROTECTED] wrote:

> From: Berton Gunter <[EMAIL PROTECTED]>
> Date: 14 December 2004 9:23:08 AM
> To: "'Andreas'" <[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
> Subject: RE: [R] classification for huge datasets: SVM yields
> memory troubles
>
> " I have a matrix with 30 observations and roughly 3
> variables, ... "
>
> Comment: This is ** not ** a "huge" data set -- it is a tiny one with a
> large number of covariates. The difference is: If it were truly huge, SVM
> and/or LDA or ... might actually be able to produce useful results. With so
> few data and so many variables, it is hard to see how any approach that one
> uses is not simply a fancy random number generator.



RE: [R] classification for huge datasets: SVM yields memory troubles

2004-12-13 Thread Berton Gunter
" I have a matrix with 30 observations and roughly 3 
variables, ... " 

Comment: This is ** not ** a "huge" data set -- it is a tiny one with a
large number of covariates. The difference is: If it were truly huge, SVM
and/or LDA or ... might actually be able to produce useful results. With so
few data and so many variables, it is hard to see how any approach that one
uses is not simply a fancy random number generator.


-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Andreas
> Sent: Monday, December 13, 2004 12:56 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [R] classification for huge datasets: SVM yields 
> memory troubles
> 
> Hi,
> 
> I'm a beginner in the SVM-module but I have seen there is a 
> parameter called:
> cachesize #cache memory in MB (default 40)
> 
> please let me know if this parameter solved your problem, I 
> might get the
> same number of samples in the near future.
> 
> regards Andreas
> 
> "Christoph Lehmann" <[EMAIL PROTECTED]> schrieb im Newsbeitrag
> news:[EMAIL PROTECTED]
> > Hi
> > I have a matrix with 30 observations and roughly 3 
> variables, each
> > obs belongs to one of two groups. With svm and slda I get 
> into memory
> > troubles ('cannot allocate vector of size' roughly 2G). PCA LDA runs
> > fine. Is there any way to avoid the memory issue with SVMs? Or can you
> > recommend any other classification method for such huge datasets?
> >
> >
> > P.S. I run suse 9.1 on a 2G RAM PIV machine.
> > thanks for a hint
> >
> > Christoph
> >

Re: [R] classification for huge datasets: SVM yields memory troubles

2004-12-13 Thread Andreas
Hi,

I'm a beginner in the SVM-module but I have seen there is a parameter called:
cachesize #cache memory in MB (default 40)

please let me know if this parameter solved your problem, I might get the
same number of samples in the near future.

regards Andreas

"Christoph Lehmann" <[EMAIL PROTECTED]> schrieb im Newsbeitrag
news:[EMAIL PROTECTED]
> Hi
> I have a matrix with 30 observations and roughly 3 variables, each
> obs belongs to one of two groups. With svm and slda I get into memory
> troubles ('cannot allocate vector of size' roughly 2G). PCA LDA runs
> fine. Is there any way to avoid the memory issue with SVMs? Or can you
> recommend any other classification method for such huge datasets?
>
>
> P.S. I run suse 9.1 on a 2G RAM PIV machine.
> thanks for a hint
>
> Christoph
>


Re: [R] classification trees

2004-07-30 Thread Uwe Ligges
[EMAIL PROTECTED] wrote:
> I'm working with S-Plus 6 in Windows.  Does anyone know if the prune.tree or
> prune.misclass function automatically cross-validates or do you have to use
> cv.tree if you want to do cross-validation?

This mailing list is about R. There is, e.g., the s-news list for
questions related to S-PLUS.

Uwe Ligges

> Heather


Re: [R] classification and association rules in R

2004-04-19 Thread Jason Turner
Rong-En Fan wrote:

> By the way, I heard that there are some people developing a better
> search interface for R (or CRAN?). Where are the related information
> I can get?

Strangely enough, by following the "Search" link on CRAN.

Jason



Re: [R] classification with quantitative variables

2003-08-12 Thread Martin Maechler
> "OlivierM" == Martin Olivier <[EMAIL PROTECTED]>
> on Tue, 12 Aug 2003 14:45:58 + writes:

OlivierM> I want to conduct a cluster analysis with
OlivierM> quantitative variables.  More precisely, it
OlivierM> concerns binary and non-ordered categorical
OlivierM> variables. For such data, various similarity
OlivierM> measures have been proposed, such as the Jaccard
OlivierM> index or the simple matching index.

OlivierM> So, is there a package such as mva or multiv in
OlivierM> the case of quantitative variables?  Could you
OlivierM> indicate me reviews, papers or technical reports
OlivierM> dealing with this problem?

The package 'cluster' has a function daisy() that allows you to work
with combinations of "all" kinds of variables.
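
A small sketch of how daisy() might be used for this kind of data. The
data frame, its column names and the choice of average-linkage
clustering are all made up for illustration. Unordered factors are
treated as nominal variables; declaring the 0/1 column as asymmetric
binary is, as far as the daisy() documentation describes it, the way to
have joint absences ignored, in the spirit of the Jaccard index.

```r
library(cluster)   # recommended package shipped with R

# Hypothetical data: two unordered categorical variables and one 0/1 variable
d <- data.frame(
  colour = factor(c("red", "blue", "red", "green", "blue", "green")),
  shape  = factor(c("round", "round", "square", "square", "round", "square")),
  winged = c(1, 0, 1, 0, 0, 1)
)

# Gower's coefficient handles mixed variable types; 'winged' is declared
# as an asymmetric binary variable
diss <- daisy(d, metric = "gower", type = list(asymm = "winged"))

# The dissimilarities can be passed on to a standard clustering routine
hc <- hclust(as.dist(diss), method = "average")
groups <- cutree(hc, k = 2)

groups
```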

Note that I think you mistyped 
"quantitative" where you meant
"qualitative".

Regards,
Martin Maechler <[EMAIL PROTECTED]> http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16    Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: x-41-1-632-3408  fax: ...-1228   <><
