Re: [R] levels

2020-07-19 Thread Chris Gordon-Smith
There is an interesting item on stringsAsFactors in this useR! 2020 session:

https://www.youtube.com/watch?v=X_eDHNVceCU&feature=youtu.be

It's about 27 minutes in.

Chris Gordon-Smith

On 15/07/2020 17:16, Marc Schwartz via R-help wrote:
>> On Jul 15, 2020, at 4:31 AM, andy elprama  wrote:
>>
>> Dear R-users,
>>
>> Something strange happened within the command "levels"
>>
>> R version 3.6.1
>> name <- c("a","b","c")
>> values <- c(1,2,3)
>> data <- data.frame(name,values)
>> levels(data$name)
>> [1] "a" "b" "c"
>>
>> R version 4.0
>> name <- c("a","b","c")
>> values <- c(1,2,3)
>> data <- data.frame(name,values)
>> levels(data$name)
>> [1] NULL
>>
>> What is happening here?
>
> Hi,
>
> The default value for 'stringsAsFactors' for data.frame() and read.table() 
> changed from TRUE to FALSE in version 4.0.0, per the news() file:
>
> "R now uses a stringsAsFactors = FALSE default, and hence by default no 
> longer converts strings to factors in calls to data.frame() and read.table()."
>
>
> Using 4.0.2:
>
> data <- data.frame(name, values, stringsAsFactors = TRUE)
>
>> levels(data$name)
> [1] "a" "b" "c"
>
>
> If you see behavioral changes from one version of R to another, especially 
> major version increments, check the news() file.
>
> Regards,
>
> Marc Schwartz
>
>   
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] levels

2020-07-18 Thread andy elprama
Thanks, I will check it out.

On Sat, 18 Jul 2020 at 00:47, Chris Gordon-Smith <
c.gordonsm...@gmail.com> wrote:

> There is an interesting item on stringsAsFactors in this useR! 2020
> session:
>
> https://www.youtube.com/watch?v=X_eDHNVceCU&feature=youtu.be
>
> It's about 27 minutes in.
>
> Chris Gordon-Smith
> On 15/07/2020 17:16, Marc Schwartz via R-help wrote:
>
> On Jul 15, 2020, at 4:31 AM, andy elprama  
>  wrote:
>
> Dear R-users,
>
> Something strange happened within the command "levels"
>
> R version 3.6.1
> name <- c("a","b","c")
> values <- c(1,2,3)
> data <- data.frame(name,values)
> levels(data$name)
> [1] "a" "b" "c"
>
> R version 4.0
> name <- c("a","b","c")
> values <- c(1,2,3)
> data <- data.frame(name,values)
> levels(data$name)
> [1] NULL
>
> What is happening here?
>
> Hi,
>
> The default value for 'stringsAsFactors' for data.frame() and read.table() 
> changed from TRUE to FALSE in version 4.0.0, per the news() file:
>
> "R now uses a stringsAsFactors = FALSE default, and hence by default no 
> longer converts strings to factors in calls to data.frame() and read.table()."
>
>
> Using 4.0.2:
>
> data <- data.frame(name, values, stringsAsFactors = TRUE)
>
>
> levels(data$name)
>
> [1] "a" "b" "c"
>
>
> If you see behavioral changes from one version of R to another, especially 
> major version increments, check the news() file.
>
> Regards,
>
> Marc Schwartz
>


Re: [R] levels

2020-07-15 Thread Erin Hodgess
Hi Andy:

I just checked in "options", and the following appears:

$stringsAsFactors
[1] FALSE

I think this might be it.

You may want to look at options() in R-3.6.1.

Thanks,
Erin


Erin Hodgess, PhD
mailto: erinm.hodg...@gmail.com


On Wed, Jul 15, 2020 at 9:45 AM andy elprama  wrote:

> Dear R-users,
>
> Something strange happened within the command "levels"
>
> R version 3.6.1
> name <- c("a","b","c")
> values <- c(1,2,3)
> data <- data.frame(name,values)
> levels(data$name)
> [1] "a" "b" "c"
>
> R version 4.0
> name <- c("a","b","c")
> values <- c(1,2,3)
> data <- data.frame(name,values)
> levels(data$name)
> [1] NULL
>
> What is happening here?


Re: [R] levels

2020-07-15 Thread Marc Schwartz via R-help


> On Jul 15, 2020, at 4:31 AM, andy elprama  wrote:
> 
> Dear R-users,
> 
> Something strange happened within the command "levels"
> 
> R version 3.6.1
> name <- c("a","b","c")
> values <- c(1,2,3)
> data <- data.frame(name,values)
> levels(data$name)
> [1] "a" "b" "c"
> 
> R version 4.0
> name <- c("a","b","c")
> values <- c(1,2,3)
> data <- data.frame(name,values)
> levels(data$name)
> [1] NULL
> 
> What is happening here?


Hi,

The default value for 'stringsAsFactors' for data.frame() and read.table() 
changed from TRUE to FALSE in version 4.0.0, per the news() file:

"R now uses a stringsAsFactors = FALSE default, and hence by default no longer 
converts strings to factors in calls to data.frame() and read.table()."


Using 4.0.2:

data <- data.frame(name, values, stringsAsFactors = TRUE)

> levels(data$name)
[1] "a" "b" "c"


If you see behavioral changes from one version of R to another, especially 
major version increments, check the news() file.
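
A minimal sketch of the point above, reusing the vectors from the original post. Passing stringsAsFactors explicitly should make the result independent of the default in whichever R version runs the code. The object names df_chr and df_fct are made up for illustration.

```r
name <- c("a", "b", "c")
values <- c(1, 2, 3)

# Explicit FALSE: 'name' stays a character vector, so it has no levels.
df_chr <- data.frame(name, values, stringsAsFactors = FALSE)
levels(df_chr$name)   # NULL

# Explicit TRUE: 'name' becomes a factor, as it did by default before R 4.0.0.
df_fct <- data.frame(name, values, stringsAsFactors = TRUE)
levels(df_fct$name)   # "a" "b" "c"
```

Spelling the argument out is slightly more typing, but scripts shared between 3.x and 4.x installations then behave the same way.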

Regards,

Marc Schwartz

 


Re: [R] levels

2020-07-15 Thread Jeff Newmiller
Read the NEWS about R4.0.0 [1] (search for stringsAsFactors), or read any of 
the many announcements in blogs and forums around the Internet.

[1] https://cran.r-project.org/doc/manuals/r-release/NEWS.html

On July 15, 2020 1:31:06 AM PDT, andy elprama  wrote:
>Dear R-users,
>
>Something strange happened within the command "levels"
>
>R version 3.6.1
>name <- c("a","b","c")
>values <- c(1,2,3)
>data <- data.frame(name,values)
>levels(data$name)
>[1] "a" "b" "c"
>
>R version 4.0
>name <- c("a","b","c")
>values <- c(1,2,3)
>data <- data.frame(name,values)
>levels(data$name)
>[1] NULL
>
>What is happening here?

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] levels

2020-07-15 Thread Eric Berger
Hi Andy,
I believe this is because R 4.0 has changed the default behavior of
data.frame().
Prior to 4.0, the default was stringsAsFactors=TRUE.
In 4.0, the default is stringsAsFactors=FALSE.

If you run your code in R 3.6.1 and change the command to

data <- data.frame(name,values,stringsAsFactors=FALSE)

you will get the same behavior as in R 4.0.
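
As a rough sketch of a related option, the data frame can be built with strings left as strings and the character columns converted to factors afterwards. This is only one possible approach; the helper vector is_chr is a made-up name.

```r
name <- c("a", "b", "c")
values <- c(1, 2, 3)
data <- data.frame(name, values, stringsAsFactors = FALSE)

# Find the character columns and convert only those to factors.
is_chr <- vapply(data, is.character, logical(1))
data[is_chr] <- lapply(data[is_chr], factor)

levels(data$name)        # "a" "b" "c"
is.numeric(data$values)  # TRUE: numeric columns are left untouched
```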

HTH,
Eric


On Wed, Jul 15, 2020 at 6:45 PM andy elprama  wrote:

> Dear R-users,
>
> Something strange happened within the command "levels"
>
> R version 3.6.1
> name <- c("a","b","c")
> values <- c(1,2,3)
> data <- data.frame(name,values)
> levels(data$name)
> [1] "a" "b" "c"
>
> R version 4.0
> name <- c("a","b","c")
> values <- c(1,2,3)
> data <- data.frame(name,values)
> levels(data$name)
> [1] NULL
>
> What is happening here?


[R] levels

2020-07-15 Thread andy elprama
Dear R-users,

Something strange happened within the command "levels"

R version 3.6.1
name <- c("a","b","c")
values <- c(1,2,3)
data <- data.frame(name,values)
levels(data$name)
[1] "a" "b" "c"

R version 4.0
name <- c("a","b","c")
values <- c(1,2,3)
data <- data.frame(name,values)
levels(data$name)
[1] NULL

What is happening here?



Re: [R] Levels of a factor

2013-07-25 Thread Borja Rivier
That makes sense.

Thanks all!


2013/7/24 David Carlson dcarl...@tamu.edu

 Benchmark is probably a subset from a larger dataframe. R does
 not automatically remove empty levels but you can do it:

 set.seed(42)
 dataset <- data.frame(Benchmark=factor(sample(LETTERS[1:26], 50,
     replace=TRUE), levels=LETTERS[1:26]))
 levels(dataset$Benchmark)
 #  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
 # [20] "T" "U" "V" "W" "X" "Y" "Z"
 dataset$Benchmark <- factor(dataset$Benchmark)
 levels(dataset$Benchmark)
 #  [1] "A" "C" "D" "F" "G" "H" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "V" "X"
 # [20] "Y" "Z"

 There are times when you want to know if certain factor levels
 do not appear in a subset of the original data.

 -
 David L Carlson
 Associate Professor of Anthropology
 Texas A&M University
 College Station, TX 77840-4352

 -----Original Message-----
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of Borja
 Rivier
 Sent: Wednesday, July 24, 2013 8:25 AM
 To: r-help@r-project.org
 Subject: [R] Levels of a factor

 Hi all,

 I am having a bit of trouble using the levels() function.
 I have a factor with many elements, and when I use the
 function levels() to
 extract the list of unique elements, some of the elements
 returned are not
 actually in the factor.

 For example I would have this:

 > vector <- dataset$Benchmark
 > class(vector)
 [1] "factor"
 > length(vector)
 [1] 35615
 > vector2 <- levels(vector)
 > length(which(!(vector2 %in% vector)))
 [1] 235

 Does anyone know how this is possible?

 Many thanks!

 Borja



[R] Levels of a factor

2013-07-24 Thread Borja Rivier
Hi all,

I am having a bit of trouble using the levels() function.
I have a factor with many elements, and when I use the function levels() to
extract the list of unique elements, some of the elements returned are not
actually in the factor.

For example I would have this:

> vector <- dataset$Benchmark
> class(vector)
[1] "factor"
> length(vector)
[1] 35615
> vector2 <- levels(vector)
> length(which(!(vector2 %in% vector)))
[1] 235

Does anyone know how this is possible?

Many thanks!

Borja



Re: [R] Levels of a factor

2013-07-24 Thread arun
Hi,
vec1 <- factor(1:5, levels=1:10)
vec1
#[1] 1 2 3 4 5
#Levels: 1 2 3 4 5 6 7 8 9 10

vec2 <- droplevels(vec1)
levels(vec2)
#[1] "1" "2" "3" "4" "5"
vec2
#[1] 1 2 3 4 5
#Levels: 1 2 3 4 5
A.K.

Hi all, 

I am having a bit of trouble using the levels() function. 
I have a factor with many elements, and when I use the function 
levels() to extract the list of unique elements, some of the elements 
returned are not actually in the factor. 

For example I would have this: 

> vector <- dataset$Benchmark
> class(vector)
[1] "factor"
> length(vector)
[1] 35615
> vector2 <- levels(vector)
> length(which(!(vector2 %in% vector)))
[1] 235

Does anyone know how this is possible? 

Many thanks! 

Borja



Re: [R] Levels of a factor

2013-07-24 Thread David Winsemius

On Jul 24, 2013, at 6:25 AM, Borja Rivier wrote:

 Hi all,
 
 I am having a bit of trouble using the levels() function.
 I have a factor with many elements, and when I use the function levels() to
 extract the list of unique elements, some of the elements returned are not
 actually in the factor.
 
 For example I would have this:
 
 > vector <- dataset$Benchmark
 > class(vector)
 [1] "factor"
 > length(vector)
 [1] 35615
 > vector2 <- levels(vector)
 > length(which(!(vector2 %in% vector)))
 [1] 235
 
 Does anyone know how this is possible?
 

When you take a subset of a factor vector, the levels are not reduced to the 
unique values in the new vector. There is droplevels function that would need 
to be applied if you already have such a vector, and there is a drop argument 
that you need to set to TRUE in the `[.factors` call if you want to attack the 
problem at the source.

?`[.factor
?droplevels
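
A small sketch of the two options described above, on a toy factor (the names f, sub_keep and sub_drop are made up): subsetting keeps every level by default, while drop = TRUE in the subsetting call or droplevels() afterwards removes the unused ones.

```r
f <- factor(c("a", "b", "c", "a"))

sub_keep <- f[f != "c"]               # default drop = FALSE keeps all levels
levels(sub_keep)                      # "a" "b" "c"

sub_drop <- f[f != "c", drop = TRUE]  # unused levels dropped while subsetting
levels(sub_drop)                      # "a" "b"

levels(droplevels(sub_keep))          # "a" "b"
```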

-- 

David Winsemius
Alameda, CA, USA



Re: [R] Levels of a factor

2013-07-24 Thread David Winsemius

On Jul 24, 2013, at 11:35 AM, David Winsemius wrote:

 
 On Jul 24, 2013, at 6:25 AM, Borja Rivier wrote:
 
 Hi all,
 
 I am having a bit of trouble using the levels() function.
 I have a factor with many elements, and when I use the function levels() to
 extract the list of unique elements, some of the elements returned are not
 actually in the factor.
 
 snipped
 
 When you take a subset of a factor vector, the levels are not reduced to the 
 unique values in the new vector. There is droplevels function that would need 
 to be applied if you already have such a vector, and there is a drop argument 
 that you need to set to TRUE in the `[.factors`

Make that `[.factor`

 call if you want to attack the problem at the source.
 
 ?`[.factor  # missing trailing back-tick

?`[.factor`


 ?droplevels
 
 -- 

David Winsemius
Alameda, CA, USA



Re: [R] Levels of a factor

2013-07-24 Thread David Carlson
Benchmark is probably a subset from a larger dataframe. R does
not automatically remove empty levels but you can do it:

set.seed(42)
dataset <- data.frame(Benchmark=factor(sample(LETTERS[1:26], 50,
    replace=TRUE), levels=LETTERS[1:26]))
levels(dataset$Benchmark)
#  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
# [20] "T" "U" "V" "W" "X" "Y" "Z"
dataset$Benchmark <- factor(dataset$Benchmark)
levels(dataset$Benchmark)
#  [1] "A" "C" "D" "F" "G" "H" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "V" "X"
# [20] "Y" "Z"

There are times when you want to know if certain factor levels
do not appear in a subset of the original data.
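
To illustrate why keeping empty levels can be useful, here is a short sketch on a made-up factor f: table() reports unused levels with a count of zero, and setdiff() lists them by name.

```r
f <- factor(c("x", "y", "x"), levels = c("x", "y", "z"))

table(f)                                  # 'z' is reported with a count of 0
unused <- setdiff(levels(f), as.character(f))
unused                                    # "z"
nlevels(f) - nlevels(droplevels(f))       # 1 unused level
```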

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Borja
Rivier
Sent: Wednesday, July 24, 2013 8:25 AM
To: r-help@r-project.org
Subject: [R] Levels of a factor

Hi all,

I am having a bit of trouble using the levels() function.
I have a factor with many elements, and when I use the
function levels() to
extract the list of unique elements, some of the elements
returned are not
actually in the factor.

For example I would have this:

> vector <- dataset$Benchmark
> class(vector)
[1] "factor"
> length(vector)
[1] 35615
> vector2 <- levels(vector)
> length(which(!(vector2 %in% vector)))
[1] 235

Does anyone know how this is possible?

Many thanks!

Borja



[R] Levels and labels in factor

2013-04-10 Thread chong shiauyun
Hi R users,

I have an imputed dataset of undefinedundefined cycles, which I generated
using Stata version undefinedundefined. I then imported my data from Stata
into R and used a loop to run the Mclust package. My observations start with
ID=2 (ID=1 has been excluded from the sample) and end with 27950.
Here is my code:

library(mclust)
library(foreign)
dat <- read.dta(file="tempeundefined.dta")
impdat <- subset(dat, mim != undefined)
datn <- impdat
apply(datn, undefined, range)
fix(datn)
mdlnc <- matrix(, undefinedundefined, undefined)
# getting the final output
n <- dim(datn)[undefined]
datf <- matrix(undefined, n, undefined)
for (i in undefined:undefinedundefined) {
  set.seed(undefinedundefinedundefinedundefinedundefinedundefined)
  datnss <- subset(datn, mim == i)
  datnssMclust <- Mclust(datnss[, undefined:undefinedundefined], model="VEV", G=undefined)
  zv <- datnssMclust$z
  clas <- datnssMclust$classification
  zval <- cbind(zv, clas)
  colnames(zval) <- c("Pundefined", "Pundefined", "Pundefined", "class")
  impd <- datnss[, c("cid_731a", "qlet", "mim")]
  fd <- as.matrix(cbind(impd, zval))
  datf[((undefinedundefinedundefinedundefinedundefined*(i-undefined)+undefined):(i*undefinedundefinedundefinedundefinedundefined)), ] <- fd
}

cid_731a is my observation ID and mim is the number of the imputed dataset.
When I write the output in dta format (Stata data format), the IDs are
reorganised. The IDs now start with 1,2,3,4,...13797, which is not right.
Value labels have been attached to the existing data. The variables are now in
long format. I guess that is because a factor in R always begins with
1,2,3,4,... Is there any way I can fix this? Please help.

SY


Re: [R] Levels and labels in factor

2013-04-10 Thread Blaser Nello
Perhaps write.dta(..., convert.factors="string") might help.
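
A hedged sketch of what may be going on, assuming the ID column has ended up as a factor before export (the post does not show str() output, so this is a guess). A factor stores integer codes 1, 2, 3, ... plus labels, and exporting the codes loses the original IDs. The vector id below is made up.

```r
id <- factor(c(2, 5, 27950))   # made-up IDs stored as a factor

as.integer(id)                 # 1 2 3: the internal codes, not the IDs
as.numeric(as.character(id))   # 2 5 27950: the original values
as.numeric(levels(id))[id]     # same result, the idiom mentioned in ?factor
```

Converting the ID column back to numeric before calling write.dta() would be one way to keep the original values; convert.factors="string" is another, which writes the labels rather than the codes.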

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of chong shiauyun
Sent: Wednesday, April 10, 2013 10:01
To: r-help@r-project.org
Subject: [R] Levels and labels in factor

Hi R users,

I have an imputed dataset of undefinedundefined cycles, which I generated
using Stata version undefinedundefined. I then imported my data from Stata
into R and used a loop to run the Mclust package. My observations start with
ID=2 (ID=1 has been excluded from the sample) and end with 27950.
Here is my code:

library(mclust)
library(foreign)
dat <- read.dta(file="tempeundefined.dta")
impdat <- subset(dat, mim != undefined)
datn <- impdat
apply(datn, undefined, range)
fix(datn)
mdlnc <- matrix(, undefinedundefined, undefined)
# getting the final output
n <- dim(datn)[undefined]
datf <- matrix(undefined, n, undefined)
for (i in undefined:undefinedundefined) {
  set.seed(undefinedundefinedundefinedundefinedundefinedundefined)
  datnss <- subset(datn, mim == i)
  datnssMclust <- Mclust(datnss[, undefined:undefinedundefined], model="VEV", G=undefined)
  zv <- datnssMclust$z
  clas <- datnssMclust$classification
  zval <- cbind(zv, clas)
  colnames(zval) <- c("Pundefined", "Pundefined", "Pundefined", "class")
  impd <- datnss[, c("cid_731a", "qlet", "mim")]
  fd <- as.matrix(cbind(impd, zval))
  datf[((undefinedundefinedundefinedundefinedundefined*(i-undefined)+undefined):(i*undefinedundefinedundefinedundefinedundefined)), ] <- fd
}

cid_731a is my observation ID and mim is the number of the imputed dataset.
When I write the output in dta format (Stata data format), the IDs are
reorganised. The IDs now start with 1,2,3,4,...13797, which is not right.
Value labels have been attached to the existing data. The variables are now in
long format. I guess that is because a factor in R always begins with
1,2,3,4,... Is there any way I can fix this? Please help.

SY


Re: [R] Levels in new data fed to SVM

2013-01-10 Thread Uwe Ligges



On 08.01.2013 21:14, Claus O'Rourke wrote:

Hi all,
I've encountered an issue using svm (e1071) in the specific case of
supplying new data which may not have the full range of levels that
were present in the training data.

I've constructed this really primitive example to illustrate the point:


library(e1071)
training.data <- data.frame(x = c("yellow","red","yellow","red"),
    a = c("alpha","alpha","beta","beta"), b = c("a", "b", "a", "c"))
my.model <- svm(x ~ .,data=training.data)
test.data <- data.frame(x = c("yellow","red"), a = c("alpha","beta"),
    b = c("a", "b"))
predict(my.model,test.data)

Error in predict.svm(my.model, test.data) :
   test data does not match model !


levels(test.data$b) <- levels(training.data$b)
predict(my.model,test.data)

     1      2
yellow    red
Levels: red yellow

In the first case test.data$b does not have the level c and this
results in the input data being rejected. I've debugged this down to
the point of model matrix creation in the SVM R code. Once I fill up
the levels in the test data with the levels from the original data,
then there is no problem at all.

Assuming my test data has to come from another source where the number
of category levels seen might not always be as large as those for the
original training data, is there a better way I should be handling
this?



You have to tell the factor about the possible levels, it does not 
necessarily contain examples.

That means:

levels(test.data$b) <- c("a", "b", "c")
predict(my.model,test.data)

will help.
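
A base-R sketch of the same idea that does not need e1071, using made-up vectors train_b, test_b and bad. Rebuilding the factor with factor(x, levels = ...) matches values by label, so it stays correct even when the test data contain a different subset of levels. Assigning with levels<- relabels by position instead, which is only safe when the existing levels line up with the start of the new ones.

```r
train_b <- factor(c("a", "b", "a", "c"))
test_b  <- factor(c("a", "b"))

levels(test_b)                      # "a" "b": level "c" is unknown here

# Rebuild the test factor on the training levels; the values are unchanged.
test_b <- factor(test_b, levels = levels(train_b))
levels(test_b)                      # "a" "b" "c"
as.character(test_b)                # "a" "b"

# Positional relabelling can go wrong when the levels do not line up:
bad <- factor(c("a", "c"))
levels(bad) <- c("a", "b", "c")     # "c" silently becomes "b"
as.character(bad)                   # "a" "b"
```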

Best,
Uwe Ligges




Thanks



Re: [R] Levels in new data fed to SVM

2013-01-10 Thread Claus O'Rourke
Thanks for clarifying!

On Thu, Jan 10, 2013 at 12:47 PM, Uwe Ligges
lig...@statistik.tu-dortmund.de wrote:


 On 08.01.2013 21:14, Claus O'Rourke wrote:

 Hi all,
 I've encountered an issue using svm (e1071) in the specific case of
 supplying new data which may not have the full range of levels that
 were present in the training data.

 I've constructed this really primitive example to illustrate the point:

 library(e1071)
 training.data <- data.frame(x = c("yellow","red","yellow","red"), a =
 c("alpha","alpha","beta","beta"), b = c("a", "b", "a", "c"))
 my.model <- svm(x ~ .,data=training.data)
 test.data <- data.frame(x = c("yellow","red"), a = c("alpha","beta"), b =
 c("a", "b"))
 predict(my.model,test.data)

 Error in predict.svm(my.model, test.data) :
   test data does not match model !


 levels(test.data$b) <- levels(training.data$b)
 predict(my.model,test.data)

      1      2
 yellow    red
 Levels: red yellow

 In the first case test.data$b does not have the level c and this
 results in the input data being rejected. I've debugged this down to
 the point of model matrix creation in the SVM R code. Once I fill up
 the levels in the test data with the levels from the original data,
 then there is no problem at all.

 Assuming my test data has to come from another source where the number
 of category levels seen might not always be as large as those for the
 original training data, is there a better way I should be handling
 this?



 You have to tell the factor about the possible levels, it does not
 necessarily contain examples.
 That means:

 levels(test.data$b) <- c("a", "b", "c")
 predict(my.model,test.data)

 will help.

 Best,
 Uwe Ligges



 Thanks



[R] Levels in new data fed to SVM

2013-01-08 Thread Claus O'Rourke
Hi all,
I've encountered an issue using svm (e1071) in the specific case of
supplying new data which may not have the full range of levels that
were present in the training data.

I've constructed this really primitive example to illustrate the point:

> library(e1071)
> training.data <- data.frame(x = c("yellow","red","yellow","red"), a =
+ c("alpha","alpha","beta","beta"), b = c("a", "b", "a", "c"))
> my.model <- svm(x ~ .,data=training.data)
> test.data <- data.frame(x = c("yellow","red"), a = c("alpha","beta"), b =
+ c("a", "b"))
> predict(my.model,test.data)
Error in predict.svm(my.model, test.data) :
  test data does not match model !

> levels(test.data$b) <- levels(training.data$b)
> predict(my.model,test.data)
     1      2
yellow    red
Levels: red yellow

In the first case test.data$b does not have the level c and this
results in the input data being rejected. I've debugged this down to
the point of model matrix creation in the SVM R code. Once I fill up
the levels in the test data with the levels from the original data,
then there is no problem at all.

Assuming my test data has to come from another source where the number
of category levels seen might not always be as large as those for the
original training data, is there a better way I should be handling
this?

Thanks



Re: [R] levels of comma separated data

2012-05-25 Thread Stefan
analyst41 at hotmail.com analyst41 at hotmail.com writes:

 
 I have a data set that has some comma separated strings in each row.
 I'd like to create a vector consisting of all distinct strings that
 occur.  The number of strings in each row may vary.
 
 Thanks for any help.
 
 
#
#
# Some data:
d <- data.frame(id = 1:5,
  text = c('one,two',
    'two,three,three,four',
    'one,three,three,five',
    'five,five,five,five',
    'one,two,three'),
  stringsAsFactors = FALSE
)
#
#
# A function. I'm not a black belt at this, so there
# is probably a more efficient way of writing this.
fcn <- function(x){
  a <- strsplit(x, ',') # Split the string by comma
  unique(a[[1]]) # Uniquify the vector
}
#
#
# Use the function with sapply.
sapply(d[,2], fcn)



Re: [R] levels of comma separated data

2012-05-25 Thread analys...@hotmail.com


On May 25, 4:46 am, Stefan ste...@inizio.se wrote:
 analyst41 at hotmail.com analyst41 at hotmail.com writes:



  I have a data set that has some comma separated strings in each row.
  I'd like to create a vector consisting of all distinct strings that
  occur.  The number of strings in each row may vary.

  Thanks for any help.

 #
 #
 # Some data:
 d <- data.frame(id = 1:5,
   text = c('one,two',
     'two,three,three,four',
     'one,three,three,five',
     'five,five,five,five',
     'one,two,three'),
   stringsAsFactors = FALSE
 )
 #
 #
 # A function. I'm not a black belt at this, so there
 # is probably a more efficient way of writing this.
 fcn <- function(x){
   a <- strsplit(x, ',') # Split the string by comma
   unique(a[[1]]) # Uniquify the vector
 }
 #
 #
 # Use the function with sapply.
 sapply(d[,2], fcn)



Thanks - but this solves a slightly different problem - it outputs the
unique values in each row.  I want a list of the unique values in the
whole data frame.

In this case the output should be a single vector =
 c("one","two","three","four","five").




Re: [R] levels of comma separated data

2012-05-25 Thread analys...@hotmail.com


On May 25, 7:23 am, analys...@hotmail.com analys...@hotmail.com
wrote:
 On May 25, 4:46 am, Stefan ste...@inizio.se wrote:





  analyst41 at hotmail.com analyst41 at hotmail.com writes:

   I have a data set that has some comma separated strings in each row.
   I'd like to create a vector consisting of all distinct strings that
   occur.  The number of strings in each row may vary.

   Thanks for any help.

  #
  #
  # Some data:
  d <- data.frame(id = 1:5,
    text = c('one,two',
      'two,three,three,four',
      'one,three,three,five',
      'five,five,five,five',
      'one,two,three'),
    stringsAsFactors = FALSE
  )
  #
  #
  # A function. I'm not a black belt at this, so there
  # is probably a more efficient way of writing this.
  fcn <- function(x){
    a <- strsplit(x, ',') # Split the string by comma
    unique(a[[1]]) # Uniquify the vector
  }
  #
  #
  # Use the function with sapply.
  sapply(d[,2], fcn)

 Thanks - but this solves a slightly different problem - it outputs the
 unique values in each row.  I want a list of the unique values in the
 whole data frame.

 In this case the output should be a single vector =
  c("one","two","three","four","five").


Actually I figured it out after I posted this:

 levels(as.factor(unlist(strsplit(d$text,','))))
[1] "five"  "four"  "one"   "three" "two"

Thanks for pointing me the right way.
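
For reference, a minimal sketch of the whole-data-frame approach, using the
example data posted earlier in the thread. It assumes the strings contain no
stray whitespace around the commas; the names tokens, in_order and sorted are
only illustrative. unique() keeps the order of first appearance, whereas
levels() of a factor returns the values sorted.

```r
# Example data as posted in the thread (quotes restored).
d <- data.frame(id = 1:5,
                text = c("one,two",
                         "two,three,three,four",
                         "one,three,three,five",
                         "five,five,five,five",
                         "one,two,three"),
                stringsAsFactors = FALSE)

# Split every row on commas and flatten the result into one character vector.
tokens <- unlist(strsplit(d$text, ",", fixed = TRUE))

in_order <- unique(tokens)        # distinct values, order of first appearance
sorted   <- sort(unique(tokens))  # distinct values, sorted like levels()
```

If the order of first appearance matters, in_order is probably the more
useful of the two; the factor route always sorts.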



[R] levels of comma separated data

2012-05-24 Thread analys...@hotmail.com
I have a data set that has some comma separated strings in each row.
I'd like to create a vector consisting of all distinct strings that
occur.  The number of strings in each row may vary.

Thanks for any help.



[R] Levels of interaction terms between numeric and factor in glm

2010-10-21 Thread achilles tsoumanis

Hello everyone, 

I have been working on a model to describe the counts of a certain event. I
use the glm function with the Poisson family and a log link. The model is:

model <- glm(event ~ week + year + week:var1 + year:var1 + year:var2, family = poisson)

where week and year are factor variables with 52 and 7 levels respectively,
and var1 and var2 are numerical variables. 

The model seems to describe well the actual counts of events and it is
reasonable in its structure. 

My problem is how to interpret the coefficients. When I use anova(model,
test="Chisq") or summary(model), I see that the degrees of freedom for the
variables week and year are 51 and 6 respectively, which makes sense since
the first level is used as a reference. 
The problem is in the interaction terms: week:var1 has 52 degrees of
freedom, year:var1 has 6 and year:var2 has 7 degrees of freedom. 

I am able to interpret the results in week and year coefficients, but not in
the rest of the terms. Why are there differences in the degrees of freedom
in the interaction terms? How could the results be explained?

Any assistance would be valuable.


Thank you in advance


Achilles Tsoumanis


Re: [R] Levels in returned data.frame after subset

2010-09-05 Thread Ulrik Stervbo
Thanks for the replies!  Obviously I must have used the wrong search
terms - sorry.

@greg: I care about the levels after the subset because, if they are
not dropped, they still appear in the subsequent heatmap I make
with ggplot (with my real data-set, of course). Admittedly I am quite
green, and may do things in a rather silly way - but it works (at
least I think it does).
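
As a hedged illustration of the point about unused levels, the sketch below
rebuilds the small example from the original question and drops the unused
level before plotting. It assumes a current version of R, where droplevels()
is available in base; the object name s2 is only illustrative, and the gender
column is created as a factor explicitly so that the result does not depend
on the stringsAsFactors default.

```r
# Data as in the original question (quotes restored, factor made explicit).
m <- data.frame(gender = factor(c("M", "M", "F")),
                ht = c(172, 186.5, 165),
                wt = c(91, 99, 74))

s <- subset(m, gender == "M")
levels(s$gender)        # the unused level "F" is still present

s2 <- droplevels(s)     # drops unused levels in every factor column
levels(s2$gender)

# Per-column alternative, as used in the thread:
s$gender <- factor(s$gender)
```

Either form leaves only the levels that actually occur, so a plot built from
the subset should no longer show the empty category.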





[R] Levels in returned data.frame after subset

2010-09-04 Thread Ulrik Stervbo
Dear List,

When I subset a data.frame, the levels are not re-adjusted (see
example). Why is this? Am I missing out on some basic stuff here?

Thanks
Ulrik


 m <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165), wt = c(91, 99, 74))
 dim(m)
[1] 3 3

 levels(m$gender)
[1] "F" "M"

 s <- subset(m, m$gender == "M")
 dim(s)
[1] 2 3

 levels(s$gender)
[1] "F" "M"

 cat <- sapply(s, is.factor); s[cat] <- lapply(s[cat], factor)
 dim(s)
[1] 2 3

 levels(s$gender)
[1] "M"



Re: [R] Levels in returned data.frame after subset

2010-09-04 Thread Ista Zahn
Hi Ulrik

On Sat, Sep 4, 2010 at 12:52 PM, Ulrik Stervbo ulrik.ster...@gmail.com wrote:
 Dear List,

 When I subset a data.frame, the levels are not re-adjusted (see
 example). Why is this? Am I missing out on some basic stuff here?

Only that this issue has come up many times before, and that this list
is archived and searchable. Try

RSiteSearch("subset drop levels", restrict = c("Rhelp10", "Rhelp08", "Rhelp02"))


-Ista






-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



Re: [R] Levels in returned data.frame after subset

2010-09-04 Thread Greg Snow
The advantage of computers is that they do exactly what they are told.
The disadvantage of computers is that they do exactly what they are told.

R is a set of instructions to the computer; those instructions are a
combination of contributions from the original programmers and from you.  Who
should make important decisions about the structure of your data?  A group of
(admittedly brilliant) programmers who have never seen your data nor know what
questions you are trying to answer, or you (who hopefully know more about your
data and questions)?

I don't claim to be more intelligent/knowledgeable than the programmers of R,
but I am grateful that they have/had sufficient humility to allow for the
possibility that I may actually know something about my data and questions that
they don't (or maybe they are just too lazy to do my job for me, but that is
also appropriate).

In your example below, why do you care what the levels of gender are after the 
subset?  Why waste time/effort dropping the levels for a column that by 
definition only has one value?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111




[R] levels update

2008-12-05 Thread Antje

Hello,

I hope this question is not too stupid. I would like to know how to update 
levels after subsetting data from a data.frame.


df <- data.frame(factor(c("a","a","c","b","b")), c(4,5,6,7,8), c(9,1,2,3,4))
names(df) <- c("X1","X2","X3")

my.sub <- subset(df, X1 == "a" | X1 == "b")
levels(my.sub$X1)

# still gives me "a","b","c", though the subset does not contain entries with
# "c" anymore


I guess, the solution is rather simple, but I cannot find it.

Antje



Re: [R] levels update

2008-12-05 Thread jim holtman
try this:

 df <- data.frame(factor(c("a","a","c","b","b")), c(4,5,6,7,8), c(9,1,2,3,4))
 names(df) <- c("X1","X2","X3")

 my.sub <- subset(df, X1 == "a" | X1 == "b")
 levels(my.sub$X1)
[1] "a" "b" "c"
 my.sub$X1 <- factor(my.sub$X1)
 levels(my.sub$X1)
[1] "a" "b"







-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] levels update

2008-12-05 Thread Erich Neuwirth
I do the following for a subsetted dataframe:

cleanfactors <- function(mydf){
  outdf <- mydf
  # re-create each factor column so that unused levels are dropped
  for (i in seq_len(ncol(mydf))){
    if (is.factor(mydf[,i]))
      outdf[,i] <- factor(mydf[,i])
  }
  outdf
}
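
A short usage sketch for the helper above, applied to the example data from
the original question. The function is re-stated so that the snippet runs on
its own, and the column names are given directly in data.frame() rather than
assigned afterwards; the object name cleaned is only illustrative. In current
versions of R, droplevels() from base should give the same levels.

```r
# Re-stated so the example is self-contained.
cleanfactors <- function(mydf) {
  outdf <- mydf
  for (i in seq_len(ncol(mydf))) {
    if (is.factor(mydf[, i]))
      outdf[, i] <- factor(mydf[, i])   # rebuild factor, dropping unused levels
  }
  outdf
}

df <- data.frame(X1 = factor(c("a", "a", "c", "b", "b")),
                 X2 = c(4, 5, 6, 7, 8),
                 X3 = c(9, 1, 2, 3, 4))
my.sub <- subset(df, X1 == "a" | X1 == "b")

cleaned <- cleanfactors(my.sub)
levels(my.sub$X1)    # still includes the unused level
levels(cleaned$X1)   # only the levels that occur
```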



-- 
Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459



Re: [R] levels update

2008-12-05 Thread hadley wickham
On Fri, Dec 5, 2008 at 6:50 AM, Antje [EMAIL PROTECTED] wrote:
 Hello,

 I hope this question is not too stupid. I would like to know how to update
 levels after subsetting data from a data.frame.

 df <- data.frame(factor(c("a","a","c","b","b")), c(4,5,6,7,8), c(9,1,2,3,4))
 names(df) <- c("X1","X2","X3")

 my.sub <- subset(df, X1 == "a" | X1 == "b")
 levels(my.sub$X1)

 # still gives me "a","b","c", though the subset does not contain entries
 # with "c" anymore

 I guess, the solution is rather simple, but I cannot find it.

You might find it easier just to work with character vectors:

options(stringsAsFactors = FALSE)

Hadley

-- 
http://had.co.nz/



Re: [R] levels update

2008-12-05 Thread Prof Brian Ripley

On Fri, 5 Dec 2008, jim holtman wrote:


try this:


df <- data.frame(factor(c("a","a","c","b","b")), c(4,5,6,7,8), c(9,1,2,3,4))
names(df) <- c("X1","X2","X3")

my.sub <- subset(df, X1 == "a" | X1 == "b")
levels(my.sub$X1)

[1] "a" "b" "c"

my.sub$X1 <- factor(my.sub$X1)


I find

my.sub$X1 <- my.sub$X1[drop=TRUE]

a lot more self-explanatory.  See ?"[.factor".  However, if you find 
yourself wanting to do this, ask why you have a factor (rather than a 
character vector) in the first place.
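
A minimal sketch of the drop = TRUE form on a bare factor, next to the
character-vector alternative mentioned above. The object names f, kept,
dropped and ch are only illustrative.

```r
f <- factor(c("a", "a", "c", "b", "b"))

kept <- f[f != "c"]                   # plain subsetting keeps all three levels
levels(kept)

dropped <- f[f != "c", drop = TRUE]   # drop = TRUE discards unused levels
levels(dropped)

# With a character vector there are no levels to keep in sync.
ch <- c("a", "a", "c", "b", "b")
unique(ch[ch != "c"])
```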




levels(my.sub$X1)

[1] "a" "b"








--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595



Re: [R] levels update

2008-12-05 Thread Antje

Thanks a lot!!!
the drop thing was exactly what I was looking for (I already used it some 
time ago but forgot about it).


Thanks to everybody else too.

Antje




Re: [R] levels update

2008-12-05 Thread Richard . Cotton
 I hope this question is not too stupid. I would like to know how to update
 levels after subsetting data from a data.frame.
 
 df <- data.frame(factor(c("a","a","c","b","b")), c(4,5,6,7,8), c(9,1,2,3,4))
 names(df) <- c("X1","X2","X3")
 
 my.sub <- subset(df, X1 == "a" | X1 == "b")
 levels(my.sub$X1)
 
 # still gives me "a","b","c", though the subset does not contain entries
 # with "c" anymore

Two questions in one afternoon; aren't I good to you!

levels(my.sub$X1[,drop=TRUE])
[1] "a" "b"
levels(factor(my.sub$X1))
[1] "a" "b"

Regards,
Richie.

Mathematical Sciences Unit
HSL





[R] levels values of cut()

2008-08-09 Thread baptiste auguie

Dear list,

 I have the following example, from which I am hoping to retrieve  
numeric values of the factor levels (that is, without the brackets):




x <- seq(1, 15, length=100)
y <- sin(x)

my.cuts <- cut(which(abs(y) < 1e-1), 3)
levels(my.cuts)


hist() does not suit me for this, as it does not necessarily respect  
the number of breaks.


getAnywhere hasn't got me very far: I cannot seem to find a readable  
code for the built-in cut function in the base library. I think  
getMethod should do it but I don't understand the arguments to pass.


Any pointers appreciated,

Thanks,

baptiste



_

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag



Re: [R] levels values of cut()

2008-08-09 Thread Stephen Tucker
Not sure what you're looking for, but does this help?

Extending your code,

 library(gsubfn)
 t(strapply(levels(my.cuts), "([0-9.]+),([0-9.]+)",
   function(...) as.numeric(c(...)), backref=-2, simplify=TRUE))
     [,1] [,2]
[1,] 15.9 38.3
[2,] 38.3 60.7
[3,] 60.7 83.1





Re: [R] levels values of cut()

2008-08-09 Thread Prof Brian Ripley

On Sat, 9 Aug 2008, baptiste auguie wrote:


Dear list,

I have the following example, from which I am hoping to retrieve numeric 
values of the factor levels (that is, without the brackets):




x <- seq(1, 15, length=100)
y <- sin(x)

my.cuts <- cut(which(abs(y) < 1e-1), 3)
levels(my.cuts)


hist() does not suit me for this, as it does not necessarily respect the 
number of breaks.


getAnywhere hasn't got me very far: I cannot seem to find a readable code for 
the built-in cut function in the base library. I think getMethod should do it 
but I don't understand the arguments to pass.


Not getMethod (that's for S4 methods).  Just type cut.default at the R 
prompt.


However, try

example(cut)
foo <- levels(cut(aaa, 3))
lims <- matrix(nrow=length(foo), ncol=2)
lims[,1] <- as.numeric( sub("\\((.+),.*", "\\1", foo) )
lims[,2] <- as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", foo) )
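
The same two regular expressions can be tried directly on the data from the
original question, without running example(cut) first. This is only a sketch:
the names labs, lower and upper are illustrative, and the limits recovered
this way are the rounded values printed in the labels (cut() formats them to
3 significant digits by default), not the exact break points.

```r
x <- seq(1, 15, length.out = 100)
y <- sin(x)

# Interval labels of the form "(a,b]" from cut() with 3 intervals.
labs <- levels(cut(which(abs(y) < 1e-1), 3))

lims <- cbind(
  lower = as.numeric(sub("\\((.+),.*", "\\1", labs)),          # text after "("
  upper = as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labs))     # text before "]"
)
lims
```

If exact break points are needed, passing an explicit breaks vector to cut()
is probably more robust than parsing the labels.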

--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595



Re: [R] levels values of cut()

2008-08-09 Thread baptiste auguie
Thank you all for the valuable tips. For the record, I've made the
following wrapper function for this. I wonder whether a short note on
these regular expressions could be useful on the help page of cut().




cutIntervals <- function(x, ...){
  dotArgs <- unlist(c(...))
  if( any(names(dotArgs) == "labels")) stop("labels cannot be specified, use cut instead")

  cut.fact <- levels(cut(x, labels=NULL, ...))
  # tip from Brian Ripley
  lims <- matrix(nrow=length(cut.fact), ncol=2)
  lims[,1] <- as.numeric( sub("\\((.+),.*", "\\1", cut.fact) )
  lims[,2] <- as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", cut.fact) )
  # alternatively (Stephen Tucker)
  # library(gsubfn)
  # lims <- t(strapply(cut.fact, "([0-9.]+),([0-9.]+)",
  #   function(...) as.numeric(c(...)), backref=-2, simplify=TRUE))
  lims
}

cutIntervals(1:5, 3)



Many thanks,

baptiste



_

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag



Re: [R] levels values of cut()

2008-08-09 Thread Prof Brian Ripley

On Sat, 9 Aug 2008, baptiste auguie wrote:

Thank you all for the precious tips. For memory I've made the following 
wrapper function for this. I wonder whether a short note on these regular 
expressions could be useful on the help page of cut().


Already there in R-devel 







--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595


[R] Levels error after printing

2008-05-28 Thread Gundala Viswanath
Hi all,

After running this code:

__BEGIN__
dat <- read.table("gene_prob.txt", sep = "\t")
n <- length(dat$V1)
print(n)
print(dat$V1)
__END__

With this input in gene_prob.txt

__INPUT__
HFE     0.00107517988586552
NF1     0.000744355305599206
PML     0.000661649160532628
TCF3    0.000661649160532628
NF2     0.000578943015466049
GNAS    0.000578943015466049
GGA2    0.000578943015466049
.

I get this print out.

..
[8541] LOC552889 GPR15 SLC2A11   GRIP2 SGEF
[8546] PIK3IP1   RPS27 AQP7
8548 Levels: 3.8-1 A2M A4GALT A4GNT AAAS AAK1 AAMP AANAT AARSD1 AASS
... hCG_1730474

What's the meaning of the last line? Is it an error?
How can I fix it?

-- 
Gundala Viswanath



[R] levels in dataframes

2008-04-22 Thread Georg Ehret
Dear R community, I wish to ask a short question concerning factor-data
in dataframes: When I subset the data and get rid of all data for one level,
I still retain the level name (obtained by levels(dataframe$variablename) ).
Is there a convenient way to get rid of the levels for which all data has
been deleted?
Thank you and wishing you an excellent day!
Georg.

Georg Ehret
Johns Hopkins
Baltimore, US


Re: [R] levels in dataframes

2008-04-22 Thread Peter Alspach
Georg

One way is to call factor() on the subsetted object.

 georg <- factor(LETTERS[1:4])
 summary(georg)
A B C D 
1 1 1 1 
 georg <- georg[georg!='A']
 summary(georg) # the level is still there
A B C D 
0 1 1 1 
 georg <- factor(georg)
 summary(georg) # now it is gone
B C D 
1 1 1 

HTH 

Peter Alspach
 



Re: [R] levels() function for a vector

2008-03-11 Thread Richard Pearson
Karen

levels returns the levels attribute of a variable, and a vector has no 
such attribute. This is usually used with a factor, e.g.

  temp <- c(3, 5, 5, NA)
  levels(factor(temp))
[1] "3" "5"
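
A slightly fuller sketch of the same point, including an alternative that
avoids factors when only the distinct values are wanted. It uses nothing
beyond base R.

```r
temp <- c(3, 5, 5, NA)

levels(temp)           # NULL: a plain numeric vector has no levels attribute
levels(factor(temp))   # levels are character strings; NA is not a level

# If only the distinct values are needed, no factor is required.
sort(unique(temp))     # numeric result; sort() drops the NA by default
```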

Best wishes

Richard




[R] levels() function for a vector

2008-03-10 Thread Chang Liu

Hello:
 
I'm trying to use the levels function, but I don't know why it's returning NULL. 
For example:
 
 temp
[1]  3  5  5 NA
 levels(temp)
NULL
 
Also, I've tried: 
 list(temp)
[[1]]
[1]  3  5  5 NA

 levels(list(temp))
NULL

Is there a specific requirement on the parameter?
 
Karen
 