[R] ROCR data input

2010-08-17 Thread wookie

Hi there,

I'm having some difficulty with the ROCR package. I've installed it fine,
and the sample data works (ROCR.simple), however when I try to load my own
data it complains that there is an error in prediction as the number of
classes is not equal to 2. I read the data from a text file which contains
one column of probabilities and one column of binary 0 and 1. I then put it
into a data file called prob, and try

 pred <- prediction("prob$probabilities", "prob$label")

which is when it comes up with an error. I think maybe I'm not importing it
properly?

probabilities
0.0
0.00282
0.1
0.04990
0.9


label
0
0
0
1
0


Thanks for any assistance 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/ROCR-data-input-tp2328117p2328117.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ROCR data input

2010-08-17 Thread wookie

Sorry, I'm new to R, and relatively new to statistics too so I'm still a bit
unclear. The values in the post were only a sample of around 8400 rows. The
label has 1 or 0 (I thought this was the two classes needed). Each label row
has an equivalent probability. This is the data that I output from the
logistic regression analysis, but it is seemingly not the right format for
ROC curve analysis. There is a difference in how R displays the data, when I
type ROCR.simple it is in the format:

$predictions
  [1] 0.612547843 0.364270971 0.432136142...
$labels
  [1] 1 1 0 0 0 1 1 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 0 ... etc.

whereas mine is in columns, e.g.

ID, labels, probs
8930 0 0.00070
8931 0 0.00036
8932 1 0.0
8933 1 0.2
8934 0 0.1
etc.

That is why I think it is a format issue, but being new to R, I'm not sure
what I need to do to rectify it.
 I have attached the text file if this helps.

Thank you for your time,


http://r.789695.n4.nabble.com/file/n2328240/prob.txt prob.txt 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/ROCR-data-input-tp2328117p2328240.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] ROCR data input

2010-08-17 Thread Claudia Beleites

Anneley,


> Sorry, I'm new to R, and relatively new to statistics too so I'm still a bit
> unclear.

That's OK, everyone was new once.

However, it is really important to post a reproducible example here. If you are
so new that you don't know exactly how to do that, say in your email that you
tried but don't know how. That will probably improve your chances of getting an
answer quite a bit.


Also, I'd suggest working thoroughly through an introduction to R. There's
a lot available on CRAN, on the web, and in many libraries.

E.g. a collection of contributed documents, grouped into those with more than
and those with fewer than 100 pages:
http://cran.r-project.org/other-docs.html
r-project.org also has links to books and to non-English material.


> The values in the post were only a sample of around 8400 rows. The
> label has 1 or 0 (I thought this was the two classes needed).

Yes.


> Each label row
> has an equivalent probability. This is the data that I output from the
> logistic regression analysis, but it is seemingly not the right format for
> ROC curve analysis.

It is the right format.
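
To illustrate that point, here is a minimal sketch of handing data frame columns straight to ROCR. The table is made up and only mimics the layout shown in the post (columns ID, labels, probs); the name prob and the whitespace-separated layout are assumptions. The ROCR part runs only if the package is installed.

```r
# Made-up rows in the layout described in the post: ID, labels, probs.
# For a real file, something like read.table("prob.txt", header = TRUE)
# would replace the text= argument (assuming whitespace-separated columns).
prob <- read.table(header = TRUE, text = "
ID labels probs
1 0 0.10
2 0 0.35
3 1 0.40
4 1 0.80
5 0 0.20
6 1 0.90
")

# Sanity checks before calling ROCR: numeric scores, exactly two classes.
stopifnot(is.numeric(prob$probs),
          length(unique(prob$labels)) == 2)

if (requireNamespace("ROCR", quietly = TRUE)) {
  # Unquoted column references: the vectors themselves are passed.
  pred <- ROCR::prediction(prob$probs, prob$labels)
  perf <- ROCR::performance(pred, measure = "tpr", x.measure = "fpr")
  auc  <- ROCR::performance(pred, measure = "auc")@y.values[[1]]
  print(auc)
}
```

The column names used in the call have to be the ones that actually exist in the data frame, which names(prob) will list.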


> There is a difference in how R displays the data, when I
> type ROCR.simple it is in the format:
>
> $predictions
>   [1] 0.612547843 0.364270971 0.432136142...
> $labels
>   [1] 1 1 0 0 0 1 1 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 0 ... etc.
>
> whereas mine is in columns, e.g.
>
> ID, labels, probs
> 8930 0 0.00070
> 8931 0 0.00036
> 8932 1 0.0
> 8933 1 0.2
> 8934 0 0.1
> etc.

Look up the difference between a list and a data.frame.
Also: you can find out a lot about variables with class(), str(), and maybe
summary().
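
As a small sketch of what those inspection functions show, the following builds two stand-in objects: a plain list shaped like ROCR.simple and a data frame shaped like the poster's table. All values and the names as_list and as_df are made up for illustration.

```r
# Stand-in for ROCR.simple: a plain list holding two vectors.
as_list <- list(predictions = c(0.61, 0.36, 0.43, 0.15),
                labels      = c(1, 1, 0, 0))

# Stand-in for the poster's table: the same vectors as data frame columns.
as_df <- data.frame(ID     = 8930:8933,
                    labels = c(1, 1, 0, 0),
                    probs  = c(0.61, 0.36, 0.43, 0.15))

class(as_list)          # reports the list class
class(as_df)            # reports the data.frame class
str(as_df)              # one line per column: name, type, first values
summary(as_df$labels)   # quick look at the range of the label column

# A data frame is itself a list of equal-length columns, so $ extracts
# a plain vector from either object.
is.list(as_df)
identical(as_list$predictions, as_df$probs)
```

The two print differently (a list prints element by element, a data frame prints as a table), but the vectors pulled out with $ are the same kind of object.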



> That is why I think it is a format issue, but being new to R, I'm not sure
> what I need to do to rectify it.
> I have attached the text file if this helps.

No, we don't need it to reproduce your error - I think it's all more or less
about typos:


> prediction("prob$probabilities", "prob$label")
Error in prediction("prob$probabilities", "prob$label") :
  Number of classes is not equal to 2.
ROCR currently supports only evaluation of binary classification tasks.

Now, if you need to track down such an error, it is really a good idea to check
what arguments you are actually handing over.


As many errors come from typos, it is a good idea to copy and paste literally
what you put into the function:

> "prob$probabilities"
[1] "prob$probabilities"
> "prob$label"
[1] "prob$label"

See the difference between what your argument evaluates to and
what you thought you were handing over?

Does this get you on the right track? I don't want to be nasty, but if you
discover the mistakes yourself, you'll be much faster at finding such things
next time.


So: try with these hints, and if it doesn't work, you can ask again.
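
One way to read the hints above, as a hedged sketch: assuming the `[1] ...` output shown earlier is R echoing a character string (that is, the arguments were typed inside quotation marks), the function received text rather than the columns. The data frame below is a made-up stand-in using the column names from the second post (labels, probs).

```r
# Hypothetical stand-in for the poster's data frame (values are made up).
prob <- data.frame(labels = c(0, 0, 1, 1),
                   probs  = c(0.10, 0.35, 0.40, 0.80))

quoted   <- "prob$probs"   # a character string of length 1
unquoted <- prob$probs     # the numeric column itself

str(quoted)
str(unquoted)

# Checks worth running on each argument before calling prediction():
is.numeric(unquoted)                      # scores should be numeric
length(unique(prob$labels)) == 2          # exactly two classes expected
length(unquoted) == length(prob$labels)   # one score per label

# The names have to match as well: this data frame has no column called
# "probabilities", so $ returns NULL for it.
names(prob)
is.null(prob$probabilities)
```

Typing each argument on its own at the prompt, exactly as it appears in the call, shows at once whether it evaluates to a vector of the expected length and type.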

HTH,

Claudia
--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it
