Re: [R] hourly prediction time series

2016-02-05 Thread Sean Porter
Try the auto.arima function in the forecast package.
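A minimal sketch of how this might look for hourly data. The series below is synthetic (it is not the poster's data), and the object names and the seasonal ARIMA orders are assumptions chosen only for illustration. The base-R fit runs without extra packages; the auto.arima step runs only if the forecast package is installed.

```r
# Synthetic hourly temperatures for 14 days (illustrative data only)
set.seed(1)
hours <- 0:(24 * 14 - 1)
temperature <- 12 + 5 * sin(2 * pi * (hours %% 24 - 9) / 24) +
  rnorm(length(hours), sd = 0.5)

# frequency = 24 declares one seasonal cycle per day for hourly observations
temp_ts <- ts(temperature, frequency = 24)

# Seasonal ARIMA with base R; these orders are assumptions, not a recommendation
fit <- arima(temp_ts, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 1), period = 24))

# Point forecasts for the next 12 hours
fc <- predict(fit, n.ahead = 12)
fc$pred

# Automatic order selection, as suggested above (needs the forecast package)
if (requireNamespace("forecast", quietly = TRUE)) {
  fit_auto <- forecast::auto.arima(temp_ts)
  fc_auto <- forecast::forecast(fit_auto, h = 12)
}
```

With several years of hourly data there is also a yearly cycle; a single frequency of 24 only captures the daily one, so this is a starting point rather than a full model.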

Regards,
 
DR SEAN PORTER
Scientist

South African Association for Marine Biological Research
Direct Tel: +27 (31) 328 8169   Fax: +27 (31) 328 8188
E-mail: spor...@ori.org.za Web: www.saambr.org.za
1 King Shaka Avenue, Point, Durban 4001 KwaZulu-Natal South Africa
PO Box 10712, Marine Parade 4056 KwaZulu-Natal South Africa

 


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of AURORA GONZALEZ 
VIDAL
Sent: 05 February 2016 10:50 AM
To: r-help@r-project.org
Subject: [R] hourly prediction time series

Dear R users,

I am facing my first time series problem. I have hourly temperature data for 
3 years (from 01/01/2013 to 5/02/2016). I would like to use those in order to 
PREDICT THE TEMPERATURE OF THE NEXT HOURS according to the observations.

A subset of the data looks like this:

date <- rep(seq(as.Date("2014-01-01"), as.Date("2014-01-03"), by="days"), each=24)
hour <- rep(c(paste0("0",0:9,":00:00"), paste0(10:23,":00:00")), 3)
temperature <- c(6.1, 6.8, 6.5, 7.2, 7.1, 7.9, 5.9, 6.8, 7.7, 9.5, 12.6,
 14.0, 15.9, 17.3, 17.5, 17.2, 15.0, 14.1, 13.1, 11.7, 10.9,
 11.0, 11.6, 11.0, 11.2, 11.0, 11.0, 11.4, 12.2, 13.7, 12.9,
 12.9, 12.8, 13.4, 13.9, 14.9, 16.6, 16.0, 15.2, 15.4, 14.7,
 14.6, 13.3, 13.0, 13.8, 13.1, 12.0, 11.9, 11.8, 11.6, 11.0,
 11.2, 11.6, 10.6, 9.5, 9.8, 9.9, 11.7, 15.3, 18.6, 20.7,
 22.2, 22.2, 20.8, 20.2, 18.3, 15.6, 13.6, 12.8, 13.1, 13.7,
 14.7)

dfExample <- data.frame(date, hour, temperature) 

To plot 3 years (from 01/01/2013 to 31/12/2015) I used this code and 
obtained the attached picture. Seasonality can be observed.

tempdf4 <- ts(df4$temperature, frequency=365*24*3)
plot.ts(tempdf4)

Am I doing this correctly? Could you give me any information on this type of 
problem (mainly on the prediction)? For example, if I want to use Arima, 
what are the arguments of the function, given my data structure?

fit <- Arima(df4$temperature, seasonal=list(order=c(xxx,xxx,xxx), period=xxx))
plot(forecast(fit))

I could also use some predictions from another source that I have been 
collecting since January 2016. But I would prefer to understand the simplest 
way to solve the problem first and then, progressively, understand more 
complex approaches.

Thank you very much for any kind of help.


--
Aurora González Vidal
Phd student in Data Analytics for Energy Efficiency

Faculty of Computer Sciences
University of Murcia

@. aurora.gonzal...@um.es
T. 868 88 7866
www.um.es/ae

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] monte carlo simulations in permanova in vegan package

2015-10-30 Thread Sean Porter
Thank you Jari,

It seems now that my question is morphing more into a statistical one, and is
perhaps not appropriate for the R-help list, so apologies. Yes, we are talking
about the latest versions of the vegan and permute packages. 

When there is an insufficient number of permutations available due to low
sample sizes, apparently an alternative is to use the result given in
Anderson & Robinson (2003) regarding the asymptotic distribution of the
numerator (or denominator) of the test statistic under permutation. And I
quote from Anderson et al. 2008 "It is demonstrated that each of the sums of
squares has, under permutation, an asymptotic distribution that is a linear
form in chi-square variables, where the coefficients are actually the
eigenvalues from a PCO of the resemblance matrix. Thus, chi-square variables
can be drawn randomly and independently, using Monte Carlo sampling, and
these can be combined with the eigenvalues to construct the asymptotic
permutation distribution for each of the numerator and denominator and,
thus,  for the entire pseudo-F statistic, in the event that too few actual
unique permutations exist."

Anderson, Gorley & Clarke. 2008. PERMANOVA+ for PRIMER: Guide to software
and statistical models.
Anderson & Robinson 2003. Generalised discriminant analysis based on
distances. Australian and New Zealand Journal of Statistics. 45: 301-318

I am sure you already know this! The above is what I am trying to do in the
vegan package, though.

Apologies if I am missing something and if what you have said still applies
(that it is not appropriate to exceed the possible number of permutations). I
am not a statistician, so any help or clarity would be welcome.


Regards, sean

 


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jari Oksanen
Sent: 29 October 2015 03:23 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] monte carlo simulations in permanova in vegan package

Sean Porter <spor...@ori.org.za> writes:

> I am trying to run a PERMANOVA in the vegan package with an 
> appropriate number of permutations (see example below), ideally .
> Obviously that number of permutations does not exist so I would like 
> to use Monte Carlo permutation tests to derive the probability value, 
> as is done in the commercial package PERMANOVA+ for PRIMER. How can I 
> adapt my code so that adonis will do so ? Many thanks, Sean
[...clip...]
> 
> > permanova <- adonis(species ~ time, data = time, permutations=999,
> method="bray")
> 
> 'nperm' > set of all permutations; Resetting 'nperm'.
> 
I assume we are talking about the latest versions of the vegan and permute
packages. In that case you really should switch to complete enumeration if
your request exceeds the number of distinct permutations. As people have told
you, you should be satisfied with that because there are no more distinct
permutations. Alternatively, you need more data.

If by Monte Carlo you mean sampling with replacement instead of permutation,
so that the same observation can appear several times and some other unit is
therefore missing, then there are two pieces of advice:

1. You should not do so.
2. If you want to do so, you can generate your resampling matrices by hand
and use that matrix as the argument of permutations=. See the documentation
(?adonis), which explains how to do so.
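
For illustration only, a resampling matrix of the kind described in point 2 could be built in base R roughly as follows. The sizes and object names are assumptions (6 observations, as in this thread, and an arbitrary 199 resamples), and point 1 still applies: this shows the mechanics, not a recommended analysis.

```r
set.seed(42)
n_obs <- 6          # hypothetical: 2 groups x 3 replicates
n_resamples <- 199  # hypothetical number of resamples

# Each row is one resample of the row indices 1..n_obs, drawn WITH replacement,
# so an index can appear several times while others are missing
resample_mat <- t(replicate(n_resamples,
                            sample.int(n_obs, n_obs, replace = TRUE)))

dim(resample_mat)

# Following the advice above, the matrix would then be supplied as permutations=,
# e.g. (not run here; 'species' and 'time' are the poster's objects):
# adonis(species ~ time, data = time, permutations = resample_mat, method = "bray")
```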

Cheers, Jari Oksanen


[R] monte carlo simulations in permanova in vegan package

2015-10-27 Thread Sean Porter
Dear colleagues,

 

I am trying to run a PERMANOVA in the vegan package with an appropriate
number of permutations (see example below), ideally . Obviously that
number of permutations does not exist, so I would like to use Monte Carlo
permutation tests to derive the probability value, as is done in the
commercial package PERMANOVA+ for PRIMER. How can I adapt my code so that
adonis will do so? Many thanks, Sean

 

> permanova <- adonis(species ~ time, data = time, permutations=99,
method="bray")
> permanova

Call:
adonis(formula = species ~ time, data = time, permutations = 99, method = "bray")

Permutation: free
Number of permutations: 99

Terms added sequentially (first to last)

          Df SumsOfSqs  MeanSqs F.Model      R2 Pr(>F)
time       1  0.070504 0.070504  123.65 0.96866   0.01 **
Residuals  4  0.002281 0.000570         0.03134
Total      5  0.072785                  1.0
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> permanova <- adonis(species ~ time, data = time, permutations=999,
method="bray")
'nperm' > set of all permutations; Resetting 'nperm'.


Re: [R] monte carlo simulations in permanova in vegan package

2015-10-27 Thread Sean Porter
Hi Stephen and others,

 

I am trying to run a one-way PERMANOVA where I have only 2 levels in the factor 
“time”, and each level contains only 3 replicates. Because I have so few 
observations (6 in total) and levels (2), there are not enough possible 
permutations to get a reasonable test, i.e. (2*3)!/[2!(3!)^2]. That is why, 
for example, the analysis completes if I run it with only 99 permutations. 
However, if I set the number of permutations to anything larger it 
returns the message “'nperm' > set of all permutations; Resetting 'nperm'.” as 
the number set by the argument “permutations=” exceeds the number of possible 
permutations. In PERMANOVA+ for PRIMER there is a way of dealing with this 
issue: using Monte Carlo simulations to generate the p value when too few 
permutations are available. Hopefully this clarifies my situation and 
aim?
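
The count in the formula above can be checked directly in R. This is a sketch of the arithmetic only, with illustrative variable names. The 720 orderings fall between the 99 permutations that were accepted and the 999 that triggered the message; reading that as the limit the software applies is an assumption based on the behaviour reported in this thread.

```r
n_groups <- 2   # levels of the factor "time"
n_reps   <- 3   # replicates per level
n_obs    <- n_groups * n_reps

# All orderings of the 6 observations
all_orderings <- factorial(n_obs)

# Distinct allocations into 2 unlabelled groups of 3: (2*3)! / [2! (3!)^2]
distinct_allocations <- factorial(n_obs) /
  (factorial(n_groups) * factorial(n_reps)^n_groups)

# Smallest p-value attainable in an exact test over those allocations
min_p <- 1 / distinct_allocations

c(all_orderings = all_orderings,
  distinct_allocations = distinct_allocations,
  min_p = min_p)
```

With only 10 distinct allocations, an exact test cannot give a p-value below 0.1, however many permutations are requested.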

 

I was therefore hoping there was a way of coding the Monte Carlo 
permutation procedure into adonis?  

 

Thanks for your help!

 

From: stephen sefick [mailto:ssef...@gmail.com] 
Sent: 27 October 2015 03:11 PM
To: Sean Porter
Cc: r-help@r-project.org
Subject: Re: [R] monte carlo simulations in permanova in vegan package

 

The example code works, and reports  permutations. Can you provide more 
information? 

 

data(dune)
data(dune.env)
adonis(dune ~ Management*A1, data=dune.env, permutations=)

 

 

On Tue, Oct 27, 2015 at 3:56 AM, Sean Porter <spor...@ori.org.za> wrote:

[...clip...]


-- 

Stephen Sefick
**
Auburn University 
Biological Sciences  
331 Funchess Hall   
Auburn, Alabama
36849   
**
sas0...@auburn.edu  
http://www.auburn.edu/~sas0025 
**

Let's not spend our time and resources thinking about things that are so little 
or so large that all they really do for us is puff us up and make us feel like 
gods.  We are mammals, and have not exhausted the annoying little problems of 
being mammals.

-K. Mullis

"A big computer, a complex algorithm and a long time does not equal science."

  -Robert Gentleman



Re: [R] need help with excel data

2015-01-21 Thread Sean Porter

Hi Dr Polanski,

I would recommend you do this in Excel, seeing as you know how to work with 
Excel. You can use Excel to put different parts of a cell into another cell. 

For example, if cell A1 is 12*23 34*45

and you want 12 in a separate cell (say cell A2), go to cell A2 and type: 
=LEFT(A1,2)

This will extract the first 2 characters from the left.

To extract 45 you would type: =RIGHT(A1,2)

To get 2 characters starting at position 4 you would type: =MID(A1,4,2)

Which will give you 23.
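
If you would rather stay in R, the same extraction can be sketched with base R string functions. The cell contents below are made up to match the description in the question (some words followed by numbers separated by "*", a space, or a comma), and the object names are assumptions; the sketch also assumes every cell holds exactly four numbers.

```r
# Hypothetical cell contents modelled on the question
cells <- c("some words 12*23 34*45",
           "other words here 12*23, 34*56")

# Every run of digits in each cell, whatever the separator
nums <- regmatches(cells, gregexpr("[0-9]+", cells))

# Everything before the first digit is the text part
words <- trimws(sub("[0-9].*$", "", cells))

# One row per cell: the words, then the numbers in separate columns
result <- data.frame(words = words,
                     do.call(rbind, lapply(nums, as.integer)))
result
```

The result could then be written back out with write.csv() and opened in Excel.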


Hope this helps.

Regards,
 
DR. SEAN PORTER
Scientist

South African Association for Marine Biological Research
Direct Tel: +27 (31) 328 8169   Fax: +27 (31) 328 8188
E-mail: spor...@ori.org.za Web: www.saambr.org.za
1 King Shaka Avenue, Point, Durban 4001 KwaZulu-Natal South Africa
PO Box 10712, Marine Parade 4056 KwaZulu-Natal South Africa

 


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Dr Polanski
Sent: 21 January 2015 10:32 PM
To: r-help@r-project.org
Subject: [R] need help with excel data

Hi all!

Sorry to bother you. I am trying to learn some R via Coursera courses and other 
internet sources, yet I haven’t managed to get far.

And now I need to do some, I hope, not too difficult things, which I think R 
can do, yet I have no idea how to make it do so.

I have a big set of (empirical) data which was obtained by my colleagues and 
stored in an inconvenient way: all of the data are in two cells of an Excel 
table. An example of the data is in the attached file (the link):

https://drive.google.com/file/d/0B64YMbf_hh5BS2tzVE9WVmV3bFU/view?usp=sharing

So the first column has a number and the second has a whole vector (I guess it 
is) which looks like «some words in Cyrillic (the length varies)» and then the 
set of numbers «12*23 34*45» (another problem is that sometimes it is «12*23, 
34*56»).

The number of rows is about 3000, so it is impossible to do manually.

What I need at the end is to have it separated into different Excel cells:
- what is written in words - |  12  | 23 | 34 | 45 |

Do you think it is possible to do so using R (or something else)?

Thank you very much in advance, and sorry for asking for help with such a 
stupid question. The problem is that I am trying and yet haven’t even managed 
to install openSUSE onto my laptop, only Ubuntu! :)


Thank you very much!

[R] gradientForest input data structure

2014-04-02 Thread Sean Porter
 0 0.35 0.9 0 0 0 0 ...
 $ q : num  0 0 0 0 0 0 0 9.4 0 0 ...
 $ r : num  0 41 0 0 1.75 0 0 0 0 0 ...
 $ s : num  0 0 0 0 0 ...
 $ t : num  0 0 22.1 0 0 ...
 $ u : num  0 0 0 0 0 0 0 0 0 0 ...
 $ v : num  0 0 0 0 0.12 0 0 0 0 0 ...
 $ w : num  0 0 0 0 4.95 6.6 0 3.3 0 3.3 ...
 $ x : num  0 0 0 0 7.9 ...
 $ y : int  0 0 0 0 0 1 0 0 0 0 ...
 $ z : num  0 0 0 0 0 0 0 0 0 0.8 ...
 $ aa: num  0 0 0 0 0 0 0 0 0 0 ...
 $ ab: num  0 47 0 136.3 9.4 ...
 $ ac: num  0 0 0 0 0 0 0 0 0 0 ...
 $ ad: num  0 4.2 0 8.4 0 0 0 0.7 0 0 ...
 $ ae: int  0 0 0 2 0 1 0 0 0 0 ...
 $ af: num  0 92.4 720.7 0 554.4 ...
 $ ag: int  0 0 0 0 0 0 0 0 0 0 ...
 $ ah: int  0 0 0 0 0 0 0 0 0 0 ...
 $ ai: num  43.4 3.4 26.4 0 1.7 ...
 $ aj: num  0 0 0 0 0 ...
 $ ak: num  0 0 0.25 0 0 0 0 0 0 0 ...
 $ al: num  0 0 0 0 0 ...
 $ am: num  561.6 0 93.6 0 374.4 ...
 $ an: num  234 0 562 0 187 ...
 $ ao: num  15.92 2.16 0 0 1.08 ...
 $ ap: num  31.84 0 1.08 0 3.24 ...
 $ aq: num  0 0 0 37.8 29.4 0 92.4 0 0 0 ...
 $ ar: int  0 72 0 76 16 49 0 8 0 0 ...
 $ as: num  0 0 0 0 0 0 0 0 0 0 ...
 $ at: num  0 0 0 0 0 0 0 0 0 0 ...
 $ au: num  0 0 0 0 0 0 0 0 0 0 ...
 $ av: num  0 31.8 0 25.4 0 ...
 $ aw: num  0 0 0 0 0 0 0 0 0 0 ...
 $ ax: num  0 2.7 0 0 0 0 0 2.7 2.7 0 ...
 $ ay: int  0 0 0 0 0 1 0 0 0 0 ...
 $ az: num  2.7 0 0 0 0 0 0 0 0 0 ...
 $ ba: num  7.72 0 0 0 0 0 0 0 0 0 ...
 $ bb: num  262 0 0 0 0 ...
 $ bc: num  0 1.6 0 13.6 0 ...
 $ bd: num  0 0 7.96 0 0 0 0 0 0 0 ...
 $ be: num  2493 0 1254 0 988 ...
 $ bf: num  0 46.4 0 72.5 45 ...
 $ bg: num  218 0 265 0 884 ...
 $ bh: num  0 0 0 0 0 0 0 2.8 0 0 ...
 $ bi: num  0 0 0 0 0 ...
 $ bj: num  0 0 0 0 0 0 0 0 0 0 ...
 $ bk: num  0 0 0 0 0 1.4 0 0 0 0 ...
 $ bl: num  0 0 0 0 0 0 3.2 0 0 0 ...
 $ bm: num  0 2.6 0 72.8 0 ...
 $ bn: num  0 0 82.8 0 0 0 0 0 0 0 ...
 $ bo: num  0 0 0 0 0 ...
 $ bp: int  0 0 0 0 0 0 0 288 0 0 ...
 $ bq: num  28.4 530.5 433.4 473.9 615.6 ...
 $ br: num  0 0 0 0 0 0 0 0 0 0 ...
 $ bs: num  0 0 0 0 0 0 0 0 0 14.5 ...
 $ bt: num  56.2 0 1125 0 78.8 ...
 $ bu: num  205.4 7.9 130.3 0 0 ...
 $ bv: num  1353.2 0 119.4 0 79.6 ...
 $ bw: num  0 0 0 2.45 0.7 2.1 0 0 0 0 ...
 $ bx: num  0 0 0 0 0 ...
 $ by: num  0 0 0 0 0 0 26.4 0 0 0 ...
 $ bz: num  208 1806 3727 208 8427 ...
 $ ca: num  49.2 0 32.8 0 57.4 ...
 $ cb: num  0 7.15 0 0 0 0 1.65 0 0 0 ...
 $ cc: num  0 590 0 419 0 ...
 $ cd: num  0 0 0 0 0 0 0 0 1.5 0 ...
 $ ce: num  1390 0 1394 0 552 ...
 $ cf: num  75.6 0 0 0 0 ...
 $ cg: num  3.86 0 0 0 0 0 0 0 0 0 ...
 $ ch: num  81.3 0 0 0 0 ...
 $ ci: num  0 0 0 0 12.2 ...
 $ cj: num  0 1.2 0 0.8 0 0.8 0.8 3.6 0 0 ...
 $ ck: num  0 0 0 0 0 17.4 0 0 0 0 ...
 $ cl: int  0 0 0 0 0 0 0 0 0 435 ...
 $ cm: num  0 0 0 0 0 0 31.2 0 0 0 ...
 $ cn: num  0 0 0 16.8 0 0 0 0 0 0 ...
 $ co: num  11.61 0 2.11 0 10.55 ...
 $ cp: num  15.05 1.4 0.35 0 0 ...
 $ cq: num  0 0 0 0 0 0 0 4.2 0 0 ...
 $ cr: int  0 0 0 0 1 0 0 0 0 0 ...
 $ cs: num  0 0 0 0 0 0 17.1 0 0 0 ...
 $ ct: num  2.7 0 0 0 0 0 0 0 0 0 ...
 $ cu: num  0 0 30.9 0 41.2 ...
  [list output truncated]

I thought it may be that some values are numbers and some are integers but I
tested this using only numbers and found that this is not the problem. How
do I get my response/species data into the correct structure such as in the
example (GZ.sps.mat.Rdata) ?

Thank you very much

Sean


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Sean Porter
Sent: 25 March 2014 09:34 AM
To: 'Liaw, Andy'; r-help@r-project.org
Subject: Re: [R] randomForest warning: The response has five or fewer unique
values. Are you sure you want to do regression?

Dear Andy,

Thank you for your help! Below are the full details of what I am doing in R
along with the data structure, so hopefully this will help. Okay, so the
warning is just a warning and nothing to worry about when doing regression.
But why is randomForest producing regression trees for only 3 species
when I have 100 species in the matrix? Surely this is not correct, so
what am I doing wrong? Also, what did you mean when you said that by using the
code I am not using randomForest directly?

Many thanks, Sean 

> # For Andy
> # get biological data into R
> biological <- read.table(file = "C:/bio1.txt", header = TRUE)
> dim(biological)
[1]  14 100
> # get environmental data into R
> enviro <- read.table(file = "C:/abio1.txt", header = TRUE)
> dim(enviro)
[1] 14  8
> # data structure of biological data
> str(biological)
'data.frame':   14 obs. of  100 variables:
 $ a : num  0 0 0 0 0 0 0 0 0 0 ...
 $ b : num  0 0 0 0 257 ...
 $ c : int  0 0 0 0 0 0 441 0 0 0 ...
 $ d : num  179 0 1430 0 0 ...
 $ e : num  100 0 601 0 123 ...
 $ f : num  0 0 3 0 1.5 0 0 0 0 4.5 ...
 $ g : num  0 0 0 0 0 0 0 0 0 0 ...
 $ h : int  0 0 0 0 0 0 0 0 1 0 ...
 $ i : num  0 0 0 0 0 0 0 0 0 3.85 ...
 $ j : num  0 0 0 27.6 3.6 ...
 $ k : num  0 0 0 0 0 0 0 0 0 1.8 ...
 $ l : num  0 0 0 0 0 0 0 0 0 0 ...
 $ m : num  0 0 0 0 0 0 0 0 0 0 ...
 $ n : num  0 0 0 0 0 0 0 1.1 0 0

Re: [R] randomForest warning: The response has five or fewer unique values. Are you sure you want to do regression?

2014-03-25 Thread Sean Porter
 1e-03 1e-03 1e-04 1e-04
1e-03 1e-03 1e-04 ...
 $ Sediment.nlw551.667.: num  0.231 0.229 0.229 0.237 0.227 ...
 $ Depth   : num  4.8 4.1 5 4 6.2 7.7 10.1 4.3 5.1 7.9 ...
> # conduct randomForest regression
> gf <- gradientForest(cbind(enviro, biological), predictor.vars =
colnames(enviro), response.vars = colnames(biological), ntree = 500,
transform = NULL, compact = T, nbin = 201, maxLevel = 5, corr.threshold =
0.5)
There were 50 or more warnings (use warnings() to see the first 50)
> gf
A forest of 500 regression trees for each of 3 species

Call:

gradientForest(data = cbind(enviro, biological), predictor.vars =
colnames(enviro), 
response.vars = colnames(biological), ntree = 500, transform = NULL, 
maxLevel = 5, corr.threshold = 0.5, compact = T, nbin = 201)



Important variables:
[1] Sediment.nlw551.667. Depth  nLw551  nLw667  Chlorophyll

> # End


 


-Original Message-
From: Liaw, Andy [mailto:andy_l...@merck.com] 
Sent: 25 March 2014 02:37 AM
To: Sean Porter; r-help@r-project.org
Subject: RE: [R] randomForest warning: The response has five or fewer unique
values. Are you sure you want to do regression?

If you are using that code, you are not really using randomForest directly.  I
don't understand the data structure you have (since you did not show
anything) so I can't really tell you much.  In any case, that warning came
from randomForest() when it is run in regression mode but the response has
five or fewer distinct values.  It may be legitimate regression data, and
if so you can safely ignore the warning (that's why it's not an error).
It's there to catch the cases when people try to do classification with
class labels 1, 2, ..., k and forget to make it a factor.
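
A quick way to see which response columns would trigger this warning is to count distinct values per column. The sketch below uses a small made-up site-by-species data frame (not the poster's data). The "five or fewer" threshold is taken from the warning text quoted in this thread, and the object names are assumptions.

```r
# Hypothetical site-by-species data: 14 sites, mostly zeros
biological <- data.frame(
  a = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  b = c(0, 0, 0, 0, 257, 0, 12, 0, 3.5, 0, 0, 48, 0, 9),
  c = c(0, 0, 0, 0, 0, 0, 441, 0, 0, 0, 0, 0, 0, 0)
)

# Number of distinct values in each response column
n_unique <- sapply(biological, function(x) length(unique(x)))
n_unique

# Columns with five or fewer distinct values (these would raise the warning)
flagged <- names(n_unique)[n_unique <= 5]
flagged

# For genuine classification, class labels coded 1..k should be a factor
labels <- c(1, 2, 3, 1, 2, 3)
labels_factor <- factor(labels)
```

Species recorded at only one or two sites will almost always be flagged, which is common in sparse survey data.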

Best,
Andy Liaw

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Sean Porter
Sent: Thursday, March 20, 2014 3:27 AM
To: r-help@r-project.org
Subject: [R] randomForest warning: The response has five or fewer unique
values. Are you sure you want to do regression?

[...clip...]


[R] randomForest warning: The response has five or fewer unique values. Are you sure you want to do regression?

2014-03-20 Thread Sean Porter
Hello everyone,

 

I'm relatively new to R and new to the randomForest package and have scoured
the archives for help with no luck. I am trying to perform a regression on a
set of predictors and response variables to determine the most important
predictors. I have 100 response variables collected from 14 sites and 8
predictor variables from the same 14 sites. I ran the code to perform the
randomForest regression given by Pitcher et al. 2011 (
http://gradientforest.r-forge.r-project.org/biodiversity-survey.pdf ). 

 

However, after running the code I get the warning:

 

 In randomForest.default(m, y, ...) :

  The response has five or fewer unique values.  Are you sure you want to do
regression?

 

It also produces a set of 500 regression trees for each of only 3 species,
even though the number of species in the response file is 100. I noticed that
in the example by Pitcher they get 500 trees from only 90 species even though
they input 110 species in the response data.

 

Why am I getting the warning/how do I solve it, and why is randomForest
producing trees for only 3 species when I am looking at 100 species
(response variables)?

 

Many thanks

 

Sean

 

