### [R-sig-eco] Using rq() for least absolute deviation regression

I've seen several websites say that the function rq() from the package quantreg can be used to do least absolute deviation regression. How do you go about doing this and what's the connection between quantile regression and LAD? (I'm very new to the former topic.) Thanks, Jane

### Re: [R-sig-eco] Using rq() for least absolute deviation regression

The short answer is that what you seem to want is the rq() default, with tau not specified (the default is tau = 0.5). In general, rq() minimizes a sum of weighted absolute residuals: positive residuals are weighted by tau (the conditional quantile of interest) and negative residuals by 1 - tau. The weights are therefore equal when tau = 0.5, i.e., median regression in rq() is LAD.
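A minimal base-R sketch of that connection, using simulated data and a hypothetical helper `check_loss()` (not part of quantreg). It writes out the weighted absolute-residual loss, confirms that at tau = 0.5 it is exactly half the sum of absolute residuals (the LAD criterion), and shows that for an intercept-only model the minimizer is the sample median, or more generally the tau-th sample quantile:

```r
# Quantile-regression ("check") loss: positive residuals are weighted by tau,
# negative residuals by (1 - tau). check_loss() is a hypothetical helper.
check_loss <- function(u, tau) {
  sum(u * (tau - (u < 0)))
}

set.seed(1)
y <- rexp(51)      # skewed toy data; odd n, so the median is a single point

# At tau = 0.5 every residual gets weight 0.5, so the loss is
# half the sum of absolute residuals (the LAD criterion)
u <- y - mean(y)
all.equal(check_loss(u, 0.5), 0.5 * sum(abs(u)))

# Intercept-only fit: the constant b that minimizes the loss
fit_const <- function(tau) {
  optimize(function(b) check_loss(y - b, tau), interval = range(y))$minimum
}
c(lad = fit_const(0.5), median = median(y))
c(q90 = fit_const(0.9), quantile(y, 0.9, type = 1))
```

With quantreg installed, `rq(y ~ x)` and `rq(y ~ x, tau = 0.5)` fit the same LAD line; other values of tau give other conditional quantiles.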

r-sig-ecology-boun...@r-project.org wrote on 04/26/2011 03:45:23 PM:

From: Jane Shevtsov jane@gmail.com
To: r-sig-ecology@r-project.org
Date: 04/26/2011 03:46 PM
Subject: [R-sig-eco] Using rq() for least absolute deviation regression


### Re: [R-sig-eco] subsetting data in R

If this isn't already answered: I don't quite understand the question. What do you mean by "do a complete data set from an object in R"? What do you mean by "the subsetting is dangerous ... as you need to specify the levels for all your factors again"? (What do your 3000 columns of data represent? If these are predictor variables, I hope you have a truly enormous number of responses ...)

It may have been mentioned already, but droplevels(subset(...)) will probably do what you want. (I have tried very hard over the years to get drop.levels= to be an optional argument to subset(), but so far I have failed.) droplevels() is an improvement over the drop.levels() function in gdata because (1) it is in base R and (2) it doesn't reorder the factor by default (which is what gdata::drop.levels [insanely, in my opinion] does).
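A minimal sketch of the droplevels(subset(...)) suggestion on a small made-up data frame (the columns `site`, `zone` and `soil` are hypothetical, not Manuel's data). Applied to a whole data frame, droplevels() removes unused levels from every factor column at once, so no factor has to be re-leveled by hand:

```r
# Toy data with two factor columns; stringsAsFactors = TRUE reproduces the
# factor conversion that read.csv() performed by default in 2011
d <- data.frame(
  site = c("s1", "s2", "s3", "s4", "s5", "s6"),
  zone = c("AP", "AID", "AII", "AID", "AP", "AID"),
  soil = c("clay", "sand", "loam", "sand", "clay", "loam"),
  stringsAsFactors = TRUE
)

sub <- subset(d, zone == "AID")
levels(sub$zone)          # subset() alone keeps "AID" "AII" "AP"

# droplevels() on a data frame drops unused levels in every factor column
sub2 <- droplevels(subset(d, zone == "AID"))
levels(sub2$zone)         # "AID"
levels(sub2$soil)         # "loam" "sand": only soils present in the subset
```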

On 11-04-24 11:21 AM, Manuel Spínola wrote:

Thank you for all the responses.

Is there a way to do a complete data set from an object in R?

I have a data set with more than 3000 columns. The subsetting is ok but it could be dangerous if you are using other factors to do some analysis as you need to specify the levels for all your factors again.

Best,

Manuel

On 24/04/2011 08:30 a.m., Gustavo Carvalho wrote:

pa2 <- subset(pa, influencia == "AP")
pa2$influencia <- factor(pa2$influencia)
levels(pa2$influencia)
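Manuel's attempt further down this thread fails because factor(subset(...)$influencia) keeps only the re-leveled column, an atomic factor vector rather than a data frame, so `$` cannot be applied to it. Gustavo's version subsets the data frame first and then rebuilds the one column. A toy sketch with a made-up `pa` (the real CSV is not available):

```r
# Hypothetical stand-in for the poster's data
pa <- data.frame(
  influencia = factor(c("AP", "AID", "AII", "AP", "AID")),
  sp1 = c(1, 0, 1, 1, 0)
)

# Keeps only the factor column, so pa_wrong$influencia would fail with
# "$ operator is invalid for atomic vectors"
pa_wrong <- factor(subset(pa, influencia == "AP")$influencia)
is.data.frame(pa_wrong)   # FALSE

# Gustavo's approach: subset the data frame, then re-create the factor column
pa2 <- subset(pa, influencia == "AP")
pa2$influencia <- factor(pa2$influencia)
levels(pa2$influencia)    # "AP"
```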

On Sun, Apr 24, 2011 at 11:24 AM, Manuel Spínola wrote:

Thank you very much for your response, Christian, Roman, and Sarah.

Sarah, I am trying your suggestion but I cannot see the levels:

pa2 = factor(subset(pa, influencia == "AP")$influencia)
levels(pa2$influencia)
Error in pa2$influencia : $ operator is invalid for atomic vectors

Best,

Manuel

On 24/04/2011 07:51 a.m., Sarah Goslee wrote:

By default, read.csv() turns character variables into factors, using all the unique values as the levels (this default changed to stringsAsFactors = FALSE in R 4.0.0). subset() retains those levels by default, as they are a vital element of the data. If you are studying some attribute of men and women, say height, even if you are only looking at the heights for women it's important to remember that men still exist.

If you don't want influencia to be a factor, you can change that at import with stringsAsFactors = FALSE. If you do want influencia to be a factor but want the unused levels removed, you can use factor() to do that:
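A small sketch of the first option, using an inline CSV string in place of Manuel's file (the contents are made up). With stringsAsFactors = FALSE the column stays character, so a subset carries no stale levels, and a factor can be built later from the subset alone:

```r
# Inline CSV standing in for espec_indic.csv (hypothetical contents)
csv_text <- "influencia,sp1\nAP,1\nAID,0\nAII,1\nAID,1"

pa_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
aid <- subset(pa_chr, influencia == "AID")

class(aid$influencia)            # "character": no levels attribute to go stale
unique(aid$influencia)           # "AID"

# If a factor is needed later, build it from the subset only
levels(factor(aid$influencia))   # "AID"
```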

testdata <- data.frame(group = c("A", "B", "C", "A", "B", "C"), value = 1:6, stringsAsFactors = TRUE)
testdata
  group value
1     A     1
2     B     2
3     C     3
4     A     4
5     B     5
6     C     6
str(testdata)
'data.frame': 6 obs. of  2 variables:
 $ group: Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3
 $ value: int  1 2 3 4 5 6
subset(testdata, group == "A")
  group value
1     A     1
4     A     4
subset(testdata, group == "A")$group
[1] A A
Levels: A B C
?subset
factor(subset(testdata, group == "A")$group)
[1] A A
Levels: A

Sarah

On Sun, Apr 24, 2011 at 9:04 AM, Manuel Spínola wrote:

Dear list members,

I have a question regarding subsetting a data set in R.

I created an object for my data:

pa = read.csv("espec_indic.csv", header = TRUE, sep = ",", check.names = FALSE)

levels(pa$influencia)
[1] "AID" "AII" "AP" 

The object has 3 levels for influencia (AP, AID, AII)

Now I select only the observations with influencia = AID:

pa2 = subset(pa, influencia == "AID")

but if I ask for the levels of influencia, it still shows me all 3 levels, AP, AID, and AII:

levels(pa2$influencia)
[1] "AID" "AII" "AP" 

Why is that? I was thinking that I was creating a new data frame with only AID as a level for influencia.

How can I make a complete new object with only the observations for AID, in which the only level of influencia is indeed AID?

Best,

Manuel

### Re: [R-sig-eco] The final result of TWINSPAN

Dear List,

Earlier this year on an (undoubtedly ill-advised) lark I coded up an R version of TWINSPAN. It's far from a polished package at this point, but the code does run. One of the interesting features is that you can partition a PCO or NMDS in addition to the traditional CA. To be clear, I am not a TWINSPAN fan either, but I wanted it for a methods paper I was working on.

The problem is that I based the code on Hill, Bunch & Shaw (1975, J. Ecol. 63:597-613), which is what I had available. Apparently the algorithm in the commercial TWINSPAN is significantly modified from the original, but I couldn't find a description of the actual algorithm anywhere in the literature. It is probably described in the software's user manual, but I was not sufficiently motivated to chase down a copy. I do have a copy of the FORTRAN code, but it was apparently written in FORTRAN II and is basically inscrutable, even to an old FORTRAN dog like me.

So, if somebody has a clear description of the actual algorithm (and I think it is disturbing that I could not find one), it would be possible to code it up in native R. The alternative, writing a wrapper for the original FORTRAN code, is not a trivial task; I gave it a couple of days and gave up.

-- David W. Roberts office 406-994-4548
Professor and Head FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460

On 04/14/2011 01:57 AM, Jari Oksanen wrote:

On 14/04/11 10:37 AM, Yong Zhang wrote:

Dear all,

I conducted a two-way indicator species analysis using the TWINSPAN program, and the final result is as follows:

0111
00011011
011000111
01001001

I need to verify my analysis. I want to classify the above 24 sampling sites into 3 major groups based on 7 biotic metrics. The 24 samples can be named site1 to site24, from left to right. I set the cut levels to 0, 2, 5, 10, 20, the maximum level of divisions to 6, and the maximum group size for division to 3. Are these settings correct? And how should I classify these sites into 3 groups according to this final result?
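For readers new to TWINSPAN: the cut levels recode each species abundance into a set of presence/absence "pseudospecies", one per cut level the abundance reaches. The sketch below assumes the usual convention (pseudospecies k is present when abundance is at least cut level k, with the 0 level meaning simply "present"); `pseudospecies()` is a hypothetical helper, not part of any package, so check the TWINSPAN documentation for the exact rule:

```r
# Hypothetical helper: expand one species' abundances into pseudospecies
# presence/absence, one column per cut level (assumed rule: abundance >= cut,
# and abundance > 0 for the 0 level)
pseudospecies <- function(abund, cuts = c(0, 2, 5, 10, 20)) {
  out <- sapply(cuts, function(cl) as.integer(abund > 0 & abund >= cl))
  out <- matrix(out, nrow = length(abund))   # stay a matrix even for one site
  dimnames(out) <- list(names(abund), paste0("cut", cuts))
  out
}

abund <- c(siteA = 0, siteB = 1, siteC = 7, siteD = 25)
pseudospecies(abund)
```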

Dear Yong Zhang,

This is not an R issue, because there is no TWINSPAN in R. However, the answer to your question is that, strictly speaking, you cannot group your data into three major groups with TWINSPAN. TWINSPAN is a bisection method: the first division gives you two groups, and the second splits each of these into two, so the next choice is four groups. However, in this case one of the groups was so small (3 plots were split off from the others in the first division, and these were then split into groups of 2 plots and 1 plot) that you can probably ignore the second division of the small group. If your goal is as vague as wanting to classify 24 sites into 3 major groups, you could do better than TWINSPAN: what's wrong with the proper classification methods in R? Moreover, have you checked that your biotic metrics suit the pseudospecies cut-level concept of TWINSPAN?
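Jari does not name a particular method, so the following is only one possible choice, sketched on simulated numbers rather than Yong's data: average-linkage (UPGMA) hierarchical clustering of standardized metrics with base R's hclust(), cut into exactly three groups with cutree(). Unlike TWINSPAN's bisections, cutree() can return any requested number of groups:

```r
set.seed(42)
# Simulated stand-in: 24 sites x 7 biotic metrics (not the poster's data)
metrics <- matrix(rnorm(24 * 7), nrow = 24,
                  dimnames = list(paste0("site", 1:24), paste0("m", 1:7)))

# Standardize so that no single metric dominates the distances
site_dist <- dist(scale(metrics))               # Euclidean distances
tree <- hclust(site_dist, method = "average")   # UPGMA linkage
groups <- cutree(tree, k = 3)                   # exactly 3 groups
table(groups)
```

The choice of distance and linkage matters more than the cutting step; with presence/absence or abundance data an ecological dissimilarity would usually replace Euclidean distance.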

Cheers, jari oksanen

### Re: [R-sig-eco] subsetting data in R

Thank you very much Ben.

I was doing an indicator species analysis with the subset data, and the other levels were still in my subset data, so the analysis was taking them into account.

My 3000 columns are plant species presence/absence data.

Best,

Manuel

On 26/04/2011 12:06 p.m., Ben Bolker wrote:


### Re: [R-sig-eco] The final result of TWINSPAN

On 27/04/11 00:40 AM, Dave Roberts wrote:


Dave,

Hill, Bunch & Shaw describe the general idea of TWINSPAN, but the implementation is more complicated. Martin Kent and Paddy Coker do a great job of explaining the twists in their book (Vegetation Description and Analysis: A Practical Approach). If I remember correctly, the