Re: [R] How to assign a rank to a range of values..cumulative area distribution

2006-10-12 Thread Thomas P. Colson
 

Here's what I came up with, thanks to Alex Brown and Roger for the solution:


# needs the maptools package to read an ESRI ASCII grid
require(maptools)
# import the flow accumulation grid
basin.map <- readAsciiGrid("c:/temp/eno_fac.asc", colname = "fac")
# split on unique fac cell values
areas <- split(basin.map$fac, basin.map$fac)
length(areas)
# count each occurrence of a fac value
cell_count <- sapply(areas, length)
# calculate each drainage area; original is 20 ft resolution, we want square meters
basin_area <- cell_count * 20 * 20 * 0.09290304
# read the area into a data frame
freqs <- as.data.frame(table(basin_area))
# rank the frequencies based on each unique occurrence; note, ranks run from 1 to n
r <- rank(freqs$basin_area)
n <- length(r)
# determine the probability; n+1 ensures there is no 100%, and 1- reverses the
# order so that low drainage area gets a high probability of exceedance
z <- cbind(Rank = r, PRank = 1 - (r / (n + 1)))
# attach the probability to the table; the high probability of exceedance is in
# the row with low drainage and the low probability is in the row with high drainage
freqs$rank <- z
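
The last line above attaches z, a two-column matrix, as a single column of freqs. A minimal variant that keeps Rank and PRank as ordinary columns might look like the sketch below. The basin_area values are made up for illustration and are not taken from the real grid.

```r
# Hypothetical drainage areas in square meters (illustration only)
basin_area <- c(37.16, 74.32, 148.64, 371.6, 1486.4)

r <- rank(basin_area)        # 1 = smallest area, n = largest
n <- length(r)
p_exceed <- 1 - r / (n + 1)  # stays strictly between 0 and 1

# one plain column per quantity instead of a matrix column
freqs <- data.frame(basin_area = basin_area, Rank = r, PRank = p_exceed)
print(freqs)
```

With separate columns, freqs$PRank can be passed straight to plot() or merged back onto the grid.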

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to assign a rank to a range of values..

2006-10-10 Thread Thomas P. Colson
From the following:

basin.map <- readAsciiGrid("c:/temp/area.asc", colname = "area")

I have a SpatialGridDataFrame which has the x and y coordinates of each cell and
the drainage area of that cell. There are many cells with a low drainage
area (in my case, 33000 with an area of 37.16) and one cell with the highest
drainage area (again, in my case, a drainage area of 80).

What I'd like to do is rank the drainage area cells based upon the
number of times they occur, with a rank of 100 going to the cells with
area=37.16, and 1 going to the cell(s) with area=80. There are 6,000
different drainage areas out of 180,000 cells in this grid, so the ranks
would have values like 100, 99.01, 57.34, 20, 1.08, 1 and so forth.


I have been struggling with the split, length and rank commands in R, but
can't seem to figure out how to attach a new column (or make a new
dataset) that has a column of ranks, or how to calculate the rank.
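
One possible reading of the request is a linear scale from 100 (smallest distinct area) down to 1 (largest distinct area), attached to every cell as a new column. The sketch below assumes that reading. The data frame cells and its values are hypothetical stand-ins for the real grid.

```r
# Hypothetical cell areas: many small cells, a few large ones
cells <- data.frame(area = c(37.16, 37.16, 37.16, 74.32, 74.32, 148.64, 2972.8))

u <- sort(unique(cells$area))  # distinct drainage areas, ascending
k <- length(u)                 # needs at least two distinct areas
# map positions 1..k linearly onto 100..1 (smallest area -> 100, largest -> 1)
score <- 100 - (seq_len(k) - 1) * 99 / (k - 1)

# look up each cell's area among the distinct values and attach its score
cells$rank <- score[match(cells$area, u)]
print(cells)
```

match() does the attaching: it returns, for every cell, the position of its area in u, so the scores line up row by row with the original data.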

Thanks for any help. 



Thomas Colson
North Carolina State University
Department of Forestry and Environmental Resources
(919)624-6329
(919)515 3434
[EMAIL PROTECTED]

Schedule: www4.ncsu.edu/~tpcolson



[R] Probability of exceedance function question

2006-10-08 Thread Thomas P. Colson
I'm trying to calculate a cumulative area distribution (graph) of drainage
areas. This is defined as P(A > A*). Simple in principle. I can do this in
Excel with COUNTIF, which will count the number of cells in the row
"area" that have area A, then determine, for each cell in the row "area", how
many cells exceed that area, then divide that number by the total number
of cells, which gives me the probability that drainage area A exceeds
drainage area A*.

E.g., a drainage area of 6 sq meters (one DEM grid cell) has a high probability
of exceedance (.99), while a drainage area of 100,000 square meters has a low
probability of exceedance (.001).

I wish to plot this relationship, and we all know that excel is not the tool
of choice when working with hundreds of thousands of records. I'd like to
port the CAD into a few R functions that I've already developed for other
tests as well.  

So my challenge, in R, is to 
(1) count the number of rows in column Area that have AREA(*),

(2) determine, by row, how many rows have an area greater than the area
given in that one row 

(3) divide step 2 by number of rows (how can I do a row count and port that
to a variable, as I have to do this on 10 datasets?)
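
The three steps above can be sketched in a few vectorized lines. This is only an illustration under assumptions: the vector area holds made-up values, and exceed_prob is a hypothetical helper name, not a function from any package.

```r
# Hypothetical drainage areas, one value per grid cell
area <- c(6, 6, 6, 6, 12, 12, 30, 100)

# step 1: number of rows for each distinct area
counts <- table(area)

# steps 2 and 3 wrapped in a function so it can be reused on each dataset
exceed_prob <- function(a) {
  n <- length(a)                                # row count held in a variable
  n_greater <- n - rank(a, ties.method = "max") # rows with a strictly larger area
  n_greater / n                                 # P(A > A*) for every row
}

p <- exceed_prob(area)
print(cbind(area, p))
```

rank() with ties.method = "max" gives, for each row, the number of rows with an area less than or equal to it, so subtracting from n leaves the count of strictly larger areas without an explicit loop over rows.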

Thanks for any advice you can offer on this endeavour.



Re: [R] Probability of exceedance function question

2006-10-08 Thread Thomas P. Colson
I am able to convert the flow accumulation grid into an area (for each
pixel) grid, then import this into R as an ASCII file. The plot(ecdf)
function in R seems to plot the opposite: the curve starts at probability 0 for
drainage area 0. Should it be the other way around?
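
For what it's worth, ecdf() returns the non-exceedance probability P(A <= a), which rises from 0 to 1, so the exceedance curve is its complement. A minimal sketch, using simulated values in z rather than the real grid:

```r
set.seed(1)
z <- rexp(1000, rate = 1 / 500)  # hypothetical drainage areas, illustration only

Fn <- ecdf(z)           # Fn(a) = P(A <= a), increasing in a
zs <- sort(z)
p_exceed <- 1 - Fn(zs)  # P(A > a), decreasing in a

# step plot of the exceedance curve, log scale on the area axis
plot(zs, p_exceed, type = "s", log = "x",
     xlab = "drainage area", ylab = "P(A > a)")
```

The largest observed area gets an exceedance probability of exactly 0 here, because no value in the sample is larger.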

About 150,000 data points in these sets, ecdf curve plots in about 15
seconds. 


Could the problem be, how I'm importing the data from ascii grid? Cellsize
is 20 ft and z is the drainage area, for each cell (flow weighted)

area <- read.table(file = "c:/temp/area.asc", sep = " ", na.strings = "-",
                   skip = 6)
# drop the last column
area <- area[, -ncol(area)]
xLLcorner <- 1985649.0700408898
yLLcorner <- 841301.04004059616
cellsize <- 20
xURcorner <- xLLcorner + (cellsize * (ncol(area) - 1))
xLRcorner <- xURcorner
xULcorner <- xLLcorner
yULcorner <- yLLcorner + (cellsize * (nrow(area) - 1))
yURcorner <- yULcorner
yLRcorner <- yLLcorner
coordsa <- expand.grid(y = seq(yULcorner, yLRcorner, by = -cellsize),
                       x = seq(xULcorner, xLRcorner, by = cellsize))
area <- data.frame(coordsa, tmin = as.vector(c(area, recursive = TRUE)))
# columns are y, x, value, in that order (expand.grid was called with y first)
names(area) <- c("y", "x", "z")
# R is case-sensitive: the function is plot(), not Plot()
plot(ecdf(area$z))


-----Original Message-----
From: Roger Bivand [mailto:[EMAIL PROTECTED] 
Sent: Sunday, October 08, 2006 4:37 PM
To: Thomas P. Colson
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Probability of exceedance function question

On Sun, 8 Oct 2006, Thomas P. Colson wrote:

> I'm trying to calculate a cumulative area distribution (graph) of
> drainage areas. This is defined as P(A > A*). Simple in principle. I
> can do this in Excel with COUNTIF, which will count the number of
> cells in the row "area" that have area A, then determine, for each
> cell in the row "area", how many cells exceed that area, then divide
> that number by the total number of cells, which gives me the
> probability that drainage area A exceeds drainage area A*.

Is this ecdf() of the vector or its suitable subset? If so, it runs very
fast even for large data sets. For plotting, bear in mind that you are
generating a lot of output, though:

> t0 <- runif(10)
> system.time(t1 <- ecdf(t0))
[1] 0.222 0.022 0.248 0.000 0.000
> system.time(plot(t1, pch="."))
[1] 1.089 0.079 1.186 0.000 0.000

isn't at all bad!

 
> E.g., a drainage area of 6 sq meters (one DEM grid cell) has a high
> probability of exceedance (.99), while a drainage area of 100,000
> square meters has a low probability of exceedance (.001).
>
> I wish to plot this relationship, and we all know that excel is not
> the tool of choice when working with hundreds of thousands of records.
> I'd like to port the CAD into a few R functions that I've already
> developed for other tests as well.
>
> So my challenge, in R, is to
> (1) count the number of rows in column Area that have AREA(*),
>
> (2) determine, by row, how many rows have an area greater than the
> area given in that one row
>
> (3) divide step 2 by number of rows (how can I do a row count and port
> that to a variable, as I have to do this on 10 datasets?)
>
> Thanks for any advice you can offer on this endeavour.
 
 

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [EMAIL PROTECTED]
