Re: [R] How to assign a rank to a range of values..cumulative area distribution
Here's what I came up with, thanks to Alex Brown and Roger for the solution:

# needs the maptools package to read ESRI grid
require(maptools)
# import the flow accumulation grid
basin.map <- readAsciiGrid("c:/temp/eno_fac.asc", colname = "fac")
# split on unique fac cell values
areas <- split(basin.map$fac, basin.map$fac)
length(areas)
# count each occurrence of fac value
cell_count <- sapply(areas, length)
# calculate each drainage area; original is 20 ft resolution, we want square meters
basin_area <- cell_count * 20 * 20 * 0.09290304
# read the area into a data frame
freqs <- as.data.frame(table(basin_area))
# rank the frequencies based on each unique occurrence; note, ranks run from 1 to n
r <- rank(freqs$basin_area)
n <- length(r)
# determine the probability; n+1 ensures there is no 100%, and 1- reverses the order so
# low drainage area gets high probability of exceedance
z <- cbind(Rank = r, PRank = 1 - (r / (n + 1)))
# attach the probability to the table; the high probability of exceedance is in the row
# with low drainage and the low probability is in the row with high drainage
freqs$rank <- z

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
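The ranking step above can be tried without maptools or the grid file. The following is only a minimal sketch on made-up area values (the numbers are assumptions for illustration, not data from the basin); it repeats the table / rank / 1 - r/(n+1) steps and stores the result as plain numeric columns.

```r
# Made-up drainage areas (square meters); a stand-in for values derived from the grid
basin_area <- c(37.16, 37.16, 37.16, 74.32, 74.32, 148.64, 297.28)

# One row per unique area, with the number of cells that have it (column Freq)
freqs <- as.data.frame(table(basin_area), stringsAsFactors = FALSE)
freqs$basin_area <- as.numeric(freqs$basin_area)

# Rank the unique areas from 1 (smallest) to n (largest)
r <- rank(freqs$basin_area)
n <- length(r)

# 1 - r/(n + 1): the smallest area gets the highest value, and no row reaches 0 or 1
freqs$PRank <- 1 - r / (n + 1)
print(freqs)
```

Note that in this sketch the ranks are taken over the unique area values, so PRank does not depend on the Freq column.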
[R] How to assign a rank to a range of values..
From the following:

basin.map <- readAsciiGrid("c:/temp/area.asc", colname = "area")

I have a SpatialGridDataFrame which has the x and y coordinates of a cell, and the drainage area of that cell. There are many cells with a low drainage area (in my case, 33000 with an area of 37.16) and one cell with the highest drainage area (again, in my case, a drainage area of 80).

What I'd like to do is rank the drainage area cells based upon the number of times they occur, with a rank of 100 going to the cells with area=37.16, and 1 going to the cell(s) with area=80. There are 6,000 different drainage areas out of 180,000 cells in this grid, so the ranks would have values like 100, 99.01, 57.34, 20, 1.08, 1 and so forth.

I have been struggling with the split, length and rank commands in R, but can't seem to figure out how to attach a new column (or make a new dataset) that has a column of ranks, or how to calculate the rank.

Thanks for any help.

Thomas Colson
North Carolina State University
Department of Forestry and Environmental Resources
(919)624-6329
(919)515 3434
[EMAIL PROTECTED]
Schedule: www4.ncsu.edu/~tpcolson
[R] Probability of exceedance function question
I'm trying to calculate a cumulative area distribution (graph) of drainage areas. This is defined as P(A > A*). Simple in principle. I can do this in Excel with COUNTIF, which will count the number of cells in the row area that have area A, then determine, for each cell in the row area, how many cells exceed that area, then divide that number by the total number of cells, which gives me the probability that drainage area A exceeds drainage area A*.

E.g., a drainage area of 6 sq meters (one DEM grid cell) has a high probability of exceedance (.99), while a drainage area of 100,000 square meters has a low probability of exceedance (.001).

I wish to plot this relationship, and we all know that Excel is not the tool of choice when working with hundreds of thousands of records. I'd like to port the CAD into a few R functions that I've already developed for other tests as well.

So my challenge, in R, is to:
(1) count the number of rows in column Area that have AREA(*),
(2) determine, by row, how many rows have an area greater than the area given in that one row,
(3) divide step 2 by the number of rows (how can I do a row count and port that to a variable, as I have to do this on 10 datasets?)

Thanks for any advice you can offer to this endeavour.
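The three steps listed above could be sketched roughly as follows. This is one possible approach, not a definitive one: the helper name exceedance and the sample values are hypothetical, and "exceeds" is read as strictly greater than.

```r
# Hypothetical helper: for each value, count and fraction of values strictly greater
exceedance <- function(a) {
  a <- a[!is.na(a)]                               # drop missing cells
  n <- length(a)                                  # row count, held in a variable (step 3)
  n_greater <- n - rank(a, ties.method = "max")   # rows with a larger area (step 2)
  data.frame(area = a, n_greater = n_greater, p_exceed = n_greater / n)
}

# Made-up areas standing in for one dataset
area <- c(6, 6, 6, 12, 12, 50, 100000)
print(table(area))            # step 1: how many rows have each area
res <- exceedance(area)
print(res)

# The same helper applied to several datasets held in a list
datasets <- list(first = area, second = c(6, 6, 24, 24, 96))
results <- lapply(datasets, exceedance)
```

With ties.method = "max", rank() returns for each row the number of rows with an area less than or equal to it, so subtracting from n leaves the number that are larger. The p_exceed column is the same as 1 - ecdf(area)(area).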
Re: [R] Probability of exceedance function question
I am able to convert the flow accumulation grid into an area (for each pixel) grid, then import this into R as an ASCII file. The plot(ecdf) function in R seems to plot the opposite: the curve starts at probability 0 for drainage area 0. Shouldn't it be the other way around? There are about 150,000 data points in these sets, and the ecdf curve plots in about 15 seconds.

Could the problem be how I'm importing the data from the ASCII grid? Cellsize is 20 ft and z is the drainage area for each cell (flow weighted).

area <- read.table(file = "c:/temp/area.asc", sep = " ", na.strings = "-", skip = 6)
area <- area[, -ncol(area)]
xLLcorner <- 1985649.0700408898
yLLcorner <- 841301.04004059616
cellsize <- 20
xURcorner <- xLLcorner + (cellsize * (ncol(area) - 1))
xLRcorner <- xURcorner
xULcorner <- xLLcorner
yULcorner <- yLLcorner + (cellsize * (nrow(area) - 1))
yURcorner <- yULcorner
yLRcorner <- yLLcorner
coordsa <- expand.grid(y = seq(yULcorner, yLRcorner, by = -20),
                       x = seq(xULcorner, xLRcorner, by = 20))
area <- data.frame(coordsa, tmin = as.vector(c(area, recursive = TRUE)))
names(area) <- c("x", "y", "z")
plot(ecdf(area$z))

-----Original Message-----
From: Roger Bivand [mailto:[EMAIL PROTECTED]]
Sent: Sunday, October 08, 2006 4:37 PM
To: Thomas P. Colson
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Probability of exceedance function question

On Sun, 8 Oct 2006, Thomas P. Colson wrote:

I'm trying to calculate a cumulative area distribution (graph) of drainage areas. This is defined as P(A > A*). Simple in principle. I can do this in Excel with COUNTIF, which will count the number of cells in the row area that have area A, then determine, for each cell in the row area, how many cells exceed that area, then divide that number by the total number of cells, which gives me the probability that drainage area A exceeds drainage area A*.

Is this ecdf() of the vector or its suitable subset? If so, it runs very fast even for large data sets.
For plotting, bear in mind that you are generating a lot of output, though:

t0 <- runif(10)
system.time(t1 <- ecdf(t0))
[1] 0.222 0.022 0.248 0.000 0.000
system.time(plot(t1, pch="."))
[1] 1.089 0.079 1.186 0.000 0.000

isn't at all bad!

E.g., a drainage area of 6 sq meters (one DEM grid cell) has a high probability of exceedance (.99), while a drainage area of 100,000 square meters has a low probability of exceedance (.001).

I wish to plot this relationship, and we all know that Excel is not the tool of choice when working with hundreds of thousands of records. I'd like to port the CAD into a few R functions that I've already developed for other tests as well.

So my challenge, in R, is to:
(1) count the number of rows in column Area that have AREA(*),
(2) determine, by row, how many rows have an area greater than the area given in that one row,
(3) divide step 2 by the number of rows (how can I do a row count and port that to a variable, as I have to do this on 10 datasets?)

Thanks for any advice you can offer to this endeavour.

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [EMAIL PROTECTED]
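On the point that plot(ecdf(...)) rises from 0 rather than falling: ecdf() estimates P(A <= A*), so an exceedance curve would be its complement, 1 - ecdf. A rough sketch of that idea follows, using simulated values in place of the real area$z column; the rexp() data, the variable names, and the log-scaled x axis are all assumptions made for illustration.

```r
set.seed(1)
z <- rexp(1000, rate = 1 / 500)   # made-up drainage areas, not real data

Fn <- ecdf(z)                     # Fn(v) = proportion of values <= v, rises from 0 to 1
v <- sort(unique(z))
p_exceed <- 1 - Fn(v)             # proportion of values > v, falls towards 0

# pch = "." keeps the point symbols small, which helps with large inputs
plot(v, p_exceed, log = "x", pch = ".",
     xlab = "Drainage area", ylab = "P(A > A*)")
```

Evaluating the complement only at the sorted unique values keeps the number of plotted points down when many cells share the same area.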