I asked Elias Krainski (the author of skater()), who replied as copied inline below:

On Tue, 13 Sep 2016, Michael O'Donnell wrote:


Hi,

I am interested in calculating multiple statistics based on skater{spdep} results for a SpatialPointsDataFrame, and I was wondering if someone could help me verify that what I have done is correct (Q1). My objective is to evaluate the performance of the clustering while using different parameters for different skater() runs. Specifically, I am not sure how to measure the within-group similarity, and I believe the other statistics are defined correctly. Also, can someone provide more details on the objects "not.prune" and "candidates" (Q2)?

not.prune is the set of edges that, if pruned, would generate groups that do not satisfy the restriction. For example, when you want groups with at least 10 areas, at some point a group stops being considered for pruning because of this.
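As a rough illustration of the size restriction (a sketch, not spdep's code), the base-R helper below takes a vector of group labels shaped like res1$groups and a minimum group size; the helper name `splittable_groups` and its `min_size` argument are hypothetical. It flags the groups that are still large enough to be cut into two groups that both meet the minimum. Size is only a necessary condition: an actual cut must also follow an edge of the group's spanning tree.

```r
# Hypothetical helper, not part of spdep: flags groups that are large enough
# to be split into two groups of at least `min_size` areas each.
# Size is a necessary condition only; a real SKATER cut must also remove
# an edge of the group's minimum spanning tree.
splittable_groups <- function(groups, min_size) {
  sizes <- tabulate(groups)   # areas per group; labels assumed to be 1..k
  sizes >= 2 * min_size       # TRUE where a valid two-way cut may still exist
}

# Hypothetical labels for 25 areas in 3 groups, minimum group size 5
groups <- rep(1:3, times = c(12, 8, 5))
splittable_groups(groups, min_size = 5)
# [1]  TRUE FALSE FALSE
```

Under this sketch, groups 2 and 3 could no longer be split, which mirrors the situation Elias describes; how skater() records such cases internally is not reproduced here.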

Q1 ------------------------------

These are the statistics that I would like to calculate:

    res1 <- skater() # Example of skater object
    # The sum of the between-group dissimilarity
    sst <- res1$ssto
    # The within-group similarity
    sse <- sum(res1$ssw)/max(res1$groups)

SSW is the sum of homogeneity at each step of the SKATER algorithm. So the first number coincides with SSTO, the second is for the case of two groups, the third for the case of three groups, and so on. That is, it has length equal to the number of clusters. However, res1$groups identifies the group to which each area belongs, and it has length equal to the number of areas. So it doesn't make sense to divide sum(res1$ssw) by the number of groups. You may want `res1$ssw/1:length(res1$ssw)`.
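A small sketch of Elias's point, using a made-up ssw vector in place of res1$ssw (the numbers are illustrative, not from any real run). Besides the per-group average he suggests, it computes an R2 for every partition by using each ssw value, rather than sum(res1$ssw), as the within-group term in Mike's R2 formula; that substitution is an assumption of this sketch.

```r
# Hypothetical ssw vector shaped like res1$ssw from a run with ncuts = 3:
# element k is the within-group dissimilarity of the k-group partition,
# so element 1 equals res1$ssto (everything in one group).
ssw <- c(100, 70, 55, 48)

n_groups <- seq_along(ssw)         # 1, 2, 3, 4 groups
mean_ssw <- ssw / n_groups         # Elias' suggestion: res1$ssw/1:length(res1$ssw)
r2 <- (ssw[1] - ssw) / ssw[1]      # share of ssto removed by each partition

data.frame(n_groups, ssw, mean_ssw, r2)
```

Reporting one value per partition keeps the comparison across skater() runs on the same footing, instead of collapsing the whole ssw sequence into a single sum.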

    # R2
    R2 <- (sst-sse)/sst

Would it make sense to compute some kind of gain from having more groups? The gain can be the difference between consecutive partitions, for example diff(res1$ssw).
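Elias's diff(res1$ssw) idea can be sketched as follows, again on a made-up ssw vector. One informal way to read it is to look for the point where adding a group stops reducing ssw by much; the relative gain column is an extra assumption of this sketch, not something Elias proposed.

```r
# Hypothetical ssw vector shaped like res1$ssw (one value per partition).
ssw <- c(100, 70, 55, 48)

# diff(ssw) is negative when ssw drops, so negate it to get the reduction
# gained by moving from k to k + 1 groups.
gain <- -diff(ssw)
# The same reduction relative to the previous partition's ssw.
rel_gain <- gain / head(ssw, -1)

data.frame(from_groups = seq_along(gain), to_groups = seq_along(gain) + 1,
           gain, rel_gain)
```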

    # AIC, AICc
    # AIC = n*log(SSD/n) + 2*cov_count
    # AICc = AIC + 2*cov_count*(cov_count+1)/(n-cov_count-1)
    cov_count <- 1           # Number of covariates considered by skater and provided in data
    n_count <- nrow(shape2)  # Node count
    aic <- n_count * log(sst/n_count) + 2.0 * cov_count
    aicc <- aic + 2.0 * cov_count * (cov_count + 1.0)/(n_count - cov_count - 1.0)

I'm not sure about this anymore...
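For reference, a minimal sketch of the least-squares AIC form quoted in the comments above, with the logarithm applied to SSD/n as that formula states. The inputs are assumptions: made-up ssw values stand in for SSD, n is the number of areas, and k counts one fitted mean per group and per variable, which is only one possible choice of parameter count. Whether res1$ssw is a sum of squared deviations depends on the dissimilarity used, so treat this as a heuristic rather than a criterion that skater() reports.

```r
# Least-squares information criteria:
#   AIC  = n * log(SSD / n) + 2 * k
#   AICc = AIC + 2 * k * (k + 1) / (n - k - 1)
aic_ls <- function(ssd, n, k) n * log(ssd / n) + 2 * k
aicc_ls <- function(ssd, n, k) aic_ls(ssd, n, k) + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical inputs: ssw per partition, n areas, p variables.
ssw <- c(100, 70, 55, 48)
n <- 50
p <- 1
k <- seq_along(ssw) * p   # assumption: one mean per group and per variable

data.frame(n_groups = seq_along(ssw),
           aic = aic_ls(ssw, n, k),
           aicc = aicc_ls(ssw, n, k))
```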

    # Calinski-Harabasz pseudo F-statistic
    nc <- max(res1$groups)
    n <- nrow(shape2)
    fstat <- (R2 / (nc - 1)) / ((1 - R2) / (n - nc))

It will be useful to consider the function index.G1 from the clusterSim package.
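A self-contained base-R version of the Calinski-Harabasz pseudo F may help cross-check both Mike's R2-based formula and index.G1 from clusterSim. The function name `calinski_harabasz` and its arguments (a numeric data matrix `x` and a label vector `cl`, e.g. res1$groups) are hypothetical. It uses squared Euclidean deviations from group means, which is the classical definition and not necessarily the dissimilarity skater() used for the run.

```r
# Calinski-Harabasz pseudo F: CH = (B / (k - 1)) / (W / (n - k)),
# with B and W the between- and within-group sums of squared deviations.
calinski_harabasz <- function(x, cl) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cl))
  # total SS: squared deviations from the overall column means
  total_ss <- sum(sweep(x, 2, colMeans(x))^2)
  # within SS: squared deviations from each group's own column means
  within_ss <- sum(sapply(split(seq_len(n), cl), function(idx) {
    xi <- x[idx, , drop = FALSE]
    sum(sweep(xi, 2, colMeans(xi))^2)
  }))
  between_ss <- total_ss - within_ss
  (between_ss / (k - 1)) / (within_ss / (n - k))
}

# Toy data: one variable, two well separated groups of three areas
x <- matrix(c(1, 2, 3, 10, 11, 12), ncol = 1)
cl <- c(1, 1, 1, 2, 2, 2)
calinski_harabasz(x, cl)
# [1] 121.5
```

Because R2 = B/T and 1 - R2 = W/T, the fstat formula in the question gives the same number whenever R2 is computed from these same sums of squares.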

    # Review
    print(c(aic, aicc, fstat, R2))

Q2 ------------------------------

Define "not.prune" and "candidates". For example, are candidates a list of cluster groups that are statistically significant, while not.prune is a list of nodes that did not get assigned to a group? I have not been able to locate enough documentation on these objects and I am not sure how to interpret them.

No. We haven't considered any kind of statistical test. As I mentioned above, not.prune holds those that don't match the criterion (about the size of the cluster).

Elias

Thank you for your assistance, Mike

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: roger.biv...@nhh.no
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
http://depsy.org/person/434412

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo