I am looking for a simple introduction to cluster analysis using R that would be understandable to a novice in statistics. Or could someone perhaps help me understand how to proceed with my analysis? I am very new to both statistics and R, but am trying hard to avoid having to use SPSS as everyone around me...
I have a dataset of people's opinions on different religious communities, coded on a 5-point scale, and I want to see if those communities can be grouped (clustered) in some way that would be illuminating for my research purposes. So, I have data that looks like this:

> describe(R12)
R12

 18  Variables      1035  Observations
---------------------------------------------------------------------------
R12.1
      n missing  unique
    416     619       5

More negative (51, 12%), More positive (112, 27%)
Completely negative (41, 10%), Completely positive (23, 6%)
Neutral (189, 45%)
<skip>
R12.12
      n missing  unique
    451     584       5

More negative (111, 25%), More positive (43, 10%)
Completely negative (79, 18%), Completely positive (5, 1%)
Neutral (213, 47%)
<and so on>

So you can see that the questionnaire has a lot of NAs, at times more than half of the observations for a variable. Here is also a correlation matrix (only part is displayed):

> x=cor(R12, use="pairwise.complete.obs")
> round(x,2)
       R12.1 R12.2 R12.3 R12.4 R12.5 R12.6 R12.7 R12.8 R12.9 R12.10 R12.11
R12.1   1.00  0.57  0.57  0.61  0.57  0.48  0.43  0.38  0.52   0.58   0.58
R12.2   0.57  1.00  0.82  0.78  0.73  0.62  0.43  0.49  0.64   0.69   0.75
R12.3   0.57  0.82  1.00  0.89  0.90  0.73  0.54  0.57  0.70   0.77   0.78
R12.4   0.61  0.78  0.89  1.00  0.91  0.68  0.51  0.56  0.65   0.80   0.76
R12.5   0.57  0.73  0.90  0.91  1.00  0.73  0.53  0.55  0.68   0.78   0.74
R12.6   0.48  0.62  0.73  0.68  0.73  1.00  0.59  0.62  0.68   0.79   0.78
R12.7   0.43  0.43  0.54  0.51  0.53  0.59  1.00  0.62  0.55   0.65   0.65
R12.8   0.38  0.49  0.57  0.56  0.55  0.62  0.62  1.00  0.55   0.65   0.62
R12.9   0.52  0.64  0.70  0.65  0.68  0.68  0.55  0.55  1.00   0.79   0.82
R12.10  0.58  0.69  0.77  0.80  0.78  0.79  0.65  0.65  0.79   1.00   0.88
R12.11  0.58  0.75  0.78  0.76  0.74  0.78  0.65  0.62  0.82   0.88   1.00
R12.12  0.47  0.59  0.64  0.65  0.60  0.61  0.56  0.50  0.68   0.77   0.83
R12.13  0.62  0.69  0.77  0.70  0.74  0.76  0.65  0.61  0.78   0.81   0.82
R12.14  0.58  0.70  0.71  0.75  0.70  0.74  0.64  0.62  0.78   0.86   0.86
R12.15  0.58  0.61  0.72  0.72  0.71  0.72  0.64  0.59  0.73   0.83   0.79
R12.16  0.56  0.67  0.77  0.72  0.78  0.75  0.57  0.54  0.75   0.85   0.80
R12.17  0.61  0.69  0.79  0.77  0.75  0.73  0.56  0.57  0.74   0.82   0.80
R12.18  0.63  0.73  0.84  0.82  0.83  0.71  0.54  0.64  0.68   0.71   0.74

So you can see that the opinions are strongly correlated. I doubt that clustering would be meaningful, but I still want to try. How do I proceed with this?

-- 
Donatas Glodenis

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
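
One possible starting point for the question above, offered only as a rough sketch: since the goal is to group the communities (the variables) rather than the respondents, the pairwise correlation matrix can be turned into a dissimilarity and passed to hierarchical clustering. Everything below is an assumption made for illustration: `R12_sim` is simulated stand-in data with six items (the real R12 is not available here), the answers are assumed to be stored as numeric codes 1 to 5, `1 - correlation` is just one of several possible dissimilarities, and average linkage and `k = 2` are arbitrary choices.

```r
# Simulated stand-in for R12 (hypothetical data, not the real survey).
set.seed(1)
n  <- 300
f1 <- rnorm(n)                               # latent attitude behind items 1-3
f2 <- rnorm(n)                               # latent attitude behind items 4-6
latent <- matrix(c(rep(f1, 3), rep(f2, 3)), nrow = n)
noise  <- matrix(rnorm(n * 6, sd = 0.7), nrow = n)

# Squeeze the scores onto a 1..5 scale and knock out about 30% of the
# answers at random to mimic the many NAs in the questionnaire.
x <- round(pmin(pmax(3 + latent + noise, 1), 5))
x[runif(length(x)) < 0.3] <- NA
R12_sim <- as.data.frame(x)
names(R12_sim) <- paste0("R12.", 1:6)

# Same call as in the post: correlations from pairwise complete cases.
r <- cor(R12_sim, use = "pairwise.complete.obs")

# Strongly correlated items get a small distance (assumed dissimilarity).
d <- as.dist(1 - r)

# Agglomerative clustering of the items; average linkage is an arbitrary choice.
hc <- hclust(d, method = "average")
if (interactive()) plot(hc)                  # dendrogram of the items

# Cut the tree into k groups; k = 2 matches how the fake data were built.
groups <- cutree(hc, k = 2)
print(groups)
```

With real data the dendrogram is usually more informative than any single cut. When all correlations are high, as in the matrix above, the branches may join at similar heights, which would itself suggest there are no clearly separated groups.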