No problem John, thanks for your help, and also thanks to Dan and Patrick. Wasn't able to read or try anybody's suggestions yesterday. Here's what I've discovered in the meantime:
What I did not include yesterday is that my original data frame, called "data", was this:

       X Y       V3
    1  1 1 0.000000
    2  2 1 8.062258
    3  3 1 2.236068
    4  4 1 6.324555
    5  5 1 5.000000
    6  1 2 8.062258
    7  2 2 0.000000
    8  3 2 9.486833
    9  4 2 2.236068
    10 5 2 5.656854
    11 1 3 2.236068
    12 2 3 9.486833
    13 3 3 0.000000
    14 4 3 8.062258
    15 5 3 5.099020
    16 1 4 6.324555
    17 2 4 2.236068
    18 3 4 8.062258
    19 4 4 0.000000
    20 5 4 5.385165
    21 1 5 5.000000
    22 2 5 5.656854
    23 3 5 5.099020
    24 4 5 5.385165
    25 5 5 0.000000

To this data frame I applied the following command:

    data <- data[data$V3 >0,];data   #to remove all rows where V3 = 0

giving me this (the point from which I started yesterday):

       X Y       V3
    2  2 1 8.062258
    3  3 1 2.236068
    4  4 1 6.324555
    5  5 1 5.000000
    6  1 2 8.062258
    8  3 2 9.486833
    9  4 2 2.236068
    10 5 2 5.656854
    11 1 3 2.236068
    12 2 3 9.486833
    14 4 3 8.062258
    15 5 3 5.099020
    16 1 4 6.324555
    17 2 4 2.236068
    18 3 4 8.062258
    20 5 4 5.385165
    21 1 5 5.000000
    22 2 5 5.656854
    23 3 5 5.099020
    24 4 5 5.385165

So far so good. But when I then submit the command

    data = data[X>Y,]   #to select all rows where X > Y

I get the problem result already mentioned, namely:

       X Y       V3
    3  3 1 2.236068
    4  4 1 6.324555
    5  5 1 5.000000
    6  1 2 8.062258
    10 5 2 5.656854
    11 1 3 2.236068
    12 2 3 9.486833
    17 2 4 2.236068
    18 3 4 8.062258
    24 4 5 5.385165

which is clearly wrong! It doesn't matter if I give a new name to the data frame at each step or not, or whether I use the name "data" or not. It always gives the same wrong answer. However, if I instead use the command subset(data, X>Y), I get the right answer, namely:

       X Y       V3
    2  2 1 8.062258
    3  3 1 2.236068
    4  4 1 6.324555
    5  5 1 5.000000
    8  3 2 9.486833
    9  4 2 2.236068
    10 5 2 5.656854
    14 4 3 8.062258
    15 5 3 5.099020
    20 5 4 5.385165

OK, so the lesson so far is "use the subset function". But here it gets weirder.
If I instead go straight from the initial data frame ("data", given at the top of this post) and select only the rows where X>Y, using the command that caused me the original trouble (data = data[X>Y,]), I get the RIGHT answer (the data frame just above). That is, I skip the intermediate step of removing rows with V3 = 0, which, although unnecessary for getting the result I want, is very relevant to the larger issue here. The subset function also gives the right answer.

Now what in the world is going on? This kind of thing scares me.

Below is the full set of commands starting from scratch:

    #Point of the following is to measure the pairwise euclidean distances between 5 objects, each having X and Y coordinates
    #and put them into data frame format that labels each pair and gives the distance between them
    d = data.frame(x=sample(1:10, 5), y=sample(1:10, 5))   #create a sample data set
    ss2 = as.data.frame(as.matrix(dist(d)))   #create a data.frame to extract row and column names
    X = rep(seq(1:length(row.names(ss2))), length(names(ss2)))   #make a vector containing the X coordinate names
    Y = rep(seq(1:length(names(ss2))), length(row.names(ss2)))   #the same for Y
    Y = sort(Y)   #first sort
    coords = cbind(X, Y);rm(X,Y)   #then cbind and remove X and Y
    data1 = as.data.frame(cbind(coords, as.vector(as.matrix(dist(d)))));rm(coords)   #column bind the 3 vectors
    data2 = data1[data1$V3 >0,]   #remove those with V3 = 0 (= the original matrix diagonal)
    data3 = data2[X>Y,]   #remove duplicates from original distance matrix
    data1;data2;data3

Thoughts much appreciated. Thanks.

Jim Bouldin

> Clearly I was more tired than I realised last night. :( My apologies.
>
> In any case with the data.frame name changed to xx this seems to give you
> what you want
>
>     subset(xx, xx[,1] > xx[,2])
>
> or using the data name
>     subset(data, data[,1] > data[,2])
> should work as well
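
One possible reading of the discrepancy between data[X>Y,] and subset(data, X>Y), offered as a sketch rather than a confirmed diagnosis: inside the square brackets, bare X and Y are not looked up in the data frame but in the workspace. The sketch below assumes that free-standing vectors X and Y of length 25 were still present in the session (the posted script removes them with rm(X,Y), so this is an assumption about the session, not something the post states). It also uses abs(X - Y) as a stand-in for the real distances, purely so that V3 is zero on the diagonal.

```r
# Assumption: X and Y still exist in the workspace as length-25 vectors.
X <- rep(1:5, times = 5)
Y <- rep(1:5, each = 5)

# Stand-in data: V3 is illustrative only (zero exactly where X == Y).
data1 <- data.frame(X = X, Y = Y, V3 = abs(X - Y))  # 25 rows
data2 <- data1[data1$V3 > 0, ]                      # 20 rows, diagonal removed

# Bare X and Y here are the workspace vectors, so the logical index has
# 25 entries and is applied by position to a 20-row data frame.
wrong <- data2[X > Y, ]

# Naming the columns explicitly keeps the index aligned with the rows.
right <- data2[data2$X > data2$Y, ]

# subset() evaluates X > Y inside the data frame itself.
same <- subset(data2, X > Y)

rownames(wrong)  # "3" "4" "5" "6" "10" "11" "12" "17" "18" "24"
rownames(right)  # "2" "3" "4" "5" "8" "9" "10" "14" "15" "20"
rownames(same)   # same rows as `right`
```

Under that assumption the row names of `wrong` match the problem result in the post, and the same index applied to the full 25-row data frame lines up by position, which would explain why skipping the V3 > 0 step gives the right answer.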