[R] Simplify iterative programming
Dear, I am looking for the simplification of a formula to improve the calculation speed of my program. Therefore I want to simplify the following formula: H = sum{i=0..n-1 , [ sum {j=0..m-1 , sqrt ( (Ai - Bj)^2 + (Ci - Dj)^2) } ] } where: A, C = two vectors (with numerical data) of length n B, D = two vectors (with numerical data) of length m sqrt = square root Ai = element of A with index i Bj = element of B with index j Ci = element of C with index i Dj = element of C with index j I am calculating H in a merging process so A, B will merge in step 2 into A' and C, D into C': A' = {A,B} : vector of length (n+m) C' = {C,D} : vector of length (n+m) Then again I will calculate H with two new vectors X and Y (of length p): H = sum{i=0..n+m-1 , [ sum {j=0..p-1 , sqrt ( (A'i - Xj)^2 + (C'i - Yj)^2) } ] } These steps are iterated in a loop with always new vectors (e.g. X and Y) Now I'am looking for a simplication of H in order to avoid long calculation time. I know a computional simplified formula exists for the standard deviation (sd) that is much easier in iterative programming. Therefore I wondered I anybody knew about analog simplifications to simplify H: sd = sqrt [ sum{i=0..n-1, (Xi - mean(X) )^2 ) /n } ] - simplified computation - sqrt [ (n * sum{i=0..n-1, X^2 } ) - ( sum{i=0..n-1, X } ^2 ) / n^2 ] This simplied formula is much easier in iterative programming, since I don't have to keep every element of X . For example if we want to calculate sd of A' with the vectors A and C: sd(A') = sqrt [ ((n+m) * sum{i=0..n+m-1, A'^2 } ) - ( sum{i=0..n+m-1, A' } ^2 ) / (n+m)^2 ] = sqrt [ ((n+m)* (sum{i=0..n, A^2 } + sum{i=0..m, C^2 } ) ) - ( ( sum{i=0..n-1, A } + sum{i=0..m-1, C } )^2 ) / (n+m)^2 ] The advantage of this formula is, that I don't have to keep every value of A and C to calculate sd(A'). I can do the following replacements: sum{i=0..n, A^2 } = A2 sum{i=0..m, C^2 } = C2 sum{i=0..n-1, A } = A3 sum{i=0..m-1, C } = C3 So sd(A')= sqrt [ ( (n+m)*(A2+ C2) ) - ( (A3 + C3)^2 ) / (n+m)^2 ] In this way my computation intensive calculation is replaced by a calculation of simple numbers. Can anybody help me to do something comparable for H? Any other help to calculate H easily in an iterative process is also welcome! Kind regards, Stef Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Simplify formula for iterative programming
Dear R-ians, I am looking for the simplification of a formula to improve the calculation speed of my program. Therefore I want to simplify the following formula: H = Si (Sj ( sqrt [ (Ai - Aj)² + (Bi - Bj)² ] ) ) where: A, B = two vectors (with numerical data) of length n sqrt = square root Si = summation over i (= 0 to n) Sj = summation over j (= 0 to n) Ai = element of A with index i Aj = element of A with index j Bi = element of B with index i Bj = element of B with index j n is not fixed, but it changes with every run for my program. Therefore for I am looking for a simplication of h in order to calculate it when my A and B get extendend by 1 element (n = n + 1). I know a computional simplified formula exists for the standard deviation (sd) that is much easier in iterative programming. Therefore I wondered I anybody knew about analog simplifications to simplify H: sd = sqrt [ ( Si (Xi - mean(X) )² ) /n ] - simplified computation - sqrt [ (n * Si( X² ) - ( Si( X ) )² )/ n² ] This simplied formula is much easier in iterative programming, since I don't have to keep every element of X. E.g.: I have a vector X[1:10] and I already have caculated Si( X[1:10]² ) (I will call this A) and Si( X ) (I will call this B). When X gets extendend by 1 element (eg. X[11]) it easy fairly simple to calculate sd(X[1:11]) without having to reuse the elements of X[1:10]. I just have to calculate: sd = sqrt [ (n * (A + X[11]²) - (A + X[11]²)² ) / n² ] This is fairly easy in an iterative process, since before we continue with the next step we set: A = (A + X[11]²) B = (B + X[11]) Can anybody help me to do something comparable for H? Any other help to calculate H easily in an iterative process is also welcome! Kind regards, Stef __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Simplify formula for iterative programming
Dear Christoph, Thanks for your help! I checked it in R and it works if we extend a and b with one element for each run. Unfortunately, I actually want to merge two vectors and then calculate the H for the merge. It is consequently no addition of 1 element but an addition of x elements. I did not mention it in my first post in order not to make it too complicated. The second problem that still remains is that I have to keep the original values of my original vectors. In the example I gave with sd only a calculated value and the new value are needed. The iterative process does not the need previously processed values of A anymore. I hoped something comaprable was possible for H. Thanks anyway for your help! Kind regards, Stef __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Simplify formula for heterogeneity
Thank you very much Ted! I have been looking at your simplification for more then an hour, but I don't see how you did it. Could you perhaps, if it is not to much work, explain me how you reduced H? It would help me to understand what I am realy doing. Looking at the result, it seems indeed that H does add more information than sd already did. Intuitively I thought the square of the sum of all possible differences would not be related to the standard deviation. Looking at your result it seems it is related by a factor sqrt(2*(n-1)) so there is no special point in calculating H and I know I cannot trust my intuition anymore. Thanks again! Kind regards, Stef __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Simplify formula for heterogeneity
Dear R-ians, I'm looking for a computational simplified formula to calculate a measure for heterogeneity (let's say H ): H = sqrt [ (Si (Sj (Xi - Xj)² ) ) /n ] where: sqrt = square root Si = summation over i (= 0 to n) Sj = summation over j (= 0 to n) Xi = element of X with index i Xj = element of X with index j I can simplify the formula to: H = sqrt [ ( 2 * n * Si (Xi) - 2 Si (Sj ( Xi * Xj)) ) / n] Unfortunately this formula stays difficult in iterative programming, because I have to keep every element of X to calculate H. I know a computional simplified formula exists for the standard deviation (sd) that is much easier in iterative programming. Therefore I wondered I anybody knew about analog simplifications to simplify H: sd = sqrt [ ( Si (Xi - mean(X) )² ) /n ] - simplified computation - sqrt [ (n * Si( X² ) - ( Si( X ) )² )/ n² ] This simplied formula is much easier in iterative programming, since I don't have to keep every element of X. E.g.: I have a vector X[1:10] and I already have caculated Si( X[1:10]² ) (I will call this A) and Si( X ) (I will call this B). When X gets extendend by 1 element (eg. X[11]) it easy fairly simple to calculate sd(X[1:11]) without having to reuse the elements of X[1:10]. I just have to calculate: sd = sqrt [ (n * (A + X[11]²) - (A + X[11]²)² ) / n² ] This is failry easy in an iterative process, since before we continue with the next step we set: A = (A + X[11]²) B = (B + X[11]) Can anybody help me to do something comparable for H? Any other help to calculate H easily in an iterative process is also welcome! Thanx in advance! Kind regards, Stef __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Princomp and calculations of original values
Dear R-ians, I am working with princomp and I now want to manually recalculate my original values. I want to do it to completely understand the procedure of principal components. I tried it with a test data set (2 dimensions) and I was able to calculate my original values (of a random point of my dataset) using the output of princomp: test.data$scores[1,]%*%matrix(data=test.data$loadings,ncol=2)+test.data$center it seemed that result is identical with test[1,] When I tried it with my 4 dimensional real dataset, it did not work: pca.data$scores[1,]%*%matrix(data=pca.data$loadings,ncol=4)+pca.data$center is not equal to data[1,] Do I misunderstand the procedure somewhere, or what else do I do wrong? Thanx in advance and kind regards, Stef __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Maximize volume of cloud
Dear R-ians, I have a datamatrix X of nrows (large) and 3 colums. The 3 columns represent the coordinates of a cloud in a 3 dimensional space. Now I want to find the extremes of my cloud by using the N-FINDR algoritm. This algorithm consists of maximizing the volume between tree points. Therefore I need an optimaztion script, but I don't have any idea how to program it. According to literature the maximization of the volume can be replaced by maximizinng the determinant of E, where: E-matrix(c(1,e1,1,e2,1,e3), ncol=3) and e1 to e3 are initially column vectors of the coordinates of random points. The procedure works by replacing an existing e1, e2 or e2 with another trial point. If the replacement results in an increase of determinant(E), the newly chosen pixel will replace the previous pixel. The procedure has to be repeated till there are no possible replacements left. In this way I get the coordinates of my extremes by e1 to e3. Anyone any idea how to program this? Kind regards and thanx in advance! Stef __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Optimization of constrained linear least-squares problem
Thanx Dimitris, Patrick and Berwin! For other people interested in this problem, here are two valid solutions that work. 1) Re-parameterize e.g., EM - c(100,0,0,0,100,0,0,0,100) W - array(EM, c(3,3)) d - c(10, 20, 70) fn - function(x){ x - exp(x) / sum(exp(x)) r - W%*%x - d crossprod(r, r)[1,1] } opt - optim(rnorm(3), fn) res - exp(opt$par) / sum(exp(opt$par)) res The first line of the `fn()' function is just a re-pameterization of your problem, i.e., if `y' is a vector of real numbers, then it is straightforward to see that `x = exp(y) / sum(exp(y))' will be real numbers in (0, 1) for which `sum(y)=1'. So instead of finding xs that minimize your function under the constraint (which is more difficult) you just find the ys using the above transformation. (I owe you a drink Dimitris !!!) 2) Or minimize it as a quadratic function under a linear constraint: EM - c(100,0,0,0,100,0,0,0,100) W - array(EM, c(3,3)) d - c(10, 20, 70) library(quadprog) Dmat - crossprod(W,W) dvec - crossprod(d,W) A -matrix(c(1,1,1),ncol=1) bvec - 1 solve.QP(Dmat, dvec, A, bvec, meq=1) This is based on the objective function (i.e. the thing you want to minimise) : min x'C'Cx - 2 d'Cx + d'd where sum(x) = 1 (Thanx Berwin!!) Kind regards, Stef __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Optimization of constrained linear least-squares problem
Dear R-ians, I want to perform an linear unmixing of image pixels in fractions of pure endmembers. Therefore I need to perform a constrained linear least-squares problem that looks like : min || Cx - d || ² where sum(x) = 1. I have a 3x3 matrix C, containing the values for endmembers and I have a 3x1 column vector d (for every pixel in the image). In theory my x values should all be in the (0,1) interval but I don't want to force them so I can check the validity of my solution. I just want to calculate the x values. Can anyone help me with this problem? I've been checking the optim, optimize, constrOptim and nlm help files, burt I don't understand it very well. Wich function should I use for my problem? I did a first test using optim: # Make my C matrix EM- c(4.5000,6.,10.5000,5.,27.,20.7500,16.7500,23.,38.7500) C - array(EM, c(3,3)) # Take an arbitrary d d-c(10, 20, 20) # Define the function fr - function(x) { C[1,]*x=d C[2,]*x=d C[3,]*x=d sum(x)=1} # Perform the optimization optim(c(0.25,0.25,0.25),fr) But it did not work. I got the eror couldn't find function. Can anyone tell me what functyion I should use for my problem and how should I program it? Thanx in advance, Stef __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Multi-figure plotting
Dear R-ians, I have a question concerning plotting different plots on one figure. I have written a script to plot an image, a legend (based on different rectangles) and a timeseries plot on one figure. In my R-lagnuage it looks like this (without arguments that are not usefull for my question): #--- I first define the functions image.data-function(...){ #--Plot the data par(fig=c(0,1,0.25,1),new=TRUE) par(pin=c(1.5,(num.y/num.x*1.5))) image(c(1:(num.x+1)),c(1:(num.y+1)),output.matrix,xlab=X coord,ylab=Y coord,main='NDVI',las =1,cex.axis=0.7, cex.lab=0.7, col=topo.colors(255), asp=1) #--- Add a legend (colored rectangles) + legend axis (=values) par(fig=c(0,1,0.3,0.4),new=TRUE) plot.window(xlim = c(0,1), ylim = c(0,1), yaxs=i) rect((1:254)/255,0,(2:255)/255,0.1,col=topo.colors(255),border=FALSE) axis(1,padj=-1,cex.axis=0.7) ... } profile.data-function(...){ ... #--Plot the data par(fig=c(0,1,0,0.3),new=TRUE) par(pin=c(1.5,num.y/num.x*0.5)) plot(NDVI, type=l,cex.axis=0.7, cex.lab=0.7) ... } When I subsequently use the functions I get problems with the legend. It plots my image nice in the upper part of my pixture, adds the timeseries below, but it does not plot the rectangles of the legend. I don't know what I do wrong, because it plots perfectly the axis of the legend, but the colorbar is invisible. All suggestions would be wonderfull! Thanx in advance, Stef __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html