On Thu, 12 Nov 2009, Ravi Varadhan wrote:


Hi,

I have a complex-valued vector X in C^n. Given another complex-valued vector Y in C^n, I want to find a permutation of Y, say, Y*, that minimizes ||X - Y*||, the distance between X and Y*.

Note that this problem can be trivially solved for "Real" vectors, since real numbers possess the ordering property. Complex numbers, however, do not possess this property. Hence the challenge.

The obvious solution is to enumerate all the permutations of Y and pick out the one that yields the smallest distance. This, however, is only feasible for small n. I would like to be able to solve this for n as large as 100 - 1000, in which cases the permutation approach is infeasible.

I am looking for algorithms, possibly iterative, that can provide a "good" approximate solutions that are not necessarily optimal for high-dimensional vectors. I can do random sampling, but this can be very inefficient in high-dimensional problems. I am looking for efficient algorithms because this step has to be performed in each iteration of an "outer" algorithm.

Are there any clever adaptive algorithms out there?


I do not know.

But would you settle for a not-so-clever adaptive heuristic?

If so, see below.



Here is an example illustrating the problem:

require(e1071)

n <- 8
x <- runif(n) + 1i * runif(n)
y <- runif(n) + 1i * runif(n)

dist <- function(x, y) sqrt(sum(Mod(x - y)^2))

perms <- permutations(n)
dim(perms)  # [1] 40320     8
tmp <- apply(perms, 1, function(ord) dist(x, y[ord]))
z <- y[perms[which.min(tmp), ]]  # exact solution
dist(x, z)

# an aproximate random-sampling approach
nsamp <- 10000
perms <- t(replicate(nsamp, sample(1:n, size=n, replace=FALSE)))
tmp <- apply(perms, 1, function(ord) dist(x, y[ord]))
z.app <- y[perms[which.min(tmp), ]]  # approximate solution
dist(x, z.app)


The heuristic is to use a stochastic greedy updates. Here is a very simple one:

swap.samp <-
 function(index) {
        sub.ind <- sample(seq(along=index),2)
        index[sub.ind]<- rev(sub.ind)
        index
 }


z.app <- y
z.cand <- y

for (i in 1:100) z.cand <-
        if( dist(x,z.app) < dist(x,z.cand) ) {

                z.app[swap.samp(1:8)]

        } else {
        z.app <- z.cand
        z.cand[swap.samp(1:8)]
        }

On your toy example, this usually finds the min(dist(x,z.app)) in < 100 trials.

Note that when

        z.diff <- z.app != z.cand

        dist(x[ z.diff ],z.app[ z.diff ])^2 - dist(x[ z.diff ],z.cand[ z.diff 
])^2

equals

        dist(x,z.app)^2 - dist(x,z.cand)^2

so you could vectorize the above to randomly pair up all the points (for n even) then update the n%/%2 pairs all at once.


For large problems, you might try to preferentially sample pairs of points with similar values of Mod ( x - z.app ) to begin with. The heuristic being that in two pairs of points that are both distant, swapping them might realize a bigger benefit.


--

Another version is to sample k points, randomly permute, then do a greedy update:

sub.samp <-
 function(index,n) {
        sub.ind <- sample(seq(along=index),n)
        index[sub.ind]<- sample(sub.ind)
        index
}


# k is 4 here

for (i in 1:100) z.cand <-
        if( dist(x,z.app) < dist(x,z.cand) ){
                z.app[sub.samp(1:8,4)]
        } else {
                z.app <- z.cand
                z.cand[sub.samp(1:8,4)]
        }


On your toy problem, this takes in the low 100's to find the min.

I think you can do the similar vectorization trick here.
--

I suppose another variant would repeatedly sample k points, then enumerate distances for all permutations of them, then choose the best one to update z.app.

Again, in larger problems, one might weight those k points towards higher values of Mod( x - z.app ) at least to begin with.

HTH,

Chuck


Thanks for any help.

Best,
Ravi.


____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to