Re: [R] Finding pairs with least magnitude difference from mean

2011-03-01 Thread rex.dwyer
No, that's not what I meant, but maybe I didn't understand the question.
What I suggested would involve sorting y, not x: sort the *distances*.
If you want to minimize the sd of a subset of numbers, you sort the numbers and 
find a subset that is clumped together.
If the numbers are a function of pairs, you compute the function for all pairs 
of numbers, and find a subset that's clumped together.
Anyway, it's an idea, not a theorem, so proof is left as an exercise for the 
esteemed reader.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Hans W Borchers
Sent: Monday, February 28, 2011 2:17 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] Finding pairs with least magnitude difference from mean

 rex.dwyer at syngenta.com writes:

 James,
 It seems the 2*mean(x) term is irrelevant if you are seeking to
 minimize sd. Then you want to sort the distances from smallest to
 largest. Then it seems clear that your five values will be adjacent in
 the list, since if you have a set of five adjacent values, exchanging
 any of them for one further away in the list will increase the sd. The
 only problem I see with this is that you can't use a number more than
 once. In any case, you need to compute the best five pairs beginning
 at position i in the sorted list, for 1=i=choose(n,2), then take the
 max over all i.
 There no R in my answer such as you'd notice, but I hope it helps just
 the same.
 Rex

You probably mean something like the following:

x - rnorm(10)
y - outer(x, x, +) - (2 * mean(x))

o - order(x)
sd(c(y[o[1],o[10]], y[o[2],o[9]], y[o[3],o[8]], y[o[4],o[7]], y[o[5],o[6]]))

This seems reasonable, though you would have to supply a more stringent
argument. I did two tests and it works alright.

--Hans Werner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




message may contain confidential information. If you are not the designated 
recipient, please notify the sender immediately, and delete the original and 
any copies. Any use of the message by you is prohibited. 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding pairs with least magnitude difference from mean

2011-02-28 Thread rex.dwyer
James,
It seems the 2*mean(x) term is irrelevant if you are seeking to minimize sd.  
Then you want to sort the distances from smallest to largest.  Then it seems 
clear that your five values will be adjacent in the list, since if you have a 
set of five adjacent values, exchanging any of them for one further away in the 
list will increase the sd.  The only problem I see with this is that you can't 
use a number more than once.  In any case, you need to compute the best five 
pairs beginning at position i in the sorted list, for 1=i=choose(n,2), then 
take the max over all i.
There no R in my answer such as you'd notice, but I hope it helps just the same.
Rex

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Hans W Borchers
Sent: Saturday, February 26, 2011 6:43 AM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] Finding pairs with least magnitude difference from mean

 I have what I think is some kind of linear programming question.
 Basically, what I want to figure out is if I have a vector of numbers,

  x - rnorm(10)
  x
  [1] -0.44305959 -0.26707077  0.07121266  0.44123714 -1.10323616
 -0.19712807  0.20679494 -0.98629992  0.97191659 -0.77561593

  mean(x)
 [1] -0.2081249

 Using each number only once, I want to find the set of five pairs
 where the magnitude of the differences between the mean(x) and each
 pairs sum is least.

  y - outer(x, x, +) - (2 * mean(x))

 With this matrix, if I put together a combination of pairs which uses
 each number only once, the sum of the corresponding numbers is 0.

 For example, compare the SD between this set of 5 pairs
  sd(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))
 [1] 1.007960

 versus this hand-selected, possibly lowest SD combination of pairs
  sd(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))
 [1] 0.2367030

Your selection is not bad, as only about 0.4% of all possible distinct
combinations have a smaller value -- the minimum is 0.1770076, for example
[10 7 9 5 8 4 6 2 3 1].

(1) combinat() from the 'combinations' package seems slow, try instead the
permutations() function from 'e1071'.

(2) Yes, except your vector is getting much larger in which case brute force
is no longer feasible.

(3) This is not a linear programming, but a combinatorial optimization task.
You could try optim() with the SANN method, or some mixed-integer linear
program (e.g., lpSolve, Rglpk, Rsymphony) by intelligently using binary
variables to define the sets.

This does not mean that some specialized approach might not be more
appropriate.

--Hans Werner

 I believe that if I could test all the various five pair combinations,
 the combination with the lowest SD of values from the table would give
 me my answer.  I believe I have 3 questions regarding my problem.

 1) How can I find all the 5 pair combinations of my 10 numbers so that
 I can perform a brute force test of each set of combinations?  I
 believe there are 45 different pairs (i.e. choose(10,2)). I found
 combinations from the {Combinations} package but I can't figure out
 how to get it to provide pairs.

 2) Will my brute force strategy of testing the SD of each of these 5
 pair combinations actually give me the answer I'm searching for?

 3) Is there a better way of doing this?  Probably something to do with
 real linear programming, rather than this method I've concocted.

 Thanks for any help you can provide regarding my question.

 Best regards,

 James


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




message may contain confidential information. If you are not the designated 
recipient, please notify the sender immediately, and delete the original and 
any copies. Any use of the message by you is prohibited. 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding pairs with least magnitude difference from mean

2011-02-28 Thread Hans W Borchers
 rex.dwyer at syngenta.com writes:

 James,
 It seems the 2*mean(x) term is irrelevant if you are seeking to
 minimize sd. Then you want to sort the distances from smallest to
 largest. Then it seems clear that your five values will be adjacent in
 the list, since if you have a set of five adjacent values, exchanging
 any of them for one further away in the list will increase the sd. The
 only problem I see with this is that you can't use a number more than
 once. In any case, you need to compute the best five pairs beginning
 at position i in the sorted list, for 1=i=choose(n,2), then take the
 max over all i.
 There no R in my answer such as you'd notice, but I hope it helps just
 the same.
 Rex

You probably mean something like the following:

x - rnorm(10)
y - outer(x, x, +) - (2 * mean(x))

o - order(x)
sd(c(y[o[1],o[10]], y[o[2],o[9]], y[o[3],o[8]], y[o[4],o[7]], y[o[5],o[6]]))

This seems reasonable, though you would have to supply a more stringent
argument. I did two tests and it works alright.

--Hans Werner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding pairs with least magnitude difference from mean

2011-02-26 Thread Hans W Borchers
 I have what I think is some kind of linear programming question.
 Basically, what I want to figure out is if I have a vector of numbers,
 
  x - rnorm(10)
  x
  [1] -0.44305959 -0.26707077  0.07121266  0.44123714 -1.10323616
 -0.19712807  0.20679494 -0.98629992  0.97191659 -0.77561593
 
  mean(x)
 [1] -0.2081249
 
 Using each number only once, I want to find the set of five pairs
 where the magnitude of the differences between the mean(x) and each
 pairs sum is least.
 
  y - outer(x, x, +) - (2 * mean(x))
 
 With this matrix, if I put together a combination of pairs which uses
 each number only once, the sum of the corresponding numbers is 0.
 
 For example, compare the SD between this set of 5 pairs
  sd(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))
 [1] 1.007960
 
 versus this hand-selected, possibly lowest SD combination of pairs
  sd(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))
 [1] 0.2367030

Your selection is not bad, as only about 0.4% of all possible distinct
combinations have a smaller value -- the minimum is 0.1770076, for example
[10 7 9 5 8 4 6 2 3 1].

(1) combinat() from the 'combinations' package seems slow, try instead the
permutations() function from 'e1071'. 

(2) Yes, except your vector is getting much larger in which case brute force
is no longer feasible.

(3) This is not a linear programming, but a combinatorial optimization task.
You could try optim() with the SANN method, or some mixed-integer linear
program (e.g., lpSolve, Rglpk, Rsymphony) by intelligently using binary
variables to define the sets.

This does not mean that some specialized approach might not be more
appropriate.

--Hans Werner

 I believe that if I could test all the various five pair combinations,
 the combination with the lowest SD of values from the table would give
 me my answer.  I believe I have 3 questions regarding my problem.
 
 1) How can I find all the 5 pair combinations of my 10 numbers so that
 I can perform a brute force test of each set of combinations?  I
 believe there are 45 different pairs (i.e. choose(10,2)). I found
 combinations from the {Combinations} package but I can't figure out
 how to get it to provide pairs.
 
 2) Will my brute force strategy of testing the SD of each of these 5
 pair combinations actually give me the answer I'm searching for?
 
 3) Is there a better way of doing this?  Probably something to do with
 real linear programming, rather than this method I've concocted.
 
 Thanks for any help you can provide regarding my question.
 
 Best regards,
 
 James


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Finding pairs with least magnitude difference from mean

2011-02-25 Thread jctoll
Hi,

I have what I think is some kind of linear programming question.
Basically, what I want to figure out is if I have a vector of numbers,

 x - rnorm(10)

 x
 [1] -0.44305959 -0.26707077  0.07121266  0.44123714 -1.10323616
-0.19712807  0.20679494 -0.98629992  0.97191659 -0.77561593

 mean(x)
[1] -0.2081249

Using each number only once, I want to find the set of five pairs
where the magnitude of the differences between the mean(x) and each
pairs sum is least.

 y - outer(x, x, +) - (2 * mean(x))
 y
 [,1][,2][,3][,4]   [,5]
 [,6]   [,7]   [,8]  [,9]   [,10]
 [1,] -0.46986936 -0.29388054  0.04440289  0.41442737 -1.1300459
-0.22393784  0.1799852 -1.0131097 0.9451068 -0.80242569
 [2,] -0.29388054 -0.11789173  0.22039171  0.59041619 -0.9540571
-0.04794902  0.3559740 -0.8371209 1.1210956 -0.62643688
 [3,]  0.04440289  0.22039171  0.55867514  0.92869962 -0.6157737
0.29033441  0.6942574 -0.4988374 1.4593791 -0.28815345
 [4,]  0.41442737  0.59041619  0.92869962  1.29872410 -0.2457492
0.66035889  1.0642819 -0.1288130 1.8294035  0.08187104
 [5,] -1.13004593 -0.95405711 -0.61577368 -0.24574920 -1.7902225
-0.88411441 -0.4801914 -1.6732863 0.2849302 -1.46260226
 [6,] -0.22393784 -0.04794902  0.29033441  0.66035889 -0.8841144
0.02199368  0.4259167 -0.7671782 1.1910383 -0.55649417
 [7,]  0.17998518  0.35597399  0.69425742  1.06428191 -0.4801914
0.42591670  0.8298397 -0.3632552 1.5949614 -0.15257116
 [8,] -1.01310969 -0.83712087 -0.49883744 -0.12881296 -1.6732863
-0.76717817 -0.3632552 -1.5563500 0.4018665 -1.34566603
 [9,]  0.94510682  1.12109563  1.45937907  1.82940355  0.2849302
1.19103834  1.5949614  0.4018665 2.3600830  0.61255048
[10,] -0.80242569 -0.62643688 -0.28815345  0.08187104 -1.4626023
-0.55649417 -0.1525712 -1.3456660 0.6125505 -1.13498203

With this matrix, if I put together a combination of pairs which uses
each number only once, the sum of the corresponding numbers is 0.

For example, compare the SD between this set of 5 pairs
 y[10,1] + y[9,2] + y[8,3] + y[7,4] + y[6,5]
[1] 0
 sum(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))
[1] 5.551115e-17# basically 0, I assume this is round-off error
 mean(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))
[1] 1.111307e-17# basically 0, I assume this is round-off error
 sd(c(y[10,1], y[9,2], y[8,3], y[7,4], y[6,5]))
[1] 1.007960

versus this hand-selected, possibly lowest SD combination of pairs
 sum(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))
[1] -1.665335e-16   # basically 0, I assume this is round-off error
 mean(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))
[1] -3.330669e-17   # basically 0, I assume this is round-off error
 sd(c(y[3,1], y[6,2], y[10,4], y[9,5], y[8,7]))
[1] 0.2367030

I believe that if I could test all the various five pair combinations,
the combination with the lowest SD of values from the table would give
me my answer.  I believe I have 3 questions regarding my problem.

1) How can I find all the 5 pair combinations of my 10 numbers so that
I can perform a brute force test of each set of combinations?  I
believe there are 45 different pairs (i.e. choose(10,2)). I found
combinations from the {Combinations} package but I can't figure out
how to get it to provide pairs.

2) Will my brute force strategy of testing the SD of each of these 5
pair combinations actually give me the answer I'm searching for?

3) Is there a better way of doing this?  Probably something to do with
real linear programming, rather than this method I've concocted.

Thanks for any help you can provide regarding my question.

Best regards,

James

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.