On 4/11/2011 6:08 AM, Viktor Cerovski wrote:
>>> After some quick tries with J, I got a bunch of 123 123 123 124
>>> >>  distributions, and they are all mutually different.
>>> >>  Here is one of them:
>>> >>
>>> >>     conf
>>> >>    0  0  2  3 1 0 3  3  0  2  1 1 2  1  3 1 2 2  1  2
>>> >>  23 43 12 54 7 3 5 10 54 55 26 9 9 43 54 1 8 6 38 33
>>> >>
>>> >>     +//./conf
>>> >>  123 123 123 124
>>> >>
>>> >>     </./conf
>>> >>  ┌──────────┬──────────────┬──────────┬──────────────┐
>>> >>  │23 43 3 54│12 55 9 8 6 33│54 5 10 54│7 26 9 43 1 38│
>>> >>  └──────────┴──────────────┴──────────┴──────────────┘
>>> >
> I haven't given any method.  Which size of the problem
> would you consider nontrivial?
>
Very nice, Victor, I would like to understand your conf function in more 
detail.

I am hoping to come up with a general solution for any number of masses 
and buckets. Unfortunately, as the number of masses and buckets 
increase, a brute force attack on the problem rapidly gets out of hand. 
For example, let's propose a bit larger problem. A problem of say, 200 
masses and 10 buckets would generate very large lists of possible 
combinations.  "20 comb 200" runs out of memory on my machine. I would 
consider this a non-trivial problem.

Let's create a random set of 200 masses. We can make the list of masses 
repeatable for everyone, by setting the random link:

    setrl 100
    m =. ? 200 # 1000
    m
440 512 395 799 175 471 872 938 358 980 394 453 378 858 574 74 .......

So now we have 200 masses that need to be distributed among 10 buckets. 
Keep in mind that this generation process repeatedly selects from the 
same 1000 integers with replacement, so we can have multiple instances 
of the same mass value in m. In fact, the masses that occur more than 
once in m are:

(0 = ~: m) # m
147 558 440 820 394 994 292 574 986 839 523 742 540 802 158 512 329 759 
717 725 455 476 430 599

A general solution needs to have a way to reduce the number of potential 
mass-group candidates. In order to minimize the mass differences between 
buckets,it is clear that the total mass in each bucket must be fairly 
near (+/m)%n  where n is the number of buckets. I call this number the 
"bucket mean", or bm. This fact alone drastically reduces the number of 
subsets of m that need to be examined to solve this problem. It is also 
clear that the sum of masses in each bucket must be an integer. So we 
need to reduce our search space using these facts.

The mass in each of our buckets must each be close to the bucket mean, 
in order to minimize the mass difference between buckets. In our 
specific problem, since we have 200 randomly selected numbers from 
0-999, so +/m should be around 100,000. We would expect our bm to be 
near 100,000 divided by 20 or 5000, and it is. bm = 5273.4 for our 200 
randomly-generated masses.

The ideal solution to the problem would be when the n buckets each have 
exactly bm mass in them.  However, bm will not necessarily be an 
integer, which is the likely case, yet the bucket sums must be an 
integer. Also, there may not be n exclusive subsets of m that sum 
exactly to bm. By "exclusive subsets", I mean n subsets of m that in 
total contain the exact set of m, with no additions or exclusions. So 
the problem boils down to finding n exclusive subsets of m, where each 
of the n subsets sums to an integer _near_ to bm. The n exclusive 
subsets of m whose absolute differences from bm are minimized (closest 
to bm), will be the answer. If the n subsets of m are truly exclusive, 
then by definition, the total of the n exclusive subsets must sum to +/m.

The initial process step, and the real key to solving this problem, is 
to efficiently find the subsets of m where the sum of each subset is 
very near to bm. An ideal search algorithm would find subsets closest to 
bm first, and discover subsets with more distance from bm later. How far 
away from bm should the subset sums get before we stop the search is not 
clear, but I suspect that an answer will be found with subsets having 
sums within a couple of mass units of bm, in most cases.

The ideal search-from-closest algorithm may be difficult to implement, 
so a process of looping through all the possible subsets of m and 
creating a list of "close-to-bm" subsets, might work. The final list of 
subsets should be sorted by their nearness to bm. Unfortunately, this 
brute force approach could take considerable processing time, 
considering the number of possible mass subsets in any significant-sized 
problem. There is much opportunity for clever optimization in this 
stage. A mechanism to bulk-eliminate obviously unusable subsets before 
summing and sorting, would be a useful pre-process step here. An 
efficient "close-to-bm" subset-finding-and-sorting algorithm is the true 
key to solving this problem.

Once we have found all the subsets of m that sum near bm, we have taken 
a big step, but we aren't through. We need a way to find a group of n 
subsets from the overall list of close-to-bm subsets that is 
_exclusive_. That is, the set of n subsets we pick must in total 
comprise the exact set of m. We will need to sort through our previously 
collected and sorted list of close-to-bm subsets, and find the group of 
n exclusive subsets whose sums are closest to bm. If we can efficiently 
generate the previous sorted list of close-to-bm subsets, finding the n 
exclusive subsets with sums closest to bm, should be fairly straightforward.

My J skills run out of gas as soon as I get into these iterative 
processes. I can't imagine using monolithic arrays to solve this 
problem, because of the array sizes involved. It would be great though, 
to be able to solve this in the J/array way. In any case, this is an 
interesting and challenging problem, particularly in the general case. I 
would appreciate any hints or help, as I am using this problem to 
further my J skills, as much as sharpening my problem-solving skills.

On a related topic, is there anything on Rosetta Code like this? If we 
can come up with an elegant general solution, it might be worth 
proposing a fairly large version of the problem to them, to see how 
other languages fare in solving the problem.

Skip
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to