On 4/11/2011 6:08 AM, Viktor Cerovski wrote:
>> After some quick tries with J, I got a bunch of 123 123 123 124
>> distributions, and they are all mutually different.
>> Here is one of them:
>>
>>    conf
>> 0 0 2 3 1 0 3 3 0 2 1 1 2 1 3 1 2 2 1 2
>> 23 43 12 54 7 3 5 10 54 55 26 9 9 43 54 1 8 6 38 33
>>
>>    +//./conf
>> 123 123 123 124
>>
>>    </./conf
>> ┌──────────┬──────────────┬──────────┬──────────────┐
>> │23 43 3 54│12 55 9 8 6 33│54 5 10 54│7 26 9 43 1 38│
>> └──────────┴──────────────┴──────────┴──────────────┘
>
> I haven't given any method. Which size of the problem
> would you consider nontrivial?
>
Very nice, Viktor. I would like to understand your conf function in more
detail.
I am hoping to come up with a general solution for any number of masses
and buckets. Unfortunately, as the number of masses and buckets
increases, a brute-force attack on the problem rapidly gets out of hand.
For example, consider a somewhat larger problem of, say, 200 masses and
10 buckets. It would generate very large lists of possible combinations:
"20 comb 200" runs out of memory on my machine. I would consider this a
non-trivial problem.
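Just to quantify the blow-up (Python here purely for the arithmetic; the choice of 20 masses per bucket corresponds to 200 masses spread over 10 buckets):

```python
import math

# number of ways to choose one bucket's worth of 20 masses from 200 --
# this is only the candidate list for a SINGLE bucket
n_subsets = math.comb(200, 20)   # on the order of 10**27
```

No wonder "20 comb 200" exhausts memory: the full list cannot even be materialized.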
Let's create a random set of 200 masses. We can make the list of masses
repeatable for everyone by setting the random link:
setrl 100
m =. ? 200 # 1000
m
440 512 395 799 175 471 872 938 358 980 394 453 378 858 574 74 .......
So now we have 200 masses that need to be distributed among 10 buckets.
Keep in mind that this generation process repeatedly selects from the
same 1000 integers with replacement, so we can have multiple instances
of the same mass value in m. In fact, the masses that occur more than
once in m are:
(0 = ~: m) # m
147 558 440 820 394 994 292 574 986 839 523 742 540 802 158 512 329 759
717 725 455 476 430 599
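The same experiment can be sketched in Python. Note that Python's Mersenne Twister is not J's generator, so seeding with 100 will NOT reproduce the values above; only the shape of the experiment carries over. The duplicate filter below mimics the J phrase (0 = ~: m) # m, which keeps the non-first occurrences:

```python
import random
from collections import Counter

random.seed(100)                                  # analogue of setrl 100
m = [random.randrange(1000) for _ in range(200)]  # analogue of ? 200 # 1000

counts = Counter(m)
# extra occurrences of repeated masses, like (0 = ~: m) # m in J
dups = [v for i, v in enumerate(m) if v in m[:i]]
```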
A general solution needs to have a way to reduce the number of potential
mass-group candidates. In order to minimize the mass differences between
buckets, it is clear that the total mass in each bucket must be fairly
near (+/m)%n where n is the number of buckets. I call this number the
"bucket mean", or bm. This fact alone drastically reduces the number of
subsets of m that need to be examined to solve this problem. It is also
clear that the sum of masses in each bucket must be an integer. So we
need to reduce our search space using these facts.
The mass in each of our buckets must be close to the bucket mean in
order to minimize the mass difference between buckets. In our specific
problem, since we have 200 numbers selected at random from 0-999, +/m
should be around 100,000. We would expect our bm to be near 100,000
divided by 10, or about 10,000, and it is: bm = 10,546.8 for our 200
randomly-generated masses.
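As a sketch of this filter (in Python rather than J, with a toy mass list and an arbitrarily chosen tolerance, neither of which is from the original problem):

```python
m = [23, 43, 12, 54, 7, 3, 5, 10, 54, 55, 26, 9]  # toy stand-in masses
n = 4                                             # number of buckets
bm = sum(m) / n                                   # the "bucket mean", here 75.25

def close_enough(subset_sum, bm=bm, tol=2):
    """Keep a candidate subset only if its (integer) sum lies within tol of bm."""
    return abs(subset_sum - bm) <= tol
```

Only subsets passing close_enough need ever be examined further, which is the drastic reduction described above.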
The ideal solution to the problem would be when the n buckets each hold
exactly bm mass. However, bm will usually not be an integer, while each
bucket sum must be. Also, there may not be n exclusive subsets of m that
sum exactly to bm. By "exclusive subsets", I mean n subsets of m that
together contain exactly the items of m, with no additions or omissions.
So the problem boils down to finding n exclusive subsets of m, where
each of the n subsets sums to an integer _near_ bm. The n exclusive
subsets of m whose absolute differences from bm are minimized (closest
to bm) will be the answer. If the n subsets of m are truly exclusive,
then by definition the totals of the n exclusive subsets must sum to +/m.
The initial process step, and the real key to solving this problem, is
to efficiently find the subsets of m whose sums are very near bm. An
ideal search algorithm would find the subsets closest to bm first, and
discover subsets farther from bm later. It is not clear how far from bm
the subset sums should be allowed to drift before we stop the search,
but I suspect that in most cases an answer will be found among subsets
whose sums are within a couple of mass units of bm.
The ideal search-from-closest algorithm may be difficult to implement,
so a process of looping through all the possible subsets of m and
collecting a list of "close-to-bm" subsets might work. The final list of
subsets should be sorted by their nearness to bm. Unfortunately, this
brute-force approach could take considerable processing time, given the
number of possible mass subsets in any significant-sized problem. There
is much opportunity for clever optimization at this stage. A mechanism
to bulk-eliminate obviously unusable subsets before summing and sorting
would be a useful pre-processing step here. An efficient "close-to-bm"
subset-finding-and-sorting algorithm is the true key to solving this
problem.
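One cheap bulk-elimination device, sketched in Python (a standard bitset subset-sum trick, not anything from this thread): before enumerating subsets at all, compute which totals are achievable by ANY subset of m, and discard target sums near bm that nothing can reach:

```python
def achievable_sums(masses):
    """Classic subset-sum DP packed into one big integer: after the
    loop, bit s of the result is set iff some subset of masses sums
    to s. One shift-or per mass: every already-achievable sum s also
    makes s + x achievable."""
    bits = 1                    # only the empty sum, 0, to start
    for x in masses:
        bits |= bits << x
    return bits

bits = achievable_sums([23, 43, 12, 54])
hit_66 = bool(bits >> 66 & 1)   # 23 + 43 = 66, so this is achievable
```

This runs in roughly len(m) big-integer operations, so for 200 masses summing near 100,000 it is instantaneous, and it tells you immediately whether any integer sum near bm is attainable at all.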
Once we have found all the subsets of m that sum near bm, we have taken
a big step, but we aren't through. We need a way to find a group of n
subsets from the overall list of close-to-bm subsets that is
_exclusive_: that is, the n subsets we pick must together comprise
exactly the set m. We will need to sort through our previously collected
and sorted list of close-to-bm subsets and find the group of n exclusive
subsets whose sums are closest to bm. If we can efficiently generate the
sorted list of close-to-bm subsets, finding the n exclusive subsets with
sums closest to bm should be fairly straightforward.
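This last step is an exact-cover search. A minimal backtracking sketch (Python; it works on index sets rather than mass values so that duplicate masses in m cannot collide):

```python
def pick_exclusive(cands, n_masses, n_buckets):
    """Search for n_buckets pairwise-disjoint candidate index-sets that
    together cover every index 0..n_masses-1 exactly once. If cands is
    already sorted closest-to-bm first, the first solution found is
    biased toward the best candidates (though not guaranteed optimal)."""
    full = frozenset(range(n_masses))
    def go(start, used, picked):
        if len(picked) == n_buckets:
            return picked if used == full else None
        for j in range(start, len(cands)):
            if cands[j].isdisjoint(used):
                found = go(j + 1, used | cands[j], picked + [cands[j]])
                if found is not None:
                    return found
        return None
    return go(0, frozenset(), [])
```

This is still worst-case exponential, which is why the quality (and brevity) of the close-to-bm candidate list matters so much.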
My J skills run out of gas as soon as I get into these iterative
processes. I can't imagine using monolithic arrays to solve this
problem, because of the array sizes involved. It would be great though,
to be able to solve this in the J/array way. In any case, this is an
interesting and challenging problem, particularly in the general case. I
would appreciate any hints or help, as I am using this problem to
further my J skills, as much as sharpening my problem-solving skills.
On a related topic, is there anything on Rosetta Code like this? If we
can come up with an elegant general solution, it might be worth
proposing a fairly large version of the problem to them, to see how
other languages fare in solving the problem.
Skip
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm