On 4/12/2011 5:22 AM, Aai wrote:
> Giving bucket sums:
>
> 12 8 # (,>:)<. bm
Aai,
Nice approach to finding the set of integer bucket sums that comes
nearest to the bucket mean, bm. I knew I needed to be able to find that
set of integer sums, but I hadn't come up with a programmatic way to do
it. The key point here is that any set of sums other than the one you
found will have a larger sum of differences between buckets. It makes
sense that our first attempt at the problem should be to see whether the
masses can be partitioned into the minimal-bucket-difference
configuration you found.
Of course, there is no guarantee that m can be partitioned in this
specific manner, which would produce the theoretical minimal mass
difference between buckets. So in the long run, our search may have to
identify more partition sets that produce somewhat larger differences.
Ideally, we could find these additional partition sets and then sort
them by their difference magnitude, so that the smaller differences
could be examined first. That definitely looks like a way to cut down on
computation.
However, for now, the challenge is to see whether we can partition the m
masses into the minimal-difference sum sets you defined. We need an
efficient algorithm that finds partitions of m summing to either <. bm
or >: <. bm, which in our large case is 5273 or 5274. As you discovered,
we need 12 buckets summing to 5273 and 8 buckets summing to 5274.
+/ 12 8 # (,>:)<. bm
105468
+/ m
105468
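For readers without J to hand, the same bucket-sum construction can be
sketched in Python (a hedged translation on my part, not code from the
thread; the total 105468 and the 20 buckets are the values from the large
example above):

```python
# Distribute the total mass as evenly as possible over nb integer bucket
# sums. Mirrors the J phrase 12 8 # (,>:)<. bm : some buckets get
# floor(bm), the rest get floor(bm)+1.
total, nb = 105468, 20      # values from the large example above
lo = total // nb            # <. bm     -> 5273
hi = lo + 1                 # >: <. bm  -> 5274
n_hi = total - nb * lo      # buckets that must take the larger sum -> 8
n_lo = nb - n_hi            # buckets that take the smaller sum     -> 12
sums = [lo] * n_lo + [hi] * n_hi
print(n_lo, n_hi, lo, hi, sum(sums))
```

Any other assignment of integer sums to the 20 buckets moves at least one
bucket further from bm, which is why this configuration is the one worth
trying first.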
Though we can reduce the search space significantly by considering only
partitions that sum to the two integers nearest bm, we still have a huge
search to perform, even if we assume that every bucket holds the same
number of masses.
In our large example, if the number of masses in each bucket were equal,
we would have (#m)%nb masses in each bucket, which in our large case is
10. If each bucket had exactly 10 masses in it, and we used a
brute-force approach, we would have 10 comb 200 possible partitions of m
to examine, to see which 10-sets summed to 5273 or 5274.
Using the combinations formula, 10 comb 200 is
(!#m)%(!10)*!(#m)-10
Using normal floating-point arithmetic, this formula generates values
larger than my machine's floating-point range, which come out as infinity.
(!#m)%(!10)*!(#m)-10
|NaN error
| (!#m) %(!10)*!(#m)-10 NB. Oops! infinity % infinity is a NaN.
!#m NB. The factorial of #m is infinity in regular floating-point arithmetic
_
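The same floating-point pathology is easy to reproduce in Python (an
illustrative aside of mine, not part of the J session; Python refuses the
overflowing conversion outright where J silently returns infinity, but
the infinity % infinity step misbehaves identically):

```python
import math

# In IEEE double precision, 200! is out of range. Python raises an
# explicit error on conversion; J instead returns infinity (_).
try:
    float(math.factorial(200))
except OverflowError:
    print("200! exceeds the double-precision range")

# And infinity divided by infinity -- the shape of (!#m)%(!10)*!(#m)-10
# in plain floats -- is a NaN in both languages:
inf = float("inf")
print(math.isnan(inf / inf))
```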
so we must resort to extended precision:
fm =. */ >: x: i.#m NB. factorial of #m
f10 =. */ >: x: i.10 NB. factorial of 10 (the number of masses in each bucket if all are the same)
fd =. */ >: x: i.(#m)-10 NB. factorial of the difference, (#m)-10
fm % f10 * fd
22,451,004,309,013,280 NB. Commas added for clarity
or about 2.2451e16 in floating-point notation: the number of possible
partitions (combinations) of m, taken 10 at a time.
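As a cross-check on the extended-precision result, exact integer
arithmetic in Python gives the same count (math.comb is the direct
analogue of the factorial formula above):

```python
import math

m, k = 200, 10
# (!m) % (!k) * !(m-k), computed exactly with arbitrary-precision integers
via_factorials = math.factorial(m) // (math.factorial(k) * math.factorial(m - k))
print(via_factorials)                      # 22451004309013280
print(math.comb(m, k) == via_factorials)   # True
```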
10 comb 200 is a lot of partitions of m to examine. The 10 comb 200
function, which would generate all those possible combinations, creates
an array that is bigger than my machine can handle:
c =. 10 comb 200
|out of memory: comb
| z=.k ,.&.>,&.>/\.>:&.>z
I have a 32-bit machine, so perhaps a 64-bit machine could deal with
this array of all 10-partitions of m. If I could get all the
10-combinations into an array, it would be simple to see which ones sum
to 5273 or 5274. For that matter, I could find all of the partition sums
within a few mass units of bm.
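Rather than materializing all 10 comb 200 rows at once, a lazy generator
can scan the combinations one at a time in constant memory. A toy sketch
in Python (the six masses and the target 42 are made up for illustration;
the real problem would stream over the 200 masses the same way, though
2.2e16 iterations is of course still far too many):

```python
from itertools import combinations

masses = [3, 5, 8, 13, 21, 34]   # toy data, not the real 200 masses
target = 42
# combinations() is lazy, so memory stays O(k) no matter how many
# subsets exist; only the hits are retained.
hits = [c for c in combinations(masses, 3) if sum(c) == target]
print(hits)                       # [(3, 5, 34), (8, 13, 21)]
```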
Roger Hui has provided an interesting Jwiki article on the efficient
computation of partition sums, at:
http://www.jsoftware.com/jwiki/Essays/Combination%20Sums
I haven't gotten my head around all of Roger's math as yet, but his
approach looks promising as an efficient way to find all the sums of all
the partitions of a certain size.
However, in the general case, any brute-force, monolithic array approach
will eventually run out of memory as the number of masses and buckets
grows, so it is probably not an option for generality's sake. An
iterative approach seems more appropriate here. I would at least like
the general solution to be time-constrained rather than
space-constrained; I seem to have more of the former than the latter, at
least as far as this problem is concerned.
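One time-constrained, space-bounded alternative to the monolithic array
is classic dynamic programming: count the k-subsets that hit a target sum
with an O(k * target) table instead of 10 comb 200 rows. A sketch of my
own (not from the thread), again on toy data:

```python
def count_k_subsets(masses, k, target):
    """Number of k-element subsets of `masses` summing exactly to `target`,
    via 0/1-knapsack counting in O(k * target) memory."""
    dp = [[0] * (target + 1) for _ in range(k + 1)]
    dp[0][0] = 1                           # one way to pick nothing
    for w in masses:
        for j in range(k, 0, -1):          # descending: each mass used once
            for s in range(target, w - 1, -1):
                dp[j][s] += dp[j - 1][s - w]
    return dp[k][target]

print(count_k_subsets([3, 5, 8, 13, 21, 34], 3, 42))   # 2
```

For the large case this is only on the order of 10 * 5274 * 200 table
updates, which is trivial, though counting solutions is easier than
actually extracting a full 20-bucket assignment.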
In any case, this initial plan only gets us the 10-partitions of m.
There is no guarantee that m will partition out with 10 masses in each
bucket. We will likely need to look at a range of partition sizes to
solve our problem.
For now, we can still set about finding more limits to the problem, in
order to continue to reduce the computations needed. Given any specific
m, we should be able to discover the minimum and maximum partition sizes
that would sum to any specific integer. By sorting m and looking at
running sums, it seems that we should be able to find the smallest
number of m's masses that will sum to (or near) a specific integer, as
well as the largest number of m's masses that will sum to that same
integer. This will give us boundaries on the smallest and largest
partitions we need to examine for our solution.
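The running-sum idea can be sketched directly. A hedged Python version of
that plan (the function name and the five toy masses are mine, for
illustration only): the fewest masses come from accumulating the largest
values first, and the most from accumulating the smallest while staying
at or under the target.

```python
from itertools import accumulate

def partition_size_bounds(masses, target):
    """Fewest and most masses any subset summing to about `target` can use."""
    s = sorted(masses)
    # smallest count: running sums of the biggest masses, stop at >= target
    from_big = accumulate(reversed(s))
    k_min = next(i for i, t in enumerate(from_big, 1) if t >= target)
    # largest count: running sums of the smallest masses while <= target
    k_max = sum(1 for t in accumulate(s) if t <= target)
    return k_min, k_max

print(partition_size_bounds([1, 2, 3, 4, 10], 10))   # (1, 4)
```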
Thinking about the sorted m, I sense a glimmer of an idea about using
sorted m to pick out partitions that sum close to bm. I will have to
think about this more.
I have some paying work to get to now, so I will need to postpone
working on the partition-size limits computation and the sorted-m issue
until tomorrow. This is definitely giving me a workout in polishing my J
skills, though.
Skip
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm