On 4/12/2011 5:22 AM, Aai wrote:
> Giving bucket sums:
>
> 12 8 # (,>:)<. bm
Aai,

Nice approach to finding the set of integer bucket sums that would come 
the nearest to the bucket mean, bm. I knew I needed to be able to find 
that set of integer sums, but I hadn't come up with a programmatic way 
to do that. The key point here, is that any other set of sums other than 
the one you found, will have a larger sum of differences between 
buckets. It would make sense that our first attempt at the problem would 
be to see if the masses could be partitioned to the 
minimal-bucket-difference configuration you found.

Of course, there is no guarantee that m can be partitioned in this 
specific manner, which would produce the theoretical minimal mass 
difference between buckets. So in the long run, our search may have to 
identify more partition sets which produce somewhat larger differences, 
Ideally, we could find these additional partition sets, and then sort 
them by their distance magnitude, so the smaller differences could be 
examined first. This definitely looks like a way to cut down on 
computations.

However for now, the challenge is to see if we can partition the m 
masses to result in the minimal-difference sum sets you defined. We need 
an efficient algorithm that will find partitions of m that sum to either 
<. bm   or >: <. bm   which in our large case, is 5273 or 5274. as you 
discovered, we need 12 buckets summing to 5273, & 8 buckets summing to 5274.

+/ 12 8 # (,>:)<. bm
105468
    +/ m
105468

Though we can reduce the search space significantly by considering only 
partitions that sum to the two integers nearest bm, we still have a huge 
search to perform. Even if we assume that the number of masses in each 
bucket would all be equal, we still have a big search.

In our large example, if the number of masses in each bucket were equal, 
we would have (#m)%nb masses in each bucket, which in our large case, is 
10 masses in each bucket. If each bucket had exactly 10 masses in it, 
and we use a brute force approach, we would have 10 comb 200 possible 
partitions of m to examine, to see which 10-sets summed to 5273 or 5274.

Using the combinations formula, the number of combinations of 10 comb 200 is
(!#m)%(!10)*!(#m)-10

Using normal floating point arithmetic, this formula generates values 
larger than my machine's floating point range, which equates to infinity.

(!#m)%(!10)*!(#m)-10
|NaN error
|   (!#m)    %(!10)*!(#m)-10  NB. Oops! infinity % infinity is a NaN.

    !#m    NB. The factorial of #m is infinity in regular floating-point 
arithmetic
_


so we must resort to extended precision:

    fm =. */ >: x: i.#m  NB. factorial #m
    f10 =. */ >: x: i.10  NB. factorial 10 (min number of masses in each 
bucket if all the same)
    fd =. */ >: x: i.(#m)-10  NB. factorial of difference (#m - 10)
    fm % F10 * fd
22,451,004,309,013,280   NB. Commas added for clarity

or 2.2451e16  NB. This is the number of possible partitions 
(combinations) of m, taken 10 at a time, in floating point notation.

10 comb 200 is a lot of partitions of m to examine. The 10 comb 200 
function, which would generate all those possible combinations, creates 
an array that is bigger than my machine can handle:

c =. 10 comb 200
|out of memory: comb
|   z=.k    ,.&.>,&.>/\.>:&.>z

I have a 32-bit machine, so perhaps a 64 bit machine could deal with 
this array of all 10-partitions of m. If I could get all the 10 
combinations in an array, it would be simple to see which ones sum up to 
5273 or 5274. For that matter, I could find all of the partition sums 
within a few mass units of bm.

Roger Hui has provided an interesting Jwiki article on the efficient 
computation of partition sums, at:
http://www.jsoftware.com/jwiki/Essays/Combination%20Sums
I haven't gotten my head around all of Roger's math as yet, but his 
approach looks promising as an efficient way to find all the sums of all 
the partitions of a certain size.

However, in the general case, any brute-force, monolithic array approach 
will always eventually run out of memory as the number of masses and 
buckets grow. The brute-force monolithic array approach is probably not 
an option for generality's sake. An iterative approach seems more 
appropriate here. I would at least like the general solution to be 
time-constrained, rather than space constrained. I seem to have more of 
the former than the latter, at least as far as this problem is concerned.

In any case, this initial plan only gets us the 10-partitions of m. 
There is no guarantee that m will partition out with 10 masses in each 
bucket. We will likely need to look at a range of partition sizes to 
solve our problem.

For now, we can still set about finding more limits to the problem, in 
order to continue to reduce the computations needed. Given any specific 
m, we should be able to discover the minimum and maximum partition sizes 
that would sum to any specific integer. By sorting m and looking at 
running sums, it seems that we should be able to find the smallest 
number of m's masses that will sum to (or near) a specific integer, as 
well as the largest number of m's masses that will sum to that same 
integer. This will give us boundaries on the smallest and largest 
partitions we need to examine for our solution.

Thinking about the sorted m, I sense a glimmer of an idea about using 
sorted m to pick out partitions that sum close to bm. I will have to 
think about this more.

I have some paying work to get to now, so I will need to postpone 
working on the partition-size limits computation and the sorted-m issue 
until tomorrow. This is definitely giving me a workout in polishing my J 
skills, though.

Skip
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to