Re: [Jprogramming] Weight distribution problem

Marshall Lochbaum Thu, 14 Apr 2011 13:43:41 -0700

This is a verb for finding the combinations that will sum to numbers within
given bounds. It discards all the combinations it can at any given step, but
is still quite slow.

NB. x is bounds for sum (inclusive), y is sorted down list of weights (ie.
(-: \:~) y).
NB. returns a rank two array whose items are the Boolean selections for
items of y to return correct weights.
   findcombs=:4 :0
list=.$0
combs=.1 0$0
sums=.,0
'min max'=._1 0+x
remainder=.+/y
for_i. y do.
  list=.list,i
  remainder=.remainder-i
  combs=.,/ 0 1 ,~"0 1/ combs
  sums=. ,/ (0,i) +/ sums
  bounds=.(min-remainder),max
  combs=. combs #~ sel=.(1=bounds&I.)"0 sums
  sums=.sums #~sel
end.
)

Also not fast enough to compute the sums for 200 weights and 10 buckets. 50
weights and 10 buckets is possible:

   ]m =. ? 50 # 1000
484 696 111 937 761 131 504 777 284 772 729 729 107 450 215 701 733 788 371
903 175 195 771 754 144 322 443 918 758 143 43 804 291 223 767 954 698 650
855 53 337 832 858 137 154 201 80 320 295 443
   bounds=.(,>:) <. b%~+/m

   # bounds findcombs m
470258
   6!:2 'bounds findcombs m'
35.0327

Note that there are still rather large numbers of solutions, and probably
larger numbers of intermediate solutions.

Marshall

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Skip Cave
Sent: Thursday, April 14, 2011 2:16 PM
To: Programming forum
Subject: Re: [Jprogramming] Weight distribution problem

On 4/12/2011 5:22 AM, Aai wrote:
> Giving bucket sums:
>
> 12 8 # (,>:)<. bm
Aai,

Nice approach to finding the set of integer bucket sums that would come the
nearest to the bucket mean, bm. I knew I needed to be able to find that set
of integer sums, but I hadn't come up with a programmatic way to do that.
The key point here, is that any other set of sums other than the one you
found, will have a larger sum of differences between buckets. It would make
sense that our first attempt at the problem would be to see if the masses
could be partitioned to the minimal-bucket-difference configuration you
found.

Of course, there is no guarantee that m can be partitioned in this specific
manner, which would produce the theoretical minimal mass difference between
buckets. So in the long run, our search may have to identify more partition
sets which produce somewhat larger differences, Ideally, we could find these
additional partition sets, and then sort them by their distance magnitude,
so the smaller differences could be examined first. This definitely looks
like a way to cut down on computations.

However for now, the challenge is to see if we can partition the m masses to
result in the minimal-difference sum sets you defined. We need an efficient
algorithm that will find partitions of m that sum to either 
<. bm   or >: <. bm   which in our large case, is 5273 or 5274. as you 
discovered, we need 12 buckets summing to 5273, & 8 buckets summing to 5274.

+/ 12 8 # (,>:)<. bm
105468
    +/ m
105468

Though we can reduce the search space significantly by considering only
partitions that sum to the two integers nearest bm, we still have a huge
search to perform. Even if we assume that the number of masses in each
bucket would all be equal, we still have a big search.

In our large example, if the number of masses in each bucket were equal, we
would have (#m)%nb masses in each bucket, which in our large case, is
10 masses in each bucket. If each bucket had exactly 10 masses in it, and we
use a brute force approach, we would have 10 comb 200 possible partitions of
m to examine, to see which 10-sets summed to 5273 or 5274.

Using the combinations formula, the number of combinations of 10 comb 200 is
(!#m)%(!10)*!(#m)-10

Using normal floating point arithmetic, this formula generates values larger
than my machine's floating point range, which equates to infinity.

(!#m)%(!10)*!(#m)-10
|NaN error
|   (!#m)    %(!10)*!(#m)-10  NB. Oops! infinity % infinity is a NaN.

    !#m    NB. The factorial of #m is infinity in regular floating-point 
arithmetic
_

so we must resort to extended precision:

    fm =. */ >: x: i.#m  NB. factorial #m
    f10 =. */ >: x: i.10  NB. factorial 10 (min number of masses in each
bucket if all the same)
    fd =. */ >: x: i.(#m)-10  NB. factorial of difference (#m - 10)
    fm % F10 * fd
22,451,004,309,013,280   NB. Commas added for clarity

or 2.2451e16  NB. This is the number of possible partitions
(combinations) of m, taken 10 at a time, in floating point notation.

10 comb 200 is a lot of partitions of m to examine. The 10 comb 200
function, which would generate all those possible combinations, creates an
array that is bigger than my machine can handle:

c =. 10 comb 200
|out of memory: comb
|   z=.k    ,.&.>,&.>/\.>:&.>z

I have a 32-bit machine, so perhaps a 64 bit machine could deal with this
array of all 10-partitions of m. If I could get all the 10 combinations in
an array, it would be simple to see which ones sum up to
5273 or 5274. For that matter, I could find all of the partition sums within
a few mass units of bm.

Roger Hui has provided an interesting Jwiki article on the efficient
computation of partition sums, at:
http://www.jsoftware.com/jwiki/Essays/Combination%20Sums
I haven't gotten my head around all of Roger's math as yet, but his approach
looks promising as an efficient way to find all the sums of all the
partitions of a certain size.

However, in the general case, any brute-force, monolithic array approach
will always eventually run out of memory as the number of masses and buckets
grow. The brute-force monolithic array approach is probably not an option
for generality's sake. An iterative approach seems more appropriate here. I
would at least like the general solution to be time-constrained, rather than
space constrained. I seem to have more of the former than the latter, at
least as far as this problem is concerned.

In any case, this initial plan only gets us the 10-partitions of m. 
There is no guarantee that m will partition out with 10 masses in each
bucket. We will likely need to look at a range of partition sizes to solve
our problem.

For now, we can still set about finding more limits to the problem, in order
to continue to reduce the computations needed. Given any specific m, we
should be able to discover the minimum and maximum partition sizes that
would sum to any specific integer. By sorting m and looking at running sums,
it seems that we should be able to find the smallest number of m's masses
that will sum to (or near) a specific integer, as well as the largest number
of m's masses that will sum to that same integer. This will give us
boundaries on the smallest and largest partitions we need to examine for our
solution.

Thinking about the sorted m, I sense a glimmer of an idea about using sorted
m to pick out partitions that sum close to bm. I will have to think about
this more.

I have some paying work to get to now, so I will need to postpone working on
the partition-size limits computation and the sorted-m issue until tomorrow.
This is definitely giving me a workout in polishing my J skills, though.

Skip
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Weight distribution problem

Reply via email to