This is a verb for finding the combinations that will sum to numbers within given bounds. It discards all the combinations it can at any given step, but is still quite slow.
NB. x is bounds for sum (inclusive), y is sorted down list of weights (ie. (-: \:~) y). NB. returns a rank two array whose items are the Boolean selections for items of y to return correct weights. findcombs=:4 :0 list=.$0 combs=.1 0$0 sums=.,0 'min max'=._1 0+x remainder=.+/y for_i. y do. list=.list,i remainder=.remainder-i combs=.,/ 0 1 ,~"0 1/ combs sums=. ,/ (0,i) +/ sums bounds=.(min-remainder),max combs=. combs #~ sel=.(1=bounds&I.)"0 sums sums=.sums #~sel end. ) Also not fast enough to compute the sums for 200 weights and 10 buckets. 50 weights and 10 buckets is possible: ]m =. ? 50 # 1000 484 696 111 937 761 131 504 777 284 772 729 729 107 450 215 701 733 788 371 903 175 195 771 754 144 322 443 918 758 143 43 804 291 223 767 954 698 650 855 53 337 832 858 137 154 201 80 320 295 443 bounds=.(,>:) <. b%~+/m # bounds findcombs m 470258 6!:2 'bounds findcombs m' 35.0327 Note that there are still rather large numbers of solutions, and probably larger numbers of intermediate solutions. Marshall -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Skip Cave Sent: Thursday, April 14, 2011 2:16 PM To: Programming forum Subject: Re: [Jprogramming] Weight distribution problem On 4/12/2011 5:22 AM, Aai wrote: > Giving bucket sums: > > 12 8 # (,>:)<. bm Aai, Nice approach to finding the set of integer bucket sums that would come the nearest to the bucket mean, bm. I knew I needed to be able to find that set of integer sums, but I hadn't come up with a programmatic way to do that. The key point here, is that any other set of sums other than the one you found, will have a larger sum of differences between buckets. It would make sense that our first attempt at the problem would be to see if the masses could be partitioned to the minimal-bucket-difference configuration you found. Of course, there is no guarantee that m can be partitioned in this specific manner, which would produce the theoretical minimal mass difference between buckets. So in the long run, our search may have to identify more partition sets which produce somewhat larger differences, Ideally, we could find these additional partition sets, and then sort them by their distance magnitude, so the smaller differences could be examined first. This definitely looks like a way to cut down on computations. However for now, the challenge is to see if we can partition the m masses to result in the minimal-difference sum sets you defined. We need an efficient algorithm that will find partitions of m that sum to either <. bm or >: <. bm which in our large case, is 5273 or 5274. as you discovered, we need 12 buckets summing to 5273, & 8 buckets summing to 5274. +/ 12 8 # (,>:)<. bm 105468 +/ m 105468 Though we can reduce the search space significantly by considering only partitions that sum to the two integers nearest bm, we still have a huge search to perform. Even if we assume that the number of masses in each bucket would all be equal, we still have a big search. In our large example, if the number of masses in each bucket were equal, we would have (#m)%nb masses in each bucket, which in our large case, is 10 masses in each bucket. If each bucket had exactly 10 masses in it, and we use a brute force approach, we would have 10 comb 200 possible partitions of m to examine, to see which 10-sets summed to 5273 or 5274. Using the combinations formula, the number of combinations of 10 comb 200 is (!#m)%(!10)*!(#m)-10 Using normal floating point arithmetic, this formula generates values larger than my machine's floating point range, which equates to infinity. (!#m)%(!10)*!(#m)-10 |NaN error | (!#m) %(!10)*!(#m)-10 NB. Oops! infinity % infinity is a NaN. !#m NB. The factorial of #m is infinity in regular floating-point arithmetic _ so we must resort to extended precision: fm =. */ >: x: i.#m NB. factorial #m f10 =. */ >: x: i.10 NB. factorial 10 (min number of masses in each bucket if all the same) fd =. */ >: x: i.(#m)-10 NB. factorial of difference (#m - 10) fm % F10 * fd 22,451,004,309,013,280 NB. Commas added for clarity or 2.2451e16 NB. This is the number of possible partitions (combinations) of m, taken 10 at a time, in floating point notation. 10 comb 200 is a lot of partitions of m to examine. The 10 comb 200 function, which would generate all those possible combinations, creates an array that is bigger than my machine can handle: c =. 10 comb 200 |out of memory: comb | z=.k ,.&.>,&.>/\.>:&.>z I have a 32-bit machine, so perhaps a 64 bit machine could deal with this array of all 10-partitions of m. If I could get all the 10 combinations in an array, it would be simple to see which ones sum up to 5273 or 5274. For that matter, I could find all of the partition sums within a few mass units of bm. Roger Hui has provided an interesting Jwiki article on the efficient computation of partition sums, at: http://www.jsoftware.com/jwiki/Essays/Combination%20Sums I haven't gotten my head around all of Roger's math as yet, but his approach looks promising as an efficient way to find all the sums of all the partitions of a certain size. However, in the general case, any brute-force, monolithic array approach will always eventually run out of memory as the number of masses and buckets grow. The brute-force monolithic array approach is probably not an option for generality's sake. An iterative approach seems more appropriate here. I would at least like the general solution to be time-constrained, rather than space constrained. I seem to have more of the former than the latter, at least as far as this problem is concerned. In any case, this initial plan only gets us the 10-partitions of m. There is no guarantee that m will partition out with 10 masses in each bucket. We will likely need to look at a range of partition sizes to solve our problem. For now, we can still set about finding more limits to the problem, in order to continue to reduce the computations needed. Given any specific m, we should be able to discover the minimum and maximum partition sizes that would sum to any specific integer. By sorting m and looking at running sums, it seems that we should be able to find the smallest number of m's masses that will sum to (or near) a specific integer, as well as the largest number of m's masses that will sum to that same integer. This will give us boundaries on the smallest and largest partitions we need to examine for our solution. Thinking about the sorted m, I sense a glimmer of an idea about using sorted m to pick out partitions that sum close to bm. I will have to think about this more. I have some paying work to get to now, so I will need to postpone working on the partition-size limits computation and the sorted-m issue until tomorrow. This is definitely giving me a workout in polishing my J skills, though. Skip ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
