[R] The 'subset matching' challenge

Yvonnick Noël Thu, 29 Oct 2009 07:48:59 -0700

Dear all,

The following problem just has been submitted to me by an accountant.

In his new job, he has to close some old accounts. He has yearlyamounts, and a list of products that have been bought over the years, atcertain prices for which he has an exhaustive record. The problem is: Hedoes not know what product was bought this or that year (don't ask). Hedoes not want to find back the real story, but just write realisticaccounts, for which the sum of a subset of product prices will give theexact yearly amount.


Here is a real example from his data:

# A list of 64 product prices

products =c(30500,30500,30500,30500,42000,42000,42000,42000,42000,42000,42000,42000,42000,42000,71040,90900,76950,35100,71190,

53730,456000,70740,70740,533600,83800,59500,27465,28000,28000,28000,28000,28000,26140,49600,77000,123289,27000,27000,27000,
27000,27000,27000,80000,33000,33000,55000,77382,48048,51186,40000,35000,21716,63051,15025,15025,15025,15025,800000,1110000,
59700,25908,829350,1198000,1031655)

# Global amount
amount = 4748652

Now he wants to find all subsets of the 'product' vector which sums to'amount'.


I wrote the following code, which is clearly not optimal:

# Create a matrix of subsets of size r among the integer set 1:n
subsets <- function(n, r, v = 1:n) {
 if(r <= 0) vector(mode(v), 0)
 else if(r >= n) v[1:n]
 else rbind(cbind(v[1], Recall(n-1, r-1, v[-1])),Recall(n-1, r, v[-1]))
}

# Main function
find.amount = function(amount,products) {

 if(sum(products)<amount) {
   cat("There is no solution.\n")
   return()
 }

 l = length(products)
 cat("\nThere are",l,"product prices\n\n")
 names(products) = paste("Product",1:l,sep="")
 products = sort(products)

 for(i in 2:l) {

   # If the sum of the i smallest prices is greater than amount, then stop
   if(sum(products[1:i])>amount) break

# Look for matching subsets only in the case when the sum of ilargest prices is greater than amount

   if(sum(rev(products)[1:i])>=amount) {
     # Generates all subsets of i indicies in 1:l
     subs = subsets(l,i)
     nl = nrow(subs)
     nc = ncol(subs)

     # Compute sums of corresponding price subsets
     sums = rowSums(matrix(products[subs],nl,nc))

     # Which ones match the global amount ?
     w = which(sums == amount)
     how.many = length(w)
     if(how.many>0) {
       cat("\n-->> There are",how.many,"solutions with",nc,"products :\n")
       for(j in 1:how.many) {
         print(products[subs[w[j],]])
       }
     }
     else cat("\n-->> There is no solution with",nc,"products.\n")
   }
   else cat("\n-->> There is no solution with",i,"products.\n")
 }
}


Then I can use these functions on a smaller example:

> find.amount(4,c(1,1,1,1,2,2))

and a number of matching subsets are provided. The problem is: Thisapproach creates a whole matrix of subsets of r integers among 1:n,which rapidly gives huge matrices, and this is clearly not optimal forthe real data provided above.

Would anyone have a suggestion as to an alternative and more efficientstrategy?


Good luck,

Yvonnick Noel
University of Brittany, Rennes 2
France

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] The 'subset matching' challenge

Reply via email to