Hi Tutors, 

I am looking for the proper approach regarding the analysis of a dictionary of 
combinations I have.

What I need to do is read from a supplied text file that has a unique ID and 
that unique ID's associated combination of elements. So let's say I have the 
following lines in a text file (real file could be millions of lines):

"ID"    "Elements"
1    'A, B, C, D'
2    'A, D'
3    'D, E'
4    'A, D'
5    'A, B'
6    'A, C, D'

and I do something like...

combinationDict = {}
for line in file:
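    data = line.strip().split('\t')
    # drop the surrounding quotes and the space after each comma
    comb = tuple(item.strip() for item in data[1].strip("'").split(','))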
    if comb not in combinationDict:
        combinationDict[comb] = 1
    else:
        combinationDict[comb] += 1

Now after I read all of the data I end up with a dictionary with the 
combination as the key and the associated total qty as its value.

print combinationDict
{('A','B','C','D'):1, ('A','D'):2, ('D','E'):1, ('A','B'):1, ('A', 'C', 'D'):1}

What I am looking for is a starting point for a solution in python to analyze 
the combination list so that I can determine, for example, that ('A', 'D') is 
the most popular combination and then determine how many other combinations in 
the dictionary contain this combination.
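
As a possible starting point for that first step, here is a minimal sketch. It assumes the counts are rebuilt with frozenset keys instead of tuples (an assumption, not something the loop above does) so that element order is ignored and the subset test is a single comparison. The names counts, top and containing are only illustrative.

```python
from collections import Counter

# Sample quantities copied from the example dictionary above.
# frozenset keys are an assumption made here so that order is ignored.
counts = Counter({
    frozenset(['A', 'B', 'C', 'D']): 1,
    frozenset(['A', 'D']): 2,
    frozenset(['D', 'E']): 1,
    frozenset(['A', 'B']): 1,
    frozenset(['A', 'C', 'D']): 1,
})

# most_common(1) returns a one-item list holding the (key, qty) pair
# with the highest quantity.
top, top_qty = counts.most_common(1)[0]

# Other combinations that contain every element of the top combination
# ("<=" on sets is the subset test).
containing = [c for c in counts if c != top and top <= c]

print(sorted(top), top_qty)
print(sorted(sorted(c) for c in containing))
```

For the sample data this reports ('A', 'D') with a quantity of 2, contained in two other combinations.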

I would like to incorporate some parameters. For example, the combinations 
('A','B','C','D') and ('A', 'C', 'D') contain ('A','D'), so they are valid. But I 
could also say that a combination is valid as long as it contains at least one 
element of ('A', 'D'), provided the substitution adds no more than one 
additional item to it. If I apply this logic then ('D','E') and ('A','B') can 
also contain ('A', 'D'), and if I apply this to the combination dictionary I have:

{('B','C', ('A', 'D')):1, ('A','D'):2, ('E', ('A', 'D')):1,
 ('B', ('A', 'D')):1, ('C', ('A', 'D')):1}

which I could then query the keys for ('A', 'D') inclusion to get a total of 4 
for ('A', 'D').
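
The relaxed rule could be sketched roughly as follows. This is only one reading of it: a combination qualifies when it shares at least one element with the target and the substitution would add at most max_extra new elements. The helper names can_absorb and substitute, and the max_extra parameter, are hypothetical and introduced only for illustration.

```python
def can_absorb(combo, target, max_extra=1):
    """Hypothetical helper: may `target` be substituted into `combo`?"""
    combo_set, target_set = set(combo), set(target)
    shared = combo_set & target_set   # elements both have in common
    added = target_set - combo_set    # elements the substitution would add
    return bool(shared) and len(added) <= max_extra


def substitute(combo, target):
    """Replace the shared elements of `combo` with `target` as one nested item."""
    leftover = tuple(sorted(set(combo) - set(target)))
    return leftover + (tuple(target),)


target = ('A', 'D')
combos = {('A', 'B', 'C', 'D'): 1, ('A', 'D'): 2, ('D', 'E'): 1,
          ('A', 'B'): 1, ('A', 'C', 'D'): 1}

result = {}
for combo, qty in combos.items():
    if combo != target and can_absorb(combo, target):
        key = substitute(combo, target)
    else:
        key = combo
    # Add up quantities in case two combinations collapse to the same key.
    result[key] = result.get(key, 0) + qty

# Keys that now carry the target as a nested item.
hits = [key for key in result if target in key]
print(len(hits))
```

With the sample data this rebuilds the dictionary shown above and finds 4 keys that include ('A', 'D').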

I hope this isn't too long and confusing but I am looking for an approach where 
I can analyze for the highest quantity of combinations and then iterate through 
the dictionary substituting those combinations that were determined a "highest 
qty" combination into other low qty combinations when valid.

I was hoping to have parameters to qualify a high qty combination (e.g. every 
combination with qty above 10,000). The highest quantity combination of that 
set would take precedence for substitution in the first pass, then the next 
highest would be used for the second pass of substitution, and so on. The other 
parameter would apply to the combination receiving a substitution: I might say 
that I can only substitute if doing so adds only one additional (superfluous) 
value to the existing low qty combination.
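
One way the passes might be organized is a simple greedy loop, sketched below. It is not a knapsack solver and it rests on several assumptions: the function name substitution_passes and the parameters min_qty and max_extra are hypothetical, each low qty combination receives at most one substitution, and ties between equal quantities are broken arbitrarily.

```python
def substitution_passes(counts, min_qty, max_extra=1):
    """Greedy sketch: map each low qty combination to the high qty
    combination substituted into it (hypothetical helper)."""
    # Work on frozensets so element order does not matter.
    sets = {}
    for combo, qty in counts.items():
        key = frozenset(combo)
        sets[key] = sets.get(key, 0) + qty

    # Qualifying combinations, highest quantity first: one pass per entry.
    high = sorted((c for c in sets if sets[c] >= min_qty),
                  key=lambda c: sets[c], reverse=True)
    # Combinations still waiting for a substitution.
    pending = set(c for c in sets if sets[c] < min_qty)

    assigned = {}
    for target in high:
        for combo in list(pending):
            shares_element = bool(combo & target)
            extra_added = len(target - combo)
            if shares_element and extra_added <= max_extra:
                assigned[combo] = target
                pending.discard(combo)  # assumption: one substitution each
    return assigned


sample = {('A', 'B', 'C', 'D'): 1, ('A', 'D'): 2, ('D', 'E'): 1,
          ('A', 'B'): 1, ('A', 'C', 'D'): 1}
# min_qty=2 stands in for the 10,000 threshold on this tiny sample.
assigned = substitution_passes(sample, min_qty=2)
for combo, target in assigned.items():
    print(sorted(combo), '<-', sorted(target))
```

On the sample data every quantity-1 combination ends up assigned to ('A', 'D'). For millions of lines the inner loop over pending combinations would probably need an index from element to combinations, which this sketch leaves out.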

I have looked around and this sounds like it might be similar to a packing 
problem and in particular the knapsack problem but I can't seem to wrap my head 
around an approach for this in python. I am not looking for a solution just 
some guidance on a starting point or perhaps libraries that may be helpful.

Thank you.

                                          
_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
