On Thu, Feb 11, 2010 at 1:59 PM, Matthew Matson <gtx...@hotmail.com> wrote: > > Hi Tutors, > > I am looking for the proper approach regarding the analysis of a dictionary > of combinations I have. > > What I need to do is read from a supplied text file that has a unique ID and > that unique ID's associated combination of elements. So let's say I have the > following lines in a text file (real file could be millions of lines): > > "ID" "Elements" > 1 'A, B, C, D' > 2 'A, D' > 3 'D, E' > 4 'A, D' > 5 'A, B' > 6 'A, C, D' > > and I do something like... > > combinationDict = {} > for line in file: > data = line.split('\t') > comb = tuple(data[1].split(',')) > if comb not in combinationDict: > combinationDict[comb] = 1 > else: > combination[comb] +=1
Use combinationDict = collections.defaultdict(int) Then you can just write combination[comb] += 1 without the test for comb in combinattionDict. > Now after I read all of the data I end up with a dictionary with the > combination as the key and the associated total qty as its value. > > print combinationDict > {('A','B','C','D'):1, ('A','D'):2, ('D','E'):1, ('A','B'):1, ('A', 'C', > 'D'):1} > > What I am looking for is a starting point for a solution in python to > analyze the combination list so that I can determine for example that ('A', > 'D') is the most popular combination and then determining how many other > combinations in the dictionary contain this combination. maxComb = max(combinationDict, key=combinationDict.__getitem__) will give you the single key with the largest count. maxCombSet = set(maxComb) [ comb for comb in combinationDict if maxCombSet <= set(comb) ] will give you a list of all the combinations that contain the max though it could be slow if you have lots of unique combinations (because of all the set conversions). > I would like to incorporate some parameters so for example the combination > ('A','B','C','D') and ('A', 'C', 'D') contain ('A','D') so they are valid > but I could also say that as long as one element is contained in a > combination it is valid as well provided I add no more than one additional > item to the combination. Now you are starting to lose me but you can modify the conditional in the above list comprehension to make whatever kind of test you want. > If I apply this logic then ('D','E') can ('A','B') > can contain ('A', 'D') and if I apply this to the combination dictionary I > have: > > {('B','C', ('A', 'D')):1, ('A','D'):2, ('E', ('A', 'D')):1, ('B', ('A', > 'D')):1, ('C', ('A', 'D')):1} > > which I could then query the keys for ('A', 'D') inclusion to get a total of > 4 for ('A', 'D'). Now you have lost me completely. What are the keys in this new dict? How do you get a total of 4 fro ('A', 'D')? Kent > I hope this isn't too long and confusing but I am looking for an approach > where I can analyze for the highest quantity of combinations and then > iterate through the dictionary substituting those combinations that were > determined a "highest qty" combination into other low qty combinations when > valid. > > I was hoping to have parameters to qualify a high qty combination (e.g. > every combination with qty above 10,000) with the highest quantity of that > determined set taking precedence for substitution for the first pass then > moving on to the next highest combination for the second pass of > substitution etc.. The other parameter would be for the combination that > would receive a substitution whereby I might say that I can only substitute > if a substitution results in only one additional (superfluous) value being > added to the combination existing low qty combination. > > I have looked around and this sounds like it might be similar to a packing > problem and in particular the knapsack problem but I can't seem to wrap my > head around an approach for this in python. I am not looking for a solution > just some guidance on a starting point or perhaps libraries that may be > helpful. > > Thank you. > > > ________________________________ > Windows® phone-your Windows stuff, on the go. See more. > _______________________________________________ > Tutor maillist - tu...@python.org > To unsubscribe or change subscription options: > http://mail.python.org/mailman/listinfo/tutor > > _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor