On Jul 18, 4:49 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> "larry.mart...@gmail.com" <larry.mart...@gmail.com> writes:
> > I have an interesting problem I'm trying to solve. I have a solution
> > almost working, but it's super ugly, and know there has to be a
> > better, cleaner way to do it. ...
> >
> > My solution involves multiple maps and multiple iterations through the
> > data. How would you folks do this?
>
> You could post your code and ask for suggestions how to improve it.
> There are a lot of not-so-natural constraints in that problem, so it
> stands to reason that the code will be a bit messy. The whole
> specification seems like an antipattern though. You should just give a
> sensible encoding for the filename regardless of whether other fields
> are duplicated or not. You also don't seem to address the case where
> basename, dir4, and dir5 are all duplicated.
>
> The approach I'd take for the spec as you wrote it is:
>
> 1. Sort the list on the (basename, dir4, dir5) triple, saving original
> location (numeric index) of each item
> 2. Use itertools.groupby to group together duplicate basenames.
> 3. Within the groups, use groupby again to gather duplicate dir4's,
> 4. Within -those- groups, group by dir5 and assign sequence numbers in
> groups where there's more than one file
> 5. Unsort to get the rewritten items back into the original order.
>
> Actual code is left as an exercise.
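The five steps in the quoted reply can be sketched roughly as follows. This is a minimal sketch, assuming each item is a (basename, dir4, dir5) tuple of strings. The function name rename_all and the naming scheme (prefix dir4, then dir5, then a sequence number, applied only when a level is still ambiguous) are hypothetical stand-ins, since the full spec is elided above. "Unsorting" is done by writing each new name back at the item's original index.

```python
from itertools import groupby


def rename_all(items):
    """Sort / groupby / unsort sketch over (basename, dir4, dir5) tuples.

    The output naming scheme is hypothetical; swap in the real rules.
    """
    # 1. Sort indices on the (basename, dir4, dir5) triple so each item's
    #    original position is remembered.
    order = sorted(range(len(items)), key=lambda i: items[i])
    new_names = [None] * len(items)

    # 2. Group duplicate basenames (contiguous after the sort).
    for base, base_grp in groupby(order, key=lambda i: items[i][0]):
        base_grp = list(base_grp)  # materialize so len() works
        if len(base_grp) == 1:
            new_names[base_grp[0]] = base  # unique basename: keep as-is
            continue
        # 3. Within that group, gather duplicate dir4s.
        for dir4, dir4_grp in groupby(base_grp, key=lambda i: items[i][1]):
            dir4_grp = list(dir4_grp)
            if len(dir4_grp) == 1:
                new_names[dir4_grp[0]] = f"{dir4}_{base}"
                continue
            # 4. Group by dir5; add sequence numbers only where more than
            #    one file shares basename, dir4 and dir5.
            for dir5, dir5_grp in groupby(dir4_grp, key=lambda i: items[i][2]):
                dir5_grp = list(dir5_grp)
                if len(dir5_grp) == 1:
                    new_names[dir5_grp[0]] = f"{dir4}_{dir5}_{base}"
                else:
                    for seq, i in enumerate(dir5_grp, 1):
                        new_names[i] = f"{dir4}_{dir5}_{seq}_{base}"

    # 5. new_names is indexed by original position, so it is already "unsorted".
    return new_names


# Example:
# rename_all([("a.txt", "t1", "d1"), ("b.txt", "t1", "d1"), ("a.txt", "t2", "d1")])
# returns ['t1_a.txt', 'b.txt', 't2_a.txt']
```

Because sorting is stable and groupby only merges adjacent equal keys, each nested groupby sees exactly the rows sharing the outer keys; the sequence-number branch also covers the case where basename, dir4 and dir5 are all duplicated.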
Thanks very much for the reply, Paul. I did not know about itertools, and it looks like it will be perfect for me. But I'm having one issue: how do I know how many of a given basename (and similarly, how many basename/dir4 pairs) there are? I don't know that I have to modify a file until I've passed it, so I have to do all kinds of contortions to save the previous one and deal with the last one after I fall out of the loop, and it's getting very nasty.

reports_list is the list sorted on basename, dir4, dir5 (tool is dir4, file_date is dir5):

    from itertools import groupby

    for file, file_group in groupby(reports_list, lambda x: x[0]):
        # if file is unique in file_group do nothing, but how can I tell
        # if file is unique?
        for tool, tool_group in groupby(file_group, lambda x: x[1]):
            # if tool is unique for file, change file to tool_file, but how
            # can I tell if tool is unique for file?
            for file_date, file_date_group in groupby(tool_group, lambda x: x[2]):
                ...  # per-date handling not shown

You can't call len() on the iterator that groupby returns for each group, and I've tried to do something with imap or defaultdict, but I'm not getting anywhere. I guess I can just make two passes through the data, getting counts on the first pass. Or am I missing something about how groupby works?

Thanks!
-larry
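Two ways around the len() problem are sketched below, assuming reports_list holds (file, tool, file_date) tuples already sorted on all three fields. The itertools documentation notes that groupby's groups share the underlying iterator, so a group that is needed later (or whose length is needed) should be stored as a list. The alternative is the two-pass idea from the post, using collections.Counter for the first pass. The helper names nested_uniqueness and first_pass_counts are hypothetical.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def nested_uniqueness(rows):
    """Yield (file, tool, file_is_unique, tool_is_unique_for_file).

    rows: (file, tool, file_date) tuples sorted on all three fields.
    Each group is turned into a list before use, so len() works and the
    rows stay available after the outer groupby moves on.
    """
    for file, file_group in groupby(rows, key=itemgetter(0)):
        file_rows = list(file_group)
        file_is_unique = len(file_rows) == 1
        for tool, tool_group in groupby(file_rows, key=itemgetter(1)):
            tool_rows = list(tool_group)
            yield file, tool, file_is_unique, len(tool_rows) == 1


def first_pass_counts(rows):
    """Two-pass alternative: count files and (file, tool) pairs up front."""
    file_counts = Counter(r[0] for r in rows)
    file_tool_counts = Counter((r[0], r[1]) for r in rows)
    return file_counts, file_tool_counts


# Hypothetical sample data:
# rows = [("a.log", "toolA", "2011-07-01"),
#         ("a.log", "toolB", "2011-07-01"),
#         ("b.log", "toolA", "2011-07-02")]
# list(nested_uniqueness(rows)) gives
# [('a.log', 'toolA', False, True), ('a.log', 'toolB', False, True),
#  ('b.log', 'toolA', True, True)]
```

Materializing each group costs only as much memory as the largest group, and it removes the need to remember the "previous" item or handle the last one after the loop ends.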