On Jul 18, 4:49 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> "larry.mart...@gmail.com" <larry.mart...@gmail.com> writes:
> > I have an interesting problem I'm trying to solve. I have a solution
> > almost working, but it's super ugly, and know there has to be a
> > better, cleaner way to do it. ...
> >
> > My solution involves multiple maps and multiple iterations through the
> > data. How would you folks do this?
>
> You could post your code and ask for suggestions how to improve it.
> There are a lot of not-so-natural constraints in that problem, so it
> stands to reason that the code will be a bit messy. The whole
> specification seems like an antipattern though. You should just give a
> sensible encoding for the filename regardless of whether other fields
> are duplicated or not. You also don't seem to address the case where
> basename, dir4, and dir5 are all duplicated.
>
> The approach I'd take for the spec as you wrote it is:
>
> 1. Sort the list on the (basename, dir4, dir5) triple, saving original
>    location (numeric index) of each item
> 2. Use itertools.groupby to group together duplicate basenames.
> 3. Within the groups, use groupby again to gather duplicate dir4's,
> 4. Within -those- groups, group by dir5 and assign sequence numbers in
>    groups where there's more than one file
> 5. Unsort to get the rewritten items back into the original order.
>
> Actual code is left as an exercise.
I replied to this before, but I don't see it, so apologies if this is a duplicate. Thanks for the reply, Paul. I had not heard of itertools; it sounds like just what I need for this. But I am having one issue: how do you know how many items are in each group? Without knowing that, I have to either make two passes through the data or work on the previous item (once I'm in an iteration after the first, I know I have dups), and that very quickly gets messy with all the previous values I have to carry around.
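Since Paul left the code as an exercise, here's a rough sketch of his five steps that also answers the counting question: groupby() hands back lazy sub-iterators, so the trick is to materialize each group with list(), after which len() gives you the group size in a single pass, with no previous-item bookkeeping. The (dir4, dir5, basename) record layout and the base_dir4_dir5_seq renaming scheme below are just guesses, since the full spec isn't quoted here.

import itertools

# Hypothetical sample data: (dir4, dir5, basename) tuples standing in for
# the real records (the actual fields are assumptions).
records = [
    ("d4a", "d5a", "file1"),
    ("d4a", "d5b", "file2"),
    ("d4a", "d5a", "file2"),
    ("d4b", "d5a", "file1"),
]

# 1. Sort on (basename, dir4, dir5), remembering each record's original index.
indexed = sorted(enumerate(records),
                 key=lambda p: (p[1][2], p[1][0], p[1][1]))

renamed = [None] * len(records)

# 2. Group duplicate basenames.  groupby() yields lazy sub-iterators, so
#    materialize each group with list() -- len() then tells you the size.
for base, grp in itertools.groupby(indexed, key=lambda p: p[1][2]):
    by_base = list(grp)
    if len(by_base) == 1:                 # unique basename: leave it alone
        idx, (d4, d5, b) = by_base[0]
        renamed[idx] = b
        continue
    # 3. Within duplicate basenames, gather duplicate dir4's.
    for d4, g4 in itertools.groupby(by_base, key=lambda p: p[1][0]):
        by_d4 = list(g4)
        # 4. Within those, group by dir5; number files that still collide.
        for d5, g5 in itertools.groupby(by_d4, key=lambda p: p[1][1]):
            by_d5 = list(g5)
            for seq, (idx, (r4, r5, b)) in enumerate(by_d5, start=1):
                suffix = "_%d" % seq if len(by_d5) > 1 else ""
                renamed[idx] = "%s_%s_%s%s" % (b, r4, r5, suffix)

# 5. "Unsort" is free: renamed[idx] was filled in by original position.
for rec, name in zip(records, renamed):
    print(rec, "->", name)

One caveat: once you advance groupby() to the next group, the previous sub-iterator is invalidated, so always work from the list() copy rather than the sub-iterator itself.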