RE: Namedtuples: some unexpected inconveniences
Peter Otten wrote, on Saturday, April 15, 2017 12:44 AM
>
> Deborah Swanson wrote:
>
> > I know it's your "ugly" answer, but can I ask what the '**' in
> >
> > fix = {label: max(values, key=len)}
> > group[:] = [record._replace(**fix) for record in group]
> >
> > means?
>
> d = {"a": 1, "b": 2}
> f(**d)
>
> is equivalent to
>
> f(a=1, b=2)

This is perfect Peter, thank you very much. Now I can understand why your
"ugly" fix works, instead of just seeing in the debugger that mysteriously
it does somehow work.

> so ** is a means to call a function with keyword arguments when you want to
> decide about the *names* at runtime. Example:
>
> >>> def f(a=1, b=2):
> ...     print("a =", a)
> ...     print("b =", b)
> ...     print()
> ...
> >>> for d in [{"a": 10}, {"b": 42}, {"a": 100, "b": 200}]:
> ...     f(**d)
> ...
> a = 10
> b = 2
>
> a = 1
> b = 42
>
> a = 100
> b = 200

Looks like a very handy type of "kwarg" to know about.

It would be nice if the doc writers weren't so mysterious about what can
be used for "kwargs", or they explained somewhere what the possible
"kwargs" can be. Even in the one index entry for "kwargs", all they say
about what it is, is "A dict of keyword arguments values", and that only
applies to the Signature object.

I've seen "kwargs" in articles other than for namedtuples, always
mysteriously, with no details on what the possibilities are and how to
use them.

(Makes me wonder where you learned all this ... ;)

> Starting from a namedtuple `record`
>
> record._replace(Location="elsewhere")
>
> creates a new namedtuple with the Location attribute changed to "elsewhere",
> and the slice [:] on the left causes all items in the `group` list to be
> replaced with new namedtuples,
>
> group[:] = [record._replace(Location="elsewhere") for record in group]
>
> is basically the same as
>
> tmp = group.copy()
> group.clear()
> for record in tmp:
>     group.append(record._replace(Location="elsewhere"))
>
> To support not just Location, but also Kind and Notes we need
> the double asterisk.

I saw this in the debugger, but again didn't really understand why it was
working. So this really clears it up, and I plan to look at it in the
debugger again to be sure I understand it all.

Thanks very much.

--
https://mail.python.org/mailman/listinfo/python-list
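On the question of what "kwargs" can be, here is a minimal sketch using only the standard language features. The function name `describe` and the dictionary contents are made up for illustration. A parameter written `**kwargs` in a definition collects whatever keyword arguments the caller passes into an ordinary dict, and `**` at the call site does the reverse.

```python
def describe(**kwargs):
    # kwargs is a plain dict holding the keyword arguments that were passed
    return sorted(kwargs.items())

settings = {"Location": "Longview", "Kind": "house"}

# Unpacking the dict is the same as spelling the keywords out by hand.
unpacked = describe(**settings)
spelled_out = describe(Location="Longview", Kind="house")
print(unpacked == spelled_out)
```

So there is no fixed list of "possible kwargs": the accepted names depend on the function being called, and for `_replace` they are the namedtuple's field names.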
RE: Namedtuples: some unexpected inconveniences
Deborah Swanson wrote:

> I know it's your "ugly" answer, but can I ask what the '**' in
>
> fix = {label: max(values, key=len)}
> group[:] = [record._replace(**fix) for record in group]
>
> means?

d = {"a": 1, "b": 2}
f(**d)

is equivalent to

f(a=1, b=2)

so ** is a means to call a function with keyword arguments when you want to
decide about the *names* at runtime. Example:

>>> def f(a=1, b=2):
...     print("a =", a)
...     print("b =", b)
...     print()
...
>>> for d in [{"a": 10}, {"b": 42}, {"a": 100, "b": 200}]:
...     f(**d)
...
a = 10
b = 2

a = 1
b = 42

a = 100
b = 200

Starting from a namedtuple `record`

record._replace(Location="elsewhere")

creates a new namedtuple with the Location attribute changed to "elsewhere",
and the slice [:] on the left causes all items in the `group` list to be
replaced with new namedtuples,

group[:] = [record._replace(Location="elsewhere") for record in group]

is basically the same as

tmp = group.copy()
group.clear()
for record in tmp:
    group.append(record._replace(Location="elsewhere"))

To support not just Location, but also Kind and Notes we need the double
asterisk.
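The two ideas above (a field name chosen at runtime, and slice assignment that updates the list in place) can be tried together in a small sketch. The field names and sample values below are hypothetical, and `alias` exists only to show that the same list object is updated.

```python
from collections import namedtuple

Record = namedtuple("Record", "title Location Kind")  # hypothetical fields

group = [
    Record("A", "Mount Vernon", "trailer"),
    Record("A", "", "trailer"),
]
alias = group  # a second reference to the same list object

label = "Location"             # field name decided at runtime
fix = {label: "Mount Vernon"}

# Slice assignment replaces the contents in place, so `alias` sees it too.
group[:] = [record._replace(**fix) for record in group]

print(alias[1].Location)
print(alias is group)
```

Writing `group = [...]` without the `[:]` would instead bind the name `group` to a brand-new list and leave the list stored in `groups` untouched.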
RE: Namedtuples: some unexpected inconveniences
Roel Schroeven wrote, on Thursday, April 13, 2017 5:26 PM
>
> Gregory Ewing wrote on 13/04/2017 9:34:
> > Deborah Swanson wrote:
> >> Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
> >>
> >>> Personally I would immediately discard the header row once and for
> >>> all, not again and again on every operation.
> >> Well, perhaps, but I need the header row to stay in place to write
> >> the list to a csv when I'm done
> >
> > That's no problem, just write the header row separately.
> >
> > Do this at the beginning:
> >
> >    header = [Record._make(fieldnames)]
> >    records = [Record._make(row) for row in rows]
> >
> > and then to write out the file:
> >
> >    writer = csv.writer(outputfile)
> >    writer.writerow(header)
> >    writer.writerows(records)
>
> I don't even think there's any need to store the field names anywhere
> else than in fieldnames. So unless I'm missing something, just do this
> at the beginning:
>
>     fieldnames = next(rows)
>     Record = namedtuple("Record", fieldnames)
>     records = [Record._make(row) for row in rows]
>
> and this at the end:
>
>     writer = csv.writer(outputfile)
>     writer.writerow(fieldnames)  # or writer.writerow(Record._fields)
>     writer.writerows(records)
>
>
> --
> The saddest aspect of life right now is that science gathers
> knowledge faster than society gathers wisdom.
>   -- Isaac Asimov
>
> Roel Schroeven

This is essentially what Peter Otten most recently recommended. I know
you got there first, but it is better to only get the header row to name
the fields as you and Greg Ewing suggested, and then use just the records
in processing the field data, using the field names only for the output.

Thanks,
Deborah
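The read-header-once, write-header-separately pattern from the quoted message can be checked end to end with in-memory files. This is only a sketch: `io.StringIO` stands in for the real input and output files, the three-column sample data is invented, and `lineterminator="\n"` is chosen here just so the output can be compared with the input.

```python
import csv
import io
from collections import namedtuple

source = io.StringIO(
    "title,Location,Kind\n"
    "A,Mount Vernon,trailer\n"
    "B,Longview,house\n"
)

rows = csv.reader(source)
fieldnames = next(rows)                    # consume the header exactly once
Record = namedtuple("Record", fieldnames)
records = [Record._make(row) for row in rows]   # data rows only

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(Record._fields)            # header written separately
writer.writerows(records)

print(out.getvalue() == source.getvalue())
```

Because `records` contains only data rows, sorting or grouping it never has to skip over a header element.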
Re: Namedtuples: some unexpected inconveniences
Peter Otten wrote:
> PS: Personally I would probably take the opposite direction and use dicts
> throughout...

Yes, my suggestion to use namedtuples in the first place was based on
the assumption that you would mostly be referring to fields using
fixed names. If that's not true, then using namedtuples (or a
mutable equivalent) might just be making things harder.

If the names are sometimes fixed and sometimes not, then you
have a tradeoff to make. In the code you posted most recently,
the only fixed field reference seems to be row.title, and that
only appears once. So as long as that's all you want to do with
the rows, storing them as dicts would appear to be a clear
winner.

--
Greg
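For comparison, here is a rough sketch of the "dicts throughout" direction, assuming the standard `csv.DictReader`. The sample data is invented and `io.StringIO` stands in for the real file. With dict rows, a field name held in a variable is just a key, and rows can be updated in place.

```python
import csv
import io

source = io.StringIO(
    "title,Location,Kind\n"
    "A,Mount Vernon,trailer\n"
    "A,,trailer\n"
)

# Each row becomes a dict keyed by the header names.
rows = [dict(row) for row in csv.DictReader(source)]

label = "Location"                      # field chosen at runtime
values = {row[label] for row in rows}   # row[label] works because rows are dicts
values.discard("")
if len(values) == 1:
    value = values.pop()
    for row in rows:
        row[label] = value              # dicts are mutable, no _replace needed

print([row["Location"] for row in rows])
```

The fixed-name access `row.title` becomes `row["title"]`, which is the price of this approach.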
RE: Namedtuples: some unexpected inconveniences
Gregory Ewing wrote, on Thursday, April 13, 2017 12:17 AM
>
> Deborah Swanson wrote:
> > But I think you got it right in your last sentence below. defaultdict
> > copied them because they were immutable,
>
> No, definitely not. A defaultdict will never take it upon
> itself to copy an object you give it, either as a key or a value.
>
> The copying, if any, must have occurred somewhere else, in
> code that you didn't show us.
>
> Can you show us the actual code you used to attempt to
> update the namedtuples?
>
> --
> Greg

I think you've heard my sob story of how the actual code was lost
(PyCharm ate it).

I've made some attempts to recover that code, but honestly, at this point
Peter Otten has shown me enough examples of getattr() that work with
namedtuples with variable names, that I'd rather just accept that
probably some transformation of the structure I did caused the copying of
values only. I remember looking at it in the debugger, but that code was
convoluted and I don't think it's worth teasing out exactly what went
wrong. (Or figuring out how I did it in the first place.)

Deborah
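The point that a defaultdict stores references rather than copies is easy to confirm. The sketch below uses invented field names, and the second half shows one plausible way to end up with stale values. It is a guess for illustration, not a reconstruction of the lost code.

```python
from collections import defaultdict, namedtuple

Record = namedtuple("Record", "title Location")   # hypothetical fields
record = Record("A", "Mount Vernon")

groups = defaultdict(list)
groups[record.title].append(record)

# The list holds a reference to the very same object, not a copy.
print(groups["A"][0] is record)

# _replace returns a NEW namedtuple; nothing stored in groups changes
# unless the new object is put back into the list.
updated = record._replace(Location="Longview")
print(groups["A"][0].Location)
```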
RE: Namedtuples: some unexpected inconveniences
MRAB wrote, on Friday, April 14, 2017 2:19 PM
>
> In the line:
>
>      values = {row[label] for row in group}
>
> 'group' is a list of records; row is a record (namedtuple).
>
> You can get the members of a namedtuple (also 'normal' tuple) by numeric
> index, e.g. row[0], but the point of a namedtuple is that you can get
> them by name, as an attribute, e.g. row.Location.
>
> As the name of the attribute isn't fixed, but passed by name, use
> getattr(row, label) instead:
>
>      values = {getattr(row, label) for row in group}
>
> As for the values:
>
>      # Remove the missing value, if present.
>      values.discard('')
>
>      # There's only 1 value left, so fill in the empty places.
>      if len(values) == 1:
>          ...

Thanks for this, but honestly, I'm namedtupled-out at the moment and I
have several other projects I'd like to be working on. But I saved your
suggestion with ones that others have made, so I'll revisit yours again
when I come back for another look at namedtuples.

> The next point is that namedtuples, like normal tuples, are immutable.
> You can't change the value of an attribute.

No you can't, but you can use somenamedtuple._replace(kwargs) to replace
the value. Works just as well.
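Putting the quoted suggestion together with `_replace`, a version of `complete()` might look like the sketch below. The record fields and sample data are hypothetical, and this is one possible reading of the `discard('')` approach rather than code anyone in the thread posted in full.

```python
from collections import namedtuple

Record = namedtuple("Record", "title Location Kind")  # hypothetical fields

def complete(group, label):
    """Fill empty `label` fields in place if the group agrees on one value."""
    values = {getattr(row, label) for row in group}
    values.discard("")            # remove the missing value, if present
    if len(values) != 1:
        return False              # no value at all, or conflicting values
    fix = {label: values.pop()}
    group[:] = [row._replace(**fix) for row in group]
    return True

group = [Record("A", "Mount Vernon", "trailer"), Record("A", "", "")]
print(complete(group, "Location"), complete(group, "Kind"))
print(group[1])
```

Compared with the `has_empty` arithmetic, discarding the empty string first leaves a single test, `len(values) != 1`, to cover both the "no value" and the "multiple values" cases.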
RE: Namedtuples: some unexpected inconveniences
fix = {label: max(values, key=len)} group[:] = [record._replace(**fix) for record in group] Peter Otten wrote, on Friday, April 14, 2017 2:16 PM > > def complete(group, label): > > values = {row[label] for row in group} > > # get "TypeError: tuple indices must be integers, not str" > > Yes, the function expects row to be dict-like. However when > you change > > row[label] > > to > > getattr(row, label) > > this part of the code will work... > > > has_empty = not min(values, key=len) > > if len(values) - has_empty != 1: > > # no value or multiple values; manual intervention needed > > return False > > elif has_empty: > > for row in group: > > row[label] = max(values, key=len) > > but here you'll get an error. I made the experiment to change > everything > necessary to make it work with namedtuples, but you'll > probably find the > result a bit hard to follow: > > import csv > from collections import namedtuple, defaultdict > > INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 > in - test.csv" OUTFILE = "tmp.csv" > > def get_title(row): > return row.title > > def complete(group, label): > values = {getattr(row, label) for row in group} > has_empty = not min(values, key=len) > if len(values) - has_empty != 1: > # no value or multiple values; manual intervention needed > return False > elif has_empty: > # replace namedtuples in the group. 
Yes, it's ugly > fix = {label: max(values, key=len)} > group[:] = [record._replace(**fix) for record in group] > return True > > with open(INFILE) as infile: > rows = csv.reader(infile) > fieldnames = next(rows) > Record = namedtuple("Record", fieldnames) > groups = defaultdict(list) > for row in rows: > record = Record._make(row) > groups[get_title(record)].append(record) > > LABELS = ['Location', 'Kind', 'Notes'] > > # add missing values > for group in groups.values(): > for label in LABELS: > complete(group, label) > > # dump data (as a demo that you do not need the list of all > records) with open(OUTFILE, "w") as outfile: > writer = csv.writer(outfile) > writer.writerow(fieldnames) > writer.writerows( > record for group in groups.values() for record in group > ) > > One alternative is to keep the original and try to replace > the namedtuple > with the class suggested by Gregory Ewing. Then it should > suffice to also > change > > > elif has_empty: > > for row in group: > > row[label] = max(values, key=len) > > to > > > elif has_empty: > > for row in group: > setattr(row, label, max(values, key=len)) > > PS: Personally I would probably take the opposite direction > and use dicts > throughout... Ok, thank you. I haven't run it on a real input file yet, but this seems to work with the test file. Because the earlier incarnation defined 'values' as values = {row[label] for row in group} I'd incorrectly guessed what was going on in has_empty = not min(values, key=len). Now that values = {getattr(row, label) for row in group} works properly as you intended it to, I see you get the set of unique values for that label in that group, which makes the rest of it make sense. I know it's your "ugly" answer, but can I ask what the '**' in fix = {label: max(values, key=len)} group[:] = [record._replace(**fix) for record in group] means? 
I haven't seen it before, and I imagine it's one of the possible 'kwargs' in 'somenamedtuple._replace(kwargs)', but I have no idea where to look up the possible 'kwargs'. (probably short for keyword args) Also, I don't see how you get a set for values with the notation you used. Looks like if anything you've got a comprehension that should give you a dict. (But I haven't worked a lot with sets either.) Thanks -- https://mail.python.org/mailman/listinfo/python-list
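On the set question raised above: curly braces give a dict comprehension only when each element is written as `key: value`. With a single expression per element they give a set. A short sketch with made-up data:

```python
names = ["a", "b", "a"]

values = {name.upper() for name in names}          # set comprehension
mapping = {name: name.upper() for name in names}   # dict comprehension
empty = {}                                         # empty braces are a dict

print(type(values).__name__, type(mapping).__name__, type(empty).__name__)
```

Since a set keeps only unique elements, `{getattr(row, label) for row in group}` yields the distinct values of that field within the group.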
Re: Namedtuples: some unexpected inconveniences
On 2017-04-14 20:34, Deborah Swanson wrote: Peter, Retracing my steps to rewrite the getattr(row, label) code, this is what sent me down the rabbit hole in the first place. (I changed your 'rows' to 'records' just to use the same name everywhere, but all else is the same as you gave me.) I'd like you to look at it and see if you still think complete(group, label) should work. Perhaps seeing why it fails will clarify some of the difficulties I'm having. I ran into problems with values and has_empty. values has a problem because row[label] gets a TypeError. has_empty has a problem because a list of field values will be shorter with missing values than a full list, but a namedtuple with missing values will be the same length as a full namedtuple since missing values have '' placeholders. Two more unexpected inconveniences. In the line: values = {row[label] for row in group} 'group' is a list of records; row is a record (namedtuple). You can get the members of a namedtuple (also 'normal' tuple) by numeric index, e.g. row[0], but the point of a namedtuple is that you can get them by name, as an attribute, e.g. row.Location. As the name of the attribute isn't fixed, but passed by name, use getattr(row, label) instead: values = {getattr(row, label) for row in group} As for the values: # Remove the missing value, if present. values.discard('') # There's only 1 value left, so fill in the empty places. if len(values) == 1: ... The next point is that namedtuples, like normal tuples, are immutable. You can't change the value of an attribute. A short test csv is at the end, for you to read in and attempt to execute the following code, and I'm still working on reconstructing the lost getattr(row, label) code. 
import csv from collections import namedtuple, defaultdict def get_title(row): return row.title def complete(group, label): values = {row[label] for row in group} # get "TypeError: tuple indices must be integers, not str" has_empty = not min(values, key=len) if len(values) - has_empty != 1: # no value or multiple values; manual intervention needed return False elif has_empty: for row in group: row[label] = max(values, key=len) return True infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv") rows = csv.reader(infile) fieldnames = next(rows) Record = namedtuple("Record", fieldnames) records = [Record._make(fieldnames)] records.extend(Record._make(row) for row in rows) # group rows by title groups = defaultdict(list) for row in records: groups[get_title(row)].append(row) LABELS = ['Location', 'Kind', 'Notes'] # add missing values for group in groups.values(): for label in LABELS: complete(group, label) Moving 2017 in - test.csv: (If this doesn't come through the mail system correctly, I've also uploaded the file to http://deborahswanson.net/python/Moving%202017%20in%20-%20test.csv. Permissions should be set correctly, but let me know if you run into problems downloading the file.) [snip] -- https://mail.python.org/mailman/listinfo/python-list
RE: Namedtuples: some unexpected inconveniences
Deborah Swanson wrote:

> Peter,
>
> Retracing my steps to rewrite the getattr(row, label) code, this is what
> sent me down the rabbit hole in the first place. (I changed your 'rows'
> to 'records' just to use the same name everywhere, but all else is the
> same as you gave me.) I'd like you to look at it and see if you still
> think complete(group, label) should work. Perhaps seeing why it fails
> will clarify some of the difficulties I'm having.
>
> I ran into problems with values and has_empty. values has a problem
> because
> row[label] gets a TypeError. has_empty has a problem because a list of
> field values will be shorter with missing values than a full list, but a
> namedtuple with missing values will be the same length as a full
> namedtuple since missing values have '' placeholders. Two more
> unexpected inconveniences.
>
> A short test csv is at the end, for you to read in and attempt to
> execute the following code, and I'm still working on reconstructing the
> lost getattr(row, label) code.
>
> import csv
> from collections import namedtuple, defaultdict
>
> def get_title(row):
>     return row.title
>
> def complete(group, label):
>     values = {row[label] for row in group}
>     # get "TypeError: tuple indices must be integers, not str"

Yes, the function expects row to be dict-like. However when you change

row[label]

to

getattr(row, label)

this part of the code will work...

>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)

but here you'll get an error. I made the experiment to change everything
necessary to make it work with namedtuples, but you'll probably find the
result a bit hard to follow:

import csv
from collections import namedtuple, defaultdict

INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv"
OUTFILE = "tmp.csv"

def get_title(row):
    return row.title

def complete(group, label):
    values = {getattr(row, label) for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        # replace namedtuples in the group. Yes, it's ugly
        fix = {label: max(values, key=len)}
        group[:] = [record._replace(**fix) for record in group]
    return True

with open(INFILE) as infile:
    rows = csv.reader(infile)
    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    groups = defaultdict(list)
    for row in rows:
        record = Record._make(row)
        groups[get_title(record)].append(record)

LABELS = ['Location', 'Kind', 'Notes']

# add missing values
for group in groups.values():
    for label in LABELS:
        complete(group, label)

# dump data (as a demo that you do not need the list of all records)
with open(OUTFILE, "w") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    writer.writerows(
        record for group in groups.values() for record in group
    )

One alternative is to keep the original and try to replace the namedtuple
with the class suggested by Gregory Ewing. Then it should suffice to also
change

>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)

to

>     elif has_empty:
>         for row in group:
              setattr(row, label, max(values, key=len))

PS: Personally I would probably take the opposite direction and use dicts
throughout...
RE: Namedtuples: some unexpected inconveniences
Peter, Retracing my steps to rewrite the getattr(row, label) code, this is what sent me down the rabbit hole in the first place. (I changed your 'rows' to 'records' just to use the same name everywhere, but all else is the same as you gave me.) I'd like you to look at it and see if you still think complete(group, label) should work. Perhaps seeing why it fails will clarify some of the difficulties I'm having. I ran into problems with values and has_empty. values has a problem because row[label] gets a TypeError. has_empty has a problem because a list of field values will be shorter with missing values than a full list, but a namedtuple with missing values will be the same length as a full namedtuple since missing values have '' placeholders. Two more unexpected inconveniences. A short test csv is at the end, for you to read in and attempt to execute the following code, and I'm still working on reconstructing the lost getattr(row, label) code. import csv from collections import namedtuple, defaultdict def get_title(row): return row.title def complete(group, label): values = {row[label] for row in group} # get "TypeError: tuple indices must be integers, not str" has_empty = not min(values, key=len) if len(values) - has_empty != 1: # no value or multiple values; manual intervention needed return False elif has_empty: for row in group: row[label] = max(values, key=len) return True infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv") rows = csv.reader(infile) fieldnames = next(rows) Record = namedtuple("Record", fieldnames) records = [Record._make(fieldnames)] records.extend(Record._make(row) for row in rows) # group rows by title groups = defaultdict(list) for row in records: groups[get_title(row)].append(row) LABELS = ['Location', 'Kind', 'Notes'] # add missing values for group in groups.values(): for label in LABELS: complete(group, label) Moving 2017 in - test.csv: (If this doesn't come through the mail system correctly, I've also 
uploaded the file to http://deborahswanson.net/python/Moving%202017%20in%20-%20test.csv. Permissions should be set correctly, but let me know if you run into problems downloading the file.) CLDesc,url,title,Description,Location,ST,co,miles,Kind,Rent,Date,first,b r,Notes,yesno,mark,arc Jan 3 1 bedroom/1 bath mobile $700 1br - (Mount Vernon),http://skagit.craigslist.org/apa/5943737902.html,1 bedroom/1 bath mobile $700 1br - (Mount Vernon) - (Mount Vernon),1 bedroom/1 bath mobile $700 1br,Mount Vernon,WA,sk,,trailer,700,1/3/2017,1/3/2017,1,no smoking,,,deleted by its author Jan 6 1 bedroom/1 bath mobile $700 1br - (Mount Vernon),http://skagit.craigslist.org/apa/5943737902.html,1 bedroom/1 bath mobile $700 1br - (Mount Vernon) - (Mount Vernon),1 bedroom/1 bath mobile $700 1br,,WA,,,trailer,700,1/6/2017,1/3/2017,1,no smoking,,, Jan 10 1 bedroom/1 bath mobile $700 1br - (Mount Vernon),http://skagit.craigslist.org/apa/5943737902.html,1 bedroom/1 bath mobile $700 1br - (Mount Vernon) - (Mount Vernon),1 bedroom/1 bath mobile $700 1br,,700,1/10/2017,1/3/2017,1 Jan 17 1 bedroom/1 bath mobile $700 1br - (Mount Vernon),http://skagit.craigslist.org/apa/5943737902.html,1 bedroom/1 bath mobile $700 1br - (Mount Vernon) - (Mount Vernon),1 bedroom/1 bath mobile $700 1br,Mount Vernon,WA,,,trailer,700,1/17/2017,1/3/2017,1,no smoking,,, Jan 19 1 bedroom/1 bath mobile $700 1br - (Mount Vernon),http://skagit.craigslist.org/apa/5943737902.html,1 bedroom/1 bath mobile $700 1br - (Mount Vernon) - (Mount Vernon),1 bedroom/1 bath mobile $700 1br,Mount Vernon,WA,,,trailer,700,1/19/2017,1/3/2017,1,no smoking,,, Jan 26 1240 8th Avenue $725 2br - 676ft2 - (Longview),http://portland.craigslist.org/clk/apa/5976442500.html,1240 8th Avenue $725 2br - 676ft2 - (Longview),1240 8th Avenue $725 2br - 676ft2,,725,1/26/2017,1/16/2017,2 Jan 16 1240 8th Avenue $725 2br - 676ft2 - (Longview),http://portland.craigslist.org/clk/apa/5961794305.html,1240 8th Avenue $725 2br - 676ft2 - (Longview) - (Longview),1240 
8th Avenue $725 2br - 676ft2,Longview,WA,,,house,725,1/16/2017,1/16/2017,2,"detached garage, w/d hookups",,, Jan 6 1424 California Avenue $750 2br - 1113ft2 - (Klamath Falls),http://klamath.craigslist.org/apa/5947977083.html,1424 California Avenue $750 2br - 1113ft2 - (Klamath Falls) - (Klamath Falls),1424 California Avenue $750 2br - 1113ft2,Klamath Falls,OR,kl,,house,750,1/6/2017,1/6/2017,2,no smoking,,, Jan 11 1424 California Avenue $750 2br - 1113ft2 - (Klamath Falls),http://klamath.craigslist.org/apa/5947977083.html,1424 California Avenue $750 2br - 1113ft2 - (Klamath Falls) - (Klamath Falls),1424 California Avenue $750 2br - 1113ft2,,OR,kl,,house,750,1/11/2017,1/6/2017,2,no smoking,,, "Jan 3 1838 Alma Drive Kelso, WA 98626 $550 1br - 600ft2 - (1838 Alma Drive Kelso, WA)",http://portland.craigslist.org/clk/apa/5937961608.html,"1838 Alma Drive Kelso, WA 98626 $550 1br -
RE: Namedtuples: some unexpected inconveniences
Roel Schroeven wrote, on Thursday, April 13, 2017 5:26 PM
>
> Gregory Ewing wrote on 13/04/2017 9:34:
> > Deborah Swanson wrote:
> >> Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
> >>
> >>> Personally I would immediately discard the header row once and for
> >>> all, not again and again on every operation.
> >> Well, perhaps, but I need the header row to stay in place to write
> >> the list to a csv when I'm done
> >
> > That's no problem, just write the header row separately.
> >
> > Do this at the beginning:
> >
> >    header = [Record._make(fieldnames)]
> >    records = [Record._make(row) for row in rows]
> >
> > and then to write out the file:
> >
> >    writer = csv.writer(outputfile)
> >    writer.writerow(header)
> >    writer.writerows(records)
>
> I don't even think there's any need to store the field names anywhere
> else than in fieldnames. So unless I'm missing something, just do this
> at the beginning:
>
>     fieldnames = next(rows)
>     Record = namedtuple("Record", fieldnames)
>     records = [Record._make(row) for row in rows]
>
> and this at the end:
>
>     writer = csv.writer(outputfile)
>     writer.writerow(fieldnames)  # or writer.writerow(Record._fields)
>     writer.writerows(records)
>
>
> --
> The saddest aspect of life right now is that science gathers
> knowledge faster than society gathers wisdom.
>   -- Isaac Asimov
>
> Roel Schroeven

Thanks Roel. I'll try your version when I get the code reconstructed, and
that might take a few to several days. I'll try to get back to you though
on how it goes. Read the previous messages if you want the sad story of
what happened to the original code.

Deborah
RE: Namedtuples: some unexpected inconveniences
Gregory Ewing wrote, on Thursday, April 13, 2017 1:14 AM
>
> Deborah Swanson wrote:
> > I don't exactly understand your point (2). If the namedtuple does not
> > have a label attribute, then getattr(record, label) will get the error
> > whether the name label holds the string 'label' or not.
>
> You sound rather confused. Maybe the following interactive
> session transcript will help.
>
> >>> from collections import namedtuple
> >>> record = namedtuple('record', 'alpha,beta')
> >>> r = record(1, 2)
> >>> r
> record(alpha=1, beta=2)
> >>> label = 'alpha'
> >>> getattr(r, label)
> 1
> >>> label = 'beta'
> >>> getattr(r, label)
> 2
> >>> label = 'gamma'
> >>> getattr(r, label)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> AttributeError: 'record' object has no attribute 'gamma'
>
> Can you see what's happening here? The expression
>
>     label
>
> is being evaluated, and whatever string it evaluates to is
> being used as the attribute name to look up.
>
> Now, I'm not sure exactly what you were doing to get the
> message "'record' object has no attribute 'label'". Here are
> a few possible ways to get that effect:
>
> >>> r.label
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> AttributeError: 'record' object has no attribute 'label'
>
> >>> getattr(r, 'label')
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> AttributeError: 'record' object has no attribute 'label'
>
> >>> label = 'label'
> >>> getattr(r, label)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> AttributeError: 'record' object has no attribute 'label'
>
> Or maybe you did something else again. We would need to
> see your code in order to tell.
>
> --
> Greg

And it's reproducing the code that's the roadblock to all of these
issues. Rest assured I will get to the bottom of this, or at least come
back with the code to ask more questions about it and let you see what I
had. I want to see what's going on here too. Might be a day or two
though.

Deborah
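When tracking down an AttributeError like the ones in the transcript, it can help to check the name against the namedtuple's `_fields` first, or to give `getattr()` a default. A small sketch, reusing the `alpha`/`beta` field names from the transcript. The list of labels being checked is made up.

```python
from collections import namedtuple

Record = namedtuple("Record", "alpha beta")
r = Record(1, 2)

for label in ["alpha", "gamma"]:
    if label in r._fields:
        print(label, "=", getattr(r, label))
    else:
        # With a third argument, getattr returns that default
        # instead of raising AttributeError.
        print(label, "is not a field; default:", getattr(r, label, None))
```

Printing the value of `label` right before the failing `getattr()` call would show which attribute name was actually being looked up.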
RE: Namedtuples: some unexpected inconveniences
Gregory Ewing wrote, on Thursday, April 13, 2017 12:34 AM
>
> Deborah Swanson wrote:
> > Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
> >
> >> Personally I would immediately discard the header row once and for
> >> all, not again and again on every operation.
> >
> > Well, perhaps, but I need the header row to stay in place to write the
> > list to a csv when I'm done
>
> That's no problem, just write the header row separately.
>
> Do this at the beginning:
>
>    header = [Record._make(fieldnames)]
>    records = [Record._make(row) for row in rows]
>
> and then to write out the file:
>
>    writer = csv.writer(outputfile)
>    writer.writerow(header)
>    writer.writerows(records)
>
> > There might be a tiny performance edge in discarding the header row
> > for the sort, but there would also be a hit to recreate it at output
> > time.
>
> It's not about performance, it's about keeping the code as
> clean and simple as you can, thus making it easier to
> understand and maintain.
>
> The general idea to take away from this is that it's almost
> always best to arrange things so that a given collection
> contains just one kind of data, so you can treat every
> element of it in exactly the same way.
>
> --
> Greg

That's good advice and I'll rewrite it that way, after I have the code I
started with to answer the other questions.

I certainly know I have a lot to learn about writing good code, and I can
see that what you're suggesting is much cleaner than what I had.

Thanks,
Deborah
RE: Namedtuples: some unexpected inconveniences
Gregory Ewing wrote, on Thursday, April 13, 2017 12:36 AM
>
> If you want to be able to update your rows, you may find
> this useful:
>
> https://pypi.python.org/pypi/recordclass
>
> It's very similar to a namedtuple, but mutable. Looks like it
> should be a drop-in replacement.
>
> --
> Greg

Thanks Greg, I'll definitely take a look at it.
RE: Namedtuples: some unexpected inconveniences
Gregory Ewing wrote, on Thursday, April 13, 2017 12:17 AM
>
> Deborah Swanson wrote:
> > But I think you got it right in your last sentence below. defaultdict
> > copied them because they were immutable,
>
> No, definitely not. A defaultdict will never take it upon
> itself to copy an object you give it, either as a key or a value.
>
> The copying, if any, must have occurred somewhere else, in
> code that you didn't show us.
>
> Can you show us the actual code you used to attempt to
> update the namedtuples?
>
> --
> Greg

As I just told Peter, I discovered earlier today that all of that code is
lost, and it will take a while to rewrite. And now I have several reasons
to do so. I don't know how long it will take, but I will come back and
produce the code that gave me this behavior.
RE: Namedtuples: some unexpected inconveniences
Peter Otten wrote, on Thursday, April 13, 2017 12:17 AM > > Deborah Swanson wrote: > > > Peter Otten wrote, on Wednesday, April 12, 2017 11:35 PM > >> > >> Deborah Swanson wrote: > >> > >> > It's a small point, but I suspect getattr(record, label) > >> would still > >> > fail, even if label's value is 'label' and only 'label', > but what's > >> > the point of having a variable if it will only ever have > just one > >> > value? > >> > >> You are misunderstanding. Your getattr() call fails because you have > >> > >> label = "label" > >> > >> burried somewhere in your code. As soon as you change that to > >> > >> label = > >> > >> the error will go away. > > > > > > Yes, the error goes away, but now getattr(record, label) is useless > > for processing field names, unless you want to write a line of code > > for each one. (I have 17 field names, and forget about passing label > > to a function.) Uh-oh, I boobooed and misread what you wrote here. > No, it's not useless: > > >>> from collections import namedtuple > >>> T = namedtuple("T", "foo bar baz") > >>> t = T(1, 2, 3) > >>> for name in t._fields: > ... print(name, "=", getattr(t, name)) > ... > foo = 1 > bar = 2 > baz = 3 Wow. Ok, I can see that the specific circumstance I got the "object has no attribute 'label' error was quite likely not due to using getattr() with a variable for a namedtuple field name, and probably some other factor was at work. Unfortunately, when I shifted gears on the overall problem and abandoned the strategy of making the group-by defaultdict, I renamed the project file and started over, going back to the original list of namedtuples. As a result, all the back versions of my code producing this error were lost. I've spent the better part of today rewriting the lost code, and I'm nowhere near finished, and now my illness is ganging up on me again. So anything further will have to wait til tomorrow. 
I remain quite sure that at no point did I have the line label = "label" in my code, and I wouldn't even have thought of writing it because it's so absurd in so many ways. Hopefully I can show you what I wrote soon, and you can see for yourself. > And as a special service here's a mutable datatype with sufficient > namedtuple compatibility to replicate the above snippet: > > $ cat namedtuple_replacement.py > def struct(name, wanted_columns): > class Struct: > _fields = __slots__ = wanted_columns.split() > > def __init__(self, *args): > names = self.__slots__ > if len(args) != len(names): > raise ValueError > for name, value in zip(names, args): > setattr(self, name, value) > > @classmethod > def _make(cls, args): > return cls(*args) > > def __repr__(self): > names = self.__slots__ > return "{}({})".format( > self.__class__.__name__, > ", ".join("{}={!r}".format(n, getattr(self, > n)) for n in > names) > ) > > Struct.__name__ = name > return Struct > > T = struct("T", "foo bar baz") > t = T(1, 2, 3) > print(t) > for name in t._fields: > print(name, "=", getattr(t, name)) > t.bar = 42 > print(t) > $ python3 namedtuple_replacement.py > T(foo=1, bar=2, baz=3) > foo = 1 > bar = 2 > baz = 3 > T(foo=1, bar=42, baz=3) Thank you for this datatype definition. I won't take a serious look at it until I rewrite the code I lost and get to the bottom of why getattr() got the attribute error, but once that issue is resolved I will return to your mutable datatype with namedtuple compatibility (to some extent, I gather). I apologize for the delay, but your simple getattr() example above demands that I find out why it wasn't working for me before moving on with the rest of this. And that will take some time. Probably not the 3 weeks it took me to get to the point where I was consistently seeing the error, but it will be awhile. I'll come back to address these issues and your datatype when I've got code to show what I did. (For my own sanity, if no other reason.) 
Re: Namedtuples: some unexpected inconveniences
Gregory Ewing wrote on 13/04/2017 9:34:
> Deborah Swanson wrote:
>> Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
>>
>>> Personally I would immediately discard the header row once and for
>>> all, not again and again on every operation.
>>
>> Well, perhaps, but I need the header row to stay in place to write
>> the list to a csv when I'm done
>
> That's no problem, just write the header row separately.
>
> Do this at the beginning:
>
>     header = Record._make(fieldnames)
>     records = [Record._make(row) for row in rows]
>
> and then to write out the file:
>
>     writer = csv.writer(outputfile)
>     writer.writerow(header)
>     writer.writerows(records)

I don't even think there's any need to store the field names anywhere else than in fieldnames. So unless I'm missing something, just do this at the beginning:

    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    records = [Record._make(row) for row in rows]

and this at the end:

    writer = csv.writer(outputfile)
    writer.writerow(fieldnames)  # or writer.writerow(Record._fields)
    writer.writerows(records)

--
The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom.
  -- Isaac Asimov

Roel Schroeven
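The header-kept-separately approach can be sketched end to end as follows. This is only an illustration: the in-memory sample data and the two column names are assumptions standing in for the real CSV file, and `io.StringIO` replaces the actual input and output files.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for the real CSV file.
source = io.StringIO("title,Date\nB,2017-04-02\nA,2017-04-01\n")

rows = csv.reader(source)
fieldnames = next(rows)                     # header row, kept out of the data
Record = namedtuple("Record", fieldnames)
records = [Record._make(row) for row in rows]

# Every element is a data row, so no records[1:] slicing is needed.
records.sort(key=lambda r: (r.title, r.Date))

output = io.StringIO()
writer = csv.writer(output)
writer.writerow(Record._fields)             # header written once, separately
writer.writerows(records)
result = output.getvalue()
print(result)
```

Because the list holds only data rows, sorting and grouping code can treat every element the same way, and the header costs a single `writerow` call at output time.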
Re: Namedtuples: some unexpected inconveniences
Deborah Swanson wrote:
> I don't exactly understand your point (2). If the namedtuple does not
> have a label attribute, then getattr(record, label) will get the error
> whether the name label holds the string 'label' or not.

You sound rather confused. Maybe the following interactive session transcript will help.

    >>> from collections import namedtuple
    >>> record = namedtuple('record', 'alpha,beta')
    >>> r = record(1, 2)
    >>> r
    record(alpha=1, beta=2)
    >>> label = 'alpha'
    >>> getattr(r, label)
    1
    >>> label = 'beta'
    >>> getattr(r, label)
    2
    >>> label = 'gamma'
    >>> getattr(r, label)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'record' object has no attribute 'gamma'

Can you see what's happening here? The expression label is being evaluated, and whatever string it evaluates to is being used as the attribute name to look up.

Now, I'm not sure exactly what you were doing to get the message "'record' object has no attribute 'label'". Here are a few possible ways to get that effect:

    >>> r.label
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'record' object has no attribute 'label'

    >>> getattr(r, 'label')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'record' object has no attribute 'label'

    >>> label = 'label'
    >>> getattr(r, label)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'record' object has no attribute 'label'

Or maybe you did something else again. We would need to see your code in order to tell.

--
Greg
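One way to track down this kind of error is to check the string that `label` actually holds against the namedtuple's `_fields` before looking it up. The sketch below is only an illustration: `get_field` is a hypothetical helper, not something from the thread or the standard library.

```python
from collections import namedtuple

Record = namedtuple("Record", "alpha beta")
r = Record(1, 2)

def get_field(record, label):
    """Look up a field by name, reporting what `label` evaluated to on failure."""
    if label not in record._fields:
        raise AttributeError(
            "label evaluated to {!r}; known fields are {}".format(
                label, ", ".join(record._fields)))
    return getattr(record, label)

# Works for any valid field name held in a variable.
values = [get_field(r, label) for label in ("alpha", "beta")]
print(values)

# A stray string such as "label" is reported together with the valid names.
try:
    get_field(r, "label")
except AttributeError as exc:
    message = str(exc)
    print(message)
```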
Re: Namedtuples: some unexpected inconveniences
If you want to be able to update your rows, you may find this useful: https://pypi.python.org/pypi/recordclass It's very similar to a namedtuple, but mutable. Looks like it should be a drop-in replacement. -- Greg -- https://mail.python.org/mailman/listinfo/python-list
Re: Namedtuples: some unexpected inconveniences
Deborah Swanson wrote:
> Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
>
>> Personally I would immediately discard the header row once and for
>> all, not again and again on every operation.
>
> Well, perhaps, but I need the header row to stay in place to write
> the list to a csv when I'm done

That's no problem, just write the header row separately.

Do this at the beginning:

    header = Record._make(fieldnames)
    records = [Record._make(row) for row in rows]

and then to write out the file:

    writer = csv.writer(outputfile)
    writer.writerow(header)
    writer.writerows(records)

> There might be a tiny performance edge in discarding the header row
> for the sort, but there would also be a hit to recreate it at
> output time.

It's not about performance, it's about keeping the code as clean and simple as you can, thus making it easier to understand and maintain.

The general idea to take away from this is that it's almost always best to arrange things so that a given collection contains just one kind of data, so you can treat every element of it in exactly the same way.

--
Greg
Re: Namedtuples: some unexpected inconveniences
Deborah Swanson wrote:
> But I think you got it right in your last sentence below. defaultdict
> copied them because they were immutable,

No, definitely not. A defaultdict will never take it upon itself to copy an object you give it, either as a key or a value.

The copying, if any, must have occurred somewhere else, in code that you didn't show us.

Can you show us the actual code you used to attempt to update the namedtuples?

--
Greg
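The claim that a defaultdict stores the objects it is given, not copies, can be checked with the `is` operator. The records below are made-up sample data. The second half shows one plausible way to end up with the impression of copying (an assumption, since the original code was not posted): `_replace` returns a new tuple, and storing it in the group leaves the original list untouched.

```python
from collections import defaultdict, namedtuple

Record = namedtuple("Record", "title location")
records = [Record("A", "here"), Record("A", ""), Record("B", "there")]

groups = defaultdict(list)
for r in records:
    groups[r.title].append(r)

# Each grouped element is the very same object as in the original list.
same_objects = all(
    any(member is r for member in groups[r.title]) for r in records
)
print(same_objects)

# _replace() builds a NEW namedtuple; rebinding the slot in the group's list
# does not touch the original list, which still refers to the old tuple.
groups["A"][1] = groups["A"][1]._replace(location="here")
print(groups["A"][1].location, repr(records[1].location))
```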
RE: Namedtuples: some unexpected inconveniences
Deborah Swanson wrote:

> Peter Otten wrote, on Wednesday, April 12, 2017 11:35 PM
>>
>> Deborah Swanson wrote:
>>
>> > It's a small point, but I suspect getattr(record, label) would still
>> > fail, even if label's value is 'label' and only 'label', but what's
>> > the point of having a variable if it will only ever have just one
>> > value?
>>
>> You are misunderstanding. Your getattr() call fails because you have
>>
>> label = "label"
>>
>> buried somewhere in your code. As soon as you change that to
>>
>> label =
>>
>> the error will go away.
>
>
> Yes, the error goes away, but now getattr(record, label) is useless for
> processing field names, unless you want to write a line of code for each
> one. (I have 17 field names, and forget about passing label to a
> function.)

No, it's not useless:

    >>> from collections import namedtuple
    >>> T = namedtuple("T", "foo bar baz")
    >>> t = T(1, 2, 3)
    >>> for name in t._fields:
    ...     print(name, "=", getattr(t, name))
    ...
    foo = 1
    bar = 2
    baz = 3

And as a special service here's a mutable datatype with sufficient namedtuple compatibility to replicate the above snippet:

    $ cat namedtuple_replacement.py
    def struct(name, wanted_columns):
        class Struct:
            _fields = __slots__ = wanted_columns.split()

            def __init__(self, *args):
                names = self.__slots__
                if len(args) != len(names):
                    raise ValueError
                for name, value in zip(names, args):
                    setattr(self, name, value)

            @classmethod
            def _make(cls, args):
                return cls(*args)

            def __repr__(self):
                names = self.__slots__
                return "{}({})".format(
                    self.__class__.__name__,
                    ", ".join("{}={!r}".format(n, getattr(self, n)) for n in names)
                )

        Struct.__name__ = name
        return Struct

    T = struct("T", "foo bar baz")
    t = T(1, 2, 3)
    print(t)
    for name in t._fields:
        print(name, "=", getattr(t, name))
    t.bar = 42
    print(t)
    $ python3 namedtuple_replacement.py
    T(foo=1, bar=2, baz=3)
    foo = 1
    bar = 2
    baz = 3
    T(foo=1, bar=42, baz=3)
RE: Namedtuples: some unexpected inconveniences
Peter Otten wrote, on Wednesday, April 12, 2017 11:35 PM
>
> Deborah Swanson wrote:
>
> > It's a small point, but I suspect getattr(record, label) would still
> > fail, even if label's value is 'label' and only 'label', but what's
> > the point of having a variable if it will only ever have just one
> > value?
>
> You are misunderstanding. Your getattr() call fails because you have
>
> label = "label"
>
> buried somewhere in your code. As soon as you change that to
>
> label =
>
> the error will go away.

Yes, the error goes away, but now getattr(record, label) is useless for processing field names, unless you want to write a line of code for each one. (I have 17 field names, and forget about passing label to a function.)
RE: Namedtuples: some unexpected inconveniences
Deborah Swanson wrote:

> It's a small point, but I suspect getattr(record, label) would still
> fail, even if label's value is 'label' and only 'label', but what's the
> point of having a variable if it will only ever have just one value?

You are misunderstanding. Your getattr() call fails because you have

    label = "label"

buried somewhere in your code. As soon as you change that to

    label =

the error will go away.
RE: Namedtuples: some unexpected inconveniences
Deborah Swanson wrote, on Wednesday, April 12, 2017 4:29 PM > > Peter Otten wrote, on Wednesday, April 12, 2017 3:15 PM > > > > >> Indeed you cannot change the namedtuple's attributes. Like the > > >> "normal" tuple it is designed to be immutable. If you want changes in > > >> one list (the group) to appear in another (the original records) you > > >> need a mutable data type. > > > > > > Sadly, that does seem to be the correct conclusion here. > > > > Think hard if you really need the original list. > > It's possible you might transform the namedtuple into a > mutable type, and I didn't try that. But it seems like the > group-by defaultdict strategy would have to have a > significant performance edge to be worth it and you wouldn't > have any of the namedtuple properties to work with after the > transformation. I also ran into some trouble with your > algorithm that follows making the defaultdict, and I'm not > sure what value there would be in hashing through that. > Though I'm certainly willing to if you are. > > It worked to simply stay with the original list of > namedtuples to begin with. > > I remain grateful for your introduction to the collections > module. What a neat little package of tools! I know it's quick for this double-take, but it occurs to me that transforming to a mutable type isn't a performance evaluation at all. Filling in missing values is the last step before outputting the processed list, so it might not be necessary to work with namedtuples at that point. The algorithm to fill in the missing values for each group (which would no longer be namedtuples) in the defaultdict is something I'm back at the drawing board for. But it shouldn't be too hard. Haha, we'll see! -- https://mail.python.org/mailman/listinfo/python-list
RE: Namedtuples: some unexpected inconveniences
Peter Otten wrote, on Wednesday, April 12, 2017 3:15 PM > > Deborah Swanson wrote: > > >> >value = getattr(record, label) > >> > >> That should work. > > > > We may agree that it *should* work, by an intuitive grasp of how it > > should work, but it doesn't. You get "object has no attribute 'label'. > > Only if the namedtuple > > (1) does not have a label attribute and > (2) the value of the name label is the string "label" > > In that case both > > label = "label" > getattr(record, label) > > and > > record.label > > will fail with the same AttributeError. The problem is *not* the dynamic > access through getattr(). Agreed, it's not getattr's fault. It's a small point, but I suspect getattr(record, label) would still fail, even if label's value is 'label' and only 'label', but what's the point of having a variable if it will only ever have just one value? The question would be whether the compiler (interpreter?) would look at getattr(record, label), evaluate label and see that there is a field named 'label', but I suspect it wouldn't take that many steps. It wants to see recordset.fieldname, and a bare "label" does not reference the object. I don't exactly understand your point (2). If the namedtuple does not have a label attribute, then getattr(record, label) will get the error whether the name label holds the string 'label' or not. And it wants to see recordset.fieldname, not just fieldname. But maybe I misunderstood what you were saying. This stuff is quite loopy to think about, at least for me it is. > >> Indeed you cannot change the namedtuple's attributes. Like the > >> "normal" tuple it is designed to be immutable. If you want changes in > >> one list (the group) to appear in another (the original records) you > >> need a mutable data type. > > > > Sadly, that does seem to be the correct conclusion here. > > Think hard if you really need the original list. It's possible you might transform the namedtuple into a mutable type, and I didn't try that. 
But it seems like the group-by defaultdict strategy would have to have a significant performance edge to be worth it and you wouldn't have any of the namedtuple properties to work with after the transformation. I also ran into some trouble with your algorithm that follows making the defaultdict, and I'm not sure what value there would be in hashing through that. Though I'm certainly willing to if you are. It worked to simply stay with the original list of namedtuples to begin with. I remain grateful for your introduction to the collections module. What a neat little package of tools! -- https://mail.python.org/mailman/listinfo/python-list
RE: Namedtuples: some unexpected inconveniences
Deborah Swanson wrote:

>> >value = getattr(record, label)
>>
>> That should work.
>
> We may agree that it *should* work, by an intuitive grasp of how it
> should work, but it doesn't. You get "object has no attribute 'label'.

Only if the namedtuple

(1) does not have a label attribute and
(2) the value of the name label is the string "label"

In that case both

    label = "label"
    getattr(record, label)

and

    record.label

will fail with the same AttributeError. The problem is *not* the dynamic access through getattr().

>> Indeed you cannot change the namedtuple's attributes. Like the "normal"
>> tuple it is designed to be immutable. If you want changes in one list
>> (the group) to appear in another (the original records) you need a
>> mutable data type.
>
> Sadly, that does seem to be the correct conclusion here.

Think hard if you really need the original list.
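To illustrate the point about needing a mutable data type, here is a minimal sketch in which a change made through a group is visible in the original list. It uses `types.SimpleNamespace` from the standard library purely as a stand-in mutable record (an assumption for illustration; the thread itself suggests a custom struct class or the recordclass package), and the field names are made up.

```python
from types import SimpleNamespace

# Mutable records: two rows with the same title, one missing its location.
records = [
    SimpleNamespace(title="A", location="here"),
    SimpleNamespace(title="A", location=""),
]

# A group is a new list, but it refers to the same record objects.
group = [r for r in records if r.title == "A"]

# Fill in missing values by mutating the records in place.
for r in group:
    if not r.location:
        r.location = "here"

# The original list sees the change, because nothing was copied or replaced.
locations = [r.location for r in records]
print(locations)
```

With namedtuples the same loop would have to build new tuples with `_replace()` and store them back into every list that should see the update.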
RE: Namedtuples: some unexpected inconveniences
Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM > > Deborah Swanson wrote: > > > I won't say the following points are categorically true, but I became > > convinced enough they were true in this instance that I abandoned the > > advised strategy. Which was to use defaultdict to group the list of > > namedtuples by one of the fields for the purpose of determining > > whether certain other fields in each group were either missing values > > or contained contradictory values. > > > > Are these bugs, or was there something I could have done to avoid > > these problems? Or are they just things you need to know working with > > namedtuples? > > > > The list of namedtuples was created with: > > > > infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - > > test.csv") > > rows = csv.reader(infile)fieldnames = next(rows) > > Record = namedtuple("Record", fieldnames) > > records = [Record._make(fieldnames)] > > records.extend(Record._make(row) for row in rows) > > . . . > > (many lines of field processing code) > > . . . > > > > then the attempt to group the records by title: > > > > import operator > > records[1:] = sorted(records[1:], key=operator.attrgetter("title", > > "Date")) > > Personally I would immediately discard the header row once and for all, not > again and again on every operation. Well, perhaps, but I need the header row to stay in place to write the list to a csv when I'm done (which is why it's there in the first place). There might be a tiny performance edge in discarding the header row for the sort, but there would also be a hit to recreate it at output time. 
> > groups = defaultdict() for r in records[1:]: > > # if the key doesn't exist, make a new group > > if r.title not in groups.keys(): > > groups[r.title] = [r] > > # if key (group) exists, append this record > > else: > > groups[r.title].append(r) > > You are not using the defaultdict the way it is intended; the > groups can be built with > > groups = defaultdict(list) > for r in records[1:]: > groups[r.title].append(r) Yes, going back to your original post I see now that's what you gave, and it's probably why I noticed defaultdict's being characterized by what you make the default to be. Too bad I lost track of that. > > (Please note that this default dict will not automatically make new > > keys when they are encountered, possibly because the keys of the > > defaultdict are made from namedtuples and the values are namedtuples. > > So you have to include the step to make a new key when a key is not > > found.) > > > > If you succeed in modifying records in a group, the dismaying thing is > > that the underlying records are not updated, making the entire > > exercise totally pointless, which was a severe and unexpected > > inconvenience. > > > > It looks like the values and the structure were only copied from the > > original list of namedtuples to the defaultdict. The rows of the > > grouped-by dict still behave like namedtuples, but they are no longer > > the same namedtuples as the original list of namedtuples. (I'm sure I > > didn't say that quite right, please correct me if you have better > > words for it.) > > They should be the same namedtuple. Something is wrong with > your actual code or your diagnosis or both. Well, I didn't see them behaving as the same namedtuples, and I looked hard at it, many different ways. If someone could point out the mistake I might have made to get only copies of them or why they necessarily would be the same namedtuples, I'd certainly look into it. 
Or better yet some code that does the same thing and they remain the same ones. (But I think you got it right in your last sentence below. defaultdict copied them because they were immutable, leaving the original list unchanged.) > > It might be possible to complete the operation and then write out the > > groups of rows of namedtuples in the dict to a simple list of > > namedtuples, discarding the original, but at the time I noticed that > > modifying rows in a group didn't change the values in the original > > list of namedtuples, I still had further to go with the dict of > > groups, and it was looking easier by the minute to solve the missing > > values problem directly from the original list of namedtuples, so > > that's what I did. > > > > If requested I can reproduce how I saw that the original list of > > namedtuples was not changed when I modified field values in group rows > > of the dict, but it's lengthy and messy. It might be worthwhile though > > if someone might see a mistake I made, though I found the same > > behavior several different ways. Which was when I called it barking up > > the wrong tree and quit trying to solve the problem that way. > > > > Another inconvenience is that there appears to be no way to access > > field values of a named tuple by variable, although I've had limited > > success accessing by
RE: Namedtuples: some unexpected inconveniences
> -Original Message- > From: Python-list > [mailto:python-list-bounces+python=deborahswanson.net@python.o > rg] On Behalf Of MRAB > Sent: Wednesday, April 12, 2017 1:42 PM > To: python-list@python.org > Subject: Re: Namedtuples: some unexpected inconveniences > > > On 2017-04-12 20:57, Deborah Swanson wrote: > > Are these bugs, or was there something I could have done to avoid > > these problems? Or are they just things you need to know > working with namedtuples? > > > > The list of namedtuples was created with: > > > > infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving > 2017 in - > > test.csv") > > rows = csv.reader(infile)fieldnames = next(rows) > > Record = namedtuple("Record", fieldnames) > > records = [Record._make(fieldnames)] > > records.extend(Record._make(row) for row in rows) > > . . . > > (many lines of field processing code) > > . . . > > > > then the attempt to group the records by title: > > > > import operator > > records[1:] = sorted(records[1:], key=operator.attrgetter("title", > > "Date")) groups = defaultdict() for r in records[1:]: > > # if the key doesn't exist, make a new group > > if r.title not in groups.keys(): > > groups[r.title] = [r] > > # if key (group) exists, append this record > > else: > > groups[r.title].append(r) > > > > (Please note that this default dict will not automatically make new > > keys when they are encountered, possibly because the keys of the > > defaultdict are made from namedtuples and the values are > namedtuples. > > So you have to include the step to make a new key when a key is not > > found.) MRAB said: > The defaultdict _will_ work when you use it properly. :-) > > The line should be: > > groups = defaultdict(list) > > so that it'll make a new list every time a new key is > automatically added. Arg. Now I remember the thought crossing my mind early on, and noticing that the characterizing property of a defaultdict was what you set the default to be. 
Too bad I forgot that useful thought once I was entangled with all those other problems. Thanks for jogging that memory stuck in a hidey hole. > Another point: namedtuples, as with normal tuples, are immutable; once > created, you can't change an attribute. A dict might be a better bet. Yes, namedtuples still being tuples was a point mentioned in passing by someone, I think Steve D'Aprano, but I didn't immediately see that as being the roadblock to accessing field values by variable. It does make sense now though, although others on the list also didn't see it, so I'm not feeling as bad about it as I could. Namedtuples absolutely was the right data structure for two thirds of this program. I only ran into trouble with it trying to do the defaultdict group by thing. And it all turned out ok just by going back to the original list. Now, if I could understand why the namedtuples grouped by the defaultdict were only copied instead of remaining the same namedtuples as the list they were copied from, that should wrap this set of problems up. Many thanks again! Deborah -- https://mail.python.org/mailman/listinfo/python-list
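The difference between `defaultdict()` and `defaultdict(list)` discussed above can be shown in a few lines. The titles and dates are made-up sample values. Without a default factory, a missing key raises `KeyError` just as with a plain dict; with `list` as the factory, the group is created on first access.

```python
from collections import defaultdict

# No default factory: behaves like a plain dict for missing keys.
no_factory = defaultdict()
try:
    no_factory["missing"].append(1)
except KeyError:
    raised = True
else:
    raised = False
print(raised)

# With list as the factory, a new empty list is made for each new key.
groups = defaultdict(list)
for title, date in [("A", "2017-04-01"), ("B", "2017-04-02"), ("A", "2017-04-03")]:
    groups[title].append(date)
grouped = dict(groups)
print(grouped)
```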
Re: Namedtuples: some unexpected inconveniences
Deborah Swanson wrote: > I won't say the following points are categorically true, but I became > convinced enough they were true in this instance that I abandoned the > advised strategy. Which was to use defaultdict to group the list of > namedtuples by one of the fields for the purpose of determining whether > certain other fields in each group were either missing values or > contained contradictory values. > > Are these bugs, or was there something I could have done to avoid these > problems? Or are they just things you need to know working with > namedtuples? > > The list of namedtuples was created with: > > infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - > test.csv") > rows = csv.reader(infile)fieldnames = next(rows) > Record = namedtuple("Record", fieldnames) > records = [Record._make(fieldnames)] > records.extend(Record._make(row) for row in rows) > . . . > (many lines of field processing code) > . . . > > then the attempt to group the records by title: > > import operator > records[1:] = sorted(records[1:], key=operator.attrgetter("title", > "Date")) Personally I would immediately discard the header row once and for all, not again and again on every operation. > groups = defaultdict() for r in records[1:]: > # if the key doesn't exist, make a new group > if r.title not in groups.keys(): > groups[r.title] = [r] > # if key (group) exists, append this record > else: > groups[r.title].append(r) You are not using the defaultdict the way it is intended; the groups can be built with groups = defaultdict(list) for r in records[1:]: groups[r.title].append(r) > (Please note that this default dict will not automatically make new keys > when they are encountered, possibly because the keys of the defaultdict > are made from namedtuples and the values are namedtuples. So you have to > include the step to make a new key when a key is not found.) 
> > If you succeed in modifying records in a group, the dismaying thing is > that the underlying records are not updated, making the entire exercise > totally pointless, which was a severe and unexpected inconvenience. > > It looks like the values and the structure were only copied from the > original list of namedtuples to the defaultdict. The rows of the > grouped-by dict still behave like namedtuples, but they are no longer > the same namedtuples as the original list of namedtuples. (I'm sure I > didn't say that quite right, please correct me if you have better words > for it.) They should be the same namedtuple. Something is wrong with your actual code or your diagnosis or both. > > It might be possible to complete the operation and then write out the > groups of rows of namedtuples in the dict to a simple list of > namedtuples, discarding the original, but at the time I noticed that > modifying rows in a group didn't change the values in the original list > of namedtuples, I still had further to go with the dict of groups, and > it was looking easier by the minute to solve the missing values problem > directly from the original list of namedtuples, so that's what I did. > > If requested I can reproduce how I saw that the original list of > namedtuples was not changed when I modified field values in group rows > of the dict, but it's lengthy and messy. It might be worthwhile though > if someone might see a mistake I made, though I found the same behavior > several different ways. Which was when I called it barking up the wrong > tree and quit trying to solve the problem that way. > > Another inconvenience is that there appears to be no way to access field > values of a named tuple by variable, although I've had limited success > accessing by variable indices. 
However, direct attempts to do so, like: > > values = {row[label] for row in group} > (where 'label' is a variable for the field names of a namedtuple) > > gets "object has no attribute 'label' > > or, where 'record' is a row in a list of namedtuples and 'label' is a > variable for the fieldnames of a namedtuple: > > value = getattr(record, label) That should work. > setattr(record, label, value) also don't work. > > You get the error 'object has no attribute 'label' every time. Indeed you cannot change the namedtuple's attributes. Like the "normal" tuple it is designed to be immutable. If you want changes in one list (the group) to appear in another (the original records) you need a mutable data type. -- https://mail.python.org/mailman/listinfo/python-list
Re: Namedtuples: some unexpected inconveniences
On 2017-04-12 20:57, Deborah Swanson wrote:
> I won't say the following points are categorically true, but I became convinced enough they were true in this instance that I abandoned the advised strategy. Which was to use defaultdict to group the list of namedtuples by one of the fields for the purpose of determining whether certain other fields in each group were either missing values or contained contradictory values.
>
> Are these bugs, or was there something I could have done to avoid these problems? Or are they just things you need to know working with namedtuples?
>
> The list of namedtuples was created with:
>
> infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
> rows = csv.reader(infile)
> fieldnames = next(rows)
> Record = namedtuple("Record", fieldnames)
> records = [Record._make(fieldnames)]
> records.extend(Record._make(row) for row in rows)
>     . . .
> (many lines of field processing code)
>     . . .
>
> then the attempt to group the records by title:
>
> import operator
> records[1:] = sorted(records[1:], key=operator.attrgetter("title", "Date"))
> groups = defaultdict()
> for r in records[1:]:
>     # if the key doesn't exist, make a new group
>     if r.title not in groups.keys():
>         groups[r.title] = [r]
>     # if key (group) exists, append this record
>     else:
>         groups[r.title].append(r)
>
> (Please note that this default dict will not automatically make new keys when they are encountered, possibly because the keys of the defaultdict are made from namedtuples and the values are namedtuples. So you have to include the step to make a new key when a key is not found.)

The defaultdict _will_ work when you use it properly. :-)

The line should be:

    groups = defaultdict(list)

so that it'll make a new list every time a new key is automatically added.

Another point: namedtuples, as with normal tuples, are immutable; once created, you can't change an attribute. A dict might be a better bet.

> If you succeed in modifying records in a group, the dismaying thing is that the underlying records are not updated, making the entire exercise totally pointless, which was a severe and unexpected inconvenience.
>
> It looks like the values and the structure were only copied from the original list of namedtuples to the defaultdict. The rows of the grouped-by dict still behave like namedtuples, but they are no longer the same namedtuples as the original list of namedtuples. (I'm sure I didn't say that quite right, please correct me if you have better words for it.)
>
> It might be possible to complete the operation and then write out the groups of rows of namedtuples in the dict to a simple list of namedtuples, discarding the original, but at the time I noticed that modifying rows in a group didn't change the values in the original list of namedtuples, I still had further to go with the dict of groups, and it was looking easier by the minute to solve the missing values problem directly from the original list of namedtuples, so that's what I did.
>
> If requested I can reproduce how I saw that the original list of namedtuples was not changed when I modified field values in group rows of the dict, but it's lengthy and messy. It might be worthwhile though if someone might see a mistake I made, though I found the same behavior several different ways. Which was when I called it barking up the wrong tree and quit trying to solve the problem that way.
>
> Another inconvenience is that there appears to be no way to access field values of a named tuple by variable, although I've had limited success accessing by variable indices. However, direct attempts to do so, like:
>
> values = {row[label] for row in group}
> (where 'label' is a variable for the field names of a namedtuple)
>
> gets "object has no attribute 'label'
>
> or, where 'record' is a row in a list of namedtuples and 'label' is a variable for the fieldnames of a namedtuple:
>
> value = getattr(record, label)
> setattr(record, label, value)
>
> also don't work.
>
> You get the error 'object has no attribute 'label' every time.
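For reference, the sketch below runs each of the access patterns mentioned above against a small made-up namedtuple, with `label` holding a real field name. It is only an illustration of how namedtuples behave in general, not a reconstruction of the lost code: reading by name through `getattr()` works, string subscripting raises `TypeError`, and `setattr()` raises `AttributeError` because the tuple is immutable. `_replace()` is the supported way to get an updated copy.

```python
from collections import namedtuple

Record = namedtuple("Record", "title location")
record = Record("A", "")
label = "location"          # a field name held in a variable

# Reading by variable name works, either directly or through the index.
by_name = getattr(record, label)
by_index = record[record._fields.index(label)]

errors = []
try:
    record[label]                       # tuples only accept integer/slice indices
except TypeError as exc:
    errors.append(type(exc).__name__)
try:
    setattr(record, label, "here")      # fields are read-only
except AttributeError as exc:
    errors.append(type(exc).__name__)
print(errors)

# _replace() returns a new namedtuple; the original stays unchanged.
updated = record._replace(**{label: "here"})
print(updated, record)
```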