RE: Namedtuples: some unexpected inconveniences

2017-04-16 Thread Deborah Swanson
Peter Otten wrote, on Saturday, April 15, 2017 12:44 AM
> 
> Deborah Swanson wrote:
> 
> > I know it's your "ugly" answer, but can I ask what the '**' in
> > 
> > fix = {label: max(values, key=len)}
> > group[:] = [record._replace(**fix) for record in group]
> > 
> > means?
> 
> d = {"a": 1, "b": 2}
> f(**d)
> 
> is equivalent to
> 
> f(a=1, b=2)

This is perfect Peter, thank you very much. Now I can understand why your
"ugly" fix works, instead of just seeing in the debugger that it
mysteriously does somehow work.

> so ** is a means to call a function with keyword arguments when you want to
> decide about the *names* at runtime. Example:
> 
> >>> def f(a=1, b=2):
> ...     print("a =", a)
> ...     print("b =", b)
> ...     print()
> ...
> >>> for d in [{"a": 10}, {"b": 42}, {"a": 100, "b": 200}]:
> ...     f(**d)
> ... 
> a = 10
> b = 2
> 
> a = 1
> b = 42
> 
> a = 100
> b = 200

Looks like a very handy type of "kwarg" to know about. It would be nice
if the doc writers weren't so mysterious about what can be used for
"kwargs", or if they explained somewhere what the possible "kwargs" can be.
Even in the one index entry for "kwargs", all they say about it is
"A dict of keyword arguments values", and that only applies to the
Signature object. I've seen "kwargs" in articles other than for
namedtuples, always mysteriously, with no details on what the
possibilities are and how to use them. (Makes me wonder where you
learned all this ... ;)
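For what it's worth, `**` also works in the other direction. In a function *definition*, a `**kwargs` parameter collects whatever keyword arguments the caller passes into an ordinary dict; the name `kwargs` is only a convention. A minimal sketch, with a made-up function name `show`:

```python
def show(**kwargs):
    # Inside the function, kwargs is a plain dict: keyword name -> value
    return kwargs

print(show(a=1, b=2))                        # {'a': 1, 'b': 2}
print(show(**{"Location": "Mount Vernon"}))  # {'Location': 'Mount Vernon'}
print(show())                                # {}
```

So `_replace(**fix)` unpacks the dict `fix` into keyword arguments, and a `**kwargs` parameter packs keyword arguments back into a dict.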
 
> Starting from a namedtuple `record`
> 
> record._replace(Location="elsewhere")
> 
> creates a new namedtuple with the Location attribute changed to "elsewhere",
> and the slice [:] on the left causes all items in the `group` list to be
> replaced with new namedtuples,
> 
> group[:] = [record._replace(Location="elsewhere") for record in group]
> 
> is basically the same as
> 
> tmp = group.copy()
> group.clear()
> for record in tmp:
>     group.append(record._replace(Location="elsewhere"))
> 
> To support not just Location, but also Kind and Notes we need 
> the double asterisk.

I saw this in the debugger, but again didn't really understand why it
was working. So this really clears it up, and I plan to look at it in
the debugger again to be sure I understand it all. Thanks very much.
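The reason the slice matters: inside complete(), `group` is the very list stored in the defaultdict, so the function has to change that list object rather than point its local name at a new list. A minimal sketch with made-up function names:

```python
def rebind(lst):
    # Rebinding only changes what the local name points to
    lst = [x * 10 for x in lst]

def replace_contents(lst):
    # Slice assignment swaps out the contents of the caller's list object
    lst[:] = [x * 10 for x in lst]

a = [1, 2, 3]
rebind(a)
print(a)            # [1, 2, 3]
replace_contents(a)
print(a)            # [10, 20, 30]
```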

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-15 Thread Peter Otten
Deborah Swanson wrote:

> I know it's your "ugly" answer, but can I ask what the '**' in
> 
> fix = {label: max(values, key=len)}
> group[:] = [record._replace(**fix) for record in group]
> 
> means? 

d = {"a": 1, "b": 2}
f(**d)

is equivalent to

f(a=1, b=2)

so ** is a means to call a function with keyword arguments when you want to 
decide about the *names* at runtime. Example:

>>> def f(a=1, b=2):
...     print("a =", a)
...     print("b =", b)
...     print()
...
>>> for d in [{"a": 10}, {"b": 42}, {"a": 100, "b": 200}]:
...     f(**d)
...
a = 10
b = 2

a = 1
b = 42

a = 100
b = 200

Starting from a namedtuple `record`

record._replace(Location="elsewhere")

creates a new namedtuple with the Location attribute changed to "elsewhere",
and the slice [:] on the left causes all items in the `group` list to be
replaced with new namedtuples,

group[:] = [record._replace(Location="elsewhere") for record in group]

is basically the same as

tmp = group.copy()
group.clear()
for record in tmp:
    group.append(record._replace(Location="elsewhere"))

To support not just Location, but also Kind and Notes we need the double 
asterisk.
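Putting the two pieces together, here is a small sketch of using `**` to pick the field to replace at runtime. The three-field `Record` type and its values are invented for illustration:

```python
from collections import namedtuple

# Hypothetical record type standing in for the real CSV columns
Record = namedtuple("Record", ["title", "Location", "Kind"])
r = Record("1br mobile", "", "trailer")

label = "Location"              # field name only known at runtime
fix = {label: "Mount Vernon"}
r2 = r._replace(**fix)          # same as r._replace(Location="Mount Vernon")

print(r2)  # Record(title='1br mobile', Location='Mount Vernon', Kind='trailer')
print(r)   # Record(title='1br mobile', Location='', Kind='trailer')
```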

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-14 Thread Deborah Swanson
Roel Schroeven wrote, on Thursday, April 13, 2017 5:26 PM
> 
> Gregory Ewing schreef op 13/04/2017 9:34:
> > Deborah Swanson wrote:
> >> Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
> >>
> >>> Personally I would immediately discard the header row once and for
> >>> all, not again and again on every operation.
> >> Well, perhaps, but I need the header row to stay in place to write 
> >> the list to a csv when I'm done
> > 
> > That's no problem, just write the header row separately.
> > 
> > Do this at the beginning:
> > 
> >     header = Record._make(fieldnames)
> >     records = [Record._make(row) for row in rows]
> > 
> > and then to write out the file:
> > 
> >     writer = csv.writer(outputfile)
> >     writer.writerow(header)
> >     writer.writerows(records)
> 
> I don't even think there's any need to store the field names anywhere 
> else than in fieldnames. So unless I'm missing something, 
> just do this 
> at the beginning:
> 
>  fieldnames = next(rows)
>  Record = namedtuple("Record", fieldnames)
>  records = [Record._make(row) for row in rows]
> 
> and this at the end:
> 
>  writer = csv.writer(outputfile)
>  writer.writerow(fieldnames) # or writer.writerow(Record._fields)
>  writer.writerows(records)
> 
> 
> -- 
> The saddest aspect of life right now is that science gathers 
> knowledge faster than society gathers wisdom.
>-- Isaac Asimov
> 
> Roel Schroeven

This is essentially what Peter Otten most recently recommended. I know
you got there first, but it is better to only get the header row to name
the fields as you and Greg Ewing suggested, and then use just the
records in processing the field data, using the field names only for the
output.
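Here is a small sketch of that arrangement. It assumes a tiny made-up two-column CSV held in an `io.StringIO` instead of a real file. The header is read once with `next()`, only data rows become records, and `Record._fields` supplies the header again on output:

```python
import csv
import io
from collections import namedtuple

# In-memory stand-in for the input file (made-up sample data)
infile = io.StringIO("title,Location\nA,Seattle\nB,\n")
rows = csv.reader(infile)
fieldnames = next(rows)                        # header read exactly once
Record = namedtuple("Record", fieldnames)
records = [Record._make(row) for row in rows]  # data rows only

outfile = io.StringIO()
writer = csv.writer(outfile, lineterminator="\n")
writer.writerow(Record._fields)                # header written exactly once
writer.writerows(records)                      # namedtuples are tuples, so this works
print(outfile.getvalue())
```

With real files, the csv module documentation recommends opening them with `newline=''`.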

Thanks,
Deborah

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Namedtuples: some unexpected inconveniences

2017-04-14 Thread Gregory Ewing

Peter Otten wrote:

> PS: Personally I would probably take the opposite direction and use dicts
> throughout...


Yes, my suggestion to use namedtuples in the first place was
based on the assumption that you would mostly be referring to
fields using fixed names. If that's not true, then using
namedtuples (or a mutable equivalent) might just be making
things harder.

If the names are sometimes fixed and sometimes not, then
you have a tradeoff to make.

In the code you posted most recently, the only fixed field
reference seems to be row.title, and that only appears once.
So as long as that's all you want to do with the rows,
storing them as dicts would appear to be a clear winner.
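For comparison, here is a sketch of the dict route using the standard `csv.DictReader` and `csv.DictWriter`. The sample data is made up and lives in an `io.StringIO`. Because each row is a mutable dict, a field chosen at runtime can be updated in place with `row[label] = ...`:

```python
import csv
import io

# In-memory stand-in for the input file (made-up sample rows)
infile = io.StringIO("title,Location\nA,Seattle\nA,\n")
reader = csv.DictReader(infile)
records = list(reader)                  # each row is a dict keyed by header names

label = "Location"                      # field name chosen at runtime
values = {row[label] for row in records} - {""}   # distinct non-empty values
if len(values) == 1:
    (value,) = values
    for row in records:
        row[label] = value              # dicts are mutable, so update in place

outfile = io.StringIO()
writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(records)
print(outfile.getvalue())
```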

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-14 Thread Deborah Swanson
Gregory Ewing wrote, on Thursday, April 13, 2017 12:17 AM
> 
> Deborah Swanson wrote:
> > But I think you got it right in your last sentence below. defaultdict
> > copied them because they were immutable,
> 
> No, definitely not. A defaultdict will never take it upon 
> itself to copy an object you give it, either as a key or a value.
> 
> The copying, if any, must have occurred somewhere else, in
> code that you didn't show us.
> 
> Can you show us the actual code you used to attempt to
> update the namedtuples?
> 
> -- 
> Greg

I think you've heard my sob story of how the actual code was lost
(PyCharm ate it).

I've made some attempts to recover that code, but honestly, at this
point Peter Otten has shown me enough examples of getattr() working
with namedtuples and variable field names that I'd rather just accept
that some transformation of the structure I did probably caused the
copying of values only. I remember looking at it in the debugger, but
that code was convoluted and I don't think it's worth teasing out exactly
what went wrong (or figuring out how I did it in the first place).
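Greg's point is easy to check directly. The sketch below uses an invented two-field `Record`. It also shows one common way to end up with exactly the "copied values" symptom, although this is only a guess at what happened, since the original code is lost. Rebinding the loop variable to the result of `_replace()` leaves the list holding the old tuple:

```python
from collections import defaultdict, namedtuple

# Hypothetical two-field record for illustration
Record = namedtuple("Record", ["title", "Location"])
r = Record("A", "")

groups = defaultdict(list)
groups[r.title].append(r)
print(groups["A"][0] is r)        # True: the list holds the same object, not a copy

for record in groups["A"]:
    # _replace builds a new tuple; this only rebinds the loop variable
    record = record._replace(Location="Seattle")

print(groups["A"][0].Location == "")   # True: the stored tuple is unchanged
```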

Deborah

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-14 Thread Deborah Swanson
MRAB wrote, on Friday, April 14, 2017 2:19 PM
> 
> In the line:
> 
>  values = {row[label] for row in group}
> 
> 'group' is a list of records; row is a record (namedtuple).
> 
> You can get the members of a namedtuple (also 'normal' tuple) by numeric
> index, e.g. row[0], but the point of a namedtuple is that you can get
> them by name, as an attribute, e.g. row.Location.
> 
> As the name of the attribute isn't fixed, but passed by name, use 
> getattr(row, label) instead:
> 
>  values = {getattr(row, label) for row in group}
> 
> As for the values:
> 
>     # Remove the missing value, if present.
>     values.discard('')
> 
>     # There's only 1 value left, so fill in the empty places.
>     if len(values) == 1:
>         ...

Thanks for this, but honestly, I'm namedtupled-out at the moment and I
have several other projects I'd like to be working on. But I saved your
suggestion with ones that others have made, so I'll revisit yours again
when I come back for another look at namedtuples.

> The next point is that namedtuples, like normal tuples, are immutable.
> You can't change the value of an attribute.

No you can't, but you can use

somenamedtuple._replace(kwargs) 

to replace the value. Works just as well.
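One caveat to "works just as well": `_replace()` never changes the tuple it is called on. It returns a new one, and that new tuple has to be stored somewhere (for example back into the list) for the change to stick. A small sketch with an invented `Record` type:

```python
from collections import namedtuple

# Hypothetical two-field record for illustration
Record = namedtuple("Record", ["title", "Location"])
records = [Record("A", "")]

try:
    records[0].Location = "Seattle"     # namedtuple fields are read-only
    assigned = True
except AttributeError:
    assigned = False

new = records[0]._replace(Location="Seattle")  # returns a NEW namedtuple
print(records[0].Location == "")        # True: the original is untouched
records[0] = new                        # store the result so the change sticks
print(records[0].Location)              # Seattle
```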

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-14 Thread Deborah Swanson

Peter Otten wrote, on Friday, April 14, 2017 2:16 PM
> > def complete(group, label):
> >     values = {row[label] for row in group}
> >     # get "TypeError: tuple indices must be integers, not str"
> 
> Yes, the function expects row to be dict-like. However when you change
> 
> row[label]
> 
> to
> 
> getattr(row, label)
> 
> this part of the code will work...
> 
> >     has_empty = not min(values, key=len)
> >     if len(values) - has_empty != 1:
> >         # no value or multiple values; manual intervention needed
> >         return False
> >     elif has_empty:
> >         for row in group:
> >             row[label] = max(values, key=len)
> 
> but here you'll get an error. I made the experiment to change everything
> necessary to make it work with namedtuples, but you'll probably find the
> result a bit hard to follow:
> 
> import csv
> from collections import namedtuple, defaultdict
> 
> INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv"
> OUTFILE = "tmp.csv"
> 
> def get_title(row):
>     return row.title
> 
> def complete(group, label):
>     values = {getattr(row, label) for row in group}
>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         # replace namedtuples in the group. Yes, it's ugly
>         fix = {label: max(values, key=len)}
>         group[:] = [record._replace(**fix) for record in group]
>     return True
> 
> # newline="" is what the csv module docs recommend for files it reads/writes
> with open(INFILE, newline="") as infile:
>     rows = csv.reader(infile)
>     fieldnames = next(rows)
>     Record = namedtuple("Record", fieldnames)
>     groups = defaultdict(list)
>     for row in rows:
>         record = Record._make(row)
>         groups[get_title(record)].append(record)
> 
> LABELS = ['Location', 'Kind', 'Notes']
> 
> # add missing values
> for group in groups.values():
>     for label in LABELS:
>         complete(group, label)
> 
> # dump data (as a demo that you do not need the list of all records)
> with open(OUTFILE, "w", newline="") as outfile:
>     writer = csv.writer(outfile)
>     writer.writerow(fieldnames)
>     writer.writerows(
>         record for group in groups.values() for record in group
>     )
> 
> One alternative is to keep the original and try to replace the namedtuple
> with the class suggested by Gregory Ewing. Then it should suffice to also
> change
> 
> >     elif has_empty:
> >         for row in group:
> >             row[label] = max(values, key=len)
> 
> to
> 
> >     elif has_empty:
> >         for row in group:
> >             setattr(row, label, max(values, key=len))
> 
> PS: Personally I would probably take the opposite direction and use dicts
> throughout...

Ok, thank you. I haven't run it on a real input file yet, but this seems
to work with the test file.

Because the earlier incarnation defined 'values' as

values = {row[label] for row in group}

I'd incorrectly guessed what was going on in 

has_empty = not min(values, key=len).

Now that 

values = {getattr(row, label) for row in group}

works properly as you intended it to, I see you get the set of unique
values for that label in that group, which makes the rest of it make
sense.
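Here is a worked example of the `has_empty` trick, using made-up values. `min(values, key=len)` picks the empty string whenever one is present, and `not ''` is `True`. Since `True` counts as 1 in arithmetic, `len(values) - has_empty` is the number of distinct non-empty values:

```python
values = {"", "Mount Vernon"}           # distinct values of one label in one group
has_empty = not min(values, key=len)    # min by length is '' when present; not '' is True
print(has_empty)                        # True
print(len(values) - has_empty)          # 1 (True counts as 1): one real value, so fill in
print(max(values, key=len))             # Mount Vernon

conflict = {"", "Kelso", "Longview"}
print(len(conflict) - (not min(conflict, key=len)))  # 2: manual intervention needed
```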

I know it's your "ugly" answer, but can I ask what the '**' in

fix = {label: max(values, key=len)}
group[:] = [record._replace(**fix) for record in group]

means? 

I haven't seen it before, and I imagine it's one of the possible
'kwargs' in 'somenamedtuple._replace(kwargs)', but I have no idea where
to look up the possible 'kwargs'. (probably short for keyword args) 

Also, I don't see how you get a set for values with the notation you
used. Looks like if anything you've got a comprehension that should give
you a dict. (But I haven't worked a lot with sets either.)
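On the set question: braces around a single expression make a set comprehension, while braces around a `key: value` pair make a dict comprehension. `values = {getattr(row, label) for row in group}` has no colon, so it builds a set. A quick sketch:

```python
words = ["a", "bb", "a"]
s = {w for w in words}            # braces around one expression: set comprehension
d = {w: len(w) for w in words}    # braces around key: value: dict comprehension
e = {}                            # bare braces: an empty dict, not an empty set

print(type(s).__name__, sorted(s))  # set ['a', 'bb']
print(d)                            # {'a': 1, 'bb': 2}
print(type(e).__name__)             # dict (use set() for an empty set)
```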

Thanks

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Namedtuples: some unexpected inconveniences

2017-04-14 Thread MRAB

On 2017-04-14 20:34, Deborah Swanson wrote:

> Peter,
> 
> Retracing my steps to rewrite the getattr(row, label) code, this is what
> sent me down the rabbit hole in the first place. (I changed your 'rows'
> to 'records' just to use the same name everywhere, but all else is the
> same as you gave me.) I'd like you to look at it and see if you still
> think complete(group, label) should work. Perhaps seeing why it fails
> will clarify some of the difficulties I'm having.
> 
> I ran into problems with values and has_empty. values has a problem because
> row[label] gets a TypeError. has_empty has a problem because a list of
> field values will be shorter with missing values than a full list, but a
> namedtuple with missing values will be the same length as a full
> namedtuple since missing values have '' placeholders.  Two more
> unexpected inconveniences.


In the line:

values = {row[label] for row in group}

'group' is a list of records; row is a record (namedtuple).

You can get the members of a namedtuple (also 'normal' tuple) by numeric 
index, e.g. row[0], but the point of a namedtuple is that you can get 
them by name, as an attribute, e.g. row.Location.


As the name of the attribute isn't fixed, but passed by name, use 
getattr(row, label) instead:


values = {getattr(row, label) for row in group}


As for the values:

# Remove the missing value, if present.
values.discard('')

# There's only 1 value left, so fill in the empty places.
if len(values) == 1:
    ...


The next point is that namedtuples, like normal tuples, are immutable. 
You can't change the value of an attribute.
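Here is one possible way to finish that idea, sketched as a standalone function. The name `fill_missing` and the return-a-new-list design are assumptions of this sketch, not MRAB's code. Because namedtuples are immutable, it builds replacements with `_replace()`:

```python
from collections import namedtuple

def fill_missing(group, label):
    """Return a new list where blank `label` fields are filled in, or the
    group unchanged if it has no non-empty value or conflicting values."""
    values = {getattr(row, label) for row in group}
    values.discard("")                  # drop the missing-value placeholder
    if len(values) != 1:
        return group                    # nothing to fill, or needs a human
    (value,) = values
    return [row._replace(**{label: value}) for row in group]

# Hypothetical two-field record and sample group
Record = namedtuple("Record", ["title", "Location"])
group = [Record("A", "Seattle"), Record("A", "")]
print(fill_missing(group, "Location"))
# [Record(title='A', Location='Seattle'), Record(title='A', Location='Seattle')]
```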



> A short test csv is at the end, for you to read in and attempt to
> execute the following code, and I'm still working on reconstructing the
> lost getattr(row, label) code.
> 
> import csv
> from collections import namedtuple, defaultdict
> 
> def get_title(row):
>     return row.title
> 
> def complete(group, label):
>     values = {row[label] for row in group}
>     # get "TypeError: tuple indices must be integers, not str"
>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)
>     return True
> 
> infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
> rows = csv.reader(infile)
> fieldnames = next(rows)
> Record = namedtuple("Record", fieldnames)
> records = [Record._make(fieldnames)]
> records.extend(Record._make(row) for row in rows)
> 
> # group rows by title
> groups = defaultdict(list)
> for row in records:
>     groups[get_title(row)].append(row)
> 
> LABELS = ['Location', 'Kind', 'Notes']
> 
> # add missing values
> for group in groups.values():
>     for label in LABELS:
>         complete(group, label)
> 
> Moving 2017 in - test.csv:
> (If this doesn't come through the mail system correctly, I've also
> uploaded the file to
> http://deborahswanson.net/python/Moving%202017%20in%20-%20test.csv.
> Permissions should be set correctly, but let me know if you run into
> problems downloading the file.)


[snip]
--
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-14 Thread Peter Otten
Deborah Swanson wrote:

> Peter,
> 
> Retracing my steps to rewrite the getattr(row, label) code, this is what
> sent me down the rabbit hole in the first place. (I changed your 'rows'
> to 'records' just to use the same name everywhere, but all else is the
> same as you gave me.) I'd like you to look at it and see if you still
> think complete(group, label) should work. Perhaps seeing why it fails
> will clarify some of the difficulties I'm having.
> 
> I ran into problems with values and has_empty. values has a problem because
> row[label] gets a TypeError. has_empty has a problem because a list of
> field values will be shorter with missing values than a full list, but a
> namedtuple with missing values will be the same length as a full
> namedtuple since missing values have '' placeholders.  Two more
> unexpected inconveniences.
> 
> A short test csv is at the end, for you to read in and attempt to
> execute the following code, and I'm still working on reconstructing the
> lost getattr(row, label) code.
> 
> import csv
> from collections import namedtuple, defaultdict
> 
> def get_title(row):
>     return row.title
> 
> def complete(group, label):
>     values = {row[label] for row in group}
>     # get "TypeError: tuple indices must be integers, not str"

Yes, the function expects row to be dict-like. However when you change 

row[label]

to

getattr(row, label)

this part of the code will work...

>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)

but here you'll get an error. I made the experiment to change everything 
necessary to make it work with namedtuples, but you'll probably find the 
result a bit hard to follow:

import csv
from collections import namedtuple, defaultdict

INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv"
OUTFILE = "tmp.csv"

def get_title(row):
    return row.title

def complete(group, label):
    values = {getattr(row, label) for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        # replace namedtuples in the group. Yes, it's ugly
        fix = {label: max(values, key=len)}
        group[:] = [record._replace(**fix) for record in group]
    return True

# newline="" is what the csv module docs recommend for files it reads/writes
with open(INFILE, newline="") as infile:
    rows = csv.reader(infile)
    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    groups = defaultdict(list)
    for row in rows:
        record = Record._make(row)
        groups[get_title(record)].append(record)

LABELS = ['Location', 'Kind', 'Notes']

# add missing values
for group in groups.values():
    for label in LABELS:
        complete(group, label)

# dump data (as a demo that you do not need the list of all records)
with open(OUTFILE, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    writer.writerows(
        record for group in groups.values() for record in group
    )
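To see the function above in action without the CSV file, here is a sketch that exercises Peter's `complete()` (same logic, copied in) on a made-up group of records:

```python
from collections import namedtuple

def complete(group, label):
    # Peter Otten's complete() from the message above, same logic
    values = {getattr(row, label) for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        fix = {label: max(values, key=len)}
        group[:] = [record._replace(**fix) for record in group]
    return True

# Made-up sample group; the real data comes from the CSV file
Record = namedtuple("Record", ["title", "Location", "Kind"])
group = [Record("A", "Mount Vernon", "trailer"), Record("A", "", "")]

print(complete(group, "Location"), group[1].Location)  # True Mount Vernon
print(complete(group, "Kind"), group[1].Kind)          # True trailer
print(complete([Record("B", "", "")], "Kind"))         # False: no value to copy
```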

One alternative is to keep the original and try to replace the namedtuple 
with the class suggested by Gregory Ewing. Then it should suffice to also 
change

>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)

to

>     elif has_empty:
>         for row in group:
>             setattr(row, label, max(values, key=len))
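The `setattr()` version only works if the rows are mutable. Greg's suggestion was the third-party recordclass package. As a library-free illustration instead, here is a hypothetical plain class with `__slots__` that allows exactly the listed fields:

```python
class Record:
    """Hypothetical mutable stand-in for a two-field namedtuple."""
    __slots__ = ("title", "Location")   # fixes the set of allowed attributes

    def __init__(self, title, Location):
        self.title = title
        self.Location = Location

row = Record("A", "")
label = "Location"                       # field name chosen at runtime
setattr(row, label, "Seattle")           # fine: instances are mutable
print(row.Location)                      # Seattle

try:
    setattr(row, "Locaton", "typo")      # misspelled field name
except AttributeError:
    print("no such field")               # __slots__ rejects unknown names
```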

PS: Personally I would probably take the opposite direction and use dicts 
throughout...

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-14 Thread Deborah Swanson
Peter,

Retracing my steps to rewrite the getattr(row, label) code, this is what
sent me down the rabbit hole in the first place. (I changed your 'rows'
to 'records' just to use the same name everywhere, but all else is the
same as you gave me.) I'd like you to look at it and see if you still
think complete(group, label) should work. Perhaps seeing why it fails
will clarify some of the difficulties I'm having.

I ran into problems with values and has_empty. values has a problem because
row[label] gets a TypeError. has_empty has a problem because a list of
field values will be shorter with missing values than a full list, but a
namedtuple with missing values will be the same length as a full
namedtuple since missing values have '' placeholders.  Two more
unexpected inconveniences. 

A short test csv is at the end, for you to read in and attempt to
execute the following code, and I'm still working on reconstructing the
lost getattr(row, label) code.

import csv
from collections import namedtuple, defaultdict

def get_title(row):
    return row.title

def complete(group, label):
    values = {row[label] for row in group}
    # get "TypeError: tuple indices must be integers, not str"
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        for row in group:
            row[label] = max(values, key=len)
    return True

infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
rows = csv.reader(infile)
fieldnames = next(rows)
Record = namedtuple("Record", fieldnames)
records = [Record._make(fieldnames)]
records.extend(Record._make(row) for row in rows)

# group rows by title
groups = defaultdict(list)
for row in records:
    groups[get_title(row)].append(row)

LABELS = ['Location', 'Kind', 'Notes']

# add missing values
for group in groups.values():
    for label in LABELS:
        complete(group, label)

Moving 2017 in - test.csv:
(If this doesn't come through the mail system correctly, I've also
uploaded the file to
http://deborahswanson.net/python/Moving%202017%20in%20-%20test.csv.
Permissions should be set correctly, but let me know if you run into
problems downloading the file.)


CLDesc,url,title,Description,Location,ST,co,miles,Kind,Rent,Date,first,br,Notes,yesno,mark,arc
Jan 3 1 bedroom/1 bath mobile $700 1br - (Mount Vernon),http://skagit.craigslist.org/apa/5943737902.html,1 bedroom/1 bath mobile $700 1br - (Mount Vernon) - (Mount Vernon),1 bedroom/1 bath mobile $700 1br,Mount Vernon,WA,sk,,trailer,700,1/3/2017,1/3/2017,1,no smoking,,,deleted by its author
Jan 6 1 bedroom/1 bath mobile $700 1br - (Mount Vernon),http://skagit.craigslist.org/apa/5943737902.html,1 bedroom/1 bath mobile $700 1br - (Mount Vernon) - (Mount Vernon),1 bedroom/1 bath mobile $700 1br,,WA,,,trailer,700,1/6/2017,1/3/2017,1,no smoking,,,
Jan 10 1 bedroom/1 bath mobile $700 1br - (Mount Vernon),http://skagit.craigslist.org/apa/5943737902.html,1 bedroom/1 bath mobile $700 1br - (Mount Vernon) - (Mount Vernon),1 bedroom/1 bath mobile $700 1br,,700,1/10/2017,1/3/2017,1
Jan 17 1 bedroom/1 bath mobile $700 1br - (Mount Vernon),http://skagit.craigslist.org/apa/5943737902.html,1 bedroom/1 bath mobile $700 1br - (Mount Vernon) - (Mount Vernon),1 bedroom/1 bath mobile $700 1br,Mount Vernon,WA,,,trailer,700,1/17/2017,1/3/2017,1,no smoking,,,
Jan 19 1 bedroom/1 bath mobile $700 1br - (Mount Vernon),http://skagit.craigslist.org/apa/5943737902.html,1 bedroom/1 bath mobile $700 1br - (Mount Vernon) - (Mount Vernon),1 bedroom/1 bath mobile $700 1br,Mount Vernon,WA,,,trailer,700,1/19/2017,1/3/2017,1,no smoking,,,
Jan 26 1240 8th Avenue $725 2br - 676ft2 - (Longview),http://portland.craigslist.org/clk/apa/5976442500.html,1240 8th Avenue $725 2br - 676ft2 - (Longview),1240 8th Avenue $725 2br - 676ft2,,725,1/26/2017,1/16/2017,2
Jan 16 1240 8th Avenue $725 2br - 676ft2 - (Longview),http://portland.craigslist.org/clk/apa/5961794305.html,1240 8th Avenue $725 2br - 676ft2 - (Longview) - (Longview),1240 8th Avenue $725 2br - 676ft2,Longview,WA,,,house,725,1/16/2017,1/16/2017,2,"detached garage, w/d hookups",,,
Jan 6 1424 California Avenue $750 2br - 1113ft2 - (Klamath Falls),http://klamath.craigslist.org/apa/5947977083.html,1424 California Avenue $750 2br - 1113ft2 - (Klamath Falls) - (Klamath Falls),1424 California Avenue $750 2br - 1113ft2,Klamath Falls,OR,kl,,house,750,1/6/2017,1/6/2017,2,no smoking,,,
Jan 11 1424 California Avenue $750 2br - 1113ft2 - (Klamath Falls),http://klamath.craigslist.org/apa/5947977083.html,1424 California Avenue $750 2br - 1113ft2 - (Klamath Falls) - (Klamath Falls),1424 California Avenue $750 2br - 1113ft2,,OR,kl,,house,750,1/11/2017,1/6/2017,2,no smoking,,,
"Jan 3 1838 Alma Drive Kelso, WA 98626 $550 1br - 600ft2 - (1838 Alma
Drive Kelso,
WA)",http://portland.craigslist.org/clk/apa/5937961608.html,"1838 Alma
Drive Kelso, WA 98626 $550 1br - 

RE: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Deborah Swanson
Roel Schroeven wrote, on Thursday, April 13, 2017 5:26 PM
> 
> Gregory Ewing schreef op 13/04/2017 9:34:
> > Deborah Swanson wrote:
> >> Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
> >>
> >>> Personally I would immediately discard the header row once and for
> >>> all, not again and again on every operation.
> >> Well, perhaps, but I need the header row to stay in place to write 
> >> the list to a csv when I'm done
> > 
> > That's no problem, just write the header row separately.
> > 
> > Do this at the beginning:
> > 
> >     header = Record._make(fieldnames)
> >     records = [Record._make(row) for row in rows]
> > 
> > and then to write out the file:
> > 
> >     writer = csv.writer(outputfile)
> >     writer.writerow(header)
> >     writer.writerows(records)
> 
> I don't even think there's any need to store the field names anywhere 
> else than in fieldnames. So unless I'm missing something, 
> just do this 
> at the beginning:
> 
>  fieldnames = next(rows)
>  Record = namedtuple("Record", fieldnames)
>  records = [Record._make(row) for row in rows]
> 
> and this at the end:
> 
>  writer = csv.writer(outputfile)
>  writer.writerow(fieldnames) # or writer.writerow(Record._fields)
>  writer.writerows(records)
> 
> 
> -- 
> The saddest aspect of life right now is that science gathers 
> knowledge faster than society gathers wisdom.
>-- Isaac Asimov
> 
> Roel Schroeven

Thanks Roel. I'll try your version when I get the code reconstructed,
and that might take a few to several days. I'll try to get back to you
though on how it goes.

Read the previous messages if you want the sad story of what happened to
the original code.

Deborah

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Deborah Swanson
Gregory Ewing wrote, on Thursday, April 13, 2017 1:14 AM
> 
> Deborah Swanson wrote:
> > I don't exactly understand your point (2). If the namedtuple does not
> > have a label attribute, then getattr(record, label) will get the error
> > whether the name label holds the string 'label' or not.
> 
> You sound rather confused. Maybe the following interactive 
> session transcript will help.
> 
>  >>> from collections import namedtuple
>  >>> record = namedtuple('record', 'alpha,beta')
>  >>> r = record(1, 2)
>  >>> r
> record(alpha=1, beta=2)
>  >>> label = 'alpha'
>  >>> getattr(r, label)
> 1
>  >>> label = 'beta'
>  >>> getattr(r, label)
> 2
>  >>> label = 'gamma'
>  >>> getattr(r, label)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'record' object has no attribute 'gamma'
> 
> Can you see what's happening here? The expression
> 
> label
> 
> is being evaluated, and whatever string it evaluates to is 
> being used as the attribute name to look up.
> 
> Now, I'm not sure exactly what you were doing to get the 
> message "'record' object has no attribute 'label'". Here are 
> a few possible ways to get that effect:
> 
>  >>> r.label
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'record' object has no attribute 'label'
> 
>  >>> getattr(r, 'label')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'record' object has no attribute 'label'
> 
>  >>> label = 'label'
>  >>> getattr(r, label)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'record' object has no attribute 'label'
> 
> Or maybe you did something else again. We would need to
> see your code in order to tell.
> 
> -- 
> Greg

And it's reproducing the code that's the roadblock to all of these
issues.

Rest assured I will get to the bottom of this, or at least come back
with the code to ask more questions about it and let you see what I had.
I want to see what's going on here too.

Might be a day or two though.

Deborah

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Deborah Swanson
Gregory Ewing wrote, on Thursday, April 13, 2017 12:34 AM
> 
> Deborah Swanson wrote:
> > Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
> > 
> >> Personally I would immediately discard the header row once and for 
> >> all, not again and again on every operation.
> > 
> > Well, perhaps, but I need the header row to stay in place to write the
> > list to a csv when I'm done
> 
> That's no problem, just write the header row separately.
> 
> Do this at the beginning:
> 
>     header = Record._make(fieldnames)
>     records = [Record._make(row) for row in rows]
> 
> and then to write out the file:
> 
>     writer = csv.writer(outputfile)
>     writer.writerow(header)
>     writer.writerows(records)
> 
> > There might be a tiny performance edge in discarding the header row
> > for the sort, but there would also be a hit to recreate it at output
> > time.
> 
> It's not about performance, it's about keeping the code as 
> clean and simple as you can, thus making it easier to 
> understand and maintain.
> 
> The general idea to take away from this is that it's almost 
> always best to arrange things so that a given collection 
> contains just one kind of data, so you can treat every 
> element of it in exactly the same way.
> 
> -- 
> Greg

That's good advice and I'll rewrite it that way, after I have the code I
started with to answer the other questions.

I certainly know I have a lot to learn about writing good code, and I
can see that what you're suggesting is much cleaner than what I had.

Thanks,
Deborah

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Deborah Swanson
Gregory Ewing wrote, on Thursday, April 13, 2017 12:36 AM
> 
> If you want to be able to update your rows, you may find
> this useful:
> 
> https://pypi.python.org/pypi/recordclass
> 
> It's very similar to a namedtuple, but mutable. Looks like it should be
> a drop-in replacement.
> 
> -- 
> Greg

Thanks Greg, I'll definitely take a look at it.

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Deborah Swanson
Gregory Ewing wrote, on Thursday, April 13, 2017 12:17 AM
> 
> Deborah Swanson wrote:
> > But I think you got it right in your last sentence below. defaultdict
> > copied them because they were immutable,
> 
> No, definitely not. A defaultdict will never take it upon 
> itself to copy an object you give it, either as a key or a value.
> 
> The copying, if any, must have occurred somewhere else, in
> code that you didn't show us.
> 
> Can you show us the actual code you used to attempt to
> update the namedtuples?
> 
> -- 
> Greg

As I just told Peter, I discovered earlier today that all of that
code is lost, and it will take a while to rewrite. And now I have several
reasons to do so.

I don't know how long it will take, but I will come back and produce the
code that gave me this behavior.

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Deborah Swanson
Peter Otten wrote, on Thursday, April 13, 2017 12:17 AM
> 
> Deborah Swanson wrote:
> 
> > Peter Otten wrote, on Wednesday, April 12, 2017 11:35 PM
> >> 
> >> Deborah Swanson wrote:
> >> 
> >> > It's a small point, but I suspect getattr(record, label) would still
> >> > fail, even if label's value is 'label' and only 'label', but what's
> >> > the point of having a variable if it will only ever have just one
> >> > value?
> >> 
> >> You are misunderstanding. Your getattr() call fails because you have
> >> 
> >> label = "label"
> >> 
> >> buried somewhere in your code. As soon as you change that to
> >> 
> >> label = 
> >> 
> >> the error will go away.
> > 
> > 
> > Yes, the error goes away, but now getattr(record, label) is useless 
> > for processing field names, unless you want to write a line of code 
> > for each one. (I have 17 field names, and forget about passing label
> > to a function.)

Uh-oh, I boobooed and misread what you wrote here. 

> No, it's not useless:
> 
> >>> from collections import namedtuple
> >>> T = namedtuple("T", "foo bar baz")
> >>> t = T(1, 2, 3)
> >>> for name in t._fields:
> ...     print(name, "=", getattr(t, name))
> ... 
> foo = 1
> bar = 2
> baz = 3

Wow. Ok, I can see that in the specific circumstance where I got the
"object has no attribute 'label'" error, the cause was quite likely not
using getattr() with a variable for a namedtuple field name; probably
some other factor was at work.

Unfortunately, when I shifted gears on the overall problem and abandoned
the strategy of making the group-by defaultdict, I renamed the project
file and started over, going back to the original list of namedtuples.
As a result, all the back versions of my code producing this error were
lost.

I've spent the better part of today rewriting the lost code, and I'm
nowhere near finished, and now my illness is ganging up on me again. So
anything further will have to wait until tomorrow.

I remain quite sure that at no point did I have the line

label = "label"

in my code, and I wouldn't even have thought of writing it because it's
so absurd in so many ways. Hopefully I can show you what I wrote soon,
and you can see for yourself.

> And as a special service here's a mutable datatype with sufficient 
> namedtuple compatibility to replicate the above snippet:
> 
> $ cat namedtuple_replacement.py
> def struct(name, wanted_columns):
>     class Struct:
>         _fields = __slots__ = wanted_columns.split()
> 
>         def __init__(self, *args):
>             names = self.__slots__
>             if len(args) != len(names):
>                 raise ValueError
>             for name, value in zip(names, args):
>                 setattr(self, name, value)
> 
>         @classmethod
>         def _make(cls, args):
>             return cls(*args)
> 
>         def __repr__(self):
>             names = self.__slots__
>             return "{}({})".format(
>                 self.__class__.__name__,
>                 ", ".join("{}={!r}".format(n, getattr(self, n)) for n in names)
>             )
> 
>     Struct.__name__ = name
>     return Struct
> 
> T = struct("T", "foo bar baz")
> t = T(1, 2, 3)
> print(t)
> for name in t._fields:
>     print(name, "=", getattr(t, name))
> t.bar = 42
> print(t)
> $ python3 namedtuple_replacement.py
> T(foo=1, bar=2, baz=3)
> foo = 1
> bar = 2
> baz = 3
> T(foo=1, bar=42, baz=3)

Thank you for this datatype definition.

I won't take a serious look at it until I rewrite the code I lost and
get to the bottom of why getattr() got the attribute error, but once
that issue is resolved I will return to your mutable datatype with
namedtuple compatibility (to some extent, I gather).

I apologize for the delay, but your simple getattr() example above
demands that I find out why it wasn't working for me before moving on
with the rest of this. And that will take some time. Probably not the 3
weeks it took me to get to the point where I was consistently seeing the
error, but it will be a while. I'll come back to address these issues and
your datatype when I've got code to show what I did. (For my own sanity,
if no other reason.)

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Roel Schroeven

Gregory Ewing wrote on 13/04/2017 9:34:

> Deborah Swanson wrote:
>
>> Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
>>
>>> Personally I would immediately discard the header row once and for
>>> all, not again and again on every operation.
>>
>> Well, perhaps, but I need the header row to stay in place to write the
>> list to a csv when I'm done
>
> That's no problem, just write the header row separately.
>
> Do this at the beginning:
>
>    header = Record._make(fieldnames)
>    records = [Record._make(row) for row in rows]
>
> and then to write out the file:
>
>    writer = csv.writer(outputfile)
>    writer.writerow(header)
>    writer.writerows(records)


I don't even think there's any need to store the field names anywhere
other than in fieldnames. So unless I'm missing something, just do this
at the beginning:


fieldnames = next(rows)
Record = namedtuple("Record", fieldnames)
records = [Record._make(row) for row in rows]

and this at the end:

writer = csv.writer(outputfile)
writer.writerow(fieldnames) # or writer.writerow(Record._fields)
writer.writerows(records)
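A self-contained sketch of this approach, assuming a small in-memory CSV in place of the real file (the column names title, Date and Location and the sample rows are invented for illustration). The header row is consumed once by next(), and Record._fields supplies it again at write time, so the records list holds nothing but data rows. With a real file, the csv module documentation recommends opening it with newline="".

```python
import csv
import io
from collections import namedtuple


def load_records(infile):
    """Read CSV rows from a file object; the header becomes the Record fields."""
    rows = csv.reader(infile)
    fieldnames = next(rows)                 # consume the header row once
    Record = namedtuple("Record", fieldnames)
    records = [Record._make(row) for row in rows]
    return Record, records


def save_records(outfile, Record, records):
    """Write the header from Record._fields, then every data row."""
    writer = csv.writer(outfile)
    writer.writerow(Record._fields)
    writer.writerows(records)


# Hypothetical sample data standing in for the real CSV file.
source = io.StringIO(
    "title,Date,Location\n"
    "Flat A,2017-04-01,\n"
    "Flat B,2017-04-02,Oakland\n"
)
Record, records = load_records(source)

out = io.StringIO()
save_records(out, Record, records)
print(out.getvalue())
```

Nothing about the header has to be carried through the processing steps; it is rebuilt from the Record class when the file is written.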


--
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
  -- Isaac Asimov

Roel Schroeven

--
https://mail.python.org/mailman/listinfo/python-list


Re: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Gregory Ewing

Deborah Swanson wrote:

> I don't exactly understand your point (2). If the namedtuple does not
> have a label attribute, then getattr(record, label) will get the error
> whether the name label holds the string 'label' or not.


You sound rather confused. Maybe the following interactive
session transcript will help.

>>> from collections import namedtuple
>>> record = namedtuple('record', 'alpha,beta')
>>> r = record(1, 2)
>>> r
record(alpha=1, beta=2)
>>> label = 'alpha'
>>> getattr(r, label)
1
>>> label = 'beta'
>>> getattr(r, label)
2
>>> label = 'gamma'
>>> getattr(r, label)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'record' object has no attribute 'gamma'

Can you see what's happening here? The expression

   label

is being evaluated, and whatever string it evaluates to is
being used as the attribute name to look up.
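The same point holds when the field name is passed into a function. The sketch below assumes a hypothetical Record with fields title, Date and Location: column_values() never mentions a field by name, it just receives a string, so one loop over Record._fields covers every column, whether there are three or seventeen. Indexing with a string is a different matter: a namedtuple is still a tuple, so record["title"] raises TypeError, and a name has to be turned into a position with _fields.index() first.

```python
from collections import namedtuple

# Hypothetical record layout, for illustration only.
Record = namedtuple("Record", "title Date Location")
records = [
    Record("Flat A", "2017-04-01", ""),
    Record("Flat B", "2017-04-02", "Oakland"),
]


def column_values(records, label):
    """Return field `label` of every record; `label` is just a string."""
    return [getattr(record, label) for record in records]


# The field name can come from a loop variable, a parameter, a config file...
for label in Record._fields:
    print(label, column_values(records, label))

# Square brackets want a position, not a name.
try:
    records[0]["title"]
except TypeError as exc:
    print("TypeError:", exc)

# Translate the name to a position when indexing is really wanted.
index = Record._fields.index("title")
print(records[0][index])
```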

Now, I'm not sure exactly what you were doing to get the
message "'record' object has no attribute 'label'".
Here are a few possible ways to get that effect:

>>> r.label
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'record' object has no attribute 'label'

>>> getattr(r, 'label')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'record' object has no attribute 'label'

>>> label = 'label'
>>> getattr(r, label)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'record' object has no attribute 'label'

Or maybe you did something else again. We would need to
see your code in order to tell.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Gregory Ewing

If you want to be able to update your rows, you may find
this useful:

https://pypi.python.org/pypi/recordclass

It's very similar to a namedtuple, but mutable. Looks like it
should be a drop-in replacement.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Gregory Ewing

Deborah Swanson wrote:

> Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
>
>> Personally I would immediately discard the header row once and for
>> all, not again and again on every operation.
>
> Well, perhaps, but I need the header row to stay in place to write the
> list to a csv when I'm done


That's no problem, just write the header row separately.

Do this at the beginning:

  header = Record._make(fieldnames)
  records = [Record._make(row) for row in rows]

and then to write out the file:

  writer = csv.writer(outputfile)
  writer.writerow(header)
  writer.writerows(records)


> There might be a tiny performance edge in discarding the header
> row for the sort, but there would also be a hit to recreate it at output
> time.


It's not about performance, it's about keeping the code as clean
and simple as you can, thus making it easier to understand and
maintain.

The general idea to take away from this is that it's almost always
best to arrange things so that a given collection contains just
one kind of data, so you can treat every element of it in exactly
the same way.
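As a small illustration of that advice, assuming a hypothetical Record type and ISO-formatted date strings (which sort correctly as plain text): once the header row lives elsewhere, the list can be sorted in place by two fields with operator.attrgetter, with no records[1:] slicing.

```python
import operator
from collections import namedtuple

# Hypothetical fields; the dates are ISO strings so text order is date order.
Record = namedtuple("Record", "title Date Location")
records = [
    Record("Flat B", "2017-04-02", "Oakland"),
    Record("Flat A", "2017-04-03", ""),
    Record("Flat A", "2017-04-01", "Berkeley"),
]

# Every element is a data row, so the whole list can be sorted directly.
records.sort(key=operator.attrgetter("title", "Date"))
for record in records:
    print(record)
```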

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Gregory Ewing

Deborah Swanson wrote:

> But I think you got it right in your last sentence below. defaultdict
> copied them because they were immutable,


No, definitely not. A defaultdict will never take it upon
itself to copy an object you give it, either as a key or a
value.

The copying, if any, must have occurred somewhere else, in
code that you didn't show us.

Can you show us the actual code you used to attempt to
update the namedtuples?

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Peter Otten
Deborah Swanson wrote:

> Peter Otten wrote, on Wednesday, April 12, 2017 11:35 PM
>> 
>> Deborah Swanson wrote:
>> 
>> > It's a small point, but I suspect getattr(record, label) would still
>> > fail, even if label's value is 'label' and only 'label', but what's
>> > the point of having a variable if it will only ever have just one
>> > value?
>> 
>> You are misunderstanding. Your getattr() call fails because you have
>> 
>> label = "label"
>> 
>> buried somewhere in your code. As soon as you change that to
>> 
>> label = 
>> 
>> the error will go away.
> 
> 
> Yes, the error goes away, but now getattr(record, label) is useless for
> processing field names, unless you want to write a line of code for each
> one. (I have 17 field names, and forget about passing label to a
> function.)

No, it's not useless:

>>> from collections import namedtuple
>>> T = namedtuple("T", "foo bar baz")
>>> t = T(1, 2, 3)
>>> for name in t._fields:
...     print(name, "=", getattr(t, name))
... 
foo = 1
bar = 2
baz = 3

And as a special service here's a mutable datatype with sufficient 
namedtuple compatibility to replicate the above snippet:

$ cat namedtuple_replacement.py
def struct(name, wanted_columns):
    class Struct:
        _fields = __slots__ = wanted_columns.split()

        def __init__(self, *args):
            names = self.__slots__
            if len(args) != len(names):
                raise ValueError
            for name, value in zip(names, args):
                setattr(self, name, value)

        @classmethod
        def _make(cls, args):
            return cls(*args)

        def __repr__(self):
            names = self.__slots__
            return "{}({})".format(
                self.__class__.__name__,
                ", ".join("{}={!r}".format(n, getattr(self, n)) for n in names)
            )

    Struct.__name__ = name
    return Struct

T = struct("T", "foo bar baz")
t = T(1, 2, 3)
print(t)
for name in t._fields:
    print(name, "=", getattr(t, name))
t.bar = 42
print(t)
$ python3 namedtuple_replacement.py
T(foo=1, bar=2, baz=3)
foo = 1
bar = 2
baz = 3
T(foo=1, bar=42, baz=3)


-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Deborah Swanson
Peter Otten wrote, on Wednesday, April 12, 2017 11:35 PM
> 
> Deborah Swanson wrote:
> 
> > It's a small point, but I suspect getattr(record, label) would still
> > fail, even if label's value is 'label' and only 'label', but what's 
> > the point of having a variable if it will only ever have just one 
> > value?
> 
> You are misunderstanding. Your getattr() call fails because you have
> 
> label = "label"
> 
> buried somewhere in your code. As soon as you change that to
> 
> label = 
> 
> the error will go away. 


Yes, the error goes away, but now getattr(record, label) is useless for
processing field names, unless you want to write a line of code for each
one. (I have 17 field names, and forget about passing label to a
function.)

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-13 Thread Peter Otten
Deborah Swanson wrote:

> It's a small point, but I suspect getattr(record, label) would still
> fail, even if label's value is 'label' and only 'label', but what's the
> point of having a variable if it will only ever have just one value?

You are misunderstanding. Your getattr() call fails because you have

label = "label"

buried somewhere in your code. As soon as you change that to

label = 

the error will go away. 


-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-12 Thread Deborah Swanson
Deborah Swanson wrote, on Wednesday, April 12, 2017 4:29 PM
> 
> Peter Otten wrote, on Wednesday, April 12, 2017 3:15 PM
> > 
> > >> Indeed you cannot change the namedtuple's attributes. Like the
> > >> "normal" tuple it is designed to be immutable. If you want changes in
> > >> one list (the group) to appear in another (the original records) you
> > >> need a mutable data type.
> > > 
> > > Sadly, that does seem to be the correct conclusion here.
> > 
> > Think hard if you really need the original list.
> 
> It's possible you might transform the namedtuple into a 
> mutable type, and I didn't try that. But it seems like the 
> group-by defaultdict strategy would have to have a 
> significant performance edge to be worth it and you wouldn't 
> have any of the namedtuple properties to work with after the 
> transformation. I also ran into some trouble with your 
> algorithm that follows making the defaultdict, and I'm not 
> sure what value there would be in hashing through that. 
> Though I'm certainly willing to if you are.
> 
> It worked to simply stay with the original list of 
> namedtuples to begin with.
> 
> I remain grateful for your introduction to the collections 
> module. What a neat little package of tools!

I know this is a quick double-take, but it occurs to me that
transforming to a mutable type isn't really a performance question at all.
Filling in missing values is the last step before outputting the
processed list, so it might not be necessary to work with namedtuples at
that point.

The algorithm to fill in the missing values for each group (which would
no longer be namedtuples) in the defaultdict is something I'm back at
the drawing board for. But it shouldn't be too hard. Haha, we'll see!
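One possible shape for that fill-in step, sketched under stated assumptions: a hypothetical Record type, groups built with defaultdict(list), and a made-up rule that a blank field is filled only when the rest of its group agrees on exactly one non-blank value (groups with contradictory values are left alone for review). The update uses the _replace(**fix) plus slice-assignment idiom that appears elsewhere in this thread; because that produces new namedtuples, the flat list is rebuilt from the groups afterwards.

```python
from collections import defaultdict, namedtuple

# Hypothetical fields and data, for illustration only.
Record = namedtuple("Record", "title Date Location")
records = [
    Record("Flat A", "2017-04-01", ""),
    Record("Flat A", "2017-04-03", "Berkeley"),
    Record("Flat B", "2017-04-02", "Oakland"),
]

groups = defaultdict(list)
for record in records:
    groups[record.title].append(record)


def fill_missing(group, labels):
    """Fill blank fields in `group` (a list) when the group has one non-blank value."""
    for label in labels:
        values = {getattr(record, label) for record in group}
        # One blank plus exactly one real value: safe to fill. Anything else
        # (no blanks, or conflicting values) is left for a human to review.
        if "" in values and len(values) == 2:
            fix = {label: max(values, key=len)}
            group[:] = [record._replace(**fix) for record in group]


for group in groups.values():
    fill_missing(group, ["Location"])

# The groups now hold new namedtuples, so rebuild the flat list from them.
records = [record for group in groups.values() for record in group]
for record in records:
    print(record)
```

Rebuilding from the groups puts the records in group order (the order in which each title was first seen), which matches the original order when the list was already sorted by title.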

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-12 Thread Deborah Swanson
Peter Otten wrote, on Wednesday, April 12, 2017 3:15 PM
> 
> Deborah Swanson wrote:
> 
> >> >value = getattr(record, label)
> >>
> >> That should work.
> > 
> > We may agree that it *should* work, by an intuitive grasp of how it 
> > should work, but it doesn't. You get "object has no attribute 'label'".
> 
> Only if the namedtuple 
> 
> (1) does not have a label attribute and
> (2) the value of the name label is the string "label"
> 
> In that case both
> 
> label = "label"
> getattr(record, label)
> 
> and
> 
> record.label
> 
> will fail with the same AttributeError. The problem is *not* the dynamic
> access through getattr().

Agreed, it's not getattr's fault. 

It's a small point, but I suspect getattr(record, label) would still
fail, even if label's value is 'label' and only 'label', but what's the
point of having a variable if it will only ever have just one value?

The question would be whether the compiler (interpreter?) would look at 
getattr(record, label), evaluate label and see that there is a field
named 'label', but I suspect it wouldn't take that many steps. It wants
to see recordset.fieldname, and a bare "label" does not reference the
object.

I don't exactly understand your point (2). If the namedtuple does not
have a label attribute, then getattr(record, label) will get the error
whether the name label holds the string 'label' or not. And it wants to
see recordset.fieldname, not just fieldname. But maybe I misunderstood
what you were saying. This stuff is quite loopy to think about, at least
for me it is.

> >> Indeed you cannot change the namedtuple's attributes. Like the 
> >> "normal" tuple it is designed to be immutable. If you want changes in
> >> one list (the group) to appear in another (the original records) you
> >> need a mutable data type.
> > 
> > Sadly, that does seem to be the correct conclusion here.
> 
> Think hard if you really need the original list.

It's possible you might transform the namedtuple into a mutable type,
and I didn't try that. But it seems like the group-by defaultdict
strategy would have to have a significant performance edge to be worth
it and you wouldn't have any of the namedtuple properties to work with
after the transformation. I also ran into some trouble with your
algorithm that follows making the defaultdict, and I'm not sure what
value there would be in hashing through that. Though I'm certainly
willing to if you are.

It worked to simply stay with the original list of namedtuples to begin
with.

I remain grateful for your introduction to the collections module. What
a neat little package of tools!

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-12 Thread Peter Otten
Deborah Swanson wrote:

>> >value = getattr(record, label)
>>
>> That should work.
> 
> We may agree that it *should* work, by an intuitive grasp of how it
> should work, but it doesn't. You get "object has no attribute 'label'".

Only if the namedtuple 

(1) does not have a label attribute and
(2) the value of the name label is the string "label"

In that case both

label = "label"
getattr(record, label)

and

record.label

will fail with the same AttributeError. The problem is *not* the dynamic 
access through getattr().
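A quick way to check this, using a hypothetical Record that has no label field: direct attribute access and getattr() with a variable holding the string "label" fail with the identical AttributeError, and the same getattr() call succeeds as soon as the variable holds a real field name.

```python
from collections import namedtuple

Record = namedtuple("Record", "title Date")   # hypothetical; no field named "label"
record = Record("Flat A", "2017-04-01")


def attribute_error(fn):
    """Call fn() and return the AttributeError message it raises, or None."""
    try:
        fn()
    except AttributeError as exc:
        return str(exc)
    return None


label = "label"                     # the *value* of label is the string "label"
direct = attribute_error(lambda: record.label)
dynamic = attribute_error(lambda: getattr(record, label))
print(direct)
print(direct == dynamic)

label = "title"                     # now the value names a real field
print(getattr(record, label))
```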

>> Indeed you cannot change the namedtuple's attributes. Like the "normal"
>> tuple it is designed to be immutable. If you want changes in one list
>> (the group) to appear in another (the original records) you need a 
>> mutable data type.
> 
> Sadly, that does seem to be the correct conclusion here.

Think hard if you really need the original list.

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Namedtuples: some unexpected inconveniences

2017-04-12 Thread Deborah Swanson
Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
> 
> Deborah Swanson wrote:
> 
> > I won't say the following points are categorically true, but I became
> > convinced enough they were true in this instance that I abandoned the
> > advised strategy. Which was to use defaultdict to group the list of
> > namedtuples by one of the fields for the purpose of determining
> > whether certain other fields in each group were either missing values
> > or contained contradictory values.
> > 
> > Are these bugs, or was there something I could have done to avoid
> > these problems? Or are they just things you need to know working with
> > namedtuples?
> > 
> > The list of namedtuples was created with:
> > 
> > infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
> > rows = csv.reader(infile)
> > fieldnames = next(rows)
> > Record = namedtuple("Record", fieldnames)
> > records = [Record._make(fieldnames)]
> > records.extend(Record._make(row) for row in rows)
> > . . .
> > (many lines of field processing code)
> > . . .
> > 
> > then the attempt to group the records by title:
> > 
> > import operator
> > records[1:] = sorted(records[1:], key=operator.attrgetter("title", "Date"))
> 
> Personally I would immediately discard the header row once and for all, not
> again and again on every operation.

Well, perhaps, but I need the header row to stay in place to write the
list to a csv when I'm done (which is why it's there in the first
place). There might be a tiny performance edge in discarding the header
row for the sort, but there would also be a hit to recreate it at output
time.
 
> > groups = defaultdict()
> > for r in records[1:]:
> >     # if the key doesn't exist, make a new group
> >     if r.title not in groups.keys():
> >         groups[r.title] = [r]
> >     # if key (group) exists, append this record
> >     else:
> >         groups[r.title].append(r)
> 
> You are not using the defaultdict the way it is intended; the 
> groups can be built with
> 
> groups = defaultdict(list)
> for r in records[1:]:
>     groups[r.title].append(r)

Yes, going back to your original post I see now that's what you gave,
and it's probably why I noticed defaultdict's being characterized by
what you make the default to be. Too bad I lost track of that.

> > (Please note that this default dict will not automatically make new
> > keys when they are encountered, possibly because the keys of the
> > defaultdict are made from namedtuples and the values are namedtuples.
> > So you have to include the step to make a new key when a key is not
> > found.)
> > 
> > If you succeed in modifying records in a group, the dismaying thing is
> > that the underlying records are not updated, making the entire
> > exercise totally pointless, which was a severe and unexpected
> > inconvenience.
> > 
> > It looks like the values and the structure were only copied from the
> > original list of namedtuples to the defaultdict. The rows of the
> > grouped-by dict still behave like namedtuples, but they are no longer
> > the same namedtuples as the original list of namedtuples. (I'm sure I
> > didn't say that quite right, please correct me if you have better
> > words for it.)
> 
> They should be the same namedtuple. Something is wrong with 
> your actual code or your diagnosis or both.

Well, I didn't see them behaving as the same namedtuples, and I looked
hard at it, many different ways. If someone could point out the mistake
I might have made to get only copies of them or why they necessarily
would be the same namedtuples, I'd certainly look into it. Or better yet
some code that does the same thing and they remain the same ones. 

(But I think you got it right in your last sentence below. defaultdict
copied them because they were immutable, leaving the original list
unchanged.)

> > It might be possible to complete the operation and then write out the
> > groups of rows of namedtuples in the dict to a simple list of
> > namedtuples, discarding the original, but at the time I noticed that
> > modifying rows in a group didn't change the values in the original
> > list of namedtuples, I still had further to go with the dict of
> > groups, and it was looking easier by the minute to solve the missing
> > values problem directly from the original list of namedtuples, so
> > that's what I did.
> > 
> > If requested I can reproduce how I saw that the original list of
> > namedtuples was not changed when I modified field values in group rows
> > of the dict, but it's lengthy and messy. It might be worthwhile though
> > if someone might see a mistake I made, though I found the same
> > behavior several different ways. Which was when I called it barking up
> > the wrong tree and quit trying to solve the problem that way.
> > 
> > Another inconvenience is that there appears to be no way to access
> > field values of a named tuple by variable, although I've had limited
> > success accessing by

RE: Namedtuples: some unexpected inconveniences

2017-04-12 Thread Deborah Swanson


> -----Original Message-----
> From: Python-list [mailto:python-list-bounces+python=deborahswanson.net@python.org] On Behalf Of MRAB
> Sent: Wednesday, April 12, 2017 1:42 PM
> To: python-list@python.org
> Subject: Re: Namedtuples: some unexpected inconveniences
> 
> 
> On 2017-04-12 20:57, Deborah Swanson wrote:
> > Are these bugs, or was there something I could have done to avoid 
> > these problems? Or are they just things you need to know working with
> > namedtuples?
> > 
> > The list of namedtuples was created with:
> > 
> > infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
> > rows = csv.reader(infile)
> > fieldnames = next(rows)
> > Record = namedtuple("Record", fieldnames)
> > records = [Record._make(fieldnames)]
> > records.extend(Record._make(row) for row in rows)
> >  . . .
> > (many lines of field processing code)
> >  . . .
> > 
> > then the attempt to group the records by title:
> > 
> > import operator
> > records[1:] = sorted(records[1:], key=operator.attrgetter("title", "Date"))
> > groups = defaultdict()
> > for r in records[1:]:
> >     # if the key doesn't exist, make a new group
> >     if r.title not in groups.keys():
> >         groups[r.title] = [r]
> >     # if key (group) exists, append this record
> >     else:
> >         groups[r.title].append(r)
> > 
> > (Please note that this default dict will not automatically make new 
> > keys when they are encountered, possibly because the keys of the 
> > defaultdict are made from namedtuples and the values are namedtuples.
> > So you have to include the step to make a new key when a key is not 
> > found.)

MRAB said:
 
> The defaultdict _will_ work when you use it properly. :-)
> 
> The line should be:
> 
>  groups = defaultdict(list)
> 
> so that it'll make a new list every time a new key is 
> automatically added.

Arg. Now I remember the thought crossing my mind early on, and noticing
that the characterizing property of a defaultdict was what you set the
default to be. Too bad I forgot that useful thought once I was entangled
with all those other problems.

Thanks for jogging that memory stuck in a hidey hole.

> Another point: namedtuples, as with normal tuples, are immutable; once
> created, you can't change an attribute. A dict might be a better bet.

Yes, namedtuples still being tuples was a point mentioned in passing by
someone, I think Steve D'Aprano, but I didn't immediately see that as
being the roadblock to accessing field values by variable. It does make
sense now though, although others on the list also didn't see it, so I'm
not feeling as bad about it as I could.

Namedtuples were absolutely the right data structure for two thirds of
this program. I only ran into trouble with it trying to do the
defaultdict group by thing. And it all turned out ok just by going back
to the original list.

Now, if I could understand why the namedtuples grouped by the
defaultdict were only copied instead of remaining the same namedtuples
as the list they were copied from, that should wrap this set of problems
up.

Many thanks again!

Deborah

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Namedtuples: some unexpected inconveniences

2017-04-12 Thread Peter Otten
Deborah Swanson wrote:

> I won't say the following points are categorically true, but I became
> convinced enough they were true in this instance that I abandoned the
> advised strategy. Which was to use defaultdict to group the list of
> namedtuples by one of the fields for the purpose of determining whether
> certain other fields in each group were either missing values or
> contained contradictory values.
> 
> Are these bugs, or was there something I could have done to avoid these
> problems? Or are they just things you need to know working with
> namedtuples?
> 
> The list of namedtuples was created with:
> 
> infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
> rows = csv.reader(infile)
> fieldnames = next(rows)
> Record = namedtuple("Record", fieldnames)
> records = [Record._make(fieldnames)]
> records.extend(Record._make(row) for row in rows)
> . . .
> (many lines of field processing code)
> . . .
> 
> then the attempt to group the records by title:
> 
> import operator
> records[1:] = sorted(records[1:], key=operator.attrgetter("title", "Date"))

Personally I would immediately discard the header row once and for all, not 
again and again on every operation.

> groups = defaultdict()
> for r in records[1:]:
>     # if the key doesn't exist, make a new group
>     if r.title not in groups.keys():
>         groups[r.title] = [r]
>     # if key (group) exists, append this record
>     else:
>         groups[r.title].append(r)

You are not using the defaultdict the way it is intended; the groups can be 
built with

groups = defaultdict(list)
for r in records[1:]:
    groups[r.title].append(r)
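A minimal sketch of why the version with the explicit key check was needed (the Record fields are hypothetical): defaultdict(list) calls list() to create an empty group the first time a missing key is looked up, while a bare defaultdict() has its default_factory set to None and raises KeyError just like a plain dict. The type of the keys or values plays no part in this.

```python
from collections import defaultdict, namedtuple

Record = namedtuple("Record", "title Date")   # hypothetical fields
records = [
    Record("Flat A", "2017-04-01"),
    Record("Flat B", "2017-04-02"),
    Record("Flat A", "2017-04-03"),
]

# default_factory=list: a missing key is created with an empty list first.
groups = defaultdict(list)
for record in records:
    groups[record.title].append(record)
print(dict(groups))

# No factory given: default_factory is None, so missing keys raise KeyError.
plain = defaultdict()
print(plain.default_factory)
try:
    plain["Flat A"].append(records[0])
except KeyError as exc:
    print("KeyError:", exc)
```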
 
> (Please note that this default dict will not automatically make new keys
> when they are encountered, possibly because the keys of the defaultdict
> are made from namedtuples and the values are namedtuples. So you have to
> include the step to make a new key when a key is not found.)
> 
> If you succeed in modifying records in a group, the dismaying thing is
> that the underlying records are not updated, making the entire exercise
> totally pointless, which was a severe and unexpected inconvenience.
> 
> It looks like the values and the structure were only copied from the
> original list of namedtuples to the defaultdict. The rows of the
> grouped-by dict still behave like namedtuples, but they are no longer
> the same namedtuples as the original list of namedtuples. (I'm sure I
> didn't say that quite right, please correct me if you have better words
> for it.)

They should be the same namedtuple. Something is wrong with your actual code 
or your diagnosis or both.
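One way to see that grouping does not copy anything, assuming a hypothetical two-field Record: the group lists hold references to the very objects in records. What looks like a copy appears only after an "update", because _replace() builds a new namedtuple, and assigning it into the group changes that list's slot, not the original list.

```python
from collections import defaultdict, namedtuple

Record = namedtuple("Record", "title Date")   # hypothetical fields
records = [Record("Flat A", "2017-04-01"), Record("Flat A", "2017-04-03")]

groups = defaultdict(list)
for record in records:
    groups[record.title].append(record)

group = groups["Flat A"]
# Same objects, not copies: `is` compares identity.
print(all(g is r for g, r in zip(group, records)))

# "Updating" a namedtuple creates a new object; only the group's slot changes.
group[0] = group[0]._replace(Date="2017-04-02")
print(records[0].Date)          # the original list still has the old value
print(group[0] is records[0])   # the group now refers to a different object
```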

> 
> It might be possible to complete the operation and then write out the
> groups of rows of namedtuples in the dict to a simple list of
> namedtuples, discarding the original, but at the time I noticed that
> modifying rows in a group didn't change the values in the original list
> of namedtuples, I still had further to go with the dict of groups,  and
> it was looking easier by the minute to solve the missing values problem
> directly from the original list of namedtuples, so that's what I did.
> 
> If requested I can reproduce how I saw that the original list of
> namedtuples was not changed when I modified field values in group rows
> of the dict, but it's lengthy and messy. It might be worthwhile though
> if someone might see a mistake I made, though I found the same behavior
> several different ways. Which was when I called it barking up the wrong
> tree and quit trying to solve the problem that way.
> 
> Another inconvenience is that there appears to be no way to access field
> values of a named tuple by variable, although I've had limited success
> accessing by variable indices. However, direct attempts to do so, like:
> 
> values = {row[label] for row in group}
> (where 'label' is a variable for the field names of a namedtuple)
> 
> gets "object has no attribute 'label'
> 
> or, where 'record' is a row in a list of namedtuples and 'label' is a
> variable for the fieldnames of a namedtuple:
> 
> value = getattr(record, label)

That should work.

> setattr(record, label, value) also don't work.
> 
> You get the error 'object has no attribute 'label' every time.

Indeed you cannot change the namedtuple's attributes. Like the "normal" 
tuple it is designed to be immutable. If you want changes in one list (the 
group) to appear in another (the original records) you need a mutable data 
type.
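A short sketch of what that means in practice, with hypothetical fields and a made-up placeholder value: setattr() on a namedtuple raises AttributeError, and the usual way to "change" one is to build a replacement with _replace() and store it back into the list that the rest of the program reads. Using getattr() and a ** dict keeps the field name in a variable throughout.

```python
from collections import namedtuple

Record = namedtuple("Record", "title Location")   # hypothetical fields
records = [Record("Flat A", ""), Record("Flat B", "Oakland")]

label = "Location"

# Attributes of a namedtuple are read-only.
try:
    setattr(records[0], label, "Berkeley")
except AttributeError as exc:
    print("AttributeError:", exc)

# Build a replacement and put it back in the list by position.
for i, record in enumerate(records):
    if getattr(record, label) == "":
        records[i] = record._replace(**{label: "unknown"})
print(records)
```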


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Namedtuples: some unexpected inconveniences

2017-04-12 Thread MRAB

On 2017-04-12 20:57, Deborah Swanson wrote:

> I won't say the following points are categorically true, but I became
> convinced enough they were true in this instance that I abandoned the
> advised strategy. Which was to use defaultdict to group the list of
> namedtuples by one of the fields for the purpose of determining whether
> certain other fields in each group were either missing values or
> contained contradictory values.
>
> Are these bugs, or was there something I could have done to avoid these
> problems? Or are they just things you need to know working with
> namedtuples?
>
> The list of namedtuples was created with:
>
> infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
> rows = csv.reader(infile)
> fieldnames = next(rows)
> Record = namedtuple("Record", fieldnames)
> records = [Record._make(fieldnames)]
> records.extend(Record._make(row) for row in rows)
>  . . .
> (many lines of field processing code)
>  . . .
>
> then the attempt to group the records by title:
>
> import operator
> records[1:] = sorted(records[1:], key=operator.attrgetter("title", "Date"))
> groups = defaultdict()
> for r in records[1:]:
>     # if the key doesn't exist, make a new group
>     if r.title not in groups.keys():
>         groups[r.title] = [r]
>     # if key (group) exists, append this record
>     else:
>         groups[r.title].append(r)
>
> (Please note that this default dict will not automatically make new keys
> when they are encountered, possibly because the keys of the defaultdict
> are made from namedtuples and the values are namedtuples. So you have to
> include the step to make a new key when a key is not found.)


The defaultdict _will_ work when you use it properly. :-)

The line should be:

groups = defaultdict(list)

so that it'll make a new list every time a new key is automatically added.
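A quick check of the fix, as a sketch. The `Record` fields and sample rows are hypothetical stand-ins for the CSV data; the loop is the quoted one with the membership test dropped. Called with no argument, `defaultdict()` has `default_factory` set to `None` and raises `KeyError` for a missing key just like a plain dict, which is why the explicit test was needed before.

```python
import operator
from collections import defaultdict, namedtuple

# Hypothetical sample data standing in for the CSV rows
Record = namedtuple("Record", ["title", "Date", "Location"])
records = [
    Record("Lamp", "2017-03-01", "Attic"),
    Record("Desk", "2017-03-02", ""),
    Record("Desk", "2017-03-05", "Garage"),
]

# defaultdict(list) calls list() to create an empty group the first
# time a new title is seen, so no "if key not in groups" test is needed
groups = defaultdict(list)
for r in sorted(records, key=operator.attrgetter("title", "Date")):
    groups[r.title].append(r)

for title, group in groups.items():
    print(title, [r.Date for r in group])
# Desk ['2017-03-02', '2017-03-05']
# Lamp ['2017-03-01']
```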

Another point: namedtuples, as with normal tuples, are immutable; once 
created, you can't change an attribute. A dict might be a better bet.
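A minimal sketch of the dict-based alternative, assuming the file can be read with `csv.DictReader`. The CSV text in an `io.StringIO`, the field names, `label` and the "fill with the longest value in the group" rule (borrowed from the `max(values, key=len)` idea elsewhere in this thread) are all hypothetical. Each group holds references to the same dict objects as `records`, so a change made through a group shows up in the original list.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV contents standing in for the real input file
data = io.StringIO(
    "title,Date,Location\n"
    "Desk,2017-03-02,\n"
    "Desk,2017-03-05,Garage\n"
    "Lamp,2017-03-01,Attic\n"
)

# DictReader yields one mutable dict per row, keyed by the header fields
records = list(csv.DictReader(data))

groups = defaultdict(list)
for row in records:
    groups[row["title"]].append(row)  # the group stores a reference, not a copy

# Fill missing values within each group, using a field name held in a variable
label = "Location"
for group in groups.values():
    fill = max((row[label] for row in group), key=len)
    for row in group:
        if not row[label]:
            row[label] = fill

print([row[label] for row in records])  # ['Garage', 'Garage', 'Attic']
```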



> If you succeed in modifying records in a group, the dismaying thing is
> that the underlying records are not updated, making the entire exercise
> totally pointless, which was a severe and unexpected inconvenience.
>
> It looks like the values and the structure were only copied from the
> original list of namedtuples to the defaultdict. The rows of the
> grouped-by dict still behave like namedtuples, but they are no longer
> the same namedtuples as the original list of namedtuples. (I'm sure I
> didn't say that quite right, please correct me if you have better words
> for it.)
>
> It might be possible to complete the operation and then write out the
> groups of rows of namedtuples in the dict to a simple list of
> namedtuples, discarding the original, but at the time I noticed that
> modifying rows in a group didn't change the values in the original list
> of namedtuples, I still had further to go with the dict of groups, and
> it was looking easier by the minute to solve the missing values problem
> directly from the original list of namedtuples, so that's what I did.
>
> If requested I can reproduce how I saw that the original list of
> namedtuples was not changed when I modified field values in group rows
> of the dict, but it's lengthy and messy. It might be worthwhile, though,
> if someone could spot a mistake I made, although I found the same
> behavior several different ways. That was when I decided I was barking
> up the wrong tree and quit trying to solve the problem that way.
>
> Another inconvenience is that there appears to be no way to access field
> values of a namedtuple by variable, although I've had limited success
> accessing by variable indices. However, direct attempts to do so, like:
>
> values = {row[label] for row in group}
> (where 'label' is a variable for the field names of a namedtuple)
>
> gets "object has no attribute 'label'"
>
> or, where 'record' is a row in a list of namedtuples and 'label' is a
> variable for the fieldnames of a namedtuple:
>
> value = getattr(record, label)
> setattr(record, label, value)  also don't work.
>
> You get the error "object has no attribute 'label'" every time.
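On reading a field whose name is held in a variable: square brackets on a namedtuple are ordinary tuple indexing, so they need an integer position, and a string field name raises `TypeError`. A sketch of three ways that do work (the `Record` type, sample row and `label` are hypothetical):

```python
from collections import namedtuple

# Hypothetical record type, sample row and field name for illustration
Record = namedtuple("Record", ["title", "Location"])
row = Record(title="Desk", Location="Garage")
label = "Location"

# Tuple indexing expects an int or slice, so a field-name string fails
try:
    row[label]
except TypeError as exc:
    index_error = exc

by_getattr = getattr(row, label)              # look up by attribute name
by_index = row[Record._fields.index(label)]   # translate the name to a position
by_dict = row._asdict()[label]                # convert to a dict first

print(by_getattr, by_index, by_dict)          # Garage Garage Garage
```

Of the three, `getattr` is the most direct; `_fields.index` is handy when a positional index is needed anyway.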


