Re: Using namedtuples field names for column indices in a list of lists
On 01/12/2017 02:26 AM, Deborah Swanson wrote: > It's true, I've only been on this list a few weeks, although I've seen > and been on the receiving end of the kind of "help" that feels more like > being sneered at than help. Not on this list, but on Linux and similar > lists. There does seem to be a "tough love" approach to "helping" > people, and I haven't seen that it really helped that much, in other > places that I've seen it in action over a period of time. If you go down a wrong path, people are going to try to warn you. For example, you were told several times, no that object really is a list, yet you argued with them on that point for several posts. Tough love or sneering? No, absolutely not. Communication difficulties? Yes! But not even close to wholly the fault of those who were trying to assist you. If you haven't been helped, it's not for lack of their trying. > I'm willing > though to just see how it works on this list. Since I've been here, I > haven't seen people come back who get that kind of approach, but a few > weeks is too short a time to draw conclusions. Still, when people who > need help don't come right back, that should be a first clue that they > didn't get it. Fortunately this list is pretty friendly and open to newbies. And long-time posters have no problem telling other long-time posters when they do cross the line into bullying or trolling territory. Fortunately I've seen nothing but people wanting to help you with your ventures in Python programming responding to your queries, which I'm gratified to see. And many many newbies have been helped in their explorations of Python land. -- https://mail.python.org/mailman/listinfo/python-list
Re: Using namedtuples field names for column indices in a list of lists
On Thu, Jan 12, 2017 at 9:27 PM, Marko Rauhamaa wrote:
> An instructive anecdote: somebody I know told me he once needed the
> definitive list of some websites. He posted a question on the relevant
> online forum, but it fell on deaf ears. After some days, he replied to
> his own posting saying he had found the list and included the list in
> his reply. He knew the list was probably not current, but it served its
> purpose: right away he got an angry response pointing out his list was
> completely wrong with a link to the up-to-date list.

There's a reason for that. Inaccurate information is worse than none at
all, because it's indistinguishable from accurate information without
deep analysis. Also, often someone won't know the correct answer, but
will recognize a wrong answer. In correcting the record, you come a bit
closer to the truth. Sometimes all it takes is one wrong answer, and
the discussion kicks off.

ChrisA
Re: Using namedtuples field names for column indices in a list of lists
"Deborah Swanson": > I've only been on this list a few weeks, although I've seen and been > on the receiving end of the kind of "help" that feels more like being > sneered at than help. Not on this list, but on Linux and similar > lists. There does seem to be a "tough love" approach to "helping" > people, An instructive anecdote: somebody I know told me he once needed the definitive list of some websites. He posted a question on the relevant online forum, but it fell on deaf ears. After some days, he replied to his own posting saying he had found the list and included the list in his reply. He knew the list was probably not current, but it served its purpose: right away he got an angry response pointing out his list was completely wrong with a link to the up-to-date list. Marko -- https://mail.python.org/mailman/listinfo/python-list
RE: Using namedtuples field names for column indices in a list of lists
Antoon Pardon wrote, on January 12, 2017 12:49 AM > > Op 11-01-17 om 23:57 schreef Deborah Swanson: > > > >> What are we supposed to do when somebody asks a question based on an > >> obvious mistake? Assume that they're a quick learner who has probably > >> already worked out their mistake and doesn't need an answer? That > >> would certainly make our life easier: we can just ignore everybody's > >> questions. > > No, of course not. My advice to people who want to help is to not > > assume that you know what the question asker does and doesn't know, > > and just answer the questions without obsessing about what > they know. > > With all respect, such an answer betrays not much experience > on this list. Half the time answering in this way would mean > making very little progress in actually helping the person. > There is an important difference in trying to help someone > and just answering his questions. And your advice may be the > best way to help someone like you but not everyone is like > you. A lot of people have been helped by a remark that didn't > answer the question. It's true, I've only been on this list a few weeks, although I've seen and been on the receiving end of the kind of "help" that feels more like being sneered at than help. Not on this list, but on Linux and similar lists. There does seem to be a "tough love" approach to "helping" people, and I haven't seen that it really helped that much, in other places that I've seen it in action over a period of time. I'm willing though to just see how it works on this list. Since I've been here, I haven't seen people come back who get that kind of approach, but a few weeks is too short a time to draw conclusions. Still, when people who need help don't come right back, that should be a first clue that they didn't get it. 
> > If that's
> > impossible because they have something so wrong that you don't know
> > what they're asking, that would be a good time to point it out and
> > give them a chance to correct it.
>
> It is rarely a question of impossibility. It often enough is
> a sense that the person asking the question is approaching
> the problem from the wrong side. Often enough that sense is
> correct, often enough that sense is wrong. All the
> participants can do is take a clue from the question and then
> guess what response would help this person best.

This sounds right to me. Any list or forum that fields questions has the
problem of understanding the questioner who's writing in. Any strategy
that bridges that gap seems like a good one to me.

> Nobody can expect that this list will treat their questions
> in a way that suits their personal style.
>
> --
> Antoon Pardon

Oh, I'm sure that's true, though I do think more direct question asking
and answering is always helpful. Communication in lists and forums is
somewhat in the dark, because there's so little context in many of the
conversations. Questions (and waiting for the answers before
responding) are an excellent way to fill in some of the dark spaces.
Re: Using namedtuples field names for column indices in a list of lists
On 11-01-17 at 23:57, Deborah Swanson wrote:
>
>> What are we supposed to do when somebody asks a question based on an
>> obvious mistake? Assume that they're a quick learner who has probably
>> already worked out their mistake and doesn't need an answer? That
>> would certainly make our life easier: we can just ignore everybody's
>> questions.
> No, of course not. My advice to people who want to help is to not assume
> that you know what the question asker does and doesn't know, and just
> answer the questions without obsessing about what they know.

With all respect, such an answer betrays not much experience on this
list. Half the time answering in this way would mean making very little
progress in actually helping the person. There is an important
difference in trying to help someone and just answering his questions.
And your advice may be the best way to help someone like you but not
everyone is like you. A lot of people have been helped by a remark that
didn't answer the question.

> If that's
> impossible because they have something so wrong that you don't know what
> they're asking, that would be a good time to point it out and give them
> a chance to correct it.

It is rarely a question of impossibility. It often enough is a sense
that the person asking the question is approaching the problem from the
wrong side. Often enough that sense is correct, often enough that sense
is wrong. All the participants can do is take a clue from the question
and then guess what response would help this person best.

Nobody can expect that this list will treat their questions in a way
that suits their personal style.

--
Antoon Pardon
RE: Using namedtuples field names for column indices in a list of lists
Steven D'Aprano wrote, on January 10, 2017 6:19 PM > > On Tuesday 10 January 2017 18:14, Deborah Swanson wrote: > > > I'm guessing that you (and those who see things like you do) might > > not be used to working with quick learners who make mistakes at > > first but catch up with them real fast, or you're very judgemental > > about people who make mistakes, period. I certainly don't care if > > you want to judge me, you're entitled to your opinion. > > Be fair. We're not mind-readers, and we're trying to help you, not > attack you. You aren't, and I've never seen you as attacking me, but there's a few others here who have attacked me, and they weren't trying to help me either. But it's only a very few people who've harrassed me, and the person I was writing to above wasn't one of them. > It is true that we're often extremely pedantic. Sometimes annoyingly > so, but on the other hand precision is important, especially in > technical fields. > There's a *huge* difference between a MTA and a > MUA, even though they differ only by one letter and both are related > to email. > > One of the techs I work with has the same habit of correcting me, and > when I'm invariably forced to agree that he is technical correct, he > replies "technical correctness is the best kind of correctness". > Annoying as it is to be on the receiving end, he is right: at least > half the time I learn something from his pedantry. (The other half of > the time, its a difference that makes no difference.) I'm sorry you have to work with someone like that, he sounds perfectly awful. (But that doesn't give you license to do the same. You're better than that.) > What are we supposed to do when somebody asks a question based on an > obvious mistake? Assume that they're a quick learner who has probably > already worked out their mistake and doesn't need an answer? That > would certainly make our life easier: we can just ignore everybody's > questions. No, of course not. 
My advice to people who want to help is to not assume that you know what
the question asker does and doesn't know, and just answer the questions
without obsessing about what they know. If that's impossible because
they have something so wrong that you don't know what they're asking,
that would be a good time to point it out and give them a chance to
correct it.

> Sometimes people make errors of terminology that don't affect the
> semantics of what they're asking:
>
> "I have an array [1, 2, 3, 4] and I want to ..."
>
> It's very likely that they have a list and they're merely using a term
> they're familiar with from another language, rather than the array
> module.
>
> But what are we supposed to make of mistakes that *do* affect the
> semantics of the question:
>
> "I have a dict of ints {1, 2, 3, 4} and want to sort the values by
> key so that I get [4, 2, 3, 1], how do I do that?"
>
> How can we answer that without correcting their misuse of terminology
> and asking for clarification of what data structure they *actually*
> have?
>
> We get questions like this very frequently. How would you answer it?

Well, I'd tell them how to reverse sort a dictionary, and point out that
what they've given isn't a dictionary because it doesn't have any keys,
and on this occasion I'd just give them an example of what a dictionary
looks like (as part of showing them how to reverse sort it) with its
keys, including the curly braces, and see what they come back with.

They pretty clearly mean a dictionary and not a list, since they said
"dict", used curly braces, and said they want to "sort the values by
key" in the first clause of their sentence. So they're just a little
confused and maybe absent-mindedly slipping back into the more familiar
list notation and concepts, or they don't exactly know what a
dictionary is.

I wouldn't belabor the point at this time, unless they keep coming back
with the same issues. That would be the time to belabor it, in my
opinion.
When some people are learning it's hard to keep all the new things firmly in mind and not confuse them with more familiar things they already know. But if they get it all straight in reasonably good time, that should be ok. > Or: > > "I have a list l = namedtuple('l', 'field1 field2') and can't > extract fields by index, l[0] doesn't work..." > > Of course it doesn't work. How would you respond to that if you were > in our place? > > - ignore the question because the poster is ever so smart and will have > worked it out by now? > > - point out that l is not a list, or even a tuple, and of course l[0] > doesn't work because l is actually a class? I'd go with some variant of option 2, depending on how well I knew the person asking. If the asker had just dropped in out of the blue and I knew nothing about them I'd say something like "You can't because 'l' isn't a list." Then I'd try to gauge how useful it would be to them to know exactly what 'l' is, but most likely
Re: Using namedtuples field names for column indices in a list of lists
On 11/01/2017 02:18, Steven D'Aprano wrote:
> On Tuesday 10 January 2017 18:14, Deborah Swanson wrote:
>
>> I'm guessing that you (and those who see things like you do) might
>> not be used to working with quick learners who make mistakes at first
>> but catch up with them real fast, or you're very judgemental about
>> people who make mistakes, period. I certainly don't care if you want
>> to judge me, you're entitled to your opinion.
>
> Be fair. We're not mind-readers, and we're trying to help you, not
> attack you.
>
> It is true that we're often extremely pedantic. Sometimes annoyingly
> so, but on the other hand precision is important, especially in
> technical fields. There's a *huge* difference between a MTA and a MUA,
> even though they differ only by one letter and both are related to
> email.

There's a bigger difference between USB and USA!

--
Bartc
RE: Using namedtuples field names for column indices in a list of lists
On Tuesday 10 January 2017 18:14, Deborah Swanson wrote: > I'm guessing that you (and those who > see things like you do) might not be used to working with quick learners > who make mistakes at first but catch up with them real fast, or you're > very judgemental about people who make mistakes, period. I certainly > don't care if you want to judge me, you're entitled to your opinion. Be fair. We're not mind-readers, and we're trying to help you, not attack you. It is true that we're often extremely pedantic. Sometimes annoyingly so, but on the other hand precision is important, especially in technical fields. There's a *huge* difference between a MTA and a MUA, even though they differ only by one letter and both are related to email. One of the techs I work with has the same habit of correcting me, and when I'm invariably forced to agree that he is technical correct, he replies "technical correctness is the best kind of correctness". Annoying as it is to be on the receiving end, he is right: at least half the time I learn something from his pedantry. (The other half of the time, its a difference that makes no difference.) What are we supposed to do when somebody asks a question based on an obvious mistake? Assume that they're a quick learner who has probably already worked out their mistake and doesn't need an answer? That would certainly make our life easier: we can just ignore everybody's questions. Sometimes people make errors of terminology that doesn't affect the semantics of what they're asking: "I have an array [1, 2, 3, 4] and I want to ..." It's very likely that they have a list and they're merely using a term they're familiar with from another language, rather than the array module. But what are we supposed to make of mistakes that *do* affect the semantics of the question: "I have a dict of ints {1, 2, 3, 4} and want to sort the values by key so that I get [4, 2, 3, 1], how do I do that?" 
How can we answer that without correcting their misuse of terminology
and asking for clarification of what data structure they *actually*
have?

We get questions like this very frequently. How would you answer it?

Or:

"I have a list l = namedtuple('l', 'field1 field2') and can't extract
fields by index, l[0] doesn't work..."

Of course it doesn't work. How would you respond to that if you were in
our place?

- ignore the question because the poster is ever so smart and will have
  worked it out by now?

- point out that l is not a list, or even a tuple, and of course l[0]
  doesn't work because l is actually a class?

It's easy to criticise us for answering the questions you ask instead of
the questions you intended, but we're not mind-readers. We don't know
what you're thinking, we only know what you communicate to us. Telling
us off for answering your questions is hardly likely to encourage us to
answer them in the future.

--
Steven
"Ever since I learned about confirmation bias, I've been seeing it
everywhere." - Jon Ronson
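The two terminology mix-ups described above can be checked directly in Python. What follows is only a minimal sketch: the `scores` dict and its values are made up for illustration, and the class name `l` simply mirrors the quoted question. It shows that `{1, 2, 3, 4}` is a set rather than a dict, and that `namedtuple()` hands back a class rather than a list or a tuple instance.

```python
from collections import namedtuple

# Braces without key: value pairs build a set, not a dict.
numbers = {1, 2, 3, 4}
print(type(numbers).__name__)        # set

# A real dict has keys, so its values can be ordered by key.
scores = {1: "d", 2: "b", 3: "c", 4: "a"}   # hypothetical data
by_key_desc = [scores[k] for k in sorted(scores, reverse=True)]
print(by_key_desc)

# namedtuple() returns a new class (a tuple subclass), not an instance.
l = namedtuple("l", "field1 field2")
print(isinstance(l, type))           # True: l is a class

# Fields are only available once the class has been instantiated.
row = l("spam", "ham")
first = row[0]                       # index access on the instance
print(first, row.field2)
```

Asking `type()` about the object first, as in this sketch, is usually the quickest way to find out which of these cases applies.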
Re: Using namedtuples field names for column indices in a list of lists
On 01/09/2017 11:14 PM, Deborah Swanson wrote:

> So I guess you should just do your thing and I'll do mine.

As you say.

> Takes all kinds, and I think in the end what will count is the quality
> of my finished work (which has always been excellent), and not the
> messy process to get there.

Agreed.

--
~Ethan~
RE: Using namedtuples field names for column indices in a list of lists
Ethan Furman wrote, on January 09, 2017 10:06 PM > > On 01/09/2017 08:51 PM, Deborah Swanson wrote: > > Ethan Furman wrote, on January 09, 2017 8:01 PM > > >> As I said earlier, I admire your persistence -- but take some time > >> and learn the basic vocabulary as that will make it much easier for > >> you to ask questions, and for us to give you meaningful answers. > > > > As I mentioned, I have completed MIT's 2 introductory Python courses > > with final grades of 98% and 97%. What tutorials do you think would > > significantly add to that introduction? > > The Python version of "Think like a computer scientist" is > good. Otherwise, ask the list for recommendations. I'm not > suggesting more advanced topics, but rather basic topics such > as how the REPL works, how to tell what objects you have, how > to find the methods those objects have, etc. I'm working on all of that, and I've been taking physics, chem, computer science and theoretical mathematics (straight 4.0s my last 2 years, graduated summa cum laude), but it's been a couple of decades since my last brick building university course. It's coming back fast, but there's a lot I'm still pulling out of cobwebs. I basically know the tools to use for the things you mention, but in the online coursework I've done in the past year, we didn't need to use them because they told us everything. So that's another thing I'm doing catch up on. Really shouldn't take too long, it's not that complicated or difficult. > > It's true that I didn't spend much time in the forums while I was > > taking those courses, so this is the first time I've talked with > > people about Python this intensively. But I'm a good learner and I'm > > picking up a lot of it pretty quickly. People on the list also talk > > and comprehend differently than people in the MIT courses did, so I > > have to become accustomed to this as well. And the only place to learn > > that is right here. > > Indeed. 
> > The issue I (and others) see, though, is more along the lines > of basic understanding: you seemed to think that a list of > lists should act the same as a list of tuples, even though > lists and tuples are not the same thing. It's like expecting > a basket of oranges to behave like a basket of avocados. ;) > > As you say, you're making good progress. > > -- > ~Ethan~ I'm sorry, I didn't think a list of namedtuples would be like a list of lists when I wrote to Erik just a bit ago today, but I sort of did when I first started trying to use them a couple days ago. And to the extent I was pretty sure they weren't the same, I still didn't know in what ways they were different. So when people ask me questions about why I did things the way I did, I try to explain that I didn't know certain things then, but I know them now. I'm guessing that you (and those who see things like you do) might not be used to working with quick learners who make mistakes at first but catch up with them real fast, or you're very judgemental about people who make mistakes, period. I certainly don't care if you want to judge me, you're entitled to your opinion. One of my MIT professors was always screwing up little things, even things he knew inside out. Some people are like that, and I think I might be one of them, although I'm really not as bad as he was. So I guess you should just do your thing and I'll do mine. I don't promise that I'll ever get to the point where I never set a foot wrong. I know some people can do that, and I hope someday I will. But then I remember the MIT professor, who I really learned a lot from, despite all his flub-ups, and that I might be a little bit like him. Takes all kinds, and I think in the end what will count is the quality of my finished work (which has always been excellent), and not the messy process to get there. 
It shocked me the first time someone on this list jumped down my throat for a momentary lapse on my part, as if I was a total idiot and knew nothing, but I'm sort of getting used to that too. It must be nice to be so perfect, I guess. Deborah -- https://mail.python.org/mailman/listinfo/python-list
Re: Using namedtuples field names for column indices in a list of lists
On 01/09/2017 08:51 PM, Deborah Swanson wrote: Ethan Furman wrote, on January 09, 2017 8:01 PM As I said earlier, I admire your persistence -- but take some time and learn the basic vocabulary as that will make it much easier for you to ask questions, and for us to give you meaningful answers. As I mentioned, I have completed MIT's 2 introductory Python courses with final grades of 98% and 97%. What tutorials do you think would significantly add to that introduction? The Python version of "Think like a computer scientist" is good. Otherwise, ask the list for recommendations. I'm not suggesting more advanced topics, but rather basic topics such as how the REPL works, how to tell what objects you have, how to find the methods those objects have, etc. It's true that I didn't spend much time in the forums while I was taking those courses, so this is the first time I've talked with people about Python this intensively. But I'm a good learner and I'm picking up a lot of it pretty quickly. People on the list also talk and comprehend differently than people in the MIT courses did, so I have to become accustomed to this as well. And the only place to learn that is right here. Indeed. The issue I (and others) see, though, is more along the lines of basic understanding: you seemed to think that a list of lists should act the same as a list of tuples, even though lists and tuples are not the same thing. It's like expecting a basket of oranges to behave like a basket of avocados. ;) As you say, you're making good progress. -- ~Ethan~ -- https://mail.python.org/mailman/listinfo/python-list
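The basics mentioned above (telling what object you have and which methods it offers) can be explored with the built-ins `type()` and `dir()`. This is only an illustrative sketch; the `Record` fields and the sample row are assumptions, not data from the thread.

```python
from collections import namedtuple

Record = namedtuple("Record", "Location Rent")   # hypothetical fields
records = [Record("Tulsa", 900)]

# Inspect the outer list and one of the records inside it.
for obj in (records, records[0]):
    public = [name for name in dir(obj) if not name.startswith("_")]
    print(type(obj).__name__, public)

# The list has sort(); the record does not, but it has _replace().
list_can_sort = "sort" in dir(records)
record_can_sort = "sort" in dir(records[0])
record_has_replace = "_replace" in dir(records[0])
```

In an interactive session, `help(records)` or `help(Record)` gives the same information with documentation attached.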
RE: Using namedtuples field names for column indices in a list of lists
Erik wrote, on January 09, 2017 8:06 PM
>
> On 10/01/17 03:02, Deborah Swanson wrote:
> > Erik wrote, on January 09, 2017 5:47 PM
> >> IIRC, you create it using a list comprehension which creates the
> >> records. A list comprehension always creates a list.
> >
> > Well no. The list is created with:
> >
> > records.extend(Record._make(row) for row in rows)
>
> No, the list is _extended_ by that code. The list is _created_ with a
> line that will say something like "records = []" or "records = list()"
> (or "records = ").

The list was created with this code:

    infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in.csv")
    rows = csv.reader(infile)
    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    records = [Record._make(fieldnames)]
    records.extend(Record._make(row) for row in rows)

I just pulled out the .extend statement to show you because that's what
looks like a list comprehension, but turns out not to be one. We still
get a list though, on that we agree. ;)

> It's nice to see you agree that it's a list though. Oh, hold on ... ;)
>
> > I'm not exactly
> > sure if this statement is a list comprehension.
>
> No, it's not. I was remembering an old message where someone suggested
> using the _make() method and that was expressed as a list comprehension.
>
> What you have there is a call to list.extend() passing a _generator_
> comprehension as its only parameter (which in this case you can consider
> to be equivalent to a list comprehension as all of the data are
> generated greedily). You see that I said "list.extend()". That's because
> 'records' is a list.
>
> > type(records)
> > <class 'list'>
>
> Yes, it's an instance of the list class. A list object. A list.
>
> >>> type(list())
> <class 'list'>
> >>> type([])
> <class 'list'>
> >>> class foo: pass
> ...
> >>> type(foo())
> <class '__main__.foo'>
> >>>
>
> ... type() will tell you what class your object is an instance of.
> "<class 'list'>" tells you that your object is a list.
>
> > And it behaves like a list sometimes, but many times
> > not.
> > I think that's impossible. I'm 100% sure it's a list. Please give an > example of when 'records' does not behave like a list. I gave an example in one of my last two replies to other people. The thing is that it's a list, but it's not a list of lists. It's a list of namedtuples, and the non-listlike behaviors appear when I'm directly working with the namedtuples. > > The only thing I don't think you have 100% correct is your assertion > > that records is a list. > > It's a list. I agree, now. > > But that's just a quibble. The important thing in this context is that > > both .sort() and sorted() treat it like a list and DTRT. > > That's because it's a list :) > > E. It is! A list of namedtuples that is, not a list of lists. sorted() and .sort work because they only interact with the outer data structure, which is a list. -- https://mail.python.org/mailman/listinfo/python-list
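The distinction drawn above, a mutable outer list holding immutable namedtuple records, can be sketched as follows. This is not the poster's actual program: the in-memory CSV text and its two columns are assumptions standing in for the real input file, and unlike the code quoted in the message the header row is not stored as the first record.

```python
import csv
import io
from collections import namedtuple

# Hypothetical two-column CSV standing in for the real input file.
data = io.StringIO("Location,Rent\nTulsa,900\nAustin,1200\n")

rows = csv.reader(data)
fieldnames = next(rows)                    # header row: ['Location', 'Rent']
Record = namedtuple("Record", fieldnames)

records = []                               # the list is created here...
records.extend(Record._make(row) for row in rows)   # ...and extended here

outer_type = type(records)                 # list
inner_type = type(records[0])              # Record

# Each Record is an immutable tuple, so attribute assignment fails.
try:
    records[0].Location = "Dallas"
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace() returns a new Record; store it back in the (mutable) list.
records[0] = records[0]._replace(Location="Dallas")
print(records)
```

List operations such as `append`, `extend`, `sort` and item assignment act on `records`; field access and `_replace` act on the individual `Record` objects inside it.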
RE: Using namedtuples field names for column indices in a list of lists
Ethan Furman wrote, on January 09, 2017 8:01 PM
>
> On 01/09/2017 07:02 PM, Deborah Swanson wrote:
> > Erik wrote, on January 09, 2017 5:47 PM
>
> >> As people keep saying, the object you have called 'records' is a
> >> *list* of namedtuple objects. It is not a namedtuple.
> >>
> >> IIRC, you create it using a list comprehension which creates the
> >> records. A list comprehension always creates a list.
> >
> > Well no. The list is created with:
> >
> > records.extend(Record._make(row) for row in rows)
> >
> > I'm new to both namedtuples and list comprehensions, so I'm not
> > exactly sure if this statement is a list comprehension. It looks like
> > it could be. In any case I recreated records in IDLE and got
> >
> >--> type(records)
> > <class 'list'>
> >
> > So it's a class, derived from list? (Not sure what the 'list' means.)
>
> On the one hand, Deborah, I applaud your perseverance. On
> the other, it seems as if you are trying to run before you can
> walk. I know tutorials can be boring, but you really should
> go through one so you have a basic understanding of the fundamentals.

I actually have had a solid foundation of study in 2 terms of MIT's
introductory Python courses. But they can't cover everything in that
short a time.

> Working in the REPL (the python console), we can see:
>
> Python 3.4.0 (default, Apr 11 2014, 13:05:18)
> ...
> --> type(list)
> <class 'type'>
> -->
> --> type(list())
> <class 'list'>
> --> type([1, 2, 3])
> <class 'list'>
>
> So the `list` type is 'type', and the type of list instances
> is 'class list'.

I just saw that while replying to MRAB.

'records' has type list, but it's only the outer data structure that's a
list. Inside, all the records are namedtuples, and I think that accounts
for behaviors that are unlike a list of lists. (And the reason I was
reluctant to accept that it could be sorted until I tried it for
myself.) The method calls I was able to use were from the namedtuples,
not the list of namedtuples.
> Your records variable is an instance of a list filled with > instances of a namedtuple, 'Record'. One cannot sort a > namedtuple, but one can sort a list of namedtuples -- which > is what you are doing. Yes, I think we've got that straight now. > As I said earlier, I admire your persistence -- but take some > time and learn the basic vocabulary as that will make it much > easier for you to ask questions, and for us to give you > meaningful answers. > > -- > ~Ethan~ As I mentioned, I have completed MIT's 2 introductory Python courses with final grades of 98% and 97%. What tutorials do you think would significantly add to that introduction? It's true that I didn't spend much time in the forums while I was taking those courses, so this is the first time I've talked with people about Python this intensively. But I'm a good learner and I'm picking up a lot of it pretty quickly. People on the list also talk and comprehend differently than people in the MIT courses did, so I have to become accustomed to this as well. And the only place to learn that is right here. -- https://mail.python.org/mailman/listinfo/python-list
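Sorting a list of namedtuples, as described above, might look like the sketch below. The field names and sample values are assumptions chosen for illustration; `operator.attrgetter` is the key helper mentioned elsewhere in the thread.

```python
from collections import namedtuple
from operator import attrgetter

Record = namedtuple("Record", "Location Rent")   # hypothetical fields
records = [
    Record("Tulsa", 900),
    Record("Austin", 1200),
    Record("Boise", 750),
]

# sorted() returns a new list and leaves 'records' untouched.
by_rent = sorted(records, key=attrgetter("Rent"))

# list.sort() reorders the existing list in place.
records.sort(key=attrgetter("Location"))

print([r.Location for r in by_rent])
print([r.Location for r in records])
```

Both calls only need the outer object to be a list (or, for `sorted()`, any iterable); the key function is what knows how to look inside each record.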
RE: Using namedtuples field names for column indices in a list of lists
MRAB wrote, on January 09, 2017 7:37 PM
>
> On 2017-01-10 03:02, Deborah Swanson wrote:
> > Erik wrote, on January 09, 2017 5:47 PM
> >> As people keep saying, the object you have called 'records' is a
> >> *list* of namedtuple objects. It is not a namedtuple.
> >>
> >> IIRC, you create it using a list comprehension which creates the
> >> records. A list comprehension always creates a list.
> >
> > Well no. The list is created with:
> >
> > records.extend(Record._make(row) for row in rows)
> >
> > I'm new to both namedtuples and list comprehensions, so I'm not
> > exactly sure if this statement is a list comprehension. It looks like
> > it could be.
> >
> This is a list comprehension:
>
> [Record._make(row) for row in rows]
>
> and this is a generator expression:
>
> (Record._make(row) for row in rows)
>
> It needs the outer parentheses.
>
> The .extend method will accept any iterable, including list
> comprehensions:
>
> records.extend([Record._make(row) for row in rows])
>
> and generator expressions:
>
> records.extend((Record._make(row) for row in rows))
>
> In the latter case, the generator expression is the only argument of the
> .extend method, and Python lets us drop the pair of parentheses:
>
> records.extend(Record._make(row) for row in rows)
>
> If there were another argument, it would be ambiguous and Python would
> complain.

Appreciate your explanation of why this statement looks like a list
comprehension, but it isn't one.

> > In any case I recreated records in IDLE and got
> >
> > type(records)
> > <class 'list'>
> >
> > So it's a class, derived from list? (Not sure what the 'list' means.)

    >>> [1,2,3]
    [1, 2, 3]
    >>> type(_)
    <class 'list'>

So it is a list, despite not being made by a list comprehension and
despite its non-listlike behaviors. Guess I've never looked at the type
of a list before, probably because lists are so obvious by looking at
them.

> > 'records' is in fact a class, it has an fget method and data members
> > that I've used.
> > And it behaves like a list sometimes, but many times not.
> >
> Its type is 'list', so it's an instance of a list, i.e. it's a list!

As testified by IDLE above! ;)

A list of namedtuples may be an instance of a list, but it doesn't always behave like a list of lists. For example, if you want to modify a field of a record in records, you can't just say record.Location = 'Tulsa' the way you can say record[Location] = 'Tulsa' in a list of lists, because each record is very much like a tuple, and tuples are immutable. You have to use the _replace method, which returns a new record instead of changing the old one:

    record = record._replace(Location='Tulsa')

This is very unlike a list of lists. Only the outer data structure is a list, and inside it's all namedtuples. So it's not a list of lists, it's a list of namedtuples. But .sort and sorted() DTRT, and that's valuable.

> > The only reason I've hedged away from advice to treat records as a
> > list for sorting until I tried it for myself, was because of an awful
> > lot of strange behavior I've seen, while trying to do the same things
> > with namedtuples as I routinely do with scalars and lists. This is all
> > new, and until now, unexplored territory for me. And I generally avoid
> > saying I'm sure or outright agreeing with something unless I really do
> > know it.
> >
> >> The sorted() function and the list.sort() method can be used to sort
> >> a list containing any objects - it's just a case of telling them how
> >> to obtain the key values to compare (which, in the case of
> >> simple attribute access which the namedtuple objects allow,
> >> "operator.attrgetter()" will
> >> do that). This is why sorting the list works for you.
> >>
> >> You could sort objects of different types - but you might need to
> >> supply a function instead of operator.attrgetter() which looks at
> >> the type of
> >> each object and returns something that's obtained differently
> >> for each
> >> type (but which the sort function can compare).
> >> > >> > >> > >> > >> When you say 'Foo = namedtuple("Foo", "spam ham")', you > are creating > >> a "factory" which is able to generate "Foo" objects for you. > >> > >> When you say "x = Foo(1, 2)" you are using the factory to > create an > >> object for you which has its "spam" and "ham" attributes > set to the > >> values 1 and 2 respectively. > >> > >> When you say "records = [Foo(x, y) for x, y in > some_iterable()]", you > >> are creating a list of such objects. This is the thing you > are then > >> sorting. > >> > >> > >> > >> Does that make sense? > >> > >> Regards, E. > > > > Perfect sense. And now that I've confirmed in code that both sorted() and > > .sort() behave as hoped for with namedtuples, I couldn't be happier. > > ;) > > > > The only thing I don't think you have 100% correct is your assertion > > that records is a list. And I'm really not sure now that > > > > records.extend(Record._make(row) for row in rows) > > > > is a list comprehension. > > > > That's the last statement
Re: Using namedtuples field names for column indices in a list of lists
On 10/01/17 03:02, Deborah Swanson wrote:
> Erik wrote, on January 09, 2017 5:47 PM
>> IIRC, you create it using a list comprehension which creates the
>> records. A list comprehension always creates a list.
>
> Well no. The list is created with:
>
> records.extend(Record._make(row) for row in rows)

No, the list is _extended_ by that code. The list is _created_ with a line that will say something like "records = []" or "records = list()" (or "records = "). It's nice to see you agree that it's a list though. Oh, hold on ... ;)

> I'm not exactly sure if this statement is a list comprehension.

No, it's not. I was remembering an old message where someone suggested using the _make() method and that was expressed as a list comprehension.

What you have there is a call to list.extend() passing a _generator_ expression as its only argument (which in this case you can consider to be equivalent to a list comprehension, as extend() consumes all of the data immediately).

You see that I said "list.extend()". That's because 'records' is a list.

> type(records)
> <class 'list'>

Yes, it's an instance of the list class. A list object. A list.

    >>> type(list())
    <class 'list'>
    >>> type([])
    <class 'list'>
    >>> class foo: pass
    ...
    >>> type(foo())
    <class '__main__.foo'>

type() will tell you what class your object is an instance of. "<class 'list'>" tells you that your object is a list.

> And it behaves like a list sometimes, but many times not.

I think that's impossible. I'm 100% sure it's a list. Please give an example of when 'records' does not behave like a list.

> The only thing I don't think you have 100% correct is your assertion
> that records is a list.

It's a list.

> But that's just a quibble. The important thing in this context is that
> both .sort() and sorted() treat it like a list and DTRT.

That's because it's a list :)

E.
-- https://mail.python.org/mailman/listinfo/python-list
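A minimal sketch of the distinction discussed in this message. The `Record` fields and the rows are made up for illustration (the thread does not show the real ones). It shows that extending a list with a generator expression leaves a plain list whose elements are namedtuples, and that "changing" a field means building a new record with `_replace()`.

```python
from collections import namedtuple

# Hypothetical field names and rows, for illustration only.
Record = namedtuple("Record", "Description Location Date")
rows = [
    ("Desk", "Tulsa", "2017-01-09"),
    ("Chair", "Austin", "2017-01-08"),
]

records = []                                        # the list is created here
records.extend(Record._make(row) for row in rows)  # ...and extended here

print(type(records))      # <class 'list'>
print(type(records[0]))   # the Record class, a subclass of tuple

# Each element is immutable, like any tuple.
try:
    records[0].Location = "Dallas"
except AttributeError as exc:
    print("cannot assign:", exc)

# _replace() returns a new record; store it back in the list to "update" it.
records[0] = records[0]._replace(Location="Dallas")
print(records[0])
```

The outer object supports everything a list does (indexing, `append`, `sort`); only the individual records have tuple semantics.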
Re: Using namedtuples field names for column indices in a list of lists
On 01/09/2017 07:02 PM, Deborah Swanson wrote:
> Erik wrote, on January 09, 2017 5:47 PM
>> As people keep saying, the object you have called 'records' is a
>> *list* of namedtuple objects. It is not a namedtuple.
>>
>> IIRC, you create it using a list comprehension which creates the
>> records. A list comprehension always creates a list.
>
> Well no. The list is created with:
>
> records.extend(Record._make(row) for row in rows)
>
> I'm new to both namedtuples and list comprehensions, so I'm not exactly
> sure if this statement is a list comprehension. It looks like it could
> be. In any case I recreated records in IDLE and got
>
> --> type(records)
> <class 'list'>
>
> So it's a class, derived from list? (Not sure what the 'list' means.)

On the one hand, Deborah, I applaud your perseverance. On the other, it seems as if you are trying to run before you can walk. I know tutorials can be boring, but you really should go through one so you have a basic understanding of the fundamentals.

Working in the REPL (the python console), we can see:

    Python 3.4.0 (default, Apr 11 2014, 13:05:18)
    ...
    --> type(list)
    <class 'type'>
    --> type(list())
    <class 'list'>
    --> type([1, 2, 3])
    <class 'list'>

So the type of `list` itself is 'type', and the type of list instances is 'class list'. Your records variable is an instance of a list filled with instances of a namedtuple, 'Record'. One cannot sort a namedtuple, but one can sort a list of namedtuples -- which is what you are doing.

As I said earlier, I admire your persistence -- but take some time and learn the basic vocabulary as that will make it much easier for you to ask questions, and for us to give you meaningful answers.

--
~Ethan~
-- https://mail.python.org/mailman/listinfo/python-list
Re: Using namedtuples field names for column indices in a list of lists
On 2017-01-10 03:02, Deborah Swanson wrote: Erik wrote, on January 09, 2017 5:47 PM As people keep saying, the object you have called 'records' is a *list* of namedtuple objects. It is not a namedtuple. IIRC, you create it using a list comprehension which creates the records. A list comprehension always creates a list. Well no. The list is created with: records.extend(Record._make(row) for row in rows) I'm new to both namedtuples and list comprehensions, so I'm not exactly sure if this statement is a list comprehension. It looks like it could be. > This is a list comprehension: [Record._make(row) for row in rows] and this is a generator expression: (Record._make(row) for row in rows) It needs the outer parentheses. The .extend method will accept any iterable, including list comprehensions: records.extend([Record._make(row) for row in rows]) and generator expressions: records.extend((Record._make(row) for row in rows)) In the latter case, the generator expression is the only argument of the .extend method, and Python lets us drop the pair of parentheses: records.extend(Record._make(row) for row in rows) If there were another argument, it would be ambiguous and Python would complain. In any case I recreated records in IDLE and got type(records) So it's a class, derived from list? (Not sure what the 'list' means.) 'records' is in fact a class, it has an fget method and data members that I've used. And it behaves like a list sometimes, but many times not. Its type is 'list', so it's an instance of a list, i.e. it's a list! The only reason I've hedged away from advice to treat records as a list for sorting until I tried it for myself, was because of an awful lot of strange behavior I've seen, while trying to do the same things with namedtuples as I routinely do with scalars and lists. This is all new, and until now, unexplored territory for me. And I generally avoid saying I'm sure or outright agreeing with something unless I really do know it. 
The sorted() function and the list.sort() method can be used to sort a list containing any objects - it's just a case of telling them how to obtain the key values to compare (which, in the case of simple attribute access which the namedtuple objects allow, "operator.attrgetter()" will do that). This is why sorting the list works for you. You could sort objects of different types - but you might need to supply a function instead of operator.attrgetter() which looks at the type of each object and returns something that's obtained differently for each type (but which the sort function can compare). When you say 'Foo = namedtuple("Foo", "spam ham")', you are creating a "factory" which is able to generate "Foo" objects for you. When you say "x = Foo(1, 2)" you are using the factory to create an object for you which has its "spam" and "ham" attributes set to the values 1 and 2 respectively. When you say "records = [Foo(x, y) for x, y in some_iterable()]", you are creating a list of such objects. This is the thing you are then sorting. Does that make sense? Regards, E. Perfect sense. And now that I've confirmed in code that both sorted() and .sort() behave as hoped for with namedtuples, I couldn't be happier. ;) The only thing I don't think you have 100% correct is your assertion that records is a list. And I'm really not sure now that records.extend(Record._make(row) for row in rows) is a list comprehension. That's the last statement in the creation of 'records', and immediately after that statement executes, the type function says the resulting 'records' is a class, probably derived from list, but it's not a straight up list. 'records' is enough different that you can't assume across the board that namedtuples created this way are equivalent to a list. You do run into problems if you assume it behaves like a list, or even like standard tuples, because it doesn't always. 
Believe me, when I first started working with namedtuples, I got plenty snarled up debugging code that was written assuming list behavior to know that a namedtuple of namedtuples is not exactly a list. Or even exactly like a list. But that's just a quibble. The important thing in this context is that both .sort() and sorted() treat it like a list and DTRT. And that's very nice. ;) The list class has the .sort method, which sorts in-place. The 'sorted' function is a simple function that takes an iterable, iterates over it to build a list, sorts that list in-place, and then returns the list. The oft-stated rule is that not every 2- or 3-line function needs to be a built-in, but 'sorted' is one of those cases where it's just nice to have it, a case of "practicality beats purity". -- https://mail.python.org/mailman/listinfo/python-list
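The description of sorted() at the end of this message can be sketched as a short function. `my_sorted` is a hypothetical name used only for illustration; the real built-in is implemented in C, and this is just a rough equivalent of the behaviour described.

```python
def my_sorted(iterable, key=None, reverse=False):
    """Rough equivalent of the built-in sorted(), for illustration."""
    result = list(iterable)                 # copy the items into a new list
    result.sort(key=key, reverse=reverse)   # sort that list in place
    return result

data = (3, 1, 2)                 # a tuple has no .sort() method
print(my_sorted(data))           # [1, 2, 3]
print(data)                      # (3, 1, 2), the input is untouched

numbers = [3, 1, 2]
print(numbers.sort())            # None: list.sort() works in place
print(numbers)                   # [1, 2, 3]
```

Because list.sort() returns None, writing `records = records.sort(...)` would throw the list away; either call `.sort()` on its own line or assign the result of `sorted()`.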
RE: Using namedtuples field names for column indices in a list of lists
Erik wrote, on January 09, 2017 5:47 PM > As people keep saying, the object you have called 'records' > is a *list* > of namedtuple objects. It is not a namedtuple. > > IIRC, you create it using a list comprehension which creates the > records. A list comprehension always creates a list. Well no. The list is created with: records.extend(Record._make(row) for row in rows) I'm new to both namedtuples and list comprehensions, so I'm not exactly sure if this statement is a list comprehension. It looks like it could be. In any case I recreated records in IDLE and got >>> type(records) So it's a class, derived from list? (Not sure what the 'list' means.) 'records' is in fact a class, it has an fget method and data members that I've used. And it behaves like a list sometimes, but many times not. The only reason I've hedged away from advice to treat records as a list for sorting until I tried it for myself, was because of an awful lot of strange behavior I've seen, while trying to do the same things with namedtuples as I routinely do with scalars and lists. This is all new, and until now, unexplored territory for me. And I generally avoid saying I'm sure or outright agreeing with something unless I really do know it. > The sorted() function and the list.sort() method can be used > to sort a > list containing any objects - it's just a case of telling them how to > obtain the key values to compare (which, in the case of > simple attribute > access which the namedtuple objects allow, > "operator.attrgetter()" will > do that). This is why sorting the list works for you. > > You could sort objects of different types - but you might > need to supply > a function instead of operator.attrgetter() which looks at > the type of > each object and returns something that's obtained differently > for each > type (but which the sort function can compare). 
> > > > > When you say 'Foo = namedtuple("Foo", "spam ham")', you are > creating a > "factory" which is able to generate "Foo" objects for you. > > When you say "x = Foo(1, 2)" you are using the factory to create an > object for you which has its "spam" and "ham" attributes set to the > values 1 and 2 respectively. > > When you say "records = [Foo(x, y) for x, y in some_iterable()]", you > are creating a list of such objects. This is the thing you > are then sorting. > > > > Does that make sense? > > Regards, E. Perfect sense. And now that I've confirmed in code that both sorted() and .sort() behave as hoped for with namedtuples, I couldn't be happier. ;) The only thing I don't think you have 100% correct is your assertion that records is a list. And I'm really not sure now that records.extend(Record._make(row) for row in rows) is a list comprehension. That's the last statement in the creation of 'records', and immediately after that statement executes, the type function says the resulting 'records' is a class, probably derived from list, but it's not a straight up list. 'records' is enough different that you can't assume across the board that namedtuples created this way are equivalent to a list. You do run into problems if you assume it behaves like a list, or even like standard tuples, because it doesn't always. Believe me, when I first started working with namedtuples, I got plenty snarled up debugging code that was written assuming list behavior to know that a namedtuple of namedtuples is not exactly a list. Or even exactly like a list. But that's just a quibble. The important thing in this context is that both .sort() and sorted() treat it like a list and DTRT. And that's very nice. ;) Deborah -- https://mail.python.org/mailman/listinfo/python-list
Re: Using namedtuples field names for column indices in a list of lists
On 10/01/17 00:54, Deborah Swanson wrote: Since I won't change the order of the records again after the sort, I'm using records.sort(key=operator.attrgetter("Description", "Date")) once, which also works perfectly. So both sorted() and sort() can be used to sort namedtuples. Good to know! As people keep saying, the object you have called 'records' is a *list* of namedtuple objects. It is not a namedtuple. IIRC, you create it using a list comprehension which creates the records. A list comprehension always creates a list. The sorted() function and the list.sort() method can be used to sort a list containing any objects - it's just a case of telling them how to obtain the key values to compare (which, in the case of simple attribute access which the namedtuple objects allow, "operator.attrgetter()" will do that). This is why sorting the list works for you. You could sort objects of different types - but you might need to supply a function instead of operator.attrgetter() which looks at the type of each object and returns something that's obtained differently for each type (but which the sort function can compare). When you say 'Foo = namedtuple("Foo", "spam ham")', you are creating a "factory" which is able to generate "Foo" objects for you. When you say "x = Foo(1, 2)" you are using the factory to create an object for you which has its "spam" and "ham" attributes set to the values 1 and 2 respectively. When you say "records = [Foo(x, y) for x, y in some_iterable()]", you are creating a list of such objects. This is the thing you are then sorting. Does that make sense? Regards, E. -- https://mail.python.org/mailman/listinfo/python-list
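The points in this message can be tried out directly. In the sketch below, `Foo` comes from the message itself, while the sample values, the `spam_of` helper and the mixed list of `Foo` objects and dicts are assumptions made up to illustrate the "objects of different types" case.

```python
from collections import namedtuple
from operator import attrgetter

Foo = namedtuple("Foo", "spam ham")      # the "factory" (really a new class)
x = Foo(1, 2)                            # an instance: x.spam == 1, x.ham == 2

records = [Foo(s, h) for s, h in [(3, "c"), (1, "a"), (2, "b")]]
by_spam = sorted(records, key=attrgetter("spam"))

# Hypothetical mixed-type case: Foo objects alongside plain dicts.
def spam_of(obj):
    """Return a comparable key, obtained differently for each type."""
    if isinstance(obj, dict):
        return obj["spam"]
    return obj.spam

mixed = [Foo(3, "c"), {"spam": 1}, Foo(2, "b")]
mixed_sorted = sorted(mixed, key=spam_of)

print(by_spam)
print(mixed_sorted)
```

The sort never compares the objects themselves, only the values the key function returns, which is why a list holding different types can still be ordered.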
RE: Using namedtuples field names for column indices in a list of lists
Peter Otten wrote, on January 09, 2017 3:27 PM
>
> While stable sort is nice in this case you can just say
>
> key=operator.attrgetter("Description", "Date")
>
> Personally I'd only use sorted() once and then switch to the
> sort() method.

This works perfectly, thank you. As I read the docs, the main difference between sorted() and .sort() is that sorted() returns a new list, while .sort() sorts the existing list in place.

Since I won't change the order of the records again after the sort, I'm using

    records.sort(key=operator.attrgetter("Description", "Date"))

once, which also works perfectly.

So both sorted() and sort() can be used to sort a list of namedtuples. Good to know!
-- https://mail.python.org/mailman/listinfo/python-list
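A small sketch of why the two-field key works. The `Record` fields and rows are assumptions for illustration, and the dates are written as ISO strings so that string order matches date order.

```python
import operator
from collections import namedtuple

Record = namedtuple("Record", "Description Date")   # hypothetical fields
records = [
    Record("Desk", "2017-01-09"),
    Record("Chair", "2017-01-09"),
    Record("Desk", "2017-01-02"),
]

key = operator.attrgetter("Description", "Date")
print(key(records[0]))       # ('Desk', '2017-01-09'): a tuple of both fields

records.sort(key=key)        # in place; tuples compare field by field
for record in records:
    print(record)
```

With more than one name, attrgetter() returns a tuple, so rows are ordered by Description first and by Date only where the Descriptions are equal.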
Re: Using namedtuples field names for column indices in a list of lists
breamore...@gmail.com wrote:

> On Monday, January 9, 2017 at 5:34:12 PM UTC, Tim Chase wrote:
>> On 2017-01-09 08:31, breamoreboy wrote:
>> > On Monday, January 9, 2017 at 2:22:19 PM UTC, Tim Chase wrote:
>> > > I usually wrap the iterable in something like
>> > >
>> > > def pairwise(it):
>> > >     it = iter(it)
>> > >     prev = next(it)
>> > >     for thing in it:
>> > >         yield prev, thing
>> > >         prev = thing
>> >
>> > Or from
>> > https://docs.python.org/3/library/itertools.html#itertools-recipes:-
>> >
>> > def pairwise(iterable):
>> >     "s -> (s0,s1), (s1,s2), (s2, s3), ..."
>> >     a, b = tee(iterable)
>> >     next(b, None)
>> >     return zip(a, b)
>> >
>> > This and many other recipes are available in the more-itertools
>> > module which is on pypi.
>>
>> Ah, helpful to not have to do it from scratch each time. Also, I see
>> several others that I've coded up from scratch (particularly the
>> partition() and first_true() functions).
>>
>> I usually want to make sure it's tailored for my use cases. The above
>> pairwise() is my most common use case, but I occasionally want N-wise
>> pairing
>
> def ntuplewise(iterable, n=2):
>     args = tee(iterable, n)
>     loops = n - 1
>     while loops:
>         for _ in range(loops):
>             next(args[loops], None)
>         loops -= 1
>     return zip(*args)
>
>>
>> s -> (s0,s1,…sN), (s1,s2,…S{N+1}), (s2,s3,…s{N+2}), …
>>
>> or to pad them out so either the leader/follower gets *all* of the
>> values, with subsequent values being a padding value:
>>
>> # lst = [s0, s1, s2]
>> (s0,s1), (s1, s2), (s2, PADDING)
>
> Use zip_longest instead of zip in the example code above.
>
>> # or
>> (PADDING, s0), (s0, s1), (s1, s2)
>
> Haven't a clue off of the top of my head and I'm too darn tired to think
> about it :)

In both cases modify the iterable before feeding it to ntuplewise():

    >>> PADDING = None
    >>> N = 3
    >>> items = range(5)
    >>> list(ntuplewise(chain(repeat(PADDING, N-1), items), N))
    [(None, None, 0), (None, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, 4)]
    >>> list(ntuplewise(chain(items, repeat(PADDING, N-1)), N))
    [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, None), (4, None, None)]
-- https://mail.python.org/mailman/listinfo/python-list
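For anyone who wants to run the padding variants, here is a self-contained sketch. It restates ntuplewise() with the imports it needs and adds a hypothetical `zipper` parameter (not part of the code posted in the thread) so the zip_longest suggestion can be compared with the chain()/repeat() approach.

```python
from itertools import chain, repeat, tee, zip_longest

def ntuplewise(iterable, n=2, zipper=zip):
    """ntuplewise() from the thread, with a hypothetical `zipper` parameter
    so that zip_longest can be swapped in for zip."""
    args = tee(iterable, n)
    for offset, it in enumerate(args):
        for _ in range(offset):      # advance the k-th copy by k items
            next(it, None)
    return zipper(*args)

items = range(5)
plain = list(ntuplewise(items, 3))
trailing = list(ntuplewise(items, 3, zipper=zip_longest))    # pads with None
leading = list(ntuplewise(chain(repeat(None, 2), items), 3))

print(plain)
print(trailing)
print(leading)
```

zip_longest only helps with trailing padding, because the shorter copies run out at the end; leading padding still has to be put in front of the data before it is split with tee().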
Re: Using namedtuples field names for column indices in a list of lists
On Monday, January 9, 2017 at 5:34:12 PM UTC, Tim Chase wrote:
> On 2017-01-09 08:31, breamoreboy wrote:
> > On Monday, January 9, 2017 at 2:22:19 PM UTC, Tim Chase wrote:
> > > I usually wrap the iterable in something like
> > >
> > > def pairwise(it):
> > >     it = iter(it)
> > >     prev = next(it)
> > >     for thing in it:
> > >         yield prev, thing
> > >         prev = thing
> >
> > Or from
> > https://docs.python.org/3/library/itertools.html#itertools-recipes:-
> >
> > def pairwise(iterable):
> >     "s -> (s0,s1), (s1,s2), (s2, s3), ..."
> >     a, b = tee(iterable)
> >     next(b, None)
> >     return zip(a, b)
> >
> > This and many other recipes are available in the more-itertools
> > module which is on pypi.
>
> Ah, helpful to not have to do it from scratch each time. Also, I see
> several others that I've coded up from scratch (particularly the
> partition() and first_true() functions).
>
> I usually want to make sure it's tailored for my use cases. The above
> pairwise() is my most common use case, but I occasionally want N-wise
> pairing

    def ntuplewise(iterable, n=2):
        args = tee(iterable, n)
        loops = n - 1
        while loops:
            for _ in range(loops):
                next(args[loops], None)
            loops -= 1
        return zip(*args)

>
> s -> (s0,s1,…sN), (s1,s2,…S{N+1}), (s2,s3,…s{N+2}), …
>
> or to pad them out so either the leader/follower gets *all* of the
> values, with subsequent values being a padding value:
>
> # lst = [s0, s1, s2]
> (s0,s1), (s1, s2), (s2, PADDING)

Use zip_longest instead of zip in the example code above.

> # or
> (PADDING, s0), (s0, s1), (s1, s2)

Haven't a clue off of the top of my head and I'm too darn tired to think about it :)

>
> but it's good to have my common cases already coded & tested.
>
> -tkc

Kindest regards.

Mark Lawrence.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Using namedtuples field names for column indices in a list of lists
Rhodri James wrote:

> On 09/01/17 21:40, Deborah Swanson wrote:
>> Peter Otten wrote, on January 09, 2017 6:51 AM
>>>
>>> records = sorted(
>>>     set(records),
>>>     key=operator.attrgetter("Description")
>>> )
>>
>> Good, this is confirmation that 'sorted()' is the way to go. I want a 2
>> key sort, Description and Date, but I think I can figure out how to do
>> that.
>
> There's a handy trick that you can use because the sorting algorithm is
> stable. First, sort on your secondary key. This will leave the data in
> the wrong order, but objects with the same primary key will be in the
> right order by secondary key relative to each other.
>
> Then sort on your primary key. Because the sorting algorithm is stable,
> it won't disturb the relative order of objects with the same primary
> key, giving you the sort that you want!
>
> So assuming you want your data sorted by date, and then by description
> within the same date, it's just:
>
> records = sorted(
>     sorted(
>         set(records),
>         key=operator.attrgetter("Description")
>     ),
>     key=operator.attrgetter("Date")
> )

While stable sort is nice, in this case you can just say

    key=operator.attrgetter("Description", "Date")

Personally I'd only use sorted() once and then switch to the sort() method.
-- https://mail.python.org/mailman/listinfo/python-list
Re: Using namedtuples field names for column indices in a list of lists
On 09/01/17 21:40, Deborah Swanson wrote:
> Peter Otten wrote, on January 09, 2017 6:51 AM
>>
>> records = sorted(
>>     set(records),
>>     key=operator.attrgetter("Description")
>> )
>
> Good, this is confirmation that 'sorted()' is the way to go. I want a 2
> key sort, Description and Date, but I think I can figure out how to do
> that.

There's a handy trick that you can use because the sorting algorithm is stable. First, sort on your secondary key. This will leave the data in the wrong order, but objects with the same primary key will be in the right order by secondary key relative to each other.

Then sort on your primary key. Because the sorting algorithm is stable, it won't disturb the relative order of objects with the same primary key, giving you the sort that you want!

So assuming you want your data sorted by date, and then by description within the same date, it's just:

    records = sorted(
        sorted(
            set(records),
            key=operator.attrgetter("Description")
        ),
        key=operator.attrgetter("Date")
    )

--
Rhodri James *-* Kynesim Ltd
-- https://mail.python.org/mailman/listinfo/python-list
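The stable-sort trick can be checked on a handful of rows. The `Record` fields and the sample data are assumptions for illustration, with dates written as ISO strings so that they order correctly as text. The sketch compares the two-pass sort against a single sort on a (primary, secondary) key.

```python
import operator
from collections import namedtuple

Record = namedtuple("Record", "Description Date")   # hypothetical fields
records = [
    Record("Desk", "2017-01-09"),
    Record("Chair", "2017-01-09"),
    Record("Desk", "2017-01-02"),
    Record("Chair", "2017-01-02"),
]

# Pass 1: secondary key (Description). Pass 2: primary key (Date).
two_pass = sorted(
    sorted(records, key=operator.attrgetter("Description")),
    key=operator.attrgetter("Date"),
)

# The same ordering from a single sort on a (primary, secondary) key.
one_pass = sorted(records, key=operator.attrgetter("Date", "Description"))

print(two_pass == one_pass)   # True
```

The two-pass form earns its keep when the keys need different directions, for example newest date first but descriptions ascending, since `reverse=True` can then be given to just one of the passes.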
RE: Using namedtuples field names for column indices in a list of lists
breamore...@gmail.com wrote, on January 09, 2017 8:32 AM
>
> On Monday, January 9, 2017 at 2:22:19 PM UTC, Tim Chase wrote:
> > On 2017-01-08 22:58, Deborah Swanson wrote:
> > > 1) I have a section that loops through the sorted data, compares two
> > > adjacent rows at a time, and marks one of them for deletion if the
> > > rows are identical.
> > > and my question is whether there's a way to work with two adjacent
> > > rows without using subscripts?
> >
> > I usually wrap the iterable in something like
> >
> > def pairwise(it):
> >     it = iter(it)
> >     prev = next(it)
> >     for thing in it:
> >         yield prev, thing
> >         prev = thing
> >
> > for prev, cur in pairwise(records):
> >     compare(prev, cur)
> >
> > which I find makes it more readable.
> >
> > -tkc
>
> Or from
> https://docs.python.org/3/library/itertools.html#itertools-recipes:-
>
> def pairwise(iterable):
>     "s -> (s0,s1), (s1,s2), (s2, s3), ..."
>     a, b = tee(iterable)
>     next(b, None)
>     return zip(a, b)
>
> This and many other recipes are available in the
> more-itertools module which is on pypi.

Thanks, I'll keep this since I seem to do pairwise comparisons a lot. I'm going to try using set or OrderedDict for the current problem, per Peter's suggestion, but if I can't make that work, this surely will.
-- https://mail.python.org/mailman/listinfo/python-list
RE: Using namedtuples field names for column indices in a list of lists
Tim Chase wrote, on January 09, 2017 6:22 AM
>
> On 2017-01-08 22:58, Deborah Swanson wrote:
> > 1) I have a section that loops through the sorted data, compares two
> > adjacent rows at a time, and marks one of them for deletion if the
> > rows are identical.
> > and my question is whether there's a way to work with two adjacent
> > rows without using subscripts?
>
> I usually wrap the iterable in something like
>
> def pairwise(it):
>     it = iter(it)
>     prev = next(it)
>     for thing in it:
>         yield prev, thing
>         prev = thing
>
> for prev, cur in pairwise(records):
>     compare(prev, cur)
>
> which I find makes it more readable.
>
> -tkc

This looks very useful, and comparing two adjacent rows is something I do often. Thanks Tim!

Deborah
-- https://mail.python.org/mailman/listinfo/python-list
RE: Using namedtuples field names for column indices in a list of lists
Peter Otten wrote, on January 09, 2017 6:51 AM > > Deborah Swanson wrote: > > > Even better, to get hold of all the records with the same Description > > as the current row, compare them all, mark all but the different ones > > for deletion, and then resume processing the records after the last > > one? > > When you look at all fields for deduplication anyway there's no need to > treat one field (Description) specially. Just > > records = set(records) I haven't worked with sets before, so this would be a good time to start. > should be fine. As the initial order is lost* you probably want to sort > afterwards. The code then becomes > > records = sorted( > set(records), > key=operator.attrgetter("Description") > ) Good, this is confirmation that 'sorted()' is the way to go. I want a 2 key sort, Description and Date, but I think I can figure out how to do that. > Now if you want to fill in missing values, you should probably do this > before deduplication That's how my original code was written, to fill in missing values as the very last thing before saving to csv. > -- and the complete() function introduced in >https://mail.python.org/pipermail/python-list/2016-December/717847.html > can be adapted to work with namedtuples instead of dicts. Ah, your defaultdict suggestion. Since my original comprows() function to fill in missing values is now broken after the rest of the code was rewritten for namedtuples (I just commented it out to test the namedtuples version), this would be a good time to look at defaultdict. > (*) If you want to preserve the initial order you can use a > collections.OrderedDict instead of the set. OrderedDict is another thing I haven't used, but would love to, so I think I'll try both the set and the OrderedDict, and see which one is best here. Thanks again Peter, all your help is very much appreciated. Deborah -- https://mail.python.org/mailman/listinfo/python-list
Re: Using namedtuples field names for column indices in a list of lists
On Monday, January 9, 2017 at 2:22:19 PM UTC, Tim Chase wrote:
> On 2017-01-08 22:58, Deborah Swanson wrote:
> > 1) I have a section that loops through the sorted data, compares two
> > adjacent rows at a time, and marks one of them for deletion if the
> > rows are identical.
> >
> > I'm using
> >
> > for i in range(len(records)-1):
> >     r1 = records[i]
> >     r2 = records[i+1]
> >     if r1.xx == r2.xx:
> >         .
> >         .
> > and my question is whether there's a way to work with two adjacent
> > rows without using subscripts?
>
> I usually wrap the iterable in something like
>
> def pairwise(it):
>     it = iter(it)
>     prev = next(it)
>     for thing in it:
>         yield prev, thing
>         prev = thing
>
> for prev, cur in pairwise(records):
>     compare(prev, cur)
>
> which I find makes it more readable.
>
> -tkc

Or from https://docs.python.org/3/library/itertools.html#itertools-recipes:-

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

This and many other recipes are available in the more-itertools module which is on pypi.
-- https://mail.python.org/mailman/listinfo/python-list
RE: Using namedtuples field names for column indices in a list of lists
Deborah Swanson wrote:

> Even better, to get hold of all the records with the same Description as
> the current row, compare them all, mark all but the different ones for
> deletion, and then resume processing the records after the last one?

When you look at all fields for deduplication anyway there's no need to treat one field (Description) specially. Just

    records = set(records)

should be fine. As the initial order is lost (*) you probably want to sort afterwards. The code then becomes

    records = sorted(
        set(records),
        key=operator.attrgetter("Description")
    )

Now if you want to fill in missing values, you should probably do this before deduplication -- and the complete() function introduced in

https://mail.python.org/pipermail/python-list/2016-December/717847.html

can be adapted to work with namedtuples instead of dicts.

(*) If you want to preserve the initial order you can use a collections.OrderedDict instead of the set.
-- https://mail.python.org/mailman/listinfo/python-list
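Both deduplication options can be sketched on a few rows. The `Record` fields and data are assumptions for illustration, and using `OrderedDict.fromkeys()` is one possible way to apply the OrderedDict idea, not necessarily the one intended in the message.

```python
import operator
from collections import OrderedDict, namedtuple

Record = namedtuple("Record", "Description Date")   # hypothetical fields
records = [
    Record("Desk", "2017-01-09"),
    Record("Chair", "2017-01-08"),
    Record("Desk", "2017-01-09"),    # exact duplicate of the first row
]

# Option 1: a set drops duplicates but loses order, so sort afterwards.
deduped_sorted = sorted(set(records), key=operator.attrgetter("Description"))

# Option 2: OrderedDict keys are unique and keep first-seen order.
deduped_ordered = list(OrderedDict.fromkeys(records))

print(deduped_sorted)
print(deduped_ordered)
```

Both rely on namedtuples being hashable, which holds as long as every field value is itself hashable (strings and numbers are; lists are not).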
Re: Using namedtuples field names for column indices in a list of lists
On 2017-01-08 22:58, Deborah Swanson wrote:
> 1) I have a section that loops through the sorted data, compares two
> adjacent rows at a time, and marks one of them for deletion if the
> rows are identical.
>
> I'm using
>
> for i in range(len(records)-1):
>     r1 = records[i]
>     r2 = records[i+1]
>     if r1.xx == r2.xx:
>         .
>         .
> and my question is whether there's a way to work with two adjacent
> rows without using subscripts?

I usually wrap the iterable in something like

    def pairwise(it):
        it = iter(it)  # next() needs an iterator; a list is only iterable
        prev = next(it)
        for thing in it:
            yield prev, thing
            prev = thing

    for prev, cur in pairwise(records):
        compare(prev, cur)

which I find makes it more readable.

-tkc
-- https://mail.python.org/mailman/listinfo/python-list
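A runnable sketch of the adjacent-row comparison using a pairwise() generator. The empty-input guard and the plain strings standing in for sorted records are additions made for illustration, not part of the code posted in the thread.

```python
def pairwise(iterable):
    """Yield (previous, current) for each adjacent pair of items."""
    it = iter(iterable)          # next() needs an iterator, not a list
    try:
        prev = next(it)
    except StopIteration:        # empty input: yield nothing
        return
    for thing in it:
        yield prev, thing
        prev = thing

rows = ["a", "a", "b", "c", "c", "c"]      # stand-in for sorted records

# Indices of rows that repeat the row before them (candidates for deletion).
duplicates = [i + 1 for i, (prev, cur) in enumerate(pairwise(rows)) if prev == cur]
unique = [row for i, row in enumerate(rows) if i not in duplicates]

print(duplicates)
print(unique)
```

Building a new list of the rows to keep avoids deleting from `rows` while iterating over it, which would shift the indices.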
RE: Using namedtuples field names for column indices in a list of lists
Steve D'Aprano wrote, on January 09, 2017 3:40 AM > > On Mon, 9 Jan 2017 09:57 pm, Deborah Swanson wrote: > > [...] > > I think you are replying to my question about sorting a > namedtuple, in > > this case it's called 'records'. > > > > I think your suggestion works for lists and tuples, and probably > > dictionaries. But namedtuples doesn't have a sort function. > > Tuples in general (whether named or not) represent structs or > records, where the position of the item is significant. It > doesn't usually make sense to sort individual elements of a > record or tuple: > > Before sorting: Record(name='George', spouse='', > position='Accountant') After sorting: Record(name='', > spouse='Accountant', position='George') > > > I think what you want to do is sort a list of records, not > each record itself. Or possibly you want to reorder the > columns, in which case the easiest way to do that is by > editing the CSV file in LibreOffice or Excel or another > spreadsheet application. > If you have a list of records, call .sort() on the list, not > the individual records. > I want to sort a nametuple of records. I could convert it to a list easy enough, with: recs = list(records) and then use the column copying and deleting method I described in my previous post and use mergesort. This would give me exactly what I had in my original code, and it would be ok. There's only one step after the mergesort, and I could do it without the namedtuple, although I'd have to count column indices for that section, and rewrite them whenever the columns changed, which was what my original 2-letter codes and the conversion to namedtuples were both meant to avoid. So all in all, the best thing would be if there's a way to sort records as a namedtuple. > But if I am wrong, and you absolutely must sort the fields of > each record, call the sorted() function, which will copy the > fields into a list and sort the list. 
That is: > > alist = sorted(record) > > -- > Steve > "Cheer up," they said, "things could be worse." So I cheered > up, and sure enough, things got worse. I'm not sure what you mean by sorting the fields of each record. I want all the rows in records sorted by the Description and Date in each record. alist = sorted(record) looks like it sorts one record, by what? An alphanumeric ordering of all the fields in record? That would be beyond useless for my purposes. It's possible sorted() will work on namedtuples. Stackoverflow has an example: from operator import attrgetter from collections import namedtuple Person = namedtuple('Person', 'name age score') seq = [Person(name='nick', age=23, score=100), Person(name='bob', age=25, score=200)] # Sort list by name sorted(seq, key=attrgetter('name')) # Sort list by age sorted(seq, key=attrgetter('age')) So apparently it's done, and it's a keyed sort too. Although what they're sorting is a list of namedtuples, which may or may not work on a single namedtuple made of row (record) namedtuples. I'll definitely try it tomorrow. Know any way to convert a list back into a namedtuple? I suppose I could go through all the steps used at the beginning to make records, but that seems a waste if there's any way at all to sort the namedtuple without converting it into a list. Thanks Steven! Maybe sorted() is my friend here. (haha, and maybe not.) Deborah -- https://mail.python.org/mailman/listinfo/python-list
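A small sketch of the keyed sort discussed above, assuming `records` is an ordinary list whose items are namedtuple instances. The field names and the ISO-style date strings are assumptions made so that plain string order matches date order.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical fields; the real CSV header is not shown in the thread
Record = namedtuple("Record", "Description Date Price")
records = [
    Record("house", "2017-01-03", "5678"),
    Record("flat", "2017-01-05", "1234"),
    Record("flat", "2017-01-02", "1300"),
]

# sorted() returns a new list; the items are still Record instances
by_desc_date = sorted(records, key=attrgetter("Description", "Date"))

# list.sort() does the same job in place
records.sort(key=attrgetter("Description", "Date"))

print([rec.Price for rec in records])
```

Nothing needs converting back afterwards: sorting reorders the list, and each row keeps its field names.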
RE: Using namedtuples field names for column indices in a list of lists
On Mon, 9 Jan 2017 09:57 pm, Deborah Swanson wrote: [...] > I think you are replying to my question about sorting a namedtuple, in > this case it's called 'records'. > > I think your suggestion works for lists and tuples, and probably > dictionaries. But namedtuples doesn't have a sort function. Tuples in general (whether named or not) represent structs or records, where the position of the item is significant. It doesn't usually make sense to sort individual elements of a record or tuple: Before sorting: Record(name='George', spouse='', position='Accountant') After sorting: Record(name='', spouse='Accountant', position='George') I think what you want to do is sort a list of records, not each record itself. Or possibly you want to reorder the columns, in which case the easiest way to do that is by editing the CSV file in LibreOffice or Excel or another spreadsheet application. If you have a list of records, call .sort() on the list, not the individual records. But if I am wrong, and you absolutely must sort the fields of each record, call the sorted() function, which will copy the fields into a list and sort the list. That is: alist = sorted(record) -- Steve “Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse. -- https://mail.python.org/mailman/listinfo/python-list
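To make the distinction above concrete, here is a short sketch contrasting `sorted(record)` with sorting a list of records. The second record is invented for the example.

```python
from collections import namedtuple

Record = namedtuple("Record", "name spouse position")
record = Record(name="George", spouse="", position="Accountant")

# Sorting ONE record: the field values are copied into a plain list
# and reordered, so the link between field name and value is lost.
fields_sorted = sorted(record)
print(fields_sorted)

# Sorting a LIST of records: whole rows are reordered and each row
# stays intact. Without a key, rows compare field by field, like tuples.
records = [record, Record("Alice", "Bob", "Engineer")]
records.sort()
print(records[0].name)
```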
RE: Using namedtuples field names for column indices in a list of lists
Antoon Pardon wrote, on January 09, 2017 2:35 AM
> If I understand correctly you want something like:
>
>     records.sort(key = lambda rec: rec.xx)
>
> AKA
>
>     from operator import attrgetter
>     records.sort(key = attrgetter('xx'))
>
> or maybe:
>
>     records.sort(key = lambda rec: (rec.xx,) + tuple(rec))
>
> --
> Antoon Pardon

I think you are replying to my question about sorting a namedtuple, in
this case it's called 'records'.

I think your suggestion works for lists and tuples, and probably
dictionaries. But namedtuples doesn't have a sort function.

>>> from collections import namedtuple
>>> dir(namedtuple)
['__annotations__', '__call__', '__class__', '__closure__', '__code__',
'__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__',
'__eq__', '__format__', '__ge__', '__get__', '__getattribute__',
'__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__',
'__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__',
'__qualname__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__']

so nothing with records.sort will work. :(

Deborah
--
https://mail.python.org/mailman/listinfo/python-list
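A hedged sketch of why the `dir(namedtuple)` check above is misleading: `namedtuple` itself is a factory function, so its attributes say nothing about the list that holds the rows. It assumes `records` was built as a list of namedtuple instances, as in Peter Otten's earlier code; the field names are made up.

```python
from collections import namedtuple

Record = namedtuple("Record", "Description Location")
records = [Record("b", "there"), Record("a", "here")]

print(callable(namedtuple))       # the factory is something you call
print(type(records).__name__)     # the container is a plain list
print(hasattr(records, "sort"))   # and lists do have .sort()
print(hasattr(Record, "sort"))    # a single record (a tuple) does not

# So the keyed sort is called on the list, not on a record
records.sort(key=lambda rec: rec.Description)
print(records[0])
```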
RE: Using namedtuples field names for column indices in a list of lists
Antoon Pardon wrote, on January 09, 2017 2:14 AM
> > 1) I have a section that loops through the sorted data, compares two
> > adjacent rows at a time, and marks one of them for deletion if the
> > rows are identical.
> >
> > I'm using
> >
> > for i in range(len(records)-1):
> >     r1 = records[i]
> >     r2 = records[i+1]
> >     if r1.xx = r2.xx:
> >         .
> >         .
> > and my question is whether there's a way to work with two adjacent
> > rows without using subscripts?
>
> for r1, r2 in zip(records[i], records[i+1]):
>     if r1.xx == r2.xx
>         .
>         .

Ok, I've seen the zip function before and it might do the job, but here
I think you're suggesting:

for i in range(len(records)-1):
    for r1, r2 in zip(records[i], records[i+1]):
        if r1.xx == r2.xx
            .
            .

The hope was to do the loop without subscripts, and this may or may not
have gotchas because records is a namedtuple:

for r in records[1:]:
    .
    .

Deborah
--
https://mail.python.org/mailman/listinfo/python-list
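One way to pair each row with its successor without an index variable is to zip the list against a copy shifted by one. This is a sketch, not necessarily what the quoted suggestion intended; the field names and rows are invented.

```python
from collections import namedtuple

Record = namedtuple("Record", "Description Location")
records = [
    Record("flat", "here"),
    Record("flat", "here"),
    Record("house", "there"),
]

# zip(records, records[1:]) yields (row 0, row 1), (row 1, row 2), ...
keep = records[:1]                 # the first row has no predecessor
for r1, r2 in zip(records, records[1:]):
    if r1 != r2:                   # keep a row only if it differs
        keep.append(r2)

print(keep)
```

The slice `records[1:]` copies the list, which is fine for small files; an iterator-based helper avoids the copy for large ones.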
Re: Using namedtuples field names for column indices in a list of lists
On 09-01-17 at 07:58, Deborah Swanson wrote:
> Peter Otten wrote, on January 08, 2017 5:21 AM
>> Deborah Swanson wrote:
>>
>>> Peter Otten wrote, on January 08, 2017 3:01 AM
>>
>> Personally I would recommend against mixing data (an actual location) and
>> metadata (the column name,"Location"), but if you wish my code can be
>> adapted as follows:
>>
>> infile = open("dictreader_demo.csv")
>> rows = csv.reader(infile)
>> fieldnames = next(rows)
>> Record = namedtuple("Record", fieldnames)
>> records = [Record._make(fieldnames)]
>> records.extend(Record._make(row) for row in rows)
> Works like a charm. I stumbled a bit changing all my subscripted
> variables to namedtuples and rewriting the inevitable places my code
> that didn't work the same. But actually it was fun, especially deleting
> all the sections and variables I no longer needed. And it executes
> correctly now too - with recognizable fieldnames instead of my quirky
> 2-letter code subscripts. All in all a huge win!
>
> I do have two more questions.
>
> 1) I have a section that loops through the sorted data, compares two
> adjacent rows at a time, and marks one of them for deletion if the rows
> are identical.
>
> I'm using
>
> for i in range(len(records)-1):
>     r1 = records[i]
>     r2 = records[i+1]
>     if r1.xx = r2.xx:
>         .
>         .
> and my question is whether there's a way to work with two adjacent rows
> without using subscripts?
>
> Even better, to get hold of all the records with the same Description as
> the current row, compare them all, mark all but the different ones for
> deletion, and then resume processing the records after the last one?

If I understand correctly you want something like:

    records.sort(key = lambda rec: rec.xx)

AKA

    from operator import attrgetter
    records.sort(key = attrgetter('xx'))

or maybe:

    records.sort(key = lambda rec: (rec.xx,) + tuple(rec))

--
Antoon Pardon
--
https://mail.python.org/mailman/listinfo/python-list
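For the "all the records with the same Description" part of the question, `itertools.groupby` on a sorted list is one possible approach. This is only a sketch: the field names, the sample rows, and the rule "keep the first copy of each identical row" are assumptions about what is wanted.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

Record = namedtuple("Record", "Description Location")
records = [
    Record("flat", "here"),
    Record("house", "there"),
    Record("flat", "here"),
    Record("flat", "elsewhere"),
]

# groupby only groups ADJACENT items, so sort by the same key first
records.sort(key=attrgetter("Description"))

unique = []
for description, group in groupby(records, key=attrgetter("Description")):
    seen = []
    for rec in group:          # every row sharing this Description
        if rec not in seen:    # drop rows identical to an earlier one
            seen.append(rec)
    unique.extend(seen)

print(unique)
```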
Re: Using namedtuples field names for column indices in a list of lists
On 09-01-17 at 07:58, Deborah Swanson wrote:
> Peter Otten wrote, on January 08, 2017 5:21 AM
>> Deborah Swanson wrote:
>>
>>> Peter Otten wrote, on January 08, 2017 3:01 AM
>>
>> Personally I would recommend against mixing data (an actual location) and
>> metadata (the column name,"Location"), but if you wish my code can be
>> adapted as follows:
>>
>> infile = open("dictreader_demo.csv")
>> rows = csv.reader(infile)
>> fieldnames = next(rows)
>> Record = namedtuple("Record", fieldnames)
>> records = [Record._make(fieldnames)]
>> records.extend(Record._make(row) for row in rows)
> Works like a charm. I stumbled a bit changing all my subscripted
> variables to namedtuples and rewriting the inevitable places my code
> that didn't work the same. But actually it was fun, especially deleting
> all the sections and variables I no longer needed. And it executes
> correctly now too - with recognizable fieldnames instead of my quirky
> 2-letter code subscripts. All in all a huge win!
>
> I do have two more questions.
>
> 1) I have a section that loops through the sorted data, compares two
> adjacent rows at a time, and marks one of them for deletion if the rows
> are identical.
>
> I'm using
>
> for i in range(len(records)-1):
>     r1 = records[i]
>     r2 = records[i+1]
>     if r1.xx = r2.xx:
>         .
>         .
> and my question is whether there's a way to work with two adjacent rows
> without using subscripts?

for r1, r2 in zip(records[i], records[i+1]):
    if r1.xx == r2.xx
        .
        .
--
https://mail.python.org/mailman/listinfo/python-list
RE: Using namedtuples field names for column indices in a list of lists
Steven D'Aprano wrote, on January 08, 2017 7:30 PM > > On Sunday 08 January 2017 20:53, Deborah Swanson wrote: > > > Steven D'Aprano wrote, on January 07, 2017 10:43 PM > > No, I'm pretty sure that's not the case. I don't have access > to your CSV file, > but I can simulate it: > > ls = [['Location', 'Date', 'Price'], > ['here', '1/1/17', '1234'], > ['there', '1/1/17', '5678'], > ['everywhere', '1/1/17', '9821'] > ] > > from collections import namedtuple > lst = namedtuple('lst', ls[0]) > > print(type(lst)) > print(lst) > > > > If you run that code, you should see: > > > > > > which contradicts your statement: > > 'lst' is a namedtuple instance with each of the column > titles as field names. Yes, yes. In a careless moment I called a class an instance. > and explains why you had to access the individual property > method `fget`: you > were accessing the *class object* rather than an actual named > tuple instance. That code is deleted and long gone now, so I can't look at it in the debugger, but yes, I'm pretty sure 'fget' is a class member. > The code you gave was: > > lst.Location.fget(l) > > where l was not given, but I can guess it was a row of the > CSV file, i.e. an > individual record. So: > > - lst was the named tuple class, a subclass of tuple > > - lst.Location returns a property object > > - lst.Location.fget is the internal fget method of the > property object. And your point is? Perhaps I didn't express myself in a way that you could recognize, but I understood all of that before I wrote to you, and attempted to convey that understanding to you. Obviously I failed, if you now think I need a lesson in what's going on here. 
> I think Peter Otten has the right idea: create a list of > records with something > like this: > > > Record = namedtuple('Record', ls[0]) > data = [Record(*row) for row in ls[1:]) > > > or if you prefer Peter's version: > > data = [Record._make(row) for row in ls[1:]) > > > Half the battle is coming up with the right data structures :-) Can't and wouldn't disagree with any part of that! > -- > Steven > "Ever since I learned about confirmation bias, I've been seeing > it everywhere." - Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
RE: Using namedtuples field names for column indices in a list of lists
Peter Otten wrote, on January 08, 2017 5:21 AM > > Deborah Swanson wrote: > > > Peter Otten wrote, on January 08, 2017 3:01 AM > > Personally I would recommend against mixing data (an actual location) and > metadata (the column name,"Location"), but if you wish my code can be > adapted as follows: > > infile = open("dictreader_demo.csv") > rows = csv.reader(infile) > fieldnames = next(rows) > Record = namedtuple("Record", fieldnames) > records = [Record._make(fieldnames)] > records.extend(Record._make(row) for row in rows) Works like a charm. I stumbled a bit changing all my subscripted variables to namedtuples and rewriting the inevitable places my code that didn't work the same. But actually it was fun, especially deleting all the sections and variables I no longer needed. And it executes correctly now too - with recognizable fieldnames instead of my quirky 2-letter code subscripts. All in all a huge win! I do have two more questions. 1) I have a section that loops through the sorted data, compares two adjacent rows at a time, and marks one of them for deletion if the rows are identical. I'm using for i in range(len(records)-1): r1 = records[i] r2 = records[i+1] if r1.xx = r2.xx: . . and my question is whether there's a way to work with two adjacent rows without using subscripts? Even better, to get hold of all the records with the same Description as the current row, compare them all, mark all but the different ones for deletion, and then resume processing the records after the last one? 2) I'm using mergesort. (I didn't see any way to sort a namedtuple in the docs.) In the list version of my code I copied and inserted the 2 columns I wanted to sort by into the beginning of the list, and then deleted them after the list was sorted. But just looking at records, I'm not so sure that can easily be done. 
I remember your code to work with columns of the data: columnA = [record.A for record in records] and I can see how that would get me columnA and columnB, but then is there any better way to insert and delete columns in an existing namedtuple than slicing? And I don't think you can insert or delete a whole column while slicing. Or maybe my entire approach is not the best. I know it's possible to do keyed sorts, but I haven't actually written or used any. So I just pulled a mergesort off the shelf and got what I wanted by inserting copies of those 2 columns at the front, and then deleting them when the sort was complete. Not exactly elegant, but it works. Any suggestions would be most welcome. -- https://mail.python.org/mailman/listinfo/python-list
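A keyed sort can stand in for the copy-two-columns-then-mergesort approach described above, since the key function picks the sort columns without touching the rows. This is a sketch under assumptions: the field names are invented, and the dates are assumed to be in month/day/two-digit-year form like the '1/1/17' sample elsewhere in the thread.

```python
from collections import namedtuple
from datetime import datetime

Record = namedtuple("Record", "Description Date Price")
records = [
    Record("house", "1/3/17", "5678"),
    Record("flat", "12/5/16", "1234"),
    Record("flat", "1/2/17", "1300"),
]


def sort_key(rec):
    """Sort by Description, then by the parsed date (not the raw text)."""
    return rec.Description, datetime.strptime(rec.Date, "%m/%d/%y")


# No columns are inserted or deleted; the rows themselves are unchanged
records.sort(key=sort_key)
print([rec.Price for rec in records])
```

Parsing the date matters here: as plain strings, "1/2/17" would sort before "12/5/16" even though it is the later date.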
RE: Using namedtuples field names for column indices in a list of lists
Steven D'Aprano wrote, on January 07, 2017 10:43 PM
>
> On Sunday 08 January 2017 16:39, Deborah Swanson wrote:
>
> The recommended way is with the _replace method:
>
> py> instance._replace(A=999)
> Record(A=999, B=20, C=30)
> py> instance._replace(A=999, C=888)
> Record(A=999, B=20, C=888)
>
> --
> Steven
> "Ever since I learned about confirmation bias, I've been seeing
> it everywhere." - Jon Ronson

instance._replace(A=999) works perfectly, and editing my existing
assignment statements was really easy. Thanks - a lot.
--
https://mail.python.org/mailman/listinfo/python-list
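Because `_replace` returns a new tuple rather than changing the old one, the result has to be stored back into the list. A small sketch with made-up rows:

```python
from collections import namedtuple

Record = namedtuple("Record", "A B C")
records = [Record(10, 20, 30), Record(1, 2, 3)]

# _replace builds a NEW record; assign it back to the same list slot
records[0] = records[0]._replace(A=999)

# The same idea applied to every row that meets a condition
records = [
    rec._replace(C=0) if rec.B == 2 else rec
    for rec in records
]

print(records)
```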
RE: Using namedtuples field names for column indices in a list of lists
On Sunday 08 January 2017 20:53, Deborah Swanson wrote: > Steven D'Aprano wrote, on January 07, 2017 10:43 PM >> >> On Sunday 08 January 2017 16:39, Deborah Swanson wrote: >> >> > What I've done so far: >> > >> > with open('E:\\Coding projects\\Pycharm\\Moving\\Moving >> 2017 in.csv', >> > 'r') as infile: >> > ls = list(csv.reader(infile)) >> > lst = namedtuple('lst', ls[0]) >> > >> > where 'ls[0]' is the header row of the csv, and it works perfectly >> > well. 'lst' is a namedtuple instance with each of the >> column titles as >> > field names. >> >> Are you sure? namedtuple() returns a class, not a list: > > Yes. 'ls' is defined as 'list(csv.reader(infile))', so ls[0] is the > first row from the csv, the header row. 'lst' is the namedtuple. > > Perhaps what's puzzling you is that the way I've written it, the list of > data and the namedtuple are disjoint, and that's the problem. No, I'm pretty sure that's not the case. I don't have access to your CSV file, but I can simulate it: ls = [['Location', 'Date', 'Price'], ['here', '1/1/17', '1234'], ['there', '1/1/17', '5678'], ['everywhere', '1/1/17', '9821'] ] from collections import namedtuple lst = namedtuple('lst', ls[0]) print(type(lst)) print(lst) If you run that code, you should see: which contradicts your statement: 'lst' is a namedtuple instance with each of the column titles as field names. and explains why you had to access the individual property method `fget`: you were accessing the *class object* rather than an actual named tuple instance. The code you gave was: lst.Location.fget(l) where l was not given, but I can guess it was a row of the CSV file, i.e. an individual record. So: - lst was the named tuple class, a subclass of tuple - lst.Location returns a property object - lst.Location.fget is the internal fget method of the property object. 
I think Peter Otten has the right idea: create a list of records with
something like this:

    Record = namedtuple('Record', ls[0])
    data = [Record(*row) for row in ls[1:]]

or if you prefer Peter's version:

    data = [Record._make(row) for row in ls[1:]]

Half the battle is coming up with the right data structures :-)

--
Steven
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." - Jon Ronson
--
https://mail.python.org/mailman/listinfo/python-list
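The class-versus-instance point can be checked directly. The sketch below uses a simulated header like the one in the message. As far as I know, newer CPython releases no longer implement the fields as `property` objects, so `fget` is best treated as an implementation detail and avoided.

```python
from collections import namedtuple

header = ["Location", "Date", "Price"]

Record = namedtuple("Record", header)      # this is a CLASS
row = Record("here", "1/1/17", "1234")     # this is an INSTANCE

print(isinstance(Record, type))    # the factory returned a class
print(issubclass(Record, tuple))   # and that class subclasses tuple
print(isinstance(row, Record))     # rows are instances of the class

# Fields are read from the instance, by name or by position
print(row.Location, row[0])
```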
RE: Using namedtuples field names for column indices in a list of lists
Paul Rudin wrote, on January 08, 2017 6:49 AM > > "Deborah Swanson"writes: > > > Peter Otten wrote, on January 08, 2017 3:01 AM > >> > >> columnA = [record.A for record in records] > > > > This is very neat. Something like a list comprehension for named > > tuples? > > Not something like - this *is* a list comprehension - it > creates a list of named tuples. > > The thing you iterate over within the comprehension can be > any iterator. (Of course you're going to run into problems if > you try to construct a list from an infinite iterator.) Thanks Paul. I've been meaning to spend some time getting to thoroughly know list comprehensions for awhile now, but I keep running into so many new things I just haven't gotten to it. I thought it looked like one, but I hedged my wording because I wasn't sure. Infinite iterators definitely sound like something to remember! -- https://mail.python.org/mailman/listinfo/python-list
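A short sketch of both points above. The comprehension walks the records and collects one field from each, so the result is a list of the `A` values. For an endless iterator, `itertools.islice` is one way to take a finite piece before building a list. The sample data is invented.

```python
from collections import namedtuple
from itertools import count, islice

Record = namedtuple("Record", "A B")
records = [Record(1, "x"), Record(2, "y")]

# One value per record: the contents of column A
column_a = [record.A for record in records]
print(column_a)

# count(1) never ends, so list(count(1)) would never finish;
# islice() cuts it off after five items
first_squares = [n * n for n in islice(count(1), 5)]
print(first_squares)
```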
Re: Using namedtuples field names for column indices in a list of lists
"Deborah Swanson"writes: > Peter Otten wrote, on January 08, 2017 3:01 AM >> >> columnA = [record.A for record in records] > > This is very neat. Something like a list comprehension for named tuples? Not something like - this *is* a list comprehension - it creates a list of named tuples. The thing you iterate over within the comprehension can be any iterator. (Of course you're going to run into problems if you try to construct a list from an infinite iterator.) -- https://mail.python.org/mailman/listinfo/python-list
RE: Using namedtuples field names for column indices in a list of lists
Peter Otten wrote, on January 08, 2017 5:21 AM > > Deborah Swanson wrote: > > > Peter Otten wrote, on January 08, 2017 3:01 AM > >> > >> Deborah Swanson wrote: > >> > >> > to do that is with .fget(). Believe me, I tried every > possible > >> > way > > to > >> > use instance.A or instance[1] and no way could I get > >> > ls[instance.A]. > >> > >> Sorry, no. > > > > I quite agree, I was describing the dead end I was in from > peeling the > > list of data and the namedtuple from the header row off the csv > > separately. That was quite obviously the wrong path to take, but I > > didn't know what a good way would be. > > > >> To get a list of namedtuple instances use: > >> > >> rows = csv.reader(infile) > >> Record = namedtuple("Record", next(rows)) > >> records = [Record._make(row) for row in rows] > > > > This is slightly different from Steven's suggestion, and it makes a > > block of records that I think would be iterable. At any > rate all the > > data from the csv would belong to a single data structure, and that > > seems inherently a good thing. > > > > a = records[i].A , for example > > > > And I think that this would produce recognizable field names in my > > code (which was the original goal) if the following works: > > > > records[0] is the header row == ('Description', 'Location', etc.) > > Personally I would recommend against mixing data (an actual > location) and > metadata (the column name,"Location"), but if you wish my code can be > adapted as follows: > > infile = open("dictreader_demo.csv") > rows = csv.reader(infile) > fieldnames = next(rows) > Record = namedtuple("Record", fieldnames) > records = [Record._make(fieldnames)] > records.extend(Record._make(row) for row in rows) Peter, this looks really good, and yes, I didn't feel so good about records[i].Location either, but it was the only way I could see to get the recognizable variable names I want. By extending records from a namedtuple of field names, I think it can be done cleanly. 
I'll try it and see. > If you want a lot of flexibility without doing the legwork > yourself you > might also have a look at pandas. Example session: > > $ cat places.csv > Location,Description,Size > here,something,17 > there,something else,10 > $ python3 > Python 3.4.3 (default, Nov 17 2016, 01:08:31) > [GCC 4.8.4] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> import pandas > >>> places = pandas.read_csv("places.csv") > >>> places > Location Description Size > 0 here something17 > 1there something else10 > > [2 rows x 3 columns] > >>> places.Location > 0 here > 1there > Name: Location, dtype: object > >>> places.sort(columns="Size") > Location Description Size > 1there something else10 > 0 here something17 > > [2 rows x 3 columns] > >>> places.Size.mean() > 13.5 > > Be aware that there is a learning curve... Yes, and I'm sure the learning curve is steep. I watched a webinar on pandas about a year ago, not to actually learn it, but just to take in the big picture and see something people were really accomplishing with python. I won't take this on any time right away, but I'll definitely keep it and work with it sometime. Maybe as just an intro to pandas, using my data from the real estate project. > > If I can use records[i].Location for the Location column > data in row > > 'i', then I've got my recognizable-field-name variables. > > > >> If you want a column from a list of records you need to extract it > >> manually: > >> > >> columnA = [record.A for record in records] > > > > This is very neat. Something like a list comprehension for named > > tuples? > > > > Thanks Peter, I'll try it all tomorrow and see how it goes. > > > > PS. I haven't forgotten your defaultdict suggestion, I'm > just taking > > the suggestions I got in the "Cleaning up Conditionals" > thread one at > > a time, and I will get to defaultdict. 
Then I'll look at > all of them > > and see what final version of the code will work best with all the > > factors to consider. -- https://mail.python.org/mailman/listinfo/python-list
RE: Using namedtuples field names for column indices in a list of lists
Deborah Swanson wrote: > Peter Otten wrote, on January 08, 2017 3:01 AM >> >> Deborah Swanson wrote: >> >> > to do that is with .fget(). Believe me, I tried every > possible way > to >> > use instance.A or instance[1] and no way could I get ls[instance.A]. >> >> Sorry, no. > > I quite agree, I was describing the dead end I was in from peeling the > list of data and the namedtuple from the header row off the csv > separately. That was quite obviously the wrong path to take, but I > didn't know what a good way would be. > >> To get a list of namedtuple instances use: >> >> rows = csv.reader(infile) >> Record = namedtuple("Record", next(rows)) >> records = [Record._make(row) for row in rows] > > This is slightly different from Steven's suggestion, and it makes a > block of records that I think would be iterable. At any rate all the > data from the csv would belong to a single data structure, and that > seems inherently a good thing. > > a = records[i].A , for example > > And I think that this would produce recognizable field names in my code > (which was the original goal) if the following works: > > records[0] is the header row == ('Description', 'Location', etc.) Personally I would recommend against mixing data (an actual location) and metadata (the column name,"Location"), but if you wish my code can be adapted as follows: infile = open("dictreader_demo.csv") rows = csv.reader(infile) fieldnames = next(rows) Record = namedtuple("Record", fieldnames) records = [Record._make(fieldnames)] records.extend(Record._make(row) for row in rows) If you want a lot of flexibility without doing the legwork yourself you might also have a look at pandas. Example session: $ cat places.csv Location,Description,Size here,something,17 there,something else,10 $ python3 Python 3.4.3 (default, Nov 17 2016, 01:08:31) [GCC 4.8.4] on linux Type "help", "copyright", "credits" or "license" for more information. 
>>> import pandas >>> places = pandas.read_csv("places.csv") >>> places Location Description Size 0 here something17 1there something else10 [2 rows x 3 columns] >>> places.Location 0 here 1there Name: Location, dtype: object >>> places.sort(columns="Size") Location Description Size 1there something else10 0 here something17 [2 rows x 3 columns] >>> places.Size.mean() 13.5 Be aware that there is a learning curve... > If I can use records[i].Location for the Location column data in row > 'i', then I've got my recognizable-field-name variables. > >> If you want a column from a list of records you need to >> extract it manually: >> >> columnA = [record.A for record in records] > > This is very neat. Something like a list comprehension for named tuples? > > Thanks Peter, I'll try it all tomorrow and see how it goes. > > PS. I haven't forgotten your defaultdict suggestion, I'm just taking the > suggestions I got in the "Cleaning up Conditionals" thread one at a > time, and I will get to defaultdict. Then I'll look at all of them and > see what final version of the code will work best with all the factors > to consider. -- https://mail.python.org/mailman/listinfo/python-list
RE: Using namedtuples field names for column indices in a list of lists
Peter Otten wrote, on January 08, 2017 3:01 AM > > Deborah Swanson wrote: > > > to do that is with .fget(). Believe me, I tried every > possible way to > > use instance.A or instance[1] and no way could I get ls[instance.A]. > > Sorry, no. I quite agree, I was describing the dead end I was in from peeling the list of data and the namedtuple from the header row off the csv separately. That was quite obviously the wrong path to take, but I didn't know what a good way would be. > To get a list of namedtuple instances use: > > rows = csv.reader(infile) > Record = namedtuple("Record", next(rows)) > records = [Record._make(row) for row in rows] This is slightly different from Steven's suggestion, and it makes a block of records that I think would be iterable. At any rate all the data from the csv would belong to a single data structure, and that seems inherently a good thing. a = records[i].A , for example And I think that this would produce recognizable field names in my code (which was the original goal) if the following works: records[0] is the header row == ('Description', 'Location', etc.) If I can use records[i].Location for the Location column data in row 'i', then I've got my recognizable-field-name variables. > If you want a column from a list of records you need to > extract it manually: > > columnA = [record.A for record in records] This is very neat. Something like a list comprehension for named tuples? Thanks Peter, I'll try it all tomorrow and see how it goes. PS. I haven't forgotten your defaultdict suggestion, I'm just taking the suggestions I got in the "Cleaning up Conditionals" thread one at a time, and I will get to defaultdict. Then I'll look at all of them and see what final version of the code will work best with all the factors to consider. -- https://mail.python.org/mailman/listinfo/python-list
RE: Using namedtuples field names for column indices in a list of lists
Deborah Swanson wrote:

> to do that is with .fget(). Believe me, I tried every possible way to
> use instance.A or instance[1] and no way could I get ls[instance.A].

Sorry, no. To get a list of namedtuple instances use:

    rows = csv.reader(infile)
    Record = namedtuple("Record", next(rows))
    records = [Record._make(row) for row in rows]

If you want a column from a list of records you need to extract it
manually:

    columnA = [record.A for record in records]

--
https://mail.python.org/mailman/listinfo/python-list
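A self-contained version of the recipe above that can be run without a file on disk. The CSV text is the `places.csv` sample from elsewhere in the thread, fed through `io.StringIO` as a stand-in for the open file.

```python
import csv
import io
from collections import namedtuple

# Stand-in for open("places.csv"); the contents are sample data
csv_text = (
    "Location,Description,Size\n"
    "here,something,17\n"
    "there,something else,10\n"
)
infile = io.StringIO(csv_text)

rows = csv.reader(infile)
Record = namedtuple("Record", next(rows))       # header row -> field names
records = [Record._make(row) for row in rows]   # remaining rows -> records

# csv gives strings, so convert where numbers are needed
sizes = [int(record.Size) for record in records]
print(records[1].Description, sizes)
```

This only works when every header is a valid Python identifier. For headers with spaces or punctuation, `namedtuple(..., rename=True)` substitutes positional names such as `_0`.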
RE: Using namedtuples field names for column indices in a list of lists
Steven D'Aprano wrote, on January 07, 2017 10:43 PM > > On Sunday 08 January 2017 16:39, Deborah Swanson wrote: > > > What I've done so far: > > > > with open('E:\\Coding projects\\Pycharm\\Moving\\Moving > 2017 in.csv', > > 'r') as infile: > > ls = list(csv.reader(infile)) > > lst = namedtuple('lst', ls[0]) > > > > where 'ls[0]' is the header row of the csv, and it works perfectly > > well. 'lst' is a namedtuple instance with each of the > column titles as > > field names. > > Are you sure? namedtuple() returns a class, not a list: Yes. 'ls' is defined as 'list(csv.reader(infile))', so ls[0] is the first row from the csv, the header row. 'lst' is the namedtuple. Perhaps what's puzzling you is that the way I've written it, the list of data and the namedtuple are disjoint, and that's the problem. > py> from collections import namedtuple > py> names = ['A', 'B', 'C'] > py> namedtuple('lst', names) > > > The way namedtuple() is intended to be used is like this: > > > py> from collections import namedtuple > py> names = ['A', 'B', 'C'] > py> Record = namedtuple('Record', names) > py> instance = Record(10, 20, 30) > py> print(instance) > Record(A=10, B=20, C=30) > > > There is no need to call fget directly to access the > individual fields: > > py> instance.A > 10 > py> instance.B > 20 > py> instance[1] # indexing works too > 20 > > > which is *much* simpler than: > > py> Record.A.fget(instance) > 10 I don't disagree with anything you've said and shown here. But I want to use the 'instance.A' as a subscript for the list 'ls', and the only way to do that is with .fget(). Believe me, I tried every possible way to use instance.A or instance[1] and no way could I get ls[instance.A]. The problem I'm having here is one of linkage between the named tuple for the column titles and the list that holds the data in the columns. 
> I think you should be doing something like this: > > pathname = 'E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 > in.csv' with open(pathname, 'r') as infile: > rows = list(csv.reader(infile)) > Record = namedtuple("Record", rows[0]) > for row in rows[1:]: # skip the first row, the header > row = Record(row) > # process this row... > if row.location == 0: > ... Now here you have something I didn't think of: 'row = Record(row)' in a loop through the rows. > [...] > > But I haven't found a way to assign new values to a list element. > > using namedtuple.fieldname. I think a basic problem is that > > namedtuples have the properties of tuples, and you can't assign to an > > existing tuple because they're immutable. > > Indeed. Being tuples, you have to create a new one. You can > do it with slicing, > like ordinary tuples, but that's rather clunky: > > py> print(instance) > Record(A=10, B=20, C=30) > py> Record(999, *instance[1:]) > Record(A=999, B=20, C=30) Very clunky. I don't like modifying standard tuples with slicing, and this is even worse. > The recommended way is with the _replace method: > > py> instance._replace(A=999) > Record(A=999, B=20, C=30) > py> instance._replace(A=999, C=888) > Record(A=999, B=20, C=888) > > > Note that despite the leading underscore, _replace is *not* a > private method of > the class. It is intentionally documented as public. The > leading underscore is > so that it won't clash with any field names. > > > > > -- > Steven > "Ever since I learned about confirmation bias, I've been seeing > it everywhere." - Jon Ronson I will have to work with this. It's entirely possible it will do what I want it to do. The key problem I was having was getting a linkage between the namedtuple and the list of data from the csv. 
I want to implement a suggestion I got to use a namedtuple made from the header row as subscripts for elements in the list of data, and the example given in the docs: EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade') import csv for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv", "rb"))): print(emp.name, emp.title) assumes the field names will be hardcoded. Reading the csv into a list and then trying to use the namedtuple made from the header row as subscripts is how I ended up resorting to 'Record.A.fget(instance)' to read values, and wasn't able to assign them. But assigning the rows of data into namedtuple instances with: Record = namedtuple("Record", rows[0]) for row in rows[1:]: row = Record(row) does look like the linkage I need and wasn't finding the way I was doing it. If 'Record(row)' is the list data and the columns are the same as defined in 'namedtuple("Record", rows[0])', it really should work. And I didn't get it that _replace could be used to assign new values to namedtuples (duh. Pretty clear now that I reread it, and all the row data is in namedtuple instances.) The big question is whether the namedtuple instances can be used as something recognizable as
Re: Using namedtuples field names for column indices in a list of lists
On Sunday 08 January 2017 16:39, Deborah Swanson wrote:

> What I've done so far:
>
> with open('E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in.csv',
> 'r') as infile:
>     ls = list(csv.reader(infile))
>     lst = namedtuple('lst', ls[0])
>
> where 'ls[0]' is the header row of the csv, and it works perfectly well.
> 'lst' is a namedtuple instance with each of the column titles as field
> names.

Are you sure? namedtuple() returns a class, not a list:

py> from collections import namedtuple
py> names = ['A', 'B', 'C']
py> namedtuple('lst', names)
<class '__main__.lst'>

The way namedtuple() is intended to be used is like this:

py> from collections import namedtuple
py> names = ['A', 'B', 'C']
py> Record = namedtuple('Record', names)
py> instance = Record(10, 20, 30)
py> print(instance)
Record(A=10, B=20, C=30)

There is no need to call fget directly to access the individual fields:

py> instance.A
10
py> instance.B
20
py> instance[1]  # indexing works too
20

which is *much* simpler than:

py> Record.A.fget(instance)
10

I think you should be doing something like this:

pathname = 'E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in.csv'
with open(pathname, 'r') as infile:
    rows = list(csv.reader(infile))
    Record = namedtuple("Record", rows[0])
    for row in rows[1:]:  # skip the first row, the header
        row = Record(*row)  # one argument per column
        # process this row...
        if row.location == 0:
            ...

[...]
> But I haven't found a way to assign new values to a list element
> using namedtuple.fieldname. I think a basic problem is that namedtuples
> have the properties of tuples, and you can't assign to an existing tuple
> because they're immutable.

Indeed. Being tuples, you have to create a new one.
You can do it with slicing, like ordinary tuples, but that's rather clunky:

py> print(instance)
Record(A=10, B=20, C=30)
py> Record(999, *instance[1:])
Record(A=999, B=20, C=30)

The recommended way is with the _replace method:

py> instance._replace(A=999)
Record(A=999, B=20, C=30)
py> instance._replace(A=999, C=888)
Record(A=999, B=20, C=888)

Note that despite the leading underscore, _replace is *not* a private
method of the class. It is intentionally documented as public. The
leading underscore is so that it won't clash with any field names.

--
Steven
"Ever since I learned about confirmation bias, I've been seeing it
everywhere." - Jon Ronson
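Since the original question was about assigning new values to rows held in a list, here is a minimal sketch of how _replace might be combined with a list of namedtuple instances. The field names A, B, C follow the examples above; the sample values and the "double column C" rule are purely hypothetical. The point is that _replace returns a new instance, so the result has to be stored back into the list.

```python
from collections import namedtuple

Record = namedtuple("Record", ["A", "B", "C"])
rows = [Record(10, 20, 30), Record(1, 2, 3)]

# Direct attribute assignment fails: namedtuple instances are immutable.
try:
    rows[0].A = 999
except AttributeError as exc:
    print("AttributeError:", exc)

# _replace leaves the original untouched and returns a modified copy,
# so put the copy back into the same list slot.
rows[0] = rows[0]._replace(A=999)

# Update every row in one pass (hypothetical rule: double column C).
rows = [r._replace(C=r.C * 2) for r in rows]
print(rows)
```

If rows need frequent in-place changes, a mutable structure such as a list of dicts (for instance from csv.DictReader) may be a better fit than rebuilding tuples each time.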