Re: PEP Request: Advanced Data Structures
On Sunday, July 17, 2016 at 3:45:04 AM UTC+5:30, Shrey Desai wrote: > I have found it slightly frustrating that Python does not have built-in > support for advanced data structures (Linked Lists, Stacks/Queues, BST) in > its distribution. Many computer science students, developers, and software > engineers rely on these data structures; having the data structures be a part > of the distribution and be maintained by the Python community would be > immensely useful. > > Currently, we are required to write our own modules that represent these data > structures and rigorously test/refactor them before we can actually use them. > This gets annoying because instead of spending time USING the "correct" > version of the data structure, we have to spend time creating them in the > first place. > > Programming languages like Java have support for Linked Lists, for example, > which makes it easy to just use it instead of trying to create it over again. > As a computer science undergraduate student, I don't want to spend time > writing the module but instead I want to work with it, play around with it, > and do problems with it. > > I know Python currently has a Queue module, but this can definitely be > expanded. There are other more advanced data structures out there, like AVL > trees, splay trees, and tries, but I think that would be overkilll. Having > these data structures above would be immensely useful. > > I'm looking to create a PEP for this issue and some people that would be > willing to 1) vouch for this idea and 2) co-author the draft. Eventually, we > would be co-developers for the project as well. > > Who's in? Hi Shrey Your wish and direction is commendable And looks at an important question Do consider however that the boot may well be on the other foot: viz. So “Python does not have advanced data structures… fir students/edu-purposes” may suggest that python should change Or that education should! Some examples of the nature the the fast-shifting ground under our feet: It was important for programmers to know and use registers/instruction-formats etc very effectively at one time. At some time it stopped being relevant. Or card-punches, Or large central ACs, Or tape drives, Or interrupts. Some of these have just died; some remain in specialized places; some are ubiquitous but with enough strong abstractions that vanilla programmers dont ever think of these nowadays The same holds for broad areas Cryptography is where CS started in WWII; vanished thereafter; reappeared in specialized quarters of late Numerical computing was the hi-cathedral for the next few decades By the time I was a student in the 80s it was there but somehow felt old and past-tense. I expect recently instituted courses dont have it; or have it specialized. These shifts are elaborated somewhat more https://mail.python.org/pipermail/python-list/2016-June/710678.html and sequel Coming to data structures (and algorithms) yes it (they) are important courses… and widely misunderstood Here is Bob Harper *quoting a developer* everyone knows that algorithms as we learned them at school are irrelevant to practice." https://www.cs.cmu.edu/~rwh/papers/secp/secp.pdf (pg 2) [Harper is a senior respected faculty at Carnegie Mellon] I think the same applies to data structures with redoubled force: Most of what is called data structures — linked lists etc — should really be called *storage structures* ie How to survive and do useful work when your system/software is highly crippling. Greenspun's 10th law: Any sufficiently complicated C program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Lisp. If you understand that python is in a sense closer to Lisp than to C, maybe you see that there are broadly two choices - Traditional CS-edu view... C, C++, Java, OO-gook, data-structures, algorithms etc where the basic philosophy is Crippling is a way of life; Be Brave! And hobble more vigorously!‘ - The alternative CS view as understood by Harper, MIT and other progressive places embracing the ‘functional’ style (stupid word if you ask me) The idea being to throw out 70-80 % of traditional CS from the curriculum and get students doing good stuff much faster by putting them on a *different* learning curve. That this stuff is a bit bleeding edge can be seen in the fact that ACM curriculum has started embracing “functional” only in 2013 — ie 50 years after Lisp: http://blog.languager.org/2015/06/functional-programming-moving-target.html Sorry for a ramble. Summary: Yes the languages we use to teach need to be rethought So does our teaching ;-) -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
MRAB: > Given that Python has dict, there's not much need for a binary search tree. Dicts don't have the concept of key order. I use my own AVL tree to implement timers. A balanced tree data structure is the only major data structure I've missed in Python. It is there in Java and C++, for example. The Linux kernel makes use of it as well. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On 2016-07-17 08:19, Chris Angelico wrote: > Why do you need a linked list? That's an implementation detail; why > not simply use a regular list? > > Not trolling, genuinely asking. Is there something that you > specifically need those exact structures for? I know there have been times I want known performance characteristics. My main reason for wanting linked lists is usually for stacks/queues with O(1) push/pop, and I understand that a deque handles most of that fairly efficiently ("approximately the same O(1) performance in either direction"). The bisect and heapq modules also help with some of my usual use-cases for BSTs (presuming "BST" unpacks as "binary search tree"), while nested arrays/dicts usually serve for most of the rest of my other tree/trie needs. So usually I'm less concerned with the actual algorithm name than I am with the "I want to {push,pop,search,insert,delete} in O(f) time" where O(f) is usually either O(1) or O(N log N), instead of the alternatives which might use O(N), O(N**k), or worse, O(k**N). For anything beyond those basic CS data-structures, there's usually something conveniently in PyPI. -tkc -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On Sun, 17 Jul 2016 08:14 am, shrey.de...@gmail.com wrote: > I have found it slightly frustrating that Python does not have built-in > support for advanced data structures (Linked Lists, Stacks/Queues, BST) in > its distribution. They are hardly "advanced" data structures. They are trivial data structures. When I did an undergraduate course in computer science, we were expected to write them ourselves, from scratch. Linked lists are primitive data structures used by languages that don't have Python's advanced list data structure. For nearly everything that you think you want a linked list, you will be better off to use a built-in list. For stacks and queues, use collections.deque. For binary search trees, you will mostly use a dict. If there are exceptions, they are unusual. > Many computer science students, developers, and software > engineers rely on these data structures; having the data structures be a > part of the distribution and be maintained by the Python community would > be immensely useful. They really wouldn't be. I cannot imagine when anyone would want to use a linked list when lists are available. That would be a very big step backwards in both performance and power: harder to use, and slower. [...] > I'm looking to create a PEP for this issue and some people that would be > willing to 1) vouch for this idea and 2) co-author the draft. Eventually, > we would be co-developers for the project as well. Perhaps I'm wrong, but it sounds to me that you should spend more time learning Python and less time trying to mechanically translate C algorithms into Python code. Python doesn't have a linked list class because it is unnecessary. Python doesn't need a dedicated stack or single-threaded queue class, because it has deque. (As a comp sci undergrad, you have probably heard of double-ended queues.) Python already has a thread-safe queue. *Maybe* there's a use-case for a self-balancing tree structure in Python, but which one? How often do you need to use keys that can't be hashed? -- Steven “Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse. -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On Sun, Jul 17, 2016 at 1:27 PM,wrote: >> Right, but how did you *get* that deep into the list? By following a >> chain of pointers. That's a relatively costly operation, so the >> benefit of not having to move all the following elements is damaged >> some by the cost of chasing pointers to get there in the first place. > > > No, you're assuming too much here. Consider: > > LL = LinkedList() > item = LL.insert( (some, tuple, value) ) > ... do lots of stuff with the list ... > ... now item refers to something that might be anywhere ... > item.add_after( (context, for, a, new, item, in , the, list) ) > ... > > and any number of other scenarios of similar nature: note a node in the list > and get to done things at or near that node at an arbirary other time. > Sure, that makes sense. The ability to hang onto a "list position" is a useful one, I don't deny that. (Indices work if all you ever do is append, but your add_after will invalidate any indices after that.) I just very much doubt that the linked list will afford any overall performance improvement over the standard CPython built-in list object - and if it does, you're definitely in the realm of special-purpose code for a specific situation. And there's nothing wrong with that. Having not myself had a good early grounding in data structure design, I think it's something worth teaching. And Python's fine at doing that - the concepts of a linked list translate nicely into Python objects and attributes - even though it'll never actually be something Python needs. Once you've learned what the different fundamental structures are, you'll have a better understanding of what's going on (for instance, knowing the advantages and critical disadvantages of hashtables like CPython's dict), and possibly know when to roll your own instead of using someone else's (knowing that "lst.insert(0, lst.pop(idx))" is slow, you might instead "lst.append(lst.pop(idx))" and have your list in reverse order - a good start, but maybe you'd decide to go linked-list anyway). But for most jobs, just think in terms of abstract data structures, and trust the language to do the details. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On 17Jul2016 12:43, Chris Angelicowrote: On Sun, Jul 17, 2016 at 12:33 PM, Paul Rubin wrote: Chris Angelico writes: keep a reference to an element deep in the list, and insert a new element in O(1) time at that point. at the C level, wouldn't tracing the links cost massively more than the occasional insertion too? I'm not sure O(1) is of value at any size, if the costs of all your other operations go up. I think the idea is that you're already deep in the list when you decide to insert an element or do other surgery on the list. An example might be a lookup table with linear search, where you want to bring the LRU item to the front of the list after finding it. Really though, that's an ugly thing to be doing in any language, and it definitely isn't something that comes up much in Python. Right, but how did you *get* that deep into the list? By following a chain of pointers. That's a relatively costly operation, so the benefit of not having to move all the following elements is damaged some by the cost of chasing pointers to get there in the first place. No, you're assuming too much here. Consider: LL = LinkedList() item = LL.insert( (some, tuple, value) ) ... do lots of stuff with the list ... ... now item refers to something that might be anywhere ... item.add_after( (context, for, a, new, item, in , the, list) ) ... and any number of other scenarios of similar nature: note a node in the list and get to done things at or near that node at an arbirary other time. This applies to _any_ graph like data structure where nodes would have to be found by traversal. Anyway, I'm not arguing that Pythons basic list type doesn't address a great many needs. I'm arguing that no one size fits all. The core strength of a linked list is O(1) insertion at any point, provided you have a reference to that point. Whether that is enough benefit depends on the use case. Cheers, Cameron Simpson -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On 2016-07-17 03:33, Paul Rubin wrote: Chris Angelicowrites: keep a reference to an element deep in the list, and insert a new element in O(1) time at that point. at the C level, wouldn't tracing the links cost massively more than the occasional insertion too? I'm not sure O(1) is of value at any size, if the costs of all your other operations go up. I think the idea is that you're already deep in the list when you decide to insert an element or do other surgery on the list. An example might be a lookup table with linear search, where you want to bring the LRU item to the front of the list after finding it. Really though, that's an ugly thing to be doing in any language, and it definitely isn't something that comes up much in Python. I once sped up lookups on a doubly-linked list by adding a dict that would take me straight to the appropriate node. This was in C, though. -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On Sun, Jul 17, 2016 at 12:33 PM, Paul Rubinwrote: > Chris Angelico writes: >>> keep a reference to an element deep in the list, and insert a new >>> element in O(1) time at that point. >> at the C level, wouldn't tracing the links cost massively more than >> the occasional insertion too? I'm not sure O(1) is of value at any >> size, if the costs of all your other operations go up. > > I think the idea is that you're already deep in the list when you decide > to insert an element or do other surgery on the list. An example might > be a lookup table with linear search, where you want to bring the LRU > item to the front of the list after finding it. Really though, that's > an ugly thing to be doing in any language, and it definitely isn't > something that comes up much in Python. Right, but how did you *get* that deep into the list? By following a chain of pointers. That's a relatively costly operation, so the benefit of not having to move all the following elements is damaged some by the cost of chasing pointers to get there in the first place. So overall, performance would be better with the high-performance list, even if it does mean moving a bunch of elements (when you delete some). Since it's a difference in asymptotic cost, there would theoretically be some number of elements after which it would be cheaper to use the linked list... but maybe the cost of chasing pointers goes up even more, to make that never happen. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
Chris Angelicowrites: >> keep a reference to an element deep in the list, and insert a new >> element in O(1) time at that point. > at the C level, wouldn't tracing the links cost massively more than > the occasional insertion too? I'm not sure O(1) is of value at any > size, if the costs of all your other operations go up. I think the idea is that you're already deep in the list when you decide to insert an element or do other surgery on the list. An example might be a lookup table with linear search, where you want to bring the LRU item to the front of the list after finding it. Really though, that's an ugly thing to be doing in any language, and it definitely isn't something that comes up much in Python. -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On Sun, Jul 17, 2016 at 10:54 AM,wrote: > Well, in a larger context you can keep a reference to an element deep in the > list, and insert a new element in O(1) time at that point. > I'd like to know how many elements your list needs before that actually becomes faster than CPython's heavily-optimized C-implemented list structure. And if someone's proposing a new core data type, I very much doubt that'll fly - and at the C level, wouldn't tracing the links cost massively more than the occasional insertion too? I'm not sure O(1) is of value at any size, if the costs of all your other operations go up. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On 16Jul2016 19:49, Dennis Lee Bieberwrote: On Sat, 16 Jul 2016 15:33:12 -0700 (PDT), Shrey Desai declaimed the following: - Education: the Linked List is a core data structure that CS undergraduates (among others) use and study, so it is vital that they have hands on access to them. A list is basically a dynamic array; the only property it shares with a Linked List is that it's dynamic. My CS courses required us to implement our own linked lists (in FORTRAN IV yet). Providing a built-in linked list just negates the educational aspect. I was thinking this exact same thing. - Development: the use of correct data structures is important when developing applications, especially for issues like scaling and efficiency. For instance, when working with polynomials, Linked Lists provide a great interface to access and edit them. What does having links gain you that you don't get from a regular list with slicing? Well, in a larger context you can keep a reference to an element deep in the list, and insert a new element in O(1) time at that point. The usualy argument against providing linked lists and other basic graph related functions in the stdlib is that there are in practice many many decisions one can make about exactly what kind of implementation specifics one might like in such a thing, and what methods to present for use with the thing. On the flip side, implementing a simple linked list in a specific context is pretty trivial as Chris has demonstrated. I do take the point that a preexisting (and hopefully debugged) class would obviate a testing burden for users. Provided the class provided enough. If it didn't the end user is back to replacing or extending it, and having to test anyway. To the OP: have you looked in PyPI and found no graph classes? Cheers, Cameron Simpson -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On 2016-07-17 01:10, Terry Reedy wrote: On 7/16/2016 6:14 PM, shrey.de...@gmail.com wrote: I have found it slightly frustrating that Python does not have built-in support for advanced data structures (Linked Lists, You and I have different ideas of 'advanced data structures' ;-). To me, linked list are limited structures used in functional programming to make mutable structure from immutable-except-on-creation cells. In any case, one can easily use tuples to create branching structures. Tuples and lists containing tuples and lists are routine in python programming. Wrapping such usage in a LinkedList class is optional -- and unusual. They're the kind of things I'd write when using C, but, then, C is missing a lot of stuff! :-) Stacks/Queues, Nearly two decades ago, I promoted the addition of the list.pop method as the inverse of list.append, in order to make lists easily usable as stacks. This is routine in python code today. collections.deque instances are advanced data structures that can be used as stacks, queues, or both, at either end. The class has tests that I presume are rigorous. BST) British Summer Time? (Suggestion from Google) Binary search tree. Given that Python has dict, there's not much need for a binary search tree. [snip] -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On 7/16/2016 6:14 PM, shrey.de...@gmail.com wrote: I have found it slightly frustrating that Python does not have built-in support for advanced data structures (Linked Lists, You and I have different ideas of 'advanced data structures' ;-). To me, linked list are limited structures used in functional programming to make mutable structure from immutable-except-on-creation cells. In any case, one can easily use tuples to create branching structures. Tuples and lists containing tuples and lists are routine in python programming. Wrapping such usage in a LinkedList class is optional -- and unusual. Stacks/Queues, Nearly two decades ago, I promoted the addition of the list.pop method as the inverse of list.append, in order to make lists easily usable as stacks. This is routine in python code today. collections.deque instances are advanced data structures that can be used as stacks, queues, or both, at either end. The class has tests that I presume are rigorous. BST) British Summer Time? (Suggestion from Google) Currently, we are required to write our own modules that represent these data structures and rigorously test/refactor them before we can actually use them. If an instructor makes you wrap the structures that Python provides before you can use then, that is between you and the instructor, and not our doing. The instructor could let you use Python as it is or hand you the wrapping he likes. -- Terry Jan Reedy -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On Saturday, July 16, 2016 at 11:15:04 PM UTC+1, Shrey Desai wrote: > I have found it slightly frustrating that Python does not have built-in > support for advanced data structures (Linked Lists, Stacks/Queues, BST) in > its distribution. Many computer science students, developers, and software > engineers rely on these data structures; having the data structures be a part > of the distribution and be maintained by the Python community would be > immensely useful. > > Currently, we are required to write our own modules that represent these data > structures and rigorously test/refactor them before we can actually use them. > This gets annoying because instead of spending time USING the "correct" > version of the data structure, we have to spend time creating them in the > first place. > > Programming languages like Java have support for Linked Lists, for example, > which makes it easy to just use it instead of trying to create it over again. > As a computer science undergraduate student, I don't want to spend time > writing the module but instead I want to work with it, play around with it, > and do problems with it. > > I know Python currently has a Queue module, but this can definitely be > expanded. There are other more advanced data structures out there, like AVL > trees, splay trees, and tries, but I think that would be overkilll. Having > these data structures above would be immensely useful. > > I'm looking to create a PEP for this issue and some people that would be > willing to 1) vouch for this idea and 2) co-author the draft. Eventually, we > would be co-developers for the project as well. > > Who's in? Thanks but no thanks :) I find the structures listed here http://kmike.ru/python-data-structures and here https://pypi.python.org/pypi/sortedcontainers more than adequate for my needs. Cheers. Mark Lawrence. -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
shrey.de...@gmail.com writes: > As a computer science undergraduate student, I don't want to spend > time writing the module but instead I want to work with it, play > around with it, and do problems with it. For educational purposes, I think writing the module yourself is part of the idea. Also, Python isn't really the right language for that. Most things in Python are done with Python lists (elastic vectors), deques/queues, or dictionaries. That's almost always sufficient in practice. I've occasionally wanted an AVL tree for shared dictionaries but that's about it. The C++ standard template library (STL) and Boost have most of what you want, at a lower level than Python. Or if you really want to be hardcore, implement everything yourself in assembler. -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On Sun, Jul 17, 2016 at 8:33 AM, Shrey Desaiwrote: > Hi Chris, thanks for the reply. There's a couple of reasons why I would need > a Linked List (and other data structures): > - Education: the Linked List is a core data structure that CS undergraduates > (among others) use and study, so it is vital that they have hands on access > to them. A list is basically a dynamic array; the only property it shares > with a Linked List is that it's dynamic. > - Development: the use of correct data structures is important when > developing applications, especially for issues like scaling and efficiency. > For instance, when working with polynomials, Linked Lists provide a great > interface to access and edit them. > For education, I wouldn't worry too much about performance or edge cases, and just use something really simple: class LinkedList: def __init__(self): self.head = self.tail = None def append(self, item): if self.tail: self.tail = self.tail.append(item) self.head = self.tail = _listitem(item) class _listitem: def __init__(self, item): self.item = item self.next = None def append(self, item): self.next = type(self)(item) return self.next Then add other methods as you teach them (iteration, item removal, etc). Performance is probably terrible, but who cares? You won't notice it in sample code. For development, I'm pretty sure the native CPython list type is going to out-perform anything you could implement as a linked list. Even more so if you're using PyPy, where it knows how to optimize the list (but won't necessarily be able to optimize your type). Instead of messing around with data types, just implement your code using the most simple and obvious style, and worry about performance/scaling/efficiency only if and when you find out that that's a bottleneck. I very much doubt that it will be. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On Saturday, July 16, 2016 at 3:19:56 PM UTC-7, Chris Angelico wrote: > On Sun, Jul 17, 2016 at 8:14 AM,wrote: > > I have found it slightly frustrating that Python does not have built-in > > support for advanced data structures (Linked Lists, Stacks/Queues, BST) in > > its distribution. Many computer science students, developers, and software > > engineers rely on these data structures; having the data structures be a > > part of the distribution and be maintained by the Python community would be > > immensely useful. > > > > Why do you need a linked list? That's an implementation detail; why > not simply use a regular list? > > Not trolling, genuinely asking. Is there something that you > specifically need those exact structures for? > > Also: You may find what you want on PyPI. There's no need for them to > be in the core language if they can be published as third-party > modules. > > ChrisA Hi Chris, thanks for the reply. There's a couple of reasons why I would need a Linked List (and other data structures): - Education: the Linked List is a core data structure that CS undergraduates (among others) use and study, so it is vital that they have hands on access to them. A list is basically a dynamic array; the only property it shares with a Linked List is that it's dynamic. - Development: the use of correct data structures is important when developing applications, especially for issues like scaling and efficiency. For instance, when working with polynomials, Linked Lists provide a great interface to access and edit them. Also, I have a couple of issues with them being published as third-party modules: 1. These data structures have to be rock solid. They should be clearly defined, rigorously tested, and they should simply work. Python would put more emphasis on its development rather than some external developer that might quit half way. 2. Some external packages might contain better data structures than others. For instance, there might be some "DataStructures" package that contains everything, but maybe some developer published a "LinkedList" package that has the best Linked List implementation and some other developer published a "BST" package that has the best BST implementation. Why mix and match when everything can be in one place? 3. Python, fundamentally, is an easy to use language and it should stay that way. Beginners shouldn't have to worry about installing external modules when the data structures package is in the distribution and all they need to do to use it is import it. -- https://mail.python.org/mailman/listinfo/python-list
Re: PEP Request: Advanced Data Structures
On Sun, Jul 17, 2016 at 8:14 AM,wrote: > I have found it slightly frustrating that Python does not have built-in > support for advanced data structures (Linked Lists, Stacks/Queues, BST) in > its distribution. Many computer science students, developers, and software > engineers rely on these data structures; having the data structures be a part > of the distribution and be maintained by the Python community would be > immensely useful. > Why do you need a linked list? That's an implementation detail; why not simply use a regular list? Not trolling, genuinely asking. Is there something that you specifically need those exact structures for? Also: You may find what you want on PyPI. There's no need for them to be in the core language if they can be published as third-party modules. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
PEP Request: Advanced Data Structures
I have found it slightly frustrating that Python does not have built-in support for advanced data structures (Linked Lists, Stacks/Queues, BST) in its distribution. Many computer science students, developers, and software engineers rely on these data structures; having the data structures be a part of the distribution and be maintained by the Python community would be immensely useful. Currently, we are required to write our own modules that represent these data structures and rigorously test/refactor them before we can actually use them. This gets annoying because instead of spending time USING the "correct" version of the data structure, we have to spend time creating them in the first place. Programming languages like Java have support for Linked Lists, for example, which makes it easy to just use it instead of trying to create it over again. As a computer science undergraduate student, I don't want to spend time writing the module but instead I want to work with it, play around with it, and do problems with it. I know Python currently has a Queue module, but this can definitely be expanded. There are other more advanced data structures out there, like AVL trees, splay trees, and tries, but I think that would be overkilll. Having these data structures above would be immensely useful. I'm looking to create a PEP for this issue and some people that would be willing to 1) vouch for this idea and 2) co-author the draft. Eventually, we would be co-developers for the project as well. Who's in? -- https://mail.python.org/mailman/listinfo/python-list