Re: dbf.py API question concerning Index.index_search()
On 16/08/12 01:26:09, Ethan Furman wrote:
> Indexes have a new method (rebirth of an old one, really):
>
>     .index_search(
>         match, start=None, stop=None, nearest=False, partial=False )
>
> The defaults are to search the entire index for exact matches and
> raise NotFoundError if it can't find anything.
>
> match is the search criteria
> start and stop is the range to search in
> nearest returns where the match should be instead of raising an error
> partial will find partial matches
>
> The question is what should the return value be?
>
> I don't like the usual pattern of -1 meaning not found (as in
> 'nothere'.find('a')), so I thought a fun and interesting way would be
> to subclass long and override the __nonzero__ method to return
> True/False based on whether the (partial) match was found. The main
> problem I see here is that the special return value reverts to a
> normal int/long if anything is done to it (adding, subtracting, etc),
> and the found status is lost.
>
> The other option is returning a (number, bool) tuple -- safer, yet
> more boring... ;)

I think you should go for the safe boring option, because in many use
cases the caller will need to know whether the number you're returning
is the index of a match or just the nearest non-match.

The caller could redo the match to find out. But you have already done
the match, so you might as well tell them the result.

Hope this helps,

-- HansM
--
http://mail.python.org/mailman/listinfo/python-list
Re: dbf.py API question concerning Index.index_search()
MRAB wrote:
> On 16/08/2012 02:22, Ethan Furman wrote:
>> Steven D'Aprano wrote:
>>> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
>>>> Indexes have a new method (rebirth of an old one, really):
>>>>
>>>>     .index_search(
>>>>         match, start=None, stop=None, nearest=False, partial=False )
>>> [...]
>>>
>>> Why index_search rather than just search?
>>
>> Because search already exists and returns a dbf.List of all matching
>> records.
>
> Perhaps that should've been called find_all!

An interesting thought.

Currently there are:

    .index(data)   -- returns index of data in Index, or raises error
    .query(string) -- brute force search, returns all matching records
    .search(match) -- binary search through table, returns all
                      matching records

'index' and 'query' are supported by Tables, Lists, and Indexes; search
(and now index_search) are only supported on Indexes.

~Ethan~
Re: dbf.py API question concerning Index.index_search()
On 16/08/2012 17:13, Ethan Furman wrote:
> MRAB wrote:
>> On 16/08/2012 02:22, Ethan Furman wrote:
>>> Steven D'Aprano wrote:
>>>> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
>>>>> Indexes have a new method (rebirth of an old one, really):
>>>>>
>>>>>     .index_search(
>>>>>         match, start=None, stop=None, nearest=False, partial=False )
>>>> [...]
>>>>
>>>> Why index_search rather than just search?
>>>
>>> Because search already exists and returns a dbf.List of all matching
>>> records.
>>
>> Perhaps that should've been called find_all!
>
> An interesting thought.
>
> Currently there are:
>
>     .index(data)   -- returns index of data in Index, or raises error
>     .query(string) -- brute force search, returns all matching records
>     .search(match) -- binary search through table, returns all
>                       matching records
>
> 'index' and 'query' are supported by Tables, Lists, and Indexes;
> search (and now index_search) are only supported on Indexes.

What exactly is the difference between .index and .index_search with
the default arguments?
Re: dbf.py API question concerning Index.index_search()
MRAB wrote:
> On 16/08/2012 17:13, Ethan Furman wrote:
>> Currently there are:
>>
>>     .index(data)   -- returns index of data in Index, or raises error
>>     .query(string) -- brute force search, returns all matching records
>>     .search(match) -- binary search through table, returns all
>>                       matching records
>>
>> 'index' and 'query' are supported by Tables, Lists, and Indexes;
>> search (and now index_search) are only supported on Indexes.
>
> What exactly is the difference between .index and .index_search with
> the default arguments?

.index requires a data structure that can be compared to a record
(another record, a dictionary with the same field/key names, or a
list/tuple with values in the same order as the fields). It returns the
index or raises NotFoundError. It is brute force.

.index_search requires match criteria (a tuple with the desired values
in the same order as the key). It returns the index or raises
NotFoundError (unless nearest is True -- then the value returned is
where the match should be). It is binary search.

So the only similarity is that they both return a number or raise
NotFoundError. What they use for the search and how they perform the
search are both completely different.

~Ethan~
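The contrast Ethan draws between a brute-force scan and a binary search can be sketched roughly as follows. This is only an illustration on a sorted list of key tuples, not dbf.py's implementation: brute_force_index, index_search, and this NotFoundError are hypothetical stand-ins, and the start, stop, and partial arguments are left out.

```python
from bisect import bisect_left


class NotFoundError(Exception):
    """Hypothetical stand-in for the error named in the thread."""


def brute_force_index(records, data):
    """Linear scan: compare every record against `data`."""
    for i, record in enumerate(records):
        if record == data:
            return i
    raise NotFoundError(data)


def index_search(keys, match, nearest=False):
    """Binary search over a sorted list of key tuples.

    With nearest=True, return the position where `match` would be
    inserted instead of raising NotFoundError.
    """
    pos = bisect_left(keys, match)
    found = pos < len(keys) and keys[pos] == match
    if found or nearest:
        return pos
    raise NotFoundError(match)


keys = [("adams",), ("baker",), ("clark",), ("davis",)]
print(brute_force_index(keys, ("clark",)))           # linear scan
print(index_search(keys, ("clark",)))                # binary search
print(index_search(keys, ("brown",), nearest=True))  # insertion point
```

Note that with nearest=True the caller cannot tell from the number alone whether it is an exact match or an insertion point, which is the question the rest of the thread argues about.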
Re: dbf.py API question concerning Index.index_search()
On 08/15/12 18:26, Ethan Furman wrote:
>     .index_search(
>         match, start=None, stop=None, nearest=False, partial=False )
>
> The defaults are to search the entire index for exact matches and
> raise NotFoundError if it can't find anything.
>
> The question is what should the return value be?
>
> I don't like the usual pattern of -1 meaning not found (as in
> 'nothere'.find('a')), so I thought a fun and interesting way would be
> to subclass long and override the __nonzero__ method to return
> True/False based on whether the (partial) match was found. The main
> problem I see here is that the special return value reverts to a
> normal int/long if anything is done to it (adding, subtracting, etc),
> and the found status is lost.
>
> The other option is returning a (number, bool) tuple -- safer, yet
> more boring... ;)

I'm not quite sure I follow...you start off by saying that it will
raise NotFoundError if it can't find anything. So if it finds
something, just return it. Because if it found the item, it gives it to
you; if it didn't find the item, it raised an error. That sounds like a
good (easy to understand) interface, similar to how string.index()
works.

-tkc
Re: dbf.py API question concerning Index.index_search()
Tim Chase wrote:
> On 08/15/12 18:26, Ethan Furman wrote:
>>     .index_search(
>>         match, start=None, stop=None, nearest=False, partial=False )
>>
>> The defaults are to search the entire index for exact matches and
>> raise NotFoundError if it can't find anything.
>>
>> The question is what should the return value be?
>>
>> I don't like the usual pattern of -1 meaning not found (as in
>> 'nothere'.find('a')), so I thought a fun and interesting way would be
>> to subclass long and override the __nonzero__ method to return
>> True/False based on whether the (partial) match was found. The main
>> problem I see here is that the special return value reverts to a
>> normal int/long if anything is done to it (adding, subtracting, etc),
>> and the found status is lost.
>>
>> The other option is returning a (number, bool) tuple -- safer, yet
>> more boring... ;)
>
> I'm not quite sure I follow...you start off by saying that it will
> raise NotFoundError if it can't find anything. So if it finds
> something, just return it. Because if it found the item, it gives it
> to you; if it didn't find the item, it raised an error. That sounds
> like a good (easy to understand) interface, similar to how
> string.index() works.

Indeed, it's even less clear without the part you snipped. ;) Which
wasn't very. The well-hidden clue was this line:

    nearest returns where the match should be instead of raising an error

And my question should have been:

    What should the return value be when nearest == True?

My bit of fun was this class:

    class IndexLocation(long):
        """used by Index.index_search -- represents the index where the
        match criteria is if True, or would be if False"""
        def __new__(cls, value, found):
            "value is the number, found is True/False"
            result = long.__new__(cls, value)
            result.found = found
            return result
        def __nonzero__(self):
            return self.found

~Ethan~
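For readers on Python 3, the IndexLocation class quoted in this message can be adapted roughly as below. The adaptation is an assumption on my part, not part of the thread: int replaces long and __bool__ replaces __nonzero__. The last lines show the drawback raised earlier, that arithmetic hands back a plain int and the found status is lost.

```python
class IndexLocation(int):
    """Python 3 adaptation of the class from the thread:
    int instead of long, __bool__ instead of __nonzero__."""

    def __new__(cls, value, found):
        # value is the position, found is True/False
        result = super().__new__(cls, value)
        result.found = found
        return result

    def __bool__(self):
        return self.found


loc = IndexLocation(0, True)    # match found at position 0
miss = IndexLocation(3, False)  # no match; it would belong at position 3

print(bool(loc), int(loc))      # truthy even though the position is 0
print(bool(miss), int(miss))    # falsy even though the position is 3

shifted = miss + 1              # arithmetic returns a plain int
print(type(shifted).__name__, hasattr(shifted, "found"))
```

The truthiness here is deliberately decoupled from the numeric value, which is what makes the class fun and also what makes it surprising next to ordinary integers.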
Re: dbf.py API question concerning Index.index_search()
On 08/15/12 19:21, Ethan Furman wrote:
> The well-hidden clue was this line:
>
>     nearest returns where the match should be instead of raising an
>     error
>
> And my question should have been:
>
>     What should the return value be when nearest == True?

Ah, well that's somewhat clearer. Return the closest and not bother to
let the user know it was inexact. Upon requesting it with nearest=True,
they *knew* that the result might be a nearest match. Though if they
ask for nearest, an exact match *better* be the nearest if it exists.
:-P

I'd say the API-user shouldn't ask for what they don't want.

-tkc
Re: dbf.py API question concerning Index.index_search()
On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:

> Indexes have a new method (rebirth of an old one, really):
>
>     .index_search(
>         match, start=None, stop=None, nearest=False, partial=False )
[...]

Why index_search rather than just search?

> The question is what should the return value be?
>
> I don't like the usual pattern of -1 meaning not found (as in
> 'nothere'.find('a'))

And you are right not to. The problem with returning -1 as a "not
found" sentinel is that if it is mistakenly used where you would use a
"found" result, your code silently does the wrong thing instead of
giving an exception.

So pick a sentinel value which *cannot* be used as a successful found
result.

Since successful searches return integer offsets (yes?), one possible
sentinel might be None. (That's what re.search and re.match return
instead of a MatchObject.) But first ensure that None is *not* valid
input to any of your methods that take an integer.

For example, if str.find was changed to return None instead of -1 that
would not solve the problem, because None is a valid argument for
slices:

    p = mystring.find(":")
    print(mystring[p:-1])  # Oops, no better with None

You don't have to predict every imaginable failure mode or defend
against utterly incompetent programmers, just against the obvious
failure modes.

If None is not suitable as a sentinel, create a constant value that
can't be mistaken for anything else:

    class NotFoundType(object):
        def __repr__(self):
            return "Not Found"
        __str__ = __repr__

    NOTFOUND = NotFoundType()
    del NotFoundType

and then return that.

(By the way, I'm assuming that negative offsets are valid for your
application. If they aren't, then using -1 as sentinel is perfectly
safe, since passing a "not found" -1 as offset to another method will
result in an immediate exception.)

> The other option is returning a (number, bool) tuple -- safer, yet
> more boring... ;)

Boring is good, but it is also a PITA to use, and that's not good. I
never remember whether the signature is (offset, flag) or (flag,
offset), and if you get it wrong, your code will probably fail
silently:

    py> flag, offset = (23, False)  # Oops, I got it wrong.
    py> if flag:
    ...     print("hello world"[offset+1:])
    ...
    ello world

-- 
Steven
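Steven's NOTFOUND sentinel can be tried out with a small wrapper around str.find. The find() helper below is hypothetical and exists only for this sketch. The point it illustrates is that misusing the sentinel as a slice offset raises an error immediately, where -1 or None would have been accepted silently.

```python
class NotFoundType(object):
    def __repr__(self):
        return "Not Found"
    __str__ = __repr__


NOTFOUND = NotFoundType()
del NotFoundType  # keep NOTFOUND the only instance in normal use


def find(text, sub):
    """Hypothetical wrapper: like str.find, but returns NOTFOUND, not -1."""
    pos = text.find(sub)
    return NOTFOUND if pos == -1 else pos


text = "hello world"
p = find(text, ":")
print(p)
try:
    print(text[p:-1])       # misuse: treating "not found" as an offset
except TypeError as exc:
    print("caught:", exc)
```

Callers that want to handle the miss explicitly can compare with `p is NOTFOUND`, since there is only the one instance.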
Re: dbf.py API question concerning Index.index_search()
Steven D'Aprano wrote:
> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
>> Indexes have a new method (rebirth of an old one, really):
>>
>>     .index_search(
>>         match, start=None, stop=None, nearest=False, partial=False )
> [...]
>
> Why index_search rather than just search?

Because search already exists and returns a dbf.List of all matching
records.

~Ethan~
Re: dbf.py API question concerning Index.index_search()
On 16/08/2012 01:28, Tim Chase wrote:
> On 08/15/12 19:21, Ethan Furman wrote:
>> The well-hidden clue was this line:
>>
>>     nearest returns where the match should be instead of raising an
>>     error
>>
>> And my question should have been:
>>
>>     What should the return value be when nearest == True?
>
> Ah, well that's somewhat clearer. Return the closest and not bother
> to let the user know it was inexact. Upon requesting it with
> nearest=True, they *knew* that the result might be a nearest match.
> Though if they ask for nearest, an exact match *better* be the
> nearest if it exists. :-P
>
> I'd say the API-user shouldn't ask for what they don't want.

+1
Re: dbf.py API question concerning Index.index_search()
On 16/08/2012 02:22, Ethan Furman wrote:
> Steven D'Aprano wrote:
>> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
>>> Indexes have a new method (rebirth of an old one, really):
>>>
>>>     .index_search(
>>>         match, start=None, stop=None, nearest=False, partial=False )
>> [...]
>>
>> Why index_search rather than just search?
>
> Because search already exists and returns a dbf.List of all matching
> records.

Perhaps that should've been called find_all!
Re: dbf.py API question
Ed Leafe wrote:
> When converting from paradigms in other languages, I've often been
> tempted to follow the accepted pattern for that language, and I've
> almost always regretted it.

+1

> When in doubt, make it as Pythonic as possible.

+1 QOTW

~Ethan~
Re: dbf.py API question
On Wed, Aug 8, 2012 at 5:18 PM, Ethan Furman et...@stoneleaf.us wrote:
> Ed Leafe wrote:
>> When converting from paradigms in other languages, I've often been
>> tempted to follow the accepted pattern for that language, and I've
>> almost always regretted it.
>
> +1
>
>> When in doubt, make it as Pythonic as possible.
>
> +1 QOTW
>
> ~Ethan~

+2 from me as well. Totally in spirit with the Zen of Python!
Re: dbf.py API question
On Aug 2, 2012, at 10:55 AM, Ethan Furman wrote:
> SQLite has a neat feature where if you give it the file-name of
> ':memory:' the resulting table is in memory and not on disk. I
> thought it was a cool feature, but expanded it slightly: any name
> surrounded by colons results in an in-memory table.
>
> I'm looking at the same type of situation with indices, but now I'm
> wondering if the :name: method is not pythonic and I should use a
> flag (in_memory=True) when memory storage instead of disk storage is
> desired.

When converting from paradigms in other languages, I've often been
tempted to follow the accepted pattern for that language, and I've
almost always regretted it.

When in doubt, make it as Pythonic as possible.

-- Ed Leafe
Re: dbf.py API question
[redirecting back to list]

Ole Martin Bjørndalen wrote:
> On Sun, Aug 5, 2012 at 4:09 PM, Ethan Furman et...@stoneleaf.us wrote:
>> Ole Martin Bjørndalen wrote:
>> You can do this by implementing either __getitem__ or __iter__,
>> unless the streaming flag would also make your table not in memory.
>
> Cool!
>
> Wow! I realize now that this could in fact be fairly easy to
> implement. I just have to shuffle around the code a bit to make both
> possible. The API would be:
>
>     # Returns table object which is a subclass of list
>     table = dbfget.read('cables.dbf')
>     for rec in table:
>         print rec
>
>     # Return a table object which behaves like an iterator
>     table = dbfget.read('cables.dbf', iter=True)
>     for rec in table:
>         print rec
>
> I have a lot of questions in my mind about how to get this to work,
> but I feel like it's the right thing to do. I will make an attempt at
> a rewrite and get back to you all later.
>
> One more API question: I am uncomfortable with:
>
>     dbfget.read()
>
> Should it just be:
>
>     dbfget.get()
>
> ?
>
> - Ole

`dbfget` is the package name, and `read()` or `get` is the
class/function that loads the table into memory and returns it? Maybe
`load()`?

~Ethan~
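The practical difference between the two return types Ole proposes (a list subclass versus an iterator) can be sketched without any DBF file at all. The read() function below is a hypothetical stand-in that takes the records directly, and its iter parameter only mirrors the name used in the proposal. It is not dbfget's code.

```python
def read(records, iter=False):
    """Hypothetical stand-in for the proposed dbfget.read().

    `records` replaces the file on disk so the sketch is self-contained.
    The `iter` name shadows the builtin only because it mirrors the
    proposal.
    """
    if iter:
        return (rec for rec in records)   # lazy: one record at a time
    return list(records)                  # eager: everything in memory


rows = [{"name": "cable-1"}, {"name": "cable-2"}]

table = read(rows)
for rec in table:
    print(rec)
print(len(table))            # a list knows its length and can be re-read

stream = read(rows, iter=True)
for rec in stream:
    print(rec)
print(list(stream))          # a generator is exhausted after one pass
```

Both forms support the same for loop, so the choice mostly affects memory use and whether the table can be indexed or traversed twice.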
Re: dbf.py API question
Ole Martin Bjørndalen wrote:
> On Thu, Aug 2, 2012 at 5:55 PM, Ethan Furman et...@stoneleaf.us wrote:
>> SQLite has a neat feature where if you give it the file-name of
>> ':memory:' the resulting table is in memory and not on disk. I
>> thought it was a cool feature, but expanded it slightly: any name
>> surrounded by colons results in an in-memory table.
>>
>> I'm looking at the same type of situation with indices, but now I'm
>> wondering if the :name: method is not pythonic and I should use a
>> flag (in_memory=True) when memory storage instead of disk storage is
>> desired.
>>
>> Thoughts?
>
> I agree that the flag would be more pythonic in dbf.py.
>
> I was not aware that you are adding sqlite functionality to your
> library. This is very cool!

Actually, I'm not. I had stumbled across that one tidbit and thought it
was cool, but cool is not always pythonic. ;)

> I am considering adding a streaming=True flag which would make the
> table class a record generator,

You can do this by implementing either __getitem__ or __iter__, unless
the streaming flag would also make your table not in memory.

> I hope this can help you somehow in your decision making process.

All comments appreciated. Thanks!

~Ethan~
Re: dbf.py API question
On Thu, Aug 2, 2012 at 5:55 PM, Ethan Furman et...@stoneleaf.us wrote:
> SQLite has a neat feature where if you give it the file-name of
> ':memory:' the resulting table is in memory and not on disk. I
> thought it was a cool feature, but expanded it slightly: any name
> surrounded by colons results in an in-memory table.
>
> I'm looking at the same type of situation with indices, but now I'm
> wondering if the :name: method is not pythonic and I should use a
> flag (in_memory=True) when memory storage instead of disk storage is
> desired.
>
> Thoughts?

I agree that the flag would be more pythonic in dbf.py.

I was not aware that you are adding sqlite functionality to your
library. This is very cool!

I have been through the same questions with my own DBF library, and
I've come to some conclusions:

First, I decided to make the library read-only and in-memory. That is
all we need in-house anyway.

Second, I decided to make an external tool for converting DBF files to
sqlite:

https://github.com/olemb/dbfget/blob/master/extras/dbf2sqlite

(To anyone reading: I have not yet made a public announcement of
dbfget, but I will shortly. Consider this an informal announcement:
https://github.com/olemb/dbfget/ )

I am considering adding a streaming=True flag which would make the
table class a record generator, and a save() method which would allow
you to save data back to the file, or to a new file if you provide an
optional file name. In fact, I had this functionality in earlier
versions, but decided to chuck it out in order to make the API as clean
as possible.

I hope this can help you somehow in your decision making process.
Re: dbf.py API question
Ethan Furman wrote:
> SQLite has a neat feature where if you give it the file-name of
> ':memory:' the resulting table is in memory and not on disk. I
> thought it was a cool feature, but expanded it slightly: any name
> surrounded by colons results in an in-memory table.
>
> I'm looking at the same type of situation with indices, but now I'm
> wondering if the :name: method is not pythonic and I should use a
> flag (in_memory=True) when memory storage instead of disk storage is
> desired.

For SQLite it seems OK because you make the decision once per database.
For dbase it'd be once per table, so I would prefer the flag.

Random Thoughts?

- Do you really want your users to work with multiple dbf files? I
  think I'd rather convert to SQLite, perform the desired operations
  using sql, then convert back.

- Are names required to manipulate the table? If not you could just
  omit them to make the table in-memory.

- How about a connection object that may either correspond to a
  directory or RAM:

      db = dbf.connect(":memory:")
      table = db.Table("foo", ...)
Re: dbf.py API question
Peter Otten wrote:
> Ethan Furman wrote:
>> SQLite has a neat feature where if you give it the file-name of
>> ':memory:' the resulting table is in memory and not on disk. I
>> thought it was a cool feature, but expanded it slightly: any name
>> surrounded by colons results in an in-memory table.
>>
>> I'm looking at the same type of situation with indices, but now I'm
>> wondering if the :name: method is not pythonic and I should use a
>> flag (in_memory=True) when memory storage instead of disk storage is
>> desired.
>
> For SQLite it seems OK because you make the decision once per
> database. For dbase it'd be once per table, so I would prefer the
> flag.

So far all feedback is for the flag, so that's what I'll do.

> Random Thoughts?
>
> - Do you really want your users to work with multiple dbf files? I
>   think I'd rather convert to SQLite, perform the desired operations
>   using sql, then convert back.

Seems like that would be quite a slow-down (although if a user wants to
do that, s/he certainly could).

> - Are names required to manipulate the table? If not you could just
>   omit them to make the table in-memory.

At one point I had thought to make tables singletons (so only one copy
of /user/bob/scores.dbf) but that hasn't happened and is rather low
priority, so at this point the name is not required for anything
besides initial object creation.

> - How about a connection object that may either correspond to a
>   directory or RAM:
>
>       db = dbf.connect(":memory:")
>       table = db.Table("foo", ...)

dbf.py does not support the DB-API interface, so no connection objects.
Tables are opened directly and dealt with directly.

All interesting thoughts that made me think. Thank you.

~Ethan~
Re: dbf.py API question
On 08/03/12 08:11, Ethan Furman wrote:
> So far all feedback is for the flag, so that's what I'll do.

I agree with the flag, though would also be reasonably content with
using None for the filename to indicate in-memory rather than on-disk
storage.

-tkc
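The two spellings that came out of this thread, an explicit in_memory flag and None for the filename, can coexist in one constructor. The sketch below is purely illustrative: this Table class and its attributes are hypothetical and do not reflect dbf.py's actual signature.

```python
class Table:
    """Hypothetical table class; not dbf.py's actual constructor."""

    def __init__(self, filename=None, in_memory=False):
        # Tim's variant: no filename at all implies in-memory storage.
        if filename is None:
            in_memory = True
        self.filename = filename
        self.in_memory = in_memory

    def __repr__(self):
        where = "memory" if self.in_memory else "disk"
        return "Table(%r, storage=%s)" % (self.filename, where)


print(Table("scores.dbf"))               # on disk
print(Table("scratch", in_memory=True))  # named, but kept in memory
print(Table())                           # no name, so in memory
```

Compared with the ':name:' convention, the flag keeps the storage decision out of the string, so a file that really is named with colons stays representable.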