Re: dbf.py API question concerning Index.index_search()

2012-08-16 Thread Hans Mulder
On 16/08/12 01:26:09, Ethan Furman wrote:
 Indexes have a new method (rebirth of an old one, really):
 
   .index_search(
  match,
  start=None,
  stop=None,
  nearest=False,
  partial=False )
 
 The defaults are to search the entire index for exact matches and raise
 NotFoundError if it can't find anything.
 
 match is the search criteria
 start and stop are the range to search in
 nearest returns where the match should be instead of raising an error
 partial will find partial matches
 
 The question is what should the return value be?
 
 I don't like the usual pattern of -1 meaning not found (as in
 'nothere'.find('a')), so I thought a fun and interesting way would be to
 subclass long and override the __nonzero__ method to return True/False
 based on whether the (partial) match was found.  The main problem I see
 here is that the special return value reverts to a normal int/long if
 anything is done to it (adding, subtracting, etc), and the found status
 is lost.
 
 The other option is returning a (number, bool) tuple -- safer, yet more
 boring... ;)

I think you should go for the safe boring option, because in many use
cases the caller will need to know whether the number you're returning
is the index of a match or just the nearest non-match.  The caller could
redo the match to find out.  But you have already done the match, so you
might as well tell them the result.
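
A minimal sketch of the (number, bool) option, assuming the index can be modelled as a plain sorted list of keys. The name `index_search_sketch` and the list model are illustrative assumptions and not dbf.py's implementation; only the standard `bisect` module is used.

```python
import bisect

def index_search_sketch(keys, match):
    """Return (position, found) for match in the sorted list keys.

    position is the index of the first exact match, or the place where
    match would have to be inserted to keep keys sorted.
    """
    pos = bisect.bisect_left(keys, match)
    found = pos < len(keys) and keys[pos] == match
    return pos, found

keys = ["adams", "baker", "clark", "evans"]
exact = index_search_sketch(keys, "clark")    # key is present
nearest = index_search_sketch(keys, "davis")  # key is absent
print(exact, nearest)
```

The search already knows whether it hit an exact match, so returning the flag costs nothing and saves the caller a second comparison.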


Hope this helps,

-- HansM

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: dbf.py API question concerning Index.index_search()

2012-08-16 Thread Ethan Furman

MRAB wrote:

On 16/08/2012 02:22, Ethan Furman wrote:

Steven D'Aprano wrote:

On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:


Indexes have a new method (rebirth of an old one, really):

   .index_search(
  match,
  start=None,
  stop=None,
  nearest=False,
  partial=False )

[...]

Why index_search rather than just search?


Because search already exists and returns a dbf.List of all matching
records.


Perhaps that should've been called find_all!


An interesting thought.

Currently there are:

  .index(data)   -- returns index of data in Index, or raises error
  .query(string) -- brute force search, returns all matching records
  .search(match) -- binary search through table, returns all matching
 records

'index' and 'query' are supported by Tables, Lists, and Indexes; search 
(and now index_search) are only supported on Indexes.


~Ethan~


Re: dbf.py API question concerning Index.index_search()

2012-08-16 Thread MRAB

On 16/08/2012 17:13, Ethan Furman wrote:

MRAB wrote:

On 16/08/2012 02:22, Ethan Furman wrote:

Steven D'Aprano wrote:

On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:


Indexes have a new method (rebirth of an old one, really):

   .index_search(
  match,
  start=None,
  stop=None,
  nearest=False,
  partial=False )

[...]

Why index_search rather than just search?


Because search already exists and returns a dbf.List of all matching
records.


Perhaps that should've been called find_all!


An interesting thought.

Currently there are:

.index(data)   -- returns index of data in Index, or raises error
.query(string) -- brute force search, returns all matching records
.search(match) -- binary search through table, returns all matching
   records

'index' and 'query' are supported by Tables, Lists, and Indexes; search
(and now index_search) are only supported on Indexes.


What exactly is the difference between .index and .index_search with
the default arguments?


Re: dbf.py API question concerning Index.index_search()

2012-08-16 Thread Ethan Furman

MRAB wrote:

On 16/08/2012 17:13, Ethan Furman wrote:

Currently there are:

.index(data)   -- returns index of data in Index, or raises error
.query(string) -- brute force search, returns all matching records
.search(match) -- binary search through table, returns all matching
   records

'index' and 'query' are supported by Tables, Lists, and Indexes; search
(and now index_search) are only supported on Indexes.


What exactly is the difference between .index and .index_search with
the default arguments?


.index requires a data structure that can be compared to a record 
(another record, a dictionary with the same field/key names, or a 
list/tuple with values in the same order as the fields).  It returns the 
index or raises NotFoundError.  It is brute force.


.index_search requires match criteria (a tuple with the desired values 
in the same order as the key).  It returns the index or raises 
NotFoundError (unless nearest is True -- then the value returned is 
where the match should be).  It is binary search.


So the only similarity is that they both return a number or raise 
NotFoundError.  What they use for the search and how they perform the 
search are both completely different.
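
The contrast can be sketched roughly as follows, assuming records are plain tuples and the index is a sorted list of keys. `brute_force_index`, `binary_index_search` and this `NotFoundError` definition are hypothetical stand-ins for illustration, not the dbf.py code.

```python
import bisect

class NotFoundError(Exception):
    """Stand-in for the error named in the thread (hypothetical definition)."""

def brute_force_index(records, wanted):
    # Compare every record in turn: O(n), works on unsorted data.
    for i, record in enumerate(records):
        if record == wanted:
            return i
    raise NotFoundError(wanted)

def binary_index_search(sorted_keys, match):
    # Bisect on sorted keys: O(log n), requires the keys to be ordered.
    pos = bisect.bisect_left(sorted_keys, match)
    if pos < len(sorted_keys) and sorted_keys[pos] == match:
        return pos
    raise NotFoundError(match)

records = [("adams", 31), ("baker", 47), ("clark", 25)]
keys = [name for name, _age in records]

by_record = brute_force_index(records, ("baker", 47))  # whole record
by_key = binary_index_search(keys, "baker")            # key only
```

Both calls return a position or raise, but the first needs a whole record to compare against while the second needs only the key values.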


~Ethan~


Re: dbf.py API question concerning Index.index_search()

2012-08-15 Thread Tim Chase
On 08/15/12 18:26, Ethan Furman wrote:
.index_search(
   match,
   start=None,
   stop=None,
   nearest=False,
   partial=False )
 
 The defaults are to search the entire index for exact matches and raise
 NotFoundError if it can't find anything.
 
 The question is what should the return value be?
 
 I don't like the usual pattern of -1 meaning not found (as in
 'nothere'.find('a')), so I thought a fun and interesting way would be to
 subclass long and override the __nonzero__ method to return True/False
 based on whether the (partial) match was found.  The main problem I see
 here is that the special return value reverts to a normal int/long if
 anything is done to it (adding, subtracting, etc), and the found status
 is lost.
 
 The other option is returning a (number, bool) tuple -- safer, yet more
 boring... ;)


I'm not quite sure I follow...you start off by saying that it will
raise NotFoundError if it can't find anything.  So if it finds
something, just return it.  Because if it found the item, it gives
it to you; if it didn't find the item, it raised an error.  That
sounds like a good (easy to understand) interface, similar to how
string.index() works.

-tkc





Re: dbf.py API question concerning Index.index_search()

2012-08-15 Thread Ethan Furman

Tim Chase wrote:

On 08/15/12 18:26, Ethan Furman wrote:

   .index_search(
  match,
  start=None,
  stop=None,
  nearest=False,
  partial=False )

The defaults are to search the entire index for exact matches and raise
NotFoundError if it can't find anything.

The question is what should the return value be?

I don't like the usual pattern of -1 meaning not found (as in
'nothere'.find('a')), so I thought a fun and interesting way would be to
subclass long and override the __nonzero__ method to return True/False
based on whether the (partial) match was found.  The main problem I see
here is that the special return value reverts to a normal int/long if
anything is done to it (adding, subtracting, etc), and the found status
is lost.

The other option is returning a (number, bool) tuple -- safer, yet more
boring... ;)


I'm not quite sure I follow...you start off by saying that it will
raise NotFoundError if it can't find anything.  So if it finds
something, just return it.  Because if it found the item, it gives
it to you; if it didn't find the item, it raised an error.  That
sounds like a good (easy to understand) interface, similar to how
string.index() works.



Indeed, it's even less clear without the part you snipped.  ;)  Which 
wasn't very.


The well-hidden clue was this line:

nearest returns where the match should be instead of raising an error

And my question should have been:

  What should the return value be when nearest == True?

My bit of fun was this class:

  class IndexLocation(long):
      """used by Index.index_search -- represents the index where the
         match criteria is if True, or would be if False"""
      def __new__(cls, value, found):
          "value is the number, found is True/False"
          result = long.__new__(cls, value)
          result.found = found
          return result
      def __nonzero__(self):
          return self.found
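
For readers on Python 3, the same idea might look like the sketch below, assuming `int` in place of `long` and `__bool__` in place of `__nonzero__`. It also shows the drawback mentioned earlier in the thread: arithmetic on the result gives back a plain int, so the found status is lost.

```python
class IndexLocation(int):
    """Python 3 rendering of the idea: an int whose truth value is `found`."""

    def __new__(cls, value, found):
        result = super().__new__(cls, value)
        result.found = found
        return result

    def __bool__(self):  # Python 3 name for __nonzero__
        return self.found

loc = IndexLocation(0, True)      # a match at position 0 is still truthy
miss = IndexLocation(5, False)    # would-be position 5, but no match
shifted = miss + 1                # arithmetic returns a plain int
```

Here `shifted` is an ordinary `int`, so `bool(shifted)` reflects only the number and no longer says anything about the search.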

~Ethan~


Re: dbf.py API question concerning Index.index_search()

2012-08-15 Thread Tim Chase
On 08/15/12 19:21, Ethan Furman wrote:
 The well-hidden clue was this line:
 
 nearest returns where the match should be instead of raising an error
 
 And my question should have been:
 
What should the return value be when nearest == True?

Ah, well that's somewhat clearer.  Return the closest and not bother
to let the user know it was inexact.  Upon requesting it with
nearest=True, they *knew* that the result might be a nearest match.
 Though if they ask for nearest, an exact match *better* be the
nearest if it exists. :-P

I'd say the API-user shouldn't ask for what they don't want.

-tkc







Re: dbf.py API question concerning Index.index_search()

2012-08-15 Thread Steven D'Aprano
On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:

 Indexes have a new method (rebirth of an old one, really):
 
.index_search(
   match,
   start=None,
   stop=None,
   nearest=False,
   partial=False )
[...]

Why index_search rather than just search?


 The question is what should the return value be?
 
 I don't like the usual pattern of -1 meaning not found (as in
 'nothere'.find('a'))

And you are right not to. The problem with returning -1 as a not found 
sentinel is that if it is mistakenly used where you would use a found 
result, your code silently does the wrong thing instead of giving an 
exception.
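
A short illustration of that failure mode, using only built-in `str` methods; the variable names are arbitrary.

```python
text = "nothere"

pos = text.find("a")          # no "a" in the string, so find() returns -1
tail = text[pos:]             # -1 is a valid index: silently yields "e"

try:
    text.index("a")           # index() raises instead of returning a sentinel
except ValueError as exc:
    error = str(exc)
```

The slice runs without complaint even though nothing was found, whereas `str.index()` stops the program at the point of failure.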

So pick a sentinel value which *cannot* be used as a successful found 
result.

Since successful searches return integer offsets (yes?), one possible 
sentinel might be None. (That's what re.search and re.match return 
instead of a MatchObject.) But first ensure that None is *not* valid 
input to any of your methods that take an integer.

For example, if str.find was changed to return None instead of -1 that 
would not solve the problem, because None is a valid argument for slices:

    p = mystring.find(":")
    print(mystring[p:-1])  # Oops, no better with None

You don't have to predict every imaginable failure mode or defend against 
utterly incompetent programmers, just against the obvious failure modes.

If None is not suitable as a sentinel, create a constant value that can't 
be mistaken for anything else:

    class NotFoundType(object):
        def __repr__(self):
            return "Not Found"
        __str__ = __repr__

    NOTFOUND = NotFoundType()
    del NotFoundType


and then return that.
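
A sketch of how such a sentinel behaves in use, assuming a hypothetical helper `find_offset` over a plain list (not a dbf.py function). The point is that misusing the sentinel as an offset fails immediately instead of silently.

```python
class _NotFoundType:
    """Type of the single NOTFOUND sentinel."""
    def __repr__(self):
        return "Not Found"

NOTFOUND = _NotFoundType()

def find_offset(items, wanted):
    """Hypothetical search helper returning an offset or NOTFOUND."""
    for offset, item in enumerate(items):
        if item == wanted:
            return offset
    return NOTFOUND

items = ["a", "b", "c"]
result = find_offset(items, "z")

try:
    items[result]             # misuse as an index fails loudly
except TypeError:
    misuse_detected = True
```

Callers test with `result is NOTFOUND`; anything that forgets the test gets a TypeError rather than a wrong answer.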


(By the way, I'm assuming that negative offsets are valid for your 
application. If they aren't, then using -1 as sentinel is perfectly safe, 
since passing a not found -1 as offset to another method will result in 
an immediate exception.)


 The other option is returning a (number, bool) tuple -- safer, yet more
 boring... ;)

Boring is good, but it is also a PITA to use, and that's not good. I 
never remember whether the signature is (offset, flag) or (flag, offset), 
and if you get it wrong, your code will probably fail silently:

py> flag, offset = (23, False)  # Oops, I got it wrong.
py> if flag:
...     print("hello world"[offset+1:])
...
ello world
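
One way to keep the tuple while avoiding the ordering trap is a named tuple. This is a sketch only: `SearchResult` and its field names are hypothetical and not part of dbf.py.

```python
from collections import namedtuple

SearchResult = namedtuple("SearchResult", ["offset", "found"])

result = SearchResult(offset=23, found=False)

# Attribute access does not depend on remembering the field order.
if result.found:
    print("hello world"[result.offset + 1:])
else:
    print("no exact match; nearest offset is", result.offset)
```

The value still unpacks like a plain (offset, found) tuple, so existing tuple-style callers keep working.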


   

-- 
Steven


Re: dbf.py API question concerning Index.index_search()

2012-08-15 Thread Ethan Furman

Steven D'Aprano wrote:

On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:


Indexes have a new method (rebirth of an old one, really):

   .index_search(
  match,
  start=None,
  stop=None,
  nearest=False,
  partial=False )

[...]

Why index_search rather than just search?


Because search already exists and returns a dbf.List of all matching 
records.


~Ethan~


Re: dbf.py API question concerning Index.index_search()

2012-08-15 Thread MRAB

On 16/08/2012 01:28, Tim Chase wrote:

On 08/15/12 19:21, Ethan Furman wrote:

The well-hidden clue was this line:

nearest returns where the match should be instead of raising an error

And my question should have been:

   What should the return value be when nearest == True?


Ah, well that's somewhat clearer.  Return the closest and not bother
to let the user know it was inexact.  Upon requesting it with
nearest=True, they *knew* that the result might be a nearest match.
  Though if they ask for nearest, an exact match *better* be the
nearest if it exists. :-P

I'd say the API-user shouldn't ask for what they don't want.


+1


Re: dbf.py API question concerning Index.index_search()

2012-08-15 Thread MRAB

On 16/08/2012 02:22, Ethan Furman wrote:

Steven D'Aprano wrote:

On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:


Indexes have a new method (rebirth of an old one, really):

   .index_search(
  match,
  start=None,
  stop=None,
  nearest=False,
  partial=False )

[...]

Why index_search rather than just search?


Because search already exists and returns a dbf.List of all matching
records.


Perhaps that should've been called find_all!


Re: dbf.py API question

2012-08-08 Thread Ethan Furman

Ed Leafe wrote:

When converting from paradigms in other languages, I've often been 
tempted to follow the accepted pattern for that language, and I've almost 
always regretted it.


+1



When in doubt, make it as Pythonic as possible.


+1 QOTW

~Ethan~


Re: dbf.py API question

2012-08-08 Thread Ole Martin Bjørndalen
On Wed, Aug 8, 2012 at 5:18 PM, Ethan Furman et...@stoneleaf.us wrote:
 Ed Leafe wrote:
 When converting from paradigms in other languages, I've often been
 tempted to follow the accepted pattern for that language, and I've almost
 always regretted it.
 +1
 When in doubt, make it as Pythonic as possible.
 +1 QOTW
 ~Ethan~

+2 from me as well.

Totally in spirit with the Zen of Python!


Re: dbf.py API question

2012-08-07 Thread Ed Leafe
On Aug 2, 2012, at 10:55 AM, Ethan Furman wrote:

 SQLite has a neat feature where if you give it the file-name of ':memory:'
 the resulting table is in memory and not on disk.  I thought it was a cool 
 feature, but expanded it slightly: any name surrounded by colons results in 
 an in-memory table.
 
 I'm looking at the same type of situation with indices, but now I'm wondering 
 if the :name: method is not pythonic and I should use a flag (in_memory=True) 
 when memory storage instead of disk storage is desired.

When converting from paradigms in other languages, I've often been 
tempted to follow the accepted pattern for that language, and I've almost 
always regretted it.

When in doubt, make it as Pythonic as possible.


-- Ed Leafe





Re: dbf.py API question

2012-08-06 Thread Ethan Furman

[redirecting back to list]

Ole Martin Bjørndalen wrote:

On Sun, Aug 5, 2012 at 4:09 PM, Ethan Furman et...@stoneleaf.us wrote:

Ole Martin Bjørndalen wrote:
You can do this by implementing either __getitem__ or __iter__, unless the
streaming flag would also make your table not in memory.


Cool!

Wow! I realize now that this could in fact be fairly easy to
implement. I just have to shuffle around the code a bit to make both
possible. The API would be:

   # Returns a table object which is a subclass of list
   table = dbfget.read('cables.dbf')
   for rec in table:
       print rec

   # Returns a table object which behaves like an iterator
   table = dbfget.read('cables.dbf', iter=True)
   for rec in table:
       print rec

I have a lot of questions in my mind about how to get this to work,
but I feel like it's the right thing to do. I will make an attempt at
a rewrite and get back to you all later.

One more API question: I am uncomfortable with:


   dbfget.read()

Should it just be:

   dbfget.get()

?

- Ole


`dbfget` is the package name, and `read()` or `get()` is the 
class/function that loads the table into memory and returns it?


Maybe `load()`?

~Ethan~


Re: dbf.py API question

2012-08-05 Thread Ethan Furman

Ole Martin Bjørndalen wrote:

On Thu, Aug 2, 2012 at 5:55 PM, Ethan Furman et...@stoneleaf.us wrote:

SQLite has a neat feature where if you give it the file-name of ':memory:'
the resulting table is in memory and not on disk.  I thought it was a cool
feature, but expanded it slightly: any name surrounded by colons results in
an in-memory table.

I'm looking at the same type of situation with indices, but now I'm
wondering if the :name: method is not pythonic and I should use a flag
(in_memory=True) when memory storage instead of disk storage is desired.

Thoughts?


I agree that the flag would be more pythonic in dbf.py.

I was not aware that you are adding sqlite functionality to your
library. This is very cool!


Actually, I'm not.  I had stumbled across that one tidbit and thought it 
was cool, but cool is not always pythonic.  ;)




I am considering adding a streaming=True flag which would make the
table class a record generator,


You can do this by implementing either __getitem__ or __iter__, unless 
the streaming flag would also make your table not in memory.
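
A rough sketch of the `__iter__` route, assuming a hypothetical `Table` class and a `load_records` callable that stands in for reading a DBF file (neither is dbf.py or dbfget code). With `streaming=True` nothing is cached, so indexing is refused.

```python
class Table:
    """Hypothetical table; `load_records` stands in for reading a DBF file."""

    def __init__(self, load_records, streaming=False):
        self._load_records = load_records
        self._streaming = streaming
        # In-memory mode reads everything once, up front.
        self._records = None if streaming else list(load_records())

    def __iter__(self):
        if self._streaming:
            # Re-read from the source on every pass; nothing is cached.
            return iter(self._load_records())
        return iter(self._records)

    def __getitem__(self, i):
        if self._streaming:
            raise TypeError("streaming tables do not support indexing")
        return self._records[i]

def fake_reader():
    # Generator standing in for record-by-record file reads.
    for n in range(3):
        yield {"id": n}

in_memory = Table(fake_reader)
streamed = Table(fake_reader, streaming=True)
```

Both objects work in a `for rec in table:` loop; only the in-memory one also supports `table[1]`.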




I hope this can help you somehow in your decision making process.


All comments appreciated.  Thanks!

~Ethan~


Re: dbf.py API question

2012-08-04 Thread Ole Martin Bjørndalen
On Thu, Aug 2, 2012 at 5:55 PM, Ethan Furman et...@stoneleaf.us wrote:
 SQLite has a neat feature where if you give it the file-name of ':memory:'
 the resulting table is in memory and not on disk.  I thought it was a cool
 feature, but expanded it slightly: any name surrounded by colons results in
 an in-memory table.

 I'm looking at the same type of situation with indices, but now I'm
 wondering if the :name: method is not pythonic and I should use a flag
 (in_memory=True) when memory storage instead of disk storage is desired.

 Thoughts?

I agree that the flag would be more pythonic in dbf.py.

I was not aware that you are adding sqlite functionality to your
library. This is very cool!

I have been through the same questions with my own DBF library, and
I've come to some conclusions: First, I decided to make the library
read-only and in-memory. That is all we need in-house anyway. Second,
I decided to make an external tool for converting DBF files to sqlite:

  https://github.com/olemb/dbfget/blob/master/extras/dbf2sqlite

(To anyone reading: I have not yet made a public announcement of
dbfget, but I will shortly. Consider this an informal announcement:
https://github.com/olemb/dbfget/ )

I am considering adding a streaming=True flag which would make the
table class a record generator, and a save() method which would
allow you to save data back to the file, or to a new file if you
provide an optional file name. In fact, I had this functionality in
earlier versions, but decided to chuck it out in order to make the API
as clean as possible.

I hope this can help you somehow in your decision making process.


Re: dbf.py API question

2012-08-03 Thread Peter Otten
Ethan Furman wrote:

 SQLite has a neat feature where if you give it the file-name of
 ':memory:' the resulting table is in memory and not on disk.  I thought
 it was a cool feature, but expanded it slightly: any name surrounded by
 colons results in an in-memory table.
 
 I'm looking at the same type of situation with indices, but now I'm
 wondering if the :name: method is not pythonic and I should use a flag
 (in_memory=True) when memory storage instead of disk storage is desired.
 
For SQLite it seems OK because you make the decision once per database. For 
dbase it'd be once per table, so I would prefer the flag.

Random

 Thoughts?

- Do you really want your users to work with multiple dbf files? I think I'd 
rather convert to SQLite, perform the desired operations using sql, then 
convert back.

- Are names required to manipulate the table? If not you could just omit 
them to make the table in-memory.

- How about a connection object that may either correspond to a directory or 
RAM:

db = dbf.connect(":memory:")
table = db.Table("foo", ...)



Re: dbf.py API question

2012-08-03 Thread Ethan Furman

Peter Otten wrote:

Ethan Furman wrote:


SQLite has a neat feature where if you give it the file-name of
':memory:' the resulting table is in memory and not on disk.  I thought
it was a cool feature, but expanded it slightly: any name surrounded by
colons results in an in-memory table.

I'm looking at the same type of situation with indices, but now I'm
wondering if the :name: method is not pythonic and I should use a flag
(in_memory=True) when memory storage instead of disk storage is desired.
 
For SQLite it seems OK because you make the decision once per database. For 
dbase it'd be once per table, so I would prefer the flag.


So far all feedback is for the flag, so that's what I'll do.
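
To make the two conventions concrete, here is a sketch with two hypothetical helpers, `storage_for` and `storage_for_colon_style`. They only decide where a table would live and are not the dbf.py API.

```python
import os

def storage_for(name, in_memory=False):
    """Return ("memory", name) or ("disk", absolute_path) for a table name.

    Hypothetical helper: it only decides where data would live and does
    not create or open anything.
    """
    if in_memory:
        return ("memory", name)
    return ("disk", os.path.abspath(name))

def storage_for_colon_style(name):
    """The ':name:' convention, for comparison."""
    if len(name) > 2 and name.startswith(":") and name.endswith(":"):
        return ("memory", name[1:-1])
    return ("disk", os.path.abspath(name))

flagged = storage_for("scores", in_memory=True)
coloned = storage_for_colon_style(":scores:")
```

With the flag, the choice is visible at the call site and the name needs no special parsing; with the colon style, the storage decision is hidden inside the string.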



Random


Thoughts?


- Do you really want your users to work with multiple dbf files? I think I'd 
rather convert to SQLite, perform the desired operations using sql, then 
convert back.


Seems like that would be quite a slow-down (although if a user wants to 
do that, s/he certainly could).


- Are names required to manipulate the table? If not you could just omit 
them to make the table in-memory.


At one point I had thought to make tables singletons (so only one copy 
of /user/bob/scores.dbf) but that hasn't happened and is rather low 
priority, so at this point the name is not required for anything besides 
initial object creation.


- How about a connection object that may either correspond to a directory or 
RAM:


db = dbf.connect(":memory:")
table = db.Table("foo", ...)


dbf.py does not support the DB-API interface, so no connection objects. 
  Tables are opened directly and dealt with directly.


All interesting thoughts that made me think.  Thank you.

~Ethan~


Re: dbf.py API question

2012-08-03 Thread Tim Chase
On 08/03/12 08:11, Ethan Furman wrote:
 So far all feedback is for the flag, so that's what I'll do.

I agree with the flag, though would also be reasonably content with
using None for the filename to indicate in-memory rather than
on-disk storage.

-tkc



