Python sqlite and regex.

2006-05-19 Thread Julien ARNOUX
Hi,
I'd like to use regular expressions in sqlite query, I using apsw module
but it doesn't work...Can you help me ?
My script:

import apsw
import re

path = 'db/db.db3'

#regexp function (extract from python-list discusion)
def regexp(expr, item):
reg = re.compile(expr)
return reg.match(item) is not None

con = apsw.Connection(path)
#create function
con.createscalarfunction(REGEXP, regexp)
cur = con.cursor()
#exampl
cur.execute(select foo from test where foo regex 'aa.[0-9]))

and the error is:

cur.execute('select foo from test where foo regex tata')
apsw.SQLError: SQLError: near regex: syntax error

Thanks



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python sqlite and regex.

2006-05-19 Thread Dan Sommers
On Fri, 19 May 2006 14:47:10 +0200,
Julien ARNOUX [EMAIL PROTECTED] wrote:

 cur.execute(select foo from test where foo regex 'aa.[0-9]))

 and the error is:

 cur.execute('select foo from test where foo regex tata')
 apsw.SQLError: SQLError: near regex: syntax error

I think you're missing a closing quote on that regex; or perhaps that's
an extra closing parenthesis at the end.

Also, it's probably best to let the database module do any escaping you
may need.  For example:

fooregex = r'aa.[0-9]'
sql = 'select foo from test where foo regex %s'
cur.execute( sql, tuple( fooregex ) )

See the DP API spec for more information.

Regards,
Dan

-- 
Dan Sommers
http://www.tombstonezero.net/dan/
I wish people would die in alphabetical order. -- My wife, the genealogist
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python sqlite and regex.

2006-05-19 Thread Matt Good
SQLite3 already has a REGEXP function, so you don't need to create your
own.

As Dan mentioned you also have a problem in your expression: 'aa.[0-9])
You need a closing quote on the expression, and you need to match the
close paren with an open paren, or remove it.

Also, in case you weren't aware, there will be a sqlite3 module in
Python 2.5 based on pysqlite 2.2: http://initd.org/tracker/pysqlite

Using pysqlite will make it easier to move to the Python 2.5 sqlite3
module if that's important to you.

-- Matt Good

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python sqlite and regex.

2006-05-19 Thread Gerhard Häring
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Matt Good wrote:
 SQLite3 already has a REGEXP function, so you don't need to create your
 own. [...]

Yes, but SQLite does not include a regular expression engine, and thus
according to the SQLite docs you need to register a REGEXP function in
order to make the REGEXP operator work:


The REGEXP operator is a special syntax for the regexp() user function. No
regexp() user function is defined by default and so use of the REGEXP
operator will normally result in an error message. If a user-defined
function named regexp is defined at run-time, that function will be
called in order to implement the REGEXP operator.


- -- Gerhard
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEbfbQdIO4ozGCH14RAqqqAJ49/9Kpi8xA6AyGB0tVJ/JcU4MczgCgoIsW
gdYgUl9ge63CiHqj4Mzgpns=
=1ZDV
-END PGP SIGNATURE-
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python sqlite and regex.

2006-05-19 Thread Paul McGuire
Gerhard Häring [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Matt Good wrote:
  SQLite3 already has a REGEXP function, so you don't need to create your
  own. [...]

 Yes, but SQLite does not include a regular expression engine, and thus
 according to the SQLite docs you need to register a REGEXP function in
 order to make the REGEXP operator work:

 
 The REGEXP operator is a special syntax for the regexp() user function. No
 regexp() user function is defined by default and so use of the REGEXP
 operator will normally result in an error message. If a user-defined
 function named regexp is defined at run-time, that function will be
 called in order to implement the REGEXP operator.
 


This is very interesting.  So I *could* define my own regexp function that
processes not regular expressions, but say, glob-like strings, which are
usually much easier for end users to work with (very basic wild-carding
where '*' matches one or more characters, and '?' matches any single
character - maybe add '#' to match any single digit and '@' to match any
single alpha character).

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Python sqlite and regex.

2006-05-19 Thread Paul McGuire
Paul McGuire [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]

 where '*' matches one or more characters, and '?' matches any single

oops, I meant '*' matches zero or more characters.

In many applications, these tests are sufficient for most user queries.  And
this eliminates the problem of teaching application users how to create
queries using the full regular expression syntax.

-- Paul



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python sqlite and regex.

2006-05-19 Thread John Salerno
Paul McGuire wrote:
 Paul McGuire [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]
 
 where '*' matches one or more characters, and '?' matches any single
 
 oops, I meant '*' matches zero or more characters.

'?' also matches 0 characters
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python sqlite and regex.

2006-05-19 Thread Paul McGuire
John Salerno [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 Paul McGuire wrote:
  Paul McGuire [EMAIL PROTECTED] wrote in message
  news:[EMAIL PROTECTED]
 
  where '*' matches one or more characters, and '?' matches any single
 
  oops, I meant '*' matches zero or more characters.

 '?' also matches 0 characters

Maybe it does, and maybe it doesn't... :)

Here is my test with the glob module (in my directory, there is a file named
PUTTY.RND):
 glob.glob('PUTTY.RND')
['PUTTY.RND']
 glob.glob('PUTTY.RN?')
['PUTTY.RND']
 glob.glob('PUTTY.RN?D')
[]
 glob.glob('PUTTY.RN?')
['PUTTY.RND']
 glob.glob('PUTTY.RND?')
[]
 glob.glob('PUTT?.RND')
['PUTTY.RND']
 glob.glob('PUTTY?.RND')
[]
 glob.glob('PUTT?Y.RND')
[]


Looks like '?' does *not* match zero characters in glob.  On the other hand,
in the Windows console window, it appears that '?' *does* match zero
characters.

Of course, you could write your regexp() routine to interpret '?' any way
you wanted.

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python sqlite and regex.

2006-05-19 Thread Dan Sommers
On Fri, 19 May 2006 17:44:45 GMT,
Paul McGuire [EMAIL PROTECTED] wrote:

 Gerhard Häring [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]

 
 The REGEXP operator is a special syntax for the regexp() user
 function. No regexp() user function is defined by default and so use
 of the REGEXP operator will normally result in an error message. If a
 user-defined function named regexp is defined at run-time, that
 function will be called in order to implement the REGEXP operator.
 

 This is very interesting.  So I *could* define my own regexp function
 that processes not regular expressions, but say, glob-like strings,
 which are usually much easier for end users to work with (very basic
 wild-carding where '*' matches one or more characters, and '?' matches
 any single character - maybe add '#' to match any single digit and '@'
 to match any single alpha character).

Doesn't SQL already have lightweight wildcards?

SELECT somefield FROM sometable WHERE someotherfield LIKE '%foo%'

Regards,
Dan

-- 
Dan Sommers
http://www.tombstonezero.net/dan/
I wish people would die in alphabetical order. -- My wife, the genealogist
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Python sqlite and regex.

2006-05-19 Thread Paul McGuire

Dan Sommers [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 On Fri, 19 May 2006 17:44:45 GMT,
 Paul McGuire [EMAIL PROTECTED] wrote:

  Gerhard Häring [EMAIL PROTECTED] wrote in message
  news:[EMAIL PROTECTED]

  
  The REGEXP operator is a special syntax for the regexp() user
  function. No regexp() user function is defined by default and so use
  of the REGEXP operator will normally result in an error message. If a
  user-defined function named regexp is defined at run-time, that
  function will be called in order to implement the REGEXP operator.
  

  This is very interesting.  So I *could* define my own regexp function
  that processes not regular expressions, but say, glob-like strings,
  which are usually much easier for end users to work with (very basic
  wild-carding where '*' matches one or more characters, and '?' matches
  any single character - maybe add '#' to match any single digit and '@'
  to match any single alpha character).

 Doesn't SQL already have lightweight wildcards?

 SELECT somefield FROM sometable WHERE someotherfield LIKE '%foo%'


Yes it does - '%' is like '*', and '_' is like '?'.  But it is rare for
application users to be familiar with these.  In my experience, users are
more likely to have seen '*' and '?'.

The reason I keep citing end/application users, as opposed to developers, is
for the case where the user has filled in some kind of wildcard search
field, to be passed to SQL in a query.  Woe to the app designer who figures
that users want to enter a regular expression in such a field.  '%' and '_'
wildcards a little better, but frankly, I think '*' and '?' looks more like
a wildcards than '%' and '_'.

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Python sqlite and regex.

2006-05-19 Thread Matt Good
Oops, sorry about the confusion regarding the built-in REGEXP.  That's
kind of disappointing.  It would appear that the user-defined regexp
function in the original post should work assuming the SQL and regex
syntax errors are corrected.

However, there *is* a GLOB built-in to SQLite 3 that has a default
registered implementation (though it can be overriden by a user-defined
glob method if necessary).

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python sqlite and regex.

2006-05-19 Thread Dan Sommers
On Fri, 19 May 2006 18:52:38 GMT,
Paul McGuire [EMAIL PROTECTED] wrote:

 Dan Sommers [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]

 Doesn't SQL already have lightweight wildcards?
 
 SELECT somefield FROM sometable WHERE someotherfield LIKE '%foo%'

 Yes it does - '%' is like '*', and '_' is like '?'.  But it is rare
 for application users to be familiar with these.  In my experience,
 users are more likely to have seen '*' and '?'.

 The reason I keep citing end/application users, as opposed to
 developers ...

Users, application users, end users, etc., vary over time.  Since this
is c.l.p., though, we'd better just agree to disagree over which user(s)
may or may not be familiar with which (or any) wildcard conventions.

 ... the case where the user has filled in some kind of wildcard search
 field, to be passed to SQL in a query.  Woe to the app designer who
 figures that users want to enter a regular expression in such a field
 ...

Woe to the app designer who fails to target the app to the users, make
it clear to the users what is expected in any given field, and accept
constructive criticism from unhappy users.  ;-)

Knowing your users is one of the most important factors in software
design.

 ... '%' and '_' wildcards a little better, but frankly, I think '*'
 and '?' looks more like a wildcards than '%' and '_'.

Ah, but then you're not an end user, are you?  ;-)

Regards,
Dan

-- 
Dan Sommers
http://www.tombstonezero.net/dan/
I wish people would die in alphabetical order. -- My wife, the genealogist
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python sqlite and regex.

2006-05-19 Thread Ben Finney
John Salerno [EMAIL PROTECTED] writes:

 Paul McGuire wrote:
  Paul McGuire [EMAIL PROTECTED] wrote in message
  news:[EMAIL PROTECTED]
  
  where '*' matches one or more characters, and '?' matches any
  single
  
  oops, I meant '*' matches zero or more characters.
 
 '?' also matches 0 characters

Not in globs. In a glob, '?' matches any one character, '*' matches
any zero or more characters.

In a regex, '.' matches any one character, '?' matches the preceding
atom zero or one times, '*' matches the preceding atom zero or more
times, and '+' matches the preceding atom one or more times.

They're quite different syntaxes, but confusingly similar in
appearance.

On most GNU+Linux systems, these two commands get the relevant manual
pages:

$ man 7 glob
$ man 7 regex

-- 
 \  When I get real bored, I like to drive downtown and get a |
  `\ great parking spot, then sit in my car and count how many |
_o__) people ask me if I'm leaving.  -- Steven Wright |
Ben Finney

-- 
http://mail.python.org/mailman/listinfo/python-list