On Jun 7, 11:37 pm, ru...@yahoo.com ru...@yahoo.com wrote:
On 06/06/2011 08:33 AM, rusi wrote:
For any significant language feature (take recursion for example)
there are these issues:
1. Ease of reading/skimming (other's) code
2. Ease of writing/designing one's own
3. Learning curve
ru...@yahoo.com ru...@yahoo.com wrote:
On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
Yes, but you have to pay the cost of loading the re engine, even if
it is a one off cost, it's still a cost,
~$ time python -c 'pass'
real 0m0.015s
user 0m0.011s
sys 0m0.003s
~$ time
On 06/08/2011 03:01 AM, Duncan Booth wrote:
ru...@yahoo.com ru...@yahoo.com wrote:
On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
Yes, but you have to pay the cost of loading the re engine, even if
it is a one off cost, it's still a cost,
[...]
At least part of the reason that there's no
On 06/07/2011 06:30 PM, Roy Smith wrote:
On 06/06/2011 08:33 AM, rusi wrote:
Evidently for syntactic, implementation and cultural reasons, Perl
programmers are likely to get (and then overuse) regexes faster than
python programmers.
ru...@yahoo.com ru...@yahoo.com wrote:
I don't see how the
On Jun 8, 7:38 pm, ru...@yahoo.com ru...@yahoo.com wrote:
On 06/07/2011 06:30 PM, Roy Smith wrote:
On 06/06/2011 08:33 AM, rusi wrote:
Evidently for syntactic, implementation and cultural reasons, Perl
programmers are likely to get (and then overuse) regexes faster than
python
On 03/06/2011 03:58, Chris Torek wrote:
This is a bit surprising, since both s1 in s2 and re.search()
could use a Boyer-Moore-based algorithm for a sufficiently-long
fixed string, and the time required should be proportional to that
needed to
On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
On Sun, 05 Jun 2011 23:03:39 -0700, ru...@yahoo.com wrote:
[...]
I would argue that the first, non-regex solution is superior, as it
clearly distinguishes the multiple steps of the solution:
* filter lines that start with CUSTOMER
* extract
On 06/06/2011 08:33 AM, rusi wrote:
For any significant language feature (take recursion for example)
there are these issues:
1. Ease of reading/skimming (other's) code
2. Ease of writing/designing one's own
3. Learning curve
4. Costs/payoffs (eg efficiency, succinctness) of use
5.
On 06/06/2011 08:33 AM, rusi wrote:
Evidently for syntactic, implementation and cultural reasons, Perl
programmers are likely to get (and then overuse) regexes faster than
python programmers.
ru...@yahoo.com ru...@yahoo.com wrote:
I don't see how the different Perl and Python cultures
On 06/03/2011 08:05 PM, Steven D'Aprano wrote:
On Fri, 03 Jun 2011 12:29:52 -0700, ru...@yahoo.com wrote:
I often find myself changing, for example, a startswith() to a RE when
I realize that the input can contain mixed case
Why wouldn't you just normalise the case?
Because some of the text
In article ef48ad50-da06-47a8-978a-47d6f4271...@d28g2000yqf.googlegroups.com
ru...@yahoo.com ru...@yahoo.com wrote (in part):
[mass snippage]
What I mean is that I see regexes as being an extremely small,
highly restricted, domain specific language targeted specifically
at describing text
: comp.lang.python
To: python-list@python.org
Sent: Monday, June 06, 2011 10:11 AM
Subject: Re: how to avoid leading white spaces
In article
ef48ad50-da06-47a8-978a-47d6f4271...@d28g2000yqf.googlegroups.com
ru...@yahoo.com ru...@yahoo.com wrote (in part):
[mass snippage]
What I mean is that I see regexes
On Mon, Jun 6, 2011 at 6:51 PM, Octavian Rasnita orasn...@gmail.com wrote:
It is not so hard to decide whether using RE is a good thing or not.
When the speed is important and every millisecond counts, RE should be used
only when there is no other faster way, because RE is usually slower
For any significant language feature (take recursion for example)
there are these issues:
1. Ease of reading/skimming (other's) code
2. Ease of writing/designing one's own
3. Learning curve
4. Costs/payoffs (eg efficiency, succinctness) of use
5. Debug-ability
I'll start with 3.
When someone of
On Sun, 05 Jun 2011 23:03:39 -0700, ru...@yahoo.com wrote:
Thus what starts as
if line.startswith('CUSTOMER '):
    try:
        kw, first_initial, last_name, code, rest = line.split(None, 4)
        ...
often turns into (sometimes before it is written) something like
m = re.match
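The `m = re.match` line above is cut off by the archive; it presumably continues into something like the following sketch. The pattern and the sample line are my own illustrative assumptions, not the original poster's code:

```python
import re

# Hypothetical regex equivalent of the startswith()/split() version above:
# one anchored pattern captures all the fields in a single step.
line = "CUSTOMER J Smith 12345 some more data"
m = re.match(r"CUSTOMER +(\w) +(\w+) +(\w+) +(.*)", line)
if m:
    first_initial, last_name, code, rest = m.groups()
```

The trade-off the thread is circling: the regex packs the keyword test and the field split into one expression, at the cost of being harder to skim than the two explicit string-method steps.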
On Mon, Jun 6, 2011 at 9:29 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
[...]
I would expect
any regex processor to compile the regex into an FSM.
Flying Spaghetti Monster?
I have been Touched by His Noodly Appendage!!!
Finite State Machine.
--
On 2011-06-06, ru...@yahoo.com ru...@yahoo.com wrote:
On 06/03/2011 02:49 PM, Neil Cerutti wrote:
Can you find an example or invent one? I simply don't remember
such problems coming up, but I admit it's possible.
Sure, the response to the OP of this thread.
Here's a recap, along with two
On Mon, Jun 6, 2011 at 10:08 AM, Neil Cerutti ne...@norwich.edu wrote:
import re
print("re solution")
with open("data.txt") as f:
    for line in f:
        fixed = re.sub(r"(TABLE='\S+)\s+'", r"\1'", line)
        print(fixed, end='')
print("non-re solution")
with open("data.txt") as f:
    for line
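The snippet is truncated before the non-re version appears. For context, here is a minimal string-method sketch of the same cleanup; the helper name `fix_line` and the exact end-quote handling are my assumptions about what the thread's non-re solution looks like:

```python
def fix_line(line):
    # Strip trailing whitespace inside TABLE='...' quotes, e.g.
    # TABLE='ACCDJ   '  ->  TABLE='ACCDJ'
    marker = "TABLE='"
    start = line.find(marker)
    if start == -1:
        return line          # no TABLE= field on this line
    start += len(marker)
    end = line.find("'", start)
    if end == -1:
        return line          # unterminated quote; leave untouched
    return line[:start] + line[start:end].rstrip() + line[end:]
```

Compared with the one-line re.sub, this version is longer but each step (find the marker, find the closing quote, strip inside) is individually obvious.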
On 2011-06-06, Ian Kelly ian.g.ke...@gmail.com wrote:
On Mon, Jun 6, 2011 at 10:08 AM, Neil Cerutti ne...@norwich.edu wrote:
import re
print("re solution")
with open("data.txt") as f:
    for line in f:
        fixed = re.sub(r"(TABLE='\S+)\s+'", r"\1'", line)
        print(fixed, end='')
Ian Kelly wrote:
On Mon, Jun 6, 2011 at 10:08 AM, Neil Cerutti ne...@norwich.edu wrote:
import re
print("re solution")
with open("data.txt") as f:
    for line in f:
        fixed = re.sub(r"(TABLE='\S+)\s+'", r"\1'", line)
        print(fixed, end='')
print("non-re solution")
with open("data.txt") as f:
On Mon, Jun 6, 2011 at 11:17 AM, Neil Cerutti ne...@norwich.edu wrote:
I wrestled with using addition like that, and decided against it.
The 7 is a magic number and repeats/hides information. I wanted
something like:
prefix = "TABLE='"
start = line.index(prefix) + len(prefix)
But decided
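Fleshed out, the prefix/len approach sketched above might look like the following (the sample line is taken from the thread's data; the try/except placement is my assumption about how the author would handle lines without the field):

```python
line = "// UNLDSYST=UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ '"
prefix = "TABLE='"
try:
    start = line.index(prefix) + len(prefix)
except ValueError:
    fixed = line  # no TABLE= field on this line
else:
    end = line.index("'", start)
    fixed = line[:start] + line[start:end].rstrip() + line[end:]
```

Using `len(prefix)` instead of a literal `+ 7` is exactly the point being argued: the offset stays correct if the prefix ever changes.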
On Mon, Jun 6, 2011 at 11:48 AM, Ethan Furman et...@stoneleaf.us wrote:
I like the readability of this version, but isn't generating an exception on
every other line going to kill performance?
I timed it on the example data before I posted and found that it was
still 10 times as fast as the
On 2011-06-06, Ian Kelly ian.g.ke...@gmail.com wrote:
Fair enough, although if you ask me the + 1 is just as magical
as the + 7 (it's still the length of the string that you're
searching for). Also, re-finding the opening ' still repeats
information.
Heh, true. It doesn't really repeat
On 03/06/2011 03:58, Chris Torek wrote:
This is a bit surprising, since both s1 in s2 and re.search()
could use a Boyer-Moore-based algorithm for a sufficiently-long
fixed string, and the time required should be proportional to that
needed to
On Jun 3, 7:25 pm, Steven D'Aprano steve
+comp.lang.pyt...@pearwood.info wrote:
Regarding their syntax, I'd like to point out that even Larry Wall is
dissatisfied with regex culture in the Perl community:
http://www.perl.com/pub/2002/06/04/apo5.html
This is a very good link.
And it can be a
On 06/03/2011 02:49 PM, Neil Cerutti wrote:
On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote:
or that I have to treat commas as well as spaces as
delimiters.
source.replace(',', ' ').split(' ')
Uhgg. create a whole new string just so you can split it on one
rather than two
On 06/03/2011 03:45 PM, Chris Torek wrote:
On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote:
[prefers]
re.split ('[ ,]', source)
This is probably not what you want in dealing with
human-created text:
re.split('[ ,]', 'foo bar, spam,maps')
['foo', '', 'bar', '', 'spam',
On Sat, Jun 4, 2011 at 12:30 PM, Roy Smith r...@panix.com wrote:
Another nice thing about regexes (as compared to string methods) is that
they're both portable and serializable. You can use the same regex in
Perl, Python, Ruby, PHP, etc. You can transmit them over a network
connection to a
I wrote:
Another nice thing about regexes (as compared to string methods) is
that they're both portable and serializable. You can use the same
regex in Perl, Python, Ruby, PHP, etc.
In article 4de9bf50$0$29996$c3e8da3$54964...@news.astraweb.com,
Steven D'Aprano
The efficiently argument is specious. [This is a python list not a C
or assembly list]
The real issue is that complex regexes are hard to get right -- even
if one is experienced.
This is analogous to the fact that knotty programs can be hard to get
right even for experienced programmers.
The
On Sat, 04 Jun 2011 13:41:33 +1200, Gregory Ewing wrote:
Python might be penalized by its use of Unicode here, since a
Boyer-Moore table for a full 16-bit Unicode string would need
65536 entries
But is there any need for the Boyer-Moore algorithm to
operate on characters?
Seems to me
On Sat, 04 Jun 2011 05:14:56 +, Steven D'Aprano wrote:
This fails to support non-ASCII letters, and you know quite well that
having to spell out by hand regexes in both upper and lower (or mixed)
case is not support for case-insensitive matching. That's why Python's re
has a case
On Sat, 04 Jun 2011 09:39:24 -0400, Roy Smith wrote:
To be sure, if you explore the edges of the regex syntax space, you can
write non-portable expressions. You don't even have to get very far out
to the edge. But, as you say, if you limit yourself to a subset, you
can write portable ones.
On Sat, 04 Jun 2011 21:02:32 +0100, Nobody wrote:
On Sat, 04 Jun 2011 05:14:56 +, Steven D'Aprano wrote:
This fails to support non-ASCII letters, and you know quite well that
having to spell out by hand regexes in both upper and lower (or mixed)
case is not support for case-insensitive
* Roy Smith (Thu, 02 Jun 2011 21:57:16 -0400)
In article 94ph22frh...@mid.individual.net,
Neil Cerutti ne...@norwich.edu wrote:
On 2011-06-01, ru...@yahoo.com ru...@yahoo.com wrote:
For some odd reason (perhaps because they are used a lot in
Perl), this group seems to have a great
On 06/02/2011 07:21 AM, Neil Cerutti wrote:
On 2011-06-01, ru...@yahoo.com ru...@yahoo.com wrote:
For some odd reason (perhaps because they are used a lot in
Perl), this group seems to have a great aversion to regular
expressions. Too bad because this is a typical problem where
their
On Fri, 03 Jun 2011 04:30:46 +, Chris Torek wrote:
I'm not sure what you mean by full 16-bit Unicode string? Isn't
unicode inherently 32 bit?
Well, not exactly. As I understand it, Python is normally built
with a 16-bit unicode character type though
It's normally 32-bit on platforms
On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote:
The other tradeoff, applying both to Perl and Python is with
maintenance. As mentioned above, even when today's
requirements can be solved with some code involving several
string functions, indexes, and conditionals, when those
On Fri, 03 Jun 2011 02:58:24 +, Chris Torek wrote:
Python might be penalized by its use of Unicode here, since a
Boyer-Moore table for a full 16-bit Unicode string would need
65536 entries (one per possible ord() value). However, if the
string being sought is all single-byte values, a
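To make the table-size point concrete, here is a minimal Boyer-Moore-Horspool sketch (my own illustration, not CPython's actual implementation). Using a dict for the skip table sidesteps the dense 65536-entry array the post worries about, because only characters that actually occur in the needle are stored:

```python
def bad_char_table(needle):
    # Horspool skip table: for each character of the needle except the
    # last, how far the window may shift on a mismatch at that character.
    table = {}
    m = len(needle)
    for i, ch in enumerate(needle[:-1]):
        table[ch] = m - 1 - i
    return table

def search(haystack, needle):
    # Returns the index of the first occurrence of needle, or -1.
    table = bad_char_table(needle)
    m, n = len(needle), len(haystack)
    i = 0
    while i <= n - m:
        if haystack[i:i + m] == needle:
            return i
        # Shift by the skip value of the haystack character aligned
        # with the needle's last position; full length if unseen.
        i += table.get(haystack[i + m - 1], m)
    return -1
```

For a needle of all single-byte (or any) values, the table holds at most `len(needle) - 1` entries regardless of the alphabet size.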
On Fri, 03 Jun 2011 05:51:18 -0700, ru...@yahoo.com wrote:
On 06/02/2011 07:21 AM, Neil Cerutti wrote:
Python's str methods, when they're sufficient, are usually more
efficient.
Unfortunately, except for the very simplest cases, they are often not
sufficient.
Maybe so, but the very
On 03 Jun 2011 14:25:53 GMT
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
source.replace(',', ' ').split(' ')
I would do:
source.replace(',', ' ').split()
[steve@sylar ~]$ python -m timeit -s "source = 'a b c,d,e,f,g h i j k'"
What if the string is 'a b c, d, e,f,g h i j k'?
On 06/03/2011 07:17 AM, Neil Cerutti wrote:
On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote:
The other tradeoff, applying both to Perl and Python is with
maintenance. As mentioned above, even when today's
requirements can be solved with some code involving several
string functions,
On 06/03/2011 08:25 AM, Steven D'Aprano wrote:
On Fri, 03 Jun 2011 05:51:18 -0700, ru...@yahoo.com wrote:
On 06/02/2011 07:21 AM, Neil Cerutti wrote:
Python's str methods, when they're sufficient, are usually more
efficient.
Unfortunately, except for the very simplest cases, they are
On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote:
or that I have to treat commas as well as spaces as
delimiters.
source.replace(',', ' ').split(' ')
Uhgg. create a whole new string just so you can split it on one
rather than two characters? Sorry, but I find
re.split ('[ ,]', source)
On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote:
[prefers]
re.split ('[ ,]', source)
This is probably not what you want in dealing with
human-created text:
re.split('[ ,]', 'foo bar, spam,maps')
['foo', '', 'bar', '', 'spam', 'maps']
Instead, you probably want a comma
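The empty strings in that output come from consecutive delimiter characters each matching separately. Two common fixes, which is my guess at where the truncated reply is headed, are letting the pattern consume runs of delimiters, or filtering afterwards:

```python
import re

s = "foo  bar, spam,maps"  # note the double space after "foo"

tokens_raw = re.split(r"[ ,]", s)        # empties at consecutive delimiters
tokens_runs = re.split(r"[ ,]+", s)      # '+' makes runs collapse
tokens_filtered = [t for t in tokens_raw if t]  # or drop the empties after
```

The `[ ,]+` form is usually what human-typed text needs, at the cost of silently merging adjacent delimiters, which may or may not be what the data means.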
Chris Torek wrote:
On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote:
[prefers]
re.split ('[ ,]', source)
This is probably not what you want in dealing with
human-created text:
re.split('[ ,]', 'foo bar, spam,maps')
['foo', '', 'bar', '', 'spam', 'maps']
I think you've got
On 03/06/2011 23:11, Ethan Furman wrote:
Chris Torek wrote:
On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote:
[prefers]
re.split ('[ ,]', source)
This is probably not what you want in dealing with
human-created text:
re.split('[ ,]', 'foo bar, spam,maps')
['foo', '', 'bar', '',
Chris Torek wrote:
Python might be penalized by its use of Unicode here, since a
Boyer-Moore table for a full 16-bit Unicode string would need
65536 entries
But is there any need for the Boyer-Moore algorithm to
operate on characters?
Seems to me you could just as well chop the UTF-16 up
into
On Fri, 03 Jun 2011 12:29:52 -0700, ru...@yahoo.com wrote:
I often find myself changing, for example, a startswith() to a RE when
I realize that the input can contain mixed case
Why wouldn't you just normalise the case?
Because some of the text may be case-sensitive.
Perhaps you
On 04/06/2011 03:05, Steven D'Aprano wrote:
On Fri, 03 Jun 2011 12:29:52 -0700, ru...@yahoo.com wrote:
I often find myself changing, for example, a startswith() to a RE when
I realize that the input can contain mixed case
Why wouldn't you just normalise the case?
Because some of the text
In article 4de992d7$0$29996$c3e8da3$54964...@news.astraweb.com,
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
Of course, if you include both case-sensitive and insensitive tests in
the same calculation, that's a good candidate for a regex... or at least
it would be if regexes
On Sat, 04 Jun 2011 03:24:50 +0100, MRAB wrote:
[snip]
Some regex implementations support scoped case sensitivity. :-)
Yes, you should link to your regex library :)
Have you considered the suggested Perl 6 syntax? Much of it looks good to
me.
I have at times thought that it would be
On Fri, 03 Jun 2011 22:30:59 -0400, Roy Smith wrote:
In article 4de992d7$0$29996$c3e8da3$54964...@news.astraweb.com,
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
Of course, if you include both case-sensitive and insensitive tests in
the same calculation, that's a good
On 2011-06-01, ru...@yahoo.com ru...@yahoo.com wrote:
For some odd reason (perhaps because they are used a lot in
Perl), this group seems to have a great aversion to regular
expressions. Too bad because this is a typical problem where
their use is the best solution.
Python's str methods,
In article 94ph22frh...@mid.individual.net,
Neil Cerutti ne...@norwich.edu wrote:
On 2011-06-01, ru...@yahoo.com ru...@yahoo.com wrote:
For some odd reason (perhaps because they are used a lot in
Perl), this group seems to have a great aversion to regular
expressions. Too bad because
On 03/06/2011 02:57, Roy Smith wrote:
In article94ph22frh...@mid.individual.net,
Neil Ceruttine...@norwich.edu wrote:
On 2011-06-01, ru...@yahoo.comru...@yahoo.com wrote:
For some odd reason (perhaps because they are used a lot in
Perl), this group seems to have a great aversion to
In article 94ph22frh...@mid.individual.net
Neil Cerutti ne...@norwich.edu wrote:
Python's str methods, when they're sufficient, are usually more
efficient.
In article roy-e2fa6f.21571602062...@news.panix.com
Roy Smith r...@panix.com replied:
I was all set to say, "prove it!" when I decided to
In article is9ikg0...@news1.newsguy.com,
Chris Torek nos...@torek.net wrote:
Python might be penalized by its use of Unicode here, since a
Boyer-Moore table for a full 16-bit Unicode string would need
65536 entries (one per possible ord() value).
I'm not sure what you mean by full 16-bit
On Fri, Jun 3, 2011 at 1:44 PM, Roy Smith r...@panix.com wrote:
In article is9ikg0...@news1.newsguy.com,
Chris Torek nos...@torek.net wrote:
Python might be penalized by its use of Unicode here, since a
Boyer-Moore table for a full 16-bit Unicode string would need
65536 entries (one per
On Fri, Jun 3, 2011 at 1:52 PM, Chris Angelico ros...@gmail.com wrote:
However, Unicode planes 0-2 have all
the defined printable characters
PS. I'm fully aware that there are ranges defined in plane 14 / E.
They're non-printing characters, and unlikely to be part of a text
string, although it
In article is9ikg0...@news1.newsguy.com,
Chris Torek nos...@torek.net wrote:
Python might be penalized by its use of Unicode here, since a
Boyer-Moore table for a full 16-bit Unicode string would need
65536 entries (one per possible ord() value).
In article
Hi
i have a file which contains data
//ACCDJ EXEC DB2UNLDC,DFLID=DFLID,PARMLIB=PARMLIB,
// UNLDSYST=UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ '
//ACCT EXEC DB2UNLDC,DFLID=DFLID,PARMLIB=PARMLIB,
// UNLDSYST=UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCT'
//ACCUM
On Wed, Jun 1, 2011 at 12:31 AM, rakesh kumar
rakeshkumar.tec...@gmail.com wrote:
Hi
i have a file which contains data
//ACCDJ EXEC DB2UNLDC,DFLID=DFLID,PARMLIB=PARMLIB,
// UNLDSYST=UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ '
//ACCT EXEC
On Jun 1, 11:11 am, Chris Rebert c...@rebertia.com wrote:
On Wed, Jun 1, 2011 at 12:31 AM, rakesh kumar
Hi
i have a file which contains data
//ACCDJ EXEC DB2UNLDC,DFLID=DFLID,PARMLIB=PARMLIB,
// UNLDSYST=UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ '
//ACCT
On 06/01/2011 09:39 PM, ru...@yahoo.com wrote:
On Jun 1, 11:11 am, Chris Rebertc...@rebertia.com wrote:
On Wed, Jun 1, 2011 at 12:31 AM, rakesh kumar
Hi
i have a file which contains data
//ACCDJ EXEC DB2UNLDC,DFLID=DFLID,PARMLIB=PARMLIB,
//
65 matches