Regexp problem when parsing a string

2010-03-21 Thread Alessandro Marino
I'm a beginner and I was trying to write a program to recursively parse all
file names in a directory specified as a parameter. The problem is that I get
a None printed to stdout when a file is positively matched. When the
file name doesn't match the regexp, the output seems OK.

C:\>c:\python.exe g:\a.py sample
 foo - bar.txt , first part is: foo
None
skipping: foo.txt

Instead I expect an output like this one:

C:\>c:\python.exe g:\a.py sample
 foo - bar.txt , first part is: foo
None
skipping: foo.txt

Could anyone help me to figure out why None appears in the output?

Thanks and regards,
Ale


a.py
Description: Binary data
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regexp problem when parsing a string

2010-03-21 Thread Steven D'Aprano
On Sun, 21 Mar 2010 19:12:18 +0100, Alessandro Marino wrote:

> Could anyone help me to figure out why None appears in the output?

I get:

Attachment not shown: MIME type application/octet-stream; filename a.py

Posting attachments to Usenet is tricky. Many newsgroups filter out 
anything they think isn't text, or even any attachment at all. Some news 
clients do the same thing.

If you have too much code to include directly in your post, then you 
should put it up on a website somewhere and just include the link.

Without looking at your code, I'd guess that using regular expressions is 
the wrong approach. Perhaps you should look at the glob module, and 
possibly os.walk.
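
A minimal sketch of the os.walk idea, assuming the goal is to visit every file under a directory and pick out names with a shell-style pattern. The name find_files and the "*.txt" default are illustrative only, since a.py was never shown; the sketch targets Python 3.

```python
import fnmatch
import os


def find_files(root, pattern="*.txt"):
    """Yield paths under root whose file name matches a shell-style pattern."""
    # os.walk visits root and every subdirectory below it.
    for dirpath, dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names that match the pattern.
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)
```

Something like `for path in find_files("sample"): print(path)` would then list the matching files without any regular expression.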



-- 
Steven


Re: Regexp problem when parsing a string

2010-03-21 Thread MRAB

Alessandro Marino wrote:
> I'm a beginner and I was trying to write a program to
> parse recursively all file names in a directory specified as parameter.
> The problem is that I get a None printed to stdout when a file is
> positively matched. While when the file name doesn't match the regexp
> the output seems ok.
>
> C:\>c:\python.exe g:\a.py sample
>  foo - bar.txt , first part is: foo
> None
> skipping: foo.txt
>
> Instead I expect an output like this one:
>
> C:\>c:\python.exe g:\a.py sample
>  foo - bar.txt , first part is: foo
> None
> skipping: foo.txt
>
> Could anyone help me to figure out why None appears in the output?
>
> Thanks and regards,
> Ale


It's caused by:

print saveData(file, m)

The function saveData() returns None, which is then printed.
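
To illustrate the point, here is a small sketch. The body of save_data is hypothetical (the real a.py was not visible in the post); it only assumes the function prints its report line and has no return statement. It is written so that it runs on Python 3.

```python
def save_data(filename, first_part):
    """Print a report line; with no return statement the result is None."""
    print("%s , first part is: %s" % (filename, first_part))


# Printing the call's result shows the report line and then "None",
# because the function's return value (None) is printed as well.
result = save_data("foo - bar.txt", "foo")
print(result)

# Calling the function on its own prints only the report line.
save_data("foo - bar.txt", "foo")
```

So the likely fix is to drop the outer print and just call the function.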


Re: Regexp problem

2009-07-31 Thread Ethan Furman

MRAB wrote:
> Ethan Furman wrote:
>> Marcus Wanner wrote:
>>> Wow, I really need to learn more about regexp...
>>> Any tutorials you guys can recommend?
>>>
>>> Marcus
>>
>> Mastering Regular Expressions
>> Powerful Techniques for Perl and Other Tools
>> By Jeffrey E. F. Friedl
>>
>> Great book!
>
> +1
>
> I have the first edition, seventh printing (December 1998). It refers to
> the 'regex' module of Python 1.4b1, which was subsequently replaced by
> the current 're' module and then removed from the standard library. I
> hope it's been updated since then. :-)


I have the second edition (no idea which printing ;), and according to 
his preface it has indeed been much updated.  Most examples are in perl, 
the few in python are decent.  The knowledge embodied seems very 
thorough.  Since I've had the book (two weeks now?)  I've been able to 
solve two otherwise thorny issues using regular expressions.  Yay!


~Ethan~


Regexp problem

2009-07-30 Thread Beldar
Hi there!

I have a problem and I'm not very good at regular expressions.
I have a text like "lalala lalala tiruri beldar-is-listening tiruri
lalala". I need a regexp to get the 'beldar' part. The format is
'something-is-listening'; I need to get the "something" part, use it in
my code, and then replace the whole 'something-is-listening' with
another string.

Can someone help me, please? Thank you!


Re: Regexp problem

2009-07-30 Thread Tim Chase

> I have a problem and i'm not very good at regular expressions.
> I have a text like lalala lalala tiruri beldar-is-listening tiruri
> lalala I need a regexp to get the 'beldar' part, the format is
> 'something-is-listening', i need to get the something part, use it in
> my code, and then replace the whole 'something-is-listening' for
> another string.



Pretty easy:

   >>> import re
   >>> s = "lalala lalala tiruri beldar-is-listening tiruri lalala"
   >>> r = re.compile(r'(\w+)-is-listening')
   >>> r.search(s).group(1)
   'beldar'
   >>> r.sub('this is a replacement', s)
   'lalala lalala tiruri this is a replacement tiruri lalala'

-tkc




Re: Regexp problem

2009-07-30 Thread MRAB

Beldar wrote:

> Hi there!
>
> I have a problem and i'm not very good at regular expressions.
> I have a text like lalala lalala tiruri beldar-is-listening tiruri
> lalala I need a regexp to get the 'beldar' part, the format is
> 'something-is-listening', i need to get the something part, use it in
> my code, and then replace the whole 'something-is-listening' for
> another string.


\w+ will match a word and enclosing it in (...) will capture what was
matched:

import re

m = re.search(r"(\w+)-is-listening", text)
print("Captured '%s'" % m.group(1))
print("Matched from %d to %d" % (m.start(), m.end()))
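
Since the original question also asked to use the captured part and then replace the whole match, here is a hedged sketch that does both in one pass by giving re.sub a function. The replacement text and the helper name replacement are made up for illustration; the sample string is the one from the question.

```python
import re

text = "lalala lalala tiruri beldar-is-listening tiruri lalala"
pattern = re.compile(r"(\w+)-is-listening")


def replacement(match):
    # match.group(1) is the captured word in front of "-is-listening".
    name = match.group(1)
    # Whatever is returned here replaces the whole matched text.
    return "[%s was here]" % name.upper()


result = pattern.sub(replacement, text)
print(result)
```

Passing a function instead of a string is handy when the new text depends on what was captured.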


Re: Regexp problem

2009-07-30 Thread Beldar
On 30 Jul, 15:07, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Beldar wrote:
>> Hi there!
>>
>> I have a problem and i'm not very good at regular expressions.
>> I have a text like lalala lalala tiruri beldar-is-listening tiruri
>> lalala I need a regexp to get the 'beldar' part, the format is
>> 'something-is-listening', i need to get the something part, use it in
>> my code, and then replace the whole 'something-is-listening' for
>> another string.
>
> \w+ will match a word and enclosing it in (...) will capture what was
> matched:
>
>      m = re.search(r"(\w+)-is-listening", text)
>      print("Captured '%s'" % m.group(1))
>      print("Matched from %d to %d" % (m.start(), m.end()))

Ok, thank you all, it was very helpful!


Re: Regexp problem

2009-07-30 Thread Marcus Wanner

On 7/30/2009 9:32 AM, Beldar wrote:
> On 30 Jul, 15:07, MRAB <pyt...@mrabarnett.plus.com> wrote:
>> Beldar wrote:
>>> Hi there!
>>> I have a problem and i'm not very good at regular expressions.
>>> I have a text like lalala lalala tiruri beldar-is-listening tiruri
>>> lalala I need a regexp to get the 'beldar' part, the format is
>>> 'something-is-listening', i need to get the something part, use it in
>>> my code, and then replace the whole 'something-is-listening' for
>>> another string.
>>
>> \w+ will match a word and enclosing it in (...) will capture what was
>> matched:
>>
>>  m = re.search(r"(\w+)-is-listening", text)
>>  print("Captured '%s'" % m.group(1))
>>  print("Matched from %d to %d" % (m.start(), m.end()))
>
> Ok, thank you all, it was very helpful!

Wow, I really need to learn more about regexp...
Any tutorials you guys can recommend?

Marcus


Re: Regexp problem

2009-07-30 Thread Peter Brett
Marcus Wanner <marc...@cox.net> writes:

> On 7/30/2009 9:32 AM, Beldar wrote:
>> On 30 Jul, 15:07, MRAB <pyt...@mrabarnett.plus.com> wrote:
>>> Beldar wrote:
>>>> Hi there!
>>>> I have a problem and i'm not very good at regular expressions.
>>>> I have a text like lalala lalala tiruri beldar-is-listening tiruri
>>>> lalala I need a regexp to get the 'beldar' part, the format is
>>>> 'something-is-listening', i need to get the something part, use it in
>>>> my code, and then replace the whole 'something-is-listening' for
>>>> another string.
>>> \w+ will match a word and enclosing it in (...) will capture what was
>>> matched:
>>>
>>>   m = re.search(r"(\w+)-is-listening", text)
>>>   print("Captured '%s'" % m.group(1))
>>>   print("Matched from %d to %d" % (m.start(), m.end()))
>>
>> Ok, thank you all, it was very helpful!
> Wow, I really need to learn more about regexp...
> Any tutorials you guys can recommend?

I have to confess that after fiddling with regexps for quite a while
with no great success, I learnt the hard (and best) way, i.e. using
them to write something vile and horrible. [*] I commend this path to
you also. ;-)

Cheers,

  Peter

[*] http://git.gpleda.org/?p=gaf.git;a=blob;f=libgeda/desktop-i18n;h=6fab9b85b

-- 
Peter Brett pe...@peter-b.co.uk
Remote Sensing Research Group
Surrey Space Centre


Re: Regexp problem

2009-07-30 Thread Ethan Furman

Marcus Wanner wrote:
> On 7/30/2009 9:32 AM, Beldar wrote:
>> On 30 Jul, 15:07, MRAB <pyt...@mrabarnett.plus.com> wrote:
>>> Beldar wrote:
>>>> Hi there!
>>>> I have a problem and i'm not very good at regular expressions.
>>>> I have a text like lalala lalala tiruri beldar-is-listening tiruri
>>>> lalala I need a regexp to get the 'beldar' part, the format is
>>>> 'something-is-listening', i need to get the something part, use it in
>>>> my code, and then replace the whole 'something-is-listening' for
>>>> another string.
>>>
>>> \w+ will match a word and enclosing it in (...) will capture what was
>>> matched:
>>>
>>>  m = re.search(r"(\w+)-is-listening", text)
>>>  print("Captured '%s'" % m.group(1))
>>>  print("Matched from %d to %d" % (m.start(), m.end()))
>>
>> Ok, thank you all, it was very helpful!
>
> Wow, I really need to learn more about regexp...
> Any tutorials you guys can recommend?
>
> Marcus


Mastering Regular Expressions
Powerful Techniques for Perl and Other Tools
By Jeffrey E. F. Friedl

Great book!

~Ethan~


Re: Regexp problem

2009-07-30 Thread MRAB

Ethan Furman wrote:
> Marcus Wanner wrote:
>> On 7/30/2009 9:32 AM, Beldar wrote:
>>> On 30 Jul, 15:07, MRAB <pyt...@mrabarnett.plus.com> wrote:
>>>> Beldar wrote:
>>>>> Hi there!
>>>>> I have a problem and i'm not very good at regular expressions.
>>>>> I have a text like lalala lalala tiruri beldar-is-listening tiruri
>>>>> lalala I need a regexp to get the 'beldar' part, the format is
>>>>> 'something-is-listening', i need to get the something part, use it in
>>>>> my code, and then replace the whole 'something-is-listening' for
>>>>> another string.
>>>>
>>>> \w+ will match a word and enclosing it in (...) will capture what was
>>>> matched:
>>>>
>>>>  m = re.search(r"(\w+)-is-listening", text)
>>>>  print("Captured '%s'" % m.group(1))
>>>>  print("Matched from %d to %d" % (m.start(), m.end()))
>>>
>>> Ok, thank you all, it was very helpful!
>>
>> Wow, I really need to learn more about regexp...
>> Any tutorials you guys can recommend?
>>
>> Marcus
>
> Mastering Regular Expressions
> Powerful Techniques for Perl and Other Tools
> By Jeffrey E. F. Friedl
>
> Great book!


+1

I have the first edition, seventh printing (December 1998). It refers to
the 'regex' module of Python 1.4b1, which was subsequently replaced by
the current 're' module and then removed from the standard library. I
hope it's been updated since then. :-)


Re: regexp problem in Python

2007-08-07 Thread Ant
On Aug 3, 10:41 pm, Ehsan [EMAIL PROTECTED] wrote:
...
> what can I do? what's wrong with this pattern? thanx for your comments

Nothing. There's something wrong with the code you are using the regex
with. Post it and we may be able to help. Like Lawrence has said, it's
likely to be that you are using m.group(1) with your match object
instead of m.group(0) - the former gets the first group (i.e.
everything between the first set of parens - in your case the wmv|3gp
expression), whereas the latter will return the entire match.
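
A short sketch of that difference, using the pattern from the question on a made-up line. The example.invalid URL is a stand-in, not the real page content, and the code is written for Python 3.

```python
import re

text = 'window.location = "http://example.invalid/clip.wmv?tsid=1";'
m = re.search(r"http.*?\.(wmv|3gp).*", text)

whole = m.group(0)  # the entire text matched by the pattern
ext = m.group(1)    # only what the first parenthesised group matched

print(whole)
print(ext)
```

Here ext is just the extension, which looks like the symptom described, while whole carries the full match (including the trailing quote and semicolon, since the final .* is greedy).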

Post your actual code, not just the regex.

--
Ant...

http://antroy.blogspot.com/





Re: regexp problem in Python

2007-08-06 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], Ehsan wrote:

> I use this pattern :
> http.*?\.(wmv|3gp).*
>
> but it returns only 'wmv' and '3gp' instead of http://www.2shared.com/
> download/1716611/e2000f22/Jadeed_Mlak14.wmv?
> tsid=20070803-164051-9d637d11

What's the actual Python code that uses this regexp?

My guess is, you're not using the group method correctly in the returned
match object <http://docs.python.org/lib/match-objects.html>.


Re: regexp problem in Python

2007-08-04 Thread Sönmez Kartal
On 4 August, 00:41, Ehsan [EMAIL PROTECTED] wrote:
> I want to find http://www.2shared.com/download/1716611/e2000f22/
> Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11  or 3gp instead of
> wmv in the text file like this :
> <html>
> some code
> function reportAbuse() {
> var windowname="abuse";
> var url="/abuse.jsp?link=" + "http://www.2shared.com/file/1716611/
> e2000f22/Jadeed_Mlak14.html";
> OpenWindow =
> window.open(url,windowname,'toolbar=no,scrollbars=no,resizable=no,width=500,height=500,left=50,top=50');
> OpenWindow.focus();
>   }
>   function startDownload(){
> window.location = "http://www.2shared.com/download/1716611/
> e2000f22/Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11";
> //document.downloadForm.submit();
>   }
>   </script>
> </head>
> </html>http://www.2shared.com/download/1716611/e2000f22/
> Jadeed_Mlak14.3gp?tsid=20070803-164051-9d637d11sfgsfgsfgv
>
> I use this pattern :
> http.*?\.(wmv|3gp).*
>
> but it returns only 'wmv' and '3gp' instead of http://www.2shared.com/
> download/1716611/e2000f22/Jadeed_Mlak14.wmv?
> tsid=20070803-164051-9d637d11
>
> what can I do? what's wrong with this pattern? thanx for your comments

You could use r'window.location = (.*?\.(wmv|3gp);' as your regex
string, I guess..



Re: regexp problem in Python

2007-08-04 Thread Ehsan
On Aug 4, 1:22 pm, Sönmez Kartal [EMAIL PROTECTED] wrote:
> On 4 August, 00:41, Ehsan [EMAIL PROTECTED] wrote:
>
>> I want to find http://www.2shared.com/download/1716611/e2000f22/
>> Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11  or 3gp instead of
>> wmv in the text file like this :
>> <html>
>> some code
>> function reportAbuse() {
>> var windowname="abuse";
>> var url="/abuse.jsp?link=" + "http://www.2shared.com/file/1716611/
>> e2000f22/Jadeed_Mlak14.html";
>> OpenWindow =
>> window.open(url,windowname,'toolbar=no,scrollbars=no,resizable=no,width=500,height=500,left=50,top=50');
>> OpenWindow.focus();
>>   }
>>   function startDownload(){
>> window.location = "http://www.2shared.com/download/1716611/
>> e2000f22/Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11";
>> //document.downloadForm.submit();
>>   }
>>   </script>
>> </head>
>> </html>http://www.2shared.com/download/1716611/e2000f22/
>> Jadeed_Mlak14.3gp?tsid=20070803-164051-9d637d11sfgsfgsfgv
>>
>> I use this pattern :
>> http.*?\.(wmv|3gp).*
>>
>> but it returns only 'wmv' and '3gp' instead of http://www.2shared.com/
>> download/1716611/e2000f22/Jadeed_Mlak14.wmv?
>> tsid=20070803-164051-9d637d11
>>
>> what can I do? what's wrong with this pattern? thanx for your comments
>
> You could use r'window.location = (.*?\.(wmv|3gp);' as your regex
> string, I guess..

I didn't get what you mean. I think I just need to change the pattern,
but I don't know how to find the best-fitting pattern.


Re: regexp problem in Python

2007-08-04 Thread Fabio Z Tessitore
On Fri, 03 Aug 2007 14:41:52 -0700, Ehsan wrote:

Maybe you can use this to solve your problem:

myurl = ('http://www.2shared.com/download/1716611/e2000f22/'
         'Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11')

if myurl.startswith('http') and ('wmv' in myurl or '3gp' in myurl):
    # myurl is the complete address you want
    print(myurl)

As for re, I'm waiting for someone to enlighten us all.
Bye,
Fabio


Re: regexp problem in Python

2007-08-04 Thread Sönmez Kartal
On 4 August, 17:10, Ehsan [EMAIL PROTECTED] wrote:
> On Aug 4, 1:22 pm, Sönmez Kartal [EMAIL PROTECTED] wrote:
>
>> On 4 August, 00:41, Ehsan [EMAIL PROTECTED] wrote:
>>
>>> I want to find http://www.2shared.com/download/1716611/e2000f22/
>>> Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11  or 3gp instead of
>>> wmv in the text file like this :
>>> <html>
>>> some code
>>> function reportAbuse() {
>>> var windowname="abuse";
>>> var url="/abuse.jsp?link=" + "http://www.2shared.com/file/1716611/
>>> e2000f22/Jadeed_Mlak14.html";
>>> OpenWindow =
>>> window.open(url,windowname,'toolbar=no,scrollbars=no,resizable=no,width=500,height=500,left=50,top=50');
>>> OpenWindow.focus();
>>>   }
>>>   function startDownload(){
>>> window.location = "http://www.2shared.com/download/1716611/
>>> e2000f22/Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11";
>>> //document.downloadForm.submit();
>>>   }
>>>   </script>
>>> </head>
>>> </html>http://www.2shared.com/download/1716611/e2000f22/
>>> Jadeed_Mlak14.3gp?tsid=20070803-164051-9d637d11sfgsfgsfgv
>>>
>>> I use this pattern :
>>> http.*?\.(wmv|3gp).*
>>>
>>> but it returns only 'wmv' and '3gp' instead of http://www.2shared.com/
>>> download/1716611/e2000f22/Jadeed_Mlak14.wmv?
>>> tsid=20070803-164051-9d637d11
>>>
>>> what can I do? what's wrong with this pattern? thanx for your comments
>>
>> You could use r'window.location = (.*?\.(wmv|3gp);' as your regex
>> string, I guess..
>
> I didn't get what do you mean? i think i must just change the pattern
> but I don't know how to find bestfit pattern

If you add 'window.location = "' at the start and '";' at the end of your
pattern, it would be easier to detect.

r'window.location = "(.*?)";'

... I have used this and it gave me ...
>>> import re
>>> data = """ <html>
... some code
... function reportAbuse() {
... var windowname="abuse";
... var url="/abuse.jsp?link=" + "http://www.2shared.com/file/1716611/e2000f22/Jadeed_Mlak14.html";
... OpenWindow =
... window.open(url,windowname,'toolbar=no,scrollbars=no,resizable=no,width=500,height=500,left=50,top=50');
... OpenWindow.focus();
...   }
...   function startDownload(){
... window.location = "http://www.2shared.com/download/1716611/e2000f22/Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11";
... //document.downloadForm.submit();
...   }
...   </script>
... </head>
... </html>"""
>>> re.findall(r'window.location = "(.*?)";', data)
['http://www.2shared.com/download/1716611/e2000f22/Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11']
>>> print('It works! :-)')
It works! :-)


Happy coding


regexp problem in Python

2007-08-03 Thread Ehsan
I want to find http://www.2shared.com/download/1716611/e2000f22/
Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11  or 3gp instead of
wmv in the text file like this :
<html>
some code
function reportAbuse() {
var windowname="abuse";
var url="/abuse.jsp?link=" + "http://www.2shared.com/file/1716611/
e2000f22/Jadeed_Mlak14.html";
OpenWindow =
window.open(url,windowname,'toolbar=no,scrollbars=no,resizable=no,width=500,height=500,left=50,top=50');
OpenWindow.focus();
  }
  function startDownload(){
window.location = "http://www.2shared.com/download/1716611/
e2000f22/Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11";
//document.downloadForm.submit();
  }
  </script>
</head>
</html>http://www.2shared.com/download/1716611/e2000f22/
Jadeed_Mlak14.3gp?tsid=20070803-164051-9d637d11sfgsfgsfgv

I use this pattern :
http.*?\.(wmv|3gp).*

but it returns only 'wmv' and '3gp' instead of http://www.2shared.com/
download/1716611/e2000f22/Jadeed_Mlak14.wmv?
tsid=20070803-164051-9d637d11

what can I do? what's wrong with this pattern? thanx for your comments



Re: regexp problem in Python

2007-08-03 Thread Dave Hansen
On Aug 3, 4:41 pm, Ehsan [EMAIL PROTECTED] wrote:
> I want to find http://www.2shared.com/download/1716611/e2000f22/
[...]
> I use this pattern :
> http.*?\.(wmv|3gp).*
>
> but it returns only 'wmv' and '3gp' instead of http://www.2shared.com/
> download/1716611/e2000f22/Jadeed_Mlak14.wmv?
> tsid=20070803-164051-9d637d11
>
> what can I do? what's wrong with this pattern? thanx for your comments

Just a guess, based on too little information: Try
(http.*?\.(wmv|3gp).*)
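
A hedged sketch of why the grouping matters, assuming re.findall was being used (the thread never confirms this). The sample line and the example.invalid URL are invented, and the [^"\s] character class assumes the URL ends at a double quote or whitespace. Written for Python 3.

```python
import re

text = 'window.location = "http://example.invalid/clip.wmv?tsid=1";'

# With one capturing group, findall returns only that group's text.
exts = re.findall(r"http.*?\.(wmv|3gp).*", text)
print(exts)

# With a non-capturing group (?:...), findall returns each whole match.
# [^"\s]* stops at the first double quote or whitespace character.
urls = re.findall(r'http[^"\s]*?\.(?:wmv|3gp)[^"\s]*', text)
print(urls)
```

Wrapping the whole pattern in an extra pair of parentheses, as suggested above, makes findall return tuples with one item per group, which may be why it did not look right either.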

Regards,

   -=Dave



Re: regexp problem in Python

2007-08-03 Thread Ehsan
On Aug 4, 1:36 am, Dave Hansen [EMAIL PROTECTED] wrote:
> On Aug 3, 4:41 pm, Ehsan [EMAIL PROTECTED] wrote:
>
>> I want to find http://www.2shared.com/download/1716611/e2000f22/
> [...]
>> I use this pattern :
>> http.*?\.(wmv|3gp).*
>>
>> but it returns only 'wmv' and '3gp' instead of http://www.2shared.com/
>> download/1716611/e2000f22/Jadeed_Mlak14.wmv?
>> tsid=20070803-164051-9d637d11
>>
>> what can I do? what's wrong with this pattern? thanx for your comments
>
> Just a guess, based on too little information: Try
> (http.*?\.(wmv|3gp).*)
>
> Regards,
>
>    -=Dave

no, it doesn't work



Re: Regexp problem with `('

2007-03-22 Thread Steve Holden
Zeng Nan wrote:
> On Thu, Mar 22, 2007 at 01:26:22AM -0700, Johny wrote:
>> I have  the following text
>>
>> <title>Goods Item  146 (174459989)  - OurWebSite</title>
>>
>> from which I need to extract
>> `Goods Item  146 '
>>
>> Can anyone help with regexp?
>> Thank you for help
>> L.
>
> (Goods\s+Item\s+146\s+)
 
[snigger]

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings   http://holdenweb.blogspot.com



Regexp problem with `('

2007-03-22 Thread Johny
I have  the following text

<title>Goods Item  146 (174459989)  - OurWebSite</title>

from which I need to extract
`Goods Item  146 '

Can anyone help with regexp?
Thank you for help
L.



Re: Regexp problem with `('

2007-03-22 Thread Zeng Nan
On Thu, Mar 22, 2007 at 01:26:22AM -0700, Johny wrote:
> I have  the following text
>
> <title>Goods Item  146 (174459989)  - OurWebSite</title>
>
> from which I need to extract
> `Goods Item  146 '
>
> Can anyone help with regexp?
> Thank you for help
> L.

(Goods\s+Item\s+146\s+)

-- 
Zeng Nan   

MY BLOG: http://zengnan.blogspot.com
Public Key: http://pgp.mit.edu/ | www.keyserver.net

~~~
In Lexington, Kentucky, it's illegal to carry an ice cream cone in your
pocket.

~~~



Re: Regexp problem with `('

2007-03-22 Thread Bruno Desthuilliers
Johny wrote:
> I have  the following text
>
> <title>Goods Item  146 (174459989)  - OurWebSite</title>
>
> from which I need to extract
> `Goods Item  146 '
>
> Can anyone help with regexp?

Sure : the documentation is here:
http://docs.python.org/lib/module-re.html

And there's a nice tutorial here:
http://www.amk.ca/python/howto/regex/

Read all this, try to solve your problem, and come back with what you've 
done so far if you need more help.

> Thank you for help

You're welcome.


Re: Regexp problem with `('

2007-03-22 Thread Paul McGuire
On Mar 22, 3:26 am, Johny [EMAIL PROTECTED] wrote:
> I have  the following text
>
> <title>Goods Item  146 (174459989)  - OurWebSite</title>
>
> from which I need to extract
> `Goods Item  146 '
>
> Can anyone help with regexp?
> Thank you for help
> L.

Here's the immediate answer to your question.


import re
src = "<title>Goods Item  146 (174459989)  - OurWebSite</title>"
# capture everything between <title> and the opening parenthesis
pattern = r"<title>(.*)\("
re.search(pattern,src).groups()[0]


I post it this way so that you can relate the re to your specific
question, and then maybe apply this to whatever else you are scraping
from this web page.

Please don't follow up with a post asking how to extract "45,Rubber
chicken" from "<tr><td>45</td><td>Rubber chicken</td></tr>".  At this
point, you should try a little experimentation on your own.

-- Paul



Re: Regexp problem with `('

2007-03-22 Thread John Nagle
Johny wrote:
> I have  the following text
>
> <title>Goods Item  146 (174459989)  - OurWebSite</title>
>
> from which I need to extract
> `Goods Item  146 '
>
> Can anyone help with regexp?
> Thank you for help
> L.

In general, parsing HTML with regular expressions is a bad idea.
Usually, you use something like BeautifulSoup to parse the HTML,
extract the desired field, like the contents of <title>, then
work on that.

If you try to do this line by line with regular expressions,
it will fail when the line breaks aren't where you expect.  If
you try to do a whole document with regular expressions, other
material such as content in comments can be misrecognized.

 Try something like this:

import re
import BeautifulSoup   # the BeautifulSoup 3 module

# Regular expression to extract group before (N)
kreextractitem = re.compile(r'^(.*)\(\d+\)')
pagetree = BeautifulSoup.BeautifulSoup(stringcontaininghtml)
titleitem = pagetree.find({'title':True, 'TITLE':True})
if titleitem :
    # Join all text nodes found inside the title element.
    titletext = " ".join(titleitem.findAll(text=True, recursive=True))
    #   Text of TITLE item is now in titletext as a string.
    groups = kreextractitem.search(titletext)
    if groups :
        goodsitem = groups.group(1).strip()
        # goodsitem now contains everything before ()

This approach will work no matter where the line breaks are in the original
HTML.
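
For comparison, a sketch of the same two-step idea (parse first, then apply the regexp to the title text) using only the standard library's html.parser instead of BeautifulSoup. This is a substitute technique, not the code above; the class name TitleParser and the sample document are made up, and it assumes Python 3.

```python
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TITLE> is handled as well.
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


# Sample document with a line break inside the title.
html = ("<html><head><title>Goods Item  146 (174459989)\n"
        " - OurWebSite</title></head></html>")
parser = TitleParser()
parser.feed(html)
parser.close()
title_text = "".join(parser.parts)

# Everything before the parenthesised number, without trailing whitespace.
m = re.match(r"(.*?)\s*\(\d+\)", title_text)
goods_item = m.group(1) if m else None
print(goods_item)
```

The regexp only ever sees the title text, so line breaks and comments elsewhere in the page cannot confuse it.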

John Nagle


Re: Regexp problem, which pattern to use in split

2004-12-14 Thread Fredrik Lundh
Hans Almåsbakk wrote:

> Is there a relatively hassle-free way to get the csv module working with
> 2.1? The server is running Debian stable/woody, and it also seemed 2.2 can
> coexist with 2.1, when I checked the distro packages, if that is any help.

2.3 and 2.4 can also coexist with 2.1 (use "make altinstall" to leave "python"
alone), so if you're using a pure-Python application, upgrading might be a good
idea.

alternatively, the following module (with a slightly different API) should work
under 2.1:

http://www.object-craft.com.au/projects/csv/

/F 





Re: Regexp problem, which pattern to use in split

2004-12-13 Thread Fredrik Lundh
Hans Almåsbakk wrote:

> These lines are in a csv file exported from excel.
>
> Any pointer will be greatly appreciated. Maybe I'm attacking this problem
> the wrong way already from the start? (Not that I can see another way
> myself :)

>>> import csv

http://online.effbot.org/2003_08_01_archive.htm#librarybook-csv-module
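
A rough sketch of why the csv module tends to be easier than a split pattern here. The sample data is hypothetical (the original lines from the Excel export are not shown in this part of the thread), and the code assumes Python 3.

```python
import csv
import io

# Hypothetical Excel-style export: the second field contains a comma.
raw = 'id,description,price\r\n1,"Chicken, rubber",4.50\r\n'

# csv.reader understands the quoting, so the embedded comma survives.
rows = list(csv.reader(io.StringIO(raw, newline="")))
print(rows[1])

# A plain split on "," cuts the quoted field in two.
naive = raw.splitlines()[1].split(",")
print(naive)
```

With a real file, the reader would be given the open file object instead of the StringIO wrapper.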

/F 





Re: Regexp problem, which pattern to use in split

2004-12-13 Thread Matthias Huening
Hans Almåsbakk (14.12.2004 16:02):
> Any pointer will be greatly appreciated. Maybe I'm attacking this problem
> the wrong way already from the start? (Not that I can see another way
> myself :)

Hans, did you try the csv module in the Python library?

Matthias


Re: Regexp problem, which pattern to use in split

2004-12-13 Thread Hans Almåsbakk
Fredrik Lundh [EMAIL PROTECTED] writes:


> >>> import csv
>
> http://online.effbot.org/2003_08_01_archive.htm#librarybook-csv-module

This seems to be just the thing I need.

Now of course, another problem arose:
The csv module is new in Python 2.3.

hans:~# python -V
Python 2.1.3

Is there a relatively hassle-free way to get the csv module working with
2.1? The server is running Debian stable/woody, and it also seemed 2.2 can
coexist with 2.1, when I checked the distro packages, if that is any help.

Regards
-- 
Hans Almåsbakk
-remove .invalid for correct email