HTMLParser skipping HTML? [newbie]

2012-09-05 Thread BobAalsma
I'm trying to understand the HTMLParser so I've copied some code from 
http://docs.python.org/library/htmlparser.html?highlight=html#HTMLParser and 
tried that on my LinkedIn page.
No errors, but some of the tags seem to go missing for no apparent reason - any 
advice?
I have searched extensively for this, but seem to be the only one with missing 
data from HTMLParser :(

Code:
import urllib2
from HTMLParser import HTMLParser

from GetHttpFileContents import getHttpFileContents

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print "Start tag:\n\t", tag
        for attr in attrs:
            print "\t\tattr:", attr
        # end for attr in attrs:
    #
    def handle_endtag(self, tag):
        print "End tag :\n\t", tag
    #
    def handle_data(self, data):
        if data != '\n\n':
            if data != '\n':
                print "Data :\t\t", data
            # end if 1
        # end if 2
    #
#
# -
#
def removeHtmlFromFileContents():
    TextOut = ''

    parser = MyHTMLParser()

    parser.feed(urllib2.urlopen('http://nl.linkedin.com/in/bobaalsma').read())

    return TextOut
#
# -
#
if __name__ == '__main__':
    TextOut = removeHtmlFromFileContents()





Part of the output:
End tag :
    script
Start tag:
    title
Data :      Bob Aalsma - Nederland | LinkedIn
End tag :
    title
Start tag:
    script
        attr: ('type', 'text/javascript')
        attr: ('src', 'http://www.linkedin.com/uas/authping?url=http%3A%2F%2Fnl%2Elinkedin%2Ecom%2Fin%2Fbobaalsma')
End tag :
    script
Start tag:
    link
        attr: ('rel', 'stylesheet')
        attr: ('type', 'text/css')
        attr: ('href', 'http://s3.licdn.com/scds/concat/common/css?h=5v4lkweptdvona6w56qelodrj-7pfvsr76gzb22ys278pbj80xm-b1io9ndljf1bvpack85gyxhv4-5xxmkfcm1ny97biv0pwj7ch69')
Start tag:
    script
        attr: ('type', 'text/javascript')
        attr: ('src', 'http://s4.licdn.com/scds/concat/common/js?h=7nhn6ycbvnz80dydsu88wbuk-1kjdwxpxv0c3z97afuz9dlr9g-dlsf699o6xkxgppoxivctlunb-8v6o0480wy5u6j7f3sh92hzxo')
End tag :
    script
End tag :
    head



But the source text for this is [and all of the " seem to go missing:

Bob Aalsma | LinkedIn
 href="https://s3-s.licdn.com/scds/concat/common/css?h=7d22iuuoi1bmp3a2jb6jyv5z5">
 href="https://s4-s.licdn.com/scds/concat/common/css?h=b1io9ndljf1bvpack85gyxhv4-6qrj4gxbwq8loasfnyfmyuphe-dhog2e5h8scik4whkpqccnzou-dmo1gwj6nlhvdvzx7rmluambv-69sgyia02rmcjmco0t9d3xpvo">
 content="/profile/view?id=24198692&authType=name&authToken=KhOG">






-- 
http://mail.python.org/mailman/listinfo/python-list


Re: HTMLParser skipping HTML? [newbie]

2012-09-05 Thread BobAalsma
On Wednesday 5 September 2012 14:57:05 UTC+2, BobAalsma wrote:
> I'm trying to understand the HTMLParser so I've copied some code from 
> http://docs.python.org/library/htmlparser.html?highlight=html#HTMLParser and 
> tried that on my LinkedIn page.
> 
> No errors, but some of the tags seem to go missing for no apparent reason - 
> any advice?
> 
> I have searched extensively for this, but seem to be the only one with 
> missing data from HTMLParser :(

Hmm, OK, Peter, thanks. I didn't consider the effect of logging in, that could 
certainly be a reason. So how could I have the script log in?

[Didn't understand the bit about the kittens, though. How about that?]


Re: HTMLParser skipping HTML? [newbie]

2012-09-05 Thread BobAalsma
On Wednesday 5 September 2012 19:23:45 UTC+2, BobAalsma wrote:
> Hmm, OK, Peter, thanks. I didn't consider the effect of logging in, that 
> could certainly be a reason. So how could I have the script log in?
> 
> 
> 
> [Didn't understand the bit about the kittens, though. How about that?]

Oops, sorry, found that bit about logging in - asked too soon; still wonder 
about the kittens ;)


Re: HTMLParser skipping HTML? [newbie]

2012-09-06 Thread BobAalsma
On Wednesday 5 September 2012 14:57:05 UTC+2, BobAalsma wrote:
> I'm trying to understand the HTMLParser so I've copied some code from 
> http://docs.python.org/library/htmlparser.html?highlight=html#HTMLParser and 
> tried that on my LinkedIn page.
> 
> No errors, but some of the tags seem to go missing for no apparent reason - 
> any advice?
> 
> I have searched extensively for this, but seem to be the only one with 
> missing data from HTMLParser :(

No offense and thanks for the reminder.
My background is software packages in 3GL, where different platforms mean 
different editors which mean it is sometimes difficult to recognize the end of 
blocks, especially when nested.
No need for that here, no.
I think it also means I'm still not really satisfied with my commenting in 
Python...


Re: HTMLParser skipping HTML? [newbie]

2012-09-06 Thread BobAalsma
On Wednesday 5 September 2012 14:57:05 UTC+2, BobAalsma wrote:
> I'm trying to understand the HTMLParser so I've copied some code from 
> http://docs.python.org/library/htmlparser.html?highlight=html#HTMLParser and 
> tried that on my LinkedIn page.
> 
> No errors, but some of the tags seem to go missing for no apparent reason - 
> any advice?
> 
> I have searched extensively for this, but seem to be the only one with 
> missing data from HTMLParser :(

I can see that my Tester is not logging in: the reply from the site reads 
"Sign In | LinkedIn" rather than "Bob Aalsma | LinkedIn".
How can I tell which part is not correct?
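
One way to see which page the server actually returned is to pull the title out of the fetched HTML and compare it with the title the browser shows. The following is only a minimal sketch under stated assumptions: it is written for Python 3, where the module is html.parser (the Python 2 used in this thread calls it HTMLParser), and the TitleParser class, the page_title helper and the two sample strings are made up for illustration; they are not code from this thread.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def page_title(html_text):
    """Return the stripped <title> text of html_text, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.title_parts).strip()


# Hypothetical responses: what an anonymous script might get versus a browser.
anonymous = "<html><head><title>Sign In | LinkedIn</title></head></html>"
logged_in = "<html><head><title>Bob Aalsma | LinkedIn</title></head></html>"

print(page_title(anonymous))
print(page_title(logged_in))
```

If the title of the fetched page differs from the one in the browser, the parser is probably fine and the difference lies in what the server sent to a client that is not logged in.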


Newbie: where's the new python gone?

2012-09-09 Thread BobAalsma
I think I've installed Python 2.7.3 according to the instructions in the 
README, and now want to use that version. 
However, when typing "python" in Terminal, I get "Python 2.6.4 (r264:75821M, 
Oct 27 2009, 19:48:32) ".
So:
(1) I can't seem to find where the new software has gone and 
(2) can't seem to find how to point to this new version.
I've searched Python.org and with Google but :(
[I'm on Mac OS X 10.7.4]

Please help.
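
A small sketch that may help with question (1): every interpreter can report its own location and version, so running the same file with each command name available (for instance python, or a full path to another installation) shows which binary each name starts. The helper name interpreter_info is an assumption made for this example; sys.executable and sys.version_info are standard, and the code should run unchanged on Python 2 and Python 3.

```python
import sys


def interpreter_info():
    """Return (path of the running interpreter, its (major, minor, micro) version)."""
    return sys.executable, tuple(sys.version_info[:3])


path, version = interpreter_info()
print("running: " + str(path))
print("version: " + ".".join(str(n) for n in version))
```

Two command names that print different paths are two separate installations; which one the bare command "python" starts depends on the order of the directories in PATH.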


Re: Newbie: where's the new python gone?

2012-09-09 Thread BobAalsma
On Sunday 9 September 2012 16:28:55 UTC+2, BobAalsma wrote:
> I think I've installed Python 2.7.3 according to the instructions in the 
> README, and now want to use that version. 
> However, when typing "python" in Terminal, I get "Python 2.6.4 (r264:75821M, 
> Oct 27 2009, 19:48:32) ".
> So:
> (1) I can't seem to find where the new software has gone and 
> (2) can't seem to find how to point to this new version.
> I've searched Python.org and with Google but :(
> [I'm on Mac OS X 10.7.4]
>
> Please help.

Thanks Steven!

Most of what you wrote made very good sense, yes.

Umm, I didn't use altinstall - should I (and can I) go back? [In hindsight I do 
like your solution to the versions a lot more, yes]

Umm2, as said, I think I've installed (at least downloaded) 2.7.3 (note the 
three there) and with "python2.7" I now see "Python 2.7.1 (r271:86832, Jun 16 
2011, 16:59:05)"


How to apply the user's HTML environment in a Python programme?

2012-09-21 Thread BobAalsma
I'd like to write a programme that will be offered as a web service (Django), 
in which the user will point to a specific URL and the programme will be used 
to read the text of that URL.

This text can be behind a username/password, but for several reasons, I don't 
want to know those. 

So I would like to set up a situation where the user logs in (if/when 
appropriate), points out the URL to my programme and my programme would then be 
able to read that particular text.

I'm aware this may sound fishy. It should not be: I want the user to be fully 
aware and in control of this process.

Any thoughts on how to approach this?

Best regards,
Bob


Re: How to apply the user's HTML environment in a Python programme?

2012-09-21 Thread BobAalsma
On Friday 21 September 2012 15:23:14 UTC+2, Joel Goldstick wrote:
> On Fri, Sep 21, 2012 at 8:57 AM, BobAalsma wrote:
> > I'd like to write a programme that will be offered as a web service 
> > (Django), in which the user will point to a specific URL and the programme 
> > will be used to read the text of that URL.
> >
> > This text can be behind a username/password, but for several reasons, I 
> > don't want to know those.
> >
> > So I would like to set up a situation where the user logs in (if/when 
> > appropriate), points out the URL to my programme and my programme would 
> > then be able to read that particular text.
> >
> > I'm aware this may sound fishy. It should not be: I want the user to be 
> > fully aware and in control of this process.
> >
> > Any thoughts on how to approach this?
>
> There are several python modules to get web pages: urllib, urllib2
> and another called requests
> (http://kennethreitz.com/requests-python-http-module.html). Check
> those out.
> >
> > Best regards,
> > Bob
>
> -- 
> Joel Goldstick

Thanks, Joel, yes, but as far as I'm aware these would all require the Python 
programme to have the user's username and password (or "credentials"), which I 
wanted to avoid.


Re: How to apply the user's HTML environment in a Python programme?

2012-09-21 Thread BobAalsma
On Friday 21 September 2012 15:36:11 UTC+2, Jerry Hill wrote:
> On Fri, Sep 21, 2012 at 9:31 AM, BobAalsma wrote:
> > Thanks, Joel, yes, but as far as I'm aware these would all require the 
> > Python programme to have the user's username and password (or 
> > "credentials"), which I wanted to avoid.
>
> No matter what you do, your web service is going to have to
> authenticate with the remote web site.  The details of that
> authentication are going to vary with each remote web site you want to
> connect to.
>
> -- 
> Jerry

Hmm, from the previous posts I get the impression that I could best solve this 
by asking the user for the specific combination of username, password and URL + 
promising not to keep any of that...

OK, that does sound doable - thank you all
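
For what it is worth, a script that logs in on the user's behalf usually has to keep the session cookies the site hands out after the login request. The sketch below only builds a cookie-aware opener and makes no request; it assumes Python 3 (urllib.request and http.cookiejar; in Python 2 the corresponding modules are urllib2 and cookielib), and build_session_opener is a name invented for this example. The login URL and form field names differ per site and are deliberately left out.

```python
import http.cookiejar
import urllib.request


def build_session_opener():
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = build_session_opener()
# A real login would POST the site's own form fields through this opener, e.g.
#   opener.open(login_url, data=encoded_form_bytes)
# where login_url and the field names are site-specific and not known here.
print("cookies held before any request: %d" % len(jar))
```

Every later opener.open(...) call made with the same opener would send back whatever cookies the jar collected, which is what keeps the script logged in between requests.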

Bob


Re: How to apply the user's HTML environment in a Python programme?

2012-09-22 Thread BobAalsma
On Friday 21 September 2012 17:28:02 UTC+2, David Smith wrote:
> On 2012-09-21 08:57, BobAalsma wrote:
> > This text can be behind a username/password, but for several reasons, I 
> > don't want to know those.
> >
> > So I would like to set up a situation where the user logs in (if/when 
> > appropriate), points out the URL to my programme and my programme would 
> > then be able to read that particular text.
>
> I do this from a bat file that I will later translate to Python.
> I tell my work wiki which file I want. I use chrome, so for every new 
> session I'm asked for my credentials. However, that is all transparent 
> to my bat file.
>
> For that matter, when I download a new build from part of another bat 
> file, I use Firefox and never see the credential exchange.
>
> I wouldn't expect any different behavior using Python.

Umm, David, sorry, you've lost me but I think this could be a good solution - 
at least the division in client side/server side sounds like what I'm looking 
for. Could you please elaborate?

Bob


Re: How to apply the user's HTML environment in a Python programme?

2012-09-22 Thread BobAalsma
On Friday 21 September 2012 22:10:04 UTC+2, Dennis Lee Bieber wrote:
> On Fri, 21 Sep 2012 09:36:08 -0400, Jerry Hill
> declaimed the following in gmane.comp.python.general:
>
> > On Fri, Sep 21, 2012 at 9:31 AM, BobAalsma wrote:
> > > Thanks, Joel, yes, but as far as I'm aware these would all require the 
> > > Python programme to have the user's username and password (or 
> > > "credentials"), which I wanted to avoid.
> >
> > No matter what you do, your web service is going to have to
> > authenticate with the remote web site.  The details of that
> > authentication are going to vary with each remote web site you want to
> > connect to.
>
>   Hmmm, convoluted but presuming the "login" third party site uses
> cookies... Would it be possible to use Javascript on the client to "copy"
> the HTML from the third-party and then transmit it to the application
> rather than having the application trying to do a direct fetch given
> just the URL?
>
>   This should keep the authentication local to the client machine.
>
> -- 
>   Wulfraed                 Dennis Lee Bieber         AF6VN
>   wlfr...@com    HTTP://wlfraed.home.netcom.com/

Wulfraed, yes, as with David's proposal: this sounds good, but I wouldn't know 
the first thing about Javascript... 
I'm also concerned that both solutions would seem to imply distributing 
software (or "software") to the clients systems.
Hmm.

Bob


Re: How to apply the user's HTML environment in a Python programme?

2012-10-01 Thread BobAalsma
On Friday 21 September 2012 16:15:30 UTC+2, Joel Goldstick wrote:
> On Fri, Sep 21, 2012 at 9:58 AM, BobAalsma wrote:
> > On Friday 21 September 2012 15:36:11 UTC+2, Jerry Hill wrote:
> >> On Fri, Sep 21, 2012 at 9:31 AM, BobAalsma wrote:
> >>
> >> > Thanks, Joel, yes, but as far as I'm aware these would all require the 
> >> > Python programme to have the user's username and password (or 
> >> > "credentials"), which I wanted to avoid.
> >>
> >> No matter what you do, your web service is going to have to
> >> authenticate with the remote web site.  The details of that
> >> authentication are going to vary with each remote web site you want to
> >> connect to.
> >>
> >> --
> >> Jerry
> >
> > Hmm, from the previous posts I get the impression that I could best solve 
> > this by asking the user for the specific combination of username, password 
> > and URL + promising not to keep any of that...
> >
> > OK, that does sound doable - thank you all
>
> I recommend that you write your program to read pages that are not
> protected.  Once you get that working, you can go back and figure out
> how you want to get the username/password from your 'friends' and add
> that in.  Also look up Beautiful Soup (version 4) for a great library
> to parse the pages that you retrieve.
> >
> > Bob
>
> -- 
> Joel Goldstick

Joel, 

I've spent some time with this but don't really understand my results - some 
help would be appreciated.
I've built a tester that will read my LinkedIn home page, which is password 
protected.
When I use that method for reading other people's pages, the program is 
redirected to the LinkedIn login page.
When I paste the URLs for the other people's pages in any browser, the 
requested pages are shown.

Bob


fork seems to make urlopen into a black hole?

2014-01-14 Thread BobAalsma
A program took much too long to check some texts collected from web pages.
As this could be made parallel easily, I put in fork.
And the result seems to be that the program simply stops in the line with 
urlopen. Any suggestions?

Relevant part:

try:
    print 'urlopen by', kind_nummer, '- before'
    extern_adres = urlopen(adres).geturl()
    print 'urlopen by', kind_nummer, '- after'
except IOError:
    tekst_string = 'URL "' + adres +'" gives IO Error'
    print tekst_string, 'by', kind_nummer
    bloklijst_verslag.append(tekst_string + '\n')
    tekst_string = '\t This page is skipped\n'
    bloklijst_verslag.append(tekst_string + '\n')
    adres = ''
except:
    fout = sys.exc_info()[:2]
    # fout is a tuple, so convert it before concatenating
    tekst_string = 'URL "' + adres + '" gives error ' + str(fout)
    print tekst_string, 'by', kind_nummer
    bloklijst_verslag.append(tekst_string + '\n')
    tekst_string = '\t This page is skipped\n'
    bloklijst_verslag.append(tekst_string + '\n')
    adres = ''
else:
    print 'urlopen by', kind_nummer, '- else'
    if extern_adres != adres:
        tekst_string = '\t redirect to ' + extern_adres
        bloklijst_verslag.append(tekst_string + '\n')

From this block, the only response seen is the first print statement ("urlopen 
by <kind_nummer> - before") and then nothing else seems to happen in that child.
This happens for all children: the expected number of children is reached and 
they all reach (only) this point.
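
The following does not explain the hang; it is a sketch of a different technique that avoids fork altogether: a thread pool in which every urlopen call gets a timeout, so a stalled request ends in a reported error instead of blocking forever. It assumes Python 3 (concurrent.futures and urllib.request). The names final_url, check_addresses and fake_fetch and the example addresses are invented for illustration, and the demonstration uses a stand-in fetcher so that it runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def final_url(address, timeout=10):
    """Open address and return the URL actually reached after redirects."""
    with urlopen(address, timeout=timeout) as response:
        return response.geturl()


def check_addresses(addresses, fetch=final_url, workers=4):
    """Map each address to ('ok', final_url) or ('error', message)."""
    def check_one(address):
        try:
            return address, ("ok", fetch(address))
        except Exception as exc:  # report the failure instead of stopping
            return address, ("error", "%s: %s" % (type(exc).__name__, exc))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check_one, addresses))


# Offline demonstration with a stand-in fetcher (hypothetical addresses).
def fake_fetch(address):
    if "bad" in address:
        raise IOError("simulated failure")
    return address.replace("http://", "https://")


results = check_addresses(
    ["http://example.org/a", "http://example.org/bad"], fetch=fake_fetch
)
for address, outcome in sorted(results.items()):
    print(address, outcome)
```

Calling check_addresses(list_of_urls) without the fetch argument would use final_url and therefore the real network; an address whose final URL differs from the one requested was redirected.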


Newbie: installation difficulties [webapp2 / babel]

2013-10-07 Thread BobAalsma
Hi,

I'm following webapp2 documentation (release 2.1).

I made a mistake in following the text. 
I typed "pip install babel" and this led to errors in the installation.
As that user is not in sudo list, I changed users, typed "sudo pip install 
babel" and everything seemed right.

Further on, the manual says "$ pybabel " and when I use that command, the 
response is "-bash: pybabel: command not found" (in both user environments).

I'd like to solve this and learn at the same time ;)

I'm on OS X 10.8.5, python 2.7.5.

I found a "babel" in /Library/Python/2.7/site-packages.
I found a "babel" in the response to "python -c "help('modules')" | grep babel".
From which I conclude that babel is on the machine and is known to python.

However, I gather from the webapp2 documentation that babel should also be 
known to the shell.
I don't yet know how to check this, nor how to repair.

find /usr -name '*babel*' 
and
find /bin -name '*babel*' 
return no values, but
find /Library -name '*babel*' 
does (as expected).

I've tried to find similar situations in documentation, forums, Google but 
found nothing helpful.
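
A possible way to check whether a command is known to the shell, sketched under assumptions: installers such as pip normally place command-line entry points in the interpreter's scripts directory, and the shell only finds them if that directory is listed in PATH. The code needs Python 3.3 or later for shutil.which (sysconfig.get_path is also available in Python 2.7), and locate_command is a name made up for this example.

```python
import os
import shutil
import sysconfig


def locate_command(name):
    """Return (path found on PATH or None, this interpreter's scripts directory)."""
    return shutil.which(name), sysconfig.get_path("scripts")


found, scripts_dir = locate_command("pybabel")
path_entries = os.environ.get("PATH", "").split(os.pathsep)

print("on PATH: %s" % found)
print("scripts directory: %s" % scripts_dir)
print("scripts directory listed in PATH: %s" % (scripts_dir in path_entries))
```

If the scripts directory contains the command but is not listed in PATH, the module can be imported while the shell still answers "command not found".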


Re: Newbie: installation difficulties [webapp2 / babel]

2013-10-07 Thread BobAalsma
Well Joel, umm, I'm not sure if I understand you correctly. 

$ python babel
/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python: can't open file 'babel': [Errno 2] No such file or directory

And


$ python
Python 2.7.5 (v2.7.5:ab05e7dd2788, May 13 2013, 13:18:45) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import babel
>>> 

Does this help?

Bob


Newbie question: python versions differ per user?

2010-03-08 Thread BobAalsma
I'm on Mac OS X 10.5.8 and downloaded the 2.6.4 Mac Installer Disk Image
as/in(?) the sys admin user. For this user Python 2.6.4 is now the
current version.
I want to use Python outside the sys admin user. However, all other
users still use Python 2.5.1 (Apple delivered).

The sys admin user looks in
/Library/Frameworks/Python.framework/Versions/2.6/lib/...
The other users look in
/System/Library/Frameworks/Python.framework/Version/2.5/lib/...

I could not find any questions on this matter, so am I the only one?
Did I do something wrong?
I assumed the paths for all users would be modified - too easy?


Re: Newbie question: python versions differ per user?

2010-03-09 Thread BobAalsma
On Mar 8, 8:15 pm, BobAalsma wrote:
> I'm on Mac OS X 10.5.8 and downloaded the 2.6.4 Mac Installer Disk Image
> as/in(?) the sys admin user. For this user Python 2.6.4 is now the
> current version.
> I want to use Python outside the sys admin user. However, all other
> users still use Python 2.5.1 (Apple delivered).
>
> The sys admin user looks in
> /Library/Frameworks/Python.framework/Versions/2.6/lib/...
> The other users look in
> /System/Library/Frameworks/Python.framework/Version/2.5/lib/...
>
> I could not find any questions on this matter, so am I the only one?
> Did I do something wrong?
> I assumed the paths for all users would be modified - too easy?

OK, sorry, found out.


newbie problem with str.replace

2010-08-04 Thread BobAalsma
I'm working on a set of scripts and I can't get a replace to work in
the script - please help.

The scripts show no errors, work properly apart from the replace, all
variables are filled as expected, the scripts works properly when the
commands are copied to the Python shell.

Text Main:
..
from LeadDevice_klant_nieuw_bewerken_bestanden import omkattenBoom, omkattenNaam
...
#
# (3) rename the files:
#
print 'Renaming files'
omkattenNaam(AANGRIJPINGSPUNT, KLANTNAAM_OUT, KLANTNAAM_IN,
             ZOEKSET1_OUT, ZOEKSET1_IN, ZOEKSET2_OUT, ZOEKSET2_IN)
#
#



Text LeadDevice_klant_nieuw_bewerken_bestanden:
import os
import distutils.core

...

def omkattenNaam(AANGRIJPINGSPUNT, KLANTNAAM_OUT, KLANTNAAM_IN,
                 ZOEKSET1_OUT, ZOEKSET1_IN, ZOEKSET2_OUT, ZOEKSET2_IN):
    #
    # Build the strings before working on the loops:
    #
    beginpunt = AANGRIJPINGSPUNT + KLANTNAAM_IN + '/'
    #
    ZOEKSET1_OUT_LOWER = ZOEKSET1_OUT.lower()
    ZOEKSET1_IN_LOWER = ZOEKSET1_IN.lower()
    ZOEKSET2_OUT_LOWER = ZOEKSET2_OUT.lower()
    ZOEKSET2_IN_LOWER = ZOEKSET2_IN.lower()
    #
    # Loops:
    #
    for root, dirs, files in os.walk(beginpunt, topdown = False):
        for bestandsnaam in dirs and files:
            #
            bestandsnaam_nieuw = bestandsnaam
            bestandsnaam_nieuw.replace(KLANTNAAM_OUT,KLANTNAAM_IN)


Re: newbie problem with str.replace

2010-08-04 Thread BobAalsma
On Aug 4, 3:22 pm, Anthony Tolle  wrote:
> On Aug 4, 9:10 am, BobAalsma  wrote:
>
> >                         #
> >                         bestandsnaam_nieuw = bestandsnaam
> >                         bestandsnaam_nieuw.replace(KLANTNAAM_OUT,KLANTNAAM_IN)
>
> The replace method does not modify the string (strings are immutable).
>
> You need to use the return value of the method in an assignment, like
> so:
>
> bestandsnaam_nieuw = bestandsnaam.replace(KLANTNAAM_OUT,KLANTNAAM_IN)
>
> This will not change the value of bestandsnaam.

YESS!

Thanks, this is what I wanted to achieve but could not find.
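
The point about immutability can be checked in a few lines. This is a minimal sketch with made-up customer and file names; rename_all is a hypothetical helper, not the function from the script. It also iterates over dirs + files, on the assumption that both directories and files are meant to be renamed, because the expression "dirs and files" evaluates to only one of the two lists:

```python
def rename_all(names, old, new):
    """Return new names with `old` replaced by `new`; inputs stay unchanged."""
    # str.replace returns a new string, so its result has to be kept.
    return [name.replace(old, new) for name in names]

# Made-up sample data, for illustration only.
dirs = ["OldCo_reports"]
files = ["OldCo_agents.plist", "readme.txt"]

# `dirs and files` would yield only `files` here (or the empty `dirs`
# when a directory has no subdirectories); `dirs + files` covers both.
renamed = rename_all(dirs + files, "OldCo", "NewCo")
print(renamed)
```

The original lists are left untouched, which is the behaviour described in the reply above: only the returned strings carry the new name.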

Regards,
Bob


Newbie: plist & control characters?

2010-09-21 Thread BobAalsma
I'm trying to modify a plist file. The modification works properly,
but I'm having difficulty finding the proper way to write the result
back.

The file contains HTML strings like "$#226;". Either the program
completes but these get replaced by "â" (which I don't want), or the
program fails when I try to use the plistlib.Data wrapper.

I must be doing something wrong, but I can't find what (and this
includes Google searches). Please help.

Failing text:
def omkattenAgents(AANGRIJPINGSPUNT, KLANTNAAM_OUT, KLANTNAAM_IN,
                   ZOEKSET1_OUT, ZOEKSET1_IN, ZOEKSET2_OUT, ZOEKSET2_IN):


    tussentekst = plistlib.Data(plistlib.readPlist(os.path.join(root,bestandsnaam)))
    tussenblok = tussentekst.data
    ding = tussenblok['RepresentedObject']
    tekststring = ding.get('Name')

    tekststring_0 = tekststring.replace(KLANTNAAM_OUT,KLANTNAAM_IN)
    tekststring_1 = tekststring_0.replace(ZOEKSET1_OUT,ZOEKSET1_IN)
    tekststring_2 = tekststring_1.replace(ZOEKSET2_OUT,ZOEKSET2_IN)
    tekststring_3 = tekststring_2.replace(ZOEKSET1_OUT_LOWER,ZOEKSET1_IN_LOWER)
    tekststring_4 = tekststring_3.replace(ZOEKSET2_OUT_LOWER,ZOEKSET2_IN_LOWER)

    ding['Name'] = tekststring_4
    tussenblok['RepresentedObject'] = ding
    tussentekst.data = tussenblok

    plistlib.Data(plistlib.writePlist(tussentekst,
                                      os.path.join(root,bestandsnaam)))




Text in Terminal:
  File "LeadDevice_klant_nieuw_aanmaken.py", line 66, in <module>
    omkattenAgents(AANGRIJPINGSPUNT, KLANTNAAM_OUT, KLANTNAAM_IN, ZOEKSET1_OUT, ZOEKSET1_IN, ZOEKSET2_OUT, ZOEKSET2_IN)
  File "/Volumes/LeadDevice-2/LeadDevice/Programmatuur/Python/LeadDeviceProductie/LeadDevice_klant_nieuw_naamcorrectie_intern.py", line 162, in omkattenAgents
    plistlib.Data(plistlib.writePlist(tussentekst, os.path.join(root,bestandsnaam)))
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plistlib.py", line 94, in writePlist
    writer.writeValue(rootObject)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plistlib.py", line 254, in writeValue
    self.writeData(value)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plistlib.py", line 267, in writeData
    for line in data.asBase64(maxlinelength).split("\n"):
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plistlib.py", line 379, in asBase64
    return _encodeBase64(self.data, maxlinelength)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plistlib.py", line 361, in _encodeBase64
    chunk = s[i : i + maxbinsize]
TypeError: unhashable type


The only difference between the version that fails and the version
that completes (but replaces the HTML strings) is this statement:
    tussentekst.data = tussenblok    (fails)
    tussentekst = tussenblok         (completes, but replaces)
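
Going by the traceback, writePlist ends up base64-encoding the Data object's payload, which here is a dictionary rather than a byte string. A minimal sketch of the same edit without any Data wrapper follows. It assumes the newer plistlib.loads / plistlib.dumps API (Python 3.4 and later) instead of the readPlist / writePlist calls of Python 2.6 used above, and rename_in_plist and the sample content are made up for illustration:

```python
import plistlib

def rename_in_plist(plist_bytes, old, new):
    """Replace `old` with `new` in RepresentedObject/Name and re-serialise."""
    root = plistlib.loads(plist_bytes)      # the parsed root is a plain dict
    represented = root["RepresentedObject"]
    represented["Name"] = represented["Name"].replace(old, new)
    return plistlib.dumps(root)             # the dict is written back directly

# Made-up sample document, for illustration only.
sample = plistlib.dumps({"RepresentedObject": {"Name": "OldCo agent"}})
updated = plistlib.loads(rename_in_plist(sample, "OldCo", "NewCo"))
print(updated["RepresentedObject"]["Name"])
```

The sketch only covers the failing write. It does not change how character references in the file are decoded when the plist is parsed.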

Regards,
Bob