HTMLParser skipping HTML? [newbie]
I'm trying to understand the HTMLParser, so I've copied some code from http://docs.python.org/library/htmlparser.html?highlight=html#HTMLParser and tried it on my LinkedIn page. There are no errors, but some of the tags seem to go missing for no apparent reason. Any advice? I have searched extensively for this, but I seem to be the only one with missing data from HTMLParser :(

Code:

import urllib2
from HTMLParser import HTMLParser

from GetHttpFileContents import getHttpFileContents

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print "Start tag:\n\t", tag
        for attr in attrs:
            print "\t\tattr:", attr
        # end for attr in attrs:
    #
    def handle_endtag(self, tag):
        print "End tag :\n\t", tag
    #
    def handle_data(self, data):
        if data != '\n\n':
            if data != '\n':
                print "Data :\t\t", data
            # end if 1
        # end if 2
    #
#
# -
#
def removeHtmlFromFileContents():
    TextOut = ''

    parser = MyHTMLParser()
    parser.feed(urllib2.urlopen('http://nl.linkedin.com/in/bobaalsma').read())

    return TextOut
#
# -
#
if __name__ == '__main__':
    TextOut = removeHtmlFromFileContents()

Part of the output:

End tag :
    script
Start tag:
    title
Data :        Bob Aalsma - Nederland | LinkedIn
End tag :
    title
Start tag:
    script
        attr: ('type', 'text/javascript')
        attr: ('src', 'http://www.linkedin.com/uas/authping?url=http%3A%2F%2Fnl%2Elinkedin%2Ecom%2Fin%2Fbobaalsma')
End tag :
    script
Start tag:
    link
        attr: ('rel', 'stylesheet')
        attr: ('type', 'text/css')
        attr: ('href', 'http://s3.licdn.com/scds/concat/common/css?h=5v4lkweptdvona6w56qelodrj-7pfvsr76gzb22ys278pbj80xm-b1io9ndljf1bvpack85gyxhv4-5xxmkfcm1ny97biv0pwj7ch69')
Start tag:
    script
        attr: ('type', 'text/javascript')
        attr: ('src', 'http://s4.licdn.com/scds/concat/common/js?h=7nhn6ycbvnz80dydsu88wbuk-1kjdwxpxv0c3z97afuz9dlr9g-dlsf699o6xkxgppoxivctlunb-8v6o0480wy5u6j7f3sh92hzxo')
End tag :
    script
End tag :
    head

But the source text for this is [and all of the " seem to go missing:

Bob Aalsma | LinkedIn
href="https://s3-s.licdn.com/scds/concat/common/css?h=7d22iuuoi1bmp3a2jb6jyv5z5";>
href="https://s4-s.licdn.com/scds/concat/common/css?h=b1io9ndljf1bvpack85gyxhv4-6qrj4gxbwq8loasfnyfmyuphe-dhog2e5h8scik4whkpqccnzou-dmo1gwj6nlhvdvzx7rmluambv-69sgyia02rmcjmco0t9d3xpvo";>
content="/profile/view?id=24198692&authType=name&authToken=KhOG">

-- http://mail.python.org/mailman/listinfo/python-list
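The parser reports every tag it is given. That makes the HTML the script receives a more likely cause of the difference than HTMLParser itself.

Below is a minimal sketch of the same handler logic for Python 3, where the module is html.parser rather than HTMLParser. Instead of printing, it collects the events in a list, so the output can be compared or tested. The class name TagLogger and the sample markup are illustrative only.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record parser events as tuples instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # One test skips every whitespace-only run, including '\n' and '\n\n'
        if data.strip():
            self.events.append(("data", data.strip()))


def log_tags(html):
    """Feed a complete HTML string and return the recorded events."""
    parser = TagLogger()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    sample = ('<head><title>Bob Aalsma | LinkedIn</title>'
              '<link rel="stylesheet" type="text/css" href="a.css"></head>')
    for event in log_tags(sample):
        print(event)
```

One way to compare the two is to feed this the bytes your script actually downloaded, decoded to text, and set that beside what the browser's "view source" shows.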
Re: HTMLParser skipping HTML? [newbie]
On Wednesday 5 September 2012 14:57:05 UTC+2, BobAalsma wrote:
> I'm trying to understand the HTMLParser so I've copied some code from
> http://docs.python.org/library/htmlparser.html?highlight=html#HTMLParser and
> tried that on my LinkedIn page.
>
> No errors, but some of the tags seem to go missing for no apparent reason -
> any advice?

Hmm, OK, Peter, thanks. I hadn't considered the effect of logging in; that could certainly be the reason. So how could I have the script log in?

[I didn't understand the bit about the kittens, though. What was that about?]
Re: HTMLParser skipping HTML? [newbie]
On Wednesday 5 September 2012 19:23:45 UTC+2, BobAalsma wrote:
> Hmm, OK, Peter, thanks. I didn't consider the effect of logging in, that
> could certainly be a reason. So how could I have the script log in?
>
> [Didn't understand the bit about the kittens, though. How about that?]

Oops, sorry, I found the bit about logging in; I asked too soon. I still wonder about the kittens ;)
Re: HTMLParser skipping HTML? [newbie]
On Wednesday 5 September 2012 14:57:05 UTC+2, BobAalsma wrote:
> I'm trying to understand the HTMLParser so I've copied some code from
> http://docs.python.org/library/htmlparser.html?highlight=html#HTMLParser and
> tried that on my LinkedIn page.

No offense taken, and thanks for the reminder. My background is in software packages written in 3GLs. There, different platforms mean different editors, which sometimes makes it difficult to recognize the end of a block, especially when blocks are nested. There's no need for that here, no. I think it also means I'm still not really satisfied with my commenting in Python...
Re: HTMLParser skipping HTML? [newbie]
On Wednesday 5 September 2012 14:57:05 UTC+2, BobAalsma wrote:
> I'm trying to understand the HTMLParser so I've copied some code from
> http://docs.python.org/library/htmlparser.html?highlight=html#HTMLParser and
> tried that on my LinkedIn page.

I can see that my tester is not logging in: the reply from the site reads "Sign In | LinkedIn" rather than "Bob Aalsma | LinkedIn". How can I tell which part is not correct?
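One way to tell whether a fetch landed on the login page is to pull out the title and compare it with what you expect.

Below is a minimal Python 3 sketch using html.parser. The names TitleGrabber, page_title and looks_like_login_page are hypothetical. Checking for the literal text "Sign In" is an assumption based only on the title quoted above, not on any documented LinkedIn behaviour.

```python
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title += data  # the title text may arrive in several pieces


def page_title(html):
    grabber = TitleGrabber()
    grabber.feed(html)
    grabber.close()
    return grabber.title.strip()


def looks_like_login_page(html, marker="Sign In"):
    # Heuristic only: assumes the login page's title contains `marker`
    return marker in page_title(html)
```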
Newbie: where's the new python gone?
I think I've installed Python 2.7.3 according to the instructions in the README, and now I want to use that version. However, when I type "python" in Terminal, I get "Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)". So:

(1) I can't seem to find where the new software has gone, and
(2) I can't seem to find how to point to this new version.

I've searched Python.org and Google, but :( [I'm on Mac OS X 10.7.4]

Please help.
Re: Newbie: where's the new python gone?
On Sunday 9 September 2012 16:28:55 UTC+2, BobAalsma wrote:
> I think I've installed Python 2.7.3 according to the instructions in the
> README, and now want to use that version.
>
> However, when typing "python" in Terminal, I get "Python 2.6.4 (r264:75821M,
> Oct 27 2009, 19:48:32) ".

Thanks Steven! Most of what you wrote made very good sense, yes.

Umm, I didn't use altinstall. Should I (and can I) go back? [In hindsight I do like your solution for the versions a lot more, yes.]

Umm2, as I said, I think I've installed (or at least downloaded) 2.7.3 (note the 3 at the end), and with "python2.7" I now see "Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05)".
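Asking each command which interpreter it actually starts can answer both questions above.

Below is a small sketch that runs under both Python 2.7 and 3, since it only uses sys and the print function. Save it under any name, for example which_python.py (a hypothetical name). Then run it with python, python2.7 and so on. Each run shows that executable's path, version and installation prefix.

```python
from __future__ import print_function

import sys


def interpreter_info():
    """Describe the Python interpreter that is running this code."""
    return {
        "executable": sys.executable,       # full path of the running python binary
        "version": sys.version.split()[0],  # e.g. '2.7.3'
        "prefix": sys.prefix,               # installation root (where the standard library lives)
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_info().items()):
        print("{0:>10}: {1}".format(key, value))
```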
How to apply the user's HTML environment in a Python programme?
I'd like to write a programme that will be offered as a web service (Django), in which the user points to a specific URL and the programme reads the text at that URL.

This text can be behind a username/password, but for several reasons I don't want to know those. So I would like to set up a situation where the user logs in (if/when appropriate) and points my programme to the URL. My programme would then be able to read that particular text.

I'm aware this may sound fishy. It should not be: I want the user to be fully aware and in control of this process.

Any thoughts on how to approach this?

Best regards,
Bob
Re: How to apply the user's HTML environment in a Python programme?
On Friday 21 September 2012 15:23:14 UTC+2, Joel Goldstick wrote:
> On Fri, Sep 21, 2012 at 8:57 AM, BobAalsma wrote:
> > I'd like to write a programme that will be offered as a web service
> > (Django), in which the user will point to a specific URL and the programme
> > will be used to read the text of that URL.
>
> There are several python modules to get web pages. urllib, urllib2
> and another called requests.
> (http://kennethreitz.com/requests-python-http-module.html) Check
> those out
>
> --
> Joel Goldstick

Thanks, Joel, yes, but as far as I'm aware these would all require the Python programme to have the user's username and password (or "credentials"), which I wanted to avoid.
Re: How to apply the user's HTML environment in a Python programme?
On Friday 21 September 2012 15:36:11 UTC+2, Jerry Hill wrote:
> On Fri, Sep 21, 2012 at 9:31 AM, BobAalsma wrote:
> > Thanks, Joel, yes, but as far as I'm aware these would all require the
> > Python programme to have the user's username and password (or
> > "credentials"), which I wanted to avoid.
>
> No matter what you do, your web service is going to have to
> authenticate with the remote web site. The details of that
> authentication are going to vary with each remote web site you want to
> connect to.
>
> --
> Jerry

Hmm, from the previous posts I get the impression that I could best solve this by asking the user for the specific combination of username, password and URL, and promising not to keep any of it...

OK, that does sound doable. Thank you all,

Bob
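If a site uses HTTP Basic authentication, credentials the user types in for one request can be handed to urllib without being stored anywhere else. Many sites use a login form with session cookies instead, and this approach does not handle that case.

Below is a minimal Python 3 sketch under that assumption. The name opener_for is hypothetical.

```python
import urllib.request


def opener_for(url, username, password):
    """Build an opener that answers HTTP Basic auth challenges for `url`.

    The credentials are held only by this opener object; nothing is written to disk.
    """
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for any realm at or below this URL
    passwords.add_password(None, url, username, password)
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(passwords))


# Usage (needs network access and a site that actually uses Basic auth):
#     opener = opener_for(user_url, user_name, user_password)
#     text = opener.open(user_url).read()
#     del opener  # drop our reference to the credentials once the page is read
```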
Re: How to apply the user's HTML environment in a Python programme?
On Friday 21 September 2012 17:28:02 UTC+2, David Smith wrote:
> On 2012-09-21 08:57, BobAalsma wrote:
> > This text can be behind a username/password, but for several reasons, I
> > don't want to know those.
> >
> > So I would like to set up a situation where the user logs in (if/when
> > appropriate), points out the URL to my programme and my programme would
> > then be able to read that particular text.
>
> I do this from a bat file that I will later translate to Python.
>
> I tell my work wiki which file I want. I use chrome, so for every new
> session I'm asked for my credentials. However, that is all transparent
> to my bat file.
>
> For that matter, when I download a new build from part of another bat
> file, I use Firefox and never see the credential exchange.
>
> I wouldn't expect any different behavior using Python.

Umm, David, sorry, you've lost me, but I think this could be a good solution. At least the division into client side and server side sounds like what I'm looking for. Could you please elaborate?

Bob
Re: How to apply the user's HTML environment in a Python programme?
On Friday 21 September 2012 22:10:04 UTC+2, Dennis Lee Bieber wrote:
> On Fri, 21 Sep 2012 09:36:08 -0400, Jerry Hill
> declaimed the following in gmane.comp.python.general:
>
> > No matter what you do, your web service is going to have to
> > authenticate with the remote web site. The details of that
> > authentication are going to vary with each remote web site you want to
> > connect to.
>
> Hmmm, convoluted but presuming the "login" third party site uses
> cookies... Would it be possible to use Javascript on the client to "copy"
> the HTML from the third-party and then transmit it to the application
> rather than having the application trying to do a direct fetch given
> just the URL?
>
> This should keep the authentication local to the client machine.
>
> --
> Wulfraed Dennis Lee Bieber AF6VN
> wlfr...@com HTTP://wlfraed.home.netcom.com/

Wulfraed, yes, as with David's proposal, this sounds good, but I wouldn't know the first thing about Javascript... I'm also concerned that both solutions would seem to imply distributing software (or "software") to the clients' systems. Hmm.

Bob
Re: How to apply the user's HTML environment in a Python programme?
On Friday 21 September 2012 16:15:30 UTC+2, Joel Goldstick wrote:
> On Fri, Sep 21, 2012 at 9:58 AM, BobAalsma wrote:
> > Hmm, from the previous posts I get the impression that I could best solve
> > this by asking the user for the specific combination of username, password
> > and URL + promising not to keep any of that...
> >
> > OK, that does sound doable - thank you all
>
> I recommend that you write your program to read pages that are not
> protected. Once you get that working, you can go back and figure out
> how you want to get the username/password from your 'friends' and add
> that in. Also look up Beautiful Soup (version 4) for a great library
> to parse the pages that you retrieve
>
> --
> Joel Goldstick

Joel, I've spent some time on this but don't really understand my results, so some help would be appreciated.

I've built a tester that reads my LinkedIn home page, which is password protected. When I use the same method to read other people's pages, the program is redirected to the LinkedIn login page. When I paste the URLs of those other people's pages into any browser, the requested pages are shown.

Bob
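Form-based logins usually set a session cookie. Later requests count as logged in only if they send that cookie back, so an opener that keeps cookies between requests is the usual first step.

Below is a minimal Python 3 sketch of such an opener. It does not cover whether a particular site (LinkedIn included) accepts a scripted login this way, or which form fields it expects; those are assumptions. LOGIN_URL and form_fields in the usage comment are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def session_opener():
    """Return (opener, jar).

    Requests made through `opener` store the cookies they receive in `jar`
    and send the matching ones back on later requests.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Usage sketch (network required; LOGIN_URL and form_fields are hypothetical):
#     opener, jar = session_opener()
#     opener.open(LOGIN_URL, data=urllib.parse.urlencode(form_fields).encode())
#     page = opener.open(profile_url)
#     if page.geturl() != profile_url:
#         print("redirected to", page.geturl())  # e.g. bounced to a login page
```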
fork seems to make urlopen into a black hole?
A program took much too long to check some texts collected from web pages. As this could easily be made parallel, I added fork. The result seems to be that the program simply stops at the line with urlopen. Any suggestions?

Relevant part:

try:
    print 'urlopen by', kind_nummer, '- before'
    extern_adres = urlopen(adres).geturl()
    print 'urlopen by', kind_nummer, '- after'
except IOError:
    tekst_string = 'URL "' + adres + '" gives IO Error'
    print tekst_string, 'by', kind_nummer
    bloklijst_verslag.append(tekst_string + '\n')
    tekst_string = '\t This page is skipped\n'
    bloklijst_verslag.append(tekst_string + '\n')
    adres = ''
except:
    fout = sys.exc_info()[:2]
    # fout is a (type, value) tuple, so convert it to text before concatenating
    tekst_string = 'URL "' + adres + '" gives error ' + str(fout)
    print tekst_string, 'by', kind_nummer
    bloklijst_verslag.append(tekst_string + '\n')
    tekst_string = '\t This page is skipped\n'
    bloklijst_verslag.append(tekst_string + '\n')
    adres = ''
else:
    print 'urlopen by', kind_nummer, '- else'
    if extern_adres != adres:
        tekst_string = '\t redirect to ' + extern_adres
        bloklijst_verslag.append(tekst_string + '\n')

From this block, the only response seen is the first print statement ("urlopen by <kind_nummer> - before"), and then nothing else seems to happen in that child. This happens for all children: the expected number of children is reached, and they all reach this point and no further.
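Two points may help here.

First, after fork() each child works on its own copy of the parent's memory. Anything a child appends to bloklijst_verslag is therefore never seen by the parent.

Second, if this runs on Mac OS X, as in the other posts, fork without exec is a known source of trouble there. The Python documentation for multiprocessing warns that fork on macOS can crash the child because system libraries may start threads, and urllib can look up proxy settings through macOS system frameworks. That is a possible cause, not a confirmed one.

Since the work mostly waits on the network, a thread pool avoids fork altogether. Below is a minimal Python 3 sketch that swaps fork for concurrent.futures.ThreadPoolExecutor; each check returns its report lines to the caller. The names check_address and check_all are hypothetical, as is the fetch parameter, which lets the logic be tried without a network.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_final_url(address, timeout=15):
    """Open `address` and return the URL it ended up at after any redirects."""
    with urlopen(address, timeout=timeout) as response:
        return response.geturl()


def check_address(address, fetch=fetch_final_url):
    """Return the report lines for one address instead of appending to a shared list."""
    try:
        final = fetch(address)
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSErrors
        return ['URL "%s" gives IO error: %s\n' % (address, exc),
                "\tThis page is skipped\n"]
    except Exception as exc:  # e.g. ValueError for a malformed URL
        return ['URL "%s" gives error %r\n' % (address, exc),
                "\tThis page is skipped\n"]
    if final != address:
        return ["\tredirect to %s\n" % final]
    return []


def check_all(addresses, fetch=fetch_final_url, workers=8):
    """Check all addresses concurrently; the combined report keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_address = list(pool.map(lambda a: check_address(a, fetch), addresses))
    return [line for lines in per_address for line in lines]


if __name__ == "__main__":
    for line in check_all(["http://www.python.org/"]):
        print(line, end="")
```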
Newbie: installation difficulties [webapp2 / babel]
Hi,

I'm following the webapp2 documentation (release 2.1). I made a mistake in following the text: I typed "pip install babel", and this led to errors in the installation. As that user is not in the sudoers list, I changed users and typed "sudo pip install babel", and everything seemed right.

Further on, the manual says "$ pybabel ". When I use that command, the response is "-bash: pybabel: command not found" (in both user environments).

I'd like to solve this and learn at the same time ;)

I'm on OS X 10.8.5, Python 2.7.5.

I found a "babel" in /Library/Python/2.7/site-packages. I also found a "babel" in the output of python -c "help('modules')" | grep babel. From this I conclude that babel is on the machine and is known to Python. However, I gather from the webapp2 documentation that babel should also be known to the shell. I don't yet know how to check this, nor how to repair it.

find /usr -name '*babel*' and find /bin -name '*babel*' return nothing, but find /Library -name '*babel*' does (as expected).

I've tried to find similar situations in documentation, forums and Google, but found nothing helpful.
Re: Newbie: installation difficulties [webapp2 / babel]
Well Joel, umm, I'm not sure I understand you correctly.

$ python babel
/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python: can't open file 'babel': [Errno 2] No such file or directory

And

$ python
Python 2.7.5 (v2.7.5:ab05e7dd2788, May 13 2013, 13:18:45)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import babel
>>>

Does this help?

Bob
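By default, pip installs a package's command-line programs (Babel declares pybabel as one) into the scripts directory of the Python that runs pip. The shell only finds them if that directory is on PATH.

The paths in these two posts suggest that two Python 2.7 installations may be involved. /Library/Python/2.7/site-packages is where Apple's system Python keeps add-on packages, while the interpreter above runs from /Library/Frameworks/Python.framework. So it may be worth running the check below with each of them.

Below is a minimal Python 3 sketch that reports, for the interpreter running it, where that scripts directory is and whether it is on PATH. It also shows whether the shell would find a given command. The function name is hypothetical.

```python
import os
import shutil
import sys
import sysconfig


def script_location_report(command="pybabel"):
    """Describe where this interpreter installs scripts and whether the shell can find `command`."""
    scripts_dir = sysconfig.get_path("scripts")
    path_dirs = [os.path.normpath(d) for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    return {
        "interpreter": sys.executable,
        "scripts_dir": scripts_dir,
        "scripts_dir_on_path": os.path.normpath(scripts_dir) in path_dirs,
        "in_scripts_dir": os.path.exists(os.path.join(scripts_dir, command)),
        # shutil.which returns None when no directory on PATH has the command
        "found_on_path": shutil.which(command),
    }


if __name__ == "__main__":
    for key, value in script_location_report().items():
        print("%-20s %s" % (key, value))
```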
Newbie question: python versions differ per user?
I'm on Mac OS X 10.5.8 and downloaded the 2.6.4 Mac Installer Disk Image as/in(?) the sysadmin user. For this user, Python 2.6.4 is now the current version.

I want to use Python outside the sysadmin user. However, all other users still use Python 2.5.1 (delivered by Apple).

The sysadmin user looks in /Library/Frameworks/Python.framework/Versions/2.6/lib/...
The other users look in /System/Library/Frameworks/Python.framework/Versions/2.5/lib/...

I could not find any questions on this matter, so am I the only one? Did I do something wrong? I assumed the paths would be modified for all users; was that too easy?
Re: Newbie question: python versions differ per user?
On Mar 8, 8:15 pm, BobAalsma wrote:
> I could not find any questions on this matter, so am I the only one?
> Did I do something wrong?
> I assumed the paths for all users would be modified - too easy?

OK, sorry, found out.
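Which python a user gets depends only on that user's PATH. The shell reads the directories from left to right and runs the first match. The installer may have added its bin directory only to the installing user's shell profile (a guess, not confirmed in this thread). If so, the other users would keep finding Apple's python first.

Below is a minimal Python 3 sketch that reproduces the effect on a POSIX system. It uses two throwaway directories holding dummy "python" files. The names make_fake_python and first_python are hypothetical.

```python
import os
import shutil
import stat
import tempfile


def make_fake_python(directory):
    """Create a dummy executable file named 'python' in `directory`."""
    path = os.path.join(directory, "python")
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def first_python(path_dirs):
    """Return the 'python' that would be found for this PATH order."""
    return shutil.which("python", path=os.pathsep.join(path_dirs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as new, tempfile.TemporaryDirectory() as old:
        make_fake_python(new)
        make_fake_python(old)
        print("admin PATH ->", first_python([new, old]))
        print("other PATH ->", first_python([old, new]))
```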
newbie problem with str.replace
I'm working on a set of scripts and I can't get a replace to work in the script; please help. The scripts show no errors and work properly apart from the replace. All variables are filled as expected, and the commands work properly when copied into the Python shell.

Text of the main script:

..
from LeadDevice_klant_nieuw_bewerken_bestanden import omkattenBoom, omkattenNaam
...
#
# (3) rename file names:
#
print 'Renaming file names'
omkattenNaam(AANGRIJPINGSPUNT, KLANTNAAM_OUT, KLANTNAAM_IN, ZOEKSET1_OUT, ZOEKSET1_IN, ZOEKSET2_OUT, ZOEKSET2_IN)
#

Text of LeadDevice_klant_nieuw_bewerken_bestanden:

import os
import distutils.core
...
def omkattenNaam(AANGRIJPINGSPUNT, KLANTNAAM_OUT, KLANTNAAM_IN, ZOEKSET1_OUT, ZOEKSET1_IN, ZOEKSET2_OUT, ZOEKSET2_IN):
    #
    # Build the strings before starting on the loops:
    #
    beginpunt = AANGRIJPINGSPUNT + KLANTNAAM_IN + '/'
    #
    ZOEKSET1_OUT_LOWER = ZOEKSET1_OUT.lower()
    ZOEKSET1_IN_LOWER = ZOEKSET1_IN.lower()
    ZOEKSET2_OUT_LOWER = ZOEKSET2_OUT.lower()
    ZOEKSET2_IN_LOWER = ZOEKSET2_IN.lower()
    #
    # Loops:
    #
    for root, dirs, files in os.walk(beginpunt, topdown = False):
        for bestandsnaam in dirs and files:
            #
            bestandsnaam_nieuw = bestandsnaam
            bestandsnaam_nieuw.replace(KLANTNAAM_OUT,KLANTNAAM_IN)
Re: newbie problem with str.replace
On Aug 4, 3:22 pm, Anthony Tolle wrote:
> On Aug 4, 9:10 am, BobAalsma wrote:
> >             #
> >             bestandsnaam_nieuw = bestandsnaam
> >             bestandsnaam_nieuw.replace(KLANTNAAM_OUT,KLANTNAAM_IN)
>
> The replace method does not modify the string (strings are immutable).
>
> You need to use the return value of the method in an assignment, like
> so:
>
> bestandsnaam_nieuw = bestandsnaam.replace(KLANTNAAM_OUT,KLANTNAAM_IN)
>
> This will not change the value of bestandsnaam

YESS! Thanks, this is what I wanted to achieve but could not find.

Regards,
Bob
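Below, Anthony's fix is applied inside the loop, assuming the goal is to rename the files and directories on disk. This is a minimal Python 3 sketch; rename_all and its replacements argument are hypothetical names. It keeps the result of each replace() and then calls os.rename.

It also loops over dirs + files. The posted expression "dirs and files" evaluates to files when dirs is non-empty, and to the empty dirs list otherwise. As a result, directories are never visited, and files in directories without subdirectories are skipped.

```python
import os
import tempfile


def rename_all(start, replacements):
    """Rename every file and directory below `start`, applying each (old, new) pair.

    Walks bottom-up (topdown=False) so a directory is renamed only after
    everything inside it, which keeps the paths of its contents valid.
    Returns the list of (old_name, new_name) pairs that were renamed.
    """
    renamed = []
    for root, dirs, files in os.walk(start, topdown=False):
        for name in dirs + files:
            new_name = name
            for old, new in replacements:
                # str.replace returns a new string; keep the result
                new_name = new_name.replace(old, new)
            if new_name != name:
                # Note: on POSIX, os.rename silently replaces an existing file named new_name
                os.rename(os.path.join(root, name), os.path.join(root, new_name))
                renamed.append((name, new_name))
    return renamed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "klantA_docs"))
        open(os.path.join(top, "klantA_docs", "klantA_report.txt"), "w").close()
        for old, new in rename_all(top, [("klantA", "klantB")]):
            print(old, "->", new)
```

The lowercase variants from the original (ZOEKSET1_OUT_LOWER and so on) would simply be further pairs in the replacements list.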
Newbie: plist & control characters?
I'm trying to modify a plist file. The modification works properly, but I'm having difficulty finding the proper way to save the result back. The file contains HTML character references like "&#226;". Either these get replaced by "â" (which I don't want) and the programme completes, or the programme fails when I try to use the data wrapper. I must be doing something wrong, but I can't find what (and this includes Google searches). Please help.

Failing text:

def omkattenAgents(AANGRIJPINGSPUNT, KLANTNAAM_OUT, KLANTNAAM_IN, ZOEKSET1_OUT, ZOEKSET1_IN, ZOEKSET2_OUT, ZOEKSET2_IN):
    tussentekst = plistlib.Data(plistlib.readPlist(os.path.join(root,bestandsnaam)))
    tussenblok = tussentekst.data
    ding = tussenblok['RepresentedObject']
    tekststring = ding.get('Name')
    tekststring_0 = tekststring.replace(KLANTNAAM_OUT,KLANTNAAM_IN)
    tekststring_1 = tekststring_0.replace(ZOEKSET1_OUT,ZOEKSET1_IN)
    tekststring_2 = tekststring_1.replace(ZOEKSET2_OUT,ZOEKSET2_IN)
    tekststring_3 = tekststring_2.replace(ZOEKSET1_OUT_LOWER,ZOEKSET1_IN_LOWER)
    tekststring_4 = tekststring_3.replace(ZOEKSET2_OUT_LOWER,ZOEKSET2_IN_LOWER)
    ding['Name'] = tekststring_4
    tussenblok['RepresentedObject'] = ding
    tussentekst.data = tussenblok
    plistlib.Data(plistlib.writePlist(tussentekst, os.path.join(root,bestandsnaam)))

Text in Terminal:

  File "LeadDevice_klant_nieuw_aanmaken.py", line 66, in
    omkattenAgents(AANGRIJPINGSPUNT, KLANTNAAM_OUT, KLANTNAAM_IN, ZOEKSET1_OUT, ZOEKSET1_IN, ZOEKSET2_OUT, ZOEKSET2_IN)
  File "/Volumes/LeadDevice-2/LeadDevice/Programmatuur/Python/LeadDeviceProductie/LeadDevice_klant_nieuw_naamcorrectie_intern.py", line 162, in omkattenAgents
    plistlib.Data(plistlib.writePlist(tussentekst, os.path.join(root,bestandsnaam)))
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plistlib.py", line 94, in writePlist
    writer.writeValue(rootObject)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plistlib.py", line 254, in writeValue
    self.writeData(value)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plistlib.py", line 267, in writeData
    for line in data.asBase64(maxlinelength).split("\n"):
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plistlib.py", line 379, in asBase64
    return _encodeBase64(self.data, maxlinelength)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/plistlib.py", line 361, in _encodeBase64
    chunk = s[i : i + maxbinsize]
TypeError: unhashable type

The difference between failure and completion (with the references replaced) is this statement:

    tussentekst.data = tussenblok    (fails)
    tussentekst = tussenblok         (completes, but the references get replaced)

Regards,
Bob
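readPlist already returns ordinary Python objects, a dict in this case. plistlib.Data is meant only for wrapping raw bytes, which is why writePlist tries to base64-encode the dict and fails.

On the &#226; question: to an XML parser, a character reference and the literal character â are the same text. plistlib writing â (encoded as UTF-8) should therefore not change what the plist means.

Below is a minimal Python 3 sketch that edits the dict directly. It uses plistlib.loads/dumps, since readPlist/writePlist were removed in Python 3.9. The function name rename_in_plist is hypothetical; the key names mirror the post above.

```python
import plistlib


def rename_in_plist(data, replacements):
    """Return new plist bytes with (old, new) pairs applied to RepresentedObject/Name.

    `data` is the raw content of the plist file.
    """
    top = plistlib.loads(data)  # a plain dict; no Data wrapper needed
    obj = top["RepresentedObject"]
    name = obj.get("Name", "")
    for old, new in replacements:
        name = name.replace(old, new)
    obj["Name"] = name
    return plistlib.dumps(top)  # XML plist, UTF-8 encoded


if __name__ == "__main__":
    source = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>RepresentedObject</key>
  <dict><key>Name</key><string>Agent &#226; klantA</string></dict>
</dict>
</plist>
"""
    print(rename_in_plist(source, [("klantA", "klantB")]).decode("utf-8"))
```

To work on a file, read its bytes in binary mode, pass them through the function, and write the result back in binary mode.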