On Saturday, 13 April 2013 4:41:57 AM UTC+3, Cameron Simpson wrote:
> On 11Apr2013 09:55, Nikos <nagia.rets...@gmail.com> wrote:
> | On Thursday, 11 April 2013 1:45:22 PM UTC+3, Cameron Simpson wrote:
> | > On 10Apr2013 21:50, nagia.rets...@gmail.com <nagia.rets...@gmail.com> wrote:
> | > | the doctype is coming from the attempt of script metrites.py to
> | > | open and read the 'index.html' file.
> | > | But i don't know how to try to open it as a byte file instead of
> | > | a text file.
>
> Lele Gaifax showed one way:
>
>     from codecs import open
>     with open('index.html', encoding='utf-8') as f:
>         content = f.read()
>
> But a plain open() should also do:
>
>     with open('index.html') as f:
>         content = f.read()
>
> if you're not taking tight control of the file encoding.
>
> The point here is to get _text_ (i.e. str) data from the file, not bytes.
>
> If the text turns out to be incorrectly decoded (i.e. incorrectly
> reading the file bytes and assembling them into text strings) because
> the default encoding is wrong, then you may need to reach for Lele's
> more verbose open() example to select the correct encoding.
>
> But first ignore that and get text (str) instead of bytes.
> If you're already getting text from the file, something later is
> making bytes and handing it to print().
>
> Another approach to try is to use
>     sys.stdout.write()
> instead of
>     print()
>
> The print() function will take _anything_ and write text of some form.
> The write() function will throw an exception if it gets the wrong type
> of data.
>
> If sys.stdout is opened in binary mode then write() will require
> bytes as data; strings will need to be explicitly turned into bytes
> via .encode() in order to not raise an exception.
>
> If sys.stdout is open in text mode, write() will require str data.
> The sys.stdout file itself will transcribe to bytes for you.
>
> If you take that route, at least you will not have confusion about
> str versus bytes.
>
> For an HTML output page I would advocate arranging that sys.stdout
> is in text mode; that way you can do the natural thing and .write()
> str data and lovely UTF-8 bytes will come out the other end.
>
> If the above test (using .write() instead of print()) shows it to
> be in binary mode we can fix that. But you need to find out.
>
> You will want access to the error messages from the CGI environment;
> do you have access to the web server's error_log? You can tail that
> in a terminal while you reload the page to see what's going on.
>
> | This works in the shell, but doesn't work on my website:
> |
> | $ cat utf8.txt
> | υλικό!Πρόκειται γ
>
> Ok, so your terminal is using UTF-8 as its output coding. (And so
> is your mail posting program, since we see it unmangled on my screen
> here.)
>
> | $ python3
> | Python 3.2.3 (default, Oct 19 2012, 20:10:41)
> | [GCC 4.6.3] on linux2
> | Type "help", "copyright", "credits" or "license" for more information.
> | >>> data = open('utf8.txt').read()
> | >>> print(data)
> | υλικό!Πρόκειται γ
>
> Likewise.
>
> However, in an exciting twist, I seem to recall that Python invoked
> interactively with a terminal as output will have the default terminal
> encoding in place on sys.stdout. Producing what you expect. _However_,
> python invoked in a batch environment where stdout is not a terminal
> (such as in the CGI environment producing your web page), that is
> _not_ necessarily the case.
>
> | >>> print(data.encode('utf-8'))
> | b'\xcf\x85\xce\xbb\xce\xb9\xce\xba\xcf\x8c!\xce\xa0\xcf\x81\xcf\x8c\xce\xba\xce\xb5\xce\xb9\xcf\x84\xce\xb1\xce\xb9 \xce\xb3\n'
> |
> | See, the last line is what I am getting on my website.
>
> The above line takes your Unicode text in "data" and transcribes
> it to bytes using UTF-8 as the encoding. And print() is then receiving
> that bytes object and printing its str() representation as "b'....'".
> That str is itself unicode, and when print passes it to sys.stdout,
> _that_ transcribes the unicode "b'...'" string as bytes to your
> terminal. Using UTF-8 based on the previous examples above, but
> since all those characters are in the bottom 127 code range the
> byte sequence will be the same if it uses ASCII or ISO8859-1 or
> almost anything else :-)
>
> As you can see, there's a lot of encoding/decoding going on behind
> the scenes even in this superficially simple example.
>
> | If i remove
> | the encode('utf-8') part in metrites.py, the webpage will not show
> | anything at all...
>
> Ah, but data will be being output. The print() function _will_ be
> writing "data" out in some form. I suggest you remove the .encode()
> and then examine the _source_ text of the web page, not its visible
> form.
>
> So: remove .encode(), reload the web page, "view page source"
> (depends on your browser; it is Ctrl-U in Firefox, Cmd-U in Firefox
> on a Mac).
>
> I think a lot of the issue you have in this thread is that your
> page is too complex. Make another page to do the same thing, and
> start with nothing. Add stuff to it a single item at a time until
> the page behaves incorrectly. Then you will know the exact item of
> code that introduced the issue. And then that single item can be
> examined in detail for the decode/encode issues.
>
> The other issue in the thread is that people losing patience get
> snarky. Respond only to the technical content. If a message is only
> snarky, _ignore_ it. People like the last word; let them have it
> and you won't get sidetracked into arguments.
>
> Cheers,
> --
> Cameron Simpson <c...@zip.com.au>
>
> PCs are like a submarine, it will work fine till you open Windows. - zollie101

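For reference, the text-mode stdout arrangement advocated in the quoted message might be sketched roughly as follows. This is only an illustration, not what metrites.py actually does: the helper names `utf8_text_stream` and `render_page` are made up here, and an in-memory `io.BytesIO` stands in for the real binary stdout so that the bytes produced can be inspected.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so write() takes str and emits UTF-8 bytes.

    Hypothetical helper for illustration; newline="\n" turns off
    platform-specific newline translation on output.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


def render_page(out, body):
    """Write a minimal CGI response (header, blank line, body) as str."""
    out.write("Content-Type: text/html; charset=utf-8\n\n")
    out.write(body)
    out.flush()


# Stand-in for the real binary stdout, so the result can be examined.
raw = io.BytesIO()
page = utf8_text_stream(raw)
render_page(page, "<p>υλικό</p>\n")
sent = raw.getvalue()  # the UTF-8 bytes that would reach the web server

# A text-mode stream refuses bytes outright instead of printing "b'...'".
rejected = False
try:
    page.write("υλικό".encode("utf-8"))
except TypeError:
    rejected = True
```

In a real CGI script the same wrapper would presumably be applied to `sys.stdout.buffer` rather than to a `BytesIO`, assuming the script's stdout exposes a `.buffer` attribute, which is worth checking in the CGI environment.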
First of all, thank you very much, Cameron, for your detailed help and for taking the effort to write to me.

It seems another issue had crept in without my knowledge: I was uploading files to /root/public_html/cgi-bin instead of /home/nikos/public_html/cgi-bin. I realized that when I deliberately introduced an error into the metrites.py script and still got the same page.

Okay, after correcting that, I tried the plain open() solution and got this response back from the shell:

Traceback (most recent call last):
  File "metrites.py", line 213, in <module>
    htmldata = f.read()
  File "/root/.local/lib/python2.7/lib/python3.3/encodings/iso8859_7.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0xae in position 47: character maps to <undefined>

Then I switched to:

    with open('/home/nikos/www/' + page, encoding='utf-8') as f:
        htmldata = f.read()

and I got no error at all, just a clean run *from the shell*! But I get an internal server error when I try to load the webpage from the browser (Chrome).

So, can you please tell me where I can find the Apache error log, so that I can post it here? Is the Apache error_log always better than running 'python3 metrites.py', since even if the Python script has no errors, Apache will also show more web-related problems?

Thank you Cameron.
--
http://mail.python.org/mailman/listinfo/python-list
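
One plausible reading of the traceback above is that the file is stored as UTF-8 while the default encoding picked up by a plain open() was ISO-8859-7, in which byte 0xae has no character assigned. The sketch below reproduces that situation under those assumptions; the sample text and the temporary file are invented for the demonstration and are not the real index.html.

```python
import os
import tempfile

# U+03AE (Greek small eta with tonos) is encoded as bytes CE AE in UTF-8.
sample = "<p>\u03ae</p>\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")  # throwaway file for the demo

    # Save the text as UTF-8, as the real page is assumed to be.
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Reading with the wrong codec: 0xae is undefined in ISO-8859-7,
    # so decoding fails in the same way as in the traceback.
    wrong_codec_error = None
    try:
        with open(path, encoding="iso8859_7") as f:
            f.read()
    except UnicodeDecodeError as exc:
        wrong_codec_error = exc

    # Naming the encoding explicitly recovers the original text.
    with open(path, encoding="utf-8") as f:
        htmldata = f.read()
```

Passing `encoding='utf-8'` makes the result independent of whatever locale the shell or the web server happens to run under, which may matter here since the two environments can differ.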