On Friday, 7 June 2013 5:29:25 PM UTC+3, user MRAB wrote:

> This is a worse way of doing it because the ISO-8859-7 encoding has 1
> byte per codepoint, meaning that it's more 'tolerant' (if that's the 
> word) of errors. A sequence of bytes that is actually UTF-8 can be
> decoded as ISO-8859-7, giving gibberish.

> UTF-8 is less tolerant, and it's the encoding that ideally you should 
> be using everywhere, so it's better to assume UTF-8 and, if it fails,  
> try ISO-8859-7 and then rename so that any names that were ISO-8859-7
> will be converted to UTF-8.
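
To see the difference described above for myself, I tried a small experiment. This is only a sketch: the sample name is made up for illustration and is not one of my real files.

```python
# Hypothetical sample name ("Ελλάδα.txt"), written with escapes so the
# snippet does not depend on the encoding of the source file itself.
name = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1.txt"

utf8_bytes = name.encode("utf-8")
greek_bytes = name.encode("iso-8859-7")

# UTF-8 bytes read as ISO-8859-7: no exception for this sample,
# but the resulting text is not the original name.
mojibake = utf8_bytes.decode("iso-8859-7")
print(mojibake == name)   # False

# ISO-8859-7 bytes read as UTF-8: the decoder rejects them.
try:
    greek_bytes.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)           # True
```

So trying UTF-8 first and falling back to ISO-8859-7 only on failure seems to be the safer order.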

Indeed, I wasn't aware of that. At the time I was under the impression that if 
a string was encoded to bytes using some charset, it could only be decoded back 
with that and only that charset. Since this is the case, here is my fix:


#========================================================
# Collect filenames of the path dir as bytes
filename_bytes = os.listdir( b'/home/nikos/public_html/data/apps/' )

for filename in filename_bytes:
        # Join the directory path and the current filename, both as bytes
        filepath_bytes = b'/home/nikos/public_html/data/apps/' + filename
        flag = False
        
        try:
                # Assume current file is utf8 encoded
                filepath = filepath_bytes.decode('utf-8')
                flag = 'utf8' 
        except UnicodeDecodeError:
                try:
                        # Since current filename is not utf8 encoded then it has to be greek-iso encoded
                        filepath = filepath_bytes.decode('iso-8859-7')
                        flag = 'greek'
                except UnicodeDecodeError:
                        print( '''I give up! File name is unreadable!''' )
        
        if( flag = 'greek' )
                # Rename filename from greek bytes --> utf-8 bytes
                os.rename( filepath_bytes, filepath.encode('utf-8') )
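
The decode step in the loop above could also be pulled out into a small helper. This is just a sketch of the same try-UTF-8-then-Greek idea; the name `decode_filename` and the `(text, encoding)` return value are my own assumptions and are not part of the real script.

```python
def decode_filename(raw):
    """Return (text, encoding) for a bytes filename, trying UTF-8 first.

    Returns (None, None) if neither encoding can decode the bytes.
    """
    for encoding in ('utf-8', 'iso-8859-7'):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This encoding does not fit; try the next candidate
            continue
    return None, None


# Hypothetical sample name, encoded both ways
sample = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1.txt"
print(decode_filename(sample.encode('utf-8'))[1])
print(decode_filename(sample.encode('iso-8859-7'))[1])
```

A rename would then only be needed when the reported encoding is 'iso-8859-7'.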


#========================================================
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )

# Load'em
for filename in filenames:
        try:
                # Check the presence of a file against the database and insert if it doesn't exist
                cur.execute('''SELECT url FROM files WHERE url = %s''', filename )
                data = cur.fetchone()
                
                if not data:
                        # First time for file; primary key is automatic, hit is defaulted
                        cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, lastvisit) )
        except pymysql.ProgrammingError as e:
                print( repr(e) )


#========================================================
filenames = os.listdir( '/home/nikos/public_html/data/apps/' )
filepaths = set()

# Build a set of 'path/to/filename' based on the objects of path dir
for filename in filenames:
        filepaths.add( filename )

# Delete spurious 
cur.execute('''SELECT url FROM files''')
data = cur.fetchall()

# Check database's filenames against path's filenames
for rec in data:
        # Each row comes back as a 1-tuple, so compare its first column
        if rec[0] not in filepaths:
                cur.execute('''DELETE FROM files WHERE url = %s''', rec )
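
The same check could probably be written as a set difference. A rough sketch, assuming the cursor returns each row as a 1-tuple; the helper name `stale_urls` is hypothetical and the rows below are made-up sample data, not my real table.

```python
def stale_urls(db_rows, filenames):
    """Return the urls that are in the database rows but not on disk."""
    db_urls = {row[0] for row in db_rows}   # first column of every row
    return db_urls - set(filenames)


# Made-up sample data standing in for cur.fetchall() and os.listdir()
rows = [('a.txt',), ('b.txt',), ('c.txt',)]
on_disk = ['a.txt', 'c.txt']
print(stale_urls(rows, on_disk))
```

Each url in the result would then get its own DELETE.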

=============================
ni...@superhost.gr [~/www/cgi-bin]# [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 81
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]     if( flag == 'greek' )
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]                         ^
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] SyntaxError: invalid syntax
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] Premature end of script headers: files.py
-------------------------------
I don't know why that if statement errors.
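
A quick experiment hints at where to look, though I have not confirmed it against the real script: the line in the traceback has no colon after the condition, and the version pasted above uses `=` where a comparison needs `==`. The sketch below feeds the variants to the built-in `compile()`; the helper name `compiles` is made up for this test.

```python
def compiles(source):
    """Return True if the source parses, False on a SyntaxError."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except SyntaxError:
        return False


no_colon   = "if( flag == 'greek' )\n        pass\n"    # as in the error log
assignment = "if( flag = 'greek' ):\n        pass\n"    # '=' instead of '=='
corrected  = "if flag == 'greek':\n        pass\n"      # comparison plus colon

print(compiles(no_colon), compiles(assignment), compiles(corrected))
```

Only the last form parses here, so the missing colon looks like the most likely cause of the error on line 81.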
