Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-05-02 Thread Davide Alberani
On Mon, May 2, 2011 at 08:47, darklow dark...@gmail.com wrote: Thank you for your patience and for guiding me through the tests, I'm really glad we finally found the problem and fixed it. Yep, even if it took a little too long. :-) Just curious, why only me and one other user encountered this

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-28 Thread Davide Alberani
On Thu, Apr 28, 2011 at 22:52, darklow dark...@gmail.com wrote: However, the last command, pip install IMDbPY, didn't succeed so well; it looks like I got exactly the same error that another user reported some days ago in the same discussion, and he also has a UTF-8 encoding problem: Sure: you don't

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-26 Thread darklow
Thanks, let me know if you have any ideas how to fix the problem... About virtualenv. I was also quite pedantic about ignoring the virtualenv solution - I am a programmer, not a system administrator, I am not familiar with Python, I understand the code logic, but haven't coded any application so far, just

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-26 Thread Davide Alberani
On Tue, Apr 26, 2011 at 09:36, darklow dark...@gmail.com wrote: Thanks, let me know if you have any ideas how to fix the problem... Eh... As usual, right now I'm really busy. :-( I looked at the virtualenv documentation, I didn't understand how to use it, Ok, let's try: - download virtualenv

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-24 Thread darklow
There have never been any issues with our PostgreSQL database; we have always used UTF-8 and are using it this time too. I have tried plenty of scripts and workarounds so far, many decode().encode() attempts, but nothing helps, I just get different errors from these. I also tried adding the following lines, to be

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-24 Thread darklow
Yes, I can confirm - script version 4.6 works perfectly on the same server with the same files. And I think this brings us closer to a solution. Maybe this helps to identify the problem; this is what we did on our server. (Remember, we are doing this copying because there are only stable versions for Debian

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-24 Thread Davide Alberani
On Sun, Apr 24, 2011 at 20:03, Thomas Stewart tho...@stewarts.org.uk wrote: I've just had a try using sqlite with fresh lists and on my Debian system and I get this: thomas@ikaite:~$ /tmp/imdbpy2sql.py -d /home/thomas/Desktop/imdb/lists -u sqlite:///home/thomas/Desktop/imdb/imdb.db

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-24 Thread Davide Alberani
On Sun, Apr 24, 2011 at 21:03, darklow dark...@gmail.com wrote: I tried reinstalling all installed dependencies and running from clean sources, but no luck. I tried to run the scripts with SQLAlchemy instead of SQLObject, but got the same error, so the problem is not there. Perfect - these tests are really

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-24 Thread Davide Alberani
On Sun, Apr 24, 2011 at 22:44, darklow dark...@gmail.com wrote: Yes, I can confirm - script version 4.6 works perfectly on the same server with the same files. And I think this brings us closer to a solution. Excellent! (well, it still baffles me why I'm absolutely unable to reproduce the problem on my

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-23 Thread Davide Alberani
On Wed, Apr 20, 2011 at 14:08, darklow dark...@gmail.com wrote: Still no luck :/ maybe the problem is in some environment variables or settings which are present in the installed version, but are missing or incorrect when running from source? Seems unlikely to me. What about this, I printed out

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-20 Thread darklow
Still no luck :/ maybe the problem is in some environment variables or settings which are present in the installed version, but are missing or incorrect when running from source? What about this, I printed out some variables: print sys.stdout.encoding - UTF-8 print sys.stdin.encoding - UTF-8 print

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-19 Thread Davide Alberani
On Mon, Apr 18, 2011 at 09:30, Davide Alberani davide.alber...@gmail.com wrote: Thanks for the file, I hope to look at it within a day or two. Ok: the file is correctly encoded in iso8859-1, as expected, and contains no garbage. Using it as the only input for imdbpy2sql.py (putting the

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-18 Thread darklow
On Sun, Apr 17, 2011 at 5:13 PM, Davide Alberani davide.alber...@gmail.com wrote: On Sun, Apr 17, 2011 at 14:04, darklow dark...@gmail.com wrote: Updated this morning to the latest data files, no change, and unfortunately this fix also doesn't work. Hmm... to debug a problem like this without

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-18 Thread Davide Alberani
On Mon, Apr 18, 2011 at 08:53, darklow dark...@gmail.com wrote: We have Debian Linux on our server and our sysadmin allows only stable packages. However, the latest version of imdbpy has the md5 checksums that are quite important in our situation, which is why I have to run it from source. Ehhh...

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-17 Thread Petite Abeille
On Apr 13, 2011, at 8:46 AM, darklow wrote: Analyzed the error a bit more. Mostly these errors occur with Japanese actors (actors.list); strange characters appear in the filmography: Sounds like a character set encoding issue. Originally, something like actors.list is ISO-8859-1 encoded.
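
A minimal sketch of the transcoding step this reply points at, assuming Python 3 (the thread itself predates it) and a hypothetical helper name, latin1_to_utf8, that is not part of IMDbPY. It only illustrates why ISO-8859-1 bytes have to be decoded and re-encoded before they reach a UTF-8 database; it is not how imdbpy2sql.py handles the lists.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper)."""
    # Every byte value is a valid ISO-8859-1 character, so this decode
    # step cannot fail; it may still be the wrong guess for the data.
    text = raw.decode("iso-8859-1")
    return text.encode("utf-8")


# 0xE9 is "e with acute accent" in ISO-8859-1; on its own it is not
# valid UTF-8, where the same character takes two bytes (0xC3 0xA9).
sample = b"Am\xe9lie"
converted = latin1_to_utf8(sample)
print(converted)
```

Sending the raw ISO-8859-1 bytes to a UTF-8 connection without this step is one way to end up with an "invalid byte sequence" error from the database.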

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-17 Thread Davide Alberani
On Sun, Apr 17, 2011 at 14:04, darklow dark...@gmail.com wrote: Updated this morning to the latest data files, no change, and unfortunately this fix also doesn't work. Hmm... debugging a problem like this without being able to reproduce it is extremely difficult. :-/ This error started when we

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-16 Thread Davide Alberani
On Wed, Apr 13, 2011 at 08:46, darklow dark...@gmail.com wrote: Maybe someone knows some fast, dirty fix, at least how to skip such invalid byte sequence strings while there is no official fix, so I can finish the import? Can we detect invalid byte characters? Hi again, actually my problem is

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-14 Thread darklow
Unfortunately, adding this line k = k.replace('\xec\x8c\xa0', '') in the place you mentioned doesn't help. Still the same error in the same place :( SCANNING actor: Havel, Jir? * FLUSHING CharactersCache... Traceback (most recent call last): . self.flush() File ./imdbpy2sql.py, line 1195, in

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-13 Thread darklow
Since I am not familiar with Python, maybe you could suggest some fast fix so that the script doesn't hang? Maybe this helps: in PHP we get exactly the same encoding error when importing some wrongly decoded data. When we have no control over the data and we can't call utf8_encode all the time since it

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-13 Thread darklow
Maybe someone knows some fast, dirty fix, at least how to skip such invalid byte sequence strings while there is no official fix, so I can finish the import? Can we detect invalid byte characters? Maybe we can somehow replace or get rid of the *0xc320* character, which is the one that mostly appears. Or skip
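
On the question of detecting invalid bytes, a rough sketch, assuming Python 3 and two hypothetical helper names (first_invalid_utf8 and scrub_utf8) that are not part of IMDbPY. The sequence 0xc320 is invalid because 0xC3 opens a two-byte UTF-8 character and must be followed by a byte in the range 0x80 to 0xBF, while 0x20 is a plain space. This illustrates detection and a lossy workaround only; it is not the fix eventually applied in the thread.

```python
def first_invalid_utf8(raw: bytes):
    """Return the offset of the first invalid UTF-8 byte, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the byte the decoder rejected.
        return exc.start
    return None


def scrub_utf8(raw: bytes) -> str:
    """Decode leniently, replacing each invalid byte with U+FFFD."""
    # Lossy workaround: the original character cannot be recovered.
    return raw.decode("utf-8", errors="replace")


bad = b"Ab\xc3 cd"          # contains the 0xc3 0x20 pair from the error
print(first_invalid_utf8(bad))
print(scrub_utf8(bad))
```

Replacing bytes this way lets an import finish, but it hides the underlying problem, which is usually that the data was never converted from its real encoding.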

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-13 Thread Davide Alberani
On Mon, Apr 11, 2011 at 18:35, darklow dark...@gmail.com wrote:   File ./imdbpy2sql.py, line 1194, in _toDB     CURS.executemany(self.sqlstr, self.converter(l)) psycopg2.DataError: invalid byte sequence for encoding UTF8: 0xc320 HINT:  This error can also happen if the byte sequence does not

Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-11 Thread Davide Alberani
On Mon, Apr 11, 2011 at 18:35, darklow dark...@gmail.com wrote:   File ./imdbpy2sql.py, line 1194, in _toDB     CURS.executemany(self.sqlstr, self.converter(l)) psycopg2.DataError: invalid byte sequence for encoding UTF8: 0xc320 HINT:  This error can also happen if the byte sequence does not