Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-05-02 Thread Davide Alberani
On Mon, May 2, 2011 at 08:47, darklow dark...@gmail.com wrote:

 Thank you for your patience and for guiding me through the tests; I'm really glad we
 finally found the problem and fixed it.

Yep, even if it took a little too long. :-)

 Just curious: why did only I and one other user run into this problem, while
 you didn't see the error when you ran the same tests? :)

It may have something to do with the Python library used to connect to
Postgres.  Maybe some libraries handle this kind of error gracefully; I have
to check more carefully the versions installed on my system and in the
virtualenv I've used to reproduce the bug.
In fact, the right thing to do in such cases is to raise an exception (as in
our case); other databases, or the libraries used to connect to them, like MySQL,
simply ignore these errors with a warning (not a great idea).
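Python's own codecs have the same trade-off, which may make the point concrete. This is only an analogy (plain Python 3 string decoding, not the PostgreSQL or MySQL client libraries), and the sample bytes are made up: strict handling raises on a truncated UTF-8 sequence, while lenient handling quietly replaces it and hides the corruption.

```python
# Analogy only: Python's codec error handlers, not a database driver.
# The sample ends with 0xC3, the first byte of a two-byte UTF-8 character
# whose second byte is missing.
BAD = b"Y\xc3\xbbz\xc3"


def decode_strict(data):
    """Raise UnicodeDecodeError on invalid UTF-8 (errors='strict' is the default)."""
    return data.decode("utf-8")


def decode_lenient(data):
    """Silently replace invalid bytes with U+FFFD instead of raising."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        decode_strict(BAD)
    except UnicodeDecodeError as exc:
        print("strict:", exc.reason)  # strict: unexpected end of data
    print("lenient:", ascii(decode_lenient(BAD)))  # lenient: 'Y\xfbz\ufffd'
```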

-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

--
WhatsUp Gold - Download Free Network Management Software
The most intuitive, comprehensive, and cost-effective network 
management toolset available today.  Delivers lowest initial 
acquisition cost and overall TCO of any competing solution.
http://p.sf.net/sfu/whatsupgold-sd
___
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help


Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-28 Thread Davide Alberani
On Thu, Apr 28, 2011 at 22:52, darklow dark...@gmail.com wrote:

 However, the last command, pip install IMDbPY, didn't succeed; it looks like
 I got exactly the same error that another user reported some days ago in
 this same discussion, and he also has the UTF-8 encoding problem:

Sure: you don't have the python-dev package installed
on your system. :-/
A per-user installation is possible, but a little tricky...

 Running python setup.py install, I get the same error. I also tried the
 latest version (4.8dev20110425), but got the same error.

Using the latest version sources, run (after you've activated your
virtualenv!):
  python setup.py install --without-cutils

 Maybe this explains why the script doesn't handle UTF-8 in the first
 place: some strange incompatibility with cutils.c

I've run some tests without the compiled C module, so I think this
is not the cause, but at this point... who knows. :-)



-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-26 Thread darklow
Thanks, let me know if you have any ideas on how to fix the problem...
About virtualenv: I was also quite pedantic about avoiding the virtualenv solution.
I am a programmer, not a system administrator, and I am not familiar with Python:
I understand the code logic, but I haven't coded any application in it so far, just
one test parser to diagnose the error.
I looked at the virtualenv documentation but didn't understand how to use it. The
problem is my limited knowledge of Python and its components; I think you
have to be more familiar with Python, its libraries, and the way they are
installed and configured before installing and configuring virtualenv.
Also, our sysadmin is quite a pain in the a.. It is hard to prove the need for
yet another new tool to install. If it has a stable Debian package,
it is easier, but for all other packages it is almost impossible. Also, I am
not sure I want to intrude on the sysadmin's environment and do some installs by
myself, even if it doesn't require root access..




Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-26 Thread Davide Alberani
On Tue, Apr 26, 2011 at 09:36, darklow dark...@gmail.com wrote:
 Thanks, let me know if you have any ideas, how to fix the problem...

Eh... As usual, right now I'm really busy. :-(

 I looked at virtualenv documentation, i didn't understand how to use it,

Ok, let's try:
- download virtualenv from http://pypi.python.org/pypi/virtualenv#downloads
- tar xvfz virtualenv-1.6.tar.gz
- cd virtualenv-1.6
- python virtualenv.py --no-site-packages ~/myvenv
- cd ~/myvenv
- . ./bin/activate # notice the initial dot
- pip install formencode # bug with the dependencies. :(
- pip install IMDbPY # or download from the Mercurial repository and
  run 'python setup.py install'

The most important step is the activation of the virtualenv: your prompt
should change to something like (myvenv)$ to denote that your virtualenv
is active.

Now, always from inside the virtualenv, you can run the imdbpy2sql.py script:
everything was installed locally to your ~/myvenv/ directory (the local python
interpreter is in ~/myvenv/bin/python).
If you need to deactivate the virtualenv, simply run the 'deactivate' command.
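Besides watching the prompt, you can ask the interpreter itself whether the activation step worked. This is a minimal Python 3 sketch, not part of IMDbPY, and `in_virtualenv` is a hypothetical helper name. It relies on two known markers: classic virtualenv releases (like the 1.6 used here) set `sys.real_prefix`, while the standard `venv` module and newer virtualenv releases make `sys.base_prefix` differ from `sys.prefix`.

```python
import sys


def in_virtualenv():
    """Best-effort check that the running interpreter belongs to a virtualenv."""
    if hasattr(sys, "real_prefix"):  # set by virtualenv releases before 20.x
        return True
    # venv (and virtualenv 20+) leave the base installation in sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtualenv:", in_virtualenv())
```

If the second line prints False after activation, the python being run is probably not the one in ~/myvenv/bin/.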

HTH,
-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-24 Thread darklow
There have never been any issues with our PostgreSQL database; we have always
used UTF-8 and are using it this time too.
I have tried plenty of scripts and workarounds so far, many decode().encode()
attempts, but nothing helps; I just get different errors from them.
I also tried adding the following lines, to be sure everything is fine with
the connection to the database:

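import sys
import psycopg2
import psycopg2.extensions
# Make psycopg2 return text columns as unicode objects.
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

# Python 2 only: site.py removes setdefaultencoding, so sys must be reloaded first.
reload(sys)
sys.setdefaultencoding('utf-8')

# CURS is the cursor opened by imdbpy2sql.py; in PostgreSQL both statements
# set the client encoding.
CURS.execute("SET NAMES 'utf8'")
CURS.execute("SET CLIENT_ENCODING TO 'utf8'")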


But still nothing helps.
I tried reinstalling all the installed dependencies and running from clean sources,
but no luck.
I tried running the scripts with SQLAlchemy instead of SQLObject, but got the same
error, so the problem is not there.

I would like to ask you one thing.
Every test takes about 1h, because the error occurs in the actors cast list.
Could you please tell me the exact list of functions that convert
lines from the file into SQL?
Then I could create a new script that takes a small version of actors.list with
only the problematic lines, runs a few unicode() and decode() calls in the correct
order, and tries to insert these lines into a test table in the database. That way I
could try things much faster, without waiting 1 hour for every try...

What I already tried is opening the actors.list file with PHP, reading every line,
converting each string to UTF-8 with iconv, and inserting it into the PostgreSQL
database, and everything worked fine. It makes me think the problem might be
somewhere in cutting lines into pieces: maybe something goes wrong, cuts
a valid Unicode character into pieces, and so an invalid byte sequence
appears. If I had the correct list of Python functions, I could run more tests.
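A quick local check of this kind can be sketched in Python 3, assuming the lines to test are already UTF-8 byte strings (for example, the output of whatever processing step you want to examine). `find_invalid_utf8` is a hypothetical helper, not an IMDbPY function; it reports the lines that a UTF8 PostgreSQL database would be expected to reject, in seconds rather than an hour.

```python
import sys


def find_invalid_utf8(lines):
    """Return (line_number, description) for each byte string that is not valid UTF-8."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((number, "%s at byte %d" % (exc.reason, exc.start)))
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_utf8.py actors.list.small
    with open(sys.argv[1], "rb") as handle:
        # The source files are ISO-8859-1; re-encode each line as UTF-8.
        converted = [raw.decode("iso-8859-1").encode("utf-8") for raw in handle]
    print(find_invalid_utf8(converted) or "all lines are valid UTF-8")
```

A straight Latin-1 to UTF-8 conversion like the one in the `__main__` block can never produce invalid UTF-8, so if it reports nothing, the damage has to happen in a later step; running the same check on that step's output would narrow it down.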

PS. I'm now running a test with version 4.6, to see if it still works with 4.6;
if it does, we could diagnose the problem more easily by looking at the file changes.
I'll post the results.
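For the file-by-file comparison mentioned in the PS, here is a minimal Python 3 sketch using the standard `filecmp` module. The two directory arguments (for example, unpacked 4.6 and 4.7 source trees) are assumptions, and `changed_files` is a hypothetical helper.

```python
import filecmp
import os


def changed_files(old_root, new_root):
    """List files that differ, or exist on only one side, between two source trees."""
    report = []

    def walk(cmp, prefix):
        for name in cmp.diff_files:
            report.append(("changed", os.path.join(prefix, name)))
        for name in cmp.left_only:
            report.append(("only in old", os.path.join(prefix, name)))
        for name in cmp.right_only:
            report.append(("only in new", os.path.join(prefix, name)))
        # Recurse into directories present in both trees.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(old_root, new_root), "")
    return sorted(report)
```

Note that `dircmp` treats two files with identical size and modification time as equal without reading them. That is normally harmless for freshly unpacked tarballs of different releases.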

Thank you.



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-24 Thread darklow
Yes, I can confirm: version 4.6 of the script works perfectly on the same server with
the same files.
And I think this brings us closer to a solution.
Maybe this helps to identify the problem; this is what we did on our server.
(Remember, we are doing this copying because only stable versions
for Debian are allowed on the server, but we need those md5 hashes from version 4.7.)

1. We installed imdbpy 4.6 with all the dependencies
(python-psycopg2, python-dns, python-formencode, python-pkg-resources,
python-sqlobject).
2. I downloaded version 4.7 and overwrote the following directories with files
from the 4.7 source:

cp -r imdbpy4.7/docs/* /usr/share/doc/python-imdb/
cp -r imdbpy4.7/imdb/* /usr/share/pyshared/imdb/


3. Now I run imdbpy2sql.py from the version 4.7 source like before, and it fails
with invalid byte sequence.
4. I copied the version 4.6 files back to the mentioned directories, and the import
works again with version 4.6.

Looking at the install log, I didn't see any other relevant files that I
should overwrite. So the problem might be in the dependencies.
Do you have any idea where the problem could be, and what else we should
overwrite or update so that v4.7 works?
Thank you.
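To compare the dependency side between two setups, here is a small sketch for Python 3.8+ that prints the installed version of each distribution through the standard `importlib.metadata`. The distribution names are the PyPI ones and are only an assumption about which packages matter here. Debian's python-* packages usually ship the metadata this reads, but that is not guaranteed.

```python
from importlib import metadata

# PyPI distribution names; an assumed list, adjust to what you actually use.
DEPENDENCIES = ("IMDbPY", "SQLObject", "SQLAlchemy", "psycopg2", "FormEncode")


def installed_versions(names=DEPENDENCIES):
    """Map each distribution name to its installed version string, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print("%-12s %s" % (name, version or "not installed"))
```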



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-24 Thread Davide Alberani
On Sun, Apr 24, 2011 at 20:03, Thomas Stewart tho...@stewarts.org.uk wrote:

 I've just had a try using sqlite with fresh lists on my Debian
 system and I get this:

 thomas@ikaite:~$ /tmp/imdbpy2sql.py -d /home/thomas/Desktop/imdb/lists -u 
 sqlite:///home/thomas/Desktop/imdb/imdb.db --sqlite-transactions
 IMPORTING psyco... DONE!
  [...]
    CURS.executemany(self.sqlstr, self.converter(dataList))
 pysqlite2.dbapi2.ProgrammingError: You must not use 8-bit bytestrings unless 
 you use a text_factory that can interpret 8-bit bytestrings (like 
 text_factory = str). It is highly recommended that you instead just switch 
 your application to Unicode strings.

This specific bug (a bad interaction between SQLObject and SQLite) should
be fixed in the version in the Mercurial repository; isn't it?
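The traceback comes from the Python 2 pysqlite driver, and its message recommends passing Unicode strings rather than 8-bit byte strings. Here is a minimal Python 3 sketch of that advice with the standard `sqlite3` module, using a hypothetical `titles` table (not IMDbPY's schema): decode the ISO-8859-1 bytes before they reach the driver. Be aware that Python 3's driver does not behave like the one in the traceback. It does not raise on bytes; it silently stores them as a BLOB.

```python
import sqlite3


def store_title(conn, raw_latin1):
    """Decode ISO-8859-1 bytes to str first, so SQLite stores the value as TEXT."""
    conn.execute("INSERT INTO titles (title) VALUES (?)",
                 (raw_latin1.decode("iso-8859-1"),))


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titles (title TEXT)")  # hypothetical table
    store_title(conn, "Dai akutô (1968)".encode("iso-8859-1"))
    # Binding undecoded bytes does not raise in Python 3: they become a BLOB.
    conn.execute("INSERT INTO titles (title) VALUES (?)",
                 ("Nobi (1959)".encode("utf-8"),))
    return conn.execute(
        "SELECT typeof(title) FROM titles ORDER BY rowid").fetchall()


if __name__ == "__main__":
    print(demo())  # [('text',), ('blob',)]
```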


-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-24 Thread Davide Alberani
On Sun, Apr 24, 2011 at 21:03, darklow dark...@gmail.com wrote:

 I tried reinstalling all installed dependencies and running from clean sources,
 but no luck.
 I tried to run scripts with SQLAlchemy instead of SQLObject, but same error,
 so the problem is not there.

Perfect - these tests are really important to spot the problem.

 Every test takes about 1h, because error takes place in Actors Cast list.

Wait: I'll read the rest of your mails tomorrow, but this can help you
do things faster: you don't need the other files at all.
Simply put the actors.list.gz file in a directory by itself, and run
imdbpy2sql.py with this directory as the -d argument.
You can even use a shorter version of actors.list.gz; just remember to keep
the lines at the beginning and at the end (various separators are used to identify
where the data begin), like I did with the actors.list.gz file that I attached
some days ago.

In the 'docs/goodies' directory you'll find the 'reduce.sh' script, which
takes a whole directory of *.list.gz files and reduces them to 1% of
their length.
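If reduce.sh isn't at hand, the same idea (keep the header and footer lines, take a slice from the middle) can be sketched in Python 3. This is not IMDbPY code. `shrink_list` and all of its default sizes are hypothetical. It also assumes that entries in the list file are separated by blank lines, so it starts and stops the middle slice on a blank line to avoid cutting an actor's entry in half.

```python
import collections
import gzip


def shrink_list(src, dst, head=300, skip=0, body=2000, tail=200):
    """Write a smaller copy of a *.list.gz file (a sketch, not IMDbPY's reduce.sh).

    Keeps the first `head` and last `tail` lines verbatim, since the parser
    relies on separator lines there, plus roughly `body` lines from the middle
    starting after `skip` more lines, beginning and ending on blank lines.
    """
    recent = collections.deque(maxlen=tail)  # (line_number, line) for the tail
    last_written = -1
    body_written = 0
    in_body = body_done = False
    with gzip.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        for number, line in enumerate(fin):
            recent.append((number, line))
            if number < head:
                fout.write(line)
                last_written = number
                continue
            if body_done or number < head + skip:
                continue
            is_blank = not line.strip()
            if not in_body and is_blank:
                in_body = True  # start on an entry boundary
            if in_body:
                fout.write(line)
                last_written = number
                body_written += 1
                if body_written >= body and is_blank:
                    body_done = True  # stop on an entry boundary too
        # Append the tail, skipping anything the head or middle already wrote.
        for number, line in recent:
            if number > last_written:
                fout.write(line)
```

The file is streamed line by line and only the last `tail` lines are buffered, so even the full actors.list.gz does not need to fit in memory.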

 It makes me think the problem might be
 somewhere in cutting lines into pieces: maybe something goes wrong, cuts
 a valid Unicode character into pieces, and so an invalid byte sequence
 appears.

My guess, too... it's just that I can't see where it happens... :-/

Thanks for your tests!

-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-24 Thread Davide Alberani
On Sun, Apr 24, 2011 at 22:44, darklow dark...@gmail.com wrote:
 Yes, I can confirm: version 4.6 of the script works perfectly on the same server with
 the same files.
 And I think this brings us closer to a solution.

Excellent!  (well, it still baffles me why I'm absolutely unable to
reproduce the problem on my system, but that's another story...)

 Maybe this helps to identify the problem; this is what we did on our server.
 (Remember, we are doing this copying because only stable versions
 for Debian are allowed on the server, but we need those md5 hashes from version 4.7.)

I'll look at your setup tomorrow.  I'll surely sound pedantic, but... seriously:
why don't you use a virtualenv environment?  It's easy to install and
doesn't require root privileges.


-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-23 Thread Davide Alberani
On Wed, Apr 20, 2011 at 14:08, darklow dark...@gmail.com wrote:
 Still no luck :/ Maybe the problem is in some environment variables or
 settings that are present with the installed version, but missing or
 incorrect when running from source?

Seems unlikely to me.

 What about this? I printed out some variables:
 print sys.stdout.encoding       # UTF-8
 print sys.stdin.encoding        # UTF-8
 print sys.getdefaultencoding()  # ascii
 Is it OK that sys.getdefaultencoding() == 'ascii'?

These are fine.

I've reproduced your environment, to the best of my ability:
- no IMDbPY installed in the system.
- IMDbPY from source (the latest version in the Mercurial repository),
  setting the PYTHONPATH environment variable to point to the
  source directory.
- the cutils C module was not compiled.
- the latest actors.list.gz file.
- postgres 8.4; my database was created with these settings:
  CREATE DATABASE imdb
    WITH OWNER = postgres
         ENCODING = 'UTF8'
         TABLESPACE = pg_default
         LC_COLLATE = 'it_IT.utf8'
         LC_CTYPE = 'it_IT.utf8'
         CONNECTION LIMIT = -1;

I've run it with your portion and other portions of the actors.list.gz file, and
everything went fine.

Now... if I were you, I'd:
- create a virtualenv environment with:
    virtualenv --no-site-packages
- install IMDbPY in it, using easy_install or pip (the executable in
  your virtualenv, I mean) so that you'll have all the correct dependencies
  available.
- run imdbpy2sql.py within your virtualenv.

If it still fails:
- check your postgres settings.
- try using SQLite (just for a test) - see notes in README.sqldb
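For the "check your postgres settings" step, here is a minimal sketch of what to look at, written against the generic DB-API cursor interface that psycopg2's cursor provides. The helper names and the rule set in `encoding_problems` are hypothetical. `SHOW server_encoding`, `SHOW client_encoding` and the `datcollate`/`datctype` columns of `pg_database` are standard PostgreSQL (the last two exist since 8.4).

```python
def encoding_report(cursor):
    """Collect the PostgreSQL encoding and locale settings via a DB-API cursor."""
    report = {}
    for setting in ("server_encoding", "client_encoding"):
        cursor.execute("SHOW " + setting)
        report[setting] = cursor.fetchone()[0]
    cursor.execute(
        "SELECT datcollate, datctype FROM pg_database"
        " WHERE datname = current_database()")
    report["lc_collate"], report["lc_ctype"] = cursor.fetchone()
    return report


def encoding_problems(report):
    """Flag settings that do not match a UTF-8 import (a rough, assumed rule set)."""
    problems = []
    if report["server_encoding"].upper() != "UTF8":
        problems.append("database encoding is %s, not UTF8" % report["server_encoding"])
    if report["client_encoding"].upper() not in ("UTF8", "UNICODE"):
        problems.append("client_encoding is %s, not UTF8" % report["client_encoding"])
    return problems
```

With psycopg2 this could be called as `encoding_report(psycopg2.connect(dbname="imdb").cursor())`; the connection parameters there are placeholders.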


HTH,
-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-20 Thread darklow
Still no luck :/ Maybe the problem is in some environment variables or
settings that are present with the installed version, but missing or
incorrect when running from source?

What about this? I printed out some variables:

print sys.stdout.encoding       # UTF-8
print sys.stdin.encoding        # UTF-8
print sys.getdefaultencoding()  # ascii

Is it OK that sys.getdefaultencoding() == 'ascii'?

Are there any other variables I should check?




Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-19 Thread Davide Alberani
On Mon, Apr 18, 2011 at 09:30, Davide Alberani
davide.alber...@gmail.com wrote:

 Thanks for the file, I hope to look at it within a day or two.

Ok: the file is correctly encoded in iso8859-1, as expected, and contains
no garbage.

Using it as the only input for imdbpy2sql.py (putting the attached file in
a directory by itself), I can run the script with no errors (besides
the expected warnings about missing files).

I'm using the version from the Mercurial repository, without the cutils.so
library.

If you can't install IMDbPY on your system, please consider using
virtualenv.
Once you've tried that, I have to recommend double-checking the
settings of your PostgreSQL server for inconsistencies
in encodings and collations.

HTH,
-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/


actors.list.gz
Description: GNU Zip compressed data


Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-18 Thread darklow
On Sun, Apr 17, 2011 at 5:13 PM, Davide Alberani
davide.alber...@gmail.com wrote:

 On Sun, Apr 17, 2011 at 14:04, darklow dark...@gmail.com wrote:
  Updated this morning to the latest data files: no change, and unfortunately
  this fix also doesn't work.

 Hmm...  debugging a problem like this without being able to reproduce it
 is extremely difficult. :-/

  This error started when we uninstalled imdbpy (leaving all the dependency libs)
  and started running it without installation. Maybe there is some kind of problem,
  some kind of hidden unicode dependency? Maybe you can try to run it
  without installation, just from source?

 Have you some very good reason to do so? :-)


We have Debian Linux on our server, and our sysadmin allows only stable
packages. However, the latest version of imdbpy has the md5 checksums that are
quite important in our situation; that is why I have to run it from source.


 Can't you try to purge every reference to IMDbPY left on the
 system (search for the scripts in /usr/bin/ and /usr/local/bin/ and
 make sure that 'import imdb' fails at the Python prompt) and see
 if the problem is solved after IMDbPY 4.7 is reinstalled?


Unfortunately, right now I can't reinstall; I can only run it from source.
However, if this is the reason and there is no way to fix it, I'll try
to convince the sysadmin to install this version from unofficial Debian packages.


 If you have problems locating the IMDbPY package, just open
 the Python prompt and:
  import imdb
  print imdb
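On a Python 3 interpreter, a variant of the same check can be done without importing the package at all. This is a minimal sketch using the standard `importlib.util.find_spec`, and `locate` is a hypothetical helper name. A result of None means 'import imdb' would fail, so no stray copy is left on the import path.

```python
import importlib.util


def locate(module_name):
    """Return the file a top-level module or package would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("imdb ->", locate("imdb"))
```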

  Also, every time I start the script I receive two warnings:
  2011-04-17 11:13:37,398 WARNING [imdbpy.parser.sql.aux]
  /data/web/imdb/imdbpy4.7-159671/imdb/parser/sql/__init__.py:125: Unable to
  import the cutils.ratcliff function.  Searching names and titles using the
  sql data access system will be slower.

 This will force IMDbPY to use some pure-python fall-back functions.
 It's entirely possible that there is some bug in these functions, even
 if a run without cutils.so works fine for me (so far).

  IMPORTING psyco... FAILED (not a big deal, everything is alright...)

 That's not a problem for sure.

 Right now, my first guess is that somewhere, after the *.list files are
 read and turned into utf-8 encoded strings, the imdbpy2sql.py
 script does Something Very Wrong(tm) to a string (like cutting it at a
 certain place, ending up cutting a single utf-8 encoded char in two: this
 could explain the error).
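The guess above can be made concrete in a few lines. In this minimal Python 3 sketch (not IMDbPY's code; `truncate_utf8` is a hypothetical helper), cutting UTF-8 bytes at a fixed byte offset leaves half of a two-byte character at the end. That is the kind of sequence a UTF8 database refuses. The helper instead backs off to the previous character boundary.

```python
def truncate_utf8(data, max_bytes):
    """Cut UTF-8 bytes to at most max_bytes without splitting a character.

    Continuation bytes look like 10xxxxxx (0x80-0xBF). If the first byte that
    would be dropped is one of them, the cut falls inside a character, so move
    the cut point back to that character's lead byte.
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


if __name__ == "__main__":
    name = "Hayakawa, Yûzô".encode("utf-8")
    print(name[:12])                # b'Hayakawa, Y\xc3': half of 'û' left over
    print(truncate_utf8(name, 12))  # b'Hayakawa, Y'
```

The safer fix is usually to slice decoded unicode strings rather than encoded bytes; the helper is only needed when a byte limit is unavoidable.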

 I've tried the conversion suggested by Petite Abeille, and it works fine.

 Please, could you cut a small piece (a few kilobytes) of the actors.list
 file and attach it (no cut-and-paste)?
 It goes without saying that you should choose a portion where you see
 (or guess there are) the strange chars :-)


I attached a small part of the actors.list file, right at the place with the
broken characters (I used the Unix sed command to cut the problematic lines out).



 Thanks!

 --
 Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
 http://www.mimante.net/



actors.list.small
Description: Binary data


Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-18 Thread Davide Alberani
On Mon, Apr 18, 2011 at 08:53, darklow dark...@gmail.com wrote:

 We have Debian Linux on our server, and our sysadmin allows only stable
 packages. However, the latest version of imdbpy has the md5 checksums that are
 quite important in our situation; that is why I have to run it from source.

Ehhh... what about a virtual machine or, even easier, virtualenv [0]?

Thanks for the file, I hope to look at it within a day or two.


+++
[0] http://pypi.python.org/pypi/virtualenv
-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-17 Thread Petite Abeille

On Apr 13, 2011, at 8:46 AM, darklow wrote:

 I analyzed the error a bit more. These errors mostly occur with Japanese actors
 (actors.list): strange characters appear in the filmography:

Sounds like a character set encoding issue.

The original data files, such as actors.list, are ISO-8859-1 encoded; IMDbPY
converts them to UTF-8 internally:

http://imdbpy.sourceforge.net/docs/README.utf8.txt

You can check if actors.list is properly encoded by converting it to UTF-8 
outside of IMDbPY.

For example, using iconv:

iconv -f ISO-8859-1 -t UTF-8 < actors.list > actors.list.txt

This should result in a proper UTF-8 encoded file. If anything goes wrong, 
iconv should point out the issue.

For example, the entries for Hayakawa, Yuzo should look like the following:

A, zerosen (1965)  [Tokunaga]
Abunai Deka ritaanzu (1996)
Akumyo ichidai (1967)
Aru joshi kôkôi no kiroku: shisshin (1969)
Chijin no ai (1967)  [Namikawa]
Dai akutô (1968)
Daikaijû kettô: Gamera tai Barugon (1966)  [Kawajiri]  3
Dorodarake no junjô (1977)  [Det. Seki]
Furin (1965)  [Saruoka]  6
Genkai yûkyôden: Yabure kabure (1970)  [Yanagawa]
Haru kôrô no hana no en (1958)  [Sata]
Hiroshima (1995) (TV)  [Koshiro Oikawa]  70
Jet F-104 dassyutsu seyo (1968)
Kaidan otoshiana (1968)  [Sakabe]
Kawaki (1958)  4
Kimimachi-bune (1954)  (as Yûji Hayakawa)  [Tomii]
Konki (1961)
Malenkiy beglets (1966)
Mi wa jukushitari (1959)  [Chef at Mizumi]
Mushukunin Mikogami no Jôkichi: Kiba wa hikisaita (1972)  9
Nagasugita haru (1957)  [Student]
Nihonkai daikaisen: Umi yukaba (1983)  [Kataoka]
Nippon chinbotsu (1974)  [SDF General]
Nobi (1959)  (as Yuji Hayakawa)
Obi o toku Natsuko (1965)  [Kwashima]  6
Okoto to Sasuke (1961)  (as Yûzô Hayakawa)
Onna ga aishite nikumu toki (1963)  [Iwashita]
Onna tobakushi (1967)
Rikugun Nakano gakko (1966)  [Colonel Iwakura]  6
Rikugun Nakano gakko: Ryu-sango shirei (1967)
Sakura no ki no shita de (1989)
Salary man donto bushi - Kiraku na kagyô to kita monda (1962)  (as Yûzô 
Hayakawa)  [Shibayama]
Satsujinsha (1966)
Seisaku no tsuma (1965)  [Sergeant]
Sekkusu chekku: Daini no sei (1968)  [Sasanuma]  5
Shiroi Kyotou (1966)  14
Shuntou (1989) (TV)  15
Tokyo no josei (1960)
Tokyo onigiri musume (1961)  (as Yûzô Hayakawa)  [Draper]
Uchu kaijû Gamera (1980)  [Policeman]
Yoru no wana (1967)  [Fumikichi Hayashi]
Zatôichi rôyaburi (1967)
Zoku sex doctor no kiroku (1968)  5
Kôya no surônin (1972)  11
Sukeban Deka (1985) {Nerawareta atakkâ (#1.10)}  (as Yûzô Hayakawa)  14
Zoku zoku jiken: Tsuki no keshiki (1980) {(#1.2)}  [Dr. Arai]  11

There shouldn't be any strange characters in sight :)
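For anyone who prefers to stay in Python, the same conversion can be sketched with the standard codecs (Python 3; `latin1_to_utf8` and `transcode_file` are illustrative names, not part of IMDbPY). One caveat worth knowing: every one of the 256 byte values is a valid ISO-8859-1 character, so this decoding step never raises by itself; stray bytes come through as odd-looking characters rather than as an error.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8.

    This cannot fail: every byte value 0x00-0xFF maps to a Latin-1 character.
    """
    return data.decode("iso-8859-1").encode("utf-8")


def transcode_file(src_path: str, dst_path: str) -> None:
    """Stream-convert a (possibly very large) ISO-8859-1 file to UTF-8, line by line."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="iso-8859-1", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


if __name__ == "__main__":
    # In ISO-8859-1, "ô" is the single byte 0xF4; in UTF-8 it becomes 0xC3 0xB4.
    print(latin1_to_utf8(b"K\xf4ya no sur\xf4nin (1972)"))
```

Usage on the real data would be `transcode_file("actors.list", "actors.list.txt")`, mirroring the iconv command.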




Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-17 Thread Davide Alberani
On Sun, Apr 17, 2011 at 14:04, darklow dark...@gmail.com wrote:
 I updated to the latest data files this morning: no change, and unfortunately
 this fix doesn't work either.

Hmm... debugging a problem like this without being able to reproduce it
is extremely difficult. :-/

 This error started when we uninstalled IMDbPY (leaving all the dependency
 libraries in place) and started running it without installing it. Maybe there
 is some problem, some kind of hidden Unicode dependency? Could you try running
 it without installation, just from source?

Do you have some very good reason to do so? :-)
Couldn't you try purging every reference to IMDbPY left on the
system (search for the scripts in /usr/bin/ and /usr/local/bin/, and
make sure that "import imdb" fails at the Python prompt), then reinstall
IMDbPY 4.7 and see whether the problem is solved?

If you have problems locating the IMDbPY package, just open
the Python prompt and:
>>> import imdb
>>> print imdb
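On a Python 3 system, a similar check can be done without importing the package at all, which is handy when verifying that a purge really removed every copy. A minimal sketch; `locate_module` is an illustrative helper, not part of IMDbPY:

```python
import importlib.util


def locate_module(name: str):
    """Return the path a top-level module would be imported from, or None if it is not importable.

    importlib.util.find_spec locates a top-level module without executing it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a package this is the path of its __init__.py; for a plain module, its .py/.so file.
    return spec.origin


if __name__ == "__main__":
    where = locate_module("imdb")
    print("imdb is not importable" if where is None else "imdb found at " + str(where))
```

After a complete purge, `locate_module("imdb")` should return None.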

 Also, every time I start the script I receive two warnings:
 2011-04-17 11:13:37,398 WARNING [imdbpy.parser.sql.aux]
 /data/web/imdb/imdbpy4.7-159671/imdb/parser/sql/__init__.py:125: Unable to
 import the cutils.ratcliff function.  Searching names and titles using the
 sql data access system will be slower.

This forces IMDbPY to use some pure-Python fallback functions.
It's entirely possible that these functions contain a bug, even
though a run without cutils.so works fine for me (so far).
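The warning reflects a common Python pattern: try to import a compiled helper and fall back to pure Python if it is missing. A generic sketch of that pattern follows; the module name `fast_similarity` is hypothetical, and the fallback uses `difflib` (whose SequenceMatcher implements a variant of Ratcliff/Obershelp matching) as a stand-in, which is not necessarily what IMDbPY does internally.

```python
import difflib
import logging

log = logging.getLogger("example.similarity")

try:
    # Hypothetical compiled extension; the name only mirrors the idea of cutils.ratcliff.
    from fast_similarity import ratcliff  # type: ignore
except ImportError:
    log.warning("Unable to import fast_similarity.ratcliff; using the slower pure-Python version.")

    def ratcliff(s1: str, s2: str) -> float:
        """Pure-Python fallback: case-insensitive similarity ratio in [0.0, 1.0]."""
        return difflib.SequenceMatcher(None, s1.lower(), s2.lower()).ratio()
```

Callers use `ratcliff(a, b)` the same way whichever implementation was loaded; only speed (and, as noted above, possibly bugs) differs.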

 IMPORTING psyco... FAILED (not a big deal, everything is alright...)

That's definitely not a problem.

Right now, my first guess is that somewhere, after the *.list files are
read and turned into UTF-8 encoded strings, the imdbpy2sql.py
script does Something Very Wrong(tm) to a string (like cutting it at a certain
position and ending up splitting a single UTF-8 encoded character in two: this
could explain the error).
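The hypothesis can be illustrated in isolation with a Python 3 sketch (helper names are illustrative; nothing here is taken from imdbpy2sql.py). Cutting the UTF-8 encoding of a string at an arbitrary byte offset can leave a lone lead byte, and if a space happens to follow, the result contains exactly the bytes 0xC3 0x20.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def cut_bytes(text: str, max_bytes: int) -> bytes:
    """Naive cut: may split a multi-byte character and leave a dangling lead byte."""
    return text.encode("utf-8")[:max_bytes]


def cut_bytes_safely(text: str, max_bytes: int) -> bytes:
    """Cut to at most max_bytes bytes without splitting a character."""
    data = text.encode("utf-8")[:max_bytes]
    # The input was valid UTF-8, so the only possible damage is an incomplete
    # sequence at the very end; errors="ignore" drops it.
    return data.decode("utf-8", errors="ignore").encode("utf-8")


if __name__ == "__main__":
    title = "Kôya no surônin"            # "ô" is encoded as the two bytes 0xC3 0xB4
    broken = cut_bytes(title, 2) + b" "  # lead byte 0xC3 followed by a space (0x20)
    print(broken, is_valid_utf8(broken))
    print(cut_bytes_safely(title, 2))
```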

I've tried the conversion suggested by Petite Abeille, and it works fine.

Could you please cut a small piece (a few kilobytes) of the actors.list file
and attach it (no copy-and-paste)?
It goes without saying that you should choose a portion where you see
(or suspect there are) the strange chars :-)

Thanks!

-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-16 Thread Davide Alberani
On Wed, Apr 13, 2011 at 08:46, darklow dark...@gmail.com wrote:
 Maybe someone knows a quick-and-dirty fix, at least to skip such invalid
 byte sequences until there is an official fix, so I can finish the
 import?
 Can we detect invalid byte characters?

Hi again,
actually, my problem is that I'm unable to reproduce this bug. :-)
Using PostgreSQL and SQLObject, my run goes smoothly.

I downloaded the 'actors.list.gz' file today, so it's possible that some
garbage has since been removed.

Anyway, the previously proposed solution was obviously flawed, since
the problem was in _character_ names.

So, let's edit the imdbpy2sql.py file again and change the lines around 1540
so that they become:

movieid = CACHE_MID.addUnique(title)
if role is not None:
    roles = filter(None, [x.strip() for x in role.split('/')])
    for role in roles:
        role = role.replace('\xec\x8c\xa0', '')  # TEMPORARY FIX
        cid = CACHE_CID.addUnique(role)
        sqldata.add((pid, movieid, cid, note, order))

Maybe this will help... who knows? :-)

-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-14 Thread darklow
Unfortunately, adding the line
k = k.replace('\xec\x8c\xa0', '') in the place you mentioned doesn't help.

Still the same error in the same place :(

SCANNING actor: Havel, Jir?
 * FLUSHING CharactersCache...
Traceback (most recent call last):
 .
    self.flush()
  File "./imdbpy2sql.py", line 1195, in _toDB
    CURS.executemany(self.sqlstr, self.converter(l))
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-13 Thread darklow
Since I am not familiar with Python, maybe you could suggest a quick fix
so that the script doesn't hang?
Maybe this helps: in PHP we get exactly the same encoding error when
importing badly decoded data. When we have no control over the data, we
can't always call utf8_encode, since it could encode a string twice; to
work around this error I use this function, which at least prevents the
PostgreSQL error:

function fix_encoding($in_str) {
    $cur_encoding = mb_detect_encoding($in_str);
    if ($cur_encoding == "UTF-8" && mb_check_encoding($in_str, "UTF-8")) {
        return $in_str;
    } else {
        return utf8_encode($in_str);
    }
}

Maybe you can help adapt this function to Python, if similar functions are
available, so we can use it as a quick fix?
Thanks a lot.
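A rough Python 3 equivalent of the PHP function above might look like this; it is a sketch, not a tested IMDbPY patch, and where to call it inside imdbpy2sql.py is left open. PHP's utf8_encode assumes ISO-8859-1 input, so the fallback branch decodes as Latin-1 and re-encodes as UTF-8; input that is already valid UTF-8 is returned unchanged, which avoids the double encoding mentioned above.

```python
def fix_encoding(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid UTF-8; otherwise treat it as
    ISO-8859-1 (what PHP's utf8_encode assumes) and re-encode it as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1").encode("utf-8")
    return raw


if __name__ == "__main__":
    print(fix_encoding(b"K\xc3\xb4ya"))  # already UTF-8: unchanged
    print(fix_encoding(b"K\xf4ya"))      # Latin-1 "Kôya": converted
    print(fix_encoding(b"Yuz\xc3 "))     # broken UTF-8: becomes "YuzÃ "
```

Note the last case: like the PHP version, this silences the PostgreSQL error but does not repair the data; a dangling 0xC3 turns into the visible character "Ã".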



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-13 Thread darklow
Maybe someone knows a quick-and-dirty fix, at least to skip such invalid
byte sequences until there is an official fix, so I can finish the
import?
Can we detect invalid byte characters? Maybe we can somehow replace or get
rid of the *0xc320* character, which appears most often. Or skip these rows.

I analyzed the error a bit more. These errors mostly occur for Japanese actors
(actors.list); strange characters appear in their filmographies:

Hayakawa, Yuzo

Burai hij*8)*
*
*

I tried to delete these rows manually, but there are too many of them :/
Thank you.
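Detecting and skipping such rows can be sketched in Python 3 as follows (function names are illustrative). One assumption matters: this only makes sense on data that is supposed to be UTF-8 already, such as a converted file or strings just before insertion; run on the raw ISO-8859-1 .list files, it would flag every accented character.

```python
from typing import Iterable, Iterator, Tuple


def find_invalid_utf8(lines: Iterable[bytes]) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (line_number, byte_offset, offending_bytes) for lines that are not valid UTF-8."""
    for lineno, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start/exc.end delimit the first undecodable byte run in the line.
            yield lineno, exc.start, line[exc.start:exc.end]


def valid_lines_only(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Pass through only the lines that decode cleanly, i.e. skip the bad rows."""
    for line in lines:
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            continue
        yield line


if __name__ == "__main__":
    sample = [b"Hayakawa, Yuzo\n", b"Hayakawa, Yuz\xc3 \n", b"K\xc3\xb4ya no sur\xc3\xb4nin (1972)\n"]
    for lineno, offset, bad in find_invalid_utf8(sample):
        print(f"line {lineno}, byte {offset}: {bad!r}")
    print(list(valid_lines_only(sample)))
```

With a file opened in binary mode (`open(path, "rb")`), iterating yields byte lines, so either function can consume it directly.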




Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-13 Thread Davide Alberani
On Mon, Apr 11, 2011 at 18:35, darklow dark...@gmail.com wrote:

   File "./imdbpy2sql.py", line 1194, in _toDB
     CURS.executemany(self.sqlstr, self.converter(l))
 psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320
 HINT:  This error can also happen if the byte sequence does not match the
 encoding expected by the server, which is controlled by "client_encoding".

Hi all,
I'm writing regarding the recent 0xc320 problem with IMDbPY.
The above notice is extremely interesting and should be investigated:
how can 0xc320 not be encodable in UTF-8?
It should work; from the Python prompt:
  >>> unichr(0xc320).encode('utf8')
  '\xec\x8c\xa0'
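One possible reading, offered as an assumption rather than a confirmed diagnosis: older PostgreSQL versions print the offending bytes run together in hex, so "0xc320" may mean the two bytes 0xC3 0x20 rather than the code point U+C320. That would be consistent with the follow-up report that removing '\xec\x8c\xa0' changed nothing. A Python 3 sketch contrasting the two readings:

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Reading 1: "0xc320" as the code point U+C320 (a Hangul syllable).
# Its UTF-8 encoding is the three bytes EC 8C A0, as in the prompt output above.
as_code_point = chr(0xC320).encode("utf-8")

# Reading 2: "0xc320" as the two raw bytes C3 20. 0xC3 starts a two-byte
# sequence, but 0x20 (a space) is not a valid continuation byte.
as_raw_bytes = bytes([0xC3, 0x20])

if __name__ == "__main__":
    print(as_code_point, is_valid_utf8(as_code_point))
    print(as_raw_bytes, is_valid_utf8(as_raw_bytes))
    # A replace() aimed at reading 1 leaves data containing reading 2 untouched.
    row = b"Hayakawa, Yuz\xc3 "
    print(row.replace(as_code_point, b"") == row)
```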

Anyway, as a very fast and dirty fix (the main problem is probably some
crap in the data files), try this: after line 1181 of imdbpy2sql.py, add:
  k = k.replace('\xec\x8c\xa0', '')

So that the nearby lines will become:
try:
    k = k.replace('\xec\x8c\xa0', '')
    t = analyze_name(k)
except IMDbParserError:

Please be aware that this fix was not tested at all, but I'm
almost sure that, at the above point, 'k' is a string encoded in utf8.

Anyway, besides the garbage theory, I have another idea
about the source of the error, but I have to verify it later...

Bye, and let me know if it works!

-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/



Re: [Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding UTF8

2011-04-11 Thread Davide Alberani
On Mon, Apr 11, 2011 at 18:35, darklow dark...@gmail.com wrote:

   File "./imdbpy2sql.py", line 1194, in _toDB
     CURS.executemany(self.sqlstr, self.converter(l))
 psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320
 HINT:  This error can also happen if the byte sequence does not match the
 encoding expected by the server, which is controlled by "client_encoding".

 Any suggestions? I found a similar topic, but it had no solutions either.

Yes, I've had other reports about this bug.
It seems to be related to some garbage in the actors.list.gz file.
I hope to have time to investigate the problem within a week or two.

Thanks for the bug report!

-- 
Davide Alberani davide.alber...@gmail.com  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/
