Hi Hector,

2012/1/23 Hector <hecto...@gmail.com>

> Hi,
> I'm trying to use pytables to do fast queries on text corpora. These
> are plain text files with millions of sentences. Because the sentences
> are not only in English but in other languages, like Spanish, I need
> to also use utf-8 encoding. A simple table description should look
> like this:
>
> token: variable-length unicode
> sentence: variable-length unicode
>
> The 'token' column should have an index to allow fast queries on it. I
> don't even need to modify the table once it's loaded. All I care about
> is the querying speed.
>
> Maybe I'm missing something obvious here, but in the docs it seems
> that the only way to use unicode in pytables is by the VLUnicodeAtom
> class, but if I use this in the description of a createTable call I
> get the following error:
>
> TypeError: Passing an incorrect value to a table column. Expected a
> Col (or subclass) instance and got: "VLUnicodeAtom()". Please make use
> of the Col(), or descendant, constructor to properly initialize
> columns.
>
> Could you please provide a skeleton for how pytables works with such a
> table?
>

Unfortunately, Unicode columns in Table objects are not supported.  Storing
the sentences in a VLArray seems like a more sensible approach.  Could you
tell us why you are after using a Table?
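
In case it helps, here is a minimal sketch of the VLArray approach.  It
assumes the PEP 8 method names of PyTables 3.x (open_file, create_vlarray);
older releases spelled these openFile and createVLArray.  The file name,
node name and sample sentences are made up for illustration, and the sketch
does not address indexing of the 'token' field.

```python
import os
import tempfile

import tables

# Hypothetical sample data: one Unicode sentence per row.
sentences = [
    u"El gato duerme.",
    u"The cat sleeps.",
    u"¿Dónde está la biblioteca?",
]

# Write to a throwaway location; the file name is an assumption.
path = os.path.join(tempfile.mkdtemp(), "corpus.h5")

with tables.open_file(path, mode="w") as h5:
    # VLUnicodeAtom is valid for VLArray nodes, not for Table columns.
    vlarray = h5.create_vlarray(
        h5.root,
        "sentences",
        atom=tables.VLUnicodeAtom(),
        title="One sentence per row",
    )
    for sentence in sentences:
        vlarray.append(sentence)

with tables.open_file(path, mode="r") as h5:
    stored = h5.root.sentences
    n_rows = stored.nrows   # number of sentences stored
    second = stored[1]      # a single row comes back as a Unicode string
    loaded = stored.read()  # all rows, as a list of Unicode strings
```

Each row of the VLArray holds one sentence of arbitrary length, so nothing
has to be padded or truncated to a fixed size.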

-- 
Francesc Alted
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
