Working with legacy data via CQL

Erik Forsberg Tue, 11 Nov 2014 06:02:15 -0800

Hi!

I have some data in a table created using thrift. In cassandra-cli, the
'show schema' output for this table is:


create column family Users
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'LexicalUUIDType'
  and column_metadata = [
    {column_name : 'date_created',
    validation_class : LongType},
    {column_name : 'active',
    validation_class : IntegerType,
    index_name : 'Users_active_idx_1',
    index_type : 0},
    {column_name : 'email',
    validation_class : UTF8Type,
    index_name : 'Users_email_idx_1',
    index_type : 0},
    {column_name : 'username',
    validation_class : UTF8Type,
    index_name : 'Users_username_idx_1',
    index_type : 0},
    {column_name : 'default_account_id',
    validation_class : LexicalUUIDType}];

From cqlsh, it looks like this:

[cqlsh 4.1.1 | Cassandra 2.0.11 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh:test> describe table Users;

CREATE TABLE "Users" (
  key 'org.apache.cassandra.db.marshal.LexicalUUIDType',
  column1 ascii,
  active varint,
  date_created bigint,
  default_account_id 'org.apache.cassandra.db.marshal.LexicalUUIDType',
  email text,
  username text,
  value text,
  PRIMARY KEY ((key), column1)
) WITH COMPACT STORAGE;

CREATE INDEX Users_active_idx_12 ON "Users" (active);

CREATE INDEX Users_email_idx_12 ON "Users" (email);

CREATE INDEX Users_username_idx_12 ON "Users" (username);

Now, when I try to extract data from this using cqlsh or the
python-driver, I have no problems getting data for the columns which are
actually UTF8, but for those where column_metadata has been set to
something else, there's trouble. Example using the python driver:

-- snip --

In [8]: u = uuid.UUID("a6b07340-047c-4d4c-9a02-1b59eabf611c")

In [9]: sess.execute('SELECT column1,value from "Users" where key = %s and column1 = %s', [u, 'username'])
Out[9]: [Row(column1='username', value=u'uc6vf')]

In [10]: sess.execute('SELECT column1,value from "Users" where key = %s and column1 = %s', [u, 'date_created'])
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-10-d06f98a160e1> in <module>()
----> 1 sess.execute('SELECT column1,value from "Users" where key = %s and column1 = %s', [u, 'date_created'])

/home/forsberg/dev/virtualenvs/ospapi/local/lib/python2.7/site-packages/cassandra/cluster.pyc
in execute(self, query, parameters, timeout, trace)
   1279         future = self.execute_async(query, parameters, trace)
   1280         try:
-> 1281             result = future.result(timeout)
   1282         finally:
   1283             if trace:

/home/forsberg/dev/virtualenvs/ospapi/local/lib/python2.7/site-packages/cassandra/cluster.pyc
in result(self, timeout)
   2742                     return PagedResult(self, self._final_result)
   2743             elif self._final_exception:
-> 2744                 raise self._final_exception
   2745             else:
   2746                 raise OperationTimedOut(errors=self._errors,
last_host=self._current_host)

UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 6:
unexpected end of data

-- snap --

cqlsh gives me similar errors.

Can I tell the python driver to parse some column values as integers, or
is this an unsupported case?
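
For illustration, the decoding being asked for can be sketched as below.
This is only a sketch under assumptions: it presumes the raw bytes of
"value" are available somehow (which is exactly the open question), it
uses Python 3 for int.from_bytes, and the helper names and the DECODERS
mapping are hypothetical, not part of the python-driver. The byte layouts
follow the validation classes in the schema: LongType is an 8-byte
big-endian signed integer, IntegerType is a variable-length big-endian
two's-complement integer, and LexicalUUIDType is 16 UUID bytes.

```python
import struct
import uuid


def decode_long(raw):
    """Decode a LongType cell: 8-byte big-endian signed integer."""
    if len(raw) != 8:
        raise ValueError("LongType expects 8 bytes, got %d" % len(raw))
    return struct.unpack(">q", raw)[0]


def decode_varint(raw):
    """Decode an IntegerType cell: big-endian two's-complement, any length."""
    return int.from_bytes(raw, byteorder="big", signed=True)


def decode_lexical_uuid(raw):
    """Decode a LexicalUUIDType cell: the 16 raw bytes of a UUID."""
    return uuid.UUID(bytes=raw)


# Hypothetical mapping from the column1 name to a decoder, mirroring the
# column_metadata of the Users column family. Anything not listed falls
# back to default_validation_class (UTF8Type).
DECODERS = {
    "date_created": decode_long,
    "active": decode_varint,
    "default_account_id": decode_lexical_uuid,
}


def decode_cell(column1, raw):
    """Decode the raw bytes of one cell according to its column name."""
    decoder = DECODERS.get(column1)
    if decoder is None:
        return raw.decode("utf-8")
    return decoder(raw)


if __name__ == "__main__":
    # Made-up sample bytes, not data taken from the real table.
    print(decode_cell("username", b"uc6vf"))
    print(decode_cell("active", b"\x01"))
    print(decode_cell("date_created", struct.pack(">q", 1415714535000)))
```

With something like decode_cell applied per row, the same query could
serve both the UTF8 columns and the typed ones, provided the driver can
be made to hand back undecoded bytes for "value".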

For sure this is an ugly table, but I have data in it, and I would like
to avoid having to rewrite all my tools at once, so if I could support
it from CQL that would be great.

Regards,
\EF
