Re: Running Hadoop jobs over compressed column families with DataStax

2014-04-29 Thread marlon hendred
I was able to solve the issue. There was a second layer of compression
happening in the DAO, which used java.util.zip.Deflater/Inflater on top of
the Snappy compression defined on the CF. The solution was to extend
CassandraStorage and override the getNext() method. The new implementation
calls super.getNext() and inflates the tuples where appropriate.
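
Roughly, it looks like this (a sketch rather than my exact code; the
recursive walk and the try-then-fall-back inflation below are one way to
approximate "where appropriate"):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

import org.apache.cassandra.hadoop.pig.CassandraStorage;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;

public class InflatingCassandraStorage extends CassandraStorage {

    @Override
    public Tuple getNext() throws IOException {
        Tuple t = super.getNext();
        if (t != null) {
            inflateInPlace(t); // undo the DAO's Deflater pass on column values
        }
        return t;
    }

    // CassandraStorage nests columns as inner tuples (and bags for wide
    // rows), so walk the whole structure and inflate every bytearray that
    // turns out to be DEFLATE data.
    private static void inflateInPlace(Tuple t) throws ExecException {
        for (int i = 0; i < t.size(); i++) {
            Object field = t.get(i);
            if (field instanceof Tuple) {
                inflateInPlace((Tuple) field);
            } else if (field instanceof DataBag) {
                for (Tuple inner : (DataBag) field) {
                    inflateInPlace(inner);
                }
            } else if (field instanceof DataByteArray) {
                t.set(i, maybeInflate((DataByteArray) field));
            }
        }
    }

    // "Where appropriate": try to inflate; values that are not DEFLATE data
    // (e.g. the TimeUUID key or column names) are returned untouched.
    private static DataByteArray maybeInflate(DataByteArray value) {
        try {
            return new DataByteArray(inflate(value.get()));
        } catch (IOException e) {
            return value;
        }
    }

    private static byte[] inflate(byte[] deflated) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(deflated);
        ByteArrayOutputStream out =
                new ByteArrayOutputStream(Math.max(64, deflated.length * 4));
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n > 0) {
                    out.write(buf, 0, n);
                } else if (!inflater.finished()) {
                    // no progress and not finished: truncated or not DEFLATE
                    throw new IOException("Bad DEFLATE stream");
                }
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IOException("Not DEFLATE data", e);
        } finally {
            inflater.end();
        }
    }
}

Then point the load at the subclass instead, using CassandraStorage's usual
cassandra:// URI rather than cql://, e.g. rows = LOAD 'cassandra://Keyspace/CF'
USING InflatingCassandraStorage(); and cast the inflated value to chararray to
see the JSON.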

-Marlon


Running Hadoop jobs over compressed column families with DataStax

2014-04-23 Thread marlon hendred
Hi,

I'm attempting to dump a Pig relation of a compressed column family. It's a
single column whose value is a JSON blob. It's compressed via Snappy
compression and the value validator is BytesType. After I create the
relation and dump it, I get garbage. Here is the describe output:

ColumnFamily: CF
  Key Validation Class: org.apache.cassandra.db.marshal.TimeUUIDType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
  GC grace seconds: 86400
  Compaction min/max thresholds: 2/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Populate IO Cache on flush: false
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
    sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

Pig stuff:
rows = LOAD 'cql://Keyspace/CF' using CqlStorage();

I've tried to override the schema by adding 'as (key: chararray, col1:
chararray, value: chararray)', but when I dump this it still looks like
binary.

Do I need to implement my own CqlStorage() here that decompresses the
values, or am I just missing something? I've done some googling but haven't
seen anything on the subject. Also, I'm using DataStax Enterprise 3.1.
Thanks in advance!

-m