Cassandra named fields support

Dan Hanley Fri, 05 Dec 2014 05:58:54 -0800

Hi
I'm using Gora (0.3) to pipe Nutch (2.2.1) data into Cassandra; eventually I'm 
hoping to analyse it with Spark.


The Gora-Cassandra mapping puts everything in three legacy-style Cassandra 
tables (f, p and sc), all created roughly like:

CREATE TABLE p (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY ((key), column1)
) WITH COMPACT STORAGE AND....

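This is not easy to parse as an RDD in Spark, because each field of a record 
arrives as its own (key, column1, value) row rather than as a named column.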
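
A minimal sketch of the regrouping step a reader of these tables has to do 
before working with whole records. It is plain Python, not Spark or Gora code. 
The function name pivot_rows and the sample rows are made up for illustration, 
and it assumes column1 holds the UTF-8 bytes of the mapping qualifier (e.g. "t" 
for title):

```python
from collections import defaultdict


def pivot_rows(rows):
    """Group (key, column1, value) triples into one dict of fields per key."""
    records = defaultdict(dict)
    for key, column, value in rows:
        # Assumption: column1 is the UTF-8 encoded qualifier, e.g. b"t".
        records[key][column.decode("utf-8")] = value
    return dict(records)


# Hypothetical sample rows, shaped like the compact-storage table above.
rows = [
    (b"page1", b"t", b"Home"),
    (b"page1", b"c", b"Welcome"),
    (b"page2", b"t", b"About"),
]
records = pivot_rows(rows)
```

In Spark the same grouping would have to run over the whole table (keyed by 
"key") before any per-page analysis could start.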

It would be easier if e.g. the mapping:

<field name="title" family="p" qualifier="t"/>
<field name="text" family="p" qualifier="c"/>
<field name="signature" family="p" qualifier="sig"/>
<field name="prevSignature" family="p" qualifier="psig"/>

produced a table like:

CREATE TABLE p (
  key blob,
  title blob,
  text blob,
  signature blob,
  prevSignature blob,
  PRIMARY KEY (key)
) ....
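
To make the idea concrete, here is a rough sketch of deriving such a table 
definition from the field mapping. This is not how Gora works today and uses no 
Gora API. The function name create_table_cql and the wrapping <fields> element 
are assumptions made only so the fragment parses on its own, and every column 
is typed as blob as in the example above:

```python
import xml.etree.ElementTree as ET


def create_table_cql(mapping_xml, family):
    """Build a CREATE TABLE statement with one blob column per mapped field."""
    root = ET.fromstring(mapping_xml)
    # Keep only the fields mapped to the requested column family.
    names = [f.get("name") for f in root.iter("field") if f.get("family") == family]
    lines = [f"CREATE TABLE {family} (", "  key blob,"]
    lines += [f"  {name} blob," for name in names]
    lines += ["  PRIMARY KEY (key)", ")"]
    return "\n".join(lines)


# Hypothetical wrapper element around two of the <field> lines shown above.
mapping = """
<fields>
  <field name="title" family="p" qualifier="t"/>
  <field name="text" family="p" qualifier="c"/>
</fields>
"""
cql = create_table_cql(mapping, "p")
```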

So my question: is this possible in more recent versions of Gora? If not, is 
it something I could reasonably expect to develop myself? (I have no 
familiarity with the Gora codebase, so any pointers would be welcome.)

Best Regards

Dan


Dan Hanley
CTO, ActiveStandards
Direct: +44 (0)207 019 4718
Switchboard: +44 (0)20 7019 4700
www.activestandards.com

ActiveStandards, Studio 1001 Highgate Studios, 53-79 Highgate Road, London, NW5 
1TL
Registered in England: No. 3592714, VAT No. 625574723
