Cassandra named fields support

2014-12-05 Thread Dan Hanley
Hi
I'm using Gora (0.3) to pipe Nutch (2.2.1) data into Cassandra. Eventually I'm 
hoping to analyse it with Spark.

The Gora-Cassandra mapping puts everything in three legacy-style Cassandra 
tables, f, p and sc, all created roughly like:

CREATE TABLE p (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY ((key), column1)
) WITH COMPACT STORAGE AND ...

This is not easy to parse as an RDD in Spark.
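
To illustrate the difficulty: in this layout each logical record is spread over several (key, column1, value) rows, so a reader has to group rows by key and translate qualifier bytes before it sees named fields. Below is a minimal plain-Python sketch of that pivot, with no Spark or Cassandra driver involved. It assumes column1 holds the qualifier from the Gora mapping as raw bytes; the sample rows and the names QUALIFIER_TO_FIELD and pivot_rows are hypothetical, and the qualifier-to-field pairs come from the mapping quoted later in this message.

```python
from collections import defaultdict

# Qualifier -> field name, taken from the mapping quoted in this message.
# Assumption: column1 stores the qualifier as raw bytes.
QUALIFIER_TO_FIELD = {
    b"t": "title",
    b"c": "text",
    b"sig": "signature",
    b"psig": "prevSignature",
}


def pivot_rows(rows):
    """Group (key, column1, value) rows into one dict of named fields per key."""
    records = defaultdict(dict)
    for key, column1, value in rows:
        # Unknown qualifiers are kept under their decoded raw name, not dropped.
        field = QUALIFIER_TO_FIELD.get(column1, column1.decode("utf-8", "replace"))
        records[key][field] = value
    return dict(records)


# Hypothetical sample rows in the legacy layout (every column is a blob).
legacy_rows = [
    (b"com.example:http/", b"t", b"Example Domain"),
    (b"com.example:http/", b"c", b"Some page text"),
    (b"com.example:http/", b"sig", b"\x01\x02"),
    (b"org.example:http/", b"t", b"Another page"),
]

records = pivot_rows(legacy_rows)
```

In Spark the same step would be a group-by-key over the whole table, which is the extra work a table with named columns would avoid.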

It would be easier if e.g. the mapping:

<field name="title" family="p" qualifier="t"/>
<field name="text" family="p" qualifier="c"/>
<field name="signature" family="p" qualifier="sig"/>
<field name="prevSignature" family="p" qualifier="psig"/>

Produced a table like:

CREATE TABLE p (
  key blob,
  title blob,
  text blob,
  signature blob,
  prevSignature blob,
  PRIMARY KEY (key)
) ...
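
The translation being asked for, from field mapping to a table with named columns, can be sketched in a few lines of plain Python. This is only an illustration of the proposal, not something Gora is known to do: the FIELDS list and the function create_table_statements are hypothetical, and every column is typed blob purely to match the table above.

```python
# (field name, family) pairs from the mapping above; qualifiers are not needed
# once columns are named after the fields themselves.
FIELDS = [
    ("title", "p"),
    ("text", "p"),
    ("signature", "p"),
    ("prevSignature", "p"),
]


def create_table_statements(fields):
    """Return one CREATE TABLE statement per column family."""
    by_family = {}
    for name, family in fields:
        by_family.setdefault(family, []).append(name)

    statements = []
    for family, names in by_family.items():
        lines = ["CREATE TABLE {} (".format(family), "  key blob,"]
        lines += ["  {} blob,".format(name) for name in names]
        lines += ["  PRIMARY KEY (key)", ");"]
        statements.append("\n".join(lines))
    return statements


ddl = create_table_statements(FIELDS)
```

One detail to watch: CQL folds unquoted identifiers to lower case, so prevSignature would be stored as prevsignature unless the name is double-quoted.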

So my question: is this possible in more recent versions of Gora? If not, is 
it something I could reasonably expect to develop myself? (I have no 
familiarity with the Gora codebase, so any pointers would be welcome.)

Best Regards

Dan


Dan Hanley
CTO, ActiveStandards


Re: Cassandra named fields support

2014-12-05 Thread Lewis John Mcgibbney
Hi Dan,
I am currently working on implementing GORA-267 [0], Cassandra composite
primary key support, within the gora-cassandra module.
I agree with you that the physical mapping you see is not easy to unpack
and parse within Spark. We also permit the use of legacy super columns
within gora-cassandra, which we should migrate away from.

I'll look into the gora-cassandra codebase soon and provide more detail on
what you/we would need to do to meet your requirements.
Thanks
Lewis

[0] https://issues.apache.org/jira/browse/GORA-267
