Hi,

I'm using Gora (0.3) to pipe Nutch (2.2.1) data into Cassandra. Eventually I'm hoping to analyse it with Spark.
The Gora-Cassandra mapping puts everything into three legacy-style Cassandra tables, f, p and sc, all created roughly like:

    CREATE TABLE p (
      key blob,
      column1 blob,
      value blob,
      PRIMARY KEY ((key), column1)
    ) WITH COMPACT STORAGE AND....

This is not easy to parse as an RDD in Spark, because each field of a record is stored as a separate (key, column1, value) row rather than as a named column.

It would be easier if, for example, the mapping:

    <field name="title" family="p" qualifier="t"/>
    <field name="text" family="p" qualifier="c"/>
    <field name="signature" family="p" qualifier="sig"/>
    <field name="prevSignature" family="p" qualifier="psig"/>

produced a table like:

    CREATE TABLE p (
      key blob,
      title blob,
      text blob,
      signature blob,
      prevSignature blob,
      PRIMARY KEY (key)
    ) ....

So my question: is this possible in more recent versions of Gora? If not, is it something I could reasonably expect to develop myself? I have no familiarity with the Gora codebase, so any pointers would be welcome.

Best Regards
Dan

Dan Hanley
CTO, ActiveStandards
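
P.S. To make the desired shape concrete, here is a minimal sketch of the pivot I would otherwise have to do client-side: grouping (key, column1, value) rows into one record per key. It rests on assumptions I have not verified: that column1 holds the raw qualifier bytes from the mapping, and that the rows have already been read into plain tuples. The names pivot_rows and QUALIFIER_TO_FIELD are hypothetical, and the values are left as the raw bytes Gora wrote.

```python
from collections import defaultdict

# Qualifier -> field name, copied from the Gora mapping for family "p".
# Assumption: column1 in the compact-storage table holds these raw bytes.
QUALIFIER_TO_FIELD = {
    b"t": "title",
    b"c": "text",
    b"sig": "signature",
    b"psig": "prevSignature",
}


def pivot_rows(rows, qualifier_to_field=QUALIFIER_TO_FIELD):
    """Group (key, column1, value) triples into one dict of fields per key.

    Values are kept as raw bytes; decoding whatever serialization Gora
    applied is deliberately out of scope for this sketch.
    """
    records = defaultdict(dict)
    for key, column1, value in rows:
        field = qualifier_to_field.get(bytes(column1))
        if field is not None:  # skip qualifiers that are not in the mapping
            records[bytes(key)][field] = bytes(value)
    return dict(records)


if __name__ == "__main__":
    sample = [
        (b"page1", b"t", b"A title"),
        (b"page1", b"c", b"Some text"),
        (b"page2", b"sig", b"\x01\x02"),
    ]
    for key, fields in pivot_rows(sample).items():
        print(key, fields)
```

The same grouping could presumably be done on a Spark RDD of such triples, but having Gora write named columns in the first place would avoid the extra step.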

