Hi,

I'm looking for help in working out the pros & cons of using the MAP, UDT & 
JSON (TEXT) data types in Cassandra, and their ease of use/impact across other 
DSE products - Spark & Solr. We are migrating an OLTP database from RDBMS to 
Cassandra; the table has 200+ columns and an average volume of 25 million 
records/day. The access pattern is quite simple: in OLTP, access is always 
based on the primary key. For OLAP, there are other access patterns with a 
combination of columns, where we are planning to use Spark & Solr for search & 
analytical capabilities (in a separate DC).


The average size of each record is ~2KB and the application workload is 
INSERT-only (no updates/deletes). We conducted performance tests on two types 
of data models:

1) A table with 200+ columns similar to RDBMS

2) A table with 15 columns where only critical business fields are maintained 
as key/value pairs and the remaining fields are stored in a single column of 
type TEXT as a JSON object.
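
As a rough illustration of model 2, here is a minimal Python sketch of how a 
wide record could be split into a few critical columns plus one JSON payload 
destined for the TEXT column. The field names (record_id, account_id, 
event_time, payload) and both helper functions are hypothetical and only for 
illustration; they are not our actual schema or application code.

```python
import json

# Hypothetical critical fields, for illustration only (not the real schema).
CRITICAL_FIELDS = ("record_id", "account_id", "event_time")


def to_json_model_row(record: dict) -> dict:
    """Split a wide record into critical columns plus one JSON TEXT payload."""
    row = {name: record[name] for name in CRITICAL_FIELDS}
    rest = {k: v for k, v in record.items() if k not in CRITICAL_FIELDS}
    # sort_keys makes the serialized payload deterministic for a given record;
    # compact separators avoid storing unnecessary whitespace.
    row["payload"] = json.dumps(rest, sort_keys=True, separators=(",", ":"))
    return row


def from_json_model_row(row: dict) -> dict:
    """Rebuild the original wide record from a JSON-model row."""
    record = {k: v for k, v in row.items() if k != "payload"}
    record.update(json.loads(row["payload"]))
    return record
```

With this layout, any reader that needs a non-critical field (Spark, Solr or 
the application) has to parse the payload first, which is the transformation 
cost I am asking about below.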


In the results, we noticed a significant advantage for the JSON model, whose 
performance was 5X better than that of the columnar data model. We are also 
in the process of evaluating the performance of other data types - MAP & UDT - 
instead of using TEXT to store the JSON object. Sample data model structures 
for the columnar, JSON, MAP & UDT types are given below:


[Inline image: sample data model structures for the columnar, JSON, MAP & UDT types]


I would like to know the performance, transformation, compatibility & 
portability impacts & ease of use of each of these data types from a Search & 
Analytics perspective (Spark & Solr). I'm aware that we will have to use field 
transformers in Solr to index JSON fields, but I'm not sure about MAP & UDT. 
Any help comparing these data types in Spark & Solr is highly appreciated.


Regards, KR
