Hi,
I'm looking for help comparing the pros & cons of the MAP, UDT & JSON (TEXT) data types in Cassandra, and their ease of use and impact across the other DSE products, Spark & Solr.

We are migrating an OLTP database from an RDBMS to Cassandra. The table has 200+ columns, an average daily volume of 25 million records/day, and an average record size of ~2KB. The workload is INSERT-only (no updates/deletes), and the OLTP access pattern is simple: always by primary key. For OLAP there are other access patterns over combinations of columns, where we plan to use Spark & Solr for search & analytical capabilities (in a separate DC).

We conducted performance tests on two data models:

1) A table with 200+ columns, similar to the RDBMS layout.
2) A table with ~15 columns, where only the critical business fields are kept as individual columns and the remaining fields are stored in a single TEXT column as a JSON object.

In the results we saw a significant advantage for the JSON model, with roughly 5X better performance than the columnar model. We are now evaluating the MAP and UDT data types as alternatives to TEXT for storing the JSON payload. Sample data model structures for the columnar, JSON, MAP & UDT variants are given below:

[inline image: sample data model definitions]

I would like to know the performance, transformation, compatibility, portability, and ease-of-use impacts of each of these data types from a search & analytics perspective (Spark & Solr). I'm aware that we will have to use field transformers in Solr to index the JSON fields, but I'm not sure about MAP & UDT. Any help comparing these data types in Spark & Solr is highly appreciated.

Regards,
KR
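
P.S. Since the inline image may not come through for everyone, here is a minimal CQL sketch of the four variants I'm describing. All table, type, and column names (txn_columnar, txn_json, payload, etc.) are placeholders, not our real schema, and the column lists are heavily truncated:

```sql
-- 1) Columnar: one column per RDBMS field (200+ columns in reality)
CREATE TABLE txn_columnar (
    txn_id    text PRIMARY KEY,
    field_001 text,
    field_002 int
    -- ... remaining ~200 business columns ...
);

-- 2) JSON: ~15 critical business columns, the rest serialized
--    into a single TEXT column as a JSON object
CREATE TABLE txn_json (
    txn_id     text PRIMARY KEY,
    biz_key_01 text,
    -- ... remaining critical business columns ...
    payload    text   -- JSON object holding the non-critical fields
);

-- 3) MAP: the remaining fields stored as key/value pairs
CREATE TABLE txn_map (
    txn_id     text PRIMARY KEY,
    biz_key_01 text,
    payload    map<text, text>
);

-- 4) UDT: the remaining fields grouped into a frozen user-defined type
CREATE TYPE txn_detail (
    field_001 text,
    field_002 int
    -- ... remaining fields ...
);

CREATE TABLE txn_udt (
    txn_id     text PRIMARY KEY,
    biz_key_01 text,
    payload    frozen<txn_detail>
);
```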