If you have the time, I would suggest creating a prototype with both databases 
and trying them out. You should also have some idea of how this system might 
evolve in the future; that could very well help you make a decision. MongoDB 
may work today, but if your requirements evolve in a way that works better with 
Cassandra, you might be better off going with Cassandra from the start.
As others have pointed out, each database has its own strengths. Given that you 
may store a 20 KB to 600 MB row, you may be able to model it with MongoDB as 
well as Cassandra. If you plan on having a separate index such as Elasticsearch 
or Solr outside the database, I would suggest going with Cassandra.
Other factors to consider are licensing, operational cost, etc. 

    On Thursday, May 31, 2018, 9:01:09 AM PDT, Sudhakar Ganesan 
<sudhakar.gane...@flex.com.INVALID> wrote:  
At a high level, in the production line, machines will provide data in the form 
of CSV files, one every 1 second to 1 minute to 1 day (depending on the machine 
type used in the line operations). I need to parse those files, load them into 
the DB, and build an API layer to expose the data to downstream systems.
Number of files to be processed: 13,889,660,134 per day.
Each file could range from 20 KB to 600 MB, which will translate into a few 
hundred rows to millions of rows.
High availability with a high write volume; reads are fewer compared to writes.
While extracting the rows, a few validations are to be performed.
Build an API layer on top of the data persisted in the DB.
Now, tell me what would be the best choice…
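For what it's worth, the parse-and-validate step described above can be sketched with just the Python standard library. The column names and validation rules here are hypothetical, since the actual machine CSV format isn't specified in the thread:

```python
import csv
import io

def parse_machine_csv(text):
    """Parse one machine CSV file, returning (valid_rows, errors).

    Hypothetical format: each row has machine_id, timestamp, value.
    """
    valid, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header
        # Example validations: required field present, value is numeric.
        if not row.get("machine_id"):
            errors.append((lineno, "missing machine_id"))
            continue
        try:
            row["value"] = float(row["value"])
        except (TypeError, ValueError):
            errors.append((lineno, "non-numeric value"))
            continue
        valid.append(row)
    return valid, errors

sample = ("machine_id,timestamp,value\n"
          "M1,2018-05-31T09:00:00,42.5\n"
          ",2018-05-31T09:00:01,7\n"
          "M2,2018-05-31T09:00:02,oops\n")
rows, errs = parse_machine_csv(sample)
```

At ~14 billion files per day, the real loader would of course need to be parallelized; this only illustrates the per-file validation logic.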
From: Russell Bateman [mailto:r...@windofkeltia.com]
Sent: Thursday, May 31, 2018 7:36 PM
To: user@cassandra.apache.org
Subject: Re: Mongo DB vs Cassandra

MongoDB will accommodate loading CSV without regard to schema while still 
creating identifiable "columns" in the database, but you'll have to predict or 
back-impose some schema later if you're going to create indices for fast 
searching of the data. You can perform searching of data without indexing in 
MongoDB, but it's slower.
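To illustrate the schema-on-read point: each CSV row can become a document whose fields are whatever the header happens to contain, with no table definition declared up front. A minimal sketch using only the standard library (the pymongo call is shown as a comment; the collection and field names are hypothetical):

```python
import csv
import io

csv_text = ("machine_id,temp,pressure\n"
            "M1,70.2,1.01\n"
            "M2,68.9,0.99\n")

# Each row becomes a dict keyed by the header columns -- no schema declared.
docs = list(csv.DictReader(io.StringIO(csv_text)))

# With pymongo these would be loaded as-is, e.g.:
#   db.readings.insert_many(docs)
# An index (e.g. on "machine_id") can be back-imposed later for fast searches.
```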

Cassandra will require you to understand the schema, i.e., what the columns 
are, up front, unless you're just going to store the data without a schema and, 
therefore, without the ability to search effectively.
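For comparison, a hypothetical CQL sketch of what "schema up front" means here: the partition and clustering keys must be chosen before any data is loaded, because they determine which queries are possible. Table and column names below are illustrative only:

```sql
-- Rows from one source file live in one partition, ordered by line number.
CREATE TABLE readings_by_file (
    file_id    text,      -- partition key: which CSV file the row came from
    line_no    bigint,    -- clustering key: row order within the file
    machine_id text,
    payload    text,
    PRIMARY KEY ((file_id), line_no)
);
```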

As suggested already, you should share more detail if you want good advice. 
Both DBs are excellent. Both do different things in different ways.

Hope this helps,
On 05/31/2018 05:49 AM, Sudhakar Ganesan wrote:

I need to make a decision on MongoDB vs Cassandra for loading the CSV file 
data, and for storing the CSV files as well. If any of you have done such a 
study in the last couple of months, please share your analysis or observations.