If you have the time, I would suggest building a prototype with both databases
and trying them out. You should also have some idea of how this system might
evolve in the future, because that could well decide the question. Mongo or
Cassandra may both work today, but if your requirements evolve in a way that
suits Cassandra better, you would be better off going with Cassandra.
As others have pointed out, each database has its own strengths. Given that you
may store a 20 KB to 600 MB row, you may be able to model it with Mongo as well
as Cassandra. If you plan on having a separate index outside the database, such
as Elasticsearch or Solr, I would suggest going with Cassandra.
Other factors to consider are licensing, operational cost, etc.
Dinesh
On Thursday, May 31, 2018, 9:01:09 AM PDT, Sudhakar Ganesan
<[email protected]> wrote:
At a high level, machines on the production line will provide data as CSV
files at intervals ranging from every second to every minute to once a day
(depending on the machine type used in the line operations). I need to parse
those files, load them into a DB, and build an API layer to expose the data to
downstream systems.
Number of files to be processed: 13,889,660,134 per day.
Each file could range from 20 KB to 600 MB, which translates into a few hundred
rows to millions of rows.
High availability with a high write rate. Reads are fewer than writes.
While extracting the rows, a few validations need to be performed.
Build an API layer on top of the data persisted in the DB.
Now, tell me what would be the best choice…
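For a sense of scale, the figures above can be turned into an average ingest rate. This is only a back-of-envelope sketch: it assumes the load is spread evenly over the day and that 1 KB means 1000 bytes, neither of which is stated in the thread, and the function names are made up for illustration.

```python
# Back-of-envelope sizing from the figures quoted in the message above.
FILES_PER_DAY = 13_889_660_134      # from the message
SECONDS_PER_DAY = 24 * 60 * 60
MIN_FILE_BYTES = 20 * 1000          # assumption: 20 KB with 1 KB = 1000 bytes


def files_per_second(files_per_day: int = FILES_PER_DAY) -> float:
    """Average arrival rate, assuming files arrive evenly through the day."""
    return files_per_day / SECONDS_PER_DAY


def min_bytes_per_day(files_per_day: int = FILES_PER_DAY,
                      min_file_bytes: int = MIN_FILE_BYTES) -> int:
    """Lower bound on daily volume if every file were the smallest size."""
    return files_per_day * min_file_bytes


if __name__ == "__main__":
    print(f"{files_per_second():,.0f} files/second on average")
    print(f"{min_bytes_per_day() / 1e12:,.1f} TB/day at the minimum file size")
```

Even the lower bound is far beyond a single node, so whichever database is prototyped should be tested against a realistic fraction of this rate.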
From: Russell Bateman [mailto:[email protected]]
Sent: Thursday, May 31, 2018 7:36 PM
To: [email protected]
Subject: Re: Mongo DB vs Cassandra
Sudhakar,
MongoDB will accommodate loading CSV without regard to schema while still
creating identifiable "columns" in the database, but you'll have to predict or
back-impose some schema later if you're going to create indices for fast
searching of the data. You can perform searching of data without indexing in
MongoDB, but it's slower.
Cassandra will require you to understand the schema, i.e., what the columns
are, up front, unless you're just going to store the data without a schema and,
therefore, without the ability to search effectively.
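To make the difference concrete, here is a minimal sketch using only the Python standard library, with no database driver involved. The column names, sample rows, and function names are hypothetical, since the thread does not describe the actual CSV layout. The first loader keeps whatever columns the header declares, roughly the way a document store would accept them; the second applies a schema declared up front and sets aside rows that do not fit it.

```python
import csv
import io

# Hypothetical sample data; the real column names are not given in the thread.
SAMPLE = (
    "machine_id,ts,temperature\n"
    "m1,2018-05-31T09:00:00,71.5\n"
    "m2,2018-05-31T09:00:01,not-a-number\n"
)


def load_schemaless(text):
    """Keep whatever columns the header declares; every value stays a string."""
    return list(csv.DictReader(io.StringIO(text)))


# Schema declared up front: column name -> converter (assumed for illustration).
SCHEMA = {"machine_id": str, "ts": str, "temperature": float}


def load_with_schema(text, schema=SCHEMA):
    """Return (valid_rows, rejected_rows) after applying the declared schema."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            valid.append({name: convert(row[name])
                          for name, convert in schema.items()})
        except (KeyError, ValueError, TypeError):
            # Missing column or a value that does not convert: reject the row.
            rejected.append(row)
    return valid, rejected


if __name__ == "__main__":
    print(load_schemaless(SAMPLE))
    print(load_with_schema(SAMPLE))
```

The schema-first loader is also a natural place for the row validations mentioned earlier in the thread, since a rejected row can be logged or routed elsewhere before anything is written.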
As suggested already, you should share more detail if you want good advice.
Both DBs are excellent. Both do different things in different ways.
Hope this helps,
Russ
On 05/31/2018 05:49 AM, Sudhakar Ganesan wrote:
Team,
I need to make a decision on MongoDB vs Cassandra for loading CSV file data
and storing the CSV files as well. If any of you have done such a study in the
last couple of months, please share your analysis or observations.
Regards,
Sudhakar