There is not one answer to this. 

It really depends on what kind of time-series analysis you do with the data and
which time-series database you are using. It also depends on what ETL you need
to do.
You also seem to need to join data: is it with existing data of the same type,
or do you join completely different data? If so, where does that data come from?

360 GB per day uncompressed does not sound terribly much.
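
If Spark ends up doing the ETL, the join and the aggregations, the job itself
is usually simple. Here is a minimal PySpark sketch, assuming CSV flat files
with (meter_id, reading_ts, value) columns and a separate meter_info reference
table; the paths, column names and the final Parquet sink are placeholders for
illustration, not a recommendation for the serving database.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("meter-etl").getOrCreate()

# Read the raw flat files (schema inference kept simple for the sketch)
readings = (spark.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("/landing/meter_readings/*.csv"))

# Join against reference data of a different type (e.g. meter metadata)
meter_info = spark.read.parquet("/reference/meter_info")
enriched = readings.join(meter_info, on="meter_id", how="left")

# Daily aggregation per meter/region before handing off to the serving DB
daily = (enriched
         .withColumn("day", F.to_date("reading_ts"))
         .groupBy("meter_id", "region", "day")
         .agg(F.sum("value").alias("daily_consumption"),
              F.count("*").alias("num_readings")))

# Persist for the BI / serving layer; the concrete sink (Cassandra, a TSDB,
# etc.) would replace this Parquet write.
daily.write.mode("overwrite").partitionBy("day").parquet("/curated/daily_consumption")

Whether the serving store should be a NoSQL database or a dedicated TSDB then
comes back to the query patterns, which is why there is no single answer.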

> On 24. May 2018, at 08:49, amin mohebbi <aminn_...@yahoo.com.INVALID> wrote:
> 
> Could you please help me to understand the performance that we get from 
> using Spark with any NoSQL or TSDB? We receive 1 million meters x 288 readings = 
> 288 million rows (approx. 360 GB per day), so we will end up with tens or 
> hundreds of TBs of data, and I feel that NoSQL will be much quicker than 
> Hadoop/Spark. This is time-series data coming from many devices in the form 
> of flat files, and it is currently extracted/transformed/loaded into 
> another database which is connected to BI tools. We might use Azure Data 
> Factory to collect the flat files, then use Spark to do the ETL (not sure 
> if that is the correct way), then use Spark to join tables or do the aggregations 
> and save them into a DB (preferably NoSQL, not sure). Finally, deploy 
> Power BI to visualize the data from the NoSQL DB. My questions are:
> 
> 1- Is the above-mentioned architecture correct? Using Spark with NoSQL, as I 
> think the combination of these two could help to have random access and run 
> many queries by different users. 
> 2- Do we really need to use a time-series DB? 
> 
> 
> Best Regards
> Amin Mohebbi
> PhD candidate in Software Engineering at University of Malaysia
> Tel : +60 18 2040 017
> E-Mail : tp025...@ex.apiit.edu.my
> amin_...@me.com
