Hi Florian,

It depends on a number of factors. How much data are you querying? Where is the data stored (HDD, SSD or DRAM)? What is the file format (Parquet or CSV)?
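
To see why data size and storage medium matter so much, here is a rough back-of-the-envelope sketch in Python. The throughput figures are assumed round numbers for illustration only, not measurements, and the names (`ASSUMED_THROUGHPUT`, `scan_seconds`, `devices_needed`) are hypothetical helpers, not part of Spark. It estimates a lower bound on full-scan time and how many devices would have to be read in parallel to hit a target latency.

```python
import math

# Assumed sequential-read throughput in bytes/second. These are illustrative
# round numbers, not benchmarks; substitute figures measured on your hardware.
ASSUMED_THROUGHPUT = {
    "HDD": 150e6,
    "SSD": 500e6,
    "DRAM": 10e9,
}


def scan_seconds(data_bytes, medium, devices=1):
    """Lower bound on the time to read data_bytes once,
    assuming the data is striped evenly across `devices`."""
    return data_bytes / (ASSUMED_THROUGHPUT[medium] * devices)


def devices_needed(data_bytes, medium, target_seconds):
    """Smallest number of devices, read in parallel, that can
    deliver data_bytes within target_seconds."""
    return math.ceil(data_bytes / (ASSUMED_THROUGHPUT[medium] * target_seconds))


if __name__ == "__main__":
    one_tb = 1e12  # 1 TB, decimal
    for medium in ASSUMED_THROUGHPUT:
        secs = scan_seconds(one_tb, medium)
        print(f"{medium}: about {secs:,.0f} s for a full 1 TB scan")
    print("HDDs needed for a 1 s scan:", devices_needed(one_tb, "HDD", 1.0))
```

With these assumed figures, a full scan of 1 TB from one HDD takes well over an hour, so anything close to real time needs either thousands of disks read in parallel or a way to avoid the full scan. This ignores CPU, network, and file-format decoding costs, so real numbers will be worse.
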
In theory, it is possible to use Spark SQL for real-time queries, but cost increases as the data size grows. If you can store all of your data in memory, then you should be able to query it in real time ☺

At the other extreme, if Spark SQL has to read a terabyte of data from spinning disk, there is no way it can respond in real time. To be fair, no software can read a terabyte of data from a single HDD in real time. Simple laws of physics. You will either have to spread the reads over a large number of disks and read them in parallel, or index the data so that your queries don't have to read a terabyte of data from disk.

Hope that helps.

Mohammed

From: Denny Lee [mailto:denny.g....@gmail.com]
Sent: Monday, July 6, 2015 4:21 AM
To: spierki; user@spark.apache.org
Subject: Re: Spark SQL queries hive table, real time ?

Within the context of your question, Spark SQL utilizing the Hive context is primarily about very fast queries. If you want to use real-time queries, I would utilize Spark Streaming. A couple of great resources on this topic include Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms and Optimization <http://www.slideshare.net/tathadas/guest-lecture-on-spark-streaming-in-standford> and Recipes for Running Spark Streaming Applications in Production <https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/> (from the recent Spark Summit 2015).

HTH!

On Mon, Jul 6, 2015 at 3:23 PM spierki <florian.spierc...@crisalid.com> wrote:

Hello,

I'm currently wondering about the performance of using Spark SQL with Hive to do real-time analytics. I know that Hive was created for batch processing, and that Spark is used to do fast queries. But will using Spark SQL with Hive allow me to do real-time queries? Or will it just make queries faster, but not real-time? Should I use another data warehouse, like HBase?

Thanks in advance for your time and consideration,
Florian