RE: Spark SQL queries hive table, real time ?

Mohammed Guller Mon, 06 Jul 2015 19:09:28 -0700

Hi Florian,
It depends on a number of factors. How much data are you querying? Where is the 
data stored (HDD, SSD or DRAM)? What is the file format (Parquet or CSV)?


In theory, it is possible to use Spark SQL for real-time queries, but query 
latency and cost increase as the data size grows. If you can keep all of your 
data in memory, then you should be able to query it in real time ☺ At the other 
extreme, if Spark SQL has to read a terabyte of data from spinning disk, there 
is no way it can respond in real time. To be fair, no software can read a 
terabyte of data from HDD in real time; it is simple physics. You will either 
have to spread the reads over a large number of disks and read them in 
parallel, or index the data so that your queries don't have to read a terabyte 
of data from disk.
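The "laws of physics" point can be made concrete with back-of-envelope arithmetic. The Python sketch below is only an estimate under stated assumptions. It assumes a sequential read rate of 150 MB/s per spinning disk, which is an illustrative figure and not a measurement. It assumes the data is split evenly across disks that are read in parallel. It also ignores seeks, network transfer and CPU time. The helper name `scan_seconds` is hypothetical.

```python
# Back-of-envelope scan-time estimate.
# ASSUMPTION: 150 MB/s sequential throughput per HDD (illustrative, not measured).
TB = 10**12                       # bytes in a decimal terabyte
HDD_BYTES_PER_SEC = 150 * 10**6   # assumed sequential read rate of one HDD


def scan_seconds(total_bytes, disks=1, fraction_read=1.0, rate=HDD_BYTES_PER_SEC):
    """Hypothetical helper: seconds needed to read `fraction_read` of
    `total_bytes` when the data is spread evenly over `disks` disks read in
    parallel. Ignores seek, network and CPU overhead."""
    return total_bytes * fraction_read / (disks * rate)


if __name__ == "__main__":
    # Spreading the reads over more disks divides the scan time.
    for disks in (1, 100, 1000):
        print(f"{disks:>5} disk(s), full scan: {scan_seconds(TB, disks):8.1f} s")
    # An index (or partition pruning) that lets a query touch only 1% of the data.
    print(f"    1 disk,  1% read:   {scan_seconds(TB, 1, 0.01):8.1f} s")
```

Under these assumptions, a single disk needs roughly 6,700 seconds (close to two hours) to scan a terabyte. With 1,000 disks read in parallel, the same scan takes under 7 seconds. Reading only 1% of the data from one disk costs about as much time as a full scan spread over 100 disks. This illustrates why the two remedies above, parallel reads and indexing, both work.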

Hope that helps.

Mohammed

From: Denny Lee [mailto:[email protected]]
Sent: Monday, July 6, 2015 4:21 AM
To: spierki; [email protected]
Subject: Re: Spark SQL queries hive table, real time ?

Within the context of your question, Spark SQL with the Hive context is 
primarily about very fast queries. If you need real-time queries, I would use 
Spark Streaming. A couple of great resources on this topic are the Guest 
Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms and 
Optimization <http://www.slideshare.net/tathadas/guest-lecture-on-spark-streaming-in-standford> 
and Recipes for Running Spark Streaming Applications in Production 
<https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/> 
(from the recent Spark Summit 2015).

HTH!


On Mon, Jul 6, 2015 at 3:23 PM spierki 
<[email protected]<mailto:[email protected]>> wrote:
Hello,

I'm currently wondering about the performance of using Spark SQL with Hive
to do real-time analytics.
I know that Hive was created for batch processing, and Spark is used to
do fast queries.

But will using Spark SQL with Hive allow me to do real-time queries? Or will it
just make queries faster, but not real time?
Should I use another data warehouse, like HBase?

Thanks in advance for your time and consideration,
Florian


