You’re right: data needs to be loaded into Ignite before you can use its more
efficient SQL engine from Spark.

You can certainly load the data in using Spark as you describe. That’s probably
the easiest, least-code way of doing it. If there’s a lot of it, it may be more
efficient to load the data directly into Ignite using one of the bulk-loading
APIs (DataStreamer, the key-value API, etc.). There’s nothing special about the
Ignite caches that you see in Spark as DataFrames, so you can populate them
however you like.
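
For reference, the Spark route looks roughly like this. This is an untested
sketch: it assumes the ignite-spark module is on the classpath, and the bucket
path, config file name, table name and key column are all illustrative, not
from your setup.

```scala
import org.apache.ignite.spark.IgniteDataFrameSettings._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("load-into-ignite").getOrCreate()

// Read the parquet files from S3 as a plain Spark DataFrame.
val df = spark.read.parquet("s3a://my-bucket/events/") // hypothetical path

// Write the DataFrame into an Ignite SQL table, creating it if needed.
df.write
  .format(FORMAT_IGNITE)
  .option(OPTION_CONFIG_FILE, "ignite-config.xml") // your cluster config
  .option(OPTION_TABLE, "events")
  .option(OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS, "id")
  .mode("append")
  .save()
```

The direct bulk-loading path with IgniteDataStreamer skips Spark entirely
(again a sketch — cache name and value type are made up):

```scala
import org.apache.ignite.Ignition

val ignite = Ignition.start("ignite-config.xml")
ignite.getOrCreateCache[Long, String]("events")

// The streamer batches and routes entries to the right nodes, which is
// why it's faster than individual puts for large loads.
val streamer = ignite.dataStreamer[Long, String]("events")
try {
  for (i <- 1L to 1000000L) streamer.addData(i, s"value-$i")
} finally streamer.close() // flushes any remaining buffered entries
```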

As for making sure the “right” data is in Ignite, you could load the data once
and use Ignite’s native persistence, so it doesn’t all have to fit in memory.
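
Enabling native persistence is a one-flag change on the data region — a sketch,
assuming programmatic configuration (the same property can be set in the XML
config):

```scala
import org.apache.ignite.Ignition
import org.apache.ignite.configuration.{DataStorageConfiguration, IgniteConfiguration}

// Turn on persistence for the default data region, so data survives
// restarts and can exceed available RAM.
val storageCfg = new DataStorageConfiguration()
storageCfg.getDefaultDataRegionConfiguration.setPersistenceEnabled(true)

val cfg = new IgniteConfiguration().setDataStorageConfiguration(storageCfg)

val ignite = Ignition.start(cfg)
// Clusters with persistence start inactive and must be activated explicitly.
ignite.cluster().active(true)
```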

Regards,
Stephen

> On 4 Jan 2019, at 01:32, anthonycwmak <[email protected]> wrote:
> 
> I want to find how to speed up SparkSQL using Ignite (in particular I am
> wondering if it can be a replacement for Presto and my files are parquet
> format in S3). 
> 
> Question: Reading from the links in
> https://ignite.apache.org/use-cases/spark/sql-queries.html, is it true (in
> my case) I need to pre-load data to Ignite first (by loading my S3 files to
> Spark as dataframe and then writing the dataframe to Ignite Dataframe), then
> I can run sql against the Ignite DataFrame with the benefit of indexing?
> Therefore, for any data I want to query using SparkSQL, I will need to
> pre-load it into Ignite first explicitly like that? It may be difficult to
> anticipate what date range of data users want to query and to pre-load it;
> I am hoping for a seamless way to boost my SparkSQL queries.
> 
> Any links/reference will be appreciated :)
> 
> Anthony Mak
> 
> 
> 
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/

