Hi Tomas,

One option is to cache your table as Parquet files in Alluxio, which can serve as an in-memory distributed caching layer for Spark in your case.
In Spark, the code would look like this:

    df.write.parquet("alluxio://master:19998/data.parquet")
    df = sqlContext.read.parquet("alluxio://master:19998/data.parquet")

See the documentation for more details: http://www.alluxio.org/docs/1.8/en/compute/Spark.html <http://www.alluxio.org/docs/1.8/en/compute/Spark.html#cache-dataframe-into-alluxio?utm_source=spark>

Of course, this requires running Alluxio as a separate service (ideally colocated with the Spark servers), but it also enables data sharing across Spark jobs.

- Bin

On Tue, Jan 15, 2019 at 10:29 AM Tomas Bartalos <tomas.barta...@gmail.com> wrote:

> Hello,
>
> I'm using the Spark Thrift server and I'm searching for the best-performing
> solution to query a hot set of data. I'm processing records with a nested
> structure, containing subtypes and arrays. One record takes up several KB.
>
> I tried to improve performance with CACHE TABLE:
>
> cache table event_jan_01 as select * from events where day_registered = 20190102;
>
> If I understood correctly, the data should be stored in *in-memory
> columnar* format with storage level MEMORY_AND_DISK, so data that
> doesn't fit in memory will be spilled to disk (I assume also in columnar
> format?).
> I cached 1 day of data (1 M records), and according to the Spark UI Storage
> tab, none of the data was cached in memory and everything was spilled to disk.
> The size of the data was *5.7 GB.*
> Typical queries took ~20 sec.
>
> Then I tried to store the data in Parquet format:
>
> CREATE TABLE event_jan_01_par USING parquet location "/tmp/events/jan/02"
> as
> select * from event_jan_01;
>
> The whole Parquet table took up only *178MB*,
> and typical queries took 5-10 sec.
>
> Is it possible to tune Spark to spill the cached data in Parquet format?
> Why was the whole cached table spilled to disk, with nothing staying in
> memory?
>
> Spark version: 2.4.0
>
> Best regards,
> Tomas
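For a sense of scale, the sizes reported in the question can be reduced to per-record figures. This is a rough sketch assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes); the Spark UI may report binary units, which would shift the results by a few percent. The constant and function names are hypothetical, chosen only for illustration.

```python
# Rough size arithmetic for the figures quoted in the question.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 MB = 10**6 bytes).
# If either figure is really in binary units, results shift by a few percent.

RECORDS = 1_000_000            # "1 day of data (1 M records)"
CACHED_BYTES = 5_700_000_000   # CACHE TABLE size shown in the Storage tab: 5.7 GB
PARQUET_BYTES = 178_000_000    # size of the Parquet table: 178 MB


def bytes_per_record(total_bytes: int, records: int) -> float:
    """Average size of one record, given a total size and a record count."""
    return total_bytes / records


cached_per_record = bytes_per_record(CACHED_BYTES, RECORDS)
parquet_per_record = bytes_per_record(PARQUET_BYTES, RECORDS)
ratio = CACHED_BYTES / PARQUET_BYTES

print(f"cache:   {cached_per_record:.0f} bytes/record")
print(f"parquet: {parquet_per_record:.0f} bytes/record")
print(f"cached table is ~{ratio:.0f}x larger than the Parquet copy")
```

Under these assumptions the cached table averages about 5.7 KB per record, in line with the "several KB" per record mentioned in the question, while the Parquet copy averages about 178 bytes per record.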