Hi,

I'm using Spark on top of Cassandra as the backend CRUD layer of a RESTful application.

Most of the REST APIs retrieve a huge amount of data from Cassandra and do a lot 
of aggregation on it in Spark, which takes a few seconds.



Problem: sometimes the output is a big list, which makes the client browser 
throw a "stop script" warning, so we should paginate the result on the 
server side.

But it would be annoying for the user to wait several seconds on each page 
for the Cassandra-Spark processing.



Current dummy solution: for now I was thinking about assigning a UUID to each 
request, which would be sent back and forth between the server side and the 
client side.

The first time a REST API is invoked, the result would be saved in a temp 
table, and subsequent similar requests (requests for the next pages) would 
fetch the result from the temp table (instead of the usual flow of retrieval 
from Cassandra + aggregation in Spark, which takes some time). When the memory 
limit is reached, the oldest results would be deleted.
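To make the idea concrete, here is a minimal sketch of that scheme in plain Python. Everything in it (the `ResultCache` class, the `max_entries` limit standing in for "on memory limit, delete old results") is a hypothetical illustration of the dummy solution, not an existing Spark or Cassandra API:

```python
import uuid
from collections import OrderedDict

class ResultCache:
    """Hypothetical server-side cache for paginated query results.

    Keyed by a UUID that is handed back to the client; the oldest
    entries are evicted once max_entries is reached (a simple
    stand-in for the memory-limit rule described above).
    """

    def __init__(self, max_entries=100):
        self.max_entries = max_entries
        self._store = OrderedDict()  # insertion order approximates age

    def put(self, rows):
        # Evict the oldest cached result(s) when the limit is hit.
        while len(self._store) >= self.max_entries:
            self._store.popitem(last=False)
        key = str(uuid.uuid4())
        self._store[key] = rows
        return key  # this UUID is sent back to the client

    def page(self, key, page_no, page_size):
        # Subsequent page requests slice the cached result instead of
        # re-running the Cassandra retrieval + Spark aggregation.
        rows = self._store.get(key)
        if rows is None:
            return None  # expired/evicted: caller must recompute
        start = page_no * page_size
        return rows[start:start + page_size]
```

In use, the first request would run the full Spark aggregation, call `put()`, and return the UUID alongside page 0; each "next page" request would call `page()` with that UUID, and a `None` result would signal that the cache entry was evicted and the aggregation must be re-run.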



Is there any clean, built-in caching strategy in Spark to handle such scenarios?


