Hi Debasish:

We have the same dataset running on Sybase IQ, and once the caches are warm
the queries come back in about 300ms.  We're looking at options to relieve
overutilization and to bring down licensing costs.  I realize that Spark may
not be the best fit for this use case, but I'm interested to see how far it
can be pushed.

Thanks for your help!


-- Eric

On Tue, Jun 30, 2015 at 5:28 PM, Debasish Das <debasish.da...@gmail.com>
wrote:

> I got a good runtime improvement from Hive partitioning, caching the
> dataset, and increasing the cores through repartition... I think for your
> case generating MySQL-style indexing would help further... it is not
> supported in Spark SQL yet...
>
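A minimal sketch of that partition/cache/repartition pattern against a
Spark 1.4-era HiveContext.  The table name ("events"), the partition
column ("dt"), the other column names, and the repartition factor of 200
are all assumptions for illustration, not details from this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("warm-cache"))
    val sqlContext = new HiveContext(sc)

    // Reading a Hive table partitioned on a date column lets partition
    // pruning trim the scan before anything is cached.
    val df = sqlContext.table("events")

    // Spread the data across more cores, then pin it in memory so
    // repeated queries skip the disk scan entirely.
    val cached = df.repartition(200).cache()
    cached.registerTempTable("events_cached")

    // Subsequent queries read from the in-memory columnar cache.
    sqlContext.sql(
      "SELECT account_id, SUM(amount) FROM events_cached " +
      "WHERE dt = '2015-06-30' GROUP BY account_id").show()

The first action materializes the cache; it is the queries after that one
which should show the improvement being described.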
> I know the dataset might be too big for single-node MySQL, but do you
> have a runtime estimate from running the same query on MySQL with
> appropriate column indexing?  That would give us a good baseline number...
>
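One way to collect that baseline is to time the indexed query over plain
JDBC.  A rough sketch, assuming a hypothetical "events" table on a local
MySQL, made-up column and index names, and the MySQL Connector/J driver
on the classpath:

    import java.sql.DriverManager

    // Hypothetical single-node MySQL; connection details are placeholders.
    val conn = DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/bench", "user", "pass")
    val st = conn.createStatement()

    // Index the column the WHERE clause filters on before timing anything.
    st.execute("CREATE INDEX idx_account ON events (account_id)")

    val t0 = System.nanoTime()
    val rs = st.executeQuery(
      "SELECT SUM(amount) FROM events WHERE account_id = 42")
    while (rs.next()) println(rs.getLong(1))
    println(s"query took ${(System.nanoTime() - t0) / 1e6} ms")

    conn.close()

Running the query a few times and taking the warm numbers would make it
comparable to the warm-cache Sybase IQ figure above.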
> For my case at least, I could not put the data on a single MySQL node as
> it was too big...
>
> If you can recast the problem in a document view, you can use a document
> store like Solr/Elasticsearch to boost runtime... the inverted indices can
> get you subsecond latencies... again, the schema design matters there, and
> you might have to give up some SQL expressiveness (e.g. matching a balance
> to a predefined bucket might be fine, but looking up the exact number
> might be slow)
>
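The bucketing idea can be prepared on the Spark side before the documents
are shipped to Solr/Elasticsearch.  A sketch reusing the HiveContext from
the first example; the "balance" column and the bucket width of 1000 are
assumptions:

    import org.apache.spark.sql.functions.{col, floor}

    // Same hypothetical "events" table as above.  Pre-computing a coarse
    // bucket gives the document store a small term to index.
    val docs = sqlContext.table("events")
      .withColumn("balance_bucket", floor(col("balance") / 1000).cast("int"))

    // Each document now carries an indexable "balance_bucket" field, so
    // the store matches on a single term instead of range-scanning exact
    // values; exact-value lookups remain the slower path, as noted above.
    docs.write.json("/tmp/docs_for_indexing")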
