Unfortunately, at this stage of development I'm only doing runs on one machine, and
although I am using partitioned data for query parallelism, it seems I lose
that parallelism in the GROUP BY. Does GROUP BY distribute at all?
Might a Spark layer on top give a better distribution path?
Mike
-----Original Message-----
From: slava.koptilin [mailto:[email protected]]
Sent: Monday, February 26, 2018 11:17 AM
To: [email protected]
Subject: RE: Slow Group-By
Hi Mike,
It seems that GROUP BY requires fetching the whole data set into the Java heap
(in order to sort the data), which may lead to long GC pauses.
I think that data collocation [1] should improve GROUP BY performance.
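To make the collocation point concrete, here is a toy, single-process Python model (not Ignite code, and not how Ignite is implemented internally) of a two-phase distributed GROUP BY: every node aggregates its local rows, and one reducer combines the partial results. The node count, the modulo "affinity" placement, and all function names are hypothetical illustrations. When rows are placed by group key, each group's partial sum is already final and the reducer only concatenates; when rows are scattered, every node can emit a partial row for every group, and the reducer has to buffer and merge all of them.

```python
from collections import defaultdict


def partition(rows, num_nodes, collocated):
    """Distribute (group_key, value) rows across hypothetical nodes."""
    nodes = [[] for _ in range(num_nodes)]
    for i, (key, value) in enumerate(rows):
        # Collocated: place by group key, the way an affinity function would.
        # Not collocated: spread rows round-robin, ignoring the group key.
        idx = key % num_nodes if collocated else i % num_nodes
        nodes[idx].append((key, value))
    return nodes


def map_phase(local_rows):
    """Per-node partial aggregation: SUM(value) GROUP BY key over local rows."""
    partial = defaultdict(int)
    for key, value in local_rows:
        partial[key] += value
    return dict(partial)


def reduce_phase(partials, collocated):
    """Combine per-node results; returns (final result, partial rows received)."""
    received = sum(len(p) for p in partials)
    if collocated:
        # Each group lives on exactly one node, so partial sums are final.
        result = {}
        for p in partials:
            result.update(p)
        return result, received
    # Groups may span nodes, so the reducer must merge partial sums per key.
    merged = defaultdict(int)
    for p in partials:
        for key, s in p.items():
            merged[key] += s
    return dict(merged), received


def group_by_sum(rows, num_nodes, collocated):
    nodes = partition(rows, num_nodes, collocated)
    partials = [map_phase(n) for n in nodes]
    return reduce_phase(partials, collocated)


rows = [(k, 1) for k in range(4) for _ in range(8)]  # 4 groups, 8 rows each
print(group_by_sum(rows, 4, collocated=True))   # ({0: 8, 1: 8, 2: 8, 3: 8}, 4)
print(group_by_sum(rows, 4, collocated=False))  # ({0: 8, 1: 8, 2: 8, 3: 8}, 16)
```

Both runs produce the same sums, but in this toy model the reducer receives 4 partial rows with collocation versus 16 without it, and the gap grows with the number of nodes and groups. Note that the collocated shortcut in `reduce_phase` would silently give wrong totals if the data were not actually placed by group key. In Ignite, the corresponding hint appears to be `SqlFieldsQuery.setCollocated(true)`, which is likewise only safe when the GROUP BY is on the affinity key; please confirm against the documentation in [1].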
[1]
https://apacheignite.readme.io/docs/affinity-collocation
Thanks!
--
Sent from:
http://apache-ignite-users.70518.x6.nabble.com/