Unfortunately, at this stage of development I'm only running on one machine, and 
although I am using partitioned data for query parallelism, it seems I lose that 
parallelism in the GROUP BY. Does GROUP BY distribute at all? 
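
To make the question concrete: a distributed GROUP BY is usually executed in two phases. Each partition computes partial aggregates for the rows it holds (this part runs in parallel), and a single reducer then merges those partials into the final groups. The Java sketch below is a toy single-JVM model of that idea, not Ignite's actual query engine; the `Row` record, the SUM aggregate and the partition layout are hypothetical.

```java
import java.util.*;
import java.util.stream.*;

public class TwoPhaseGroupBy {
    // Hypothetical row: a grouping key plus a value to SUM.
    record Row(String key, long value) {}

    // Map phase: each partition computes partial sums for the keys it holds.
    static Map<String, Long> mapPhase(List<Row> partition) {
        Map<String, Long> partial = new HashMap<>();
        for (Row r : partition) {
            partial.merge(r.key(), r.value(), Long::sum);
        }
        return partial;
    }

    // Reduce phase: one reducer merges the partial sums from every partition.
    // This step is not parallel and holds every group in memory at once.
    static Map<String, Long> reducePhase(List<Map<String, Long>> partials) {
        Map<String, Long> result = new TreeMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> result.merge(k, v, Long::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        // Rows spread across partitions without regard to the grouping key,
        // so the same key shows up in several partitions.
        List<List<Row>> partitions = List.of(
            List.of(new Row("a", 1), new Row("b", 2)),
            List.of(new Row("a", 3), new Row("c", 4)),
            List.of(new Row("b", 5), new Row("c", 6)));

        // The map phase can run on all partitions in parallel.
        List<Map<String, Long>> partials = partitions.parallelStream()
            .map(TwoPhaseGroupBy::mapPhase)
            .collect(Collectors.toList());

        System.out.println(reducePhase(partials)); // {a=4, b=7, c=10}
    }
}
```

Because the same key can appear in several partitions, the final merge cannot be skipped, which is where the parallelism is lost.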

Might a Spark layer on top give a better distribution path? 

Mike
        
-----Original Message-----
From: slava.koptilin [mailto:slava.kopti...@gmail.com] 
Sent: Monday, February 26, 2018 11:17 AM
To: user@ignite.apache.org
Subject: RE: Slow Group-By

Hi Mike,

It seems that GROUP BY requires fetching the whole dataset into the Java heap 
(in order to sort the data), which may lead to long GC pauses.
I think that data collocation [1] should improve GROUP BY performance.
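
Collocation helps because when rows are partitioned by the grouping key, every group lives in exactly one partition, so each partition's local GROUP BY result is already final and only needs to be concatenated rather than merged and re-sorted. In Ignite this is typically arranged with an affinity key (for example via `@AffinityKeyMapped`) and signalled to the SQL engine with `SqlFieldsQuery.setCollocated(true)`; check [1] and the docs for your version before relying on it. The Java sketch below only models the principle: `partitionOf` is a hypothetical hash-based stand-in for Ignite's affinity function, and the schema is invented.

```java
import java.util.*;

public class CollocatedGroupBy {
    // Hypothetical row: a grouping key plus a value to SUM.
    record Row(String key, long value) {}

    // Hypothetical stand-in for an affinity function: the partition
    // depends only on the grouping key.
    static int partitionOf(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    static List<List<Row>> partitionByKey(List<Row> rows, int partitions) {
        List<List<Row>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) parts.add(new ArrayList<>());
        for (Row r : rows) parts.get(partitionOf(r.key(), partitions)).add(r);
        return parts;
    }

    // Local GROUP BY inside a single partition.
    static Map<String, Long> localGroupBy(List<Row> partition) {
        Map<String, Long> out = new HashMap<>();
        for (Row r : partition) out.merge(r.key(), r.value(), Long::sum);
        return out;
    }

    // Every key lives in exactly one partition, so local results are final:
    // they are concatenated, never merged. The check makes that explicit.
    static Map<String, Long> collocatedGroupBy(List<Row> rows, int partitions) {
        Map<String, Long> result = new TreeMap<>();
        for (List<Row> part : partitionByKey(rows, partitions)) {
            for (Map.Entry<String, Long> e : localGroupBy(part).entrySet()) {
                Long previous = result.put(e.getKey(), e.getValue());
                if (previous != null) {
                    throw new IllegalStateException("key split across partitions: " + e.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("a", 1), new Row("b", 2), new Row("a", 3),
                                 new Row("c", 4), new Row("b", 5), new Row("c", 6));
        System.out.println(collocatedGroupBy(rows, 3)); // {a=4, b=7, c=10}
    }
}
```

The design point is that the partitioning decision is made from the same key the query groups by; if the query grouped by some other column, groups could again be split across partitions and the merge step would come back.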

[1] 
https://apacheignite.readme.io/docs/affinity-collocation

Thanks!


