Unfortunately, at this stage in dev I'm only doing runs on one machine, and although I'm using partitioned data for query parallelism, it seems I lose that parallelism in the GROUP BY. Does GROUP BY distribute at all?
Might a Spark layer on top give a better distribution path?

Mike

-----Original Message-----
From: slava.koptilin [mailto:slava.kopti...@gmail.com]
Sent: Monday, February 26, 2018 11:17 AM
To: user@ignite.apache.org
Subject: RE: Slow Group-By

Hi Mike,

It seems that GROUP BY requires fetching the entire dataset into the Java heap (in order to sort the data), which may lead to long GC pauses. I think that data collocation [1] should improve GROUP BY performance.

[1] https://apacheignite.readme.io/docs/affinity-collocation

Thanks!
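To illustrate why collocation matters for a distributed GROUP BY, here is a conceptual simulation in plain Python. It is not Ignite's API: the rows, node count and function names are all hypothetical. When rows with the same group key can land on different nodes, each node can only produce partial aggregates, and the reducer still has to merge them. When rows are partitioned by the group key itself (i.e. collocated), each node's local result is already final and the reducer only concatenates. Ignite's `SqlFieldsQuery` has a `setCollocated(true)` flag intended for this collocated case; check the documentation for your version before relying on it.

```python
import zlib
from collections import defaultdict

# Hypothetical rows: (group_key, value). Not data from this thread.
ROWS = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]


def local_group_sum(rows):
    """Map phase: each node sums values per group key over its own rows."""
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


def round_robin(rows, num_nodes):
    """Non-collocated placement: rows are spread without regard to group key."""
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes


def by_group_key(rows, num_nodes):
    """Collocated placement: rows are assigned to a node by hashing the group key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[zlib.crc32(row[0].encode()) % num_nodes].append(row)
    return nodes


def reduce_merge(partials):
    """Reduce phase without collocation: a key may appear on several nodes,
    so the partial sums must be combined."""
    total = defaultdict(int)
    for partial in partials:
        for key, subtotal in partial.items():
            total[key] += subtotal
    return dict(total)


def reduce_concat(partials):
    """Reduce phase with collocation: each key lives on exactly one node,
    so local results are final and are simply concatenated."""
    result = {}
    for partial in partials:
        overlap = result.keys() & partial.keys()
        if overlap:
            raise ValueError(f"keys not collocated: {sorted(overlap)}")
        result.update(partial)
    return result


if __name__ == "__main__":
    scattered = [local_group_sum(n) for n in round_robin(ROWS, 2)]
    collocated = [local_group_sum(n) for n in by_group_key(ROWS, 2)]
    print(reduce_merge(scattered))    # merge step needed
    print(reduce_concat(collocated))  # no merge step needed
```

Both paths produce the same totals; the difference is how much work and data movement the final reduce step needs, which is what collocation is meant to cut down.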