Performance tuning

Leandro Ordonez Tue, 17 May 2016 06:37:46 -0700

Hello,

I've deployed an HDFS cluster and installed Apache Drill on top of it, but in my case Drill takes quite a long time to run some queries on large JSON files, such as the full Reddit submission corpus (260 GB). For instance, this query:

    SELECT COUNT(*) FROM dfs.reddit.`RS_full_corpus.json` WHERE selftext <> '' AND selftext <> '[deleted]';

took about one hour to run. The other thing I've noticed is that none of my queries get processed in a "fragmented" way: query execution is always handled by the Drillbit acting as the Foreman.

In the attachment you can find the topology that I'm using. Any feedback on this would be greatly appreciated.


Thank you very much for your kind attention.

Best regards,

--
Leandro Ordonez-Ante
Department of Information Technology
Internet Based Communication Networks and Services (IBCN)
Ghent University - iMinds
Technologiepark Zwijnaarde 15, B-9052 Gent, Belgium
E: [email protected], [email protected]
W: www.ibcn.intec.UGent.be
