Re: Performance tuning

Jim Scott Tue, 17 May 2016 06:42:06 -0700

The mailing lists do not support attachments. You can provide a link to a
git repo or something like that though.


You might want to change your query to something like select
count(FIELDX) from....
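
As a rough sketch of that rewrite, assuming FIELDX stands for selftext (the
reply does not name a field; selftext is picked here only because the
original query already filters on it):

```sql
-- Sketch only: selftext is an assumed stand-in for FIELDX.
-- Table path and predicates are copied from the quoted query below.
SELECT COUNT(selftext)
FROM dfs.reddit.`RS_full_corpus.json`
WHERE selftext <> ''
  AND selftext <> '[deleted]';
```

Since the WHERE clause already discards rows where selftext is null,
COUNT(selftext) should return the same number as COUNT(*) for this
particular query.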

On Tue, May 17, 2016 at 8:36 AM, Leandro Ordonez <
[email protected]> wrote:

> Hello,
>
> I've deployed an HDFS cluster and installed Apache Drill on top of it, but
> I've found that it takes quite a long time for Drill to run some queries on
> large JSON files, such as the full Reddit submission corpus (260GB). For
> instance, this query: SELECT COUNT(*) from
> dfs.reddit.`RS_full_corpus.json` WHERE selftext <> '' and selftext <>
> '[deleted]'; took about one hour to run. The other thing I've noticed
> is that none of my queries get processed in a "fragmented" way: query
> execution is always handled by the drillbit acting as the foreman.
>
> In the attachment you can find the topology that I'm using. Any feedback
> on this would be greatly appreciated.
>
> Thank you very much for your kind attention.
>
> Best regards,
>
> --
> Leandro Ordonez-Ante
> Department of Information Technology
> Internet Based Communication Networks and Services (IBCN)
> Ghent University - iMinds
> Technologiepark Zwijnaarde 15, B-9052 Gent, Belgium
> E: [email protected], [email protected]
> W: www.ibcn.intec.UGent.be
>
>


-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal <https://twitter.com/kingmesal>

