The mailing lists do not support attachments. You can provide a link to a git repo or something like that though.
You might want to alter your query to be something like select count(FIELDX) from....

On Tue, May 17, 2016 at 8:36 AM, Leandro Ordonez <[email protected]> wrote:

> Hello,
>
> I've deployed an HDFS cluster and installed Apache Drill on top of it, but
> found in my case that it takes quite long for Drill to run some queries on
> large JSON files, such as the full Reddit submission corpus (260GB). For
> instance, this query:
>
>     SELECT COUNT(*) from dfs.reddit.`RS_full_corpus.json`
>     WHERE selftext <> '' and selftext <> '[deleted]';
>
> took about one hour to run. The other thing I've noticed is that none of
> my queries get processed in a "fragmented" way: query execution is always
> handled by the drillbit acting as the foreman.
>
> In the attachment you can find the topology that I'm using. Any feedback
> on this would be greatly appreciated.
>
> Thank you very much for your kind attention.
>
> Best regards,
>
> --
> Leandro Ordonez-Ante
> Department of Information Technology
> Internet Based Communication Networks and Services (IBCN)
> Ghent University - iMinds
> Technologiepark Zwijnaarde 15, B-9052 Gent, Belgium
> E: [email protected], [email protected]
> W: www.ibcn.intec.UGent.be

--
Jim Scott
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal <https://twitter.com/kingmesal>
MapR Technologies <http://www.mapr.com>
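
A note on what the count(FIELDX) suggestion does to the result: COUNT(field) skips NULLs while COUNT(*) counts every row, so it is worth checking that the two forms agree for this filter. The sketch below is only an illustration under stated assumptions. It uses Python's built-in sqlite3 as a stand-in for Drill, a made-up five-row table named submissions, and selftext as the stand-in for FIELDX. It says nothing about Drill's execution speed or planning.

```python
import sqlite3

# In-memory SQLite database standing in for the JSON file (assumption:
# standard SQL NULL semantics, which is all this sketch demonstrates).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (selftext TEXT)")
conn.executemany(
    "INSERT INTO submissions VALUES (?)",
    [("hello",), ("",), ("[deleted]",), (None,), ("world",)],
)

# Same predicate as the query in the original message.
FILTER = "WHERE selftext <> '' AND selftext <> '[deleted]'"


def count(expr, where=""):
    """Run SELECT COUNT(expr) with an optional WHERE clause and return the number."""
    sql = f"SELECT COUNT({expr}) FROM submissions {where}"
    return conn.execute(sql).fetchone()[0]


star_filtered = count("*", FILTER)          # rows passing the filter
field_filtered = count("selftext", FILTER)  # non-NULL selftext passing the filter
star_all = count("*")                       # every row, NULLs included
field_all = count("selftext")               # only rows with non-NULL selftext

print(star_filtered, field_filtered, star_all, field_all)
```

With the filter in place both forms return the same number, because a NULL selftext never satisfies a <> comparison and is dropped by the WHERE clause either way. Without the filter the two counts differ by the number of NULL rows, so the rewrite is only a drop-in replacement when the counted field is the one already constrained in the WHERE clause.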
