Boo, Christopher beat me to it. Leandro, I didn't mention to Merlijn that I was using Parquet files :)
On Tue, May 17, 2016 at 2:54 PM, Leandro Ordonez <[email protected]> wrote:

> Thank you Jim,
>
> The attachment was this image: https://i.imgsafe.org/7e98f92.png
>
> Then, is it expected that the query I mentioned before takes that long?
>
>
> On 05/17/2016 03:41 PM, Jim Scott wrote:
>
>> The mailing lists do not support attachments. You can provide a link to a
>> git repo or something like that, though.
>>
>> You might want to alter your query to be something like select
>> count(FIELDX) from....
>>
>> On Tue, May 17, 2016 at 8:36 AM, Leandro Ordonez <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I've deployed an HDFS cluster and installed Apache Drill on top of it, but
>>> found in my case that it takes quite a long time for Drill to run some
>>> queries on large JSON files, such as the full Reddit submission corpus
>>> (260 GB). For instance, this query:
>>>
>>> SELECT COUNT(*) from dfs.reddit.`RS_full_corpus.json` WHERE selftext <> '' and selftext <> '[deleted]';
>>>
>>> took about one hour to run. The other thing I've noticed is that none of
>>> my queries get processed in a "fragmented" way; the query execution is
>>> always handled by the drillbit acting as the foreman.
>>>
>>> In the attachment you can find the topology that I'm using. Any feedback
>>> on this would be greatly appreciated.
>>>
>>> Thank you very much for your kind attention.
>>>
>>> Best regards,
>>>
>>> --
>>> Leandro Ordonez-Ante
>>> Department of Information Technology
>>> Internet Based Communication Networks and Services (IBCN)
>>> Ghent University - iMinds
>>> Technologiepark Zwijnaarde 15, B-9052 Gent, Belgium
>>> E: [email protected], [email protected]
>>> W: www.ibcn.intec.UGent.be
>
> --
> Leandro Ordonez-Ante
> Department of Information Technology
> Internet Based Communication Networks and Services (IBCN)
> Ghent University - iMinds
> Technologiepark Zwijnaarde 15, B-9052 Gent, Belgium
> E: [email protected], [email protected]
> W: www.ibcn.intec.UGent.be
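For reference, here is a rough sketch of the two suggestions in this thread. The first step converts the JSON corpus to Parquet with a CTAS statement. The second counts a named column instead of `*`. It assumes `dfs.tmp` is a writable workspace on this cluster, and the table name `reddit_selftext` is hypothetical. Only the column the query needs is copied. Because Parquet is columnar, a later scan reads just that column rather than parsing every JSON record. These statements have not been benchmarked against the 260 GB corpus, so any speedup is an expectation, not a measured result.

```sql
-- Write CTAS output as Parquet (assumes dfs.tmp is a writable workspace).
ALTER SESSION SET `store.format` = 'parquet';

-- One-time conversion: copy only the column the query filters on.
-- The table name reddit_selftext is hypothetical.
CREATE TABLE dfs.tmp.`reddit_selftext` AS
SELECT selftext
FROM dfs.reddit.`RS_full_corpus.json`;

-- Count a named column instead of *, as suggested above.
-- Rows where selftext is NULL already fail the WHERE clause,
-- so this returns the same number as COUNT(*) would.
SELECT COUNT(selftext)
FROM dfs.tmp.`reddit_selftext`
WHERE selftext <> '' AND selftext <> '[deleted]';
```

The conversion pays the full JSON parsing cost once. Subsequent queries against the Parquet table can then skip it.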
