Apparently you have a gzipped file that is >= 50 GB. You either need to break up those files, or run on larger machines.
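For reference, the byte count in the error quoted below works out to just over 50 GiB, which is why a node with ~60 GB of total memory cannot satisfy the allocation once Impala's other per-node reservations are accounted for. A quick sanity check (assuming the reported number is the single decompression buffer the decompressor tried to allocate):

```python
# Byte count copied from the error:
# "GzipDecompressor failed to allocate 54525952000 bytes."
failed_alloc = 54_525_952_000

gib = failed_alloc / (1024 ** 3)
print(f"{gib:.2f} GiB")  # prints 50.78 GiB -- a single allocation close to the
                         # entire physical memory of one ~60 GB node
```

Because gzip is not a splittable codec, one oversized file must be decompressed by a single node, so adding more nodes does not help; only smaller files or bigger machines do.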
On Wed, Apr 5, 2017 at 9:52 AM, Bin Wang <[email protected]> wrote:
> Hi,
>
> I've been using Impala in production for a while, but since yesterday some
> queries have been reporting "memory limit exceeded". Even a very simple count
> query fails the same way.
>
> The query is:
>
> select count(0) from adhoc_data_fast.log where day>='2017-04-04' and
> day<='2017-04-06';
>
> And the response in the Impala shell is:
>
> Query submitted at: 2017-04-06 00:41:00 (Coordinator:
> http://szq7.appadhoc.com:25000)
> Query progress can be monitored at:
> http://szq7.appadhoc.com:25000/query_plan?query_id=4947a3fecd146df4:734bcc1d00000000
> WARNINGS:
> Memory limit exceeded
> GzipDecompressor failed to allocate 54525952000 bytes.
>
> I have many nodes and each of them has plenty of memory available (~60 GB).
> The query fails very quickly after I execute it, and the nodes show almost no
> memory usage.
>
> The table "adhoc_data_fast.log" is an Avro table, encoded with gzip and
> partitioned by the field "day". Each partition has no more than one billion
> rows.
>
> My Impala version is:
>
> hdfs@szq7:/home/ubuntu$ impalad --version
> impalad version 2.7.0-cdh5.9.1 RELEASE (build
> 24ad6df788d66e4af9496edb26ac4d1f1d2a1f2c)
> Built on Wed Jan 11 13:39:25 PST 2017
>
> Can anyone help with this? Thanks very much!
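The suggested fix of breaking up the files can be sketched with standard tools. This is a local illustration only, not commands from the thread: on a real cluster you would pull the oversized file out of HDFS first, and the `part_` names and 4-way split are arbitrary. The key point is that each resulting gzip file can then be decompressed independently on a different node:

```shell
set -e
# Stand-in for one oversized uncompressed input file.
seq 1 100000 > big.txt

# Split into 4 roughly equal, line-aligned chunks (GNU split syntax),
# then compress each chunk separately so no single decompression
# buffer has to hold the whole dataset.
split -n l/4 -d big.txt part_
gzip part_*

# Verify no rows were lost across the parts.
zcat part_*.gz | wc -l   # prints 100000
```

For an Avro table like the one in the thread, the equivalent would be rewriting the partition (e.g. via an `INSERT ... SELECT` into a new table) so that it produces several smaller files rather than one huge gzip member.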
