Can you try to convert src_date to a date type?
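A minimal sketch of what that might look like, assuming the view currently exposes dir0 as a string (the exact original view definition and the dfs path are not shown in this thread, so both are placeholders):

```sql
-- Hypothetical path; substitute the actual table location.
-- Casting dir0 to DATE lets Drill compare dates natively rather
-- than as strings, which can also help partition pruning.
CREATE OR REPLACE VIEW view_table AS
SELECT CAST(dir0 AS DATE) AS src_date, field1, field2, field3
FROM dfs.`/path/to/table`;

-- The filter can then use a DATE literal:
SELECT src_date, COUNT(1)
FROM view_table
WHERE src_date >= DATE '2016-02-01'
GROUP BY src_date;
```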

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Feb 29, 2016 at 10:28 AM, John Omernik <[email protected]> wrote:

> I am running 6 drillbits; they were running with 20 GB of direct memory and
> 4 GB of heap, and I altered them to run with 18 GB of direct and 6 GB of
> heap, and I am still getting this error.
>
> I am running a query, and trying to understand why so much heap space is
> being used. The data is Parquet files, organized into directories by date
> (2015-01-01, 2015-01-02 etc)
>
> TABLE
> ---> 2015-01-01
> ---> 2015-01-02
>
> Etc
>
> This data isn't what I would call "huge": at most 500 MB per day, with 69
> parquet files per day.  While I do have the planning issue related to lots
> of directories with lots of files (see other emails), I don't think that is
> related here.
>
> I have a view that basically does select dir0 as src_date, field1, field2,
> field3 from the table. Then I run a query such as
>
> select src_date, count(1) from view_table where src_date >= '2016-02-25'
> group by src_date
>
> That will work.
>
> If I run
>
> select src_date, count(1) from view_table where src_date >= '2016-02-01'
> group by src_date
>
> That will hang, and eventually I will see a drillbit crash and restart; the
> error logs point to Java heap space issues.  This is the same with 4 GB
> or 6 GB of heap space.
>
> So my question is this...
>
> Given the data, how do I troubleshoot this and provide helpful feedback? I
> am running the MapR 1.4 Developer Release right now. This seems like an
> issue to me: why would a single query be able to crash a node?
> Shouldn't the query be terminated? Even so, why would 30 days of 500 MB of
> data per day (i.e. it would take 15 GB of direct RAM per node, which is
> available, to load the ENTIRE data set into RAM) crash given that sort of
> aggregation?
>
