Unfortunately, the 14 MB error message contains too much proprietary information for me to post to a JIRA, and the query itself may also be a bit too revealing. This is Drill 1.6, so maybe the issue isn't fixed in my version? Do you know the original JIRA for the really long error?
On Thu, Jun 23, 2016 at 12:56 PM, Jinfeng Ni <[email protected]> wrote:
> This "CannotPlanException" definitely is a bug in the query planner. I
> thought we had put code to show that extremely long error msg only
> in debug mode. Looks like that's not the case.
>
> Could you please open a JIRA and post your query, if possible? Thanks.
>
> On Thu, Jun 23, 2016 at 10:45 AM, John Omernik <[email protected]> wrote:
> > Jinfeng -
> >
> > I wrote my item prior to reading yours. Just an FYI: when I ran with that
> > setting, I got a "CannotPlanException" with an error that is easily the
> > longest "non-verbose" one I've ever seen (heck, this beats all the
> > verbose errors I've had). I'd post it here, but I am not sure my Google
> > account has enough storage to handle this message....
> >
> > (kidding... sorta)
> >
> > John
> >
> > On Thu, Jun 23, 2016 at 12:37 PM, Jinfeng Ni <[email protected]> wrote:
> >
> >> Do you partition by day in your CTAS? If that's the case, CTAS will
> >> produce at least one parquet file for each value of "day". If you
> >> have 100 days, then you will end up with at least 100 files. However,
> >> in case the query is executed in distributed mode, there could be more
> >> than one file per value.
> >>
> >> In order to get one and only one parquet file for each partition
> >> value, turn on this option:
> >>
> >> alter session set `store.partition.hash_distribute` = true;
> >>
> >> On Thu, Jun 23, 2016 at 10:26 AM, Jason Altekruse <[email protected]>
> >> wrote:
> >> > Apply a sort in your CTAS; this will force the data down to a single
> >> > stream before writing.
> >> >
> >> > Jason Altekruse
> >> > Software Engineer at Dremio
> >> > Apache Drill Committer
> >> >
> >> > On Thu, Jun 23, 2016 at 10:23 AM, John Omernik <[email protected]> wrote:
> >> >
> >> >> I have a small query writing smaller data (like aggregate tables for
> >> >> faster aggregates for dashboards etc.). It appears to write a ton of
> >> >> small files. Not sure why; maybe it's just how the join worked out.
> >> >> I have a "day" that is 1.5 MB in total size, but 400 files total.
> >> >> This seems excessive.
> >> >>
> >> >> While I don't have the "small files" issue because I run MapR-FS,
> >> >> having 400 files that make 1.5 MB of total data kills me on the
> >> >> planning phase. How can I get Drill, when doing a CTAS, to go
> >> >> through a round of consolidation on the parquet files?
> >> >>
> >> >> Thanks
> >> >>
> >> >> John
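For reference, a rough sketch of the combined advice from the thread (the `store.partition.hash_distribute` option from Jinfeng plus the sort from Jason), as it might look in a Drill CTAS. The table and column names here are purely illustrative, not from the original query:

```sql
-- Ask Drill to hash-distribute rows by partition key, so each partition
-- value lands in a single writer (one parquet file per value).
ALTER SESSION SET `store.partition.hash_distribute` = true;

-- Hypothetical CTAS: names like dfs.tmp.daily_agg and the source table
-- are placeholders for illustration only.
CREATE TABLE dfs.tmp.daily_agg
PARTITION BY (`day`)
AS
SELECT `day`, host, SUM(bytes) AS total_bytes
FROM dfs.logs.events
GROUP BY `day`, host
-- The ORDER BY forces the data down to a single sorted stream before
-- the parquet writer, which also reduces file counts.
ORDER BY `day`;
```

Note that `hash_distribute` trades file count for a possible data skew across writers, so it is worth checking the resulting file sizes.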
