It's basically a two-level grouping with a LEFT JOIN, so:

select a.field1, a.field2, sum(b.somefield) as new_thing
from table1 a
LEFT JOIN table2 b on a.id = b.id
where a.field1 = '2015-05-05'
group by a.field1, b.field2

It's not a very complicated query, but it doesn't like the hash_distribute :)

On Thu, Jun 23, 2016 at 1:46 PM, Jinfeng Ni <[email protected]> wrote:
> I looked at the code. 1) Drill did log this long CanNotPlan msg at
> error level. 2) It was replaced with a much shorter msg only when
> CanNotPlan was caused by a cartesian join.
>
> I guess your query probably did not have a cartesian join, and
> CanNotPlan was caused by other reasons. Either way, I think Drill
> should not display such a verbose error msg (which is essentially the
> planner's internal state).
>
> I actually do not need the error message. The query itself would be
> good enough in many cases. If the query has sensitive information in
> column names, you can replace the sensitive column names with
> arbitrary names and still hit the same issue, as long as your data is
> from a schema-on-read source (parquet / csv etc.). Drill's planner
> does not have schema info; changing column names across the query
> would not affect the planner's behavior.
>
> btw: the long error change was part of DRILL-2958 [1]
>
> [1] https://issues.apache.org/jira/browse/DRILL-2958
>
> On Thu, Jun 23, 2016 at 11:13 AM, John Omernik <[email protected]> wrote:
> > Unfortunately, the 14 MB error message contains too much proprietary
> > information for me to post to a JIRA, and the query itself may also
> > be a bit too revealing. This is Drill 1.6, so maybe the issue isn't
> > fixed in my version? Do you know the original JIRA for the really
> > long error?
> >
> > On Thu, Jun 23, 2016 at 12:56 PM, Jinfeng Ni <[email protected]> wrote:
> >
> >> This "CannotPlanException" definitely is a bug in the query
> >> planner. I thought we had put in code to show that extremely long
> >> error msg only in debug mode. Looks like that's not the case.
> >>
> >> Could you please open a JIRA and post your query, if possible? Thanks.
> >>
> >> On Thu, Jun 23, 2016 at 10:45 AM, John Omernik <[email protected]> wrote:
> >> > Jinfeng -
> >> >
> >> > I wrote my item prior to reading yours. Just an FYI: when I ran
> >> > with that setting, I got a "CannotPlanException", with an error
> >> > that is easily the longest "non-verbose" one I've ever seen
> >> > (heck, this beats all the verbose errors I've had). I'd post it
> >> > here, but I'm not sure my Google account has enough storage to
> >> > handle this message....
> >> >
> >> > (kidding... sorta)
> >> >
> >> > John
> >> >
> >> > On Thu, Jun 23, 2016 at 12:37 PM, Jinfeng Ni <[email protected]> wrote:
> >> >
> >> >> Do you partition by day in your CTAS? If that's the case, CTAS
> >> >> will produce at least one parquet file for each value of "day".
> >> >> If you have 100 days, then you will end up with at least 100
> >> >> files. And if the query is executed in distributed mode, there
> >> >> could be more than one file per value.
> >> >>
> >> >> In order to get one and only one parquet file for each partition
> >> >> value, turn on this option:
> >> >>
> >> >> alter session set `store.partition.hash_distribute` = true;
> >> >>
> >> >> On Thu, Jun 23, 2016 at 10:26 AM, Jason Altekruse <[email protected]> wrote:
> >> >> > Apply a sort in your CTAS; this will force the data down to a
> >> >> > single stream before writing.
> >> >> >
> >> >> > Jason Altekruse
> >> >> > Software Engineer at Dremio
> >> >> > Apache Drill Committer
> >> >> >
> >> >> > On Thu, Jun 23, 2016 at 10:23 AM, John Omernik <[email protected]> wrote:
> >> >> >
> >> >> >> I have a small query writing smaller data (like aggregate
> >> >> >> tables for faster dashboard aggregates). It appears to write
> >> >> >> a ton of small files. Not sure why; maybe it's just how the
> >> >> >> join worked out.
> >> >> >> I have a "day" that is 1.5 MB in total size, but 400 files
> >> >> >> total. This seems excessive.
> >> >> >>
> >> >> >> While I don't have the "small files" issue, because I run
> >> >> >> MapR-FS, having 400 files that make up 1.5 MB of total data
> >> >> >> kills me in the planning phase. How can I get Drill, when
> >> >> >> doing a CTAS, to go through a round of consolidation on the
> >> >> >> parquet files?
> >> >> >>
> >> >> >> Thanks
> >> >> >>
> >> >> >> John
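[Editor's note] For anyone finding this thread later, the two suggestions made above (hash-distributed partition writes, or a sort before the writer) can be sketched as Drill SQL. Table and column names here (`dfs.tmp`, `agg_table`, `table1`, `field1`, etc.) are placeholders standing in for the anonymized query, not names from the original; the option name and the CTAS PARTITION BY syntax are as quoted in the thread.

```sql
-- Option 1 (Jinfeng's suggestion): hash-distribute rows on the partition
-- key so each partition value goes to a single writer, yielding one
-- parquet file per value. Note that John hit a CannotPlanException with
-- this option on Drill 1.6 for the query in this thread.
alter session set `store.partition.hash_distribute` = true;

create table dfs.tmp.`agg_table`
partition by (field1) as
select a.field1, a.field2, sum(b.somefield) as new_thing
from table1 a
left join table2 b on a.id = b.id
group by a.field1, a.field2;

-- Option 2 (Jason's suggestion): add a sort, which forces the data into
-- a single stream before the writer and so reduces the file count.
create table dfs.tmp.`agg_table_sorted` as
select a.field1, a.field2, sum(b.somefield) as new_thing
from table1 a
left join table2 b on a.id = b.id
group by a.field1, a.field2
order by a.field1;
```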
