Hi John,

Sorry I didn't get back to you (I thought I did).

No, I don't need the plan. I just wanted to confirm what was taking most of the time, and you already confirmed it's the planning. Can you open a JIRA for this? It may be a known issue, but I'm not sure.

Thanks

On Tue, Feb 9, 2016 at 6:08 AM, John Omernik <[email protected]> wrote:

> Abdel, do you still need the plans? As I said, if your table has any decent
> amount of directories and files, it looks like the planning is touching all
> the directories even though you are pruning. I can post plans; however, I
> think in this case you'll find they are exactly the same, and the only
> difference is that the longer query is planning much more because it has
> more files to read.
>
>
> On Thu, Feb 4, 2016 at 10:46 AM, John Omernik <[email protected]> wrote:
>
> > I can package up both plans for you if you need them (let me know if you
> > still want them) but I can tell you the plans were EXACTLY the same.
> > However, the data_sum table took 0.932 seconds to plan the query, and the
> > data table (the one with all the extra data) took 11.379 seconds to
> > plan the query. That indicates to me the issue isn't in the plan that was
> > created, but in the actual planning process. (Let me know if you disagree or
> > still need to see the plan; like I said, the actual plans were exactly the
> > same.)
> >
> >
> > John.
> >
> >
> > On Thu, Feb 4, 2016 at 10:31 AM, Abdel Hakim Deneche <
> > [email protected]> wrote:
> >
> >> Hey John, can you try an explain plan for both queries and see how much
> >> time it takes?
> >>
> >> For example, for the first query you would run:
> >>
> >> *explain plan for* select count(1) from `data/2016-02-03`;
> >>
> >> It would also be helpful if you could share the query profiles for both
> >> queries.
> >>
> >> Thanks
> >>
> >> On Thu, Feb 4, 2016 at 8:15 AM, John Omernik <[email protected]> wrote:
> >>
> >> > Hey all, I think I am seeing an issue related to
> >> > https://issues.apache.org/jira/browse/DRILL-3759 but I want to describe it
> >> > out here, see if it's really the case, and then determine what the blockers
> >> > may be to resolution.
> >> >
> >> > I am using the MapR Developer Release 1.4, and I have a directory with
> >> > subdirectories by date.
> >> >
> >> > data/2015-01-01
> >> > data/2015-01-02
> >> > data/2015-01-03
> >> >
> >> > These are stored as Parquet files. At this point each date averages about
> >> > 1 GB of data, and has roughly 75 parquet files in it.
> >> >
> >> > When I run
> >> >
> >> > select count(1) from `data/2016-02-03` it takes roughly 11 seconds.
> >> >
> >> > If I copy the 2016-02-03 directory to a new base (data_sum) and run
> >> >
> >> > select count(1) from `data_sum/2016-02-03` it runs in 0.874 seconds.
> >> >
> >> > Same data, same structure; the only difference is that the data_sum directory
> >> > only has a few directories, and data has dates going back to Nov 2015. It
> >> > seems like it is getting file names for all files in each directory prior
> >> > to pruning, which seems to me to be adding a lot of latency to queries that
> >> > doesn't need to be there (thus I think I am seeing 3759). I wanted to
> >> > confirm, and then I wanted to see how we can address this, in that the
> >> > directory prune should be fast, and on large data sets it's just going to
> >> > get worse and worse.
> >> >
> >> > John
> >> >
> >>
> >> --
> >>
> >> Abdelhakim Deneche
> >> Software Engineer
> >> <http://www.mapr.com/>

--
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>
