Yes, for every query, we build the schema tree by trying to initialize all storage plugins and the workspaces in them, regardless of schema configuration or applicability to the data being queried. Go ahead and file a JIRA. We are looking into fixing this.
Thanks,
Padma

> On Dec 1, 2016, at 8:48 AM, Abhishek Girish <[email protected]> wrote:
>
> AFAIK, should apply to all queries, irrespective of the source of the data
> or the plugins involved within the query. So when this issue occurs, I
> would expect any query to take long to execute.
>
> On Thu, Dec 1, 2016 at 5:47 AM John Omernik <[email protected]> wrote:
>
>> @Abhishek,
>>
>> Do you think the issue is related to any storage plugin that is enabled and
>> not available, as it applies to all queries? I guess if it's an issue where
>> all queries are slow because the foreman is waiting to initialize ALL
>> storage plugins, regardless of their applicability to the queried data,
>> then that is a more general issue (that should still be resolved; does the
>> foreman need to initialize all plugins before querying specific data?)
>> However, I am still concerned that the query on the CTAS parquet data is
>> specifically slower because of its source. @Rahul could you test a
>> different Parquet table, NOT loaded from the SQL server, to see if
>> enabling or disabling the JDBC storage plugin (with the server unavailable)
>> has any impact? Basically, I want to ensure that data that is created as a
>> Parquet table via CTAS is 100% free of any links to the source data. This
>> is EXTREMELY important.
>>
>> John
>>
>> On Thu, Dec 1, 2016 at 12:46 AM, Abhishek Girish <[email protected]> wrote:
>>
>>> Thanks for the update, Rahul!
>>>
>>> On Wed, Nov 30, 2016 at 9:45 PM Rahul Raj <[email protected]> wrote:
>>>
>>>> Abhishek,
>>>>
>>>> Your observation is correct, we just verified that:
>>>>
>>>> 1. The queries run as expected (faster) with the JDBC plugin disabled.
>>>> 2. Queries run as expected when the plugin's datasource is running.
>>>> 3. With the datasource down, queries run very slowly, waiting for the
>>>>    connection to fail.
>>>>
>>>> Rahul
>>>>
>>>> On Thu, Dec 1, 2016 at 10:07 AM, Abhishek Girish <[email protected]> wrote:
>>>>
>>>>> @John,
>>>>>
>>>>> I agree that this should work. While I am not certain, I don't think the
>>>>> issue is specific to a particular plugin, but rather to the way that, in
>>>>> a query's lifecycle, the foreman attempts to initialize every enabled
>>>>> storage plugin before proceeding to execute the query. So when a
>>>>> particular plugin isn't configured correctly or the underlying datasource
>>>>> is not up, this could drastically slow down the query execution time.
>>>>>
>>>>> I'll look up to see if we have a JIRA for this already - if not will file
>>>>> one.
>>>>>
>>>>> On Wed, Nov 30, 2016 at 8:12 AM, John Omernik <[email protected]> wrote:
>>>>>
>>>>>> So just my opinion in reading this thread. (Sorry for swooping in and
>>>>>> opining.)
>>>>>>
>>>>>> If a CTAS is done from any data source into Parquet files, there should
>>>>>> be NO dependency on the original data source to query the resultant
>>>>>> Parquet files. As a Drill user, as a Drill admin, this breaks the concept
>>>>>> of least surprise. If I take data from one source, and create Parquet
>>>>>> files in a distributed file system, it should just work. If there are
>>>>>> "issues" with JDBC plugins or the HBase/Hive plugins in a similar manner,
>>>>>> these need to be hunted down by a large group of villagers with
>>>>>> pitchforks and torches. I just can't see how this could be acceptable at
>>>>>> any level. The whole idea of Parquet files is they are self-describing,
>>>>>> schema-included files, thus a read of a directory of Parquet files should
>>>>>> have NO dependencies on anything but the Parquet files. Even the Parquet
>>>>>> "additions" (such as the METADATA Cache) should be a fail-open thing: if
>>>>>> it exists, great, use it, speed things up, but if it doesn't, read the
>>>>>> Parquet files as normal (which I believe is how it operates).
>>>>>>
>>>>>> John
>>>>>>
>>>>>> On Wed, Nov 30, 2016 at 12:12 AM, Abhishek Girish <[email protected]> wrote:
>>>>>>
>>>>>>> Can you attempt to disable the jdbc plugin (configured with SQLServer)
>>>>>>> and try the query (on parquet) when SQL Server is offline?
>>>>>>>
>>>>>>> I've seen a similar issue previously when the HBase / Hive plugin was
>>>>>>> enabled but either the plugin configuration was wrong or the underlying
>>>>>>> data source was down.
>>>>>>>
>>>>>>> On Fri, Nov 25, 2016 at 3:21 AM, Rahul Raj <rahul.raj@option3consulting.com> wrote:
>>>>>>>
>>>>>>>> I have created a parquet file using CTAS from a MS SQL Server. The
>>>>>>>> query on parquet is getting stuck in STARTING state for a long time
>>>>>>>> before returning the results.
>>>>>>>>
>>>>>>>> We could see that Drill was trying to connect to the MS SQL server
>>>>>>>> from which the data was imported. The MSSQL server was down, Drill
>>>>>>>> threw an exception "Failure while attempting to load JDBC schema", and
>>>>>>>> then returned the results. While SQL server is running, the query
>>>>>>>> executes without issues.
>>>>>>>>
>>>>>>>> Why is Drill querying the DB metadata externally and not the imported
>>>>>>>> parquets?
>>>>>>>>
>>>>>>>> Rahul.
>>>>>>>>
>>>>>>>> --
>>>>>>>> **** This email and any files transmitted with it are confidential and
>>>>>>>> intended solely for the use of the individual or entity to whom it is
>>>>>>>> addressed. If you are not the named addressee then you should not
>>>>>>>> disseminate, distribute or copy this e-mail. Please notify the sender
>>>>>>>> immediately and delete this e-mail from your system.****
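
For readers following the thread, the behavior described above (every enabled storage plugin is initialized while building the schema tree, so one unreachable datasource delays every query) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Drill's code: the `Plugin` class, both `build_schema_tree_*` functions, the plugin names and the 0.2 second timeout are all hypothetical. The lazy variant is just one possible shape for a fix and is not taken from any JIRA or patch.

```python
import time


class Plugin:
    """Hypothetical stand-in for a storage plugin (not Drill's actual API)."""

    def __init__(self, name, reachable=True, timeout=0.2):
        self.name = name
        self.reachable = reachable
        self.timeout = timeout  # assumed time for a connection attempt to fail

    def init_schema(self):
        if not self.reachable:
            # Simulate blocking until the connection attempt times out.
            time.sleep(self.timeout)
            raise ConnectionError(f"Failure while attempting to load schema: {self.name}")
        return f"{self.name}-schema"


def build_schema_tree_eager(plugins):
    """Initialize every enabled plugin, whether or not the query uses it."""
    tree = {}
    for plugin in plugins:
        try:
            tree[plugin.name] = plugin.init_schema()
        except ConnectionError:
            # The failure is tolerated, but the wait has already been paid.
            pass
    return tree


def build_schema_tree_lazy(plugins, referenced):
    """Initialize only the plugins whose names appear in `referenced`."""
    return build_schema_tree_eager([p for p in plugins if p.name in referenced])


def timed(fn, *args):
    """Return (result, elapsed seconds) for a call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# "dfs" holds the Parquet files; "mssql" is enabled but its server is down.
plugins = [Plugin("dfs"), Plugin("mssql", reachable=False)]

eager_tree, eager_secs = timed(build_schema_tree_eager, plugins)
lazy_tree, lazy_secs = timed(build_schema_tree_lazy, plugins, {"dfs"})
print(f"eager: {eager_secs:.2f}s, lazy: {lazy_secs:.2f}s")
```

In this toy model the query only touches `dfs`, yet the eager build still waits for the `mssql` connection to fail, which matches Rahul's three observations. Disabling the unreachable plugin has the same effect as removing it from the `plugins` list.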
