Re: Non-column filters in Drill
Thanks for all the suggestions everyone! It looks like table functions are the closest to what I am looking for. - Ryan On 2018-04-08 02:25 AM, Paul Rogers wrote: Hi Ryan, There is an obscure, but very handy feature of Drill called table functions. [1] These allow you to set parameters of your format plugin as part of a query. You mentioned a storage plugin. I've not tried a table function with a storage plugin. I have tested table functions with a format plugin. Your format or storage plugin has a Jackson-serializable Java class. Normally you set the properties for your plugin in the Drill web console. But, these can also be set in the table function. I had a use case something like yours. I defined an example "regex" plugin where the user can specify a regular expression to apply to to a text file to parse columns. The use can then provide a list of column names. Using the table function, I could specify the regex and column names per-query. This exercise did, however, point out two current limitations of table functions. First, they work only with simple data types (strings, ints). (DRILL-6169) So, my list of columns has to be a single string with a comma delimited list of columns. I could not use the more natural list of strings. Second, table functions do not retain the configured value of parameters: you have to include all parameters in the function, not just the ones you want to change. (DRILL-6168) Yet another option is to set a session option. However, unless you do a bit of clever coding, format plugins don't have visibility to session options (DRILL-5181). Perhaps your use case provides a compelling reason to fix some of these limitations... Thanks, - Paul [1] https://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters, see the section "Using the Formats Attributes as Table Function Parameters". On Saturday, April 7, 2018, 10:37:05 PM PDT, Aman Sinha wrote: A better option would be to have a user-defined function that takes 2 parameters and evaluates to a boolean value. e.g select * from myTable where MyUDF(notColumn, 'value') IS TRUE; The Storage Plugin that you are developing would need to implement a pushdown rule that looks at the filter condition and if it contains 'MyUDF()', it would pushdown to the scan/reader corresponding to your plugin. On Sat, Apr 7, 2018 at 6:58 PM, Hanumath Rao Maduri wrote: Hello Ryan, Thank you for trying out Drill. Drill/Calcite expects "notColumn" to be supplied by the underlying scan. However, I expect that this column will be present in the scan but not past the filter (notColumn = 'value') in the plan. In that case you may need to pushdown the filter to the groupScan and then remove the column projections from your custom groupscan. It would be easy for us to guess what could be the issue, if you can post the logical and physical query plan's for this query. Hope this helps. Please do let us know if you have any further issues. Thanks, On Sat, Apr 7, 2018 at 2:08 PM, Ryan Shanks wrote: Hi Drill Dev Team! I am writing a custom storage plugin and I am curious if it is possible in Drill to pass a filter value, in the form of a where clause, that is not related to a column. What I would like to accomplish is something like: select * from myTable where notColumn = 'value'; In the example, notColumn is not a column in myTable, or any other table, it is just a specific parameter that the storage plugin will use in the filtering process. Additionally, notColumn would not be returned as a column so Drill needs to not expect it as a part of the 'select *'. I created a rule that will push down and remove these non-column filter calls, but I need to somehow tell drill/calcite that the filter name is valid, without actually registering it as a column. The following error occurs prior to submitting any rules: org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From line 1, column 35 to line 1, column 39: Column 'notColumn' not found in any table Alternatively, can I manipulate star queries to only return a subset of all the columns for a table? Any insight would be greatly appreciated! Thanks, Ryan
Re: Non-column filters in Drill
Hi Ryan, There is an obscure, but very handy feature of Drill called table functions. [1] These allow you to set parameters of your format plugin as part of a query. You mentioned a storage plugin. I've not tried a table function with a storage plugin. I have tested table functions with a format plugin. Your format or storage plugin has a Jackson-serializable Java class. Normally you set the properties for your plugin in the Drill web console. But, these can also be set in the table function. I had a use case something like yours. I defined an example "regex" plugin where the user can specify a regular expression to apply to to a text file to parse columns. The use can then provide a list of column names. Using the table function, I could specify the regex and column names per-query. This exercise did, however, point out two current limitations of table functions. First, they work only with simple data types (strings, ints). (DRILL-6169) So, my list of columns has to be a single string with a comma delimited list of columns. I could not use the more natural list of strings. Second, table functions do not retain the configured value of parameters: you have to include all parameters in the function, not just the ones you want to change. (DRILL-6168) Yet another option is to set a session option. However, unless you do a bit of clever coding, format plugins don't have visibility to session options (DRILL-5181). Perhaps your use case provides a compelling reason to fix some of these limitations... Thanks, - Paul [1] https://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters, see the section "Using the Formats Attributes as Table Function Parameters". On Saturday, April 7, 2018, 10:37:05 PM PDT, Aman Sinha wrote: A better option would be to have a user-defined function that takes 2 parameters and evaluates to a boolean value. e.g select * from myTable where MyUDF(notColumn, 'value') IS TRUE; The Storage Plugin that you are developing would need to implement a pushdown rule that looks at the filter condition and if it contains 'MyUDF()', it would pushdown to the scan/reader corresponding to your plugin. On Sat, Apr 7, 2018 at 6:58 PM, Hanumath Rao Maduri wrote: > Hello Ryan, > > Thank you for trying out Drill. Drill/Calcite expects "notColumn" to be > supplied by the underlying scan. > However, I expect that this column will be present in the scan but not past > the filter (notColumn = 'value') in the plan. > In that case you may need to pushdown the filter to the groupScan and then > remove the column projections from your custom groupscan. > > It would be easy for us to guess what could be the issue, if you can post > the logical and physical query plan's for this query. > > Hope this helps. Please do let us know if you have any further issues. > > Thanks, > > > On Sat, Apr 7, 2018 at 2:08 PM, Ryan Shanks > wrote: > > > Hi Drill Dev Team! > > > > I am writing a custom storage plugin and I am curious if it is possible > in > > Drill to pass a filter value, in the form of a where clause, that is not > > related to a column. What I would like to accomplish is something like: > > > > select * from myTable where notColumn = 'value'; > > > > In the example, notColumn is not a column in myTable, or any other table, > > it is just a specific parameter that the storage plugin will use in the > > filtering process. Additionally, notColumn would not be returned as a > > column so Drill needs to not expect it as a part of the 'select *'. I > > created a rule that will push down and remove these non-column filter > > calls, but I need to somehow tell drill/calcite that the filter name is > > valid, without actually registering it as a column. The following error > > occurs prior to submitting any rules: > > > > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION > ERROR: > > From line 1, column 35 to line 1, column 39: Column 'notColumn' not found > > in any table > > > > > > Alternatively, can I manipulate star queries to only return a subset of > > all the columns for a table? > > > > Any insight would be greatly appreciated! > > > > Thanks, > > Ryan > > >
Re: Non-column filters in Drill
A better option would be to have a user-defined function that takes 2 parameters and evaluates to a boolean value. e.g select * from myTable where MyUDF(notColumn, 'value') IS TRUE; The Storage Plugin that you are developing would need to implement a pushdown rule that looks at the filter condition and if it contains 'MyUDF()', it would pushdown to the scan/reader corresponding to your plugin. On Sat, Apr 7, 2018 at 6:58 PM, Hanumath Rao Maduri wrote: > Hello Ryan, > > Thank you for trying out Drill. Drill/Calcite expects "notColumn" to be > supplied by the underlying scan. > However, I expect that this column will be present in the scan but not past > the filter (notColumn = 'value') in the plan. > In that case you may need to pushdown the filter to the groupScan and then > remove the column projections from your custom groupscan. > > It would be easy for us to guess what could be the issue, if you can post > the logical and physical query plan's for this query. > > Hope this helps. Please do let us know if you have any further issues. > > Thanks, > > > On Sat, Apr 7, 2018 at 2:08 PM, Ryan Shanks > wrote: > > > Hi Drill Dev Team! > > > > I am writing a custom storage plugin and I am curious if it is possible > in > > Drill to pass a filter value, in the form of a where clause, that is not > > related to a column. What I would like to accomplish is something like: > > > > select * from myTable where notColumn = 'value'; > > > > In the example, notColumn is not a column in myTable, or any other table, > > it is just a specific parameter that the storage plugin will use in the > > filtering process. Additionally, notColumn would not be returned as a > > column so Drill needs to not expect it as a part of the 'select *'. I > > created a rule that will push down and remove these non-column filter > > calls, but I need to somehow tell drill/calcite that the filter name is > > valid, without actually registering it as a column. The following error > > occurs prior to submitting any rules: > > > > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION > ERROR: > > From line 1, column 35 to line 1, column 39: Column 'notColumn' not found > > in any table > > > > > > Alternatively, can I manipulate star queries to only return a subset of > > all the columns for a table? > > > > Any insight would be greatly appreciated! > > > > Thanks, > > Ryan > > >
Re: Non-column filters in Drill
Ryan What would happen if you defined a column so that you could use the normal pushdown mechanism? In most cases, you wouldn't return the value of the column since the only purpose is to use as a filter, but nothing should prevent you from returning the value of this not-really-a-column. By letting it be considered as if a column, all of the normal mechanisms can be brought to bear. Another case can be seen in how the CSV reader lets your inject separators and such. On Sat, Apr 7, 2018, 16:23 Ryan Shanks wrote: > Hi Drill Dev Team! > > I am writing a custom storage plugin and I am curious if it is possible > in Drill to pass a filter value, in the form of a where clause, that is > not related to a column. What I would like to accomplish is something like: > > select * from myTable where notColumn = 'value'; > > In the example, notColumn is not a column in myTable, or any other > table, it is just a specific parameter that the storage plugin will use > in the filtering process. Additionally, notColumn would not be returned > as a column so Drill needs to not expect it as a part of the 'select *'. > I created a rule that will push down and remove these non-column filter > calls, but I need to somehow tell drill/calcite that the filter name is > valid, without actually registering it as a column. The following error > occurs prior to submitting any rules: > > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION > ERROR: From line 1, column 35 to line 1, column 39: Column 'notColumn' > not found in any table > > > Alternatively, can I manipulate star queries to only return a subset of > all the columns for a table? > > Any insight would be greatly appreciated! > > Thanks, > Ryan >
Re: Non-column filters in Drill
Hello Ryan, Thank you for trying out Drill. Drill/Calcite expects "notColumn" to be supplied by the underlying scan. However, I expect that this column will be present in the scan but not past the filter (notColumn = 'value') in the plan. In that case you may need to pushdown the filter to the groupScan and then remove the column projections from your custom groupscan. It would be easy for us to guess what could be the issue, if you can post the logical and physical query plan's for this query. Hope this helps. Please do let us know if you have any further issues. Thanks, On Sat, Apr 7, 2018 at 2:08 PM, Ryan Shanks wrote: > Hi Drill Dev Team! > > I am writing a custom storage plugin and I am curious if it is possible in > Drill to pass a filter value, in the form of a where clause, that is not > related to a column. What I would like to accomplish is something like: > > select * from myTable where notColumn = 'value'; > > In the example, notColumn is not a column in myTable, or any other table, > it is just a specific parameter that the storage plugin will use in the > filtering process. Additionally, notColumn would not be returned as a > column so Drill needs to not expect it as a part of the 'select *'. I > created a rule that will push down and remove these non-column filter > calls, but I need to somehow tell drill/calcite that the filter name is > valid, without actually registering it as a column. The following error > occurs prior to submitting any rules: > > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: > From line 1, column 35 to line 1, column 39: Column 'notColumn' not found > in any table > > > Alternatively, can I manipulate star queries to only return a subset of > all the columns for a table? > > Any insight would be greatly appreciated! > > Thanks, > Ryan >
Non-column filters in Drill
Hi Drill Dev Team! I am writing a custom storage plugin and I am curious if it is possible in Drill to pass a filter value, in the form of a where clause, that is not related to a column. What I would like to accomplish is something like: select * from myTable where notColumn = 'value'; In the example, notColumn is not a column in myTable, or any other table, it is just a specific parameter that the storage plugin will use in the filtering process. Additionally, notColumn would not be returned as a column so Drill needs to not expect it as a part of the 'select *'. I created a rule that will push down and remove these non-column filter calls, but I need to somehow tell drill/calcite that the filter name is valid, without actually registering it as a column. The following error occurs prior to submitting any rules: org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: From line 1, column 35 to line 1, column 39: Column 'notColumn' not found in any table Alternatively, can I manipulate star queries to only return a subset of all the columns for a table? Any insight would be greatly appreciated! Thanks, Ryan