[jira] [Commented] (CALCITE-6301) Extend ‘Must-filter’ columns to support a conditional bypass list

2024-03-12 Thread Oliver Lee (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825861#comment-17825861
 ] 

Oliver Lee commented on CALCITE-6301:
-

[~julianhyde] 

We discussed adding in two more test cases

 
 # project bypass fields but not the must-filter field, and inspect what 
happens if bypass field is filtered on enclosing query (should pass)
 # project mustFilter fields but not the bypass fields (this should fail if 
enclosing query does not filter on the mustFilter fields)

I've updated the PR with an additional commit with the test cases, first 2 
belonging to (1) and the third for (2)

 

Commit: 
[https://github.com/olivrlee/calcite/commit/8de5f6ec232bd235632f92996be2f97df31f0581]
 

 

A test for (1) does not pass and needs additional handling. 

> Extend ‘Must-filter’ columns to support a conditional bypass list
> -
>
> Key: CALCITE-6301
> URL: https://issues.apache.org/jira/browse/CALCITE-6301
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Oliver Lee
>Assignee: Oliver Lee
>Priority: Major
>  Labels: pull-request-available
>
> In CALCITE-6219 we introduced SemanticTable, where tables that implement this 
> interface can define fields to be ‘must-filter’, and a query without those 
> filters in any of its WHERE or HAVING clauses, it will throw a validation 
> error.
>  
> I would like to extend this functionality to support a by-pass list of fields 
> such that if any field from this secondary list is present in a WHERE / 
> HAVING clause, then the must-filter fields can be ignored and will not raise 
> an exception if not filtered on. 
>  
> Ex.
>  
> EMP table specifies the following:
> Must-filter-fields: [EMPNO, DEPTNO]
> Bypass-fields: [ENAME, SALARY]
>  
>  
> SELECT * FROM EMP WHERE EMPNO = 1 and DEPTNO = 2 -> No error
> SELECT * FROM EMP WHERE EMPNO = 1 -> Error
> SELECT * FROM EMP WHERE EMPNO = 1 and ENAME = ’name’ -> No error
> SELECT * FROM EMP WHERE ENAME = ’name’ -> No error
> SELECT * FROM EMP WHERE SALARY > 10 -> No error
>  
>  
>  
> Again, special considerations are for handling 
>  
>  * Joins
>  * CTEs
>  * Subqueries
>  
>  
> And a similar exhaustive suite of tests like the one for CALCITE-6219 should 
> be employed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6301) Extend ‘Must-filter’ columns to support a conditional bypass list

2024-03-06 Thread Oliver Lee (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824175#comment-17824175
 ] 

Oliver Lee commented on CALCITE-6301:
-

One example that comes to mind:

Let's say you have a table {{ORDERS}} with columns {{order_id, order_date, 
price, etc}} , and that {{order_id}} is ascending numbers

A user might want :

[must-filter on order_id > 1000]

and bypass the {{order_id}} filter requirement if they filter by {{order_date}} 
> 2024-01-01 

 

> Extend ‘Must-filter’ columns to support a conditional bypass list
> -
>
> Key: CALCITE-6301
> URL: https://issues.apache.org/jira/browse/CALCITE-6301
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Oliver Lee
>Assignee: Oliver Lee
>Priority: Major
>
> In CALCITE-6219 we introduced SemanticTable, where tables that implement this 
> interface can define fields to be ‘must-filter’, and a query without those 
> filters in any of its WHERE or HAVING clauses, it will throw a validation 
> error.
>  
> I would like to extend this functionality to support a by-pass list of fields 
> such that if any field from this secondary list is present in a WHERE / 
> HAVING clause, then the must-filter fields can be ignored and will not raise 
> an exception if not filtered on. 
>  
> Ex.
>  
> EMP table specifies the following:
> Must-filter-fields: [EMPNO, DEPTNO]
> Bypass-fields: [ENAME, SALARY]
>  
>  
> SELECT * FROM EMP WHERE EMPNO = 1 and DEPTNO = 2 -> No error
> SELECT * FROM EMP WHERE EMPNO = 1 -> Error
> SELECT * FROM EMP WHERE EMPNO = 1 and ENAME = ’name’ -> No error
> SELECT * FROM EMP WHERE ENAME = ’name’ -> No error
> SELECT * FROM EMP WHERE SALARY > 10 -> No error
>  
>  
>  
> Again, special considerations are for handling 
>  
>  * Joins
>  * CTEs
>  * Subqueries
>  
>  
> And a similar exhaustive suite of tests like the one for CALCITE-6219 should 
> be employed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6301) Extend ‘Must-filter’ columns to support a conditional bypass list

2024-03-06 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824124#comment-17824124
 ] 

Alessandro Solimando commented on CALCITE-6301:
---

Thanks [~oliverlee], it's clear now with the updated example.

One question I still have is under what circumstances .

I have seen similar asks in the past for "must-filter" to force users to write 
queries that could leverage indexes or partitioning, but I am having a 
hard-time seeing how the "bypass-list" would come to rescue.

Could you sketch a use-case where the "bypass-list" would be helpful?

Just to be clear, I have nothing against this improvement, just curious to 
understand the rationale behind it.

> Extend ‘Must-filter’ columns to support a conditional bypass list
> -
>
> Key: CALCITE-6301
> URL: https://issues.apache.org/jira/browse/CALCITE-6301
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Oliver Lee
>Assignee: Oliver Lee
>Priority: Major
>
> In CALCITE-6219 we introduced SemanticTable, where tables that implement this 
> interface can define fields to be ‘must-filter’, and a query without those 
> filters in any of its WHERE or HAVING clauses, it will throw a validation 
> error.
>  
> I would like to extend this functionality to support a by-pass list of fields 
> such that if any field from this secondary list is present in a WHERE / 
> HAVING clause, then the must-filter fields can be ignored and will not raise 
> an exception if not filtered on. 
>  
> Ex.
>  
> EMP table specifies the following:
> Must-filter-fields: [EMPNO, DEPTNO]
> Bypass-fields: [ENAME, SALARY]
>  
>  
> SELECT * FROM EMP WHERE EMPNO = 1 and DEPTNO = 2 -> No error
> SELECT * FROM EMP WHERE EMPNO = 1 -> Error
> SELECT * FROM EMP WHERE EMPNO = 1 and ENAME = ’name’ -> No error
> SELECT * FROM EMP WHERE ENAME = ’name’ -> No error
> SELECT * FROM EMP WHERE SALARY > 10 -> No error
>  
>  
>  
> Again, special considerations are for handling 
>  
>  * Joins
>  * CTEs
>  * Subqueries
>  
>  
> And a similar exhaustive suite of tests like the one for CALCITE-6219 should 
> be employed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6301) Extend ‘Must-filter’ columns to support a conditional bypass list

2024-03-06 Thread Oliver Lee (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824119#comment-17824119
 ] 

Oliver Lee commented on CALCITE-6301:
-

Yeah sorry the previous example wasn't clear about the distinction of the 
second list. I've updated the example to have 2 fields in the bypass-field list 
to show that only one field from that bypass-list being filtered on makes the 
query valid. 

In your examples, both are valid. 

 

The requirements for a valid query are
 * all fields that are marked as must-filter have to be filtered on individually
 * at least one of the fields in the bypass-list are filtered on

 

In the example I provided the must-filter fields as a list, but in 
implementation, whether a field is must-filter or not is actually a property of 
each column, so from a table perspective it doesn't have a concept of 
individual lists

> Extend ‘Must-filter’ columns to support a conditional bypass list
> -
>
> Key: CALCITE-6301
> URL: https://issues.apache.org/jira/browse/CALCITE-6301
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Oliver Lee
>Assignee: Oliver Lee
>Priority: Major
>
> In CALCITE-6219 we introduced SemanticTable, where tables that implement this 
> interface can define fields to be ‘must-filter’, and a query without those 
> filters in any of its WHERE or HAVING clauses, it will throw a validation 
> error.
>  
> I would like to extend this functionality to support a by-pass list of fields 
> such that if any field from this secondary list is present in a WHERE / 
> HAVING clause, then the must-filter fields can be ignored and will not raise 
> an exception if not filtered on. 
>  
> Ex.
>  
> EMP table specifies the following:
> Must-filter-fields: [EMPNO, DEPTNO]
> Bypass-fields: [ENAME, SALARY]
>  
>  
> SELECT * FROM EMP WHERE EMPNO = 1 and DEPTNO = 2 -> No error
> SELECT * FROM EMP WHERE EMPNO = 1 -> Error
> SELECT * FROM EMP WHERE EMPNO = 1 and ENAME = ’name’ -> No error
> SELECT * FROM EMP WHERE ENAME = ’name’ -> No error
> SELECT * FROM EMP WHERE SALARY > 10 -> No error
>  
>  
>  
> Again, special considerations are for handling 
>  
>  * Joins
>  * CTEs
>  * Subqueries
>  
>  
> And a similar exhaustive suite of tests like the one for CALCITE-6219 should 
> be employed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6301) Extend ‘Must-filter’ columns to support a conditional bypass list

2024-03-05 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823890#comment-17823890
 ] 

Alessandro Solimando commented on CALCITE-6301:
---

Isn't this equivalent to say that "EMPNAME" is an alternative to the 
"Must-filter-fields"?

In that case I think it's simpler to extend the syntax to provide multiple 
"Must-filter-fields" lists, something like "[EMPNO, DEPTNO],[ENAME]" (a list of 
lists, conceptually, where each list is an independent alternative).

>From your example it's not clear if multiple by-pass fields values would 
>require to be there at the same time (very much like "Must-filter-fields"), or 
>if any of them alone would do.

For instance, for Bypass-fields: [ENAME, EMPNO], would the following queries be 
valid or not?

SELECT * FROM EMP WHERE ENAME = ’name’
SELECT * FROM EMP WHERE EMPNO = 1 

In any case, the multiple "Must-filter-fields" syntax would allow to express 
both pretty naturally.

You can even probably re-use the same machinery you coded for 
"Must-filter-fields" more easily.

WDYT?

> Extend ‘Must-filter’ columns to support a conditional bypass list
> -
>
> Key: CALCITE-6301
> URL: https://issues.apache.org/jira/browse/CALCITE-6301
> Project: Calcite
>  Issue Type: Improvement
>Reporter: Oliver Lee
>Assignee: Oliver Lee
>Priority: Major
>
> In [CALCITE-6219] we introduced SemanticTable, where tables that implement 
> this interface can define fields to be ‘must-filter’, and a query without 
> those filters in any of its WHERE or HAVING clauses, it will throw a 
> validation error.
>  
> I would like to extend this functionality to support a by-pass list of fields 
> such that if any field from this secondary list is present in a WHERE / 
> HAVING clause, then the must-filter fields can be ignored and will not raise 
> an exception if not filtered on. 
>  
> Ex.
>  
> EMP table specifies the following:
> Must-filter-fields: [EMPNO, DEPTNO]
> Bypass-fields: [ENAME]
>  
>  
> SELECT * FROM EMP WHERE EMPNO = 1 and DEPTNO = 2 -> No error
> SELECT * FROM EMP WHERE EMPNO = 1 -> Error
> SELECT * FROM EMP WHERE EMPNO = 1 and ENAME = ’name’ -> No error
> SELECT * FROM EMP WHERE ENAME = ’name’ -> No error
>  
>  
>  
> Again, special considerations are for handling 
>  
>  * Joins
>  * CTEs
>  * Subqueries
>  
>  
> And a similar exhaustive suite of tests like the one for [CALCITE-6219] 
> should be employed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)