[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery

nsyca Mon, 19 Dec 2016 20:47:40 -0800

Github user nsyca commented on the issue:

    https://github.com/apache/spark/pull/16337
  
    Let me try to summarize the comments about the structure of the test files here:
    1. A single file of 200+ test cases is too big. We prefer smaller files with logical groupings.
    2. Naming files with serial numbers is not the Spark convention.
    
    I'd like to generate more discussion before we come to a conclusion.
    - It is possible to group test cases. What we tried was to loosely group them by naming them in groups within the test file, such as TC 01.xx, where 01 is effectively the group number. We can easily change this to put one group in each file.
    - Sometimes grouping rigidly is undesirable, or even impossible. Does a test case of 'EXISTS .. OR NOT IN' go to the 'EXISTS' group, the 'NOT IN' group, or the 'disjunctive subquery' group? Does a test case of 'EXISTS ( .. ) UNION EXISTS ( .. )' go into the same group as 'EXISTS ( .. UNION .. )', or does the former go to the 'UNION' suite and the latter to the 'subquery' suite? Should test cases with a single classification go to a "simple" set and those that can be classified more than one way go to a "complex" set? Over time, people will pile most of them into the "complex" set, it will become bloated, and we will end up with "complex-1", "complex-2", etc.
    - Arguably, we have a purpose when writing a test case, but sometimes it triggers an unrelated problem. If a test case is intended to test subquery functionality but ends up revealing a missed opportunity in join reordering, should we move it into the 'join reordering' suite or leave it in the 'subquery' suite?
    - With the current flat, one-level structure in sql/core/src/test/resources/sql-tests/inputs/, we could end up with thousands of files in the (near) future if each file contains only a handful of test cases. What is a good solution? Should we create a subdirectory named subquery/ and break up the test cases into small files under it?
    
    I don't think there is a silver bullet for this kind of problem. Let's brainstorm here. I (or someone else) could moderate the discussion. Eventually we will need to pick one way or another, and if we need to change it in the future, we will pay the price for it.



