GitHub user nsyca commented on the issue:
https://github.com/apache/spark/pull/16337
Let me try to summarize the comments about the structure of the test files
here:
1. A single file of 200+ test cases is too big. We prefer smaller files
with logical groupings.
2. File names with serial numbers are not how Spark names files.
I'd like to generate more discussion before we reach a conclusion.
- It is possible to group test cases. We have loosely grouped them by
naming them in groups within the test file, like TC 01.xx, where 01 is
effectively the group number. We can easily change this to put one group in
each file.
- Sometimes rigid grouping is undesirable, or impossible. Does a test
case of 'EXISTS .. OR NOT IN' go to the 'EXISTS' group, the 'NOT IN' group, or
the 'disjunctive subquery' group? Does a test case of 'EXISTS ( .. ) UNION
EXISTS ( .. )' go into the same group as 'EXISTS ( .. UNION .. )', or does the
former go to the 'UNION' suite and the latter to the 'subquery' suite? Should
test cases with a single classification go to a "simple" set and those that
can be classified more than one way go to a "complex" set? Over time, people
will pile most of them into the "complex" set and it will become bloated. We
will then end up with "complex-1", "complex-2", etc.
- We write each test case with a purpose, but sometimes it triggers an
unrelated problem. If a test case is intended to test subquery functionality
but ends up revealing a missed opportunity in join reordering, should we move
it into the 'join reordering' suite or leave it in the 'subquery' suite?
- With the current one-level flat structure in
sql/core/src/test/resources/sql-tests/inputs/, we could end up with thousands
of files in the (near) future if each file contains only a handful of test
cases. What is a good solution? Should we create a subdirectory named
subquery/ and break up the test cases into small files under it?
I don't think there is a silver bullet for this kind of problem. Let's
brainstorm here. I (or someone else) could moderate the discussion. Eventually
we will need to pick one way or another, and if we need to change it in the
future, we will pay the price then.