GitHub user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11361#issuecomment-188664991
  
    @rxin OK, I found out why the test takes so long to finish.
    
    The original query:
    
        SELECT count(1) FROM (
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
    
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
    
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
    
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
    
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src UNION ALL
          SELECT key, value FROM src) src;
    
    results in an analyzed plan like:
    
        == Analyzed Logical Plan ==
        count(1): bigint
        Aggregate [(count(1),mode=Complete,isDistinct=false) AS count(1)#393L]
        +- SubqueryAlias src
           +- Union
              :- Union
              :  :- Union
              :  :  :- Union
              :  :  :  :- Union
              :  :  :  :  :- Union
              :  :  :  :  :  :- Union
              :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :- Union
              :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :- Union
              ...(skip)
    
    In `HiveComparisonTest`, we use `SQLBuilder` to convert the analyzed plan back to a SQL query.
    
    Because PR #11195 wraps the SQL query of each union child in `()`, it generates a deeply nested SQL query for the union16 query:
    
        SELECT count(1) AS `count(1)` FROM (((((((((((((((((((((((((SELECT 
`src`.`key`, `src`.`value` FROM `default`.`src`) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) UNION ALL (SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src`)) 
UNION ALL (SELECT `src`.`key`, `src`.`value` FROM `default`.`src`)) UNION ALL 
(SELECT `src`.`key`, `src`.`value` FROM `default`.`src`)) UNION ALL (SELECT 
`src`.`key`, `src`.`value` FROM `default`.`src`)) UNION ALL (SELECT 
`src`.`key`, `src`.`value` FROM `default`.`src`)) UNION ALL (SELECT 
`src`.`key`, `src`.`value` FROM `default`.`src`)) UNION ALL (SELECT 
`src`.`key`, `src`.`value` FROM `default`.`src`)) UNION ALL (SELECT 
`src`.`key`, `src`.`value` FROM `default`.`src`)) UNION ALL (SELECT 
`src`.`key`, `src`.`value` FROM `default`.`src`)) UNION ALL (SELECT 
`src`.`key`, `src`.`value` FROM `default`.`src`)) UNION ALL (SELECT 
`src`.`key`, `src`.`value` FROM `default`.`src`)) AS src
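
    To make the nesting concrete, here is a minimal Python sketch of the wrapping behavior described above. It is not Spark's `SQLBuilder` code; `wrapped_union_sql` and `max_paren_depth` are hypothetical helpers written only for illustration. It builds a left-deep chain of unions, wraps every child in `()`, and measures how deep the parentheses nest.

```python
# Illustrative only: a hypothetical stand-in for the wrapping behavior,
# not the actual SQLBuilder implementation.
LEAF = "SELECT `src`.`key`, `src`.`value` FROM `default`.`src`"


def wrapped_union_sql(n_selects):
    """Build a left-deep UNION ALL of n_selects leaves, wrapping each child in ()."""
    sql = LEAF
    for _ in range(n_selects - 1):
        # Each Union node wraps its left child (the whole chain so far)
        # and its right child (a single SELECT).
        sql = f"({sql}) UNION ALL ({LEAF})"
    return sql


def max_paren_depth(sql):
    """Return the deepest level of parenthesis nesting in the string."""
    depth = deepest = 0
    for ch in sql:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


# 25 SELECTs, as in the union16 query above.
query = f"SELECT count(1) AS `count(1)` FROM ({wrapped_union_sql(25)}) AS src"
print(max_paren_depth(query))
```

    In this sketch the depth grows by one for every Union node, plus one for the enclosing subquery, so the nesting is linear in the number of SELECTs.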
    
    The parser processes nested union queries recursively, so parsing such a deeply nested query costs a lot of time. That is why union16 takes so long to finish.
    
    If we remove the `()` around the SQL queries for union's children in `SQLBuilder`, the generated SQL query would be:
    
        SELECT count(1) AS `count(1)` FROM (SELECT `src`.`key`, `src`.`value` 
FROM `default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM `default`.`src` 
UNION ALL SELECT `src`.`key`, `src`.`value` FROM `default`.`src` UNION ALL 
SELECT `src`.`key`, `src`.`value` FROM `default`.`src` UNION ALL SELECT 
`src`.`key`, `src`.`value` FROM `default`.`src` UNION ALL SELECT `src`.`key`, 
`src`.`value` FROM `default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` 
FROM `default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src` UNION ALL SELECT `src`.`key`, `src`.`value` FROM 
`default`.`src`) AS src
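
    For comparison, a minimal Python sketch of the unwrapped form. Again, `flat_union_sql` is a hypothetical helper for illustration and not the `SQLBuilder` change itself. It joins the SELECTs with `UNION ALL` and adds no parentheses around the children, so the only parenthesis after `FROM` is the one around the subquery.

```python
# Illustrative only: a hypothetical helper, not the actual SQLBuilder change.
LEAF = "SELECT `src`.`key`, `src`.`value` FROM `default`.`src`"


def flat_union_sql(n_selects):
    """Join n_selects leaves with UNION ALL, without wrapping any child in ()."""
    return " UNION ALL ".join([LEAF] * n_selects)


# 25 SELECTs, as in the union16 query above.
query = f"SELECT count(1) AS `count(1)` FROM ({flat_union_sql(25)}) AS src"

# Everything after the first " FROM " is the subquery and its alias.
from_clause = query.split(" FROM ", 1)[1]
print(from_clause.count("UNION ALL"))  # number of UNION ALL operators
print(from_clause.count("("))          # opening parentheses in the FROM clause
```

    The sketch assumes every operator in the chain is `UNION ALL`, as in union16; a chain that mixes different set operators may still need explicit grouping.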
    
    With this patch, union16 finishes in a normal amount of time.


