[jira] [Commented] (HIVE-17395) HiveServer2 parsing a command with a lot of "("

2019-10-28 Thread Greg Senia (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961647#comment-16961647
 ] 

Greg Senia commented on HIVE-17395:
---

After spending many hours troubleshooting this enhancement seems to be the root 
of the LParen problem

> HiveServer2 parsing a command with a lot of "("
> ---
>
> Key: HIVE-17395
> URL: https://issues.apache.org/jira/browse/HIVE-17395
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 2.3.0
>Reporter: dan young
>Priority: Major
>
> Hello,
> We're seeing what appears to be the same issue that was outlined in 
> HIVE-15388 where the query parser spends a lot of time (never returns and I 
> need to kill the beeline process) parsing a command with a lot of "(" .   I 
> tried this in both 2.2 and now 2.3.
> Here's an example query (this is auto generated SQL BTW) in beeline that 
> never completes/parses, I end up just killing the beeline process.
> It looks like something similar was addressed as part of HIVE-15388.   Any 
> ideas on how to address this?  write better SQL? patch?
> Regards,
> Dano
> {noformat}
> Connected to: Apache Hive (version 2.3.0)
> Driver: Hive JDBC (version 2.3.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.3.0 by Apache Hive
> 0: jdbc:hive2://localhost:1/test_db> SELECT 
> ((UNIX_TIMESTAMP(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))), 
> -3),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))),11));
> When I did a jstack on the HiveServer2, it appears the be stuck/running in 
> the HiveParser/antlr.
> "e62658bd-5ea9-43c4-898f-3048d913f192 HiveServer2-Handler-Pool: Thread-96" 
> #96 prio=5 os_prio=0 tid=0x7fb78c366000 nid=0x4476 runnable 
> [0x7fb77d7bb000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31502)
>   at org.antlr.runtime.DFA.predict(DFA.java:80)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324)
>   at 
> 

[jira] [Commented] (HIVE-17395) HiveServer2 parsing a command with a lot of "("

2019-04-01 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807152#comment-16807152
 ] 

Julian Hyde commented on HIVE-17395:


HIVE-17395 seems to be a duplicate of HIVE-18624. Do others agree?

> HiveServer2 parsing a command with a lot of "("
> ---
>
> Key: HIVE-17395
> URL: https://issues.apache.org/jira/browse/HIVE-17395
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 2.3.0
>Reporter: dan young
>Priority: Major
>
> Hello,
> We're seeing what appears to be the same issue that was outlined in 
> HIVE-15388 where the query parser spends a lot of time (never returns and I 
> need to kill the beeline process) parsing a command with a lot of "(" .   I 
> tried this in both 2.2 and now 2.3.
> Here's an example query (this is auto generated SQL BTW) in beeline that 
> never completes/parses, I end up just killing the beeline process.
> It looks like something similar was addressed as part of HIVE-15388.   Any 
> ideas on how to address this?  write better SQL? patch?
> Regards,
> Dano
> {noformat}
> Connected to: Apache Hive (version 2.3.0)
> Driver: Hive JDBC (version 2.3.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.3.0 by Apache Hive
> 0: jdbc:hive2://localhost:1/test_db> SELECT 
> ((UNIX_TIMESTAMP(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))), 
> -3),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))),11));
> When I did a jstack on the HiveServer2, it appears the be stuck/running in 
> the HiveParser/antlr.
> "e62658bd-5ea9-43c4-898f-3048d913f192 HiveServer2-Handler-Pool: Thread-96" 
> #96 prio=5 os_prio=0 tid=0x7fb78c366000 nid=0x4476 runnable 
> [0x7fb77d7bb000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31502)
>   at org.antlr.runtime.DFA.predict(DFA.java:80)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7324)
>   at 
> 

[jira] [Commented] (HIVE-17395) HiveServer2 parsing a command with a lot of "("

2019-03-26 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802180#comment-16802180
 ] 

Julian Hyde commented on HIVE-17395:


[~kgyrtkirk] Thanks. I was aware of HIVE-15388 but it does look as if 
HIVE-18624 is a better match. This might be a duplicate. It does match the 
timescale when this issue appeared.

> HiveServer2 parsing a command with a lot of "("
> ---
>
> Key: HIVE-17395
> URL: https://issues.apache.org/jira/browse/HIVE-17395
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 2.3.0
>Reporter: dan young
>Priority: Major
>
> Hello,
> We're seeing what appears to be the same issue that was outlined in 
> HIVE-15388 where the query parser spends a lot of time (never returns and I 
> need to kill the beeline process) parsing a command with a lot of "(" .   I 
> tried this in both 2.2 and now 2.3.
> Here's an example query (this is auto generated SQL BTW) in beeline that 
> never completes/parses, I end up just killing the beeline process.
> It looks like something similar was addressed as part of HIVE-15388.   Any 
> ideas on how to address this?  write better SQL? patch?
> Regards,
> Dano
> {noformat}
> Connected to: Apache Hive (version 2.3.0)
> Driver: Hive JDBC (version 2.3.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.3.0 by Apache Hive
> 0: jdbc:hive2://localhost:1/test_db> SELECT 
> ((UNIX_TIMESTAMP(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))), 
> -3),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))),11));
> When I did a jstack on the HiveServer2, it appears the be stuck/running in 
> the HiveParser/antlr.
> "e62658bd-5ea9-43c4-898f-3048d913f192 HiveServer2-Handler-Pool: Thread-96" 
> #96 prio=5 os_prio=0 tid=0x7fb78c366000 nid=0x4476 runnable 
> [0x7fb77d7bb000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31502)
>   at org.antlr.runtime.DFA.predict(DFA.java:80)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6988)
>   at 
> 

[jira] [Commented] (HIVE-17395) HiveServer2 parsing a command with a lot of "("

2019-03-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801679#comment-16801679
 ] 

Zoltan Haindrich commented on HIVE-17395:
-

[~julianhyde]: I'm not sure if it makes your problem go away or not; but 
HIVE-18624 made this issue less severe (intervals and subq both need a '(' or 
an explicit keyword - so in most cases the recursion spiral was started by 
"function")
unfortunately that patch right now is only on branch-3/master; and there is no 
Hive releases which contain it...

about antlr4it would be great but it doesn't seem possible.

> HiveServer2 parsing a command with a lot of "("
> ---
>
> Key: HIVE-17395
> URL: https://issues.apache.org/jira/browse/HIVE-17395
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 2.3.0
>Reporter: dan young
>Priority: Major
>
> Hello,
> We're seeing what appears to be the same issue that was outlined in 
> HIVE-15388 where the query parser spends a lot of time (never returns and I 
> need to kill the beeline process) parsing a command with a lot of "(" .   I 
> tried this in both 2.2 and now 2.3.
> Here's an example query (this is auto generated SQL BTW) in beeline that 
> never completes/parses, I end up just killing the beeline process.
> It looks like something similar was addressed as part of HIVE-15388.   Any 
> ideas on how to address this?  write better SQL? patch?
> Regards,
> Dano
> {noformat}
> Connected to: Apache Hive (version 2.3.0)
> Driver: Hive JDBC (version 2.3.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.3.0 by Apache Hive
> 0: jdbc:hive2://localhost:1/test_db> SELECT 
> ((UNIX_TIMESTAMP(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))), 
> -3),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))),11));
> When I did a jstack on the HiveServer2, it appears the be stuck/running in 
> the HiveParser/antlr.
> "e62658bd-5ea9-43c4-898f-3048d913f192 HiveServer2-Handler-Pool: Thread-96" 
> #96 prio=5 os_prio=0 tid=0x7fb78c366000 nid=0x4476 runnable 
> [0x7fb77d7bb000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31502)
>   at org.antlr.runtime.DFA.predict(DFA.java:80)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746)

[jira] [Commented] (HIVE-17395) HiveServer2 parsing a command with a lot of "("

2019-03-25 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-17395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800973#comment-16800973
 ] 

Julian Hyde commented on HIVE-17395:


I'm running into this issue also, in my testing of Looker against Hive. It 
seems to be a regression from earlier versions of Hive. Looker generates deeply 
nested expressions, and so hits this problem hard; we are recommending that our 
customers do not upgrade to Hive 2.2, 2.3 or 3 because of this issue.

I am more of an expert on JavaCC than Antlr, but I agree with [~kgyrtkirk] that 
the problem seems to be lookaheads. The calls to 
{{org.antlr.runtime.DFA.predict}} on the stack are evidence of that. Each call 
to predict will be followed by a call to actually parse, so each call to 
predict doubles the running time. There are 12 calls, which would suggest a 
4096x slowdown.

I don't know whether an upgrade to antlr v4 is possible or planned. [A post on 
stackoverflow|https://stackoverflow.com/questions/17054285/is-it-possible-to-lookahead-in-antlr4-without-actually-matching-a-token]
 suggests that "=>" (the lookahead operator) is no longer necessary on antlr 
v4; antlr verifies lookahead as it parses. If true, that 2 ^ 12 number above 
would be become 1 ^ 12, a much nicer number!

> HiveServer2 parsing a command with a lot of "("
> ---
>
> Key: HIVE-17395
> URL: https://issues.apache.org/jira/browse/HIVE-17395
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 2.3.0
>Reporter: dan young
>Priority: Major
>
> Hello,
> We're seeing what appears to be the same issue that was outlined in 
> HIVE-15388 where the query parser spends a lot of time (never returns and I 
> need to kill the beeline process) parsing a command with a lot of "(" .   I 
> tried this in both 2.2 and now 2.3.
> Here's an example query (this is auto generated SQL BTW) in beeline that 
> never completes/parses, I end up just killing the beeline process.
> It looks like something similar was addressed as part of HIVE-15388.   Any 
> ideas on how to address this?  write better SQL? patch?
> Regards,
> Dano
> {noformat}
> Connected to: Apache Hive (version 2.3.0)
> Driver: Hive JDBC (version 2.3.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.3.0 by Apache Hive
> 0: jdbc:hive2://localhost:1/test_db> SELECT 
> ((UNIX_TIMESTAMP(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))), 
> -3),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS 

[jira] [Commented] (HIVE-17395) HiveServer2 parsing a command with a lot of "("

2017-09-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160987#comment-16160987
 ] 

Zoltan Haindrich commented on HIVE-17395:
-

It seems to me that all the relevant fixes which are related to '(' parsing 
were already present at 2.3 release..

the problematic rule seems to be atomexpression; the parser spends quite a lot 
of time trying to judge whenever its a subquery/iterval or not...
https://github.com/apache/hive/blob/92f9d8fb4de7896cc9161bfbd461fac4f2917a35/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L412



> HiveServer2 parsing a command with a lot of "("
> ---
>
> Key: HIVE-17395
> URL: https://issues.apache.org/jira/browse/HIVE-17395
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 2.3.0
>Reporter: dan young
>
> Hello,
> We're seeing what appears to be the same issue that was outlined in 
> HIVE-15388 where the query parser spends a lot of time (never returns and I 
> need to kill the beeline process) parsing a command with a lot of "(" .   I 
> tried this in both 2.2 and now 2.3.
> Here's an example query (this is auto generated SQL BTW) in beeline that 
> never completes/parses, I end up just killing the beeline process.
> It looks like something similar was addressed as part of HIVE-15388.   Any 
> ideas on how to address this?  write better SQL? patch?
> Regards,
> Dano
> Connected to: Apache Hive (version 2.3.0)
> Driver: Hive JDBC (version 2.3.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.3.0 by Apache Hive
> 0: jdbc:hive2://localhost:1/test_db> SELECT 
> ((UNIX_TIMESTAMP(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))), 
> -3),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))),11));
> When I did a jstack on the HiveServer2, it appears the be stuck/running in 
> the HiveParser/antlr.
> "e62658bd-5ea9-43c4-898f-3048d913f192 HiveServer2-Handler-Pool: Thread-96" 
> #96 prio=5 os_prio=0 tid=0x7fb78c366000 nid=0x4476 runnable 
> [0x7fb77d7bb000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31502)
>   at org.antlr.runtime.DFA.predict(DFA.java:80)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746)
>   at 
>