[jira] [Commented] (IMPALA-4551) Set limits on size of expression trees

2019-08-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907119#comment-16907119
 ] 

ASF subversion and git services commented on IMPALA-4551:
---------------------------------------------------------

Commit 1908e44c3c9faac8c7bf09422ca4c5ec598ffd58 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1908e44 ]

IMPALA-4551: Limit the size of SQL statements

Various BI tools generate and run SQL. When used incorrectly or
misconfigured, these tools can generate extremely large SQL
statements, some reaching tens of megabytes. Large SQL
statements impose costs throughout execution, including
statement rewrite logic in the frontend and codegen in the
backend. The resource usage of these statements can affect
the stability of the system or the ability to run other SQL
statements.

This implements two new query options that provide controls
to reject large SQL statements.
 - The first, MAX_STATEMENT_LENGTH_BYTES, is a cap on the
   total size of the SQL statement (in bytes). It is
   applied before any parsing or analysis. Its default
   value is 16MB.
 - The second, STATEMENT_EXPRESSION_LIMIT, is a limit on
   the total number of expressions in a statement and any
   views that it references. The limit is applied during the
   first round of analysis, but it is not reapplied when
   statement rewrite rules are applied. Certain expressions,
   such as literals in IN lists or VALUES clauses, are not
   analyzed and do not count towards the limit. Its default
   value is 250,000.
The two are complementary. Since enforcing the statement
expression limit requires parsing and analyzing the
statement, MAX_STATEMENT_LENGTH_BYTES sets an upper
bound on the size of any statement that needs to be parsed
and analyzed. Testing confirms that even statements
approaching 16MB get through the first round of analysis
within a few seconds and are then rejected.
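
The ordering of the two checks can be sketched as follows. This is only an
illustration of the idea described above, not Impala's implementation: the
names StatementRejected, check_statement_length, check_expression_count and
admit are hypothetical, the analyze callback stands in for parsing plus the
first round of analysis, 16MB is assumed to mean 16 * 1024 * 1024 bytes, and
both limits are assumed to reject only values strictly above the limit.

```python
from typing import Callable

# Defaults quoted in the commit message. The exact byte value of "16MB"
# is an assumption of this sketch.
DEFAULT_MAX_STATEMENT_LENGTH_BYTES = 16 * 1024 * 1024
DEFAULT_STATEMENT_EXPRESSION_LIMIT = 250_000


class StatementRejected(Exception):
    """Raised when a statement exceeds one of the configured limits."""


def check_statement_length(sql: str, max_bytes: int) -> None:
    # Cheap check on the raw text, done before any parsing or analysis.
    size = len(sql.encode("utf-8"))
    if size > max_bytes:
        raise StatementRejected(
            f"statement is {size} bytes, limit is {max_bytes} bytes")


def check_expression_count(num_exprs: int, limit: int) -> None:
    # Applied once, to the count produced by the first round of analysis.
    if num_exprs > limit:
        raise StatementRejected(
            f"statement has {num_exprs} expressions, limit is {limit}")


def admit(sql: str,
          analyze: Callable[[str], int],
          max_bytes: int = DEFAULT_MAX_STATEMENT_LENGTH_BYTES,
          expr_limit: int = DEFAULT_STATEMENT_EXPRESSION_LIMIT) -> int:
    """Run the length check first, then analysis, then the expression check.

    `analyze` is a hypothetical callback that parses and analyzes the
    statement and returns the number of expressions it counted.
    """
    check_statement_length(sql, max_bytes)
    num_exprs = analyze(sql)
    check_expression_count(num_exprs, expr_limit)
    return num_exprs
```

Because the length check runs first, the analyze callback never sees a
statement larger than max_bytes, which is what bounds the cost of
enforcing the expression limit.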

This also changes the logging in tests/common/impala_connection.py
to limit the total SQL size that it will print to 128KB. This
prevents the JUnitXML (which includes this logging) from being too
large. Existing tests do not run SQL larger than about 80KB, so
this only applies to tests added in this change that run multi-MB
SQL statements to verify limits.
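
A minimal sketch of this kind of log truncation is shown below. It is not
the code in tests/common/impala_connection.py: the constant and function
names are hypothetical, the truncation marker is invented for illustration,
and the size is measured in characters rather than bytes for simplicity.

```python
# 128KB, the cap described in the commit message. The name is hypothetical.
MAX_SQL_LOGGING_LENGTH = 128 * 1024


def truncate_sql_for_logging(sql: str,
                             max_len: int = MAX_SQL_LOGGING_LENGTH) -> str:
    """Return `sql` unchanged if it is short, else a truncated copy.

    Only the logged text is shortened; the caller still sends the full
    statement to the server.
    """
    if len(sql) <= max_len:
        return sql
    omitted = len(sql) - max_len
    return f"{sql[:max_len]}... [truncated {omitted} characters]"
```

With the default cap, an 80KB statement would be logged in full, while a
multi-MB statement would be logged as its first 128KB plus a short marker.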

Testing:
 - This adds frontend tests that verify the low-level
   semantics of how expressions are counted and verify
   that the expression limits are enforced.
 - This adds end-to-end tests that verify both
   MAX_STATEMENT_LENGTH_BYTES and STATEMENT_EXPRESSION_LIMIT
   at their default values.
 - There is also an end-to-end test, run only in exhaustive
   mode, that runs a SQL statement with close to 250,000
   expressions.
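
One way to build inputs for a length-limit test is to pad a string literal
until the statement reaches an exact byte size. The helper below is a
hypothetical sketch and is not taken from the tests added in this change;
it uses only ASCII characters so that the character count equals the byte
count.

```python
def make_statement_of_length(target_bytes: int) -> str:
    """Build a `select '<padding>'` statement that is exactly
    `target_bytes` bytes long when encoded as UTF-8."""
    prefix, suffix = "select '", "'"
    padding = target_bytes - len(prefix) - len(suffix)
    if padding < 0:
        raise ValueError(
            f"target_bytes must be at least {len(prefix) + len(suffix)}")
    # ASCII-only padding keeps one character equal to one byte.
    return prefix + "a" * padding + suffix


# Example: one statement exactly at a 16MB limit and one a byte over it.
limit = 16 * 1024 * 1024
at_limit = make_statement_of_length(limit)
over_limit = make_statement_of_length(limit + 1)
```

A test could then check whether each statement is accepted or rejected
under the configured MAX_STATEMENT_LENGTH_BYTES.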

Change-Id: I5675fb4a08c1dc51ae5bcf467cbb969cc064602c
Reviewed-on: http://gerrit.cloudera.org:8080/14012
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 


> Set limits on size of expression trees
> --------------------------------------
>
> Key: IMPALA-4551
> URL: https://issues.apache.org/jira/browse/IMPALA-4551
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.8.0
> Reporter: Tim Armstrong
> Assignee: Joe McDonnell
> Priority: Major
> Attachments: huge_case.patch
>
>
> Very large expression trees can cause havoc in various Impala components. I 
> have been experimenting with the attached test, which generates large CASE 
> statements of varying depths and widths, and have been able to hit limits in 
> the frontend (Java OOM) and cause various runaway memory usage problems in 
> the backend (Thrift structures, LLVM IR, codegen, etc.).
> We should set some kind of limit here, either on the number of nodes in the 
> expression trees or on the size of the query text, and then make sure that 
> we can execute queries of the maximum size end-to-end.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-4551) Set limits on size of expression trees

2019-07-24 Thread Joe McDonnell (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892293#comment-16892293
 ] 

Joe McDonnell commented on IMPALA-4551:
---------------------------------------

Taking a look at this, so I'll assign it to myself.

Applying a maximum total number of expressions is relatively simple. When I 
apply a limit of 100,000 expressions, the test case fails quickly. After I 
reduce the fanout on the provided test case so that the total stays below 
100,000 expressions, it finishes in about 10-15 minutes. It spends most of 
its time in codegen.
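
To see how fanout and depth drive the total, here is a rough sketch that
counts the nodes of a uniform tree and finds the largest fanout that stays
under a limit. It assumes every internal node has the same number of
children, which is a simplification: the attached test's actual CASE
structure, and the way Impala counts expressions, may differ. The function
names are hypothetical.

```python
def tree_node_count(fanout: int, depth: int) -> int:
    """Nodes in a complete tree where each internal node has `fanout`
    children. Depth 0 is a single root node."""
    return sum(fanout ** level for level in range(depth + 1))


def max_fanout_under(limit: int, depth: int) -> int:
    """Largest fanout whose complete tree of the given depth has fewer
    than `limit` nodes."""
    if tree_node_count(1, depth) >= limit:
        raise ValueError("even a fanout of 1 reaches the limit")
    fanout = 1
    while tree_node_count(fanout + 1, depth) < limit:
        fanout += 1
    return fanout


print(tree_node_count(10, 5))        # 111111, over a 100,000 limit
print(max_fanout_under(100_000, 5))  # 9, which gives 66430 nodes
```

Under these assumptions, a depth-5 tree crosses 100,000 nodes when the
fanout goes from 9 to 10, so small changes in fanout decide whether a
statement is rejected.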

An alternative is to limit the actual query text size along with the text size 
of the views referenced. Either way, the limit will be configurable and we'll 
need to think about an appropriate default.



