[jira] [Resolved] (IMPALA-10821) TestTPCHJoinQueries.test_outer_joins failed in s3 build

2021-08-09 Thread Yida Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yida Wu resolved IMPALA-10821.
--
Resolution: Fixed

> TestTPCHJoinQueries.test_outer_joins failed in s3 build
> ---
>
> Key: IMPALA-10821
> URL: https://issues.apache.org/jira/browse/IMPALA-10821
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.1.0
>Reporter: Wenzhe Zhou
>Assignee: Yida Wu
>Priority: Major
>  Labels: broken-build
>
> The unit test TestTPCHJoinQueries.test_outer_joins failed in the following build:
> [https://master-03.jenkins.cloudera.com/job/impala-asf-master-core-s3/63/]
>  
> The failing test case was recently added by this patch: 
> [https://gerrit.cloudera.org/#/c/17610/]
> Stacktrace
> query_test/test_join_queries.py:155: in test_outer_joins
>  self.run_test_case('tpch-outer-joins', new_vector)
> common/impala_test_suite.py:709: in run_test_case
>  self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:545: in __verify_results_and_errors
>  replace_filenames_with_placeholder)
> common/test_result_verifier.py:469: in verify_raw_results
>  VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:246: in verify_query_result_is_subset
>  assert expected_literal_strings <= actual_literal_strings
> E assert Items in expected results not found in actual results:
> E '| 00:SCAN HDFS [default.t1 b]'
> E '01:SCAN HDFS [default.t2 a]'
> E Items in actual results:
> E '05:EXCHANGE [UNPARTITIONED]'
> E '|--04:EXCHANGE [HASH(b.`INSERT`,b.`insert`)]'
> E '| |'
> E ' runtime filters: RF000 -> a.`SELECT`, RF001 -> a.`select`'
> E 'PLAN-ROOT SINK'
> E ''
> E '02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]'
> E 'default.t1, default.t2'
> E '| row-size=8B cardinality=37'
> E '03:EXCHANGE [HASH(a.`SELECT`,a.`select`)]'
> E '|'
> E '| row-size=16B cardinality=37'
> E ' row-size=8B cardinality=78.25K'
> E '| hash predicates: a.`SELECT` = b.`INSERT`, a.`select` = b.`insert`'
> E 'WARNING: The following tables are missing relevant table and/or column 
> statistics.'
> E '| runtime filters: RF000 <- b.`INSERT`, RF001 <- b.`insert`'
> E '| S3 partitions=1/1 files=1 size=292B'
> E '| 00:SCAN S3 [default.t1 b]'
> E 'Per-Host Resource Estimates: Memory=75MB'
> E '01:SCAN S3 [default.t2 a]'
> E ' S3 partitions=1/1 files=1 size=611.34KB'
> E 'Max Per-Host Resource Reservation: Memory=10.95MB Threads=6'
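The failing assertion is a plain set-subset check: every expected plan line must appear verbatim among the actual plan lines. The only expected lines that are missing are the two scan nodes, which the expected output prints as `SCAN HDFS` while the S3 build prints `SCAN S3`. The sketch below reproduces that comparison and shows one hypothetical way to make it filesystem-agnostic by normalizing scan-node labels before comparing. `normalize_scan_node`, `missing_lines` and the list of filesystem labels are illustrative assumptions, not functions from Impala's test framework, and the issue does not say how the test was actually fixed.

```python
import re

# Hypothetical set of filesystem labels that may follow "SCAN" in a plan line.
_SCAN_NODE = re.compile(r"\bSCAN (HDFS|S3|ADLS|ABFS)\b")


def normalize_scan_node(line):
    """Replace the filesystem-specific scan label with a neutral token."""
    return _SCAN_NODE.sub("SCAN <FS>", line)


def missing_lines(expected, actual, normalize=False):
    """Return expected lines absent from actual, mirroring the
    `expected_literal_strings <= actual_literal_strings` subset check."""
    exp, act = set(expected), set(actual)
    if normalize:
        exp = {normalize_scan_node(line) for line in exp}
        act = {normalize_scan_node(line) for line in act}
    return exp - act


expected = ["| 00:SCAN HDFS [default.t1 b]", "01:SCAN HDFS [default.t2 a]"]
actual = [
    "PLAN-ROOT SINK",
    "| 00:SCAN S3 [default.t1 b]",
    "01:SCAN S3 [default.t2 a]",
]

print(sorted(missing_lines(expected, actual)))
print(missing_lines(expected, actual, normalize=True))
```

Without normalization both scan lines are reported as missing, matching the failure above; with normalization nothing is missing.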



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7954) Support automatic invalidates using metastore notification events

2021-08-09 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-7954.
-
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Support automatic invalidates using metastore notification events
> -
>
> Key: IMPALA-7954
> URL: https://issues.apache.org/jira/browse/IMPALA-7954
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1.0
>
> Attachments: Automatic_invalidate_DesignDoc_v1.pdf, 
> Impala_Catalogd_Auto_Metadata_Update_v2.pdf
>
>
> Currently, Impala offers multiple ways to invalidate or refresh the table
> metadata stored in the Catalog. Catalog objects can be invalidated either
> with a usage-based approach (invalidate_tables_timeout_s) or under GC
> pressure (invalidate_tables_on_memory_pressure), as added in IMPALA-7448.
> However, most users issue invalidate commands when they want to sync with the
> latest information from HDFS or HMS. Unfortunately, when data is modified or
> new data is added outside Impala (e.g. by Hive) or by a different Impala
> cluster, users don't have a clear idea of whether they need to issue an
> invalidate. To be on the safe side, users issue invalidate commands more often
> than necessary, which causes both performance and stability issues.
> Hive Metastore provides a simple API to get incremental updates to the
> metadata stored in its database. Each API call that performs an
> add/alter/drop operation in the metastore generates one or more events, which
> can be fetched using the {{get_next_notification}} API. Each event has a
> unique, increasing event_id. The current notification event id can be fetched
> using the {{get_current_notificationEventId}} API.
> This JIRA proposes using these metastore events to proactively invalidate or
> refresh information in catalogd. When configured, catalogd could poll for
> such events and act on them (e.g. add/drop/refresh partitions and
> add/drop/invalidate tables and databases). This way the catalogd state is
> refreshed automatically, which would greatly help use cases where users want
> to see the latest information (within a configurable delay) without flooding
> the system with invalidate requests.
> I will be attaching a design doc to this JIRA and creating subtasks for the
> work. Feel free to comment on the JIRA or suggest improvements to the
> design.
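The polling model described above can be sketched as follows. `FakeMetastore` is a hypothetical in-memory stand-in for the HMS notification API (the real calls are the Thrift methods named in the description), and the event-type-to-action table is illustrative only, not catalogd's actual mapping. The key idea is that catalogd remembers the last event id it has applied and only asks for newer events.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NotificationEvent:
    event_id: int    # unique and increasing, as in HMS
    event_type: str  # e.g. "ADD_PARTITION"
    db_name: str
    table_name: str


class FakeMetastore:
    """Hypothetical in-memory stand-in for the HMS notification API."""

    def __init__(self, events: List[NotificationEvent]):
        self._events = sorted(events, key=lambda e: e.event_id)

    def current_event_id(self) -> int:
        # Plays the role of get_current_notificationEventId.
        return self._events[-1].event_id if self._events else 0

    def next_notifications(self, last_event_id: int, max_events: int):
        # Plays the role of get_next_notification: events newer than last_event_id.
        newer = [e for e in self._events if e.event_id > last_event_id]
        return newer[:max_events]


# Illustrative mapping from event type to the catalog action it would trigger.
ACTIONS = {
    "CREATE_TABLE": "add table",
    "DROP_TABLE": "drop table",
    "ALTER_TABLE": "invalidate table",
    "ADD_PARTITION": "add partition",
    "DROP_PARTITION": "drop partition",
    "ALTER_PARTITION": "refresh partition",
}


def poll_once(hms: FakeMetastore, last_synced_id: int,
              batch_size: int = 100) -> Tuple[int, List[Tuple[int, str, str]]]:
    """Apply every event newer than last_synced_id; return the new id and actions."""
    applied = []
    target = hms.current_event_id()
    while last_synced_id < target:
        batch = hms.next_notifications(last_synced_id, batch_size)
        if not batch:
            break
        for event in batch:
            action = ACTIONS.get(event.event_type, "ignore")
            applied.append((event.event_id, action,
                            f"{event.db_name}.{event.table_name}"))
            last_synced_id = event.event_id  # only advance after applying
    return last_synced_id, applied


hms = FakeMetastore([
    NotificationEvent(1, "CREATE_TABLE", "sales", "orders"),
    NotificationEvent(2, "ADD_PARTITION", "sales", "orders"),
    NotificationEvent(3, "ALTER_PARTITION", "sales", "orders"),
])
last_id, actions = poll_once(hms, last_synced_id=0, batch_size=2)
for action in actions:
    print(action)
```

Polling again with the returned id does nothing until HMS produces a newer event, which is what lets catalogd stay current without repeated full invalidates.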



--
This message was sent by Atlassian Jira
(v8.3.4#803005)



[jira] [Updated] (IMPALA-10273) Support function events

2021-08-09 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-10273:
-
Parent: (was: IMPALA-7954)
Issue Type: Improvement  (was: Sub-task)

> Support function events
> ---
>
> Key: IMPALA-10273
> URL: https://issues.apache.org/jira/browse/IMPALA-10273
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> HMS creates ADD_FUNCTION, ALTER_FUNCTION and DROP_FUNCTION events when a 
> function is created, altered or dropped. We can use these events to refresh 
> the functions in catalogd via the events processor.
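A minimal sketch of how an events processor might apply the three function events to an in-memory function cache. The event type names are the ones given in this issue; the `FunctionRegistry` class and the event payload shape (database, function name, definition) are hypothetical.

```python
from typing import Dict, Optional


class FunctionRegistry:
    """Hypothetical per-database function cache kept by catalogd."""

    def __init__(self):
        self._functions: Dict[str, Dict[str, Optional[str]]] = {}

    def get(self, db: str, name: str) -> Optional[str]:
        return self._functions.get(db, {}).get(name)

    def apply_event(self, event_type: str, db: str, name: str,
                    definition: Optional[str] = None) -> None:
        fns = self._functions.setdefault(db, {})
        if event_type in ("ADD_FUNCTION", "ALTER_FUNCTION"):
            # Both create and alter simply (re)load the latest definition.
            fns[name] = definition
        elif event_type == "DROP_FUNCTION":
            fns.pop(name, None)  # dropping an unknown function is a no-op
        else:
            raise ValueError(f"not a function event: {event_type}")


reg = FunctionRegistry()
reg.apply_event("ADD_FUNCTION", "db1", "my_udf", "v1")
reg.apply_event("ALTER_FUNCTION", "db1", "my_udf", "v2")
print(reg.get("db1", "my_udf"))
reg.apply_event("DROP_FUNCTION", "db1", "my_udf")
print(reg.get("db1", "my_udf"))
```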



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (IMPALA-9857) Batch ALTER_PARTITION events

2021-08-09 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-9857:

Parent: (was: IMPALA-7954)
Issue Type: Improvement  (was: Sub-task)

> Batch ALTER_PARTITION events
> 
>
> Key: IMPALA-9857
> URL: https://issues.apache.org/jira/browse/IMPALA-9857
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When Hive inserts data into partitioned tables, it generates many 
> ALTER_PARTITION events (and possibly INSERT_EVENT events) in quick succession. 
> Currently, EventsProcessor processes such events one by one, which can be 
> slow and can cause EventsProcessor to lag behind. This JIRA proposes batching 
> successive ALTER_PARTITION events for the same table into a single 
> ALTER_PARTITIONS event, so that all the partitions from those events are 
> refreshed together. This can significantly speed up event processing in 
> such cases.
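The batching idea can be sketched as a single pass over the event stream that merges runs of consecutive ALTER_PARTITION events for the same table. The `ALTER_PARTITIONS` label for a merged batch follows the wording of this issue; the `Event` tuple, the function name and the data layout are hypothetical and not the EventsProcessor implementation. Only consecutive events are merged, so an intervening event on the same table (here an ALTER_TABLE) still ends the batch and ordering is preserved.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    event_id: int
    event_type: str
    table: str      # fully qualified, e.g. "db.tbl"
    partition: str  # partition spec, empty for table-level events


def batch_alter_partitions(events: List[Event]) -> List[Tuple[str, str, List[str]]]:
    """Merge successive ALTER_PARTITION events on the same table.

    Returns (event_type, table, partitions) tuples; a merged run is reported
    as ALTER_PARTITIONS so its partitions can be refreshed in one go.
    """
    batches: List[Tuple[str, str, List[str]]] = []
    for e in events:
        last = batches[-1] if batches else None
        if (e.event_type == "ALTER_PARTITION" and last is not None
                and last[0] in ("ALTER_PARTITION", "ALTER_PARTITIONS")
                and last[1] == e.table):
            # Extend the current run instead of scheduling another refresh.
            batches[-1] = ("ALTER_PARTITIONS", e.table, last[2] + [e.partition])
        else:
            batches.append((e.event_type, e.table,
                            [e.partition] if e.partition else []))
    return batches


events = [
    Event(1, "ALTER_PARTITION", "db.t", "p=1"),
    Event(2, "ALTER_PARTITION", "db.t", "p=2"),
    Event(3, "ALTER_PARTITION", "db.t", "p=3"),
    Event(4, "ALTER_PARTITION", "db.u", "p=1"),
    Event(5, "ALTER_TABLE", "db.t", ""),
    Event(6, "ALTER_PARTITION", "db.t", "p=4"),
]
for batch in batch_alter_partitions(events):
    print(batch)
```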



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (IMPALA-8592) Add support for insert events for 'LOAD DATA..' statements from Impala.

2021-08-09 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-8592:

Parent: (was: IMPALA-7954)
Issue Type: Improvement  (was: Sub-task)

> Add support for insert events for 'LOAD DATA..' statements from Impala.
> ---
>
> Key: IMPALA-8592
> URL: https://issues.apache.org/jira/browse/IMPALA-8592
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Anurag Mantripragada
>Priority: Major
>
> Hive generates INSERT events for LOAD DATA.. statements. We should support 
> the same in Impala.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Resolved] (IMPALA-8795) Enable event polling by default in tests

2021-08-09 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-8795.
-
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Enable event polling by default in tests
> 
>
> Key: IMPALA-8795
> URL: https://issues.apache.org/jira/browse/IMPALA-8795
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> We should turn on event processing by default in all tests to make sure 
> that there are no regressions when we enable the feature by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)





[jira] [Created] (IMPALA-10849) A LIKE predicate that ends in an escaped wildcard is incorrectly evaluated

2021-08-09 Thread Andrew Sherman (Jira)
Andrew Sherman created IMPALA-10849:
---

 Summary: A LIKE predicate that ends in an escaped wildcard is 
incorrectly evaluated
 Key: IMPALA-10849
 URL: https://issues.apache.org/jira/browse/IMPALA-10849
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Andrew Sherman
Assignee: Andrew Sherman


If the last character of a LIKE pattern is an escaped wildcard (e.g. LIKE 
'foo\%'), the predicate is evaluated incorrectly. This is because the fast-path 
optimizations in LikePrepareInternal treat the predicate as a search for a 
string with a fixed prefix. If the fast-path optimizations are commented out, 
the LIKE predicate is evaluated correctly.

A possible fix would be to make the fast-path optimizations recognize that 
escaped wildcards cannot be evaluated by the fixed-prefix search.

This is a simpler bug than the one discussed in IMPALA-2422, which concerns 
ambiguities in the logic for unescaping string literals (and is trickier to 
fix).
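The failure mode can be reproduced with a small model of a LIKE fast path. This is hypothetical Python, not Impala's LikePrepareInternal, and it assumes backslash is the escape character. `naive_prefix_fast_path` treats any pattern ending in `%` with no other wildcards as a prefix search, so for `foo\%` it searches for the prefix `foo\` and fails to match the literal string `foo%`. `fixed_prefix_fast_path` follows the fix suggested above in a conservative form: it declines the fast path whenever the prefix contains the escape character, and falls back to a general matcher (here, a translation of the pattern to a Python regex).

```python
import re


def like_to_regex(pattern: str, escape: str = "\\") -> str:
    """Slow path: translate a LIKE pattern into an equivalent regex."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == escape and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is a literal
            i += 2
            continue
        if c == "%":
            out.append(".*")
        elif c == "_":
            out.append(".")
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def general_like(value: str, pattern: str) -> bool:
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


def naive_prefix_fast_path(value: str, pattern: str) -> bool:
    # Buggy model: "<literal>%" becomes a prefix search without checking
    # whether the trailing % is escaped.
    body = pattern[:-1]
    if pattern.endswith("%") and not any(ch in body for ch in "%_"):
        return value.startswith(body)
    return general_like(value, pattern)


def fixed_prefix_fast_path(value: str, pattern: str) -> bool:
    # Only take the fast path when the prefix has no wildcards and no escape
    # character, which also rules out an escaped trailing wildcard.
    body = pattern[:-1]
    if pattern.endswith("%") and not any(ch in body for ch in "%_\\"):
        return value.startswith(body)
    return general_like(value, pattern)


pattern = "foo\\%"  # the LIKE pattern text foo\%
for value in ["foo%", "foobar", "foo\\bar"]:
    print(value, naive_prefix_fast_path(value, pattern),
          fixed_prefix_fast_path(value, pattern))
```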



--
This message was sent by Atlassian Jira
(v8.3.4#803005)



[jira] [Assigned] (IMPALA-10848) Provide compile-only option to skip downloading test dependencies

2021-08-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-10848:
---

Assignee: yx91490

> Provide compile-only option to skip downloading test dependencies
> -
>
> Key: IMPALA-10848
> URL: https://issues.apache.org/jira/browse/IMPALA-10848
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: yx91490
>Priority: Major
> Attachments: pywebhdfs_failure.png
>
>
> Compiling Impala is not easy for a beginner. Some of the failures occur while 
> downloading/installing dependencies.
> For instance, old versions of Impala may fail to compile because the CDH 
> components of old GBNs have been removed from S3. However, the CDH component 
> artifacts are only used in testing (the minicluster and its test data). We 
> can still compile without them.
> Take pip dependencies as another example. Here is a failure I got from a 
> community user; it failed while installing pywebhdfs:
> !pywebhdfs_failure.png!
> However, a simple git grep shows that pywebhdfs is only used in tests:
> {code:bash}
> $ git grep pywebhdfs
> bin/bootstrap_system.sh:#  >>> from pywebhdfs.webhdfs import PyWebHdfsClient
> infra/python/deps/requirements.txt:pywebhdfs == 0.3.2
> tests/common/impala_test_suite.py:    #     HDFS: uses a mixture of pywebhdfs (which is faster than the HDFS CLI) and the
> tests/util/hdfs_util.py:from pywebhdfs.webhdfs import PyWebHdfsClient, errors, _raise_pywebhdfs_exception
> tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
> tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
> tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
> tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
> {code}
> If users just want to compile Impala and deploy it in their existing 
> Hadoop cluster, dealing with these failures is a waste of their time.
> *Target for this JIRA*
>  * Provide a compile-only option to bin/bootstrap_system.sh. It should skip 
> downloading/installing unused dependencies like postgresql.
>  * Provide a compile-only option to buildall.sh. It should skip downloading 
> unused cdh/cdp components during compilation.
>  * Update our 
> [wiki|https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala] 
> about this.
> Note that we already have some env vars to control download behavior, 
> e.g. SKIP_PYTHON_DOWNLOAD, SKIP_TOOLCHAIN_BOOTSTRAP. We just need to make the 
> compile-only scenario work with minimal requirements and document it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (IMPALA-10848) Provide compile-only option to skip downloading test dependencies

2021-08-09 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-10848:
---

 Summary: Provide compile-only option to skip downloading test 
dependencies
 Key: IMPALA-10848
 URL: https://issues.apache.org/jira/browse/IMPALA-10848
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Reporter: Quanlong Huang
 Attachments: pywebhdfs_failure.png

Compiling Impala is not easy for a beginner. Some of the failures occur while 
downloading/installing dependencies.

For instance, old versions of Impala may fail to compile because the CDH 
components of old GBNs have been removed from S3. However, the CDH component 
artifacts are only used in testing (the minicluster and its test data). We can 
still compile without them.

Take pip dependencies as another example. Here is a failure I got from a 
community user; it failed while installing pywebhdfs:

!pywebhdfs_failure.png!

However, a simple git grep shows that pywebhdfs is only used in tests:
{code:bash}
$ git grep pywebhdfs
bin/bootstrap_system.sh:#  >>> from pywebhdfs.webhdfs import PyWebHdfsClient
infra/python/deps/requirements.txt:pywebhdfs == 0.3.2
tests/common/impala_test_suite.py:    #     HDFS: uses a mixture of pywebhdfs (which is faster than the HDFS CLI) and the
tests/util/hdfs_util.py:from pywebhdfs.webhdfs import PyWebHdfsClient, errors, _raise_pywebhdfs_exception
tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, response.text)
{code}
If users just want to compile Impala and deploy it in their existing Hadoop 
cluster, dealing with these failures is a waste of their time.

*Target for this JIRA*
 * Provide a compile-only option to bin/bootstrap_system.sh. It should skip 
downloading/installing unused dependencies like postgresql.
 * Provide a compile-only option to buildall.sh. It should skip downloading 
unused cdh/cdp components during compilation.
 * Update our 
[wiki|https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala] about 
this.

Note that we already have some env vars to control download behavior, e.g. 
SKIP_PYTHON_DOWNLOAD, SKIP_TOOLCHAIN_BOOTSTRAP. We just need to make the 
compile-only scenario work with minimal requirements and document it.
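The git grep reasoning above can be automated. The sketch below is a hypothetical helper, not part of Impala's tooling: it takes `git grep` output lines (`path:content`) and reports whether every real use of a package lives under `tests/`, ignoring the requirements file that merely pins it and lines that are only comments. A compile-only bootstrap path could, for example, use a check like this to decide which pip packages are safe to skip; the rules here are assumptions and deliberately simple.

```python
from typing import Iterable, List


def non_test_uses(grep_lines: Iterable[str]) -> List[str]:
    """Return grep hits that look like real uses outside tests/."""
    hits = []
    for line in grep_lines:
        path, _, content = line.partition(":")
        if path.startswith("tests/"):
            continue  # test code is allowed to depend on the package
        if path.endswith("requirements.txt"):
            continue  # a version pin, not a use
        if content.strip().startswith("#"):
            continue  # a comment in a shell or Python file
        hits.append(line)
    return hits


def is_test_only(grep_lines: Iterable[str]) -> bool:
    return not non_test_uses(grep_lines)


# Output of `git grep pywebhdfs` as quoted in this issue.
grep_output = [
    "bin/bootstrap_system.sh:#  >>> from pywebhdfs.webhdfs import PyWebHdfsClient",
    "infra/python/deps/requirements.txt:pywebhdfs == 0.3.2",
    "tests/common/impala_test_suite.py:    #     HDFS: uses a mixture of pywebhdfs "
    "(which is faster than the HDFS CLI) and the",
    "tests/util/hdfs_util.py:from pywebhdfs.webhdfs import PyWebHdfsClient, errors, "
    "_raise_pywebhdfs_exception",
    "tests/util/hdfs_util.py:      _raise_pywebhdfs_exception(response.status_code, "
    "response.text)",
]
print(is_test_only(grep_output))
```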



--
This message was sent by Atlassian Jira
(v8.3.4#803005)





[jira] [Assigned] (IMPALA-7507) Clean up user-facing error messages in LocalCatalog mode

2021-08-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-7507:
--

Assignee: Quanlong Huang

> Clean up user-facing error messages in LocalCatalog mode
> 
>
> Key: IMPALA-7507
> URL: https://issues.apache.org/jira/browse/IMPALA-7507
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: Todd Lipcon
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: catalog-v2
>
> Currently, even routine error messages, such as those for a missing database, 
> are quite ugly when running with LocalCatalog:
> {code}
> ERROR: LocalCatalogException: Could not load table names for database 
> 'test_minimal_topic_updates_b246004e' from HMS
> CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[CatalogException: Database not found: 
> test_minimal_topic_updates_b246004e]))
> {code}
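One way to produce a cleaner message is to surface only the innermost error text. The sketch below is a hypothetical Python helper, not Impala code (the Impala frontend is written in Java), and it assumes the wrapped Thrift status is rendered as in the example above: it pulls the contents of `error_msgs:[...]` out of the string and strips the leading exception-class name.

```python
import re

_ERROR_MSGS = re.compile(r"error_msgs:\[([^\]]*)\]")
_EXC_PREFIX = re.compile(r"^\w+Exception:\s*")


def user_facing_message(raw: str) -> str:
    """Return the innermost error text from a wrapped catalog error, if present."""
    match = _ERROR_MSGS.search(raw)
    if match is None:
        return raw.strip()  # nothing to unwrap; show the message as-is
    return _EXC_PREFIX.sub("", match.group(1).strip())


raw = (
    "LocalCatalogException: Could not load table names for database "
    "'test_minimal_topic_updates_b246004e' from HMS "
    "CAUSED BY: TException: "
    "TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, "
    "error_msgs:[CatalogException: Database not found: "
    "test_minimal_topic_updates_b246004e]))"
)
print(user_facing_message(raw))
```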



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Assigned] (IMPALA-7535) CatalogdMetaProvider should fetch incremental stats data on-demand

2021-08-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-7535:
--

Assignee: Quanlong Huang

> CatalogdMetaProvider should fetch incremental stats data on-demand
> --
>
> Key: IMPALA-7535
> URL: https://issues.apache.org/jira/browse/IMPALA-7535
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog
>Reporter: Todd Lipcon
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: catalog-v2
>
> Currently when CatalogdMetaProvider fetches and caches a partition, it 
> includes incremental stats for that partition. We should fix it to only fetch 
> them when necessary for COMPUTE STATS statements, and either cache them 
> separately or not at all.
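A minimal sketch of the on-demand approach. `Partition` and the `fetch_incremental_stats` callback are hypothetical names, not CatalogdMetaProvider's API. The partition is cached without its incremental stats, which are fetched only when a caller (such as COMPUTE STATS) asks for them; this sketch then caches them separately on the partition, which is one of the two options the issue mentions.

```python
from typing import Callable, Dict, Optional


class Partition:
    """Cached partition metadata that omits incremental stats by default."""

    def __init__(self, name: str,
                 fetch_incremental_stats: Callable[[str], Dict[str, int]]):
        self.name = name
        self._fetch = fetch_incremental_stats
        self._incremental_stats: Optional[Dict[str, int]] = None

    def incremental_stats(self) -> Dict[str, int]:
        # Fetched only on first use, e.g. when COMPUTE STATS needs them.
        if self._incremental_stats is None:
            self._incremental_stats = self._fetch(self.name)
        return self._incremental_stats


fetch_calls = []


def fetch_from_catalogd(partition_name: str) -> Dict[str, int]:
    # Hypothetical stand-in for a remote fetch from catalogd.
    fetch_calls.append(partition_name)
    return {"num_rows": 10}


p = Partition("year=2021", fetch_from_catalogd)
print(fetch_calls)             # nothing fetched when the partition is cached
p.incremental_stats()
p.incremental_stats()
print(fetch_calls)             # fetched exactly once, on demand
```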



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
