[jira] [Updated] (IMPALA-6656) Metrics for time spent in BufferAllocator
[ https://issues.apache.org/jira/browse/IMPALA-6656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-6656: -- Description: We should track the total time spent and the time spent in TCMalloc so we can understand where time is going globally. I think we should shard them by CurrentCore() to avoid contention and get more granular metrics. We want a timer for the amount of time spent in SystemAllocator. We probably also want counters for how many times we go down each code path in BufferAllocator::AllocateInternal() (i.e. getting a hit immediately in the local area, evicting a clean page, etc down to doing a full locked scavenge). was: We should track the total time spent and the time spent in TCMalloc so we can understand where time is going globally. I think we should shard these metrics across the arenas so we can see if the problem is just per-arena, and also to avoid contention between threads when updating the metrics. > Metrics for time spent in BufferAllocator > - > > Key: IMPALA-6656 > URL: https://issues.apache.org/jira/browse/IMPALA-6656 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: observability, resource-management > > We should track the total time spent and the time spent in TCMalloc so we can > understand where time is going globally. > I think we should shard them by CurrentCore() to avoid contention and get > more granular metrics. We want a timer for the amount of time spent in > SystemAllocator. We probably also want counters for how many times we go down > each code path in BufferAllocator::AllocateInternal() (i.e. getting a hit > immediately in the local area, evicting a clean page, etc down to doing a > full locked scavenge). 
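The per-core sharding proposed above can be sketched as follows. This is a hypothetical Python sketch (the real implementation would live in the C++ BufferAllocator); a hash of the thread id stands in for CurrentCore(), and the point is that writers touch disjoint shards while a reader sums across them:

```python
import threading

class ShardedTimer:
    """Sketch of a sharded time counter: each updating thread lands on
    its own shard (hashed thread id here, CurrentCore() in the proposal),
    so concurrent updates rarely contend; reads aggregate all shards."""

    def __init__(self, num_shards=8):
        self.num_shards = num_shards
        self.shards = [0.0] * num_shards
        # One lock per shard instead of one global lock.
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def add(self, seconds):
        i = threading.get_ident() % self.num_shards
        with self.locks[i]:
            self.shards[i] += seconds

    def total(self):
        # Aggregate on read; per-shard values also give the granular view.
        return sum(self.shards)

timer = ShardedTimer(4)
timer.add(1.5)
timer.add(0.5)
```

Reading per-shard values directly would give the "more granular metrics" the description asks for, at the cost of slightly stale totals.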
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7859) Nessus Scan finds CGI Generic SQL Injection.
Donghui Xu created IMPALA-7859:
--
Summary: Nessus Scan finds CGI Generic SQL Injection.
Key: IMPALA-7859
URL: https://issues.apache.org/jira/browse/IMPALA-7859
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 2.10.0
Reporter: Donghui Xu

The Nessus scan report shows that the 25000 port and the 25020 port contain the risk of SQL injection, as follows:
+ The following resources may be vulnerable to blind SQL injection :
+ The 'object_type' parameter of the /catalog_object CGI :
/catalog_object?object_name=_impala_builtins_type=DATABASEzz_impala_builtins_type=DATABASEyy
How can I solve this problem? Thanks.
[jira] [Commented] (IMPALA-7858) run-workload.py should support Beeswax's LDAP authentication
[ https://issues.apache.org/jira/browse/IMPALA-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688853#comment-16688853 ] Jim Apple commented on IMPALA-7858:
---
Patch for review: http://gerrit.cloudera.org:8080/11938
> run-workload.py should support Beeswax's LDAP authentication
>
> Key: IMPALA-7858
> URL: https://issues.apache.org/jira/browse/IMPALA-7858
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Affects Versions: Impala 3.0
> Reporter: Jim Apple
> Assignee: Jim Apple
> Priority: Minor
>
> {{impala-shell}} supports LDAP authentication with the {{\-\-user}} and {{--ldap}} flags. {{run-workload.py}}, which can use beeswax, the same interface as {{impala-shell}}, should support that, too.
[jira] [Work started] (IMPALA-7858) run-workload.py should support Beeswax's LDAP authentication
[ https://issues.apache.org/jira/browse/IMPALA-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-7858 started by Jim Apple.
-
> run-workload.py should support Beeswax's LDAP authentication
>
> Key: IMPALA-7858
> URL: https://issues.apache.org/jira/browse/IMPALA-7858
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Affects Versions: Impala 3.0
> Reporter: Jim Apple
> Assignee: Jim Apple
> Priority: Minor
>
> {{impala-shell}} supports LDAP authentication with the {{\-\-user}} and {{--ldap}} flags. {{run-workload.py}}, which can use beeswax, the same interface as {{impala-shell}}, should support that, too.
[jira] [Updated] (IMPALA-7858) run-workload.py should support Beeswax's LDAP authentication
[ https://issues.apache.org/jira/browse/IMPALA-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Apple updated IMPALA-7858:
--
Description:
{{impala-shell}} supports LDAP authentication with the {{\-\-user}} and {{--ldap}} flags. {{run-workload.py}}, which can use beeswax, the same interface as {{impala-shell}}, should support that, too.
(was: {{impala-shell}} supports LDAP authentication with the {{--user}} and {{--ldap}} flags. {{run-workload.py}}, which can use beeswax, the same interface as {{impala-shell}}, should support that, too.)
> run-workload.py should support Beeswax's LDAP authentication
>
> Key: IMPALA-7858
> URL: https://issues.apache.org/jira/browse/IMPALA-7858
> Project: IMPALA
> Issue Type: Improvement
> Components: Infrastructure
> Affects Versions: Impala 3.0
> Reporter: Jim Apple
> Assignee: Jim Apple
> Priority: Minor
>
> {{impala-shell}} supports LDAP authentication with the {{\-\-user}} and {{--ldap}} flags. {{run-workload.py}}, which can use beeswax, the same interface as {{impala-shell}}, should support that, too.
[jira] [Assigned] (IMPALA-3531) Implement FK/PK "rely novalidate" constraints for better CBO
[ https://issues.apache.org/jira/browse/IMPALA-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Mantripragada reassigned IMPALA-3531:
Assignee: Anurag Mantripragada
> Implement FK/PK "rely novalidate" constraints for better CBO
>
> Key: IMPALA-3531
> URL: https://issues.apache.org/jira/browse/IMPALA-3531
> Project: IMPALA
> Issue Type: New Feature
> Components: Catalog, Frontend, Perf Investigation
> Affects Versions: Impala 2.5.0, Impala 2.6.0
> Environment: CDH
> Reporter: Ruslan Dautkhanov
> Assignee: Anurag Mantripragada
> Priority: Minor
> Labels: CBO, performance, ramp-up
>
> Oracle has a "RELY NOVALIDATE" option for constraints. It could be easier for Hive to start with something like that for PK/FK constraints, so the CBO has more information for optimizations. It does not have to actually check whether the constraint relationship holds; it can just "rely" on that constraint.
> https://docs.oracle.com/database/121/SQLRF/clauses002.htm#sthref2289
> So it would be helpful with join cardinality estimates, and with cases like IMPALA-2929.
> https://docs.oracle.com/database/121/DWHSG/schemas.htm#DWHSG9053 "Overview of Constraint States":
> - Enforcement
> - Validation
> - Belief
> So FK/PK with "rely novalidate" will have Enforcement disabled but Belief = RELY, as is possible in Oracle and now in Hive (HIVE-13076). It opens up additional ways to optimize execution plans.
> As explained in Tom Kyte's "Metadata Matters" http://www.peoug.org/wp-content/uploads/2009/12/MetadataMatters_PEOUG_Day2009_TKyte.pdf
> pp.30 - "Tell us how the tables relate and we can remove them from the plan...".
> pp.35 - "Tell us how the tables relate and we have more access paths available...".
> Also it might be helpful when Impala is integrated with Kudu, as the latter requires a PK.
[jira] [Created] (IMPALA-7858) run-workload.py should support Beeswax's LDAP authentication
Jim Apple created IMPALA-7858:
-
Summary: run-workload.py should support Beeswax's LDAP authentication
Key: IMPALA-7858
URL: https://issues.apache.org/jira/browse/IMPALA-7858
Project: IMPALA
Issue Type: Improvement
Components: Infrastructure
Affects Versions: Impala 3.0
Reporter: Jim Apple
Assignee: Jim Apple

{{impala-shell}} supports LDAP authentication with the {{--user}} and {{--ldap}} flags. {{run-workload.py}}, which can use beeswax, the same interface as {{impala-shell}}, should support that, too.
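The flag names above come from {{impala-shell}}; how they would be wired into {{run-workload.py}} is not specified in the issue. A hypothetical sketch of the option parsing (the flag names are from the issue text, everything else is an assumption):

```python
import argparse

# Hypothetical sketch: accept --ldap/--user the way impala-shell does.
# The actual run-workload.py option handling may differ.
parser = argparse.ArgumentParser(description="run-workload.py (sketch)")
parser.add_argument("--ldap", action="store_true",
                    help="authenticate to the Beeswax interface via LDAP")
parser.add_argument("--user", default=None,
                    help="user name to authenticate as")

args = parser.parse_args(["--ldap", "--user", "alice"])
# In practice the password would be prompted for (e.g. getpass.getpass())
# and passed to the Beeswax client along with the user name.
use_ldap = args.ldap and args.user is not None
```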
[jira] [Commented] (IMPALA-7854) Slow ALTER TABLE and LOAD DATA statements for tables with large number of partitions
[ https://issues.apache.org/jira/browse/IMPALA-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688815#comment-16688815 ] vietn commented on IMPALA-7854:
---
I don't think that is possible, since I don't have the authority to upgrade our cluster. The LOAD DATA behavior looks similar to IMPALA-7330, but what about the ALTER TABLE?
> Slow ALTER TABLE and LOAD DATA statements for tables with large number of partitions
>
> Key: IMPALA-7854
> URL: https://issues.apache.org/jira/browse/IMPALA-7854
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.12.0
> Environment: 14 Nodes
> Table in question has 20 columns, 3 partition columns, and 57,475 partitions
> Reporter: vietn
> Priority: Critical
> Labels: impala, performance
>
> ALTER TABLE and LOAD DATA statements take minutes (9 minutes for ALTER TABLE and 6 minutes for LOAD DATA) for tables with a large number of partitions. Our workaround was to use Hive to perform the LOAD DATA and then perform a REFRESH PARTITION using Impala.
> * 14 Nodes
> * Table in question has 20 columns, 3 partition columns, and 57,475 partitions
[jira] [Updated] (IMPALA-6755) Impala Doc: Doc functions PERCENTILE_DISC(), PERCENTILE_CONT(), and MEDIAN()
[ https://issues.apache.org/jira/browse/IMPALA-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-6755: Summary: Impala Doc: Doc functions PERCENTILE_DISC(), PERCENTILE_CONT(), and MEDIAN() (was: Impala 2.13 Doc: Doc functions PERCENTILE_DISC(), PERCENTILE_CONT(), and MEDIAN()) > Impala Doc: Doc functions PERCENTILE_DISC(), PERCENTILE_CONT(), and MEDIAN() > > > Key: IMPALA-6755 > URL: https://issues.apache.org/jira/browse/IMPALA-6755 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Affects Versions: Impala 2.13.0 >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-5847) Some query options do not work as expected in .test files
[ https://issues.apache.org/jira/browse/IMPALA-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Tauber-Marshall reassigned IMPALA-5847: -- Assignee: Thomas Tauber-Marshall > Some query options do not work as expected in .test files > - > > Key: IMPALA-5847 > URL: https://issues.apache.org/jira/browse/IMPALA-5847 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Reporter: Alexander Behm >Assignee: Thomas Tauber-Marshall >Priority: Minor > > We often use "set" in .test files to alter query options. Theoretically, a > "set" command should change the session-level query options and in most cases > a single .test file is executed from the same Impala session. However, for > some options using "set" within a query section does not seem to work. For > example, "num_nodes" does not work as expected as shown below. > PyTest: > {code} > import pytest > from tests.common.impala_test_suite import ImpalaTestSuite > class TestStringQueries(ImpalaTestSuite): > @classmethod > def get_workload(cls): > return 'functional-query' > def test_set_bug(self, vector): > self.run_test_case('QueryTest/set_bug', vector) > {code} > Corresponding .test file: > {code} > > QUERY > set num_nodes=1; > select count(*) from functional.alltypes; > select count(*) from functional.alltypes; > select count(*) from functional.alltypes; > RESULTS > 7300 > TYPES > BIGINT > > {code} > After running the test above, I validated that the 3 queries were run from > the same session, and that the queries run a distributed plan. The > "num_nodes" option was definitely not picked up. I am not sure which query > options are affected. In several .test files setting other query options does > seem to work as expected. > I suspect that the test framework might keep its own list of default query > options which get submitted together with the query, so the session-level > options are overridden on a per-request basis. 
For example, if I change the > pytest to remove the "num_nodes" dictionary entry, then the test works as > expected. > PyTest workaround: > {code} > import pytest > from tests.common.impala_test_suite import ImpalaTestSuite > class TestStringQueries(ImpalaTestSuite): > @classmethod > def get_workload(cls): > return 'functional-query' > def test_set_bug(self, vector): > # Workaround SET bug > vector.get_value('exec_option').pop('num_nodes', None) > self.run_test_case('QueryTest/set_bug', vector) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-7856) test_exchange_mem_usage_scaling failing, not hitting expected OOM
[ https://issues.apache.org/jira/browse/IMPALA-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikramjeet Vig reassigned IMPALA-7856: -- Assignee: Bikramjeet Vig > test_exchange_mem_usage_scaling failing, not hitting expected OOM > - > > Key: IMPALA-7856 > URL: https://issues.apache.org/jira/browse/IMPALA-7856 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 3.2.0 >Reporter: Bikramjeet Vig >Assignee: Bikramjeet Vig >Priority: Critical > Labels: broken-build, flaky-test > > {noformat} > query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling > self.run_test_case('QueryTest/exchange-mem-scaling', vector) > common/impala_test_suite.py:482: in run_test_case > assert False, "Expected exception: %s" % expected_str > E AssertionError: Expected exception: Memory limit exceeded > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7856) test_exchange_mem_usage_scaling failing, not hitting expected OOM
Bikramjeet Vig created IMPALA-7856:
--
Summary: test_exchange_mem_usage_scaling failing, not hitting expected OOM
Key: IMPALA-7856
URL: https://issues.apache.org/jira/browse/IMPALA-7856
Project: IMPALA
Issue Type: Bug
Affects Versions: Impala 3.2.0
Reporter: Bikramjeet Vig

{noformat}
query_test/test_mem_usage_scaling.py:386: in test_exchange_mem_usage_scaling
    self.run_test_case('QueryTest/exchange-mem-scaling', vector)
common/impala_test_suite.py:482: in run_test_case
    assert False, "Expected exception: %s" % expected_str
E   AssertionError: Expected exception: Memory limit exceeded
{noformat}
[jira] [Commented] (IMPALA-7087) Impala is unable to read Parquet decimal columns with lower precision/scale than table metadata
[ https://issues.apache.org/jira/browse/IMPALA-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688669#comment-16688669 ] Tim Armstrong commented on IMPALA-7087: --- I don't have an objection. Usually I think we should err on the side of safety and not implicitly convert things, but I can see valid workflows where you'd want to reduce precision. What does Impala do for decimal columns in text tables with extra precision? > Impala is unable to read Parquet decimal columns with lower precision/scale > than table metadata > --- > > Key: IMPALA-7087 > URL: https://issues.apache.org/jira/browse/IMPALA-7087 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Sahil Takiar >Priority: Major > Labels: decimal, parquet > > This is similar to IMPALA-2515, except relates to a different precision/scale > in the file metadata rather than just a mismatch in the bytes used to store > the data. In a lot of cases we should be able to convert the decimal type on > the fly to the higher-precision type. > {noformat} > ERROR: File '/hdfs/path/00_0_x_2' column 'alterd_decimal' has an invalid > type length. Expecting: 11 len in file: 8 > {noformat} > It would be convenient to allow reading parquet files where the > precision/scale in the file can be converted to the precision/scale in the > table metadata without loss of precision. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7087) Impala is unable to read Parquet decimal columns with lower precision/scale than table metadata
[ https://issues.apache.org/jira/browse/IMPALA-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688645#comment-16688645 ] Sahil Takiar commented on IMPALA-7087: -- Do we want to tackle Parquet files that have a higher scale compared to the table (e.g. a Parquet file written with scale = 4 being loaded into a table with scale = 2)? It seems like this is a valid pattern in other databases. The returned values just have their least significant digits truncated. Here is what other SQL engines do: *Postgres:* Postgres is able to load data with a higher scale into a table with a lower scale. {code:java} postgres@stakiar-desktop:~$ printf "col1\n1.111" > /tmp/tmp.txt test=# create table dec_test (dec_col decimal(10,2)); test=# copy dec_test(dec_col) from '/tmp/tmp.txt' delimiter ',' csv header; test=# select * from dec_test; dec_col - 1.11 (1 row) {code} The data was written to {{/tmp/tmp.txt}} as {{1.111}}, but is returned by Postgres as {{1.11}}. *Hive:* Hive follows the same behavior as Postgres. {code:java} create table dec_test_high_scale (dec_col decimal(10,4)) stored as parquet; insert into table dec_test_high_scale values (1.); create table dec_test_low_scale (dec_col decimal(10,2)) stored as parquet location 'hdfs://[nn]:[port]/user/hive/warehouse/dec_test_high_scale'; select * from dec_test_low_scale; 1.11 {code} > Impala is unable to read Parquet decimal columns with lower precision/scale > than table metadata > --- > > Key: IMPALA-7087 > URL: https://issues.apache.org/jira/browse/IMPALA-7087 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Reporter: Tim Armstrong >Assignee: Sahil Takiar >Priority: Major > Labels: decimal, parquet > > This is similar to IMPALA-2515, except relates to a different precision/scale > in the file metadata rather than just a mismatch in the bytes used to store > the data. In a lot of cases we should be able to convert the decimal type on > the fly to the higher-precision type. 
> {noformat} > ERROR: File '/hdfs/path/00_0_x_2' column 'alterd_decimal' has an invalid > type length. Expecting: 11 len in file: 8 > {noformat} > It would be convenient to allow reading parquet files where the > precision/scale in the file can be converted to the precision/scale in the > table metadata without loss of precision. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
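The truncation behavior the comment above reports for Postgres and Hive can be modeled with Python's decimal module. This is a sketch of those reported semantics (drop the least significant digits, no rounding), not of Impala's implementation:

```python
from decimal import Decimal, ROUND_DOWN

def rescale(value: Decimal, scale: int) -> Decimal:
    """Truncate `value` to `scale` fractional digits, dropping the
    least significant digits (the behavior the comment reports for
    loading higher-scale data into a lower-scale column)."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 -> Decimal('0.01')
    return value.quantize(quantum, rounding=ROUND_DOWN)

# The Postgres example above: 1.111 loaded into decimal(10,2).
truncated = rescale(Decimal("1.111"), 2)  # Decimal('1.11')
```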
[jira] [Commented] (IMPALA-7854) Slow ALTER TABLE and LOAD DATA statements for tables with large number of partitions
[ https://issues.apache.org/jira/browse/IMPALA-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688556#comment-16688556 ] bharath v commented on IMPALA-7854: --- Looks similar to IMPALA-7330. Could you try it out on the latest version if possible? > Slow ALTER TABLE and LOAD DATA statements for tables with large number of > partitions > > > Key: IMPALA-7854 > URL: https://issues.apache.org/jira/browse/IMPALA-7854 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 2.12.0 > Environment: 14 Nodes > Table in question has 20 columns, 3 partition columns, and 57,475 partitions >Reporter: vietn >Priority: Critical > Labels: impala, performance > > ALTER TABLE and LOAD DATA statements take minutes (9 minutes for ALTER TABLE > and 6 minutes for LOAD DATA) for tables with a large number of partitions. > Our workaround was to use Hive to perform the LOAD DATA and then perform a > REFRESH PARTITION using Impala. > * 14 Nodes > * Table in question has 20 columns, 3 partition columns, and 57,475 > partitions -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7855) Excessive type widening leads to unnecessary casts
Paul Rogers created IMPALA-7855:
---
Summary: Excessive type widening leads to unnecessary casts
Key: IMPALA-7855
URL: https://issues.apache.org/jira/browse/IMPALA-7855
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 3.0
Reporter: Paul Rogers

When writing unit tests, created the following query:
{code:sql}
with query1 (a, b) as (
  select 1 + 1 + id, 2 + 3 + int_col from functional.alltypestiny)
insert into functional.alltypestiny (id, int_col)
partition (month = 5, year = 2018)
select * from query1
{code}
The above fails with the following error:
{noformat}
ERROR: AnalysisException: Possible loss of precision for target table 'functional.alltypestiny'. Expression 'query1.a' (type: BIGINT) would need to be cast to INT for column 'id'
{noformat}
The following does work (for planning, may not actually execute):
{code:sql}
with query1 (a, b) as (
  select cast(1 + 1 + id as int), cast(2 + 3 + int_col as int)
  from functional.alltypestiny)
insert into functional.alltypestiny (id, int_col)
partition (month = 5, year = 2018)
select * from query1
{code}
What this says is that the planner selected type {{BIGINT}} for the (rewritten) expression {{2 + id}} where {{id}} is of type {{INT}}. {{BIGINT}} is a conservative guess: adding 2 to the largest {{INT}} could overflow and require a {{BIGINT}}. Yet, for such a simple case, such aggressive type promotion may be overly cautious. To verify that this is an issue, let's try something similar with Postgres to see if it is as aggressive.
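The conservative widening described in this issue boils down to a worst-case overflow check, which a few lines of Python can illustrate (a sketch of the reasoning only; Impala's actual type propagation lives in the frontend):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fits_int32(v: int) -> bool:
    """True if v is representable as a 32-bit signed INT."""
    return INT32_MIN <= v <= INT32_MAX

# Worst case the planner must guard against for `2 + id` when `id` is
# a 32-bit INT column: the sum can exceed INT32_MAX, so the expression
# is typed BIGINT unless an explicit cast narrows it back to INT.
worst_case = INT32_MAX + 2
overflows = not fits_int32(worst_case)
```

Python integers are unbounded, so `worst_case` is computed exactly; the check shows why a planner that only reasons about the operand types, not the literal's magnitude, widens the result.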
[jira] [Updated] (IMPALA-7854) Slow ALTER TABLE and LOAD DATA statements for tables with large number of partitions
[ https://issues.apache.org/jira/browse/IMPALA-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vietn updated IMPALA-7854: -- Description: ALTER TABLE and LOAD DATA statements take minutes (9 minutes for ALTER TABLE and 6 minutes for LOAD DATA) for tables with a large number of partitions. Our workaround was to use Hive to perform the LOAD DATA and then perform a REFRESH PARTITION using Impala. * 14 Nodes * Table in question has 20 columns, 3 partition columns, and 57,475 partitions was: ALTER TABLE and LOAD DATA statements take minutes (9 minutes for ALTER TABLE and 6 minutes for LOAD DATA) for tables with a large number of partitions. Our workaround was to use Hive to perform the LOAD DATA and then perform a REFRESH PARTITION using Impala. > Slow ALTER TABLE and LOAD DATA statements for tables with large number of > partitions > > > Key: IMPALA-7854 > URL: https://issues.apache.org/jira/browse/IMPALA-7854 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 2.12.0 > Environment: 14 Nodes > Table in question has 20 columns, 3 partition columns, and 57,475 partitions >Reporter: vietn >Priority: Critical > Labels: impala, performance > > ALTER TABLE and LOAD DATA statements take minutes (9 minutes for ALTER TABLE > and 6 minutes for LOAD DATA) for tables with a large number of partitions. > Our workaround was to use Hive to perform the LOAD DATA and then perform a > REFRESH PARTITION using Impala. > * 14 Nodes > * Table in question has 20 columns, 3 partition columns, and 57,475 > partitions -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-7854) Slow ALTER TABLE and LOAD DATA statements for tables with large number of partitions
Viet Nguyen created IMPALA-7854:
---
Summary: Slow ALTER TABLE and LOAD DATA statements for tables with large number of partitions
Key: IMPALA-7854
URL: https://issues.apache.org/jira/browse/IMPALA-7854
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Affects Versions: Impala 2.12.0
Environment: 14 Nodes
Table in question has 20 columns, 3 partition columns, and 57,475 partitions
Reporter: Viet Nguyen

ALTER TABLE and LOAD DATA statements take minutes (9 minutes for ALTER TABLE and 6 minutes for LOAD DATA) for tables with a large number of partitions. Our workaround was to use Hive to perform the LOAD DATA and then perform a REFRESH PARTITION using Impala.
[jira] [Resolved] (IMPALA-7837) SCAN_BYTES_LIMIT="100M" test failing to raise exception in release build
[ https://issues.apache.org/jira/browse/IMPALA-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikramjeet Vig resolved IMPALA-7837. Resolution: Fixed Fix Version/s: Impala 3.2.0 > SCAN_BYTES_LIMIT="100M" test failing to raise exception in release build > > > Key: IMPALA-7837 > URL: https://issues.apache.org/jira/browse/IMPALA-7837 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.2.0 >Reporter: Michael Brown >Assignee: Bikramjeet Vig >Priority: Blocker > Fix For: Impala 3.2.0 > > Attachments: impala-logs.tar.gz > > > This test is not raising the expected exception on a *release build*: > {noformat} > QUERY > # Query should fail due to exceeding scan bytes limit. > set SCAN_BYTES_LIMIT="100M"; > select count(*) from tpch.lineitem l1,tpch.lineitem l2, tpch.lineitem l3 where > l1.l_suppkey = l2.l_linenumber and l1.l_orderkey = l2.l_orderkey and > l1.l_orderkey = l3.l_orderkey group by l1.l_comment, l2.l_comment > having count(*) = 99 > CATCH > row_regex:.*terminated due to scan bytes limit of 100.00 M. > {noformat} > {noformat} > Stacktrace > query_test/test_resource_limits.py:46: in test_resource_limits > self.run_test_case('QueryTest/query-resource-limits', vector) > common/impala_test_suite.py:482: in run_test_case > assert False, "Expected exception: %s" % expected_str > E AssertionError: Expected exception: row_regex:.*terminated due to scan > bytes limit of 100.00 M.* > {noformat} > It fails deterministically in CI (3 times in a row). I can't find a query > profile matching the query ID for some reason, but I've attached logs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7837) SCAN_BYTES_LIMIT="100M" test failing to raise exception in release build
[ https://issues.apache.org/jira/browse/IMPALA-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688505#comment-16688505 ] ASF subversion and git services commented on IMPALA-7837: - Commit 0d0356c9329bf0cf0e7c69dee42f2b8b1315ec05 in impala's branch refs/heads/master from [~bikram.sngh91] [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=0d0356c ] IMPALA-7837: Fix flakiness in test_resource_limits for release builds test_resource_limits was failing in release build because the queries used were finishing earlier than expected. This resulted in fragment instances not being able to send enough updates to the coordinator in order to hit the limits used for the tests. This patches adds a deterministic sleep to the queries which gives enough time to the coordinator to catch up on reports. Testing: Checked that tests passed on release builds. Change-Id: I4a47391e52f3974db554dfc0d38139d3ee18a1b4 Reviewed-on: http://gerrit.cloudera.org:8080/11933 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > SCAN_BYTES_LIMIT="100M" test failing to raise exception in release build > > > Key: IMPALA-7837 > URL: https://issues.apache.org/jira/browse/IMPALA-7837 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.2.0 >Reporter: Michael Brown >Assignee: Bikramjeet Vig >Priority: Blocker > Attachments: impala-logs.tar.gz > > > This test is not raising the expected exception on a *release build*: > {noformat} > QUERY > # Query should fail due to exceeding scan bytes limit. > set SCAN_BYTES_LIMIT="100M"; > select count(*) from tpch.lineitem l1,tpch.lineitem l2, tpch.lineitem l3 where > l1.l_suppkey = l2.l_linenumber and l1.l_orderkey = l2.l_orderkey and > l1.l_orderkey = l3.l_orderkey group by l1.l_comment, l2.l_comment > having count(*) = 99 > CATCH > row_regex:.*terminated due to scan bytes limit of 100.00 M. 
> {noformat} > {noformat} > Stacktrace > query_test/test_resource_limits.py:46: in test_resource_limits > self.run_test_case('QueryTest/query-resource-limits', vector) > common/impala_test_suite.py:482: in run_test_case > assert False, "Expected exception: %s" % expected_str > E AssertionError: Expected exception: row_regex:.*terminated due to scan > bytes limit of 100.00 M.* > {noformat} > It fails deterministically in CI (3 times in a row). I can't find a query > profile matching the query ID for some reason, but I've attached logs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-7725) Impala Doc: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner
[ https://issues.apache.org/jira/browse/IMPALA-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-7725: Labels: future_release_doc in_32 (was: future_release_doc) > Impala Doc: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the > parquet scanner > > > Key: IMPALA-7725 > URL: https://issues.apache.org/jira/browse/IMPALA-7725 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_32 > > Add to doc that CREATE TABLE LIKE PARQUET still interprets these columns as > BIGINT. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-6924) Compute stats profiles should include reference to child queries
[ https://issues.apache.org/jira/browse/IMPALA-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Tauber-Marshall reassigned IMPALA-6924: -- Assignee: Thomas Tauber-Marshall > Compute stats profiles should include reference to child queries > > > Key: IMPALA-6924 > URL: https://issues.apache.org/jira/browse/IMPALA-6924 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.0, Impala 2.12.0 >Reporter: Tim Armstrong >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: observability, supportability > > "Compute stats" queries spawn off child queries that do most of the work. > It's non-trivial to track down the child queries and get their profiles if > something goes wrong. We really should have, at a minimum, the query IDs of > the child queries in the parent's profile and vice-versa.
[jira] [Updated] (IMPALA-7853) Add support to read int64 NANO timestamps to the parquet scanner
[ https://issues.apache.org/jira/browse/IMPALA-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-7853: Description: PARQUET-1387 added int64 timestamps with nanosecond precision. As 64 bits are not enough to represent the whole 1400.. range of Impala timestamps, this new type works with a limited range: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC The benefit of the reduced range is that no validation is necessary during scanning, as every possible 64 bit value represents a valid timestamp in Impala. This means it has the potential to be the fastest way to store timestamps in Impala + Parquet. Another way NANO differs from MICRO and MILLI is that NANO can only be described with new logical types in Parquet; it has no converted type equivalent. was: PARQUET-1387 added int64 timestamps with nanosecond precision. As 64 bits are not enough to represent the whole 1400.. range of Impala timestamps, this new type works with a limited range: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC The benefit of the reduced range is that no validation is necessary during scanning, as every possible 64 bit value represents a valid timestamp in Impala. This means it has the potential to be the fastest way to store timestamps in Impala + Parquet. Another way NANO differs from MICRO and MILLI is that NANO can only be described with new logical types in Parquet; it has no converted type equivalent. > Add support to read int64 NANO timestamps to the parquet scanner > > > Key: IMPALA-7853 > URL: https://issues.apache.org/jira/browse/IMPALA-7853 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > > PARQUET-1387 added int64 timestamps with nanosecond precision. > As 64 bits are not enough to represent the whole 1400.. 
range of Impala > timestamps, this new type works with a limited range: > 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC > The benefit of the reduced range is that no validation is necessary during > scanning, as every possible 64 bit value represents a valid timestamp in > Impala. This means it has the potential to be the fastest way to store > timestamps in Impala + Parquet. > Another way NANO differs from MICRO and MILLI is that NANO can only be > described with new logical types in Parquet; it has no converted type > equivalent.
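The range quoted in the description is simply the int64 nanosecond span centered on the Unix epoch (±2^63 ns, roughly ±292 years). A quick sanity check in Python, truncating to microseconds since `datetime` carries no nanosecond field:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# int64 nanoseconds since the epoch, floored to whole microseconds
# (the final nanosecond digits, ...192 and ...807, are dropped).
min_ts = EPOCH + timedelta(microseconds=-(2**63) // 1000)
max_ts = EPOCH + timedelta(microseconds=(2**63 - 1) // 1000)

print(min_ts)  # 1677-09-21 00:12:43.145224+00:00
print(max_ts)  # 2262-04-11 23:47:16.854775+00:00
```

Since every 64-bit pattern maps to a point inside this interval, a scanner can load the values without any per-row range validation.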
[jira] [Updated] (IMPALA-7853) Add support to read int64 NANO timestamps to the parquet scanner
[ https://issues.apache.org/jira/browse/IMPALA-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-7853: Description: PARQUET-1387 added int64 timestamps with nanosecond precision. As 64 bits are not enough to represent the whole 1400.. range of Impala timestamps, this new type works with a limited range: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC The benefit of the reduced range is that no validation is necessary during scanning, as every possible 64 bit value represents a valid timestamp in Impala. This means it has the potential to be the fastest way to store timestamps in Impala + Parquet. Another way NANO differs from MICRO and MILLI is that NANO can only be described with new logical types in Parquet; it has no converted type equivalent. was: PARQUET-1387 added int64 timestamps with nanosecond precision. As 64 bits are not enough to represent the whole 1400.. range of Impala timestamps, this new type works with a limited range: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC The benefit of the reduced range is that no validation is necessary during scanning, as every possible 64 bit value represents a valid timestamp in Impala. This means it has the potential to be the fastest way to store timestamps. Another way NANO differs from MICRO and MILLI is that NANO can only be described with new logical types in Parquet; it has no converted type equivalent. > Add support to read int64 NANO timestamps to the parquet scanner > > > Key: IMPALA-7853 > URL: https://issues.apache.org/jira/browse/IMPALA-7853 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > > PARQUET-1387 added int64 timestamps with nanosecond precision. > As 64 bits are not enough to represent the whole 1400.. 
range of Impala > timestamps, this new type works with a limited range: > 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC > The benefit of the reduced range is that no validation is necessary during > scanning, as every possible 64 bit value represents a valid timestamp in > Impala. This means it has the potential to be the fastest way to store > timestamps in Impala + Parquet. > Another way NANO differs from MICRO and MILLI is that NANO can only be > described with new logical types in Parquet; it has no converted type > equivalent.
[jira] [Updated] (IMPALA-7853) Add support to read int64 NANO timestamps to the parquet scanner
[ https://issues.apache.org/jira/browse/IMPALA-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer updated IMPALA-7853: Description: PARQUET-1387 added int64 timestamps with nanosecond precision. As 64 bits are not enough to represent the whole 1400.. range of Impala timestamps, this new type works with a limited range: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC The benefit of the reduced range is that no validation is necessary during scanning, as every possible 64 bit value represents a valid timestamp in Impala. This means it has the potential to be the fastest way to store timestamps. Another way NANO differs from MICRO and MILLI is that NANO can only be described with new logical types in Parquet; it has no converted type equivalent. was: PARQUET-1387 added int64 timestamps with nanosecond precision. As 64 bits are not enough to represent the whole 1400.. range of Impala timestamps, this new type works with a limited range: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC The benefit of the reduced range is that no validation is necessary during scanning, as every possible 64 bit value represents a valid timestamp in Impala. This means it has the potential to be the fastest way to store timestamps. > Add support to read int64 NANO timestamps to the parquet scanner > > > Key: IMPALA-7853 > URL: https://issues.apache.org/jira/browse/IMPALA-7853 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > > PARQUET-1387 added int64 timestamps with nanosecond precision. > As 64 bits are not enough to represent the whole 1400.. range of Impala > timestamps, this new type works with a limited range: > 1677-09-21 00:12:43.145224192 .. 
2262-04-11 23:47:16.854775807 UTC > The benefit of the reduced range is that no validation is necessary during > scanning, as every possible 64 bit value represents a valid timestamp in > Impala. This means it has the potential to be the fastest way to store > timestamps. > Another way NANO differs from MICRO and MILLI is that NANO can only be > described with new logical types in Parquet; it has no converted type > equivalent.
[jira] [Work started] (IMPALA-7853) Add support to read int64 NANO timestamps to the parquet scanner
[ https://issues.apache.org/jira/browse/IMPALA-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-7853 started by Csaba Ringhofer. --- > Add support to read int64 NANO timestamps to the parquet scanner > > > Key: IMPALA-7853 > URL: https://issues.apache.org/jira/browse/IMPALA-7853 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Major > > PARQUET-1387 added int64 timestamps with nanosecond precision. > As 64 bits are not enough to represent the whole 1400.. range of Impala > timestamps, this new type works with a limited range: > 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC > The benefit of the reduced range is that no validation is necessary during > scanning, as every possible 64 bit value represents a valid timestamp in > Impala. This means it has the potential to be the fastest way to store > timestamps.
[jira] [Created] (IMPALA-7853) Add support to read int64 NANO timestamps to the parquet scanner
Csaba Ringhofer created IMPALA-7853: --- Summary: Add support to read int64 NANO timestamps to the parquet scanner Key: IMPALA-7853 URL: https://issues.apache.org/jira/browse/IMPALA-7853 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Csaba Ringhofer Assignee: Csaba Ringhofer PARQUET-1387 added int64 timestamps with nanosecond precision. As 64 bits are not enough to represent the whole 1400.. range of Impala timestamps, this new type works with a limited range: 1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC The benefit of the reduced range is that no validation is necessary during scanning, as every possible 64 bit value represents a valid timestamp in Impala. This means it has the potential to be the fastest way to store timestamps.
[jira] [Resolved] (IMPALA-5050) Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner
[ https://issues.apache.org/jira/browse/IMPALA-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Csaba Ringhofer resolved IMPALA-5050. - Resolution: Implemented Fix Version/s: Impala 3.2.0 > Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet > scanner > > > Key: IMPALA-5050 > URL: https://issues.apache.org/jira/browse/IMPALA-5050 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Affects Versions: Impala 2.9.0 >Reporter: Lars Volker >Assignee: Csaba Ringhofer >Priority: Major > Fix For: Impala 3.2.0 > > > This requires updating {{parquet.thrift}} to a version that includes the > {{TIMESTAMP_MICROS}} logical type.