[jira] [Created] (IMPALA-8949) PlannerTest differences when running on S3 vs HDFS
Sahil Takiar created IMPALA-8949: Summary: PlannerTest differences when running on S3 vs HDFS Key: IMPALA-8949 URL: https://issues.apache.org/jira/browse/IMPALA-8949 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Sahil Takiar While re-enabling the {{S3PlannerTest}} in IMPALA-8944, several tests were found to be consistently failing due to actual diffs in the explain plan: * org.apache.impala.planner.S3PlannerTest.testTpcds * org.apache.impala.planner.S3PlannerTest.testTpch * org.apache.impala.planner.S3PlannerTest.testJoinOrder * org.apache.impala.planner.S3PlannerTest.testSubqueryRewrite All are failing for non-trivial reasons - e.g. differences in memory estimates, join orders, etc. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Closed] (IMPALA-8903) Impala Doc: TRUNCATE for Insert-only ACID tables
[ https://issues.apache.org/jira/browse/IMPALA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni closed IMPALA-8903. --- Fix Version/s: Impala 3.4.0 Resolution: Fixed > Impala Doc: TRUNCATE for Insert-only ACID tables > > > Key: IMPALA-8903 > URL: https://issues.apache.org/jira/browse/IMPALA-8903 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_34 > Fix For: Impala 3.4.0 > > > https://gerrit.cloudera.org/#/c/14235/ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (IMPALA-8950) Add -d and -f option to copyFromLocal and re-enable disabled S3 tests
Sahil Takiar created IMPALA-8950: Summary: Add -d and -f option to copyFromLocal and re-enable disabled S3 tests Key: IMPALA-8950 URL: https://issues.apache.org/jira/browse/IMPALA-8950 Project: IMPALA Issue Type: Test Reporter: Sahil Takiar Assignee: Sahil Takiar The {{-d}} option for {{hdfs dfs -copyFromLocal}} "Skip[s] creation of temporary file with the suffix ._COPYING_". The {{-f}} option "Overwrites the destination if it already exists". By using the {{-d}} option, copies to S3 avoid the additional overhead of copying data to a tmp file and then renaming the file. The {{-f}} option overwrites the file if it exists, which should be safe since tests should be writing to unique directories anyway. With HADOOP-16490, {{create(overwrite=true)}} avoids issuing a HEAD request on the path, which prevents any cached 404s on the S3 key. After these changes, the tests disabled by IMPALA-8189 can be re-enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
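The flag behavior described above can be sketched as a tiny helper that builds the `hdfs dfs -copyFromLocal` command line; the function name and the sample paths are hypothetical, not taken from the Impala test framework:

```python
# Hypothetical sketch: how a test helper might assemble the copyFromLocal
# invocation with the -d (skip the ._COPYING_ temp file) and -f (overwrite)
# options discussed above. The helper name and paths are illustrative.
def build_copy_from_local_cmd(src, dst, skip_tmp_file=True, overwrite=True):
    """Build an 'hdfs dfs -copyFromLocal' argv list."""
    cmd = ["hdfs", "dfs", "-copyFromLocal"]
    if skip_tmp_file:
        cmd.append("-d")  # write directly, skipping the ._COPYING_ temp file
    if overwrite:
        cmd.append("-f")  # overwrite the destination if it already exists
    cmd.extend([src, dst])
    return cmd

print(build_copy_from_local_cmd("/tmp/data.parq", "s3a://bucket/tbl/data.parq"))
```

On S3, skipping the temp-file-plus-rename step matters because an S3 "rename" is a copy followed by a delete, so `-d` removes a full extra round trip of the data.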
[jira] [Commented] (IMPALA-8946) Prometheus histograms do not follow conventions
[ https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931736#comment-16931736 ] Guillem commented on IMPALA-8946: - Thanks for the understanding and sorry for the overhead this may cause. [^0001-IMPALA-8946-Fix-histogram-rendering-to-Prometheus.patch] I have attached the patch to this comment. Thanks again! > Prometheus histograms do not follow conventions > --- > > Key: IMPALA-8946 > URL: https://issues.apache.org/jira/browse/IMPALA-8946 > Project: IMPALA > Issue Type: Bug >Reporter: Guillem >Assignee: Guillem >Priority: Minor > Attachments: > 0001-IMPALA-8946-Fix-histogram-rendering-to-Prometheus.patch > > > We've been using Prometheus metrics and we've found that some standard > Prometheus parsers cannot properly interpret histograms from Impala. > For example, the official Python client > ([https://github.com/prometheus/client_python]) cannot properly read them. > I've been digging a little into why it can't read them and I've found that > Impala does not adhere to the textual histogram conventions. 
> The following link describes the conventions for rendering histograms on > Prometheus textual format: > [https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries] > This is an example of a rendered histogram on Impala 3.3 on Prometheus > endpoint: > {code:java} > # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time > clients of Impala Backend Server spent waiting for service threads > # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram > impala_thrift_server_backend_svc_thread_wait_time{le="0.2"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.5"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.7"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.9"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.95"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.999"} 0 > impala_thrift_server_backend_svc_thread_wait_time_max 0 > impala_thrift_server_backend_svc_thread_wait_time_min 0 > impala_thrift_server_backend_svc_thread_wait_time_count 49 > {code} > The linked histogram conventions say that > {quote}Each bucket count of a histogram named x is given as a separate sample > line with the name x_bucket and a label \{le="y"} (where y is the upper bound > of the bucket). > {quote} > And also > {quote}A histogram must have a bucket with \{le="+Inf"}. Its value must be > identical to the value of x_count. 
> {quote} > The previous example should be formatted as: > {code:java} > # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time > clients of Impala Backend Server spent waiting for service threads > # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.2"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.5"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.7"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.9"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.95"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.999"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="+Inf"} 49 > impala_thrift_server_backend_svc_thread_wait_time_count 49 > {code} > I've found that with this format, the official Python client is able to > properly read the histograms. > Note also that the metrics suffixed with `_min` and `_max` are also outside the > convention; they also break histogram parsing, and they may need to be > reported as separate metrics (maybe as gauges?) > If you are fine with making these changes, I already have a patch to improve > the histogram formatting and I can submit it for review. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
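The bucket conventions quoted above can be sketched in a few lines of Python. This is a minimal rendering sketch rather than Impala's actual metrics code, and the metric name, help text, and values below are illustrative:

```python
# Minimal sketch of the Prometheus text-format histogram conventions described
# above: every bucket line carries the "_bucket" suffix and an le="..." label,
# and an le="+Inf" bucket whose value equals the total count is mandatory.
def render_histogram(name, help_text, buckets, count):
    """Render a histogram in the Prometheus text exposition format.

    buckets: list of (upper_bound, cumulative_count) pairs, sorted ascending.
    """
    lines = [
        "# HELP {} {}".format(name, help_text),
        "# TYPE {} histogram".format(name),
    ]
    for le, value in buckets:
        lines.append('{}_bucket{{le="{}"}} {}'.format(name, le, value))
    # The +Inf bucket is required and must match the total observation count.
    lines.append('{}_bucket{{le="+Inf"}} {}'.format(name, count))
    lines.append("{}_count {}".format(name, count))
    return "\n".join(lines)

print(render_histogram("wait_time", "Time spent waiting",
                       [("0.2", 0), ("0.5", 0)], 49))
```

A parser such as the official Python client walks the `_bucket`, `_count`, and (optional) `_sum` sample names to reassemble the histogram, which is why unsuffixed bucket lines and stray `_min`/`_max` samples break it.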
[jira] [Commented] (IMPALA-8944) Update and re-enable S3PlannerTest
[ https://issues.apache.org/jira/browse/IMPALA-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931788#comment-16931788 ] Sahil Takiar commented on IMPALA-8944: -- I've got 14 of the original 17 unit tests in {{S3PlannerTest}} working. Instead of relying on a test filter in {{run-all-tests.sh}}, I decided to use JUnit Categories and Maven profiles to select the tests to run (achieves a similar effect to TestNG Groups). I think it is a more robust and straightforward way of running tests. Now any fe/ tests that should be run for S3 can simply be tagged with the Java annotation {{@Category(S3Tests.class)}}. The failing {{S3PlannerTest}}-s are: * org.apache.impala.planner.S3PlannerTest.testTpcds * org.apache.impala.planner.S3PlannerTest.testTpch * org.apache.impala.planner.S3PlannerTest.testJoinOrder * org.apache.impala.planner.S3PlannerTest.testSubqueryRewrite All are failing for non-trivial reasons - e.g. actual differences in the explain plans when running on S3 vs. HDFS data (e.g. differences in memory estimates, join orders, etc.). I've opened IMPALA-8949 to investigate this. > Update and re-enable S3PlannerTest > -- > > Key: IMPALA-8944 > URL: https://issues.apache.org/jira/browse/IMPALA-8944 > Project: IMPALA > Issue Type: Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > It looks like we don't run {{S3PlannerTest}} in our regular Jenkins jobs. > When run against an HDFS mini-cluster, they are skipped because the > {{TARGET_FILESYSTEM}} is not S3. On our S3 jobs, they don't run either > because we skip all fe/ tests (most of them don't work against S3 / assume > they are running on HDFS). > A few things need to be fixed to get this working: > * The test cases in {{S3PlannerTest}} need to be fixed > * The Jenkins job that runs the S3 tests needs the ability to run specific > fe/ tests (e.g. 
just the {{S3PlannerTest}} and to skip the rest) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8946) Prometheus histograms do not follow conventions
[ https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guillem updated IMPALA-8946: Attachment: 0001-IMPALA-8946-Fix-histogram-rendering-to-Prometheus.patch > Prometheus histograms do not follow conventions > --- > > Key: IMPALA-8946 > URL: https://issues.apache.org/jira/browse/IMPALA-8946 > Project: IMPALA > Issue Type: Bug >Reporter: Guillem >Assignee: Guillem >Priority: Minor > Attachments: > 0001-IMPALA-8946-Fix-histogram-rendering-to-Prometheus.patch > > > We've been using Prometheus metrics and we've found that some standard > Prometheus parser can not properly interpret histograms from Impala. > For example, Python official client > ([https://github.com/prometheus/client_python)] can not properly read them. > I've been digging a little bit why it can't read them and I've found that > Impala does not adhere to textual histogram conventions. > The following link describes the conventions for rendering histograms on > Prometheus textual format: > [https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries] > This is an example of a rendered histogram on Impala 3.3 on Prometheus > endpoint: > {code:java} > # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time > clients of Impala Backend Server spent waiting for service threads > # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram > impala_thrift_server_backend_svc_thread_wait_time{le="0.2"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.5"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.7"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.9"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.95"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.999"} 0 > impala_thrift_server_backend_svc_thread_wait_time_max 0 > impala_thrift_server_backend_svc_thread_wait_time_min 0 > impala_thrift_server_backend_svc_thread_wait_time_count 49 > 
{code} > The linked histogram conventions say that > {quote}Each bucket count of a histogram named x is given as a separate sample > line with the name x_bucket and a label \{le="y"} (where y is the upper bound > of the bucket). > {quote} > And also > {quote}A histogram must have a bucket with \{le="+Inf"}. Its value must be > identical to the value of x_count. > {quote} > The previous example should be formatted as: > {code:java} > # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time > clients of Impala Backend Server spent waiting for service threads > # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.2"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.5"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.7"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.9"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.95"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.999"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="+Inf"} 49 > impala_thrift_server_backend_svc_thread_wait_time_count 49 > {code} > I've found that with this format, the official python client is able to > properly read the histograms. > Note also that metrics suffixed with `_min` and `_max` are also out of the > convention and they also break histogram parsing and maybe they need to be > reported as separated metrics (maybe as gauges?) > If you are fine with doing this changes, I already have a patch to improve > the histogram formatting and I can submit it to review. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8946) Prometheus histograms do not follow conventions
[ https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931722#comment-16931722 ] Tim Armstrong commented on IMPALA-8946: --- [~guillemnieto] hey guillem, I understand the concern. I'm going to check with Cloudera's gerrit admins to see if we can do anything about it. You should feel free to attach the patch to this JIRA (i.e. run "git format-patch -1"). It's not ideal but should work fine. We can iterate on the review here (I might upload it to gerrit myself just for convenience of reviewing and commenting). > Prometheus histograms do not follow conventions > --- > > Key: IMPALA-8946 > URL: https://issues.apache.org/jira/browse/IMPALA-8946 > Project: IMPALA > Issue Type: Bug >Reporter: Guillem >Assignee: Guillem >Priority: Minor > > We've been using Prometheus metrics and we've found that some standard > Prometheus parser can not properly interpret histograms from Impala. > For example, Python official client > ([https://github.com/prometheus/client_python)] can not properly read them. > I've been digging a little bit why it can't read them and I've found that > Impala does not adhere to textual histogram conventions. 
> The following link describes the conventions for rendering histograms on > Prometheus textual format: > [https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries] > This is an example of a rendered histogram on Impala 3.3 on Prometheus > endpoint: > {code:java} > # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time > clients of Impala Backend Server spent waiting for service threads > # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram > impala_thrift_server_backend_svc_thread_wait_time{le="0.2"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.5"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.7"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.9"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.95"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.999"} 0 > impala_thrift_server_backend_svc_thread_wait_time_max 0 > impala_thrift_server_backend_svc_thread_wait_time_min 0 > impala_thrift_server_backend_svc_thread_wait_time_count 49 > {code} > The linked histogram conventions say that > {quote}Each bucket count of a histogram named x is given as a separate sample > line with the name x_bucket and a label \{le="y"} (where y is the upper bound > of the bucket). > {quote} > And also > {quote}A histogram must have a bucket with \{le="+Inf"}. Its value must be > identical to the value of x_count. 
> {quote} > The previous example should be formatted as: > {code:java} > # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time > clients of Impala Backend Server spent waiting for service threads > # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.2"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.5"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.7"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.9"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.95"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.999"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="+Inf"} 49 > impala_thrift_server_backend_svc_thread_wait_time_count 49 > {code} > I've found that with this format, the official python client is able to > properly read the histograms. > Note also that metrics suffixed with `_min` and `_max` are also out of the > convention and they also break histogram parsing and maybe they need to be > reported as separated metrics (maybe as gauges?) > If you are fine with doing this changes, I already have a patch to improve > the histogram formatting and I can submit it to review. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-7322) Add storage wait time to profile for operations with metadata load
[ https://issues.apache.org/jira/browse/IMPALA-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931781#comment-16931781 ] ASF subversion and git services commented on IMPALA-7322: - Commit 7136e8b965bd0df974dccd1419ea65d42c494c06 in impala's branch refs/heads/master from Yongzhi Chen [ https://gitbox.apache.org/repos/asf?p=impala.git;h=7136e8b ] IMPALA-7322: Add storage wait time to profile Add metrics to record storage wait time for operations with metadata load in catalog for hdfs, kudu and hbase tables. Pass storage wait time from catalog to fe through thrift and log total storage load time in query profile. Storage-load-time is the amount of time spent loading metadata from the underlying storage layer (e.g. S3, HDFS, Kudu, HBase), which does not include the amount of time spending loading data from HMS. Testing: * Ran queries that can trigger all of, none of or some of the related tables loading. * Check query profile for each query. * Check catalog metrics for each table. * Add unit tests to test_observability.py * Ran all core tests. Sample output: Profile for Catalog V1: (storage-load-time is the added property and it is part of Metadata load in Query Compilation): After ran a hbase query (Metadata load finished is divided into several lines because of limitation of commit message): Query Compilation: 4s401ms - Metadata load started: 661.084us (661.084us) - Metadata load finished. 
loaded-tables=1/1 load-requests=1 catalog-updates=3 storage-load-time=233ms: 3s819ms (3s819ms) - Analysis finished: 3s820ms (763.979us) - Value transfer graph computed: 3s820ms (63.193us) Profile for Catalog V2: (StorageLoad.Time is the added property and it is in CatalogFetch): Frontend: - CatalogFetch.ColumnStats.Misses: 1 - CatalogFetch.ColumnStats.Requests: 1 - CatalogFetch.ColumnStats.Time: 0 - CatalogFetch.Config.Misses: 1 - CatalogFetch.Config.Requests: 1 - CatalogFetch.Config.Time: 3ms - CatalogFetch.DatabaseList.Hits: 1 - CatalogFetch.DatabaseList.Requests: 1 - CatalogFetch.DatabaseList.Time: 0 - CatalogFetch.PartitionLists.Misses: 1 - CatalogFetch.PartitionLists.Requests: 1 - CatalogFetch.PartitionLists.Time: 4ms - CatalogFetch.Partitions.Hits: 2 - CatalogFetch.Partitions.Misses: 1 - CatalogFetch.Partitions.Requests: 3 - CatalogFetch.Partitions.Time: 1ms - CatalogFetch.RPCs.Bytes: 1.01 KB (1036) - CatalogFetch.RPCs.Requests: 4 - CatalogFetch.RPCs.Time: 93ms - CatalogFetch.StorageLoad.Time: 68ms - CatalogFetch.TableNames.Hits: 2 - CatalogFetch.TableNames.Requests: 2 - CatalogFetch.TableNames.Time: 0 - CatalogFetch.Tables.Misses: 1 - CatalogFetch.Tables.Requests: 1 - CatalogFetch.Tables.Time: 91ms Catalog metrics(this sample is from a hdfs table): storage-metadata-load-duration: Count: 1 Mean rate: 0.0085 1 min. rate: 0.032 5 min. rate: 0.1386 15 min. 
rate: 0.177 Min (msec): 111 Max (msec): 111 Mean (msec): 111.1802 Median (msec): 111.1802 75th-% (msec): 111.1802 95th-% (msec): 111.1802 99th-% (msec): 111.1802 Change-Id: I7447f8c8e7e50eb71d18643859d2e3de865368d2 Reviewed-on: http://gerrit.cloudera.org:8080/13786 Tested-by: Impala Public Jenkins Reviewed-by: Sahil Takiar > Add storage wait time to profile for operations with metadata load > -- > > Key: IMPALA-7322 > URL: https://issues.apache.org/jira/browse/IMPALA-7322 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 3.0, Impala 2.12.0 >Reporter: Balazs Jeszenszky >Assignee: Yongzhi Chen >Priority: Major > > The profile of a REFRESH or of the query triggering metadata load should > point out how much time was spent waiting for source systems. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8903) Impala Doc: TRUNCATE for Insert-only ACID tables
[ https://issues.apache.org/jira/browse/IMPALA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931702#comment-16931702 ] ASF subversion and git services commented on IMPALA-8903: - Commit 05e1a4f2185c892226497b4592fe0eac3e2747d2 in impala's branch refs/heads/master from Alex Rodoni [ https://gitbox.apache.org/repos/asf?p=impala.git;h=05e1a4f ] IMPALA-8903: [DOCS] TRUNCATE is supported for Insert-only transactional tables Change-Id: Ib457775fab03d7f30430fd7dea6404dfcf0783a8 Reviewed-on: http://gerrit.cloudera.org:8080/14235 Tested-by: Impala Public Jenkins Reviewed-by: Zoltan Borok-Nagy > Impala Doc: TRUNCATE for Insert-only ACID tables > > > Key: IMPALA-8903 > URL: https://issues.apache.org/jira/browse/IMPALA-8903 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_34 > > https://gerrit.cloudera.org/#/c/14235/ -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8930) Impala Doc: Document object ownership with Ranger authorization provider
[ https://issues.apache.org/jira/browse/IMPALA-8930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931699#comment-16931699 ] ASF subversion and git services commented on IMPALA-8930: - Commit e74c294b133dc8fcf523b50ffa9cbb7986ca8790 in impala's branch refs/heads/master from Alex Rodoni [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e74c294 ] IMPALA-8930: [DOCS] Object ownership support when integrated with Ranger Change-Id: Ie4fdaf05953373c8d1870b7eface257830c7c6e5 Reviewed-on: http://gerrit.cloudera.org:8080/14229 Reviewed-by: Bharath Vissapragada Tested-by: Impala Public Jenkins > Impala Doc: Document object ownership with Ranger authorization provider > > > Key: IMPALA-8930 > URL: https://issues.apache.org/jira/browse/IMPALA-8930 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_34 > Fix For: Impala 3.4.0 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8945) Impala Doc: Incorrect Claim of Equivalence in Impala Docs
[ https://issues.apache.org/jira/browse/IMPALA-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931700#comment-16931700 ] ASF subversion and git services commented on IMPALA-8945: - Commit e38d57fe5d0266d0423a04b7f0b7350a3fd300f2 in impala's branch refs/heads/master from Alex Rodoni [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e38d57f ] IMPALA-8945: [DOCS] Fixed an incorrect example of DISTINT FROM and its equivalent Change-Id: I9bee4c0935ee21d70a0964507c477a2fccb1c7cc Reviewed-on: http://gerrit.cloudera.org:8080/14239 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong > Impala Doc: Incorrect Claim of Equivalence in Impala Docs > - > > Key: IMPALA-8945 > URL: https://issues.apache.org/jira/browse/IMPALA-8945 > Project: IMPALA > Issue Type: Bug > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Fix For: Impala 3.4.0 > > > Reported by [~icook] > The Impala docs entry for the IS DISTINCT FROM operator states: > The <=> operator, used like an equality operator in a join query, is more > efficient than the equivalent clause: A = B OR (A IS NULL AND B IS NULL). The > <=> operator can use a hash join, while the OR expression cannot. > But this expression is not equivalent to A <=> B. See the attached screenshot > demonstrating their non-equivalence. An expression that is equivalent to A > <=> B is this: > (A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B)) > This expression should replace the existing incorrect expression. > Another expression that is equivalent to A <=> B is: > if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B) > This one is a bit easier to follow. If you use this one in the docs, just > replace the following line with: > The <=> operator can use a hash join, while the if expression cannot. 
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
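The equivalence claim above can be spot-checked with a small sketch that models SQL NULL as Python None. This is a simplification: plain booleans stand in for SQL's three-valued logic, which is also why the NULL-propagation flaw in the original `A = B OR (A IS NULL AND B IS NULL)` form does not show up in this model; the sketch only checks that the two corrected expressions agree with NULL-safe equality:

```python
# Illustrative check of the equivalence discussed above, modeling SQL NULL as
# Python None. All three functions should agree for every combination of
# NULL and non-NULL inputs.
from itertools import product

def null_safe_eq(a, b):
    """A <=> B: true when both are NULL, or both are non-NULL and equal."""
    return (a is None and b is None) or \
           (a is not None and b is not None and a == b)

def docs_equivalent(a, b):
    """The corrected expression from the report:
    (A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B))
    """
    return (a is None and b is None) or \
           ((a is not None and b is not None) and a == b)

def if_equivalent(a, b):
    """The alternative: if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B)"""
    if a is None or b is None:
        return a is None and b is None
    return a == b

values = [None, 0, 1]
assert all(null_safe_eq(a, b) == docs_equivalent(a, b) == if_equivalent(a, b)
           for a, b in product(values, repeat=2))
```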
[jira] [Commented] (IMPALA-8947) SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric
[ https://issues.apache.org/jira/browse/IMPALA-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931701#comment-16931701 ] ASF subversion and git services commented on IMPALA-8947: - Commit 451d31f2b45d0bfdfab8d8110cc63b06a061849a in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=451d31f ] IMPALA-8947: scratch alloc error uses wrong metric Use the global metric instead of the per-FileGroup counter. Testing: Updated unit test to validate the error message in a case where there are two FileGroups. Change-Id: I2732dcd49c277d5d278fad68efa6ef381bc0eb81 Reviewed-on: http://gerrit.cloudera.org:8080/14236 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric > - > > Key: IMPALA-8947 > URL: https://issues.apache.org/jira/browse/IMPALA-8947 > Project: IMPALA > Issue Type: Bug >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Critical > Labels: supportability > Fix For: Impala 3.4.0 > > > {noformat} > ERROR: Could not create files in any configured scratch directories > (--scratch_dirs=/path/to/scratch) on backend ':22000'. 69.80 GB of > scratch is currently in use by this Impala Daemon (69.80 GB by this query). > See logs for previous errors that may have prevented creating or writing > scratch files. The following directories were at capacity: /path/to/scratch > {noformat} > This issue is that the total for the impala daemon uses the wrong counter. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8946) Prometheus histograms do not follow conventions
[ https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931574#comment-16931574 ] Guillem commented on IMPALA-8946: - Hi [~tarmstrong]! I was about to authorize Gerrit to access GitHub, but I found that I have to grant read *and write* access to all of my public repositories on the following resources: {quote} This application will be able to read and write all public repository data. This includes the following: Code Issues Pull requests Wikis Settings Webhooks and services Deploy keys {quote} It also requires me to grant access to "organization, team membership, and private project boards" on some organizations. I won't grant Gerrit all of those privileges. Is it possible to reduce the scope of this authorization? Is there any alternative way to submit the patch? Thanks! > Prometheus histograms do not follow conventions > --- > > Key: IMPALA-8946 > URL: https://issues.apache.org/jira/browse/IMPALA-8946 > Project: IMPALA > Issue Type: Bug >Reporter: Guillem >Assignee: Guillem >Priority: Minor > > We've been using Prometheus metrics and we've found that some standard > Prometheus parser can not properly interpret histograms from Impala. > For example, Python official client > ([https://github.com/prometheus/client_python)] can not properly read them. > I've been digging a little bit why it can't read them and I've found that > Impala does not adhere to textual histogram conventions. 
> The following link describes the conventions for rendering histograms on > Prometheus textual format: > [https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries] > This is an example of a rendered histogram on Impala 3.3 on Prometheus > endpoint: > {code:java} > # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time > clients of Impala Backend Server spent waiting for service threads > # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram > impala_thrift_server_backend_svc_thread_wait_time{le="0.2"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.5"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.7"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.9"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.95"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.999"} 0 > impala_thrift_server_backend_svc_thread_wait_time_max 0 > impala_thrift_server_backend_svc_thread_wait_time_min 0 > impala_thrift_server_backend_svc_thread_wait_time_count 49 > {code} > The linked histogram conventions say that > {quote}Each bucket count of a histogram named x is given as a separate sample > line with the name x_bucket and a label \{le="y"} (where y is the upper bound > of the bucket). > {quote} > And also > {quote}A histogram must have a bucket with \{le="+Inf"}. Its value must be > identical to the value of x_count. 
> {quote} > The previous example should be formatted as: > {code:java} > # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time > clients of Impala Backend Server spent waiting for service threads > # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.2"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.5"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.7"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.9"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.95"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.999"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="+Inf"} 49 > impala_thrift_server_backend_svc_thread_wait_time_count 49 > {code} > I've found that with this format, the official python client is able to > properly read the histograms. > Note also that metrics suffixed with `_min` and `_max` are also out of the > convention and they also break histogram parsing and maybe they need to be > reported as separated metrics (maybe as gauges?) > If you are fine with doing this changes, I already have a patch to improve > the histogram formatting and I can submit it to review. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Comment Edited] (IMPALA-8944) Update and re-enable S3PlannerTest
[ https://issues.apache.org/jira/browse/IMPALA-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931788#comment-16931788 ] Sahil Takiar edited comment on IMPALA-8944 at 9/17/19 8:20 PM: --- I've got 14 of the original 19 unit tests in {{S3PlannerTest}} working. Instead of relying on a test filter in {{run-all-tests.sh}}, I decided to use JUnit Categories and Maven profiles to select the tests to run (achieves a similar effect to TestNG Groups). I think it is a more robust and straightforward way of running tests. Now any fe/ tests that should be run for S3 can simply be tagged with the Java annotation {{@Category(S3Tests.class)}}. The failing {{S3PlannerTest}} cases are: * org.apache.impala.planner.S3PlannerTest.testTpcds * org.apache.impala.planner.S3PlannerTest.testTpch * org.apache.impala.planner.S3PlannerTest.testJoinOrder * org.apache.impala.planner.S3PlannerTest.testSubqueryRewrite All are failing for non-trivial reasons - e.g. actual differences in the explain plans when running on S3 vs. HDFS data (e.g. differences in memory estimates, join orders, etc.). I've opened IMPALA-8949 to investigate this. {{testS3ScanRanges()}} is failing as well, but for other reasons described above. was (Author: stakiar): I've got 14 of the original 17 unit tests in {{S3PlannerTest}} working. Instead of relying on a test filter in {{run-all-tests.sh}}, I decided to use JUnit Categories and Maven profiles to select the tests to run (achieves a similar effect to TestNG Groups). I think it is a more robust and straightforward way of running tests. Now any fe/ tests that should be run for S3 can simply be tagged with the Java annotation {{@Category(S3Tests.class)}}. 
The failing {{S3PlannerTest}} cases are: * org.apache.impala.planner.S3PlannerTest.testTpcds * org.apache.impala.planner.S3PlannerTest.testTpch * org.apache.impala.planner.S3PlannerTest.testJoinOrder * org.apache.impala.planner.S3PlannerTest.testSubqueryRewrite All are failing for non-trivial reasons - e.g. actual differences in the explain plans when running on S3 vs. HDFS data (e.g. differences in memory estimates, join orders, etc.). I've opened IMPALA-8949 to investigate this. > Update and re-enable S3PlannerTest > -- > > Key: IMPALA-8944 > URL: https://issues.apache.org/jira/browse/IMPALA-8944 > Project: IMPALA > Issue Type: Test >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > It looks like we don't run {{S3PlannerTest}} in our regular Jenkins jobs. > When run against an HDFS mini-cluster, they are skipped because the > {{TARGET_FILESYSTEM}} is not S3. On our S3 jobs, they don't run either > because we skip all fe/ tests (most of them don't work against S3 / assume > they are running on HDFS). > A few things need to be fixed to get this working: > * The test cases in {{S3PlannerTest}} need to be fixed > * The Jenkins job that runs the S3 tests needs the ability to run specific > fe/ tests (e.g. run just the {{S3PlannerTest}} and skip the rest) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-8634) Catalog client should be resilient to temporary Catalog outage
[ https://issues.apache.org/jira/browse/IMPALA-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8634 started by Sahil Takiar. > Catalog client should be resilient to temporary Catalog outage > -- > > Key: IMPALA-8634 > URL: https://issues.apache.org/jira/browse/IMPALA-8634 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 3.2.0 >Reporter: Michael Ho >Assignee: Sahil Takiar >Priority: Critical > > Currently, when the catalog server is down, catalog clients will fail all > RPCs sent to it. In essence, DDL queries will fail and the Impala service > becomes a lot less functional. Catalog clients should consider retrying > failed RPCs with some exponential backoff in between while the catalog server > is being restarted after a crash. We probably need to add [a test > |https://github.com/apache/impala/blob/master/tests/custom_cluster/test_restart_services.py] > to exercise the paths of catalog restart and verify coordinators are > resilient to it. > cc'ing [~stakiar], [~joemcdonnell], [~twm378]
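The retry-with-exponential-backoff behavior proposed above can be sketched as follows. This is only an illustration, not Impala's actual catalog client code: the function name, the delay values, and the use of `ConnectionError` as a stand-in for a failed Thrift RPC are all assumptions.

```python
import time

def call_with_retries(rpc, max_attempts=4, base_delay_s=0.1, sleep=time.sleep):
    """Retry a catalog RPC with exponential backoff between attempts.

    `rpc` is any zero-argument callable; `ConnectionError` stands in for
    a failed RPC to a catalogd that is down or restarting.
    """
    for attempt in range(max_attempts):
        try:
            return rpc()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # catalogd still unreachable: surface the failure
            sleep(base_delay_s * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
```

With these defaults, a catalogd restart that completes within a few hundred milliseconds would be absorbed transparently instead of failing the DDL query.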
[jira] [Work started] (IMPALA-8942) Set file format specific values for split sizes on non-block stores
[ https://issues.apache.org/jira/browse/IMPALA-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8942 started by Sahil Takiar. > Set file format specific values for split sizes on non-block stores > --- > > Key: IMPALA-8942 > URL: https://issues.apache.org/jira/browse/IMPALA-8942 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > Parquet scans on non-block based storage systems (e.g. S3, ADLS, etc.) can > suffer from uneven scan range assignment due to the behavior described in > IMPALA-3453. The frontend should set different split sizes depending on the > file type and file system.
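The idea above can be sketched as a small lookup from (file format, filesystem) to a split size. Everything here is illustrative: the names and byte values are assumptions, not values from the Impala frontend.

```python
MB = 1024 * 1024
DEFAULT_SPLIT_SIZE = 128 * MB

# Hypothetical policy table. On object stores there are no real blocks,
# so a smaller Parquet split spreads scan ranges across more executors
# (cf. the uneven-assignment behavior described in IMPALA-3453).
SPLIT_SIZE_OVERRIDES = {
    ("parquet", "s3"): 64 * MB,
    ("parquet", "adls"): 64 * MB,
}

def split_size(file_format: str, filesystem: str) -> int:
    """Return the split size for a file format on a given filesystem."""
    return SPLIT_SIZE_OVERRIDES.get((file_format, filesystem), DEFAULT_SPLIT_SIZE)
```

On a block-based store like HDFS the natural block size would be used instead; the override only kicks in where the combination is known to produce skewed assignments.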
[jira] [Assigned] (IMPALA-2515) Impala is unable to read a Parquet decimal column if size is larger than needed
[ https://issues.apache.org/jira/browse/IMPALA-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen reassigned IMPALA-2515: Assignee: Yongzhi Chen > Impala is unable to read a Parquet decimal column if size is larger than > needed > --- > > Key: IMPALA-2515 > URL: https://issues.apache.org/jira/browse/IMPALA-2515 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Affects Versions: Impala 2.3.0 >Reporter: Taras Bobrovytsky >Assignee: Yongzhi Chen >Priority: Minor > Labels: ramp-up > > Impala cannot read this: > {code} > {"name": "tmp_1", > "type": "fixed", > "size": 8, > "logicalType": "decimal", > "precision": 10, > "scale": 5} > {code} > However, this can be read: > {code} > {"name": "tmp_1", > "type": "fixed", > "size": 5, > "logicalType": "decimal", > "precision": 10, > "scale": 5} > {code} > Size must be precisely set to this, or Impala is unable to read the decimal > column: > {code} > size = int(math.ceil((math.log(2, 10) + precision) / math.log(256, 10))) > {code} > There is nothing in the Parquet spec that says that Decimal columns must be > sized precisely. Arguably it's a bug in the writer if it's doing it, because > it's just wasting space. > https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
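As a sanity check, the sizing formula quoted in the issue can be wrapped in a helper (the function name `min_fixed_size` is ours, not from Impala):

```python
import math

def min_fixed_size(precision: int) -> int:
    # Minimum byte width of a `fixed` field that can hold a decimal of
    # the given precision -- the exact formula quoted in the issue.
    return int(math.ceil((math.log(2, 10) + precision) / math.log(256, 10)))
```

For the schemas above, `min_fixed_size(10)` is 5, which is why the `size: 5` schema reads fine while the over-allocated `size: 8` one is rejected, even though the wider field can represent every value the narrower one can.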
[jira] [Updated] (IMPALA-8861) Impala Doc: Document Jaro-winkler edit distance and similarity built-in functions
[ https://issues.apache.org/jira/browse/IMPALA-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-8861: Description: https://gerrit.cloudera.org/#/c/14249/ > Impala Doc: Document Jaro-winkler edit distance and similarity built-in > functions > - > > Key: IMPALA-8861 > URL: https://issues.apache.org/jira/browse/IMPALA-8861 > Project: IMPALA > Issue Type: Sub-task >Affects Versions: Impala 3.3.0 >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_34 > > https://gerrit.cloudera.org/#/c/14249/
[jira] [Work started] (IMPALA-8861) Impala Doc: Document Jaro-winkler edit distance and similarity built-in functions
[ https://issues.apache.org/jira/browse/IMPALA-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8861 started by Alex Rodoni. --- > Impala Doc: Document Jaro-winkler edit distance and similarity built-in > functions > - > > Key: IMPALA-8861 > URL: https://issues.apache.org/jira/browse/IMPALA-8861 > Project: IMPALA > Issue Type: Sub-task >Affects Versions: Impala 3.3.0 >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_34
[jira] [Assigned] (IMPALA-8065) OSInfo produces somewhat misleading output when running in container
[ https://issues.apache.org/jira/browse/IMPALA-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Sherman reassigned IMPALA-8065: -- Assignee: Andrew Sherman > OSInfo produces somewhat misleading output when running in container > > > Key: IMPALA-8065 > URL: https://issues.apache.org/jira/browse/IMPALA-8065 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Tim Armstrong >Assignee: Andrew Sherman >Priority: Minor > > It uses /proc/version, which returns the host version. It would be good to > also get the version from lsb-release from the Ubuntu container we're running > in and disambiguate on the debug page.
[jira] [Assigned] (IMPALA-8065) OSInfo produces somewhat misleading output when running in container
[ https://issues.apache.org/jira/browse/IMPALA-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Sherman reassigned IMPALA-8065: -- Assignee: Xiaomeng Zhang (was: Andrew Sherman) > OSInfo produces somewhat misleading output when running in container > > > Key: IMPALA-8065 > URL: https://issues.apache.org/jira/browse/IMPALA-8065 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Tim Armstrong >Assignee: Xiaomeng Zhang >Priority: Minor > > It uses /proc/version, which returns the host version. It would be good to > also get the version from lsb-release from the Ubuntu container we're running > in and disambiguate on the debug page.
[jira] [Assigned] (IMPALA-7504) ParseKerberosPrincipal() should use krb5_parse_name() instead
[ https://issues.apache.org/jira/browse/IMPALA-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Sherman reassigned IMPALA-7504: -- Assignee: Xiaomeng Zhang > ParseKerberosPrincipal() should use krb5_parse_name() instead > - > > Key: IMPALA-7504 > URL: https://issues.apache.org/jira/browse/IMPALA-7504 > Project: IMPALA > Issue Type: Improvement > Components: Security >Affects Versions: Impala 3.0, Impala 2.12.0 >Reporter: Michael Ho >Assignee: Xiaomeng Zhang >Priority: Minor > Labels: ramp-up > > [~tlipcon] pointed out during code review that we should be using > krb5_parse_name() to parse the principal instead of creating our own > bq. I wonder whether we should just be using krb5_parse_name here instead of > implementing our own parsing? According to > [http://web.mit.edu/kerberos/krb5-1.15/doc/appdev/refs/api/krb5_parse_name.html] > there are various escapings, etc, that this function isn't currently > supporting. > We currently do the following to parse the principal: > {noformat} > vector<string> names; > split(names, principal, is_any_of("/")); > if (names.size() != 2) return Status(TErrorCode::BAD_PRINCIPAL_FORMAT, > principal); > *service_name = names[0]; > string remaining_principal = names[1]; > split(names, remaining_principal, is_any_of("@")); > if (names.size() != 2) return Status(TErrorCode::BAD_PRINCIPAL_FORMAT, > principal); > {noformat}
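For illustration, here is a rough Python transliteration of the quoted parsing code. Like the C++ it mirrors, it assumes the `service/hostname@REALM` shape and handles none of the krb5 escaping rules, which is exactly the limitation the issue points out:

```python
def parse_principal(principal: str):
    # Split into exactly service/rest, then rest into hostname@realm,
    # mirroring the two split() + size-check steps in the C++ snippet.
    parts = principal.split("/")
    if len(parts) != 2:
        raise ValueError("bad principal format: " + principal)
    service, rest = parts
    parts = rest.split("@")
    if len(parts) != 2:
        raise ValueError("bad principal format: " + principal)
    hostname, realm = parts
    return service, hostname, realm
```

A principal containing an escaped separator inside a component (legal per krb5) would be mis-split by this approach, whereas krb5_parse_name() would handle it.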
[jira] [Created] (IMPALA-8950) Add -d and -f option to copyFromLocal and re-enable disabled S3 tests
Sahil Takiar created IMPALA-8950: Summary: Add -d and -f option to copyFromLocal and re-enable disabled S3 tests Key: IMPALA-8950 URL: https://issues.apache.org/jira/browse/IMPALA-8950 Project: IMPALA Issue Type: Test Reporter: Sahil Takiar Assignee: Sahil Takiar The {{-d}} option for {{hdfs dfs -copyFromLocal}} "Skip[s] creation of temporary file with the suffix ._COPYING_". The {{-f}} option "Overwrites the destination if it already exists". By using the {{-d}} option, copies to S3 avoid the additional overhead of copying data to a tmp file and then renaming the file. The {{-f}} option overwrites the file if it exists, which should be safe since tests should be writing to unique directories anyway. With HADOOP-16490, {{create(overwrite=true)}} avoids issuing a HEAD request on the path, which prevents any cached 404s on the S3 key. After these changes, the tests disabled by IMPALA-8189 can be re-enabled.
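A test harness could assemble the copy command along these lines. The helper name is hypothetical, but `-d` and `-f` are the documented flags of `hdfs dfs -copyFromLocal` described above:

```python
def copy_from_local_cmd(src: str, dst: str) -> list:
    # -d: skip the ._COPYING_ temp file, so S3 avoids a copy + rename
    # -f: overwrite the destination if it already exists
    return ["hdfs", "dfs", "-copyFromLocal", "-d", "-f", src, dst]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` wherever test data is loaded.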
[jira] [Commented] (IMPALA-8946) Prometheus histograms do not follow conventions
[ https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931482#comment-16931482 ] Guillem commented on IMPALA-8946: - Yes, I already read the contributing guidelines (but I may need some help over the next few days). I've been working on this, but I have some doubts about what a histogram means in Impala. After reading about the difference in meaning between summaries and histograms in Prometheus (https://prometheus.io/docs/practices/histograms/), I got the impression that `summary` is the type that Impala should render (instead of `histogram`). So, given the example in the issue, I think the rendered output should be: {code} # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time clients of Impala Backend Server spent waiting for service threads # TYPE impala_thrift_server_backend_svc_thread_wait_time summary impala_thrift_server_backend_svc_thread_wait_time{quantile="0.2"} 0 impala_thrift_server_backend_svc_thread_wait_time{quantile="0.5"} 0 impala_thrift_server_backend_svc_thread_wait_time{quantile="0.7"} 0 impala_thrift_server_backend_svc_thread_wait_time{quantile="0.9"} 0 impala_thrift_server_backend_svc_thread_wait_time{quantile="0.95"} 0 impala_thrift_server_backend_svc_thread_wait_time{quantile="0.999"} 0 impala_thrift_server_backend_svc_thread_wait_time_count 49 {code} > Prometheus histograms do not follow conventions > --- > > Key: IMPALA-8946 > URL: https://issues.apache.org/jira/browse/IMPALA-8946 > Project: IMPALA > Issue Type: Bug >Reporter: Guillem >Assignee: Guillem >Priority: Minor > > We've been using Prometheus metrics and we've found that some standard > Prometheus parsers cannot properly interpret histograms from Impala. > For example, the official Python client > ([https://github.com/prometheus/client_python]) cannot properly read them. 
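The summary exposition proposed in this comment could be produced by a renderer like the sketch below (illustrative only, not Impala code). Note that the Prometheus text format also expects an `x_sum` sample for summaries, which the example above omits:

```python
def render_summary(name, help_text, quantiles, total, count):
    """Render a summary in the Prometheus text exposition format.

    `quantiles` is a list of (quantile, value) pairs; quantiles are
    emitted as {quantile="q"} labels, followed by _sum and _count.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} summary"]
    for q, value in quantiles:
        lines.append(f'{name}{{quantile="{q}"}} {value}')
    lines.append(f"{name}_sum {total}")
    lines.append(f"{name}_count {count}")
    return "\n".join(lines)
```

Whether Impala's stats are closer to pre-computed quantiles (a summary) or fixed buckets (a histogram) is the open question this comment raises.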
> I've been digging a little bit why it can't read them and I've found that > Impala does not adhere to textual histogram conventions. > The following link describes the conventions for rendering histograms on > Prometheus textual format: > [https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries] > This is an example of a rendered histogram on Impala 3.3 on Prometheus > endpoint: > {code:java} > # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time > clients of Impala Backend Server spent waiting for service threads > # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram > impala_thrift_server_backend_svc_thread_wait_time{le="0.2"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.5"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.7"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.9"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.95"} 0 > impala_thrift_server_backend_svc_thread_wait_time{le="0.999"} 0 > impala_thrift_server_backend_svc_thread_wait_time_max 0 > impala_thrift_server_backend_svc_thread_wait_time_min 0 > impala_thrift_server_backend_svc_thread_wait_time_count 49 > {code} > The linked histogram conventions say that > {quote}Each bucket count of a histogram named x is given as a separate sample > line with the name x_bucket and a label \{le="y"} (where y is the upper bound > of the bucket). > {quote} > And also > {quote}A histogram must have a bucket with \{le="+Inf"}. Its value must be > identical to the value of x_count. 
> {quote} > The previous example should be formatted as: > {code:java} > # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time > clients of Impala Backend Server spent waiting for service threads > # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.2"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.5"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.7"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.9"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.95"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.999"} 0 > impala_thrift_server_backend_svc_thread_wait_time_bucket{le="+Inf"} 49 > impala_thrift_server_backend_svc_thread_wait_time_count 49 > {code} > I've found that with this format, the official Python client is able to > properly read the histograms. > Note also that the metrics suffixed with `_min` and `_max` fall outside the > convention and also break histogram parsing; they may need to be reported as > separate metrics (perhaps as gauges?). > If you are fine with these changes, I already have a patch that improves the > histogram formatting and I can submit it for review.
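The bucket-renaming fix described in this issue can be sketched as a small renderer (an illustration, not the actual patch). `buckets` is a list of (upper bound, cumulative count) pairs, and the mandatory `+Inf` bucket is emitted with the same value as `x_count`, exactly as the quoted convention requires:

```python
def render_histogram(name, help_text, buckets, total, count):
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} histogram"]
    for le, cumulative in buckets:
        # Each bucket is a sample named x_bucket with an le="y" label.
        lines.append(f'{name}_bucket{{le="{le}"}} {cumulative}')
    # The +Inf bucket is required and must equal the observation count.
    lines.append(f'{name}_bucket{{le="+Inf"}} {count}')
    lines.append(f"{name}_sum {total}")
    lines.append(f"{name}_count {count}")
    return "\n".join(lines)
```

Output in this shape parses cleanly with standard clients, matching what the comment reports for the corrected example.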