[jira] [Created] (IMPALA-8949) PlannerTest differences when running on S3 vs HDFS

2019-09-17 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8949:


 Summary: PlannerTest differences when running on S3 vs HDFS
 Key: IMPALA-8949
 URL: https://issues.apache.org/jira/browse/IMPALA-8949
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Sahil Takiar


While re-enabling the {{S3PlannerTest}} in IMPALA-8944, several tests are 
consistently failing due to actual diffs in the explain plans:
* org.apache.impala.planner.S3PlannerTest.testTpcds
* org.apache.impala.planner.S3PlannerTest.testTpch
* org.apache.impala.planner.S3PlannerTest.testJoinOrder
* org.apache.impala.planner.S3PlannerTest.testSubqueryRewrite

All are failing for non-trivial reasons - e.g. differences in memory estimates, 
join orders, etc.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (IMPALA-8903) Impala Doc: TRUNCATE for Insert-only ACID tables

2019-09-17 Thread Alex Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8903.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Impala Doc: TRUNCATE for Insert-only ACID tables
> 
>
> Key: IMPALA-8903
> URL: https://issues.apache.org/jira/browse/IMPALA-8903
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_34
> Fix For: Impala 3.4.0
>
>
> https://gerrit.cloudera.org/#/c/14235/





[jira] [Created] (IMPALA-8950) Add -d and -f option to copyFromLocal and re-enable disabled S3 tests

2019-09-17 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8950:


 Summary: Add -d and -f option to copyFromLocal and re-enable 
disabled S3 tests
 Key: IMPALA-8950
 URL: https://issues.apache.org/jira/browse/IMPALA-8950
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar
Assignee: Sahil Takiar


The {{-d}} option for {{hdfs dfs -copyFromLocal}} "Skip[s] creation of 
temporary file with the suffix ._COPYING_". The {{-f}} option "Overwrites the 
destination if it already exists".

By using the {{-d}} option, copies to S3 avoid the additional overhead of 
copying data to a tmp file and then renaming the file. The {{-f}} option 
overwrites the file if it exists, which should be safe since tests should be 
writing to unique directories anyway. With HADOOP-16490, 
{{create(overwrite=true)}} avoids issuing a HEAD request on the path, which 
prevents any cached 404s on the S3 key.

After these changes, the tests disabled by IMPALA-8189 can be re-enabled.
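For reference, a sketch of the resulting copy command with both flags (the local path and bucket name here are hypothetical):

```shell
# -d skips the intermediate ._COPYING_ temp file (avoids copy-then-rename on S3);
# -f overwrites the destination if it already exists.
hdfs dfs -copyFromLocal -d -f /tmp/lineitem.parq s3a://test-bucket/test-warehouse/lineitem/
```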





[jira] [Commented] (IMPALA-8946) Prometheus histograms do not follow conventions

2019-09-17 Thread Guillem (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931736#comment-16931736
 ] 

Guillem commented on IMPALA-8946:
-

Thanks for understanding, and sorry for the overhead this may cause.

 [^0001-IMPALA-8946-Fix-histogram-rendering-to-Prometheus.patch] 

I have attached the patch to this comment.

Thanks again!

> Prometheus histograms do not follow conventions
> ---
>
> Key: IMPALA-8946
> URL: https://issues.apache.org/jira/browse/IMPALA-8946
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Guillem
>Assignee: Guillem
>Priority: Minor
> Attachments: 
> 0001-IMPALA-8946-Fix-histogram-rendering-to-Prometheus.patch
>
>
> We've been using Prometheus metrics and we've found that some standard 
> Prometheus parsers cannot properly interpret histograms from Impala.
> For example, the official Python client 
> ([https://github.com/prometheus/client_python]) cannot properly read them. 
> I've dug into why it can't read them, and found that Impala does not adhere 
> to the textual histogram conventions.
> The following link describes the conventions for rendering histograms in 
> the Prometheus text format: 
> [https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries]
> This is an example of a histogram rendered by the Impala 3.3 Prometheus 
> endpoint:
> {code:java}
> # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time 
> clients of Impala Backend Server spent waiting for service threads
> # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram
> impala_thrift_server_backend_svc_thread_wait_time{le="0.2"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.5"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.7"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.9"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.95"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.999"} 0
> impala_thrift_server_backend_svc_thread_wait_time_max 0
> impala_thrift_server_backend_svc_thread_wait_time_min 0
> impala_thrift_server_backend_svc_thread_wait_time_count 49
> {code}
> The linked histogram conventions say that
> {quote}Each bucket count of a histogram named x is given as a separate sample 
> line with the name x_bucket and a label \{le="y"} (where y is the upper bound 
> of the bucket).
> {quote}
> And also
> {quote}A histogram must have a bucket with \{le="+Inf"}. Its value must be 
> identical to the value of x_count.
> {quote}
> The previous example should be formatted as:
> {code:java}
> # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time 
> clients of Impala Backend Server spent waiting for service threads
> # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.2"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.5"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.7"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.9"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.95"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.999"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="+Inf"} 49
> impala_thrift_server_backend_svc_thread_wait_time_count 49
> {code}
> I've found that with this format, the official Python client is able to 
> properly read the histograms.
> Note also that the metrics suffixed with `_min` and `_max` also fall outside 
> the convention and break histogram parsing; they may need to be reported as 
> separate metrics (perhaps as gauges?).
> If you are fine with these changes, I already have a patch that improves the 
> histogram formatting, and I can submit it for review.
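To make the convention concrete, here is a minimal sketch in Python (Impala's real metrics renderer is not Python; the function name, bucket values, and counts are illustrative) of emitting a histogram with the `_bucket` suffix and a `+Inf` bucket whose value equals `_count`:

```python
# Render a histogram in the Prometheus text exposition format.
# Sketch only: names and values are illustrative, not Impala's implementation.
def render_histogram(name, help_text, bucket_counts, total_count):
    """bucket_counts: list of (upper_bound, cumulative_count) pairs."""
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} histogram",
    ]
    # Each bucket is a sample named <name>_bucket with a le="y" label.
    for le, count in bucket_counts:
        lines.append(f'{name}_bucket{{le="{le}"}} {count}')
    # A histogram must have a +Inf bucket whose value equals <name>_count.
    lines.append(f'{name}_bucket{{le="+Inf"}} {total_count}')
    lines.append(f"{name}_count {total_count}")
    return "\n".join(lines)

out = render_histogram(
    "impala_thrift_server_backend_svc_thread_wait_time",
    "Amount of time clients spent waiting for service threads",
    [("0.2", 0), ("0.5", 0), ("0.7", 0), ("0.9", 0), ("0.95", 0), ("0.999", 0)],
    49,
)
print(out)
```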




-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8944) Update and re-enable S3PlannerTest

2019-09-17 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931788#comment-16931788
 ] 

Sahil Takiar commented on IMPALA-8944:
--

I've got 14 of the original 17 unit tests in {{S3PlannerTest}} working. Instead 
of relying on a test filter in {{run-all-tests.sh}}, I decided to use JUnit 
Categories and Maven profiles to select the tests to run (which achieves a 
similar effect to TestNG Groups). I think it is a more robust and 
straightforward way of running tests. Now any fe/ tests that should be run for 
S3 can simply be tagged with the Java annotation {{@Category(S3Tests.class)}}.

The failing {{S3PlannerTest}} tests are:
* org.apache.impala.planner.S3PlannerTest.testTpcds
* org.apache.impala.planner.S3PlannerTest.testTpch
* org.apache.impala.planner.S3PlannerTest.testJoinOrder
* org.apache.impala.planner.S3PlannerTest.testSubqueryRewrite

All are failing for non-trivial reasons - e.g. actual differences in the 
explain plans when running on S3 vs. HDFS data (e.g. differences in memory 
estimates, join orders, etc.). I've opened IMPALA-8949 to investigate this.

> Update and re-enable S3PlannerTest
> --
>
> Key: IMPALA-8944
> URL: https://issues.apache.org/jira/browse/IMPALA-8944
> Project: IMPALA
>  Issue Type: Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> It looks like we don't run {{S3PlannerTest}} in our regular Jenkins jobs. 
> When run against an HDFS mini-cluster, they are skipped because the 
> {{TARGET_FILESYSTEM}} is not S3. On our S3 jobs, they don't run either, 
> because we skip all fe/ tests (most of them don't work against S3 / assume 
> they are running on HDFS).
> A few things need to be fixed to get this working:
> * The test cases in {{S3PlannerTest}} need to be fixed
> * The Jenkins job that runs the S3 tests needs the ability to run specific 
> fe/ tests (e.g. run just the {{S3PlannerTest}} and skip the rest)









[jira] [Updated] (IMPALA-8946) Prometheus histograms do not follow conventions

2019-09-17 Thread Guillem (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guillem updated IMPALA-8946:

Attachment: 0001-IMPALA-8946-Fix-histogram-rendering-to-Prometheus.patch

> Prometheus histograms do not follow conventions
> ---
>
> Key: IMPALA-8946
> URL: https://issues.apache.org/jira/browse/IMPALA-8946
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Guillem
>Assignee: Guillem
>Priority: Minor
> Attachments: 
> 0001-IMPALA-8946-Fix-histogram-rendering-to-Prometheus.patch
>
>






[jira] [Commented] (IMPALA-8946) Prometheus histograms do not follow conventions

2019-09-17 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931722#comment-16931722
 ] 

Tim Armstrong commented on IMPALA-8946:
---

[~guillemnieto] Hey Guillem, I understand the concern. I'm going to check with 
Cloudera's Gerrit admins to see if we can do anything about it.

You should feel free to attach the patch to this JIRA (i.e. the output of "git 
format-patch -1"). It's not ideal, but it should work fine.

We can iterate on the review here (I might upload it to gerrit myself just for 
convenience of reviewing and commenting).

> Prometheus histograms do not follow conventions
> ---
>
> Key: IMPALA-8946
> URL: https://issues.apache.org/jira/browse/IMPALA-8946
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Guillem
>Assignee: Guillem
>Priority: Minor
>






[jira] [Commented] (IMPALA-7322) Add storage wait time to profile for operations with metadata load

2019-09-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931781#comment-16931781
 ] 

ASF subversion and git services commented on IMPALA-7322:
-

Commit 7136e8b965bd0df974dccd1419ea65d42c494c06 in impala's branch 
refs/heads/master from Yongzhi Chen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7136e8b ]

IMPALA-7322: Add storage wait time to profile

Add metrics to record storage wait time for operations with
metadata load in catalog for hdfs, kudu and hbase tables.
Pass storage wait time from catalog to fe through thrift
and log total storage load time in query profile.
Storage-load-time is the amount of time spent loading metadata
from the underlying storage layer (e.g. S3, HDFS, Kudu, HBase),
which does not include the amount of time spent loading data
from HMS.

Testing:
* Ran queries that can trigger all of, none of or
  some of the related tables loading.
* Check query profile for each query.
* Check catalog metrics for each table.
* Add unit tests to test_observability.py
* Ran all core tests.

Sample output:

Profile for Catalog V1: (storage-load-time is the added property and
it is part of Metadata load in Query Compilation):
After running an HBase query (the "Metadata load finished" entry is broken
into several lines because of commit message limitations):

Query Compilation: 4s401ms
  - Metadata load started: 661.084us (661.084us)
  - Metadata load finished. loaded-tables=1/1
  load-requests=1 catalog-updates=3
  storage-load-time=233ms: 3s819ms (3s819ms)
  - Analysis finished: 3s820ms (763.979us)
  - Value transfer graph computed: 3s820ms (63.193us)

Profile for Catalog V2: (StorageLoad.Time is the added property and it
is in CatalogFetch):

Frontend:
   - CatalogFetch.ColumnStats.Misses: 1
   - CatalogFetch.ColumnStats.Requests: 1
   - CatalogFetch.ColumnStats.Time: 0
   - CatalogFetch.Config.Misses: 1
   - CatalogFetch.Config.Requests: 1
   - CatalogFetch.Config.Time: 3ms
   - CatalogFetch.DatabaseList.Hits: 1
   - CatalogFetch.DatabaseList.Requests: 1
   - CatalogFetch.DatabaseList.Time: 0
   - CatalogFetch.PartitionLists.Misses: 1
   - CatalogFetch.PartitionLists.Requests: 1
   - CatalogFetch.PartitionLists.Time: 4ms
   - CatalogFetch.Partitions.Hits: 2
   - CatalogFetch.Partitions.Misses: 1
   - CatalogFetch.Partitions.Requests: 3
   - CatalogFetch.Partitions.Time: 1ms
   - CatalogFetch.RPCs.Bytes: 1.01 KB (1036)
   - CatalogFetch.RPCs.Requests: 4
   - CatalogFetch.RPCs.Time: 93ms
   - CatalogFetch.StorageLoad.Time: 68ms
   - CatalogFetch.TableNames.Hits: 2
   - CatalogFetch.TableNames.Requests: 2
   - CatalogFetch.TableNames.Time: 0
   - CatalogFetch.Tables.Misses: 1
   - CatalogFetch.Tables.Requests: 1
   - CatalogFetch.Tables.Time: 91ms

Catalog metrics(this sample is from a hdfs table):

storage-metadata-load-duration:
   Count: 1
   Mean rate: 0.0085
   1 min. rate: 0.032
   5 min. rate: 0.1386
   15 min. rate: 0.177
   Min (msec): 111
   Max (msec): 111
   Mean (msec): 111.1802
   Median (msec): 111.1802
   75th-% (msec): 111.1802
   95th-% (msec): 111.1802
   99th-% (msec): 111.1802

Change-Id: I7447f8c8e7e50eb71d18643859d2e3de865368d2
Reviewed-on: http://gerrit.cloudera.org:8080/13786
Tested-by: Impala Public Jenkins 
Reviewed-by: Sahil Takiar 


> Add storage wait time to profile for operations with metadata load
> --
>
> Key: IMPALA-7322
> URL: https://issues.apache.org/jira/browse/IMPALA-7322
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Assignee: Yongzhi Chen
>Priority: Major
>
> The profile of a REFRESH or of the query triggering metadata load should 
> point out how much time was spent waiting for source systems.






[jira] [Commented] (IMPALA-8903) Impala Doc: TRUNCATE for Insert-only ACID tables

2019-09-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931702#comment-16931702
 ] 

ASF subversion and git services commented on IMPALA-8903:
-

Commit 05e1a4f2185c892226497b4592fe0eac3e2747d2 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=05e1a4f ]

IMPALA-8903: [DOCS] TRUNCATE is supported for Insert-only transactional tables

Change-Id: Ib457775fab03d7f30430fd7dea6404dfcf0783a8
Reviewed-on: http://gerrit.cloudera.org:8080/14235
Tested-by: Impala Public Jenkins 
Reviewed-by: Zoltan Borok-Nagy 


> Impala Doc: TRUNCATE for Insert-only ACID tables
> 
>
> Key: IMPALA-8903
> URL: https://issues.apache.org/jira/browse/IMPALA-8903
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_34
>
> https://gerrit.cloudera.org/#/c/14235/






[jira] [Commented] (IMPALA-8930) Impala Doc: Document object ownership with Ranger authorization provider

2019-09-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931699#comment-16931699
 ] 

ASF subversion and git services commented on IMPALA-8930:
-

Commit e74c294b133dc8fcf523b50ffa9cbb7986ca8790 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e74c294 ]

IMPALA-8930: [DOCS] Object ownership support when integrated with Ranger

Change-Id: Ie4fdaf05953373c8d1870b7eface257830c7c6e5
Reviewed-on: http://gerrit.cloudera.org:8080/14229
Reviewed-by: Bharath Vissapragada 
Tested-by: Impala Public Jenkins 


> Impala Doc: Document object ownership with Ranger authorization provider
> 
>
> Key: IMPALA-8930
> URL: https://issues.apache.org/jira/browse/IMPALA-8930
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_34
> Fix For: Impala 3.4.0
>
>







[jira] [Commented] (IMPALA-8945) Impala Doc: Incorrect Claim of Equivalence in Impala Docs

2019-09-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931700#comment-16931700
 ] 

ASF subversion and git services commented on IMPALA-8945:
-

Commit e38d57fe5d0266d0423a04b7f0b7350a3fd300f2 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e38d57f ]

IMPALA-8945: [DOCS] Fixed an incorrect example of DISTINCT FROM and its 
equivalent

Change-Id: I9bee4c0935ee21d70a0964507c477a2fccb1c7cc
Reviewed-on: http://gerrit.cloudera.org:8080/14239
Tested-by: Impala Public Jenkins 
Reviewed-by: Tim Armstrong 


> Impala Doc: Incorrect Claim of Equivalence in Impala Docs
> -
>
> Key: IMPALA-8945
> URL: https://issues.apache.org/jira/browse/IMPALA-8945
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> Reported by [~icook]
> The Impala docs entry for the IS DISTINCT FROM operator states:
> The <=> operator, used like an equality operator in a join query, is more 
> efficient than the equivalent clause: A = B OR (A IS NULL AND B IS NULL). The 
> <=> operator can use a hash join, while the OR expression cannot.
> But this expression is not equivalent to A <=> B. See the attached screenshot 
> demonstrating their non-equivalence. An expression that is equivalent to A 
> <=> B is this:
> (A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B))
>  This expression should replace the existing incorrect expression.
> Another expression that is equivalent to A <=> B is:
> if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B)
> This one is a bit easier to follow. If you use this one in the docs, just 
> replace the following line with:
> The <=> operator can use a hash join, while the if expression cannot.
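The equivalence claims above can be sanity-checked outside SQL. A small sketch that simulates three-valued NULL comparison semantics with Python's `None` (the helper names and test values are illustrative, not Impala code):

```python
# Simulate SQL NULL semantics with None to check the proposed rewrites of A <=> B.
def null_safe_eq(a, b):
    # A <=> B: true when both are NULL, false when exactly one is NULL, else A = B.
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

def rewrite1(a, b):
    # (A IS NULL AND B IS NULL) OR ((A IS NOT NULL AND B IS NOT NULL) AND (A = B))
    return (a is None and b is None) or (
        (a is not None and b is not None) and a == b
    )

def rewrite2(a, b):
    # if(A IS NULL OR B IS NULL, A IS NULL AND B IS NULL, A = B)
    if a is None or b is None:
        return a is None and b is None
    return a == b

# Exhaustively compare over a small domain including NULL.
for a in [None, 0, 1]:
    for b in [None, 0, 1]:
        assert null_safe_eq(a, b) == rewrite1(a, b) == rewrite2(a, b)
print("all rewrites agree with <=>")
```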






[jira] [Commented] (IMPALA-8947) SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric

2019-09-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931701#comment-16931701
 ] 

ASF subversion and git services commented on IMPALA-8947:
-

Commit 451d31f2b45d0bfdfab8d8110cc63b06a061849a in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=451d31f ]

IMPALA-8947: scratch alloc error uses wrong metric

Use the global metric instead of the per-FileGroup
counter.

Testing:
Updated unit test to validate the error message in
a case where there are two FileGroups.

Change-Id: I2732dcd49c277d5d278fad68efa6ef381bc0eb81
Reviewed-on: http://gerrit.cloudera.org:8080/14236
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> SCRATCH_ALLOCATION_FAILED error uses wrong utilisation metric
> -
>
> Key: IMPALA-8947
> URL: https://issues.apache.org/jira/browse/IMPALA-8947
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: supportability
> Fix For: Impala 3.4.0
>
>
> {noformat}
> ERROR: Could not create files in any configured scratch directories 
> (--scratch_dirs=/path/to/scratch) on backend ':22000'. 69.80 GB of 
> scratch is currently in use by this Impala Daemon (69.80 GB by this query). 
> See logs for previous errors that may have prevented creating or writing 
> scratch files. The following directories were at capacity: /path/to/scratch
> {noformat}
> The issue is that the total for the Impala daemon uses the wrong counter.






[jira] [Commented] (IMPALA-8946) Prometheus histograms do not follow conventions

2019-09-17 Thread Guillem (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931574#comment-16931574
 ] 

Guillem commented on IMPALA-8946:
-

Hi [~tarmstrong]!

I was about to authorize Gerrit to access GitHub, but I found that I have to 
grant read *and write* access to all of my public repositories, covering the 
following resources:

{quote}
This application will be able to read and write all public repository data. 
This includes the following:

Code
Issues
Pull requests
Wikis
Settings
Webhooks and services
Deploy keys
{quote}

It also requires me to grant access to "organization, team membership, and 
private project boards" for some organizations.

I won't grant Gerrit all of those privileges. Is it possible to reduce the 
scope of this authorization? Is there an alternative way to submit the patch?

Thanks!

> Prometheus histograms do not follow conventions
> ---
>
> Key: IMPALA-8946
> URL: https://issues.apache.org/jira/browse/IMPALA-8946
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Guillem
>Assignee: Guillem
>Priority: Minor
>
> We've been using Prometheus metrics and we've found that some standard 
> Prometheus parser can not properly interpret histograms from Impala.
> For example, Python official client 
> ([https://github.com/prometheus/client_python)] can not properly read them. 
> I've been digging a little bit why it can't read them and I've found that 
> Impala does not adhere to textual histogram conventions.
> The following link describes the conventions for rendering histograms on 
> Prometheus textual format: 
> [https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries]
> This is an example of a rendered histogram on Impala 3.3 on Prometheus 
> endpoint:
> {code:java}
> # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time 
> clients of Impala Backend Server spent waiting for service threads
> # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram
> impala_thrift_server_backend_svc_thread_wait_time{le="0.2"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.5"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.7"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.9"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.95"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.999"} 0
> impala_thrift_server_backend_svc_thread_wait_time_max 0
> impala_thrift_server_backend_svc_thread_wait_time_min 0
> impala_thrift_server_backend_svc_thread_wait_time_count 49
> {code}
> The linked histogram conventions say that
> {quote}Each bucket count of a histogram named x is given as a separate sample 
> line with the name x_bucket and a label \{le="y"} (where y is the upper bound 
> of the bucket).
> {quote}
> And also
> {quote}A histogram must have a bucket with \{le="+Inf"}. Its value must be 
> identical to the value of x_count.
> {quote}
> The previous example should be formatted as:
> {code:java}
> # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time 
> clients of Impala Backend Server spent waiting for service threads
> # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.2"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.5"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.7"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.9"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.95"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.999"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="+Inf"} 49
> impala_thrift_server_backend_svc_thread_wait_time_count 49
> {code}
> I've found that with this format, the official Python client is able to 
> properly read the histograms.
> Note also that the metrics suffixed with `_min` and `_max` fall outside the 
> convention and also break histogram parsing; they may need to be reported as 
> separate metrics (perhaps as gauges?).
> If you are fine with these changes, I already have a patch that improves the 
> histogram formatting, and I can submit it for review.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-8944) Update and re-enable S3PlannerTest

2019-09-17 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931788#comment-16931788
 ] 

Sahil Takiar edited comment on IMPALA-8944 at 9/17/19 8:20 PM:
---

I've got 14 of the original 19 unit tests in {{S3PlannerTest}} working. Instead 
of relying on a test filter in {{run-all-tests.sh}}, I decided to use JUnit 
Categories and Maven profiles to select the tests to run (achieving a similar 
effect to TestNG Groups). I think this is a more robust and straightforward way 
of running tests. Now any fe/ tests that should be run for S3 can simply be 
tagged with the Java annotation {{@Category(S3Tests.class)}}.

The failing {{S3PlannerTest}} cases are:
* org.apache.impala.planner.S3PlannerTest.testTpcds
* org.apache.impala.planner.S3PlannerTest.testTpch
* org.apache.impala.planner.S3PlannerTest.testJoinOrder
* org.apache.impala.planner.S3PlannerTest.testSubqueryRewrite

All are failing for non-trivial reasons - e.g. actual differences in the 
explain plans when running on S3 vs. HDFS data (e.g. differences in memory 
estimates, join orders, etc.). I've opened IMPALA-8949 to investigate this.

{{testS3ScanRanges()}} is failing as well, but for other reasons described 
above.


was (Author: stakiar):
I've got 14 of the original 17 unit tests in {{S3PlannerTest}} working. Instead 
of relying on a test filter in {{run-all-tests.sh}}, I decided to use JUnit 
Categories and Maven profiles to select the tests to run (achieving a similar 
effect to TestNG Groups). I think this is a more robust and straightforward way 
of running tests. Now any fe/ tests that should be run for S3 can simply be 
tagged with the Java annotation {{@Category(S3Tests.class)}}.

The failing {{S3PlannerTest}} cases are:
* org.apache.impala.planner.S3PlannerTest.testTpcds
* org.apache.impala.planner.S3PlannerTest.testTpch
* org.apache.impala.planner.S3PlannerTest.testJoinOrder
* org.apache.impala.planner.S3PlannerTest.testSubqueryRewrite

All are failing for non-trivial reasons - e.g. actual differences in the 
explain plans when running on S3 vs. HDFS data (e.g. differences in memory 
estimates, join orders, etc.). I've opened IMPALA-8949 to investigate this.

> Update and re-enable S3PlannerTest
> --
>
> Key: IMPALA-8944
> URL: https://issues.apache.org/jira/browse/IMPALA-8944
> Project: IMPALA
>  Issue Type: Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> It looks like we don't run {{S3PlannerTest}} in our regular Jenkins jobs. 
> When run against a HDFS mini-cluster, they are skipped because the 
> {{TARGET_FILESYSTEM}} is not S3. On our S3 jobs, they don't run either 
> because we skip all fe/ tests (most of them don't work against S3 / assume 
> they are running on HDFS).
> A few things need to be fixed to get this working:
> * The test cases in {{S3PlannerTest}} need to be fixed
> * The Jenkins jobs that runs the S3 tests needs the ability to run specific 
> fe/ tests (e.g. just the {{S3PlannerTest}} and to skip the rest)






[jira] [Work started] (IMPALA-8634) Catalog client should be resilient to temporary Catalog outage

2019-09-17 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8634 started by Sahil Takiar.

> Catalog client should be resilient to temporary Catalog outage
> --
>
> Key: IMPALA-8634
> URL: https://issues.apache.org/jira/browse/IMPALA-8634
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Michael Ho
>Assignee: Sahil Takiar
>Priority: Critical
>
> Currently, when the catalog server is down, catalog clients will fail all 
> RPCs sent to it. In essence, DDL queries will fail and the Impala service 
> becomes a lot less functional. Catalog clients should consider retrying 
> failed RPCs with exponential backoff while the catalog server is being 
> restarted after a crash. We probably need to add [a test 
> |https://github.com/apache/impala/blob/master/tests/custom_cluster/test_restart_services.py]
>  to exercise the paths of catalog restart to verify coordinators are 
> resilient to it.
> cc'ing [~stakiar], [~joemcdonnell], [~twm378]






[jira] [Work started] (IMPALA-8942) Set file format specific values for split sizes on non-block stores

2019-09-17 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8942 started by Sahil Takiar.

> Set file format specific values for split sizes on non-block stores
> ---
>
> Key: IMPALA-8942
> URL: https://issues.apache.org/jira/browse/IMPALA-8942
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> Parquet scans on non-block based storage systems (e.g. S3, ADLS, etc.) can 
> suffer from uneven scan range assignment due to the behavior described in 
> IMPALA-3453. The frontend should set different split sizes depending on the 
> file type and file system.






[jira] [Assigned] (IMPALA-2515) Impala is unable to read a Parquet decimal column if size is larger than needed

2019-09-17 Thread Yongzhi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen reassigned IMPALA-2515:


Assignee: Yongzhi Chen

> Impala is unable to read a Parquet decimal column if size is larger than 
> needed
> ---
>
> Key: IMPALA-2515
> URL: https://issues.apache.org/jira/browse/IMPALA-2515
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Taras Bobrovytsky
>Assignee: Yongzhi Chen
>Priority: Minor
>  Labels: ramp-up
>
> Impala cannot read this:
> {code}
> {"name": "tmp_1",
>  "type": "fixed",
>  "size": 8,
>  "logicalType": "decimal",
>  "precision": 10,
>  "scale": 5}
> {code}
> However, this can be read:
> {code}
> {"name": "tmp_1",
>  "type": "fixed",
>  "size": 5,
>  "logicalType": "decimal",
>  "precision": 10,
>  "scale": 5}
> {code}
> Size must be precisely set to this, or Impala is unable to read the decimal 
> column:
> {code}
> size = int(math.ceil((math.log(2, 10) + precision) / math.log(256, 10)))
> {code}
> There is nothing in the Parquet spec that says that Decimal columns must be 
> sized precisely. Arguably it's a bug in the writer if it does this, because 
> it just wastes space.
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
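As a quick sanity check of the sizing expression quoted above (a Python 
sketch; `min_fixed_size` is a name chosen here for illustration):

```python
import math

def min_fixed_size(precision):
    # The exact byte width Impala currently requires for a fixed-size
    # decimal of the given precision (the expression quoted above).
    return int(math.ceil((math.log(2, 10) + precision) / math.log(256, 10)))

# precision=10 yields 5, matching the readable schema above; the failing
# schema declared size 8 instead.
print(min_fixed_size(10))  # -> 5
```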






[jira] [Closed] (IMPALA-8903) Impala Doc: TRUNCATE for Insert-only ACID tables

2019-09-17 Thread Alex Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8903.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> Impala Doc: TRUNCATE for Insert-only ACID tables
> 
>
> Key: IMPALA-8903
> URL: https://issues.apache.org/jira/browse/IMPALA-8903
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_34
> Fix For: Impala 3.4.0
>
>
> https://gerrit.cloudera.org/#/c/14235/






[jira] [Updated] (IMPALA-8861) Impala Doc: Document Jaro-winkler edit distance and similarity built-in functions

2019-09-17 Thread Alex Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-8861:

Description: https://gerrit.cloudera.org/#/c/14249/

> Impala Doc: Document Jaro-winkler edit distance and similarity built-in 
> functions
> -
>
> Key: IMPALA-8861
> URL: https://issues.apache.org/jira/browse/IMPALA-8861
> Project: IMPALA
>  Issue Type: Sub-task
>Affects Versions: Impala 3.3.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_34
>
> https://gerrit.cloudera.org/#/c/14249/






[jira] [Work started] (IMPALA-8861) Impala Doc: Document Jaro-winkler edit distance and similarity built-in functions

2019-09-17 Thread Alex Rodoni (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8861 started by Alex Rodoni.
---
> Impala Doc: Document Jaro-winkler edit distance and similarity built-in 
> functions
> -
>
> Key: IMPALA-8861
> URL: https://issues.apache.org/jira/browse/IMPALA-8861
> Project: IMPALA
>  Issue Type: Sub-task
>Affects Versions: Impala 3.3.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_34
>







[jira] [Assigned] (IMPALA-8065) OSInfo produces somewhat misleading output when running in container

2019-09-17 Thread Andrew Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman reassigned IMPALA-8065:
--

Assignee: Andrew Sherman

> OSInfo produces somewhat misleading output when running in container
> 
>
> Key: IMPALA-8065
> URL: https://issues.apache.org/jira/browse/IMPALA-8065
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Andrew Sherman
>Priority: Minor
>
> It uses /proc/version, which returns the host version. It would be good to 
> also get the version from lsb-release from the Ubuntu container we're running 
> in and disambiguate on the debug page.






[jira] [Assigned] (IMPALA-8065) OSInfo produces somewhat misleading output when running in container

2019-09-17 Thread Andrew Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman reassigned IMPALA-8065:
--

Assignee: Xiaomeng Zhang  (was: Andrew Sherman)

> OSInfo produces somewhat misleading output when running in container
> 
>
> Key: IMPALA-8065
> URL: https://issues.apache.org/jira/browse/IMPALA-8065
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Xiaomeng Zhang
>Priority: Minor
>
> It uses /proc/version, which returns the host version. It would be good to 
> also get the version from lsb-release from the Ubuntu container we're running 
> in and disambiguate on the debug page.






[jira] [Assigned] (IMPALA-7504) ParseKerberosPrincipal() should use krb5_parse_name() instead

2019-09-17 Thread Andrew Sherman (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman reassigned IMPALA-7504:
--

Assignee: Xiaomeng Zhang

> ParseKerberosPrincipal() should use krb5_parse_name() instead
> -
>
> Key: IMPALA-7504
> URL: https://issues.apache.org/jira/browse/IMPALA-7504
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Security
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Assignee: Xiaomeng Zhang
>Priority: Minor
>  Labels: ramp-up
>
> [~tlipcon] pointed out during code review that we should be using 
> krb5_parse_name() to parse the principal instead of creating our own
> bq. I wonder whether we should just be using krb5_parse_name here instead of 
> implementing our own parsing? According to 
> [http://web.mit.edu/kerberos/krb5-1.15/doc/appdev/refs/api/krb5_parse_name.html]
>  there are various escapings, etc, that this function isn't currently 
> supporting.
> We currently do the following to parse the principal:
> {noformat}
>   vector<string> names;
>   split(names, principal, is_any_of("/"));
>   if (names.size() != 2) return Status(TErrorCode::BAD_PRINCIPAL_FORMAT, 
> principal);
>   *service_name = names[0];
>   string remaining_principal = names[1];
>   split(names, remaining_principal, is_any_of("@"));
>   if (names.size() != 2) return Status(TErrorCode::BAD_PRINCIPAL_FORMAT, 
> principal);
> {noformat}
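For reference, the splitting logic quoted above behaves roughly like this 
Python sketch (`parse_principal_naive` is a name used here for illustration); 
like the C++ version, it does not handle krb5 escaping, which is the 
motivation for using krb5_parse_name():

```python
def parse_principal_naive(principal):
    # Mirror the quoted logic: split on '/' and then '@', requiring
    # exactly two parts each time.
    parts = principal.split("/")
    if len(parts) != 2:
        raise ValueError("bad principal format: " + principal)
    service_name, remaining = parts
    parts = remaining.split("@")
    if len(parts) != 2:
        raise ValueError("bad principal format: " + principal)
    return service_name, parts[0], parts[1]

print(parse_principal_naive("impala/host1.example.com@EXAMPLE.COM"))
# An escaped '/' in a component (legal in krb5) is rejected, e.g.
# parse_principal_naive(r"im\/pala/host1.example.com@EXAMPLE.COM")
# raises ValueError, because the naive split sees three '/'-parts.
```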






[jira] [Created] (IMPALA-8950) Add -d and -f option to copyFromLocal and re-enable disabled S3 tests

2019-09-17 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8950:


 Summary: Add -d and -f option to copyFromLocal and re-enable 
disabled S3 tests
 Key: IMPALA-8950
 URL: https://issues.apache.org/jira/browse/IMPALA-8950
 Project: IMPALA
  Issue Type: Test
Reporter: Sahil Takiar
Assignee: Sahil Takiar


The {{-d}} option for {{hdfs dfs -copyFromLocal}} "Skip[s] creation of 
temporary file with the suffix ._COPYING_". The {{-f}} option "Overwrites the 
destination if it already exists".

By using the {{-d}} option, copies to S3 avoid the additional overhead of 
copying data to a tmp file and then renaming the file. The {{-f}} option 
overwrites the file if it exists, which should be safe since tests should be 
writing to unique directories anyway. With HADOOP-16490, 
{{create(overwrite=true)}} avoids issuing a HEAD request on the path, which 
prevents any cached 404s on the S3 key.

After these changes, the tests disabled by IMPALA-8189 can be re-enabled.






[jira] [Commented] (IMPALA-8946) Prometheus histograms do not follow conventions

2019-09-17 Thread Guillem (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931482#comment-16931482
 ] 

Guillem commented on IMPALA-8946:
-

Yes, I already read the contributing guidelines (but I may need some help in 
the coming days).

I've been working on this, but I have some doubts about what a histogram means 
in Impala. After reading about the differences between summaries and 
histograms in Prometheus 
(https://prometheus.io/docs/practices/histograms/), I got the impression that 
`summary` is the type Impala should render (instead of `histogram`).

So, given the example in the issue, I think the rendered output should be:

{code}
# HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time clients 
of Impala Backend Server spent waiting for service threads
# TYPE impala_thrift_server_backend_svc_thread_wait_time summary
impala_thrift_server_backend_svc_thread_wait_time{quantile="0.2"} 0
impala_thrift_server_backend_svc_thread_wait_time{quantile="0.5"} 0
impala_thrift_server_backend_svc_thread_wait_time{quantile="0.7"} 0
impala_thrift_server_backend_svc_thread_wait_time{quantile="0.9"} 0
impala_thrift_server_backend_svc_thread_wait_time{quantile="0.95"} 0
impala_thrift_server_backend_svc_thread_wait_time{quantile="0.999"} 0
impala_thrift_server_backend_svc_thread_wait_time_count 49
{code}
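The summary rendering proposed above can be sketched the same way (a 
hypothetical `render_summary` helper; the metric name and quantile values are 
taken from the example):

```python
def render_summary(name, help_text, quantiles, count):
    # Render a summary in the Prometheus text format: quantile samples use
    # a `quantile` label on the base metric name, plus a `_count` sample.
    lines = ["# HELP %s %s" % (name, help_text),
             "# TYPE %s summary" % name]
    for quantile, value in quantiles:
        lines.append('%s{quantile="%s"} %g' % (name, quantile, value))
    lines.append("%s_count %d" % (name, count))
    return "\n".join(lines)

summary_text = render_summary(
    "impala_thrift_server_backend_svc_thread_wait_time",
    "Amount of time clients of Impala Backend Server spent waiting for service threads",
    [("0.2", 0), ("0.5", 0), ("0.7", 0), ("0.9", 0), ("0.95", 0), ("0.999", 0)],
    49)
print(summary_text)
```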


> Prometheus histograms do not follow conventions
> ---
>
> Key: IMPALA-8946
> URL: https://issues.apache.org/jira/browse/IMPALA-8946
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Guillem
>Assignee: Guillem
>Priority: Minor
>
> We've been using Prometheus metrics and have found that some standard 
> Prometheus parsers cannot properly interpret histograms from Impala.
> For example, the official Python client 
> ([https://github.com/prometheus/client_python]) cannot read them properly. 
> I've dug into why it can't read them and found that Impala does not adhere 
> to the textual histogram conventions.
> The following link describes the conventions for rendering histograms in the 
> Prometheus text format: 
> [https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries]
> This is an example of a histogram rendered by Impala 3.3 on the Prometheus 
> endpoint:
> {code:java}
> # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time 
> clients of Impala Backend Server spent waiting for service threads
> # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram
> impala_thrift_server_backend_svc_thread_wait_time{le="0.2"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.5"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.7"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.9"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.95"} 0
> impala_thrift_server_backend_svc_thread_wait_time{le="0.999"} 0
> impala_thrift_server_backend_svc_thread_wait_time_max 0
> impala_thrift_server_backend_svc_thread_wait_time_min 0
> impala_thrift_server_backend_svc_thread_wait_time_count 49
> {code}
> The linked histogram conventions say that
> {quote}Each bucket count of a histogram named x is given as a separate sample 
> line with the name x_bucket and a label \{le="y"} (where y is the upper bound 
> of the bucket).
> {quote}
> And also
> {quote}A histogram must have a bucket with \{le="+Inf"}. Its value must be 
> identical to the value of x_count.
> {quote}
> The previous example should be formatted as:
> {code:java}
> # HELP impala_thrift_server_backend_svc_thread_wait_time Amount of time 
> clients of Impala Backend Server spent waiting for service threads
> # TYPE impala_thrift_server_backend_svc_thread_wait_time histogram
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.2"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.5"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.7"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.9"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.95"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="0.999"} 0
> impala_thrift_server_backend_svc_thread_wait_time_bucket{le="+Inf"} 49
> impala_thrift_server_backend_svc_thread_wait_time_count 49
> {code}
> I've found that with this format, the official Python client is able to 
> properly read the histograms.
> Note also that the metrics suffixed with `_min` and `_max` fall outside the 
> convention and also break histogram parsing; they may need to be reported as 
> separate metrics (perhaps as gauges?).
> If you are fine with these changes, I already have a patch that improves the 
> histogram formatting, and I can submit it for review.


