date:20190503

[jira] [Work started] (IMPALA-8494) Impala Doc: Document GRANT/REVOKE privilege to GROUP

2019-05-03 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8494 started by Alex Rodoni.
---
> Impala Doc: Document GRANT/REVOKE privilege to GROUP
> 
>
> Key: IMPALA-8494
> URL: https://issues.apache.org/jira/browse/IMPALA-8494
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Work started] (IMPALA-8493) Impala Doc: Document GRANT/REVOKE privilege to USER

2019-05-03 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8493 started by Alex Rodoni.
---
> Impala Doc: Document GRANT/REVOKE privilege to USER
> ---
>
> Key: IMPALA-8493
> URL: https://issues.apache.org/jira/browse/IMPALA-8493
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>





[jira] [Created] (IMPALA-8494) Impala Doc: Document GRANT/REVOKE privilege to GROUP

2019-05-03 Thread Alex Rodoni (JIRA)

Alex Rodoni created IMPALA-8494:
---

 Summary: Impala Doc: Document GRANT/REVOKE privilege to GROUP
 Key: IMPALA-8494
 URL: https://issues.apache.org/jira/browse/IMPALA-8494
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni







[jira] [Created] (IMPALA-8493) Impala Doc: Document GRANT/REVOKE privilege to USER

2019-05-03 Thread Alex Rodoni (JIRA)

Alex Rodoni created IMPALA-8493:
---

 Summary: Impala Doc: Document GRANT/REVOKE privilege to USER
 Key: IMPALA-8493
 URL: https://issues.apache.org/jira/browse/IMPALA-8493
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni







[jira] [Assigned] (IMPALA-3816) Codegen perf-critical loops in Sorter

2019-05-03 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3816:
-

Assignee: Abhishek Rawat  (was: Tianyi Wang)

> Codegen perf-critical loops in Sorter
> -
>
> Key: IMPALA-3816
> URL: https://issues.apache.org/jira/browse/IMPALA-3816
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.7.0
>Reporter: Tim Armstrong
>Assignee: Abhishek Rawat
>Priority: Minor
>  Labels: codegen
> Attachments: percentile query profile.txt, tpch_30.txt
>
>
> In the sorter, we codegen the comparator function but call it indirectly via
> a function pointer. We should consider codegening the perf-critical loops so
> that we can make the comparator function call direct and inlinable. Inlining
> the comparison will be very beneficial if it is trivial, e.g. order by a
> numeric column: I expect sorts on simple keys will get noticeably faster.
> We should also be able to get rid of FreeLocalAllocations() calls for most
> comparators, although I'm not sure what the best way to approach that is.
> The Partition() loop is the most perf-critical, followed by InsertionSort().
> We also don't do this yet for the TopN node, see IMPALA-3815.
> Mostafa's analysis:
> While evaluating Sort performance I noticed that the codegened compare
> function is not inlined, which results in large overhead per row.
> Expected speedup is 10-15%.
> {code}
>   /// Returns a negative value if lhs is less than rhs, a positive value if lhs is
>   /// greater than rhs, or 0 if they are equal. All exprs (ordering_exprs_lhs_ and
>   /// ordering_exprs_rhs_) must have been prepared and opened before calling this,
>   /// i.e. 'sort_key_exprs' in the constructor must have been opened.
>   int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const {
>     return codegend_compare_fn_ == NULL ?
>         CompareInterpreted(lhs, rhs) :
>         (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(),
>             ordering_expr_evals_rhs_.data(), lhs, rhs);
>   }
> {code}
> From Perf
> {code}
>        │ bool Sorter::TupleSorter::Less(const TupleRow* lhs, const TupleRow* rhs) {
>   7.43 │   push   %rbp
>   3.23 │   mov    %rsp,%rbp
>   9.44 │   push   %r12
>   2.69 │   push   %rbx
>   3.89 │   mov    %rsi,%r12
>   2.98 │   mov    %rdi,%rbx
>   6.06 │   sub    $0x10,%rsp
>        │   --num_comparisons_till_free_;
>        │   DCHECK_GE(num_comparisons_till_free_, 0);
>        │   if (UNLIKELY(num_comparisons_till_free_ == 0)) {

[jira] [Commented] (IMPALA-3816) Codegen perf-critical loops in Sorter

2019-05-03 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832926#comment-16832926
 ] 

Tim Armstrong commented on IMPALA-3816:
---

See IMPALA-4065 - Tianyi had a patch that tackled both.


[jira] [Assigned] (IMPALA-4065) Inline comparator calls into TopN::InsertBatch()

2019-05-03 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-4065:
-

Assignee: Abhishek Rawat  (was: Tianyi Wang)

> Inline comparator calls into TopN::InsertBatch()
> 
>
> Key: IMPALA-4065
> URL: https://issues.apache.org/jira/browse/IMPALA-4065
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.7.0
>Reporter: Tim Armstrong
>Assignee: Abhishek Rawat
>Priority: Minor
>  Labels: codegen, ramp-up
> Attachments: 
> 0001-WIP-IMPALA-3816-IMPALA-4065-full-TupleRowComparator-.patch
>
>
> This is the more interesting follow-on from IMPALA-3815. We should inline the 
> Compare() calls in the codegen'd TopN code to avoid the indirect function 
> pointer call.
> The tricky aspect is that the Compare() calls are made from inside
> std::priority_queue, and we don't have a way to force-inline those functions
> at the moment.
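
A rough Python analogy of where the indirect comparator call sits in a TopN loop. This is only a sketch: Python cannot show machine-level inlining, and the names `compare_by_price`, `ViaComparator`, `top_n_indirect` and `top_n_inlined` are hypothetical, not Impala code. The first variant routes every heap comparison through a comparator callable (similar in spirit to std::priority_queue calling Compare() through a function pointer); the second writes the key comparison directly into the loop.

```python
import heapq


def compare_by_price(lhs, rhs):
    """Stand-in comparator: negative, zero or positive, like Compare()."""
    return (lhs["price"] > rhs["price"]) - (lhs["price"] < rhs["price"])


class ViaComparator:
    """Heap element whose ordering goes through an indirect comparator call."""

    __slots__ = ("row", "compare")

    def __init__(self, row, compare):
        self.row = row
        self.compare = compare

    def __lt__(self, other):
        # Reverse the order so the heap root is the worst row kept so far.
        return self.compare(self.row, other.row) > 0


def top_n_indirect(rows, n, compare):
    """Smallest n rows; every heap comparison calls `compare` indirectly."""
    heap = []
    for row in rows:
        if len(heap) < n:
            heapq.heappush(heap, ViaComparator(row, compare))
        elif compare(row, heap[0].row) < 0:
            heapq.heapreplace(heap, ViaComparator(row, compare))
    return sorted((item.row for item in heap), key=lambda r: r["price"])


def top_n_inlined(rows, n):
    """Same result, with the numeric key comparison written into the loop."""
    heap = []  # entries are (-price, seq, row); seq keeps rows from being compared
    for seq, row in enumerate(rows):
        if len(heap) < n:
            heapq.heappush(heap, (-row["price"], seq, row))
        elif -row["price"] > heap[0][0]:
            heapq.heapreplace(heap, (-row["price"], seq, row))
    return sorted((entry[2] for entry in heap), key=lambda r: r["price"])


rows = [{"price": p} for p in (5, 1, 4, 2, 3)]
indirect = top_n_indirect(rows, 3, compare_by_price)
inlined = top_n_inlined(rows, 3)
```

Both variants return the same rows; the difference is only in how many calls sit between the loop and the key comparison, which is the overhead the issue wants codegen to remove.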




[jira] [Commented] (IMPALA-8294) Inconsistent updating of BytesRead* counters

2019-05-03 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832923#comment-16832923
 ] 

Tim Armstrong commented on IMPALA-8294:
---

If you look in HdfsScanNodeBase, bytes_read_ follows the good pattern where it 
gets updated by DiskIoMgr in real time. bytes_read_local_ follows the bad 
pattern.


> Inconsistent updating of BytesRead* counters
> 
>
> Key: IMPALA-8294
> URL: https://issues.apache.org/jira/browse/IMPALA-8294
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0, Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
>Reporter: Lars Volker
>Assignee: Abhishek Rawat
>Priority: Major
>  Labels: observability, profile
>
> Some of the BytesRead* counters in profiles (e.g. BytesReadLocal) are only 
> updated once a query finishes successfully. This leads to confusion because 
> queries that are still running or failed look like they did not read data 
> locally.
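
The two update patterns mentioned in this thread can be modelled with a small sketch. It is a toy model under stated assumptions, not Impala's HdfsScanNodeBase: the class `ScanProfile` and the helper `run_scan` are hypothetical, and only the counter names echo the ones in the comment above.

```python
class ScanProfile:
    """Toy profile with one counter per update pattern."""

    def __init__(self):
        self.bytes_read = 0        # updated on every read (the good pattern)
        self.bytes_read_local = 0  # published only on success (the bad pattern)
        self._pending_local = 0

    def on_read(self, nbytes, is_local):
        self.bytes_read += nbytes
        if is_local:
            self._pending_local += nbytes

    def on_success(self):
        self.bytes_read_local = self._pending_local


def run_scan(profile, reads, fail_after=None):
    """Apply (nbytes, is_local) reads; optionally fail before read `fail_after`."""
    for i, (nbytes, is_local) in enumerate(reads):
        if fail_after is not None and i == fail_after:
            return False  # failed queries never reach on_success()
        profile.on_read(nbytes, is_local)
    profile.on_success()
    return True


reads = [(100, True), (50, False), (25, True)]
failed = ScanProfile()
run_scan(failed, reads, fail_after=2)
finished = ScanProfile()
run_scan(finished, reads)
```

In this model the failed scan reports 150 bytes read but 0 bytes read locally, even though 100 of those bytes were local, which is the confusing profile the issue describes.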




[jira] [Assigned] (IMPALA-8294) Inconsistent updating of BytesRead* counters

2019-05-03 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8294:
-

Assignee: Abhishek Rawat





[jira] [Updated] (IMPALA-8364) Impala Doc: Remove support for authorization policy file

2019-05-03 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-8364:

Description: https://gerrit.cloudera.org/#/c/13235/

> Impala Doc: Remove support for authorization policy file
> 
>
> Key: IMPALA-8364
> URL: https://issues.apache.org/jira/browse/IMPALA-8364
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Critical
>  Labels: future_release_doc, in_33
>
> https://gerrit.cloudera.org/#/c/13235/




[jira] [Commented] (IMPALA-8341) Data cache for remote reads

2019-05-03 Thread Alex Rodoni (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832887#comment-16832887
 ] 

Alex Rodoni commented on IMPALA-8341:
-

@kwho Is this a high-priority feature to document? Should it be documented as a
performance improvement or a scalability improvement?

> Data cache for remote reads
> ---
>
> Key: IMPALA-8341
> URL: https://issues.apache.org/jira/browse/IMPALA-8341
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Critical
>
> When running in a public cloud (e.g. AWS with S3) or in certain private cloud
> settings (e.g. data stored in an object store), computation and storage are
> no longer co-located. This breaks the typical pattern in which Impala query
> fragment instances are scheduled where the data is located. In this
> setting, the network bandwidth requirement of both the NICs and the top-of-rack
> switches goes up considerably, because the network traffic includes the data
> fetch in addition to the shuffling exchange traffic of intermediate results.
> To mitigate the pressure on the network, one can build a storage-backed cache
> at the compute nodes to cache the working set. With deterministic scan range
> scheduling, each compute node should hold non-overlapping partitions of the
> data set.
> An initial prototype of the cache was posted here:
> [https://gerrit.cloudera.org/#/c/12683/] but it probably can benefit from a
> better eviction algorithm (e.g. LRU instead of FIFO) and better locking (e.g.
> not holding the lock while doing IO).




[jira] [Assigned] (IMPALA-7613) Support round(DECIMAL) with non-constant second argument

2019-05-03 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-7613:
-

Assignee: Abhishek Rawat

> Support round(DECIMAL) with non-constant second argument
> 
>
> Key: IMPALA-7613
> URL: https://issues.apache.org/jira/browse/IMPALA-7613
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Abhishek Rawat
>Priority: Major
>  Labels: decimal, ramp-up
>
> Sometimes users want to round to a precision that is data-driven (e.g. using 
> a lookup table). They can't currently do this with decimal. I think we could 
> support this by just using the input decimal type as the output type when the 
> second argument is non-constant.
> {noformat}
> select round(l_tax, l_linenumber) from tpch.lineitem limit 5;
> Query: select round(l_tax, l_linenumber) from tpch.lineitem limit 5
> Query submitted at: 2018-09-24 11:03:10 (Coordinator: 
> http://tarmstrong-box:25000)
> ERROR: AnalysisException: round() must be called with a constant second 
> argument.
> {noformat}
> Motivated by a user trying to do something like this:
> http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-round-function-does-not-return-expected-result/m-p/80200#M4906
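
The proposal in the description (keep the input decimal type as the output type when the second argument is not a constant) can be sketched with Python's `decimal` module. This only illustrates the idea: `round_keep_type` is a hypothetical helper, the sample values and scale of 4 are made up, and half-up rounding is an assumption rather than a statement about Impala's behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_keep_type(value, digits, input_scale):
    """Round `value` to `digits` places, then express it at `input_scale` again."""
    digits = min(digits, input_scale)  # cannot gain digits beyond the input type
    rounded = value.quantize(Decimal(1).scaleb(-digits), rounding=ROUND_HALF_UP)
    # Re-expressing at the input scale keeps one output type for every row,
    # whatever per-row `digits` value was used.
    return rounded.quantize(Decimal(1).scaleb(-input_scale))


# (value, digits) pairs where the second argument comes from the data.
rows = [(Decimal("0.0412"), 1), (Decimal("0.0412"), 2), (Decimal("0.0455"), 3)]
results = [round_keep_type(value, digits, input_scale=4) for value, digits in rows]
```

Every result carries four fractional digits, so a single column type can hold all of them even though each row was rounded to a different number of places.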




[jira] [Assigned] (IMPALA-8450) Add support for zstd and lz4 in parquet

2019-05-03 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8450:
-

Assignee: Abhishek Rawat

> Add support for zstd and lz4 in parquet
> ---
>
> Key: IMPALA-8450
> URL: https://issues.apache.org/jira/browse/IMPALA-8450
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Abhishek Rawat
>Priority: Major
>  Labels: parquet
>
> PARQUET-970 added these codecs to the format. We have LZ4 in the toolchain 
> already and I just added zstd: https://gerrit.cloudera.org/#/c/13079/
> These codecs probably offer a better trade-off of density and speed than
> snappy or gzip.
> https://github.com/apache/arrow/pull/807/files might be a useful crib sheet 
> for how to add a compressor.




[jira] [Resolved] (IMPALA-8341) Data cache for remote reads

2019-05-03 Thread Michael Ho (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho resolved IMPALA-8341.

Resolution: Fixed

Initial implementation merged. Improvement may be needed in the future.





[jira] [Commented] (IMPALA-8341) Data cache for remote reads

2019-05-03 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832843#comment-16832843
 ] 

ASF subversion and git services commented on IMPALA-8341:
-

Commit 2ece4c9b2e114a5e8873c5ac69e75b84c62bf5bd in impala's branch 
refs/heads/master from Michael Ho
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2ece4c9 ]

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: 
https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by
local storage. It implicitly relies on the OS page cache
management to shuffle data between memory and the storage
device. This is useful for caching data read from remote
filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on
the configuration string, which is a comma-separated list of
directories followed by the storage capacity per directory.
An example configuration string:
  --data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of
storage space, with a maximum of 150GB each for /data/0 and /data/1.
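
A minimal sketch of how such a configuration string could be split into per-directory quotas. The function `parse_data_cache_config`, the unit table and the choice of binary units (1GB = 2^30 bytes) are assumptions for illustration; the flag's real parser may differ.

```python
import re

# Hypothetical unit table; the real flag may accept other suffixes or sizes.
_UNITS = {"KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}


def parse_data_cache_config(config):
    """Split '<dir>[,<dir>...]:<capacity>' into {directory: capacity_in_bytes}."""
    dirs_part, sep, capacity_part = config.rpartition(":")
    if not sep or not dirs_part:
        raise ValueError("expected '<dir>[,<dir>...]:<capacity>'")
    match = re.fullmatch(r"(\d+)(KB|MB|GB|TB)", capacity_part.strip().upper())
    if match is None:
        raise ValueError("unrecognized capacity: %r" % capacity_part)
    capacity = int(match.group(1)) * _UNITS[match.group(2)]
    dirs = [d.strip() for d in dirs_part.split(",") if d.strip()]
    if not dirs:
        raise ValueError("no cache directories given")
    # The same capacity applies to every listed directory (one partition each).
    return {d: capacity for d in dirs}


partitions = parse_data_cache_config("/data/0,/data/1:150GB")
total_bytes = sum(partitions.values())
```

With the example string, each of the two directories gets a 150GB quota and the total comes to 300GB, matching the description above.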

Each partition has a meta-data cache which tracks the mappings
of cache keys to the locations of the cached data. A cache key
is a tuple of (file's name, file's modification time, file offset)
and a cache entry is a tuple of (backing file, offset in the backing
file, length of the cached data, optional checksum). Note that the
cache currently doesn't support overlapping ranges. In other words,
if the cache contains an entry of a file for range [m, m+4MB), a lookup
for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen
this as a problem but this may require further evaluation in the future.
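
The exact-key behaviour described above can be sketched as follows. Only the tuple shapes come from this commit message; the class `MetadataCache`, the field names, the file paths and the timestamps are hypothetical.

```python
from collections import namedtuple

# Tuple shapes follow the commit message; the field names are illustrative.
CacheKey = namedtuple("CacheKey", ["filename", "mtime", "offset"])
CacheEntry = namedtuple(
    "CacheEntry", ["backing_file", "backing_offset", "length", "checksum"])


class MetadataCache:
    """Maps cache keys to the location of the cached bytes (exact match only)."""

    def __init__(self):
        self._entries = {}

    def store(self, key, entry):
        self._entries[key] = entry

    def lookup(self, key):
        # No range arithmetic: a key is found only if all three fields match.
        return self._entries.get(key)


MB = 1024 * 1024
m = 8 * MB
meta = MetadataCache()
meta.store(CacheKey("s3a://bucket/part-0.parq", 1556841600, m),
           CacheEntry("/data/0/backing-file-0", 0, 4 * MB, None))

hit = meta.lookup(CacheKey("s3a://bucket/part-0.parq", 1556841600, m))
# Sub-range [m+4K, m+8K) lies inside the cached range but has a different offset.
miss = meta.lookup(CacheKey("s3a://bucket/part-0.parq", 1556841600, m + 4 * 1024))
# A changed modification time also yields a different key.
stale = meta.lookup(CacheKey("s3a://bucket/part-0.parq", 1556841601, m))
```

Because the offset is part of the key, the sub-range lookup misses even though its bytes are present, which is the limitation the paragraph above notes.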

Each partition stores its set of cached data in backing files created
on local storage. When inserting new data into the cache, the data is
appended to the current backing file in use. The storage consumption
of each cache entry counts towards the quota of that partition. When a
partition reaches its capacity, the least recently used (LRU) data in
that partition is evicted. Evicted data is removed from the underlying
storage by punching holes in the backing file it's stored in. As a
backing file reaches a certain size (by default 4TB), new data will
stop being appended to it and a new file will be created instead. Note
that due to hole punching, the backing file is actually sparse. When
the number of backing files per partition exceeds
--data_cache_max_files_per_partition, files are deleted in the order
in which they were created. Stale cache entries referencing deleted
files are erased lazily or evicted due to inactivity.
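
The per-partition quota and LRU eviction described above can be modelled with an ordered dictionary. This is a toy model, not the patch's implementation: `CachePartition` is a hypothetical class, it tracks only entry lengths, treating a repeated key as a no-op is an assumption, and hole punching and backing-file rotation are reduced to a list of evicted keys.

```python
from collections import OrderedDict


class CachePartition:
    """Toy model of one partition: a byte quota with LRU eviction."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._lru = OrderedDict()  # key -> length, least recently used first
        self.evicted = []          # keys whose data would be hole-punched

    def lookup(self, key):
        if key not in self._lru:
            return None
        self._lru.move_to_end(key)  # mark as most recently used
        return self._lru[key]

    def store(self, key, length):
        if length > self.capacity_bytes:
            return False  # can never fit, so do not evict anything for it
        if key in self._lru:
            self._lru.move_to_end(key)
            return True
        # Eviction happens inline, before the new entry is accounted for.
        while self.used_bytes + length > self.capacity_bytes:
            old_key, old_length = self._lru.popitem(last=False)
            self.used_bytes -= old_length
            self.evicted.append(old_key)
        self._lru[key] = length
        self.used_bytes += length
        return True


part = CachePartition(capacity_bytes=10)
part.store("a", 4)
part.store("b", 4)
part.lookup("a")    # "a" becomes the most recently used entry
part.store("c", 4)  # over quota: "b", the least recently used, is evicted
```

Evicting inside `store()` keeps the model simple, at the cost of making an insert pay for the eviction work.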

Optionally, checksumming can be enabled to verify that data read from the
cache is consistent with what was inserted and that multiple attempted
insertions with the same cache key have the same cache content.
Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used;
to insert data into the cache, the interface Store() is used. Note
that eviction currently happens inline during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad
creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-stream TPCDS at 3TB in a 20-node S3 cluster shows about 30% improvement
over runs without the cache. Each node has a cache size of 150GB.
The performance is at parity with a configuration of an HDFS cluster using
EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Reviewed-on: http://gerrit.cloudera.org:8080/12987
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 



[jira] [Commented] (IMPALA-5351) Handle column comments for Kudu tables

2019-05-03 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832845#comment-16832845
 ] 

ASF subversion and git services commented on IMPALA-5351:
-

Commit 3fb36570ae0c7329cdbe3640515bc3e0cb066c81 in impala's branch 
refs/heads/master from helifu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3fb3657 ]

IMPALA-5351: Support storing column comment of kudu table

This patch adds support for storing column comments of Kudu tables
on the Impala side.

The following tests passed:
1) create kudu-table with column comment;
2) alter kudu-table with (add/alter[delete] column comment);
3) show create kudu table;
4) describe kudu-table;
5) invalidate metadata;
6) comment on column is { '' | null | 'comment' }

Change-Id: Ifb3b37eed364f12bdb3c1d7ef5be128f1475936c
Reviewed-on: http://gerrit.cloudera.org:8080/12977
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Handle column comments for Kudu tables
> --
>
> Key: IMPALA-5351
> URL: https://issues.apache.org/jira/browse/IMPALA-5351
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.9.0
>Reporter: Thomas Tauber-Marshall
>Assignee: HeLifu
>Priority: Major
>  Labels: kudu
> Fix For: Impala 3.3.0
>
>
> Currently, if a column comment is specified in a CREATE TABLE for a Kudu 
> table, we just silently drop it because Kudu does not currently support 
> column comments (KUDU-1711).
> One option would be to store the comments in HMS, but splitting the metadata
> between HMS and Kudu is probably more complicated than it's worth.
> Most likely, we can just wait for it to be implemented on the Kudu side, but 
> before then we may want to consider issuing a warning when people use column 
> comments on Kudu tables.




[jira] [Commented] (IMPALA-8485) References to deprecated feature authorization policy file need to be removed

2019-05-03 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832844#comment-16832844
 ] 

ASF subversion and git services commented on IMPALA-8485:
-

Commit 84addd2a4b74454bd929d6b2ada0f501a2c6b0cb in impala's branch 
refs/heads/master from Austin Nobis
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=84addd2 ]

IMPALA-8485: Authorization policy file clean up

This patch cleans up references to the deprecated authorization_policy_file
flag. The authz-policy.ini file is no longer created during the test config
creation. The reference is also removed from the gitignore.

Testing:
- All FE tests were run
- All authorization E2E tests were run
- test_authorization.py E2E test was updated to no longer have
  references to the authz-policy.ini file.

Change-Id: Ib1e90973cb3d5b243844d379e5cdcb2add4eec75
Reviewed-on: http://gerrit.cloudera.org:8080/13222
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> References to deprecated feature authorization policy file need to be removed
> -
>
> Key: IMPALA-8485
> URL: https://issues.apache.org/jira/browse/IMPALA-8485
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Austin Nobis
>Assignee: Austin Nobis
>Priority: Trivial
> Fix For: Impala 3.3.0
>
>
> Running the command *git grep authz-policy* produces the following output:
> bin/create-test-configuration.sh:generate_config authz-policy.ini.template authz-policy.ini
> fe/.gitignore:src/test/resources/authz-policy.ini
> tests/authorization/test_authorization.py:AUTH_POLICY_FILE = "%s/authz-policy.ini" % WAREHOUSE
> These references to *authz-policy.ini* should be cleaned up, as the
> authorization policy file feature is deprecated as of *IMPALA-7918*.




[jira] [Commented] (IMPALA-8477) Impala Doc: Doc SHOW GRANT GROUP

2019-05-03 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832842#comment-16832842
 ] 

ASF subversion and git services commented on IMPALA-8477:
-

Commit f5e89d6239eb7dbfa0acedd1704bcd398a197f9f in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f5e89d6 ]

IMPALA-8477: [DOCS] SHOW GRANT GROUP for Ranger authorization

Change-Id: Iadf0d5c8b43809880f194e0bc810df06bfab2075
Reviewed-on: http://gerrit.cloudera.org:8080/13220
Tested-by: Impala Public Jenkins 
Reviewed-by: Austin Nobis 
Reviewed-by: Fredy Wijaya 


> Impala Doc: Doc SHOW GRANT GROUP
> 
>
> Key: IMPALA-8477
> URL: https://issues.apache.org/jira/browse/IMPALA-8477
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
> Fix For: Impala 3.3.0
>
>
> https://gerrit.cloudera.org/#/c/13220/




[jira] [Work started] (IMPALA-8364) Impala Doc: Remove support for authorization policy file

2019-05-03 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8364 started by Alex Rodoni.
---
> Impala Doc: Remove support for authorization policy file
> 
>
> Key: IMPALA-8364
> URL: https://issues.apache.org/jira/browse/IMPALA-8364
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Critical
>  Labels: future_release_doc, in_33
>





[jira] [Work stopped] (IMPALA-8490) Impala Doc: the file handle cache now supports S3

2019-05-03 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8490 stopped by Alex Rodoni.
---
> Impala Doc: the file handle cache now supports S3
> -
>
> Key: IMPALA-8490
> URL: https://issues.apache.org/jira/browse/IMPALA-8490
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Sahil Takiar
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>
> https://impala.apache.org/docs/build/html/topics/impala_scalability.html
> states:
> {quote}
> Because this feature only involves HDFS data files, it does not apply to 
> non-HDFS tables, such as Kudu or HBase tables, or tables that store their 
> data on cloud services such as S3 or ADLS.
> {quote}
> This section should be updated because the file handle cache now supports S3 
> files.
> We should add a section to the docs similar to what we added when support for 
> remote HDFS files was added to the file handle cache:
> {quote}
> In Impala 3.2 and higher, file handle caching also applies to remote HDFS 
> file handles. This is controlled by the cache_remote_file_handles flag for an 
> impalad. It is recommended that you use the default value of true as this 
> caching prevents your NameNode from overloading when your cluster has many 
> remote HDFS reads.
> {quote}
> Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has 
> been added as an impalad startup option (the flag is enabled by default).
> Unlike HDFS, though, S3 has no NameNode; the benefit is that it eliminates a 
> call to {{getFileStatus()}} on the target S3 file. So "prevents your NameNode 
> from overloading when your cluster has many remote HDFS reads" should be 
> changed to something like "avoids an unnecessary call to 
> S3AFileSystem#getFileStatus(), which reduces the number of API calls made to 
> S3."
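
The benefit described above can be illustrated with a toy model. This is only a sketch, not Impala's implementation: the class ToyFileHandleCache, the function open_s3_file, and the bucket path are hypothetical names, and the status lookup is a stand-in for the getFileStatus() call that a real open would make.

```python
from collections import OrderedDict


class ToyFileHandleCache:
    """Toy LRU cache of open file handles, keyed by path (hypothetical)."""

    def __init__(self, capacity, open_fn):
        self.capacity = capacity
        self.open_fn = open_fn            # expensive open, paid only on a miss
        self.handles = OrderedDict()

    def get(self, path):
        if path in self.handles:
            self.handles.move_to_end(path)    # mark as most recently used
            return self.handles[path]
        handle = self.open_fn(path)           # cache miss: pay for the open
        self.handles[path] = handle
        if len(self.handles) > self.capacity:
            self.handles.popitem(last=False)  # evict the least recently used
        return handle


status_calls = []


def open_s3_file(path):
    # Stand-in for opening an S3 file; each open costs one status lookup.
    status_calls.append(path)
    return {"path": path}


cache = ToyFileHandleCache(capacity=2, open_fn=open_s3_file)
for _ in range(3):
    cache.get("s3a://bucket/t/part-0.parq")
```

Three reads of the same file trigger a single status lookup; without the cache each read would trigger its own.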




[jira] [Assigned] (IMPALA-8485) References to deprecated feature authorization policy file need to be removed

2019-05-03 Thread Fredy Wijaya (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya reassigned IMPALA-8485:


Assignee: Austin Nobis

> References to deprecated feature authorization policy file need to be removed
> -
>
> Key: IMPALA-8485
> URL: https://issues.apache.org/jira/browse/IMPALA-8485
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Austin Nobis
>Assignee: Austin Nobis
>Priority: Trivial
> Fix For: Impala 3.3.0
>
>
> Running the command *git grep authz-policy* produces the following output:
> bin/create-test-configuration.sh:generate_config authz-policy.ini.template 
> authz-policy.ini
> fe/.gitignore:src/test/resources/authz-policy.ini
> tests/authorization/test_authorization.py:AUTH_POLICY_FILE = 
> "%s/authz-policy.ini" % WAREHOUSE
> These references to *authz-policy.ini* should be cleaned up, as the 
> authorization policy file feature is deprecated as of *IMPALA-7918*.




[jira] [Resolved] (IMPALA-5351) Handle column comments for Kudu tables

2019-05-03 Thread Fredy Wijaya (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya resolved IMPALA-5351.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Handle column comments for Kudu tables
> --
>
> Key: IMPALA-5351
> URL: https://issues.apache.org/jira/browse/IMPALA-5351
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.9.0
>Reporter: Thomas Tauber-Marshall
>Assignee: HeLifu
>Priority: Major
>  Labels: kudu
> Fix For: Impala 3.3.0
>
>
> Currently, if a column comment is specified in a CREATE TABLE for a Kudu 
> table, we just silently drop it because Kudu does not currently support 
> column comments (KUDU-1711).
> One option would be to store the comments in HMS, but splitting the metadata 
> between HMS and Kudu is probably more complicated than it's worth.
> Most likely, we can just wait for it to be implemented on the Kudu side, but 
> before then we may want to consider issuing a warning when people use column 
> comments on Kudu tables.




[jira] [Work started] (IMPALA-8490) Impala Doc: the file handle cache now supports S3

2019-05-03 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8490 started by Alex Rodoni.
---
> Impala Doc: the file handle cache now supports S3
> -
>
> Key: IMPALA-8490
> URL: https://issues.apache.org/jira/browse/IMPALA-8490
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Sahil Takiar
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>
> https://impala.apache.org/docs/build/html/topics/impala_scalability.html 
> states:
> {quote}
> Because this feature only involves HDFS data files, it does not apply to 
> non-HDFS tables, such as Kudu or HBase tables, or tables that store their 
> data on cloud services such as S3 or ADLS.
> {quote}
> This section should be updated because the file handle cache now supports S3 
> files.
> We should add a section to the docs similar to what we added when support for 
> remote HDFS files was added to the file handle cache:
> {quote}
> In Impala 3.2 and higher, file handle caching also applies to remote HDFS 
> file handles. This is controlled by the cache_remote_file_handles flag for an 
> impalad. It is recommended that you use the default value of true as this 
> caching prevents your NameNode from overloading when your cluster has many 
> remote HDFS reads.
> {quote}
> Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has 
> been added as an impalad startup option (the flag is enabled by default).
> Unlike HDFS, though, S3 has no NameNode; the benefit is that it eliminates a 
> call to {{getFileStatus()}} on the target S3 file. So "prevents your NameNode 
> from overloading when your cluster has many remote HDFS reads" should be 
> changed to something like "avoids an unnecessary call to 
> S3AFileSystem#getFileStatus(), which reduces the number of API calls made to 
> S3."




[jira] [Updated] (IMPALA-8492) Re-enable large_string tests disabled for JVM OOM

2019-05-03 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8492:
--
Summary: Re-enable large_string tests disabled for JVM OOM  (was: 
Re-enabled large_string tests disabled for JVM OOM)

> Re-enable large_string tests disabled for JVM OOM
> -
>
> Key: IMPALA-8492
> URL: https://issues.apache.org/jira/browse/IMPALA-8492
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: docker
>
> IMPALA-4865 fixed the issue that we had to disable tests for.




[jira] [Created] (IMPALA-8492) Re-enabled large_string tests disabled for JVM OOM

2019-05-03 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-8492:
-

 Summary: Re-enabled large_string tests disabled for JVM OOM
 Key: IMPALA-8492
 URL: https://issues.apache.org/jira/browse/IMPALA-8492
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Reporter: Tim Armstrong
Assignee: Tim Armstrong


IMPALA-4865 fixed the issue that we had to disable tests for.




[jira] [Created] (IMPALA-8491) Run container as non-root user

2019-05-03 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-8491:
-

 Summary: Run container as non-root user
 Key: IMPALA-8491
 URL: https://issues.apache.org/jira/browse/IMPALA-8491
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Reporter: Tim Armstrong
Assignee: Tim Armstrong







[jira] [Assigned] (IMPALA-8451) Default configs for admission control

2019-05-03 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8451:
-

Assignee: Tim Armstrong

> Default configs for admission control
> -
>
> Key: IMPALA-8451
> URL: https://issues.apache.org/jira/browse/IMPALA-8451
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> We probably want to have some basic admission control enabled for the 
> dockerised containers.




[jira] [Comment Edited] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure

2019-05-03 Thread Michael Ho (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832710#comment-16832710
 ] 

Michael Ho edited comment on IMPALA-8339 at 5/3/19 6:18 PM:


Thanks [~stakiar]. In general, Impala should be resilient when one or more 
executors fail during execution and issue transparent retries while not 
scheduling fragments on the known bad hosts. This JIRA is attacking a narrower 
subset of that problem by only addressing the query startup failure.

We need to be careful to use transparent retries only for transient recoverable 
failures. For instance, we shouldn't retry if it will lead to the same failure 
(e.g. memory limit exceeded). There may also be a change in behavior in how 
Impala exposes results to the clients. In particular, we may not be able to 
support both streaming result sets and transparent retries for all queries as 
some of them are non-deterministic so it may not be trivial to support the 
behavior of exposing a subset of the results and then replay to the point of 
failure.
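
The distinction drawn above between transient and deterministic failures can be sketched as follows. This is an illustration only, under the assumption that failures can be classified by exception type; the names TransientStartupError, MemLimitExceeded, and start_with_retries are hypothetical and do not come from the Impala code base.

```python
class TransientStartupError(Exception):
    """Hypothetical: an executor could not be reached while starting a query."""


class MemLimitExceeded(Exception):
    """Hypothetical: a deterministic failure that a retry would only repeat."""


def start_with_retries(start_query, max_attempts=3):
    """Call start_query(), retrying only failures classified as transient."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return start_query()
        except TransientStartupError as err:
            last_error = err      # transient: worth another attempt
        # Any other exception (e.g. MemLimitExceeded) propagates immediately.
    raise last_error


attempts = []


def flaky_start():
    # Fails twice with a transient error, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientStartupError("executor unreachable")
    return "started"


result = start_with_retries(flaky_start)
```

A query that hits MemLimitExceeded is not retried at all, which matches the concern that retrying would only reproduce the same failure.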


was (Author: kwho):
Thanks [~stakiar]. In general, Impala should be resilient when one or more 
executors fail and issue transparent retries while not scheduling fragments on 
the known bad hosts. This JIRA is attacking a narrower subset of that problem 
by only addressing the query startup failure.

We need to be careful to use transparent retries only for transient recoverable 
failures. For instance, we shouldn't retry if it will lead to the same failure 
(e.g. memory limit exceeded). There may also be a change in behavior in how 
Impala may expose results to the clients. In particular, we may not be able to 
support both streaming result sets and transparent retries for all queries as 
some of them are non-deterministic so it may not be trivial to support the 
behavior of exposing a subset of the results and then replay to the point of 
failure.

> Coordinator should be more resilient to fragment instances startup failure
> --
>
> Key: IMPALA-8339
> URL: https://issues.apache.org/jira/browse/IMPALA-8339
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: Availability, resilience
>
> Impala currently relies on statestore for cluster membership. When an Impala 
> executor goes offline, it may take a while for statestore to declare that 
> node as unavailable and for that information to be propagated to all 
> coordinator nodes. Within this window, some coordinator nodes may still 
> attempt to issue RPCs to the faulty node, resulting in RPC failures that in 
> turn cause query failures. In other words, many queries may fail to start 
> within this window until all coordinator nodes get the latest information on 
> cluster membership.
> Going forward, the coordinator may need to fall back to using backup executors 
> for each fragment in case some of the executors are not available. Moreover, 
> *coordinator should treat the cluster membership information from statestore 
> (or any external source of truth e.g. etcd) as hints instead of ground truth* 
> and adjust the scheduling of fragment instances based on the availability of 
> the executors from the coordinator's perspective.
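
The idea of treating statestore membership as a hint can be sketched as below. This is a minimal illustration under stated assumptions, not the scheduler's actual logic: pick_executor, the host names, and the notion of a per-coordinator set of unreachable hosts are all hypothetical.

```python
def pick_executor(preferred, backups, statestore_members, locally_unreachable):
    """Return the first usable host for a fragment, or None if there is none.

    statestore_members: hosts the statestore currently reports as alive (a hint).
    locally_unreachable: hosts this coordinator has recently failed to reach.
    """
    for host in [preferred] + list(backups):
        if host in locally_unreachable:
            continue   # the coordinator's own observation overrides the hint
        if host in statestore_members:
            return host
    return None


members = {"exec-1", "exec-2", "exec-3"}   # statestore view, possibly stale
unreachable = {"exec-1"}                   # an RPC to exec-1 just failed
chosen = pick_executor("exec-1", ["exec-2", "exec-3"], members, unreachable)
```

Here the statestore still lists exec-1, but the coordinator skips it and schedules the fragment on the first backup it has not seen fail.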




[jira] [Closed] (IMPALA-8370) Impala Doc: Impala works with Hive 3

2019-05-03 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8370.
---
Resolution: Not A Problem

No user-facing doc impact for IMPALA-8369 per [~vihangk1]

> Impala Doc: Impala works with Hive 3
> 
>
> Key: IMPALA-8370
> URL: https://issues.apache.org/jira/browse/IMPALA-8370
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>





[jira] [Commented] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure

2019-05-03 Thread Michael Ho (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832710#comment-16832710
 ] 

Michael Ho commented on IMPALA-8339:


Thanks [~stakiar]. In general, Impala should be resilient when one or more 
executors fail and issue transparent retries while not scheduling fragments on 
the known bad hosts. This JIRA is attacking a narrower subset of that problem 
by only addressing the query startup failure.

We need to be careful to use transparent retries only for transient recoverable 
failures. For instance, we shouldn't retry if it will lead to the same failure 
(e.g. memory limit exceeded). There may also be a change in behavior in how 
Impala may expose results to the clients. In particular, we may not be able to 
support both streaming result sets and transparent retries for all queries as 
some of them are non-deterministic so it may not be trivial to support the 
behavior of exposing a subset of the results and then replay to the point of 
failure.

> Coordinator should be more resilient to fragment instances startup failure
> --
>
> Key: IMPALA-8339
> URL: https://issues.apache.org/jira/browse/IMPALA-8339
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: Availability, resilience
>
> Impala currently relies on statestore for cluster membership. When an Impala 
> executor goes offline, it may take a while for statestore to declare that 
> node as unavailable and for that information to be propagated to all 
> coordinator nodes. Within this window, some coordinator nodes may still 
> attempt to issue RPCs to the faulty node, resulting in RPC failures that in 
> turn cause query failures. In other words, many queries may fail to start 
> within this window until all coordinator nodes get the latest information on 
> cluster membership.
> Going forward, the coordinator may need to fall back to using backup executors 
> for each fragment in case some of the executors are not available. Moreover, 
> *coordinator should treat the cluster membership information from statestore 
> (or any external source of truth e.g. etcd) as hints instead of ground truth* 
> and adjust the scheduling of fragment instances based on the availability of 
> the executors from the coordinator's perspective.




[jira] [Assigned] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure

2019-05-03 Thread Michael Ho (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-8339:
--

Assignee: Thomas Tauber-Marshall

> Coordinator should be more resilient to fragment instances startup failure
> --
>
> Key: IMPALA-8339
> URL: https://issues.apache.org/jira/browse/IMPALA-8339
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: Availability, resilience
>
> Impala currently relies on statestore for cluster membership. When an Impala 
> executor goes offline, it may take a while for statestore to declare that 
> node as unavailable and for that information to be propagated to all 
> coordinator nodes. Within this window, some coordinator nodes may still 
> attempt to issue RPCs to the faulty node, resulting in RPC failures that in 
> turn cause query failures. In other words, many queries may fail to start 
> within this window until all coordinator nodes get the latest information on 
> cluster membership.
> Going forward, the coordinator may need to fall back to using backup executors 
> for each fragment in case some of the executors are not available. Moreover, 
> *coordinator should treat the cluster membership information from statestore 
> (or any external source of truth e.g. etcd) as hints instead of ground truth* 
> and adjust the scheduling of fragment instances based on the availability of 
> the executors from the coordinator's perspective.




[jira] [Commented] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure

2019-05-03 Thread Sahil Takiar (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832701#comment-16832701
 ] 

Sahil Takiar commented on IMPALA-8339:
--

If we want to go with a blacklisting approach, Spark built a similar feature 
that might be worth looking at: 
https://blog.cloudera.com/blog/2017/04/blacklisting-in-apache-spark/ (although 
things are more complex in the Spark world because of task retries).

Blacklisting is also interesting in the context of query retries; e.g. if a 
query fails due to a bad disk, the failed fragments should probably be retried 
on a different set of nodes.

> Coordinator should be more resilient to fragment instances startup failure
> --
>
> Key: IMPALA-8339
> URL: https://issues.apache.org/jira/browse/IMPALA-8339
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Michael Ho
>Priority: Major
>  Labels: Availability, resilience
>
> Impala currently relies on statestore for cluster membership. When an Impala 
> executor goes offline, it may take a while for statestore to declare that 
> node as unavailable and for that information to be propagated to all 
> coordinator nodes. Within this window, some coordinator nodes may still 
> attempt to issue RPCs to the faulty node, resulting in RPC failures that in 
> turn cause query failures. In other words, many queries may fail to start 
> within this window until all coordinator nodes get the latest information on 
> cluster membership.
> Going forward, the coordinator may need to fall back to using backup executors 
> for each fragment in case some of the executors are not available. Moreover, 
> *coordinator should treat the cluster membership information from statestore 
> (or any external source of truth e.g. etcd) as hints instead of ground truth* 
> and adjust the scheduling of fragment instances based on the availability of 
> the executors from the coordinator's perspective.




[jira] [Commented] (IMPALA-8409) STRINGs without stats have too low row-size in explain plan

2019-05-03 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832672#comment-16832672
 ] 

ASF subversion and git services commented on IMPALA-8409:
-

Commit c2516d220da8e532b6ebdb6f3a12e7ad97c4f597 in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c2516d2 ]

IMPALA-8409: Fix row-size for STRING columns with unknown stats

Explain returned row-size=11B for STRING columns without statistics.
The issue was caused by adding -1 (meaning unknown) to the 12 byte
slot size (sizeof(StringValue)). The code in TupleDescriptor.java
tried to handle this by checking if the size is -1, but it was
already 11 at this point.
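
The arithmetic behind the bug can be reproduced in a few lines. This is a sketch only and is not the Java code in TupleDescriptor.java: the function names are hypothetical, and treating an unknown average size as contributing zero extra bytes is an assumption about one plausible fix.

```python
STRING_SLOT_SIZE = 12   # sizeof(StringValue), per the commit message
UNKNOWN = -1            # sentinel meaning "no avg_size statistic"


def buggy_row_size(avg_size):
    total = STRING_SLOT_SIZE + avg_size   # -1 is added as if it were a size
    if total == UNKNOWN:                  # too late: total is already 11
        return STRING_SLOT_SIZE
    return total


def fixed_row_size(avg_size):
    if avg_size == UNKNOWN:               # check the sentinel before adding
        return STRING_SLOT_SIZE
    return STRING_SLOT_SIZE + avg_size


buggy = buggy_row_size(UNKNOWN)
fixed = fixed_row_size(UNKNOWN)
```

With an unknown average size, the buggy version yields 11 and the fixed version yields 12, which restores the invariant that row-size is at least the tuple size.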

There is more potential for cleanup, but I wanted to keep this
change minimal.

Testing:
- revived some tests in CatalogTest.java that were removed
  in 2013 due to flakiness
- added an EE test that checks row size with and without stats
- fixed a similar test, test_explain_validate_cardinality_estimates
  (the format of the line it looks for has changed, which led to
  skipping the actual verification and accepting everything)
- ran core FE and EE tests

Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Reviewed-on: http://gerrit.cloudera.org:8080/13190
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> STRINGs without stats have too low row-size in explain plan
> ---
>
> Key: IMPALA-8409
> URL: https://issues.apache.org/jira/browse/IMPALA-8409
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Csaba Ringhofer
>Assignee: Csaba Ringhofer
>Priority: Minor
>  Labels: explain, statistics
>
> STRING columns without the avg_size statistic are counted in the row-size as 
> 11 bytes, while they take 12 bytes in the tuple (plus more elsewhere in 
> memory if they are not empty). The issue is caused by adding -1 (meaning 
> unknown) to the 12 byte slot size.
> I think that this doesn't cause problems, as the estimation is probably way 
> off without statistics anyway, but row-size >= tuple size seems like a 
> meaningful invariant that we shouldn't break.
> Reproduce:
> {code}
> create table test_row_size (s string);
> explain select * from test_row_size; 
> Result:
> ...
> WARNING: The following tables are missing relevant table and/or column 
> statistics.
> default.test_row_size
> ...
> 00:SCAN HDFS [default.test_row_size]
>partitions=1/1 files=0 size=0B
>row-size=11B cardinality=0
> {code}




[jira] [Commented] (IMPALA-8369) Impala should be able to interoperate with Hive 3.1.0

2019-05-03 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832674#comment-16832674
 ] 

ASF subversion and git services commented on IMPALA-8369:
-

Commit 99e1a39b908b81a94ef8cf4b41458c388a34755c in impala's branch 
refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=99e1a39 ]

Bump CDP_BUILD_NUMBER to 1056671

This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the
Hive and Tez patches required for building against Hive 3. With this
change we get rid of the custom builds for Hive and Tez introduced in
IMPALA-8369 and switch to more official sources of builds for the
minicluster.

Notes:
1. The tarball names and the directory to which they extract changed
from the previous CDP_BUILD_NUMBER. Due to this we need to change the
bootstrap_toolchain and impala-config.sh so that the Hive environment
variables are set correctly.

Testing Done:
1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE
2. Did basic testing from Impala and Beeline to test the Tez
patch
3. Currently running the full-suite of tests to make sure there are no
regressions

Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
Reviewed-on: http://gerrit.cloudera.org:8080/13213
Reviewed-by: Fredy Wijaya 
Tested-by: Impala Public Jenkins 


> Impala should be able to interoperate with Hive 3.1.0
> -
>
> Key: IMPALA-8369
> URL: https://issues.apache.org/jira/browse/IMPALA-8369
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: impala-acid
>
> Currently, Impala only works with Hive 2.1.1. Since Hive 3.1.0 has been 
> released for a while it would be good to add support for Hive 3.1.0 (HMS 
> 3.1.0). This patch will focus on ability to connect to HMS 3.1.0 and run 
> existing tests. It will not focus on adding support for newer features like 
> ACID in Hive 3.1.0 which can be taken up as separate JIRA.
> It would be good to make changes to Impala source code such that it can work 
> with both Hive 2.1.0 and Hive 3.1.0 without the need to create a separate 
> branch. However, this should be an aspirational goal. If we hit a blocker we 
> should investigate alternative approaches.




[jira] [Commented] (IMPALA-8482) Include all ranger-audit-plugins runtime dependencies

2019-05-03 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832673#comment-16832673
 ] 

ASF subversion and git services commented on IMPALA-8482:
-

Commit 04be046ecc3a4d43a62dc834ea4925f979d2dc27 in impala's branch 
refs/heads/master from Fredy Wijaya
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=04be046 ]

IMPALA-8482: Package ranger-plugins-audit runtime dependencies

This patch includes ranger-plugins-audit runtime dependencies to allow
ranger-plugins-audit to communicate with different audit providers, such
as solr, kafka, etc.

Testing:
- Ran core tests

Change-Id: If4c88958b064032ebaedd45808482f1179e6d806
Reviewed-on: http://gerrit.cloudera.org:8080/13216
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Include all ranger-audit-plugins runtime dependencies
> -
>
> Key: IMPALA-8482
> URL: https://issues.apache.org/jira/browse/IMPALA-8482
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Critical
> Fix For: Impala 3.3.0
>
>
> Impala needs to package ranger-audit-plugins runtime dependencies so that 
> ranger-audit-plugins works as expected against various audit providers, e.g. 
> solr, kafka, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Created] (IMPALA-8489) .TestRecoverPartitions.test_post_invalidate fails with IllegalStateException with local catalog

2019-05-03 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-8489:
-

 Summary: .TestRecoverPartitions.test_post_invalidate fails with 
IllegalStateException with local catalog
 Key: IMPALA-8489
 URL: https://issues.apache.org/jira/browse/IMPALA-8489
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Todd Lipcon


{noformat}
metadata/test_recover_partitions.py:279: in test_post_invalidate
"INSERT INTO TABLE %s PARTITION(i=002, p='p2') VALUES(4)" % FQ_TBL_NAME)
common/impala_test_suite.py:620: in wrapper
return function(*args, **kwargs)
common/impala_test_suite.py:628: in execute_query_expect_success
result = cls.__execute_query(impalad_client, query, query_options, user)
common/impala_test_suite.py:722: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:180: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:187: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:364: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:385: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    Query aborted:IllegalArgumentException: no such partition id 6244

{noformat}

The failure is reproducible for me locally with catalog v2.




[jira] [Updated] (IMPALA-8488) Authorization tests for Ranger breaks on S3 in test_show_grant

2019-05-03 Thread Laszlo Gaal (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Gaal updated IMPALA-8488:

Labels: broken-build  (was: )

> Authorization tests for Ranger breaks on S3 in test_show_grant
> --
>
> Key: IMPALA-8488
> URL: https://issues.apache.org/jira/browse/IMPALA-8488
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.3.0
>Reporter: Laszlo Gaal
>Assignee: Austin Nobis
>Priority: Major
>  Labels: broken-build
>
> Stack Trace:
> {code:java}
> authorization/test_ranger.py:170: in test_show_grant
> unique_table)
> authorization/test_ranger.py:261: in _test_show_grant_basic
> [kw, id, "", "", "", "hdfs://localhost:20500" + uri, "", "all", "false"]])
> authorization/test_ranger.py:346: in _check_privileges
> assert map(columns, result.data) == expected
> E assert [['USER', 'je...-1/tmp', ...]] == [['USER', 'jen...00/tmp', ...]]
> E At index 0 diff: ['USER', 'jenkins', '', '', '', 
> 's3a://impala-test-uswest2-1/tmp', '', 'all', 'false'] != ['USER', 'jenkins', 
> '', '', '', 'hdfs://localhost:20500/tmp', '', 'all', 'false']
> E Full diff:
> E [['USER',
> E 'jenkins',
> E '',
> E '',
> E '',
> E - 's3a://impala-test-uswest2-1/tmp',
> E + 'hdfs://localhost:20500/tmp',
> E '',
> E 'all',
> E 'false']]{code}
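
The diff above shows the expected value built from a hard-coded HDFS prefix while the actual value carries the S3 prefix. The ticket does not say how the test was fixed; the sketch below only illustrates the general idea of deriving the expected URI from whatever filesystem prefix the test run uses. The helper expected_uri is hypothetical.

```python
def expected_uri(filesystem_prefix, path):
    """Join a filesystem prefix and an absolute path into one URI string."""
    return filesystem_prefix.rstrip("/") + "/" + path.lstrip("/")


# The two prefixes below are the ones that appear in the stack trace.
hdfs_expected = expected_uri("hdfs://localhost:20500", "/tmp")
s3_expected = expected_uri("s3a://impala-test-uswest2-1", "/tmp")
```

With the prefix passed in rather than hard-coded, the same assertion would hold on both HDFS and S3 runs.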




[jira] [Created] (IMPALA-8488) Authorization tests for Ranger breaks on S3 in test_show_grant

2019-05-03 Thread Laszlo Gaal (JIRA)

Laszlo Gaal created IMPALA-8488:
---

 Summary: Authorization tests for Ranger breaks on S3 in 
test_show_grant
 Key: IMPALA-8488
 URL: https://issues.apache.org/jira/browse/IMPALA-8488
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 3.3.0
Reporter: Laszlo Gaal
Assignee: Austin Nobis


Stack Trace:
{code:java}
authorization/test_ranger.py:170: in test_show_grant
unique_table)
authorization/test_ranger.py:261: in _test_show_grant_basic
[kw, id, "", "", "", "hdfs://localhost:20500" + uri, "", "all", "false"]])
authorization/test_ranger.py:346: in _check_privileges
assert map(columns, result.data) == expected
E assert [['USER', 'je...-1/tmp', ...]] == [['USER', 'jen...00/tmp', ...]]
E At index 0 diff: ['USER', 'jenkins', '', '', '', 
's3a://impala-test-uswest2-1/tmp', '', 'all', 'false'] != ['USER', 'jenkins', 
'', '', '', 'hdfs://localhost:20500/tmp', '', 'all', 'false']
E Full diff:
E [['USER',
E 'jenkins',
E '',
E '',
E '',
E - 's3a://impala-test-uswest2-1/tmp',
E + 'hdfs://localhost:20500/tmp',
E '',
E 'all',
E 'false']]{code}
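
The diff above shows an expectation built from a hard-coded "hdfs://localhost:20500" prefix, while the S3 run reports an s3a:// URI. A minimal sketch of one way to make such an expectation filesystem-agnostic follows. The helpers `expected_uri` and `expected_row` and the `default_fs` parameter are hypothetical names for illustration only; they are not taken from test_ranger.py, and the real test would need to obtain the prefix from whatever its environment exposes.

```python
def expected_uri(default_fs, path):
    """Join a filesystem prefix and an absolute path into one URI.

    Hypothetical helper: default_fs stands in for the prefix of the
    filesystem the test run targets (HDFS, S3, ...).
    """
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


def expected_row(principal_type, principal, default_fs, path):
    # Same column layout as the rows shown in the stack trace.
    return [principal_type, principal, "", "", "",
            expected_uri(default_fs, path), "", "all", "false"]


hdfs_row = expected_row("USER", "jenkins", "hdfs://localhost:20500", "/tmp")
s3_row = expected_row("USER", "jenkins", "s3a://impala-test-uswest2-1", "/tmp")
```

With the prefix passed in rather than hard-coded, the same assertion can be reused for both the HDFS and the S3 runs.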




[jira] [Resolved] (IMPALA-8482) Include all ranger-audit-plugins runtime dependencies

2019-05-03 Thread Fredy Wijaya (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya resolved IMPALA-8482.
--
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Include all ranger-audit-plugins runtime dependencies
> -
>
> Key: IMPALA-8482
> URL: https://issues.apache.org/jira/browse/IMPALA-8482
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Critical
> Fix For: Impala 3.3.0
>
>
> Impala needs to package the ranger-audit-plugins runtime dependencies so that 
> ranger-audit-plugins works as expected against various audit providers, e.g. 
> Solr, Kafka, etc.




[jira] [Commented] (IMPALA-8403) Possible thread leak in impalad

2019-05-03 Thread Adriano (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832467#comment-16832467
 ] 

Adriano commented on IMPALA-8403:
-

Hi, I would like to add the repro steps [^reproIMPALA-8403.tgz] that I followed 
to increase the number of threads (submitting a query with many fragments and 
then cancelling it once the fragments were executing). This may be a duplicate; 
if it is not, I would appreciate your help getting this JIRA into the 
backlog.
Many thanks,
Adriano

> Possible thread leak in impalad
> ---
>
> Key: IMPALA-8403
> URL: https://issues.apache.org/jira/browse/IMPALA-8403
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Quanlong Huang
>Priority: Major
> Attachments: image-2019-04-10-11-15-11-321.png, reproIMPALA-8403.tgz
>
>
> The thread-manager.running-threads metric, read from 
> http://${impalad_host}:25000/metrics?json, shows that the number of running 
> threads keeps increasing (see the snapshot). This is most noticeable on 
> coordinators.
> This may be a counter bug or a thread leak.
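
One way to check whether the value really keeps growing is to sample the metric periodically. The following is a minimal sketch, assuming the endpoint returns JSON in which each metric is an object with "name" and "value" keys somewhere in the document. That layout is an assumption, which is why the lookup walks the whole structure. `find_metric`, `fetch_metric` and `keeps_increasing` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

METRIC = "thread-manager.running-threads"


def find_metric(doc, name):
    """Depth-first search for a dict whose 'name' equals name; return its 'value'."""
    if isinstance(doc, dict):
        if doc.get("name") == name and "value" in doc:
            return doc["value"]
        children = doc.values()
    elif isinstance(doc, list):
        children = doc
    else:
        return None
    for child in children:
        found = find_metric(child, name)
        if found is not None:
            return found
    return None


def fetch_metric(host, name=METRIC, port=25000):
    """Fetch the metrics page as JSON and extract a single metric (needs a live impalad)."""
    with urlopen("http://%s:%d/metrics?json" % (host, port)) as resp:
        return find_metric(json.load(resp), name)


def keeps_increasing(samples):
    """True if every sample is strictly larger than the one before it."""
    return len(samples) > 1 and all(a < b for a, b in zip(samples, samples[1:]))
```

Collecting `fetch_metric(host)` at a fixed interval while the cluster is idle and passing the list to `keeps_increasing` would help separate a steady leak from normal fluctuation under load.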




[jira] [Updated] (IMPALA-8403) Possible thread leak in impalad

2019-05-03 Thread Adriano (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano updated IMPALA-8403:

Attachment: reproIMPALA-8403.tgz

> Possible thread leak in impalad
> ---
>
> Key: IMPALA-8403
> URL: https://issues.apache.org/jira/browse/IMPALA-8403
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Quanlong Huang
>Priority: Major
> Attachments: image-2019-04-10-11-15-11-321.png, reproIMPALA-8403.tgz
>
>
> The thread-manager.running-threads metric, read from 
> http://${impalad_host}:25000/metrics?json, shows that the number of running 
> threads keeps increasing (see the snapshot). This is most noticeable on 
> coordinators.
> This may be a counter bug or a thread leak.




[jira] [Resolved] (IMPALA-8467) ParquetPlainEncoder::Decode leads to multiple test failures in ASAN builds

2019-05-03 Thread Daniel Becker (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Becker resolved IMPALA-8467.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> ParquetPlainEncoder::Decode leads to multiple test failures in ASAN builds
> --
>
> Key: IMPALA-8467
> URL: https://issues.apache.org/jira/browse/IMPALA-8467
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Laszlo Gaal
>Assignee: Daniel Becker
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.3.0
>
>
> This is an example of the logged failures:
> {code:java}
> 00:57:35.147 15/106 Test #15: parquet-plain-test ...***Failed 
> 0.48 sec
> 00:57:35.147 [==] Running 4 tests from 1 test case.
> 00:57:35.147 [--] Global test environment set-up.
> 00:57:35.148 [--] 4 tests from PlainEncoding
> 00:57:35.148 [ RUN ] PlainEncoding.Basic
> 00:57:35.148 =
> 00:57:35.148 ==1922==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow 
> on address 0x7ffe328ee44c at pc 0x017c07bc bp 0x7ffe328ee2f0 sp 
> 0x7ffe328edaa0
> 00:57:35.148 READ of size 16 at 0x7ffe328ee44c thread T0
> 00:57:35.148 #0 0x17c07bb in __asan_memcpy 
> /mnt/source/llvm/llvm-5.0.1.src-p1/projects/compiler-rt/lib/asan/asan_interceptors.cc:466
> 00:57:35.149 #1 0x1837a26 in void 
> impala::ParquetPlainEncoder::DecodeNoBoundsCheck (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, 
> impala::TimestampValue*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:332:3
> 00:57:35.149 #2 0x1837a26 in int 
> impala::ParquetPlainEncoder::Decode (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, 
> impala::TimestampValue*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:223
> 00:57:35.150 #3 0x1837216 in void 
> impala::TestTypeWidening (parquet::Type::type)3>(impala::TimestampValue const&, int) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:115:22
> 00:57:35.150 #4 0x18122f7 in impala::PlainEncoding_Basic_Test::TestBody() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:155:3
> 00:57:35.151 #5 0x4fa6142 in void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4fa6142)
> 00:57:35.151 #6 0x4f9d909 in testing::Test::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9d909)
> 00:57:35.152 #7 0x4f9da57 in testing::TestInfo::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9da57)
> 00:57:35.152 #8 0x4f9db34 in testing::TestCase::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9db34)
> 00:57:35.153 #9 0x4f9edb7 in testing::internal::UnitTestImpl::RunAllTests() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9edb7)
> 00:57:35.153 #10 0x4f9f092 in testing::UnitTest::Run() 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9f092)
> 00:57:35.153 #11 0x181655f in main 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:491:1
> 00:57:35.154 #12 0x7ff7a10b2c04 in __libc_start_main 
> (/lib64/libc.so.6+0x21c04)
> 00:57:35.154 #13 0x17069d6 in _start 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x17069d6)
> 00:57:35.154
> 00:57:35.154 Address 0x7ffe328ee44c is located in stack of thread T0 at 
> offset 332 in frame
> 00:57:35.154 #0 0x18378df in int 
> impala::ParquetPlainEncoder::Decode (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, 
> impala::TimestampValue*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:208
> 00:57:35.155
> 00:57:35.155 This frame has 4 object(s):
> 00:57:35.155 [32, 40) 'ref.tmp.i' (line 327)
> 00:57:35.155 [64, 68) 'ref.tmp2.i' (line 327)
> 00:57:35.155 [80, 96) 'ref.tmp5.i' (line 327)
> 00:57:35.155 [112, 120) 'ref.tmp6.i' (line 327) <== Memory access at offset 
> 332 overflows this variable
> 00:57:35.155 HINT: this may be a false positive if your program uses some

[jira] [Resolved] (IMPALA-8468) buildall.sh should warn that asan/ubsan/... are exclusive

2019-05-03 Thread Csaba Ringhofer (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-8468.
-
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> buildall.sh should warn that asan/ubsan/... are exclusive
> -
>
> Key: IMPALA-8468
> URL: https://issues.apache.org/jira/browse/IMPALA-8468
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Csaba Ringhofer
>Priority: Minor
> Fix For: Impala 3.3.0
>
>
> "buildall.sh -asan -ubsan -tsan -tidy" runs without giving any warning, but 
> only tsan actually takes effect. See 
> https://github.com/apache/impala/blob/931a8f0ba7f45d5b1608e62aff397b517b943e95/buildall.sh#L308
>  for the logic behind this.
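
The kind of check being asked for can be sketched as follows. This is an illustration in Python rather than the shell code of buildall.sh itself; the flag list is taken only from the example command above, and `conflicting_flags` and `exclusivity_warning` are hypothetical names.

```python
# Flags from the example command; treated here as mutually exclusive.
EXCLUSIVE_FLAGS = ("-asan", "-ubsan", "-tsan", "-tidy")


def conflicting_flags(argv):
    """Return the mutually exclusive flags present in argv, in the order given."""
    return [arg for arg in argv if arg in EXCLUSIVE_FLAGS]


def exclusivity_warning(argv):
    """Return a warning string if more than one exclusive flag was passed, else None."""
    found = conflicting_flags(argv)
    if len(found) > 1:
        return "WARNING: %s are mutually exclusive" % ", ".join(found)
    return None
```

Calling `exclusivity_warning(["-asan", "-ubsan", "-tsan", "-tidy"])` yields a warning naming all four flags, while a single flag yields `None`.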




[jira] [Assigned] (IMPALA-8487) Debug page "Cancel" action actually unregisters query

2019-05-03 Thread Alice Fan (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alice Fan reassigned IMPALA-8487:
-

Assignee: Alice Fan

> Debug page "Cancel" action actually unregisters query
> -
>
> Key: IMPALA-8487
> URL: https://issues.apache.org/jira/browse/IMPALA-8487
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.0
>Reporter: Alice Fan
>Assignee: Alice Fan
>Priority: Major
>  Labels: query-lifecycle
>
> Currently, if a running query is cancelled from the impalad WebUI (debug 
> page), Impala unregisters the query.




[jira] [Updated] (IMPALA-7031) Add additional info to query canceled from http endpoint

2019-05-03 Thread Alice Fan (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alice Fan updated IMPALA-7031:
--
Summary: Add additional info to query canceled from http endpoint  (was: 
Debug page "Cancel" action actually unregisters query)

> Add additional info to query canceled from http endpoint
> 
>
> Key: IMPALA-7031
> URL: https://issues.apache.org/jira/browse/IMPALA-7031
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Distributed Exec
>Affects Versions: Impala 3.0
>Reporter: Adriano
>Assignee: Alice Fan
>Priority: Major
>  Labels: query-lifecycle
> Attachments: Screen Shot 2018-07-20 at 10.19.42.png
>
>
> In big clusters with many JDBC/ODBC users, scripts that automatically cancel 
> queries (e.g. long-running queries) are often deployed to save resources; 
> these scripts typically use the Impala WebUI.
> Typical scenario:
>  # A JDBC/ODBC client submits a query.
>  # The Coordinator starts executing the query.
>  # The query is cancelled from the Coordinator WebUI.
>  # The JDBC/ODBC client asks the Coordinator for the query status 
> (GetOperationStatus).
>  # The Coordinator answers "unknown query ID" (as the query was cancelled).
>  # From the client's perspective, the query failed with "unknown query ID".
> Currently, if a running query is cancelled from the impalad WebUI, the client 
> just receives an 'unknown query ID' error on the next 
> fetch/getOperationStatus attempt. It would be good to be able to explicitly 
> call out this case.
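
The scenario above can be modelled with a toy registry. This is a sketch under stated assumptions, not Impala's implementation: `QueryRegistry`, its methods, and the `keep_cancelled` switch are hypothetical, and the "CANCELLED" message is only one possible way to call out the case.

```python
class QueryRegistry:
    """Toy model of a coordinator's query bookkeeping (hypothetical)."""

    def __init__(self, keep_cancelled=False):
        self.running = {}
        self.cancelled = {}
        self.keep_cancelled = keep_cancelled

    def submit(self, query_id):
        self.running[query_id] = "RUNNING"

    def cancel_from_webui(self, query_id):
        # Behaviour described in the issue: the query is unregistered.
        del self.running[query_id]
        if self.keep_cancelled:
            # Assumed improvement: remember why the query went away.
            self.cancelled[query_id] = "cancelled from the web UI"

    def get_operation_status(self, query_id):
        if query_id in self.running:
            return self.running[query_id]
        if query_id in self.cancelled:
            return "CANCELLED: " + self.cancelled[query_id]
        return "unknown query ID"
```

With `keep_cancelled=False` a status request after cancellation returns "unknown query ID", matching the reported behaviour; with `keep_cancelled=True` the client instead learns that the query was cancelled and from where.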




[jira] [Created] (IMPALA-8486) test_udf_update_via_drop and test_udf_update_via_create fail on local catalog

2019-05-03 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-8486:
-

 Summary: test_udf_update_via_drop and test_udf_update_via_create 
fail on local catalog
 Key: IMPALA-8486
 URL: https://issues.apache.org/jira/browse/IMPALA-8486
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Todd Lipcon


{noformat}
 TestUdfTargeted.test_udf_update_via_drop[protocol: beeswax | exec_option: 
{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
'disable_codegen': False, 'abort_on_error': 1, 
'exec_single_node_rows_threshold': 0} | table_format: text/none] 
tests/query_test/test_udfs.py:541: in test_udf_update_via_drop
self._run_query_all_impalads(exec_options, query_stmt, ["New UDF"])
tests/query_test/test_udfs.py:52: in _run_query_all_impalads
assert result.data == expected
E   assert ['Old UDF'] == ['New UDF']
E At index 0 diff: 'Old UDF' != 'New UDF'
E Full diff:
E - ['Old UDF']
E + ['New UDF']

{noformat}


The tests are checking that the local UDF caches on each impalad get 
invalidated by a drop/create of a function referencing the HDFS file containing 
the UDF. The test fails because the local catalog, unlike the regular catalog, 
doesn't invalidate LibCache entries upon receiving a catalog update.

I looked at this for long enough to realise that the invalidation mechanism is 
fundamentally broken - it doesn't work with dedicated executors. It also 
creates a race between the statestore updates and queries referencing the UDFs 
- if the queries win the race, then they can incorrectly use the old version 
that should have been invalidated.

I think this is a potentially problematic issue because old JAR/SO versions 
could persist in the cache indefinitely if old versions are overwritten in 
place.
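
The in-place overwrite problem can be illustrated with a toy cache. This is a sketch, not LibCache: `PathKeyedCache` and `MtimeCheckedCache` are hypothetical classes, the `storage` dict stands in for the filesystem, and checking the modification time is an assumed mitigation used for illustration, not a fix proposed in this issue.

```python
class PathKeyedCache:
    """Caches library contents by path only (toy model, not LibCache)."""

    def __init__(self, storage):
        self.storage = storage  # dict: path -> (mtime, contents)
        self.entries = {}

    def get(self, path):
        # Once loaded, an entry is never refreshed without explicit invalidation.
        if path not in self.entries:
            self.entries[path] = self.storage[path][1]
        return self.entries[path]


class MtimeCheckedCache(PathKeyedCache):
    """Same cache, but reloads when the stored mtime changes (assumed mitigation)."""

    def get(self, path):
        mtime, contents = self.storage[path]
        cached = self.entries.get(path)
        if cached is None or cached[0] != mtime:
            self.entries[path] = (mtime, contents)
        return self.entries[path][1]
```

After the file at a path is overwritten in place, the path-keyed cache keeps returning the old contents, while the mtime-checked variant picks up the new version on the next lookup.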




