[jira] [Work started] (IMPALA-8494) Impala Doc: Document GRANT/REVOKE privilege to GROUP
[ https://issues.apache.org/jira/browse/IMPALA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-8494 started by Alex Rodoni.

> Impala Doc: Document GRANT/REVOKE privilege to GROUP
>
> Key: IMPALA-8494
> URL: https://issues.apache.org/jira/browse/IMPALA-8494
> Project: IMPALA
> Issue Type: Sub-task
> Components: Docs
> Reporter: Alex Rodoni
> Assignee: Alex Rodoni
> Priority: Major
> Labels: future_release_doc, in_33

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-8493) Impala Doc: Document GRANT/REVOKE privilege to USER
[ https://issues.apache.org/jira/browse/IMPALA-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-8493 started by Alex Rodoni.

> Impala Doc: Document GRANT/REVOKE privilege to USER
>
> Key: IMPALA-8493
> URL: https://issues.apache.org/jira/browse/IMPALA-8493
> Project: IMPALA
> Issue Type: Sub-task
> Components: Docs
> Reporter: Alex Rodoni
> Assignee: Alex Rodoni
> Priority: Major
> Labels: future_release_doc, in_33
[jira] [Created] (IMPALA-8494) Impala Doc: Document GRANT/REVOKE privilege to GROUP
Alex Rodoni created IMPALA-8494:

Summary: Impala Doc: Document GRANT/REVOKE privilege to GROUP
Key: IMPALA-8494
URL: https://issues.apache.org/jira/browse/IMPALA-8494
Project: IMPALA
Issue Type: Sub-task
Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni
[jira] [Created] (IMPALA-8493) Impala Doc: Document GRANT/REVOKE privilege to USER
Alex Rodoni created IMPALA-8493:

Summary: Impala Doc: Document GRANT/REVOKE privilege to USER
Key: IMPALA-8493
URL: https://issues.apache.org/jira/browse/IMPALA-8493
Project: IMPALA
Issue Type: Sub-task
Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni
[jira] [Assigned] (IMPALA-3816) Codegen perf-critical loops in Sorter
[ https://issues.apache.org/jira/browse/IMPALA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-3816:

Assignee: Abhishek Rawat (was: Tianyi Wang)

> Codegen perf-critical loops in Sorter
>
> Key: IMPALA-3816
> URL: https://issues.apache.org/jira/browse/IMPALA-3816
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.7.0
> Reporter: Tim Armstrong
> Assignee: Abhishek Rawat
> Priority: Minor
> Labels: codegen
> Attachments: percentile query profile.txt, tpch_30.txt
>
> In the sorter, we codegen the comparator function but call it indirectly via a function pointer. We should consider codegening the perf-critical loops so that we can make the comparator function call direct and inlinable. Inlining the comparison will be very beneficial if it is trivial, e.g. order by a numeric column: I expect sorts on simple keys will get noticeably faster.
> We should also be able to get rid of FreeLocalAllocations() calls for most comparators, although I'm not sure what the best way to approach that is.
> The Partition() loop is the most perf-critical, followed by InsertionSort().
> We also don't do this yet for the TopN node, see IMPALA-3815.
> Mostafa's analysis:
> While evaluating Sort performance I noticed that the codegened compare function is not inlined, which results in large overhead per row.
> Expected speedup is 10-15%.
> {code}
> /// Returns a negative value if lhs is less than rhs, a positive value if lhs is
> /// greater than rhs, or 0 if they are equal. All exprs (ordering_exprs_lhs_ and
> /// ordering_exprs_rhs_) must have been prepared and opened before calling this,
> /// i.e. 'sort_key_exprs' in the constructor must have been opened.
> int ALWAYS_INLINE Compare(const TupleRow* lhs, const TupleRow* rhs) const {
>   return codegend_compare_fn_ == NULL ?
>       CompareInterpreted(lhs, rhs) :
>       (*codegend_compare_fn_)(ordering_expr_evals_lhs_.data(),
>           ordering_expr_evals_rhs_.data(), lhs, rhs);
> }
> {code}
> From Perf
> {code}
>       │ bool Sorter::TupleSorter::Less(const TupleRow* lhs, const TupleRow* rhs) {
>  7.43 │   push   %rbp
>  3.23 │   mov    %rsp,%rbp
>  9.44 │   push   %r12
>  2.69 │   push   %rbx
>  3.89 │   mov    %rsi,%r12
>  2.98 │   mov    %rdi,%rbx
>  6.06 │   sub    $0x10,%rsp
>       │ --num_comparisons_till_free_;
>       │ DCHECK_GE(num_comparisons_till_free_, 0);
>       │ if (UNLIKELY(num_comparisons_till_free_ == 0)) {
[jira] [Commented] (IMPALA-3816) Codegen perf-critical loops in Sorter
[ https://issues.apache.org/jira/browse/IMPALA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832926#comment-16832926 ]

Tim Armstrong commented on IMPALA-3816:

See IMPALA-4065 - Tianyi had a patch that tackled both.
[jira] [Assigned] (IMPALA-4065) Inline comparator calls into TopN::InsertBatch()
[ https://issues.apache.org/jira/browse/IMPALA-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-4065:

Assignee: Abhishek Rawat (was: Tianyi Wang)

> Inline comparator calls into TopN::InsertBatch()
>
> Key: IMPALA-4065
> URL: https://issues.apache.org/jira/browse/IMPALA-4065
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.7.0
> Reporter: Tim Armstrong
> Assignee: Abhishek Rawat
> Priority: Minor
> Labels: codegen, ramp-up
> Attachments: 0001-WIP-IMPALA-3816-IMPALA-4065-full-TupleRowComparator-.patch
>
> This is the more interesting follow-on from IMPALA-3815. We should inline the Compare() calls in the codegen'd TopN code to avoid the indirect function pointer call.
> The tricky aspect is that the Compare() calls are called from std::priority_queue, and we don't have a way to force-inline those functions at the moment.
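The difference between an indirect and a direct comparator call inside std::priority_queue can be sketched in plain C++. This is only an illustration of the general idea, not Impala's TopN or codegen code: `Row`, `LessByKey`, `LessByKeyFunctor`, `TopNIndirect` and `TopNDirect` are hypothetical names, and whether a compiler actually inlines the functor depends on the compiler and its optimization settings.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a row; Impala's TupleRow is not modelled here.
struct Row {
  int key;
};

// Free function. When passed to the heap as a function pointer, every
// comparison is an indirect call that the optimizer usually cannot inline.
bool LessByKey(const Row& a, const Row& b) { return a.key < b.key; }

// Functor. The comparison is part of the heap's type, so its body is visible
// at the call site and is a candidate for inlining.
struct LessByKeyFunctor {
  bool operator()(const Row& a, const Row& b) const { return a.key < b.key; }
};

using FnPtr = bool (*)(const Row&, const Row&);
using IndirectHeap = std::priority_queue<Row, std::vector<Row>, FnPtr>;
using DirectHeap = std::priority_queue<Row, std::vector<Row>, LessByKeyFunctor>;

// Keeps the `limit` smallest keys seen so far in a max-heap and returns them
// in ascending order.
template <typename Heap>
std::vector<int> TopN(Heap heap, const std::vector<int>& keys, std::size_t limit) {
  for (int k : keys) {
    heap.push(Row{k});
    if (heap.size() > limit) heap.pop();  // Drop the current largest key.
  }
  std::vector<int> descending;
  while (!heap.empty()) {
    descending.push_back(heap.top().key);
    heap.pop();
  }
  return std::vector<int>(descending.rbegin(), descending.rend());
}

// Same algorithm, comparator reached through a function pointer.
std::vector<int> TopNIndirect(const std::vector<int>& keys, std::size_t limit) {
  return TopN(IndirectHeap(&LessByKey), keys, limit);
}

// Same algorithm, comparator baked into the heap type.
std::vector<int> TopNDirect(const std::vector<int>& keys, std::size_t limit) {
  return TopN(DirectHeap(), keys, limit);
}
```

Both variants return the same result; the only difference is how the heap reaches the comparison. The issue above concerns the analogous problem for codegen'd comparators, where the heap operations themselves would also have to be inlined.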
[jira] [Commented] (IMPALA-8294) Inconsistent updating of BytesRead* counters
[ https://issues.apache.org/jira/browse/IMPALA-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832923#comment-16832923 ]

Tim Armstrong commented on IMPALA-8294:

If you look in HdfsScanNodeBase, bytes_read_ follows the good pattern where it gets updated by DiskIoMgr in real time. bytes_read_local_ follows the bad pattern.

> Inconsistent updating of BytesRead* counters
>
> Key: IMPALA-8294
> URL: https://issues.apache.org/jira/browse/IMPALA-8294
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.0, Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
> Reporter: Lars Volker
> Assignee: Abhishek Rawat
> Priority: Major
> Labels: observability, profile
>
> Some of the BytesRead* counters in profiles (e.g. BytesReadLocal) are only updated once a query finishes successfully. This leads to confusion because queries that are still running or failed look like they did not read data locally.
[jira] [Assigned] (IMPALA-8294) Inconsistent updating of BytesRead* counters
[ https://issues.apache.org/jira/browse/IMPALA-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-8294:

Assignee: Abhishek Rawat
[jira] [Updated] (IMPALA-8364) Impala Doc: Remove support for authorization policy file
[ https://issues.apache.org/jira/browse/IMPALA-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rodoni updated IMPALA-8364:

Description: https://gerrit.cloudera.org/#/c/13235/

> Impala Doc: Remove support for authorization policy file
>
> Key: IMPALA-8364
> URL: https://issues.apache.org/jira/browse/IMPALA-8364
> Project: IMPALA
> Issue Type: Sub-task
> Components: Docs
> Reporter: Alex Rodoni
> Assignee: Alex Rodoni
> Priority: Critical
> Labels: future_release_doc, in_33
>
> https://gerrit.cloudera.org/#/c/13235/
[jira] [Commented] (IMPALA-8341) Data cache for remote reads
[ https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832887#comment-16832887 ]

Alex Rodoni commented on IMPALA-8341:

@kwho Is this a high priority feature to be documented? As a performance improvement or a scalability improvement?

> Data cache for remote reads
>
> Key: IMPALA-8341
> URL: https://issues.apache.org/jira/browse/IMPALA-8341
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: Michael Ho
> Assignee: Michael Ho
> Priority: Critical
>
> When running in a public cloud (e.g. AWS with S3) or in certain private cloud settings (e.g. data stored in an object store), the computation and storage are no longer co-located. This breaks the typical pattern in which Impala query fragment instances are scheduled where the data is located. In this setting, the network bandwidth requirement of both the NICs and the top-of-rack switches will go up quite a lot, as the network traffic includes the data fetch in addition to the shuffling exchange traffic of intermediate results.
> To mitigate the pressure on the network, one can build a storage-backed cache at the compute nodes to cache the working set. With deterministic scan range scheduling, each compute node should hold non-overlapping partitions of the data set.
> An initial prototype of the cache was posted here: [https://gerrit.cloudera.org/#/c/12683/] but it can probably benefit from a better eviction algorithm (e.g. LRU instead of FIFO) and better locking (e.g. not holding the lock while doing IO).
[jira] [Assigned] (IMPALA-7613) Support round(DECIMAL) with non-constant second argument
[ https://issues.apache.org/jira/browse/IMPALA-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-7613:

Assignee: Abhishek Rawat

> Support round(DECIMAL) with non-constant second argument
>
> Key: IMPALA-7613
> URL: https://issues.apache.org/jira/browse/IMPALA-7613
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.0, Impala 2.12.0
> Reporter: Tim Armstrong
> Assignee: Abhishek Rawat
> Priority: Major
> Labels: decimal, ramp-up
>
> Sometimes users want to round to a precision that is data-driven (e.g. using a lookup table). They can't currently do this with decimal. I think we could support this by just using the input decimal type as the output type when the second argument is non-constant.
> {noformat}
> select round(l_tax, l_linenumber) from tpch.lineitem limit 5;
> Query: select round(l_tax, l_linenumber) from tpch.lineitem limit 5
> Query submitted at: 2018-09-24 11:03:10 (Coordinator: http://tarmstrong-box:25000)
> ERROR: AnalysisException: round() must be called with a constant second argument.
> {noformat}
> Motivated by a user trying to do something like this:
> http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-round-function-does-not-return-expected-result/m-p/80200#M4906
[jira] [Assigned] (IMPALA-8450) Add support for zstd and lz4 in parquet
[ https://issues.apache.org/jira/browse/IMPALA-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-8450:

Assignee: Abhishek Rawat

> Add support for zstd and lz4 in parquet
>
> Key: IMPALA-8450
> URL: https://issues.apache.org/jira/browse/IMPALA-8450
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Abhishek Rawat
> Priority: Major
> Labels: parquet
>
> PARQUET-970 added these codecs to the format. We have LZ4 in the toolchain already and I just added zstd: https://gerrit.cloudera.org/#/c/13079/
> These codecs probably offer a better trade-off of density and speed than snappy or gzip.
> https://github.com/apache/arrow/pull/807/files might be a useful crib sheet for how to add a compressor.
[jira] [Resolved] (IMPALA-8341) Data cache for remote reads
[ https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Ho resolved IMPALA-8341.

Resolution: Fixed

Initial implementation merged. Improvement may be needed in the future.
[jira] [Commented] (IMPALA-8341) Data cache for remote reads
[ https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832843#comment-16832843 ]

ASF subversion and git services commented on IMPALA-8341:

Commit 2ece4c9b2e114a5e8873c5ac69e75b84c62bf5bd in impala's branch refs/heads/master from Michael Ho
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2ece4c9 ]

IMPALA-8341: Data cache for remote reads

This is a patch based on PhilZ's prototype: https://gerrit.cloudera.org/#/c/12683/

This change implements an IO data cache which is backed by local storage. It implicitly relies on the OS page cache management to shuffle data between memory and the storage device. This is useful for caching data read from remote filesystems (e.g. remote HDFS data node, S3, ABFS, ADLS).

A data cache is divided into one or more partitions based on the configuration string, which is a comma-separated list of directories followed by the storage capacity per directory. An example configuration string looks like the following:

--data_cache_config=/data/0,/data/1:150GB

In the configuration above, the cache may use up to 300GB of storage space, with a maximum of 150GB each for /data/0 and /data/1.

Each partition has a meta-data cache which tracks the mappings of cache keys to the locations of the cached data. A cache key is a tuple of (file's name, file's modification time, file offset) and a cache entry is a tuple of (backing file, offset in the backing file, length of the cached data, optional checksum). Note that the cache currently doesn't support overlapping ranges. In other words, if the cache contains an entry of a file for range [m, m+4MB), a lookup for [m+4K, m+8K) will miss in the cache. In practice, we haven't seen this as a problem but this may require further evaluation in the future.

Each partition stores its set of cached data in backing files created on local storage. When inserting new data into the cache, the data is appended to the current backing file in use. The storage consumption of each cache entry counts towards the quota of that partition. When a partition reaches its capacity, the least recently used (LRU) data in that partition is evicted. Evicted data is removed from the underlying storage by punching holes in the backing file it's stored in. As a backing file reaches a certain size (by default 4TB), new data will stop being appended to it and a new file will be created instead. Note that due to hole punching, the backing file is actually sparse. When the number of backing files per partition exceeds --data_cache_max_files_per_partition, files are deleted in the order in which they were created. Stale cache entries referencing deleted files are erased lazily or evicted due to inactivity.

Optionally, checksumming can be enabled to verify that data read from the cache is consistent with what was inserted and to verify that multiple attempted insertions with the same cache key have the same cache content. Checksumming is enabled by default for debug builds.

To probe for cached data in the cache, the interface Lookup() is used. To insert data into the cache, the interface Store() is used. Please note that eviction currently happens inline during Store().

This patch also added two startup flags for start-impala-cluster.py:
'--data_cache_dir' specifies the base directory in which each Impalad creates the caching directory
'--data_cache_size' specifies the capacity string for each cache directory.

Testing done:
- added a new BE and EE test
- exhaustive (debug, release) builds with cache enabled
- core ASAN build with cache enabled

Perf:
- 16-streams TPCDS at 3TB in a 20 node S3 cluster shows about 30% improvement over runs without the cache. Each node has a cache size of 150GB. The performance is at parity with a configuration of an HDFS cluster using EBS as the storage.

Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Reviewed-on: http://gerrit.cloudera.org:8080/12987
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
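The layout of the --data_cache_config string described in the commit message can be sketched as a small parser. This is a sketch under stated assumptions rather than Impala's actual parsing code: `DataCacheConfig` and `ParseDataCacheConfig` are hypothetical names, only the "GB" suffix from the example is handled, and 1GB is assumed to mean 2^30 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type: one cache partition per directory, with the same
// capacity for each directory.
struct DataCacheConfig {
  std::vector<std::string> dirs;
  int64_t bytes_per_dir = 0;

  // Upper bound on storage used across all partitions.
  int64_t TotalBytes() const {
    return bytes_per_dir * static_cast<int64_t>(dirs.size());
  }
};

// Parses "<dir>[,<dir>...]:<N>GB", e.g. "/data/0,/data/1:150GB".
// Assumption: only the "GB" suffix is supported and 1GB = 2^30 bytes.
DataCacheConfig ParseDataCacheConfig(const std::string& config) {
  const std::size_t colon = config.rfind(':');
  if (colon == std::string::npos) {
    throw std::invalid_argument("expected '<dirs>:<capacity>'");
  }
  const std::string dir_list = config.substr(0, colon);
  const std::string capacity = config.substr(colon + 1);
  if (capacity.size() < 3 ||
      capacity.compare(capacity.size() - 2, 2, "GB") != 0) {
    throw std::invalid_argument("capacity must look like '<N>GB'");
  }

  DataCacheConfig result;
  // std::stoll throws std::invalid_argument if the number part is not numeric.
  result.bytes_per_dir =
      std::stoll(capacity.substr(0, capacity.size() - 2)) * (1LL << 30);

  // Split the directory list on commas, skipping empty items.
  std::size_t start = 0;
  while (start <= dir_list.size()) {
    std::size_t comma = dir_list.find(',', start);
    if (comma == std::string::npos) comma = dir_list.size();
    if (comma > start) result.dirs.push_back(dir_list.substr(start, comma - start));
    start = comma + 1;
  }
  if (result.dirs.empty()) {
    throw std::invalid_argument("at least one directory is required");
  }
  return result;
}
```

For the example string from the commit message, this yields two directories with 150GB each, so `TotalBytes()` matches the 300GB total mentioned above.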
[jira] [Commented] (IMPALA-5351) Handle column comments for Kudu tables
[ https://issues.apache.org/jira/browse/IMPALA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832845#comment-16832845 ]

ASF subversion and git services commented on IMPALA-5351:

Commit 3fb36570ae0c7329cdbe3640515bc3e0cb066c81 in impala's branch refs/heads/master from helifu
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3fb3657 ]

IMPALA-5351: Support storing column comment of kudu table

This patch adds support for storing column comments of Kudu tables on the Impala side.

The tests below passed:
1) create kudu-table with column comment;
2) alter kudu-table with (add/alter[delete] column comment);
3) show create kudu table;
4) describe kudu-table;
5) invalidate metadata;
6) comment on column is { '' | null | 'comment' }

Change-Id: Ifb3b37eed364f12bdb3c1d7ef5be128f1475936c
Reviewed-on: http://gerrit.cloudera.org:8080/12977
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Handle column comments for Kudu tables
>
> Key: IMPALA-5351
> URL: https://issues.apache.org/jira/browse/IMPALA-5351
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.9.0
> Reporter: Thomas Tauber-Marshall
> Assignee: HeLifu
> Priority: Major
> Labels: kudu
> Fix For: Impala 3.3.0
>
> Currently, if a column comment is specified in a CREATE TABLE for a Kudu table, we just silently drop it because Kudu does not currently support column comments (KUDU-1711).
> One option would be to store the comments in HMS, but splitting the metadata between HMS and Kudu is probably more complicated than it's worth.
> Most likely, we can just wait for it to be implemented on the Kudu side, but before then we may want to consider issuing a warning when people use column comments on Kudu tables.
[jira] [Commented] (IMPALA-8485) References to deprecated feature authorization policy file need to be removed
[ https://issues.apache.org/jira/browse/IMPALA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832844#comment-16832844 ]

ASF subversion and git services commented on IMPALA-8485:

Commit 84addd2a4b74454bd929d6b2ada0f501a2c6b0cb in impala's branch refs/heads/master from Austin Nobis
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=84addd2 ]

IMPALA-8485: Authorization policy file clean up

This patch cleans up references to the deprecated authorization_policy_file flag. The authz-policy.ini file is no longer created during the test config creation. The reference is also removed from the gitignore.

Testing:
- All FE tests were run
- All authorization E2E tests were run
- test_authorization.py E2E test was updated to no longer have references to the authz-policy.ini file.

Change-Id: Ib1e90973cb3d5b243844d379e5cdcb2add4eec75
Reviewed-on: http://gerrit.cloudera.org:8080/13222
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> References to deprecated feature authorization policy file need to be removed
>
> Key: IMPALA-8485
> URL: https://issues.apache.org/jira/browse/IMPALA-8485
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Reporter: Austin Nobis
> Assignee: Austin Nobis
> Priority: Trivial
> Fix For: Impala 3.3.0
>
> Running the command *git grep authz-policy* produces the following output:
> bin/create-test-configuration.sh:generate_config authz-policy.ini.template authz-policy.ini
> fe/.gitignore:src/test/resources/authz-policy.ini
> tests/authorization/test_authorization.py:AUTH_POLICY_FILE = "%s/authz-policy.ini" % WAREHOUSE
> These references to the *authz-policy.ini* should be cleaned up as the authorization policy file feature is deprecated as of *IMPALA-7918.*
[jira] [Commented] (IMPALA-8477) Impala Doc: Doc SHOW GRANT GROUP
[ https://issues.apache.org/jira/browse/IMPALA-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832842#comment-16832842 ]

ASF subversion and git services commented on IMPALA-8477:

Commit f5e89d6239eb7dbfa0acedd1704bcd398a197f9f in impala's branch refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f5e89d6 ]

IMPALA-8477: [DOCS] SHOW GRANT GROUP for Ranger authorization

Change-Id: Iadf0d5c8b43809880f194e0bc810df06bfab2075
Reviewed-on: http://gerrit.cloudera.org:8080/13220
Tested-by: Impala Public Jenkins
Reviewed-by: Austin Nobis
Reviewed-by: Fredy Wijaya

> Impala Doc: Doc SHOW GRANT GROUP
>
> Key: IMPALA-8477
> URL: https://issues.apache.org/jira/browse/IMPALA-8477
> Project: IMPALA
> Issue Type: Sub-task
> Components: Docs
> Reporter: Alex Rodoni
> Assignee: Alex Rodoni
> Priority: Major
> Labels: future_release_doc, in_33
> Fix For: Impala 3.3.0
>
> https://gerrit.cloudera.org/#/c/13220/
[jira] [Work started] (IMPALA-8364) Impala Doc: Remove support for authorization policy file
[ https://issues.apache.org/jira/browse/IMPALA-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-8364 started by Alex Rodoni.
[jira] [Work stopped] (IMPALA-8490) Impala Doc: the file handle cache now supports S3
[ https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-8490 stopped by Alex Rodoni.

> Impala Doc: the file handle cache now supports S3
>
> Key: IMPALA-8490
> URL: https://issues.apache.org/jira/browse/IMPALA-8490
> Project: IMPALA
> Issue Type: Sub-task
> Components: Docs
> Reporter: Sahil Takiar
> Assignee: Alex Rodoni
> Priority: Major
> Labels: future_release_doc, in_33
>
> https://impala.apache.org/docs/build/html/topics/impala_scalability.html states:
> {quote}
> Because this feature only involves HDFS data files, it does not apply to non-HDFS tables, such as Kudu or HBase tables, or tables that store their data on cloud services such as S3 or ADLS.
> {quote}
> This section should be updated because the file handle cache now supports S3 files.
> We should add a section to the docs similar to what we added when support for remote HDFS files was added to the file handle cache:
> {quote}
> In Impala 3.2 and higher, file handle caching also applies to remote HDFS file handles. This is controlled by the cache_remote_file_handles flag for an impalad. It is recommended that you use the default value of true as this caching prevents your NameNode from overloading when your cluster has many remote HDFS reads.
> {quote}
> Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has been added as an impalad startup option (the flag is enabled by default).
> Unlike HDFS though, S3 has no NameNode; the benefit is that it eliminates a call to {{getFileStatus()}} on the target S3 file. So "prevents your NameNode from overloading when your cluster has many remote HDFS reads" should be changed to something like "avoids an unnecessary call to S3AFileSystem#getFileStatus() which reduces the number of API calls made to S3."
[jira] [Assigned] (IMPALA-8485) References to deprecated feature authorization policy file need to be removed
[ https://issues.apache.org/jira/browse/IMPALA-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fredy Wijaya reassigned IMPALA-8485:

Assignee: Austin Nobis
[jira] [Resolved] (IMPALA-5351) Handle column comments for Kudu tables
[ https://issues.apache.org/jira/browse/IMPALA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fredy Wijaya resolved IMPALA-5351. -- Resolution: Fixed Fix Version/s: Impala 3.3.0 > Handle column comments for Kudu tables > -- > > Key: IMPALA-5351 > URL: https://issues.apache.org/jira/browse/IMPALA-5351 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 2.9.0 >Reporter: Thomas Tauber-Marshall >Assignee: HeLifu >Priority: Major > Labels: kudu > Fix For: Impala 3.3.0 > > > Currently, if a column comment is specified in a CREATE TABLE for a Kudu > table, we just silently drop it because Kudu does not currently support > column comments (KUDU-1711). > One option would be to store the comments in HMS, but splitting the metadata > between HMS and Kudu is probably more complicated than its worth. > Most likely, we can just wait for it to be implemented on the Kudu side, but > before then we may want to consider issuing a warning when people use column > comments on Kudu tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-8490) Impala Doc: the file handle cache now supports S3
[ https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8490 started by Alex Rodoni. --- > Impala Doc: the file handle cache now supports S3 > - > > Key: IMPALA-8490 > URL: https://issues.apache.org/jira/browse/IMPALA-8490 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Sahil Takiar >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_33 > > https://impala.apache.org/docs/build/html/topics/impala_scalability.html > states: > {quote} > Because this feature only involves HDFS data files, it does not apply to > non-HDFS tables, such as Kudu or HBase tables, or tables that store their > data on cloud services such as S3 or ADLS. > {quote} > This section should be updated because the file handle cache now supports S3 > files. > We should add a section to the docs similar to what we added when support for > remote HDFS files was added to the file handle cache: > {quote} > In Impala 3.2 and higher, file handle caching also applies to remote HDFS > file handles. This is controlled by the cache_remote_file_handles flag for an > impalad. It is recommended that you use the default value of true as this > caching prevents your NameNode from overloading when your cluster has many > remote HDFS reads. > {quote} > Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has > been added as an impalad startup option (the flag is enabled by default). > Unlike HDFS, though, S3 has no NameNode; the benefit is that it eliminates a > call to {{getFileStatus()}} on the target S3 file. So "prevents your NameNode > from overloading when your cluster has many remote HDFS reads" should be > changed to something like "avoids an unnecessary call to > S3AFileSystem#getFileStatus() which reduces the number of API calls made to > S3." 
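The benefit described in this issue can be illustrated with a minimal Python sketch. It is not Impala's implementation: `FakeS3`, `HandleCache`, and `count_status_calls` are hypothetical names, and the counter only stands in for the `getFileStatus()` call that a cache hit is said to avoid.

```python
class FakeS3:
    """Hypothetical stand-in for a remote store that counts status calls."""

    def __init__(self):
        self.status_calls = 0

    def open_file(self, path):
        # In this toy model, every open costs one metadata (status) call.
        self.status_calls += 1
        return {"path": path}


class HandleCache:
    """Hypothetical cache that reuses an open handle instead of reopening."""

    def __init__(self, store, enabled=True):
        self.store = store
        self.enabled = enabled  # plays the role of cache_s3_file_handles
        self.handles = {}

    def get(self, path):
        if self.enabled and path in self.handles:
            return self.handles[path]  # cache hit: no status call
        handle = self.store.open_file(path)
        if self.enabled:
            self.handles[path] = handle
        return handle


def count_status_calls(enabled, reads=5):
    """Read the same file several times and report the status calls made."""
    store = FakeS3()
    cache = HandleCache(store, enabled=enabled)
    for _ in range(reads):
        cache.get("s3a://bucket/table/part-0.parq")
    return store.status_calls
```

With caching on, repeated reads of one file cost a single status call in this model; with it off, each read pays for its own.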
[jira] [Updated] (IMPALA-8492) Re-enable large_string tests disabled for JVM OOM
[ https://issues.apache.org/jira/browse/IMPALA-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8492: -- Summary: Re-enable large_string tests disabled for JVM OOM (was: Re-enabled large_string tests disabled for JVM OOM) > Re-enable large_string tests disabled for JVM OOM > - > > Key: IMPALA-8492 > URL: https://issues.apache.org/jira/browse/IMPALA-8492 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: docker > > IMPALA-4865 fixed the issue that we had to disable tests for. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8492) Re-enabled large_string tests disabled for JVM OOM
Tim Armstrong created IMPALA-8492: - Summary: Re-enabled large_string tests disabled for JVM OOM Key: IMPALA-8492 URL: https://issues.apache.org/jira/browse/IMPALA-8492 Project: IMPALA Issue Type: Sub-task Components: Infrastructure Reporter: Tim Armstrong Assignee: Tim Armstrong IMPALA-4865 fixed the issue that we had to disable tests for. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8491) Run container as non-root user
Tim Armstrong created IMPALA-8491: - Summary: Run container as non-root user Key: IMPALA-8491 URL: https://issues.apache.org/jira/browse/IMPALA-8491 Project: IMPALA Issue Type: Sub-task Components: Infrastructure Reporter: Tim Armstrong Assignee: Tim Armstrong -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-8451) Default configs for admission control
[ https://issues.apache.org/jira/browse/IMPALA-8451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-8451: - Assignee: Tim Armstrong > Default configs for admission control > - > > Key: IMPALA-8451 > URL: https://issues.apache.org/jira/browse/IMPALA-8451 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > > We probably want to have some basic admission control enabled for the > dockerised containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Comment Edited] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure
[ https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832710#comment-16832710 ] Michael Ho edited comment on IMPALA-8339 at 5/3/19 6:18 PM: Thanks [~stakiar]. In general, Impala should be resilient when one or more executors fail during execution and issue transparent retries while not scheduling fragments on the known bad hosts. This JIRA is attacking a narrower subset of that problem by only addressing the query startup failure. We need to be careful to use transparent retries only for transient recoverable failures. For instance, we shouldn't retry if it will lead to the same failure (e.g. memory limit exceeded). There may also be a change in behavior in how Impala exposes results to the clients. In particular, we may not be able to support both streaming result sets and transparent retries for all queries as some of them are non-deterministic so it may not be trivial to support the behavior of exposing a subset of the results and then replay to the point of failure. was (Author: kwho): Thanks [~stakiar]. In general, Impala should be resilient when one or more executors fail and issue transparent retries while not scheduling fragments on the known bad hosts. This JIRA is attacking a narrower subset of that problem by only addressing the query startup failure. We need to be careful to use transparent retries only for transient recoverable failures. For instance, we shouldn't retry if it will lead to the same failure (e.g. memory limit exceeded). There may also be a change in behavior in how Impala may expose results to the clients. In particular, we may not be able to support both streaming result sets and transparent retries for all queries as some of them are non-deterministic so it may not be trivial to support the behavior of exposing a subset of the results and then replay to the point of failure. 
> Coordinator should be more resilient to fragment instances startup failure > -- > > Key: IMPALA-8339 > URL: https://issues.apache.org/jira/browse/IMPALA-8339 > Project: IMPALA > Issue Type: Improvement > Components: Distributed Exec >Reporter: Michael Ho >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: Availability, resilience > > Impala currently relies on statestore for cluster membership. When an Impala > executor goes offline, it may take a while for statestore to declare that > node as unavailable and for that information to be propagated to all > coordinator nodes. Within this window, some coordinator nodes may still > attempt to issue RPCs to the faulty node, resulting in RPC failures which > resulted in query failures. In other words, many queries may fail to start > within this window until all coordinator nodes get the latest information on > cluster membership. > Going forward, coordinator may need to fall back to using backup executors > for each fragment in case some of the executors are not available. Moreover, > *coordinator should treat the cluster membership information from statestore > (or any external source of truth e.g. etcd) as hints instead of ground truth* > and adjust the scheduling of fragment instances based on the availability of > the executors from the coordinator's perspective. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Closed] (IMPALA-8370) Impala Doc: Impala works with Hive 3
[ https://issues.apache.org/jira/browse/IMPALA-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni closed IMPALA-8370. --- Resolution: Not A Problem No user-facing doc impact for IMPALA-8369 per [~vihangk1] > Impala Doc: Impala works with Hive 3 > > > Key: IMPALA-8370 > URL: https://issues.apache.org/jira/browse/IMPALA-8370 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_33 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure
[ https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832710#comment-16832710 ] Michael Ho commented on IMPALA-8339: Thanks [~stakiar]. In general, Impala should be resilient when one or more executors fail and issue transparent retries while not scheduling fragments on the known bad hosts. This JIRA is attacking a narrower subset of that problem by only addressing the query startup failure. We need to be careful to use transparent retries only for transient recoverable failures. For instance, we shouldn't retry if it will lead to the same failure (e.g. memory limit exceeded). There may also be a change in behavior in how Impala may expose results to the clients. In particular, we may not be able to support both streaming result sets and transparent retries for all queries as some of them are non-deterministic so it may not be trivial to support the behavior of exposing a subset of the results and then replay to the point of failure. > Coordinator should be more resilient to fragment instances startup failure > -- > > Key: IMPALA-8339 > URL: https://issues.apache.org/jira/browse/IMPALA-8339 > Project: IMPALA > Issue Type: Improvement > Components: Distributed Exec >Reporter: Michael Ho >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: Availability, resilience > > Impala currently relies on statestore for cluster membership. When an Impala > executor goes offline, it may take a while for statestore to declare that > node as unavailable and for that information to be propagated to all > coordinator nodes. Within this window, some coordinator nodes may still > attempt to issue RPCs to the faulty node, resulting in RPC failures which > resulted in query failures. In other words, many queries may fail to start > within this window until all coordinator nodes get the latest information on > cluster membership. 
> Going forward, coordinator may need to fall back to using backup executors > for each fragment in case some of the executors are not available. Moreover, > *coordinator should treat the cluster membership information from statestore > (or any external source of truth e.g. etcd) as hints instead of ground truth* > and adjust the scheduling of fragment instances based on the availability of > the executors from the coordinator's perspective. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
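The retry policy outlined in the comment above (retry only transient, recoverable failures, and be cautious once results have been streamed to the client) could be sketched in Python as follows. This is only an illustration under stated assumptions: the error labels, `should_retry`, and its parameters are hypothetical and do not come from the Impala code base.

```python
# Hypothetical error labels; the real error classification is not given here.
TRANSIENT_ERRORS = {"RPC_FAILURE", "EXECUTOR_UNREACHABLE"}


def should_retry(error, attempts, max_attempts=2, results_returned=False):
    """Decide whether a failed query start may be retried transparently.

    error            -- hypothetical label for the failure cause
    attempts         -- number of retries already made
    results_returned -- True if some rows were already streamed to the client
    """
    if error not in TRANSIENT_ERRORS:
        # e.g. a memory limit failure would most likely just repeat.
        return False
    if results_returned:
        # Replaying a non-deterministic query up to the failure point
        # is not trivial, so this sketch simply refuses.
        return False
    return attempts < max_attempts
```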
[jira] [Assigned] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure
[ https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Ho reassigned IMPALA-8339: -- Assignee: Thomas Tauber-Marshall > Coordinator should be more resilient to fragment instances startup failure > -- > > Key: IMPALA-8339 > URL: https://issues.apache.org/jira/browse/IMPALA-8339 > Project: IMPALA > Issue Type: Improvement > Components: Distributed Exec >Reporter: Michael Ho >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: Availability, resilience > > Impala currently relies on statestore for cluster membership. When an Impala > executor goes offline, it may take a while for statestore to declare that > node as unavailable and for that information to be propagated to all > coordinator nodes. Within this window, some coordinator nodes may still > attempt to issue RPCs to the faulty node, resulting in RPC failures which > resulted in query failures. In other words, many queries may fail to start > within this window until all coordinator nodes get the latest information on > cluster membership. > Going forward, coordinator may need to fall back to using backup executors > for each fragment in case some of the executors are not available. Moreover, > *coordinator should treat the cluster membership information from statestore > (or any external source of truth e.g. etcd) as hints instead of ground truth* > and adjust the scheduling of fragment instances based on the availability of > the executors from the coordinator's perspective. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure
[ https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832701#comment-16832701 ] Sahil Takiar commented on IMPALA-8339: -- If we want to go with a blacklisting approach, Spark built a similar feature that might be worth looking at: https://blog.cloudera.com/blog/2017/04/blacklisting-in-apache-spark/ (although things are more complex in the Spark world because of task retries). Blacklisting is also interesting in the context of query retries; e.g. if a query fails due to a bad disk, the failed fragments should probably be retried on a different set of nodes. > Coordinator should be more resilient to fragment instances startup failure > -- > > Key: IMPALA-8339 > URL: https://issues.apache.org/jira/browse/IMPALA-8339 > Project: IMPALA > Issue Type: Improvement > Components: Distributed Exec >Reporter: Michael Ho >Priority: Major > Labels: Availability, resilience > > Impala currently relies on statestore for cluster membership. When an Impala > executor goes offline, it may take a while for statestore to declare that > node as unavailable and for that information to be propagated to all > coordinator nodes. Within this window, some coordinator nodes may still > attempt to issue RPCs to the faulty node, resulting in RPC failures which > resulted in query failures. In other words, many queries may fail to start > within this window until all coordinator nodes get the latest information on > cluster membership. > Going forward, coordinator may need to fall back to using backup executors > for each fragment in case some of the executors are not available. Moreover, > *coordinator should treat the cluster membership information from statestore > (or any external source of truth e.g. etcd) as hints instead of ground truth* > and adjust the scheduling of fragment instances based on the availability of > the executors from the coordinator's perspective. 
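The blacklisting idea, combined with treating statestore membership as a hint, might look roughly like the Python sketch below. It is not the design chosen for this JIRA: `pick_executors`, `schedule_with_fallback`, and the `start_fragment` callback are hypothetical names introduced only for illustration.

```python
def pick_executors(membership, blacklist, needed):
    """Choose executors from the membership hint, skipping blacklisted hosts."""
    candidates = [host for host in membership if host not in blacklist]
    if len(candidates) < needed:
        raise RuntimeError("not enough healthy executors")
    return candidates[:needed]


def schedule_with_fallback(membership, needed, start_fragment, max_rounds=3):
    """Try to start fragments, blacklisting hosts that fail and rescheduling.

    start_fragment(host) is a hypothetical callback that returns True when the
    fragment instance started on that host and False on a startup failure.
    """
    blacklist = set()  # the coordinator's own view of bad hosts
    for _ in range(max_rounds):
        chosen = pick_executors(membership, blacklist, needed)
        failed = [host for host in chosen if not start_fragment(host)]
        if not failed:
            return chosen
        blacklist.update(failed)  # fall back to backup executors next round
    raise RuntimeError("could not start all fragment instances")
```

In this sketch the blacklist lives only for the duration of one scheduling attempt; how long a real coordinator should remember a bad host is left open here.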
[jira] [Commented] (IMPALA-8409) STRINGs without stats have too low row-size in explain plan
[ https://issues.apache.org/jira/browse/IMPALA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832672#comment-16832672 ] ASF subversion and git services commented on IMPALA-8409: - Commit c2516d220da8e532b6ebdb6f3a12e7ad97c4f597 in impala's branch refs/heads/master from Csaba Ringhofer [ https://gitbox.apache.org/repos/asf?p=impala.git;h=c2516d2 ] IMPALA-8409: Fix row-size for STRING columns with unknown stats Explain returned row-size=11B for STRING columns without statistics. The issue was caused by adding -1 (meaning unknown) to the 12 byte slot size (sizeof(StringValue)). The code in TupleDescriptor.java tried to handle this by checking if the size is -1, but it was already 11 at this point. There is more potential for cleanup, but I wanted to keep this change minimal. Testing: - revived some tests in CatalogTest.java that were removed in 2013 due to flakiness - added an EE test that checks row size with and without stats - fixed a similar test, test_explain_validate_cardinality_estimates (the format of the line it looks for has changed, which lead to skipping the actual verification and accepting everything) - ran core FE and EE tests Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5 Reviewed-on: http://gerrit.cloudera.org:8080/13190 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > STRINGs without stats have too low row-size in explain plan > --- > > Key: IMPALA-8409 > URL: https://issues.apache.org/jira/browse/IMPALA-8409 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 3.2.0 >Reporter: Csaba Ringhofer >Assignee: Csaba Ringhofer >Priority: Minor > Labels: explain, statistics > > STRING columns without avg_size statistic are calculated into the row-size as > 11 bytes, while they take 12 bytes in the tuple (+ more somewhere in the > memory if they are not empty). The issue is caused by adding -1 (meaning > unknown) to the 12 byte slot size. 
> I think that this doesn't cause problems, as the estimation is probably way > off without statistics anyway, but row-size >= tuple size seems like a > meaningful invariant that we shouldn't break. > Reproduce: > {code} > create table test_row_size (s string); > explain select * from test_row_size; > Result: > ... > WARNING: The following tables are missing relevant table and/or column > statistics. > default.test_row_size > ... > 00:SCAN HDFS [default.test_row_size] >partitions=1/1 files=0 size=0B >row-size=11B cardinality=0 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
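The arithmetic behind the 11-byte row size can be reproduced with a short Python sketch. The constants follow the issue text (a 12-byte slot and -1 meaning unknown); the function names are hypothetical, and `row_size_fixed` shows only one plausible way to handle the marker, not necessarily what the actual patch does.

```python
STRING_SLOT_SIZE = 12  # sizeof(StringValue), per the issue text
UNKNOWN = -1           # marker for a missing avg_size statistic


def row_size_buggy(avg_size):
    # Adds the marker directly, so unknown stats yield 12 + (-1) = 11.
    return STRING_SLOT_SIZE + avg_size


def row_size_fixed(avg_size):
    # Check for the unknown marker before adding it to the slot size,
    # so that row-size >= tuple size always holds.
    if avg_size == UNKNOWN:
        return STRING_SLOT_SIZE
    return STRING_SLOT_SIZE + avg_size
```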
[jira] [Commented] (IMPALA-8369) Impala should be able to interoperate with Hive 3.1.0
[ https://issues.apache.org/jira/browse/IMPALA-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832674#comment-16832674 ] ASF subversion and git services commented on IMPALA-8369: - Commit 99e1a39b908b81a94ef8cf4b41458c388a34755c in impala's branch refs/heads/master from Vihang Karajgaonkar [ https://gitbox.apache.org/repos/asf?p=impala.git;h=99e1a39 ] Bump CDP_BUILD_NUMBER to 1056671 This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the Hive and Tez patches required for building against Hive 3. With this change we get rid of the custom builds for Hive and Tez introduced in IMPALA-8369 and switch to more official sources of builds for the minicluster. Notes: 1. The tarball names and the directory to which they extract to changed from the previous CDP_BUILD_NUMBER. Due to this we need to change the bootstrap_toolchain and impala-config.sh so that the Hive environment variables are set correctly. Testing Done: 1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE 2. Did basic testing from Impala and Beeline for the testing the tez patch 3. Currently running the full-suite of tests to make sure there are no regressions Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0 Reviewed-on: http://gerrit.cloudera.org:8080/13213 Reviewed-by: Fredy Wijaya Tested-by: Impala Public Jenkins > Impala should be able to interoperate with Hive 3.1.0 > - > > Key: IMPALA-8369 > URL: https://issues.apache.org/jira/browse/IMPALA-8369 > Project: IMPALA > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: impala-acid > > Currently, Impala only works with Hive 2.1.1. Since Hive 3.1.0 has been > released for a while it would be good to add support for Hive 3.1.0 (HMS > 3.1.0). This patch will focus on ability to connect to HMS 3.1.0 and run > existing tests. 
It will not focus on adding support for newer features like > ACID in Hive 3.1.0 which can be taken up as a separate JIRA. > It would be good to make changes to Impala source code such that it can work > with both Hive 2.1.0 and Hive 3.1.0 without the need to create a separate > branch. However, this should be an aspirational goal. If we hit a blocker we > should investigate alternative approaches. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8482) Include all ranger-audit-plugins runtime dependencies
[ https://issues.apache.org/jira/browse/IMPALA-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832673#comment-16832673 ] ASF subversion and git services commented on IMPALA-8482: - Commit 04be046ecc3a4d43a62dc834ea4925f979d2dc27 in impala's branch refs/heads/master from Fredy Wijaya [ https://gitbox.apache.org/repos/asf?p=impala.git;h=04be046 ] IMPALA-8482: Package ranger-plugins-audit runtime dependencies This patch includes ranger-plugins-audit runtime dependencies to allow ranger-plugins-audit communicating with different audit providers, such as solr, kafka, etc. Testing: - Ran core tests Change-Id: If4c88958b064032ebaedd45808482f1179e6d806 Reviewed-on: http://gerrit.cloudera.org:8080/13216 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Include all ranger-audit-plugins runtime dependencies > - > > Key: IMPALA-8482 > URL: https://issues.apache.org/jira/browse/IMPALA-8482 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Fredy Wijaya >Assignee: Fredy Wijaya >Priority: Critical > Fix For: Impala 3.3.0 > > > Impala needs to package ranger-audit-plugins runtime dependencies so that > ranger-audit-plugins works as expected against various audit providers, e.g. > solr, kafka, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8489) .TestRecoverPartitions.test_post_invalidate fails with IllegalStateException with local catalog
Tim Armstrong created IMPALA-8489: - Summary: .TestRecoverPartitions.test_post_invalidate fails with IllegalStateException with local catalog Key: IMPALA-8489 URL: https://issues.apache.org/jira/browse/IMPALA-8489 Project: IMPALA Issue Type: Bug Components: Catalog Affects Versions: Impala 3.3.0 Reporter: Tim Armstrong Assignee: Todd Lipcon {noformat} metadata/test_recover_partitions.py:279: in test_post_invalidate "INSERT INTO TABLE %s PARTITION(i=002, p='p2') VALUES(4)" % FQ_TBL_NAME) common/impala_test_suite.py:620: in wrapper return function(*args, **kwargs) common/impala_test_suite.py:628: in execute_query_expect_success result = cls.__execute_query(impalad_client, query, query_options, user) common/impala_test_suite.py:722: in __execute_query return impalad_client.execute(query, user=user) common/impala_connection.py:180: in execute return self.__beeswax_client.execute(sql_stmt, user=user) beeswax/impala_beeswax.py:187: in execute handle = self.__execute_query(query_string.strip(), user=user) beeswax/impala_beeswax.py:364: in __execute_query self.wait_for_finished(handle) beeswax/impala_beeswax.py:385: in wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + error_log, None) E ImpalaBeeswaxException: ImpalaBeeswaxException: EQuery aborted:IllegalArgumentException: no such partition id 6244 {noformat} The failure is reproducible for me locally with catalog v2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8488) Authorization tests for Ranger breaks on S3 in test_show_grant
[ https://issues.apache.org/jira/browse/IMPALA-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laszlo Gaal updated IMPALA-8488: Labels: broken-build (was: ) > Authorization tests for Ranger breaks on S3 in test_show_grant > -- > > Key: IMPALA-8488 > URL: https://issues.apache.org/jira/browse/IMPALA-8488 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 3.3.0 >Reporter: Laszlo Gaal >Assignee: Austin Nobis >Priority: Major > Labels: broken-build > > Stack Trace: > {code:java} > authorization/test_ranger.py:170: in test_show_grant > unique_table) > authorization/test_ranger.py:261: in _test_show_grant_basic > [kw, id, "", "", "", "hdfs://localhost:20500" + uri, "", "all", "false"]]) > authorization/test_ranger.py:346: in _check_privileges > assert map(columns, result.data) == expected > E assert [['USER', 'je...-1/tmp', ...]] == [['USER', 'jen...00/tmp', ...]] > E At index 0 diff: ['USER', 'jenkins', '', '', '', > 's3a://impala-test-uswest2-1/tmp', '', 'all', 'false'] != ['USER', 'jenkins', > '', '', '', 'hdfs://localhost:20500/tmp', '', 'all', 'false'] > E Full diff: > E [['USER', > E 'jenkins', > E '', > E '', > E '', > E - 's3a://impala-test-uswest2-1/tmp', > E + 'hdfs://localhost:20500/tmp', > E '', > E 'all', > E 'false']]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8488) Authorization tests for Ranger breaks on S3 in test_show_grant
Laszlo Gaal created IMPALA-8488: --- Summary: Authorization tests for Ranger breaks on S3 in test_show_grant Key: IMPALA-8488 URL: https://issues.apache.org/jira/browse/IMPALA-8488 Project: IMPALA Issue Type: Bug Affects Versions: Impala 3.3.0 Reporter: Laszlo Gaal Assignee: Austin Nobis Stack Trace: {code:java} authorization/test_ranger.py:170: in test_show_grant unique_table) authorization/test_ranger.py:261: in _test_show_grant_basic [kw, id, "", "", "", "hdfs://localhost:20500" + uri, "", "all", "false"]]) authorization/test_ranger.py:346: in _check_privileges assert map(columns, result.data) == expected E assert [['USER', 'je...-1/tmp', ...]] == [['USER', 'jen...00/tmp', ...]] E At index 0 diff: ['USER', 'jenkins', '', '', '', 's3a://impala-test-uswest2-1/tmp', '', 'all', 'false'] != ['USER', 'jenkins', '', '', '', 'hdfs://localhost:20500/tmp', '', 'all', 'false'] E Full diff: E [['USER', E 'jenkins', E '', E '', E '', E - 's3a://impala-test-uswest2-1/tmp', E + 'hdfs://localhost:20500/tmp', E '', E 'all', E 'false']]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
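The stack trace suggests the expected row hard-codes the `hdfs://localhost:20500` prefix, which cannot match when the warehouse lives on S3. A minimal Python sketch of building the expectation from a configurable prefix is shown below; it is not the fix that was applied, and `expected_uri`, `expected_grant_row`, and the `filesystem_prefix` parameter are hypothetical names.

```python
def expected_uri(filesystem_prefix, path):
    """Join a filesystem prefix and an absolute path into a full URI."""
    return filesystem_prefix.rstrip("/") + path


def expected_grant_row(user, filesystem_prefix, path):
    """Build the row a SHOW GRANT check would compare against.

    The column layout mirrors the list seen in the stack trace above.
    """
    uri = expected_uri(filesystem_prefix, path)
    return ["USER", user, "", "", "", uri, "", "all", "false"]
```

Passing the prefix of the filesystem under test, rather than a literal HDFS address, keeps the same assertion usable on both HDFS and S3 runs.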
[jira] [Resolved] (IMPALA-8482) Include all ranger-audit-plugins runtime dependencies
[ https://issues.apache.org/jira/browse/IMPALA-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fredy Wijaya resolved IMPALA-8482. -- Resolution: Fixed Fix Version/s: Impala 3.3.0 > Include all ranger-audit-plugins runtime dependencies > - > > Key: IMPALA-8482 > URL: https://issues.apache.org/jira/browse/IMPALA-8482 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Reporter: Fredy Wijaya >Assignee: Fredy Wijaya >Priority: Critical > Fix For: Impala 3.3.0 > > > Impala needs to package ranger-audit-plugins runtime dependencies so that > ranger-audit-plugins works as expected against various audit providers, e.g. > solr, kafka, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8403) Possible thread leak in impalad
[ https://issues.apache.org/jira/browse/IMPALA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832467#comment-16832467 ] Adriano commented on IMPALA-8403: - Hi, I would like to add the repro steps [^reproIMPALA-8403.tgz] that I followed to increase the number of threads (submitting a query with many fragments and then cancelling it once the fragments were executing). This may be a dup; if it is not, I would appreciate your help in putting this JIRA into the backlog. Many thanks, Adriano > Possible thread leak in impalad > --- > > Key: IMPALA-8403 > URL: https://issues.apache.org/jira/browse/IMPALA-8403 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 2.12.0 >Reporter: Quanlong Huang >Priority: Major > Attachments: image-2019-04-10-11-15-11-321.png, reproIMPALA-8403.tgz > > > The thread-manager.running-threads metric obtained from > http://${impalad_host}:25000/metrics?json shows that the number of running > threads keeps increasing. (See the snapshot.) This phenomenon is most > noticeable in coordinators. > This may be a counter bug or a thread leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8403) Possible thread leak in impalad
[ https://issues.apache.org/jira/browse/IMPALA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano updated IMPALA-8403: Attachment: reproIMPALA-8403.tgz > Possible thread leak in impalad > --- > > Key: IMPALA-8403 > URL: https://issues.apache.org/jira/browse/IMPALA-8403 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 2.12.0 >Reporter: Quanlong Huang >Priority: Major > Attachments: image-2019-04-10-11-15-11-321.png, reproIMPALA-8403.tgz > > > The thread-manager.running-threads metric obtained from > http://${impalad_host}:25000/metrics?json shows that the number of running > threads keeps increasing. (See the snapshot.) This phenomenon is most > noticeable in coordinators. > This may be a counter bug or a thread leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
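One rough way to turn "the number of running threads keeps increasing" into a check is sketched below in Python. It assumes the `thread-manager.running-threads` values have already been sampled at intervals; fetching and parsing the metrics page is deliberately left out because its JSON layout is not given in this thread. `keeps_increasing` is a hypothetical helper, not an Impala tool.

```python
def keeps_increasing(samples, min_samples=3):
    """Return True if every sample is strictly larger than the previous one.

    samples     -- metric values in the order they were collected
    min_samples -- refuse to judge a trend from too few observations
    """
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))
```

A strictly rising series is only a hint: as the issue notes, it could equally come from a counter bug, so the samples would still need to be compared with the real thread count of the process.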
[jira] [Resolved] (IMPALA-8467) ParquetPlainEncoder::Decode leads to multiple test failures in ASAN builds
[ https://issues.apache.org/jira/browse/IMPALA-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Becker resolved IMPALA-8467.
Resolution: Fixed
Fix Version/s: Impala 3.3.0
> ParquetPlainEncoder::Decode leads to multiple test failures in ASAN builds
>
> Key: IMPALA-8467
> URL: https://issues.apache.org/jira/browse/IMPALA-8467
> Project: IMPALA
> Issue Type: Bug
> Reporter: Laszlo Gaal
> Assignee: Daniel Becker
> Priority: Blocker
> Labels: broken-build
> Fix For: Impala 3.3.0
>
> This is an example of the logged failures:
> {code:java}
> 00:57:35.147 15/106 Test #15: parquet-plain-test ...***Failed 0.48 sec
> 00:57:35.147 [==] Running 4 tests from 1 test case.
> 00:57:35.147 [--] Global test environment set-up.
> 00:57:35.148 [--] 4 tests from PlainEncoding
> 00:57:35.148 [ RUN ] PlainEncoding.Basic
> 00:57:35.148 =
> 00:57:35.148 ==1922==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7ffe328ee44c at pc 0x017c07bc bp 0x7ffe328ee2f0 sp 0x7ffe328edaa0
> 00:57:35.148 READ of size 16 at 0x7ffe328ee44c thread T0
> 00:57:35.148 #0 0x17c07bb in __asan_memcpy /mnt/source/llvm/llvm-5.0.1.src-p1/projects/compiler-rt/lib/asan/asan_interceptors.cc:466
> 00:57:35.149 #1 0x1837a26 in void impala::ParquetPlainEncoder::DecodeNoBoundsCheck (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, impala::TimestampValue*) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:332:3
> 00:57:35.149 #2 0x1837a26 in int impala::ParquetPlainEncoder::Decode (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, impala::TimestampValue*) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:223
> 00:57:35.150 #3 0x1837216 in void impala::TestTypeWidening (parquet::Type::type)3>(impala::TimestampValue const&, int) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:115:22
> 00:57:35.150 #4 0x18122f7 in impala::PlainEncoding_Basic_Test::TestBody() /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:155:3
> 00:57:35.151 #5 0x4fa6142 in void testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4fa6142)
> 00:57:35.151 #6 0x4f9d909 in testing::Test::Run() (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9d909)
> 00:57:35.152 #7 0x4f9da57 in testing::TestInfo::Run() (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9da57)
> 00:57:35.152 #8 0x4f9db34 in testing::TestCase::Run() (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9db34)
> 00:57:35.153 #9 0x4f9edb7 in testing::internal::UnitTestImpl::RunAllTests() (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9edb7)
> 00:57:35.153 #10 0x4f9f092 in testing::UnitTest::Run() (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x4f9f092)
> 00:57:35.153 #11 0x181655f in main /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-plain-test.cc:491:1
> 00:57:35.154 #12 0x7ff7a10b2c04 in __libc_start_main (/lib64/libc.so.6+0x21c04)
> 00:57:35.154 #13 0x17069d6 in _start (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/exec/parquet/parquet-plain-test+0x17069d6)
> 00:57:35.154
> 00:57:35.154 Address 0x7ffe328ee44c is located in stack of thread T0 at offset 332 in frame
> 00:57:35.154 #0 0x18378df in int impala::ParquetPlainEncoder::Decode (parquet::Type::type)3>(unsigned char const*, unsigned char const*, int, impala::TimestampValue*) /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/exec/parquet/parquet-common.h:208
> 00:57:35.155
> 00:57:35.155 This frame has 4 object(s):
> 00:57:35.155 [32, 40) 'ref.tmp.i' (line 327)
> 00:57:35.155 [64, 68) 'ref.tmp2.i' (line 327)
> 00:57:35.155 [80, 96) 'ref.tmp5.i' (line 327)
> 00:57:35.155 [112, 120) 'ref.tmp6.i' (line 327) <== Memory access at offset 332 overflows this variable
> 00:57:35.155 HINT: this may be a false positive if your program uses some
[jira] [Resolved] (IMPALA-8468) buildall.sh should warn that asan/ubsan/... are exclusive
[ https://issues.apache.org/jira/browse/IMPALA-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Csaba Ringhofer resolved IMPALA-8468.
Resolution: Fixed
Fix Version/s: Impala 3.3.0
> buildall.sh should warn that asan/ubsan/... are exclusive
>
> Key: IMPALA-8468
> URL: https://issues.apache.org/jira/browse/IMPALA-8468
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Infrastructure
> Affects Versions: Impala 3.2.0
> Reporter: Csaba Ringhofer
> Priority: Minor
> Fix For: Impala 3.3.0
>
> "buildall.sh -asan -ubsan -tsan -tidy" runs without giving any warning, but only -tsan actually takes effect. See https://github.com/apache/impala/blob/931a8f0ba7f45d5b1608e62aff397b517b943e95/buildall.sh#L308 for the logic behind this.
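The kind of check requested here can be sketched as follows. This is an illustration in Python rather than the shell code in buildall.sh, and it is not the fix that was committed; the function name and the warning text are hypothetical, and the flag list is taken only from the example command in the report.

```python
from typing import List, Optional

# Build-mode flags from the example command in the report.
EXCLUSIVE_FLAGS = ("-asan", "-ubsan", "-tsan", "-tidy")


def exclusive_flag_warning(argv: List[str]) -> Optional[str]:
    """Return a warning if more than one distinct exclusive flag is present."""
    chosen = [arg for arg in argv if arg in EXCLUSIVE_FLAGS]
    # Drop repeats of the same flag while preserving command-line order.
    unique = list(dict.fromkeys(chosen))
    if len(unique) <= 1:
        return None
    return ("WARNING: " + ", ".join(unique)
            + " are mutually exclusive; only one build mode will take effect")
```

Returning the message instead of printing it leaves the caller free to either warn and continue or abort the build.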
[jira] [Assigned] (IMPALA-8487) Debug page "Cancel" action actually unregisters query
[ https://issues.apache.org/jira/browse/IMPALA-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alice Fan reassigned IMPALA-8487:
Assignee: Alice Fan
> Debug page "Cancel" action actually unregisters query
>
> Key: IMPALA-8487
> URL: https://issues.apache.org/jira/browse/IMPALA-8487
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 3.0
> Reporter: Alice Fan
> Assignee: Alice Fan
> Priority: Major
> Labels: query-lifecycle
>
> Currently, if a running query is cancelled from the impalad WebUI (debug page), Impala will unregister the query.
[jira] [Updated] (IMPALA-7031) Add additional info to query canceled from http endpoint
[ https://issues.apache.org/jira/browse/IMPALA-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alice Fan updated IMPALA-7031:
Summary: Add additional info to query canceled from http endpoint (was: Debug page "Cancel" action actually unregisters query)
> Add additional info to query canceled from http endpoint
>
> Key: IMPALA-7031
> URL: https://issues.apache.org/jira/browse/IMPALA-7031
> Project: IMPALA
> Issue Type: New Feature
> Components: Distributed Exec
> Affects Versions: Impala 3.0
> Reporter: Adriano
> Assignee: Alice Fan
> Priority: Major
> Labels: query-lifecycle
> Attachments: Screen Shot 2018-07-20 at 10.19.42.png
>
> In big clusters with many JDBC/ODBC users, scripts that automatically cancel queries (e.g. long-running queries) are often deployed to save resources. These scripts typically use the Impala WebUI.
> Typical scenario:
> # A JDBC/ODBC client submits a query.
> # The Coordinator starts the query execution.
> # The query is cancelled from the Coordinator WebUI.
> # The JDBC/ODBC client asks the Coordinator for the query status (GetOperationStatus).
> # The Coordinator answers "unknown query ID" (as the query was cancelled).
> # From the client's perspective, the query failed with "unknown query ID".
> Currently, if a running query is cancelled from the impalad WebUI, the client will just receive an 'unknown query ID' error on the next fetch/getOperationStatus attempt. It would be good to be able to explicitly call out this case.
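The scenario above can be reproduced with a toy model. The class below is purely hypothetical and does not reflect Impala's internals: it is an in-memory registry whose keep_cancel_reason switch contrasts the reported behaviour (the query is dropped on cancel) with the requested one (a cancellation reason is kept for later status calls). The status strings are made up for illustration.

```python
from typing import Dict


class ToyCoordinator:
    """Hypothetical stand-in for a coordinator's query registry."""

    def __init__(self, keep_cancel_reason: bool) -> None:
        self.keep_cancel_reason = keep_cancel_reason
        self.running: Dict[str, str] = {}    # query id -> SQL text
        self.cancelled: Dict[str, str] = {}  # query id -> cancellation reason

    def submit(self, query_id: str, sql: str) -> None:
        self.running[query_id] = sql

    def cancel_from_webui(self, query_id: str) -> None:
        # Reported behaviour: the query is removed from the registry.
        self.running.pop(query_id, None)
        # Requested behaviour: remember why it disappeared.
        if self.keep_cancel_reason:
            self.cancelled[query_id] = "cancelled from the web UI"

    def get_operation_status(self, query_id: str) -> str:
        if query_id in self.running:
            return "RUNNING"
        if query_id in self.cancelled:
            return "CANCELLED: " + self.cancelled[query_id]
        return "ERROR: unknown query ID"
```

With keep_cancel_reason=False, a status call after the cancel returns the "unknown query ID" error from steps 5 and 6; with keep_cancel_reason=True, the client can see that the query was cancelled and from where.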
[jira] [Created] (IMPALA-8486) test_udf_update_via_drop and test_udf_update_via_create fail on local catalog
Tim Armstrong created IMPALA-8486:
Summary: test_udf_update_via_drop and test_udf_update_via_create fail on local catalog
Key: IMPALA-8486
URL: https://issues.apache.org/jira/browse/IMPALA-8486
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Affects Versions: Impala 3.3.0
Reporter: Tim Armstrong
Assignee: Todd Lipcon
{noformat}
TestUdfTargeted.test_udf_update_via_drop[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none]
tests/query_test/test_udfs.py:541: in test_udf_update_via_drop
self._run_query_all_impalads(exec_options, query_stmt, ["New UDF"])
tests/query_test/test_udfs.py:52: in _run_query_all_impalads
assert result.data == expected
E assert ['Old UDF'] == ['New UDF']
E At index 0 diff: 'Old UDF' != 'New UDF'
E Full diff:
E - ['Old UDF']
E + ['New UDF']
{noformat}
The tests are checking that the local UDF caches on each impalad get invalidated by a drop/create of a function referencing the HDFS file containing the UDF. The test fails because the local catalog, unlike the regular catalog, doesn't invalidate LibCache entries upon receiving a catalog update.
I looked at this for long enough to realise that the invalidation mechanism is fundamentally broken - it doesn't work with dedicated executors. It also creates a race between the statestore updates and queries referencing the UDFs - if the queries win the race, then they can incorrectly use the old version that should have been invalidated.
I think this is a potentially problematic issue because old JAR/SO versions could persist in the cache indefinitely if old versions are overwritten in place.
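The race described above can be illustrated with a deterministic toy model. Everything in it is hypothetical (the class, the path, the dict standing in for HDFS), and it is not Impala's LibCache; it only shows that when invalidation is driven by an asynchronous update, a query that runs before the update arrives is served the stale cached copy.

```python
from typing import Dict

UDF_PATH = "/udfs/lib.so"  # hypothetical path, for illustration only


class ToyLibCache:
    """Hypothetical path-keyed cache that is invalidated only on request."""

    def __init__(self, storage: Dict[str, str]) -> None:
        self.storage = storage           # stand-in for files on HDFS
        self.cached: Dict[str, str] = {}

    def load(self, path: str) -> str:
        # Serve the cached copy if present; otherwise read from storage.
        if path not in self.cached:
            self.cached[path] = self.storage[path]
        return self.cached[path]

    def invalidate(self, path: str) -> None:
        self.cached.pop(path, None)


def run(query_wins_race: bool) -> str:
    storage = {UDF_PATH: "Old UDF"}
    cache = ToyLibCache(storage)
    cache.load(UDF_PATH)             # warm the cache with the old version
    storage[UDF_PATH] = "New UDF"    # the file is overwritten in place
    if query_wins_race:
        result = cache.load(UDF_PATH)   # query runs before the update arrives
        cache.invalidate(UDF_PATH)
    else:
        cache.invalidate(UDF_PATH)      # update arrives first
        result = cache.load(UDF_PATH)
    return result
```

The run(True) ordering yields the old version, matching the 'Old UDF' != 'New UDF' assertion failure in the test output, while run(False) yields the new one.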