[jira] [Commented] (IMPALA-5633) Bloom filters underestimate false positive probability
[ https://issues.apache.org/jira/browse/IMPALA-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184916#comment-17184916 ] Tim Armstrong commented on IMPALA-5633: --- Ok, thanks for confirming. I thought about it a bit and it seems like there are probably a lot of reasons why the estimate might be biased high. I'll try to find some time to test it out on those TPC-DS queries. > Bloom filters underestimate false positive probability > -- > > Key: IMPALA-5633 > URL: https://issues.apache.org/jira/browse/IMPALA-5633 > Project: IMPALA > Issue Type: Bug > Components: Perf Investigation >Reporter: Jim Apple >Assignee: Jim Apple >Priority: Minor > > Block Bloom filters have a higher false positive rate than standard Bloom > filters, due to the uneven distribution of keys between buckets. We should > change the code to match the theory, using an approximation from the paper > that introduced block Bloom filters, "Cache-, Hash- and Space-Efficient Bloom > Filters" by Putze et al. > For a false positive probability of 1%, this would increase filter size by > about 10% and decrease the filter false positive probability by 50%. However, > this is obscured by the coarse granularity imposed by the constraint that > filters have a size in bytes that is a power of two. Loosening that restriction is > potential future work. > See > https://github.com/apache/kudu/commit/d1190c2b06a6eef91b21fd4a0b5fb76534b4e9f9 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-5633) Bloom filters underestimate false positive probability
[ https://issues.apache.org/jira/browse/IMPALA-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184908#comment-17184908 ] Jim Apple commented on IMPALA-5633: --- I think it's just a mistake. IIRC, [~mmokhtar] and I had a brief discussion about it a few years ago and he thought it was an error as well. At the time, I wrote a patch to change it to 1% and ran some perf tests, which were inconclusive, likely because I didn't judiciously pick tests that would demonstrate a difference, so the patch went into my backlog and has yet to escape. I'd be delighted to +2 a patch that changed it and demonstrated the perf impact. It might be good to combine it with a patch to allow non-power-of-two sizes, which can be done without a modulo via {code:c} uint64_t libfilter_block_index(uint64_t hash, uint64_t num_buckets) { return (((unsigned __int128)hash) * ((unsigned __int128)num_buckets)) >> 64; } {code}
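The bucket-index trick in the comment above can be exercised in isolation. A minimal sketch, assuming a 64-bit GCC/Clang target where `unsigned __int128` is available; the test values are illustrative, not from the issue:

```c
#include <assert.h>
#include <stdint.h>

/* Map a 64-bit hash onto [0, num_buckets) without a modulo: take the
 * high 64 bits of the 128-bit product hash * num_buckets. For
 * well-distributed hashes the mapping stays uniform, and it allows
 * bucket counts that are not powers of two. */
uint64_t libfilter_block_index(uint64_t hash, uint64_t num_buckets) {
  return (uint64_t)((((unsigned __int128)hash) *
                     ((unsigned __int128)num_buckets)) >> 64);
}
```

The result is always below num_buckets, and the mapping is monotone in the hash: hash 0 lands in bucket 0 and the maximum hash lands in the last bucket.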
[jira] [Updated] (IMPALA-5633) Bloom filters underestimate false positive probability
[ https://issues.apache.org/jira/browse/IMPALA-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Apple updated IMPALA-5633: -- Description: Block Bloom filters have a higher false positive rate than standard Bloom filters, due to the uneven distribution of keys between buckets. We should change the code to match the theory, using an approximation from the paper that introduced block Bloom filters, "Cache-, Hash- and Space-Efficient Bloom Filters" by Putze et al. For a false positive probability of 1%, this would increase filter size by about 10% and decrease the filter false positive probability by 50%. However, this is obscured by the coarse granularity imposed by the constraint that filters have a size in bytes that is a power of two. Loosening that restriction is potential future work. See https://github.com/apache/kudu/commit/d1190c2b06a6eef91b21fd4a0b5fb76534b4e9f9 was: {{bloom-filter.cc}} says: {noformat} fpp = (1 - exp(-BUCKET_WORDS * ndv/space))^BUCKET_WORDS where space is in bits. {noformat} This is true only discounting the false positive rate induced by hash collisions. Using {{w}}-bit hashes with {{n}} distinct values gives a false positive rate of {noformat} n / exp2(w) + (1.0 - n / exp2(w)) * ((1 - exp(-BUCKET_WORDS * ndv/space))^BUCKET_WORDS) {noformat} This starts to become a factor as {{n}} approaches {{1 << w}}. It also suggests using bitmaps rather than Bloom filters for large {{n}}, since the false positive rate for a bitmap is {noformat} n / exp2(w) + (1.0 - n / exp2(w)) * (1 - exp(-ndv/space)) {noformat} This is lower than the current BF false positive rate for high {{n}} and low relative space (aka, high false positive probability). 
Summary: Bloom filters underestimate false positive probability (was: Bloom filters underestimate false positive probability for high NDV) > Bloom filters underestimate false positive probability > -- > > Key: IMPALA-5633 > URL: https://issues.apache.org/jira/browse/IMPALA-5633 > Project: IMPALA > Issue Type: Bug > Components: Perf Investigation >Reporter: Jim Apple >Assignee: Jim Apple >Priority: Minor
[jira] [Commented] (IMPALA-10101) Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python requirements
[ https://issues.apache.org/jira/browse/IMPALA-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184857#comment-17184857 ] Kevin Yu commented on IMPALA-10101: --- Thanks again, Tim. > Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python > requirements > - > > Key: IMPALA-10101 > URL: https://issues.apache.org/jira/browse/IMPALA-10101 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 2.9.0 > Environment: ubuntu 1604 x86_64 >Reporter: Kevin Yu >Priority: Major > > Reproduce steps:
[jira] [Comment Edited] (IMPALA-10101) Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python requirements
[ https://issues.apache.org/jira/browse/IMPALA-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183752#comment-17183752 ] Kevin Yu edited comment on IMPALA-10101 at 8/26/20, 2:20 AM: - Thanks Tim for the reply. I have to look into the python infra scripts to figure out how to build older versions of the Cloudera Impala branches. was (Author: keens312): This Tim for the reply. I have to look into the python infra scripts to figure it out how to build older versions of cloudera impala branches.
[jira] [Resolved] (IMPALA-9544) Replace Intel's SSE instructions with ARM's NEON instructions
[ https://issues.apache.org/jira/browse/IMPALA-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaorenhai resolved IMPALA-9544. Resolution: Fixed > Replace Intel's SSE instructions with ARM's NEON instructions > - > > Key: IMPALA-9544 > URL: https://issues.apache.org/jira/browse/IMPALA-9544 > Project: IMPALA > Issue Type: Sub-task >Reporter: zhaorenhai >Assignee: zhaorenhai >Priority: Major > > Replace Intel's SSE instructions with ARM's NEON instructions
[jira] [Resolved] (IMPALA-10088) DeadLock while run unifiedbetests on aarch64 platform
[ https://issues.apache.org/jira/browse/IMPALA-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaorenhai resolved IMPALA-10088. - Assignee: zhaorenhai Resolution: Fixed > DeadLock while run unifiedbetests on aarch64 platform > - > > Key: IMPALA-10088 > URL: https://issues.apache.org/jira/browse/IMPALA-10088 > Project: IMPALA > Issue Type: Sub-task >Reporter: zhaorenhai >Assignee: zhaorenhai >Priority: Major > > When running unifiedbetests and impalad on an aarch64 platform, a deadlock > occurs while initializing tcmalloc. > The stack trace is as follows: > > {code:java} > (gdb) bt > #0 0x83099544 in __GI___nanosleep (requested_time=0xffc71698, > remaining=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28 > #1 0x054cf144 in base::internal::SpinLockDelay (w=0x77385b0 > , value=2, loop=727956) at > /home/impala/impala/be/src/gutil/spinlock_linux-inl.h:86 > #2 0x05529800 in SpinLock::SlowLock() () > #3 0x055fb5c4 in tcmalloc::ThreadCache::InitModule() () > #4 0x05743374 in tc_calloc () > #5 0x81c737f4 in _dlerror_run (operate=operate@entry=0x81c73158 > , args=0xffc717d8, args@entry=0xffc717f8) at dlerror.c:140 > #6 0x81c731f0 in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:70 > #7 0x0310ee04 in (anonymous namespace)::dlsym_or_die (sym=0x606b260 > "dlopen") at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:74 > #8 0x0310ef1c in (anonymous namespace)::InitIfNecessary () at > /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:100 > #9 0x0310f0b4 in dl_iterate_phdr (callback=0x81620d18 > <_Unwind_IteratePhdrCallback>, data=0xffc71900) at > /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:158 > #10 0x816215b4 in _Unwind_Find_FDE (pc=0x8161f98f > <_Unwind_Backtrace+79>, bases=bases@entry=0xffc72438) at > ../../../gcc-7.5.0/libgcc/unwind-dw2-fde-dip.c:469 > #11 0x8161dfdc in uw_frame_state_for > (context=context@entry=0xffc72110, fs=fs@entry=0xffc719f0) at > ../../../gcc-7.5.0/libgcc/unwind-dw2.c:1249 > #12 0x8161ef3c in 
uw_init_context_1 > (context=context@entry=0xffc72110, outer_cfa=0xffc72b50, > outer_cfa@entry=0xffc72be0, outer_ra=0x55298d8 > ) > at ../../../gcc-7.5.0/libgcc/unwind-dw2.c:1578 > #13 0x8161f990 in _Unwind_Backtrace (trace=0x5529a48 > , > trace_argument=0xffc72b68) at ../../../gcc-7.5.0/libgcc/unwind.inc:283 > #14 0x055298d8 in GetStackTrace_libgcc(void**, int, int) () > #15 0x05529db4 in GetStackTrace(void**, int, int) () > #16 0x055f891c in tcmalloc::PageHeap::GrowHeap(unsigned long) () > {code} > I think this is the same issue as > [https://github.com/gperftools/gperftools/issues/1184], > because the issue happens whether I build gperftools with or without > libunwind. > > Kudu also has the same issue: > https://issues.apache.org/jira/browse/KUDU-3072 > I think the solution in the following link is not correct: > [https://gerrit.cloudera.org/#/c/15420/] > On aarch64, the method of getting a stack trace is not the same as on 32-bit ARM. > I think the correct way of getting a stack trace should look like this: > [https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/internal/stacktrace_aarch64-inl.inc] > or just use libunwind or gcc. > > But I think gperftools may not be the root cause of this issue, because > both gperftools and libunwind now support aarch64 well (with > libunwind or gcc). > Maybe this Kudu commit has a bug? > [https://github.com/apache/kudu/commit/b621f9c1a3949dc31ca4836b0767b2840fa73f29] > On x86, gperftools does not use libunwind or libgcc to get stack traces, so > the issue does not happen there. > I tried changing: > {code:java} > #if !defined(THREAD_SANITIZER) && !defined(__APPLE__) > #define HOOK_DL_ITERATE_PHDR 1 > #endif > {code} > to > {code:java} > #if !defined(THREAD_SANITIZER) && !defined(__APPLE__) && !defined(__aarch64__) > #define HOOK_DL_ITERATE_PHDR 1 > #endif{code} > and the deadlock no longer happens. > > [~tarmstr...@cloudera.com] [~tlipcon] [~adar] > What do you think about this issue? How should we fix it? Any suggestions?
[jira] [Reopened] (IMPALA-10088) DeadLock while run unifiedbetests on aarch64 platform
[ https://issues.apache.org/jira/browse/IMPALA-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaorenhai reopened IMPALA-10088: - > DeadLock while run unifiedbetests on aarch64 platform > - > > Key: IMPALA-10088 > URL: https://issues.apache.org/jira/browse/IMPALA-10088 > Project: IMPALA > Issue Type: Sub-task >Reporter: zhaorenhai >Assignee: zhaorenhai >Priority: Major
[jira] [Resolved] (IMPALA-10089) Fix large code model issue of llvm on aarch64
[ https://issues.apache.org/jira/browse/IMPALA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhaorenhai resolved IMPALA-10089. - Resolution: Fixed > Fix large code model issue of llvm on aarch64 > - > > Key: IMPALA-10089 > URL: https://issues.apache.org/jira/browse/IMPALA-10089 > Project: IMPALA > Issue Type: Sub-task >Reporter: zhaorenhai >Assignee: zhaorenhai >Priority: Major > > This issue references the following llvm review: > [https://reviews.llvm.org/D27629] > If not fixed, the process core dumps when loading Impala test data.
[jira] [Resolved] (IMPALA-7782) discrepancy in results with a subquery containing an agg that produces an empty set
[ https://issues.apache.org/jira/browse/IMPALA-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-7782. --- Fix Version/s: Impala 4.0 Resolution: Fixed > discrepancy in results with a subquery containing an agg that produces an > empty set > --- > > Key: IMPALA-7782 > URL: https://issues.apache.org/jira/browse/IMPALA-7782 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.12.0, Impala 3.1.0 >Reporter: Michael Brown >Assignee: Tim Armstrong >Priority: Major > Labels: correctness, query_generator, ramp-up > Fix For: Impala 4.0 > > > A discrepancy exists between Impala and Postgres when a subquery contains an > agg and results in an empty set, yet the WHERE clause looking at the subquery > should produce a "True" condition. > Example queries include: > {noformat} > USE functional; > SELECT id > FROM alltypestiny > WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false); > SELECT id > FROM alltypestiny > WHERE NULL NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false); > SELECT id > FROM alltypestiny > WHERE (SELECT COUNT(id) FROM alltypestiny HAVING false) IS NULL; > {noformat} > These queries do not produce any rows in Impala. In Postgres, the queries > produce all 8 rows for the functional.alltypestiny id column. > Thinking maybe there were Impala and Postgres differences with {{NOT IN}} > behavior, I also tried this: > {noformat} > USE functional; > SELECT id > FROM alltypestiny > WHERE -1 NOT IN (SELECT 1 FROM alltypestiny WHERE bool_col IS NULL); > {noformat} > This subquery also produces an empty set just like the subquery in the > problematic queries at the top, but unlike those queries, this full query > returns the same results in Impala and Postgres (all 8 rows for the > functional.alltypestiny id column). 
> For anyone interested in this bug, you can migrate data into postgres in a > dev environment using > {noformat} > tests/comparison/data_generator.py --use-postgresql --migrate-table-names > alltypestiny --db-name functional migrate > {noformat} > This is in 2.12 at least, so it's not a 3.1 regression.
[jira] [Updated] (IMPALA-7782) discrepancy in results with a subquery containing an agg that produces an empty set
[ https://issues.apache.org/jira/browse/IMPALA-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-7782: -- Priority: Blocker (was: Major) > discrepancy in results with a subquery containing an agg that produces an > empty set > --- > > Key: IMPALA-7782 > URL: https://issues.apache.org/jira/browse/IMPALA-7782 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.12.0, Impala 3.1.0 >Reporter: Michael Brown >Assignee: Tim Armstrong >Priority: Blocker > Labels: correctness, query_generator, ramp-up > Fix For: Impala 4.0
[jira] [Commented] (IMPALA-5633) Bloom filters underestimate false positive probability for high NDV
[ https://issues.apache.org/jira/browse/IMPALA-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184795#comment-17184795 ] Tim Armstrong commented on IMPALA-5633: --- [~jbapple] I know you're looking at some related things in Kudu now. Do you have any idea why max_filter_error_rate might be set to 0.75? That seems awfully high. On some queries - TPC-DS Q84 for example - it seems to be too aggressive and chooses filters that are too small and end up being less effective than desired, e.g. a 4 KB filter for household_demographics.hd_demo_sk, which has 7200 distinct values. > Bloom filters underestimate false positive probability for high NDV > --- > > Key: IMPALA-5633 > URL: https://issues.apache.org/jira/browse/IMPALA-5633 > Project: IMPALA > Issue Type: Bug > Components: Perf Investigation >Reporter: Jim Apple >Assignee: Jim Apple >Priority: Minor
[jira] [Commented] (IMPALA-7782) discrepancy in results with a subquery containing an agg that produces an empty set
[ https://issues.apache.org/jira/browse/IMPALA-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184785#comment-17184785 ] ASF subversion and git services commented on IMPALA-7782: - Commit e133d1838ab05e75007fef24e2ce1b6f18113c8d in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e133d18 ] IMPALA-7782: fix constant NOT IN subqueries that can return 0 rows The bug was that the statement rewriter converted NOT IN predicates to != predicates when the subquery could be an empty set. This was invalid, because NOT IN () is true, but != () is false. Testing: Added targeted planner and end-to-end tests. Ran exhaustive tests. Change-Id: I66c726f0f66ce2f609e6ba44057191f5929a67fc Reviewed-on: http://gerrit.cloudera.org:8080/16338 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins
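The semantic point in the commit message, that NOT IN over an empty result is true while the rewritten != comparison is false, can be modeled in miniature for a subquery returning zero or one rows (the function names are illustrative, not Impala code, and NULL handling is ignored):

```c
#include <assert.h>
#include <stddef.h>

/* x NOT IN (s[0..n)) with SQL semantics (NULLs ignored for simplicity):
 * vacuously true when the subquery result is empty. */
int sql_not_in(int x, const int *s, size_t n) {
  for (size_t i = 0; i < n; i++)
    if (s[i] == x) return 0;
  return 1;
}

/* The invalid rewrite: "some row satisfies x != s[i]". For a subquery
 * returning exactly one row the two agree, but over an empty result
 * this yields false where NOT IN is true. */
int rewritten_ne(int x, const int *s, size_t n) {
  for (size_t i = 0; i < n; i++)
    if (s[i] != x) return 1;
  return 0;
}
```

This matches the failing queries above: COUNT(...) with HAVING false produces zero rows, so the rewrite silently flipped the predicate from true to false.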
[jira] [Resolved] (IMPALA-8547) get_json_object fails to get value for numeric key
[ https://issues.apache.org/jira/browse/IMPALA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved IMPALA-8547. -- Fix Version/s: Impala 4.0 Resolution: Fixed > get_json_object fails to get value for numeric key > -- > > Key: IMPALA-8547 > URL: https://issues.apache.org/jira/browse/IMPALA-8547 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Eugene Zimichev >Assignee: Eugene Zimichev >Priority: Minor > Labels: built-in-function > Fix For: Impala 4.0 > > > {code:java} > select get_json_object('{"1": 5}', '$.1'); > {code} > returns error: > > {code:java} > "Expected key at position 2" > {code} > > I guess it's caused by the function FindEndOfIdentifier, which expects the > first character of a key to be a letter. > Hive's version of get_json_object works fine in this case.
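The parsing behavior described above can be illustrated with a tiny key scanner. This is a hypothetical sketch of the kind of fix involved, not Impala's FindEndOfIdentifier: accepting any alphanumeric leading character lets a path like $.1 resolve instead of failing at the key.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of a JSON-path key starting at path[0]. A scanner that requires
 * the first character to be a letter rejects "$.1"; accepting digits as
 * well (as below) would handle numeric keys the way Hive does. */
size_t json_path_key_length(const char *path) {
  size_t i = 0;
  while (isalnum((unsigned char)path[i]) || path[i] == '_')
    i++;
  return i;
}
```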
[jira] [Resolved] (IMPALA-9957) Impalad crashes when serializing large rows in aggregation spilling
[ https://issues.apache.org/jira/browse/IMPALA-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-9957. Fix Version/s: Impala 4.0 Resolution: Fixed > Impalad crashes when serializing large rows in aggregation spilling > --- > > Key: IMPALA-9957 > URL: https://issues.apache.org/jira/browse/IMPALA-9957 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Fix For: Impala 4.0 > > > Queries to reproduce the crash using the testdata: > {code:sql} > create table bigstrs stored as parquet as > select *, repeat(uuid(), cast(random() * 10 as int)) as bigstr > from functional.alltypes; > set MAX_ROW_SIZE=3.5MB; > set MEM_LIMIT=4GB; > create table my_str_group stored as parquet as > select group_concat(string_col) as ss, bigstr > from bigstrs group by bigstr; > {code} > The last query 1) has large rows, 2) needs spilling in the aggregation, and 3) > uses aggregate functions that need serialization (e.g. group_concat, > appx_median, min(string), etc.). With these three conditions it triggers the bug. > The crash stacktraces are different in different build modes. 
Crash > stacktrace in RELEASE build with codegen enabled: > {code:java} > Thread 316 (crashed) > 0 impalad!impala::HashTable::Close() [hash-table.cc : 512 + 0x0] > 1 impalad!impala::GroupingAggregator::Partition::Spill(bool) > [grouping-aggregator-partition.cc : 180 + 0x9] > 2 impalad!impala::GroupingAggregator::SpillPartition(bool) > [grouping-aggregator.cc : 904 + 0x10] > 3 0x7f5fba83db3c > 4 impalad!impala::GroupingAggregator::AddBatch(impala::RuntimeState*, > impala::RowBatch*) [grouping-aggregator.cc : 437 + 0x2] > 5 impalad!impala::AggregationNode::Open(impala::RuntimeState*) > [aggregation-node.cc : 70 + 0x6] > 6 libstdc++.so.6.0.24 + 0x120b28 > 7 > impalad!apache::hive::service::cli::thrift::TColumnValue::printTo(std::ostream&) > const [converter_lexical_streams.hpp : 161 + 0x8] > 8 impalad!impala::FragmentInstanceState::Open() [fragment-instance-state.cc > : 396 + 0x11] > 9 impalad!tc_newarray + 0x171 > {code} > Crash stacktrace in RELEASE build with codegen disabled (set > DISABLE_CODEGEN=true): > {code:java} > Thread 320 (crashed) > 0 impalad!impala::HashTable::Close() [hash-table.cc : 512 + 0x0] > 1 impalad!impala::GroupingAggregator::Partition::Spill(bool) > [grouping-aggregator-partition.cc : 180 + 0x9] > 2 impalad!impala::GroupingAggregator::SpillPartition(bool) > [grouping-aggregator.cc : 904 + 0x10] > 3 impalad!impala::Status > impala::GroupingAggregator::AddBatchImpl(impala::RowBatch*, > impala::TPrefetchMode::type, impala::HashTableCtx*) > [grouping-aggregator-ir.cc : 148 + 0x11] > 4 impalad!impala::GroupingAggregator::AddBatch(impala::RuntimeState*, > impala::RowBatch*) [grouping-aggregator.cc : 439 + 0x5] > 5 impalad!impala::AggregationNode::Open(impala::RuntimeState*) > [aggregation-node.cc : 70 + 0x6] > 6 impalad!impala::FragmentInstanceState::Open() [fragment-instance-state.cc > : 396 + 0x11] > 7 impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc > : 97 + 0x12] > 8 
impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) > [query-state.cc : 815 + 0x19] > 9 impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string std::char_traits, std::allocator > const&, > std::__cxx11::basic_string, std::allocator > > const&, boost::function, impala::ThreadDebugInfo const*, > impala::Promise*) [function_template.hpp : 770 > + 0x7] > 10 impalad!boost::detail::thread_data (*)(std::__cxx11::basic_string, > std::allocator > const&, std::__cxx11::basic_string std::char_traits, std::allocator > const&, boost::function ()>, impala::ThreadDebugInfo const*, impala::Promise (impala::PromiseMode)0>*), > boost::_bi::list5 std::char_traits, std::allocator > >, > boost::_bi::value, > std::allocator > >, boost::_bi::value >, > boost::_bi::value, > boost::_bi::value*> > > > >::run() [bind.hpp : 531 + 0xc] > 11 impalad!thread_proxy + 0x72 > 12 libpthread-2.23.so + 0x76ba > 13 libc-2.23.so + 0x1074dd > {code} > Crash stacktrace in DEBUG build with codegen disabled is a bit ealier - > crashed at a DCHECK: > {code:java} > F0715 20:29:24.389505 16868 grouping-aggregator-partition.cc:125] > 1d4b40df02e6ad76:433ed5740003] Check failed: !status.ok() Stream was > unpinned - AddRow() only fails on error > *** Check failure stack trace: *** > @ 0x513f31c google::LogMessage::Fail() > @ 0x5140c0c google::LogMessage::SendToLog() > @ 0x513ec7a google::LogMessage::Flush() > @ 0x5142878 google::LogMessageFatal::~LogMessageFatal() > @ 0x28b2ca7 >
[jira] [Resolved] (IMPALA-9955) Internal error for a query with large rows and spilling
[ https://issues.apache.org/jira/browse/IMPALA-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-9955. Fix Version/s: Impala 4.0 Resolution: Fixed > Internal error for a query with large rows and spilling > --- > > Key: IMPALA-9955 > URL: https://issues.apache.org/jira/browse/IMPALA-9955 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Major > Fix For: Impala 4.0 > > Attachments: impalad.INFO, impalad_node1.INFO, impalad_node2.INFO > > > Encounter a query failure due to internal error: > {code:java} > create table bigstrs stored as parquet as select *, repeat(uuid(), > cast(random() * 10 as int)) as bigstr from functional.alltypes; > set MAX_ROW_SIZE=3.5MB; > set MEM_LIMIT=4GB; > set DISABLE_CODEGEN=true; > create table my_cnt stored as parquet as select count(*) as cnt, bigstr from > bigstrs group by bigstr; > {code} > The error is > {code:java} > ERROR: Internal error: couldn't pin large page of 4194304 bytes, client only > had 2097152 bytes of unused reservation: > 0xcf9dae0 internal state: { > 0xbdf6ac0 name: GroupingAggregator id=3 ptr=0xcf9d900 write_status: buffers > allocated 2097152 num_pages: 2094 pinned_bytes: 41943040 > dirty_unpinned_bytes: 0 in_flight_write_bytes: 0 reservation: > {: reservation_limit 9223372036854775807 reservation > 46137344 used_reservation 44040192 child_reservations 0 parent: > : reservation_limit 9223372036854775807 reservation > 46137344 used_reservation 0 child_reservations 46137344 parent: > : reservation_limit 9223372036854775807 reservation > 46137344 used_reservation 0 child_reservations 46137344 parent: > : reservation_limit 3435973836 reservation 46137344 > used_reservation 0 child_reservations 46137344 parent: > : reservation_limit 6647046144 reservation 46137344 > used_reservation 0 child_reservations 46137344 parent: > NULL} > 12 pinned pages: 0xc9160a0 len: 2097152 pin_count: 1 > buf: 0xc916118 client: 
0xcf9dae0/0xbdf6ac0 data: > 0x1320 len: 2097152 > 0xc919d40 len: 4194304 pin_count: 1 buf: > 0xc919db8 client: 0xcf9dae0/0xbdf6ac0 data: > 0x12460 len: 4194304 > 0xd42aaa0 len: 4194304 pin_count: 1 buf: > 0xd42ab18 client: 0xcf9dae0/0xbdf6ac0 data: > 0x12b20 len: 4194304 > 0xd42b900 len: 4194304 pin_count: 1 buf: > 0xd42b978 client: 0xcf9dae0/0xbdf6ac0 data: > 0x132a0 len: 4194304 > 0xd42d3e0 len: 2097152 pin_count: 1 buf: > 0xd42d458 client: 0xcf9dae0/0xbdf6ac0 data: > 0xc6a0 len: 2097152 > 0xd42dd40 len: 4194304 pin_count: 1 buf: > 0xd42ddb8 client: 0xcf9dae0/0xbdf6ac0 data: > 0x132e0 len: 4194304 > 0xd42de80 len: 4194304 pin_count: 1 buf: > 0xd42def8 client: 0xcf9dae0/0xbdf6ac0 data: > 0x137c0 len: 4194304 > 0x12d48320 len: 4194304 pin_count: 1 buf: > 0x12d48398 client: 0xcf9dae0/0xbdf6ac0 data: > 0x102c0 len: 4194304 > 0x12d483c0 len: 4194304 pin_count: 1 buf: > 0x12d48438 client: 0xcf9dae0/0xbdf6ac0 data: > 0x108a0 len: 4194304 > 0x12d48780 len: 4194304 pin_count: 1 buf: > 0x12d487f8 client: 0xcf9dae0/0xbdf6ac0 data: > 0x108e0 len: 4194304 > 0x12d492c0 len: 2097152 pin_count: 1 buf: > 0x12d49338 client: 0xcf9dae0/0xbdf6ac0 data: > 0x12760 len: 2097152 > 0x12d4a9e0 len: 2097152 pin_count: 1 buf: > 0x12d4aa58 client: 0xcf9dae0/0xbdf6ac0 data: > 0x12d20 len: 2097152 > 0 dirty unpinned pages: > 0 in flight write pages: } > {code} > Found the stacktrace from the log: > {code} > @ 0x1c9dfbe impala::Status::Status() > @ 0x1ca5a78 impala::Status::Status() > @ 0x2bfe4ec impala::BufferedTupleStream::NextReadPage() > @ 0x2c04b72 impala::BufferedTupleStream::GetNextInternal<>() > @ 0x2c029e6 impala::BufferedTupleStream::GetNextInternal<>() > @ 0x2bffd19 impala::BufferedTupleStream::GetNext() > @ 0x28aa43f impala::GroupingAggregator::ProcessStream<>() > @ 0x28a2e17 impala::GroupingAggregator::BuildSpilledPartition() > @ 0x28a2401 impala::GroupingAggregator::NextPartition() > @ 0x289df5a impala::GroupingAggregator::GetRowsFromPartition() > @ 0x289db20 
impala::GroupingAggregator::GetNext() > @ 0x28dbfc7 impala::AggregationNode::GetNext() > @ 0x2259dfc impala::FragmentInstanceState::ExecInternal() > @ 0x22564a0 impala::FragmentInstanceState::Exec() > @ 0x22801ed impala::QueryState::ExecFInstance() > @ 0x227e5ef > _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv > @ 0x2281d8e >
[jira] [Commented] (IMPALA-8547) get_json_object fails to get value for numeric key
[ https://issues.apache.org/jira/browse/IMPALA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184765#comment-17184765 ] ASF subversion and git services commented on IMPALA-8547: - Commit adf2c464aed5d35c976c1439982e0a927f76b609 in impala's branch refs/heads/master from Eugene Zimichev [ https://gitbox.apache.org/repos/asf?p=impala.git;h=adf2c46 ] IMPALA-8547: get_json_object fails to get value for numeric key Allows numeric keys for JSON objects in get_json_object. This patch makes Impala consistent with Hive and Postgres behavior for get_json_object. Queries such as "select get_json_object('{"1": 5}', '$.1');" would fail before this patch. Now the query will return '5'. Testing: * Added tests to expr-test Change-Id: I7df037ccf2c79da0ba86a46df1dd28ab0e9a45f4 Reviewed-on: http://gerrit.cloudera.org:8080/14905 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > get_json_object fails to get value for numeric key > -- > > Key: IMPALA-8547 > URL: https://issues.apache.org/jira/browse/IMPALA-8547 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Eugene Zimichev >Assignee: Eugene Zimichev >Priority: Minor > Labels: built-in-function > > {code:java} > select get_json_object('{"1": 5}', '$.1'); > {code} > returns error: > > {code:java} > "Expected key at position 2" > {code} > > I guess it's caused by using function FindEndOfIdentifier that expects first > symbol of key to be a letter. > Hive version of get_json_object works fine in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
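The pre-patch failure came from treating a JSON key as an identifier that must begin with a letter. As a rough illustration of the patched behavior (this is not Impala's C++ implementation — just a minimal Python emulation of a simple `$.key` path lookup using the standard `json` module):

```python
import json

def get_json_object(json_str, path):
    """Minimal emulation of a simple '$.key' lookup; numeric keys
    such as '$.1' are treated as ordinary object keys."""
    if not path.startswith("$."):
        return None
    obj = json.loads(json_str)
    for key in path[2:].split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None  # a purely numeric key like "1" is still just a key
        obj = obj[key]
    return json.dumps(obj) if isinstance(obj, (dict, list)) else str(obj)

print(get_json_object('{"1": 5}', '$.1'))  # prints 5, matching the patched query result
```

The only change the patch needed conceptually is that key lookup never inspects the first character of the key.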
[jira] [Assigned] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
[ https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-7876: - Assignee: Tim Armstrong > COMPUTE STATS TABLESAMPLE is not updating number of estimated rows > -- > > Key: IMPALA-7876 > URL: https://issues.apache.org/jira/browse/IMPALA-7876 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 3.0 >Reporter: Andre Araujo >Assignee: Tim Armstrong >Priority: Critical > > Running the command below seems to have no impact on the #rows stats. > {code} > [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5); > Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100) > +---+ > | summary | > +---+ > | Updated 1 partition(s) and 103 column(s). | > +---+ > WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%. > The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table > size 20.35GB > Fetched 1 row(s) in 43.67s > [host:21000] default> show table stats wide; > Query: show table stats wide > +---+--++-+--+---+-+---+-+ > | #Rows | Extrap #Rows | #Files | Size| Bytes Cached | Cache Replication > | Format | Incremental stats | Location| > +---+--++-+--+---+-+---+-+ > | 0 | -1 | 84 | 20.35GB | NOT CACHED | NOT CACHED > | PARQUET | false | hdfs://ns1/user/hive/warehouse/wide | > +---+--++-+--+---+-+---+-+ > Fetched 1 row(s) in 0.01s > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
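The warning text above implies a rule: the requested sampling percentage is raised until the sample covers at least COMPUTE_STATS_MIN_SAMPLE_SIZE, and once the effective rate reaches 100% the TABLESAMPLE clause is ignored. A hedged sketch of that arithmetic (the exact Impala logic may differ; the function and parameter names here are illustrative):

```python
def effective_sample_pct(requested_pct, table_bytes, min_sample_bytes):
    """Raise the requested percentage until the sample is at least
    min_sample_bytes; a result of 100.0 means sampling is ignored."""
    min_pct = 100.0 * min_sample_bytes / table_bytes
    return min(100.0, max(float(requested_pct), min_pct))

# A 1 GB minimum on a 10 GB table bumps a 5% request up to 10%;
# on a 0.5 GB table the minimum exceeds the table, so sampling is ignored.
GB = 1024 ** 3
print(effective_sample_pct(5, 10 * GB, 1 * GB))
print(effective_sample_pct(5, GB // 2, 1 * GB))
```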
[jira] [Resolved] (IMPALA-10103) Jquery upgrade to 3.5.1
[ https://issues.apache.org/jira/browse/IMPALA-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-10103. Fix Version/s: Impala 4.0 Resolution: Fixed > Jquery upgrade to 3.5.1 > --- > > Key: IMPALA-10103 > URL: https://issues.apache.org/jira/browse/IMPALA-10103 > Project: IMPALA > Issue Type: Task >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > >

[jira] [Commented] (IMPALA-9962) Implement ds_kll_quantiles() function
[ https://issues.apache.org/jira/browse/IMPALA-9962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184750#comment-17184750 ] ASF subversion and git services commented on IMPALA-9962: - Commit 41065845e927acef5a0ff95ef8fb32b2f86272d8 in impala's branch refs/heads/master from Gabor Kaszab [ https://gitbox.apache.org/repos/asf?p=impala.git;h=4106584 ] IMPALA-9962: Implement ds_kll_quantiles_as_string() function This function is very similar to ds_kll_quantile() but this one can receive any number of rank parameters and returns a comma separated string that holds the results for all of the given ranks. For more details about ds_kll_quantile() see IMPALA-9959. Note, ds_kll_quantiles() should return an Array of floats as the result but with that we have to wait for the complex type support. Until, we provide ds_kll_quantiles_as_string() that can be deprecated once we have array support. Tracking Jira for returning complex types from functions is IMPALA-9520. Change-Id: I76f6039977f4e14ded89a3ee4bc4e6ff855f5e7f Reviewed-on: http://gerrit.cloudera.org:8080/16324 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Implement ds_kll_quantiles() function > - > > Key: IMPALA-9962 > URL: https://issues.apache.org/jira/browse/IMPALA-9962 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Gabor Kaszab >Assignee: Gabor Kaszab >Priority: Major > > Requirements for ds_kll_quantiles() > - Receives a serialized KLL sketch in BINARY type (in Impala it can be > STRING as long as we don't have BINARY) as first parameter. > - Receives one or more double values to represent the quantile points. > - In Hive the return type is an array of doubles. However, Impala can't > return complex types from functions at this point so we have to find some > alternative approaches to implement this function. > ** One would be to return as many columns as many quantile points were given. 
> ** Another approach is to create a comma separated string from the results > of this function and return that string instead of an array. > Hive example: > {code:java} > select ds_kll_quantiles(ds_kll_sketch(cast(int_col as float)), 0, 0.1, 0.5, > 1) from table_name > ++ > |_c0 | > ++ > | [1.0,1.0,1.0,1.0] | > ++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
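The committed function returns the quantile results as a comma-separated string because Impala functions cannot yet return arrays. A rough Python sketch of those semantics, using exact order statistics over raw values in place of a serialized KLL sketch (the real function operates on an approximate sketch, so results are only guaranteed within the sketch's error bounds):

```python
def quantile(sorted_vals, rank):
    """Exact nearest-rank quantile for a rank in [0, 1]."""
    idx = min(len(sorted_vals) - 1, int(rank * len(sorted_vals)))
    return float(sorted_vals[idx])

def quantiles_as_string(values, *ranks):
    """Comma-separated quantile results, mirroring the output shape
    of ds_kll_quantiles_as_string() for several rank parameters."""
    s = sorted(values)
    return ",".join(str(quantile(s, r)) for r in ranks)

print(quantiles_as_string([1, 1, 1, 1], 0, 0.1, 0.5, 1))  # prints 1.0,1.0,1.0,1.0
```

On the all-ones input this matches the Hive example above, which returns [1.0,1.0,1.0,1.0] for ranks 0, 0.1, 0.5, 1.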
[jira] [Commented] (IMPALA-9520) Allow returning complex types from UDFs
[ https://issues.apache.org/jira/browse/IMPALA-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184752#comment-17184752 ] ASF subversion and git services commented on IMPALA-9520: - Commit 41065845e927acef5a0ff95ef8fb32b2f86272d8 in impala's branch refs/heads/master from Gabor Kaszab [ https://gitbox.apache.org/repos/asf?p=impala.git;h=4106584 ] IMPALA-9962: Implement ds_kll_quantiles_as_string() function This function is very similar to ds_kll_quantile() but this one can receive any number of rank parameters and returns a comma separated string that holds the results for all of the given ranks. For more details about ds_kll_quantile() see IMPALA-9959. Note, ds_kll_quantiles() should return an Array of floats as the result but with that we have to wait for the complex type support. Until, we provide ds_kll_quantiles_as_string() that can be deprecated once we have array support. Tracking Jira for returning complex types from functions is IMPALA-9520. Change-Id: I76f6039977f4e14ded89a3ee4bc4e6ff855f5e7f Reviewed-on: http://gerrit.cloudera.org:8080/16324 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Allow returning complex types from UDFs > --- > > Key: IMPALA-9520 > URL: https://issues.apache.org/jira/browse/IMPALA-9520 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Reporter: Gabor Kaszab >Priority: Major > Labels: complextype > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9959) Implement ds_kll_sketch() and ds_kll_quantile() functions
[ https://issues.apache.org/jira/browse/IMPALA-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184751#comment-17184751 ] ASF subversion and git services commented on IMPALA-9959: - Commit 41065845e927acef5a0ff95ef8fb32b2f86272d8 in impala's branch refs/heads/master from Gabor Kaszab [ https://gitbox.apache.org/repos/asf?p=impala.git;h=4106584 ] IMPALA-9962: Implement ds_kll_quantiles_as_string() function This function is very similar to ds_kll_quantile() but this one can receive any number of rank parameters and returns a comma separated string that holds the results for all of the given ranks. For more details about ds_kll_quantile() see IMPALA-9959. Note, ds_kll_quantiles() should return an Array of floats as the result but with that we have to wait for the complex type support. Until, we provide ds_kll_quantiles_as_string() that can be deprecated once we have array support. Tracking Jira for returning complex types from functions is IMPALA-9520. Change-Id: I76f6039977f4e14ded89a3ee4bc4e6ff855f5e7f Reviewed-on: http://gerrit.cloudera.org:8080/16324 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Implement ds_kll_sketch() and ds_kll_quantile() functions > - > > Key: IMPALA-9959 > URL: https://issues.apache.org/jira/browse/IMPALA-9959 > Project: IMPALA > Issue Type: Improvement > Components: Backend, Frontend >Reporter: Gabor Kaszab >Assignee: Gabor Kaszab >Priority: Major > Fix For: Impala 4.0 > > > 1: > STRING ds_kll_sketch(float) > Accepts float as parameter and returns a DataSketches KLL sketch in string > type (or binary once that work is submitted). > 2: > FLOAT (or DOUBLE?) ds_kll_quantile(KLL sketch, double) > Accepts two parameters: a KLL sketch created by ds_hll_sketch() and a double > in [0, 1] to represent the quantile. > At this point I'm not sure about the return value, it's either a float or > double, it's a subject of further investigation. 
> Example: > {code:java} > select ds_kll_quantile(ds_kll_sketch(cast(int_col as float)), 1) from > table_name; > +--+ > | _c0 | > +--+ > | 1.0 | > +--+ > {code} > Some further examples found here: > [https://datasketches.apache.org/docs/Quantiles/QuantilesCppExample.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9957) Impalad crashes when serializing large rows in aggregation spilling
[ https://issues.apache.org/jira/browse/IMPALA-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184749#comment-17184749 ] ASF subversion and git services commented on IMPALA-9957: - Commit e0a6e942b28909baa0f56e21e3d33adfb5eb19b7 in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e0a6e94 ] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn't need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. 
A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. - Run CORE tests. Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Reviewed-on: http://gerrit.cloudera.org:8080/16240 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong > Impalad crashes when serializing large rows in aggregation spilling > --- > > Key: IMPALA-9957 > URL: https://issues.apache.org/jira/browse/IMPALA-9957 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > Queries to reproduce the crash using the testdata: > {code:sql} > create table bigstrs stored as parquet as > select *, repeat(uuid(), cast(random() * 10 as int)) as bigstr > from functional.alltypes; > set MAX_ROW_SIZE=3.5MB; > set MEM_LIMIT=4GB; > create table my_str_group stored as parquet as > select group_concat(string_col) as ss, bigstr > from bigstrs group by bigstr; > {code} > The last query 1) has large rows, 2) needs spilling in aggregation 3) has > aggregation on functions needs serialize (e.g. 
group_concat, appx_median, > min(string), etc). With these 3 conditions, it will trigger this bug. > The crash stacktraces are different in different build modes. Crash > stacktrace in RELEASE build with codegen enabled: > {code:java} > Thread 316 (crashed) > 0 impalad!impala::HashTable::Close() [hash-table.cc : 512 + 0x0] > 1 impalad!impala::GroupingAggregator::Partition::Spill(bool) > [grouping-aggregator-partition.cc : 180 + 0x9] > 2 impalad!impala::GroupingAggregator::SpillPartition(bool) > [grouping-aggregator.cc : 904 + 0x10] > 3 0x7f5fba83db3c > 4 impalad!impala::GroupingAggregator::AddBatch(impala::RuntimeState*, > impala::RowBatch*) [grouping-aggregator.cc : 437 + 0x2] > 5 impalad!impala::AggregationNode::Open(impala::RuntimeState*) > [aggregation-node.cc : 70 + 0x6] > 6 libstdc++.so.6.0.24 + 0x120b28 > 7 > impalad!apache::hive::service::cli::thrift::TColumnValue::printTo(std::ostream&) >
[jira] [Commented] (IMPALA-7779) Parquet Scanner can write binary data into profile
[ https://issues.apache.org/jira/browse/IMPALA-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184746#comment-17184746 ] ASF subversion and git services commented on IMPALA-7779: - Commit 2ebf554dfdb0dc9055ef95c8f2ec4fad51f1e657 in impala's branch refs/heads/master from Qifan Chen [ https://gitbox.apache.org/repos/asf?p=impala.git;h=2ebf554 ] IMPALA-7779 Parquet Scanner can write binary data into profile This fix addresses the current limitation in that an ill-formatted Parquet version string is not properly formatted before appearing in an error message or impalad.INFO. With the fix, any such string is converted to a hex string first. The hex string is a sequence of four hex digit groups separated by spaces and each group is one or two hex digits, such as "6c 65 2e a". Testing: Ran "core" tests successfully. Change-Id: I281d6fa7cb2f88f04588110943e3e768678b9cf1 Reviewed-on: http://gerrit.cloudera.org:8080/16331 Tested-by: Impala Public Jenkins Reviewed-by: Sahil Takiar > Parquet Scanner can write binary data into profile > -- > > Key: IMPALA-7779 > URL: https://issues.apache.org/jira/browse/IMPALA-7779 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.1.0 >Reporter: Lars Volker >Assignee: Qifan Chen >Priority: Major > Labels: profile > > In > [hdfs-parquet-scanner.cc:1224|https://github.com/apache/impala/blob/master/be/src/exec/hdfs-parquet-scanner.cc#L1224] > we log an invalid file version string. Whatever 4 bytes that that pointer > points to will end up in the profile. These can be non-ascii characters, thus > potentially breaking tools that parse the profiles and expect their content > to be plain text. We should either remove the bytes from the message, or > escape them as hex. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
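The fix described above renders the four version bytes as space-separated hex groups of one or two digits each, e.g. "6c 65 2e a". A small sketch of that formatting (illustrative Python, not the Impala C++ code):

```python
def version_bytes_to_hex(raw):
    """Format each byte as one or two lowercase hex digits (no zero
    padding), space-separated, so non-ASCII bytes stay printable."""
    return " ".join(f"{b:x}" for b in raw)

# The bytes 0x6c 0x65 0x2e 0x0a produce the example string from the commit:
print(version_bytes_to_hex(b"le.\n"))  # prints 6c 65 2e a
```

This keeps arbitrary binary version strings out of profiles and logs while remaining reversible for debugging.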
[jira] [Commented] (IMPALA-9955) Internal error for a query with large rows and spilling
[ https://issues.apache.org/jira/browse/IMPALA-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184747#comment-17184747 ] ASF subversion and git services commented on IMPALA-9955: - Commit e0a6e942b28909baa0f56e21e3d33adfb5eb19b7 in impala's branch refs/heads/master from stiga-huang [ https://gitbox.apache.org/repos/asf?p=impala.git;h=e0a6e94 ] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator The minimum requirement for a spillable operator is ((min_buffers -2) * default_buffer_size) + 2 * max_row_size. In the min reservation, we only reserve space for two large pages, one for reading, the other for writing. However, to make the non-streaming GroupingAggregator work correctly, we have to manage these extra reservations carefully. So it won't run out of the min reservation when it actually needs to spill a large page, or when it actually needs to read a large page. To be specific, for how to manage the large write page reservation, depending on whether needs_serialize is true or false: - If the aggregator needs to serialize the intermediate results when spilling a partition, we have to save a large page worth of reservation for the serialize stream, in case it needs to write large rows. This space can be restored when all the partitions are spilled so the serialize stream is not needed until we build/repartition a spilled partition and thus have pinned partitions again. If the large write page reservation is used, we save it back whenever possible after we spill or close a partition. - If the aggregator doesn't need the serialize stream at all, we can restore the large write page reservation whenever we fail to add a large row, before spilling any partitions. Reclaim it whenever possible after we spill or close a partition. 
A special case is when we are processing a large row and it's the last row in building/repartitioning a spilled partition, the large write page reservation can be restored for it no matter whether we need the serialize stream. Because partitions will be read out after this so no needs for spilling. For the large read page reservation, it's transferred to the spilled BufferedTupleStream that we are reading in building/repartitioning a spilled partition. The stream will restore some of it when reading a large page, and reclaim it when the output row batch is reset. Note that the stream is read in attach_on_read mode, the large page will be attached to the row batch's buffers and only get freed when the row batch is reset. Tests: - Add tests in test_spilling_large_rows (test_spilling.py) with different row sizes to reproduce the issue. - One test in test_spilling_no_debug_action becomes flaky after this patch. Revise the query to make the udf allocate larger strings so it can consistently pass. - Run CORE tests. 
Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Reviewed-on: http://gerrit.cloudera.org:8080/16240 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong > Internal error for a query with large rows and spilling > --- > > Key: IMPALA-9955 > URL: https://issues.apache.org/jira/browse/IMPALA-9955 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Major > Attachments: impalad.INFO, impalad_node1.INFO, impalad_node2.INFO > > > Encounter a query failure due to internal error: > {code:java} > create table bigstrs stored as parquet as select *, repeat(uuid(), > cast(random() * 10 as int)) as bigstr from functional.alltypes; > set MAX_ROW_SIZE=3.5MB; > set MEM_LIMIT=4GB; > set DISABLE_CODEGEN=true; > create table my_cnt stored as parquet as select count(*) as cnt, bigstr from > bigstrs group by bigstr; > {code} > The error is > {code:java} > ERROR: Internal error: couldn't pin large page of 4194304 bytes, client only > had 2097152 bytes of unused reservation: > 0xcf9dae0 internal state: { > 0xbdf6ac0 name: GroupingAggregator id=3 ptr=0xcf9d900 write_status: buffers > allocated 2097152 num_pages: 2094 pinned_bytes: 41943040 > dirty_unpinned_bytes: 0 in_flight_write_bytes: 0 reservation: > {: reservation_limit 9223372036854775807 reservation > 46137344 used_reservation 44040192 child_reservations 0 parent: > : reservation_limit 9223372036854775807 reservation > 46137344 used_reservation 0 child_reservations 46137344 parent: > : reservation_limit 9223372036854775807 reservation > 46137344 used_reservation 0 child_reservations 46137344 parent: > : reservation_limit 3435973836 reservation 46137344 > used_reservation 0 child_reservations 46137344 parent: > : reservation_limit 6647046144 reservation 46137344 > used_reservation 0 child_reservations 46137344 parent: > NULL} > 12 pinned pages: 0xc9160a0
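The commit message states the spillable-operator minimum as ((min_buffers - 2) * default_buffer_size) + 2 * max_row_size: one large page reserved for reading plus one for writing. Evaluating that formula with assumed illustrative values (16 buffers of 2 MB, and 4 MB large pages — the reservation dump above shows 2097152- and 4194304-byte pages, but the buffer count here is not from the source):

```python
def min_reservation(min_buffers, default_buffer_size, max_row_size):
    """Spillable-operator minimum from the commit message:
    (min_buffers - 2) default-sized buffers plus two large pages."""
    return (min_buffers - 2) * default_buffer_size + 2 * max_row_size

MB = 1 << 20
# Assumed values for illustration: 14 default 2 MB buffers + two 4 MB
# large pages (one read, one write) = 36 MB.
print(min_reservation(16, 2 * MB, 4 * MB) // MB)  # prints 36
```

The bug arose because the two large-page reservations were not carefully set aside, so the aggregator could exhaust them on ordinary pages before it actually needed to spill or read a large page.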
[jira] [Commented] (IMPALA-10103) Jquery upgrade to 3.5.1
[ https://issues.apache.org/jira/browse/IMPALA-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184753#comment-17184753 ] ASF subversion and git services commented on IMPALA-10103: -- Commit b46ea7664c3b38e12ea6a06e7d342273d132fbf7 in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b46ea76 ] IMPALA-10103: upgrade jquery to 3.5.1 Testing: Manually clicked through most of the web UI pages and interacted with data tables, etc. Change-Id: Icf0445163a6bf15c56de0c6ca10798e09e0a4fcb Reviewed-on: http://gerrit.cloudera.org:8080/16355 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Jquery upgrade to 3.5.1 > --- > > Key: IMPALA-10103 > URL: https://issues.apache.org/jira/browse/IMPALA-10103 > Project: IMPALA > Issue Type: Task >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Fix For: Impala 4.0 > >
[jira] [Commented] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
[ https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184733#comment-17184733 ] Vincent Tran commented on IMPALA-7876: -- This should reproduce this master. {noformat} CREATE TABLE default.one_gram_p ( ngram STRING, match_count INT,volume_count INT )PARTITIONED BY (year STRING)STORED AS TEXTFILE TBLPROPERTIES ('impala.enable.stats.extrapolation'='true'); insert into one_gram_p partition(year) values('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'); set compute_stats_min_sample_size=1B; compute stats one_gram_p tablesample system(50); show table stats one_gram_p;{noformat} > COMPUTE STATS TABLESAMPLE is not updating number of estimated rows > -- > > Key: IMPALA-7876 > URL: https://issues.apache.org/jira/browse/IMPALA-7876 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 3.0 >Reporter: Andre Araujo >Priority: Critical > > Running the command below seems to have no impact on the #rows stats. > {code} > [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5); > Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100) > +---+ > | summary | > +---+ > | Updated 1 partition(s) and 103 column(s). | > +---+ > WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%. 
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table > size 20.35GB > Fetched 1 row(s) in 43.67s > [host:21000] default> show table stats wide; > Query: show table stats wide > +---+--++-+--+---+-+---+-+ > | #Rows | Extrap #Rows | #Files | Size| Bytes Cached | Cache Replication > | Format | Incremental stats | Location| > +---+--++-+--+---+-+---+-+ > | 0 | -1 | 84 | 20.35GB | NOT CACHED | NOT CACHED > | PARQUET | false | hdfs://ns1/user/hive/warehouse/wide | > +---+--++-+--+---+-+---+-+ > Fetched 1 row(s) in 0.01s > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Reopened] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
[ https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Tran reopened IMPALA-7876: -- > COMPUTE STATS TABLESAMPLE is not updating number of estimated rows > -- > > Key: IMPALA-7876 > URL: https://issues.apache.org/jira/browse/IMPALA-7876 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 3.0 >Reporter: Andre Araujo >Priority: Critical > > Running the command below seems to have no impact on the #rows stats. > {code} > [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5); > Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100) > +---+ > | summary | > +---+ > | Updated 1 partition(s) and 103 column(s). | > +---+ > WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%. > The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table > size 20.35GB > Fetched 1 row(s) in 43.67s > [host:21000] default> show table stats wide; > Query: show table stats wide > +---+--++-+--+---+-+---+-+ > | #Rows | Extrap #Rows | #Files | Size| Bytes Cached | Cache Replication > | Format | Incremental stats | Location| > +---+--++-+--+---+-+---+-+ > | 0 | -1 | 84 | 20.35GB | NOT CACHED | NOT CACHED > | PARQUET | false | hdfs://ns1/user/hive/warehouse/wide | > +---+--++-+--+---+-+---+-+ > Fetched 1 row(s) in 0.01s > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10060) Postgres JDBC driver should be upgraded to 42.2.14
[ https://issues.apache.org/jira/browse/IMPALA-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184715#comment-17184715 ] Tim Armstrong commented on IMPALA-10060: Posted it here and running precommits - https://gerrit.cloudera.org/#/c/16362/ > Postgres JDBC driver should be upgraded to 42.2.14 > -- > > Key: IMPALA-10060 > URL: https://issues.apache.org/jira/browse/IMPALA-10060 > Project: IMPALA > Issue Type: Task >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Attachments: IMPALA-10060.patch > > > Impala currently uses Postgres driver version 42.2.5 which isn't up to date > and has a CVE associated with it. It would be good to upgrade to 42.2.14 > which is the latest as of June 2020. > https://mvnrepository.com/artifact/org.postgresql/postgresql
[jira] [Commented] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
[ https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184701#comment-17184701 ] Vincent Tran commented on IMPALA-7876: -- {noformat} ===Without sampling=== [:21000] default> compute stats one_gram_p1; Query: compute stats one_gram_p1 +-+ | summary | +-+ | Updated 1 partition(s) and 3 column(s). | +-+ Fetched 1 row(s) in 1.51s [:21000] default> show table stats one_gram_p1; Query: show table stats one_gram_p1 +---+--+--++--+--+---++---+---+ | year | #Rows | Extrap #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location | +---+--+--++--+--+---++---+---+ | 2000 | -1 | 19013482 | 3 | 289.07MB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://:8020/user/hive/warehouse/one_gram_p1/year=2000 | | Total | 19013482 | 19013482 | 3 | 289.07MB | 0B | | | | | +---+--+--++--+--+---++---+---+ Fetched 2 row(s) in 0.01s ===With sampling=== [:21000] default> set compute_stats_min_sample_size=1MB; COMPUTE_STATS_MIN_SAMPLE_SIZE set to 1MB :21000] default> compute stats one_gram_p1 tablesample system(10); Query: compute stats one_gram_p1 tablesample system(10) +-+ | summary | +-+ | Updated 1 partition(s) and 3 column(s). 
| +-+ Fetched 1 row(s) in 1.72s [:21000] default> show table stats one_gram_p1; Query: show table stats one_gram_p1 +---+---+--++--+--+---++---+---+ | year | #Rows | Extrap #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location | +---+---+--++--+--+---++---+---+ | 2000 | -1 | -1 | 3 | 289.07MB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://:8020/user/hive/warehouse/one_gram_p1/year=2000 | | Total | 0 | -1 | 3 | 289.07MB | 0B | | | | | +---+---+--++--+--+---++---+---+ Fetched 2 row(s) in 0.01s {noformat} > COMPUTE STATS TABLESAMPLE is not updating number of estimated rows > -- > > Key: IMPALA-7876 > URL: https://issues.apache.org/jira/browse/IMPALA-7876 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 3.0 >Reporter: Andre Araujo >Priority: Critical > > Running the command below seems to have no impact on the #rows stats. > {code} > [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5); > Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100) > +---+ > | summary | > +---+ > | Updated 1 partition(s) and 103 column(s). | > +---+ > WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%. > The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table > size 20.35GB > Fetched 1 row(s) in 43.67s > [host:21000] default> show table stats wide; > Query: show table stats wide > +---+--++-+--+---+-+---+-+ > | #Rows | Extrap #Rows | #Files | Size| Bytes Cached | Cache Replication > | Format | Incremental stats | Location| > +---+--++-+--+---+-+---+-+ > | 0 | -1 | 84 | 20.35GB | NOT CACHED | NOT CACHED > | PARQUET | false | hdfs://ns1/user/hive/warehouse/wide | >
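For context on the "Extrap #Rows" column in the output above: row-count extrapolation conceptually scales the row count observed in the sample by the inverse of the sampled byte fraction. A minimal sketch of that arithmetic, as an illustration of the idea only (not Impala's actual implementation; the sample numbers below are hypothetical):

```python
def extrapolate_row_count(sample_rows, sample_bytes, total_bytes):
    # Sketch of byte-ratio extrapolation: scale the row count seen in the
    # sample by the inverse of the fraction of bytes actually scanned.
    # Illustrative only -- not Impala's actual code path.
    if sample_bytes <= 0:
        return -1  # -1 is the "no estimate" convention seen in the output above
    return int(round(sample_rows * (total_bytes / sample_bytes)))

# Hypothetical numbers: a 50%-by-bytes sample yielding ~9.5M rows
# extrapolates to ~19M rows for the whole partition.
print(extrapolate_row_count(9_506_741, 144_535_000, 289_070_000))
# -> 19013482
```

The bug reported here is that after COMPUTE STATS TABLESAMPLE this extrapolated value stays at -1 instead of being populated.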
[jira] [Commented] (IMPALA-7876) COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
[ https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184697#comment-17184697 ] Vincent Tran commented on IMPALA-7876: -- I can reproduce this on ~ 3.2.0. I think this may be unrelated to the width of the table. The specs for my table is below: {noformat} default> show create table one_gram_p; Query: show create table one_gram_p CREATE TABLE default.one_gram_p ( ngram STRING, match_count INT, volume_count INT ) PARTITIONED BY ( year STRING ) STORED AS TEXTFILE LOCATION 'hdfs:user/hive/warehouse/one_gram_p' TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'STATS_GENERATED'='TASK', 'impala.enable.stats.extrapolation'='true', 'impala.lastComputeStatsTime'='1598383227', 'numRows'='1430731493', 'totalSize'='22081529047') {noformat} I need to check against the master branch next. > COMPUTE STATS TABLESAMPLE is not updating number of estimated rows > -- > > Key: IMPALA-7876 > URL: https://issues.apache.org/jira/browse/IMPALA-7876 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 3.0 >Reporter: Andre Araujo >Priority: Critical > > Running the command below seems to have no impact on the #rows stats. > {code} > [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5); > Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100) > +---+ > | summary | > +---+ > | Updated 1 partition(s) and 103 column(s). | > +---+ > WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%. 
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table > size 20.35GB > Fetched 1 row(s) in 43.67s > [host:21000] default> show table stats wide; > Query: show table stats wide > +---+--++-+--+---+-+---+-+ > | #Rows | Extrap #Rows | #Files | Size| Bytes Cached | Cache Replication > | Format | Incremental stats | Location| > +---+--++-+--+---+-+---+-+ > | 0 | -1 | 84 | 20.35GB | NOT CACHED | NOT CACHED > | PARQUET | false | hdfs://ns1/user/hive/warehouse/wide | > +---+--++-+--+---+-+---+-+ > Fetched 1 row(s) in 0.01s > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-5564) Return a profile for queries during planning
[ https://issues.apache.org/jira/browse/IMPALA-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184620#comment-17184620 ] Sahil Takiar commented on IMPALA-5564: -- A WIP patch for this was posted here: https://gerrit.cloudera.org/#/c/8434/ > Return a profile for queries during planning > > > Key: IMPALA-5564 > URL: https://issues.apache.org/jira/browse/IMPALA-5564 > Project: IMPALA > Issue Type: Sub-task > Components: Backend >Affects Versions: Impala 2.10.0 >Reporter: Lars Volker >Priority: Major > Labels: supportability > > During planning we currently don't return a profile from the debug webpages. > It would be nice to do so, to allow various monitoring tools to retrieve > information about queries during their planning phase. > This could be a minimal version of the profiles with information about the > current state of planning, e.g. that the FE is currently waiting for metadata > to be loaded.
[jira] [Commented] (IMPALA-10101) Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python requirements
[ https://issues.apache.org/jira/browse/IMPALA-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184168#comment-17184168 ] Tim Armstrong commented on IMPALA-10101: [~keens312] I looked at some of our branches and I think you probably need IMPALA-6682, IMPALA-6690, IMPALA-6695, IMPALA-6697, IMPALA-6731, IMPALA-6752, IMPALA-6731, IMPALA-6863. Can't guarantee that is complete, but that would be the bulk of it. > Impala branch cdh5-2.9.0_5.12.1 bin/boostrip_build failed to download python > requirements > - > > Key: IMPALA-10101 > URL: https://issues.apache.org/jira/browse/IMPALA-10101 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 2.9.0 > Environment: ubuntu 1604 x86_64 >Reporter: Kevin Yu >Priority: Major > > Reproduce steps:
[jira] [Commented] (IMPALA-10104) multiple if funtion and multiple-agg cause impalad crashed
[ https://issues.apache.org/jira/browse/IMPALA-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184153#comment-17184153 ] Tim Armstrong commented on IMPALA-10104: This is most likely IMPALA-8969, which was still present in CDH6.3.1. I couldn't reproduce on master with a similar query. {noformat} SELECT max(date_string_col) as datekey , if(`string_col` like '%a','A', if(`string_col` like '%b', 'B', if(`string_col` like '%c', 'C', if(`string_col` like '%d', 'D', 'E')))) test, sum(cast(double_col AS FLOAT)) / (id/ 1000) AS ecpm, id AS e_num, count(DISTINCT timestamp_col) AS r_num, sum(float_col) AS d_num, sum(double_col) AS c_num, count(DISTINCT tinyint_col) AS uv, sum(bigint_col) /count(DISTINCT bigint_col) AS d_rate, sum(int_col) /count(DISTINCT int_col) AS c_rate, sum(cast(double_col AS FLOAT)) AS money FROM alltypes WHERE date_string_col = '01/13/09' GROUP BY 2, 4; {noformat} > multiple if funtion and multiple-agg cause impalad crashed > --- > > Key: IMPALA-10104 > URL: https://issues.apache.org/jira/browse/IMPALA-10104 > Project: IMPALA > Issue Type: Bug > Components: Backend, Frontend >Affects Versions: Impala 3.2.0 > Environment: CDH6.3.1 > jdk 1.8.0_131 >Reporter: lxc >Priority: Major > > sql: > SELECT max(datekey) as datekey , > if(`exp` like '%a','A', > if(`exp` like '%b', 'B', > if(`exp` like '%c', 'C', > if(`exp` like '%d', 'D', 'E')))) test, > sum(cast(money AS FLOAT)) / (count(*)/ 1000) AS ecpm, > count(*) AS e_num, > count(DISTINCT aa) AS r_num, > sum(isd) AS d_num, > sum(isc) AS c_num, > count(DISTINCT bb) AS uv, > sum(isd) /count(DISTINCT aa) AS d_rate, > sum(isc) /count(DISTINCT aa) AS c_rate, > sum(cast(money AS FLOAT)) AS money > FROM tableA > WHERE datekey = '20200812' > GROUP BY test; > the coredump info: > > Program terminated with signal 6, Aborted. 
> #0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6 > Missing separate debuginfos, use: debuginfo-install > cyrus-sasl-gssapi-2.1.26-23.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 > cyrus-sasl-plain-2.1.26-23.el7.x86_64 glibc-2.17-196.el7.x86_64 > keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 > libcom_err-1.42.9-10.el7.x86_64 libdb-5.3.21-20.el7.x86_64 > libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-6.el7.x86_64 > openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 > zlib-1.2.7-17.el7.x86_64 > (gdb) bt > #0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6 > #1 0x7f5498dec8e8 in abort () from /lib64/libc.so.6 > #2 0x7f549bd7d3b5 in os::abort(bool) () from > /opt/jdk/jre/lib/amd64/server/libjvm.so > #3 0x7f549bf1f673 in VMError::report_and_die() () from > /opt/jdk/jre/lib/amd64/server/libjvm.so > #4 0x7f549bd828bf in JVM_handle_linux_signal () from > /opt/jdk/jre/lib/amd64/server/libjvm.so > #5 0x7f549bd78e13 in signalHandler(int, siginfo*, void*) () from > /opt/jdk/jre/lib/amd64/server/libjvm.so > #6 > #7 0x011084ff in impala_udf::FunctionContext::Free(unsigned char*) () > #8 0x00b92d3f in > impala::AggregateFunctions::StringValSerializeOrFinalize(impala_udf::FunctionContext*, > impala_udf::StringVal const&) () > #9 0x013207ed in > impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, > impala::SlotDescriptor const&, impala::Tuple*, void*) () > #10 0x7f54321eb82f in ?? () > #11 0x7f535bf19e00 in ?? () > #12 0x0003 in ?? () > #13 0x0400e360a8c5 in ?? () > #14 0x0001 in ?? () > #15 0x7f4fd302f1e0 in ?? () > #16 0x435d70c0 in ?? () > #17 0x7f535bf19f00 in ?? () > #18 0x7f532b536000 in ?? () > #19 0x12effc08 in ?? () > #20 0x1e39d5d0 in ?? () > #21 0x0400 in ?? () > #22 0x in ?? () > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10060) Postgres JDBC driver should be upgraded to 42.2.14
[ https://issues.apache.org/jira/browse/IMPALA-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184066#comment-17184066 ] Kevin Risden commented on IMPALA-10060: --- I attached a patch [^IMPALA-10060.patch] instead of the gerrit setup due to the excessive permissions requested from Gerrit. The patch was generated with: {code:java} git format-patch -1 HEAD --stdout > ~/Downloads/IMPALA-10060.patch {code} > Postgres JDBC driver should be upgraded to 42.2.14 > -- > > Key: IMPALA-10060 > URL: https://issues.apache.org/jira/browse/IMPALA-10060 > Project: IMPALA > Issue Type: Task >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Attachments: IMPALA-10060.patch > > > Impala currently uses Postgres driver version 42.2.5 which isn't up to date > and has a CVE associated with it. It would be good to upgrade to 42.2.14 > which is the latest as of June 2020. > https://mvnrepository.com/artifact/org.postgresql/postgresql
[jira] [Updated] (IMPALA-10060) Postgres JDBC driver should be upgraded to 42.2.14
[ https://issues.apache.org/jira/browse/IMPALA-10060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Risden updated IMPALA-10060: -- Attachment: IMPALA-10060.patch > Postgres JDBC driver should be upgraded to 42.2.14 > -- > > Key: IMPALA-10060 > URL: https://issues.apache.org/jira/browse/IMPALA-10060 > Project: IMPALA > Issue Type: Task >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Attachments: IMPALA-10060.patch > > > Impala currently uses Postgres driver version 42.2.5 which isn't up to date > and has a CVE associated with it. It would be good to upgrade to 42.2.14 > which is the latest as of June 2020. > https://mvnrepository.com/artifact/org.postgresql/postgresql
[jira] [Commented] (IMPALA-9952) Parquet with lz4 ColumnIndex filter error
[ https://issues.apache.org/jira/browse/IMPALA-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184011#comment-17184011 ] Zoltán Borók-Nagy commented on IMPALA-9952: --- [~guojingfeng] Could you tell me the schema of your table? Is the data sorted by any of the columns? What kind of queries hit this bug? Also, were you able to reproduce this bug on an obscured data set that can be shared? > Parquet with lz4 ColumnIndex filter error > - > > Key: IMPALA-9952 > URL: https://issues.apache.org/jira/browse/IMPALA-9952 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.4.0 >Reporter: guojingfeng >Priority: Major > > When reading parquet file with lz4 compress codec, encountered the following > error: > {code:java} > I0714 16:11:48.307806 1075820 runtime-state.cc:207] > 8c43203adb2d4fc8:0478df9b018b] Error from query > 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file > hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq. > I0714 16:11:48.834901 1075838 status.cc:126] > 8c43203adb2d4fc8:0478df9b02c0] Invalid offset index in Parquet file > hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq. > @ 0xbf4ef9 > @ 0x1748c41 > @ 0x174e170 > @ 0x1750e58 > @ 0x17519f0 > @ 0x1748559 > @ 0x1510b41 > @ 0x1512c8f > @ 0x137488a > @ 0x1375759 > @ 0x1b48a19 > @ 0x7f34509f5e24 > @ 0x7f344d5ed35c > I0714 16:11:48.835763 1075838 runtime-state.cc:207] > 8c43203adb2d4fc8:0478df9b02c0] Error from query > 8c43203adb2d4fc8:0478df9b: Invalid offset index in Parquet file > hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq. > I0714 16:11:48.893784 1075820 status.cc:126] > 8c43203adb2d4fc8:0478df9b018b] Top level rows aren't in sync during page > filtering in file > hdfs://path/4844de7af4545a39-e8ebc7da005f_2015704758_data.0.parq. 
> @ 0xbf4ef9 > @ 0x1749104 > @ 0x17494cc > @ 0x1751aee > @ 0x1748559 > @ 0x1510b41 > @ 0x1512c8f > @ 0x137488a > @ 0x1375759 > @ 0x1b48a19 > @ 0x7f34509f5e24 > @ 0x7f344d5ed35c > {code} > Corresponding source code: > {code:java} > Status HdfsParquetScanner::CheckPageFiltering() { > if (candidate_ranges_.empty() || scalar_readers_.empty()) return > Status::OK(); int64_t current_row = scalar_readers_[0]->LastProcessedRow(); > for (int i = 1; i < scalar_readers_.size(); ++i) { > if (current_row != scalar_readers_[i]->LastProcessedRow()) { > DCHECK(false); > return Status(Substitute( > "Top level rows aren't in sync during page filtering in file $0.", > filename())); > } > } > return Status::OK(); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10108) Implement ds_kll_stringify function
Gabor Kaszab created IMPALA-10108: - Summary: Implement ds_kll_stringify function Key: IMPALA-10108 URL: https://issues.apache.org/jira/browse/IMPALA-10108 Project: IMPALA Issue Type: New Feature Components: Backend Reporter: Gabor Kaszab ds_kll_stringify() receives a string that is a serialized Apache DataSketches sketch and returns its stringified format by invoking the related function on the sketch's interface.
[jira] [Updated] (IMPALA-10107) Implement HLL functions to have full compatibility with Hive
[ https://issues.apache.org/jira/browse/IMPALA-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-10107: -- Priority: Minor (was: Major) > Implement HLL functions to have full compatibility with Hive > > > Key: IMPALA-10107 > URL: https://issues.apache.org/jira/browse/IMPALA-10107 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Gabor Kaszab >Priority: Minor > > ds_hll_estimate_bounds > ds_hll_stringify > ds_hll_union_f > For parameters and expected behaviour check Hive.
[jira] [Created] (IMPALA-10107) Implement HLL functions to have full compatibility with Hive
Gabor Kaszab created IMPALA-10107: - Summary: Implement HLL functions to have full compatibility with Hive Key: IMPALA-10107 URL: https://issues.apache.org/jira/browse/IMPALA-10107 Project: IMPALA Issue Type: New Feature Components: Backend Reporter: Gabor Kaszab ds_hll_estimate_bounds ds_hll_stringify ds_hll_union_f For parameters and expected behaviour check Hive.
[jira] [Created] (IMPALA-10106) Update DataSketches
Adam Tamas created IMPALA-10106: --- Summary: Update DataSketches Key: IMPALA-10106 URL: https://issues.apache.org/jira/browse/IMPALA-10106 Project: IMPALA Issue Type: Improvement Reporter: Adam Tamas Update the external DataSketches files for HLL/KLL to version 2.1.x
[jira] [Assigned] (IMPALA-10106) Update DataSketches
[ https://issues.apache.org/jira/browse/IMPALA-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Tamas reassigned IMPALA-10106: --- Assignee: Adam Tamas > Update DataSketches > --- > > Key: IMPALA-10106 > URL: https://issues.apache.org/jira/browse/IMPALA-10106 > Project: IMPALA > Issue Type: Improvement >Reporter: Adam Tamas >Assignee: Adam Tamas >Priority: Minor > > Update the external DataSketches files for HLL/KLL to version 2.1.x
[jira] [Updated] (IMPALA-10105) Implement KLL functions that return array of doubles
[ https://issues.apache.org/jira/browse/IMPALA-10105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-10105: -- Summary: Implement KLL functions that return array of doubles (was: Rewrite KLL functions to return array of doubles) > Implement KLL functions that return array of doubles > > > Key: IMPALA-10105 > URL: https://issues.apache.org/jira/browse/IMPALA-10105 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Reporter: Gabor Kaszab >Priority: Major > > There are functions that were originally meant to return Array<double>, but since > Impala doesn't support returning complex types, these functions were > implemented to return a comma-separated string. > To avoid breaking compatibility these functions got an "_as_string" postfix: > - ds_kll_quantiles_as_string() > - ds_kll_pmf_as_string() > - ds_kll_cdf_as_string() > This ticket is to implement the version of these functions that actually > return an array of doubles. > - ds_kll_quantiles() > - ds_kll_pmf() > - ds_kll_cdf() > This is the Jira that tracks complex-type support as a return type: > https://issues.apache.org/jira/browse/IMPALA-9520
[jira] [Created] (IMPALA-10105) Rewrite KLL functions to return array of doubles
Gabor Kaszab created IMPALA-10105: - Summary: Rewrite KLL functions to return array of doubles Key: IMPALA-10105 URL: https://issues.apache.org/jira/browse/IMPALA-10105 Project: IMPALA Issue Type: New Feature Components: Backend Reporter: Gabor Kaszab There are functions that were originally meant to return Array<double>, but since Impala doesn't support returning complex types, these functions were implemented to return a comma-separated string. To avoid breaking compatibility these functions got an "_as_string" postfix: - ds_kll_quantiles_as_string() - ds_kll_pmf_as_string() - ds_kll_cdf_as_string() This ticket is to implement the version of these functions that actually return an array of doubles. - ds_kll_quantiles() - ds_kll_pmf() - ds_kll_cdf() This is the Jira that tracks complex-type support as a return type: https://issues.apache.org/jira/browse/IMPALA-9520
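As a rough illustration of the "_as_string" workaround described above (a sketch only; the exact formatting the real Impala functions use is not specified here):

```python
def doubles_as_string(values):
    # Hypothetical rendering of an array of doubles as a comma-separated
    # string, mimicking the "_as_string" compatibility workaround.
    return ",".join(repr(float(v)) for v in values)

print(doubles_as_string([0.0, 0.4, 0.6]))
# -> 0.0,0.4,0.6
```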
[jira] [Updated] (IMPALA-10020) Implement ds_kll_cdf() function
[ https://issues.apache.org/jira/browse/IMPALA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-10020: -- Description: Requirements for ds_kll_cdf() (Cumulative Distribution Function): - Receives a serialized KLL sketch in BINARY type (in Impala it can be STRING as long as we don't have BINARY) as first parameter. - Receives one or more float values to create ranges from the sketched data. - In Hive the return type is an array of doubles. However, Impala can't return complex types from functions at this point so we have to find some alternative approaches to implement this function. Follow whatever solution came up in https://issues.apache.org/jira/browse/IMPALA-9962 An example: {code:java} select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table; {code} This will generate the following ranges: (-inf, 1), (-inf,2), (-inf,3), (-inf,4), (-inf,+inf) In Hive, the result would have an array of 5 doubles for the 5 ranges, where each number gives the probability between [0,1] that an item will fall into the particular range. Or in other words a ratio of items belonging to that range. Taking input values such as: 1,2,3,4,5 {code:java} select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches; ++ |_c0 | ++ | [0.0,0.4,0.6,0.8,1.0,1.0] | ++ {code} was: Requirements for ds_kll_cdf() (Cumulative Distribution Function): - Receives a serialized KLL sketch in BINARY type (in Impala it can be STRING as long as we don't have BINARY) as first parameter. - Receives one or more double values to create ranges from the sketched data. - In Hive the return type is an array of doubles. However, Impala can't return complex types from functions at this point so we have to find some alternative approaches to implement this function. 
Follow whatever solution came up in https://issues.apache.org/jira/browse/IMPALA-9962 An example: {code:java} select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table; {code} This will generate the following ranges: (-inf, 1), (-inf,2), (-inf,3), (-inf,4), (-inf,+inf) In Hive, the result would have an array of 5 doubles for the 5 ranges, where each number gives the probability between [0,1] that an item will fall into the particular range. Or in other words a ratio of items belonging to that range. Taking input values such as: 1,2,3,4,5 {code:java} select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches; ++ |_c0 | ++ | [0.0,0.4,0.6,0.8,1.0,1.0] | ++ {code} > Implement ds_kll_cdf() function > --- > > Key: IMPALA-10020 > URL: https://issues.apache.org/jira/browse/IMPALA-10020 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Reporter: Gabor Kaszab >Assignee: Gabor Kaszab >Priority: Major > > Requirements for ds_kll_cdf() (Cumulative Distribution Function): > - Receives a serialized KLL sketch in BINARY type (in Impala it can be > STRING as long as we don't have BINARY) as first parameter. > - Receives one or more float values to create ranges from the sketched data. > - In Hive the return type is an array of doubles. However, Impala can't > return complex types from functions at this point so we have to find some > alternative approaches to implement this function. Follow whatever solution > came up in https://issues.apache.org/jira/browse/IMPALA-9962 > An example: > {code:java} > select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table; > {code} > This will generate the following ranges: (-inf, 1), (-inf,2), (-inf,3), > (-inf,4), (-inf,+inf) > In Hive, the result would have an array of 5 doubles for the 5 ranges, where > each number gives the probability between [0,1] that an item will fall into > the particular range. Or in other words a ratio of items belonging to that > range. 
> Taking input values such as: 1,2,3,4,5 > {code:java} > select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches; > ++ > |_c0 | > ++ > | [0.0,0.4,0.6,0.8,1.0,1.0] | > ++ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
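The CDF example in the description can be sanity-checked with an exact (non-sketched) model of the same range semantics; this is an illustrative stand-in, not the DataSketches KLL implementation:

```python
def exact_cdf(values, split_points):
    # For each split point p, report the fraction of inputs in (-inf, p),
    # i.e. strictly less than p, plus a trailing 1.0 for (-inf, +inf).
    n = len(values)
    return [sum(v < p for v in values) / n for p in split_points] + [1.0]

# Same inputs as the ds_kll_cdf example above.
print(exact_cdf([1, 2, 3, 4, 5], [1, 3, 4, 5, 10]))
# -> [0.0, 0.4, 0.6, 0.8, 1.0, 1.0]
```

The real KLL sketch returns approximations of these fractions with bounded rank error rather than exact counts.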
[jira] [Updated] (IMPALA-10020) Implement ds_kll_cdf() function
[ https://issues.apache.org/jira/browse/IMPALA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-10020: -- Description: Requirements for ds_kll_cdf() (Cumulative Distribution Function):
- Receives a serialized KLL sketch of BINARY type (in Impala this can be STRING as long as we don't have BINARY) as the first parameter.
- Receives one or more double values to create ranges from the sketched data.
- In Hive the return type is an array of doubles. However, Impala can't return complex types from functions at this point, so we have to find an alternative approach to implement this function. Follow whatever solution came up in https://issues.apache.org/jira/browse/IMPALA-9962

An example:
{code:java}
select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
{code}
This will generate the following ranges: (-inf, 1), (-inf, 2), (-inf, 3), (-inf, 4), (-inf, +inf)

In Hive, the result would be an array of 5 doubles for the 5 ranges, where each number gives a probability in [0,1] that an item falls into the particular range; in other words, the ratio of items belonging to that range.

Taking input values such as 1, 2, 3, 4, 5:
{code:java}
select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
+---------------------------+
| _c0                       |
+---------------------------+
| [0.0,0.4,0.6,0.8,1.0,1.0] |
+---------------------------+
{code}

was: Requirements for ds_kll_cdf() (Cumulative Distribution Function):
- Receives a serialized KLL sketch of BINARY type (in Impala this can be STRING as long as we don't have BINARY) as the first parameter.
- Receives one or more double values to create ranges from the sketched data.
- In Hive the return type is an array of doubles. However, Impala can't return complex types from functions at this point, so we have to find an alternative approach to implement this function. Follow whatever solution came up in https://issues.apache.org/jira/browse/IMPALA-9962

An example:
{code:java}
select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
{code}
This will generate the following ranges: (-inf, 1), (-inf, 2), (-inf, 3), (-inf, 4), [4, +inf)

In Hive, the result would be an array of 5 doubles for the 5 ranges, where each number gives a probability in [0,1] that an item falls into the particular range; in other words, the ratio of items belonging to that range.

Taking input values such as 1, 2, 3, 4, 5:
{code:java}
select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
+---------------------------+
| _c0                       |
+---------------------------+
| [0.0,0.4,0.6,0.8,1.0,1.0] |
+---------------------------+
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
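The expected output in the example above can be checked against a naive empirical CDF. Below is a minimal pure-Python sketch (not the DataSketches KLL implementation; the function name is made up for illustration), assuming split points act as exclusive upper bounds:

```python
def empirical_cdf(values, split_points):
    """Fraction of items strictly below each split point, plus the total.

    For N split points the result has N + 1 entries; the last entry
    covers the (-inf, +inf) range and is therefore always 1.0.
    """
    n = len(values)
    result = [sum(1 for v in values if v < s) / n for s in split_points]
    result.append(1.0)  # every item falls into (-inf, +inf)
    return result

# The example from the description: values 1..5, splits 1, 3, 4, 5, 10.
print(empirical_cdf([1, 2, 3, 4, 5], [1, 3, 4, 5, 10]))
# -> [0.0, 0.4, 0.6, 0.8, 1.0, 1.0]
```

A real KLL sketch returns these values approximately (within its rank-error guarantees) rather than exactly, but on this tiny input the exact fractions match the example output.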
[jira] [Assigned] (IMPALA-10020) Implement ds_kll_cdf() function
[ https://issues.apache.org/jira/browse/IMPALA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab reassigned IMPALA-10020: - Assignee: Gabor Kaszab
[jira] [Work started] (IMPALA-10020) Implement ds_kll_cdf() function
[ https://issues.apache.org/jira/browse/IMPALA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10020 started by Gabor Kaszab.
[jira] [Updated] (IMPALA-10104) multiple if functions and multiple aggregations cause impalad to crash
[ https://issues.apache.org/jira/browse/IMPALA-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lxc updated IMPALA-10104: - Description: sql:

SELECT max(datekey) AS datekey,
       if(`exp` like '%a', 'A', if(`exp` like '%b', 'B', if(`exp` like '%c', 'C', if(`exp` like '%d', 'D', 'E')))) AS test,
       sum(cast(money AS FLOAT)) / (count(*) / 1000) AS ecpm,
       count(*) AS e_num,
       count(DISTINCT aa) AS r_num,
       sum(isd) AS d_num,
       sum(isc) AS c_num,
       count(DISTINCT bb) AS uv,
       sum(isd) / count(DISTINCT aa) AS d_rate,
       sum(isc) / count(DISTINCT aa) AS c_rate,
       sum(cast(money AS FLOAT)) AS money
FROM tableA
WHERE datekey = '20200812'
GROUP BY test;

the coredump info:

Program terminated with signal 6, Aborted.
#0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-gssapi-2.1.26-23.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 cyrus-sasl-plain-2.1.26-23.el7.x86_64 glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libdb-5.3.21-20.el7.x86_64 libselinux-2.5-11.el7.x86_64 nss-softokn-freebl-3.28.3-6.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x7f5498deb1f7 in raise () from /lib64/libc.so.6
#1 0x7f5498dec8e8 in abort () from /lib64/libc.so.6
#2 0x7f549bd7d3b5 in os::abort(bool) () from /opt/jdk/jre/lib/amd64/server/libjvm.so
#3 0x7f549bf1f673 in VMError::report_and_die() () from /opt/jdk/jre/lib/amd64/server/libjvm.so
#4 0x7f549bd828bf in JVM_handle_linux_signal () from /opt/jdk/jre/lib/amd64/server/libjvm.so
#5 0x7f549bd78e13 in signalHandler(int, siginfo*, void*) () from /opt/jdk/jre/lib/amd64/server/libjvm.so
#6 <signal handler called>
#7 0x011084ff in impala_udf::FunctionContext::Free(unsigned char*) ()
#8 0x00b92d3f in impala::AggregateFunctions::StringValSerializeOrFinalize(impala_udf::FunctionContext*, impala_udf::StringVal const&) ()
#9 0x013207ed in impala::AggFnEvaluator::SerializeOrFinalize(impala::Tuple*, impala::SlotDescriptor const&, impala::Tuple*, void*) ()
#10 0x7f54321eb82f in ?? ()
#11 0x7f535bf19e00 in ?? ()
#12 0x0003 in ?? ()
#13 0x0400e360a8c5 in ?? ()
#14 0x0001 in ?? ()
#15 0x7f4fd302f1e0 in ?? ()
#16 0x435d70c0 in ?? ()
#17 0x7f535bf19f00 in ?? ()
#18 0x7f532b536000 in ?? ()
#19 0x12effc08 in ?? ()
#20 0x1e39d5d0 in ?? ()
#21 0x0400 in ?? ()
#22 0x in ?? ()
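The nested if(...) chain in the reported query classifies `exp` by its suffix (SQL LIKE '%a' matches any string ending in 'a') and is used as the GROUP BY key. A minimal Python sketch of that classification logic (a hypothetical helper for illustration, not Impala code):

```python
def classify(exp: str) -> str:
    # Mirrors: if(exp like '%a','A', if(exp like '%b','B',
    #          if(exp like '%c','C', if(exp like '%d','D','E'))))
    for suffix, label in (("a", "A"), ("b", "B"), ("c", "C"), ("d", "D")):
        if exp.endswith(suffix):
            return label
    return "E"  # fallthrough when no suffix matches

print(classify("alpha"), classify("club"), classify("xyz"))
# -> A B E
```

Note that the crash itself is in the aggregate serialization path (frames #7-#9 of the backtrace), not in this expression; the sketch only shows what the GROUP BY key computes.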
[jira] [Created] (IMPALA-10104) multiple if functions and multiple aggregations cause impalad to crash
lxc created IMPALA-10104: Summary: multiple if functions and multiple aggregations cause impalad to crash Key: IMPALA-10104 URL: https://issues.apache.org/jira/browse/IMPALA-10104 Project: IMPALA Issue Type: Bug Components: Backend, Frontend Affects Versions: Impala 3.2.0 Environment: CDH6.3.1 jdk 1.8.0_131 Reporter: lxc
[jira] [Comment Edited] (IMPALA-10101) Impala branch cdh5-2.9.0_5.12.1 bin/bootstrap_build failed to download python requirements
[ https://issues.apache.org/jira/browse/IMPALA-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183752#comment-17183752 ] Kevin Yu edited comment on IMPALA-10101 at 8/25/20, 5:59 AM: - Thanks, Tim, for the reply. I'll have to look into the python infra scripts to figure out how to build older versions of the Cloudera Impala branches. was (Author: keens312): This Tim for the reply.

> Impala branch cdh5-2.9.0_5.12.1 bin/bootstrap_build failed to download python requirements
> -
>
> Key: IMPALA-10101
> URL: https://issues.apache.org/jira/browse/IMPALA-10101
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 2.9.0
> Environment: ubuntu 1604 x86_64
> Reporter: Kevin Yu
> Priority: Major
>
> Reproduce steps: